woodwork.datatable.DataTable
class woodwork.datatable.DataTable(dataframe, name=None, index=None, time_index=None, semantic_tags=None, logical_types=None, table_metadata=None, column_metadata=None, use_standard_tags=True, make_index=False, column_descriptions=None, already_sorted=False)
__init__(dataframe, name=None, index=None, time_index=None, semantic_tags=None, logical_types=None, table_metadata=None, column_metadata=None, use_standard_tags=True, make_index=False, column_descriptions=None, already_sorted=False)
Create a DataTable.
Parameters
dataframe (pd.DataFrame, dd.DataFrame, ks.DataFrame, numpy.ndarray) – Dataframe providing the data for the datatable.
name (str, optional) – Name used to identify the datatable.
index (str, optional) – Name of the index column in the dataframe.
time_index (str, optional) – Name of the time index column in the dataframe.
semantic_tags (dict, optional) – Dictionary mapping column names in the dataframe to the semantic tags for the column. The keys in the dictionary should be strings that correspond to columns in the underlying dataframe. There are two options for specifying the dictionary values: (str): If only one semantic tag is being set, a single string can be used as a value. (list[str] or set[str]): If multiple tags are being set, a list or set of strings can be used as the value. Semantic tags will be set to an empty set for any column not included in the dictionary.
logical_types (dict[str -> LogicalType], optional) – Dictionary mapping column names in the dataframe to the LogicalType for the column. LogicalTypes will be inferred for any columns not present in the dictionary.
table_metadata (dict[str -> json serializable], optional) – Dictionary containing extra metadata for the DataTable.
column_metadata (dict[str -> dict[str -> json serializable]], optional) – Dictionary mapping column names to that column’s metadata dictionary.
use_standard_tags (bool, optional) – If True, will add standard semantic tags to columns based on the inferred or specified logical type for the column. Defaults to True.
make_index (bool, optional) – If True, will create a new unique, numeric index column with the name specified by index and will add the new index column to the supplied DataFrame. If True, the name specified in index cannot match an existing column name in dataframe. If False, the name specified in index must match a column present in the dataframe. Defaults to False.
column_descriptions (dict[str -> str], optional) – Dictionary containing column descriptions.
already_sorted (bool, optional) – Indicates whether the input dataframe is already sorted on the time index. If False, will sort the dataframe first on the time_index and then on the index (pandas DataFrame only). Defaults to False.
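The rules stated above for semantic_tags and make_index can be illustrated without the library. What follows is a minimal pure-Python sketch of those two rules, not woodwork's implementation. The helper names normalize_semantic_tags and check_index are hypothetical, and raising an error for unknown column keys is an assumption.

```python
def normalize_semantic_tags(column_names, semantic_tags=None):
    """Map every column name to a set of tags, following the documented rules."""
    semantic_tags = semantic_tags or {}
    unknown = set(semantic_tags) - set(column_names)
    if unknown:
        # Assumption: keys that match no column are treated as an error.
        raise KeyError(f"semantic_tags has unknown columns: {sorted(unknown)}")
    normalized = {}
    for name in column_names:
        # Columns not included in the dictionary get an empty set.
        tags = semantic_tags.get(name, set())
        if isinstance(tags, str):
            # A single tag may be given as a plain string.
            tags = {tags}
        # Multiple tags may be given as a list or a set of strings.
        normalized[name] = set(tags)
    return normalized


def check_index(column_names, index, make_index=False):
    """Validate the index name against the documented make_index rule."""
    if make_index and index in column_names:
        raise ValueError(f"make_index=True but column {index!r} already exists")
    if not make_index and index is not None and index not in column_names:
        raise ValueError(f"index column {index!r} is not in the dataframe")


# Illustrative column names and tags (made up for this sketch).
columns = ["id", "signup", "email"]
tags = normalize_semantic_tags(
    columns, {"email": "contact", "signup": ["date_of_interest", "audit"]}
)
check_index(columns, "id")                       # existing column, make_index=False
check_index(columns, "row_id", make_index=True)  # new column name, make_index=True
```

Normalizing every value to a set keeps later tag operations uniform, regardless of which of the two accepted input forms the caller used.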
Methods
__init__(dataframe[, name, index, …]) – Create a DataTable.
add_semantic_tags(semantic_tags) – Adds specified semantic tags to columns.
describe([include]) – Calculates statistics for data contained in DataTable.
describe_dict([include]) – Calculates statistics for data contained in DataTable.
drop(columns) – Drop specified columns from a DataTable.
head([n]) – Shows the first n rows of the DataTable along with typing information.
mutual_information([num_bins, nrows, …]) – Calculates mutual information between all pairs of columns in the DataTable that support mutual information.
mutual_information_dict([num_bins, nrows, …]) – Calculates mutual information between all pairs of columns in the DataTable that support mutual information.
pop(column_name) – Return a DataColumn and drop it from the DataTable.
remove_semantic_tags(semantic_tags) – Remove the semantic tags for any column names in the provided semantic_tags dictionary.
rename(columns) – Renames columns in a DataTable.
reset_semantic_tags([columns, retain_index_tags]) – Reset the semantic tags for the specified columns to the default values and return a new DataTable.
select(include) – Create a DataTable including only columns whose logical type and semantic tags are specified in the list of types and tags to include.
set_index(index) – Set the index column and return a new DataTable.
set_time_index(time_index) – Set the time index column.
set_types([logical_types, semantic_tags, …]) – Update the logical type and semantic tags for any column names in the provided types dictionary.
to_csv(path[, sep, encoding, engine, …]) – Write DataTable to disk in the CSV format, location specified by path.
to_dataframe() – Retrieves the DataTable’s underlying dataframe.
to_dictionary() – Get a DataTable’s description.
to_parquet(path[, compression, profile_name]) – Write DataTable to disk in the parquet format, location specified by path.
to_pickle(path[, compression, profile_name]) – Write DataTable to disk in the pickle format, location specified by path.
update_dataframe(new_df[, already_sorted]) – Replace the DataTable’s dataframe with a new dataframe, making sure the new dataframe dtypes are updated.
value_counts([ascending, top_n, dropna]) – Returns a list of dictionaries with counts for the most frequent values in each column (only
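The idea behind value_counts, counting the most frequent values per column, can be sketched in plain Python. This is an illustration only and not the library's code. The function name value_counts_sketch, the "value"/"count" keys, the default top_n of 10, and the exact meaning given to ascending and dropna are all assumptions. The parameter names themselves come from the signature listed above.

```python
from collections import Counter


def value_counts_sketch(columns, ascending=False, top_n=10, dropna=False):
    """Count the most frequent values in each column.

    `columns` maps a column name to a list of values. The result maps each
    column name to a list of {"value": ..., "count": ...} dictionaries.
    """
    result = {}
    for name, values in columns.items():
        if dropna:
            # Assumption: missing values are represented by None.
            values = [v for v in values if v is not None]
        # most_common returns (value, count) pairs, most frequent first.
        counts = Counter(values).most_common(top_n)
        if ascending:
            # Assumption: ascending=True lists the least frequent of the top_n first.
            counts = list(reversed(counts))
        result[name] = [{"value": v, "count": c} for v, c in counts]
    return result


# Illustrative data (made up for this sketch).
data = {"color": ["red", "blue", "red", None, "red", "blue"]}
top_colors = value_counts_sketch(data, top_n=2, dropna=True)
```

Here top_colors["color"] holds one dictionary per retained value, which keeps the result easy to serialize.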
Attributes
df – Dataframe providing the data for the datatable.
iloc – Purely integer-location based indexing for selection by position.
index – The index column for the table.
logical_types – A dictionary containing logical types for each column.
ltypes – A series listing the logical types for each column in the table.
physical_types – A dictionary containing physical types for each column.
semantic_tags – A dictionary containing semantic tags for each column.
shape – Returns a tuple representing the dimensionality of the DataTable.
time_index – The time index column for the table.
types – Dataframe containing the physical dtypes, logical types and semantic tags for the table.
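The types attribute combines the three per-column dictionaries listed above into a single view. A rough pure-Python sketch of that combination follows. It returns a list of row dictionaries rather than a dataframe, the function name types_summary is hypothetical, and the sample type and tag values are illustrative only.

```python
def types_summary(physical_types, logical_types, semantic_tags):
    """Merge per-column physical types, logical types, and tags into rows."""
    rows = []
    for name in physical_types:
        rows.append(
            {
                "column": name,
                "physical_type": physical_types[name],
                "logical_type": logical_types[name],
                # Sort the tags so the output order is stable.
                "semantic_tags": sorted(semantic_tags.get(name, set())),
            }
        )
    return rows


# Illustrative values (made up for this sketch).
summary = types_summary(
    physical_types={"id": "int64", "signup": "datetime64[ns]"},
    logical_types={"id": "Integer", "signup": "Datetime"},
    semantic_tags={"id": {"index"}, "signup": {"time_index"}},
)
for row in summary:
    print(row)
```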