API Reference

DataTable

DataTable(dataframe[, name, index, …])

DataTable.shape

Returns a tuple representing the dimensionality of the DataTable.

DataTable.add_semantic_tags(semantic_tags)

Adds the specified semantic tags to columns.

DataTable.remove_semantic_tags(semantic_tags)

Remove the specified semantic tags from any columns named in the provided semantic_tags dictionary.

DataTable.reset_semantic_tags([columns, …])

Reset the semantic tags for the specified columns to the default values and return a new DataTable.
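The dictionary-driven tag methods above (add, remove and reset) amount to set operations on a mapping from column name to tags. The following is a minimal plain-Python sketch, not Woodwork's implementation. Several details are assumptions: the helper names, the DEFAULT_TAGS table that stands in for each column's default tags, the choice to return a new mapping instead of mutating the input, and the rule that resetting with no columns resets all of them.

```python
from typing import Dict, Iterable, Optional, Set, Union

Tags = Dict[str, Set[str]]
TagSpec = Union[str, Iterable[str]]

# Hypothetical defaults; in practice they would follow from each column's logical type.
DEFAULT_TAGS: Tags = {"price": {"numeric"}, "country": {"category"}}


def _as_set(tags: TagSpec) -> Set[str]:
    # Treat a bare string as one tag rather than a sequence of characters.
    return {tags} if isinstance(tags, str) else set(tags)


def _copy(current: Tags) -> Tags:
    # Copy every tag set so the caller's mapping is never mutated.
    return {col: set(tags) for col, tags in current.items()}


def add_tags(current: Tags, to_add: Dict[str, TagSpec]) -> Tags:
    """Union new tags into each named column; unknown columns raise KeyError."""
    result = _copy(current)
    for col, tags in to_add.items():
        result[col] |= _as_set(tags)
    return result


def remove_tags(current: Tags, to_remove: Dict[str, TagSpec]) -> Tags:
    """Drop the given tags from each named column; absent tags are ignored."""
    result = _copy(current)
    for col, tags in to_remove.items():
        result[col] -= _as_set(tags)
    return result


def reset_tags(current: Tags, columns: Optional[Iterable[str]] = None) -> Tags:
    """Restore the named columns (all columns when None) to their default tags."""
    result = _copy(current)
    targets = list(result) if columns is None else list(columns)
    for col in targets:
        # Columns with no entry in DEFAULT_TAGS fall back to an empty tag set.
        result[col] = set(DEFAULT_TAGS.get(col, set()))
    return result


tags = {"price": {"numeric"}, "country": {"category"}}
step1 = add_tags(tags, {"price": "currency"})
step2 = remove_tags(step1, {"price": ["numeric"]})
step3 = reset_tags(step2, ["price"])
print(sorted(step1["price"]), sorted(step2["price"]), sorted(step3["price"]))
# ['currency', 'numeric'] ['currency'] ['numeric']
```

Copying before each update keeps every intermediate mapping valid, which mirrors the "return a new DataTable" behavior described for reset.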

DataTable.select(include)

Create a DataTable including only the columns whose logical type or semantic tags match any of the types and tags in the include list.
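The selection rule can be illustrated with a hypothetical in-memory schema that maps each column name to a logical type name and a set of semantic tags. This sketch assumes that a column is kept when its logical type, or any one of its tags, appears in include. It is not Woodwork's code, and the column names and tags are made up for the example.

```python
from typing import Dict, Iterable, List, Set, Tuple, Union

# Hypothetical schema: column name -> (logical type name, semantic tags).
Schema = Dict[str, Tuple[str, Set[str]]]


def select_columns(schema: Schema, include: Union[str, Iterable[str]]) -> List[str]:
    """Return names of columns whose logical type or any tag is listed in include."""
    wanted = {include} if isinstance(include, str) else set(include)
    selected = []
    for name, (logical_type, tags) in schema.items():
        # A column matches on its logical type or on at least one shared tag.
        if logical_type in wanted or wanted & tags:
            selected.append(name)
    return selected


schema = {
    "id": ("WholeNumber", {"index"}),
    "price": ("Double", {"numeric"}),
    "country": ("CountryCode", {"category"}),
    "signup": ("Datetime", {"time_index"}),
}
print(select_columns(schema, ["numeric", "Datetime"]))  # ['price', 'signup']
```

Column order follows the schema's insertion order, so the result lists selected columns in their original order.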

DataTable.iloc

Purely integer-location based indexing for selection by position.

DataTable.set_index(index)

Set the index column and return a new DataTable.

DataTable.set_logical_types(logical_types[, …])

Update the logical type for any column names in the provided logical_types dictionary.

DataTable.set_semantic_tags(semantic_tags[, …])

Update the semantic tags for any column names in the provided semantic_tags dictionary.

DataTable.set_time_index(time_index)

Set the time index column.

DataTable.to_dataframe()

Retrieves the DataTable’s underlying dataframe.

DataTable.describe([include])

Calculates statistics for data contained in the DataTable.

DataTable.get_mutual_information([num_bins, …])

Calculates mutual information between all pairs of columns in the DataTable that support mutual information.
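Mutual information between two columns can be estimated by discretizing numeric values into num_bins bins and comparing joint and marginal frequencies. The following is a minimal sketch that assumes equal-width binning and the plain plug-in estimator in nats. Woodwork's actual binning strategy, and any normalization or adjustment it applies to the score, may differ. The helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def equal_width_bins(values: Sequence[float], num_bins: int) -> List[int]:
    """Map numeric values to bin indices 0..num_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything falls in one bin
    width = (hi - lo) / num_bins
    # min() keeps the maximum value in the last bin instead of an extra bin.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in nats from paired discrete observations."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), with marginals p(a) = px[a] / n etc.
        mi += p_ab * math.log(p_ab * n * n / (px[a] * py[b]))
    return mi


price = [1.0, 2.0, 9.0, 10.0]
segment = ["low", "low", "high", "high"]
binned = equal_width_bins(price, num_bins=2)  # [0, 0, 1, 1]
print(round(mutual_information(binned, segment), 4))  # 0.6931, which is ln 2
```

Here the binned price perfectly determines the segment, so the estimate reaches the entropy of the two equally likely segments, ln 2.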

DataTable.value_counts([ascending, top_n, …])

Returns a list of dictionaries with counts for the most frequent values in each column (only …).
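Counting the most frequent values in a column is essentially what collections.Counter.most_common does. The following is a minimal sketch with some assumptions. The function name, the "value" and "count" keys, and the {column: values} table layout are hypothetical. The method's ascending parameter and any missing-value handling are not modeled.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def top_value_counts(values: Sequence[Hashable], top_n: int = 10) -> List[Dict[str, object]]:
    """Return up to top_n {"value": ..., "count": ...} dicts, most frequent first."""
    # most_common() sorts by count, highest first; ties keep first-seen order.
    return [{"value": v, "count": c} for v, c in Counter(values).most_common(top_n)]


# Applied per column of a hypothetical table stored as {column: values}.
table = {"color": ["red", "blue", "red", "green", "red", "blue"]}
counts = {col: top_value_counts(vals, top_n=2) for col, vals in table.items()}
print(counts["color"])
# [{'value': 'red', 'count': 3}, {'value': 'blue', 'count': 2}]
```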

DataTable.to_csv(path[, sep, encoding, …])

Write the DataTable to disk in CSV format at the location specified by path.

DataTable.to_pickle(path[, compression, …])

Write the DataTable to disk in pickle format at the location specified by path.

DataTable.to_parquet(path[, compression, …])

Write the DataTable to disk in Parquet format at the location specified by path.

DataColumn

DataColumn(series[, logical_type, …])

DataColumn.shape

Returns a tuple representing the dimensionality of the DataColumn.

DataColumn.iloc

Purely integer-location based indexing for selection by position.

DataColumn.add_semantic_tags(semantic_tags)

Add the specified semantic tags to the column and return a new DataColumn object.

DataColumn.remove_semantic_tags(semantic_tags)

Removes the specified semantic tags from the column and returns a new DataColumn object.

DataColumn.reset_semantic_tags([…])

Reset the semantic tags to the default values.

DataColumn.set_logical_type(logical_type[, …])

Update the logical type for the column and return a new DataColumn object.

DataColumn.set_semantic_tags(semantic_tags)

Replace current semantic tags with new values and return a new DataColumn object.
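The DataColumn methods above each return a new object rather than modifying the existing one. That pattern can be sketched with a frozen dataclass and dataclasses.replace. ColumnSchema is a hypothetical stand-in that holds only typing metadata. The sketch assumes that removing an absent tag is silently ignored, and it does not model how changing the logical type might affect default tags.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable, Union

TagSpec = Union[str, Iterable[str]]


def _as_tags(tags: TagSpec) -> FrozenSet[str]:
    # A bare string is one tag, not a sequence of single-character tags.
    return frozenset([tags]) if isinstance(tags, str) else frozenset(tags)


@dataclass(frozen=True)
class ColumnSchema:
    """Hypothetical stand-in for a column's typing metadata (not Woodwork's DataColumn)."""

    name: str
    logical_type: str
    semantic_tags: FrozenSet[str] = frozenset()

    def add_semantic_tags(self, tags: TagSpec) -> "ColumnSchema":
        # Copy with the union of old and new tags; self is left untouched.
        return replace(self, semantic_tags=self.semantic_tags | _as_tags(tags))

    def remove_semantic_tags(self, tags: TagSpec) -> "ColumnSchema":
        # Tags that are not present are silently ignored in this sketch.
        return replace(self, semantic_tags=self.semantic_tags - _as_tags(tags))

    def set_semantic_tags(self, tags: TagSpec) -> "ColumnSchema":
        # Replace the tags wholesale instead of merging them.
        return replace(self, semantic_tags=_as_tags(tags))

    def set_logical_type(self, logical_type: str) -> "ColumnSchema":
        # Only the logical type changes; tags are carried over unchanged here.
        return replace(self, logical_type=logical_type)


original = ColumnSchema("price", "Double", frozenset({"numeric"}))
tagged = original.add_semantic_tags("currency")
print(sorted(original.semantic_tags))  # ['numeric'] (original is unchanged)
print(sorted(tagged.semantic_tags))    # ['currency', 'numeric']
```

Because the dataclass is frozen, any attempt to assign to a field raises an error, so the only way to change metadata is through methods that return a new instance.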

DataColumn.to_series()

Retrieves the DataColumn’s underlying series.

Logical Types

Boolean()

Represents Logical Types that contain binary values indicating true/false.

Categorical([encoding])

Represents Logical Types that contain unordered discrete values that fall into one of a set of possible values.

CountryCode()

Represents Logical Types that contain categorical information specifically used to represent countries.

Datetime([datetime_format])

Represents Logical Types that contain date and time information.

Double()

Represents Logical Types that contain positive and negative numbers, some of which include a fractional component.

Integer()

Represents Logical Types that contain positive and negative numbers without a fractional component, including zero (0).

EmailAddress()

Represents Logical Types that contain email address values.

Filepath()

Represents Logical Types that specify locations of directories and files in a file system.

FullName()

Represents Logical Types that may contain first, middle and last names, including honorifics and suffixes.

IPAddress()

Represents Logical Types that contain IP addresses, including both IPv4 and IPv6 addresses.

LatLong()

Represents Logical Types that contain latitude and longitude values.

NaturalLanguage()

Represents Logical Types that contain text or characters representing natural human language.

Ordinal(order)

Represents Logical Types that contain ordered discrete values.
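An Ordinal type is only meaningful together with its order, which determines how values compare. The following is a minimal sketch that assumes order lists the categories from lowest to highest and that values outside it should be rejected. sort_ordinal is a hypothetical helper and is not part of Woodwork.

```python
from typing import List, Sequence


def sort_ordinal(values: Sequence[str], order: Sequence[str]) -> List[str]:
    """Sort values by their position in `order`; reject values not listed in it."""
    unknown = set(values) - set(order)
    if unknown:
        raise ValueError(f"values not in order: {sorted(unknown)}")
    # Rank 0 is the lowest level, so sorting by rank goes from low to high.
    rank = {level: i for i, level in enumerate(order)}
    return sorted(values, key=rank.__getitem__)


sizes = ["large", "small", "medium", "small"]
print(sort_ordinal(sizes, order=["small", "medium", "large"]))
# ['small', 'small', 'medium', 'large']
```

Alphabetical sorting would give ['large', 'medium', 'small', 'small'] instead, which is why the explicit order matters.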

PhoneNumber()

Represents Logical Types that contain numeric digits and characters representing a phone number.

SubRegionCode()

Represents Logical Types that contain codes representing a portion of a larger geographic region.

Timedelta()

Represents Logical Types that contain values specifying a duration of time.

URL()

Represents Logical Types that contain URLs, which may include protocol, hostname and file name.

WholeNumber()

Represents Logical Types that contain natural numbers, including zero (0).
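The Double, Integer and WholeNumber descriptions differ only in whether fractional parts and negative values are allowed. The hypothetical helper below applies those definitions in order from narrowest to widest. It illustrates the distinction only and is not Woodwork's type-inference logic.

```python
from typing import Sequence, Union

Number = Union[int, float]


def narrowest_numeric_type(values: Sequence[Number]) -> str:
    """Return "WholeNumber", "Integer" or "Double" for a column of numbers."""
    # Any fractional part (or a non-finite value such as NaN) rules out the integer types.
    if not all(float(v).is_integer() for v in values):
        return "Double"
    # Whole-valued and never negative: natural numbers, including zero.
    if all(v >= 0 for v in values):
        return "WholeNumber"
    # Whole-valued with at least one negative value.
    return "Integer"


print(narrowest_numeric_type([0, 3, 7]))   # WholeNumber
print(narrowest_numeric_type([-2, 0, 5]))  # Integer
print(narrowest_numeric_type([1.5, 2]))    # Double
print(narrowest_numeric_type([2.0, 4.0]))  # WholeNumber (integer-valued floats)
```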

ZIPCode()

Represents Logical Types that contain postal codes used by the US Postal Service to represent a group of addresses.

Utils

General Utils

list_logical_types

Returns a dataframe describing all of the available Logical Types.

list_semantic_tags

Returns a dataframe describing all of the common semantic tags.

read_csv

Read data from the specified CSV file and return a Woodwork DataTable.

Demo Data

load_retail([id, nrows, return_dataframe])

Load a demo retail dataset into either a DataTable or a DataFrame.