API Reference#

WoodworkTableAccessor#

WoodworkTableAccessor(dataframe)

WoodworkTableAccessor.add_semantic_tags(...)

Adds specified semantic tags to columns, updating the Woodwork typing information.

WoodworkTableAccessor.dependence([measures, ...])

Calculates dependence measures between all pairs of columns in the DataFrame that support measuring dependence.

WoodworkTableAccessor.dependence_dict([...])

Calculates dependence measures between all pairs of columns in the DataFrame that support measuring dependence.

WoodworkTableAccessor.describe([include, ...])

Calculates statistics for data contained in the DataFrame.

WoodworkTableAccessor.describe_dict([...])

Calculates statistics for data contained in the DataFrame.

WoodworkTableAccessor.drop(columns[, inplace])

Drop specified columns from a DataFrame.

WoodworkTableAccessor.iloc

Integer-location based indexing for selection by position.

WoodworkTableAccessor.index

The index column for the table

WoodworkTableAccessor.infer_temporal_frequencies([...])

Infers the observation frequency (daily, biweekly, yearly, etc.) of each temporal column.

WoodworkTableAccessor.init(**kwargs)

Initializes Woodwork typing information for a DataFrame with a partial schema.

WoodworkTableAccessor.init_with_full_schema(schema)

Initializes Woodwork typing information for a DataFrame with a complete schema.

WoodworkTableAccessor.init_with_partial_schema([...])

Initializes Woodwork typing information for a DataFrame with a partial schema.

WoodworkTableAccessor.loc

Access a group of rows by label(s) or a boolean array.

WoodworkTableAccessor.logical_types

A dictionary containing logical types for each column

WoodworkTableAccessor.metadata

Metadata of the DataFrame

WoodworkTableAccessor.mutual_information([...])

Calculates mutual information between all pairs of columns in the DataFrame that support mutual information.

WoodworkTableAccessor.mutual_information_dict([...])

Calculates mutual information between all pairs of columns in the DataFrame that support mutual information.

WoodworkTableAccessor.name

Name of the DataFrame

WoodworkTableAccessor.pearson_correlation([...])

Calculates Pearson correlation coefficient between all pairs of columns in the DataFrame that support correlation.

WoodworkTableAccessor.pearson_correlation_dict([...])

Calculates Pearson correlation coefficient between all pairs of columns in the DataFrame that support correlation.

WoodworkTableAccessor.physical_types

A dictionary containing physical types for each column

WoodworkTableAccessor.pop(column_name)

Return a Series with Woodwork typing information and remove it from the DataFrame.

WoodworkTableAccessor.remove_semantic_tags(...)

Remove the semantic tags for any column names in the provided semantic_tags dictionary, updating the Woodwork typing information.

WoodworkTableAccessor.rename(columns[, inplace])

Renames columns in a DataFrame, maintaining Woodwork typing information.

WoodworkTableAccessor.reset_semantic_tags([...])

Reset the semantic tags for the specified columns to the default values.

WoodworkTableAccessor.schema

A copy of the Woodwork typing information for the DataFrame.

WoodworkTableAccessor.select([include, ...])

Create a DataFrame with Woodwork typing information initialized that includes only columns whose Logical Type and semantic tags match conditions specified in the list of types and tags to include or exclude.

WoodworkTableAccessor.semantic_tags

A dictionary containing semantic tags for each column

WoodworkTableAccessor.set_index(new_index)

Sets the index column of the DataFrame.

WoodworkTableAccessor.set_time_index(...)

Set the time index.

WoodworkTableAccessor.set_types([...])

Update the logical type and semantic tags for any column names in the provided types dictionaries, updating the Woodwork typing information for the DataFrame.

WoodworkTableAccessor.spearman_correlation([...])

Calculates Spearman correlation coefficient between all pairs of columns in the DataFrame that support correlation.

WoodworkTableAccessor.spearman_correlation_dict([...])

Calculates Spearman correlation coefficient between all pairs of columns in the DataFrame that support correlation.

WoodworkTableAccessor.time_index

The time index column for the table

WoodworkTableAccessor.to_disk(path[, ...])

Write Woodwork table to disk in the format specified by format, location specified by path.

WoodworkTableAccessor.to_dictionary()

Get a dictionary representation of the Woodwork typing information.

WoodworkTableAccessor.types

DataFrame containing the physical dtypes, logical types and semantic tags for the schema.

WoodworkTableAccessor.use_standard_tags

A dictionary containing the use_standard_tags setting for each column in the table

WoodworkTableAccessor.validate_logical_types([...])

Validates the dataframe based on the logical types.

WoodworkTableAccessor.value_counts([...])

Returns a list of dictionaries with counts for the most frequent values in each column (only

WoodworkColumnAccessor#

WoodworkColumnAccessor(series)

WoodworkColumnAccessor.add_semantic_tags(...)

Add the specified semantic tags to the set of tags.

WoodworkColumnAccessor.box_plot_dict([...])

Gets the information necessary to create a box and whisker plot with outliers for a numeric column using the IQR method.

WoodworkColumnAccessor.description

The description of the series

WoodworkColumnAccessor.origin

The origin of the series

WoodworkColumnAccessor.iloc

Integer-location based indexing for selection by position.

WoodworkColumnAccessor.init([logical_type, ...])

Initializes Woodwork typing information for a Series.

WoodworkColumnAccessor.loc

Access a group of rows by label(s) or a boolean array.

WoodworkColumnAccessor.logical_type

The logical type of the series

WoodworkColumnAccessor.metadata

The metadata of the series

WoodworkColumnAccessor.nullable

Whether the column can contain null values.

WoodworkColumnAccessor.remove_semantic_tags(...)

Removes specified semantic tags from the current tags.

WoodworkColumnAccessor.reset_semantic_tags()

Reset the semantic tags to the default values.

WoodworkColumnAccessor.semantic_tags

The semantic tags assigned to the series

WoodworkColumnAccessor.set_logical_type(...)

Update the logical type for the series, clearing any previously set semantic tags, and returning a new series with Woodwork initialized.

WoodworkColumnAccessor.set_semantic_tags(...)

Replace current semantic tags with new values.

WoodworkColumnAccessor.use_standard_tags

WoodworkColumnAccessor.validate_logical_type([...])

Validates series data based on the logical type.

TableSchema#

TableSchema(column_names, logical_types[, ...])

TableSchema.add_semantic_tags(semantic_tags)

Adds specified semantic tags to columns, updating the Woodwork typing information.

TableSchema.index

The index column for the table

TableSchema.get_subset_schema(subset_cols)

Creates a new TableSchema with specified columns, retaining typing information.

TableSchema.logical_types

A dictionary containing logical types for each column

TableSchema.metadata

Metadata of the table

TableSchema.rename(columns)

Renames columns in a TableSchema

TableSchema.remove_semantic_tags(semantic_tags)

Remove the semantic tags for any column names in the provided semantic_tags dictionary, updating the Woodwork typing information.

TableSchema.reset_semantic_tags([columns, ...])

Reset the semantic tags for the specified columns to the default values.

TableSchema.name

Name of schema

TableSchema.semantic_tags

A dictionary containing semantic tags for each column

TableSchema.set_index(new_index[, validate])

Sets the index.

TableSchema.set_time_index(new_time_index[, ...])

Set the time index.

TableSchema.set_types([logical_types, ...])

Update the logical type and semantic tags for any column names in the provided types dictionaries, updating the TableSchema at those columns.

TableSchema.time_index

The time index column for the table

TableSchema.types

DataFrame containing the physical dtypes, logical types and semantic tags for the TableSchema.

TableSchema.use_standard_tags

ColumnSchema#

ColumnSchema([logical_type, semantic_tags, ...])

ColumnSchema.custom_tags

The custom semantic tag(s) specified for the column.

ColumnSchema.description

Description of the column

ColumnSchema.origin

Origin of the column

ColumnSchema.is_boolean

Whether the ColumnSchema is a Boolean column

ColumnSchema.is_categorical

Whether the ColumnSchema is categorical in nature

ColumnSchema.is_datetime

Whether the ColumnSchema is a Datetime column

ColumnSchema.is_numeric

Whether the ColumnSchema is numeric in nature

ColumnSchema.metadata

Metadata of the column

Serialization#

typing_info_to_dict(dataframe)

Creates the description for a Woodwork table, including typing information for each column and loading information.

Deserialization#

from_disk(path[, filename, ...])

Convenience function to call read_woodwork_table.

read_woodwork_table(path[, filename, ...])

Read Woodwork table from disk, S3 path, or URL.

Logical Types#

Address()

Represents Logical Types that contain address values.

Age()

Represents Logical Types that contain whole numbers indicating a person's age.

AgeFractional()

Represents Logical Types that contain non-negative floating point numbers indicating a person's age.

AgeNullable()

Represents Logical Types that contain whole numbers indicating a person's age.

Boolean([cast_nulls_as])

Represents Logical Types that contain binary values indicating true/false.

BooleanNullable()

Represents Logical Types that contain binary values indicating true/false.

Categorical([encoding])

Represents Logical Types that contain unordered discrete values that fall into one of a set of possible values.

CountryCode()

Represents Logical Types that use the ISO-3166 standard country code to represent countries.

CurrencyCode()

Represents Logical Types that use the ISO-4217 international standard currency code to represent currencies.

Datetime([datetime_format, timezone])

Represents Logical Types that contain date and time information.

Double()

Represents Logical Types that contain positive and negative numbers, some of which include a fractional component.

EmailAddress()

Represents Logical Types that contain email address values.

Filepath()

Represents Logical Types that specify locations of directories and files in a file system.

Integer()

Represents Logical Types that contain positive and negative numbers without a fractional component, including zero (0).

IntegerNullable()

Represents Logical Types that contain positive and negative numbers without a fractional component, including zero (0).

IPAddress()

Represents Logical Types that contain IP addresses, including both IPv4 and IPv6 addresses.

LatLong()

Represents Logical Types that contain latitude and longitude values in decimal degrees.

NaturalLanguage()

Represents Logical Types that contain text or characters representing natural human language.

Ordinal([order])

Represents Logical Types that contain ordered discrete values.

PersonFullName()

Represents Logical Types that may contain first, middle and last names, including honorifics and suffixes.

PhoneNumber()

Represents Logical Types that contain numeric digits and characters representing a phone number.

PostalCode()

Represents Logical Types that contain a series of postal codes for representing a group of addresses.

SubRegionCode()

Represents Logical Types that use the ISO-3166 standard sub-region code to represent a portion of a larger geographic region.

Timedelta()

Represents Logical Types that contain values specifying a duration of time.

Unknown()

Represents Logical Types that cannot be inferred as a specific Logical Type.

URL()

Represents Logical Types that contain URLs, which may include protocol, hostname and file name.

TypeSystem#

TypeSystem([inference_functions, ...])

TypeSystem.add_type(logical_type[, ...])

Add a new LogicalType to the TypeSystem, optionally specifying the corresponding inference function and a parent type.

TypeSystem.infer_logical_type(series)

Infer the logical type for the given series.

TypeSystem.remove_type(logical_type[, treatment])

Remove a logical type from the TypeSystem.

TypeSystem.reset_defaults()

Reset type system to the default settings that were specified at initialization.

TypeSystem.update_inference_function(...)

Update the inference function for the specified LogicalType.

TypeSystem.update_relationship(logical_type, ...)

Add or update a relationship.

Utils#

Type Utils#

list_logical_types

Returns a dataframe describing all of the available Logical Types.

list_semantic_tags

Returns a dataframe describing all of the common semantic tags.

General Utils#

concat_columns

Concatenate Woodwork objects along the columns axis.

get_valid_mi_types

Generate a list of LogicalTypes that are valid for calculating mutual information.

get_valid_pearson_types

Generate a list of LogicalTypes that are valid for calculating Pearson correlation.

get_valid_spearman_types

Generate a list of LogicalTypes that are valid for calculating Spearman correlation.

read_file

Read data from the specified file and return a DataFrame with initialized Woodwork typing information.

get_invalid_schema_message

Return a message indicating the reason that the provided schema cannot be used to initialize Woodwork on the dataframe.

init_series

Initializes Woodwork typing information for a series, numpy.ndarray or pd.api.extensions.

is_schema_valid

Check if a schema is valid for initializing Woodwork on a dataframe.

Statistics Utils#

infer_frequency

Infer the frequency of a given Pandas Datetime Series.

Demo Data#

load_retail([id, nrows, init_woodwork])

Load a demo retail dataset into a DataFrame, optionally initializing Woodwork's typing information.