Woodwork

Woodwork is a library that helps with data typing of 2-dimensional tabular data structures.

It provides a DataTable object that stores the physical type, logical type, and semantic tags of each column. It can be used with Featuretools, EvalML, and other machine learning applications where logical and semantic typing information is important.

Woodwork provides simple interfaces for adding and updating logical and semantic typing information, as well as for selecting columns based on their types.
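To make the three layers of typing concrete, here is a minimal standard-library sketch of the idea. It is a toy model and not Woodwork's implementation: `ColumnTypes`, `infer_column`, `select`, and the inference rules are hypothetical names and assumptions made for illustration, and the sample values are made up. It only shows how a physical type, a logical type, and a set of semantic tags can be tracked per column, updated, and used to select columns.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnTypes:
    """Typing information for one column (hypothetical toy structure)."""
    physical: str                               # how values are stored, e.g. "Int64"
    logical: str                                # what the values mean, e.g. "WholeNumber"
    semantic: set = field(default_factory=set)  # extra tags, e.g. {"numeric"}


def infer_column(values):
    """Guess typing information from Python values.

    These rules are a deliberate simplification, not Woodwork's inference.
    """
    # bool is checked first because bool is a subclass of int in Python.
    if all(isinstance(v, bool) for v in values):
        return ColumnTypes("boolean", "Boolean")
    if all(isinstance(v, int) for v in values):
        return ColumnTypes("Int64", "WholeNumber", {"numeric"})
    if all(isinstance(v, float) for v in values):
        return ColumnTypes("float64", "Double", {"numeric"})
    return ColumnTypes("string", "NaturalLanguage")


def select(types, include):
    """Return column names whose logical type or semantic tags match `include`."""
    wanted = set(include)
    return [
        name
        for name, t in types.items()
        if t.logical in wanted or t.semantic & wanted
    ]


# Made-up sample table stored as a dict of column name -> list of values.
table = {
    "order_id": [536365, 536365, 536366],
    "unit_price": [4.2075, 5.5935, 4.5375],
    "description": ["red mug", "white lantern", "blue mug"],
    "cancelled": [False, False, True],
}

# Infer typing information for every column.
types = {name: infer_column(values) for name, values in table.items()}

# Update typing information: re-type one column and add a custom tag to another.
types["order_id"] = ColumnTypes("category", "Categorical", {"category"})
types["description"].semantic.add("product_text")

# Select by a semantic tag ("numeric") and by a logical type ("Boolean").
selected = select(types, ["numeric", "Boolean"])
print(selected)
```

In this sketch, selection accepts semantic tags and logical type names in the same list, which mirrors the mixed `['numeric', 'Boolean']` argument used in the Quick Start below.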

Quick Start

The example below uses a Woodwork DataTable to automatically infer the Logical Types of the columns in a data structure and then select columns with specific types.

[1]:
import woodwork as ww

data = ww.demo.load_retail(nrows=100, return_dataframe=True)

dt = ww.DataTable(data, name="retail")
dt.types
[1]:
Data Column       Physical Type   Logical Type     Semantic Tag(s)
order_product_id  Int64           WholeNumber      {numeric}
order_id          Int64           WholeNumber      {numeric}
product_id        category        Categorical      {category}
description       string          NaturalLanguage  {}
quantity          Int64           WholeNumber      {numeric}
order_date        datetime64[ns]  Datetime         {}
unit_price        float64         Double           {numeric}
customer_name     string          NaturalLanguage  {}
country           string          NaturalLanguage  {}
total             float64         Double           {numeric}
cancelled         boolean         Boolean          {}
[2]:
filtered_dt = dt.select(include=['numeric', 'Boolean'])
filtered_dt.to_dataframe().head(5)
[2]:
   total   unit_price  order_id  cancelled  quantity  order_product_id
0  25.245  4.2075      536365    False      6         0
1  33.561  5.5935      536365    False      6         1
2  36.300  4.5375      536365    False      8         2
3  33.561  5.5935      536365    False      6         3
4  33.561  5.5935      536365    False      6         4