Home
Woodwork is a library that helps with data typing of 2-dimensional tabular data structures.
It provides a special namespace on your DataFrame, `ww`, which contains the physical, logical, and semantic data types. It can be used with Featuretools, EvalML, and general machine learning applications where logical and semantic typing information is important.
Woodwork provides simple interfaces for adding and updating logical and semantic typing information, as well as selecting data columns based on the types.
Quick Start
Below is an example of using Woodwork to automatically infer the Logical Types for a DataFrame and select columns with specific types.
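```python
import woodwork as ww

# Load 100 rows of the retail demo data without initializing Woodwork
df = ww.demo.load_retail(nrows=100, init_woodwork=False)

# Infer logical types and standard semantic tags for every column
df.ww.init(name="retail")
df.ww
```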
| Column | Physical Type | Logical Type | Semantic Tag(s) |
|---|---|---|---|
| order_product_id | int64 | Integer | ['numeric'] |
| order_id | int64 | Integer | ['numeric'] |
| product_id | string | Unknown | [] |
| description | string | NaturalLanguage | [] |
| quantity | int64 | Integer | ['numeric'] |
| order_date | datetime64[ns] | Datetime | [] |
| unit_price | float64 | Double | ['numeric'] |
| customer_name | category | Categorical | ['category'] |
| country | category | Categorical | ['category'] |
| total | float64 | Double | ['numeric'] |
| cancelled | bool | Boolean | [] |
```python
# Keep columns tagged 'numeric' plus columns whose logical type is Boolean
filtered_df = df.ww.select(include=['numeric', 'Boolean'])
filtered_df.head(5)
```
| | order_product_id | order_id | quantity | unit_price | total | cancelled |
|---|---|---|---|---|---|---|
| 0 | 0 | 536365 | 6 | 4.2075 | 25.245 | False |
| 1 | 1 | 536365 | 6 | 5.5935 | 33.561 | False |
| 2 | 2 | 536365 | 8 | 4.5375 | 36.300 | False |
| 3 | 3 | 536365 | 6 | 5.5935 | 33.561 | False |
| 4 | 4 | 536365 | 6 | 5.5935 | 33.561 | False |
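The selection rule used in the Quick Start can be mimicked with a toy model. A column is kept when its logical type name or any of its semantic tags appears in `include`. This is a stdlib-only sketch for intuition, not Woodwork's implementation. The `ToyColumn` class and `toy_select` function are hypothetical names, and the schema values are copied from the types table printed by `df.ww`.

```python
from dataclasses import dataclass, field


@dataclass
class ToyColumn:
    """Hypothetical stand-in for one row of the `df.ww` types table."""
    physical_type: str
    logical_type: str
    semantic_tags: set = field(default_factory=set)


# Schema copied from the Quick Start types table (dict order = column order)
schema = {
    "order_product_id": ToyColumn("int64", "Integer", {"numeric"}),
    "order_id": ToyColumn("int64", "Integer", {"numeric"}),
    "product_id": ToyColumn("string", "Unknown"),
    "description": ToyColumn("string", "NaturalLanguage"),
    "quantity": ToyColumn("int64", "Integer", {"numeric"}),
    "order_date": ToyColumn("datetime64[ns]", "Datetime"),
    "unit_price": ToyColumn("float64", "Double", {"numeric"}),
    "customer_name": ToyColumn("category", "Categorical", {"category"}),
    "country": ToyColumn("category", "Categorical", {"category"}),
    "total": ToyColumn("float64", "Double", {"numeric"}),
    "cancelled": ToyColumn("bool", "Boolean"),
}


def toy_select(schema, include):
    """Return column names, in schema order, whose logical type or any tag is in `include`."""
    wanted = set(include)
    return [
        name
        for name, col in schema.items()
        if col.logical_type in wanted or col.semantic_tags & wanted
    ]


selected = toy_select(schema, ["numeric", "Boolean"])
```

With the table's schema, `selected` holds the same six columns as `filtered_df` above. The `Boolean` entry matches `cancelled` by logical type, because that column carries no semantic tags.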