Release Notes
- v0.2.0 April 20, 2021
Warning
This Woodwork release does not support Python 3.6
- Enhancements
Add validation control to WoodworkTableAccessor (#736)
Store make_index value on WoodworkTableAccessor (#780)
Add optional exclude parameter to WoodworkTableAccessor select method (#783)
Add validation control to deserialize.read_woodwork_table and ww.read_csv (#788)
Add WoodworkColumnAccessor.schema and handle copying column schema (#799)
Allow initializing a WoodworkColumnAccessor with a ColumnSchema (#814)
Add __repr__ to ColumnSchema (#817)
Add BooleanNullable and IntegerNullable logical types (#830)
Add validation control to WoodworkColumnAccessor (#833)
- Changes
Rename FullName logical type to PersonFullName (#740)
Rename ZIPCode logical type to PostalCode (#741)
Update minimum scikit-learn version to 0.22 (#763)
Drop support for Python version 3.6 (#768)
Remove ColumnNameMismatchWarning (#777)
get_column_dict does not use standard tags by default (#782)
Make logical_type and name params to _get_column_dict optional (#786)
Rename Schema object and files to match new table-column schema structure (#789)
Store column typing information in a ColumnSchema object instead of a dictionary (#791)
TableSchema does not use standard tags by default (#806)
Store use_standard_tags on the ColumnSchema instead of the TableSchema (#809)
Move functions in column_schema.py to be methods on ColumnSchema (#829)
- Testing Changes
Add unit tests against minimum dependencies for python 3.6 on PRs and main (#743, #753, #763)
Update spark config for test fixtures (#787)
Separate latest unit tests into pandas, dask, koalas (#813)
Update latest dependency checker to generate separate core, koalas, and dask dependencies (#815, #825)
Ignore latest dependency branch when checking for updates to the release notes (#827)
Change from GitHub PAT to auto generated GitHub Token for dependency checker (#831)
Expand ColumnSchema semantic tag testing coverage and null logical_type testing coverage (#832)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
- Breaking Changes
The ZIPCode logical type has been renamed to PostalCode
The FullName logical type has been renamed to PersonFullName
The Schema object has been renamed to TableSchema
With the ColumnSchema object, typing information for a column can no longer be accessed with df.ww.columns[col_name]['logical_type']. Instead use df.ww.columns[col_name].logical_type.
The Boolean and Integer logical types will no longer work with data that contains null values. The new BooleanNullable and IntegerNullable logical types should be used if null values are present.
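Code that stores logical type names as strings may need updating for the renames and the new nullable types listed above. The following is a minimal sketch of one way to do that, not part of Woodwork: the helper upgrade_logical_type_name, the two lookup dictionaries, and the sample column names are all hypothetical, and only the old and new type names come from these notes.

```python
# Hypothetical migration helper; none of these names are part of Woodwork.
# The mappings themselves come from the v0.2.0 breaking changes listed above.
RENAMED_LOGICAL_TYPES = {
    "ZIPCode": "PostalCode",
    "FullName": "PersonFullName",
}
NULLABLE_LOGICAL_TYPES = {
    "Boolean": "BooleanNullable",
    "Integer": "IntegerNullable",
}


def upgrade_logical_type_name(name, has_nulls=False):
    """Translate a pre-v0.2.0 logical type name to its v0.2.0 equivalent."""
    # Apply the rename first; names that were not renamed pass through unchanged.
    name = RENAMED_LOGICAL_TYPES.get(name, name)
    # Boolean and Integer no longer accept nulls, so switch to the nullable variant.
    if has_nulls:
        name = NULLABLE_LOGICAL_TYPES.get(name, name)
    return name


# Sample (made-up) column-to-type mapping written against an older release.
old_types = {"zip": "ZIPCode", "customer": "FullName", "age": "Integer"}
columns_with_nulls = {"age"}

new_types = {
    column: upgrade_logical_type_name(ltype, has_nulls=column in columns_with_nulls)
    for column, ltype in old_types.items()
}
print(new_types)
```

The result is a plain dictionary of strings, so it can be reviewed before it is handed to whatever code sets the logical types.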
- v0.1.0 March 22, 2021
- Enhancements
Implement Schema and Accessor API (#497)
Add Schema class that holds typing info (#499)
Add WoodworkTableAccessor class that performs type inference and stores Schema (#514)
Allow initializing Accessor schema with a valid Schema object (#522)
Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (#534)
Add ability to call pandas methods from Accessor (#538, #589)
Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (#553)
Add ability to load demo retail dataset with a Woodwork Accessor (#556)
Add select to WoodworkTableAccessor (#548)
Add mutual_information to WoodworkTableAccessor (#571)
Add WoodworkColumnAccessor class (#562)
Add semantic tag update methods to column accessor (#573)
Add describe and describe_dict to WoodworkTableAccessor (#579)
Add init_series util function for initializing a series with dtype change (#581)
Add set_logical_type method to WoodworkColumnAccessor (#590)
Add semantic tag update methods to table schema (#591)
Add warning if additional parameters are passed along with schema (#593)
Better warning when accessing column properties before init (#596)
Update column accessor to work with LatLong columns (#598)
Add set_index to WoodworkTableAccessor (#603)
Implement loc and iloc for WoodworkColumnAccessor (#613)
Add set_time_index to WoodworkTableAccessor (#612)
Implement loc and iloc for WoodworkTableAccessor (#618)
Allow updating logical types with set_types and make relevant DataFrame changes (#619)
Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (#624)
Add DaskColumnAccessor (#625)
Allow deserialization from csv, pickle, and parquet to Woodwork table (#626)
Add value_counts to WoodworkTableAccessor (#632)
Add KoalasColumnAccessor (#634)
Add pop to WoodworkTableAccessor (#636)
Add drop to WoodworkTableAccessor (#640)
Add rename to WoodworkTableAccessor (#646)
Add DaskTableAccessor (#648)
Add Schema properties to WoodworkTableAccessor (#651)
Add KoalasTableAccessor (#652)
Adds __getitem__ to WoodworkTableAccessor (#633)
Update Koalas min version and add support for more new pandas dtypes with Koalas (#678)
Adds __setitem__ to WoodworkTableAccessor (#669)
- Changes
Move mutual information logic to statistics utils file (#584)
Bump min Koalas version to 1.4.0 (#638)
Preserve pandas underlying index when not creating a Woodwork index (#664)
Restrict Koalas version to <1.7.0 due to breaking changes (#674)
Clean up dtype usage across Woodwork (#682)
Improve error when calling accessor properties or methods before init (#683)
Remove dtype from Schema dictionary (#685)
Add include_index param and allow unique columns in Accessor mutual information (#699)
Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (#700)
Remove DataTable and DataColumn classes to migrate towards the accessor approach (#713)
Change sample_series dtype to not need conversion and remove convert_series util (#720)
Rename Accessor methods since DataTable has been removed (#723)
- Documentation Changes
Update README.md and Get Started guide to use accessor (#655, #717)
Update Understanding Types and Tags guide to use accessor (#657)
Update docstrings and API Reference page (#660)
Update statistical insights guide to use accessor (#693)
Update Customizing Type Inference guide to use accessor (#696)
Update Dask and Koalas guide to use accessor (#701)
Update index notebook and install guide to use accessor (#715)
Add section to documentation about schema validity (#729)
Update README.md and Get Started guide to use pd.read_csv (#730)
Make small fixes to documentation formatting (#731)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd
- Breaking Changes
The DataTable and DataColumn classes have been removed and replaced by new WoodworkTableAccessor and WoodworkColumnAccessor classes which are used through the ww namespace available on DataFrames after importing Woodwork.
- v0.0.11 March 15, 2021
- Documentation Changes
Update to remove warning message from statistical insights guide (#690)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
- v0.0.10 February 25, 2021
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey
- v0.0.9 February 5, 2021
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
- v0.0.8 January 25, 2021
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
- v0.0.7 December 14, 2020
- Changes
Update links to use alteryx org Github URL (#423)
Support column names of any type allowed by the underlying DataFrame (#442)
Use object dtype for LatLong columns for easy access to latitude and longitude values (#414)
Restrict dask version to prevent 2020.12.0 release from being installed (#453)
Lower minimum requirement for numpy to 1.15.4, and set pandas minimum requirement 1.1.1 (#459)
- Testing Changes
Fix missing test coverage (#436)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd
- v0.0.6 November 30, 2020
- Enhancements
Add support for creating DataTable from Koalas DataFrame (#327)
Add ability to initialize DataTable with numpy array (#367)
Add describe_dict method to DataTable (#405)
Add mutual_information_dict method to DataTable (#404)
Add metadata to DataTable for user-defined metadata (#392)
Add update_dataframe method to DataTable to update underlying DataFrame (#407)
Sort dataframe if time_index is specified, bypass sorting with already_sorted parameter. (#410)
Add description attribute to DataColumn (#416)
Implement DataColumn.__len__ and DataTable.__len__ (#415)
- Changes
Lower moto test requirement for serialization/deserialization (#376)
Make Koalas an optional dependency installable with woodwork[koalas] (#378)
Remove WholeNumber LogicalType from Woodwork (#380)
Updates to LogicalTypes to support Koalas 1.4.0 (#393)
Replace set_logical_types and set_semantic_tags with just set_types (#379)
Remove copy_dataframe parameter from DataTable initialization (#398)
Implement DataTable.__sizeof__ to return size of the underlying dataframe (#401)
Include Datetime columns in mutual info calculation (#399)
Maintain column order on DataTable operations (#406)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
- Breaking Changes
The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.
The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.
WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.
The DataTable.get_mutual_information was renamed to DataTable.mutual_information.
The copy_dataframe parameter was removed from DataTable initialization.
- v0.0.5 November 11, 2020
- Enhancements
Add __eq__ to DataTable and DataColumn and update LogicalType equality (#318)
Add value_counts() method to DataTable (#342)
Support serialization and deserialization of DataTables via csv, pickle, or parquet (#293)
Add shape property to DataTable and DataColumn (#358)
Add iloc method to DataTable and DataColumn (#365)
Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (#363)
Add rename method to DataTable (#367)
- Fixes
Catch non numeric time index at validation (#332)
- Changes
Support logical type inference from a Dask DataFrame (#248)
Fix validation checks and make_index to work with Dask DataFrames (#260)
Skip validation of Ordinal order values for Dask DataFrames (#270)
Improve support for datetimes with Dask input (#286)
Update DataTable.describe to work with Dask input (#296)
Update DataTable.get_mutual_information to work with Dask input (#300)
Modify to_pandas function to return DataFrame with correct index (#281)
Rename DataColumn.to_pandas method to DataColumn.to_series (#311)
Rename DataTable.to_pandas method to DataTable.to_dataframe (#319)
Remove UserWarning when no matching columns found (#325)
Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (#338)
Allow pandas ExtensionArrays as inputs to DataColumn (#343)
Move warnings to a separate exceptions file and call via UserWarning subclasses (#348)
Make Dask an optional dependency installable with woodwork[dask] (#357)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
- Breaking Changes
The DataColumn.to_pandas method was renamed to DataColumn.to_series.
The DataTable.to_pandas method was renamed to DataTable.to_dataframe.
copy is no longer a parameter of DataTable.to_dataframe or DataColumn.to_series.
- v0.0.4 October 21, 2020
- Enhancements
Add optional include parameter for DataTable.describe() to filter results (#228)
Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238)
Add support for setting ranking order on columns with Ordinal logical type (#240)
Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244)
Add support for numeric time index on DataTable (#267)
Add pop method to DataTable (#289)
Add entry point to setup.py to run CLI commands (#285)
- Fixes
Allow numeric datetime time indices (#282)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
- v0.0.3 October 9, 2020
- Enhancements
Implement setitem on DataTable to create/overwrite an existing DataColumn (#165)
Add to_pandas method to DataColumn to access the underlying series (#169)
Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172)
Add describe method to DataTable to generate statistics for the underlying data (#181)
Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189)
Add get_mutual_information method to DataTable to generate mutual information between columns (#203)
Add read_csv function to create DataTable directly from CSV file (#222)
- Changes
Remove unnecessary add_standard_tags attribute from DataTable (#171)
Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196)
Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205)
Update various DataTable methods to return new objects rather than modifying in place (#210)
Move datetime_format to Datetime LogicalType (#216)
Do not calculate mutual info with index column in DataTable.get_mutual_information (#221)
Move setting of underlying physical types from DataTable to DataColumn (#233)
- Documentation Changes
Remove unused code from sphinx conf.py, update with Github URL (#160, #163)
Update README and docs with new Woodwork logo, with better code snippets (#161, #159)
Add DataTable and DataColumn to API Reference (#162)
Add docstrings to LogicalType classes (#168)
Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173)
Update contributing.md, release.md with all instructions (#176)
Add section for setting index and time index to start notebook (#179)
Rename changelog to Release Notes (#193)
Add section for standard tags to start notebook (#188)
Add Understanding Types and Tags user guide (#201)
Add missing docstring to list_logical_types (#202)
Add Woodwork Global Configuration Options guide (#215)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
- v0.0.2 September 28, 2020
- Fixes
Fix formatting issue when printing global config variables (#138)
- Documentation Changes
Add working code example to README and create Using Woodwork page (#103)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
- v0.1.0 September 24, 2020
Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135)
Add global config options and add datetime_format option for type inference (#134)
Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133)
Add DataTable.ltypes property to return series of logical types (#131)
Add ability to create new datatable from specified columns with dt[[columns]] (#127)
Handle setting and tagging of index and time index columns (#125)
Add combined tag and ltype selection (#124)
Add changelog, and update changelog check to CI (#123)
Implement reset_semantic_tags (#118)
Implement DataTable getitem (#119)
Add remove_semantic_tags method (#117)
Add semantic tag selection (#106)
Add github action, rename to woodwork (#113)
Add license to setup.py (#112)
Reset semantic tags on logical type change (#107)
Add standard numeric and category tags (#100)
Change semantic_types to semantic_tags, a set of strings (#100)
Update dataframe dtypes based on logical types (#94)
Add select_logical_types to DataTable (#96)
Add pygments to dev-requirements.txt (#97)
Add replacing None with np.nan in DataTable init (#87)
Refactor DataColumn to make semantic_types and logical_type private (#86)
Add pandas_dtype to each Logical Type, and remove dtype attribute on DataColumn (#85)
Add set_semantic_types methods on both DataTable and DataColumn (#75)
Support passing camel case or snake case strings for setting logical types (#74)
Improve flexibility when setting semantic types (#72)
Add Whole Number Inference of Logical Types (#66)
Add dtypes property to DataTables and repr for DataColumn (#61)
Allow specification of semantic types during DataTable creation (#69)
Implements set_logical_types on DataTable (#65)
Add init files to tests to fix code coverage (#60)
Add AutoAssign bot (#59)
Add logical types validation in DataTables (#49)
Fix working_directory in CI (#57)
Add infer_logical_types for DataColumn (#45)
Add code coverage (#51)
Improve and refactor the validation checks during initialization of a DataTable (#40)
Add dataframe attribute to DataTable (#39)
Update README with minor usage details (#37)
Add License (#34)
Rename from datatables to datatables (#4)
Add Logical Types, DataTable, DataColumn (#3)
Add Makefile, setup.py, requirements.txt (#2)
Initial Release (#1)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd