Enhancements
- Add support for automatically inferring the URL and IPAddress logical types (#1122, #1124)
- Add get_valid_mi_columns method to list columns that have valid logical types for mutual information calculation (#1129)
- Add attribute to check if column has a nullable logical type (#1127)
Changes
- Update get_invalid_schema_message to improve performance (#1132)
Documentation Changes
- Fix typo in the “Get Started” documentation (#1126)
- Clean up the logical types guide (#1134)
Thanks to the following people for contributing to this release: @ajaypallekonda, @davesque, @jeff-hernandez, @thehomebrewnerd

Fixes
- Validate schema’s index if being used in partial schema init (#1115)
- Allow falsy index, time index, and name values to be set along with partial schema at init (#1115)
Thanks to the following people for contributing to this release: @tamargrey

Enhancements
- Add 'passthrough' and 'ignore' to tags in list_semantic_tags (#1094)
- Add ability to initialize with a partial table schema (#1100)
- Apply ordering specified by the Ordinal logical type to underlying series (#1097)
- Add AgeFractional logical type (#1112)
Thanks to the following people for contributing to this release: @davesque, @jeff-hernandez, @tamargrey, @tuethan1999

Breaking Changes

#1100: The behavior of init has changed. A full schema is a schema that contains all of the columns of the dataframe it describes, whereas a partial schema contains only a subset. A full schema also requires that the schema be valid without making any changes to the DataFrame. Previously, only a full schema was permitted by the init method, so passing a partial schema would raise an error, and any parameters like logical_types were ignored if a schema was passed in. Now, passing a partial schema to the init method calls the init_with_partial_schema method instead of raising an error. Information from keyword arguments will override information from the partial schema. For example, if column a has the Integer logical type in the partial schema, it is possible to use the logical_types argument to re-infer its logical type by passing {'a': None} or to force a type by passing {'a': Double}. These changes make Woodwork init less restrictive. If no type inference should take place and no changes should be required of the DataFrame at initialization, init_with_full_schema should be used instead of init. init_with_full_schema maintains the same functionality as the old init when a schema was passed.

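To illustrate, here is a minimal sketch of the new behavior. The way the partial schema is obtained (via select) and the exact keyword names are assumptions based on the description above, not an exact reproduction of the library's own examples:

    import pandas as pd
    import woodwork as ww
    from woodwork.logical_types import Double

    df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
    df.ww.init()

    # A schema describing only a subset of the columns is a partial schema.
    partial_schema = df.ww.select(include='numeric').ww.schema

    # Columns covered by the partial schema keep its typing, remaining columns
    # are inferred, and keyword arguments override the schema (forcing 'a' to Double).
    df2 = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
    df2.ww.init(schema=partial_schema, logical_types={'a': Double})
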
Fixes
- Fix bug in _infer_datetime_format with all np.nan input (#1089)
Changes
- The criteria for categorical type inference have changed (#1065)
- The meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed (#1065)
- Make sampling for type inference more consistent (#1083)
- Accessor logic checking if Woodwork has been initialized moved to decorator (#1093)
Documentation Changes
- Fix some release notes that ended up under the wrong release (#1082)
- Add BooleanNullable and IntegerNullable types to the docs (#1085)
- Add guide for saving and loading Woodwork DataFrames (#1066)
- Add in-depth guide on logical types and semantic tags (#1086)
Testing Changes
- Add additional reviewers to minimum and latest dependency checkers (#1070, #1073, #1077)
- Update the sample_df fixture to have more logical_type coverage (#1058)
Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

#1065: The criteria for categorical type inference have changed. Relatedly, the meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed. Now, a categorical match is signaled when a series either has the “categorical” pandas dtype or when the ratio of the unique value count (NaN excluded) to the total value count (NaN also excluded) is less than or equal to some fraction. The value used for this fraction is set by the categorical_threshold setting, which now has a default value of 0.2. If a fraction is set for the numeric_categorical_threshold setting, then series with either a float or integer dtype may be inferred as categorical by applying the same logic described above with the numeric_categorical_threshold fraction. Otherwise, the numeric_categorical_threshold setting defaults to None, which indicates that series with a numerical type should not be inferred as categorical. Users who have overridden either the categorical_threshold or numeric_categorical_threshold settings will need to adjust their settings accordingly.

#1083: The process of sampling series for logical type inference was updated to be more consistent. Previously, the initial sampling for inference differed depending on the collection type (pandas, dask, or koalas), and further randomized subsampling was performed in some cases during categorical inference and in every case during email inference, regardless of collection type. Overall, the way sampling was done was inconsistent and unpredictable. Now, the first 100,000 records of a column are sampled for logical type inference regardless of collection type, although only records from the first partition of a dask dataset will be used. Subsampling performed by the inference functions of individual types has been removed. As a result of these changes, inferred types may now be different, although in many cases they will be more correct.

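For reference, a brief sketch of adjusting these settings. The option names come from the text above; the set_option/reset_option helpers are assumed to behave like Woodwork's other config options:

    import woodwork as ww

    # Infer a numeric series as Categorical when its unique/total value ratio is <= 0.1.
    ww.config.set_option('numeric_categorical_threshold', 0.1)

    # Restore the default of None, i.e. never infer numeric series as categorical.
    ww.config.reset_option('numeric_categorical_threshold')

    # Adjust the fraction used for non-numeric categorical inference (default 0.2).
    ww.config.set_option('categorical_threshold', 0.1)
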
Enhancements
- Store inferred datetime format on Datetime logical type instance (#1025)
- Add support for automatically inferring the EmailAddress logical type (#1047)
- Add feature origin attribute to schema (#1056)
- Add ability to calculate outliers and the statistical info required for box and whisker plots to WoodworkColumnAccessor (#1048)
- Add ability to change config settings in a with block with ww.config.with_options (#1062)
Fixes
- Raise warning and remove tags when a user adds a column with index tags to a DataFrame (#1035)
Changes
- Entirely null columns are now inferred as the Unknown logical type (#1043)
- Add helper functions that check whether an object is a koalas/dask series or dataframe (#1055)
- TableAccessor.select method will now maintain dataframe column ordering in TableSchema columns (#1052)
Documentation Changes
- Add supported types to metadata docstring (#1049)
Thanks to the following people for contributing to this release: @davesque, @frances-h, @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd

Enhancements
- Add support for numpy array inputs to Woodwork (#1023)
- Add support for pandas.api.extensions.ExtensionArray inputs to Woodwork (#1026)
Fixes
- Add input validation to ww.init_series (#1015)
Changes
- Remove lines in LogicalType.transform that raise error if dtype conflicts (#1012)
- Add infer_datetime_format param to speed up to_datetime calls (#1016)
- The default logical type is now the Unknown type instead of the NaturalLanguage type (#992)
- Add pandas 1.3.0 compatibility (#987)
Thanks to the following people for contributing to this release: @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

The default logical type is now the Unknown type instead of the NaturalLanguage type. The global config natural_language_threshold has been renamed to categorical_threshold.

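A short, hedged illustration of the change; the exact inference result depends on the data and on the current config settings:

    import pandas as pd
    import woodwork as ww

    # A string column that matches no specific logical type now falls back to
    # Unknown rather than NaturalLanguage.
    series = ww.init_series(pd.Series([
        'Lorem ipsum dolor sit amet',
        'consectetur adipiscing elit',
        'sed do eiusmod tempor incididunt',
    ]))
    print(series.ww.logical_type)  # Unknown (illustrative)

    # The global config option was renamed; read and set it under the new name:
    ww.config.get_option('categorical_threshold')  # formerly natural_language_threshold
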
Enhancements
- Pass additional progress information in callback functions (#979)
- Add the ability to generate optional extra stats with DataFrame.ww.describe_dict (#988)
- Add option to read and write orc files (#997)
- Retain schema when calling series.ww.to_frame() (#1004)
Fixes
- Raise type conversion error in Datetime logical type (#1001)
- Try collections.abc to avoid deprecation warning (#1010)
Changes
- Remove make_index parameter from DataFrame.ww.init (#1000)
- Remove version restriction for dask requirements (#998)
Documentation Changes
- Add instructions for installing the update checker (#993)
- Disable pdf format with documentation build (#1002)
- Silence deprecation warnings in documentation build (#1008)
- Temporarily remove update checker to fix docs warnings (#1011)
Testing Changes
- Add env setting to update checker (#978, #994)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

- Progress callback function parameters have changed, and progress is now reported in the units specified by the unit of measurement parameter instead of as a percentage of the total. Progress callback functions are now expected to accept the following five parameters:
  - progress increment since last call
  - progress units complete so far
  - total units to complete
  - the progress unit of measurement
  - time elapsed since start of calculation
- DataFrame.ww.init no longer accepts the make_index parameter.

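A hedged sketch of a callback with this shape; the parameter names are illustrative rather than taken from the library, and the commented-out mutual_information call is an assumption about where such callbacks are accepted:

    def progress_callback(update, progress, total, unit, time_elapsed):
        """Example progress callback.

        update       - progress increment since the last call
        progress     - progress units completed so far
        total        - total units to complete
        unit         - the unit of measurement for progress
        time_elapsed - time elapsed since the start of the calculation
        """
        print(f"{progress}/{total} {unit} after {time_elapsed:.1f}s")

    # Hypothetical usage:
    # df.ww.mutual_information(callback=progress_callback)
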
Enhancements
- Add concat_columns util function to concatenate multiple Woodwork objects into one, retaining typing information (#932)
- Add option to pass progress callback function to mutual information functions (#958)
- Add optional automatic update checker (#959, #970)
Fixes
- Fix issue related to serialization/deserialization of data with whitespace and newline characters (#957)
- Update to allow initializing a ColumnSchema object with an Ordinal logical type without order values (#972)
Changes
- Change write_dataframe to only copy dataframe if it contains LatLong (#955)
Testing Changes
- Fix bug in test_list_logical_types_default (#954)
- Update minimum unit tests to run on all pull requests (#952)
- Pass token to authorize uploading of codecov reports (#969)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @tamargrey, @thehomebrewnerd

Enhancements
- Add option to return TableSchema instead of DataFrame from table accessor select method (#916)
- Add option to read and write arrow/feather files (#948)
- Add dropping and renaming columns inplace (#920)
- Add option to pass progress callback function to mutual information functions (#943)
Fixes
- Fix bug when setting table name and metadata through accessor (#942)
- Fix bug in which the dtype of category values was not restored properly on deserialization (#949)
Changes
- Add logical type method to transform data (#915)
Testing Changes
- Update when minimum unit tests will run to include minimum text files (#917)
- Create separate workflows for each CI job (#919)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @thehomebrewnerd, @tuethan1999

Warning

This Woodwork release uses a weak reference for maintaining a reference from the accessor to the DataFrame. Because of this, chaining a Woodwork call onto another call that creates a new DataFrame or Series object can be problematic.

Instead of calling pd.DataFrame({'id':[1, 2, 3]}).ww.init(), first store the DataFrame in a new variable and then initialize Woodwork:

    df = pd.DataFrame({'id':[1, 2, 3]})
    df.ww.init()

Enhancements
- Add deep parameter to Woodwork Accessor and Schema equality checks (#889)
- Add support for reading from parquet files to woodwork.read_file (#909)
Changes
- Remove command line functions for list logical and semantic tags (#891)
- Keep index and time index tags for single column when selecting from a table (#888)
- Update accessors to store weak reference to data (#894)
Documentation Changes
- Update nbsphinx version to fix docs build issue (#911, #913)
Testing Changes
- Use Minimum Dependency Generator GitHub Action and remove tools folder (#897)
- Move all latest and minimum dependencies into 1 folder (#912)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

Breaking Changes

The command line functions python -m woodwork list-logical-types and python -m woodwork list-semantic-tags no longer exist. Please call the underlying Python functions ww.list_logical_types() and ww.list_semantic_tags().

Enhancements
- Add is_schema_valid and get_invalid_schema_message functions for checking schema validity (#834)
- Add logical type for Age and AgeNullable (#849)
- Add logical type for Address (#858)
- Add generic to_disk function to save Woodwork schema and data (#872)
- Add generic read_file function to read file as Woodwork DataFrame (#878)
Fixes
- Raise error when a column is set as both the index and the time index (#859)
- Allow NaNs in index for schema validation check (#862)
- Fix bug where invalid casting to Boolean would not raise an error (#863)
Changes
- Consistently use ColumnNotPresentError for mismatches between user input and dataframe/schema columns (#837)
- Raise custom WoodworkNotInitError when accessing Woodwork attributes before initialization (#838)
- Remove check requiring Ordinal instance for initializing a ColumnSchema object (#870)
- Increase koalas min version to 1.8.0 (#885)
Documentation Changes
- Improve formatting of release notes (#874)
Testing Changes
- Remove unnecessary argument in codecov upload job (#853)
- Change from GitHub Token to regenerated GitHub PAT for dependency checkers (#855)
- Update README.md with non-nullable dtypes in code example (#856)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

Breaking Changes

Woodwork tables can no longer be saved to disk with df.ww.to_csv, df.ww.to_pickle, or df.ww.to_parquet. Use df.ww.to_disk instead. The read_csv function has been replaced by read_file.

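A minimal, hedged sketch of the replacements; the parameter names (format, filepath) and the output path are assumptions rather than confirmed signatures:

    import pandas as pd
    import woodwork as ww

    df = pd.DataFrame({'id': [1, 2, 3]})
    df.ww.init()

    # Save the DataFrame's data and typing information; replaces to_csv/to_pickle/to_parquet.
    df.ww.to_disk('my_output_dir', format='csv')

    # Create a Woodwork DataFrame directly from a raw data file; replaces read_csv.
    df2 = ww.read_file(filepath='data.csv')
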
Warning

This Woodwork release does not support Python 3.6.

Enhancements
- Add validation control to WoodworkTableAccessor (#736)
- Store make_index value on WoodworkTableAccessor (#780)
- Add optional exclude parameter to WoodworkTableAccessor select method (#783)
- Add validation control to deserialize.read_woodwork_table and ww.read_csv (#788)
- Add WoodworkColumnAccessor.schema and handle copying column schema (#799)
- Allow initializing a WoodworkColumnAccessor with a ColumnSchema (#814)
- Add __repr__ to ColumnSchema (#817)
- Add BooleanNullable and IntegerNullable logical types (#830)
- Add validation control to WoodworkColumnAccessor (#833)
Changes
- Rename FullName logical type to PersonFullName (#740)
- Rename ZIPCode logical type to PostalCode (#741)
- Fix issue with smart-open version 5.0.0 (#750, #758)
- Update minimum scikit-learn version to 0.22 (#763)
- Drop support for Python version 3.6 (#768)
- Remove ColumnNameMismatchWarning (#777)
- get_column_dict does not use standard tags by default (#782)
- Make logical_type and name params to _get_column_dict optional (#786)
- Rename Schema object and files to match new table-column schema structure (#789)
- Store column typing information in a ColumnSchema object instead of a dictionary (#791)
- TableSchema does not use standard tags by default (#806)
- Store use_standard_tags on the ColumnSchema instead of the TableSchema (#809)
- Move functions in column_schema.py to be methods on ColumnSchema (#829)
Documentation Changes
- Update Pygments version requirement (#751)
- Update spark config for docs build (#787, #801, #810)
Testing Changes
- Add unit tests against minimum dependencies for python 3.6 on PRs and main (#743, #753, #763)
- Update spark config for test fixtures (#787)
- Separate latest unit tests into pandas, dask, and koalas (#813)
- Update latest dependency checker to generate separate core, koalas, and dask dependencies (#815, #825)
- Ignore latest dependency branch when checking for updates to the release notes (#827)
- Change from GitHub PAT to auto-generated GitHub Token for dependency checker (#831)
- Expand ColumnSchema semantic tag testing coverage and null logical_type testing coverage (#832)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

Breaking Changes

- The ZIPCode logical type has been renamed to PostalCode.
- The FullName logical type has been renamed to PersonFullName.
- The Schema object has been renamed to TableSchema.
- With the ColumnSchema object, typing information for a column can no longer be accessed with df.ww.columns[col_name]['logical_type']. Instead, use df.ww.columns[col_name].logical_type.
- The Boolean and Integer logical types will no longer work with data that contains null values. The new BooleanNullable and IntegerNullable logical types should be used if null values are present.

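A brief, hedged sketch of the new column access pattern and the nullable types; the import path is assumed:

    import pandas as pd
    import woodwork as ww
    from woodwork.logical_types import IntegerNullable

    df = pd.DataFrame({'age': [25, None, 31]})
    # Integer no longer accepts null values, so use IntegerNullable here:
    df.ww.init(logical_types={'age': IntegerNullable})

    # Typing information is now accessed via attributes on ColumnSchema objects:
    print(df.ww.columns['age'].logical_type)   # was df.ww.columns['age']['logical_type']
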
Enhancements
- Implement Schema and Accessor API (#497)
- Add Schema class that holds typing info (#499)
- Add WoodworkTableAccessor class that performs type inference and stores Schema (#514)
- Allow initializing Accessor schema with a valid Schema object (#522)
- Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (#534)
- Add ability to call pandas methods from Accessor (#538, #589)
- Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (#553)
- Add ability to load demo retail dataset with a Woodwork Accessor (#556)
- Add select to WoodworkTableAccessor (#548)
- Add mutual_information to WoodworkTableAccessor (#571)
- Add WoodworkColumnAccessor class (#562)
- Add semantic tag update methods to column accessor (#573)
- Add describe and describe_dict to WoodworkTableAccessor (#579)
- Add init_series util function for initializing a series with dtype change (#581)
- Add set_logical_type method to WoodworkColumnAccessor (#590)
- Add semantic tag update methods to table schema (#591)
- Add warning if additional parameters are passed along with schema (#593)
- Better warning when accessing column properties before init (#596)
- Update column accessor to work with LatLong columns (#598)
- Add set_index to WoodworkTableAccessor (#603)
- Implement loc and iloc for WoodworkColumnAccessor (#613)
- Add set_time_index to WoodworkTableAccessor (#612)
- Implement loc and iloc for WoodworkTableAccessor (#618)
- Allow updating logical types with set_types and make relevant DataFrame changes (#619)
- Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (#624)
- Add DaskColumnAccessor (#625)
- Allow deserialization from csv, pickle, and parquet to Woodwork table (#626)
- Add value_counts to WoodworkTableAccessor (#632)
- Add KoalasColumnAccessor (#634)
- Add pop to WoodworkTableAccessor (#636)
- Add drop to WoodworkTableAccessor (#640)
- Add rename to WoodworkTableAccessor (#646)
- Add DaskTableAccessor (#648)
- Add Schema properties to WoodworkTableAccessor (#651)
- Add KoalasTableAccessor (#652)
- Add __getitem__ to WoodworkTableAccessor (#633)
- Update Koalas min version and add support for more new pandas dtypes with Koalas (#678)
- Add __setitem__ to WoodworkTableAccessor (#669)
Fixes
- Create new Schema object when performing pandas operation on Accessors (#595)
- Fix bug in _reset_semantic_tags causing columns to share same semantic tags set (#666)
- Maintain column order in DataFrame and Woodwork repr (#677)
Changes
- Move mutual information logic to statistics utils file (#584)
- Bump min Koalas version to 1.4.0 (#638)
- Preserve pandas underlying index when not creating a Woodwork index (#664)
- Restrict Koalas version to <1.7.0 due to breaking changes (#674)
- Clean up dtype usage across Woodwork (#682)
- Improve error when calling accessor properties or methods before init (#683)
- Remove dtype from Schema dictionary (#685)
- Add include_index param and allow unique columns in Accessor mutual information (#699)
- Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (#700)
- Remove DataTable and DataColumn classes to migrate towards the accessor approach (#713)
- Change sample_series dtype to not need conversion and remove convert_series util (#720)
- Rename Accessor methods since DataTable has been removed (#723)
Documentation Changes
- Update README.md and Get Started guide to use accessor (#655, #717)
- Update Understanding Types and Tags guide to use accessor (#657)
- Update docstrings and API Reference page (#660)
- Update statistical insights guide to use accessor (#693)
- Update Customizing Type Inference guide to use accessor (#696)
- Update Dask and Koalas guide to use accessor (#701)
- Update index notebook and install guide to use accessor (#715)
- Add section to documentation about schema validity (#729)
- Update README.md and Get Started guide to use pd.read_csv (#730)
- Make small fixes to documentation formatting (#731)
Testing Changes
- Add tests to Accessor/Schema that weren’t previously covered (#712, #716)
- Update release branch name in notes update check (#719)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd

Breaking Changes

The DataTable and DataColumn classes have been removed and replaced by the new WoodworkTableAccessor and WoodworkColumnAccessor classes, which are used through the ww namespace available on DataFrames after importing Woodwork.

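A minimal sketch of the accessor-based workflow that replaces the removed classes; the column names and properties shown here are illustrative:

    import pandas as pd
    import woodwork as ww  # importing Woodwork registers the ww accessor namespace

    df = pd.DataFrame({
        'id': [1, 2, 3],
        'signup': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01']),
    })
    df.ww.init(index='id', time_index='signup')

    print(df.ww.schema)                    # typing information lives on the accessor
    print(df.ww.logical_types['signup'])   # instead of DataColumn attributes
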
Changes
- Restrict Koalas version to <1.7.0 due to breaking changes (#674)
- Include unique columns in mutual information calculations (#687)
- Add parameter to include index column in mutual information calculations (#692)
Documentation Changes
- Update to remove warning message from statistical insights guide (#690)
Testing Changes
- Update branch reference in tests to run on main (#641)
- Make release notes updated check separate from unit tests (#642)
- Update release branch naming instructions (#644)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

Changes
- Avoid calculating mutual info for non-unique columns (#563)
- Preserve underlying DataFrame index if index column is not specified (#588)
- Add blank issue template for creating issues (#630)
Testing Changes
- Update branch reference in tests workflow (#552, #601)
- Fix text on back arrow on install page (#564)
- Refactor test_datatable.py (#574)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey

Enhancements
- Add Python 3.9 support without Koalas testing (#511)
- Add get_valid_mi_types function to list LogicalTypes valid for mutual information calculation (#517)
Fixes
- Handle missing values in Datetime columns when calculating mutual information (#516)
- Support numpy 1.20.0 by restricting version for koalas and changing serialization error message (#532)
- Move Koalas option setting to DataTable init instead of import (#543)
Documentation Changes
- Add Alteryx OSS Twitter link (#519)
- Update logo and add new favicon (#521)
- Multiple improvements to Getting Started page and guides (#527)
- Clean up API Reference and docstrings (#536)
- Add Open Graph for Twitter and Facebook (#544)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

Enhancements
- Add DataTable.df property for accessing the underlying DataFrame (#470)
- Set index of underlying DataFrame to match DataTable index (#464)
Fixes
- Sort underlying series when sorting dataframe (#468)
- Allow setting indices to current index without side effects (#474)
Changes
- Fix release document with GitHub Actions link for CI (#462)
- Don’t allow registered LogicalTypes with the same name (#477)
- Move str_to_logical_type to TypeSystem class (#482)
- Remove pyarrow from core dependencies (#508)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

Enhancements
- Allow for user-defined logical types and inference functions in TypeSystem object (#424)
- Add __repr__ to DataTable (#425)
- Allow initializing DataColumn with numpy array (#430)
- Add drop to DataTable (#434)
- Migrate CI tests to GitHub Actions (#417, #441, #451)
- Add metadata to DataColumn for user-defined metadata (#447)
Fixes
- Update DataColumn name when using setitem on column with no name (#426)
- Don’t allow pickle serialization for Koalas DataFrames (#432)
- Check DataTable metadata in equality check (#449)
- Propagate all attributes of DataTable in _new_dt_including (#454)
Changes
- Update links to use alteryx org GitHub URL (#423)
- Support column names of any type allowed by the underlying DataFrame (#442)
- Use object dtype for LatLong columns for easy access to latitude and longitude values (#414)
- Restrict dask version to prevent 2020.12.0 release from being installed (#453)
- Lower minimum requirement for numpy to 1.15.4, and set pandas minimum requirement to 1.1.1 (#459)
Testing Changes
- Fix missing test coverage (#436)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

Enhancements
- Add support for creating DataTable from Koalas DataFrame (#327)
- Add ability to initialize DataTable with numpy array (#367)
- Add describe_dict method to DataTable (#405)
- Add mutual_information_dict method to DataTable (#404)
- Add metadata to DataTable for user-defined metadata (#392)
- Add update_dataframe method to DataTable to update underlying DataFrame (#407)
- Sort dataframe if time_index is specified; bypass sorting with the already_sorted parameter (#410)
- Add description attribute to DataColumn (#416)
- Implement DataColumn.__len__ and DataTable.__len__ (#415)
Fixes
- Rename data_column.py to datacolumn.py (#386)
- Rename data_table.py to datatable.py (#387)
- Rename get_mutual_information to mutual_information (#390)
Changes
- Lower moto test requirement for serialization/deserialization (#376)
- Make Koalas an optional dependency installable with woodwork[koalas] (#378)
- Remove WholeNumber LogicalType from Woodwork (#380)
- Updates to LogicalTypes to support Koalas 1.4.0 (#393)
- Replace set_logical_types and set_semantic_tags with just set_types (#379)
- Remove copy_dataframe parameter from DataTable initialization (#398)
- Implement DataTable.__sizeof__ to return size of the underlying dataframe (#401)
- Include Datetime columns in mutual info calculation (#399)
- Maintain column order on DataTable operations (#406)
Testing Changes
- Add pyarrow, dask, and koalas to automated dependency checks (#388)
- Use new version of pull request GitHub Action (#394)
- Improve parameterization for test_datatable_equality (#409)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes

- The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.
- The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.
- WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.
- The DataTable.get_mutual_information method was renamed to DataTable.mutual_information.
- The copy_dataframe parameter was removed from DataTable initialization.

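A hedged sketch of the replacement calls, using the DataTable API from this release era; the column names and tag values are illustrative:

    import pandas as pd
    from woodwork import DataTable

    dt = DataTable(pd.DataFrame({'age': [25, 31, 47], 'group': ['a', 'b', 'a']}))

    # set_types replaces the removed set_logical_types and set_semantic_tags;
    # DataTable methods return new objects rather than modifying in place.
    dt = dt.set_types(logical_types={'age': 'Integer'},
                      semantic_tags={'group': 'product_group'})

    # get_mutual_information was renamed to mutual_information:
    mi = dt.mutual_information()
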
Enhancements
- Add __eq__ to DataTable and DataColumn and update LogicalType equality (#318)
- Add value_counts() method to DataTable (#342)
- Support serialization and deserialization of DataTables via csv, pickle, or parquet (#293)
- Add shape property to DataTable and DataColumn (#358)
- Add iloc method to DataTable and DataColumn (#365)
- Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (#363)
- Add rename method to DataTable (#367)
Fixes
- Catch non-numeric time index at validation (#332)
Changes
- Support logical type inference from a Dask DataFrame (#248)
- Fix validation checks and make_index to work with Dask DataFrames (#260)
- Skip validation of Ordinal order values for Dask DataFrames (#270)
- Improve support for datetimes with Dask input (#286)
- Update DataTable.describe to work with Dask input (#296)
- Update DataTable.get_mutual_information to work with Dask input (#300)
- Modify to_pandas function to return DataFrame with correct index (#281)
- Rename DataColumn.to_pandas method to DataColumn.to_series (#311)
- Rename DataTable.to_pandas method to DataTable.to_dataframe (#319)
- Remove UserWarning when no matching columns found (#325)
- Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (#338)
- Allow pandas ExtensionArrays as inputs to DataColumn (#343)
- Move warnings to a separate exceptions file and call via UserWarning subclasses (#348)
- Make Dask an optional dependency installable with woodwork[dask] (#357)
Documentation Changes
- Create a guide for using Woodwork with Dask (#304)
- Add conda install instructions (#305, #309)
- Fix README.md badge with correct link (#314)
- Simplify issue templates to make them easier to use (#339)
- Remove extra output cell in Start notebook (#341)
Testing Changes
- Parameterize numeric time index tests (#288)
- Add DockerHub credentials to CI testing environment (#326)
- Fix removing files for serialization test (#350)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes

- The DataColumn.to_pandas method was renamed to DataColumn.to_series.
- The DataTable.to_pandas method was renamed to DataTable.to_dataframe.
- copy is no longer a parameter of DataTable.to_dataframe or DataColumn.to_series.

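A short sketch of the renamed conversion methods, assuming the DataTable API from this release era:

    import pandas as pd
    from woodwork import DataTable

    dt = DataTable(pd.DataFrame({'id': [1, 2, 3]}))

    frame = dt.to_dataframe()              # previously dt.to_pandas(); the copy parameter is gone
    series = dt.columns['id'].to_series()  # previously DataColumn.to_pandas()
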
Enhancements Add optional include parameter for DataTable.describe() to filter results (#228) Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238) Add support for setting ranking order on columns with Ordinal logical type (#240) Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244) Add support for numeric time index on DataTable (#267) Add pop method to DataTable (#289) Add entry point to setup.py to run CLI commands (#285) Fixes Allow numeric datetime time indices (#282) Changes Remove redundant methods DataTable.select_ltypes and DataTable.select_semantic_tags (#239) Make results of get_mutual_information more clear by sorting and removing self calculation (#247) Lower minimum scikit-learn version to 0.21.3 (#297) Documentation Changes Add guide for dt.describe and dt.get_mutual_information (#245) Update README.md with documentation link (#261) Add footer to doc pages with Alteryx Open Source (#258) Add types and tags one-sentence definitions to Understanding Types and Tags guide (#271) Add issue and pull request templates (#280, #284) Testing Changes Add automated process to check latest dependencies. (#268) Add test for setting a time index with specified string logical type (#279) Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
Add optional include parameter for DataTable.describe() to filter results (#228)
include
DataTable.describe()
Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238)
DataTable.__init__
Add support for setting ranking order on columns with Ordinal logical type (#240)
Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244)
Add support for numeric time index on DataTable (#267)
Add pop method to DataTable (#289)
Add entry point to setup.py to run CLI commands (#285)
Allow numeric datetime time indices (#282)
Remove redundant methods DataTable.select_ltypes and DataTable.select_semantic_tags (#239)
DataTable.select_ltypes
DataTable.select_semantic_tags
Make results of get_mutual_information more clear by sorting and removing self calculation (#247)
Lower minimum scikit-learn version to 0.21.3 (#297)
Add guide for dt.describe and dt.get_mutual_information (#245)
dt.describe
dt.get_mutual_information
Update README.md with documentation link (#261)
Add footer to doc pages with Alteryx Open Source (#258)
Add types and tags one-sentence definitions to Understanding Types and Tags guide (#271)
Add issue and pull request templates (#280, #284)
Add automated process to check latest dependencies. (#268)
Add test for setting a time index with specified string logical type (#279)
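To make the make_index, Ordinal ordering, and describe filtering entries above concrete, here is a hedged sketch; the column names, the order values, and the include value are assumptions, and the exact values accepted by include may differ by version:

    import pandas as pd
    import woodwork as ww
    from woodwork.logical_types import Ordinal

    df = pd.DataFrame({
        "age": [25, 32, 47],
        "grade": ["low", "high", "medium"],
    })

    # make_index=True creates a brand-new index column named by `index`
    dt = ww.DataTable(
        df,
        make_index=True,
        index="row_id",
        logical_types={"grade": Ordinal(order=["low", "medium", "high"])},
    )

    # include filters the statistics returned by describe
    stats = dt.describe(include=["numeric"])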
Enhancements
Implement setitem on DataTable to create/overwrite an existing DataColumn (#165) Add to_pandas method to DataColumn to access the underlying series (#169) Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172) Add describe method to DataTable to generate statistics for the underlying data (#181) Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189) Add get_mutual_information method to DataTable to generate mutual information between columns (#203) Add read_csv function to create DataTable directly from CSV file (#222)
Fixes
Fix bug causing incorrect values for quartiles in DataTable.describe method (#187) Fix bug in DataTable.describe that could cause an error if certain semantic tags were applied improperly (#190) Fix bug with instantiated LogicalTypes breaking when used with issubclass (#231)
Changes
Remove unnecessary add_standard_tags attribute from DataTable (#171) Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196) Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205) Update various DataTable methods to return new objects rather than modifying in place (#210) Move datetime_format to Datetime LogicalType (#216) Do not calculate mutual info with index column in DataTable.get_mutual_information (#221) Move setting of underlying physical types from DataTable to DataColumn (#233)
Documentation Changes
Remove unused code from sphinx conf.py, update with GitHub URL (#160, #163) Update README and docs with new Woodwork logo, with better code snippets (#161, #159) Add DataTable and DataColumn to API Reference (#162) Add docstrings to LogicalType classes (#168) Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173) Update contributing.md, release.md with all instructions (#176) Add section for setting index and time index to start notebook (#179) Rename changelog to Release Notes (#193) Add section for standard tags to start notebook (#188) Add Understanding Types and Tags user guide (#201) Add missing docstring to list_logical_types (#202) Add Woodwork Global Configuration Options guide (#215)
Testing Changes
Add tests that confirm dtypes are as expected after DataTable init (#152) Remove unused none_df test fixture (#224) Add test for LogicalType.__str__ method (#225)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
Implement setitem on DataTable to create/overwrite an existing DataColumn (#165)
Add to_pandas method to DataColumn to access the underlying series (#169)
Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172)
Add describe method to DataTable to generate statistics for the underlying data (#181)
Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189)
return_dataframe
load_retail
Add get_mutual_information method to DataTable to generate mutual information between columns (#203)
Add read_csv function to create DataTable directly from CSV file (#222)
Fix bug causing incorrect values for quartiles in DataTable.describe method (#187)
Fix bug in DataTable.describe that could cause an error if certain semantic tags were applied improperly (#190)
Fix bug with instantiated LogicalTypes breaking when used with issubclass (#231)
Remove unnecessary add_standard_tags attribute from DataTable (#171)
add_standard_tags
Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196)
Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205)
DataColumn.set_semantic_tags
DataColumn.add_semantic_tags
Update various DataTable methods to return new objects rather than modifying in place (#210)
Move datetime_format to Datetime LogicalType (#216)
Do not calculate mutual info with index column in DataTable.get_mutual_information (#221)
Move setting of underlying physical types from DataTable to DataColumn (#233)
Remove unused code from sphinx conf.py, update with GitHub URL (#160, #163)
Update README and docs with new Woodwork logo, with better code snippets (#161, #159)
Add DataTable and DataColumn to API Reference (#162)
Add docstrings to LogicalType classes (#168)
Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173)
Update contributing.md, release.md with all instructions (#176)
Add section for setting index and time index to start notebook (#179)
Rename changelog to Release Notes (#193)
Add section for standard tags to start notebook (#188)
Add Understanding Types and Tags user guide (#201)
Add missing docstring to list_logical_types (#202)
list_logical_types
Add Woodwork Global Configuration Options guide (#215)
Add tests that confirm dtypes are as expected after DataTable init (#152)
Remove unused none_df test fixture (#224)
none_df
Add test for LogicalType.__str__ method (#225)
LogicalType.__str__
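A short sketch of the table-level utilities added in this release; the CSV path and index name are placeholders:

    import woodwork as ww

    # DataFrame of the available logical types (also exposed via the CLI)
    types_df = ww.list_logical_types()

    # Build a DataTable straight from a CSV file
    dt = ww.read_csv("retail.csv", index="order_id")

    # Column statistics and pairwise mutual information
    stats = dt.describe()
    mi = dt.get_mutual_information()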
Fixes
Fix formatting issue when printing global config variables (#138)
Changes
Change add_standard_tags to use_standard_tags to better describe behavior (#149) Change access of underlying dataframe to be through to_pandas with ._dataframe field on class (#146) Remove replace_none parameter from DataTables (#146)
Documentation Changes
Add working code example to README and create Using Woodwork page (#103)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
Fix formatting issue when printing global config variables (#138)
Change add_standard_tags to use_standard_tags to better describe behavior (#149)
Change access of underlying dataframe to be through to_pandas with ._dataframe field on class (#146)
Remove replace_none parameter from DataTables (#146)
replace_none
Add working code example to README and create Using Woodwork page (#103)
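A minimal sketch of the renamed use_standard_tags flag and the to_pandas access path introduced here (the sample data is made up; to_pandas was itself renamed to to_dataframe in a later release, as noted above):

    import pandas as pd
    import woodwork as ww

    df = pd.DataFrame({"id": [0, 1], "score": [0.5, 0.9]})

    # use_standard_tags (formerly add_standard_tags) controls whether standard
    # tags such as 'numeric' and 'category' are applied automatically
    dt = ww.DataTable(df, index="id", use_standard_tags=False)

    # The underlying dataframe is reached through to_pandas() rather than a
    # public attribute
    raw_df = dt.to_pandas()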
Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135) Add global config options and add datetime_format option for type inference (#134) Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133) Add DataTable.ltypes property to return series of logical types (#131) Add ability to create new datatable from specified columns with dt[[columns]] (#127) Handle setting and tagging of index and time index columns (#125) Add combined tag and ltype selection (#124) Add changelog, and update changelog check to CI (#123) Implement reset_semantic_tags (#118) Implement DataTable getitem (#119) Add remove_semantic_tags method (#117) Add semantic tag selection (#106) Add GitHub action, rename to woodwork (#113) Add license to setup.py (#112) Reset semantic tags on logical type change (#107) Add standard numeric and category tags (#100) Change semantic_types to semantic_tags, a set of strings (#100) Update dataframe dtypes based on logical types (#94) Add select_logical_types to DataTable (#96) Add pygments to dev-requirements.txt (#97) Add replacing None with np.nan in DataTable init (#87) Refactor DataColumn to make semantic_types and logical_type private (#86) Add pandas_dtype to each Logical Type, and remove dtype attribute on DataColumn (#85) Add set_semantic_types methods on both DataTable and DataColumn (#75) Support passing camel case or snake case strings for setting logical types (#74) Improve flexibility when setting semantic types (#72) Add Whole Number Inference of Logical Types (#66) Add dtypes property to DataTables and repr for DataColumn (#61) Allow specification of semantic types during DataTable creation (#69) Implement set_logical_types on DataTable (#65) Add init files to tests to fix code coverage (#60) Add AutoAssign bot (#59) Add logical types validation in DataTables (#49) Fix working_directory in CI (#57) Add infer_logical_types for DataColumn (#45) Fix README library name, and code coverage badge (#56, #56) Add code coverage (#51) Improve and refactor the validation checks during initialization of a DataTable (#40) Add dataframe attribute to DataTable (#39) Update README with minor usage details (#37) Add License (#34) Rename from datatables to datatables (#4) Add Logical Types, DataTable, DataColumn (#3) Add Makefile, setup.py, requirements.txt (#2) Initial Release (#1)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135)
Add global config options and add datetime_format option for type inference (#134)
datetime_format
Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133)
pd.NA
Add DataTable.ltypes property to return series of logical types (#131)
DataTable.ltypes
Add ability to create new datatable from specified columns with dt[[columns]] (#127)
dt[[columns]]
Handle setting and tagging of index and time index columns (#125)
Add combined tag and ltype selection (#124)
Add changelog, and update changelog check to CI (#123)
Implement reset_semantic_tags (#118)
reset_semantic_tags
Implement DataTable getitem (#119)
Add remove_semantic_tags method (#117)
remove_semantic_tags
Add semantic tag selection (#106)
Add GitHub action, rename to woodwork (#113)
Add license to setup.py (#112)
Reset semantic tags on logical type change (#107)
Add standard numeric and category tags (#100)
Change semantic_types to semantic_tags, a set of strings (#100)
semantic_types
semantic_tags
Update dataframe dtypes based on logical types (#94)
Add select_logical_types to DataTable (#96)
select_logical_types
Add pygments to dev-requirements.txt (#97)
Add replacing None with np.nan in DataTable init (#87)
Refactor DataColumn to make semantic_types and logical_type private (#86)
Add pandas_dtype to each Logical Type, and remove dtype attribute on DataColumn (#85)
Add set_semantic_types methods on both DataTable and DataColumn (#75)
Support passing camel case or snake case strings for setting logical types (#74)
Improve flexibility when setting semantic types (#72)
Add Whole Number Inference of Logical Types (#66)
Add dtypes property to DataTables and repr for DataColumn (#61)
dtypes
repr
Allow specification of semantic types during DataTable creation (#69)
Implement set_logical_types on DataTable (#65)
Add init files to tests to fix code coverage (#60)
Add AutoAssign bot (#59)
Add logical types validation in DataTables (#49)
Fix working_directory in CI (#57)
Add infer_logical_types for DataColumn (#45)
infer_logical_types
Fix README library name, and code coverage badge (#56, #56)
Add code coverage (#51)
Improve and refactor the validation checks during initialization of a DataTable (#40)
Add dataframe attribute to DataTable (#39)
Update README with minor usage details (#37)
Add License (#34)
Rename from datatables to datatables (#4)
Add Logical Types, DataTable, DataColumn (#3)
Add Makefile, setup.py, requirements.txt (#2)
Initial Release (#1)
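Finally, a hedged sketch of the early configuration and selection features listed above; the ww.config.set_option call, threshold value, and column names are assumptions based on the global configuration guide rather than verbatim from these notes:

    import pandas as pd
    import woodwork as ww

    # Global config option used when deciding between Categorical and
    # NaturalLanguage during inference (threshold value is arbitrary)
    ww.config.set_option("natural_language_threshold", 10)

    df = pd.DataFrame({"id": [0, 1], "comment": ["short note", "longer text here"]})

    # Logical types can be supplied at creation using snake case or camel case strings
    dt = ww.DataTable(df, index="id", logical_types={"comment": "natural_language"})

    # Select a subset of columns as a new DataTable
    subset = dt[["id", "comment"]]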