Release Notes

v0.13.0 Feb 16, 2022

Warning

Woodwork may not support Python 3.7 in the next non-bugfix release.

  • Enhancements
    • Add validation to EmailAddress logical type (#1247)

    • Add validation to URL logical type (#1285)

    • Add validation to Age, AgeFractional, and AgeNullable logical types (#1289)

  • Fixes
    • Check range length in table stats without producing overflow error (#1287)

    • Fixes issue with initializing Woodwork Series with LatLong values (#1299)

  • Changes
    • Remove framework for unused woodwork CLI (#1288)

    • Add back support for Python 3.7 (#1292)

    • Nested statistical utility functions into directory (#1295)

  • Documentation Changes
    • Updating contributing doc with PATH and JAVA_HOME instructions (#1273)

    • Better install page with new Sphinx extensions for copying and in-line tabs (#1280, #1282)

    • Update README.md with Alteryx link (#1291)

  • Testing Changes
    • Replace mock with unittest.mock (#1304)

Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

v0.12.0 Jan 27, 2022

  • Enhancements
    • Add Slack link to GitHub issue creation templates (#1242)

  • Fixes
    • Fixed issue with tuples being incorrectly inferred as EmailAddress (#1253)

    • Set high and low bounds to the max and min values if no outliers are present in box_plot_dict (#1269)

  • Changes
    • Prevent setting index that contains null values (#1239)

    • Allow tuple NaN LatLong values (#1255)

    • Update ipython to 7.31.1 (#1258)

    • Temporarily restrict pandas and koalas max versions (#1261)

    • Update to drop Python 3.7 support and add support for pandas version 1.4.0 (#1264)

  • Testing Changes
    • Change auto approve workflow to use PR number (#1240, #1241)

    • Update auto approve workflow to delete branch and change on trigger (#1251)

    • Fix permissions issue with S3 deserialization test (#1238)

Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

v0.11.2 Jan 28, 2022

  • Fixes
    • Set high and low bounds to the max and min values if no outliers are present in box_plot_dict (backport of #1269)

Thanks to the following people for contributing to this release: @tamargrey

Note

  • The pandas version for Koalas has been restricted, and a change was made to a pandas replace call to account for the recent pandas 1.4.0 release.

v0.11.1 Jan 4, 2022

  • Changes
    • Update inference process to only check for NaturalLanguage if no other type matches are found first (#1234)

  • Documentation Changes
    • Updating contributing doc with Spark installation instructions (#1232)

  • Testing Changes
    • Enable auto-merge for minimum and latest dependency merge requests (#1228, #1230, #1233)

Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd, @willsmithorg

v0.11.0 Dec 22, 2021

  • Enhancements
    • Add type inference for natural language (#1210)

  • Changes
    • Make public method get_subset_schema (#1218)

Thanks to the following people for contributing to this release: @jeff-hernandez, @thehomebrewnerd, @tuethan1999

v0.10.0 Nov 30, 2021

  • Enhancements
    • Allow frequency inference on temporal (Datetime, Timedelta) columns of Woodwork DataFrame (#1202)

    • Update describe_dict to compute top_values for double columns that contain only integer values (#1206)

  • Changes
    • Return histogram bins as a list of floats instead of a pandas.Interval object (#1207)

Thanks to the following people for contributing to this release: @tamargrey, @thehomebrewnerd

Breaking Changes

  • #1207: The behavior of describe_dict has changed when using extra_stats=True. Previously, the histogram bins were returned as pandas.Interval objects. This has been updated so that the histogram bins are now represented as a two-element list of floats with the first element being the left edge of the bin and the second element being the right edge.
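Code that read the old pandas.Interval bins through their left and right attributes now needs to index into a list instead. The following is a minimal sketch, not taken from Woodwork itself: the histogram data is made up, and the "bins" and "frequency" key names are assumptions about the shape of the extra stats output.

```python
# Hypothetical histogram entries in the new format: each bin is a
# two-element list [left_edge, right_edge] rather than a pandas.Interval.
histogram = [
    {"bins": [0.0, 2.5], "frequency": 4},
    {"bins": [2.5, 5.0], "frequency": 1},
]


def bin_midpoints(entries):
    """Return the midpoint of every bin, unpacking the two-element lists."""
    midpoints = []
    for entry in entries:
        left, right = entry["bins"]  # previously: interval.left, interval.right
        midpoints.append((left + right) / 2)
    return midpoints


midpoints = bin_midpoints(histogram)
```

Because the bins are plain lists of floats, the result can also be serialized to JSON without any special handling.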

v0.9.1 Nov 19, 2021

  • Fixes
    • Fix bug that causes mutual_information to fail with certain index types (#1199)

  • Changes
    • Update pip to 21.3.1 for test requirements (#1196)

  • Documentation Changes
    • Update install page with updated minimum optional dependencies (#1193)

Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd

v0.9.0 Nov 11, 2021

  • Enhancements
    • Added read_file parameter for replacing empty string values with NaN values (#1161)

  • Fixes
    • Set a maximum version for pyspark until we understand why #1169 failed (#1179)

    • Require newer dask version (#1180)

  • Changes
    • Make box plot low/high indices/values optional to return in box_plot_dict (#1184)

  • Documentation Changes
    • Update docs dependencies (#1176)

  • Testing Changes
    • Add black linting package and remove autopep8 (#1164, #1183)

    • Updated notebook standardizer to standardize python versions (#1166)

Thanks to the following people for contributing to this release: @bchen1116, @davesque, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd

v0.8.2 Oct 12, 2021

  • Fixes
    • Fixed an issue when inferring the format of datetime strings with day of week or meridiem placeholders (#1158)

    • Implements change in Datetime.transform to prevent initialization failure in some cases (#1162)

  • Testing Changes
    • Update reviewers for minimum and latest dependency checkers (#1150)

    • Added notebook standardizer to remove executed outputs (#1153)

Thanks to the following people for contributing to this release: @bchen1116, @davesque, @jeff-hernandez, @thehomebrewnerd

v0.8.1 Sep 16, 2021

  • Changes
    • Update Datetime.transform to use default nrows value when calling _infer_datetime_format (#1137)

  • Documentation Changes
    • Hide spark config in Using Dask and Koalas Guide (#1139)

Thanks to the following people for contributing to this release: @jeff-hernandez, @simha104, @thehomebrewnerd

v0.8.0 Sep 9, 2021

  • Enhancements
    • Add support for automatically inferring the URL and IPAddress logical types (#1122, #1124)

    • Add get_valid_mi_columns method to list columns that have valid logical types for mutual information calculation (#1129)

    • Add attribute to check if column has a nullable logical type (#1127)

  • Changes
    • Update get_invalid_schema_message to improve performance (#1132)

  • Documentation Changes
    • Fix typo in the “Get Started” documentation (#1126)

    • Clean up the logical types guide (#1134)

Thanks to the following people for contributing to this release: @ajaypallekonda, @davesque, @jeff-hernandez, @thehomebrewnerd

v0.7.1 Aug 25, 2021

  • Fixes
    • Validate schema’s index if being used in partial schema init (#1115)

    • Allow falsy index, time index, and name values to be set along with partial schema at init (#1115)

Thanks to the following people for contributing to this release: @tamargrey

v0.7.0 Aug 25, 2021

  • Enhancements
    • Add 'passthrough' and 'ignore' to tags in list_semantic_tags (#1094)

    • Add initialize with partial table schema (#1100)

    • Apply ordering specified by the Ordinal logical type to underlying series (#1097)

    • Add AgeFractional logical type (#1112)

Thanks to the following people for contributing to this release: @davesque, @jeff-hernandez, @tamargrey, @tuethan1999

Breaking Changes

  • #1100: The behavior of init has changed. A full schema contains all of the columns of the DataFrame it describes, whereas a partial schema contains only a subset. A full schema must also be valid without any changes being made to the DataFrame. Previously, init permitted only a full schema, so passing a partial schema raised an error, and any parameters such as logical_types were ignored when a schema was passed. Now, passing a partial schema to init calls init_with_partial_schema instead of raising an error, and information from keyword arguments overrides information from the partial schema. For example, if column a has the Integer logical type in the partial schema, the logical_types argument can be used to reinfer its logical type by passing {'a': None} or to force a type by passing {'a': Double}. These changes make Woodwork init less restrictive. If no type inference takes place and no changes are required of the DataFrame at initialization, init_with_full_schema should be used instead of init. init_with_full_schema maintains the same functionality as when a schema was passed to the old init.
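The override rule described above can be pictured as a dictionary merge. This is an illustrative sketch only, not Woodwork's implementation: resolve_logical_types is a hypothetical helper, and logical types are represented by plain strings instead of Woodwork classes.

```python
def resolve_logical_types(partial_schema_types, logical_types=None):
    """Return per-column type requests after applying keyword overrides.

    A value of None stands for "re-infer this column's type"; any other
    value forces that type. Columns not mentioned keep the schema's type.
    """
    resolved = dict(partial_schema_types)  # start from the partial schema
    resolved.update(logical_types or {})   # keyword arguments take priority
    return resolved


partial = {"a": "Integer", "b": "Categorical"}

forced = resolve_logical_types(partial, {"a": "Double"})   # force a type
reinferred = resolve_logical_types(partial, {"a": None})   # request inference
```

In both calls column b keeps the type recorded in the partial schema, since no keyword argument mentions it.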

v0.6.0 Aug 4, 2021

  • Fixes
    • Fix bug in _infer_datetime_format with all np.nan input (#1089)

  • Changes
    • The criteria for categorical type inference have changed (#1065)

    • The meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed (#1065)

    • Make sampling for type inference more consistent (#1083)

    • Accessor logic checking if Woodwork has been initialized moved to decorator (#1093)

  • Documentation Changes
    • Fix some release notes that ended up under the wrong release (#1082)

    • Add BooleanNullable and IntegerNullable types to the docs (#1085)

    • Add guide for saving and loading Woodwork DataFrames (#1066)

    • Add in-depth guide on logical types and semantic tags (#1086)

  • Testing Changes
    • Add additional reviewers to minimum and latest dependency checkers (#1070, #1073, #1077)

    • Update the sample_df fixture to have more logical_type coverage (#1058)

Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

  • #1065: The criteria for categorical type inference have changed. Relatedly, the meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed. Now, a categorical match is signaled when a series either has the “categorical” pandas dtype or has a ratio of unique value count to total value count (NaN excluded from both) that is less than or equal to some fraction. The value used for this fraction is set by the categorical_threshold setting, which now has a default value of 0.2. If a fraction is set for the numeric_categorical_threshold setting, then series with either a float or integer dtype may be inferred as categorical by applying the same logic with the numeric_categorical_threshold fraction. Otherwise, the numeric_categorical_threshold setting defaults to None, which indicates that series with a numeric dtype should not be inferred as categorical. Users who have overridden either setting will need to adjust their values accordingly.

  • #1083: The process of sampling series for logical type inference was updated to be more consistent. Before, initial sampling for inference differed depending on collection type (pandas, dask, or koalas). Also, further randomized subsampling was performed in some cases during categorical inference and in every case during email inference regardless of collection type. Overall, the way sampling was done was inconsistent and unpredictable. Now, the first 100,000 records of a column are sampled for logical type inference regardless of collection type although only records from the first partition of a dask dataset will be used. Subsampling performed by the inference functions of individual types has been removed. The effect of these changes is that inferred types may now be different although in many cases they will be more correct.
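The two rules above (sample the first 100,000 records, then compare the unique-to-total ratio against a threshold) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not Woodwork's inference code: sample_head and looks_categorical are hypothetical names, values are plain Python lists, and dtype checks are left out.

```python
import math
from itertools import islice


def sample_head(values, n=100_000):
    """Take the first n records, mirroring the fixed head sample."""
    return list(islice(values, n))


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def looks_categorical(values, threshold=0.2):
    """Return True if unique/total (missing values excluded) <= threshold.

    A threshold of None disables the check, which is how the default of
    numeric_categorical_threshold is described above.
    """
    if threshold is None:
        return False
    present = [v for v in sample_head(values) if not is_missing(v)]
    if not present:
        return False
    return len(set(present)) / len(present) <= threshold


colors = ["red", "blue"] * 5 + [None]   # 2 unique / 10 present = 0.2
ids = [1, 2, 3, 4, 5]                   # 5 unique / 5 present = 1.0

colors_match = looks_categorical(colors)
ids_match = looks_categorical(ids)
numeric_disabled = looks_categorical([1, 1, 1, 1, 1, 2], threshold=None)
```

The comparison is inclusive, so a ratio exactly equal to the threshold still counts as a match.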

v0.5.1 Jul 22, 2021

  • Enhancements
    • Store inferred datetime format on Datetime logical type instance (#1025)

    • Add support for automatically inferring the EmailAddress logical type (#1047)

    • Add feature origin attribute to schema (#1056)

    • Add ability to calculate outliers and the statistical info required for box and whisker plots to WoodworkColumnAccessor (#1048)

    • Add ability to change config settings in a with block with ww.config.with_options (#1062)

  • Fixes
    • Raises warning and removes tags when user adds a column with index tags to DataFrame (#1035)

  • Changes
    • Entirely null columns are now inferred as the Unknown logical type (#1043)

    • Add helper functions that check for whether an object is a koalas/dask series or dataframe (#1055)

    • TableAccessor.select method will now maintain dataframe column ordering in TableSchema columns (#1052)

  • Documentation Changes
    • Add supported types to metadata docstring (#1049)

Thanks to the following people for contributing to this release: @davesque, @frances-h, @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd

v0.5.0 Jul 7, 2021

  • Enhancements
    • Add support for numpy array inputs to Woodwork (#1023)

    • Add support for pandas.api.extensions.ExtensionArray inputs to Woodwork (#1026)

  • Fixes
    • Add input validation to ww.init_series (#1015)

  • Changes
    • Remove lines in LogicalType.transform that raise error if dtype conflicts (#1012)

    • Add infer_datetime_format param to speed up to_datetime calls (#1016)

    • The default logical type is now the Unknown type instead of the NaturalLanguage type (#992)

    • Add pandas 1.3.0 compatibility (#987)

Thanks to the following people for contributing to this release: @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

  • The default logical type is now the Unknown type instead of the NaturalLanguage type. The global config natural_language_threshold has been renamed to categorical_threshold.

v0.4.2 Jun 23, 2021

  • Enhancements
    • Pass additional progress information in callback functions (#979)

    • Add the ability to generate optional extra stats with DataFrame.ww.describe_dict (#988)

    • Add option to read and write orc files (#997)

    • Retain schema when calling series.ww.to_frame() (#1004)

  • Fixes
    • Raise type conversion error in Datetime logical type (#1001)

    • Try collections.abc to avoid deprecation warning (#1010)

  • Changes
    • Remove make_index parameter from DataFrame.ww.init (#1000)

    • Remove version restriction for dask requirements (#998)

  • Documentation Changes
    • Add instructions for installing the update checker (#993)

    • Disable pdf format with documentation build (#1002)

    • Silence deprecation warnings in documentation build (#1008)

    • Temporarily remove update checker to fix docs warnings (#1011)

  • Testing Changes
    • Add env setting to update checker (#978, #994)

Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd, @tuethan1999

Breaking Changes

  • The parameters of progress callback functions have changed, and progress is now reported in the units specified by the unit of measurement parameter instead of as a percentage of the total. Progress callback functions are now expected to accept the following five parameters:

    • progress increment since last call

    • progress units complete so far

    • total units to complete

    • the progress unit of measurement

    • time elapsed since start of calculation

  • DataFrame.ww.init no longer accepts the make_index parameter
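A callback matching the five-parameter shape listed above might look like the following sketch. The parameter names and the run_job driver are assumptions made for illustration; only the order and meaning of the five values come from the note above.

```python
import time


def make_recorder():
    """Build a callback that stores each progress report it receives."""
    calls = []

    def callback(update, progress, total, unit, time_elapsed):
        # update: increment since last call; progress: units done so far;
        # total: units to complete; unit: unit of measurement;
        # time_elapsed: seconds since the calculation started.
        calls.append((update, progress, total, unit))

    return calls, callback


def run_job(total, step, unit, callback):
    """Hypothetical driver that reports progress in units, not percent."""
    start = time.perf_counter()
    done = 0
    while done < total:
        update = min(step, total - done)
        done += update
        callback(update, done, total, unit, time.perf_counter() - start)


calls, callback = make_recorder()
run_job(total=10, step=4, unit="calculations", callback=callback)
```

A caller that still wants a percentage can derive it inside the callback as progress / total * 100.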

v0.4.1 Jun 9, 2021

  • Enhancements
    • Add concat_columns util function to concatenate multiple Woodwork objects into one, retaining typing information (#932)

    • Add option to pass progress callback function to mutual information functions (#958)

    • Add optional automatic update checker (#959, #970)

  • Fixes
    • Fix issue related to serialization/deserialization of data with whitespace and newline characters (#957)

    • Update to allow initializing a ColumnSchema object with an Ordinal logical type without order values (#972)

  • Changes
    • Change write_dataframe to only copy dataframe if it contains LatLong (#955)

  • Testing Changes
    • Fix bug in test_list_logical_types_default (#954)

    • Update minimum unit tests to run on all pull requests (#952)

    • Pass token to authorize uploading of codecov reports (#969)

Thanks to the following people for contributing to this release: @frances-h, @gsheni, @tamargrey, @thehomebrewnerd

v0.4.0 May 26, 2021

  • Enhancements
    • Add option to return TableSchema instead of DataFrame from table accessor select method (#916)

    • Add option to read and write arrow/feather files (#948)

    • Add dropping and renaming columns inplace (#920)

    • Add option to pass progress callback function to mutual information functions (#943)

  • Fixes
    • Fix bug when setting table name and metadata through accessor (#942)

    • Fix bug in which the dtype of category values were not restored properly on deserialization (#949)

  • Changes
    • Add logical type method to transform data (#915)

  • Testing Changes
    • Update when minimum unit tests will run to include minimum text files (#917)

    • Create separate workflows for each CI job (#919)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @thehomebrewnerd, @tuethan1999

v0.3.1 May 12, 2021

Warning

This Woodwork release uses a weak reference for maintaining a reference from the accessor to the DataFrame. Because of this, chaining a Woodwork call onto another call that creates a new DataFrame or Series object can be problematic.

Instead of calling pd.DataFrame({'id':[1, 2, 3]}).ww.init(), first store the DataFrame in a new variable and then initialize Woodwork:

df = pd.DataFrame({'id': [1, 2, 3]})
df.ww.init()

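Why chaining is a problem can be shown with Python's weakref module alone. The sketch below uses hypothetical stand-in classes (Frame and Accessor), not pandas or Woodwork objects, and assumes the accessor keeps only a weak reference to its target, as described above.

```python
import gc
import weakref


class Frame:
    """Stand-in for a DataFrame (hypothetical, not a pandas class)."""

    def __init__(self, data):
        self.data = data


class Accessor:
    """Stand-in accessor that holds only a weak reference to its frame."""

    def __init__(self, frame):
        self._frame_ref = weakref.ref(frame)

    def frame(self):
        # Returns None once the referenced frame has been garbage collected.
        return self._frame_ref()


# Chained: nothing else references the temporary Frame, so it is collected.
chained = Accessor(Frame({"id": [1, 2, 3]}))
gc.collect()  # make collection explicit for non-refcounting interpreters
chained_target = chained.frame()

# Stored first: the variable df keeps the Frame alive.
df = Frame({"id": [1, 2, 3]})
stored = Accessor(df)
stored_target = stored.frame()
```

Here chained_target is None while stored_target is the same object as df, which is why the DataFrame should be bound to a variable before Woodwork is initialized.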
  • Enhancements
    • Add deep parameter to Woodwork Accessor and Schema equality checks (#889)

    • Add support for reading from parquet files to woodwork.read_file (#909)

  • Changes
    • Remove command line functions for list logical and semantic tags (#891)

    • Keep index and time index tags for single column when selecting from a table (#888)

    • Update accessors to store weak reference to data (#894)

  • Documentation Changes
    • Update nbsphinx version to fix docs build issue (#911, #913)

  • Testing Changes
    • Use Minimum Dependency Generator GitHub Action and remove tools folder (#897)

    • Move all latest and minimum dependencies into 1 folder (#912)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The command line functions python -m woodwork list-logical-types and python -m woodwork list-semantic-tags no longer exist. Please call the underlying Python functions ww.list_logical_types() and ww.list_semantic_tags().

v0.3.0 May 3, 2021

  • Enhancements
    • Add is_schema_valid and get_invalid_schema_message functions for checking schema validity (#834)

    • Add logical type for Age and AgeNullable (#849)

    • Add logical type for Address (#858)

    • Add generic to_disk function to save Woodwork schema and data (#872)

    • Add generic read_file function to read file as Woodwork DataFrame (#878)

  • Fixes
    • Raise error when a column is set as the index and time index (#859)

    • Allow NaNs in index for schema validation check (#862)

    • Fix bug where invalid casting to Boolean would not raise error (#863)

  • Changes
    • Consistently use ColumnNotPresentError for mismatches between user input and dataframe/schema columns (#837)

    • Raise custom WoodworkNotInitError when accessing Woodwork attributes before initialization (#838)

    • Remove check requiring Ordinal instance for initializing a ColumnSchema object (#870)

    • Increase koalas min version to 1.8.0 (#885)

  • Documentation Changes
    • Improve formatting of release notes (#874)

  • Testing Changes
    • Remove unnecessary argument in codecov upload job (#853)

    • Change from GitHub Token to regenerated GitHub PAT for dependency checkers (#855)

    • Update README.md with non-nullable dtypes in code example (#856)

Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

Breaking Changes

  • Woodwork tables can no longer be saved to disk using df.ww.to_csv, df.ww.to_pickle, or df.ww.to_parquet. Use df.ww.to_disk instead.

  • The read_csv function has been replaced by read_file.

v0.2.0 Apr 20, 2021

Warning

This Woodwork release does not support Python 3.6.

  • Enhancements
    • Add validation control to WoodworkTableAccessor (#736)

    • Store make_index value on WoodworkTableAccessor (#780)

    • Add optional exclude parameter to WoodworkTableAccessor select method (#783)

    • Add validation control to deserialize.read_woodwork_table and ww.read_csv (#788)

    • Add WoodworkColumnAccessor.schema and handle copying column schema (#799)

    • Allow initializing a WoodworkColumnAccessor with a ColumnSchema (#814)

    • Add __repr__ to ColumnSchema (#817)

    • Add BooleanNullable and IntegerNullable logical types (#830)

    • Add validation control to WoodworkColumnAccessor (#833)

  • Changes
    • Rename FullName logical type to PersonFullName (#740)

    • Rename ZIPCode logical type to PostalCode (#741)

    • Fix issue with smart-open version 5.0.0 (#750, #758)

    • Update minimum scikit-learn version to 0.22 (#763)

    • Drop support for Python version 3.6 (#768)

    • Remove ColumnNameMismatchWarning (#777)

    • get_column_dict does not use standard tags by default (#782)

    • Make logical_type and name params to _get_column_dict optional (#786)

    • Rename Schema object and files to match new table-column schema structure (#789)

    • Store column typing information in a ColumnSchema object instead of a dictionary (#791)

    • TableSchema does not use standard tags by default (#806)

    • Store use_standard_tags on the ColumnSchema instead of the TableSchema (#809)

    • Move functions in column_schema.py to be methods on ColumnSchema (#829)

  • Documentation Changes
    • Update Pygments version requirement (#751)

    • Update spark config for docs build (#787, #801, #810)

  • Testing Changes
    • Add unit tests against minimum dependencies for python 3.6 on PRs and main (#743, #753, #763)

    • Update spark config for test fixtures (#787)

    • Separate latest unit tests into pandas, dask, koalas (#813)

    • Update latest dependency checker to generate separate core, koalas, and dask dependencies (#815, #825)

    • Ignore latest dependency branch when checking for updates to the release notes (#827)

    • Change from GitHub PAT to auto generated GitHub Token for dependency checker (#831)

    • Expand ColumnSchema semantic tag testing coverage and null logical_type testing coverage (#832)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The ZIPCode logical type has been renamed to PostalCode

  • The FullName logical type has been renamed to PersonFullName

  • The Schema object has been renamed to TableSchema

  • With the ColumnSchema object, typing information for a column can no longer be accessed with df.ww.columns[col_name]['logical_type']. Instead use df.ww.columns[col_name].logical_type.

  • The Boolean and Integer logical types will no longer work with data that contains null values. The new BooleanNullable and IntegerNullable logical types should be used if null values are present.

v0.1.0 Mar 22, 2021

  • Enhancements
    • Implement Schema and Accessor API (#497)

    • Add Schema class that holds typing info (#499)

    • Add WoodworkTableAccessor class that performs type inference and stores Schema (#514)

    • Allow initializing Accessor schema with a valid Schema object (#522)

    • Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (#534)

    • Add ability to call pandas methods from Accessor (#538, #589)

    • Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (#553)

    • Add ability to load demo retail dataset with a Woodwork Accessor (#556)

    • Add select to WoodworkTableAccessor (#548)

    • Add mutual_information to WoodworkTableAccessor (#571)

    • Add WoodworkColumnAccessor class (#562)

    • Add semantic tag update methods to column accessor (#573)

    • Add describe and describe_dict to WoodworkTableAccessor (#579)

    • Add init_series util function for initializing a series with dtype change (#581)

    • Add set_logical_type method to WoodworkColumnAccessor (#590)

    • Add semantic tag update methods to table schema (#591)

    • Add warning if additional parameters are passed along with schema (#593)

    • Better warning when accessing column properties before init (#596)

    • Update column accessor to work with LatLong columns (#598)

    • Add set_index to WoodworkTableAccessor (#603)

    • Implement loc and iloc for WoodworkColumnAccessor (#613)

    • Add set_time_index to WoodworkTableAccessor (#612)

    • Implement loc and iloc for WoodworkTableAccessor (#618)

    • Allow updating logical types with set_types and make relevant DataFrame changes (#619)

    • Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (#624)

    • Add DaskColumnAccessor (#625)

    • Allow deserialization from csv, pickle, and parquet to Woodwork table (#626)

    • Add value_counts to WoodworkTableAccessor (#632)

    • Add KoalasColumnAccessor (#634)

    • Add pop to WoodworkTableAccessor (#636)

    • Add drop to WoodworkTableAccessor (#640)

    • Add rename to WoodworkTableAccessor (#646)

    • Add DaskTableAccessor (#648)

    • Add Schema properties to WoodworkTableAccessor (#651)

    • Add KoalasTableAccessor (#652)

    • Adds __getitem__ to WoodworkTableAccessor (#633)

    • Update Koalas min version and add support for more new pandas dtypes with Koalas (#678)

    • Adds __setitem__ to WoodworkTableAccessor (#669)

  • Fixes
    • Create new Schema object when performing pandas operation on Accessors (#595)

    • Fix bug in _reset_semantic_tags causing columns to share same semantic tags set (#666)

    • Maintain column order in DataFrame and Woodwork repr (#677)

  • Changes
    • Move mutual information logic to statistics utils file (#584)

    • Bump min Koalas version to 1.4.0 (#638)

    • Preserve pandas underlying index when not creating a Woodwork index (#664)

    • Restrict Koalas version to <1.7.0 due to breaking changes (#674)

    • Clean up dtype usage across Woodwork (#682)

    • Improve error when calling accessor properties or methods before init (#683)

    • Remove dtype from Schema dictionary (#685)

    • Add include_index param and allow unique columns in Accessor mutual information (#699)

    • Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (#700)

    • Remove DataTable and DataColumn classes to migrate towards the accessor approach (#713)

    • Change sample_series dtype to not need conversion and remove convert_series util (#720)

    • Rename Accessor methods since DataTable has been removed (#723)

  • Documentation Changes
    • Update README.md and Get Started guide to use accessor (#655, #717)

    • Update Understanding Types and Tags guide to use accessor (#657)

    • Update docstrings and API Reference page (#660)

    • Update statistical insights guide to use accessor (#693)

    • Update Customizing Type Inference guide to use accessor (#696)

    • Update Dask and Koalas guide to use accessor (#701)

    • Update index notebook and install guide to use accessor (#715)

    • Add section to documentation about schema validity (#729)

    • Update README.md and Get Started guide to use pd.read_csv (#730)

    • Make small fixes to documentation formatting (#731)

  • Testing Changes
    • Add tests to Accessor/Schema that weren’t previously covered (#712, #716)

    • Update release branch name in notes update check (#719)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The DataTable and DataColumn classes have been removed and replaced by new WoodworkTableAccessor and WoodworkColumnAccessor classes which are used through the ww namespace available on DataFrames after importing Woodwork.

v0.0.11 Mar 15, 2021

  • Changes
    • Restrict Koalas version to <1.7.0 due to breaking changes (#674)

    • Include unique columns in mutual information calculations (#687)

    • Add parameter to include index column in mutual information calculations (#692)

  • Documentation Changes
    • Update to remove warning message from statistical insights guide (#690)

  • Testing Changes
    • Update branch reference in tests to run on main (#641)

    • Make release notes updated check separate from unit tests (#642)

    • Update release branch naming instructions (#644)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.10 Feb 25, 2021

  • Changes
    • Avoid calculating mutual information for non-unique columns (#563)

    • Preserve underlying DataFrame index if index column is not specified (#588)

    • Add blank issue template for creating issues (#630)

  • Testing Changes
    • Update branch reference in tests workflow (#552, #601)

    • Fixed text on back arrow on install page (#564)

    • Refactor test_datatable.py (#574)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey

v0.0.9 Feb 5, 2021

  • Enhancements
    • Add Python 3.9 support without Koalas testing (#511)

    • Add get_valid_mi_types function to list LogicalTypes valid for mutual information calculation (#517)

  • Fixes
    • Handle missing values in Datetime columns when calculating mutual information (#516)

    • Support numpy 1.20.0 by restricting version for koalas and changing serialization error message (#532)

    • Move Koalas option setting to DataTable init instead of import (#543)

  • Documentation Changes
    • Add Alteryx OSS Twitter link (#519)

    • Update logo and add new favicon (#521)

    • Multiple improvements to Getting Started page and guides (#527)

    • Clean up API Reference and docstrings (#536)

    • Added Open Graph for Twitter and Facebook (#544)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.8 Jan 25, 2021

  • Enhancements
    • Add DataTable.df property for accessing the underlying DataFrame (#470)

    • Set index of underlying DataFrame to match DataTable index (#464)

  • Fixes
    • Sort underlying series when sorting dataframe (#468)

    • Allow setting indices to current index without side effects (#474)

  • Changes
    • Fix release document with GitHub Actions link for CI (#462)

    • Don’t allow registered LogicalTypes with the same name (#477)

    • Move str_to_logical_type to TypeSystem class (#482)

    • Remove pyarrow from core dependencies (#508)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.7 Dec 14, 2020

  • Enhancements
    • Allow for user-defined logical types and inference functions in TypeSystem object (#424)

    • Add __repr__ to DataTable (#425)

    • Allow initializing DataColumn with numpy array (#430)

    • Add drop to DataTable (#434)

    • Migrate CI tests to GitHub Actions (#417, #441, #451)

    • Add metadata to DataColumn for user-defined metadata (#447)

  • Fixes
    • Update DataColumn name when using setitem on column with no name (#426)

    • Don’t allow pickle serialization for Koalas DataFrames (#432)

    • Check DataTable metadata in equality check (#449)

    • Propagate all attributes of DataTable in _new_dt_including (#454)

  • Changes
    • Update links to use alteryx org GitHub URL (#423)

    • Support column names of any type allowed by the underlying DataFrame (#442)

    • Use object dtype for LatLong columns for easy access to latitude and longitude values (#414)

    • Restrict dask version to prevent 2020.12.0 release from being installed (#453)

    • Lower minimum requirement for numpy to 1.15.4 and set pandas minimum requirement to 1.1.1 (#459)

  • Testing Changes
    • Fix missing test coverage (#436)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

v0.0.6 Nov 30, 2020

  • Enhancements
    • Add support for creating DataTable from Koalas DataFrame (#327)

    • Add ability to initialize DataTable with numpy array (#367)

    • Add describe_dict method to DataTable (#405)

    • Add mutual_information_dict method to DataTable (#404)

    • Add metadata to DataTable for user-defined metadata (#392)

    • Add update_dataframe method to DataTable to update underlying DataFrame (#407)

    • Sort dataframe if time_index is specified; bypass sorting with already_sorted parameter (#410)

    • Add description attribute to DataColumn (#416)

    • Implement DataColumn.__len__ and DataTable.__len__ (#415)

  • Fixes
    • Rename data_column.py to datacolumn.py (#386)

    • Rename data_table.py to datatable.py (#387)

    • Rename get_mutual_information to mutual_information (#390)

  • Changes
    • Lower moto test requirement for serialization/deserialization (#376)

    • Make Koalas an optional dependency installable with woodwork[koalas] (#378)

    • Remove WholeNumber LogicalType from Woodwork (#380)

    • Updates to LogicalTypes to support Koalas 1.4.0 (#393)

    • Replace set_logical_types and set_semantic_tags with just set_types (#379)

    • Remove copy_dataframe parameter from DataTable initialization (#398)

    • Implement DataTable.__sizeof__ to return size of the underlying dataframe (#401)

    • Include Datetime columns in mutual info calculation (#399)

    • Maintain column order on DataTable operations (#406)

  • Testing Changes
    • Add pyarrow, dask, and koalas to automated dependency checks (#388)

    • Use new version of pull request GitHub Action (#394)

    • Improve parameterization for test_datatable_equality (#409)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.

  • The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.

  • WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.

  • The DataTable.get_mutual_information was renamed to DataTable.mutual_information.

  • The copy_dataframe parameter was removed from DataTable initialization.
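
For code written against v0.0.5, the renames and removals above can be captured in a small lookup table. This is only a sketch: the dictionary names and the helper suggest_replacement are hypothetical, chosen for illustration, and are not part of Woodwork. The entries are taken directly from the list above.

```python
# Hypothetical migration aid, not part of Woodwork: maps names removed or
# renamed in v0.0.6 to the replacement given in the release notes.
V0_0_6_REPLACEMENTS = {
    "DataTable.set_semantic_tags": "DataTable.set_types",
    "DataTable.set_logical_types": "DataTable.set_types",
    "DataTable.get_mutual_information": "DataTable.mutual_information",
    "WholeNumber": "Integer",
}

# Names removed in v0.0.6 with no direct replacement listed.
V0_0_6_REMOVED = {"copy_dataframe"}


def suggest_replacement(old_name):
    """Return a short migration hint for a name used with v0.0.5."""
    if old_name in V0_0_6_REPLACEMENTS:
        return f"use {V0_0_6_REPLACEMENTS[old_name]} instead"
    if old_name in V0_0_6_REMOVED:
        return "removed; drop this argument"
    return "not listed as a breaking change in v0.0.6"


print(suggest_replacement("DataTable.set_logical_types"))
print(suggest_replacement("copy_dataframe"))
```

A table like this only records what the notes state. It does not check how the replacement should be called, so each flagged use still needs a manual review.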

v0.0.5 Nov 11, 2020

  • Enhancements
    • Add __eq__ to DataTable and DataColumn and update LogicalType equality (#318)

    • Add value_counts() method to DataTable (#342)

    • Support serialization and deserialization of DataTables via csv, pickle, or parquet (#293)

    • Add shape property to DataTable and DataColumn (#358)

    • Add iloc method to DataTable and DataColumn (#365)

    • Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (#363)

    • Add rename method to DataTable (#367)

  • Fixes
    • Catch non numeric time index at validation (#332)

  • Changes
    • Support logical type inference from a Dask DataFrame (#248)

    • Fix validation checks and make_index to work with Dask DataFrames (#260)

    • Skip validation of Ordinal order values for Dask DataFrames (#270)

    • Improve support for datetimes with Dask input (#286)

    • Update DataTable.describe to work with Dask input (#296)

    • Update DataTable.get_mutual_information to work with Dask input (#300)

    • Modify to_pandas function to return DataFrame with correct index (#281)

    • Rename DataColumn.to_pandas method to DataColumn.to_series (#311)

    • Rename DataTable.to_pandas method to DataTable.to_dataframe (#319)

    • Remove UserWarning when no matching columns found (#325)

    • Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (#338)

    • Allow pandas ExtensionArrays as inputs to DataColumn (#343)

    • Move warnings to a separate exceptions file and call via UserWarning subclasses (#348)

    • Make Dask an optional dependency installable with woodwork[dask] (#357)

  • Documentation Changes
    • Create a guide for using Woodwork with Dask (#304)

    • Add conda install instructions (#305, #309)

    • Fix README.md badge with correct link (#314)

    • Simplify issue templates to make them easier to use (#339)

    • Remove extra output cell in Start notebook (#341)

  • Testing Changes
    • Parameterize numeric time index tests (#288)

    • Add DockerHub credentials to CI testing environment (#326)

    • Fix removing files for serialization test (#350)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes

  • The DataColumn.to_pandas method was renamed to DataColumn.to_series.

  • The DataTable.to_pandas method was renamed to DataTable.to_dataframe.

  • copy is no longer a parameter of DataTable.to_dataframe or DataColumn.to_series.
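
One way to locate call sites affected by the to_pandas rename is a plain text scan of the source. The sketch below assumes the calls are written as .to_pandas(...); the helper find_to_pandas_calls is a hypothetical name and is not part of Woodwork. A text scan cannot tell a DataTable from a DataColumn, so it only flags lines for manual review.

```python
import re

# Hypothetical helper, not part of Woodwork: flags lines of source text that
# still call to_pandas, which v0.0.5 renamed to to_series / to_dataframe.
TO_PANDAS_CALL = re.compile(r"\.to_pandas\s*\(")


def find_to_pandas_calls(source):
    """Return 1-based line numbers in `source` that call .to_pandas(...)."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if TO_PANDAS_CALL.search(line)
    ]


example = """\
df = dt.to_pandas()
series = dt['age'].to_pandas(copy=True)
print(df.shape)
"""
print(find_to_pandas_calls(example))  # prints [1, 2]
```

Each flagged line then needs the method renamed to to_dataframe or to_series as appropriate, and any copy argument removed.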

v0.0.4 Oct 21, 2020

  • Enhancements
    • Add optional include parameter for DataTable.describe() to filter results (#228)

    • Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238)

    • Add support for setting ranking order on columns with Ordinal logical type (#240)

    • Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244)

    • Add support for numeric time index on DataTable (#267)

    • Add pop method to DataTable (#289)

    • Add entry point to setup.py to run CLI commands (#285)

  • Fixes
    • Allow numeric datetime time indices (#282)

  • Changes
    • Remove redundant methods DataTable.select_ltypes and DataTable.select_semantic_tags (#239)

    • Make results of get_mutual_information clearer by sorting and removing self calculation (#247)

    • Lower minimum scikit-learn version to 0.21.3 (#297)

  • Documentation Changes
    • Add guide for dt.describe and dt.get_mutual_information (#245)

    • Update README.md with documentation link (#261)

    • Add footer to doc pages with Alteryx Open Source (#258)

    • Add one-sentence definitions of types and tags to Understanding Types and Tags guide (#271)

    • Add issue and pull request templates (#280, #284)

  • Testing Changes
    • Add automated process to check latest dependencies (#268)

    • Add test for setting a time index with specified string logical type (#279)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

v0.0.3 Oct 9, 2020

  • Enhancements
    • Implement setitem on DataTable to create a new DataColumn or overwrite an existing one (#165)

    • Add to_pandas method to DataColumn to access the underlying series (#169)

    • Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172)

    • Add describe method to DataTable to generate statistics for the underlying data (#181)

    • Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189)

    • Add get_mutual_information method to DataTable to generate mutual information between columns (#203)

    • Add read_csv function to create DataTable directly from CSV file (#222)

  • Fixes
    • Fix bug causing incorrect values for quartiles in DataTable.describe method (#187)

    • Fix bug in DataTable.describe that could cause an error if certain semantic tags were applied improperly (#190)

    • Fix bug with instantiated LogicalTypes breaking when used with issubclass (#231)

  • Changes
    • Remove unnecessary add_standard_tags attribute from DataTable (#171)

    • Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196)

    • Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205)

    • Update various DataTable methods to return new objects rather than modifying in place (#210)

    • Move datetime_format to Datetime LogicalType (#216)

    • Do not calculate mutual info with index column in DataTable.get_mutual_information (#221)

    • Move setting of underlying physical types from DataTable to DataColumn (#233)

  • Documentation Changes
    • Remove unused code from sphinx conf.py, update with GitHub URL (#160, #163)

    • Update README and docs with new Woodwork logo, with better code snippets (#161, #159)

    • Add DataTable and DataColumn to API Reference (#162)

    • Add docstrings to LogicalType classes (#168)

    • Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173)

    • Update contributing.md, release.md with all instructions (#176)

    • Add section for setting index and time index to start notebook (#179)

    • Rename changelog to Release Notes (#193)

    • Add section for standard tags to start notebook (#188)

    • Add Understanding Types and Tags user guide (#201)

    • Add missing docstring to list_logical_types (#202)

    • Add Woodwork Global Configuration Options guide (#215)

  • Testing Changes
    • Add tests that confirm dtypes are as expected after DataTable init (#152)

    • Remove unused none_df test fixture (#224)

    • Add test for LogicalType.__str__ method (#225)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.2 Sep 28, 2020

  • Fixes
    • Fix formatting issue when printing global config variables (#138)

  • Changes
    • Change add_standard_tags to use_standard_tags to better describe behavior (#149)

    • Change access of underlying dataframe to be through to_pandas, with the dataframe stored in the ._dataframe field on the class (#146)

    • Remove replace_none parameter from DataTable (#146)

  • Documentation Changes
    • Add working code example to README and create Using Woodwork page (#103)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.1.0 Sep 24, 2020

  • Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135)

  • Add global config options and add datetime_format option for type inference (#134)

  • Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133)

  • Add DataTable.ltypes property to return series of logical types (#131)

  • Add ability to create new datatable from specified columns with dt[[columns]] (#127)

  • Handle setting and tagging of index and time index columns (#125)

  • Add combined tag and ltype selection (#124)

  • Add changelog, and update changelog check to CI (#123)

  • Implement reset_semantic_tags (#118)

  • Implement DataTable getitem (#119)

  • Add remove_semantic_tags method (#117)

  • Add semantic tag selection (#106)

  • Add GitHub action, rename to woodwork (#113)

  • Add license to setup.py (#112)

  • Reset semantic tags on logical type change (#107)

  • Add standard numeric and category tags (#100)

  • Change semantic_types to semantic_tags, a set of strings (#100)

  • Update dataframe dtypes based on logical types (#94)

  • Add select_logical_types to DataTable (#96)

  • Add pygments to dev-requirements.txt (#97)

  • Replace None with np.nan in DataTable init (#87)

  • Refactor DataColumn to make semantic_types and logical_type private (#86)

  • Add pandas_dtype to each Logical Type, and remove dtype attribute on DataColumn (#85)

  • Add set_semantic_types methods on both DataTable and DataColumn (#75)

  • Support passing camel case or snake case strings for setting logical types (#74)

  • Improve flexibility when setting semantic types (#72)

  • Add Whole Number Inference of Logical Types (#66)

  • Add dtypes property to DataTables and repr for DataColumn (#61)

  • Allow specification of semantic types during DataTable creation (#69)

  • Implement set_logical_types on DataTable (#65)

  • Add init files to tests to fix code coverage (#60)

  • Add AutoAssign bot (#59)

  • Add logical types validation in DataTables (#49)

  • Fix working_directory in CI (#57)

  • Add infer_logical_types for DataColumn (#45)

  • Fix README library name and code coverage badge (#56, #56)

  • Add code coverage (#51)

  • Improve and refactor the validation checks during initialization of a DataTable (#40)

  • Add dataframe attribute to DataTable (#39)

  • Update README with minor usage details (#37)

  • Add License (#34)

  • Rename from datatables to datatables (#4)

  • Add Logical Types, DataTable, DataColumn (#3)

  • Add Makefile, setup.py, requirements.txt (#2)

  • Initial Release (#1)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd