Release Notes

v0.1.0 March 22, 2021
  • Enhancements
    • Implement Schema and Accessor API (#497)

    • Add Schema class that holds typing info (#499)

    • Add WoodworkTableAccessor class that performs type inference and stores Schema (#514)

    • Allow initializing Accessor schema with a valid Schema object (#522)

    • Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (#534)

    • Add ability to call pandas methods from Accessor (#538, #589)

    • Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (#553)

    • Add ability to load demo retail dataset with a Woodwork Accessor (#556)

    • Add select to WoodworkTableAccessor (#548)

    • Add mutual_information to WoodworkTableAccessor (#571)

    • Add WoodworkColumnAccessor class (#562)

    • Add semantic tag update methods to column accessor (#573)

    • Add describe and describe_dict to WoodworkTableAccessor (#579)

    • Add init_series util function for initializing a series with dtype change (#581)

    • Add set_logical_type method to WoodworkColumnAccessor (#590)

    • Add semantic tag update methods to table schema (#591)

    • Add warning if additional parameters are passed along with schema (#593)

    • Improve warning when accessing column properties before init (#596)

    • Update column accessor to work with LatLong columns (#598)

    • Add set_index to WoodworkTableAccessor (#603)

    • Implement loc and iloc for WoodworkColumnAccessor (#613)

    • Add set_time_index to WoodworkTableAccessor (#612)

    • Implement loc and iloc for WoodworkTableAccessor (#618)

    • Allow updating logical types with set_types and make relevant DataFrame changes (#619)

    • Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (#624)

    • Add DaskColumnAccessor (#625)

    • Allow deserialization from csv, pickle, and parquet to Woodwork table (#626)

    • Add value_counts to WoodworkTableAccessor (#632)

    • Add KoalasColumnAccessor (#634)

    • Add pop to WoodworkTableAccessor (#636)

    • Add drop to WoodworkTableAccessor (#640)

    • Add rename to WoodworkTableAccessor (#646)

    • Add DaskTableAccessor (#648)

    • Add Schema properties to WoodworkTableAccessor (#651)

    • Add KoalasTableAccessor (#652)

    • Add __getitem__ to WoodworkTableAccessor (#633)

    • Update Koalas min version and add support for more new pandas dtypes with Koalas (#678)

    • Add __setitem__ to WoodworkTableAccessor (#669)

  • Fixes
    • Create new Schema object when performing pandas operation on Accessors (#595)

    • Fix bug in _reset_semantic_tags causing columns to share same semantic tags set (#666)

    • Maintain column order in DataFrame and Woodwork repr (#677)

  • Changes
    • Move mutual information logic to statistics utils file (#584)

    • Bump min Koalas version to 1.4.0 (#638)

    • Preserve pandas underlying index when not creating a Woodwork index (#664)

    • Restrict Koalas version to <1.7.0 due to breaking changes (#674)

    • Clean up dtype usage across Woodwork (#682)

    • Improve error when calling accessor properties or methods before init (#683)

    • Remove dtype from Schema dictionary (#685)

    • Add include_index param and allow unique columns in Accessor mutual information (#699)

    • Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (#700)

    • Remove DataTable and DataColumn classes to migrate towards the accessor approach (#713)

    • Change sample_series dtype to not need conversion and remove convert_series util (#720)

    • Rename Accessor methods since DataTable has been removed (#723)

  • Documentation Changes
    • Update README.md and Get Started guide to use accessor (#655, #717)

    • Update Understanding Types and Tags guide to use accessor (#657)

    • Update docstrings and API Reference page (#660)

    • Update statistical insights guide to use accessor (#693)

    • Update Customizing Type Inference guide to use accessor (#696)

    • Update Dask and Koalas guide to use accessor (#701)

    • Update index notebook and install guide to use accessor (#715)

    • Add section to documentation about schema validity (#729)

    • Update README.md and Get Started guide to use pd.read_csv (#730)

    • Make small fixes to documentation formatting (#731)

  • Testing Changes
    • Add tests to Accessor/Schema that weren’t previously covered (#712, #716)

    • Update release branch name in notes update check (#719)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd

Breaking Changes
  • The DataTable and DataColumn classes have been removed and replaced by the new WoodworkTableAccessor and WoodworkColumnAccessor classes, which are accessed through the ww namespace available on DataFrames after importing Woodwork.
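Under the accessor approach, typing information lives in a namespace attached to ordinary DataFrames instead of in wrapper classes. The sketch below shows how such a namespace can be attached using pandas' public pd.api.extensions.register_dataframe_accessor hook. The demo_ww name, the DemoTableAccessor class, and its dict-based "schema" are hypothetical illustrations, not Woodwork's implementation; the real ww accessor stores a full Schema object.

```python
import pandas as pd


# Hypothetical accessor registered under "demo_ww" so it cannot collide
# with the real "ww" namespace that importing Woodwork provides.
@pd.api.extensions.register_dataframe_accessor("demo_ww")
class DemoTableAccessor:
    def __init__(self, pandas_obj):
        # pandas constructs the accessor with the DataFrame it is attached to
        # and caches it on that DataFrame, so state set here persists.
        self._df = pandas_obj
        self._schema = None  # None means "not initialized yet"

    def init(self, index=None):
        # Store minimal typing info: an index name plus each column's dtype.
        if index is not None and index not in self._df.columns:
            raise LookupError(f"Specified index column `{index}` not found")
        self._schema = {
            "index": index,
            "dtypes": {col: str(dtype) for col, dtype in self._df.dtypes.items()},
        }

    @property
    def schema(self):
        # Accessing the schema before init() is an error, mirroring the
        # "call init first" behavior described in these notes.
        if self._schema is None:
            raise AttributeError("Schema not initialized; call init() first")
        return self._schema


df = pd.DataFrame({"id": [0, 1, 2], "amount": [1.5, 2.0, 3.25]})
df.demo_ww.init(index="id")
print(df.demo_ww.schema["index"])  # id
```

Because pandas caches the accessor instance on each DataFrame, the schema set by init() is still there on later df.demo_ww lookups, which is what makes a stateful namespace like this workable.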

v0.0.11 March 15, 2021
  • Changes
    • Restrict Koalas version to <1.7.0 due to breaking changes (#674)

    • Include unique columns in mutual information calculations (#687)

    • Add parameter to include index column in mutual information calculations (#692)

  • Documentation Changes
    • Update to remove warning message from statistical insights guide (#690)

  • Testing Changes
    • Update branch reference in tests to run on main (#641)

    • Make release notes updated check separate from unit tests (#642)

    • Update release branch naming instructions (#644)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.10 February 25, 2021
  • Changes
    • Avoid calculating mutual information for non-unique columns (#563)

    • Preserve underlying DataFrame index if index column is not specified (#588)

    • Add blank issue template for creating issues (#630)

  • Testing Changes
    • Update branch reference in tests workflow (#552, #601)

    • Fix text on back arrow on install page (#564)

    • Refactor test_datatable.py (#574)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey

v0.0.9 February 5, 2021
  • Enhancements
    • Add Python 3.9 support without Koalas testing (#511)

    • Add get_valid_mi_types function to list LogicalTypes valid for mutual information calculation (#517)

  • Fixes
    • Handle missing values in Datetime columns when calculating mutual information (#516)

    • Support numpy 1.20.0 by restricting the Koalas version and changing the serialization error message (#532)

    • Move Koalas option setting to DataTable init instead of import (#543)

  • Documentation Changes
    • Add Alteryx OSS Twitter link (#519)

    • Update logo and add new favicon (#521)

    • Multiple improvements to Getting Started page and guides (#527)

    • Clean up API Reference and docstrings (#536)

    • Add Open Graph for Twitter and Facebook (#544)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.8 January 25, 2021
  • Enhancements
    • Add DataTable.df property for accessing the underlying DataFrame (#470)

    • Set index of underlying DataFrame to match DataTable index (#464)

  • Fixes
    • Sort underlying series when sorting dataframe (#468)

    • Allow setting indices to current index without side effects (#474)

  • Changes
    • Fix release document with Github Actions link for CI (#462)

    • Don’t allow registered LogicalTypes with the same name (#477)

    • Move str_to_logical_type to TypeSystem class (#482)

    • Remove pyarrow from core dependencies (#508)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.7 December 14, 2020
  • Enhancements
    • Allow for user-defined logical types and inference functions in TypeSystem object (#424)

    • Add __repr__ to DataTable (#425)

    • Allow initializing DataColumn with numpy array (#430)

    • Add drop to DataTable (#434)

    • Migrate CI tests to Github Actions (#417, #441, #451)

    • Add metadata to DataColumn for user-defined metadata (#447)

  • Fixes
    • Update DataColumn name when using setitem on column with no name (#426)

    • Don’t allow pickle serialization for Koalas DataFrames (#432)

    • Check DataTable metadata in equality check (#449)

    • Propagate all attributes of DataTable in _new_dt_including (#454)

  • Changes
    • Update links to use alteryx org Github URL (#423)

    • Support column names of any type allowed by the underlying DataFrame (#442)

    • Use object dtype for LatLong columns for easy access to latitude and longitude values (#414)

    • Restrict dask version to prevent 2020.12.0 release from being installed (#453)

    • Lower minimum requirement for numpy to 1.15.4, and set pandas minimum requirement to 1.1.1 (#459)

  • Testing Changes
    • Fix missing test coverage (#436)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

v0.0.6 November 30, 2020
  • Enhancements
    • Add support for creating DataTable from Koalas DataFrame (#327)

    • Add ability to initialize DataTable with numpy array (#367)

    • Add describe_dict method to DataTable (#405)

    • Add mutual_information_dict method to DataTable (#404)

    • Add metadata to DataTable for user-defined metadata (#392)

    • Add update_dataframe method to DataTable to update underlying DataFrame (#407)

    • Sort DataFrame if time_index is specified; bypass sorting with the already_sorted parameter (#410)

    • Add description attribute to DataColumn (#416)

    • Implement DataColumn.__len__ and DataTable.__len__ (#415)

  • Fixes
    • Rename data_column.py to datacolumn.py (#386)

    • Rename data_table.py to datatable.py (#387)

    • Rename get_mutual_information to mutual_information (#390)

  • Changes
    • Lower moto test requirement for serialization/deserialization (#376)

    • Make Koalas an optional dependency installable with woodwork[koalas] (#378)

    • Remove WholeNumber LogicalType from Woodwork (#380)

    • Updates to LogicalTypes to support Koalas 1.4.0 (#393)

    • Replace set_logical_types and set_semantic_tags with just set_types (#379)

    • Remove copy_dataframe parameter from DataTable initialization (#398)

    • Implement DataTable.__sizeof__ to return size of the underlying dataframe (#401)

    • Include Datetime columns in mutual info calculation (#399)

    • Maintain column order on DataTable operations (#406)

  • Testing Changes
    • Add pyarrow, dask, and koalas to automated dependency checks (#388)

    • Use new version of pull request Github Action (#394)

    • Improve parameterization for test_datatable_equality (#409)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes
  • The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.

  • The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.

  • WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.

  • The DataTable.get_mutual_information method was renamed to DataTable.mutual_information.

  • The copy_dataframe parameter was removed from DataTable initialization.
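The v0.0.6 enhancement that sorts the DataFrame when a time_index is specified, with already_sorted to skip that step, can be illustrated with plain pandas. In this sketch, sort_by_time_index is a hypothetical helper rather than Woodwork's API, and using a stable sort ("mergesort") so rows with equal timestamps keep their input order is an assumption.

```python
import pandas as pd


def sort_by_time_index(df, time_index=None, already_sorted=False):
    """Return df ordered by its time index column (hypothetical helper).

    Sorting is skipped when no time index is given or when the caller
    asserts the data is already in time order via already_sorted=True.
    """
    if time_index is None or already_sorted:
        return df
    if time_index not in df.columns:
        raise LookupError(f"Time index column `{time_index}` not found")
    # A stable sort keeps rows with identical timestamps in their input order.
    return df.sort_values(by=time_index, kind="mergesort")


events = pd.DataFrame({
    "id": [2, 0, 1],
    "time": pd.to_datetime(["2020-03-01", "2020-01-01", "2020-02-01"]),
})
print(sort_by_time_index(events, time_index="time")["id"].tolist())  # [0, 1, 2]
print(sort_by_time_index(events, time_index="time", already_sorted=True)["id"].tolist())  # [2, 0, 1]
```

Skipping the sort is only safe when the caller really does pass time-ordered data; already_sorted trades that guarantee for avoiding a potentially expensive sort on large inputs.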

v0.0.5 November 11, 2020
  • Enhancements
    • Add __eq__ to DataTable and DataColumn and update LogicalType equality (#318)

    • Add value_counts() method to DataTable (#342)

    • Support serialization and deserialization of DataTables via csv, pickle, or parquet (#293)

    • Add shape property to DataTable and DataColumn (#358)

    • Add iloc method to DataTable and DataColumn (#365)

    • Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (#363)

    • Add rename method to DataTable (#367)

  • Fixes
    • Catch non-numeric time index at validation (#332)

  • Changes
    • Support logical type inference from a Dask DataFrame (#248)

    • Fix validation checks and make_index to work with Dask DataFrames (#260)

    • Skip validation of Ordinal order values for Dask DataFrames (#270)

    • Improve support for datetimes with Dask input (#286)

    • Update DataTable.describe to work with Dask input (#296)

    • Update DataTable.get_mutual_information to work with Dask input (#300)

    • Modify to_pandas function to return DataFrame with correct index (#281)

    • Rename DataColumn.to_pandas method to DataColumn.to_series (#311)

    • Rename DataTable.to_pandas method to DataTable.to_dataframe (#319)

    • Remove UserWarning when no matching columns found (#325)

    • Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (#338)

    • Allow pandas ExtensionArrays as inputs to DataColumn (#343)

    • Move warnings to a separate exceptions file and call via UserWarning subclasses (#348)

    • Make Dask an optional dependency installable with woodwork[dask] (#357)

  • Documentation Changes
    • Create a guide for using Woodwork with Dask (#304)

    • Add conda install instructions (#305, #309)

    • Fix README.md badge with correct link (#314)

    • Simplify issue templates to make them easier to use (#339)

    • Remove extra output cell in Start notebook (#341)

  • Testing Changes
    • Parameterize numeric time index tests (#288)

    • Add DockerHub credentials to CI testing environment (#326)

    • Fix removing files for serialization test (#350)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes
  • The DataColumn.to_pandas method was renamed to DataColumn.to_series.

  • The DataTable.to_pandas method was renamed to DataTable.to_dataframe.

  • copy is no longer a parameter of DataTable.to_dataframe or DataColumn.to_series.

v0.0.4 October 21, 2020
  • Enhancements
    • Add optional include parameter for DataTable.describe() to filter results (#228)

    • Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238)

    • Add support for setting ranking order on columns with Ordinal logical type (#240)

    • Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244)

    • Add support for numeric time index on DataTable (#267)

    • Add pop method to DataTable (#289)

    • Add entry point to setup.py to run CLI commands (#285)

  • Fixes
    • Allow numeric datetime time indices (#282)

  • Changes
    • Remove redundant methods DataTable.select_ltypes and DataTable.select_semantic_tags (#239)

    • Make results of get_mutual_information more clear by sorting and removing self calculation (#247)

    • Lower minimum scikit-learn version to 0.21.3 (#297)

  • Documentation Changes
    • Add guide for dt.describe and dt.get_mutual_information (#245)

    • Update README.md with documentation link (#261)

    • Add footer to doc pages with Alteryx Open Source (#258)

    • Add types and tags one-sentence definitions to Understanding Types and Tags guide (#271)

    • Add issue and pull request templates (#280, #284)

  • Testing Changes
    • Add automated process to check latest dependencies (#268)

    • Add test for setting a time index with specified string logical type (#279)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

v0.0.3 October 9, 2020
  • Enhancements
    • Implement setitem on DataTable to create a new DataColumn or overwrite an existing one (#165)

    • Add to_pandas method to DataColumn to access the underlying series (#169)

    • Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172)

    • Add describe method to DataTable to generate statistics for the underlying data (#181)

    • Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189)

    • Add get_mutual_information method to DataTable to generate mutual information between columns (#203)

    • Add read_csv function to create DataTable directly from CSV file (#222)

  • Fixes
    • Fix bug causing incorrect values for quartiles in DataTable.describe method (#187)

    • Fix bug in DataTable.describe that could cause an error if certain semantic tags were applied improperly (#190)

    • Fix bug with instantiated LogicalTypes breaking when used with issubclass (#231)

  • Changes
    • Remove unnecessary add_standard_tags attribute from DataTable (#171)

    • Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196)

    • Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205)

    • Update various DataTable methods to return new objects rather than modifying in place (#210)

    • Move datetime_format to Datetime LogicalType (#216)

    • Do not calculate mutual info with index column in DataTable.get_mutual_information (#221)

    • Move setting of underlying physical types from DataTable to DataColumn (#233)

  • Documentation Changes
    • Remove unused code from sphinx conf.py, update with Github URL (#160, #163)

    • Update README and docs with new Woodwork logo, with better code snippets (#161, #159)

    • Add DataTable and DataColumn to API Reference (#162)

    • Add docstrings to LogicalType classes (#168)

    • Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173)

    • Update contributing.md and release.md with all instructions (#176)

    • Add section for setting index and time index to start notebook (#179)

    • Rename changelog to Release Notes (#193)

    • Add section for standard tags to start notebook (#188)

    • Add Understanding Types and Tags user guide (#201)

    • Add missing docstring to list_logical_types (#202)

    • Add Woodwork Global Configuration Options guide (#215)

  • Testing Changes
    • Add tests that confirm dtypes are as expected after DataTable init (#152)

    • Remove unused none_df test fixture (#224)

    • Add test for LogicalType.__str__ method (#225)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.2 September 28, 2020
  • Fixes
    • Fix formatting issue when printing global config variables (#138)

  • Changes
    • Change add_standard_tags to use_standard_tags to better describe behavior (#149)

    • Change access of the underlying DataFrame to go through to_pandas, with the DataFrame stored in the ._dataframe field on the class (#146)

    • Remove replace_none parameter from DataTables (#146)

  • Documentation Changes
    • Add working code example to README and create Using Woodwork page (#103)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.1 September 24, 2020
  • Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135)

  • Add global config options and add datetime_format option for type inference (#134)

  • Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133)

  • Add DataTable.ltypes property to return series of logical types (#131)

  • Add ability to create new DataTable from specified columns with dt[[columns]] (#127)

  • Handle setting and tagging of index and time index columns (#125)

  • Add combined tag and ltype selection (#124)

  • Add changelog, and update changelog check to CI (#123)

  • Implement reset_semantic_tags (#118)

  • Implement DataTable getitem (#119)

  • Add remove_semantic_tags method (#117)

  • Add semantic tag selection (#106)

  • Add Github Action, rename to woodwork (#113)

  • Add license to setup.py (#112)

  • Reset semantic tags on logical type change (#107)

  • Add standard numeric and category tags (#100)

  • Change semantic_types to semantic_tags, a set of strings (#100)

  • Update dataframe dtypes based on logical types (#94)

  • Add select_logical_types to DataTable (#96)

  • Add pygments to dev-requirements.txt (#97)

  • Add replacing None with np.nan in DataTable init (#87)

  • Refactor DataColumn to make semantic_types and logical_type private (#86)

  • Add pandas_dtype to each LogicalType, and remove dtype attribute on DataColumn (#85)

  • Add set_semantic_types methods on both DataTable and DataColumn (#75)

  • Support passing camel case or snake case strings for setting logical types (#74)

  • Improve flexibility when setting semantic types (#72)

  • Add Whole Number Inference of Logical Types (#66)

  • Add dtypes property to DataTables and repr for DataColumn (#61)

  • Allow specification of semantic types during DataTable creation (#69)

  • Implement set_logical_types on DataTable (#65)

  • Add init files to tests to fix code coverage (#60)

  • Add AutoAssign bot (#59)

  • Add logical types validation in DataTables (#49)

  • Fix working_directory in CI (#57)

  • Add infer_logical_types for DataColumn (#45)

  • Fix README library name and code coverage badge (#56)

  • Add code coverage (#51)

  • Improve and refactor the validation checks during initialization of a DataTable (#40)

  • Add dataframe attribute to DataTable (#39)

  • Update README with minor usage details (#37)

  • Add License (#34)

  • Rename from datatables to datatables (#4)

  • Add Logical Types, DataTable, DataColumn (#3)

  • Add Makefile, setup.py, requirements.txt (#2)

  • Initial Release (#1)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd