Release Notes

v0.0.9 February 5, 2021
  • Enhancements
    • Add Python 3.9 support without Koalas testing (#511)

    • Add get_valid_mi_types function to list LogicalTypes valid for mutual information calculation (#517)

  • Fixes
    • Handle missing values in Datetime columns when calculating mutual information (#516)

    • Support numpy 1.20.0 by restricting version for koalas and changing serialization error message (#532)

    • Move Koalas option setting to DataTable init instead of import (#543)

  • Documentation Changes
    • Add Alteryx OSS Twitter link (#519)

    • Update logo and add new favicon (#521)

    • Multiple improvements to Getting Started page and guides (#527)

    • Clean up API Reference and docstrings (#536)

    • Add Open Graph for Twitter and Facebook (#544)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.8 January 25, 2021
  • Enhancements
    • Add DataTable.df property for accessing the underlying DataFrame (#470)

    • Set index of underlying DataFrame to match DataTable index (#464)

  • Fixes
    • Sort underlying series when sorting dataframe (#468)

    • Allow setting indices to current index without side effects (#474)

  • Changes
    • Fix release document with Github Actions link for CI (#462)

    • Don’t allow registered LogicalTypes with the same name (#477)

    • Move str_to_logical_type to TypeSystem class (#482)

    • Remove pyarrow from core dependencies (#508)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.7 December 14, 2020
  • Enhancements
    • Allow for user-defined logical types and inference functions in TypeSystem object (#424)

    • Add __repr__ to DataTable (#425)

    • Allow initializing DataColumn with numpy array (#430)

    • Add drop to DataTable (#434)

    • Migrate CI tests to Github Actions (#417, #441, #451)

    • Add metadata to DataColumn for user-defined metadata (#447)

  • Fixes
    • Update DataColumn name when using setitem on column with no name (#426)

    • Don’t allow pickle serialization for Koalas DataFrames (#432)

    • Check DataTable metadata in equality check (#449)

    • Propagate all attributes of DataTable in _new_dt_including (#454)

  • Changes
    • Update links to use alteryx org Github URL (#423)

    • Support column names of any type allowed by the underlying DataFrame (#442)

    • Use object dtype for LatLong columns for easy access to latitude and longitude values (#414)

    • Restrict dask version to prevent 2020.12.0 release from being installed (#453)

    • Lower the minimum numpy requirement to 1.15.4 and set the minimum pandas requirement to 1.1.1 (#459)

  • Testing Changes
    • Fix missing test coverage (#436)

Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd

v0.0.6 November 30, 2020
  • Enhancements
    • Add support for creating DataTable from Koalas DataFrame (#327)

    • Add ability to initialize DataTable with numpy array (#367)

    • Add describe_dict method to DataTable (#405)

    • Add mutual_information_dict method to DataTable (#404)

    • Add metadata to DataTable for user-defined metadata (#392)

    • Add update_dataframe method to DataTable to update underlying DataFrame (#407)

    • Sort DataFrame if time_index is specified; bypass sorting with the already_sorted parameter (#410)

    • Add description attribute to DataColumn (#416)

    • Implement DataColumn.__len__ and DataTable.__len__ (#415)

  • Fixes
    • Rename data_column.py to datacolumn.py (#386)

    • Rename data_table.py to datatable.py (#387)

    • Rename get_mutual_information to mutual_information (#390)

  • Changes
    • Lower moto test requirement for serialization/deserialization (#376)

    • Make Koalas an optional dependency installable with woodwork[koalas] (#378)

    • Remove WholeNumber LogicalType from Woodwork (#380)

    • Updates to LogicalTypes to support Koalas 1.4.0 (#393)

    • Replace set_logical_types and set_semantic_tags with just set_types (#379)

    • Remove copy_dataframe parameter from DataTable initialization (#398)

    • Implement DataTable.__sizeof__ to return size of the underlying dataframe (#401)

    • Include Datetime columns in mutual info calculation (#399)

    • Maintain column order on DataTable operations (#406)

  • Testing Changes
    • Add pyarrow, dask, and koalas to automated dependency checks (#388)

    • Use new version of pull request Github Action (#394)

    • Improve parameterization for test_datatable_equality (#409)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes
  • The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.

  • The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.

  • WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.

  • The DataTable.get_mutual_information method was renamed to DataTable.mutual_information.

  • The copy_dataframe parameter was removed from DataTable initialization.

v0.0.5 November 11, 2020
  • Enhancements
    • Add __eq__ to DataTable and DataColumn and update LogicalType equality (#318)

    • Add value_counts() method to DataTable (#342)

    • Support serialization and deserialization of DataTables via csv, pickle, or parquet (#293)

    • Add shape property to DataTable and DataColumn (#358)

    • Add iloc method to DataTable and DataColumn (#365)

    • Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (#363)

    • Add rename method to DataTable (#367)

  • Fixes
    • Catch non-numeric time index during validation (#332)

  • Changes
    • Support logical type inference from a Dask DataFrame (#248)

    • Fix validation checks and make_index to work with Dask DataFrames (#260)

    • Skip validation of Ordinal order values for Dask DataFrames (#270)

    • Improve support for datetimes with Dask input (#286)

    • Update DataTable.describe to work with Dask input (#296)

    • Update DataTable.get_mutual_information to work with Dask input (#300)

    • Modify to_pandas function to return DataFrame with correct index (#281)

    • Rename DataColumn.to_pandas method to DataColumn.to_series (#311)

    • Rename DataTable.to_pandas method to DataTable.to_dataframe (#319)

    • Remove UserWarning when no matching columns found (#325)

    • Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (#338)

    • Allow pandas ExtensionArrays as inputs to DataColumn (#343)

    • Move warnings to a separate exceptions file and call via UserWarning subclasses (#348)

    • Make Dask an optional dependency installable with woodwork[dask] (#357)

  • Documentation Changes
    • Create a guide for using Woodwork with Dask (#304)

    • Add conda install instructions (#305, #309)

    • Fix README.md badge with correct link (#314)

    • Simplify issue templates to make them easier to use (#339)

    • Remove extra output cell in Start notebook (#341)

  • Testing Changes
    • Parameterize numeric time index tests (#288)

    • Add DockerHub credentials to CI testing environment (#326)

    • Fix removing files for serialization test (#350)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

Breaking Changes
  • The DataColumn.to_pandas method was renamed to DataColumn.to_series.

  • The DataTable.to_pandas method was renamed to DataTable.to_dataframe.

  • The copy parameter was removed from DataTable.to_dataframe and DataColumn.to_series.

v0.0.4 October 21, 2020
  • Enhancements
    • Add optional include parameter for DataTable.describe() to filter results (#228)

    • Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (#238)

    • Add support for setting ranking order on columns with Ordinal logical type (#240)

    • Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (#244)

    • Add support for numeric time index on DataTable (#267)

    • Add pop method to DataTable (#289)

    • Add entry point to setup.py to run CLI commands (#285)

  • Fixes
    • Allow numeric datetime time indices (#282)

  • Changes
    • Remove redundant methods DataTable.select_ltypes and DataTable.select_semantic_tags (#239)

    • Make results of get_mutual_information clearer by sorting and removing self calculation (#247)

    • Lower minimum scikit-learn version to 0.21.3 (#297)

  • Documentation Changes
    • Add guide for dt.describe and dt.get_mutual_information (#245)

    • Update README.md with documentation link (#261)

    • Add footer to doc pages with Alteryx Open Source (#258)

    • Add types and tags one-sentence definitions to Understanding Types and Tags guide (#271)

    • Add issue and pull request templates (#280, #284)

  • Testing Changes
    • Add automated process to check latest dependencies (#268)

    • Add test for setting a time index with specified string logical type (#279)

Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd

v0.0.3 October 9, 2020
  • Enhancements
    • Implement setitem on DataTable to create a new DataColumn or overwrite an existing one (#165)

    • Add to_pandas method to DataColumn to access the underlying series (#169)

    • Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (#172)

    • Add describe method to DataTable to generate statistics for the underlying data (#181)

    • Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (#189)

    • Add get_mutual_information method to DataTable to generate mutual information between columns (#203)

    • Add read_csv function to create DataTable directly from CSV file (#222)

  • Fixes
    • Fix bug causing incorrect values for quartiles in DataTable.describe method (#187)

    • Fix bug in DataTable.describe that could cause an error if certain semantic tags were applied improperly (#190)

    • Fix bug with instantiated LogicalTypes breaking when used with issubclass (#231)

  • Changes
    • Remove unnecessary add_standard_tags attribute from DataTable (#171)

    • Remove standard tags from index column and do not return stats for index column from DataTable.describe (#196)

    • Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (#205)

    • Update various DataTable methods to return new objects rather than modifying in place (#210)

    • Move datetime_format to Datetime LogicalType (#216)

    • Do not calculate mutual info with index column in DataTable.get_mutual_information (#221)

    • Move setting of underlying physical types from DataTable to DataColumn (#233)

  • Documentation Changes
    • Remove unused code from sphinx conf.py, update with Github URL (#160, #163)

    • Update README and docs with new Woodwork logo and better code snippets (#161, #159)

    • Add DataTable and DataColumn to API Reference (#162)

    • Add docstrings to LogicalType classes (#168)

    • Add Woodwork image to index, clear outputs of Jupyter notebook in docs (#173)

    • Update contributing.md and release.md with all instructions (#176)

    • Add section for setting index and time index to start notebook (#179)

    • Rename changelog to Release Notes (#193)

    • Add section for standard tags to start notebook (#188)

    • Add Understanding Types and Tags user guide (#201)

    • Add missing docstring to list_logical_types (#202)

    • Add Woodwork Global Configuration Options guide (#215)

  • Testing Changes
    • Add tests that confirm dtypes are as expected after DataTable init (#152)

    • Remove unused none_df test fixture (#224)

    • Add test for LogicalType.__str__ method (#225)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.2 September 28, 2020
  • Fixes
    • Fix formatting issue when printing global config variables (#138)

  • Changes
    • Change add_standard_tags to use_standard_tags to better describe behavior (#149)

    • Access the underlying DataFrame through to_pandas, storing it in the ._dataframe field on the class (#146)

    • Remove replace_none parameter from DataTables (#146)

  • Documentation Changes
    • Add working code example to README and create Using Woodwork page (#103)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd

v0.0.1 September 24, 2020
  • Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (#135)

  • Add global config options and add datetime_format option for type inference (#134)

  • Fix bug with Integer and WholeNumber inference in column with pd.NA values (#133)

  • Add DataTable.ltypes property to return series of logical types (#131)

  • Add ability to create new DataTable from specified columns with dt[[columns]] (#127)

  • Handle setting and tagging of index and time index columns (#125)

  • Add combined tag and ltype selection (#124)

  • Add changelog, and update changelog check to CI (#123)

  • Implement reset_semantic_tags (#118)

  • Implement DataTable getitem (#119)

  • Add remove_semantic_tags method (#117)

  • Add semantic tag selection (#106)

  • Add Github Action, rename to woodwork (#113)

  • Add license to setup.py (#112)

  • Reset semantic tags on logical type change (#107)

  • Add standard numeric and category tags (#100)

  • Change semantic_types to semantic_tags, a set of strings (#100)

  • Update dataframe dtypes based on logical types (#94)

  • Add select_logical_types to DataTable (#96)

  • Add pygments to dev-requirements.txt (#97)

  • Add replacement of None with np.nan in DataTable init (#87)

  • Refactor DataColumn to make semantic_types and logical_type private (#86)

  • Add pandas_dtype to each LogicalType, and remove dtype attribute on DataColumn (#85)

  • Add set_semantic_types methods on both DataTable and DataColumn (#75)

  • Support passing camel case or snake case strings for setting logical types (#74)

  • Improve flexibility when setting semantic types (#72)

  • Add Whole Number Inference of Logical Types (#66)

  • Add dtypes property to DataTables and repr for DataColumn (#61)

  • Allow specification of semantic types during DataTable creation (#69)

  • Implement set_logical_types on DataTable (#65)

  • Add init files to tests to fix code coverage (#60)

  • Add AutoAssign bot (#59)

  • Add logical types validation in DataTables (#49)

  • Fix working_directory in CI (#57)

  • Add infer_logical_types for DataColumn (#45)

  • Fix README library name and code coverage badge (#56, #56)

  • Add code coverage (#51)

  • Improve and refactor the validation checks during initialization of a DataTable (#40)

  • Add dataframe attribute to DataTable (#39)

  • Update README with minor usage details (#37)

  • Add License (#34)

  • Rename from datatables to datatables (#4)

  • Add Logical Types, DataTable, DataColumn (#3)

  • Add Makefile, setup.py, requirements.txt (#2)

  • Initial Release (#1)

Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd