Release Notes#
Future Release#
Enhancements
Fixes
- Changes
Restrict numpy to <2.0.0 GH#1869
Documentation Changes
Testing Changes
Thanks to the following people for contributing to this release: @thehomebrewnerd
v0.31.0 May 13, 2024#
- Enhancements
Add support for Python 3.12 GH#1855
Thanks to the following people for contributing to this release: @thehomebrewnerd
Breaking Changes#
With this release, Woodwork can no longer be used with Dask or Pyspark dataframes. The behavior when using pandas dataframes remains unchanged.
v0.30.0 Apr 10, 2024#
Warning
Support for use with Dask and Pyspark dataframes is planned for removal in an upcoming release of Woodwork.
- Testing Changes
Fix serialization test to work with pytest 8.1.1 GH#1837
Thanks to the following people for contributing to this release: @thehomebrewnerd
v0.29.0 Feb 26, 2024#
Thanks to the following people for contributing to this release: @thehomebrewnerd
v0.28.0 Feb 5, 2024#
Warning
This release of Woodwork will not support Python 3.8
- Changes
Upgraded numpy to < 2.0.0 GH#1799
- Documentation Changes
Added dask string storage note to “Other Limitations” in Dask documentation GH#1799
- Testing Changes
Upgraded moto and boto3 GH#1799
Thanks to the following people for contributing to this release: @cp2boston, @gsheni, @tamargrey
v0.27.0 Dec 12, 2023#
- Changes
Temporarily restrict pyarrow version due to serialization issues (GH#1768)
Update pandas categorical type call and replace black with ruff formatter (GH#1794)
- Testing Changes
Removed old performance testing workflow (GH#1776)
Thanks to the following people for contributing to this release: @eccabay, @gsheni, @thehomebrewnerd, @petejanuszewski1
v0.26.0 Aug 22, 2023#
v0.25.1 Jul 18, 2023#
- Fixes
Restrict numpy version to resolve boolean inference issue with v1.25.0 GH#1735
Thanks to the following people for contributing to this release: @thehomebrewnerd
v0.25.0 Jul 17, 2023#
Thanks to the following people for contributing to this release: @christopherbunn, @thehomebrewnerd
v0.24.0 May 24, 2023#
- Enhancements
- Changes
Stopped calculating top_values for Double columns with integer values (GH#1692)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @ParthivNaresh, @petejanuszewski1, @simha104, @tamargrey
v0.23.0 April 12, 2023#
- Fixes
Updated Datetime format inference to include formats with two digit year dates along with timezones (GH#1666)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @ParthivNaresh, @simha104
v0.22.0 March 13, 2023#
- Testing Changes
Add ruff for linting and replace isort/flake8 (GH#1614)
Specify black and ruff config arguments (GH#1620)
Add codecov token for unit tests workflow (GH#1630)
Add GitHub Actions cache to speed up workflows (GH#1631)
Add pull request check for linked issues to CI workflow (GH#1633, GH#1636)
Run lint fix on latest dependency update pull requests (GH#1640, GH#1641)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @ParthivNaresh
v0.21.2 January 11, 2023#
Thanks to the following people for contributing to this release: @ParthivNaresh, @sbadithe, @thehomebrewnerd
v0.21.1 December 16, 2022#
Thanks to the following people for contributing to this release: @bchen1116, @sbadithe
v0.21.0 December 1, 2022#
- Fixes
Resolved FutureWarning in _get_box_plot_info_for_column (GH#1563)
Fixed error message in validate method in logical_types.py (GH#1565)
Fixed IntegerNullable inference by checking values are within valid Int64 bounds (GH#1572)
Update demo dataset links to point to new endpoint (GH#1570)
Fix DivisionByZero error in type_system.py (GH#1571)
Fix Categorical dtype inference for PostalCode logical type (GH#1574)
Fixed issue where forcing a Boolean logical type on a column of 0.0s and 1.0s caused incorrect transformation (GH#1576)
- Documentation Changes
Updated documentation to include the get_outliers and medcouple_dict (GH#1547)
- Testing Changes
Run looking glass performance tests on merge (GH#1567)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @ParthivNaresh, @sbadithe, @simha104
Breaking Changes#
GH#1549 will automatically infer more values as Boolean or BooleanNullable, including, but not limited to, [0, 1], ['yes', 'no'], and ["True", "False"].
v0.20.0 October 31, 2022#
- Enhancements
Replace use of deprecated append method for dataframes and series with concat method (GH#1533)
Thanks to the following people for contributing to this release: @bchen1116, @sbadithe
v0.19.0 September 27, 2022#
- Enhancements
Added Spearman Correlation to options for dependence calculations (GH#1523)
Added ignore_zeros as an argument for box_plot_dict to allow for calculations of outliers without 0 values (GH#1524)
Added target_col argument to dependence and dependence_dict to calculate correlations between features and target_col (GH#1531)
- Fixes
Fix datetime pivot point to be set at current year + 10 rather than the default for two-digit years when datetime_format provided (GH#1512)
- Testing Changes
Add kickoff for create conda forge pull request from release (GH#1515)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @ParthivNaresh, @thehomebrewnerd
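The pivot rule named in the GH#1512 fix above can be illustrated with a small standalone sketch. This is not Woodwork's implementation: the function name expand_two_digit_year is hypothetical, and treating the pivot year itself as belonging to the current century is an assumption made only for illustration.

```python
from datetime import date
from typing import Optional


def expand_two_digit_year(yy: int, current_year: Optional[int] = None) -> int:
    """Map a two-digit year to a four-digit year using a pivot of current_year + 10.

    Hypothetical helper: years that would land after the pivot are moved back
    one century. Whether the pivot year itself is inclusive is an assumption.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    if current_year is None:
        current_year = date.today().year
    pivot = current_year + 10
    century = pivot - pivot % 100  # century that contains the pivot year
    candidate = century + yy
    # Anything later than the pivot is assumed to belong to the previous century.
    return candidate - 100 if candidate > pivot else candidate


# With a fixed reference year of 2022 the pivot is 2032.
print(expand_two_digit_year(32, current_year=2022))  # 2032
print(expand_two_digit_year(33, current_year=2022))  # 1933
```

Passing current_year explicitly keeps the example deterministic; leaving it out falls back to today's date.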
v0.18.0 August 31, 2022#
- Enhancements
Updated dependence_dict and mutual_information to drop Categorical columns with a large number of unique values during mutual information calculation, non-dask only. (GH#1501)
- Fixes
Fix applying LatLong.transform to empty dask data (GH#1507)
- Testing Changes
Update development requirements and use latest for documentation (GH#1499)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @jeff-hernandez, @ParthivNaresh, @rwedge
v0.17.2 August 5, 2022#
- Fixes
Updated concat_columns to work with dataframes with mismatched indices or different shapes (GH#1485)
- Documentation Changes
Add instructions to add new users to woodwork feedstock (GH#1483)
- Testing Changes
Add create feedstock PR workflow (GH#1489)
Thanks to the following people for contributing to this release: @chukarsten, @cmancuso, @gsheni
v0.17.1 July 29, 2022#
- Testing Changes
Allow for manual kickoff for minimum dependency checker (GH#1476)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni
v0.17.0 July 14, 2022#
Warning
This release of Woodwork will not support Python 3.7
- Enhancements
Added ability to null invalid values for Double logical type (GH#1449)
Added ability to null invalid values for BooleanNullable logical type (GH#1455)
Added ability to null invalid values for IntegerNullable logical type (GH#1456)
Added ability to null invalid values for EmailAddress logical type (GH#1457)
Added ability to null invalid values for URL logical type (GH#1459)
Added ability to null invalid values for PhoneNumber logical type (GH#1460)
Added ability to null invalid values for AgeFractional and AgeNullable logical types (GH#1462)
Added ability to null invalid values for LatLong logical type (GH#1465)
Added ability to null invalid values for PostalCode logical type (US only) (GH#1467)
Added smarter inference for IntegerNullable and BooleanNullable types (GH#1458)
- Fixes
Fixed inference of all null string values as Unknown instead of Datetime (GH#1458)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @ParthivNaresh
v0.16.4 Jun 23, 2022#
- Changes
Restrict pyspark below 3.3.0 (GH#1450)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @jeff-hernandez, @ParthivNaresh
v0.16.3 May 4, 2022#
- Fixes
Fixed col_is_datetime inference function to not infer numeric dtypes as datetime (GH#1413)
- Changes
Delete setup.py, MANIFEST.in and move configuration to pyproject.toml (GH#1409)
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @ParthivNaresh
v0.16.2 Apr 25, 2022#
- Fixes
Fixed import issues regarding pyarrow and made python-dateutil>=2.8.1 a required dependency (GH#1397)
Thanks to the following people for contributing to this release: @ParthivNaresh
v0.16.1 Apr 25, 2022#
- Fixes
Reverting string[pyarrow] until fix can be found for pandas issue (GH#1391)
Thanks to the following people for contributing to this release: @ParthivNaresh
v0.16.0 Apr 21, 2022#
- Enhancements
Added the ability to provide a callback function to TableAccessor.describe() to get intermediate results (GH#1387)
Add pearson_correlation and dependence methods to TableAccessor (GH#1265)
Uses string[pyarrow] instead of string dtype to save memory (GH#1360)
Added a better error message when dataframe and schema have different columns (GH#1366)
Stores timezones in Datetime logical type (GH#1376)
Added type inference for phone numbers (GH#1357)
Added type inference for zip code (GH#1378)
- Fixes
Cap pandas at 1.4.1 (GH#1373)
- Changes
Change underlying logic of TableAccessor.mutual_information (GH#1265)
Added from_disk as a convenience function to deserialize a WW table (GH#1363)
Allow attr version in setup.cfg (GH#1361)
Raise error if files already exist during serialization (GH#1356)
Improve exception handling in col_is_datetime (GH#1365)
Store typing info in parquet file header during serialization (GH#1377)
- Testing Changes
Separate testing matrix to speed up GitHub Actions Linux tests for latest dependencies GH#1380
Thanks to the following people for contributing to this release: @bchen1116, @gsheni, @jeff-hernandez, @ParthivNaresh, @rwedge, @thehomebrewnerd
v0.15.0 Mar 24, 2022#
- Fixes
Updated __str__ output for Ordinal logical types (GH#1340)
- Documentation Changes
Update release.md with correct version updating info (GH#1358)
- Testing Changes
Updated scheduled workflows to only run on Alteryx owned repos (GH#1351)
Thanks to the following people for contributing to this release: @bchen1116, @dvreed77, @jeff-hernandez, @ParthivNaresh, @thehomebrewnerd
v0.14.0 Mar 15, 2022#
- Fixes
Preserve custom semantic tags when changing column logical type (GH#1300)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @mingdavidqi
Breaking Changes#
GH#1325: The following serialization functions have been removed from the API: woodwork.serialize.write_dataframe, woodwork.serialize.write_typing_info and woodwork.serialize.write_woodwork_table. Also, the function woodwork.serialize.typing_info_to_dict has been moved to woodwork.serializers.serializer_base.typing_info_to_dict.
v0.13.0 Feb 16, 2022#
Warning
Woodwork may not support Python 3.7 in next non-bugfix release.
- Testing Changes
Replace mock with unittest.mock (GH#1304)
Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
v0.12.0 Jan 27, 2022#
- Enhancements
Add Slack link to GitHub issue creation templates (GH#1242)
Thanks to the following people for contributing to this release: @dvreed77, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
v0.11.2 Jan 28, 2022#
- Fixes
Set high and low bounds to the max and min values if no outliers are present in box_plot_dict (backport of GH#1269)
Thanks to the following people for contributing to this release: @tamargrey
Note#
The pandas version for Koalas has been restricted, and a change was made to a pandas replace call to account for the recent pandas 1.4.0 release.
v0.11.1 Jan 4, 2022#
- Changes
Update inference process to only check for NaturalLanguage if no other type matches are found first (GH#1234)
- Documentation Changes
Updating contributing doc with Spark installation instructions (GH#1232)
Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd, @willsmithorg
v0.11.0 Dec 22, 2021#
Thanks to the following people for contributing to this release: @jeff-hernandez, @thehomebrewnerd, @tuethan1999
v0.10.0 Nov 30, 2021#
- Changes
Return histogram bins as a list of floats instead of a pandas.Interval object (GH#1207)
Thanks to the following people for contributing to this release: @tamargrey, @thehomebrewnerd
Breaking Changes#
GH#1207: The behavior of describe_dict has changed when using extra_stats=True. Previously, the histogram bins were returned as pandas.Interval objects. This has been updated so that the histogram bins are now represented as a two-element list of floats with the first element being the left edge of the bin and the second element being the right edge.
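To make the new bin representation concrete, the sketch below builds two-element [left, right] lists of floats from a sequence of bin edges. It only illustrates the shape of the data described above; the helper name bins_as_lists and the sample edges are made up, and the surrounding structure of the describe_dict output is not reproduced here.

```python
def bins_as_lists(edges):
    """Turn sorted bin edges into [left_edge, right_edge] pairs of floats."""
    return [[float(left), float(right)] for left, right in zip(edges, edges[1:])]


# Hypothetical edges for four equal-width bins between 0 and 10.
bins = bins_as_lists([0, 2.5, 5, 7.5, 10])
print(bins)  # [[0.0, 2.5], [2.5, 5.0], [5.0, 7.5], [7.5, 10.0]]

# Each bin is plain data, so it can be unpacked or serialized directly.
left, right = bins[0]
```

Because each bin is a plain list, code that previously read the left and right attributes of an interval object would index or unpack the pair instead.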
v0.9.1 Nov 19, 2021#
- Fixes
Fix bug that causes mutual_information to fail with certain index types (GH#1199)
- Changes
Update pip to 21.3.1 for test requirements (GH#1196)
- Documentation Changes
Update install page with updated minimum optional dependencies (GH#1193)
Thanks to the following people for contributing to this release: @gsheni, @thehomebrewnerd
v0.9.0 Nov 11, 2021#
- Enhancements
Added read_file parameter for replacing empty string values with NaN values (GH#1161)
- Changes
Make box plot low/high indices/values optional to return in box_plot_dict (GH#1184)
- Documentation Changes
Update docs dependencies (GH#1176)
Thanks to the following people for contributing to this release: @bchen1116, @davesque, @gsheni, @rwedge, @tamargrey, @thehomebrewnerd
v0.8.2 Oct 12, 2021#
Thanks to the following people for contributing to this release: @bchen1116, @davesque, @jeff-hernandez, @thehomebrewnerd
v0.8.1 Sep 16, 2021#
- Changes
Update Datetime.transform to use default nrows value when calling _infer_datetime_format (GH#1137)
- Documentation Changes
Hide spark config in Using Dask and Koalas Guide (GH#1139)
Thanks to the following people for contributing to this release: @jeff-hernandez, @simha104, @thehomebrewnerd
v0.8.0 Sep 9, 2021#
- Enhancements
- Changes
Update get_invalid_schema_message to improve performance (GH#1132)
Thanks to the following people for contributing to this release: @ajaypallekonda, @davesque, @jeff-hernandez, @thehomebrewnerd
v0.7.1 Aug 25, 2021#
Thanks to the following people for contributing to this release: @tamargrey
v0.7.0 Aug 25, 2021#
Thanks to the following people for contributing to this release: @davesque, @jeff-hernandez, @tamargrey, @tuethan1999
Breaking Changes#
GH#1100: The behavior for init has changed. A full schema is a schema that contains all of the columns of the dataframe it describes whereas a partial schema only contains a subset. A full schema will also require that the schema is valid without having to make any changes to the DataFrame. Before, only a full schema was permitted by the init method so passing a partial schema would error. Additionally, any parameters like logical_types would be ignored if passing in a schema. Now, passing a partial schema to the init method calls the init_with_partial_schema method instead of throwing an error. Information from keyword arguments will override information from the partial schema. For example, if column a has the Integer Logical Type in the partial schema, it’s possible to use the logical_types argument to reinfer its logical type by passing {'a': None} or force a type by passing in {'a': Double}. These changes mean that Woodwork init is less restrictive. If no type inference takes place and no changes are required of the DataFrame at initialization, init_with_full_schema should be used instead of init. init_with_full_schema maintains the same functionality as when a schema was passed to the old init.
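The precedence described above (keyword arguments win over the partial schema, and None requests re-inference) can be modeled with plain dictionaries. The sketch below is only a mental model: resolve_logical_types is a hypothetical helper, logical types are represented as strings, and none of this is Woodwork's actual code.

```python
def resolve_logical_types(columns, partial_schema, logical_types=None):
    """Decide, per column, which logical type applies.

    A value of None in the result means "infer the type from the data".
    Keyword overrides take precedence over the partial schema.
    """
    overrides = logical_types or {}
    resolved = {}
    for col in columns:
        if col in overrides:
            # An explicit keyword entry always wins, even when it is None.
            resolved[col] = overrides[col]
        else:
            # Fall back to the partial schema; missing columns are inferred.
            resolved[col] = partial_schema.get(col)
    return resolved


columns = ["a", "b", "c"]
partial_schema = {"a": "Integer", "b": "Categorical"}

reinfer = resolve_logical_types(columns, partial_schema, {"a": None})
forced = resolve_logical_types(columns, partial_schema, {"a": "Double"})
print(reinfer)  # {'a': None, 'b': 'Categorical', 'c': None}
print(forced)   # {'a': 'Double', 'b': 'Categorical', 'c': None}
```

Column c is absent from the partial schema, so it falls through to inference in both cases.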
v0.6.0 Aug 4, 2021#
- Fixes
Fix bug in _infer_datetime_format with all np.nan input (GH#1089)
- Changes
The criteria for categorical type inference have changed (GH#1065)
The meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed (GH#1065)
Make sampling for type inference more consistent (GH#1083)
Accessor logic checking if Woodwork has been initialized moved to decorator (GH#1093)
Thanks to the following people for contributing to this release: @davesque, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd, @tuethan1999
Breaking Changes#
GH#1065: The criteria for categorical type inference have changed. Relatedly, the meaning of both the categorical_threshold and numeric_categorical_threshold settings has changed. Now, a categorical match is signaled when a series either has the “categorical” pandas dtype or has a ratio of unique value count to total value count (nan excluded from both counts) that is below or equal to some fraction. The value used for this fraction is set by the categorical_threshold setting which now has a default value of 0.2. If a fraction is set for the numeric_categorical_threshold setting, then series with either a float or integer dtype may be inferred as categorical by applying the same logic described above with the numeric_categorical_threshold fraction. Otherwise, the numeric_categorical_threshold setting defaults to None which indicates that series with a numerical type should not be inferred as categorical. Users who have overridden either the categorical_threshold or numeric_categorical_threshold settings will need to adjust their settings accordingly.
GH#1083: The process of sampling series for logical type inference was updated to be more consistent. Before, initial sampling for inference differed depending on collection type (pandas, dask, or koalas). Also, further randomized subsampling was performed in some cases during categorical inference and in every case during email inference regardless of collection type. Overall, the way sampling was done was inconsistent and unpredictable. Now, the first 100,000 records of a column are sampled for logical type inference regardless of collection type although only records from the first partition of a dask dataset will be used. Subsampling performed by the inference functions of individual types has been removed. The effect of these changes is that inferred types may now be different although in many cases they will be more correct.
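The ratio test described under GH#1065 can be sketched in plain Python. This is an illustration of the stated rule only, not Woodwork's inference code: matches_categorical is a hypothetical name, the “categorical” pandas dtype branch is left out, and returning False for a series with no non-null values is an assumption.

```python
import math


def _is_null(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def matches_categorical(values, is_numeric=False,
                        categorical_threshold=0.2,
                        numeric_categorical_threshold=None):
    """Return True when unique/total (nulls excluded) is at or below the threshold."""
    threshold = numeric_categorical_threshold if is_numeric else categorical_threshold
    if threshold is None:
        # Numeric series are not inferred as categorical unless a fraction is set.
        return False
    non_null = [v for v in values if not _is_null(v)]
    if not non_null:
        return False  # assumption: nothing to measure, so no match
    ratio = len(set(non_null)) / len(non_null)
    return ratio <= threshold


colors = ["red", "blue"] * 10 + [None]      # 2 unique / 20 non-null = 0.1
ids = [str(i) for i in range(20)]           # 20 unique / 20 non-null = 1.0
codes = [1, 2] * 10                         # numeric, ratio 0.1

print(matches_categorical(colors))                                   # True
print(matches_categorical(ids))                                      # False
print(matches_categorical(codes, is_numeric=True))                   # False
print(matches_categorical(codes, is_numeric=True,
                          numeric_categorical_threshold=0.2))        # True
```

The last two calls show why overridden settings need revisiting: the same numeric data matches only once a numeric fraction is supplied.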
v0.5.1 Jul 22, 2021#
- Enhancements
Store inferred datetime format on Datetime logical type instance (GH#1025)
Add support for automatically inferring the EmailAddress logical type (GH#1047)
Add feature origin attribute to schema (GH#1056)
Add ability to calculate outliers and the statistical info required for box and whisker plots to WoodworkColumnAccessor (GH#1048)
Add ability to change config settings in a with block with ww.config.with_options (GH#1062)
- Fixes
Raises warning and removes tags when user adds a column with index tags to DataFrame (GH#1035)
- Documentation Changes
Add supported types to metadata docstring (GH#1049)
Thanks to the following people for contributing to this release: @davesque, @frances-h, @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd
v0.5.0 Jul 7, 2021#
- Fixes
Add input validation to ww.init_series (GH#1015)
Thanks to the following people for contributing to this release: @jeff-hernandez, @simha104, @tamargrey, @thehomebrewnerd, @tuethan1999
Breaking Changes#
The default logical type is now the Unknown type instead of the NaturalLanguage type.
The global config natural_language_threshold has been renamed to categorical_threshold.
v0.4.2 Jun 23, 2021#
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd, @tuethan1999
Breaking Changes#
Progress callback function parameters have changed, and progress is now reported in the units specified by the unit of measurement parameter instead of as a percentage of the total. Progress callback functions are now expected to accept the following five parameters:
progress increment since last call
progress units complete so far
total units to complete
the progress unit of measurement
time elapsed since start of calculation
DataFrame.ww.init no longer accepts the make_index parameter
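A callback matching the five-parameter shape listed above might look like the sketch below. The parameter names (update, progress, total, unit, time_elapsed) are illustrative guesses that follow the listed order, and run_fake_job is a stand-in driver written for this example rather than any Woodwork function.

```python
def make_progress_logger(messages):
    """Build a callback that records one human-readable line per update."""
    def callback(update, progress, total, unit, time_elapsed):
        # update: increment since last call; progress: units done so far;
        # total: units to complete; unit: name of the unit of measurement;
        # time_elapsed: seconds since the calculation started.
        messages.append(
            f"{progress}/{total} {unit} (+{update}) after {time_elapsed:.1f}s"
        )
    return callback


def run_fake_job(total_units, step, callback, unit="calculations"):
    """Hypothetical driver that reports progress in units, not percentages."""
    done = 0
    elapsed = 0.0
    while done < total_units:
        increment = min(step, total_units - done)
        done += increment
        elapsed += 0.5  # pretend every step takes half a second
        callback(increment, done, total_units, unit, elapsed)


messages = []
run_fake_job(5, 2, make_progress_logger(messages))
for line in messages:
    print(line)
```

A callback that still wants a percentage can derive it as progress / total * 100.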
v0.4.1 Jun 9, 2021#
- Changes
Change write_dataframe to only copy dataframe if it contains LatLong (GH#955)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @tamargrey, @thehomebrewnerd
v0.4.0 May 26, 2021#
- Enhancements
- Changes
Add logical type method to transform data (GH#915)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @thehomebrewnerd, @tuethan1999
v0.3.1 May 12, 2021#
Warning
This Woodwork release uses a weak reference for maintaining a reference from the accessor to the DataFrame. Because of this, chaining a Woodwork call onto another call that creates a new DataFrame or Series object can be problematic.
Instead of calling pd.DataFrame({'id':[1, 2, 3]}).ww.init(), first store the DataFrame in a new variable and then initialize Woodwork:
df = pd.DataFrame({'id':[1, 2, 3]})
df.ww.init()
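Why chaining is a problem with a weak reference can be shown with the standard library alone. The classes below are hypothetical stand-ins, not Woodwork or pandas objects, and the immediate cleanup of the temporary object relies on CPython's reference counting.

```python
import weakref


class Frame:
    """Hypothetical stand-in for a DataFrame."""


class Accessor:
    """Hypothetical accessor that only weakly references its frame."""

    def __init__(self, frame):
        self._frame_ref = weakref.ref(frame)

    @property
    def frame(self):
        # Returns None once the referenced object has been garbage collected.
        return self._frame_ref()


# Chained: the temporary Frame has no other owner, so on CPython it is
# collected as soon as the constructor call finishes.
chained = Accessor(Frame())

# Stored first: the variable keeps the Frame alive for the accessor.
kept = Frame()
stored = Accessor(kept)

print(chained.frame is None)  # True on CPython
print(stored.frame is kept)   # True
```

Holding the object in a variable, as the warning recommends, is what keeps the weak reference valid.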
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd
Breaking Changes#
The command line functions python -m woodwork list-logical-types and python -m woodwork list-semantic-tags no longer exist. Please call the underlying Python functions ww.list_logical_types() and ww.list_semantic_tags().
v0.3.0 May 3, 2021#
- Enhancements
Add is_schema_valid and get_invalid_schema_message functions for checking schema validity (GH#834)
Add logical type for Age and AgeNullable (GH#849)
Add logical type for Address (GH#858)
Add generic to_disk function to save Woodwork schema and data (GH#872)
Add generic read_file function to read file as Woodwork DataFrame (GH#878)
- Changes
Consistently use ColumnNotPresentError for mismatches between user input and dataframe/schema columns (GH#837)
Raise custom WoodworkNotInitError when accessing Woodwork attributes before initialization (GH#838)
Remove check requiring Ordinal instance for initializing a ColumnSchema object (GH#870)
Increase koalas min version to 1.8.0 (GH#885)
- Documentation Changes
Improve formatting of release notes (GH#874)
Thanks to the following people for contributing to this release: @frances-h, @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
Breaking Changes#
Woodwork tables can no longer be saved to disk using df.ww.to_csv, df.ww.to_pickle, or df.ww.to_parquet. Use df.ww.to_disk instead.
The read_csv function has been replaced by read_file.
v0.2.0 Apr 20, 2021#
Warning
This Woodwork release does not support Python 3.6
- Enhancements
Add validation control to WoodworkTableAccessor (GH#736)
Store make_index value on WoodworkTableAccessor (GH#780)
Add optional exclude parameter to WoodworkTableAccessor select method (GH#783)
Add validation control to deserialize.read_woodwork_table and ww.read_csv (GH#788)
Add WoodworkColumnAccessor.schema and handle copying column schema (GH#799)
Allow initializing a WoodworkColumnAccessor with a ColumnSchema (GH#814)
Add __repr__ to ColumnSchema (GH#817)
Add BooleanNullable and IntegerNullable logical types (GH#830)
Add validation control to WoodworkColumnAccessor (GH#833)
- Changes
Rename FullName logical type to PersonFullName (GH#740)
Rename ZIPCode logical type to PostalCode (GH#741)
Update minimum scikit-learn version to 0.22 (GH#763)
Drop support for Python version 3.6 (GH#768)
Remove ColumnNameMismatchWarning (GH#777)
get_column_dict does not use standard tags by default (GH#782)
Make logical_type and name params to _get_column_dict optional (GH#786)
Rename Schema object and files to match new table-column schema structure (GH#789)
Store column typing information in a ColumnSchema object instead of a dictionary (GH#791)
TableSchema does not use standard tags by default (GH#806)
Store use_standard_tags on the ColumnSchema instead of the TableSchema (GH#809)
Move functions in column_schema.py to be methods on ColumnSchema (GH#829)
- Testing Changes
Add unit tests against minimum dependencies for python 3.6 on PRs and main (GH#743, GH#753, GH#763)
Update spark config for test fixtures (GH#787)
Separate latest unit tests into pandas, dask, koalas (GH#813)
Update latest dependency checker to generate separate core, koalas, and dask dependencies (GH#815, GH#825)
Ignore latest dependency branch when checking for updates to the release notes (GH#827)
Change from GitHub PAT to auto generated GitHub Token for dependency checker (GH#831)
Expand ColumnSchema semantic tag testing coverage and null logical_type testing coverage (GH#832)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @rwedge, @tamargrey, @thehomebrewnerd
Breaking Changes#
The ZIPCode logical type has been renamed to PostalCode
The FullName logical type has been renamed to PersonFullName
The Schema object has been renamed to TableSchema
With the ColumnSchema object, typing information for a column can no longer be accessed with df.ww.columns[col_name]['logical_type']. Instead use df.ww.columns[col_name].logical_type.
The Boolean and Integer logical types will no longer work with data that contains null values. The new BooleanNullable and IntegerNullable logical types should be used if null values are present.
v0.1.0 Mar 22, 2021#
- Enhancements
Implement Schema and Accessor API (GH#497)
Add Schema class that holds typing info (GH#499)
Add WoodworkTableAccessor class that performs type inference and stores Schema (GH#514)
Allow initializing Accessor schema with a valid Schema object (GH#522)
Add ability to read in a csv and create a DataFrame with an initialized Woodwork Schema (GH#534)
Add ability to call pandas methods from Accessor (GH#538, GH#589)
Add helpers for checking if a column is one of Boolean, Datetime, numeric, or categorical (GH#553)
Add ability to load demo retail dataset with a Woodwork Accessor (GH#556)
Add select to WoodworkTableAccessor (GH#548)
Add mutual_information to WoodworkTableAccessor (GH#571)
Add WoodworkColumnAccessor class (GH#562)
Add semantic tag update methods to column accessor (GH#573)
Add describe and describe_dict to WoodworkTableAccessor (GH#579)
Add init_series util function for initializing a series with dtype change (GH#581)
Add set_logical_type method to WoodworkColumnAccessor (GH#590)
Add semantic tag update methods to table schema (GH#591)
Add warning if additional parameters are passed along with schema (GH#593)
Better warning when accessing column properties before init (GH#596)
Update column accessor to work with LatLong columns (GH#598)
Add set_index to WoodworkTableAccessor (GH#603)
Implement loc and iloc for WoodworkColumnAccessor (GH#613)
Add set_time_index to WoodworkTableAccessor (GH#612)
Implement loc and iloc for WoodworkTableAccessor (GH#618)
Allow updating logical types with set_types and make relevant DataFrame changes (GH#619)
Allow serialization of WoodworkColumnAccessor to csv, pickle, and parquet (GH#624)
Add DaskColumnAccessor (GH#625)
Allow deserialization from csv, pickle, and parquet to Woodwork table (GH#626)
Add value_counts to WoodworkTableAccessor (GH#632)
Add KoalasColumnAccessor (GH#634)
Add pop to WoodworkTableAccessor (GH#636)
Add drop to WoodworkTableAccessor (GH#640)
Add rename to WoodworkTableAccessor (GH#646)
Add DaskTableAccessor (GH#648)
Add Schema properties to WoodworkTableAccessor (GH#651)
Add KoalasTableAccessor (GH#652)
Adds __getitem__ to WoodworkTableAccessor (GH#633)
Update Koalas min version and add support for more new pandas dtypes with Koalas (GH#678)
Adds __setitem__ to WoodworkTableAccessor (GH#669)
- Changes
Move mutual information logic to statistics utils file (GH#584)
Bump min Koalas version to 1.4.0 (GH#638)
Preserve pandas underlying index when not creating a Woodwork index (GH#664)
Restrict Koalas version to <1.7.0 due to breaking changes (GH#674)
Clean up dtype usage across Woodwork (GH#682)
Improve error when calling accessor properties or methods before init (GH#683)
Remove dtype from Schema dictionary (GH#685)
Add include_index param and allow unique columns in Accessor mutual information (GH#699)
Include DataFrame equality and use_standard_tags in WoodworkTableAccessor equality check (GH#700)
Remove DataTable and DataColumn classes to migrate towards the accessor approach (GH#713)
Change sample_series dtype to not need conversion and remove convert_series util (GH#720)
Rename Accessor methods since DataTable has been removed (GH#723)
- Documentation Changes
Update README.md and Get Started guide to use accessor (GH#655, GH#717)
Update Understanding Types and Tags guide to use accessor (GH#657)
Update docstrings and API Reference page (GH#660)
Update statistical insights guide to use accessor (GH#693)
Update Customizing Type Inference guide to use accessor (GH#696)
Update Dask and Koalas guide to use accessor (GH#701)
Update index notebook and install guide to use accessor (GH#715)
Add section to documentation about schema validity (GH#729)
Update README.md and Get Started guide to use pd.read_csv (GH#730)
Make small fixes to documentation formatting (GH#731)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey, @thehomebrewnerd
Breaking Changes#
The DataTable and DataColumn classes have been removed and replaced by new WoodworkTableAccessor and WoodworkColumnAccessor classes which are used through the ww namespace available on DataFrames after importing Woodwork.
v0.0.11 Mar 15, 2021#
- Documentation Changes
Update to remove warning message from statistical insights guide (GH#690)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
v0.0.10 Feb 25, 2021#
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @johnbridstrup, @tamargrey
v0.0.9 Feb 5, 2021#
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
v0.0.8 Jan 25, 2021#
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
v0.0.7 Dec 14, 2020#
- Changes
Update links to use alteryx org Github URL (GH#423)
Support column names of any type allowed by the underlying DataFrame (GH#442)
Use object dtype for LatLong columns for easy access to latitude and longitude values (GH#414)
Restrict dask version to prevent 2020.12.0 release from being installed (GH#453)
Lower minimum requirement for numpy to 1.15.4, and set pandas minimum requirement 1.1.1 (GH#459)
- Testing Changes
Fix missing test coverage (GH#436)
Thanks to the following people for contributing to this release: @gsheni, @jeff-hernandez, @tamargrey, @thehomebrewnerd
v0.0.6 Nov 30, 2020#
- Enhancements
Add support for creating DataTable from Koalas DataFrame (GH#327)
Add ability to initialize DataTable with numpy array (GH#367)
Add describe_dict method to DataTable (GH#405)
Add mutual_information_dict method to DataTable (GH#404)
Add metadata to DataTable for user-defined metadata (GH#392)
Add update_dataframe method to DataTable to update underlying DataFrame (GH#407)
Sort dataframe if time_index is specified, bypass sorting with already_sorted parameter. (GH#410)
Add description attribute to DataColumn (GH#416)
Implement DataColumn.__len__ and DataTable.__len__ (GH#415)
- Changes
Lower moto test requirement for serialization/deserialization (GH#376)
Make Koalas an optional dependency installable with woodwork[koalas] (GH#378)
Remove WholeNumber LogicalType from Woodwork (GH#380)
Updates to LogicalTypes to support Koalas 1.4.0 (GH#393)
Replace set_logical_types and set_semantic_tags with just set_types (GH#379)
Remove copy_dataframe parameter from DataTable initialization (GH#398)
Implement DataTable.__sizeof__ to return size of the underlying dataframe (GH#401)
Include Datetime columns in mutual info calculation (GH#399)
Maintain column order on DataTable operations (GH#406)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
Breaking Changes#
The DataTable.set_semantic_tags method was removed. DataTable.set_types can be used instead.
The DataTable.set_logical_types method was removed. DataTable.set_types can be used instead.
WholeNumber was removed from LogicalTypes. Columns that were previously inferred as WholeNumber will now be inferred as Integer.
The DataTable.get_mutual_information was renamed to DataTable.mutual_information.
The copy_dataframe parameter was removed from DataTable initialization.
v0.0.5 Nov 11, 2020#
- Enhancements
Add __eq__ to DataTable and DataColumn and update LogicalType equality (GH#318)
Add value_counts() method to DataTable (GH#342)
Support serialization and deserialization of DataTables via csv, pickle, or parquet (GH#293)
Add shape property to DataTable and DataColumn (GH#358)
Add iloc method to DataTable and DataColumn (GH#365)
Add numeric_categorical_threshold config value to allow inferring numeric columns as Categorical (GH#363)
Add rename method to DataTable (GH#367)
- Fixes
Catch non numeric time index at validation (GH#332)
- Changes
Support logical type inference from a Dask DataFrame (GH#248)
Fix validation checks and make_index to work with Dask DataFrames (GH#260)
Skip validation of Ordinal order values for Dask DataFrames (GH#270)
Improve support for datetimes with Dask input (GH#286)
Update DataTable.describe to work with Dask input (GH#296)
Update DataTable.get_mutual_information to work with Dask input (GH#300)
Modify to_pandas function to return DataFrame with correct index (GH#281)
Rename DataColumn.to_pandas method to DataColumn.to_series (GH#311)
Rename DataTable.to_pandas method to DataTable.to_dataframe (GH#319)
Remove UserWarning when no matching columns found (GH#325)
Remove copy parameter from DataTable.to_dataframe and DataColumn.to_series (GH#338)
Allow pandas ExtensionArrays as inputs to DataColumn (GH#343)
Move warnings to a separate exceptions file and call via UserWarning subclasses (GH#348)
Make Dask an optional dependency installable with woodwork[dask] (GH#357)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
Breaking Changes#
The DataColumn.to_pandas method was renamed to DataColumn.to_series.
The DataTable.to_pandas method was renamed to DataTable.to_dataframe.
copy is no longer a parameter of DataTable.to_dataframe or DataColumn.to_series.
v0.0.4 Oct 21, 2020#
- Enhancements
Add optional include parameter for DataTable.describe() to filter results (GH#228)
Add make_index parameter to DataTable.__init__ to enable optional creation of a new index column (GH#238)
Add support for setting ranking order on columns with Ordinal logical type (GH#240)
Add list_semantic_tags function and CLI to get dataframe of woodwork semantic_tags (GH#244)
Add support for numeric time index on DataTable (GH#267)
Add pop method to DataTable (GH#289)
Add entry point to setup.py to run CLI commands (GH#285)
- Fixes
Allow numeric datetime time indices (GH#282)
Thanks to the following people for contributing to this release: @ctduffy, @gsheni, @tamargrey, @thehomebrewnerd
v0.0.3 Oct 9, 2020#
- Enhancements
Implement setitem on DataTable to create/overwrite an existing DataColumn (GH#165)
Add to_pandas method to DataColumn to access the underlying series (GH#169)
Add list_logical_types function and CLI to get dataframe of woodwork LogicalTypes (GH#172)
Add describe method to DataTable to generate statistics for the underlying data (GH#181)
Add optional return_dataframe parameter to load_retail to return either DataFrame or DataTable (GH#189)
Add get_mutual_information method to DataTable to generate mutual information between columns (GH#203)
Add read_csv function to create DataTable directly from CSV file (GH#222)
- Changes
Remove unnecessary add_standard_tags attribute from DataTable (GH#171)
Remove standard tags from index column and do not return stats for index column from DataTable.describe (GH#196)
Update DataColumn.set_semantic_tags and DataColumn.add_semantic_tags to return new objects (GH#205)
Update various DataTable methods to return new objects rather than modifying in place (GH#210)
Move datetime_format to Datetime LogicalType (GH#216)
Do not calculate mutual info with index column in DataTable.get_mutual_information (GH#221)
Move setting of underlying physical types from DataTable to DataColumn (GH#233)
- Documentation Changes
Remove unused code from sphinx conf.py, update with GitHub URL (GH#160, GH#163)
Update README and docs with new Woodwork logo, with better code snippets (GH#161, GH#159)
Add DataTable and DataColumn to API Reference (GH#162)
Add docstrings to LogicalType classes (GH#168)
Add Woodwork image to index, clear outputs of Jupyter notebook in docs (GH#173)
Update contributing.md, release.md with all instructions (GH#176)
Add section for setting index and time index to start notebook (GH#179)
Rename changelog to Release Notes (GH#193)
Add section for standard tags to start notebook (GH#188)
Add Understanding Types and Tags user guide (GH#201)
Add missing docstring to list_logical_types (GH#202)
Add Woodwork Global Configuration Options guide (GH#215)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
v0.0.2 Sep 28, 2020#
- Fixes
Fix formatting issue when printing global config variables (GH#138)
- Documentation Changes
Add working code example to README and create Using Woodwork page (GH#103)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd
v0.1.0 Sep 24, 2020#
Add natural_language_threshold global config option used for Categorical/NaturalLanguage type inference (GH#135)
Add global config options and add datetime_format option for type inference (GH#134)
Fix bug with Integer and WholeNumber inference in column with pd.NA values (GH#133)
Add DataTable.ltypes property to return series of logical types (GH#131)
Add ability to create new datatable from specified columns with dt[[columns]] (GH#127)
Handle setting and tagging of index and time index columns (GH#125)
Add combined tag and ltype selection (GH#124)
Add changelog, and update changelog check to CI (GH#123)
Implement reset_semantic_tags (GH#118)
Implement DataTable getitem (GH#119)
Add remove_semantic_tags method (GH#117)
Add semantic tag selection (GH#106)
Add github action, rename to woodwork (GH#113)
Add license to setup.py (GH#112)
Reset semantic tags on logical type change (GH#107)
Add standard numeric and category tags (GH#100)
Change semantic_types to semantic_tags, a set of strings (GH#100)
Update dataframe dtypes based on logical types (GH#94)
Add select_logical_types to DataTable (GH#96)
Add pygments to dev-requirements.txt (GH#97)
Add replacing None with np.nan in DataTable init (GH#87)
Refactor DataColumn to make semantic_types and logical_type private (GH#86)
Add pandas_dtype to each Logical Type, and remove dtype attribute on DataColumn (GH#85)
Add set_semantic_types methods on both DataTable and DataColumn (GH#75)
Support passing camel case or snake case strings for setting logical types (GH#74)
Improve flexibility when setting semantic types (GH#72)
Add Whole Number Inference of Logical Types (GH#66)
Add dtypes property to DataTables and repr for DataColumn (GH#61)
Allow specification of semantic types during DataTable creation (GH#69)
Implements set_logical_types on DataTable (GH#65)
Add init files to tests to fix code coverage (GH#60)
Add AutoAssign bot (GH#59)
Add logical types validation in DataTables (GH#49)
Fix working_directory in CI (GH#57)
Add infer_logical_types for DataColumn (GH#45)
Fix ReadME library name, and code coverage badge (GH#56, GH#56)
Add code coverage (GH#51)
Improve and refactor the validation checks during initialization of a DataTable (GH#40)
Add dataframe attribute to DataTable (GH#39)
Update ReadME with minor usage details (GH#37)
Add License (GH#34)
Rename from datatables to datatables (GH#4)
Add Logical Types, DataTable, DataColumn (GH#3)
Add Makefile, setup.py, requirements.txt (GH#2)
Initial Release (GH#1)
Thanks to the following people for contributing to this release: @gsheni, @tamargrey, @thehomebrewnerd