Customizing Logical Types and Type Inference

The default type system in Woodwork contains many built-in LogicalTypes that work for a wide variety of datasets. For situations in which the built-in LogicalTypes are not sufficient, Woodwork allows you to create custom LogicalTypes.

Woodwork also has a set of standard type inference functions that help automatically identify the correct LogicalTypes in the data. You can override these existing functions, or add new functions for inferring any custom LogicalTypes that you add.

This guide provides an overview of how to create custom LogicalTypes as well as how to override and add new type inference functions. If you need to learn more about types and tags in Woodwork, refer to the Understanding Types and Tags guide for more detail.

Viewing Built-In Logical Types

To view all of the default LogicalTypes in Woodwork, use the list_logical_types function. If the existing types are not sufficient for your needs, you can create and register new LogicalTypes for use with Woodwork initialized DataFrames and Series.

[1]:
import woodwork as ww

ww.list_logical_types()
[1]:
name type_string description physical_type standard_tags is_default_type is_registered parent_type
0 Address address Represents Logical Types that contain address ... string {} True True NaturalLanguage
1 Age age Represents Logical Types that contain non-nega... int64 {numeric} True True Integer
2 AgeNullable age_nullable Represents Logical Types that contain non-nega... Int64 {numeric} True True IntegerNullable
3 Boolean boolean Represents Logical Types that contain binary v... bool {} True True BooleanNullable
4 BooleanNullable boolean_nullable Represents Logical Types that contain binary v... boolean {} True True None
5 Categorical categorical Represents Logical Types that contain unordere... category {category} True True None
6 CountryCode country_code Represents Logical Types that contain categori... category {category} True True Categorical
7 Datetime datetime Represents Logical Types that contain date and... datetime64[ns] {} True True None
8 Double double Represents Logical Types that contain positive... float64 {numeric} True True None
9 EmailAddress email_address Represents Logical Types that contain email ad... string {} True True NaturalLanguage
10 Filepath filepath Represents Logical Types that specify location... string {} True True NaturalLanguage
11 IPAddress ip_address Represents Logical Types that contain IP addre... string {} True True NaturalLanguage
12 Integer integer Represents Logical Types that contain positive... int64 {numeric} True True IntegerNullable
13 IntegerNullable integer_nullable Represents Logical Types that contain positive... Int64 {numeric} True True None
14 LatLong lat_long Represents Logical Types that contain latitude... object {} True True None
15 NaturalLanguage natural_language Represents Logical Types that contain text or ... string {} True True None
16 Ordinal ordinal Represents Logical Types that contain ordered ... category {category} True True Categorical
17 PersonFullName person_full_name Represents Logical Types that may contain firs... string {} True True NaturalLanguage
18 PhoneNumber phone_number Represents Logical Types that contain numeric ... string {} True True NaturalLanguage
19 PostalCode postal_code Represents Logical Types that contain a series... category {category} True True Categorical
20 SubRegionCode sub_region_code Represents Logical Types that contain codes re... category {category} True True Categorical
21 Timedelta timedelta Represents Logical Types that contain values s... timedelta64[ns] {} True True None
22 URL url Represents Logical Types that contain URLs, wh... string {} True True NaturalLanguage

Registering a New LogicalType

The first step in registering a new LogicalType is to define the class for the new type. This is done by sub-classing the built-in LogicalType class. There are a few class attributes that should be set when defining this new class. Each is reviewed in more detail below.

This guide works through an example for a dataset that contains UPC Codes. First, create a new UPCCode LogicalType. For this example, consider the UPC Code to be a type of categorical variable.

[2]:
from woodwork.logical_types import LogicalType

class UPCCode(LogicalType):
    """Represents Logical Types that contain 12-digit UPC Codes."""
    primary_dtype = 'category'
    backup_dtype = 'string'
    standard_tags = {'category', 'upc_code'}

When defining the UPCCode LogicalType class, three class attributes were set and a docstring was added. All of these are optional; any attribute that is not set when defining the new type defaults to the value defined on the LogicalType class.

  • primary_dtype: This value specifies how the data will be stored. If the column of the dataframe is not already of this type, Woodwork will convert the data to this dtype. This should be specified as a string that represents a valid pandas dtype. If not specified, this will default to 'string'.

  • backup_dtype: This is primarily useful when working with Koalas dataframes. backup_dtype specifies the dtype to use if Woodwork is unable to convert to the dtype specified by primary_dtype. In our example, we set this to 'string' since Koalas does not currently support the 'category' dtype.

  • standard_tags: This is a set of semantic tags to apply to any column that is set with the specified LogicalType. If not specified, standard_tags will default to an empty set.

  • docstring: Adding a docstring for the class is optional, but if specified, this text will be used for adding a description of the type in the list of available types returned by ww.list_logical_types().
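
The fallback behavior described in the list above is ordinary Python class-attribute inheritance. The sketch below illustrates it with a hypothetical stand-in base class, `BaseType`, rather than Woodwork's real `LogicalType`, whose internals are not shown in this guide. The `'string'` and empty-set defaults mirror the ones stated above; the `backup_dtype` default of `None` and the `describe` helper are assumptions made for illustration.

```python
class BaseType:
    """Hypothetical stand-in for a logical type base class (not Woodwork's)."""
    primary_dtype = 'string'   # default stated in the list above
    backup_dtype = None        # assumed default, for illustration only
    standard_tags = set()      # default stated in the list above


class UPCCodeSketch(BaseType):
    """Represents Logical Types that contain 12-digit UPC Codes."""
    primary_dtype = 'category'
    standard_tags = {'category', 'upc_code'}
    # backup_dtype is not set here, so it falls back to BaseType's value.


def describe(type_cls):
    """Collect the attributes a type listing could report for a class."""
    return {
        'name': type_cls.__name__,
        'description': type_cls.__doc__,
        'physical_type': type_cls.primary_dtype,
        'backup_dtype': type_cls.backup_dtype,
        'standard_tags': type_cls.standard_tags,
    }


info = describe(UPCCodeSketch)
print(info['physical_type'])   # overridden in the subclass
print(info['backup_dtype'])    # inherited from the base class
```

The class docstring is picked up through `__doc__`, which is why adding one is enough to give a type a description.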

Note

Behind the scenes, Woodwork uses the category and numeric semantic tags to determine whether a column is categorical or numeric, respectively. If the new LogicalType you define represents a categorical or numeric type, you should include the appropriate tag in the set of tags specified for standard_tags.

Now that you have created the new LogicalType, you can register it with the Woodwork type system so you can use it. All modifications to the type system are performed by calling the appropriate method on the ww.type_system object.

[3]:
ww.type_system.add_type(UPCCode, parent='Categorical')

If you once again list the available LogicalTypes, you will see the new type you created was added to the list, including the values for description, physical_type and standard_tags specified when defining the UPCCode LogicalType.

[4]:
ww.list_logical_types()
[4]:
name type_string description physical_type standard_tags is_default_type is_registered parent_type
0 Address address Represents Logical Types that contain address ... string {} True True NaturalLanguage
1 Age age Represents Logical Types that contain non-nega... int64 {numeric} True True Integer
2 AgeNullable age_nullable Represents Logical Types that contain non-nega... Int64 {numeric} True True IntegerNullable
3 Boolean boolean Represents Logical Types that contain binary v... bool {} True True BooleanNullable
4 BooleanNullable boolean_nullable Represents Logical Types that contain binary v... boolean {} True True None
5 Categorical categorical Represents Logical Types that contain unordere... category {category} True True None
6 CountryCode country_code Represents Logical Types that contain categori... category {category} True True Categorical
7 Datetime datetime Represents Logical Types that contain date and... datetime64[ns] {} True True None
8 Double double Represents Logical Types that contain positive... float64 {numeric} True True None
9 EmailAddress email_address Represents Logical Types that contain email ad... string {} True True NaturalLanguage
10 Filepath filepath Represents Logical Types that specify location... string {} True True NaturalLanguage
11 IPAddress ip_address Represents Logical Types that contain IP addre... string {} True True NaturalLanguage
12 Integer integer Represents Logical Types that contain positive... int64 {numeric} True True IntegerNullable
13 IntegerNullable integer_nullable Represents Logical Types that contain positive... Int64 {numeric} True True None
14 LatLong lat_long Represents Logical Types that contain latitude... object {} True True None
15 NaturalLanguage natural_language Represents Logical Types that contain text or ... string {} True True None
16 Ordinal ordinal Represents Logical Types that contain ordered ... category {category} True True Categorical
17 PersonFullName person_full_name Represents Logical Types that may contain firs... string {} True True NaturalLanguage
18 PhoneNumber phone_number Represents Logical Types that contain numeric ... string {} True True NaturalLanguage
19 PostalCode postal_code Represents Logical Types that contain a series... category {category} True True Categorical
20 SubRegionCode sub_region_code Represents Logical Types that contain codes re... category {category} True True Categorical
21 Timedelta timedelta Represents Logical Types that contain values s... timedelta64[ns] {} True True None
22 UPCCode upc_code Represents Logical Types that contain 12-digit... category {upc_code, category} False True Categorical
23 URL url Represents Logical Types that contain URLs, wh... string {} True True NaturalLanguage

Logical Type Relationships

When adding a new type to the type system, you can specify an optional parent LogicalType, as done above. When performing type inference, a given set of data might match multiple LogicalTypes. Woodwork uses the parent-child relationship defined when registering a type to determine which type to infer in this case.

When multiple matches are found, Woodwork will return the most specific type match found. By setting the parent type to Categorical when registering the UPCCode LogicalType, you are telling Woodwork that if a data column matches both Categorical and UPCCode during inference, the column should be considered as UPCCode as this is more specific than Categorical. Woodwork always assumes that a child type is a more specific version of the parent type.
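
The "most specific match" rule can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Woodwork's actual implementation: the `PARENTS` map, `is_ancestor`, and `most_specific` are hypothetical names, and the relationships are modeled as a simple child-to-parent dictionary.

```python
# Hypothetical child -> parent map; None marks a root-level type.
PARENTS = {
    'NaturalLanguage': None,
    'Categorical': None,
    'UPCCode': 'Categorical',
}


def is_ancestor(ancestor, type_name, parents):
    """Return True if `ancestor` appears above `type_name` in the graph."""
    current = parents[type_name]
    while current is not None:
        if current == ancestor:
            return True
        current = parents[current]
    return False


def most_specific(matches, parents):
    """Return the first matching type that has no matching descendant."""
    for candidate in matches:
        if not any(is_ancestor(candidate, other, parents) for other in matches):
            return candidate
    return None


print(most_specific(['Categorical', 'UPCCode'], PARENTS))  # UPCCode
print(most_specific(['Categorical'], PARENTS))             # Categorical
```

In the first call, Categorical is skipped because one of the other matches, UPCCode, is its descendant. UPCCode has no matching descendants, so it is returned.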

Working with Custom LogicalTypes

Next, you will create a small sample DataFrame to demonstrate use of the new custom type. This sample DataFrame includes an id column, a column with valid UPC Codes, and a column that should not be considered UPC Codes because it contains non-numeric values.

[5]:
import pandas as pd
df = pd.DataFrame({
    'id': [0, 1, 2, 3],
    'code': ['012345412359', '122345712358', '012345412359', '022323413459'],
    'not_upc': ['abcdefghijkl', '122345712358', '012345412359', '022323413459']
})

Before using this dataframe, update Woodwork’s default threshold for differentiating between a NaturalLanguage and Categorical column so that Woodwork will correctly recognize the code column as a Categorical column. After setting the threshold, initialize Woodwork and verify that Woodwork has identified our column as Categorical.

[6]:
ww.config.set_option('natural_language_threshold', 12)
df.ww.init()
df.ww
[6]:
Column   Physical Type  Logical Type  Semantic Tag(s)
id       int64          Integer       ['numeric']
code     category       Categorical   ['category']
not_upc  category       Categorical   ['category']

Woodwork did not identify the code column as having a UPCCode LogicalType because you have not yet defined an inference function to use with this type. The inference function is what tells Woodwork how to match columns to specific LogicalTypes.

Even without the inference function, you can manually tell Woodwork that the code column should be of type UPCCode. This will set the physical type properly and apply the standard semantic tags you have defined.

[7]:
df.ww.init(logical_types={'code': 'UPCCode'})
df.ww
[7]:
Column   Physical Type  Logical Type  Semantic Tag(s)
id       int64          Integer       ['numeric']
code     category       UPCCode       ['upc_code', 'category']
not_upc  category       Categorical   ['category']

Next, add a new inference function and allow Woodwork to automatically set the correct type for the code column.

Defining Custom Inference Functions

The first step in adding an inference function for the UPCCode LogicalType is to define an appropriate function. Inference functions always accept a single parameter, a pandas.Series. The function should return True if the series is a match for the LogicalType with which the function is associated, or False if the series is not a match.

For the UPCCode LogicalType, define a function to check that all of the values in a column are 12 character strings that contain only numbers. Note, this function is for demonstration purposes only and may not catch all cases that need to be considered for properly identifying a UPC Code.

[8]:
def infer_upc_code(series):
    # Make sure the series contains only strings
    if not series.apply(type).eq(str).all():
        return False
    # Check that all items are 12 characters long
    if all(series.str.len() == 12):
        # Try to convert to a number; non-numeric strings raise an error
        try:
            series.astype('int')
            return True
        except (ValueError, TypeError, OverflowError):
            return False
    return False
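
As noted above, the demonstration function does not catch every case. One check it omits is the UPC-A check digit: the twelfth digit is derived from the first eleven, so a random 12-digit string is usually not a valid code. The sketch below shows that calculation on a single plain string. The helper name `has_valid_upc_check_digit` is hypothetical and is not part of Woodwork.

```python
def has_valid_upc_check_digit(code):
    """Return True if `code` is 12 ASCII digits with a correct UPC-A check digit."""
    if len(code) != 12 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Positions 1, 3, ..., 11 (indexes 0, 2, ..., 10) are weighted by 3.
    odd_sum = sum(digits[0:11:2])
    # Positions 2, 4, ..., 10 (indexes 1, 3, ..., 9) are weighted by 1.
    even_sum = sum(digits[1:10:2])
    expected = (10 - (3 * odd_sum + even_sum) % 10) % 10
    return digits[11] == expected


print(has_valid_upc_check_digit('036000291452'))  # True
print(has_valid_upc_check_digit('036000291453'))  # False
```

A stricter inference function could apply a check like this to every value in the series. The sample values used in this guide are illustrative and are not guaranteed to pass it.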

After defining the new UPC Code inference function, add it to the Woodwork type system so it can be used when inferring column types.

[9]:
ww.type_system.update_inference_function('UPCCode', inference_function=infer_upc_code)

After updating the inference function, you can reinitialize Woodwork on the DataFrame. Notice that Woodwork has correctly identified the code column as having a LogicalType of UPCCode, has correctly set the physical type, and has added the standard tags to the semantic tags for that column.

Also note that the not_upc column was identified as Categorical. Even though this column contains 12-character strings, one of the values contains letters, and the inference function correctly told Woodwork this was not valid for the UPCCode LogicalType.

[10]:
df.ww.init()
df.ww
[10]:
Column   Physical Type  Logical Type  Semantic Tag(s)
id       int64          Integer       ['numeric']
code     category       UPCCode       ['upc_code', 'category']
not_upc  category       Categorical   ['category']

Overriding Default Inference Functions

Overriding the default inference functions is done with the update_inference_function TypeSystem method. Simply pass in the LogicalType for which you want to override the function, along with the new function to use.

For example, you can tell Woodwork to use the new infer_upc_code function for the built-in Categorical LogicalType.

[11]:
ww.type_system.update_inference_function('Categorical', inference_function=infer_upc_code)

If you initialize Woodwork on a DataFrame after updating the Categorical function, you can see that the not_upc column is no longer identified as a Categorical column, but is rather set to the default NaturalLanguage LogicalType. This is because the letters in the first row of the not_upc column cause our inference function to return False for this column, while the default Categorical function will allow non-numeric values to be present. After updating the inference function, this column is no longer considered a match for the Categorical type, nor does the column match any other logical types. As a result, the LogicalType is set to NaturalLanguage, the default type used when no type matches are found.

[12]:
df.ww.init()
df.ww
[12]:
Column   Physical Type  Logical Type     Semantic Tag(s)
id       int64          Integer          ['numeric']
code     category       UPCCode          ['upc_code', 'category']
not_upc  string         NaturalLanguage  []

Updating LogicalType Relationships

If you need to change the parent for a registered LogicalType, you can do this with the update_relationship method. Update the new UPCCode LogicalType to be a child of NaturalLanguage instead.

[13]:
ww.type_system.update_relationship('UPCCode', parent='NaturalLanguage')

The parent for a logical type can also be set to None to indicate this is a root-level LogicalType that is not a child of any other existing LogicalType.

[14]:
ww.type_system.update_relationship('UPCCode', parent=None)

Setting the proper parent-child relationships between logical types is important. Because Woodwork will return the most specific LogicalType match found during inference, improper inference can occur if the relationships are not set correctly.

As an example, if you initialize Woodwork after setting the UPCCode LogicalType to have a parent of None, you will now see that the UPC Code column is inferred as Categorical instead of UPCCode. After setting the parent to None, UPCCode and Categorical are now siblings in the relationship graph instead of having a parent-child relationship as they did previously. When Woodwork finds multiple matches on the same level in the relationship graph, the first match is returned, which in this case is Categorical. Without proper parent-child relationships set, Woodwork is unable to determine which LogicalType is most specific.

[15]:
df.ww.init()
df.ww
[15]:
Column   Physical Type  Logical Type     Semantic Tag(s)
id       int64          Integer          ['numeric']
code     category       Categorical      ['category']
not_upc  string         NaturalLanguage  []

Removing a LogicalType

If a LogicalType is no longer needed, or is unwanted, it can be removed from the type system with the remove_type method. When a LogicalType has been removed, a value of False will be present in the is_registered column for the type. If a LogicalType that has children is removed, all of the children types will have their parent set to the parent of the LogicalType that is being removed, assuming a parent was defined.
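
The re-linking of children described above can be sketched with a simple child-to-parent dictionary. This is an illustration only, assuming a hypothetical `remove_type` helper that is unrelated to Woodwork's own method. It models just the registered relationships, so the removed type is dropped from the map rather than kept with an is_registered flag.

```python
def remove_type(parents, type_name):
    """Remove a type from a child -> parent map and re-link its children."""
    new_parent = parents.pop(type_name)
    for child, parent in parents.items():
        if parent == type_name:
            # The child now points at the removed type's own parent.
            parents[child] = new_parent
    return parents


# Relationships taken from the parent_type column shown earlier in this guide.
parents = {
    'IntegerNullable': None,
    'Integer': 'IntegerNullable',
    'Age': 'Integer',
}

remove_type(parents, 'Integer')
print(parents)  # {'IntegerNullable': None, 'Age': 'IntegerNullable'}
```

If the removed type had no parent, `new_parent` is `None` and its children become root-level types in this sketch.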

Remove the custom UPCCode type and confirm it has been removed from the type system by listing the available LogicalTypes. You can confirm that the UPCCode type will no longer be used because it will have a value of False listed in the is_registered column.

[16]:
ww.type_system.remove_type('UPCCode')
ww.list_logical_types()
[16]:
name type_string description physical_type standard_tags is_default_type is_registered parent_type
0 Address address Represents Logical Types that contain address ... string {} True True NaturalLanguage
1 Age age Represents Logical Types that contain non-nega... int64 {numeric} True True Integer
2 AgeNullable age_nullable Represents Logical Types that contain non-nega... Int64 {numeric} True True IntegerNullable
3 Boolean boolean Represents Logical Types that contain binary v... bool {} True True BooleanNullable
4 BooleanNullable boolean_nullable Represents Logical Types that contain binary v... boolean {} True True None
5 Categorical categorical Represents Logical Types that contain unordere... category {category} True True None
6 CountryCode country_code Represents Logical Types that contain categori... category {category} True True Categorical
7 Datetime datetime Represents Logical Types that contain date and... datetime64[ns] {} True True None
8 Double double Represents Logical Types that contain positive... float64 {numeric} True True None
9 EmailAddress email_address Represents Logical Types that contain email ad... string {} True True NaturalLanguage
10 Filepath filepath Represents Logical Types that specify location... string {} True True NaturalLanguage
11 IPAddress ip_address Represents Logical Types that contain IP addre... string {} True True NaturalLanguage
12 Integer integer Represents Logical Types that contain positive... int64 {numeric} True True IntegerNullable
13 IntegerNullable integer_nullable Represents Logical Types that contain positive... Int64 {numeric} True True None
14 LatLong lat_long Represents Logical Types that contain latitude... object {} True True None
15 NaturalLanguage natural_language Represents Logical Types that contain text or ... string {} True True None
16 Ordinal ordinal Represents Logical Types that contain ordered ... category {category} True True Categorical
17 PersonFullName person_full_name Represents Logical Types that may contain firs... string {} True True NaturalLanguage
18 PhoneNumber phone_number Represents Logical Types that contain numeric ... string {} True True NaturalLanguage
19 PostalCode postal_code Represents Logical Types that contain a series... category {category} True True Categorical
20 SubRegionCode sub_region_code Represents Logical Types that contain codes re... category {category} True True Categorical
21 Timedelta timedelta Represents Logical Types that contain values s... timedelta64[ns] {} True True None
22 UPCCode upc_code Represents Logical Types that contain 12-digit... category {upc_code, category} False False None
23 URL url Represents Logical Types that contain URLs, wh... string {} True True NaturalLanguage

Resetting Type System to Defaults

Finally, if you made multiple changes to the default Woodwork type system and would like to reset everything back to the default state, you can use the reset_defaults method as shown below. This unregisters any new types you have registered, resets all relationships to their default values and sets all inference functions back to their default functions.

[17]:
ww.type_system.reset_defaults()

Overriding Default LogicalTypes

There may be times when you would like to override Woodwork’s default LogicalTypes. An example might be if you wanted to use the nullable Int64 dtype for the Integer LogicalType instead of the default dtype of int64. In this case, you want to stop Woodwork from inferring the default Integer LogicalType and have a compatible Logical Type inferred instead. You may solve this issue in one of two ways.

First, you can create an entirely new LogicalType with its own name, MyInteger, and register it in the TypeSystem. If you want to infer it in place of the normal Integer LogicalType, remove Integer from the type system and use Integer’s default inference function for MyInteger. As a result, MyInteger will be inferred anywhere that Integer would have been inferred previously. Note that because Integer has a parent LogicalType of IntegerNullable, you also need to set the parent of MyInteger to be IntegerNullable when registering it with the type system.

[18]:
from woodwork.logical_types import LogicalType

class MyInteger(LogicalType):
    primary_dtype = 'Int64'
    standard_tags = {'numeric'}

int_inference_fn = ww.type_system.inference_functions[ww.logical_types.Integer]

ww.type_system.remove_type(ww.logical_types.Integer)
ww.type_system.add_type(MyInteger, int_inference_fn, parent='IntegerNullable')

df.ww.init()
df.ww
[18]:
Column   Physical Type  Logical Type  Semantic Tag(s)
id       Int64          MyInteger     ['numeric']
code     category       Categorical   ['category']
not_upc  category       Categorical   ['category']

Above, you can see that the id column, which was previously inferred as Integer, is now inferred as MyInteger with the Int64 physical type. In the full list of LogicalTypes returned by ww.list_logical_types(), Integer and MyInteger will now both be present, but the value of is_registered will be False for Integer and True for MyInteger.

The second option for overriding the default Logical Types allows you to create a new LogicalType with the same name as an existing one. This might be desirable because it will allow Woodwork to interpret the string 'Integer' as your new LogicalType, allowing previous code that might have selected 'Integer' to be used without updating references to a new LogicalType like MyInteger.

Before adding a LogicalType whose name already exists into the TypeSystem, you must first unregister the default LogicalType.

To avoid a local name collision between the two Integer classes, it is recommended to reference Woodwork’s default LogicalType as ww.logical_types.Integer.

[19]:
ww.type_system.reset_defaults()

class Integer(LogicalType):
    primary_dtype = 'Int64'
    standard_tags = {'numeric'}

int_inference_fn = ww.type_system.inference_functions[ww.logical_types.Integer]

ww.type_system.remove_type(ww.logical_types.Integer)
ww.type_system.add_type(Integer, int_inference_fn, parent='IntegerNullable')

df.ww.init()
display(df.ww)
ww.type_system.reset_defaults()
Column   Physical Type  Logical Type  Semantic Tag(s)
id       Int64          Integer       ['numeric']
code     category       Categorical   ['category']
not_upc  category       Categorical   ['category']

Notice how id now gets inferred as an Integer Logical Type that has Int64 as its Physical Type!