Woodwork contains global configuration options that you can use to control the behavior of certain aspects of Woodwork. This guide provides an overview of working with those options, including viewing the current settings and updating the config values.
To display the current configuration options, import Woodwork and evaluate ww.config, as shown below.
import woodwork as ww

ww.config

Woodwork Global Config Settings
-------------------------------
categorical_threshold: 10
numeric_categorical_threshold: -1
email_inference_regex: (^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
The output of ww.config lists each available config variable followed by its current setting. In the output above, categorical_threshold is set to 10 and numeric_categorical_threshold is set to -1.
To update a config variable, call the ww.config.set_option function. It takes two arguments: the name of the config variable to update and the new value to set.
As an example, update the categorical_threshold config variable to have a value of 25 instead of the default value of 10.
ww.config.set_option('categorical_threshold', 25)
ww.config

Woodwork Global Config Settings
-------------------------------
categorical_threshold: 25
numeric_categorical_threshold: -1
email_inference_regex: (^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
As you can see from the output above, the value for the categorical_threshold config variable has been updated to 25.
Settings can also be temporarily updated in the context of a with block by using ww.config.with_options:
with ww.config.with_options(categorical_threshold=35):
    # Do something
    print("Temporary settings:\n")
    print(repr(ww.config), "\n")
print("Restored settings:\n")
print(repr(ww.config))

Temporary settings:

Woodwork Global Config Settings
-------------------------------
categorical_threshold: 35
numeric_categorical_threshold: -1
email_inference_regex: (^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)

Restored settings:

Woodwork Global Config Settings
-------------------------------
categorical_threshold: 25
numeric_categorical_threshold: -1
email_inference_regex: (^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)
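The restore-on-exit behavior shown above can be pictured with a small stand-in. The sketch below is not Woodwork's implementation: SketchConfig and its internals are hypothetical names, used only to illustrate how a temporary override built on contextlib.contextmanager could save the current values, apply the new ones, and put the old values back when the with block ends.

```python
from contextlib import contextmanager


class SketchConfig:
    """Hypothetical stand-in for a global config object (not Woodwork's code)."""

    def __init__(self, **defaults):
        self._defaults = dict(defaults)
        self._values = dict(defaults)

    def get_option(self, key):
        return self._values[key]

    def set_option(self, key, value):
        if key not in self._values:
            raise KeyError(f"Unknown option: {key}")
        self._values[key] = value

    def reset_option(self, key):
        self._values[key] = self._defaults[key]

    @contextmanager
    def with_options(self, **options):
        # Remember the current values so they can be restored afterwards.
        previous = {key: self.get_option(key) for key in options}
        for key, value in options.items():
            self.set_option(key, value)
        try:
            yield self
        finally:
            # Runs on normal exit and when the block raises.
            for key, value in previous.items():
                self.set_option(key, value)


config = SketchConfig(categorical_threshold=10)
config.set_option("categorical_threshold", 25)

with config.with_options(categorical_threshold=35):
    inside = config.get_option("categorical_threshold")

after = config.get_option("categorical_threshold")
config.reset_option("categorical_threshold")
default = config.get_option("categorical_threshold")
print(inside, after, default)
```

In this sketch the value seen inside the block is the temporary one, the value after the block is whatever was set before entering it, and resetting returns to the default, mirroring the sequence of calls in this guide.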
To read the value of a specific config variable, call the ww.config.get_option function with the name of that variable.
ww.config.get_option('categorical_threshold')
Config variables can be reset to their default values using the ww.config.reset_option function, passing in the name of the variable to reset.
As an example, reset the categorical_threshold config variable to its default value.
ww.config.reset_option('categorical_threshold')
ww.config
This section provides an overview of the current config options that can be set within Woodwork.
The categorical_threshold config variable helps control the distinction between the Categorical and Unknown logical types during type inference. More specifically, this threshold is the average string length used to distinguish between the two types. If the average string length in a column is greater than this threshold, the column is inferred as an Unknown column; otherwise, it is inferred as a Categorical column. The categorical_threshold config variable defaults to 10.
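The rule described above can be sketched in plain Python. This is a simplified illustration, not Woodwork's inference code, which may apply additional checks: the function name infer_string_type is hypothetical, and treating a column with no non-missing values as Unknown is an assumption made only to keep the example total.

```python
def infer_string_type(values, categorical_threshold=10):
    """Classify a string column by average string length (illustrative only)."""
    strings = [str(v) for v in values if v is not None]
    if not strings:
        # Assumption for this sketch: nothing to measure, so report Unknown.
        return "Unknown"
    average_length = sum(len(s) for s in strings) / len(strings)
    # Strictly greater than the threshold means Unknown; otherwise Categorical.
    return "Unknown" if average_length > categorical_threshold else "Categorical"


colors = ["red", "green", "blue"]  # average length 4
sentences = ["The quick brown fox", "jumps over the lazy dog"]  # average length 21

print(infer_string_type(colors))
print(infer_string_type(sentences))
print(infer_string_type(sentences, categorical_threshold=25))
```

With the default threshold of 10, the short color names fall under the threshold and the longer sentences exceed it; raising the threshold to 25 moves the sentences to the Categorical side.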
Woodwork provides the option to infer numeric columns as the Categorical logical type if they have few enough unique values. The numeric_categorical_threshold config variable sets the threshold of unique values below which numeric columns are inferred as categorical. The default threshold is -1, so numeric columns are never inferred as categorical by default: a column cannot have fewer than zero unique values, so its count can never fall below -1.
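A minimal sketch of that comparison follows. It is not Woodwork's implementation: the function name numeric_is_categorical is hypothetical, and the strict "fewer than" comparison is this sketch's reading of "below which" in the description above.

```python
def numeric_is_categorical(values, numeric_categorical_threshold=-1):
    """Return True if the unique-value count is below the threshold (illustrative only)."""
    unique_count = len({v for v in values if v is not None})
    # With the default of -1 this can never be True, since unique_count >= 0.
    return unique_count < numeric_categorical_threshold


ratings = [1, 2, 3, 1, 2, 3, 3]  # three unique values

print(numeric_is_categorical(ratings))
print(numeric_is_categorical(ratings, numeric_categorical_threshold=5))
```

Under these assumptions the default leaves the column numeric, while a threshold of 5 would treat the three-valued ratings column as categorical.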
Woodwork provides the option to infer string columns as the EmailAddress logical type if a representative sample of valid (non-missing) rows all match a given regular expression. The email_inference_regex config variable allows users to set the regular expression that is used during this matching process. The default regex is r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)" (taken from https://emailregex.com/).
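The matching step can be tried out with the standard re module. The sketch below uses the default regex quoted above; the function name looks_like_email_column is hypothetical, the sample is supplied by hand rather than drawn the way Woodwork draws it, and returning False for a sample with no non-missing values is an assumption.

```python
import re

# Default pattern quoted in this guide.
EMAIL_REGEX = r"(^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$)"


def looks_like_email_column(sample, pattern=EMAIL_REGEX):
    """Return True if every non-missing value in the sample matches the pattern."""
    valid = [v for v in sample if v is not None]
    if not valid:
        # Assumption for this sketch: an all-missing sample is not an email column.
        return False
    compiled = re.compile(pattern)
    return all(compiled.match(v) is not None for v in valid)


emails = ["user@example.com", None, "first.last@mail.example.org"]
mixed = ["user@example.com", "not an email"]

print(looks_like_email_column(emails))
print(looks_like_email_column(mixed))
```

A single non-matching value is enough to reject the sample here, which reflects the "all match" wording in the description.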