Working with Types and Tags#
Using Woodwork effectively requires a good understanding of physical types, logical types, and semantic tags, all concepts that are core to Woodwork. This guide provides a detailed overview of types and tags, as well as how to work with them.
Definitions of Types and Tags#
Woodwork has been designed to allow users to easily specify additional typing information for a DataFrame while providing the ability to interface with the data based on the typing information. Because a single DataFrame might store various types of data like numbers, text, or dates in different columns, the additional information is defined on a per-column basis.
There are 3 main ways that Woodwork stores additional information about user data:
Physical Type: defines how the data is stored on disk or in memory.
Logical Type: defines how the data should be parsed or interpreted.
Semantic Tag(s): provides additional data about the meaning of the data or how it should be used.
Physical Types#
Physical types define how the data is stored on disk or in memory. You might also see the physical type for a column referred to as the column’s dtype.
For example, commonly used Pandas dtypes include object, int64, float64, and datetime64[ns], though there are many more. Woodwork uses 10 different physical types, each corresponding to a Pandas dtype. When Woodwork is initialized on a DataFrame, the dtype of the underlying data is converted to one of these values, if it isn’t one of them already:
bool
boolean
category
datetime64[ns]
float64
int64
Int64
object
string
timedelta64[ns]
The physical type conversion is done based on the LogicalType that has been specified or inferred for a given column.
Logical Types#
Logical types define how data should be interpreted or parsed. Logical types provide an additional level of detail beyond the physical type. Some columns might share the same physical type, but might have different parsing requirements depending on the information that is stored in the column.
For example, email addresses and phone numbers would typically both be stored in a data column with a physical type of string. However, when reading and validating these two types of information, different rules apply. For email addresses, the presence of the @ symbol is important. For phone numbers, you might want to confirm that only a certain number of digits are present, and special characters might be restricted to +, -, ( or ). In this particular example, Woodwork defines two different logical types to separate these parsing needs: EmailAddress and PhoneNumber.
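To make the idea of type-specific parsing rules concrete, here is a minimal sketch that uses only Python's standard library. The two patterns and the function names looks_like_email and looks_like_phone are hypothetical simplifications made up for this illustration; they are not the rules Woodwork itself applies when inferring or validating EmailAddress and PhoneNumber columns.

```python
import re

# Hypothetical, deliberately simple patterns for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_ALLOWED = re.compile(r"^[0-9+\-() ]+$")


def looks_like_email(value: str) -> bool:
    """Require exactly one @ between a local part and a dotted domain."""
    return EMAIL_PATTERN.match(value) is not None


def looks_like_phone(value: str, min_digits: int = 7, max_digits: int = 15) -> bool:
    """Allow only digits, spaces, and + - ( ), and bound the digit count."""
    if PHONE_ALLOWED.match(value) is None:
        return False
    digit_count = sum(ch.isdigit() for ch in value)
    return min_digits <= digit_count <= max_digits


# Both kinds of value are plain strings, yet each needs its own check.
samples = ["jane.doe@example.com", "+1 (555) 010-9999", "not an email"]
for sample in samples:
    print(sample, looks_like_email(sample), looks_like_phone(sample))
```

The point of the sketch is only that two columns with the same physical type can need different validation logic, which is the distinction a logical type records.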
There are many different logical types defined within Woodwork. To get a complete list of all the available logical types, you can use the list_logical_types function.
[1]:
from woodwork import list_logical_types
list_logical_types()
[1]:
name | type_string | description | physical_type | standard_tags | is_default_type | is_registered | parent_type | |
---|---|---|---|---|---|---|---|---|
0 | Address | address | Represents Logical Types that contain address ... | string | {} | True | True | None |
1 | Age | age | Represents Logical Types that contain whole nu... | int64 | {numeric} | True | True | Integer |
2 | AgeFractional | age_fractional | Represents Logical Types that contain non-nega... | float64 | {numeric} | True | True | Double |
3 | AgeNullable | age_nullable | Represents Logical Types that contain whole nu... | Int64 | {numeric} | True | True | IntegerNullable |
4 | Boolean | boolean | Represents Logical Types that contain binary v... | bool | {} | True | True | BooleanNullable |
5 | BooleanNullable | boolean_nullable | Represents Logical Types that contain binary v... | boolean | {} | True | True | None |
6 | Categorical | categorical | Represents Logical Types that contain unordere... | category | {category} | True | True | None |
7 | CountryCode | country_code | Represents Logical Types that use the ISO-3166... | category | {category} | True | True | Categorical |
8 | CurrencyCode | currency_code | Represents Logical Types that use the ISO-4217... | category | {category} | True | True | Categorical |
9 | Datetime | datetime | Represents Logical Types that contain date and... | datetime64[ns] | {} | True | True | None |
10 | Double | double | Represents Logical Types that contain positive... | float64 | {numeric} | True | True | None |
11 | EmailAddress | email_address | Represents Logical Types that contain email ad... | string | {} | True | True | Unknown |
12 | Filepath | filepath | Represents Logical Types that specify location... | string | {} | True | True | None |
13 | IPAddress | ip_address | Represents Logical Types that contain IP addre... | string | {} | True | True | Unknown |
14 | Integer | integer | Represents Logical Types that contain positive... | int64 | {numeric} | True | True | IntegerNullable |
15 | IntegerNullable | integer_nullable | Represents Logical Types that contain positive... | Int64 | {numeric} | True | True | None |
16 | LatLong | lat_long | Represents Logical Types that contain latitude... | object | {} | True | True | None |
17 | NaturalLanguage | natural_language | Represents Logical Types that contain text or ... | string | {} | True | True | None |
18 | Ordinal | ordinal | Represents Logical Types that contain ordered ... | category | {category} | True | True | Categorical |
19 | PersonFullName | person_full_name | Represents Logical Types that may contain firs... | string | {} | True | True | None |
20 | PhoneNumber | phone_number | Represents Logical Types that contain numeric ... | string | {} | True | True | Unknown |
21 | PostalCode | postal_code | Represents Logical Types that contain a series... | category | {category} | True | True | Categorical |
22 | SubRegionCode | sub_region_code | Represents Logical Types that use the ISO-3166... | category | {category} | True | True | Categorical |
23 | Timedelta | timedelta | Represents Logical Types that contain values s... | timedelta64[ns] | {} | True | True | Unknown |
24 | URL | url | Represents Logical Types that contain URLs, wh... | string | {} | True | True | Unknown |
25 | Unknown | unknown | Represents Logical Types that cannot be inferr... | string | {} | True | True | None |
In the table, notice that each logical type has a specific physical_type value associated with it. Any time a logical type is set for a column, the physical type of the underlying data is converted to the type shown in the physical_type column. There is only one physical type associated with each logical type.
Semantic Tags#
Semantic tags provide more context about the meaning of a data column. This could directly affect how the information contained in the column is interpreted. Unlike physical types and logical types, semantic tags are much less restrictive. A column might contain many semantic tags or none at all. Regardless, when assigning semantic tags, users should take care to not assign tags that have conflicting meanings.
As an example of how semantic tags can be useful, consider a dataset with 2 date columns: a signup date and a user birth date. Both of these columns have the same physical type (datetime64[ns]), and both have the same logical type (Datetime). However, semantic tags can be used to differentiate these columns. For example, you might want to add the date_of_birth semantic tag to the user birth date column to indicate this column has special meaning and could be used to compute a user’s age. Computing an age from the signup date column would not make sense, so the semantic tag can be used to differentiate between what the dates in these columns mean.
Standard Semantic Tags#
As you can see from the table generated with the list_logical_types function above, Woodwork applies some standard tags to certain columns by default. These standard semantic tags are added to columns whose LogicalTypes fall under certain predefined categories.
The standard tags are as follows:
'numeric' - The tag applied to numeric Logical Types:
Age
AgeFractional
AgeNullable
Integer
IntegerNullable
Double
'category' - The tag applied to Logical Types that represent categorical variables:
Categorical
CountryCode
CurrencyCode
Ordinal
PostalCode
SubRegionCode
There are also 2 tags that get added to index columns. If no index columns have been specified, these tags are not present:
'index' - on the index column, when specified
'time_index' - on the time index column, when specified
The application of standard tags, excluding the index and time_index tags, which have special meaning, can be controlled by the user. This is discussed in more detail in the Working with Semantic Tags section. There are a few different semantic tags defined within Woodwork. To get a list of the standard, index, and time index tags, you can use the list_semantic_tags function.
[2]:
from woodwork import list_semantic_tags
list_semantic_tags()
[2]:
name | is_standard_tag | valid_logical_types | |
---|---|---|---|
0 | numeric | True | [Age, AgeFractional, AgeNullable, Double, Inte... |
1 | category | True | [Categorical, CountryCode, CurrencyCode, Ordin... |
2 | index | False | Any LogicalType |
3 | time_index | False | [Datetime, Age, AgeFractional, AgeNullable, Do... |
4 | date_of_birth | False | [Datetime] |
5 | ignore | False | Any LogicalType |
6 | passthrough | False | Any LogicalType |
Working with Logical Types#
When initializing Woodwork, users have the option to specify the logical types for all, some, or none of the columns in the underlying DataFrame. If logical types are defined for all of the columns, these logical types are used directly, provided the data is compatible with the specified logical type. You can’t, for example, use a logical type of Integer on a column that contains text values that can’t be converted to integers.
If users don’t supply any logical type information during initialization, Woodwork infers the logical types based on the physical type of the column and the information contained in the columns. If the user passes information for some of the columns, the logical types are inferred for any columns not specified.
These scenarios are illustrated in this section. To start, create a simple DataFrame to use for this example.
[3]:
import pandas as pd
import woodwork as ww
df = pd.DataFrame(
    {
        "integers": [-2, 30, 20],
        "bools": [True, False, True],
        "names": ["Jane Doe", "Bill Smith", "John Hancock"],
    }
)
df
[3]:
integers | bools | names | |
---|---|---|---|
0 | -2 | True | Jane Doe |
1 | 30 | False | Bill Smith |
2 | 20 | True | John Hancock |
Importing Woodwork creates a special namespace on the DataFrame, called ww, that can be used to initialize and modify Woodwork information for a DataFrame. Now that you’ve created the data to use for the example, you can initialize Woodwork on this DataFrame, assigning logical type values to each of the columns. Then view the types stored for each column by using the DataFrame.ww.types property.
[4]:
logical_types = {"integers": "Integer", "bools": "Boolean", "names": "PersonFullName"}
df.ww.init(logical_types=logical_types)
df.ww.types
[4]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | PersonFullName | [] |
As you can see, the logical types that you specified have been assigned to each of the columns. Now assign only one logical type value, and let Woodwork infer the types for the other columns.
[5]:
logical_types = {"names": "PersonFullName"}
df.ww.init(logical_types=logical_types)
df.ww
[5]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | PersonFullName | [] |
With that input, you get the same results. Woodwork used the PersonFullName logical type you assigned to the names column and then correctly inferred the logical types for the integers and bools columns.
Next, look at what happens if you do not specify any logical types.
[6]:
df.ww.init()
df.ww
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-datatables/envs/latest/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
[6]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | Unknown | [] |
In this case, Woodwork correctly inferred the types for the integers and bools columns, but failed to recognize that the names column should have a logical type of PersonFullName. In situations like this, Woodwork lets users change the logical type.
Update the logical type of the names column to be PersonFullName.
[7]:
df.ww.set_types(logical_types={"names": "PersonFullName"})
df.ww
[7]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | PersonFullName | [] |
If you look carefully at the output, you can see that the PersonFullName logical type has now been applied to the names column. In this example the physical type is unchanged, because Unknown and PersonFullName both use string, and no standard tags were added or removed, because neither logical type has any. When a change of logical type does involve a different physical type or different standard tags (for example, moving from Categorical to PersonFullName), Woodwork converts the column to the new physical type and removes any standard tag, such as category, that no longer applies.
When setting the LogicalType for a column, the type can be specified by passing a string representing the camel-case name of the LogicalType class, as you have done in previous examples. Alternatively, you can pass the class directly, or pass the snake-case name of the type as a string. All of these would be valid values to use for setting the PersonFullName logical type: PersonFullName, "PersonFullName", or "person_full_name".
Note: to pass the class directly, you first have to import it.
Working with Semantic Tags#
Woodwork provides several methods for working with semantic tags. You can add and remove specific tags, or you can reset the tags to their default values. In this section, you learn how to use those methods.
Standard Tags#
As mentioned above, Woodwork applies standard semantic tags to columns by default, based on the logical type that was specified or inferred. If this behavior is undesirable, it can be disabled by setting the use_standard_tags parameter to False when initializing Woodwork.
[8]:
df.ww.init(use_standard_tags=False)
df.ww
[8]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | [] |
bools | bool | Boolean | [] |
names | string | Unknown | [] |
As can be seen in the output above, when initializing Woodwork with use_standard_tags set to False, all semantic tags are empty. The only exception is when an index or time index column has been set, which is discussed in more detail later on.
Create a new Woodwork DataFrame with the standard tags, and specify some additional user-defined semantic tags during creation.
[9]:
semantic_tags = {"bools": "user_status", "names": "legal_name"}
df.ww.init(semantic_tags=semantic_tags)
df.ww
[9]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | ['user_status'] |
names | string | Unknown | ['legal_name'] |
Woodwork has applied the tags you specified, along with any standard tags, to the columns in the DataFrame.
Suppose that, after initializing Woodwork, you decide you don’t want the user_status tag that you applied to the bools column. You can remove it with the remove_semantic_tags method.
[10]:
df.ww.remove_semantic_tags({"bools": "user_status"})
df.ww
[10]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | Unknown | ['legal_name'] |
The user_status tag has been removed.
You can also add multiple tags to a column at once by passing in a list of tags, rather than a single tag. Similarly, multiple tags can be removed at once by passing a list of tags.
[11]:
df.ww.add_semantic_tags({"bools": ["tag1", "tag2"]})
df.ww
[11]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | ['tag2', 'tag1'] |
names | string | Unknown | ['legal_name'] |
[12]:
df.ww.remove_semantic_tags({"bools": ["tag1", "tag2"]})
df.ww
[12]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | Unknown | ['legal_name'] |
All tags can be reset to their default values by using the reset_semantic_tags method. If use_standard_tags is True, the tags are reset to the standard tags. Otherwise, the tags are reset to be empty sets.
[13]:
df.ww.reset_semantic_tags()
df.ww
[13]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
integers | int64 | Integer | ['numeric'] |
bools | bool | Boolean | [] |
names | string | Unknown | [] |
In this case, since you initialized Woodwork with the default behavior of using standard tags, calling reset_semantic_tags resulted in all of the semantic tags being reset to the standard tags for each column.
Index and Time Index Tags#
When initializing Woodwork, you have the option to specify which column represents the index and which column represents the time index. If these columns are specified, semantic tags of index and time_index are applied to the specified columns. Behind the scenes, Woodwork performs additional validation checks on the columns to make sure they are appropriate. For example, index columns must be unique, and time index columns must contain datetime values or numeric values.
Because of the need for these validation checks, you can’t set the index or time_index tags directly on a column. To designate a column as the index, use the set_index method. Similarly, to set the time index column, use the set_time_index method. Alternatively, these can be specified when initializing Woodwork by using the index or time_index parameters.
Setting the index#
Create a new sample DataFrame that contains columns that can be used as index and time index columns and initialize Woodwork.
[14]:
df = pd.DataFrame(
    {
        "index": [0, 1, 2],
        "id": [1, 2, 3],
        "times": pd.to_datetime(["2020-09-01", "2020-09-02", "2020-09-03"]),
        "numbers": [10, 20, 30],
    }
)
df.ww.init()
df.ww
[14]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
index | int64 | Integer | ['numeric'] |
id | int64 | Integer | ['numeric'] |
times | datetime64[ns] | Datetime | [] |
numbers | int64 | Integer | ['numeric'] |
Without specifying an index or time index column during initialization, Woodwork has inferred that the index and id columns are integers and the numeric semantic tag has been applied. You can now set the index column with the set_index method.
[15]:
df.ww.set_index("index")
df.ww
[15]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
index | int64 | Integer | ['index'] |
id | int64 | Integer | ['numeric'] |
times | datetime64[ns] | Datetime | [] |
numbers | int64 | Integer | ['numeric'] |
Inspecting the types now reveals that the index semantic tag has been added to the index column, and the numeric standard tag has been removed. You can also check that the index has been set correctly by checking the value of the DataFrame.ww.index attribute.
[16]:
df.ww.index
[16]:
'index'
If you want to change the index column to be the id column instead, you can do that with another call to set_index.
[17]:
df.ww.set_index("id")
df.ww
[17]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
index | int64 | Integer | ['numeric'] |
id | int64 | Integer | ['index'] |
times | datetime64[ns] | Datetime | [] |
numbers | int64 | Integer | ['numeric'] |
The index tag has been removed from the index column and added to the id column. The numeric standard tag that was originally present on the index column has been added back.
Setting the time index#
Setting the time index works similarly to setting the index. You can now set the time index with the set_time_index method.
[18]:
df.ww.set_time_index("times")
df.ww
[18]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
index | int64 | Integer | ['numeric'] |
id | int64 | Integer | ['index'] |
times | datetime64[ns] | Datetime | ['time_index'] |
numbers | int64 | Integer | ['numeric'] |
After calling set_time_index, the time_index semantic tag has been added to the semantic tags for the times column.
Validating Woodwork’s Typing Information#
The logical types, physical types, and semantic tags described above make up a DataFrame’s typing information, which will be referred to as its “schema”. For Woodwork to be useful, the schema must be valid with respect to its DataFrame.
[19]:
df.ww.schema
[19]:
Logical Type | Semantic Tag(s) | |
---|---|---|
Column | ||
index | Integer | ['numeric'] |
id | Integer | ['index'] |
times | Datetime | ['time_index'] |
numbers | Integer | ['numeric'] |
The Woodwork schema shown above can be seen reflected in the DataFrame below. Every column present in the schema is present in the DataFrame, the dtypes all match the physical types defined by each column’s LogicalType, and the Woodwork index column is both unique and matches the DataFrame’s underlying index.
[20]:
df
[20]:
index | id | times | numbers | |
---|---|---|---|---|
1 | 0 | 1 | 2020-09-01 | 10 |
2 | 1 | 2 | 2020-09-02 | 20 |
3 | 2 | 3 | 2020-09-03 | 30 |
[21]:
df.dtypes
[21]:
index int64
id int64
times datetime64[ns]
numbers int64
dtype: object
Woodwork defines the elements of a valid schema, and maintaining schema validity requires that the DataFrame follow Woodwork’s type system. For this reason, it is not recommended to perform operations directly on the DataFrame; instead, you should go through the ww namespace. Woodwork will attempt to retain a valid schema for any operations performed through the ww namespace. If a DataFrame operation called through the ww namespace invalidates the Woodwork schema defined for that DataFrame, the typing information will be removed.
Therefore, when performing Woodwork operations, you can be sure that if df.ww.schema is present, then the schema is valid for that DataFrame.
Defining a Valid Schema#
Given a DataFrame and its Woodwork typing information, the schema will be considered valid if:
All of the columns present in the schema are present on the DataFrame and vice versa
The physical type used by each column’s Logical Type matches the corresponding series’ dtype
If an index is present, the index column is unique [pandas only]
If an index is present, the DataFrame’s underlying index matches the index column exactly [pandas only]
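The four rules above can be sketched as a plain pandas check. This is a hypothetical re-implementation for illustration only, not Woodwork's own validation code: the find_schema_problem function, its expected_dtypes dictionary (column name to dtype string), and the returned messages are all made up for this example.

```python
import pandas as pd


def find_schema_problem(df, expected_dtypes, index_column=None):
    """Return a message for the first broken rule, or None if all rules hold."""
    # Rule 1: the schema and the DataFrame must contain the same columns.
    if set(expected_dtypes) != set(df.columns):
        return "column mismatch between schema and DataFrame"
    # Rule 2: each column's dtype must match the expected physical type.
    for name, dtype in expected_dtypes.items():
        if str(df[name].dtype) != dtype:
            return f"dtype mismatch for column {name}"
    if index_column is not None:
        # Rule 3: the index column must be unique.
        if not df[index_column].is_unique:
            return "index column is not unique"
        # Rule 4: the underlying index must match the index column exactly.
        if not df.index.equals(pd.Index(df[index_column])):
            return "underlying index does not match index column"
    return None


df = pd.DataFrame({"id": [1, 2, 3], "numbers": [10, 20, 30]}, index=[1, 2, 3])
schema = {"id": "int64", "numbers": "int64"}

print(find_schema_problem(df, schema, index_column="id"))
print(find_schema_problem(df.astype({"numbers": "float64"}), schema, index_column="id"))
```

In real code, prefer the helper functions Woodwork provides for this purpose, which are described later in this section.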
Calling sort_values on a DataFrame, for example, will not invalidate a DataFrame’s schema, as none of the above properties get broken. In the example below, a new DataFrame is created with the rows sorted by the numbers column in descending order, and it has Woodwork initialized. Looking at the schema, you will see that it’s exactly the same as the schema of the original DataFrame.
[22]:
sorted_df = df.ww.sort_values(["numbers"], ascending=False)
sorted_df
[22]:
index | id | times | numbers | |
---|---|---|---|---|
3 | 2 | 3 | 2020-09-03 | 30 |
2 | 1 | 2 | 2020-09-02 | 20 |
1 | 0 | 1 | 2020-09-01 | 10 |
[23]:
sorted_df.ww
[23]:
Physical Type | Logical Type | Semantic Tag(s) | |
---|---|---|---|
Column | |||
index | int64 | Integer | ['numeric'] |
id | int64 | Integer | ['index'] |
times | datetime64[ns] | Datetime | ['time_index'] |
numbers | int64 | Integer | ['numeric'] |
Conversely, changing a column’s dtype so that it does not match the corresponding physical type by calling astype on a DataFrame will invalidate the schema, removing it from the DataFrame. The resulting DataFrame will not have Woodwork initialized, and a warning will be raised explaining why the schema was invalidated.
[24]:
astype_df = df.ww.astype({"numbers": "float64"})
astype_df
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-datatables/envs/latest/lib/python3.9/site-packages/woodwork/table_accessor.py:748: TypingInfoMismatchWarning: Operation performed by astype has invalidated the Woodwork typing information:
dtype mismatch for column numbers between DataFrame dtype, float64, and Integer dtype, int64.
Please initialize Woodwork with DataFrame.ww.init
warnings.warn(
[24]:
index | id | times | numbers | |
---|---|---|---|---|
1 | 0 | 1 | 2020-09-01 | 10.0 |
2 | 1 | 2 | 2020-09-02 | 20.0 |
3 | 2 | 3 | 2020-09-03 | 30.0 |
[25]:
assert astype_df.ww.schema is None
Woodwork provides two helper functions that allow you to check if a schema is valid for a given DataFrame. The ww.is_schema_valid function will return a boolean indicating whether or not the schema is valid for the DataFrame.
Check whether the schema from df is valid for the sorted_df created above.
[26]:
ww.is_schema_valid(sorted_df, df.ww.schema)
[26]:
True
The function ww.get_invalid_schema_message can be used to obtain a string message indicating the reason for an invalid schema. If the schema is valid, this function will return None.
Use the function to determine why the schema from df is invalid for the astype_df created above.
[27]:
ww.is_schema_valid(astype_df, df.ww.schema)
[27]:
False
[28]:
ww.get_invalid_schema_message(astype_df, df.ww.schema)
[28]:
'dtype mismatch for column numbers between DataFrame dtype, float64, and Integer dtype, int64'