woodwork.table_accessor.WoodworkTableAccessor.mutual_information
- WoodworkTableAccessor.mutual_information(num_bins=10, nrows=None, include_index=False, callback=None)
Calculates mutual information between all pairs of columns in the DataFrame whose Logical Types support mutual information. Use get_valid_mi_types to see which Logical Types support mutual information:

>>> from woodwork.utils import get_valid_mi_types
>>> get_valid_mi_types()
[Age, AgeFractional, AgeNullable, Boolean, BooleanNullable, Categorical, CountryCode, Datetime, Double, Integer, IntegerNullable, Ordinal, PostalCode, SubRegionCode]
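This page does not state which normalization maps mutual information onto the 0 to 1 range. As a minimal sketch, assuming the common normalization that divides by the arithmetic mean of the two entropies (the default `average_method` of scikit-learn's `normalized_mutual_info_score`), the score for two already-discrete columns can be computed with the standard library alone. The function names are hypothetical and woodwork's own implementation may normalize differently.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (natural log) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mutual_info(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Mutual information of a and b divided by the mean of their entropies.

    Assumption: arithmetic-mean normalization; woodwork may normalize differently.
    """
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    # I(A; B) = sum over observed (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    # where p(x, y) / (p(x) * p(y)) simplifies to c * n / (count_a[x] * count_b[y]).
    mi = sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = (entropy(a) + entropy(b)) / 2
    # Convention in this sketch: if both columns are constant, report 0.0.
    return mi / denom if denom > 0 else 0.0
```

A column that fully determines another scores close to 1, while two independent columns score 0.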
- Parameters
num_bins (int) – Number of bins used to discretize numeric columns into categories before mutual information is calculated. Defaults to 10.
nrows (int) – Number of rows to sample when calculating mutual information. If specified, that many rows are sampled from the data. Defaults to None, which uses all rows.
include_index (bool) – If True, the column specified as the index will be included as long as its LogicalType is valid for mutual information calculations. If False, the index column will not have mutual information calculated for it. Defaults to False.
callback (callable, optional) –
Function called with incremental progress updates. It receives the following parameters:
update (int): change in progress since last call
progress (int): the progress so far in the calculations
total (int): the total number of calculations to do
unit (str): unit of measurement for progress/total
time_elapsed (float): total time in seconds elapsed since start of call
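Three of the parameters above act before any scores are computed: include_index decides which columns take part, nrows limits how many rows are used, and num_bins controls how numeric columns are discretized. The sketch below illustrates these steps on plain Python data structures. The helper names, the equal-width binning, and the uniform random row sample are assumptions made for illustration, not woodwork's internals (woodwork may, for instance, bin by quantiles instead), and logical types are represented as plain strings.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def select_columns(
    logical_types: Dict[str, str],
    valid_types: Set[str],
    index: Optional[str],
    include_index: bool = False,
) -> List[str]:
    """Keep columns whose logical type supports mutual information.

    The index column is dropped unless include_index is True.
    """
    return [
        name
        for name, ltype in logical_types.items()
        if ltype in valid_types and (include_index or name != index)
    ]


def sample_rows(n_total: int, nrows: Optional[int], seed: int = 0) -> List[int]:
    """Return row positions to use; all rows when nrows is None or too large."""
    if nrows is None or nrows >= n_total:
        return list(range(n_total))
    # Uniform sampling without replacement (an assumption of this sketch).
    return sorted(random.Random(seed).sample(range(n_total), nrows))


def equal_width_bins(values: Sequence[float], num_bins: int = 10) -> List[int]:
    """Map each numeric value to a bin label in 0..num_bins-1 (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant column falls into a single bin
    width = (hi - lo) / num_bins
    # min() keeps the maximum value in the last bin instead of a bin of its own.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]
```

Fewer bins produce coarser categories, so num_bins trades detail in numeric columns against the number of samples that land in each bin.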
- Returns
A DataFrame containing mutual information with columns column_1, column_2, and mutual_info, sorted in descending order by mutual_info. Mutual information values range from 0 (no mutual information) to 1 (perfect dependency).
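The returned table can be pictured as one row per column pair, sorted by score. The following sketch assumes the caller supplies a scoring function (for example a normalized mutual information); it loops over every pair of columns, reports progress through a callback taking the five documented parameters (update, progress, total, unit, time_elapsed), and sorts the rows in descending order. The function name mutual_info_table and the "calculations" unit string are hypothetical, and plain dicts stand in for the pandas DataFrame.

```python
import itertools
import time
from typing import Callable, Dict, List, Optional, Sequence


def mutual_info_table(
    data: Dict[str, Sequence],
    score: Callable[[Sequence, Sequence], float],
    callback: Optional[Callable[[int, int, int, str, float], None]] = None,
) -> List[dict]:
    """Score every column pair and return rows sorted by mutual_info, descending."""
    pairs = list(itertools.combinations(data, 2))
    total = len(pairs)
    start = time.perf_counter()
    rows = []
    for progress, (col_1, col_2) in enumerate(pairs, start=1):
        rows.append(
            {
                "column_1": col_1,
                "column_2": col_2,
                "mutual_info": score(data[col_1], data[col_2]),
            }
        )
        if callback is not None:
            # update=1: one more pair finished since the previous call.
            callback(1, progress, total, "calculations", time.perf_counter() - start)
    # Highest scores first; ties keep their pair order because sort is stable.
    rows.sort(key=lambda row: row["mutual_info"], reverse=True)
    return rows
```

With woodwork itself, a callback of the same shape would be passed as `df.ww.mutual_information(callback=my_callback)`.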
- Return type
pd.DataFrame