woodwork.table_accessor.WoodworkTableAccessor.spearman_correlation_dict

WoodworkTableAccessor.spearman_correlation_dict(nrows=None, include_index=False, include_time_index=False, callback=None, extra_stats=False, min_shared=25, random_seed=0)

Calculates the Spearman correlation coefficient between all pairs of columns in the DataFrame that support correlation. Works with numeric, ordinal, and datetime data. Call woodwork.utils.get_valid_spearman_types to see which Logical Types are supported.
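For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of a Spearman coefficient for a single column pair: rank both columns (ties share their mean rank), then take the Pearson correlation of the ranks. It illustrates the general technique only and is not Woodwork's implementation. The helper names average_ranks and spearman are hypothetical, the data is made up, and the min_shared argument merely mirrors the parameter of the same name documented below.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(xs, ys, min_shared=25):
    """Spearman coefficient over rows where both values are non-null."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < min_shared or len(pairs) < 2:
        return math.nan  # too few shared rows to measure
    rx = average_ranks([p[0] for p in pairs])
    ry = average_ranks([p[1] for p in pairs])
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Made-up data: y grows monotonically with x, but not linearly.
x = [1, 2, 3, 4, 5, None]
y = [1, 8, 27, 64, 125, 7]
rho = spearman(x, y, min_shared=3)
```

Because only the ordering of values matters, any strictly increasing relationship yields the maximum coefficient, which is why the statistic also suits ordinal and datetime data.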

Parameters:
  • nrows (int) – The number of rows to sample when determining correlation. If specified, samples the desired number of rows from the data. Defaults to using all rows.

  • include_index (bool) – If True, the column specified as the index will be included as long as its LogicalType is valid for correlation calculations. If False, the index column will not have the Spearman correlation calculated for it. Defaults to False.

  • include_time_index (bool) – If True, the column specified as the time index will be included for correlation calculations. If False, the time index column will not have the Spearman correlation calculated for it. Defaults to False.

  • callback (callable, optional) – Function to be called with incremental updates. Has the following parameters:

      - update (int): change in progress since last call
      - progress (int): the progress so far in the calculations
      - total (int): the total number of calculations to do
      - unit (str): unit of measurement for progress/total
      - time_elapsed (float): total time in seconds elapsed since start of call

  • extra_stats (bool) – If True, an additional “shared_rows” value recording the number of shared non-null rows for each column pair will be included in the output. Defaults to False.

  • min_shared (int) – The minimum number of shared non-null rows needed to calculate the correlation. Column pairs with fewer shared rows are considered too sparse to measure accurately and will return a NaN value. Must be non-negative. Defaults to 25.

  • random_seed (int) – Seed for the random number generator. Defaults to 0.
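A callback that fits the five documented parameters might look like the sketch below. The function name report_progress is hypothetical, and the values passed in the simulated calls (including the unit string) are made up to stand in for whatever the library would supply.

```python
def report_progress(update, progress, total, unit, time_elapsed):
    """Format one progress update using the five documented parameters."""
    percent = 100 * progress / total if total else 0.0
    line = f"{progress}/{total} {unit} ({percent:.0f}%), +{update} after {time_elapsed:.1f}s"
    print(line)
    return line


# Simulated updates with made-up values, standing in for calls made by the library.
first = report_progress(update=2, progress=2, total=4, unit="calculations", time_elapsed=0.5)
last = report_progress(update=2, progress=4, total=4, unit="calculations", time_elapsed=1.2)
```

Such a function would be supplied through the callback argument, for example callback=report_progress.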

Returns:

A list of dictionaries, each with the keys column_1, column_2, and spearman. The list is sorted in descending order by correlation coefficient. Correlation coefficient values are between -1 and 1.

Return type:

list(dict)
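Since the list is ordered by the signed coefficient, strong negative correlations appear near the end rather than the top. The sketch below shows one way to pick out the strongest relationships regardless of sign. The records are a hand-written stand-in for a returned list (column names and values are invented), and strongest_pairs is a hypothetical helper, not part of Woodwork.

```python
import math

# Hand-written stand-in for the returned list; column names and values are invented.
results = [
    {"column_1": "age", "column_2": "income", "spearman": 0.82},
    {"column_1": "age", "column_2": "visits", "spearman": 0.10},
    {"column_1": "income", "column_2": "returns", "spearman": -0.67},
    {"column_1": "visits", "column_2": "returns", "spearman": math.nan},
]


def strongest_pairs(records, threshold=0.5):
    """Keep pairs whose |spearman| meets the threshold, strongest first."""
    kept = [
        r for r in records
        if not math.isnan(r["spearman"]) and abs(r["spearman"]) >= threshold
    ]
    return sorted(kept, key=lambda r: abs(r["spearman"]), reverse=True)


strong = strongest_pairs(results)
pair_names = [(r["column_1"], r["column_2"]) for r in strong]
```

Pairs reported as NaN (too few shared rows) are skipped explicitly, because NaN never satisfies a comparison and could otherwise be mistaken for a weak correlation.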