woodwork.table_accessor.WoodworkTableAccessor.describe

WoodworkTableAccessor.describe(include: Optional[Sequence[Union[str, LogicalType]]] = None, callback: Optional[Callable[[int, int, int, str, float], Any]] = None, results_callback: Optional[Callable[[DataFrame, Series], Any]] = None) → DataFrame

Calculates statistics for data contained in the DataFrame.

Parameters:
  • include (list[str or LogicalType], optional) – Filter for which columns to include in the returned statistics. Can be a list of column names, semantic tags, logical types, or any combination of the three. Each entry follows the broadest matching specification: logical types are favored first, then semantic tags, then column names. If no matching columns are found, an empty DataFrame is returned.

  • callback (callable, optional) –

    Function to be called with incremental updates. Has the following parameters:

    • update (int): change in progress since last call

    • progress (int): the progress so far in the calculations

    • total (int): the total number of calculations to do

    • unit (str): unit of measurement for progress/total

    • time_elapsed (float): total time in seconds elapsed since start of call

  • results_callback (callable, optional) –

    Function to be called with intermediate results. Has the following parameters:

    • results_so_far (pd.DataFrame): the full dataframe calculated so far

    • most_recent_calculation (pd.Series): the calculations for the most recent column
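As a minimal sketch of the callback signature described above, the function below accepts the five documented arguments and records one progress line per update. The name report_progress, the progress_log list, and the values passed in the simulated calls are illustrative assumptions, not output produced by Woodwork.

```python
# Hypothetical progress callback with the documented parameters:
# (update, progress, total, unit, time_elapsed)
progress_log = []


def report_progress(update, progress, total, unit, time_elapsed):
    """Record a one-line message for each incremental update."""
    # Guard against a zero total so the percentage is always defined.
    percent = 100 * progress / total if total else 100.0
    progress_log.append(
        f"+{update} -> {progress}/{total} {unit} "
        f"({percent:.0f}%) after {time_elapsed:.2f}s"
    )


# Simulated calls standing in for the updates describe would issue.
# The numbers and the unit string are made up for illustration.
report_progress(1, 1, 4, "calculations", 0.01)
report_progress(3, 4, 4, "calculations", 0.05)

# With Woodwork installed and a DataFrame `df` already initialized via
# df.ww.init(), the same function would be passed as:
#     stats = df.ww.describe(callback=report_progress)
```

A results_callback would be written the same way, taking the two documented arguments (results_so_far, most_recent_calculation) instead of five.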

Returns:

A DataFrame containing statistics for the data, or for the subset of the original DataFrame whose columns match the logical types, semantic tags, or column names specified in include.

Return type:

pd.DataFrame
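The following sketch illustrates filtering with include. The sample data, column names, and chosen logical types are assumptions made for the example; it also assumes that "numeric" is the standard semantic tag Woodwork attaches to Integer columns. The import guard only exists so the snippet still runs where pandas or Woodwork is not installed.

```python
try:
    import pandas as pd
    import woodwork as ww  # noqa: F401  (importing registers the `ww` accessor)
except ImportError:
    # Neither library is required just to read the example.
    stats = None
else:
    # Hypothetical sample data for illustration only.
    df = pd.DataFrame(
        {
            "age": [31, 45, 27, 52],
            "name": ["Ana", "Ben", "Cy", "Ana"],
        }
    )

    # Set logical types explicitly so the example does not depend on inference.
    df.ww.init(logical_types={"age": "Integer", "name": "Categorical"})

    # Ask for statistics on columns carrying the "numeric" semantic tag.
    stats = df.ww.describe(include=["numeric"])

    # Each selected column of the original data becomes a column of the result.
    print(list(stats.columns))
```

Passing a column name such as "name", or a logical type, in the same list would widen the selection accordingly.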