woodwork.table_accessor.WoodworkTableAccessor.infer_temporal_frequencies

WoodworkTableAccessor.infer_temporal_frequencies(temporal_columns=None, debug=False)
Infers the observation frequency (daily, biweekly, yearly, etc.) of each temporal column in the DataFrame. Temporal columns are those with the logical type Datetime or Timedelta. Not supported for Dask and Spark DataFrames.

Parameters
  • temporal_columns (list[str], optional) – Columns for which frequencies should be inferred. Must be columns that are present in the DataFrame and are temporal in nature. Defaults to None. If not specified, all temporal columns will have their frequencies inferred.

  • debug (bool, optional) – If True, more information is returned for each temporal column for which no uniform frequency was found. Defaults to False.

Returns

A dictionary where each key is a temporal column from the DataFrame, and the value is its observation frequency represented as a pandas offset alias string (D, M, Y, etc.) or None if no uniform frequency was present in the data.

Return type

(dict)

Note

The pandas utility pd.infer_freq, which is used in this method, has the following behaviors:
  • If even one row in a column does not follow the frequency seen in the remaining rows, no frequency will be inferred. Example of otherwise daily data that skips one day: ['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-07'].

  • If any NaNs are present in the data, no frequency will be inferred.

  • Pandas will use the largest offset alias available to it, so W will be inferred for weekly data instead of 7D.

    The list of available offset aliases, which includes aliases such as B for business day or N for nanosecond, can be found at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

  • Offset aliases can be combined to create something like 2D1H, which could also be expressed as 49H.

    When it would otherwise need to combine aliases, pandas' frequency inference returns the single alias in the smaller common unit, 49H.

  • Offset strings can contain more information than just the offset alias. For example, the date range pd.date_range(start="2020-01-01", freq="w", periods=10) will be inferred to have frequency W-SUN. That string is an offset alias with an anchoring suffix, which indicates that the data is not only observed at a weekly frequency but that all the dates fall on Sundays. More anchored offsets are listed at https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#anchored-offsets

  • Some frequencies that can be defined for a pd.date_range cannot then be re-inferred by pd.infer_freq.

    One example involves the business day offset alias B: in pd.date_range(start="2020-01-01", freq="4b", periods=10), 4b is a valid freq parameter when building the date range, but the frequency cannot then be inferred from the resulting dates.
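
The behaviors listed above can be checked directly against pd.infer_freq. The following is a small sketch that assumes only that pandas is installed; the variable names are illustrative. The exact spelling of hourly aliases differs between pandas versions (for example 49H versus 49h), so the comment on that line is hedged accordingly.

```python
import pandas as pd

# One skipped day (2011-01-06) is enough to prevent inference.
skipped = pd.DatetimeIndex(["2011-01-03", "2011-01-04", "2011-01-05", "2011-01-07"])
freq_skipped = pd.infer_freq(skipped)  # None

# A missing value (NaT) also prevents inference.
with_nat = pd.DatetimeIndex(["2011-01-03", "2011-01-04", pd.NaT, "2011-01-06"])
freq_with_nat = pd.infer_freq(with_nat)  # None

# Data spaced 7 days apart is reported with the weekly alias, including the
# anchoring suffix for the weekday (2020-01-05 is a Sunday).
seven_day = pd.date_range(start="2020-01-05", freq=pd.Timedelta(days=7), periods=10)
freq_weekly = pd.infer_freq(seven_day)  # "W-SUN"

# A spacing of 2 days and 1 hour is reported as a single 49-hour alias
# ("49H" or "49h", depending on the pandas version).
combined = pd.date_range(
    start="2020-01-01", freq=pd.Timedelta(days=2, hours=1), periods=10
)
freq_combined = pd.infer_freq(combined)

# "4B" is accepted when building the range, but cannot be re-inferred.
four_b = pd.date_range(start="2020-01-01", freq="4B", periods=10)
freq_four_b = pd.infer_freq(four_b)  # None

for name, freq in [
    ("skipped day", freq_skipped),
    ("contains NaT", freq_with_nat),
    ("7-day spacing", freq_weekly),
    ("2 days + 1 hour", freq_combined),
    ("4 business days", freq_four_b),
]:
    print(f"{name}: {freq}")
```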

The algorithm used to infer frequency on noisy data can be configured (see https://woodwork.alteryx.com/en/stable/guides/setting_config_options.html#Available-Config-Settings).