woodwork.statistics_utils.infer_frequency

woodwork.statistics_utils.infer_frequency(observed_ts: pandas.core.series.Series, debug=False, window_length=15, threshold=0.9)

Infer the frequency of a given Pandas Datetime Series.

Parameters
  • observed_ts (pd.Series) – the Datetime series whose frequency is to be inferred.

  • debug (boolean) – a flag that determines whether a debug object is also returned (described under Returns). Default is False.

  • window_length (int) – the window length used to determine the most likely candidate frequency. Default is 15. If the time series is noisy and its frequency needs to be inferred, the input time series must be longer than this window.

  • threshold (float) – a value between 0 and 1. Default is 0.9. Given the number of windows that contain the most observed frequency (N) and the total number of windows (T), if N/T > threshold, the most observed frequency is taken to be the most likely frequency; otherwise the result is None.
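The interaction between window_length and threshold can be illustrated with a small sketch. This is not woodwork's implementation: the helpers `window_step` and `vote_on_step` are hypothetical, the code uses only the standard library, and it assumes that a window "contains" a frequency when every consecutive step inside it is identical.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_step(window):
    """Return the common step of a window, or None if the steps differ."""
    steps = {b - a for a, b in zip(window, window[1:])}
    return steps.pop() if len(steps) == 1 else None


def vote_on_step(timestamps, window_length=15, threshold=0.9):
    """Hypothetical helper: apply the N/T > threshold rule to sliding windows."""
    windows = [
        timestamps[i:i + window_length]
        for i in range(len(timestamps) - window_length + 1)
    ]
    if not windows:
        # Input is shorter than one window, so nothing can be inferred.
        return None
    votes = Counter(window_step(w) for w in windows)
    votes.pop(None, None)  # windows without a uniform step cast no vote
    if not votes:
        return None
    step, n = votes.most_common(1)[0]  # N = windows agreeing on the top step
    return step if n / len(windows) > threshold else None  # T = len(windows)


start = datetime(2021, 1, 1)
clean = [start + timedelta(days=i) for i in range(60)]
# Insert one extra timestamp, which spoils every window that covers it.
noisy = clean[:30] + [clean[29] + timedelta(hours=1)] + clean[30:]

print(vote_on_step(clean))                 # every window agrees on one day
print(vote_on_step(noisy))                 # too few clean windows at 0.9
print(vote_on_step(noisy, threshold=0.5))  # a looser threshold accepts one day
```

In this sketch, one stray timestamp invalidates 15 of the 47 windows, so the agreement ratio falls to roughly 0.68 and the default threshold rejects the candidate. Lowering the threshold trades strictness for tolerance of noise.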

Returns

A pandas offset alias string (D, M, Y, etc.), or None if no uniform frequency was present in the data.

debug (dict): a dictionary containing debug information if the frequency cannot be inferred. This dictionary has the following properties:

  • actual_range_start (str): a string representing the minimum Timestamp in the input observed timeseries according to ISO 8601.

  • actual_range_end (str): a string representing the maximum Timestamp in the input observed timeseries according to ISO 8601.

  • message (str): message describing any issues with the input Datetime series

  • estimated_freq (str): None

  • estimated_range_start (str): a string representing the minimum Timestamp in the output estimated timeseries according to ISO 8601.

  • estimated_range_end (str): a string representing the maximum Timestamp in the output estimated timeseries according to ISO 8601.

  • duplicate_values (list(RangeObject)): a list of RangeObjects describing duplicate timestamps

  • missing_values (list(RangeObject)): a list of RangeObjects describing missing timestamps

  • extra_values (list(RangeObject)): a list of RangeObjects describing extra timestamps

  • nan_values (list(RangeObject)): a list of RangeObjects describing NaN timestamps

A range object contains the following information:

  • dt: an ISO 8601 formatted string of the first timestamp in this range

  • idx: the index of the first timestamp in this range
    • for duplicates and extra values, the idx is in reference to the observed data

    • for missing values, the idx is in reference to the estimated data.

  • range: the length of this range.
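As an illustration of the range object layout, the sketch below collapses runs of consecutive missing positions into dictionaries with the dt, idx, and range keys listed above. The helper `to_ranges` is hypothetical and not part of woodwork, and the sketch assumes that range counts the timestamps in the run.

```python
from datetime import datetime, timedelta


def to_ranges(indices, timeline):
    """Hypothetical helper: collapse runs of consecutive indices into range dicts."""
    ranges = []
    for idx in sorted(indices):
        last = ranges[-1] if ranges else None
        if last and idx == last["idx"] + last["range"]:
            # idx directly follows the current run, so extend that run.
            last["range"] += 1
        else:
            # Start a new run at this index.
            ranges.append({"dt": timeline[idx].isoformat(), "idx": idx, "range": 1})
    return ranges


# An estimated daily timeline; positions 3, 4, 5 and 8 are treated as missing.
estimated = [datetime(2021, 1, 1) + timedelta(days=i) for i in range(10)]
missing = to_ranges([3, 4, 5, 8], estimated)
print(missing)
# [{'dt': '2021-01-04T00:00:00', 'idx': 3, 'range': 3},
#  {'dt': '2021-01-09T00:00:00', 'idx': 8, 'range': 1}]
```

Here idx refers to positions in the estimated timeline, matching the convention stated above for missing values.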

Return type

inferred_freq (str)