woodwork.statistics_utils.infer_frequency
- woodwork.statistics_utils.infer_frequency(observed_ts: Series, debug=False, window_length=15, threshold=0.9)
Infer the frequency of a given Pandas Datetime Series.
- Parameters:
observed_ts (pd.Series) – the datetime series whose frequency is to be inferred.
debug (boolean) – a flag to determine if debug object should be returned (explained below).
window_length (int) – the window length used to determine the most likely candidate frequency. Default is 15. If the timeseries is noisy and its frequency needs to be inferred, the length of the input timeseries must be greater than this window.
threshold (float) – a value between 0 and 1. Given the number of windows that contain the most observed frequency (N), and total number of windows (T), if N/T > threshold, the most observed frequency is determined to be the most likely frequency, else None.
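The interaction between window_length and threshold can be sketched with the standard library alone. This is a minimal illustration of the N/T > threshold rule described above, not woodwork's implementation: the helper names `window_step` and `vote_on_frequency` are hypothetical, and treating a window as a vote only when its spacing is perfectly even is an assumption made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def window_step(window):
    """Return the constant step of a window, or None if the spacing is uneven."""
    steps = {b - a for a, b in zip(window, window[1:])}
    return steps.pop() if len(steps) == 1 else None


def vote_on_frequency(timestamps, window_length=15, threshold=0.9):
    """Hypothetical helper: keep the step seen in more than `threshold` of windows."""
    # The input must be longer than one window, as the parameter notes require.
    if len(timestamps) <= window_length:
        return None
    windows = [
        timestamps[i : i + window_length]
        for i in range(len(timestamps) - window_length + 1)
    ]
    counts = Counter(window_step(w) for w in windows)
    counts.pop(None, None)  # assumption: unevenly spaced windows cast no vote
    if not counts:
        return None
    step, n = counts.most_common(1)[0]  # N = windows agreeing on the top step
    return step if n / len(windows) > threshold else None  # T = len(windows)


start = datetime(2021, 1, 1)
clean = [start + timedelta(days=i) for i in range(60)]
# Shift one observation by five hours to simulate noise.
noisy = clean[:30] + [clean[30] + timedelta(hours=5)] + clean[31:]

print(vote_on_frequency(clean))
print(vote_on_frequency(noisy))
print(vote_on_frequency(noisy, threshold=0.6))
```

In this sketch a single shifted timestamp spoils every window that contains it, so lowering threshold (or shortening window_length) makes the vote more tolerant of noise.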
- Returns:
A pandas offset alias string (D, M, Y, etc.), or None if no uniform frequency was present in the data.
debug (dict): a dictionary containing debug information if the frequency cannot be inferred. This dictionary has the following properties:
actual_range_start (str): a string representing the minimum Timestamp in the input observed timeseries according to ISO 8601.
actual_range_end (str): a string representing the maximum Timestamp in the input observed timeseries according to ISO 8601.
message (str): message describing any issues with the input Datetime series
estimated_freq (str): None
estimated_range_start (str): a string representing the minimum Timestamp in the output estimated timeseries according to ISO 8601.
estimated_range_end (str): a string representing the maximum Timestamp in the output estimated timeseries according to ISO 8601.
duplicate_values (list(RangeObject)): a list of RangeObjects of duplicate timestamps
missing_values (list(RangeObject)): a list of RangeObjects of missing timestamps
extra_values (list(RangeObject)): a list of RangeObjects of extra timestamps
nan_values (list(RangeObject)): a list of RangeObjects of NaN timestamps
A range object contains the following information:
dt: an ISO 8601 formatted string of the first timestamp in this range
- idx: the index of the first timestamp in this range
for duplicates and extra values, the idx is in reference to the observed data
for missing values, the idx is in reference to the estimated data.
range: the length of this range.
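To make the dt, idx, and range fields concrete, the following sketch builds range objects for missing timestamps. It assumes a range object is a plain dict with exactly those three keys and that consecutive positions are merged into one range; the helper `to_range_objects` is hypothetical and is not part of woodwork.

```python
from datetime import datetime, timedelta


def to_range_objects(timestamps, indices):
    """Group sorted positions into runs of consecutive indices.

    `timestamps` is the series the indices refer to. Each run becomes a dict
    with the keys described above: dt, idx, and range.
    """
    ranges = []
    for idx in indices:
        last = ranges[-1] if ranges else None
        if last is not None and idx == last["idx"] + last["range"]:
            last["range"] += 1  # this position extends the current run
        else:
            ranges.append(
                {"dt": timestamps[idx].isoformat(), "idx": idx, "range": 1}
            )
    return ranges


start = datetime(2021, 1, 1)
estimated = [start + timedelta(days=i) for i in range(10)]
# The observed data lacks positions 3, 4 and 8 of the estimated series.
observed = [ts for i, ts in enumerate(estimated) if i not in (3, 4, 8)]

# For missing values, idx refers to the estimated data.
observed_set = set(observed)
missing_idx = [i for i, ts in enumerate(estimated) if ts not in observed_set]
missing_values = to_range_objects(estimated, missing_idx)
print(missing_values)
```

Here the two adjacent gaps collapse into a single entry with range 2, while the isolated gap yields its own entry with range 1.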
- Return type:
inferred_freq (str)