Python code example for the tsfresh library

Code walkthrough

1. Import the required libraries

import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from sktime.datasets import load_arrow_head
import warnings

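pandas: data processing and analysis; provides the DataFrame and Series data structures.
tsfresh: extracts and selects features from time series data. extract_features computes the features, select_features keeps the features that are relevant to the target, and impute replaces missing (NaN) and infinite values in the feature matrix.
sktime.datasets.load_arrow_head: loads the ArrowHead time series dataset.
warnings: controls which warning messages are displayed.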

2. Suppress warnings

# Ignore warnings that tsfresh may emit
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=UserWarning)

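These two lines suppress all warnings of category RuntimeWarning and UserWarning so that they do not clutter the program's output. The filters are global, so they also hide warnings of these categories raised by any other library in the same process.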
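
If hiding these categories for the whole program is too broad, the standard library also allows suppressing warnings only around a specific call. The following is a minimal sketch of that alternative, not part of the original script; noisy_computation is a hypothetical stand-in for a library call that emits a RuntimeWarning.

```python
import warnings


def noisy_computation():
    """Hypothetical stand-in for a library call that emits a warning."""
    warnings.warn("numerical issue", RuntimeWarning)
    return 42


# Suppress RuntimeWarning only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    quiet_result = noisy_computation()

# The filters set above were discarded when the block exited.
# Here the warning is recorded instead of hidden, to show it is raised again.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loud_result = noisy_computation()
```

The catch_warnings context manager restores the previous filter state on exit, so code outside the block still reports its warnings.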

3. Data format conversion function

def convert_sktime_to_tsfresh(df_sktime, y):
    """
    Convert the dataset format from sktime to tsfresh.

    :param df_sktime: The dataset loaded by sktime.
    :param y: The target labels.
    :return: A DataFrame in the format required by tsfresh and a Series of target labels.
    """
    df_tsfresh = []
    for idx, row in df_sktime.iterrows():
        # Assume the first column holds the time series (one pandas Series per cell)
        series = row.iloc[0]
        for time, value in series.items():
            df_tsfresh.append([idx, time, value])
    df_tsfresh = pd.DataFrame(df_tsfresh, columns=['id', 'time', 'value'])
    y = pd.Series(y)
    return df_tsfresh, y

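This function converts a dataset loaded by sktime into the long format that tsfresh expects.
df_sktime is the dataset loaded by sktime, and y holds the corresponding target labels.
The function iterates over the rows of df_sktime, takes the time series stored in the first column, and appends one [id, time, value] entry per observation to the df_tsfresh list. The row index serves as the id, and the position within the series serves as the time.
Finally, it converts the df_tsfresh list into a DataFrame, converts y into a Series, and returns both.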
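
To see what the conversion produces, the sketch below runs the same logic on a tiny hand-made input instead of the real dataset. The function is repeated from the listing above so the block runs on its own; the toy values, the labels "a" and "b", and the column name dim_0 are assumptions made for illustration.

```python
import pandas as pd


def convert_sktime_to_tsfresh(df_sktime, y):
    """Same logic as the function above, repeated so this sketch is self-contained."""
    rows = []
    for idx, row in df_sktime.iterrows():
        series = row.iloc[0]  # the nested time series of this sample
        for time, value in series.items():
            rows.append([idx, time, value])
    return pd.DataFrame(rows, columns=["id", "time", "value"]), pd.Series(y)


# Toy nested frame: one column whose cells are themselves pandas Series.
nested = pd.Series(
    [pd.Series([1.0, 2.0, 3.0]), pd.Series([4.0, 5.0, 6.0])],
    name="dim_0",
).to_frame()

long_df, labels = convert_sktime_to_tsfresh(nested, ["a", "b"])
print(long_df)
```

Two samples with three observations each become six rows, one per (id, time) pair, which is the layout that the column_id and column_sort arguments of extract_features refer to.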

4. Main program

if __name__ == "__main__":
    try:
        # Load the ArrowHead dataset
        X, y = load_arrow_head(split="train", return_X_y=True)
        # Convert the dataset to the tsfresh format
        df, y = convert_sktime_to_tsfresh(X, y)

        # Extract features
        X_extracted = extract_features(df, column_id='id', column_sort='time')
        # Replace missing and infinite values
        X_extracted = impute(X_extracted)
        # Select relevant features
        X_selected = select_features(X_extracted, y)

        print(X_selected)
    except Exception as e:
        print(f"An error occurred: {e}")

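load_arrow_head(split="train", return_X_y=True): loads the training split of the ArrowHead dataset and returns the time series X and the target labels y.
convert_sktime_to_tsfresh(X, y): converts the sktime-format dataset into the format tsfresh requires.
extract_features(df, column_id='id', column_sort='time'): extracts features from the converted DataFrame. column_id names the column that identifies each sample, and column_sort names the column used to order the observations in time.
impute(X_extracted): replaces the missing values that feature extraction may produce.
select_features(X_extracted, y): keeps the features that are relevant to the target labels y.
The try-except block catches any exception raised while the program runs and prints the error message.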

Interpreting the output

The X_selected that is printed at the end is a DataFrame containing the data after feature extraction and feature selection.

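Rows: one per sample; each sample corresponds to one time series in the ArrowHead dataset.
Columns: the selected features, each computed from the raw time series, for example summary statistics such as the mean, variance, maximum, and minimum.
Values: each cell holds the value of that feature for that sample.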

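In this way, raw time series data is turned into a structured feature table that can be used directly to train machine learning models and make predictions.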
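
The row and column layout described above can be illustrated with plain pandas. This sketch is not how tsfresh computes its features, and it covers only four simple statistics; the toy data and the output column names are assumptions chosen for readability.

```python
import pandas as pd

# Toy long-format data: two samples (id 0 and 1) with three observations each.
long_df = pd.DataFrame({
    "id":    [0, 0, 0, 1, 1, 1],
    "time":  [0, 1, 2, 0, 1, 2],
    "value": [1.0, 2.0, 3.0, 4.0, 6.0, 8.0],
})

# One row per sample, one column per feature.
features = long_df.groupby("id")["value"].agg(
    mean="mean",
    variance=lambda s: s.var(ddof=0),  # population variance
    maximum="max",
    minimum="min",
)
print(features)
```

The resulting table has the same shape as a feature matrix: the index holds the sample ids, and each column is one number summarizing the whole series of that sample.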

Complete code

import pandas as pd
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute
from sktime.datasets import load_arrow_head
import warnings

# Ignore warnings that tsfresh may emit
warnings.filterwarnings("ignore", category=RuntimeWarning)
warnings.filterwarnings("ignore", category=UserWarning)


def convert_sktime_to_tsfresh(df_sktime, y):
    """
    Convert the dataset format from sktime to tsfresh.

    :param df_sktime: The dataset loaded by sktime.
    :param y: The target labels.
    :return: A DataFrame in the format required by tsfresh and a Series of target labels.
    """
    df_tsfresh = []
    for idx, row in df_sktime.iterrows():
        # Assume the first column holds the time series (one pandas Series per cell)
        series = row.iloc[0]
        for time, value in series.items():
            df_tsfresh.append([idx, time, value])
    df_tsfresh = pd.DataFrame(df_tsfresh, columns=['id', 'time', 'value'])
    y = pd.Series(y)
    return df_tsfresh, y


if __name__ == "__main__":
    try:
        # Load the ArrowHead dataset
        X, y = load_arrow_head(split="train", return_X_y=True)
        # Convert the dataset to the tsfresh format
        df, y = convert_sktime_to_tsfresh(X, y)

        # Extract features
        X_extracted = extract_features(df, column_id='id', column_sort='time')
        # Replace missing and infinite values
        X_extracted = impute(X_extracted)
        # Select relevant features
        X_selected = select_features(X_extracted, y)

        print(X_selected)
    except Exception as e:
        print(f"An error occurred: {e}")
