
《中国测试》 (China Measurement & Test)


2023-01-12



Authors: ZHANG Zhen (张震), PENG Kun (彭坤), KONG Shuaihua (孔帅华)

Affiliation: School of Electrical Engineering, Zhengzhou University, Zhengzhou, Henan 450001, China



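To address data imbalance and low classifier efficiency in electricity theft detection, this paper proposes a detection method based on hybrid sampling and random forest. First, the misclassification rate of a random forest model is used as the resampling rate of the SMOTE algorithm, which yields the E-SMOTE algorithm. Second, during hybrid sampling with E-SMOTE and Tomek Links, the area under the model's ROC curve (area under curve, AUC) is introduced as the stopping condition for the iteration, so that the electricity consumption data set becomes balanced. Finally, a permutation method based on the Matthews correlation coefficient (MCC) and the chi-square test are used for feature selection, and the Q statistic is introduced into the conventional random forest model for selective ensembling, which both optimizes the selection of attribute features and increases the diversity of the random forest model. Experimental results show that the proposed hybrid sampling algorithm outperforms seven commonly used sampling methods, and that the improved random forest model also performs better on several metrics, including precision, specificity, and F1 score.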
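The two sampling ingredients named in the abstract, SMOTE-style oversampling and Tomek Links cleaning, can be illustrated with a minimal pure-Python sketch. This is not the authors' E-SMOTE: the abstract does not specify how the random forest misclassification rate sets the resampling rate or how the AUC stopping rule is evaluated, so both are left out. The function names `nearest_index`, `tomek_links`, and `smote_sample` are hypothetical and chosen only for this illustration.

```python
import math
import random


def nearest_index(points, i):
    """Index of the point closest to points[i] (Euclidean), excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    and belong to different classes (the usual Tomek link definition)."""
    nn = [nearest_index(points, i) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def smote_sample(minority, k, rng):
    """Create one synthetic point by interpolating between a random minority
    point and one of its k nearest minority neighbours.
    Assumes at least two minority points."""
    base = rng.randrange(len(minority))
    others = sorted(
        (j for j in range(len(minority)) if j != base),
        key=lambda j: math.dist(minority[base], minority[j]),
    )[:k]
    neighbour = minority[rng.choice(others)]
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(minority[base], neighbour))


# Toy data (made up for illustration): label 1 plays the role of the minority class.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.05, 1.0), (5.0, 5.0)]
labels = [0, 0, 0, 1, 1]
print(tomek_links(points, labels))  # [(2, 3)]
print(smote_sample([(1.05, 1.0), (5.0, 5.0)], k=1, rng=random.Random(0)))
```

In a hybrid scheme of this kind, synthetic minority points are typically added first and Tomek links are then removed to clean the class boundary. The iteration and its AUC-based stopping condition would wrap around these two steps.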

Electric theft detection based on hybrid sampling and improved random forest
ZHANG Zhen, PENG Kun, KONG Shuaihua
School of Electrical Engineering, Zhengzhou University, Zhengzhou 450001, China
Abstract: To address the problems of data imbalance and low classifier efficiency in electricity theft detection, a detection method based on hybrid sampling and random forest is proposed. First, the misclassification rate of the random forest model is used as the resampling rate of the SMOTE algorithm, giving the E-SMOTE algorithm. Second, during the hybrid sampling of E-SMOTE and Tomek Links, the area under the ROC curve (AUC) of the model is introduced as the stopping condition for the iteration, in order to balance the electricity consumption data set. Finally, a permutation method based on the Matthews correlation coefficient (MCC) and the chi-square test are used for feature selection, and the Q statistic is introduced into the traditional random forest model for selective ensembling, which not only optimizes the selection of attribute features but also improves the diversity of the random forest model. The experimental results show that the proposed hybrid sampling algorithm outperforms 7 common sampling methods, and the improved random forest model also performs better on several metrics, including precision, specificity, and F1 score.
Keywords: electric theft detection; hybrid sampling; feature selection; selective integration; random forest
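The abstract names two statistics without giving their formulas: the Matthews correlation coefficient used in permutation-based feature selection, and the Q statistic used to pick diverse trees. The sketch below assumes the standard textbook definitions of both (MCC from the binary confusion matrix, and Yule's Q computed on the correct/incorrect outcomes of two classifiers). The helper `permutation_importance_mcc` and the toy `predict` function are hypothetical; how the paper combines these scores with the chi-square test and with tree selection is not stated in the abstract.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = theft, 0 = normal)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def permutation_importance_mcc(predict, rows, y_true, column, rng):
    """Drop in MCC after shuffling one feature column.
    `rows` is a list of tuples; `predict` maps a list of rows to labels."""
    baseline = mcc(y_true, predict(rows))
    shuffled = [row[column] for row in rows]
    rng.shuffle(shuffled)
    permuted = [
        row[:column] + (value,) + row[column + 1:]
        for row, value in zip(rows, shuffled)
    ]
    return baseline - mcc(y_true, predict(permuted))


def q_statistic(y_true, pred_a, pred_b):
    """Yule's Q for two classifiers: (N11*N00 - N01*N10) / (N11*N00 + N01*N10),
    where N11 = both correct, N00 = both wrong, N10 = only A correct,
    N01 = only B correct. Returns 0.0 when the denominator is zero
    (a convention chosen for this sketch)."""
    n11 = n00 = n10 = n01 = 0
    for t, a, b in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = a == t, b == t
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


# Toy usage (made-up data): the stand-in model only looks at feature 0.
rows = [(0.9, 3.0), (0.8, 1.0), (0.1, 2.0), (0.2, 4.0)]
y = [1, 1, 0, 0]
predict = lambda rs: [1 if r[0] > 0.5 else 0 for r in rs]
print(permutation_importance_mcc(predict, rows, y, column=1, rng=random.Random(0)))  # 0.0
print(q_statistic(y, [1, 0, 0, 0], [0, 1, 0, 0]))  # -1.0
```

A Q value near 1 means two classifiers tend to be right and wrong on the same samples, while values near 0 or below indicate more diverse errors, which is why a selective ensemble would favour low-Q pairs.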
2023, 49(1): 92-97  Received: 2021-07-02; revised manuscript received: 2021-09-11
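Funding: National Key R&D Program of China, key special project "Public Safety Risk Prevention and Control and Emergency Technical Equipment" (2018YFC0824XXX)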
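Author biography: ZHANG Zhen (1966-), male, from Zhengzhou, Henan; professor and doctoral supervisor; research interests include information security, image processing, and pattern recognition.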