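Journal information
  • Supervising authority: Shanghai Municipal Education Commission
  • Sponsor: University of Shanghai for Science and Technology
  • Editor-in-chief: ZHUANG Songlin (庄松林)
  • Address: 516 Jungong Road, Shanghai
  • Postal code: 200093
  • Telephone: 021-55277251
  • E-mail: xbzrb@usst.edu.cn
  • ISSN: 1007-6735
  • CN (domestic serial number): 31-1739/T
  • Postal distribution code: 4-401
  • Unit price: 15.00
  • List price: 90.00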
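CHEN Yang, LIU Qinming, LIANG Yaoxu. Machine learning strategy under small sample unbalanced equipment data[J]. Journal of University of Shanghai for Science and Technology, 2022, 44(4): 407-416.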
Machine learning strategy under small sample unbalanced equipment data
(Chinese title: 小样本不平衡设备数据下的机器学习策略研究)
Submitted: 2021-08-22
DOI:10.13255/j.cnki.jusst.20210822001
Keywords: machine learning; life prediction; small sample; KNN algorithm; SMOTE algorithm
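Funding: National Natural Science Foundation of China (71632008, 71840003); Natural Science Foundation of Shanghai (19ZR1435600); Humanities and Social Sciences Research Planning Fund of the Ministry of Education (20YJAZH068); Science and Technology Development Project of University of Shanghai for Science and Technology (2020KJFZ038); 2020 University of Shanghai for Science and Technology Undergraduate Innovation and Entrepreneurship Training Program (SH2020067)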
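Authors (name, affiliation, e-mail):
CHEN Yang (陈扬), School of Management (管理学院), University of Shanghai for Science and Technology, Shanghai 200093
LIU Qinming (刘勤明), School of Management (管理学院), University of Shanghai for Science and Technology, Shanghai 200093, lqm0531@163.com
LIANG Yaoxu (梁耀旭), School of Management (管理学院), University of Shanghai for Science and Technology, Shanghai 200093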
Abstract:
      To address equipment life prediction when the available data are small in sample size and imbalanced in distribution, a joint optimization model is constructed that combines an improved SMOTE algorithm with an improved KNN (K-Nearest Neighbor) algorithm. First, a noise ratio coefficient β is set to exclude noise from the sample data. A Borderline-SMOTE-like (B-SMOTE) algorithm is then combined with the traditional SMOTE algorithm to form an improved SMOTE (ISMOTE) algorithm, which generates additional minority-class samples where the distribution is problematic, so as to avoid the bias caused by imbalanced distribution and small sample size. Second, for sample points that lie near blurred class boundaries during classification, particle swarm optimization is used to find the center point of each class, and the mean sample distance is computed to establish a separation threshold $\bar{d}$. Sample points within the threshold range are assigned a class by a "voting method", which avoids the errors that the KNN algorithm makes when samples of different classes are mixed. Finally, simulations are carried out on hydraulic pump condition data from Caterpillar Inc. (USA) and on turbine (water) guide bearing vibration data from the Lingjintan hydropower station. The numerical examples indicate that, for small-sample imbalanced equipment data, the two improved algorithms can accurately analyze the equipment operating state and predict the future trend of equipment health.
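For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal pure-Python sketch, not the paper's ISMOTE or improved KNN. It shows (a) the classic SMOTE step of interpolating between a minority sample and one of its nearest minority neighbours, and (b) a nearest-centre classifier that falls back to a k-nearest-neighbour majority vote for ambiguous points. Several things are assumptions made only for illustration: the noise filter β and the Borderline-SMOTE-like step are omitted, class centres are plain means rather than the result of a particle swarm search, `d_bar` is taken as the mean distance of samples to their own class centre, and a point is treated as ambiguous when its distances to the two closest centres differ by less than `d_bar`. The abstract does not define the threshold rule in this detail, and all function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def nearest(points, query, k):
    """Return the k points closest to query (Euclidean distance)."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


def smote_oversample(minority, n_new, k=3, seed=0):
    """Classic SMOTE step: interpolate between a minority sample and a
    randomly chosen one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbour = rng.choice(nearest(others, base, k))
        gap = rng.random()  # position on the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def class_centres(samples, labels):
    """Mean point of each class (assumed stand-in for the PSO search)."""
    groups = {}
    for x, y in zip(samples, labels):
        groups.setdefault(y, []).append(x)
    return {y: tuple(sum(c) / len(pts) for c in zip(*pts)) for y, pts in groups.items()}


def mean_distance_to_centre(samples, labels, centres):
    """Assumed definition of d_bar: mean distance of samples to their own centre."""
    return sum(math.dist(x, centres[y]) for x, y in zip(samples, labels)) / len(samples)


def classify(query, samples, labels, centres, d_bar, k=3):
    """Nearest-centre label, unless the two closest centres are almost equally
    far away (difference below d_bar); then use a k-NN majority vote."""
    ranked = sorted(centres, key=lambda y: math.dist(centres[y], query))
    d0 = math.dist(centres[ranked[0]], query)
    d1 = math.dist(centres[ranked[1]], query)
    if d1 - d0 >= d_bar:
        return ranked[0]
    order = sorted(range(len(samples)), key=lambda i: math.dist(samples[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data: six "normal" points and three "fault" points (hypothetical values).
majority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1)]
minority = [(2.0, 2.0), (2.2, 2.1), (2.1, 2.3)]

# Balance the classes by adding three synthetic minority samples.
minority_aug = minority + smote_oversample(minority, n_new=3, k=2)
samples = majority + minority_aug
labels = ["normal"] * len(majority) + ["fault"] * len(minority_aug)

centres = class_centres(samples, labels)
d_bar = mean_distance_to_centre(samples, labels, centres)
print(classify((2.1, 2.0), samples, labels, centres, d_bar))
print(classify((0.1, 0.1), samples, labels, centres, d_bar))
```

In this toy setup the first query lies inside the fault cluster and the second inside the normal cluster, so both are resolved by the nearest-centre rule. The vote is only used for points whose two closest centres are nearly equidistant. For real use, library implementations of SMOTE and KNN are preferable to this sketch.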