A Hybrid Simplified Swarm Optimization Method for Imbalanced Data Feature Selection

Wei Chang Yeh, Yu Tai Yang, Chyh Ming Lai

Abstract


In recent years, feature selection has become an important field in data mining and is used heavily in numerous areas. The purpose of feature selection is to search for an optimal subset of features from existing data that maximizes classification accuracy. However, only a few studies have investigated the impact of data imbalance (the existence of underrepresented categories of data) on the feature selection problem. The aim of this study is therefore to provide a feature selection method that increases the accuracy of classifying high-dimensional imbalanced data. We propose a hybrid method that can find a better feature subset. In the proposed method, information gain serves as a filter that selects the most informative features from the original dataset. The imbalance of the dataset restricted to the selected features is then adjusted using the synthetic minority over-sampling technique (SMOTE). Simplified swarm optimization (SSO) is then used as the feature search engine to guide the search for an optimal feature subset. Finally, a support vector machine (SVM) serves as the classifier to evaluate the performance of the proposed method. To evaluate the proposed algorithm, we apply it to four benchmark datasets and compare the results with an existing algorithm. The results show that our algorithm performs better than its competitor.
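The search stage described above can be illustrated with a minimal sketch of a simplified swarm optimization (SSO) style update applied to binary feature masks. The abstract does not give the authors' settings or implementation, so this is an illustration under assumptions and not their method: the thresholds `cw`, `cp`, `cg`, the swarm size, the iteration count, and the function name `sso_feature_search` are all hypothetical. The fitness function `toy_fitness` is a made-up stand-in for the classifier accuracy that the proposed method would obtain from a support vector machine.

```python
import random


def sso_feature_search(n_features, fitness, n_particles=10, n_iters=30,
                       cw=0.1, cp=0.4, cg=0.9, seed=0):
    """Search for a binary feature mask with an SSO-style update.

    fitness: callable taking a list of 0/1 flags, returning a score to maximize.
    cw, cp, cg: assumed cumulative thresholds with 0 < cw < cp < cg < 1.
    """
    rng = random.Random(seed)
    # Random initial population of 0/1 masks (1 = feature selected).
    swarm = [[rng.randint(0, 1) for _ in range(n_features)]
             for _ in range(n_particles)]
    pbest = [mask[:] for mask in swarm]
    pbest_fit = [fitness(mask) for mask in swarm]
    best = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[best][:], pbest_fit[best]

    for _ in range(n_iters):
        for i, mask in enumerate(swarm):
            for j in range(n_features):
                rho = rng.random()
                if rho < cw:
                    pass                          # keep the current value
                elif rho < cp:
                    mask[j] = pbest[i][j]         # copy from personal best
                elif rho < cg:
                    mask[j] = gbest[j]            # copy from global best
                else:
                    mask[j] = rng.randint(0, 1)   # random re-draw
            fit = fitness(mask)
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = mask[:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = mask[:], fit
    return gbest, gbest_fit


# Hypothetical stand-in for "SVM accuracy on the selected features":
# rewards selecting features 0, 2 and 5, with a small penalty per feature kept.
USEFUL = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for j in USEFUL if mask[j])
    return hits - 0.1 * sum(mask)


best_mask, best_score = sso_feature_search(8, toy_fitness)
print(best_mask, best_score)
```

In a fuller pipeline, `toy_fitness` would be replaced by a function that trains and validates a classifier on the columns flagged in the mask, after the filtering and over-sampling steps have been applied to the training data.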

Aus. Aca. Busi & Eco. Rev Vol 2(3), July 2016, P 263-275


Keywords


Data Mining; Feature Selection; Imbalanced Data; Soft Computing; Simplified Swarm Optimization; Support Vector Machine
