
Feature-driven label generation for congestion detection in smart cities under big data

Journal: International Journal of Advanced Technology and Engineering Exploration (IJATEE) (Vol.9, No. 86)

Publication Date:

Authors : ; ;

Page : 94-110

Keywords : Smart cities; Big data; Label generation; Classification; Traffic congestion.;



Due to rapid urbanization and the emergence of smart cities, traffic congestion has become a major issue for smart city planners. Traffic congestion prediction is therefore needed to reduce congestion effectively and enhance road capacity. Various studies have tried to solve the problem of traffic congestion. However, it is difficult to judge the effectiveness of such studies properly given the absence of properly labeled datasets. Additionally, current studies use datasets with relatively few data instances, which do not correctly reflect the big data nature of traffic data. Motivated by these problems and challenges, in this paper we study traffic congestion with respect to effective label generation from a big data perspective. Essentially, we provide two sound and intuitive techniques for label generation that help in the correct annotation of unlabeled data. One technique is based on the number of vehicles on the road, and the other is based on a combination of average speed and number of vehicles. For this purpose, we consider the publicly available CityPulse traffic dataset with 13.5 million data instances. Using our techniques, we generate “congested” and “not-congested” labels indicating whether or not there is congestion on the road. To tackle the class imbalance problem, besides using random undersampling and oversampling techniques, we also introduce a mixture of the two to offset any bias inherent in either individual sampling technique. To test the effectiveness of our label generation approaches, we make extensive use of various machine learning techniques, and for performance evaluation we use all the standard classification evaluation metrics. Finally, we compare our techniques with a previous work that considered only average speed for label generation.
Our results demonstrate the effectiveness of the proposed approaches against the compared method. For example, with random undersampling, the F1-score of every classifier under the proposed techniques is close to 1, whereas under the compared method the F1-score is as low as 0.70 for the multinomial naïve Bayes (MNB) classifier and 0.88 for the support vector machine (SVM). Similarly, with oversampling, our approaches achieve an F1-score close to 1 across all classifiers, whereas the compared method gets as low as 0.70 with MNB. The same trend can be seen with the mixture of both sampling techniques.
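
The two labeling ideas and the mixed sampling step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give thresholds, field names, or the exact way the two features or the two sampling techniques are combined. The names `vehicle_count` and `avg_speed`, the constants `COUNT_THRESHOLD` and `SPEED_THRESHOLD`, the "slow and busy" rule, and the midpoint target size in `mixed_resample` are all assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical thresholds; the abstract does not state the values used.
COUNT_THRESHOLD = 20     # vehicles per observation
SPEED_THRESHOLD = 30.0   # average speed, units assumed

def label_by_count(record):
    """Sketch of the first idea: label from the vehicle count alone."""
    busy = record["vehicle_count"] >= COUNT_THRESHOLD
    return "congested" if busy else "not-congested"

def label_by_speed_and_count(record):
    """Sketch of the second idea: combine average speed and vehicle count.

    Assumed rule: a road is congested when it is both slow and busy.
    """
    slow = record["avg_speed"] <= SPEED_THRESHOLD
    busy = record["vehicle_count"] >= COUNT_THRESHOLD
    return "congested" if slow and busy else "not-congested"

def mixed_resample(rows, labels, seed=0):
    """Undersample larger classes and oversample smaller ones.

    Assumed scheme: every class is resized to the midpoint between the
    smallest and largest class sizes.
    """
    rng = random.Random(seed)
    by_class = {}
    for row, lab in zip(rows, labels):
        by_class.setdefault(lab, []).append(row)
    sizes = [len(members) for members in by_class.values()]
    target = (min(sizes) + max(sizes)) // 2
    out_rows, out_labels = [], []
    for lab, members in by_class.items():
        if len(members) >= target:
            # Random undersampling without replacement.
            chosen = rng.sample(members, target)
        else:
            # Random oversampling: duplicate existing rows with replacement.
            chosen = members + rng.choices(members, k=target - len(members))
        out_rows.extend(chosen)
        out_labels.extend([lab] * target)
    return out_rows, out_labels

# Toy records standing in for traffic observations (made-up values).
records = [
    {"avg_speed": 12.0, "vehicle_count": 35},
    {"avg_speed": 55.0, "vehicle_count": 4},
    {"avg_speed": 48.0, "vehicle_count": 25},
    {"avg_speed": 60.0, "vehicle_count": 2},
    {"avg_speed": 52.0, "vehicle_count": 7},
    {"avg_speed": 58.0, "vehicle_count": 3},
]
labels = [label_by_speed_and_count(r) for r in records]
balanced_rows, balanced_labels = mixed_resample(records, labels)
print(Counter(labels), Counter(balanced_labels))
```

In this toy data the third record is labeled differently by the two functions (busy but not slow), which shows how the choice of features changes the generated labels. A classifier would then be trained on the balanced rows and scored with metrics such as the F1-score.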

Last modified: 2022-02-04 15:59:46