﻿ 基于交通指数聚类的路网区域动态划分
 上海理工大学学报  2021, Vol. 43 Issue (4): 360-367 PDF

Dynamic division of road network areas based on traffic index clustering
HOU Junjun, LONG Baichao, WANG Hongyu, XIAO Jianli
School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
Abstract: As road network areas need to be generated firstly for traffic state forecasting of macro road network areas, a new dynamic division method of road network areas based on traffic index clustering was presented. The entire city′s road networks were first divided into grids, in which each road section belonged to one certain grid. Following by this, the traffic index for each grid was computed. Then, features for each grid were extracted to get sample feature matrix. The k-means++ clustering algorithm was used to cluster the sample feature matrix. Consequently, the initial clustering labels were generated. For better clustering results, the grid labels with singularity were modified. Finally, the completed road network areas were obtained. In order to verify the performance of the proposed method, the GPS data of Shanghai was utilized to divide road network areas, and the results of the proposed method were compared with the results obtained by other clustering methods. Experimental results show that the proposed method has improved the division accuracy and stability.
Key words: road network areas     dynamic division     traffic index     k-means++ clustering algorithm

1 城市路网区域动态划分方法

 图 1 路网区域动态划分方法流程 Fig. 1 Flowchart of the dynamic division method of traffic road network area
1.1 地图网格划分

1.2 网格交通指数计算

 $TFI{\rm{ }} = {\rm{ }}\frac{{\displaystyle\sum\limits_{r = 1}^R {{w_{_r}}\left(\displaystyle\sum\limits_{t = 1}^T {{w_t}\left(\displaystyle\sum\limits_{i = 1}^N {{k_i}{l_i}} \frac{{{v_i}}}{{{v_{{\rm f}r}}}}\right)} \right)} }}{{\displaystyle\sum\limits_{r = 1}^R {{w_r}\left(\sum\limits_{t = 1}^T {{w_t}\left(\displaystyle\sum\limits_{i = 1}^N {{k_i}{l_i}}\right)} \right)} }}{\rm{ }} \times {\rm{ }}100$ (1)

1.3 网格特征提取

 ${{\boldsymbol{f}}_i} = \left(x,y,TF{I_{i,1}},TF{I_{i,2}},\cdots,TF{I_{i,6}}\right)$ (2)

 $\begin{gathered} \;\;\;\;\;\;\;\;\;x\;\;\;\;\;y\;\;\;\;\;TF{I_{i,1}}\;\;\;\;\;\;\;TF{I_{i,2}}\;\;\;\; \cdots \;\;\;TF{I_{i,6}}\;\;\; \\ {\boldsymbol{F}} = \left[\!\! {\begin{array}{*{20}{c}} 1\!\!\!\!&\!\!\!\!1\!\!\!\!&\!\!\!\!{TF{I_{1,1}}}\!\!\!\!&\!\!\!\!{TF{I_{1,2}}}\!\!\!\!&\!\!\!\! \cdots \!\!\!\!&\!\!\!\!{TF{I_{1,6}}} \\ 1\!\!\!\!&\!\!\!\!2\!\!\!\!&\!\!\!\!{TF{I_{2,1}}}\!\!\!\!&\!\!\!\!{TF{I_{2,2}}}\!\!\!\!&\!\!\!\! \cdots \!\!\!\!&\!\!\!\!{TF{I_{2,6}}} \\ 1\!\!\!\!&\!\!\!\!3\!\!\!\!&\!\!\!\!{TF{I_{3,1}}}\!\!\!\!&\!\!\!\!{TF{I_{3,2}}}\!\!\!\!&\!\!\!\! \cdots \!\!\!\!&\!\!\!\!{TF{I_{3,6}}} \\ \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\!{}\!\!\!\!&\!\!\!\! \vdots \\ m\!\!\!\!&\!\!\!\!n\!\!\!\!&\!\!\!\!{TF{I_{m \times n,1}}}\!\!\!\!&\!\!\!\!{TF{I_{m \times n,2}}}\!\!\!\!&\!\!\!\! \cdots \!\!\!\!&\!\!\!\!{TF{I_{m \times n,6}}} \\ \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\! \vdots \!\!\!\!&\!\!\!\!{}\!\!\!\!&\!\!\!\! \vdots \\ {50}\!\!\!\!&\!\!\!\!{50}\!\!\!\!&\!\!\!\!{TF{I_{2\,500,1}}}\!\!\!\!&\!\!\!\!{TF{I_{2\,500,2}}}\!\!\!\!&\!\!\!\! \cdots \!\!\!\!&\!\!\!\!{TF{I_{2\,500,6}}} \end{array}}\!\! \right] \\ \end{gathered}$ (3)

 ${\boldsymbol{F}} = \left[ \begin{array}{c} {{\boldsymbol{f}}_1} \\ \vdots \\ {{\boldsymbol{f}}_i} \\ \vdots \\ {{\boldsymbol{f}}_{{{2\,500}}}} \\ \end{array} \right]$ (4)
1.4 样本特征矩阵聚类

a. 初始聚类。

k-means++聚类算法中，数据点与聚类中心之间的最近距离用 $D(x)$ 表示，其中 $\,{\mu _o}(o = 1, 2,\cdots,n)$ 表示n个聚类中心点， $D(x)$ 的具体计算公式[13]如下：

 $D(x) = \arg \min \left\| {{x_i} - {\mu _o}} \right\|_2^2$ (5)

k-means++聚类算法在聚类中心点的选取过程中引入了概率的思想：将每一个被选中的样本数据点概率与距其最近的已选中心点的距离相互联系，这样距离越大被选中的概率就越大。选取好k-means++聚类中心点后，还需要对选定的聚类中心点进行不断更新，直到聚类中心点不再发生变化。假定类别数为k，则聚类中心G可以表示为 $G = \{ {g_1},{g_2}, \cdots ,{g_k}\}$ ，则聚类中心的数学表达式可以表示为[13]

 ${g_k} = \frac{1}{{\left| {{G_k}} \right|}}\sum\limits_{{x_i} \in {A_k}} {{x_i}}$ (6)

b. 奇异网格标签的修正。

 $N{\rm{4}}(p) = (x - 1,y),(x + 1,y), (x,y{\rm{ - 1}}),(x,y + 1)$ (7)

 $\begin{gathered} N8(p) = N4\bigcup \begin{gathered} (x - 1,y + 1),(x + 1,y + 1), \\ (x - 1,y - 1),(x + 1,y - 1) \\ \end{gathered} \\ \end{gathered}$ (8)

 图 2 奇异网格示意图 Fig. 2 Examples of singular grids
1.5 产生路网区域

2 相关实验 2.1 数据来源

2.2 交通指数颜色映射

 图 3 交通指数颜色映射图 Fig. 3 Map generated by mapping the traffic indexes into the color space
2.3 聚类算法精度评价指标

 图 4 路网区域及区域中的主基色网格和辅基色网格 Fig. 4 Traffic road network areas and the major color grids and minor color grids in these areas

 $P = \left(\frac{1}{H}\sum\limits_{i = 1}^n {{h_i}} + \frac{1}{S}\right)$ (9)

 ${P_{{\rm{ave}}}} = \frac{{\displaystyle\sum\limits_{k = 1}^N {{P_k}} }}{N}$ (10)

2.4 实验验证

k-means++聚类算法与k-means聚类算法、AGNES聚类算法以及Birch聚类算法4种方法对比验证。k-means聚类算法流程如下：首先设置聚类的类别数k，然后从样本数据中随机选取k个样本作为初始的聚类中心，通过计算聚类中心与样本数据点之间的距离，把每个数据点分配到距离它最近的聚类中心；再将同一类别的所有样本数据特征取平均值，并将平均后所得的数据点作为新的聚类中心；最后，重复上述步骤直到聚类中心满足终止条件，输出聚类标签。AGNES聚类算法采取自下而上的聚类思想。首先将样本数据作为一个聚类簇，然后根据一定的相似性度量将这些聚类簇逐渐合并成一个大的聚类簇，直到达到初始设定的类别数为止。Birch聚类算法引入聚类特征和聚类特征树两个新的概念来概括对聚类的描述。其中，聚类特征树概括了聚类中的有用信息，占用的空间比原来的数据集要小，这节约了内存空间，提升了Birch算法在大型数据集上的聚类速度[17-20]

a.方法精度验证。

 图 5 k-means++聚类算法的精度验证 Fig. 5 Accuracy verification of k-means++ clustering algorithm

 图 6 不同聚类算法对比图 Fig. 6 Comparison of different clustering algorithms

b. 方法稳定性验证。

 图 7 k-means++聚类算法稳定性验证 Fig. 7 Stability verification of k-means++ clustering algorithm

3 结　论

 [1] LI Y F, DANG A R, HUANG T H. Evolution of administrative division in Beijing and analysis on its spatial pattern based on GIS[C]//Proceedings of 17th International Conference on Geoinformatics. Fairfax, VA: IEEE, 2009: 1–6. [2] 邹文杰, 翁剑成, 荣建, 等. 基于空间相关性分析的路网评价区域划分方法[J]. 北京工业大学学报, 2012, 38(4): 564-569. [3] 谭飞刚, 刘伟铭, 黄玲, 等. 混合交通交叉路口跨摄像机自行车再识别研究[J]. 中国公路学报, 2017, 30(5): 132-138. DOI:10.3969/j.issn.1001-7372.2017.05.017 [4] 张爽. 城市路网交通控制子区自动划分方法研究[D]. 南京: 东南大学, 2018. [5] MOORE J E, JOVANIS P P. Statistical designation of traffic control subareas[J]. Journal of Transportation Engineering, 1985, 111(3): 208-223. DOI:10.1061/(ASCE)0733-947X(1985)111:3(208) [6] GEROLIMINIS N, DAGANZO C F. Existence of urban-scale macroscopic fundamental diagrams: some experimental findings[J]. Transportation Research Part B: Methodological, 2008, 42(9): 759-770. DOI:10.1016/j.trb.2008.02.002 [7] LIN T, ZHAO C. Nearest neighbor optimization k-means++ clustering algorithm [J]. Computer Science, 2019, 46(S2): 216-219. [8] XIAO J L, WANG H Y. Traffic index cloud maps for traffic flow analysis with big traffic data[C]//Proceedings of the 5th International Conference on Computer and Communication Systems (ICCCS). Shanghai: IEEE, 2020: 20–23. [9] 肖建力. 智能交通中的多核支持向量机与分类器集成方法研究[D]. 上海: 上海交通大学, 2013. [10] SINAGA K P, YANG M S. Unsupervised k-means clustering algorithm [J]. IEEE Access, 2020, 8: 80716-80727. DOI:10.1109/ACCESS.2020.2988796 [11] LANG A, SCHUBERT E. BETULA: numerically stable CF-Trees for BIRCH clustering[C]//Proceedings of the 13th International Conference on Similarity Search and Applications. Copenhagen, Denmark: Springer, 2020: 281–296. [12] SINGH A, YADAV A. Effect of similarity measures on AGNES based hierarchical clustering[J]. International Journal of Engineering and Management Research, 2014, 4(3): 257-263. [13] ARTHUR D, VASSILVITSKII S. K-means++: the advantages of careful seeding[C]//Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms. New Orleans: Society for Industrial and Applied Mathematics, 2007: 1027–1035. [14] GONZALEZ R C, WOODS R E, EDDINS S L. 数字图像处理[M]. 阮秋琦, 译. 北京: 电子工业出版社, 2014. [15] XIAO J L, WEI C, LIU Y C. Speed estimation of traffic flow using multiple kernel support vector regression[J]. Physica A: Statistical Mechanics and its Applications, 2018, 509: 989-997. DOI:10.1016/j.physa.2018.06.082 [16] 王忠浩, 张静, 肖建力. 基于交通指数云图的宏观交通流分析方法综述[J]. 上海理工大学学报, 2017, 39(4): 353-357. [17] 张宪超. 数据聚类[M]. 北京: 科学出版社, 2018. [18] SANO A V D, IMANUEL T D, CALISTA M I, et al. The application of AGNES algorithm to optimize knowledge base for tourism chatbot[C]//Proceedings of 2018 International Conference on Information Management and Technology. Jakarta, Indonesia: IEEE, 2018: 65–68. [19] 崔世斌, 牛智勇, 李若楠, 等. 基于AGNES聚类算法的城市交通运行状态分析研究[J]. 西昌学院学报(自然科学版), 2020, 34(4): 62-67. [20] 王梦遥, 王晓晔, 洪睿琪, 等. 基于改进BIRCH聚类算法的评价对象挖掘[J]. 软件, 2019, 40(11): 9-12, 61. DOI:10.3969/j.issn.1003-6970.2019.11.003