While CUDA has been the dominant parallel computing platform and programming model for general-purpose GPU computing, CUDA synchronization poses significant challenges for GPU programmers due to its intricate parallel computing mechanism and coding practices. In this paper, we propose AuCS, the first general framework to automate synchronization for CUDA kernel functions. AuCS transforms the control flow graph of the original LLVM-level CUDA program in a semantics-preserving manner to explore the possible barrier function locations. Accordingly, AuCS develops mechanisms to place barrier functions correctly, automating synchronization in multiple erroneous (and hard-to-detect) synchronization scenarios, including data races, barrier divergence, and redundant barrier functions. To evaluate the effectiveness and efficiency of AuCS, we conduct an extensive set of experiments, and the results suggest that AuCS can automate 20 out of 24 erroneous synchronization scenarios.
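The three scenarios named in the abstract can be illustrated with a minimal CUDA sketch. It is not taken from AuCS or its benchmarks: the kernel `reverseBlock`, the block size `N`, and the host driver are assumptions made up for illustration. The comments mark where a missing barrier would cause a data race, why the barrier's position avoids barrier divergence, and where an extra barrier would be redundant.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int N = 256;  // one block of N threads (size assumed for this sketch)

// Hypothetical kernel: reverses an N-element array through shared memory.
__global__ void reverseBlock(int *data) {
    __shared__ int tile[N];
    int t = threadIdx.x;

    tile[t] = data[t];

    // Data race if omitted: the read of tile[N - 1 - t] below could run
    // before another thread has written that slot.
    // No barrier divergence: the call sits outside any thread-dependent
    // branch, so every thread in the block reaches it.
    __syncthreads();

    data[t] = tile[N - 1 - t];
    // A second __syncthreads() here would be redundant: nothing after
    // this point reads or writes shared memory.
}

int main() {
    int host[N];
    for (int i = 0; i < N; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc(&dev, N * sizeof(int));
    cudaMemcpy(dev, host, N * sizeof(int), cudaMemcpyHostToDevice);
    reverseBlock<<<1, N>>>(dev);
    cudaMemcpy(host, dev, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    // Element i should now hold N - 1 - i.
    bool ok = true;
    for (int i = 0; i < N; ++i) ok = ok && (host[i] == N - 1 - i);
    printf("%s\n", ok ? "reversed correctly" : "mismatch");
    return ok ? 0 : 1;
}
```

Moving the `__syncthreads()` call inside a branch such as `if (t < N / 2)` would turn the same kernel into a barrier-divergence example, since only half of the threads in the block would reach the barrier.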
Thu 14 Nov
|10:40 - 11:00|
Zan Wang (College of Intelligence and Computing, Tianjin University); Yingquan Zhao (College of Intelligence and Computing, Tianjin University); Shuang Liu (College of Intelligence and Computing, Tianjin University); Jun Sun (Singapore Management University, Singapore); Xiang Chen (School of Information Science and Technology, Nantong University); Huarui Lin (College of Intelligence and Computing, Tianjin University)
|11:00 - 11:20|
|11:20 - 11:40|
|11:40 - 12:00|
|12:00 - 12:10|
|12:10 - 12:20|
Ruijie Meng (University of Chinese Academy of Sciences); Biyun Zhu (University of Chinese Academy of Sciences); Hao Yun (University of Chinese Academy of Sciences); Haicheng Li (University of Chinese Academy of Sciences); Yan Cai (Institute of Software, Chinese Academy of Sciences); Zijiang Yang (Western Michigan University)