The decompiler is one of the most common tools for examining binaries without corresponding source code. It transforms binaries into high-level code, reversing the compilation process. However, compilation loses information contained within the original source code (e.g., structure, type information, and variable names). Semantically meaningful variable names are known to increase code understandability, but they generally cannot be recovered by decompilers. We propose the Decompiled Identifier Renaming Engine (DIRE), a novel probabilistic technique for variable name recovery that uses both lexical and structural information. We also present a technique for generating corpora suitable for training and evaluating models of decompiled code renaming, which we use to create a corpus of 164,632 unique x86-64 binaries generated from C projects mined from GitHub. Our results show that on this corpus DIRE can predict variable names identical to the names in the original source code up to 74.3% of the time.
Wed 13 Nov | Displayed time zone: Tijuana, Baja California
16:00 - 17:40 | API and Renaming | Research Papers / Journal First Presentations | Cortez 2&3 | Chair(s): Massimiliano Di Penta (University of Sannio)
16:00 | 20m | Talk | CodeKernel: A Graph Kernel based Approach to the Selection of API Usage Examples | Research Papers | Xiaodong Gu (The Hong Kong University of Science and Technology); Hongyu Zhang (The University of Newcastle); Sunghun Kim (Hong Kong University of Science and Technology)
16:20 | 20m | Talk | Machine Learning Based Automated Method Name Recommendation: How Far Are We | Research Papers | Lin Jiang (Beijing University of Posts and Telecommunications); Hui Liu (Beijing Institute of Technology); He Jiang (School of Software, Dalian University of Technology)
16:40 | 20m | Talk | MARBLE: Mining for Boilerplate Code to Identify API Usability Problems | Research Papers | Daye Nam (Carnegie Mellon University); Amber Horvath (Carnegie Mellon University); Andrew Macvean (Google, Inc.); Brad A. Myers (Carnegie Mellon University); Bogdan Vasilescu (Carnegie Mellon University)
17:00 | 20m | Talk | DIRE: A Neural Approach to Decompiled Identifier Renaming | Research Papers | Jeremy Lacomis (Carnegie Mellon University); Pengcheng Yin (Carnegie Mellon University); Edward J. Schwartz (Carnegie Mellon University Software Engineering Institute); Miltiadis Allamanis (Microsoft Research, Cambridge); Claire Le Goues (Carnegie Mellon University); Graham Neubig (Carnegie Mellon University); Bogdan Vasilescu (Carnegie Mellon University)
17:20 | 20m | Talk | Automatic Detection and Update Suggestion for Outdated API Names in Documentation | Journal First Presentations | Seonah Lee (Gyeongsang National University); Rongxin Wu (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology); Shing-Chi Cheung (Department of Computer Science and Engineering, The Hong Kong University of Science and Technology); Sungwon Kang (Korea Advanced Institute of Science and Technology)