Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression
Execution logs record detailed runtime information about software systems and serve as the main data source for many software engineering tasks. As modern software systems grow in scale and complexity, logs have become a fast-growing type of big data in industry. In practice, such logs often need to be stored for a long time (e.g., a year) in order to analyze recurrent problems or track security issues. However, archiving logs consumes a large amount of storage space and computing resources, which in turn incurs high operational cost. Data compression is therefore essential to reduce the cost of log storage. Traditional compression tools (e.g., gzip) work well for general text, but are not tailored to execution logs. In this paper, we propose a novel and effective log compression method, namely logzip. Logzip extracts hidden structures from logs via fast iterative clustering and then generates coherent intermediate representations that enable more effective compression. We evaluate logzip on five large log datasets of different types, with a total size of 63.6 GB. The results show that, on average, logzip saves about half of the storage space compared with traditional compression tools. Meanwhile, the design of logzip is highly parallel and incurs only negligible overhead. In addition, we share the industrial experience of applying logzip in a global company.
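To make the idea of an intermediate representation concrete, the following is a minimal sketch, not logzip's actual algorithm. It assumes that any token containing a digit is a variable parameter, whereas logzip learns the structure by iterative clustering. The names `split_line`, `encode`, `decode`, and `compressed_size` are hypothetical and chosen only for illustration. The sketch separates each line into a shared template and its parameters, so that a general-purpose compressor such as gzip sees more regular input.

```python
import gzip
import json
import re

# Assumption: a token is a variable parameter if it contains a digit.
# Logzip itself extracts templates by iterative clustering instead.
HAS_DIGIT = re.compile(r"\d")
WILDCARD = "<*>"


def is_param(token):
    """Treat digit-bearing tokens (and literal wildcards) as parameters."""
    return bool(HAS_DIGIT.search(token)) or token == WILDCARD


def split_line(line):
    """Return (template, params) for one log line."""
    tokens = line.split(" ")
    params = [t for t in tokens if is_param(t)]
    template = " ".join(WILDCARD if is_param(t) else t for t in tokens)
    return template, params


def encode(lines):
    """Build the representation: template table, template ids, parameters."""
    templates = {}  # template text -> integer id, in first-seen order
    ids, params = [], []
    for line in lines:
        template, line_params = split_line(line)
        ids.append(templates.setdefault(template, len(templates)))
        params.append(line_params)
    return {"templates": list(templates), "ids": ids, "params": params}


def decode(rep):
    """Rebuild the original lines from the representation (lossless)."""
    lines = []
    for template_id, line_params in zip(rep["ids"], rep["params"]):
        values = iter(line_params)
        tokens = rep["templates"][template_id].split(" ")
        lines.append(" ".join(next(values) if t == WILDCARD else t for t in tokens))
    return lines


def compressed_size(rep):
    """Size in bytes of the gzip-compressed JSON form of the representation."""
    return len(gzip.compress(json.dumps(rep).encode("utf-8")))
```

Comparing `compressed_size(encode(lines))` with `len(gzip.compress("\n".join(lines).encode("utf-8")))` on a real log file gives a rough sense of how much a structured representation helps. The gain depends heavily on the log format, and this sketch makes no claim to reproduce the results reported above.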
Thu 14 Nov (displayed time zone: Tijuana, Baja California)
13:40 - 15:20 | Models and Logs | Research Papers / Demonstrations at Hillcrest | Chair(s): Timo Kehrer (Humboldt-Universität zu Berlin)
13:40 | 20m | Talk | Statistical Log Differencing | Research Papers | Lingfeng Bao (Institute of Information Engineering, Chinese Academy of Sciences), Nimrod Busany (Tel Aviv University), David Lo (Singapore Management University), Shahar Maoz (Tel Aviv University) | Pre-print
14:00 | 20m | Talk | Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression | Research Papers | Jinyang Liu (Sun Yat-Sen University), Jieming Zhu (Huawei Noah's Ark Lab), Shilin He (Chinese University of Hong Kong), Pinjia He (ETH Zurich), Zibin Zheng (Sun Yat-Sen University), Michael Lyu (The Chinese University of Hong Kong)
14:20 | 20m | Talk | Code-First Model-Driven Engineering: On the Agile Adoption of MDE Tooling | Research Papers | Artur Boronat (University of Leicester)
14:40 | 20m | Talk | Size and Accuracy in Model Inference | Research Papers | Nimrod Busany (Tel Aviv University), Shahar Maoz (Tel Aviv University), Yehonatan Yulazari (Tel Aviv University) | Pre-print
15:00 | 10m | Demonstration | PMExec: An Execution Engine of Partial UML-RT Models | Demonstrations | Mojtaba Bagherzadeh (Queen's University), Karim Jahed (Queen's University), Nafiseh Kahani (Queen's University), Juergen Dingel (Queen's University, Kingston, Ontario) | Pre-print
15:10 | 10m | Demonstration | mCUTE: A Model-level Concolic Unit Testing Engine for UML State Machines | Demonstrations | Reza Ahmadi (Queen's University), Karim Jahed (Queen's University), Juergen Dingel (Queen's University, Kingston, Ontario)