I will show how entropy, a measure of information content defined by Shannon in 1948, can provide useful ways of organizing and analyzing log data. In particular, we use entropy and mutual information heuristics to group syslog records and packet captures so as to bring out anomalies and summarize the overall structure of each data set. I will show a modification of Ethereal that is based on these heuristics, and a separate tool for browsing syslogs. Our data organization heuristics produce decision trees that can be saved and applied to build views of other data sets. Our tools also allow the user to mark records by relevance, and they use this feedback to improve the data views. Our tools and algorithm descriptions can be found at http://kerf.cs.dartmouth.edu
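As a rough illustration of the kind of heuristic described above, the following Python sketch computes the Shannon entropy of each field across a set of parsed log records, estimates the mutual information between two fields, and groups the records by one field. It is not the algorithm used in our tools. The record layout, the field names (host, prog, pid), and the rule of splitting on the field with the lowest nonzero entropy are all assumptions made only for this example.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def pick_split_field(records, fields):
    """Return the field with the lowest nonzero entropy (illustrative rule).

    Constant fields (entropy 0) cannot separate records, so they are skipped.
    Returns None if every field is constant.
    """
    scored = [(entropy([r[f] for r in records]), f) for f in fields]
    nonconstant = [(h, f) for h, f in scored if h > 0]
    return min(nonconstant)[1] if nonconstant else None


def group_by(records, field):
    """Group records into a dict keyed by the value of `field`."""
    groups = {}
    for r in records:
        groups.setdefault(r[field], []).append(r)
    return groups


# Hypothetical, hand-made records standing in for parsed syslog lines.
records = [
    {"host": "web1", "prog": "sshd", "pid": "101"},
    {"host": "web1", "prog": "sshd", "pid": "102"},
    {"host": "web1", "prog": "sshd", "pid": "103"},
    {"host": "web1", "prog": "cron", "pid": "104"},
]

field = pick_split_field(records, ["host", "prog", "pid"])
groups = group_by(records, field)
for value, members in groups.items():
    print(field, value, len(members))
```

In this toy data the host field is constant and the pid field is unique per record, so neither is useful for grouping. The prog field splits the records into one large group and one small group, and a small group of this kind is the sort of thing a reader might want to inspect first.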
Sergey Bratus is a Research Assistant Professor in the Computer Science Dept. at Dartmouth College. His research interests include designing new operating system and hardware-based features to support more expressive and developer-friendly debugging, secure programming, and reverse engineering; Linux kernel security (kernel exploits, LKM rootkits, and hardening patches); data organization and other AI techniques for better log and traffic analysis; and all kinds of wired and wireless network hacking.
Before coming to Dartmouth, he worked on statistical learning methods for natural text processing and information extraction at BBN Technologies. He has a Ph.D. in Mathematics from Northeastern University.