
Laika BOSS: Scalable File-Centric Malware Analysis and Intrusion Detection System

Matthew Arnao, Charles Smutz, Adam Zollman, Andrew Richardson, Eric Hutchins
Lockheed Martin Computer Incident Response Team

1 Introduction

Threat actors intent on gaining access to a network often choose file-based exploits because they can be easily and reliably delivered to intended targets. These actors often use the most common, critical protocols such as email, web, and social media as delivery vectors, and target widespread and critical applications. Wholesale blocks on those protocols or file types would cripple legitimate business activity and are generally not an option for network defenders. To defeat intrusions, defenders must be able to detect malicious files wherever they exist - either transiting a network or stored on disk. A multitude of malware analysis tools and reverse engineering resources are available to analyze malicious code, but these work best in one-off, isolated conditions and are not capable of real-time processing. As a result, most security teams have to manage a disparate set of analysis tools with different capabilities. This inefficient arrangement frustrates many defenders: they can detect malware in a lab, but cannot scale that approach to detect malware across, and defend, an enterprise. Most intrusion detection systems are focused primarily on the medium they monitor (e.g., network-based, host-based). The medium-centric approach normalizes all collection, logging, and alerting around the medium. File features - with all their different formats, ))) return moduleResult

5.3 File Centric Metadata

Late in 2012, LM-CIRT received intelligence identifying a specific spear phishing-based attack. This wave was analyzed for indicators that could be used to pivot and find additional related attacks. Analysis determined that typical network transaction metadata, such as IP addresses or email addresses, were not useful in finding related waves due to adversary-specific TTPs. However, some unique metadata of value were discovered in the malicious Microsoft Office attachment. The document had an “Author” of “Jerry” and a “LastAuthor” of “Windows User”, indicating that the document was created by a user named “Jerry” and last modified by a user named “Windows User.” Since Laika is configured to extract metadata from all Office files, previously scanned files were searched for these metadata conditions. Analysts discovered that, individually, both potential indicators were common in many benign documents - however, the intersection of the two authors was very rare. Three unique documents in separate emails were discovered. These related documents were all confirmed to be malicious and were part of attacks not previously known to be related. Once a historical search of this data had vetted the fidelity of the condition, it was simple to write a signature to detect it on future observation. This case study demonstrates that file-level metadata can be used in incident response in much the same way that network metadata is commonly used today.
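The pivot described above can be illustrated with a minimal sketch. It assumes the extracted metadata is available as one dictionary per document; the record layout, the sample data, and the helper names `matches` and `search` are hypothetical and are not part of Laika's API. Only the field names “Author” and “LastAuthor” and their values come from the case study.

```python
# Hypothetical sample: one dict of extracted Office metadata per scanned document.
records = [
    {"id": "doc1", "Author": "Jerry", "LastAuthor": "Windows User"},
    {"id": "doc2", "Author": "Jerry", "LastAuthor": "Jerry"},
    {"id": "doc3", "Author": "Alice", "LastAuthor": "Windows User"},
    {"id": "doc4", "Author": "Alice", "LastAuthor": "Alice"},
    {"id": "doc5", "Author": "Jerry", "LastAuthor": "Windows User"},
]


def matches(record, conditions):
    """Return True when every field in `conditions` equals the record's value."""
    return all(record.get(field) == value for field, value in conditions.items())


def search(records, conditions):
    """Return the ids of all records satisfying every condition."""
    return [r["id"] for r in records if matches(r, conditions)]


# Each indicator alone matches several documents...
author_hits = search(records, {"Author": "Jerry"})
last_author_hits = search(records, {"LastAuthor": "Windows User"})

# ...while requiring both at once narrows the result set.
both_hits = search(records, {"Author": "Jerry", "LastAuthor": "Windows User"})
```

Comparing the size of `both_hits` against the individual hit lists over historical data is one way to judge whether a combined condition is rare enough to serve as a signature.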

6 Operational Experience

Figure 5: Scan Times Distribution

Figure 6: Scan Times By File Type

Laika has been deployed at Lockheed Martin since early 2012. Laika is integrated with network sensors based on Vortex [11] and Suricata [12], scanning files extracted from HTTP, among other protocols. Inline blocking is provided through integration with email gateways and web proxies using Milter [13] and ICAP [14], respectively. Analysts also manually submit individual files for processing. On a typical business day, this Laika installation scans approximately two million external emails totaling 150 GB of input, resulting in 400 GB of data scanned after subfile explosion. Web traffic, which is also analyzed, amounts to about 100 million web requests totaling about 1.5 TB of HTTP payload data daily. Because the core components of Laika (ZeroMQ, YARA, etc.) are implemented in efficient native code, Laika is highly performant. Laika adds little computational overhead and latency beyond the processing performed by individual modules. Figure 5 shows the weighted distribution of Laika scan times from a typical day with over 100 million objects. Eighty percent of the scan times are under 100ms and 98.6% of the objects are scanned in under 1s. Because Laika is usually configured to tailor processing to the type of file being scanned, its scan times are very long-tailed. Figure 6 shows the average scan times for some analysis-heavy file types on the same data set. Portable executables, excluded from this graph, had an average scan time of 83s. Simple queuing and distribution of work to independent scanner threads keep typical scan times low.
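The queuing arrangement and the threshold statistics quoted above can be sketched as follows. This is an illustration under stated assumptions, not Laika's implementation: it substitutes Python's standard `queue` and `threading` modules for ZeroMQ, the `scan` function is a hypothetical stand-in for module processing, and `fraction_under` computes a plain unweighted fraction rather than the weighted distribution plotted in Figure 5.

```python
import queue
import threading
import time


def scan(obj):
    """Hypothetical stand-in for per-object module processing."""
    return len(obj)


def worker(jobs, times, lock):
    """Pull objects from the shared queue until a None sentinel arrives."""
    while True:
        obj = jobs.get()
        if obj is None:
            break
        start = time.perf_counter()
        scan(obj)
        elapsed = time.perf_counter() - start
        with lock:
            times.append(elapsed)


def run_scans(objects, num_workers=4):
    """Distribute objects to independent worker threads; return scan times in seconds."""
    jobs = queue.Queue()
    times = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(jobs, times, lock))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for obj in objects:
        jobs.put(obj)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return times


def fraction_under(times, threshold_s):
    """Unweighted fraction of scan times strictly below threshold_s."""
    if not times:
        return 0.0
    return sum(1 for t in times if t < threshold_s) / len(times)
```

Given a list of measured times, `fraction_under(times, 0.1)` and `fraction_under(times, 1.0)` correspond to the "under 100ms" and "under 1s" figures reported for the deployment. Because each worker takes the next object as soon as it is free, a slow object occupies only one worker while the others continue to drain the queue.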


Laika scales horizontally well. The largest Laika cluster known to the authors was built using 16 servers with 24 cores each, totaling 384 cores. Laika scanning services have remained highly available despite individual scanner node failures.

7 Conclusions

Laika BOSS is a file-centric malware analysis and intrusion detection system. It implements the core functionality of dispatching input files to modules. These modules typically perform extraction of subfiles, metadata collection, and/or detections such as signature matching. They abstract away the different network protocols, encapsulations, and obfuscations which commonly frustrate detection, leaving analysts free to focus on the payload. Laika supports input from various sources including analyst submissions, passive network sensors, and inline network gateways. Analysts can write their own modules using a simple Python API. This extensibility gives the defender significant power and flexibility to identify and defeat intrusion attempts that would evade other countermeasures. Laika BOSS is built on popular open source projects such as YARA and ZeroMQ. In turn, the Laika IDS Framework is freely available as open source: https://github.com/lmco/laikaboss. The current release includes the core framework, many modules, and clients for Milter and ICAP. A client for network sensors, such as Suricata, will be available in the future. All functionality is supported at scale and has been running successfully in Lockheed Martin’s global network since 2012.
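The dispatch pattern summarized above (identify an object, run the modules configured for its type, and feed any extracted subfiles back into the scan) can be sketched in a few lines. This is a toy illustration only: `ScanObject`, `identify`, `DISPATCH`, and the three module functions are hypothetical names invented for this sketch and do not reproduce the Laika module API, and the `EVIL` byte marker stands in for real signature matching such as YARA rules.

```python
import io
import zipfile


class ScanObject:
    """Hypothetical container for one file (or extracted subfile) under scan."""

    def __init__(self, name, data, depth=0):
        self.name = name
        self.data = data
        self.depth = depth
        self.metadata = {}
        self.flags = []


def identify(data):
    """Very rough file typing by magic bytes (assumption: only ZIP is recognized)."""
    return "zip" if data.startswith(b"PK\x03\x04") else "unknown"


def explode_zip(obj):
    """Extraction module: return each archive member as a child object."""
    children = []
    with zipfile.ZipFile(io.BytesIO(obj.data)) as zf:
        for info in zf.infolist():
            if not info.is_dir():
                children.append(ScanObject(info.filename, zf.read(info), obj.depth + 1))
    return children


def meta_size(obj):
    """Metadata module: record the object's size; extracts no children."""
    obj.metadata["size"] = len(obj.data)
    return []


def flag_marker(obj):
    """Detection module: toy stand-in for signature matching."""
    if b"EVIL" in obj.data:
        obj.flags.append("demo_marker")
    return []


# Which modules run for which file type (hypothetical configuration).
DISPATCH = {"zip": [meta_size, explode_zip], "unknown": [meta_size, flag_marker]}


def scan(root, max_depth=5):
    """Scan root and all extracted subfiles breadth-first; return every object seen."""
    scanned, pending = [], [root]
    while pending:
        obj = pending.pop(0)
        scanned.append(obj)
        for module in DISPATCH[identify(obj.data)]:
            for child in module(obj):
                if child.depth <= max_depth:  # bound recursion on nested archives
                    pending.append(child)
    return scanned
```

Passing `ScanObject("attachment.zip", zip_bytes)` to `scan` would return the archive followed by each of its members, every one carrying its own metadata and flags. Keeping extraction, metadata collection, and detection as separate modules means a new file format only needs a new entry in the dispatch table rather than changes to the scan loop.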

References

[1] “YARA - The pattern matching swiss knife for malware researchers (and everyone else).” http://plusvic.github.io/yara.
[2] “Viper.” http://viper.li.
[3] “Cuckoo Sandbox.” http://www.cuckoosandbox.org.
[4] “Python.” https://www.python.org.
[5] “Cyber Kill Chain.” http://www.lockheedmartin.com/us/what-we-do/information-technology/cyber-security/cyber-kill-chain.html.
[6] “ZeroMQ.” http://zeromq.org/.
[7] “Fluentd — open source data collector.” http://www.fluentd.org.
[8] “ClamAV.” http://www.clamav.net.
[9] “PDFrate - a machine learning based classifier operating on document metadata and structure.” http://pdfrate.com/about.
[10] “XML Data Package Specification, version 2.0.” http://partners.adobe.com/public/developer/en/xml/xdp_2.0.pdf.
[11] “Vortex IDS.” http://sourceforge.net/projects/vortex-ids.
[12] “Suricata - Open Source IDS / IPS / NSM engine.” http://www.suricata-ids.org.
[13] “Milter.” https://www.milter.org/developers/api.
[14] “Internet Content Adaptation Protocol.” http://tools.ietf.org/html/rfc3507.
