Distributed Security Intelligence
Artificial intelligence is radically transforming the cybersecurity industry. To use AI successfully for security, the quality of the data is paramount. Security-related data must be collected from many different sources: network data from packets, server data from commands and processes, application data such as logs, and threat intelligence data from security researchers, among others. These disparate streams of information are fed into a centralized processor, where machine learning is applied to detect security threats.
Several challenges arise in the data collection stage of this process.
- Not enough data
In some cases, there is not enough data for machine learning to produce accurate output, which can lead to too many false positives or false negatives. In general, the greater the volume of data, the more accurate the result.
- Too much data
The downside of a high volume of data, however, is the rising cost of the computing power required to process it. There may be so much data that machine learning consumes more resources than can be sustained. In these cases, it becomes impractical or cost-prohibitive to deploy the machine learning models inline.
- Missing data
The data may be missing or incomplete. If pieces of the puzzle are missing, certain security events cannot be detected. We will elaborate on what this means in a later section.
- Incorrect data
If the data is incorrect, even a theoretically perfect machine learning model will produce the wrong results: garbage in, garbage out.
Because the second and third challenges are less intuitive, we will focus on addressing these two.
We will also discuss why the architecture of a security intelligence system matters greatly in determining its scalability and reliability in deployment.
Centralized vs Distributed Security Intelligence
Two architectures can be considered when designing machine learning for cybersecurity. The centralized architecture is quite common: data feeds come from many sources, while machine learning runs in one central place. The data feeds, which are logs or network traffic records such as Netflow or IPFIX, contain little intelligence themselves; they are merely transport vehicles to the central big data platform. The central platform then runs machine learning on the aggregated data.
With the Distributed Security Intelligence (DSI) architecture, security intelligence is applied at critical junctures throughout the system, starting at the data sources themselves. Although DSI similarly feeds these disparate data sources into a centralized big data platform for analysis, applying intelligence at these additional points reduces the amount of data the platform must ingest. As with fog computing, this distinction enables the scalability and affordability sought by mid-to-large enterprises and by MSSPs serving many SME customers.
DSI demonstrates its advantages as an architecture for security intelligence in the following cases:
Problem Case 1: Raw Packet Data is not scalable
As IDS/IPS deployments have already demonstrated, using raw packets for detection severely limits scalability. To mitigate this problem, most IDS/IPS are deployed close to, if not as part of, the perimeter firewall. Now imagine doing the same on centralized servers in a data center or cloud: the packets are duplicated and streamed across the network to a cluster of servers. While this is possible, it places a heavy burden on the CPU of each source server, on network bandwidth, and on the computing resources of the centralized servers. Running machine learning on raw packets is simply impractical. Furthermore, the density of security-relevant information in each packet is very low, and packets are formatted for efficient transmission, not for analysis such as machine learning.
Problem Case 2: Netflow/IPFIX misses critical data
Given that raw packets do not scale, it may seem prudent to compress the data and extract only the useful information. Netflow and IPFIX are protocols that record network traffic flow information instead of individual packets. They dramatically reduce the volume of data, making machine learning feasible. However, although Netflow/IPFIX are useful for network performance analysis, they provide little insight into application content. Security threat detection requires information such as DNS domain names, HTTP URLs, and database queries, among others.
Attempts have been made to extend IPFIX to carry content such as the application name, but the results fall short because of the sheer number of applications and the complexity of each one.
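To make the trade-off concrete, here is a minimal sketch of what flow aggregation keeps and what it throws away. It assumes a hypothetical, simplified `Packet` type and uses a unidirectional 5-tuple key as in classic NetFlow; it is an illustration, not an implementation of either protocol's export format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class Packet:
    """Simplified, hypothetical packet view produced by a capture layer."""
    ts: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int          # IP protocol number: 6 = TCP, 17 = UDP
    length: int         # bytes on the wire
    payload: bytes = b""

# Unidirectional 5-tuple key, as in classic NetFlow
FlowKey = Tuple[str, str, int, int, int]

@dataclass
class FlowRecord:
    first_seen: float
    last_seen: float
    packets: int = 0
    byte_count: int = 0

def aggregate_flows(packets: Iterable[Packet]) -> Dict[FlowKey, FlowRecord]:
    """Collapse packets into per-flow counters, in the spirit of a NetFlow/IPFIX record."""
    flows: Dict[FlowKey, FlowRecord] = {}
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)
        rec = flows.get(key)
        if rec is None:
            rec = flows[key] = FlowRecord(first_seen=p.ts, last_seen=p.ts)
        rec.first_seen = min(rec.first_seen, p.ts)
        rec.last_seen = max(rec.last_seen, p.ts)
        rec.packets += 1
        rec.byte_count += p.length
        # p.payload is dropped here: a DNS name or HTTP URL never reaches the record.
    return flows
```

The volume reduction comes precisely from discarding the payload, which is also where the DNS names, URLs, and queries needed for threat detection live.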
The Solution: Superior Data with Application Content
Distributed intelligence offers a better way. Security-related information, such as DNS domain names and MySQL queries, should be extracted from popular applications by correctly identifying those applications from raw packets. At collection time, the extracted data can be enriched with flow information such as the session start time, the session duration, the total byte count in each direction, and the packet transmission pattern, to name a few. This distributed model reduces data volume compared with shipping raw packets, while also overcoming the limitations of standard protocols such as Netflow/IPFIX. The density of information useful for threat detection increases, while the volume of data decreases.
Given the diversity of applications and the complexity of each one, application identification can be very time-consuming. Open source tools such as Bro (now Zeek) can extract application content, but performance remains a challenge, and reaching high throughput may seem to require expensive dedicated hardware. Aella Data's Data Sifter is a powerful, lightweight solution with built-in intelligence that can identify thousands of applications from just the first packet of a flow. This intelligence reduces the required computing power and provides additional information that can be critical in detecting security events.
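As a minimal sketch of "application content plus flow context", the code below parses the query name from a DNS message (RFC 1035 wire format) and attaches a few per-session flow fields. The record layout, field names, and the `(timestamp, direction, size)` input format are hypothetical choices for illustration, not Aella Data's schema; name compression is deliberately not handled.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

def parse_dns_query(payload: bytes) -> Tuple[str, int]:
    """Return (qname, qtype) of the first question in a DNS message (RFC 1035 wire format)."""
    if len(payload) < 12:
        raise ValueError("truncated DNS header")
    (qdcount,) = struct.unpack("!H", payload[4:6])
    if qdcount < 1:
        raise ValueError("no question section")
    labels: List[str] = []
    i = 12  # the question section starts right after the fixed 12-byte header
    while True:
        if i >= len(payload):
            raise ValueError("truncated QNAME")
        n = payload[i]
        i += 1
        if n == 0:  # zero-length root label ends the name
            break
        if n & 0xC0:
            raise ValueError("compression pointers are not handled in this sketch")
        if i + n > len(payload):
            raise ValueError("truncated label")
        labels.append(payload[i:i + n].decode("ascii", errors="replace"))
        i += n
    if i + 4 > len(payload):
        raise ValueError("truncated QTYPE/QCLASS")
    (qtype,) = struct.unpack("!H", payload[i:i + 2])
    return ".".join(labels), qtype

@dataclass
class DnsSessionRecord:
    qname: str               # application content from the payload
    qtype: int
    session_start: float     # flow context attached at collection time
    duration: float
    bytes_out: int
    bytes_in: int
    packet_sizes: List[int]  # crude stand-in for the "packet transmission pattern"

def build_dns_record(query_payload: bytes, packets: List[Tuple[float, str, int]]) -> DnsSessionRecord:
    """packets: (timestamp, direction, size) tuples, direction being "out" or "in"."""
    qname, qtype = parse_dns_query(query_payload)
    times = [t for t, _, _ in packets]
    return DnsSessionRecord(
        qname=qname,
        qtype=qtype,
        session_start=min(times),
        duration=max(times) - min(times),
        bytes_out=sum(s for _, d, s in packets if d == "out"),
        bytes_in=sum(s for _, d, s in packets if d == "in"),
        packet_sizes=[s for _, _, s in packets],
    )
```

One compact record now carries both the domain name that a flow record would lose and the session statistics that a raw packet lacks.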
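The idea of identifying an application from the first packet can be illustrated with a few well-known protocol signatures. This is a deliberately small, hypothetical heuristic and not Aella Data's classifier: it checks HTTP request methods, the TLS record header followed by a ClientHello, the SSH identification string, and a plausible DNS query header on port 53.

```python
def classify_first_payload(payload: bytes, dst_port: int) -> str:
    """Guess a flow's application protocol from its first payload.

    Illustrative signatures only; a real classifier covers far more protocols.
    """
    http_methods = (b"GET ", b"POST ", b"PUT ", b"HEAD ", b"DELETE ",
                    b"OPTIONS ", b"PATCH ", b"CONNECT ")
    if payload.startswith(http_methods):
        return "http"
    # TLS record: content type 0x16 (handshake), major version 0x03,
    # then handshake type 0x01 (ClientHello) at offset 5
    if len(payload) >= 6 and payload[0] == 0x16 and payload[1] == 0x03 and payload[5] == 0x01:
        return "tls"
    # SSH peers open with an identification string such as "SSH-2.0-..."
    if payload.startswith(b"SSH-"):
        return "ssh"
    # DNS query: 12-byte header, QR bit (top bit of byte 2) clear, QDCOUNT >= 1
    if (dst_port == 53 and len(payload) >= 12 and not (payload[2] & 0x80)
            and int.from_bytes(payload[4:6], "big") >= 1):
        return "dns"
    return "unknown"
```

Checks like these are cheap because they inspect only a handful of leading bytes, which is what makes first-packet identification attractive at the collection point.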
Problem Case 3: Network Traffic alone misses critical data
Running machine learning on network traffic data can certainly detect some security events, but the results may not be quickly actionable. For example, it may be possible to identify a compromised server or container by its IP address. Because IP addresses can change over time, an improvement would be to enrich the IP information with the server's hostname. A further improvement would be to pinpoint the command, process, or user on the server that generated the event, so that malicious processes can be stopped and compromised user accounts can be remediated. Achieving these objectives requires intelligent data acquisition and fusion from other data sources, such as application logs, executed commands, and server processes.
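The enrichment steps described above can be sketched as a small data-fusion join. Everything here is hypothetical: `HostnameHistory` stands in for an inventory or DHCP feed that records when an IP was assigned to a host, and the socket table stands in for host-agent data mapping `(hostname, local port)` to the process and user that owned the connection.

```python
import bisect
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class NetEvent:
    ts: float
    src_ip: str
    src_port: int
    description: str

class HostnameHistory:
    """Time-aware IP -> hostname mapping, since an IP can be reassigned."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, Tuple[List[float], List[str]]] = {}

    def assign(self, ip: str, hostname: str, since: float) -> None:
        times, names = self._by_ip.setdefault(ip, ([], []))
        idx = bisect.bisect_right(times, since)
        times.insert(idx, since)
        names.insert(idx, hostname)

    def lookup(self, ip: str, ts: float) -> Optional[str]:
        times, names = self._by_ip.get(ip, ([], []))
        idx = bisect.bisect_right(times, ts) - 1  # latest assignment at or before ts
        return names[idx] if idx >= 0 else None

# Hypothetical host-agent data: (hostname, local port) -> (process, user)
SocketTable = Dict[Tuple[str, int], Tuple[str, str]]

def enrich(event: NetEvent, hosts: HostnameHistory, sockets: SocketTable) -> dict:
    """Attach hostname, process, and user to a network-level event."""
    host = hosts.lookup(event.src_ip, event.ts)
    process, user = sockets.get((host, event.src_port), (None, None)) if host else (None, None)
    return {
        "ts": event.ts,
        "ip": event.src_ip,
        "hostname": host,
        "process": process,
        "user": user,
        "description": event.description,
    }
```

Resolving the hostname as of the event's timestamp, rather than as of query time, is what keeps the result correct after an IP address has been reassigned.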
The Solution: Superior Data from More Sources
Data from multiple sources can and should be acquired. Aella Data's Data Sifters use distributed intelligence to support a diverse range of data sources, from network traffic with application content, to commands and processes running on servers, to application logs, among others. Our centralized processor can also ingest data from additional sources such as firewall and IDS/IPS logs, threat intelligence feeds, and user information from Active Directory (AD). These rich data sets are then aggregated and correlated in preparation for advanced analysis.
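One simple form of cross-source correlation is matching extracted DNS names against a threat intelligence feed and attaching the user associated with the host. The sketch below assumes hypothetical record dictionaries, a plain list of indicator domains, and a `host_owner` mapping standing in for directory data; it also matches parent domains, so a lookup of a subdomain still hits an indicator for the registered domain.

```python
from typing import Dict, Iterable, List, Set

def domain_suffixes(name: str) -> Set[str]:
    """'a.bad.example' -> {'a.bad.example', 'bad.example', 'example'}"""
    labels = name.lower().rstrip(".").split(".")
    return {".".join(labels[i:]) for i in range(len(labels))}

def correlate(dns_records: Iterable[dict], intel_domains: Iterable[str],
              host_owner: Dict[str, str]) -> List[dict]:
    """Flag DNS lookups whose name (or a parent domain) appears in a threat feed,
    and attach the user associated with the host."""
    intel = {d.lower().rstrip(".") for d in intel_domains}
    alerts = []
    for rec in dns_records:
        hits = domain_suffixes(rec["qname"]) & intel
        if hits:
            alerts.append({
                "hostname": rec["hostname"],
                "qname": rec["qname"],
                "indicator": max(hits, key=len),  # most specific matching indicator
                "user": host_owner.get(rec["hostname"]),
            })
    return alerts
```

This only works because the DNS name was extracted at collection time; a flow record alone would offer nothing to match against the feed.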
Problem Case 4: Too much data for centralized processing
Common threats such as port scans, SYN floods, and data exfiltration via DNS tunneling can be detected by the intelligent central processor. A more efficient and economical strategy, however, is to detect them at the initial data collection stage. Applying intelligence at the local branches of the system reduces the volume of data that the central processor must ingest, process, and store. If all of the network traffic records generated by these threats are forwarded to the central processor, its machine learning module will needlessly analyze tens of thousands, or even millions, of extra records. To conserve resources, the data collection agent should distill the data into significant items before forwarding it. In addition to improved performance, the central processor also faces less risk of being overwhelmed by the traffic of a DoS attack.
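Edge-side distillation can be sketched for the port-scan case: within one time window, count distinct destination ports per source/destination pair, replace scan traffic with a single summary alert, and forward only the remaining records. The threshold of 100 ports and the `(src_ip, dst_ip, dst_port)` input format are hypothetical, and time windowing is assumed to happen upstream.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Attempt = Tuple[str, str, int]  # (src_ip, dst_ip, dst_port) seen in one time window

def summarize_at_edge(attempts: List[Attempt], port_threshold: int = 100):
    """Replace vertical port-scan traffic with one summary alert per (src, dst) pair.

    Returns (alerts, forwarded): alerts for the central processor, plus the
    remaining, non-scan records that still need central analysis.
    """
    ports: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for src, dst, dport in attempts:
        ports[(src, dst)].add(dport)
    scanners = {pair for pair, seen in ports.items() if len(seen) >= port_threshold}
    alerts = [
        {"type": "port_scan", "src": src, "dst": dst, "distinct_ports": len(ports[(src, dst)])}
        for src, dst in sorted(scanners)
    ]
    forwarded = [a for a in attempts if (a[0], a[1]) not in scanners]
    return alerts, forwarded
```

For example, a window holding 1,000 scan attempts against ports 1 to 1000 plus three ordinary HTTPS connections would reach the central processor as one alert and three records instead of 1,003 records.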
Smarter, Faster Security with Distributed Intelligence
The advantages of distributed intelligence in scaling machine learning and enhancing security detection extend beyond these cases. For example, an intelligent data collector can capture the packets of a DNS tunneling event at the moment of detection, so that the tunneled information can be recovered.
Distributing security intelligence throughout the data processing chain enhances the scalability of the entire threat detection system. Intelligence at the data collection points improves the quality of the data while reducing its volume. The microservice-based architecture of the centralized data processor then allows both supervised and unsupervised machine learning to be used in the pipeline for timely, high-confidence threat detection.
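Capturing packets at the moment of detection can be sketched with a bounded per-flow buffer: the collector always retains the last N packets of each flow, and when a detector fires, those packets are handed off as evidence. The buffer size and the long-label tunneling heuristic (any label over 40 characters) are hypothetical choices; real tunnel detection typically combines several signals.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Hashable, List

class FlowPacketBuffer:
    """Keep the most recent packets of each flow so they survive a detection."""

    def __init__(self, max_packets_per_flow: int = 64) -> None:
        self._buffers: DefaultDict[Hashable, Deque[bytes]] = defaultdict(
            lambda: deque(maxlen=max_packets_per_flow))

    def observe(self, flow_key: Hashable, packet: bytes) -> None:
        self._buffers[flow_key].append(packet)  # oldest packet drops off when full

    def capture(self, flow_key: Hashable) -> List[bytes]:
        """Return and release the buffered packets when a detector flags the flow."""
        return list(self._buffers.pop(flow_key, ()))

def looks_like_dns_tunnel(qname: str, max_label_len: int = 40) -> bool:
    """Crude heuristic: tunnels often pack encoded data into unusually long labels."""
    return any(len(label) > max_label_len for label in qname.split("."))
```

In a collector loop, each packet would be passed to `observe`, and a positive `looks_like_dns_tunnel` result on the extracted query name would trigger `capture` for that flow, preserving the payloads from which the tunneled data can be reconstructed.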