Spothreat Machine Learning

The machine learning component of Spothreat contains routines for performing suspicious connections analyses on netflow, DNS, or proxy logs gathered from a network. These analyses consume a collection of network events and produce a list of the least probable events, which are considered the most suspicious. They rely on the ingest component of Spot to collect and load netflow, DNS, and proxy records.

Spothreat uses topic modeling to discover normal and abnormal behavior. It treats the collection of logs related to an IP address as a document and uses Latent Dirichlet Allocation (LDA) to discover hidden semantic structures in the collection of such documents.
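To make the document analogy concrete, here is a minimal Python sketch of grouping events into per-IP documents. It assumes each event has already been turned into a word and is a dict with the hypothetical keys `src_ip`, `dst_ip`, and `word`. Counting each word in the documents of both endpoints is an illustrative choice, not necessarily how Spothreat builds its corpus.

```python
from collections import Counter, defaultdict


def build_documents(events):
    """Group already-discretized events into per-IP bag-of-words documents.

    Each event is a dict with hypothetical keys 'src_ip', 'dst_ip' and 'word'.
    The word is counted in the document of both endpoints, since either IP
    may be the one behaving unusually (an assumption for this sketch).
    """
    docs = defaultdict(Counter)
    for event in events:
        for ip in (event["src_ip"], event["dst_ip"]):
            docs[ip][event["word"]] += 1
    return dict(docs)


events = [
    {"src_ip": "10.0.0.5", "dst_ip": "8.8.8.8", "word": "53_udp_small"},
    {"src_ip": "10.0.0.5", "dst_ip": "8.8.8.8", "word": "53_udp_small"},
    {"src_ip": "10.0.0.7", "dst_ip": "10.0.0.5", "word": "22_tcp_large"},
]
docs = build_documents(events)
```

Here the document for 10.0.0.5 contains "53_udp_small" twice and "22_tcp_large" once. A bag-of-words representation like this is what LDA consumes, since the model ignores word order within a document.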

LDA is a generative probabilistic model used for discrete data, such as text corpora. LDA is a three-level Bayesian model in which each word of a document is generated from a mixture of an underlying set of topics [1]. We apply LDA to network traffic by converting network log entries into words through aggregation and discretization. In this manner, documents correspond to IP addresses, words to log entries (related to an IP address) and topics to profiles of common network activity.
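The conversion of log entries into words through aggregation and discretization can be illustrated with a small Python sketch. This is one plausible discretization scheme, not Spothreat's actual word definition: it bins byte counts at quantile cut points, splits the hour of day into four periods, and joins these with the destination port and protocol. The field names (`bytes`, `hour`, `dport`, `proto`), the function names, and the word format are all hypothetical.

```python
import bisect


def quantile_cutoffs(values, n_bins):
    """Return n_bins - 1 cut points that split values into roughly equal-sized bins."""
    if not values:
        return []
    ordered = sorted(values)
    return [ordered[(i * len(ordered)) // n_bins] for i in range(1, n_bins)]


def to_word(event, byte_cutoffs):
    """Turn one flow record into a discrete token (hypothetical word format)."""
    # Bin index = number of cut points <= bytes, so it ranges from 0 to len(cutoffs).
    byte_bin = bisect.bisect_right(byte_cutoffs, event["bytes"])
    hour_bin = event["hour"] // 6  # four coarse periods of the day
    return f"{event['dport']}_{event['proto']}_b{byte_bin}_h{hour_bin}"


byte_counts = [100, 200, 300, 400, 5000, 6000, 7000, 8000]
cutoffs = quantile_cutoffs(byte_counts, 4)
word = to_word({"bytes": 6000, "hour": 14, "dport": 443, "proto": "tcp"}, cutoffs)
```

With these sample byte counts the cut points are 300, 5000, and 7000, so a 6000-byte HTTPS flow at 14:00 becomes the word "443_tcp_b2_h2". Quantile bins adapt to the observed traffic volume, so two flows that differ only slightly in size usually map to the same word.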

Spothreat infers a probabilistic model for the network behavior of each IP address. Each network log entry is assigned an estimated probability (score) by the model. The events with lower scores are flagged as “suspicious” for further analysis.
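Under LDA, the probability of a word w in the document of IP address d is a mixture over topics: P(w | d) = sum over topics k of theta[d][k] * phi[k][w], where theta[d] is the IP's topic mixture and phi[k] is topic k's word distribution. The Python sketch below assumes `theta` and `phi` have already been estimated by some LDA implementation. It scores each event with the smaller of its two endpoint probabilities and returns the lowest-scoring events. That combining rule, the data layout, and the function names are assumptions for illustration.

```python
def word_probability(theta_ip, phi, word):
    """P(word | ip) = sum over topics k of theta_ip[k] * phi[k][word]."""
    return sum(t * phi_k.get(word, 0.0) for t, phi_k in zip(theta_ip, phi))


def most_suspicious(events, theta, phi, top_n):
    """Return the top_n (score, event) pairs with the lowest scores."""
    scored = []
    for event in events:
        # Assumed rule: an event is as unlikely as its least likely endpoint.
        score = min(
            word_probability(theta[event["src_ip"]], phi, event["word"]),
            word_probability(theta[event["dst_ip"]], phi, event["word"]),
        )
        scored.append((score, event))
    scored.sort(key=lambda pair: pair[0])  # ascending: least probable first
    return scored[:top_n]


phi = [{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}]  # two topics
theta = {"ip1": [1.0, 0.0], "ip2": [0.5, 0.5]}
events = [
    {"src_ip": "ip1", "dst_ip": "ip2", "word": "a"},
    {"src_ip": "ip1", "dst_ip": "ip2", "word": "b"},
]
flagged = most_suspicious(events, theta, phi, top_n=1)
```

The event with word "b" is flagged: ip1 puts all its weight on a topic where "b" has probability 0.1, while the event with word "a" scores 0.55. Sorting with an explicit key compares only scores, so the event dicts never need to be comparable.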

[1] Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet Allocation." Journal of Machine Learning Research 3 (January 2003): 993-1022.

Spothreat is an open-source initiative under the Spothreat Open Source Foundation. Spothreat and its logo are trademarks of the Spothreat Open Source Foundation.