Spothreat Open Data Models

Many organizations have built threat detection capabilities leveraging myriad vendor solutions. This approach leads to many silos of data corresponding to each vendor and often results in storing multiple copies of the same data, as each vendor's capability operates independently from the others. There is no single vendor able to cost-effectively store and analyze all the data required to detect threats and facilitate incident investigations and remediation.

Spothreat ODM brings together all security-related data (event, user, network, endpoint, etc.) into a single view that can be used to detect threats more effectively. This consolidated view can be leveraged to create analytic models that were not previously possible and to provide the context needed at the event level to determine whether or not there is a threat. Because the data model is shared and open, the Spothreat ODM also enables the sharing and reuse of threat detection models, algorithms and analytics.

The open data model (ODM) provides a common taxonomy for describing security telemetry data used to detect threats. It uses schemas, data structures, file formats and configurations in the underlying Hadoop platform for collecting, storing and analyzing security telemetry data at scale. Spot defines relationships amongst the various security data types for joining log data with user, network and endpoint entity data.

The Spothreat ODM enables organizations to:

- Store one copy of the security telemetry data and apply unlimited analytics
  - Leverage out-of-the-box analytics powered by machine learning to detect threats in DNS, Flow and Proxy data
  - Build custom analytics to your desired specification
  - Plug in third-party vendor analytics that interoperate with the ODM
- Share and/or reuse threat detection models, algorithms, ingest pipelines, visualizations and analytics across the Spothreat community, thanks to a common data model
- Leverage all your security telemetry data to establish the context needed to better detect threats
  - Security logs
  - User, endpoint and network entity data
  - Threat intelligence data
- Avoid "lock-in" to a specific technology and gain the analytic flexibility that results from a shared, open data model

Data Models

In order to provide a framework for effectively analyzing data for cyber threats, it is necessary to collect and analyze standard security event logs/alerts and contextual data regarding the entities referenced in these logs/alerts. The most common entities include network, user and endpoint, but there are others such as file and certificate.

In the diagram below, the raw event tells us that user "jsmith" successfully logged in to an Oracle database from the IP address 10.1.1.3. Based on the raw event only, we don't know if this event is a legitimate threat or not. After injecting user and endpoint context, the enriched event tells us this event is a potential threat that requires further investigation.
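As a rough illustration of this enrichment step, the sketch below merges user and endpoint context into the raw login event. The event values come from the paragraph above; the context values (job title, department, hostname, criticality) and the enrich helper are hypothetical and only show the shape of the join.

```python
import ipaddress

# Raw event expressed with ODM attribute names (values from the text above).
raw_event = {
    "name": "Successful login",
    "user_name": "jsmith",
    "src_ip4": int(ipaddress.IPv4Address("10.1.1.3")),
    "app": "Oracle database",
}

# Hypothetical context tables: users keyed by user_name, endpoints by integer IPv4.
user_context = {
    "jsmith": {"user_title": "Marketing Intern", "user_departm": "Marketing"},
}
endpoint_context = {
    int(ipaddress.IPv4Address("10.1.1.3")): {
        "end_host": "kiosk1.companya.com",
        "end_criticality": "Low",
    },
}

def enrich(event, users, endpoints):
    """Return a copy of the event with user and endpoint context merged in."""
    enriched = dict(event)
    enriched.update(users.get(event.get("user_name"), {}))
    enriched.update(endpoints.get(event.get("src_ip4"), {}))
    return enriched

enriched_event = enrich(raw_event, user_context, endpoint_context)
print(enriched_event["user_departm"], enriched_event["end_host"])
```

With context like this attached, an analyst or model can ask whether a marketing intern on a kiosk should be logging in to a database at all, which the raw event alone cannot answer.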

Based on the need to collect and analyze both security event logs/alerts and contextual data, the Spot open data model includes support for the following types of security information:

Security event logs/alerts

This data type includes event logs from common data sources used to detect threats and includes network flows, operating system logs, IPS/IDS logs, firewall logs, proxy logs, web logs, DLP logs, etc.

Network context data

This data type includes information about the network, which can be gleaned from Whois servers, asset databases and other similar data sources.

User context data

This data type includes information from user and identity management systems including Active Directory, Centrify, and other similar systems.

Endpoint context data

This data includes information about endpoint systems (servers, workstations, routers, switches, etc.) and can be sourced from asset management systems, vulnerability scanners, and endpoint management/detection/response systems such as Webroot, Tanium, Sophos, Endgame, CarbonBlack and others.

Threat intelligence context data

This data includes contextual information about URLs, domains, websites, files and others.

Vulnerability context data

This data includes contextual information about vulnerabilities and is typically sourced from vulnerability management systems (e.g. Qualys, Tenable, etc.).

Roadmap Items:

File context data
Certificate context data

Naming Convention

A naming convention is needed for the open data model to represent attributes across vendor products and technologies. The naming convention is composed of prefixes (net, http, src, dst, etc.) and common attribute names (ip4, user_name, etc.). It is common to use multiple prefixes in combination with an attribute. The following examples are provided to illustrate the naming convention.

src_ip4

"src" - this prefix indicates the attribute pertains to details about the "source" entity referenced in the event (src_ip4, src_user_name, src_host, etc.)
"ip4" - this attribute name corresponds to an IP address (version 4)
Summary: This attribute represents the source ip address (version 4) within the referenced event

prx_browser

"prx" - this prefix indicates the attribute pertains to a "Proxy" event
"browser" - this attribute name corresponds to the "browser" referenced within the event
Summary: This attribute represents the browser (e.g. "Mozilla", "Internet Explorer", etc.) referenced in the Proxy event

dvc_host

"dvc" - this prefix indicates the attribute pertains to the "Device" that is the source of the event
"host" - this attribute name corresponds to the "hostname"
Summary: This attribute represents the hostname of the device where the event was generated

Prefixes

Prefix	Description
src	Corresponds to the "source" fields within a given event (e.g. source address)
dst	Corresponds to the "destination" fields within a given event (e.g. destination address)
dvc	Corresponds to the "device" applicable fields within a given event (e.g. device address) and represents where the event originated
fwd	Forwarded from device
request	Corresponds to requested values (vs. those returned), e.g. "requested URI"
response	Corresponds to response values (vs. those requested)
file	Corresponds to the "file" fields within a given event (e.g. file type)
user	Corresponds to user attributes (e.g. name, id, etc.)
xlate	Corresponds to translated values within a given event (e.g. src_xlate_ip for "translated source ip address")
in	Ingress
out	Egress
new	New value
orig	Original value
app	Corresponds to values associated with application events
net	Corresponds to values associated with network attributes (direction, flags)
end	Corresponds to values associated with endpoint attributes
dns	Corresponds to attributes within the DNS protocol
prx	Corresponds to attributes within Proxy events
av	Corresponds to attributes within Antivirus events
http	Corresponds to attributes within the HTTP protocol
smtp	Corresponds to attributes within the SMTP protocol
ftp	Corresponds to attributes within the FTP protocol
snmp	Corresponds to attributes within the SNMP protocol
tls	Corresponds to attributes within the TLS protocol
ssh	Corresponds to attributes within the SSH protocol
dhcp	Corresponds to attributes within the DHCP protocol
irc	Corresponds to attributes within the IRC protocol
flow	Corresponds to attributes within FLOW events
ti	Corresponds to attributes within Threat Intelligence context data
vuln	Corresponds to attributes within vulnerability management data
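
To make the convention concrete, here is a minimal sketch that splits an attribute name into its leading prefixes and the remaining attribute name. The prefix set is copied from the table above; the split_attribute helper and its greedy matching rule are assumptions for illustration, not part of the ODM itself. Because "user" is itself a prefix, this sketch splits src_user_name into the prefixes src and user plus the name "name".

```python
# Prefixes copied from the table above.
PREFIXES = {
    "src", "dst", "dvc", "fwd", "request", "response", "file", "user",
    "xlate", "in", "out", "new", "orig", "app", "net", "end", "dns",
    "prx", "av", "http", "smtp", "ftp", "snmp", "tls", "ssh", "dhcp",
    "irc", "flow", "ti", "vuln",
}

def split_attribute(name):
    """Split an ODM attribute into (list of prefixes, remaining attribute name)."""
    tokens = name.split("_")
    prefixes = []
    # Consume leading prefix tokens, always leaving at least one token as the name.
    while len(tokens) > 1 and tokens[0] in PREFIXES:
        prefixes.append(tokens.pop(0))
    return prefixes, "_".join(tokens)

print(split_attribute("src_ip4"))      # (['src'], 'ip4')
print(split_attribute("dvc_fwd_ip4"))  # (['dvc', 'fwd'], 'ip4')
print(split_attribute("prx_browser"))  # (['prx'], 'browser')
```

Attributes with no recognized prefix, such as event_time, come back unchanged with an empty prefix list.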

Security Event Log/Alert Data Model

The data model for security event logs/alerts is detailed in the table below. The attributes are categorized as follows:

Common

Attributes that are common across many device types

Device

Attributes that are applicable to the device that generated the event

Network

Attributes that are applicable to the network components of the event

File

Attributes that are applicable to file objects referenced in the event

Endpoint

Attributes that are applicable to the endpoints referenced in the event

User

Attributes that are applicable to the user referenced in the event

Proxy

Attributes that are applicable to proxy events

Antivirus

Attributes that are applicable to antivirus events

Vulnerability

Attributes that are applicable to vulnerability management events

Protocol

DNS - attributes that are specific to the DNS protocol
HTTP - attributes that are specific to the HTTP protocol
SMTP, SSH, TLS, DHCP, IRC, SNMP and FTP - attributes that are specific to each of these protocols

Note: The model will evolve to include reserved attributes for additional device types that are not currently represented. The model can currently be extended to support any attribute for any device type by following the guidance outlined in the section titled "Extensibility of Data Models".

Category	Attribute	Data type	Description	Sample Values
Common	event_time	long	timestamp of event (UTC)	1472653952
	begintime	long	timestamp	1472653952
	endtime	long	timestamp	1472653952
	event_insertime	long	timestamp	1472653952
	lastupdatetime	long	timestamp	1472653952
	duration	float	Time duration (milliseconds)	2345
	event_id	string	Unique identifier for event	x:2388
	name	string	Name of event	"Successful login …"
	org	string	Organization	"HR" or "Finance" or "CustomerA"
	type	string	Type information	"Informational", "image/gif"
	n_proto	string	Network protocol of event	TCP, UDP, ICMP
	a_proto	string	Application protocol of event	HTTP, NFS, FTP
	msg	string	Message (details of action taken on object)	Some long string
	mac	string	MAC address	94:94:26:3:86:16
	severity	string	Severity of event	High, 10, 1
	raw	string	Raw text message of entire event	Complete copy of log entry
	risk	float	Risk score	95.67
	code	string	Response or error code	404
	category	string	Event category	/Application/Start
	query	string	Query (DNS query, URI query, SQL query, etc.)	Select * from table
	service	string	(i.e. service name, type of service)	sshd
	state	string	State of object	Running, Paused, stopped
	in_bytes	int	Bytes in	1025
	out_bytes	int	Bytes out	9344
	xref	string	External reference to public description	http://www.oracle.com/technetwork/java/javase/2col/6u85-bugfixes-2298235.html
	version	string	Version	5.4
	api	string	API label	"somestring"
	parameter	string	Parameter label	"somestring"
	action	string	Action label	"somestring"
	proc	string	Process label	"somestring"
	app	string	Application label	"somestring"
	disposition	string	Disposition label	"somestring"
	prevalence	string	Prevalence label	"somestring"
	confidence	string	Confidence label	"somestring"
	sensitivity	string	Sensitivity label	"somestring"
	count	int	Generic count	20
	company	string	Company label	"somestring"
	additional_attrs	String (JSON Map)	Custom event attributes	{"building":"729","cube":"401"}
	totrust	string	Coming soon	Coming soon
	fromtrust	string	Coming soon	Coming soon
	rule	string	Coming soon	Coming soon
	threat	string	Coming soon	Coming soon
	pcap_id	int	Coming soon	Coming soon
Device	dvc_time	long	UTC timestamp from device where event/alert originates or is received	1472653952
	dvc_ip4/dvc_ip6	long	IP address of device	Integer representation of 10.1.1.1
	dvc_group	string	Device group label	"somestring"
	dvc_server	string	Server label	"somestring"
	dvc_host	string	Hostname of device	test.companyA.com
	dvc_domain	string	Domain of dvc	"somestring"
	dvc_type	string	Device type that generated the log	Unix, Windows, Sonicwall
	dvc_vendor	string	Vendor	Microsoft, Fireeye
	dvc_fwd_ip4/fwd_ip6	long	Forwarded from device	Integer representation of 10.1.1.1
	dvc_version	string	Version	"3.2.2"
Network	src_ip4/src_ip6	bigint	Source ip address of event	Integer representation of 10.1.1.1
	src_host	string	Source FQDN of event	test.companyA.com
	src_domain	string	Domain name of source address	companyA.com
	src_port	int	Source port of event	1025
	src_country_code	string	Source country code	cn
	src_country_name	string	Source country name	China
	src_region	string	Source region	string
	src_city	string	Source city	Shanghai
	src_lat	int	Source latitude	90
	src_long	int	Source longitude	90
	dst_ip4/dst_ip6	bigint	Destination ip address of event	Integer representation of 10.1.1.1
	dst_host	string	Destination FQDN of event	test.companyA.com
	dst_domain	string	Domain name of destination address	companyA.com
	dst_port	int	Destination port of event	80
	dst_country_code	string	Destination country code	cn
	dst_country_name	string	Destination country name	China
	dst_region	string	Destination region	string
	dst_city	string	Destination city	Shanghai
	dst_lat	int	Destination latitude	90
	dst_long	int	Destination longitude	90
	src_asn	int	Autonomous system number	33
	dst_asn	int	Autonomous system number	33
	net_direction	string	Direction	In, inbound, outbound, ingress, egress
	net_flags	string	TCP flags	.AP.SF
File	file_name	string	Filename from event	output.csv
	file_path	string	File path	/root/output.csv
	file_atime	bigint	Timestamp (UTC) of file access	1472653952
	file_acls	string	File permissions	rwx-rwx-rwx
	file_type	string	Type of file	".doc"
	file_size	int	Size of file in bytes	1244
	file_desc	string	Description of file	Project Plan for Project xyz
	file_hash	string	Hash of file
	file_hash_type	string	Type of hash	MD5, SHA1,SHA256
Endpoint	end_object	string	File/Process/Registry	File, Registry, Process
	end_action	string	Action taken on object (open/delete/edit)	Open, Edit
	end_msg	string	Message (details of action taken on object)	Some long string
	end_app	string	Application	Microsoft Powerpoint
	end_location	string	Location	Atlanta, GA
	end_proc	string	Process	SSHD
User	user_name (src_user_name, dst_user_name)	string	Username from event	jsmith
	user_email	string	Email address	test@companyA.com
	user_id	string	userid	234456
	user_loc	string	location	Herndon, VA
	user_desc	string	Description of user	"somestring"
DNS	dns_class	string	DNS class	1
	dns_len	int	DNS frame length	188
	dns_query	string	Requested DNS query	test.test.com
	dns_response_code	string	Response code	0x00000001
	dns_answers	string	Response to DNS Query	178.2.1.99
	dns_type	int	DNS query type	1
Proxy	prx_category	string	Event category	SG-HTTP-SERVICE
	prx_browser	string	Web browser	Internet Explorer
	prx_code	string	Error or response code	404
	prx_referrer	string	Referrer	www.usatoday.com
	prx_host	string	Requested URI	/wcm/assets/images/imagefileicon.gif
	prx_filter_rule	string	Applied filter or rule	Internet, Rule 6
	prx_filter_result	string	Result of applied filter or rule	Proxied, Blocked
	prx_query	string	URI query	?func=S_senseHTML&Page=a26815a313504697a126279
	prx_action	string	Action taken on object	TCP_HIT, TCP_MISS, TCP_TUNNELED
	prx_method	string	HTTP method	GET, CONNECT, POST
	prx_type	string	Type of request	image/gif
HTTP	http_request_method	string	HTTP method	GET, CONNECT, POST
	http_request_uri	string	Requested URI	/wcm/assets/images/imagefileicon.gif
	http_request_body_len	int	Length of request body	98
	http_request_user_name	string	username from event	jsmith
	http_request_password	string	Password from event	abc123
	http_request_proxied	string	Proxy request label	"somestring"
	http_request_headers	MAP	HTTP request headers	request_headers['HOST'] request_headers['USER-AGENT'] request_headers['ACCEPT']
	http_response_status_code	int	HTTP response status code	404
	http_response_status_msg	string	HTTP response status message	"Not found"
	http_response_body_len	int	Length of response body	98
	http_response_info_code	int	HTTP response info code	100
	http_response_info_msg	string	HTTP response info message	"somestring"
	http_response_resp_fuids	string	Response FUIDS	"somestring"
	http_response_mime_types	string	Mime types	"cgi,bat,exe"
	http_response_headers	MAP	Response headers	response_headers['SERVER'] response_headers['SET-COOKIE'] response_headers['DATE']
SMTP	smtp_trans_depth	int	Depth of email into SMTP exchange	2
	smtp_headers_helo	string	Helo header	"somestring"
	smtp_headers_mailfrom	string	Mailfrom header	"somestring"
	smtp_headers_rcptto	string	Rcptto header	"somestring"
	smtp_headers_date	string	Header date	"somestring"
	smtp_headers_from	string	From header	"somestring"
	smtp_headers_to	string	To header	"somestring"
	smtp_headers_reply_to	string	Reply to header	"somestring"
	smtp_headers_msg_id	string	Message ID	"somestring"
	smtp_headers_in_reply_to	string	In reply to header	"somestring"
	smtp_headers_subject	string	Subject	"somestring"
	smtp_headers_x_originating_ip4	bigint	Originating IP address	1203743731
	smtp_headers_first_received	string	First to receive message	"somestring"
	smtp_headers_second_received	string	Second to receive message	"somestring"
	smtp_last_reply	string	Last reply in message chain	"somestring"
	smtp_path	string	Path of message	"somestring"
	smtp_user_agent	string	User agent	"somestring"
	smtp_tls	boolean	Indication of TLS use	1
	smtp_is_webmail	boolean	Indication of webmail	0
FTP	ftp_user_name	string	Username	"somestring"
	ftp_password	string	Password	"somestring"
	ftp_command	string	FTP command	"somestring"
	ftp_arg	string	Argument	"somestring"
	ftp_mime_type	string	Mime type	"somestring"
	ftp_file_size	int	File size	1024
	ftp_reply_code	int	Reply code	3
	ftp_reply_msg	string	Reply message	"somestring"
	ftp_data_channel_passive	boolean	Passive data channel?	1
	ftp_data_channel_rsp_p	string		"somestring"
	ftp_cwd	string	Current working directory	"somestring"
	ftp_cmdarg_ts	float		Coming soon
	ftp_cmdarg_cmd	string	Command	"somestring"
	ftp_cmdarg_arg	string	Command argument	"somestring"
	ftp_cmdarg_seq	int	Sequence	2
	ftp_pending_commands	string	Pending commands	"somestring"
	ftp_is_passive	boolean	Passive mode enabled	0
	ftp_fuid	string	Coming soon	"somestring"
	ftp_last_auth_requested	string	Coming soon	"somestring"
SNMP	snmp_version	string	Coming soon	"somestring"
	snmp_community	string	Coming soon	"somestring"
	snmp_get_requests	int	Coming soon	Coming soon
	snmp_get_bulk_requests	int	Coming soon	Coming soon
	snmp_get_responses	int	Coming soon	Coming soon
	snmp_set_requests	int	Coming soon	Coming soon
	snmp_display_string	string	Coming soon	Coming soon
	snmp_up_since	float	Coming soon	Coming soon
TLS	tls_version	string	Coming soon	Coming soon
	tls_cipher	string	Coming soon	Coming soon
	tls_curve	string	Coming soon	Coming soon
	tls_server_name	string	Coming soon	Coming soon
	tls_resumed	boolean	Coming soon	Coming soon
	tls_next_protocol	string	Coming soon	Coming soon
	tls_established	boolean	Coming soon	Coming soon
	tls_cert_chain_fuids	string	Coming soon	Coming soon
	tls_client_cert_chain_fuids	string	Coming soon	Coming soon
	tls_subject	string	Coming soon	Coming soon
	tls_issuer	string	Coming soon	Coming soon
SSH	ssh_version	string	Coming soon	Coming soon
	ssh_auth_success	boolean	Coming soon	Coming soon
	ssh_client	string	Coming soon	Coming soon
	ssh_server	string	Coming soon	Coming soon
	ssh_cipher_algorithm	string	Coming soon	Coming soon
	ssh_mac_algorithm	string	Coming soon	Coming soon
	ssh_compression_algorithm	string	Coming soon	Coming soon
	ssh_key_exchange_algorithm	string	Coming soon	Coming soon
	ssh_host_key_algorithm	string	Coming soon	Coming soon
DHCP	dhcp_assigned_ip4	bigint	Coming soon	Coming soon
	dhcp_mac	string	Coming soon	Coming soon
	dhcp_lease_time	double	Coming soon	Coming soon
IRC	irc_user	string	Coming soon	Coming soon
	irc_nickname	string	Coming soon	Coming soon
	irc_command	string	Coming soon	Coming soon
	irc_value	string	Coming soon	Coming soon
	irc_additional_data	string	Coming soon	Coming soon
Flow	flow_in_packets	int	Coming soon	Coming soon
	flow_out_packets	int	Coming soon	Coming soon
	flow_conn_state	string	Coming soon	Coming soon
	flow_history	string	Coming soon	Coming soon
	flow_src_dscp	string	Coming soon	Coming soon
	flow_dst_dscp	string	Coming soon	Coming soon
	flow_input	string	Coming soon	Coming soon
	flow_output	string	Coming soon	Coming soon
Vulnerability	vuln_id	string	Unique vulnerability identifier	10748
	vuln_type	string	Vulnerability title	Wireshark Multiple Vulnerabilities
	vuln_status	string	Vulnerability type	Potential, Confirmed, etc.
	vuln_severity	string	Vulnerability severity	Critical, High, etc.
	created	bigint	Timestamp of vulnerability identification
Antivirus	av_riskname	string	Coming soon	Coming soon
	av_actualaction	string	Coming soon	Coming soon
	av_requestedaction	string	Coming soon	Coming soon
	av_secondaryaction	string	Coming soon	Coming soon
	av_downloadsite	string	Coming soon	Coming soon
	av_downloadedby	string	Coming soon	Coming soon
	av_tracking_status	string	Coming soon	Coming soon
	av_firstseen	bigint	Coming soon	Coming soon
	application_hash	string	Coming soon	Coming soon
	application_hash_type	string	Coming soon	Coming soon
	application_name	string	Coming soon	Coming soon
	application_version	string	Coming soon	Coming soon
	application_type	string	Coming soon	Coming soon
	av_categoryset	string	Coming soon	Coming soon
	av_categorytype	string	Coming soon	Coming soon
	av_threat_count	int	Coming soon	Coming soon
	av_infected_count	int	Coming soon	Coming soon
	av_omitted_count	int	Coming soon	Coming soon
	av_scanid	int	Coming soon	Coming soon
	av_startmessage	string	Coming soon	Coming soon
	av_stopmessage	string	Coming soon	Coming soon
	av_totalfiles	int	Coming soon	Coming soon
	av_signatureid	string	Coming soon	Coming soon
	av_signaturestring	string	Coming soon	Coming soon
	av_signaturesubid	string	Coming soon	Coming soon
	av_intrusionurl	string	Coming soon	Coming soon
	av_intrusionpayloadurl	string	Coming soon	Coming soon
	objectname	string	Coming soon	Coming soon

Note that it is not necessary to populate all of the attributes within the model. When an attribute is not populated in a given security event log/alert, the corresponding contextual data may not be available. For example, the sample event below can be enriched with contextual data about the referenced endpoints (10.1.1.1 and 192.168.10.10), but not about a user, because the username is not populated.

{
"date":"12/12/2015",
"time":"23:14:56",
"source_ip":"10.1.1.1",
"source_port":1025,
"protocol":"tcp",
"destination_ip":"192.168.10.10",
"destination_port":443,
"bytes":"1183"
}
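
As a sketch of how such an event might be mapped onto ODM attribute names, the code below converts the sample event into a small ODM-style record. The to_odm helper is hypothetical, and it assumes the device reports UTC and writes dates as MM/DD/YYYY, which the sample does not state. The "bytes" value is kept only inside raw, because the sample does not say whether it counts inbound or outbound traffic.

```python
import ipaddress
import json
from datetime import datetime, timezone

sample_event = {
    "date": "12/12/2015",
    "time": "23:14:56",
    "source_ip": "10.1.1.1",
    "source_port": 1025,
    "protocol": "tcp",
    "destination_ip": "192.168.10.10",
    "destination_port": 443,
    "bytes": "1183",
}

def to_odm(event):
    """Map the vendor-style keys of the sample event onto ODM attribute names."""
    # Assumption: timestamps are UTC and dates are written MM/DD/YYYY.
    when = datetime.strptime(f"{event['date']} {event['time']}", "%m/%d/%Y %H:%M:%S")
    when = when.replace(tzinfo=timezone.utc)
    return {
        "event_time": int(when.timestamp()),
        # IP addresses are stored as their integer representation.
        "src_ip4": int(ipaddress.IPv4Address(event["source_ip"])),
        "src_port": event["source_port"],
        "dst_ip4": int(ipaddress.IPv4Address(event["destination_ip"])),
        "dst_port": event["destination_port"],
        "n_proto": event["protocol"].upper(),
        "raw": json.dumps(event),
    }

odm_event = to_odm(sample_event)
print(odm_event["src_ip4"])  # 167837953
```

The resulting record has src_ip4 and dst_ip4 values that can be joined against the endpoint context model, but no user_name, so no user context can be attached.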

Context Models

The recommended approach for populating the context models (user, endpoint, network, threat intelligence, etc.) involves consuming information from the systems most capable of providing the needed context. Populating the user context model is best accomplished by leveraging user/identity management systems such as Active Directory or Centrify and populating the model with details such as the user's full name, job title, phone number, manager's name, physical address, entitlements, etc. Similarly, an endpoint model can be populated by consuming information from endpoint/asset management systems (Tanium, Webroot, etc.), which provide information such as the services running on the system, system owner, business context, etc.

User Context Model

Attribute	Data Type	Description	Sample Values
dvc_time	bigint	Timestamp from when the user context information is obtained	1472653952
user_created	bigint	Timestamp from when user was created	1472653952
user_changed	bigint	Timestamp from when user was updated	1472653952
user_last_logon	bigint	Timestamp from when user last logged on	1472653952
user_logon_count	int	Number of times account has logged on	232
user_last_reset	bigint	Timestamp from when user last reset password	1472653952
user_expiration	bigint	Date/time when user expires	1472653952
user_id	string	Unique user id	1234
user_image	binary	Image/picture of user
user_name	string	Username in event log/alert	jsmith
user_name_first	string	First name	John
user_name_middle	string	Middle name	Henry
user_name_last	string	Last name	Smith
user_name_mgr	string	Manager's name	Ronald Reagan
user_phone	string	Phone number	703-555-1212
user_email	string	Email address	jsmith@company.com
user_code	string	Job code	3455
user_loc	string	Location	US
user_departm	string	Department	IT
user_dn	string	Distinguished name	"CN=scm-admin-mej-test2-adk,OU=app-admins,DC=ad,DC=halxg,DC=companya,DC=com"
user_ou	string	Organizational unit	EAST
user_empid	string	Employee ID	12345
user_title	string	Job Title	Director of IT
user_groups	array (Comma separated)	Groups to which the user belongs	"Domain Admins", "Domain Users"
dvc_type	string	Device type that generated the user context data	Active Directory
dvc_vendor	string	Vendor	Microsoft
user_risk	float	Risk score	95.67
dvc_version	string	Version	8.1.2
additional_attrs	map	Additional attributes of user	Key value pairs
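
The user context attributes above could be populated from a directory entry along the following lines. This is a sketch only: the directory-side names are standard Active Directory LDAP attributes, but the mapping itself, the to_user_context helper and the plain-dict input are assumptions, and retrieving the entry from the directory is out of scope here.

```python
import time

# Assumed mapping from directory attribute names to user context attributes.
AD_TO_ODM = {
    "sAMAccountName": "user_name",
    "givenName": "user_name_first",
    "sn": "user_name_last",
    "mail": "user_email",
    "telephoneNumber": "user_phone",
    "title": "user_title",
    "department": "user_departm",
    "distinguishedName": "user_dn",
    "memberOf": "user_groups",
}

def to_user_context(entry, collected_at=None):
    """Build a user context record from one directory entry (a plain dict)."""
    # Copy only the attributes present in the entry; others stay unpopulated.
    record = {odm: entry[ad] for ad, odm in AD_TO_ODM.items() if ad in entry}
    record["dvc_time"] = int(collected_at if collected_at is not None else time.time())
    record["dvc_type"] = "Active Directory"
    record["dvc_vendor"] = "Microsoft"
    return record

entry = {
    "sAMAccountName": "jsmith",
    "givenName": "John",
    "sn": "Smith",
    "mail": "jsmith@company.com",
    "title": "Director of IT",
    "memberOf": ["Domain Admins", "Domain Users"],
}
user_record = to_user_context(entry, collected_at=1472653952)
print(user_record["user_name"], user_record["user_groups"])
```

Attributes missing from the source entry, such as the phone number here, are simply left out of the record rather than filled with placeholders.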

Endpoint Context Model

Attribute	Data Type	Description	Sample Values
dvc_time	bigint	Timestamp from when the endpoint context information is obtained	1472653952
end_ip4	bigint	IP address of endpoint	Integer representation of 10.1.1.1
end_ip6	bigint	IP address of endpoint	Integer representation of 10.1.1.1
end_os	string	Operating system	Redhat Linux 6.5.1
end_os_version	string	Version of OS	5.4
end_os_sp	string	Service pack	SP 2.3.4.55
end_tz	string	timezone	EST
end_hotfixes	array (Comma separated)	Applied hotfixes	993.2
end_disks	array (Comma separated)	Available disks	\\Device\\HarddiskVolume1, \\Device\\HarddiskVolume2
end_removables	array (Comma separated)	Removable media devices	USB Key
end_nics	array (Comma separated)	Network interfaces	fe10::28f4:1a47:658b:d6e8, fe82::28f4:1a47:658b:d6e8
end_drivers	array (Comma separated)	Installed kernel drivers	ntoskrnl.exe, hal.dll
end_users	array (Comma separated)	Local user accounts	administrator, jsmith
end_host	string	Hostname of endpoint	tes1.companya.com
end_mac	string	MAC address of endpoint	94:94:26:3:86:16
end_owner	string	Endpoint owner (name)	John Smith
end_vulns	array (Comma separated)	Vulnerability identifiers (CVE identifier)	CVE-123, CVE-456
end_loc	string	Location	US
end_departm	string	Department name	IT
end_company	string	Company name	CompanyA
end_regs	array (Comma separated)	Applicable regulations	HIPAA, SOX
end_svcs	array (Comma separated)	Services running on system	Cisco Systems, Inc. VPN Service, Adobe LM Service
end_procs	array (Comma separated)	Processes	svchost.exe, sppsvc.exe
end_criticality	string	Criticality of device	Very High
end_apps	array (Comma separated)	Applications running on system	Microsoft Word, Chrome
end_desc	string	Endpoint descriptor	Some string
dvc_type	string	Device type that generated the log	Microsoft Windows 7
dvc_vendor	string	Vendor	Endgame
dvc_version	string	Version	2.1
end_architecture	string	CPU architecture	x86
end_uuid	string	Universally unique identifier	a59ba71e-18b0-f762-2f02-0deaf95076c6
end_risk	float	Risk score	95.67
end_memtotal	int	Total memory (bytes)	844564433
additional_attrs	map	Additional attributes	Key value pairs

Vulnerability Context Model

Attribute	Data Type	Description	Sample Values
vuln_id	string	Unique vulnerability identifier	10748
vuln_title	string	Vulnerability title	"Wireshark Multiple Vulnerabilities"
vuln_description	string	Vulnerability description
vuln_solution	string	Vulnerability remediation description	"Patch: The following URLs provide patch procedures .."
vuln_type	string	Vulnerability type	Potential, Confirmed, etc.
vuln_category	string	Vulnerability category	Ubuntu, Windows, etc.
vuln_status	string	Vulnerability status	Active, Fixed, etc.
vuln_severity	string	Vulnerability severity	Critical, High, Medium, etc.
vuln_created	bigint	Vulnerability creation timestamp	timestamp
vuln_updated	bigint	Vulnerability updated timestamp	timestamp
additional_attrs	map	Additional attributes	Key value pairs

Network Context Model

Attribute	Data Type	Description	Sample Values
net_domain_name	string	Domain name
net_registry_domain_id	string	Registry Domain ID
net_registrar_whois_server	string	Registrar WHOIS Server
net_registrar_url	string	Registrar URL
net_update_date	bigint	UTC timestamp
net_creation_date	bigint	Creation Date
net_registrar_registration_expiration_date	bigint	Registrar Registration Expiration Date
net_registrar	string	Registrar
net_registrar_iana_id	string	Registrar IANA ID
net_registrar_abuse_contact_email	string	Registrar Abuse Contact Email
net_registrar_abuse_contact_phone	string	Registrar Abuse Contact Phone
net_domain_status	string	Domain Status
net_registry_registrant_id	string	Registry Registrant ID
net_registrant_name	string	Registrant Name
net_registrant_organization	string	Registrant Organization
net_registrant_street	string	Registrant Street
net_registrant_city	string	Registrant City
net_registrant_state_province	string	Registrant State/Province
net_registrant_postal_code	string	Registrant Postal Code
net_registrant_country	string	Registrant Country
net_registrant_phone	string	Registrant Phone
net_registrant_email	string	Registrant Email
net_registry_admin_id	string	Registry Admin ID
net_name_servers	string	Name Server
net_dnssec	string	DNSSEC
net_risk	float	Risk score	95.67

Threat Intelligence Context Model

Attribute	Data Type	Description
ti_source	String	TI Provider, Open Source List, Internally Developed, LE Tip, Other
ti_provider_id	String	Anomali, CrowdStrike, Mandiant, Alienvault OTX, USCERT, etc
ti_indicator_id	String	Unique ID from the provider
ti_indicator_desc	String	Full Text descriptor and links of the Indicator and associated information
ti_date_added	UTC Timestamp	Date first added by the provider
ti_date_modified	UTC Timestamp	Date last updated by the provider.
ti_risk_impact	String	Function within the organization that is likely to be targeted
ti_severity	String	Nation State, Targeted, Advanced, Commodity, Other
ti_category	String	Ecrime, Hacktivism, Geopolitical, Foreign Intelligence Service
ti_campaign_name	String	Internal Campaign designation
ti_deployed_location	array (Comma separated)	Where this indicator should be matched for applicability (Core, Perimeter, Network, Endpoint, Logs, ALL, etc)
ti_associated_incidents	String	Known Associated Incident IDs
ti_adversarial_identification_group	String	Adversary Group designation usually provided by the provider.
ti_adversarial_identification_tactics	String	Known Adversary Tactics as indicated by the source provider.
ti_adversarial_identification_reports	String	Linked Adversary reports.
ti_phase	String	Discovery, Weaponization, Delivery, C2, Exploitation, Actions on Objectives, etc
ti_indicator_cve	String	MITRE CVE Link(s)
ti_indicator_ip4	array	IPv4 address(es) in CIDR notation indicated by Threat Intelligence
ti_indicator_ip6	array	IPv6 Address Indicated by Threat Intelligence
ti_indicator_domain	String	Domain Name(s)
ti_indicator_hostname	String	Host or Subdomain Name(s)
ti_indicator_email	array (Comma separated)	Email addresses associated with Indicator
ti_indicator_url	array (Comma separated)	URL(s) associated with indicator
ti_indicator_uri	array (Comma separated)	URI(s) associated with indicator
ti_indicator_file_hash	String	File Hash Value associated with the indicator.
ti_indicator_file_path	String	File Path Value associated with the indicator.
ti_indicator_mutex	String	MUTEX Value associated with the indicator.
ti_indicator_md5	String	MD5 Hash Sum Value
ti_indicator_sha1	String	SHA1 Hash Sum Value
ti_indicator_sha256	String	SHA256 Hash Sum Value
ti_indicator_device_path	String	Device Path Value associated with the indicator.
ti_indicator_drive	String	Drive Value associated with the indicator.
ti_indicator_file_name	String	File Name Value associated with the indicator.
ti_indicator_file_extension	String	File Extension Value associated with the indicator.
ti_indicator_file_size	String	File Size Value associated with the indicator.
ti_indicator_file_created	bigint	Date File value associated with the indicator was created.
ti_indicator_file_accessed	bigint	Date File value associated with the indicator was last accessed.
ti_indicator_file_changed	bigint	Date File value associated with the indicator was last changed.
ti_indicator_file_entropy	String	Calculated entropy value associated with the file indicated.
ti_indicator_file_attributes	array (Comma separated)	Read Only, System, Hidden, Directory, Archive, Device, Temporary, SparseFile, Compressed, Encrypted, Index, Deleted, etc
ti_indicator_user_name	String	username associated with the indicator.
ti_indicator_security_id	String	Security ID associated with the indicator, if known.
ti_indicator_pe_info	array (Comma separated)	Subsystem, BaseAddress, PETimeStamp, Expert, JumpCodes, DetectedAnomalies, DigitalSignatures, VersionInfo, ResourceInfo, Imported Modules
ti_indicator_pe_type	array (Comma separated)	Executable, DLL, Invalid, Unknown, Native, Windows_GUI, OS2, POSIX, EFI, etc
ti_indicator_strings	array (Comma separated)	Any strings associated with the file indicated that might be useful in identification or further indicator development or adversary identification.
ti_indicator_org	String	Name of the business that owns the IP address associated with the indicator
ti_indicator_reg_name	String	Name of the person who registered the domain
ti_indicator_reg_email	String	Email address of the person who registered the domain
ti_indicator_reg_org	String	Name of the organisation that registered the domain
ti_indicator_reg_phone	String	Phone number associated with the domain registered
ti_tags	String	Additional comments/associations from the feed
ti_threat_type	String	malware, compromised, apt, c2, etc...

Extensibility of Data Models

The aforementioned data model can be extended to accommodate custom attributes by embedding key-value pairs within the log/alert/context entries.

Each model supports an additional attribute named additional_attrs, whose value is a JSON string. This JSON string contains a Map (and only a Map) of additional attributes that cannot be expressed in the specified model description. Regardless of their original type, these additional attributes are always interpreted as String. It is up to the user to translate them to the appropriate types, if necessary, in the analytics layer. It is also the user's responsibility to populate additional_attrs as a Map, presumably by parsing these attributes out of the original message.

For example, if a user wanted to extend the user context model with string attributes for "Desk Location" and "City", additional_attrs would be set to the following string:

Attribute key	Attribute value
additional_attrs	{"dsk_location":"B3-F2-W3", "city":"Palo Alto"}

The same approach applies to the endpoint context model, the security event log/alert model and the other entities.

Note: This UDF library can be used for converting to/from JSON.
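
The round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not part of Spot: the helper names pack_additional_attrs and unpack_additional_attrs, the casts parameter and the "floor" attribute are all hypothetical. It shows values being stored as strings in a flat JSON map and cast back to other types only in the analytics layer.

```python
import json

def pack_additional_attrs(attrs):
    """Serialize custom attributes as a flat JSON map whose values are strings."""
    if not isinstance(attrs, dict):
        raise TypeError("additional_attrs must be built from a map")
    return json.dumps({str(k): str(v) for k, v in attrs.items()}, sort_keys=True)

def unpack_additional_attrs(raw, casts=None):
    """Parse the JSON string; cast selected keys, leave the rest as strings."""
    parsed = json.loads(raw) if raw else {}
    if not isinstance(parsed, dict):
        raise ValueError("additional_attrs must contain a JSON map")
    casts = casts or {}
    return {k: casts.get(k, str)(v) for k, v in parsed.items()}

# "floor" is a hypothetical numeric attribute added to show the cast.
raw = pack_additional_attrs(
    {"dsk_location": "B3-F2-W3", "city": "Palo Alto", "floor": 2}
)
attrs = unpack_additional_attrs(raw, casts={"floor": int})
print(raw)
print(attrs)
```

Keeping every stored value a string matches the rule stated above; the cast table lives with the analytic that needs it rather than in the data model.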

Model Relationships

The relationships between the data model entities are illustrated below.

Data Formats

The following data formats are recommended for use with the Spot open data model.

Avro

Avro is the recommended data format due to its schema representation, compatibility checks, and interoperability with Hadoop. Avro supports a pure JSON representation for readability and ease of use but also a binary representation of the data for efficient storage. Avro is the optimal format for streaming-based analytic use cases.

A sample event and corresponding schema representation are detailed below.

{
"event_time":1469562994,
"net_src_ip4":"192.168.1.1",
"net_src_host":"test1.companyA.com",
"net_src_port":1029,
"net_dst_ip4":"192.168.21.22",
"net_dst_host":"test3.companyB.com",
"net_dst_port":443,
"dvc_type":"sshd",
"category":"auth",
"a_proto":"sshd",
"msg":"user:jsmith successfully logged in to test3.companyB.com from 192.168.1.1",
"user_name":"jsmith",
"severity":3
}

Schema

{
"type": "record",
"doc":"This event records SSHD activity",
"name": "auth",
"fields" :
[
{"name":"event_time", "type":"long", "doc":"Stop time of event"},
{"name":"net_src_ip4", "type":"long", "doc":"Source IP Address"},
{"name":"net_src_host", "type":"string","doc":"Source hostname"},
{"name":"net_src_port", "type":"int","doc":"Source port"},
{"name":"net_dst_ip4", "type":"long", "doc":"Destination IP Address"},
{"name":"net_dst_host", "type":"string", "doc":"Destination hostname"},
{"name":"net_dst_port", "type":"int", "doc":"Destination port"},
{"name":"dvc_type", "type":"string", "doc":"Source device type"},
{"name":"category", "type":"string","doc":"category/type of event message"},
{"name":"a_proto", "type":"string","doc":"Application or network protocol"},
{"name":"msg", "type":"string","doc":"event message"},
{"name":"user_name", "type":"string","doc":"user name associated with the event"},
{"name":"severity", "type":"int","doc":"severity of event on scale of 1-10"}
]
}
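
Note that the schema declares the IPv4 fields as long, while the sample event shows them in dotted-quad form. The sketch below, which assumes a reduced copy of the schema and uses only the Python standard library, shows one way to check a record against the declared primitive types and to convert a dotted-quad address to its integer form. The names SCHEMA, AVRO_TYPES, ip4_to_long and validate are illustrative; a real pipeline would rely on an Avro library for validation and serialization.

```python
import ipaddress
import json

# Reduced, illustrative subset of the schema above (not the full record).
SCHEMA = json.loads("""
{"type": "record", "name": "auth", "fields": [
  {"name": "event_time",   "type": "long"},
  {"name": "net_src_ip4",  "type": "long"},
  {"name": "net_src_host", "type": "string"},
  {"name": "net_src_port", "type": "int"},
  {"name": "severity",     "type": "int"}
]}
""")

# Python types accepted for each Avro primitive used in this sketch.
AVRO_TYPES = {"long": int, "int": int, "string": str}

def ip4_to_long(address):
    """Convert a dotted-quad IPv4 address to its integer representation."""
    return int(ipaddress.IPv4Address(address))

def validate(record, schema):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for field in schema["fields"]:
        name, avro_type = field["name"], field["type"]
        if name not in record:
            problems.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, AVRO_TYPES[avro_type]):
            problems.append(f"{name}: expected {avro_type}")
    return problems

event = {
    "event_time": 1469562994,
    "net_src_ip4": "192.168.1.1",
    "net_src_host": "test1.companyA.com",
    "net_src_port": 1029,
    "severity": 3,
}
before = validate(event, SCHEMA)   # the dotted-quad string is flagged
event["net_src_ip4"] = ip4_to_long(event["net_src_ip4"])
after = validate(event, SCHEMA)    # no problems once converted
print(before, after)
```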

Parquet

Parquet is a columnar storage format that offers the benefits of compression and efficient columnar data representation and is optimal for batch analytic use cases. More information can be found in the Apache Parquet documentation.

Conversion from Avro to Parquet is supported, so data collected and analyzed for stream-based use cases can easily be converted to Parquet for longer-term batch analytics.

ODM Resultant Capability - A Singular View

The resultant capability provided by the Spot ODM is the ability to bring together all the security-relevant data from the entities referenced (event, user, network, endpoint, etc.) into a singular view that can be used to detect threats more effectively than ever before. The singular view can be leveraged to create new analytic models that were not previously possible and to provide needed context at the event level to effectively determine whether or not there is a threat.

Example - Advanced Threat Modeling

In this example, the ODM is leveraged to build an "event" table for a threat model that uses attributes native to the ODM as well as derived attributes, which are calculated from the aggregate data stored in the model. In this context, an "event" table is defined by the attributes to be evaluated for predictive power in identifying threats and by the actual attribute values (i.e., the rows in the table). In the example below, the event table is composed of the following attributes, which are then leveraged to identify threats via a Risk Score analytic model:

"net_src_ip4" - This attribute is native to the security event log component of the ODM and represents the source IP address of the corresponding table row
"os" - This attribute is native to the endpoint context component of the ODM and represents the operating system of the endpoint system in the table row
SUM (in_bytes + out_bytes) for the last 7 days - "in_bytes" and "out_bytes" are native to the security event log component of the ODM. This derived attribute represents a summation of bytes between the source address and destination domain over the last 7 days
"net_dst_domain" - This attribute is native to the security event log component of the ODM and represents the destination domain
Days since "creation_date" - "creation_date" is native to the network context component of the ODM and represents the date the referenced domain was registered. This derived attribute calculates the days since the domain was created/registered.

net_src_ip4	os	net_dst_domain	Days since "creation_date"	SUM (in_bytes + out_bytes)	Risk Score (1-100)
10.1.1.10	Microsoft	dajdkwk.com	39	3021 MB	99
192.168.8.9	Redhat	usatoday.com	3027	2 MB	2
172.16.32.3	Apple	box.com	1532	76 MB	10
192.168.4.4	Microsoft	kzjkeljr.ru	3	0.9 MB	92

The "Risk Score" attribute represents potential output from a threat detection model based on the attributes and values represented in the "event" table and is provided as an example of what is enabled by the ODM. Can you tell which attributes and values hold predictive power for threat detection?
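
The two derived attributes described in this example can be computed along the following lines. This is a sketch under stated assumptions: the events are hypothetical in-memory records, the source address key is spelled net_src_ip4 as in the sample Avro event, the reference date is chosen arbitrarily, and the function names are illustrative. The Risk Score model itself is not shown.

```python
from datetime import date, datetime, timedelta, timezone

def bytes_last_7_days(events, src_ip, dst_domain, now):
    """SUM(in_bytes + out_bytes) between one source and one domain over 7 days."""
    cutoff = now - timedelta(days=7)
    total = 0
    for e in events:
        when = datetime.fromtimestamp(e["event_time"], tz=timezone.utc)
        if (e["net_src_ip4"] == src_ip and e["net_dst_domain"] == dst_domain
                and cutoff <= when <= now):
            total += e["in_bytes"] + e["out_bytes"]
    return total

def days_since_creation(creation_date, today):
    """Days elapsed since the domain's ISO-formatted registration date."""
    return (today - date.fromisoformat(creation_date)).days

def ts(year, month, day):
    """Helper for the sample data: midnight UTC as a Unix timestamp."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

# Hypothetical events; only the attribute names come from the ODM.
events = [
    {"event_time": ts(2016, 8, 20), "net_src_ip4": "192.168.4.4",
     "net_dst_domain": "kzjkeljr.ru", "in_bytes": 4000000, "out_bytes": 1000000},
    {"event_time": ts(2016, 8, 28), "net_src_ip4": "192.168.4.4",
     "net_dst_domain": "kzjkeljr.ru", "in_bytes": 400000, "out_bytes": 100000},
    {"event_time": ts(2016, 9, 1), "net_src_ip4": "192.168.4.4",
     "net_dst_domain": "kzjkeljr.ru", "in_bytes": 300000, "out_bytes": 100000},
    {"event_time": ts(2016, 9, 1), "net_src_ip4": "192.168.4.4",
     "net_dst_domain": "box.com", "in_bytes": 10, "out_bytes": 10},
]

now = datetime(2016, 9, 2, tzinfo=timezone.utc)   # assumed reference time
total_bytes = bytes_last_7_days(events, "192.168.4.4", "kzjkeljr.ru", now)
age_days = days_since_creation("2016-08-30", now.date())
print(total_bytes, age_days)
```

In a Hadoop deployment the same aggregation would more likely be expressed as a query over the event and network context tables; the Python form is used here only to make the definitions concrete.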

Example - Singular Data View for Complete Context

The table below demonstrates a logical, "denormalized" view of what is offered by the ODM. In this example, the raw DNS event is mapped to the ODM, which enriches it with the Endpoint and Network context needed to make a proper threat determination. For large datasets, the databases on which legacy security analytic technologies are built cannot provide this type of view with reasonable performance. However, this singular, denormalized data representation is feasible with Spot.

RAW DNS EVENT

1463702961,169,10.0.0.101,172.16.36.157,www.kzjkeljr.ru,1,0x00000001,0,49.52.46.49

DNS EVENT + ODM

ODM Attribute	Value	Description	ODM Context Attributes
event_time	1463702961	UTC timestamp of DNS query
length	169	DNS Frame length
net_dst_ip4	10.0.0.101	Destination address (DNS server)	Endpoint Context os="Redhat 6.3" host="dns.companyA.com" mac="94:94:26:3:86:16" departm="IT" regs="PCI" vulns="CVE-123, CVE-456,..." ….
net_src_ip4	172.16.36.157	Source address (DNS query initiator)	Endpoint Context os="Microsoft Windows 7" host="jsmith.companyA.com" mac="94:94:26:3:86:17" departm="FCE" regs="Corporate" apps="Office 365, Visio 12.2, Chrome 52.0.3…." vulns="CVE-123, CVE-456,..." ….
dns_query	www.kzjkeljr.ru	DNS query	Network Context domain_name="kzjkeljr.ru" creation_date="2016-08-30" registrar_registration_expiration_date="2016-09-30" registration_country="Russia" ….
dns_class	1	DNS query class
dns_code	0x00000001	DNS response code
dns_answer	49.52.46.49	A record, DNS query response
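
The mapping and enrichment shown in the table can be sketched as follows. Several things here are assumptions: the positional field order is inferred from the table, the raw line carries nine comma-separated fields while the table lists eight attributes (so one position is left unmapped rather than guessed), the context lookups are hypothetical in-memory dictionaries keyed by the addresses in the raw event, and the registered domain is taken naively as the last two labels of the query name.

```python
import csv

# Positional mapping assumed from the table; None marks the unmapped field.
FIELD_ORDER = ["event_time", "length", "net_dst_ip4", "net_src_ip4",
               "dns_query", "dns_class", "dns_code", None, "dns_answer"]

def parse_dns_event(line):
    """Map one raw comma-separated DNS event onto ODM attribute names."""
    values = next(csv.reader([line]))
    if len(values) != len(FIELD_ORDER):
        raise ValueError(f"expected {len(FIELD_ORDER)} fields, got {len(values)}")
    event = {name: value for name, value in zip(FIELD_ORDER, values) if name}
    event["event_time"] = int(event["event_time"])
    event["length"] = int(event["length"])
    return event

def enrich(event, endpoint_context, network_context):
    """Attach endpoint and network context to produce a denormalized view."""
    enriched = dict(event)
    enriched["src_context"] = endpoint_context.get(event["net_src_ip4"], {})
    enriched["dst_context"] = endpoint_context.get(event["net_dst_ip4"], {})
    # Naive registered-domain extraction: keep the last two labels.
    domain = ".".join(event["dns_query"].split(".")[-2:])
    enriched["domain_context"] = network_context.get(domain, {})
    return enriched

RAW_EVENT = ("1463702961,169,10.0.0.101,172.16.36.157,"
             "www.kzjkeljr.ru,1,0x00000001,0,49.52.46.49")

# Hypothetical context stores populated with values from the table.
endpoint_context = {
    "10.0.0.101": {"os": "Redhat 6.3", "host": "dns.companyA.com"},
    "172.16.36.157": {"os": "Microsoft Windows 7", "host": "jsmith.companyA.com"},
}
network_context = {
    "kzjkeljr.ru": {"creation_date": "2016-08-30",
                    "registration_country": "Russia"},
}

event = parse_dns_event(RAW_EVENT)
view = enrich(event, endpoint_context, network_context)
print(view["dns_query"], view["domain_context"].get("creation_date"))
```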