Detect DGA domains using pretrained model in DSDL
THIS IS AN EXPERIMENTAL DETECTION
This detection has been marked experimental by the Splunk Threat Research team. This means we have not been able to test, simulate, or build datasets for this detection. Use at your own risk. This analytic is NOT supported.
Description
The following analytic uses a pretrained deep learning model to detect domains generated by a Domain Generation Algorithm (DGA). The model is trained independently and then made available for download. A prominent indicator of a DGA-generated domain is a name made of unusual character sequences or of concatenated dictionary words. Adversaries often use clever techniques to make machine-generated domain names look human-generated, so predicting DGA-generated domains requires analysis and a model built on carefully chosen features. The deep learning model we have developed analyzes patterns of character sequences in the domain name, along with carefully chosen custom features, to predict whether a domain is DGA-generated. The model takes as input a domain name consisting of the second-level and top-level domains (for example, example.com) and outputs a dga_score. The higher the dga_score, the more likely the input domain is a DGA domain. A domain is flagged as DGA when its dga_score exceeds the threshold of 0.5.
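The input format and decision rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the model's actual preprocessing. The helper names are hypothetical. to_model_input naively keeps the last two labels of a fully qualified name, which is wrong for multi-label public suffixes such as co.uk. flag_dga only applies the 0.5 threshold to a score that the model would produce.

```python
# Hypothetical helpers for illustration; not part of the DSDL app or the model.
DGA_THRESHOLD = 0.5  # threshold stated in the description


def to_model_input(fqdn: str) -> str:
    """Reduce a fully qualified domain name to 'second-level.top-level'.

    Assumption: the last two labels are the SLD and TLD. This is wrong for
    multi-label public suffixes such as 'example.co.uk'.
    """
    labels = [label for label in fqdn.strip().lower().rstrip(".").split(".") if label]
    return ".".join(labels[-2:])


def flag_dga(dga_score: float, threshold: float = DGA_THRESHOLD) -> bool:
    """True when the score is strictly above the threshold (the search uses dga_score>0.5)."""
    return dga_score > threshold


if __name__ == "__main__":
    print(to_model_input("www.Example.com."))  # example.com
    print(flag_dga(0.91), flag_dga(0.5))  # True False
```

A score of exactly 0.5 is not flagged, which matches the strict comparison in the search.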
- Type: Anomaly
- Product: Splunk Enterprise, Splunk Enterprise Security, Splunk Cloud
- Datamodel: Network_Resolution
- Last Updated: 2023-01-18
- Author: Abhinav Mishra, Kumar Sharad and Namratha Sreekanta, Splunk
- ID: 92e24f32-9b9a-4060-bba2-2a0eb31f3493
Annotations
Kill Chain Phase
- Command & Control
NIST
- PR.DS
- PR.PT
- DE.AE
- DE.CM
CIS20
- CIS 8
- CIS 12
- CIS 13
CVE
Search
| tstats `security_content_summariesonly` values(DNS.answer) as IPs min(_time) as firstTime max(_time) as lastTime from datamodel=Network_Resolution by DNS.src, DNS.query
| `drop_dm_object_name(DNS)`
| rename query AS domain
| fields IPs, src, domain, firstTime, lastTime
| apply pretrained_dga_model_dsdl
| rename pred_dga_proba AS dga_score
| where dga_score>0.5
| `security_content_ctime(firstTime)`
| `security_content_ctime(lastTime)`
| table src, domain, IPs, firstTime, lastTime, dga_score
| `detect_dga_domains_using_pretrained_model_in_dsdl_filter`
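The search's logic can be approximated in plain Python to show what each stage does. This is a sketch under assumptions, not how Splunk executes the search. The event dictionaries are a hypothetical stand-in for Network_Resolution data. score_domain is a hypothetical stand-in for the DSDL model, which the search reads back as pred_dga_proba. The time-formatting and filter macros are omitted.

```python
from collections import defaultdict
from typing import Callable, Iterable


def detect_dga(
    events: Iterable[dict],
    score_domain: Callable[[str], float],
    threshold: float = 0.5,
) -> list[dict]:
    """Approximate the detection search over hypothetical DNS event dicts.

    Each event needs 'src', 'query', 'answer' and '_time' (epoch seconds).
    score_domain is a hypothetical stand-in for the pretrained model.
    """
    # tstats values(DNS.answer) as IPs min(_time) max(_time) ... by DNS.src, DNS.query
    groups = defaultdict(lambda: {"IPs": set(), "firstTime": None, "lastTime": None})
    for event in events:
        group = groups[(event["src"], event["query"])]
        group["IPs"].add(event["answer"])
        t = event["_time"]
        group["firstTime"] = t if group["firstTime"] is None else min(group["firstTime"], t)
        group["lastTime"] = t if group["lastTime"] is None else max(group["lastTime"], t)

    results = []
    for (src, domain), group in groups.items():  # rename query AS domain
        dga_score = score_domain(domain)  # apply ... | rename pred_dga_proba AS dga_score
        if dga_score > threshold:  # where dga_score>0.5
            results.append(
                {
                    "src": src,
                    "domain": domain,
                    # distinct values in lexicographic order, similar to values()
                    "IPs": sorted(group["IPs"]),
                    "firstTime": group["firstTime"],
                    "lastTime": group["lastTime"],
                    "dga_score": dga_score,
                }
            )
    return results


if __name__ == "__main__":
    sample = [
        {"src": "10.0.0.5", "query": "xkqjzvbw.com", "answer": "203.0.113.7", "_time": 1700000100},
        {"src": "10.0.0.5", "query": "xkqjzvbw.com", "answer": "203.0.113.7", "_time": 1700000000},
        {"src": "10.0.0.6", "query": "splunk.com", "answer": "198.51.100.1", "_time": 1700000050},
    ]
    # Toy scores for illustration only; NOT output of the pretrained model.
    toy_scores = {"xkqjzvbw.com": 0.97, "splunk.com": 0.02}
    for row in detect_dga(sample, toy_scores.get):
        print(row)
```

In this sketch, grouping by (src, domain) means that one host repeatedly resolving the same domain produces a single result row with its first and last seen times.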
Macros
The SPL above uses the following Macros:
detect_dga_domains_using_pretrained_model_in_dsdl_filter is an empty macro by default. It allows the user to filter out any results (false positives) without editing the SPL.
Required fields
List of fields required to use this analytic.
- IPs
- src
- domain
- firstTime
- lastTime
How To Implement
Steps to deploy the DGA detection model into the Splunk App for DSDL:
This detection depends on the Splunk App for Data Science and Deep Learning (DSDL), available at https://splunkbase.splunk.com/app/4607/, and on the Network Resolution data model from the Splunk Common Information Model add-on, available at https://splunkbase.splunk.com/app/1621/. The detection uses a pretrained deep learning model that must be deployed in the DSDL app. Follow the deployment steps at https://github.com/splunk/security_content/wiki/How-to-deploy-pre-trained-Deep-Learning-models-for-ESCU.
- Download the artifact .tar.gz file from https://seal.splunkresearch.com/pretrained_dga_model_dsdl.tar.gz
- Download the pretrained_dga_model_dsdl.ipynb Jupyter notebook from https://github.com/splunk/security_content/notebooks
- Log in to JupyterLab for the pretrained_dga_model_dsdl container. This container should be listed on the Containers page of the DSDL app.
- Follow the steps below inside JupyterLab:
  - Upload the pretrained_dga_model_dsdl.tar.gz file into the app/model/data path using the upload option in JupyterLab.
  - Untar the artifact pretrained_dga_model_dsdl.tar.gz using: tar -xf app/model/data/pretrained_dga_model_dsdl.tar.gz -C app/model/data
  - Upload pretrained_dga_model_dsdl.ipynb into the JupyterLab notebooks folder using the upload option in JupyterLab.
  - Save the notebook using the save option in JupyterLab.
  - Upload pretrained_dga_model_dsdl.json into the notebooks/data folder.
Known False Positives
False positives may be present if a legitimate domain name is similar to DGA-generated domains.
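A common way to handle such false positives is to suppress domains that have been reviewed and judged benign. This is the purpose of the detection's empty-by-default filter macro. The Python sketch below only illustrates the idea and is not the macro itself. BENIGN_DOMAINS and its entries are hypothetical examples, not a list shipped with the detection.

```python
from collections.abc import Collection

# Hypothetical allowlist of reviewed, benign domains (entries assumed lowercase).
BENIGN_DOMAINS = frozenset({"example-cdn.net", "corp.example"})


def is_allowlisted(domain: str, allowlist: Collection[str] = BENIGN_DOMAINS) -> bool:
    """True if the domain equals an allowlisted entry or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    return any(domain == entry or domain.endswith("." + entry) for entry in allowlist)


def suppress_false_positives(
    results: list[dict], allowlist: Collection[str] = BENIGN_DOMAINS
) -> list[dict]:
    """Drop detection rows whose 'domain' field is allowlisted."""
    return [row for row in results if not is_allowlisted(row["domain"], allowlist)]


if __name__ == "__main__":
    rows = [{"domain": "a1b2c3.example-cdn.net"}, {"domain": "qwzkxjv.com"}]
    print(suppress_false_positives(rows))  # [{'domain': 'qwzkxjv.com'}]
```

The sketch matches on a "." boundary, so a lookalike such as notexample-cdn.net is not accidentally suppressed.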
Associated Analytic Story
RBA
Risk Score | Impact | Confidence | Message |
---|---|---|---|
63.0 | 70 | 90 | A potential connection to a DGA domain $domain$ was detected from host $src$, kindly review. |
The Risk Score is calculated by the following formula: Risk Score = (Impact * Confidence/100). Initial Confidence and Impact are set by the analytic author.
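As a quick check of the formula against the table above, Impact 70 and Confidence 90 give 70 * 90 / 100 = 63.0. The function name below is illustrative only.

```python
def risk_score(impact: float, confidence: float) -> float:
    """Risk Score = Impact * Confidence / 100, as stated for this analytic."""
    return impact * confidence / 100


print(risk_score(70, 90))  # 63.0
```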
Reference
- https://attack.mitre.org/techniques/T1568/002/
- https://unit42.paloaltonetworks.com/threat-brief-understanding-domain-generation-algorithms-dga/
- https://en.wikipedia.org/wiki/Domain_generation_algorithm
Test Dataset
Replay any dataset to Splunk Enterprise by using our replay.py tool or the UI. Alternatively, you can replay a dataset into a Splunk Attack Range.
version: 1