Detection: LLM Model File Creation

Description

Detects the creation of Large Language Model (LLM) files on Windows endpoints by monitoring file creation events for model file formats commonly used by local AI frameworks. The search matches file names containing .gguf, ggml, safetensors, or Modelfile, which covers quantized models (.gguf, .ggml), safetensors model files, and Ollama Modelfiles. These file types are characteristic of local inference frameworks such as Ollama, llama.cpp, GPT4All, LM Studio, and similar tools that run LLMs locally without cloud dependencies. Organizations can use this detection to surface shadow AI deployments, unauthorized model downloads, and rogue LLM infrastructure, and to identify data exfiltration risks, policy violations from unapproved AI usage, and security blind spots created by decentralized AI deployments that bypass enterprise governance and monitoring.

| tstats `security_content_summariesonly` count
    min(_time) as firstTime
    max(_time) as lastTime
from datamodel=Endpoint.Filesystem
where Filesystem.file_name IN (
    "*.gguf*",
    "*ggml*",
    "*Modelfile*",
    "*safetensors*"
)
by Filesystem.action Filesystem.dest Filesystem.file_access_time Filesystem.file_create_time
   Filesystem.file_hash Filesystem.file_modify_time Filesystem.file_name Filesystem.file_path
   Filesystem.file_acl Filesystem.file_size Filesystem.process_guid Filesystem.process_id
   Filesystem.user Filesystem.vendor_product
| `drop_dm_object_name(Filesystem)`
| `security_content_ctime(firstTime)`
| `security_content_ctime(lastTime)`
| `llm_model_file_creation_filter`

Data Source

Name               Platform  Sourcetype        Source
Sysmon EventID 11  Windows   'XmlWinEventLog'  'XmlWinEventLog:Microsoft-Windows-Sysmon/Operational'

Macros Used

Name                            Value
security_content_ctime          convert timeformat="%Y-%m-%dT%H:%M:%S" ctime($field$)
llm_model_file_creation_filter  search *

llm_model_file_creation_filter is an empty macro by default. It allows the user to filter out any results (false positives) without editing the SPL.
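
As an illustration of tuning, the value of the filter macro could be changed from `search *` to an exclusion such as the one below. This is only a sketch: the host pattern `ml-lab-*` and the account `svc_mlops` are hypothetical placeholders, not values taken from this detection. The field names `dest` and `user` are the ones available after `drop_dm_object_name(Filesystem)` has stripped the data model prefix.

```spl
search NOT (dest="ml-lab-*" OR user="svc_mlops")
```

Keeping exclusions in the macro means local tuning stays separate from the detection's own SPL.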

Annotations

MITRE ATT&CK
ID     Technique                        Tactic
T1543  Create or Modify System Process  Persistence

Kill Chain Phases: Exploitation, Installation
NIST: DE.AE
CIS: CIS 10
Threat Actors: none listed

Default Configuration

This detection is configured by default in Splunk Enterprise Security to run with the following settings:

Setting             Value
Disabled            true
Cron Schedule       0 * * * *
Earliest Time       -70m@m
Latest Time         -10m@m
Schedule Window     auto
Creates Risk Event  False

This configuration file applies to all detections of type hunting.
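
With these settings, each hourly run covers a 60-minute window that ends 10 minutes before the run starts. To preview what a scheduled run might return, the same window can be applied in an ad hoc search. The following is a trimmed sketch, assuming the Endpoint data model is populated. It reuses the file name patterns from the detection but groups by fewer fields for readability, so it is not a substitute for the detection itself.

```spl
| tstats count min(_time) as firstTime max(_time) as lastTime
from datamodel=Endpoint.Filesystem
where earliest=-70m@m latest=-10m@m
    Filesystem.file_name IN ("*.gguf*", "*ggml*", "*Modelfile*", "*safetensors*")
by Filesystem.dest Filesystem.user Filesystem.file_name Filesystem.file_path
| `drop_dm_object_name(Filesystem)`
```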

Implementation

To successfully implement this search, you need to be ingesting logs with file creation events from your endpoints. Ensure that the Endpoint data model is properly populated with filesystem events from EDR agents or Sysmon Event ID 11. The logs must be processed using the appropriate Splunk Technology Add-ons that are specific to the EDR product. The logs must also be mapped to the Filesystem node of the Endpoint data model. Use the Splunk Common Information Model (CIM) to normalize the field names and speed up the data modeling process.
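
One quick way to check that the Filesystem node is receiving events is to count them by product and action. This is a minimal sketch, assuming the Endpoint data model is installed. An empty result suggests that the add-on or the data model mapping still needs attention.

```spl
| tstats count from datamodel=Endpoint.Filesystem
by Filesystem.vendor_product Filesystem.action
| `drop_dm_object_name(Filesystem)`
| sort - count
```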

Known False Positives

False positives can come from several legitimate sources: authorized developers, ML engineers, and researchers creating LLM model files during model training, fine-tuning, or experimentation; approved AI/ML sandboxes and lab environments where model file creation is expected; automated ML pipelines and workflows that generate or update model files as part of normal operation; and third-party applications and services that manage or cache LLM model files for legitimate purposes.

Associated Analytic Story

References

Detection Testing

Test Type    Status   Dataset  Source                                               Sourcetype
Validation   Passing  N/A      N/A                                                  N/A
Unit         Passing  Dataset  XmlWinEventLog:Microsoft-Windows-Sysmon/Operational  XmlWinEventLog
Integration  Passing  Dataset  XmlWinEventLog:Microsoft-Windows-Sysmon/Operational  XmlWinEventLog

Replay any dataset to Splunk Enterprise by using our replay.py tool or the UI. Alternatively, you can replay a dataset into a Splunk Attack Range.


Source: GitHub | Version: 1