ID | Technique | Tactic |
---|---|---|
T1562 | Impair Defenses | Defense Evasion |
Detection: M365 Copilot Agentic Jailbreak Attack
EXPERIMENTAL DETECTION
This detection status is set to experimental. The Splunk Threat Research team has not yet fully tested, simulated, or built comprehensive datasets for this detection. As such, this analytic is not officially supported. If you have any questions or concerns, please reach out to us at research@splunk.com.
Description
Detects agentic AI jailbreak attempts that try to establish persistent control over M365 Copilot through rule injection, universal triggers, response automation, system overrides, and persona establishment techniques. The detection analyzes the PromptText field for keywords like "from now on," "always respond," "ignore previous," "new rule," "override," and role-playing commands (e.g., "act as," "you are now") that attempt to inject persistent instructions. The search computes risk by counting distinct jailbreak indicators per user session, flagging coordinated manipulation attempts.
Search
1`m365_exported_ediscovery_prompt_logs`
2| eval user = Sender
3| eval rule_injection=if(match(Subject_Title, "(?i)(rules
4|instructions)\s*="), "YES", "NO")
5| eval universal_trigger=if(match(Subject_Title, "(?i)(every
6|all).*prompt"), "YES", "NO")
7| eval response_automation=if(match(Subject_Title, "(?i)(always
8|automatic).*respond"), "YES", "NO")
9| eval system_override=if(match(Subject_Title, "(?i)(override
10|bypass
11|ignore).*(system
12|default)"), "YES", "NO")
13| eval persona_establishment=if(match(Subject_Title, "(?i)(with.*\[.*\]
14|persona)"), "YES", "NO")
15| where rule_injection="YES" OR universal_trigger="YES" OR response_automation="YES" OR system_override="YES" OR persona_establishment="YES"
16| table _time, "Source ID", user, Subject_Title, rule_injection, universal_trigger, response_automation, system_override, persona_establishment, Workload
17| sort -_time
18| `m365_copilot_agentic_jailbreak_attack_filter`
Data Source
Name | Platform | Sourcetype | Source |
---|---|---|---|
M365 Exported eDiscovery Prompts | N/A | 'csv' |
'csv' |
Macros Used
Name | Value |
---|---|
m365_exported_ediscovery_prompt_logs | (sourcetype=csv) |
m365_copilot_agentic_jailbreak_attack_filter | search * |
m365_copilot_agentic_jailbreak_attack_filter
is an empty macro by default. It allows the user to filter out any results (false positives) without editing the SPL.
Annotations
Default Configuration
This detection is configured by default in Splunk Enterprise Security to run with the following settings:
Setting | Value |
---|---|
Disabled | true |
Cron Schedule | 0 * * * * |
Earliest Time | -70m@m |
Latest Time | -10m@m |
Schedule Window | auto |
Creates Risk Event | True |
Implementation
To export M365 Copilot prompt logs, navigate to the Microsoft Purview compliance portal (compliance.microsoft.com) and access eDiscovery. Create a new eDiscovery case, add target user accounts or date ranges as data sources, then create a search query targeting M365 Copilot interactions across relevant workloads. Once the search completes, export the results to generate a package containing prompt logs with fields like Subject_Title (prompt text), Sender, timestamps, and workload metadata. Download the exported files using the eDiscovery Export Tool and ingest them into Splunk for security analysis and detection of jailbreak attempts, data exfiltration requests, and policy violations.
Known False Positives
Legitimate users discussing AI ethics research, security professionals testing system robustness, developers creating training materials for AI safety, or academic discussions about AI limitations and behavioral constraints may trigger false positives.
Associated Analytic Story
Risk Based Analytics (RBA)
Risk Message:
User $user$ attempted to establish persistent agentic control over M365 Copilot through advanced jailbreak techniques including rule injection, universal triggers, and system overrides, potentially compromising AI security across multiple sessions.
Risk Object | Risk Object Type | Risk Score | Threat Objects |
---|---|---|---|
user | user | 50 | No Threat Objects |
References
Detection Testing
Test Type | Status | Dataset | Source | Sourcetype |
---|---|---|---|---|
Validation | Not Applicable | N/A | N/A | N/A |
Unit | ✅ Passing | Dataset | csv |
csv |
Integration | ✅ Passing | Dataset | csv |
csv |
Replay any dataset to Splunk Enterprise by using our replay.py
tool or the UI.
Alternatively you can replay a dataset into a Splunk Attack Range
Source: GitHub | Version: 1