Detection: M365 Copilot Impersonation Jailbreak Attack

EXPERIMENTAL DETECTION

This detection status is set to experimental. The Splunk Threat Research team has not yet fully tested, simulated, or built comprehensive datasets for this detection. As such, this analytic is not officially supported. If you have any questions or concerns, please reach out to us at research@splunk.com.

Updated Date: 2025-09-25 ID: cc26aba8-7f4a-4078-b91a-052d6a53cb13 Author: Rod Soto Type: TTP Product: Splunk Enterprise Security

Description

Detects M365 Copilot impersonation and roleplay jailbreak attempts where users try to manipulate the AI into adopting alternate personas, behaving as unrestricted entities, or impersonating malicious AI systems to bypass safety controls. The detection searches exported eDiscovery prompt logs for roleplay keywords like "pretend you are," "act as," "you are now," "amoral," and "roleplay as" in the Subject_Title field. Prompts are categorized into specific impersonation types (AI_Impersonation, Malicious_AI_Persona, Unrestricted_AI_Persona, etc.) to identify attempts to override the AI's safety guardrails through persona injection attacks.

Search

 1`m365_exported_ediscovery_prompt_logs` 
 2| search Subject_Title="*Pretend you are*" OR Subject_Title="*act as*" OR Subject_Title="*you are now*" OR Subject_Title="*amoral*" OR Subject_Title="*being*" OR Subject_Title="*roleplay as*" OR Subject_Title="*imagine you are*" OR Subject_Title="*behave like*" 
 3| eval user = Sender 
 4| eval impersonation_type=case(match(Subject_Title, "(?i)pretend you are.*AI"), "AI_Impersonation", match(Subject_Title, "(?i)(act as
 5|roleplay as).*AI"), "AI_Roleplay", match(Subject_Title, "(?i)amoral.*AI"), "Amoral_AI", match(Subject_Title, "(?i)transcendent being"), "Fictional_Entity", match(Subject_Title, "(?i)(act as
 6|pretend you are).*(entities
 7|multiple)"), "Multi_Entity", match(Subject_Title, "(?i)(imagine you are
 8|behave like).*AI"), "AI_Behavioral_Change", match(Subject_Title, "(?i)you are now.*AI"), "AI_Identity_Override", match(Subject_Title, "(?i)(evil
 9|malicious
10|harmful).*AI"), "Malicious_AI_Persona", match(Subject_Title, "(?i)(unrestricted
11|unlimited
12|uncensored).*AI"), "Unrestricted_AI_Persona", 1=1, "Generic_Roleplay") 
13| table _time, user, Subject_Title, impersonation_type, Workload 
14| sort -_time 
15| `m365_copilot_impersonation_jailbreak_attack_filter`

Data Source

Name	Platform	Sourcetype	Source
M365 Exported eDiscovery Prompts	N/A	`'csv'`	`'csv'`

Macros Used

Name	Value
m365_exported_ediscovery_prompt_logs	`(sourcetype=csv)`
m365_copilot_impersonation_jailbreak_attack_filter	`search *`

m365_copilot_impersonation_jailbreak_attack_filter is an empty macro by default. It allows the user to filter out any results (false positives) without editing the SPL.

Annotations

- MITRE ATT&CK

+ Kill Chain Phases

+ NIST

+ CIS

- Threat Actors

ID	Technique	Tactic
T1562	Impair Defenses	Defense Evasion

Exploitation

DE.CM

CIS 10

Magic Hound

Default Configuration

This detection is configured by default in Splunk Enterprise Security to run with the following settings:

Setting	Value
Disabled	true
Cron Schedule	`0 * * * *`
Earliest Time	`-70m@m`
Latest Time	`-10m@m`
Schedule Window	`auto`
Creates Notable	Yes
Rule Title	`%name%`
Rule Description	`%description%`
Notable Event Fields	user, dest
Creates Risk Event	True

This configuration file applies to all detections of type TTP. These detections will use Risk Based Alerting and generate Notable Events.

Implementation

To export M365 Copilot prompt logs, navigate to the Microsoft Purview compliance portal (compliance.microsoft.com) and access eDiscovery. Create a new eDiscovery case, add target user accounts or date ranges as data sources, then create a search query targeting M365 Copilot interactions across relevant workloads. Once the search completes, export the results to generate a package containing prompt logs with fields like Subject_Title (prompt text), Sender, timestamps, and workload metadata. Download the exported files using the eDiscovery Export Tool and ingest them into Splunk for security analysis and detection of jailbreak attempts, data exfiltration requests, and policy violations.

Known False Positives

Legitimate creative writers developing fictional characters, game developers creating roleplay scenarios, educators teaching about AI ethics and limitations, researchers studying AI behavior, or users engaging in harmless creative storytelling may trigger false positives.

Associated Analytic Story

Suspicious Microsoft 365 Copilot Activities

Risk Based Analytics (RBA)

Risk Message:

User $user$ attempted M365 Copilot impersonation jailbreak with impersonation type $impersonation_type$, trying to manipulate the AI into adopting alternate personas or unrestricted behaviors that could bypass safety controls and violate acceptable use policies.

Risk Object	Risk Object Type	Risk Score	Threat Objects
user	user	10	No Threat Objects

References

https://www.splunk.com/en_us/blog/artificial-intelligence/m365-copilot-log-analysis-splunk.html

Detection Testing

Test Type	Status	Dataset	Source	Sourcetype
Validation	Not Applicable	N/A	N/A	N/A
Unit	✅ Passing	Dataset	`csv`	`csv`
Integration	✅ Passing	Dataset	`csv`	`csv`

Replay any dataset to Splunk Enterprise by using our replay.py tool or the UI. Alternatively you can replay a dataset into a Splunk Attack Range

Source: GitHub | Version: 1