THIS IS A DEPRECATED DETECTION
This detection has been marked deprecated by the Splunk Threat Research team. This means that it will no longer be maintained or supported.
This search is used to identify the creation of multiple user accounts using the same email domain name.
- Type: TTP
Product: Splunk Enterprise, Splunk Enterprise Security, Splunk Cloud
- Last Updated: 2018-10-08
- Author: Jim Apger, Splunk
- ID: bf1d7b5c-df2f-4249-a401-c09fdc221ddf
Kill Chain Phase
- Actions on Objectives
- CIS 16
1 2 3 4 5 6 7 8 9 `stream_http` http_content_type=text* uri="/magento2/customer/account/loginPost/" | rex field=cookie "form_key=(?<SessionID>\w+)" | rex field=form_data "login\[username\]=(?<Username>[^& |^$]+)" | search Username=* | rex field=Username "@(?<email_domain>.*)" | stats dc(Username) as UniqueUsernames list(Username) as src_user by email_domain | where UniqueUsernames> 25 | `web_fraud___account_harvesting_filter`
The SPL above uses the following Macros:
web_fraud_-_account_harvesting_filter is a empty macro by default. It allows the user to filter out any results (false positives) without editing the SPL.
List of fields required to use this analytic.
How To Implement
We start with a dataset that provides visibility into the email address used for the account creation. In this example, we are narrowing our search down to the single web page that hosts the Magento2 e-commerce platform (via URI) used for account creation, the single http content-type to grab only the user's clicks, and the http field that provides the username (form_data), for performance reasons. After we have the username and email domain, we look for numerous account creations per email domain. Common data sources used for this detection are customized Apache logs or Splunk Stream.
Known False Positives
As is common with many fraud-related searches, we are usually looking to attribute risk or synthesize relevant context with loosely written detections that simply detect anamolous behavior. This search will need to be customized to fit your environment—improving its fidelity by counting based on something much more specific, such as a device ID that may be present in your dataset. Consideration for whether the large number of registrations are occuring from a first-time seen domain may also be important. Extending the search window to look further back in time, or even calculating the average per hour/day for each email domain to look for an anomalous spikes, will improve this search. You can also use Shannon entropy or Levenshtein Distance (both courtesy of URL Toolbox) to consider the randomness or similarity of the email name or email domain, as the names are often machine-generated.
Associated Analytic Story
The Risk Score is calculated by the following formula: Risk Score = (Impact * Confidence/100). Initial Confidence and Impact is set by the analytic author.
source | version: 1