For this post, we are diving into data sources: looking at sourcetypes and fields so you can do better threat hunting instead of just wildcarding your way to an answer (let’s be honest, we have all done it). ‘Prepare’ is one of the most overlooked, yet most important, stages of threat hunting, and we will talk about what you can do to set yourself up for success.
Why bother preparing?
Would a world-class chef start cooking without an ingredient list? Does a closing pitcher take the mound without throwing a few warm-up pitches? Will an astronaut go to space without knowing all the planets? Of course not! So you shouldn’t start thrunting without learning about your data and creating a plan of approach.
Preparing sets the stage for effective hypothesis generation and a targeted investigation. It allows you to be precise and know what matters, and what doesn’t.
For this blog, we are going to talk about using Splunk, which uses the Search Processing Language (SPL), but if you’re hunting on a different platform, be sure to refer to the reference documentation provided with your tool. All these ideas translate across tools!
Let’s start exploring!
Err… the sourcetypes and fields. It’s not quite time to start thrunting yet.
With SPL, both the metadata and fieldsummary commands can help you get a good understanding of what is available for searching and manipulating.
metadata
The metadata command is a powerful tool that retrieves summary information about the data in a Splunk index. By running this command against any index (for example: | metadata type=sourcetypes index=thrunt), we can quickly identify the different types of data sources (sourcetypes) available. For each sourcetype, it reports how many events were recorded (totalCount) and when they were first and last seen (firstTime and lastTime). This overview provides insight into the volume and type of data, setting the stage for more targeted threat hunting activities.
Did you run the query?
(I’ll wait)
You might be saying, well Lauren, this is great and all, but half these fields are not in a human-readable format. And you would be right. metadata returns its time fields (firstTime, lastTime, and recentTime) in epoch time, the number of seconds since January 1, 1970 UTC. That’s helpful if you are a computer, but if you are a human, here’s a fun trick to make those times more friendly.
| metadata type=sourcetypes index=thrunt
| eval firstTime_readable=strftime(firstTime, "%Y-%m-%d %H:%M:%S"),
lastTime_readable=strftime(lastTime, "%Y-%m-%d %H:%M:%S"),
recentTime_readable=strftime(recentTime, "%Y-%m-%d %H:%M:%S")
By using eval with the strftime function, we can transform epoch time fields into human-readable formats. This eval trick works on top of any search that returns epoch timestamps, so don’t limit yourself to stacking it with metadata.
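If it helps to see what those numbers mean, here is a minimal Python sketch, written outside Splunk purely for illustration. It assumes a hypothetical list of (sourcetype, epoch_seconds) events and builds per-sourcetype totalCount, firstTime, and lastTime values similar to what metadata reports, then applies the same "%Y-%m-%d %H:%M:%S" format used in the eval above. The helper names (to_readable, summarize_sourcetypes) are hypothetical. It is only an approximation: metadata reads index metadata rather than scanning events like this, recentTime is left out for brevity, and the sketch renders times in UTC, whereas Splunk's strftime typically uses the time zone set in your user preferences.

```python
from datetime import datetime, timezone

# Same format string as the SPL eval/strftime example.
READABLE_FMT = "%Y-%m-%d %H:%M:%S"


def to_readable(epoch_seconds: float) -> str:
    """Hypothetical helper: epoch seconds -> readable string, rendered in UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime(READABLE_FMT)


def summarize_sourcetypes(events):
    """Hypothetical helper that roughly mimics metadata's per-sourcetype
    totalCount/firstTime/lastTime by scanning (sourcetype, epoch) tuples.
    Splunk itself reads index metadata instead of counting events."""
    summary = {}
    for sourcetype, ts in events:
        row = summary.setdefault(
            sourcetype, {"totalCount": 0, "firstTime": ts, "lastTime": ts}
        )
        row["totalCount"] += 1
        row["firstTime"] = min(row["firstTime"], ts)
        row["lastTime"] = max(row["lastTime"], ts)
    # Add readable copies, like the *_readable fields created with eval.
    for row in summary.values():
        row["firstTime_readable"] = to_readable(row["firstTime"])
        row["lastTime_readable"] = to_readable(row["lastTime"])
    return summary


if __name__ == "__main__":
    sample = [("web_logs", 0), ("web_logs", 86400), ("dns", 3600)]
    for name, row in summarize_sourcetypes(sample).items():
        print(name, row["totalCount"], row["firstTime_readable"], row["lastTime_readable"])
```

Keeping the raw epoch values alongside the readable copies mirrors the SPL approach: the eval adds new fields rather than overwriting firstTime and lastTime, so you can still sort or do math on the originals.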
fieldsummary
The fieldsummary command calculates summary statistics for every field in your search results. Scope the search to one of the sourcetypes found by your metadata search to get even more granular.
index=thrunt sourcetype=web_logs | fieldsummary maxvals=3
This search lists each field in the web_logs sourcetype with statistics such as count and distinct_count, along with up to three of its most common values (the maxvals limit), helping you understand the structure of the data.
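To make those statistics concrete, here is a rough Python model of the kind of per-field summary fieldsummary produces. The events are a hypothetical list of dictionaries, one per event, and field_summary is a hypothetical helper. Only count, distinct_count, and the top maxvals values are modeled; the real command also reports numeric statistics such as min, max, mean, and stdev, and formats its values column differently.

```python
from collections import Counter


def field_summary(events, maxvals=3):
    """Hypothetical helper: per-field count, distinct_count, and the
    `maxvals` most common values (with their counts) across events."""
    counters = {}
    for event in events:
        for field, value in event.items():
            # One Counter per field, tallying how often each value appears.
            counters.setdefault(field, Counter())[value] += 1
    summary = {}
    for field, counter in counters.items():
        summary[field] = {
            "count": sum(counter.values()),       # events that contain the field
            "distinct_count": len(counter),       # unique values seen
            "values": counter.most_common(maxvals),
        }
    return summary


if __name__ == "__main__":
    web_logs = [
        {"status": "200", "uri": "/"},
        {"status": "200", "uri": "/login"},
        {"status": "404"},
        {"status": "200", "uri": "/"},
    ]
    for field, stats in field_summary(web_logs).items():
        print(field, stats)
```

Note that a field's count can be lower than the number of events: in the sample above, uri is missing from one event, which is exactly the kind of gap worth spotting before you build a hunt around that field.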
Amazing! Now you know all the sourcetypes and fields available to you. You are almost ready!
What’s Next?
Before we dive into advanced scenarios, we are going to talk about building a relevant, precise hypothesis, another crucial part of the Prepare phase. This will help you set up guardrails for your hunt and find success more easily. Having the right hypothesis can help you go from looking for a needle in a haystack (or an APT in your environment) to looking for a needle in a sewing kit (or beaconing activity on your web server).
Ready to take the next step? Stay tuned for Part 3: Building Better Hypotheses, coming soon!