VPC Flow Logs is a feature for monitoring network traffic within your AWS VPC. It captures metadata about each network flow (source, destination and so on), not the contents of the packets. If you need to inspect individual packets, AWS offers VPC Traffic Mirroring.
While Flow Logs can provide significant security intelligence, they can also produce a large amount of data. How do you gain insights from the Flow Log data? You could push the logs into an external SIEM-style tool, like Sumo Logic. However, if you don’t have such a tool, there is an AWS solution: Amazon Athena!
VPC Flow Logs & Amazon Athena
AWS recently announced out-of-the-box integration between VPC Flow Logs and Amazon Athena.
Amazon VPC Flow Logs announces out-of-the-box integration with Amazon Athena
Querying flow logs using Amazon Athena
The solution provides you with an Athena WorkGroup, a Table and pre-defined Queries, ready to go for analysing the Flow Logs.
Enabling VPC Flow Logs
The first step is to make sure that you are capturing your flow log data. You can publish VPC Flow Logs to:
- an S3 bucket; or
- a CloudWatch Logs log group.
Flow Logs use a default format, but you can also customise it, for example by adding fields that give further detail on the network flow. We recommend that you persist your flow log data in an S3 bucket.
Creating a flow log that publishes to Amazon S3
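To make the default format mentioned above more concrete, here is a minimal sketch that splits one default-format (version 2) record into named fields. The helper name parseFlowLogRecord and the sample record (with made-up account and interface IDs) are assumptions for illustration; a custom format would need its own field list.

```javascript
// Field order of the default (version 2) flow log format.
const DEFAULT_FIELDS = [
  "version", "account-id", "interface-id", "srcaddr", "dstaddr",
  "srcport", "dstport", "protocol", "packets", "bytes",
  "start", "end", "action", "log-status",
];

// Hypothetical helper: turn one space-separated record into an object.
function parseFlowLogRecord(line, fields = DEFAULT_FIELDS) {
  const values = line.trim().split(/\s+/);
  if (values.length !== fields.length) {
    throw new Error(`expected ${fields.length} fields, got ${values.length}`);
  }
  return Object.fromEntries(fields.map((name, i) => [name, values[i]]));
}

// Sample record with made-up account and interface IDs.
const sample =
  "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 " +
  "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK";
const record = parseFlowLogRecord(sample);
console.log(record.srcaddr, "->", record.dstaddr, record.action);
```

Note that the record only describes the flow (addresses, ports, byte counts, whether it was accepted), which is why Flow Logs cannot tell you what was inside the packets.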
Amazon Athena
Amazon Athena is a serverless query service for data stored in S3 buckets. Queries are written in standard SQL, while table definitions use Hive-style DDL.
Athena Integration
Once Flow Logs have been enabled for your VPC, it’s time to set up the Athena integration. First, navigate to the VPC console. Then select Generate Athena Integration from the Actions dropdown. These steps generate a CloudFormation template that you can use to create the integration.
You will now be presented with the Template settings screen.
- Partition load frequency. You have a choice to create partitions Daily, Monthly, Yearly or None;
- Partition start date and end date. The maximum date range you can enter here is 20 days but you can modify this later!
- You also need to enter the ARN of the S3 bucket where you want to store the CloudFormation template and the query results.
After clicking Generate Athena Integration you will see that the CloudFormation template is created in the S3 bucket that you specified.
Let’s download this template file.
Modifying the template
Before we deploy the stack, there are a few things you may want to update in this template.
WorkGroup, Database, Table, Query names
The template is created with very long random names, and you may want to change these to make them easier to understand.
Athena Engine Version
Athena engine version 1 will be deprecated in the near future, so you should modify the template to create the WorkGroup with version 2.
You need to add an engine version declaration in WorkGroupConfiguration.
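As a rough sketch of where the declaration goes, the snippet below writes the WorkGroup properties as a JavaScript object and prints them as JSON. It assumes the CloudFormation AWS::Athena::WorkGroup shape, in which EngineVersion sits under WorkGroupConfiguration; the WorkGroup name is a hypothetical placeholder, and the generated template will contain other properties that should be left in place.

```javascript
// Sketch of the relevant WorkGroup properties. Only EngineVersion is the
// addition; the Name below is a hypothetical placeholder.
const workGroupProperties = {
  Name: "vpc-flow-logs-workgroup", // hypothetical, use your own name
  WorkGroupConfiguration: {
    EngineVersion: {
      SelectedEngineVersion: "Athena engine version 2",
    },
  },
};

// Print as JSON so the nesting can be copied into the template.
console.log(JSON.stringify(workGroupProperties, null, 2));
```

If your template is in YAML, the same keys apply with YAML indentation instead of braces.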
Fix Partition Date
By default, the template writes the day portion of the partition as a single digit, as shown below. This is an error in the generated template (at the time of writing this blog)!
year=2021/month=04/day=1
We need the partition to be updated to look like the following.
year=2021/month=04/day=01
To fix this, you need to find this line
and replace it with
Values = [String(strt.getFullYear()), ("0" + (strt.getMonth() + 1)).slice(-2), ("0" + strt.getDate()).slice(-2)]
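To check the effect of the replacement line, here is a small standalone sketch that applies the same padding expression outside the Lambda function. The wrapper function partitionValues is a name invented for this example; strt is the Date variable used in the template line.

```javascript
// Build the [year, month, day] partition values with zero-padded month and day,
// using the same expression as the replacement line.
function partitionValues(strt) {
  return [
    String(strt.getFullYear()),
    ("0" + (strt.getMonth() + 1)).slice(-2),
    ("0" + strt.getDate()).slice(-2),
  ];
}

// 1 April 2021 in local time (months are zero-based in JavaScript).
const [year, month, day] = partitionValues(new Date(2021, 3, 1));
console.log(`year=${year}/month=${month}/day=${day}`);
// prints year=2021/month=04/day=01
```

Prepending "0" and keeping the last two characters leaves two-digit values such as 15 unchanged, while single-digit values such as 1 become 01.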
Extend the partition
- Modify the partitionStartDate and partitionEndDate at the bottom of the template.
- If you extend the partition range, the Lambda function runs longer and you need to increase its Timeout. In my testing, a Timeout of 200 seconds was enough for 1 year of partitions in our AWS environment.
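To get a feel for how much work a wider date range means, the sketch below lists the daily partitions between two dates. It is not the Lambda code from the generated template; dailyPartitions is a hypothetical helper, and it assumes daily partitions in the year=/month=/day= layout shown earlier.

```javascript
// Hypothetical helper: list daily partition paths between two ISO dates
// (inclusive). Dates are handled in UTC to avoid daylight-saving surprises.
function dailyPartitions(startIso, endIso) {
  const pad = (n) => String(n).padStart(2, "0");
  const end = new Date(endIso + "T00:00:00Z");
  const out = [];
  for (
    let d = new Date(startIso + "T00:00:00Z");
    d <= end;
    d.setUTCDate(d.getUTCDate() + 1)
  ) {
    out.push(
      `year=${d.getUTCFullYear()}/month=${pad(d.getUTCMonth() + 1)}/day=${pad(d.getUTCDate())}`
    );
  }
  return out;
}

console.log(dailyPartitions("2021-04-01", "2021-04-03"));
console.log(dailyPartitions("2021-01-01", "2021-12-31").length); // 365
```

A full year is 365 daily partitions compared with 20 for the default range, which is why the function needs a longer Timeout.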
Querying VPC Flow Logs
After the stack has been successfully created, navigate to the Athena console and switch to the WorkGroup just created.
Then go to Saved Queries. This is where you see that all the pre-defined queries have been created.
All you need to do is select one of them and click the Run query button!
From here, you can export the result as a CSV using the icon in the top right corner.
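If the pre-defined queries do not cover what you need, you can also write your own. The sketch below builds one such query as a string: the top source addresses of rejected traffic. The helper name, the table name vpc_flow_logs and the column names (srcaddr, action, bytes) are assumptions based on the default flow log fields, so check them against the table that the stack created before running the query.

```javascript
// Hypothetical helper: build a query for the top sources of rejected traffic.
// Column names assume the table mirrors the default flow log fields.
function rejectedTrafficQuery(tableName, limit = 10) {
  if (!/^[A-Za-z0-9_]+$/.test(tableName)) {
    throw new Error("unexpected table name: " + tableName);
  }
  if (!Number.isInteger(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return [
    "SELECT srcaddr, COUNT(*) AS flows, SUM(bytes) AS total_bytes",
    `FROM ${tableName}`,
    "WHERE action = 'REJECT'",
    "GROUP BY srcaddr",
    "ORDER BY flows DESC",
    `LIMIT ${limit}`,
  ].join("\n");
}

console.log(rejectedTrafficQuery("vpc_flow_logs")); // hypothetical table name
```

The printed text can be pasted into the Athena query editor in the same WorkGroup.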
Wrapping up
Athena is a very handy tool for querying large amounts of data stored in S3. The new built-in integration is an excellent way to investigate VPC Flow Logs when there is a security incident or when you need to troubleshoot network problems.
If you would like to understand more about working with Athena or VPC Flow Logs, please get in contact with us at RedBear.