Amazon Athena Security: 6 essential tips
Amazon Athena is a popular cloud service that runs fast queries on large volumes of data in Amazon S3. It is serverless: users do not set up, see, or manage any servers, which makes it convenient for data teams.
Athena sidesteps the traditional data pipeline, enabling advanced analysis directly on the data, with no need for preprocessing or specialized analytics software. However, it also sidesteps many of your organization's existing security measures.
Athena enables analysts to gain direct access to sensitive data on S3 and derive useful insights, which may be similarly confidential. Your organization must have proper visibility into who performs Athena queries, why, and whether they are authorized to access the data. The use of Athena may have compliance implications as well.
Get more background on Athena security in this detailed blog post.
What is Amazon Athena?
Amazon Athena provides an interactive query service that lets you use standard SQL to perform data analysis directly in Amazon Simple Storage Service (Amazon S3).
To use Athena:
- Go to the AWS Management Console.
- Point Athena at the relevant data stored in your S3 bucket.
- Use SQL to run ad hoc queries. Results are typically available in seconds.
Athena is serverless, so there is no infrastructure to set up or manage. You pay only for the queries you run, and the service scales automatically. Athena runs queries in parallel, so it returns results quickly even when analyzing large datasets or running complex queries.
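As a rough illustration of the steps above, the following Python sketch assembles the parameters for an Athena query and can optionally submit it with boto3, the AWS SDK for Python. The database, table, and S3 output location are hypothetical placeholders, and `run_query` assumes boto3 is installed and AWS credentials are configured.

```python
import time


def build_query_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request, poll_seconds=1.0):
    """Submit the query and poll until it reaches a final state.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Hypothetical names, for illustration only.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-athena-results/queries/",
)
```

Calling `run_query(request)` would submit the query and return its ID and final state. Building the request separately makes it easy to review exactly what will be sent before anything runs.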
Here are some important ways to maximize security when using Athena as part of your overall AWS security strategy.
Logging and monitoring in Athena
Robust monitoring is a critical part of security, and it can be challenging in serverless environments. Amazon provides built-in monitoring options that allow you to collect and respond to data about security events in Athena.
You can use AWS CloudTrail to capture security-relevant information such as:
- Actions taken in Athena by a user, an IAM role, or an AWS service
- Calls to the Athena API
- Actions on the Athena console
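To make this concrete, here is a minimal sketch of how CloudTrail records might be filtered for Athena activity once they have been loaded into Python. CloudTrail records Athena calls under the event source `athena.amazonaws.com`. The sample records are made up and trimmed to the few fields used here, and the function name is an illustrative choice.

```python
ATHENA_EVENT_SOURCE = "athena.amazonaws.com"


def athena_activity(records):
    """Return (time, principal ARN, API action) for each Athena call."""
    activity = []
    for record in records:
        if record.get("eventSource") != ATHENA_EVENT_SOURCE:
            continue  # skip events from other AWS services
        principal = record.get("userIdentity", {}).get("arn", "unknown")
        activity.append(
            (record.get("eventTime"), principal, record.get("eventName"))
        )
    return activity


# Made-up sample records, trimmed to the fields used above.
sample_records = [
    {
        "eventTime": "2024-01-01T10:00:00Z",
        "eventSource": "athena.amazonaws.com",
        "eventName": "StartQueryExecution",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/analyst"},
    },
    {
        "eventTime": "2024-01-01T10:00:05Z",
        "eventSource": "s3.amazonaws.com",
        "eventName": "GetObject",
        "userIdentity": {"arn": "arn:aws:iam::111122223333:user/analyst"},
    },
]

for entry in athena_activity(sample_records):
    print(entry)
```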
You can use Amazon CloudWatch Events (now part of Amazon EventBridge) for change management, tracking operational changes in Athena such as:
- Feature activation
- Configuration changes
- Connection to S3 buckets
It is also possible to trigger a rule on API calls in CloudTrail to generate custom CloudWatch events.
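A rule of this kind is defined by an event pattern. The sketch below builds a pattern that matches selected Athena API calls recorded by CloudTrail. The choice of workgroup actions and the rule name are assumptions for illustration, and `create_rule` assumes boto3 is installed and credentials are configured.

```python
import json


def athena_api_call_pattern(event_names):
    """Event pattern matching the given Athena API calls seen by CloudTrail."""
    return {
        "source": ["aws.athena"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["athena.amazonaws.com"],
            "eventName": list(event_names),
        },
    }


def create_rule(rule_name, pattern):
    """Create the rule. Assumes boto3 is installed and credentials are set."""
    import boto3  # imported lazily so the pattern can be built without it

    events = boto3.client("events")
    return events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(pattern),
        State="ENABLED",
    )


# Watch for workgroup configuration changes (an illustrative selection).
pattern = athena_api_call_pattern(
    ["CreateWorkGroup", "UpdateWorkGroup", "DeleteWorkGroup"]
)
print(json.dumps(pattern, indent=2))
```

A rule does nothing until it has a target, such as an SNS topic or a Lambda function, so a real setup would add one after creating the rule.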
Improving visibility with XDR
While Athena provides basic capabilities for logging and monitoring, it is difficult to connect these logs to traditional security tools. There is no physical machine or VM on which security teams can install an agent, so traditional security tools can neither see Athena activity nor manage and control how Athena is used. The same is true for S3 buckets.
To achieve a holistic view of activity from serverless tools like Athena, you'll need a security paradigm that can work with any data source, whether based on traditional agent-based security tooling or not.
Extended Detection and Response (XDR) is a new approach to threat detection that provides unified visibility across all layers of the IT environment, including cloud services and serverless. XDR tools integrate directly with cloud providers, including AWS, to access data from CloudTrail and CloudWatch. They can help detect anomalous activity in Athena logs and combine that activity with related events in other systems (for example, failed login attempts logged in AWS IAM).
While XDR might seem like overkill just to protect Athena, consider that your organization probably uses other cloud-native technologies that are similarly difficult to monitor and secure. XDR addresses security concerns across multiple cloud services on AWS and other clouds.
Connecting to Amazon Athena Using an Interface VPC Endpoint
A major threat vector for Athena, or any analytics service, is the interception of communication by attackers, for example through man-in-the-middle (MitM) attacks or session hijacking.
To reduce the chances of attackers exfiltrating data pulled by Athena, you can use two security measures:
- Connect to Athena from within a virtual private cloud (VPC), a logically isolated private network in the AWS cloud, through an interface VPC endpoint.
- Use AWS PrivateLink, the technology behind interface VPC endpoints, which keeps traffic between your VPC and Athena on the AWS network instead of the public internet.
If your clients cannot run inside a VPC, for example because they are in a local data center, you can use AWS VPN to connect them securely to your VPC.
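An interface VPC endpoint for Athena is requested through the EC2 API. The sketch below builds such a request. The VPC, subnet, and security group IDs are hypothetical placeholders, and `create_endpoint` assumes boto3 is installed and credentials are configured.

```python
def athena_endpoint_request(region, vpc_id, subnet_ids, security_group_ids):
    """Build keyword arguments for the EC2 CreateVpcEndpoint API."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Athena's endpoint service name follows this regional pattern.
        "ServiceName": f"com.amazonaws.{region}.athena",
        "SubnetIds": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default Athena hostname resolve to the private endpoint.
        "PrivateDnsEnabled": True,
    }


def create_endpoint(request, region):
    """Create the endpoint. Assumes boto3 is installed and credentials set."""
    import boto3  # imported lazily so the request can be built without it

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.create_vpc_endpoint(**request)


# Hypothetical resource IDs, for illustration only.
endpoint_request = athena_endpoint_request(
    region="us-east-1",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
print(endpoint_request["ServiceName"])
```

With private DNS enabled, clients inside the VPC can keep using the standard Athena endpoint name while their traffic goes through the private endpoint.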
Securing S3 data you need to query with Athena
Here is a four-step process you can use to secure an S3 storage bucket queried by Athena:
- Keep your source S3 bucket private: do not enable public access unless you intend the data to be public. You can change this setting for each bucket directly from the AWS console.
- Encrypt all data in your S3 bucket: you can apply encryption at rest to the bucket from the AWS console, or encrypt your source files. Athena supports three encryption options: SSE-S3, in which S3 manages the encryption key; SSE-KMS, in which AWS Key Management Service (AWS KMS) manages the key for server-side encryption; and CSE-KMS, in which data is encrypted on the client side with a key from AWS KMS. Ideally, use SSE-KMS, which lets you control access to the key.
- Encrypt your query results: Athena stores all query results in a preconfigured S3 location called the S3 staging directory. Encrypting the source bucket and source files does not encrypt query results, so you must also encrypt the staging directory to have all data encrypted at rest. Use a different key for query results than for your stored data, so that one compromised key does not expose all data.
- Encrypt your AWS Glue Data Catalog: the Data Catalog contains all Athena table definitions, among other metadata. Encrypting the catalog encrypts the table definitions, but not the underlying data.
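The four steps above map onto a handful of API settings. The following sketch builds each setting as a plain dictionary and shows where boto3 would apply it. The bucket name and KMS key ARNs are hypothetical, separate keys are used for source data and query results as recommended above, and `apply_settings` assumes boto3 is installed and credentials are configured.

```python
def public_access_block():
    """Step 1: block every form of public access to the bucket."""
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }


def default_bucket_encryption(kms_key_arn):
    """Step 2: encrypt new objects with SSE-KMS by default."""
    return {
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": kms_key_arn,
                }
            }
        ]
    }


def encrypted_result_configuration(output_location, kms_key_arn):
    """Step 3: ResultConfiguration that encrypts Athena query results."""
    return {
        "OutputLocation": output_location,
        "EncryptionConfiguration": {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key_arn,
        },
    }


def catalog_encryption_settings(kms_key_arn):
    """Step 4: encrypt Glue Data Catalog metadata at rest."""
    return {
        "EncryptionAtRest": {
            "CatalogEncryptionMode": "SSE-KMS",
            "SseAwsKmsKeyId": kms_key_arn,
        }
    }


def apply_settings(bucket, data_key_arn, catalog_key_arn):
    """Apply steps 1, 2 and 4. Assumes boto3 and configured credentials."""
    import boto3  # imported lazily so the settings can be built without it

    s3 = boto3.client("s3")
    s3.put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=public_access_block()
    )
    s3.put_bucket_encryption(
        Bucket=bucket,
        ServerSideEncryptionConfiguration=default_bucket_encryption(data_key_arn),
    )
    boto3.client("glue").put_data_catalog_encryption_settings(
        DataCatalogEncryptionSettings=catalog_encryption_settings(catalog_key_arn)
    )


# Hypothetical key ARNs: one key for source data, another for query results.
DATA_KEY = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-DATA-KEY"
RESULTS_KEY = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-RESULTS-KEY"

bucket_encryption = default_bucket_encryption(DATA_KEY)
result_config = encrypted_result_configuration(
    "s3://example-athena-results/queries/", RESULTS_KEY
)
```

The `result_config` dictionary would be passed as `ResultConfiguration` when starting a query, or set once on an Athena workgroup so that every query in that workgroup inherits it.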
Controlling access to encrypted data
Bucket policies can help you fine-tune access to your source data. A bucket policy can stipulate who gets access to a certain S3 bucket and the actions they are allowed to perform on the content. For example, you can use a policy to prevent certain users from decrypting the data.
Additionally, you can set a bucket policy that allows only identity and access management (IAM) users from certain AWS accounts to access the bucket. That way, even if an unauthorized user obtains the KMS key that encrypts your bucket, they still cannot read the contents, because the bucket policy denies access to their role, group, or individual user.
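As an illustration, a bucket policy along these lines could be generated as follows. This is a sketch under stated assumptions rather than a recommended production policy: the bucket name and account ID are hypothetical, and the explicit deny uses the `aws:PrincipalAccount` condition key to reject reads from any account outside the allowed list.

```python
import json


def source_bucket_policy(bucket, allowed_account_ids):
    """Allow reads from trusted accounts and explicitly deny everyone else."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    object_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadFromTrustedAccounts",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        f"arn:aws:iam::{account}:root"
                        for account in allowed_account_ids
                    ]
                },
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [bucket_arn, object_arn],
            },
            {
                "Sid": "DenyReadFromOtherAccounts",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": object_arn,
                "Condition": {
                    "StringNotEquals": {
                        "aws:PrincipalAccount": list(allowed_account_ids)
                    }
                },
            },
        ],
    }


# Hypothetical bucket name and account ID.
policy = source_bucket_policy("example-source-data", ["111122223333"])
print(json.dumps(policy, indent=2))
```

An explicit deny overrides any allow, so a policy like this should be tested carefully before use: a mistake in the account list can lock out legitimate users.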
Access control for Athena queries
Unlike traditional databases, Athena does not support user accounts. To control access to Athena, you must use IAM policies, including the following two AWS managed policies for Athena:
- AmazonAthenaFullAccess: grants users permission to perform all actions on Athena.
- AWSQuicksightAthenaAccess: intended for IAM users who access Athena through Amazon QuickSight.
You should also create two custom IAM policies, one for each of the following types of Athena users:
- Power-user policy: grants the user permission to create, modify and delete Athena objects such as databases, views and tables.
- Analyst user policy: allows the user to run queries but does not provide any administrative privileges.
After creating these policies, do the following:
- Create two roles and attach each policy to the relevant role.
- Give the relevant IAM groups permission to assume the new roles.
- Assign individual IAM users to the IAM groups according to the access requirements of each user.
- Optional: if Athena queries run from an Amazon Elastic Compute Cloud (EC2) instance, attach the relevant role to the instance.
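The two custom policies could look roughly like the sketch below. The action lists are an illustrative selection, not a complete or authoritative set, and `"Resource": "*"` is a simplification. A real policy would normally be scoped to specific workgroups and catalog resources and would also need S3 permissions for the source data and query results.

```python
import json

# Running queries and reading table metadata (illustrative selection).
ANALYST_ACTIONS = [
    "athena:StartQueryExecution",
    "athena:StopQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:ListQueryExecutions",
    "glue:GetDatabase",
    "glue:GetDatabases",
    "glue:GetTable",
    "glue:GetTables",
    "glue:GetPartitions",
]

# Additional rights to create, modify and delete databases and tables.
POWER_USER_EXTRA_ACTIONS = [
    "glue:CreateDatabase",
    "glue:UpdateDatabase",
    "glue:DeleteDatabase",
    "glue:CreateTable",
    "glue:UpdateTable",
    "glue:DeleteTable",
]


def athena_policy(actions):
    """Build a single-statement IAM policy allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # simplification; scope this down in practice
            }
        ],
    }


analyst_policy = athena_policy(ANALYST_ACTIONS)
power_user_policy = athena_policy(ANALYST_ACTIONS + POWER_USER_EXTRA_ACTIONS)
print(json.dumps(analyst_policy, indent=2))
```

Deriving the power-user policy from the analyst list keeps the two in step: anything an analyst can do, a power user can do as well.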
Security for serverless cloud services is not straightforward. In this article, I introduced Amazon Athena and explained six ways to enhance Athena security:
- Logging and monitoring: using AWS tools such as CloudTrail and CloudWatch Events to collect and analyze basic information about Amazon Athena activity.
- Improving visibility with XDR: leveraging a new generation of security tools to collect Athena logs, combine them with activity on other IT systems, and detect abnormal activity.
- Accessing Athena via VPC endpoints: connecting to Athena privately using AWS PrivateLink.
- Securing S3 data: Athena works with Amazon S3, and if you don't secure your buckets, Athena won't be secure either. Ensure sensitive data in S3 is protected by authentication and encrypted.
- Controlling access to encrypted data on S3: using bucket policies to determine who has access to a storage bucket and what they can do with it.
- Access control for Athena queries: using IAM to control who has full admin access to Athena and who can only perform analyst-level queries.
I hope this will help you develop a security strategy for serverless tools in your organization.