AWS SAA-C03 Exam Practice Questions and Answers – Question 3
A company needs the ability to analyze the log files of its proprietary application. The logs are stored in JSON format in an Amazon S3 bucket. Queries will be simple and will run on-demand. A solutions architect needs to perform the analysis with minimal changes to the existing architecture. What should the solutions architect do to meet these requirements with the LEAST amount of operational overhead?
A. Use Amazon Redshift to load all the content into one place and run the SQL queries as needed.
B. Use Amazon CloudWatch Logs to store the logs. Run SQL queries as needed from the Amazon CloudWatch console.
C. Use Amazon Athena directly with Amazon S3 to run the queries as needed.
D. Use AWS Glue to catalog the logs. Use a transient Apache Spark cluster on Amazon EMR to run the SQL queries as needed.
Correct Answer: C. Use Amazon Athena directly with Amazon S3 to run the queries as needed.
Explanation:
Option A: Use Amazon Redshift to load all the content into one place and run the SQL queries as needed.
- Explanation:
- Amazon Redshift is a fully managed data warehouse designed for complex, large-scale analytical queries. Using it for simple, on-demand queries introduces unnecessary overhead.
- It requires provisioning a data warehouse, loading the S3 data into it, and managing its resources, which conflicts with both the minimal-change and the least-operational-overhead requirements.
- Suitability:
- Not ideal. It adds significant operational complexity and cost for a use case that can be handled more efficiently with serverless solutions.
Option B: Use Amazon CloudWatch Logs to store the logs. Run SQL queries as needed from the Amazon CloudWatch console.
- Explanation:
- CloudWatch Logs is a service for storing, monitoring, and analyzing log data in near real time. It does not query objects that already sit in an S3 bucket.
- The logs would first have to be moved from S3 into CloudWatch Logs, which adds operational steps and changes the existing architecture.
- Suitability:
- Not suitable. Moving the logs out of S3 adds work and changes the architecture, which conflicts with both requirements.
Option C: Use Amazon Athena directly with Amazon S3 to run the queries as needed.
- Explanation:
- Amazon Athena is a serverless, interactive query service designed to analyze data directly in Amazon S3 using SQL.
- Athena supports JSON, Parquet, and other formats, so it can query the JSON log files as they are.
- It requires no infrastructure setup or data movement, minimizing operational overhead.
- Once a table schema is defined over the JSON data (for example, with a CREATE EXTERNAL TABLE statement that points at the bucket), queries run directly against the files stored in S3.
- Suitability:
- Best option. Athena provides a low-overhead, cost-effective solution for on-demand querying of JSON logs in S3.
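To make the schema step concrete, here is a minimal Python sketch, not a definitive implementation. It builds the kind of CREATE EXTERNAL TABLE statement Athena accepts for JSON files and, optionally, submits a statement through boto3. The database, table, column, and bucket names are hypothetical, and the OpenX JSON SerDe is one of the JSON SerDes Athena supports; it is assumed here for illustration.

```python
from typing import Mapping


def build_create_table_ddl(
    database: str,
    table: str,
    columns: Mapping[str, str],
    s3_location: str,
) -> str:
    """Return Athena DDL for an external table over JSON files in S3."""
    # One "name type" pair per column, e.g. "log_level string".
    column_defs = ",\n  ".join(
        f"{name} {sql_type}" for name, sql_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"  {column_defs}\n"
        f")\n"
        f"ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        f"LOCATION '{s3_location}'"
    )


def run_statement(sql: str, database: str, output_location: str) -> str:
    """Submit a statement to Athena and return its query execution ID.

    Needs boto3 and AWS credentials; imported lazily so the DDL builder
    above can be used without either.
    """
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names: replace with the real bucket and log fields.
    ddl = build_create_table_ddl(
        database="app_logs",
        table="events",
        columns={"event_time": "string", "log_level": "string", "message": "string"},
        s3_location="s3://example-log-bucket/app/",
    )
    print(ddl)
```

After the table exists, an on-demand query is an ordinary SQL statement passed to the same `run_statement` helper, for example `SELECT log_level, count(*) FROM events GROUP BY log_level`. The logs never leave the bucket, which is why this option changes the least.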
Option D: Use AWS Glue to catalog the logs. Use a transient Apache Spark cluster on Amazon EMR to run the SQL queries as needed.
- Explanation:
- AWS Glue can catalog the logs, and Amazon EMR with Apache Spark can process the data. However, this approach requires setting up and managing Glue crawlers, Spark clusters, and job execution, introducing significant operational overhead.
- While suitable for complex processing tasks, it is overly complex for simple, on-demand queries.
- Suitability:
- Not ideal. This solution adds unnecessary complexity and is better suited to large-scale data processing than to simple queries.
Recommended Solution:
Correct Answer: C. Use Amazon Athena directly with Amazon S3 to run the queries as needed.
- Why?
- Athena queries the logs where they already are, so the existing architecture stays unchanged and there is no infrastructure to manage.
- It provides a serverless, cost-effective way to run SQL queries on demand against the JSON log files stored in S3.