In this project, we will show you how you can read JSON data that is stored into S3 bucket by connecting it to AWS Athena and performing some SQL queries on it. We will also be using AWS glue crawlers and then finally visualize our data in Quicksight.

Here is a basic block diagram which explains it better:

Lists of services we are going to use:

  1. Create S3 Bucket
  2. Connecting a Data source from AWS Athena
  3. Creating an AWS glue crawler
  4. Performing some queries in AWS Athena
  5. Joining and visualise data in Quicksight

Let’s start by first creating our S3 buckets where we will upload our data and store results.

Step 1: Create S3 Bucket

  1. Go to “S3” from the search bar.
  1. Click on “Create Bucket” on the upper right corner.
  1. We will name them as “skillcurb-raw-data” and “skillcurb-results-data”
  1. Click Create

5. We also have to upload data in the raw bucket. Our dataset is about cars and customers with a JSON format. If you store both your files in the same bucket there will be an issue with athena because when you query those tables in Athena, you get zero records. So to solve the issue we create 2 folders in S3 raw-data bucket and store the json files separately. 

You can read more about the issue here: 

https://aws.amazon.com/premiumsupport/knowledge-center/athena-empty-results/

Step 2: Connecting a Datasource from AWS Athena

  1. Go to “Athena” from the search bar.
  1. Go to “Settings” and define your query result location:
  1. Click on “Save” .
  1. Go back to Athena, select the three lines from the top left corner and click on “Data sources” and click on “Connect data source”.
  1. Here you need to select “S3 – AWS Glue Data Catalog”.
  1. Scroll down and here select the following options in order to create an AWS Glue. 
  1. A new tab will be opened where you will have to add information about your crawler. Here we named it as “skillcurb-athena”.
  1. Click “Next.
  1. You need to specify the crawler source type as follows:
  1. We need to choose our data source as “S3” and in the include path we need to specify our S3 bucket where we need to fetch our data from. Click on the little folder icon beside the bar and you can easily view your S3 buckets.
  1. We don’t need to add another data store so we click on “No” and go to next.
  1. We need to “Create an IAM role” and name it. Here it is “lab-athena”.
  1. We will create a schedule for this crawler and the frequency we will run it on will be “Run on demand”.
  1. Next we need to add a database and name it as “car-customer-data”.
  1. Next we are able to review all the previous steps.

Step 3: Run the Crawler

  1. Click on the crawler you just made and select “Run Crawler” you will see that the status is starting. It will take a few seconds to change status to running.
  1. After you see the status as running you can go back to Athena. You can also see that one table has also been created.
  1. Go back to Athena and click on the Query Editor. Refresh the page and you will be able to see two new tables created. 
  1. Click the 3 dots beside the table and select the preview table.
  1. You can see your data below.
  1. You can run some queries and view your data accordingly. Here I want to view the data where the car model year is greater or equal to 2000.

Step 4: Perform some queries in AWS Athena

  1. You can run some queries and view your data accordingly. Here I want to view the data where the car model year is greater or equal to 2000.

2: You can view your data down below.

Step 5: Visual the data on QuickSight

  1. Search “QuickSight” on the search bar.

If you haven’t created a Quicksight, then follow the steps and sign up.

  1. After creating, you click on “Manage QuickSight”.

4. Go to Security and Permissions and select manage.

  1. Checkbox the S3 bucket and also check box the permissions for Athena Workgroup. Click Finish.
  1. Click on new analysis.
  1. Select New dataset and from there select Athena.
  2. You need to create a new Athena data source with a data source. name. Here I’ll name it as “skillcurb-athena-visuals”. Click on “Create datasource”.
  1. Next you need to select your table that you previously created on Athena.
  1. Next you need to select “Directly query your data” then select “Edit/Preview data”.
  1. Here you will be able to see your data.
  1. You can click on “Add Data” on the right top corner to add another data source. Select the same database as earlier and select “customers” for the table.
  1. Now to return the matching values from the two tables we need to select “Inner” and then in the Join clause we need to define the common ids which the car id our project. 
  1. Next we can view our data below.
  1. Click on “Publish and Visualise” on the top right corner.
  1. Now you can just drag your fields and drop it in the box and a graph will be automatically made. You can add more visuals by clicking on “Add” on the left top corner.
  1. A dashboard has been created for this project.

You can use AWS Athena using other Amazon services as well like Amazon DynamoDB, Lambda and others. Please leave a comment if you think that this project was helpful or if you have any questions.

SkillCurb YouTube Channel:

If you want to see the Video tutorial of this project you can click the link down below:

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments