Tutorial

Develop Spark Application

Introduction

In our previous blog post, we showed how to upload data into Databricks on AWS using the Databricks UI. In this post, we will write a short Spark application in a notebook to query the data set uploaded there.

Creating a Notebook

To develop a Spark application, you first need to create a notebook. A notebook is an environment where you manipulate data by writing queries in a supported language such as Python, SQL, Scala, or R. To create a notebook, go to “New” in the options and select “Notebook”.

Connecting to a Cluster

Once the notebook is created, attach it to a cluster by clicking the “Connect” button, which lists the available clusters. In our case there is only one cluster, so it is selected automatically.

Writing Queries

Now let us write some queries against the data uploaded earlier. First, we list the directories in our Databricks File System (DBFS) using the ‘fs.ls’ command.
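As a minimal sketch of the listing step: in a Databricks notebook the ‘fs.ls’ command is available as dbutils.fs.ls, which returns one entry per file or directory. The helper name list_dbfs_paths below is hypothetical, and the dbutils object is passed in as an argument because it is only predefined inside a Databricks notebook.

```python
def list_dbfs_paths(dbutils, path="/"):
    """Return the full path of every entry directly under `path` in DBFS."""
    # dbutils.fs.ls returns FileInfo entries that expose .path, .name and .size
    return [entry.path for entry in dbutils.fs.ls(path)]


# Inside a Databricks notebook, `dbutils` is already defined, so you can run:
# for p in list_dbfs_paths(dbutils, "/"):
#     print(p)
```

The same listing can also be produced directly in a notebook cell with the `%fs ls /` magic command.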

Loading Data into Data Frames

Next, to work with the data in our files, we load it into data frames using spark.read.csv. We create two data frames, named orders and order_items, one from each source table.

Joining Data Frames

Now let’s combine the two data frames using the join function. We join orders with order_items on the Order ID, the key the two tables have in common.

Filtering Data

Next, we add a filter that keeps only the orders whose status is “complete” or “closed”.

Grouping and Aggregating Data

After that, we group the data by order date using the groupBy method and apply the sum function to compute the daily revenue.

Rounding and Ordering Data

Finally, to keep floating-point noise out of the result, we round the revenue to two decimal places and sort the data by order date.

 
