“Ultimate Guide”

History of the database industry

The database industry has come a long way from purely addressing Structured, Relational and Low Volume data requirements using SQL database systems TO addressing the new age of exponential data growth by volume, velocity, and variety due to the popularity of Smart Devices, IOT (Internet of Things) and just streaming platforms.

Considering I have used SQL and NOSQL databases for the past 20 years in my consulting career as a cloud solution architect, there always comes a use case  where we have to pick SQL or NOSQL databases. As both these have its purpose  there are some major differences between the two which every IT practitioner should know which I hope this blog helps you decide.

For decades, the predominant data model that was used for application development was the relational data model used by relational databases such as Oracle, DB2, SQL Server, MySQL, and PostgreSQL.

It wasn’t until the mid to late 2000s that other data models began to gain significant adoption and usage which were categorized as “NoSQL” databases.  The term NoSQL can also be thought of as non relational as well.

 Note:  When we say RDBMS, we generally refer to a database that implements the relational data model, supports ACID transactions, and uses SQL for query and data manipulation.

SQL databases: The old kid on the block 

The SQL databases can be categorized into relational database management systems (RDBMS). The basic building blocks of data storage are in 

  • Tables 
  • Columns
  • Rows

The data can be queried via SQL (Structured Query Language) which is a fairly easy language to learn as it is (declarative and english like). The data within the database is related/joined via key columns between two or more tables called primary and foreign keys that maintain referential integrity within the database. Hence maintaining accuracy and consistency of data throughout the database.

 

SQL Relational database: Properties under the hood

RDBMSs maintain transaction control by using atomic, consistent, independent, and durable (ACID) properties to ensure transactions are reliable. The following defines each of the associated properties

Relational databases follow the properties of ACID

  • Atomicity All-or-nothing nature of a set of operations.
  • Consistency:system must be left in a consistent state while processing transactions, it either reflects the state after successful completion of the transaction or must roll back to a state prior to the start of the transaction.
  • Isolation refers to the interaction effects between transactions. Under what conditions is the state modified by one transaction visible to other active transactions in the system
  • Durability indicates that once a transaction has committed, the effects of the transaction remain despite events such as errors and crashes.

Current SQL Relational DB Market Landscape

https://db-engines.com/en/ranking

The graph crawls publicly available information, searches, job offers, mentions on websites and more. Oracle, Microsoft SQL Server, and MySQL don’t show much change in popularity over the past few years.  But they all are clustered together far above all other entries.

Another solid research done by Gartner last year still shows the RDBMS vendors as leaders in their magic quadrant .They keep improving and coming up with additional features to make the relational database better

SQL relational database key Features:

  • ACID transactions at the database level makes development easier.
  • Fine-grained security on columns and rows to prevent unauthorized users.
  • Most SQL code is portable to other SQL databases, including open source options.
  • Typed columns and constraints will validate data before it’s added to the database and increase data quality.

  Shortcomings of SQL relational database key Features:

  • ERD – Entity-relationship modeling must be completed before testing begins, which slows development.
  • RDBMSs don’t scale out when joins are required.
  • Sharding over many servers can be done but requires application code and will be operationally inefficient.
  • It can be difficult to store high-variability data in tables.

NOSQL databases in action 

Companies like Facebook, Google and Amazon started creating NoSQL technologies because they had to process exabytes of data with scalable database management systems that could deliver performance and availability of the system to billions of users.

NoSQL databases shine at having schema-less capability that allows for the ingestion of multi structured data.

A distributed NoSQL database uses commodity hardware to scale out 

(1) leveraging less expensive hardware and/or cloud infrastructure;

(2) scaling on demand and without downtime.

The point above can be seen in the graph below

  • Big Data Applications: Large volumes of data are easily handled by NoSQL databases.
  • Scalability: The system performance grows linearly with the addition of new servers in the cluster. distributed clusters of hardware instead of scaling up by adding expensive and robust servers
  • Low Cost : Commodity servers and Open Source softwares can be used for a full production scale implementation helping in lowering the total cost of ownership
  • Flexibility: Different types of NoSQL databases are available to address specific application requirements.

Looking into the future it is apparent that data volumes will be exponentially increasing along with its varied data formats i.e unstructured where relational SQL databases do struggle hence giving birth to NoSQL databases. 

Classification of NOSQL databases and usages pattern

There are four main categories of NOSQL databases which are used for different use cases.

  • Document
  • Key Value
  • Columnar 
  • Graph

What are Document Databases

Data is represented often as an object or JSON-like document . The flexible, semistructured, and hierarchical nature of document databases allows them to evolve with applications requirements. Document databases store and retrieve documents and the basic atomic stored unit is a document.

A document database contains a collection of key-value pairs stored in documents that are both natural and flexible for developers to work with. 

Some use case where document databases shine 

  • Content management systems
  • Blogging platforms
  • Analytics platforms
  • e-commerce platforms
  • Document search

Document Database Vendors: 

  • MongoDB (10Gen)
  • CouchDB 
  • Couchbase 
  • Cosmos DB

Some of the key benefits of Document databases:

  • Natural Data Model: Data that is accessed together is stored together which helps with writing less code and users get higher performance.
  • Flexible Schema: Dynamically Adapt to Change in requirements
  • Universal: Common format of JSON

Some Shortfalls of Document databases:

  • Key Transactions can be done only within a document at the database level. 
  • It lacks fine-grained security at the element level.
  • Harder to query the data with languages like SQL as the database normally has their own proprietary non standard query language, which prohibits portability.

What are Key-Value Databases 

A key-value database stores data as a collection of key-value pairs in which a key serves as a unique identifier with its associated value. The value can be any sort of byte array, data structure, or binary large object (BLOB), and works well for storing enormous amounts of data. 


Key-value stores are the simplest NoSQL databases. Below is a conceptual view of the data structure 

Key-value database use cases

  • Session Data: Web applications may store user session details and preference in a key-value store. 
  • Real-time recommendations and advertising are often powered by key-value stores because the stores can quickly access and present new recommendations or ads as a web visitor moves throughout a site.
  • In Memory Data Caching: to speed up applications by minimizing reads and writes to slower disk-based systems.
  • Shopping Cart: Can handle the scaling of large amounts of data and extremely high volumes of state changes while servicing millions of simultaneous users through distributed processing and storage.
  • IOT: The data from an IOT device can be stored efficiently in a key-value data model

Popular Key-Value Database Vendors: 

  • Memcache
  • Berkeley DB
  • Memcache
  • Redis
  • Riak
  • DynamoDB

What are Graph Databases 

We don’t need to get into graph theory the math behind this database for a non technical overview.

A graph database’s purpose is to make it easy to build and run applications that work with highly connected datasets. Typical use cases for a graph database include social networking, recommendation engines, fraud detection, and knowledge graphs. 

Let’s talk about the basic building blocks of a graph data model 

  • Node represents an entity (a person, place, thing, category or other piece of data), 
  • Relationship represents how two nodes are associated. 

Looking at the example below of social media Use profile which is represented as Nodes and Relationships are shown as arrows connecting the Nodes with each other.

Another example below of  an organization’s personnel database structured as a graph. The entities are employees and departments, and the edges indicate reporting relationships and the departments in which employees work.

.

Data in the world is becoming increasingly more connected. To represent this type of highly connected data only a graph data model is able to store, process, and query these connections efficiently. 

What are some real life use cases for graph database

  • Fraud detection
  • Real-time recommendation engines
  • Master data management (MDM)
  • Network and IT operations
  • Identity and access management (IAM)
  • Knowledge Graph
  • Artificial intelligence

Popular Graph Database Vendors: 

  • Neo4j
  • Apache Giraph
  • MarkLogic 

What are Column family store Databases ? 

“A columnar store database stores data using column oriented model rather than by rows, which makes it suitable for analytical query processing”

Stores data efficiently

Row-oriented databases store each record in one or more contiguous blocks on disk. Column-oriented databases store each column in one or more contiguous blocks. 

Columnar storage lets you ignore all the data that doesn’t apply to a particular query, because you can retrieve the information from just the columns you want. 

Some of the popular cloud databases which use columnar storage are Amazon Redshift, Google BigQuery, and Snowflake. A columnar database stores its data by columns, with the data for each column stored together.

 “Here comes a curveball: What are Wide Column Databases”

Wide column databases use a concept called a keyspace. A keyspace is kind of like a schema in the relational model. The keyspace contains all the column families (kind of like tables in the relational model), which contain rows, which contain columns.

Within a given column family, all data is stored in a row-by-row fashion, such that the columns for a given row are stored together, rather than each column being stored separately.

Benefits of Column Store Databases

  • Compression. Column stores are very efficient at data compression and/or partitioning.
  • Aggregation queries. Due to the columnar design of storage in the database, it performs aggregation queries fast (such as SUM, COUNT, AVG, etc).
  • Scalability. They are well suited to massively parallel processing (MPP) by spreading data  across a large cluster of machines 
  • Parallelization: Operations on different columns can easily be processed in parallel.

Popular Column Store Database Vendors: 

  • Apache Cassandra
  • Hypertable
  • Apache Accumulo

 

NoSQL Paradigm better suited for the age of Digitization 

With the adoption of cloud computing , mobile , social media and other key big data technologies NoSQL is becoming a preferred database technology to fuel the pro

cessing of large scale data use cases. 

Majority of the fortune 500 companies have instances where both NoSQL and RDBMS databases co-exist to cater to the unique use cases.

By Leveraging the power NoSQL framework enterprises are better equipped to both develop applications with agility and operate at any scale to set the stage and meet the demands of the new Digital Economy.

References:

https://www.cs.ucdavis.edu/
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments