In this article we are going to understand between Relational vs Non-Relational database
Evolution of Storage Technologies
There is a plethora of terms and jargon that can be confusing when first starting to read about data and its storage. We’ll provide clarification on various terms below:
Data (Relational vs Non-Relational database)
Data (plural of datum) is defined as distinct pieces of information. Data can exist in several different forms: numbers, text, bytes, Instagram pictures, or YouTube videos. These represent various types of data that can be stored and transmitted electronically. Note that data is usually interpreted in a context, e.g., a data representing prose in the Hindi language can’t be interpreted as a picture and vice versa language can t be interpreted as a picture and vice versa.
There are broadly three categories of data:
Structured Data has some pre-defined organizational property about it that makes it easily searchable and analyzable. The data is backed by a model that dictates the size of each field of the data: its type, length, and any restrictions on what values it can take on. Data stored in SQL databases are structured. Structured data is generally formatted in a universally understandable and identifiable manner.
In most instances, a schema formally specifies structured data. Whenever you are working in any variant of SQL, you are almost always dealing with structured data.
Unstructured Data is characterized by a lack of organization and a data model that describes the structure of a single record or attributes of any individual fields within the record. Videos, audio, blogs, log files, social media posts, etc, are all examples of unstructured data. It is data without any conceptual definition or type.
Semi-structured Data is a cross between structured and unstructured data, and though there’s no explicit data model or structure definition, one may be implied. Semi-structured data contains semantic tags but does not conform to the structure of typical relational databases. Examples include JSON and XML data. It sits in between the spectrum of structured and unstructured data. Another example to consider is that of metadata related to videos and audios files, which themselves fall under unstructured data. The metadata such as creator, last accessed, permissions, etc can be considered as semi-structured data.
Semi-organized data includes sections that are structured and parts that are not structured. For example, X-rays and other large images consist largely of unstructured data comprising of millions of pixels. It is difficult to scan and query these X-ray images in the same manner as a massive relational database would be indexed, searched, and evaluated. However even though the files themselves consist of only pixels there exists a small section known as metadata within each file that consists of details such as author, creation timestamp, last accessed, etc. This metadata allows for some form of analysis of unstructured data.
This brings us to the question: if all of unstructured data has associated metadata then what is the difference between semistructured and unstructured data? In today’s world there’s hardly any truly unstructured data with no organization or metadata. In fact, the distinction between semi-structured and unstructured data is sort of a grey area and disputed. However, both are far away from the structured rigorously organized data living in relational databases. Often what is referred to as unstructured data such as videos, images, documents, social media postings, files etc is really semi-structured data. However, for simplicity data is usually divided and described as structured or unstructured.
In this articles, we’ll be exclusively dealing with structured data.
Database Management System (DBMS)
Usually, when we refer to databases such as MySQL and PostgreSQL, we are talking about a wholesome system, called the database management system. It’s a software that allows the user to create, maintain, and delete multiple individual databases. It provides peripheral services and interfaces for the end-user to interact with the databases.
The database is an ordered and structured data set, typically stored and retrieved electronically. The structure and organization of data help in efficient retrieval.
There are broadly two types of databases:
- Relational or SQL Databases
- Non-Relational or NoSQL Databases
Relational databases consist of data stored as rows in tables. The columns of a table rigidly follow a defined schema that describes the type and size of the data that a table column can hold. You may think of a schema as a model for what each record or row in the table might look like. This is why relational databases only handle structured data. Tables typically have a column as a key used to uniquely mark each row in a table. A relationship between two tables is defined by a column or a set of columns that occur in both the tables.
Relational database management systems (RDBMS) are mature and widely adopted technology. Popular implementations include Oracle, DB2, Microsoft SQL Server, PostgreSQL, and MySQL.
The rise of Web 2.0 companies made NoSQL databases popular. When data sets managed by internet firms have increased in complexity, a new approach to database architecture has come to the fore. The rigid schema of the relational database has been avoided in favor of a schema-less database. NoSQL databases are distributed in different ways that discuss different use cases. The spectrum includes key-value stores (Redis, Amazon Dynamo DB), column stores (HBase, Cassandra), document stores (Mongo DB, Couchbase), graph databases (Neo4J), and search engines (Solr, ElasticSearch, Splunk). The primary distinction between these NoSQL databases and SQL databases is the absence of a rigid schema in the former. NoSQL databases, unstructured, and semi-structured data fall under the purview of Big Data.
Big Data typically contains data sets of volumes outside the capacity of standard computing tools (e.g. SQL technologies) to collect, curate, handle and process data within a tolerable timeframe. The “big” in the term Big Data is a moving target that changes to a higher number as software and hardware capabilities enhance to process higher volumes of data.