7  Document-oriented databases

7.1 Learning objectives

Concepts

  • NoSQL databases

  • Document-oriented databases (or document stores)

Practical skills

  • Writing a JSON document.

7.2 NoSQL databases

Now that you have become familiar with relational databases, we will be turning our attention to different types of databases: the document-oriented database (also called document stores), and the graph databases (we’ll cover those in the next chapter). These are two variants of a much wider range of database types that, together, are regrouped under the NoSQL label (which can refer to “non-SQL” or “not only SQL”). These databases have grown in popularity with the rise of the web 2.0 and cloud computing, as well as the diminishing costs of data storage. Unlike relational databases that share a common language (SQL), there is no standard in the NoSQL world. However, we will not be discussing these languages in this course, but instead focus on how document-oriented databases and graph databases differ from relational databases in the way that they organize data.

7.3 Document-oriented databases

There exists a wide variety of document-oriented databases, each with their own particularities. We will focus here on one very popular example of document-oriented databases: MongoDB (https://www.mongodb.com/). Let’s look at the main differences between MongoDB and MySQL.

7.3.1 Documents and collections

Instead of storing records in tables as in the relational model, MongoDB (and other document-oriented databases) do not split the data into multiple tables that are linked together through foreign keys. Instead, data is stored in documents that are grouped together in collections.

While they are not conceptually equivalent, documents are to document-oriented databases what table rows are to relational databases: they are individual records that have a unique identifier. Collections are equivalent to tables, they contain a set of documents, just like tables contain a set of records representend by rows.

7.3.2 Non-structured data

Rather than the structured data format of relational databases, document-oriented databases use non-structured data formats like XML and JSON. MongoDB uses the JSON format (actually, a variation of it called BSON). As you might recall from the first chapter of this course, one particularity of the non-structured data formats is that the structure of the data is included directly in each document, which means that each document in a collection can have a different structure.

7.3.3 No (very few) relations

Another important feature of the JSON format is that it can include a list of values in a single field. This eliminates the need for associative tables and, consequently, the need to retrieve data with more or less complex SQL queries pulling data from multiple tables using joins. With document-oriented databases, the documents contain all the data we need, reducing the need to pull data from other collections. That said, it remains possible to link collections together, but this is not a typical use of document-oriented databases.

7.3.4 Implications for database design

The differences between the relational and the document-oriented databases presented so far may seem a bit superficial or technical, but they also have important implications when it comes to designing databases. Most importantly, the documents include not only the entities’ attributes but also their relationship with other entities. This means that 1) the same data can be repeated in multiple documents and collections, and 2) there is no need for associative or validation tables. The following video shows off my exceptional whiteboard drawing skills but perhaps more importantly explains in more detail the fundamental differences between the two types of databases.

7.4 Storing document collections in a JSON file

A JSON file is nothing more than a text file that contains text that follows the JSON syntax. So we can create them with any text editor (e.g. notepad). A JSON file must always begin with and end with a curly bracket {}.

7.4.1 Key-value pairs

In JSON, everything is a key-value pair except for objects and arrays for which the key is optional.
The key is always between double quotes, and so are values that are strings. Here’s an example of a key-value pair.

"name":"Barry Bonds"

So a JSON file with only that one key-value pair would look like this:

{ "name":"Barry Bonds" }

7.4.2 Types of values

In JSON, values must be one of the following data types:

  • String "name":"bada booms"

  • Number "number_of_children":10

  • Boolean "sale":true

  • NULL "middlename":null

  • Object {"name":"John", "age":30, "city":"New York"}

  • Array ["John", "Anna", "Peter"]

7.4.3 Creating a document (JSON object)

Here’s an example of a fictional course written as a JSON, which is an example of a document that we could eventually want to store in a collection.

{
  "code": "INFO6540",
   "semester": "Winter",
   "year": 2023,
   "description": "This is the course description",
   "brightspace_active": true,
   "students": [
      {"id": "B00123456", "name": "Wayne Gretzky"},
      {"id": "B00654321", "name": "Mario Lemieux"}],
   "supplementary_files_url": null
}

And here is another course.

{
  "code": "INFO6270",
  "semester": "Fall",
  "year": 2022,
  "name": "Singing for dummies",
  "description": "This the course description for INFO6270",
  "brightspace_active": false,
  "room":"ROWE1234",
  "students": [
                {"id": "B00123457", "name": "Justin Bieber"},
                {"id": "B00865321", "name": "Lady Gaga"}
              ],
  "supplementary_files_url": "www.info6270_sup.ca"
}

we can see that the document contains strings (code, semester, description), numbers (year), booleans (brightspace_active), and an array (students) that contain objects.

7.4.4 Creating a collection (JSON array)

A collection is essentially a group of objects grouped together in an array. For example, if we wanted to store all our courses together into a collection of courses, we could do it in a single JSON file that contains an array called “courses”, like this:

{
  "courses": [
    {
      "code": "INFO6540",
      "semester": "Winter",
      "year": 2023,
      "description": "This is the course description",
      "brightspace_active": true,
      "students": [
        {
          "id": "B00123456",
          "name": "Wayne Gretzky"
        },
        {
          "id": "B00654321",
          "name": "Mario Lemieux"
        }
      ],
      "supplementary_files_url": null
    },
{
      "code": "INFO6270",
      "semester": "Fall",
      "year": 2022,
      "name": "Singing for dummies",
      "description": "This the course description for INFO6270",
      "brightspace_active": false,
      "room":"ROWE1234",
      "students": [
        {
          "id": "B00123457",
          "name": "Justin Bieber"
        },
        {
          "id": "B00865321",
          "name": "Lady Gaga"
        }
      ],
      "supplementary_files_url": "www.info6270_sup.ca"
    }    
  ]
}

7.4.5 Multiple collections in a single JSON file

We can include multiple collections in a single JSON file simply by adding a second array. Let’s add a student collection to our JSON file.

{
  "courses": [
    {
      "code": "INFO6540",
      "semester": "Winter",
      "year": 2023,
      "description": "This is the course description",
      "brightspace_active": true,
      "students": [
        {
          "id": "B00123456",
          "name": "Wayne Gretzky"
        },
        {
          "id": "B00654321",
          "name": "Mario Lemieux"
        }
      ],
      "supplementary_files_url": null
    },
{
      "code": "INFO6270",
      "semester": "Fall",
      "year": 2022,
      "name": "Singing for dummies",
      "description": "This the course description for INFO6270",
      "brightspace_active": false,
      "room":"ROWE1234",
      "students": [
        {
          "id": "B00123457",
          "name": "Justin Bieber"
        },
        {
          "id": "B00865321",
          "name": "Lady Gaga"
        }
      ],
      "supplementary_files_url": "www.info6270_sup.ca"
    }    
  ],
  "students":[
    {
      "id":"B00865321",
      "name":"Lady Gaga",
      "courses":[
        {
          "code":"INFO6270",
          "completed":true,
          "grade":"B-"
        }
      ]
    },
    {
      "id":"B00123457",
      "name":"Justin Bieber",
      "courses":[
        {
          "code":"INFO6270",
          "completed":true,
          "grade":"C+"
        }
      ]
    },
    {
      "id":"B00123456",
      "name":"Wayne Gretzky",
      "courses":[
        {
          "code":"INFO6540",
          "completed":false,
          "grade":null
        }
      ]
    },
    {
      "id":"B00654321",
      "name":"Mario Lemieux",
      "courses":[
        {
          "code":"INFO6540",
          "completed":false,
          "grade":null
        }
      ]
    }
  ]
}

7.5 Homework (Lab 4)

GummyBox has decided to store its data into a noSQL database that will include collections of gummies, gummy boxes, invoices, and an additional version of the gummy collection that includes reviews. The database administrator already extracted the data from the gummybox_sales database and sent it to you in an Excel file (see lab 4 on the course BrightSpace). Your task is to create a JSON file with a sample of the data (2 documents) for each collection.

7.6 Additional resources

You can use this online JSON validator and formatter to help you write JSON and convert your data from JSON to CSV or XML or vice versa.

7.6.1 MongoDB demo

While you are not required to learn how to use MongoDB or another type of document-oriented database in this course, you may still be curious to see how such databases look like in real life. So, in this video, I demonstrate how to create a collection and documents in MongoDB.

Errata

At time 8:10, I include an object called orders. What I actually should have done is create an array, and then make each entry in that array an object.

You can install a local MongoDB server on your computer if you want to use it to create your own document collections. MongoDB offers a series of free online courses that can also help get you started.