Demystifying Graph Databases

“…And that’s why graph databases are so cool,” I say, after droning on for some time. “You get it, right?”

“I do,” Kofi says with a blank expression on his face, a telltale sign that he actually didn’t.

“Are you sure?” I ask, sensing the confusion that stubbornly refuses to go away.

He hesitates. “Not really. I mean, I understand the theory, and it’s cool and all, but I just don’t see why I should pick it over something else.”

 

I’ve had back-and-forths with a number of people about graphs, and I’ve noticed that many of them share Kofi’s sentiments. Some have the preconception that graph databases are complex, avant-garde systems, while others feel that the benefits of adoption aren’t significant enough to be worth the migration effort. In this blog post, I will attempt to address the lack of enthusiasm towards graph databases. Is the concept really hard to grasp, or is it something else?

 

To start it off, what’s a graph?

A graph is basically a representation of how things are connected. This representation is expressed diagrammatically as a set of points (nodes or vertices), and edges connecting these points. For example, a map is a graph. It’s essentially a representation of places (nodes) and roads (edges) that connect them. Graph databases are simply databases that store data in this format. So in the database, you have your nodes (items or objects), and the connections between them which are represented as relationships. Let’s say you want to store your family tree. The people in your family are the nodes of the graph, and the relationships between you are the edges. 

Let’s use your uncle, Kojo, and his children as an example. How would they be represented in a typical graph database?

Diana and Kojo are married, and they are parents to Kwame, Lala, and Junior. This is depicted in the diagram above, and their relationships are quite literally stored in the graph this way. It’s quite intuitive. The queries are very close to plain language, and similar to how we think about things.
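To make the idea concrete, here is a minimal Python sketch of Kojo’s family as nodes and edges. It is not a real graph database, just plain data structures. The relationship labels (MARRIED_TO, PARENT_OF) and the helper function related are names I made up for illustration.

```python
# Nodes: the people in Kojo's family (names taken from the example above).
nodes = {"Kojo", "Diana", "Kwame", "Lala", "Junior"}

# Edges: (start node, relationship, end node).
# The relationship labels are illustrative assumptions, not a standard.
edges = [
    ("Kojo", "MARRIED_TO", "Diana"),
    ("Kojo", "PARENT_OF", "Kwame"),
    ("Kojo", "PARENT_OF", "Lala"),
    ("Kojo", "PARENT_OF", "Junior"),
    ("Diana", "PARENT_OF", "Kwame"),
    ("Diana", "PARENT_OF", "Lala"),
    ("Diana", "PARENT_OF", "Junior"),
]


def related(start, relationship):
    """Return, sorted by name, every node reached from `start`
    by an edge with the given relationship label."""
    return sorted(
        end for (src, rel, end) in edges
        if src == start and rel == relationship
    )


# "Who are Kojo's children?" becomes: follow PARENT_OF edges from Kojo.
kojo_children = related("Kojo", "PARENT_OF")
print(kojo_children)
```

Notice that the question is answered by walking the connections directly. Nothing has to be looked up by ID.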

 

Now let’s move on to the most commonly used databases – relational databases.

A relational database is a type of database that organises data into tables which can be linked (related) based on data common to each. Let’s use the family tree as an example once again.

How will the relationships we saw earlier be represented? Tables. The more complex your data, the more tables you’ll need to keep everything neat and tidy. Kojo’s familial data is fairly straightforward, so we only need two tables: one for persons, and one for marriages.

In the graph database we only need two attributes, first_name and last_name, but for a relational database we need to introduce three more: person_id, father_id, and mother_id. The person_id is necessary because we use it to identify a person’s relationships with other people and tables. For instance, the person_id for Kojo will be used to fill in the father_id for his children. It will also be used in the marriages table to identify who is married to whom.

We could get rid of the marriages table altogether by adding a new attribute to the persons table, maybe called married_to, that holds the spouse’s person_id. But that is very untidy: not everyone is or will get married, so the column would have a lot of blanks. We don’t need a separate table for parents, on the other hand, because everyone has a mother and a father. This is just one of the considerations you have to make when designing a relational database, and there are a few more best practices you have to follow.

The data will be represented this way:

 

person_id  first_name  last_name  father_id  mother_id
12         Kojo        Antwi      1          2
18         Diana       Ross       7          8
31         Kwame       Antwi      12         18
34         Lala        Antwi      12         18
35         Junior      Antwi      12         18
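For comparison, here is a rough sketch of the same family in a relational database, using Python’s built-in sqlite3 module. Following the description above, the children’s father_id and mother_id point at Kojo’s and Diana’s person_id. The table names and the marriages column names (spouse_a_id, spouse_b_id) are my own assumptions.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE persons (
    person_id  INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT,
    father_id  INTEGER,
    mother_id  INTEGER
);
-- Column names here are assumptions for illustration.
CREATE TABLE marriages (
    spouse_a_id INTEGER,
    spouse_b_id INTEGER
);
""")

conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?, ?)",
    [
        (12, "Kojo", "Antwi", 1, 2),
        (18, "Diana", "Ross", 7, 8),
        (31, "Kwame", "Antwi", 12, 18),
        (34, "Lala", "Antwi", 12, 18),
        (35, "Junior", "Antwi", 12, 18),
    ],
)
conn.execute("INSERT INTO marriages VALUES (?, ?)", (12, 18))

# "Who are Kojo's children?" needs the persons table joined to itself:
# once for the child rows, once to find the father by name.
rows = conn.execute("""
    SELECT child.first_name
    FROM persons AS child
    JOIN persons AS father ON child.father_id = father.person_id
    WHERE father.first_name = 'Kojo'
    ORDER BY child.first_name
""").fetchall()

children = [name for (name,) in rows]
print(children)
```

The answer is the same, but the question has to be rephrased in terms of IDs and joins rather than people and relationships.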

 

So using the family tree as an example, which database is more intuitive? You’ll probably go for the graph, right? So why are graph databases so much less popular? Relational databases had a significant head start, but the technology industry is not exactly resistant to change. To attempt to answer this question I have to go back just a few thousand years.

Humans have been keeping records since forever and a day. As far back as 3000 B.C., when the dominant form of writing was cuneiform (glyphs and pictograms), Mesopotamians were keeping track of commercial and financial records on clay tablets in the form of tables. The Egyptians meticulously tabulated data at their Karnak temple, the ancient Assyrians and Greeks used tables to keep track of their money lending operations, the English used tables to keep track of how much serfs owed their lords, and so on. As time went by, tabular data grew more elaborate and more commonplace, even in ordinary households. The Medici dynasty used tables in the process of revolutionizing what would become modern banking.

So we have quite a rich history with tabular data, and naturally we did not stray from it when computers were invented and databases became necessary. Over 5,000 years of usage has conditioned us into a relational mindset, and it’s not easy to just shrug that off.

This is not to say that we should all drop other databases like hot potatoes and fly to ‘Graphland.’ The grass is greener and the sun shines brighter here (wink), but sometimes using graphs for certain database requirements is overkill and just not necessary. 

For example, if you need a database to simply store user login information, using a graph will not provide any benefits. Graphs shine and provide valuable insights when there are relationships and connections between the data. So you’d need to carefully analyse your use cases and evaluate whether the data you will be storing will be a good fit. 

For instance, take a product like Rendezvous, a system that delivers serendipitous trusted recommendations to people through conversational AI. 

A graph is the backbone of the system, and the thought of using any other kind of database system makes me shudder uncomfortably. The data is highly interrelated and the point of the system is to provide insight into what people want. In this use case, graph databases check all the boxes.

Also, it doesn’t need to be an either/or situation. If you find along the line that graph databases can help you in some way, you don’t need to migrate your entire system. You can integrate graphs into your system to cater to certain use cases and gradually migrate when need be. That way the process is neither tedious nor scary. And even if you don’t have a need for graphs now, the world is changing. Data will probably become the premier resource in the world at some point, and when that happens, the connections within that data and the insights they can provide will be invaluable. So it’s best to get ahead of the curve and familiarize yourself with graphs now, because graphs are the future.