[Yet Another example of] online dating using graph databases.
When it comes to dating, everybody is highly motivated. So it is no surprise that the nerdy among us put their advanced knowledge to work when seeking out a mate. The most recent celebrated example is Chris McKinlay, who used a statistical modeling approach to find which type of women to go after. The result: after 88 dates, McKinlay found the right woman for him, who, as it turns out, had been hacking her profile in a different way (see “How a Math Genius Hacked OkCupid to Find True Love”).
But interest in applying technology to find love is also highlighting a shift toward graph database technology that is starting to transform applications in a large number of industries. Here is the evidence:
- Several of the largest dating sites in the world have shifted toward graph databases in the last nine months.
- LinkedIn has a large team working on a proprietary graph database, which sits at the center of nearly every operation at LinkedIn.
- Twitter depends on a graph database, and has released FlockDB, a graph database it created, as open source.
- Neo Technology, the creator of Neo4j, the most popular graph database, now counts more than 30 Global 2000 companies among its adopters, including enterprise brands like Wal-Mart, eBay, Lufthansa, and Deutsche Telekom.
- Teradata just released a new type of SQL called SQL-GR, intended to make graph analytics easy for enterprise users.
- According to a report by industry observer DB-Engines, “Graph DBMSs are gaining in popularity faster than any other database category,” growing 300 percent since January of last year.
It seemed appropriate to use Valentine’s Day and online dating as an opportunity to explore why graph databases are increasingly powering the search for love, as well as what the lessons are for other sorts of applications.
It’s the Relationships, Stupid!
Social graphs are becoming more and more crucial to online dating, as dating companies discover how much more accurate their recommendations become when they take network effects into account.
Snap Interactive, the company behind the dating site AYI (Are You Interested?), uses a one-billion-person social graph to significantly improve the likelihood of finding a match. It does this by using the graph to recommend people in one’s extended social network: friends-of-friends and friends-of-friends-of-friends, who statistically speaking are much more likely to go out on a date than complete strangers. In just the last six months, more than half a dozen online dating companies around the world have quietly implemented graph databases to help them bring the power of the network into their decision-making. Key graphs include not just the social graph, but also the passion graph (of shared interests), the location graph, and others.
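To make the idea of an extended social network concrete, here is a minimal Python sketch of a hop-limited traversal. It is not how AYI or any particular graph database implements recommendations; the `FRIENDS` data, the function names, and the three-hop limit are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical toy friendship graph (undirected), stored as adjacency sets.
FRIENDS = {
    "ana": {"ben", "cat"},
    "ben": {"ana", "dan"},
    "cat": {"ana", "eve"},
    "dan": {"ben", "fay"},
    "eve": {"cat"},
    "fay": {"dan", "gus"},
    "gus": {"fay"},
}


def extended_network(graph, start, max_depth=3):
    """Return {person: hops} for everyone within max_depth hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_depth:
            continue  # do not expand past the hop limit
        for friend in graph.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not part of the result
    return distance


def candidates(graph, start, max_depth=3):
    """People 2..max_depth hops away: in the extended network, not direct friends."""
    network = extended_network(graph, start, max_depth)
    return sorted(person for person, hops in network.items() if hops >= 2)


print(candidates(FRIENDS, "ana"))  # ['dan', 'eve', 'fay']
```

In this toy data, "gus" is four hops from "ana" and is therefore left out, while direct friends are excluded because they are already known to the user.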
Glassdoor, which is for careers and jobs what Yelp is for food, accomplishes much the same thing, but with companies, jobs, and job seekers, also with a graph of nearly a billion people, consisting of its users and their friends. Both Snap and Glassdoor report they have significantly improved the accuracy of their recommendations by using a graph to navigate their connected data. By finding and making better use of networks, many different types of companies are breaking new ground with respect to intelligent real-time analytics. In his session at Strata, Emil Eifrem, CEO of Neo Technology, reported that many people, once they learn what a graph database can do, start seeing graphs absolutely everywhere.
Treating relationships, sometimes called the edges of a graph, as first-class objects is the fundamental innovation of graph databases. The database doesn’t just store information about individual things; it also stores the relationships between those things. This capability makes it much easier to express sophisticated questions and get answers in a small fraction of the time it takes a traditional database. The relationships in the database can express the nature of each connection (parent, child, owns, friend) and capture any number of qualitative or quantitative facts about that relationship (weighting, start and end date, etc.).
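As a rough illustration of relationships stored as first-class objects, the Python sketch below models a tiny property graph in memory. The class names, the relationship kinds (`FRIEND`, `OWNS`), and the sample properties are hypothetical, and the linear scan in `outgoing` stands in for the indexed adjacency a real graph database would use.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: str  # name of the node the relationship starts from
    kind: str   # nature of the connection, e.g. "FRIEND" or "OWNS"
    end: str    # name of the node the relationship points to
    properties: dict = field(default_factory=dict)  # facts about the connection


class PropertyGraph:
    """Toy in-memory property graph; illustrative only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, name, **properties):
        self.nodes[name] = Node(name, properties)

    def relate(self, start, kind, end, **properties):
        rel = Relationship(start, kind, end, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, start, kind=None):
        """Relationships leaving `start`, optionally filtered by kind."""
        return [
            rel for rel in self.relationships
            if rel.start == start and (kind is None or rel.kind == kind)
        ]


graph = PropertyGraph()
graph.add_node("alice", city="Boston")
graph.add_node("bob", city="Boston")
graph.add_node("sailboat", category="boat")

# The relationship itself carries data: when it started and how strong it is.
graph.relate("alice", "FRIEND", "bob", since=2011, weight=0.8)
graph.relate("alice", "OWNS", "sailboat", since=2013)

for rel in graph.outgoing("alice"):
    print(rel.kind, rel.end, rel.properties)
```

The point of the design is that a question such as "who has Alice been friends with since before 2012?" can be answered by reading relationship properties directly, with no join table in between.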
Because of this, you can write queries that express constraints like:
- Find all men who are connected within three friends of my women friends who like sailing but not bowling and who live within 30 miles of my zip code.
- Find all women who don’t know any of my friends within two levels, but enjoy spending time in some of the same places that I do.
- Find all men in my friends-of-friends network who enjoy the most activities similar to me.
Queries like the ones described above can take pages of SQL and execute slowly on relational databases. A graph database can return results in a snap, breaking existing SQL speed limits, often with just a few lines of code.
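A simplified version of the third query above (ranking a friends-of-friends network by shared activities, without the gender filter) can be sketched in a few lines of Python over in-memory sets. This is only an illustration of the traversal logic, not a graph database query and not a benchmark; the `FRIENDS` and `LIKES` data and both function names are assumptions.

```python
# Hypothetical sample data: who is friends with whom, and who likes what.
FRIENDS = {
    "me": {"ann", "raj"},
    "ann": {"me", "tom", "lee"},
    "raj": {"me", "lee", "sam"},
    "tom": {"ann"},
    "lee": {"ann", "raj"},
    "sam": {"raj"},
}
LIKES = {
    "me": {"sailing", "hiking", "jazz"},
    "tom": {"sailing", "bowling"},
    "lee": {"sailing", "hiking", "jazz"},
    "sam": {"chess"},
}


def friends_of_friends(friends, me):
    """People exactly two hops away: friends of my friends I don't already know."""
    direct = friends.get(me, set())
    reachable = set()
    for friend in direct:
        reachable |= friends.get(friend, set())
    return reachable - direct - {me}


def rank_by_shared_interests(friends, likes, me):
    """Return (person, shared_count) pairs, most shared activities first."""
    mine = likes.get(me, set())
    scored = []
    for person in friends_of_friends(friends, me):
        shared = len(mine & likes.get(person, set()))
        if shared > 0:  # skip people with nothing in common
            scored.append((person, shared))
    # Sort by descending score, then by name so the order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(rank_by_shared_interests(FRIENDS, LIKES, "me"))  # [('lee', 3), ('tom', 1)]
```

A graph database expresses the same idea declaratively as a pattern over nodes and relationships, which is where the brevity described above comes from.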
Signs You May Need A Graph Database
Eifrem, who has always been a massive booster of graph databases, was surprised how quickly companies have been finding new uses for graph databases in the last few years.
“When we started out, we thought that we would find acceptance in three key vertical markets: internet services and independent software vendors, financial services, and telecom companies,” said Eifrem, who is also the co-author of the O’Reilly book Graph Databases. “But it has turned out that we have found a home for our graph database in dozens of industries, across an even greater variety of use cases.”
Those use cases range from recommendations and real-time analytics, to fraud detection, impact analysis, identity & access management, portfolio management, resource optimization, product line management, and others.
In addition, the related field of graph analytics is also growing. In this model, data is stored in many repositories, not just in a graph database, and is brought together into a graph analytics engine for a particular analysis. Loosely speaking, graph databases are like OLTP databases and graph analytics engines are like OLAP systems. When looking at a graph-based technology, the first question to ask is: Is it a database or an analytics engine? You can find out more about the popularity of the graph database space today at industry observer DB-Engines, an organization that ranks databases by popularity. Right now, Neo4j dominates the space, but there are many entrants with powerful support, such as the Apache Giraph project, which is based on Hadoop.
Eifrem said that the rise in acceptance of graph databases is based on several factors:
- The world is connected. The value in computing these days is no longer about automating business processes (this was the dominant use case when the relational database was born). Today’s problems center around understanding the real world in all its connected and dynamic glory. The world is a graph: might as well embrace it.
- Change happens fast. The world moves a lot faster than it did 20 years ago. Back then, it didn’t matter if it took months and years to design and write systems. Today the timescale is months. Putting connected data into a relational database is hard. And getting it out is even harder. By putting your graph data into a graph database, the modeling time and development time are both drastically reduced, and it’s much easier to change the model once the system has been built.
- The need for speed. The best decisions are the ones made with the very latest information. Graph databases are brilliant at answering very complex and valuable questions in real time. This has been a holy grail of sorts in the analytics space, commonly referred to as real-time analytics. Older database systems often grind to a halt when trying to answer these kinds of questions. For certain types of graph-friendly questions, if you ask the question today, you might get the answer tomorrow. By then the customer has left your web site. The workaround has been to pre-calculate all of your recommendations at night and serve them up during the day. This sounds great until you realize something really important happened between last night and this moment that affects how you want to treat that person.
Eifrem said that any of the following problems may indicate that you should consider a graph database:
- Performance problems with your relational database due to the complexity of your queries or data structures. According to Eifrem, “our customers have often reported a performance increase of 1000x or more over Oracle and MySQL for certain queries.”
- Projects taking a very long time. “This can be a side-effect of trying to make graphy data fit into tables: you end up with very long and complicated queries and code. Because it’s convoluted and hard to understand, it takes a long time to write and test, not to mention tune. This happened to me when I was CTO of a startup back in 2000, and is what led me to design the property graph model (literally on a napkin),” Eifrem said.
- If you feel like your business is being trapped by your data, and there are questions you want to be able to ask that just can’t be answered, then you might want to see if a graph database can unlock your hidden graph.
“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code,” said Volker Pacher, Senior Developer at eBay, who has been using Neo4j for the last year. “At the same time, Neo4j allowed us to add functionality that was previously not possible.”
Graph databases seem to be well on their way to following the trajectory of online dating, which started as an oddity, then became increasingly successful, and is now accepted as a great way for many people to find a mate. If you seem to have a graphy problem, you may want to set up a date between your data and a graph database sometime soon.
Dan Woods is CTO and editor of CITO Research, a publication where early adopters find technology that matters. For more stories like this one visit www.CITOResearch.com. Dan has done research for Teradata.