Google X’s next moonshot is to conquer the human genome

source:http://www.wired.co.uk/news/archive/2014-07/25/google-x-baseline

The latest moonshot project from the secretive Google X laboratory revolves around the collection of medical data. The mysterious division’s experiment is called the Baseline Study, according to the Wall Street Journal, and involves the design of an ambitious database in which genetic information will be mapped so that the company can draw up a picture of what the healthiest possible human body would look like.

The first stage of the project will involve harvesting genetic and molecular data from 175 people in order to create the database, which will have much larger and broader datasets than any other study of its kind. The hope is that it will become a tool to help physicians detect and treat major health issues.

As such, the study won’t place any emphasis on tackling specific diseases, but will collect many hundreds of samples using all manner of diagnostic tools. Once the data is accumulated, Google will be able to scan through it and let its computers discover patterns that will serve as biomarkers for disease discovery.

Rather than focussing on finding cures for various diseases, the project will work entirely on developing preventative medicines, techniques and technologies, including diagnostic tools that are able to work better at an earlier stage. It is conceivable, for example, that one biomarker could be associated with an inability to break down fatty foods or an ability to prevent heart disease.

Andrew Conrad, who works for Google’s research team, told the WSJ that people shouldn’t expect immediate cures to complex diseases, but that he hopes advances will be made in “little increments”. Ultimately the project is a huge gamble, and there is no guarantee that it will result in the researchers discovering biomarkers that tell them anything major, or even anything at all.

Information that is collected will include participants’ full genomes and entire genetic history. Google has promised that all data collected as part of the research will remain private and will not be handed over to insurance companies. The medical school boards at Duke and Stanford Universities, which are involved in the project, will oversee the research, recruiting the volunteers and ensuring that their data is anonymised before it is handed over to Google.

 

[daily graph news] Google’s open source graph database

http://www.eweek.com/database/google-releases-cayley-open-source-graph-database.html

Google Releases Cayley Open-Source Graph Database

Cayley will be used to help Google continue to refine the idea of linking data together in graph databases, including Google’s Knowledge Graph.

Google has been using, improving and boosting its Knowledge Graph search services for several years to show users how information can be linked together in graphics form to help find desired results. Now it is again pushing forward in the graph database world through the open-source release of Cayley, which will be used in the continuing development of graph databases.

 

The availability of Cayley was announced by Google software engineer Barak Michener in a June 25 post on the Google Open Source Blog. “Four years ago this July, Google acquired Metaweb, bringing Freebase and linked open data to Google,” he wrote. “It’s been astounding to watch the growth of the Knowledge Graph and how it has improved Google search to delight users every day.”

 

Since then, the concepts of Freebase and its linked data have spread through Google’s worldwide offices, wrote Michener. “I began to wonder how the concepts would advance if developers everywhere could work with similar tools. However, there wasn’t a graph available that was fast, free, and easy to get started working with. With the Freebase data already public and universally accessible, it was time to make it useful, and that meant writing some code as a side project.”

 

Google is making that happen now with the release of Cayley, an open-source graph database that is being called a “spiritual successor” to graphd, wrote Michener. Cayley “shares a similar query strategy for speed” with graphd, while adding its own unique features, including RESTful API, multiple (modular) back-end stores such as LevelDB and MongoDB, multiple (modular) query languages and ease-of-use features that make it convenient to work with for developers, he wrote.

 

 

“Cayley is written in Go, which was a natural choice,” he added. “As a backend service that depends upon speed and concurrent access, Go seemed like a good fit. Go did not disappoint; with a fantastic standard library and easy access to open source libraries from the community, the necessary building blocks were already there. Combined with Go’s effective concurrency patterns compared to C, creating a performance-competitive successor to graphd became a reality.”

 

To illustrate some of the uses of Cayley, Google developers created a YouTube video that describes the building of a small knowledge graph using the application. “The video includes a quick introduction to graph stores as well as an example of processing Freebase and Schema.org linked data,” he wrote.

 

Interested developers can also check out a demo dataset in a live instance running on Google App Engine to see how it works. “It’s running with the sample dataset in the repository — 30,000 movies and their actors, roles, and directors using Freebase film schema,” wrote Michener. “For a more-than-trivial query, try running the following code, both as a query and as a visualization; what you’ll see is the neighborhood of the given actor and how the actors who co-star with that actor interact with each other.”

 

The open-source project is hosted on GitHub.

 

Graph search, an open-source database project built on all the networking we do online every day, is the most far-reaching search IT to go mainstream since Google started storing up and ranking Websites more than a decade ago, according to an earlier eWEEK report. Basically, a graph search database anonymously uses all the contacts in all the networks in which you work to help you find information. Anything you touch, any service you use and anything people in your networks touch eventually can help speed information back to you. It avoids anything non-relevant that would slow down the search.

 

Google is a large user and producer of open-source software.

 

In December 2013, Google joined the Open Invention Network (OIN), which was created in 2005 as an intellectual-property company that works to promote, protect and openly share Linux patents among its members and the open-source community. The OIN is a consortium of open-source user companies. The other members of the OIN are IBM, NEC, Philips, Red Hat, Sony and SUSE, a business unit of Novell. Canonical and TomTom are associate members of the group. Google had previously been involved with the OIN since 2007 as an “end-user licensee,” according to the OIN.

 

Facebook has also been experimenting with real-time graph search, which enables users to quickly find content they have touched at some point in their Facebook lifetimes, according to an eWEEK report in January 2013. Queries written in the blue bar across the top of the Graph Search page can fetch photos, videos, links, documents—anything the user has touched or shared, or had shared with—on Facebook from the first day the user joined the social network.


[daily graph news] 50 Shades of Graph: How Graph Databases Are Transforming Online Dating [from Forbes]

[Yet Another example of] online dating using graph databases.

=============================================================

When it comes to dating, everybody is highly motivated. So it is no surprise that the nerdy among us put their advanced knowledge to work when seeking out a mate. The most recent celebrated example is Chris McKinlay, who used a statistical modeling approach to find which type of women to go after. The result: after 88 dates, McKinlay found the right woman for him, who, as it turns out, had been hacking her profile in a different way (see “How a Math Genius Hacked OkCupid to Find True Love”).

But interest in applying technology to find love is also highlighting a shift toward graph database technology that is starting to transform applications in a large number of industries. Here is the evidence:

 
  • Several of the largest dating sites in the world have shifted toward graph databases in the last nine months.
  • LinkedIn has a large team working on a proprietary graph database, which sits at the center of nearly every operation at LinkedIn.
  • Twitter depends on a graph database, and has released FlockDB, a graph database it created, as open source.
  • Neo Technology, the creator of Neo4j, the most popular graph database, now has more than 30 Global 2000 companies using its technology, including enterprise brands like Wal-Mart, eBay, Lufthansa, and Deutsche Telekom.
  • Teradata just released a new type of SQL called SQL-GR, intended to make graph analytics easy for enterprise users.
  • According to a report by industry observer DB-Engines, “Graph DBMSs are gaining in popularity faster than any other database category,” growing 300 percent since January of last year.

It seemed appropriate to use Valentine’s Day and online dating as an opportunity to explore why graph databases are increasingly powering the search for love, as well as what the lessons are for other sorts of applications.

It’s the Relationships, Stupid!

Social Graphs are becoming more and more crucial to online dating, as dating companies discover how much more accurate their recommendations become when considering the network effects.

Snap Interactive, the company behind the dating site AYI (Are You Interested?), uses a one-billion-person social graph to significantly improve the likelihood of finding a match. It does this by using the graph to recommend people in one’s extended social network: friends-of-friends and friends-of-friends-of-friends, who statistically speaking are much more likely to go out on a date than complete strangers. In just the last six months, more than half a dozen online dating companies around the world have quietly implemented graph databases to help them bring the power of the network into their decision-making. Key graphs include not just the social graph, but also the passion graph (of shared interests), location graph, and others.

Glassdoor, which is for careers and jobs what Yelp is for food, accomplishes much the same thing, but with companies, jobs, and job seekers, also with a graph of nearly a billion people, consisting of its users and their friends. Both Snap and Glassdoor report they have significantly improved the accuracy of their recommendations by using a graph to navigate their connected data. By finding and making better use of networks, many different types of companies are breaking new ground with respect to intelligent real-time analytics. In his session at Strata, Eifrem, CEO of Neo Technology, reported that many people, once they learn what a graph database can do, start seeing graphs absolutely everywhere.

Treating relationships, sometimes called the edges of a graph, as first-class objects is the fundamental innovation of graph databases. The database doesn’t store just information about individual things; it also stores the relationships between those things. This capability makes it much easier to express sophisticated questions and get answers in a small fraction of the time it takes a traditional database. The relationships in the database can express the nature of each connection (parent, child, owns, friend) and capture any number of qualitative or quantitative facts about that relationship (weighting, start and end date, etc.).
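To make the idea of relationships as first-class objects concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j or any other product stores data; the class names (`Relationship`, `PropertyGraph`), the node names and the property values are all invented for illustration. The point is only that each edge is its own record with a type and arbitrary properties, rather than a foreign key buried in a row.

```python
from dataclasses import dataclass, field


@dataclass
class Relationship:
    """An edge stored as its own record, with a type and arbitrary properties."""
    start: str
    end: str
    kind: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}          # node id -> properties of that node
        self.relationships = []  # every edge is a Relationship object

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, kind, end, **properties):
        rel = Relationship(start, end, kind, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, kind=None):
        """Relationships leaving node_id, optionally filtered by type."""
        return [r for r in self.relationships
                if r.start == node_id and (kind is None or r.kind == kind)]


# Invented example data: the edge itself records "since" and "weight".
graph = PropertyGraph()
graph.add_node("alice", age=34)
graph.add_node("bob", age=36)
graph.add_node("boat", name="Sea Breeze")
graph.relate("alice", "friend", "bob", since=2009, weight=0.8)
graph.relate("alice", "owns", "boat", start_date="2012-05-01")

friends = graph.outgoing("alice", kind="friend")
```

Because the edge is an object, a question such as "who has Alice been friends with since before 2010?" becomes a filter on `friends` rather than a join across tables.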

Because of this, you can write queries that express constraints like:

  • Find all men who are connected within three friends of my women friends who like sailing but not bowling and who live within 30 miles of my zip code.
  • Find all women who don’t know any of my friends within two levels, but enjoy spending time in some of the same places that I do.
  • Find all men in my friends-of-friends network who enjoy the most activities similar to me.

Queries like the ones described above can take pages of SQL and execute slowly on relational databases. A graph database can return results in a snap, breaking existing SQL speed limits, often with just a few lines of code.
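As a rough illustration of the first kind of query, the sketch below finds everyone within three friendship hops who likes sailing but not bowling. It is plain Python over invented data, not a graph database query language, and it leaves out the gender and distance constraints; `FRIENDS`, `LIKES`, `within_hops` and `matches` are hypothetical names. A graph database would run this sort of bounded traversal natively.

```python
from collections import deque

# Hypothetical data: undirected friendships and each person's interests.
FRIENDS = {
    "me": {"ann", "raj"},
    "ann": {"me", "tom"},
    "raj": {"me", "lee"},
    "tom": {"ann", "sam"},
    "lee": {"raj"},
    "sam": {"tom", "kim"},
    "kim": {"sam"},
}
LIKES = {
    "ann": {"sailing"},
    "raj": {"bowling"},
    "tom": {"sailing", "bowling"},
    "lee": {"sailing", "hiking"},
    "sam": {"sailing"},
    "kim": {"sailing"},
}


def within_hops(start, max_hops):
    """Breadth-first search: map each person within max_hops to their distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand past the hop limit
        for friend in FRIENDS.get(person, ()):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)
    del distance[start]  # the starting person is not a candidate
    return distance


def matches(start, max_hops, must_like, must_not_like):
    """People within max_hops of start who like one thing but not another."""
    return sorted(
        person
        for person in within_hops(start, max_hops)
        if must_like in LIKES.get(person, set())
        and must_not_like not in LIKES.get(person, set())
    )


result = matches("me", 3, "sailing", "bowling")
```

With this data, `kim` likes sailing but is four hops away, so she falls outside the three-hop limit, and `tom` is within range but is excluded because he also likes bowling.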

Signs You May Need A Graph Database

Eifrem, who has always been a massive booster of graph databases, was surprised how quickly companies have been finding new uses for graph databases in the last few years.

“When we started out, we thought that we would find acceptance in three key vertical markets: internet services and independent software vendors, financial services, and telecom companies,” said Eifrem, who is also the co-author of the O’Reilly book Graph Databases. “But it has turned out that we have found a home for our graph database in dozens of industries, across an even greater variety of use cases.”

Those use cases range from recommendations and real-time analytics, to fraud detection, impact analysis, identity & access management, portfolio management, resource optimization, product line management, and others.

In addition, the related field of graph analytics is also growing. In this model, data is stored in many repositories, not just in a graph database, and is brought together into a graph analytics engine for a particular analysis. Loosely speaking, graph databases are like OLTP databases and graph analytics engines are like OLAP systems. When looking at a graph-based technology, the first question to ask is: is it a database or an analytics engine? You can find out more about the popularity of the graph database space today at industry observer DB-Engines, an organization that ranks databases by popularity. Right now, Neo4j dominates the space, but there are many entrants that have powerful support, such as the Apache Giraph project, which is based on Hadoop.

Eifrem said that the rise in acceptance of graph databases is based on several factors:

  • The world is connected. The value in computing these days is no longer about automating business processes (this was the dominant use case when the relational database was born). Today’s problems center around understanding the real world in all its connected and dynamic glory. The world is a graph: might as well embrace it.
  • Change happens fast. The world moves a lot faster than it did 20 years ago. Back then, it didn’t matter if it took months and years to design and write systems. Today the timescale is months. Putting connected data into a relational database is hard. And getting it out is even harder. By putting your graph data into a graph database, the modeling time and development time are both drastically reduced, and it’s much easier to change the model once the system has been built.
  • The need for speed. The best decisions are the ones made with the very latest information. Graph databases are brilliant at answering very complex and valuable questions in real time. This has been a holy grail of sorts in the analytics space, commonly referred to as real-time analytics. Older database systems often grind to a halt when trying to answer these kinds of questions. For certain types of graph-friendly questions, if you ask the question today, you might get the answer tomorrow. By then the customer has left your web site. The workaround has been to pre-calculate all of your recommendations at night and serve them up during the day. This sounds great until you realize something really important happened between last night and this moment that affects how you want to treat that person.
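The last point, that recommendations pre-calculated overnight can go stale, can be shown with a small Python sketch. Everything here is invented for illustration (the `friends` data, `add_friendship`, `friends_of_friends`, `nightly_cache`); it is a toy model of the trade-off, not a description of any vendor's system.

```python
# Hypothetical friendship graph; names and data are invented for illustration.
friends = {
    "ana": {"ben"},
    "ben": {"ana", "cal"},
    "cal": {"ben"},
    "dee": set(),
}


def add_friendship(a, b):
    """Record an undirected friendship between a and b."""
    friends[a].add(b)
    friends[b].add(a)


def friends_of_friends(person):
    """Live traversal: people two hops away who are not already friends."""
    direct = friends[person]
    two_hops = set()
    for friend in direct:
        two_hops |= friends[friend]
    return two_hops - direct - {person}


# "Nightly batch": compute every recommendation once and store the result.
nightly_cache = {person: friends_of_friends(person) for person in friends}

# During the day, something changes: ben befriends dee.
add_friendship("ben", "dee")

stale = nightly_cache["ana"]       # still reflects last night's graph
live = friends_of_friends("ana")   # reflects the graph as it is now
```

Here `stale` still recommends only `cal`, while `live` also includes `dee`, the connection that appeared after the batch ran.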

Eifrem said that any of the following problems may indicate that you should consider a graph database:

  • Performance problems with your relational database due to the complexity of your queries or data structures. According to Eifrem, “our customers have often reported a performance increase of 1000x or more over Oracle and MySQL for certain queries.”
  • Projects taking a very long time. “This can be a side-effect of trying to make graphy data fit into tables: you end up with very long and complicated queries and code. Because it’s convoluted and hard to understand, it takes a long time to write and test, not to mention tune. This happened to me when I was CTO of a startup back in 2000, and is what led me to design the property graph model (literally on a napkin).”
  • If you feel like your business is being trapped by your data, and there are questions you want to be able to ask that just can’t be answered, then you might want to see if a graph database can unlock your hidden graph.

“Our Neo4j solution is literally thousands of times faster than the prior MySQL solution, with queries that require 10-100 times less code,” said Volker Pacher, Senior Developer at eBay, who has been using Neo4j for the last year. “At the same time, Neo4j allowed us to add functionality that was previously not possible.”

Graph databases seem to be well on their way to following the trajectory of on-line dating, which started as an oddity, then became increasingly successful, and now is accepted as a great way for many people to find a mate. If you seem to have a graphy problem, you may want to set up a date between your data and a graph database sometime soon.


Dan Woods is CTO and editor of CITO Research, a publication where early adopters find technology that matters. For more stories like this one visit www.CITOResearch.com. Dan has done research for Teradata.

[daily graph news] With Google Glass, Demand for Graph Databases Increases

I ran into the following post on the Neo4j blog. Now think about a world of wearable devices and computers that communicate with each other, providing data streams of various types of information in highly dynamic networks of moving objects.

TechCrunch in an article on Apigee and predictive analytics technology takes note of the increasing demand for graph databases.

…there is the increasing amounts of data that people and machines create. With that scaling in data, there is a growing demand for new types of analytics capabilities. Graph databases are becoming more popular for the varied amounts of data they aggregate and analyze. These graph databases organize nodes, which might be things like a street light or people. The properties of a graph database describe the nodes. A graph database also has “edges” that connect the nodes and properties, defining the relationship between them. The value is derived when analyzing the patterns between the nodes and the properties.

As sensors become more widely used in wearables such as Google Glass, the demand for graph databases will increase. It will be important to correlate the data from any number of sensors that might be in a house, a car or city street. There will also be the need to analyze increasing amounts of text from medical records, contracts, etc.

http://techcrunch.com/2014/01/08/apigee-acquires-insightsone-to-deepen-api-insights-with-predictive-analytics/

Breaking down the walls

When I chat with researchers, I always hear about “database guys”, “data mining guys” or “system guys”. While tagging someone as an “X guy” helps you know what a researcher does and which areas, papers and conferences he may be active in, it also reduces the chances of collaboration and inspiration. We need channels that encourage multidisciplinary work. Refuse closed-world research. Stop building a high wall of jargon, and stop posting a guard at the door who only lets in those who speak the same language. One change that comes to mind is to start a crowd-sourcing platform that connects researchers from different areas, so they can argue the same issue from different perspectives. I believe this would lead to better solutions and more effective collaboration on many real-world problems. If Ask.com or Stack Overflow/MathOverflow works, there is no good reason or excuse for such a platform to fail for professional researchers.

An article by Phil Bernstein suggests changes that would bring the major database conferences closer to the systems conferences:
Systems & Databases: Let’s Break Down the Walls
Making such connections can always bring surprises, in a good way.