Scrabbl

Google Launches Dataset Search – The New Platform for Scientists, Policy Makers, Data Journalists and Other Work Groups

Google has unveiled a beta version of Dataset Search, a new search engine to help scientists and other professionals find data. A dataset is a collection of discrete items of related data that may be accessed individually or in combination, or managed as a whole entity.

Google was founded with the idea of building the world’s best search engine. From its early days, the company’s goal has been to organize the world’s information, and its first target was the commercial web. Now the technology leader wants to do the same for the scientific community, policy makers, data journalists and other work groups with a new search engine for datasets.

The service, called Dataset Search, has just launched, and it will be a companion of sorts to Google Scholar, the company’s popular search engine for academic studies and reports. Institutions that publish their data online, like universities and governments, will need to include metadata tags in their web pages that describe their data, including who created it, when it was published, how it was collected, and so on. This information will then be indexed by Google’s search engine and combined with information from the Knowledge Graph. So if a dataset was published by NASA, a little information about the organization will also be included in the search results.
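To give a rough idea of what such metadata can look like, here is a minimal Python sketch that builds a dataset description using the schema.org `Dataset` vocabulary, the open markup that Google's developer guidance points publishers to. The helper name `build_dataset_markup` and every value below (dataset name, institute, URL, date) are hypothetical and serve only as illustration.

```python
import json


def build_dataset_markup(name, description, creator, date_published, technique, url):
    """Return a <script> tag holding a schema.org Dataset description as JSON-LD."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # Who created the dataset
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date string)
        "datePublished": date_published,
        # How the data was collected
        "measurementTechnique": technique,
    }
    body = json.dumps(record, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Hypothetical example values, not a real dataset.
markup = build_dataset_markup(
    name="Example ocean temperature readings",
    description="Illustrative record of sea surface temperatures.",
    creator="Example Research Institute",
    date_published="2018-09-05",
    technique="Buoy sensor measurements",
    url="https://example.org/datasets/ocean-temperature",
)
print(markup)
```

A publisher would place the printed `<script>` block in the HTML of the page that describes the dataset, so the data itself stays where it is and only the description is exposed to crawlers.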

In a blog post, Natasha Noy, a research scientist at Google AI who is a member of the Dataset Search team, says the aim is to unify the tens of thousands of different repositories for datasets online. According to Natasha, they want to make that data discoverable, but keep it where it is.

At the moment, dataset publication is extremely fragmented. Different scientific domains have their own preferred repositories, and the same is true of different governments and local authorities.

“Scientists say, ‘I know where I need to go to find my datasets, but that’s not what I always want,’” says Natasha. “Once they step out of their unique community, that’s when it gets hard.”

Natasha cites the example of a climate scientist she spoke to recently, who told her she had been looking for a specific dataset on ocean temperatures for an upcoming study but couldn’t find it anywhere. She didn’t track it down until she ran into a colleague at a conference who recognized the dataset and told her where it was hosted. Only then could she continue with her work. According to Natasha, this wasn’t even a particularly boutique repository. The dataset was well written up in a fairly prominent place, but it was still tough to trace.


The initial release of Dataset Search covers the environmental and social sciences, government data, and datasets from news organizations like ProPublica. However, if the service becomes popular, the amount of data it indexes should grow rapidly as institutions and scientists move to make their information accessible. This should be helped by the recent flourishing of open data initiatives across the globe.


Natasha says the number of repositories has exploded in the last several years. She points to the increasing importance of data in the scientific literature, which means journals ask authors to publish datasets. Government regulations in the US and Europe have also pushed in this direction, as has the general rise of the open data movement.

Having Google involved should help make this project a success, says Jeni Tennison, CEO of the Open Data Institute (ODI). Jeni says dataset search has always been a difficult thing to support, and she is hopeful that Google stepping in will make it easier.

To create a decent search engine, you need to know how to build user-friendly systems and understand what people mean when they type in certain phrases, says Jeni. Google obviously knows what it’s doing in both of these areas.

In fact, says Jeni, ideally Google will publish its own datasets on how Dataset Search gets used. Although the metadata tags the company is using to make datasets visible to its search crawlers are an open standard (meaning that any competitor like Bing or Yandex can also use them and build a competing service), search engines improve most quickly when a critical mass of users is there to provide data on what they’re doing.
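Because the markup is an open standard, any crawler can read it. The sketch below is a simplified illustration, not a description of how Google or any other search engine actually works. It uses Python's standard `html.parser` module to pull schema.org `Dataset` records out of JSON-LD `<script>` blocks. The class `JsonLdExtractor`, the function `find_datasets` and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD <script> block on a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.records.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed markup instead of failing the whole page


def find_datasets(html):
    """Return all top-level JSON-LD records on the page typed as Dataset."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return [
        r for r in parser.records
        if isinstance(r, dict) and r.get("@type") == "Dataset"
    ]


# Hypothetical page content for illustration.
page = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>"""

datasets = find_datasets(page)
print([d["name"] for d in datasets])
```

Reading the tags is the easy part. As Jeni notes, what a competing service would lack is the usage data that tells a search engine which results people actually wanted.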

Simply understanding how people search is vital: what kinds of terms they use and how they express them. According to Jeni, if we want to get to grips with how people search for data and make it more accessible, it would be great if Google opened up its own data on this.

In other words, Google should publish a dataset about dataset search that would itself be indexed by Dataset Search. Could anything be more appropriate?