Google recently launched a new search engine built specifically with data scientists in mind, to help them find data more easily and to streamline the data-gathering process.
The new tool is called “Dataset Search” and it enables easy access to millions of datasets, stored in thousands of different data repositories across the Internet.
Dataset Search allows you to find datasets wherever they’re hosted, whether in a digital library, on a publisher’s website or on a personal webpage. This includes data published by governments across the world.
The new tool is free for everyone to use, but Google highlights data scientists and journalists in particular, whose jobs usually require them to handle large numbers of datasets.
Before launching the new search engine, Google published guidelines for dataset providers. The guidelines require providers to pay more attention to how they describe their data, so that Google can understand the content of the webpage.
Just as SEO best practices pushed publishers to create relevant content for their audience and to publish it in a way that Google’s search engine can rank, dataset providers will have to comply with the new guidelines. This matters all the more because the new search engine appears to rely mostly on the information those guidelines ask for.
“These guidelines include salient information about datasets: who created the dataset, when it was published, how the data was collected, what the terms are for using the data, etc. We then collect and link this information, analyze where different versions of the same dataset might be, and find publications that may be describing or discussing the dataset”, Google notes in its public launch announcement.
Google’s approach is primarily based on an open standard for describing this information (schema.org), which anyone who publishes data can use. Dataset providers, regardless of their size, are encouraged to adopt this common standard, so that as many datasets as possible become part of the new, rapidly expanding ecosystem.
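To make this more concrete, here is a minimal sketch of how a provider might generate a schema.org description for one dataset. It is an illustration rather than Google’s own tooling: the helper name `build_dataset_jsonld`, the sample values and the example.org URL are all hypothetical, and the chosen properties simply mirror the information mentioned in the announcement (creator, publication date, collection method, terms of use).

```python
import json


def build_dataset_jsonld(name, description, creator, date_published,
                         license_url, url, measurement_technique=None):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration; the keys are schema.org
    property names, the function itself is not part of any library.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # Who created the dataset (assumed here to be an organization)
        "creator": {"@type": "Organization", "name": creator},
        # When it was published (ISO 8601 date)
        "datePublished": date_published,
        # What the terms are for using the data
        "license": license_url,
        "url": url,
    }
    # How the data was collected (optional)
    if measurement_technique:
        record["measurementTechnique"] = measurement_technique
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    # All values below are made-up sample data.
    markup = build_dataset_jsonld(
        name="Example daily cryptocurrency prices",
        description="Hypothetical daily closing prices for a set of cryptocurrencies.",
        creator="Example Data Lab",
        date_published="2018-09-05",
        license_url="https://creativecommons.org/licenses/by/4.0/",
        url="https://example.org/datasets/crypto-prices",
        measurement_technique="Aggregated from public exchange feeds",
    )
    # JSON-LD is commonly embedded in the dataset's webpage inside a script tag.
    print('<script type="application/ld+json">')
    print(markup)
    print("</script>")
```

The printed block would go into the HTML of the page that describes the dataset, where a crawler can read it alongside the human-readable content.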
You can simply type in what you are looking for, and you’ll be directed to the published dataset on the repository provider’s site. For example, if we want to analyse the cryptocurrency market, here’s what Dataset Search provides us with.
We can see data such as historical prices, financial data, charts and top cryptocurrencies. However, datasets are only as accurate as their providers make them. Google does not interfere with the actual datasets; it just makes them easier to find. This means that, as with any other web search, a dataset listed in the search results may not be complete or up to date.
“This launch is one of a series of initiatives to bring datasets more prominently into our products. We recently made it easier to discover tabular data in Search, which uses this same metadata along with the linked tabular data to provide answers to queries directly in search results. While that initiative focused more on news organizations and data journalists, Dataset search can be useful to a much broader audience, whether you’re looking for scientific data, government data, or data provided by news organizations.”
Although it is a new tool, Dataset Search already works in multiple languages, and Google has announced that support for additional languages is coming soon.