Olivier Elemento’s weblog

Olivier’s science weblog

Mining the Deep Web February 23, 2009

Filed under: Uncategorized — oelemento @ 8:49 pm

There’s a pretty interesting article in the NY Times about the Deep Web, that is, the data stored in databases and available through web interfaces. The article mentions some of the strategies that scientists (and web search companies) use to mine the Deep Web. These strategies involve making a few initial queries to guess the type and structure of the data contained in a given database, then either building a model of the data or making many more targeted queries to map out the content of the database. This type of research is particularly interesting for us biologists, since we use many such databases (PubMed, genome browsers, gene expression databases, etc.), and these databases are not connected to each other at all. Clearly, tools that automatically query and integrate data from the Deep Web would be very useful for us.

http://www.nytimes.com/2009/02/23/technology/internet/23search.html
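To make the "probe first, then query in a targeted way" idea concrete, here is a minimal Python sketch. It is not the method described in the article, just one simple way such a crawler could work. Everything in it is hypothetical: `RECORDS` is a toy in-memory stand-in for a database behind a web form, `search` mimics a query interface that returns only a couple of hits per query, and `map_database` starts from a seed term and reuses words found in the results as new queries.

```python
from collections import deque

# Hypothetical stand-in for a database that is only reachable through a search form.
RECORDS = {
    1: "p53 tumor suppressor expression in liver",
    2: "liver gene expression atlas",
    3: "atlas of yeast transcription factors",
    4: "yeast p53 homolog screen",
    5: "zebrafish fin regeneration",
}


def search(term, limit=2):
    """Toy query interface: return at most `limit` (id, text) hits containing `term` as a word."""
    hits = [(rid, text) for rid, text in sorted(RECORDS.items()) if term in text.split()]
    return hits[:limit]


def map_database(seed_terms, max_queries=50):
    """Probe with seed terms, then use words from the results as further queries."""
    queue = deque(seed_terms)
    asked = set()   # terms already sent to the interface
    found = {}      # record id -> text discovered so far
    while queue and len(asked) < max_queries:
        term = queue.popleft()
        if term in asked:
            continue
        asked.add(term)
        for rid, text in search(term):
            if rid not in found:
                found[rid] = text
                # Each new record suggests new terms to try.
                queue.extend(w for w in text.split() if w not in asked)
    return found, asked


found, asked = map_database(["p53"])
print(sorted(found), len(asked))
```

Starting from the single seed term "p53", this toy crawler reaches four of the five records. The fifth shares no words with the others, so it is never discovered, which illustrates why the choice of initial queries matters for this kind of approach.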

 
