Our work on crawling the Deep Web has received some attention over the last few days. It started with a post on Google's Webmaster blog. Judging by the number of in-links to the blog (see the bottom of the page) and the several news articles that picked it up, there were quite a few reactions in the blogosphere and beyond.
Matt Cutts, Google's main interface to webmasters, gives a nice explanation of why this work is useful to site owners. Anand Rajaraman details some of the history behind the technology that led to this work.
In summary, a nice example of research on data management having an impact on the Web.
Monday, April 14, 2008
1 comment:
Daneshpajouh et al. have developed a new web-community-based method for extracting and identifying seed sets for crawling. If you are interested, you can find more information in their paper:
"A Fast Community Based Algorithm for Generating Crawler Seeds Set," WEBIST 2008, Funchal, Portugal, May 2008.