Monday, June 29, 2009

Fusion Tables: The third piece of the puzzle

When I joined Google in 2005, the goal of my group was to explore the different aspects of structured data and the Web. The first and most burning need was to address the deep web, the collection of databases stored behind forms and invisible to search engines. We developed a completely automated system that has crawled millions of forms in over 50 languages and hundreds of domains. The system surfaces pages from the deep web by guessing good queries that can be posed on the forms, and inserting the resulting HTML pages into the Google index. These pages are shown in the top-10 results for over 1000 queries per second. For all the details, see the VLDB 2008 paper by Madhavan et al.
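To give a feel for what "guessing queries" means, here is a toy sketch in Python. It is not our system, which selects inputs and values far more carefully (the paper has the real story). The form URL, the input names and the candidate values below are all hypothetical, and the sketch only handles forms submitted via GET.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_urls(action_url, candidate_values):
    """Enumerate GET URLs for every combination of candidate input values.

    action_url: the form's submission URL (hypothetical here).
    candidate_values: dict mapping each form input name to a list of guesses.
    """
    names = sorted(candidate_values)  # fixed order keeps the URLs deterministic
    urls = []
    for combo in product(*(candidate_values[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls


# Hypothetical form with two inputs; the values are made-up guesses.
urls = candidate_urls(
    "http://example.com/search",
    {"make": ["honda", "ford"], "zip": ["94043"]},
)
for url in urls:
    print(url)
```

Each generated URL stands for one form submission whose result page could then be fetched and considered for indexing. The naive cross product grows very quickly with the number of inputs, which is exactly why choosing good queries is the hard part.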

In a second project, we explored the collection of tables that are already on the surface web. We found over 150 million high-quality tables and developed a search engine for tables (see the VLDB 2008 paper by Cafarella et al. for the details). We also showed how to leverage 2.5 million table schemas that were part of this collection. This collection is now available to the research community.

On June 9th, we launched Fusion Tables, which represents the third piece of the puzzle of structured data and the Web. The main goal of Fusion Tables is to make it easier for people to create, manage and share structured data on the Web. Fusion Tables is a new kind of data management system that focuses on features that enable collaboration. We started with a relatively small set of features, but we’re rapidly expanding them, keeping our users’ requests as our top priority.

You can read the official announcement of Fusion Tables and see a great example of how it is used for data collection in the domain of water. In a nutshell, Fusion Tables enables you to upload tabular data (up to 100MB per table) from spreadsheets and CSV files. You can filter and aggregate the data and visualize it in several ways, such as maps and timelines. The system will try to recognize columns that represent geographical locations and suggest appropriate visualizations.

To collaborate, you can share a table with a select set of collaborators or make it public. One of the reasons to collaborate is to enable fusing data from multiple tables, which is a simple yet powerful form of data integration. If you have a table about water resources in the countries of the world, and I have data about the incidence of malaria in various countries, we can fuse our data on the country column, and see our data side by side. Importantly, we can do this while maintaining complete control of our own data.
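For the programmers in the audience, fusing on a shared column is essentially a join. Here is a minimal sketch in Python, under stated assumptions: it is not how Fusion Tables is implemented, the `fuse` helper is a hypothetical name, and the country names and numbers are placeholders rather than real data.

```python
def fuse(left, right, key):
    """Inner-join two tables (lists of row dicts) on a shared key column.

    Neither input table is modified: each owner keeps their own data,
    and the fused view is built from copies of the rows.
    """
    right_by_key = {row[key]: row for row in right}
    fused = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged = dict(row)  # copy so the original row stays untouched
            merged.update({k: v for k, v in match.items() if k != key})
            fused.append(merged)
    return fused


# Placeholder tables; the values are invented for illustration only.
water = [
    {"country": "Country A", "water_per_capita": 1200},
    {"country": "Country B", "water_per_capita": 300},
]
malaria = [
    {"country": "Country B", "malaria_cases_per_1000": 45},
    {"country": "Country C", "malaria_cases_per_1000": 2},
]

fused = fuse(water, malaria, "country")
print(fused)
```

Only "Country B" appears in both tables, so it is the only row in the fused view, with the water and malaria columns side by side.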

Collaboration is not only about integration. Once the data is visible side by side, we may want to discuss it to understand it better or resolve conflicts. With Fusion Tables you can discuss data at multiple levels of granularity: rows, columns and individual cells. Hence, the data and the discussions are deeply integrated (or should I say, fused?).

Given our focus on collaboration, there are a lot of things we do not do (and we're pretty honest about it!). We do not support complex SQL queries or high-throughput transactions. Despite our love for query optimization, we’ve implemented very little of it in the current system. We will, of course, add to these capabilities with time, but our real goal here is to explore data management for a broader audience of users and needs.

Please try it out and send us feedback! Our top priority now is to respond to our users' needs.

Monday, March 9, 2009

Coffee: a Competitive Sport

I went up to Portland, Oregon last week to attend the 2009 United States Barista Championship. No, I was not competing, and unfortunately, not one of the judges either.
You can see all my pictures here.
Portland, by the way, has the largest number of cafes per capita in the US and has a very active and sophisticated coffee scene (I'm sure you appreciate how hard it is for an ex-Seattle resident to admit this). If you're in town, check out Stumptown Coffee.

The competitors came from all over the country, including the expected share of west-coast baristas and even a guy putting all the charm of a cowboy into his espresso drinks. There were quite a few baristas from Intelligentsia Coffee & Tea, including 4 out of the 6 finalists and the champion, Mike Phillips.

So how do these folks compete? They basically put on a show for 15 minutes, which initially can be quite deceiving because they look incredibly relaxed. The show includes their choice of background music and often some accents in their clothing. In those 15 minutes, they need to prepare espressos, cappuccinos and their "signature drink". They prepare 4 of each, one for each of the sensory judges.

All this while, the competitors need to show deep knowledge of their coffee, beginning by explaining their choice of blend, and how each of the flavors comes out in the drink. As they prepare the drinks, they are closely watched by a couple of technical judges, who are watching for every little detail of handling the espresso machine, waste management and timing. Multiple video cameras are following them very closely as they do this, and every now and then the emcee will elicit a cheer from the crowd (let's have it for Mike's first 2 espressos!). If you want to see an example of a wonderful performance, watch the performance of Stephen Morrissey from Ireland when he won the 2008 World Championship.

It was a fascinating crowd from all walks of the coffee industry. There were many spectators in the bleachers: some were huge coffee fans, and others wondered how exactly they got there but were having a great time anyway. And of course, there was an amazing buzz on the floor around each of the competitors' bars -- after all, everyone in the room was caffeinated...

Finally, the awesomeness of the experience came out most poignantly when I was having a conversation with one of the other attendees and I mentioned to him that I work for Google. He asked: what part of Google do you work for? Food services?

Sunday, January 11, 2009

A Report on Healthcare and Information Technology

I served on a committee of the National Research Council that studied the challenges posed to Computer Science (and computing in general) in the area of healthcare. The report was just released and can be found here.

Needless to say, healthcare is a fascinating area, and progress will require not only collaboration among multiple disciplines (within computer science and others), but also paying attention to the workflow and constraints of the industry itself (doctors' work habits, the way insurance works, or doesn't work, hospitals as businesses, etc.).

When reading the report, keep in mind that the goal of the committee was not to analyze what is wrong with the industry right now, but rather to articulate the scientific challenges we should be addressing. Gio Wiederhold and Susan Davidson were the other database folks on the committee, which also had experts from other fields of computer science, bioinformatics, and from the medical establishment.

Saturday, January 3, 2009

A Trip to Vietnam

I just got back from a great trip to Vietnam! (see pictures). Let me first introduce my travel buddy, since he made the trip what it was.

Anhai Doan is a former Ph.D. student of mine (he makes sure I emphasize the 'former' part, and adds it himself when I accidentally forget). Anhai grew up in Vietnam and left after high school to do his college studies in Hungary, and then went to the U.S. for graduate studies, where he ultimately ended up at the University of Washington. He is now an associate professor at the University of Wisconsin, Madison. Anhai is the most well known computer scientist from Vietnam (winning the ACM Doctoral Dissertation Award in 2003 made quite a few waves in Vietnam).

I've been planning to tour Vietnam with Anhai for quite a while now, and a few years ago even decided to tell him about this plan. That turned out to be a great decision. Anhai applied his magic at every step of the way, whether it was whispering the right words in people's ears, slipping a well-deserved tip at the right time, or shielding me with his body until I was authorized to cross a Vietnamese street on my own. Think of a combination of a (junior, Vietnamese) Godfather-type figure with the potential of some day being a Jewish mother. Since Anhai left right after high school, he did not see much of the country, and this opportunity gave him the chance to do so.

Anhai does have a few more white hairs than he had before the trip and I take responsibility for that (but I think it's a fair trade for a signature on one's Ph.D. dissertation).

We started out in Hanoi -- a city with very strong character with its Old Town full of specialty shops, French Quarter with a very appropriate feel (including the bakeries!), and the promenade around Hoan Kiem Lake at the center. In the middle of Old Town, we found the Green Tangerine Restaurant that was absolutely wonderful (French with Vietnamese influence).

We then went on the mandatory (but very worthwhile) trip to Halong Bay with its many small peaks. We spent a night on a boat there (with a bunch of Australians, the latest invaders of Vietnam). That night there was a soccer game (the first of two matches) between Vietnam and Thailand, the great rivalry of Southeast Asia. Surprisingly, Vietnam won, which meant the boat crew was ecstatic, and with it being Christmas Eve, they started pouring free drinks (accompanied by dried squid from the bay). The second half of the soccer story occurs while we're in Saigon. The next morning, while kayaking in the bay, Anhai and I discovered that kayaking is a team sport, and we have some work to do on that front.

We then flew to Nha Trang, a beach town/resort. It was rainy there, but we still had a good time hanging out in the cafes and even had a schnitzel (I don't think Anhai will ever forgive me for that cultural experience). From there we drove (i.e., sat in the back of a car) to Da Lat, a beautiful city nestled in the mountains, and a resort for Saigonians who need to escape the heat. It is also a city with a nice lake in the middle and an amazing variety of flowers.

In Da Lat I was introduced to the Vietnamese 'custom' of serving complimentary tea even when you order coffee. Speaking of coffee, it was a mixed experience. At times, Anhai managed to explain to baristas how to make my macchiatones, and at others we drank Vietnamese coffee.

From Da Lat we drove to Saigon, a much more business-like city than Hanoi and more steeped in the history of the American War. The night we got there was the second of the two soccer games, and the result was a draw, meaning that Vietnam, having won the first match, won the Suzuki Cup.

Within minutes of the game's end, the streets were flooded with happy Vietnamese. And I mean happy! Literally, you could not move amongst the people and their mopeds. Anhai got me a little Vietnamese flag and we started making our way through the crowds. Being a foreigner and on the tall side by Vietnamese standards, I drew quite a bit of attention. Every time I waved my little flag, I drew cheers and smiles -- almost as if I was the one who scored the winning goal! It was a totally amazing experience!

From Saigon we took a two-day trip to the Mekong Delta, which has now been added to my list of candidate places to retire. Imagine your life when all you need to do for lunch is go to your back yard and fish for a few minutes or pick some fresh fruit from a tree (yes, they have wifi everywhere there too!). It was really fascinating to see how life is arranged when water is such an integral part of your landscape (the delta is actually made of 9 different strands of the Mekong River).

We celebrated New Year's Eve with Anhai's high school friends in Saigon. That was a great chance to get a glimpse of the life of young professionals in Vietnam, which brought home the point that this country has an amazing future, judging by its people's character and how far they've come in the last 20 years. I'm already thinking of my next trip there!