Monday, June 25, 2007


I just returned from SIGMOD 2007 in Beijing. This was only the second time SIGMOD has been held outside North America, and it was quite a success. The organization, led by Profs. Zhou Lizhu and Tok Wang Ling, and the technical program, chaired by Beng-Chin Ooi from the National University of Singapore, were excellent. It’s hard to beat the Summer Palace as a location for a conference banquet.

I wish I could give a technical summary of all the innovations reported at the conference. Instead, I’ll comment about a conclusion I drew from the three keynotes and give a quick plug for my own paper (this is my blog, after all).

SIGMOD featured 3 excellent keynotes from esteemed members of our community. Phil Bernstein from Microsoft Research talked about the progress made on Model Management, a research agenda he started 7 years ago, and on the challenges ahead. H.V. Jagadish from the University of Michigan talked about how to make databases usable, identifying some of the pain points and research challenges. Finally, Gerhard Weikum from the Max Planck Institute in Saarbruecken, Germany, talked about research combining databases and information retrieval. Each of them wrote a paper for the proceedings that is well worth reading.

I found a couple of common themes among the keynotes. First, they are all pushing the community in important and non-traditional directions. I found that extremely heartening. Second, I think the three keynotes, each from a different angle, support the claim that we need a much better understanding of users and their pain points when they work with structured data. That's a very touchy subject for database folks, who are used to spending their time 'under the hood'.

In Jagadish’s case, usability was the subject of the talk, and hence understanding users is crucial. In Phil’s case, he gave the example of generating schema mappings (mappings between disparate databases), and he was trying to get at what the pain points may be there (he argued that in the contexts he’s been considering, producing schema matches, typically the first step in mapping generation, is no longer the bottleneck). In Gerhard’s case, the question that comes up is what is the right answer when we combine DB&IR in a single system, i.e., what is the real user need. In a DB system, the semantics of the query clearly dictate the answer, but when you combine structured and unstructured data, it's no longer clear what the ranking criteria should be.

The point I’m trying to make is the following. As a community, we need to study user needs as they work with structured data, whether they are creating data, trying to understand existing data, formulating queries or creating mappings. Importantly, we need to keep in mind that a user’s task is rarely just to get an answer from a structured database. Users are typically working with both structured and unstructured data, and their tasks are broader than a single query. A useful interaction with a system is one that brings them closer to completing their task (I know, this is fuzzy, but that's why it's research).

It is tempting to push these problems to the HCI community, but I would argue this is a mistake. These problems will not be high enough on the agenda of the HCI community (there, if your device doesn’t move or perform magic, it’s uninteresting), whereas for us they are crucial for identifying good research directions and evaluating them. As a community, we need to find a way to encourage research on usability and to learn from the HCI community how to evaluate such research. We need to bring this agenda squarely into our conferences.

I'm not the only one to touch on this topic, and we're not the only community to see this need. A recent report titled “The Landscape of Parallel Computing Research: A View from Berkeley” argues a similar point about developing novel programming models. I think visualization is an important component of this research agenda (see Anant Jhingran’s blog post about this very point), and see Laura Haas’ excellent recent ICDE keynote and paper for a very nice articulation of this argument in the context of data integration.

My student Luna Dong presented our paper on indexing dataspaces. This paper has the distinction of being the first technical paper I published with dataspaces in its title. The paper describes a set of indexing methods that enable efficient querying of a collection of loosely coupled data sources (i.e., we do not have semantic mappings between them). Because of the nature of dataspaces, the queries we support enable the user to specify structure when she knows it, and keywords to complement the structure (we call them predicate queries and association queries). The basic idea underlying the solution is to extend the technique of inverted lists from IR to incorporate information about the structure of the data. Importantly, the technique also incorporates hierarchies in the data, and therefore it enables uniform querying of data sets that have different underlying structures. Luna performed a series of experiments showing the benefits of our approach, and comparing it to techniques for indexing XML that are the closest contenders to address these problems.
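To give a flavor of what "extending inverted lists with structure" can look like, here is a toy sketch in Python. It is not the implementation from the paper: the `keyword//attribute` key format, the `ATTRIBUTE_PARENT` hierarchy, the function names and the sample data are all assumptions made up for illustration. It only covers keyword lookups and simple attribute predicates, not association queries.

```python
from collections import defaultdict

# Hypothetical attribute hierarchy (child -> parent), invented for this sketch.
ATTRIBUTE_PARENT = {"firstName": "name", "lastName": "name"}


def attribute_chain(attribute):
    """Yield the attribute itself, then each of its ancestors in the hierarchy."""
    while attribute is not None:
        yield attribute
        attribute = ATTRIBUTE_PARENT.get(attribute)


def build_index(instances):
    """Build an inverted list keyed by plain keywords and by keyword//attribute."""
    index = defaultdict(set)
    for instance_id, attributes in instances.items():
        for attribute, value in attributes.items():
            for keyword in value.lower().split():
                # Plain entry: supports keyword search with no known structure.
                index[keyword].add(instance_id)
                # Structure-aware entries: one per attribute and per ancestor,
                # so a query on "name" also matches values stored as firstName.
                for ancestor in attribute_chain(attribute):
                    index[f"{keyword}//{ancestor.lower()}"].add(instance_id)
    return index


def keyword_query(index, keyword):
    """Return ids of instances containing the keyword anywhere."""
    return set(index.get(keyword.lower(), set()))


def predicate_query(index, attribute, keyword):
    """Return ids of instances where the keyword occurs under the attribute."""
    return set(index.get(f"{keyword.lower()}//{attribute.lower()}", set()))


# Made-up instances from sources with different schemas.
instances = {
    "p1": {"name": "Luna Dong"},
    "p2": {"firstName": "Luna", "lastName": "Smith"},
    "a1": {"title": "Luna moth survey"},
}
index = build_index(instances)
print(sorted(keyword_query(index, "luna")))
print(sorted(predicate_query(index, "name", "luna")))
```

In this sketch the keyword query matches all three instances, while the predicate on `name` matches only the two people, even though one source stores the value under `firstName`. Folding the hierarchy into the index keys trades a larger index for a single lookup at query time; other designs are certainly possible.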

Saturday, June 23, 2007

Tibet -- A Feast for the Eyes

Oriana and I just returned from an 8-day trip to Tibet. You can see a selection of pictures here, and the story below.

In the days before our departure, all I seemed to hear were stories about people who got very sick in Tibet because of the high altitude. In fact, in all the stories, the person who got really sick and had to be flown out was always described as “young, in his forties, and in otherwise very good shape”. I decided to ignore the stories and think positively.

Getting off the plane in Lhasa airport we felt light-headed, like waking up with a hangover. Had to walk very slowly to the baggage claim (but still able to see the big sign with my full name held up by our guide-to-be). We spent the first day in the hotel, trying not to move and let the body adjust. (It’s recommended to come to Tibet clean because they advise you not to take a shower the first night as part of the acclimation program.)

By the second day, we were able to walk and even climb the Potala, and by the third day even the headache went away. It took our tube of toothpaste about the same time to get used to the high altitude.

We spent the second day in Lhasa, going to the Jokhang (the most revered religious structure in Tibet), and the Barkhor, a lively market surrounding it. Already there, we experienced firsthand the devout nature of the Tibetan people. While there were some tourists, we were overrun by locals making their way into the temple to make their offerings. In other parts of the world I’ve seen little old ladies pushing their way to a bus; here they were pushing their way to the Buddhas to offer anything from barley flour and yak butter to Coca-Cola.

We then went to the Potala, Lhasa’s best known structure, the seat of the Dalai Lamas. It indeed was as impressive as I imagined it to be (yes, from the movies). The night views of the Potala were especially impressive.

The most striking thing about Tibet (and the reason I’ve been talking about going for the last so many years) is the prevalence of color everywhere. It starts from the prayer flags erected on most houses, on bridges and peaks of mountain passes. The colors of the Tibetan clothing are wonderful. The window and door treatments, even in the poorest places are simply mind boggling. Even after a few days in Tibet I was simply amazed and happily taking it in as we drove throughout Tibet.

And drove we did (in the backseat of a Toyota Land Cruiser). Distances in Tibet are significant, and it’s not unusual to get stuck behind an army convoy or have to wait on the roadside for a high ranking official convoy to pass by. The drivers were actually reasonably careful (apparently, traffic fines are high enough). There are mileposts everywhere, so even though you’re often in the middle of nowhere, you can be quantitative about it.

The third day we drove to Shigatse, 260km west of Lhasa, home to the Tashilhunpo monastery, the seat of the Panchen Lamas for many years. (Supposedly, the relationship between the Dalai and Panchen lamas is like the sun and the moon, but there is more to the story than that). The Tashilhunpo was the most impressive monastery we saw and was full of (religious) life. On the way to Shigatse we visited Yamdrok Lake at 5000m (just when we thought we adjusted to the altitude…)

After the Tashilhunpo we realized that we could not possibly be impressed by another monastery. We had also spoken to a few other tourists who were on their way to the Himalayas. However, changing plans in mid-trip was nearly impossible. Nevertheless, Oriana started a long and drawn out negotiation with our guide, driver and tourist agency that seemed about as complicated as a typical M&A deal she negotiates in her day job.

As the negotiations proceeded, we drove to Tsetang (east of Lhasa), the cradle of Tibetan civilization (and home to the nicest hotel & breakfast we had). After a 40km drive on a very bumpy and windy road we arrived at Tibet’s first monastery, the Samye monastery. We were proven wrong; we were blown away by Samye as well.

The next morning, with the help of a promise of a tip, the negotiations came to a close and we started driving westward towards the Himalayas. We started at 7am, and drove for 12 hours through multiple mountain passes, very rural areas of Tibet and a couple of other hurdles that I was advised not to blog about. At 7pm, we were standing at 5000m elevation, looking at Mount Everest and its sibling peaks (Makalu, Lhotse, and Cho Oyu). The scene was definitely worth the drive, though perhaps one can argue that the sight of Rainier from Seattle is about as impressive. Naturally, the only other tourists with us at that vista point were a bunch of young Israelis. (Except for multiple groups of Israelis, we saw some French, some Americans and at some points, what seemed like the entire nation of Japan). At 7pm we started a 300km drive back to Shigatse, the closest place with a reasonable hotel.

Driving through Tibet we noticed that it probably has the highest per-capita number of pool tables. Pool tables were adorning the sides of the roads in the most rural villages. In some cases, they were actually used for playing pool, and in others, as stands for decorative items. Our guide didn’t offer a convincing explanation of the phenomenon. Cell phone reception, even in remote parts of Tibet, was typically better than in my home or office. That enabled me to read (and mostly ignore!) my email.

From the culinary perspective, once you get over your craving for yak meat, yak milk, yak butter, your main choices are Nepalese, Indian and Chinese food. I did manage to find a decent cappuccino in Lhasa!

I will not make any political comments on Tibet here, but will mention one anecdote. Apparently, in the past few months, two groups of American students went to Everest base camp and staged pro Tibetan independence demonstrations. As a consequence, besides making it harder for others to reach there, their innocent Tibetan guides and drivers were put into prison for 5 years. So if you’re going to make political statements, make sure you understand the local dynamics before you put your friends at risk.

In summary, Tibet is a wonderful place to visit. Although it is modernizing very rapidly, you can still see the old, especially if you head out of Lhasa. Be prepared for long drives and a few dirty toilets here and there. But make sure you bring a good camera (thanks Pandu for the great recommendation!) and take it all in! I’m sure that of all people, my mother-in-law Helen is the happiest we made the trip because now she doesn’t have to hear me talking about going there anymore.

Saturday, June 2, 2007

SigTube: 5 Minute Presentations of Conference Papers

The success of YouTube and the like has proven a very simple point: a 5-minute video is a very effective mode of communication. People love to see stuff in 5-minute nuggets (or less). I'm suggesting we learn from this observation for better dissemination of scientific results.

I'm proposing that along with every paper published in conference proceedings, we also create a 5-minute video presenting the highlights of the paper, and make the presentations available on the web for free. I'm calling this SigTube (mostly to encourage people to come up with a better name).

A 5-minute presentation (done well) can give quite a bit of information and insight about a publication, certainly more than the 100-word abstract or the paper's introduction. I know I would love to sit through a bunch of these from time to time and learn more about what's going on in my field, even in areas that are farther away from my main interest areas (in fact, probably mostly in such areas). A video also captures the enthusiasm and emphases of the speaker (not to speak of the fact that it preserves their youth for eternity!)

With today's infrastructure and technology, this is pretty much trivial to do. A conference can dedicate a person with a video camera who will film the videos during the conference. Alternatively, some authors may prefer to film the videos on their own and send them in.

Any takers?