I wish I could give a technical summary of all the innovations reported at the conference. Instead, I’ll comment on a conclusion I drew from the three keynotes and give a quick plug for my own paper (this is my blog, after all).
SIGMOD featured 3 excellent keynotes from esteemed members of our community. Phil Bernstein from Microsoft Research talked about the progress made on Model Management, a research agenda he started 7 years ago, and on the challenges ahead. H.V. Jagadish from the
I found a couple of common themes among the keynotes. First, they are all pushing the community in important and non-traditional directions. I found that extremely heartening. Second, I think the three keynotes, each from a different angle, support the claim that we need a much better understanding of users and their pain points when they work with structured data. That's a very touchy subject for database folks, who are used to spending their time 'under the hood'.
In Jagadish’s case, usability was the subject of the talk, and hence understanding users is crucial. In Phil’s case, he gave the example of generating schema mappings (mappings between disparate databases), and he was trying to get at what the pain points may be there (he argued that in the contexts he’s been considering, producing schema matches, typically the first step in mapping generation, is no longer the bottleneck). In Gerhard’s case, the question that comes up is what the right answer is when we combine DB&IR in a single system, i.e., what the real user need is. In a DB system, the semantics of the query clearly dictate the answer, but when you combine structured and unstructured data, it's no longer clear what the ranking criteria should be.
The point I’m trying to make is the following. As a community, we need to study users' needs as they work with structured data, whether they are creating data, trying to understand existing data, formulating queries or creating mappings. Importantly, we need to keep in mind that a user’s task is rarely just to get an answer from a structured database. Users are typically working with both structured and unstructured data, and their tasks are broader than a single query. A useful interaction with a system is one that brings them closer to completing their task (I know, this is fuzzy, but that's why it's research).
It is tempting to push these problems to the HCI community, but I would argue this is a mistake. These problems will not be high enough on the agenda of the HCI community (there, if your device doesn’t move or perform magic, it’s uninteresting), whereas for us they are crucial for identifying good research directions and evaluating them. As a community, we need to find a way to encourage research on usability and to learn from the HCI community how to evaluate such research. We need to bring this agenda squarely into our conferences.
I'm not the only one to touch on this topic, and we're not the only community to see this need. A recent report titled “The Landscape of Parallel Computing Research: A View from
My student Luna Dong presented our paper on indexing dataspaces. This paper has the distinction of being the first technical paper I published with dataspaces in its title. The paper describes a set of indexing methods that enable efficient querying of a collection of loosely coupled data sources (i.e., we do not have semantic mappings between them). Because of the nature of dataspaces, the queries we support let a user specify structure when she knows it, and use keywords to complement the structure (we call them predicate queries and association queries). The basic idea underlying the solution is to extend the technique of inverted lists from IR to incorporate information about the structure of the data. Importantly, the technique also incorporates hierarchies in the data, and therefore it enables uniform querying of data sets that have different underlying structures. Luna performed a series of experiments showing the benefits of our approach and comparing it to techniques for indexing XML, which are the closest contenders for addressing these problems.
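
To make the inverted-list idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the general flavor, not the index structure from the paper: the `keyword//attribute` key format, the `ATTRIBUTE_PARENT` hierarchy, the function names, and the sample data are all assumptions made up for this example. Each keyword is indexed once on its own (for plain keyword search) and once per attribute it appears under, including that attribute's ancestors in the hierarchy, so a query on `name` also matches values stored under `firstName` or `nickname`. Association queries are not covered here.

```python
from collections import defaultdict

# Hypothetical attribute hierarchy (child -> parent), invented for this sketch.
ATTRIBUTE_PARENT = {"firstName": "name", "lastName": "name", "nickname": "name"}


def attribute_chain(attribute):
    """Yield the attribute itself, then each of its ancestors."""
    while attribute is not None:
        yield attribute
        attribute = ATTRIBUTE_PARENT.get(attribute)


def build_index(instances):
    """Build an inverted list whose keys carry structural information.

    Every token is indexed under the bare token (keyword search) and under
    'token//attribute' for the attribute and all of its ancestors.
    """
    index = defaultdict(set)
    for instance_id, attributes in instances.items():
        for attribute, value in attributes.items():
            for token in value.lower().split():
                index[token].add(instance_id)
                for ancestor in attribute_chain(attribute):
                    index[f"{token}//{ancestor}"].add(instance_id)
    return index


def keyword_query(index, keyword):
    """Instances containing the keyword in any attribute."""
    return set(index.get(keyword.lower(), set()))


def predicate_query(index, attribute, keyword):
    """Instances containing the keyword under the attribute or a descendant."""
    return set(index.get(f"{keyword.lower()}//{attribute}", set()))


# Made-up instances from sources with different underlying structures.
INSTANCES = {
    "person1": {"firstName": "Alice", "lastName": "Smith"},
    "person2": {"nickname": "Alice"},
    "paper1": {"title": "Notes on Indexing", "author": "Alice Smith"},
}
INDEX = build_index(INSTANCES)
```

With this sketch, `keyword_query(INDEX, "alice")` finds all three instances, while `predicate_query(INDEX, "name", "alice")` narrows the result to the two person instances even though one stores the value under `firstName` and the other under `nickname`. The trade-off is index size: every token is stored once per level of the hierarchy above its attribute.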