Last week I hosted the OpenII kickoff workshop at Google. We had representatives from several companies: IBM, Microsoft, Yahoo, MITRE, Google, one guy who was supposed to represent Oracle but decided to be a professor again, and a couple of professors.
The goal of OpenII, as the name implies, is to create an open-source set of tools for information integration. The tool set will include, among others, wrappers for common data sources, tools for creating matches and mappings between disparate schemas, a tool for searching a collection of schemas, and run-time tools for processing queries over heterogeneous data sets.
More broadly, the effort aims to foster innovation in the field of information integration and to create tools that are usable for a wide range of applications.
In research, we often innovate on a specific aspect of information integration, but then spend much of our time building (and rebuilding) other components that we need in order to validate our contributions. Having a set of open-source tools will enable us to focus on our innovations and perform more meaningful comparisons between our methods.
On the applications side, information integration comes in many flavors, so it is hard for commercial products to serve every need. Our goal is to create tools that can be applied in a variety of architectural contexts (e.g., materializing all the data in one repository vs. leaving the data in the sources and accessing it only at query time). In addition, many of the tools (e.g., schema matchers or dedup engines) often need to be extended for the particular domain at hand to fully leverage domain knowledge. Open-source tools allow application developers to do exactly that.
You'll be hearing more about this project as we make progress. If you would like to contribute to it, please contact me!