Friday, December 28, 2007
My Dad is 80!
My dad turned 80 this month, and we celebrated the event with a workshop and reception in his honor at the Weizmann Institute of Science, in Rehovot, Israel. The full set of pictures from the event can be seen here.
My dad is a professor of Chemistry at the Weizmann Institute. After fighting in the Israeli War of Independence, he was finally able to focus on his studies. He completed his Ph.D in a little less than 2 years(!!) in 1955 at Syracuse University in New York (fortunately, because that's where he met my mom). When he's asked how he did that, he simply shows the picture below.
He still doesn't understand why it took me an entire 5 years to do my Ph.D, and worse, in a field that uses the term 'science' in a questionable fashion. (When I got promoted to full professor he finally figured I might be doing something right).
My dad's main claim to fame is a 1-page article he wrote during his post-doc. The following makes the point better than I can - it's a quote from Krzysztof Matyjaszewski (a CMU professor) and Axel Muller (professor at the U. of Bayreuth, Germany) in their foreword to the December 2006 special issue of the Journal of Progress in Polymer Science on "50 Years of Living Polymerization":
On June 5, 1956, Michael Szwarc, together with Moshe Levy and Ralph Milkovich published an article entitled "Polymerization initiated by electron transfer to monomer - A new method for formation of block copolymers", J Am Chem Soc (1956), 2656.
In this article the term "living polymer" appeared for the first time. It caused a revolution in polymer science.
Michael Szwarc (who was also my dad's Ph.D adviser) received the Kyoto Prize for this work in 1991.
He has worked in many areas over the years, but since the mid-80's my dad has been one of the pioneers in solar energy research, studying methods for chemical storage of solar energy so it can be used any time and transported to less sunny locations. In fact, he published two papers on using solar energy for chemical transformations this year! As a befitting token of recognition, he received an awesome Google solar t-shirt...
And he definitely needs the t-shirt. He still gets up every morning at 6am to either play tennis, or go for a run & workout, which includes running up 15 flights of stairs in the solar tower at the institute!
It was a great event, and a wonderful chance to see many of my dad's colleagues throughout his career, some of whom I had not seen since I was a kid. It was also the first full gathering of all the family's grandchildren.
Sunday, November 11, 2007
Dataspaces for Veterans
I recently had the opportunity to visit the Veteran's Administration Hospital in Washington DC and learn first-hand about their patient-record system. I was pleased to see the principles of dataspaces in action, clearly enabling better healthcare services.
The VA provides services to veterans of the American military and has around 150 hospitals, 800 clinics and 200 nursing centers scattered around the country. To support these services, the VA maintains electronic records for all their patients, a system that has won them many accolades in recent years. The system stores the patients' prescriptions, doctor visits, lab tests and other data about each patient. As their patients often move around and receive treatment in various locations, when a doctor views the data about a patient, it needs to be integrated from multiple VA locations. Each of these locations runs its own system. In addition, data about their patients may reside in systems of the Department of Defense (and their healthcare providers) and various drugstore chains.
Clearly, this is an incredible data integration problem. Today they are aware of at least 130 different "implementations" of their electronic record system, i.e., different schemas. Also, given the different local needs of hospitals and clinics, imposing a single schema on all the VA centers would not work. Using a data integration solution at this scale and in such a dynamic environment would be extremely difficult.
Instead, what the VA did is standardize on a very small subset of patients' attributes, namely attributes describing patients' vital signs. Outside of this set of attributes, hospitals are free to develop their own local data organizations. However, the system lets the healthcare providers see all the data even if it's not completely integrated. So for example, if a doctor wants to see what happened to a patient while they were at a remote location, then the remote data may appear as plain text, and therefore the doctor would have to work a little harder to digest it, and won't be able to pose the queries she could pose on the local data. But being able to see the data in some form is infinitely better than not seeing it at all, and the doctors are extremely happy with the system's capabilities.
The VA also demonstrated two examples of the pay-as-you-go principle that is at the foundation of dataspaces. The first was the fact that they decided that vital signs are critical, so their data sources are aligned on the attributes relating to those (effectively, creating semantic mappings involving the attributes of vital signs), and they plan to continue agreeing on terminology as they see fit. Second, they had a culture that allowed for local innovation, class-3 applications, that represented needs at the local level. When these needs were perceived to be important throughout the organization, they promoted them to class-1 applications, and required all their systems to support them.
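To make the loose-integration idea concrete, here is a minimal sketch in Python. It is my own illustration, not the VA's actual system: the site names, field names, and the functions `view_record` and `promote` are all hypothetical. Agreed-upon attributes are mapped to a shared vocabulary and stay queryable, everything else is passed through as plain text, and new attributes can be added to the shared vocabulary later, pay-as-you-go.

```python
# Hypothetical shared vocabulary: the small set of attributes all sites agree on.
STANDARD_ATTRIBUTES = {"pulse", "temperature_c", "systolic_bp", "diastolic_bp"}

# Hypothetical per-site semantic mappings: local field name -> standard name.
SITE_MAPPINGS = {
    "site_a": {"hr": "pulse", "temp": "temperature_c"},
    "site_b": {"pulse_rate": "pulse", "bp_sys": "systolic_bp"},
}


def view_record(site, record):
    """Split a site-local record into structured attributes and plain text.

    Fields covered by the shared vocabulary become queryable values;
    everything else is still shown, but only as uninterpreted text.
    """
    mapping = SITE_MAPPINGS.get(site, {})
    structured, other = {}, []
    for field, value in record.items():
        standard = mapping.get(field, field)
        if standard in STANDARD_ATTRIBUTES:
            structured[standard] = value
        else:
            other.append(f"{field}: {value}")
    return {"site": site, "structured": structured, "text": "; ".join(other)}


def promote(site, local_field, standard_name):
    """Pay-as-you-go step: agree on one more attribute and map one site to it."""
    STANDARD_ATTRIBUTES.add(standard_name)
    SITE_MAPPINGS.setdefault(site, {})[local_field] = standard_name


if __name__ == "__main__":
    remote = {"pulse_rate": 80, "rx": "aspirin 81mg"}
    print(view_record("site_b", remote))   # "rx" shows up only as text
    promote("site_b", "rx", "prescription")
    print(view_record("site_b", remote))   # "rx" is now a structured attribute
```

The design choice worth noticing is that nothing is ever hidden for lack of a mapping: an unmapped field degrades to text rather than disappearing, which is the "seeing it in some form" property described above.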
Just to make it clear, when I walked in the door they did not greet me and say: "Pleased to see you Dr. Halevy; we'd love to show you our dataspace system". What I'm describing is a post-rationalization of a system that was developed over more than a decade. I believe that their loose integration was the key to their success.
Wednesday, October 24, 2007
A Murder Mystery with a Twist
Will Ian Michaels be indicted and spend the rest of his life in jail? Or perhaps he will be promoted to that COO job he's been eying for a while, or even leave his company and start his own? And in the process, how many eligible (or non-eligible) women will try to seduce him?
Read the book and find out. Not bad for an author who used to be a high-tech guy himself.
Saturday, October 20, 2007
A Trip to the Amazon
We arrived at the docks and I was handed off to an unnamed man with a motorboat. I was told that it would take 20 minutes to get to the EcoPark. As I was sitting there, sailing in pitch dark (with only the moon in a very southern-hemisphere position to provide a bit of light), the sensation of adventure started sinking in. The boater turned on his flashlight every now and then to see where we were, and surprisingly, 20 minutes later, we arrived at a lodge, and I met Antonio, who would be my guide for the stay. After a welcoming drink and a short hike in the forest to my cabin, I plugged in my cell phone and went to sleep.
In the morning, after a yummy Brazilian breakfast, I went on a forest tour with Antonio and John, an American fellow who is 10 years into his retirement (he’s 48 now) and whose travel plans for the next couple of months made even my head spin. In the forest, we got to see an original cinnamon tree, and a tree from which they produce “aborto”, which, as the name implies, is used to abort pregnancies (and is also a useful post-hangover medication). I got to be Tarzan for a photo and see Antonio make gunpowder from scratch. Really. As we were finishing our forest walk it started raining (ah, get it? RAINforest, in this case pourForest would be more appropriate), and we were thankful it didn’t start earlier.
After lunch, I was introduced to cashew trees (yes, apparently they grow on trees, not at Costco). The barman took the cashew fruit and made a nice drink out of it. Later we went to an area with a bunch of monkeys playing about, including one with a red head and one that was called cappuccino monkey (if you’d see it, you would understand why). We were taken to see a few folks from the local Indian tribes (and ended up in funny costumes, doing their dance), and then spent an hour fishing on the river. I even managed to catch a catfish that I threw back into the water once the photo-op was over. Sitting on the fishing boat was incredibly tranquil, with sounds of the toucans flying about (and the news of the latest Google earnings report coming in on my cell).
I had dinner with a retired Swedish couple in their late 50’s. The husband ran an international company providing interior design services for cruise ships. He admitted that when a good friend of his came to him many years ago with sketches describing his idea for an ice hotel, he told his friend that he was crazy. Fortunately, the friend ignored his advice and did it anyway. (Oriana and I got engaged in that ice hotel in 2000).
The night activity involved a canoe trip on the river, looking for caimans and listening to the night sounds. We saw only the eyes of the caimans from afar, caught a couple of turtles, and heard many frogs. When we returned to the lodge, Antonio thought he heard lightning, but immediately corrected himself – they were only fireworks. Why fireworks, I asked? Because there was a soccer game in Rio between two teams from Rio, and the people in Manaus were very happy with the result. And it wasn’t even a terribly important game. But that’s Brazil for you! A soccer victory by a team from a city thousands of miles away, in a relatively unimportant game, is still reason enough for fireworks.
The next morning, before going to the airport, we managed to squeeze in a speedboat trip to the meeting of the waters – the place where the Rio Negro and Rio Solimoes come together to create the Amazon River (that then flows to the Atlantic Ocean). It was really fascinating to see the two distinct waters (they’re different in color, temperature and pH).
Five hours later I was in Miami, and six hours after that in the Bay Area. What a transformation!
SBBD 2007
The conference was held in Joao Pessoa (read: John person; apparently he was a poet), which is in the eastern-most point in South (or North) America, the closest place to start a swim to Africa. The venue was a tropical beach resort. I'm not sure when the residents of Joao Pessoa sleep. They seemed to be dancing all night, and at 5am the road along the beach was closed till 8am so people can exercise peacefully. I guess they need all the dance and exercise to burn off the calories from their great food.
Many thanks to Altigran Silva for inviting me and being a wonderful host, to Juliana Freire for all her incredible help and company, and to all my Brazilian friends for their great welcome (yes, my Orkut network has grown exponentially as a result of the trip!)
Monday, October 15, 2007
A Geography Quiz
You were wrong.
The answer is: Brazil 7, China 1. (Sort of what you'd expect if they played soccer against each other).
I was in China earlier this year and went as far west as Tibet (parts of which are about as far from Beijing/Shanghai as San Francisco is from New York). Still, a single timezone throughout.
In Brazil the situation is much more complicated, as the country is challenged both in longitude and latitude. Ordinarily, Brazil spans 4 time zones. However, they have just moved to daylight saving time.
If you live close to the equator, then the concept of seasons is rather abstract, so DST makes no sense and the northerners ignore it. As a result, the northern part of the country spans 4 time zones on standard time, and the southern part spans 3 time zones on daylight saving time (I realize this means there is an overlap between the time zones, so if you're being strict there are only 4 or 5). But if you're flying around Brazil, as I'm doing now, there are 7 zones for you to grapple with.
I flew in to Rio on Saturday, a few hours before they moved to DST. On Sunday morning, having successfully woken myself an hour early, I flew to Joao Pessoa, where they're not on DST. Needless to say, things are messed up a bit, but in Brazil you just drink a little more and relax.
Sunday, September 30, 2007
The Oxford Murders
A series of murders is happening in Oxford, tangentially involving people in the math department there. Supposedly, there is a mathematical series underlying the murders, and a couple of mathematicians are trying to figure it out before the series goes too far. A very nice and rather quick read.
VLDB 2007 Trip
Traveling in Europe is always fun. I find it much more relaxing to assume (with some loss of accuracy) that the Euro and the American dollar are about equal in value. It seems more affordable this way. I'd never been to Vienna before -- it's a very nice place to visit. After spending 5 days there, I had 5 schnitzels (a favorite childhood dish of mine) and a large but finite number of Viennese cakes & tortes. Fortunately, I did have the opportunity to go for a couple of jogs while there, so I'm still able to fit into my clothes.
In Aalborg I gave a course on data integration. This was the first time I gave lectures based on the first few chapters of the book I'm writing with Zack and AnHai. The energetic students in Aalborg helped me debug the slides and the presentation, and overall it was a great experience. Though I knew this before, the database group in Aalborg is a very strong one and doing some exciting work (I was initially surprised to see that all the rooms in the department were labeled DatalogI, but then I was told this means computer science in Danish). My host, Christian Jensen, was very kind, and after making sure I got my exercise, took me up the coast to some very charming towns.
VLDB was very interesting. Yet again, I was pleased to see a lot of work going on in the area of data integration, uncertain data, web, etc. I attended two excellent tutorials: Adaptive Query Processing by Zack Ives and Amol Deshpande, and on Probabilistic Graphical Models and their application to data management, by Sunita Sarawagi and ... Amol Deshpande.
The high point of the conference was no doubt the video shown during the 10-year best paper award talk by Surajit Chaudhuri and Vivek Narasayya. The hilarious video showed Surajit giving a demo of Auto-Admin alongside Bill Gates during Gates' keynote at SIGMOD 1998. To put it nicely, the video showed Surajit's ability to mask certain unexpected mishaps during the demo and make it all appear to go extremely smoothly.
A few words on DBclips. Everyone I speak to says it's an excellent idea. However, so far, very few people created them for their papers. Since Luna posted the DBclip on our paper less than 2 weeks ago, it's already received over 180 views. Need I say more?
There are simply too many interesting talks and other events going on in parallel during most conferences. There is no way anyone can make it to all the talks of interest to them. Sadly, even people with the best intentions will not have time to diligently go through the proceedings and read all the papers either. A DBclip is an excellent way to reach a wider audience of people. While it takes some effort, it's well worth it. In fact, during my course in Aalborg I showed our DBclip instead of lecturing on data integration with uncertainty. I imagine that these clips will be a useful teaching resource in many graduate courses. Fortunately, it's not too late. You can create your DBclip after the conference (in fact, there are advantages to doing it now).
Thursday, September 27, 2007
Web 2.0 Panel
Our panelists are Sihem Amer-Yahia, Gerhard Weikum, a Donald Kossmann lookalike, Volker Markl, AnHai Doan and myself.
Given that the topic is Web 2.0, we thought the audience should post comments and opinions throughout the panel. Feel free to say anything. We'll monitor the blog during the panel and highlight your comments.
Comment away!
Monday, September 17, 2007
DBclips
You can listen to my first DBclip, created for the paper I wrote with Luna Dong and Cong Yu, on Data Integration with Uncertainty (also an idea that's been mentioned in this blog).
I can't wait to hear others' contributions!
Tuesday, September 11, 2007
The Lives of Others
I will not even try to write a review of this movie. Just google for reviews.
Monday, August 13, 2007
Sergey's Story, in Chinese
My wife Oriana, in addition to being a lawyer, is quite a skilled English-Chinese translator (that's an understatement, believe me). She translated the article (under the obvious assumption that anything that is interesting to Jews would be interesting to the Chinese). And in fact, her translation was recently published in the Sunday Weekly Magazine editions of the SingTao Daily, the most widely circulated Chinese language newspaper in North America.
You can get the full translation here.
As with any Chinese-related content on this blog, the usual disclaimer applies: it's all Chinese to me.
Wednesday, August 8, 2007
Two Zodiacs Away
Generally, I'm a guy who feels pretty comfortable with his age. That's an important survival skill when you work for Google. Every now and then, however, I realize that even though I'm not feeling any older, time is passing.
Today the poignant moment was when I was standing and talking with my group members and interns (all very nice; they organized a lovely birthday celebration. Special thanks to Bijun, who ordered a chocolate babka from Zabars in NYC!). I was chatting with my youngest intern (a graduate student), who mentioned she was also born in the year of the rabbit, like me.
Then there was a pause. It took us both a mere 5 seconds to realize that she's *2* zodiacs away from me. Not one. There is an entire unrepresented rabbit in between us.
While it should be said that she is the youngest graduate student I ever worked with, I still found it startling that I'm working with a rabbit two zodiacs away. I'm sure I'll get over it.
Wednesday, August 1, 2007
Notes from Ed Tufte
After several years of getting the brochures in the mail, and watching Tufte's book on envisioning information sit happily on my shelf, I decided to go for his 1-day course in SF and learn a thing or two about effective visualization.
If you want the full experience, I recommend sitting next to someone who worked on Microsoft PowerPoint at some point in their career. I did that, and it added quite a bit to the entertainment factor of the course. Tufte's PowerPoint rant starts about 5 minutes into the course and he makes his last jab in his closing remarks. But more on that in a bit.
The course was interesting, even if it mostly gets you thinking about issues relating to effective visualization. I jotted down a few notes that I'm repeating here mostly so I don't forget next time I'm preparing a presentation. Most of them are obvious, but that doesn't mean they're not often forgotten.
- more detail in your presentation increases your credibility
- more detail does not necessarily imply clutter (if done right)
- annotate anything you can in a visualization. For example, annotate links (otherwise you're implying that they all mean the same)
- don't try to be too fancy. Focus on the content not on the design. For example, tables are a very effective, yet simple presentation. Order the rows in the table according to some performance measure you're trying to emphasize.
- don't focus on being original in your visualizations, focus on getting it right (don't innovate, steal).
- get users out of the decoding business (i.e., remove legends where possible)
The deeper point made in the course is that the principles underlying effective visualizations mirror the principles that underlie thinking processes. Hence, for example:
- Make and show comparisons between different aspects of the data
- Make sure causality of effects is emphasized in the presentation
- Build credibility -- make sure you show all the data rather than just cherry-picking what's convenient for you.
- Enable the audience to drill down and see more data.
- Integrate evidence from multiple sources (aha, a plug for data integration!)
- Always give the source of your data (yes, lineage, folks!)
Following the principle that good presentation should support critical audience thinkers, Tufte also points out what audience members should keep in mind as they listen to a presentation (surprisingly, reading your email on the blackberry is not one of them)
- What is their story?
- Can you believe them? Do they have any conflicts of interest affecting their perspective? What's their track record? What's their reason for bias?
- What precisely does their argument apply to? What are its limits?
- What do I really need to know when I leave the room?
Ok, back to the PowerPoint issue. I actually found myself a bit confused about the main point of his PowerPoint rant, but I think I get it now. His basic point is that PowerPoint forces you into a very low resolution presentation mode. He argues that people can read 3 times faster than you can talk. In addition, PowerPoint encourages you to leave quite a bit of detail out and summarize everything in bullets. The human brain can take in much more than what you can convey with a PowerPoint presentation. Hence, you're not really using your time with your audience very effectively, since there are better methods of conveying information that make much better use of the audience's mental capabilities. For example, he argues that you should come into a meeting with a 3-4 page text summary of your points, have your audience read it, and then have a discussion and answer questions.
The latter suggestion makes it pretty clear when his methods are effective and when not. For example, it's a non-starter for large audiences (e.g., conference presentations). On the other hand, there are cases where we do this by default (e.g., hiring meetings don't typically involve slide presentations). So, don't dump PowerPoint just yet.
Interestingly, Tufte was not following his own advice very carefully during the day. I felt that the principles he espoused could have been communicated more efficiently (but then, perhaps he assumed that some of the audience also had blackberries to attend to).
Monday, June 25, 2007
SIGMOD 2007
I wish I could give a technical summary of all the innovations reported at the conference. Instead, I’ll comment about a conclusion I drew from the three keynotes and give a quick plug for my own paper (this is my blog, after all).
SIGMOD featured 3 excellent keynotes from esteemed members of our community. Phil Bernstein from Microsoft Research talked about the progress made on Model Management, a research agenda he started 7 years ago, and on the challenges ahead. H.V. Jagadish from the
I found a couple of common themes among the keynotes. First, they are all pushing the community in important and non-traditional directions. I found that extremely heartening. Second, I think the three keynotes, each from a different angle, support the claim that we need a much better understanding of users and their pain points when they work with structured data. That's a very touchy subject for database folks, who are used to spending their time 'under the hood'.
In Jagadish’s case, usability was the subject of the talk, and hence understanding users is crucial. In Phil’s case, he gave the example of generating schema mappings (mappings between disparate databases), and he was trying to get at what the pain points may be there (he argued that in the contexts he’s been considering, producing schema matches, typically the first step in mapping generation, is no longer the bottleneck). In Gerhard’s case, the question that comes up is what is the right answer when we combine DB&IR in a single system, i.e., what is the real user need. In a DB system, the semantics of the query clearly dictate the answer, but when you combine structured and unstructured data, it's no longer clear what the ranking criteria should be.
The point I’m trying to make is the following. As a community, we need to study user needs as they work with structured data, whether they are creating data, trying to understand existing data, formulating queries or creating mappings. Importantly, we need to keep in mind that a user’s task is rarely just to get an answer from a structured database. Users are typically working with both structured and unstructured data, and their tasks are broader than a single query. A useful interaction with a system is one that brings them closer to completing their task (I know, this is fuzzy, but that's why it's research).
It is tempting to push these problems to the HCI community, but I would argue this is a mistake. These problems will not be high enough on the agenda of the HCI community (there, if your device doesn’t move or perform magic, it’s uninteresting), whereas for us they are crucial for identifying good research directions and evaluating them. As a community, we need to find a way to encourage research on usability and to learn from the HCI community how to evaluate such research. We need to bring this agenda squarely into our conferences.
I'm not the only one to touch on this topic, and we're not the only community to see this need. A recent report titled “The Landscape of Parallel Computing Research: A View from
My student Luna Dong presented our paper on indexing dataspaces. This paper has the distinction of being the first technical paper I published with dataspaces in its title. The paper describes a set of indexing methods that enable efficient querying of a collection of loosely coupled data sources (i.e., we do not have semantic mappings between them). Because of the nature of dataspaces, the queries we support enable the user to specify structure when she knows it, and keywords to complement the structure (we call them predicate queries and association queries). The basic idea underlying the solution is to extend the technique of inverted lists from IR to incorporate information about the structure of the data. Importantly, the technique also incorporates hierarchies in the data, and therefore it enables uniform querying of data sets that have different underlying structures. Luna performed a series of experiments showing the benefits of our approach, and comparing it to techniques for indexing XML that are the closest contenders to address these problems.
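For readers who want a feel for the basic idea, here is a toy sketch in Python. It is not the index from the paper: the `keyword//attribute` key format, the `HIERARCHY` table and the sample items are all made up for illustration, and it covers only plain keyword lookups and attribute-qualified (predicate-style) lookups, not association queries. Each keyword is indexed once on its own and once per attribute it appears under, including that attribute's ancestors in the hierarchy.

```python
from collections import defaultdict

# Hypothetical attribute hierarchy: child attribute -> parent attribute.
HIERARCHY = {"firstName": "name", "lastName": "name", "author": "creator"}


def ancestors(attr):
    """Return attr followed by all of its ancestors in HIERARCHY."""
    chain = [attr]
    while chain[-1] in HIERARCHY:
        chain.append(HIERARCHY[chain[-1]])
    return chain


def build_index(items):
    """Build an inverted list whose keys carry structure information.

    items maps an item id to a dict of attribute -> text value.
    """
    index = defaultdict(set)
    for item_id, attrs in items.items():
        for attr, value in attrs.items():
            for word in value.lower().split():
                index[word].add(item_id)  # plain keyword entry
                for a in ancestors(attr):  # structure-aware entries
                    index[f"{word}//{a.lower()}"].add(item_id)
    return index


def keyword_query(index, word):
    """Items containing the keyword anywhere."""
    return set(index.get(word.lower(), set()))


def predicate_query(index, attr, word):
    """Items containing the keyword under attr or any of its sub-attributes."""
    return set(index.get(f"{word.lower()}//{attr.lower()}", set()))


if __name__ == "__main__":
    items = {
        "p1": {"firstName": "Luna", "lastName": "Dong"},
        "p2": {"name": "Luna Park"},
        "d1": {"title": "Indexing Dataspaces", "author": "Luna Dong"},
    }
    idx = build_index(items)
    print(sorted(keyword_query(idx, "luna")))
    print(sorted(predicate_query(idx, "name", "luna")))
```

Because `firstName` is indexed under its ancestor `name` as well, a query on `name` finds both the source that splits names into parts and the one that doesn't, which is the sense in which the hierarchy gives uniform querying over differently structured data.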
Saturday, June 23, 2007
Tibet -- A Feast for the Eyes
Oriana and I just returned from an 8-day trip to
In the days before our departure, all I seemed to hear were stories about people who got very sick in
Getting off the plane in
By the second day, we were able to walk and even climb the Potala, and by the third day even the headache went away. It took our tube of toothpaste about the same time to get used to the high altitude.
We spent the second day in
We then went to the Potala,
The most striking thing about
And drove we did (in the backseat of a Toyota Land Cruiser). Distances in
The third day we drove to Shigatse, 260km west of Lhasa, home to the Tashilhunpo monastery, the seat of the Panchen Lamas for many years. (Supposedly, the relationship between the Dalai and Panchen lamas is like the sun and the moon, but there is more to the story than that). The Tashilhunpo was the most impressive monastery we saw and was full of (religious) life. On the way to Shigatse we visited
After the Tashilhunpo we realized that we could not possibly be impressed by another monastery. We had also spoken to a few other tourists who were on their way to the
As the negotiations proceeded, we drove to Tsetang (east of
The next morning, with the help of a promise of a tip, the negotiations came to a close and we started driving westward towards the
Driving through
From the culinary perspective, once you get over your craving for yak meat, yak milk and yak butter, your main choices are Nepalese, Indian and Chinese food. I did manage to find a decent cappuccino in
I will not make any political comments on
In summary,
Saturday, June 2, 2007
SigTube: 5 Minute Presentations of Conference Papers
I'm proposing that along with every paper published in conference proceedings, we also create a 5-minute video presenting the highlights of the paper, and make the presentations available on the web for free. I'm calling this SigTube (mostly to encourage people to come up with a better name).
A 5-minute presentation (done well) can give quite a bit of information and insight about a publication, certainly more than the 100-word abstract or the paper's introduction. I know I would love to sit through a bunch of these from time to time and learn more about what's going on in my field, even in areas that are farther away from my main interest areas (in fact, probably mostly in such areas). A video also captures the enthusiasm and emphases of the speaker (not to speak of the fact that it preserves their youth for eternity!)
With today's infrastructure and technology, this is pretty much trivial to do. A conference can dedicate a person with a video camera who will film the videos during the conference. Alternatively, some authors may prefer to film the videos on their own and send them in.
Any takers?
Sunday, May 27, 2007
Me and Web 2.0
As of yesterday, I uploaded my first video to YouTube. The video shows my daughter (6 y/o) dancing nicely, with my son (18 months) "accompanying" her. As you can see, my son has already attained my level of dancing ability (my daughter passed me a long time ago).
I decided to try out Google MyMaps for fun. I created a map of my life and travels (and had fun doing so). Take a look -- (with the subtle implied hint that I'm happy to be invited to places not yet marked on the map).
Finally, together with Sihem Amer-Yahia from Yahoo!, I'm organizing a panel at VLDB 2007 (Vienna, September) on "Web 2.0 and data management". We have an exciting lineup of panelists that includes Anant Jhingran from IBM (who also blogs furiously), Gerhard Weikum (Max Planck Institute in Germany), Donald Kossmann (ETH Zurich) and AnHai Doan (U. of Wisconsin, Madison). I'm sure I'll be saying more about this panel on this blog, so stay tuned.
Circle of Blue - Eloquent Version
This is a (perhaps very rare) opportunity to directly compare the writing skills of a guy who regularly writes for the New York Times with those of a guy whose readership includes mostly database professionals.
Saturday, May 12, 2007
Circle of Blue
We were all hosted on the Pine Hollow estate, which is an amazing 30+ room mansion on the shores of Lake Michigan, a bit north of Traverse City. The home was built by Leslie Lee, and includes every amenity imaginable to man combined with excellent taste in design.
So what were we all doing there? This was basically a Circle of Blue powwow. Circle of Blue is a non-profit organization that is dedicated to raising the awareness of the public and policy makers to the diminishing supplies of clean and affordable fresh water. CoB tries to raise awareness through a combination of journalism, photography, film and data collection. Carl Ganter, the founder, is quite an amazing guy and among his other major accomplishments (e.g., being a photographer for National Geographic) tells a great story of how, through a great work of photography and journalism, he (and others) were able to exonerate a wrongfully convicted father and reveal the real murderer in a case in Illinois. He and his wife Eileen conceived and planned Circle of Blue.
There is no way I can do justice to the entire discussion in a short blog post, nor can I fully convey the tenacity and passion of the people gathered. I will also skip the many details on the major water issues facing our planet (but I will point out that water is one of the few main foci of Google.org, the Google Foundation). Instead, I'll just highlight a few points I found interesting from my perspective.
Our discussion focused on how exactly to leverage tools and technology to raise awareness on water issues. The ideas discussed were all over the map. They ranged from creating blue rings that everyone would put on their faucets (following Lance Armstrong's yellow rings for cancer fighting), to using Web 2.0 tools such as blogs, Google My Maps, Flickr, etc. to help people all around the world to create databases of water-related issues, and to mobilizing the religious right to take up the issue in their congregations.
In a sense, we were trying to figure out how to recreate the success of the green movement, but in blue. While there is much in common between global warming issues and water issues, there are also a few key differences between the two. First, in the case of green, there are some simple things everyone can do to help a global problem (e.g., buy a hybrid, go solar). In the case of water, aside from taking shorter showers and watering your garden more effectively, many of the major issues are of a local nature and the problems and solutions vary quite a bit. Second, the people suffering from water shortages at this point are typically far away and that makes it hard for the issue to be on people's minds constantly. New Orleans is much closer to home.
The other interesting point about the discussions was how to combine traditional media like journalism, film and photography with newer technology to create viral awareness of the water issues. While it's great to have the high-quality polished artifacts created by these media, we also need the bottom-up YouTube-type videos and blogs created by a much broader and geographically distributed set of people, but with much less skill (myself included...) to really reach people's attention. We need to collect good data, but mostly, make sure the data is used in effective ways for highlighting the issues and garnering world-wide attention.
This was a highly inspiring meeting for me. If you have any ideas, don't hesitate to post a comment, send me email, or contact Carl Ganter directly. I'm sure this topic will reappear on this blog.
The Slash Effect
The main contribution of this book is to get you thinking. Slashers are people who have multiple parallel careers. Through numerous examples, the book claims that this is a growing phenomenon in today's culture, and describes the challenges, opportunities and benefits having to do with slash careers. The point that I found most interesting about all of the above is that slashing essentially gives you multiple identities in society. Think of what you answer at a party when people ask you what you are or do. Being a slasher means you can give multiple answers, or choose one you think best suits the situation. But more than that, slashing means you gain some internal balance in life, rather than being tied to one professional identity.
Marci gives examples of lawyers turned writers & coaches (including herself), a teacher with a modeling career, a computer programmer who also directs a theater, a lawyer who's also a Baptist minister, Sanjay Gupta, the CNN health correspondent who also does surgery a few times a month, and the list goes on and on. She discusses how people manage multiple careers, some of the cross-over benefits and life-style benefits they obtain, and she offers practical advice on how to become a slasher. The book essentially revolves around all these examples, and every chapter ends with the highlights of its main points (great for future reference).
Being a somewhat formal guy on occasion (perhaps one of my slashes?) I found myself looking for a definition of a slash. Marci seems to focus on aspects of life that are part of your career (it doesn't actually matter whether you derive much income from it, otherwise most of the poets and actors would not have made it into the book). But, for example, does a hobby count as a slash? Does it depend on how much time one spends on the hobby? In fact, many jobs are composed of multiple slashes (e.g., professors spend half of their time teaching, half their time doing research, and the other halves raising research funding and sitting on committees).
Clearly, parenting is the most common form of parallel activity adults engage in. The book contains a chapter on parenting and how parenting and slashing share many challenges. The book even claims that a slash life can prepare you better for parenting (though clearly, some of the slashes may take a back seat for a while).
However, my search for a formal definition of slashing is missing the point. As I stated at the outset, the point of this book is to make you think about all the aspects of your life whether they count as slashes or not. Personally, the most common slash combination I've encountered (and personally experienced) is the professor/entrepreneur combo, and I can speak at length about the benefits and challenges there.
Finally, one point that was not addressed in the book is multiple careers that happen in sequence, rather than in parallel. Perhaps I'll take the opportunity to coin a new term: the double backslash (for those of you who haven't had the pleasure of using the LaTeX word processor, I should explain that a double backslash creates a new line in the text). I would think that slashing and double backslashing share many of the challenges and benefits.
In summary, this book is a rather quick read (you can skip parts, but pay attention to the boldfaced sentences). I found myself reflecting on my slash/double backslash riddled career and wondering what other slashes may come into my life at some point.
Sunday, April 29, 2007
Chinese poetry recital
This post is a few weeks late, but the parental pride is not diminished at all. My daughter Karina participated in the Northern California championship for Chinese poetry recital. This competition is organized by the association of Chinese schools of Northern California, which includes over 70 schools, and it has many categories (remember: this is all in Chinese, so I'm a bit sketchy on some details). To get to the regionals, she had to win her school competition. She was given several poems and in the competition she had to recite one of them. The competitors were judged mostly on pronunciation (and on not using their hands).
There were 21 participants in her category (she competed in the 5-7 age group, being on the very young side of that). Parents were not allowed in the room, or even to see the judges before the competition.
As the picture shows, Karina took first place! The victory was immediately celebrated by the biggest ice-cream she ever had, but suffice it to say that her maternal line (i.e., mom and grandmother) did not sleep that night out of sheer excitement! Clearly, this is one of Karina's achievements that I made absolutely no contribution to.
Pandu's blog
Pandu and I often IM each other (mostly for coordinating critical issues such as espresso consumption). I think the next step is for us to get MySpace accounts, and then we'll be Web 2.0 compliant. Who knows, maybe with such openness, our children will consider talking to us when they become teenagers!
From Darfur to Robotics
"A" is a 16-year-old student who escaped the killing in Darfur two years ago. He somehow got to Egypt, and from there crossed the Israeli border. Initially, he and the others with him were arrested. After a while, he was let out and joined the Kfar Yamin boarding school not too far from Haifa, Israel. "A" immediately became a star student at the school. One night, when he was walking around the school, he noticed the lights on in the science lab. He went in and saw a group of kids preparing for the FIRST regional robotics competition, and immediately fell in love with robotics. Shortly after, he became the leader of the group. "A" led the group to an impressive 4th place standing in the Israeli regional. He was disappointed because that was not enough to earn him a trip to Atlanta for the finals, but everyone else is still in awe of the huge step this young man made in such a short time. I bet Dean did not anticipate this story when he started this amazing establishment!
Sunday, April 22, 2007
Memorial Day
Today is Israeli Memorial Day. We remember the soldiers and civilians who were killed in wars and terrorist attacks during the history of Israel. In Israel, this day is taken very seriously (e.g., the notion of a Memorial Day Sale does not exist). As a striking example, at 11AM on this day, a siren is sounded throughout the country. Every person, and I mean, every single person, will stop what she or he is doing and will stand with respect for 2 minutes. If you're driving a car, you stop the car and stand outside. 2 minutes of complete stoppage. Immediately after the siren, the memorial services begin at all the cemeteries. In the evening, as the sun sets, the country turns from a day of mourning to a day of celebration of its independence. It's quite a striking transition.
For me this is always a very special day. My middle name, Yitzchak, is in memory of my uncle (my father's brother) who was killed in 1948 during the war of independence. He was 27 when he was killed, serving in the Israeli Air Force. I, of course, never got to meet him, but indirectly, he influenced my life quite a bit. He inspired my father to get into chemistry, which took him to academia. In growing up, I never questioned whether I'd end up as an academic or not.
Since moving to the Bay Area, I've been attending the Memorial Day ceremonies organized by the local Tzofim (the scouts). As part of the ceremony, they read out the names of fallen ones who have relatives in this area. They read the names in chronological order of their deaths. It struck me as I was sitting there tonight that I was sitting in anticipation to hear the end of the list, i.e., the new additions from last year. Unfortunately, due to the war in Lebanon in the summer of 2006, the list was indeed longer, and their stories heartbreaking as usual. Let us all hope these lists stop growing longer. There is too much unnecessary pain already in the region.
Sunday, April 8, 2007
Wikinomics
The main thesis of the book is that mass collaboration changes some fundamental aspects of running a business. Three forces have recently come together to create the perfect storm that facilitates Wikinomics: (1) technology (basically, Web 2.0 where anyone can contribute to anything), (2) the Net-Gen -- the generation of people who grew up collaborating (think: kids who view email as a thing only their parents do), and (3) the global economy, where companies are forced to reach out and collaborate to produce additional value (i.e., The World is Flat).
Perhaps the most succinct description of the principle underlying Wikinomics is a rephrasing of Coase's Law (which was coined around 1937 by an English socialist). The law says that a firm will expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying it out on the open market. For example, if you're a car company, if it's cheaper to manufacture your own tires than use an external supplier, then you will do so in house.
The observation is that the internet has lowered transaction costs so significantly that the right way to think of Coase's law nowadays is that firms should shrink until the cost of performing a transaction internally no longer exceeds the cost of performing it externally.
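To make the make-or-buy reasoning concrete, here is a minimal sketch in Python. It is my own illustration, not something from the book: the function name, the cost figures and the idea of adding a single "transaction cost" number to the market price are all assumptions chosen to keep the example small.

```python
def should_do_in_house(internal_cost: float,
                       market_price: float,
                       transaction_cost: float) -> bool:
    """Return True when doing the work inside the firm is cheaper than buying it.

    transaction_cost stands for everything it takes to use the open market
    (finding a supplier, negotiating, coordinating). All three numbers are
    hypothetical inputs, not data from the book.
    """
    return internal_cost < market_price + transaction_cost


# Hypothetical tire example: only the transaction cost differs between the two calls.
high_friction = should_do_in_house(internal_cost=100.0, market_price=90.0,
                                   transaction_cost=25.0)
low_friction = should_do_in_house(internal_cost=100.0, market_price=90.0,
                                  transaction_cost=2.0)
print(high_friction, low_friction)
```

With these made-up numbers the first call says to make the tires in house and the second says to buy them, which is the "firms should shrink" reading of the law: nothing changed except the cost of going to the market.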
The book then goes on to illustrate examples of the different aspects of Wikinomics:
Peer production: examples of great achievements created by large collections of collaborating peers: Wikipedia, Linux (and more importantly, IBM's embrace of the open-source community as an example of a firm doing the right thing).
Ideagoras: essentially, using the open market for research into your specific problems. The observation being that no matter how many researchers you employ, the person with the best ideas for a particular problem is likely not in your lab (the authors still emphasize the importance of internal R&D though). Here, the main examples are InnoCentive, a company that acts as an eBay for ideas, and Procter & Gamble, which was in quite a bit of trouble, but managed to tap external ideas to make a comeback.
Prosumers: companies that benefit from their consumers essentially developing their products. The big example here is Second Life, where the consumers create more than 99% of the content being transacted (and the consumers get the IP rights to anything they create!). Another example is Lego, which lets users create and share their Mindstorms creations. An interesting example is that of consumers tinkering with iPods (and Apple not standing in the way, as opposed to Sony not taking that approach).
The New Alexandrians: the creation of new data banks that enable research to proceed faster (the Human Genome Project). One of the interesting discussions there was about reshaping the relationships between universities and companies. The recently created Intel Lablets are an excellent example of that (and I predict this is one we'll see more of).
New platforms: Google Maps, need I say more? Ok, I will. Actually, Amazon was way ahead on creating platforms for others to build on, and today Yahoo, eBay, Google and Amazon are creating exciting platforms for others to create additional services on.
The Global Plant Floor: companies changing the way they interact with suppliers. Instead of Boeing sending exact specs for each part of a new plane, they let the suppliers design and innovate as much as possible. They also let them assemble much more of the components, and therefore, it now takes Boeing 3 days to assemble an airplane once all the parts have arrived, rather than 30. BMW is another example described here, along with Lifan, a Chinese company that is making waves in the motorcycle industry.
Finally, there is a discussion of how wikis really changed the possible interactions in corporations.
A few thoughts.
First, there is no doubt that there is a lot of evidence of the power and presence of Wikinomics presented in the book. While the examples were very good, they were still few. This leaves one with the feeling that Wikinomics may remain a fringe rather than mainstream in business.
Second, one may wonder whether we didn't hear all this in The World is Flat by Tom Friedman. Certainly, there are many interesting relationships between the two books. I think Friedman addressed a narrower aspect of the picture: basically that companies can distribute themselves across the world and work more aggressively with partner companies. Wikinomics goes one step further and discusses how companies should leverage the masses, not just partners, and how that affects the way we think of IP rights and communications within corporations.
Third, as a computer scientist, I wonder what CS has done about all of this. While we've been responsible for many of the technologies that created the tools mentioned in the book, I'm not sure we leveraged the tools ourselves to benefit our own research. Certainly, in education we haven't. In fact, Don Tapscott argues that there are very few industries that have changed very little in the past century, but education is one of them. It's still mostly built around teachers standing in front of students and lecturing. Food for thought.
Finally, there is a lot of discussion in the book on IP issues and generally, on how business relationships between companies should be structured. I think every lawyer should read this book, and hence, am moving the book to my wife's side of the bed.
Saturday, March 10, 2007
The J Curve
Had I started this two years ago, I would have definitely blogged about "The World is Flat" by Tom Friedman. Too late now. But I think this book should be required reading for anyone interested in data management (either in research or industry). Just read that book and imagine all the data management services we need to invent to support the flattening of the world. Of course, the book has other merits too.
The J Curve, by Ian Bremmer, is essentially a framework for discussing the stability and openness of countries and how you can understand political events in the context of moving on the curve. You need to imagine the letter J rotated about 45 degrees clockwise to get the full effect. The X axis of the J curve is the degree of openness of a country (e.g., travel restrictions, freedom of the media, economic openness, the presence of independent political institutions). The Y axis is the degree of stability of the country (i.e., whether certain events would cause great chaos or not). If you're on the top left of the curve, you're closed but stable (e.g., North Korea). If you're on the top right, you're open and stable (e.g., USA, western Europe). If you're China, you pose an interesting challenge to the curve (more on that in a moment).
One of the main points the book makes is that for countries to go from the left to the right they will have to first go down the curve and therefore suffer some considerable instability. The world can help these countries by raising the entire curve, i.e., make the depths of the curve stable enough so countries will survive the transition. (In practice, that observation lets Bremmer criticize many of the policies the USA has taken w.r.t. some countries).
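For readers who think better in code, the shape of the curve can be sketched as a toy function. To be clear, the book gives no formula: the parabola, the 0-to-1 openness scale, the position of the trough and the `lift` parameter below are all my own assumptions, picked only to reproduce the qualitative picture (closed and stable on the left, a dip in the middle, open and more stable on the right).

```python
def stability(openness: float, lift: float = 0.0, trough: float = 0.35) -> float:
    """Toy J curve: stability is lowest at `trough` and rises on both sides.

    openness runs from 0.0 (fully closed) to 1.0 (fully open). `lift` shifts
    the whole curve up, which stands in for "raising the entire curve".
    The formula is an illustrative assumption, not Bremmer's.
    """
    if not 0.0 <= openness <= 1.0:
        raise ValueError("openness must be between 0 and 1")
    return lift + (openness - trough) ** 2


closed = stability(0.0)    # left end of the curve
bottom = stability(0.35)   # the dip a country passes through
opened = stability(1.0)    # right end of the curve

print(closed > bottom, opened > closed)
print(stability(0.35, lift=0.1) > bottom)
```

Because the trough sits left of center, the right end comes out higher than the left end, which is what makes it a J rather than a U. Passing a positive `lift` raises the bottom of the dip, the toy version of making the transition survivable.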
The book goes through a few examples of countries in each part of the curve. It starts with North Korea, Cuba and Saddam's Iraq as examples of stable but closed. It discusses Iran, Saudi Arabia and Russia as countries that have the potential of sliding down the curve from the left. He shows South Africa as an example of a country that made it through the transition successfully and Yugoslavia as one that didn't. He takes Turkey, Israel and India as examples of countries founded on the right hand side of the curve and who have maintained it that way (though they do face challenges going forward). Finally, there is a chapter on China, where Bremmer argues that despite its economic openness, China is still on the left side of the curve.
What I liked most about this book are the brief yet insightful summaries of the relevant history of each of the countries discussed. The summaries give you the background for why things are the way they are now and let you understand better the challenges facing the countries. I'm finding that it's easier for me now to put current news into context, and in fact, that the curve does give a pretty good framework for thinking about today's world events.
That being said, the chapter on the country I am most familiar with, Israel, was a bit disappointing, so perhaps people from other countries would say the same about their respective chapters. I enjoyed reading about all the complexities in Yugoslavia, though I wished he would have spent a page or two describing some of the main events of the war there (he stopped just before it saying it would be too complicated).
When I discussed the book with Donald Kossmann, he wondered whether corporations can also be classified according to their openness. For example, a company that has a very closed set of programmatic interfaces to its products (and I won't name the names he mentioned) may be considered a North Korea of companies. An interesting thought to develop.
In summary, I would definitely recommend this book. If you're an expert on world affairs and attended all your history classes in school, you may find less of a payoff from reading it.
I'm moving on now to Wikinomics.
Monday, January 29, 2007
Uncertainty and data integration
Recently, there has been renewed interest in building database systems that handle uncertain data and its lineage in a principled way. The Trio Project at Stanford and the MystiQ Project at the University of Washington are just two examples (I collaborated for a while on the former, and watched the latter up close while I was a professor at UW).
I think this is a great research area and certainly (no pun intended) a very timely one. I want to make two points though (one of which may raise Jennifer Widom's blood pressure).
First, I think data integration is the killer application of uncertainty and lineage (ok, maybe there is a second -- sensor networks). Fundamentally, data is uncertain when it comes from external sources and some of the transformations it went through on the way are not necessarily correct.
In fact, I think one of the greatest challenges for data integration research is to build data integration systems that deal gracefully with uncertainty (uncertainty can be about the underlying data, the schema mappings and the mapping of keyword queries to structured queries). If you have good ideas about this, please do contact me.
My second point is that there is really no argument here. In fact, I believe that once a database system is able to model and process uncertain data and its lineage, much of the distinction between traditional database systems and data integration systems goes away.
Specifically, by modeling data lineage and the fact that it may be uncertain, you're admitting that the data came from somewhere, and that you're not sure about the transformations that brought it into the database or about its intrinsic meaning. That's exactly what data integration is about -- modeling data that comes from multiple sources. Unlike ordinary databases, where the data might as well have been born in the database because you know nothing about its past, databases with uncertainty and lineage admit that data had a prior life.
So then what's left of the difference between databases and data integration systems? Mostly issues having to do with query processing over remote sources.
I should emphasize -- I'm not claiming that these problems are solved (quite the contrary, see my comment above about data integration with uncertainty). But I do find it quite appealing that a database system models the fact that data came from the outside. That's the way it typically is in the real world, and it's about time databases realize it too.
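The notion of data with a prior life can be sketched in a few lines of Python. This is only an illustration under my own assumptions, and it is not how Trio or MystiQ represent data: the `Fact` class, the helper functions, the source and step names, and the choice to multiply confidences (which assumes each step errs independently) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A value that remembers how sure we are of it and where it came from."""
    value: str
    confidence: float      # between 0.0 and 1.0
    lineage: tuple = ()    # ordered record of the source and transformations


def from_source(value: str, source: str, confidence: float) -> Fact:
    """Admit a value from an external source, recording that source."""
    return Fact(value, confidence, (f"source:{source}",))


def transform(fact: Fact, step: str, new_value: str, reliability: float) -> Fact:
    """Apply a possibly incorrect transformation and extend the lineage.

    Multiplying by `reliability` is a simplifying assumption: it treats the
    step's errors as independent of the errors already in the data.
    """
    return Fact(new_value,
                fact.confidence * reliability,
                fact.lineage + (f"step:{step}",))


# Hypothetical example: a city name pulled from a feed, then normalized.
raw = from_source("Seattle, WA", source="listings_feed", confidence=0.9)
cleaned = transform(raw, step="normalize_city", new_value="Seattle",
                    reliability=0.8)
print(cleaned.value, round(cleaned.confidence, 2), cleaned.lineage)
```

An ordinary database row would keep only the final value. Here the cleaned record still says which feed it came from and which step touched it, and its confidence is lower than the raw record's because the transformation itself might be wrong.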
Structured data and the web
I recently published two related papers on this topic, one at CIDR 2007 and one in the Data Engineering Bulletin (go to Page 19 of the issue). You can read the papers for the details, but I'd like to highlight two key points from these papers that should be kept in mind when researching this area:
- Integration: Whenever you cook up an idea about how to improve web search by leveraging structured data, or by automatically structuring data on the web, you need to keep in mind how your technique will integrate with other web searches. Users want to go to a single search box to find all their results. So whatever technique you come up with needs to mesh well with other techniques used by the underlying engine.
- Data about everything: Many ideas work well if the domain of the data is constrained (e.g., you know you're building a portal to search for cars, housing or job listings). But on the web, data is about everything. There is no domain or set of domains that covers all data on the web. In fact, it's not even clear when one domain ends and another one begins. So try to imagine what it's like to deal with data about everything. That changes a lot in the way you think about a problem!
Ein Gedi
While in Israel, I went for a hike in Ein Gedi, one of my all-time favorite places. The trip was organized by Tova Milo's database group at Tel-Aviv University. Check out the pictures from Ein Gedi (and a few others).
While Ein Gedi has always been a place for me to find complete peace, staring into the Dead Sea, this time my BlackBerry made it a slightly different experience.
Sunday, January 28, 2007
Introduction
The posts on this blog will either be about work (i.e., data management ideas) or my family. No politics (you probably don't want to hear it anyway), but possibly a passing comment on coffee or other exciting events.
By way of background, until recently I was a professor at the University of Washington. I moved to Google in September 2005 and lead a group that looks at how structured data can be used in Web search. As I publish papers about this work, I'll summarize them here. For all my publications prior to coming to Google you can check out my UW web site.
One of the goals of this blog is to get people in the data management community to share novel ideas and discuss them. While technical results are well served by published papers, Web 2.0 gives us the opportunity to discuss ideas outside our conferences quite easily.