I have already blogged about Quaero, the Franco-German(-European) project to develop a state-funded Internet search engine that aims at "taking up the challenge of Yahoo and Google" (in the words of French president Jacques Chirac). One to two billion euros are rumored to have been allocated to Quaero, but only a few details have been released so far. This week's Economist has a story (subscribers only) that, besides discussing the implications of state-funded industrial and tech development, outlines the project's ambitions. They aren't modest:
Search engines can retrieve image, audio and video files, in addition to text documents. But this is done by matching the user's keywords to a text description of the image, audio or video content. Quaero users will be able to search the internet with keywords in the usual way; but (also) to perform searches using pictures and sounds as query terms. “It's beyond Google,” says Marie-Vincente Pasdeloup of Thomson.
Quaero will allow users to search using a “query image”, not just a group of keywords. In a process known as “image mining”, software that recognises shapes and colours will then retrieve still images and video clips that contain images similar to the query image. (...) When Quaero finds an image without a description that matches a properly labelled image, it will append the description from the labelled image to the unlabelled one. This technique, called “keyword propagation”, will enrich the web linguistically: image descriptions in French, for example, will spread as they are tacked on to similar images, so that those images can also be retrieved by users who type in French keywords.
Meanwhile, in Germany, researchers (...) are developing Quaero's voice-recognition and translation technology (...). The idea is that this software will find audio files—such as political speeches or radio broadcasts—and then automatically transcribe and translate them into a number of European languages. The original audio files can then be found using keyword searches. In addition, speaker-identification software will allow users (via computer microphones) to search the internet for audio clips recorded in their own voices, or those of other speakers.
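To make the "keyword propagation" idea from the excerpt concrete, here is a minimal sketch in Python. Nothing has been published about how Quaero actually does this, so everything below is my own assumption: the images are represented by made-up feature vectors, similarity is plain cosine similarity, and the names `cosine_similarity`, `propagate_keywords` and the `threshold` parameter are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def propagate_keywords(labelled, unlabelled, threshold=0.9):
    """Copy descriptions from labelled images to similar unlabelled ones.

    labelled:   list of (feature_vector, description) pairs
    unlabelled: dict mapping an image name to its feature vector
    threshold:  assumed minimum similarity for a match (hypothetical value)

    Returns a dict mapping image names to the propagated description.
    Images with no sufficiently similar labelled image are left out.
    """
    propagated = {}
    for name, features in unlabelled.items():
        best_score, best_description = 0.0, None
        for labelled_features, description in labelled:
            score = cosine_similarity(features, labelled_features)
            if score > best_score:
                best_score, best_description = score, description
        if best_description is not None and best_score >= threshold:
            propagated[name] = best_description
    return propagated
```

With two labelled images, say `([1.0, 0.0, 0.0], "tour Eiffel")` and `([0.0, 1.0, 0.0], "chat noir")`, an unlabelled image with features `[0.9, 0.1, 0.0]` inherits the French description "tour Eiffel", while one with features `[0.5, 0.5, 0.7]` stays unlabelled because nothing is similar enough. The hard part, of course, is computing features that capture "shapes and colours" in the first place, which this sketch takes as given.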
To be continued.
UPDATE 3 Jan 07 - The German government has withdrawn from the project.