1. Could you describe the Cyc technology in a few simple terms? How does it all work?
Cyc is like an unabridged dictionary for computers. Software applications can use Cyc to look up what something means and then react intelligently.
A lot of what computers do today is simply push strings of text around. They react to the presence of certain strings by assembling other strings and then blindly displaying them. They do this, of course, without any real sense of the underlying meaning. This is all that happens, for example, in social tagging applications.
In the pursuit of intelligent behavior, software may also use math. Google's PageRank and Amazon's product recommendations appear semi-intelligent in the "choices" they make through various statistical algorithms.
Unlike Google and Amazon, Cyc explicitly represents knowledge. It relies on concepts, facts and rules to perform reasoning that is transparent and traceable. There are no "magic" algorithms that leave one wondering how something happened. One can trace and debug Cyc's reasoning every step of the way. And, since this chain of thought is both transparent and manipulable, one can later layer in new tactics and strategies for solving problems as they become available.
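To make "transparent and traceable" concrete, here is a minimal sketch in Python of reasoning over explicit facts and rules while recording why each conclusion was reached. It is a toy illustration only: the names (`infer`, `explain`, the `isa` tuples) are assumptions made for this example and are not Cyc's engine, API, or CycL syntax.

```python
# Toy illustration only: not Cyc's inference engine and not CycL.
# A fact is a tuple (predicate, subject, class).
facts = {("isa", "Fido", "Dog")}

# A rule (premise, conclusion) means: anything that isa premise also isa conclusion.
rules = [("Dog", "Mammal"), ("Mammal", "Animal")]


def infer(facts, rules):
    """Apply the rules repeatedly until no new fact appears; record each derivation."""
    known = set(facts)
    trace = {}  # derived fact -> (supporting fact, rule used)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for fact in list(known):
                pred, subject, cls = fact
                if pred == "isa" and cls == premise:
                    new = ("isa", subject, conclusion)
                    if new not in known:
                        known.add(new)
                        trace[new] = (fact, (premise, conclusion))
                        changed = True
    return known, trace


def explain(fact, trace):
    """Walk back from a derived fact to the given facts, one step per line."""
    steps = []
    while fact in trace:
        support, rule = trace[fact]
        steps.append(f"{fact} because {support} and rule {rule[0]} -> {rule[1]}")
        fact = support
    return steps


known, trace = infer(facts, rules)
for step in explain(("isa", "Fido", "Animal"), trace):
    print(step)
```

Because every derived fact keeps a pointer to the fact and rule that produced it, any conclusion can be unwound step by step, which is the property the answer above describes.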
Cyc's concepts are organized into one big "ontology". An ontology is like a taxonomy, but with much richer interconnections between terms. This ontology piece is very useful in its own right, apart from the reasoning engine. We expect it to become an important component of the approaching Semantic Web.
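The difference between a taxonomy and an ontology can be sketched in a few lines of Python. In this hypothetical example, the `is_a` links alone would form a taxonomy; the additional typed relations (`has_part`, `eats`) are the "richer interconnections". The class, relation names, and terms are all assumptions for illustration and do not reflect how Cyc or OpenCyc stores its ontology.

```python
from collections import defaultdict


class Ontology:
    """Toy store of typed links between terms (illustrative only)."""

    def __init__(self):
        # (relation, source term) -> set of target terms
        self.edges = defaultdict(set)

    def add(self, source, relation, target):
        self.edges[(relation, source)].add(target)

    def related(self, source, relation):
        # Return a copy so callers cannot mutate the store.
        return set(self.edges.get((relation, source), set()))

    def ancestors(self, term):
        """Follow is_a links upward; this part alone is a plain taxonomy."""
        seen = set()
        stack = [term]
        while stack:
            current = stack.pop()
            for parent in self.related(current, "is_a"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


onto = Ontology()
onto.add("Dog", "is_a", "Mammal")
onto.add("Mammal", "is_a", "Animal")
onto.add("Dog", "has_part", "Tail")  # a non-taxonomic relation
onto.add("Dog", "eats", "Meat")      # another non-taxonomic relation

print(onto.ancestors("Dog"))
print(onto.related("Dog", "has_part"))
```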
2. How difficult is it to build a complete general knowledge base and commonsense reasoning engine using the OpenCyc product compared to Cyc?
Complete? Well, that's a huge task with either of the tools. But Cyc has a wealth of facts and rules that are not part of the OpenCyc ontology. It also has natural language capabilities that are not in OpenCyc. The most complete commonsense reasoning engine will come from a combination of the two. OpenCyc's breadth (number of concepts) will outpace that of Cyc, but Cyc's depth (complex rules) will outpace that of OpenCyc. In any case, all of Cyc is now available under a research license, so anyone building software for non-commercial purposes can have access to everything that is in Full Cyc today. OpenCyc, for its part, can be incorporated into a commercial product for free.
3. Is it possible to integrate Cyc/OpenCyc with an existing operating system and use it as a basis for an OS that has full support for natural language? How far away are we from having advanced AI on our home computers and operating systems?
Many people have approached Cycorp about making an OS controlled by natural language. Usually, they were looking to layer Cyc on top of Linux, but didn't have the resources to pursue the project. It may take a Microsoft or Google to get seriously interested in doing this. Most large companies, however, get caught in a kind of incrementalism -- so we'll have to see if they'll take the leap.
4. How difficult/easy is it to integrate proper speech recognition into Cyc?
Cyc leverages existing speech-to-text technologies and contributes contextual understanding. So, text is the bridge between speech systems and Cyc. Speech systems produce text and Cyc translates the natural language into its own internal representation. When the speech-to-text processor has trouble choosing among ambiguous sounds, Cyc can help identify which possible meanings are more -- well, meaningful.
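As a rough sketch of that last point, suppose a speech-to-text system returns several candidate transcripts with acoustic scores that are nearly tied. A knowledge-based layer could prefer the candidate that makes more sense. The Python below is purely hypothetical: the `SENSIBLE` set stands in for real background knowledge, and the function names and scores are invented for this example rather than taken from Cyc or any speech product.

```python
# Hypothetical stand-in for background knowledge:
# (action, object) pairs that "make sense" together.
SENSIBLE = {("recognize", "speech"), ("wreck", "car")}


def plausibility(words):
    """Count ordered word pairs from the transcript that appear in SENSIBLE."""
    return sum(1 for a in words for b in words if (a, b) in SENSIBLE)


def pick(candidates):
    """Choose the transcript with the highest plausibility.

    candidates is a list of (transcript, acoustic_score) pairs; the
    acoustic score is used only to break ties in plausibility.
    """
    return max(candidates, key=lambda c: (plausibility(c[0].split()), c[1]))[0]


# Two similar-sounding hypotheses with nearly equal (made-up) acoustic scores.
candidates = [("wreck a nice beach", 0.51), ("recognize speech", 0.49)]
best = pick(candidates)
print(best)
```

Here the acoustically weaker candidate wins because it is the only one the toy knowledge base considers meaningful; text remains the only interface between the two systems.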
5. What is the biggest challenge in the development of Cyc?
Bridging the gap between the experts currently using the system and others who can potentially help.