Today we feature an interview with Frank W. Miller, Ph.D., developer of the CornfedSIP User Agent VoIP client for Linux. We discuss the merits of VoIP, its problems and its future.
1. Why another softphone?
Frank W. Miller, Ph.D.: *Most* of the softphones out there have two attributes: 1) they are tied to a particular service and 2) they run *best* on Windows. The Cornfed SIP User Agent is service independent and targeted specifically at Linux.
2. What made you want to do this?
Frank: I’m a startup guy. This is my third Voice-over-IP (VoIP) startup. The first two were focused on VoIP gateways, first in the core, then closer to the edge. I believe in voice as a basic means of communications. I also believe in the power of the edge that is inherent in the Internet. Those two beliefs lead one to the VoIP user client.
3. Do you think that eventually all phone calls will be carried out with softphones?
Frank: If you mean softphone in the traditional sense of a program running on my computer, the answer is no. There are too many other useful voice communications form factors. If you mean some piece of software that implements voice communications protocols running on any device from a traditional handset to a portable device to a client program on my computer or TV or something, the answer is yes. There will come a time when traditional, dumb, two-wire phones are gone. It might be a hundred years from now, but they will eventually disappear.
4. Do you feel that most softphones are forked from the standard, like in the Skype case?
Frank: One thing I’ve learned is that you cannot be religious about protocols. The vast majority of Skype users have no idea and don’t really care what actual protocols are being used. What they see is an easy-to-use program that lets them make voice calls with no fuss.
There are at least three interesting VoIP protocols right now: Skype (which is closed), IAX2 (which is fairly Asterisk-specific at the moment), and SIP. Because Skype is closed, you can’t implement it in your softphone very easily. IAX2 is interesting but lacks the mindshare of SIP. The vast majority of VoIP equipment and services being deployed today are based on SIP and I think that will continue for the foreseeable future.
5. Do you feel that video calls will complement voice, or will eventually eliminate it?
Frank: I remember first hearing the term “video phone” when I was in grade school. They have always been “just around the corner.” There are two interesting observations about video phones: 1) a lot of people really don’t want you to see them when you’re talking to them and 2) video doesn’t “mix” like audio does.
On the first point, lots of people do other things when they’re talking on the phone. Video conversations require more attention than voice conversations. Also, the ability to not be seen by the other party is quite useful. When you couple this with the fact that a large percentage of the information conveyed in a video call happens with just the voice part, the need for video just isn’t necessarily compelling. It’s not about whether the technology can support it, it’s about whether people really want to do it.
The second point is also about the user experience. When you mix voices, the ear hears them all without the user having to do anything above what they would do for a single conversation. For video, “mixing” consists of mapping multiple windows so the user can dart from one to another with their eyes. It’s not as satisfying.
6. What are the biggest drawbacks for quality/performance when developing a softphone?
Frank: It’s very important to get the threading right. Softphones have lots of things going on at once and getting all the concurrency right can be a bit tricky. The User Interface and handling the media require quick response times and there can be many signaling transactions in progress at the same time.
In a past life, I wrote operating system kernel code. I was pleasantly surprised (since I liked writing kernel code) at the analogies between the internals of a softphone and some of the concepts you find in a kernel.
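Frank does not describe how the Cornfed backend is actually threaded. Purely as an illustration of the concurrency concern he raises, here is a minimal sketch of one common pattern: producer threads (UI, signaling, media) post events into a bounded queue guarded by a mutex and condition variables, much like a kernel work queue. All names here (`event_queue_t`, `queue_push`, `queue_pop`, `signaling_thread`) are hypothetical and not taken from the Cornfed source.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical event kinds; not taken from the Cornfed code. */
typedef enum { EV_UI, EV_SIGNALING, EV_MEDIA } event_kind_t;

typedef struct {
    event_kind_t kind;
    int payload;                 /* placeholder for real event data */
} event_t;

#define QUEUE_CAP 16

/* Bounded ring buffer shared between threads. */
typedef struct {
    event_t items[QUEUE_CAP];
    size_t head;                 /* index of the oldest event */
    size_t count;                /* number of queued events */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
    pthread_cond_t not_full;
} event_queue_t;

void queue_init(event_queue_t *q) {
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full, then appends the event. */
void queue_push(event_queue_t *q, event_t ev) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = ev;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty, then removes the oldest event. */
event_t queue_pop(event_queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    event_t ev = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return ev;
}

/* Example producer: posts three signaling events with payloads 0, 1, 2. */
void *signaling_thread(void *arg) {
    event_queue_t *q = arg;
    for (int i = 0; i < 3; i++) {
        event_t ev = { EV_SIGNALING, i };
        queue_push(q, ev);
    }
    return NULL;
}
```

A consumer would call `queue_init`, start `signaling_thread` with `pthread_create`, and loop on `queue_pop`. Funneling all state changes through one consumer keeps call state in a single thread, which is one way to avoid the trickier locking problems.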
7. Do you have plans to port the front end to your SIP library to other operating systems?
Frank: The implementation is broken into two parts: 1) a backend library that implements the basic softphone functions (e.g. call setup and teardown, registrations) and 2) two client frontends (a command-line client and a Gnome GUI client).
The backend is written in highly portable C. This was done on purpose specifically to allow it to be ported to other platforms easily.
The frontend clients were originally written to drive and test the backend only. However, the Gnome client has become something of an experiment in usability. It has some interesting User Interface elements that facilitate multiple simultaneous registrations with different Service Providers and contact management. While the actual Gnome code may not be portable, it would certainly be interesting to port the basic UI design to other platforms.
8. Why has Instant Messaging voice support (e.g. AOL, MSN) not succeeded so far as much as Skype or Gizmo have?
Frank: I think the jury is still out on this. I use IM all the time. I have also moved from IM to a voice call relatively frequently. However, most of my IM happens at work, and I just pick up the phone sitting next to me and call whomever I am IMing at the time; this despite the fact that Microsoft Messenger includes the ability to convert the IM session to a VoIP call.
I’ve thought about this a bit. Why do I pick up the handset instead of clicking a button? Well, the first thing is, I don’t have a “handset” connected to my laptop most of the time. Second, I’m just in the habit of using the phone for voice conversations.
For the handset point, I suppose having good automatic built-in handset support and/or a decent handset I can plug into my laptop consistently would solve that problem. For the habit point, I may be too old to change. Perhaps the IM kiddies will start to hit the button instead…
9. Do you have plans to support USB handsets (their dialing buttons, etc.)?
Frank: One of the interesting things about the Cornfed UI is that the dialpad is not used for dialing; it’s there to allow a user to inject DTMF into a call in progress only. That said, to support the USB handset digit pad (for dialing or DTMF), the Cornfed client would need only an interface that provides digit events that could be hooked into the backend in a manner similar to the way the UI dialpad digits are currently handled. This will likely require ALSA to support generating these events. I don’t think this is currently supported and don’t know when it will be.
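The kind of digit-event interface described above might look roughly like the following. This is a sketch under stated assumptions, not the Cornfed API: `digit_event_t`, `backend_t`, and `backend_handle_digit` are invented for illustration, and the "injection" simply records digits in a buffer instead of generating real DTMF. The point is that a UI dialpad and a USB handset could feed the same backend entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event source tags; not part of any real Cornfed header. */
typedef enum { DIGIT_SRC_UI_DIALPAD, DIGIT_SRC_USB_HANDSET } digit_source_t;

typedef struct {
    digit_source_t source;   /* where the key press came from */
    char digit;              /* '0'-'9', '*' or '#' */
} digit_event_t;

/* Stand-in for backend call state (an assumption for this sketch). */
typedef struct {
    int in_call;             /* nonzero while a call is in progress */
    char dtmf_sent[32];      /* digits "injected" so far, NUL-terminated */
    size_t dtmf_len;
} backend_t;

void backend_init(backend_t *be) {
    assert(be != NULL);
    memset(be, 0, sizeof *be);
}

static int is_dtmf_digit(char c) {
    return (c >= '0' && c <= '9') || c == '*' || c == '#';
}

/*
 * Single entry point for digit events from any source.
 * Returns 0 if the digit was accepted as DTMF, -1 if it was rejected
 * (no call in progress, invalid digit, or buffer full).
 */
int backend_handle_digit(backend_t *be, const digit_event_t *ev) {
    assert(be != NULL && ev != NULL);
    if (!be->in_call || !is_dtmf_digit(ev->digit))
        return -1;
    if (be->dtmf_len + 1 >= sizeof be->dtmf_sent)
        return -1;
    be->dtmf_sent[be->dtmf_len++] = ev->digit;
    be->dtmf_sent[be->dtmf_len] = '\0';
    return 0;
}
```

In this sketch the backend ignores `source` when handling DTMF, so adding a new input device would only require something that produces `digit_event_t` values. Using the same events for dialing would need a separate path, which is left out here.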
10. What features of CornfedSIP should we expect in the future, and what does the future hold for SIP in general?
Frank: I’ll break this into two parts: 1) the backend library and 2) the client frontends. For the backend, additional features will primarily take the form of additional protocol features (PRACK, SUBSCRIBE/NOTIFY), additional codecs (e.g. GSM, iLBC), and additional DNS support. The clients will primarily evolve to support these features. The CLI will continue to be supported at the same feature level as the GUI client.
There are also some client-type functions that will be added in the near term. Recording of calls, playing of recorded sounds into a call, and voice mail are all good examples of this.
The client probably won’t support media other than audio, e.g. IM and video, in the near future. Instead, integration of other audio functions, such as listening to Internet radio or other streaming audio and mixing them together, is more in line with an audio client.