posted by Thom Holwerda on Mon 11th Mar 2013 14:51 UTC

History of handwriting recognition

The history of Palm itself most certainly doesn't extend as far back as the 19th century, as most of you will know. The company was founded in 1992 by Jeff Hawkins, joined by Donna Dubinsky and Ed Colligan, and those of you with a proper sense of history will probably know that, with a bit of effort, you could stretch Palm's history a bit further back to the late 1980s. At that time, Hawkins worked at GRiD, where he created the GRiDPad, one of the first tablet computers and the Palm Pilot's direct predecessor.

To understand Palm's history, you have to understand Hawkins' history. To understand Hawkins' history, you have to look at the technology that was at the very core of the Palm Pilot: handwriting recognition. This technology most certainly wasn't new when Hawkins started working on it, and as early as 1888, scientists and inventors were already working on the subject - in one way or another.

Before we move on, it's important to make a few distinctions in order to make clear what I mean by "handwriting recognition". First and foremost, there's the distinction between printed character recognition and handwritten character recognition, which I'm assuming is obvious. What's less obvious, perhaps, is that handwritten character recognition further breaks down into online and offline handwritten character recognition. The latter refers to - simply put - scanning handwritten characters and recognising them as such; this is used extensively by postal services to scan handwritten addresses.

With online handwritten character recognition, characters are recognised as they are written. You could do this in a variety of ways, but the one most of us are familiar with is using a stylus on a resistive touchscreen. However, it can also be done on a capacitive touchscreen, a graphics tablet, or possibly even a camera (I don't know of any examples, but it seems possible). This is the kind of handwriting recognition this article refers to.


First steps

Having said that, the history of handwriting recognition starts in the late 19th century. There were systems which may look like they employ handwriting recognition, the most prominent of which is probably the telautograph. This mechanical device was invented and patented by Elisha Gray - yes, that one - in 1888, and converted handwriting or drawings into electrical impulses using potentiometers, which were then sent to a receiving station, which recreated the handwriting or drawing using servomechanisms and a pen. As ingenious as this system is, it isn't handwriting recognition, because nothing's actually being recognised.

In 1914, a system was invented that is considered to be the first instance of handwritten character recognition. Hyman Eli Goldberg invented and patented his 'Controller', a device that converted handwritten numerical characters into electrical data which would in turn instruct a machine in real-time.

It's quite ingenious. I'm no expert in reading patent applications, and the older-style English and technical writing don't help, but the way I understand it, it's simple and clever at the same time. Characters are written using an electrically conductive ink. A 'contactor', consisting of six groups of five 'terminals' (so, six digits can be written) is then applied to the written ink. The electrically conductive ink of a character will connect the five terminals in a specific way, which creates circuits; in which way these terminals are connected by the ink depends on the shape of the character, thus creating various different circuits (see the below image). These different currents then give different instructions to the machine that's being controlled.

Neither of these systems employed a computer, so we're still a way off from handwriting recognition as we know it today. In addition, there were more systems - more and less advanced than what I've already described - but I'm not going to describe them all; the point I'm trying to make is that the idea of trying to control a machine using handwriting is an old idea indeed, with implementations dating back to the 19th century.

Now let's jump ahead to the late '50s and early '60s, and bring computing into the mix.


Stylator

Before we actually do so, we should consider what is needed to operate a computer using handwriting. It seems simple enough, but consider what computers looked like during those days, and it becomes obvious that a lot had to be done before we arrived at handwriting recognition on a computer. An input device was needed, a display to show the results, a powerful computer, and the intricate software to glue it all together.

The input device was the first part to come. In 1957, Tom Dimond unveiled his Stylator invention in a detailed article titled "Devices for reading handwritten characters". Stylator is a contraction of stylus and interpreter or translator, which should be a clear indication of what we're looking at: a graphics tablet with a stylus.

Stylator's basic concept isn't all that different from Goldberg's Controller. However, it improves upon it in several crucial ways, the most important of which is that instead of connecting terminal dots with conductive ink to create circuits, you're using a stylus to draw across a plastic surface with copper conductors embedded in it. The conductors are laid out in such a way that with just three lines consisting of seven conductors, all numerical characters can be recognised. The illustration below from Dimond's article is pretty self-explanatory.

As you can see, writing numerals 'around' the two dots will ensure the characters can be recognised. When the stylus crosses one of the conductors, the conductor is energised and the combination of energised conductors corresponds to a numeral. This system allows for a far greater degree of variation in handwriting styles than the Controller did, as you can see below with the numeral '3'.

The two-dot system can be expanded to four dots to accommodate all letters of the alphabet, but as you can see in the examples below, it does require a certain amount of arbitrariness in how to write the letters.

Alternatively, Dimond suggests, you can employ the sequence in which the conductors are energised to expand the two-dot system to also allow for recognising letters. It's also important to note that the Stylator tablet required you to manually clear the character recognition buffer by tapping the stylus on a separate area because Stylator has no way of knowing when a character is completed.
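The principle described above (crossed conductors stay energised, their combination selects a numeral, and the writer clears the buffer by hand) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the conductor labels `c1` to `c7` and the lookup table are hypothetical and are not taken from Dimond's article, whose real layout and mapping differ.

```python
# Hypothetical lookup table: which combination of energised conductors
# stands for which numeral. Dimond's real mapping is not reproduced here.
NUMERAL_TABLE = {
    frozenset({"c2", "c3"}): "1",
    frozenset({"c1", "c2", "c3"}): "7",
}


class StylatorSketch:
    """Toy model of a tablet that only knows which conductors were crossed."""

    def __init__(self, table):
        self.table = table
        self.energised = set()  # the "buffer" of conductors crossed so far

    def cross(self, conductor):
        # The stylus crosses a conductor, which stays energised until cleared.
        self.energised.add(conductor)

    def read(self):
        # Look up the current combination; None means "no numeral matches".
        return self.table.get(frozenset(self.energised))

    def clear(self):
        # Tapping the separate clear area: the writer, not the machine,
        # decides when a character is finished.
        self.energised.clear()
```

Because only the set of crossed conductors matters in this sketch, the order and exact shape of the strokes are irrelevant, which is why quite different ways of writing the same numeral can still be accepted. Using the sequence of crossings, as Dimond suggests for letters, would mean replacing the set with an ordered list.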

Dimond lists a number of possible uses for Stylator. "Several uses have been suggested for the Stylator. It is a competitor for key sets in many applications. It has been successfully used to control a teletypewriter. It is attractive in this application because it is inexpensive and does not require a long period for learning to use a keyboard," Dimond writes, "If the criterial areas are used to control the frequency of an oscillator, an inexpensive sending device is obtained which may be connected to a telephone set to send information to remote machines."

There are several key takeaways from Dimond's Stylator project, the most important of which is that it touches upon a crucial aspect of the implementation of handwriting recognition: do you create a system that tries to recognise handwriting, no matter whose handwriting it is - or, alternatively, do you ask that users learn a specific handwriting that is easier for the system to recognise? This would prove to be a question critical to Palm's success (but it'll be a while before we get to that!).

In the case of the former, you're going to need very, very clever software and a very sensitive writing surface. In the case of the latter, you're going to need very simple letters and numerals with as few strokes as possible to make it easy to learn, but the recognition software can focus on just that specific handwriting, greatly reducing its complexity. Stylator clearly opted for the latter due to hardware constraints.

The Stylator, while a huge leap forward over earlier systems, was still quite limited in what it could do. To really make handwriting recognition a valid input method, we need more. Let's make another leap forward, and arrive at a system consisting of a graphics tablet, CRT display, recognition software, and a user interface - essentially a Palm Pilot the size of a room.


The holy GRAIL

Over the course of the 1960s, the RAND Corporation worked on something called the GRAIL Project, short for the Graphical Input Language Project. The description of the project is straightforward: "A man, using a RAND Tablet/Stylus and a CRT display, may specify and edit a computer program via flowcharts and then execute it. The system provides relevant feedback on the CRT display." The entire project is detailed in a three-part final report, and was sponsored by the Advanced Research Projects Agency (ARPA or DARPA, it's been renamed quite a few times) of the US Department of Defense.

The GRAIL Project was part of a broader interest in the industry at the time in human-machine interaction. GRAIL is an experiment in using a tablet and stylus to create computer programs using flowcharts - and in doing so, includes online handwriting recognition, a graphical user interface with things like resize handles, buttons, several system-wide gestures, real-time editing capabilities, and much more.

Let's start with the RAND Tablet/Stylus. I think some of you may have heard of this one before, especially since it was often quoted in articles about the history of tablets published after the arrival and ensuing success of Apple's iPad. The RAND tablet is a massive improvement over the Stylator, and would be used in several other projects at RAND - including GRAIL - even though it was originally a separate research project, also funded by DARPA. As was the case with many other RAND projects at the time, a detailed report on it was written, titled "The RAND Tablet: a man-machine graphical communication device". The summary neatly details the device:

The Memorandum describes a low-cost, two-dimensional graphic input tablet and stylus developed at The RAND Corporation for conducting research on man-machine graphical communications. The tablet is a printed-circuit screen complete with printed-circuit capacitive-coupled encoders with only 40 external connections. The writing surface is a 10"×10" area with a resolution of 100 lines per inch in both x and y. Thus, it is capable of digitizing >10^6 discrete locations with excellent linearity, allowing the user to "write" in a natural manner. The system does not require a computer-controlled scanning system to locate and track the stylus. Several institutions have recently installed copies of the tablet in research environments. It has been in use at RAND since September 1963.
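The figures in that summary can be checked with a little arithmetic. The sketch below uses only the numbers quoted above; the bits-per-axis value is derived here for illustration and is not a figure taken from the memorandum.

```python
SIDE_INCHES = 10       # the writing surface is 10" x 10"
LINES_PER_INCH = 100   # resolution in both x and y

# Number of distinguishable lines along one axis.
lines_per_axis = SIDE_INCHES * LINES_PER_INCH

# Every (x, y) pair of lines is one addressable location.
locations = lines_per_axis ** 2

# Smallest number of bits that can label every line on one axis
# (derived for illustration, not quoted from the memorandum).
bits_per_axis = (lines_per_axis - 1).bit_length()

print(lines_per_axis, locations, bits_per_axis)
```

So 1000 lines per axis give a million locations, and ten bits per axis are enough to label each line, since ten bits can distinguish 1024 values.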

As I already mentioned, during those times a lot of research went into improving the way humans interacted with computers. After coming to the conclusion that the then-current interaction models were suboptimal for both computer and user, scientists at RAND and elsewhere wanted to unlock the full potential of both user and computer. A number of these projects were "concerned with the design of 'two-dimensional' or 'graphical' man-computer links" (in other words, the first shoots of the graphical user interface).

From the very beginning, RAND focussed on exploring the possibilities of using "man's existent dexterity with a free, pen-like instrument on a horizontal surface". This focus led to the eventual creation of the RAND Tablet, which was, as we already saw in the description in the summary above, quite advanced. The technical workings are slightly beyond my comfort zone (I'm no engineer or programmer), but I believe I grasp the general gist.

The tablet consists of a sheet of Mylar with printed circuits on each of its two sides; the top circuit contains lines for the x position, while the bottom circuit contains lines for the y position. These lines are pulsed with negative and positive pulses, which are picked up by a stylus with a high input impedance. Each x and y position consists of a specific sequence of negative and positive pulses; negative pulses are zeros and positive pulses are ones, which, when combined, lead to a Gray-pulse code for each x,y position. These can then be fed into a computer where further magic happens.
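To make the encoding a little more concrete, here is a minimal sketch of a Gray code in Python. It assumes the standard binary-reflected Gray code and writes the most significant bit first; the source only says "Gray-pulse code", so both choices are assumptions, and the function names are made up for this example. The defining property is that neighbouring positions differ in exactly one pulse, so a stylus sitting between two lines can be off by at most one position.

```python
def to_gray(n: int) -> int:
    """Convert a position number to binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Convert a Gray code back to the ordinary position number."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def pulse_train(position: int, bits: int) -> str:
    """Render a position as pulses: '+' for a one, '-' for a zero."""
    code = to_gray(position)
    return "".join(
        "+" if (code >> i) & 1 else "-" for i in reversed(range(bits))
    )


# The simplified 8x8 tablet needs 3 bits per axis.
for line in range(8):
    print(line, pulse_train(line, 3))
```

Decoding a stylus reading is then `from_gray` applied separately to the x pulses and the y pulses.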

This is just a basic description of how the system works, greatly simplified and based on a very simple, 8×8-line version of the RAND Tablet used in the article for explanatory purposes. There are a lot more interesting things going on deeper in the system (such as ignoring accidental movements), and if you want to know more technical details I highly recommend reading the article - it's quite readable.

The tablet itself was not a goal per se; it was a means to an end, with the end being to make it easier for humans to interact with computers. With this in mind, the RAND tablet would return to the forefront several years later, when RAND unveiled the GRAIL Project. At OSNews and other places, you've probably heard a lot about Douglas Engelbart's NLS, the revolutionary work done at Xerox PARC, and the first commercially successful graphical user interfaces developed at Apple (the Macintosh), Commodore (AmigaOS), and Digital Research (GEM). Yet, I've never seen or heard anything about GRAIL, and to be honest, that's a shame - because it's bloody amazing.

I will provide a summary on what the GRAIL Project entails, but for those of you interested in the nitty-gritty, I advise you to read all three in-depth articles on the project (a total of 126 pages, so grab a coffee) and simply skip my summary:

  1. The GRAIL Project: an experiment in man-machine communications
  2. The GRAIL language and operations
  3. The GRAIL system implementation

The goal of the GRAIL Project was to develop a 'common working surface' for both human and computer - a CRT display. They concluded that the flexibility of the output (the CRT display) should be matched by the flexibility of the input, so that direct and natural expression on a two-dimensional surface was possible, and that's - obviously - where the RAND Tablet comes back into play. The project had four design objectives:

  1. to use only the CRT and the tablet to interpret stylus movement in real-time
  2. to make the operations apparent
  3. to make the system responsive
  4. to make it complete as a problem-solving aid

This led them to the creation of a graphical programming language which uses flowcharts as a means for the user to instruct the computer to solve problems. The flowcharts were drawn by hand on the tablet, and would appear on the screen in real-time. Much like Ivan Sutherland's Sketchpad, the user could draw a 'messy' shape (say, a rectangle), and the computer would replace it with a normalised variant. He could then manipulate these shapes (resize, move, alter) and connect them to create a flowchart. He could also write on the tablet, and have it appear on the screen - and much like the rectangle, the computer would recognise the handwritten characters, and turn them into normalised characters.

To facilitate the interactions, a dot on the display represented the position of the stylus on the tablet, and real-time 'ink' was drawn on the display whenever the stylus was pressed onto the tablet. The tablet surface corresponds 1:1 with the display surface. These three elements combined allowed the user to remain focussed on the display at all times - clearly an intermediary step towards modern high-end graphics tablets which combine pressure sensitive digitisers and styluses with displays.

The system also contained several elements which would return in later user interfaces, such as buttons and resize handles, and would even correct the user if he drew something 'unacceptable' (e.g., drawing a flow from one symbol to another if such a flow was not allowed).

Thanks to the wonder of the internet and YouTube, we can see GRAIL in action - and narrated by Alan Kay. Kay even states in the video that one of the window controls of the Mac was "literally" taken from GRAIL.

The GRAIL Project also introduced several gestures that would survive and be used for decades to come. The caret gesture was used to insert text, a scrub gesture to delete something, and so on. These gestures would later return in systems using the notebook UI paradigm, such as PenPoint OS and Newton OS.

The biggest challenge for the GRAIL Project engineers was to ensure everything happened in real-time, and that the system was responsive enough to ensure that the user felt directly in control over the work he was doing. Any significant delay would have a strong detrimental effect on the user experience (still a challenge today for touch-based devices). The researchers note that the computational costs for providing such accurate user feedback are incredibly high, and as such, that they had to implement several specialised techniques to get there.

For those that wish to know: the GRAIL Project ran on an IBM System/360 Model 40-G with two 2311 hard disks as secondary storage and a Burroughs Corp. CRT display, and the basic operating system was built from scratch specifically for GRAIL. Despite the custom nature of the project and the fact that the System/360 was available to them on an exclusive basis, the researchers note that the system became overloaded under peak demands, illustrating that the project was perhaps a bit too far ahead of its time. At the same time, they also note that ways of distributing the processor's load more evenly were being investigated.

While those of you interested in more details and the actual workings at lower levels can dive into the three articles linked to earlier, I want to focus on one particular aspect of the GRAIL Project: its handwriting recognition. I was surprised to find just how advanced the recognition system was - it moved beyond 'merely' recognising handwritten characters, and allowed for a variety of gestures for text editing, as well as automatic syntax analysis to ensure the strings were valid (this is a programming environment, after all).

To get a grip on how the recognition system works, we have to step away from the GRAIL Project and look at a different research project at RAND. The GRAIL articles treat handwriting recognition rather matter-of-factly, referring to this other project, titled "Real-time recognition of handprinted text", by Gabriel F. Groner, from 1966, as the source of their technology.

The RAND Tablet had already been developed, and now the task RAND faced was to make it possible for characters handwritten 'on' the tablet to be recognised by a computer so they could be used for human-machine interaction. The researchers eventually ended up with a system that could recognise the upper-case Latin alphabet, numerals, and a collection of symbols. In addition, the scrubbing gesture (for deletion) mentioned earlier was also recognised.

There were a small number of conventions the user had to adhere to in order for the character recognition software to work properly. The letter O had to be slashed to distinguish it from the numeral 0, the letter I needed serifs to distinguish it from the numeral 1, and Z had to be crossed so the system wouldn't confuse it with the numeral 2. In addition, characters had to be written separately (so no connected script), and cursive elements like curls had to be avoided.

Already recognised text could also be edited. Any character already on the screen could be replaced simply by overwriting it (remember, the tablet and display corresponded 1:1). In addition, characters could be removed by scrubbing them out.

So, let's get to the meat of the matter. How does the actual recognition work? The basic goal of a handwriting recognition system is fairly straightforward. You need the features which are most useful for telling one character apart from the other; features which remain fairly consistent even among variations of the same character, but differ among the various characters. In other words, you want those unique features of a character which are always the same, no matter who writes the character.

First, you need to get the actual data. As soon as the stylus is pressed on the tablet's surface, a switch in the stylus is activated, which signals the recognition system that a stroke has been initiated. During a stroke, the recognition system is notified of the position of the stylus every 4 ms (each position is accurate to within about 0.127 mm). When the stylus is lifted off the surface of the tablet, the recognition system is notified that the stroke has ended.

The set of data points received by the recognition system is then smoothed (to reduce noise) and thinned (to remove unnecessary data points). The exact workings of smoothing and thinning are defined by a set of formulas, for which I refer you to the article (I don't understand them anyway). In any case, the goal is to reduce the amount of processing required by reducing the number of individual data points.
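For readers who want a feel for what smoothing and thinning do, here is a minimal sketch. It is not a transcription of Groner's formulas: it swaps in a generic exponential smoothing filter and a simple distance threshold, and the `weight` and `min_dist` values are arbitrary assumptions chosen for illustration.

```python
import math


def smooth(points, weight=0.75):
    """Blend each raw point with the previous smoothed point to damp jitter.

    `weight` is how much of the previous smoothed point is retained
    (an assumed value, not one from the original article).
    """
    smoothed = []
    for x, y in points:
        if smoothed:
            px, py = smoothed[-1]
            x = weight * px + (1 - weight) * x
            y = weight * py + (1 - weight) * y
        smoothed.append((x, y))
    return smoothed


def thin(points, min_dist=2.0):
    """Keep a point only if it lies at least `min_dist` from the last kept one."""
    kept = []
    for p in points:
        if not kept or math.dist(kept[-1], p) >= min_dist:
            kept.append(p)
    return kept


# One stroke: a slightly jittery line sampled at regular intervals.
raw_stroke = [(0, 0), (1, 0.4), (2, -0.3), (3, 0.2), (4, 0), (5, 0.1)]
track = thin(smooth(raw_stroke), min_dist=1.0)
print(len(raw_stroke), "raw points ->", len(track), "thinned points")
```

The stylus reports a position every 4 ms, so a slow stroke produces many nearly identical points; thinning is what keeps the later feature analysis cheap.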

The character (now represented by data points) is then analysed for features like curvature, corners, size, and several position features. The entire character is divided up into a 4×4 grid, and the features are located within any of the grid's 16 areas. With this information in hand, the recognition system's decision-making scheme makes the call as to which symbol - if any - has just been written. The recognition system can handle characters consisting of multiple strokes, and it's smart enough that the user no longer needs to inform the system when a character has been completed.
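The 4×4 grid is easy to picture in code. The sketch below locates each point of a character in one of the 16 areas of its bounding box; the function names, the row/column numbering, and the choice of the bounding box as the grid's extent are all assumptions made for this example rather than details from Groner's article.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def grid_cell(point, bbox, n=4):
    """Return the (row, col) of the n x n grid cell that contains `point`."""
    x, y = point
    min_x, min_y, max_x, max_y = bbox
    width = (max_x - min_x) or 1    # guard against a perfectly vertical stroke
    height = (max_y - min_y) or 1   # guard against a perfectly horizontal stroke
    col = min(int((x - min_x) / width * n), n - 1)
    row = min(int((y - min_y) / height * n), n - 1)
    return row, col


def cells_visited(points, n=4):
    """List the grid cells a character passes through, in order, without repeats in a row."""
    bbox = bounding_box(points)
    cells = []
    for p in points:
        cell = grid_cell(p, bbox, n)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


# A diagonal stroke from one corner of the character's box to the other.
print(cells_visited([(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]))
```

Because the grid is relative to the character's own bounding box, the same shape lands in the same cells whether it is written large or small, which is one way to get features that stay consistent across writers.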

To give you an idea of how much variation is allowed, look at the below set of strokes. Each and every one of them is recognised as a 3.

The accuracy of the system proved to be very high. The researchers asked people with zero experience with the system to sit down and use it, and they found that the average accuracy rating was 87% (I doubt I can even hit that with modern touch keyboards). People with previous experience with the system hit an average of 88%, and those that helped design the system - and, in fact, on whose handwriting the system was based - hit 93%. The researchers found that several characters proved especially problematic, such as [ vs (, or the asterisk.

The team concludes as follows:

The recognition program responds quickly and is efficient in storage. When the time-delay normally used to separate symbols is set to zero, the lifting of the pen and the display of a recognised symbol are apparently simultaneous. The recognition program - including the data analysis and decision-making routines, and data storage; but not display or editing routines - requires about twenty-four hundred 32-bit words of memory.

The system proved to be capable enough to be used in the GRAIL Project, as you could see in the video (although it was most likely refined and expanded by that point). It's incredibly impressive to see what was possible given the limited resources they had to deal with, but if there's one thing I've learnt over the years poring over this kind of stuff, it's that limited resources are a programmer's best friend.

So, GRAIL was the whole nine yards - a tablet and pen operating an entire system using handwriting, shape, and gesture recognition. What's the next step? Well, as fascinating and impressive as the GRAIL Project is, it's 'only' a research project, not an actual commercial product. In other words, the next step is to see who first brought this idea to market.

And here, we run into a bit of trouble.


Going to market

We run into trouble because I can't seem to find a lot of information about what is supposedly the first product to bring all this to market. Jean Renard Ward, a specialist and veteran in the field of pen computing, character recognition, and similar, has created a comprehensive bibliography concerning these topics, and put it online for us to peruse.

In it, he notes that Applicon Incorporated, a company which developed, among other things, computer aided design and manufacturing systems, developed the first commercial gesture recognition system. He references one of the company's manuals, which, as far as I can tell, is not available online. He wonders if Applicon, perhaps, uses the Ledeen character recogniser.

At first, I couldn't find a whole lot of information on Applicon (you'd be surprised how little you can do out of the Dutch countryside). There's a Wikipedia page for the company, but it lacks verification and citations, so I couldn't judge the validity of the claims made. Wikipedia claims that Applicon's products ran on PDP-11 machines from DEC, and that, much like the GRAIL Project, they used a tablet mapped to the display for input, including gesture and character recognition. However, without proper citations, it was impossible to verify.

And then, during one last-ditch attempt to find something more tangible, I struck gold. David E. Weisberg has written a detailed history of CAD, titled "The Engineering Design Revolution", which is freely available online (a 650-page treasure trove of awesome stuff). Chapter 7 deals entirely with Applicon and the company's history, and also includes a fairly detailed description of the products it shipped.

Applicon's early systems, built and sold in the early 1970s, were repackaged PDP-11/34 machines from DEC, which Applicon combined with its own Graphics 32 processor and called the AGS/895 Central Processing Facility. The software was written in assembly, and used a custom operating system (until DEC's RSX-11M came out in 1987). The unique selling point here was the means by which the user interacted with the system. As described by Weisberg:

The key characteristic of Applicon's AGS software was its pattern recognition command entry or what the company called Tablet Symbol Recognition. As an example, if the user wanted to zoom in on a specific area of a drawing, he would simply draw a circle around the area of interest with his tablet stylus and the system would regenerate the image displaying just the area of interest. A horizontal dimension was inserted by entering a dot followed by a dash while a vertical dimension line was a dot followed by a short vertical line. The underlying software was command driven and these tablet patterns simply initiated a sequence of commands. The system came with a number of predefined tablet patterns but users could create patterns to represent any specialized sequence of operations desired.
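The last two sentences of that description amount to a lookup table from recognised patterns to command sequences, with room for user-defined entries. A minimal sketch of that idea follows; the class, the pattern names, and every command string are hypothetical stand-ins, since Applicon's actual command set is not given in the source.

```python
class PatternCommands:
    """Toy model: each recognised tablet pattern triggers a command sequence."""

    def __init__(self):
        self._macros = {}

    def define(self, pattern, commands):
        # Predefined and user-defined patterns are stored the same way.
        self._macros[pattern] = list(commands)

    def dispatch(self, pattern):
        # Unknown patterns trigger nothing in this sketch.
        return list(self._macros.get(pattern, []))


table = PatternCommands()

# Hypothetical "predefined" patterns, loosely following the description above.
table.define("circle", ["ZOOM_TO_REGION", "REGENERATE_IMAGE"])
table.define("dot-dash", ["INSERT_HORIZONTAL_DIMENSION"])
table.define("dot-vertical", ["INSERT_VERTICAL_DIMENSION"])

# A user-defined pattern bound to a specialised sequence of operations.
table.define("zigzag", ["SELECT_ALL", "ALIGN_TO_GRID", "REGENERATE_IMAGE"])

print(table.dispatch("circle"))
```

Seen this way, the recogniser and the command layer are independent: anything the recogniser can tell apart, whether a gesture or a character, can be bound to a command sequence.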

While it doesn't specifically state that it employed handwritten character recognition, it can be inferred from the description - a letter or numeral is simply a pattern to which we arbitrarily ascribe meaning, after all. Applicon supposedly used the Ledeen gesture recogniser, which, in some literature, is actually called the Ledeen character recogniser, and is capable of handwriting recognition. I fully understand if some of you think I'm inferring too much - so I'd be very happy if someone who actually has experience with Applicon's products were to step forward and correct me.

From the 1970s and onward, multiple commercial products using handwriting recognition, styluses and tablets would enter the marketplace. Take the Pencept Penpad, for instance, a product on which Jean Renard Ward actually worked. He's got a fascinating video and several images on his website demonstrating how it worked - basically a more compact version of the systems we just discussed like GRAIL and the AGS/895 Central Processing Facility.

As a sidenote, on that same page, you'll also find more exotic approaches to handwriting recognition, like the Casio PF-8000-s calculator, which used a grid of rubberised buttons instead of a digitiser. Even though it has no real bearing on this article, I find the concept quite fascinating, so I wanted to mention it anyway. There's a video of it on YouTube showing it in action.

These relatively early attempts at bringing handwriting recognition and pen input to the attention of the greater public would later catch the attention of the behemoths of computing. GO Corporation, technically a start-up but with massive amounts of funding, developed PenPoint OS, which Microsoft perceived as such a threat that after several interactions between the two companies, Redmond decided it had to enter the pen computing market as well - and so, Windows for Pen Computing was born, which brought pen computing to Windows 3.x. Apple, of course, also followed this trend with the Newton.

All these products - PenPoint OS, Windows for Pen Computing, the Newton - have one thing in common: they were commercial failures. Fascinating pieces of technology, sure, but nobody bought and/or wanted them. It wasn't until the release of the original Palm Pilot that pen computing and handwriting recognition really took off.


Driving the point home

If you've come this far, you've already read approximately 6000 words in the form of a concise history of handwriting recognition. While this may seem strange for an article that is supposed to be about Palm, I did this to illustrate a point, a point I have repeatedly tried to make in the past - namely, that products are not invented in a bubble. Now that patents rule the industry and companies and their products have become objects of worship, there's a growing trend among those companies and their followers to claim ownership over ideas, products, and concepts. This trend is toxic, detrimental to the industry, and hinders the progress of technology.

I've just written 6000 words on the history of handwriting recognition, dating back to the 19th century, to drive the point home that the Palm Pilot, while a revolutionary product that has defined the mobile computing industry and continues to form its basis to this day, was not something that just sprang up out of thin air. It's the culmination of over a hundred years of work in engineering and computing, and that fancy smartphone in your pocket is no different.

With that off my chest, let's finally talk about Palm.



