Within the major operating system of its day, on popular hardware of its day, ran the utterly dominant relational database software of its day. PC Magazine, February 1984, said, “Independent industry watchers estimate that dBASE II enjoys 70 percent of the market for microcomputer database managers.” Similar to past subjects HyperCard and Scala Multimedia, Wayne Ratliff’s dBASE II was an industry unto itself, not just for data management but for programmability, a legacy which lives on today as xBase.
[…] Written in assembly, dBASE II squeezed maximum performance out of minimal hardware specs. This is my first time using both CP/M and dBASE. Let’s see what made this such a power couple.
↫ Christopher Drum
If you’ve ever wanted to run a company using CP/M – and who doesn’t – this article is as good a starting point as any.

The amazing thing with dBASE was… the entire database was pretty much plain text.
They had external index files for fast access, of course. But all fields, including numbers and dates, were serialized as strings. You could read the whole thing as a fixed-width text file:
“cat mydb | cut -c20-24”
Or something like that (if you knew exactly where the columns were laid out).
“But storing numbers as text is inefficient!”
Yet it is more correct. A floating-point number cannot accurately represent a value like 10.2, whereas the string “_____10.20” is exact and architecture-agnostic. (The float32 version would be 10.19999980…)
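The float32 value quoted above is easy to verify; a quick Python sketch using the struct module to round-trip 10.2 through 32 bits:

```python
import struct

# Round-trip 10.2 through a 32-bit float to see what is actually stored.
f32 = struct.unpack("f", struct.pack("f", 10.2))[0]
print(f"{f32:.8f}")    # → 10.19999981 (the nearest float32 to 10.2)
print(f"{10.2:.17f}")  # even float64 is not exact

# The text form, by contrast, is exact on any architecture.
print("10.20")
```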
It makes in-place schema updates difficult, of course. But things like deletions or data updates were pretty straightforward (a delete just set a flag marking the record as unused).
Those were easy days. Data recovery could be done by hand.
sukru,
IEEE floating point numbers are not really designed to exactly represent arbitrary numbers. That’s a fair point; however, I would add that storing decimal numbers as text is not a great solution either…
1) It is not efficient computationally or storage-wise. An arbitrary precision math library will do a better job on both fronts.
2) It doesn’t even solve the problem; it replaces one arbitrary base (base 2) with another (base 10) that has the exact same problem. For example, try to represent 2/3 in base 10: ~0.6666667.
If we want to be able to mathematically represent all rational numbers exactly, then we can store them as ratios in whatever number base is preferred by the architecture. Languages that allow operator overloading (like C++) can create classes that abstract rational numbers behind standard math operations while keeping exact results… at least until you do something “irrational” 🙂
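A sketch of the same idea using Python’s fractions.Fraction, which stores numbers as integer ratios (a C++ class with overloaded operators would work the same way):

```python
from fractions import Fraction
import math

# 2/3 has no finite expansion in base 10 or base 2,
# but as a ratio it is stored and computed exactly.
two_thirds = Fraction(2, 3)
print(two_thirds + Fraction(1, 3))  # → 1, exactly

# 10.2 as a decimal string is also exact as a ratio: 51/5.
print(Fraction("10.2"))  # → 51/5

# Exactness breaks only once you do something "irrational":
print(math.sqrt(2) ** 2 == 2)  # → False
```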
I never used dBASE. To me text files make sense as an exchange medium, but IMHO once inside a database it no longer matters. The abstractions created by SQL become primary and the raw storage is not relevant – as long as the option to convert is always present. You make a valid point about data recovery, but that’s what backups are for.
An early MySQL developer (possibly an intern?) created a really strange mechanism for storing decimal types that packs decimal digits into binary types. It’s really weird and inefficient. It seems the developer was trying to optimize BCD on a 32-bit machine without realizing that fixed-point math was a thing and could represent decimal values exactly even in binary. The approach created a great deal of complexity to keep the numbers in decimal, when some simple fixed-point math would have solved the problem elegantly and efficiently.
It was discussed here…
(note that the conversion of comments from the old OSNews website was buggy in its handling of block quotes)
https://www.osnews.com/story/30272/a-constructive-look-at-the-atari-2600-basic-cartridge/#comment-655624
Alfman,
(I had a longer response but will only focus on this now)
Yes. That is why modern databases use integer-based types, like BigInt or fixed-point Decimal. These have their own trade-offs (inefficient in either storage or calculation; you cannot have both), but they store numbers precisely as they were entered.
Tallying up payroll accounts correctly is much more important than saving 10ms in a request.
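The payroll point in miniature: tallying currency in binary floating point drifts, while a decimal (or fixed-point) type does not. A quick sketch with Python’s decimal module:

```python
from decimal import Decimal

# Tally ten payments of $0.10 each.
as_float = sum(0.10 for _ in range(10))
as_decimal = sum(Decimal("0.10") for _ in range(10))

print(as_float)    # → 0.9999999999999999 (not exactly one dollar)
print(as_decimal)  # → 1.00, exact
```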
sukru,
I agree they should, but I was surprised to learn that it wasn’t implemented this way for mysql.
Computing numbers in decimal without converting to binary was the premise of the early x86 BCD instructions (AAA, AAS, AAM, AAD). But these days that approach is totally obsolete, and the instructions no longer exist on x86_64.
https://www.felixcloutier.com/x86/aad
Converting to binary isn’t really a bottleneck, and if you need to do anything mathematically significant you’ll be thankful to use CPU registers/SSE naturally rather than sequentially processing decimal digits from a string one at a time. I think we’d be very hard pressed to find a scenario where text is actually superior, apart from being human readable of course.
Sure, but it’s not like correctness and efficiency are exclusive here. Creating an efficient binary implementation with a 1:1 representation is a fairly trivial CS problem to solve. Granted it might be a little less efficient than the floating point registers built into the CPU, but it’ll still be better than text.
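One way to sketch such a 1:1 binary representation is plain fixed-point: scale decimal amounts to integers. The helper names below are illustrative, assuming two decimal places:

```python
# Fixed-point: store 10.20 as the integer 1020 (a scale factor of 100).
# The binary integer maps 1:1 to the decimal value; nothing is lost.
SCALE = 100

def to_fixed(s: str) -> int:
    """Parse a decimal string with up to 2 fraction digits into a scaled int."""
    whole, _, frac = s.partition(".")
    frac = (frac + "00")[:2]  # pad/truncate to exactly 2 places
    sign = -1 if whole.startswith("-") else 1
    return int(whole) * SCALE + sign * int(frac)

def to_str(v: int) -> str:
    """Format a scaled int back into its exact decimal string."""
    sign = "-" if v < 0 else ""
    v = abs(v)
    return f"{sign}{v // SCALE}.{v % SCALE:02d}"

a, b = to_fixed("10.20"), to_fixed("0.30")
print(to_str(a + b))  # → 10.50, exact, using only native integer math
```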
Alfman,
Yes, BigInt / Decimal can be efficient… but to a point.
The “digits” don’t have to be base-10. It can easily be base-32,768 (16-bit) or even base-2,147,483,648 (32-bit).
The complexity comes during arithmetic time. Addition? Easy. Multiplication? We have to go back to 4th grade and do long multiplication (there are more efficient algorithms too). Division? Long division of course.
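The scheme above in miniature: little-endian base-10,000 “digits” with grade-school long multiplication (a sketch, not any particular engine’s code):

```python
BASE = 10_000  # each "digit" packs four decimal digits

def mul(a: list[int], b: list[int]) -> list[int]:
    """Grade-school long multiplication on little-endian base-10,000 digits."""
    out = [0] * (len(a) + len(b))
    for i, da in enumerate(a):
        carry = 0
        for j, db in enumerate(b):
            cur = out[i + j] + da * db + carry
            out[i + j] = cur % BASE
            carry = cur // BASE
        out[i + len(b)] += carry
    while len(out) > 1 and out[-1] == 0:  # strip leading zeros
        out.pop()
    return out

# 12,345,678 × 9,876: a is 1234*10^4 + 5678, little-endian.
print(mul([5678, 1234], [9876]))  # → [5928, 2591, 1219]
```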
Square-root, etc? Forget them. They would lose precision anyway.
Self-correction: Google says there is a “Newton–Raphson” algorithm for square root. Maybe a few more arithmetic operations are possible after all.
In any case the database engine will have to implement these, and try to make best use of native register operations along the way.
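For reference, the Newton–Raphson iteration on integers looks like this (a standard integer square root, not any particular database’s implementation):

```python
def isqrt(n: int) -> int:
    """Integer square root via Newton's method; returns floor(sqrt(n)) exactly."""
    if n < 2:
        return n
    x = 1 << ((n.bit_length() + 1) // 2)  # initial guess >= sqrt(n)
    while True:
        y = (x + n // x) // 2  # Newton step for f(x) = x^2 - n
        if y >= x:
            return x
        x = y

# Works at arbitrary precision with no rounding loss:
print(isqrt(10**40))  # → 10**20 exactly
```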
sukru,
Standard integer types go up to 64-bit. And if you don’t care about standards, compilers including GCC do actually support integers up to 128-bit (__int128).
2^128 = 340282366920938463463374607431768211456
Going higher requires long multiplication or division algorithms. TBH I doubt most accounting software goes this high anyway.
You are right to highlight the issue. Whenever irrational numbers come into play you either need to accept that the infinite expansion cannot be evaluated exactly or you need to standardize on some convention for rounding them so everyone computes them the same way. But even with this in mind fixed point math still works, so rounding should not be a reason to avoid it.
I never heard of that method, but wikipedia says it’s more commonly known as Newton’s method, which many of us are familiar with. It seems that Joseph Raphson’s work and name got overshadowed by Newton.
Sure, but that’s a given when people write software. I’m not sure I follow what the problem is?
To get back to the initial point, when you said…
Can we agree that fixed point methods (not IEEE floating point) can be correct while also being more efficient than text? If not, then I’d like to ask for a counter example.
Alfman,
This is not about 64 or 128 bit numbers. This is arbitrary precision.
PostgreSQL, for example, has decimal and numeric (two names for the same type). These support up to 131072 digits before the decimal point (and up to 16383 after it).
How are they stored?
They are stored basically like “Pascal-style” strings: a short length header plus a sequence of 16-bit “digits”, each holding a value 0–9,999. That is a compromise between a single decimal digit per byte and the full int16 range (4 decimal digits in 2 bytes).
from
https://www.postgresql.org/docs/9.1/datatype-numeric.html
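The layout described above (length header plus 16-bit base-10,000 “digits”) can be sketched like this; the packing is illustrative, not PostgreSQL’s actual on-disk format:

```python
import struct

def pack_numeric(s: str) -> bytes:
    """Pack a non-negative integer string as a digit count followed by
    big-endian 16-bit base-10,000 digits (illustrative layout only)."""
    n = int(s)
    digits = []
    while True:
        digits.append(n % 10_000)
        n //= 10_000
        if n == 0:
            break
    digits.reverse()  # most significant group first
    return struct.pack(f">H{len(digits)}H", len(digits), *digits)

packed = pack_numeric("123456789")
print(packed.hex())
print(len(packed), "bytes, versus 9 bytes as plain text")
```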
sukru,
I do get that it’s possible to store the digits in the form of lots of decimals. However using fixed point math we can represent decimals of arbitrary length and precision using binary while maintaining an exact 1:1 representation between the two. In other words, it is not mathematically necessary to use a decimal representation when the results are mathematically identical. Again, if you disagree I’d like a counter example that we can discuss.
IMHO it is sensible to choose the one that’s most compact and performant on a given machine. I’ve written an arbitrary precision math library, and I am quite familiar with the math involved. It’s a topic that I enjoy!
You alluded to optimizing grade school multiplication. I’m not sure how familiar you are with the technique, but it can be done with fewer multiplications…
This optimization works recursively on longer inputs too!
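The optimization alluded to here is presumably Karatsuba’s method: three recursive sub-multiplications per split instead of four. A sketch on Python integers:

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative ints with 3 sub-multiplications per split, not 4."""
    if x < 10 or y < 10:
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> m, x & ((1 << m) - 1)   # split x = xh*2^m + xl
    yh, yl = y >> m, y & ((1 << m) - 1)
    a = karatsuba(xh, yh)                 # high * high
    b = karatsuba(xl, yl)                 # low * low
    c = karatsuba(xh + xl, yh + yl)       # (xh+xl)(yh+yl) covers both cross terms
    return (a << (2 * m)) + ((c - a - b) << m) + b

print(karatsuba(12345678, 87654321) == 12345678 * 87654321)  # → True
```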
Alfman,
Why we would prefer a base-10 equivalent and not a base-2 equivalent is a valid question. And I had to cheat a little.
According to Google’s Gemini, the reason is printing.
And it makes a lot of sense.
A base-10,000 number is very easy to print (or parse); it takes only a few operations per digit group. A base-524,288 number? Much more difficult to process.
Essentially every 16-bit block contains only a ~13.3-bit value (log2 10,000). We lose roughly 17% of the storage for significantly easier processing.
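The printing argument in miniature: each base-10,000 “digit” maps directly to four characters, whereas a binary value needs repeated division by 10 over the whole number (a sketch; function names are illustrative):

```python
def print_base10k(digits: list[int]) -> str:
    """Format base-10,000 digits (most significant first): no division over
    the full number, each group maps straight to four decimal characters."""
    head, *rest = digits
    return str(head) + "".join(f"{d:04d}" for d in rest)

def print_binary(n: int) -> str:
    """Format a binary integer: repeated division by 10 over the whole
    (arbitrary-precision) value, one decimal digit at a time."""
    out = []
    while True:
        n, r = divmod(n, 10)
        out.append(chr(ord("0") + r))
        if n == 0:
            break
    return "".join(reversed(out))

print(print_base10k([1, 2345, 6789]))  # → 123456789
print(print_binary(123456789))         # → 123456789
```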
sukru,
You haven’t acknowledged it, but do we agree that storing numbers in BASE-10 versus BASE-2 types does not change correctness? This is one of the points I was trying to convey.
Back in the 80s I suppose base conversion may have been more difficult, but it’s not significant for modern CPUs.
How do you come to that conclusion? I disagree. A base-10 algorithm is going to perform worse than native. Modern CPUs do 64-bit operations on every clock. Bearing in mind that a “base-10,000” type only holds about 1/5th as much data as a 64-bit native register, a “base-10,000” algorithm ends up needing more CPU clocks while doing less work per clock. There is no contest.
Granted, if you don’t end up doing any math in the database, then the overhead of doing base-10 math might not really matter. But if you frequently run aggregate queries across large datasets (which is quite common in SQL), then dealing with non-native decimal algorithms is a constant source of overhead that could be avoided by converting to a native BASE-2 type.
Also, even if we didn’t care about performance, another negative for BASE-10 algorithms is that they introduce more complexity, since there are more scenarios that need to be handled for each combination of math operations.
To be fair, any approach will technically work. For better or worse I have a more old-school mindset when it comes to software optimization and software inefficiency has long been one of my peeves.
Edit: We could have a small coding challenge if interested 🙂
Thanks always to OSNews for boosting my projects. This dBASE article seems to have resonated with a lot of people, especially “those who used it professionally” and “those who watched a parent use it professionally and were intimidated by the enormous manual.” Some of the younger crowd has expressed bewilderment it even works.
Maybe my memory is failing me, but hasn’t this evolved into what is known as InterBase today?
I’m not sure that’s entirely true. My understanding is that InterBase was a completely separate product, acquired by Ashton-Tate when they purchased Groton Database Systems. So both wound up being Ashton-Tate products, but were unrelated AFAIK. dBASE evolved along its own path for years, up to a 2019 release (after changing hands). InterBase continues today under a completely different company. So those two products wound up in different homes, ultimately. I’d probably say dBASE “evolved” into xBase today, and is still seen in products like xHarbour.