posted by Thomas Leonard on Tue 16th Jan 2007 00:32 UTC

"Naming, Conflicts"
Naming

A system in which anyone can contribute must be decentralised. Otherwise, whoever controls the central part will be able to decide who can do what, or it will fragment into multiple centralised systems, isolated from each other (think Linux distributions here).

How can we design such a system? One important aspect is naming. Linux packages typically have a short name, such as gimp or inkscape, and they include binaries with names like convert and html2text, and libraries with names like libssl.so. If anyone can contribute packages into our global system without someone coordinating it all, how can we ensure that there are no conflicts? How can the system know which of several programs named firebird the user is asking to run?

One method is to generate a UUID (essentially a large random number), and use that as the name. This avoids accidental conflicts, but the new names aren't very friendly. This isn't necessarily a problem, as the identifier only needs to be used internally. The user might read a review of a program in their web browser and tell their computer "When I type 'gimp', run that program".
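As a minimal sketch of this idea, the following Python uses the standard `uuid` module to generate the internal name, and a plain dictionary to remember the user's chosen short-cut. The `aliases` table and the helper names are hypothetical, invented for illustration; a real system would store the mapping persistently.

```python
import uuid

# Hypothetical in-memory table: user-chosen alias -> internal identifier.
aliases = {}

def new_program_id():
    """Return a fresh random (version 4) UUID to use as the internal name."""
    return str(uuid.uuid4())

def bind_alias(alias, program_id):
    """Record that typing `alias` should run the program named `program_id`."""
    aliases[alias] = program_id

def resolve(alias):
    """Return the internal name bound to a user-typed alias, or None."""
    return aliases.get(alias)

# The author generates the identifier once; the user picks the friendly name.
gimp_id = new_program_id()
bind_alias("gimp", gimp_id)
```

The unfriendly identifier never has to be typed: the user only ever sees `gimp`, and two users are free to bind the same alias to different programs.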

Another approach is to calculate the name from the program's code using a cryptographic hash function. Such names are also unfriendly to humans, but have the advantage that if you know the name of the program you want then you can check whether a program some random stranger gives you is really it, enabling peer-to-peer distribution. However, since each new version of the program will have a different name, this method can only name individual versions of a program.
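A short sketch of hash-based naming, assuming SHA-256 as the hash function and a `sha256=` prefix on the name (both are choices made here for illustration, not something the schemes above prescribe):

```python
import hashlib

def content_name(data: bytes) -> str:
    """Derive a name from the program's bytes using SHA-256."""
    return "sha256=" + hashlib.sha256(data).hexdigest()

def is_genuine(expected_name: str, data: bytes) -> bool:
    """Check that data received from an untrusted peer matches the name we wanted."""
    return content_name(data) == expected_name

# Placeholder byte strings standing in for two releases of a program.
v1 = b"program code, version 1"
v2 = b"program code, version 2"
name_v1 = content_name(v1)
```

Here `is_genuine(name_v1, v1)` succeeds while `is_genuine(name_v1, v2)` fails, which shows both properties at once: a stranger cannot substitute different code, and a new version necessarily gets a new name.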

Content-based naming

Another popular approach is to include the name of a domain you control in the program's name. For example, the Autopackage developer guide gives @purity.sourceforge.net/purity as an example. These names are much more friendly for users. This does require you to be given a domain name by someone, but these are rather easy to come by, and a single domain is easily sub-divided further. Zero Install uses a similar scheme, with URLs identifying programs (such as http://www.hayber.us/0install/MusicBox), combined with the use of hashes to identify individual versions, as described above. Using a URL for the name has the additional advantage that the name can tell you where to get more information about the program. Sun's Java Web Start also uses URLs to identify programs.

Finally, it is possible to combine a URL with a cryptographic hash of a public key (corresponding to the private key used to sign the software). This gives a reasonably friendly name, along with the ability to check that the software is genuine. However, the name will still change when a new key is used.
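To illustrate, here is a rough sketch of how such a name might be built and checked. The `#key=` name format, the example URL and the placeholder key bytes are all assumptions made for this example, and verifying the signature itself is left out; the code only checks that a given public key is the one the name refers to.

```python
import hashlib

def key_fingerprint(public_key: bytes) -> str:
    """Hash the raw public key bytes (SHA-256 chosen for illustration)."""
    return hashlib.sha256(public_key).hexdigest()

def make_name(url: str, public_key: bytes) -> str:
    """Combine a URL with the key's hash, using a made-up '#key=' format."""
    return url + "#key=" + key_fingerprint(public_key)

def key_matches_name(name: str, public_key: bytes) -> bool:
    """Check that `public_key` is the key named in `name`."""
    _, _, expected = name.partition("#key=")
    return expected == key_fingerprint(public_key)

# Placeholder bytes standing in for real public keys.
old_key = b"old public key bytes"
new_key = b"new public key bytes"
name = make_name("http://example.org/MusicBox", old_key)
```

Because the fingerprint is part of the name, `make_name` returns a different name for `new_key`, which is the drawback noted above.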

Whichever naming scheme is used, we cannot expect users to type in these names manually. Rather, these are internal names used by the system to uniquely identify programs, and used by programs to identify their dependencies. Users will set up short-cuts to programs in some way, such as by dragging an object representing a program from a web-page to a launcher.

Note that Klik identifies programs using URIs, but these contain only a simple short name (e.g. klik://firefox). Therefore, it is not decentralised in the sense used in this essay: I cannot distribute my packages using Klik without having my package registered with the Klik server, and the controllers of the server must agree to my proposed name.

Conflicts

By using globally unique names, as described above, we can unambiguously tell our computer which program we want to run, and the program can unambiguously specify the libraries it requires. However, we must also consider file-level conflicts. If we have two libraries (@example.org/libfoo and @demo.com/libfoo, for example, both providing a file called libfoo.so) then we can tell that they are different libraries, but if we want to run one program using the first and one using the second, then we cannot install both at once! This ability to detect conflicts is an important feature of a packaging system, helping to prevent us from breaking our systems.
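File-level conflict detection can be sketched as follows. The package names come from the example above; the install paths (such as `/usr/lib/libfoo.so`) and the third package are assumptions added for illustration.

```python
def find_file_conflicts(packages):
    """Map each path provided by two or more packages to those packages.

    `packages` maps a globally unique package name to the file paths
    that the package would install.
    """
    owners = {}
    for name, paths in packages.items():
        for path in paths:
            owners.setdefault(path, []).append(name)
    return {path: sorted(names)
            for path, names in owners.items() if len(names) > 1}

packages = {
    "@example.org/libfoo": ["/usr/lib/libfoo.so"],
    "@demo.com/libfoo": ["/usr/lib/libfoo.so"],
    "@example.org/bar": ["/usr/bin/bar"],
}
conflicts = find_file_conflicts(packages)
# conflicts == {"/usr/lib/libfoo.so": ["@demo.com/libfoo", "@example.org/libfoo"]}
```

The unique names keep the two libraries distinct, but both still claim the same path, so only one of them can be installed at a time.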

Another source of file-level conflicts occurs when different programs require different versions of the same library. A good package manager can detect this problem, as in this example using Debian's APT:

# apt-get install gnupg
The following packages will be REMOVED
  [...] plash rootstrap user-mode-linux
The following packages will be upgraded:
  gnupg libreadline5

Here, I was trying to upgrade the gnupg package to fix a security vulnerability. However, the fixed version required a newer version of the libreadline5 package, which was incompatible with all available versions of user-mode-linux, rootstrap and plash (three other security-related programs I use regularly). APT detects this and warns me, preventing me from breaking the other programs. Of course, I still end up with either an insecure gnupg or no user-mode-linux, but at least I'm warned and can make an informed decision.
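The check being made here can be approximated with a toy model. This is not APT's actual algorithm: the version numbers 1 and 2 are invented placeholders, and each package simply lists the library versions it can work with.

```python
def broken_by_upgrade(installed, library, new_version):
    """Return the installed packages that cannot use `new_version` of `library`.

    `installed` maps a package name to a dict of
    {library name: set of versions the package accepts}.
    """
    broken = []
    for name, requirements in installed.items():
        accepted = requirements.get(library)
        if accepted is not None and new_version not in accepted:
            broken.append(name)
    return sorted(broken)

# Hypothetical data: the fixed gnupg needs libreadline5 version 2,
# but the other three packages only work with version 1.
installed = {
    "plash": {"libreadline5": {1}},
    "rootstrap": {"libreadline5": {1}},
    "user-mode-linux": {"libreadline5": {1}},
}
to_remove = broken_by_upgrade(installed, "libreadline5", 2)
# to_remove == ["plash", "rootstrap", "user-mode-linux"]
```

Since only one version of the library can be installed at once, the package manager's only options are to refuse the upgrade or to remove everything in `to_remove`.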

In a centralised Linux distribution these problems are kept to a minimum by careful central planning. Some leader decides whether the newer or older version of the library will be used in the distribution, and the incompatible packages are updated as soon as possible (or, in extreme cases, dropped from the distribution).

Traditional Linux systems also try to solve this by having 'stable' or 'long term support' flavours. The problem here is that we are forced to make the same choice for all our software. In fact, we often want to mix and match: a stable office suite, perhaps, with a more recent web browser. At work, for example, I generally want to run the most stable version available of each program that has the features I need.

In a decentralised system these problems become more severe. There is no central authority to resolve naming disputes (and renaming a library has many knock-on effects on programs using that library, so library authors will not be keen to do it). Worse, if updating a program to use a new version of a library prevents it from working with older versions, then it will now be broken for people using the older library. We cannot assume that everyone is on the same upgrade schedule.

Indeed, upgrading a library used by a critical piece of software may require huge amounts of testing to be done first. This isn't something you want to be rushed into, just to get a security fix for another program. Less actively maintained programs may not be updated so frequently, especially some utility programs developed internally.

Finally, conflicts become much more serious if you allow ordinary users, not just administrators, to install software. If installed packages are shared between users, which is important for efficiency, and packages can conflict, then one user can prevent another user from installing a program just by installing something that conflicts with it. How packages can be shared securely between mutually untrusting users will be covered later.

Table of contents
  1. "Introduction, Use Cases, Traditional Distributions"
  2. "Naming, Conflicts"
  3. "Avoiding Conflicts, Dependencies"
  4. "Publishing Software, Sharing Installed Software"
  5. "Security"
  6. "Compiling, Converting Between Formats"
  7. "Summary"