posted by Eugenia Loli on Tue 29th Mar 2005 19:32 UTC

"Pacman, Page 6/9"
6. Recently there was a discussion about Pacman's scaling when it deals with many installed packages. Your thoughts on this and the future of pacman?

Judd Vinet: Well, it's less of a problem for those of us with more memory and fewer reboots, as most of the db will be cached after the first pacman run. But there is a big slowdown for large pacman databases during the first run after a reboot. This is because pacman has to read many small files, and some filesystems do not optimize for this type of behaviour.

The filesystem-based database has some great advantages, ones that reflect Arch's values more than a dbm backend would, but there are some performance issues that we can't ignore. I'd like to try using a db (or sqlite) backend in the future and see how much of a performance improvement we can get.

The big plus to having a plain filesystem base is that anyone can peek and poke at it, including custom scripts and whatnot. Using a more complicated backend makes it tough to code up a bash script that scans the database.

So I guess only time will tell -- the jury is still out. ;)
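To illustrate the "peek and poke" point, here is a minimal sketch of such a scan, written in Python rather than bash. It assumes a layout of one directory per package, each holding a `desc` file in which a `%SECTION%` line is followed by that section's values. The layout, the `read_desc`, `scan_db` and `make_sample_db` names, and the sample entries are assumptions made for illustration and are not taken from the interview.

```python
import os
import tempfile


def read_desc(path):
    """Parse a desc-style file into {section: [values]}.

    Assumed format: a line such as %NAME% opens a section and the
    non-empty lines after it are that section's values.
    """
    sections = {}
    current = None
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line:
                continue
            if len(line) > 2 and line.startswith("%") and line.endswith("%"):
                current = line.strip("%")
                sections[current] = []
            elif current is not None:
                sections[current].append(line)
    return sections


def scan_db(root):
    """Return {name: version} for every package directory under root."""
    packages = {}
    for entry in sorted(os.listdir(root)):
        desc = os.path.join(root, entry, "desc")
        if os.path.isfile(desc):
            info = read_desc(desc)
            # Fall back to the directory name if a section is missing or empty.
            name = (info.get("NAME") or [entry])[0]
            version = (info.get("VERSION") or ["?"])[0]
            packages[name] = version
    return packages


def make_sample_db(root):
    """Create two made-up package entries so the sketch runs anywhere."""
    for name, version in [("foo", "1.0-1"), ("bar", "2.3-2")]:
        pkg_dir = os.path.join(root, f"{name}-{version}")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "desc"), "w", encoding="utf-8") as fh:
            fh.write(f"%NAME%\n{name}\n\n%VERSION%\n{version}\n\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_sample_db(root)
        for name, version in sorted(scan_db(root).items()):
            print(name, version)
```

Note that the scan opens one file per package, which is the same access pattern blamed above for the slow first run on large databases.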

Jan de Groot: It is because of the many small files. Pacman looks through all of them. In the beginning pacman was fast, but when we did some good work to get tons of packages into the repositories, it became slow. Personally I'm thinking of putting the package database in a BDB format, which is also the easiest option.

I haven't worked on this because I want to wait for the modularized version of pacman other people are working on at this moment. It would be a waste of time to convert pacman 2.9 to a BDB format and see pacman 3.0 appear as a complete rewrite after that.

Tobias Kieslich: Pacman's scaling goes well, but it has some serious trouble with file operations on certain file systems, most prominently on xfs and reiserfs if I remember correctly. This could be solved by using a single file for it, incorporating a known database backend such as db or sqlite or maybe some very small free database engine since db or sqlite might be a bit overkill. I think this is related to the unlimited resources question since Judd could make some experiments here to find the "best" solution if he had the time.

I think Judd can answer this better since pacman is his baby.
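As a rough sketch of the single-file idea, the snippet below keeps package records in one SQLite file through Python's standard `sqlite3` module and looks a package up with a keyed query. The table layout, the `open_db`, `add_package` and `find_version` names, and the sample records are hypothetical; nothing here reflects pacman's actual code or any schema the developers proposed.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open (or create) a single-file package database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS packages ("
        " name TEXT PRIMARY KEY,"
        " version TEXT NOT NULL)"
    )
    return conn


def add_package(conn, name, version):
    """Insert a package record, replacing any older version."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO packages (name, version) VALUES (?, ?)",
            (name, version),
        )


def find_version(conn, name):
    """Return the stored version of a package, or None if absent."""
    row = conn.execute(
        "SELECT version FROM packages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_db(os.path.join(tmp, "packages.db"))
        add_package(conn, "foo", "1.0-1")
        add_package(conn, "foo", "1.1-1")  # an upgrade replaces the row
        add_package(conn, "bar", "2.3-2")
        print(find_version(conn, "foo"))
        print(find_version(conn, "missing"))
        conn.close()
```

The trade-off is the one raised elsewhere in this interview: a file like this can no longer be inspected with plain text tools, so any script that wants to scan the database has to go through the database engine.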

Damir Perisa: Pacman is the core of ArchLinux, and its design and features support a really great concept. I myself have around 1200 packages installed on the laptop, and when it comes to running -Suy on multiple packages, it takes a long time to process the individual files where pacman keeps its package information database. On bad days (kde update) it takes pacman up to an hour to prepare a -Suy (without downloading and updating). This behaviour is not ideal and I wish that pacman would use a more efficient way to handle the information database. My opinion is to use a DB, or at least a single file instead of many flat files, to remove this bottleneck. However, the usual installation of ArchLinux does not contain that many packages, and therefore times are much shorter. On my old computer (266MHz), where only around 340 packages are installed, pacman reacts almost immediately when doing -Suy.

Dale Blount: Luckily this doesn't bother me much.

Jason Chu: While pacman can use work, I also think that it's really great how far it's come. I try not to criticize pacman's failings; instead I submit patches.

Tobias Powalowski: Well, let Judd and the pacman crew find an answer to that.

Aurelien Foret: In my opinion, pacman does its job quite well.

Currently, there is a trend to consider that pacman would behave better with another database backend, but I don't really share this point of view. Having a flat-file-based package database makes it so convenient and easy to manage that it is virtually unbreakable. A different backend, such as gdbm or bdb, would add overall complexity to pacman, which I wouldn't welcome.

There are still some areas that can be reworked to improve pacman scalability, and I think it is better to focus on improving the current implementation than switching to a new backend.

Anyway, let Judd have the final say on this topic.

For the record, I already tried to implement a gdbm backend for pacman several months ago, but I was disappointed by the results: it didn't make pacman significantly faster.

But at that time, package repositories were smaller than they are now, so it may be worth spending some time on more tests...

Currently, I'm quite involved in the process of creating a library for pacman. Although it will allow people to develop new frontends for pacman's package management functions, my aim is also to help rationalize pacman's structure and clean it up. Based on that, it will be easier in the future to extend and enhance pacman.

Table of contents
  1. "Passion, Page 1/9"
  2. "Challenges, Page 2/9"
  3. "Popularity, Page 3/9"
  4. "Maintenance, Page 4/9"
  5. "Development, Page 5/9"
  6. "Pacman, Page 6/9"
  7. "CVS, Commercialism, Page 7/9"
  8. "Installer, Page 8/9"
  9. "Arch Vs The World, Page 9/9"