Linked by Thom Holwerda on Thu 18th Jan 2007 15:11 UTC, submitted by Torsten Rahn
Benchmarks "A number of search engines are available for the Gnome and KDE desktop environments, many based around the open source Lucene search engine. It would be tremendous if we could adopt one of these search engines for the Gnome platform, so we can provide the type of integrated search experience for our users that they really need, irrespective of which distro they are using. So to help in this assessment we have carried out a comparison of four different Unix based indexers [.pdf]."
Thread beginning with comment 203081
not perfect but still nice
by superstoned on Thu 18th Jan 2007 16:57 UTC
superstoned Member since:
2005-07-07

i think this is a nice article, though it has some shortcomings. one is the remarks on cpu usage - the author seems to fail to realize that beagle not using 100% cpu is in fact a bad thing, for several reasons.

first, the linux kernel will think it's an interactive process, increasing its priority, thus allowing it to pre-empt user processes, killing interactivity of the system.

second, on a laptop, it's better to use 100% cpu for 1 min than 50% for 2 mins, in terms of power usage...
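The laptop argument can be checked with a toy calculation. This is only a sketch: the wattages are invented round numbers, and the linear power model and the deep-sleep state are assumptions, so real hardware may behave differently.

```python
# Toy energy model; the wattages are made-up illustrative numbers, not measurements.
AWAKE_WATTS = 10.0     # assumed: baseline system draw while the CPU is awake
CPU_MAX_WATTS = 20.0   # assumed: extra draw at 100% CPU load
SLEEP_WATTS = 2.0      # assumed: draw once the CPU can drop into a deep idle state


def energy_joules(utilisation, busy_seconds, window_seconds):
    """Energy over a fixed window: work for busy_seconds, then sleep for the rest."""
    busy_power = AWAKE_WATTS + CPU_MAX_WATTS * utilisation
    sleep_seconds = window_seconds - busy_seconds
    return busy_power * busy_seconds + SLEEP_WATTS * sleep_seconds


window = 120  # compare both strategies over the same two minutes
full_speed = energy_joules(1.0, 60, window)   # 100% CPU for 1 min, then sleep
half_speed = energy_joules(0.5, 120, window)  # 50% CPU for 2 min, never sleeps
print(f"100% for 1 min: {full_speed:.0f} J")
print(f" 50% for 2 min: {half_speed:.0f} J")
```

With these assumed numbers the full-speed run comes out cheaper, because the baseline draw is only paid for half as long.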

thus the fact Strigi uses max cpu is positive, not negative. and it makes for a good choice, being up to 40 times faster in indexing (4 min vs 2 hours for beagle vs 3 hours for tracker) - the author states most noted problems are rather trivial to fix.
http://www.kdedevelopers.org/node/2639

anyway, a common plugin engine and dbus-interface would be good, for sure.

Reply Score: 5

RE: not perfect but still nice
by situation on Thu 18th Jan 2007 17:04 in reply to "not perfect but still nice"
situation Member since:
2006-01-10

I imagine most users would run Beagle as 'nice -n 19 beagle' or the like, so the kernel pushes the priority down and workflow isn't interrupted.
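For anyone launching an indexer from a script rather than a shell, the same idea can be sketched in Python. The helper name `run_niced` is hypothetical, and the child here is a stand-in that only reports its own niceness; a real indexer command would replace it.

```python
import os
import subprocess
import sys


def run_niced(argv, niceness=19):
    """Start argv with its niceness raised by `niceness` (like `nice -n 19 ...`)."""
    return subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        text=True,
        preexec_fn=lambda: os.nice(niceness),  # runs in the child before exec
    )


# Stand-in child that just reports its own niceness; a real indexer command
# would go here instead.
child = run_niced([sys.executable, "-c", "import os; print(os.nice(0))"])
reported = int(child.communicate()[0])
print("child niceness:", reported)
```

`os.nice(0)` adds nothing and returns the current value, which is why the child can use it to report what it was started with.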

Still a big fan of slocate, personally. Can force an update when I want (which can take under 10 minutes, if you have updated recently and are on a relatively new computer). The results are in a simple list format, etc. Not as advanced or user friendly, but it's nice to have a simple version of a desktop indexer available still.

Reply Parent Score: 2

superstoned Member since:
2005-07-07

if you run beagle with nice -n 19, and it throttles, the kernel will increase its priority by... surprise, 19 points, thus it'll run as prio 0. better than -19 (yeah inverted blablabla) which would be the case without the nice, but still not what you want.

and even if it does run on +19, it STILL uses cpu, even when you're playing a game. a scheduler policy like SCHED_BATCH would ensure it NEVER interrupts another running process - that's what you want.
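A minimal sketch of how a process could ask for that policy on Linux, using Python's `os` module. The helper name `make_batch` is hypothetical. SCHED_BATCH tells the scheduler to treat the process as CPU-bound and non-interactive; how strongly that keeps it out of the way of other tasks depends on the kernel version, so treat this as an experiment rather than a guarantee.

```python
import os


def make_batch(pid=0):
    """Ask Linux to schedule `pid` (0 = this process) under SCHED_BATCH."""
    if not hasattr(os, "SCHED_BATCH"):
        return False  # not Linux, or this Python lacks the constant
    try:
        # SCHED_BATCH is a non-realtime policy, so the static priority must be 0.
        os.sched_setscheduler(pid, os.SCHED_BATCH, os.sched_param(0))
    except OSError:
        return False  # the kernel or a sandbox refused the request
    return os.sched_getscheduler(pid) == os.SCHED_BATCH


if make_batch():
    print("running under SCHED_BATCH")
else:
    print("SCHED_BATCH not available here")
```

The policy is inherited by child processes, so a small wrapper could call this and then start the indexer.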


and slocate, does that look in files as well? anyway, i'd rather have incremental updates like beagle & friends have.

Reply Parent Score: 2

RE: not perfect but still nice
by meebee on Thu 18th Jan 2007 17:38 in reply to "not perfect but still nice"
meebee Member since:
2006-06-29

While I think strigi is nice and has potential, it proved to be horribly unstable (0.3.11) in my own tests, left many zombies around, etc.
It still has a lot of rough edges.
I also can't confirm that tracker is 40x slower. It is a bit slower, but more in the region of 20% to 30% (*not* times).
Also, what good is being lightning fast when your search results are not good?
Again, strigi seems to be a very young project, so there is hope that these issues will be fixed.

Reply Parent Score: 1

superstoned Member since:
2005-07-07

well, indeed, all these projects are pretty young, so we'll have to wait to see which one will stand out as the best solution. tracker and strigi of course have (imho) the best chance, being reasonably performant and not depending on controversial stuff like mono/java. if both happen to deliver the same d-bus interface, the best will be used the most, and that's the most optimal solution.

btw strigi also delivers database services and is going to be the foundation of meta-data extraction and manipulation in KDE 4, in addition to having Nepomuk (contextual linking, labeling etc) integration, so i think it has the best cards right now... on the other hand, tracker is close to integration in gnome, and even though gnome mostly doesn't integrate things very deeply (or at least, does so slowly), gnomes don't like stuff smelling kde'ish. after all, they even rejected aRts, even though it was plain C, had a gnome-lib dependency and was the only technically reasonable solution by then...

but things can change.

Reply Parent Score: 5

Jamie Member since:
2005-07-06

I also can't confirm that tracker is 40x slower.

I can confirm it's most definitely not!

The article in question tested the ancient 0.5.0 release of tracker which was the first version to include our new indexer framework which was completely unoptimised.

The latest release, 0.5.3, is tons faster and should be much closer to strigi. We are doing some more optimisation work in the next version, so it will be interesting to see how they compare then.

Note also strigi does not currently do any string processing like stemming so it lacks the ability to do accurate searches on plurals and stems. This might account for its impressive raw speed as well so we really need to see strigi get features like this to do a more "fair" comparison.
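To illustrate what stemming buys at search time and why it costs extra work per indexed word, here is a deliberately naive sketch. It is not Tracker's or Strigi's implementation; the suffix list and the `stem` and `build_index` helpers are hypothetical, and real indexers use proper stemming algorithms.

```python
# Deliberately naive suffix stripper, for illustration only.
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common English suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stemmed word to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        for word in text.split():
            index.setdefault(stem(word), set()).add(name)
    return index


docs = {"a.txt": "indexing desktop files", "b.txt": "one indexed file"}
index = build_index(docs)
# A query for "file" also finds the document that only contains "files".
print(sorted(index[stem("file")]))
```

Every word passes through `stem` during indexing, which is the kind of per-word processing an indexer that skips stemming does not pay for.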

Reply Parent Score: 4

RE: not perfect but still nice
by Hiev on Thu 18th Jan 2007 18:44 in reply to "not perfect but still nice"
Hiev Member since:
2005-09-27

so using 100% of CPU is now a feature and not a major bug?

I don't think so.

Edited 2007-01-18 18:49

Reply Parent Score: -1

borker Member since:
2006-04-04

When nothing else is competing for the CPU, why not use it all? With the appropriate nice setting, strigi will happily settle into the background and give up the CPU to interactive tasks, so there is no impact on interactive users (or on other, higher priority, batch tasks either). That's why OSes have schedulers...

Reply Parent Score: 2