Linked by Thom Holwerda on Fri 25th May 2007 21:51 UTC
General Development "Hoard is a scalable memory allocator (malloc replacement) for multithreaded applications. Hoard can dramatically improve your application's performance on multiprocessor machines. No changes to your source are necessary; just link it in. Hoard scales linearly up to at least 14 processors. The supported platforms include Linux, Solaris, and Windows NT/2000/XP."
how to know i'm using hoard?
by gerryxiao on Sat 26th May 2007 05:42 UTC
gerryxiao
Member since:
2006-12-17

I have downloaded and built it on my box and set up the LD_PRELOAD environment variable, but how do I know that I'm using libhoard instead of the standard one?

Reply Score: 1

RE: how to know i'm using hoard?
by Ponto on Sat 26th May 2007 07:30 UTC in reply to "how to know i'm using hoard?"
Ponto Member since:
2006-06-18

1. See man ld.so for information on how LD_PRELOAD works and whether Hoard gets loaded at all.
2. Use a debugger and trace a malloc. You should see at least one Hoard function in the backtrace.
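For example, a quick gdb session along these lines should show it (the libhoard.so path here is illustrative; adjust it to wherever you installed the library):

> gdb ./myapp
(gdb) set environment LD_PRELOAD /usr/local/lib/libhoard.so
(gdb) break malloc
(gdb) run
(gdb) backtrace

If the preload took effect, the malloc frame in the backtrace should come from libhoard.so rather than from libc.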

Reply Score: 1

RE: how to know i'm using hoard?
by Elektronkind on Sat 26th May 2007 14:46 UTC in reply to "how to know i'm using hoard?"
Elektronkind Member since:
2006-09-22

On Solaris, just run: pldd <pid>

Where <pid> is the PID of the process you want to make sure is running with the Hoard library. pldd is just like ldd, but works against running processes rather than just binaries.

I don't know if Linux or other OSes provide a tool for doing this.

Edited 2007-05-26 14:46

Reply Score: 1

Doc Pain Member since:
2006-10-08

"I don't know if Linux or other OSes provide a tool for doing this. "

Besides Solaris, I've seen the pldd command only on HP-UX.

Reply Score: 2

RE: how to know i'm using hoard?
by big_gie on Sat 26th May 2007 15:01 UTC in reply to "how to know i'm using hoard?"
big_gie Member since:
2006-01-04

I have downloaded and built it on my box and set up the LD_PRELOAD environment variable, but how do I know that I'm using libhoard instead of the standard one?

Compile your program, then run "ldd" on it to see which dynamic libraries it is linked against:
> ldd myapp
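If the preload is exported first (the path here is illustrative), libhoard.so should show up in the listing:

> export LD_PRELOAD=/usr/local/lib/libhoard.so
> ldd myapp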

Reply Score: 1

gerryxiao Member since:
2006-12-17

Compile your program, then run "ldd" on it to see which dynamic libraries it is linked against:
> ldd myapp

I'm not using it for development; I just want some programs on my box to use libhoard.so for better performance.

There don't seem to be any Linux programs with the same function as pldd on Solaris, but I'm not sure ;)

pmap seems to work on Linux:
pmap <pid>
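Grepping for the library confirms it; pidof and the program name here are illustrative:

> pmap $(pidof myapp) | grep hoard
> grep hoard /proc/$(pidof myapp)/maps

If libhoard.so shows up among the mappings, the process is really using it.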

Edited 2007-05-26 15:19

Reply Score: 1

big_gie Member since:
2006-01-04

I'm not using it for development; I just want some programs on my box to use libhoard.so for better performance.

So you want to replace the existing library with that one? Interesting idea. Can't help with that, though ;)

Reply Score: 1

gerryxiao Member since:
2006-12-17

I'm not using it for development; I just want some programs on my box to use libhoard.so for better performance.
So you want to replace the existing library with that one? Interesting idea. Can't help with that, though ;)

libhoard.so is a dynamic shared library that provides many of the same functions as the standard GNU C library, such as malloc() and free(). If the LD_PRELOAD variable is set, any program that depends on shared libraries looks in libhoard.so first; for every needed function it finds there, it will not call the standard GNU C library version.
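In other words, you just set the variable before starting a program (the install path here is illustrative):

> LD_PRELOAD=/usr/local/lib/libhoard.so myprogram

or export it so that every dynamically linked program started from that shell picks it up:

> export LD_PRELOAD=/usr/local/lib/libhoard.so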

Reply Score: 2

Comment by bnolsen
by bnolsen on Sat 26th May 2007 17:33 UTC
bnolsen
Member since:
2006-01-06

Another allocator, called "nedmalloc", claims to thrash Hoard in real-world applications.

This article reminded me. When I get back to work on Monday or Tuesday, I'll test Hoard and nedmalloc on our 8-core Dell workstation with our threaded ortho rectification and see if there are any noticeable performance improvements.

Reply Score: 2

RE: Comment by bnolsen
by EmeryBerger on Mon 28th May 2007 00:54 UTC in reply to "Comment by bnolsen"
EmeryBerger Member since:
2007-05-28

Hi,

I just downloaded and tried nedmalloc against a particularly brutal benchmark called "larson" on a dual-processor Linux box. With Hoard, throughput was 790,614 memory operations per second, while with nedmalloc it was 188,706 ops/sec. Compare GNU libc (the default allocator), whose throughput was 192,485 ops/sec. In short, Hoard outperforms nedmalloc by more than 4X.

I'd be interested to hear what happens with your application.

Best,
-- Emery Berger

Reply Score: 2

RE[2]: Comment by bnolsen
by bnolsen on Tue 29th May 2007 15:24 UTC in reply to "RE: Comment by bnolsen"
bnolsen Member since:
2006-01-06

I got to work and ran some tests.

Machine: 8-core Clovertown, 1.6GHz, 8GB RAM, Gentoo ~amd64

Process: threaded ortho rectification.
3 pipelines, 8 threads each:
- Input IO
- Ray production & solid intersection
- Pixel rasterization and output tiling

Test dataset: 210MP input image (smallest I could find)
Run each data set twice, take second timing.

The process is 64-bit optimized. Memory usage numbers are somewhat misleading because of kernel paging, but they hover between 3.2GB and 3.8GB regardless of the allocator.

Interesting observation:
During libhoard runs, the system CPU load would occasionally spike dramatically (seen in both gkrellm2 & htop).

Timing:

default glibc allocator:
real 1m19.477s
user 9m10.210s
sys 0m8.993s

hoard:
real 1m9.135s
user 7m23.508s
sys 0m37.102s

tcmalloc:
real 1m2.222s
user 6m58.014s
sys 0m5.032s
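For reference, switching allocators for a timed run like this is just a matter of preloading; the binary name and library paths here are illustrative:

> time ./ortho input.tif
> time LD_PRELOAD=/usr/lib/libhoard.so ./ortho input.tif
> time LD_PRELOAD=/usr/lib/libtcmalloc.so ./ortho input.tif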

Edited 2007-05-29 15:28 UTC

Reply Score: 1

RE[3]: Comment by bnolsen
by bnolsen on Tue 29th May 2007 16:48 UTC in reply to "RE[2]: Comment by bnolsen"
bnolsen Member since:
2006-01-06

nedmalloc numbers:

real 0m58.323s
user 6m11.111s
sys 0m11.933s

Looks like the nedmalloc guy has the most realistic claims.

Reply Score: 1

RE: Comment by bnolsen
by bnolsen on Tue 29th May 2007 23:09 UTC in reply to "Comment by bnolsen"
bnolsen Member since:
2006-01-06

Yet another test case:

Normal system accumulator.
Equations are objects which are assembled into a normal system and then solved via SVD.

Tons of 3x3 and 4x4 matrices.
Single-threaded application.

Base case:

real 0m3.596s
user 0m3.560s
sys 0m0.036s

hoard:

real 0m4.112s
user 0m4.076s
sys 0m0.036s

nedmalloc:

real 0m3.692s
user 0m3.664s
sys 0m0.028s

tcmalloc: CORE DUMP (hehe)

Reply Score: 1

RE[2]: Comment by bnolsen
by EmeryBerger on Fri 1st Jun 2007 00:58 UTC in reply to "RE: Comment by bnolsen"
EmeryBerger Member since:
2007-05-28

Hi -

Could you please send me your benchmarks? There is some sort of performance issue with 64-bit that I am trying to resolve.

Thanks,
-- Emery

Reply Score: 1

Wes Felter
Member since:
2005-11-15

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

https://labs.omniti.com/trac/portableumem/

(Emery Berger seems pretty good at marketing for a university professor.)

Reply Score: 2

EmeryBerger Member since:
2007-05-28

Hi,

Thanks for the pointers.

I ran the "larson" benchmark with both of the allocators you pointed out.

libumem: shockingly poor performance. 76,110 ops per second; Hoard is 10X faster, which makes me unsure whether this port is faithful to the original libumem algorithm (from Bonwick).

tcmalloc: for this benchmark, it outperforms Hoard by about 20%, but it also consumes 40% more memory (172MB versus 122MB). In addition, tcmalloc does some things that Hoard deliberately avoids because they can seriously harm performance. I tried another benchmark ("cache-thrash") that tests whether an allocator can contribute to false sharing of cache lines, which happens when an allocator hands out pieces of the same cache line to different threads, so their writes keep invalidating each other's caches (really bad for performance). On this benchmark, Hoard was around 6.5X faster.

Best,
-- Emery

P.S. I'm going to consider 'good at marketing' a compliment... ;)

Reply Score: 2