Linked by Zachary Pinter on Tue 27th Apr 2004 17:09 UTC
Garbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers, have made the overhead negligible for all but the most extreme situations.
Excellent article
by tuttle on Tue 27th Apr 2004 17:26 UTC

Many people still think that malloc/free will be faster than GC in all situations. But this is no longer the case. The progress in garbage collection algorithms in the last 10 years has been amazing.

The only thing that is still much faster than a good generational garbage collector is stack allocation. But fortunately .NET has limited support for stack allocation using value types.

Very good article...
by Omar on Tue 27th Apr 2004 17:52 UTC

I've been reading OSNews for a few years now, and this is probably my favorite article that I've seen here so far. Good job Zachary.

Nice
by Avi on Tue 27th Apr 2004 17:57 UTC

Nice article. I liked it a lot. I heard the next version of C# (2.0 in Visual Studio 2005) has an improved GC too....

Reference Counting
by Jason G on Tue 27th Apr 2004 18:14 UTC

Though this technique provides a good conceptual example, it is rarely used in practice.

Well, while it's not the best way to go, it's easy to implement. It's also the garbage collection model for Microsoft's COM.

I'm mainly a Java programmer now, just starting to get into some .NET stuff too. But in a past life I was a COM programmer. I've been bitten by cyclical references before.

I don't agree with..
by steve on Tue 27th Apr 2004 18:32 UTC

the statement that garbage collection time is small. Any non-trivial application will spend a large amount of time garbage collecting. The main problem with garbage collection is that it wastes memory and causes virtual memory to become a large damper on system performance.

Take the problem of programs that would like to use virtual memory as a cache. Many programs load their GUI into RAM - the images of widgets, their config files, etc. - rather than reading them off disk every time. When they're not needed and RAM is constrained, the memory this data is loaded into gets swapped out. Swapping it back into RAM is usually faster than loading (and re-parsing the file) off the disk. However, mark-and-sweep GC routines require that all this data is constantly being swapped back into memory for the sole purpose of garbage collection. This is why Java Swing programs seem so slow.

When Longhorn comes out we will see how much better/worse programmers will treat their new environment.

re: Reference Counting
by Nicolas Roard on Tue 27th Apr 2004 18:33 UTC

I think it's also used for Visual Basic.

And it is used for OpenStep programming (i.e., Cocoa and GNUstep). Strangely, GNUstep also permits the Boehm GC, but many people just prefer the classic reference counting scheme.

RE: Nicolas Roard
by Jason G on Tue 27th Apr 2004 18:36 UTC

Yeah, VB 4-6 are pretty much built on top of COM.

Re: I don't agree with..
by murphee on Tue 27th Apr 2004 19:05 UTC

>off the disk. However, mark and sweep gc routines require
>that all this data is constantly being swapped back into
>memory for the sole purpose of garbage collection.

What do you mean by "constantly"? If you use a generational GC (which is the default on the Sun Java VMs), most GC runs happen in the nursery or eden space (the new generation). This generation does not use mark/sweep; the old generation does, but GC runs happen much less often there.
And even that overhead can be reduced to nothing; there is research on the topic of "pre-tenuring", which recognizes extremely long-lived data and optimizes GC for it (see this paper: http://cs.anu.edu.au/~Steve.Blackburn/pubs/papers/pretenure-oopsla-... for details).

>This is why java swing programs seem so slow.

What the...? What kind of statement is that? How does the occasional swapping of data affect the perceived speed of GUI apps? And yes, it is *occasional*, because most of the GC runs happen on the young generation (again: no mark/sweep there) and not the old generation.
If you feel your Swing apps run too slowly, use the "Concurrent" setting on the JVM's GC to keep GC pauses down to a minimum.

re: Reference Counting
by jlah on Tue 27th Apr 2004 19:08 UTC

Reference counting is also used for GC in some scripting languages, e.g. Perl and Python (I am not sure if Python's GC mechanism has changed since I last checked, but Perl definitely counts references - I managed to create a few memory leaks with circular references in my early days ;).

Nice article
by Martin Häcker on Tue 27th Apr 2004 20:05 UTC

though I would have liked the conclusion to mention that in almost all general-purpose programs, having a garbage collector doesn't hinder speed at all.

Simply because with manual memory management you copy objects much more often to avoid memory bugs - which is of course no longer necessary with a garbage collector.

cu Martin

RE: Conclusion
by edwardr on Tue 27th Apr 2004 20:23 UTC

> Conclusion
>
> Although garbage collection can never be as efficient in
> every way as a well-programmed “micro-managed memory”
> application, ...

Actually, this is not true. A "micro-managed memory" app that allocates 100,000 small items and then frees all but 5 of them takes a small constant C * 100,000 to allocate and another C * 100,000 to free. However, a copying GC allocates by simply advancing a pointer (because it's got a completely empty heap), so allocating the 100,000 items costs a smaller constant c * 100,000; it then determines that there are 5 live objects, which takes some larger constant K * 5, and then frees the other 99,995 in ONE INSTRUCTION!

In short, "micro-managed" memory allocates and frees in time proportional to the number of items allocated and freed, while the copying GC allocates in time proportional to the number of items allocated but frees in time proportional to the number of items left alive.

So in some cases, copying GC can be faster. But even when it is, GC'd languages are usually slower for other reasons too, not just GC (like being interpreted, etc.).
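
To make the pointer-advancing idea concrete, here is a minimal C++ sketch of a bump allocator (the class and its names are made up for illustration; a real copying collector would also evacuate the few live objects before resetting the pointer):

#include <cstddef>
#include <cstdint>

// Toy bump-pointer "heap": allocation is just advancing a pointer, and
// reclaiming everything is resetting that pointer - the "one instruction" free.
class BumpHeap {
public:
    explicit BumpHeap(std::size_t size)
        : base_(new std::uint8_t[size]), top_(base_), end_(base_ + size) {}
    ~BumpHeap() { delete[] base_; }

    void* allocate(std::size_t n) {
        if (top_ + n > end_) return 0;   // out of space; a real GC would collect here
        void* p = top_;
        top_ += n;                       // constant-time bump
        return p;
    }

    // After a copying collector has moved the live objects elsewhere,
    // everything left in this space is reclaimed by one pointer reset.
    void release_all() { top_ = base_; }

private:
    std::uint8_t* base_;
    std::uint8_t* top_;
    std::uint8_t* end_;
};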

@steve
by Rayiner Hashem on Tue 27th Apr 2004 20:25 UTC

If Java GUI apps seem slow because of this, then it's a fault of the Java GC, not a problem inherent in garbage collectors. Functional Objects' Dylan IDE, which uses the MPS, feels just like a native Windows app on my machine.
See:

www.ravenbrook.com/project/mps/

@edwardr
by Rayiner Hashem on Tue 27th Apr 2004 20:29 UTC

This is very true. Good memory managers that use different policies for different heaps can have extremely good performance for many common workloads. For example, generational GCs are very good for functional code that quickly creates then destroys lots of objects. The overhead is much less than it would be when using a fully general-purpose memory allocator. Depending on your workload, a good GC will be on average as fast as a good malloc(). When you have detailed knowledge about your program's memory allocation patterns, you can outperform both using customized allocators (eg: pool allocators), but then again, a good memory manager will offer you a range of memory management options in addition to plain GC.

re: Reference Counting
by Allan Sandfeld on Tue 27th Apr 2004 21:21 UTC

Reference counting is probably the most widely used automatic memory reclaiming technique; it's just not very good as a general-purpose garbage collector. It is used in most object-oriented languages or toolkits as a way to keep track of implicitly shared objects, and in lots of other places that just need a fast and secure way to share objects between peers (as long as you make sure objects aren't peers, they will not be self-referential).

I would love to see garbage collection as part of a broad toolkit of memory managers, but as stand-alone models they have seriously underwhelmed me so far.

Incremental or not incremental
by renoX on Tue 27th Apr 2004 21:30 UTC

I'm not sure I understood which GCs in the list are incremental and which are not.

An incremental GC does not generate annoying pauses, as the GC can be stopped at any time and continued later without too much trouble (well, except that modified objects must be updated).

@renoX
by Rayiner Hashem on Tue 27th Apr 2004 21:42 UTC

Whether a GC is incremental is usually pretty independent of the algorithm used. Eg. there are incremental generational copying collectors, stop-the-world copying collectors, etc.

Python GC
by sanxiyn on Tue 27th Apr 2004 23:43 UTC

Python still uses reference counting as of 2.3, but since 2.0 it has also had a cycle detector that runs periodically to prevent memory leaks.

Java GC information incorrect
by Anonymous on Wed 28th Apr 2004 01:01 UTC

How GC is performed in Java is not guaranteed!

It is not specified in the Java Language Specification and varies by platform and JVM vendor. Even if the official Sun JVM has been observed to do it one way on a particular OS it does not mean that all JVMs (including others from Sun) will do exactly the same on other OSes.

Please be more careful before making such statements.

Excellent Article
by Reformist on Wed 28th Apr 2004 01:07 UTC

Very informative and well written. I would have liked to see a little more detail on the methods used by the Java and .NET VMs, perhaps a diagram =)

Response
by Zachary Pinter on Wed 28th Apr 2004 02:45 UTC

Thanks for the compliments and critiques, everybody. I apologize for any specific details I might have overgeneralized. I certainly do not claim this to be a comprehensive analysis, but rather a "glance at" or introduction. The article was actually a final paper for my Science of Computing class, which I just completed. I'm eager to hear any further elaborations or corrections on the material you might have. The different applications and approaches of this technology fascinate me.

Re: Response
by Anonymous on Wed 28th Apr 2004 03:15 UTC

Please read "Sun Certified Programmer & Developer for Java 2 Study Guide (Exam 310-035 & 310-027)" by Kathy Sierra and Bert Bates.

Their explanation of Java GC is by far the best I've read. In it they explicitly state that the JVM may do ABC or it may do DEF. And even if we reverse-engineer the JVM and determine exactly how it does it, we cannot be sure that the identical JVM will work that way on other OSes.

I personally have studied the GC operations on various J2ME implementations (where memory is scarce and needs to be cleaned up often) and can say that the GC runs 90% of the time, but that is due to how the vendor (KVM) implemented GC, and not to Sun or the Java API.

Your explanations of GC in general are very well articulated and I don't mean to criticize. I mean only to correct you.

GC
by Jordan on Wed 28th Apr 2004 03:43 UTC

I have a couple big problems with garbage collectors.

They tend to run when memory is scarce. This makes them extremely poor neighbors when running on systems with other applications. In fact, a good number of the performance problems I see with Java GUI apps are because they end up eating huge chunks of memory and cause the rest of my machine to start slowing down.

Worse, when they do run their collectors, they run them all at once, which pegs your CPU and creates a noticeable hiccup in your application (and your machine). With traditional micro-managing, you spread those CPU cycles over the lifetime of your application.

Now, that's not to say garbage collection isn't extremely handy; however, I'd rather use it to debug my manual memory allocations. After all, the time spent running my applications is significantly longer than the time spent writing them.

GC and C++
by Solar on Wed 28th Apr 2004 06:49 UTC

> Traditionally popular programming languages, such as
> C/C++, require the programmer to “micro-manage” memory
> consumption.

Traditionally, programmers coming from GC languages have a hard time grasping the concept that, if you start to write destructors for every C++ class or have numerous "new()"s in your source, your design is broken. In over 90% of the cases, you have no need to use anything but automatic variables and default destructors.

And I like to know when my cleanup functions are called: when the object leaves scope, instead of some time when the GC fires, or never (as might be the case in Java). That allows me to do meaningful things in destructors.

That being said, note that it's quite easy to "defeat" garbage collectors. One example for Java: forget to unregister your event handlers before you release a widget, and you've got a memory leak.

GC's are nice, but I prefer them to be *optional*.

The real problem is finalizers
by Esben Mose Hansen on Wed 28th Apr 2004 06:53 UTC

Any GC mechanism that depends on finalizers as the only object clean-up method trades an irritant for a real problem. The irritant is memory management. The problem is resource management --- semaphores, files and so on. One clean, effective and easy solution is to use constructors and destructors to manage the allocation and deallocation of these resources. This guarantees proper deallocation at the proper time, without ever risking access to deallocated or never-allocated objects. This idea is often nicknamed RAII (Resource Acquisition Is Initialization). This technique cannot be used with finalizers, however, which may never be run (at all!)
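
A minimal C++ sketch of the RAII idea (the ScopedFile name is made up for illustration; the point is that cleanup runs deterministically when the object leaves scope, even if an exception is thrown):

#include <cstdio>
#include <stdexcept>

// RAII: the constructor acquires the resource, the destructor releases it.
class ScopedFile {
public:
    explicit ScopedFile(const char* path) : fp_(std::fopen(path, "r")) {
        if (!fp_) throw std::runtime_error("could not open file");
    }
    ~ScopedFile() { std::fclose(fp_); }

    std::FILE* get() const { return fp_; }

private:
    std::FILE* fp_;
    ScopedFile(const ScopedFile&);            // non-copyable: one owner, one close
    ScopedFile& operator=(const ScopedFile&);
};

void read_config() {
    ScopedFile f("config.txt");   // resource acquired here
    // ... use f.get() ...
}                                 // and released here, exactly when scope ends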

This does not mean that GC is bad. C++ (which seems to survive everything, beats me how) now has several GCs available, including some very interesting "conservative" collectors, which guarantee that destruction occurs exactly when the object goes out of scope. This is, IMHO, the way to go. This way, wrapper objects can be made which handle nearly all resource management --- not just memory.

This is not about performance, though I admit that Java and .NET have problems in this regard. This is about correctness, which is far less easily solved.

Finalizers
by Treza on Wed 28th Apr 2004 09:09 UTC

Finalisers (user-defined "destructors", a la C++) are bad for GC-managed languages:
- Usually, the precise instant of destruction is random, as is the order of destruction between several objects. Reference counting GCs are more "predictable"(?) than mark/sweep ones.

- The "finalised" objects need to pass through the collector twice, as the finaliser may make the object uncollectable again.

I recommend reading the classic paper "Uniprocessor Garbage Collection Techniques" by Paul R. Wilson, available as draft and full text; see http://citeseer.ist.psu.edu/wilson92uniprocessor.html, http://www.cs.utexas.edu/users/oops/papers.html

Doesn't beat *me*...
by Solar on Wed 28th Apr 2004 09:58 UTC

> C++ (which seems to survive everything, beats-me-how)...

Versatility. The very feature that makes it one of the ugliest bitches in the programming language arena also makes it applicable for just about everything...

GC
by Michael on Wed 28th Apr 2004 11:19 UTC

Good article about garbage collectors.
Last year I switched from C++ to OCaml, which uses a fast hybrid generational/incremental garbage collector. It's much more fun just to code without needing to do the memory management myself than to debug memory leaks, segfaults and destructors. In most real applications OCaml code is as fast as C code, so the GC comes for free.

@Solar
by Rayiner Hashem on Wed 28th Apr 2004 12:01 UTC

> Traditionally, programmers coming from GC languages have a hard time grasping the concept that, if you start to write destructors for every C++ class or have numerous "new()"s in your source, your design is broken. In over 90% of the cases, you have no need to use anything but automatic variables and default destructors.

This is true for the styles of programming common in C++. However, in lots of functional code, you need indefinite extent for objects, and GC is the only sane way to do it. Can you imagine trying to manually manage memory for closures?

Also, languages other than Java have non-GC-related cleanup methods available. It is very common for a Lisp or Dylan programmer to define a macro "with-open-file" (or one for whatever resource you want managed), and use it as such:

with-open-file (fs = "foo.text", element-type: <byte>)
  read-to-end(fs)
end;

(Example stolen from fun-o's website ;)

This technique can be generalized to any resource you want, and can even be used to manage locks automatically:

with-held-lock(lock)
  do-stuff();
end;

So the lack of deterministic finalization really isn't a significant barrier to anyone except those coming from a C++ background ;)

@Esben Mose Hansen
by Rayiner Hashem on Wed 28th Apr 2004 12:05 UTC

Conservative collectors do not guarantee deterministic deallocation. Conservative collectors simply let you use GC with a non-cooperative language, by assuming that if a particular word looks like a pointer (regardless of whether it may just be an integer), it should be treated as a pointer into the heap. This is not the greatest way to do GC, but it performs pretty well in practice.

What you're thinking of is something like boost::shared_ptr, or one of the other resource-managing templates. They are good within the confines of C++, but they are much more cumbersome to program (C++ really doesn't make the wrapper transparent except at invocation), and they perform poorly compared to a real GC.
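
For illustration, a small sketch of boost::shared_ptr in use, which also shows the cyclical-reference weakness of reference counting mentioned earlier in the thread (boost::weak_ptr is the usual way to break such cycles):

#include <boost/shared_ptr.hpp>

struct Node {
    boost::shared_ptr<Node> next;   // each Node holds a counted reference
};

int main() {
    boost::shared_ptr<Node> a(new Node);
    boost::shared_ptr<Node> b(new Node);
    a->next = b;
    b->next = a;    // cycle: a and b keep each other's count above zero
    return 0;
}   // both counts drop to 1 but never to 0 - the two Nodes leak,
    // exactly the cyclical-reference problem a tracing GC handles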

Re: Jordan
by Lars on Wed 28th Apr 2004 14:41 UTC

> I have a couple big problems with garbage collectors.
> They tend to run when memory is scarce.

Not if you use an incremental collector.

Re: I don't agree with..
by Marcus Sundman on Wed 28th Apr 2004 20:44 UTC

> > off the disk. However, mark and sweep gc routines require
> > that all this data is constantly being swapped back into
> > memory for the sole purpose of garbage collection.
>
> What do you mean by "constantly"? If you use a generational
> GC (which is the default on the Sun Java VMs), most GC runs
> happen in the nursery or eden space (the new generation).

Pages with eden are seldom swapped out since they are constantly in use.

> The Old generation does, but GC runs happen much less
> often there.

In GUI design the rule of thumb is that reaction latency should never exceed 0.1 seconds. (This is because of our perception of cause and effect, and keeping the cognitive load at a minimum.) It's not OK if a GUI program has 2-3 second pauses every minute, and a JVM that pauses for a minute several times per hour is totally unacceptable for GUI programs. Now combine this with the poor VM performance of Linux (especially 2.6) and you get a disaster. I've experienced several GC pauses of half an hour and one of about an hour, during which the Linux POS is totally unusable.

gc exclusion zones?
by tech_user on Wed 28th Apr 2004 23:20 UTC

Would it be useful to have automatic GC, but exclude it from certain parts of your code via an exclusion lock, for when you know that you don't want to be interrupted by a GC thread? This would complement the hints/invocations of a GC round...

Re: GC and C++
by HanSoft on Thu 29th Apr 2004 09:03 UTC

> Traditionally, programmers coming from GC languages have a
> hard time to grasp the concept that, if you start to write
> destructors for every C++ class or have numerous "new()"s
> in your source, your design is broken. In over 90% of the
> cases, you have no need to use anything but automatic
> variables and default destructors

Very true! GC programmers tend to think that for every single bit of information, you have to do an allocation. Well, I don't (like to) do OO, and even less GC, for the simple reason that it breeds BAD programming habits.

Any really good C programmer knows that you estimate the size of any 'object' and then do the allocation in one chunk. If you can't (because it is too large) you use another algorithm, but you're not gonna fragment your memory. If it is memory conservation over speed, you can do a realloc(). If not, who cares.

For my own compiler, I estimate the size of the object code during tokenizing, so don't say it can't be done. If you really can't tell where you're going, you can always use the 'increase allocation' technique, where you start by allocating a chunk (say 10K) and each subsequent allocation is increased by a certain percentage (say 50%), so the next allocation is 15K.
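
A rough C++ sketch of that "increase allocation" technique (the names and the 50% growth factor are just illustrative):

#include <cstdlib>
#include <cstring>

// Growable buffer: start with one chunk and grow it by 50% whenever it fills
// up, so the number of realloc() calls stays logarithmic in the data appended.
struct Buffer {
    char*       data;
    std::size_t used;
    std::size_t capacity;
};

bool buffer_init(Buffer& b, std::size_t initial) {
    b.data = static_cast<char*>(std::malloc(initial));
    b.used = 0;
    b.capacity = initial;
    return b.data != 0;
}

bool buffer_append(Buffer& b, const char* bytes, std::size_t n) {
    if (b.used + n > b.capacity) {
        std::size_t newcap = b.capacity + b.capacity / 2;   // grow by 50%
        if (newcap < b.used + n) newcap = b.used + n;
        char* p = static_cast<char*>(std::realloc(b.data, newcap));
        if (!p) return false;
        b.data = p;
        b.capacity = newcap;
    }
    std::memcpy(b.data + b.used, bytes, n);
    b.used += n;
    return true;
}

void buffer_free(Buffer& b) {
    std::free(b.data);
    b.data = 0;
    b.used = b.capacity = 0;
}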

However, this depends on the problem you have to solve. I doubt very much that an OO/GC programmer can beat the speed of my C programs. That doesn't mean that there is no niche for GC languages, but don't say they're more efficient: they're not and they never will be!

Still beats me that someone can 'forget' what he allocated..

Well..
by CrimsonBane on Thu 29th Apr 2004 14:26 UTC

Perhaps you should try some real-world server programming. With virtual CPU emulators like Valgrind, finding a memory leak in C/C++ nowadays is pretty trivial. People have been saying for the last 10 years that GC is just as good as new/delete, but I have yet to see anything I can use that has the same performance as C and has GC.

Memory Pools
by Jonathan Bartlett on Thu 29th Apr 2004 18:35 UTC

The most often overlooked form of memory management is memory pooling. It's not a universal technique, but it works in many situations, and is used in Samba and Apache at least.

Pooling is lifecycle-based memory management. When you know which parts of your application's lifecycle a structure will be in, you can create a "pool" of memory which will all be freed at the end of that part of the lifecycle.

For example, Apache has memory pools for the lifetime of the server, the lifetime of the connection, the lifetime of the request, and others. If you allocate memory that you know will not be used past the end of the request, you allocate it in the request pool. Then, all objects in the request pool are freed simultaneously at the end of the request. This is by far the fastest method of memory management, because allocations and frees are constant-time and very fast: pool allocation is about the same speed as a normal malloc(), and the entire pool is freed in one shot at the end.
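
For illustration, here is a toy pool in C++ along the lines described above (this is not Apache's actual API; allocation is a pointer bump inside fixed-size blocks, alignment is ignored for brevity, and everything is released together when the pool is destroyed at the end of its lifecycle):

#include <cstddef>
#include <vector>

class Pool {
public:
    explicit Pool(std::size_t block_size = 8192)
        : block_size_(block_size), offset_(block_size) {}

    ~Pool() {
        for (std::size_t i = 0; i < blocks_.size(); ++i)
            delete[] blocks_[i];             // one bulk release for the whole lifecycle
    }

    void* allocate(std::size_t n) {
        if (n > block_size_) return 0;       // a real pool would chain an oversized block
        if (offset_ + n > block_size_) {     // current block full: start a new one
            blocks_.push_back(new char[block_size_]);
            offset_ = 0;
        }
        void* p = blocks_.back() + offset_;
        offset_ += n;                        // constant-time bump allocation
        return p;
    }

private:
    std::size_t        block_size_;
    std::size_t        offset_;
    std::vector<char*> blocks_;
};

// Usage: create one Pool per request, allocate from it freely while the
// request is being handled, and let its destructor free everything at once.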

There are two reusable implementations of this I'm aware of: Apache's portable runtime and GNU's obstack library. However, it's a very powerful technique if your application fits that model.