posted by Zachary Pinter on Tue 27th Apr 2004 17:09 UTC
IconGarbage collection (GC) is a technology that frees programmers from the hassle of explicitly managing memory allocation for every object they create. Traditionally, the benefit of this automation has come at the cost of significant overhead. However, more efficient algorithms and techniques, coupled with the increased computational power of computers have made the overhead negligible for all but the most extreme situations.


In fact, in many situations automated memory allocation can be significantly faster and more efficient than traditional approaches. Combined with increased programmer productivity, it would be negligent not to consider automated memory management a core component in nearly all future programming paradigms.


Though recent approaches have expanded the feasibility of an automated memory manager, the basic technology has been around for quite some time. GC was first invented by John McCarthy in 1958 as part of the implementation of Lisp. Though it has gained popularity through Java and .NET, it is included in languages such as Eiffel, Haskell, ML, Modula-3, Perl, Prolog, Scheme, and Smalltalk. Traditionally popular programming languages, such as C/C++, require the programmer to “micro-manage” memory consumption. Manual memory management is enormously tedious and error-prone; however, it can be more efficient in many ways if properly handled. This discrepancy in efficiency has slowed the widespread adoption of the automated approach.

Algorithms and Technologies

Though GC can be handled in a variety of ways, the core idea is the same across all techniques. The goal is differentiate between “garbage” and “non-garbage” objects and automatically free the memory that “garbage” objects are occupying. “Garbage” essentially refers to an object that is unreachable by any other currently reachable object. If it cannot be reached, it is not useable any longer and is therefore considered “garbage.”

Reference Counting

Reference counting is a form of garbage collection where each object stores a count of all the other objects that refer to it. Every time an object is assigned or an operation is performed that would alter a reference to an object, the reference count for all involved objects are appropriately updated. If an object’s reference count decrements to zero, then it is effectively unreachable and considered “garbage.” Though this technique provides a good conceptual example, it is rarely used in practice. Reference counting requires frequent updates and it cannot account for cyclical references, where an object indirectly makes a reference to itself. Additionally, though not nearly as troublesome as the first two problems, reference counting requires every object to reserve a special storage space for a reference count.


The mark-sweep collection technique is one of the most basic tracing garbage collection technologies. Tracing garbage collectors start by considering every object to be garbage and then tracing through every reachable object. Anything that is not traced as reachable is remains “garbage.” A mark-sweep collector is able to recognize objects that have already been flagged as reachable and therefore can properly handle cyclical references. After it marks every reachable object, it then scans through all objects in the heap and frees any object not flagged as reachable. The downside of this approach is that the memory becomes fragmented since objects have been removed sporadically throughout the heap. Also, a mark-sweep collection can be quite resource-intensive, since it is forced to scan every object in memory, regardless of whether or not the object is flagged as “garbage.”


The copying collection technique can be dramatically faster than mark-sweep because it is able to dismiss “garbage” objects without ever scanning or analyzing them. Instead of simply removing “garbage” objects from their position in memory, it relocates all “non-garbage” objects to an entirely new location in memory, trashing everything left-over in the memory. It does this by splitting its allocated memory into two sections: the active section and the unused section. In the active section, objects are stacked on top of each other contiguously, making object allocation extremely efficient. New objects are allocated simply by placing them on top of the existing heap. When it runs out of free memory, the garbage collector traces through all reachable objects and copies them contiguously to the unused section of memory. The active section of memory is still full of reachable and unreachable objects, but that doesn’t matter since all the reachable objects have been copied to the unused section. Then, the active section and unused section switch places, dismissing the previous section of memory. When the new active section becomes full, the process is repeated again and the sections are swapped. This approach has the advantage of providing excellent allocation speed and it virtually eliminates fragmentation. The downside, however, is that long-living objects are continuously swapped between the two sections every time the collector is run. Additionally, it requires twice the amount of memory to run effectively.


To account for the benefits of the copying collector while considering memory constraints and long-lived objects, the mark-compact collector was formed. Though it is a bit more complex than copying collectors, the fundamental idea is the same. There is a definite separation between the active section of the memory and the free/unused section. However, unlike the copying collector, the mark-compact collector scans for all reachable objects and compacts them at the bottom of the heap, leaving the remaining memory free for consumption. The catch is that it doesn’t have to fully compact the memory each time, and since long-lived objects tend to get pushed to the bottom of the heap, they are not as frequently collected and compacted.


To further account for the differentiation between handling short-lived and long-lived objects, the generational approach was developed. In a generational collection, the heap is divided into multiple generations. Objects are created in the base (youngest) generation, and are promoted to higher (older) generations after passing some form of criteria, usually related to the age of the object. Garbage collection can be done at different time intervals and even using different techniques based on the generation the object is in. The only problem with this approach is when an older generation object references a younger generation object. Since the tracing collector does not trace into older generation references, this intergenerational reference gets lost in the mix if it is not referenced by some other object in its own generation. Luckily, there are only two ways for an older object to reference a younger object: either a reference in the older object is modified to refer to a younger object, or a young object that refers to other young objects gets promoted to an older generation. Languages that employ a generational collector account for this problem by maintaining a list of intergenerational references whenever such an occasion occurs.

Table of contents
  1. "Garbage Collection, Page 1/2"
  2. "Garbage Collection, Page 2/2"
e p (0)    35 Comment(s)

Technology White Papers

See More