Linked by kragil on Wed 23rd Jan 2013 20:26 UTC
Google "Native Client enables Chrome to run high-performance apps compiled from your C and C++ code. One of the main goals of Native Client is to be architecture-independent, so that all machines can run NaCl content. Today we're taking another step toward that goal: our Native Client SDK now supports ARM devices, from version 25 and onwards."
Thread beginning with comment 550424
RE[10]: Comment by Laurence
by Neolander on Fri 25th Jan 2013 07:06 UTC in reply to "RE[9]: Comment by Laurence"

"Difference is that you can determine when to impose this possible 'random latency bubble' unlike with a GC where the GC logic decides when to do a memory reclamation sweep.

This means that you can release memory back at a pace that is dictated by yourself rather than the GC, a pace which would minimize 'latency bubbles'."

As I said, you can decide to run the GC yourself in a non-critical area, after a lot of memory handling has occurred, so that it has no reason to run on its own later. That is no more complicated than calling free() yourself.
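
As a rough sketch of what that can look like in C (the thread doesn't name a runtime, so this assumes the Boehm-Demers-Weiser collector, linked with -lgc; load_level() and run_frame() are made-up placeholders):

    /* Force a collection at a point where a pause is harmless, e.g. right
     * after a loading phase and before a latency-sensitive loop.
     * Assumes the Boehm GC (link with -lgc). */
    #include <gc.h>

    /* hypothetical workload: creates a burst of short-lived garbage */
    static void load_level(void)
    {
        for (int i = 0; i < 100000; i++)
            (void)GC_MALLOC(64);   /* result dropped: instant garbage */
    }

    /* hypothetical hot-loop body: allocates nothing */
    static void run_frame(void) { }

    int main(void)
    {
        GC_INIT();

        load_level();     /* lots of garbage is created here            */
        GC_gcollect();    /* pay the collection cost now, at a point we */
                          /* chose, not in the middle of the loop below */

        for (int frame = 0; frame < 1000; frame++)
            run_frame();  /* little new garbage, so little GC work      */

        return 0;
    }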

After that, let us remember that GCs are lazy beasts, which is the very reason why GC'd programs tend to be memory hogs. If you don't give a GC a good reason to run, it won't run at all. So code that does no dynamic memory management, only using preallocated blocks of memory, gives the GC no work to do and thus shouldn't trigger it.
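
A minimal sketch of that "preallocate, then reuse" pattern, with made-up names; the same idea applies whether the memory comes from a GC heap or from malloc():

    /* Preallocate all working memory up front and reuse it, so the
     * steady-state loop allocates nothing, and therefore gives a
     * collector (or malloc/free) no work to do. */
    #include <stddef.h>

    #define MAX_PARTICLES 4096

    typedef struct { float x, y, vx, vy; } particle;

    static particle particles[MAX_PARTICLES];  /* preallocated once, reused */
    static size_t   live_count;

    void step_simulation(float dt)             /* called every frame */
    {
        for (size_t i = 0; i < live_count; i++) {
            particles[i].x += particles[i].vx * dt;
            particles[i].y += particles[i].vy * dt;
        }
        /* no malloc()/GC allocation anywhere in the hot path */
    }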

But even if the GC did trigger on its own, typically because of a policy that makes it run periodically or something similar, it would just quickly parse its data structures, notice that no major change has occurred, and stop there. Code like that, when well-optimized, should have less overhead than a system call.

Now, if you are dealing with a GC that runs constantly and spends an awful lot of time doing it when nothing has been going on, well... maybe at that point you should get a better runtime before blaming GC technology itself for the situation ;)

"Also, given that you control exactly which memory is to be released back at any specific time, you can limit the non-deterministic impact of the 'free' call."

The beauty of nondeterministic impacts like that of free() is that you have no way of knowing which one will cost you a lot. It depends on the activity of other processes, on the state of the system's memory management structures, on incoming hardware interrupts that must be processed first...

If you try to use many tiny memory blocks and run free() many times so as to reduce the granularity of memory management, all you will achieve is to increase your chances of getting a bad lottery ticket, since memory management overhead does not depend on the size of the memory blocks being manipulated.

Which is why "sane" GCs, and library-based implementations of malloc() too for that matter, tend to allocate large amounts of RAM at once and then just give out chunks of it, so as to reduce the number of system calls that go on.
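
A toy version of that trick, written as a bump-pointer arena in C with invented names. Real malloc implementations and GCs are far more elaborate, but the principle (one big request up front, cheap sub-allocations afterwards) is the same:

    /* Toy arena allocator: one large malloc() up front, then cheap
     * pointer-bump allocations out of it, so that most allocations
     * never go anywhere near a system call. */
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct {
        uint8_t *base;
        size_t   used;
        size_t   capacity;
    } arena;

    int arena_init(arena *a, size_t capacity)
    {
        a->base = malloc(capacity);          /* one big request */
        a->used = 0;
        a->capacity = capacity;
        return a->base != NULL;
    }

    void *arena_alloc(arena *a, size_t size)
    {
        size = (size + 15) & ~(size_t)15;    /* keep 16-byte alignment */
        if (a->used + size > a->capacity)
            return NULL;                     /* arena exhausted */
        void *p = a->base + a->used;         /* just bump an offset */
        a->used += size;
        return p;
    }

    void arena_release(arena *a)             /* free everything at once */
    {
        free(a->base);
        a->base = NULL;
        a->used = a->capacity = 0;
    }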


RE[11]: Comment by Laurence
by Valhalla on Fri 25th Jan 2013 21:40 in reply to "RE[10]: Comment by Laurence"

"So code that does no dynamic memory management, only using preallocated blocks of memory, gives the GC no work to do and thus shouldn't trigger it."

Code like that is no problem to handle without garbage collection either, so it's hardly something to sell the notion of a GC with. The only difference is that even when it is not doing an active memory reclamation sweep, the GC still uses CPU and RAM for its logic, causing overhead; there is no magic that suddenly informs it of the current GC heap state, that is done by monitoring.

"But even if the GC did trigger on its own, typically because of a policy that makes it run periodically or something similar, it would just quickly parse its data structures, notice that no major change has occurred, and stop there. Code like that, when well-optimized, should have less overhead than a system call."

Less overhead than what system call? You wouldn't call free() to begin with unless you were explicitly freeing memory.

"The beauty of nondeterministic impacts like that of free() is that you have no way of knowing which one will cost you a lot. It depends on the activity of other processes, on the state of the system's memory management structures, on incoming hardware interrupts that must be processed first..."

Those activities of 'other processes etc.' affect the GC in the same way; the GC heap is not some magic area with zero-latency allocation/deallocation. It also fragments a lot more easily than system memory does, since it uses only a subset of the available system memory, which is most likely a lot smaller than what the system allocator has at its disposal (more on that later).

Again, by dictating the pace of memory allocation/deallocation you can limit its non-deterministic impact in order to minimize latency problems. We're not talking about adhering to real-time constraints here, which is another subject, but about preventing latency spikes.

And latency spikes have always been the problem with GCs: as you hand control of memory reclamation over to the GC, you also lose control over when and which memory is reclaimed at a given time. So while with manually managed memory you'd choose to release only N allocated objects at a given time to minimize latency, the GC might want to reclaim all memory in a single go, thus causing a latency spike.
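
A sketch of that "release only N objects at a time" idea under manual management, with invented names and limits:

    /* Defer free() calls into a queue and drain at most FREES_PER_FRAME
     * of them per iteration, so reclamation cost is spread out instead
     * of landing in one spike. */
    #include <stdlib.h>

    #define QUEUE_MAX        8192
    #define FREES_PER_FRAME  32

    static void  *pending[QUEUE_MAX];
    static size_t pending_count;

    void deferred_free(void *p)              /* used instead of free(p) */
    {
        if (pending_count < QUEUE_MAX)
            pending[pending_count++] = p;
        else
            free(p);                         /* queue full: pay the cost now */
    }

    void drain_some_frees(void)              /* called once per frame */
    {
        size_t n = pending_count < FREES_PER_FRAME ? pending_count
                                                   : FREES_PER_FRAME;
        for (size_t i = 0; i < n; i++)
            free(pending[--pending_count]);
    }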

"Which is why 'sane' GCs, and library-based implementations of malloc() too for that matter, tend to allocate large amounts of RAM at once and then just give out chunks of it, so as to reduce the number of system calls that go on."

This is exactly what the system memory allocator does, except that it has all the non-allocated memory in the system at its disposal.

Just to make this point clear, the system allocators of today do not employ some 'dumb' list that is traversed from top to bottom while looking for a free memory chunk of a large enough size.

System memory is partitioned/cached so as to be as efficient as possible at allocating and deallocating memory chunks of varying sizes. I would suggest this article for a simple introduction: http://www.ibm.com/developerworks/linux/library/l-linux-slab-alloca...

Now, managing your own pool in order to minimize allocation/deallocation overhead is nothing new; it's been done for ages. However, for this to be efficient you want pools of same-sized objects, or else you will introduce memory fragmentation into your pools.
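
A sketch of such a pool in C, with made-up names. Because every slot has the same size, releasing and reusing slots can never fragment the pool:

    /* Fixed-size object pool with an intrusive free list: acquire and
     * release are O(1) and leave no holes of awkward sizes behind. */
    #include <stddef.h>

    typedef struct enemy { struct enemy *next_free; float hp, x, y; } enemy;

    #define POOL_SIZE 1024

    static enemy  pool[POOL_SIZE];
    static enemy *free_list;

    void pool_init(void)
    {
        free_list = NULL;
        for (size_t i = 0; i < POOL_SIZE; i++) {
            pool[i].next_free = free_list;   /* thread every slot onto */
            free_list = &pool[i];            /* the free list          */
        }
    }

    enemy *pool_acquire(void)
    {
        if (!free_list)
            return NULL;                     /* pool exhausted */
        enemy *e = free_list;
        free_list = e->next_free;
        return e;
    }

    void pool_release(enemy *e)
    {
        e->next_free = free_list;            /* O(1), no fragmentation */
        free_list = e;
    }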

This is what happens in the GC heap: the GC must be able to manage memory of all sizes, so it will introduce memory fragmentation. And unlike the system memory allocator, the GC can't pick and choose from the entirety of available RAM in the system; it can only pick and choose from the chunk of memory it allocated and now manages.

This means that it will fragment more easily, and when fragmentation prevents the allocation of a block of size N there are two choices. One is to resize the GC heap by asking the system for a larger block, which is likely very expensive and also an ineffective use of memory, since we probably have the memory we need, just fragmented. The other, more commonly used option is to defragment, also known as compaction.

Here the GC moves blocks of memory around so as to free up as much space as possible for further allocations. While this is not as expensive as resizing the heap, it is still very expensive.
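
A toy illustration of what a compaction pass amounts to, with invented names, and with the simplifying assumption that blocks are reached through handles kept in address order. The expensive part is the copying of live data:

    /* Toy compaction: slide live blocks toward the start of the heap and
     * patch a handle table so callers that go through handles never see
     * the move.  Assumes handles[] is ordered by increasing offset (true
     * if blocks were handed out sequentially).  Real collectors are far
     * more sophisticated, but the cost is the same in nature: copying
     * live data around. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct {
        size_t offset;   /* where the block currently lives */
        size_t size;     /* block size in bytes             */
        int    live;     /* nonzero if still referenced     */
    } handle;

    void compact(uint8_t *heap, handle *handles, size_t nhandles,
                 size_t *heap_used)
    {
        size_t dst = 0;
        for (size_t i = 0; i < nhandles; i++) {
            if (!handles[i].live)
                continue;                      /* dead block: just drop it */
            if (handles[i].offset != dst)      /* slide the block down     */
                memmove(heap + dst, heap + handles[i].offset,
                        handles[i].size);
            handles[i].offset = dst;           /* patch the handle         */
            dst += handles[i].size;
        }
        *heap_used = dst;   /* everything from dst upward is one free block */
    }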

Now, system memory also fragments, but given that system memory is not nearly as constrained as the GC heap (which is typically a small subset of the available system RAM), it takes much more fragmentation before it causes problems.

Now I want to point out that I am not averse to garbage collectors; they simplify a lot of coding and their use makes sense in tons of situations, but unlike what some people think they are not some panacea. Garbage collection comes at a cost; this is undeniable and has long been established. Modern GCs go a long way toward minimizing that cost, but it will always be there.

For certain code this cost has no significance, for other code the benefits outweigh the costs, and for some code the costs are simply unacceptable.


RE[12]: Comment by Laurence
by satsujinka on Sat 26th Jan 2013 09:13 in reply to "RE[11]: Comment by Laurence"

Any of the tricks a system allocator is capable of is also possible in a GCed language. It's not like the algorithms are only available to the system allocator (and if they were, the runtime could just use the system allocator).

Further, in practice it turns out that you're completely incorrect about fragmentation. GCed languages have fewer issues with fragmentation.

GC may have a base cost that's higher than manual memory management, but it can be pushed into the same range as manual memory management, even if it takes more work to do so. That's the trade-off: GC makes life easy when you don't care, but makes life more difficult when you need the performance. But sacrificing GC for some dubious performance gain is not good engineering (though choosing a language with optional GC would be a good decision if you know you'll have performance issues, if only because it gives you more options). So: write the code with GC, profile it, tune your code, repeat; if after a certain number of iterations you still can't get the performance you need, disable the GC in that section or drop down to C or assembly and rewrite your bottlenecks there.
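
As a sketch of the "disable the GC in that section" step, again assuming the Boehm collector for C, whose GC_disable()/GC_enable() calls suspend and resume collections:

    /* Suspend collection around a latency-critical section, then trigger
     * a collection at a moment of our choosing.  Assumes the Boehm GC;
     * allocation keeps working while collection is disabled, the heap
     * simply grows instead of being swept. */
    #include <gc.h>

    void run_hot_path(void)
    {
        GC_disable();          /* no collection may start in here */
        /* ... latency-critical work, ideally allocating little ... */
        GC_enable();

        GC_gcollect();         /* pay the reclamation cost now,   */
                               /* outside the critical section    */
    }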
