Modern languages such as Go, Julia and Rust don't need complex garbage collectors like the ones used by Java and C#. But why?
To explain why, we need to get into how garbage collectors work and how different languages allocate memory. However, we will start by looking at why Java in particular needs such a complex garbage collector.
Good info on how Go deals with memory versus how Java handles it. The most interesting rabbit hole is the mention of research work around memory allocators.
These discussions always focus on technicalities and miss the big picture. Without automated memory management (GC, or to a lesser degree borrowing in Rust) built into the language itself, there are no rich APIs; ergo, the language is only suitable for low-level programming.
We can work on complex data structures all day long in our own code, but once any interfaces are exposed to the external world they have to fall back to primitive data types or limit the interaction (manually via docs, or automatically via borrowing) to very simplified life-cycle scenarios. In practice this means that without a GC it is not possible to outsource anything that operates on the main data model directly.
> Without automated memory management built into the language itself, there are no rich APIs
I disagree. Rust's borrowing is similar to a non-const reference in C++. And I don't see how having neither a GC nor borrowing would cripple the ability to pass complex structures through an API.
Not knowing who owns what can be problematic and is often seen as a bad code smell. But reference counting can replace the need for a GC, and can be considerably more efficient. That's what iOS uses, and it doesn't need or use a GC.
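For what it's worth, here is a minimal sketch of the reference-counting idea, written in Go purely for illustration (the type and method names are made up); in a GC-less language the release step is where the memory would actually be freed:

```go
package main

import "fmt"

// refCounted is a hypothetical manually reference-counted object.
// Not thread-safe; a real implementation (like iOS's ARC) uses atomic counts.
type refCounted struct {
	count int
	data  []byte
}

func (r *refCounted) retain() { r.count++ }

func (r *refCounted) release() {
	r.count--
	if r.count == 0 {
		// In C, C++ or Swift this is the point where the payload is freed,
		// deterministically, with no collector involved.
		fmt.Println("last reference dropped, freeing payload")
		r.data = nil
	}
}

func main() {
	obj := &refCounted{count: 1, data: make([]byte, 1024)} // creator holds one reference
	obj.retain()  // a second owner appears
	obj.release() // first owner is done
	obj.release() // last owner is done -> payload is released
}
```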
If your point is that you are happy with lower-level APIs (Qt is a good example of what can realistically be achieved in C++-like languages) then I fully agree with you. I'm not saying this doesn't work, just that the lack of a GC lowers the abstraction level of public interfaces. You end up with a relatively high-level language you can use for your own code and a dumbed-down version for APIs. It doesn't take much to hit a limit: imagine you have a DOM-like structure and you want to pass it through a processing pipeline (filter, map, reduce style), as in the sketch below. Without a GC, you would either implement it all yourself, copy the whole model and pass it by value, or only outsource operations on primitive/leaf objects.
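For contrast, here is a rough Go sketch (all types invented for illustration) of how trivially such a pipeline composes when a GC is present: stages can share subtrees freely because nobody has to decide who frees them.

```go
package main

import "fmt"

// node is a toy stand-in for a DOM node.
type node struct {
	tag      string
	children []*node
}

// filter collects the nodes that satisfy keep; the returned nodes are shared
// with the original tree rather than copied, and no ownership is tracked.
func filter(n *node, keep func(*node) bool) []*node {
	var out []*node
	if keep(n) {
		out = append(out, n)
	}
	for _, c := range n.children {
		out = append(out, filter(c, keep)...)
	}
	return out
}

func main() {
	doc := &node{tag: "html", children: []*node{
		{tag: "body", children: []*node{{tag: "p"}, {tag: "p"}}},
	}}
	paras := filter(doc, func(n *node) bool { return n.tag == "p" })
	fmt.Println("paragraphs:", len(paras))
	// Neither doc nor paras needs a designated owner; whatever becomes
	// unreachable is collected by the runtime.
}
```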
By the way, I would classify reference counting as a GC, just not a very good one. I’d prefer a more reliable GC, built into the runtime (so all libraries use the same one) and, yes, it should perform better too.
It is also important that a language has good tooling, API and framework support. The problem with Go, Rust, etc., is that you have to resort to some archaic setup and often roll your own libraries or frameworks. Even something as eccentric as F# has far better support for data loading/import and cleaning, giving it a far larger use case for practical business applications. Ironically, the often-touted feature of manual memory management, or simple things like RAII, eventually gets replaced with poor man's implementations in libraries that technically try to do the same thing as a GC.
Memory fragmentation due to recurring allocations and deallocations is still an issue in modern large-scale applications. The overly simplistic examples the author provides to compare against something complex like JGit are mind-boggling. Memory fragmentation in operating systems and systems languages (like C/C++) is still a real problem to this day; a simple search on Stack Overflow turns up plenty of recent examples. Worse, using outdated HotSpot (Java) and Framework (.NET) runtimes as modern comparisons is outright misleading. Try OpenJ9, GraalVM (Java) and .NET Core for a true modern runtime. The fact that the author conflates C# with .NET further suggests limited familiarity with newer features such as tail-call optimization in recent runtimes.
If I had to pick a systems programming language, I'd choose better options such as Rust or even C++. At least Rust is a genuinely modern language with support for modern paradigms, though it lacks the maturity of C++ frameworks and libraries. Languages like Go are meant for easy learning and use, but they fall apart when you need to be concise and expressive. For example, if you love writing plenty of boilerplate code, then Go is the language for you.
Through the years, I have had some interns explore Rust and Go.
At least for our needs, neither seems to offer any clear value proposition to move us from our C++/Python flows.
I have really tried to like Rust. But it just infuriates me, because it doesn’t seem to make things any easier than C++ and the syntax ain’t that much better. Any positive experiences in either language (Go/Rust/etc)?
I really wish visual programming languages would come back. If we're going to go high level, let's go all the way. It's just infuriating how far behind the curve the language research field is (with respect to other areas in computing), given the amount of attention it gets. (Salty EE/CE farting in.)
There are many engineering reasons for choosing a language. A new "better" one is rarely the better choice.
Syntax, to me, is in the 3rd place, after engineering suitability and technical features. It falls under ergonomics – definitely nice to have but less important than the first two.
Visual programming failed because at scale it is not at all ergonomic and sacrifices more important features, like version control.
I'm at the opposite end of the scale. I have been burned enough that I consider syntax the top consideration.
If I have to spend a whole day figuring out what someone's old line of code is supposed to do, it turns out it would have been easier and faster to implement that functionality myself.
We couldn't find enough of a value proposition for Rust, even though our project literally checked a lot of the boxes the language claims as pros.
I think ndrw is saying that without a GC the data structure has to be predetermined ahead of time. For instance, it's easy to ingest random data in dynamic languages because they can allocate memory on the fly and clean up later. This is why middleware is written in languages like Lua, Java, or Python rather than C, C++, or Go.
This has been my experience anyway. As much as I dislike deploying interpreted languages over compiled languages, they have their uses since they are usually dynamic.
The article is published behind a walled garden, so it isn't readable.
You can read it if you haven't read other Medium articles this month.
I know. I just can’t be assed to clear my cookies or use a VPN.
This article left out how many languages (specifically the functional ones) solve the problem of concurrency: by sharing nothing among threads. With that, you only need to pause one thread if you want to collect garbage. I read some years ago that the Eiffel language uses one garbage collector per thread.
Yup, Erlang is another such example, with each process (akin to goroutines, but with more features and used more pervasively) having its own heap and stack and doing its own GC, making the GC's work trivial (if it has to run at all; many Erlang processes are too short-lived for the GC to kick in).
And that solves another problem: if the processes are short-lived, they don't end up with fragmented memory.
Well, Erlang at the end of the day is basically a glorified phone switch simulator, so it’s not a good example of a useful general purpose language. J/K
F# on .NET is another good example. It also has great support for data science applications, where using something like Go or even Rust would be impractical. Imagine doing memory allocation for all your datasets, all the time.
adkilla,
With high level abstractions, there’s no need to micromanage every allocation. High level constructs and objects can manage themselves. So even for scientific applications, I don’t think programmers will be bogged down with memory management issues…
Personally I prefer languages with explicit allocation/deallocation that don't depend on a garbage collection process at runtime. The safety aspects of GC languages were one of the most compelling reasons to switch to them, but now that static code verification is making large inroads against human error, code safety is no longer exclusive to GC/managed languages.
I'm sure there are plenty of engineers and scientists who are going to stick with what they've got and don't care about hopping on the latest and greatest bandwagon. GC's main shortcomings are realtime applications (jitter) and higher memory usage in general, but many scientific applications aren't terribly affected by that. So they should use whatever works for them 🙂
Alfman nailed it.
Reading the article, I notice lots of mostly one-directional critiques, which makes it more of a political or personal-preference document than a technical discussion.
If you intend to discuss performance, give me the performance of both options under general-purpose computing, not some select case. I don't care about clock-cycle counts; what I care about is the difference in real-world use.
Java vs Go: I quite like Rust, and I've toyed with F#, but for most of my data analysis or report work there isn't a chance in hell I'm going to personally write libraries in Rust or Go to replace the massive resources C, C++ or Java already provide me. Even then, why bother? What's the point, when most of this stuff isn't real time and not even close to needing fine resolution?
If jitter is a concern, by all means use low-level languages. Higher-level abstractions (functional programming, OO) depend on infinite memory, and GC is one of the ways of delivering it. Other ways are actually having "infinite" memory and never deallocating anything (OK for short-lived programs), or doing memory handling manually and using only low-level language features when memory handling becomes too difficult.
I don’t understand where the claim that OO depends on “infinite” memory comes from (or functional programming for that matter).
Both models require memory allocation, expect it to never fail and, by themselves, provide no means of memory cleanup or defragmentation. Some implementations choose to use simpler (than a GC) solutions to deliver on that abstraction, with varying levels of success. C++, for example, has objects but I wouldn't classify it as an OO language – it is not possible, in general, to construct programs as a group of objects communicating with each other using objects.
If you need a reference, I remember SICP using that concept.
ndrw,
That's more of a Linux thing, where it over-commits memory and grants allocations that will ultimately go on to fail when used. This doesn't have much to do with functional and OOP languages, though. Operating systems like DOS returned exceptions and NULLs at the time of allocation failure rather than deferring failure to the point of use. The result is that programs won't fail due to NULL allocations and out-of-memory exceptions; instead they tend to fail when the Linux "out of memory killer" kills them. I am not a big fan of this approach because it punishes well-behaved applications (and languages), but it's what we end up with on operating systems that overcommit memory.
Fragmentation is a valid point; however, in my experience GC languages are often less memory efficient anyway.
@ ndrw
That's where I'm having trouble understanding your point. Neither OO nor functional models make any assumption about memory. Some OO/functional languages make things like memory management more explicit than others, but the programming model itself is not tied to either approach (explicit allocation/deallocation vs GC).
I thought the original OO approach was for objects to communicate via messages (unless we consider the message a type of object).
It seems abstraction has always been a difficult balancing act.
@javiercero1
Yes. Originally, as described by Alan Kay, OO meant discrete services passing messages, and an object could be anything on the other end.
Privsep is a lot more like the original intent of OO than OO is today.
@javiercero1
Both the OO and functional programming paradigms simply assume memory is available, just like electricity or processing power. In practice, there are only two ways of assuring memory is there when it is needed: (1) supplying enough RAM to last the lifetime of the program, or (2) using a GC to automatically recoup unused memory (this still requires enough RAM for active data). Anything less boils down to adding restrictions on what you can and cannot do with the language, which have nothing to do with the original paradigm: things like documenting or checking who should destruct an object and when, falling back to primitive types, passing data as values, etc. Not having a GC is often pictured as an inconvenience, but the real issue is that it dumbs down APIs.
ndrw,
Assuming infinite memory is available is not a prerequisite for either OO or functional programming. Granted, some programs aren't able to handle out-of-memory conditions; trivial programs often just fail because there's nothing better to do. But if you've got something more critical, like a database with many users and perhaps tens of gigabytes of data in RAM, you probably want to look into handling out-of-memory conditions rather than just exiting. You can flush data to disk and purge some of the cache, for example.
The reason checking for memory errors doesn't work on Linux by default has nothing to do with OO or functional programming; it has to do with Linux over-committing memory and returning successful allocations even when the memory is not available. Linux defers allocations. Something like malloc(4GB) can succeed many times without the kernel checking that there's enough free memory to actually back the request. Later, when the memory needs to be written to, that's when Linux decides to map the address space to physical RAM, and that's when Linux applications tend to fail. But just to be clear, this is a byproduct of Linux's deferred memory allocation strategy and not of functional or OO programming in general. In non-over-committing scenarios it is perfectly normal for programs to handle out-of-memory conditions at the point an allocation fails, without assuming that all heap/stack allocations will succeed. Many languages have provisions for handling OOM appropriately…
https://docs.microsoft.com/en-us/dotnet/api/system.outofmemoryexception?view=net-6.0
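As a hedged, Linux-only illustration of that deferred behaviour (the sizes are arbitrary and the program is a made-up demo, not anything from the article), an anonymous mmap from Go shows the same pattern as the malloc(4GB) example: the "allocation" succeeds up front, and the failure only appears once the pages are actually touched.

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	const size = 64 << 30 // 64 GiB, far more than most machines have

	// The "allocation" itself usually succeeds under default overcommit:
	// no physical pages are reserved yet.
	mem, err := syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_PRIVATE|syscall.MAP_ANONYMOUS)
	if err != nil {
		fmt.Println("mmap failed up front:", err) // rare with default overcommit settings
		return
	}
	fmt.Println("mmap of 64 GiB succeeded")

	// Touching the pages is what forces Linux to back them with RAM/swap.
	// On a machine without 64 GiB free, this loop is likely to end with the
	// process being killed by the OOM killer, not with an error we can handle.
	for i := 0; i < len(mem); i += 4096 {
		mem[i] = 1
	}
}
```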
Yes, for sufficiently small or short-lived programs you can get away with never freeing anything and never garbage collecting. But too much looping can cause memory usage to get out of hand, so it's not always appropriate outside of trivial "run and done" programs.
Something similar to what you are suggesting that worked for long-lived processes was the mark/release functionality of Turbo Pascal, where you could "mark" the heap pointer, run some algorithm that consumes memory, and then release memory back to the mark to effectively reset the heap state after a computation completes. This was extremely efficient. Today you can still achieve this kind of use-and-reset pattern using custom allocators; even fork/wait has a similar effect, albeit with more overhead.
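A minimal sketch of that use-and-reset pattern in today's GC'd Go (names invented for illustration): a scratch slice plays the role of the marked heap region, and truncating it to zero length is the "release".

```go
package main

import "fmt"

// scratch is reused across calls; resetting it to length 0 "releases" all the
// temporary values at once while keeping the backing memory for the next run.
var scratch []int

func sumOfSquares(n int) int {
	scratch = scratch[:0] // "release" back to the mark
	for i := 1; i <= n; i++ {
		scratch = append(scratch, i*i) // temporary working data
	}
	total := 0
	for _, v := range scratch {
		total += v
	}
	return total
}

func main() {
	for i := 0; i < 3; i++ {
		// After the first call the backing array is already big enough,
		// so later calls allocate nothing new.
		fmt.Println(sumOfSquares(1000))
	}
}
```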
Alfman,
It doesn’t matter how the illusion of infinite memory is maintained, as long as it works and is transparent to the language.
What you are talking about is dealing with OOM conditions – that is an important aspect of software engineering (I have complained about Linux OOM behaviour here myself), but when OOM happens the programming paradigm is no longer relevant; we are just trying to fail safe.
What I was saying from the beginning of this thread is that the solution for maintaining the illusion of infinite memory has to be transparent to the programmer. That is, we shouldn't have to drop OO or functional programming techniques only because this would make reclaiming memory by hand too difficult. As of now (aside from never reclaiming memory at all) there is only one solution that meets this criterion: garbage collection. GC was invented 60 years ago for exactly this purpose – maintaining the functional paradigm in LISP rather than making it look and feel like Fortran with parentheses.
ndrw,
I had trouble following what you're trying to say about the "illusion of infinite memory" from the beginning of the thread. Realistic programs don't need infinite memory, only sufficient memory. Many programmers explicitly recognize that memory is limited (this limit can be made explicit in software specs). They can also write software to degrade gracefully, like games evicting resources for distant objects on the fly.
I'd like to add that having a GC doesn't necessarily free authors from the need to do their own memory management. A game written in a GC language still needs a way to release referenced-yet-unused resources, otherwise the memory footprint can keep growing until there's no memory left for new allocations.
Kerbal Space Program is an example of such a game, written with garbage collection in .NET. Despite this, it has leaks that cause memory consumption to keep growing with use. The GC is not at fault: the game's programmers fail to release resources, and the GC sees all the references as active, so it's unable to free them. The result is that the game needs to be completely restarted periodically to clean out the memory…
https://steamcommunity.com/app/220200/discussions/0/1735468061758539756/
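A minimal Go sketch of that failure mode (all names are made up; this is not KSP's actual code): as long as a cache keeps a reference, the GC must treat the object as live, so forgetting to prune the cache grows the footprint indefinitely.

```go
package main

import "fmt"

type texture struct {
	pixels []byte
}

// cache keeps every texture ever loaded reachable, so the GC can never free them.
var cache = map[string]*texture{}

func load(name string) *texture {
	if t, ok := cache[name]; ok {
		return t
	}
	t := &texture{pixels: make([]byte, 1<<20)} // 1 MiB per entry
	cache[name] = t
	return t
}

// release makes the entry unreachable so the GC can actually reclaim it.
// Forgetting to call something like this is what makes the footprint grow.
func release(name string) {
	delete(cache, name)
}

func main() {
	for i := 0; i < 1000; i++ {
		load(fmt.Sprintf("level-%d", i))
		// Without a matching release(...) here, all ~1000 MiB stay live forever,
		// even though the program never looks at old levels again.
	}
	fmt.Println("textures still held:", len(cache))
}
```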
Go manages memory allocations for the programmer. It has a C-like syntax with a GC. The Go experience is closer to that of Java, C#, Python, or Ruby in that the programmer doesn't need to think about memory allocations.
Now, defining structs for all of the data sets would get a little tedious. Malleable data structures are what make Python popular in the data science community.
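A small sketch of that trade-off using Go's standard encoding/json (the sample data is invented): the typed route needs the struct defined ahead of time, while the map[string]any route behaves more like the malleable structures of Python or Lua.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Typed approach: the shape of the data must be decided ahead of time.
type record struct {
	Name  string  `json:"name"`
	Score float64 `json:"score"`
}

func main() {
	raw := []byte(`{"name":"sample","score":4.2,"extra":{"tags":["a","b"]}}`)

	var r record
	_ = json.Unmarshal(raw, &r)
	fmt.Println("typed:", r.Name, r.Score) // unknown fields like "extra" are silently dropped

	// Dynamic approach: everything lands in nested maps/slices, closer to how
	// Python or Lua would ingest arbitrary data, at the cost of type safety.
	var m map[string]any
	_ = json.Unmarshal(raw, &m)
	fmt.Println("dynamic:", m["extra"])
}
```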
Yeah, people have limited time, and Go competes more with Java and C# than with OCaml or Haskell.
Shared-nothing between threads/goroutines is easy in Go. It has channels for IPC, and by default everything is passed by value.
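A minimal sketch of that shared-nothing style (hypothetical worker/job names): the goroutine only ever sees the copy it receives on the channel, so there is nothing to lock.

```go
package main

import "fmt"

// job is copied in its entirety when sent over the channel
// (arrays, unlike slices, are fully copied by value).
type job struct {
	id    int
	input [3]int
}

// worker owns every job value it receives; no state is shared with main.
func worker(in <-chan job, out chan<- int) {
	for j := range in {
		sum := 0
		for _, v := range j.input {
			sum += v
		}
		out <- sum
	}
}

func main() {
	in := make(chan job)
	out := make(chan int)
	go worker(in, out)

	in <- job{id: 1, input: [3]int{1, 2, 3}}
	fmt.Println(<-out) // 6
	close(in)
}
```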
That still doesn't change the fact that Go uses memory fundamentally differently than Java does.