In an email to the GCC development mailing list, one of the main developers of LLVM revealed that his recent employment at Apple has been focused on integrating LLVM with GCC, and is now proposing a long-term merge of the two projects.
Apple Invests in LLVM, Suggests Merge with GCC
About The Author
Ex-programmer, ex-editor in chief at OSNews.com, now a visual artist/filmmaker.
2005-11-19 9:17 am rayiner
LLVM is a couple of things.
1) It is a low-level intermediate language. Its semantics are very similar to a stripped-down C, but it looks more like a very high-level portable assembler.
2) A compiler framework, consisting of libraries that take LLVM source files, apply very powerful SSA-based optimizations to them, and generate machine code.
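To make "SSA-based optimizations" a bit more concrete, here is a minimal sketch in Python. It is not LLVM's instruction set or API; the `Instr` type and both passes are made up for illustration. It only shows why single-assignment form makes constant folding and dead-code elimination simple: every name has exactly one definition, so a known value can be substituted everywhere without further analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction of a toy SSA IR (hypothetical, not LLVM's)."""
    dest: str     # SSA name, assigned exactly once
    op: str       # "const", "add", or "mul"
    args: tuple   # one int for "const", two SSA names otherwise


def fold_constants(program):
    """Constant propagation and folding over straight-line SSA code."""
    known = {}    # SSA name -> constant value
    out = []
    for ins in program:
        if ins.op == "const":
            known[ins.dest] = ins.args[0]
            out.append(ins)
        elif all(a in known for a in ins.args):
            x, y = (known[a] for a in ins.args)
            value = x + y if ins.op == "add" else x * y
            known[ins.dest] = value
            out.append(Instr(ins.dest, "const", (value,)))
        else:
            out.append(ins)   # depends on an unknown input, keep as-is
    return out


def eliminate_dead(program, live_out):
    """Drop instructions whose result is never used (backward scan)."""
    live = set(live_out)
    kept = []
    for ins in reversed(program):
        if ins.dest in live:
            kept.append(ins)
            if ins.op != "const":
                live.update(ins.args)
    return list(reversed(kept))


program = [
    Instr("a", "const", (2,)),
    Instr("b", "const", (3,)),
    Instr("c", "add", ("a", "b")),
    Instr("d", "mul", ("c", "x")),   # x is an input unknown at compile time
]
optimized = eliminate_dead(fold_constants(program), live_out={"d"})
```

After both passes only the folded constant `c` and the multiplication that depends on the unknown input `x` remain.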
The benefits to GCC are two-fold:
1) It gives GCC a much more rigorously-defined and well-documented intermediate representation.
2) It grants GCC access to a very clean optimization framework, as well as a set of modern optimizations. These vary from ones that are common in commercial compilers (many scalar SSA optimizations), to ones that are available in only the best commercial compilers (inter-procedural analysis at link time), to ones that are not yet available in commercial compilers (large-scale data structure optimizations).
I was all set to write a post on why it’s a great idea in theory but would never work, but the reaction on the mailing list seems appropriately cautious but otherwise positive. LLVM really is an excellent framework, and I very much admire how Chris Lattner is intent on making it useful not just to C programmers, but to users of many other languages. I read the discussions that happened a year or so ago that led to the tail-call optimization features in the current version of LLVM, and Chris really took the time to understand exactly what certain languages (specifically Scheme) needed with regard to tail calls and figure out how LLVM could serve those needs. The talk of integrating high-level optimizations related to dynamic typing is very intriguing, and I hope it turns out as well as the tail call stuff did (although it is a much bigger task and will require much more significant changes to LLVM). There is also LLVM’s not-insignificant advantage of having very extensive documentation and many papers on the workings of its optimizations, by virtue of its academic roots.
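For anyone unsure what the tail-call business is about, a rough sketch in Python follows. Python itself does not eliminate tail calls, so the second function is written by hand to show what a compiler that does the optimization effectively produces. The function names are invented for illustration, and this says nothing about how LLVM implements it internally.

```python
def sum_to(n, acc=0):
    # Tail call: the recursive call is the very last thing the function
    # does, so the current stack frame is no longer needed afterwards.
    if n == 0:
        return acc
    return sum_to(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    # What tail-call elimination amounts to: rebind the arguments and
    # jump back to the top instead of pushing a new stack frame.
    while True:
        if n == 0:
            return acc
        n, acc = n - 1, acc + n
```

The recursive version runs out of stack for large `n` in a language without the optimization, while the loop runs in constant stack space. Scheme programs rely on this, since loops there are normally written as tail calls.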
Edit: LOL. Forgot to change the title, and of course, OSN doesn’t allow me to. The title should be: “wow, it just might happen after all”.
Edited 2005-11-19 09:22
I guess you should try it. LLVM has the additional feature that you have optimizations which span several translation units.
It is able to produce bytecode, and it has a virtual machine with a JIT compiler which, in my experience, executes no slower than native code.
Additionally, you have the opportunity to optimize delivered code based on real-world profile data simply by optimizing the bytecode.
If a new optimizer is ready you can run it on your bytecode without compiling the whole project from scratch.
And what does this mean?
2005-11-19 9:59 am rayiner
Could you perhaps be more specific about what exactly you’re confused about? And who are you replying to, anyway?
I am the guy from the first message. First of all, thank you for the answer, and sorry for my bad English (I’m Italian).
Well, my curiosity was about a general understanding of that technology, because I’m new to this world.
your explanation is good.
This proposal would essentially mean shifting power away from gcc to the generic backend. The gcc people don’t allow this to happen. AFAIR, Sun used to have an ANSI-C backend for GCC which would’ve been very neat to get gcc output run on really odd platforms with just a silly C compiler. They tried to get it included – no go.
This situation is essentially the same. GCC folks want gcc to be _the_ platform, not a tool, and _source_ as the preferred form of portable product. Plus, now that Tree-SSA is in place, there is hardly a need for yet another optimizer framework that gcc will NOT benefit from.
The only way Apple could have success here would be to fork gcc, call it like, er, “agcs” and improve it so drastically that the mainstream is eventually forced to merge-up and include their desired backend on that route.
2005-11-19 11:03 am DevL
“The only way Apple could have success here would be to fork gcc, call it like, er, “agcs” and improve it so drastically that the mainstream is eventually forced to merge-up and include their desired backend on that route.”
On a side note, OpenBSD has rolled their own modified version of GCC for a long time now as they include ProPolice in their version. On the other hand, they’re still stuck on version 2.95.3. Then again, I heard that the GCC people now were considering including ProPolice, so there just might be a chance of seeing technological advancement get the upper hand of politics once in a while.
[Correction: They’re mostly stuck on 2.95.3. Some architectures have a more recent 3.x version of GCC.]
Edited 2005-11-19 11:08
2005-11-19 1:40 pm Anonymous
SSP is already included in gcc 4.1, which will be released early next year. And SSP is no reason to stay with an antique gcc as the OpenBSD folks do, since the patches are available for gcc3 and gcc4 too.
2005-11-19 4:00 pm Anonymous
Then again, I heard that the GCC people now were considering including ProPolice
That’s not quite correct. The GCC people initially rejected ProPolice since their review comments were not addressed by the IBM programmer(s) who worked on ProPolice. The GCC people are not reconsidering this position.
However, a new implementation is currently being written by the GCC people themselves, and it will be one that meets the GCC maintainers’ requirements. This implementation will not be called ProPolice.
2005-11-20 1:57 pm DevL
Ah, I stand corrected.
2005-11-19 12:17 pm Anonymous
Thanks for the reminder.
Sorry, but my paranoia bit has just been raised.
Reading the wikipedia one learns (or recalls, as in my case) that once a better project (egcs) “hijacked” a pre-existing one (gcc)… to the point of taking its identity.
No problem with that, that happens all the time in free|open source (remember the Linux VM saga? boy, _that_ was emblematic).
But, in this case, there’s a company behind LLVM — will Apple play nice? Is IP involved? How can we rest assured they’ll play by the FSF book?
I’m all for getting company support to free software, what I fear is being fooled. Pay attention, for this is no MS… Apple really has brains, this makes them real dangerous… or nice friends, if they’re well-intentioned.
2005-11-19 9:03 pm rayiner
Apple doesn’t own LLVM. That’s just where the lead implementor works right now, and I presume they are funding him to work on the project. The LLVM copyright is owned by the University of Illinois, and released under a BSD-like license.
Am I the only one who has a natural inclination against this?
I know it’s not really fair, but I read “merge with GCC” and I just thought “I don’t like the sound of that”…
This does seem like pretty impressive software, but I don’t like the idea of adding a virtual machine or intermediate code to my C compiler.
2005-11-19 11:34 am camel
a) GCC is more than “your C compiler” 😉
b) you already have intermediary code generation (on which the optimizations are run)
c) I did not read anything about merging the virtual machine into gcc. Why shouldn’t it remain a separate project that can work with the output of gcc?
2005-11-19 4:56 pm somebody
No, you’re not alone.
Apple has a pretty bad track record when they start to merge or use any OSS project.
The problem is that Apple is a company, and a company is always concerned with profit margins only. And that is as it should be; the world always spins in one direction only. You can’t blame them for being realistic. But what I am saying is that I would prefer to see companies like Apple stay on the closed side where they belong and not mess with public projects for their (and only their) own benefit.
It is better to have no helper than a semi-helper which you can’t control. (my own opinion)
Problems I see with Apple being connected to any FOSS general project are:
– too many closed patents (example Quicktime)
– too many closed basic projects (not talking about general Apple software, but Aqua for example; they did everything necessary to build Linux software and made zero effort to even allow the possibility of a public variant of Aqua)
– too bad track record in cooperation with OSS
(somewhat OSS is not OSS)
– too inclined to self-benefit and almost nonexistently concerned with the public (non-Apple users aren’t Apple’s concern, while gcc’s sole purpose is definitely not Apple)
2005-11-20 4:18 am dhazeghi
Apple has been trying various methods for the last 3+ years now to improve the quality of gcc codegen. Whatever their motives, they have made significant improvements to gcc (for instance Precompiled-Headers).
As for corporate sponsorship being evil, who do you think does most of the work on gcc? Employees of corporations like Apple (RedHat, IBM, Arm, AdaCore). They haven’t wrecked gcc yet, indeed, I think most people would say that the project worked rather well.
2005-11-19 9:00 pm rayiner
There is already intermediate code in almost every non-trivial compiler. In GCC, it’s GIMPLE and RTL. GIMPLE is what the high-level optimizations are run on, and RTL is what the low-level optimizations run on. LLVM is simply a more well-defined intermediate representation that has a lot of code written for optimizing and manipulating it. It still gets turned into native code (unless you target the JIT, of course) in the end.
I encountered this project some time ago and now that I see it again, I might try it out. It seems interesting. It would be nice to be able to run programs jit and compiled.
Compilers (normally) have intermediate code. As far as I know, it reduces the amount of code that has to be written (one translation from each language to the intermediate code, and one from the intermediate code to each machine code).
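The saving is easy to put in numbers. A small sketch follows; the function name and the example counts are only illustrative and are not taken from GCC or LLVM.

```python
def translators_needed(num_languages, num_targets, shared_ir):
    """Count the translators a compiler suite has to implement."""
    if shared_ir:
        # One front end per language plus one back end per machine.
        return num_languages + num_targets
    # Without a common intermediate code, every pair needs its own compiler.
    return num_languages * num_targets


# Hypothetical suite: 4 source languages, 6 target machines.
direct = translators_needed(4, 6, shared_ir=False)
with_ir = translators_needed(4, 6, shared_ir=True)
```

With these example numbers that is 24 separate compilers versus 10 components, and the gap widens with every language or machine added.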
<quote>This does seem like pretty impressive software, but I don’t like the idea of adding a virtual machine or intermediate code to my C compiler.</quote>
As far as I have tested the virtual machine it is not slower than native code, besides a small startup time.
With C/C++ you have the advantages of a virtual machine (maybe like removing the indirection of virtual calls) combined with a language which has no garbage collection by default. You get a fast system without the memory overhead like with Java.
Apple is more inclined to play nice about this than the FSF. They really dislike changing the architecture of gcc from its “oh, it’s an enormous black stone monolith” version to something more modular. But I have hope, because LLVM is really cool.
Chris himself stated all the reasons why I personally think this won’t fly, even if I hope I’m wrong:
* LLVM is BSD-licensed, and the FSF will want copyright assignment right-away.
* LLVM is written in C++ (as Chris stated, aw yuck) — I LOVE C++ but the FSF LOATHES it.
2005-11-20 6:39 pm Anonymous
“Chris himself stated all the reasons why I personally think this won’t fly“
And he also stated why it could ‘fly’ despite those reasons.
“* LLVM is BSD-licensed, and the FSF will want copyright assignment right-away.“
Chris believes: “If people are seriously in favor of LLVM being a long-term part of GCC, […] that the LLVM community would agree to assign the copyright of LLVM itself to the FSF and we can work through these details.“
LLVM may be BSD licensed now and relicensing with the GPL might not go through, but that shouldn’t stop developers from reviewing the option for its technical merits now.
“* LLVM is written in C++ (as Chris stated, aw yuck) — I LOVE C++ but the FSF LOATHES it.“
I have no reason to believe or not believe whether the FSF loathes or even dislikes C++. Chris states: “the C++ness only exists in the LLVM portions of the resultant compiler, no other parts have to be converted to C++ (well, except for main). […] While I don’t expect everyone to like use of C++, others will hopefully respect that it allows us to build good APIs, increase modularity, and get more done in less time.“
I don’t think Chris meant the FSF. I also don’t think that the developers (as a whole) working on this compiler collection have any particular bias towards a particular language (although individual developers might). Furthermore,
states: “The steering committee was founded in 1998 with the intent of preventing any particular individual, group or organization from getting control over the project. […] committee members were chosen to represent the interests of communities (e.g. Fortran users, embedded systems developers, kernel hackers)“
Would this allow the easy generation of stack-based bytecode like JVM and .NET?
That would be cool.
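As a rough idea of what going from register-style intermediate code to stack-based bytecode involves, here is a toy sketch in Python. The tuple format and the instruction names (`load`, `store`, `add`, ...) are invented for illustration; they are neither LLVM's representation nor real JVM or .NET opcodes, and the sketch does not answer whether LLVM itself can do this easily.

```python
import operator

# Arithmetic operations supported by the toy stack machine.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def to_stack_code(three_address):
    """Lower (dest, op, lhs, rhs) tuples to stack-machine instructions."""
    code = []
    for dest, op, lhs, rhs in three_address:
        # Push both operands, apply the operation, pop the result into dest.
        code += [("load", lhs), ("load", rhs), (op,), ("store", dest)]
    return code


def run_stack_code(code, variables):
    """Interpret the stack code and return the final variable bindings."""
    env = dict(variables)
    stack = []
    for ins in code:
        if ins[0] == "load":
            operand = ins[1]
            # Integer operands are literals, strings are variable names.
            stack.append(operand if isinstance(operand, int) else env[operand])
        elif ins[0] == "store":
            env[ins[1]] = stack.pop()
        else:
            rhs = stack.pop()
            lhs = stack.pop()
            stack.append(OPS[ins[0]](lhs, rhs))
    return env


# r = (x + 2) * y - 1, written as three-address code
ir = [("t1", "add", "x", 2), ("t2", "mul", "t1", "y"), ("r", "sub", "t2", 1)]
result = run_stack_code(to_stack_code(ir), {"x": 3, "y": 4})
```

This naive lowering stores every temporary back to a variable; a real back end would try to keep values on the stack instead.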
I’ve seen the idea implemented in the DEC VMS world as an intermediate language. It doesn’t support a dynamic compiler. Obviously, it bases itself on languages that already exist. However, compilations have the same restrictions as LLVM. It’s a step.
Don’t the open-source licences between LLVM and GCC clash? http://llvm.cs.uiuc.edu/releases/1.6/LICENSE.TXT is the licence for LLVM while GCC uses a modified GPL.
I think LLVM is a wonderful idea for a virtual machine. It already supports GCC as a frontend, and I think a Java frontend is in the works. PyPy gives Python support to LLVM. So why does LLVM need to be part of GCC?
2005-11-20 11:12 pm Anonymous
“Don’t the open-source licences between LLVM and GCC clash?“
According to http://www.fsf.org/licensing/licenses/index_html#ModifiedBSD they are compatible. What might be a problem is the GCC mission statement http://gcc.gnu.org/gccmission.html :
“– Compilers are available under the terms of the GPL.
– Copyrights for the compilers are to be held by the FSF.
– Other components (runtime libraries, testsuites, etc) will be available under various free licenses with copyrights being held by individual authors or the FSF.“
It’ll depend on what encompasses a ‘compiler’ and the willingness of University of Illinois at Urbana-Champaign to release copyright and let LLVM be relicensed with the GPL. Perhaps a fork, copyrighted by the FSF, licensed under the GPL, can be negotiated/arranged.
2005-11-20 11:34 pm Anonymous
P.S. Perhaps the ‘individual authors’ can get back their copyright and depending on what ‘compiler’ means in the GCC mission statement, there is no problem. LLVM stays “under a much less restrictive license” than the GPL, GCC can use LLVM in the official branch while keeping in line with their mission statement and there is no need for a fork.
I’ve been tracking LLVM for quite awhile as I have high hopes that it will eventually morph into a replacement for Microsoft’s CLR. With that in mind, I worked on making LLVM compile Qt4 last year. It seems with this news, that is going to happen and it looks like Chris is thinking more and more about creating a true alternative to the CLR.
For all of you wondering what this news might bring…? Here is what I think:
One thing that people are easily and often confused about is the “VM” in LLVM’s name. LLVM isn’t a VM like a JVM or C# VM, it’s a VM in the sense of being a virtual/abstract code representation. Almost all compilers use some sort of abstract code representation during code generation, this is just the name of LLVM’s.
The current integration work does not have anything to do with JIT compilation, garbage collection, etc (though LLVM does support those). Instead it’s just a matter of building a faster GCC that produces better code: not fundamentally changing GCC’s user experience.
One important aspect of LLVM’s design is that it is actually built as a collection of libraries for different pieces of code generation and optimization. This allows a compiler like GCC to use the cross-file optimization components and other optimizers, while allowing compilers for more dynamic languages to use JIT compilation and GC if they want.
Again, to reiterate, the goal is to improve GCC (both compile time and the quality of the code generated) not to fundamentally change the way it works.
Create http://gcc.opendarwin.org and create a fork; it’s the best solution for the current situation. Let the GCC coders play their game of ‘whine about the company’; the alternative can be seen as the one willing to take on contributions, regardless of the origin.
Apple has always tried to improve GCC, optimizing things for PowerPC for instance, just because they use it. And now that they’re moving to an Intel target and still have to support PowerPC in the meantime, they’ll try to improve GCC overall.
It’s their interest to improve GCC because they need it, so I don’t think it will hurt. Anyway, time will tell.
What code speedup can we expect after the merge? I hope LLVM makes my box > 50% faster and that gcc will compare to modern icc.
Eugenia, I suppose you’re new here, but this is OSNews you know… Just look at the damn numbers… Those interesting and technical news items are not worth the hassle. You’ll hardly generate 25 page views, and therefore not much profit… I suggest you stick to provocative flamebait and open trolling fests. A news item that doesn’t oppose at least 2 overly zealous groups is never gonna make you rich. The keywords are ‘Linux’, ‘Apple’ and ‘Microsoft’.
Remember that please. Serious readers have already moved far away from your site. Feed your trolls damn it!
I haven’t understood this very well: is LLVM a precompiler for gcc, used to get better optimization?
Does it actually work today (also on PowerPC Apple machines)?