Dave Miller Boots Linux Kernel on Sun’s Niagara

Thom Holwerda 2006-02-18 Linux 32 Comments

The Linux kernel has booted on top of the sun4v hypervisor on Sun’s new Niagara CPU (just the kernel, there was no root filesystem). Sun has been very active in trying to get Linux support for their Niagara.

32 Comments

2006-02-18 6:02 pm
agentj
Yummy. 32 CPUS. I wish I had that stuff at home. Worth porting Quake ]I[ Arena

2006-02-18 6:42 pm
Get a Life
It has eight cores, each with four “threads”. Even if it had 32 actual processors, it would still be a really bad target for Quake 3.

2006-02-18 8:19 pm
_LH_
With 32 CPU’s, let them be virtual or not, that machine should still be a real killer for 100+ user shell server.
Edited 2006-02-18 20:26

2006-02-18 8:33 pm
Get a Life
What exactly do you mean by “shell server?”

2006-02-18 8:54 pm
_LH_
Something most people don’t even tend to recognise these days: a Unix box providing shell access with compilers and email programs and having tens or hundreds of simultaneous users.
2006-02-18 9:08 pm
Get a Life
I feared you meant some best.com-like service. No, the T1 wouldn’t be especially adept at this. It’s more useful if you want to obtain some pretty high SPECweb2005 scores. If you want to provide 150 people with accounts from which to run irssi go dig out an old Pentium Pro. If you want a general-purpose processor the T1 isn’t it.
2006-02-19 11:12 pm
vegburner
_LH_:
“shell access with compilers and email programs and having tens or hundreds of simultaneous users”
Get a Life:
“If you want to provide 150 people with accounts from which to run irssi go dig out an old Pentium Pro”
An old PPro is nice until those 150 people start using make and gcc. Then it’s hell for everyone. And that’s exactly what a shell server is: hellish, if it hasn’t got enough power to cope with the load.
And in that role, I bet Niagara is a very good choice…
2006-02-20 12:14 pm
Get a Life
No. The T1 does not excel at general-purpose integer computation. That is not its design space. And in fact if you intend to provide a build system for large projects for 100 people you will purchase a build cluster (not of T1s) and set up a job queue. If you’re just compiling small programs then an old Pentium Pro will do just fine. The T1 is a single-issue processor, meaning only one thread on any one core executes at a time. The other threads in Sun’s CMT exist to mitigate stall scenarios, which are going to happen quite often in certain workload/software configurations or often in general in the case of the T1 because of its lack of speculative execution and reordering. The design does not favor typical compilation scenarios. That’s one bet I’ll be glad to take.
2006-02-20 1:55 pm
JonAnderson
“The other threads in Sun’s CMT exist to mitigate stall scenarios, which are going to happen quite often in certain workload/software configurations or often in general in the case of the T1 because of its lack of speculative execution and reordering.”
T1 switches hardware strands every cycle if there are other strands in a runnable state. If a strand is not runnable, its cycle is given up to another strand that is. This is what gives it very high integer throughput but lesser performance on a single-threaded workload. As long as you have got something else for it to do, it is a very efficient design.
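A minimal sketch of the per-cycle strand switching described above, for a single core. The round-robin selection order, the function name `simulate_core`, and the stall pattern are assumptions made for illustration; nothing here is taken from Sun's documentation, and the real hardware's selection policy may differ.

```python
def simulate_core(stall_cycles, total_cycles):
    """Simulate one single-issue core that picks one runnable strand per cycle.

    stall_cycles: dict mapping strand id -> set of cycle numbers on which
        that strand is stalled (hypothetical input, e.g. waiting on memory).
    Returns a list with the strand id issued on each cycle, or None when
    no strand was runnable and the cycle was wasted.
    """
    strands = sorted(stall_cycles)
    issued = []
    next_idx = 0  # round-robin pointer (assumed policy)
    for cycle in range(total_cycles):
        chosen = None
        # Scan strands starting at the round-robin pointer and take the
        # first one that is not stalled on this cycle.
        for offset in range(len(strands)):
            idx = (next_idx + offset) % len(strands)
            candidate = strands[idx]
            if cycle not in stall_cycles[candidate]:
                chosen = candidate
                next_idx = (idx + 1) % len(strands)
                break
        issued.append(chosen)
    return issued


# Four strands; strand 1 stalls on cycles 1-3 and the others fill its slots.
four_strands = simulate_core({0: set(), 1: {1, 2, 3}, 2: set(), 3: set()}, 8)

# One strand with the same kind of stall: nothing can hide it.
one_strand = simulate_core({0: {1, 2}}, 4)
```

With four strands the pipeline issues on every cycle even though one strand is stalled; with a single strand the stalled cycles are simply lost, which is the single-threaded penalty mentioned above.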
2006-02-20 5:12 pm
Get a Life
It’s capable at transaction processing, but it isn’t meant for and isn’t adept at being the center of a build cluster. And while the threads in the design can mitigate the performance impact of other aspects of its design, they don’t afford scalability. That is why going around pretending that this is a 32-processor computer is fundamentally wrong. It’s less accurate than suggesting a P4 with hyper-threading has two processors, because Intel’s SMT implementation at least affords actual concurrency.
2006-02-20 5:42 pm
JonAnderson
I quite agree with your points. I am not sure who is going around pretending that it’s a 32-processor computer though.
2006-02-20 5:50 pm
Get a Life
Please look at the start of the thread:
“Yummy. 32 CPUS. I wish I had that stuff at home. Worth porting Quake ]I[ Arena”
“With 32 CPU’s, let them be virtual or not, that machine should still be a real killer for 100+ user shell server.”
It is common here (and in other places where people aren’t as technically oriented as their hobbies might point to) for people to mistakenly assume that the Niagara is something that it isn’t. They see that it has “8 cores” or “4 threads per core” and they start thinking that it’s just better at everything than any other processor with fewer cores. The same exact thing happens when the Cell is mentioned. Sun provides a lot of solutions for different problem domains, and it’s best for the consumer to purchase the right solutions for the right problems. It’s probably also best for these companies if people clearly understand their strengths and weaknesses, but unfortunately that is not the case.
2006-02-21 9:12 am
JonAnderson
Sorry, I didn’t realize that anybody was taking that post seriously as it seemed very much tongue in cheek to me. T1 does have 4 hardware threads per core. These are presented to the operating system (Solaris) as 32 virtual CPUs which are used by the dispatcher to run software threads. As you correctly point out, there is only one instruction pipeline per core so the maximum number of threads which can be executing concurrently is eight.
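The arithmetic above can be sketched as follows. The 8 cores, 4 strands per core and one pipeline per core come from this thread; the virtual CPU numbering (core-major, strand-minor) and the function names are purely hypothetical and are not how Solaris or the hypervisor necessarily enumerates them.

```python
CORES = 8             # from the thread: eight cores
STRANDS_PER_CORE = 4  # from the thread: four hardware threads per core


def virtual_cpu_map(cores=CORES, strands_per_core=STRANDS_PER_CORE):
    """Map a virtual CPU id to (core, strand).

    The numbering scheme is an assumption for illustration only.
    """
    return {
        core * strands_per_core + strand: (core, strand)
        for core in range(cores)
        for strand in range(strands_per_core)
    }


def max_concurrent_strands(cores=CORES, pipelines_per_core=1):
    """Upper bound on strands issuing in the same cycle."""
    return cores * pipelines_per_core


vcpus = virtual_cpu_map()
print(len(vcpus), "virtual CPUs,", max_concurrent_strands(), "issuing per cycle")
```

So the operating system schedules onto 32 virtual CPUs, but on any given cycle at most eight of them are actually issuing an instruction.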

2006-02-18 9:13 pm
CaptainPinko
“Worth porting Quake ]I[ Arena”
True, provided you could heavily thread the application and remove damn-near-all of the floating-point code. The 8 cores share one floating-point unit IIRC. However, with advances in gfx cards and the upcoming physix(sp?) card that is not completely unfeasible.
Frankly, I think the most exciting prospect would be if a ray-tracer could be multi-threaded (should be “embarrassingly parallel”) and rely on only integer code (I have no idea how this demand would work in practice). That would make Sun’s Niagara one hell of a render farm in a box.

2006-02-18 10:11 pm
Thom Holwerda
“The 8 cores share one floating-point unit IIRC.”
This will be ‘fixed’ in Niagara 2. Niagara 2 will have one fpu per core.

2006-02-19 8:36 am
CaptainPinko
“This will be ‘fixed’ in Niagara 2. Niagara 2 will have one fpu per core.”
I believe you are referring to “Rock”, and IIRC that fix comes at the cost of reducing the number of cores.
Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps? Webservers, databases, file servers don’t need it. I’m willing to bet that compilers would sooner benefit from more cores (especially if they were threaded… but parallel helps too) than they would from a faster/more FPU.
I really don’t think FPU is a bottleneck in the target market.

2006-02-19 10:47 am
kaiwai
“I believe you are referring to ‘Rock’, and IIRC that fix comes at the cost of reducing the number of cores.”
IIRC, the core which Niagara is based on is the old UltraSPARC II, because of its pretty basic design, so we’re talking about a fairly old style FPU without the VIS that comes with the UltraSPARC IIe/III/IV/IV+ processors that are out now.
Also, the next version will have the same number of cores, except it’ll be 64 threads and SMP capable, so you can imagine a large 32 CPU machine of Niagara 2 processors. Couple that with an improved FPU and it would be the ideal machine for Oracle crunching, but also for a large server hosting sessions for hundreds of Sun Ray clients.
2006-02-20 10:42 am
JonAnderson
“IIRC, the core which Niagara is based on is the old UltraSPARC II, because of its pretty basic design, so we’re talking about a fairly old style FPU without the VIS that comes with the UltraSPARC IIe/III/IV/IV+ processors that are out now.”
This is a myth. Niagara is a clean sheet design.
2006-02-19 3:43 pm
ceo1
“Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps?”
Well, that includes every tier 1 & 2 oil company, almost every seismic data processing company, a lot of life sciences/biotech, military & national labs … After all, why worry about such small markets 😉 ?
Be sure that these markets are on Sun’s radar.
-CEO
2006-02-19 4:12 pm
Get a Life
They aren’t the target market of the T1.
2006-02-20 7:45 am
Arun
“Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps? Webservers, databases, file servers don’t need it. I’m willing to bet that compilers would sooner benefit from more cores (especially if they were threaded… but parallel helps too) than they would from a faster/more FPU.”
You are quite correct there. Niagara was designed for tier 1 network-facing loads like webservers and database servers. Those loads traditionally don’t have a lot of floating-point code.
Edited 2006-02-20 07:47
2006-02-20 10:34 am
JonAnderson
“I believe you are referring to ‘Rock’, and IIRC that fix comes at the cost of reducing the number of cores.”
Niagara 2 will also have an FPU per core (as opposed to per die). It will also support a lot more crypto algorithms in hardware. It will still primarily have the same application envelope as Niagara 1. Adding an FP unit per core is not the only enhancement in Niagara 2 either. Rock is targeted to have more single-threaded performance in FP and INT codes.

2006-02-18 8:07 pm
transputer_guy
Well the real point is that multi-threaded multi-cores are going to be the norm, so this will give Linux developers a chance to practice building some “embarrassingly parallel” apps across large numbers of real HW threads. Now it gets more interesting if there is ever a MT MC x86 to play on. I’d like to see some of the EDA tools I use to do FPGA design get targeted to these sorts of chips rather than see games.

2006-02-18 8:31 pm
Get a Life
The Pentium D EEs have hyperthreading. They call them the “Pentium Processor Extreme Edition” now or some such. It isn’t actually clear if SMT will be the norm in the x86 market in the near future, since neither AMD nor Intel have said anything (?) about it.

2006-02-18 9:07 pm
flav2000
This is a good start. Maybe this will give Linux a chance to improve on its SMP code. With 8 cores and 4 threads in each, I think a new level of optimization beyond what’s currently in use is needed.
I wonder how Linux compares to Solaris 10 in this environment at this time? Anybody know?

2006-02-18 9:10 pm
Get a Life
People do realize of course that Linux currently runs on computers with more than 8 processors, right?
2006-02-18 10:02 pm
rayiner
The Linux scaling work has already been done on 32-64 processor machines. SGI already sells their 128 processor Altix systems with Linux.

2006-02-18 11:53 pm
renox
Yes, but does the stock Linux kernel work on a 128 processor Altix?
Or does the Altix require SGI’s variant of Linux?
A part of the scaling has been integrated in the stock Linux, but I’m not sure everything has been integrated.

2006-02-19 5:26 am
CrLf
“Yes, but does the stock Linux kernel work on a 128 processor Altix? Or does the Altix require SGI’s variant of Linux?”
I remember reading some posts on the fedora-devel list from some people inside SGI which mentioned that they had successfully booted the Fedora kernel (ia64) on an Altix.
This was around the time FC3 was being developed, so I guess the stock kernel should have all the necessary changes by now (at least the ones that matter, maybe there are some Altix-specific patches still maintained outside the kernel by SGI, but that’s not very relevant… do you know many people with iron like this?).

2006-02-19 5:21 am
CrLf
“SGI already sells their 128 processor Altix systems with Linux.”
They also sell 256 and 512 CPU versions, capable of running a single Linux instance.
Linux doesn’t exactly *need* Niagara boxes to give developers a chance to improve its SMP capabilities. But that will certainly help, since these boxes are cheaper than an Altix or other similar “massive SMP” machines.
But most interesting is the “8 cores * 4 threads per core” thing, because it will allow Linux to get some improvements in the areas that allow it to handle cores/threads(SMT) in a slightly different way than real independent processors. This will help shake bugs and clean up the design, which will reflect in better performance for the masses (read: multi-core and/or hyperthreading on x86(-64)).

2006-02-18 9:54 pm
Bonus
Also, how will having this chip under the GPL help the software process? I think it’s good because it will stop companies from overcharging and also get graphics working a lot better.
2006-02-19 12:50 am
EmmEff
I believe it was David Miller who did one of the original ports of Linux to the SPARC sun4m CPUs way back when. Props to Dave. I wish I had his Linux kernel hacking talent.