Dave Miller Boots Linux Kernel on Sun’s Niagara
Thom Holwerda 2006-02-18 Linux 32 Comments

The Linux kernel has booted on top of the sun4v hypervisor on Sun’s new Niagara CPU (just the kernel; there was no root filesystem). Sun has been very active in trying to get Linux support for their Niagara.

2006-02-18 6:02 pm agentj
Yummy. 32 CPUS. I wish I had that stuff at home. Worth porting Quake ]I[ Arena

2006-02-18 6:42 pm Get a Life
It has eight cores, each with four “threads”. Even if it had 32 actual processors, it would still be a really bad target for Quake 3.

2006-02-18 8:19 pm _LH_
With 32 CPUs, virtual or not, that machine should still be a real killer for a 100+ user shell server.
Edited 2006-02-18 20:26

2006-02-18 8:33 pm Get a Life
What exactly do you mean by “shell server”?

2006-02-18 8:54 pm _LH_
Something most people don’t even tend to recognise these days: a Unix box providing shell access with compilers and email programs and having tens or hundreds of simultaneous users.

2006-02-18 9:08 pm Get a Life
I feared you meant some best.com-like service. No, the T1 wouldn’t be especially adept at this. It’s more useful if you want to obtain some pretty high SPECweb2005 scores. If you want to provide 150 people with accounts from which to run irssi, go dig out an old Pentium Pro. If you want a general-purpose processor, the T1 isn’t it.

2006-02-19 11:12 pm vegburner
_LH_: “shell access with compilers and email programs and having tens or hundreds of simultaneous users”
Get a Life: “If you want to provide 150 people with accounts from which to run irssi go dig out an old Pentium Pro”
An old PPro is nice until those 150 people start using make and gcc. Then it’s hell for everyone.
And that’s exactly what a shell server is: hellish, if it hasn’t got enough power to cope with the load. And in that role, I bet Niagara is a very good choice…

2006-02-20 12:14 pm Get a Life
No. The T1 does not excel at general-purpose integer computation. That is not its design space. And in fact, if you intend to provide a build system for large projects for 100 people, you will purchase a build cluster (not of T1s) and set up a job queue. If you’re just compiling small programs, then an old Pentium Pro will do just fine. The T1 is a single-issue processor, meaning only one thread on any one core executes at any given moment. The other threads in Sun’s CMT exist to mitigate stall scenarios, which are going to happen quite often in certain workload/software configurations, or often in general in the case of the T1 because of its lack of speculative execution and reordering. The design does not favor typical compilation scenarios. That’s one bet I’ll be glad to take.

2006-02-20 1:55 pm JonAnderson
“The other threads in Sun’s CMT exist to mitigate stall scenarios, which are going to happen quite often in certain workload/software configurations or often in general in the case of the T1 because of its lack of speculative execution and reordering.”
T1 switches hardware strands every cycle if there are other strands in a runnable state. If a strand is not runnable, its cycle is given up to another strand that is. This is what gives it very high integer throughput but lesser performance on a single-threaded workload. As long as you have got something else for it to do, it is a very efficient design.

2006-02-20 5:12 pm Get a Life
It’s capable for transaction processing, but it isn’t meant for and isn’t adept at being the center of a build cluster. And while the threads in the design can mitigate the performance impact of other aspects of its design, they don’t afford scalability. That is why going around pretending that this is a 32-processor computer is fundamentally wrong.
It’s less accurate than suggesting a P4 with hyper-threading has two processors, because Intel’s SMT implementation at least affords actual concurrency.

2006-02-20 5:42 pm JonAnderson
I quite agree with your points. I am not sure who is going around pretending that it’s a 32-processor computer, though.

2006-02-20 5:50 pm Get a Life
Please look at the start of the thread:
“Yummy. 32 CPUS. I wish I had that stuff at home. Worth porting Quake ]I[ Arena”
“With 32 CPU’s, let them be virtual or not, that machine should still be a real killer for 100+ user shell server.”
It is common here (and in other places where people aren’t as technically oriented as their hobbies might point to) for people to mistakenly assume that the Niagara is something that it isn’t. They see that it has “8 cores” or “4 threads per core” and they start thinking that it’s just better at everything than any other processor with fewer cores. The same exact thing happens when the Cell is mentioned. Sun provides a lot of solutions for different problem domains, and it’s best for the consumer to purchase the right solutions for the right problems. It’s probably also best for these companies if people clearly understand their strengths and weaknesses, but unfortunately that is not the case.

2006-02-21 9:12 am JonAnderson
Sorry, I didn’t realize that anybody was taking that post seriously, as it seemed very much tongue in cheek to me. T1 does have 4 hardware threads per core. These are presented to the operating system (Solaris) as 32 virtual CPUs, which are used by the dispatcher to run software threads. As you correctly point out, there is only one instruction pipeline per core, so the maximum number of threads which can be executing concurrently is eight.

2006-02-18 9:13 pm CaptainPinko
“Worth porting Quake ]I[ Arena”
True, provided you could heavily thread the application and remove damn-near-all of the floating-point code. The 8 cores share one floating-point unit IIRC.
However, with advances in gfx cards and the upcoming physix(sp?) card that is not completely unfeasible.
Frankly, I think the most exciting prospect would be if a ray-tracer could be multi-threaded (should be “embarrassingly parallel”) and rely on only integer code (I have no idea how this demand would work in practice). That would make Sun’s Niagara one hell of a render farm in a box.

2006-02-18 10:11 pm Thom Holwerda
“The 8 cores share one floating-point unit IIRC.”
This will be ‘fixed’ in Niagara 2. Niagara 2 will have one FPU per core.

2006-02-19 8:36 am CaptainPinko
“This will be ‘fixed’ in Niagara 2. Niagara 2 will have one FPU per core.”
I believe you are referring to “Rock”, and that fix comes, IIRC, with a reduction in the number of cores.
Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps? Webservers, databases, file servers don’t need it. I’m willing to bet that compilers would sooner benefit from more cores (especially if they were threaded… but parallel helps too) than they would from a faster/more FPU.
I really don’t think the FPU is a bottleneck in the target market.

2006-02-19 10:47 am kaiwai
“I believe you are referring to ‘Rock’, and that fix comes, IIRC, with a reduction in the number of cores.”
IIRC, the core which Niagara is based on is the old UltraSPARC II, because of its pretty basic design, so we’re talking about a fairly old-style FPU without the VIS that comes with the UltraSPARC IIe/III/IV/IV+ processors that are out now.
Also, the next version will have the same number of cores, except it’ll be 64 threads and SMP capable, so you can imagine a large 32 CPU machine of Niagara 2 processors; couple that with an improved FPU, and it would be the ideal machine for Oracle crunching but also a large server hosting sessions for hundreds of Sun Ray clients.

2006-02-20 10:42 am JonAnderson
“IIRC, the core which Niagara is based on is the old UltraSPARC II, because of its pretty basic design, so we’re talking about a fairly old style FPU without the VIS that comes with the UltraSPARC IIe/III/IV/IV+ processors that are out now.”
This is a myth. Niagara is a clean sheet design.

2006-02-19 3:43 pm ceo1
“Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps?”
Well, that includes every tier 1 & 2 oil company, almost every seismic data processing company, a lot of life sciences/biotech, military & national labs … After all, why worry about such small markets 😉 ?
Be sure that these markets are on Sun’s radar.
-CEO

2006-02-19 4:12 pm Get a Life
They aren’t the target market of the T1.

2006-02-20 7:45 am Arun
“Frankly, I’m not sure if the number of FPUs is a big deal. After all, how often are they used outside of scientific/engineering apps? Webservers, databases, file servers don’t need it. I’m willing to bet that compilers would sooner benefit from more cores (especially if they were threaded… but parallel helps too) than they would from a faster/more FPU.”
You are quite correct there. Niagara was designed for tier 1 network-facing loads like webservers and database servers. Those loads traditionally don’t have a lot of floating-point code.
Edited 2006-02-20 07:47

2006-02-20 10:34 am JonAnderson
“I believe you are referring to ‘Rock’, and that fix comes, IIRC, with a reduction in the number of cores.”
Niagara 2 will also have an FGPU per core (as opposed to per die). It will also support a lot more crypto algorithms in hardware. It will still primarily have the same application envelope as Niagara 1. Adding an FP unit per core is not the only enhancement in Niagara 2 either. Rock is targeted to have more single-threaded performance in FP and INT codes.

2006-02-18 8:07 pm transputer_guy
Well, the real point is that multi-threaded multi-cores are going to be the norm, so this will give Linux developers a chance to practice building some “embarrassingly parallel” apps across large numbers of real HW threads. Now it gets more interesting if there is ever an MT MC x86 to play on. I’d like to see some of the EDA tools I use to do FPGA design get targeted to these sorts of chips rather than see games.

2006-02-18 8:31 pm Get a Life
The Pentium D EEs have hyperthreading. They call them the “Pentium Processor Extreme Edition” now or some such. It isn’t actually clear if SMT will be the norm in the x86 market in the near future, since neither AMD nor Intel has said anything (?) about it.

2006-02-18 9:07 pm flav2000
This is a good start. Maybe this will give Linux a chance to improve on its SMP code. With 8 cores and 4 threads in each, I think a new level of optimization beyond what’s currently in use is needed.
I wonder how Linux compares with Solaris 10 in this environment at this time. Anybody know?

2006-02-18 9:10 pm Get a Life
People do realize of course that Linux currently runs on computers with more than 8 processors, right?

2006-02-18 10:02 pm rayiner
The Linux scaling work has already been done on 32-64 processor machines. SGI already sells their 128 processor Altix systems with Linux.

2006-02-18 11:53 pm renox
Yes, but does the stock Linux kernel work on a 128 processor Altix?
Or does the Altix require SGI’s variant of Linux?
A part of the scaling work has been integrated into the stock Linux kernel, but I’m not sure everything has been integrated.

2006-02-19 5:26 am CrLf
“Yes, but does the stock Linux kernel work on a 128 processor Altix?
Or does the Altix require SGI’s variant of Linux?”
I remember reading some posts on the fedora-devel list from some people inside SGI which mentioned that they had successfully booted the Fedora kernel (ia64) on an Altix.
This was around the time FC3 was being developed, so I guess the stock kernel should have all the necessary changes by now (at least the ones that matter; maybe there are some Altix-specific patches still maintained outside the kernel by SGI, but that’s not very relevant… do you know many people with iron like this?).

2006-02-19 5:21 am CrLf
“SGI already sells their 128 processor Altix systems with Linux.”
They also sell 256 and 512 CPU versions, capable of running a single Linux instance.
Linux doesn’t exactly *need* Niagara boxes to give developers a chance to improve its SMP capabilities. But that will certainly help, since these boxes are cheaper than an Altix or other similar “massive SMP” machines.
But most interesting is the “8 cores * 4 threads per core” thing, because it will allow Linux to get some improvements in the areas that allow it to handle cores/threads (SMT) in a slightly different way than real independent processors. This will help shake out bugs and clean up the design, which will be reflected in better performance for the masses (read: multi-core and/or hyperthreading on x86(-64)).

2006-02-18 9:54 pm Bonus
Also, how will having this chip under the GPL help the software process? I think it’s good because it will stop companies from overcharging and also get graphics working a lot better.

2006-02-19 12:50 am EmmEff
I believe it was David Miller who did one of the original ports of Linux to the SPARC sun4m CPUs way back when. Props to Dave. I wish I had his Linux kernel hacking talent.
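
The strand behaviour described in the thread (each core issues one instruction per cycle, switching among its four hardware strands and skipping any that are stalled) can be sketched with a toy simulation. This is only an illustrative model, not Sun’s actual design: the names `Strand` and `run_core` are hypothetical, and the numbers used (a stall after every 4th instruction, lasting 12 cycles) are assumptions picked to make the effect visible.

```python
from dataclasses import dataclass


@dataclass
class Strand:
    remaining: int          # instructions this strand still has to issue
    stalled_until: int = 0  # first cycle at which the strand is runnable again
    issued: int = 0         # instructions issued so far


def run_core(num_strands, instrs_per_strand, miss_every, miss_latency):
    """Return the cycles one single-issue core needs to finish all strands.

    Assumed model: at most one instruction issues per cycle, strands are
    picked round-robin, and every `miss_every`-th instruction of a strand
    stalls that strand for `miss_latency` cycles.
    """
    strands = [Strand(instrs_per_strand) for _ in range(num_strands)]
    cycle = 0
    next_idx = 0  # where the round-robin search starts this cycle
    while any(s.remaining for s in strands):
        for offset in range(num_strands):
            i = (next_idx + offset) % num_strands
            s = strands[i]
            if s.remaining and s.stalled_until <= cycle:
                s.remaining -= 1
                s.issued += 1
                if s.issued % miss_every == 0:
                    # Strand gives up its slot until the stall resolves.
                    s.stalled_until = cycle + 1 + miss_latency
                next_idx = (i + 1) % num_strands
                break  # single issue: only one strand runs per cycle
        cycle += 1  # an idle cycle if no strand was runnable
    return cycle


if __name__ == "__main__":
    one = run_core(1, 100, miss_every=4, miss_latency=12)
    four = run_core(4, 100, miss_every=4, miss_latency=12)
    print("1 strand, 100 instructions:", one, "cycles")
    print("4 strands, 400 instructions:", four, "cycles")
```

Under these assumed numbers, four strands finish four times the work in well under four times the cycles, yet the core never issues more than one instruction per cycle. That matches the point made in the thread: the extra strands hide stalls rather than add pipelines, so an 8-core T1 shows 32 virtual CPUs but executes at most 8 instructions concurrently.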