Sun Releases New Workstation and Graphics Accelerator

Guest post by Ken Crandall 2002-03-15 Oracle and SUN 21 Comments

Sun has released the new Sun Blade 2000 workstation, featuring either 2x 1GHz or 2x 900MHz UltraSparc III CPUs. They have also launched the XVR-1000 Graphics Accelerator, which puts SGI squarely in their sights when it comes to the 3D visualization market. What’s interesting to note is not only the first UltraSparc III processor at more than 1GHz, but also that the XVR-1000 board is based on Sun’s new MAJC processor, with 2 VLIW cores on board. Let the VLIW/CISC/RISC discussion begin!

21 Comments

2002-03-15 8:08 pm

Anonymous
wouldn’t you think that sun’s prices are a bit too high??
2002-03-15 9:58 pm

Anonymous
Well of course compared to home PC prices they are very expensive but Sun are not selling home PCs.

I am constantly seeing comparisons between PCs and other systems, with the PC turning out faster and cheaper and therefore better. This, however, ignores what Sun actually sell. They don’t sell an unreliable operating system; their stuff is mission critical. Microsoft may say they are selling such systems, but the general experience is that Windows is not in the same league.

Secondly, the hardware is better. At shipping time the Pentium 3 had 60 bugs (the Athlon only had 4). At “tape out” the UltraSparc 2 had 1 bug; tape out is generally a year before production even begins.

Sun do not sell PCs. They do not sell speed (unless you count the big multiprocessor boxes). Sun sell reliability, businesses will pay big money for that.

Why? Because it probably works out cheaper and thus faster (crashes take time) than a system which crashes.

I have seen all versions of Windows crash; I have never seen the Linux kernel crash except with hardware problems. You won’t get hardware problems with Sun because Sun make the hardware and the OS.

One of the reasons PCs are so cheap is false economics – PC companies are not making any money. Apple and Dell are the only ones making any profit at the moment (some of the above arguments also apply to Apple BTW).

So the answer to the question depends on what you want for your money.
2002-03-15 10:28 pm

Anonymous
wouldn’t you think that it hurts them?

Basically I think that they should lower the prices a bit because IBM sells about the same stuff 1.5 times cheaper than Sun. I don’t mean graphics workstations but servers; those are really expensive. I guess a startup would rather spend less money on an AIX or Linux server than on a Solaris server of the same speed.
2002-03-15 10:32 pm

Anonymous
I gotta admit that OSnews is the best news site I ever visited because it posts about the same stuff slashdot does but here I can have a great messageboard conversation without trolls and wide pages, good job ELQ!
2002-03-15 11:22 pm

Anonymous
Most PC & Windows users seem to be blissfully unaware of what Sparc, Alpha, and PPC computers are used for. In the semiconductor biz, Sparcs are used to run $500,000 apps to create the next gen of chips. The software must not crash; it must finish jobs that often take days. It must be able to handle x GB files without any BSODs. It usually runs on SMP servers with at least 4 CPUs. Such CPU chips usually have several times more cache than x86 can ever afford. The fastest 15,000rpm SCSI/Fibre Channel hard drives end up in these servers; they are many times faster than rubbish PC IDE drives & are usually RAIDed for even more speed & reliability. Gigabit Ethernet can be found here too. Now a PC may have nVidia game cards that knock the socks off most workstation graphics, but most other PC components are trivial compared to workstation parts. The avg PC running MS feels more like a Turing machine to me & still acts up weird. At least BeOS gives me some of the feel of running a real computer.

Many Sparc apps are drifting over to PC, due of course to Linux being familiar ground to Sun users, but only the bottom end of the tool chain can go there because of the crippled 32-bit x86 architecture. This will all change when AMD’s 64-bit x86 is released later; it should then be possible to build some pretty fancy SMP x86 workstations.

I remember 5 yrs ago MS announced they were going to take over this biz segment as well with NT. For a few yrs many EDA companies switched from *nix to NT; well, it never took off, and those who blindly followed MS are gone now. This is one place where you do see lots of Linux desktop seats (RH), as both workstations & remote nodes.
2002-03-16 12:34 am

Anonymous
but how does it compare to IBM’s (RS/6000 and S/390) systems? They cost less than sparc.
2002-03-16 1:09 am

Anonymous
SGI opens fire on parked Sun graphics scooters
By Andrew Orlowski in San Francisco
Posted: 15/03/2002 at 20:09 GMT

Silicon Graphics Inc has responded to Sun’s graphics workstation announcement with guns blazing. Sun launched a very nice piece of kit earlier this week, the Sun Blade 2000, which takes top spot in the floating point benchmarks. With its new MAJC-based graphics accelerator it should compete very well in the CAD market. But Sun’s focus on visualization, and explicit references to SGI’s visualization business, has ruffled feathers in Mountain View.

Read the rest at TheRegister: http://www.theregister.co.uk/content/53/24448.html
2002-03-16 1:13 am

Anonymous
Please do not copy/paste *whole* articles from other web sites, but provide a link and a ‘sample text’ instead. We don’t need legal problems with other sites.

Gil Bates, I have edited out your comment, no offense.
2002-03-16 1:20 am

Anonymous
> Secondly the hardware is better, At shipping time the Pentium 3 had 60 bugs (The Athlon only had 4). At “tape out” the UltraSparc 2 had 1 bug, tape out is generally a year before production even begins.

Where can I find info on these bugs, and on bugs in microprocessors in general? This kind of thing is usually very interesting.

The Sun Blade 100 workstation is pretty cheap. I’d buy one myself if I had the $2500 AU lying around, just for the fun of tinkering with a 64-bit box. It costs less than my current computer, and even less if I consider all the upgrades I’ve given it over the years.
2002-03-16 1:47 am

Anonymous
jj is right. The minimum config used in the ASIC layout shop I work in is 8 CPUs/16 Gig RAM. It’s common to have directories reach over 80 Gig in size, with some individual files reaching several Gig. Windows is a joke in this environment. As for IBM systems, most EDA CAD vendors ship Sun first, HP second and IBM maybe. In short, to stay in business you use Sun HW.
2002-03-16 2:35 am

Anonymous
PIII : http://developer.intel.com/design/PentiumIII/manuals/

(click “specification updates” on the navbar)

Athlon XP :

http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_739_3…

(click “revision guide”).

(For a good laugh, go check the revision guide of the K6 model 6. I have one of those. It’s scary).

JBQ
2002-03-16 2:52 am

Anonymous
>Where can I find info on these bugs and bug in general in Microprocessors? This kind of thing is usually very interesting.

This guy has a very up-to-date page with millions of links, apparently very popular at AMD.

http://www.jc-news.com

and more technical:

http://www.realworldtech.com

news group: comp.arch

I believe the 60 bugs actually originated on Intel’s site, in an “errata” list; AMD’s 4 were listed on their site, and the 1 in the UltraSparc 2 I read about in an article some time ago.

>IBM sells about the same stuff 1.5 times cheaper than sun

Cool! That means you can get a Sun machine for $10,000

but IBM *pay you* $5,000 for the same! Count me in 😉

Weren’t we meant to be discussing VLIW Vs RISC Vs CISC?

My take:

CISC is long since dead; AMD and Intel are both RISC internally and have been for a long time. The last CISC was the Cyrix 6x86.

VLIW: interesting, but it needs a good compiler and this is highly difficult.

RISC is still king of the hill; EV8 (the cancelled Alpha 21464) would have *destroyed* everything in sight.

VLIW is better for DSP – i.e. specific repetitive algorithms, not general purpose computing.

But the Russians:

http://www.elbrus.ru/index_e.htm

are still ahead of the west…
2002-03-16 10:57 am

Anonymous
>But the Russians: http://www.elbrus.ru/index_e.htm are still ahead of the west…

At CeBIT I was able to talk with some guys who have close contacts with Elbrus. They told me that a lot of the chief developers have gone to Sun and Transmeta. They have a fully specified design for the Elbrus 2k, but they don’t have a foundry. The Russian foundry they used in former times can only produce 0.5um, but they need at least 0.18um. At the moment they are talking with a French company (STM maybe???); maybe then the Elbrus 2k will be produced.

As you know, Elbrus is a VLIW chip, so you usually need good compilers for it. The guy I talked to told me that they have their own language called Awtokad (or something like that), which is a special language for parallel constructs.

Greetings from Anton
2002-03-16 11:23 am

Anonymous
>>CISC is long since dead, AMD and Intel are both RISC internally and have been for a long time.

>>VLIW, interesting but needs good compiler and this is highly difficult.

>>RISC is still king of the hill, EV8 (the cancelled Alpha 21464) would have *destroyed* everything in sight.

>>VLIW better for DSP – i.e. specific repetitive algorithms, not general purpose computing.

All true, but now it is all down to ridiculous super-speculative predicating techniques used to get speed-ups: issuing & retiring as many ops per CK as possible to keep as much of the CPU logic as busy as possible. Problem is, each extra trick seems to create far more hardware cost than it pays back. I am sure I read that even though EV7/8 issues up to 8 ops/CK, it only retires 1.5 on avg. Even the oldest Pentiums averaged similar ops/CK, but the ops were far simpler & it was better matched to the DRAM of 10 yrs ago.

According to Intel (see the Architecture Labs & Hyperthreading pages), they still seem to believe in the fallacy that the HW should extract whatever little micro || there might be in the code. It’s about time programmers grew up & learned to do || programming directly & demanded CPUs that support it natively. SW has a hard enough time figuring out || intent from == code; HW has no chance. But then they would have to learn about thread-safe coding etc. CSP anyone?

For my money’s worth, I would far rather have a lean, mean, true RISC that runs about 1 good op per CK & is fully hyperthreaded so that almost all memory latencies are zeroed out. Then I would put a bucketload of these stripped-down CPUs in my box. These used to be called, uh, Transputers.

JLG got it right 10 yrs ago: the BeBox got more bang for the buck by having >1 cheap CPU work together than 1 single faster?? CPU; at least it forced BeOS & apps to be highly threaded from the get-go. Funny thing is, dual-CPU mobos are really quite cheap, but Intel and AMD are forcing the use of paired, expensive MP-only CPUs, almost killing the market off before it can start.

About 20 yrs ago the Transputer first pushed programmers to || coding, probably too hard for that time, since there was still at least another 100x == speed-up to come in the CMOS future. But Intel/Alpha use of hyperthreading indicates the time has come for programmers to get used to ||.
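
For anyone who has not met CSP: the idea is that independent processes share nothing and talk only over channels. Below is a minimal sketch in Python, using threads and bounded queues as stand-ins for processes and channels. It is an approximation for illustration only: a `queue.Queue(maxsize=1)` is a one-slot buffer rather than a true CSP rendezvous, the stage names are invented here, and CPython threads generally will not give a real speed-up on CPU-bound work.

```python
import queue
import threading

DONE = None  # sentinel value that tells the next stage to shut down


def producer(out_ch, items):
    """First stage: push each item onto the output channel, then signal the end."""
    for item in items:
        out_ch.put(item)
    out_ch.put(DONE)


def squarer(in_ch, out_ch):
    """Middle stage: read from one channel, square, write to the next."""
    while True:
        item = in_ch.get()
        if item is DONE:
            out_ch.put(DONE)
            return
        out_ch.put(item * item)


def run_pipeline(items):
    """Wire the stages together and collect the results in the calling thread."""
    a = queue.Queue(maxsize=1)  # channel: producer -> squarer
    b = queue.Queue(maxsize=1)  # channel: squarer -> collector
    stages = [
        threading.Thread(target=producer, args=(a, items)),
        threading.Thread(target=squarer, args=(a, b)),
    ]
    for stage in stages:
        stage.start()

    results = []
    while True:
        item = b.get()
        if item is DONE:
            break
        results.append(item)

    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

No stage touches another stage's data directly, so there is nothing to lock; the channels are the only shared objects.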

Reconfigurable Computing is where performance computing must go

Modern HW design of ASICs & FPGAs is all about || programming of a sort. Verilog & VHDL can express C-like algorithms in a || register transfer level (RTL) style; the CK-to-CK transitions are the sequential execution of large numbers of || paths, forks, threads or processes. Why is this relevant to OSnews? When every CPU includes a good amount of FPGA on the same chip, C code segments such as MPEG codecs can be compiled (synthesized, placed & routed) directly to the on-board FPGA. The more FPGA, the faster the likely result. Expect speed-ups from 5x to 1000s or more.

When all computer programs are a mixture of pure SW parts, i.e. == .exe files that run on a fixed CPU, & pure HW parts, i.e. || .bit files that run on FPGA, we call this Reconfigurable Computing; this is where the future lies for extreme performance. Funny thing is, it is already here: Altera/others have an ARM, & Xilinx/IBM has 4 PPC cores embedded in their newest FPGAs. There is, though, no OS or environment to exploit these new toys. Similar to when 6502s floated around before Woz came along.

To get an idea of possible speed-ups, take an MPEG codec as an example.

Assume that 1 line of C code is executed in 1 CPU CK cycle, i.e. simple statements such as a += b;.

If I write C code that executes a SW MPEG codec, it runs on a PC in some ms. The kernel of the code, say a DCT routine, might require say x lines of C code per pixel. The kernel is executed for each pixel, per frame.

If I write C code that simulates a chip design of a HW MPEG codec, it runs anywhere from say 5 to 10^5 times slower than the SW on the same PC, but then it is meant to be fabbed as a chip or FPGA. The 5 times slower SW is due to writing the SW model in a more structural RTL way. The 10^5 times slower SW would be a very detailed event-driven logic simulation, which we can ignore here. The 5 times slower code indicates that the kernel of the codec executes some 5*x lines of C code as the equivalent of 1 CK cycle of an equivalent 1-pixel chip. That kernel will take a few us on avg to complete.

If the same C (or more practically Verilog/VHDL) code for the chip is now synthesized to an FPGA, it will likely run those 5*x lines of code equivalent at say 1/16 of the CPU CK rate (FPGAs are intrinsically several times slower than RISCs on the same process technology). The speed-up is therefore x/80. This is for 1 pixel engine. If the pixel engine is replicated n times, the C code must be n times slower, but the HW will now be n times relatively faster still.

Speed-up is nx/vt, i.e. (n*x)/(v*t).

n is the number of || engine copies that can fill the FPGA, as many as possible

x is the avg number of executed lines of C code in the SW codec that are equivalent to the HW engine

v is the typical penalty for rewriting C code as RTL Verilog & simulating on a C cycle simulator

t is the technology penalty for FPGAs being slower than RISC on the same fab.

n could be 1 to perhaps 100s depending on kernel domain

x could be 10 to 1M or more depending on kernel domain; bigger x => smaller n

v in my experience can be about 5,

t could be P4 @2GHz v MicroBlaze @125MHz, i.e. 16, but this is a little unfair on FPGAs

A full networking codec chip that was designed recently gave n=1,x=400k, so speed up is about 5k. As a chip design though it took a man yr, a programmer would need to be much more efficient and would expect lower improvements. An automatic C to fpga tool using HandelC or similar as input, would compile an MPEG4 decoder source in a few sec, & give 5x speed up.
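
The estimate above can be checked with a few lines of Python. This is only a sketch of the nx/vt rule of thumb as stated in this post; the function name `estimated_speedup` and its default values (v = 5, t = 16, taken from the figures above) are illustrative assumptions, not part of any real tool.

```python
def estimated_speedup(n, x, v=5.0, t=16.0):
    """Rule-of-thumb FPGA speed-up from the post: (n * x) / (v * t).

    n: number of parallel engine copies that fit in the FPGA
    x: lines of C the software kernel executes per unit of work
    v: penalty for restructuring the C code as RTL (about 5 in the post)
    t: clock-rate penalty of FPGA vs. CPU (2GHz / 125MHz = 16 in the post)
    """
    if v <= 0 or t <= 0:
        raise ValueError("v and t must be positive")
    return (n * x) / (v * t)


if __name__ == "__main__":
    # The networking codec quoted in the post: n = 1, x = 400k.
    codec = estimated_speedup(n=1, x=400_000)
    print(f"networking codec: about {codec:.0f}x")

    # A single engine reduces to x/80, so x = 80 is the break-even point.
    print(f"break-even kernel: {estimated_speedup(n=1, x=80):.1f}x")
```

With these defaults, a kernel of fewer than about 80 lines per unit of work would come out slower on the FPGA unless it can be replicated (n > 1).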

>>But the Russians: http://www.elbrus.ru/index_e.htm are still ahead of the west…

Interesting paper; looks like they built something, but my Russian is nyet. However, the English paper implies that they think of IA64 & Transmeta as weaker versions of their own work. Ultimately, though, it doesn’t matter, since Reconfigurable Computing supersedes all.

Apologies for length, too much time around here
2002-03-16 1:21 pm

Anonymous
A simpler example of Reconfigurable Computing is a reverse WinModem.

Imagine 2 PCs, each with a WinBoard for some heavy DSP function. The driver includes both a .exe & a .bit file. The 1st PC uses only the .exe file and uses 5% of the host CPU to emulate the DSP using MMX etc. The 2nd PC uses the host to temporarily load the .bit file into on-chip FPGA & uses up 0.5% of the host CPU in the overhead of swapping HW in and out & other control overheads. Net result is a 10x speed-up.

As the tasks get bigger & the swapping overhead remains small, the speed-up could be much larger. For example, a Power Line NIC needs to perform a real-time FFT for adaptive RF line equalization; the workload could easily equal 10B multiply-adds per sec, or about a 10GHz P4-equivalent PC, or 500% of a 2GHz PC. The RC driver, though, could fill the same FPGA, assuming it is big enough, & the overhead might still be 1% or so to get NIC data in/out: more work done, less talk. Speed-up is now 500x. Such modems do exist, but as a fixed external card. A WinModem version would only need the RF analog interface & host interface.
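
The two speed-up figures in this example are just ratios of host CPU load. A small Python sketch of that arithmetic follows; the helper names are made up for illustration, and it assumes, as the post does, that one multiply-add costs one CPU clock.

```python
def host_load_percent(ops_per_sec, cpu_hz):
    """Host CPU load (in %) to do the work in software, at 1 op per clock."""
    return 100.0 * ops_per_sec / cpu_hz


def offload_speedup(software_load_percent, fpga_overhead_percent):
    """Ratio of host load in software to host load when the FPGA does the work."""
    return software_load_percent / fpga_overhead_percent


if __name__ == "__main__":
    # WinBoard case: 5% of the host in software vs. 0.5% overhead with the FPGA.
    print(offload_speedup(5.0, 0.5))

    # Power line NIC case: 10B multiply-adds per sec on a 2GHz host, 1% overhead.
    nic_load = host_load_percent(10e9, 2e9)
    print(nic_load, offload_speedup(nic_load, 1.0))
```

A load above 100% simply means the host could not keep up in software at all, which is the point of the NIC case.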
2002-03-16 4:07 pm

Anonymous
do you think that native compilation is going to stay or does the future lie in interpreted languages?
2002-03-16 5:24 pm

Anonymous
>>CISC is long since dead, AMD and Intel are both RISC internally and have been for a long time.<<

Though AMD and Intel chips may now be RISC in concept, they are still CISC in nature… the work is still done at the hardware level, which makes decoding and generating CPU instructions a 2-step process, whereas RISC requires the code to be optimized at the software level. Still, the AMD/Intel concept is not too shabby, and it makes things easier for programmers who don’t have time to optimize code before runtime!
2002-03-17 3:38 pm

Anonymous
IBM 390 cpu is cisc. those S/390 (zSeries) are still alive and SMP kicking
2002-03-17 6:11 pm

Anonymous
.. It is quite some time back so I am not really sure anymore… but I think there was a MAJOR flaw in the CPU, so they had to greatly cut back performance in order to make it work properly again?! I don’t care what they ‘officially’ admit to being a bug, I care about the outcome… Although I am not a Sun expert, going by inquirer.net, Sun will be dead as fuckin fried chicken if they carry on as they do currently…
2002-03-18 3:59 pm

Anonymous
>but how does it compare to IBM’s (RS/6000 and S/390) systems? They cost less than sparc.

I don’t know enough about current RS/6000 pricing to comment, but S/390s, including the new “linux only” system, are massively expensive. From a January article on CNET:

“A system with one of its four processors activated costs about $400,000, McCaffrey said. Prices for switching on the other CPUs, or “engines” as they’re called in the mainframe realm, haven’t yet been determined.”

http://news.com.com/2100-1001-822771.html

With regards to:

>IBM 390 cpu is cisc. those S/390 (zSeries) are still alive and SMP kicking

Well, yes and no. Isn’t the max number of processor elements still sixteen in a single mainframe? And it’s not like they’re known as speed demons for their general raw processor performance. Great RAS and very good transaction performance, especially when running the TPF operating system. But you don’t see IBM publishing many “classical” benchmarks anymore.

–Jeffrey Boulier
2002-03-19 7:19 am

Anonymous
Until the memory can always keep up with the processor, you get more bang for the buck by sending complex instructions down the pipe.