Now that he has covered why cluster computing is a great idea and what’s involved in doing it in Parts 1 & 2, Rob Lucke concludes this series by describing how to take those first steps on the road to building a Linux cluster for your organization.
Building a Linux Cluster, Part 3: How To Get Started
2005-04-25 Linux
When exactly does a cluster make sense as a computing solution?
I would think it’s because it lets you build a system that handles more work without requiring one machine to have more and more power added to it for the project. A number of smaller machines can do the same job instead. I also believe this approach is sometimes used to build supercomputers, like the one at Virginia Tech.
…when the task is large enough and can be broken into logical threads of execution or sub-tasks, or when redundancy is required for a large number of systems that each perform a single task that cannot be broken up.
The details, though, make all the difference: you can have a task that is large and can be broken into parts, yet managing and dividing those parts would take more time than simply running it on a faster single system.
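A back-of-the-envelope sketch of that trade-off, assuming a deliberately simple cost model that is not from the article: the work splits perfectly across N nodes, plus a fixed cost to divide the job and a coordination cost per node. The function names and all the numbers are hypothetical.

```python
def cluster_time(work_s: float, n_nodes: int,
                 split_overhead_s: float, per_node_overhead_s: float) -> float:
    """Wall-clock seconds on the cluster under a simple, assumed cost model."""
    # Ideal share of the work per node, plus the fixed cost of splitting the
    # job, plus coordination that grows with the number of nodes.
    return work_s / n_nodes + split_overhead_s + per_node_overhead_s * n_nodes


def single_box_time(work_s: float, speedup: float) -> float:
    """Wall-clock seconds on one machine that is `speedup` times faster than a node."""
    return work_s / speedup


if __name__ == "__main__":
    # Hypothetical numbers: 4 nodes, 20 s to split the job, 5 s per node to
    # coordinate, versus one machine twice as fast as a single node.
    for work in (100.0, 10_000.0):
        c = cluster_time(work, 4, 20.0, 5.0)
        s = single_box_time(work, 2.0)
        print(f"work={work:>8.0f} s  cluster={c:>7.1f} s  faster box={s:>7.1f} s")
```

With these made-up figures the small job finishes sooner on the faster single box (50 s vs 65 s), while the large job is roughly twice as fast on the cluster, which is the "large enough" condition in the reply above.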
On a personal note, encoding video or audio is a task that would benefit from a modest-sized cluster. Running home accounting software or a spreadsheet would not, nor would a single-user game (at least the ones that exist now, outside the military). A word processor would be a silly thing to even attempt.
An OS that already handles clustering well and can be heavily customized to keep it as simple as possible would be ideal. (That is why Linux and the other free/open Unixes are used so heavily, along with cost reasons.)
> On a personal note, encoding video or audio is a task that would benefit from a modest-sized cluster.
There aren’t many encoders implemented with the kind of parallelism needed to really benefit from a cluster (the best cases I’ve seen have two or three main threads). There are even fewer that use a cluster message-passing interface like MPI or PVM.
Outside of, say, weather modeling and forecasting, or oil exploration software modeling the earth’s temperatures, when is a cluster a better option than, say, a 4-way Opteron workstation?
A while ago someone built a Beowulf cluster of PS2s, and while that’s awesome as a test of the technology, it’s really more academic than anything else; x86 or PowerPC computers would be much better suited to the task.
So for the high-end user, when is a cluster a problem solver?
I’m asking because I have roughly 14 computers that I’m going to give away to family members, gut for upgrades, or just sell on eBay for whatever it costs to ship them out of my basement.
None of them have the same CPU speed / RAM / architecture. Do clusters require that every computer be exactly the same?
You can use transcode to build a video encoding cluster with Linux; Linux Magazine or Linux Journal had an article on it last year. Other than that, or some distributed computing project, I don’t know why the average user would need one, other than the cool factor.
> None of them have the same CPU speed / RAM / architecture. Do clusters require that every computer be exactly the same?
No, for a simple cluster the hardware requirements aren’t very stringent. However, if you build a cluster out of 14 old computers, it will deliver less than the combined power of those 14 machines, because splitting up the work and moving data between nodes costs time. Again, there is still the issue of WHAT you are going to run on it. Don’t get me wrong: I’ve worked with Linux and proprietary clusters for a few years and they are extremely useful, but mostly for scientific and engineering applications.
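Since mismatched machines are fine for a simple cluster, a common approach is to give each node a share of the work proportional to its speed, so the slow boxes don’t leave the fast ones sitting idle. A minimal sketch, assuming the job splits into independent, equal-cost work units and that each node’s relative speed was benchmarked beforehand; the function names and speeds are hypothetical.

```python
from typing import List


def partition_work(n_units: int, speeds: List[float]) -> List[int]:
    """Split n_units among nodes in proportion to their relative speeds.

    Uses the largest-remainder method so the shares always sum to n_units.
    """
    total = sum(speeds)
    exact = [n_units * s / total for s in speeds]
    shares = [int(e) for e in exact]  # round every ideal share down
    leftover = n_units - sum(shares)
    # Give the units lost to rounding to the nodes with the largest remainders.
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares


def finish_times(shares: List[int], speeds: List[float]) -> List[float]:
    """Relative time each node needs: its work units divided by its relative speed."""
    return [n / s for n, s in zip(shares, speeds)]


if __name__ == "__main__":
    speeds = [1.0, 1.5, 2.0, 3.5]  # hypothetical relative speeds of four old PCs
    shares = partition_work(1000, speeds)
    print(shares)
    print([round(t, 1) for t in finish_times(shares, speeds)])
```

Splitting evenly instead would make the whole job wait on the slowest machine; weighting by speed keeps every node’s finish time nearly equal.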
> So for the high-end user, when is a cluster a problem solver?
Rendering, transcoding, and that’s about it… Even if you built a transcode cluster, it doesn’t scale very well beyond 5 or 6 nodes. In other words, transcode won’t perform much better on the (very small) 32-node rackmount cluster I maintain for testing than on a handful of UP P7s.
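One standard way to see why adding nodes stops helping is Amdahl’s law: if only a fraction p of a job runs in parallel, the speedup on N nodes is 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p). The sketch below assumes p = 0.8 purely for illustration; it is not a measured figure for transcode, and Amdahl’s law ignores communication costs, so a real cluster does worse than this.

```python
def amdahl_speedup(parallel_fraction: float, n_nodes: int) -> float:
    """Ideal speedup on n_nodes when only parallel_fraction of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_nodes)


if __name__ == "__main__":
    p = 0.8  # hypothetical parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 6, 8, 16, 32):
        print(f"{n:>2} nodes: {amdahl_speedup(p, n):.2f}x")
    print(f"upper bound: {1.0 / (1.0 - p):.2f}x")
```

Under this assumption, going from 6 nodes to 32 only raises the speedup from 3x to about 4.4x, and no number of nodes gets past 5x.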
There are plenty of interesting things you can do with a cluster, but the easiest to execute (like, say, computing digits of pi using a Taylor series) are usually unexciting for hobbyists, and the more fun stuff (like weather simulation with packages like FOAM) is often a very serious investment of time and effort.
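The pi example is “easy” because a Taylor series is a sum of independent terms: each node can add up its own range of terms, and a master only has to add the partial sums. A minimal sketch, assuming Machin’s formula, pi = 16·arctan(1/5) - 4·arctan(1/239), expanded with the arctan Taylor series (the choice of series is an assumption; the comment doesn’t name one) and Python’s big integers for fixed-point arithmetic. The “nodes” are simulated by a loop; on a real cluster each chunk would be shipped out through something like MPI. All function names are hypothetical.

```python
from typing import List, Tuple

GUARD_DIGITS = 10  # extra digits that absorb per-term truncation error


def split_range(n_terms: int, n_nodes: int) -> List[Tuple[int, int]]:
    """Divide term indices [0, n_terms) into n_nodes contiguous (start, stop) chunks."""
    base, extra = divmod(n_terms, n_nodes)
    chunks, start = [], 0
    for i in range(n_nodes):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def terms_needed(x: int, scale: int) -> int:
    """Index of the first arctan(1/x) term that is zero at this fixed-point scale."""
    k = 0
    while scale // ((2 * k + 1) * x ** (2 * k + 1)) > 0:
        k += 1
    return k


def arctan_inv_partial(x: int, start: int, stop: int, scale: int) -> int:
    """Sum terms k in [start, stop) of arctan(1/x) = sum (-1)^k / ((2k+1) x^(2k+1)).

    The result is a fixed-point integer (value * scale). This is the
    independent piece of work one node would do.
    """
    total = 0
    for k in range(start, stop):
        term = scale // ((2 * k + 1) * x ** (2 * k + 1))
        total += -term if k % 2 else term
    return total


def pi_digits(digits: int, n_nodes: int) -> str:
    """First `digits` decimal places of pi (truncated), each series split into n_nodes chunks."""
    scale = 10 ** (digits + GUARD_DIGITS)
    total = 0
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    for coeff, x in ((16, 5), (-4, 239)):
        n_terms = terms_needed(x, scale)
        # On a real cluster each (start, stop) chunk would go to a different
        # node; here the "nodes" are just iterations of this loop.
        partials = [arctan_inv_partial(x, a, b, scale)
                    for a, b in split_range(n_terms, n_nodes)]
        total += coeff * sum(partials)  # the reduction step
    s = str(total // 10 ** GUARD_DIGITS)  # drop the guard digits
    return s[0] + "." + s[1:]


if __name__ == "__main__":
    print(pi_digits(50, n_nodes=8))
```

Because integer addition is exact, the answer is identical however many chunks the terms are split into. That scatter, compute, sum pattern is what makes the job trivially parallel, and also why it is unexciting.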
Clusters are for _big_ parallelizable tasks. A 4-way Opteron machine, if you just count CPUs, is small for a cluster. For example, our small cluster at work has 10 dual-processor Pentium 4 nodes. We use it for Real Work (atmospheric science, actually); if you’re just playing around in your basement, that’s a different story.
One problem we’re running into is that some of our models do so much communication that running them on our Gigabit Ethernet cluster gives little benefit. This is where clusters fall short. A faster interconnect would literally double the cost of the nodes, so it is unclear where we should spend our funds.
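A rough way to see why communication-heavy models gain little on Gigabit Ethernet: per time step, each node’s compute time shrinks as nodes are added, but the time spent exchanging data does not. The sketch uses a simple latency-plus-bandwidth model; every number in it (step cost, message count, bytes exchanged, latencies, the bandwidth of the “faster” interconnect) is hypothetical, not a measurement from the cluster described here.

```python
def step_time(compute_s: float, n_nodes: int, msgs_per_step: int,
              bytes_per_step: float, latency_s: float, bandwidth_bps: float) -> float:
    """Seconds per time step for one node: its share of the compute plus its communication.

    Communication is modeled as (messages x latency) + (bits sent / link bandwidth);
    a single node exchanges nothing.
    """
    compute = compute_s / n_nodes
    if n_nodes == 1:
        return compute
    comm = msgs_per_step * latency_s + bytes_per_step * 8 / bandwidth_bps
    return compute + comm


if __name__ == "__main__":
    # Hypothetical model: 1 s of compute per step on one node; each node sends
    # 100 messages totalling 50 MB per step when the job is split 10 ways.
    work, nodes, msgs, data = 1.0, 10, 100, 50e6
    one = step_time(work, 1, msgs, data, latency_s=50e-6, bandwidth_bps=1e9)
    gige = step_time(work, nodes, msgs, data, latency_s=50e-6, bandwidth_bps=1e9)
    fast = step_time(work, nodes, msgs, data, latency_s=5e-6, bandwidth_bps=10e9)
    print(f"one node:         {one:.3f} s/step")
    print(f"10 nodes, GigE:   {gige:.3f} s/step")
    print(f"10 nodes, faster: {fast:.3f} s/step")
```

With these made-up numbers, ten nodes on Gigabit Ethernet run only about twice as fast as one because the 0.4 s of data transfer dominates each step, while the faster link brings that to roughly seven times. That gap is the cost trade-off the comment describes.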
Fibre Channel is capable of 2 Gbps over short runs. I believe it’s called short-haul copper Fibre Channel: you don’t actually use fiber-optic links but copper-based cables, and the range is limited to about 30 ft between nodes (or, in a star-type network, between the server and the nodes).
The short-haul copper stuff is pretty cheap; just look on eBay. The gear they sell is about 2 years old, but it has depreciated so fast that I can afford it (though I don’t really have a use for it).
Yeah, I was just wondering if there’s anything useful I could do with them. By the way, I’m an electrical engineer. The circuit simulation software I run pretty much tops out at 2 CPUs, though.
I think the point was that those were examples of general tasks, not that any specific program would benefit from a cluster.
> Yeah, I was just wondering if there’s anything useful I could do with them. By the way, I’m an electrical engineer. The circuit simulation software I run pretty much tops out at 2 CPUs, though.
Well, I believe the VLSI researchers at my school do use some cluster time; I might be able to find out what simulation software they use (I’m a humble undergrad, so I don’t know off the top of my head).