It was Stability’s armada of GPUs, the wildly powerful and equally expensive chips undergirding AI, that were so taxing the company’s finances. Hosted by AWS, they had long been one of Mostaque’s bragging points; he often touted them as one of the world’s 10 largest supercomputers. They were responsible for helping Stability’s researchers build and maintain one of the top AI image generators, as well as break important new ground on generative audio, video and 3D models. “Undeniably, Stability has continued to ship a lot of models,” said one former employee. “They may not have profited off of it, but the broader ecosystem benefitted in a huge, huge way.”
But the costs associated with so much compute were now threatening to sink the company. According to an internal October financial forecast seen by Forbes, Stability was on track to spend $99 million on compute in 2023. It noted as well that Stability was “underpaying AWS bills for July (by $1M)” and “not planning to pay AWS at the end of October for August usage ($7M).” Then there were the September and October bills, plus $1 million owed to Google Cloud and $600,000 to GPU cloud data center CoreWeave. (Amazon, Google and CoreWeave declined to comment.)
↫ Kenrick Cai and Iain Martin
As a Dutch person, I can smell a popping bubble from a mile away, even if tulipmania is most likely anti-Dutch British propaganda.
In all seriousness, there are definitely signs that the insane energy and compute costs of artificial image and video generation in particular are rising at such a pace that it's simply unsustainable for the popularity of these tools to keep growing. Eventually someone's going to have to pay, and I wonder just how much regular people are willing to pay for this kind of stuff.
The problem with these business models is that they need massive scale to be viable, and everyone is battling for that top spot.
We've seen the same play out in other markets (like cloud suppliers or online ads) where one dominant platform emerges and the rest basically fight over the scraps. The trouble is that to be a viable top player you need to be spending millions upon millions.
It's why you'll likely see the winner emerge with the backing of one of the existing tech giants (again).
Thom Holwerda,
You’re not wrong to highlight the energy intensity of AI generation, however realistically it might still use much less energy, and have bargain prices, next to a human doing the job. And that’s the light we have to view it in because that’s the light employers are going to view it in.
Alfman,
Yes, modern AI in datacenters is heavily optimized. After all, the largest cost driver is electricity, not the buildings or servers themselves (plus cooling, which also depends on energy).
So, they would use the most efficient architectures possible to squeeze out every last bit of computation from the available budget.
Well,
I would not call it "how it tanked" but rather "how they could not manage to monetize". It may be the same difference for most folks, but for open source proponents this is unfortunately a common occurrence.
Stable Diffusion, or the underlying technique "latent diffusion", was a significant leap over the collective state of image generation knowledge, including the previous champion, the GAN (Generative Adversarial Network).
https://en.wikipedia.org/wiki/Stable_Diffusion
The main leap is being able to generate a practically unlimited range of concepts conditioned on textual input, whereas GANs typically generated realistic images for a single concept from more traditional, computer-like inputs.
(Popularized by sites like "Anime Moe" or "This Person Does Not Exist", where you could generate faces with features like "has glasses", a particular hair color, and so on.)
For those interested, the diffusion method is based on learning how to "denoise" an image starting from random Gaussian noise, something that looks like 1990s TV static when no signal was available. It is a very clever discovery: the model learns from images that have had noise added over many iterations until they are pure randomness, and then it reverses the process.
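As a toy sketch of the forward ("add noise") half of that process, and nothing like Stability's actual training code, here is what jumping to an arbitrary noise level of a training image looks like (the image, step count and noise schedule values are illustrative assumptions):

```python
import numpy as np

# Toy illustration of the diffusion idea: corrupt an "image" with Gaussian
# noise according to a schedule; generation learns to reverse this.
rng = np.random.default_rng(0)
image = rng.random((64, 64))                    # stand-in for a real training image
num_steps = 1000
betas = np.linspace(1e-4, 0.02, num_steps)      # assumed noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)

def add_noise(x0, t):
    """Jump straight to step t of the forward process: x_t is a weighted mix
    of the original image and pure Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1.0 - alphas_cumprod[t]) * noise
    return x_t, noise

# During training the network sees (noisy image, step t) and is asked to
# predict the added noise; generation runs the chain backwards from pure static.
x_t, true_noise = add_noise(image, t=999)
```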
It then led to additional techniques, like "inpainting" (replacing a portion of an image based on a description, basically "Photoshop on steroids"), "textual inversion" (adding new concepts not available at training time), "LoRA" (lightweight fine-tuning that steers the output toward different styles or concepts), "image to image", and so on. Most importantly, since it is based on CLIP, an image-and-text multi-modal embedding model, it allowed combining models that generate text (think GPT) and images in the same pipeline.
However, as the article suggests, there is an even larger imbalance between what they do (coding and model training) and what clients need (downloading weights and running inference). So even though they had to spend billions to get us these excellent open source models, a master's student could replicate the "client" side over a weekend and generate images on their mobile phone.
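For a sense of how light that "client" side really is, this is roughly what running the released weights looks like with the Hugging Face diffusers library (a sketch: the checkpoint ID is a commonly mirrored Stable Diffusion 1.5 repository, and a CUDA GPU with a few GB of VRAM is assumed):

```python
import torch
from diffusers import StableDiffusionPipeline

# Download the published weights once, then run inference locally.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # commonly mirrored SD 1.5 checkpoint
    torch_dtype=torch.float16,          # half precision to fit consumer VRAM
)
pipe = pipe.to("cuda")

# One text prompt in, one image out -- this is all the "client" side needs.
image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
image.save("lighthouse.png")
```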
I am not sure how they could have "survived" other than going in the completely opposite direction: patenting everything, closing the model, never releasing full weights, and becoming hostile to open source, like the aptly named "Open AI".
Btw, for those interested, here is a "model zoo", a gallery of models that generate different styles of images (think of it as a "meta" gallery):
https://civitai.com/
sukru,
Sad, but thought provoking. It seems that they were good at achieving AI milestones, but didn’t have a viable business model to pay the bills.
Alfman,
I agree. Paying the bills is not easy. Even Google did not release Gemini to the public until OpenAI pressured them into it, and now they are trying to get people onto a $20/month subscription for the high-end version (which essentially provides the compute resources "at cost").
sukru,
You got me thinking… I wonder if there's a way training these FOSS AI models could be done via crowdsourcing? Something like the SETI project, but for training models.
I think there are two different challenges here:
1) Incentivizing people to donate resources.
Is the existence of a FOSS model enough to convince people to donate resources to help train it? I'm not sure, but given enough people it could be a lot of "free" compute power for the project.
2) Building a distributed training technology.
There would likely need to be a new training strategy to take advantage of highly distributed compute resources. A lot of these large models are trained using large shared memory and fast GPU interconnects. Widely available consumer GPUs are less powerful and the methods of distributing the intensive computation are less mature, but maybe there are innovative ways to divide and conquer the work to meet this challenge.
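For challenge 2), the usual data-parallel recipe is for each worker to compute gradients on its own shard of the data and send them back to be averaged, which is roughly what a volunteer-computing scheme would have to do over the internet. A toy, single-process sketch of that averaging loop (the model, data and learning rate are placeholder assumptions, and the "volunteers" are just a Python list, not real remote machines):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.standard_normal(1000)            # tiny stand-in "model"

def local_gradient(weights, data_shard):
    """What each volunteer node would compute on its own slice of the data
    (here: gradient of a least-squares toy objective)."""
    X, y = data_shard
    return 2.0 * X.T @ (X @ weights - y) / len(y)

# Pretend three volunteer machines each hold a shard of the training data.
shards = [(rng.standard_normal((64, 1000)), rng.standard_normal(64)) for _ in range(3)]

for _ in range(100):
    grads = [local_gradient(weights, shard) for shard in shards]  # done remotely in reality
    avg_grad = np.mean(grads, axis=0)          # "parameter server" averages the results
    weights -= 0.01 * avg_grad                 # central update, new weights sent back out
```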
I think a solution to these two challenges you refer to already kind of exists, although for a slightly different purpose (3D rendering). It has been more in evidence recently in some blockchain projects (the so-called AI tokens), perhaps more hype than substance at the moment, but in theory it could be a case of REAL usefulness for these projects.
At least it would not be a mere expenditure of computational power to validate payment methods that were created for the many but remain in the hands of a few.
Returning to the subject… when you mention SETI, I remember that a few years ago there were tokens built on this premise of crowdsourcing scientific computing, for example through integration with BOINC (FoldingCoin, GridCoin, CureCoin, etc.). But none of them took off, even though they were "donations" of computing power to a noble cause such as academic and scientific research.
More recently, projects like RNDR Token (OTOY, which develops Octane Render), Golem (if I'm not mistaken, it has something similar integrated into Blender), etc. propose this premise of distributed computing for image rendering, something like a "mega render farm" (especially useful for rendering massive numbers of frames, as happens with video, for example).
Thus, by integrating AI into some part of the process (but not all of it, since 3D rendering is different from typical image generation), you get useful results for those who request the service and an incentivized network (those who contribute to the network receive tokens they can sell on exchanges or reuse within the service).
andrenext,
I don’t have much faith in in the scalability of blockchain style projects. They’re so grossly inefficient and those inefficiencies are kind of baked into the consensus algorithms. My opinion is that this overhead for blockchain solutions is intolerable for practical applications except when consensus algorithms are of utmost importance for a project. But then again, bitcoin proves that people are willing to burn electricity emitting however many tons of CO2 emissions every year solving absolutely useless hash problems because there’s a speculative currency attached.
I view this as extremely wasteful, but if people are determined to use this energy either way, then I agree in principle that it might as well do useful work at the same time. In practice, though, I'd worry that bitcoin miners would just continue consuming their energy and a new blockchain would simply increase carbon emissions rather than help offset them.
Yeah, honestly I don’t think I would be able to do any better. Would people support FOSS development on patreon? Maybe embed compute units inside free games? It’s frowned upon to do it without permission, but what if it’s disclosed? I really don’t know what the best way is to fund it would be.
A bit tangential to our topic, but I was looking for ways to distribute blender workloads in my own small cluster but I only found 3rd party services to do that, which don’t interest me. Using multiple GPUs in the same computer works automatically but there should be an easy way to distribute the load across computers. If anyone knows of a solution in blender and not a 3rd party renderer, I’d like to see it!
Yes, it is very different. Most 3D renderers are interested in the GPU's ray-tracing cores, whereas NN training on Nvidia GPUs would typically use CUDA's built-in matrix functions. I agree with you that this GPU marketplace could exist, but it's unclear to me whether it would make much difference for a project like Stability AI. After all, they were already renting GPUs from Amazon AWS. The technology worked; the main problem was that they couldn't afford it.
andrenext, Alfman,
I also thought about a SETI@Home-like approach, but quickly realized it is not going to work, unless you have a $60,000 machine sitting idle at home.
Unlike SETI, Folding@home, and similar projects, which had small inputs per node but large compute requirements, this not only has massive compute requirements but also enormous RAM needs.
Basically, each "step" in learning is differentiating a large network of tensors, and this can be done in parallel, with the values brought back to a central parameter server (or a distributed set of them).
However, each data package is currently 300 billion floating point numbers. At FP32, that would require 1.2 TB of RAM, preferably VRAM.
Datacenters use cards like H100 with 80GB RAM each, with 4 of those per node, and then use specialized optical fabric to connect multiple of them. Basically something not sitting idle anywhere.
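The back-of-the-envelope arithmetic behind those numbers (using the parameter count and per-node GPU count quoted above, not verified vendor specs):

```python
# Rough memory math for the figures quoted above.
params = 300e9                      # reported parameter count
bytes_per_param_fp32 = 4
total_bytes = params * bytes_per_param_fp32
print(total_bytes / 1e12)           # ~1.2 TB for the weights alone

h100_vram = 80e9                    # bytes of VRAM per H100
gpus_per_node = 4                   # per the node configuration described above
node_vram = h100_vram * gpus_per_node       # 320 GB per node
print(total_bytes / node_vram)      # ~3.75 nodes just to hold the FP32 weights
```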
You can of course use the CPU for a slower process (at least 10x slower) and stream the data, but it is easy to see how inefficient (i.e., completely wasteful) that is.
That is why Nvidia stock has soared, and why many large companies like Google and Amazon are designing their own in-house AI accelerators.
I have shared this a few articles ago:
https://www.semianalysis.com/p/google-ai-infrastructure-supremacy
“Google AI Infrastructure Supremacy: Systems Matter More Than Microarchitecture”
sukru,
Well, that’s kind of the point I was making and why I mentioned the need for innovation around divide and conquer strategies. It could be very useful to optimize the training around smaller more domain specific problems and then find a way to merge these together into higher dimensional models only at the very end. I suspect these types of optimizations around a large number of locally proficient models could end up yielding other benefits too, like being more optimal for AI that learns on the fly and not be so dependent on huge static models that have to be precomputed.
As for what I meant by innovating on the divide and conquer front, for example instead of one massive model containing everything, a lot of knowledge can be compartmentalized into smaller and more specialized NN. A NN identifying different bug types doesn’t have much to do with quoting Shakespeare and there’s no reason divergent topics must be imprinted on a single giant model. Obviously smaller specialized NN would be far easier to distribute and you could train a high level NN to route topics to the appropriate specialized NN in much the same way a librarian is good at finding sources without necessarily knowing detailed information about those topics.
This could be the future of NNs anyway. Not only are smaller networks much easier to train locally, but their specialization could actually enable them to develop deeper insights into their respective domains (say, code optimization) compared to a giant know-it-all model. While it's neat that the know-it-all model works, it doesn't seem to be a fundamental requirement for AI to work.
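As a minimal sketch of that "librarian" idea: a small gating network routes each query to one of several independently trained specialist models (everything here is a tiny placeholder network, purely for illustration):

```python
import torch
import torch.nn as nn

class Specialist(nn.Module):
    """A small, domain-specific model (e.g. bug identification or code optimization)."""
    def __init__(self, dim=128, out=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, out))
    def forward(self, x):
        return self.net(x)

class Router(nn.Module):
    """The 'librarian': decides which specialist should handle each query."""
    def __init__(self, dim=128, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
    def forward(self, x):
        return self.gate(x).argmax(dim=-1)   # hard routing: one expert per query

# Each specialist could be trained independently, even on separate modest machines.
experts = nn.ModuleList([Specialist() for _ in range(4)])
router = Router()

queries = torch.randn(8, 128)                # a batch of embedded queries
picks = router(queries)                      # which expert handles which query
outputs = torch.stack([experts[int(picks[i])](queries[i]) for i in range(queries.shape[0])])
```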
Alfman,
I would have written a larger response, but a bit time crunched now.
Wrt. divide and conquer: it is exactly how these models are trained in the first place.
GPT-4 is famously a "mixture of experts" (MoE) model, reportedly with 8x200B parameters, which means the ones we could take on locally (3B to 7B) are already obsolete.
Secondly, the matrix operations, due to their nature, would require N^2 connections when divided into N pieces, meaning N times more network bandwidth.
Datacenters already use tricks like specialized optical switches with micro-mirrors to reduce latency (microseconds), along with 2 TB/s of local RAM bandwidth and up to 400 Gbit/s node-to-node connections between servers.
Compare that to our local machines with ~400 GB/s RAM bandwidth, at best gigabit connections, and millisecond latencies. Add in the 20x extra transfer due to an inefficient division of labor, and it is easy to see what an insurmountable task it becomes.
(Not to mention power bills or environmental damage.)
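To make the gap concrete, here is the rough transfer-time arithmetic using the figures quoted in this thread (the gradient size comes from the earlier 300-billion-parameter FP32 example; the link speeds are the approximate numbers above):

```python
# How long does it take just to move one full set of FP32 gradients?
grad_bytes = 300e9 * 4                 # ~1.2 TB, from the earlier example

datacenter_link = 400e9 / 8            # 400 Gbit/s node-to-node -> 50 GB/s
home_link = 1e9 / 8                    # 1 Gbit/s consumer connection -> 125 MB/s

print(grad_bytes / datacenter_link)    # ~24 seconds per exchange in the datacenter
print(grad_bytes / home_link)          # ~9600 seconds (~2.7 hours) over a home link
```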
sukru,
We need to be clear that it’s only “obsolete” by wasteful standards, but it does not rule out creating smaller more specialized domains that are easier to distribute. As usual though, software engineers are guilty of using all the resources that are available even if they’re not technically needed to solve the problem. Older engineers were really capable of doing more with less and maybe this is a lesson AI engineers could relearn.
When I suggested divide and conquer, I did not mean training the same large model on lesser hardware, which would have the properties you describe. What I meant was training smaller but more specialized models. I think this is going to have to happen for future NN to evolve faster anyway.
That’s another great reason to transition to smaller but more specialized NN.
It’s also worth noting that the fastest training GPU on the market is the MI300X right now… the B200 will be faster but use proportionally more power. The MI388X that will probably be available soon will also have 50% more memory than the B200.
AI cost is driven by hardware and power costs… the MI300X costs about $15k, while the B200 is expected to cost $30-40k. It's also worth noting that if you have a mixed AI/scientific workload, AMD is vastly faster at FP64 and uses half the power for the same FP64 workload.
You can literally buy 2 MI300X for the price of one B200… and end up with 4x the FP64 HPC compute and equivalent AI compute per watt.
cb88,
Yep, consumer-grade Nvidia hardware offers only a token amount of FP64 throughput, and working around that is both slow and inefficient. However, I would say 64-bit is overkill for NNs. As the number of states grows exponentially, the difference between states becomes so minuscule that it's not beneficial. 32-bit is already more than enough; some DNNs even work at 8 bits or even 4 bits, because depth matters so much more than the numeric resolution of each neuron. You can fit eight 8-bit values in the same space that one 64-bit value would take.
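As a toy illustration of why low-bit weights are attractive, here is a simple symmetric 8-bit quantization of some FP32 weights (real inference stacks use more careful per-channel schemes; this is just a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)   # FP32 weights: 4 bytes each

# Symmetric 8-bit quantization: one shared scale, weights stored as int8 (1 byte each).
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequant = q.astype(np.float32) * scale

print(weights.nbytes, q.nbytes)                 # 4096 vs 1024 bytes: 4x smaller
print(np.max(np.abs(weights - dequant)))        # worst-case error stays around scale/2
```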
Where I do miss 64-bit types is with physical simulations, especially in highly chaotic systems where small numeric errors compound over time. These kinds of numeric errors can be noticeable in huge open-world games and space sims that use 32-bit single-precision floating point numbers. Standard single-precision floats have 24 bits of precision, which is fine when you are close to the origin (0,0), but accuracy degrades dramatically as worlds get bigger and you get farther from the origin.
You probably already know this, but for the sake of discussion, say a game uses a 32-bit float to define a position in meters. Physical simulations near the origin might start out with on the order of ~100 nm of resolution, but if you move 1000 km away and take the difference of two points there, you only have roughly 10 cm of resolution, which will create jerky physics and graphical anomalies. We might try to mitigate this in a single-player game by moving the coordinate origin to wherever the player is, so that floating point resolution is highest near the player, but this doesn't work as well for multiplayer games or simulations that need high accuracy across all of space.
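numpy can demonstrate the effect directly: np.spacing gives the gap to the next representable float32 value, which is effectively the best position resolution a 32-bit world coordinate can offer at a given distance from the origin (a sketch; the exact figures depend on the coordinate range chosen):

```python
import numpy as np

# Gap between adjacent float32 values ("position resolution" in meters)
for meters in [1.0, 1_000.0, 1_000_000.0]:       # 1 m, 1 km, 1000 km from origin
    print(meters, np.spacing(np.float32(meters)))

# Roughly ~1.2e-7 m (about 100 nm) near the origin,
# and ~0.06 m (several centimeters) at 1000 km out.
```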
So… I would say 64-bit floats make a lot of sense for open-world gaming, and Nvidia should support them properly. However, they've decided to segment the market, and fast 64-bit floats were dropped from consumer-grade GPUs.
To be fair, AMD's consumer GPUs are also not that great for FP64.
HW cost is only part of the equation.
A GPU that is 2x as fast for half the price is useless when your competitor has an already proven turn-key solution that gets you compute results NOW and not in a few months.
AMD's Achilles heel has always been their SW stack.
NVIDIA sees themselves as much of a SW company as a HW vendor. The CUDA ecosystem is far too entrenched for AMD to make much of a dent, even if some of their HW is impressive. Which is a pity.
Also, NVIDIA has a scalable single-image architecture for NUMA GPU interconnect. That, together with CUDA, makes it a no-brainer for huge models, which are the ones coming into play right now and which spill significantly over local GPU memory.
Tulip mania is NOT propaganda; rather, playing it up became a way for the British to whitewash their own failures in 1720, when the "South Sea Company" almost brought down the entire economy of Western Europe and the American colonies.
Both were cases of FOMO, but on a scale not seen before or since.
To me it's obvious: this AI craze is no different from crypto.
It's a bubble waiting to pop; new AI projects are popping up just like new tokens did. We will soon have AI that runs on a blockchain, basically merging the two. I liked it better when it was just SETI@home and Folding@home.
No, it's very different. Sane people on the using side of things, rather than those powering the new AI, are reporting that it's very useful. And it is. Right now it's heavily subsidized, with the hope that the price will come down in the future thanks to advancements. Don't get me wrong, there is a lot of crap out there with AI, with people coming up with very dubious uses for it, but there is also a lot of amazing, worthwhile stuff that accelerates people's productivity.
I think a better analogy than web3/crypto or tulip mania is… the dot-com bubble. Yes, the internet is useful and can make us all more productive, but the pets.com's of the AI world are even more overvalued and less useful.
Oh, and it may destroy the world in one of many ways if we aren't careful. I guess that's unique to AI. Welp, it's been a good run, humanity.
Bill Shooter of Bul,
This, exactly! Yes, there's lots of crap that will fail, but it is a mistake to assume that just because some startups fail, the entire AI industry will. Where AI improves productivity, there is going to be long-term demand from businesses looking to save on labor costs; they will gladly use AI to increase their profits. Some people are looking for reasons to portray AI as a failure. Naturally there will be winners and losers, but the point is that this technology isn't going away.
This is such a political hot potato that I might regret bringing it up, but it’s eerily relevant…
https://www.theguardian.com/world/2024/apr/03/israel-gaza-ai-database-hamas-airstrikes
The Terminator movies may have gotten it wrong. AI may not need invulnerable military bots to get humans to destroy each other.
People tend to fear things they don’t understand.
My team is reporting tremendous productivity increases using AI tools. We can get results in hours for things where it used to take a week just to get the tooling ready.
I can now make an informed, data-driven decision by the day after the morning meeting where the asks were set up, when before I would have expected my guys/gals to take a few weeks to come back with the data/simulations/scenarios, with an insane amount of back-and-forth emails and meetings in between to steer things, cut the false starts, etc.
It's not a "craze" just because you don't understand it. In contrast, SETI@home is a completely useless waste of electricity that has zero impact on my everyday life, much less my productivity.