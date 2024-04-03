It was Stability’s armada of GPUs, the wildly powerful and equally expensive chips undergirding AI, that were so taxing the company’s finances. Hosted by AWS, they had long been one of Mostaque’s bragging points; he often touted them as one of the world’s 10 largest supercomputers. They were responsible for helping Stability’s researchers build and maintain one of the top AI image generators, as well as break important new ground on generative audio, video and 3D models. “Undeniably, Stability has continued to ship a lot of models,” said one former employee. “They may not have profited off of it, but the broader ecosystem benefitted in a huge, huge way.”
But the costs associated with so much compute were now threatening to sink the company. According to an internal October financial forecast seen by Forbes, Stability was on track to spend $99 million on compute in 2023. It noted as well that Stability was “underpaying AWS bills for July (by $1M)” and “not planning to pay AWS at the end of October for August usage ($7M).” Then there were the September and October bills, plus $1 million owed to Google Cloud and $600,000 to GPU cloud data center CoreWeave. (Amazon, Google and CoreWeave declined to comment.)
↫ Kenrick Cai and Iain Martin
As a Dutch person, I can smell a popping bubble from a mile away, even if tulipmania is most likely anti-Dutch British propaganda.
In all seriousness, there are definitely signs that the energy and compute costs of artificial image and video generation in particular are rising at such an insane pace that it’s simply unsustainable for the popularity of these tools to just keep rising. Eventually someone’s going to have to pay, and I wonder just how much regular people are willing to pay for this kind of stuff.
The problem with these business models is that they need massive scale to be viable, and everyone is battling for that top spot.
We’ve seen the same play out in other markets (like cloud suppliers or online ads), where one dominant platform emerges and the rest basically fight over the scraps. The trouble is that to be a viable top player you need to be spending millions upon millions.
It’s why you’ll likely see the winner emerge with the backing of one of the existing tech giants (again).
You’re not wrong to highlight the energy intensity of AI generation. Realistically, however, it might still use much less energy, and come at a bargain price, next to a human doing the same job. And that’s the light we have to view it in, because that’s the light employers are going to view it in.
I would not say “how it tanked”, but rather “how they could not manage to monetize”. It could be the same difference for most folks. But for open source proponents, unfortunately, this is a common occurrence.
Stable Diffusion, or rather the underlying technical method, “Latent Diffusion”, is a significant leap over our collective state of knowledge in image generation, including the previous champion, GANs (Generative Adversarial Networks).
https://en.wikipedia.org/wiki/Stable_Diffusion
The main leap is being able to generate a practically unlimited number of concepts conditioned on textual input, whereas GANs used to generate realistic images for a single concept, with more traditional, computer-like input.
(As popularized by sites like “Anime Moe” or “this person does not exist”, you’d be able to generate faces with features like “has glasses”, “hair color”, and so on.)
For those interested, the diffusion method is based on learning how to “denoise” an image, starting from random Gaussian noise: something that looks like the static on a 1990s TV when no signal was available. It is a very intelligent discovery. It works by adding noise to a real image over many iterations until it ends up completely random, and then training a model to reverse that process.
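To make the “add noise, then reverse it” idea a bit more concrete, here is a minimal sketch of only the noising half, in plain Python. Everything in it is an assumption made for illustration: the function names are made up, the linear noise schedule and its default values are just a common choice, and an “image” is simply a flat list of pixel values. It is not Stability’s code, and it leaves out the neural network that would actually learn to predict the noise.

```python
import math
import random

def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear noise schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(pixels, alpha_bar, rng):
    """Jump straight to the noisy version of `pixels` at one timestep.

    Returns the noisy pixels and the Gaussian noise that was mixed in.
    That noise is what a denoising network would be trained to predict.
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [signal_scale * p + noise_scale * n for p, n in zip(pixels, noise)]
    return noisy, noise

def remove_noise(noisy, predicted_noise, alpha_bar):
    """Invert add_noise, given a (here: perfect) guess of the noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [(x - noise_scale * n) / signal_scale
            for x, n in zip(noisy, predicted_noise)]

if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars(1000)
    image = [0.1, 0.5, 0.9, -0.3]  # stand-in for real pixel values
    noisy, noise = add_noise(image, alpha_bars[500], rng)
    print("share of signal left at the last step:", math.sqrt(alpha_bars[-1]))
    print("recovered with the true noise:", remove_noise(noisy, noise, alpha_bars[500]))
```

In a real system the noise passed to `remove_noise` would come from a trained network rather than from the known answer, and generation would walk backwards from pure static over many small steps. The point of the sketch is only that, by the last step, almost nothing of the original image is left.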
It then led to additional techniques, like “infilling” (replacing a portion of an image with a description, basically “Photoshop on steroids”), “textual inversion” (adding new concepts not available at training time), “LoRA” (conditioning the output to different styles), “image to image” and so on. Most importantly, since it is based on CLIP, an image and text multi-modal embedding model, it allowed combining models that generate text (think GPT) and images in the same pipeline.
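Of the techniques listed above, LoRA is probably the easiest to sketch in a few lines. The idea is to leave a big weight matrix untouched and train only two small matrices whose product is added on top of it. The snippet below is a toy illustration in plain Python: the function names, the `scale` parameter and the matrix sizes are all assumptions chosen for the example, not the API of any real library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def lora_merge(base, up, down, scale=1.0):
    """Return base + scale * (up @ down) without modifying `base`.

    base: d_out x d_in  (frozen original weights)
    up:   d_out x rank  (small, trained)
    down: rank x d_in   (small, trained)
    """
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)]

def lora_param_count(d_out, d_in, rank):
    """Number of trainable values in the two small matrices."""
    return rank * (d_out + d_in)

if __name__ == "__main__":
    base = [[1.0, 0.0], [0.0, 1.0]]
    up = [[1.0], [2.0]]    # rank 1
    down = [[3.0, 4.0]]
    print(lora_merge(base, up, down, scale=0.5))
    # Hypothetical 768 x 768 layer: full matrix versus a rank-4 add-on.
    print(768 * 768, "versus", lora_param_count(768, 768, 4))
```

This is why style add-ons can be shared as small files: only the two thin matrices need to be distributed, while the base model stays the same for everyone.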
However, as the article suggests, there is an even larger imbalance between what they do (coding and model training) and what clients need (weights download and inference). So even though they had to spend billions to get us these excellent open source models, a master’s student could replicate the “client” side over a weekend and generate images on their mobile phone.
I am not sure how they could have “survived” other than by going the complete opposite way: patenting everything, closing the model and never releasing full weights, and becoming hostile to open source, like the aptly named “Open AI”.
Btw, for those interested, here is a “model zoo”, which is a gallery of models that generate different styles of images (think a “meta” gallery):
https://civitai.com/
Sad, but thought provoking. It seems that they were good at achieving AI milestones, but didn’t have a viable business model to pay the bills.
You got me thinking… I wonder if there’s a way training these FOSS AI models could be done via crowd sourcing? Something like the SETI project, but for training models.
I think there are two different challenges here:
1) Incentivizing people to donate resources.
Is the existence of a FOSS model enough to convince people to donate resources to help train it? I’m not sure, but given enough people it could be a lot of “free” compute power for the project.
2) Building a distributed training technology.
There would likely need to be a new training strategy to take advantage of highly distributed compute resources. A lot of these large models are trained using large shared memory and fast GPU interconnects. Widely available consumer GPUs are less powerful and the methods of distributing the intensive computation are less mature, but maybe there are innovative ways to divide and conquer the work to meet this challenge.
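As a rough idea of what “divide and conquer” could mean here, the simplest known approach is data-parallel training: every volunteer holds a copy of the model, computes a gradient on its own slice of the data, and a coordinator averages those gradients before updating. The toy below does this in plain Python for a one-parameter model (y = weight * x). The function names, the data and the learning rate are made up for illustration, and it ignores the hard parts mentioned above (slow links, unreliable or dishonest volunteers, models too big for one GPU).

```python
def local_gradient(weight, shard):
    """Gradient of the mean squared error of y = weight * x on one shard."""
    return sum(2.0 * (weight * x - y) * x for x, y in shard) / len(shard)

def training_round(weight, shards, learning_rate):
    """One synchronous round: each volunteer sends a gradient, the
    coordinator averages them (weighted by shard size) and updates."""
    total = sum(len(shard) for shard in shards)
    averaged = sum(local_gradient(weight, shard) * len(shard)
                   for shard in shards) / total
    return weight - learning_rate * averaged

def train(shards, rounds=200, learning_rate=0.01):
    """Run several rounds starting from weight = 0."""
    weight = 0.0
    for _ in range(rounds):
        weight = training_round(weight, shards, learning_rate)
    return weight

if __name__ == "__main__":
    # Three hypothetical volunteers with unequal amounts of data (y = 3x).
    shards = [
        [(1.0, 3.0), (2.0, 6.0)],
        [(3.0, 9.0)],
        [(4.0, 12.0), (5.0, 15.0), (6.0, 18.0)],
    ]
    print("learned weight:", train(shards))
```

Only gradients cross the network in this scheme, but for a model with billions of parameters each gradient is itself gigabytes in size and has to be exchanged every round. That is the reason fast interconnects matter so much, and the main obstacle for a volunteer network on home internet connections.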
I think that a solution to the two challenges you refer to kind of already exists, although for a slightly different purpose (3D rendering). It has been more in evidence recently in some blockchain projects (some so-called AI tokens). That is perhaps more hype than anything at the moment, but it could, in theory, be a case of REAL usefulness for these projects.
It would be more than a mere expenditure of computational power to validate payment methods that were created for the many but remain in the hands of a few.
Returning to the subject… when you talk about SETI, I remember that a few years ago there were some tokens built on this premise, crowdsourcing for scientific computing, for example through integration with BOINC (FoldingCoin, GridCoin, CureCoin, etc.). But none of them took off, even though they were “donations” of computing power to a noble cause, such as academic and scientific research.
More recently, projects like RNDR Token (OTOY, which develops Octane Render), Golem (if I’m not mistaken, it has something similar integrated into Blender), etc. propose this premise of distributed computing for image rendering, something like a “mega render farm” (especially useful for rendering massive amounts of frames, like what happens in video, for example).
Thus, by integrating AI into some part of the process (but not all of it, since 3D rendering is different from typical image generation), useful results are generated for those who request the service, and the network is incentivized (those who contribute to the network receive tokens that they can sell on exchanges or reuse in the service).