The “AI” bubble is showing cracks, and Microsoft ruins Excel

Thom Holwerda 2025-08-21 Office 20 Comments

It’s not AI winter just yet, though there is a distinct chill in the air. Meta is shaking up and downsizing its artificial intelligence division. A new report out of MIT finds that 95 percent of companies’ generative AI programs have failed to earn any profit whatsoever. Tech stocks tanked Tuesday, regarding broader fears that this bubble may have swelled about as large as it can go. Surely, there will be no wider repercussions for normal people if and when Nvidia, currently propping up the market like a load-bearing matchstick, finally runs out of fake companies to sell chips to. But getting in under the wire, before we’re all bartering gas in the desert and people who can read become the priestly caste, is Microsoft, with the single most “Who asked for this?” application of AI I’ve seen yet: They’re jamming it into Excel.
↫ Barry Petchesky at Defector

I’m going to skip over the mounting and palpable uneasiness that the cracks in the “AI” bubble are starting to form, and go right to that thing about Excel. Quite possible one of the most successful applications of all time, and the backbone of countless small, medium, and even large business, it started out as a Mac program to supplant Microsoft’s MultiPlan, which was being clobbered in the market by Lotus 1-2-3. It wasn’t until version 2.0 that it came to Intel, as an application that contained a Windows runtime. It was a port of Excel 2.0 for the Mac.

Anyway, it took a few years, but Excel took over the market, and I don’t think any other spreadsheet program has ever even remotely threatened its market dominance ever since. Well, not until Google Sheets arrived on the scene – it’s hard to find any useful numbers, but it seems Google Sheets is insanely popular in all kinds of sectors, at least according to Statista. They claim Google’s online office suite has a 49% market share, with Microsoft Office sitting at 29%. I have no idea how that translates into the usage shares of Google Sheets versus Microsoft Excel, but it’s a sign of the times, regardless.

One of the things you’d expect a spreadsheet to do is calculate numbers and tabulate data, and to do so accurately. The core competency of a computer is to compute, do stuff with numbers, and we’d flip out collective shit if our computers failed to do such basic arithmetic. So, what if I told you that Microsoft, in its infinite wisdom, has decided to add “AI” to Excel, and as such, has to add a disclaimer that this means Excel may not do basic arithmetic correctly?

COPILOT uses AI and can give incorrect responses.
To ensure reliability and to use it responsibly, avoid using COPILOT for:
Numerical calculations: Use native Excel formulas (e.g., SUM, AVERAGE, IF) for any task requiring accuracy or reproducibility.
Responses that require context other than the ranges provided: The COPILOT function only has access to the prompt and context provided to or referenced by the function. It does not have access to other data from your workbook, data from other files or enterprise information.
Lookups based on data in your workbook: Use XLOOKUP to look up data based on a table or range.
Tasks with legal, regulatory or compliance implications: Avoid using AI-generated outputs for financial reporting, legal documents, or other high-stakes scenarios.
Recent or real-time data: The function is non-deterministic and may return different results on recalculation. Currently, the model’s knowledge is limited to information before June 2024.
↫ Microsoft’s Excel COPILOT FUNCTION support document

Look, we can all disagree on the use of “AI”, where it makes sense, where it doesn’t, if it even does anything useful, and so on, but I would assume – for the world’s sake – that we can at least agree that using “AI” in an application used to do very important calculations for a lot of business is a really, really dumb idea? Is the person doing the bookkeeping in Excel at Windmill Restaurant, in Spearville, Kansas, properly aware of the limitations of “AI”, or are they not following technology that closely, and as such only hear the marketing and hype?

A spreadsheet should give accurate outcomes based on the input given by humans. The moment you let a confabulator loose on your spreadsheet, it ceases being a tool that can be used for anything even remotely serious. The fact that Microsoft is adding this nonsense to Excel and letting it loose on the unsuspecting public at large is absolutely wild to me, and I can assure you it’s going to have serious consequences for a lot of people. Microsoft, of course, will be able to point at the disclaimer buried in some random support document and absolve itself of any and all responsibility.

I’d like to point out that Lotus 1-2-3 probably still runs on Windows 11, for no reason at all.

About The Author

Thom Holwerda

Follow me on Mastodon @[email protected]

20 Comments

2025-08-21 5:44 pm
mlasica
While I understand your reluctance to having “AI” showed in every possible piece of software, I would argue that in the context of Excel it actually makes perfect sense. The way you approach it, I assume you have never been a heavy Excel user. The current models might not be great at math but they do analyze patterns pretty well and I have personally experimented with Gemini combined with Google Sheets with promising results.
The key here is to use the tool not for calculations, which are already handled quite well, but for deriving understanding from large amounts of data. Something that is done through BI tools like PowerBI. A lot of companies still rely on vast amounts of Excel files spread around their Sharepoint shares. Having a way of connecting these and making sense of it without having to build complex BI solutions would be fantastic.
Additionally having an assistant write formulas for you is really helpful.

2025-08-21 8:18 pm
Andreas Reichel
Greetings!
While I am at cross with Thom most time r/ AI (and maybe everything else) and although I like AI as coding help, I very much side with Thom at this one: Spreadsheets are the last reasonable use case for an AI, except maybe lookup and illustration of the functions.
In my experience, AI is great at drafting algorithms and boiler plate code. Its very bad at getting anything correct. Spreadsheets are all about getting correct numbers out and there is no algorithm since you are using predefined formulas.
And before you ask: I consider myself heavy user of Spreadsheets (being even a contributor to Apache POI).
@Thom
Look, we can all disagree on the use of “AI”, where it makes sense, where it doesn’t, if it even does anything useful, and so on,
I like the new Thom! God bless and cheers!
2025-08-21 8:23 pm
Andreas Reichel
Adding to myself: I got pretty rich from all those “Corporate spreadsheets on sharepoint”, migrating those into real software.
Spreadsheets are great for drafting and can be good as long as they are in the hand of one single user (who understands everything from A1 to ZZ32516). This biggest problem with spreadsheets is accidental wrong reference to cells in a large sheet. You will never find or understand it again.
Now imagine AI entering the room. 🙂

2025-08-21 9:32 pm
sukru
This is a “you are doing it wrong” moment for the Excel team.
Copilot and similar “AIs” are very good at language. Specifically they are LLMs after all (“Large Language Models”). They do great job on understanding and summarizing, rewiring, editorial correction, translation (sorry Thom), and of course generating structural language.
For example, one of the early things I tried was “here is my SQL schema, how can I find the top selling products in our database?”. And of course the results were impressive (Chat GPT 3 times… wow… it feels like decades ago).
Now…
If Microsoft wanted to do this, properly…
The would stop at:
Generate and understand formulas: Easily create new columns and rows that perform calculations based on your existing data. Copilot can also explain how each formula works.
https://support.microsoft.com/en-us/office/get-started-with-copilot-in-excel-d7110502-0334-4b4f-a175-a73abdfc118a
But not do:
Identify insights: You can ask Copilot questions about your data. Copilot shows insights as charts, Pivot Tables, summaries, trends, or outliers.
At least not yet, at least for non-trivial tasks.
After all, seeing and understanding the formula is something it can help, while directly jumping to conclusions might easily lead to “vibe” data analysis.

2025-08-21 9:35 pm
sukru
Self Reply:
“Very good” at translation does not mean entirely accurate:
https://www.hicom-asia.com/translation-ai-vs-human-translators-what-you-need-to-know-in-2025/
Modern AI solutions achieve 70-85% accuracy , while professional human translators deliver 95-100% accurate results .
2025-08-22 6:41 pm
Alfman verbose=1
sukru,
Copilot and similar “AIs” are very good at language. Specifically they are LLMs after all (“Large Language Models”). They do great job on understanding and summarizing, rewiring, editorial correction, translation (sorry Thom), and of course generating structural language.
For example, one of the early things I tried was “here is my SQL schema, how can I find the top selling products in our database?”.
I think LLMs are appropriate for writing the formulas/queries to then be executed by the spreadsheet/database. This is both feasible and useful. However asking the LLM to compute the results of the formula/query using NN is setting it up to fail because NNs are not particularly good at computation. Asking the LLM to perform the computation is very different than asking the LLM to create the formulas to be evaluated. A naive user may get this wrong. I think LLMs can (and likely will) become better at “outsourcing” tasks to appropriate tools in the future, but so far there is a tendency to do what’s “instructed” rather than do what’s “meant”.
For example if I ask an LLM to sort thousands of values, it should know not to attempt the sort itself but instead to call a sorting function. I believe this will continue to improve though. I read something about latest generation LLMs being able to forward requests to other LLMs, issuing tasks to non-LLM programs would be useful too.

2025-08-22 8:30 pm
Andreas Reichel
@Alfman
I think LLMs are appropriate for writing the formulas/queries to then be executed by the spreadsheet/database.
I still see a problem with it: SQL/schema is deterministic and type safe, but spreadsheets are not. Even Dates/Times are nothing but doubles. And don’t get me started on the various attempts of users expressing dates using strings.
I agree with you that AI can be helpful identifying the right formulas together with an illustration how to apply them — but I remain highly skeptical of AI doing anything itself in the spreadsheet, especially when multiple, stacking calculations are involved. The recursive nature of spreadsheets will make it impossible to assure any correctness — and spreadsheets are all about accuracy.

2025-08-22 9:22 pm
Alfman verbose=1
Andreas Reichel,
I still see a problem with it: SQL/schema is deterministic and type safe, but spreadsheets are not. Even Dates/Times are nothing but doubles. And don’t get me started on the various attempts of users expressing dates using strings.
I am critical of data typing in spreadsheets as well, among other gripes, but I don’t hold that against LLMs and I don’t see it as an inherent reason that an LLM can’t be used.
I agree with you that AI can be helpful identifying the right formulas together with an illustration how to apply them — but I remain highly skeptical of AI doing anything itself in the spreadsheet, especially when multiple, stacking calculations are involved.
Given what I’ve seen LLMs do with source code, I am already impressed with the ability to reason about functions. It doesn’t seem to be a barrier and “stacking calculations” in spreadsheets is the same idea.
I haven’t personally used LLMs on spreadsheets, so I can’t attest to that integration one way or another. But if it works in excel as well as it does with source code, then it may already be a useful acceleration aide.
The recursive nature of spreadsheets will make it impossible to assure any correctness — and spreadsheets are all about accuracy.
You are right to be concerned, honestly I am as well. I am strongly inclined to verify everything an LLM does, however I am impressed with the progress. I guess I should go try it and see how well the AI in excel works. Obviously we need to dock points when the LLM makes errors, but so do humans. It would be interesting to see an LLM go up against average humans at spreadsheet tasks and see who actually makes more mistakes.

2025-08-23 3:19 am
sukru
Alfman,
This is both feasible and useful. However asking the LLM to compute the results of the formula/query using NN is setting it up to fail because NNs are not particularly good at computation. Asking the LLM to perform the computation is very different than asking the LLM to create the formulas to be evaluated. A naive user may get this wrong. I think LLMs can (and likely will) become better at “outsourcing” tasks to appropriate tools in the future, but so far there is a tendency to do what’s “instructed” rather than do what’s “meant”.
I agree, and LLMs are actually moving in that direction for a long while.
The overall process looks like this:
1. Read and interpret the user query
2. Convert that into a structured query in the relevant language: SQL / SOAP / websearch / even python!
3. Run that query on an external service and get results.
4. Interpret results and summarize <<<
The step 4 is where things get murkier.
For example, I use Google’s Gemini for product shopping all the time:
“Give me audio mixer recommendations with stereo XLR output, at least 4 XLR inputs and a USB input, too. Should have at least basic Equalizer functionality. Phantom Mic power, Effects, and Monitor outputs would be nice to have. Should be compact desktop form factor, and I prefer a budget up to $200”
This is converted to whatever the language Google has for product search, performs that search and gets results, possibly with online reviews from web, puts them together in a table.
And then interprets the results to give me a few good recommendations from brands like Yamaha or Behringer.
Now it might miss a lot, maybe there are 40 other slightly better options. But for my task, I don’t care about perfect accuracy. “Good enough” saves me a lot of time in product shopping.
However….
Even Dates/Times are nothing but doubles. And don’t get me started on the various attempts of users expressing dates using strings.
Andreas Reichel, is right. It would be much worse in a spreadsheet setting. Not only the data structure is harder to interpret, the “small” failures might lead to much larger issues.
Say, I have a spreadsheet of all audio mixers I’m selling at my online store. If I query it for the top recommendations for a possible advertising campaign, missing out would mean I’d be spending hundreds of dollars running after false information.
That is why that “interpretation” should be at least optional and transparent.
Hi User!
Here is the information I found in the 12,345 products in the spreadsheet.
The top 5 items ranked by Column A are the following:
A, B, C, D, E (in a table format)
I have also encountered 23 errors, 12 missing price information, and 5 suspicious prices (seems too large).
Would you like to review them as well?”
2025-08-23 12:04 pm
Alfman verbose=1
sukru,
I have also encountered 23 errors, 12 missing price information, and 5 suspicious prices (seems too large).
Would you like to review them as well?”
I wouldn’t necessarily blame the LLM for the garbage-in -> garbage-out problem in a spreadsheet. I don’t recall ever coming across a human made spreadsheet with error checking. Garbage output due to unexpected input is normal in spreadsheets, which highlights the shortcoming of spreadsheets more so than LLMs. That said what you are saying is an interesting idea. An LLM might actually be able to add error checking where there previously was none.
I still think passing all the data through the LLM and asking it to is perform the computation is the wrong tool for the job, but on the other hand if the LLM had access to robust spreadsheet analytical tools working on the LLM’s behalf, then it seems very reasonable to me that these tools can & should have error checking. And then an LLM could do an excellent job interfacing between humans and the analytical tools The result would be robust error reporting along with good performance. In a way I suppose someone could accuse an LLM of cheating if it is given access to superior analytical tools that were never provided to human spreadsheet authors, but this isn’t the fault of LLMs. Regardless, spreadsheet LLM might be in an excellent position to improve the notoriously lax error checking situation. I think it would be interesting to work on these problems.
2025-08-24 11:11 am
sukru
Alfman,
Spreadsheets are designed for their simplicity. Any human above grade level education can pick them up and use them.
But, yes, that means it could be a work of “100 monkeys typing on a keyboard”. But it still works.
The good thing is, this might be starting to change. Yes, LLMs are not perfect, but LLMs are the next frontier in user interfaces. The early “voice assistants” did not bring us much other than “Alexa set up an alarm clock”, but these might finally take the baton from the GUI and allow “Star Trek The Next Generation” style voice interfaces.

2025-08-22 6:14 am
cpcf
I’m not a big Excel user, but I was forever helping office staff create and manipulate spreadsheet based data. I hated it. Now with Copilot enabled one or two initially got the knack of asking AI the right questions, they started solving the problems of other staff, and the word spread, Now the rest of the office staff picked it up pretty quickly, and I’m free to get on with important stuff.
2025-08-22 6:54 am
mbq
The best part is that they suggest you to manually undo all the typical Excel bullshit like formatting, merged cells and para-headers before running the analysis: https://support.microsoft.com/en-us/topic/format-data-for-copilot-in-excel-1604c8eb-57f1-4db1-8363-d53336228c65
2025-08-23 2:59 am
FriendBesto
What I find baffling is how massively inflated stocks due to bubbles get to figure in “economic growth”. So analysts get to report that “economy X is booming” because the price of tulips grew by orders of magnitude, while “economy Y is stagnating” because their economy doesn’t rely on tulip mania.

2025-08-24 11:19 am
sukru
FriendBesto,
The tulips were completely useless outside of decoration, however the stocks for these “bubbles” have fundamental economic realities behind them.
Remember the “dot-com” bubble? It burst and we no longer have that pesky thing called “The Internet” and everyone is back to fax machines. Just as nobel prize winning economist predicted:
https://www.snopes.com/fact-check/paul-krugman-internets-effect-economy/
Economist Paul Krugman wrote that by 2005, it would become clear that the Internet’s effect on the economy is no greater than the fax machine’s.
The same is true for AI “bubble”. It is not that much useful in optimizing power distribution, improving mobile photos in cell phones, planning large scale cluster jobs in Linux datacenters, translating and summarizing web pages, or nothing particularly useful at all.
I’d be as correct as that economist to predict this has no future. At most 10 years. The future is analog!
(Okay, the tone somehow went to deep sarcasm. Remembering that “fax machine” quote triggered it)

2025-08-24 4:59 am
the.nair
Copilot is the Sh****t of all, the amount of incorrect info it shares is disgusting. I started writing a novel and needed a name that resonates with it. It gave me a bloody incorrect name that is not a name , but it was given as a proper name etc. I built the entire backstory using that and then found out it is not correct. I deleted the copilot account. I think I will keep away from AI for anything , other than summarising some parts that are not very critical

2025-08-24 10:09 am
Alfman verbose=1
the.nair,
I started writing a novel and needed a name that resonates with it. It gave me a bloody incorrect name that is not a name , but it was given as a proper name etc. I built the entire backstory using that and then found out it is not correct.
I find your conundrum very strange. What does it mean it gave you a bloody incorrect name? Can you share it? If you ask for a president’s name and it gives you a non-president’s name, that’s bloody incorrect. But if you ask it to generate a fantasy name for a piece of fiction, how can it be “incorrect”? I don’t understand this. And even if you determined it to be incorrect only after you created the work, why can’t you simply change the name to something more suitable?
I don’t really know how good LLMs like copilot are at coming up with random names like that. The relatively small models I’ve played are noticeably repetitive. When it comes to facts, LLMs are known to hallucinate. In my testing a maliciously engineered prompt can get the LLM to provide inaccurate output as the LLM tries to go along with the request. But sometimes even an innocuous prompt can give bad information.
I don’t have time to do it now, but it would be interesting to empirically test an LLMs ability to create random output like names, numbers, countries, etc. to get a better idea for it’s ability to pick random things. I suspect humans also struggle to generate unbiased random output. One of the things that’s funny about LLMs is that I am increasingly finding that they are able to generate decent code to perform tasks that they cannot themselves perform. So being able to internalize the execution of code seems like it will be a powerful evolutionary step above and beyond using a NN to generate output exclusively.

2025-08-24 11:08 am
sukru
Alfman,
I would guess it would have more success writing a “random fantasy name generator program” in C or Python than generating those names itself.
2025-08-25 2:27 am
the.nair
i had asked for a Sanskrit name! and it conjured one rather than looking one up

2025-08-25 3:05 am
Alfman verbose=1
the.nair,
i had asked for a Sanskrit name! and it conjured one rather than looking one up
I have no idea how much Sanskrit has made it into the LLM training data sets. But assuming there isn’t much data on it, then it could be among the sort of things LLMs are known to hallucinate about.
Unfortunately LLMs lack a “quality” signal in the output, and therefor don’t have the good sense to adjust confidence levels accordingly. The result is that massive extrapolations from a tiny bit of data become stated as truth. This has made LLMs unreliable for factual information that isn’t well represented in the training data. This may improve in the future.
The newest LLMs have a mode that shows the internal “thinking” that isn’t normally shown. I find this really enlightening because you can see the process by which it decides to go with an answer. The LLM has a sort of debate with itself – think of it like sticking a debugger in a human brain. I tried talking to one such LLM about it’s own “thinking”,, but the LLM seemed completely unaware of its thought process. LLMs are evidently not programmed to remember their thinking, only the conversation happening in the open. I’m not really sure what effect it would have if we changed this.