Remember last year, when we reported that the Red Ventures-owned CNET had been quietly publishing dozens of AI-generated articles that turned out to be filled with errors and plagiarism?
The revelation kicked off a fiery debate about the future of the media in the era of AI — as well as an equally passionate discussion among editors of Wikipedia, who needed to figure out how to treat CNET content going forward.[…]
Gerard’s admonition was posted on January 18, 2023, just a few days after our initial story about CNET’s use of AI. The comment launched a discussion that would ultimately result in CNET’s demotion from its once-strong Wikipedia rating of “generally reliable.” It was a grim fall that one former Red Ventures employee told us could “put a huge dent in their SEO efforts,” and also a cautionary tale about the wide-ranging reputational effects that publishers should consider before moving into AI-generated content.
↫ Maggie Harrison Dupré
Excellent response by Wikipedia. Any outlet that uses spicy autocomplete to generate content needs to be booted off Wikipedia.
I know “spicy autocomplete” is supposed to be some sort of derogatory term for AI, but I don’t understand what it’s supposed to mean.
It’s a reference to the way the GPT AI models generate text one word at a time. However, Thom’s under the wrong impression that “spicy autocomplete” is a problem. The problem is not (and has never been) outputting text one word at a time, because a black box can do this intelligently and in a way that is mathematically equivalent to any other way of outputting text. The problem is that said black box cannot differentiate between fact and fiction, which has absolutely nothing to do with the order in which words are written.
Nah, it is autocomplete because it simply outputs the parts of the training data that most frequently co-occur with the query in said training set, just like standard autocomplete outputs training strings which begin with something close to the query. And spicy because of the hype, I guess. Black or white, for a system based on frequency, “truth” (the output fitness it effectively optimizes) depends only on the popularity of a given statement in the training data, so it won’t be too reliable given what can be found on the Internet (statistically).
Surely OpenAI and the like are using numerous tricks and biases to tweak the reliability of the output, but follow some misinformed or bizarre topic and you get either falsehoods or stochastic noise.
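For readers who want to see the frequency argument concretely, here is a toy sketch of a purely frequency-based autocomplete. It is only an illustration of the claim above, not a description of how GPT models work internally; the function names `train_bigrams` and `autocomplete` and the tiny corpus are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count how often each word follows each other word in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def autocomplete(counts, prompt, max_words=5):
    """Extend the prompt by repeatedly appending the most frequent follower."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training set: the false statement is simply more popular.
corpus = [
    "the moon is made of cheese",
    "the moon is made of cheese",
    "the moon is made of rock",
]
model = train_bigrams(corpus)
print(autocomplete(model, "the moon"))
```

In this toy, the completion follows whichever continuation appears most often in the training sentences, whether or not it is true.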
mbq,
You’re assuming such a predictive oracle can’t create coherent and intelligent output, but that’s not a logical conclusion. To be genuinely fair, we need to judge these oracles as black boxes without bias towards their implementation.
I hereby declare the following axiom:
An oracle that predicts intelligent output must itself be intelligent.
Assume we have oracle A, which we declare to be intelligent. The implementation of oracle A, and whether it’s human or not, is irrelevant. Now oracle B, a predictive model, is trained to reproduce the outputs of oracle A with the exact same statistical odds. In principle, fairness requires us to be blind to the mechanisms and just look at the inputs and outputs, and since they produce the same outputs, in principle they are indistinguishable.
Just to be clear, I’m not claiming chatgpt is a perfect “oracle b”, but I am absolutely claiming that your and Thom’s justification for dismissing predictive “autocomplete” AI models is baseless and biased. It’s perfectly justifiable to generate intelligent output this way.
What? You mean words vomited by a stochastic parrot are not facts? Next time you tell me I shouldn’t use my stochastic parrot to present a legal case to a judge:
https://yro.slashdot.org/story/24/02/29/2124254/bc-lawyer-reprimanded-for-citing-fake-cases-invented-by-chatgpt
PS: Calling things that aren’t AGI “AI” (especially without any qualifier as to what kind of non-AGI “AI” it is) has created mass confusion in the general populace about the fact that those things can’t actually think.
kurkosdr,
I’m guilty of that. People who study AI don’t assume AI means AGI. I think of AI in terms of solving specific problems like chess, jeopardy, handwriting/voice recognition, language models, etc. But you may be right that the general population might not know/understand the distinction.
I no longer consider Wikipedia a “generally reliable” source after a “censorship” issue.
The Catalonian Wikipedia blocked me for stating that Étienne Terrus (painter, friend of Matisse and others), who was born in France was French instead of “Catalonian of the North”, which is indoctrination about a false “Catalonian Countries” (Països Catalans) for the criminal independentists (Spanish Constitution Art. 2 literally states that Spain is an “indivisible territory”).
So, Wikipedia is nothing more today than a source for indoctrination and false information.
franzrogar,
I agree the wikipedia moderators certainly aren’t perfect, they’re humans with human biases. Many of them don’t like being contested and they have the power to promote one view over another. I think this is really hard to solve in a systematic way though. While they are imperfect, I feel they do a better job than stack exchange, which can be ruined by overzealous moderators. In that case, I blame a faulty incentive system that rewards doing something over doing nothing.
It would be interesting to use AI for moderation; it could be better than humans at being impartial. However, this trait is also exploitable. A good example is our extremist politics, where politicians want to spread their lies in place of facts. Arguably it’s bad to be impartial there. Most of us here on osnews are better educated and see through at least the most egregious political lies, but for better or worse there are hordes of uneducated voters who are extremely gullible, and exposing them to false information is harmful. Ironically, the exact same thing is true of AI itself. AI is great at using the information it has, but it has no compass for the truth beyond that due to the old “garbage in garbage out” problem.
Also, human moderator (Thom, probably), I think the fact that my comment has ended up in the queue due to “AI” filtering it out for the bad words in it pretty much demonstrates my point.
(Come back later to see my original comment if it has been manually approved)
The123king,
WordPress doesn’t use AI but rather rule-based “moderation” to stop spam. IMHO it’s not that great. Long-term users ought to get the benefit of the doubt and not have our posts blocked so trivially. New accounts are much more likely to be used for spam and that should be reflected in the rules. My posts used to be regularly blocked by wordpress for including too many links. I assume your post was flagged by the same rule. This is why I remove the “http://” part so that wordpress doesn’t flag it as a false spam positive 🙁
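A minimal sketch of the kind of link-counting rule described above, and of why dropping the “http://” prefix slips past it. This is not WordPress’s actual code; the function name `is_held_for_moderation`, the pattern, and the threshold of two links are assumptions made up for illustration.

```python
import re

# Assumption: the rule only recognises links that carry an explicit scheme.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def is_held_for_moderation(comment, max_links=2):
    """Hold the comment if it contains more than max_links recognisable links."""
    return len(LINK_PATTERN.findall(comment)) > max_links


with_scheme = "See http://a.example http://b.example http://c.example"
without_scheme = "See a.example b.example c.example"

print(is_held_for_moderation(with_scheme))     # held: three links counted
print(is_held_for_moderation(without_scheme))  # passes: no scheme, nothing counted
```

The rule as sketched looks only at the text, so a long-term user and a brand-new account are treated identically.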
The comment was regarding AI making decisions regarding moderation and why it’ll suffer from the AI equivalent of the “Sc*nthorpe problem”
The123king,
I wasn’t familiar with that, but I see what you mean.
I agree with you that dumb filters can sometimes fail in very silly ways. In principle, though, you could train an AI model to replicate human moderation much more accurately using large training sets. This would be smart enough to look at context, which many rules engines fail to do. But even then, it’s still subject to bad training data and GIGO. Also, a moderation AI could be used to train a combative AI that has the opposite goal and is very proficient at evading the first AI. This could actually make things worse.
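To make the failure mode concrete, here is a small sketch of a substring filter next to a whole-word filter. The blocklist, the mild placeholder word, and both function names are hypothetical; real filters are more elaborate, and matching whole words is still nowhere near understanding context.

```python
import re

# Hypothetical blocklist with a deliberately mild placeholder word.
BLOCKLIST = {"hell"}


def naive_filter(text):
    """Flag the text if any blocked string appears anywhere, even inside a word."""
    lowered = text.lower()
    return any(bad in lowered for bad in BLOCKLIST)


def word_filter(text):
    """Flag the text only if a blocked string appears as a whole word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKLIST for word in words)


greeting = "Hello, fellow shell users"
print(naive_filter(greeting))  # flagged: "hell" is inside "Hello" and "shell"
print(word_filter(greeting))   # not flagged: no whole word matches
```

The whole-word version avoids this particular false positive, but it still judges words in isolation rather than in context.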
Idk, I’ve long disagreed with wikipedia on what is or is not a reliable source. There are articles that I’ve seen rejected due to lack of reliable sources from site XYZ, but there are other articles over there that only have sources from XYZ. Sometimes you just can’t argue with the powerful mods. So like the web there is a good amount of good information, and a lot of crap on wikipedia. Eh, it’s free and volunteer based and does more good than harm.
Indeed, I rarely use it as more than base fact-checking and I always follow the source before simply taking an article at face value. It’s impossible to separate human-generated content from human bias, especially for any emotionally or politically charged subjects.
My most recent encounter with a grievous censorship edit on Wikipedia involved a complete whitewashing of a potentially malicious software product, likely done by either an employee of the company or an ardent fan of the software. All controversial and potentially negative factual information about the product was scrubbed from the article, and now it reads like an advertisement (which ironically is strictly forbidden by Wikipedia’s rules and guidelines). Any attempt to change it gets reverted and the IP address that restored the factually correct information retrieved from the Internet Archive is banned. Ask me how I know.
Bill Shooter of Bul,
You shot the Bul on the head 🙂
Wikipedia has tons of great information, but not always, and sometimes they fall short of encyclopedic goals. I’ve had some disagreements with wikipedia mods, but the mods always win, which is the privilege of being a wikipedia mod.
Still, it’s such a useful reference tool. I’d rather have wikipedia than not.