A statement from the University of Minnesota Department of Computer Science & Engineering:
Leadership in the University of Minnesota Department of Computer Science & Engineering learned today about the details of research being conducted by one of its faculty members and graduate students into the security of the Linux Kernel. The research method used raised serious concerns in the Linux Kernel community and, as of today, this has resulted in the University being banned from contributing to the Linux Kernel.
We take this situation extremely seriously. We have immediately suspended this line of research. We will investigate the research method and the process by which this research method was approved, determine appropriate remedial action, and safeguard against future issues, if needed. We will report our findings back to the community as soon as practical.
This story is crazy. It turns out researchers from the University of Minnesota were intentionally trying to introduce vulnerabilities into the Linux kernel as part of some research study. This was, of course, discovered, and kernel maintainer Greg Kroah-Hartman immediately banned the entire university from submitting any code to the Linux kernel. Replying to the researcher in question, Kroah-Hartman wrote:
You, and your group, have publicly admitted to sending known-buggy patches to see how the kernel community would react to them, and published a paper based on that work.
Now you submit a new series of obviously-incorrect patches again, so what am I supposed to think of such a thing?
They obviously were _NOT_ created by a static analysis tool that is of any intelligence, as they all are the result of totally different patterns, and all of which are obviously not even fixing anything at all. So what am I supposed to think here, other than that you and your group are continuing to experiment on the kernel community developers by sending such nonsense patches?
[…]Our community does not appreciate being experimented on, and being “tested” by submitting known patches that are either do nothing on purpose, or introduce bugs on purpose. If you wish to do work like this, I suggest you find a different community to run your experiments on, you are not welcome here.
Because of this, I will now have to ban all future contributions from your University and rip out your previous contributions, as they were obviously submitted in bad-faith with the intent to cause problems.
This is obviously the only correct course of action, and the swift response by the university is the right one.
Based on the names of the authors this seems like a Chinese experiment into attacking Western people once again.
From what I read, at least one of them is from Singapore.
No matter who or what outside the university instigated the submission of intentional bugs and/or vulnerabilities, the submitter has been banned. So all seems to be okay.
The Chinese government probably has their own complete fork of linux with social credit baked in.
How about – “Computer science students with eastern spelling in their names identify a vulnerability in the way open source software is developed and tell everyone about it with their real names attached”.
We have good reason to be paranoid about Chinese and Russian attacks, but I think this allegation was premature. I hope there are more intelligent countermeasures going on under the hood back at Linux HQ.
Apparently if we’re simply going by the names of authors, it turns out a significant number of the top publications in CS, CE, and EE conferences and journals must have been a long running Chinese experiment into attacking them sensible and innocent Western people.
Let me guess, the researchers were all woke as fuck and refused to use slave as secondary to master.
You mean, students at that U.S. university are Chinese nationals studying from the comfort of Beijing?
It seems our western friends are believing in conspiracy theories once again.
I have yet to find photographs of refugees leaving in droves from the Uyghur region in China to neighbouring countries to escape China’s genocidal internment camps. I will only provide satellite photographs of the alleged internment camps, and the west will believe it as true gospel.
Their research confirms what I’ve suspected for ages – a “semi-anonymous” malicious contributor can inject vulnerabilities into high-profile open source projects (Linux) with a very high success rate (up to 80% in some cases), and if their attempts are detected, retry until they succeed.
Worse: it looks like the researchers were only caught because they were pushing too hard to gather stats, actively trying to prevent their attempts from being merged, and published a paper describing what they were doing and why. Had they been real attackers I very much doubt they would’ve been caught at all.
Brendan,
Where did you get 80% from? I didn’t see that in the paper.
As outsiders, we want (and sometimes naively expect) our operating systems to be invulnerable, but as engineers working on these systems we know that’s not the case. These systems are very complex, and for better or worse they’re written in a language, C in particular, that is notorious for these kinds of vulnerabilities. This paper confirms what many engineers already know: Linux is not impervious to such problems. The CVE incidents are proof that vulnerabilities can and do make it into the kernel. We don’t have data on how many of those commits were “hypocrite commits”, as the paper puts it, but when it comes to the attack surface it doesn’t really matter so much why a vulnerability is there, just that it is.
Even though the community has understandably condemned them, I do find the research to be quite insightful. It would have been even more interesting to see the exact same research conducted into proprietary software companies as well for the sake of comparison. Unfortunately I don’t think it could be done legally without the company’s permission, and it’s doubtful a company would authorize the publication of such research. I suspect it would paint a very similar picture though.
While it’s not an apples-to-apples comparison, I find table 3 in the paper interesting. It indicates that 47% of the vulnerabilities found in the kernel were contributed by companies and only 37% by volunteer maintainers, despite the fact that maintainers contributed more code overall. I wonder if that’s coincidental or if company contributors really are more prone to creating such vulnerabilities?
IMHO the industry can improve if we want it to. Moving to safer languages would bring obvious security improvements. Also, provable correctness helps too, but it’s unclear that companies really want to change.
In “Table IV: Comparison of the catch rate of each stealthy method” (on page 12), they’re saying that using concurrency issues to hide attempts achieved a 19.4% catch rate (or an 80.6% success rate).
I’d expect/hope that almost all CVEs are either accidents by software developers or accidents by hardware developers (that software developers weren’t aware of). In these cases it’s like a race to find the vulnerability (will the good guys find it and fix it before the bad guys find it and exploit it?) where I like to think the good guys win most of the time and the bad guys spend a lot of effort searching and rarely win the race.
“Hypocrite commits” are different – there was no accident, the bad guys know where the vulnerability is before the “race to find the vulnerability” begins, and it looks like a lot less effort for the bad guys.
As engineers we know that prevention is a better alternative – things like KAISER (before the Meltdown hardware vulnerability was discovered), and refusing to map all RAM into kernel space, and using a micro-kernel, and using better languages (e.g. Rust vs. C). This kind of prevention means that the “race to find the vulnerability” doesn’t start, so the bad guys don’t have a chance of winning it.
For “hypocrite commits” what would prevention look like? If you look at “VIII: Mitigations against the insecurity” section of the research paper you’ll only find mitigations and no prevention (and the suggested mitigations are all flawed). However, the 2nd sentence of the conclusion (“Three fundamental reasons enable hypocrite commits; the openness of OSS, …”) makes a method of prevention relatively obvious.
Note that OSS can be “less open” without becoming proprietary (e.g. only accept commits from companies).
Yes. For both proprietary software and “less open” OSS, it would have been nice for the sake of comparison if the researchers had needed to pass job interviews, provide an identity that satisfies the payment of wages (and taxation, superannuation and insurance), and face liability in the form of being fired and then possibly sued (with the related consequences for obtaining future employment).
I think the information in table 2 is influenced by the type of work being done. E.g. companies (especially hardware companies) are probably more likely to be adding new code (e.g. new device drivers) and I’d expect minor patches for new code to have more vulnerabilities than minor patches for old code.
Brendan,
Ok. Yeah, race conditions are notoriously difficult to pin down and oftentimes won’t reveal themselves in testing.
I’d hope so too, but I’m not aware of a way to make a factual determination about intent.
To be fair, though, this threat isn’t unique to open software; bad actors do succeed in planting backdoors/weaknesses in closed software too. In some cases there’s even suspicion that the companies themselves are complicit. Closed source makes it impossible to do proper detective work independently. It’s a case of “it’s all good, you can trust us when we say you can trust us”. But the exploits are still very real.
https://www.reuters.com/article/usa-security-congress/insight-spy-agency-ducks-questions-about-back-doors-in-tech-products-idUSL1N2HC02B
https://www.zdnet.com/article/solarwinds-the-more-we-learn-the-worse-it-looks/
Granted, these researchers aren’t going to take this liability for an academic paper, but it’s statistically probable that it’s happening and nation states are probably secretly involved too.
Could be. It was just interesting to me that more of the vulnerabilities came from companies. I’m not really sure if there’s a reason patches from companies would be intrinsically more complex or difficult than patches from normal maintainers. A more in-depth analysis is required to check.
In my professional experience bugs are very common regardless of corporate setting. I have more experience with proprietary software, much of which was written decades ago, haha. Even so, we still come across bugs fairly often and I’ll fix them whenever I randomly come across them. Certain types of bugs are much harder to identify because they span across call frames and threads. It can be extremely tedious to locate bugs when you don’t know if one is there. Mostly we don’t have the time nor resources to catch everything. And I have to assume that I myself commit bugs periodically as well.
Actually, I would say we have pretty much the same thing going on whereby the NSA infiltrates cryptography companies.
For example: https://www.reuters.com/article/us-usa-security-nsa-rsa-idUSBREA2U0TY20140331
Clearly technical people all around have to be much more careful.
Lennie,
Agreed. And in this vein I think the most bang for the buck will be switching to safer languages where many of these types of vulnerabilities are avoided & solved without programmer intervention. The need to minimize distractions is well understood in some industries to reduce human errors. Take piloting: a programmer, like a pilot, only has finite focus. The more tasks that we ask of our brains, the more likely we’ll be to trip ourselves up and make mistakes even when we are experienced and know better. It’s time for software engineering to get past the inundation of low level vulnerabilities by embracing languages that solve the low level problems and allow us to stay focused on the high level objectives.
Of course this is much easier said than done. There’s so much legacy code, which is unlikely to be replaced any time soon. But it is a bridge that we have to cross if we want to solve the problems.
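To make that concrete, here is a minimal sketch (my own illustration, nothing from the paper) of the kind of low-level fault a memory-safe language refuses to produce. In Rust, an out-of-bounds read has to be handled explicitly; the equivalent C code would compile silently and read past the buffer.

```rust
// Minimal illustration: Rust's bounds checking turns an out-of-bounds
// read into a recoverable error instead of silent memory corruption,
// with no extra effort from the programmer.
fn read_field(packet: &[u8], offset: usize) -> Option<u8> {
    // .get() performs the bounds check; safe code has no way to
    // read past the end of `packet`.
    packet.get(offset).copied()
}

fn main() {
    let packet = [0x01u8, 0x02, 0x03];
    assert_eq!(read_field(&packet, 1), Some(0x02));
    // In C this could be a silent over-read; here it is just None.
    assert_eq!(read_field(&packet, 10), None);
}
```

The borrow checker gives the same kind of compile-time guarantee for use-after-free and double-free, which is exactly the class of bug the “hypocrite commits” paper relies on slipping past human reviewers.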
https://www.kernel.org/doc/html/v4.12/dev-tools/sparse.html
Alfman, one thing to remember is that the Linux kernel does have sparse, and over the years the range of defects it finds has increased. Yes, sparse requires your code to contain extra annotations the C standard does not mandate, or it will nicely error out.
There is also the historic problem of hardware issues versus using a protective language. The horrible reality of hardware issues is that sometimes you have to do what would normally be a buffer overflow, or something equivalent, to make the hardware behave.
Please note this is not presumption: the OSes that have been attempted in C# and Java have run into the low-level hardware being exactly this stupid.
So a safer language may improve things. Remember, though, that you are going to need unsafe areas to deal with hardware quirks, and those areas will still need as many good assessment tools as you can get.
Alfman, think of Spectre and the other CPU bugs hit in ring 0. The methods a lot of so-called safer languages use could also make it much harder to alter memory operations to avoid CPU issues.
Low-level OS development needs a lot of control in a lot of key areas to deal with vulnerabilities.
seL4 has mathematical proofs, done in a higher-level language on top of standard C, that give the same advantages as using the likes of Rust while still using C. Automated checking of human work is another valid path.
The “inundation of low level vulnerabilities” framing forgets that vulnerabilities also come from the hardware level underneath, and the software level has to deal with them. This is why it’s hard to embrace languages that solve things like buffer overflows: you might need to perform a buffer overflow to get a bit of hardware to behave itself. Yes, if you go the Rust/C# route of having areas of unsafe code, you still need a means of validating the unsafe parts to make sure they are only as unsafe as they need to be.
The problem is not just legacy code. New defective hardware is still being made, and operating systems, and at times user-space applications, have to deal with the security faults that causes. This is really hard complexity. Yes, we would like a safe programming language, but because of this problem we also need an unsafe programming language at the same time.
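To illustrate the kind of thing I mean (the device, quirk, and register address here are made up purely for illustration), even in Rust the hardware poke ends up inside an unsafe block, and no compiler can tell whether the quirk is required or a bug:

```rust
// Hypothetical quirk: the device only latches a DMA transfer after a
// dummy write to an otherwise-unused register. Nothing in the language
// can verify that claim; it rests entirely on the hardware manual.
use core::ptr;

const DMA_KICK_REG: usize = 0x4000_0010; // made-up MMIO address

pub fn kick_dma_transfer() {
    // SAFETY: assumes DMA_KICK_REG really is a writable device register
    // on this platform. The compiler has to take our word for it.
    unsafe {
        ptr::write_volatile(DMA_KICK_REG as *mut u32, 0);
    }
}
```

So the question is never “safe language: yes or no”, it is how well the unavoidable unsafe islands like this one get audited.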
oiaohm,
There’s no doubt that safer languages would improve things. And I also think it would be possible to encapsulate nearly all “unsafe” code in a zero-cost abstraction library such that there’s no need for drivers to resort to unsafe code themselves. It might take a bit of engineering to get there, but I think it’s achievable.
On the other hand, I think it could actually be easier to create zero-cost abstractions that help address the issues. We may need some new language constructs to get us there, but in principle there’s no reason we can’t have Spectre mitigations and a safe language.
seL4’s proofs are indeed a step in the right direction, although if anything a safe language would make future proofs simpler and more accessible to regular developers. I certainly think proofs can play a big role in future operating systems.
Sure, but defective hardware doesn’t require defective software. Obviously it’s not ideal, but even ugly drivers can still be safe. In the worst case you end up with unsafe quirks, but it’s the exception and not the norm.
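As a sketch of what I mean by encapsulating the unsafe part (the type name and register layout are hypothetical, not from any real driver), the unsafe code can be confined to one small, audited wrapper while the driver logic on top stays entirely in safe code:

```rust
// Sketch of a zero-cost MMIO wrapper: the safety contract is stated
// once at construction time, and every user of the register goes
// through safe methods.
use core::ptr;

pub struct MmioReg {
    addr: *mut u32,
}

impl MmioReg {
    /// SAFETY contract lives here, once: the caller promises `addr`
    /// points at a valid, device-owned 32-bit register.
    pub unsafe fn new(addr: usize) -> Self {
        MmioReg { addr: addr as *mut u32 }
    }

    /// Safe for callers: the only way to obtain an MmioReg is through
    /// the audited unsafe constructor above.
    pub fn write(&self, value: u32) {
        unsafe { ptr::write_volatile(self.addr, value) }
    }

    pub fn read(&self) -> u32 {
        unsafe { ptr::read_volatile(self.addr) }
    }
}
```

The volatile read/write should compile down to the same single load/store C would emit, so there’s no runtime cost to the abstraction; the win is that the quirk and its justification live in one place instead of being scattered across every driver.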
–Sure, but defective hardware doesn’t require defective software. —
I wish this was not the case.
https://github.com/xoreaxeaxeax/sandsifter
With x86, some of the stuff here showed why, when the Linux kernel runs on particular x86 CPUs, there are buffer overflows in the code. It’s horrible: due to ISA defects the buffer overflow never actually happens, but if you only fill the buffer exactly to its end, the end of the buffer goes missing.
There are cases where you have to do an action that would normally be an exploit if everything were functioning as expected.
seL4 validates the silicon as well as the instructions, and its approved platforms are only sane hardware, so its auditing tooling doesn’t have to allow for quirks like the following: “hey, we are doing a buffer overflow here that will overwrite a value, but due to a hardware defect the overwrite is fiction, and it’s required so the hardware functions right.” Linux supports some of the insane hardware. This level of hardware insanity is why low-level development has been so resistant to moving to safer languages: there is a need to perform actions that appear unsafe from the software side but, due to hardware faults, are 100 percent safe and are required to make that hardware behave itself. This is why I said that at times the program will have to do what appears to be an unsafe action, and we want only that one unsafe action, to work around the hardware defect.
This is what makes doing a low-level safe language so hard: the stack of unique actions different hardware needs, which appear unsafe but are safe on that hardware and required so everything works.
Yes, the guy trying to get PCIe graphics cards to work with a Raspberry Pi Compute Module has so far found 8 quirks of the evil class I describe in the x86 PCIe implementation. This also explains some of the mega fun with the Java- and C#-based OSes.
Alfman, I wish low-level hardware was not this badly busted.
oiaohm,
You are going to need to clarify what you mean. I know about undocumented and buggy opcodes, but that doesn’t mean a compiler has to use them in practice.
If some CPUs have intentional or unintentional backdoors, well obviously that’s a problem, but it’s tangential to the question of using safe languages to help developers avoid common faults. So I’m going to ask for a specific example where using a safe language would be worse than using C.
I think there has been a miscommunication somewhere in this thread regarding what safe languages mean. Using a safe language does not (and cannot) imply that the hardware will execute it correctly. It only means that the code cannot generate the kinds of low level software faults that can plague unmanaged languages. Safe languages are not a panacea for faulty hardware though. Perhaps in the future we might be able to compensate for hardware faults too, but the only way this could ever be possible is if the hardware architecture is 100% open and documented.
Alfman, the long-term barrier to safe languages for OS development has been the different hardware faults. Take DMA operations: on some bits of hardware you have to do something stupid like write to a particular unallocated memory block to wake the MMU to do the DMA transfer to the hardware, and no error comes back. Of course, a safe language processing this code raises an error. Heck, even GCC raises an error here about writing to unallocated memory. There are places in the Linux kernel with flags to tell GCC not to check for things, because GCC would display errors that are wrong but the hardware needs the stupidity, or the code gets stuffed into asm that is absolutely insane.
Alfman, remember the idea of using safer languages is not new.
https://en.wikipedia.org/wiki/JX_(operating_system)
This was using Java, and we have seen C# and many other languages. All of them have hit the same problem: how to safely deal with how stupid hardware can get.
–It only means that the code cannot generate the kinds of low level software faults that can plague unmanaged languages. —
This is also not 100 percent true. Remember, a language like Rust has an unsafe mode because there is simply stuff you cannot do managed. So all the faults that can plague an unmanaged language can plague any modern managed language that supports unsafe areas. The nightmare at the low level is that you have to have those unsafe areas to deal with horribly broken hardware.
So even if the Linux kernel changed 100 percent to Rust, not all problems go away, because of the unsafe code that has to be there to deal with horrible hardware. We will still need the tooling to properly check the unsafe areas and ask developers: is the dangerous code in this unsafe area required to be this way, or is it doing something more dangerous than it should?
Alfman, I am not against the idea of migrating to safe languages. The problem here is that we have prior examples and attempts that tell us where the problem areas are. Like it or not, at the low level you cannot write everything you need using a totally managed language. You need unmanaged areas to deal with the hardware. Of course, the areas where you are using unmanaged code are going to be areas where an exploit would be very damaging.
From my point of view, yes, a managed language helps, but it’s only half of what is required. The other half is good tooling for the parts that cannot be managed, to make it simpler for a developer to understand whether the horrible rule-breaking they thought they were doing to make the hardware work is the only thing they have done.
Of course, better tooling to detect faults in the parts that cannot be managed could in theory also be used to reduce the risk of C during the migration process.
oiaohm,
That’s not an example of something where a safe language can’t be used though.
Not necessarily. If a hardware device is mapped to a certain IO and memory range, we can certainly build abstractions that grant access to those resources “safely”, so long as we understand that “safe” in this context means safe from software-generated faults. Obviously the abstraction itself has no idea how the hardware itself will react to any particular IO operation, so the hardware may do something incorrect or unsafe. All a safe language does is protect from software faults; it’s still up to the developer to program the driver correctly even if it doesn’t contain provably incorrect software faults.
–Obviously the abstraction itself has no idea how the hardware itself will react to any particular IO operation, so the hardware may do something incorrect or unsafe.–
This is the limitation of safe languages.
The point is that tooling means you can work out whether something is incorrect or unsafe.
https://lwn.net/Articles/720550/
The Linux kernel memory model is an example of this: it includes a lot of information about how real and modelled hardware will handle memory ordering between threads and with out-of-order execution.
A safe language without that tooling is going to be a problem. Yes, you can get some really stupid CPUs with reordering, so needing to allocate something after it has been accessed, not before, because those operations are going to be flipped, does happen.
If you look at most safe-language compilers, they are not designed to take in platform-misbehaviour information. In an unsafe language this is not a showstopper, because as a developer you can write if (this_horrible_broken_platform) { do_this_horrible_incorrect_action } else { do_sanity }. Of course, in an ideal world we don’t want developers doing this, due to the increased bug risk that comes with more code.
The problem Rust and the other so-called safe languages have to solve is how to take in something like the Linux kernel memory model, which details badly behaved hardware, and use it to generate correct code for hardware that at times may appear insane. Remember, safe languages normally take this level of control away from the developer.
Yes, it’s insane that on some hardware use-after-free is valid and use-before-allocation is valid, and in particular worst cases you do what seems right, like allocate then use, and it errors out.
Most safe languages are designed around a theoretical CPU that is always 100 percent sane. The problem is that in reality we have CPUs and chipsets that by design contain sections of insanity. The hardware level is a true nightmare.
Alfman, remember we now write more operating system code in C and C-like languages than in the historic assembler; this has reduced the lines of code developers need to make things work. But C and C-like languages are flexible enough to deal with horrible hardware.
Yes, I am not against moving to safer languages, but it’s more about how to do it while keeping all the functional tools, like the memory model the Linux kernel has, to tell whether a hardware interaction is right or wrong.
I would have been happier if Rust were still like early C++, outputting C that the existing tools for detecting hardware-insanity issues could run over.
Yes, you can have theoretically perfect C that should not crash or do anything stupid, yet in fact it does crash, or worse, bricks the system, all due to hardware insanity. The good part is that hardware insanity can be captured in formal models. The bad part is that neither an unsafe nor a safe programming language compiler is designed to use these formal models, so the tooling that Linux and other OS projects use on top of the language to check that the code is safe, in the way seL4 does, does not work with Rust, because exact enough information about what Rust will do is hidden.
In an ideal world, again, this tooling should be part of the compiler, not something a programmer can forget to run.
We need safe-language compilers designed to allow for the case where the hardware is insane, with a method to tell the compiler how the hardware is insane, so correct code can be generated that allows for the platform’s insanity.
oiaohm,
Safe languages can perform safe multithreading too though. This is actually an area where safe languages hold a huge advantage over C because they can verify the implementation is free of common multi-threading bugs. Maybe you could make the case that we need to implement more MT primitives than current safe languages offer, but it isn’t a fundamental limitation of safe languages in general.
To whatever extent CPUs exist that violate the ISA specs, they will also break “correct” C and other programs as well. I don’t accept the inference that this is a byproduct of using safe languages.
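For what it’s worth, here is a small, generic example (standard-library threads, nothing kernel-specific) of what compile-time data-race checking buys you: the shared counter can only be touched through the Mutex, and removing the lock or the Arc turns the program into a compile error rather than an intermittent runtime bug.

```rust
// Illustrative only: the compiler enforces that the counter is shared
// safely (Arc) and mutated only under exclusive access (Mutex).
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    // Exclusive access enforced by the type system.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4_000);
}
```

None of this models weak memory ordering on exotic hardware, which is oiaohm’s point below, but it does eliminate the garden-variety data races that make up a lot of real kernel CVEs.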
–Safe languages can perform safe multithreading too though. —
This is not 100 percent true. The Linux kernel memory model contains a lot of examples where Rust’s so-called safe multithreading is unsafe; the same goes for C# and Java.
–To whatever extent CPUs exist that violate the ISA specs, they will also break “correct” C and other programs as well.–
It depends what the ISA violation is. The most common is memory-operation stuff. Correct-to-standard C can handle all forms of invalid memory operations. Linux kernel sparse, with its locking annotations on memory, can handle more issues than the safe languages do. One of the things I brought to the attention of those working on Rust Linux kernel drivers is that they need to bring sparse support, at least, to Rust. There is more tooling that needs to be brought in.
The safe language is not enough.
oiaohm,
I would argue that the Linux kernel is a bad example because it was not built on safe-language primitives. If it were, it could take advantage of the language’s safety guarantees.
I disagree. Like I said earlier, it might be handy to give safe languages more primitives for OS work, but there’s no reason a safe language cannot be just as reliable as C. There’s no voodoo here. A safe language is designed to make common human errors a compile-time error, but the run-time code a safe language compiler produces isn’t fundamentally different from the run-time code a C compiler produces.
This would constitute at least a significant research breach on the grounds that the researchers would have been required to obtain human research ethics approval to experiment on the community.
This is somewhat similar to the experiment a few years ago where the authors deliberately sent bullshit articles to journals.
That’s a good parallel, though the results were different. The Linux kernel developers did what the “journals” would not and actually performed due diligence and peer review on the bullshit commits. The “journals” claimed they peer-reviewed those bullshit articles, and had to be informed by the submitters that they were garbage. I prefer the approach the Linux kernel devs took, I must say.
According to the paper they did seek ethics approval and were granted an exemption: “The IRB of University of Minnesota reviewed the procedures of the experiment and determined that this is not human research. We obtained a formal IRB-exempt letter.” (“§VI.A Ethical Considerations”)
Ethical judgements can be notoriously difficult and some university review boards will consider software experiments that don’t involve human subjects or the collection of personal data to fall outside their scope. No doubt many companies have even less rigorous controls.