Over the last few months I have been on and off digging into the history of early PC networking products, especially Ethernet-based ones. In that context, it is impossible to miss the classic NE2000 adapter with all its offshoots and clones. Especially in the Linux community, the NE2000 seems to have had a rather bad reputation that was in part understandable but in part based on claims that simply make no sense upon closer examination.
A deep dive into this very popular and widespread NE2000 adapter.
No. Next question.
Really, the NE2000 was not *great*, but it worked well, and it was our card of choice for many years, from middle school through high school and university.
I even programmed the thing. It has quirks, and not everything went according to the National Semiconductor DP8390 design docs. However, once you see that the Linux kernel, the BSD kernels, and many other implementations all do the same *seemingly brain-dead thing*, and that nothing but that *brain-dead thing* actually works, you appreciate the patience of those developers. (The hard part was initialization: initially the card behaves like an NE1000, which is an 8-bit card, and reading the station address ROM is broken at that point.)
Anyways, for the DMA: it has an on-card buffer. Yes, there is no direct copying to system memory, but you don’t need to *poll* either. When an interrupt comes, you just extract the frame from the on-card ring buffer and let the card continue. If I recall correctly, it could store about eight Ethernet frames before it had to discard new ones.
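For a rough idea of what “extract a buffer” means in practice, here is a hedged sketch of the remote-DMA read an NE2000-style driver performs over port I/O. The register offsets follow the DP8390 datasheet; the I/O base, helper wrappers and user-space style are illustrative assumptions, not any particular driver’s code:

/* Hedged sketch: pull one received frame out of the NE2000's on-card ring
 * buffer using the DP8390 "remote DMA" port-I/O mechanism. Needs ioperm()
 * or iopl() if run from user space on Linux x86. */
#include <stdint.h>
#include <sys/io.h>               /* inb/outb/insw (glibc, x86) */

#define NE_CMD      0x00          /* DP8390 command register */
#define NE_RSAR0    0x08          /* remote start address, low byte */
#define NE_RSAR1    0x09          /* remote start address, high byte */
#define NE_RBCR0    0x0a          /* remote byte count, low byte */
#define NE_RBCR1    0x0b          /* remote byte count, high byte */
#define NE_DATAPORT 0x10          /* 16-bit data window into card RAM */
#define E8390_START 0x02
#define E8390_RREAD 0x08          /* "remote read" command bit */

struct ring_hdr {                 /* 4-byte header the NIC stores in front */
    uint8_t  status;              /* of every received frame in the ring:  */
    uint8_t  next_page;           /* read this first to learn the length   */
    uint16_t count;               /* and where the next frame starts       */
};

/* Copy 'len' bytes starting at card ring address 'ring_addr' into 'buf'. */
static void ne2000_remote_read(uint16_t base, uint16_t ring_addr,
                               void *buf, uint16_t len)
{
    len = (len + 1) & ~1;                     /* 16-bit transfers only */
    outb(len & 0xff,       base + NE_RBCR0);
    outb(len >> 8,         base + NE_RBCR1);
    outb(ring_addr & 0xff, base + NE_RSAR0);
    outb(ring_addr >> 8,   base + NE_RSAR1);
    outb(E8390_RREAD | E8390_START, base + NE_CMD);
    insw(base + NE_DATAPORT, buf, len / 2);   /* the CPU copies every word */
}

The point of the sketch is the last two lines: the card never touches system memory itself, the host pulls the data out through one 16-bit port, which is why port I/O eventually becomes the bottleneck mentioned below.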
sukru,
The ne2k was my card of choice too because it was a de facto standard of sorts and it just worked on windows and DOS. I wasn’t using linux at the time though.
I never looked at the driver myself, but the article’s author suggests the linux driver did not follow the spec and has a couple of bugs as a result…
Port IO was fine for slower network speeds at the time, and simplicity was a selling point, but it would eventually become a bottleneck. Memory mapping was fairly impractical back then, before plug and play, given the very limited address space.
In hindsight I find it unfortunate that ethernet standardized on ~1500 byte packet sizes. Of course it was easier to assume a fixed packet size than have an ethernet level standard to negotiate it. Our networks have continued to evolve along the lines of more optimal hardware with much higher bit rates, but these 1500 byte packets continue to hold us back many decades later with jumbo packets largely missing from the internet.
IMHO 32bit IPv4 addressing was the other big mistake. No engineers at the time appreciated just how successful and long lasting their creations would become, haha.
Jumbo frames to the rescue 😉
https://en.wikipedia.org/wiki/Jumbo_frame
> “jumbo frames are Ethernet frames with more than 1500 bytes of payload, the limit set by the IEEE 802.3 standard. Commonly, jumbo frames can carry up to 9000 bytes of payload, but smaller and larger variations exist and some care must be taken using the term. Many Gigabit Ethernet switches and Gigabit Ethernet network interface controllers can support jumbo frames.”
evert,
As stated, jumbo packets are usually unavailable across the internet.
https://serverfault.com/questions/118203/are-jumbo-frames-mtu-up-to-9k-realistic-common-over-the-open-internet
The performance benefits are real, but alas many WAN segments don’t support it at all. Initial connections tend to work fine but bulk transfers fail and time out. Jumbo packets and MTU size detection should have been standardized decades ago, and yet a lot of infrastructure doesn’t support them even today. It sucks that jumbo packets are neglected, because much of this equipment has to be periodically replaced anyway, so IMHO it’s kind of inexcusable that jumbo frames aren’t better supported.
Oh well, I wish it were different but it is what it is.
Thanks, good points, indeed a pity that jumbo frames aren’t better supported.
Yes, the frame sizes no longer make sense, but there is a history to all this.
The original Ethernet did not have dedicated connections between a switch and each device. Devices shared a single bus over coax cables (with the dreaded terminators, which would occasionally take down the entire network).
It went on to replace token ring, which had worse problems.
Anyways, it worked by actively listening for collisions on the shared bus, and doing a random exponential backoff with automated retries when multiple devices tried to send packets at the same time.
https://en.wikipedia.org/wiki/Carrier-sense_multiple_access_with_collision_detection
That requires a minimum packet size (64 bytes), so that computers at opposite ends of the longest supported bus can still detect a collision.
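As a rough back-of-the-envelope check (the figures are the standard 10 Mbit/s ones, the rounding is mine): 64 bytes is 512 bits, which takes 512 / 10,000,000 s = 51.2 µs to put on the wire. The worst-case round trip across the largest allowed network (roughly 2.5 km of coax plus repeaters) was budgeted to fit inside those same 512 bit times, so a collision that starts at the far end always reaches the sender while it is still transmitting and can therefore be detected.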
It also meant there needed to be an upper limit. But that one was more a matter of practicality than of physics.
And 40 years later, we are still stuck with it. The up side? A very old Ethernet based device will still hook up to a modern network and will work fine.
sukru,
Yes, I concur about the history. Although “ethernet” isn’t the only game in town; there are other standards that use different packet sizes, and the routers between such networks are expected to fragment packets.
https://www.cisco.com/c/en/us/support/docs/asynchronous-transfer-mode-atm/ip-over-atm/10479-mtu-atm.html
In retrospect fragmented packets are quite problematic though, and it would have been better to have endpoints auto-detect appropriate MTU sizes. To do this effectively and efficiently (i.e. minimize extraneous packets and eliminate trial and error) requires routers to play a role though; IMHO this should have been incorporated into our early networking standards.
Well, yes, but making routers capable of jumbo packets doesn’t fundamentally break older devices. Packets smaller than the path MTU are not an issue; it’s only the inverse scenario that needs to be solved.
Incidentally these MTU path issues can be frustrating in contexts that have nothing to do with jumbo frame sizes, such as VPN encapsulation where the VPN’s overhead ends up reducing the bytes available for normal traffic.
MTU size detection (PMTUD) is standardised, but on IPv4 it is optional and might not be supported, so you end up with brokenness or a fallback to fragmentation. On IPv6 it’s part of the base spec, so it should always work.
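For the curious, a minimal sketch of what that looks like from an application on Linux; IP_MTU_DISCOVER and IP_MTU are Linux-specific socket options, and the destination address/port here are just placeholders:

/* Hedged sketch: ask the Linux IPv4 stack to do path MTU discovery for a
 * socket, then read back the path MTU it has learned so far. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    int pmtu = IP_PMTUDISC_DO;            /* always set DF, never fragment locally */
    setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &pmtu, sizeof pmtu);

    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(9) };
    inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);    /* documentation address */
    connect(fd, (struct sockaddr *)&dst, sizeof dst);  /* pin a route */

    /* Once traffic has flowed (and any ICMP "fragmentation needed" has come
     * back), the kernel caches the result and exposes it per socket: */
    int mtu = 0;
    socklen_t len = sizeof mtu;
    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == 0)
        printf("current path MTU toward 192.0.2.1: %d bytes\n", mtu);

    close(fd);
    return 0;
}

If a broken firewall eats the ICMP errors, that getsockopt simply keeps reporting the first-hop MTU and big packets blackhole, which is exactly the brokenness described above.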
Alfman,
I would not be surprised if there were bugs in Linux or other implementations. In my experience, most of the NE2000 cards were actually clones; not many original branded ones were out there. And all of them had little quirks here and there, so making them behave exactly the same was a bit of an art.
But they really made networking accessible. Even today, I cannot find a simple “ne2000”-style card for 10GbE or faster.
About 1,500 byte Ethernet packets.
The local Ethernet cannot handle jumbo frames automatically. They have to be enabled on the switch and the MTU set on each connected device. If it is a static IP assignment the MTU has to be set statically too; otherwise DHCP can do it.
Nothing on the Internet supports jumbo frames as far as I know. Some experimental university fiber networks do, I think. But TCP will automatically adjust, so there’s no need to fragment 9,000 byte packets. If your ICMP is working it will back down the TCP MSS to 1,460 or whatever fits.
Zan Lynx,
That’s kind of the point. Nearly all modern internet systems are stuck with the 1500-byte packets that we inherited from the past.
Yes, I remember the “internet2” network established by universities used jumbo frames two decades ago or so.
https://en.wikipedia.org/wiki/Internet2
Coincidentally evert’s jumbo frame link mentions that the 9000 byte jumbo frame size originated in part from internet2…
Of course most of us can’t get on this network using our commercial internet providers.
That’s optimistic, haha. The TCP stack can be oblivious to what happens on the network. For starters ethernet switches don’t autonegotiate frame size. Next routers cannot negotiate at the TCP layer, so TCP in and of itself is insufficient.
If you are running something like a VPN on your local machine, the VPN should set the interface’s MTU and the TCP stack will use it. However if you route network traffic over a VPN, it will often get fragmented.
Path MTU detection is one of those things that should ideally work everywhere, but it has not been particularly reliable in the past. I haven’t looked at this in a couple of years and I’m not sure if real world networks have improved a whole lot.
At least this is supposed to be enabled on all IPv6 routers. With IPv6, packets are never fragmented by a router, therefore it is crucial that the sender knows the path MTU. IPv6 requires routers to support a “packet too big” message, which should give endpoints enough to detect the MTU automatically, but some firewalls/routers block ICMP, rendering this mechanism useless. And with many people tunneling IPv6, this means MTU problems are more common with IPv6, and for this reason some admins intentionally deploy IPv6 with a suboptimal MTU like 1280 bytes instead of using the interface’s actual frame size.
http://www.netscout.com/blog/asert/ipv6-fragmentation
packetpushers.net/ipv6-and-the-importance-of-the-icmpv6-packet-too-big-message/
http://www.networkworld.com/article/2224654/mtu-size-issues.html
blog.cloudflare.com/path-mtu-discovery-in-practice/
IMHO it would make a ton of sense to deploy jumbo frames simultaneously with IPv6 deployment to kill two birds with one stone, but sadly my ISP offers neither 🙁
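For anyone curious how an endpoint actually hears those “packet too big” messages, here is a hedged Linux-specific sketch using the socket error queue (this is roughly what tracepath does; the destination address and sizes are placeholders):

/* Hedged sketch: enable IPV6_RECVERR, send something oversized, and read the
 * ICMPv6 "Packet Too Big" report back off the socket's error queue. */
#include <arpa/inet.h>
#include <linux/errqueue.h>       /* struct sock_extended_err, SO_EE_ORIGIN_ICMP6 */
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

static void check_errqueue(int fd)
{
    char cbuf[512];
    struct msghdr msg = { .msg_control = cbuf, .msg_controllen = sizeof cbuf };

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) < 0)
        return;                                   /* nothing reported (yet) */

    for (struct cmsghdr *cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level == IPPROTO_IPV6 && cm->cmsg_type == IPV6_RECVERR) {
            struct sock_extended_err *ee = (struct sock_extended_err *)CMSG_DATA(cm);
            if (ee->ee_origin == SO_EE_ORIGIN_ICMP6 && ee->ee_type == 2 /* Packet Too Big */)
                printf("router reports path MTU of %u bytes\n", ee->ee_info);
        }
    }
}

int main(void)
{
    int fd = socket(AF_INET6, SOCK_DGRAM, 0);
    int on = 1, probe = IPV6_PMTUDISC_DO;         /* never fragment locally */
    setsockopt(fd, IPPROTO_IPV6, IPV6_RECVERR, &on, sizeof on);
    setsockopt(fd, IPPROTO_IPV6, IPV6_MTU_DISCOVER, &probe, sizeof probe);

    struct sockaddr_in6 dst = { .sin6_family = AF_INET6, .sin6_port = htons(9) };
    inet_pton(AF_INET6, "2001:db8::1", &dst.sin6_addr);    /* documentation prefix */
    connect(fd, (struct sockaddr *)&dst, sizeof dst);

    char big[3000] = {0};            /* larger than a typical 1500-byte path */
    send(fd, big, sizeof big, 0);    /* may fail locally with EMSGSIZE, or a
                                        router on the path replies Packet Too Big */
    check_errqueue(fd);

    close(fd);
    return 0;
}

When a firewall drops that ICMPv6 message, the error queue stays empty and the oversized traffic just disappears, which is why some admins fall back to a blanket 1280-byte MTU.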
Path MTU discovery was added later and is optional for IPv4, so it can’t be guaranteed to work; similarly, many DHCP clients will ignore the MTU option.
IPv6 is a different story: SLAAC does support announcing the network MTU, and PMTUD is part of the basic IPv6 spec, so it should be supported by any device that supports IPv6.
Jumbo frames are something that each ISP would need to enable on all their routers, at which point it would gradually spread out. But unless there’s demand, they won’t bother, despite the fact that it would improve performance at no cost.
bert64,
Not only that, but even in my own network not all devices support the same MTU size. Some only support 4k, others a full 9k, and others still only 1500. Also the VPN has a slightly smaller MTU to account for its own header, which ends up fragmenting packets.
For these reasons I don’t consider DHCP MTU advertisement really all that useful compared to actual path MTU discovery. Ideally it should be something that just works automatically without having to configure anything at the router/clients/etc, which is why I wish they had done this decades ago so it could just work today.
We agree on that, but it still poses a bit of a causality challenge. The would-be demand is hindered by the lack of availability. Once it reached critical mass, the remaining ISPs would be pressured into supporting it. But as long as deployment sits at 0%, ISPs can safely ignore it without repercussions to their business.
It was pretty much the first card that was compatible across (most) platforms and for that reason alone it was fantastic.
This article verifies my hunch that when it comes to community-written drivers for Linux, they can vary from excellent to utter crap.
I prefer the Mac and Windows model of device support, where something is either supported officially by the hardware manufacturer, or not supported at all. That way, if as a customer you stick to competent companies (for example Pinnacle for TV tuners, Intel or Atheros for Wifi etc) you know what you are getting. I never understood why Linux has to compensate for the laziness of hardware vendors. Apple Mac OS has a much narrower hardware support list than Linux but offers a 10x better user experience precisely because it refuses to do that.
kurkosdr,
To be fair a lot of the code base is created from reverse engineering black box hardware without specs. Still, it’s a valid point.
Yes, ideally the hardware companies would provide the drivers, but there’s a multifaceted dilemma here. The FOSS community understandably doesn’t want to run proprietary binary blobs in the kernel. The hardware companies are often unwilling to support alternative operating systems, and when they do, they may not be willing to support the drivers long term as the community expects. (I’ve had to physically throw away working hardware on account of a manufacturer not supporting a new version of windows.) And the other issue is that our predominant FOSS operating system lacks a stable API.
…in short, nobody’s on the same page.
Linux on the desktop has never had apple’s level of market share, and most vendors are chasing market share. It’s true that it is much easier to support a limited/fixed set of hardware as apple does. For users this can be both a benefit and a drawback, for example becoming more vendor locked. Users who buy from linux hardware shops should expect good support from the manufacturer, but for better or worse the vast majority of linux users are installing linux on “windows” hardware. So most of us are not running linux on systems that have full & explicit linux support out of the box, which explains a lot of the frustrations users have.
Which is the major design flaw in Linux (the kernel) to be honest: Why should an OS kernel assimilate every driver in existence? Make a stable ABI (driver API) and let third-party drivers be written for it. Does the Desktop Linux stack try to assimilate every desktop app in exist… oh never mind, that’s what the whole deal with “repositories” is about. But this is a rant for another day.
The Windows NT kernel has a driver API that is kept stable between major OS versions. Every driver from Windows Vista (Windows NT 6.0) onwards works in Windows 10, save for GPU drivers, which also depend on WDDM (which has no promises of stability). There was some pain during the XP 32-bit -> Vista 64-bit transition, but that was 14 years ago. Sure, a stable driver API constrains design decisions. But if the alternative is having the kernel assimilate every driver ever made, it’s a good compromise. Linux is a “dirty” kernel full of old crusty code for random drivers nobody really reviews (good luck finding a kernel contributor with an NE2000 out there). The Windows NT kernel doesn’t have this issue.
Linux laptops are a scam. Some of them rely on community-written drivers for things like Bluetooth and keyboard buttons. Which is the problem here: Linux is letting hardware manufacturers be lazy, to the benefit of laptop OEMs, who can claim “Linux compatibility” just because a crappy community-written driver exists. Mac OS X is smarter than this: the hardware manufacturer has to support their product officially. This means that once a hardware manufacturer does release an official driver, that niche is “theirs”. This is how Mac OS X (and Mac OS 9) had any kind of hardware support in the dark years of PowerPC.
The technique is apparently not working, since Linux has stayed at 1.5% market share forever. People buy experiences, not OSes. 1 person consciously buying a Linux laptop with officially supported hardware (no community-written drivers) is better than 1000 people installing a Linux distro on their Windows systems, having a bad experience, and then deleting the Linux partition. Because that 1 person will praise the OS to his friends and create a niche. Again, Mac OS X.
This is a major design flaw of Linux (the kernel): Why does an OS kernel have to assimilate every driver in existence? Make an ABI (driver API) and let third-party drivers be written for it. Does the Desktop Linux stack try to assimilate every desktop app in exist… oh never mind, that’s what the whole deal with “repositories” is, which is another misdesign, and another rant for another day.
Keep the driver API stable, and you can use the existing drivers for a long time. No “support” from the hardware manufacturer needed. Windows NT maintains stability between major versions, which means that any driver from Windows Vista era (Windows NT 6.0) and onwards will work on Windows 10. There was some pain during the transition from XP 32-bit -> Vista 64-bit, but that was 14 years ago.
Sure, a stable driver API does restrict design freedom, but if the alternative is having community-written drivers and crapping up your kernel with old buggy code for ISA cards, stable driver APIs are a good decision overall.
Mac OS went through some laughable market share during the mid-90s (just before the first iMac), but it still had available hardware peripherals. Because some hardware manufacturers realised that once they had a Mac OS driver out, that niche was “theirs”. Linux on the other hand minimises that reward by giving every hardware manufacturer a community-written driver. That’s the kind of high-level strategy that geeks like Linus Torvalds won’t understand.
Linux laptops are a scam. They also have community-written drivers for this and that. Which is the problem with community-written drivers: It allows hardware manufacturers to be lazy, and it allows laptop OEMs to claim “linux support” even if that support is by an unofficial driver (which can be utter crap for all we know). This is also what the Linux folks never understood: 1 user having a good experience is better than 1000 users trying the OS on a repurposed Windows laptop and coming to regret it. Because that 1 user will go to his friends and say how good the experience was, thus creating a niche. This is how Mac OS X survived the pre-Intel era.
kurkosdr,
I agree linux isn’t doing itself any favors with an unstable ABI, however many of us believe keeping drivers open source is a feature and not a design flaw. Quite the contrary, being dependent on manufacturers for binary blobs has flaws (even in windows, as stated). We don’t even need to guess what this would look like for linux, android shows us that model and boy does the proprietary driver situation suck! And that’s for a dominant platform! I honestly don’t think it would be wise or even viable for less popular desktop platforms to put their fate in the hands of manufacturers. This situation where FOSS developers end up building their own drivers without manufacturer support is not ideal obviously, but there’s not really much choice. Personally I am glad that mainline linux is not dependent on manufacturers!
I’ve seen drivers stop working until I obtain new drivers from the manufacturer and/or microsoft’s windows update.
Even so, linux desktop market share is smaller still. There are more linux vendors than in the past, but the market is niche. Most people will prefer to buy a windows PC and throw linux on it to save money, but the explicit trade-off that needs to be acknowledged is that this option comes with no official support. I wish linux hardware was cheaper, but economies of scale….
I don’t think that’s a fair statement to make about linux vendors. They do a lot of work to make their systems work with mainstream distros.
It depends. Linux has always been more of a hacker OS anyways, so it’s not necessarily competing in the same space as windows and mac.
The problem is, computer users generally disagree. They want working drivers, and don’t care about ideology.
darknexus,
I agree that plenty of people don’t care about ideology and are indifferent to open source. But it doesn’t necessarily mean they aren’t affected by the problems stemming from manufacturers holding us captive with proprietary drivers. I tend to think that linux users on average are more aware of these things, which is partly why we choose linux. Some people may want linux to change and demand linux do this or that, but if it doesn’t align with the interests of the existing community then I don’t really see the community having the motivation to make those changes. Ultimately it’s probably best for people to choose a platform that fits their preferences already, assuming one exists. Of course the real problem is that the tech industry is so consolidated that it can leave us with few meaningful choices.
It’s a design decision, having a stable ABI brings its own problems.
You lose flexibility, because you have to retain compatibility.
At some point you simply can’t retain compatibility anymore, so you force a sudden jolt on people and obsolete lots of hardware overnight causing all kinds of pain. Even MS have been forced to change the ABI and break drivers.
By having closed source binary blob drivers, you can’t easily support new architectures – and supporting new architectures is something Linux has done much better than other systems. If there’s an open source driver for x86 linux, there’s a 99% chance that hardware will work when attached to an ARM system, or SPARC, or RISC-V etc.
MS had a lot of pain moving from x86 to amd64, and their transition to 64bit went a LOT slower than linux or macos did. They are having the same again now that they’ve started to support ARM.
Those old crufty ISA drivers aren’t any concern if you aren’t using them; you simply never load them, so they don’t cause any detriment. On the other hand, if you happen to have one of those ISA cards and a way to physically connect it, there’s no reason you can’t use it, which would be entirely impossible on other systems.
Having open source drivers is a significant benefit, and it’s considered a more important benefit than a stable ABI.
bert64,
I agree that having the source is very important. Being dependent on others for binaries spells trouble. We’re seeing these troubles play out on android and with many linux ARM SBCs. I for one think having driver source code is crucial for the success of FOSS platforms.
I would say that source code is not mutually exclusive with a stable ABI though. It’s not one versus the other – we could have the benefits of both.
The genuine NE2000 was not that bad. The major problem with the NE2000 was the attack of the clones. I cannot remember the brand (they are in a storage box somewhere), but I have three PCI cards branded “NE2000 compatible” where TCP/IP between the cards does not work unless you also set up NetBEUI between them. Replace those cards with a different brand of NE2000 card and everything was good again.
kurkosdr and Alfman, please note I was running into this trouble with NetBEUI and other things with NE2000 clone cards when using Windows. So it’s not that the Linux drivers are exactly bad; the NE2000 clones themselves were not great.
–Again there are highly dubious claims such as that the NE1000/NE2000 had “no method for selecting a transceiver”, which is only true if the jumper block on the card (the standard method at the time) does not count.–
Please note the title of the section in the quoted document: “What’s wrong with NE2000 clones”, not what is wrong with the NE1000/NE2000.
You understand this claim if you handled the clones at the time. It was quite common for the “Taiwanese cloners produced knockoffs” to make cost-cut designs. A jumper block costs money; removing it completely from a design saves PCB space and parts cost. So the claim is not 100 percent at fault: with an ISA NE1000/NE2000 clone card you could not be sure it had a method of selecting the transceiver, because it cost more money to have that functionality.
The reality is you cannot pick up NE2000 clones and assume they will all behave the same, have the same feature set, or even connect to each other if they are different brands and models. The ultra fun one among the NE2000 clone cards I had was a revision 1 of a card that would not connect to a revision 2 of the same card, by crossover cable or by hub (yes, a network hub, from before switches became common).
There were quite a few certified “NetWare-compatible” cards, as in NE2000 cards genuinely certified by Novell (not the ones made by Novell); these were fully functional and well behaved. But there were in fact more companies making uncertified cards that were missing features or did purely wacky, insane things.
The Novell Eagle ISA series are electrically defective, every single one of them; it’s either bad resistors or bad capacitors. They work well after spending half an hour on each replacing whatever defective parts were on that model, and that was from new. Of course, without doing this you would get intermittent glitching.
The most extreme was requiring a NetBEUI setup in order to transfer TCP/IP, and that came from a vendor-added, so-called network optimiser. So not only did the clones at times not include features that were part of the NE1000/NE2000 reference designs, they also added some really strange features.
The NE2000 collection of cards is quite a messy place. The Linux driver not doing things by the book is partly due to Linux having a bigger market share than FreeBSD, so the driver developer there was exposed to more cards and more breakage.
The NE2000 Linux driver was started before the idea of doing hardware-specific quirks in the Linux kernel. The logic of making one driver cover every make of NE2000 without quirks is, I would say, a mistake. There is a clone of the NE2000 where the wrong ordering in the Linux kernel NE2000 driver is in fact required so you can read the packets out of the card. This should be a quirk. This is also why particular NE2000 clone cards would work with Windows and Linux (with errors sometimes) but not at all with FreeBSD or NetBSD.
The big thing here is how much diversity and how many quirks there are when you get all the different vendors’ NE2000 cards together. Of course, which vendors’ NE2000 cards a person had access to completely alters their opinion of them, anywhere from the best card ever to the worst nightmare on earth. A person like me saw them as a total mixed bag, some good and some bad, because we got a true mixed bag.
There is an easy fix for this:
“We only support fully 100%-compatible NE2000 cards.”
(basically cards that from the perspective of the electrons appear identical to an official NE2000)
Blacklists work great to avoid the other semi-compatible cards.
This is what PC game developers in the 80’s and early 90’s did when they stopped debugging stuff that broke on semi-compatible PC clones. It’s either reasonably PC-compatible or our game will not work on your computer, kiddo.
But this goes contrary to the Linux spirit that it must support every piece of hardware in existence, even highly customized/semi-compatible hardware. You see, nerds (in the absence of a real manager) love to hack around hardware errors and compensate for the mistakes of others, just for the sense of achievement they get when doing so. The fact that this practice whitewashes the mistakes of those lazy hardware manufacturers at the expense of driver quality (and kernel quality, since the Linux kernel assimilates drivers into the kernel), and that the bad user experience rebounds on them, doesn’t bother them. Just because DOS supported something with a custom driver, it doesn’t mean Linux should. Linux is more like MacOS X in fact: it tries to assimilate every driver into the kernel. So, if it had true management, it would adopt a MacOS X-like philosophy of “we only support compatible hardware”.
A similar problem exists today with all those mostly-compatible-but-actually-not Intel High Definition Audio implementations, so this gripe isn’t just of historic/academic interest. Linux needs to make a judgement call: either relinquish the drivers to the hardware manufacturers (and let them introduce workarounds for any defects their hardware has to their heart’s content) like Windows does, or adopt a “100% compatible or GTFO” mentality like Mac OS X does.
Or more accurately, Linux would have to make that judgement call if it had real managers running the place.
You do realize there are many manufacturers that write the Linux kernel drivers these days, right? Or many of them sponsor individuals to maintain the driver for them. It isn’t like the old days anymore where all the drivers are hacked together and put in. Especially network card support. Sure there are still plenty that are reverse engineered. But so many manufacturers now directly support it. Hell, even Microsoft contributes code to the kernel.
kurkosdr,
It’s less about “managers” and more about the FOSS community and even vendors themselves contributing drivers for upstream inclusion. Both mainline linux and the distros decide whether or not to merge the drivers and most users appreciate when they do.
I too have gripes about the ABI and the monolithic nature of the kernel. This, in combination with proprietary blobs, is why the android driver situation is such a mess.
In terms of unsupported hardware not making the cut for you, I understand that completely. However practically all of your complaints seem to ignore the fact that you could choose to buy explicitly supported linux hardware just like you can with windows or mac and this has gotten a lot easier over the years (so long as you’re willing to open your wallet). Sure, there’s plenty of generic and/or unsupported hardware that may or may not work, but here’s the thing: linux managers are not forcing you to take the unsupported route, that’s your choice. Given everything you’ve said, it seems buying only officially supported linux hardware is the clear answer for you. I’m guessing you probably don’t do that, but if not then isn’t that a bit hypocritical?
While I do sympathize with user frustration resulting from unsupported hardware, it needs some perspective. If a mac user buys a peripheral that has no official mac support and then has trouble getting it to work on his mac, who does he blame? Apple? The manufacturer? Or does he share some of the blame for purchasing unsupported hardware in the first place?
The thing is, if something is unsupported on Mac, it just plain doesn’t work, it doesn’t lead to 1 month of frustration and troubleshooting only to figure out later it’s unsupported. Screw “choice”, make support explicit and non-support also explicit.
But I agree this is a philosophical question, and in fact we could go on for ages. My only objective point is that MacOS X has proven that explicit support and explicit non-support leads to a better experience for the novice user and hence leads to more successful OSes in the market (when it comes to small-market share OSes at least). But again, this assumes that you agree that market approval is the best measure of success.
kurkosdr,
If MacOS is better for you, then just go with that.
I still maintain the onus is on buyers to either buy hardware with explicit support, or be prepared to troubleshoot themselves. Will a generic network card, printer, webcam, etc work with a platform that’s not explicitly supported? It *might*, I’ve experienced generic hardware working flawlessly as well as it being unusable. I’ve used a specialized SDR device on a mac successfully without officially being supported. These things can be hit and miss. If you don’t want to take the chance, it’s best to look for manufacturers that explicitly support your platform. Surely we should be in agreement about this.
That’s kind of subjective. I prefer a platform that best fits my needs over one with more market share. Of course this gets complicated owing to the fact that market share is a factor in achieving critical mass. For example sometimes I think BSD’s have a better implementation, but I use linux simply because it’s better supported. This is pragmatic, but ironically when everyone does this, it leads to the marginalization of alternatives, which end up being less supported because they don’t have critical mass.
–“We only support fully 100%-compatible NE2000 cards.”
(basically cards that from the perspective of the electrons appear identical to an official NE2000)–
This is not an easy fix. With the NE2000, the ones paying for the Linux kernel driver developers were in fact the ones making the defective cards.
https://github.com/torvalds/linux/blob/master/drivers/hid/hid-sony.c#L972
Even Sony has limitations. Note that the maintainer of hid-sony is directly paid by Sony. The reality here is that if you don’t want to end up in an anti-trust situation, you have to support a lot of quirky hardware.
If you go through all the quirks in hid-sony.c you will find that some of the quirks are in fact for officially made Sony parts. That makes a fun problem for driver developers who in the past did not have the official Sony specification sheet that says doing X is wrong, even though a Sony reference part does X, because that is really a defect/quirk.
Even with the NE2000 you have these problems; note that I mentioned the Novell Eagle ISA having major electrical problems, and that was basically the reference board for the NE2000. This leads to a horrible situation: you design and make an NE2000 clone, you compare its behavior to the reference NE2000 board you have, and it matches. One problem: your reference board is quirked due to poor quality parts.
If you look at the Linux kernel developers who worked on the NE2000 driver, the majority are from the companies that made the clones. And the worst sections of the NE2000 driver predate the Linux kernel’s quirks policy.
–Either relinquish the drivers to the hardware manufacturers (and let them introduce workarounds for any defects their hardware has to their heart’s content)–
With the Linux kernel, hardware manufacturers are allowed to introduce workarounds for any defect in their hardware, as long as they provide a kernel developer and work open source.
There is a downside to what Windows does. This is a real case I ran into: I had three 3Com 100Mbps network cards of different versions (note: same model, different versions). Under Windows 2000 they required three different drivers whose files overlapped with each other, so you could not build a bridged network with them. Yet under Linux of the same timeframe you could. The interesting point here is that the Linux kernel driver at that time was written by 3Com, and so were the three Windows drivers. Sometimes you don’t want to let the hardware vendor do whatever they like, because it just results in them fragmenting drivers in incompatible ways, since that is easier than fixing the problem properly. You can find repeated examples of vendor-written drivers in the Linux kernel where a single driver covers multiple versions of hardware, while Windows has multiple incompatible drivers covering the same group of hardware, with an increased risk of security faults and “X device does not work with Y device” issues that don’t exist with Linux.
The reality is Linux development of drivers is halfway between windows and macos. There are benefits and disadvantages to all 3 options.
People who say that the Linux kernel should have a stable kernel driver ABI fail to notice that this brings its own set of major failures. Like 32-bit desktop Windows being restricted to 4GB of memory: the reason was a real one, namely that binary drivers for Windows most commonly did not support PAE mode.
Each route has its own problems. I am not saying that the Linux kernel policy of mainline drivers is always the best, but the Apple policy of “we only support a selected list of hardware” is not ideal for consumers because it leads to a lack of competition, and Windows, where you let hardware developers do what they like, leads to another set of problems that can be as confusing as hell.
oiaohm,
I haven’t tried to do this in windows lately since I don’t really use windows to build networks anymore. But if I recall correctly in windows NIC bonding/teaming used to be implemented only at the driver level and could not work across different drivers. In linux it’s always been a feature of the kernel rather than the network driver such that you can link together completely different interfaces.
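To make that concrete, here is a hedged sketch of the kernel’s sysfs bonding interface (it assumes the bonding module is loaded, root privileges, and that the slave interfaces exist and are down; “eth0”/“eth1” are placeholders):

/* Hedged sketch: bonding lives in the kernel's networking core, not in any
 * NIC driver, so enslaving two unrelated interfaces is just sysfs writes. */
#include <stdio.h>

static void write_str(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) { perror(path); return; }
    fputs(val, f);
    fclose(f);
}

int main(void)
{
    write_str("/sys/class/net/bonding_masters", "+bond0");         /* create bond0 */
    write_str("/sys/class/net/bond0/bonding/mode", "balance-rr");   /* pick a mode  */
    write_str("/sys/class/net/bond0/bonding/slaves", "+eth0");      /* any driver   */
    write_str("/sys/class/net/bond0/bonding/slaves", "+eth1");      /* any other    */
    return 0;
}

The bonding core never asks which driver owns eth0 or eth1, which is the whole point.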
It’s not really that we fail to notice, it’s just that we feel there are better trade-offs. Unstable ABIs create a lot of churn. I’m not one to say it should never change; architectural changes may justify driver changes out of necessity, but in a mature OS, ABI breakages should be relatively infrequent and not so routine. I do less kernel development than I used to, but it used to be a regular source of frustration, especially if multiple kernel versions needed to be supported.
Yeah I agree. I value the choice and freedoms linux give us. I don’t feel particularly compelled to go back to commercial offerings where they’re increasingly acting like they own my computers.
–I haven’t tried to do this in windows lately since I don’t really use windows to build networks anymore. But if I recall correctly in windows NIC bonding/teaming used to be implemented only at the driver level and could not work across different drivers. In linux it’s always been a feature of the kernel rather than the network driver such that you can link together completely different interfaces.–
I forgot that feature came with NDIS bridging in Windows XP, but the problem I ran into with the “3x 3com 100Mbps networking cards of different versions, note same model different versions” is that the drivers would not bond under XP at all, because the three drivers for the three cards would not all load, and of course I tried that under 2000 as well. I should have written 2000+. The funny part is I still have those three cards, and you still cannot install them into a single system, even with Windows 10, and have them work in parallel to each other, let alone bonded.
Doing Windows system repairs you still find wacky cases where you replace the M.2 wifi card and the M.2 wifi drivers conflict with the laptop’s built-in LAN or webcam or some other part. I have had that fun one: replace a broken M.2 wifi card and the USB-connected webcam inside the laptop stops working; remove the driver for the M.2 wifi and everything comes back good. These are problems that come about because all the Windows drivers are developed independently of each other, so the developers never see the other drivers and have not worked out that they have caused a disaster.
–Unstable ABIs create a lot of churn. I’m not one to say it should never change. architectural changes may justify driver changes out of necessity but in a mature OS ABI breakages should be relatively infrequent and not so routine. —
There is bad news here. Pretty much every time Intel/AMD/ARM… releases a new generation, something at the instruction set level has changed, so the instruction set architecture (ISA) under the OS has massive churn. Even the way caching and so on is done in the hardware causes performance churn. If you are trying to be a high-performing OS, as the Linux kernel tries to be, it becomes very hard to have a mature OS ABI without breakages.
–I do less kernel development than I used to, but it used to be a regular source of frustration especially if multiple kernel versions needed to be supported.–
I do agree it’s a regular source of frustration, but it is really easy to miss that the lower level, the ISA and the hardware, has a lot of churn, and if you want to keep the best performance possible this drives churn up into the OS ABI for kernel space.
This is another one of those trade-offs. The more performance and platform support your OS kernel is going after, the harder it becomes to avoid OS ABI churn.
Microsoft, with Spectre and the like, have over time been forced by major ISA/hardware defects to break drivers. Of course, not being willing to break drivers is costing Windows performance in many areas.
This is one of those annoying trade offs. Pick either way and you have to give something up.
oiaohm,
It’s not that big a deal actually. The vast majority of linux users install a generic precompiled linux distro anyways. To the extent that you, as an end user, want to build with CPU specific optimizations, you can do it if you want, but in practice this is the exception rather than the norm. Very few linux users are running CPU specific kernel and userspace binaries.
I still don’t think this needs to happen very frequently though.
I’m not saying an ABI needs to be supported forever, it’s about getting a better balance than we have today.
He’s talking about (I believe) the 3c905, 3c905B and 3c905C. If you happened to have multiple cards in your system, whether configured for bridging, routing or just multiple NICs, Windows had separate drivers for all 3 revisions and you couldn’t have more than one installed at any given time.
bert64, yep, you got them right. 3c905, 3c905B and 3c905C: you can put all three in a Linux system at the same time and have them work, but under Windows you are screwed.
There are many more binary drivers like this under Windows, where X + Y equals neither working, and this is due to vendors doing their own thing.
–To the extent that you, as an end user, want to build with CPU specific optimizations, you can do it if you want, but in practice this is the exception rather than the norm. Very few linux users are running CPU specific kernel and userspace binaries.–
Alfman, this is a big misunderstanding on your part. A generic x86-64 Linux kernel contains code paths particular to AMD, Intel and other CPUs. The Linux kernel alters its code paths based on what CPU it detects. You can see this clearly with
cat /sys/devices/system/cpu/vulnerabilities/* because of how the Linux kernel changes based on what it detects the CPU to be and what countermeasures it needs to enable for security. But this is only the tip of a very big iceberg. CPU-specific optimizations are part of your default generic distribution kernel build, just enabled/disabled at runtime. This is also why your performance does not go up much if you do a CPU-specific build; the big change there is that CPU compatibility goes way down.
oiaohm,
I don’t “misunderstand” that actually. The point was that the vast majority of linux users are not running a kernel or software specific to their CPUs (other than in very broad terms).
Do these kernels have conditional CPU code in them? Sure. I just don’t see much evidence that an ABI would actually present much of an issue. Like I keep saying, sometimes there are real reasons an ABI needs to be changed, but assuming the engineering is sound this is the exception rather than the norm.
–Do these kernels have conditional CPU code in them? Sure. I just don’t see much evidence that an ABI would actually present much of an issue. Like I keep saying, sometimes there are real reasons an ABI needs to be changed, but assuming the engineering is sound this is the exception rather than the norm.–
The problem is bigger than you would think. If you follow all the bits that advertise their usage under /sys/devices/system/cpu/vulnerabilities/*, over 95 percent of them required an ABI-breaking change. Microsoft with Windows implements fewer of these mitigations than Linux, due to the ABI breakage, and prays they don’t become a problem. There are also a lot of performance alterations that are ABI-breaking changes.
The reality is Linux kernel developers are mostly not breaking the kernel ABI just on a whim. Not having a locked kernel ABI allows them to do more security mitigations and more performance optimizations.
The reality of this was in fact written in
https://www.kernel.org/doc/Documentation/process/stable-api-nonsense.rst
If you read the complete thing you see the example about the Linux kernel USB stack, and how it has been changed for performance and security in ways that would have been blocked by a stable ABI. You also see that the Linux kernel developers remove functions that have no in-kernel driver users; this is done for security, and of course it screws over out-of-tree drivers.
This is a good problem: how do you know when you can delete a deprecated ABI function, without either the Apple model or the Linux “drivers must be mainline” model? You can look at Windows and find a lot of functions that Microsoft has implemented, for which you can find no users, that Microsoft still maintains because they don’t know whether those functions are used or not. This does increase the attack surface.
Horrible as it sounds, choosing a stable ABI in the ring 0 address space means agreeing to tie your hands on how to implement security mitigations and performance optimizations in ring 0.
The Linux kernel maintaining a stable ABI to userspace is a different matter: in ring 3 there is a lot you can do to contain security issues, because in ring 3 the process code does not have the privilege to do massive harm to the system in most cases.
The reality is, if you want a stable driver ABI and all CPU security alterations implemented in a timely manner, what you are asking for is a microkernel, where the drivers run in userspace with all the performance overhead that brings. Andrew S. Tanenbaum, the author of Minix, pointed this out to Linus Torvalds, the lead developer of Linux, in 1991 with functional examples, and those examples are still true today. And Linus Torvalds responded with the overhead costs of microkernels and the performance cost of doing it that way, and what he said there is also still true today. So in over 30 years the trade-offs have not changed.
Red Hat does attempt to maintain a stable kernel ABI for their enterprise kernel, and then at times ends up filing arguably bogus CVEs so they can get performance alterations into those kernels and remain competitive, as well as still having ABI breakages due to security problems.
The Linux mainline kernel does 4 releases a year. –but assuming the engineering is sound this is the exception rather than the norm– this point has a problem. Let’s take the Spectre (security vulnerability) mitigation: from 2018 to now, it has been revised in the mainline kernel 10 times so far (please note the “so far”). Each revision, if the Linux kernel had had a stable ABI, would have been an ABI-breaking change. Note I said 4 releases per year: every Linux kernel release since the outbreak of Spectre, right up to today, has had what would have been an ABI-breaking change because of just the Spectre fault, between mitigating the fault and removing the performance cost of the mitigation as much as possible. So one security fault causes years of ABI disruption if you are fixing it properly. Now you run into the next problem: developers fixing a security fault don’t want to advertise what they are up to.
oiaohm,
They actually do though and I’ve experienced it several times. It’s not because it had to break but because they simply lacked a well engineered ABI to adhere to. It’s not that one couldn’t be developed.
We’ll have to disagree on this one, I’m afraid.
–They actually do though and I’ve experienced it several times. It’s not because it had to break but because they simply lacked a well engineered ABI to adhere to. It’s not that one couldn’t be developed.
We’ll have to disagree on this one, I’m afraid.–
Please note I wrote “not breaking the kernel ABI just on a whim”. That is, they do sometimes make ABI changes on a whim, but the majority are not like that.
The problem is that a “well engineered ABI” sounds like a good idea. It’s like the readfile syscall: if you judge the reason it is needed by historic “well engineered” standards, it’s a stupid idea, because back then they did not have the atomic-operation problem between multiple cores. This is the problem: what is well engineered with this year’s hardware and knowledge can be completely stupid next year when a security fault comes out.
Also, there are quite a few Linux kernel-space ABI changes where the person making them provides no description of why they are doing them, yet they get approved, and then you find out 5 to 10 years later that it was a change to fix a non-public security fault that was covered on the Linux kernel security mailing list.
The Linux kernel is open source, but not all development details are in fact open to the general public, or to those maintaining drivers outside mainline, and worse, not even to those developing drivers in the kernel. Again, look at the Spectre security fixes, where at the start different Linux developers ended up working solo because their NDAs with Intel said they had to. There are far fewer changes in the Linux kernel that just happen on a whim than it appears; the changes are generally more engineered than they first look. There is a problem with NDAs and other things keeping this information from being openly shared. Of course, not openly sharing the information about why they are doing X makes it look like it happened on a whim with no engineering consideration, when that is not in fact true.
There has been a lot of work after Spectre to alter the rules to allow more of the Linux kernel’s engineering information to be shared. Please note I said more, not all. So there are levels of being in the dark on Linux kernel engineering:
3) Out-of-tree modules not working their way to mainline: they are really in the dark.
2) Mainline module developers, or modules working their way to mainline: they have access to a percentage of the Linux kernel engineering information, at times horribly provided without detailed reasons why.
1) Those on the security mailing list have access to most of the engineering information.
0) Nobody has the complete engineering information, because there are still NDAs and CVE-like requirements (the 90 days with the vendor before going public) in the way.
I am not personally happy that level 0 exists the way it does, or with the way level 2 is being handled, but they are interlinked. It does lead to a mistake like the one you made, thinking the in-kernel-space ABI of Linux is not engineered. It’s complex, horrible engineering, but it’s not an unengineered API/ABI. It would be valid to say it’s not a publicly engineered API/ABI, due to the legal requirements.
oiaohm,
I don’t agree about the extent to which a well designed ABI causes security problems. And even if there was an exception, we shouldn’t let such rare events dictate policy. IMHO it’s more about politics than serious technical concerns. We know we’re not going to convince the other party, so how about we agree to disagree. That seems to be the inevitable outcome here.
oiaohm,
Who are you kidding, you’re a smartass all the time, haha.
The thing is, I have absolutely no reason or way to act on your claim when you can’t even identify specifically what you are referring to. That’s what I was trying to tell you with that silly doctor sketch 🙂 I’ll deal with any such problems if they come up, but it’s 100% pointless for me to go in search of a problem I don’t have.
oiaohm,
While we’re on the topic of building the kernel, I’ve got a question that you might have a solution to.
I’ve been automating builds for an internal distro, but I still don’t have a good way to automatically determine the modules/drivers/options that are going to be needed on a target system. Are you aware of any tool to help with this?
In general I’ve been adding drivers manually as needed, but I’ve been thinking of automating this somehow. I’m not aware of a tool that does this already. The closest thing is make localmodconfig, but it uses lsmod based on the current distro’s kernel, which is not what I want or need.
https://www.linuxquestions.org/questions/slackware-14/how-to-compile-kernel-with-only-needed-drivers-881776/
https://bbs.archlinux.org/viewtopic.php?id=103406
Ideally it would scan the PCI/USB ids and automatically build the needed drivers. One way or another I could build a database with automation tools to do this, but I’m not sure if this work has already been done somewhere else…?
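Roughly what I have in mind is this hedged sketch: dump every modalias string the running system exposes, and let separate build tooling match those against a kernel tree’s modules.alias to decide which drivers to enable (the sysfs paths are standard; everything else is illustrative):

/* Hedged sketch: print the modalias of every PCI and USB device on the
 * system, as input for deciding which kernel drivers must be built. */
#include <dirent.h>
#include <stdio.h>

static void dump_modalias(const char *bus)
{
    char path[256];
    snprintf(path, sizeof path, "/sys/bus/%s/devices", bus);

    DIR *d = opendir(path);
    if (!d)
        return;
    for (struct dirent *e; (e = readdir(d)) != NULL; ) {
        if (e->d_name[0] == '.')
            continue;
        char file[512], alias[512];
        snprintf(file, sizeof file, "%s/%s/modalias", path, e->d_name);
        FILE *f = fopen(file, "r");
        if (f && fgets(alias, sizeof alias, f))
            printf("%s %s", bus, alias);          /* alias already ends in \n */
        if (f)
            fclose(f);
    }
    closedir(d);
}

int main(void)
{
    /* Other buses (platform, sdio, acpi, hid, ...) expose modalias the
     * same way and could be added here. */
    dump_modalias("pci");
    dump_modalias("usb");
    return 0;
}

It would only see devices that are powered up and enumerated at the time it runs, which I’d accept as a limitation.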
Alfman, I suspect you are not going to 100 percent like my answer.
The idea of scanning PCI/USB IDs and working out what parts are needed cannot work in a dependable way. There are too many parts that have the same PCI/USB ID but in fact require different drivers because of quirks. You need a lot more detail. In some cases you need the driver to run over the hardware, with the module init working out what the hardware is and what extra drivers it needs to work. And since this comes from module init, data from a different version of the Linux kernel can also screw you over.
https://wiki.archlinux.org/index.php/Modprobed-db
The closest I have seen is something like the Arch Linux modprobed-db, which tracks what modules the Linux kernel has used, and then feeding that data into localmodconfig.
localmodconfig is more usable than you think.
https://github.com/torvalds/linux/blob/fcadab740480e0e0e9fa9bd272acd409884d431a/scripts/kconfig/streamline_config.pl#L137
The Linux kernel supports replacing the lsmod command with a file listing all the modules the target system will need, via the LSMOD environment variable (for example, make LSMOD=/path/to/module.list localmodconfig).
Even so, on an embedded system it is really easy to work out the complete list of everything that could get connected to it. On a desktop computer this can be a true nightmare.
The module options required are an even bigger pain in the butt. You can have two development boards of the same model that, due to hardware defects, need different module options. I have also seen this with ATX motherboards.
Alfman, the reason I said you are not going to like this is that the best automated option is to build the kernel at least twice: once with basically every feature enabled, which you run in order to collect data on what modules are needed. From that data, take a stab at it and hope you did not miss anything important that you don’t regularly use. For embedded systems, where you really do know in advance what should be connected, this process works. For desktop users, not so much.
The worst one I have seen was from an IT person (not me) who always said I was stupid to use the stock distribution kernel and that I should be optimizing. A server crashed and we needed to access the backup drives. My system, since it had all the drivers of the server, had no problem accessing the backup drives. The other guy thought something had to be broken with the backup drives and was looking at the cost of data recovery and so on. Remember, in the panic of a disaster like this, your brain is going to forget that you customized your computer’s kernel, unless you make it idiot-clear that you have.
For desktop usage the performance gain from removing modules from the kernel build is basically zero, because a module that is not loaded does not consume CPU power. The headaches of not having that module when you need it turn out to be quite large.
The boot time saving from having all modules built into the kernel, instead of just loading the ones you need via the initramfs, is also close to zero.
Even scheduler modifications like Liquorix can end up worse than using mainline.
I can understand wanting the LSMOD option in the Linux kernel so that for an embedded system you can do a localmodconfig build on a bigger and faster system.
oiaohm,
I concede the point that the ids may not be able to distinguish all variations of the hardware, however I consider that a runtime problem for linux’s plug and play system. It doesn’t change what I want to do in my case because I just want to make sure that all the drivers that handle those ids get included. Automating this should be no more difficult than what I’m doing today manually.
The lsmod approach does me no good because it assumes the existing kernel already knows which modules I need, which won’t be the case. But that’s an interesting idea anyways. Logging pci/usb ids would be more useful for me.
This is no worse than anything I deal with today anyways, I can add more variations to the database as needed.
This is exactly what I want to avoid though. I don’t have to do this today and I don’t think it’s necessary either. After all, the kernel is not loading all the drivers every time it boots up, so there’s no reason to build all the drivers when the kernel isn’t even going to attempt to open them. Essentially I’d like to automate what I’m already doing manually. I’m pretty confident I can get it working to the point where it can satisfy my needs, but I wasn’t sure if someone has already done the work, which is why I wanted to ask.
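For illustration, here is a rough sketch of the kind of lookup I have in mind, assuming the usual sysfs layout and the modules.alias file that depmod generates. It can only see devices that are exposed at the time it runs:

import fnmatch
import glob
import os

# Map the modalias of every currently visible device to the kernel modules
# that claim it, using the depmod-generated modules.alias file.
KVER = os.uname().release
ALIAS_FILE = f"/lib/modules/{KVER}/modules.alias"

patterns = []  # (glob pattern, module name) pairs from "alias <pattern> <module>" lines
with open(ALIAS_FILE) as f:
    for line in f:
        parts = line.split()
        if len(parts) == 3 and parts[0] == "alias":
            patterns.append((parts[1], parts[2]))

needed = set()
for path in glob.glob("/sys/bus/*/devices/*/modalias"):
    with open(path) as f:
        modalias = f.read().strip()
    for pattern, module in patterns:
        if fnmatch.fnmatchcase(modalias, pattern):
            needed.add(module)

print("\n".join(sorted(needed)))

A module list like that is basically the input I would feed back into the kernel build.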
–I concede the point that the IDs may not be able to distinguish all variations of the hardware, however I consider that a runtime problem for linux’s plug and play system–
The simplest example of the problem with USB/PCI IDs is USB modeswitch devices, where the USB ID can in fact change after a modeswitch. So you would need something like modprobed-db running the whole time, collecting the IDs as they change. And yes, there are PCIe devices that do the same thing.
This leads to a problem. A new Linux kernel may add support for a bit of hardware in your machine whose PCI/USB ID you have never seen, because the device has never been switched on, so you will not know you need it.
–This is exactly what I want to avoid though. I don’t have to do this today and I don’t think it’s necessary either. After all, the kernel is not loading all the drivers every time it boots up, so there’s no reason to build all the drivers when the kernel isn’t even going to attempt to open them.–
This tells me you are missing the problem. You don't know which drivers the Linux kernel is going to attempt to open until it does. You cannot be sure you know all the PCI/USB IDs in your system until you have monitored a boot with everything activated.
I recommend Arch Linux's modprobed-db over plain lsmod. Plain lsmod can miss init drivers, by which I mean drivers that initialize the hardware and then exit once done. By the time you are building the kernel they no longer show up in a normal lsmod run, so you need to use modprobed-db with the LSMOD option for your kernel to work.
Something modprobed-db-like is what you need. Using a modprobed-db-like tool means first building a fully functional kernel so you can probe as much as possible and get as much detail as possible. That holds even if all you want to collect is the list of PCI/USB IDs the system has.
I do not know of any automated way of doing this that will not need some manual tweaking at some point. That is the fun of finding broken bits of functionality.
One of the specialist PCI network accelerator cards changes its PCI ID four times before it reaches full functionality. Yes, that is one network card for which you need four network card drivers to get it fired up. The first three follow the pattern: load driver, init card, then modeswitch to the next identity, which removes the card from the PCIe bus and makes the kernel unload the driver. So if you did a PCI scan after the system was up with that card, you would be missing three IDs and therefore three drivers. The fun part is that the same card can also be configured so it needs only one driver, not four, and once fired up the one-driver card and the four-driver card present the same PCI ID. Horrors like this are why you must boot the system and monitor it to know what drivers are really needed. Some items are going to magically appear and disappear throughout the system init process.
oiaohm,
I do understand the problem and I’ve written my own device loader for my distro which has been working for 16 years now, so don’t tell me I’m missing the problem. It’s not really that complex or mysterious. Drivers are only loaded when their device identifier gets picked up, not before. If there’s nothing to trigger a driver to get loaded, it won’t get loaded. Again, this is a non-issue for my purposes.
Of course you’ve got to initialize all the bus drivers before the devices can be scanned, but in general a device has to present a device identifier before the kernel will load its driver. I haven’t seen any modern PNP hardware that needs the driver to be loaded before it can be identified. This doesn’t mean it doesn’t exist though, so can you give a specific example? Honestly I don’t think it’s a problem for my purposes.
Like I said before, lsmod and modprobed-db are inadequate for my purposes as they assume an already running kernel with all the necessary modules loaded and that the modules haven’t changed.
Everything I want I’ve already done by hand, so I don’t expect it will be a big problem to copy my distro loader’s logic into the automated build process. It just means I have to extract information from the source code, and I don’t know if any of this has already been done before. If not, then maybe it could be useful for others too.
–Of course you’ve got to initialize all the bus drivers before the devices can be scanned, but in general a device has to present a device identifier before the kernel will load its driver. I haven’t seen any modern PNP hardware that needs the driver to be loaded before it can be identified. This doesn’t mean it doesn’t exist though, so can you give a specific example? —
I will grant that this has been more common in high-end network accelerators. It is very much the same trick as the old usb_modeswitch: the card first exposes itself as some ethernet controller the OS will commonly have a driver for, so that people can get online and download the driver/kernel needed to drive the beast properly.
There are already M.2 storage drives appearing with the same technology. These are the controller-reconfigurable M.2 drives: the ones built from quad-level-cell (4 bits) or penta-level-cell (5 bits) storage chips where you can choose to expose a section of the drive as a virtual 'single-level cell' device. Some of these have a boot mode and an activated mode on different PCI IDs.
This is the problem going forward: at some point you are going to have a bit of hardware with multiple PCI IDs, just like you already have USB modeswitch devices with two or more USB IDs.
Yes, the recent change to support / on tmpfs instead of real media is to deal with some of the evil M.2 storage drives coming.
There are also a few rugged laptops where a PCI device appears during shutdown, after particular bits of hardware have been shut down, to configure the battery state: do I fully cut power from the battery, including disabling the ability to charge, or do I shut down with charging still enabled? This is a water-resistance feature, so the charge port cannot be shorted when the relay fully cuts it off.
Please note I said –modprobed-db-like item–. modprobed-db runs in the background through a complete bootup and shutdown, detecting anything that appears in that process. To do the same by PCI and USB ID, and be sure you have a complete list, you need to run something like it. Going forward we are going to have more mode-switching hardware.
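To make the idea concrete, here is a rough sketch of that kind of runtime monitor. It uses the third-party pyudev library (an assumption; any uevent listener would do) and logs the modalias of every device that shows up, much like modprobed-db logs module names:

import pyudev

context = pyudev.Context()

# Record devices that are already present when we start.
seen = set()
for device in context.list_devices():
    modalias = device.properties.get("MODALIAS")
    if modalias:
        seen.add(modalias)

# Then watch uevents so devices that appear (or reappear) later get logged too.
monitor = pyudev.Monitor.from_netlink(context)
monitor.start()
for device in iter(monitor.poll, None):
    modalias = device.properties.get("MODALIAS")
    if modalias and modalias not in seen:
        seen.add(modalias)
        print(device.action, modalias)   # e.g. "add pci:v00001002d0000AAF0..."

Run something like that across a whole boot and a clean shutdown and you catch IDs that a one-off lspci/lsusb scan will never show you.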
–It just means I have to extract information from the source code, and I don’t know if any of this has already been done before. —
Going forward, this idea runs into the same problem as why static program analysis cannot find every fault: there is a dynamic side to hardware.
Alfman, the route I am recommending is the future-proof route. The route you have been taking worked because there was less mode-switching hardware around.
Yes, some hardware developers are doing more mode switching so that Windows 10 cannot do stupid things, like cutting battery power while the OS is running. They do that by taking away the control device when a particular device is activated and returning it when that device is deactivated. The PCIe bus supports hotplug of cards, so doing this is 100 percent legal. It does make life a lot more fun for anyone attempting to build an optimized Linux kernel going forward.
Yes, with the M.2 drives coming, the old method could leave you missing the module you need to access the M.2 boot partitions, and only having the module to access the M.2 drive once it is fully activated.
The annoying thing about these modeswitch PCIe devices is that there will be multiple models, some with the modeswitch and some without, just as there are USB devices with and without modeswitch where the final driver is the same thing.
Please note this PCIe mode switching is not a bus driver issue.
Basically we are headed down a path where it becomes impossible to use a statically acquired list to build a custom kernel, as the number of mode-switching parts increases.
As I said from the start, you were not going to like what I was recommending, which is to build two kernels. Sooner or later that will be your only path, because you will need to activate every PCIe device to see whether another PCIe device appears, and you will need to go through a correct shutdown path to see whether any more PCIe devices appear. That is the only way to get a full fingerprint of the system. The PCIe bus supporting hotplug is a double-edged sword, and this is not hotplug being used the way it was originally intended. So you will need something like modprobed-db sooner or later. And the devices that appear under these PCIe modeswitch schemes can change based on how they were initialized, so a driver update in the Linux kernel that alters how a device is initialized can make a PCIe device that was hidden now appear.
Alfman, welcome to the cursed future. I am not particularly looking forward to it for hardware debugging.
oiaohm,
I was hoping you’d give a specific example so I can actually look it up.
But if a particular system needs “quirks” then we add it to the database and move on, it’s not that big a deal to me.
You are way overthinking it. I just need to get at the same data that depmod already produces and modprobe already uses. The only question is whether I can shortcut the full compilation process.
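For what it's worth, the shortcut I have in mind is roughly what scripts/kconfig/streamline_config.pl already does: map module names back to their CONFIG_* symbols by scanning the kernel Makefiles. A toy sketch; the source path and module names are just placeholders:

import os
import re

# Toy sketch: find the CONFIG_* symbols behind a set of module names by
# scanning kernel Makefiles for "obj-$(CONFIG_FOO) += <module>.o" lines.
KERNEL_SRC = "/usr/src/linux"     # assumed kernel source tree
WANTED = {"ne2k-pci"}             # placeholder list of module names you need

def norm(name):
    # Module file names use '-', kernel object names may use '_'; treat them the same.
    return name.replace("-", "_")

wanted = {norm(m) for m in WANTED}
pattern = re.compile(r"obj-\$\((CONFIG_\w+)\)\s*[+:]?=\s*(.+)")

configs = set()
for root, _, files in os.walk(KERNEL_SRC):
    for fname in files:
        if fname not in ("Makefile", "Kbuild"):
            continue
        with open(os.path.join(root, fname), errors="ignore") as f:
            for line in f:
                match = pattern.match(line.strip())
                if not match:
                    continue
                config, objects = match.groups()
                for obj in objects.split():
                    if obj.endswith(".o") and norm(obj[:-2]) in wanted:
                        configs.add(config)

print(sorted(configs))   # e.g. ['CONFIG_NE2K_PCI']

Continuation lines and multi-object modules need more care, which is exactly the extra work the perl script already does.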
— I just need to get at the same data that depmod already produces and modprobe already uses–
This information is incomplete.
A more common class of device with this modeswitch problem is 4G and 5G M.2 modems. These are wacky. They can have a jumper that disables modeswitch mode, but they can also be the worst kind of horrible.
In modeswitch mode, you power the device on through the USB lines of the M.2 slot and what you see is a USB CD-ROM drive. The OS basically has to init that (not mount it, just enough for the device entry to appear) before you can send the modeswitch command. Once you send the modeswitch command, the USB device pulls a complete disappearing act and a PCIe device appears that is the 4G/5G modem. When you shut down, the PCIe device disappears and the USB CD-ROM device reappears. You can also find this with some wifi cards inside laptops.
If you look at the depmod data for the 4G/5G/wifi M.2 modem's PCIe side, it will not tell you that the device needs the USB CD-ROM step, or that the device has two totally different IDs, one on USB and one on PCIe. People working on power management keep running into these cross-linked devices.
There are also pure PCIe devices that behave this way. The following card of mine is not one of them, but it shows enough for me to explain the problem, because there are other cards like it that will trip you up.
26:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev ef)
26:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]
Notice this is one device, in slot 26, but it has two unique PCI IDs: [1002:67df] and [1002:aaf0]. I currently have an HDMI cable plugged in that carries audio to the screen. Some vendor versions of this card get creative: if you don't have an HDMI display with audio plugged in, the audio device [0403][1002:aaf0] is in fact hidden, and yes, it pulls the disappearing act when you unplug the monitor as well.
Not all PCI/PCIe devices have a single PCI ID; lots have several. The version of the card and what is connected to it alter what you see in the PCIe ID list.
Remember, PCIe supports hotplug. People think hotplug means pulling the complete card from the slot, but PCIe hotplug also supports adding and removing a feature of the card on the fly. And of course the HDMI audio driver for AMD is not the GPU output driver, so it goes unused if the audio function is hidden.
Alfman, from that PCI information, can you tell me which AMD GPU that is, what brand it is, or what card it really is? You can tell it is not a 570X or 580X, but there are still five choices of what it could be, and no manufacturer information. It gives you no clue whether the card has DisplayPort audio or anything else fancy that is not currently being shown but will hotplug in on demand.
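To show what I mean, here is a minimal sketch (the slot address is just the one from the listing above; assumes the usual sysfs layout) that dumps everything the software side exposes for that slot. You only get the functions that happen to be visible right now and their bare IDs; a hidden HDMI audio function is simply not there to list:

import glob
import os

SLOT = "0000:26:00"   # domain:bus:device of the card above

# Walk the PCI functions currently exposed for this slot and print their IDs.
for dev in sorted(glob.glob(f"/sys/bus/pci/devices/{SLOT}.*")):
    info = {}
    for name in ("vendor", "device", "class", "revision"):
        with open(os.path.join(dev, name)) as f:
            info[name] = f.read().strip()
    print(os.path.basename(dev), info)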
Even using modprobed-db and tracking all the used modules, you can still be missing modules. The horrible reality is that without detailed hardware information you will not have enough to get it perfectly right.
The fun part is that PCIe hotplug is a legal, valid operation. A driver may load firmware onto a card, the device that was on PCIe then basically hotplug-removes itself, and later the real devices show up as they hotplug themselves in.
The PCIe list of devices is runtime-dynamic. The USB list of devices is runtime-dynamic. The i2c list of devices can be runtime-dynamic. There have been fun cases of telling an i2c device to power down and thereby cutting power to the USB ports the keyboard and mouse are connected to on some motherboards. There are devices cross-linked from i2c buses to USB and PCIe; the 4G/5G/wifi device on M.2 is an example of a USB-to-PCIe crosslink.
The internals of a modern-day computer can be a true rat's nest of wiring, with cables magically being connected and disconnected on the fly.
oiaohm,
It’s exactly the same information that linux itself has to determine a module needs to be loaded. Like I’ve been telling you, I’m not inclined to over-engineer some contrived solution to fix something that’s already been working without fail for me for over a decade. If I were to run into something that doesn’t work this way, I’ll just add it to the database and move on, this is not a show stopper to me.
You know that I can’t research this without specific IDs. These claims are not actionable to me. Just think if doctors spoke this way…
As always, thanks for the discussion. 🙂
–You know that I can’t research this without specific IDs. These claims are not actionable to me. Just think if doctors spoke this way…–
No point being a smart ass. I said my AMD GPU card does not do the disappearing-PCIe trick. I don't have the vendor names and models on hand, but there are AMD GPU cards with the same PCI ID as mine that do change the PCI interfaces they display based on the connected monitors.
My question to you was to identify my card from the PCIe information I gave you. I know my card does not have the problem, but other AMD cards with the same PCI IDs do. Even if I gave you glxinfo data you would not be able to identify the real vendor of my GPU, even though it is a major brand. And that still is not enough for all the AMD boards; on some of the newer boards, whether or not they have a USB-C port can be hidden.
This not-appearing problem is partly covered in the modprobed-db usage documentation, with the instruction to make sure you have everything plugged in: if something is not plugged in, its PCI/USB ID may not be recorded, so a module you need for that feature may not get included. modprobed-db does have defects, and they come from exactly these hidden PCI/USB devices. Runtime detection picks up more, but it is not perfect; there will still be per-system tweaking. New driver versions can and do at times change the number of displayed PCI devices on some cards, and the newly appearing PCI ID is what triggers new kernel drivers to load. So a PCI ID list collected on an older kernel may not match the PCI ID list from a newer kernel.
It is a vendor choice whether their GPUs do the appearing-and-disappearing PCIe act. If you have not worked it out, this can be a vendor's choice for on-card automatic power management: why keep HDMI audio processing powered if nothing connected needs it?
–I were to run into something that doesn’t work this way, I’ll just add it to the database and move on, this is not a show stopper to me.–
This is not something you can put in a database and solve correctly every time from the device-provided information. You need to build a database, the way modprobed-db does, on the running system.
–It’s exactly the same information that linux itself has to determine a module needs to be loaded. —
This is where you miss a basic point: for the Linux kernel to know that a module needs to be loaded, it needs to see the PCI/USB ID… appear in the list, and to know it can unload a module automatically, it needs to see that the device has been removed.
The Linux kernel does need some of the device cross-link information for power management, and that is not in the depmod data because it is not required there. If you look at the Linux kernel power-management quirks you will find some of the cross-linked problem-child devices. Note: some, not all.
But generally the Linux kernel module dependency system does not need to know that 10 IDs belong to one card; the kernel only has to load the 10 modules that match those PCIe IDs. The problem I am talking about is that those IDs do not have to stay displayed.
And loading firmware into a card that activates it does not mean you have loaded the vendor power-management firmware, or that you have the exact feature list of the card and know which drivers will need to load for maximum functionality.
It does not help that the video and PDF notes on the USB side of this problem from Linux Plumbers are no longer online. They are referenced in the Linux kernel documentation and are now a dead link to nothing. That presentation named one of the interlinked M.2 4G modems and one interlinked M.2 wifi card.
Alfman, with an AMD GPU I can think of a way to partly work around the problem, though not a 100 percent automatic tool. This is where you, as a person, manually enter a list of the port types on the back of the GPU. Think of an AMD GPU for mining/compute that has no video outputs: why would it ever need to expose an HDMI audio interface? Then you have another AMD GPU with no HDMI monitor-with-audio plugged in; it also does not need an HDMI audio interface. Both use the exact same PCI IDs, and from the software side they look almost identical. There are other examples I don't have in front of me where you absolutely cannot tell from the software side. A lot of this will need a per-machine configuration file carrying extra data that cannot be obtained from depmod, or dependably from PCI and USB IDs, because of the modeswitching/appear-on-demand stuff. Seeing an AMD GPU in the list might be a trigger to ask the user questions. Others, like 'does device X have modeswitching?', a general user is not going to be able to answer, but monitoring a complete boot and shutdown may be able to answer automatically.
My problem is that I cannot give you an exact ID to put in a database to auto-solve this. There is so much real-world hardware where 100+ different models share the same PCI and USB IDs. That ends up requiring a lot of human configuration.
PCI and USB IDs were not designed to provide the data needed to build a properly optimized kernel; they do not give a full, straight-up functionality list of the connected devices. They are only designed to advertise that a feature is now available, so please load a driver and let the user use it.
Back in 1995 to around 1998/2000, I had an NE2000 installed in my everyday driver. I ran MS-DOS 6.22 (IPX/ODI), Windows 95 (IPX/ODI, TCP/IP) and Windows 98 SE (IPX/ODI, TCP/IP) without any issues at all. It was a good card that ran fast and solid. However, I used coaxial cable with the card, so I cannot say anything about stability with twisted pair.
Nope… It was not horrible. It was a good card.
brostenen, I had a stack of NE2000 clones from Chinese makers in 1995-2000. Some were good cards, some were absolute nightmares. I still have an old list of NE2000 vendors where particular vendors are on a whitelist for providing nice cards that work well, and other vendors are on the list for providing NE2000 problem-child cards. Yes, there was a blacklist and a greylist for the problem children.
This is the problem with talking about the NE2000: not all of them were created equal. Some were really good cards that worked perfectly with both coax and twisted pair (those got on my whitelist). Some only worked well with coax and some only worked well with twisted pair (greylist), and some were complete garbage that would not work with either (blacklist, don't buy again). Some had strange departures from NE2000 behavior.
The problem here is that a lot of people like you, brostenen, who had good experience with a limited set of NE2000 cards, never saw the bad ones. The people who got only the bad ones really did have major gripes with particular vendors' NE2000 clones, and a lot of them never worked out that it was specific vendors/models that were the problem. People like me saw both the bad and the good, and know it is a vendor/model issue, so we understand both sides and why they have totally different ideas about NE2000 quality. There were good NE2000 cards, there were total-garbage NE2000 cards, and there were NE2000 cards that landed in the middle between good and garbage. That is the reality of NE2000 history.