A page is the granularity at which an operating system manages memory. Most CPUs today support a 4 KB page size and so the Android OS and applications have historically been built and optimized to run with a 4 KB page size. ARM CPUs support the larger 16 KB page size. When Android uses this larger page size, we observe an overall performance boost of 5-10% while using ~9% additional memory.
In order to improve the operating system performance overall and to give device manufacturers an option to make this trade-off, Android 15 can run with 4 KB or 16 KB page sizes.
↫ Steven Moreland
Android 15 has been reworked to be page-size agnostic, meaning a single binary can run on either 4 KB or 16 KB versions of Android. Any assumptions about page size have been removed from Android itself; the EROFS and F2FS file systems, as well as UFS, are now compatible with 16 KB pages; and a whole lot more has been changed and refactored to make this transition as effortless as possible.
Application developers do need to do a few things, though. They’ll need to recompile their binaries with 16 KB alignment, and then test them on a device or emulator running a 16 KB build of Android. To make this possible, starting with Android 15 QPR1, the Pixel 8 and Pixel 8 Pro will get a new developer option that reboots the device in 16 KB mode. In addition, Android Studio will gain a 16 KB emulator target as well. The 16 KB page size is an ARM-only feature, so people running the emulator on x86 machines will emulate the 16 KB page size, in which “the Kernel runs in 4 KB mode, but all addresses exposed to applications are aligned to 16 KB”.
Of course, Google urges Android developers to test for 16 KB page sizes as soon as possible.
So this requires new executable alignment, meaning no existing programs will run? And it’s not possible to JIT compile x86 user code to run on such a system, since the kernel cannot implement 4Kb page size semantics?
malxau,
I agree with you, it’s really odd. Here is the relevant quote:
Most local data/stack memory is aligned to 16 bytes and not the page size, be it 4KB, 16KB, 2MB or whatever. I don’t think page size makes any difference at all to local allocations in typical user space software. We can change the page size on linux and it doesn’t require recompiling all the software.
The only functions that should care are those that explicitly allocate pages. sbrk accepts any byte length and does not depend on page size because the kernel can just round up.
https://linux.die.net/man/2/sbrk
mmap seems to be the main culprit here:
https://www.man7.org/linux/man-pages/man2/mmap.2.html
Unless mmap is specifically instructed to use the specified address via MAP_FIXED (normal software shouldn’t need this), the kernel will always return a page-aligned address. No problem there. The length does have to be a multiple of the page size though, so mmap lengths may need to change. But correct code and memory allocators were already required to do that per the spec anyway.
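To illustrate (a minimal sketch, assuming an ordinary POSIX system, nothing Android-specific): query the page size at runtime with sysconf and round the mmap length up to it, rather than baking in 4K.

#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    /* Query the page size at runtime instead of assuming 4 KB. */
    long page = sysconf(_SC_PAGE_SIZE);
    size_t want = 100000;                 /* arbitrary, not page-aligned */

    /* Round the length up to a multiple of the page size
     * (page sizes are powers of two, so the mask trick works). */
    size_t len = (want + page - 1) & ~((size_t)page - 1);

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    printf("page size %ld, mapped %zu bytes at %p\n", page, len, p);
    munmap(p, len);
    return 0;
}

Code written this way shouldn’t care whether the kernel uses 4K or 16K pages.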
Testing the software is definitely warranted, but correct code should require no changes at all. I don’t do android development though; is there something about the android SDK specifically that depends on 4K pages? Anyone know?
Most unix software is dynamically linked to standard libraries that include allocators. These can also be updated without recompiling the software. Does android not do this? That seems a bit weird.
Ah, this link contains more specific details!
https://developer.android.com/guide/practices/page-sizes#16-kb-impact
Apparently the recompile is needed to change the alignment specified in the ELF sections. Specifying a 4kB page size in the ELF could cause the loader to use an unaligned address on a 16kB architecture or configuration. Although I still don’t see why they couldn’t just fix the loader to use a larger alignment anyway. It seems to me that if they did that, the vast majority of native android software would still work fine, and that would be worth doing. Am I missing something?
Ehhh… it affects the granularity at which page permissions can be configured. I have some simple code to detect buffer overruns that places allocations on one page and marks the next page invalid, which means it needs to know the page size. If the page size is larger than the program expects, it ends up marking the page containing the allocation invalid, which immediately crashes. And my logic for detecting the page size is… uhh…
https://github.com/malxau/yori/blob/f982c24fd31439a7a108e79561c41f9393e30c65/lib/osver.c#L638
Gosh, I’m glad that’s worked for you. I tried it on PowerPC before (which supports configurable page sizes) and the result didn’t boot. I think though we’re starting to see why…
I was looking at PE with the same questions. Each executable contains multiple sections, and sections need different page permissions. But the problem is that the relative alignment within the executable is fixed – the read only data is at a specified offset relative to the start of the code, etc. The compiled code is expecting to find its global variables at that location. Increasing the page size requires the layout of the image in memory to change, which requires relocations that aren’t compiled into the binary.
With PE, the file on disk is 512 byte aligned, so increasing the in-memory alignment to 16Kb doesn’t take any extra disk space. It would force any 4Kb system to “space out” sections in memory though. There doesn’t seem to be any serious drawback to using a 64Kb (or whatever) section alignment, which should still work with a 4Kb page size.
malxau,
Yes, I can see that being the case if you hardcode the page size like that. If you are writing kernel code, hard coding might be ok, but normal linux software is meant to call “sysconf(_SC_PAGE_SIZE)” as documented in the man pages. This complies with the POSIX.1 standard.
https://www.man7.org/linux/man-pages/man3/sysconf.3.html
I’d expect for Android software to be using the same syscalls, but admittedly it’s an assumption on my part.
I don’t know the specifics of PE file relocation, though what you say makes sense. On unix the mmap call isn’t even obligated to honor the input address, so I suspect the linux loader is tolerant of that. Maybe things are platform specific. I might need to look closely at the code to get a clear answer.
malxau,
For fun, I checked some old code of mine to see how it handled page sizes. It used guard pages too! The guard page doesn’t really need to be allocated, though; committing it could waste a lot of memory with huge page sizes!
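Something like this is what I mean (a minimal POSIX sketch, not my actual old code). Note the guard is PROT_NONE, so the kernel never has to back it with physical memory:

#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Place an allocation so it ends flush against a PROT_NONE guard
 * page; any overrun past the end faults immediately. */
static void *guarded_alloc(size_t size) {
    long page = sysconf(_SC_PAGE_SIZE);          /* don't assume 4 KB */
    size_t data = (size + page - 1) & ~((size_t)page - 1);

    unsigned char *base = mmap(NULL, data + page, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Revoke all access to the final page: that's the guard. */
    if (mprotect(base + data, page, PROT_NONE) != 0) {
        munmap(base, data + page);
        return NULL;
    }
    return base + data - size;   /* buffer ends at the guard boundary */
}

int main(void) {
    char *buf = guarded_alloc(100);
    if (!buf)
        return 1;
    memset(buf, 0, 100);   /* fine */
    buf[100] = 'x';        /* lands on the guard page: SIGSEGV */
    return 0;
}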
Here are some more links down the rabbit hole, haha..
https://dram.page/p/relative-relocs-explained/
https://medium.com/@boutnaru/linux-security-aslr-in-statically-linked-elfs-55556d13adc
I’m not sure if all architectures are able to handle position independent executables. A dynamic linker is normally able to relocate sections, but maybe not all code is compiled to be relocatable.
For the record, normal Windows software is meant to call GetSystemInfo. I doubt I’m the only one to not call it though, because I don’t think we’ve ever seen a page size change on an existing architecture before.
The Windows kernel exposed a PAGE_SIZE constant, but… do you want to be able to load older drivers on newer CPUs? I think the answer is “yes.” If it becomes common to change something like a page size, we’ll need quite a bit of ecosystem adjustment to work with it. Even this thread is about supporting 16Kb, not supporting arbitrary values or supporting future changes.
What I’m saying/seeing is the code can be relocated but expects to be together – ASLR implies loading things anywhere, but still keeping relative offsets unchanged. This also seems to be explicitly called out in the article you linked to:
Supporting arbitrary page sizes implies that any global variable access needs to understand a run-time defined offset to the beginning of the global variable area, and that offset can’t just be a global variable for obvious reasons. The normal way relocations work is to have a table of pointers whose values can change, but relocating sections means the location of that table of pointers needs to be able to change – it’s an extra layer of relocation. ASLR implies that location can’t be well known, but also can’t be at the same relative location. There’s probably some cute solution here but I’m not immediately seeing it.
This would have been easier in a segmented architecture like DOS, where code is relative to CS and data is relative to DS, so the loader can position them independently and the code is identical. Flat memory is simpler, but…well, it’s simpler.
In other words, my hardcoding of the page size doesn’t seem that uncommon or unusual, since it looks like the executable formats and run time loaders of the world did the same 😉
malxau,
Yes, I do concede that it’s possible developers won’t use the page size function. I just don’t think it’s that common for application developers to write their own paging code rather than use a standard library.
I get that. Things are a bit different on the linux side due to the fact that every module needs to be recompiled for every kernel build anyway.
Yes, I read that too. However, after way too much testing, the article does not seem to agree with what linux is doing, or maybe it’s just glossing over the details. The main ELF binary’s data & code are randomized to one address; the heap is randomized to another. So far so good. But I’m also finding that shared library code & data are randomized to yet another address that is NOT at a constant offset from the main code. All the shared libraries have a common offset with respect to each other, but not with respect to the main binary. To be honest though, I have no idea why it works this way. Why doesn’t ASLR take advantage of randomizing the offsets between shared libraries too?
Testing shows that the offset between shared libraries is deterministic and constant on subsequent runs. However if I replace one of the shared libraries with a larger version of itself, the loader is forced to make more space for it. This effectively bumps the addresses of all subsequent shared library sections. The relative offsets between libraries before and after this change are therefore different. It makes sense to me why this has to work, but then why doesn’t ASLR randomize it? My searches failed to come up with an answer.
(I’ve read that ASLR on windows and linux are different, so maybe nothing I’m saying applies to windows).
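A quick way to see it (a sketch for x86-64 linux with glibc; build with gcc -pie -fPIE test.c -lm and run it a few times):

#include <math.h>
#include <stdio.h>

int main(void) {
    /* One address from the main binary, one from libc, one from libm. */
    printf("main   (main binary): %p\n", (void *)main);
    printf("printf (libc)       : %p\n", (void *)printf);
    printf("cos    (libm)       : %p\n", (void *)cos);
    return 0;
}

Across runs, the printf and cos addresses keep a constant distance from each other while main lands somewhere unrelated, which matches what I described above.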
I don’t know the exact mechanics used for -pie under the hood. But my understanding is that the addresses are known to the dynamic linker. It knows not only where the binaries are being randomly positioned, but also the compile-time symbol offsets into each binary. Adding these two values should give you the final memory address. I’m not exactly sure how -pie conveys this information to the code, but it must come from the dynamic linker. I’d have to study this more to fill the gaps in my knowledge 🙂
Even though ELF files have provisions for the compiler to assign addresses, I think they’re treated more as hints. Obviously if they overlap or whatever the dynamic linker is going to place them where it wants to. Without -pie this is easier to comprehend (or is at least what I am more familiar with).
malxau,
BTW, what an awesome discussion! Thank you 🙂
I think this is confusing shared libraries with sections. What I mean by section is “individual part of a binary that requires specific page permissions.”
Running “link /dump /headers msvcr80.dll” as an example:
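(abbreviated to the section names and virtual addresses; the numbers are representative, shown 4Kb-aligned as they appear in real binaries)

SECTION HEADER #1
   .text name
    1000 virtual address
SECTION HEADER #2
   .rdata name
   5B000 virtual address
SECTION HEADER #3
   .data name
   74000 virtual address
SECTION HEADER #4
   .reloc name
   76000 virtual address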
This is showing a .data section (will be on RW pages), a .rdata section (will be on RO pages), a .reloc section (for relocation information), and a .text section (will be on executable pages.) Each of these things needs to be on a page size boundary, because they get different page permissions. Each section is given a virtual address, which is really relative to the base address that the DLL loads at (it’s not absolute.)
Here, it’s easy to see that this binary implicitly assumes 4Kb pages, because all of the virtual addresses are 4Kb aligned. Loading this on a 16Kb page size means the virtual addresses of each section need to change relative to each other – not across shared libraries, but within a single shared library.
Normal relocation information has a fixed virtual address, in this case 0x76000 from the start of the DLL. But if the page size were 16Kb, it can’t be at that offset, so now the challenge is we need some relocation information that tells us where the relocation information is, etc.
That’s why the comment here says all native Android code needs to be relinked with 16Kb alignment to work. But it’s also why those binaries still won’t work on a 64Kb page size machine, etc. Nobody seems to be discussing how to build an executable that can run on an arbitrary page size, just realigning one to work at 16Kb. PE is actually nice here though, because the virtual address layout is not tied to the file layout, so internally aligning at 256Kb or somesuch is quite feasible, but it still requires all binaries in the universe to be relinked.
That’s why having APIs to query page size seems a bit…superfluous. Executables are already hardcoding page size assumptions into their layout. The page size can only be successfully queried if the executable can load, and it can only load if it is a multiple of the system’s real page size. In today’s world, it really only loads if the executable matches the system’s page size.
malxau,
Yes, I understand that.
I was not positive this was the case, but it does look like you are right. It would be possible to make these relocatable, but it doesn’t seem to be done, at least not by default.
In theory it’s the same process that adjusts the addresses between shared libraries. It’s a matter of getting the compiler to generate a list of addresses that need to be adjusted. For example, here is PIE code that loads and saves a global variable relative to RIP (really sorry about the formatting):
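(objdump-style listing; “counter” is a made-up global at a made-up address)

  4011e7: 8b 05 13 2e 00 00    mov 0x2e13(%rip),%eax    # 404000 <counter>
  4011ed: 05 01 00 00 00       add $0x1,%eax
  4011f2: 89 05 08 2e 00 00    mov %eax,0x2e08(%rip)    # 404000 <counter>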
A relocation table with addresses 0x4011e7 and 0x4011f2 would be all that’s strictly needed to adjust the relative offsets. The loader can see which section the address points to and add in a new offset corresponding to space that needs to be added between sections.
I disagree with this; it still seems like the right thing to do per the spec.
I’d like to do more tests to learn more about the implementation. Technically it should be doable if the compiler and loader support it. I guess the main question is whether it’s worth doing.
(Sorry.)
What I’m struggling with is, “where is that relocation table?” and “how does the code find it?”
It can’t be relative to rip if the “gap” between the code and relocation information can change.
It’s undesirable to have it at a fixed address.
Maybe it’s possible to guarantee all modules are loaded at a certain address alignment, allowing code to find the “beginning” of the module via some kind of “& 0xFFFF00000000” type thing, and from there, find a global relocation. That seems to both limit module size and lessen the effectiveness of ASLR though.
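A sketch of that idea (purely hypothetical; no real loader guarantees this alignment, and it assumes a 64-bit address space):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical scheme: if every module were loaded at a 4Gb-aligned
 * base, code could recover its own module base by masking its own
 * address, then find relocation data at a known header offset. */
#define MODULE_ALIGN ((uintptr_t)1 << 32)

static void where_am_i(void) {
    uintptr_t pc   = (uintptr_t)where_am_i;
    uintptr_t base = pc & ~(MODULE_ALIGN - 1);   /* the "& 0xFFFF00000000" idea */
    printf("code at %#jx, module base %#jx\n",
           (uintmax_t)pc, (uintmax_t)base);
}

int main(void) { where_am_i(); return 0; }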
malxau,
Haha, I don’t know. Maybe an existing ELF relocation structure could be used, but I’m not familiar enough with the implementation to say for sure. I did not see evidence that relocating individual sections actually works; rather, I was making deductions about how it would have to work.
Why not?
Say there are two sections, aligned to 4k, with code in section 2 referencing an address in section 1. Note I don’t know the actual mov opcode length; I use 6 bytes as an example…
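(made-up addresses and encodings, just to make the mechanics concrete)

  section 1 (.data) at 0x401000: variable X lives here
  section 2 (.text) at 0x402000:
    402000: 8b 05 fa ef ff ff    mov -0x1006(%rip),%eax    # 401000 <X>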
Now say we need to realign the sections to 1M. Despite the sections moving and the addressing being RIP-relative, the adjustment is straightforward: the distance increased by 0xff000, and all the offsets can be adjusted accordingly:
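  section 1 (.data) stays at 0x401000
  section 2 (.text) moves to 0x501000:
    501000: 8b 05 fa ff ef ff    mov -0x100006(%rip),%eax    # 401000 <X>

The displacement field grew by exactly that 0xff000; nothing else in the instruction changed.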
Hopefully I didn’t misunderstand you, but did my example make sense? I think it solves the issue for page alignments (assuming the compiler produces the location of the offsets). I assume that every process on the system would be using the same page size. This would be important because in order for multiple processes to share the same code memory they would obviously need to share the same offsets in the code. If the offsets were different in every process, then we wouldn’t be able to share the code memory between them.
Hypothetically then, with respect to ASLR, we could randomize the section addresses too on initialization, but the random offsets between sections would have to become finalized once loaded. We couldn’t randomize the section addresses again without invalidating the offsets in other processes.
I think that’s where things are going, with two caveats.
First, note this supports page sizes that are 1Mb or an even power of 2 less than 1Mb. That is, it doesn’t support arbitrary page sizes; it supports a range of sizes up to a maximum. It sounds like what’s being proposed is to do this with 16Kb alignment, which works for 16Kb pages and 4Kb pages, but not 64Kb pages. I’d claim this support is insufficient to be future-proof, and we’ll end up with another ABI incompatibility in a decade or so.
Second, the “vibe” I’m getting, although it doesn’t seem explicitly stated and I’m not certain, is that ELF is physically laying out the file with those 16Kb gaps. That’s why they’re not jumping to 64Kb or 1Mb, because there’s a disk space cost to making that number larger. In PE this should work with no additional disk space, although I’m hesitant to do it because it means testing these new binaries with all previous versions of the PE loader to check that they will operate correctly.
malxau,
That’s my understanding as well. I think arbitrary sizes would be possible as discussed, but that’s not what they are doing.
You’re right that linux binaries have an awful lot of empty bytes, which will get worse with larger pages like 512k. This won’t be too noticeable with larger binaries (pages on average will be more full), but a small 20k program might still require a few 512k pages: with, say, three sections needing distinct permissions, that’s 1.5M of address space for 20k of program. That’s a lot of overhead 🙁
As if Android didn’t already waste too much memory, this will make it even worse… 🙁
Minuous,
It’s true. Garbage collected languages tend to use ~3X more memory than unmanaged languages, depending on circumstances of course. But in this day and age I personally think switching to 16kB pages is justified…
https://developer.android.com/guide/practices/page-sizes#benefits
16 KB pages may not increase RAM usage that much. But performance, especially for matrix calculations (think AI…), will be improved.