Now that ARM’s memory tagging, used extensively by Android ROMs such as GrapheneOS and now also by Apple, is becoming the new norm to aid in improving memory safety, the x86 world can’t sit idly by. As such, Intel and AMD have announced ChkTag, x86’s version of memory tagging.
ChkTag is a set of new and enhanced x86 instructions to detect memory safety violations, such as buffer overflows and misuses of freed memory (use-after-free). ChkTag is designed to be suitable for hardening applications, operating system kernels, hypervisors for virtualization, and UEFI firmware. ChkTag places control in the software developers’ hands to balance their security needs with operational elements that often become prominent when deploying code. For example, ChkTag provides instruction-granular control over which memory accesses are checked. Compilers can offer optimizations and new language features or intrinsics. ChkTag prepares x86 for a future with increasing amounts of code written in memory-safe languages running alongside code in other languages. Furthermore, ChkTag loads tags from linear/virtual memory that can often be committed on demand.
↫ Intel and AMD’s announcement
It’s important to note that ChkTag – why not just call it CheckTag – isn’t ready yet, nor is there any indication when it will be included in any processors from Intel and AMD. The goal is to catch certain memory safety problems in hardware. According to Intel and AMD’s shared announcement, developers will have fine-grained control over the feature, allowing them to tap into the functionality in whatever way they deem necessary or valuable for their software in specific circumstances.
My fear is that Intel and AMD will use this feature as a product differentiator, restricting it to either more expensive processors or to Xeon/Threadripper processors, thereby fracturing the market. This would inevitably lead to spotty support for the feature across the x86 landscape, meaning most ordinary consumers won’t benefit from it at all.

Intel probably, but AMD has no problems bringing “pro” features such as ECC to consumer systems, forcing Intel to follow along. If this new ChkTag feature doesn’t cost too many transistors, I expect AMD to offer it across the line-up.
Traditionally Intel definitely has done this, but AMD historically seems to be looser, making features such as virtualization extensions available, generally, from the top to the bottom of the stack — not just in more expensive or “pro” desktop/workstation CPUs.
Assuming this new instruction set appears tomorrow and is put on the Celerons over the weekend, the world will still have billions of ChkTag-less x86s for the coming decade. So Microsoft won’t be in a hurry to introduce support for it, and even when it does, it will come with enforced Edge, Copilot, and Start menu ad shit, so I won’t get it because I won’t update the machine. I’m 47 years old, and I am losing my job to LLMs. I don’t expect to earn a living long enough to see that ChkTag thing in action.
Sure, you’ll say, it will appear on ATMs or such. I don’t expect to have cash in circulation for that long, either.
From the article…
This is NOT needed for languages that are already memory safe. It’s really unsafe languages that benefit from hardware-accelerated tagging. Tagging should still be considered worse than a memory safe language, because safe languages provide stronger assurances that the compiled code doesn’t contain these faults in the first place, whereas tagging doesn’t fix the fault but makes it more likely to be caught when it does happen.
Here’s a much more technical writeup, though obviously it doesn’t cover this new x86 tagging accelerator…
“Memory Tagging and how it improves C/C++ memory safety”
https://arxiv.org/pdf/1802.09517
Looks like memory tagging has been hacked via side channel attack:
https://www.youtube.com/watch?v=DoPb4mG-7TY
Seems a blackhat can work out the tag, and then it’s easy to adjust the pointer as required.
If your OS Kernel has zero bugs, memory tagging won’t make the OS more secure.
So is it really any benefit?
Speculative Execution of modern processors has caused many security issues. Is there a way to keep the performance improvements of Speculative Execution but prevent the security issues?
tom9876543,
Thanks for the link!
I’m not a huge fan of probabilistic solutions to the problem of coding faults. Even techniques like ASLR address the symptom and not the real cause. As an interim hack, fine whatever, but I’ve always felt that the end goal needs to explicitly be fixing the root cause, which none of these probabilistic techniques do. Bah.
Yes, haha, you get it! Speculative execution results in inherent timing leaks. The only way to reveal zero information through timing leaks is to make sure all paths take the same time. This is easy to do by forcing every path to match the worst-case time, but then that defeats the opportunistic performance benefits. Not all timing leaks are critical to security, so in theory we might add metadata to programming languages to tell the CPU if a specific speculative shortcut would compromise security; this way the software could speculate where it doesn’t matter and disable speculation where it does. But this is all very wishful thinking, and I have no confidence in software using it correctly.
IMHO the industry should move away from speculative CPUs and do more work with explicit parallelism as in the GPGPU model. GPUs have proven themselves with graphics, but I think that clever algorithms can make GPGPUs work with normal programs as well.
Rather than running one task as quickly as possible, a parallel model (with thousands of threads) can afford to execute those threads slower (and without the risks associated with speculation) but still come out far ahead thanks to the parallelism. Not only is the GPU far faster in aggregate, but because GPU threads are very simple and efficient, it consumes less power to do the same work. Alas, a lot of software relies heavily on sequential CPUs and doesn’t translate to GPGPU well. However, I don’t think it would be impossible to get there. The mainframe programming model, where tasks get queued into batches, opens up some interesting opportunities for parallelism. Even if the individual tasks are very sequential in nature, a GPU might execute thousands of these batch requests in parallel.
This model could work with many tasks that we don’t normally think of using GPUs for. Think of a router with iptables or a network daemon like nginx. Even with high-performance speculative CPUs, the computational performance pales in comparison to GPUs. A GPU implementation can offer orders of magnitude more threads, and do so way more efficiently, meaning less power and heat. This scalability would likely be wasted on a home computer, but in the data center, where loads are never ending, GPGPU implementations could be hugely beneficial.
Anyway, this is very interesting to me, but I need to stop before this wall of text gets any worse, haha.
Alfman,
You can always avoid speculative execution. For example, many algorithms can be coded to avoid if branches by using arithmetic techniques.
However they are harder to read, much more tedious to program, and as you mentioned usually slower.
For example, if we had
if (a < b) { return "less"; } else { return "more"; }
we can rewrite this with a mask and low level pointers or with an array indexing.
int mask = (1 - (a < b)); // 0xFFFF FFFF FFFF FFFF = true, 0x0 = false
return (~mask & "less") | (mask & "more"); // return one of the string addresses based on mask
We can also do a lookup. However the advantage of speculative execution is we won’t be making double the work (here you can see we actually calculate both branches, but return only one, and drop the other — exactly like what speculative execution would do)
sukru,
Yeah, there’s lots of cool math tricks you can do.
For kicks, I tried your program.
Obviously C considers any non-zero value to be true, but I honestly wasn’t too sure about the reverse: casting "a < b" to an int. In that case, would true become 1, or ~0 as you had documented? Testing in GCC right now gives me 0 and 1, not ~0.
Outputs…
So clearly the assumption doesn’t hold in GCC. The fact that the mask isn’t all ones or all zeros means puts outputs the blended bits of the two pointers… fun stuff 🙂 I fixed this as follows…
Outputs…
I checked the assembly output to make sure that "a < b" actually performs the math and doesn’t result in a conditional, which it did not.
Note that the setl opcode only produces 0 or 1 (not ~0), and I don’t know offhand whether the C standard strictly specifies this, so on other compilers or architectures a different valid value might be used for true; it may be dangerous to rely on this behavior.
https://www.felixcloutier.com/x86/setcc
Granted, I’ve over analyzed your example. I understand the point you are making, the way we code programs can change the speculative aspects of the code’s execution in subtle ways 🙂
Alfman,
Sorry, it was supposed to be (0 - (a < b)), not (1 - (a < b)) :facepalm-emoji:
(essentially a typecast to int)
% cat test.cc
#include <iostream>
using namespace std;
int main() {
    cout << 0 - (1 < 2) << endl;
    cout << 0 - (2 < 1) << endl;
}
And then
% ./test
-1
0
-1 will have all bits set, hence the required mask to pass values as is in the “true” case.
Just for fun I asked Google’s Gemini to implement a 32 element “branchless” sort. (The for loops can be unrolled):
https://gemini.google.com/share/99ff2833e3f7
The algorithms work, but they are limited, i.e. fixed size here. (Actually, they *should* work; I don’t currently have a CUDA machine to test them.)
There are also ways to use a reduced number of branches in a hybrid solution (using the radix sort algorithm), which is also generally going to be faster with respect to branch prediction errors (for a random input, branch prediction will obviously fail about half of the time).
Anyway, you can see why these are not more common.
There is no kernel with zero bugs, though. Most kernels are written in at least some C or C++, and I have yet to meet the mythical C or C++ programmer willing to take responsibility for any financial damages caused by memory safety vulnerabilities in their code.
And since we apparently aren’t getting rid of C or C++ because performance is a hell of a drug, at least we can mitigate the damage of those memory safety vulnerabilities.
kurkosdr,
What about seL4 and Muen?
https://dl.acm.org/doi/10.1145/1629575.1629596
https://www.muen.sk/
I’ve got to agree. While many come to the defense of languages that are notoriously vulnerable to memory faults, nobody wants to be held accountable for those faults.
There was a time when memory safe languages required compromising in areas like performance. Sometimes managed languages actually perform better than C/C++, but they typically need a lot more memory and experience jitter associated with garbage collection, which is shunned especially in low level code. So continuing to use unsafe languages still made sense then. But now that we’re seeing safe languages that validate memory correctness at compile time, I think it’d be an appropriate time to consider phasing out unsafe languages even in low level performance sensitive domains. IMHO the main challenges for safe languages going forward aren’t so much technical challenges, but adoption.
I like that Rust checks a lot of boxes, but many of us aren’t fans of the syntax. :-/