Hot on the heels of AMD, here’s Intel’s next-generation processor, this time for the laptop market.
Overall, Lunar Lake represents their second generation of disaggregated SoC architecture for the mobile market, replacing the Meteor Lake architecture in the lower-end space. At this time, Intel has disclosed that it uses a 4P+4E (8 core) design, with hyper-threading/SMT disabled, so the total thread count supported by the processor is simply the number of CPU cores, e.g., 4P+4E/8T.
↫ Gavin Bonshor at AnandTech
The most significant change in Lunar Lake, however, has nothing to do with IPC improvements, core counts, or power usage. No, the massive sea change here is that Lunar Lake will do away with separate memory sticks, instead opting for on-die memory at a maximum of 32GB LPDDR5X. This is very similar to how Apple packages its memory on the M dies, and yes, this also means that as far as thin Intel laptops go, you’ll no longer be able to upgrade your memory after purchase. You choose your desired amount of memory at purchase, and that’s what you’ll be stuck with.
Buyer beware, I suppose. We can only hope Intel isn’t going to default to 8GB.
Thin laptops are essentially tablets with the SoC etc. sitting under the keyboard instead of under the screen; you shouldn’t expect anything to be upgradeable.
The issue is with OEMs choosing 8GB instead of 16GB minimum (and don’t get me started about how 8GB of RAM isn’t enough anymore, but all software sucks, we already know that).
kurkosdr,
Upgradability is obviously one con. Increasing the watts going through the CPU package is another, and it means we have to sacrifice other aspects of the CPU at the top end. Depending on the target audience, this may not matter that much. But when the entire product line becomes dependent on the singular SoC model (i.e. RAM/GPU), there’s just less headroom for scaling up. Apple has this problem with their M CPUs. Thermal issues prevent their SoC from scaling up and performing as well in real applications as the specs might suggest (i.e. intensive loads on one subsystem bottlenecking another, which doesn’t happen with discrete components).
I think Intel could find there is a market for this in consumer products despite the shortcomings. I just hope that discrete devices remain on the market long term for mainstream consumers who care about it, even if we switch to new types of physical interfaces.
Memory sockets will probably be moved up-market, the same way detachable batteries already were (no more swapping packs while traveling, for example). Anyone not paying $1800 for a business-grade laptop will get non-upgradable garbage.
Of course, it depends on how far the PC makers want to copy Apple on this stuff; my spidey senses tell me “as far as the market will let them”, because that’s how it has worked in the smartphone realm. And of course there is a market segment where soldered memory is acceptable, such as the very cheapest, smallest micro-PCs and ultra-compact handheld gaming machines (which are very popular). But this type of design has no business being in 13-17 inch performance/gaming laptops, though we all know that it probably will.
kbd,
That’s what I’m worried about. It’s admittedly premature to raise the alarm, but if user-serviceable tech ceases to be accessible to regular consumers, it could be very regressive. Today normal commodity hardware is upgradable, but tomorrow it may no longer be if upgradability is relegated to unaffordable hardware tiers.
Yeah, we can’t call it yet, but there is potential for these kinds of developments to be regressive for things like right to repair if the industry broadly follows Apple’s lead.
Not every innovation is “buyer beware” territory. Consumers should be wary of buying laptops with 8GB of memory, sure, but there are tangible benefits to having memory on the CPU die, including but not limited to potential cost savings (whether those are passed on to consumers remains to be seen) and lower power usage. Also, so few people upgrade or replace the RAM in their laptops that I don’t see much downside here, especially since there are still products on the market that can be upgraded and repaired: mostly higher-performance, higher-power machines like workstation-class ThinkPads and other professional platforms, plus Framework laptops.
The situation is not the same as with Apple. With Apple we’ve lost real professional-class hardware (replaced with expensive and less valuable service contracts they call AppleCare, and with cloud services for backups, because you might lose everything when your soldered drive fails…). It’s all consumer-grade stuff, for good or bad, in Apple land. As far as I can tell, Intel and AMD aren’t going to stop making the other stuff; they are just also making this. It makes COMPLETE sense in a world that wants to move to ARM for dubious reasons, like lower power consumption (which is true in the current product offerings, but has nothing to do with the ISA).
I have to agree. Moving the memory allows Intel to compete in form factors and profiles it couldn’t otherwise: ultra-thin laptops, tablets, even Raspberry Pi-like single-board computers.
What this doesn’t mean is that they won’t continue to offer and sell chips that use the more familiar external memory. They’ll almost certainly retain that format for server-focused chips, for example.
The truth is (sadly) that most people don’t upgrade memory after purchase, especially in laptops, where most OEMs already solder it to the board.
The memory is on-package, not on-die, FYI.
I love upgrading RAM on laptops, but even before this change it was getting hard to find a model where upgrades were possible. (Microsoft Surface, ThinkPad X1 Carbon and even larger T14s and X390s haven’t been upgradeable for ages now.)
I am typing this on an ASUS Zenbook 14 that has soldered RAM. Sadly, this is the norm now until you get into larger corporate or gaming/workstation grade machines.
I can’t imagine many people noticing or caring outside of tech circles.
The latest Intel Ultra CPUs are already a massive improvement over previous generations. If this is as significant an upgrade, I have no doubt that they’ll be great devices that can go toe-to-toe with what Apple and now Qualcomm are doing on ARM.
Is it ever nice for Intel to have real competition again! I’m not counting them out yet.
I am very interested to see what they will do at the higher end. Multi-core, high-power-consumption parts tend to be where Intel can already compete well. If the RAM for those systems is also on-package, the limits will be interesting. Even for low-end machines, 16GB and 32GB as a limit isn’t great.
Many of the high-end scientific computing laptops we order for work have 64GB/128GB now. I’d expect this to be 256GB or more within a few years. (Not at the MacBook Air end of things, obviously.)
benmhall,
I don’t expect them to do particularly well in high-end workstations because of the tradeoffs involved. Discrete components tend to scale better because they don’t have to compete as much for power and thermal capacity. Just as Apple’s M CPUs did not deliver uncompromising high-end workstations, I expect the same for Intel.
What would have been interesting to me is for Intel to add significant amounts of on-chip cache while retaining dedicated off-chip memory. Not sure this would be cost-effective, but you’d get the benefits of low-latency cache while still keeping the high capacity and upgradability needed by workstations.
Physics strikes here.
The speed of light literally limits how far data can travel in a given amount of time. Or rather the “speed of electricity”: https://en.wikipedia.org/wiki/Speed_of_electricity
Consequently, in a 1GHz system a signal can travel at most ~29cm per clock cycle, whereas a modern 5GHz system is limited to ~6cm. That is roughly the size of the die in one dimension.
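A quick back-of-the-envelope check of those numbers (a minimal sketch, assuming the signal propagates at the vacuum speed of light; real PCB traces are noticeably slower):

```c
/* Sketch: maximum distance a signal can travel in one clock cycle.
 * Assumes propagation at the vacuum speed of light; signals on real
 * traces move at roughly half to two-thirds of that. */
#include <stdio.h>

int main(void) {
    const double c = 299792458.0;              /* speed of light, m/s */
    const double clocks_ghz[] = {1.0, 3.0, 5.0};

    for (int i = 0; i < 3; i++) {
        double period_s = 1.0 / (clocks_ghz[i] * 1e9);  /* one cycle, in seconds */
        double dist_cm  = c * period_s * 100.0;         /* distance per cycle, cm */
        printf("%.0f GHz: ~%.1f cm per clock cycle\n", clocks_ghz[i], dist_cm);
    }
    return 0;
}
```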
Hence RAM gets closer and closer, and now sits essentially on the CPU itself.
But I would have preferred calling this L4 cache and still supporting the older, larger RAM as an extra tier, like we do with SSD and HDD hybrid systems: I can have a 2TB NVMe drive as a boot drive along with one or more 20TB drives for larger storage.
This would of course make programming more difficult. But then we already need to know L1/L2/L3 cache sizes to write optimal programs.
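For what it’s worth, here’s a minimal sketch of the kind of cache-size-aware code meant here: a blocked matrix transpose whose tile size (the hypothetical BLOCK below) is picked so a tile fits comfortably in L1.

```c
/* Minimal sketch of cache-size-aware code: a blocked (tiled) matrix
 * transpose. BLOCK is a hypothetical tile size chosen so that one tile
 * of the source and destination fits in L1 cache; the right value
 * depends on the actual cache sizes of the target CPU. */
#include <stddef.h>

#define BLOCK 64  /* assumption: tune against the target's L1 size */

void transpose_blocked(const float *src, float *dst, size_t n) {
    for (size_t ii = 0; ii < n; ii += BLOCK)
        for (size_t jj = 0; jj < n; jj += BLOCK)
            /* work on one cache-friendly tile at a time */
            for (size_t i = ii; i < ii + BLOCK && i < n; i++)
                for (size_t j = jj; j < jj + BLOCK && j < n; j++)
                    dst[j * n + i] = src[i * n + j];
}
```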
sukru,
That’s true. Obviously lower latency is nice, especially if you intend to optimize for single-threaded tasks. But it becomes less important in highly parallel tasks, where the latency of individual requests doesn’t matter as much as overall bandwidth backed by a large number of memory channels. Consider how much work GPUs can crank out despite much lower clock speeds. Hyperthreading uses more threads to allow execution units to keep running rather than stalling. Really high core count CPUs can get similar benefits from a large number of memory channels even with latency present.
Hey, that’s what I said 🙂 I think the hybrid approach makes a lot of sense: the best of both worlds.
Well, most programmers are not targeting specific CPUs like we used to. There are just too many models to hand-optimize everything at scale. Most software will benefit just by having more cache there; a few gigs of cache would likely cover most programs even with multitasking. And by not tacking tens or hundreds of gigs onto the CPU, we get more headroom to improve other specs like CPU cores and clocks.
Alfman,
Yes. They do this exactly by having local RAM. HBM stacked on the same package right next to the die provides massive RAM bandwidth (upwards of 1TB/s). They cannot achieve this with traditional DDR RAM, which tops out around 70GB/s for the highest-end DDR5 (roughly a 15x difference).
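As a rough sanity check on those figures (a sketch using nominal, not exact, specs): peak bandwidth is roughly interface width times per-pin transfer rate.

```c
/* Rough peak-bandwidth arithmetic: bus width (bytes) x transfer rate (GT/s).
 * Figures are nominal and illustrative, not exact product specs. */
#include <stdio.h>

int main(void) {
    /* DDR5-6400, one 64-bit channel */
    double ddr5_gbs = (64.0 / 8.0) * 6.4;    /* ~51 GB/s per channel */
    /* HBM3, one 1024-bit stack at 6.4 Gb/s per pin */
    double hbm3_gbs = (1024.0 / 8.0) * 6.4;  /* ~819 GB/s per stack  */

    printf("DDR5 channel: ~%.0f GB/s\n", ddr5_gbs);
    printf("HBM3 stack:   ~%.0f GB/s (several stacks reach TB/s)\n", hbm3_gbs);
    return 0;
}
```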
Sorry, I missed that. But, it seems like great minds think alike.
They don’t. But compiler writers and standard library developers do.
https://www.youtube.com/watch?v=cMRyQkrjEeI
https://www.youtube.com/watch?v=BP6NxVxDQIs
https://www.youtube.com/watch?v=FJJTYQYB1JQ
sukru,
Well, the thing is that CPUs also have “local” SRAM, which lowers latency in the same way a GPU’s local memory does, so I don’t consider “local RAM” to be a GPU advantage. The latency of discrete GPU RAM is actually worse than DDR RAM. I don’t have the latest-gen hardware, but I did perform a benchmark on my i9-11900 with DDR4-3200 plus an RTX 3080 Ti to highlight this… all it does is sum sequential values from memory (strictly hitting the memory modules; the working set does not fit in GPU or CPU cache).
https://ibb.co/6XQjFnt
Note that “threads” represents requested threads rather than actual.
The GPU’s memory latency per single operation is 12.7ns. Meanwhile the CPU RAM latency per single operation is 0.4ns. That puts the GPU’s latency at 32X worse than the CPU’s DDR4 RAM. As the CPU and GPU get loaded with more threads, each request physically incurs the same latency as before, but since the requests operate concurrently they don’t add clock time, therefore the average latency across all threads drops until the memory channels reach saturation. This puts the GPU’s effective latency at 0.000591ns and the CPU’s effective latency at 0.055186ns. Now the CPU’s average latency is 93X worse than the GPU’s, thanks to parallelism.
This was my point: discrete GPUs are able to achieve higher performance despite higher-latency RAM access through sheer parallelism. And the same can be said for high-core-count CPUs. Lower-latency RAM is significantly more critical in single-threaded applications where the CPU would have to stall waiting for RAM, but with high parallelism this gets mitigated.
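For anyone who wants to try something similar, a minimal sketch of the CPU side of that kind of benchmark (sequential sum over a buffer much larger than cache, with “effective latency” derived from wall time; the GPU half isn’t shown, and the buffer size is just an illustrative choice):

```c
/* Sketch of a CPU-side streaming benchmark: many threads sum sequential
 * values from a buffer much larger than cache, and the wall time per
 * element gives an "effective latency". Build with -fopenmp. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void) {
    size_t n = (size_t)1 << 28;              /* ~2GB of 8-byte values, far larger than cache */
    long long *buf = malloc(n * sizeof *buf);
    if (!buf) return 1;
    for (size_t i = 0; i < n; i++) buf[i] = (long long)i;

    double t0 = omp_get_wtime();
    long long sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < (long long)n; i++)
        sum += buf[i];                        /* sequential reads, spread across threads */
    double t1 = omp_get_wtime();

    printf("sum=%lld, effective latency: %.6f ns per element\n",
           sum, (t1 - t0) * 1e9 / (double)n);
    free(buf);
    return 0;
}
```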
🙂
Sometimes that’s true. I accept that it’s possible to tell the compiler to optimize for a specific CPU, but most of the time we only distribute generic binaries that aren’t customized. Still, you bring up an interesting point, and I’m not sure how much performance we’re leaving on the table by using generic builds.
The way I stated this wasn’t clear. Because the memory access is sequential, larger blocks of memory than requested can be fetched and the CPU/GPU may optimize for streaming cases like this. It would be interesting to see how random access changes the results.
LPCAMM2 dead on arrival??!
yousif,
I was wondering that too. I’d like to see serviceable memory stick around in some form or another, but whether the market will deliver is a matter of speculation right now.
Oh crap! This direction is the very reason I refrained from purchasing Apple hardware, from mobile devices to workstations. Field-replaceable units are the reason for my purchases, as I’m a fan of upgrading my machines. I wonder what this change means for business purchases?
It’s not on-die, it’s on-package… aka the same as every tablet out there.
On-die would not make any sense, as the fab processes are drastically different for DRAM and logic.