If you’ve read my first post about Spatial Video, the second about Encoding Spatial Video, or if you’ve used my command-line tool, you may recall a mention of Apple’s mysterious “fisheye” projection format. Mysterious because they’ve documented a CMProjectionType.fisheye enumeration with no elaboration, they stream their immersive Apple TV+ videos in this format, yet they’ve provided no method to produce or play back third-party content using this projection type.
Additionally, the format is undocumented, they haven’t responded to an open question on the Apple Discussion Forums asking for more detail, and they didn’t cover it in their WWDC23 sessions. As someone who has experience in this area – and a relentless curiosity – I’ve spent time digging in to Apple’s fisheye projection format, and this post shares what I’ve learned.
↫ Mike Swanson
There is just so much cool technology crammed into the Vision Pro, from the crazy displays down to, apparently, the encoding format for spatial video. Too bad Apple seems to have forgotten that a technology is not a product, as even the most ardent Apple supporters – like John Gruber, or the hosts of ATP – have stated their Vision Pro devices are lying unused, collecting dust, just months after launch.
Apple’s fisheye seems to be a really smart way to pack more pixels of usable information into the same encoded video frame. A normal fisheye image inscribed in a 4096×4096 frame has only about 13 megapixels that are actually usable, while Apple’s fisheye uses the full 16.7 megapixels of the frame, giving you better resolution. What it does really smartly is increase the resolution especially in the areas of interest, such as the horizon line, and the nadir-zenith line that intersects the horizon at the center. So all those 3.7 million extra pixels go particularly to the horizon and the central vertical line, while keeping a similar resolution in the corners of the projected view.
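The arithmetic behind those numbers is easy to check: a standard fisheye image only fills a circle inscribed in the square frame, so only π/4 of the pixels carry image data. A quick sketch, using the 4096×4096 frame size from the example above:

```python
import math

# Assumed frame size, matching the 4096x4096 example in the text
side = 4096
total_pixels = side * side  # every pixel in the square frame

# A standard fisheye image inscribes a circle in the square frame,
# so only pi/4 of the pixels carry usable image data.
circle_pixels = math.pi / 4 * total_pixels

print(f"total:  {total_pixels / 1e6:.1f} Mpx")   # ~16.8 Mpx
print(f"circle: {circle_pixels / 1e6:.1f} Mpx")  # ~13.2 Mpx
print(f"extra:  {(total_pixels - circle_pixels) / 1e6:.1f} Mpx")
```

That difference of roughly 3.6 million pixels is the budget Apple’s projection can redistribute toward the horizon and central vertical line instead of wasting it on the black corners of a normal fisheye frame.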
Compared to an equirectangular projection, it also does a better job preserving detail at the center of the image.