On Sonic 2 Compression
Over twenty years ago, now, I played a small role in the Sonic the Hedgehog 2 hacking community by reverse engineering the level data compression algorithm used in the game. I figured it was about time I wrote my side of that story, so, here it is!
Obviously I do a Google search for my own name from time to time because, admit it, everyone does it. Yeah, you too. Don’t give me that look. We both know it’s true.
So it was that a while back I was perusing those search results, and I came across something that honestly left me incredibly chagrined: a page about something called “Kosinski” compression. Clicking the link, I realized the page was about a project I worked on way back in the early aughts to reverse engineer the algorithm used to compress level data in the Sonic the Hedgehog 2 ROM.[1]
And somewhere along the way, apparently that format got named after me.
Now I want to be extremely clear about something: I did not invent this algorithm or the in-ROM data storage layout, and I certainly didn’t name it after myself. All I did was uncover what was already there and write it down in my own weird way. But, once you put something out in the world, it can take on a life of its own, and so now this algorithm has my name on it. Oops![2]
So naturally I shared this little factoid with some friends, and we all had a laugh. And that was that.
But then, in 2022, a gentleman by the name of Damian Grove reached out. Way back in the day, Damian created a site called the Sonic Hacking Community (SHaC), where he collected a whole raft of information about the Sonic 2 ROM layout in a site he called the Sonic 2 Hacking Guide. In reaching out, Damian was hoping I might answer some questions about the reverse engineering work I did, and so I did what I often do: forgot to reply until months later. Meanwhile, Damian himself didn’t see that reply when it was eventually sent, and so we never connected.
Fast forward to 2024, and Damian finally spots my email and a) responds with a couple of questions, and b) includes a link to a video from the 2023 Retro World Expo of a panel discussion on the history of Sonic 2 hacking. A video in which I come up as a bit of a topic of conversation.
Now, just to get this out of the way: while I was certainly a bit bemused by the conversation in that video, it was nothing but curious and respectful and I had absolutely no issues being talked about. But I gotta admit, it was more than a little weird to see people speculating about me on stage!
In that conversation I noticed a few misapprehensions about my own background and history and the approach I took to reverse engineer the algorithm, so I figured I’d set the record straight and provide my own vague recollections about how it all went down!
A bit of background
First, I gotta point out that these events all took place a little over twenty years ago, and anyone who knows me knows my autobiographical memory leaves something to be desired. So I’m not going to claim everything here is entirely from crystal clear memory. But, I will do my best, and make up the rest!
At the time I started poking around at the Sonic 2 ROM, I’d already had a keen interest in console games, emulation, and ROM hacking in general. While I don’t remember which one I saw first, I certainly ran across my fair share of game hacking communities. In these communities, people would take an existing game, explore the data and code in that game, and figure out how to modify it, changing graphics, level layouts, or game logic. In a lot of cases this work was done via the hard work of manual reverse engineering, while occasionally these communities were directly supported by the game developers, as was the case with, for example, Doom, which pioneered user-generated game content.
And so I couldn’t help but wonder if it was possible to modify Sonic 2.
A little poking around on the internet, and I found Damian Grove (aka saxman) and his Sonic 2 Hacking Guide, which contained all kinds of information about the game, both how data and code were structured in-ROM and also how they were organized in memory during gameplay.
Unfortunately, one crucial bit of information was missing: while the location of the level layout data was known, the structure of it was not. As Damian noted in a comment from 2022:
while I did figure out the format in savestates/RAM, and I figured out where the floor layout was in the ROM, I failed miserably at understanding the compression scheme.
Being a young twenty-something with time on his hands, I decided this was a problem I wanted to take a crack at solving.
Attacking the problem
At the time I was on a sixteen-month internship between the third and fourth years of my bachelor’s degree in Computing Science, and so, unlike at school, I had a remarkable amount of free time on my hands.
So I began by poking at the ROM and eyeballing the data to see if a structure would reveal itself. But it didn’t take too long for me to conclude that approach simply wasn’t going to work: As Damian had already discovered, modifying bytes in the level data produced seemingly random results. It was clear the data was encoded, but it was not clear how.
In the end I decided the only way I’d figure out the algorithm was to reverse engineer the actual code in the game.
Now, I was at a bit of a disadvantage here, in that I had no formal education in compression algorithms, and still don’t.[3] However, what I did have was experience with 68k assembler, thanks to one of my university courses, and a lot of youthful persistence.
But I immediately faced a challenge: how would I find the code that decoded the level layout data?
Open source for the win
The solution actually begins with Damian’s work: while I didn’t know the location of the code that implemented the decompression algorithm, I knew the parts of video memory that algorithm must be writing to, thanks to Damian’s work reverse engineering the VRAM layout.
So what I needed to do was stop execution of the game, and ideally drop into a debugger or something, the moment that video memory was written to, as the code that was executing at that time would necessarily be part of the decoding algorithm.
That’s when I turned to DGen, which was an early and, critically, open source Sega Genesis emulator that happened to include a full-blown debugger.
While the debugger didn’t have the ability to set watchpoints (i.e. a breakpoint that triggers when a specific memory location is written to) on VRAM writes, because the code was open source, I could just implement that feature myself. Once I’d built that feature, I was then able to quickly find the code for the decompression algorithm.
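For anyone who hasn’t run into watchpoints before, here is a minimal sketch of the idea in Python. To be clear, this is not DGen’s code or the patch I wrote back then: the `WatchedMemory` class, the `demo` function, and every address in it are made up purely for illustration. The gist is that each memory write gets checked against a list of watched ranges, and a hit records which instruction did the writing.

```python
from typing import List, Tuple


class WatchedMemory:
    """Toy byte-addressable memory that reports writes to watched ranges.

    Hypothetical illustration only; not modelled on any real emulator's API.
    """

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.watch_ranges: List[Tuple[int, int]] = []  # inclusive (start, end)
        self.hits: List[Tuple[int, int, int]] = []     # (pc, address, value)

    def watch(self, start: int, end: int) -> None:
        """Register an inclusive address range to watch for writes."""
        self.watch_ranges.append((start, end))

    def write(self, pc: int, address: int, value: int) -> None:
        """Perform the write, then record a hit if the address is watched.

        A real debugger would pause the emulated CPU here; this sketch
        just remembers the program counter of the writing instruction.
        """
        self.data[address] = value & 0xFF
        for start, end in self.watch_ranges:
            if start <= address <= end:
                self.hits.append((pc, address, value & 0xFF))
                break


def demo() -> List[Tuple[int, int, int]]:
    vram = WatchedMemory(0x10000)
    vram.watch(0xC000, 0xCFFF)                          # made-up range
    vram.write(pc=0x1234, address=0x0010, value=0xAA)   # outside the range
    vram.write(pc=0x1894, address=0xC004, value=0x5B)   # inside the range
    return vram.hits
```

Calling `demo()` returns a single hit for the second write, and the recorded `pc` is the interesting part: it tells you where in the code to start reading.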
After that it was a matter of good ol’ elbow grease as I single stepped through 68k assembly and manually decompiled the decoder implementation into pseudocode.
And from there, it was pretty straightforward to write a (really bad) implementation of a compressor, and voila, it became possible to build a true level editor for the game!
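As the footnotes mention, the format turned out to be a variant of LZSS. For the curious, here is a toy sketch of the general LZSS idea in Python, including a deliberately naive compressor in the spirit of "really bad". This is emphatically not the Sonic 2 format: the token layout (a flag byte, least significant bit first, followed by either a literal byte or a made-up distance/length byte pair) and the function names are assumptions chosen only to keep the example short.

```python
def lzss_decompress(blob: bytes) -> bytes:
    """Decode the toy format: flag bit 1 = literal byte, 0 = (distance, length)."""
    out = bytearray()
    pos = 0
    while pos < len(blob):
        flags = blob[pos]
        pos += 1
        for bit in range(8):
            if pos >= len(blob):
                break  # unused padding bits in the final flag byte
            if flags & (1 << bit):
                out.append(blob[pos])  # literal: copy one byte through
                pos += 1
            else:
                distance, length = blob[pos], blob[pos + 1]
                pos += 2
                # Copy byte by byte so a match may overlap its own output.
                for _ in range(length):
                    out.append(out[-distance])
    return bytes(out)


def lzss_compress(data: bytes) -> bytes:
    """Naive brute-force encoder for the same toy format."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        flag_index = len(out)
        out.append(0)  # placeholder flag byte, filled in bit by bit
        for bit in range(8):
            if pos >= len(data):
                break
            best_len, best_dist = 0, 0
            # Try every distance back into the last 255 bytes.
            for dist in range(1, min(pos, 255) + 1):
                length = 0
                while (length < 255 and pos + length < len(data)
                       and data[pos + length - dist] == data[pos + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, dist
            if best_len >= 3:  # a match costs 2 bytes, so shorter ones don't pay
                out += bytes((best_dist, best_len))
                pos += best_len
            else:
                out[flag_index] |= 1 << bit
                out.append(data[pos])
                pos += 1
    return bytes(out)
```

Round-tripping is the obvious sanity check: `lzss_decompress(lzss_compress(data))` should give back `data`. The brute-force match search is quadratic and slow, but for a proof of concept that only needs to produce something the decoder accepts, slow is fine.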
The aftermath
I then spent a couple of months building a proof-of-concept level editor called Chaos that could load the in-game art and level data, render it, and then allow the user to edit it, all through a simple little GUI.
Damian also reminded me that, as part of that early work, I restored the Hidden Palace Zone (a level present in the early Sonic 2 beta cartridges but removed from the final game) back into Sonic 2. I’d completely forgotten about that detour…
Of course, as with most of my projects, I got the thing to some basic level of viability and then kinda dropped off the face of the planet as far as the Sonic hacking community goes. In hindsight, I suspect this was in part because right around that time Nortel started truly collapsing around us, with half of my team eventually getting laid off. Then a couple of months after that I was back in Edmonton and panicking my way through the last year of my degree.
But if I’m being totally honest, this is often what I do: jump in and attack an interesting problem deeply and intently for a short period of time, solve the really interesting and hard bits, and then promptly drop the idea and move on.
Incidentally, around that same time I also built a working (if slow) Sega Master System emulator for the Game Boy Advance, and began poking around with a fancy new console, the Nintendo DS, which would eventually lead me to building savsender and NetHackDS. I’ve always had rather weird hobby projects…
The upshot
So that’s it! That’s how I did it. As always, no achievement is done in isolation, and I could never have reverse engineered that algorithm without Damian’s work pointing the way, the various authors of DGen for supplying the crucial debugger that made it possible to reverse engineer the code in the ROM, and the immense patience of my then-girlfriend Lenore, who I discovered recently had no idea this is what I was doing on those many late nights with the glow of a monitor on my face.
And of course there would be no algorithm to reverse engineer if not for the work of some unnamed coders working for Sega who helped create one of the finest games of all time!
Getting a bit philosophical
In the end, for me, this all really reinforces how much we might underestimate the ripples that we create in the world. Whether it’s something as simple as being kind to your barista in the morning, or offering a kind word to someone having a tough day, it’s so incredibly easy for us to lose sight of the impact we have on one another.
Moreover, those ripples can carry on much farther than we might expect.
After all, from my perspective, this whole project was a random detour down a very deep rabbit hole that I eventually crawled out of a couple of months later and kind of forgot about. Meanwhile, for others, it had a real impact that still gets discussed 20 years later, including the odd panel discussion where my name is mentioned!
So a smile and a wave to the Sonic hacking community. While I didn’t stay long, I enjoyed the little bit of it that I experienced, and I’m glad I could play a small part in your story.
1. If you’re curious, I have my own writeup here.
2. I’d love to know the name of the person who actually wrote the implementation used in Sonic 2 (and, it turns out, a few other places), so we can give credit where credit is due.
3. Which should be obvious to anyone who reads my little write-up on the format, as it’s basically just a bunch of made-up lingo describing what, it turns out, was a variant of LZSS, a fact I did not learn until much, much later when someone who actually understands this stuff noticed.