Puyo Puyo Tsu/Software Architecture
Puyo Puyo Tsu has an interesting software architecture, which develops many programming concepts which are comparable to modern paradigms, such as callbacks functions that are somewhat similar to object methods in object-oriented programming, memory management through a custom allocator, or CPU time yielding not much different from modern multitasking.
Here are the main concepts needed to understand how the game works between frame updates.
This behaviour directly impacts how to consider frame timings, ticks and input handling.
Overview
The main code flow of the game consists of a main loop that iterates once per video frame.
This loop can be interrupted at any moment by an external signal, an interrupt, which can only be triggered by the console's VDP (video) chip.
When interrupted, the CPU jumps to an interrupt handler routine, which is required to be a short one. The longer it is, the more likely a new interrupt will occur during its own execution, ending up in a fatal machine deadlock.
Puyo Puyo Tsu truly makes use of only one interrupt: the vertical blanking interrupt, VINT, which is triggered every time a frame has been fully drawn.
As there is not enough time to perform all the calculations between frames (the CPU would have bursts of activity between frames, and would idle while they are being drawn), rendering is asynchronous: the CPU computes data to be displayed by the VDP on the next frame, while it is drawing the previous one.
All the trick is to prepare the next frame while the previous one is drawing, then trigger a VDP state update during vertical blanking, while reading necessary information to update the frame, to perform the calculation while the next frame is being drawn.
Consequently, during VINT, Puyo Puyo Tsu performs the VDP updates it buffered during the previous frame, updates the sound processor with the next sample chunk to play, then reads the gamepads. That gamepad readout is all that is needed to be able to compute the next frame.
After VINT execution, the main loop resumes its execution, and the game routines are called in sequence to make a check if the falling pair is blocked at the bottom, place it on the board, resolve chaining, check all clear, compute the score, etc.
These routines are not explicitely called by the main loop. Instead, the game allocates in-memory "objects" representing various items of the game: board, falling puyos, player status, menu... To each object is attached a callback rountine: one of the object fields references a routine address that shall be run on itself, the same way you can call object methods in C++.
The main loop calls a routine that will simply loop through all the objects in memory, and call the callback routine for each of them.
Thus, during a battle, the game has an object representing the falling pair (actually, two objects). The callback routine attached to the falling pair checks the active gamepad inputs, performs lateral movement, rotation and soft drop according to them, then checks if the pair should be locked on the board.
As there are two such objects, one for each player, the routine is called twice per frame, once per player. The same occurs for freefalling puyos: their callback routine is one that updates their on-screen coordinates according to gravity acceleration.
Player status has callback routines that handle the main chaining logic between pair placements.
Puyo Puyo Tsu implements a clever yielding mechanism to prevent CPU hogging (monopolizing CPU time, i.e. 100% usage, a.k.a. crashed process, infinite loop...), ensuring all the game logic fits within the available time slot between frame updates and to resume the callback execution at predefined checkpoints. That means the callbacks don't follow through the end of their code every time, but they are abruptly stopped and resumed from one frame to the next. This way, the game only execute the part handling rotation while the pair is not locked, and skips right after that (i.e. to a new checkpoint) when the pair is locked on the board, as inputs are no longer needed/possible.
See it as a plane approaching an airport for landing: the plane circles at a parking altitude and gradually comes down as air traffic control gives it the green light, and it won't come up again. Yielding makes the callback routine repeat its current step until it received the green light to go on to the next step: from pair falling to pair splitting, to freefall, to board placement, and chain resolution.
This "checkpoint/yield" mechanism is what enables two instances of the same callback routine to be run for the two players, each being at a different step than the other (say P1 is busy soft dropping his pair while P2 has split his pair and a puyo is currently free falling).
From a programmer's point of view, their solution is quite elegant as the game simply "happens" by itself: the main game routine has no clue about what is going on, it blindly runs callback on objects. It simply happens that those objects are falling puyos, but they could be menu entries as well.
Hence, transitions from menus to battle are done by routines that cleanup objects and create new ones with the corresponding callbacks and initial values. Once the cleanup is done, the game happens to be in the state corresponding to a battle, menu, or whatever animation.
The next sections detail how these mechanisms are implemented in the game code.
Vertical interrupt (VINT, vertical blanking)
The vertical blanking interrupt routine handler is pretty short and straightforward :
After saving the CPU registers, it tries to talk with the VDP (to send the next frame update), then reads the gamepad status to make the new inputs available to the game through the readout routine, performs a bunch of stuff that doesn't need to be explained here, and also updates the sound processor with the chunks it should play through the next frame.
Note the game only tries to update the state of the VDP, because VINT may occur faster than it has predicted. Frame skipping then occurs, which results in animations taking longer or shorter to complete.
However, the game logic ensures a deterministic number of frame passes when completing a specific action (accounting for rotation, lateral movement, etc.). Thus, it may not look like it, but every action takes the same amount of time to complete, while visual feedback (animation) is asynchronous and may skip frames to catch up.
Main loop
Here's the main loop of the game code:
It may look complicated, but only performs a few basic actions, in between video palette updates. The first thing done during a main loop iteration is to wait for the frame to be drawn (sub_376), so that no other line is executed twice per frame. This prevents the game from handling your inputs twice in a row, or prevents it to skip frames from a game logic standpoint.
It also handles pausing / resuming the game, handles default menu transitions, introduction animations, then runs all the callbacks from the objects in memory (call to run_callbacks). This call thus occurs only once per frame.
After a few more uninteresting tasks, the iteration is complete.
Memory heap and allocation
A memory pool starting at 0xFFD000 can contain up to 128 objects (128 available slots) of 64 bytes each. Of this pool, 60 slots are managed by the memory allocator, others are initialized on the go by various game routines.
The following screenshot shows what the pool looks like at the beginning of a battle:
The first slot is unused, while the next 9 contain various objects (structures), each with a callback address declared; that's the address of the routine that run_callbacks will execute. It is also apparent that the same structures share the same callback address: indeed, at that point in the game, the players are in the same "state", so their falling puyos are handled by the same callback routine, at the same checkpoint through their execution. If a player locks a pair before the other, the callback address will advance to the next checkpoint.
The player status object has a different callback, which handles the game logic between pair placements. It constantly checks whether the falling pair is locked before advancing to the next checkpoint, then returning to that "waiting" point.
Callbacks
As expected, the run_callbacks is a basic loop from addresses FFD000 onwards. The routine checks whether the object is flagged for callbacks (compares the value in the first 2 bytes), then runs the corresponding callback at 0089D0 (jmp (a1)):
There are two optional final callbacks at the end of the routine, that are set through global variables. Only one of them is used during battle, and handles end game transition to the menu, victory sound effect playback, and some other stuff.
Execution yield
Now on to the neat programming tricks, the yield functions are split in two. Actually, the yielding mechanism allows the game to set a predefined number of times the callback should restart from a checkpoint before being allowed to advance to the next one. This can be used to force the game to spend a precise amount of frames at a specific check, or in a given state (say, input shall be locked for x frames).
The premature_yield_reset() routine sets a per-object counter to the value given in argument (d0):
premature_yield_or_end() checks that counter and decrements it. If the counter is null, the callback advances through this checkpoint and will skip to right after the call to the yield routine at the next iteration of the game's main loop. If the counter wasn't zero yet, the routine terminates the callback execution after decrementing it, while keeping track of that checkpoint. The next execution of the callback will thus resume right after the last call to premature_yield_or_end().
The neat programming trick resides in how this yielding/checkpoint is done:
- the callback is called by run_callbacks() by a jmp instruction, which pushes it's return address (the address of the instruction after the jmp) onto the program stack. It means that when the callback executes the rts instruction, the first 4-byte value on the stack is popped and stored in the program counter register (PC), effectively making the CPU jump to the instruction in run_callback() right after the jump.
- the yield routine is also called through a jmp instruction, from the callback. If it were to execute rts, the PC would be restored to a value pointing inside the callback routine. That's what happens when the counter value is not zero, at 008AFA.
- when the counter hits zero, the yield routine pops the first 4 byte value of the stack into the memory location 2(a0). That value is the return address of the yield routine: the address of the very next instruction after its call. 2(a0) is the memory location of the callback address.
The trick done here is the yield routine has just replaced the object's callback address to "advance" it forward. But the next rts instruction cannot restore the correct return address to return to the callback anymore! Indeed, the last value pushed onto the stack was the return address from run_callbacks() when it called our callback itself. This makes the callback prematurely end and return directly to the run_callbacks() routine, while skipping to this "checkpoint" the next time run_callbacks() handles this object.