Chinese SoC update: I likely found the thing that caused all the little inconsistencies and weirdnesses I encountered, and it was my own assumptions, me being mainly a software dev. Lemme explain.
When I started reverse engineering, I went by the SPI trace. That gave me jump and call instructions, as they caused the program counter to change and as such the SPI reads to continue at different addresses. The first one I encountered looked like this:
04 xx xx xx
and I could see that xx was a relative displacement to the current PC; you'd have to add xx to the current PC to get where you want. So I'm like 'Cool, the first byte is the opcode field, and 04 means call'.
Then later, I found a call that called all the way back to the beginning of the flash, but strangely there was no trace of the subroutine called; execution simply continued under the call. It looked like this:
07 xx xx xx
and I was 'Hmm, no execution... perhaps 07 is a conditional call?' Little did I know at that time that the beginning of flash is copied to RAM, and another explanation would have been that routines in RAM don't leave a SPI call trace.
So I continued. I found two types of unconditional jump instructions; I shrugged, thinking I'd probably find the difference later. I found the ALU and memory access instructions, and they looked e.g. like this:
9c yz xx xx = add xx to register (y/2) and store to register z.
And I went 'Hmm, weird that y is shifted by one bit (the y/2), but that leaves it only 3 bits before it bumps into the opcode field, so there must be only 2^3=8 registers.' That also leaves a gap of two bits between the register fields, but those may be flags or something?
The facepalm moment came when I wanted to write my own code, and it didn't work. It was a simple hook: replace a call with a jump to an unused memory address, then do the call from there. Later you move the call down and insert your own code before it. It didn't work. Maybe I used the wrong call? What's the difference between them anyway?
And then it occurred to me. Calls with opcode 0x07 only ever called a memory address *before* the call. Calls with opcode 0x04 only ever called a memory address *after* the call. This thing is made by a digital designer. Why would they use 8 bits for an opcode field, surely not to make the opcodes easy to read in hex? The op field is actually *6* bits, and the 2 lsbs of what I thought was the op field were part of the (26-bit) displacement.
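To make the difference concrete, here's a rough Python sketch of the two readings side by side. The example words are made up (not taken from the trace), and it assumes the bytes form a big-endian 32-bit word and that the displacement is two's complement. Whether the displacement counts bytes or instruction words is left open.

```python
# Sketch only: two ways to split the same 32-bit instruction word.
# Assumptions: big-endian word as written in the dump, two's-complement
# displacement. The example words below are invented for illustration.

def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)

def decode_8bit_opcode(word: int) -> tuple[int, int]:
    """Byte-aligned reading: 8-bit opcode, 24-bit displacement."""
    return word >> 24, sign_extend(word & 0xFFFFFF, 24)

def decode_6bit_opcode(word: int) -> tuple[int, int]:
    """Bit-field reading: 6-bit opcode, 26-bit displacement."""
    return word >> 26, sign_extend(word & 0x3FFFFFF, 26)

forward_call = 0x04000010   # hypothetical call to a later address
backward_call = 0x07FFFFF0  # hypothetical call to an earlier address

for word in (forward_call, backward_call):
    print(hex(word), decode_8bit_opcode(word), decode_6bit_opcode(word))
```

With the byte-aligned split, these look like two different opcodes (4 and 7). With the 6-bit split, both come out as the same opcode, and the only difference is the sign of the displacement.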
That also means that in the instructions that use registers, the register fields could be 5 bits, meaning 32 registers rather than the 8 I thought were there. That would clear up a lot of things. That also clears up the >8 pushes that some functions do on entry. That clears up the fact that I have multiple ld32 instructions that don't seem to do anything differently. That means that my conditional jump situation goes down from a forest of options to 'jump if last compare returned true' and 'jump if last compare returned false'.
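Here's how that might look for the 9c yz xx xx style instructions, as a Python sketch. The field layout is a guess: 6-bit opcode, then two 5-bit register fields packed right behind it, then a 16-bit immediate. The names reg_hi and reg_lo are just placeholders, since which one is source and which is destination isn't settled here, and the example words are made up.

```python
# Sketch only: split a register/immediate instruction into bit fields.
# Assumed layout (a guess, not confirmed):
#   bits 31..26 opcode, 25..21 first register, 20..16 second register,
#   bits 15..0 immediate. Field names are placeholders.

def decode_reg_imm(word: int) -> dict:
    return {
        "opcode": (word >> 26) & 0x3F,
        "reg_hi": (word >> 21) & 0x1F,
        "reg_lo": (word >> 16) & 0x1F,
        "imm16": word & 0xFFFF,
    }

low_regs = 0x9C410004   # hypothetical word; first byte 0x9c, y=4, z=1
high_regs = 0x9D280004  # hypothetical word; first byte 0x9d, y=2, z=8

print(decode_reg_imm(low_regs))
print(decode_reg_imm(high_regs))
```

For the first word, the fields come out as registers 2 and 1, which is exactly what the old (y/2, z) reading gives as long as both registers are below 8. The second word has a different first byte, but under this layout it is the same opcode (0x27) using registers 9 and 8, which would explain "different" instructions that seem to do the same thing.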
Sorry, I'm off, I have a Ghidra cpu spec file to entirely rewrite.