Thinking about it twice: while separating the collision plane from the representational one requires a bit of restructuring in the underlying template, there is actually one more method which should be easier to hack on top of the pre-existing one. It is not quite as efficient, though. I'll get to that.
NESmaker uses a tile grid where you check for coarse collisions, just like Zelda or SMB or whatever. Now, you've got plenty of tiles. You can use some of those to call a narrow-phase collision routine.
Some games, especially games that use curved slopes etc., have two phases of collision detection. If a "block" is wholly solid or wholly nonsolid, you can stay in broad-phase collision, which is efficient. When you step on a tile that is supposed to have pixel-for-pixel detection, you step into the narrow phase, which calculates a high-precision collision.
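To make the two phases concrete, here is a rough sketch in Python (just for readability, the real thing would be 6502 assembly). Everything named here (`EMPTY`, `SOLID`, `FINE`, `is_solid`, the tuple-per-tile map layout) is made up for illustration and is not how NESmaker actually stores its collision data.

```python
# Hypothetical tile kinds; NESmaker's own collision table looks different.
EMPTY, SOLID, FINE = 0, 1, 2
TILE = 16  # tile size in pixels


def is_solid(tile_map, fine_masks, px, py):
    """Return True if the pixel (px, py) is blocked."""
    # Broad phase: one lookup in the coarse tile grid.
    kind, mask_id = tile_map[py // TILE][px // TILE]
    if kind == EMPTY:
        return False
    if kind == SOLID:
        return True
    # Narrow phase: only reached on FINE tiles, checks the exact pixel.
    mask = fine_masks[mask_id]
    return mask[py % TILE][px % TILE] == 1


# Example fine mask: lower-left triangle of the tile is solid.
triangle = [[1 if x <= y else 0 for x in range(TILE)] for y in range(TILE)]
fine_masks = [triangle]

# One row of three tiles: empty, fully solid, and a fine-detail tile.
tile_map = [[(EMPTY, None), (SOLID, None), (FINE, 0)]]
```

The point is that the expensive per-pixel lookup only runs when an actor actually stands on a `FINE` tile; everywhere else you pay for a single table read.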
The crux is that you need to be mindful of how many actors need that degree of fine detection simultaneously. You can most likely get away with one or two, probably three, maybe four, depending on the weight of the rest of your/NESmaker's code.
Other drawbacks are there too: you want to limit the number of geometrical shape types (I'd guess 2 per angle, plus maybe a round shape for pillars), or else you wind up with a *lot* of collision tile definitions. If you bitpack them, each narrow-phase collision map for a 16x16 pixel area requires 32 bytes at full resolution (256 bits), or 16 bytes if you can get away with a halved collision resolution on the x-axis (which seems entirely plausible for an isometric game). 16 or 32 bytes doesn't seem like a lot, but consider that multiplied by each shape, and that's not the end of it. Your map data size may need to expand if you need too many special collision tiles, which is worse.
If geometry is kept simple (i.e. straight isometric lines only), it might not be too heavy for the game to compute the collision on the fly, without any stored collision data at all except where the line of collision is supposed to be. That would at least free up all the individual collision micro maps.
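As a sketch of what "computing it" could look like, assuming the usual 2:1 pixel-art isometric slope (2 pixels across per 1 pixel down), the tile only needs to remember a direction and a starting offset. `below_iso_line`, `offset` and `rising` are hypothetical names, and treating everything below the line as solid is just one possible convention.

```python
def below_iso_line(x, y, offset, rising):
    """True if local pixel (x, y) is on or below a 2:1 isometric line.

    offset: y position of the line at local x = 0.
    rising: True if the line climbs toward the right.
    Screen coordinates are assumed, so y grows downward.
    """
    step = x >> 1  # 2:1 slope: one pixel of y per two pixels of x
    if rising:
        line_y = offset - step
    else:
        line_y = offset + step
    return y >= line_y
```

The halving is a single shift, which on the 6502 is one LSR, so the whole check comes down to a shift, an add or subtract, and a compare per actor.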
So just like with everything else on the NES, it is a tradeoff between which features you want to prioritize at the expense of others. But at least this option is more straightforward to jigsaw into the existing RPG/adventure module.
The previously mentioned method (while doable) requires you to separate physics and representation logic, which also changes the object size/structure... not bad at all, but it's a bit more surgical in nature, so it'd depend on one's willingness to carefully modify this, plus how conveniently NM is written with this type of modification in mind.