Maybe I was lucky in my tests because I used Ubuntu Studio, which is tuned for low latency audio performance and has lots of useful tools pre-installed and pre-configured.
Other than using Pipewire, a JACK-only solution could be possible:
Run jackd on your input soundcard
Run two instances of an ALSA-to-JACK bridge to bring the two output cards into your JACK session.
Use two stereotool-jack binaries, each with its own distinct file name
Use aj-snapshot after you have connected everything and store the connections
Then create systemd services for each part (jack, the alsa bridges, the stereotools and aj-snapshot).
This will make everything fire up on boot. My recommendation: set all the processes other than jackd to restart automatically on failure.
So for instance, if one ST instance crashes, systemd takes care of it and restarts it after a couple of seconds. aj-snapshot will automagically reconnect all the ports.
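To make the systemd part a bit more concrete, here is a minimal sketch in Python that just renders the unit file text for each piece. Everything specific in it is an assumption on my side and not taken from a tested setup: the binary paths, the device names (hw:0, hw:1, hw:2), the client names, the snapshot file location and the 3 second restart delay are placeholders, and the command lines should be checked against the man pages of your jackd, alsa_out and aj-snapshot versions. It also leaves out things a real unit would need, such as the user account and realtime limits.

```python
# Sketch: render systemd unit files for a JACK-only setup.
# All paths, device names and command lines below are hypothetical
# placeholders; verify them against your own system before use.

def render_unit(description, exec_start, after=None, restart=True, restart_sec=3):
    """Return the text of a simple systemd service unit."""
    lines = ["[Unit]", f"Description={description}"]
    if after:
        # Start only after the named unit, and pull it in as a dependency.
        lines.append(f"After={after}")
        lines.append(f"Requires={after}")
    lines += ["", "[Service]", f"ExecStart={exec_start}"]
    if restart:
        # Restart automatically if the process exits with a failure.
        lines += ["Restart=on-failure", f"RestartSec={restart_sec}"]
    lines += ["", "[Install]", "WantedBy=multi-user.target"]
    return "\n".join(lines) + "\n"


def build_units():
    """Build one unit per part: jackd, two bridges, two ST instances, aj-snapshot."""
    jack = "jackd.service"
    parts = [
        # (unit name, description, command line, depends on, auto-restart)
        ("jackd", "JACK on the input card",
         "/usr/bin/jackd -d alsa -d hw:0", None, False),
        ("bridge-a", "ALSA bridge for output card A",
         "/usr/bin/alsa_out -j out_a -d hw:1", jack, True),
        ("bridge-b", "ALSA bridge for output card B",
         "/usr/bin/alsa_out -j out_b -d hw:2", jack, True),
        ("stereotool-a", "Stereo Tool instance A",
         "/opt/stereotool/stereotool-jack-a", jack, True),
        ("stereotool-b", "Stereo Tool instance B",
         "/opt/stereotool/stereotool-jack-b", jack, True),
        ("aj-snapshot", "Restore stored JACK connections",
         "/usr/bin/aj-snapshot -d /etc/aj-snapshot.xml", jack, True),
    ]
    return {
        f"{name}.service": render_unit(desc, cmd, after=dep, restart=auto)
        for name, desc, cmd, dep, auto in parts
    }


if __name__ == "__main__":
    for filename, text in build_units().items():
        print(f"# ---- {filename} ----")
        print(text)
```

Running the script prints the unit files so you can review them before copying anything into /etc/systemd/system. Only jackd is left without a restart policy, matching the recommendation above.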
The reason I would aim for pipewire is that it simplifies the bridging of the additional audio devices. You don't have to care about two extra processes shoveling the audio back and forth between your jack session and the audio devices. The jack docs explain in detail why this bridging is needed. In short: multiple sound cards WILL run out of sync because their sample clocks drift apart, and in the digital audio domain that causes audible clicks. The bridges resample, acting much like a wire between your audio cards. But that comes at a cost, i.e. CPU power and latency.
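A bit of back-of-the-envelope arithmetic shows why the drift matters even when both cards claim the same sample rate. The sketch below is purely illustrative: the 50 ppm clock offset and the 1024 frame buffer are assumed example values, not measurements of any particular card.

```python
# Sketch: how quickly two free-running sound cards drift apart.
# The ppm offset and buffer size used in the example are assumptions.

def drift_frames_per_second(nominal_rate_hz, offset_ppm):
    """Frames per second by which one card runs ahead of (or behind) the other."""
    return nominal_rate_hz * offset_ppm / 1_000_000


def seconds_until_xrun(buffer_frames, nominal_rate_hz, offset_ppm):
    """Time until an unresampled buffer between the two cards over- or underruns."""
    drift = drift_frames_per_second(nominal_rate_hz, offset_ppm)
    if drift == 0:
        return float("inf")  # perfectly matched clocks never drift
    return buffer_frames / abs(drift)


if __name__ == "__main__":
    rate = 48_000   # nominal sample rate in Hz
    ppm = 50        # assumed clock offset between the two cards
    buf = 1024      # assumed buffer size in frames
    print(drift_frames_per_second(rate, ppm), "frames of drift per second")
    print(round(seconds_until_xrun(buf, rate, ppm)), "seconds until a click")
```

With those assumed numbers the cards slip by a couple of frames every second, so a click would turn up every few minutes without resampling. That gap is what the bridges (or pipewire) have to close.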
Finally, I think you are on the right track with your setup idea.
You might want to tweak the setup by using only a single audio device with enough I/O, so that everything stays within the clock domain of that device. Much less to worry about and less to fail. Like the RME HDSPe AIO:
https://www.rme-audio.de/de_hdspe-aio.html with the expansion to give you more analogue outputs.
I can say from experience that multiple instances of ST in a Jack environment have always worked fine for me as long as there was plenty of CPU power available.