I've been Googling a bit and I now at least know the cause of the artifacts at lower latencies.
Well, I already knew it, but now I know what it's called:
time aliasing. Basically, if I have a short piece of audio and filter it, the audio can 'spread out' a bit. During processing I treat a single block of audio as if it repeats infinitely. So if audio spreads out, it can move into the next or previous repetition, hence 'time aliasing'.
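To illustrate what that wrap-around looks like, here's a toy Python/numpy sketch (not my actual code, just a made-up 16-sample example): multiplying FFTs is circular convolution, so the filter tail that runs off the end of the block shows up again at its start.

import numpy as np

block = np.zeros(16)
block[12] = 1.0                      # an impulse near the end of the block
kernel = np.zeros(16)
kernel[:8] = 0.125                   # an 8-sample smoothing filter

# Filtering via FFT multiplication treats the block as if it repeats forever:
filtered = np.real(np.fft.ifft(np.fft.fft(block) * np.fft.fft(kernel)))
print(np.round(filtered, 3))
# The smeared-out tail appears at samples 0..3: it has wrapped around into
# the 'previous repetition' of the block - that's the time aliasing.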
The solution is simple: add some silence around each block of audio, and then merge the audio that spills into that silence back into the end result.
Unfortunately, that would at least double the CPU load, and I'm still thinking about what effect it would have on latency (probably none, as I need everything outside the audio window to be '0' after Final Limiter/Loudness anyway).
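Roughly, the pad-and-merge idea looks like this (again just a Python/numpy sketch with made-up names, assuming a plain FIR filter rather than the real processing chain) - it's basically textbook overlap-add:

import numpy as np

def filter_blocks_overlap_add(blocks, kernel):
    pad = len(kernel) - 1                      # room for the filter tail
    n = len(blocks[0]) + pad                   # padded FFT size: block + tail fits
    kernel_fft = np.fft.fft(kernel, n)
    carry = np.zeros(pad)                      # tail spilled from the previous block
    out = []
    for block in blocks:
        # fft(block, n) zero-pads the block, so nothing wraps around anymore
        filtered = np.real(np.fft.ifft(np.fft.fft(block, n) * kernel_fft))
        filtered[:pad] += carry                # merge the previous tail back in
        out.append(filtered[:len(block)])      # the 'clean' part of this block
        carry = filtered[len(block):]          # keep the new tail for next time
    return np.concatenate(out)

The doubled CPU load comes from the padding: each FFT now covers roughly twice as many samples for the same amount of output audio.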
I also found one technique to reduce the effect as much as possible - but it turns out that's exactly what I was already doing...
