Quote:
Quote:
Why are you using the scalar instructions instead of the packed / parallel instructions?
I use the parallel instructions in most of the performance-sensitive code, but not everything is suitable to be parallelized.
In this specific case, the compressor, each sample depends on the samples before it. So I cannot process the next sample until I've processed the current one, and that's why I cannot process samples in parallel. ("Cannot" is a big word: it might be possible to find some tricks, or to parallelize parts of the code. I just haven't seen them yet.)
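To make the dependency concrete, here is a minimal sketch of a generic one-pole peak compressor. This is not the actual compressor discussed here; the function name `compress_block` and the parameters `attack`, `release` and `threshold` are assumptions for illustration only. The point is that `env` at sample `i` is computed from `env` at sample `i-1`, so the iterations cannot simply be run side by side in SIMD lanes.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical one-pole peak compressor (illustration only, not the
 * code from this thread). Processes n samples in place and returns the
 * final envelope so the next block can continue from it. */
static float compress_block(float *x, size_t n, float env,
                            float attack, float release, float threshold)
{
    assert(threshold > 0.0f);
    for (size_t i = 0; i < n; i++) {
        float level = fabsf(x[i]);
        /* Choose the smoothing coefficient: attack when the level rises,
         * release when it falls. */
        float coeff = (level > env) ? attack : release;
        /* Loop-carried dependency: the new env needs the previous env. */
        env = coeff * env + (1.0f - coeff) * level;
        /* Reduce gain only while the envelope exceeds the threshold. */
        if (env > threshold)
            x[i] *= threshold / env;
    }
    return env;
}
```

Because each iteration reads the `env` written by the previous one, a compiler will normally refuse to auto-vectorize this loop. Processing several independent channels at once (one channel per SIMD lane) is one commonly used workaround, but whether that fits this code is not something the thread establishes.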
That makes sense to me, at least given my limited understanding of this type of low-level stuff. I was just wondering if and how that relates to your question over on the Intel forum about portions of the code that won't vectorize.
fft_abs_sse2[2*cc] = max(fft_abs_sse2[2*cc], strength * m);
for (int c=0; c<f1; c++)
{
    temp[2*c] *= one_DIV_bass_static_clip_level_dynamic;
    temp[2*c+1] *= one_DIV_bass_static_clip_level_dynamic;
}
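For what it's worth, the loop above multiplies both elements of every interleaved pair by the same factor, so it touches every element from index 0 to 2*f1-1 exactly once. A minimal sketch of the same operation as a flat, unit-stride loop follows. It assumes `temp` is an array of `float`, which the snippet doesn't show, and `scale_interleaved` is a made-up name. Whether this form helps a particular compiler vectorize is something to check in its vectorization report, not a given.

```c
#include <assert.h>
#include <stddef.h>

/* Scale `pairs` interleaved pairs (2 * pairs floats) by one factor.
 * Same result as scaling temp[2*c] and temp[2*c+1] separately, written
 * as a single unit-stride loop. Assumes float data (not confirmed by
 * the original snippet). */
static void scale_interleaved(float *temp, size_t pairs, float scale)
{
    assert(temp != NULL || pairs == 0);
    for (size_t i = 0; i < 2 * pairs; i++)
        temp[i] *= scale;
}
```

In terms of the original names, the call would be something like `scale_interleaved(temp, f1, one_DIV_bass_static_clip_level_dynamic);`.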