Stereo Tool

Feedback, questions, settings and more...

All times are UTC+01:00




[ 13 posts ]
Author Message
PostPosted: Thu Oct 12, 2017 8:59 pm 

Joined: Sat Nov 12, 2011 7:46 pm
Posts: 165
This is more of a thought experiment. Artificial intelligence and machine learning are talked about everywhere at the moment; almost every aspect of everyday life supposedly has something to do with them. A lot of it is hype, and the hype already seems to be slightly in decline.

So I wanted to ask whether this could be relevant for sound processors like Stereo Tool. The ideal would be to teach the program how to create the ideal sound using good and bad examples. As everywhere in machine learning, you feed the system the information you have, shaped the way you want, and that should work here too. But the "brain" would then become more and more complex, and of course more and more opaque. Strange phenomena that nobody considered beforehand can always appear.

If the machine has learned how to react in different situations (and usually reacts faster than a conventional algorithm), how does it know what it has to do internally? Does it have to be taught what a compressor is, or does it learn that from the examples itself?
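For comparison, a conventional compressor is only a handful of explicit rules. Here is a deliberately minimal, hypothetical sketch (no attack/release smoothing, which any real compressor needs; the function name and parameters are made up for illustration):

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Feed-forward compressor rule: the part of the level above the
    threshold is reduced by `ratio`. Gain (dB) = (T - L) * (1 - 1/R)
    for levels L above threshold T, otherwise 0."""
    out = []
    for s in samples:
        level_db = 20.0 * math.log10(max(abs(s), 1e-9))
        if level_db > threshold_db:
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
        else:
            gain_db = 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out

loud = compress([0.9])[0]    # above threshold: attenuated
quiet = compress([0.05])[0]  # below threshold: passes through unchanged
```

A learning system would have to discover a rule like this on its own from nothing but examples, which is exactly why some background knowledge helps so much.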

Another interesting question, apart from sound processors, concerns everyday work with audio files. There are already examples of source separation: teaching a machine to recognize different sources and try to separate them. Here too, you teach the system what you want, but this usually requires a lot of sample data, and that data is quite difficult to obtain. Let's take a concrete example: say we want to separate music from applause, with both at the same volume. The applause is of course irregular. As human beings we can do this separation because, for example, we can tell the rhythmic structures apart from the non-rhythmic applause, and that works amazingly well. But how do you teach it to the machine? With conventional algorithms there is the problem that rhythmic elements get removed as well, because they have a frequency response similar to the applause. You would have to recognize the rhythm in addition, so to speak; building a model is not really easy in this case.
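For what it's worth, there are classical, non-learned approaches to exactly this rhythmic-versus-irregular problem. The sketch below imitates Fitzgerald-style harmonic/percussive separation on a made-up toy spectrogram: sustained tones are smooth along time, while broadband clicks (a rough stand-in for applause, which is not a clean click) are smooth along frequency, so median filtering along each axis pulls them apart. All names and the tiny spectrogram are illustrative only:

```python
import statistics

def hp_split(spec):
    """Toy harmonic/percussive split of a magnitude spectrogram
    (spec[t][f] = magnitude of frequency bin f at time frame t).
    A median filter along time keeps sustained tones (harmonic part);
    a median filter along frequency keeps broadband clicks (percussive)."""
    def med(xs, i):
        return statistics.median(xs[max(0, i - 1):i + 2])  # 3-point median
    n_t, n_f = len(spec), len(spec[0])
    harmonic = [[med([spec[t2][f] for t2 in range(n_t)], t)
                 for f in range(n_f)] for t in range(n_t)]
    percussive = [[med(spec[t], f) for f in range(n_f)] for t in range(n_t)]
    return harmonic, percussive

# Toy spectrogram: a sustained tone in bin 1, plus a broadband
# applause-like burst covering all bins at frame 2.
spec = [[(1.0 if f == 1 else 0.0) + (1.0 if t == 2 else 0.0)
         for f in range(4)] for t in range(5)]
harmonic, percussive = hp_split(spec)
```

In the result, the harmonic part keeps the tone and drops the burst, and the percussive part does the opposite. Real applause is far messier than this toy burst, which is precisely why the problem stays hard.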

So much for artificial intelligence and machine learning. I find it incredibly exciting, and I hope that there will be some applications for it in the audio sector in the future. I think many things are possible, but what's missing is a good user interface and an innovative system for it. Somehow there are mainly Python scripts that you feed with something.

What's your opinion on that? Will it be possible to benefit from this in this area one day? Is the idea of separating applause and music just wishful thinking, or will something like that be possible at some point?


PostPosted: Fri Oct 13, 2017 10:01 am 

Joined: Tue Sep 15, 2015 12:22 pm
Posts: 82
I think there are two separate worlds here: "conventional" sound processing (like what Stereo Tool does), and sound analysis/restoration (more what you would do once to a damaged recording). Of course Stereo Tool can do the latter as well (e.g. with the declipper), but for these types of reconstruction it doesn't need to know what it is "hearing". It processes sound waves, not sounds.

In the first world a lot depends on taste. A sound that I really like can be terrible to you, and vice versa. So "how to create the ideal sound using good and bad examples" is already very difficult, because what is a good example and what is a bad example? Of course you could say that a song with terrible distortion is a bad example, but hey, some people like that as well ;)

Does the machine need to be taught what a compressor is? Well, it may learn what a compressor is on its own, but it will take a very, very long time. I don't think you can obtain a magic sound processing box just by giving it examples of what you like and what you don't like. It needs some background knowledge in order to do anything useful at all (at least, to do something useful within your lifetime).

Sound separation could be interesting. I think Hans once said "if you can imagine how it should have sounded, you can program it" (freely translated). So is it possible? Probably. Will it sound perfect? Probably not. I'm also not sure the type of AI you are talking about would help here. Just giving it examples of what you want may work, but as you already mention, it will need a lot of feedback before it does something useful. On the other hand, if you build a system that does exactly this (removing applause) with an algorithm that just does some smart measurements, isn't that AI as well? Where do you draw the line? Does AI have to be fully autonomous, or can it be human-steered?

_________________
Trust me. I'm an engineer.


PostPosted: Fri Oct 13, 2017 10:17 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 9014
This may be a good place to share this great anecdote that I heard while studying Computer Science.

There are basically two ways of doing stuff like this: AI and NN. AI stands for Artificial Intelligence, and it means that you have to program all the rules in yourself. It basically means programming something that behaves "as if" it's intelligent, without actually being intelligent. NN is Neural Networks, which means imitating the human brain and letting it learn.

For example, if every time you enter a kitchen you see a microwave, then in your brain the concepts of kitchen and microwave get a stronger and stronger link between them. So when someone says "microwave" you'll automatically imagine a kitchen, because the block of cells for 'microwave' is connected to those for 'kitchen', and they wake each other up when either gets active.

This also explains how children learn to speak: when they are very little they will correctly say "I ate"; once they learn that past tense forms are basically made by adding -ed, they may for some period say "I eated", until they learn that "ate" is an exception to the rule.
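Out of curiosity, that kitchen/microwave link can be sketched in a few lines. This is purely illustrative (a raw Hebbian co-occurrence rule with made-up names, not how modern networks are actually trained): the weight between two concepts grows every time they are active together.

```python
def hebbian_train(episodes, lr=0.1):
    """Toy Hebbian rule: every time two concepts are active in the same
    episode, the link between them gets a little stronger."""
    w = {}
    for active in episodes:
        for a in active:
            for b in active:
                if a < b:  # store each pair once, alphabetically
                    w[(a, b)] = w.get((a, b), 0.0) + lr
    return w

# You see a microwave almost every time you enter a kitchen, but only
# rarely see the kitchen and the garage in the same episode.
episodes = [{"kitchen", "microwave"}] * 8 + [{"garage", "kitchen"}] * 1
w = hebbian_train(episodes)
```

After training, the kitchen/microwave weight dominates the kitchen/garage one, so activating "microwave" would pull "kitchen" along with it.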

The US military wanted to develop a system - based on neural networks - that could analyse photos of a forest and detect tanks in it. So, they drove some tanks into a forest, took photos, removed the tanks, took more photos, divided the photos into a learning set and a control set, and let the software learn from the learning set. After a while, it was detecting photos with tanks with 100% accuracy. So, they fed it the control set - and guess what? It was detecting them with 100% accuracy as well! Yeeeee, done.

So, as a final test, they went out, took some new photos and offered those. And the results were completely wrong. Eventually they found out that in their first set of photos, when the tanks were in the forest, the sun was shining, and after they took the tanks out it was cloudy. So, instead of detecting tanks, it was detecting whether the sun was shining.
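This failure is easy to reproduce in miniature. In the made-up sketch below, the "detector" is just a brightness threshold, standing in for whatever spurious feature the network latched onto: it is perfect on the training photos and gets every new photo wrong.

```python
def best_threshold(xs, ys):
    """Pick the brightness threshold that best separates the labels.
    This stands in for whatever feature the network really latched onto."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted(set(xs)):
        acc = sum((x >= t) == y for x, y in zip(xs, ys)) / len(xs)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Training photos (brightness, tank_present): tanks were photographed in
# sunshine, the empty forest on a cloudy day.
train = [(0.9, True), (0.85, True), (0.8, True),
         (0.3, False), (0.25, False), (0.2, False)]
t = best_threshold([x for x, _ in train], [y for _, y in train])

# New photos: tanks on a cloudy day. The "tank detector" misses them all.
new_photos = [(0.2, True), (0.3, True)]
errors = sum((x >= t) != y for x, y in new_photos)
```

The classifier reaches 100% on both the learning and control sets (both have the same sunshine bias), which is exactly why the problem only shows up on genuinely new data.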

I hope that this shows how difficult it will be to do stuff like this!


PostPosted: Fri Oct 13, 2017 10:20 am 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 9014
Having said that, I *do* have ideas for things that would require neural networks :) But they will take quite a bit of effort (and knowledge that I don't have), so maybe I'll let some graduate student do some of those things at some point.


PostPosted: Fri Oct 13, 2017 3:37 pm 

Joined: Sat Nov 12, 2011 7:46 pm
Posts: 165
Thank you very much for the detailed answers. This clearly shows once again that things that sound relatively simple in theory are sometimes very difficult to implement.

To come back to the example with the applause: that was of course only an extreme example, which has been buzzing around in my head for quite some time. My idea was the following. It is possible, as Hans has already described, to apply such neural networks to images, and that works out pretty well in part. For example, some scientists have succeeded in removing rain from pictures.

Here's the article about it:
https://theoutline.com/post/979/scienti ... e-learning

That seems to work quite impressively. So I thought it should somehow work with audio material too. I did a little research and came across the ISSE program, apparently a research project that Adobe was involved in:

http://isse.sourceforge.net/


The exciting thing about this project is that a relatively simple user interface drives a complex algorithm that separates two sources from each other. This works very impressively.

During my experimental phase I came up with this demanding example: you could try to separate the applause from the music. The approaches in ISSE are not bad either, and you get rid of such noises relatively well. But where it mainly fails is, of course, the small data set. If you had 1000 test recordings with applause and the music, that would be much better. But you would probably need not only the pure applause, but also the combination of music and applause, as well as pure music. That's going to be a lot.

Since I found this program I find the topic very exciting. Unfortunately, there are very few user-friendly programs for training neural networks properly. I often find the material far too technical and mathematical, and I don't quite understand the connections. Maybe there is already some kind of technology that makes this possible, but I can't use it because I don't understand the concepts behind it.

Just asking out of pure curiosity: if you want to teach a neural network something, how do you choose the training data? I imagine it to be relatively complex, because it seems to depend on details, as Hans correctly described with the tank example. For me as a layman it would be interesting to try something like this, but the hurdles seem relatively high if you don't know your way around Matlab or Python.


PostPosted: Fri Oct 13, 2017 4:59 pm 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 9014
Interesting images (de-raining, lol!). Here you can find them in high quality: https://arxiv.org/pdf/1701.05957v1.pdf

Now, here's the thing. If I wanted to remove rain (and had time to write an algorithm for it), what I would probably do is search for gray stripes that are all in the same direction, and then try to filter them out, using a filter that removes stripes in that one direction. (In fact, certain proprietary image processing algorithms that have MASSIVE, beyond-belief effects on image quality work like that.)
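As a toy illustration of that kind of directional filter (my own sketch, not the algorithm from the paper or any proprietary one): a horizontal median removes a one-pixel-wide vertical streak, and it would just as happily remove real detail that is equally thin in that direction.

```python
import statistics

def remove_vertical_streaks(img, k=3):
    """Replace each pixel with the median of its horizontal k-neighbourhood.
    Thin vertical streaks ("rain") vanish; so would any real detail that
    is just as thin in that direction."""
    r = k // 2
    return [[statistics.median(row[max(0, x - r):x + r + 1])
             for x in range(len(row))] for row in img]

# A flat gray image with one bright, 1-pixel-wide vertical streak.
img = [[0.5] * 5 for _ in range(4)]
for row in img:
    row[2] = 1.0
out = remove_vertical_streaks(img)  # the streak is gone: all pixels 0.5
```

Note that a thin bright letter on a shirt would be flattened to 0.5 just like the streak, which is exactly the side effect visible in the paper's images.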

Now obviously, if you do that, details that have stripes in that same direction will be removed as well. And, as expected, if you look at the images and zoom in far enough, you can see exactly that. Compare the Input and CCR image on the top of page 8. If you look at the text on the shirt of the man, or at the bag that he's carrying, or at the logo in the top left of the screen, you can see exactly what I just described: They are all faded out, and the text on the shirt has become almost unreadable, while it is clearly readable in the original. Also check the shirt of the man in the distance on the left - HUGE difference! The splattering raindrops on the bottom are gone as well (although you could argue that that's rain so it should be gone ;) ).

I'm not saying that it's a bad effect - in fact it's quite impressive! I'm just not convinced that you couldn't get a similar result with a normal algorithm without training. And implementing some algorithm that has a similar effect is relatively simple - not at all comparable with a complete audio processor. In the images, you can see that the learning algorithm didn't really notice that the amount of rain is usually constant over the entire image, so in some spots it's filtering out way too many details - more than would be needed to get rid of the rain. So, at least in that regard, a human-made algorithm could outperform this learned one. That might just mean that the data set wasn't big enough, or something like that; it also really depends on how the algorithm learned, etc.

It's still very well possible that training is a better idea if it's faster! I don't know if it is, I didn't read the paper, just scanned through the images.


PostPosted: Fri Oct 13, 2017 5:00 pm 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 9014
Actually, learning probably makes the most sense if you have no idea how to do something. ;)


PostPosted: Sun Oct 15, 2017 12:02 pm 

Joined: Sat Nov 12, 2011 7:46 pm
Posts: 165
Hans, one more question about these topics, directly related to Stereo Tool. Wouldn't it be possible to benefit from this for the Auto EQ feature? It should be possible to distinguish speech from music and then react accordingly; the optimization could then be done in a much more targeted way, matched to the raw material. Or would it even be interesting for the Declipper, Dehummer or Natural Dynamics? If you can identify the material that comes in, you should be able to restore it much better. What do you think?


PostPosted: Tue Oct 17, 2017 9:54 am 

Joined: Tue Sep 15, 2015 12:22 pm
Posts: 82
In that case too, it is very much the question whether AI is the way to go. Detecting speech versus music is probably not that hard, and much easier to implement directly than having a system learn what is speech and what is not.
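As a sketch of how simple such a direct detector could be (a crude, hypothetical cue of my own, not anything Stereo Tool actually uses): speech alternates between voiced bursts and pauses, so its short-term energy fluctuates much more than that of sustained music.

```python
import math

def energy_variation(samples, frame=160):
    """Coefficient of variation of per-frame RMS energy: high for signals
    that alternate between bursts and silence (speech-like), low for a
    sustained signal (music-like)."""
    frames = [samples[i:i + frame]
              for i in range(0, len(samples) - frame + 1, frame)]
    rms = [math.sqrt(sum(s * s for s in f) / frame) for f in frames]
    mean = sum(rms) / len(rms)
    std = math.sqrt(sum((r - mean) ** 2 for r in rms) / len(rms))
    return std / max(mean, 1e-9)

# Toy signals: "music" is a steady tone, "speech" is the same tone with
# regular silent gaps (pauses between words).
music = [math.sin(0.1 * n) for n in range(1600)]
speech = [math.sin(0.1 * n) if (n // 400) % 2 == 0 else 0.0
          for n in range(1600)]
```

A real detector would combine several such features (spectral shape, modulation around 4 Hz, etc.), but even this single hand-made measurement separates the two toy signals cleanly, with no training at all.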

_________________
Trust me. I'm an engineer.


PostPosted: Wed Oct 18, 2017 1:52 pm 
Site Admin

Joined: Mon Mar 17, 2008 1:40 am
Posts: 9014
@near05: For Natural Dynamics it would help (because you really don't want ND to be active on speech). For all the other things, it wouldn't.

Having said that, I would rather improve the ND algorithm to not respond to speech (which should be possible, and it already doesn't do much with it) than make it depend on some detection that may or may not fail. There are other sounds that you also don't want ND to act on, and if there's music with vocals you also don't want ND to do anything to the vocals.

I actually like what Auto EQ does with voice right now.... are you hearing weird things?


Powered by phpBB® Forum Software © phpBB Limited