Questions about approach for background music generation for songs [closed] - python

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a project proposal for music lovers who have no knowledge of audio processing. I think the project is interesting, but I don't have a clear picture of how to implement it.
The project proposal: some people like singing, but they cannot find an appropriate musical accompaniment (background music). People who can play guitar may sing while playing it (the rhythm the guitar provides is the background music). The project aims to achieve a similar result to playing guitar for people who are singing.
I think to implement this project, the following components are required:
Musical knowledge (how a guitar is played as background music; maybe a simple pattern will work)
Signal/audio processing
Key detection
Beat detection
Chord matching
Is there any other component I have missed to achieve my purpose? Are there any libraries that can help me? The project is supposed to be completed in 1.5 months. Is that possible? (I just expect it to work like a guitar beginner playing background music.) For the development language, I will not use C/C++. Currently my favourite is Python, but I would possibly use another programming language as long as it helps simplify the implementation.
I have no musical background and have only studied very basic audio processing. Any suggestions or comments are appreciated.
Edited Information:
I searched for auto accompaniment, and there is some software available, but I didn't find any open-source project for it. I want to know the details of how it processes the audio information. If you know of any open-source project about it, please share your knowledge, thank you.

You might start by considering what a guitarist would have to do to successfully accompany a singer when they have no prior knowledge of the key, chord progression, or rhythm of the song (not to mention its structure, style, etc.).
Doing this in real-time in a situation where the accompanist (human or computer) has not heard the song before will be difficult, as it will take some time to analyse what's being sung in order to make appropriate musical choices about the accompaniment. A guitarist or other musician having this ability in the real world would be considered highly skilled.
It sounds like a very challenging project for 1.5 months if you have no musical background. 'Maybe a simple pattern will work' - maybe, but there are a huge number of possible simple patterns!
Less ambitious projects might be:
record a whole song and analyse it, then render a backing (still a lot of work!)
create a single harmony line or part, in the same way that vocal harmoniser effects do
generate a backing based on a chord progression input by the user
Edit in reply to your first comment:
If you want to generate a full accompaniment, you will need to (as you say) deal with both the key and chord progression, and the timing (including the time signature and detecting which beat of the bar is 'beat 1').
Getting this level of timing information may be difficult, as beat detection from voice alone is not going to be possible using the standard techniques used to extract the beat from a song (looking for amplitude peaks in certain frequency ranges).
You might still get good results by not calculating timing at all, and simply playing your chords in time with the start of the sung notes (or a subset of them).
All you would then need to do is
detect the notes. This post is about detecting pitch in Python: Python frequency detection. Amplitude detection is more straightforward.
come up with an algorithm for working out the root note of the piece (and - more ambitiously - places where it changes). In some cases it may be hard to discern from the melody alone. You could start by assuming that the first note or most common note is the root.
come up with an algorithm for generating a chord progression (do a web search for 'harmonising a melody'). Obviously there is no objectively right or wrong way to do this and you will likely only be able to do this convincingly for a limited range of styles. You might want to start by assuming a limited subset of chords, e.g. I, IV, V. These should work on most simple 'nursery rhyme' style tunes.
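The 'detect the notes' step above can be sketched with a basic autocorrelation pitch estimator. This is a minimal sketch in plain NumPy; the function names and the frequency bounds are my own choices, and a real system would need frame-by-frame analysis, windowing, and a voiced/unvoiced check:

```python
import numpy as np

def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of a short mono frame via
    autocorrelation: the strongest repeat lag gives the pitch period."""
    sig = frame - np.mean(frame)
    # keep only non-negative lags of the full autocorrelation
    corr = np.correlate(sig, sig, mode="full")[len(sig) - 1:]
    lag_min = int(sample_rate / fmax)   # shortest plausible period
    lag_max = int(sample_rate / fmin)   # longest plausible period
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sample_rate / lag

def freq_to_note(freq):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * np.log2(freq / 440.0)))
```

For a full recording you would run this over successive short frames (say 50-100 ms) and keep only frames loud enough to be voiced.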
Of course if you limit yourself to simple tunes that start on beat one, you might have an easier time working out the time signature. In general I think your route to success will be to try to deal with the easy cases first and then build on that.
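For the chord-progression step, a first cut at the I/IV/V idea could simply pick, for each bar, the chord that shares the most pitch classes with the sung melody. The triad table and function below are a hypothetical sketch, not a proper harmonisation algorithm:

```python
# Pitch classes (relative to the key root) of the I, IV and V major triads.
TRIADS = {"I": {0, 4, 7}, "IV": {5, 9, 0}, "V": {7, 11, 2}}

def choose_chord(melody_midi, key_root):
    """Pick the I/IV/V chord sharing the most pitch classes with the
    melody notes of one bar. melody_midi: MIDI note numbers; key_root:
    MIDI number of the tonic (e.g. 60 for C)."""
    pcs = {(n - key_root) % 12 for n in melody_midi}
    return max(TRIADS, key=lambda chord: len(pcs & TRIADS[chord]))
```

Ties and non-chord tones (passing notes) are where this naive version will go wrong; weighting notes by duration or beat position would be an obvious refinement.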

Related

How to match image based on border of image?

This is more of a 'what is this called' kind of question than a technical one. I have recently started playing with PyAutoGUI and I am using it to do some automation. In order to improve the speed of the overall function, I am trying to narrow down the 'region' in which it's looking. How would I identify a region by looking for a specific 'border', ignoring the internal contents? I don't really need any code, unless you're just that bored; I'm just trying to learn what techniques are available to accomplish this task, or maybe some helpful keywords that I can use in my search. I am having a very difficult time finding any resources that relate to my objective.
For example, how would I match the entire dimensions of the following picture regardless of what is inside the frame.

Recognize start of piano music in an MP3 file which starts with a spoken introduction, and remove spoken part, using Python

I have a number of .mp3 files which all start with a short voice introduction followed by piano music. I would like to remove the voice part and just be left with the piano part, preferably using a Python script. The voice part is of variable length, ie I cannot use ffmpeg to remove a fixed number of seconds from the start of each file.
Is there a way of detecting the start of the piano part, and then knowing how many seconds to remove using ffmpeg, or even doing it in Python itself?
Thank you
This is a non-trivial problem if you want a good outcome.
Quick and dirty solutions would involve inferred parameters like:
"there's usually 15 seconds of no or low-db audio between the speaker and the piano"
"there's usually not 15 seconds of no or low-db audio in the middle of the piano piece"
and then use those parameters to try to get something "good enough" using audio analysis libraries.
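A quick-and-dirty version of that heuristic can be sketched in plain NumPy: compute a short-window RMS envelope, find the longest quiet run, and treat the end of that run as the start of the piano. The function name, window size, and thresholds below are all guesses you would need to tune per recording:

```python
import numpy as np

def find_music_start(samples, sample_rate, window_s=0.25,
                     rel_threshold=0.1, min_gap_s=1.0):
    """Return the time in seconds where audio resumes after the longest
    quiet gap, assuming the layout is: speech, a pause, then music."""
    win = int(window_s * sample_rate)
    n = len(samples) // win
    rms = np.sqrt(np.mean(samples[:n * win].reshape(n, win) ** 2, axis=1))
    quiet = rms < rel_threshold * rms.max()
    # locate the longest run of consecutive quiet windows
    best_len, best_end, run = 0, 0, 0
    for i, q in enumerate(quiet):
        run = run + 1 if q else 0
        if run > best_len:
            best_len, best_end = run, i + 1
    if best_len * window_s < min_gap_s:
        return 0.0  # no convincing gap found
    return best_end * window_s
```

You would load the mp3 to a sample array first (e.g. with pydub or librosa) and then pass the result to ffmpeg as the trim point.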
I suspect you'll be disappointed with that approach given that I can think of many piano pieces with long pauses and this reads like a classic ML problem.
The best solution here is to use ML with a classification model and a large data set. Here's a walk-through that might help you get started. However, this isn't going to be a few minutes of coding. This is a typical ML task that will involve collecting and tagging lots of data (or having access to pre-tagged data), building an ML pipeline, training a neural net, and so forth.
Here's another link that may be helpful. He's using a pretrained model to reduce the amount of data required to get started, but you're still going to put in quite a bit of work to get this going.

Using a Decision Tree to build a Recommendations Application

First of all, my apologies if I am not following some of the best practices of this site, as you will see, my home is mostly MSE (math stack exchange).
I am currently working on a project where I build a vacation recommendation system. The initial idea was somewhat akin to 20 questions: We ask the user certain questions, such as "Do you like museums?", "Do you like architecture", "Do you like nightlife" etc., and then based on these answers decide for the user their best vacation destination. We answer these questions based on keywords scraped from websites, and the decision tree we would implement would allow us to effectively determine the next question to ask a user. However, we are having some difficulties with the implementation. Some examples of our difficulties are as follows:
There are issues with the granularity of questions. For example, to say that a city is good for "nature-lovers" is great, but this does not mean much. Nature could involve, say, hot, sunny and wet vacations for some, whereas for others, nature could involve a brisk hike in cool woods. Fortunately, the API we are currently using provides us with a list of attractions in a city, down to a fairly granular level (for example, it distinguishes between different watersport activities such as jet skiing or white water rafting). My question is: do we need to create some sort of hierarchy like:
nature-> (Ocean,Mountain,Plains) (Mountain->Hiking,Skiing,...)
or would it be best to simply include the bottom level results (the activities themselves) and just ask questions regarding those? I only ask because I am unfamiliar with exactly how the classification is done and the final output produced. Is there a better sort of structure that should be used?
Thank you very much for your help.
I think using a decision tree is a great idea for this problem. It might be an idea to group your granular activities, and for the "nature lovers" category list a number of different climate types: dry and sunny, coastal, forests, etc., and have subcategories within them.
For the activities, you could make categories called watersports, sightseeing, etc. It sounds like your dataset is more granular than you want your decision tree to be, but you can keep dividing that granularity down into more categories on the tree until you reach a level you're happy with. It might be an idea to include images of each place and activity too, maybe even without descriptive text.
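The grouping idea can be sketched as a small nested hierarchy that is flattened down to granular activities once the user answers yes to a category, then used to rank cities. The category names, activities, and cities below are all hypothetical:

```python
# Hypothetical hierarchy: top-level category -> subcategory -> activities.
HIERARCHY = {
    "nature": {
        "mountain": ["hiking", "skiing"],
        "ocean": ["jet skiing", "white water rafting", "snorkelling"],
    },
    "culture": {
        "museums": ["art museums", "history museums"],
        "architecture": ["cathedrals", "castles"],
    },
}

def matching_activities(hierarchy, yes_categories):
    """Flatten the hierarchy to the granular activities implied by the
    categories (top-level or sub) the user answered 'yes' to."""
    liked = []
    for top, subs in hierarchy.items():
        for sub, activities in subs.items():
            if top in yes_categories or sub in yes_categories:
                liked.extend(activities)
    return liked

def rank_cities(cities, liked):
    """Score each city by how many liked activities it offers."""
    scores = {c: len(set(acts) & set(liked)) for c, acts in cities.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

This keeps the question-asking at whatever level of the tree you choose, while the final matching still uses the granular API data.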
Bins and sub-bins are a good idea, as is the nature/ocean_nature structure.
I was thinking more about your problem last night; TripAdvisor would be a good source. What I would do is take the top 10 items on TripAdvisor and categorise them by type.
Or maybe your tree narrows it down to 10 cities, and you rank those cities according to popularity or distance from the user.
I'm not sure how to decide which city would be best for watersports, etc. You could even have cities pay to be at the top of the list.

Matching a Pattern in a Region in Sikuli is very slow

I am automating a computer game using Sikuli as a hobby project, and hopefully to get good enough to make scripts to help me at my job. In a certain small region (20x20 pixels), one of 15 characters will appear. Right now I have these 15 images defined as variables, and then, using an if/elif chain, I am calling Region.exists(). If one of my images is present in the region, I assign a variable the appropriate value.
I am doing this for two areas on the screen and then based on the combination of characters the script clicks appropriately.
The problem right now is that to run the 15 if statements is taking approximately 10 seconds. I was hoping to do this recognition in closer to 1 second.
These are just text characters but the OCR feature was not reading them reliably and I wanted close to 100% accuracy.
Is this an appropriate way to do OCR? Is there a better way you guys can recommend? I haven't done much coding in the last 3 years so I am wondering if OCR has improved and if Sikuli is still even a relevant program. Seeing as this is just a hobby project I am hoping to stick to free solutions.
Sikuli operates by scanning a screen, or part of a screen, and attempting to match a set pattern. Naturally, the smaller the pattern is, the more time it will consume to match it. There are a few ways to improve the detection time:
Region and Pattern manipulation (bound region size)
Functions settings (reduce minimum wait time)
Configuration (amend scan rate)
I have described the issue in some more detail here.
OCR is still quite unreliable. There are ways to improve that but if you only have a limited set of characters, I reckon you will be better off using them as patterns. It will be quicker and more reliable.
As for Sikuli itself, the tool is under active development and is still relevant if it helps you solve your problem.

Count the number of people in the video

I am working on an image processing and computer vision project. The project is to count the number of people entering the conference room. This needs to be done with OpenCV or Python.
I have already tried the Haar Cascade that is available in OpenCV for Upper body: Detect upper body portion using OpenCV
However, it does not address the requirement. The link of the videos is as follows:
https://drive.google.com/open?id=0B3LatSCwKo2benZyVXhKLXV6R0U
If you view the sample1 file, at 0:16 seconds a person enters the room, and that is always how it happens. The camera is on top of the door.
Identifying People from this Aerial Video Stream
I think there is a simple way of approaching this problem. Background subtraction methods for detecting moving objects are just what you need because the video you provided seems to only have one moving object at any point: the person walking through the door. Thus, if you follow this tutorial in Python, you should be able to implement a satisfying solution for your problem.
Counting People Entering / Exiting
Now, the first question that pops to my mind is what might I do to count if multiple people are walking through the door at separate time intervals (one person walks in 10 seconds into the video and a second person walks in 20 seconds into the video)? Here's the simplest solution to this consideration that I can think of. Once you've detected the blob(s) via background subtraction, you only have to track the blob until it goes off the frame. Once it leaves the frame, the next blob you detect must be a new person entering the room and thus you can continue counting. If you aren't familiar with how to track objects once they have been detected, give this tutorial a read. In this manner, you'd avoid counting the same blob (i.e., the same person) entering too many times.
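The subtract-then-count logic above can be sketched without any tracking library: subtract a background frame, threshold the difference, and count the rising edges of "a blob is present". This is a deliberately simplified NumPy illustration (real code would use OpenCV's background subtractors and connected components, and the thresholds here are arbitrary):

```python
import numpy as np

def moving_mask(frame, background, thresh=30):
    """Foreground mask via simple background subtraction."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def count_entries(frames, background, min_pixels=20):
    """Count how many times a foreground blob appears and then leaves,
    i.e. rising edges of 'blob present' across the frame sequence."""
    count, present = 0, False
    for frame in frames:
        blob = moving_mask(frame, background).sum() >= min_pixels
        if blob and not present:   # a new blob has entered the view
            count += 1
        present = blob
    return count
```

With OpenCV you would swap moving_mask for cv2.createBackgroundSubtractorMOG2() and gate on contour area instead of a raw pixel count, but the counting logic stays the same.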
The Difficulties in Processing Complex Dynamic Environments
If you think that there is a high level of traffic through that doorway, then the problem becomes much more difficult. This is because in that case there may not be much stationary background to subtract at any given moment, and further there may be a lot of overlap between detected blobs. There is a lot of active research in the area of autonomous pedestrian tracking and identification - so, in short, it's a difficult question that doesn't have a straightforward easy-to-implement solution. However, if you're interested in reading about some of the potential approaches you could take to solving these more challenging problems in pedestrian detection from an aerial view, I'd recommend reading the answers to this question.
I hope this helps, good luck coding!
