Analyze just Pretty_Midi Instruments - python

I'm trying to figure out a good way of solving this problem, but I wanted to ask what the best approach would be.
In my project, I am looking at multiple instrument note pairs for a neural network. The only problem is that there are multiple instruments with the same name and just because they have the same name doesn't mean that they are the same instrument 100% of the time. (It should be but I want to be sure.)
I personally would like to analyze the instrument itself (like metadata on just the instrument in question) and not the notes associated with it. Is that possible?
I should also mention that I am using pretty-midi to collect the musical instruments.

In MIDI files, bank and program numbers uniquely identify instruments.
In General MIDI, drums are on channel 10 (and, in theory, should not use a Program Change message).
In GM2/GS/XG, the defaults for drums are the same, but can be changed with bank select messages.
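If it helps, here is a minimal sketch of reading only the per-instrument metadata with pretty_midi (the file name is a placeholder). As far as I know, pretty_midi exposes the program number, drum flag, and track name on each Instrument object, but not bank select messages:

```python
import pretty_midi

# Minimal sketch: inspect instrument-level metadata only, without touching notes.
pm = pretty_midi.PrettyMIDI("example.mid")  # placeholder path
for inst in pm.instruments:
    print(
        inst.program,                                         # GM program number (0-127)
        pretty_midi.program_to_instrument_name(inst.program),
        inst.is_drum,                                         # True for the drum channel
        repr(inst.name),                                      # track name stored in the file
    )
```

Two instruments with the same `name` but different `program` / `is_drum` values are almost certainly different instruments; identical values are as close to "the same instrument" as the file itself can tell you.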

Related

How to classify unseen text data?

I am training a text classifier for addresses, i.e. deciding whether a given sentence is an address or not.
Sentence examples :-
(1) Mirdiff City Centre, DUBAI United Arab Emirates
(2) Ultron Inc. <numb> Toledo Beach Rd #1189 La Salle, MI <numb>
(3) Avenger - HEAD OFFICE P.O. Box <numb> India
As addresses can be of many different types, it's very difficult to build such a classifier. Is there any pre-trained model or database for this, or any other non-ML way?
As mentioned earlier, verifying that an address is valid is probably better formalized as an information retrieval problem (e.g. using a lookup service) rather than a machine learning problem.
However, from the examples you gave, it seems like you have several entity types that reoccur, such as organizations and locations.
I'd recommend enriching the data with a NER such as spaCy, and using the entity types as either features or rules.
Note that named-entity recognizers rely more on context than the typical bag-of-words classifier, and are usually more robust to unseen data.
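For example, a minimal sketch of turning spaCy entity types into features (the model name assumes you have downloaded the small English pipeline; the feature names are just illustrative):

```python
import spacy

# Sketch: enrich sentences with spaCy entities and derive simple features/rules.
nlp = spacy.load("en_core_web_sm")  # assumes: python -m spacy download en_core_web_sm

def entity_features(sentence):
    doc = nlp(sentence)
    labels = [ent.label_ for ent in doc.ents]
    return {
        "has_org": "ORG" in labels,
        "has_place": any(l in ("GPE", "LOC", "FAC") for l in labels),
        "num_entities": len(labels),
    }

print(entity_features("Mirdiff City Centre, DUBAI United Arab Emirates"))
```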
When I did this the last time, the problem was very hard, especially since I had international addresses and the variation across countries is enormous. Add to that the variation introduced by people and the problem becomes quite hard even for humans.
I finally built a heuristic (does it contain something like "PO BOX", a likely country name (grep Wikipedia), maybe city names) and then threw every remaining maybe-address into the Google Maps API. Google Maps is quite good at recognizing addresses, but even that will have false positives, so manual checking will most likely be needed.
I did not use ML because my address DB was "large" but not large enough for training; in particular, we lacked labeled training data.
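A rough sketch of that kind of heuristic (the keyword and country lists are illustrative, not what I actually used):

```python
import re

# Illustrative heuristic: keyword hints plus a country-name lookup.
ADDRESS_HINTS = re.compile(
    r"\b(p\.?\s*o\.?\s*box|street|road|avenue|suite|st|rd|ave)\b", re.IGNORECASE
)
COUNTRIES = {"united arab emirates", "india", "united states", "usa"}

def maybe_address(text):
    lowered = text.lower()
    if ADDRESS_HINTS.search(lowered):
        return True
    return any(country in lowered for country in COUNTRIES)

# Anything that passes could then be sent to a geocoding API (e.g. Google Maps)
# for confirmation, with manual spot checks for the remaining false positives.
print(maybe_address("Avenger - HEAD OFFICE P.O. Box India"))
```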
As you are asking for literature recommendations (by the way, this question is probably too broad for this place), I can recommend a few links:
https://www.reddit.com/r/datasets/comments/4jz7og/how_to_get_a_large_at_least_100k_postal_address/
https://www.red-gate.com/products/sql-development/sql-data-generator/
https://openaddresses.io/
You need to build labeled data, as @Christian Sauer has already mentioned, where you have examples with addresses. And you probably need to create negative examples with non-addresses as well! So, for example, you could make sentences containing only telephone numbers or the like. Either way this will be quite an imbalanced dataset, as you will have a lot of correct addresses and only a few which are not addresses. In total you would need around 1000 examples to have a starting point.
Another option is to identify the basic addresses manually and do a similarity analysis to find the sentences that are closest to them.
As mentioned by Uri Goren, the problem is one of named entity recognition, and while there are a lot of trained models on the market, the best one can get is still the Stanford NER.
https://nlp.stanford.edu/software/CRF-NER.shtml
It is a conditional random field (CRF) NER, implemented in Java. If you are looking for a way to use it from Python, have a look at:
How to install and invoke Stanford NERTagger?
Here you can gather info from a sequence of tags such as LOCATION or ORGANIZATION, or any other sequence like that. Even if it doesn't give you exactly the right spans, it will still get you closer to any address in the whole document. That's a head start.
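A small sketch of calling it from Python through NLTK (the model and jar paths are placeholders for wherever you unpacked the Stanford NER download):

```python
from nltk.tag import StanfordNERTagger
from nltk.tokenize import word_tokenize

# Sketch: tag a sentence with the Stanford CRF NER via NLTK's wrapper.
st = StanfordNERTagger(
    "english.all.3class.distsim.crf.ser.gz",  # placeholder: path to the model
    "stanford-ner.jar",                       # placeholder: path to the jar
)
tokens = word_tokenize("Ultron Inc. Toledo Beach Rd La Salle, MI")
print(st.tag(tokens))
# e.g. [('Ultron', 'ORGANIZATION'), ..., ('MI', 'LOCATION')]
```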
Thanks.

How can I detect presence (and calculate extent) of overlapping speakers in audio file?

I have a collection of WAV audio files that contain broadcast recordings, mostly the audio parts of video recordings of news broadcasts etc. (I don't have the original videos). I need to estimate what percentage of those files have overlapping speakers, i.e. when 2 or more people are talking more or less at the same time, and for those files where overlap does occur, what percentage of the audio is overlapping speech. I don't care if it's 2, 3 or 23 people talking at the same time, as long as it's more than 1. Gender, age etc. don't matter either. On the other hand, those recordings are in many different languages, of varying quality, and may also contain background noise (street sounds, music etc.). So this problem seems to be simpler than speaker diarization, but has complicating factors.
So is there a library (preferably Python) or a command line tool that can do this out of the box. One that does not require any supervised training (that is, I don't have any labeled data to train it with). Unsupervised training might be OK, but I prefer to avoid it too.
Thank you
UPDATE: Downstream processing of these files might define the task a bit better: Ultimately, we'll process them with ASR in order to index resulting transcripts for keyword search. When we search for a keyword "blah" in a multi-speaker recording, we won't care which speaker said it as long as any one of them did. Intuitively, getting "blah" correctly from a recording where there are multiple speakers but everyone carefully waits for their turn to speak would be easier than when everyone is speaking at the same time. I am trying to measure how much overlap is in those recordings. Among other things, this will allow me to quantitatively compare 2 sets of such recordings and conclude that one is harder than the other.

Hub and Spoke indication using Python

Situation
Our company generates waste at various locations in the US. The waste is taken to different locations based on the suppliers' treatment methods and the facilities they have placed nationally.
Consider a waste stream A which is generated at location X. The overall cost of taking care of stream A includes the transportation cost from our site as well as the treatment method. This data is tabulated.
What I want to achieve
I would like my Python program to import the Excel table containing this data, plot the distance between our facility and the treatment facility, show it in a hub-and-spoke type picture just like airlines do, and also show the treatment method as a color or similar, just like on Google Maps.
Can someone give me leads on where I should start, or which Python API or module might best suit my scenario?
This is a rather broad question and perhaps not the best fit for SO.
Now to answer it: you can read Excel's CSV exports with the csv module, and plotting is best done with matplotlib.pyplot.
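As a starting point, here is a rough sketch using both (the CSV file name, column names, and treatment categories are assumptions about your table, not anything standard):

```python
import csv
import matplotlib.pyplot as plt

# Sketch: draw hub-and-spoke lines from a CSV exported from the Excel table,
# colored by treatment method.
colors = {"incineration": "red", "landfill": "brown", "recycling": "green"}

with open("waste_streams.csv", newline="") as f:          # placeholder file name
    for row in csv.DictReader(f):
        hub = (float(row["site_lon"]), float(row["site_lat"]))
        spoke = (float(row["facility_lon"]), float(row["facility_lat"]))
        plt.plot([hub[0], spoke[0]], [hub[1], spoke[1]],
                 color=colors.get(row["treatment"], "gray"), marker="o")

plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Waste streams: site to treatment facility")
plt.show()
```

If you want the lines drawn over an actual map rather than plain axes, libraries such as folium or plotly can plot the same coordinates on map tiles.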

Datasets like "The LJ Speech Dataset"

I am trying to find datasets like the LJ Speech Dataset made by Keith Ito. I need to use these datasets with Tacotron 2 (Link), so I think the datasets need to be structured in a certain way. The LJ dataset is linked directly on the Tacotron 2 GitHub page, so I think it's safe to assume it was made to work with it, and that other datasets should therefore have the same structure as LJ Speech. I downloaded the dataset and found that it's structured like this:
main folder:
- wavs/
  - 001.wav
  - 002.wav
  - etc.
- metadata.csv: this file is a CSV file which contains all the things said in every .wav, in a form like this: **001.wav | hello etc.**
So, my question is: are there other datasets like this one for further training?
But I think there might be problems; for example, the voice in one dataset would be different from the voice in another. Would this cause too many problems?
And can different slangs or things like that also cause problems?
There are a few resources:
The main ones I would look at are Festvox (aka CMU Arctic) http://www.festvox.org/dbs/index.html and LibriVox https://librivox.org/
These guys seem to be maintaining a list:
https://github.com/candlewill/Speech-Corpus-Collection
And I am part of a project that is collecting more (shameless self plug): https://github.com/Idlak/Living-Audio-Dataset
Mozilla includes a database of several datasets you can download and use, if you don't need your own custom language or voice: https://voice.mozilla.org/data
Alternatively, you could create your own dataset following the structure you outlined in your OP. The metadata.csv file needs to contain at least two columns -- the first is the path/name of the WAV file (without the .wav extension), and the second column is the text that has been spoken.
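As a sanity check of that format, here is a small sketch that reads an LJ Speech-style metadata.csv (pipe-delimited, WAV name without extension in the first column, spoken text in the second; the paths are placeholders):

```python
import csv

# Sketch: read metadata.csv and pair each WAV path with its transcript.
pairs = []
with open("metadata.csv", encoding="utf-8") as f:
    for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
        wav_name, text = row[0], row[1]
        pairs.append((f"wavs/{wav_name}.wav", text))

print(len(pairs), pairs[0])
```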
Unless you are training Tacotron with speaker embeddings / a multi-speaker model, you'd want all the recordings to be from the same speaker. Ideally, the audio quality should be very consistent with a minimal amount of background noise. Some background noise can be removed using RNNoise. There's a script in the Mozilla Discourse group that you can use as a reference. All the recording files need to be short, 22050 Hz, 16-bit audio clips.
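If your source material isn't already in that format, a minimal conversion sketch using librosa and soundfile (file names are placeholders):

```python
import librosa
import soundfile as sf

# Sketch: resample a clip to 22050 Hz mono and write it as 16-bit PCM.
audio, sr = librosa.load("raw_clip.wav", sr=22050, mono=True)  # placeholder input
sf.write("wavs/001.wav", audio, 22050, subtype="PCM_16")       # placeholder output
```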
As for slang or local colloquialisms -- not sure; I suspect that as long as the word sounds match what's written (i.e. the phonemes match up), the system should be able to handle it. Tacotron is able to handle/train on multiple languages.
If you don't have the resources to produce your own recordings, you could use audio from a permissively licensed audiobook in the target language. There's a tutorial on this very topic here: https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf
The tutorial has you:
Download the audio from the audiobook.
Remove any parts that aren't useful (e.g. the introduction, foreword, etc.) with Audacity.
Use Aeneas to fine-tune and then export a forced alignment between the audio and the text of the e-book, so that the audio can be exported sentence by sentence (a small sketch of the aeneas API follows this list).
Create the metadata.csv file containing the map from audio to segments. (The format that the post describes seems to include extra columns that aren't really needed for training and are mainly for use by Mozilla's online database).
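For the alignment step, a minimal sketch using the aeneas Task API (file paths are placeholders; the config string follows the aeneas documentation):

```python
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# Sketch: force-align an audio chapter with its plain-text transcript.
config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config)
task.audio_file_path_absolute = u"/path/to/chapter01.mp3"         # placeholder
task.text_file_path_absolute = u"/path/to/chapter01.txt"          # placeholder
task.sync_map_file_path_absolute = u"/path/to/chapter01_map.json" # placeholder

ExecuteTask(task).execute()
task.output_sync_map_file()  # writes start/end times for each text fragment
```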
You can then use this dataset with systems that support LJSpeech, like Mozilla TTS.

How to do Log Mining?

In order to figure out (or guess) something from one of our proprietary desktop tools developed with wxPython, I injected a logging decorator into several relevant class methods. Each log record looks like the following:
Right now, there are more than 3M log records in the database and I started to think, "What can I get from this stuff?". I can get some information like:
hit rate of (klass, method) over a period of time (e.g., a week).
power users by record count.
approximate crash rate, by comparing missing closing logs to opening logs.
I guess the relevant technique might be log mining. Does anyone have ideas for further information I can retrieve from this really simple log? I'm really interested in getting something more out of it.
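To make the first three concrete, a rough sketch with pandas (assuming the records can be exported with columns such as timestamp, user, klass, method and an open/close event field; those column names are my guesses, not your actual schema):

```python
import pandas as pd

# Sketch: a few of the aggregates listed above, over an exported log table.
logs = pd.read_csv("tool_logs.csv", parse_dates=["timestamp"])  # placeholder export

# Hit rate of (klass, method) per week
weekly_hits = (
    logs.groupby([pd.Grouper(key="timestamp", freq="W"), "klass", "method"]).size()
)

# Power users by record count
power_users = logs["user"].value_counts().head(20)

# Approximate crash rate: opening logs that never got a matching closing log
events = logs["event"].value_counts()
crash_rate = 1 - events.get("close", 0) / events.get("open", 1)

print(weekly_hits.head(), power_users.head(), crash_rate, sep="\n\n")
```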
SpliFF is right, you'll have to decide which questions are important to you and then figure out if you're collecting the right data to answer them. Making sense of this sort of operational data can be very valuable.
You probably want to start by seeing if you can answer some basic questions, and then move on to the tougher stuff once you have your log collection and analysis workflow established. Some longer-term questions you might consider:
What are the most common, severe bugs being encountered "in the wild", ranked by frequency and impact? Data: capture stack traces / call points and method arguments if possible.
Can you simplify some of the common actions your users perform? If X is the most common, can the number of steps be reduced or can individual steps be simplified? Data: Sessions, clickstreams for the common workflows. Features ranked by frequency of use, number and complexity of steps.
Some features may be confusing or have conflicting options, which lead to user mistakes. Sessions where the user backs up several times to repeat a step, or starts over from the beginning, may be telling.
You may also want to notify users that data is being collected for quality purposes, and even solicit some feedback from within the app's interface.
Patterns!
Patterns preceding failures. Say a failure was logged, now consider exploring these questions:
What was the sequence of klass-method combos that preceded it?
What about other combos?
Is it always the same sequence that precedes the same failures?
Does a sequence of minor failures precede a major failure?
etc
One way to compare patterns is as follows (a short sketch follows the list):
Classify each message
Represent each class/type with a unique ID, so you now have a sequence of IDs
Slice the sequence into time periods to compare
Compare the slices (arrays of IDs) with a diff algorithm
Retain samples of periods to establish the common patterns, then compare new samples for the same periods to establish a degree of anomaly
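A small sketch of that scheme (the message-type IDs, window size, and similarity scoring are purely illustrative; any diff-style comparison would do):

```python
from difflib import SequenceMatcher

# Sketch: represent classified messages as integer IDs, slice into windows,
# and score new windows against windows retained from normal periods.
def windows(type_ids, size=6):
    return [type_ids[i:i + size] for i in range(0, len(type_ids), size)]

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

normal = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]      # e.g. open / act / close pattern
observed = [1, 2, 3, 1, 2, 9, 9, 9, 3, 1, 2, 3]    # 9 = a failure-type message

baseline = windows(normal)
for window in windows(observed):
    score = max(similarity(window, ref) for ref in baseline)
    print(window, round(score, 2))                 # low scores flag anomalous periods
```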
