A digital image consists of pixels, each of which holds values indicating the intensity of the corresponding colors. If I want to work with images, I can simply read or change pixels. For scientific purposes there is, for example, the PPM format, which encodes each pixel one by one in readable ASCII.
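For reference, a minimal ASCII (P3) PPM file looks like this: a header with the magic number, the width and height, and the maximum channel value, then each pixel as three plain-text numbers (here a 2x1 image with one red and one green pixel):

```
P3
2 1
255
255 0 0   0 255 0
```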
Is there a similar way to read or modify audio files? How is audio edited? What are the building blocks, the smallest parts, the “pixels” of audio recordings? Is there an ASCII sound file format?
This is probably completely off topic, but here you are...
An audio file consists of samples, each representing the air pressure at a certain point in time. At CD quality, that is 44100 samples per second, 16 bits each.
I don't think visualising that as ASCII would be very useful. You would need at least 3 characters per sample, which is 132300 characters per second of sound, or 39690000 (that is, about 40 million) characters for a 5-minute song.
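That said, the samples are easy to get at programmatically. Here is a minimal sketch using Python's standard wave module, assuming a 16-bit mono file ("tone.wav" is a placeholder):

```python
import struct
import wave

# Read the first ten samples of a 16-bit mono WAV file; each sample is
# one signed integer, the audio equivalent of a pixel value.
with wave.open("tone.wav", "rb") as wav:
    print(wav.getframerate(), "samples per second,",
          wav.getsampwidth() * 8, "bits per sample")
    frames = wav.readframes(10)              # raw bytes for 10 frames
    samples = struct.unpack("<10h", frames)  # ten little-endian int16 values
    print(samples)
```

Writing works the same way in reverse: pack your modified integers with struct and hand them to a wave.open(..., "wb") object via writeframes.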
I downloaded this dataset of numbers and other mathematical symbols, which contains roughly 380,000 images split into 80 folders, each named after the symbol it represents. For this machine-learning project I need train and test sets that represent each symbol equally: for example, 1/3 of each symbol folder goes to the test directory and 2/3 to the train directory. I tried many times, but I always ended up with inefficient code that iterated through every item, ran for ages, and never finished.
The dataset:
https://www.kaggle.com/xainano/handwrittenmathsymbols/
The dataset you are using ships with an extract.py script that does this for you automatically.
Scripts info
extract.py
- Extracts trace groups from InkML files.
- Converts extracted trace groups into images. Images are square-shaped bitmaps with only black (value 0) and white (value 1) pixels. Black denotes patterns (ROI).
- Labels those images (according to the InkML files).
- Flattens images to one-dimensional vectors.
- Converts labels to one-hot format.
- Dumps the training and testing sets separately into the outputs folder.
Visit its GitHub repository here: https://github.com/ThomasLech/CROHME_extractor
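If you would rather do the split yourself, here is a minimal sketch, assuming the 80 symbol folders sit under one root directory (the dataset/, train/, and test/ paths are placeholders):

```python
import random
import shutil
from pathlib import Path

SRC = Path("dataset")   # placeholder: root containing the 80 symbol folders
TRAIN = Path("train")
TEST = Path("test")

for symbol_dir in SRC.iterdir():
    if not symbol_dir.is_dir():
        continue
    images = list(symbol_dir.iterdir())
    random.shuffle(images)      # random split within each symbol class
    cut = len(images) // 3      # first third -> test, the rest -> train
    for dest_root, subset in ((TEST, images[:cut]), (TRAIN, images[cut:])):
        dest = dest_root / symbol_dir.name
        dest.mkdir(parents=True, exist_ok=True)
        for image in subset:
            shutil.copy2(image, dest / image.name)
```

If the source and destination are on the same filesystem, shutil.move is much faster than copying, since it only renames the files.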
I am trying to extract images of words from a picture that mostly contains sentences in different typefaces. For example, consider this scenario:
Now I want to extract individual images of the words Clinton, Street, and so on, like this:
I tried applying binary dilation, but the gap between the white and black areas was almost too small to crop out the words. However, I had a little success when I first cropped the blank area out of the original image and then redid the binary dilation on the cropped image with a lower F1 value.
What would be the most accurate approach to separate out images of the words from this picture?
P.S.: I am following this blog post to help me get the task done.
Thank you
Fennec
With dilation, I get this:
Is this unsatisfactory because lines that are too close together get merged by the dilation (as sort of happens for the last two lines)?
Other things to try, off the top of my head:
- Clustering.
- A low-level method where you count the ink pixels in each row to find where the lines are, then count the pixels in each column of a line to find where the words are (see the sketch below).
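To illustrate the second idea, here is a minimal sketch of the projection-profile approach, assuming dark text on a light background ("page.png", the binarization threshold, and min_gap are placeholders to tune):

```python
import numpy as np
from PIL import Image

def segment_runs(profile, threshold=0, min_gap=1):
    """Return (start, end) runs where profile > threshold, merging
    runs separated by gaps narrower than min_gap."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    # Letter gaps inside a word are narrower than word gaps, so merge
    # neighbouring runs whose separation is below min_gap.
    merged = []
    for s, e in runs:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged

img = np.array(Image.open("page.png").convert("L"))  # placeholder filename
ink = (img < 128).astype(int)  # 1 = dark (text), 0 = light background

for top, bottom in segment_runs(ink.sum(axis=1)):    # rows with ink = lines
    line = ink[top:bottom]
    for left, right in segment_runs(line.sum(axis=0), min_gap=8):
        word = img[top:bottom, left:right]           # one word image
        # save or process `word` here
```

The min_gap of 8 pixels is an arbitrary guess; measuring the distribution of gap widths within each line would make it more robust.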
As far as I know, the maximum resolution that can be created with Pillow's Image function is 3000x3000, but I need to programmatically create an image with a resolution of 10000x10000 or more.
If you didn't get what I meant, please comment rather than closing this question, and I will provide more detail!
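For what it's worth, Pillow's Image.new has no fixed 3000x3000 cap in current versions; the practical limit is memory (a 10000x10000 RGB canvas needs about 300 MB for the raw pixel data). A minimal sketch, with a placeholder output filename:

```python
from PIL import Image

# Create a blank 10000x10000 RGB canvas and write it out as PNG.
img = Image.new("RGB", (10000, 10000), color="white")
img.save("big.png")  # placeholder filename
```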
The only ways to create a pixel-perfect SVG from a bitmap are to emit <rect/> elements for each pixel (or each block of same-colored pixels), or to use an <image> element referencing your bitmap. In neither case will you reduce the file size.
A vector format like SVG is not well-suited to representing hand-tweaked pixels. You likely want to use a bitmap format that supports lossless compression, such as PNG. If file size is of critical importance, you may wish to use a tool like OptiPNG to ensure that your PNG files are as small as possible.
I'm developing a little tool to classify musical genres. To do this, I would like to use a k-NN algorithm (or another one, but k-NN seems good enough), and I'm using python-yaafe for feature extraction.
My problem is that when I extract a feature from a song (for example, MFCC), since my songs are sampled at 44100 Hz, I get back a large number of 12-value arrays (one per sample window), and I don't know how to deal with that. Is there an approach that yields just one representative value per feature and per song?
One approach would be to take the RMS energy of the signal as a parameter for classification.
You should use a segment of the music rather than the whole file. Theoretically, the 30-second part of the music starting after the first 30 seconds is the most representative for genre classification.
So instead of taking the whole array, consider only the part that corresponds to this time window, 30 s to 60 s. Calculate the RMS energy of the signal separately for every music file, averaged over the whole window. You may also take other features into account, e.g. MFCC.
To use MFCC, average the values over all signal windows for a particular music file and make a feature vector out of it.
You may use the difference between feature vectors as the distance between data points for classification.
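Putting that together, here is a minimal sketch. It uses librosa rather than python-yaafe (an assumption on my part, since yaafe's API differs), and follows the 30-second window and 12 MFCC coefficients suggested above:

```python
import numpy as np
import librosa  # assumption: librosa instead of python-yaafe

def song_features(path, n_mfcc=12):
    # Load only the 30 s window starting 30 s into the song.
    y, sr = librosa.load(path, offset=30.0, duration=30.0)

    # MFCC gives one n_mfcc-value column per analysis window; averaging
    # over the columns collapses the song to a single n_mfcc-value vector.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    mfcc_mean = mfcc.mean(axis=1)

    # Mean RMS energy over the same window as one extra scalar feature.
    rms_mean = librosa.feature.rms(y=y).mean()

    return np.append(mfcc_mean, rms_mean)  # 13-dimensional feature vector
```

Each song then maps to one fixed-length vector, and plain Euclidean distance between vectors works as the k-NN distance.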
I'm wondering if I can extract a sequence of musical notes from a recorded sound using Python.
It is the first time I'm considering using Python for this.
Help would be truly awesome :)
What you would want to do is take your audio samples, convert them into the frequency domain with a Fast Fourier Transform (FFT), find the most powerful frequency in the sample, and convert that frequency into a note.
See FFT for Spectrograms in Python for pointers to libraries to help with the first two items. See http://80.68.92.234/sigproc.html for some sample code to get you started.
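For a concrete starting point, here is a minimal sketch of that idea using numpy and scipy ("melody.wav" is a placeholder). Keep in mind that the strongest FFT bin is not always the fundamental, so real transcription needs a proper pitch tracker:

```python
import numpy as np
from scipy.io import wavfile

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def dominant_note(samples, rate):
    # A Hann window reduces spectral leakage before the FFT.
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)

    # Strongest frequency bin, skipping the DC component at index 0.
    peak = freqs[1:][np.argmax(spectrum[1:])]

    # Map the frequency to the nearest MIDI note (A4 = 440 Hz = MIDI 69).
    midi = int(round(69 + 12 * np.log2(peak / 440.0)))
    return NOTE_NAMES[midi % 12] + str(midi // 12 - 1)

rate, data = wavfile.read("melody.wav")  # placeholder filename
if data.ndim > 1:
    data = data.mean(axis=1)             # mix stereo down to mono

# Analyse ~0.1 s chunks; each yields the loudest note in that slice.
chunk = rate // 10
notes = [dominant_note(data[i:i + chunk].astype(float), rate)
         for i in range(0, len(data) - chunk, chunk)]
print(notes)
```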