I downloaded a dataset of numbers and other mathematical symbols containing roughly 380,000 images, split into 80 folders, each named after the symbol it represents. For this machine-learning project I need train and test sets that represent each symbol equally: for example, 1/3 of each symbol folder goes into the test directory and 2/3 into the train directory. I have tried many times, but I always end up with inefficient code that iterates through every single item, runs for ages, and never finishes.
The dataset:
https://www.kaggle.com/xainano/handwrittenmathsymbols/
The dataset you are using comes with an extract.py script that automatically does this for you.
Scripts info
extract.py
Extracts trace groups from inkml files.
Converts extracted trace groups into images. Images are square-shaped bitmaps with only black (value 0) and white (value 1) pixels. Black denotes the pattern (ROI).
Labels those images (according to inkml files).
Flattens images to one-dimensional vectors.
Converts labels to one-hot format.
Dumps training and testing sets separately into outputs folder.
Visit its GitHub repository here: https://github.com/ThomasLech/CROHME_extractor
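If you would rather do the split yourself, a per-class-folder approach with shutil is usually fast enough even for ~380,000 files. A rough sketch (dataset_dir, train_dir and test_dir are placeholder paths; the 1/3 test fraction follows the question):

import os
import random
import shutil

# Placeholder paths -- adjust to your layout (one sub-folder per symbol).
dataset_dir = 'extracted_images'
train_dir = 'data/train'
test_dir = 'data/test'

test_fraction = 1 / 3  # 1/3 of each symbol folder goes to the test set

for symbol in os.listdir(dataset_dir):
    src = os.path.join(dataset_dir, symbol)
    if not os.path.isdir(src):
        continue
    files = os.listdir(src)
    random.shuffle(files)                      # random split within this class
    split = int(len(files) * test_fraction)
    for dst_root, names in ((test_dir, files[:split]), (train_dir, files[split:])):
        dst = os.path.join(dst_root, symbol)
        os.makedirs(dst, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src, name), os.path.join(dst, name))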
I'm trying to write a function that loads a large image dataset of 14,000 images into a variable, but I'm running into memory (RAM) issues.
What I'm trying to make is something like the cifar100.load_data function, but it's not working out for me.
The function I defined looks like this:
import os

import cv2
import numpy as np

def load_data():
    trn_x_names = os.listdir('data/train_x')
    trn_y_names = os.listdir('data/train_y')
    trn_x_list = []
    trn_y_list = []
    # Read every training image into memory at once.
    for image in trn_x_names:
        img = cv2.imread('data/train_x/%s' % image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        trn_x_list.append(img)
    for image in trn_y_names:
        img = cv2.imread('data/train_y/%s' % image)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        trn_y_list.append(img)
    x_train = np.array(trn_x_list)
    y_train = np.array(trn_y_list)
    return x_train, y_train
I load all the images one by one, appending them to the corresponding lists, and at the end I convert those lists to NumPy arrays, assign them to variables and return them. Along the way I run into RAM issues: it consumes 100% of my RAM.
You need to read your images in batches rather than loading the entire data set into memory. If you are using TensorFlow, use ImageDataGenerator.flow_from_directory; documentation is here. If your data is not organized into sub-directories, you will need to write a Python generator that reads the data in batches; you can see how to build such a generator here. Set the batch size to a value, say 30, that will not fill up your memory.
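A rough sketch of the flow_from_directory approach with Keras (the paths and image size are placeholders, and it assumes the data is arranged as one sub-directory per class):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Assumed layout: data/train/<class_name>/*.png -- one sub-folder per class.
datagen = ImageDataGenerator(rescale=1.0 / 255)

train_batches = datagen.flow_from_directory(
    'data/train',            # placeholder path
    target_size=(256, 256),  # placeholder image size
    batch_size=30,           # small enough to stay within memory
    class_mode='categorical')

# model.fit(train_batches, epochs=10)  # batches are read from disk on the fly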
A digital image consists of multiple pixels, each of them has some values which indicate the intensity of the corresponding colors. If I want to work with images I can simply read or change pixels. For scientific purposes there is for example the PPM-Format, which encodes each pixel one by one in readable ASCII format.
Is there a similar way to read or modify audio files? How is audio edited? What are the building blocks, the smallest parts, the “pixels” of audio recordings? Is there an ASCII sound file format?
This is probably completely off topic, but here you are...
An audio file consists of samples, each representing the air movement at a certain point in time. For CD quality, that is 44,100 samples per second, 16 bits each.
I don't think visualising that as ASCII would be very useful. You would need at least 3 characters per sample, which is 132,300 characters per second of sound, or 39,690,000 (about 40 million) characters for a 5-minute song.
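To make the "samples" idea concrete, here is a minimal sketch that reads a WAV file with Python's standard-library wave module and prints the raw sample values (example.wav is a placeholder; it assumes 16-bit PCM):

import wave

import numpy as np

with wave.open('example.wav', 'rb') as wav:          # placeholder file name
    print(wav.getframerate(), 'samples per second')  # e.g. 44100 for CD quality
    frames = wav.readframes(wav.getnframes())

# Each 16-bit sample becomes one signed integer in [-32768, 32767].
samples = np.frombuffer(frames, dtype=np.int16)
print(samples[:10])   # a crude text view of the first few samples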
I'm currently trying to write a bot that plays Tetris on tetrisfriends.com to practice machine learning, but I've become stuck. I'm trying to read the player's score from the game, but Tesseract doesn't recognize the font/numbers, and I don't think I can retrain Tesseract to recognize them either, because it isn't a full font being used, just the digits.
The image that I'm trying to read the numbers from is this:
https://imgur.com/a/OVwV5
When I use Tesseract I can get it to recognize other words on the page, just not the numbers, which are the part I need.
Does anyone have a way to do this, either by retraining Tesseract, another method, or any other way?
I'm not very familiar with Tesseract in particular, but it might not be your best bet here. If the end goal is just to make a bot, you could probably pull the text directly from the app rather than worrying about OCR; but if you want to learn more about machine learning and you haven't used them already, the MNIST and CIFAR-10 datasets are fantastic places to start.
Anyway! The image you're trying to read has very low contrast, and the font is heavily stylised. Looking at the website itself, it appears the characters are coloured yellow:
If you preprocess the image so that yellow pixels become black and all others white, you will have a much cleaner source to work with, e.g.:
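A rough sketch of that preprocessing step with OpenCV (the HSV range used for "yellow" is a guess and would need tuning against the real screenshot; score.png is a placeholder):

import cv2

img = cv2.imread('score.png')                      # placeholder screenshot
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Rough yellow range in OpenCV's HSV space (H runs 0-179); tune as needed.
mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))

# Yellow pixels -> black text, everything else -> white background.
cleaned = cv2.bitwise_not(mask)
cv2.imwrite('score_clean.png', cleaned)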
If you want to push forward with Tesseract for this and the preprocessing isn't enough then you will probably have to retrain it for this font. You will need to prepare a corpus, process it similarly to how you expect your source data to look, and then use something like qt-box-editor to correct the data. This guide should be able to walk you through the basic steps of retraining.
I am trying to extract images of words from a picture that mostly has sentences in different type fonts. For example, consider this scenario:
Now I would want to extract individual images of words Clinton, Street and so on like this:
I tried applying binary dilation, but the distance between the white and black areas was almost negligible, so I could not crop out the words. I had a little more success when I first cropped the blank area out of the original image and then re-ran the binary dilation on the cropped image with a lower F1 value.
What would be the best, high-accuracy approach to separate out images of the words from this picture?
Ps: I am following this blog post to help me get the task done.
Thank you
Fennec
With dilation, I get this:
Is this not satisfactory for you because lines that are too close together may get merged by the dilation (as happens, more or less, with the last two lines)?
Other things to try, off the top of my head:
- clustering;
- a low-level method where you count the number of ink pixels in each pixel row to find out where the text lines are, then count the pixels in each column to figure out where the words are within each line (a rough sketch of this follows).
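A rough sketch of that pixel-counting (projection profile) idea with NumPy and OpenCV, assuming a scan where text pixels are dark (page.png is a placeholder):

import cv2
import numpy as np

img = cv2.imread('page.png', cv2.IMREAD_GRAYSCALE)   # placeholder image
binary = (img < 128).astype(np.uint8)                 # 1 where the pixel is "ink"

# Rows containing ink belong to text lines.
row_counts = binary.sum(axis=1)
line_rows = np.where(row_counts > 0)[0]

# Split consecutive runs of ink rows into individual lines.
lines = np.split(line_rows, np.where(np.diff(line_rows) > 1)[0] + 1)

for rows in lines:
    top, bottom = rows[0], rows[-1] + 1
    line = binary[top:bottom]
    # Columns containing ink belong to words within this line.
    col_counts = line.sum(axis=0)
    word_cols = np.where(col_counts > 0)[0]
    # Gaps wider than a few pixels separate words; tune the threshold.
    words = np.split(word_cols, np.where(np.diff(word_cols) > 5)[0] + 1)
    for cols in words:
        left, right = cols[0], cols[-1] + 1
        word_img = img[top:bottom, left:right]   # one word image to crop/save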
I'm developing a small tool that classifies musical genres. To do this I would like to use a k-NN algorithm (or another one, but this one seems good enough), and I'm using python-yaafe for the feature extraction.
My problem is that when I extract a feature from a song (for example MFCC), since my songs are sampled at 44,100 Hz, I get back a large number (one per analysis window) of 12-value arrays, and I really don't know how to deal with that. Is there an approach to get just one representative value per feature and per song?
One approach would be to take the RMS energy of the signal as a feature for classification.
You should use a segment of the music rather than the whole file for classification. Theoretically, a 30-second segment starting after the first 30 seconds of the track is the most representative for genre classification.
So instead of taking the whole array, consider only the part that corresponds to this time window (the 30-60 s range). Calculate the RMS energy of the signal separately for every music file, averaged over that window. You may also take other features into account, e.g. MFCCs.
To use MFCCs, take the average over all analysis windows for a particular music file and build a feature vector from it.
You can then use the distance between these feature vectors as the distance between data points for classification.
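As a minimal sketch of that pipeline, here it is with librosa standing in for yaafe for feature extraction and scikit-learn for the k-NN step (file names, genres and parameters are placeholders):

import librosa
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def song_features(path):
    # Load only the 30s-60s window, as suggested above.
    y, sr = librosa.load(path, offset=30.0, duration=30.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)   # shape: (12, n_windows)
    rms = librosa.feature.rms(y=y)                        # shape: (1, n_windows)
    # Average over all analysis windows -> one fixed-length vector per song.
    return np.concatenate([mfcc.mean(axis=1), rms.mean(axis=1)])

# Placeholder training data: (file, genre) pairs; use many songs per genre.
training = [('song1.wav', 'rock'), ('song2.wav', 'jazz')]
X = np.array([song_features(f) for f, _ in training])
labels = [genre for _, genre in training]

# k=1 only because the toy list above is tiny; increase k with more songs.
knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(knn.predict([song_features('unknown.wav')]))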