I am trying to build an image comparison module as part of an open source e-commerce solution. I have been reading about various computer vision techniques using OpenCV and Python.
Objective:
I need to pull out similar images from the thousands of images available on the site. The images are primarily of clothing: shirts, pants, tops, etc.
For example, when someone is looking for a dotted dress, they should see products with a similar pattern, and maybe in the same color.
So far I have seen multiple ways to pull similar images, but due to lack of experience I can't figure out which is the right method. Some of the possible solutions I stumbled upon:
Histogram comparison (see the sketch after this list).
Feature matching (wouldn't this match the patterns?)
Haar classifier (I assume training on a lot of dotted dresses may yield results)
Bag-of-words method.
Texture matching using local binary patterns (LBP)
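For reference, a minimal sketch of the first option (histogram comparison) with OpenCV; the image file names are hypothetical:

```python
import cv2

# Hypothetical product images
img1 = cv2.imread("dress_a.jpg")
img2 = cv2.imread("dress_b.jpg")

# Compare hue/saturation histograms; HSV is more robust to lighting than BGR
hsv1 = cv2.cvtColor(img1, cv2.COLOR_BGR2HSV)
hsv2 = cv2.cvtColor(img2, cv2.COLOR_BGR2HSV)
hist1 = cv2.calcHist([hsv1], [0, 1], None, [50, 60], [0, 180, 0, 256])
hist2 = cv2.calcHist([hsv2], [0, 1], None, [50, 60], [0, 180, 0, 256])
cv2.normalize(hist1, hist1, 0, 1, cv2.NORM_MINMAX)
cv2.normalize(hist2, hist2, 0, 1, cv2.NORM_MINMAX)

# Correlation: 1.0 means identical color distributions
score = cv2.compareHist(hist1, hist2, cv2.HISTCMP_CORREL)
print(f"color similarity: {score:.3f}")
```

Note that this only captures the color distribution; a dotted pattern would additionally need a texture descriptor such as LBP.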
I also saw LIRE, which is based on Lucene, for a similar purpose, but I couldn't understand which of its methods can be used for this. For example, their documentation mentions that LIRE supports the following:
cl_ha .. ColorLayout
ph_ha .. PHOG
oh_ha .. OpponentHistogram
eh_ha .. EdgeHistogram
jc_ha .. JCD
ce_ha .. CEDD
sc_ha .. ScalableColor
Any input/direction on the best approach will be very much appreciated.
I am trying to build a web scraper that can classify the content of a given URL into multiple categories, but I am currently unsure which method is best suited for my use case. Here's the overall use case:
I want to predict a researcher's interests from their biography and categorize them into one or more categories based on the 17 SDG goals. I have three data points to work with:
The biography of each researcher (can be scraped and tokenized)
A list of keywords that are often associated with each of the SDG categories/goals (here's an example of said keywords)
Hundreds of categorizations done manually by students in the form of binary data (here's an example of said data)
So far, we have students who read each researcher's biography and decide which SDG category/goal each researcher belongs to. One researcher can belong to one or more SDG categories. We usually categorize based on how often the SDG keywords listed in our database appear in each researcher's bio.
I have looked up machine learning models for NLP online but couldn't decide which method would work best for my use case. Any suggestions and references would be super appreciated, because I'm a bit lost here.
The problem you have here is multi-label classification, and you can solve it with supervised learning, since you already have a labelled dataset.
A labelled dataset should look something like this:
article 1 - sdg1, sdg2, sdg4
article 2 - sdg4
.
.
.
The implementation is explained in detail here - keras - multi-label-classification
This one has most of the details abstracted away and keeps the implementation simple - fasttext multi-label-classification
More in-depth discussions of these libraries are here:
keras and fasttext
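If you want a quick baseline before committing to Keras or fastText, here is a minimal sketch using scikit-learn; the bios and labels below are made-up stand-ins for your scraped data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up stand-ins for the scraped biographies and the students' labels
bios = [
    "researches rural electrification and solar microgrids",
    "studies gender equality in primary education",
    "works on clean water access and sanitation policy",
]
labels = [["sdg7", "sdg13"], ["sdg4", "sdg5"], ["sdg6"]]

# One binary output column per SDG goal
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# TF-IDF features, with one logistic regression per label
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(bios)
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Predict SDG goals for an unseen biography
new_bio = ["investigates affordable renewable energy storage"]
print(mlb.inverse_transform(clf.predict(vec.transform(new_bio))))
```

OneVsRestClassifier trains one binary classifier per SDG, which is the simplest way to let a single researcher receive several labels at once.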
I am trying to find datasets like the LJ Speech Dataset made by Keith Ito. I need to use these datasets with Tacotron 2 (Link), so I think they need to be structured in a certain way. The LJ dataset is linked directly from the Tacotron 2 GitHub page, so I think it's safe to assume it's made to work with it, and that other datasets should have the same structure as LJ. I downloaded the dataset and found that it's structured like this:
main folder:
-wavs
-001.wav
-002.wav
-etc
-metadata.csv: a CSV file that contains the transcript of every .wav, in a form like this: **001.wav | hello etc.**
So, my question is: are there other datasets like this one for further training?
But I think there might be problems; for example, the voice in one dataset will be different from the voice in another. Would this cause too many problems?
And could different slang or things like that cause problems?
There are a few resources:
The main ones I would look at are Festvox (a.k.a. CMU Arctic) http://www.festvox.org/dbs/index.html and LibriVox https://librivox.org/
These people seem to be maintaining a list:
https://github.com/candlewill/Speech-Corpus-Collection
And I am part of a project that is collecting more (shameless self plug): https://github.com/Idlak/Living-Audio-Dataset
Mozilla provides a collection of several datasets you can download and use, if you don't need your own custom language or voice: https://voice.mozilla.org/data
Alternatively, you could create your own dataset following the structure you outlined in your OP. The metadata.csv file needs to contain at least two columns -- the first is the path/name of the WAV file (without the .wav extension), and the second column is the text that has been spoken.
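For example, a minimal metadata.csv (the file names and sentences are invented for illustration) could look like this:

```
recording_001|Hello and welcome to the demo.
recording_002|This is the second sentence.
```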
Unless you are training Tacotron with speaker embeddings/a multi-speaker model, you'll want all the recordings to be from the same speaker. Ideally, the audio quality should be very consistent, with a minimal amount of background noise. Some background noise can be removed using RNNoise. There's a script in the Mozilla Discourse group that you can use as a reference. All the recording files need to be short 22050 Hz, 16-bit audio clips.
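If your source recordings aren't already in that format, one way to convert them is a sketch like this (assuming librosa and soundfile are installed; the file names are placeholders, and the wavs/ folder must already exist):

```python
import librosa
import soundfile as sf

# Load at 22050 Hz mono (librosa resamples on load), then write 16-bit PCM
audio, sr = librosa.load("raw_take.wav", sr=22050, mono=True)
sf.write("wavs/recording_001.wav", audio, sr, subtype="PCM_16")
```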
As for slang or local colloquialisms -- not sure; I suspect that as long as the spoken sounds match what's written (i.e. the phonemes match up), the system should be able to handle it. Tacotron is able to handle/train on multiple languages.
If you don't have the resources to produce your own recordings, you could use audio from a permissively licensed audiobook in the target language. There's a tutorial on this very topic here: https://medium.com/@klintcho/creating-an-open-speech-recognition-dataset-for-almost-any-language-c532fb2bc0cf
The tutorial has you:
Download the audio from the audiobook.
Remove any parts that aren't useful (e.g. the introduction, foreword, etc.) with Audacity.
Use Aeneas to fine-tune and then export a forced alignment between the audio and the text of the e-book, so that the audio can be exported sentence by sentence.
Create the metadata.csv file containing the map from audio to segments. (The format that the post describes seems to include extra columns that aren't really needed for training and are mainly for use by Mozilla's online database).
You can then use this dataset with systems that support LJSpeech, like Mozilla TTS.
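For step 3, aeneas also exposes a Python API; a forced-alignment task looks roughly like this (the paths are placeholders, and this follows the usage shown in the aeneas README):

```python
from aeneas.executetask import ExecuteTask
from aeneas.task import Task

# One alignment task: audio file + plain-text transcript -> JSON sync map
config = u"task_language=eng|is_text_type=plain|os_task_file_format=json"
task = Task(config_string=config)
task.audio_file_path_absolute = u"/path/to/audiobook_chapter.mp3"
task.text_file_path_absolute = u"/path/to/chapter_text.txt"
task.sync_map_file_path_absolute = u"/path/to/syncmap.json"

# Run the alignment and write the sync map with per-fragment timestamps
ExecuteTask(task).execute()
task.output_sync_map_file()
```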
I am trying to implement U-Net, and I am using https://github.com/jakeret/tf_unet/tree/master/scripts as a reference. I don't understand which dataset they used. Please give me some idea of, or a link to, a dataset I can use.
In their GitHub README.md they show three different datasets that they applied their implementation to. Their implementation is dataset agnostic, so it shouldn't matter too much which data they used if you're trying to solve your own problem with your own data. But if you're looking for a toy dataset to play around with, check out their demos. There you'll see two readily available examples and how they can be used:
demo_radio_data.ipynb, which uses an astronomical radio data example set from here: http://people.phys.ethz.ch/~ast/cosmo/bgs_example_data/
demo_toy_problem.ipynb, which uses their built-in generator of noisy images with circles to be detected.
The latter is probably the easiest one when it comes to just having something to play with. To see how the data is generated, check out the class:
image_gen.py -> GrayScaleDataProvider
(with an IDE like PyCharm you can just jump into the corresponding classes from the demo source code)
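Roughly, the toy demo boils down to something like this; the parameter values are approximations of what the notebook uses, not an exact copy:

```python
from tf_unet import image_gen, unet

# Built-in generator: grayscale images with random circles plus noise
generator = image_gen.GrayScaleDataProvider(572, 572, cnt=20)
x_test, y_test = generator(4)  # draw a batch of 4 images + label masks

# A small U-Net matching the generator's channels and classes
net = unet.Unet(channels=generator.channels,
                n_class=generator.n_class,
                layers=3, features_root=16)
trainer = unet.Trainer(net)
path = trainer.train(generator, "./unet_trained",
                     training_iters=32, epochs=1)
```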
How can I map the categories extracted by my program from a text analysis (using NLP/NLTK or TextBlob) to a standard (or almost standard) taxonomy?
I would prefer open source products.
I would also prefer to download the selected taxonomies (by theme) and work on them offline in Python (rather than use an online service/API).
I've just found this on the subject...
http://www.iab.com/guidelines/iab-quality-assurance-guidelines-qag-taxonomy/
There are several companies which provide REST APIs for classification. I tried the three below, which were applicable for me.
1) AYLIEN - NLP service
2) https://www.uclassify.com/ - providing multiple NLP classifiers
3) https://iab.taxonome.org - I found this one very simple and easy to use; they also have a free trial and some video classification demos
I want to do object extraction from images. For example, I want to count the humans in a picture, find similar pictures in a large database (like Google's example), or determine the setting of a picture (nature, office, home), etc.
Do you know of any Python library or module for doing this work?
If you can, please link me to:
a tutorial or instructions for this work
a similar example project
Perhaps using SimpleCV?
Here is a video of a presenter at PyCon who runs through a quick tutorial on how to use SimpleCV. About halfway through, at 9:50, she demonstrates how to detect faces in an image, which you might be able to use for your project.
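SimpleCV is fairly dated (Python 2 era); if that's a problem, the same face detection can be sketched with plain OpenCV and its bundled Haar cascade instead (the input file name is hypothetical):

```python
import cv2

# opencv-python ships pretrained Haar cascades under cv2.data.haarcascades
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("people.jpg")  # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) box per detected face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f"detected {len(faces)} face(s)")
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("people_annotated.jpg", img)
```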
Try this out: https://github.com/CMU-Perceptual-Computing-Lab/openpose
I used it to detect multiple people and extract their skeleton joints. It's also a little sensitive, so post-processing needs to be done to remove outliers caused by reflections on the floor, glass walls, etc.
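The post-processing I used was along these lines: a sketch (pure NumPy; the thresholds are arbitrary placeholders, not values from OpenPose) that drops low-confidence joints and detections that jump implausibly far between frames:

```python
import numpy as np

def clean_keypoints(frames, conf_thresh=0.3, max_jump_px=80.0):
    """frames: list of (num_joints, 3) arrays of (x, y, confidence),
    as produced per person by OpenPose; returns cleaned copies with
    unreliable joints replaced by NaN."""
    cleaned, prev = [], None
    for kp in frames:
        kp = kp.astype(float).copy()
        kp[kp[:, 2] < conf_thresh] = np.nan  # mask low-confidence joints
        if prev is not None:
            # Distance each joint moved since the previous frame; ghost
            # detections from reflections tend to jump unrealistically far
            jump = np.linalg.norm(kp[:, :2] - prev[:, :2], axis=1)
            kp[jump > max_jump_px] = np.nan
        cleaned.append(kp)
        prev = kp
    return cleaned
```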