UndefinedMetricWarning. I can't find the answer anywhere - python

In my code I'm calculating the F1 score for my multi-label dataset with the following command:
f1_score_results.append(f1_score(y_train[col], y_pred_train[col_idx], average='macro'))
This code gives me output, but it also produces the error given below.
I also ran the following code:
print(len(np.unique(X_train)))
print(len(np.unique(X_test)))
print(len(np.unique(y_train)))
print(len(np.unique(y_test)))
The output that I'm getting is:
Kindly help me resolve this issue.

The warning appears because, for at least one label, there is nothing to compare against: either the label never occurs in the true values or the model never predicts it, so precision or recall for that label is undefined. scikit-learn then sets the F-score for that label to 0.0 and warns you about it. This typically happens when there is too little data for some labels.
What you should do is try to increase the size of your y_train and y_test datasets, perhaps by imposing fewer conditions on them.
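To see which labels trigger the warning, you can count how often each label appears in the true and predicted values. This is only a rough sketch with made-up data; the helper name labels_with_undefined_f1 is hypothetical and not part of scikit-learn. (Recent scikit-learn versions also accept a zero_division argument in f1_score to control the substituted value, but that only hides the symptom.)

```python
from collections import Counter


def labels_with_undefined_f1(y_true, y_pred):
    """Return labels whose precision or recall has a zero denominator."""
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    report = {}
    for label in sorted(set(true_counts) | set(pred_counts)):
        problems = []
        if pred_counts[label] == 0:
            # No predictions for this label: precision = TP / 0
            problems.append("never predicted (precision undefined)")
        if true_counts[label] == 0:
            # No true samples for this label: recall = TP / 0
            problems.append("never in true labels (recall undefined)")
        if problems:
            report[label] = problems
    return report


# Made-up example: label 2 exists in the true values but is never predicted
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 1, 1, 0]
print(labels_with_undefined_f1(y_true, y_pred))
```

If a label shows up in this report, the model never predicts it or the split does not contain it, and that is the label the warning refers to.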

Related

Tokenizing a Gensim dataset

I'm trying to tokenize a gensim dataset, which I've never worked with before, and I'm not sure if it's a small bug or if I'm not doing it properly.
I loaded the dataset using
model = api.load('word2vec-google-news-300')
and from my understanding, to tokenize using nltk all I need to do is call
tokens = word_tokenize(model)
However, the error I'm getting is "TypeError: expected string or bytes-like object". What am I doing wrong?
word2vec-google-news-300 isn't a dataset that's appropriate to 'tokenize'; it's the pretrained GoogleNews word2vec model released by Google circa 2013 with 3 million word-vectors. It's got lots of word-tokens, each with a 300-dimensional vector, but no multiword texts needing tokenization.
You can run type(model) on the object that api.load() returns to see its Python type, which will offer more clues as to what's appropriate to do with it.
Also, something like nltk's word_tokenize() appears to take a single string; you'd typically not pass it any full large dataset, in one call, in any case. (You'd be more likely to iterate over many individual texts as strings, tokenizing each in turn.)
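As a rough illustration of tokenizing one text at a time, here is a sketch that uses a simple regex-based function as a stand-in for nltk's word_tokenize() (so it runs without nltk; simple_tokenize and the sample texts are made up for this example). It also shows that handing a non-string object to a regex-based tokenizer raises the same kind of TypeError as in the question.

```python
import re


def simple_tokenize(text):
    # Stand-in for nltk's word_tokenize(): runs of word characters,
    # plus each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


# Made-up texts; a real dataset would be iterated the same way
texts = ["The quick brown fox.", "It jumps over the lazy dog!"]
tokenized = [simple_tokenize(t) for t in texts]
print(tokenized)

# Passing something that is not a string (like a loaded model object) fails
try:
    simple_tokenize(object())
except TypeError as err:
    got_type_error = True
    print("TypeError:", err)
else:
    got_type_error = False
```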
Rewind a bit & think more about what kind of dataset you're looking for.
Try to get it in a simple format you can inspect yourself, as files, before doing extra steps. (Gensim's api.load() is really bad/underdocumented for that, returning who-knows-what depending on what you've requested.)
Try building on well-explained examples that already work, making minimal individual changes that you understand individually, checking continued proper operation after each step.
(Also, for future SO questions that may be any more complicated than this: it's usually best to include the full error message you've received, including all lines of 'traceback' context showing involved files and lines-of-code, in order to better point at relevant lines-of-code in your code, or the libraries you're using, that are most-directly involved.)

How to run CRISPR Off-Target-Predictor (CROP)?

I would like to check for off-target sites of my gRNA in the genome sequences of the species I want to examine. I found this method, CROP.
However, I do not know how to run this code, because I have never used Python before.
Could anyone teach me how to run it, step by step?
I really appreciate your help.

Problem in lr_find() in Pytorch fastai course

While following the Jupyter notebooks for the course
I hit upon an error when these lines are run.
I know that the cnn_learner line has no errors whatsoever; the problem lies in the lr_find() part.
It seems that learn.lr_find() does not return two values, although its documentation says that it returns a tuple. That is my problem.
These are the lines of code:
learn = cnn_learner(dls, resnet34, metrics=error_rate)
lr_min,lr_steep = learn.lr_find()
The error says:
not enough values to unpack (expected 2, got 1)
for the second line.
Also, I get this graph with one 'marker', which I suppose is the value of either lr_min or lr_steep.
This is the graph
When I run learn.lr_find() only, i.e. without capturing the output in lr_min, lr_steep, it runs well, but then I do not get the min and steep learning rates (which are really important for me).
I read through what lr_find does and it is clear that it returns a tuple. Its docstring says
Launch a mock training to find a good learning rate and return suggestions based on suggest_funcs as a named tuple
I had duplicated the original notebook, and when I hit this error, I ran the original notebook, with the same results. I updated the notebooks as well, but nothing changed!
Wherever I have searched for this online, no error of this sort has come up. The only relevant thing I found is that lr_find() returns different learning rates on every run, which is perfectly fine.
I was having the same problem and found that the output of lr_find() has been updated. You can replace the second line with lrs = learn.lr_find(suggest_funcs=(minimum, steep, valley, slide)), and then replace lr_min and lr_steep, wherever you use them, with lrs.minimum and lrs.steep respectively. This should solve your problem.
If you want to read more about it, see this post on the fastai forum.
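To show why the unpacking fails, here is a small sketch using plain namedtuples as stand-ins for the result of lr_find() (it does not call fastai; the field names follow the suggestion functions named above and the learning-rate values are made up). It assumes, as the error message in the question indicates, that the default call returns a single suggestion.

```python
from collections import namedtuple

# Hypothetical stand-ins for the named tuple that lr_find() returns
OneSuggestion = namedtuple("SuggestedLRs", ["valley"])
FourSuggestions = namedtuple(
    "SuggestedLRs", ["minimum", "steep", "valley", "slide"]
)

# A result with one field cannot be unpacked into two names
single = OneSuggestion(valley=1e-3)
try:
    lr_min, lr_steep = single
except ValueError as err:
    message = str(err)
    print(message)

# With all four suggestions requested, read the fields by name instead
lrs = FourSuggestions(minimum=1e-2, steep=1e-4, valley=1e-3, slide=5e-3)
print(lrs.minimum, lrs.steep)
```

Accessing the fields by name also keeps the code working if the number or order of suggestions changes again.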

Assistance with Keras for a noise detection script

I'm currently trying to learn more about deep learning/CNNs/Keras through what I thought would be a quite simple project: training a CNN to detect a single specific sound. It's been a lot more of a headache than I expected.
I'm currently reading through this, ignoring the second section about GPU usage; the first part definitely seems like exactly what I need. But when I run the script (which is pretty much lifted from the section in the link above that says "Putting the pieces together, you may end up with something like this:"), it gives me this error:
AttributeError: 'DataFrame' object has no attribute 'file_path'
I can't find anything in the pandas documentation about a DataFrame.file_path function. So I'm confused as to what that part of the code is attempting to do.
My CSV file contains two columns, one with the paths and then a second column denoting the file paths as either positive or negative.
Sidenote: I'm also aware that this entire guide just may not be the thing I'm looking for. I'm having a very hard time finding any material that is useful for the specific project I'm trying to do and if anyone has any links that would be better I'd be very appreciative.
The statement df.file_path means that you want to access the file_path column of your dataframe. It seems that your dataframe object does not contain this column. With df.head() you can check whether your dataframe object contains the needed fields.
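A quick way to confirm this is to look at the header row of the CSV itself. The sketch below uses only the standard library csv module and made-up file contents (the column names path and label are assumptions, not taken from the question); with pandas, looking at df.columns would serve the same purpose.

```python
import csv
import io

# Made-up CSV content; note the header says "path", not "file_path"
csv_text = "path,label\nsounds/a.wav,positive\nsounds/b.wav,negative\n"

# DictReader takes the first row as the column names
reader = csv.DictReader(io.StringIO(csv_text))
columns = reader.fieldnames
print(columns)

# Columns the script expects but the file does not provide
expected = ["file_path"]
missing = [name for name in expected if name not in columns]
print(missing)
```

If the expected name is missing, either rename the column in the CSV or change the script to use the name the file actually has.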

powerlaw Python package errors

I'm currently trying to fit a set of (positive) data with the powerlaw.Fit() function from the powerlaw package. However, every single time I do this I obtain the following message:
<powerlaw.Fit at 0x25eac6d3e80>
I've been trying to figure out what this means for ages, without success. Another issue is that whenever I plot my CCDF using
powerlaw.plot_ccdf()
and my PDF using
powerlaw.plot_pdf()
with my data, I only obtain a plot for the CCDF but nothing for the PDF. Why are all of these things happening? My data is within a NumPy array and looks as follows:
array([ 9.90857053e-06, 3.45336391e-05, 4.06757403e-05, ...,
6.91411789e-02, 6.92511375e-02, 7.45046008e-02])
I doubt there is any kind of issue with my data, since, as I said, I get the plot for the CCDF more than fine. Any kind of help would be highly appreciated. Thanks in advance. (Edit: the data is composed of 1908 non-integer values)
It probably helps to read the documentation. http://pythonhosted.org/powerlaw/
powerlaw.Fit is a class, so when you call powerlaw.Fit(...), you will get an object with associated methods. Save the object in a variable, then pull the results you want from it. For example:
results = powerlaw.Fit(data)
print(results.find_xmin())
The 'message' you are getting is just a placeholder for the Fit object that is created.
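The same behaviour can be reproduced with any class. This sketch uses a hypothetical stand-in class named Fit (it is not the powerlaw API, and n_points is a made-up attribute) to show that the placeholder is just how Python displays an object, and that results only appear once you keep the object and read from it.

```python
class Fit:
    """Hypothetical stand-in for a fitting class; not the powerlaw API."""

    def __init__(self, data):
        self.data = list(data)
        # Made-up attribute, computed when the object is created
        self.n_points = len(self.data)


# Creating the object without printing a result only shows a placeholder
fit = Fit([0.5, 0.1, 0.3])
placeholder = repr(fit)
print(placeholder)  # something like <__main__.Fit object at 0x...>

# Keep the object in a variable and read the results from it
print(fit.n_points)
```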
