Top2Vec Model Failing To Train (Following Simple PyPi Tutorial) - python

I am trying to follow this tutorial on PyPI (see Example -> Train Model): https://pypi.org/project/top2vec/
It is a very short amount of code, and I am following it line by line:
from top2vec import Top2Vec
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)
I've tried running multiple times on different datasets, yet I keep running into the following error when training/building the model:
UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None
Has anyone encountered this error before, and if so, how did you fix it? Otherwise, if anyone can run this same code, please let me know whether you run into the same error.
Thanks

Solved this by moving from a Jupyter notebook to a regular .py file, as well as cloning the library, installing its requirements into a fresh virtualenv, and running the setup.py file.
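Roughly, those steps look like this (a sketch only; the repository URL and requirements file name are assumptions based on the standard Top2Vec GitHub layout):
git clone https://github.com/ddangelov/Top2Vec.git   # assumed repo URL
cd Top2Vec
python -m venv venv && source venv/bin/activate      # fresh virtualenv
pip install -r requirements.txt                      # install the requirements
python setup.py install                              # run the setup.py file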

Related

Problem with hdbscan used with bertopic: OSError: [Errno 22] Invalid argument

I am writing because I have a problem (silly and obvious introduction, I know).
I am trying to use the BERTopic package using the Python interpreter in RStudio and the reticulate extension:
Python 3.6.13 (C:/Users/Francesco/AppData/Local/r-miniconda/envs/r-reticulate/python.exe)
Reticulate 1.18.9008 REPL -- A Python interpreter in R.
I managed to install it with
pip3 install bertopic
At first, trying to install bertopic resulted in an error relating to its hdbscan dependency, specifically to the wheel used; I overcame it by installing hdbscan with conda (the command is shown below; with pip the problem appeared unsolvable), and after doing so it seemed that both were installed and fine (pip would confirm so).
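For reference, the conda install looked roughly like this (the conda-forge channel is the one that usually carries prebuilt hdbscan; the exact channel used is an assumption):
conda install -c conda-forge hdbscan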
Afterwards, I tried to follow the package tutorial on Medium/Towards Data Science (here is the Colab version I'm following) to get accustomed to the package and to check that everything was working as it is supposed to.
I am basically copying and pasting the Colab code into the Python chunks of the RMarkdown file I am using, but when I try to apply the same code from the tutorial to the same dataset it uses:
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
from bertopic import BERTopic
topic_model = BERTopic(language="english", calculate_probabilities=True, verbose=True)
topics, probs = topic_model.fit_transform(docs)
I get the following error:
Batches: 100%|##########| 589/589 [28:21<00:00, 2.89s/it]
2021-04-29 16:24:25,973 - BERTopic - Transformed documents to Embeddings
2021-04-29 16:24:35,752 - BERTopic - Reduced dimensionality with UMAP
OSError: [Errno 22] Invalid argument
In theory, following the output on Colab, I should get:
....................... - BERTopic - Clustered UMAP embeddings with HDBSCAN
Since I had problems with hdbscan, I believe the error is somehow related to it. I have read several GitHub and Stack Overflow pages pointing out problems with this package, but I do not know how to solve the error, and I really need to, since I need this package for my thesis.
Can someone help me, please?
PS: it's the first time I am asking something on Stack Overflow. I hope I wrote down everything necessary, but if some info is missing, please tell me.

Importing t5-base from T5Tokenizer fails

I have been trying to load the pretrained t5-base tokenizer with T5Tokenizer from the transformers library in Python. However, it is not working after repeated attempts.
The output shows "None":
!pip install sentencepiece==0.1.91
from transformers import T5Tokenizer  # import added; the original snippet omitted it
tokenizer = T5Tokenizer.from_pretrained("t5-base")
print(tokenizer)
The output of the above code is: None
A GitHub page says that version 0.1.91 of the sentencepiece library is required for t5-base. However, it is still not working, as the None output above shows.
What can be done in this case?
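One thing worth checking (an assumption about the cause, based on how transformers detects optional dependencies, not something stated in the question): transformers decides whether sentencepiece is available at the moment it is imported, so installing sentencepiece into an already-running notebook kernel and then loading the tokenizer can yield None. Restarting the runtime after the install and verifying the versions may help:
# After restarting the runtime/kernel following the pip install:
import sentencepiece
print(sentencepiece.__version__)   # should print 0.1.91

from transformers import T5Tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-base")
print(tokenizer)                   # should now print a tokenizer object, not None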

I am doing docker Jupyter deep learning course and ran in to a problem with importing keras libraries and packages

I tried running this command, but I get errors saying that I don't have tensorflow 2.2 or higher. But I checked, and I have the correct version of tensorflow. I also ran the pip3 install keras command.
I know for a fact that all of the code is correct, because it worked for my teacher the other day and nothing has changed. I just need to run his commands, but I keep running into problems.
I am doing this course following everything he does in a recorded video, so there should be no issue there, but for some reason it just doesn't work.
Just install tensorflow as requested in the last line of the error message: pip install tensorflow. It is needed as the backend for Keras.
Also, since keras is part of tensorflow now, I recommend writing imports as from tensorflow.keras.[submodule name] import ... instead of from keras.[submodule name] import ...; an example is shown below.
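A minimal sketch of that import style (the specific layers here are just illustrative):
# Old style, using the standalone keras package:
#   from keras.models import Sequential
#   from keras.layers import Dense

# Recommended style, using the Keras bundled with TensorFlow:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([Dense(10, activation="relu", input_shape=(4,))])
model.summary()  # works without a separate keras install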

Issue when trying to read MNIST data set

I am starting to learn about Neural Networks, and I want to reproduce a tutorial which trains a Neural Network to identify handwritten digits. The training of the Neural Network should be done with the MNIST data set. Unfortunately, that is exactly where my issue arises, as I am not able to read in the MNIST data set.
The environment I am using is a Jupyter Notebook and Python 3.
These are the lines of code I have (line 2 causes the issue):
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot = True)
Line 2 causes this error message:
ModuleNotFoundError: No module named 'tensorflow.contrib'
OK, what the error tells me is clear: in my tensorflow installation folder, a directory /tensorflow/contrib/... does not exist.
The issue is caused by line 2, as the module input_data.py contains this line of code:
from tensorflow.contrib.learn.python.learn.datasets.mnist import read_data_sets
So, the core of my issue is that I do not know where to get the module read_data_sets from. I searched on GitHub, but the path
/tensorflow/contrib/learn/python/learn/datasets/mnist/
does not exist there.
In detail: the subfolder 'mnist' is not to be found on GitHub, and therefore I also cannot find the file read_data_sets.py.
So, where do I find the missing module 'read_data_sets'?
It would be great if someone could help me, as this issue stops my attempt to deal with Neural Networks at the very beginning.
Thanks a lot and kind regards,
Matthias
It seems that you are using a newer version of tensorflow (>= 1.13.0), so you may follow this link if you want to load the MNIST dataset. A sketch of the newer loading code is below.
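A minimal sketch of the modern approach (tensorflow.contrib was removed in newer TensorFlow releases, and tf.keras.datasets is the usual replacement; to_categorical reproduces the old one_hot=True behavior):
import tensorflow as tf

# Load MNIST via tf.keras; the data is downloaded on first call.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Scale pixels to [0, 1] and one-hot encode the labels,
# matching read_data_sets("/tmp/data/", one_hot=True).
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000, 10)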

Error "ValueError: bad marshal data (unknown type code)" with Python 2.7.13 and Keras 2.0.8

I get the ValueError: bad marshal data (unknown type code) above when trying to load a previously saved Keras model. (I think it's a Python error, though, that has nothing to do with Keras, but I'm not quite sure.)
from keras.models import load_model
from keras import __version__ as keras_version
model = load_model("model.h5")
I searched on Google but didn't find a working solution. I tried deleting the compiled .pyc files with sudo find /usr -name '*.pyc' -delete, but that didn't help either.
Do you have an idea how I can fix this error? Thank you!
I know the post is a bit older, but I just ran into the same problem.
As @Daniel Möller said, it was because I had installed different versions of Python, TensorFlow and Keras. Try training the model again in the same environment that you later use to load the model, or at least make sure that the Python version and the modules used are installed in the same versions in both environments. A quick way to compare them is sketched below.
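A minimal check (run it in both the environment that saved model.h5 and the one that loads it, then compare the output; nothing here is specific to the model):
import sys
import tensorflow as tf
import keras

# These versions should match between the saving and the loading environment.
print("Python     :", sys.version)
print("TensorFlow :", tf.__version__)
print("Keras      :", keras.__version__)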
