I have been trying to load pretrained t5-base from the T5Tokenizer transformer in python. However it is not working after repeated attempts.
The Output shows "None"
!pip install sentencepiece==0.1.91
tokenizer = T5Tokenizer.from_pretrained("t5-base")
print(tokenizer)
The output of the above code is: None
A GitHub page says that version 0.1.91 of the sentencepiece library is required for t5-base. However, it is still not working as you can see in the above image.
What can be done in this case?
Related
I'm trying to work on an ASR model using transfer learning on wav2vec 2 model.
Anyway when I ever I wan't to show or modifiy an audio file I get this problem
def prepare_dataset(batch):
audio = batch["audio"]
# batched output is "un-batched"
batch["input_values"] = processor(audio["array"], sampling_rate=audio["sampling_rate"]).input_values[0]
batch["input_length"] = len(batch["input_values"])
with processor.as_target_processor():
batch["labels"] = processor(batch["sentence"]).input_ids
return batch
common_voice_train = common_voice_train.map(prepare_dataset, remove_columns=common_voice_train.column_names)
common_voice_test = common_voice_test.map(prepare_dataset, remove_columns=common_voice_test.column_names)
The erorrs:
RuntimeError: Backend "sox_io" is not one of available backends: ['soundfile'].
ImportError: To support decoding 'mp3' audio files, please install 'sox'.
This is my pytorch and torchaudio versions:
import torch
import torchaudio
print(torch.__version__)
print(torchaudio.__version__)
1.13.1+cu117
0.13.1+cu117
I really need help fixing this problem, this is part of my junior project! )':
I've trying to installing pytorch and installing deffrent versions but nothing worked the code is working. fine in colab but it's impossible for me to train it there so I have to use visual code...
First, note that the second error message is not from torchaudio and it's not accurate. TorchAudio does not depend on an external sox package.
TorchAudio provides limited IO features on Windows, as libsox does not
compile on Windows with VS2019. This situation is being worked on, but as of v0.13, Windows users need a workaround.
A simple way is to use other libraries like soundfile and convert the decoded NumPy NdArray object into PyTorch Tensor.
Another way is to install FFmpeg, and use torchaudio.io.StreamReader. You can write your own load function, following the tutorial like this.
https://pytorch.org/audio/0.13.1/tutorials/streamreader_basic_tutorial.html#sphx-glr-tutorials-streamreader-basic-tutorial-py
I am trying to follow this tutorial on PyPi (See Example -> Train Model): https://pypi.org/project/top2vec/
Very short amount of code, following it line by line:
from top2vec import Top2Vec
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))
model = Top2Vec(documents=newsgroups.data, speed="learn", workers=8)
I've tried running multiple times on different datasets, yet I keep running into the following error when training/building the model:
UFuncTypeError: ufunc 'correct_alternative_cosine' did not contain a loop with signature matching types <class 'numpy.dtype[float32]'> -> None
Has anyone encountered this error before and if so how have you fixed it? Otherwise, if anyone can run this same code please let me know if you run into the same error.
Thanks
Solved this by moving from a Jupyter notebook in favor for a typical .py file, as well as cloning the library, installing the requirements to a fresh virtualenv and running the setup.py file.
I am trying to play around with a custom object detection model that builds of a pretrained model. All I want is for my model is to detect a specific logo in a picture. The problem is, the guide that I am following is having problems with the libraries.
import tensorflow as tf
from imageai.Detection.Custom import DetectionModelTrainer
trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="/content/drive/MyDrive/Logo_Model2/")
trainer.setTrainConfig(object_names_array=["logo"], batch_size=4, num_experiments=122, train_from_pretrained_model="/content/drive/MyDrive/pretrained-yolov3.h5")
trainer.trainModel()
My error is coming when I import imageai.Detection.Custom import DetectionModelTrainer. I am doing this on google colab and I checked the versions and they seem to be all up to date.
ModuleNotFoundError: No module named 'tensorflow.keras'
Any ideas? I have looked around stack for similar problems yet I haven't been able to resolve my issue. It doesn't seem to be a tensorflow problem since I am able to build my models just fine until I add imageai.
Seems to be an issue with the latest tensorflow==2.8.0. git issue
For now, you can revert back to the older version of tensorflow
pip install tensorflow==2.7
And upgrade imageAI :
pip install imageai --upgrade
I am writing because I have a problem (silly and obvious introduction, I know).
I am trying to use the BERTopic package using the Python interpreter in RStudio and the reticulate extension:
Python 3.6.13 (C:/Users/Francesco/AppData/Local/r-miniconda/envs/r-reticulate/python.exe)
Reticulate 1.18.9008 REPL -- A Python interpreter in R.
I managed to install it with
pip3 install bertopic
At first, trying to install bertopic resulted in an error realating to its hdbscan dependence, specifically to the wheel used; I overcame it by installing hdbscan by conda (with pip the problem appeared unsolvable) and after doing it seemed that both were installed and fine (pip would confirm so).
Afterwards, I tried to follow the package tutorial in Medium/Towards Data Science (here the Colab version I’m following) to get accostumed with the package and to check that everything was working as supposed to.
I am basically copying and pasting the code of Colab on the Python chunks in the RMarkdown file I am using, but when I try to apply the same code of the tutorial to the same dataset used:
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
from bertopic import BERTopic
topic_model = BERTopic(language="english", calculate_probabilities=True, verbose=True)
topics, probs = topic_model.fit_transform(docs)
I get the following error:
Batches: 100%|##########| 589/589 [28:21<00:00, 2.89s/it]
2021-04-29 16:24:25,973 - BERTopic - Transformed documents to Embeddings
2021-04-29 16:24:35,752 - BERTopic - Reduced dimensionality with UMAP
OSError: [Errno 22] Invalid argument
In theory, following the output on colab, I should get:
....................... - BERTopic - Clustered UMAP embeddings with HDBSCAN
Since I had problem with hdbscan I do believe it is somehow related to it, and I read several GitHub and Stackoverflow pages pointing out problems with such a package, but I do not know how to solve this, but I really need to since I need to use package for my thesis.
Can someone help me, please?
PS: it's the first time I am asking stuff on stackoverflow: I hoped I wrote down everything necessary, but if some info is missing, please tell me.
I tried running this command but i get erros that i dont have tenserflow 2.2 or higher. But I checked and I have the correct version of tenserflow. I also did pip3 install keras command
I know for a fact that all of the code is correct because it worked for my teacher the other day and nothing has changed. I just need to run his commands but i keep running into problems
I am doing this course following everything he does in a recorded video so there must be no issue there but for some reason it just doesn't work
just install tensorflow as requested in the last line of the error message: pip install tensorflow. It is needed as backend for Keras.
Also, since keras is part of tensorflow now, I recommend to write imports as from tensorflow.keras.[submodule name] import instead of from keras.[submodule name] import