Issues in Gensim WordRank Embeddings


I am using the Gensim wrapper to obtain WordRank embeddings (following their tutorial), as follows.
from gensim.models.wrappers import Wordrank
model = Wordrank.train(wr_path = "models", corpus_file="proc_brown_corp.txt",
out_name= "wr_model")
model.save("wordrank")
model.save_word2vec_format("wordrank_in_word2vec.vec")
However, I get the following error: FileNotFoundError: [WinError 2] The system cannot find the file specified. Everything looks correct to me, so I am wondering what I have done wrong. Please help me.
Moreover, I want to know whether the way I am saving the model is correct. I saw that Gensim offers the method save_word2vec_format. What is the advantage of using it instead of using the original WordRank model directly?

FileNotFoundError: [WinError 2] The system cannot find the file specified.
I am going to assume that you got the traceback on this call:
model = Wordrank.train(wr_path = "models", corpus_file="proc_brown_corp.txt",
out_name= "wr_model")
The wr_path argument is supposed to point to where WordRank is installed; more specifically, to the folder where your WordRank binary is saved.
Mine was path_to_wordrank_binary = '/home/ubuntu/wordrank', where wordrank is the folder that contains wordrank.cpp.
Then make sure that your corpus file is in the current directory, since you passed only its file name.
This is the tutorial you should be looking into.
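As a minimal sketch of the two checks above, assuming nothing about WordRank itself: the helper below (check_wordrank_inputs is a hypothetical name, not part of Gensim) only verifies that wr_path is an existing folder and that the corpus file can be found from the current directory, before Wordrank.train is called.

```python
import os


def check_wordrank_inputs(wr_path, corpus_file):
    """Return a list of problems; an empty list means both paths exist.

    Hypothetical helper for illustration only, not a Gensim function.
    """
    problems = []
    if not os.path.isdir(wr_path):
        problems.append("wr_path is not an existing directory: " + os.path.abspath(wr_path))
    if not os.path.isfile(corpus_file):
        problems.append("corpus_file not found: " + os.path.abspath(corpus_file))
    return problems


# Same arguments as in the question; relative paths are resolved
# against the current working directory.
for problem in check_wordrank_inputs("models", "proc_brown_corp.txt"):
    print(problem)
```

If this prints nothing, both paths exist, and the remaining suspect is the content of the wr_path folder (whether it really holds the compiled WordRank binary).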

Related

How to update data_dir and data_path in TF DatasetInfo object?

I'm trying to run a script that builds and loads a TF dataset. The dataset is cityscapes and it is already downloaded and stored in fs/datasets/cityscapes/. I can't move the data. In the directory, there are the following files: ['tfrecord', 'gtFine', 'tfrecord_instances_old', 'README', 'leftImg8bit', 'cityscapesScripts', 'tfrecord_instances', 'license.txt']. An error arises when I try to run dataset = self._dataset_builder.as_dataset(split=self._split, decoders=self._decoders). This error is
AssertionError: Dataset cityscapes: could not find data in /fs/datasets/cityscapes. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
I believe the issue relates to Constructing tf.data.Dataset cityscapes for split train, from /fs/datasets/cityscapes/cityscapes/semantic_segmentation/1.0.0 which is printed before the error. This added path comes from the Cityscapes TFDS DatasetInfo object. If I try to edit the data_dir or data_path in that object with self._dataset_builder.info.data_dir='/fs/datasets/cityscapes', I receive the error message: AttributeError: can't set attribute. So if anyone has a fix, I'd appreciate it.

How to read and write of TFOD2 pipeline.config file by python?

As you have probably seen, TensorFlow Object Detection provides a pipeline.config file for each model. At the moment we have to open these config files manually and hard-code any parameter changes. How can I read a pipeline.config file in Python and change parameters at runtime? Please help me with that.

There's an example in the tutorial notebook.
from object_detection.utils import config_util, save_pipeline_config
pipeline_config = 'configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
configs['model'].ssd.num_classes = 10 # change number of classes
Then, you can save:
save_pipeline_config(configs, 'path/to/save/dir/')
See the source code.

@Nicolas Gervais's answer seems to be a bit outdated.
This seems to be the fully working version right now:
from object_detection.utils import config_util
pipeline_config = 'configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
configs['model'].ssd.num_classes = 10 # change number of classes
Afterwards, you can save your pipeline.config in the following way:
# Convert dictionary to pipeline_pb2.TrainEvalPipelineConfig to be able to save it
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'path/to/save/dir/')

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

I am trying to train PeleeNet in PyTorch and got the following error:
train.py line 80
pelee_voc train configuration

Reading the link provided in @Dwijay's answer, I found a solution that does not require any source code change.
Indeed, I would say it is very dangerous to change the PyTorch source code.
But the idea of modifying the Generator is a good one.
By default the random number generator produces numbers on the CPU, but here we want them on the GPU.
Therefore, one should modify the data loader instantiation so that its generator lives on the default CUDA device.
This is highlighted in this GitHub comment:
data_loader = data.DataLoader(
...,
generator=torch.Generator(device='cuda'),
)
This fix worked for me in PyTorch 1.11 (and worked for this other user in PyTorch 1.10).
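A slightly more portable sketch of the same idea, assuming the error comes from the default tensor device being CUDA while the loader's generator is on the CPU: the generator is created on whatever device new tensors land on by default, so the same script also runs on a CPU-only machine. The toy dataset and the name default_device are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Device on which new tensors are created by default. It is 'cuda' only if
# the script changed the default (e.g. with torch.set_default_tensor_type).
default_device = torch.empty(0).device

# Toy dataset of 8 samples, used only for illustration.
dataset = TensorDataset(torch.arange(8).float().unsqueeze(1))

loader = DataLoader(
    dataset,
    batch_size=4,
    shuffle=True,
    # Keep the shuffling generator on the same device as default tensors.
    generator=torch.Generator(device=default_device),
)

# Collect every sample once to confirm that shuffling still covers the dataset.
seen = sorted(int(x) for (batch,) in loader for x in batch.flatten())
print(seen)
```

I have not tried this variant on every PyTorch version, so treat it as a starting point rather than a guaranteed fix.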

I had the same issue, but on Ubuntu 20.04.
I tried turning shuffle off as mentioned, and that worked, but it is not the correct way, as it will make your training worse.
Keep shuffle on and follow the steps below; they will vary according to the PyTorch version:
In the file "site-packages/torch/utils/data/sampler.py", located in your Anaconda installation or wherever PyTorch is installed:
[Modify line 116]: generator = torch.Generator()
change to generator = torch.Generator(device='cuda')
[Modify line 126]: yield from torch.randperm(n, generator=generator).tolist()
change to yield from torch.randperm(n, generator=generator, device='cuda').tolist()
The line numbers may differ between versions, but the point is to add device='cuda' to these calls.
Hope this helps!

Turning the shuffle parameter off in the dataloader solved it.
Got the answer from here.

I just wrote some quick code to automate @Dwijay Bane's answer:
import os
import inspect
import torch
# Find the location of the torch package
package_path = os.path.dirname(inspect.getfile(torch))
full_path = os.path.join(package_path, 'utils/data/sampler.py')
# Read in the file
with open(full_path, 'r') as file:
    filedata = file.read()
# Replace the target string
filedata = filedata.replace('generator = torch.Generator()', 'generator = torch.Generator(device=\'cuda\')')
filedata = filedata.replace('yield from torch.randperm(n, generator=generator).tolist()', 'yield from torch.randperm(n, generator=generator, device=\'cuda\').tolist()')
# Write the file out again
with open(full_path, 'w') as file:
    file.write(filedata)
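Since the exact lines in sampler.py differ between PyTorch versions, str.replace can silently match nothing. A minimal sketch of a safer variant, assuming the same two target strings: apply_replacements is a hypothetical helper that works on a string and reports which patterns were not found, so a mismatch can be noticed before the file is overwritten.

```python
def apply_replacements(source, replacements):
    """Apply (old, new) pairs to source.

    Returns the patched text and the list of old strings that were not found.
    Hypothetical helper for illustration; it does not touch any file.
    """
    missing = []
    for old, new in replacements:
        if old in source:
            source = source.replace(old, new)
        else:
            missing.append(old)
    return source, missing


# The two edits described in the answer above.
REPLACEMENTS = [
    ("generator = torch.Generator()",
     "generator = torch.Generator(device='cuda')"),
    ("yield from torch.randperm(n, generator=generator).tolist()",
     "yield from torch.randperm(n, generator=generator, device='cuda').tolist()"),
]

# Stand-in for the contents of sampler.py; only the first pattern is present.
sample = "generator = torch.Generator()\n"
patched, missing = apply_replacements(sample, REPLACEMENTS)
print(patched)
print("not found:", missing)
```

If the list of patterns that were not found is non-empty for the real file, the installed version probably uses different code, and writing the file back would not apply the full fix.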

OpenCV 3.4.1 error readNetFromTensorflow Can't open .pb in cv::dnn::ReadProtoFromBinaryFile

I have a problem opening a protobuf file using OpenCV in C++.
I use this code:
cv::String weights = "frozen_inference_graph_face.pb";
cv::String pbtxt = "prototxt.pbtxt";
auto graph = cv::dnn::readNetFromTensorflow(weights, pbtxt);
I get this error:
OpenCV(3.4.1) Error: Unspecified error (FAILED: fs.is_open(). Can't open "frozen_inference_graph_face.pb") in cv::dnn::ReadProtoFromBinaryFile, file C:.hunter_Base\acbf4b9\93b3222\8eb84a0\Build\OpenCV\Source\modules\dnn\src\caffe\caffe_io.cpp, line 1126
It works well when I open it with Python code like this, and it detects the image correctly:
cvNet = cv.dnn.readNetFromTensorflow('frozen_inference_graph.pb', 'prototxt.pbtxt')
I have trained ssd_mobilenet_v1_pets. I cannot understand why I cannot open it with my C++ code, or why the error refers to Caffe when I am using TensorFlow. Maybe the configuration of my OpenCV build is wrong? I set WITH_PROTOBUF=ON and BUILD_opencv_dnn=ON.

Obviously, it is a path problem. You should check the relative path, like this:
model = cv2.dnn.readNetFromCaffe("CarTypeRecognizition/model/vehicle_model.prototxt",
"CarTypeRecognizition/model/vehicle_model.caffemodel")

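To see which file a relative path actually resolves to, a minimal sketch along these lines may help. It assumes nothing about OpenCV: require_file is a hypothetical helper that turns the path into an absolute one and, if the file is missing, reports the working directory it was resolved against.

```python
import os


def require_file(path):
    """Return the absolute form of path, or raise if no such file exists.

    Hypothetical helper for illustration only.
    """
    full = os.path.abspath(path)
    if not os.path.isfile(full):
        raise FileNotFoundError(
            "%s does not exist (current working directory: %s)" % (full, os.getcwd())
        )
    return full
```

The returned absolute paths can then be passed to the loader, for example cv2.dnn.readNetFromTensorflow(require_file('frozen_inference_graph.pb'), require_file('prototxt.pbtxt')). A program started from an IDE often runs with a different working directory than one started from a terminal, which is one way the same relative path can work in one setup and fail in another.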
Cannot figure out how to install and use io_funcs in Python 2.7

I'm currently working through someone else's code and I cannot figure out what "io_funcs" is in Python. The import line is currently throwing an error that the package cannot be found. I'm using Python 2.7.
The original import line is:
from io_funcs.binary_io import BinaryIOCollection
It is used in the context of reading from and writing to binary files, for example:
io_funcs = BinaryIOCollection()
io_funcs.array_to_binary_file(data, file_name)
file_id = os.path.splitext(os.path.basename(file_name))[0]
features, frame_number = io_funcs.load_binary_file_frame(file_name, 63)
io_funcs.array_to_binary_file(gen_features, new_file_name)
I've searched through StackOverflow and done a lot of Googling, and I can't seem to find a reference to io_funcs anywhere.
