How to read and write a TFOD2 pipeline.config file in Python?

As you may have seen, the TensorFlow Object Detection API provides a pipeline.config file for each model. At the moment we have to open these config files manually and hard-code any parameter changes. How can I read a pipeline.config file in Python and change its parameters at runtime? Any help is appreciated.

There's an example in the tutorial notebook.
from object_detection.utils import config_util, save_pipeline_config
pipeline_config = 'configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
configs['model'].ssd.num_classes = 10 # change number of classes
Then, you can save:
save_pipeline_config(configs, 'path/to/save/dir/')
See the source code.

The answer by @Nicolas Gervais seems to be a bit outdated.
This seems to be the fully working version right now:
from object_detection.utils import config_util
pipeline_config = 'configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'
configs = config_util.get_configs_from_pipeline_file(pipeline_config)
configs['model'].ssd.num_classes = 10 # change number of classes
Afterwards, you can save your pipeline.config in the following way:
# Convert dictionary to pipeline_pb2.TrainEvalPipelineConfig to be able to save it
pipeline_proto = config_util.create_pipeline_proto_from_configs(configs)
config_util.save_pipeline_config(pipeline_proto, 'path/to/save/dir/')

Related

Saving image of every python execution in a folder with a serial number as file name

I am a beginner in Python and would like to run code that saves an image into a particular directory.
I would like to save each image with a serial number, so that the directory can hold many images (each execution produces one image).
plt.savefig('Images/Imageplot.png') ## Image saved to particular folder

For the serial number, you can use the uuid library to generate a random, unique ID for each image. See more here: https://www.geeksforgeeks.org/generating-random-ids-using-uuid-python/
Saving the image is a bit more complicated and requires importing the os library.
Here is an example:
import os
UPLOADED_FILE_DIR_PATH = os.path.join(os.path.dirname(__file__), "static", "uploaded-images")
my_file_path = os.path.join(UPLOADED_FILE_DIR_PATH, my_filename)
image_file.save(my_file_path)
This is a block of code I previously used for my website, so it may not apply to you, depending on your situation. I personally like that method, but if you are not satisfied, take a look at this for more options: https://towardsdatascience.com/loading-and-saving-images-in-python-ba5a1f5058fb
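If an incrementing serial number is preferred over a random ID, one possible approach is to scan the target folder and pick the next unused number. The following is only a minimal sketch: the helper name next_serial_path, the "Imageplot" prefix and the four-digit padding are assumptions made for illustration, not something from the question.

```python
import os
import re


def next_serial_path(directory, prefix="Imageplot", ext=".png"):
    """Return a path such as directory/Imageplot_0001.png using the next unused number.

    Hypothetical helper: the naming scheme is an assumption for illustration.
    """
    os.makedirs(directory, exist_ok=True)
    # Match files that follow the scheme <prefix>_<digits><ext>
    pattern = re.compile(re.escape(prefix) + r"_(\d+)" + re.escape(ext) + r"$")
    numbers = []
    for name in os.listdir(directory):
        match = pattern.match(name)
        if match:
            numbers.append(int(match.group(1)))
    # Start at 1 when the folder has no matching files yet
    serial = max(numbers, default=0) + 1
    return os.path.join(directory, f"{prefix}_{serial:04d}{ext}")
```

With matplotlib, the call from the question would then become plt.savefig(next_serial_path('Images')). Note that this sketch is not safe against two scripts running at the same moment, since both could pick the same number.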

RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'

I am trying to train PeleeNet pytorch and got the following error
train.py line 80
pelee_voc train configuration

Reading the link provided in @Dwijay's answer, I found a solution that does not require any source code change.
Changing the PyTorch source code is, I would say, very dangerous.
But the idea of modifying the generator is a good one.
By default, the random number generator produces numbers on the CPU, but we want them on the GPU.
Therefore, one should instead modify the data loader instantiation so that it uses a generator on the default CUDA device.
This is highlighted in this GitHub comment:
data_loader = data.DataLoader(
...,
generator=torch.Generator(device='cuda'),
)
This fix worked for me in PyTorch 1.11 (and worked for this other user in PyTorch 1.10).

I had the same issue, but on Ubuntu 20.04.
I tried turning shuffle off as mentioned, and that worked, but it is not the correct way, as it will make your training worse.
Keep shuffle on and follow the steps below; they will vary with the PyTorch version:
In the file "site-packages/torch/utils/data/sampler.py", located in your Anaconda installation or wherever PyTorch is installed:
[Modify line 116]: generator = torch.Generator()
change to generator = torch.Generator(device='cuda')
[Modify line 126]: yield from torch.randperm(n, generator=generator).tolist()
change to yield from torch.randperm(n, generator=generator, device='cuda').tolist()
The line numbers may differ between versions, but the point is to add device='cuda' to these calls.
Hope this helps!

Turning the shuffle parameter off in the DataLoader solved it.
Got the answer from here.

I just wrote some quick code to automate @Dwijay Bane's answer:
import os
import inspect
import torch
# Find the location of the torch package
package_path = os.path.dirname(inspect.getfile(torch))
full_path = os.path.join(package_path, 'utils/data/sampler.py')
# Read in the file
with open(full_path, 'r') as file:
    filedata = file.read()
# Replace the target strings
filedata = filedata.replace('generator = torch.Generator()', 'generator = torch.Generator(device=\'cuda\')')
filedata = filedata.replace('yield from torch.randperm(n, generator=generator).tolist()', 'yield from torch.randperm(n, generator=generator, device=\'cuda\').tolist()')
# Write the file out again
with open(full_path, 'w') as file:
    file.write(filedata)

SaveData in Paraview python is not saving the file

I split an STL file using ParaView and recorded the steps with ParaView's Python trace.
I then ran the traced code from Python. It runs, but it does not save the split mesh as needed. The code follows the trace obtained from ParaView. Below is the snippet of the code where SaveData is used to save the file. How can I save the STL file?
import sys #sys- append path
import numpy as np
ParaViewBuildPath = "/home/ParaView-5.7.0/"
sys.path.append(ParaViewBuildPath + "lib/")
sys.path.append(ParaViewBuildPath + "lib/python3.7/site-packages")
sys.path.append(ParaViewBuildPath + "lib/python3.7/site-packages/vtkmodules")
from paraview.simple import *
import vtk
# find source
mesh176_rightstl = FindSource('mesh176_right.stl')
generateSurfaceNormals1 = GenerateSurfaceNormals(Input=mesh176_rightstl)
# Properties modified on generateSurfaceNormals1
generateSurfaceNormals1.FeatureAngle = 15.0
# create a new 'Connectivity'
connectivity1 = Connectivity(Input=generateSurfaceNormals1)
# create a new 'Threshold'
threshold1 = Threshold(Input=connectivity1)
#threshold1.Scalars = ['POINTS', 'RegionId']
# Properties modified on threshold1
threshold1.ThresholdRange = [10.0, 982.0]
# create a new 'Extract Surface'
extractSurface1 = ExtractSurface(Input=threshold1)
# save data
SaveData('surf176.stl', proxy=extractSurface1, FileType='Ascii')
The error I am facing comes from the line "generateSurfaceNormals1":
[paraview ]vtkDemandDrivenPipeline:713 ERR| vtkPVCompositeDataPipeline (0x556f782fe7c0): Input port 0 of algorithm vtkPPolyDataNormals(0x556f7a16b2a0) has 0 connections but is not optional.
How to overcome this error?
Any leads will be appreciated.
Regards,
Sunag R A.

The error message means that mesh176_rightstl is None, so FindSource does not find anything. Is the source name correct? Is the data correctly loaded?
Because an error is raised, the script stops and SaveData is never called. Its syntax is correct, though.
Minimal code to test stl writer:
s = Sphere()
SaveData('sphere.stl', proxy = s, FileType='Ascii')
It correctly produces the STL file with ParaView 5.9.
Edit:
You should uncomment the line
#threshold1.Scalars = ['POINTS', 'RegionId']
Because the pipeline is not executed until you ask for it (e.g. with SaveData), no default array can be found at the time the Threshold is created.

Where are the `tfds.load` datasets saved?

I downloaded the cats vs dogs dataset using tfds.load('cats_vs_dogs') and want to find where it has been saved on my computer. After reading a bit, I came across a claim that the dataset can be found at ~/tensorflow_datasets/cats_vs_dogs/, but there is no folder called cats_vs_dogs at that path. How can I get the path to the files?

By default, assuming TFDS_DATA_DIR has not been set, datasets are stored under ~/tensorflow_datasets.
However, this depends on your system and setup. If you want to inspect the dataset, I would suggest setting data_dir manually when calling tfds.load. Then you know for sure where it is stored.
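The lookup order described in this answer can be sketched with the standard library alone, which is handy for checking which folder to look in before loading anything. This is only an illustration: expected_tfds_data_dir is a hypothetical helper, not part of the TFDS API, and it assumes the order is an explicit data_dir first, then the TFDS_DATA_DIR environment variable, then ~/tensorflow_datasets.

```python
import os


def expected_tfds_data_dir(data_dir=None):
    """Guess where TFDS would store data (hypothetical helper, not a TFDS API)."""
    if data_dir:
        # An explicit data_dir passed to tfds.load is assumed to take priority
        return os.path.expanduser(data_dir)
    env_dir = os.environ.get("TFDS_DATA_DIR")
    if env_dir:
        # Otherwise fall back to the environment variable, if it is set
        return os.path.expanduser(env_dir)
    # Default location mentioned in the answer
    return os.path.join(os.path.expanduser("~"), "tensorflow_datasets")


print(expected_tfds_data_dir())
```

Listing the printed folder with os.listdir then shows which datasets have actually been prepared there.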

You can use this:
import tensorflow_datasets as tfds
tfds.core.get_tfds_path('cats_vs_dogs')
'C:/Users/user/anaconda3/envs/env/lib/site-packages/tensorflow_datasets/cats_vs_dogs'

You can also set the download folder yourself:
data_dir = 'D:\\Sandbox\\Github\\DATA_TFDS'
tfds.load(name='mnist',
          split=['train', 'test'],
          shuffle_files=True,
          data_dir=data_dir,
          with_info=True,
          download=True)

Visualize Gensim Word2vec Embeddings in Tensorboard Projector

I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them into text, and tried importing it on tensorflow's live model of the embedding projector. One problem. It didn't work. It told me that the tensors were improperly formatted. So, being a beginner, I thought I would ask some people with more experience about possible solutions.
Equivalent to my code:
import gensim
corpus = [["words","in","sentence","one"],["words","in","sentence","two"]]
model = gensim.models.Word2Vec(iter = 5,size = 64)
model.build_vocab(corpus)
# save memory
vectors = model.wv
del model
vectors.save_word2vec_format("vect.txt",binary = False)
That creates the model, saves the vectors, and then prints the results out nice and pretty in a tab delimited file with values for all of the dimensions. I understand how to do what I'm doing, I just can't figure out what's wrong with the way I put it in tensorflow, as the documentation regarding that is pretty scarce as far as I can tell.
One idea that has been presented to me is implementing the appropriate tensorflow code, but I don’t know how to code that, just import files in the live demo.
Edit: I have a new problem now. The object I have my vectors in is non-iterable because gensim apparently decided to make its own data structures that are non-compatible with what I'm trying to do.
Ok. Done with that too! Thanks for your help!

What you are describing is possible. What you have to keep in mind is that Tensorboard reads from saved tensorflow binaries which represent your variables on disk.
More information on saving and restoring tensorflow graph and variables here
The main task is therefore to get the embeddings as saved tf variables.
Assumptions:
in the following code embeddings is a python dict {word:np.array (np.shape==[embedding_size])}
python version is 3.5+
used libraries are numpy as np, tensorflow as tf
the directory to store the tf variables is model_dir/
Step 1: Stack the embeddings to get a single np.array
embeddings_vectors = np.stack(list(embeddings.values()), axis=0)
# shape [n_words, embedding_size]
Step 2: Save the tf.Variable on disk
# Create some variables.
emb = tf.Variable(embeddings_vectors, name='word_embeddings')
# Add an op to initialize the variable.
init_op = tf.global_variables_initializer()
# Add ops to save and restore all the variables.
saver = tf.train.Saver()
# Later, launch the model, initialize the variables and save the
# variables to disk.
with tf.Session() as sess:
    sess.run(init_op)
    # Save the variables to disk.
    save_path = saver.save(sess, "model_dir/model.ckpt")
    print("Model saved in path: %s" % save_path)
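model_dir should now contain the files checkpoint, model.ckpt.data-00000-of-00001, model.ckpt.index and model.ckpt.meta (a numeric suffix such as model.ckpt-1 only appears when a global_step is passed to saver.save).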
Step 3: Generate a metadata.tsv
To get a nicely labeled cloud of embeddings, you can provide TensorBoard with metadata as Tab-Separated Values (tsv) (cf. here).
import os
words = '\n'.join(list(embeddings.keys()))
with open(os.path.join('model_dir', 'metadata.tsv'), 'w') as f:
    f.write(words)
# .tsv file written in model_dir/metadata.tsv
Step 4: Visualize
Run $ tensorboard --logdir model_dir and open the Projector tab.
To load metadata, the magic happens here:
As a reminder, some word2vec embedding projections are also available on http://projector.tensorflow.org/

Gensim actually has an official way to do this.
See the documentation about it.

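The above answers didn't work for me. What I found pretty useful was this script (which will be added to gensim in the future). Source
To transform the data into tensor and metadata files:
# Newer gensim releases expose this loader on KeyedVectors rather than Word2Vec
model = gensim.models.KeyedVectors.load_word2vec_format(model_path, binary=True)
with open(tensorsfp, 'w', encoding='utf-8') as tensors, \
        open(metadatafp, 'w', encoding='utf-8') as metadata:
    # index2word was renamed index_to_key in gensim 4
    for word in model.index2word:
        # One label per line in the metadata file
        metadata.write(word + '\n')
        # One tab-separated vector per line in the tensors file
        vector_row = '\t'.join(map(str, model[word]))
        tensors.write(vector_row + '\n')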
Or follow this gist

Gensim provides a script that converts a word2vec model into TensorFlow projector files:
python -m gensim.scripts.word2vec2tensor -i ~w2v_model_file -o output_folder
Then, on the projector website, upload the generated files, including the metadata.
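For a file that was already exported as text with save_word2vec_format, the same split can be sketched without gensim. This assumes the usual word2vec text layout (a first line holding the vocabulary size and dimension, then one line per word with the word followed by space-separated values) and that the projector wants a vectors-only tab-separated file plus a separate labels file. The function name word2vec_text_to_tsv and the output file names are made up for illustration.

```python
def word2vec_text_to_tsv(w2v_path, tensors_path, metadata_path):
    """Split a word2vec text file into a vectors TSV and a labels TSV.

    Hypothetical helper; returns the number of words written.
    """
    count = 0
    with open(w2v_path, encoding="utf-8") as src, \
            open(tensors_path, "w", encoding="utf-8") as tensors, \
            open(metadata_path, "w", encoding="utf-8") as metadata:
        # Skip the "<vocab_size> <dimensions>" header line (assumed present)
        next(src, None)
        for line in src:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # ignore blank or malformed lines
            word, values = parts[0], parts[1:]
            metadata.write(word + "\n")
            tensors.write("\t".join(values) + "\n")
            count += 1
    return count
```

For the file from the question, a call such as word2vec_text_to_tsv("vect.txt", "tensors.tsv", "metadata.tsv") would produce the two files to upload. If the "improperly formatted" error came from the header line or the leading word column, this is the kind of cleanup that should address it.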
