Reading data from .csv or .txt in python

I'm a beginner with Python and TensorFlow. Following the "Reading data" instructions on the TensorFlow website, I want to load some data into my Python project. Here is my code, which is very simple:
import tensorflow as tf
files = tf.train.match_filenames_once("*.txt")
print(files)
And the result is
Tensor("matching_filenames/read:0", dtype=string)
I have put the data I want to read in the working directory of this project. Why does it still tell me that the number of matching file names is 0?
In addition, the data I want to read is a one-dimensional list, with one double per line. The file contains about 100W+ numbers.
The IDE I'm using is PyCharm.
Thank you!

Your variable files is a Tensor (a node in the TensorFlow graph). You need to run it in a TensorFlow session in order to get access to its value.
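import tensorflow as tf

files = tf.train.match_filenames_once("*.txt")
with tf.Session() as sess:
    # In TensorFlow 1.x the matched names are held in a local variable,
    # so local variables are initialized as well as global ones.
    sess.run([tf.global_variables_initializer(), tf.local_variables_initializer()])
    print(sess.run(files))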
I would advise you to read the official documentation to know more about TensorFlow, the Tensors, and the TensorFlow graph.
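As a side note, the ":0" in "matching_filenames/read:0" is part of the tensor's name (its output index), not a count of matched files. If the goal is simply to get the numbers into memory, here is a minimal sketch that avoids TensorFlow entirely and uses only the standard library. The function name read_doubles is made up for illustration, and the sketch assumes each non-empty line holds exactly one number.

```python
import glob


def read_doubles(pattern):
    """Return every float found, one per line, in the files matching pattern."""
    values = []
    # glob patterns are resolved relative to the current working directory
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    values.append(float(line))
    return values
```

Calling read_doubles("*.txt") returns a plain Python list, which can then be handed to NumPy or TensorFlow if needed.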

Related

Load MNIST dataset from scratch and split it in training-validation-test set

There are many guides about loading and splitting the MNIST dataset, like this one. They use libraries such as Keras or TensorFlow.
I would like to load the MNIST dataset and split it into training, validation, and test sets from scratch, that is, using only built-in Python features (and the numpy library, if needed).
This is the link to the dataset: MNIST dataset.
Can you help me?

You may look at the source code of Tensorflow or Keras to see how they download it without other libraries.
Here is the relevant piece of code in PyTorch.
It uses this helper code. As far as I can see that code only uses standard libraries. You may reuse their code (BSD-3 Clause License) or read theirs to see what you have to do and then write your own.
Once the data is on your disk and you can load it, there are several options to create a custom train/validate/test split: Python splitting data into random sets
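To make the "write your own" route concrete, here is a rough standard-library sketch of parsing the IDX files that MNIST is distributed in and producing a random split. The function names, the split fractions, and the seed are illustrative assumptions, not taken from the Keras, TensorFlow, or PyTorch code mentioned above.

```python
import gzip
import random
import struct


def read_gz(path):
    """Read a gzip-compressed file (e.g. train-images-idx3-ubyte.gz) into bytes."""
    with gzip.open(path, "rb") as f:
        return f.read()


def parse_idx_images(data):
    """Parse IDX image bytes into a list of images (flat lists of pixel values)."""
    # Header: four big-endian 32-bit integers (magic, count, rows, cols)
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    size = rows * cols
    return [list(data[16 + i * size:16 + (i + 1) * size]) for i in range(count)]


def parse_idx_labels(data):
    """Parse IDX label bytes into a list of integer labels."""
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError("not an IDX label file")
    return list(data[8:8 + count])


def split_indices(n, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle range(n) and return (train, validation, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(n * test_fraction)
    n_val = int(n * val_fraction)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test
```

The index lists can then be used to pick matching images and labels, for example [images[i] for i in train].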

How to load downloaded image dataset (the300w_lp) in TensorFlow?

I am trying to load 300W_lp dataset in tensorflow.
I downloaded and extracted the dataset manually at C:/datasets/the300w
Now when I try to load dataset into tensorflow using
the300w = tfds.load('the300w_lp',data_dir='C:\datasets\the300w', download=False)
it gives me this error:
Dataset the300w_lp: could not find data in C:\datasets\the300w. Please make sure to call dataset_builder.download_and_prepare(), or pass download=True to tfds.load() before trying to access the tf.data.Dataset object.
Please help. How do I load the dataset in TensorFlow?

Try to use plain old
dataset = tfds.load('the300w_lp')
It works fine for me. Maybe you unzipped the dataset file incorrectly? If you have spare time, try the above code and see if it works.

Here is a simple way to tackle this issue: run the above command in Google Colab, grab a portion of the dataset object, download it, and use it for your own purposes :)
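One more thing worth checking is the path literal used in the question. In a normal Python string, "\t" is the escape sequence for a TAB character, so a Windows path containing "\the300w" may not be the string it appears to be. Whether this is what causes the error above is not established; the sketch below only shows the string behaviour, using the path from the question.

```python
from pathlib import PureWindowsPath

# Normal literal: "\t" becomes a TAB (the first backslash is escaped on purpose)
escaped = "C:\\datasets\the300w"
# Raw string: backslashes are kept literally
raw = r"C:\datasets\the300w"
# Forward slashes are also accepted as separators on Windows
forward = "C:/datasets/the300w"

print("\t" in escaped)  # True
print("\t" in raw)      # False
print(PureWindowsPath(raw) == PureWindowsPath(forward))  # True
```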

Embedding visualization with TensorFlow eager execution

I am using TensorFlow's eager execution and I would like to visualize embeddings in TensorBoard. I use the following code to set up the visualization:
self._writer = tf.contrib.summary.create_file_writer('path')
embedding_config = projector.ProjectorConfig()
embedding = embedding_config.embeddings.add()
embedding.tensor_name = self._word_embeddings.name
embedding.metadata_path = 'metadata.tsv'
projector.visualize_embeddings(self._writer, embedding_config)
where self._word_embeddings is my variable for the embeddings. However, when executing this script TensorFlow throws the following error message:
logdir = summary_writer.get_logdir()
AttributeError: 'SummaryWriter' object has no attribute 'get_logdir'
Has anybody experienced something similar and has an idea how to get the embedding visualization to run in eager mode?
I am using TensorFlow 1.10.0.
Any kind of help is greatly appreciated!

If you only care about visualization, and since you are working in eager mode, things can be much simpler.
As far as I can see, you already have your metadata.tsv file set up. The only thing left is to write your embedding matrix to a TSV file: just loop over the matrix rows and write the values separated by tabs.
As a last step, you can load the TensorBoard projector online, without installing it, at http://projector.tensorflow.org/ and upload your data. You have to upload the embedding file and the metadata file separately, in two simple steps.
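The loop over the matrix rows could look roughly like this. It is a sketch, not tested against the projector itself: the function name write_embeddings_tsv and the output file name are made up, and it assumes the matrix is any iterable of rows of numbers (in eager mode, calling .numpy() on the embeddings variable should give such an array).

```python
def write_embeddings_tsv(matrix, path):
    """Write one embedding per line, with the values separated by TAB characters."""
    with open(path, "w", encoding="utf-8") as f:
        for row in matrix:
            f.write("\t".join(str(float(value)) for value in row) + "\n")


# Hypothetical usage with a tiny hand-written matrix:
# write_embeddings_tsv([[0.1, 0.2], [0.3, 0.4]], "vectors.tsv")
```

The rows should be in the same order as the lines of the metadata file so that labels and vectors line up.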

How to Port a .ckpt to a .pb for use in Tensorflow for Mobile Poets

I am trying to convert a pretrained InceptionV3 model (.ckpt) from the Open Images Dataset to a .pb file for use in the Tensorflow for Mobile Poets example. I have searched the site as well as the GitHub Repository and have not found any conclusive answers.
(OpenImages Inception Model: https://github.com/openimages/dataset)
Thank you for your responses.

Below I've included some draft documentation I'm working on that might be helpful. One other thing to look out for is that if you're using Slim, you'll need to run export_inference_graph.py to get a .pb GraphDef file initially.
In most situations, training a model with TensorFlow will give you a folder containing a GraphDef file (usually ending with the .pb or .pbtxt extension) and a set of checkpoint files. What you need for mobile or embedded deployment is a single GraphDef file that’s been ‘frozen’, or had its variables converted into inline constants so everything’s in one file.
To handle the conversion, you’ll need the freeze_graph.py script, which is held in tensorflow/python/tools/freeze_graph.py. You’ll run it like this:
bazel build tensorflow/python/tools:freeze_graph
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=/tmp/model/my_graph.pb \
--input_checkpoint=/tmp/model/model.ckpt-1000 \
--output_graph=/tmp/frozen_graph.pb \
--input_node_names=input_node \
--output_node_names=output_node
The input_graph argument should point to the GraphDef file that holds your model architecture. It’s possible that your GraphDef has been stored in a text format on disk, in which case it’s likely to end in ‘.pbtxt’ instead of ‘.pb’, and you should add an extra --input_binary=false flag to the command.
The input_checkpoint should be the most recent saved checkpoint. As mentioned in the checkpoint section, you need to give the common prefix to the set of checkpoints here, rather than a full filename.
output_graph defines where the resulting frozen GraphDef will be saved. Because it’s likely to contain a lot of weight values that take up a large amount of space in text format, it’s always saved as a binary protobuf.
output_node_names is a list of the names of the nodes that you want to extract the results of your graph from. This is needed because the freezing process needs to understand which parts of the graph are actually needed, and which are artifacts of the training process, like summarization ops. Only ops that contribute to calculating the given output nodes will be kept. If you know how your graph is going to be used, these should just be the names of the nodes you pass into Session::Run() as your fetch targets. If you don’t have this information handy, you can get some suggestions on likely outputs by running the summarize_graph tool.
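To illustrate why the output node names matter, here is a toy sketch of the pruning idea: starting from the requested outputs, walk backwards through the inputs and keep only what is reached. This is not the freeze_graph implementation; the dictionary-based graph and the node names are made up for the example.

```python
def prune_to_outputs(graph, output_names):
    """Return the set of node names needed to compute output_names.

    graph maps each node name to the list of node names it takes as inputs.
    """
    needed = set()
    stack = list(output_names)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(graph[name])  # visit the inputs of this node next
    return needed


# Hypothetical graph: "loss" and "loss_summary" exist only for training
toy_graph = {
    "input_node": [],
    "weights": [],
    "matmul": ["input_node", "weights"],
    "output_node": ["matmul"],
    "loss": ["output_node"],
    "loss_summary": ["loss"],
}
kept = prune_to_outputs(toy_graph, ["output_node"])
print(sorted(kept))  # ['input_node', 'matmul', 'output_node', 'weights']
```

The training-only nodes are never reached from output_node, so they are dropped, which is the effect described above for summarization ops.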
Because the output format for TensorFlow has changed over time, there are a variety of other less commonly used flags available too, like input_saver, but hopefully you shouldn’t need these on graphs trained with modern versions of the framework.

Restoring graph in tensorflow fails because there is no variable to save

I know that there are countless questions on Stack Overflow, GitHub, etc. on how to restore a trained model in TensorFlow. I have read most of them (1,2,3).
I have almost exactly the same problem as 3. However, I would like, if possible, to solve it in a different way: my training and my test need to be in separate scripts called from the shell, and I do not want to repeat in the test script the exact same lines I used to define the graph. So I cannot use TensorFlow FLAGS or the other answers based on rebuilding the graph by hand.
I also do not want to sess.run every variable and map them manually (using import_graph_def with the input_map argument), as was explained there, because my graph is quite big.
So I build a graph and train it in a specific script, for instance (but without the training part):
# Script 1
import tensorflow as tf
import cPickle as pickle

x = tf.Variable(42)
saver = tf.train.Saver()
sess = tf.Session()

# Saving the graph
graph_def = sess.graph_def
with open('graph.pkl', 'wb') as output:
    pickle.dump(graph_def, output, pickle.HIGHEST_PROTOCOL)

# Training the model
sess.run(tf.initialize_all_variables())

# Saving the variables
saver.save(sess, "pretrained_model.ckpt")
I now have both graph and variables saved so I should be able to run my test model from another script even if I have extra training nodes in my graph.
# Script 2
import tensorflow as tf
import cPickle as pickle

sess = tf.Session()
with open('graph.pkl', 'rb') as input:
    graph_def = pickle.load(input)
tf.import_graph_def(graph_def, name='persisted')
Then, obviously, I want to restore the variables using a saver, but I encounter the same problem as in 3: no variables are found, so I cannot even create a saver. So I cannot write:
saver=tf.train.Saver()
saver.restore(sess,"pretrained_model.ckpt")
Is there a way to bypass these limitations? I thought that importing the graph would recover the uninitialized variables in every node, but it seems not. Do I really need to build the graph a second time, as most of the answers suggest?

The list of variables is saved in a Collection which is not saved in the GraphDef. Saver by default uses the list in ops.GraphKeys.VARIABLES collection (accessible through tf.all_variables()), and if you restored from GraphDef rather than using Python API to build your model, that collection is empty. You could specify the list of variables manually in tf.train.Saver(var_list=['MyVariable1:0', 'MyVariable2:0',...]).
Alternatively, instead of GraphDef you could use MetaGraphDef, which saves collections; there's a recently added MetaGraphDef HowTo.

To my knowledge, and according to my tests, you can't simply pass names to a tf.train.Saver object. It must be either a list of variables or a dictionary.
I would also like to read the model from graph_def and then load the variables using a saver. However, attempting it only results in the error message "Variable to save is not a variable".
