I am trying to save a fairly complex object using pickle. I'm omitting the object's structure because it is quite complicated, but it doesn't require much memory (~5 MB).
nX = 20
nY = 50
model = ComplexGridModel(nX, nY) # nX, nY: number of elements in the X and Y directions
# I don't think it's important, but "model" is used like this:
analyzer = OtherComplexObject(*args, **kwargs)
analyzer.model = model
start = time()
[...] # loop operations on model and analyzer
end = time()
# now the critical part
print("Finished in {:.2f} min".format(end/60 - start/60) ) # print elapsed time
with open('filename.pickle', 'wb') as file:
pickle.dump(model, file, protocol=pickle.HIGHEST_PROTOCOL) # here crushes or sth like that, don't know
The above code creates the file filename.pickle, but it is empty (0 KB). When I comment out the pickle part everything works flawlessly, but I still want to save model.
There are three facts here that blow my mind and that I can't explain:
Everything works as expected until I increase nX above 24. So everything works fine with e.g. nX = 20.
No errors are thrown, neither in my own code nor during pickle.dump().
When the bug happens, even the line before pickle.dump(), i.e. the print(...), doesn't execute, and neither do the lines after it. The variables seem to be deleted as well: my Spyder editor still shows them in the variable explorer, but when I try to access them from the console it throws NameError: name 'variable_name' is not defined.
I know it's a bug and I would normally suspect my own code of being faulty; but with no errors I can't get anywhere. If my code were at fault, wouldn't it at least throw an error before pickle.dump()?
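A hedged diagnostic sketch (not part of the original post): if the interpreter itself is being killed, for example by a C-level stack overflow while pickling a deeply recursive structure, no Python exception is raised and buffered output can be lost, which would explain the missing print. Enabling the standard-library faulthandler and flushing the print makes such a crash visible; catching RecursionError covers the milder case (model is the object from the snippet above):

import faulthandler
import pickle
from time import time

faulthandler.enable()  # dump a low-level traceback if the interpreter crashes hard

start = time()
# [...] loop operations on model and analyzer, as in the original snippet
end = time()

print("Finished in {:.2f} min".format((end - start) / 60), flush=True)  # flush so the message cannot be lost

try:
    with open('filename.pickle', 'wb') as file:
        pickle.dump(model, file, protocol=pickle.HIGHEST_PROTOCOL)
except RecursionError:
    # Deeply nested/recursive objects can exceed the recursion limit while pickling;
    # raising sys.setrecursionlimit() may help, but setting it too high can crash the process instead.
    raise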
Admittedly I am not sure how to ask this, as I know how to handle it in R (code execution in a new environment), but equivalent searches for a Python solution are not yielding what I was hoping for.
In short, I will receive a spreadsheet (or CSV) in which one column will contain, hopefully, valid Python code. This could be the equivalent of a script, just contained in the CSV/workbook. For a use case, think of teaching programming, where the output comes from an LMS.
What I am hoping to do is loop over the file and, for each cell, run the code, and with the results in memory, test whether certain things exist.
For example: https://docs.google.com/spreadsheets/d/1D-zC10rUTuozfTR5yHfauIGbSNe-PmfrZCkC7UTPH1c/edit?usp=sharing
When evaluating the first response in the spreadsheet above, I would want to test that x, y, and z are all properly defined and have the expected values.
Because there would be multiple rows in the file, one per student, how can I run each row separately, evaluate the results, and ensure that the results stay isolated to that cell only? Simply put, when moving on, I do not want to retain any of the past evaluations.
(I am unaware of tools to do code checking, so I am dealing with it in a very manual way.)
It is possible to use Python's exec() function to execute strings such as the content in the cells.
Ex:
variables = {}
exec("""import os
# a comment
x = 2
y = 6
z = x * y""", variables)
assert variables["z"] == 12
Dealing with the csv file:
import csv

csv_file = open("path_to_csv_file", "rt")
csv_reader = csv.reader(csv_file)
iterator = iter(csv_reader)
next(iterator)  # Skip the row with the column titles.
for row in iterator:
    user = row[0]
    answer = row[1]
    ### Any other code involving the csv file must be put here to work properly,
    ### that is, before closing csv_file.
csv_file.close()  # Remember to close the file.
It won't be able to detect whether some module was imported (because once a module has been imported by one exec() call it stays cached for the following exec calls). One way to test this would be to 'unimport' the module and check whether the exec raises an exception.
Ex:
# This piece of code would go before closing the file, INSIDE THE FOR LOOP
# AND INDENTED WITH IT (because you want it to run for each student).
try:
    del os  # 'Unimport' os. (This doesn't so much 'unimport' as delete a reference
            # to the module, which could be problematic if a
            # 'from module import object' statement was used.)
except NameError:
    # So that trying to delete a module that wasn't imported
    # does not lead to an exception being raised.
    pass

namespace = dict()
try:
    exec(answer, namespace)
except Exception:
    # The answer's code could not be run without raising exceptions, i.e. the code
    # is poorly written.
    pass  # Code you want to run when the answer is wrong.
else:
    # The code hasn't raised exceptions; time to test the variables.
    # (This indexing raises KeyError if the student never defined one of them.)
    x, y, z = namespace['x'], namespace['y'], namespace['z']
    if (x == 2) and (y == 6) and (z == x * y):
        pass  # Code you want to run when the answer is right.
    else:
        pass  # Code you want to run when the answer is wrong.
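A possible alternative (my addition, not part of the original answer): because each student's code runs against its own namespace dict, a successful import inside that code leaves the module object in the dict, so the import can be checked directly instead of being 'unimported':

import types

# Hypothetical check, assuming the student's answer was run with exec(answer, namespace):
imported_os = isinstance(namespace.get("os"), types.ModuleType)
if imported_os:
    pass  # Code you want to run when the required import is present.

Note that a 'from os import path' statement would leave 'path' in the namespace rather than 'os', so the check has to match the form of import you expect.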
I sense that this is not the best way to do this, but it is certainly an attempt.
I hope this helped.
EDIT: Removed some bad code and added part of Tadhg McDonald-Jensen's comment.
I'm looking for a way to replace or delete the last message written by Python's logging module. The goal is to log a change in a variable once it occurs. If the variable changes again, the old log message should be deleted and the new one printed instead.
Hi,
I am using Python's logging module for a deep learning project I'm currently working on. Some GPUs just don't have enough memory to support the default batch size during training, and there is no apparent connection between batch size and actual memory usage that could be used to calculate it beforehand, so I catch the runtime error once it occurs and decrease the batch size by one.
This process can repeat quite a few times, and I always log which batch size did not work and which one will be tried next. Instead of having 10-30 of these messages (or more), I'd like to simply delete the last one and replace it with the newest one.
I've already checked the Python logging documentation and stumbled upon the LogRecord object, but on trying to work with it, it seems this object does not actually keep a record of all logs; rather, it stores some extra information about one specific log message.
If there is simply no way to do this, I will look into some kind of bundling solution as described here: Python logging: bundle reoccurring messages
The code below shows the log message I'm looking to replace.
Any help is greatly appreciated.
training_not_successful = True
while training_not_successful:
    try:
        model.run_training(global_settings['epochs'],
                           train_loader,
                           test_loader,
                           global_settings['checkpoint_output_path'],
                           model_name,
                           global_settings['best_net_criterion'])
        training_not_successful = False
    except MemoryError:
        logging.warning("Ran out of CUDA memory using batch size " + str(batch_size) +
                        ". Trying again with batch size " + str(batch_size - 1))
        batch_size -= 1
        train_loader, test_loader = get_train_test_loaders(
            train_dataset_list,
            test_dataset_list,
            value_counts,
            batch_size
        )
I believe (correct me if I'm wrong) that the logging module does not allow you to suppress newlines, meaning that it's simply not possible to do something like that with it.
It is possible to do it with a print though:
import shutil

def display(variable, rewritable=False):
    columns, lines = shutil.get_terminal_size(fallback=(80, 20))
    text = str(variable)
    filled = text + ((columns - len(text)) * ' ')
    print(filled, end='\r' if rewritable else '\n')

if __name__ == "__main__":
    from random import random
    from time import sleep

    for i in range(10):
        display(f"x = {random()}", True)
        sleep(1)
    display(f"x = 0.0")  # test if old value is overwritten completely
    display("Done!")
Tested this on Linux, but it should work everywhere (the shutil.get_terminal_size function, that is).
It's not mandatory, but it's very nice when the entire line is overwritten, as opposed to only the part that changed.
The key is the \r character - it returns the cursor to the beginning of the line, that's it. Now you can start writing from the front again, overwriting the line if it already contains anything, which is exactly what you want.
The display function is simple, but I'll explain it anyway:
The first line gets the terminal size; what we need is the width of the line, so we can pad the text with spaces and fill the entire line, completely overwriting the previous line no matter what it contained.
Then we convert our variable to a string.
After that it's just simple math: our string takes n characters, so the rest should be spaces, so we add width - n spaces to the final string and then print it - the entire line is overwritten.
The rewritable flag controls whether the value can be rewritten the next time you call display.
While this is not exactly what you asked for, since it does not use the logging module and there's no way (that I know of) to make the logging module print \r instead of \n, I think it is a good enough substitute to fall back on if it turns out you really cannot do this with the logging module.
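For what it's worth (my addition, not part of the original answer): logging.StreamHandler does expose a terminator attribute (Python 3.2+), so a similar carriage-return trick can be wired into the logging module itself. A minimal sketch, assuming the log goes to an interactive terminal:

import logging

handler = logging.StreamHandler()  # writes to sys.stderr by default
handler.terminator = '\r'          # end each record with \r instead of \n, so the next record overwrites it
handler.setFormatter(logging.Formatter("%(message)s"))

logger = logging.getLogger("batch_size_search")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# toy loop standing in for the batch-size retries from the question
for batch_size in range(32, 28, -1):
    logger.warning("Ran out of CUDA memory using batch size %d. Trying again with batch size %d",
                   batch_size, batch_size - 1)

The same caveat as the print-based version applies: a shorter message will not fully erase a longer previous one unless you pad it.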
In Keras, if you write a custom loss function in a Jupyter notebook, you cannot print anything from it. For instance, if you have:
def loss_func(true_label, NN_output):
    true_cat = true_label[:,0]
    pred_cat = NN_output[:,0]
    indicator = NN_output[:,1]
    print("Hi!")
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term
Nothing will print when the function is evaluated.
As a workaround, in case I am doing some debugging, I have found that I can write to a file in a cost function, which can be useful if I want to print something standard like an int or a string.
However, trying to write out a tensor like indicator to a file gives the unbelievably helpful output:
Tensor("loss_103/model_105_loss/Print:0", shape=(512,), dtype=float32)
I know TF provides a tf.Print() method to print the value of a tensor, but I don't understand how that interacts with Jupyter. Other answers have said that tf.Print() writes to standard error, which means that trying
sys.stderr = open('test.txt', 'w')
should theoretically allow me to get my output from a file, but unfortunately this doesn't work (at least in Jupyter).
Is there any general method to get a representation of my tensor as a string? How do people generally get around this barrier to seeing what their code does? If I come up with something fancier than taking a mean, I want to see exactly what's going on in each step of my calculation to verify it works as intended.
Thanks!
You can do something like the code below:
def loss_func(true_label, NN_output):
    true_cat = true_label[:,0]
    true_cat = tf.Print(true_cat, [true_cat], message="true_cat: ")  # added line
    pred_cat = NN_output[:,0]
    pred_cat = tf.Print(pred_cat, [pred_cat], message="pred_cat: ")  # added line
    indicator = NN_output[:,1]
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term
Basically, I have added two lines to print the values of true_cat and pred_cat.
To print something, you have to include the print op in the TF graph via the statements above.
However, the catch is that it prints to the console where you launched the notebook server, not in the notebook itself.
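A side note (my addition, not from the original answer): tf.Print has since been deprecated in favour of tf.print, and in eager mode (the TF 2.x default) tf.print output usually does show up in the notebook itself. A minimal sketch of the same idea, assuming a TF 2.x / tf.keras setup:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.losses import binary_crossentropy

def loss_func(true_label, NN_output):
    true_cat = true_label[:, 0]
    pred_cat = NN_output[:, 0]
    indicator = NN_output[:, 1]
    # tf.print is executed as part of the computation and, in eager mode,
    # typically prints into the notebook rather than the server console.
    tf.print("true_cat:", true_cat, "pred_cat:", pred_cat, summarize=5)
    custom_term = K.mean(K.abs(indicator))
    return binary_crossentropy(true_cat, pred_cat) + custom_term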
References:
How to print the value of a Tensor object in TensorFlow?
Printing the loss during TensorFlow training
https://www.tensorflow.org/api_docs/python/tf/Print
I have successfully ported the CIFAR-10 ConvNet tutorial code for my own images and am able to train on my data and generate Tensorboard outputs etc.
My next step was to implement an evaluation of new data against the model I built. I am trying now to use cifar10_eval.py as a starting point however am running into some difficulty.
I should point out that the original tutorial code runs entirely without a problem, including cifar10_eval.py. However, when moving this particular code to my application, I get the following error message (last line).
RuntimeError: Attempted to use a closed Session.
I found this error is thrown by TF's session.py
# Check session.
if self._closed:
    raise RuntimeError('Attempted to use a closed Session.')
I have checked the directories in which all files should reside and be created, and everything seems exactly as it should (they mirror perfectly those created by running the original tutorial code). They include train, eval and data folders containing checkpoint/event files, an events file, and the data binaries respectively.
I wonder if you could help me figure out how to debug this, as I'm sure something in the data flow got disrupted when transitioning the code. Unfortunately, despite digging deep and comparing against the original, I can't find the source of the problem, as the two are essentially identical, with only trivial changes to file names and destination directories.
EDIT_01:
Debugging step by step, it seems the line that actually throws the error is #106 in the original cifar10_eval.py:
def eval_once(args etc):
    ...
    with tf.Session() as sess:
        ...
        summary = tf.Summary()
        summary.ParseFromString(sess.run(summary_op)) # <========== line 106
summary_op is created in def evaluate of this same script and passed as an arg to def eval_once.
summary_op = tf.merge_all_summaries()
...
while True:
    eval_once(saver, summary_writer, top_k_op, summary_op)
From the documentation on Session, a session can be closed with the .close() method or when it is used as a context manager in a with block. I ran find tensorflow/models/image/cifar10 | xargs grep "sess" and I don't see any sess.close(), so it must be the latter.
I.e., you'll get this error if you do something like this:
with tf.Session() as sess:
    sess.run(..)

sess.run(...) # Attempted to use a closed Session.
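For completeness (my addition): the fix is simply to keep every sess.run call inside the with block, or to manage the session's lifetime explicitly. A minimal sketch, assuming TF 1.x as in the question:

import tensorflow as tf

a = tf.constant(2)
b = tf.constant(3)
total = a + b

with tf.Session() as sess:
    print(sess.run(total))  # fine: the session is still open here
# the session is closed as soon as the with block ends

# or, equivalently, manage the lifetime by hand:
sess = tf.Session()
print(sess.run(total))
sess.close()                # no sess.run() calls after this point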
It was a simple (but humbling) error in indentation.
summary = tf.Summary()
summary.ParseFromString(sess.run(summary_op))
summary.value.add(tag='Precision # 1', simple_value=precision)
summary_writer.add_summary(summary, global_step)
was outside of the try: block, and of course, no session could be found.
Sigh.
I am training a neural network and have been running this code without any problems, but sometimes (twice so far) I get the error Not Found: FetchOutputs node not found at the line y_1 = sess.run(get_labels(step)) (see below).
get_labels(step) is a function that returns the correct labels for my training images, which are stored in a text file.
def get_labels(step):
    with open('labels.txt', 'r') as fin:
        reader = csv.reader(fin)
        c = [[int(s) for s in row] for i, row in enumerate(reader) if i == step]
    label_numbers = np.array(c)
    # Convert to one-hot vectors
    numpy_label = np.zeros((BATCH_SIZE, 5))
    for i in range(BATCH_SIZE):
        numpy_label[i, label_numbers[0][i] - 1] = 1
    # Convert to tensor
    y_label = tf.convert_to_tensor(numpy_label, dtype=tf.float32)
    return y_label
This is my main function:
def main():
    # Placeholder for correct labels
    y_label = tf.placeholder(tf.float32, shape=[BATCH_SIZE, 5])

    < Other functions etc. >

    sess.run(tf.initialize_all_variables())
    tf.train.start_queue_runners(sess=sess)

    for step in range(1000):
        # Get labels for current batch
        y_1 = sess.run(get_labels(step))
        # Train
        sess.run([train_step], feed_dict={y_label: y_1})
        < Other stuff like writing summaries, saving variables etc. >

    sess.close()
From reading some of the issues on GitHub, I know this has to do with the fact that I call y_1 = sess.run(get_labels(step)) after tf.train.start_queue_runners(sess=sess), but I don't understand:
Why does it work most of the time, but occasionally not?
Is y_1 = sess.run(get_labels(step)) adding or modifying nodes in the graph? I thought I was just running a node, get_labels(step), that was already defined in the graph. I tried finalizing the graph before starting the queue runners, but that gave me the error that finalized graphs cannot be modified.
What would be the proper way to write the code (one possible pattern is sketched below)? Usually I just restart my program and it is fine - but clearly I am not doing it the proper way.
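Not from the original thread, but a hedged sketch of one possible answer to the last question: get_labels builds a brand-new tf.convert_to_tensor node on every call, so running it inside the loop does keep adding nodes to the graph. Returning the NumPy array instead and feeding it through the existing placeholder keeps the graph fixed (names reused from the question):

def get_labels(step):
    with open('labels.txt', 'r') as fin:
        reader = csv.reader(fin)
        c = [[int(s) for s in row] for i, row in enumerate(reader) if i == step]
    label_numbers = np.array(c)
    # Convert to one-hot vectors and return plain NumPy -- no new graph nodes are created.
    numpy_label = np.zeros((BATCH_SIZE, 5))
    for i in range(BATCH_SIZE):
        numpy_label[i, label_numbers[0][i] - 1] = 1
    return numpy_label

# inside main(), after starting the queue runners:
for step in range(1000):
    y_1 = get_labels(step)                            # plain NumPy array, no sess.run needed
    sess.run([train_step], feed_dict={y_label: y_1})  # fed through the existing placeholder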
Thank you!
EDIT:
I think it might be important to mention that this happens when I am trying to run a TensorFlow script in a separate screen on a server, i.e. I have one screen running a TensorFlow script and then I create a new screen to run a different TensorFlow script. I just started using screen, so I might be missing something fundamental about how it works.