Dill deletes object when using "load"

I'm having an error that is driving me nuts. I generate some numerical simulation data, sim_data.dill, and save it to a directory on my computer using
with open(os.path.join(original_directory, 'sim_data.dill'), 'w') as f:
    dill.dump(outputs, f)
This data is about 1 GB and takes a while to generate. I then copied that file from original_directory to new_directory, and when I try to load it from a different program using
simfile = '/new_directory/sim_data.dill'
with open(simfile, 'r') as f:
    outputs = dill.load(f)
One of two things happens:
The program says the file is missing, with UnpicklingError: [Errno 2] No such file or directory: .../original_directory/sim_data.dill. This means dill stores the original_directory path in the file's metadata and refuses to open it once the file is moved; truly appalling behavior.
When I copy the file back to new_directory, trying to open it gives an EOFError, and dill truncates the file to zero bytes, essentially deleting it. This is even worse.
I can read the file just fine with a standard with open(simfile, 'r') as f: print f.readlines(), but obviously this does not help when trying to recover the internal class structure of the file.

Apparently this is normal behavior for dill; please see:
https://github.com/uqfoundation/dill/issues/296
Paraphrasing: the file location is part of the file handle being pickled, so unpickling it without that information is impossible. Apparently this means that if you save a .dill file in one location, move it manually (for example to a more convenient directory), and then try to open it again, it won't work.
As for the deletion issue, the author of the post above recommends using fmode=FMODE_PRESERVEDATA or one of the other file modes listed at
https://github.com/matsjoyce/dill/blob/087c00899ef55f31d36e7aee51a958b17daf8c91/dill/dill.py#L136-L145
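Going by the paraphrase above, the trouble comes from an open file handle inside the pickled object. A different way around it (an alternative, not the fix from the linked issue) is to replace any open file objects with their paths before dumping, so the saved file holds only plain data. This is a minimal sketch assuming `outputs` is built from dicts, lists and tuples; `strip_file_handles` and the sample data are hypothetical, and the standard `pickle` module stands in for dill here, since `dill.dump(obj, file)` and `dill.load(file)` take the same arguments used below.

```python
import io
import os
import pickle
import tempfile


def strip_file_handles(obj):
    """Hypothetical helper: recursively replace open file objects with their paths."""
    if isinstance(obj, io.IOBase):
        # Keep only the path the handle was opened with (None if it has no name).
        return getattr(obj, "name", None)
    if isinstance(obj, dict):
        return {key: strip_file_handles(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [strip_file_handles(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(strip_file_handles(item) for item in obj)
    return obj


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "run.log")

with open(log_path, "w") as log_file:
    # Hypothetical simulation output that accidentally holds an open file handle.
    outputs = {"values": [1.0, 2.5, 4.0], "log": log_file}
    clean_outputs = strip_file_handles(outputs)

sim_path = os.path.join(workdir, "sim_data.pkl")
with open(sim_path, "wb") as f:  # binary mode for pickle data
    pickle.dump(clean_outputs, f)

with open(sim_path, "rb") as f:
    restored = pickle.load(f)

print(restored)
```

Because the saved file now contains only numbers and path strings, moving it does not matter when loading; if you still need the log, reopen it later with open(restored["log"]).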

Related

Large Zip Files with Zipfile Module Python

I have never used the zipfile module before. I have a directory that contains thousands of zip files I need to process. These files can be up to 6 GB. I have looked through some documentation, but much of it is not clear on the best way to read large zip files without extracting them.
I stumbled upon this: Read a large zipped text file line by line in python
So in my solution I tried to emulate it and use it the way I would read a normal text file with the with open function:
with open(odfslogp_obj, 'rb', buffering=102400) as odfslog:
So I wrote the following based on the answer from that link:
for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        with z.open(buffering=102400) as f:
            for line in f:
                print(line)
But this gives me an "unexpected keyword" error for z.open()
My question is: is there documentation that explains which keyword arguments the z.open() function takes? I only found documentation for the .ZipFile() function.
I want to make sure my code isn't using too much memory while processing these files line by line.
odfslogp_obj is a Path object, by the way.
When I take off the buffering and just have z.open(), I get an error saying: TypeError: open() missing 1 required positional argument: 'name'

Once you've opened the zip file, you still need to open the individual files it contains. That is the second z.open call you had problems with: it's not the built-in Python open, and it doesn't have a buffering parameter. See ZipFile.open.
Once the zip file is open, you can enumerate its files and open them in turn. ZipFile.open opens them in binary mode, which may be a different problem, depending on what you want to do with the file.
import zipfile

for odfslogp_obj in odfslogs_plist:
    with zipfile.ZipFile(odfslogp_obj, mode='r') as z:
        # Open each member of the archive in turn (binary mode).
        for name in z.namelist():
            with z.open(name) as f:
                for line in f:
                    print(line)
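Since ZipFile.open yields bytes, one option for getting str lines while still streaming each member (nothing is extracted to disk) is to wrap the member in io.TextIOWrapper. A minimal sketch, assuming the members are UTF-8 text; iter_zip_lines, the archive name and the sample members are hypothetical, and the archive is built on the fly only so the example runs on its own:

```python
import io
import os
import tempfile
import zipfile


def iter_zip_lines(zip_path, encoding="utf-8"):
    """Yield (member_name, line) for every text line of every member in the archive."""
    with zipfile.ZipFile(zip_path, mode="r") as z:
        for info in z.infolist():
            if info.is_dir():
                continue  # directory entries have no content
            # z.open streams the member in binary mode without extracting it.
            with z.open(info) as raw:
                # Wrap the binary stream so iteration yields decoded str lines.
                with io.TextIOWrapper(raw, encoding=encoding) as text:
                    for line in text:
                        yield info.filename, line.rstrip("\n")


# Build a small stand-in archive so the example is self-contained.
archive = os.path.join(tempfile.mkdtemp(), "logs.zip")
with zipfile.ZipFile(archive, mode="w") as z:
    z.writestr("a.log", "first\nsecond\n")
    z.writestr("b.log", "third\n")

for name, line in iter_zip_lines(archive):
    print(name, line)
```

Because the member is decompressed as it is read, memory use should stay roughly bounded by one line plus the internal buffers rather than the full size of the member.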

Approaches to loading JSON from file to dict using Python?

Using Python I'm loading JSON from a text file and converting it into a dictionary. I thought of two approaches and wanted to know which would be better.
Originally I open the text file, load the JSON, and then close the text file.
import json
# Open file. Load as JSON.
data_file = open(file="fighter_data.txt", mode="r")
fighter_match_data = json.load(data_file)
data_file.close()
Could I do the following instead?
import json
# Open file. Load as JSON.
fighter_match_data = json.load(open(file="fighter_data.txt", mode="r"))
Would I still need to close the file? If so, how? If not, does Python close the file automatically?

Personally, I wouldn't do either. Best practice for opening files is generally to use with.
with open(file="fighter_data.txt", mode="r") as data_file:
    fighter_match_data = json.load(data_file)
That way the file is closed automatically when you leave the with block. It's shorter than the first version, and if an error is raised (say, the JSON fails to parse), the file still gets closed.
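To check that the file really is closed even when parsing fails, here is a small self-contained sketch. The file contents are made up, and a temporary directory stands in for wherever fighter_data.txt actually lives (json.JSONDecodeError requires Python 3.5 or later):

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()
good_path = os.path.join(workdir, "fighter_data.txt")
bad_path = os.path.join(workdir, "broken_data.txt")

# Hypothetical sample contents: one valid JSON document, one truncated one.
with open(good_path, mode="w") as f:
    f.write('{"fighter": "A", "wins": 3}')
with open(bad_path, mode="w") as f:
    f.write('{"fighter": "A", "wins": ')

with open(file=good_path, mode="r") as data_file:
    fighter_match_data = json.load(data_file)
print(fighter_match_data["wins"])  # 3
print(data_file.closed)            # True: closed on leaving the block

try:
    with open(file=bad_path, mode="r") as bad_file:
        json.load(bad_file)
except json.JSONDecodeError as err:
    print("parse failed:", err.msg)
print(bad_file.closed)             # True: closed even though json.load raised
```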
Regarding your actual question, on needing to close the file in your one liner.
From what I understand about file handling and garbage collection, if you're using CPython, since the file isn't referenced anymore it "should" be closed straight away by the garbage collector. However, relying on garbage collection to do your work for you is never the nicest way of writing code. (See the answers to open read and close a file in 1 line of code for information as to why).

Your code below is valid:
fighter_match_data = json.load(open(file="fighter_data.txt", mode="r"))
Consider this part:
open(file="fighter_data.txt", mode="r")  # 1
v/s
data_file = open(file="fighter_data.txt", mode="r")  # 2
In case #2, if you do not explicitly close the file, it will be closed automatically once no reference to it remains (for example, when data_file goes out of scope at the end of the function).
In case #1, you never create a variable, so the file object is no longer referenced as soon as that line finishes, and Python closes the file right away. This immediate closing is a CPython implementation detail (reference counting); other implementations such as PyPy may close the file later.

Parsing a .lis file in Python

I am trying to parse a .lis file in Python to perform further analysis on the data, but every time I get the following error:
<_io.TextIOWrapper name='Data.lis' mode='r' encoding='cp1252'>
I am parsing the file with the standard command:
open(fileName)
Is there a certain package I need to install or is my parsing method incorrect?

What you got as output doesn't appear to be an error; it is just telling you that Python opened the file and you now have a file object.
Further, the operation you performed only got you part of the way. When reading a file, you need to:
Open the file
Store it as a variable (usually)
Read the variable a line at a time
Parse the result of your reading
Close the file
I usually start by trying to open the file in a program like Notepad++. That way I can get an idea of what I am trying to parse.
Let's walk through an example:
filename = 'myfile.lis'
with open(filename) as f:
    for line in f:
        print(line)
The code above opens the .lis file and then prints it to the console one line at a time. The with statement ensures that the file gets closed after we're done.
However, you could just as well replace the print() call with a parse() function of your own:
def parse(input_line):
    if 'text' in input_line:
        print('I found \'text\' in line \'{}\''.format(input_line))
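The steps listed above (open, read line by line, parse, close) can be combined into one function. This is a sketch assuming a hypothetical variant of parse() that returns the matching line instead of printing it, so the results can be collected for further analysis; the sample file contents are made up:

```python
import os
import tempfile


def parse(input_line):
    """Hypothetical parse rule: return the line (without its newline) if it contains 'text'."""
    if "text" in input_line:
        return input_line.rstrip("\n")
    return None


def parse_file(filename):
    """Open the file, read it one line at a time and parse each line; the with block closes it."""
    matches = []
    with open(filename) as f:
        for line in f:
            result = parse(line)
            if result is not None:
                matches.append(result)
    return matches


# Stand-in .lis file with made-up contents, so the example runs on its own.
filename = os.path.join(tempfile.mkdtemp(), "myfile.lis")
with open(filename, "w") as f:
    f.write("header\nsome text here\n1 2 3\nmore text\n")

print(parse_file(filename))  # ['some text here', 'more text']
```

If the file is not in your platform's default encoding (cp1252 in the output above), pass an explicit encoding= argument to open().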
Hopefully that will get you started. If you are able to provide more detail about what the contents of your .lis file is, or what you are looking to extract from that file, I'm sure many around here can provide better guidance.

Cpickle invalid load key error with a weird key at the end

I just tried to update a program I wrote, and I needed to add another pickle file. So I created a blank .pkl and then used this command to open it (just as I did with all my others):
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
Only this time around I keep getting this really weird error for no obvious reason:
cPickle.UnpicklingError: invalid load key, 'Γ'.
The pickle file does contain the information needed to load it; it is an exact match to other blank .pkl files that I have, and those load fine. I don't know what that last key in the error is, but I suspect it could give me some insight if I knew what it means.

So I have figured out the solution to this problem, and I thought I'd take the time to list some examples of what to do and what not to do when using pickle files. Firstly, the solution was simply to make a plain old .txt file and dump the pickle data to it.
If you are under the impression that you have to create a new file yourself and save it with a .pkl ending, you would be wrong. I was creating my .pkl files with Notepad++ and saving them as .pkl. In my experience this sometimes works and sometimes doesn't, and if you're semi-new to programming this may cause a fair amount of confusion, as it did for me. All that being said, I recommend just using plain old .txt files. It's the information stored inside the file, not necessarily the extension, that is important here.
#Notice file hasn't been pickled.
#What not to do. No need to name the file .pkl yourself.
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
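The proper way:
# Pickle your object into a new file
filename = 'tryagain.txt'
with open(filename, 'wb') as output:
    pickle.dump(obj, output, -1)
# Now open it with the original .txt extension. Don't rename it.
# Read in binary mode ('rb'), matching the binary protocol used to write it.
with open(filename, 'rb') as input:
    self.open_multi_clock = pickle.load(input)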
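To see why a hand-made "blank" file fails while one written by pickle.dump loads fine, here is a small Python 3 sketch; the file names and the stored dict are hypothetical. An empty file contains no pickle stream at all, so pickle.load raises EOFError; a file whose first byte is not a pickle opcode (such as text saved from an editor) typically raises UnpicklingError: invalid load key instead.

```python
import os
import pickle
import tempfile

workdir = tempfile.mkdtemp()

# A file created by hand holds no pickle data; here it is simply empty.
blank_path = os.path.join(workdir, "blank.pkl")
open(blank_path, "wb").close()

# A file written by pickle.dump holds a real pickle stream, whatever its extension.
filename = os.path.join(workdir, "tryagain.txt")
with open(filename, "wb") as output:
    pickle.dump({"clock": 0}, output, pickle.HIGHEST_PROTOCOL)


def try_load(path):
    """Return the unpickled object, or the name of the exception pickle raised."""
    with open(path, "rb") as source:
        try:
            return pickle.load(source)
        except (EOFError, pickle.UnpicklingError) as err:
            return type(err).__name__


print(try_load(blank_path))  # EOFError
print(try_load(filename))    # {'clock': 0}
```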

My guess is that the raw characters in the pickled output are hurting portability. I'd suggest base64-encoding the pickled data before writing it to a file. Here's what I ran:
import base64
import pickle
value_p = pickle.dumps("abdfg")
value_p_b64 = base64.b64encode(value_p)
with open("output.pkl", "wb") as f:
    f.write(value_p_b64)
readable = ""
with open("output.pkl", "rb") as f:
    for line in f:
        readable += pickle.loads(base64.b64decode(line))
>>> readable
'abdfg'

Creating a new file in Python

I am a beginner, writing a python script in which I need it to create a file that I can write information to. However, I am having problems getting it to create a new, not previously existing file.
for example, I have:
file = open(coordinates.kml, 'w')
which it proceeds to tell me:
NameError: name 'coordinates' is not defined.
Of course it isn't defined, I'm trying to make that file.
Everything I read on creating a new file says to take this route, but it simply will not allow me. What am I doing wrong?
I even tried to flat out define it...
file = coordinates.kml
file_open = open(file, 'w')
... and essentially got the same result.

You need to pass coordinates.kml as a string, so put it in quotes (single or double quotes are both fine).
file = open("coordinates.kml", "w")

In addition to the above answer:
If you want to create the file in the current working directory, that is all you need; otherwise, you also need to include the path inside the quotes.
Note, however, that opening a nonexistent file in read mode will raise an error, because you are trying to access a file that does not exist.

To be future-proof and platform-independent, you can read and write files in binary mode. For example, when Python on Windows writes a file in text mode, it converts line endings ("\n" becomes "\r\n"). Reading and writing in binary mode, using "rb" and "wb", avoids such changes. (In Python 3, a file opened with "wb" must be written with bytes, not str.)
file = open("coordinates.kml", "wb")
And remember to close the file; otherwise you may run into errors when rerunning the script.
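Putting these answers together, here is a minimal Python 3 sketch that creates the file at an explicit path and lets a with block close it. The temporary directory and the KML content are placeholders for your own location and data:

```python
import os
import tempfile

# Placeholder for the folder you actually want the file in.
output_dir = tempfile.mkdtemp()
kml_path = os.path.join(output_dir, "coordinates.kml")

# Mode "w" creates the file if it does not exist (and empties it if it does);
# the with block closes it even if writing fails.
with open(kml_path, "w") as kml_file:
    kml_file.write("<kml></kml>\n")  # placeholder content

print(os.path.exists(kml_path))  # True

# Opening a file that does not exist in read mode raises FileNotFoundError.
try:
    with open(os.path.join(output_dir, "missing.kml"), "r") as missing:
        missing.read()
except FileNotFoundError:
    print("missing.kml does not exist yet")
```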

Develop Reference
