EC2 python pickle fails - python

I'm doing a calculation on EC2 using Python, and when I try to dump a pickle file containing a dictionary, the program just exits, with no error notice or anything. The file is large, around 1 GB, but everything works fine on my laptop, just not on EC2. I'm using an m3.large instance with an attached EBS volume with plenty of space. In the following code snippet, it prints "dumping now" and then... nothing, no "dumping complete". No error is caught by the except block. When I try to load the pickle file I get an EOFError.
Thanks for any advice!
try:
    fp = open(pickleFile, "wb")
    print 'dumping now'
    pickle.dump(dataDict, fp)
    print 'dumping complete'
    fp.close()
except:
    fp = open('/Users/thisUser/Data/report.txt', 'w')
    fp.write('error writing pickle file')
    fp.close()

Especially if you're debugging file I/O, try not to have your debugging depend on said file I/O. If there are any permissions or disk issues, you'll never know, since the error message won't be able to be written either.
Try running watch "ls -lh /your/file" while the script runs, to see whether the file writing stops suddenly or is just continuing at a slow pace.
Try Ctrl-C if your script is hung and the watch output looks static, and check where in the traceback your function is spending its time.
Try pickling the elements/keys of the dictionary individually; the pickling may be crashing on an individual complex element (see the sketch after these suggestions).
Try cPickle.
Compare your versions at home and on EC2: import cPickle; print cPickle.__version__
Try different pickle protocol versions.
I'm not sure which of these might help, but one of them might.
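A minimal sketch of the per-key suggestion, assuming dataDict is the dictionary from the question; dumps serializes in memory only, so a failure points at the offending value rather than at the file:

import cPickle as pickle

for key in dataDict:
    try:
        pickle.dumps(dataDict[key], 2)  # protocol 2; serialize in memory only
    except Exception, e:
        print 'failed on key', key, ':', e

If the process still dies with no traceback at all, it may have been killed from outside Python; on a Linux EC2 instance, dmesg will show whether the kernel's OOM killer stopped it.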

Related

Pickle not writing to file and program terminates after pickle.dump()

I am trying to write a program that initializes a dictionary with web scraping, and then serializes the dictionary using pickle so that the program doesn't need to scrape after the first time it is run. My issue is that after calling pickle.dump(someDict, dictFile), no data is written to the file and the program actually terminates. The code I am using is below:
if not Path(getcwd() + "\\dic.pickle").is_file():
    someDict = funcToScrapeDict()
    with open("dic.pickle", "wb") as dic_file:
        pickle.dump(someDict, dic_file)
else:
    with open("dic.pickle", "rb") as dic_file:
        someDict = pickle.load(dic_file)
<<lots more code here processing someDict>>
So if the pickle file already exists it will just jump to unpickling.
I know that my scraping function works inside the if branch, because I test-printed the dictionary immediately before calling pickle.dump(someDict, dic_file); termination happens immediately after that call, with no bytes written to the file (though the file is created) and no error messages.
I am on Windows 10, using Python 3.7.1.
I also increased the recursion limit because of a previous runtime error, and I tried using absolute paths, with no luck.
[EDIT] Also worth noting that I tried this exact implementation outside the scope of my problem, with just a manually created dictionary of equal size (280 entries), and it worked fine.
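One hedged guess, given the raised recursion limit: pickling a deeply nested structure recurses in C, and pushing sys.setrecursionlimit past what the C stack can hold makes the interpreter die with no traceback, which matches the silent termination described here. Running the dump in a worker thread with a larger stack (the sizes below are assumptions) can turn the hard crash into a catchable error:

import pickle
import sys
import threading

def dump_it():
    with open("dic.pickle", "wb") as dic_file:
        pickle.dump(someDict, dic_file, protocol=pickle.HIGHEST_PROTOCOL)

threading.stack_size(64 * 1024 * 1024)  # request a 64 MB C stack for the worker (assumption)
sys.setrecursionlimit(100000)           # a raised limit, as in the question (value assumed)
worker = threading.Thread(target=dump_it)
worker.start()
worker.join()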

How to detect a .lock file in a geodatabase

I am very new to Python, but I have written a simple Python script tool for automating the process of updating mosaic datasets at my job. The tool runs great, but sometimes I get the dreaded 9999999 error, or "the geodatabase already exists", when I try to overwrite the data.
The file structure is c:\users\my.name\projects\ImageryMosaic\Alachua_2014\Alachua_2014_mosaic.gdb. After some research, I determined that the lock was being placed on the FGDB whenever I opened the newly created mosaic dataset inside the FGDB to check for errors after running the tool. I would like to be able to overwrite the data instead of having to delete it, so I am using the arcpy.env.overwriteOutput setting in my script. This works fine unless I open the dataset after running the tool. Since other people will be using this tool, I don't want them scratching their heads for hours like me, so it would be nice if the script tool could look for the presence of a .lock file in the geodatabase. That way I could at least provide a statement in the script as to why the tool failed, instead of the unhelpful 9999999 error. I know about arcpy.TestSchemaLock, but I don't think that will work in this case, since I am not trying to place a lock, and I want to overwrite the FGDB, not edit it.
Late, but the function below will check for lock files in a given (gdb) path.
def lockFileExist(path=None):
    if path is None:
        raise Exception("Invalid Path!")
    else:
        import glob
        # glob already filters to *.lock, so any match means the gdb is locked
        full_file_paths = glob.glob(path + "\\*.lock")
        return len(full_file_paths) > 0

if lockFileExist(r"D:\sample.gdb"):
    print "Lock file found in gdb. Aborting..."
else:
    print "No lock files found! Ready for processing..."

Detecting broken stream in python when file is deleted

My problem is that logging stops for a python program when the log is rotated.
I have tracked it down to the stream itself. I don't see any way to tell from Python whether the stream is broken. After the file is deleted, it still accepts writes without any issue.
import os
FILE = 'testing.txt'
fs = open(FILE, 'a')
fs.write('word')
os.remove(FILE)
fs.write('Nothing....') # Nothing breaks
print(fs.errors) # No errors
So, how can I find out if the file stream is still valid?
And checking to see if the file exists will not help since the file will always exist regardless of whether or not the stream is still valid.
Upon much more inspection, I found the solution. It is an OS-specific behavior: when a file is deleted on Linux (or on a Mac), it is just unlinked, and existing open handles keep working. (I was not aware of this.)
So if you run lsof on the machine, it still shows the file as open.
[user@machine]$ lsof | grep --color -i "testing.txt"
python26  26495  user  8w  REG  8,33  23474  671920  /home/user/temp/testing.txt (deleted)
The solution is to stat the stream in python.
stat = os.fstat(fs.fileno())
Which will give you the number of links it has.
if stat.st_nlink < 1:
    # file has been deleted
And there you go. Now you know if you should reload it or not. Hopefully this helps someone else.
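A minimal sketch putting that together, reusing fs and FILE from the question; if the link count hits zero, the stream is reopened so later writes land in a fresh file:

import os

def ensure_stream(fs, path):
    if os.fstat(fs.fileno()).st_nlink < 1:  # inode has no names left: file was deleted
        fs.close()
        fs = open(path, 'a')
    return fs

fs = ensure_stream(fs, FILE)
fs.write('this lands in a re-created testing.txt\n')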
Try exception handling:
import os

FILE = 'testing.txt'

try:
    fs = open(FILE, 'a')
    fs.write('word')
    os.remove(FILE)
    fs.write('Nothing....')  # Nothing breaks
except Exception, e:
    print "Error:", e

print(fs.errors)  # No errors
There are Python bindings for inotify if you need more intelligence than just a try/except clause, but I think it's only pertinent to Linux (I'm not sure of your platform).
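For example, with the pyinotify bindings (a sketch, Linux-only; the watched directory is taken from the lsof output above):

import pyinotify

class DeleteHandler(pyinotify.ProcessEvent):
    def process_IN_DELETE(self, event):
        print "deleted:", event.pathname  # reopen your log stream here

wm = pyinotify.WatchManager()
wm.add_watch('/home/user/temp', pyinotify.IN_DELETE)
notifier = pyinotify.Notifier(wm, DeleteHandler())
notifier.loop()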
Another solution I found is to add the "copytruncate" flag to the logrotate config; see the sketch below.
See "man logrotate" for more info.

Dropping a file onto a script to run as argument causes exception in Vista

edit: OK, I could swear that the way I'd tested it showed that the getcwd was also causing the exception, but now it appears it's just the file creation. When I move the try-except blocks there, it actually does catch it like you'd think it would. So chalk that up to user error.
Original Question:
I have a script I'm working on that I want to be able to drop a file onto, so that it runs with that file as an argument. I checked in this question, and I already have the mentioned registry keys (apparently the Python 2.6 installer takes care of it). However, it's throwing an exception that I can't catch. Running it from the console works correctly, but when I drop a file on it, it throws an exception and then closes the console window. I tried to have it redirect standard error to a file, but it threw the exception before the redirection occurred in the script. With a little testing and some quick eyesight, I saw that it was throwing an IOError when I tried to create the file to write the error to.
import sys
import os

#os.chdir("C:/Python26/Projects/arguments")
try:
    print sys.argv
    raw_input()
    os.getcwd()
except Exception, e:
    print sys.argv + '\n'
    print e

f = open("./myfile.txt", "w")
If I run this from the console with any or no arguments, it behaves as one would expect. If I run it by dropping a file on it, for instance test.txt, it runs, prints the arguments correctly, then when os.getcwd() is called, it throws the exception and does not perform any of the stuff from the except block, making it difficult for me to find any way to actually keep the exception text on screen. If I uncomment the os.chdir(), the script doesn't fail. If I move that line to within the except block, it's never executed.
I'm guessing that running it by dropping the file on it, which according to the other linked question uses WSH, is somehow messing up its permissions or the cwd, but I don't know how to work around it.
Seeing as this is probably not Python-related but a Windows problem (I for one could not reproduce the error given your code), I'd suggest attaching a debugger to the Python interpreter when it is started. Since you start the interpreter implicitly by a drag-and-drop action, you need to configure Windows to auto-attach a debugger each time Python starts. If I remember correctly, this article has the needed information to do that (you can substitute another debugger if you are not using Visual Studio).
Apart from that, I would take a snapshot with ProcMon while dragging a file onto your script, to get an idea of what is going on.
As pointed out in my edit above, the errors were caused by the working directory changing to C:\windows\system32, where the script isn't allowed to create files. I don't know how to get it to not change the working directory when started that way, but was able to work around it like this.
if len(sys.argv) == 1:
    files = [filename for filename in os.listdir(os.getcwd())
             if filename.endswith(".txt")]
else:
    files = [filename for filename in sys.argv[1:]]
Fixing the working directory can be managed this way, I guess.
exepath = sys.argv[0]
os.chdir(exepath[:exepath.rfind('\\')])
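An equivalent sketch using os.path, which also copes with forward slashes or a bare script name:

import os
import sys

os.chdir(os.path.dirname(os.path.abspath(sys.argv[0])))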

Why doesn't Python release file handles after calling file.close()?

I am on Windows with Python 2.5. I have a file open for writing. I write some data, then call file.close(). When I then try to delete the file from the folder using Windows Explorer, it errors, saying that a process still holds a handle to the file.
If I shut down Python and try again, it succeeds.
It does close them.
Are you sure f.close() is getting called?
I just tested the same scenario and Windows deletes the file for me.
Are you handling any exceptions around the file object? If so, make sure the error handling looks something like this:
f = open("hello.txt")
try:
for line in f:
print line
finally:
f.close()
In considering why you should do this, consider the following lines of code:
f = open('hello.txt')
try:
    perform_an_operation_that_causes_f_to_raise_an_exception()
    f.close()
except IOError:
    pass
As you can see, f.close will never be called in the above code. The problem is that the above code will also cause f not to get garbage collected: f will still be referenced by the stored traceback (sys.exc_info(), or sys.last_traceback in an interactive session), in which case the only solution is to manually call close on f in a finally block or to drop the traceback reference (and I strongly recommend the former).
Explained in the tutorial:
with open('/tmp/workfile', 'r') as f:
    read_data = f.read()
It works when you're writing or pickling/unpickling, too; see the sketch below.
The try/finally block isn't really necessary here: that's the Java way of doing things, not the Python way.
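For example (a sketch of the writing case):

with open('hello.txt', 'w') as f:
    f.write('some text')
# f is closed here even if write() raised, so Explorer can delete the file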
I was looking for this because the same thing happened to me. The question didn't help me, but I think I figured out what happened.
In the original version of the script I wrote, I neglected to add a 'finally' clause to close the file in case of an exception.
I was testing the script from the interactive prompt and got an exception while the file was open. What I didn't realize was that the file object wasn't immediately garbage-collected. After that, when I ran the script (still from the same interactive session), even though the new file objects were being closed, the first one still hadn't been, and so the file handle was still in use as far as the operating system was concerned.
Once I closed the interactive prompt, the problem went away, at which point I remembered the exception occurring while the file was open and realized what had been going on. (Moral: don't try to program on insufficient sleep. : ) )
Naturally, I have no idea if this is what happened in the case of the original poster, and even if the original poster is still around, they may not remember the specific circumstances, but the symptoms are similar, so I thought I'd add this as something to check for, for anyone caught in the same situation and looking for an answer.
I did it using an intermediate file:
import os
f = open("report.tmp","w")
f.write("{}".format("Hello"))
f.close()
os.system("move report.tmp report.html") #this line is for Windows users
