Problems with pickle python - python

I recently made a program using an external document with pickle. But when it tries to load the file with pickle, I got this error (the file is already existing but it also fails when the file in't existing):
python3.6 check-graph_amazon.py
a
b
g
URL to follow www.amazon.com
Product to follow Pool_table
h
i
[' www.amazon.com', ' Pool_table', []]
p
Traceback (most recent call last):
File "check-graph_amazon.py", line 17, in <module>
tab_simple = pickle.load(doc_simple)
io.UnsupportedOperation: read
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "check-graph_amazon.py", line 42, in <module>
pickle.dump(tab_simple, 'simple_data.dat')
TypeError: file must have a 'write' attribute
Here is the code :
import pickle5 as pickle
#import os
try:
print("a")
with open('simple_data.dat', 'rb') as doc_simple:
print("b")
tab_simple = pickle.load(doc_simple)
print("c")
print(tab_simple)
print("d")
URL = tab_simple[0]
produit_nom = tab_simple[1]
tous_jours = tab_simple[2]
print("f")
except :
print("g")
URL = str(input("URL to follow"))
produit_nom = str(input("Product to follow"))
with open('simple_data.dat', 'wb+') as doc_simple:
print("h")
#os.system('chmod +x simple_data.dat')
tab_simple = []
tab_simple.append(URL)
tab_simple.append(produit_nom)
tab_simple.append([])
print(tab_simple)
print("c'est le 2")
print("p")
pickle.dump(tab_simple, 'simple_data.dat')
print("q")
The prints are here to show which lines are executed. The os.system is here to allow writing on the file but the error is persisting.
I don't understand why it's said that the document doesn't have a write attribute because I opened it in writing mode. And I neither understand the first error where it can't load the file.
If it can help you the goal of this script is to initialise the program, with a try. It tries to open the document in reading mode in the try part and then set variables. If the document doesn't exist (because the program is lauched for the first time) it goes in the except part and create the document, before writing informations on it.
I hope you will have any clue, including changing the architecture of the code if you have a better way to make an initialisation for the 1st time the program is launched.
Thanks you in advance and sorry if the code isn't well formated, I'm a beginner with this website.

Quote from the docs for pickle.dump:
pickle.dumps(obj, protocol=None, *, fix_imports=True)
Write a pickled representation of obj to the open file object file. ...
...
The file argument must have a write() method that accepts a single bytes argument. It can thus be an on-disk file opened for binary writing, an io.BytesIO instance, or any other custom object that meets this interface.
So, you should pass to this function a file object, not a file name, like this:
with open("simple_data.dat", "wb"): as File:
pickle.dump(tab_simple, File)
Yeah, in your case the file has already been opened, so you should write to doc_simple.

Related

Python Read Excel file with Error - "We Found a problem with some content..."

Here is my problem. We have an Excel based report that business users enter comments into two separate fields, as well as selecting a code form a drop down. We then have a manual process that collects those files and pushes the comments and codes to a Snowflake table to be able to use in various reports.
I am trying to improve the process with a Python script that will collect the files, copy them to a staging_folder location, then read in the data from the sheet, append it all together, do some cleanup and push to Snowflake. The plan is that this would be completely automated - but this is where we run into issues.
Initial step works perfectly. I have a loop that grabs the files based on the previous business day date, copies them to a staging folder. There are typically 32 files each day.
Next step reads those files to append to a dataframe. Here is the function that is loading the Excel files in my Python script.
def load_files():
file_list = glob.glob(file_path + r'\*')
df = pd.DataFrame()
print("Importing data to Pandas DF...")
for file in file_list:
try:
wb = load_workbook(file)
ws = wb["Daily Outs"]
data = ws.values
cols = next(data)[1:]
data = list(data)
idx = [r[0] for r in data]
data = (islice(r, 1, None) for r in data)
data_1 = pd.DataFrame(data, index=idx, columns=cols)
df = df.append(data_1, sort=False)
print(file + " Imported to Df...")
except Exception as e:
print("Error: " + e + " When attempting to open file: " + file)
# error_notify(e)
print(df.head(10))
return df
The problem is when we have files that have some sort of corruption. The files when opened manually will show an error like the one below.
I thought with my try, except code above this would catch an error like this and alert me with the error_notify(e) function. However, we get a result where the Python script crashes with an error like this: zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred.
There is more to the error, but I only copied & pasted this part in some communication with some folks int he office. Impossible to replicate the error on our own - I have no idea how the files get corrupted in this way - except that there are multiple people accessing the files throughout the day.
The way to make the file readable is completely manual - we must open the file, get that error, hit yes, and save the file over the existing one. Then re-launch the script. But since the try, except isn't catching it and alerting us to the failure, we have to run the script manually to see if it works or not.
Two questions - am I doing something incorrect in my try, except command? I am admittedly weak in error catching so my first thought is there is more I can do there to make that work. Secondly, is there a Python way to get past that error in the Excel workbook files?
Here is the error text:
Traceback (most recent call last):
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service
Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 48, in load_files
wb = load_workbook(file)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 314, in load_workbook
data_only, keep_links)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 124, in init
self.archive = _validate_archive(fn)
File "C:\ProgramData\Anaconda3\lib\site-packages\openpyxl\reader\excel.py", line 96, in _validate_archive
archive = ZipFile(filename, 'r')
File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1222, in init
self._RealGetContents()
File "C:\ProgramData\Anaconda3\lib\zipfile.py", line 1289, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 123, in <module>
main()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 86, in main
df_output = df_clean()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 68, in df_clean
df = load_files()
File "G:/Replenishment/Reporting/00 - I&A Replenishment/02 - Service Level/Daily Outs Comment Capture/Python/daily_outs_missed_files.py", line 61, in load_files
print("Error: " + e + " When attempting to open file: " + file)
TypeError: can only concatenate str (not "BadZipFile") to str
Your try/except code looks correct. All user defined exceptions in python should be classes based on Exception. See BaseException and
and Exception in python documentation :
"Exception (..) All user-defined exceptions should also be derived from this class" see also the exception class hierarchy tree at the end of the python doc sesction.
If your python script "crashes" it means one of the library procedures throws an exception which is not based on the Exception class, something that "should not" be. You could look at the Traceback and try catching the offending exception type separately, or find what part of the source code and which library is the cause, fix it and submit a PR. Here are two examples of a good and bad way of deriving own exceptions
class MyBadError(BaseException):
"""
my bad exception, do not make yours that way
"""
pass
instead of recommended
class MyGoodError(Exception):
"""
exception based on the Exception
"""
pass
Where and what exactly fails is a bit of mystery still but the problems with your exception from the Traceback is not new, see zipfile.BadZipfile issue in pandas discussion. Note that xlrd used by pandas to read Excel workbooks data is currently a "no-maintainer-ware" declaration about xlrd from the authors and in case of any issues the recommendation is to use openpyxl instead or fix any issues yourself (pandas maintainers are doing pontius pilate on that, but happily use xlrd as a dependency). I suggest you catch the BadZipfile as a special known corruption error separately from all other exceptions, see python error handling tutorial for example code (you probably already have seen it, this is for other readers). If that does not work I can trace it in the source code of your libraries / python modules to the exact offending section and find the culprit, if you reach out directly.

Shelve keeps forgetting the variables it holds

I'm teaching myself how to Python3. I wanted to train my acquired skills and write a command-line backup program. I'm trying to save the default backup and save locations with the Shelve module but it seems that it keeps forgetting the variables I save whenever I close or restart the program.
Here is the main function that works with the shelves:
def WorkShelf(key, mode='get', variable=None):
"""Either stores a variable to the shelf or gets one from it.
Possible modes are 'store' and 'get'"""
config = shelve.open('Config')
if mode.strip() == 'get':
print(config[key])
return config[key]
elif mode.strip() == 'store':
config[key] = variable
print(key,'holds',variable)
else:
print("mode has not been reconginzed. Possible modes:\n\t- 'get'\n\t-'store'")
config.close()
So, whenever I call this function to store variables and call the function just after that, it works perfectly. I tried to access the shelf manually and everything is there.
This is the code used to store the variables:
WorkShelf('BackUpPath','store', bupath)
WorkShelf('Path2BU', 'store', path)
The problem comes when I try to get my variables from the shelf after restarting the script. This code:
config = shelve.open('Config')
path = config['Path2BU']
bupath = config['BackUpPath']
Gives me this error:
Traceback (most recent call last):
File "C:\Python35-32\lib\shelve.py", line 111, in __getitem__
value = self.cache[key]
KeyError: 'Path2BU'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
config['Path2BU']
File "C:\Python35-32\lib\shelve.py", line 113, in __getitem__
f = BytesIO(self.dict[key.encode(self.keyencoding)])
File "C:\Python35-32\lib\dbm\dumb.py", line 141, in __getitem__
pos, siz = self._index[key] # may raise KeyError
KeyError: b'Path2BU
Basically, this is an error I could reproduce by calling ShelveObject['ThisKeyDoesNotExist'].
I am really lost right now. When I try to manually create a shelf, close it and access it again it seems to work (even though I got an error doing that before) now. I've read every post concerning this, I thought about shelf corruption (but it's not likely it happens every time), and I've read my script A to Z around 20 times now.
Thanks for any help (and I hope I asked my question the right way this time)!
EDIT
Okay so this is making me crazy. Isolating WorkShelf() does work perfectly, like so:
import shelve
def WorkShelf(key, mode='get', variable=None):
"""Either stores a variable to the shelf or gets one from it.
Possible modes are 'store' and 'get'"""
config = shelve.open('Config')
if mode.strip() == 'get':
print(config[key])
return config[key]
elif mode.strip() == 'store':
config[key] = variable
print(key,'holds',variable)
else:
print("mode has not been reconginzed. Possible modes:\n\t- 'get'\n\t-'store'")
config.close()
if False:
print('Enter path n1: ')
path1 = input("> ")
WorkShelf('Path1', 'store', path1)
print ('Enter path n2: ')
path2 = input("> ")
WorkShelf('Path2', 'store', path2)
else:
path1, path2 = WorkShelf('Path1'), WorkShelf('Path2')
print (path1, path2)
No problem, perfect.
But when I use the same function in my script, I get this output. It basically tells me it does write the variables to the shelve files ('This key holds this variable' message). I can even call them with the same code I use when restarting. But when calling them after a program reset it's all like 'What are you talking about m8? I never saved these'.
Welcome to this backup manager.
We will walk you around creating your backup and backup preferences
What directory would you like to backup?
Enter path here: W:\Users\Damien\Documents\Code\Code Pyth
Would you like to se all folders and files at that path? (enter YES|NO)
> n
Okay, let's proceed.
Where would you like to create your backup?
Enter path here: N:\
Something already exists there:
19 folders and 254 documents
Would you like to change your location?
> n
Would you like to save this destination (N:\) as your default backup location ? That way you don't have to type it again.
> y
BackUpPath holds N:\
If you're going to be backing the same data up we can save the files location W:\Users\Damien\Documents\Code\Code Pyth so you don't have to type all the paths again.
Would you like me to remember the backup file's location?
> y
Path2BU holds W:\Users\Damien\Documents\Code\Code Pyth
>>>
======== RESTART: W:\Users\Damien\Documents\Code\Code Pyth\Backup.py ========
Welcome to this backup manager.
Traceback (most recent call last):
File "C:\Python35-32\lib\shelve.py", line 111, in __getitem__
value = self.cache[key]
KeyError: 'Path2BU'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "W:\Users\Damien\Documents\Code\Code Pyth\Backup.py", line 198, in <module>
path, bupath = WorkShelf('Path2BU'), WorkShelf('BackUpPath')
File "W:\Users\Damien\Documents\Code\Code Pyth\Backup.py", line 165, in WorkShelf
print(config[key])
File "C:\Python35-32\lib\shelve.py", line 113, in __getitem__
f = BytesIO(self.dict[key.encode(self.keyencoding)])
File "C:\Python35-32\lib\dbm\dumb.py", line 141, in __getitem__
pos, siz = self._index[key] # may raise KeyError
KeyError: b'Path2BU'
Please help, I'm going crazy and might develop a class to store and get variables from text files.
Okay so I just discovered what went wrong.
I had a os.chdir() in my setup code. Whenever Setup was done and I wanted to open my config files it would look in the current directory but the shelves were in the directory os.chdir() pointed to in the setup.
For some reason I ended up with empty shelves in the directory the actual ones were supposed to be. That cost me a day of debugging.
That's all for today folks!

Docopt - Errors, exit undefined - CLI interface for Python programme

I'm sure that the answer for this is out there, but I've read the site info, I've watched the video they made and I've tried to find a really basic tutorial but I can't. I've been messing about with this for most of the day and It's not really making sense to me.
Here's my error:
vco#geoHP:~$ python3 a_blah.py "don't scare the cats" magic
Traceback (most recent call last):
File "a_blah.py", line 20, in <module>
arguments = docopt.docopt(__doc__)
File "/usr/lib/python3/dist-packages/docopt.py", line 579, in docopt
raise DocoptExit()
docopt.DocoptExit: Usage:
a_blah.py <start>... <end>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "a_blah.py", line 33, in <module>
except DocoptExit:
NameError: name 'DocoptExit' is not defined
line 20 - I don't see why that line is creating an error, it worked before and I've seen that exact line in others programmes?
I don't know why the line 570 of docopt is creating an error - I've seen others use DocoptExit(), isn't this something that's just part of Docopt? Do I have to write my own exit function for this? (I've not seen anyone else do that)
here's the code
import docopt
if __name__ == '__main__':
try:
arguments = docopt.docopt(__doc__)
print(arguments['<start>'])
print("that was that")
print(arguments['<end>'])
except docopt.DocoptExit:
print("this hasn't worked")
What I'm trying to make this for is a script that I've written that moves files from one place to another based on their extension.
So the arguments at the command line will be file type, start directory, destination directory, and an option to delete them from the start directory after they've been moved.
I'm trying (and failing) to get docopt working on it's own prior to including it in the other script though.
The exception you want is in docopt's namespace. You never import it into your global namespace, so you can't refer to it simply with it's name. You need to import it separately or refer to it through the module. You also shouldn't use parenthesis after the exception.
import docopt
try:
# stuff
except docopt.DocoptExit:
# other stuff
or
import docopt
from docopt import DocoptExit
try:
# stuff
except DocoptExit:
# other stuff

Python: EOFError using cPickle while running a class instance

This is the code snippet causing the problem:
if str(sys.argv[2]) + '.pickle' in os.listdir(os.curdir): #os.path.isfile(str(sys.argv[2]) + '.pickle'):
path = sys.argv[2] + '.pickle'
#print path
instance = cPickle.load(open(str(path)))
This is the traceback:
Traceback (most recent call last):
File "parent_cls.py", line 92, in <module>
instance = cPickle.load(open(str(path)))
EOFError
If this keeps happening because of file.close() is not performed or some other ridiculous mistake, please let me know if there is a way to access the pickle file using subprocess. Thanks.
UPDATE: Another thing I notice. The filename.pickle to check if its there or not using the if condition actually is creating a filename.pickle although it wasn't there first.
I dont want to create it but to check its existence. is this some other problem?
Open it in binary mode :
open(str(path), 'rb')

Error when using astWCS trying to create WCS object

I'm running python2.5 and trying to use the astLib library to analyse WCS information in astronomical images. I try and get the object instanciated with the following skeleton code:
from astLib import astWCS
w = astWCS.WCS('file.fits') # error here
where file.fits is a string pointing to a valid fits file.
I have tried using the alternate method of passing a pyfits header object and this fails also:
import pyfits
from astLib import astWCS
f = pyfits.open('file.fits')
header = f[0].header
f.close()
w = astWCS.WCS(header, mode='pyfits') # error here also
The error is this:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/astro/phrfbf/build/lib/python2.6/site-packages/astLib/astWCS.py", line 79, in __init__
self.updateFromHeader()
File "/home/astro/phrfbf/build/lib/python2.6/site-packages/astLib/astWCS.py", line 119, in updateFromHeader
self.WCSStructure=wcs.wcsinit(cardstring)
File "/home/astro/phrfbf/build/lib/python2.6/site-packages/PyWCSTools/wcs.py", line 70, in wcsinit
return _wcs.wcsinit(*args)
TypeError: in method 'wcsinit', argument 1 of type 'char *'
When I run in ipython, I get the full error here on the pastebin
I know the astWCS module is a wrapped version of WCStools but i'd prefer to use the Python module as the rest of my code is in Python
Can anyone help with this problem?
Just found out the updated version of this library has fixed the problem, thanks for everyone's help
Oh sorry, I should have seen. Looking at the pastebin in more detail, the only error I can think of is that, for some reason the header has unicode in it. It can't be converted to char *, and you get the error. I tried searching for something in the header, but everything looks okay. Can you do this and post the output in another pastebin?
import pyfits
f = pyfits.open('file.fits')
header = f[0].header
f.close()
for x, i in enumerate(header.iteritems()):
if len(str(i[1])) >= 70:
print x, str(i[1])
cardlist = header.ascardlist()
cardstring = ""
for card in cardlist:
cardstring = cardstring + str(card)
print repr(cardstring)
Or, if you can check the header of your fits file for "funny" characters, getting rid of them should solve the issue.

Categories

Resources