Python Pickling Dictionary EOFError - python

I have several script running on a server which pickle and unpickle various dictionaries. They all use the same basic code for pickling as shown below:
SellerDict=open('/home/hostadl/SellerDictkm','rb')
SellerDictionarykm=pickle.load(SellerDict)
SellerDict.close()
SellerDict=open('/home/hostadl/SellerDictkm','wb')
pickle.dump(SellerDictionarykm,SellerDict)
SellerDict.close()
All the scripts run fine except for one of them. The one that has issues goes to various websites and scrapes data and stores it in dictionary. This code runs all day long pickling and unpickling dictionaries and stops at midnight. A cronjob then starts it again
the next morning. This script can run for weeks without having a problem but about once a month the script dies due to a EOFError when it tries to open a dictionary. The size of the dictionaries are usually about 80 MB. I even tried adding SellerDict.flush() before SellerDict.close() when pickling the data to make sure evening was being flushed.
Any idea's what could be causing this? Python is pretty solid so I don't think it is due to the size of the file. Where the code runs fine for a long time before dying it leads me to believe that maybe something is being saved in the dictionary that is causing this issue but I have no idea.
Also, if you know of a better way to be saving dictionaries other than pickle I am open to options. Like I said earlier, the dictionaries are constantly being opened and closed. Just for clarification, only one program will use the same dictionary so the issue is not being caused by several programs trying to access the same dictionary.
UPDATE:
Here is the traceback that I have from a log file.
Traceback (most recent call last):
File "/home/hostadl/CompileRecentPosts.py", line 782, in <module>
main()
File "/home/hostadl/CompileRecentPosts.py", line 585, in main
SellerDictionarykm=pickle.load(SellerDict)
EOFError

So this actually turned out to be a memory issue. When the computer would run out of RAM and try to unpickle or load the data, the process would fail claiming this EOFError. I increase the RAM on the computer and this never was an issue again.
Thanks for all the comments and help.

Here's what happens when you don't use locking:
import pickle
# define initial dict
orig_dict={'foo':'one'}
# write dict to file
writedict_file=open('./mydict','wb')
pickle.dump(orig_dict,writedict_file)
writedict_file.close()
# read the dict from file
readdict_file=open('./mydict','rb')
mydict=pickle.load(readdict_file)
readdict_file.close()
# now we have new data to save
new_dict={'foo':'one','bar':'two'}
writedict_file=open('./mydict','wb')
#pickle.dump(orig_dict,writedict_file)
#writedict_file.close()
# but...whoops! before we could save the data
# some other reader tried opening the file
# now they are having a problem
readdict_file=open('./mydict','rb')
mydict=pickle.load(readdict_file) # errors out here
readdict_file.close()
Here's the output:
python pickletest.py
Traceback (most recent call last):
File "pickletest.py", line 26, in <module>
mydict=pickle.load(readdict_file) # errors out here
File "/usr/lib/python2.6/pickle.py", line 1370, in load
return Unpickler(file).load()
File "/usr/lib/python2.6/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.6/pickle.py", line 880, in load_eof
raise EOFError
EOFError
Eventually, some read process is going to try to read the pickled file while a write process already has it open to write. You need to make sure that you have some way to tell whether another process already has a file open for writing before you try to read from it.
For a very simple solution, have a look at this thread that discusses using Filelock.

Related

Decoding a Payload using GitHub Decoder Script

Abstract:
I am analysing a pcap file, with live malware (for educational purposes), and using Wireshark - I managed to extract few objects from the HTTP stream and some executables.
During my Analysis, I found instances hinting Fiestka Exploit Kit used.
Having Googled a ton, I came across a GitHub Rep: https://github.com/0x3a/tools/blob/master/fiesta-payload-decrypter.py
What am I trying to achieve?
I am trying to run the python fiesta-payload-decrypter.py against the malicious executable (extracted from the pcap).
What have I done so far?
I've copied the code onto a plain text and saved it as malwaredecoder.py. - This script is saved in the same Folder (/Download/Investigation/) as the malware.exe that I want to run it against.
What's the Problem?
Traceback (most recent call last):
File "malwaredecoder.py", line 51, in <module>
sys.exit(DecryptFiestaPyload(sys.argv[1], sys.argv[2]))
File "malwaredecoder.py", line 27, in DecryptFiestaPyload
fdata = open(inputfile, "rb").read()
IOError: [Errno 2] No such file or directory: '-'
I am running this python script in Kali Linux, and any help would be much appreciated. Thank you.
The script expects two args... What are you passing it?
Looks like it expects the args to be files and it sees a -, (dash), as the input file.
https://github.com/0x3a/tools/blob/master/fiesta-payload-decrypter.py#L44 Here it looks like the first arg is the input file and second is the output file.
Try running it like this:
python malewaredecoder.py /Download/Investigation/fileImInvestigating.pcap /Download/Investigation/out.pcap
All that said, good luck, that script looks pretty old and was last modified in 2015.

Script terminates with PermissionError [Error 13] when reading a large file

Yeah I know this or similar questions have been posted to this forum but so far none of the answers is sufficient to solve my problem:
Here I have the following code:
with open(filename,'r',buffering=2000000) as f:
f.readline() # takes header away
for i, l in enumerate(f): # count the number of lines
print('Counting {}'.format(i),end='\r')
pass
What happens is the file is a 23Gbytes csv file. I get the following error:
File "programs\readbigfile.py", line 33, in <module>
for i, l in enumerate(f): # count the number of lines
PermissionError: [Errno 13] Permission denied
The error always happens at the line number 1374200. I checked the file with a text editor and there is nothing unusual at that line. This happened to me with the same file but a smaller version (a few less Gigabytes). Then suddenly it worked.
The file is not being used by any other process at all.
Any ideas of why this error occurs in the middle of the file?
PD. I am running this program on a computer with an Intel i5-6500 CPU/16Gb memory and a NVIDIA GeForce GTX 750 Ti card.
System is Windows 10. Python 3.7.6 x64/Anaconda
The file is on a local disk, no networking involved.
Whatever it is, I think your code is ok.
My ideas:
do you need this buffering? Did you try to remove it completely?
I see you're running on Windows. I don't know if that's important, but many weird issues happen on Windows
if you're trying to access it on a disk in the network (samba etc), maybe it's not fully synced?
are you sure nothing else tries to access this file in the meantime? excel?
did you try reading this file with csv.reader? I don't think it'd help though, just wondering
you can try/except and when the error is raised check os.stat or os.access if you have permissions
maybe printing is at fault, it sounds like a huge file. Did you try without printing? You might want to add if i % 1000 == 0: print(...)
I found out the error is due to a file writing error, either because a bad disk block or a disk system failure. The point is, the file had a CRC error somewhere in the middle of it, which I corrected just by creating the file again. It is a random error so if you find yourself in the same situation, one of the checks should be the soundness of the file itself.

TypeError while using Openpyxl to read file

I was trying out Openpyxl, and wrote the following code:
from openpyxl import load_workbook, __version__
workbook = load_workbook(filename="Contacts.xlsx")
sheet = workbook.active
That's it, no other code.
I got the following error on running:
(TL;DR, due to line 2, I got
"TypeError: __init__() got an unexpected keyword argument 'extLst'")
"C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\Scripts\python.exe" "C:/Users/taijee/Documents/PythonAll/Python Scripts/newfile.py"
Traceback (most recent call last):
File "C:/Users/taijee/Documents/PythonAll/Python Scripts/newfile.py", line 2, in <module>
workbook = load_workbook(filename="Contacts.xlsx")
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\reader\excel.py", line 315, in load_workbook
reader.read()
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\reader\excel.py", line 279, in read
apply_stylesheet(self.archive, self.wb)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\stylesheet.py", line 192, in apply_stylesheet
stylesheet = Stylesheet.from_tree(node)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\stylesheet.py", line 102, in from_tree
return super(Stylesheet, cls).from_tree(node)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\serialisable.py", line 83, in from_tree
obj = desc.from_tree(el)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in from_tree
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\descriptors\sequence.py", line 85, in <listcomp>
return [self.expected_type.from_tree(el) for el in node]
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\fills.py", line 64, in from_tree
return PatternFill._from_tree(child)
File "C:\Users\taijee\Documents\PythonAll\Python Scripts\venv\lib\site-packages\openpyxl\styles\fills.py", line 102, in _from_tree
return cls(**attrib)
TypeError: __init__() got an unexpected keyword argument 'extLst'
Process finished with exit code 1
I've searched for similar problems and found one, but it was from 2014 so the solution given was to download version 2.0 of Openpyxl instead of 2.2.
The openpyxl version is 3.0.4, python version 3.8.3, and Contacts.xlsx definitely exists in the same folder.
Edit:
Making a new file had worked, but I recently made another program using openpyxl. It was working fine but it suddenly broke, giving an error very similar to the one above (I didn't record it at the time, but the last line TypeError: __init__() got an unexpected keyword argument 'extLst' definitely matched).
I had made no changes to the program between the successful runs and the unsuccessful ones, so I concluded the file was at fault. I had opened and closed the file once, though I don't remember whether the very next run was the one when it broke. After the error appeared, when I opened the file I couldn't save any changes - it gave some sort of error.
Note: I don't have Office, so I'm using Planmaker of the SoftMaker Office Suite to open the .xlsx files when I need to.
I deleted the file and created a new one with the same name so as to not change the code, but the problems persisted with the new file, including the inability to save changes.
Only when I created a new file with a different name did the code work.
It may not even be an openpyxl problem, but rather a Planmaker problem, though I don't understand why the error would be the one I keep getting.
Neverthless, If someone could explain why this happens or how to fix it, I'll be really grateful.
Meanwhile, I'll see if a file I never use openpyxl on will still give me the same problem, and if it does, whether openpyxl gives the same error with that file when used in a program.
I too was using Planmaker of the SoftMaker Office Suite, got the same error when I try to edit one of my .xlsx with openpyxl. The only solution that I found is by not using the Planmaker and go back to MS Excel.
As much as I don't like MS Excel like the next guy, but it works flawlessly with openpyxl.
Probably not the solution you seek, but it's a solution for me at least.
I got the same error "
TypeError: init() got an unexpected keyword argument 'extLst'
" because I changed and saved a tabular workbook file in xlsx format with Planmaker software.
Thereafter I saved this file with libreoffice calc. Then loading of this xlsx file with openpyxl was working.
This might be a workaround for some users.
Just a note, in case others stumble over this: openpyxl in the latest version (3.0.10) seems to now raise an error on the "unexpected keyword". Before you could using the warnings-module to ignore it. No more.
Since I am working on Linux, going to MS Excel is not an option, so I rewrote my latest Python project to use pylightxl instead of openpyxl. It more hassle, as it has a lot less functionality, but it does not choke on Softwaker's extensions.
– Hendrik

Xbee Micropython EEXIST error when call file open() function

I'm using Xbee3 and I want to append data to file.
I try this script for test but I received EEXIST error if TEST.txt file exists. If this file doesn't exist the file is created for the first run but I get same error when I run this script again.
f = open("TEST.txt", 'a')
for a in range(3):
f.write("#EMPTY LINE#\n")
f.close()
Traceback (most recent call last):
File "main", line 1, in
OSError: [Errno 7017] EEXIST
I formatted xbee by the way.
It's sounds like you're using an 802.15.4, DigiMesh or Zigbee module. The file system in those modules is extremely limited, and doesn't allow for modifying existing files. There should be documentation on the product that lists those limitations (no rename, no modify/append, only one open file at a time, etc.)
XBee/XBee3 Cellular modules have a fuller file system implementation that allows for renaming files and modifying file contents.

Understanding Parsing Error when reading model file into PySD

I am receiving the following error message when I try to read a Vensim model file (.mdl) using Python's PySD package.
My code is:
import pysd
import os
os.chdir('path/to/model_file')
model = pysd.read_vensim('my_model.mdl')
The Error I receive is:
Traceback (most recent call last):
Python Shell, prompt 13, line 1
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pysd/pysd.py", line 53, in read_vensim
py_model_file = translate_vensim(mdl_file)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pysd/vensim2py.py", line 673, in translate_vensim
entry.update(get_equation_components(entry['eqn']))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pysd/vensim2py.py", line 251, in get_equation_components
tree = parser.parse(equation_str)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parsimonious/grammar.py", line 123, in parse
return self.default_rule.parse(text, pos=pos)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parsimonious/expressions.py", line 110, in parse
node = self.match(text, pos=pos)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/parsimonious/expressions.py", line 127, in match
raise error
parsimonious.exceptions.ParseError: Rule 'subscriptlist' didn't match at '' (line 1, column 21).
I have searched for this particular error and I cannot find much information on the failed matching rule for 'subscriptlist'.
I appreciate any insight. Thank you.
Good news is that there is nothing wrong with your code. =) (Although you can also just include the path to the file in the .read_vensim call, if you don't want to make the dir change).
That being the case, there are a few possibilities that would cause this issue. One is if the model file is created with a sufficiently old version of Vensim, the syntax may be different from what the current parser is designed for. One way to get around this is to update Vensim and reload the model file there - Vensim will update to the current syntax.
If you are already using a recent version of Vensim (the parser was developed using syntax of Vensim 6.3E) then the parsing error may be due to a feature that isn't yet included. There are still some outstanding issues with subscripts, which you can read about here and here).
If you aren't using subscripts, you may have found a bug in the parser. If so, best course is to create a report in the github issue tracker for the project. The stack trace you posted says that the error is happening in the first line of the file, and that the error has to do with how the right hand side of the equation is being parsed. You might include the first few lines in your bug report to help me recreate the issue. I'll add a case to our growing test suite and then we can make sure it isn't a problem going forwards.

Categories

Resources