This question already has answers here:
How to create an encrypted ZIP file?
(8 answers)
Closed 11 months ago.
I have an encrypted ZIP file, and for some reason any password I feed it doesn't seem to matter: files are added to the archive regardless. I checked for swallowed exceptions or anything similar, but nothing obvious turned up.
I posted the minimal code below:
import zipfile
z = zipfile.ZipFile('test.zip', 'a') #Set zipfile object
zipPass = str(input("Please enter the zip password: "))
zipPass = bytes(zipPass, encoding='utf-8')
z.setpassword(zipPass) #Set password
z.write("test.txt")
I am not sure what I am missing here. I was looking around for anything in zipfile that can handle encrypted archives and add files to them using the password, but the only thing I have found is the `z.setpassword()` method, which doesn't seem to work here.
TL;DR: z.write() doesn't throw an exception when fed an incorrect password, and neither does z.setpassword() or anything else zipfile-related; files are added no matter what. I was expecting to get something like BadPasswordForFile.
Is there any way to do this?
What I found in the documentation for zipfile is that the library only supports decryption with a password; it cannot encrypt. So you won't be able to add files with a password.
It supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file.
https://docs.python.org/3/library/zipfile.html
EDIT: Further, looking into the Python bug tracker (Issue 34546: Add encryption support to zipfile), it appears encryption was deliberately left out so as not to perpetuate the weak password scheme used by ZIP.
Something that you could do is utilize subprocess to add files with a password.
Further, if you wanted to "validate" the entered password first, you could do something like the code below. You'd have to know the contents of one file in the archive, because the decryption check is weak: a wrong password usually fails, but can occasionally be accepted and simply produce incorrect plaintext.
Issues you'll have to solve:
Comparing file contents to validate the password
Handling when a file already exists in the zip file
Handling when the zip file already exists AND when it doesn't
import subprocess
import zipfile

def zip_file(zipFilename, filename):
    zipPass = str(input("Please enter the zip password: "))
    zipPass = bytes(zipPass, encoding='utf-8')
    # If there is a file whose plain text (or original binary) we know:
    # TODO: handle zipFilename not existing.
    validPass = False
    with zipfile.ZipFile(zipFilename, 'r') as zFile:
        zFile.setpassword(zipPass)
        with zFile.open('known.txt') as knownFile:
            # TODO: compare contents of known.txt with the expected contents;
            # if the compare is OK, then the password is valid.
            validPass = True
    # Next, add the file with the password. zipfile cannot be used here
    # because password-protected writing is not supported.
    # Note this is a Linux-only solution; the OS dependency will need to be checked.
    if not validPass:
        print('Invalid Password')
    else:
        # TODO: handle zipFilename not existing vs. existing; different flags
        #       may have to be passed.
        # TODO: handle filename already existing in zipFilename.
        # WARNING: the Linux manual page for 'zip' states -P is INSECURE.
        res = subprocess.run(['zip', '-e', '-P', zipPass, zipFilename, filename])
        # TODO: check res for success or failure.
EDIT:
I looked into fixing the whole "exposed password" issue with -P. Unfortunately, it is non-trivial. You cannot simply write zipPass into the stdin of subprocess.run with input=. I think something like pexpect might be a solution, but I haven't spent the time to make that work. See here for an example of how to use pexpect to accomplish this: Use subprocess to send a password
After all of the lovely replies, I did find a workaround for this just in case someone needs the answer!
I first retried z.testzip(), and it does actually catch bad passwords, but it isn't fully reliable (apparently hash collisions can let a bad password match the small check value). So instead I use the password to extract the first file the archive lists: if the extraction works, I remove the extracted file; if it doesn't, no harm done.
Code works as below:
try:
    z = zipfile.ZipFile(fileName, 'a')          # Open the zipfile object for appending
    zipPass = bytes(zipPass, encoding='utf-8')  # Str to bytes
    z.setpassword(zipPass)                      # Set password
    filesInArray = z.namelist()                 # Get all files
    testfile = filesInArray[0]                  # First file in the list
    z.extract(testfile, pwd=zipPass)            # Extract the first file
    os.remove(testfile)                         # Remove the file if successfully extracted
except Exception as e:
    print("Exception occurred: ", repr(e))
    return None  # Return to mainGUI - this exits the function without further processing
Thank you guys for the comments and answers!
Related
I have around 2000 JSON files which I'm trying to run through a Python program. A problem occurs when a JSON file is not in the correct format. (Error: ValueError: No JSON object could be decoded) In turn, I can't read it into my program.
I am currently doing something like the below:
for files in folder:
    with open(files) as f:
        data = json.load(f)  # It causes an error at this part
I know there are offline methods for validating and formatting JSON files, but is there a programmatic way to check and format these files? If not, is there a free/cheap alternative for fixing all of these files offline, i.e. I just run a program on the folder containing all the JSON files and it formats them as required?
SOLVED using @reece's comment:
invalid_json_files = []
read_json_files = []

def parse():
    for files in os.listdir(os.getcwd()):
        with open(files) as json_file:
            try:
                simplejson.load(json_file)
                read_json_files.append(files)
            except ValueError, e:
                print ("JSON object issue: %s") % e
                invalid_json_files.append(files)
    print invalid_json_files, len(read_json_files)
Turns out that I was saving a file which is not in JSON format in my working directory which was the same place I was reading data from. Thanks for the helpful suggestions.
The built-in JSON module can be used as a validator:
import json

def parse(text):
    try:
        return json.loads(text)
    except ValueError as e:
        print('invalid json: %s' % e)
        return None  # or: raise
You can make it work with files by using:
with open(filename) as f:
    return json.load(f)
instead of json.loads and you can include the filename as well in the error message.
On Python 3.3.5, for {test: "foo"}, I get:
invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
and on 2.7.6:
invalid json: Expecting property name: line 1 column 2 (char 1)
This is because the correct json is {"test": "foo"}.
When handling the invalid files, it is best to not process them any further. You can build a skipped.txt file listing the files with the error, so they can be checked and fixed by hand.
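If it helps, here is one way the skip list could be built (a minimal sketch; the folder scan and the skipped.txt name follow the suggestion above, everything else is an assumption):

```python
import json
import os

def validate_folder(folder, report="skipped.txt"):
    """Validate every .json file in `folder`; list the bad ones in `report`."""
    skipped = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".json"):
            continue
        path = os.path.join(folder, name)
        try:
            with open(path) as f:
                json.load(f)
        except ValueError as e:  # json.JSONDecodeError is a ValueError subclass
            skipped.append("%s: %s" % (name, e))
    # write one line per broken file, so each can be checked by hand
    with open(os.path.join(folder, report), "w") as f:
        f.write("\n".join(skipped))
    return skipped
```

The returned list is also handy for a summary print at the end of a batch run.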
If possible, you should check the site/program that generated the invalid json files, fix that and then re-generate the json file. Otherwise, you are going to keep having new files that are invalid JSON.
Failing that, you will need to write a custom json parser that fixes common errors. With that, you should be putting the original under source control (or archived), so you can see and check the differences that the automated tool fixes (as a sanity check). Ambiguous cases should be fixed by hand.
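As an illustration of how narrow such fixes have to be, here is a hypothetical fixer for just the trailing-comma case. It is not string-aware, so it can corrupt a document whose string values contain ",}" or ",]" - which is exactly why the original should stay under source control and the result should be diffed:

```python
import json
import re

def fix_trailing_commas(text):
    """Naively remove trailing commas before } or ] - one of the
    most common JSON mistakes. A real fixer needs a tokenizer."""
    return re.sub(r',\s*([}\]])', r'\1', text)

broken = '{"items": [1, 2, 3,], "name": "demo",}'
fixed = fix_trailing_commas(broken)
data = json.loads(fixed)  # parses cleanly now
```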
Yes, there are ways to validate that a JSON file is valid. One way is to use a JSON parsing library that will throw exceptions if the input you provide is not well-formatted.
try:
    load_json_file(filename)
except InvalidDataException:  # or something
    # oops, guess it's not valid
    pass
Of course, if you want to fix it, you naturally cannot use a JSON loader since, well, it's not valid JSON in the first place. Unless the library you're using will automatically fix things for you, in which case you probably wouldn't even have this question.
One way is to load the file manually, tokenize it, and attempt to detect and fix errors as you go, but I'm sure there are cases where the error is simply not fixable automatically, and it would be better to raise an error and ask the user to fix their files.
I have not written a JSON fixer myself so I can't provide any details on how you might go about actually fixing errors.
However I am not sure whether it would be a good idea to fix all errors, since then you'd have assume your fixes are what the user actually wants. If it's a missing comma or they have an extra trailing comma, then that might be OK, but there may be cases where it is ambiguous what the user wants.
Here is a full python3 example for the next novice python programmer that stumbles upon this answer. I was exporting 16000 records as json files. I had to restart the process several times so I needed to verify that all of the json files were indeed valid before I started importing into a new system.
I am no python programmer so when I tried the answers above as written, nothing happened. Seems like a few lines of code were missing. The example below handles files in the current folder or a specific folder.
verify.py
import json
import os
import sys
from os.path import isfile, join

# check if a folder name was specified
if len(sys.argv) > 1:
    folder = sys.argv[1]
else:
    folder = os.getcwd()

# lists to hold invalid and valid files
invalid_json_files = []
read_json_files = []

def parse():
    # loop through the folder
    for files in os.listdir(folder):
        # check if the combined path and filename is a file
        if isfile(join(folder, files)):
            # open the file
            with open(join(folder, files)) as json_file:
                # try reading the json file using the json interpreter
                try:
                    json.load(json_file)
                    read_json_files.append(files)
                except ValueError as e:
                    # if the file is not valid, print the error
                    # and add the file to the list of invalid files
                    print("JSON object issue: %s" % e)
                    invalid_json_files.append(files)
    print(invalid_json_files)
    print(len(read_json_files))

parse()
Example:
python3 verify.py
or
python3 verify.py somefolder
tested with python 3.7.3
It was not clear to me how to provide the path to the file folder, so I'd like to provide an answer with this option.
import glob
import pandas as pd

path = r'C:\Users\altz7\Desktop\your_folder_name'  # use your path
all_files = glob.glob(path + "/*.json")

data_list = []
invalid_json_files = []
for filename in all_files:
    try:
        df = pd.read_json(filename)
        data_list.append(df)
    except ValueError:
        invalid_json_files.append(filename)

print("Files in correct format: {}".format(len(data_list)))
print("Not readable files: {}".format(len(invalid_json_files)))
#df = pd.concat(data_list, axis=0, ignore_index=True)  # will create a pandas DataFrame from the readable files, if you like
I am working on an app that will be something like a text editor, using kivy.
I included a FileChooser to choose a file to edit, and a try/except to catch readability problems with files such as videos and executables. The problem is that this doesn't work: it still raises a decode error, so the try/except didn't behave as expected.
I am now thinking of something like restricting the extensions of files that can be opened. But I would like to request from you a list of extensions that Python can read.
Please Help
I don't think I need to share my code, because there is nothing specific in it and my request is clear:
A list of extensions that python can read.
OK, so here is my check function that is called when a user clicks a button after choosing a file (this is in kivy, by the way):
def check(self):
    app = App.get_running_app()
    with open(os.path.join(sys.path[0], "data.txt"), "r") as f:
        dataa = f.read()
    data = dataa.split('\n')
    if self.namee.text in dataa:
        for i in data:
            ii = i.split(':')
            if ii[1] == self.namee.text and ii[3] == self.password.text:
                self.btn.text = "Identified successfully!!"
                time.sleep(0.5)
                break
            else:
                self.btn.text = "Not identified successfully!!"
    elif self.password.text.strip() == "":
        try:
            with codecs.open(self.namee.text, 'r', encoding="utf-8", errors='ignore') as file:
                try:
                    test = file.read()
                    print(test)
                    self.btn.text=="Identified successfully!!"
                except:
                    self.btn.text = "Oops! Couldn't access the content of this file, try again later or verify your typing."
            self.btn.text=="Identified successfully!!"
        except:
            self.btn.text = "Oops! Couldn't open the file, try again later or verify your typing."
    else:
        self.btn.text = "Cannot find any secret file with this name!"
    if self.btn.text == "Identified successfully!!":
        app.root.current = "control"
        self.btn.text = "open"
        self.namee.text = ""
        self.password.text = ""
File extensions don't actually really work that way, there is no simple list that Python can read, and nor does the extension actually mean anything - for instance, you could give a video file the extension .py if you wanted.
Your initial idea of catching errors is the right way to do things, and you'll want that functionality even if you also limit the available file extensions, since a file with a valid extension may still contain invalid data. There's no reason the try/except shouldn't work, so you probably have some bug you can resolve.
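That error-catching approach might be sketched like this (a minimal sketch, assuming UTF-8 is the text encoding you care about; note that errors='ignore', as used in the question's codecs.open call, suppresses the very decode exception you want to catch):

```python
def try_read_text(path):
    """Attempt to read `path` as UTF-8 text.

    Returns the text, or None if the file can't be opened or isn't
    valid UTF-8 (e.g. a video or an executable)."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except (UnicodeDecodeError, OSError):
        return None
```

The editor can then show the text when the result is not None and display an "unreadable file" message otherwise.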
Also, your request isn't clear, since "how to limit the type of files to open in kivy" != "what extensions can python read". It sounds like you already know the answer to the former question.
To my understanding, Python can read any file. Extensions don't matter; what matters is the way you handle the file.
using python-gnupg v0.3.5 on windows 7 w/Python 2.7 and GPG4Win v2.2.0
test_gnupg.py results in 2 failures:
Test that searching for keys works ... FAIL
Doctest: gnupg.GPG.recv_keys ... FAIL
Two sets of keyrings exist (secring and pubring in each), in each of these locations:
under the GPGHome directory (C:\Program Files (x86)\GNU\GnuPG)
under the user profile(C:\Users\\AppData\Roaming\gnupg)
If I create GPG instance and set the keyring file path to the user profile pubring.pgp I get a result from GPG.list_keys(). If I let it use the gpghome directory pubring.pgp I get no results from list_keys() because that keyring is empty.
So given I specify the user profile keyring and I have a key to use this is what happens:
>>> data = '1234 abcd 56678'
>>> fingerprint = u'<fingerprint>'
>>> enc = gpg.encrypt(data, fingerprint)
>>> enc.data
''
encrypt_file() gives the same result: nothing happens, no errors. I'm not particularly savvy in any of this, but it seems like if I have data and a public key this should be dead simple. I'm having a horrendous time trying to determine what is wrong, given that I see no log files anywhere and get no errors when attempting this.
How can I determine what is going wrong here?
I've read pretty much everything I can find here on StackOverflow, http://pythonhosted.org/python-gnupg/#getting-started and the google group for python-gnupg.
Also why do I have 2 separate sets of keyrings in the first place?
edit:
clarified there are 2 separate sets of pubring and secring
edit 2:
answer below was instrumental in leading to the actual problem.
the gnupg.GPG() constructor sets gpg command-line options that include 'no-tty'; calling gnupg.GPG(options='') resolves the issue and successfully encrypts both data and files.
Okay, I finally got around to looking at this and got basic encryption to work from the command line. Here's an example that will work to encrypt data entered from the command line:
import gnupg
gpg_home = "/path/to/gnupg/home"
gpg = gnupg.GPG(gnupghome=gpg_home)
data = raw_input("Enter data to encrypt: ")
rkey = raw_input("Enter recipient's key ID: ")
encrypted_ascii_data = gpg.encrypt(data, rkey)
print(encrypted_ascii_data)
Change the gpg_home to whichever of those two GnuPG paths you want to use. The first one looks like the default installation location and the second one appears to be specific to your user account. The script will prompt for some text to encrypt and a key ID to encrypt to, then print the ASCII armoured encrypted data to stdout.
EDIT: I'm not certain, but I suspect the reason your code failed was either due to using the whole fingerprint for the recipient key ID, which is unnecessary (I used the 0xLONG format, an example of which is on my profile), or you called the wrong GPG home directory.
EDIT 2: This works to encrypt files and writes the output to a file in the same directory, it will work as is on *nix systems. You will need to change the gpg_home as with the above example:
import gnupg
gpg_home = "~/.gnupg"
gpg = gnupg.GPG(gnupghome=gpg_home)
data = raw_input("Enter full path of file to encrypt: ")
rkeys = raw_input("Enter key IDs separated by spaces: ")
savefile = data+".asc"
afile = open(data, "rb")
encrypted_ascii_data = gpg.encrypt_file(afile, rkeys.split(), always_trust=True, output=savefile)
afile.close()
My work here is done! :)
BTW, both these examples use Python 2.7, for Python 3 you'll need to modify the raw_input() lines to use input() instead.
This might seem like a strange question, but I have this idea that I want to make a Python script that requires a password login. The user should be able to type in the desired password at the beginning of the program, and the code will then write it into the actual source code (so no extra files are generated).
I know that this is possible by doing something like this
with open('test.py','a') as f:
f.write('\nprint "hello world"')
Running this script 3 times will generate the following code
with open('test.py','a') as f:
f.write('\nprint "hello world"')
print "hello world"
print "hello world"
print "hello world"
But I would like my Python script to work on every Windows machine that doesn't have Python installed. So I would have to use PyInstaller - but then how would I be able to write to the source code?
(Optional solution to my question would be an answer how to securely save then password without creating too many obscure files that frightens the end-user)
AFAIK there is no way to modify your code after it is an executable, but you can simply store the password as a hash in a file (Method A) or, better, use a dedicated module for it (Method B). You should never store passwords anywhere in plain text (not even in your executable).
Method A (only use this if you can't use other libraries)
The code could look like this:
# To create the password file (e.g. on password change)
import hashlib

with open('password', 'wb') as f:
    p = 'new password'
    f.write(hashlib.sha512(p.encode('utf-8')).digest())  # hash and save the password

# To check the password
import hashlib

with open('password', 'rb') as f:
    p_in = input("Password: ")  # or whatever method you use to get the password from the user
    p = hashlib.sha512(p_in.encode('utf-8')).digest()  # create the hash
    if p == f.read():  # verify the hash (password)
        pass  # right password
    else:
        pass  # wrong password
The content of the file is the binary form of the hash.
One important thing to note is, that you should use a secure hash function (look into the article linked above) or better use Method B.
Method B (you should use this)
Here is a far more secure and even simpler version (as pointed out by user9876) using the library passlib, which is designed for exactly this.
This is an example copied from the passlib documentation:
# import the context under an app-specific name (so it can easily be replaced later)
from passlib.apps import custom_app_context as pwd_context
# encrypting a password...
hash = pwd_context.encrypt("somepass")
# verifying a password...
ok = pwd_context.verify("somepass", hash)
As you can see the hashing and verification is very simple and you can configure various parameters if you want.
There are many ways to store the hash, which all have pros and cons, so you have to think about them carefully.
A simple File.
You could use the same file to store other settings of you program
If someone installs your program into C:\Program Files\ your program would probably not have the rights to store a file there (but you can use some standard directory like %APPDATA%)
You could hide the file (but if someone copies the program there is a high chance, that it will be lost)
The Windows registry. You can use the standard python winreg module.
Hidden from the user
No extra files
Only on windows
Not portable (if you copy the program to another computer the password will be lost)
Append it to the executable. This is a possibility, but it wouldn't work in your case, because you can't modify a running executable. That means you would need another program to change your main program, and that would be another file. So it is the same number of files as the first option, but more work.
Another thing to note is that you could have a master password or fallback password in case someone (accidentally) deletes your saved password. But think about this carefully, because someone who knows the master password can delete the old password and get into your program.
As you already noticed, storing data in code has more problems than it solves. The way to store "hidden" configuration would be to use _winreg (or winreg in py3) under Windows, and ConfigParser for a ~/.config/myapp.ini file under Linux and other POSIX systems. But then, most people use an .INI file in %APPDATA% under Windows too, that's hidden enough.
If you write a wrapper class that abstracts away the differences, your application code can use this uniformly as a small key/value store. More or less ready-to-use solutions are in this recipe and in kilnconfig.
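A minimal sketch of such a wrapper for the INI side (all names here are illustrative; a Windows build could hide winreg behind the same get/set interface):

```python
import configparser

class SettingsStore:
    """Tiny key/value store backed by an INI file."""

    def __init__(self, path):
        self.path = path
        self._cfg = configparser.ConfigParser()
        self._cfg.read(path)  # silently tolerates a missing file
        if not self._cfg.has_section("app"):
            self._cfg.add_section("app")

    def get(self, key, default=None):
        return self._cfg.get("app", key, fallback=default)

    def set(self, key, value):
        self._cfg.set("app", key, value)
        with open(self.path, "w") as f:
            self._cfg.write(f)  # persist immediately
```

Application code then only ever calls get() and set(), regardless of platform.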
Then when it comes to passwords, use py-bcrypt to securely persist them.
NEVER NEVER NEVER store passwords!!! It is just insecure!
Use the following approach instead:
make a file "passwords.pwd" (Windows will not recognize the file type - good for dummy users)
Don't store the password itself but a hash of the password (you can use e.g. passlib or do your own approach):
import hashlib

password = "12345"  # take user input here
hashed_password = hashlib.sha512(password.encode('utf-8')).hexdigest()
print(hashed_password)
Whenever you have to verify a password, just do the above calculation and compare the result to the stored hash value.
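One caveat on the snippet above: an unsalted SHA-512 is vulnerable to precomputed (rainbow-table) attacks. If passlib isn't an option, the standard library's hashlib.pbkdf2_hmac gives you a salted, deliberately slow hash; here is a sketch (the round count is an assumption you should tune):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None, rounds=100_000):
    """Salted, deliberately slow password hash using only the stdlib."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, rounds)
    return salt, digest

def verify_password(password, salt, expected, rounds=100_000):
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, rounds)
    return hmac.compare_digest(candidate, expected)  # constant-time compare
```

Store both the salt and the digest in the password file; the salt is not a secret.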
I'm trying to automate downloading of some text files from a z/os PDS, using Python and ftplib.
Since the host files are EBCDIC, I can't simply use FTP.retrbinary().
FTP.retrlines(), when used with open(file,w).writelines as its callback, doesn't, of course, provide EOLs.
So, for starters, I've come up with this piece of code which "looks OK to me", but as I'm a relative Python noob, can anyone suggest a better approach? Obviously, to keep this question simple, this isn't the final, bells-and-whistles thing.
Many thanks.
#!python.exe
from ftplib import FTP

class xfile(file):
    def writelineswitheol(self, sequence):
        for s in sequence:
            self.write(s + "\r\n")

sess = FTP("zos.server.to.be", "myid", "mypassword")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
sess.cwd("'FOO.BAR.PDS'")
a = sess.nlst("RTB*")
for i in a:
    sess.retrlines("RETR " + i, xfile(i, 'w').writelineswitheol)
sess.quit()
Update: Python 3.0, platform is MingW under Windows XP.
z/os PDSs have a fixed record structure, rather than relying on line endings as record separators. However, the z/os FTP server, when transmitting in text mode, provides the record endings, which retrlines() strips off.
Closing update:
Here's my revised solution, which will be the basis for ongoing development (removing built-in passwords, for example):
import ftplib
import os
from sys import exc_info

sess = ftplib.FTP("undisclosed.server.com", "userid", "password")
sess.sendcmd("site sbd=(IBM-1047,ISO8859-1)")
for dir in ["ASM", "ASML", "ASMM", "C", "CPP", "DLLA", "DLLC", "DLMC", "GEN", "HDR", "MAC"]:
    sess.cwd("'ZLTALM.PREP.%s'" % dir)
    try:
        filelist = sess.nlst()
    except ftplib.error_perm as x:
        if (x.args[0][:3] != '550'):
            raise
    else:
        try:
            os.mkdir(dir)
        except:
            continue
        for hostfile in filelist:
            lines = []
            sess.retrlines("RETR " + hostfile, lines.append)
            pcfile = open("%s/%s" % (dir, hostfile), 'w')
            for line in lines:
                pcfile.write(line + "\n")
            pcfile.close()
    print("Done: " + dir)
sess.quit()
My thanks to both John and Vinay
Just came across this question as I was trying to figure out how to recursively download datasets from z/OS. I've been using a simple python script for years now to download ebcdic files from the mainframe. It effectively just does this:
def writeline(line):
    file.write(line + "\n")

file = open(filename, "w")
ftp.retrlines("retr " + filename, writeline)
You should be able to download the file as a binary (using retrbinary) and use the codecs module to convert from EBCDIC to whatever output encoding you want. You should know the specific EBCDIC code page being used on the z/OS system (e.g. cp500). If the files are small, you could even do something like (for a conversion to UTF-8):
file = open(ebcdic_filename, "rb")
data = file.read()
converted = data.decode("cp500").encode("utf8")
file = open(utf8_filename, "wb")
file.write(converted)
file.close()
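For files too large to read in one go, the same conversion can be streamed with codecs.iterdecode; here is a sketch (cp500 and the chunk size are assumptions - substitute your system's code page):

```python
import codecs

def convert_ebcdic_file(src_path, dst_path, codepage="cp500", chunk_size=64 * 1024):
    """Stream-convert an EBCDIC file to UTF-8 without loading it all at once."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # read fixed-size chunks until EOF; iterdecode handles multi-byte
        # sequences that straddle chunk boundaries
        chunks = iter(lambda: src.read(chunk_size), b"")
        for text in codecs.iterdecode(chunks, codepage):
            dst.write(text.encode("utf-8"))
```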
Update: If you need to use retrlines to get the lines and your lines are coming back in the correct encoding, your approach will not work, because the callback is called once for each line. So in the callback, sequence will be the line, and your for loop will write each individual character in the line to the output, each on its own line. So you probably want to do self.write(sequence + "\r\n") rather than the for loop. It still doesn't feel especially right to subclass file just to add this utility method, though - it probably belongs in a different class in your bells-and-whistles version.
Your writelineswitheol method appends '\r\n' instead of '\n' and then writes the result to a file opened in text mode. The effect, no matter what platform you are running on, will be an unwanted '\r'. Just append '\n' and you will get the appropriate line ending.
Proper error handling should not be relegated to a "bells and whistles" version. Set up your callback so that the file open() is in a try/except and retains a reference to the output file handle, the write call is in a try/except, and you have a callback_obj.close() method to call (in a try/except) when retrlines() returns, which explicitly calls file_handle.close(). That way you get explicit error handling, e.g. messages like "can't (open|write to|close) file X because Y", AND you save having to think about when your files are going to be implicitly closed and whether you risk running out of file handles.
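Such a callback object might be sketched like this (the names are illustrative, not from the original code):

```python
class LineWriter:
    """Callback object for FTP.retrlines: opens the output file up front,
    appends a newline per line, and exposes an explicit close()."""

    def __init__(self, path):
        try:
            self.handle = open(path, "w")
        except OSError as e:
            raise RuntimeError("can't open file %s because %s" % (path, e))

    def __call__(self, line):
        try:
            self.handle.write(line + "\n")
        except OSError as e:
            raise RuntimeError("can't write to file because %s" % e)

    def close(self):
        try:
            self.handle.close()
        except OSError as e:
            raise RuntimeError("can't close file because %s" % e)
```

Usage would then be along the lines of writer = LineWriter(hostfile); sess.retrlines("RETR " + hostfile, writer); writer.close(), replacing the file subclass.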
Python 3.x ftplib.FTP.retrlines() should give you str objects which are in effect Unicode strings, and you will need to encode them before you write them -- unless the default encoding is latin1 which would be rather unusual for a Windows box. You should have test files with (1) all possible 256 bytes (2) all bytes that are valid in the expected EBCDIC codepage.
[a few "sanitation" remarks]
You should consider upgrading your Python from 3.0 (a "proof of concept" release) to 3.1.
To facilitate better understanding of your code, use "i" as an identifier only as a sequence index and only if you irredeemably acquired the habit from FORTRAN 3 or more decades ago :-)
Two of the problems discovered so far (appending line terminator to each character, wrong line terminator) would have shown up the first time you tested it.
Using retrlines of ftplib to download a file from z/OS, each line comes back without a '\n'.
This is different from the Windows ftp command 'get xxx'.
We can rewrite the function 'retrlines' as 'retrlines_zos' in ftplib.py.
Just copy the whole body of retrlines, and change the 'callback' line to:
...
callback(line + "\n")
...
I tested and it worked.
You want a lambda function and a callback, like so:
def writeLineCallback(line, file):
    file.write(line + "\n")

ftpcommand = "RETR {}{}{}".format("'", zOsFile, "'")
filename = "newfilename"
with open(filename, 'w') as file:
    callback_lambda = lambda x: writeLineCallback(x, file)
    ftp.retrlines(ftpcommand, callback_lambda)
This will download file 'zOsFile' and write it to 'newfilename'