Formatting Errors in Python - python

I've never used Python and have copied a script (with permission) from someone online, so I'm not sure why the code is failing. I'm hoping someone can understand it and put it right for me!
from os import walk
from os.path import join
#First some options here.
!RootDir = "C:\\Users\\***\\Documents\\GoGames"
!OutputFile = "C:\\Users\\***\\Documents\\GoGames\\protable.csv"
Properties = !!['pb', 'pw', 'br', 'wr', 'dt', 'ev', 're']
print """
SGF Database Maker
==================
Use this program to create a CSV file with sgf info.
"""
def getInfo(filename):
"""Read out file info here and return a dictionary with all the
properties needed."""
result = !![]
file = open(filename, 'r')
data = file.read(1024) read at most 1kb since we assume all relevant info is in the beginning
file.close()
for prop in Properties:
try:
i = data.lower().index(prop)
except !ValueError:
result.append((prop, ''))
continue
try:
value = data![data.index('![', i)+1 : data.index(']', i)]
except !ValueError:
value = ''
result.append((prop, value))
return dict(result)
!ProgressCounter = 0
file = open(!OutputFile, "w")
file.write('^Filename^;^PB^;^BR^;^PW^;^WR^;^RE^;^EV^;^DT^\n')
for root, dirs, files in walk(!RootDir):
for name in files:
if name![-3:].lower() != "sgf":
continue
info = getInfo(join(root, name))
file.write('^'+join(root, name)+'^;^'+info!['pb']+'^;^'+info!['br']+'^;^'+info!['pw']+'^;^'+info!['wr']+'^;^'+info!['re']+'^;^'+info!['ev']+'^;^'+info!['dt']+'^\n')
!ProgressCounter += 1
if (!ProgressCounter) % 100 == 0:
print str(!ProgressCounter) + " games processed."
file.close()
print "A total of " + str(!ProgressCounter) + " have been processed."
Using Netbeans IDE I get the following error:
!RootDir = "C:\\Users\\***\\Documents\\GoGames"
^
SyntaxError: mismatched input '' expecting EOF
I have previously been able to step through the code as far as file.close(), where I got the error "does not match outer indentation level".
Anyone able to put the syntax of this code right for me?

Remove the exclamation marks in front of variable names, in list declarations (!![]), in index expressions (data![...]) and in except clauses (except !ValueError). They are not valid Python syntax.
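With the stray ! markers removed and the indentation restored, the script might look like the sketch below. It is written for Python 3 (so print is a function), whereas the original uses Python 2 print statements. The function names get_info and build_csv and their parameters are my own choices, not part of the original script, and the directory and output paths are left for the caller to supply.

```python
from os import walk
from os.path import join

PROPERTIES = ['pb', 'pw', 'br', 'wr', 'dt', 'ev', 're']


def get_info(filename):
    """Return a dict with the value of each property ('' when missing)."""
    with open(filename, 'r') as f:
        data = f.read(1024)  # read at most 1 KB; the header comes first
    lowered = data.lower()
    result = {}
    for prop in PROPERTIES:
        try:
            i = lowered.index(prop)
            result[prop] = data[data.index('[', i) + 1:data.index(']', i)]
        except ValueError:
            result[prop] = ''
    return result


def build_csv(root_dir, output_file):
    """Walk root_dir, write one line per .sgf file, return the file count."""
    columns = ['pb', 'br', 'pw', 'wr', 're', 'ev', 'dt']
    count = 0
    with open(output_file, 'w') as out:
        out.write('^Filename^;^PB^;^BR^;^PW^;^WR^;^RE^;^EV^;^DT^\n')
        for root, dirs, files in walk(root_dir):
            for name in files:
                if name[-3:].lower() != 'sgf':
                    continue
                path = join(root, name)
                info = get_info(path)
                fields = [path] + [info[c] for c in columns]
                out.write('^' + '^;^'.join(fields) + '^\n')
                count += 1
                if count % 100 == 0:
                    print(str(count) + " games processed.")
    return count
```

Calling build_csv with the GoGames directory and the CSV path from the question would then replace the module-level loop of the original.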

Related

How do I perform error handling with two files?

I have two files, and to check that each one can be read I wrap each open in its own try/except. I don't think this is a good method; can you suggest a better way?
Here is my code:
def form_density_dictionary(self,word_file,fp_exclude):
    self.freq_dictionary={}
    try:
        with open(fp_exclude,'r')as fp2:
            words_excluded=fp2.read().split() #words to be excluded stored in a list
        print("**Read file successfully :" + fp_exclude + "**")
        words_excluded=[words.lower() for words in words_excluded] # converted to lowercase
    except IOError:
        print("**Could not read file:", fp_exclude, " :Please check file name**")
        sys.exit()
    try:
        with open(word_file,'r') as file:
            print("**Read file successfully :" + word_file + "**")
            words_list=file.read()
            if not words_list:
                print("**No data in file:",word_file +":**")
                sys.exit()
            words_list=words_list.split()
        words_list=[words.lower() for words in words_list] # lowercasing entire list
        unique_words=list((set(words_list)-set(words_excluded)))
        self.freq_dictionary= {word:("%6.2f"%(float((words_list.count(word))/len(words_list))*100)) for word in unique_words}
        #print((len(self.freq_dictionary)))
    except IOError:
        print("**Could not read file:", word_file, " :Please check file name**")
        sys.exit()
Any other suggestion is also welcomed to make it more pythonic.
The first thing that jumps out is the lack of consistency and readability: in some lines you indent with 4 spaces, on others you only use two; in some places you put a space after a comma, in others you don't, in most places you don't have spaces around the assignment operator (=)...
Be consistent and make your code readable. The most commonly used formatting is to use four spaces for indenting and to always have a space after a comma but even more important than that is to be consistent, meaning that whatever you choose, stick with it throughout your code. It makes it much easier to read for everyone, including yourself.
Here are a few other things I think you could improve:
Have a single exception handling block instead of two.
You can also open both files in a single line.
Even better, combine both previous suggestions and have a separate method to read data from the files, thus eliminating code repetition and making the main method easier to read.
For string formatting it's preferred to use .format() instead of %. Check this out: https://pyformat.info/
Overall try to avoid repetition in your code. If there's something you're doing more than once, extract it to a separate function or method and use that instead.
Here's your code quickly modified to how I'd probably write it, and taking these things into account:
import sys


class AtifImam:
    def __init__(self):
        self.freq_dictionary = {}

    def form_density_dictionary(self, word_file, exclude_file):
        words_excluded = self.read_words_list(exclude_file)
        words_excluded = self.lowercase(words_excluded)
        words_list = self.read_words_list(word_file)
        if len(words_list) == 0:
            print("** No data in file: {} **".format(word_file))
            sys.exit()
        words_list = self.lowercase(words_list)
        unique_words = list((set(words_list) - set(words_excluded)))
        self.freq_dictionary = {
            word: ("{:6.2f}".format(
                float((words_list.count(word)) / len(words_list)) * 100))
            for word in unique_words
        }

    @staticmethod
    def read_words_list(file_name):
        try:
            with open(file_name, 'r') as file:
                data = file.read()
                print("** Read file successfully: {} **".format(file_name))
                return data.split()
        except IOError as e:
            print("** Could not read file: {0.filename} **".format(e))
            sys.exit()

    @staticmethod
    def lowercase(word_list):
        return [word.lower() for word in word_list]
Exceptions thrown that involve a file system path have a filename attribute that can be used instead of explicit attributes word_file and fp_exclude as you do.
This means you can wrap these IO operations in the same try-except and use the exception_instance.filename which will indicate in which file the operation couldn't be performed.
For example:
try:
    with open('unknown_file1.py') as f1, open('known_file.py') as f2:
        f1.read()
        f2.read()
except IOError as e:
    print("No such file: {0.filename}".format(e))
Eventually prints out:
No such file: unknown_file1.py
While the opposite:
try:
    with open('known_file.py') as f1, open('unknown_file2.py') as f2:
        f1.read()
        f2.read()
except IOError as e:
    print("No such file: {0.filename}".format(e))
Prints out:
No such file: unknown_file2.py
To be more 'pythonic' you could use Counter, from the collections library.
from collections import Counter

def form_density_dictionary(self, word_file, fp_exclude):
    success_msg = '*Read file successfully : {filename}'
    fail_msg = '**Could not read file: {filename}: Please check filename'
    empty_file_msg = '*No data in file :{filename}:**'
    exclude_read = self._open_file(fp_exclude, success_msg, fail_msg, '')
    exclude = Counter([word.lower() for word in exclude_read.split()])
    word_file_read = self._open_file(word_file, success_msg, fail_msg, empty_file_msg)
    words = Counter([word.lower() for word in word_file_read.split()])
    # Counter subtraction lowers counts; a word disappears only when its count reaches zero.
    unique_words = words - exclude
    self.freq_dictionary = {word: '{:.2f}'.format(count / len(unique_words))
                            for word, count in unique_words.items()}
Also, it would be better to move the file handling into its own _open_file method, like:
def _open_file(self, filename, success_msg, fail_msg, empty_file_msg):
    try:
        with open(filename, 'r') as file:
            if success_msg:
                print(success_msg.format(filename=filename))
            data = file.read()
            # Report an empty file only when nothing was read.
            if not data and empty_file_msg:
                print(empty_file_msg.format(filename=filename))
            return data
    except IOError:
        if fail_msg:
            print(fail_msg.format(filename=filename))
        sys.exit()
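One thing to watch in a Counter-based version: subtracting one Counter from another lowers counts rather than removing the excluded words outright, so a word that occurs more often in the text than in the exclusion list survives. The sketch below filters by key instead. The function name word_density and its arguments are hypothetical, and the percentage is taken over all words, as in the question's original code.

```python
from collections import Counter


def word_density(words, excluded):
    """Map each non-excluded word to its share of all words, in percent."""
    words = [w.lower() for w in words]
    excluded = {w.lower() for w in excluded}
    counts = Counter(words)
    total = len(words)
    return {
        word: "{:6.2f}".format(count / total * 100)
        for word, count in counts.items()
        if word not in excluded
    }


# Counter subtraction only lowers counts: 'a' is still present here.
print(Counter(['a', 'a', 'b']) - Counter(['a']))
# Filtering by key drops 'the' completely.
print(word_density(['The', 'cat', 'the', 'mat'], ['the']))
```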

Using Subprocess module to capture file names?

I'm trying to read in a list of account numbers, then have my program do a search in the appropriate directory for each account number. I want to capture the information from this search, to then split out the file name, date, and time as the output from my program. Currently I'm receiving this error: TypeError: bufsize must be an integer
Here is my code:
def app_files(level):
    piv_list_file = raw_input(r"Please enter the full path of the file containing the Pivot ID's you would like to check: ")
    piv_id_list = []
    proc_out_list = []
    input_dir = ''
    try:
        with open(piv_list_file, 'rbU') as infile:
            for line in infile:
                line = line.rstrip('\r\n')
                piv_id_list.append(line)
    except IOError as e:
        print 'Unable to open the account number file: %s' % e.strerror
    if level == 'p':
        input_dir = '[redacted]'
    else:
        input_dir = '[redacted]'
    subprocess.call('cd', input_dir)
    for i, e in enumerate(piv_id_list):
        proc_out = subprocess.check_output('ls', '-lh', '*CSV*APP*{0}.zip'.format(e))
        proc_out_list.append(proc_out)
        print(proc_out)
Your subprocess.check_output() call is wrong: the complete command must be passed as a list in the first argument. As written, the second positional argument ('-lh') is taken as bufsize, hence the TypeError. Example:
subprocess.check_output(['ls', '-lh', '*CSV*APP*{0}.zip'.format(e)])
The subprocess.call line in your code has the same problem.
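Two further points are worth keeping in mind even with the list form: without a shell the * wildcards are passed to ls literally rather than expanded, and a cd run through subprocess only changes the directory of that child process, not of your script. A sketch that avoids both by staying in Python is shown here. It uses the standard glob, os and time modules; the function name find_app_files and the date and time formats are my own assumptions about the output you want.

```python
import glob
import os
import time


def find_app_files(input_dir, piv_ids):
    """Return (file name, date, time) tuples for archives matching each ID."""
    found = []
    for piv_id in piv_ids:
        # glob expands the wildcards that ls would never see without a shell.
        pattern = os.path.join(input_dir, '*CSV*APP*{0}.zip'.format(piv_id))
        for path in sorted(glob.glob(pattern)):
            modified = time.localtime(os.path.getmtime(path))
            found.append((os.path.basename(path),
                          time.strftime('%Y-%m-%d', modified),
                          time.strftime('%H:%M:%S', modified)))
    return found
```

Passing the directory in as an argument also removes the need for the cd step.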

Use of 1 and 0 because python3 doesn't accept true and false? [duplicate]

I wrote some code for a python file scanner, but the only problem is my custom error codes won't run! I have to use 1 and 0 for the true and false because python3 doesn't accept true and false! Here's the code.
#File Scanner 1.2
scanTXTlog = input("Please type what you are looking for")
file = input("Please enter the .txt name")
#Gives new error messages
if UnicodeDecodeError == 1:
    print("Unicode error.....error resolved")
if FileNotFoundError == 1:
    print(file, 'could not be found. Please make sure you spelled it correctly and please do not add the .txt extension.')
#Begins searching
txtFile = open(file + ".txt", "r")
lineList = []
i = 0
for line in txtFile:
    lineList.append(line.rstrip("\n"))
    if scanTXTlog in lineList[i]:
        print(lineList[i -2 ])
    i += 1
****UPDATE BECAUSE I'M NEW TO STACK AND HAVE NO IDEA HOW TO REPLY****
I tried some new stuff and it works...but only on certain files. It can scan lots of them, but sometimes it just hits one and gives the IO error after the program is done. Here's the new code
def fileRead(encoding= 'utf-8'):
    scanTXTlog = input("Please type what you are looking for")
    file = input("Please enter the file name. Capitalization does not matter. Please add extension such as .py or.txt etc, etc")
    #Begins searching
    txtFile = open(file, "r", encoding = 'utf-8')
    lineList = []
    i = 0
    for line in txtFile:
        lineList.append(line.rstrip("\n"))
        if scanTXTlog in lineList[i]:
            print(lineList[i])
        i += 1
class UnicodeDecodeErrorHandler(encoding = 'utf-8'):
    def UnicodeDecodeErrorDefault(issubclass):
        print('Done')
..... What?
try:
    PossiblyRaiseAnException()
except SomeException:
    HandleException()
except SomeOtherException as e:
    HandleOtherException(e)
First of all, Python 3 does support True and False!
To deal with errors, wrap everything that could raise one in a try/except clause:
try:
    txtFile = open(file + ".txt", "r")
except OSError:
    print(file, 'could not be found. Please make sure you spelled it correctly and please do not add the .txt extension.')
Visit the docs for more information about Errors and Exceptions.
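Applied to the file scanner from the question, that could look roughly like the sketch below. The function name scan_file and the wording of the messages are my own; FileNotFoundError and UnicodeDecodeError are the built-in exception classes the question tried to compare against 1. The reading happens inside the try block because a decoding error is raised while the lines are being read, not when the file is opened.

```python
def scan_file(path, needle, encoding='utf-8'):
    """Return the lines of path that contain needle, or None on error."""
    try:
        with open(path, 'r', encoding=encoding) as txt_file:
            return [line.rstrip('\n') for line in txt_file if needle in line]
    except FileNotFoundError:
        print(path, 'could not be found.')
    except UnicodeDecodeError:
        print(path, 'is not valid', encoding, 'text.')
    return None
```

A caller can then test the result with an ordinary Boolean expression, for example found = scan_file(name, text) is not None.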

Reading a file and adding a specific match to a dictionary in python

I'd like to read a file for a specific match in the following style "word = word"; specifically, I'm looking to find files with usernames and passwords in them. These files would be scripts created by admins using bad practices, with clear-text credentials being used in logon scripts etc.
The code I have created so far does the job, but it's very messy and prints the entire line if a match is found (I can't help but think there is a more elegant way to do this). This creates ugly output; I'd like to print only the match in the line, but I can't seem to find a way to do that. If I can create the correct regex for something like the match below, is it possible to print only the match found in the line rather than the entire line?
(I am going to try to describe the type of match I'm looking for)
Key
* = wildcard
- = space
^ = anycharacter until a space
Match
*(U|u)ser^-=-^
dirt = "/dir/path/"
def get_files():
    for root, dirs, files in os.walk(dirt):
        for filename in files:
            if filename.endswith(('.bat', '.vbs', '.ps', '.txt')):
                readfile = open(os.path.join(root, filename), "r")
                for line in readfile:
                    if re.match("(.*)(U|u)ser(.*)", line) and re.match("(.*)(=)(.*)", line) or re.match("(.*)(P|p)ass(.*)", line) and re.match("(.*)(=)(.*)", line):
                        print line
TEST SCRIPT
strComputer = "atl-ws-01"
strNamespace = “root\cimv2”
strUser = "Administrator"
strPassword = "4rTGh2#1"
user = AnotherUser #Test
pass = AnotherPass #test
Set objWbemLocator = CreateObject("WbemScripting.SWbemLocator")
Set objWMIService = objwbemLocator.ConnectServer _
(strComputer, strNamespace, strUser, strPassword)
objWMIService.Security_.authenticationLevel = WbemAuthenticationLevelPktPrivacy
Set colItems = objWMIService.ExecQuery _
("Select * From Win32_OperatingSystem")
For Each objItem in ColItems
Wscript.Echo strComputer & ": " & objItem.Caption
Next
Latest code after taking on board the responses
This is the latest code I am using. It seems to be doing the job as expected, except that the output isn't managed as well as I'd like. I'd like to add the items into a dictionary, the key being the file name, with two values: the username and the password. That will be asked as a separate question, though.
Thanks all for the help
dirt = "~/Desktop/tmp"
def get_files():
    regs = ["(.*)((U|u)ser(.*))(\s=\s\W\w+\W)", "(.*)((U|u)ser(.*))(\s=\s\w+)", "(.*)((P|p)ass(.*))\s=\s(\W(.*)\W)", "(.*)((P|p)ass(.*))(\s=\s\W\w+\W)"]
    combined = "(" + ")|(".join(regs) + ")"
    results = dict()
    for root, dirs, files in os.walk(dirt):
        for filename in files:
            if filename.endswith(('.bat', '.vbs', '.ps', '.txt')):
                readfile = open(os.path.join(root, filename), "r")
                for line in readfile:
                    m = re.match(combined, line)
                    if m:
                        print os.path.join(root, filename)
                        print m.group(0)
Latest Code output
~/Desktop/tmp/Domain.local/Policies/{31B2F340-016D-11D2-945F-00C04FB984F9}/USER/Scripts/Logon/logonscript1.vbs
strUser = "guytom"
~/Desktop/tmp/DLsec.local/Policies/{31B2F340-016D-11D2-945F-00C04FB984F9}/USER/Scripts/Logon/logonscript1.vbs
strPassword = "P#ssw0rd1"
~/Desktop/tmp/DLsec.local/Policies/{31B2F340-016D-11D2-945F-00C04FB984F9}/USER/Scripts/Logon/logonscript2.bat
strUsername = "guytom2"
~/Desktop/tmp/DLsec.local/Policies/{31B2F340-016D-11D2-945F-00C04FB984F9}/USER/Scripts/Logon/logonscript2.bat
strPass = "SECRETPASSWORD"
https://docs.python.org/2/library/re.html
group([group1, ...])
Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned). If a groupN argument is zero, the corresponding return value is the entire matching string;
match.group(0)
Since you can have many object=value you need to use regular expressions. Here is some sample code for you.
line1 = " someuser = bob "
line2 = " bob'spasswd= secretpassword"
#re.I will do case insensitive search
userMatchObj=re.search('.*user.*=\\s*([\\S]*).*', line1, re.I)
pwdMatchObj=re.search(r'.*pass.*=\s*(.*)', line2, re.I)
if userMatchObj: print "user="+userMatchObj.group(1)
if pwdMatchObj: print "password="+pwdMatchObj.group(1)
output:
user=bob
password=secretpassword
References: https://docs.python.org/2/library/re.html , http://www.tutorialspoint.com/python/python_reg_expressions.htm
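To return only the matched name and value instead of the whole line, capture groups can be combined into a single pattern. The sketch below is one possible pattern, not the only one: the regular expression, the name CRED_RE and the function find_credentials are my own, and the pattern assumes the value is either double-quoted or a run of non-space characters.

```python
import re

# Hypothetical pattern: an identifier containing "user" or "pass",
# then "=", then either a double-quoted value or an unquoted one.
CRED_RE = re.compile(
    r'(\w*(?:user|pass)\w*)\s*=\s*(?:"([^"]*)"|(\S+))', re.IGNORECASE)


def find_credentials(lines):
    """Return (name, value) pairs for every matching assignment in lines."""
    pairs = []
    for line in lines:
        m = CRED_RE.search(line)
        if m:
            # Group 2 holds a quoted value, group 3 an unquoted one.
            value = m.group(2) if m.group(2) is not None else m.group(3)
            pairs.append((m.group(1), value))
    return pairs


sample = ['strUser = "Administrator"', 'user = AnotherUser #Test',
          '(strComputer, strNamespace, strUser, strPassword)']
print(find_credentials(sample))
```

Storing the returned list under the file's path, for example results[path] = find_credentials(readfile), would give the dictionary keyed by file name that the question asks for.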

MD5 Hash returning different results in Python

For a class assignment, I'm supposed to grab the contents of a file, compute the MD5 hash and store it in a separate file. Then I'm supposed to be able to check the integrity by comparing the MD5 hash. I'm relatively new to Python and JSON, so I thought I'd try to tackle those things with this assignment as opposed to going with something I already know.
Anyway, my program reads from a file, creates a hash, and stores that hash into a JSON file just fine. The problem comes in with my integrity checking. When I return the results of the computed hash of the file, it's different from what is recorded in the JSON file even though no changes have been made to the file. Below is an example of what is happening and I pasted my code as well. Thanks in advance for the help.
For example: These are the contents of my JSON file
Content: b'I made a file to test the md5\n'
digest: 1e8f4e6598be2ea2516102de54e7e48e
This is what is returned when I try to check the integrity of the exact same file (no changes made to it):
Content: b'I made a file to test the md5\n'
digest: ef8b7bf2986f59f8a51aae6b496e8954
import hashlib
import json
import os
import fnmatch
from codecs import open

#opens the file, reads/encodes it, and returns the contents (c)
def read_the_file(f_location):
    with open(f_location, 'r', encoding="utf-8") as f:
        c = f.read()
        f.close()
    return c

def scan_hash_json(directory_content):
    for f in directory_content:
        location = argument + "/" + f
        content = read_the_file(location)
        comp_hash = create_hash(content)
        json_obj = {"Directory": argument, "Contents": {"filename": str(f),
                    "original string": str(content), "md5": str(comp_hash)}}
        location = location.replace(argument, "")
        location = location.replace(".txt", "")
        write_to_json(location, json_obj)

#scans the file, creates the hash, and writes it to a json file
def read_the_json(f):
    f_location = "recorded" + "/" + f
    read_json = open(f_location, "r")
    json_obj = json.load(read_json)
    read_json.close()
    return json_obj

#check integrity of the file
def check_integrity(d_content):
    #d_content = directory content
    for f in d_content:
        json_obj = read_the_json(f)
        text = f.replace(".json", ".txt")
        result = find(text, os.getcwd())
        content = read_the_file(result)
        comp_hash = create_hash(content)
        print("content: " + str(content))
        print(result)
        print(json_obj)
        print()
        print("Json Obj: " + json_obj['Contents']['md5'])
        print("Hash: " + comp_hash)

#find the file being searched for
def find(pattern, path):
    result = ""
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result = os.path.join(root, name)
    return result

#create a hash for the file contents being passed in
def create_hash(content):
    h = hashlib.md5()
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    content = content.encode('utf-8')
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    return h.hexdigest()

#write the MD5 hash to the json file
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
    storage_location = "recorded/" + str(arg)
    write_file = open(storage_location, "w")
    json.dump(json_obj, write_file, indent=4, sort_keys=True)
    write_file.close()
#variable to hold status of user (whether they are done or not)
working = 1
#while the user is not done, continue running the program
while working == 1:
    print("Please input a command. For help type 'help'. To exit type 'exit'")
    #grab input from user, divide it into words, and grab the command/option/argument
    request = input()
    request = request.split()
    if len(request) == 1:
        command = request[0]
    elif len(request) == 2:
        command = request[0]
        option = request[1]
    elif len(request) == 3:
        command = request[0]
        option = request[1]
        argument = request[2]
    else:
        print("I'm sorry that is not a valid request.\n")
        continue
    #if user inputs command 'icheck'...
    if command == 'icheck':
        if option == '-l':
            if argument == "":
                print("For option -l, please input a directory name.")
                continue
            try:
                dirContents = os.listdir(argument)
                scan_hash_json(dirContents)
            except OSError:
                print("Directory not found. Make sure the directory name is correct or try a different directory.")
        elif option == '-f':
            if argument == "":
                print("For option -f, please input a file name.")
                continue
            try:
                contents = read_the_file(argument)
                computedHash = create_hash(contents)
                jsonObj = {"Directory": "Default", "Contents": {
                    "filename": str(argument), "original string": str(contents), "md5": str(computedHash)}}
                write_to_json(argument, jsonObj)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")
        elif option == '-t':
            try:
                dirContents = os.listdir("recorded")
                check_integrity(dirContents)
            except OSError:
                print("File not found. Make sure the file name is correct or try a different file.")
        elif option == '-u':
            print("gonna update stuff")
        elif option == '-r':
            print("gonna remove stuff")
    #if user inputs command 'help'...
    elif command == 'help':
        #display help screen
        print("Integrity Checker has a few options you can use. Each option "
              "must begin with the command 'icheck'. The options are as follows:")
        print("\t-l <directory>: Reads the list of files in the directory and computes the md5 for each one")
        print("\t-f <file>: Reads a specific file and computes its md5")
        print("\t-t: Tests integrity of the files with recorded md5s")
        print("\t-u <file>: Update a file that you have modified after its integrity has been checked")
        print("\t-r <file>: Removes a file from the recorded md5s\n")
    #if user inputs command 'exit'
    elif command == 'exit':
        #set working to zero and exit program loop
        working = 0
    #if anything other than 'icheck', 'help', and 'exit' are input...
    else:
        #display error message and start over
        print("I'm sorry that is not a valid command.\n")
Where are you defining h, the md5 object being used in this method?
#create a hash for the file contents being passed in
def create_hash(content):
    key_before = "reallyBad".encode('utf-8')
    key_after = "hashKeyAlgorithm".encode('utf-8')
    print("Content: " + str(content))
    h.update(key_before)
    h.update(content)
    h.update(key_after)
    print("digest: " + str(h.hexdigest()))
    return h.hexdigest()
My suspicion is that you're calling create_hash twice, but using the same md5 object in both calls. That means the second time you call it, you're really hashing "reallyBad*file contents*hashkeyAlgorithmreallyBad*file contents*hashKeyAlgorithm". You should create a new md5 object inside of create_hash to avoid this.
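The effect of reusing one md5 object can be seen in isolation in the short sketch below. The create_hash here follows the question's version with the hash object created inside the function; the variable names shared, first and second are only for illustration.

```python
import hashlib


def create_hash(content):
    """Hash content between two fixed keys, using a fresh md5 object."""
    h = hashlib.md5()  # created per call, so no state carries over
    h.update("reallyBad".encode('utf-8'))
    h.update(content.encode('utf-8'))
    h.update("hashKeyAlgorithm".encode('utf-8'))
    return h.hexdigest()


# A shared object keeps accumulating input, so its digest changes.
shared = hashlib.md5()
shared.update(b"same data")
first = shared.hexdigest()
shared.update(b"same data")
second = shared.hexdigest()
print(first == second)  # False
print(create_hash("same data") == create_hash("same data"))  # True
```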
Edit: Here is how your program runs for me after making this change:
Please input a command. For help type 'help'. To exit type 'exit'
icheck -f ok.txt
Content: this is a test
digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'
icheck -t
Content: this is a test
digest: 1f0d0fd698dfce7ce140df0b41ec3729
Please input a command. For help type 'help'. To exit type 'exit'
Edit #2:
Your scan_hash_json function also has a bug at the end of it. You're removing the .txt suffix from the file, and calling write_to_json:
def scan_hash_json(directory_content):
    ...
    location = location.replace(".txt", "")
    write_to_json(location, json_obj)
However, write_to_json is expecting the file to end in .txt:
def write_to_json(arg, json_obj):
    arg = arg.replace(".txt", ".json")
If you fix that, I think it should do everything as expected...
I see two possible problems you are facing:
The hash is computed from the binary representation of a string.
Unless you work only with ASCII, the same international character, e.g. č, can have different representations in UTF-8 or another Unicode encoding.
To consider:
If you need UTF-8 or Unicode, normalize your content first, before you save it or calculate a hash.
For testing purposes, compare the binary representation of the content.
Use UTF-8 only for IO operations; codecs.open does all the conversion for you.
from codecs import open
with open('yourfile', 'r', encoding="utf-8") as f:
    decoded_content = f.read()
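A small sketch of what normalizing before hashing could look like, using the standard unicodedata module. The choice of the NFC form and the function name normalized_md5 are assumptions on my part; any normalization form works as long as the same one is used when the hash is recorded and when it is checked.

```python
import hashlib
import unicodedata


def normalized_md5(text):
    """Hash the NFC-normalized, UTF-8 encoded form of text."""
    normalized = unicodedata.normalize('NFC', text)
    return hashlib.md5(normalized.encode('utf-8')).hexdigest()


composed = '\u010d'      # "č" as a single code point
decomposed = 'c\u030c'   # "c" followed by a combining caron
print(composed == decomposed)
print(normalized_md5(composed) == normalized_md5(decomposed))
```

The two strings render identically but compare unequal, so hashing them without normalization gives two different digests.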
