Bug in python code preventing successful recursion? - python

I have been working on a script that reads a file (accounts.txt) containing email addresses and checks each one against an API to see whether it appears in a data dump. The script appears to work, but there is a bug: once it finds a positive hit, it disregards any other match.
For example:
If my "accounts.txt" file contains the following entries:
a#a.com
b#b.com
Even though both of those should return results, when the script is run the match on a#a.com is found but b#b.com returns nothing.
I cannot figure out why this is happening. Ideally, I want all of the hits to be output.
FYI, the script is querying 'haveibeenpwned' which is a site that locates email addresses found in credential dumps.
Any help finding my bug would be greatly appreciated. Below is my current script.
#!/usr/bin/env python
import argparse
import json
import requests
import time
breaches_by_date = {}
breaches_by_account = {}
breaches_by_name = {}
class Breach(object):
    def __init__(self, e, n, d):
        self.email = e
        self.name = n
        self.date = d
    def __repr__(self):
        return "%s: %s breached on %s" % (self.email, self.name, self.date)
def accCheck(acc):
    global breaches_by_date, breaches_by_account, breaches_by_name
    r = requests.get('https://haveibeenpwned.com/api/v2/breachedaccount/%s?truncateResponse=false' % acc)
    try:
        data = json.loads(r.text)
    except ValueError:
        print("No breach information for %s" % acc)
        return
    for i in data:
        name, date = (i['Name'], i['BreachDate'])
        breach = Breach(acc, name, date)
        try: breaches_by_account[acc].append(breach)
        except: breaches_by_account[acc] = [breach]
        try: breaches_by_name[name].append(breach)
        except: breaches_by_name[name] = [breach]
        try: breaches_by_date[date].append(breach)
        except: breaches_by_date[date] = [breach]
def readFromFile(fname="accounts.txt"):
    accounts=[]
    with open(fname, "r+") as f:
        accounts = [l.strip() for l in f.readlines()]
    return accounts
if __name__ == '__main__':
    accounts = readFromFile()
    for email_addr in accounts:
        accCheck(email_addr)
    print
    print("Breaches by date")
    for date, breaches in breaches_by_date.items():
        for breach in breaches:
            print(breach)
    print
    print("Breaches by account")
    for acc, breaches in breaches_by_account.items():
        print(acc)
        for breach in breaches:
            print("%s breached on %s" % (breach.name, breach.date))
    print
    print("Breaches by name")
    for name, breaches in breaches_by_name.items():
        print("%s breached for the following accounts:" % name)
        for breach in breaches:
            print("%s on %s" % (breach.email, breach.date))
    print

I am not 100% sure where your problem comes from, but I would opt for code like this:
emails_to_check = open("/path/to/yourfile").read().split("\n")
for email in emails_to_check:
    if is_email_blacklisted(email):
        do_something()
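To make the "check every address and keep every hit" idea concrete, here is a minimal sketch. It assumes nothing about the real API: `check_all` and `fake_lookup` are hypothetical names, and `fake_lookup` is a stand-in with made-up breach names that you would replace with the actual request. The point is only that every account is looked up and every non-empty result is collected before anything is printed.

```python
def check_all(accounts, lookup):
    """Call lookup(account) for every account and collect all non-empty results."""
    hits = {}
    for account in accounts:
        account = account.strip()
        if not account:
            continue  # skip blank lines, e.g. a trailing newline in the file
        breaches = lookup(account)
        if breaches:
            hits[account] = breaches
    return hits


def fake_lookup(account):
    # Hypothetical stand-in for the real API request; the data is made up.
    known = {"a#a.com": ["BreachA"], "b#b.com": ["BreachB", "BreachC"]}
    return known.get(account, [])


results = check_all(["a#a.com", "b#b.com", "", "c#c.com"], fake_lookup)
for account, breaches in results.items():
    print("%s: %s" % (account, ", ".join(breaches)))
```

Because the lookup is passed in as a function, the loop can be tested without touching the network, which helps separate a looping bug from an API problem.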

Related

Call a range of dates from an API using Python

Currently writing a program using an API from MarketStack.com. This is for school, so I am still learning.
I am writing a stock prediction program using Python on PyCharm and I have written the connection between the program and the API without issues. So, I can certainly get the High, Name, Symbols, etc. What I am trying to do now is call a range of dates. The API says I can call up to 30 years of historical data, so I want to call all 30 years for a date that is entered by the user. Then the program will average the high on that date in order to give a trend prediction.
So, the problem I am having is calling more than one date. As I said, I want to call all 30 dates, and then I will do the math, etc.
Can someone help me call a range of dates? I tried installing Pandas, but PyCharm would not accept it for some reason. Any help is greatly appreciated.
import tkinter as tk
import requests
# callouts for window size
HEIGHT = 650
WIDTH = 600
# function for response
def format_response(selected_stock):
    try:
        name1 = selected_stock['data']['name']
        symbol1 = selected_stock['data']['symbol']
        high1 = selected_stock['data']['eod'][1]['high']
        final_str = 'Name: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name1, symbol1, high1)
    except:
        final_str = 'There was a problem retrieving that information'
    return final_str
# function linking to API
def stock_data(entry):
    params = {'access_key': 'xxx'}
    response = requests.get('http://api.marketstack.com/v1/tickers/' + entry + '/' + 'eod', params=params)
    selected_stock = response.json()
    label2['text'] = format_response(selected_stock)
# function for response
def format_response2(stock_hist_data):
    try:
        name = stock_hist_data['data']['name']
        symbol = stock_hist_data['data']['symbol']
        high = stock_hist_data['data']['eod'][1]['high']
        name3 = stock_hist_data['data']['name']
        symbol3 = stock_hist_data['data']['symbol']
        high3 = stock_hist_data['data']['eod'][1]['high']
        final_str2 = 'Name: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name, symbol, high)
        final_str3 = '\nName: %s \nSymbol: %s \nEnd of Day ($ USD): %s' % (name3, symbol3, high3)
    except:
        final_str2 = 'There was a problem retrieving that information'
        final_str3 = 'There was a problem retrieving that information'
    return final_str2 + final_str3
# function for response in lower window
def stock_hist_data(entry2):
    params2 = {'access_key': 'xxx'}
    response2 = requests.get('http://api.marketstack.com/v1/tickers/' + entry2 + '/' + 'eod', params=params2)
    hist_data = response2.json()
    label4['text'] = format_response2(hist_data)
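One part of this can be done without Pandas: working out which dates to ask for and averaging the values that come back. The sketch below uses only the standard library. `same_day_in_past_years` and `average_high` are hypothetical helper names, and how each date is sent to the API is deliberately left out, since that depends on the request parameters the service accepts.

```python
from datetime import date


def same_day_in_past_years(day, years=30):
    """Return `day` shifted back by 1..years years, skipping years where it does not exist."""
    dates = []
    for offset in range(1, years + 1):
        try:
            dates.append(day.replace(year=day.year - offset))
        except ValueError:
            # 29 February has no counterpart in non-leap years
            continue
    return dates


def average_high(highs):
    """Average the available highs, ignoring missing (None) values."""
    values = [h for h in highs if h is not None]
    return sum(values) / len(values) if values else None


past_dates = same_day_in_past_years(date(2021, 3, 15), years=30)
print(past_dates[0], past_dates[-1], len(past_dates))
```

Each date in `past_dates` can then be formatted with `strftime('%Y-%m-%d')` and used in one request per date, with the returned highs passed to `average_high`. Markets are closed on weekends and holidays, so some dates may return no data, which is why missing values are skipped.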

Not sure how to enter the output from one function as the input for another

I have a script here with some basic functions:
Function 1 - wget, opens a webpage and saves it to a local variable then closes.
Function 2 - scrapes this webpage for md5 hash values.
Function 3 - takes the hash values and cracks them using a dictionary of commonly used passwords.
My problem is getting my output from Function 2 and inserting it into Function 3. This is partly due to the output from Function 2 being a list and Function 3 is looking for just hash values.
You guys will most likely be able to understand more from reading my code, below is my code so far.
import sys, hashlib, re, urllib
def wget(url): # could import webpage_get and use wget() from there instead
    '''Read the contents of a webpage from a specified URL'''
    print '[+]---------------------------------------------------------------------------- ' #CHANGE THIS
    # open URL
    webpage = urllib.urlopen(url) # opens url like a file
    # get page contents
    page_contents = webpage.read() # reads content of webpage
    return page_contents
    page_contents = webpage.close() # close webpage
def findmd5(text):
    '''Find all md5 hash values'''
    md5value = re.findall(r'([a-fA-F\d]{32})', text)
    count = len(md5value)
    print "[+] Total number of md5 hash values found: %s" % count
    for x in md5value:
        print x
def dict_attack(passwd_hash):
    dic = ['123','1234','12345','123456','1234567','12345678','password','qwerty','abc','abcd','abc123','111111','monkey','arsenal','letmein','trustno1','dragon','baseball','superman','iloveyou','starwars','montypython','cheese','123123','football','password','batman']
    passwd_found = False
    for value in dic:
        hashvalue = hashlib.md5(value).hexdigest()
        if hashvalue == passwd_hash:
            passwd_found = True
            recovered_password = value
    if passwd_found == True:
        print '[+] Password recovered: %s'% (recovered_password)
    else:
        print '[-] Password not recovered'
def main():
    # temp testing url argument
    sys.argv.append('URL HERE!')
    # Check args
    if len(sys.argv) != 2:
        print '[-] Usage: email_analysis URL/filename'
        return
    #call functions
    try:
        print '[+] md5 values found: '
        print findmd5(wget(sys.argv[1]))
        print '[+] Cracking hash values: '
    except IOError:
        print 'Error'
if __name__ == '__main__':
    main()
Any help is greatly appreciated!
wget: Moved the return statement so that it is the last statement.
findmd5: Changed from printing its results to returning them, so that main can store them in a variable.
main: Added a for loop to iterate over the found hashes and apply dict_attack to each value.
I did not, however, build in any break or stop condition, so the loop over the dictionary keeps running even after a match is found. It will still print the recovered password.
import sys, hashlib, re, urllib
def wget(url): # could import webpage_get and use wget() from there instead
    '''Read the contents of a webpage from a specified URL'''
    print ('[+]---------------------------------------------------------------------------- ') #CHANGE THIS
    # open URL
    webpage = urllib.urlopen(url) # opens url like a file
    # get page contents
    page_contents = webpage.read() # reads content of webpage
    webpage.close() # close webpage (do not overwrite page_contents with the result)
    return page_contents
def findmd5(text):
    '''Find all md5 hash values'''
    md5value = re.findall(r'([a-fA-F\d]{32})', text)
    count = len(md5value)
    print ("[+] Total number of md5 hash values found: %s" % count)
    return md5value
def dict_attack(passwd_hash):
    dic = ['123','1234','12345','123456','1234567','12345678','password','qwerty','abc','abcd','abc123','111111','monkey','arsenal','letmein','trustno1','dragon','baseball','superman','iloveyou','starwars','montypython','cheese','123123','football','password','batman']
    passwd_found = False
    for value in dic:
        hashvalue = hashlib.md5(value).hexdigest()
        if hashvalue == passwd_hash:
            passwd_found = True
            recovered_password = value
    if passwd_found == True:
        print ('[+] Password recovered: %s'% (recovered_password))
    else:
        print ('[-] Password not recovered')
def main():
    # temp testing url argument
    sys.argv.append('URL HERE!')
    # Check args
    if len(sys.argv) != 2:
        print ('[-] Usage: email_analysis URL/filename')
        return
    #call functions
    try:
        md5values = findmd5(wget(sys.argv[1]))
        print ('[+] Cracking hash values: ')
        # try the dictionary against every hash that was found
        for md5value in md5values:
            dict_attack(md5value)
    except IOError:
        print ('Error')
if __name__ == '__main__':
    main()
Billy, return a list of the hashes found instead of printing them. It looks like you are thinking as if this were bash, but in Python you don't need to "print" the output of a function to pass it on; you can simply return a list with the elements found.
Your regexp for the hash uses \d, which matches digits only, so the character class itself is fine. However, without boundaries the pattern also matches the first 32 characters of any longer hex string (a SHA-1 digest, for example), so it might pick up something that is not an MD5 hash.
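As a rough illustration of both points (returning a list, and guarding the pattern with boundaries), here is a small Python 3 sketch. `find_md5` and the two-argument `dict_attack` are hypothetical variants written for this example, not the poster's functions, and the sample text is made up. Note that in Python 3 `hashlib.md5` needs bytes, hence the `encode` call.

```python
import hashlib
import re

# \b on both sides keeps the pattern from matching part of a longer hex string
MD5_RE = re.compile(r"\b[a-fA-F0-9]{32}\b")


def find_md5(text):
    """Return a list of all 32-character hex strings found in text."""
    return MD5_RE.findall(text)


def dict_attack(passwd_hash, wordlist):
    """Return the first word whose MD5 digest equals passwd_hash, or None."""
    target = passwd_hash.lower()
    for word in wordlist:
        if hashlib.md5(word.encode("utf-8")).hexdigest() == target:
            return word  # stop at the first match
    return None


sample_text = "md5: 5f4dcc3b5aa765d61d8327deb882cf99 longer: " + "a" * 40
hashes = find_md5(sample_text)
for h in hashes:
    print(h, "->", dict_attack(h, ["123", "qwerty", "password"]))
```

Returning the word (or None) instead of printing inside the function lets the caller decide what to do with the result, which is the same idea as having findmd5 return its list.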

How to add tag and region name while printing the result

This is basically an effort to learn dictionary mapping. I have a function that prints the changes in open ports; the code is as follows:
def comp_ports(self,filename,mapping):
    try:
        #print "HEYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
        f = open(filename)
        self.prev_report = pickle.load(f) # NmapReport
        for s in self.prev_report.hosts:
            self.old_port_dict[s.address] = set()
            for x in s.get_open_ports():
                self.old_port_dict[s.address].add(x)
        for s in self.report.hosts:
            self.new_port_dict[s.address] = set()
            for x in s.get_open_ports():
                self.new_port_dict[s.address].add(x)
        print "The following Host/ports were available in old scan : !!"
        print `self.old_port_dict`
        print "--------------------------------------------------------"
        print "The following Host/ports have been added in new scan: !!"
        print `self.new_port_dict`
        ##
        for h in self.old_port_dict.keys():
            self.results_ports_dict[h] = self.new_port_dict[h]- self.old_port_dict[h]
            print "Result Change: for",h ,"->",self.results_ports_dict[h]
        ################### The following code is intensive ###################
        print "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
        diff_key=[key for key in self.old_port_dict if self.old_port_dict[key]!=self.new_port_dict[key]]
        for key in diff_key:
            print "For %s, Port changed from %s to %s" %(key,self.old_port_dict[key],self.new_port_dict[key])
I call this from the main block:
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "Usage:\n\tportwatch.py <configfile> [clean]"
        sys.exit(-1)
    else:
        # Read
        config = ConfigParser.ConfigParser()
        config.read(sys.argv[1])
        if len(sys.argv) > 2:
            if sys.argv[2] == "clean":
                for f in ['nmap-report-old.pkl','nmap-report.pkl']:
                    try:
                        os.remove( config.get('system','scan_directory') + "/" + f )
                    except Exception as e:
                        print e
        # Configure Scanner
        s = Scanner(config)
        # Execute Scan and Generate latest report
        net_range = gather_public_ip() #config.get('sources','networks') # gather_public_ip()
        ### r = s.run(','.join([[i[0] for i in v] for v in net_range][0]))
        r = s.run(net_range)
        data = list(itertools.chain(*net_range))
        mapping = {i[0]:[i[1],i[2]] for i in data}
        s.save()
        report = Report(r)
        report.dump_raw(mapping) ## change made for dump to dump_raw
        print "Hosts in scan report",report.total_hosts()
        # Read in last scan
        report.compare(config.get('system','scan_directory') + '/nmap-report-old.pkl' )
        print "New Hosts"
        report.new_hosts()
        # slack.api_token = config.get('notification','slack_key')
        notify_slack_new_host(report.new_hosts()) #Notifty Slack for any new added host
        # for h in report.result_port_dict.keys():
        # notify_slack(report.new_hosts(h))
        print "Lost Hosts"
        report.lost_hosts()
        report.comp_ports(config.get('system','scan_directory') + '/nmap-report-old.pkl',mapping)
The whole code is at http://pastebin.com/iDYBBrEq. Can someone please help me with comp_ports, where I also want to print the tag and region name, similar to dump_raw?
The IP is the key in the dictionaries old_port_dict, new_port_dict and mapping, and in mapping each IP maps to a list with the tag at index 0 and the region at index 1, so you can access them like this:
for key in diff_key:
    print "For %s with tag %s and region %s, Port changed from %s to %s" %(key,mapping[key][0],mapping[key][1],self.old_port_dict[key],self.new_port_dict[key])
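The same lookup can be sketched as a standalone Python 3 function, which also shows one way to cope with an IP that is missing from mapping or from the new scan. `describe_port_changes` is a hypothetical helper and the sample addresses, tags and regions are invented for illustration; the only thing taken from the question is the `{ip: [tag, region]}` shape of mapping.

```python
def describe_port_changes(old_ports, new_ports, mapping):
    """Return one message per host whose set of open ports changed."""
    messages = []
    for ip in sorted(old_ports):
        old = old_ports[ip]
        new = new_ports.get(ip, set())  # host may be absent from the new scan
        if old == new:
            continue
        # fall back to placeholders when the IP has no tag/region entry
        tag, region = mapping.get(ip, ["unknown", "unknown"])
        messages.append(
            "For %s with tag %s and region %s, ports changed from %s to %s"
            % (ip, tag, region, sorted(old), sorted(new))
        )
    return messages


old = {"10.0.0.1": {22}, "10.0.0.2": {80}}
new = {"10.0.0.1": {22, 80}, "10.0.0.2": {80}}
mapping = {"10.0.0.1": ["web", "us-east"]}
for message in describe_port_changes(old, new, mapping):
    print(message)
```

Using `dict.get` with a default avoids a KeyError when the scan contains a host that was not in the list used to build mapping.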

Importing certain details from a 'txt' file

I have a .txt file where names and addresses appear in the following format:
Sam, 35 Marly Road
...
...
I want to be able to search Sam and for 35 Marly Road to come up.
Here is the code I have so far:
name = input("Please insert your required Client's name: ")
if name in open('clientAddress.txt').read():
    print("Client Found")
This checks if the ID inputted is available in the file, but it doesn't print the address. How do I change it, so it finds the name and prints the address?
As a quick solution, see below:
username = input()
with open('clientAddress.txt', 'r') as clientsfile:
    # iterate over the lines of the file (readline() would yield single characters)
    for line in clientsfile:
        if line.startswith("%s, " % username):
            print("Client %s found: %s" % (username, line[len(username) + 2:].strip()))
            break
Simple solution:
username = input()
with open('clientAddress.txt', 'r') as clientsfile:
    for line in clientsfile:
        if line.startswith("%s, " % username):
            # split only on the first comma so addresses containing commas stay intact
            print("Client %s found: %s" % (username, line.split(",", 1)[1].strip()))
            break
The for loop iterates through the lines of the file; when it finds a line that starts with the desired client name, it prints the address and breaks out of the loop.
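If several names need to be looked up, another option is to read the file once into a dictionary and query that. The following is only a sketch: `load_addresses` is a hypothetical helper, and apart from the "Sam, 35 Marly Road" line from the question the sample data is invented. It takes any iterable of lines, so an open file object can be passed in directly.

```python
def load_addresses(lines):
    """Build a {name: address} dict from 'Name, Address' lines."""
    addresses = {}
    for line in lines:
        # partition splits on the first comma only
        name, sep, address = line.partition(",")
        if sep:  # ignore blank or malformed lines without a comma
            addresses[name.strip()] = address.strip()
    return addresses


sample_lines = ["Sam, 35 Marly Road\n", "Alex, 12 High Street, Flat 2\n", "\n"]
addresses = load_addresses(sample_lines)
print(addresses.get("Sam", "Client not found"))
```

With a real file this would be called as `load_addresses(open_file)` inside a `with open(...)` block. Matching on the whole name also avoids the false positive in the original check, where any text containing the name counted as found.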

<type 'exceptions.IOError'> [Errno 9] Bad file descriptor

The code below is part of a program that captures data from a Bloomberg terminal and dumps it into an SQLite database. It worked well on my 32-bit Windows XP machine, but on 64-bit Windows 7 it keeps giving me
"get_history.histfetch error: [Errno 9] Bad file descriptor", although there shouldn't be a problem using 32-bit Python under a 64-bit OS. Sometimes the problem can be solved by simply exiting the program and opening it again, but sometimes that doesn't work. Right now I'm really confused about what leads to this problem. I looked at the source code and found that the error is raised while calling "histfetch", but I have no idea which part of the code is failing. Can anyone help me out here? I would really appreciate it. Thanks in advance.
def run(self):
    try: pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
    except: pass
    while 1:
        if self.trigger:
            try: self.histfetch()
            except Exception,e:
                logging.error('get_history.histfetch error: %s %s' % (str(type(e)),str(e)))
                if self.errornotify != None:
                    self.errornotify('get_history error','%s %s' % ( str(type(e)), str(e) ) )
            self.trigger = 0
        if self.telomere: break
        time.sleep(0.5)
def histfetch(self):
blpcon = win32com.client.gencache.EnsureDispatch('blpapicom.Session')
blpcon.Start()
dbcon = sqlite3.connect(self.dbfile)
c = dbcon.cursor()
fieldcodes = {}
symcodes = {}
trysleep(c,'select fid,field from fields')
for fid,field in c.fetchall():
# these are different types so this will be ok
fieldcodes[fid] = field
fieldcodes[field] = fid
trysleep(c,'select sid,symbol from symbols')
for sid,symbol in c.fetchall():
symcodes[sid] = symbol
symcodes[symbol] = sid
for instr in self.instructions:
if instr[1] != 'minute': continue
sym,rollspec = instr[0],instr[2]
print 'MINUTE',sym
limits = []
sid = getsid(sym,symcodes,dbcon,c)
trysleep(c,'select min(epoch),max(epoch) from minute where sid=?',(sid,))
try: mine,maxe = c.fetchone()
except: mine,maxe = None,None
print sym,'minute data limits',mine,maxe
rr = getreqrange(mine,maxe)
if rr == None: continue
start,end = rr
dstart = start.strftime('%Y%m%d')
dend = end.strftime('%Y%m%d')
try: # if rollspec is 'noroll', then this will fail and goto except-block
ndaysbefore = int(rollspec)
print 'hist fetch for %s, %i days' % (sym,ndaysbefore)
rolldb.update_roll_db(blpcon,(sym,))
names = rolldb.get_contract_range(sym,ndaysbefore)
except: names = {sym:None}
# sort alphabetically here so oldest always gets done first
# (at least within the decade)
sorted_contracts = names.keys()
sorted_contracts.sort()
for contract in sorted_contracts:
print 'partial fetch',contract,names[contract]
if names[contract] == None:
_start,_end = start,end
else:
da,db = names[contract]
dc,dd = start,end
try: _start,_end = get_overlap(da,db,dc,dd)
except: continue # because get_overlap returning None cannot assign to tuple
# localstart and end are for printing and logging
localstart = _start.strftime('%Y/%m/%d %H:%M')
localend = _end.strftime('%Y/%m/%d %H:%M')
_start = datetime.utcfromtimestamp(time.mktime(_start.timetuple())).strftime(self.blpfmt)
_end = datetime.utcfromtimestamp(time.mktime(_end.timetuple())).strftime(self.blpfmt)
logging.debug('requesting intraday bars for %s (%s): %s to %s' % (sym,contract,localstart,localend))
print 'start,end:',localstart,localend
result = get_minute(blpcon,contract,_start,_end)
if len(result) == 0:
logging.error('warning: 0-length minute data fetch for %s,%s,%s' % (contract,_start,_end))
continue
event_count = len(result.values()[0])
print event_count,'events returned'
lap = time.clock()
# todo: split up writes: no more than 5000 before commit (so other threads get a chance)
# 100,000 rows is 13 seconds on my machine. 5000 should be 0.5 seconds.
try:
for i in range(event_count):
epoch = calendar.timegm(datetime.strptime(str(result['time'][i]),'%m/%d/%y %H:%M:%S').timetuple())
# this uses sid (from sym), NOT contract
row = (sid,epoch,result['open'][i],result['high'][i],result['low'][i],result['close'][i],result['volume'][i],result['numEvents'][i])
trysleep(c,'insert or ignore into minute (sid,epoch,open,high,low,close,volume,nevents) values (?,?,?,?,?,?,?,?)',row)
dbcon.commit()
except Exception,e:
print 'ERROR',e,'iterating result object'
logging.error(datetime.now().strftime() + ' error in get_history.histfetch writing DB')
# todo: tray notify the error and log it
lap = time.clock() - lap
print 'database write of %i rows in %.2f seconds' % (event_count,lap)
logging.debug(' -- minute bars %i rows (%.2f s)' % (event_count,lap))
for instr in self.instructions:
oldestdaily = datetime.now().replace(hour=0,minute=0,second=0,microsecond=0) - timedelta(self.dailyback)
sym = instr[0]
if instr[1] != 'daily': continue
print 'DAILY',sym
fields = instr[2]
rollspec = instr[3]
sid = getsid(sym,symcodes,dbcon,c)
unionrange = None,None
for f in fields:
try: fid = fieldcodes[f]
except:
trysleep(c,'insert into fields (field) values (?)',(f,))
trysleep(c,'select fid from fields where field=?',(f,))
fid, = c.fetchone()
dbcon.commit()
fieldcodes[fid] = f
fieldcodes[f] = fid
trysleep(c,'select min(epoch),max(epoch) from daily where sid=? and fid=?',(sid,fid))
mine,maxe = c.fetchone()
if mine == None or maxe == None:
unionrange = None
break
if unionrange == (None,None):
unionrange = mine,maxe
else:
unionrange = max(mine,unionrange[0]),min(maxe,unionrange[1])
print sym,'daily unionrange',unionrange
yesterday = datetime.now().replace(hour=0,minute=0,second=0,microsecond=0) - timedelta(days=1)
if unionrange == None:
reqrange = oldestdaily,yesterday
else:
mine = datetime.fromordinal(unionrange[0])
maxe = datetime.fromordinal(unionrange[1])
print 'comparing',mine,maxe,oldestdaily,yesterday
if oldestdaily < datetime.fromordinal(unionrange[0]): a = oldestdaily
else: a = maxe
reqrange = a,yesterday
if reqrange[0] >= reqrange[1]:
print 'skipping daily',sym,'because we\'re up to date'
continue
print 'daily request range',sym,reqrange,reqrange[0] > reqrange[1]
try:
ndaysbefore = int(rollspec) # exception if it's 'noroll'
print 'hist fetch for %s, %i days' % (sym,ndaysbefore)
rolldb.update_roll_db(blpcon,(sym,))
names = rolldb.get_contract_range(sym,ndaysbefore,daily=True)
except: names = {sym:None}
# sort alphabetically here so oldest always gets done first
# (at least within the year)
sorted_contracts = names.keys()
sorted_contracts.sort()
start,end = reqrange
for contract in sorted_contracts:
print 'partial fetch',contract,names[contract]
if names[contract] == None:
_start,_end = start,end
else:
da,db = names[contract]
dc,dd = start,end
try: _start,_end = get_overlap(da,db,dc,dd)
except: continue # because get_overlap returning None cannot assign to tuple
_start = _start.strftime('%Y%m%d')
_end = _end.strftime('%Y%m%d')
logging.info('daily bars for %s (%s), %s - %s' % (sym,contract,_start,_end))
result = get_daily(blpcon,(contract,),fields,_start,_end)
try: result = result[contract]
except:
print 'result doesn\'t contain requested symbol'
logging.error("ERROR: symbol '%s' not in daily request result" % contract)
# todo: log and alert error
continue
if not 'date' in result:
print 'result has no date field'
logging.error('ERROR: daily result has no date field')
# todo: log and alert error
continue
keys = result.keys()
keys.remove('date')
logging.info(' -- %i days returned' % len(result['date']))
for i in range(len(result['date'])):
ordinal = datetime.fromtimestamp(int(result['date'][i])).toordinal()
for k in keys:
trysleep(c,'insert or ignore into daily (sid,fid,epoch,value) values (?,?,?,?)',(sid,fieldcodes[k],ordinal,result[k][i]))
dbcon.commit()
Print the full traceback instead of just the exception message. The traceback will show you where the exception was raised and hence what the problem is:
import traceback
...
try: self.histfetch()
except Exception,e:
    logging.error('get_history.histfetch error: %s %s' % (str(type(e)),str(e)))
    logging.error(traceback.format_exc())
    if self.errornotify != None:
        self.errornotify('get_history error','%s %s' % ( str(type(e)), str(e) ) )
Update:
With the above (or similar, the idea being to look at the full traceback), you say:
it said it's with the "print" functions. The program works well after I disable all the "print" functions.
The print statements in your post use syntax that is valid in Python 2.x only. If that is what you are using, perhaps the application that runs your script has no usable print (for example, no valid standard output) and you are supposed to use a logging function instead; otherwise I can't see anything wrong with the calls. If you mean that only one of the prints was the issue, I would need to see the exact error to identify it, so post it if you want to figure this out. If you are using Python 3.x, then you must call print as a function, print(a, b, c, ...); see the 3.x docs.
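If the prints are indeed the trigger, one way to keep the diagnostic output without depending on standard output is to route it through the logging module, which the program already uses. The sketch below is written for Python 3 and rests on an assumption the post does not confirm: that the script is started in a way that leaves it without a usable console. `make_logger` is a hypothetical helper, and the in-memory buffer stands in for a real log file.

```python
import io
import logging


def make_logger(stream, name="get_history"):
    """Return a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep messages away from the root logger's handlers
    return logger


buffer = io.StringIO()  # stand-in for a log file
log = make_logger(buffer)
log.info("MINUTE %s", "SYM1")           # instead of: print 'MINUTE',sym
log.debug("%d events returned", 42)     # instead of: print event_count,'events returned'
output = buffer.getvalue()
```

In the real program a `logging.FileHandler` pointed at a log file would take the place of the in-memory stream, so nothing is ever written to the console.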
