I wrote a piece of code to challenge myself a bit with Python, as I'm relatively new to it, but this one has me stumped.
I've been running the code, and the part that failed is supposed to measure the time it takes to complete 1 million iterations and then log those times every 10 million iterations, when the program saves its progress. The program had been running for about an hour, with me checking in every 10-20 minutes, when I came back and saw this error message:
Traceback (most recent call last):
File "C:\Users\BruhK\PycharmProjects\Conjecture.py", line 101, in <module>
run(stl[0], stl[1], stl[2], stl[3])
File "C:\Users\BruhK\PycharmProjects\Conjecture.py", line 78, in run
ntimestr = ntimestr + timesplit.pop(0)
IndexError: pop from empty list
I've gone over the code again and again trying to figure out what would cause this, but found nothing. I considered adding the following code:
try:
if timehold != 0:
timediff = time.time()*1000 - timehold
timestr = str(timediff)
timesplit = Functions.split(timestr)
ntimestr = ""
for x in range(7):
ntimestr = ntimestr + timesplit.pop(0)
timelist.append(ntimestr)
print("Completed in {} ms.".format(ntimestr))
timehold = time.time()*1000
except IndexError as err:
# write err to a log file
pass
I don't want to do this, however. I'd rather find a proper solution. Any help with this would be appreciated.
(Unrelated) at the moment I'm saving the progress of the program to a file on my computer, then running a separate piece of code to export it to a spreadsheet to make a graph of the timings, but I plan to combine the two once I get this resolved.
import time
...
def run(snum, maxval, maxitn, maxtrialnum):
itn = snum
timehold = 0
timelist = []
if itn == 0:
itn = 1
val = False
while not val:
trialnum = 0
value = itn
val2 = False
if itn % 10000000 == 0 and itn > 9999999:
f = open("conjecturestore.txt", "w+")
f.write(str(itn))
f.close()
f = open("conjecturedata.txt", "w+")
f.write("maxval:{}\nmaxitn:{}\nmaxtrialnum:{}".format(maxval, maxitn, maxtrialnum))
f.close()
f = open("conjecturetimings.txt", "a+")
for x in range(len(timelist)):
f.write("{}\n".format(timelist.pop(0)))
f.close()
while not val2:
if itn % 2 == 0:
val2 = True
if itn % 1000000 == 0:
print("MI+{} MV+{} MTR+{}".format(maxitn, int(maxval), maxtrialnum))
print("Inconclusive sample on I+{} V+{} TR+{}.".format(itn, int(value), trialnum))
if timehold != 0:
timediff = time.time()*1000 - timehold
timestr = str(timediff)
timesplit = Functions.split(timestr)
ntimestr = ""
for x in range(7):
ntimestr = ntimestr + timesplit.pop(0)
timelist.append(ntimestr)
print("Completed in {} ms.".format(ntimestr))
timehold = time.time()*1000
for x in range(7):
ntimestr = ntimestr + timesplit.pop(0)
To give that error, your timesplit list has to be shorter than 7 elements. You can test for that and log it out to avoid the error, or handle it with try/except. Without the data you operate on, nobody here can help you, and we do not want that data, because this is not a minimal reproducible example. The error is quite self-explanatory and has lots of potential dupes around; time to get hands-on and debug your code. -- Patrick Artner
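For example, a defensive version of that loop (a sketch only, assuming timesplit is the list of characters produced by Functions.split) takes at most the first 7 elements instead of popping blindly:
# defensive sketch: never pop past the end of the list
ntimestr = "".join(timesplit[:7])
if len(timesplit) < 7:
    print("warning: timesplit shorter than expected: {}".format(timesplit))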
A very simple mistake on my part. I've done a bit of testing and this ended up being the problem; it just flew right over my head.
I recently had to write a challenge for a company that was to merge 3 CSV files into one based on the first attribute of each (the attributes were repeating in all files).
I wrote the code and sent it to them, but they said it took 2 minutes to run. That was odd, because it ran in 10 seconds on my machine. My machine had the same processor, 16 GB of RAM, and an SSD as well. Very similar environments.
I tried optimising it and resubmitted it. This time they said they ran it on an Ubuntu machine and got 11 seconds, while the code still took 100 seconds on Windows 10.
Another peculiar thing was that when I tried profiling it with the profile module, it went on forever and I had to terminate it after 450 seconds. I moved to cProfile and it recorded a run of about 7 seconds.
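For reference, a minimal sketch of how the two profilers can be invoked (profile is the pure-Python profiler, cProfile the C-accelerated one; merge_files is the function from the code below):
import cProfile
import profile

cProfile.run('merge_files()')   # C-implemented profiler, low overhead
profile.run('merge_files()')    # pure-Python profiler, noticeably slower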
EDIT: The exact formulation of the problem is
Write a console program to merge the files provided in a timely and
efficient manner. File paths should be supplied as arguments so that
the program can be evaluated on different data sets. The merged file
should be saved as CSV; use the id column as the unique key for
merging; the program should do any necessary data cleaning and error
checking.
Feel free to use any language you’re comfortable with – only
restriction is no external libraries as this defeats the purpose of
the test. If the language provides CSV parsing libraries (like
Python), please avoid using them as well as this is a part of the
test.
Without further ado here's the code:
#!/usr/bin/python3
import sys
from multiprocessing import Pool
HEADERS = ['id']
def csv_tuple_quotes_valid(a_tuple):
"""
checks whether the quotes in each attribute of an entry (i.e. a tuple) agree with the CSV format
returns True or False
"""
for attribute in a_tuple:
in_quotes = False
attr_len = len(attribute)
skip_next = False
for i in range(0, attr_len):
if not skip_next and attribute[i] == '\"':
if i < attr_len - 1 and attribute[i + 1] == '\"':
skip_next = True
continue
elif i == 0 or i == attr_len - 1:
in_quotes = not in_quotes
else:
return False
else:
skip_next = False
if in_quotes:
return False
return True
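# --- illustrative usage note (not part of the original submission) ---
# csv_tuple_quotes_valid(['"abc"', '123'])  -> True   (quotes balance)
# csv_tuple_quotes_valid(['"ab"c', '123'])  -> False  (stray quote inside an attribute)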
def check_and_parse_potential_tuple(to_parse):
"""
receives a string and returns an array of the attributes of the csv line
if the string was not a valid csv line, then returns False
"""
a_tuple = []
attribute_start_index = 0
to_parse_len = len(to_parse)
in_quotes = False
i = 0
#iterate through the string (line from the csv)
while i < to_parse_len:
current_char = to_parse[i]
#this works the following way: if we meet a quote ("), it must be in one
#of five cases: "" | ", | ," | "\0 | (start_of_string)"
#in case we are inside a quoted attribute (i.e. "123"), then commas are ignored
#the following code also extracts the tuples' attributes
if current_char == '\"':
if i == 0 or (to_parse[i - 1] == ',' and not in_quotes): # (start_of_string)" and ," case
#not including the quote in the next attr
attribute_start_index = i + 1
#starting a quoted attr
in_quotes = True
elif i + 1 < to_parse_len:
if to_parse[i + 1] == '\"': # "" case
i += 1 #skip the next " because it is part of a ""
elif to_parse[i + 1] == ',' and in_quotes: # ", case
a_tuple.append(to_parse[attribute_start_index:i].strip())
#not including the quote and comma in the next attr
attribute_start_index = i + 2
in_quotes = False #the quoted attr has ended
#skip the next comma - we know what it is for
i += 1
else:
#since we cannot have a random " in the middle of an attr
return False
elif i == to_parse_len - 1: # "\0 case
a_tuple.append(to_parse[attribute_start_index:i].strip())
#reached end of line, so no more attr's to extract
attribute_start_index = to_parse_len
in_quotes = False
else:
return False
elif current_char == ',':
if not in_quotes:
a_tuple.append(to_parse[attribute_start_index:i].strip())
attribute_start_index = i + 1
i += 1
#in case the last attr was left empty or unquoted
if attribute_start_index < to_parse_len or (not in_quotes and to_parse[-1] == ','):
a_tuple.append(to_parse[attribute_start_index:])
#line ended while parsing; i.e. a quote was opened but not closed
if in_quotes:
return False
return a_tuple
def parse_tuple(to_parse, no_of_headers):
"""
parses a string and returns an array with no_of_headers number of headers
raises an error if the string was not a valid CSV line
"""
#get rid of the newline at the end of every line
to_parse = to_parse.strip()
# return to_parse.split(',') #if we assume the data is in a valid format
#the following checking of the format of the data increases the execution
#time by a factor of 2; if the data is known to be valid, uncomment the return statement above
#if there are more commas than fields, then we must take into consideration
#how the quotes parse and then extract the attributes
if to_parse.count(',') + 1 > no_of_headers:
result = check_and_parse_potential_tuple(to_parse)
if result:
a_tuple = result
else:
raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
else:
a_tuple = to_parse.split(',')
if not csv_tuple_quotes_valid(a_tuple):
raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
#if the format is correct but more data fields were provided
#the following works faster than an if statement that checks the length of a_tuple
try:
a_tuple[no_of_headers - 1]
except IndexError:
raise TypeError('Error while parsing CSV line %s. Unknown reason' % to_parse)
#this replaces the use of my own hashtables to store the duplicated values for the attributes
for i in range(1, no_of_headers):
a_tuple[i] = sys.intern(a_tuple[i])
return a_tuple
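# --- illustrative note (not part of the original submission) ---
# sys.intern makes repeated attribute values share a single string object:
#   sys.intern('foo' + 'bar') is sys.intern('foobar')  -> True
# so a value duplicated across many rows is stored once instead of many times.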
def read_file(path, file_number):
"""
reads the csv file and returns (dict, int)
the dict is the mapping of id's to attributes
the integer is the number of attributes (headers) for the csv file
"""
global HEADERS
try:
file = open(path, 'r');
except FileNotFoundError as e:
print("error in %s:\n%s\nexiting...")
exit(1)
main_table = {}
headers = file.readline().strip().split(',')
no_of_headers = len(headers)
HEADERS.extend(headers[1:]) #keep the headers from the file
lines = file.readlines()
file.close()
args = []
for line in lines:
args.append((line, no_of_headers))
#pool is a pool of worker processes parsing the lines in parallel
with Pool() as workers:
try:
all_tuples = workers.starmap(parse_tuple, args, 1000)
except TypeError as e:
print('Error in file %s:\n%s\nexiting thread...' % (path, e.args))
exit(1)
for a_tuple in all_tuples:
#add quotes to key if needed
key = a_tuple[0] if a_tuple[0][0] == '\"' else ('\"%s\"' % a_tuple[0])
main_table[key] = a_tuple[1:]
return (main_table, no_of_headers)
def merge_files():
"""
produces a file called merged.csv
"""
global HEADERS
no_of_files = len(sys.argv) - 1
processed_files = [None] * no_of_files
for i in range(0, no_of_files):
processed_files[i] = read_file(sys.argv[i + 1], i)
out_file = open('merged.csv', 'w+')
merged_str = ','.join(HEADERS)
all_keys = {}
#this is to ensure that we include all keys in the final file.
#even those that are missing from some files and present in others
for processed_file in processed_files:
all_keys.update(processed_file[0])
for key in all_keys:
merged_str += '\n%s' % key
for i in range(0, no_of_files):
(main_table, no_of_headers) = processed_files[i]
try:
for attr in main_table[key]:
merged_str += ',%s' % attr
except KeyError:
print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
merged_str += ',' * (no_of_headers - 1)
out_file.write(merged_str)
out_file.close()
if __name__ == '__main__':
# merge_files()
import cProfile
cProfile.run('merge_files()')
# import time
# start = time.time()
# print(time.time() - start);
Here is the profiler report I got on my Windows machine.
EDIT: The rest of the csv data provided is here. Pastebin was taking too long to process the files, so...
It might not be the best code, and I know that, but my question is: what slows the code down so much on Windows that doesn't slow it down on Ubuntu? The merge_files() function takes the longest, 94 seconds just for itself, not including the calls to other functions. And there doesn't seem to be anything obvious to me about why it is so slow.
Thanks
EDIT: Note: We both used the same dataset to run the code with.
It turns out that Windows and Linux handle very long strings differently. When I moved the out_file.write(merged_str) inside the outer for loop (for key in all_keys:) and stopped appending to merged_str, it ran for 11 seconds as expected. I don't have enough knowledge of either OS's memory management to predict why the difference is so large.
But I would say that the second approach (the one that fixed the Windows run, writing row by row) is the more fail-safe method, because it is unreasonable to keep a 30 MB string in memory. It just turns out that Linux copes with that better and doesn't end up rebuilding the string every time.
Funny enough, initially I did run it a few times on my Linux machine with these same writing strategies, and the one with the large string seemed to go faster, so I stuck with it. I guess you never know.
Here's the modified code
for key in all_keys:
merged_str = '%s' % key
for i in range(0, no_of_files):
(main_table, no_of_headers) = processed_files[i]
try:
for attr in main_table[key]:
merged_str += ',%s' % attr
except KeyError:
print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
merged_str += ',' * (no_of_headers - 1)
out_file.write(merged_str + '\n')
out_file.close()
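An alternative sketch (not the author's code, and omitting the NOTE print for brevity) that avoids repeated string concatenation entirely: build each row as a list of fields and write it once per key with str.join, assuming the same processed_files and all_keys structures as above:
for key in all_keys:
    row = [key]
    for i in range(0, no_of_files):
        (main_table, no_of_headers) = processed_files[i]
        try:
            row.extend(main_table[key])
        except KeyError:
            # pad with empty fields when this file has no values for the key
            row.extend([''] * (no_of_headers - 1))
    out_file.write(','.join(row) + '\n')
out_file.close()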
When I run your solution on Ubuntu 16.04 with the three given files, it seems to take ~8 seconds to complete. The only modification I made was to uncomment the timing code at the bottom and use it.
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
8.039648056030273
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
7.78482985496521
I rewrote my first attempt without using csv from the standard library and am now getting times of ~4.3 seconds.
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.332579612731934
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.305467367172241
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.27345871925354
This is my solution code (lettuce_merge.py):
from collections import defaultdict
def split_row(csv_row):
return [col.strip('"') for col in csv_row.rstrip().split(',')]
def merge_csv_files(files):
file_headers = []
merged_headers = []
for i, file in enumerate(files):
current_header = split_row(next(file))
unique_key, *current_header = current_header
if i == 0:
merged_headers.append(unique_key)
merged_headers.extend(current_header)
file_headers.append(current_header)
result = defaultdict(lambda: [''] * (len(merged_headers) - 1))
for file_header, file in zip(file_headers, files):
for line in file:
key, *values = split_row(line)
for col_name, col_value in zip(file_header, values):
result[key][merged_headers.index(col_name) - 1] = col_value
file.close()
quotes = '"{}"'.format
with open('lettuce_merged.csv', 'w') as f:
f.write(','.join(quotes(a) for a in merged_headers) + '\n')
for key, values in result.items():
f.write(','.join(quotes(b) for b in [key] + values) + '\n')
if __name__ == '__main__':
from argparse import ArgumentParser, FileType
from time import time
parser = ArgumentParser()
parser.add_argument('files', nargs='*', type=FileType('r'))
args = parser.parse_args()
start_time = time()
merge_csv_files(args.files)
print(time() - start_time)
I'm sure this code could be optimized even further but sometimes just seeing another way to solve a problem can help spark new ideas.
When I measure the time manually, it is less than the time reported by this script:
import time
import os
def getTimes():
try:
times = []
if(exists("1472205483589.png",60)):
click("1472192774056.png")
wait("1472040968178.png",10)
click("1472036591623.png")
click("1472036834091.png")
click("1472036868986.png")
if(exists("1472192829443.png",5)):
click("1472192829443.png")
u = time.time()
click("1472539655695.png")
wait("1472042542247.png",120)
v = time.time()
print("Open File to when views list appear  (sec) : " , int(v-u))
times.append(int(v-u))
u = time.time()
click("1472042542247.png")
wait("1472108424071.png",120)
mouseMove("1472108424071.png")
wait("1472108486171.png",120)
v = time.time()
print("Opening view (sec) : ",int(v-u))
times.append(int(v-u))
u = time.time()
click("1472109163884.png")
wait("1472042181291.png",120)
v = time.time()
print("Clicking element (sec) : ", float(v-u))
times.append(int(v-u))
return times
except FindFailed as ex:
print("Failed. Navigator might have stopped working")
if(exists("1472204045678.png",10)):
click("1472204045678.png")
return -1
file = open(r"C:\BSW\SikulixScripts\NavigatorAutoTesting\log.txt",'w') ret = getTimes() if (ret == -1):
file.write("-1")
exit() str = " ".join(str(x) for x in ret) file.write(str) file.close()
By using time.time(), you are actually getting a number of seconds: the difference between "the epoch" and now (the epoch is the same as gmtime(0)). Instead, try using datetime.now(), which will give you a datetime object. You can add and subtract datetime objects freely, resulting in a timedelta object, as per the Python docs:
u = datetime.now()
click("1472539655695.png")
wait("1472042542247.png",120)
v = datetime.now()
tdelta = v-u
seconds = tdelta.total_seconds() #if you want the number of seconds as a floating point number... (available in Python 2.7 and up)
times.append(seconds)
This should yield more accuracy for you.
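A quick illustration of the timedelta arithmetic (the values here are made up):
from datetime import datetime

u = datetime(2016, 8, 26, 12, 0, 0)
v = datetime(2016, 8, 26, 12, 0, 3, 500000)
print((v - u).total_seconds())  # 3.5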
I'm trying to get code similar to the following example working correctly:
from multiprocessing import Process, Queue, Manager, Pool
import time
from datetime import datetime
def results_producer(the_work, num_procs):
results = Manager().Queue()
ppool = Pool(num_procs)
multiplier = 3
#step = len(the_work)/(num_procs*multiplier)
step = 100
for i in xrange(0,len(the_work), step):
batch = the_work[i:i+step]
ppool.apply_async(do_work1, args=(i,batch,results))#,callback=results.put_nowait)
return (ppool, results)
def results_consumer(results, total_work, num_procs, pool=None):
current = 0
batch_size=10
total = total_work
est_remaining = 0
while current < total_work:
size = results.qsize()
est_remaining = total_work - (current + size)
if current % 1000 == 0:
print 'Attempting to retrieve item from queue that is empty? %s, with size: %d and remaining work: %d' % (results.empty(), size, est_remaining)
item = results.get()
results.task_done()
current += 1
if current % batch_size == 0 or total_work - current < batch_size:
if pool is not None and est_remaining == 0 and size/num_procs > batch_size:
pool.apply_async(do_work2, args=(current, item, True))
else:
do_work2(current,item, False)
if current % 1000 == 0:
print 'Queue size: %d and remaining work: %d' % (size, est_remaining)
def do_work1(i, w, results):
time.sleep(.05)
if i % 1000 == 0:
print 'did work %d: from %d to %d' % (i,w[0], w[-1])
for j in w:
#create an increasing amount of work on the queue
results.put_nowait(range(j*2))
def do_work2(index, item, in_parallel):
time.sleep(1)
if index % 50 == 0:
print 'processed result %d with length %d in parallel %s' % (index, len(item), in_parallel)
if __name__ == "__main__":
num_workers = 2
start = datetime.now()
print 'Start: %s' % start
amount_work = 4000
the_work = [i for i in xrange(amount_work)]
ppool, results = results_producer(the_work, num_workers)
results_consumer(results, len(the_work), num_workers, ppool)
if ppool is not None:
ppool.close()
ppool.join()
print 'Took: %s time' % (datetime.now() - start)
It deadlocks on the results.put_nowait call in do_work1 even though the queue is empty! Sometimes the code manages to put all the work on the queue, but then the results.get call in results_consumer blocks because the queue is apparently empty, even though the work has not been consumed yet.
Additionally, I checked the programming guidelines (https://docs.python.org/2/library/multiprocessing.html) and believe the above code conforms to them. Lastly, the problem in this post, Python multiprocessing.Queue deadlocks on put and get, seems very similar and claims to be solved on Windows (I'm running this on Windows 8.1); however, the above code does not block because of the parent process attempting to join the child process, since the logic is similar to the suggested answer. Any suggestions about the cause of the deadlock and how to fix it? Also, in general, what is the best way to enable multiple producers to provide results for a consumer to process in Python?
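For reference, a minimal sketch of the multiple-producers/one-consumer pattern being asked about, using multiprocessing.Process with a plain multiprocessing.Queue and sentinel values (illustrative only, not the code above; do_work1 and do_work2 are replaced by stand-ins):
from multiprocessing import Process, Queue

def producer(work, q):
    for item in work:
        q.put(item * 2)          # stand-in for do_work1
    q.put(None)                  # sentinel: this producer is finished

def consumer(q, n_producers):
    finished = 0
    while finished < n_producers:
        item = q.get()           # blocks until an item or a sentinel arrives
        if item is None:
            finished += 1
        # else: stand-in for do_work2

if __name__ == '__main__':
    q = Queue()
    producers = [Process(target=producer, args=(range(i, 2000, 2), q)) for i in range(2)]
    for p in producers:
        p.start()
    consumer(q, len(producers))
    for p in producers:
        p.join()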
The code below is part of a program that is meant to capture data from a Bloomberg terminal and dump it into an SQLite database. It worked pretty well on my 32-bit Windows XP machine, but it keeps giving me
"get_history.histfetch error: [Errno 9] Bad file descriptor" on 64-bit Windows 7, although there shouldn't be a problem using 32-bit Python under a 64-bit OS. Sometimes the problem can be solved by simply exiting the program and opening it again, but sometimes that doesn't work. Right now I'm really confused about what leads to this problem. I looked at the source code and found that the error is raised while calling "histfetch", but I have NO idea which part of the code is failing. Can anyone help me out here? I really appreciate it. Thanks in advance.
def run(self):
try: pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
except: pass
while 1:
if self.trigger:
try: self.histfetch()
except Exception,e:
logging.error('get_history.histfetch error: %s %s' % (str(type(e)),str(e)))
if self.errornotify != None:
self.errornotify('get_history error','%s %s' % ( str(type(e)), str(e) ) )
self.trigger = 0
if self.telomere: break
time.sleep(0.5)
def histfetch(self):
blpcon = win32com.client.gencache.EnsureDispatch('blpapicom.Session')
blpcon.Start()
dbcon = sqlite3.connect(self.dbfile)
c = dbcon.cursor()
fieldcodes = {}
symcodes = {}
trysleep(c,'select fid,field from fields')
for fid,field in c.fetchall():
# these are different types so this will be ok
fieldcodes[fid] = field
fieldcodes[field] = fid
trysleep(c,'select sid,symbol from symbols')
for sid,symbol in c.fetchall():
symcodes[sid] = symbol
symcodes[symbol] = sid
for instr in self.instructions:
if instr[1] != 'minute': continue
sym,rollspec = instr[0],instr[2]
print 'MINUTE',sym
limits = []
sid = getsid(sym,symcodes,dbcon,c)
trysleep(c,'select min(epoch),max(epoch) from minute where sid=?',(sid,))
try: mine,maxe = c.fetchone()
except: mine,maxe = None,None
print sym,'minute data limits',mine,maxe
rr = getreqrange(mine,maxe)
if rr == None: continue
start,end = rr
dstart = start.strftime('%Y%m%d')
dend = end.strftime('%Y%m%d')
try: # if rollspec is 'noroll', then this will fail and goto except-block
ndaysbefore = int(rollspec)
print 'hist fetch for %s, %i days' % (sym,ndaysbefore)
rolldb.update_roll_db(blpcon,(sym,))
names = rolldb.get_contract_range(sym,ndaysbefore)
except: names = {sym:None}
# sort alphabetically here so oldest always gets done first
# (at least within the decade)
sorted_contracts = names.keys()
sorted_contracts.sort()
for contract in sorted_contracts:
print 'partial fetch',contract,names[contract]
if names[contract] == None:
_start,_end = start,end
else:
da,db = names[contract]
dc,dd = start,end
try: _start,_end = get_overlap(da,db,dc,dd)
except: continue # because get_overlap returning None cannot assign to tuple
# localstart and end are for printing and logging
localstart = _start.strftime('%Y/%m/%d %H:%M')
localend = _end.strftime('%Y/%m/%d %H:%M')
_start = datetime.utcfromtimestamp(time.mktime(_start.timetuple())).strftime(self.blpfmt)
_end = datetime.utcfromtimestamp(time.mktime(_end.timetuple())).strftime(self.blpfmt)
logging.debug('requesting intraday bars for %s (%s): %s to %s' % (sym,contract,localstart,localend))
print 'start,end:',localstart,localend
result = get_minute(blpcon,contract,_start,_end)
if len(result) == 0:
logging.error('warning: 0-length minute data fetch for %s,%s,%s' % (contract,_start,_end))
continue
event_count = len(result.values()[0])
print event_count,'events returned'
lap = time.clock()
# todo: split up writes: no more than 5000 before commit (so other threads get a chance)
# 100,000 rows is 13 seconds on my machine. 5000 should be 0.5 seconds.
try:
for i in range(event_count):
epoch = calendar.timegm(datetime.strptime(str(result['time'][i]),'%m/%d/%y %H:%M:%S').timetuple())
# this uses sid (from sym), NOT contract
row = (sid,epoch,result['open'][i],result['high'][i],result['low'][i],result['close'][i],result['volume'][i],result['numEvents'][i])
trysleep(c,'insert or ignore into minute (sid,epoch,open,high,low,close,volume,nevents) values (?,?,?,?,?,?,?,?)',row)
dbcon.commit()
except Exception,e:
print 'ERROR',e,'iterating result object'
logging.error(datetime.now().strftime() + ' error in get_history.histfetch writing DB')
# todo: tray notify the error and log it
lap = time.clock() - lap
print 'database write of %i rows in %.2f seconds' % (event_count,lap)
logging.debug(' -- minute bars %i rows (%.2f s)' % (event_count,lap))
for instr in self.instructions:
oldestdaily = datetime.now().replace(hour=0,minute=0,second=0,microsecond=0) - timedelta(self.dailyback)
sym = instr[0]
if instr[1] != 'daily': continue
print 'DAILY',sym
fields = instr[2]
rollspec = instr[3]
sid = getsid(sym,symcodes,dbcon,c)
unionrange = None,None
for f in fields:
try: fid = fieldcodes[f]
except:
trysleep(c,'insert into fields (field) values (?)',(f,))
trysleep(c,'select fid from fields where field=?',(f,))
fid, = c.fetchone()
dbcon.commit()
fieldcodes[fid] = f
fieldcodes[f] = fid
trysleep(c,'select min(epoch),max(epoch) from daily where sid=? and fid=?',(sid,fid))
mine,maxe = c.fetchone()
if mine == None or maxe == None:
unionrange = None
break
if unionrange == (None,None):
unionrange = mine,maxe
else:
unionrange = max(mine,unionrange[0]),min(maxe,unionrange[1])
print sym,'daily unionrange',unionrange
yesterday = datetime.now().replace(hour=0,minute=0,second=0,microsecond=0) - timedelta(days=1)
if unionrange == None:
reqrange = oldestdaily,yesterday
else:
mine = datetime.fromordinal(unionrange[0])
maxe = datetime.fromordinal(unionrange[1])
print 'comparing',mine,maxe,oldestdaily,yesterday
if oldestdaily < datetime.fromordinal(unionrange[0]): a = oldestdaily
else: a = maxe
reqrange = a,yesterday
if reqrange[0] >= reqrange[1]:
print 'skipping daily',sym,'because we\'re up to date'
continue
print 'daily request range',sym,reqrange,reqrange[0] > reqrange[1]
try:
ndaysbefore = int(rollspec) # exception if it's 'noroll'
print 'hist fetch for %s, %i days' % (sym,ndaysbefore)
rolldb.update_roll_db(blpcon,(sym,))
names = rolldb.get_contract_range(sym,ndaysbefore,daily=True)
except: names = {sym:None}
# sort alphabetically here so oldest always gets done first
# (at least within the year)
sorted_contracts = names.keys()
sorted_contracts.sort()
start,end = reqrange
for contract in sorted_contracts:
print 'partial fetch',contract,names[contract]
if names[contract] == None:
_start,_end = start,end
else:
da,db = names[contract]
dc,dd = start,end
try: _start,_end = get_overlap(da,db,dc,dd)
except: continue # because get_overlap returning None cannot assign to tuple
_start = _start.strftime('%Y%m%d')
_end = _end.strftime('%Y%m%d')
logging.info('daily bars for %s (%s), %s - %s' % (sym,contract,_start,_end))
result = get_daily(blpcon,(contract,),fields,_start,_end)
try: result = result[contract]
except:
print 'result doesn\'t contain requested symbol'
logging.error("ERROR: symbol '%s' not in daily request result" % contract)
# todo: log and alert error
continue
if not 'date' in result:
print 'result has no date field'
logging.error('ERROR: daily result has no date field')
# todo: log and alert error
continue
keys = result.keys()
keys.remove('date')
logging.info(' -- %i days returned' % len(result['date']))
for i in range(len(result['date'])):
ordinal = datetime.fromtimestamp(int(result['date'][i])).toordinal()
for k in keys:
trysleep(c,'insert or ignore into daily (sid,fid,epoch,value) values (?,?,?,?)',(sid,fieldcodes[k],ordinal,result[k][i]))
dbcon.commit()
Print the full traceback instead of just the exception message. The traceback will show you where the exception was raised and hence what the problem is:
import traceback
...
try: self.histfetch()
except Exception,e:
logging.error('get_history.histfetch error: %s %s' % (str(type(e)),str(e)))
logging.error(traceback.format_exc())
if self.errornotify != None:
self.errornotify('get_history error','%s %s' % ( str(type(e)), str(e) ) )
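A related shortcut, sketched here for completeness (not part of the original answer): when called from inside an except block, logging.exception records the message together with the current traceback automatically:
try:
    self.histfetch()
except Exception:
    logging.exception('get_history.histfetch error')  # message plus full traceback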
Update:
With the above (or similar, the idea being to look at the full traceback), you say:
it said it's with the "print" functions. The program works well after I disable all the "print" functions.
The print calls you have in your post use syntax that is valid in Python 2.x only. If that is what you are using, perhaps the application that runs your script has not defined print and you are supposed to use a log function instead; otherwise I can't see anything wrong with the calls (unless you mean only one of the prints was the issue, in which case I would need to see the exact error to identify it; post it if you want to figure this out). If you are using Python 3.x, then you must use print(a, b, c, ...); see the 3.x docs.
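For concreteness, the two forms look like this (a minimal sketch):
# Python 2.x statement form (a SyntaxError under Python 3):
#   print 'value:', 42
# Python 3.x function form:
print('value:', 42)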