appending array breaks program - python

I am writing a program to analyze some of our invoice data. Basically,I need to take an array containing each individual invoice we sent out over the past year & break it down into twelve arrays which contains the invoices for that month using the dateSeperate() function, so that monthly_transactions[0] returns Januaries transactions, monthly_transactions[1] returns Februaries & so forth.
I've managed to get it working so that dateSeperate returns monthly_transactions[0] as the january transactions. However, once all of the January data is entered, I attempt to append the monthly_transactions array using line 44. However, this just causes the program to break & become unrepsonsive. The code still executes & doesnt return an error, but Python becomes unresponsive & I have to force quite out of it.
I've been writing the the global array monthly_transactions. dateSeperate runs fine as long as I don't include the last else statement. If I do that, monthly_transactions[0] returns an array containing all of the january invoices. the issue arises in my last else statement, which when added, causes Python to freeze.
Can anyone help me shed any light on this?
I have written a program that defines all of the arrays I'm going to be using (yes I know global arrays aren't good. I'm a marketer trying to learn programming so any input you could give me on how to improve this would be much appreciated
import csv
line_items = []
monthly_transactions = []
accounts_seperated = []
Then I import all of my data and place it into the line_items array
def csv_dict_reader(file_obj):
global board_info
reader = csv.DictReader(file_obj, delimiter=',')
for line in reader:
item = []
item.append(line["company id"])
item.append(line["user id"])
item.append(line["Amount"])
item.append(line["Transaction Date"])
item.append(line["FIrst Transaction"])
line_items.append(item)
if __name__ == "__main__":
with open("ChurnTest.csv") as f_obj:
csv_dict_reader(f_obj)
#formats the transacation date data to make it more readable
def dateFormat():
for i in range(len(line_items)):
ddmmyyyy =(line_items[i][3])
yyyymmdd = ddmmyyyy[6:] + "-"+ ddmmyyyy[:2] + "-" + ddmmyyyy[3:5]
line_items[i][3] = yyyymmdd
#Takes the line_items array and splits it into new array monthly_tranactions, where each value holds one month of data
def dateSeperate():
for i in range(len(line_items)):
#if there are no values in the monthly transactions, add the first line item
if len(monthly_transactions) == 0:
test = []
test.append(line_items[i])
monthly_transactions.append(test)
# check to see if the line items year & month match a value already in the monthly_transaction array.
else:
for j in range(len(monthly_transactions)):
line_year = line_items[i][3][:2]
line_month = line_items[i][3][3:5]
array_year = monthly_transactions[j][0][3][:2]
array_month = monthly_transactions[j][0][3][3:5]
#print(line_year, array_year, line_month, array_month)
#If it does, add that line item to that month
if line_year == array_year and line_month == array_month:
monthly_transactions[j].append(line_items[i])
#Otherwise, create a new sub array for that month
else:
monthly_transactions.append(line_items[i])
dateFormat()
dateSeperate()
print(monthly_transactions)
I would really, really appreciate any thoughts or feedback you guys could give me on this code.

Based on the comments on the OP, your csv_dict_reader function seems to do exactly what you want it to do, at least inasmuch as it appends data from its argument csv file to the top-level variable line_items. You said yourself that if you print out line_items, it shows the data that you want.
"But appending doesn't work." I take it you mean that appending the line_items to monthly_transactions isn't being done. The reason for that is that you didn't tell the program to do it! The appending that you're talking about is done as part of your dateSeparate function, however you still need to call the function.
I'm not sure exactly how you want to use your dateFormat and dateSeparate functions, but in order to use them, you need to include them in the main function somehow as calls, i.e. dateFormat() and dateSeparate().
EDIT: You've created the potential for an endless loop in the last else: section, which extends monthly_transactions by 1 if the line/array year/month aren't equal. This is problematic because it's within the loop for j in range(len(monthly_transactions)):. This loop will never get to the end if the length of monthly_transactions is increased by 1 every time through.

Related

how to create an avg, price function to loop over 3 index list with N lines in Python

thanks to stack overflow I was able to get help to finish one of the first steps to my project. First steps were to receive 2 files as txt, merge and sort by date. Currently, I am stuck on trying to loop over a list with 3 fields [Date, Buy/Sell, Quantity]. This part of the program must
Check list for first date and if it is a purchase, add to walletList.
If it is a sale, subtract from the previous WalletList Balance.
Create a function to get daily avg. price.
from typing import List
def file_to_list(file: str) -> list:
list= []
with open(file) as f:
for line in f:
list.append(line.strip().replace(";",","))
return list
def merge_list():
bitcoin = file_to_list("bitcoin.txt")
exchange = file_to_list("exchange.txt")
combined = bitcoin + exchange
sorted_combined = sorted(combined)
return sorted_combined
def getBuyorSell(sorted_combined:List):
walletList = []
#transactionDate = [line.split(",")[0] for line in sorted_combined]
#transactionType = [line.split(",")[1] for line in sorted_combined]
for line in sorted_combined:
if line.split(",")[1] == "buy":
walletList.append(line.split(",")[2])
for line in sorted_combined:
if line.split(",")[1] == "sell":
walletList.remove(line.split(",")[2])
str2int = [eval(i) for i in walletList]
totalWallet = sum(str2int)
print(f"Total of wallet: {totalWallet}")
if __name__ == "__main__":
sorted_combined = merge_list()
show_transactions(sorted_combined)
getBuyorSell(sorted_combined)
Price is fixed # 20,000 for ease.
Will implement a price getter for each day in the future.
bitcoin.txt
2022-08-01;buy;100
2022-08-04;buy;50
2022-08-06;buy;10
exchange.txt
2022-08-02;buy;200
2022-08-03;sell;50
2022-08-05;sell;25
Note: getBuyorSell function is sloppy and not working, I am aware of this. Just showed it for all to be able to see my thought process. My work day is over and the office is closing so I won't be able to work on this till tomorrow figured id post it anyways. if this is not okay, I can come back tomorrow to ask a more detailed and better-formed question with a more complete code. However, if anyone can help in the meantime to help guide me, it would be greatly appreciated. (I know I must use calculations to get the avg. price, just used Remove to show the thought process.)

Python string not clearing correctly in loop

I am wring a script where I need to go through a csv file and find am looking for the first time that specific user logged in, and the last time they logged out. I have loops set up that are working great but when I clear the lists with the time string of their login/logout, I get an Index out of range error. Can anyone spot anything incorrect with this?
#this gets the earliest login time for each agent (but it assumes all dates to be the same!)
with open(inputFile, 'r') as dailyAgentLog:
csv_read = csv.DictReader(dailyAgentLog)
firstLoginTime = []
lastLogoutTime = []
outputLine = []
while x < len(agentName):
for row in csv_read:
if row["Agent"] == agentName[x]:
firstLoginTime.append(datetime.strptime(row["Login Time"], '%I:%M:%S %p'))
lastLogoutTime.append(datetime.strptime(row["Logout Time"], '%I:%M:%S %p'))
firstLoginTime.sort()
lastLogoutTime.sort()
outputLine = [agentName[x], agentLogin[x], agentExtension[x], row["Login Date"], firstLoginTime[0], row["Logout Date"], lastLogoutTime[-1]]
print(f'Agent {agentName[x]} first login was {firstLoginTime[0]} and last logout {lastLogoutTime[-1]}.')
fileLines.append(outputLine)
x += 1
firstLoginTime.clear() #this should be emptying/clearing the list at the end of every iteration
lastLogoutTime.clear()
The problem is that on the 2nd and following iterations, the for row in csv_read: loop doesn't execute, because there's nothing left to read. So you never fill in the firstLoginTime and lastLoginTime lists on subsequent iterations, and indexing them fails.
If the file isn't too large, you can read it into a list before iterating:
csv_read = list(csv.DictReader(dailyAgentLog))
If it's too big to hold in memory, put
dailyAgentLog.seek(0)
at the end of the loop body.
Also, instead of sorting the lists, you can use min() and max():
firstLogin = min(firstLoginTime)
lastLogin = max(lastLoginTime)
And I suggest you use
for x in range(len(agentName)):
rather than while and increment.

Iterating over a csv file given a specific range

So the problem I'm having is that I'm iterating over a pretty large csv file. startDate and endDate are input given to me by the user and I need to only search in that range.
Although, when I run the program up to that point, it takes a long time to just spit back out "set()" at me. I've pointed where I'm having trouble at in the code
looking for suggestions and possibly sample code, thank you all in advance!
def compare(word1, word2, startDate, endDate):
with open('all_words.csv') as allWords:
readWords = csv.reader(allWords, delimiter=',')
year = set()
for row in readWords:
if row[1] in range(int(startDate), int(endDate)): #< Having trouble here
if row[0] == word1:
year.add(row[1])
print(year)
The reason your test isn't finding any years is that the expression:
row[1] in range(int(startDate), int(endDate))
is checking to see if a string value appears in a list of integers. If you test:
"1970" in range(1960, 1980)
you will see that it returns False. You need to write:
int(row[1]) in range(int(startDate), int(endDate))
However, this is still quite inefficient. It is checking if the value int(row[1]) occurs anywhere in the sequence [int(startDate), int(startDate)+1, ..., int(endDate)], and it's doing it by linear search. Much faster will be:
if int(startDate) <= int(row[1]) < int(endDate):
Note that your code above was written to exclude endDate for the list of possible dates (because range excludes its second argument), and I've done the same above.
Edit: Actually, I guess I should point out that it's only Python 2 where an expression like 500000 in range(1, 1000000) is inefficient. In Python 3 (or in Python 2 with xrange in place of range), it's fast.
You can try read_csv function of pandas library. This function allows you to read a desirable amount of data each time. So you can overcome the size problem.
reader = pd.read_csv(file_name, chunksize=chunk_size, iterator=True)
while True:
try:
df = reader.get_chunk(chunk_size)
# select data rows which have desired dates
except:
break
del df

Splitting a list into a file without duplicates

Large data file like this:
133621 652.4 496.7 1993.0 ...
END SAMPLES EVENTS RES 271.0 2215.0 ...
ESACC 935.6 270.6 2215.0 ...
115133 936.7 270.3 2216.0 ...
115137 936.4 270.4 2219.0 ...
115141 936.1 271.0 2220.0 ...
ESACC L 114837 115141 308 938.5 273.3 2200
115145 936.3 271.8 2220.0 ...
END 115146 SAMPLES EVENTS RES 44.11 44.09
SFIX L 133477
133477 650.8 500.0 2013.0 ...
133481 650.2 499.9 2012.0 ...
ESACC 650.0 500.0 2009.0 ...
Want to grab only the ESACC data into trials. When END appears, preceding ESACC data is aggregated into a trial. Right now, I can get the first chunk of ESACC data into a file but because the loop restarts from the beginning of the data, it keeps grabbing only the first chunk so I have 80 trials with the exact same data.
for i in range(num_trials):
with open(fid) as testFile:
for tline in testFile:
if 'END' in tline:
fid_temp_start.close()
fid_temp_end.close() #Close the files
break
elif 'ESACC' in tline:
tline_snap = tline.split()
sac_x_start = tline_snap[4]
sac_y_start = tline_snap[5
sac_x_end = tline_snap[7]
sac_y_end = tline_snap[8]
My question: How to iterate to the next chunk of data without grabbing the previous chunks?
Try rewriting your code something like this:
def data_parse(filepath): #Make it a function
try:
with open(filepath) as testFile:
tline = '' #Initialize tline
while True: #Switch to an infinite while loop (I'll explain why)
while 'ESACC' not in tline: #Skip lines until one containing 'ESACC' is found
tline = next(testFile) #(since it seems like you're doing that anyway)
tline_snap = tline.split()
trial = [tline_snap[4],'','',''] #Initialize list and assign first value
trial[1] = tline_snap[5]
trial[2] = tline_snap[7]
trial[3] = tline_snap[8]
while 'END' not in tline: #Again, seems like you're skipping lines
tline = next(testFile) #so I'll do the same
yield trial #Output list, save function state
except StopIteration:
fid_temp_start.close() #I don't know where these enter the picture
fid_temp_end.close() #but you closed them so I will too
testfile.close()
#Now, initialize a new list and call the function:
trials = list()
for trial in data_parse(fid);
trials.append(trial) #Creates a list of lists
What this creates is a generator function. By using yield instead of return, the function returns a value AND saves its state. The next time you call the function (as you will do repeatedly in the for loop at the end), it picks up where it left off. It starts at the line after the most recently executed yield statement (which in this case restarts the while loop) and, importantly, it remembers the values of any variables (like the value of tline and the point it stopped at in the data file).
When you reach the end of the file (and have thus recorded all of your trials), the next execution of tline = next(testFile) raises a StopIteration error. The try - except structure catches that error and uses it to exit the while loop and close your files. This is why we use an infinite loop; we want to continue looping until that error forces us out.
At the end of the whole thing, your data is stored in trials as a list of lists, where each item equals [sac_x_start, sac_y_start, sac_x_end, sac_y_end], as you defined them in your code, for one trial.
Note: it does seem to me like your code is skipping lines entirely when they don't contain ESACC or END. I've replicated that, but I'm not sure if that's what you want. If you want to get the lines in between, you can rewrite this fairly simply by adding to the 'END' loop as below:
while 'END' not in tline:
tline = next(testFile)
#(put assignment operations to be applied to each line here)
Of course, you'll have to adjust the variable you're using to store this data accordingly.
Edit: Oh dear lord, I just now noticed how old this question is.

Python: How to speed up creating of objects?

I'm creating objects derived from a rather large txt file. My code is working properly but takes a long time to run. This is because the elements I'm looking for in the first place are not ordered and not (necessarily) unique. For example I am looking for a digit-code that might be used twice in the file but could be in the first and the last row. My idea was to check how often a certain code is used...
counter=collections.Counter([l[3] for l in self.body])
...and then loop through the counter. Advance: if a code is only used once you don't have to iterate over the whole file. However You are stuck with a lot of iterations which makes the process really slow.
So my question really is: how can I improve my code? Another idea of course is to oder the data first. But that could take quite long as well.
The crucial part is this method:
def get_pc(self):
counter=collections.Counter([l[3] for l in self.body])
# This returns something like this {'187':'2', '199':'1',...}
pcode = []
#loop through entries of counter
for k,v in counter.iteritems():
i = 0
#find post code in body
for l in self.body:
if i == v:
break
# find fist appearence of key
if l[3] == k:
#first encounter...
if i == 0:
#...so create object
self.pc = CodeCana(k,l[2])
pcode.append(self.pc)
i += 1
# make attributes
self.pc.attr((l[0],l[1]),l[4])
if v <= 1:
break
return pcode
I hope the code explains the problem sufficiently. If not, let me know and I will expand the provided information.
You are looping over body way too many times. Collapse this into one loop, and track the CodeCana items in a dictionary instead:
def get_pc(self):
pcs = dict()
pcode = []
for l in self.body:
pc = pcs.get(l[3])
if pc is None:
pc = pcs[l[3]] = CodeCana(l[3], l[2])
pcode.append(pc)
pc.attr((l[0],l[1]),l[4])
return pcode
Counting all items first then trying to limit looping over body by that many times while still looping over all the different types of items defeats the purpose somewhat...
You may want to consider giving the various indices in l names. You can use tuple unpacking:
for foo, bar, baz, egg, ham in self.body:
pc = pcs.get(egg)
if pc is None:
pc = pcs[egg] = CodeCana(egg, baz)
pcode.append(pc)
pc.attr((foo, bar), ham)
but building body out of a namedtuple-based class would help in code documentation and debugging even more.

Categories

Resources