Python string not clearing correctly in loop - python

I am wring a script where I need to go through a csv file and find am looking for the first time that specific user logged in, and the last time they logged out. I have loops set up that are working great but when I clear the lists with the time string of their login/logout, I get an Index out of range error. Can anyone spot anything incorrect with this?
#this gets the earliest login time for each agent (but it assumes all dates to be the same!)
with open(inputFile, 'r') as dailyAgentLog:
csv_read = csv.DictReader(dailyAgentLog)
firstLoginTime = []
lastLogoutTime = []
outputLine = []
while x < len(agentName):
for row in csv_read:
if row["Agent"] == agentName[x]:
firstLoginTime.append(datetime.strptime(row["Login Time"], '%I:%M:%S %p'))
lastLogoutTime.append(datetime.strptime(row["Logout Time"], '%I:%M:%S %p'))
firstLoginTime.sort()
lastLogoutTime.sort()
outputLine = [agentName[x], agentLogin[x], agentExtension[x], row["Login Date"], firstLoginTime[0], row["Logout Date"], lastLogoutTime[-1]]
print(f'Agent {agentName[x]} first login was {firstLoginTime[0]} and last logout {lastLogoutTime[-1]}.')
fileLines.append(outputLine)
x += 1
firstLoginTime.clear() #this should be emptying/clearing the list at the end of every iteration
lastLogoutTime.clear()

The problem is that on the 2nd and following iterations, the for row in csv_read: loop doesn't execute, because there's nothing left to read. So you never fill in the firstLoginTime and lastLoginTime lists on subsequent iterations, and indexing them fails.
If the file isn't too large, you can read it into a list before iterating:
csv_read = list(csv.DictReader(dailyAgentLog))
If it's too big to hold in memory, put
dailyAgentLog.seek(0)
at the end of the loop body.
Also, instead of sorting the lists, you can use min() and max():
firstLogin = min(firstLoginTime)
lastLogin = max(lastLoginTime)
And I suggest you use
for x in range(len(agentName)):
rather than while and increment.

Related

How to read a text file and convert into a list for use with statistics package in Python

The code I am running so far is as follows
import os
import math
import statistics
def main ():
infile = open('USPopulation.txt', 'r')
values = infile.read()
infile.close()
index = 0
while index < len(values):
values(index) = int(values(index))
index += 1
print(values)
main()
The text file contains 41 rows of numbers each entered on a single line like so:
151868
153982
156393
158956
161884
165069
168088
etc.
My tasks is to create a program which shows average change in population during the time period. The year with the greatest increase in population during the time period. The year with the smallest increase in population (from the previous year) during the time period.
The code will print each of the text files entries on a single line, but upon trying to convert to int for use with the statistics package I am getting the following error:
values(index) = int(values(index))
SyntaxError: can't assign to function call
The values(index) = int(values(index)) line was taken from reading as well as resources on stack overflow.
You can change values = infile.read() to values = list(infile.read())
and it will have it ouput as a list instead of a string.
One of the things that tends to happen whenever reading a file like this is, at the end of every line there is an invisible '\n' that declares a new line within the text file, so an easy way to split it by lines and turn them into integers would be, instead of using values = list(infile.read()) you could use values = values.split('\n') which splits the based off of lines, as long as values was previously declared.
and the while loop that you have can be easily replace with a for loop, where you would use len(values) as the end.
the values(index) = int(values(index)) part is a decent way to do it in a while loop, but whenever in a for loop, you can use values[i] = int(values[i]) to turn them into integers, and then values becomes a list of integers.
How I would personally set it up would be :
import os
import math
import statistics
def main ():
infile = open('USPopulation.txt', 'r')
values = infile.read()
infile.close()
values = values.split('\n') # Splits based off of lines
for i in range(0, len(values)) : # loops the length of values and turns each part of values into integers
values[i] = int(values[i])
changes = []
# Use a for loop to get the changes between each number.
for i in range(0, len(values)-1) : # you put the -1 because there would be an indexing error if you tried to count i+1 while at len(values)
changes.append(values[i+1] - values[i]) # This will get the difference between the current and the next.
print('The max change :', max(changes), 'The minimal change :', min(changes))
#And since there is a 'change' for each element of values, meaning if you print both changes and values, you would get the same number of items.
print('A change of :', max(changes), 'Happened at', values[changes.index(max(changes))]) # changes.index(max(changes)) gets the position of the highest number in changes, and finds what population has the same index (position) as it.
print('A change of :', min(changes), 'Happened at', values[changes.index(min(changes))]) #pretty much the same as above just with minimum
# If you wanted to print the second number, you would do values[changes.index(min(changes)) + 1]
main()
If you need any clarification on anything I did in the code, just ask.
I personally would use numpy for reading a text file.
in your case I would do it like this:
import numpy as np
def main ():
infile = np.loadtxt('USPopulation.txt')
maxpop = np.argmax(infile)
minpop = np.argmin(infile)
print(f'maximum population = {maxpop} and minimum population = {minpop}')
main()

Iterating over a csv file given a specific range

So the problem I'm having is that I'm iterating over a pretty large csv file. startDate and endDate are input given to me by the user and I need to only search in that range.
Although, when I run the program up to that point, it takes a long time to just spit back out "set()" at me. I've pointed where I'm having trouble at in the code
looking for suggestions and possibly sample code, thank you all in advance!
def compare(word1, word2, startDate, endDate):
with open('all_words.csv') as allWords:
readWords = csv.reader(allWords, delimiter=',')
year = set()
for row in readWords:
if row[1] in range(int(startDate), int(endDate)): #< Having trouble here
if row[0] == word1:
year.add(row[1])
print(year)
The reason your test isn't finding any years is that the expression:
row[1] in range(int(startDate), int(endDate))
is checking to see if a string value appears in a list of integers. If you test:
"1970" in range(1960, 1980)
you will see that it returns False. You need to write:
int(row[1]) in range(int(startDate), int(endDate))
However, this is still quite inefficient. It is checking if the value int(row[1]) occurs anywhere in the sequence [int(startDate), int(startDate)+1, ..., int(endDate)], and it's doing it by linear search. Much faster will be:
if int(startDate) <= int(row[1]) < int(endDate):
Note that your code above was written to exclude endDate for the list of possible dates (because range excludes its second argument), and I've done the same above.
Edit: Actually, I guess I should point out that it's only Python 2 where an expression like 500000 in range(1, 1000000) is inefficient. In Python 3 (or in Python 2 with xrange in place of range), it's fast.
You can try read_csv function of pandas library. This function allows you to read a desirable amount of data each time. So you can overcome the size problem.
reader = pd.read_csv(file_name, chunksize=chunk_size, iterator=True)
while True:
try:
df = reader.get_chunk(chunk_size)
# select data rows which have desired dates
except:
break
del df

Splitting a list into a file without duplicates

Large data file like this:
133621 652.4 496.7 1993.0 ...
END SAMPLES EVENTS RES 271.0 2215.0 ...
ESACC 935.6 270.6 2215.0 ...
115133 936.7 270.3 2216.0 ...
115137 936.4 270.4 2219.0 ...
115141 936.1 271.0 2220.0 ...
ESACC L 114837 115141 308 938.5 273.3 2200
115145 936.3 271.8 2220.0 ...
END 115146 SAMPLES EVENTS RES 44.11 44.09
SFIX L 133477
133477 650.8 500.0 2013.0 ...
133481 650.2 499.9 2012.0 ...
ESACC 650.0 500.0 2009.0 ...
Want to grab only the ESACC data into trials. When END appears, preceding ESACC data is aggregated into a trial. Right now, I can get the first chunk of ESACC data into a file but because the loop restarts from the beginning of the data, it keeps grabbing only the first chunk so I have 80 trials with the exact same data.
for i in range(num_trials):
with open(fid) as testFile:
for tline in testFile:
if 'END' in tline:
fid_temp_start.close()
fid_temp_end.close() #Close the files
break
elif 'ESACC' in tline:
tline_snap = tline.split()
sac_x_start = tline_snap[4]
sac_y_start = tline_snap[5
sac_x_end = tline_snap[7]
sac_y_end = tline_snap[8]
My question: How to iterate to the next chunk of data without grabbing the previous chunks?
Try rewriting your code something like this:
def data_parse(filepath): #Make it a function
try:
with open(filepath) as testFile:
tline = '' #Initialize tline
while True: #Switch to an infinite while loop (I'll explain why)
while 'ESACC' not in tline: #Skip lines until one containing 'ESACC' is found
tline = next(testFile) #(since it seems like you're doing that anyway)
tline_snap = tline.split()
trial = [tline_snap[4],'','',''] #Initialize list and assign first value
trial[1] = tline_snap[5]
trial[2] = tline_snap[7]
trial[3] = tline_snap[8]
while 'END' not in tline: #Again, seems like you're skipping lines
tline = next(testFile) #so I'll do the same
yield trial #Output list, save function state
except StopIteration:
fid_temp_start.close() #I don't know where these enter the picture
fid_temp_end.close() #but you closed them so I will too
testfile.close()
#Now, initialize a new list and call the function:
trials = list()
for trial in data_parse(fid);
trials.append(trial) #Creates a list of lists
What this creates is a generator function. By using yield instead of return, the function returns a value AND saves its state. The next time you call the function (as you will do repeatedly in the for loop at the end), it picks up where it left off. It starts at the line after the most recently executed yield statement (which in this case restarts the while loop) and, importantly, it remembers the values of any variables (like the value of tline and the point it stopped at in the data file).
When you reach the end of the file (and have thus recorded all of your trials), the next execution of tline = next(testFile) raises a StopIteration error. The try - except structure catches that error and uses it to exit the while loop and close your files. This is why we use an infinite loop; we want to continue looping until that error forces us out.
At the end of the whole thing, your data is stored in trials as a list of lists, where each item equals [sac_x_start, sac_y_start, sac_x_end, sac_y_end], as you defined them in your code, for one trial.
Note: it does seem to me like your code is skipping lines entirely when they don't contain ESACC or END. I've replicated that, but I'm not sure if that's what you want. If you want to get the lines in between, you can rewrite this fairly simply by adding to the 'END' loop as below:
while 'END' not in tline:
tline = next(testFile)
#(put assignment operations to be applied to each line here)
Of course, you'll have to adjust the variable you're using to store this data accordingly.
Edit: Oh dear lord, I just now noticed how old this question is.

Iterating over list with while and for loop in python - issues

I'm trying to query the Twitter API with a list of names and get their friends list. The API part is fine, but I can't figure out how to go through the first 5 names, pull the results, wait for a while to respect the rate limit, then do it again for the next 5 until the list is over. The bit of the code I'm having trouble is this:
first = 0
last = 5
while last < 15: #while last group of 5 items is lower than number of items in list#
for item in list[first:last]: #parses each n twitter IDs in the list#
results = item
text_file = open("output.txt", "a") #creates empty txt output / change path to desired output#
text_file.write(str(item) + "," + results + "\n") #adds twitter ID, resulting friends list, and a line skip to the txt output#
text_file.close()
first = first + 5 #updates list navigation to move on to next group of 5#
last = last + 5
time.sleep(5) #suspends activities for x seconds to respect rate limit#
Shouldn't this script go through the first 5 items in the list, add them to the output file, then change the first:last argument and loop it until the "last" variable is 15 or higher?
No, because your indentation is wrong. Everything happens inside the for loop, so it'll process one item, then change first and last, then sleep...
Move the last three lines back one indent, so that they line up with the for statement. That way they'll be executed once the first five have been done.
Daniel found the issue, but here are some code improvements suggestions:
first, last = 0, 5
with open("output.txt", "a") as text_file:
while last < 15:
for twitter_ID in twitter_IDs[first:last]:
text_file.write("{0},{0}\n".format(twitter_ID))
first += 5
last += 5
time.sleep(5)
As you can see, I removed the results = item as it seemed redundant, leveraged with open..., also used += for increments.
Can you explain why you where doing item = results?

appending array breaks program

I am writing a program to analyze some of our invoice data. Basically,I need to take an array containing each individual invoice we sent out over the past year & break it down into twelve arrays which contains the invoices for that month using the dateSeperate() function, so that monthly_transactions[0] returns Januaries transactions, monthly_transactions[1] returns Februaries & so forth.
I've managed to get it working so that dateSeperate returns monthly_transactions[0] as the january transactions. However, once all of the January data is entered, I attempt to append the monthly_transactions array using line 44. However, this just causes the program to break & become unrepsonsive. The code still executes & doesnt return an error, but Python becomes unresponsive & I have to force quite out of it.
I've been writing the the global array monthly_transactions. dateSeperate runs fine as long as I don't include the last else statement. If I do that, monthly_transactions[0] returns an array containing all of the january invoices. the issue arises in my last else statement, which when added, causes Python to freeze.
Can anyone help me shed any light on this?
I have written a program that defines all of the arrays I'm going to be using (yes I know global arrays aren't good. I'm a marketer trying to learn programming so any input you could give me on how to improve this would be much appreciated
import csv
line_items = []
monthly_transactions = []
accounts_seperated = []
Then I import all of my data and place it into the line_items array
def csv_dict_reader(file_obj):
global board_info
reader = csv.DictReader(file_obj, delimiter=',')
for line in reader:
item = []
item.append(line["company id"])
item.append(line["user id"])
item.append(line["Amount"])
item.append(line["Transaction Date"])
item.append(line["FIrst Transaction"])
line_items.append(item)
if __name__ == "__main__":
with open("ChurnTest.csv") as f_obj:
csv_dict_reader(f_obj)
#formats the transacation date data to make it more readable
def dateFormat():
for i in range(len(line_items)):
ddmmyyyy =(line_items[i][3])
yyyymmdd = ddmmyyyy[6:] + "-"+ ddmmyyyy[:2] + "-" + ddmmyyyy[3:5]
line_items[i][3] = yyyymmdd
#Takes the line_items array and splits it into new array monthly_tranactions, where each value holds one month of data
def dateSeperate():
for i in range(len(line_items)):
#if there are no values in the monthly transactions, add the first line item
if len(monthly_transactions) == 0:
test = []
test.append(line_items[i])
monthly_transactions.append(test)
# check to see if the line items year & month match a value already in the monthly_transaction array.
else:
for j in range(len(monthly_transactions)):
line_year = line_items[i][3][:2]
line_month = line_items[i][3][3:5]
array_year = monthly_transactions[j][0][3][:2]
array_month = monthly_transactions[j][0][3][3:5]
#print(line_year, array_year, line_month, array_month)
#If it does, add that line item to that month
if line_year == array_year and line_month == array_month:
monthly_transactions[j].append(line_items[i])
#Otherwise, create a new sub array for that month
else:
monthly_transactions.append(line_items[i])
dateFormat()
dateSeperate()
print(monthly_transactions)
I would really, really appreciate any thoughts or feedback you guys could give me on this code.
Based on the comments on the OP, your csv_dict_reader function seems to do exactly what you want it to do, at least inasmuch as it appends data from its argument csv file to the top-level variable line_items. You said yourself that if you print out line_items, it shows the data that you want.
"But appending doesn't work." I take it you mean that appending the line_items to monthly_transactions isn't being done. The reason for that is that you didn't tell the program to do it! The appending that you're talking about is done as part of your dateSeparate function, however you still need to call the function.
I'm not sure exactly how you want to use your dateFormat and dateSeparate functions, but in order to use them, you need to include them in the main function somehow as calls, i.e. dateFormat() and dateSeparate().
EDIT: You've created the potential for an endless loop in the last else: section, which extends monthly_transactions by 1 if the line/array year/month aren't equal. This is problematic because it's within the loop for j in range(len(monthly_transactions)):. This loop will never get to the end if the length of monthly_transactions is increased by 1 every time through.

Categories

Resources