I am working on a project and am having trouble calculating the average time taken for each type. After reading my CSV file, my output currently looks like this:
#Following this format (x typeList : timeTakenList)
0 Lift : 5 days, 5:39:00
1 Lift : 5 days, 5:31:00
2 lighting : 3 days, 9:47:00
3 ACMV : 5 days, 5:21:00
4 lighting : 3 days, 9:32:00
.
.
.
How do I calculate the average time taken for each type such that I have the following output?
0 Lift : (5 days, 5:31:00 + 5 days, 5:39:00) / 2
1 lighting : (3 days, 9:47:00 + 3 days, 9:32:00) / 2
2 ACMV : 5 days, 5:21:00
.
.
.
The timeTakenList is calculated from subtracting another column, Acknowledged Date from the column CompletedDate.
timeTakenList = completedDate - acknowledgedDate
There are many other types in typeList, and I'm trying to avoid writing an if statement for each one, such as if typeList[x] == "Lift", then add the times together, etc.
Example of .csv file:
I'm not too sure how else to make my question clearer, but any help is much appreciated.
Check the implementation below.
import re
typeList = ["Lighting", "Lighting", "Air-Con", "Air-Con", "Toilet"]
timeTakenList = ["10hours", "5hours", "2days, 5hrs", "5hours", "4hours"]
def convertTime(total_time):
    # strip the unit words, then read "days, hours" (or just hours) as integers
    li = list(map(int, re.sub(r"days|hrs|hours", "", total_time).split(",")))
    if len(li) == 2:
        return li[0] * 24 + li[1]
    return li[0]
def convertDays(total_time):
    # turn a number of hours back into a "days, hrs" string
    days = int(total_time / 24)
    hours = total_time % 24
    if (hours).is_integer():
        hours = int(hours)
    if days == 0:
        return str(hours) + "hours"
    return str(days) + "days, " + str(hours) + "hrs"
def avg(numbers):
    av = float(sum(numbers)) / max(len(numbers), 1)
    return av
avgTime = {}
for i, types in enumerate(typeList):
    # group the converted times by type
    if types in avgTime:
        avgTime[types].append(convertTime(timeTakenList[i]))
    else:
        avgTime[types] = [convertTime(timeTakenList[i])]
for types in avgTime.keys():
    print(types + " : " + convertDays(avg(avgTime[types])))
Algorithm:
1) Strip "hours", "hrs" and "days" from the elements of timeTakenList.
2) Convert elements consisting of days and hours into hours, for easier average calculation.
3) Create a dictionary with the typeList elements as keys and lists of the converted timeTakenList elements as values.
4) Print the average for each key, converted back to days and hours.
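If the durations are still datetime.timedelta objects (as the output in the question suggests) rather than strings, the same grouping idea works without any string parsing. This is only a sketch under that assumption; the function name average_by_type and the sample values are made up for illustration.

```python
from collections import defaultdict
from datetime import timedelta

def average_by_type(type_list, time_taken_list):
    """Return {type: mean timedelta} for two parallel lists."""
    grouped = defaultdict(list)
    for kind, taken in zip(type_list, time_taken_list):
        grouped[kind].append(taken)
    # sum() needs a timedelta start value; timedelta / int is supported
    return {kind: sum(times, timedelta()) / len(times)
            for kind, times in grouped.items()}

# hypothetical sample data mirroring the rows shown in the question
type_list = ["Lift", "Lift", "lighting", "ACMV", "lighting"]
time_taken_list = [
    timedelta(days=5, hours=5, minutes=39),
    timedelta(days=5, hours=5, minutes=31),
    timedelta(days=3, hours=9, minutes=47),
    timedelta(days=5, hours=5, minutes=21),
    timedelta(days=3, hours=9, minutes=32),
]

for kind, mean in average_by_type(type_list, time_taken_list).items():
    print(kind, ":", mean)
```

No per-type if statement is needed, because the dictionary creates a new group the first time a type appears.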
Running this code produces the error message :
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I have 6 years' worth of competitor's results from a 1/2 marathon in one csv file.
The function year_runners aims to create a new column for each year with a difference in finishing time between each runner.
Is there a more efficient way of producing the same result?
Thanks in advance.
Pos  Gun_Time             Chip_Time            Name          Number  Category
1    1900-01-01 01:19:15  1900-01-01 01:19:14  Steve Hodges  324     Senior Male
2    1900-01-01 01:19:35  1900-01-01 01:19:35  Theo Bately   92      Supervet Male
# calculating the time difference between each finisher in a year and adding the result to a new column called time_diff
def year_runners(year, x, y):
    print('Event held in', year)
    # x is the first number (position) for the runner of that year,
    # y is the last number (position) for that year e.g. 2016 event spans from df[246:534]
    time_diff = 0
    for index, row in df.iterrows():
        time_diff = df2015.loc[(x + 1),'Gun_Time'] - df2015.loc[(x),'Chip_Time']
        # using Gun time as the start-time for all.
        # using chip time as finishing time for each runner.
        # work out time difference between the x-placed runner and the runner behind (x + 1)
        df2015.loc[x,'time_diff'] = time_diff  # set the time_diff column to the value of time_diff for
        # each row of x in the dataframe
        print("Runner",(x+1),"time, minus runner" , x,"=",time_diff)
        x += 1
        if x > y:
            break
Hi everyone, this was solved using the shift technique.
youtube.com/watch?v=nZzBj6n_abQ
df2015['shifted_Chip_Time'] = df2015['Chip_Time'].shift(1)
df2015['time_diff'] = df2015['Gun_Time'] - df2015['shifted_Chip_Time']
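To make clear what shift(1) does here, the same pairing can be imitated with plain lists. This is only an illustration, not pandas itself: the function name shifted_differences is made up, None stands in for the missing value pandas puts in the first row, and the times are the two sample rows from the question.

```python
from datetime import datetime, timedelta

def shifted_differences(gun_times, chip_times):
    """Imitate Gun_Time - Chip_Time.shift(1) using plain lists."""
    # shift(1) moves every value down one row; the first row has no predecessor
    shifted = [None] + chip_times[:-1]
    return [None if prev is None else gun - prev
            for gun, prev in zip(gun_times, shifted)]

gun = [datetime(1900, 1, 1, 1, 19, 15), datetime(1900, 1, 1, 1, 19, 35)]
chip = [datetime(1900, 1, 1, 1, 19, 14), datetime(1900, 1, 1, 1, 19, 35)]

print(shifted_differences(gun, chip))
```

Note that the difference is stored on the row of the runner behind, whereas the original loop stored it on the row of the runner in front.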
I want to write a program that takes your weekly schedule and stores it in a list.
The schedule covers seven days a week, and the user enters what is on their schedule for each day.
My problem: I'm using a for loop inside a for loop to show the day and store the user's schedule data. In the second for loop I can't assign the data to each day's list; only one list gets assigned.
When I assign the data, it is saved only in the first list, not in the other six.
In this code, schedule is a list that contains seven other lists named a to g, like this: schedule=[a,...,g]. What should I do?
for day in week:
    for data in schedule:
        while True:
            b = input("Your schedule for " + day + " ? ")
            if b == 'done':
                break
            data.append(b)
        break
for data in schedule:
    print(data)
    print(40 * '=')
It behaves like a multiplication table, but what I want is to assign each piece of data to its own day: only 0*0, then 1*1, then 2*2 and so on, as in the code below. With my code, all of the data is assigned to one day.
for i in range(7):
    print(i, "*", i, " = ", i * i)
Prints:
0 * 0 = 0
1 * 1 = 1
2 * 2 = 4
3 * 3 = 9
4 * 4 = 16
5 * 5 = 25
6 * 6 = 36
I don't think you really need two for loops and one while loop.
Refer to the code below:
schedule = []
week = ['sun','mon','tue','wed','thu','fri','sat']
for day in week:
    while True:
        b = input("Your schedule for " + day + " ? ")
        if b == 'done':
            break
        schedule.append(b)
for data in schedule:
    print(data)
    print(40 * '=')
You should consider using a dictionary in the following structure to store the data accurately and present it in a pleasing manner later.
schedule = {
'Mon':['task1', 'task2', ....],
'Tue': ['task1', 'task2', 'task3', ....]
.
.
.
}
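A minimal sketch of how such a dictionary could be filled, assuming the same 'done' sentinel as in the question. The names read_schedule, show_schedule and the ask parameter are made up for illustration; ask defaults to input but can be replaced by scripted answers, as in the demo.

```python
def read_schedule(week, ask=input):
    """Collect tasks for each day until 'done' is entered; returns {day: [tasks]}."""
    schedule = {}
    for day in week:
        tasks = []
        while True:
            entry = ask("Your schedule for " + day + " ? ")
            if entry == 'done':
                break
            tasks.append(entry)
        schedule[day] = tasks  # one list per day, keyed by the day name
    return schedule

def show_schedule(schedule):
    for day, tasks in schedule.items():
        print(day + ": " + ", ".join(tasks))

# scripted answers stand in for keyboard input in this demo
answers = iter(["gym", "done", "done", "class", "lab", "done"])
demo = read_schedule(["Mon", "Tue", "Wed"], ask=lambda prompt: next(answers))
show_schedule(demo)
```

In interactive use, calling read_schedule(week) without the ask argument prompts the user as in the code above.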
[This question has changed a lot, so this answer references old material]
If you could post more information, that would be great. I'm not sure what you mean by for loops working in parallel; this is more like a nested for loop. Reading up on nested loops might help.
You have three loops nested inside each other. You probably only need two, and then you can remove a few of the breaks. You need to loop through the days of the week and, for each day, loop to collect that day's schedule.
Your code could look like this.
week = ["Mon","Tues","Wed","Thur","Fri"]
schedule = []  # this is a two-dimensional list: one inner list of events per day
for day in week:
    # loop through each day of the week
    data = []
    while True:
        # ask for the schedule
        b = input("Your schedule for " + day + " ? ")
        if b == 'done':
            break  # done
        data.append(b)
    schedule.append(data)  # add to day
# print result
for data in schedule:
    print(data)
    print(40 * '=')
I have a data set that pulls in 24 months of data, counted from the day the script is run, using Python. I group it by quarters with the code below.
def quarter_selection(input_date):
    curr_mnth = input_date.month
    if curr_mnth % 3 == 1:
        end_qrtr = input_date.quarter - 2
    else:
        end_qrtr = input_date.quarter
    return end_qrtr
def quarter_helper(date_col):
    return 'Q' + str(pd.to_datetime(date_col).quarter) + ' ' + str(pd.to_datetime(date_col).year)
I would now like to modify the code below to pull quarterly data with one condition. I have attempted to do this by creating a new list, but now I'm stuck. The code below produces the quarters of a single hard-coded year. Is there any way to replace "2017" with a condition that excludes the current quarter and pulls the previous quarter?
New_list = ["Q" + str(x) + " 2017" for x in range(1, 5)]
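One possible way to build such a list without hard-coding the year is to count backwards from the current quarter. This is only a sketch: the function name completed_quarters and its count parameter are made up here, and the labels follow the 'Q<n> <year>' format produced by quarter_helper.

```python
from datetime import date

def completed_quarters(today, count):
    """Labels for the `count` quarters before today's quarter, oldest first."""
    quarter = (today.month - 1) // 3 + 1  # quarter that contains `today`
    year = today.year
    labels = []
    for _ in range(count):
        quarter -= 1                      # step back, skipping the current quarter
        if quarter == 0:
            quarter, year = 4, year - 1   # wrap around to Q4 of the previous year
        labels.append("Q" + str(quarter) + " " + str(year))
    return labels[::-1]

print(completed_quarters(date(2018, 2, 15), 4))
# ['Q1 2017', 'Q2 2017', 'Q3 2017', 'Q4 2017']
```

With 24 months of data, completed_quarters(date.today(), 8) would give the eight most recent completed quarters.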
I have a huge CSV file (524 MB; Notepad takes 4 minutes to open it) whose formatting I need to change. Now it's like this:
1315922016 5.800000000000 1.000000000000
1315922024 5.830000000000 3.000000000000
1315922029 5.900000000000 1.000000000000
1315922034 6.000000000000 20.000000000000
1315924373 5.950000000000 12.452100000000
The lines are divided by a newline symbol, when I paste it into Excel it divides into lines. I would've done it by using Excel functions but the file is too big to be opened.
The first value is the number of seconds since 01-01-1970, the second is the price, and the third is the volume.
I need it to be like this:
01-01-2009 13:55:59 5.800000000000 1.000000000000 01-01-2009 13:56:00 5.830000000000 3.000000000000
etc.
Records need to be divided by a space. Sometimes there are multiple values of price from the same second like this:
1328031552 6.100000000000 2.000000000000
1328031553 6.110000000000 0.342951630000
1328031553 6.110000000000 0.527604200000
1328031553 6.110000000000 0.876088370000
1328031553 6.110000000000 0.971026920000
1328031553 6.100000000000 0.965781090000
1328031589 6.150000000000 0.918752490000
1328031589 6.150000000000 0.940974100000
When this happens, I need the code to take the average price for that second and save just one price for each second.
These are bitcoin transactions, which didn't happen every second when BTC started.
When there is no record for some second, a new record needs to be created for that second, with the price and volume copied from the last known record.
Then save everything to a new txt file.
I can't seem to do it, I've been trying to write a converter in python for hours, please help.
shlex is a lexical parser. We use it to pick the numbers from the input one at a time. The function Records groups these into lists where the first element of the list is an integer and the other elements are floating-point numbers.
The loop reads the results of records and averages on times as necessary. It also prints two outputs to a line.
from shlex import shlex
lexer = shlex(instream=open('temp.txt'), posix=False)
lexer.wordchars = r'0123456789.\n'
lexer.whitespace = ' \n'
lexer.whitespace_split = True
import time
def Records():
    record = []
    while True:
        token = lexer.get_token()
        if token:
            token = token.strip()
            if token:
                record.append(token)
                if len(record)==3:
                    record[0] = int(record[0])
                    record[1] = float(record[1])
                    record[2] = float(record[2])
                    yield record
                    record=[]
            else:
                break
        else:
            break
def conv_time(t):
    return time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(t))
records = Records()
pos = 1
current_date, price, volume = next(records)
price_sum = price
volume_sum = volume
count = 1
for raw_date, price, volume in records:
    if raw_date == current_date:
        price_sum += price
        volume_sum += volume
        count += 1
    else:
        print (conv_time(current_date), price_sum/count, volume_sum/count, end=' ' if pos else '\n')
        pos = (pos+1)%2
        current_date = raw_date
        price_sum = price
        volume_sum = volume
        count = 1
print (conv_time(current_date), price_sum/count, volume_sum/count, end=' ' if pos else '\n')
Here are the results. You might need to do something about the number of significant digits to the right of the decimal point.
2011-09-13 09:53:36 5.8 1.0 2011-09-13 09:53:44 5.83 3.0
2011-09-13 09:53:49 5.9 1.0 2011-09-13 09:53:54 6.0 20.0
2011-09-13 10:32:53 5.95 12.4521 2012-01-31 12:39:12 6.1 2.0
2012-01-31 12:39:13 6.108 0.736690442 2012-01-31 12:39:49 6.15 0.9298632950000001
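The output above averages records that share a second but does not create records for the missing seconds. A minimal sketch of that extra step follows, assuming the input is sorted by timestamp and that the volume is averaged in the same way as the price. The function names parse, per_second and format_record are made up for illustration, and the timestamps are rendered in UTC, which may not be the time zone you want.

```python
from datetime import datetime, timezone
from itertools import groupby

def parse(lines):
    """Yield (second, price, volume) tuples from 'timestamp price volume' lines."""
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            yield int(parts[0]), float(parts[1]), float(parts[2])

def per_second(records):
    """Average each second's records and repeat the last values for missing seconds."""
    last = None
    for second, group in groupby(records, key=lambda rec: rec[0]):
        rows = list(group)
        price = sum(row[1] for row in rows) / len(rows)
        volume = sum(row[2] for row in rows) / len(rows)
        if last is not None:
            # fill the gap with copies of the last known price and volume
            for missing in range(last[0] + 1, second):
                yield missing, last[1], last[2]
        last = (second, price, volume)
        yield last

def format_record(second, price, volume):
    stamp = datetime.fromtimestamp(second, tz=timezone.utc)
    return "{} {:.12f} {:.12f}".format(stamp.strftime("%d-%m-%Y %H:%M:%S"), price, volume)

# small made-up sample: two trades in one second, then a two-second gap
sample = [
    "1328031552 6.100000000000 2.000000000000",
    "1328031553 6.110000000000 1.000000000000",
    "1328031553 6.090000000000 3.000000000000",
    "1328031556 6.150000000000 1.000000000000",
]
print(" ".join(format_record(*rec) for rec in per_second(parse(sample))))
```

Because parse and per_second are generators, an open file object can be passed to parse and the results written out one record at a time, so the 524 MB file never has to fit in memory. Keep in mind that emitting a record for every second can make the output far larger than the input when trades are hours apart.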
1) Reading a single line from a file
from datetime import datetime
data = {}
with open("<path to file>") as fh:
    for line in fh:
        values = line.split()
        for n in range(0, len(values), 3):
            dt, price, volumen = values[n:n+3]
2) Checking if it's the next second after the last record's
If so, adding the price and volumen values to a variable and increasing a counter for later use in calculating the average
3) If the second is not the next second, copy values of last price and volumen.
            # group every (price, volumen) pair under its timestamp
            if dt not in data:
                data[dt] = []
            data[dt].append((float(price), float(volumen)))
4) Divide timestamps like "1328031552" into seconds, minutes, hours, days, months, years.
Somehow take care of leap years.
for dt in data:
    # datetime.fromtimestamp does the calendar arithmetic (leap years included), in local time
    stamp = datetime.fromtimestamp(int(dt)).strftime('%d-%m-%Y %H:%M:%S')
    # ... for later use in calculating the average
    p_sum = v_sum = 0.0
    for p, v in data[dt]:
        p_sum += p
        v_sum += v
    n = len(data[dt])
    price = p_sum / n
    volumen = v_sum / n
5) Arrange values in the 01-01-2009 13:55:59 1586.12 220000 order
6) Add the record to the end of the new database file.
    print(stamp, price, volumen)
I have monthly returns and want to calculate the annualized return for each of two groups. Below is the sample data.
Return_M Rise
0.097425 1
0.188547 1
-0.1509 1
0.28011 1
-0.09596 1
0.041459 1
0.106838 1
0.046581 0
-0.16068 0
0.009242 0
0.006104 0
-0.00709 0
0.050352 0
-0.01023 0
-0.00731 0
0.031946 0
0.048552 0
This is what I tried, but the code actually uses the length of df1 rather than the length of each group. I am hoping for a method that can be applied broadly.
df2 = df1.groupby(['Rise'])[['Return_M']].apply(lambda x:np.prod(1+x)**(12/len(x)))
This is the expected output:
Rise Return_M
1 0.249862
0 -0.00443
You only have to groupby on Rise column and aggregate on the Return_M column.
The following snippet assumes you want to divide by 12 (based on your question)
df2 = df1.groupby('Rise').agg({'Return_M': 'sum'}).reset_index()
df2['avg'] = df2['Return_M']/12
df2[['Rise', 'avg']]
But if you need the average based on however many records you have for each group of Rise, you can simply do:
df2 = df1.groupby('Rise').agg({'Return_M': 'mean'})
EDIT: Editing the answer based on OP's comment:
To get the geometric annualized return as per your formula, the following will work:
df1.groupby('Rise').Return_M.apply(lambda x: (1+x).product() ** (12/float(len(x))))
However, the output is different from the expected output you posted in your question:
Rise
0 0.986765
1 1.952498
This however is exactly the correct output as per the formula you described.
I did this calculation manually too, for Rise = 1:
I took the product of each (1 plus Return_M) value
Raised the product to (12 divided by length of the group, which is 7 for this group).
(1 + 0.097425) * (1 + 0.188547) * (1 + -0.1509) * (1 + 0.28011) * (1 + -0.09596)* (1 + 0.041459)* (1 + 0.106838) = 1.4774446702
1.4774446702 ^ (12/7) = 1.9524983367
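The manual check can also be reproduced in plain Python, independently of pandas. This is just a sketch: the function name annualized_return is made up here, the values are the seven Rise = 1 rows from the question, and math.prod needs Python 3.8 or later.

```python
import math

def annualized_return(monthly_returns):
    """Compound (1 + r) over the months, then raise to 12 / number of months."""
    growth = math.prod(1 + r for r in monthly_returns)
    return growth ** (12 / len(monthly_returns))

rise_1 = [0.097425, 0.188547, -0.1509, 0.28011, -0.09596, 0.041459, 0.106838]
print(annualized_return(rise_1))
```

The result is a growth factor; subtracting 1 turns it into a rate of return.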
So just check if your logic is correct. Please mark this answer as accepted if it solves your problem.