I import data from a CSV file using Python. I want to calculate the mean for every row and column, using only the time variables. The problem is that the values are treated as plain numbers, not as seconds.
How can I declare these variables as time in seconds instead of numeric values?
This is my data:
-----------------------------------------
| Responses | Time 1 | Time 2 | Time 3 |
| abc       | 20     | 30     | 40     |
| bce       | 23     | 25     | 30     |
| cde       | 34     | 40     | 20     |
So, I want to calculate the total time for each response:
df.sum(axis = 1)
abc 90
bce 78
cde 92
df.sum(axis = 0)
Time 1 76
Time 2 95
Time 3 90
But what I actually want is the result in minutes and seconds, like this:
df.sum(axis = 0)
Time 1 1:16
Time 2 1:35
Time 3 1:30
Or it can be "1 minute 16 seconds" or something similar. Does anyone know how to do it?
Your question is not really well defined. You should follow the instructions, as suggested in the comments by jezrael.
As you said "Or it can be 1 minute 16 seconds or something.", I assumed that the output can simply be a string.
If you want the result as:
1:16, use to_minutes_seconds(x)
1 minute 16 seconds, use to_minutes_seconds_text(x)
from datetime import timedelta

import pandas as pd


def to_minutes_seconds(x):
    # x is the current value to process, for example 76
    td = timedelta(seconds=x)
    # split x into 3 variables: hours, minutes and seconds
    # (hours are not used below, so inputs are assumed to be under one hour)
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # return the required format, minutes:seconds, zero-padding the seconds
    return "{}:{:02d}".format(minutes, seconds)


def to_minutes_seconds_text(x):
    td = timedelta(seconds=x)
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # singular only for exactly 1, so that 0 gives "0 minutes"
    m = 'minute' if minutes == 1 else 'minutes'
    s = 'second' if seconds == 1 else 'seconds'
    return "{} {} {} {}".format(minutes, m, seconds, s)


# Create the input DataFrame
df = pd.DataFrame.from_dict({'Responses': [76, 95, 90, 781]})
# Change the total seconds into the required format
df['Time'] = df['Responses'].apply(to_minutes_seconds)
df['Text'] = df['Responses'].apply(to_minutes_seconds_text)
print(df)
Output:
   Responses   Time                 Text
0         76   1:16  1 minute 16 seconds
1         95   1:35  1 minute 35 seconds
2         90   1:30  1 minute 30 seconds
3        781  13:01  13 minutes 1 second
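If the totals should stay real durations rather than strings (the question asks how to declare the values as seconds), pandas can hold them as timedeltas. This is a minimal sketch, assuming the column totals quoted in the question; the names `totals` and `durations` are only illustrative.

```python
import pandas as pd

# Column totals in seconds, as quoted in the question
totals = pd.Series([76, 95, 90], index=["Time 1", "Time 2", "Time 3"])

# Interpret the numbers as seconds; the result has dtype timedelta64[ns]
durations = pd.to_timedelta(totals, unit="s")
print(durations)

# Going back to plain seconds is still possible when needed
seconds_again = durations.dt.total_seconds()
```

Timedelta values can be summed, averaged and compared like numbers, so the string formatting can be left until the values are displayed.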
From the following DataFrame:
import pandas as pd

worktime = 1440
person = [11,22,33,44,55]
begin_date = '2019-10-01'
shift= [1,2,3,1,2]
pause = [90,0,85,70,0]
occu = [60,0,40,20,0]
time_u = [50,40,80,20,0]
time_a = [84.5,0.0,10.5,47.7,0.0]
time_p = 0
time_q = [35.9,69.1,0.0,0.0,84.4]
df = pd.DataFrame({'date':pd.date_range(begin_date, periods=len(person)),
                   'person':person,'shift':shift,'worktime':worktime,
                   'pause':pause,'occu':occu,
                   'time_u':time_u,'time_a':time_a,'time_p ':time_p,'time_q':time_q,})
Output:
        date  person  shift  worktime  pause  occu  time_u  time_a  time_p  time_q
0 2019-10-01      11      1      1440     90    60      50    84.5       0    35.9
1 2019-10-02      22      2      1440      0     0      40     0.0       0    69.1
2 2019-10-03      33      3      1440     85    40      80    10.5       0     0.0
3 2019-10-04      44      1      1440     70    20      20    47.7       0     0.0
4 2019-10-05      55      2      1440      0     0       0     0.0       0    84.4
I am looking for a suitable function that takes the already contained value of the columns and uses it in a calculation and then overwrites it with the result of the calculation.
It concerns the columns time_u, time_a, time_p and time_q and should be applied according to the following principle:
time_u = worktime - pause - occu - (existing value of time_u)
time_a = (new value of time_u) - time_a
time_p = (new value of time_a) - time_p
time_q = (new value of time_p)- time_q
Is there a possible function that could be used here?
Using this formula manually, the output would look like this:
        date  person  shift  worktime  pause  occu  time_u  time_a  time_p  time_q
0 2019-10-01      11      1      1440     90    60    1240  1155.5  1155.5  1119.6
1 2019-10-02      22      2      1440      0     0    1400    1400    1400  1330.9
2 2019-10-03      33      3      1440     85    40    1235  1224.5  1224.5  1224.5
3 2019-10-04      44      1      1440     70    20    1330  1282.3  1282.3  1282.3
4 2019-10-05      55      2      1440      0     0    1440    1440    1440  1355.6
Unfortunately, this task is way beyond my skill level, so any help in setting up the appropriate function would be greatly appreciated.
Many thanks in advance
You can simply apply the relationships you have supplied sequentially. Or are you looking for something else? By the way, you put an extra space at the end of 'time_p'
df['time_u'] = df['worktime'] - df['pause'] - df['occu'] - df['time_u']
df['time_a'] = df['time_u'] - df['time_a']
df['time_p'] = df['time_a'] - df['time_p']
df['time_q'] = df['time_p'] - df['time_q']
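For reference, here is a self-contained sketch of the same sequential update, checked against the expected values from the question. It assumes the trailing space in 'time_p ' is unintended and strips it first; the helper name `update_times` is hypothetical and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "worktime": 1440,
    "pause": [90, 0, 85, 70, 0],
    "occu": [60, 0, 40, 20, 0],
    "time_u": [50, 40, 80, 20, 0],
    "time_a": [84.5, 0.0, 10.5, 47.7, 0.0],
    "time_p ": 0,  # trailing space, as in the question
    "time_q": [35.9, 69.1, 0.0, 0.0, 84.4],
})

# Remove stray whitespace from the column names ('time_p ' -> 'time_p')
df.columns = df.columns.str.strip()


def update_times(frame):
    """Return a copy in which each time column is overwritten in order."""
    out = frame.copy()
    out["time_u"] = out["worktime"] - out["pause"] - out["occu"] - out["time_u"]
    # Each later column is the freshly updated previous column minus itself
    for prev, col in [("time_u", "time_a"), ("time_a", "time_p"), ("time_p", "time_q")]:
        out[col] = out[prev] - out[col]
    return out


result = update_times(df)
print(result)
```

The order of the assignments matters: every step reads the value that the previous step has just written.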
I'm trying to calculate daily returns using the time weighted rate of return formula:
(Ending Value-(Beginning Value + Net Additions)) / (Beginning value + Net Additions)
My DF looks like:
Account #  Date      Balance  Net Additions
1          9/1/2022  100      0
1          9/2/2022  115      10
1          9/3/2022  117      0
2          9/1/2022  50       0
2          9/2/2022  52       0
2          9/3/2022  40       -15
It should look like:
Account #  Date      Balance  Net Additions  Daily TWRR
1          9/1/2022  100      0
1          9/2/2022  115      10             0.04545
1          9/3/2022  117      0              0.01739
2          9/1/2022  50       0
2          9/2/2022  52       0              0.04
2          9/3/2022  40       -15            0.08108
After calculating the daily returns for each account, I want to link all the returns throughout the month to get the monthly return:
((1 + return) * (1 + return)) - 1
The final result should look like:
Account #  Monthly Return
1          0.063636
2          0.12432
Through research (and trial and error), I was able to get the output I am looking for, but as a new Python user, I'm sure there is an easier/better way to accomplish this.
DF["Numerator"] = DF.groupby("Account #")["Balance"].diff() - DF["Net Additions"]
DF["Denominator"] = ((DF["Numerator"] + DF["Net Additions"] - DF["Balance"]) * -1) + DF["Net Additions"]
DF["Daily Returns"] = (DF["Numerator"] / DF["Denominator"]) + 1
DF = DF.groupby("Account #")["Daily Returns"].prod() - 1
Any help is appreciated!
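One possible simplification, offered as a sketch rather than a definitive answer: take the previous balance per account with `shift()` and apply the formula directly. It assumes the rows are already sorted by date within each account; the names `begin`, `base` and `monthly` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Account #": [1, 1, 1, 2, 2, 2],
    "Date": ["9/1/2022", "9/2/2022", "9/3/2022"] * 2,
    "Balance": [100, 115, 117, 50, 52, 40],
    "Net Additions": [0, 10, 0, 0, 0, -15],
})

# Beginning value = previous row's balance within the same account
begin = df.groupby("Account #")["Balance"].shift()
base = begin + df["Net Additions"]

# (Ending Value - (Beginning Value + Net Additions)) / (Beginning Value + Net Additions)
df["Daily TWRR"] = (df["Balance"] - base) / base

# Link the daily returns: product of (1 + r) per account, minus 1.
# The first day of each account is NaN and is skipped by prod().
monthly = (1 + df["Daily TWRR"]).groupby(df["Account #"]).prod() - 1
monthly.name = "Monthly Return"
print(monthly)
```

This avoids the separate numerator and denominator columns while keeping the same arithmetic.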
weekdays = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
I want to print 30 days together with their weekdays. Can somebody tell me how to do that in a for or while loop? Thanks.
The output I want is:
day 0 : Sun
day 1 : Mon
day 2 : Tue
day 3 : Wed
day 4 : Thu
day 5 : Fri
day 6 : Sat
day 7 : Sun
day 8 : Mon
day 9 : Tue
day 10 : Wed
day 11 : Thu
day 12 : Fri
...
...
day 30 :
My code:
day_number = 0
for i in range(0,30):
    print("Day",str(i),weekdays[day_number])
    day_number += 1
Error:
Traceback (most recent call last):
File "tracker.py", line 25, in <module>
print("Day",str(i),weekdays[day_number])
IndexError: list index out of range
monthdays = 30
day_index = 6
for i in range(monthdays):
    day_index = (day_index + 1) % 7
    day = weekdays[day_index]
    print("day", i, day)
day_index starts at 6 so that the first increment wraps around to 0, which is Sunday. You can change it to start on a different weekday.
Also, there is no need to call str(i) inside print; it does that for you.
Your problem is that you are trying to print weekdays[7] when weekdays only has seven elements (i.e. weekdays[0] to weekdays[6]).
There are many ways to solve this problem, but in this case, the simplest is the best.
In your loop, use weekdays[i % len(weekdays)] instead of weekdays[i].
The modulo (mod) operator finds the remainder produced when dividing its first argument by its second. This produces a looping behavior.
n | n % 3 |
--+-------+
0 |   0   |  0 = 0 * 3 + [0]
1 |   1   |  1 = 0 * 3 + [1]
2 |   2   |  2 = 0 * 3 + [2]
3 |   0   |  3 = 1 * 3 + [0]
4 |   1   |  4 = 1 * 3 + [1]
5 |   2   |  5 = 1 * 3 + [2]
6 |   0   |  6 = 2 * 3 + [0]
7 |   1   |  7 = 2 * 3 + [1]
8 |   2   |  8 = 2 * 3 + [2]
9 |   0   |  9 = 3 * 3 + [0]
So when you reach indexes past the length of some_list, an index of i % len(some_list) will loop back around to 0 and let you keep going.
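Applied to the weekday list from the question, the modulo index can be sketched as follows. The helper name `day_labels` is hypothetical, and the label format simply follows the desired output shown in the question.

```python
weekdays = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]


def day_labels(n_days, names=weekdays):
    # i % len(names) wraps the index back to 0 after the last weekday
    return ["day {} : {}".format(i, names[i % len(names)]) for i in range(n_days)]


labels = day_labels(30)
for label in labels:
    print(label)
```

Because the index is computed from i itself, no separate counter has to be incremented or reset.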
I am collecting a lot of data with a microcontroller, including the controller's frequency. I want to use that frequency to calculate the time-steps between measurements. How do I code that in Python?
I have tried using cumsum and other functions, but they do not work in my case. Therefore, I am thinking of using a for-loop, but I can't figure out the logic of the code. Here is how I import my data:
import pandas as pd
from sqlalchemy import create_engine
import datetime as dt
from IPython.display import display

csvfile = 'datafile.csv'
disk_engine = create_engine('sqlite:///sql_data_server.db')
start = dt.datetime.now()
chunksize = 20000
j = 0
index_start = 1
for df in pd.read_csv(csvfile, chunksize=chunksize, iterator=True, encoding='utf-8-sig'):
    df.index += index_start
    j += 1
    print('{} seconds: completed {} rows'.format((dt.datetime.now() - start).seconds, j * chunksize))
    df.to_sql('data', disk_engine, if_exists='append')
    index_start = df.index[-1] + 1

# SELECTED DATA
df = pd.read_sql_query('SELECT Frequency FROM data ', disk_engine)
Frequency = df.Frequency
Output:
   Sample    Ax   Ay    Az    Gx  ...    q1    q2   q3   Temp  Frequency
0       0 -0.04 -0.0 -1.01 -0.03  ... -0.16  0.99  0.0  23.74     330.80
1       1 -0.03 -0.0 -1.01  0.08  ... -0.16  0.99  0.0  23.73     328.52
[2 rows x 13 columns]
0 seconds: completed 20000 rows
1 seconds: completed 40000 rows
2 seconds: completed 60000 rows
2 seconds: completed 80000 rows
3 seconds: completed 100000 rows
4 seconds: completed 120000 rows
5 seconds: completed 140000 rows
6 seconds: completed 160000 rows
6 seconds: completed 180000 rows
7 seconds: completed 200000 rows
8 seconds: completed 220000 rows
9 seconds: completed 240000 rows
9 seconds: completed 260000 rows
10 seconds: completed 280000 rows
11 seconds: completed 300000 rows
12 seconds: completed 320000 rows
13 seconds: completed 340000 rows
14 seconds: completed 360000 rows
14 seconds: completed 380000 rows
15 seconds: completed 400000 rows
16 seconds: completed 420000 rows
17 seconds: completed 440000 rows
17 seconds: completed 460000 rows
I have the frequency measurements (df.Frequency) in a column as int64 and I want to use that data to calculate the time-steps between measurements. I think it is easier to understand if I just show you my thought process. I have carried out the calculations in Excel in the following manner:
Timestep
Row 1: 1/330,8
Row 2: Row 1 + (1/328,52)
Row 3: Row 2 + (1/329,49)
... And so on.
Hope you can help.
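The Excel scheme above is a running sum of 1/Frequency, so one way to sketch it in pandas is to invert the column first and then call cumsum on the result. This assumes the frequency is in Hz and that each row's frequency applies to the step ending at that row; the three sample values are the ones quoted in the question, and `period` and `timestep` are illustrative names.

```python
import pandas as pd

# The three frequencies quoted in the question (Hz assumed)
frequency = pd.Series([330.80, 328.52, 329.49], name="Frequency")

# Duration of each step in seconds: 1 / frequency
period = 1.0 / frequency

# Running total, i.e. Row n = Row n-1 + 1 / frequency_n
timestep = period.cumsum()
timestep.name = "Timestep"
print(timestep)
```

Calling cumsum on the frequency column itself would add up frequencies instead of step durations, which may be why the earlier attempt did not give the expected result.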
I have a dataframe that contains the duration of a trip as text values as shown below in the column driving_duration_text.
print(df)
                            yelp_id driving_duration_text
0  alexander-rubin-photography-napa        1 hour 43 mins
1           jumas-automotive-napa-2        1 hour 32 mins
2     larson-brothers-painting-napa        1 hour 30 mins
3          preferred-limousine-napa        1 hour 32 mins
4          cardon-y-el-tirano-miami        1 day 16 hours
5                  sweet-dogs-miami         1 day 3 hours
As you can see some are written in hours and others in days. How could I convert this format to seconds?
UPDATE:
In [150]: df['seconds'] = (pd.to_timedelta(df['driving_duration_text']
   .....:                                    .str.replace(' ', '')
   .....:                                    .str.replace('mins', 'min'))
   .....:                    .dt.total_seconds())
In [151]: df
Out[151]:
                            yelp_id driving_duration_text   seconds
0  alexander-rubin-photography-napa        1 hour 43 mins    6180.0
1           jumas-automotive-napa-2        1 hour 32 mins    5520.0
2     larson-brothers-painting-napa        1 hour 30 mins    5400.0
3          preferred-limousine-napa        1 hour 32 mins    5520.0
4          cardon-y-el-tirano-miami        1 day 16 hours  144000.0
5                  sweet-dogs-miami         1 day 3 hours   97200.0
OLD answer:
you can do it this way:
from collections import defaultdict
import re

def humantime2seconds(s):
    d = {
        'w': 7*24*60*60,
        'week': 7*24*60*60,
        'weeks': 7*24*60*60,
        'd': 24*60*60,
        'day': 24*60*60,
        'days': 24*60*60,
        'h': 60*60,
        'hr': 60*60,
        'hour': 60*60,
        'hours': 60*60,
        'm': 60,
        'min': 60,
        'mins': 60,
        'minute': 60,
        'minutes': 60
    }
    # unknown units fall back to a multiplier of 1 (plain seconds)
    mult_items = defaultdict(lambda: 1).copy()
    mult_items.update(d)
    # match the leading number and the unit that follows it
    parts = re.search(r'^(\d+)([^\d]*)', s.lower().replace(' ', ''))
    if parts:
        # convert this piece, then recurse on the rest of the string
        return int(parts.group(1)) * mult_items[parts.group(2)] + humantime2seconds(re.sub(r'^(\d+)([^\d]*)', '', s.lower()))
    else:
        return 0

df['seconds'] = df.driving_duration_text.map(humantime2seconds)
Output:
In [64]: df
Out[64]:
                            yelp_id driving_duration_text  seconds
0  alexander-rubin-photography-napa        1 hour 43 mins     6180
1           jumas-automotive-napa-2        1 hour 32 mins     5520
2     larson-brothers-painting-napa        1 hour 30 mins     5400
3          preferred-limousine-napa        1 hour 32 mins     5520
4          cardon-y-el-tirano-miami        1 day 16 hours   144000
5                  sweet-dogs-miami         1 day 3 hours    97200
Given that the text does seem to follow a standardized format, this is relatively straightforward. We need to break the string apart, group it into the relevant pieces, and then process them.
def parse_duration(duration):
    items = duration.split()
    words = items[1::2]   # unit names: 'hour', 'mins', ...
    counts = items[::2]   # the numbers in front of them, still as strings
    seconds = 0
    for i, each in enumerate(words):
        seconds += get_seconds(each, counts[i])
    return seconds

def get_seconds(word, count):
    counts = {
        'second': 1,
        'min': 60,
        'minute': 60,
        'hour': 3600,
        'day': 86400
        # and so on
    }
    # Bit complicated here to handle plurals: try the word without its
    # trailing 's' first, then the word itself; unknown units count as 0
    base = counts.get(word[:-1], counts.get(word, 0))
    # count arrives as a string, so convert it before multiplying
    return base * int(count)
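As a quick check of the split-and-multiply idea against the sample rows, here is a compact, self-contained variant. It is only a sketch: the names `SECONDS_PER_UNIT` and `duration_to_seconds` are hypothetical, and it assumes every duration is written as alternating "number unit" pairs.

```python
SECONDS_PER_UNIT = {
    "sec": 1, "second": 1,
    "min": 60, "minute": 60,
    "hour": 3600,
    "day": 86400,
}


def duration_to_seconds(text):
    tokens = text.split()
    total = 0
    # Pair each number with the unit that follows it
    for count, unit in zip(tokens[::2], tokens[1::2]):
        unit = unit.lower()
        if unit not in SECONDS_PER_UNIT:
            unit = unit.rstrip("s")  # 'mins' -> 'min', 'hours' -> 'hour'
        total += int(count) * SECONDS_PER_UNIT[unit]
    return total


samples = ["1 hour 43 mins", "1 day 16 hours", "1 day 3 hours"]
results = [duration_to_seconds(s) for s in samples]
print(results)
```

An unknown unit raises a KeyError here instead of silently counting as 0, which makes unexpected input easier to spot.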