Sum total hours and minutes for each userprofileid with Python - python

I want to get the sum of totalhours and totalminutes for each userprofileid I have.
For example:
userprofileid totalhours totalminutes
453 7.0 420
120 7.5 450
453 8.0 480
I can't drop userprofileid, because each id has its own hours and minutes.
I tried this, but it computes the grand total of hours and minutes and adds it to every row:
for user in clocking_left["userprofileid"]:
    clocking_left["user_minutes_total"] = clocking_left["totalminutes"].sum()
    clocking_left["user_hours_total"] = clocking_left["hours"].sum()

You can use groupby and sum the values within each group:
import pandas as pd
data = {'userprofileid': [453,120,453],
'totalhours': [7.0,7.5,8],
'totalminutes': [420,450,480]
}
df = pd.DataFrame(data, columns = ['userprofileid','totalhours','totalminutes'])
df_new = df.groupby('userprofileid').sum().reset_index()
print(df_new.to_string(index=False))
Output:
userprofileid totalhours totalminutes
120 7.5 450
453 15.0 900
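If the per-user totals should instead sit next to every original row (which is what the loop in the question seems to be aiming for), a minimal sketch using groupby with transform could look like this. The column names user_hours_total and user_minutes_total are taken from the question's attempt; the rest mirrors the sample data above.

```python
import pandas as pd

df = pd.DataFrame({
    "userprofileid": [453, 120, 453],
    "totalhours": [7.0, 7.5, 8.0],
    "totalminutes": [420, 450, 480],
})

# transform("sum") returns one value per original row, so the result
# lines up with df and no rows are collapsed.
totals = df.groupby("userprofileid")[["totalhours", "totalminutes"]].transform("sum")
df["user_hours_total"] = totals["totalhours"]
df["user_minutes_total"] = totals["totalminutes"]

print(df.to_string(index=False))
```

Each row keeps its own hours and minutes, and both rows for user 453 carry the same per-user total.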

Related

Using Python, I want the number of days in the month inside a function for a pandas DataFrame

I have a DataFrame and need to implement the calculation below.
I will run this script every month, so it should automatically pick the number of days based on the extracted date.
Input Dataframe
client_id expo_value value cal_value extracted_date
1 126 30 27.06 08/2022
2 135 60 36.18 08/2022
3 144 120 45 08/2022
4 162 30 54.09 08/2022
5 153 90 63.63 08/2022
6 181 120 72.9 08/2022
Expected output Dataframe
client_id expo_value value cal_value extracted_date Output_Value
1 126 30 27.06 08/2022 126+26.18 = 152.18
2 135 60 36.18 08/2022 261.29+70.02 = 331.31
3 144 120 45 08/2022 557.4+174.19 = 731.59
4 162 30 54.09 08/2022 156.7+ 52.34 = 209.04
5 153 90 63.63 08/2022 444.19+ 182.9 =627.09
6 181 120 72.9 08/2022 700.64+282.19=982.83
I want to use 31, 30 or 28 days inside the function below. I tried entering the number 31 (days) manually for the calculation, but it should be picked automatically based on how many days the month has.
def month_data(data):
    if (data['value'] <=30).any():
        return data['expo_value'] *30/ 31(days) + data['cal_value'] * 45/ 31(days)
    elif (data['value'] <=60).any():
        return data['expo_value'] *60/ 31(days) + data['cal_value'] * 90/31(days)
    elif (data['value'] <=90).any():
        return data['expo_value'] *100/31(days) + data['cal_value'] * 120/ 31(days)
    else (data['value'] <=120).any():
        return np.nan
Let me see if I understood you correctly. I tried to reproduce a small subset of your dataframe (you should do this next time you post something). The answer is as follows:
import pandas as pd
from datetime import datetime
import calendar
# I'll make a subset dataframe based on your example
data = [[30, '02/2022'], [60, '08/2022']]
df = pd.DataFrame(data, columns=['value', 'extracted_date'])
# First, turn the extracted_date column into a correct date format
date_correct_format = [datetime.strptime(i, '%m/%Y') for i in df['extracted_date']]
# Second, calculate the number of days per month
num_days = [calendar.monthrange(i.year, i.month)[1] for i in date_correct_format]
num_days
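To connect the day count back to the calculation in the question, here is a rough sketch that applies the thresholds row by row with np.select instead of .any(). The multipliers are copied from the question's function; the row-wise interpretation and the output_value column name are assumptions, so the numbers are not guaranteed to match the expected table in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "expo_value": [126, 135, 144],
    "value": [30, 60, 120],
    "cal_value": [27.06, 36.18, 45.0],
    "extracted_date": ["08/2022", "08/2022", "02/2022"],
})

# Number of days in the month of each extracted_date (31, 30, 28 or 29)
days = pd.to_datetime(df["extracted_date"], format="%m/%Y").dt.days_in_month

# np.select picks, per row, the first choice whose condition is True
conditions = [df["value"] <= 30, df["value"] <= 60, df["value"] <= 90]
choices = [
    df["expo_value"] * 30 / days + df["cal_value"] * 45 / days,
    df["expo_value"] * 60 / days + df["cal_value"] * 90 / days,
    df["expo_value"] * 100 / days + df["cal_value"] * 120 / days,
]
df["output_value"] = np.select(conditions, choices, default=np.nan)
print(df)
```

Rows with a value above 90 fall through to the default and receive NaN, as in the last branch of the question's function.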

Dataset in pandas: increase offset if next value is smaller than previous one

I initially have the following dataset:
value;date
100;2021-01-01
160;2021-02-01
250;2021-02-15
10;2021-03-01
90;2021-04-01
150;2021-04-15
350;2021-06-01
20;2021-07-01
100;2021-08-01
10;2021-08-10
Whenever the number in the "value" column drops (e.g. from 250 to 10 on 2021-03-01), I want to save the old value as an offset.
When the value drops again (e.g. from 350 to 20 on 2021-07-01) I want to add the new offset to the old one (350 + 250).
Afterwards I want to add the offsets with the values, so that I get the following DataSet at the end:
value;date;offset;corrected_value
100;2021-01-01;0;100
160;2021-02-01;0;160
250;2021-02-15;0;250
10;2021-03-01;250;260
90;2021-04-01;250;340
150;2021-04-15;250;400
350;2021-06-01;250;600
20;2021-07-01;600;620
100;2021-08-01;600;700
10;2021-08-10;700;710
My current (terrible) approach, which is not working:
df['date'] = pd.to_datetime(df['date'])
df.index = df['date']
del df['date']
df.drop_duplicates(keep='first')
df['previous'] = df['value'].shift(1)
def pn(current, previous, offset):
    if not pd.isna(previous):
        if current < previous:
            return previous + offset
    return offset
df['offset'] = 0
df['offset'] = df.apply(lambda row: pn(row['value'], row['previous'], row['offset']), axis=1)
Your help is so much appreciated, thank you!
Cheers
Find the desired positions in column 'value' with pd.Series.diff and pd.Series.shift. Fill the remaining rows with 0 and compute the cumsum. Then add the 'offset' column to 'value':
df['offset'] = df.value[df.value.diff().lt(0).shift(-1, fill_value=False)]
df['offset'] = df.offset.shift(1).fillna(0).cumsum().astype('int')
df['correct_value'] = df.offset + df.value
df
Output
value date offset correct_value
0 100 2021-01-01 0 100
1 160 2021-02-01 0 160
2 250 2021-02-15 0 250
3 10 2021-03-01 250 260
4 90 2021-04-01 250 340
5 150 2021-04-15 250 400
6 350 2021-06-01 250 600
7 20 2021-07-01 600 620
8 100 2021-08-01 600 700
9 10 2021-08-10 700 710
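The same idea can also be sketched without the intermediate selection step, by keeping the previous value only where a drop occurs and accumulating it. This is an alternative formulation, not the answer's exact code; it uses the column name corrected_value from the question.

```python
import pandas as pd

df = pd.DataFrame({"value": [100, 160, 250, 10, 90, 150, 350, 20, 100, 10]})

prev = df["value"].shift(1)     # value of the previous row (NaN for the first row)
dropped = df["value"] < prev    # True where the value fell; NaN comparisons are False

# Keep the previous value only on rows where a drop happened, 0 elsewhere,
# then accumulate so every later row carries the running offset.
df["offset"] = prev.where(dropped, 0).cumsum().astype(int)
df["corrected_value"] = df["value"] + df["offset"]
print(df)
```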

Assign random date and time slot to user from table

I am currently building a vaccination appointment program for college and I am trying to write the code to randomly assign a date that ranges anywhere from 1/1/2022-31/12/2022, alongside a time slot ranging from 8am-5pm. Each hour will have 100 slots. Every time a user is assigned a slot, 1 from the assigned slot will be deducted. I tried doing this with a table i built using pandas, but I didn't get very far. Any help would be greatly appreciated, thank you.
Here's my code for the table using pandas (in case it will be helpful):
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
print(df)
What I would do is start by including the leading zero at the beginning of the hour for each column name. It's easier to extract '08:00' from a pandas Timestamp than '8:00'.
df['08:00'] = 100
df['09:00'] = 100
Then you can set the index to your 'Date/Time' column and use .loc to locate an appointment slot by the date in the row and the hour (rounded down) in the columns, and subtract 1 from the number of appointments at that slot. For example:
import pandas as pd
df.set_index('Date/Time', inplace=True)
user1_datetime = pd.to_datetime("2022-01-02 08:30")
user1_day = user1_datetime.strftime('%Y-%m-%d')
user1_time = user1_datetime.floor("h").strftime('%H:%M')
df.loc[user1_day, user1_time] -= 1
Result:
>>> df
08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00
Date/Time
2022-01-01 100 100 100 100 100 100 100 100 100 100
2022-01-02 99 100 100 100 100 100 100 100 100 100
2022-01-03 100 100 100 100 100 100 100 100 100 100
2022-01-04 100 100 100 100 100 100 100 100 100 100
2022-01-05 100 100 100 100 100 100 100 100 100 100
... ... ... ... ... ... ... ... ... ... ...
2022-12-27 100 100 100 100 100 100 100 100 100 100
2022-12-28 100 100 100 100 100 100 100 100 100 100
2022-12-29 100 100 100 100 100 100 100 100 100 100
2022-12-30 100 100 100 100 100 100 100 100 100 100
2022-12-31 100 100 100 100 100 100 100 100 100 100
To scale up, you can easily wrap this in a function that takes a list of datetimes for multiple people, and checks that the person isn't making an appointment in an hour slot with 0 remaining appointments.
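A minimal sketch of such a function is shown below. The function name book and the variable slots are hypothetical, and the table is rebuilt here with zero-padded hour columns so that the block runs on its own.

```python
import pandas as pd

# Rebuild the slot table: one row per day, one column per hour, 100 slots each
hours = [f"{h:02d}:00" for h in range(8, 18)]
dates = pd.date_range(start="2022-01-01", end="2022-12-31")
slots = pd.DataFrame(100, index=dates, columns=hours)

def book(slots, requested):
    """Book each requested datetime if possible; return those that were rejected."""
    rejected = []
    for ts in pd.to_datetime(requested):
        day = ts.normalize()                     # midnight of that date
        hour = ts.floor("h").strftime("%H:%M")   # hour rounded down, e.g. '08:00'
        known = day in slots.index and hour in slots.columns
        if known and slots.at[day, hour] > 0:
            slots.at[day, hour] -= 1
        else:
            rejected.append(ts)
    return rejected

rejected = book(slots, ["2022-01-02 08:30", "2022-01-02 19:15"])
print(rejected)
```

Requests outside the 08:00 to 17:00 columns, outside the date range, or for a slot with no remaining appointments end up in the returned list instead of raising an error.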
Thank you Derek, I finally managed to think of a way to do it, and I couldn't have done it without your help. Here's my code:
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv', index=False)
And this code randomly picks a date and an hour from that date:
import random
import pandas
# Column names of the hourly slots
hours = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
# Randomly pick a row index (the date) and a column (the hour) for the user
random_date = random.randrange(365)
hour = random.choice(hours)
df = pandas.read_csv('new.csv')
date = df.loc[random_date, 'Date/Time']
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Rich\PycharmProjects\pythonProject\new.csv', index=False)
print(date)
print(hour)
I haven't found a way for the program to check whether the number of slots is > 0 before choosing one though.
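One possible way to add that check is to draw only from the slots that still have capacity, rather than drawing first and checking afterwards. The sketch below assumes the table is indexed by date with one column per hour; the function name assign_random_slot is hypothetical, and the CSV reading and writing is left out to keep it self-contained.

```python
import random
import pandas as pd

hours = [f"{h}:00" for h in range(8, 18)]
dates = pd.date_range(start="2022-01-01", end="2022-12-31")
slots = pd.DataFrame(100, index=dates, columns=hours)

def assign_random_slot(slots):
    """Pick a random (date, hour) with remaining capacity and deduct 1 from it."""
    remaining = slots.stack()              # Series indexed by (date, hour)
    remaining = remaining[remaining > 0]   # keep only slots that are not full
    if remaining.empty:
        return None                        # everything is booked
    day, hour = random.choice(list(remaining.index))
    slots.at[day, hour] -= 1
    return day, hour

print(assign_random_slot(slots))
```

Because full slots are filtered out before the random choice, a slot can never go below zero, and the function returns None once no capacity is left.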

Perform a cross column calculation in Python

Context
I am trying to build a portfolio dashboard following this example, only instead of Excel, I am using Python. I am currently not sure how to conduct from 3:47 onwards, cross calculating to arrive at the next period balance.
Problem
Is there a way to conduct this in Python? I tried a for loop but it returned the same number iterated over the number of forward periods. Below is the example:
date_range = pd.date_range(start=today, periods=period_of_investments, freq=contribution_periods)
returns_port = 12
rs = []
balance_total = []
for one in range(len(date_range)):
    return_loss = (returns_port/period_of_investments)*capital_insert
    rs.append(return_loss)
    period_one_balance = capital_insert+return_loss
    period_two_return_loss = (returns_port/period_of_investments)*(period_one_balance + capital_insert)
    period_two_balance = period_one_balance + capital_insert + period_two_return_loss
    balance_total.append(period_two_balance)
I did not watch the video, but I will explain how to write Python code for the following problem, which is similar to the one in the video.
Suppose you want to calculate the return on investment of a fixed monthly deposit over the next 20 years with a fixed interest rate.
The first step is understanding how pd.date_range() works. If you started at the beginning of this month, the whole period would be pd.date_range(start='4-1-2021', periods=240, freq='1m') (240 comes from 20 years of 12 months each). Basically, we are calculating the return at the end of each month.
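As a quick illustration of what such a range contains, the sketch below builds 240 month-end dates. It passes an offset object, pd.offsets.MonthEnd(), rather than the '1m' alias string, on the assumption that alias spellings may differ between pandas versions; the dates produced are month ends either way.

```python
import pandas as pd

# 240 month-end timestamps, starting with the first month end on or after the start date
dates = pd.date_range(start="2021-04-01", periods=240, freq=pd.offsets.MonthEnd())

print(len(dates))   # number of periods
print(dates[0])     # first month end
print(dates[-1])    # last month end, 20 years of monthly periods later
```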
import pandas as pd
portfolio = pd.DataFrame(columns=['Date', 'Investment', 'Return/Loss', 'Balance'])
interest_rate = 0.121
monthly_deposit = 500
dates = pd.date_range(start="3-31-2021", periods=240, freq='1m')
investment = [monthly_deposit]*len(dates)
return_losses = []
balances = []
current_balance = 500
for date in dates:
    current_return_loss = (interest_rate/12)*current_balance
    return_losses.append(round(current_return_loss,2))
    balances.append(round(current_balance + current_return_loss))
    current_balance += (current_return_loss + monthly_deposit)
portfolio['Date'] = pd.to_datetime(dates)
portfolio['Investment'] = investment
portfolio['Return/Loss'] = return_losses
portfolio['Balance'] = balances
balance_at_end = balances[-1]
print(portfolio.head(10))
print(balance_at_end)
You will get the following result, which is identical to the video:
Date Investment Return/Loss Balance
0 2021-03-31 500 5.04 505
1 2021-04-30 500 10.13 1015
2 2021-05-31 500 15.28 1530
3 2021-06-30 500 20.47 2051
4 2021-07-31 500 25.72 2577
5 2021-08-31 500 31.02 3108
6 2021-09-30 500 36.38 3644
7 2021-10-31 500 41.79 4186
8 2021-11-30 500 47.25 4733
9 2021-12-31 500 52.77 5286
506397

Average between points based on time

I'm trying to use Python to get the time taken, as well as the average speed, for an object traveling between points.
The data looks somewhat like this:
location initialtime id speed distance
1 2020-09-18T12:03:14.485952Z car_uno 72 9km
2 2020-09-18T12:10:14.485952Z car_uno 83 8km
3 2020-09-18T11:59:14.484781Z car_duo 70 9km
7 2020-09-18T12:00:14.484653Z car_trio 85 8km
8 2020-09-18T12:12:14.484653Z car_trio 70 7.5km
The function I'm using currently is essentially like this,
Speeds.index = pd.to_datetime(Speeds.index)
..etc
Now if I were doing this usually, I would just take the unique values of the id's,
for x in speeds.id.unique():
    Speeds[speeds.id=="x"]...
But this method really isn't working.
What is the best approach for simply seeing if there are multiple id points over time, then taking the average of the speeds by that time given? Otherwise just returning the speed itself if there are not multiple values.
Is there a simpler pandas filter I could use?
Expected output is simply,
area - id - initial time - journey time - average speed.
The point is to get the average speed and journey time for a vehicle going past two points.
To get the average speed and journey times you can use groupby() and pass in the columns that determine one complete journey, like id or area.
import pandas as pd
from io import StringIO
data = StringIO("""
area initialtime id speed
1 2020-09-18T12:03:14.485952Z car_uno 72
2 2020-09-18T12:10:14.485952Z car_uno 83
3 2020-09-18T11:59:14.484781Z car_duo 70
7 2020-09-18T12:00:14.484653Z car_trio 85
8 2020-09-18T12:12:14.484653Z car_trio 70
""")
df = pd.read_csv(data, sep=r"\s+")
df["initialtime"] = pd.to_datetime(df["initialtime"])
# change to ["id", "area"] if need more granular aggregation
group_cols = ["id"]
time = df.groupby(group_cols)["initialtime"].agg(["max", "min"]).eval('max-min').reset_index(name="journey_time")
speed = df.groupby(group_cols)["speed"].mean().reset_index(name="average_speed")
pd.merge(time, speed, on=group_cols)
id journey_time average_speed
0 car_duo 00:00:00 70.0
1 car_trio 00:12:00 77.5
2 car_uno 00:07:00 77.5
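The two groupby calls and the merge can also be collapsed into one grouped object. The following is a compact sketch of the same computation under the same assumptions (one journey per id, average speed as the plain mean of the recorded speeds); the variable name summary is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "area": [1, 2, 3, 7, 8],
    "initialtime": pd.to_datetime([
        "2020-09-18T12:03:14.485952Z",
        "2020-09-18T12:10:14.485952Z",
        "2020-09-18T11:59:14.484781Z",
        "2020-09-18T12:00:14.484653Z",
        "2020-09-18T12:12:14.484653Z",
    ]),
    "id": ["car_uno", "car_uno", "car_duo", "car_trio", "car_trio"],
    "speed": [72, 83, 70, 85, 70],
})

grouped = df.groupby("id")
summary = pd.DataFrame({
    # time between the first and last sighting of each vehicle
    "journey_time": grouped["initialtime"].max() - grouped["initialtime"].min(),
    # plain mean of the recorded speeds; a single sighting returns that speed
    "average_speed": grouped["speed"].mean(),
}).reset_index()
print(summary)
```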
I tried to use a very intuitive solution. I'm assuming the data has already been loaded into df.
df['initialtime'] = pd.to_datetime(df['initialtime'])
result = []
for car in df['id'].unique():
    _df = df[df['id'] == car].sort_values('initialtime', ascending=True)
    # Where the car is leaving "from" and where it's heading "to"
    _df['From'] = _df['location']
    _df['To'] = _df['location'].shift(-1, fill_value=_df['location'].iloc[0])
    # Auxiliary columns
    _df['end_time'] = _df['initialtime'].shift(-1, fill_value=_df['initialtime'].iloc[0])
    _df['end_speed'] = _df['speed'].shift(-1, fill_value=_df['speed'].iloc[0])
    # Desired columns
    _df['journey_time'] = _df['end_time'] - _df['initialtime']
    _df['avg_speed'] = (_df['speed'] + _df['end_speed']) / 2
    _df = _df[_df['journey_time'] >= pd.Timedelta(0)]
    _df.drop(['location', 'distance', 'speed', 'end_time', 'end_speed'],
             axis=1, inplace=True)
    result.append(_df)
final_df = pd.concat(result).reset_index(drop=True)
The final DataFrame is as follows:
initialtime id From To journey_time avg_speed
0 2020-09-18 12:03:14.485952+00:00 car_uno 1 2 0 days 00:07:00 77.5
1 2020-09-18 11:59:14.484781+00:00 car_duo 3 3 0 days 00:00:00 70.0
2 2020-09-18 12:00:14.484653+00:00 car_trio 7 8 0 days 00:12:00 77.5
Here is another approach. My results differ from those in the other posts, so I may have misunderstood the requirements. In brief, I calculated each average speed as total distance divided by total time (for each car).
from io import StringIO
import pandas as pd
# speed in km / hour; distance in km
data = '''location initial-time id speed distance
1 2020-09-18T12:03:14.485952Z car_uno 72 9
2 2020-09-18T12:10:14.485952Z car_uno 83 8
3 2020-09-18T11:59:14.484781Z car_duo 70 9
7 2020-09-18T12:00:14.484653Z car_trio 85 8
8 2020-09-18T12:12:14.484653Z car_trio 70 7.5
'''
Now create the data frame and perform the calculations:
# create data frame
df = pd.read_csv(StringIO(data), sep=r"\s+")
df['elapsed-time'] = df['distance'] / df['speed'] # in hours
# utility function
def hours_to_hms(elapsed):
    ''' Convert `elapsed` (in hours) to hh:mm:ss (round to nearest sec)'''
    h, m = divmod(elapsed, 1)
    m *= 60
    _, s = divmod(m, 1)
    s *= 60
    hms = '{:02d}:{:02d}:{:02d}'.format(int(h), int(m), int(round(s, 0)))
    return hms
# perform calculations
start_time = df.groupby('id')['initial-time'].min()
journey_hrs = df.groupby('id')['elapsed-time'].sum().rename('elapsed-hrs')
hms = journey_hrs.apply(lambda x: hours_to_hms(x)).rename('hh:mm:ss')
ave_speed = ((df.groupby('id')['distance'].sum()
/ df.groupby('id')['elapsed-time'].sum())
.rename('ave speed (km/hr)')
.round(2))
# assemble results
result = pd.concat([start_time, journey_hrs, hms, ave_speed], axis=1)
print(result)
initial-time elapsed-hrs hh:mm:ss \
id
car_duo 2020-09-18T11:59:14.484781Z 0.128571 00:07:43
car_trio 2020-09-18T12:00:14.484653Z 0.201261 00:12:05
car_uno 2020-09-18T12:03:14.485952Z 0.221386 00:13:17
ave speed (km/hr)
id
car_duo 70.00
car_trio 77.01
car_uno 76.79
You should provide a better dataset (i.e. with identical time points) so that we can better understand the inputs, and an example of the expected output so that we can understand how the average speed is computed.
Thus I'm just guessing that you may be looking for df.groupby('initialtime')['speed'].mean() if df is a dataframe containing your input data.
