I am currently building a vaccination appointment program for college, and I am trying to write code that randomly assigns a date anywhere from 1/1/2022 to 31/12/2022, along with an hourly time slot from 8am to 5pm. Each hour has 100 slots, and every time a user is assigned a slot, 1 is deducted from that hour's count. I tried doing this with a table I built using pandas, but I didn't get very far. Any help would be greatly appreciated, thank you.
Here's my code for the table using pandas (in case it will be helpful):
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
print(df)
What I would do is start by including the leading zero at the beginning of the hour for each column name. It's easier to extract '08:00' from a pandas Timestamp than '8:00'.
df['08:00'] = 100
df['09:00'] = 100
Then you can set the index to your 'Date/Time' column and use .loc to locate an appointment slot by the date in the row and the hour (rounded down) in the columns, and subtract 1 from the number of appointments at that slot. For example:
import pandas as pd  # the snippets below use the usual pd alias

df.set_index('Date/Time', inplace=True)
user1_datetime = pd.to_datetime("2022-01-02 08:30")
user1_day = user1_datetime.strftime('%Y-%m-%d')
user1_time = user1_datetime.floor("h").strftime('%H:%M')  # round down to the start of the hour
df.loc[user1_day, user1_time] -= 1
Result:
>>> df
08:00 09:00 10:00 11:00 12:00 13:00 14:00 15:00 16:00 17:00
Date/Time
2022-01-01 100 100 100 100 100 100 100 100 100 100
2022-01-02 99 100 100 100 100 100 100 100 100 100
2022-01-03 100 100 100 100 100 100 100 100 100 100
2022-01-04 100 100 100 100 100 100 100 100 100 100
2022-01-05 100 100 100 100 100 100 100 100 100 100
... ... ... ... ... ... ... ... ... ... ...
2022-12-27 100 100 100 100 100 100 100 100 100 100
2022-12-28 100 100 100 100 100 100 100 100 100 100
2022-12-29 100 100 100 100 100 100 100 100 100 100
2022-12-30 100 100 100 100 100 100 100 100 100 100
2022-12-31 100 100 100 100 100 100 100 100 100 100
To scale up, you can easily wrap this in a function that takes a list of datetimes for multiple people, and checks that the person isn't making an appointment in an hour slot with 0 remaining appointments.
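For example, a minimal sketch of such a booking function (the name book_slot and the variables day and hour are just for illustration, assuming the leading-zero column names and the 'Date/Time' index from above):

import pandas as pd

def book_slot(df, when):
    # Try to book the hour slot containing `when`; return True on success.
    ts = pd.to_datetime(when)
    day = ts.strftime('%Y-%m-%d')
    hour = ts.floor('h').strftime('%H:%M')
    if df.loc[day, hour] > 0:   # only book if appointments remain in this hour
        df.loc[day, hour] -= 1
        return True
    return False                # slot is full, the caller should pick another time

# Example: book several people at once
for requested in ["2022-01-02 08:30", "2022-01-02 08:45"]:
    print(requested, "booked:", book_slot(df, requested))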
Thank you Derek, I finally managed to think of a way to do it, and I couldn't have done it without your help. Here's my code:
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv', index=False)  # index=False keeps 'Date/Time' as the first column when reading back
And this code randomly picks a date and an hour from that date:
import pandas
import random
from random import randrange

# randrange picks a random day index (0-364) for the user
random_date = randrange(365)

# pick a random hour column for that day
hours = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(hours)

df = pandas.read_csv('new.csv')
date = df.iloc[random_date, 0]   # the date stored in the first column
df.loc[random_date, hour] -= 1   # deduct one slot from that hour
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv', index=False)
print(date)
print(hour)
I haven't found a way for the program to check whether the number of slots is > 0 before choosing one though.
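One way to do that (a rough sketch building on the script above, reusing the same hours list and random_date) is to only choose among the hour columns that still have slots left for the chosen date:

row = df.loc[random_date, hours]                 # remaining slots for the chosen date
available = [h for h in hours if row[h] > 0]     # keep only hours with slots left

if available:
    hour = random.choice(available)
    df.loc[random_date, hour] -= 1
else:
    print("No slots left on", df.iloc[random_date, 0])   # pick a different date instead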
I have a dataframe and need to implement the following. I will run this script every month, so it should automatically pick the number of days based on the extracted date.
Input Dataframe
client_id  expo_value  value  cal_value  extracted_date
1          126         30     27.06      08/2022
2          135         60     36.18      08/2022
3          144         120    45         08/2022
4          162         30     54.09      08/2022
5          153         90     63.63      08/2022
6          181         120    72.9       08/2022
Expected output dataframe
client_id  expo_value  value  cal_value  extracted_date  Output_Value
1          126         30     27.06      08/2022         126 + 26.18 = 152.18
2          135         60     36.18      08/2022         261.29 + 70.02 = 331.31
3          144         120    45         08/2022         557.4 + 174.19 = 731.59
4          162         30     54.09      08/2022         156.7 + 52.34 = 209.04
5          153         90     63.63      08/2022         444.19 + 182.9 = 627.09
6          181         120    72.9       08/2022         700.64 + 282.19 = 982.83
I want the function below to use 31/30/28 days automatically. I tried manually entering the number 31 (days) into the calculation, but it should pick the correct number of days based on how many days the month in extracted_date has.
import numpy as np

def month_data(data):
    days = 31  # hard-coded for now; this should be the number of days in the month from 'extracted_date'
    if (data['value'] <= 30).any():
        return data['expo_value'] * 30 / days + data['cal_value'] * 45 / days
    elif (data['value'] <= 60).any():
        return data['expo_value'] * 60 / days + data['cal_value'] * 90 / days
    elif (data['value'] <= 90).any():
        return data['expo_value'] * 100 / days + data['cal_value'] * 120 / days
    else:  # data['value'] <= 120
        return np.nan
Let me see if I understood you correctly. I tried to reproduce a small subset of your dataframe (you should do this next time you post something). The answer is as follows:
import pandas as pd
from datetime import datetime
import calendar
# I'll make a subset dataframe based on your example
data = [[30, '02/2022'], [60, '08/2022']]
df = pd.DataFrame(data, columns=['value', 'extracted_date'])
# First, turn the extracted_date column into a correct date format
date_correct_format = [datetime.strptime(i, '%m/%Y') for i in df['extracted_date']]
# Second, calculate the number of days per month
num_days = [calendar.monthrange(i.year, i.month)[1] for i in date_correct_format]
print(num_days)  # [28, 31]
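If it helps, a more vectorized sketch of the same idea (using the column names from your question; treat it as an illustration, not a drop-in replacement) computes the days per month with Series.dt.days_in_month and replaces the if/elif chain with np.select. Note that (data['value'] <= 30).any() in your function checks the whole column at once, while your expected output looks row-wise, which is what np.select does here:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'expo_value': [126, 135, 144],
    'value': [30, 60, 120],
    'cal_value': [27.06, 36.18, 45.0],
    'extracted_date': ['08/2022', '08/2022', '08/2022'],
})

# number of days in each row's month, taken directly from extracted_date
days = pd.to_datetime(df['extracted_date'], format='%m/%Y').dt.days_in_month

conditions = [df['value'] <= 30, df['value'] <= 60, df['value'] <= 90]
choices = [
    df['expo_value'] * 30 / days + df['cal_value'] * 45 / days,
    df['expo_value'] * 60 / days + df['cal_value'] * 90 / days,
    df['expo_value'] * 100 / days + df['cal_value'] * 120 / days,
]
df['Output_Value'] = np.select(conditions, choices, default=np.nan)
print(df)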
I have the following dataset initially:
value;date
100;2021-01-01
160;2021-02-01
250;2021-02-15
10;2021-03-01
90;2021-04-01
150;2021-04-15
350;2021-06-01
20;2021-07-01
100;2021-08-01
10;2021-08-10
Whenever the value "Value" drops (e.g. from 250 to 10 on 2021-03-01), I want to save the old value as offset.
When the value drops again (e.g. from 350 to 20 on 2021-07-01) I want to add the new offset to the old one (350 + 250).
Afterwards I want to add the offsets to the values, so that I get the following dataset at the end:
value;date;offset;corrected_value
100;2021-01-01;0;100
160;2021-02-01;0;160
250;2021-02-15;0;250
10;2021-03-01;250;260
90;2021-04-01;250;340
150;2021-04-15;250;400
350;2021-06-01;250;600
20;2021-07-01;600;620
100;2021-08-01;600;700
10;2021-08-10;700;710
My current (terrible) approach, which is not working:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df.index = df['date']
del df['date']
df = df.drop_duplicates(keep='first')  # note: drop_duplicates returns a copy, it is not in-place

df['previous'] = df['value'].shift(1)

def pn(current, previous, offset):
    if not pd.isna(previous):
        if current < previous:
            return previous + offset
    return offset

df['offset'] = 0
df['offset'] = df.apply(lambda row: pn(row['value'], row['previous'], row['offset']), axis=1)
Your help is so much appreciated, thank you!
Cheers
Find the desired positions in column 'value' with pd.Series.diff and pd.Series.shift, fill with 0 and compute the cumsum, then add the 'offset' column to 'value':
df['offset'] = df.value[df.value.diff().lt(0).shift(-1, fill_value=False)]
df['offset'] = df.offset.shift(1).fillna(0).cumsum().astype('int')
df['correct_value'] = df.offset + df.value
df
Output
value date offset correct_value
0 100 2021-01-01 0 100
1 160 2021-02-01 0 160
2 250 2021-02-15 0 250
3 10 2021-03-01 250 260
4 90 2021-04-01 250 340
5 150 2021-04-15 250 400
6 350 2021-06-01 250 600
7 20 2021-07-01 600 620
8 100 2021-08-01 600 700
9 10 2021-08-10 700 710
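To make those three lines easier to follow, here is roughly what each step produces on this data (the intermediate values are written out as comments):

# rows whose *next* value is lower, i.e. the last value before each drop
mask = df.value.diff().lt(0).shift(-1, fill_value=False)
# True at index 2 (250), 6 (350) and 8 (100)

# keep those values, shift them one row down to where the drop happens,
# then accumulate them so later drops include all earlier offsets
df['offset'] = df.value[mask]
df['offset'] = df.offset.shift(1).fillna(0).cumsum().astype('int')
# offset: 0 0 0 250 250 250 250 600 600 700

df['correct_value'] = df.offset + df.value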
I want to get the total hours and total minutes for each userprofileid I have.
For example:
userprofileid totalhours totalminutes
453 7.0 420
120 7.5 450
453 8.0 480
I can't delete userprofileid, because each id has its own hours and minutes.
I tried this, but I get the overall total of hours and minutes across all users, added to every row:
for user in clocking_left["userprofileid"]:
    clocking_left["user_minutes_total"] = clocking_left["totalminutes"].sum()
    clocking_left["user_hours_total"] = clocking_left["hours"].sum()
You can use groupby and sum up the values:
import pandas as pd
data = {'userprofileid': [453,120,453],
'totalhours': [7.0,7.5,8],
'totalminutes': [420,450,480]
}
df = pd.DataFrame(data, columns = ['userprofileid','totalhours','totalminutes'])
df_new = df.groupby('userprofileid').sum().reset_index()
print(df_new.to_string(index=False))
output
userprofileid totalhours totalminutes
120 7.5 450
453 15.0 900
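If you instead want those per-user totals attached to every original row (closer to what your loop was attempting, reusing your user_hours_total / user_minutes_total column names), groupby together with transform should do it:

# per-user totals broadcast back onto each row
df['user_hours_total'] = df.groupby('userprofileid')['totalhours'].transform('sum')
df['user_minutes_total'] = df.groupby('userprofileid')['totalminutes'].transform('sum')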
I have a dataframe with 3 columns. Something like this:
Data Initial_Amount Current
31-01-2018
28-02-2018
31-03-2018
30-04-2018 100 100
31-05-2018 100 90
30-06-2018 100 80
I would like to populate the prior rows with the Initial Amount as such:
Data Initial_Amount Current
31-01-2018 100 100
28-02-2018 100 100
31-03-2018 100 100
30-04-2018 100 100
31-05-2018 100 90
30-06-2018 100 80
So:
- find the first non-empty row with Initial_Amount populated
- use that to backfill Initial_Amount up to the starting date
- if it is the first row and Current is empty, copy Initial_Amount, else copy the prior balance
Regards,
Pandas fillna with fill method 'bfill' (uses next valid observation to fill gap) should do what you're looking for:
In [13]: df.fillna(method='bfill')
Out[13]:
Data Initial_Amount Current
0 31-01-2018 100.0 100.0
1 28-02-2018 100.0 100.0
2 31-03-2018 100.0 100.0
3 30-04-2018 100.0 100.0
4 31-05-2018 100.0 90.0
5 30-06-2018 100.0 80.0
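As a side note, in recent pandas versions the method= argument of fillna is deprecated; the dedicated bfill method does the same thing:

# equivalent in newer pandas, where fillna(method='bfill') is deprecated
df = df.bfill()

# or restrict the backfill to the two numeric columns
df[['Initial_Amount', 'Current']] = df[['Initial_Amount', 'Current']].bfill()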
I have created a data-coverage table for a time series in a pandas DataFrame, and I would like to plot the coverage with Matplotlib or PyQtGraph.
DATA FRAME
DateTime WD98 WS120 WS125B WD123 WS125A
31-07-2013 100 99.9 99.9 NaN NaN
31-08-2013 100 100 100 NaN NaN
30-09-2013 100 100 100 NaN NaN
31-10-2013 100 100 100 NaN NaN
30-11-2013 100 100 100 100 100
31-12-2013 100 100 100 100 100
31-01-2014 100 100 100 100 100
28-02-2014 100 100 100 100 100
31-03-2014 100 100 100 100 100
30-04-2014 100 100 100 100 100
31-05-2014 67.1 100 100 67.1 7.7
30-06-2014 NaN NaN 100 0 69.2
31-07-2014 NaN NaN 100 0 100
31-08-2014 NaN NaN 100 0 96.2
I would like to plot it as a broken bar chart, like a plot I previously made with Excel conditional formatting. Please help me. The color rules are:
DataCoverage >= 90 (Green)
DataCoverage >= 75 and DataCoverage < 90 (Yellow)
DataCoverage < 75 (Red)
You can use seaborn.heatmap:
import pandas as pd
import seaborn as sns

# make sure DateTime is an actual datetime first
df['DateTime'] = pd.to_datetime(df['DateTime'], dayfirst=True)

df = df.set_index(df.pop('DateTime').dt.strftime('%d-%m-%Y'))
g = sns.heatmap(df, cmap=['r', 'y', 'g'], annot=True, fmt='.0f')
g.set_yticklabels(g.get_yticklabels(), rotation=0, fontsize=8)
Result:
UPDATE: corrected version:
import numpy as np

# assuming df is the original dataframe with a datetime 'DateTime' column
x = df.set_index(df['DateTime'].dt.strftime('%d-%m-%Y')).drop(columns='DateTime')
z = pd.cut(x.stack(), bins=[-np.inf, 75, 90, np.inf], labels=[1., 2., 3.]).unstack().apply(pd.to_numeric)
g = sns.heatmap(z, cmap=['r', 'y', 'g'], fmt='.0f', cbar=False)
g.set_yticklabels(g.get_yticklabels(), rotation=0, fontsize=8)
Result: