Sum of Users balance with updated version between two days - python

I have each user's latest balance for every day on which they were active, shown in the latest_balance column below:
+----+------+------------+----------------+--+
| | user | date | latest_balance | |
| 0 | A | 2019-07-26 | 705.0 | |
| 1 | A | 2019-07-29 | 990.0 | |
| 2 | A | 2019-07-30 | 5.0 | |
| 3 | A | 2019-07-31 | 25.0 | |
| 4 | A | 2019-08-01 | 155.0 | |
| 5 | A | 2019-08-02 | 405.0 | |
| 6 | A | 2019-08-03 | 525.0 | |
| 7 | A | 2019-08-05 | 1000.0 | |
| 8 | A | 2019-08-06 | 825.0 | |
| 9 | B | 2019-08-07 | 230.0 | |
| 10 | A | 2019-08-07 | 965.0 | |
| 11 | B | 2019-08-08 | 224.0 | |
| 12 | A | 2019-08-08 | 80.0 | |
| 13 | A | 2019-08-09 | 380.0 | |
| 14 | B | 2019-08-10 | 4.0 | |
| 15 | B | 2019-08-11 | 114.0 | |
| 16 | A | 2019-08-12 | 725.0 | |
| 17 | B | 2019-08-12 | 234.0 | |
| 18 | A | 2019-08-13 | 815.0 | |
| 19 | B | 2019-08-13 | 243.0 | |
| 20 | B | 2019-08-15 | 13.0 | |
| 21 | A | 2019-08-16 | 75.0 | |
| 22 | B | 2019-08-16 | 53.0 | |
| 23 | A | 2019-08-17 | 890.0 | |
| 24 | B | 2019-08-17 | 36.0 | |
| 25 | A | 2019-08-19 | 100.0 | |
| 26 | A | 2019-08-20 | 115.0 | |
| 27 | A | 2019-08-21 | 150.0 | |
+----+------+------------+----------------+--+
As you can see, if a user is not active on some day there is no row with that user's balance for that day, so I cannot compute a total daily sum directly.
I need to calculate the total balance across users for each day, using each user's last known balance on days when they have no transaction.
My idea was to use a Python dictionary and update it with dict.update().
If the user has a transaction on that day, store the new balance; if not, keep the balance from their previous transaction.
My code is:
from datetime import date, timedelta
date_upd = []
total = {}
date_t = {}
start_date = min(df['date'])
end_date = max(df['date'])
delta = timedelta(days=1)
while start_date <= end_date:
    for i, k in enumerate(df['date']):
        if k == start_date:
            # print(k)
            total.update({df['user'][i]: df['latest_balance'][i]})
        else:
            total.update({df['user'][i]: df['latest_balance'][i]})
            pass
        date_upd.append(sum(total.values()))
    start_date += delta
# date_t.update(total)
It gives me this result:
+----------+
| 705.0, |
| 990.0, |
| 5.0, |
| 25.0, |
| 155.0, |
| 405.0, |
| 525.0, |
| 1000.0, |
| 825.0, |
| 1055.0, |
| 1195.0, |
| 1189.0, |
| 304.0, |
| 604.0, |
| 384.0, |
| 494.0, |
| 839.0, |
| 959.0, |
| 1049.0, |
| 1058.0, |
| 828.0, |
| 88.0, |
| 128.0, |
| 943.0, |
| 926.0, |
| 136.0, |
| 151.0, |
| 186.0 |
+----------+
This contains a few extra results, because it is not looping over each day.
It should be:
705.0,
990.0,
5.0,
25.0,
155.0,
405.0,
525.0,
1000.0,
825.0,
,
1195.0,
,
304.0,
604.0,
384.0,
494.0,
839.0,
959.0,
,
1058.0,
828.0,
,
128.0,
,
926.0,
136.0,
151.0,
186.0

Not sure if I understand the question 100% but something like this?
df.pivot_table(columns='user', index='date', values='latest_balance').ffill().sum(axis=1)
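To make the one-liner easier to follow, here is a minimal sketch on a small subset of the question's rows. The names wide, all_days and daily_total are only for illustration, and the reindex step is an assumption about what is wanted: it adds the calendar days on which nobody had a transaction, so each user's last balance is carried over those days as well.

```python
import pandas as pd

# A small subset of the rows from the question.
df = pd.DataFrame({
    "user": ["A", "A", "B", "A", "B", "B"],
    "date": pd.to_datetime([
        "2019-08-05", "2019-08-06", "2019-08-07",
        "2019-08-07", "2019-08-08", "2019-08-10",
    ]),
    "latest_balance": [1000.0, 825.0, 230.0, 965.0, 224.0, 4.0],
})

# One row per date, one column per user; NaN where a user was inactive.
wide = df.pivot_table(index="date", columns="user", values="latest_balance")

# Assumption: days with no transactions at all should also appear.
all_days = pd.date_range(wide.index.min(), wide.index.max(), freq="D")

# Carry each user's last known balance forward, then sum across users.
daily_total = wide.reindex(all_days).ffill().sum(axis=1)
print(daily_total)
```

A user who has not appeared yet stays NaN after ffill, and sum(axis=1) skips NaN by default, so such a user simply contributes nothing until their first row.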

Related

Python check when second columns value changes based on first column value change

I would like to be able to find out what time it takes for the values in one column to change, based on when the value changes in another column. I have loaded an example of the table below.
| 23-02-03 12:01:27.213000 | 60 | 0 |
| 23-02-03 12:01:27.243000 | 60 | 0 |
| 23-02-03 12:01:27.313000 | 60 | 0 |
| 23-02-03 12:01:27.353000 | 50 | 0 |
| 23-02-03 12:01:27.413000 | 50 | 0 |
| 23-02-03 12:01:27.453000 | 50 | 0 |
| 23-02-03 12:01:27.513000 | 50 | 10 |
| 23-02-03 12:01:27.553000 | 50 | 10 |
| 23-02-03 12:01:27.613000 | 50 | 10 |
| 23-02-03 12:01:27.653000 | 50 | 10 |
| 23-02-03 12:01:27.713000 | 50 | 10 |
| 23-02-03 12:01:27.753000 | 50 | 10 |
| 23-02-03 12:01:27.813000 | 50 | 10 |
| 23-02-03 12:01:27.853000 | 49.5 | 10 |
| 23-02-03 12:01:27.913000 | 49.5 | 10 |
| 23-02-03 12:01:27.953000 | 49.5 | 10 |
| 23-02-03 12:01:28.013000 | 49.5 | 10 |
| 23-02-03 12:01:28.053000 | 49.5 | 10 |
| 23-02-03 12:01:28.113000 | 49.5 | 10 |
| 23-02-03 12:01:28.153000 | 49.5 | 10 |
| 23-02-03 12:01:28.213000 | 49.5 | 10 |
| 23-02-03 12:01:28.253000 | 49.5 | 25 |
| 23-02-03 12:01:28.313000 | 49.5 | 25 |
| 23-02-03 12:01:28.353000 | 49.5 | 25 |
| 23-02-03 12:01:28.423000 | 49.5 | 25 |
| 23-02-03 12:01:28.453000 | 48.3 | 25 |
| 23-02-03 12:01:28.533000 | 48.3 | 25 |
| 23-02-03 12:01:28.553000 | 48.3 | 25 |
| 23-02-03 12:01:28.634000 | 48.3 | 25 |
| 23-02-03 12:01:28.653000 | 48.3 | 25 |
| 23-02-03 12:01:28.743000 | 48.3 | 33 |
| 23-02-03 12:01:28.753000 | 48.3 | 33 |
| 23-02-03 12:01:28.843000 | 48.3 | 33 |
| 23-02-03 12:01:28.853000 | 48.3 | 33 |
| 23-02-03 12:01:28.943000 | 48.3 | 33 |
| 23-02-03 12:01:28.953000 | 48.3 | 33 |
| 23-02-03 12:01:29.043000 | 48.3 | 33 |
| 23-02-03 12:01:29.053000 | 48.3 | 33 |
| 23-02-03 12:01:29.143000 | 48.3 | 33 |
| 23-02-03 12:01:29.153000 | 48.3 | 33 |
| 23-02-03 12:01:29.243000 | 48.3 | 33 |
| 23-02-03 12:01:29.253000 | 48.3 | 33 |
| 23-02-03 12:01:29.343000 | 48.3 | 33 |
| 23-02-03 12:01:29.353000 | 49.1 | 33 |
| 23-02-03 12:01:29.443000 | 49.1 | 33 |
| 23-02-03 12:01:29.463000 | 49.1 | 33 |
| 23-02-03 12:01:29.543000 | 49.1 | 59 |
| 23-02-03 12:01:29.563000 | 49.1 | 59 |
The first column is the timestamp. When the value in the second column (Col1) changes, for example from 50 to 49.5, the value in the third column (Col2) changes a short while afterwards.
From this example:
Col1 changes from 60 to 50 at 27.353
Col2 changes from 0 to 10 at 27.513
So it takes 0.160 seconds for the value in Col2 to change after the value changes in Col1.
I would like to use a Python script to calculate this time difference for each change, and also the average time difference.
I have extracted the relevant values below:
| First Change | | |
|--------------------------|------|----|
| 23-02-03 12:01:27.353000 | 50 | |
| 23-02-03 12:01:27.513000 | | 10 |
| Time diff | | |
| 0.16 | | |
| Second change | | |
| 23-02-03 12:01:27.853000 | 49.5 | |
| 23-02-03 12:01:28.253000 | | 25 |
| Time diff | | |
| 0.4 | | |
| Third change | | |
| 23-02-03 12:01:28.453000 | 48.3 | |
| 23-02-03 12:01:28.743000 | | 33 |
| Time diff | | |
| 0.29 | | |
| Fourth change | | |
| 23-02-03 12:01:29.353000 | 49.1 | |
| 23-02-03 12:01:29.543000 | | 59 |
| Time diff | | |
| 0.19 | | |
| Average Time diff | | |
| 0.26 | | |
thanks
I have been able to get the differences with the following code:
df['Change 1'] = df['Col1'].diff()
df['Change 2'] = df['Col2'].diff()
This stores when Col1 changes and when Col2 changes, as seen below, but I am not sure how to get the time difference between them.
| Datetime | Col1 | Col2 | Change 1 | Change 2 |
|----------------------------|------|------|----------|----------|
| 23-02-03 12:01:27.213000 | 60 | 0 | 0 | 0 |
| 23-02-03 12:01:27.243000 | 60 | 0 | 0 | 0 |
| 23-02-03 12:01:27.313000 | 60 | 0 | 0 | 0 |
| 23-02-03 12:01:27.353000 | 50 | 0 | 10 | 0 |
| 23-02-03 12:01:27.413000 | 50 | 0 | 0 | 0 |
| 23-02-03 12:01:27.453000 | 50 | 0 | 0 | 0 |
| 23-02-03 12:01:27.513000 | 50 | 10 | 0 | 10 |
| 23-02-03 12:01:27.553000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.613000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.653000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.713000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.753000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.813000 | 50 | 10 | 0 | 0 |
| 23-02-03 12:01:27.853000 | 49.5 | 10 | 0.5 | 0 |
| 23-02-03 12:01:27.913000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:27.953000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.013000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.053000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.113000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.153000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.213000 | 49.5 | 10 | 0 | 0 |
| 23-02-03 12:01:28.253000 | 49.5 | 25 | 0 | 15 |
| 23-02-03 12:01:28.313000 | 49.5 | 25 | 0 | 0 |
| 23-02-03 12:01:28.353000 | 49.5 | 25 | 0 | 0 |
| 23-02-03 12:01:28.423000 | 49.5 | 25 | 0 | 0 |
| 23-02-03 12:01:28.453000 | 48.3 | 25 | 1.2 | 0 |
| 23-02-03 12:01:28.533000 | 48.3 | 25 | 0 | 0 |
| 23-02-03 12:01:28.553000 | 48.3 | 25 | 0 | 0 |
| 23-02-03 12:01:28.634000 | 48.3 | 25 | 0 | 0 |
| 23-02-03 12:01:28.653000 | 48.3 | 25 | 0 | 0 |
| 23-02-03 12:01:28.743000 | 48.3 | 33 | 0 | 8 |
| 23-02-03 12:01:28.753000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:28.843000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:28.853000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:28.943000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:28.953000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.043000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.053000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.143000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.153000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.243000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.253000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.343000 | 48.3 | 33 | 0 | 0 |
| 23-02-03 12:01:29.353000 | 49.1 | 33 | 0.8 | 0 |
| 23-02-03 12:01:29.443000 | 49.1 | 33 | 0 | 0 |
| 23-02-03 12:01:29.463000 | 49.1 | 33 | 0 | 0 |
| 23-02-03 12:01:29.543000 | 49.1 | 59 | 0 | 26 |
| 23-02-03 12:01:29.563000 | 49.1 | 59 | 0 | 0 |
I've had an idea: if I were able to drop the rows in between the changes, it might be easier to check.
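One possible way to act on that idea, as a rough sketch rather than a definitive answer: keep only the rows where each column changes, then pair every Col1 change with the next Col2 change using pd.merge_asof. The helper change_times and the names t1, t2 and lag_s are made up for this example, and the format string assumes the timestamps are year-month-day with a two-digit year.

```python
import pandas as pd

# A shortened version of the table from the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        [
            "23-02-03 12:01:27.313000", "23-02-03 12:01:27.353000",
            "23-02-03 12:01:27.453000", "23-02-03 12:01:27.513000",
            "23-02-03 12:01:27.813000", "23-02-03 12:01:27.853000",
            "23-02-03 12:01:28.213000", "23-02-03 12:01:28.253000",
        ],
        format="%y-%m-%d %H:%M:%S.%f",  # assumption: yy-mm-dd
    ),
    "Col1": [60, 50, 50, 50, 50, 49.5, 49.5, 49.5],
    "Col2": [0, 0, 0, 10, 10, 10, 10, 25],
})


def change_times(col, name):
    """Return the timestamps at which `col` differs from the previous row."""
    changed = df[col].diff().fillna(0).ne(0)
    return df.loc[changed, ["Datetime"]].rename(columns={"Datetime": name})


# For each Col1 change, find the first Col2 change at or after it.
pairs = pd.merge_asof(
    change_times("Col1", "t1"),
    change_times("Col2", "t2"),
    left_on="t1",
    right_on="t2",
    direction="forward",
)
pairs["lag_s"] = (pairs["t2"] - pairs["t1"]).dt.total_seconds()
mean_lag = pairs["lag_s"].mean()
print(pairs)
print(mean_lag)
```

On this subset the two lags come out at roughly 0.16 s and 0.40 s. Note that if Col1 changes twice before Col2 reacts, both Col1 changes are matched to the same Col2 change, which may or may not be what you want.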

Match dtypes of two DataFrames that share columns

I have the following dataframes in pandas:
df:
| ID | country | money | code | money_add | other |
| -------- | -------------- | --------- | -------- | --------- | ----- |
| 832932 | Other | NaN | 00000 | NaN | NaN |
| 217#8# | NaN | NaN | NaN | NaN | NaN |
| 1329T2 | France | 12131 | 00020 | 3452 | 123 |
| 124932 | France | NaN | 00016 | NaN | NaN |
| 194022 | France | NaN | 00000 | NaN | NaN |
df1:
| cod_t | money | money_add | other |
| -------- | ------ | --------- | ----- |
| 00000 | 4532 | 72323 | 321 |
| 00016 | 1213 | 23822 | 843 |
| 00018 | 1313 | 8393 | 183 |
| 00020 | 1813 | 27328 | 128 |
| 00030 | 8932 | 3204 | 829 |
cols = df.columns.intersection(df1.columns)
print (df[cols].dtypes.eq(df1[cols].dtypes))
money False
money_add False
other False
dtype: bool
I want to match the dtypes of the columns of the second dataframe to be equal to those of the first one. Is there any way to do this?
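Try this, looping only over the columns the two frames share (df1 also has cod_t, which df does not, so looping over all of df1.columns would raise a KeyError):
for col in df.columns.intersection(df1.columns):
    df1[col] = df1[col].astype(df[col].dtype)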
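For reference, a small self-contained sketch of the same idea that passes a column-to-dtype mapping to astype instead of looping. The two frames are cut-down stand-ins for the ones in the question, with df holding floats (because of the NaN values) and df1 holding integers.

```python
import numpy as np
import pandas as pd

# Cut-down stand-ins for the question's frames.
df = pd.DataFrame({
    "code": ["00000", "00020"],
    "money": [np.nan, 12131.0],
    "money_add": [np.nan, 3452.0],
    "other": [np.nan, 123.0],
})
df1 = pd.DataFrame({
    "cod_t": ["00000", "00016"],
    "money": [4532, 1213],
    "money_add": [72323, 23822],
    "other": [321, 843],
})

# Columns present in both frames.
cols = df.columns.intersection(df1.columns)

# astype accepts a {column: dtype} mapping; other columns are left alone.
df1 = df1.astype(df[cols].dtypes.to_dict())

print(df[cols].dtypes.eq(df1[cols].dtypes))
```

Going in this direction (int to float) is always safe. The reverse would fail if the target is an integer dtype and the source column contains NaN.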

shift below cells to count for R

I am using the code below to produce the following result in Python, and I want an equivalent of this code in R.
Here N is a column of the dataframe data. The CN column is calculated from the values of column N: it increases by one each time N changes. This gives me the following result in Python.
+---+----+
| N | CN |
+---+----+
| 0 | 0 |
| 1 | 1 |
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 0 | 3 |
| 0 | 3 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 0 | 8 |
| 1 | 9 |
| 2 | 10 |
+---+----+
A short overview of my code:
import numpy as np
import pandas as pd
names = ["Date ","Heure ","temps (s) ","X","Z"," LVDT V(mm) " ,"Force normale (N) ","FT","FN(N) ","TS"," NS(kPa) ","V (mm/min)","Vitesse normale (mm/min)","e (kPa)","k (kPa/mm) " ,"N " ,"Nb cycles normal" ,"Cycles " ,"Etat normal" ,"k imposé (kPa/mm)"]
data = pd.read_table(filename, skiprows=15, decimal=',', sep='\t', header=None, names=names)
data.columns = [col.strip() for col in data.columns.tolist()]
N = data[data.keys()[15]]
N = np.array(N)
# Flag rows where N differs from the previous row, then count the flags cumulatively.
data["CN"] = (data.N.shift().bfill() != data.N).astype(int).cumsum()
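To show what the last line does in isolation, here is a minimal sketch that applies the same shift-and-cumsum step to the N values from the table above. The intermediate name changed is just for illustration.

```python
import pandas as pd

# The N column from the example table.
data = pd.DataFrame({"N": [0, 1, 1, 2, 2, 0, 0, 1, 1, 1, 2, 2, 3, 4, 0, 1, 2]})

# True wherever N differs from the previous row. bfill() fills the first
# shifted value with N[0], so the first row is never counted as a change.
changed = data["N"].ne(data["N"].shift().bfill())

# The running count of changes is the CN column.
data["CN"] = changed.cumsum()
print(data["CN"].tolist())
# [0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 8, 9, 10]
```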
An example of data.head():
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
| Index | Date | Heure | temps (s) | X | Z(mm) | LVDT V(mm) | Force normale (N) | FT | FN(N) | FT (kPa) | NS(kPa) | V (mm/min) | Vitesse normale (mm/min) | e (kPa) | k (kPa/mm) | N | Nb cycles normal | Cycles | Etat normal | k imposé (kPa/mm) | CN |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
| 184 | 01/02/2022 | 12:36:52 | 402.163 | 6.910243 | 1.204797 | 0.001101 | 299.783665 | 31.494351 | 1428.988908 | 11.188704 | 505.825016 | 0.1 | 2.0 | 512.438828 | 50.918786 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 185 | 01/02/2022 | 12:36:54 | 404.288 | 6.907822 | 1.205647 | 4.9e-05 | 296.072718 | 31.162313 | 1404.195316 | 11.028167 | 494.97955 | 0.1 | -2.0 | 500.084986 | 49.685639 | 0.0 | 0.0 | Sort | Descend | 0.0 | 0 |
| 186 | 01/02/2022 | 12:36:56 | 406.536 | 6.907906 | 1.204194 | -0.000214 | 300.231424 | 31.586401 | 1429.123486 | 11.21895 | 505.750815 | 0.1 | 2.0 | 512.370164 | 50.914002 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 187 | 01/02/2022 | 12:36:58 | 408.627 | 6.910751 | 1.204293 | -0.000608 | 300.188686 | 31.754064 | 1428.979519 | 11.244542 | 505.624564 | 0.1 | 2.0 | 512.309254 | 50.906544 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
| 188 | 01/02/2022 | 12:37:00 | 410.679 | 6.907805 | 1.205854 | -0.000181 | 296.358074 | 31.563389 | 1415.224427 | 11.129375 | 502.464948 | 0.1 | 2.0 | 510.702313 | 50.742104 | 0.0 | 0.0 | Sort | Monte | 0.0 | 0 |
+-------+-------------+------------+-----------+----------+----------+------------+-------------------+-----------+-------------+-----------+------------+------------+--------------------------+------------+------------+-----+------------------+--------+-------------+-------------------+----+
A one-line cumsum trick solves it.
cumsum(c(0L, diff(df1$N) != 0))
#> [1] 0 1 1 2 2 3 3 4 4 4 5 5 6 7 8 9 10
all.equal(
cumsum(c(0L, diff(df1$N) != 0)),
df1$CN
)
#> [1] TRUE
Created on 2022-02-14 by the reprex package (v2.0.1)
Data
x <- "
+---+----+
| N | CN |
+---+----+
| 0 | 0 |
| 1 | 1 |
| 1 | 1 |
| 2 | 2 |
| 2 | 2 |
| 0 | 3 |
| 0 | 3 |
| 1 | 4 |
| 1 | 4 |
| 1 | 4 |
| 2 | 5 |
| 2 | 5 |
| 3 | 6 |
| 4 | 7 |
| 0 | 8 |
| 1 | 9 |
| 2 | 10 |
+---+----+"
df1 <- read.table(textConnection(x), header = TRUE, sep = "|", comment.char = "+")[2:3]

Date time interpretation for work & break time calculation

I extracted the data from a CSV file and, after some data preparation in Python, converted it to the format below.
I want to prepare it further, as shown below, so that I can store it as a table in a database.
In the table below, in the 8th hour, minutes 0 to 52 are working time (status 1)
and minutes 53 to 59 are a break (snacks break, status 2).
How do I convert it?
Existing
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
| | plant | date | shop | line | hour | startmin | endmin | status | shift | uph |
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
| 8 | HEF1 | 03-01-2020 | E | 1 | 8 | 0 | 52 | 1 | 2 | 25 |
| 9 | HEF1 | 03-01-2020 | E | 1 | 8 | 53 | 59 | 2 | 2 | 25 |
| 10 | HEF1 | 03-01-2020 | E | 1 | 9 | 0 | 59 | 1 | 2 | 25 |
| 11 | HEF1 | 03-01-2020 | E | 1 | 10 | 0 | 59 | 1 | 2 | 25 |
| 9645 | HEF2 | 27-01-2020 | E | 1 | 7 | 0 | 59 | 1 | 1 | 58 |
| 9646 | HEF2 | 27-01-2020 | E | 1 | 8 | 0 | 52 | 1 | 1 | 58 |
| 9647 | HEF2 | 27-01-2020 | E | 1 | 8 | 53 | 59 | 2 | 1 | 58 |
+------+-------+------------+------+------+------+----------+--------+--------+-------+-----+
I want to convert it to the following:
Required
+-------+---------------------+------+------+------+--------+-------+-----+
| plant | datetime | shop | line | hour | status | shift | uph |
+-------+---------------------+------+------+------+--------+-------+-----+
| HEF1 | 03-01-2020 08:00:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:01:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:02:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:03:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:04:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:05:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:06:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:07:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:08:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:09:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:10:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:11:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:12:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:13:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:14:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:15:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:16:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:17:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:18:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:19:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:20:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:21:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:22:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:23:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:24:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:25:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:26:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:27:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:28:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:29:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:30:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:31:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:32:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:33:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:34:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:35:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:36:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:37:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:38:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:39:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:40:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:41:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:42:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:43:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:44:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:45:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:46:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:47:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:48:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:49:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:50:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:51:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:52:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 08:53:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:54:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:55:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:56:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:57:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:58:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 08:59:00 | E | 1 | 8 | 2 | 2 | 25 |
| HEF1 | 03-01-2020 09:00:00 | E | 1 | 8 | 1 | 2 | 25 |
| HEF1 | 03-01-2020 09:01:00 | E | 1 | 8 | 1 | 2 | 25 |
+-------+---------------------+------+------+------+--------+-------+-----+
First create the begin and end timestamps. The dates are day-first (for example 27-01-2020), so pass an explicit format:
base = df['date'].astype(str) + ' ' + df['hour'].astype(str) + ':'
df['start_ts'] = pd.to_datetime(base + df['startmin'].astype(str), format='%d-%m-%Y %H:%M')
df['end_ts'] = pd.to_datetime(base + df['endmin'].astype(str), format='%d-%m-%Y %H:%M')
Then create a date range column:
df['t_range'] = [pd.date_range(start=s, end=e, freq='min') for s, e in zip(df['start_ts'], df['end_ts'])]
Then explode by that column (explode returns a new DataFrame, so assign the result):
df = df.explode('t_range')
Finally, delete and rename columns as needed.
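Putting the steps together on the first two rows of the "Existing" table, as a sketch only. The output column name datetime is taken from the "Required" table; out, base and fmt are illustrative names. It assumes the dates are day-first and pandas 0.25 or later (for explode).

```python
import pandas as pd

# First two rows of the "Existing" table.
df = pd.DataFrame({
    "plant": ["HEF1", "HEF1"],
    "date": ["03-01-2020", "03-01-2020"],
    "hour": [8, 8],
    "startmin": [0, 53],
    "endmin": [52, 59],
    "status": [1, 2],
})

# Build "dd-mm-yyyy H:M" strings and parse them with an explicit format.
base = df["date"] + " " + df["hour"].astype(str) + ":"
fmt = "%d-%m-%Y %H:%M"  # assumption: day comes first
df["start_ts"] = pd.to_datetime(base + df["startmin"].astype(str), format=fmt)
df["end_ts"] = pd.to_datetime(base + df["endmin"].astype(str), format=fmt)

# One DatetimeIndex per row, covering every minute of the interval.
df["datetime"] = [
    pd.date_range(start=s, end=e, freq="min")
    for s, e in zip(df["start_ts"], df["end_ts"])
]

# One output row per minute; drop the helper columns afterwards.
out = (
    df.explode("datetime")
      .drop(columns=["date", "startmin", "endmin", "start_ts", "end_ts"])
      .reset_index(drop=True)
)
out["datetime"] = pd.to_datetime(out["datetime"])
print(out.head())
```

The two input rows expand to 53 + 7 = 60 one-minute rows, with status switching from 1 to 2 at 08:53.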

Add columns to DataFrame with difference of specific columns based on values of another column

I have a dataframe that looks something like the following:
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
| B_date | B_Time | F_Type | Fix | Est | S | C_Type | C_Time |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
| 2019-07-22 | 16:42:27.7325458 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:42:47.2129273 |
| 2019-07-22 | 16:44:04.7817750 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:45:26.2923547 |
| 2019-07-22 | 16:48:21.5976290 | 1 | 100 | 100 | 7 | | |
| 2019-07-23 | 13:11:20.4519581 | 1 | 100 | 100 | 7 | | |
| 2019-07-23 | 13:28:49.5092331 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:28:54.5274793 |
| 2019-07-23 | 13:29:06.6108796 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:30:48.5358081 |
| 2019-07-23 | 13:31:12.7684213 | 1 | 100 | 100 | 2 | 3 | 2019-07-23 13:33:50.9405643 |
| 2019-07-25 | 09:32:12.7799801 | 1 | 105 | 105 | 7 | | |
| 2019-07-25 | 09:57:58.4536238 | 1 | 158 | 158 | 4 | | |
| 2019-07-25 | 10:03:22.7888221 | 1 | 152 | 152 | 2 | 2 | 2019-07-25 10:03:27.9576175 |
+------------+------------------+--------+-----+-----+---+--------+-----------------------------+
I need to get output as follows:
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
| B_date | B_Time | F_Type | Fix | Est | S | C_Type | C_Time | cancel_diff_1 | cancel_diff_2 | cancel_diff_3 |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
| 2019-07-22 | 2019-07-22 16:42:27.732545800 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:42:47.212927300 | NaT | 00:00:19.480381 | NaT |
| 2019-07-22 | 2019-07-22 16:44:04.781775000 | 1 | 100 | 100 | 2 | 2 | 2019-07-22 16:45:26.292354700 | NaT | 00:01:21.510579 | NaT |
| 2019-07-22 | 2019-07-22 16:48:21.597629000 | 1 | 100 | 100 | 7 | NaN | NaT | NaT | NaT | NaT |
| 2019-07-23 | 2019-07-23 13:11:20.451958100 | 1 | 100 | 100 | 7 | NaN | NaT | NaT | NaT | NaT |
| 2019-07-23 | 2019-07-23 13:28:49.509233100 | 1 | 100 | 100 | 2 | 2 | 2019-07-23 13:28:54.527479300 | NaT | 00:00:05.018246 | NaT |
+------------+-------------------------------+--------+-----+-----+---+--------+-------------------------------+---------------+-----------------+---------------+
I have actually done this with a function that assigns and checks values row by row, which you could call the plain-Python way, but I want to do it in simple pandas.
IIUC, try this (the date column is B_date, and the row filter has to be on C_Type, not C_Time):
df['B_Time'] = pd.to_datetime(df['B_date'].astype(str) + ' ' + df['B_Time'].astype(str))
df['C_Time'] = pd.to_datetime(df['C_Time'])
for c_type in (1, 2, 3):
    mask = df['C_Type'] == c_type
    df[f'cancel_diff_{c_type}'] = (df['C_Time'] - df['B_Time']).where(mask)
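Here is a small runnable sketch of that approach on three rows taken from the question, assuming C_Type is numeric with NaN for the blank cells and C_Time is missing where blank. The name elapsed is only for illustration.

```python
import numpy as np
import pandas as pd

# Three rows from the question's table.
df = pd.DataFrame({
    "B_date": ["2019-07-22", "2019-07-22", "2019-07-23"],
    "B_Time": ["16:42:27.7325458", "16:48:21.5976290", "13:31:12.7684213"],
    "C_Type": [2, np.nan, 3],
    "C_Time": [
        "2019-07-22 16:42:47.2129273",
        None,
        "2019-07-23 13:33:50.9405643",
    ],
})

# Combine date and time into one timestamp, and parse C_Time as well.
df["B_Time"] = pd.to_datetime(df["B_date"] + " " + df["B_Time"])
df["C_Time"] = pd.to_datetime(df["C_Time"])

# Compute the difference once, then keep it only for the matching C_Type.
elapsed = df["C_Time"] - df["B_Time"]
for c_type in (1, 2, 3):
    df[f"cancel_diff_{c_type}"] = elapsed.where(df["C_Type"] == c_type)

print(df[["C_Type", "cancel_diff_1", "cancel_diff_2", "cancel_diff_3"]])
```

Rows whose C_Type does not match, or whose C_Time is missing, end up as NaT in the corresponding column, which matches the layout of the requested output.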
