conditional dataframe shift - python

I have the dataframe below:
ID1 ID2 mon price
10 2 06 500
20 3 07 200
20 3 08 300
20 3 09 400
21 2 07 100
21 2 08 200
21 2 09 300
Required output:
ID1 ID2 mon price ID1_shift ID2_shift mon_shift price_shift
10 2 06 500 10 2 06 500
20 3 07 200 20 3 07 200
20 3 08 300 20 3 07 200
20 3 09 400 20 3 08 300
21 2 07 100 21 2 07 100
21 2 08 200 21 2 07 100
21 2 09 300 21 2 08 200
I tried using df.shift() in different ways but was not successful.
Your valuable comments will be helpful.
I want to shift the dataframe grouped by (ID1, ID2) and, where the shift produces NaN, fill with the current row's values.
I tried the line below, but it works only for a single column.
df["price_shift"]=df.groupby(["ID1","ID2"]).price.shift().fillna(df["price"])
Thanks
I came up with the code below, but it is only feasible for a small number of columns. Is there a way to shift the complete row with a group by, as above?
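import pandas as pd
# df1 starts empty but shares df's index so the shifted columns align row by row
df1=pd.DataFrame(index=df.index)
df1['price_shift']=df.groupby(['ID1','ID2']).price.shift(1).fillna(df['price'])
df1['mon_shift']=df.groupby(['ID1','ID2']).mon.shift(1).fillna(df['mon'])
df1[['ID1_shift','ID2_shift']]=df[['ID1','ID2']]
df2=pd.concat([df, df1],axis=1)
df2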

Try the below:
for column_name in df.columns:
    df[column_name+"_shift"]=df[column_name]
cheers
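Note that the loop above copies each column unchanged, so it does not apply the per-group shift the question asks for. A minimal sketch of one way to shift every non-key column at once might look like the following. The names `keys`, `value_cols`, `shifted` and `result` are illustrative only, and the sample frame is rebuilt from the question's input table, with `mon` assumed to be stored as strings.

```python
import pandas as pd

# Sample data rebuilt from the question's input table (mon assumed to be text).
df = pd.DataFrame({
    "ID1": [10, 20, 20, 20, 21, 21, 21],
    "ID2": [2, 3, 3, 3, 2, 2, 2],
    "mon": ["06", "07", "08", "09", "07", "08", "09"],
    "price": [500, 200, 300, 400, 100, 200, 300],
})

keys = ["ID1", "ID2"]
value_cols = [c for c in df.columns if c not in keys]

# Shift all non-key columns by one row within each (ID1, ID2) group.
shifted = df.groupby(keys)[value_cols].shift()

# The first row of each group has nothing before it: fall back to its own values.
shifted = shifted.fillna(df[value_cols])

# shift() introduces NaN, which turns integer columns into floats; cast back.
shifted = shifted.astype(df[value_cols].dtypes.to_dict())

# The key columns are constant within a group, so they are copied unchanged.
shifted = df[keys].join(shifted)[df.columns]

result = df.join(shifted.add_suffix("_shift"))
print(result)
```

Because the column list is derived from `df.columns`, the same code should carry over to wider frames without listing each column by hand.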

Related

add months in an existing data frame python

Year Price
2017 200
2018 250
2019 300
Given the table above, is there a way to add months to each year? For example, 2017 should have the months Jan to Dec, with the same price carried forward across all 12 months, for every year listed in a Pandas data frame. The expected output would start like this:
Year Price
2017/01/01 200
2017/02/01 200
2017/03/01 200
2017/04/01 200
2017/05/01 200
There's probably a better answer out there (I know very little Pandas), but one thing that comes to mind is:
Get the date represented by your numeric "Year". That will give you January 1st at midnight of that year. You can drop the time part and keep just the date (January 1st of that year).
At this point your first row is January (month 1). Then you can replicate the row, changing the month of "Year" to 2 (February), 3 (March), and so on up to 12 (December), and insert each copy back into the DataFrame.
import pandas as pd
df = pd.DataFrame([
{"Year": 2017, "Price": 200},
{"Year": 2018, "Price": 300},
{"Year": 2019, "Price": 400},
])
df["Year"] = pd.to_datetime(df["Year"], format='%Y').dt.date
for idx, row in df.iterrows():
    for i in range(2, 13):
        row["Year"] = row["Year"].replace(month=i)
        df = pd.concat([df, row.to_frame().T])
df = df.sort_values(['Year']).reset_index(drop=True)
print(df)
# Year Price
# 0 2017-01-01 200
# 1 2017-02-01 200
# 2 2017-03-01 200
# 3 2017-04-01 200
# 4 2017-05-01 200
# 5 2017-06-01 200
# 6 2017-07-01 200
# 7 2017-08-01 200
# 8 2017-09-01 200
# 9 2017-10-01 200
# 10 2017-11-01 200
# 11 2017-12-01 200
# 12 2018-01-01 300
# 13 2018-02-01 300
# 14 2018-03-01 300
# 15 2018-04-01 300
# 16 2018-05-01 300
# 17 2018-06-01 300
# 18 2018-07-01 300
# 19 2018-08-01 300
# 20 2018-09-01 300
# 21 2018-10-01 300
# 22 2018-11-01 300
# 23 2018-12-01 300
# 24 2019-01-01 400
# 25 2019-02-01 400
# 26 2019-03-01 400
# 27 2019-04-01 400
# 28 2019-05-01 400
# 29 2019-06-01 400
# 30 2019-07-01 400
# 31 2019-08-01 400
# 32 2019-09-01 400
# 33 2019-10-01 400
# 34 2019-11-01 400
# 35 2019-12-01 400
You could try this:
df.columns = [i.strip() for i in df.columns]
df['Year'] = df['Year'].apply(lambda x: pd.date_range(start=str(x), end=str(x+1), freq='1M').strftime('%m'))
df = df.explode('Year').reset_index(drop=True)
>>>df
Year Price
0 01 200
1 02 200
2 03 200
3 04 200
4 05 200
5 06 200
6 07 200
7 08 200
8 09 200
9 10 200
10 11 200
11 12 200
12 01 250
13 02 250
14 03 250
15 04 250
16 05 250
17 06 250
18 07 250
19 08 250
20 09 250
21 10 250
22 11 250
23 12 250
24 01 300
25 02 300
26 03 300
27 04 300
28 05 300
29 06 300
30 07 300
31 08 300
32 09 300
33 10 300
34 11 300
35 12 300
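The output above keeps only the month number, so the year is lost. If full dates are wanted, as in the question's expected output, a variation of the same explode idea could look like this sketch. It assumes the `Year` column holds integers and uses the `"MS"` (month start) frequency alias with `periods=12`.

```python
import pandas as pd

df = pd.DataFrame({"Year": [2017, 2018, 2019], "Price": [200, 250, 300]})

# Replace each year with a list of its 12 month-start timestamps.
df["Year"] = df["Year"].apply(
    lambda y: list(pd.date_range(start=f"{y}-01-01", periods=12, freq="MS"))
)

# One row per (month, price) pair; explode leaves an object column, so convert back.
df = df.explode("Year").reset_index(drop=True)
df["Year"] = pd.to_datetime(df["Year"])

print(df.head())
```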
Create a dataframe with months 1-12
Cross merge that with your original data
Create a date out of the year, month, and day 1
Sample code:
from datetime import datetime
import pandas as pd
years = [2017, 2018, 2019, 2020, 2021, 2022]
prices = [200, 250, 300, 350, 350, 317]
your_df = pd.DataFrame(data=[(x, y) for x, y in zip(years, prices)], columns=["Year","Price"])
months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
m_df = pd.DataFrame(data=months, columns=["Month"])
# The cross merge pairs every year row with every month
final_df = your_df.merge(m_df, how="cross")
final_df["Year"] = [datetime(y, m, 1) for y,m in zip(final_df.Year, final_df.Month)]
final_df = final_df.drop(columns="Month")
final_df

How to check the nearest matching value between two fields in same table and add data to the third field using Pandas?

I have one table:
Index Month_1 Month_2 Paid
01 12 10
02 09 03
03 02 04
04 01 08
The output should be:
Index Month_1 Month_2 Paid
01 12 10 Yes
02 09 03
03 02 04 Yes
04 01 08
Logic: Add 'Yes' to the Paid field whose Month_1 and Month_2 are nearby
You can subtract the columns, take absolute values, check whether the result is less than or equal to a threshold (e.g. 2), and then set the values with numpy.where:
df['Paid'] = np.where(df['Month_1'].sub(df['Month_2']).abs().le(2), 'Yes','')
print (df)
Index Month_1 Month_2 Paid
0 01 12 10 Yes
1 02 9 3
2 03 2 4 Yes
3 04 1 8
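As a self-contained sketch of the same idea, the snippet below rebuilds the question's table and applies the comparison. The `threshold` value of 2 is the example value from the answer, not something the question specifies, and `diff` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Table rebuilt from the question; the month columns are assumed to be integers.
df = pd.DataFrame({
    "Index": ["01", "02", "03", "04"],
    "Month_1": [12, 9, 2, 1],
    "Month_2": [10, 3, 4, 8],
})

threshold = 2  # assumed meaning of "nearby"

# Absolute distance between the two month columns, row by row.
diff = (df["Month_1"] - df["Month_2"]).abs()

# "Yes" where the months are within the threshold, empty string otherwise.
df["Paid"] = np.where(diff <= threshold, "Yes", "")
print(df)
```

This treats months as plain numbers, so December (12) and January (1) count as 11 apart. If they should count as neighbours, a circular distance would be needed; the question does not say either way.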

How to find cumulative sum of specific column in CSV file

I have a csv file in the format:
20 05 2019 12:00:00, 100
21 05 2019 12:00:00, 200
22 05 2019 12:00:00, 480
I want to access the second column and compute its cumulative sum. I've tried a variety of alterations, but none have worked.
Initially I tried:
import pandas as pd
import numpy as np
col = [i for i in range(2)]
col[1] = "Power"
data = pd.read_csv('FILENAME.csv', names=col)
df1 = data.sum(data, axis=1)
df2 = np.cumsum(df1)
print(df2)
You can use cumsum function:
data['Power'].cumsum()
Output:
0 100
1 300
2 780
Name: Power, dtype: int64
Use df.cumsum:
In [1820]: df = pd.read_csv('FILENAME.csv', names=col)
In [1821]: df
Out[1821]:
0 Power
0 20 05 2019 12:00:00 100
1 21 05 2019 12:00:00 200
2 22 05 2019 12:00:00 480
In [1823]: df['cumulative sum'] = df['Power'].cumsum()
In [1824]: df
Out[1824]:
0 Power cumulative sum
0 20 05 2019 12:00:00 100 100
1 21 05 2019 12:00:00 200 300
2 22 05 2019 12:00:00 480 780
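For a version that runs without the file on disk, here is a minimal sketch that reads the question's three rows from an in-memory string. The column name `Timestamp` is an assumption (the original code leaves the first column named `0`), and `skipinitialspace=True` is used to drop the space after each comma.

```python
import io
import pandas as pd

# The three sample rows from the question, standing in for FILENAME.csv.
csv_text = """20 05 2019 12:00:00, 100
21 05 2019 12:00:00, 200
22 05 2019 12:00:00, 480
"""

data = pd.read_csv(
    io.StringIO(csv_text),
    names=["Timestamp", "Power"],  # "Timestamp" is a hypothetical column name
    skipinitialspace=True,
)

# Running total of the second column.
data["cumulative sum"] = data["Power"].cumsum()
print(data)
```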

Pandas create column for mean of grouped count

I have a grouped dataframe that looks as follows:
player_id shot_type count
01 03 3
02 01 3
03 2
03 01 4
I want to add an additional column which is the mean of the shot_type counts by player_id which would look as follows:
player_id shot_type count mean_shot_type_count_player
01 03 3 (3+2)/2
02 01 3 (3+4)/2
03 2 (3+2)/2
03 01 4 (3+4)/2
Use GroupBy.transform:
df['mean_shot_type_count_player']=df.groupby('shot_type')['count'].transform('mean')
print(df)
Output:
player_id shot_type count mean_shot_type_count_player
0 01 03 3 2.5
1 02 01 3 3.5
2 03 2 2.5
3 03 01 4 3.5
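A runnable sketch of the transform approach, with the frame rebuilt from the question. The blank player_id in the third row is assumed to be 02, as the grouped display suggests.

```python
import pandas as pd

# Rebuilt from the question; row 3's player_id is assumed to be "02".
df = pd.DataFrame({
    "player_id": ["01", "02", "02", "03"],
    "shot_type": ["03", "01", "03", "01"],
    "count": [3, 3, 2, 4],
})

# transform returns one value per original row, so it can be assigned directly,
# unlike an aggregation, which would return one row per group.
df["mean_shot_type_count_player"] = (
    df.groupby("shot_type")["count"].transform("mean")
)
print(df)
```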

How to create a column for each year from a single date column containing year and month?

If I have data like this:
Date Values
2005-01 10
2005-02 20
2005-03 30
2006-01 40
2006-02 50
2006-03 70
How can I turn the years into columns, like this?
Date 2005 2006
01 10 40
02 20 50
03 30 70
Thanks.
You can use split with pivot:
df[['year','month']] = df.Date.str.split('-', expand=True)
df = df.pivot(index='month', columns='year', values='Values')
print (df)
year 2005 2006
month
01 10 40
02 20 50
03 30 70
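The same steps as a self-contained sketch, assuming `Date` is stored as "YYYY-MM" strings. The name `wide` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2005-01", "2005-02", "2005-03", "2006-01", "2006-02", "2006-03"],
    "Values": [10, 20, 30, 40, 50, 70],
})

# Split "YYYY-MM" into two text columns.
df[["year", "month"]] = df["Date"].str.split("-", expand=True)

# One row per month, one column per year.
wide = df.pivot(index="month", columns="year", values="Values")
print(wide)
```

Since the split produces strings, the resulting column labels are "2005" and "2006" as text, not integers.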
