To fill a dataframe column by values in another dataframe column values - python

I have 2 dataframes:
df1 =
item sale
0 7 10.0
1 4 10.0
2 6 10.0
3 5 10.0
4 5 10.0
5 6 10.0
6 4 10.0
df2 =
item sale
0 1 7
1 2 6
2 3 5
3 4 4
4 5 3
I want to change the values of df1 sales column, taking the values from df2 sales column.
I use the code:
df1.loc[df1.item.isin(df2.item), ['sale']] = df2[['sale']]
And I get
df1 =
item sale
0 7 10.0
1 4 6.0
2 6 10.0
3 5 4.0
4 5 3.0
5 6 10.0
6 4 NaN
The output I wanted was:
df1 =
item sale
0 7 10.0
1 4 4.0
2 6 10.0
3 5 3.0
4 5 3.0
5 6 10.0
6 4 4.0

The two dataframes are related by item number. So, set the item number as the index on both dataframes, run an update method on df1 with df2, and reset index
df1 = df1.set_index("item")
df1.update(df2.set_index("item"))
df1.reset_index()
item sale
0 7 10.0
1 4 4.0
2 6 10.0
3 5 3.0
4 5 3.0
5 6 10.0
6 4 4.0

Related

Jupyter Notebook for csv file to select 3 window rolling [duplicate]

This question already has answers here:
Rolling or sliding window iterator?
(29 answers)
Closed 1 year ago.
I have this big data in csv file:
I manage to open this on Jupyter Notebook.
The data in csv example: 1 2 3 4 5 6 7 8 9 10
And I wanted to open in the notebook as '3 windows rolling' without doing any (sum,mean for example)
The output I want in the notebook are>>
First open csv to get first column.
import pandas as pd
df = pd.read_csv("filename.csv")
I will use io only to simulate data from file
text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
Result
first
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
Next you can use shift to create other columns
df['second'] = df['first'].shift(-1)
df['third'] = df['first'].shift(-2)
Result
first second third
0 1 2.0 3.0
1 2 3.0 4.0
2 3 4.0 5.0
3 4 5.0 6.0
4 5 6.0 7.0
5 6 7.0 8.0
6 7 8.0 9.0
7 8 9.0 10.0
8 9 10.0 NaN
9 10 NaN NaN
At the end you can remove two last rows with NaN and convert all to integer
df = df[:-2].astype(int)
or if you don't have NaN in other places
df = df.dropna().astype(int)
Result:
first second third
0 1 2 3
1 2 3 4
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
6 7 8 9
7 8 9 10
Minimal working code
text = """first
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['first'])
print(df)
df['second'] = df['first'].shift(-1) #, fill_value=0)
df['third'] = df['first'].shift(-2)
print(df)
#df = df.dropna().astype(int)
df = df[:-2].astype(int)
print(df)
EDIT:
The same using for-loop to create any number of columns
text = """col 1
1
2
3
4
5
6
7
8
9
10"""
import pandas as pd
import io
df = pd.read_csv(io.StringIO(text))
#df = pd.DataFrame(range(1,11), columns=['col 1'])
print(df)
number = 5
for x in range(1, number+1):
df[f'col {x+1}'] = df['col 1'].shift(-x)
print(df)
#df = df.dropna().astype(int)
df = df[:-number].astype(int)
print(df)
Result
col 1
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
col 1 col 2 col 3 col 4 col 5 col 6
0 1 2.0 3.0 4.0 5.0 6.0
1 2 3.0 4.0 5.0 6.0 7.0
2 3 4.0 5.0 6.0 7.0 8.0
3 4 5.0 6.0 7.0 8.0 9.0
4 5 6.0 7.0 8.0 9.0 10.0
5 6 7.0 8.0 9.0 10.0 NaN
6 7 8.0 9.0 10.0 NaN NaN
7 8 9.0 10.0 NaN NaN NaN
8 9 10.0 NaN NaN NaN NaN
9 10 NaN NaN NaN NaN NaN
col 1 col 2 col 3 col 4 col 5 col 6
0 1 2 3 4 5 6
1 2 3 4 5 6 7
2 3 4 5 6 7 8
3 4 5 6 7 8 9
4 5 6 7 8 9 10

How to fill NaN in one column depending from values two different columns

I have a dataframe with three columns. Two of them are group and subgroup, adn the third one is a value. I have some NaN values in the values column. I need to fiil them by median values,according to group and subgroup.
I made a pivot table with double index and the median of target column. But I don`t understand how to get this values and put them into original dataframe
import pandas as pd
df=pd.DataFrame(data=[
[1,1,'A',1],
[2,1,'A',3],
[3,3,'B',8],
[4,2,'C',1],
[5,3,'A',3],
[6,2,'C',6],
[7,1,'B',2],
[8,1,'C',3],
[9,2,'A',7],
[10,3,'C',4],
[11,2,'B',6],
[12,1,'A'],
[13,1,'C'],
[14,2,'B'],
[15,3,'A']],columns=['id','group','subgroup','value'])
print(df)
id group subgroup value
0 1 1 A 1
1 2 1 A 3
2 3 3 B 8
3 4 2 C 1
4 5 3 A 3
5 6 2 C 6
6 7 1 B 2
7 8 1 C 3
8 9 2 A 7
9 10 3 C 4
10 11 2 B 6
11 12 1 A NaN
12 13 1 C NaN
13 14 2 B NaN
14 15 3 A NaN
df_struct=df.pivot_table(index=['group','subgroup'],values='value',aggfunc='median')
print(df_struct)
value
group subgroup
1 A 2.0
B 2.0
C 3.0
2 A 7.0
B 6.0
C 3.5
3 A 3.0
B 8.0
C 4.0
Will be thankfull for any help
Use pandas.DataFrame.groupby.transform then fillna:
id group subgroup value
0 1 1 A 1.0
1 2 1 A NaN # < Value with nan
2 3 3 B 8.0
3 4 2 C 1.0
4 5 3 A 3.0
5 6 2 C 6.0
6 7 1 B 2.0
7 8 1 C 3.0
8 9 2 A 7.0
9 10 3 C 4.0
10 11 2 B 6.0
df['value'] = df['value'].fillna(df.groupby(['group', 'subgroup'])['value'].transform('median'))
print(df)
Output:
id group subgroup value
0 1 1 A 1.0
1 2 1 A 1.0
2 3 3 B 8.0
3 4 2 C 1.0
4 5 3 A 3.0
5 6 2 C 6.0
6 7 1 B 2.0
7 8 1 C 3.0
8 9 2 A 7.0
9 10 3 C 4.0
10 11 2 B 6.0

GroupBy and Transform does not keep all columns of dataframe

Let's suppose that I am having the following dataset:
Stock_id Week Stock_value
1 1 2
1 2 4
1 4 7
1 5 1
2 3 8
2 4 6
2 5 5
2 6 3
I want to shift the values of the Stock_value column by one position so that I get the following:
Stock_id Week Stock_value
1 1 NA
1 2 2
1 4 4
1 5 7
2 3 NA
2 4 8
2 5 6
2 6 5
What I am doing is the following:
df = pd.read_csv('C:/Users/user/Desktop/test.txt', keep_default_na=True, sep='\t')
df = df.groupby('Store_id', as_index=False)['Waiting_time'].transform(lambda x:x.shift(periods=1))
But then this gives me:
Waiting_time
0 NaN
1 2.0
2 4.0
3 7.0
4 NaN
5 8.0
6 6.0
7 5.0
So it gives me the values shifted but it does not retain all the columns of the dataframe.
How do I also retain all the columns of the dataframe along with shifting the values of one column?
You can simplify solution by DataFrameGroupBy.shift and assign back to new column:
df['Waiting_time'] = df.groupby('Stock_id')['Stock_value'].shift()
Working same like:
df['Waiting_time']=df.groupby('Stock_id')['Stock_value'].transform(lambda x:x.shift(periods=1))
print (df)
Stock_id Week Stock_value Waiting_time
0 1 1 2 NaN
1 1 2 4 2.0
2 1 4 7 4.0
3 1 5 1 7.0
4 2 3 8 NaN
5 2 4 6 8.0
6 2 5 5 6.0
7 2 6 3 5.0
When you do df.groupby('Store_id', as_index=False)['Waiting_time'], you obtain a DataFrame with a single column 'Waiting_time', you can't generate the other columns from that.
As suggested by jezrael in the comment, you should do
df['new col'] = df.groupby('Store_id...
to add this new column to your previously existing DataFrame.

How to do forward filling for each group in pandas

I have a dataframe similar to below
id A B C D E
1 2 3 4 5 5
1 NaN 4 NaN 6 7
2 3 4 5 6 6
2 NaN NaN 5 4 1
I want to do a null value imputation for columns A, B, C in a forward filling but for each group. That means, I want the forward filling be applied on each id. How can I do that?
Use GroupBy.ffill for forward filling per groups for all columns, but if first values per groups are NaNs there is no replace, so is possible use fillna and last casting to integers:
print (df)
id A B C D E
0 1 2.0 3.0 4.0 5 NaN
1 1 NaN 4.0 NaN 6 NaN
2 2 3.0 4.0 5.0 6 6.0
3 2 NaN NaN 5.0 4 1.0
cols = ['A','B','C']
df[['id'] + cols] = df.groupby('id')[cols].ffill().fillna(0).astype(int)
print (df)
id A B C D E
0 1 2 3 4 5 NaN
1 1 2 4 4 6 NaN
2 2 3 4 5 6 6.0
3 2 3 4 5 4 1.0
Detail:
print (df.groupby('id')[cols].ffill().fillna(0).astype(int))
id A B C
0 1 2 3 4
1 1 2 4 4
2 2 3 4 5
3 2 3 4 5
Or:
cols = ['A','B','C']
df.update(df.groupby('id')[cols].ffill().fillna(0))
print (df)
id A B C D E
0 1 2.0 3.0 4.0 5 NaN
1 1 2.0 4.0 4.0 6 NaN
2 2 3.0 4.0 5.0 6 6.0
3 2 3.0 4.0 5.0 4 1.0

Select a range of column base on value of another column in pandas

My dataset looks like this(first row is header)
0 1 2 3 4 5
1 3 4 6 2 3
3 8 9 3 2 4
2 2 3 2 1 2
I want to select a range of columns of the dataset based on the column [5], e.g:
1 3 4
3 8 9 3
2 2
I have tried the following, but it did not work:
df.iloc[:,0:df['5'].values]
Let's try:
df.apply(lambda x: x[:x.iloc[5]], 1)
Output:
0 1 2 3
0 1.0 3.0 4.0 NaN
1 3.0 8.0 9.0 3.0
2 2.0 2.0 NaN NaN
Recreate your dataframe
df=pd.DataFrame([x[:x[5]] for x in df.values]).fillna(0)
df
Out[184]:
0 1 2 3
0 1 3 4.0 0.0
1 3 8 9.0 3.0
2 2 2 0.0 0.0

Categories

Resources