Sum a specific row across two dataframes - python

I have two dataframes with two-level column headers, like this:
df1:
  Name  A     B     C
        a  b  r  t  y  U
0  xyz  1  2  3  4  3  4
1  abc  3  5  4  7  7  8
2  pqr  2  4  4  5  4  6
df2:
  Name    A         B         C
          a    b    r    t    y    U
0  xyz  NaN  NaN  NaN  NaN  NaN  NaN
1  abc    2    4    5    7    7    9
2  pqr  NaN  NaN  NaN  NaN  NaN  NaN
I want a dataframe like this:
  Name    A         B         C
          a    b    r    t    y    U
0  xyz  NaN  NaN  NaN  NaN  NaN  NaN
1  abc    5    9    9   14   14   17
2  pqr  NaN  NaN  NaN  NaN  NaN  NaN
Basically, I want the sum for the abc row only.

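First check what the column names are. With a two-level header the first column is most likely the tuple ('Name', ''), so set it as the index in both dataframes and then add them. Rows are aligned by name, and any value that is NaN in either frame stays NaN in the result: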
print (df1.columns.tolist())
print (df2.columns.tolist())
df1 = df1.set_index([('Name', '')])
df2 = df2.set_index([('Name', '')])
# alternative: pick the first column by position instead of by label
#df1 = df1.set_index([df1.columns[0]])
#df2 = df2.set_index([df2.columns[0]])
df = df1.add(df2)
Or:
df = df1 + df2
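For reference, here is a minimal self-contained sketch of the approach above. The two frames are rebuilt from the tables in the question; building the header with pd.MultiIndex.from_tuples is an assumption about how the original data was created, and the names left, right and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Two-level column header as printed in the question (assumed construction).
cols = pd.MultiIndex.from_tuples([
    ("Name", ""), ("A", "a"), ("A", "b"),
    ("B", "r"), ("B", "t"), ("C", "y"), ("C", "U"),
])

df1 = pd.DataFrame(
    [["xyz", 1, 2, 3, 4, 3, 4],
     ["abc", 3, 5, 4, 7, 7, 8],
     ["pqr", 2, 4, 4, 5, 4, 6]],
    columns=cols,
)
df2 = pd.DataFrame(
    [["xyz"] + [np.nan] * 6,
     ["abc", 2, 4, 5, 7, 7, 9],
     ["pqr"] + [np.nan] * 6],
    columns=cols,
)

# Move the name column into the index so that only numeric columns are added
# and rows are matched by name rather than by position.
left = df1.set_index([("Name", "")])
right = df2.set_index([("Name", "")])

# NaN + number is NaN, so only the "abc" row ends up holding sums.
result = left.add(right)
print(result)
```

If the names should come back as an ordinary column afterwards, result.reset_index() is one option.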

Related

How to create column based on previous value within the same group in pandas DataFrame?

I have a dataframe as below and want to create a column (date_fill) by filling the value in "date" column with the previous value if the value in "name" column is the same.
Input:
name group date
0 a sdf NaN
1 a dss NaN
2 a fff 2022-10-23
3 b ggg NaN
4 b few NaN
5 b mjf NaN
6 c ggj NaN
7 c ojg NaN
8 c ert NaN
9 c dfg 2022-11-03
Expected output:
name group date date_fill
0 a sdf NaN 2022-10-23
1 a dss NaN 2022-10-23
2 a fff 2022-10-23 2022-10-23
3 b ggg NaN NaN
4 b few NaN NaN
5 b mjf NaN NaN
6 c ggj NaN 2022-11-03
7 c ojg NaN 2022-11-03
8 c ert NaN 2022-11-03
9 c dfg 2022-11-03 2022-11-03
I tried the code below, but it didn't work.
def date_before(df):
    if df['name']==df['name'].shift(-1):
        val = df['date'].shift(-1)
    else:
        val = np.NaN
    return val
df['date_fill'] = df.apply(date_before, axis=1)
Thanks in advance.
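Try this:
df['date_fill'] = df.groupby('name')['date'].bfill()
bfill replaces null values with the next non-null value from later rows, and grouping by name keeps the fill inside each group. Note that ffill, which fills from previous rows, would not give the expected output here, because the date sits on the last row of each group.
The same thing with the groupby object kept in its own variable (do not assign it back to df, since a groupby object does not accept new columns):
g = df.groupby('name')
df['date_fill'] = g['date'].bfill()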
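To see which fill direction reproduces the expected output, here is a small sketch that rebuilds the question's data and compares a grouped forward fill with a grouped backward fill. The column names forward and backward are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": list("aaabbbcccc"),
    "group": ["sdf", "dss", "fff", "ggg", "few", "mjf", "ggj", "ojg", "ert", "dfg"],
    "date": [np.nan, np.nan, "2022-10-23", np.nan, np.nan, np.nan,
             np.nan, np.nan, np.nan, "2022-11-03"],
})

# Forward fill copies a value down to later rows of the same group.
df["forward"] = df.groupby("name")["date"].ffill()

# Backward fill copies a value up to earlier rows of the same group.
df["backward"] = df.groupby("name")["date"].bfill()

print(df)
```

Because each date sits on the last row of its group, only the backward fill populates the earlier rows; group b has no date at all and stays empty in both columns.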

Create dataframe with hierarchical indices and extra columns from non-hierarchically indexed dataframe

Consider a simple dataframe:
import numpy as np
import pandas as pd
x = pd.DataFrame(np.arange(10).reshape(5,2))
print(x)
0 1
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
I would like to create a hierarchically indexed dataframe of the form:
   0       1
   a    b  a    b
0  0  NaN  1  NaN
1  2  NaN  3  NaN
2  4  NaN  5  NaN
3  6  NaN  7  NaN
4  8  NaN  9  NaN
where the 'a' columns correspond to the original dataframe columns and the 'b' columns are blank (or nan).
I can certainly create a hierarchically indexed dataframe with all NaNs and loop over the columns of the original dataframe, writing them into the new dataframe. Is there something more compact than that?
You can do this with MultiIndex.from_product:
extra_level = ['a', 'b']
new_cols = pd.MultiIndex.from_product([x.columns, extra_level])
# take every len(extra_level)-th label, i.e. the (column, 'a') entries
x.columns = new_cols[::len(extra_level)]
x = x.reindex(columns=new_cols)
print(x)
   0       1
   a    b  a    b
0  0  NaN  1  NaN
1  2  NaN  3  NaN
2  4  NaN  5  NaN
3  6  NaN  7  NaN
4  8  NaN  9  NaN
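A short sketch of the same idea on a frame where the number of columns differs from the number of sub-columns. In that case the slice step has to be the length of extra_level, since from_product emits one label per sub-column for every original column. The three-column frame and the names filled and out are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.arange(15).reshape(5, 3))  # three original columns
extra_level = ["a", "b"]                         # two sub-columns per column

# Labels come out as (0,'a'), (0,'b'), (1,'a'), (1,'b'), (2,'a'), (2,'b').
new_cols = pd.MultiIndex.from_product([x.columns, extra_level])

# Every len(extra_level)-th label is the 'a' entry of one original column.
filled = new_cols[::len(extra_level)]

out = x.copy()
out.columns = filled                 # relabel existing data as the 'a' columns
out = out.reindex(columns=new_cols)  # missing 'b' columns are created as NaN
print(out)
```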
Very much like @Ben.T, I am using MultiIndex.from_product:
(x.assign(l='a')
  .set_index('l', append=True)
  .unstack()
  .reindex(pd.MultiIndex.from_product([x.columns.tolist(), ['a','b']]), axis=1))
Output:
   0       1
   a    b  a    b
0  0  NaN  1  NaN
1  2  NaN  3  NaN
2  4  NaN  5  NaN
3  6  NaN  7  NaN
4  8  NaN  9  NaN

How to insert an empty column after each column in an existing dataframe

I have a dataframe that looks as follows
df = pd.DataFrame({"A":[1,2,3,4],
"B":[3,4,5,6],
"C":[2,3,4,5]})
I would like to insert an empty column (with type string) after each existing column in the dataframe, such that the output looks like
A col1 B col2 C col3
0 1 NaN 3 NaN 2 NaN
1 2 NaN 4 NaN 3 NaN
2 3 NaN 5 NaN 4 NaN
3 4 NaN 6 NaN 5 NaN
Actually there's a much simpler way using reindex. Column names that do not exist yet are created and filled with NaN:
df.reindex([x for i, c in enumerate(df.columns, 1) for x in (c, f'col{i}')], axis=1)
Result:
A col1 B col2 C col3
0 1 NaN 3 NaN 2 NaN
1 2 NaN 4 NaN 3 NaN
2 3 NaN 5 NaN 4 NaN
3 4 NaN 6 NaN 5 NaN
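The question asks for empty columns of string type, while a plain reindex creates NaN columns. One possible variation, assuming an empty string counts as "empty" for this purpose, is to pass fill_value to reindex. The names order, with_nan and with_str are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [3, 4, 5, 6],
                   "C": [2, 3, 4, 5]})

# Interleave each existing column with a new name: A, col1, B, col2, C, col3.
order = [name for i, c in enumerate(df.columns, 1) for name in (c, f"col{i}")]

with_nan = df.reindex(columns=order)                 # new columns hold NaN
with_str = df.reindex(columns=order, fill_value="")  # new columns hold ''

print(with_str)
```

fill_value only affects the columns that reindex has to create; the existing data in A, B and C is left as it was.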
Here's the other more complicated way:
import numpy as np
df.join(pd.DataFrame(np.empty(df.shape, dtype=object), columns=df.columns + '_sep')).sort_index(axis=1)
A A_sep B B_sep C C_sep
0 1 None 3 None 2 None
1 2 None 4 None 3 None
2 3 None 5 None 4 None
3 4 None 6 None 5 None
This solution worked for me:
merged = pd.concat([myDataFrame, pd.DataFrame(columns= [' '])], axis=1)
This is what you can do:
for count in range(len(df.columns)):
    # insert right after each original column; note that 'NaN' here is a literal string
    df.insert(count*2+1, f'col{count+1}', 'NaN')
print(df)
Output:
A col1 B col2 C col3
0 1 NaN 3 NaN 2 NaN
1 2 NaN 4 NaN 3 NaN
2 3 NaN 5 NaN 4 NaN
3 4 NaN 6 NaN 5 NaN

pd.wide_to_long() lost data

I'm very new to Python. I've tried to reshape a data set using pd.wide_to_long. The original dataframe looks like this:
chk1 chk2 chk3 ... chf1 chf2 chf3 id var1 var2
0 3 4 2 ... nan nan nan 1 1 0
1 4 4 4 ... nan nan nan 2 1 0
2 2 nan nan ... 3 4 3 3 0 1
3 3 3 3 ... 3 2 2 4 1 0
I used the following code:
df2 = pd.wide_to_long(df,
stubnames=['chk', 'chf'],
i=['id', 'var1', 'var2'],
j='type')
When I check the data after running this code, it looks like this:
                   chk  chf
id var1 var2 type
1  1    0    1       3  nan
             2       4  nan
             3       2  nan
             4     nan  nan
             5       4  nan
             6     nan  nan
             7       4  nan
             8       4  nan
2  1    0    1       4  nan
             2       4  nan
             3       4  nan
             4       5  nan
But when I check the columns in the new data set, it seems that all columns except 'chk' and 'chf' are gone!
df2.columns
Out[47]: Index(['chk', 'chf'], dtype='object')
for col in df2.columns:
    print(col)
chk
chf
From the dataview it looks like 'id', 'var1', 'var2' have been merged into one common index:
(screenshot of the data view)
Can someone please help me? :)
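The behaviour described is consistent with how pd.wide_to_long works: the columns passed as i, together with the new j column, become the row index of the result, so they no longer show up in df2.columns even though the data is still there. A minimal sketch with made-up data shaped like the question's frame (the values and the names long_df and flat are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Made-up data with two numbered columns per stub.
df = pd.DataFrame({
    "chk1": [3, 4], "chk2": [4, 4],
    "chf1": [np.nan, 3], "chf2": [np.nan, 2],
    "id": [1, 2], "var1": [1, 0], "var2": [0, 1],
})

long_df = pd.wide_to_long(df,
                          stubnames=["chk", "chf"],
                          i=["id", "var1", "var2"],
                          j="type")

# The identifier columns are index levels now, not ordinary columns.
print(long_df.index.names)

# reset_index() turns the index levels back into ordinary columns.
flat = long_df.reset_index()
print(flat.columns.tolist())
```

Nothing is lost in the reshape; reset_index() is only needed if the identifiers should be regular columns again.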

How to fill and merge df with 10 empty rows?

How can I fill a df with empty rows, or create a df with empty rows?
I have this df:
df = pd.DataFrame(columns=["naming","type"])
How can I fill it with empty rows?
Specify index values:
df = pd.DataFrame(columns=["naming","type"], index=range(10))
print (df)
naming type
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
If you need empty strings instead of NaN:
df = pd.DataFrame('',columns=["naming","type"], index=range(10))
print (df)
naming type
0
1
2
3
4
5
6
7
8
9
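If the dataframe already holds some rows and the goal is to add 10 empty rows to it, one possible approach is to reindex it onto a longer range of labels. This is only a sketch: the two existing rows are hypothetical, and it assumes the frame has a default integer index.

```python
import pandas as pd

# Hypothetical existing data; the original question starts from an empty frame.
df = pd.DataFrame({"naming": ["x", "y"], "type": ["t1", "t2"]})

# Extend the index by 10 labels; labels that do not exist yet become all-NaN rows.
padded = df.reindex(range(len(df) + 10))

print(padded.shape)
```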
