How to add rows into existing dataframe in pandas?

df = pd.DataFrame({'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10,11,12]})
How can I insert a new row of zeros at index 0 in one single line?
I tried pd.concat([pd.DataFrame([[0,0,0]]), df]) but it did not work.
The desired output:
a b c
0 0 0 0
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12

You can concat a temporary DataFrame with the original df, but you need to give it the same column names so that the columns align in the concatenated result. To get the index you want, call reset_index with drop=True.
In [87]:
pd.concat([pd.DataFrame([[0,0,0]], columns=df.columns),df]).reset_index(drop=True)
Out[87]:
a b c
0 0 0 0
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12

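Alternatively to EdChum's solution, on older pandas versions you can do this (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so use pd.concat on current versions):
In [163]: pd.DataFrame([[0,0,0]], columns=df.columns).append(df, ignore_index=True)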
Out[163]:
a b c
0 0 0 0
1 1 5 9
2 2 6 10
3 3 7 11
4 4 8 12

An answer more specific to the DataFrame being prepended to (it reuses the first row, multiplied by zero, so the column names come along for free):
pd.concat([df.iloc[[0], :] * 0, df]).reset_index(drop=True)
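For reuse, the concat approach from the answers above can be wrapped in a small helper. This is only a minimal sketch, assuming all columns are numeric; the name prepend_zero_row is made up for illustration and is not a pandas function. It uses ignore_index=True, which renumbers the rows in the same way as the reset_index(drop=True) call.

```python
import pandas as pd


def prepend_zero_row(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with one all-zero row inserted at position 0."""
    # Build the new row with the same column labels so concat aligns it.
    zeros = pd.DataFrame([[0] * df.shape[1]], columns=df.columns)
    # ignore_index=True renumbers the rows 0..n in the result.
    return pd.concat([zeros, df], ignore_index=True)


df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [5, 6, 7, 8], 'c': [9, 10, 11, 12]})
result = prepend_zero_row(df)
print(result)
```

The original df is left unchanged; assign the return value back if you want to replace it.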

Related

On DataFrame.pivot(), different result with what I expected

I'm referring to
https://github.com/pandas-dev/pandas/tree/main/doc/cheatsheet.
According to the cheatsheet, after pivot() all values should end up in rows 0 and 1.
But when I actually call pivot(), the result is different, as shown below.
[Images in the original post: DataFrame before pivot(); DataFrame after pivot()]
Is this result intended?
In your data, the grey column (index of the row) is missing:
df = pd.DataFrame({'variable': list('aaabbbccc'), 'value': range(9)})
print(df)
# Output
variable value
0 a 0
1 a 1
2 a 2
3 b 3
4 b 4
5 b 5
6 c 6
7 c 7
8 c 8
Add the grey column:
df['grey'] = df.groupby('variable').cumcount()
print(df)
# Output
variable value grey
0 a 0 0
1 a 1 1
2 a 2 2
3 b 3 0
4 b 4 1
5 b 5 2
6 c 6 0
7 c 7 1
8 c 8 2
Now you can pivot:
df = df.pivot(index='grey', columns='variable', values='value')
print(df)
# Output
variable a b c
grey
0 0 3 6
1 1 4 7
2 2 5 8
Take the time to read How can I pivot a dataframe?
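To see why the extra column matters, here is a small self-contained sketch that pivots the same data both ways. The variable names sparse and wide are just illustrative. Without a per-group counter, pivot() keeps the original row labels 0..8 as the index, so each row has a value in only one column; with the cumcount() column as the index, the values line up in three rows.

```python
import pandas as pd

df = pd.DataFrame({'variable': list('aaabbbccc'), 'value': range(9)})

# No index given: the existing row labels (0..8) are used as the index,
# so every row holds one value and two NaNs.
sparse = df.pivot(columns='variable', values='value')

# Number the rows within each 'variable' group: 0, 1, 2 for a, b and c.
df['grey'] = df.groupby('variable').cumcount()

# Rows sharing the same 'grey' value now collapse onto one output row.
wide = df.pivot(index='grey', columns='variable', values='value')

print(sparse.shape)
print(wide)
```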

Create a new df column with the maximum of the other columns involving shift()

I have a DF as follows:
df=pd.DataFrame({'a':[0,1,0,1,2,4,5,1,7,8,0],'b':[0,1,2,4,5,7,1,3,1,0,1]})
I want to produce a new column called 'new' that is based on the maximum of column a and the previous row's value of column b (i.e. 'b' shift()).
I tried the following:
df['new']=df[['a',df['b'].shift()]].max(axis=1)
Yet I get an error message about a mutable type.
My desired output is as follows:
IIUC, you can try:
df['new_col'] = df.assign(b = df['b'].shift(fill_value = 0)).max(axis =1)
OUTPUT:
a b new_col
0 0 0 0
1 1 1 1
2 0 2 1
3 1 4 2
4 2 5 4
5 4 7 5
6 5 1 7
7 1 3 1
8 7 1 7
9 8 0 8
10 0 1 0
NOTE: If you want to be more specific about the columns:
df['new_col'] = df.assign(b=df['b'].shift(fill_value=0))[
['a', 'b']].max(axis=1)
Or, with concat (axis must be passed as a keyword in pandas 2.0 and later):
df['new_col'] = pd.concat([df['b'].shift(fill_value=0), df['a']], axis=1).max(axis=1)
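The same result can also be reached by shifting b into its own Series and taking the element-wise maximum. This is a different technique from the assign/concat versions above (it uses numpy.maximum), shown here only as a sketch; prev_b is an illustrative name. The attempt in the question most likely failed because the inner list mixed a column label with a whole Series, and a Series cannot be used as a column label.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 0, 1, 2, 4, 5, 1, 7, 8, 0],
                   'b': [0, 1, 2, 4, 5, 7, 1, 3, 1, 0, 1]})

# Previous row's value of b; the first row has no predecessor, so use 0.
prev_b = df['b'].shift(fill_value=0)

# Element-wise maximum of the two aligned Series.
df['new'] = np.maximum(df['a'], prev_b)

print(df)
```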

How to set value to a cell filtered by rows in python DataFrame?

import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6],[7,8,9],[10,11,12]],columns=['A','B','C'])
df[df['B']%2 ==0]['C'] = 5
I am expecting this code to change the value of column C to 5 wherever B is even, but it is not working.
It returns the table as follows:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 10 11 12
I am expecting it to return
A B C
0 1 2 5
1 4 5 6
2 7 8 5
3 10 11 12
To change values of a column in a DataFrame, use DataFrame.loc with the condition and the column name:
df.loc[df['B']%2 ==0, 'C'] = 5
print (df)
A B C
0 1 2 5
1 4 5 6
2 7 8 5
3 10 11 12
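Your solution is a nice example of chained indexing (see the pandas docs on "Returning a view versus a copy").
You could just change the order to:
df['C'][df['B']%2 == 0] = 5
This can work on older pandas versions, but it is still chained assignment: it may raise a warning and it does not update df when Copy-on-Write is enabled, so prefer df.loc.
Using numpy.where (requires import numpy as np):
df['C'] = np.where(df['B']%2 == 0, 5, df['C'])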
Output
A B C
0 1 2 5
1 4 5 6
2 7 8 5
3 10 11 12
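Putting the .loc answer together as a runnable snippet, with the condition stored in a named boolean mask (the name is_even is just for illustration). The point is that the row filter and the column label go into a single .loc call, so pandas assigns into df itself rather than into a temporary copy.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],
                  columns=['A', 'B', 'C'])

# Boolean Series: True where B is even.
is_even = df['B'] % 2 == 0

# One indexing operation: rows selected by the mask, column 'C'.
df.loc[is_even, 'C'] = 5

print(df)
```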

Pandas: how to add row values by index value

I'm having trouble working out how to add the index value of a pandas dataframe to each value at that index. For example, if I have a dataframe of zeroes, the row with index 1 should have a value of 1 for all columns. The row at index 2 should have values of 2 for each column, and so on.
Can someone enlighten me please?
You can use pd.DataFrame.add with axis=0. Just remember, as below, to convert your index to a series first.
df = pd.DataFrame(np.random.randint(0, 10, (5, 5)))
print(df)
0 1 2 3 4
0 3 4 2 2 2
1 9 6 1 8 0
2 2 9 0 5 3
3 3 1 1 7 0
4 2 6 3 6 6
df = df.add(df.index.to_series(), axis=0)
print(df)
0 1 2 3 4
0 3 4 2 2 2
1 10 7 2 9 1
2 4 11 2 7 5
3 6 4 4 10 3
4 6 10 7 10 10
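Since the example above uses random numbers, here is the same call on the DataFrame of zeros described in the question, so the effect is easy to check by eye. A small sketch; the column names x, y, z are made up.

```python
import numpy as np
import pandas as pd

# A 4x3 frame of zeros with the default index 0..3.
df = pd.DataFrame(np.zeros((4, 3), dtype=int), columns=['x', 'y', 'z'])

# axis=0 aligns the Series with the row labels, so each row
# gets its own index value added to every column.
out = df.add(df.index.to_series(), axis=0)

print(out)
```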

Index of matching rows in Pandas DataFrame [Python]

I have two Pandas DataFrames (A and B) with 2 columns and different number of rows.
They used to be numpy 2D matrices and they both contain integer values.
Is there any way to retrieve the indices of matching rows between those two?
I've been trying isin() or query() or merge(), without success.
This is actually a follow-up to a previous question: I'm trying with pandas dataframes since the original matrices are rather huge.
The desired output, if possible, should be an array (or list) containing in the i-th position the row index in B for the i-th row of A. E.g. an output list of [1,5,4] means that the first row of A has been found in the first row of B, the second row of A has been found in the fifth row of B and the third row of A has been found in the fourth row of B.
I would do it this way:
In [199]: df1.reset_index().merge(df2.reset_index(), on=['a','b'])
Out[199]:
index_x a b index_y
0 1 9 1 17
1 3 4 0 4
or like this:
In [211]: pd.merge(df1.reset_index(), df2.reset_index(), on=['a','b'], suffixes=['_1','_2'])
Out[211]:
index_1 a b index_2
0 1 9 1 17
1 3 4 0 4
data:
In [201]: df1
Out[201]:
a b
0 1 9
1 9 1
2 8 1
3 4 0
4 2 0
5 2 2
6 2 9
7 1 1
8 4 3
9 0 4
In [202]: df2
Out[202]:
a b
0 3 5
1 5 0
2 7 8
3 6 8
4 4 0
5 1 5
6 9 0
7 9 4
8 0 9
9 0 1
10 6 9
11 6 7
12 3 3
13 5 1
14 4 2
15 5 0
16 9 5
17 9 1
18 1 6
19 9 5
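If you want the list asked for in the question (one entry per row of A), a left merge is one option. This is only a sketch under two assumptions: the rows of B are unique (a duplicated row in B would duplicate the matching row of A in the result), and the reported positions are B's 0-based index labels. The small frames A and B and the column name index_B are made up for illustration.

```python
import pandas as pd

A = pd.DataFrame({'a': [9, 4, 2], 'b': [1, 0, 2]})
B = pd.DataFrame({'a': [3, 9, 4, 5], 'b': [5, 1, 0, 0]})

# Turn B's row labels into a regular column so they survive the merge.
B_indexed = B.reset_index().rename(columns={'index': 'index_B'})

# how='left' keeps every row of A, in A's order; rows of A with no
# match in B get NaN in 'index_B' (which makes the column float).
matched = A.merge(B_indexed, on=['a', 'b'], how='left')

positions = matched['index_B'].tolist()
print(positions)
```

Rows of A that do not occur in B show up as NaN, so you can filter them with matched['index_B'].notna() if needed.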
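Without merging, you can use == and then check whether any value in each row is False (this compares rows at the same position, and .loc replaces .ix, which was removed in pandas 1.0).
import pandas as pd
df1 = pd.DataFrame({'a':[0,1,2,3,4],'b':[0,1,2,3,4]})
df2 = pd.DataFrame({'a':[0,1,2,3,4],'b':[2,1,2,2,4]})
test = pd.DataFrame(index = df1.index,columns = ['test'])
# Element-wise comparison; both frames must have identical labels.
equal = df1 == df2
for row in df1.index:
    if False in equal.loc[row].values:
        test.loc[row,'test'] = False
    else:
        test.loc[row,'test'] = True
test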
Out[1]:
test
0 False
1 True
2 True
3 False
4 True
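The loop above can also be written without iterating, by reducing the element-wise comparison along each row. A short sketch; like the loop, it assumes both frames have identical row and column labels, otherwise == raises an error.

```python
import pandas as pd

df1 = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [2, 1, 2, 2, 4]})

# True only where every column in the row matches.
test = (df1 == df2).all(axis=1).to_frame('test')

print(test)
```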
