Append list to pandas DataFrame as new row with index - python

Despite the numerous Stack Overflow questions on appending data to a DataFrame, I could not really find an answer to the following.
I am looking for a straightforward solution to append a list as the last row of a DataFrame.
Imagine I have a simple dataframe:
indexlist=['one']
columnList=list('ABC')
values=np.array([1,2,3])
# note: values is a 1-D array of shape (3,); a single DataFrame row
# has to be shaped (1, 3), so we have to reshape it
values=values.reshape(1,3)
df3=pd.DataFrame(values,index=indexlist,columns=columnList)
print(df3)
A B C
one 1 2 3
After some operations I get the following list:
listtwo=[4,5,6]
I want to append it at the end of the dataframe.
I change that list into a series:
oseries=pd.Series(listtwo)
print(type(oseries))
oseries.name="two"
now, this does not work:
df3.append(oseries)
since it gives:
A B C 0 1 2
one 1.0 2.0 3.0 NaN NaN NaN
two NaN NaN NaN 4.0 5.0 6.0
I would like to have the values under A B and C.
I also tried:
df3.append(oseries, columns=list('ABC')) *** not working ***
df3.append(oseries, ignore_index=True) *** working but wrong result
df3.append(oseries, ignore_index=False) *** working but wrong result
df3.loc[oseries.name] = oseries *** adds a row with NaN values ***
what I am looking for is
a) how can I add a list as a row under a particular index name?
b) how can I simply add a row of values from a list even if I don't have a name for the index (leave it empty)?

Either assign in-place with loc:
df.loc['two'] = [4, 5, 6]
# df.loc['two', :] = [4, 5, 6]
df
A B C
one 1 2 3
two 4 5 6
Or, use df.append with a Series whose index matches the columns and whose name is the new row label:
s = pd.Series(dict(zip(df.columns, [4, 5, 6]))).rename('two')
df2 = df.append(s)
df2
A B C
one 1 2 3
two 4 5 6
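Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0; on newer versions, a rough equivalent sketched with pd.concat and the same example data is:
# build the new row as a one-row DataFrame whose index is the desired label
row = pd.DataFrame([[4, 5, 6]], columns=df.columns, index=['two'])
df2 = pd.concat([df, row])
df2
A B C
one 1 2 3
two 4 5 6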
If you are appending to a DataFrame without an explicit index (i.e., with the default numeric index), you can use loc after finding the max of the existing index and incrementing it by 1:
df4 = pd.DataFrame(np.array([1,2,3]).reshape(1,3), columns=list('ABC'))
df4
A B C
0 1 2 3
df4.loc[df4.index.max() + 1, :] = [4, 5, 6]
df4
A B C
0 1.0 2.0 3.0
1 4.0 5.0 6.0
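A hedged variant of the same idea handles two caveats: df4.index.max() is NaN when the frame is empty, and assignment via loc enlargement upcasts the values to float (as in the output above):
# pick 0 for an empty frame, otherwise one past the current max label
next_label = 0 if len(df4) == 0 else df4.index.max() + 1
df4.loc[next_label, :] = [4, 5, 6]
df4 = df4.astype(int)  # optional: restore the integer dtype lost during enlargement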
Or, using append with ignore_index=True:
df4.append(pd.Series(dict(zip(df4.columns, [4, 5, 6]))), ignore_index=True)
A B C
0 1 2 3
1 4 5 6
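On pandas versions without append, the pd.concat counterpart of this call would be (a sketch):
pd.concat([df4, pd.DataFrame([[4, 5, 6]], columns=df4.columns)], ignore_index=True)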

Without index
lst1 = [1,2,3]
lst2 = [4,5,6]
p1 = pd.DataFrame([lst1])
p2 = p1.append([lst2], ignore_index = True)
p2.columns = list('ABC')
p2
A B C
0 1 2 3
1 4 5 6
With index
lst1 = [1,2,3]
lst2 = [4,5,6]
p1 = pd.DataFrame([lst1], index = ['one'], columns = list('ABC'))
p2 = p1.append(pd.DataFrame([lst2], index = ['two'], columns = list('ABC')))
p2
A B C
one 1 2 3
two 4 5 6

Related

Creating new column taking single value from column of another dataframe

I have two dataframes. The first one is df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4]}) i.e
A B
0 5 2
1 0 4
another one is df2 = pd.DataFrame({'C': [1, 1], 'D': [3, 3]}) i.e
C D
0 1 3
1 1 3
I want to grab only the 4 from df1 and make a new column in df2. I have tried df2['E'] = df1['B'][df1['B'] == 4] and got
C D E
0 1 3 NaN
1 1 3 4.0
I want both rows of df2['E'] to be 4. How can I achieve this? Any help would be much appreciated.
If the value 4 appears as the last value in your column (like in your example), you could backfill the NaNs; note that the result has to be assigned back:
df2['E'] = df2['E'].fillna(method='backfill')
For other methods, have a look here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
It is not entirely clear what you want to accomplish here, but I assume you would like to check whether there is any 4 in column B of df1 and, if so, fill all rows of column E in df2 with 4. Then you could do:
import numpy as np
df2['E'] = np.where(df1['B'].isin([4]).any(), 4, np.nan)
Output:
C D E
0 1 3 4.0
1 1 3 4.0
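If the goal is simply to broadcast the single matching value of df1['B'] to every row of df2, another hedged sketch is to pull the scalar out first (this assumes at least one row of df1 matches):
value = df1.loc[df1['B'] == 4, 'B'].iloc[0]  # the scalar 4
df2['E'] = value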

pandas most efficient way to execute arithmetic operations on multiple dataframe columns

my first post!
I'm running python 3.8.5 & pandas 1.1.0 on jupyter notebooks.
I want to divide several columns by the corresponding elements in another column of the same dataframe.
For example:
import pandas as pd
df = pd.DataFrame({'a': [2, 3, 4], 'b': [4, 6, 8], 'c':[6, 9, 12]})
df
a b c
0 2 4 6
1 3 6 9
2 4 8 12
I'd like to divide columns 'b' & 'c' by the corresponding values in 'a' and substitute the values in 'b' and 'c' with the result of this division. So the above dataframe becomes:
a b c
0 2 2 3
1 3 2 3
2 4 2 3
I tried
df.iloc[: , 1:] = df.iloc[: , 1:] / df['a']
but this gives:
a b c
0 2 NaN NaN
1 3 NaN NaN
2 4 NaN NaN
I got it working by doing:
for colname in df.columns[1:]:
    df[colname] = df[colname] / df['a']
Is there a faster way of doing the above by avoiding the for loop?
thanks,
mk
Almost there: use div with axis=0 so the division aligns on the row index (plain / aligns the Series against the columns, which is why you got NaN):
df.iloc[:,1:] = df.iloc[:,1:].div(df.a, axis=0)
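If only specific columns should be touched, the same idea can be restricted to those columns (a sketch on the frame above):
df[['b', 'c']] = df[['b', 'c']].div(df['a'], axis=0)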
df.b = df.b / df.a
df.c = df.c / df.a
or
df[['b', 'c']] = df.apply(lambda x: x[['b', 'c']] / x.a, axis=1)

Filter pandas data frame for NaN value without isnull

I have a list A:
A = [np.nan, 2, 3, 4, 6]
And a pandas data frame df:
index X Y
0 A NaN
1 B 2
2 C 6
3 D 4
4 E 3
I'd like to create a list comprehension that gives me, for each value in A, the index of the row where column Y equals that value. Usually I would do this:
B = [df[df.Y == x].index[0] for x in A]
However, this doesn't work for the first element of A, nan. Obviously I could do the above with a normal for loop and using isnull, as below, but is there a way to do it with a list comprehension?
B = []
for x in A:
    if pd.isnull(x):
        B.append(df[pd.isnull(df.Y)].index[0])
    else:
        B.append(df[df.Y == x].index[0])
Expected result:
B = [0,1,4,3,2]
Giving you exactly what you want (and by essentially just re-purposing your existing if statement), try:
B = [df[pd.isnull(df.Y)].index[0] if pd.isnull(x) else df[df.Y == x].index[0] for x in A]
Using merge (for how this handles NaN, see Why does pandas merge on NaN?):
A = [np.nan, 2, 3, 4, 6]
pd.DataFrame({'Y':A}).merge(df,how='left')
Out[394]:
Y index X
0 NaN 0 A
1 2.0 1 B
2 3.0 4 E
3 4.0 3 D
4 6.0 2 C
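To turn the merged frame back into the list B (assuming, as in the output above, that the original row position is available as an 'index' column; otherwise call reset_index() on df first), a sketch:
merged = pd.DataFrame({'Y': A}).merge(df, how='left')
B = merged['index'].tolist()  # [0, 1, 4, 3, 2]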

sum values in different rows and columns dataframe python

My Data Frame
A B C D
2 3 4 5
1 4 5 6
5 6 7 8
How do I add values from different rows and different columns?
Column A row 2 with column B row 1
Column A row 3 with column B row 2
and similarly for all rows.
If you only need to do this with two columns (and I understand your question correctly), I think you can use the shift function.
Your data frame (pandas?) is something like:
d = {'A': [2, 1, 5], 'B': [3, 4, 6], 'C': [4, 5, 7], 'D':[5, 6, 8]}
df = pd.DataFrame(data=d)
So, it's possible to create a new Series with column B shifted down by one:
df2 = df['B'].shift(1)
which gives:
0 NaN
1 3.0
2 4.0
Name: B, dtype: float64
and then, merge this new data with the previous df and, for example, sum the values:
df = df.join(df2, rsuffix='shift')
df['out'] = df['A'] + df['Bshift']
The final output is in out column:
A B C D Bshift out
0 2 3 4 5 NaN NaN
1 1 4 5 6 3.0 4.0
2 5 6 7 8 4.0 9.0
But this is only a guess; I'm not sure I've understood your question!
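The join step is not strictly required; the same out column can be computed in one line (a sketch on the frame above):
# each value of A plus the previous row's value of B
df['out'] = df['A'] + df['B'].shift(1)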

Find the column name which has the 2nd maximum value for each row (pandas)

Based on this post: Find the column name which has the maximum value for each row it is clear how to get the column name with the max value of each row using df.idxmax(axis=1).
The question is, how can I get the 2nd, 3rd and so on maximum value per row?
You need numpy.argsort for the positions and can then reorder the column names by indexing:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
A B C D E
0 8 8 3 7 7
1 0 4 2 5 2
2 2 2 1 0 8
3 4 0 9 6 2
4 4 1 5 3 4
arr = np.argsort(-df.values, axis=1)
df1 = pd.DataFrame(df.columns[arr], index=df.index)
print (df1)
0 1 2 3 4
0 A B D E C
1 D B C E A
2 E A B C D
3 C D A E B
4 C A E D B
Verify:
#first column
print (df.idxmax(axis=1))
0 A
1 D
2 E
3 C
4 C
dtype: object
#last column
print (df.idxmin(axis=1))
0 C
1 A
2 D
3 B
4 B
dtype: object
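To pull out only the column holding the 2nd-largest value of each row, select column 1 of df1 (column 2 for the 3rd-largest, and so on), for example:
print (df1[1])
0    B
1    B
2    A
3    D
4    A
Name: 1, dtype: object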
While there is no built-in method to directly return the column holding a specific rank within a row, you can rank the elements of a pandas DataFrame using the rank method.
For example, for a dataframe like this:
df = pd.DataFrame([[1, 2, 4],[3, 1, 7], [10, 4, 2]], columns=['A','B','C'])
>>> print(df)
A B C
0 1 2 4
1 3 1 7
2 10 4 2
You can get the ranks of each row by doing:
>>> df.rank(axis=1,method='dense', ascending=False)
A B C
0 3.0 2.0 1.0
1 2.0 3.0 1.0
2 1.0 2.0 3.0
By default, applying rank to dataframes and using method='dense' will result in float ranks. This can be easily fixed just by doing:
>>> ranks = df.rank(axis=1,method='dense', ascending=False).astype(int)
>>> ranks
A B C
0 3 2 1
1 2 3 1
2 1 2 3
Finding the indices is a little trickier in pandas, but it can be reduced to applying a filter on a condition (e.g. ranks == 2):
>>> ranks.where(ranks==2)
A B C
0 NaN 2.0 NaN
1 2.0 NaN NaN
2 NaN 2.0 NaN
Applying where returns only the elements matching the condition, with the rest set to NaN. We can retrieve the row and column indices by doing:
>>> ranks.where(ranks==2).notnull().values.nonzero()
(array([0, 1, 2]), array([1, 0, 1]))
And for retrieving the column index (the position within a row), which is the answer to your question, take the second array returned by nonzero:
>>> ranks.where(ranks==2).notnull().values.nonzero()[1]
array([1, 0, 1])
For the third element you just need to change the condition in where to ranks.where(ranks==3) and so on for other ranks.
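As a related sketch, the rank table can also be mapped straight to column labels instead of positions: a boolean mask plus idxmax returns, for each row, the first column whose rank equals 2 (with method='dense', tied columns share a rank, and only the first label would be reported):
>>> ranks.eq(2).idxmax(axis=1)
0    B
1    A
2    B
dtype: object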
