pandas plotting 2 dataframes with same column names - python

I have 2 dataframes containing the same indexes and the same column names (10 columns).
For example:
df1:
   A  B  C
1  0  4  8
2  5  6  9
3  2  5  1
df2:
   A  B  C
1  9  4  5
2  1  4  2
3  5  5  1
I want to plot, on the same graph, column A from df1 vs column A from df2, column B from df1 vs column B from df2, and so on for every column.
How could I do that with pandas and matplotlib?

This is one way to do it:
import pandas as pd
import matplotlib.pyplot as plt

d1 = {'A': [0, 5, 2], 'B': [4, 6, 5], 'C': [8, 9, 1]}
d2 = {'A': [9, 1, 5], 'B': [4, 4, 5], 'C': [5, 2, 1]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# pull the columns out as plain lists
df1_a = df1['A'].tolist()
df1_b = df1['B'].tolist()
df2_a = df2['A'].tolist()
df2_b = df2['B'].tolist()

# column A against column B, red for df1 and blue for df2, on the same axes
plt.plot(df1_a, df1_b, 'r')
plt.plot(df2_a, df2_b, 'b')
plt.show()
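As a side note, matplotlib can plot pandas Series directly, so the tolist() conversions are optional; reusing df1, df2 and plt from the snippet above, the same two calls shrink to:

plt.plot(df1['A'], df1['B'], 'r')  # df1: column A against column B, in red
plt.plot(df2['A'], df2['B'], 'b')  # df2: same pair, in blue
plt.show()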

Assuming that df1 and df2 are your dataframes, you could use the code below, which loops over all columns and saves the plots for you as well.
import matplotlib.pyplot as plt
import pandas as pd

for column in df1.columns:
    x = df1[column]
    y = df2[column]
    if len(x) != len(y):
        # keep only the index labels the two dataframes share
        x_ind = x.index
        y_ind = y.index
        common_ind = x_ind.intersection(y_ind)
        x = x[common_ind]
        y = y[common_ind]
    plt.scatter(x, y)
    plt.savefig("plot" + column + ".png")
    plt.clf()
Hope this helps!
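If you would rather see every column pair on a single figure instead of one saved file per column, a minimal sketch (assuming df1 and df2 share the same index and have at least two columns) could put one subplot per column:

import matplotlib.pyplot as plt

# one panel per shared column, all in a single figure
fig, axes = plt.subplots(1, len(df1.columns), figsize=(4 * len(df1.columns), 3))
for ax, column in zip(axes, df1.columns):
    ax.scatter(df1[column], df2[column])
    ax.set_title(column)
    ax.set_xlabel('df1 ' + column)
    ax.set_ylabel('df2 ' + column)
plt.tight_layout()
plt.show()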

Related

Merging df in python

Say I have two DataFrames
df1 = pd.DataFrame({'A':[1,2], 'B':[3,4]}, index = [0,1])
df2 = pd.DataFrame({'B':[8,9], 'C':[10,11]}, index = [1,2])
I want to merge so that any value in df1 is overwritten if there is a value in df2 at that location, and any new values in df2 are added, including the new rows and columns.
The result should be:
A B C
0 1 3 nan
1 2 8 10
2 nan 9 11
I've tried combine_first, but that causes only NaN values to be overwritten.
update has the issue where new rows are created rather than overwritten.
merge has many issues.
I've tried writing my own function:
import math
import numpy as np
import pandas as pd

def take_right(df1, df2, j, i):
    print(df1)
    print(df2)
    try:
        s1 = df1[j][i]
    except:
        s1 = np.NaN
    try:
        s2 = df2[j][i]
    except:
        s2 = np.NaN
    if math.isnan(s2):
        #print(s1)
        return s1
    else:
        # print(s2)
        return s2

def combine_df(df1, df2):
    rows = (set(df1.index.values.tolist()) | set(df2.index.values.tolist()))
    #print(rows)
    columns = (set(df1.columns.values.tolist()) | set(df2.columns.values.tolist()))
    #print(columns)
    df = pd.DataFrame()
    #df.columns = columns
    for i in rows:
        #df[:][i]=[]
        for j in columns:
            df = df.insert(int(i), j, take_right(df1, df2, j, i), allow_duplicates=False)
            # print(df)
    return df
This won't add new columns or rows to an empty DataFrame.
Thank you!!
One approach is to create an empty output dataframe with the union of columns and indices from df1 and df2, and then use the df.update method to assign their values into out_df:
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({'B': [8, 9], 'C': [10, 11]}, index=[1, 2])

# empty frame covering every column and index label from both inputs
out_df = pd.DataFrame(
    columns=df1.columns.union(df2.columns),
    index=df1.index.union(df2.index),
)

# fill in df1 first, then let df2 overwrite wherever it has a value
out_df.update(df1)
out_df.update(df2)
out_df
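One caveat: because out_df starts out empty, its columns are created with object dtype, and update keeps that dtype. If you want numeric columns back, a small hedged follow-up is:

out_df = out_df.apply(pd.to_numeric)  # convert each column back to a numeric dtype
print(out_df.dtypes)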
Why does combine_first not work?
# df2 takes priority; df1 only fills in where df2 has no value
df = df2.combine_first(df1)
print(df)
Output:
A B C
0 1.0 3 NaN
1 2.0 8 10.0
2 NaN 9 11.0

combining two dataframes into one new dataframe in a zig zag/zipper way

I have df1 and df2, and I want to create a new dataframe df3 such that the first record of df3 is the first record from df1, the second record of df3 is the first record from df2, and it continues alternating in the same manner.
I tried many methods with pandas, but didn't get an answer.
Is there any way to achieve it?
You can create a column with an incremental id (one dataframe gets the even numbers, the other the odd numbers):
import numpy as np
df1['unique_id'] = np.arange(0, df1.shape[0]*2,2)
df2['unique_id'] = np.arange(1, df2.shape[0]*2,2)
and then concatenate them and sort by this column:
df3 = pd.concat([df1, df2])
df3 = df3.sort_values(by=['unique_id'])
after which you can drop the column you created:
df3 = df3.drop(columns=['unique_id'])
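Put together, a minimal sketch of that approach (assuming df1 and df2 share the same columns) looks like:

import numpy as np
import pandas as pd

# even ids for df1 rows, odd ids for df2 rows, so sorting interleaves them
df1['unique_id'] = np.arange(0, df1.shape[0] * 2, 2)
df2['unique_id'] = np.arange(1, df2.shape[0] * 2, 2)

df3 = (pd.concat([df1, df2])
       .sort_values(by=['unique_id'])
       .drop(columns=['unique_id']))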
You could do it this way:
import pandas as pd

df1 = pd.DataFrame({'A': [3, 3, 4, 6], 'B': ['a1', 'b1', 'c1', 'd1']})
df2 = pd.DataFrame({'A': [5, 4, 6, 1], 'B': ['a2', 'b2', 'c2', 'd2']})

# build the interleaved frame one row pair at a time
dfff = pd.DataFrame()
for i in range(0, 4):
    dfx = pd.concat([df1.iloc[[i]], df2.iloc[[i]]])
    dfff = pd.concat([dfff, dfx])

# the same result in one line: concatenate, then stable-sort on the index
print(pd.concat([df1, df2]).sort_index(kind='mergesort'))
Which gives
A B
0 3 a1
0 5 a2
1 3 b1
1 4 b2
2 4 c1
2 6 c2
3 6 d1
3 1 d2
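If you also want df3 to end up with a fresh 0..n-1 index instead of the repeated labels shown above, a small follow-up sketch (reusing df1 and df2 from the snippet above) is:

df3 = pd.concat([df1, df2]).sort_index(kind='mergesort').reset_index(drop=True)
print(df3)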

Create DataFrame with multiple arrays by column

I'm creating a DataFrame with pandas. The source is multiple arrays, but I want to build the DataFrame column by column, not row by row as the default pandas.DataFrame() constructor does.
pd.DataFrame seems to lack an 'axis=' parameter, so how can I achieve this goal?
You might use Python's built-in zip for that in the following way:
import pandas as pd
arrayA = ['f','d','g']
arrayB = ['1','2','3']
arrayC = [4,5,6]
df = pd.DataFrame(zip(arrayA, arrayB, arrayC), columns=['AA','NN','gg'])
print(df)
Output:
AA NN gg
0 f 1 4
1 d 2 5
2 g 3 6
Zip is a great solution in this case as pointed out by Daweo, but alternatively you can use a dictionary for readability purposes:
import pandas as pd
arrayA = ['f','d','g']
arrayB = ['1','2','3']
arrayC = [4,5,6]
my_dict = {
    'AA': arrayA,
    'NN': arrayB,
    'gg': arrayC
}
df = pd.DataFrame(my_dict)
print(df)
Output:
AA NN gg
0 f 1 4
1 d 2 5
2 g 3 6
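If the arrays really do arrive one at a time, another option (a sketch reusing the same hypothetical array names) is to start from an empty DataFrame and assign one column per array:

import pandas as pd

arrayA = ['f', 'd', 'g']
arrayB = ['1', '2', '3']
arrayC = [4, 5, 6]

df = pd.DataFrame()   # start empty
df['AA'] = arrayA     # each assignment appends one column
df['NN'] = arrayB
df['gg'] = arrayC
print(df)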

How can I merge the columns into a single column in Python?

I want to merge 3 columns into a single column. I have tried changing the column types. However, I could not do it.
For example, I have 3 columns such as A: {1,2,4}, B:{3,4,4}, C:{1,1,1}
Output expected: ABC Column {131, 241, 441}
My inputs are like this:
df['ABC'] = df['A'].map(str) + df['B'].map(str) + df['C'].map(str)
df.head()
ABC {13.01.0 , 24.01.0, 44.01.0}
The type of ABC seems to be object, and I could not change it via str or int.
df['ABC'].apply(str)
Also, I realized that there are NaN values in the A, B, C columns. Is it possible to merge these even with NaN values?
# Example
import pandas as pd
import numpy as np
df = pd.DataFrame()
# Considering NaN's in the data-frame
df['colA'] = [1, 2, 4, np.nan, 5]
df['colB'] = [3, 4, 4, 3, np.nan]
df['colC'] = [1,1,1,4,1]
# Using pd.isna() to check for NaN values in the columns
df['colA'] = df['colA'].apply(lambda x: x if pd.isna(x) else str(int(x)))
df['colB'] = df['colB'].apply(lambda x: x if pd.isna(x) else str(int(x)))
df['colC'] = df['colC'].apply(lambda x: x if pd.isna(x) else str(int(x)))
# Filling the NaN values with a blank space
df = df.fillna('')
# Transform columns into string
df = df.astype(str)
# Concatenating all together
df['ABC'] = df.sum(axis=1)
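If you prefer an explicit join over summing the string columns, a hedged variant of that last step is:

# join the already-stringified columns row by row
df['ABC'] = df[['colA', 'colB', 'colC']].agg(''.join, axis=1)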
A workaround for your NaN problem could look like this, but now NaN will become 0:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4, np.nan], 'B': [3, 4, 4, 4], 'C': [1, np.nan, 1, 3]})
# replace NaN with 0, then turn every value into an integer string
df = df.replace(np.nan, 0).astype(int).astype(str)
df['ABC'] = df['A'] + df['B'] + df['C']
Output:
A B C ABC
0 1 3 1 131
1 2 4 0 240
2 4 4 1 441
3 0 4 3 043
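If you would rather skip the missing values entirely (instead of turning them into 0 or an empty string), one more hedged sketch joins only the non-NaN entries in each row:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 4, np.nan], 'B': [3, 4, 4, 4], 'C': [1, np.nan, 1, 3]})

# drop the missing entries in each row, cast the rest to int, then join as strings
df['ABC'] = df.apply(lambda row: ''.join(str(int(v)) for v in row.dropna()), axis=1)
print(df)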

How to rename a pandas DataFrame index based on a column condition

I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame(data= {"x": [1,2,3,4],"y":[5,6,7,8],"i":["a.0","a.1","a.0","a.1"]}).set_index("i")
df
Out:
x y
i
a.0 1 5
a.1 2 6
a.0 3 7
a.1 4 8
and I want to rename the index based on a column condition:
df.loc[df["y"]>6].rename(index=lambda x: x+ ">6" )
which gives me:
x y
i
a.0>6 3 7
a.1>6 4 8
I tried it with inplace=True, but it does not work
df.loc[df["y"]>6].rename(index=lambda x: x+ ">6" , inplace=True )
I could only get it done by resetting the index, changing the i-column values via apply, and setting the index again:
df1 = df.reset_index()
df1.loc[df1["y"]>6, "i"] = df1.loc[df1["y"]>6, "i"].apply(lambda x: x+ ">6" )
df1.set_index("i", inplace=True)
df1
Out:
x y
i
a.0 1 5
a.1 2 6
a.0>6 3 7
a.1>6 4 8
But this is so complicated.
Do you know if there is an easier way?
How about trying this?
import numpy as np

# where y > 6 append ">6" to the index label, otherwise keep the original label
df.index = np.where(df['y'] > 6, df.index + '>6', df.index)
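Another hedged variant, if you prefer to stay on the Index object itself, is Index.where, which keeps a label where the condition holds and swaps in the modified label otherwise:

# keep the label where y <= 6, otherwise use the label with ">6" appended
df.index = df.index.where((df["y"] <= 6).to_numpy(), df.index + ">6")
print(df)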
