Pandas: Replacing a row with another data frame - python

I have two data frames, df1 and df2.
Now I want to replace one of the rows of df1 (highlighted in red) with all the rows of df2. I tried the following code, but it didn't give the desired result:
df1[df1['Category_2']=='Specified Functionality'].update(df2)
I also tried:
df1[df1['Category_2']=='Specified Functionality'] = df2
Could anyone tell me where I am making a mistake?

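Boolean indexing such as df1[df1['Category_2']=='Specified Functionality'] returns a copy, so calling update on it leaves df1 unchanged, and assigning df2 through the mask can only overwrite existing rows, not insert extra ones. Instead, split df1 around the row and insert df2 with concat:
import pandas as pd
row = 13  # positional index of the row to replace
# Give df2's column the same name as in df1 so concat lines the columns up
df2 = df2.rename(columns={'Functionality': 'Category_2'})
# Rows before `row`, then all of df2, then the rows after `row`
df = pd.concat([df1.iloc[:row], df2, df1.iloc[row + 1:]]).reset_index(drop=True)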
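The snippet above hardcodes the row position (13). A minimal sketch that locates the row from the question's condition instead, assuming exactly one row matches (if none does, the `[0][0]` lookup raises IndexError). The toy frames and the helper name `replace_row_with_frame` are hypothetical:

```python
import pandas as pd

# Hypothetical toy data standing in for the question's df1 and df2
df1 = pd.DataFrame({
    'Category_2': ['Security', 'Specified Functionality', 'Usability'],
    'Score': [1, 2, 3],
})
df2 = pd.DataFrame({
    'Functionality': ['Accuracy', 'Suitability'],
    'Score': [20, 21],
})


def replace_row_with_frame(df1, df2, mask):
    """Hypothetical helper: replace the first row selected by `mask` with all rows of df2."""
    # Positional (not label-based) index of the first True in the mask
    pos = int(mask.to_numpy().nonzero()[0][0])
    # Splice: rows before pos, all of df2, rows after pos; then renumber 0..n-1
    return pd.concat([df1.iloc[:pos], df2, df1.iloc[pos + 1:]]).reset_index(drop=True)


# Align column names first so the two frames stack column-for-column
df2 = df2.rename(columns={'Functionality': 'Category_2'})
result = replace_row_with_frame(df1, df2, df1['Category_2'] == 'Specified Functionality')
print(result)
```

Working by position rather than by label keeps the splice correct even when df1's index is not a clean 0..n-1 range.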

Related

pandas dataframe: remove all rows that includes in other dataframe

I have a pandas dataframe like the one below:
dataframe 1 (name: df)
As you can see, each (A, B, C) combination has n values of X and V.
I made an outlier dataframe like this:
df_outlier = df[(df["V"] > 150)]
Then I want to remove every row whose (A, B, C) combination appears in df_outlier.
For example, if df_outlier looks like this:
I want to remove these rows from the original dataframe:
First, I tried the following code:
df_filtered = pd.merge(df, df_outlier, indicator=True, how = 'outer').query('_merge=="left_only"').drop(['_merge'],axis=1)
However, it only removes the rows that are themselves in df_outlier, not all rows that share an (A, B, C) combination with df_outlier.
Sorry for my English; I hope this is understandable.
Merge on the key columns of df_outlier only, so the check is done per (A, B, C) combination:
df_filtered = pd.merge(df, df_outlier[['A','B','C']], indicator=True, how = 'outer').query('_merge=="left_only"').drop(['_merge'],axis=1)
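As an alternative to the merge, here is a sketch using a different technique, `groupby(...).transform('max')`: a group is dropped when its largest V exceeds the question's threshold of 150, which is the same as dropping every (A, B, C) that appears in df_outlier. The data is hypothetical:

```python
import pandas as pd

# Hypothetical data: (A, B, C) identifies a group; X and V are measurements
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 2],
    'B': ['p', 'p', 'p', 'q', 'q', 'q'],
    'C': ['x', 'x', 'x', 'y', 'y', 'y'],
    'X': [0, 1, 2, 0, 1, 2],
    'V': [100, 160, 120, 90, 95, 110],
})

keys = ['A', 'B', 'C']
# Broadcast each group's maximum V back onto every row of that group
group_max_v = df.groupby(keys)['V'].transform('max')
# Keep only groups in which no row is an outlier (V > 150)
df_filtered = df[group_max_v <= 150]
print(df_filtered)
```

Unlike the merge, this keeps df's original index labels and needs no `_merge` indicator column.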

How to assign values to the rows of a data frame which satisfy certain conditions?

I have two data frames:
df1 = pd.read_excel("test1.xlsx")
df2 = pd.read_excel("test2.xlsx")
I am trying to assign values from df1 to df2 where a certain condition is met: where Col1 in one frame equals Col1 in the other, assign the value of ColY to ColX.
df1.loc[df1['Col1'] == df2['Col1'],'ColX'] = df2['ColY']
This results in an error because df2['ColY'] is the whole column. How do I assign values only to the rows that match?
You can use numpy.where:
import numpy as np
df1['ColX'] = np.where(df1['Col1'].eq(df2['Col1']), df2['ColY'], df1['ColX'])
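A toy run of the `numpy.where` approach, assuming (as `eq` requires for a row-by-row comparison) that both frames have the same length and index. The data is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical frames with the same length and the same default RangeIndex
df1 = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'ColX': [0, 0, 0]})
df2 = pd.DataFrame({'Col1': ['a', 'z', 'c'], 'ColY': [10, 20, 30]})

# Row-by-row comparison; eq() aligns on the index, so both frames must share it
match = df1['Col1'].eq(df2['Col1'])
# Take ColY where the keys match, otherwise keep the existing ColX value
df1['ColX'] = np.where(match, df2['ColY'], df1['ColX'])
print(df1)
```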
Since you wanted to assign from df1 to df2, your code should have been:
df2.loc[df1['Col1'] == df2['Col1'], 'ColX'] = df1['ColY']
The code you wrote won't assign the values from df1 to df2, but from df2 to df1.
Also, if you could clarify which dataframe ColX and ColY belong to (or do both dataframes have them?), I could help more.
Your code is almost right; just swap df1 and df2 as shown above.
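A small check of the df1-to-df2 `.loc` assignment, using hypothetical toy frames that share the same default index. Assigning a Series through `.loc` aligns on the index, so only the masked rows receive df1's values:

```python
import pandas as pd

# Hypothetical frames: df1 holds the source values, df2 is the target
df1 = pd.DataFrame({'Col1': ['a', 'b', 'c'], 'ColY': [10, 20, 30]})
df2 = pd.DataFrame({'Col1': ['a', 'z', 'c'], 'ColX': [0, 0, 0]})

# Comparing two Series with == requires identically labeled indexes
mask = df1['Col1'] == df2['Col1']
# Only rows where mask is True are written; values are matched by index label
df2.loc[mask, 'ColX'] = df1['ColY']
print(df2)
```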

How to add a column from df1 to df2 if it is not present in df2, else do nothing

I have two data frames from a basic web scrape using pandas (below). The second table has fewer columns than the first, and I need to concatenate the dataframes. I have been inserting the missing columns manually for a while, but since they change frequently I would like a function that checks whether every column of df1 is present in df2 and, if not, adds the missing column to df2.
import pandas as pd
link = 'https://en.wikipedia.org/wiki/Opinion_polling_for_the_next_French_presidential_election'
df = pd.read_html(link,header=0)
df1 = df[1]
df1 = df1.drop([0])
df1 = df1.drop('Abs.',axis=1)
df2 = df[2]
df2 = df2.drop([0])
df2 = df2.drop(['Abs.'],axis=1)
Many thanks,
divingTobi's answer:
pd.concat([df1, df2]) does the trick: by default it takes the union of the columns and fills the columns missing from df2 with NaN.
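A sketch with hypothetical toy tables (the real ones come from the scraped Wikipedia page) showing that `pd.concat` fills the columns df2 lacks with NaN. `reindex(columns=...)` is shown as one way to add df1's missing columns to df2 explicitly, if that is still wanted:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped polling tables; df2 lacks column 'C'
df1 = pd.DataFrame({'Date': ['Jan', 'Feb'], 'A': [25, 26], 'B': [24, 23], 'C': [12, 13]})
df2 = pd.DataFrame({'Date': ['Mar'], 'A': [27], 'B': [22]})

# join='outer' (the default) keeps the union of columns; df2's row gets NaN in 'C'
combined = pd.concat([df1, df2], ignore_index=True)

# Explicitly add df1's missing columns to df2, filled with NaN.
# Note: any df2 column that df1 lacks would be dropped by this reindex.
df2_full = df2.reindex(columns=df1.columns)
print(combined)
```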

Pandas pivot table: rows with an empty value are not shown

I need your help:
I have a dataframe df3:
I am using pivot_table:
df4 = df3.pivot_table(index = ['Number','Department','Task'], columns="Date", values="Score",fill_value = 'N/A')
The output df4 looks like this:
Why are the rows where Task is empty not shown?
What am I doing wrong?
I would like to create a dataframe like this:
I think you need to replace the missing values in the index columns before calling pivot_table, because by default pivot_table drops rows whose index keys are NaN (fill_value only fills the resulting cells, not the keys):
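cols = ['Number', 'Department', 'Task']
# Replace NaN keys with a real label so pivot_table keeps those rows
df3[cols] = df3[cols].fillna('N/A')
df4 = df3.pivot_table(index=cols, columns='Date', values='Score', fill_value='N/A')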
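A small reproduction with hypothetical data: without filling, the row whose Task is NaN disappears from the pivot; after filling the key columns it survives as an 'N/A' group:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has no Task
df3 = pd.DataFrame({
    'Number': [1, 1, 2],
    'Department': ['Sales', 'Sales', 'IT'],
    'Task': ['Call', np.nan, 'Fix'],
    'Date': ['2021-01', '2021-01', '2021-02'],
    'Score': [5, 3, 4],
})

cols = ['Number', 'Department', 'Task']

# With NaN in a key column, that row is dropped from the pivot
before = df3.pivot_table(index=cols, columns='Date', values='Score')

# Filling the key columns first turns NaN into a label that survives grouping
df3[cols] = df3[cols].fillna('N/A')
after = df3.pivot_table(index=cols, columns='Date', values='Score')
print(after)
```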

How to merge columns interspersing the data?

I'm new to Python and pandas, and I'm working on creating a pandas MultiIndex with two independent variables, flow and head, for a dataframe with 27 different design points. The data is currently organized in a single dataframe with a column for each variable and a row for each design point.
Here's how I created the MultiIndex:
flow = df.loc[0, ["Mass_Flow_Rate", "Mass_Flow_Rate.1",
"Mass_Flow_Rate.2"]]
dp = df.loc[:,"Design Point"]
index = pd.MultiIndex.from_product([dp, flow], names=
['DP','Flows'])
I then created three columns of data:
df0 = df.loc[:,"Head2D"]
df1 = df.loc[:,"Head2D.1"]
df2 = df.loc[:,"Head2D.2"]
I want to merge these into a single column of data so that I can use this command:
pc = pd.DataFrame(data, index=index)
The three columns share the same row index (0-27). I want to merge them into a single column with the data interspersed. If I call the columns col1, col2 and col3, and write col1(0) for the value of col1 at index 0, I want the data to look like:
col1(0)
col2(0)
col3(0)
col1(1)
col2(1)
col3(1)
col1(2)...
It is a bit confusing, but as I understand it, you are trying to do this:
flow = df.loc[0, ["Mass_Flow_Rate", "Mass_Flow_Rate.1",
"Mass_Flow_Rate.2"]]
dp = df.loc[:,"Design Point"]
index = pd.MultiIndex.from_product([dp, flow], names=
['DP','Flows'])
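df0 = df.loc[:,"Head2D"]
df1 = df.loc[:,"Head2D.1"]
df2 = df.loc[:,"Head2D.2"]
# Place the three columns side by side, then flatten row by row so the values
# interleave as col1(0), col2(0), col3(0), col1(1), ... to match the index order
data = pd.concat([df0, df1, df2], axis=1).to_numpy().ravel()
pc = pd.DataFrame(data=data, index=index)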
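A self-contained sketch of the interleaving, using a hypothetical three-design-point frame with the question's column names. It relies on `MultiIndex.from_product` varying the last level fastest, so a row-major flatten of the three head columns lines up with the index; passing a plain array (not a Series) avoids pandas trying to align the data on the new index:

```python
import pandas as pd

# Hypothetical stand-in: 3 design points, three flow columns and three head columns
df = pd.DataFrame({
    'Design Point': [0, 1, 2],
    'Mass_Flow_Rate': [1.0, 1.0, 1.0],
    'Mass_Flow_Rate.1': [2.0, 2.0, 2.0],
    'Mass_Flow_Rate.2': [3.0, 3.0, 3.0],
    'Head2D': [10, 11, 12],
    'Head2D.1': [20, 21, 22],
    'Head2D.2': [30, 31, 32],
})

flow = df.loc[0, ['Mass_Flow_Rate', 'Mass_Flow_Rate.1', 'Mass_Flow_Rate.2']]
dp = df.loc[:, 'Design Point']
# Order: (0, f0), (0, f1), (0, f2), (1, f0), ...  (last level varies fastest)
index = pd.MultiIndex.from_product([dp, flow], names=['DP', 'Flows'])

heads = df[['Head2D', 'Head2D.1', 'Head2D.2']].to_numpy()
# Row-major ravel reads across each row first: col1(0), col2(0), col3(0), col1(1), ...
data = heads.ravel()
pc = pd.DataFrame({'Head': data}, index=index)
print(pc)
```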
