Matching the column names of two pandas DataFrames in Python

I have two pandas DataFrames, df1 and df2, such that

df1:
   a  b  c  d
   1  2  3  4
   5  6  7  8

and

df2:
    b   c
   12  13

I want the result to be:

result:
   b  c
   2  3
   6  7

Here a, b, c, d are the column names of the DataFrames. The shapes and values of the two DataFrames differ. I want to match the column names of df2 against those of df1 and select, from df1, all rows of the columns whose headers appear in df2. df2 is only used to select specific columns of df1 while keeping all of its rows. I tried the code below, but it gives me an empty index.
df1.columns.intersection(df2.columns)
The above code does not give my result: it returns the matching column headers but no values. I want to write code that takes the two DataFrames as input and compares their column headers to make the selection, without hard-coding column names.

I believe you need:
df = df1[df1.columns.intersection(df2.columns)]
Or, as @Zero pointed out in the comments:
df = df1[df1.columns & df2.columns]
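Note that in recent pandas versions, & on Index objects no longer performs a set intersection, so intersection is the safer spelling. A minimal runnable sketch on the question's sample data:
import pandas as pd

df1 = pd.DataFrame({'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'd': [4, 8]})
df2 = pd.DataFrame({'b': [12], 'c': [13]})

# Keep every row of df1, but only the columns whose names also appear in df2.
result = df1[df1.columns.intersection(df2.columns)]
print(result)
#    b  c
# 0  2  3
# 1  6  7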

Or, use reindex:
In [594]: df1.reindex(columns=df2.columns)
Out[594]:
   b  c
0  2  3
1  6  7
Or equivalently:
In [595]: df1.reindex(df2.columns, axis=1)
Out[595]:
   b  c
0  2  3
1  6  7
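One caveat: unlike intersection, reindex does not drop labels that are missing from df1, so any column of df2 that df1 lacks would appear in the result filled with NaN.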

Alternatively to intersection, you can build a boolean mask with isin. Note that passing a boolean mask to [] selects rows, so apply it to the columns axis with .loc:
df = df1.loc[:, df1.columns.isin(df2.columns)]

Related

Grouping and printing the maximum in a DataFrame in Python

A DataFrame has 3 columns:
A               B      C
^0hand(%s)leg$  27;30  42;54
^-(%s)hand0leg  39;30  47;57
^0hand(%s)leg$  24;33  39;54
Column A holds regex patterns. When two rows share the same pattern (rows 1 and 3 here, for example), they have to be merged into a single row that keeps, for each cell, the element-wise maximum of the semicolon-separated pair:
Output:
A               B      C
^0hand(%s)leg$  27;33  42;54
^-(%s)hand0leg  39;30  47;57
Any leads will be helpful.
You could use:
(df.set_index('A').stack()
   .str.extract(r'(\d+);(\d+)').astype(int)
   .groupby(level=[0, 1]).max().astype(str)
   .assign(s=lambda d: d[0] + ';' + d[1])['s']  # or: .apply(';'.join, axis=1)
   .unstack(1)
   .loc[df['A'].unique()]  # only if the order of rows matters
   .reset_index()
)
output:
                A      B      C
0  ^0hand(%s)leg$  27;33  42;54
1  ^-(%s)hand0leg  39;30  47;57
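To make the snippet self-contained, here is a minimal reconstruction of the input frame from the question:
import pandas as pd

df = pd.DataFrame({
    'A': ['^0hand(%s)leg$', '^-(%s)hand0leg', '^0hand(%s)leg$'],
    'B': ['27;30', '39;30', '24;33'],
    'C': ['42;54', '47;57', '39;54'],
})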

Search a string in a DataFrame column that contains lists of strings and return the complete DataFrame

I have a DataFrame df which has 4 columns: 'A', 'B', 'C', 'D'.
I have to search for a substring in each column and return the complete DataFrame in the search order. For example, if the substring matches column B in rows 3, 4 and 5, my final df would have those 3 rows. For this I am using df[df['A'].str.contains('string_to_search')] and it works fine, but one of the columns holds each element as a list of strings, like column B:
       A                    B        C        D
0  asdfg        [asdfgh, cvb]    asdfg   nbcjsh
1  fghjk              [ertyu]   fghhjk    yrewf
2   xcvb  [qwerr, hjklk, bnm]    cvbvb  gjfsjgf
3  ertyu              [qwert]  ertyhhu   ertkkk
So df[df['B'].str.contains('string_to_search')] does not work for column B. Please suggest how I can search in this column while maintaining the order of the complete DataFrame.
Column B contains lists, so you need an in test:
df1 = df[df['B'].apply(lambda x: 'cvb' in x)]
print (df1)
       A              B      C       D
0  asdfg  [asdfgh, cvb]  asdfg  nbcjsh
If you want to use str.contains, join the lists with str.join first; that also makes substring search possible:
df1 = df[df['B'].str.join(' ').str.contains('er')]
print (df1)
       A                    B        C        D
1  fghjk              [ertyu]   fghhjk    yrewf
2   xcvb  [qwerr, hjklk, bnm]    cvbvb  gjfsjgf
3  ertyu              [qwert]  ertyhhu   ertkkk
If you want to search in all columns:
df2 = df[df.assign(B=df['B'].str.join(' '))
           .apply(' '.join, axis=1)
           .str.contains('g')]
print (df2)
       A                    B       C        D
0  asdfg        [asdfgh, cvb]   asdfg   nbcjsh
1  fghjk              [ertyu]  fghhjk    yrewf
2   xcvb  [qwerr, hjklk, bnm]   cvbvb  gjfsjgf
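A minimal sketch to reproduce the setup, assuming the values shown in the question:
import pandas as pd

df = pd.DataFrame({
    'A': ['asdfg', 'fghjk', 'xcvb', 'ertyu'],
    'B': [['asdfgh', 'cvb'], ['ertyu'], ['qwerr', 'hjklk', 'bnm'], ['qwert']],
    'C': ['asdfg', 'fghhjk', 'cvbvb', 'ertyhhu'],
    'D': ['nbcjsh', 'yrewf', 'gjfsjgf', 'ertkkk'],
})

# Exact membership test inside the lists of column B
print(df[df['B'].apply(lambda x: 'cvb' in x)])

# Substring search after flattening the lists to strings
print(df[df['B'].str.join(' ').str.contains('er')])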

How to merge CSV cells with pandas

I have the DataFrame below.
I want to transform it by merging the cells that hold the same value in a column.
Can anyone provide some sample code?
Try this:
df.loc[df.duplicated(['A', 'B']), ['A', 'B']] = ''
It finds the duplicated values and masks them with an empty string.
I/P:
   A  B  C
0  1  a  A
1  1  a  B
2  2  b  C
3  2  b  A
O/P:
   A  B  C
0  1  a  A
1        B
2  2  b  C
3        A
Note: you cannot literally merge cells using pandas; the idea is to suppress every value except the first record of each group.
Based on the sample data generated by @mohamed thasin ah:
df.groupby(['A', 'B'], as_index=False).agg(', '.join)

   A  B     C
0  1  a  A, B
1  2  b  C, A
so try:
df.groupby(['cd', 'ci', 'ui', 'module_behavior', 'feature_behavior', 'at']).agg(', '.join)
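One caveat: ', '.join only works if the remaining, non-grouped columns hold strings; cast them with astype(str) first if they do not.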
The output that you want seems to be an Excel file. If that is the case, I suggest:
df.groupby(['cn', 'ci', 'ui', 'module_behaviour', 'feature_behaviour', 'at']).apply(
    lambda x: x.sort_values('caseid')).to_excel('filename.xlsx')
Pandas will group by those columns and turn them into a MultiIndex, and to_excel saves the DataFrame to an Excel file with the default setting merge_cells=True.
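A minimal sketch of the Excel route on the toy data above (this assumes an Excel writer engine such as openpyxl is installed; set_index is used here as one simple way to get the MultiIndex):
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': ['a', 'a', 'b', 'b'],
                   'C': ['A', 'B', 'C', 'A']})

# With a MultiIndex, to_excel merges the repeated index cells in the
# spreadsheet (merge_cells=True is the default).
df.set_index(['A', 'B']).to_excel('merged.xlsx', merge_cells=True)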

Pandas DataFrames: Extract Information and Collapse Columns

I have a pandas DataFrame which contains information in columns that I would like to extract into a new column.
It is best explained by example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Number Type 1': [1, 2, np.nan],
                   'Number Type 2': [np.nan, 3, 4],
                   'Info': list('abc')})
This DataFrame has Number Type 1 and Number Type 2 columns.
I would like to extract the types into a new Type column and reshape the DataFrame accordingly: the numbers collapse into a single Number column, and the type moves into the Type column. The information in the Info column stays bound to the numbers (e.g. 2 and 3 keep the same information, b).
What is the best way to do this in Pandas?
Use melt with dropna:
df = df.melt('Info', value_name='Number', var_name='Type').dropna(subset=['Number'])
df['Type'] = df['Type'].str.extract(r'(\d+)')
df['Number'] = df['Number'].astype(int)
print (df)
  Info Type  Number
0    a    1       1
1    b    1       2
4    b    2       3
5    c    2       4
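(The gaps in the index come from dropna: melt stacks the two Number columns end to end and the NaN rows are then removed; add reset_index(drop=True) if you want a consecutive index.)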
Another solution with set_index and stack:
df = df.set_index('Info').stack().rename_axis(('Info', 'Type')).reset_index(name='Number')
df['Type'] = df['Type'].str.extract(r'(\d+)')
df['Number'] = df['Number'].astype(int)
print (df)
  Info Type  Number
0    a    1       1
1    b    1       2
2    b    2       3
3    c    2       4

Check values in dataframe against another dataframe and append values if present

I have two dataframes as follows:
DF1:
A  B  C
1  2  3
4  5  6
7  8  9
DF2:
Match  Values
1      a,d
7      b,c
I want to match DF1['A'] with DF2['Match'] and append DF2['Values'] to DF1 where a match exists.
So my result will be:
A  B  C  Values
1  2  3  a,d
7  8  9  b,c
Now I can use the following code to match the values, but it returns an empty DataFrame.
df1 = df1[df1['A'].isin(df2['Match'])]
Any help would be appreciated.
Instead of doing a lookup, you can do this in one step by merging the DataFrames:
pd.merge(df1, df2, how='inner', left_on='A', right_on='Match')
Specify how='inner' if you only want records that appear in both, how='left' if you want all of df1's data.
If you want to keep only the Values column:
pd.merge(df1, df2.set_index('Match')['Values'].to_frame(), how='inner', left_on='A', right_index=True)
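A quick end-to-end check on the question's data (a minimal sketch, assuming numeric dtypes for A and Match):
import pandas as pd

df1 = pd.DataFrame({'A': [1, 4, 7], 'B': [2, 5, 8], 'C': [3, 6, 9]})
df2 = pd.DataFrame({'Match': [1, 7], 'Values': ['a,d', 'b,c']})

# Inner merge keeps only the rows of df1 whose A appears in df2['Match'].
result = pd.merge(df1, df2, how='inner', left_on='A', right_on='Match')
result = result.drop(columns='Match')  # drop the now-redundant key column
print(result)
#    A  B  C Values
# 0  1  2  3    a,d
# 1  7  8  9    b,c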
