iloc[] by value columns

I want to use iloc based on the values in a column.
df1 = pd.DataFrame({'col1': ['1' ,'1','1','2','2','2','2','2','3' ,'3','3'],
'col2': ['A' ,'B','C','D','E','F','G','H','I' ,'J','K']})
I want to select the row at position 2 within each group of col1 values, as a DataFrame. The result should look like:
col1 col2
1 C
2 F
3 K
Thank you so much

Use GroupBy.nth:
df2 = df1.groupby('col1', as_index=False).nth(2)
Alternative with GroupBy.cumcount:
df2 = df1[df1.groupby('col1').cumcount().eq(2)]
print (df2)
col1 col2
2 1 C
5 2 F
10 3 K
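For reference, here is a self-contained run of both approaches on the question's data (the names by_nth and by_cumcount are only illustrative). cumcount numbers the rows 0, 1, 2, ... inside each col1 group, so keeping the rows numbered 2 is the same selection that nth(2) makes.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': ['1', '1', '1', '2', '2', '2', '2', '2', '3', '3', '3'],
                    'col2': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']})

# nth(2) keeps the third row (position 2) of every col1 group
by_nth = df1.groupby('col1', as_index=False).nth(2)

# cumcount labels rows 0, 1, 2, ... within each group; keep the rows labelled 2
by_cumcount = df1[df1.groupby('col1').cumcount().eq(2)]

print(by_cumcount)
```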

Use GroupBy.nth with as_index=False:
df1.groupby('col1', as_index=False).nth(2)
output:
col1 col2
2 1 C
5 2 F
10 3 K

df1.groupby('col1').agg(lambda ss:ss.iloc[2])
col2
col1
1 C
2 F
3 K
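One caveat with the agg approach: ss.iloc[2] assumes every group has at least three rows. A guarded variant is sketched below on hypothetical data where group '2' has only two rows; the len(ss) > 2 check is an addition for illustration, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: group '2' has only two rows, so it has no position 2
df = pd.DataFrame({'col1': ['1', '1', '1', '2', '2'],
                   'col2': ['A', 'B', 'C', 'D', 'E']})

# Guard the positional lookup so short groups yield None instead of failing
result = df.groupby('col1').agg(lambda ss: ss.iloc[2] if len(ss) > 2 else None)
print(result)
```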

Related

Pandas: Split dataframe with duplicate values into dataframe with unique values

I have a dataframe in Pandas with duplicate values in Col1:
Col1
a
a
b
a
a
b
I want to split this df into several DataFrames, each containing only unique Col1 values:
DF1:
Col1
a
b
DF2:
Col1
a
b
DF3:
Col1
a
DF4:
Col1
a
Any suggestions?

I don't think you can do this in a vectorized way.
One possibility is to use a custom function that iterates over the items, keeps track of the values already seen, and starts a new label whenever a value repeats. Then use these labels to split with groupby:
def cum_uniq(s):
    i = 0
    seen = set()
    out = []
    for x in s:
        if x in seen:
            i += 1
            seen = set()
        out.append(i)
        seen.add(x)
    return pd.Series(out, index=s.index)
out = [g for _,g in df.groupby(cum_uniq(df['Col1']))]
output:
[ Col1
0 a,
Col1
1 a
2 b,
Col1
3 a,
Col1
4 a
5 b]
intermediate:
cum_uniq(df['Col1'])
0 0
1 1
2 1
3 2
4 3
5 3
dtype: int64
If order doesn't matter
Let's add a Col2 to the example:
Col1 Col2
0 a 0
1 a 1
2 b 2
3 a 3
4 a 4
5 b 5
The previous code gives:
[ Col1 Col2
0 a 0,
Col1 Col2
1 a 1
2 b 2,
Col1 Col2
3 a 3,
Col1 Col2
4 a 4
5 b 5]
If order does not matter, you can vectorize it:
out = [g for _,g in df.groupby(df.groupby('Col1').cumcount())]
output:
[ Col1 Col2
0 a 0
2 b 2,
Col1 Col2
1 a 1
5 b 5,
Col1 Col2
3 a 3,
Col1 Col2
4 a 4]
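A self-contained version of the vectorized split, using the example with Col2. groupby('Col1').cumcount() gives the k-th occurrence of each Col1 value the label k, so rows that share a label can never repeat a Col1 value (at the cost of not keeping consecutive rows together).

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['a', 'a', 'b', 'a', 'a', 'b'],
                   'Col2': [0, 1, 2, 3, 4, 5]})

# k-th occurrence of each Col1 value -> label k
occurrence = df.groupby('Col1').cumcount()
out = [g for _, g in df.groupby(occurrence)]

for g in out:
    # each piece holds unique Col1 values
    assert g['Col1'].is_unique
    print(g['Col2'].tolist())
```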

Sum column in one dataframe based on row value of another dataframe

Say, I have one data frame df:
a b c d e
0 1 2 dd 5 Col1
1 2 3 ee 9 Col2
2 3 4 ff 1 Col4
There's another dataframe df2:
Col1 Col2 Col3
0 1 2 4
1 2 3 5
2 3 4 6
I need to add a column Sum to the first dataframe that holds the total of the df2 column named in column e of df1.
Expected output
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col4 0
The Sum value in the last row is 0 because Col4 doesn't exist in df2.
What I tried: writing some lambdas with the apply function, but I wasn't able to make it work.
I'd greatly appreciate the help. Thank you.

Try
df['Sum']=df.e.map(df2.sum()).fillna(0)
df
Out[89]:
a b c d e Sum
0 1 2 dd 5 Col1 6.0
1 2 3 ee 9 Col2 9.0
2 3 4 ff 1 Col4 0.0
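A self-contained version of the map answer with the question's data, commented step by step. df2.sum() returns the column totals as a Series indexed by column name, which map can use as a lookup table.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 3, 4], 'c': ['dd', 'ee', 'ff'],
                   'd': [5, 9, 1], 'e': ['Col1', 'Col2', 'Col4']})
df2 = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [2, 3, 4], 'Col3': [4, 5, 6]})

# Column totals indexed by column name: Col1 -> 6, Col2 -> 9, Col3 -> 15
col_totals = df2.sum()

# map looks up each name in e; names missing from df2 (Col4) become NaN,
# which fillna(0) turns into 0
df['Sum'] = df['e'].map(col_totals).fillna(0)
print(df)
```

The NaN produced for Col4 is why the result column is float (6.0, 9.0, 0.0).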

Try this. The following solution uses the apply method to sum all values of the named column if it is present in df2, and returns 0 if no such column exists.
df1.loc[:,"sum"]=df1.loc[:,"e"].apply(lambda x: df2.loc[:,x].sum() if(x in df2.columns) else 0)
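A swapped-in alternative to the per-row apply (not from the answers above): look up all totals at once with Series.reindex, where fill_value=0 covers names that are not columns of df2. The frame is trimmed to column e for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({'e': ['Col1', 'Col2', 'Col4']})
df2 = pd.DataFrame({'Col1': [1, 2, 3], 'Col2': [2, 3, 4], 'Col3': [4, 5, 6]})

# Totals in the order of df1['e']; unknown column names get 0
totals = df2.sum().reindex(df1['e'], fill_value=0)
df1['Sum'] = totals.to_numpy()
print(df1)
```

Because fill_value is an integer, the column stays integer instead of becoming float.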

Use .iterrows() to iterate through a DataFrame, pulling out the index and the values of each row.
A nested for-loop can then grab the needed values from the second dataframe and apply them to the first:
import pandas as pd
df1 = pd.DataFrame(data={'a': [1,2,3], 'b': [2,3,4], 'c': ['dd', 'ee', 'ff'], 'd': [5,9,1], 'e': ['Col1','Col2','Col3']})
df2 = pd.DataFrame(data={'Col1': [1,2,3], 'Col2': [2,3,4], 'Col3': [4,5,6]})
df1['Sum'] = 0
for index, value in df1.iterrows():
    total = 0  # avoid shadowing the built-in sum()
    if value['e'] in df2.columns:  # missing columns keep a Sum of 0
        for index2, value2 in df2.iterrows():
            total += value2[value['e']]
    # .at writes a single cell; df1['Sum'][index] = ... is chained assignment
    df1.at[index, 'Sum'] = total
Output:
a b c d e Sum
0 1 2 dd 5 Col1 6
1 2 3 ee 9 Col2 9
2 3 4 ff 1 Col3 15

Create new Dataframe from matching two dataframes' indexes

I'm looking to create a new dataframe from data in two separate dataframes, effectively matching the index of each cell and putting the pairs into a two-column dataframe. My real datasets have the exact same number of rows and columns, FWIW. Example below:
DF1:
Col1 Col2 Col3
1 2 3
3 8 7
DF2:
Col1 Col2 Col3
A B E
R S W
Desired Dataframe:
Col1 Col2
1 A
2 B
3 E
3 R
8 S
7 W
Thank you for your help!

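Here is your code (ravel in 'C', row-major order walks each row left to right, which matches the desired output):
import pandas as pd
df3 = pd.Series(df1.values.ravel('C'))
df4 = pd.Series(df2.values.ravel('C'))
df = pd.concat([df3, df4], axis=1, keys=['Col1', 'Col2'])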
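The order argument of ravel decides which cells get paired. A small check on the question's data: 'C' (row-major) reproduces the desired 1, 2, 3, 3, 8, 7, while 'F' (column-major) walks down each column instead.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': [1, 3], 'Col2': [2, 8], 'Col3': [3, 7]})
df2 = pd.DataFrame({'Col1': ['A', 'R'], 'Col2': ['B', 'S'], 'Col3': ['E', 'W']})

print(df1.to_numpy().ravel('C'))  # [1 2 3 3 8 7]
print(df1.to_numpy().ravel('F'))  # [1 3 2 8 3 7]

# Pair the cells row by row
df = pd.DataFrame({'Col1': df1.to_numpy().ravel('C'),
                   'Col2': df2.to_numpy().ravel('C')})
print(df)
```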

Use DataFrame.to_numpy and .flatten:
df = pd.DataFrame(
{'Col1': df1.to_numpy().flatten(), 'Col2': df2.to_numpy().flatten()})
# print(df)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

You can do it easily like so:
import pandas as pd
# flatten each frame row by row into a plain list
list1 = df1.values.tolist()
list1 = [item for sublist in list1 for item in sublist]
list2 = df2.values.tolist()
list2 = [item for sublist in list2 for item in sublist]
data = {
    'Col1': list1,
    'Col2': list2
}
df = pd.DataFrame(data)
print(df)
Hope this helps :)

pd.concat(map(lambda x: x.unstack().sort_index(level=-1), (df1, df2)), axis=1).reset_index(drop=True).rename(columns=['Col1', 'Col2'].__getitem__)
Result:
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Another way (alternative):
pd.concat((df1.stack(), df2.stack()), axis=1, keys=['Col1', 'Col2']).reset_index(drop=True)
or:
d = {'Col1':df1,'Col2':df2}
pd.concat((v.stack() for k,v in d.items()),axis=1,keys=d.keys()).reset_index(drop=True)
#or pd.concat((d.values()),keys=d.keys()).stack().unstack(0).reset_index(drop=True)
Col1 Col2
0 1 A
1 2 B
2 3 E
3 3 R
4 8 S
5 7 W

Deleting rows from a pandas DataFrame that do not match a combination of columns in another DataFrame

My DataFrame 1 looks like:
Col1 Col2 Col3
1 A 4 ab
2 A 5 de
3 A 2 ah
4 B 1 ac
5 B 3 jd
6 B 2 am
DataFrame 2:
col1 col2
1 A 4
2 B 3
How do I delete all the rows in DataFrame 1 that do not match any combination of values in the rows of DataFrame 2?
Output Expected:
Col1 Col2 Col3
1 A 4 ab
2 B 3 jd

Use DataFrame.merge with an inner join; you only need to rename the columns first:
df = df2.rename(columns={'col1':'Col1','col2':'Col2'}).merge(df1, on=['Col1','Col2'])
# 'on' can be omitted; merge then joins on the intersection of the columns of df1 and df2
#df = df2.rename(columns={'col1':'Col1','col2':'Col2'}).merge(df1)
print (df)
Col1 Col2 Col3
0 A 4 ab
1 B 3 jd
Another idea is to use the left_on and right_on parameters and then drop the columns named in df2.columns:
df = (df2.merge(df1, left_on=['col1','col2'],
right_on=['Col1','Col2']).drop(df2.columns, axis=1))
print (df)
Col1 Col2 Col3
0 A 4 ab
1 B 3 jd
If the column names are the same:
print (df2)
Col1 Col2
1 A 4
2 B 3
df = df2.merge(df1, on=['Col1','Col2'])
#df = df2.merge(df1)
print (df)
Col1 Col2 Col3
0 A 4 ab
1 B 3 jd
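A self-contained run of the rename-then-merge answer, rebuilding both frames from the question (including df1's 1-based index). The inner merge keeps only the (Col1, Col2) pairs present in both frames, and the result gets a fresh 0-based index.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ['A', 'A', 'A', 'B', 'B', 'B'],
                    'Col2': [4, 5, 2, 1, 3, 2],
                    'Col3': ['ab', 'de', 'ah', 'ac', 'jd', 'am']},
                   index=[1, 2, 3, 4, 5, 6])
df2 = pd.DataFrame({'col1': ['A', 'B'], 'col2': [4, 3]}, index=[1, 2])

# Align df2's column names with df1's, then keep only matching key pairs
keys = df2.rename(columns={'col1': 'Col1', 'col2': 'Col2'})
df = keys.merge(df1, on=['Col1', 'Col2'], how='inner')
print(df)
```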

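You can also use join to do an inner join. join matches columns of the caller against the index of the other frame, so set the key columns of df2 as its index first (this assumes df2's columns are named Col1 and Col2):
dfR = df1.join(df2.set_index(['Col1', 'Col2']), on=['Col1', 'Col2'], how='inner')
dfR[['Col1','Col2','Col3']]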
This will also give you the same result
Col1 Col2 Col3
1 A 4 ab
2 B 3 jd
For more details, check the pandas join documentation and examples.
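A swapped-in alternative not used in the answers above: boolean filtering with MultiIndex.isin. It builds (Col1, Col2) tuples for both frames and keeps the df1 rows whose pair appears in df2; unlike merge, it keeps df1's original index labels. This sketch assumes df2's columns are already named Col1 and Col2.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ['A', 'A', 'A', 'B', 'B', 'B'],
                    'Col2': [4, 5, 2, 1, 3, 2],
                    'Col3': ['ab', 'de', 'ah', 'ac', 'jd', 'am']},
                   index=[1, 2, 3, 4, 5, 6])
df2 = pd.DataFrame({'Col1': ['A', 'B'], 'Col2': [4, 3]}, index=[1, 2])

# (Col1, Col2) pairs of each frame as a MultiIndex
pairs_df1 = pd.MultiIndex.from_frame(df1[['Col1', 'Col2']])
pairs_df2 = pd.MultiIndex.from_frame(df2[['Col1', 'Col2']])

# True where df1's pair also occurs in df2
filtered = df1[pairs_df1.isin(pairs_df2)]
print(filtered)
```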

Sort and align 2 dataframes by values in corresponding columns

I have 2 dataframes that I want to sort that are similar in structure to what I have shown below, but the rows of values when looking at only the first 3 columns are jumbled. How do I sort the dataframes such that the row indices match?
It may also happen that a row has no match, in which case I want to create a blank entry in the other dataframe at that index. How would I go about doing this?
Dataframe1:
Col1 Col2 Col3 Col4
0 a b c 1
1 b c d 4
2 f e g 5
Dataframe2:
Col1 Col2 Col3 Col4
0 f e g 6
1 a b c 5
2 b c d 3

Is this what you want?
import pandas as pd
df=pd.DataFrame({'a':[1,3,2],'b':[4,6,5]})
print(df.sort_values(df.columns.tolist()))
Output:
a b
0 1 4
2 2 5
1 3 6

How do I sort the dataframes such that the row indices match
You can sort both data frames by the columns that should determine the order, then reset the index.
cols = ['Col1', 'Col2', 'Col3']
df1.sort_values(cols).reset_index(drop=True)
#outputs:
Col1 Col2 Col3 Col4
0 a b c 1
1 b c d 4
2 f e g 5
df2.sort_values(cols).reset_index(drop=True)
#outputs:
Col1 Col2 Col3 Col4
0 a b c 5
1 b c d 3
2 f e g 6
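A quick self-contained check of this step with the question's two frames: after sorting by the key columns and resetting the index, row i of each frame carries the same keys. This only holds when both frames contain the same set of keys, which is what the next part deals with.

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': list('abf'), 'Col2': list('bce'),
                    'Col3': list('cdg'), 'Col4': [1, 4, 5]})
df2 = pd.DataFrame({'Col1': list('fab'), 'Col2': list('ebc'),
                    'Col3': list('gcd'), 'Col4': [6, 5, 3]})

cols = ['Col1', 'Col2', 'Col3']
# Sort both frames by the key columns, then renumber rows 0..n-1
s1 = df1.sort_values(cols).reset_index(drop=True)
s2 = df2.sort_values(cols).reset_index(drop=True)

# Same keys, row for row
print(s1[cols].equals(s2[cols]))  # True
```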
...there may not be matching rows in which case I want to create a blank entry in the other dataframe at that index
Let's add one more row to df1:
df1 = pd.DataFrame({
'Col1': list('abfh'),
'Col2': list('bceg'),
'Col3': list('cdgi'),
'Col4': [1,4,5,7]
})
df1
# outputs:
Col1 Col2 Col3 Col4
0 a b c 1
1 b c d 4
2 f e g 5
3 h g i 7
We can use a left join so that df2 gets a blank row (all NaN) at index 3.
If you have already sorted both dataframes, you can merge on the indexes:
df3 = df1.merge(df2, how='left', left_index=True, right_index=True, suffixes=('_x', ''))
Otherwise, merge on the columns that *should* determine the sort order. This creates a new dataframe with joined values, sorted the same way df1 is sorted:
df3 = df1.merge(df2, how='left', on=cols, suffixes=('_x', ''))
Then filter out the columns that came from the left data frame:
df3.iloc[:, ~df3.columns.str.endswith('_x')]
#outputs (index merge on the sorted frames; merging on cols gives the same first three rows but keeps h g i in the key columns of row 3):
Col1 Col2 Col3 Col4
0 a b c 5.0
1 b c d 3.0
2 f e g 6.0
3 NaN NaN NaN NaN
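A sketch of one way to get the fully blank row even when the frames are not sorted, assuming the key columns are unique in each frame. It left-merges df2 onto df1's keys, and indicator=True adds a _merge column that flags keys found only in df1, so those rows can be blanked in every column.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Col1': list('abfh'), 'Col2': list('bceg'),
                    'Col3': list('cdgi'), 'Col4': [1, 4, 5, 7]})
df2 = pd.DataFrame({'Col1': list('fab'), 'Col2': list('ebc'),
                    'Col3': list('gcd'), 'Col4': [6, 5, 3]})

cols = ['Col1', 'Col2', 'Col3']
# df2 rows reordered to follow df1's key order
merged = df1[cols].merge(df2, on=cols, how='left', indicator=True)
missing = merged['_merge'].eq('left_only')

df2_aligned = merged.drop(columns='_merge')
# Blank out the key columns too, so unmatched rows are empty everywhere
df2_aligned.loc[missing, cols] = np.nan
print(df2_aligned)
```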
