find inverse/mirror pair and assign a pair number - python

I am trying to find inverse/mirror pairs and assign a pair number to each pair, but I am stuck on how to move forward from the below.
df1:
col1 col2 no. of records
A B 2
B A 5
C D 4
D C 6
E F 4
G H 6
I am trying to get this result:
col1 col2 pair no. of records totalcount
A B 1 2 7
B A 1 5 7
C D 2 4 10
D C 2 6 10
E F 3 4 4
G H 4 6 6
I tried making a duplicate dataframe df2 and using the isin function, but it only returned True/False and I was stuck for a long time on how to group the rows together:
df1['row_matched'] = np.where((df1.col1 + df1.col2).isin(df2.col2 + df2.col1), df2['row'], "")
I will appreciate any help!

Use a dense rank over the unordered pair of col1 and col2, which you can build with a set:
In [37]: df['pair'] = (df.apply(lambda x: '-'.join(set(x[['col1', 'col2']])), 1)
.rank(method='dense').astype(int))
In [38]: df['totalcount'] = df.groupby('pair')['no.ofrecords'].transform('sum')
In [39]: df
Out[39]:
col1 col2 no.ofrecords pair totalcount
0 A B 2 1 7
1 B A 5 1 7
2 C D 4 2 10
3 D C 6 2 10
4 E F 4 3 4
5 G H 6 4 6
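Joining a set into a string can misbehave if col1 equals col2 (the set collapses to one element) or if the labels themselves contain '-'. A variant that sorts the two columns as an array and factorizes the result avoids string keys entirely; a minimal sketch, assuming the column names above:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': list('ABCDEG'),
                   'col2': list('BADCFH'),
                   'no.ofrecords': [2, 5, 4, 6, 4, 6]})

# sort each (col1, col2) pair so mirrored rows share the same key
key = pd.Series(map(tuple, np.sort(df[['col1', 'col2']].to_numpy(), axis=1)),
                index=df.index)

# factorize numbers the keys in order of first appearance, starting at 0
df['pair'] = pd.factorize(key)[0] + 1
df['totalcount'] = df.groupby('pair')['no.ofrecords'].transform('sum')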

I want to groupby and drop groups if the shape is 3 and none of the values in a column is zero

I want to groupby ID and drop a group if it satisfies two conditions: its shape is 3 and its value column doesn't contain any zeros.
My df
ID value
A 3
A 2
A 0
B 1
B 1
C 3
C 3
C 4
D 0
D 5
D 5
E 6
E 7
E 7
F 3
F 2
my desired df would be
ID value
A 3
A 2
A 0
B 1
B 1
D 0
D 5
D 5
F 3
F 2
You can use boolean indexing with groupby operations:
g = df['value'].eq(0).groupby(df['ID'])
# group contains a 0
m1 = g.transform('any')
# group doesn't have size 3
m2 = g.transform('size').ne(3)
# keep if either condition above is met
# this is equivalent to dropping groups of size 3 that contain no 0
out = df[m1|m2]
Output:
ID value
0 A 3
1 A 2
2 A 0
3 B 1
4 B 1
8 D 0
9 D 5
10 D 5
14 F 3
15 F 2
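The same logic reads naturally with groupby.filter, which keeps a group whenever the callable returns True, though transform-based masks are usually faster on large frames; a minimal sketch assuming the same df:
out = df.groupby('ID').filter(lambda g: g['value'].eq(0).any() or len(g) != 3)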

coding 2 columns in pandas with the same key

I am asked to encode the following two columns. When I use cat.codes, the problem is that the two columns do not end up with the same codes; what I want is for equal values to get the same code in both columns.
Example:
The input is a dataframe
col1 col2
0 A E
1 B F
2 C A
3 D B
4 A B
5 E A
Assuming the above as df, you can compute the unique values and use them to factorize:
vals = df[['col1', 'col2']].stack().unique()  # unique values across both columns, in order of appearance
d = {k: v for v, k in enumerate(vals)}        # value -> integer code
df['col1_codes'] = df['col1'].map(d)
df['col2_codes'] = df['col2'].map(d)
Output:
col1 col2 col1_codes col2_codes
0 A E 0 1
1 B F 2 3
2 C A 4 0
3 D B 5 2
4 A B 0 2
5 E A 1 0
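Equivalently, pd.factorize on the stacked columns produces the same mapping in one pass; a minimal sketch, assuming the same df:
import pandas as pd

# stack interleaves col1/col2 row by row; factorize codes the values in
# order of first appearance, and reshape restores the two columns
codes, uniques = pd.factorize(df[['col1', 'col2']].stack())
df[['col1_codes', 'col2_codes']] = codes.reshape(-1, 2)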
You can also try the approach below, which uses sklearn's LabelEncoder. Given this df:
a b
0 apple nokia
1 xiomi samsung
2 samsung apple
3 moto oneplus
import pandas as pd
from sklearn import preprocessing

# fit the encoder on the values of both columns so the codes are shared
cat_var = list(df.a.values) + list(df.b.values)
le = preprocessing.LabelEncoder()
le.fit(cat_var)
df['a'] = le.transform(df.a)
df['b'] = le.transform(df.b)
This will give you the output below:
a b
0 0 2
1 5 4
2 4 0
3 1 3
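Since the encoder keeps its fitted classes, the codes can later be mapped back to the original labels; for example:
# recover the original labels from the shared codes
df['a_labels'] = le.inverse_transform(df['a'])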

Pandas self join on a single column with no duplicates

Is there a way to find unique rows, where unique is in the sense of two "identical" columns?
>>> d = pandas.DataFrame([['A',1],['A',2],['A',3],['B',1],['B',4],['B',2]], columns = ['col_a','col_b'])
>>> d
col_a col_b
0 A 1
1 A 2
2 A 3
3 B 1
4 B 4
5 B 2
>>> d.merge(d, left_on='col_b', right_on='col_b')
col_a_x col_b col_a_y
0 A 1 A
1 A 1 B
2 B 1 A
3 B 1 B
4 A 2 A
5 A 2 B
6 B 2 A
7 B 2 B
8 A 3 A
9 B 4 B
>>> d_desired
0 A 1 A
1 A 1 B
3 B 1 B
4 A 2 A
5 A 2 B
7 B 2 B
8 A 3 A
9 B 4 B
But I would like to drop the mirrored duplicate entries, e.g. B 1 A and B 2 A.
I later want to group by the two columns, so I need to always drop the same side of each "duplicate": if I drop B 1 A I should also drop B 2 A, not A 2 B.
Try this and see if it works for you:
M = d.merge(d,left_on='col_b',right_on='col_b')
# flag rows where col_a_x is greater than col_a_y
# (the inequality check is redundant, since > already implies !=)
cond = (M.col_a_x > M.col_a_y) & (M.col_a_x != M.col_a_y)
# drop the flagged rows
M.loc[~cond]
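Since the stated goal is to group by the two columns afterwards, the filtered frame can be aggregated directly; a hypothetical follow-up:
M = d.merge(d, on='col_b')
out = M.loc[M.col_a_x <= M.col_a_y]
# count matches per unordered (col_a_x, col_a_y) pair
counts = out.groupby(['col_a_x', 'col_a_y'])['col_b'].count()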

pandas: group consecutive column values below a certain number and assign the group numbers as a new column

I have a data frame like this,
df
col1 col2
A 2
B 3
C 1
D 4
E 6
F 1
G 2
H 8
I 1
J 10
Now I want to create another column col3 by grouping together consecutive col2 values that are below 5, with each value above 5 starting a new group, and numbering the groups from 1 up to the number of groups; the final data frame would look like:
col1 col2 col3
A 2 1
B 3 1
C 1 1
D 4 1
E 6 2
F 1 2
G 2 2
H 8 3
I 1 3
J 10 4
I could do this by comparing the previous value with the current one, storing the results in a list, and making that list col3.
But the execution time would be huge that way, so I am looking for a shortcut/pythonic way to do it efficiently.
Compare with Series.gt (>) and take the cumulative sum with Series.cumsum. Here the new column starts at 0 because the first value of the column is below 5; if the first value were above 5, the numbering would start at 1:
df['col3'] = df['col2'].gt(5).cumsum()
print (df)
col1 col2 col3
0 A 2 0
1 B 3 0
2 C 1 0
3 D 4 0
4 E 6 1
5 F 1 1
6 G 2 1
7 H 8 2
8 I 1 2
9 J 10 3
For a general solution that always starts at 1, use this trick: test whether the first value is below N, convert the boolean to an integer (True -> 1, False -> 0) and add it to the cumulative sum:
N = 5
df['col3'] = df['col2'].gt(N).cumsum() + int(df.loc[0, 'col2'] < N)

# test column whose first value is above N (2 + 5 = 7)
df = df.assign(col21 = df['col2'].add(pd.Series({0: 5}), fill_value=0).astype(int))
df['col31'] = df['col21'].gt(N).cumsum() + int(df.loc[0, 'col21'] < N)
print (df)
col1 col2 col21 col3 col31
0 A 2 7 1 1
1 B 3 3 1 1
2 C 1 1 1 1
3 D 4 4 1 1
4 E 6 6 2 2
5 F 1 1 2 2
6 G 2 2 2 2
7 H 8 8 3 3
8 I 1 1 3 3
9 J 10 10 4 4
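A variant that avoids special-casing the first row is to renumber the cumulative sum with pd.factorize, which always yields group ids starting at 1; a minimal sketch:
import pandas as pd

N = 5
# factorize relabels the distinct cumsum levels in order of appearance
# starting at 0, so adding 1 numbers the groups from 1 either way
df['col3'] = pd.factorize(df['col2'].gt(N).cumsum())[0] + 1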

How to groupby consecutive occurrences of duplicates in pandas

I have a dataframe which contains two columns [Name, In.Cl]. I want to groupby Name, but based on consecutive occurrences. For example, consider the DataFrame below.
Code to generate below DF:
df=pd.DataFrame({'Name':['A','B','B','A','A','B','C','C','C','B','C'],'In.Cl':[2,1,5,2,4,2,3,1,8,5,7]})
Input:
In.Cl Name
0 2 A
1 1 B
2 5 B
3 2 A
4 4 A
5 2 B
6 3 C
7 1 C
8 8 C
9 5 B
10 7 C
I want to group the rows where the Name repeats consecutively, e.g. group [B] (rows 1-2), [A] (rows 3-4), [C] (rows 6-8), etc., and perform a sum over the In.Cl column within each group.
Expected Output:
In.Cl Name col1 col2
0 2 A A(1) 2
1 1 B B(2) 6
2 5 B B(2) 6
3 2 A A(2) 6
4 4 A A(2) 6
5 2 B B(1) 2
6 3 C C(3) 12
7 1 C C(3) 12
8 8 C C(3) 12
9 5 B B(1) 5
10 7 C C(1) 7
So far I have tried combinations of duplicated and groupby, but they didn't work as I expected. I think I need something like groupby + consecutive, but I don't have an idea how to solve this problem.
Any help would be appreciated.
Compare Name with its shifted self and take the cumulative sum to build an id for each consecutive run, then transform per run:
In [37]: g = df.groupby((df.Name != df.Name.shift()).cumsum())
In [38]: df['col1'] = df['Name'] + '(' + g['In.Cl'].transform('size').astype(str) + ')'
In [39]: df['col2'] = g['In.Cl'].transform('sum')
In [40]: df
Out[40]:
Name In.Cl col1 col2
0 A 2 A(1) 2
1 B 1 B(2) 6
2 B 5 B(2) 6
3 A 2 A(2) 6
4 A 4 A(2) 6
5 B 2 B(1) 2
6 C 3 C(3) 12
7 C 1 C(3) 12
8 C 8 C(3) 12
9 B 5 B(1) 5
10 C 7 C(1) 7
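If a one-row-per-run summary is wanted instead of broadcasting the results back to every row, the same run ids work with named aggregation; a minimal sketch:
runs = (df.groupby((df['Name'] != df['Name'].shift()).cumsum())
          .agg(Name=('Name', 'first'),
               size=('In.Cl', 'size'),
               total=('In.Cl', 'sum')))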
Here is a slightly long-winded alternative utilizing itertools.groupby.
For more than ~1000 rows, use @MaxU's solution - it's faster.
from itertools import groupby, chain, repeat
from operator import itemgetter

chainer = chain.from_iterable

def sumfunc(x):
    # x is a list of (Name, In.Cl) tuples for one consecutive run
    return (sum(map(itemgetter(1), x)), len(x))

# group consecutive rows by Name
grouper = groupby(zip(df['Name'], df['In.Cl']), key=itemgetter(0))
summer = [sumfunc(list(j)) for _, j in grouper]

# append each run's length to Name, and broadcast each run's sum
df['Name'] += pd.Series(list(chainer(repeat(j, j) for i, j in summer))).astype(str)
df['col2'] = list(chainer(repeat(i, j) for i, j in summer))
print(df)
In.Cl Name col2
0 2 A1 2
1 1 B2 6
2 5 B2 6
3 2 A2 6
4 4 A2 6
5 2 B1 2
6 3 C3 12
7 1 C3 12
8 8 C3 12
9 5 B1 5
10 7 C1 7
