Mark repeated IDs with an A-B relationship in a dataframe - python

I'm trying to create a relationship between repeated IDs in a dataframe. Take 91, for example: it is repeated 4 times. For the first 91 row, the first column should be set to A and other to B; for the next 91 row, first becomes B and other becomes C; then C and D, and so on. The same relationship applies to every duplicated ID.
For IDs that are not repeated, first is marked A and other stays 0.
id  first  other
11  0      0
09  0      0
91  0      0
91  0      0
91  0      0
91  0      0
15  0      0
15  0      0
12  0      0
01  0      0
01  0      0
01  0      0
Expected output:
id  first  other
11  A      0
09  A      0
91  A      B
91  B      C
91  C      D
91  D      E
15  A      B
15  B      C
12  A      0
01  A      B
01  B      C
01  C      D
I'm using df.iterrows() for this, but the code is getting messy and will be slow as the dataset grows. Is there an easier way to do it?

You can perform a mapping using a per-group cumcount as the source:
from string import ascii_uppercase
# mapping dictionary
# this is an example, you can use any mapping
d = dict(enumerate(ascii_uppercase))
# {0: 'A', 1: 'B', 2: 'C'...}
g = df.groupby('id')
c = g.cumcount()
m = g['id'].transform('size').gt(1)
df['first'] = c.map(d)
df.loc[m, 'other'] = c[m].add(1).map(d)
Output:
id first other
0 11 A 0
1 9 A 0
2 91 A B
3 91 B C
4 91 C D
5 91 D E
6 15 A B
7 15 B C
8 12 A 0
9 1 A B
10 1 B C
11 1 C D
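Put together, a self-contained, runnable version of the above (the sample ids are assumed from the question):

```python
import pandas as pd
from string import ascii_uppercase

df = pd.DataFrame({'id': [11, 9, 91, 91, 91, 91, 15, 15, 12, 1, 1, 1]})
df['other'] = 0

# letter lookup: {0: 'A', 1: 'B', 2: 'C', ...}
d = dict(enumerate(ascii_uppercase))

g = df.groupby('id')
c = g.cumcount()                          # position of each row within its id group
m = g['id'].transform('size').gt(1)       # True for ids that appear more than once

df['first'] = c.map(d)
df['other'] = df['other'].astype(object)  # allow letters next to the 0 placeholder
df.loc[m, 'other'] = c[m].add(1).map(d)
```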

Given:
id
0 11
1 9
2 91
3 91
4 91
5 91
6 15
7 15
8 12
9 1
10 1
11 1
Doing:
# Count ids per group
df['first'] = df.groupby('id').cumcount()
# convert to letters and make other col
m = df.groupby('id').filter(lambda x: len(x)>1).index
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)
# fill in missing with 0
df['other'] = df['other'].fillna(0)
Output:
id first other
0 11 A 0
1 9 A 0
2 91 A B
3 91 B C
4 91 C D
5 91 D E
6 15 A B
7 15 B C
8 12 A 0
9 1 A B
10 1 B C
11 1 C D
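Likewise, the whole thing as a self-contained sketch (sample ids assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({'id': [11, 9, 91, 91, 91, 91, 15, 15, 12, 1, 1, 1]})

# position of each row within its id group
df['first'] = df.groupby('id').cumcount()

# index of rows belonging to ids that occur more than once
m = df.groupby('id').filter(lambda x: len(x) > 1).index

# 65 is ord('A'), 66 is ord('B')
df.loc[m, 'other'] = df['first'].add(66).apply(chr)
df['first'] = df['first'].add(65).apply(chr)

# ids that occur once keep 0 in 'other'
df['other'] = df['other'].fillna(0)
```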

Related

How to convert boolean pandas dataframe to square matrix dataframe

I have a dataframe like this (boolean values)
a b c d count
1 0 1 0 196
0 1 0 1 110
0 1 0 0 17
0 0 1 0 10
0 0 0 0 9
As you can see, someone can be a and c, or b and d, or only c.
I want to build a square matrix dataframe, where
a b
a 0 0
b 0 17
c 196 10
d 0 110
Can I get something like this? I tried
result = df.merge(df, on=['ID'])
count = pd.crosstab(df[columns_x],df[columns_y])
but it didn't get what I want
Note
The main data frame is like:
a b c d
Yes No Yes No
No Yes No Yes
No Yes No No
No No Yes No
No No No No
I got the answer by simply doing a dot product.
df_transpose = df.transpose()
count = df_transpose.dot(df)
Output:
a b c d
a 222 5 8 1
b 5 154 14 22
c 8 14 34 6
d 1 22 6 29
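A self-contained sketch of the dot-product trick, starting from the Yes/No frame (the sample values below are assumed, not the asker's full data):

```python
import pandas as pd

raw = pd.DataFrame({'a': ['Yes', 'No', 'No', 'No', 'No'],
                    'b': ['No', 'Yes', 'Yes', 'No', 'No'],
                    'c': ['Yes', 'No', 'No', 'Yes', 'No'],
                    'd': ['No', 'Yes', 'No', 'No', 'No']})

# Yes/No -> 1/0, then X.T @ X counts co-occurrences for every column pair;
# the diagonal holds how often each column is Yes on its own
num = raw.eq('Yes').astype(int)
counts = num.T.dot(num)
```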

Add a column with sequence values when a condition on another, binary-valued column is satisfied

I have a dataframe df with column A with random numbers and column B with categories. Now, I obtain another column C using the code below:
df.loc[df['A'] >= 50, 'C'] = 1
df.loc[df['A'] < 50, 'C'] = 0
I want to obtain a column 'D' which increments a sequence whenever C is 1 and returns 0 otherwise. The required dataframe is given below.
Required df
A B C D
17 a 0 0
88 a 1 1
99 a 1 2
76 a 1 3
73 a 1 4
23 b 0 0
36 b 0 0
47 b 0 0
74 b 1 1
80 c 1 1
77 c 1 2
97 d 1 1
30 d 0 0
80 d 1 2
Use GroupBy.cumcount with Series.mask:
df['D'] = df.groupby(['B', 'C']).cumcount().add(1).mask(df['C'].eq(0), 0)
print (df)
A B C D
17 a 0 0
88 a 1 1
99 a 1 2
76 a 1 3
73 a 1 4
23 b 0 0
36 b 0 0
47 b 0 0
74 b 1 1
80 c 1 1
77 c 1 2
97 d 1 1
30 d 0 0
80 d 1 2
Or numpy.where:
df['D'] = np.where(df['C'].eq(0), 0, df.groupby(['B', 'C']).cumcount().add(1))
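A runnable sketch with the sample data assumed from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [17, 88, 99, 76, 73, 23, 36, 47, 74, 80, 77, 97, 30, 80],
                   'B': list('aaaaabbbbccddd')})
df['C'] = (df['A'] >= 50).astype(int)

# running count per (B, C) group, shifted to start at 1,
# then zeroed out wherever C == 0
df['D'] = df.groupby(['B', 'C']).cumcount().add(1).mask(df['C'].eq(0), 0)
```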

Aggregating string columns using pandas GroupBy

I have a DF such as the following:
df =
vid pos value sente
1 a A 21
2 b B 21
3 b A 21
3 a A 21
1 d B 22
1 a C 22
1 a D 22
2 b A 22
3 a A 22
Now I want to combine all rows with the same values for sente and vid into one row, with the values of pos and value joined by a space:
df2 =
vid pos value sente
1 a A 21
2 b B 21
3 b a A A 21
1 d a a B C D 22
2 b A 22
3 a A 22
I suppose a modification of this should do the trick:
df2 = df.groupby("sente").agg(lambda x: " ".join(x))
But I can't seem to figure out how to add the second column to the statement.
Groupers can be passed as lists. Furthermore, you can simplify your solution a bit by ridding your code of the lambda—it isn't needed.
df.groupby(['vid', 'sente'], as_index=False, sort=False).agg(' '.join)
vid sente pos value
0 1 21 a A
1 2 21 b B
2 3 21 b a A A
3 1 22 d a a B C D
4 2 22 b A
5 3 22 a A
Some other notes: specifying as_index=False means your groupers will be present as columns in the result (and not as the index, as is the default). Furthermore, sort=False will preserve the original order of the columns.
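Self-contained, with the question's frame assumed:

```python
import pandas as pd

df = pd.DataFrame({'vid':   [1, 2, 3, 3, 1, 1, 1, 2, 3],
                   'pos':   list('abbadaaba'),
                   'value': list('ABAABCDAA'),
                   'sente': [21, 21, 21, 21, 22, 22, 22, 22, 22]})

# join every remaining (string) column with a space, per (vid, sente) group
out = df.groupby(['vid', 'sente'], as_index=False, sort=False).agg(' '.join)
```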
As of this edit, #cᴏʟᴅsᴘᴇᴇᴅ's answer is way better.
Fun way! This only works because the values are single characters:
df.set_index(['sente', 'vid']).sum(level=[0, 1]).applymap(' '.join).reset_index()
sente vid pos value
0 21 1 a A
1 21 2 b B
2 21 3 b a A A
3 22 1 d a a B C D
4 22 2 b A
5 22 3 a A
somewhat ok answer
df.set_index(['sente', 'vid']).groupby(level=[0, 1]).apply(
lambda d: pd.Series(d.to_dict('l')).str.join(' ')
).reset_index()
sente vid pos value
0 21 1 a A
1 21 2 b B
2 21 3 b a A A
3 22 1 d a a B C D
4 22 2 b A
5 22 3 a A
not recommended
df.set_index(['sente', 'vid']).add(' ') \
.sum(level=[0, 1]).applymap(str.strip).reset_index()
sente vid pos value
0 21 1 a A
1 21 2 b B
2 21 3 b a A A
3 22 1 d a a B C D
4 22 2 b A
5 22 3 a A
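Note that sum(level=...) was removed in pandas 2.0; a rough modern equivalent of the variants above uses groupby(level=...) instead (same assumed sample frame, and still relying on the single-character values):

```python
import pandas as pd

df = pd.DataFrame({'vid':   [1, 2, 3, 3, 1, 1, 1, 2, 3],
                   'pos':   list('abbadaaba'),
                   'value': list('ABAABCDAA'),
                   'sente': [21, 21, 21, 21, 22, 22, 22, 22, 22]})

# summing strings concatenates them, e.g. 'b' + 'a' -> 'ba'
summed = (df.set_index(['sente', 'vid'])
            .groupby(level=[0, 1], sort=False)
            .sum())

# 'ba' -> 'b a' in every cell; Series.map is stable across pandas versions
out = summed.apply(lambda col: col.map(' '.join)).reset_index()
```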

pandas get the value and the location from another DataFrame and make a series

Say, I have a DataFrame (dfrtn)
A B C D E F
0 33 34 35 36 37 38
1 39 40 41 42 43 44
2 45 46 47 48 49 50
3 51 52 53 54 55 56
4 57 58 59 60 61 62
then, I make another DataFrame (dfrtn2) from dfrtn (dfrtn2 = dfrtn % 7)
A B C D E F
0 5 6 0 1 2 3
1 4 5 6 0 1 2
2 3 4 5 6 0 1
3 2 3 4 5 6 0
4 1 2 3 4 5 6
I want to add two new columns, "Minimum" and "MinCol" like this
Minimum MinCol
0 0 C
1 0 D
2 0 E
3 0 F
4 1 A
I just have the source dataframe, and can't work out how to make the Minimum and MinCol columns.
dfrtn = pd.DataFrame(np.arange(33,63).reshape(5,6), columns=['A', 'B', 'C', 'D', 'E', 'F'])
dfrtn2 = dfrtn % 7
I tried to use this, but couldn't make it work:
for x in dfrtn.T.idxmin()
Can anyone help me, please?
===============================
Thanks, I really appreciate it and here's an Additional Question...
If I want to get the value from dfrtn which is the same location of dfrtn2, how can I do this?
result should be
Minimum MinCol
0 35 C
1 42 D
2 49 E
3 56 F
4 57 A
Thanks in advance!
Use the DataFrame constructor with idxmin and min per row:
df1 = pd.DataFrame({'MinCol': dfrtn2.idxmin(axis=1),
'Minimum': dfrtn2.min(axis=1)}, columns=['Minimum','MinCol'])
print (df1)
Minimum MinCol
0 0 C
1 0 D
2 0 E
3 0 F
4 1 A
For the original values, add lookup:
df1['new'] = dfrtn.lookup(dfrtn.index, df1['MinCol'])
print (df1)
Minimum MinCol new
0 0 C 35
1 0 D 42
2 0 E 49
3 0 F 56
4 1 A 57
df1['Minimum'] = dfrtn.lookup(dfrtn.index, df1['MinCol'])
print (df1)
Minimum MinCol
0 35 C
1 42 D
2 49 E
3 56 F
4 57 A
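DataFrame.lookup was removed in pandas 2.0; an equivalent sketch using NumPy fancy indexing over the question's data:

```python
import numpy as np
import pandas as pd

dfrtn = pd.DataFrame(np.arange(33, 63).reshape(5, 6),
                     columns=list('ABCDEF'))
dfrtn2 = dfrtn % 7

# column label of the row-wise minimum of dfrtn2
mincol = dfrtn2.idxmin(axis=1)

# lookup(row_labels, col_labels) reproduced with positional indexing
rows = np.arange(len(dfrtn))
cols = dfrtn.columns.get_indexer(mincol)
df1 = pd.DataFrame({'Minimum': dfrtn.to_numpy()[rows, cols],
                    'MinCol': mincol})
```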

How to transpose specific columns into rows in pandas, associating another column's value

Hi, I am trying to do a transpose operation in pandas, with the condition that the value of one column should be associated with the transposed rows.
The example below explains it better.
The data looks like:
A 1 2 3 4 51 52 53 54
B 11 22 23 24 71 72 73 74
The result I am trying to do like this:
A 1 51
A 2 52
A 3 53
A 4 54
B 11 71
B 22 72
B 23 73
B 24 74
The data is in a single row per letter; I want to transpose the values 1 to 4 alongside the value 'A' in the other column. Can anyone suggest how I can do this?
It seems you need melt or stack:
print (df)
0 1 2 3 4
0 A 1 2 3 4
1 B 11 22 23 24
df1 = pd.melt(df, id_vars=0).drop('variable', axis=1).sort_values(0)
df1.columns = list('ab')
print (df1)
a b
0 A 1
2 A 2
4 A 3
6 A 4
1 B 11
3 B 22
5 B 23
7 B 24
df2 = df.set_index(0).stack().reset_index(level=1, drop=True).reset_index(name='a')
df2.columns = list('ab')
print (df2)
a b
0 A 1
1 A 2
2 A 3
3 A 4
4 B 11
5 B 22
6 B 23
7 B 24
EDIT by comment:
#set index with first column
df = df.set_index(0)
#create MultiIndex
cols = np.arange(len(df.columns))
df.columns = [ cols // 4, cols % 4]
print (df)
0 1
0 1 2 3 0 1 2 3
0
A 1 2 3 4 51 52 53 54
B 11 22 23 24 71 72 73 74
#stack, reset index names, remove level and reset index
df1 = df.stack().rename_axis((None, None)).reset_index(level=1, drop=True).reset_index()
#set new columns names
df1.columns = ['a','b','c']
print (df1)
a b c
0 A 1 51
1 A 2 52
2 A 3 53
3 A 4 54
4 B 11 71
5 B 22 72
6 B 23 73
7 B 24 74
