Imagine an electrical connector. It has pins, and each pin has a corresponding X/Y location in space. I am trying to figure out how to mirror, or 'flip', each pin on the connector given its X/Y coordinates. Note: I am using pandas version 0.23.4. We can assume that x, y, and pin are not unique but connector is. Connectors can be any size: two rows of 5, three rows of 6, etc.
x y pin connector
1 1 A 1
2 1 B 1
3 1 C 1
1 2 D 1
2 2 E 1
3 2 F 1
1 1 A 2
2 1 B 2
3 1 C 2
1 2 D 2
2 2 E 2
3 2 F 2
The dataframe column 'flip' is the result I am trying to get to. Notice that the pins in the same row are now in reverse order.
x y pin flip connector
1 1 A C 1
2 1 B B 1
3 1 C A 1
1 2 D F 1
2 2 E E 1
3 2 F D 1
1 1 A C 2
2 1 B B 2
3 1 C A 2
1 2 D F 2
2 2 E E 2
3 2 F D 2
IIUC, try using [::-1] (a reversing slice) together with groupby and transform:
df['flip'] = df.groupby(['connector','y'])['pin'].transform(lambda x: x[::-1])
Output:
x y pin connector flip
0 1 1 A 1 C
1 2 1 B 1 B
2 3 1 C 1 A
3 1 2 D 1 F
4 2 2 E 1 E
5 3 2 F 1 D
6 1 1 A 2 C
7 2 1 B 2 B
8 3 1 C 2 A
9 1 2 D 2 F
10 2 2 E 2 E
11 3 2 F 2 D
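Depending on the pandas version, a Series returned from transform can be realigned by its index, which would silently undo the reversal. If flip comes back unchanged, a sketch that reverses the underlying array instead (.values makes the reversal purely positional, with no index to realign):
df['flip'] = df.groupby(['connector','y'])['pin'].transform(lambda s: s.values[::-1])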
import io
import pandas as pd
data = """
x y pin connector
1 1 A 1
2 1 B 1
3 1 C 1
1 2 D 1
2 2 E 1
3 2 F 1
1 1 A 2
2 1 B 2
3 1 C 2
1 2 D 2
2 2 E 2
3 2 F 2
"""
#strip blank lines at the beginning and end
data = data.strip()
#make it quack like a file
data_file = io.StringIO(data)
#read data from a "wsv" file (whitespace separated values)
df = pd.read_csv(data_file, sep=r'\s+')
Make the new column:
flipped = []
for name, group in df.groupby(['connector','y']):
    flipped.extend(group.loc[::-1, 'pin'])
df = df.assign(flip=flipped)
df
Final DataFrame:
x y pin connector flip
0 1 1 A 1 C
1 2 1 B 1 B
2 3 1 C 1 A
3 1 2 D 1 F
4 2 2 E 1 E
5 3 2 F 1 D
6 1 1 A 2 C
7 2 1 B 2 B
8 3 1 C 2 A
9 1 2 D 2 F
10 2 2 E 2 E
11 3 2 F 2 D
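One caveat: extending a flat list assumes groupby iterates rows in the frame's original order, which holds here because the frame is already sorted by connector and y. A sketch of an index-aligned variant that drops that assumption (same df as above):
# reverse each group's pins but keep the group's forward index,
# so the assignment aligns by index instead of by position
flipped = pd.concat(
    pd.Series(g['pin'].values[::-1], index=g.index)
    for _, g in df.groupby(['connector', 'y'])
)
df = df.assign(flip=flipped)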
You can create a map between the original coordinates and the coordinates of the 'flipped' component. Then you can select the flipped values.
import numpy as np
midpoint = 2
coordinates_of_flipped = pd.MultiIndex.from_arrays([
    df['x'].map(lambda x: x - midpoint * np.sign(x - midpoint)),
    df['y'],
    df['connector'],
])
df['flipped'] = df.set_index(['x', 'y', 'connector']).loc[coordinates_of_flipped].reset_index()['pin']
which gives
Out[30]:
x y pin connector flipped
0 1 1 A 1 C
1 2 1 B 1 B
2 3 1 C 1 A
3 1 2 D 1 F
4 2 2 E 1 E
5 3 2 F 1 D
6 1 1 A 2 C
7 2 1 B 2 B
8 3 1 C 2 A
9 1 2 D 2 F
10 2 2 E 2 E
11 3 2 F 2 D
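The hard-coded midpoint only fits this 3-wide example. A sketch of a per-row generalization (assumption: mirror about each connector/y row's x-extent, i.e. x' = x_min + x_max - x):
# reflect each x about its own row's extent, then look up the pin there
g = df.groupby(['connector', 'y'])['x']
flipped_x = g.transform('min') + g.transform('max') - df['x']
coords = pd.MultiIndex.from_arrays([flipped_x, df['y'], df['connector']])
df['flipped'] = df.set_index(['x', 'y', 'connector']).loc[coords].reset_index()['pin']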
I have a problem where I need to group the data by two columns and attach a column that, in effect, numbers the subgroups.
Example dataframe looks like this:
colA colB
1 a
1 a
1 c
1 c
1 f
1 z
1 z
1 z
2 a
2 b
2 b
2 b
3 c
3 d
3 k
3 k
3 m
3 m
3 m
Expected output after attaching the new column is as follows:
colA colB colC
1 a 1
1 a 1
1 c 2
1 c 2
1 f 3
1 z 4
1 z 4
1 z 4
2 a 1
2 b 2
2 b 2
2 b 2
3 c 1
3 d 2
3 k 3
3 k 3
3 m 4
3 m 4
3 m 4
I tried the following, but I cannot get this trivial-looking problem solved. This attempt does not give what I am looking for:
df['ONES'] = 1
df['colC'] = df.groupby(['colA','colB'])['ONES'].cumcount() + 1
df.drop(columns='ONES', inplace=True)
I also played with transform, cumsum, and apply, but I cannot seem to solve this. Any help is appreciated.
Edit: minor error on dataframes.
Edit 2: For simplicity, I showed similar values for column B, but within a larger group (indicated by colA), colB may differ, so it needs to be grouped by both at the same time.
Edit 3: Updated dataframes to emphasize what I meant by my second edit. Hope this makes it clearer and more reproducible.
You could use groupby + ngroup. The inner groupby('colB').ngroup() numbers the colB groups within each colA slice; droplevel(0) removes the extra colA index level that apply adds, so the result aligns back with df:
df['colC'] = df.groupby('colA').apply(lambda x: x.groupby('colB').ngroup()+1).droplevel(0)
Output:
colA colB colC
0 1 a 1
1 1 a 1
2 1 c 2
3 1 c 2
4 1 f 3
5 1 z 4
6 1 z 4
7 1 z 4
8 2 a 1
9 2 b 2
10 2 b 2
11 2 b 2
12 3 c 1
13 3 d 2
14 3 k 3
15 3 k 3
16 3 m 4
17 3 m 4
18 3 m 4
Categorically, factorize within each colA group. (A plain df['colB'].astype('category').cat.codes + 1 assigns codes globally, which only matches the expected output when colB values happen to line up across colA groups.)
df['colC'] = df.groupby('colA')['colB'].transform(lambda s: s.astype('category').cat.codes + 1)
colA colB colC
0 1 a 1
1 1 a 1
2 1 c 2
3 1 c 2
4 1 f 3
5 1 z 4
6 1 z 4
7 1 z 4
8 2 a 1
9 2 b 2
10 2 b 2
11 2 b 2
12 3 c 1
13 3 d 2
14 3 k 3
15 3 k 3
16 3 m 4
17 3 m 4
18 3 m 4
I have this structure, with column B holding the number of occurrences of each value of column A.
df = pd.DataFrame(dict(A=list('aaabbcccc'), B=list('333224444')))
df
# A B
# 0 a 3
# 1 a 3
# 2 a 3
# 3 b 2
# 4 b 2
# 5 c 4
# 6 c 4
# 7 c 4
# 8 c 4
I am looking for an elegant way to add a column C that decrements the value of B line by line within each group, counting down to zero.
res
# A B C
# 0 a 3 2
# 1 a 3 1
# 2 a 3 0
# 3 b 2 1
# 4 b 2 0
# 5 c 4 3
# 6 c 4 2
# 7 c 4 1
# 8 c 4 0
Use cumcount(ascending=False), as suggested by @ALollz. Group by A rather than B, so two A groups that happen to share the same count in B are not mixed together:
df.groupby('A').cumcount(ascending=False)
0 2
1 1
2 0
3 1
4 0
5 3
6 2
7 1
8 0
dtype: int64
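To attach it as the new column (a minimal sketch, same df as above):
# countdown within each A group; the last row of a group gets 0
df['C'] = df.groupby('A').cumcount(ascending=False)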
I am working on a huge dataframe and trying to create a new column based on a condition on another column. Right now I have a big while-loop, and the calculation takes too much time. Is there an easier way to do it, with a lambda for example?
def promo(dataframe, a):
    i = 0
    while i < len(dataframe) - 1:
        i = i + 1
        if dataframe.iloc[i-1, 5] >= a:
            dataframe.iloc[i-1, 6] = 1
        else:
            dataframe.iloc[i-1, 6] = 0
    return dataframe
Don't use loops in pandas; they are slow compared to a vectorized solution. Convert the boolean mask to integers with astype: True and False are converted to 1 and 0:
dataframe = pd.DataFrame({'A': list('abcdef'),
                          'B': [4, 5, 4, 5, 5, 4],
                          'C': [7, 8, 9, 4, 2, 3],
                          'D': [1, 3, 5, 7, 1, 0],
                          'E': list('aaabbb'),
                          'F': [5, 3, 6, 9, 2, 4],
                          'G': [5, 3, 6, 9, 2, 4]})
a = 5
dataframe['new'] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
A B C D E F G new
0 a 4 7 1 a 5 5 1
1 b 5 8 3 a 3 3 0
2 c 4 9 5 a 6 6 1
3 d 5 4 7 b 9 9 1
4 e 5 2 1 b 2 2 0
5 f 4 3 0 b 4 4 0
If you want to overwrite the 7th column:
a = 5
dataframe.iloc[:,6] = (dataframe.iloc[:,5] >= a).astype(int)
print (dataframe)
A B C D E F G
0 a 4 7 1 a 5 1
1 b 5 8 3 a 3 0
2 c 4 9 5 a 6 1
3 d 5 4 7 b 9 1
4 e 5 2 1 b 2 0
5 f 4 3 0 b 4 0
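If you prefer an explicit branch, np.where is an equivalent vectorized form (a sketch against the same frame):
import numpy as np
# pick 1 where the condition holds and 0 elsewhere, in one vectorized pass
dataframe['new'] = np.where(dataframe.iloc[:, 5] >= a, 1, 0)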
I have a dataframe like this:
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
How can I group by 'a', leave columns b, c, and d untouched, and split into several dataframes, like this?
First, group by column 'a':
a b c d
0 1 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
4 2 1 1 1
5 2 2 2
6 3 3 3
And then split into different dataframes based on numbers in 'a':
dataframe 1:
a b c d
0 1 1 1 1
1 2 2 2
2 3 3 3
3 4 4 4
dataframe 2:
a b c d
0 2 1 1 1
1 2 2 2
2 3 3 3
...
dataframe n:
a b c d
0 n 1 1 1
1 2 2 2
2 3 3 3
Iterate over each group that df.groupby returns.
for _, g in df.groupby('a'):
    print(g, '\n')
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4
a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
If you want a dict of dataframes, I'd recommend:
df_dict = {idx : g for idx, g in df.groupby('a')}
Here, idx is the unique a value.
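For example (a sketch, assuming the df above):
df1 = df_dict[1]  # the sub-frame where a == 1
df2 = df_dict[2]  # the sub-frame where a == 2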
A couple of nifty techniques, courtesy of Root:
df_dict = dict(list(df.groupby('a'))) # for a dictionary
And,
idxs, dfs = zip(*df.groupby('a')) # separate lists
idxs
(1, 2)
dfs
( a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4, a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3)
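These pair up naturally if you later want the dict form:
df_dict = dict(zip(idxs, dfs))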
This is a way using np.split. Rows sharing the same value of a must be contiguous for this to work; also, Series.nonzero is deprecated in newer pandas, so np.flatnonzero does the same job:
import numpy as np
idx = np.flatnonzero(df.a.diff().fillna(0))
dfs = np.split(df, idx, axis=0)
dfs
Out[210]:
[ a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4, a b c d
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3]
dfs[0]
Out[211]:
a b c d
0 1 1 1 1
1 1 2 2 2
2 1 3 3 3
3 1 4 4 4
I have a pandas dataframe:
id tag
1 A
1 A
1 B
1 C
1 A
2 B
2 C
2 B
I want to add a column which computes the cumulative number of unique tags at the id level. More specifically, I would like to have:
id tag count
1 A 1
1 A 1
1 B 2
1 C 3
1 A 3
2 B 1
2 C 2
2 B 2
For a given id, count will be non-decreasing. Thanks for your help!
I think this does what you want:
unique_count = df.drop_duplicates().groupby('id').cumcount() + 1
unique_count.reindex(df.index).ffill()
The +1 is because the count starts at zero. This only works if the dataframe is sorted by id. Was that intended? You can always sort beforehand.
You can find some other approaches in R and Python here
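Putting the pieces together (a sketch; reindex introduces NaN so ffill leaves floats behind, hence the cast back to int):
unique_count = df.drop_duplicates().groupby('id').cumcount() + 1
df['count'] = unique_count.reindex(df.index).ffill().astype(int)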
df = pd.DataFrame({'id': [1,1,1,1,1,2,2,2],
                   'tag': ['A','A','B','C','A','B','C','B']})
df['count'] = df.groupby('id')['tag'].apply(lambda x: (~x.duplicated()).cumsum())
id tag count
0 1 A 1
1 1 A 1
2 1 B 2
3 1 C 3
4 1 A 3
5 2 B 1
6 2 C 2
7 2 B 2
How about this:
d['X'] = 1
d.groupby('Col').X.cumsum()
idt = [1,1,1,1,1,2,2,2]
tag = ['A','A','B','C','A','B','C','B']
df = pd.DataFrame(tag, index=idt, columns=['tag'])
df = df.reset_index()
print(df)
index tag
0 1 A
1 1 A
2 1 B
3 1 C
4 1 A
5 2 B
6 2 C
7 2 B
df['uCnt'] = df.groupby(['index','tag']).cumcount() + 1
print(df)
index tag uCnt
0 1 A 1
1 1 A 2
2 1 B 1
3 1 C 1
4 1 A 3
5 2 B 1
6 2 C 1
7 2 B 2
df['uCnt'] = df['uCnt'] // df['uCnt']**2  # 1//1 = 1 keeps first occurrences; n//n**2 = 0 zeroes out repeats
print(df)
index tag uCnt
0 1 A 1
1 1 A 0
2 1 B 1
3 1 C 1
4 1 A 0
5 2 B 1
6 2 C 1
7 2 B 0
df['uCnt'] = df.groupby(['index'])['uCnt'].cumsum()
print(df)
index tag uCnt
0 1 A 1
1 1 A 1
2 1 B 2
3 1 C 3
4 1 A 3
5 2 B 1
6 2 C 2
7 2 B 2
df = df.set_index('index')
print(df)
tag uCnt
index
1 A 1
1 A 1
1 B 2
1 C 3
1 A 3
2 B 1
2 C 2
2 B 2
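The integer-division step above is really just building a first-occurrence flag, so the intermediate columns can be collapsed with duplicated (a sketch, using the reset-index form of df from before the final set_index):
# True for the first occurrence of each (index, tag) pair, False for repeats;
# a running sum per index then gives the cumulative unique count
df['uCnt'] = (~df.duplicated(['index', 'tag'])).groupby(df['index']).cumsum()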