All possible permutations within groups of column pandas - python

I have a df
a b c d
1 0 1 2 4
2 0 1 3 5
3 0 2 1 7
4 1 3 2 5
Within groups, grouped by 'a' and 'b' I want all possible permutations of 'c'
a b c d
1 0 1 2 4
0 1 3 5
0 2 1 7
2 0 1 3 5
0 1 2 4
0 2 1 7
3 1 3 2 5
...
...
I tried:
s=pd.Series({x: list(it.permutations(y) )for x , y in df.groupby(['a','b']).c})
0 1 [(3,2),(2,3)]
2 [(1,)]
1 3 [(2,)]
Explode() only does not do what I need, since I need all combinations of groups within subgroups.
For example in this case there are 2 different ways to combine rows 1 and 2. If row 2 would have been 2 different permutations, it would be 2*2=4 ways.
Does anybody have an idea?

Fix your code with groupby and explode
s=pd.Series({x: list(itertools.permutations(y) )for x , y in df.groupby('a').b}).explode().explode().reset_index()
index 0
0 0 1
1 0 2
2 0 3
3 0 1
4 0 3
5 0 2
6 0 2
7 0 1
8 0 3
9 0 2
10 0 3
11 0 1
12 0 3
13 0 1
14 0 2
15 0 3
16 0 2
17 0 1
18 1 1
19 1 2
20 1 2
21 1 1

Related

How to count consecutive same values in a pythonic way that looks iterative

So I am trying to count the number of consecutive same values in a dataframe and put that information into a new column in the dataframe, but I want the count to look iterative.
Here is what I have so far:
df = pd.DataFrame(np.random.randint(0,3, size=(15,4)), columns=list('ABCD'))
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is, in the numConsec column, I don't want the full count for every row. I want it to reflect how it looks as you iteratively look at the dataframe. The problem is, my dataframe is too large to iteratively loop through and make the counts, as that would be too slow. I need to do it in a pythonic way and make it look like this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?

Padding and reshaping pandas dataframe

I have a dataframe with the following form:
data = pd.DataFrame({'ID':[1,1,1,2,2,2,2,3,3],'Time':[0,1,2,0,1,2,3,0,1],
'sig':[2,3,1,4,2,0,2,3,5],'sig2':[9,2,8,0,4,5,1,1,0],
'group':['A','A','A','B','B','B','B','A','A']})
print(data)
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 2 0 4 0 B
4 2 1 2 4 B
5 2 2 0 5 B
6 2 3 2 1 B
7 3 0 3 1 A
8 3 1 5 0 A
I want to reshape and pad such that each 'ID' has the same number of Time values, the sig1,sig2 are padded with zeros (or mean value within ID) and the group carries the same letter value. The output after repadding would be :
data_pad = pd.DataFrame({'ID':[1,1,1,1,2,2,2,2,3,3,3,3],'Time':[0,1,2,3,0,1,2,3,0,1,2,3],
'sig1':[2,3,1,0,4,2,0,2,3,5,0,0],'sig2':[9,2,8,0,0,4,5,1,1,0,0,0],
'group':['A','A','A','A','B','B','B','B','A','A','A','A']})
print(data_pad)
ID Time sig1 sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
My end goal is to ultimately reshape this into something with shape (number of ID, number of time points, number of sequences {2 here}).
It seems that if I pivot data, it fills in with nan values, which is fine for the signal values, but not the groups. I am also hoping to avoid looping through data.groupby('ID'), since my actual data has a large number of groups and the looping would likely be very slow.
Here's one approach creating the new index with pd.MultiIndex.from_product and using it to reindex on the Time column:
df = data.set_index(['ID', 'Time'])
# define a the new index
ix = pd.MultiIndex.from_product([df.index.levels[0],
df.index.levels[1]],
names=['ID', 'Time'])
# reindex using the above multiindex
df = df.reindex(ix, fill_value=0)
# forward fill the missing values in group
df['group'] = df.group.mask(df.group.eq(0)).ffill()
print(df.reset_index())
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
IIUC:
(data.pivot_table(columns='Time', index=['ID','group'], fill_value=0)
.stack('Time')
.sort_index(level=['ID','Time'])
.reset_index()
)
Output:
ID group Time sig sig2
0 1 A 0 2 9
1 1 A 1 3 2
2 1 A 2 1 8
3 1 A 3 0 0
4 2 B 0 4 0
5 2 B 1 2 4
6 2 B 2 0 5
7 2 B 3 2 1
8 3 A 0 3 1
9 3 A 1 5 0
10 3 A 2 0 0
11 3 A 3 0 0

create a 'group number' column for a pandas data frame column of '0' and '1' s

How to get the data frame below
dd = pd.DataFrame({'val':[0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0],
'groups':[1,1,1,1,1,2,2,2,2,2,2,3,3,3,3,3,'ignore','ignore']})
val groups
0 0 1
1 0 1
2 1 1
3 1 1
4 1 1
5 0 2
6 0 2
7 0 2
8 0 2
9 1 2
10 1 2
11 0 3
12 1 3
13 1 3
14 1 3
15 1 3
16 0 ignore
17 0 ignore
I have a series df.val with has values [0,0,1,1,1,0,0,0,0,1,1,0,1,1,1,1,0,0].
How to create df.groups from df.val.
first 0,0,1,1,1 will form group 1,(i.e. from the beginning upto next occurrence of 0 after 1's)
0,0,0,0,1,1 will form group 2, (incremental group number, starting where previous group ended uptill next occurrence of 0 after 1's),...etc
Can anyone please help.
First test if next value after 0 is 1 and create groups by sumulative sums by Series.cumsum:
s = (dd['val'].eq(0) & dd['val'].shift().eq(1)).cumsum().add(1)
Then convert last group to ignore if last value of data are 0 with numpy.where:
mask = s.eq(s.max()) & (dd['val'].iat[-1] == 0)
dd['new'] = np.where(mask, 'ignore', s)
print (dd)
val groups new
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore
IIUC first we do diff and cumsum , then we need to find the condition to ignore the previous value we get (np.where)
s=df.val.diff().eq(-1).cumsum()+1
df['New']=np.where(df['val'].eq(1).groupby(s).transform('any'),s,'ignore')
df
val groups New
0 0 1 1
1 0 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 0 2 2
6 0 2 2
7 0 2 2
8 0 2 2
9 1 2 2
10 1 2 2
11 0 3 3
12 1 3 3
13 1 3 3
14 1 3 3
15 1 3 3
16 0 ignore ignore
17 0 ignore ignore

Python append dataframe such that only columns remain the same

I have the following dataframes in python pandas:
A:
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
B:
1 2 3 4 5 6 7 8 9 10
1 0 1 1 1 1 1 1 0 0 1 0
C:
1 2 3 4 5 6 7 8 9 10
2 0 1 1 1 0 0 0 0 0 1 0
I want to concatenate them together such that the column titles remain the same while row index and values get appended so the new dataframe is:
df:
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
1 0 1 1 1 1 1 1 0 0 1 0
2 0 1 1 1 0 0 0 0 0 1 0
I have tried using append and concat but none seem to be fulfilling the output I am trying to achieve. Any suggestions?
Here is what I tried:
df = pd.concat([df,pd.concat([A,B,C], ignore_index=True)], axis=1)
This is a plain vanilla concat
pd.concat([A, B, C])
1 2 3 4 5 6 7 8 9 10
0 1 1 1 1 1 1 1 0 0 1 1
1 0 1 1 1 1 1 1 0 0 1 0
2 0 1 1 1 0 0 0 0 0 1 0
Simple pd.concat will just do the work, you over complicated the task a little bit:
pd.concat([A,B,C], axis=0, ignore_index=True)

Transform dataframe to have a row for every observation at every time point

I have the following short dataframe:
A B C
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
I want the output to look like this:
A B C
1 1 3
2 1 3
3 0 0
4 0 0
5 0 0
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
use pd.MultiIndex.from_product with unique As and Bs. Then reindex.
cols = list('AB')
mux = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=cols)
df.set_index(cols).reindex(mux, fill_value=0).reset_index()
A B C
0 1 1 3
1 1 2 0
2 1 0 0
3 2 1 3
4 2 2 0
5 2 0 0
6 3 1 0
7 3 2 3
8 3 0 0
9 4 1 0
10 4 2 3
11 4 0 0
12 5 1 0
13 5 2 0
14 5 0 0

Categories

Resources