How do I swap a dataframe's index and columns, i.e. set the index as the columns and vice versa? I tried unstacking it, but in vain.
I want to turn this dataframe
Type1 Type2 Type3
Hour
0 5 0 13
1 3 5 5
2 3 2 11
3 9 3 8
4 1 3 2
5 0 0 2
6 1 5 0
7 0 1 0
8 2 0 0
9 1 0 1
10 0 0 2
11 6 2 2
12 5 3 1
13 3 4 2
14 4 2 4
15 10 3 6
16 7 1 6
17 18 1 5
18 6 2 6
19 2 4 27
20 10 8 16
21 19 12 36
22 5 9 11
23 8 8 23
into the following:
0 1 2 3 4 5 6 7 8 9 10 ...
Type1 5 3 3 9 1 ....
Type2 0 5 2 3 3 ....
Type3 13 5 11 8 2 ....
EDIT:
I actually have a multi index in the original df which looks like [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (7, 1), (8, 1), (9, 1), (10, 1), (11, 1), (12, 1), (13, 1), (14, 1), (15, 1), (16, 1), (17, 1), (18, 1), (19, 1), (20, 1), (21, 1), (22, 1), (23, 1)]. How do I handle that?
Transpose the dataframe:
df.T
Does this do the trick?
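A minimal sketch (using a few rows of the example data) confirming that .T swaps the two axes:

```python
import pandas as pd

# First three rows of the example frame
df = pd.DataFrame(
    {'Type1': [5, 3, 3], 'Type2': [0, 5, 2], 'Type3': [13, 5, 11]},
    index=pd.Index([0, 1, 2], name='Hour'),
)

# Transpose: rows become columns and vice versa
out = df.T
print(out)
```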
Call unstack twice:
In [47]:
df.unstack().unstack()
Out[47]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 \
Type1 5 3 3 9 1 0 1 0 2 1 0 6 5 3 4 10 7 18
Type2 0 5 2 3 3 0 5 1 0 0 0 2 3 4 2 3 1 1
Type3 13 5 11 8 2 2 0 0 0 1 2 2 1 2 4 6 6 5
18 19
Type1 6 2 ...
Type2 2 4 ...
Type3 6 27 ...
[3 rows x 24 columns]
Also .T would work:
In [48]:
df.T
Out[48]:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 \
Type1 5 3 3 9 1 0 1 0 2 1 0 6 5 3 4 10 7 18
Type2 0 5 2 3 3 0 5 1 0 0 0 2 3 4 2 3 1 1
Type3 13 5 11 8 2 2 0 0 0 1 2 2 1 2 4 6 6 5
18 19
Type1 6 2 ...
Type2 2 4 ...
Type3 6 27 ...
[3 rows x 24 columns]
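For the MultiIndex mentioned in the edit, where the second level is constant, one option is to drop that level before transposing; a small sketch with toy data:

```python
import pandas as pd

# Toy frame with a two-level index like [(0, 1), (1, 1), ...] from the edit
idx = pd.MultiIndex.from_tuples([(h, 1) for h in range(4)],
                                names=['Hour', 'extra'])
df = pd.DataFrame(
    {'Type1': [5, 3, 3, 9], 'Type2': [0, 5, 2, 3], 'Type3': [13, 5, 11, 8]},
    index=idx,
)

# Drop the constant second level, then transpose as before
out = df.droplevel('extra').T
print(out)
```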
assume i have df:
pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
data
0 0
1 0
2 0
3 1
4 1
5 1
6 2
7 2
8 2
9 3
10 3
11 4
12 4
13 5
14 5
15 0
16 0
17 0
18 0
19 2
20 2
21 2
22 2
23 4
24 4
25 4
26 4
I'm looking for a way to create a new column in df that counts, for each row, how many times the value has repeated consecutively so far. For example:
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
My approach was to pull the rows into a Python list, compare the items, and build a new list from that.
Is there a simpler way to do this?
Example
df = pd.DataFrame({'data': [0,0,0,1,1,1,2,2,2,3,3,4,4,5,5,0,0,0,0,2,2,2,2,4,4,4,4]})
Code
# a new run starts wherever the value differs from the previous row
grouper = df['data'].ne(df['data'].shift(1)).cumsum()
# number the rows within each run, starting from 1
df['new'] = df.groupby(grouper).cumcount().add(1)
df
data new
0 0 1
1 0 2
2 0 3
3 1 1
4 1 2
5 1 3
6 2 1
7 2 2
8 2 3
9 3 1
10 3 2
11 4 1
12 4 2
13 5 1
14 5 2
15 0 1
16 0 2
17 0 3
18 0 4
19 2 1
20 2 2
21 2 3
22 2 4
23 4 1
24 4 2
25 4 3
26 4 4
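To see what the grouper actually does, here is a sketch on a shorter series that prints the intermediate steps:

```python
import pandas as pd

df = pd.DataFrame({'data': [0, 0, 0, 1, 1, 1, 2, 2]})

# True wherever the value differs from the previous row, i.e. a new run starts
starts = df['data'].ne(df['data'].shift(1))

# cumsum turns those start flags into a distinct id per consecutive run
grouper = starts.cumsum()
print(grouper.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3]

# cumcount numbers rows 0, 1, 2, ... within each run; add 1 to start at 1
df['new'] = df.groupby(grouper).cumcount().add(1)
print(df['new'].tolist())  # [1, 2, 3, 1, 2, 3, 1, 2]
```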
I have the following data frame
df = pd.DataFrame([
{"A": 1, "B": "20", "pairs": [(1,2), (2,3)]},
{"A": 2, "B": "22", "pairs": [(1,1), (2,2), (1,3)]},
{"A": 3, "B": "24", "pairs": [(1,1), (3,3)]},
{"A": 4, "B": "26", "pairs": [(1,3)]},
])
>>> df
A B pairs
0 1 20 [(1, 2), (2, 3)]
1 2 22 [(1, 1), (2, 2), (1, 3)]
2 3 24 [(1, 1), (3, 3)]
3 4 26 [(1, 3)]
Instead of these being a list of tuples, I'd like to make new columns for these pairs, p1 and p2, ordered as the first and second members of each tuple respectively. There is also a wide-to-long element here, in that a single row explodes into as many rows as there are pairs in its list.
This does not appear to fit much of the wide-to-long documentation I can find. My desired output format is this:
>>> df
A B p1 p2
0 1 20 1 2
1 1 20 2 3
2 2 22 1 1
3 2 22 2 2
4 2 22 1 3
5 3 24 1 1
6 3 24 3 3
7 4 26 1 3
First explode, then join:
s = df.explode('pairs').reset_index(drop=True)
out = s.join(pd.DataFrame(s.pop('pairs').tolist(),columns=['p1','p2']))
out
Out[98]:
A B p1 p2
0 1 20 1 2
1 1 20 2 3
2 2 22 1 1
3 2 22 2 2
4 2 22 1 3
5 3 24 1 1
6 3 24 3 3
7 4 26 1 3
Use explode:
>>> df.join(df.pop('pairs').explode().apply(pd.Series)
.rename(columns={0: 'p1', 1: 'p2'}))
A B p1 p2
0 1 20 1 2
0 1 20 2 3
1 2 22 1 1
1 2 22 2 2
1 2 22 1 3
2 3 24 1 1
2 3 24 3 3
3 4 26 1 3
Is this what you have in mind:
(df.explode('pairs') # blow it up into individual rows
.assign(p1 = lambda df: df.pairs.str[0],
p2 = lambda df: df.pairs.str[-1])
.drop(columns='pairs')
)
Out[1234]:
A B p1 p2
0 1 20 1 2
0 1 20 2 3
1 2 22 1 1
1 2 22 2 2
1 2 22 1 3
2 3 24 1 1
2 3 24 3 3
3 4 26 1 3
Another option, using the apply method and a longer chain of calls (performance-wise I have no idea which is better):
(df
.set_index(['A', 'B'])
.pairs
.apply(pd.Series)
.stack()
.apply(pd.Series)
.droplevel(-1)
.set_axis(['p1', 'p2'],axis=1)
.reset_index()
)
Out[1244]:
A B p1 p2
0 1 20 1 2
1 1 20 2 3
2 2 22 1 1
3 2 22 2 2
4 2 22 1 3
5 3 24 1 1
6 3 24 3 3
7 4 26 1 3
Since pairs is a list of tuples, you may get some performance gains if you move the wrangling/transformation into pure Python before recombining back into a DataFrame:
from itertools import chain
repeats = [*map(len, df.pairs)]
reshaped = chain.from_iterable(df.pairs)
reshaped = pd.DataFrame(reshaped,
columns = ['p1', 'p2'],
index = df.index.repeat(repeats))
df.drop(columns='pairs').join(reshaped)
Out[1265]:
A B p1 p2
0 1 20 1 2
0 1 20 2 3
1 2 22 1 1
1 2 22 2 2
1 2 22 1 3
2 3 24 1 1
2 3 24 3 3
3 4 26 1 3
I have a DataFrame that looks somehow like the following one:
time status A
0 0 2 20
1 1 2 21
2 2 2 20
3 3 2 19
4 4 10 18
5 5 2 17
6 6 2 18
7 7 2 19
8 8 2 18
9 9 10 17
... ... ... ...
Now, I'd like to select all rows with status == 2 and group the resulting rows into runs that are not interrupted by any other status, so that I can access each group separately afterwards.
Something like:
print df1
time status A
0 0 2 20
1 1 2 21
2 2 2 20
3 3 2 19
print df2
time status A
0 5 2 17
1 6 2 18
2 7 2 19
3 8 2 18
Is there an efficient, loop-avoiding way to achieve this?
Thank you in advance!
Input data:
>>> df
time status A
0 0 2 20 # group 1
1 1 2 21 # 1
2 2 2 20 # 1
3 3 2 19 # 1
4 4 10 18 # group 2
5 5 2 17 # group 3
6 6 2 18 # 3
7 7 2 19 # 3
8 8 2 18 # 3
9 9 10 17 # group 4
df["group"] = df.status.ne(df.status.shift()).cumsum()
>>> df
time status A group
0 0 2 20 1
1 1 2 21 1
2 2 2 20 1
3 3 2 19 1
4 4 10 18 2
5 5 2 17 3
6 6 2 18 3
7 7 2 19 3
8 8 2 18 3
9 9 10 17 4
Now you can do what you want. For example:
(_, df1), (_, df2) = list(df.loc[df["status"] == 2].groupby("group"))
>>> df1
time status A group
0 0 2 20 1
1 1 2 21 1
2 2 2 20 1
3 3 2 19 1
>>> df2
time status A group
5 5 2 17 3
6 6 2 18 3
7 7 2 19 3
8 8 2 18 3
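If the number of runs isn't known ahead of time, unpacking into df1, df2 won't scale; a sketch of the same technique that collects all the status == 2 runs into a list instead:

```python
import pandas as pd

df = pd.DataFrame({
    'time': range(10),
    'status': [2, 2, 2, 2, 10, 2, 2, 2, 2, 10],
    'A': [20, 21, 20, 19, 18, 17, 18, 19, 18, 17],
})

# Label each uninterrupted run of equal statuses
df['group'] = df['status'].ne(df['status'].shift()).cumsum()

# One sub-frame per run of status == 2, in order of appearance
groups = [g for _, g in df[df['status'] == 2].groupby('group')]
print(len(groups))  # 2
```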
I want to get all the rows in a dataset that fall between two rows where a certain value occurs. Is it possible to do that? I cannot sort the dataset, because then all the crucial information would be lost.
Edit:
The dataset contains data as such:
Index| game_clock| quarter | event_type
0 | 711 | 1 | 1
1 | 710 | 1 | 3
2 | 709 | 2 | 4
3 | 708 | 3 | 2
4 | 707 | 4 | 4
5 | 706 | 4 | 1
I want to slice the dataset so that I get subsets of all the rows that are between event_type (1 or 2) and (1 or 2).
Edit 2:
Suppose the dataset is as follows:
A B
0 1 0.278179
1 2 0.069914
2 2 0.633110
3 4 0.584766
4 3 0.581232
5 3 0.677205
6 3 0.687155
7 1 0.438927
8 4 0.320927
9 3 0.570552
10 3 0.479849
11 1 0.861074
12 3 0.834805
13 4 0.105766
14 1 0.060408
15 4 0.596882
16 1 0.792395
17 3 0.226356
18 4 0.535201
19 1 0.136066
20 1 0.372244
21 1 0.151977
22 4 0.429822
23 1 0.792706
24 2 0.406957
25 1 0.177850
26 1 0.909252
27 1 0.545331
28 4 0.100497
29 2 0.718721
The subsets I would like to get are indexed as:
[0], [1], [2], [3:8], [8:12],
[12:15], [15:20], [20], [21], [22:24], [24], [25], [26], [27], [28: ]
I believe you need:
a = pd.factorize(df['A'].isin([1,2]).iloc[::-1].cumsum().sort_index())[0]
print (a)
[ 0 1 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7 7 7 8 9 10 10 11
12 13 14 15 15]
dfs = dict(tuple(df.groupby(a)))
print (dfs[0])
A B
0 1 0.278179
print (dfs[1])
A B
1 2 0.069914
print (dfs[2])
A B
2 2 0.63311
print (dfs[3])
A B
3 4 0.584766
4 3 0.581232
5 3 0.677205
6 3 0.687155
7 1 0.438927
print (dfs[4])
A B
8 4 0.320927
9 3 0.570552
10 3 0.479849
11 1 0.861074
Explanation:
#check values to boolean mask
a = df['A'].isin([1,2])
#reverse Series
b = df['A'].isin([1,2]).iloc[::-1]
#cumulative sum
c = df['A'].isin([1,2]).iloc[::-1].cumsum()
#get original order
d = df['A'].isin([1,2]).iloc[::-1].cumsum().sort_index()
#factorize for keys in dictionary of DataFrames
e = pd.factorize(df['A'].isin([1,2]).iloc[::-1].cumsum().sort_index())[0]
df = pd.concat([a,pd.Series(b.values),pd.Series(c.values),d,pd.Series(e)],
axis=1, keys=list('abcde'))
print (df)
a b c d e
0 True True 1 16 0
1 True False 1 15 1
2 True True 2 14 2
3 False True 3 13 3
4 False True 4 13 3
5 False True 5 13 3
6 False True 6 13 3
7 True False 6 13 3
8 False True 7 12 4
9 False True 8 12 4
10 False True 9 12 4
11 True False 9 12 4
12 False False 9 11 5
13 False True 10 11 5
14 True False 10 11 5
15 False True 11 10 6
16 True False 11 10 6
17 False False 11 9 7
18 False True 12 9 7
19 True False 12 9 7
20 True False 12 8 8
21 True False 12 7 9
22 False True 13 6 10
23 True False 13 6 10
24 True False 13 5 11
25 True False 13 4 12
26 True False 13 3 13
27 True True 14 2 14
28 False True 15 1 15
29 True True 16 1 15
That list still doesn't quite make sense: sometimes you include the first occurrence, sometimes not. Try this:
import pandas as pd
import numpy as np
np.random.seed(314)
df = pd.DataFrame({'A': np.random.choice([1,2,3,4], 30), 'B':np.random.rand(30)})
ar = np.where(df.A.isin((1,2)))[0]
ids = list(zip(ar,ar[1:]))
for item in ids:
print(df.iloc[item[0]:item[1],:])
ids are now:
[(0, 1), (1, 2), (2, 7), (7, 11), (11, 14), (14, 16), (16, 19), (19, 20),
(20, 21), (21, 23), (23, 24), (24, 25), (25, 26), (26, 27), (27, 29)]
Each slice starts at a row where A is 1 or 2 and stops just before the next such row.
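A sketch of the same idea that collects the slices into a list and also keeps the trailing slice after the last boundary:

```python
import numpy as np
import pandas as pd

np.random.seed(314)
df = pd.DataFrame({'A': np.random.choice([1, 2, 3, 4], 30),
                   'B': np.random.rand(30)})

# Positions where A is 1 or 2 mark the slice boundaries
ar = np.where(df.A.isin((1, 2)))[0]

# Pair each boundary with the next; the final slice runs to the end of df
bounds = list(zip(ar, list(ar[1:]) + [len(df)]))
parts = [df.iloc[i:j] for i, j in bounds]
print(len(parts))
```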
Suppose the following pandas dataframe
Wafer_Id v1 v2
0 0 9 6
1 0 7 8
2 0 1 5
3 1 6 6
4 1 0 8
5 1 5 0
6 2 8 8
7 2 2 6
8 2 3 5
9 3 5 1
10 3 5 6
11 3 9 8
I want to group it according to Wafer_Id and I would like to get something like
w
Out[60]:
Wafer_Id v1_1 v1_2 v1_3 v2_1 v2_2 v2_3
0 0 9 7 1 6 ... ...
1 1 6 0 5 6
2 2 8 2 3 8
3 3 5 5 9 1
I think I can obtain the result with the pivot function, but I am not sure how to do it.
Possible solution
oes = pd.DataFrame()
oes['Wafer_Id'] = [0,0,0,1,1,1,2,2,2,3,3,3]
oes['v1'] = np.random.randint(0, 10, 12)
oes['v2'] = np.random.randint(0, 10, 12)
oes['id'] = [0, 1, 2] * 4
oes.pivot(index='Wafer_Id', columns='id')
oes
Out[74]:
Wafer_Id v1 v2 id
0 0 8 7 0
1 0 3 3 1
2 0 8 0 2
3 1 2 5 0
4 1 4 1 1
5 1 8 8 2
6 2 8 6 0
7 2 4 7 1
8 2 4 3 2
9 3 4 6 0
10 3 9 2 1
11 3 7 1 2
oes.pivot(index='Wafer_Id', columns='id')
Out[75]:
v1 v2
id 0 1 2 0 1 2
Wafer_Id
0 8 3 8 7 3 0
1 2 4 8 5 1 8
2 8 4 4 6 7 3
3 4 9 7 6 2 1
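To get the flat v1_1, v1_2, ... names from the desired output, the pivoted MultiIndex columns can be collapsed afterwards; a sketch with toy data:

```python
import numpy as np
import pandas as pd

oes = pd.DataFrame({
    'Wafer_Id': [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    'v1': np.arange(12),
    'v2': np.arange(12, 24),
})
oes['id'] = [0, 1, 2] * 4

w = oes.pivot(index='Wafer_Id', columns='id')

# Collapse ('v1', 0) -> 'v1_1', ('v1', 1) -> 'v1_2', ...
w.columns = [f'{var}_{i + 1}' for var, i in w.columns]
w = w.reset_index()
print(w.columns.tolist())
```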