Pandas Sample Based on Two Criteria - python

I have a dataframe that looks like this:
feature target
0 2 0
1 0 0
2 0 0
3 0 0
4 1 0
... ... ...
33208 1 0
33209 0 0
33210 2 0
33211 2 0
33212 1 0
In the feature column there are 3 classes (0, 1, 2) and in the target column there are two classes (0, 1). If I group the dataframe by these two columns, I get:
df.groupby(['feature', 'target']).size()
feature target
0 0 4282
1 81
1 0 8537
1 37
2 0 20161
1 115
dtype: int64
Each feature class has 0s and 1s as target values. I need a way to sample these rows so that I end up with something like this:
new_df.groupby(['feature', 'target']).size()
feature target
0 0 4282
1 81
1 0 4282
1 37
2 0 4282
1 115
dtype: int64
I need to sample a fixed number of target values for each feature class. Any suggestions?

You have a different distribution of target for each value of feature.
You need to sample n values from one of those distributions, given the value of feature. Since there are only 2 possible outcomes, each draw is a Bernoulli trial, i.e. a binomial distribution with a single trial.
As far as I can see, the approach shown below also handles the case where target is not necessarily (0, 1); it could be anything (win vs. lose, team A vs. team B, and so forth):
import numpy as np
import pandas as pd
# this just reproduces your grouped end state
df = pd.DataFrame({"feature":[0, 0, 1, 1, 2, 2], "target":[0, 1, 0, 1, 0, 1], "number":[4282, 81, 4282, 37, 4282, 115]})
df = df.set_index(["feature", "target"])
def sample_values(feature, sample_size):
    # select one of the distributions by feature
    df_sub = df.loc[feature]
    (event1, number1), (event2, number2) = zip(df_sub.index, df_sub["number"].tolist())
    return [event2 if np.random.binomial(1, number2/(number1+number2))==1 else event1 for _ in range(sample_size)]
print(sample_values(2, 100))
OUTPUT
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
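If the goal is the table shown in the question (every target-0 group cut down to the size of the smallest one, target-1 rows left untouched), plain downsampling may be enough. The following is a minimal sketch of that idea, not the method from the answer above; the toy dataframe, the seed and the name cap are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same two columns as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.integers(0, 3, size=3000),
    "target": rng.choice([0, 1], size=3000, p=[0.97, 0.03]),
})

counts = df.groupby(["feature", "target"]).size()

# Assumption: the cap is the smallest target-0 group across all feature classes.
cap = int(counts.xs(0, level="target").min())

# Downsample every (feature, target) group that is larger than the cap;
# smaller groups (here: the target-1 rows) are kept whole.
parts = [
    group.sample(n=min(len(group), cap), random_state=0)
    for _, group in df.groupby(["feature", "target"])
]
new_df = pd.concat(parts)
new_counts = new_df.groupby(["feature", "target"]).size()
print(new_counts)
```

Sampling the original rows keeps every other column of the dataframe intact, which drawing synthetic 0/1 values would not.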

Related

Subset/Filter column in Dataframe [duplicate]

This question already has answers here:
Accessing every 1st element of Pandas DataFrame column containing lists
(5 answers)
Closed 1 year ago.
I have dataframe like this:
text emotion
0 working add oil [1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0]
1 you're welcome [0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
7 off to face my exam now [0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, ...
12 no, i'm so not la! i want to sleeeeeeeeeeep. [0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, ...
151 i try to register on ebay. when i enter my hom... [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, ...
18 Swam 6050 yards on just a yogurt for breakfast... [0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, ...
19 Alright! [0, 0, 1, 1, 0, 0, 0, 0]
120 Visiting gma. It's getting cold [0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, ...
22 You are very missed [0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, ...
345 ...LOL! You mean Rhode Island...close enough [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, ...
How can I keep only the first number of each list in the emotion column, to get data like this?
text emotion
0 working add oil 1
1 you're welcome 0
7 off to face my exam now 0
12 no, i'm so not la! i want to sleeeeeeeeeeep. 0
151 i try to register on ebay. when i enter my hom... 1
18 Swam 6050 yards on just a yogurt for breakfast... 0
19 Alright! **0**
120 Visiting gma. It's getting cold 0
22 You are very missed **0**
345 ...LOL! You mean Rhode Island...close enough 0
If the "emotion" column holds lists and not strings:
df["emotion"] = df["emotion"].apply(lambda x: x[0])
print(df)
Prints:
text emotion
0 working add oil 1
1 you're welcome 0
2 off to face my exam now 0
3 no, i'm so not la! i want to sleeeeeeeeeeep. 0
4 i try to register on ebay. when i enter my hom... 1
5 Swam 6050 yards on just a yogurt for breakfast... 0
6 Alright! 0
7 Visiting gma. It's getting cold 0
8 You are very missed 0
9 ...LOL! You mean Rhode Island...close enough 0
If the values are strings, you can convert them to lists first using ast.literal_eval:
from ast import literal_eval
df["emotion"] = df["emotion"].apply(literal_eval)
# and then:
df["emotion"] = df["emotion"].apply(lambda x: x[0])
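When it is unclear whether the column holds real lists or their string representation, both cases can be handled in one pass. This is only a sketch: the two-row dataframe and the helper name first_item are invented for the example.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical data: one value stored as a string, one as a real list.
df = pd.DataFrame({
    "text": ["working add oil", "you're welcome"],
    "emotion": ["[1, 0, 1, 0]", [0, 0, 1, 0]],
})

def first_item(value):
    """Return the first element, parsing the value first if it is a string."""
    if isinstance(value, str):
        value = literal_eval(value)
    return value[0]

df["emotion"] = df["emotion"].apply(first_item)
print(df)
```

If every value is already a list, df["emotion"].str[0] should give the same result without a lambda.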

Convert a list of strings containing list-of-integers into an array

I have a series that is a list of strings, each containing a list of integers, which I am attempting to turn into an array. This is a small snippet of the list I am trying to convert:
['[0, 0, 0, 0, 0, 0, 0, 0, 0, 1]',
'[0, 0, 0, 0, 0, 0, 0, 0, 1, 0]',
'[0, 0, 0, 0, 0, 0, 0, 1, 0, 0]',
'[0, 0, 0, 0, 0, 0, 0, 1, 0, 1]',
'[0, 0, 0, 0, 0, 0, 0, 1, 1, 1]']
I've tried to replace the quotes with .replace, but that hasn't worked out.
sequence = [i.replace(" '' ", ' ') for i in sequence]
You can use ast.literal_eval to turn each string into a list of ints, which gives a list of lists:
from ast import literal_eval
sequence = [literal_eval(i) for i in sequence]
# [[0, 0, 0, 0, 0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 1], [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]]
You can then convert it to a NumPy array:
import numpy as np
array = np.asarray(sequence)
print(array)
output
[[0 0 0 0 0 0 0 0 0 1]
 [0 0 0 0 0 0 0 0 1 0]
 [0 0 0 0 0 0 0 1 0 0]
 [0 0 0 0 0 0 0 1 0 1]
 [0 0 0 0 0 0 0 1 1 1]]
Or to a 1-D pandas array:
import pandas as pd
array = pd.array([item for items in sequence for item in items])
print(array)
output
<IntegerArray>
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0,
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
Length: 50, dtype: Int64

How can I add a small matrix into a bigger one along the diagonal in a specific way?

I am trying to build a big matrix in which a smaller row matrix (of variable size) is repeated along the "diagonal". All the other values are 0. How do I create such a matrix?
I've tried np.put and np.append. Here's what I have so far:
t = [1,2,3]
n=3
m=4
A = np.zeros((2*m,m*n+m),dtype=int)
for i in range (m):
    A[i-1:i-1+t.shape[0], n*(i-1):n*(i-1)+t.shape[1]] += t
print("A= \n",np.matrix(A))
I want the following matrix (I'm sorry, I don't know how to format a matrix here; if someone can help me with that too, I would appreciate it a lot):
A=
[[1 2 3 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 2 3 0 0 0 0 0 0 0 0 0 0 ]
[0 0 0 0 0 0 1 2 3 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1 2 3 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
It causes the following error:
ValueError: operands could not be broadcast together with shapes (0,0) (1,3) (0,0)
You can use careful reshaping like so:
import numpy as np
t = [1,2,3]
n=3
m=4
A = np.zeros((2*m,m*n+m),dtype=int)
A.ravel()[:m*(m*n+m+n)].reshape(m,-1)[:,:len(t)] = t
A
# array([[1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
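Why the reshaping works: in the flattened array, moving down one row and n columns to the right is a jump of (row length + n) elements. Reshaping the first m such jumps into m rows therefore lines up all the target slots in the first n columns of a view. The sketch below spells this out; the function name staircase and the variable stride are made up for the example, and it assumes the shifted copies fit inside the matrix.

```python
import numpy as np

def staircase(t, m, n_rows, n_cols):
    """Write t on rows 0..m-1, shifting right by len(t) on each row."""
    n = len(t)
    A = np.zeros((n_rows, n_cols), dtype=int)
    # One step = one full row plus n columns in the flattened array.
    stride = n_cols + n
    # ravel() of a freshly created (contiguous) array is a view,
    # so writing to the reshaped slice writes into A itself.
    view = A.ravel()[: m * stride].reshape(m, stride)
    view[:, :n] = t
    return A

A = staircase([1, 2, 3], m=4, n_rows=8, n_cols=16)
print(A)
```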
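Alternatively, build a boolean mask for the 12 target positions and use it for the assignment:
idx = np.zeros(A.shape).astype(bool)
for i in range(m):
    idx[i,i*n:i*n+len(t)] = True
# t is a list, so t*m repeats it m times (12 values for 12 True positions)
A[idx]= t*m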
array([[1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
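A loop-free alternative, which is not taken from the answers above, is the Kronecker product: np.kron of an identity matrix with the row vector yields exactly this staircase pattern, which can then be copied into the top-left corner of the zero matrix. A minimal sketch using the same t, m and n as in the question:

```python
import numpy as np

t = [1, 2, 3]
m = 4
n = len(t)

# Each 1 on the diagonal of the identity is replaced by a copy of t,
# each 0 by a block of zeros: the result has shape (m, m*n).
block = np.kron(np.eye(m, dtype=int), np.atleast_2d(t))

A = np.zeros((2 * m, m * n + m), dtype=int)
A[:m, : m * n] = block
print(A)
```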

How to find most common element in a ndarray [closed]

I have a numpy array with the following shape (11617, 37). The data is multi class data, and to establish a baseline, I need to find which class (or classes) are the most common.
I have tried this formula and also this
A = np.array([[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0],
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0],
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]])
axis = 0
u, indices = np.unique(arr, return_inverse=True)
answer = u[np.argmax(np.apply_along_axis(np.bincount, axis, indices.reshape(arr.shape),
None, np.max(indices) + 1), axis=axis)]
I need to find the most frequent combination of the 37 classes in my array
Expected output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
To find the most frequent combination (i.e. the most frequent row, hence axis=0), you can try this:
import numpy as np
A = np.array([[1,0,0,0],
[1,0,0,1],
[1,0,0,0]])
unique_rows,counts = np.unique(A, return_counts=True,axis=0)
unique_rows[np.argmax(counts)]
FYI, if the array you mentioned in the question is your target variable, then it is an example of multi-label data.
This may be of use for you to understand multi-class and multi-label
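Since the question asks for the most common class "or classes", note that np.argmax only returns the first maximum. A small sketch that keeps every row tied for the highest count (the 5x4 array is invented for the example):

```python
import numpy as np

# Hypothetical data: two different rows occur twice each.
A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 0]])

unique_rows, counts = np.unique(A, axis=0, return_counts=True)

# Boolean indexing keeps all rows whose count equals the maximum.
most_frequent = unique_rows[counts == counts.max()]
print(most_frequent)
```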
You could try np.unique with return_counts parameter:
from operator import itemgetter
import numpy as np
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
uniques, counts = np.unique(A, axis=0, return_counts=True)
idxmax, _ = max(zip(range(len(counts)), counts), key=itemgetter(1))
print(uniques[idxmax])
Output
[0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0]
You can use collections.Counter.most_common if you first convert each inner list to a tuple (lists are not hashable, so they cannot be counted directly):
from collections import Counter
A = [[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]]
c = Counter(tuple(x) for x in A)
print(c.most_common()[0]) # ((0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0), 2)
This returns a tuple containing the most common list and the number of occurrences.
A really quick and easy solution:
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
print(max(A, key=A.count))
Which prints:
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
If you need to pay attention to runtime or want to optimize your code - this is not the way you want to go. However, if you just need a quick solution, it might help to keep this one-liner in mind.
(A.tolist() gets you a list from a np.ndarray if you need that first.)
from collections import Counter
A = [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]
most_common = [Counter(i).most_common(1).pop()[0] for i in A]
most_common
[0, 0, 0]

Transform Pandas dataframe into frequency matrix

I'm trying to transform a pandas dataframe with three columns (Date, Start, End) into a frequency matrix. My input dataframe looks like this:
Date, Start, End
2016-09-02 09:16:00 18 16
2016-09-02 16:14:10 16 1
2016-09-02 06:17:21 18 17
2016-09-02 05:51:07 23 17
2016-09-02 18:34:44 18 17
2016-09-02 05:44:44 20 4
2016-09-02 09:25:22 18 17
2016-09-02 22:27:44 18 17
2016-09-02 16:02:46 0 18
2016-09-02 15:35:07 17 17
2016-09-02 16:06:42 8 17
2016-09-02 14:47:04 16 23
2016-09-02 07:47:24 20 1
...
The values of 'Start' and 'End' are integers between 0 and 23 inclusive. The 'Date' is a datetime. The frequency matrix I'm trying to create is a 24 by 24 csv, where row i and column j is the number of times 'End'=i and 'Start'=j occurs in the input. For example, the above data would create:
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0
2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0
5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
11, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
13, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
14, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0
17, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 0, 0, 0, 0, 1
18, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
19, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
21, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
22, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
23, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0
For extra help, could this be done in a way that creates a separate matrix for every 15 minutes? That would be 672 matrices as this date range is one week.
I'm a self taught beginner, and I really can't think of how to solve this in a pythonic way, any solutions or advice would be greatly appreciated.
Create your matrix with a simple count and unstack one of the columns:
mat = df.groupby(['Start', 'End']).count().unstack(level=0)
Clean up the Date level:
mat.columns = mat.columns.droplevel(0)
Now reindex rows and columns, fill the gaps with 0 and cast to integers:
mat = mat.reindex(index=range(24), columns=range(24)).fillna(0).astype(int)
Detailed explanations
First, you count the number of times a given (start, end) pair appears. The result of groupby against these two columns actually brings back a multiindex.
df.groupby(['Start', 'End']).count()
Out[134]:
Date
Start End
0 18 1
8 17 1
16 1 1
23 1
17 17 1
18 16 1
17 4
20 1 1
4 1
23 17 1
What we want from that result is to get the Start index in columns. unstack does this:
df.groupby(['Start', 'End']).count().unstack(level=0)
Out[135]:
Date
Start 0 8 16 17 18 20 23
End
1 NaN NaN 1.0 NaN NaN 1.0 NaN
4 NaN NaN NaN NaN NaN 1.0 NaN
16 NaN NaN NaN NaN 1.0 NaN NaN
17 NaN 1.0 NaN 1.0 4.0 NaN 1.0
18 1.0 NaN NaN NaN NaN NaN NaN
23 NaN NaN 1.0 NaN NaN NaN NaN
The result of unstack is that the Start level is moved on top of the current Date column index as an additional column index level (see below). That's why we drop level 0 afterwards. Another way, depending on your current source code, could be to filter out the Date column upfront; unstack would then produce a single level.
_.columns
Out[136]:
MultiIndex(levels=[['Date'], [0, 8, 16, 17, 18, 20, 23]],
labels=[[0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6]],
names=[None, 'Start'])
Bit late but for anyone who's here:
There is a function explicitly for this called pd.crosstab()
https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
You will want to use it like:
output = pd.crosstab(df["Start"], df["End"])
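A fuller sketch of the crosstab route, under a few assumptions: End is put on the rows and Start on the columns because that is the layout the question asks for, the result is reindexed so that hours that never occur still get a row and a column, and pd.Grouper is used for the 15-minute split. The helper name hour_matrix is made up; the data are the 13 sample rows from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-09-02 09:16:00", "2016-09-02 16:14:10", "2016-09-02 06:17:21",
        "2016-09-02 05:51:07", "2016-09-02 18:34:44", "2016-09-02 05:44:44",
        "2016-09-02 09:25:22", "2016-09-02 22:27:44", "2016-09-02 16:02:46",
        "2016-09-02 15:35:07", "2016-09-02 16:06:42", "2016-09-02 14:47:04",
        "2016-09-02 07:47:24",
    ]),
    "Start": [18, 16, 18, 23, 18, 20, 18, 18, 0, 17, 8, 16, 20],
    "End": [16, 1, 17, 17, 17, 4, 17, 17, 18, 17, 17, 23, 1],
})

def hour_matrix(frame):
    """24x24 count matrix: row = End, column = Start."""
    hours = range(24)
    table = pd.crosstab(frame["End"], frame["Start"])
    # Add the hours that do not occur in this frame, filled with 0.
    return table.reindex(index=hours, columns=hours, fill_value=0)

freq = hour_matrix(df)
freq.to_csv("frequency_matrix.csv")

# One matrix per 15-minute window; windows without any rows are skipped.
per_window = {
    window: hour_matrix(chunk)
    for window, chunk in df.groupby(pd.Grouper(key="Date", freq="15min"))
    if not chunk.empty
}
```

Skipping empty windows is a choice made here for brevity; if all 672 matrices are required, an all-zero 24x24 frame could be stored for the empty windows instead.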
