I've tried several solutions from similar problems, but so far, no luck. I know it's probably simple.
I have two pandas dataframes. One contains temperatures and months, df1. The other contains months and a possible range of temperatures, df2. I would like to count how many times a temperature for a particular month occurs based on df2.
df1:
Month Temp
1 10
1 10
1 20
2 5
2 10
2 15
df2:
Month Temp
1 0
1 5
1 10
1 15
1 20
1 25
2 0
2 5
2 10
2 15
2 20
2 25
Desired output, with a new column, Count, in df2:
Month Temp Count
1 0 0
1 5 0
1 10 2
1 15 0
1 20 1
1 25 0
2 0 0
2 5 1
2 10 1
2 15 1
2 20 0
2 25 0
import pandas as pd
df1 = pd.DataFrame({'Month': [1]*3 + [2]*3,
                    'Temp': [10, 10, 20, 5, 10, 15]})
df2 = pd.DataFrame({'Month': [1]*6 + [2]*6,
                    'Temp': [0, 5, 10, 15, 20, 25]*2})
df2['Count'] =  # this is where I'm stuck
An approach using value_counts and reindex:
new_index = pd.MultiIndex.from_frame(df2)
new_df = (
df1.value_counts(["Month", "Temp"])
.reindex(new_index, fill_value=0)
.rename("Count")
.reset_index()
)
Month Temp Count
0 1 0 0
1 1 5 0
2 1 10 2
3 1 15 0
4 1 20 1
5 1 25 0
6 2 0 0
7 2 5 1
8 2 10 1
9 2 15 1
10 2 20 0
11 2 25 0
Try this:
(df2.join(
    df1.groupby(['Month', 'Temp']).size().rename('count'),
    on=['Month', 'Temp'])
 .fillna(0)
 .astype({'count': int}))
Another solution:
x = (
df1.assign(Count=1)
.merge(df2, on=["Month", "Temp"], how="outer")
.fillna(0)
.groupby(["Month", "Temp"], as_index=False)
.sum()
.astype(int)
)
print(x)
Prints:
Month Temp Count
0 1 0 0
1 1 5 0
2 1 10 2
3 1 15 0
4 1 20 1
5 1 25 0
6 2 0 0
7 2 5 1
8 2 10 1
9 2 15 1
10 2 20 0
11 2 25 0
Try:
res = (df2.set_index(['Month', 'Temp'])
.join(df1.value_counts().to_frame(name='count'))
.reset_index().fillna(0).astype(int))
Or:
di = df1.value_counts().to_dict()
df2['count'] = df2.apply(lambda x: di.get(tuple(x), 0), axis=1)
Month Temp count
0 1 0 0
1 1 5 0
2 1 10 2
3 1 15 0
4 1 20 1
5 1 25 0
6 2 0 0
7 2 5 1
8 2 10 1
9 2 15 1
10 2 20 0
11 2 25 0
Related
I have the following dataframe:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})
Now I would like to set all the rows to zero where fewer than four 1s appear "in a row", i.e. I would like to have the following resulting DataFrame:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0]})
I was not able to find a way to achieve this nicely...
Try with groupby and where:
streaks = df.groupby(df["col"].ne(df["col"].shift()).cumsum()).transform("sum")
output = df.where(streaks.ge(4), 0)
>>> output
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
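For readers who find the two-liner dense, here is the same technique spelled out step by step (a sketch with intermediate names added for clarity; the logic is identical to the answer above):

```python
import pandas as pd

df = pd.DataFrame({"col": [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0,
                           1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0]})

# Label each run of identical values: comparing with shift() is True
# at every run boundary, so the cumulative sum yields one id per run.
run_ids = df["col"].ne(df["col"].shift()).cumsum()

# Broadcast each run's sum back to every row of the run; for runs of
# 1s the sum equals the run length, for runs of 0s it is 0.
streaks = df.groupby(run_ids)["col"].transform("sum")

# Zero out rows belonging to runs of 1s shorter than four.
result = df["col"].where(streaks.ge(4), 0)
print(result.tolist())
# → [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]
```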
We can do:
df.loc[df.groupby(df.col.eq(0).cumsum()).transform('count')['col']<5,'col'] = 0
df
Out[77]:
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
This question already has answers here: GroupBy Pandas Count Consecutive Zero's (2 answers). Closed 1 year ago.
I want to count runs of consecutive 0s: for each run of 0s, assign the length of the run to a count column on every row of that run, and restart the count whenever a 1 is encountered (rows with a 1 get 0).
I have tried several methods, but none of them produced the result I need.
An example of my Dataframe is as follows:
import numpy as np
import pandas as pd
np.random.seed(2021)
a = np.random.randint(0, 2, 20)
df = pd.DataFrame(a, columns=['No.'])
print(df)
No.
0 0
1 1
2 1
3 0
4 1
5 0
6 0
7 0
8 1
9 0
10 1
11 1
12 1
13 1
14 0
15 0
16 0
17 0
18 0
19 0
The result I need:
No. count
0 0 1
1 1 0
2 1 0
3 0 1
4 1 0
5 0 3
6 0 3
7 0 3
8 1 0
9 0 1
10 1 0
11 1 0
12 1 0
13 1 0
14 0 6
15 0 6
16 0 6
17 0 6
18 0 6
19 0 6
I tried the following methods, but none of them produced the result I need. What should I do?
groups = df['No.'].ne(0).cumsum()
df['count'] = df['No.'].eq(0).groupby(groups).count()
df['count'] = df['No.'].eq(0).groupby(groups).agg(len)
df['count'] = df['No.'].groupby(groups).agg(len)
df['count'] = df['No.'].groupby(groups).count()
For your groups variable, calculate diff first, so that each run of consecutive equal values gets its own id. And to get an equal-sized count Series that can be assigned back to the original dataframe, use transform instead of agg:
df['count'] = 0
groups = df['No.'].diff().ne(0).cumsum()
df.loc[df['No.'] == 0, 'count'] = df['No.'].groupby(groups).transform('size')
df
No. count
0 0 1
1 1 0
2 1 0
3 0 1
4 1 0
5 0 3
6 0 3
7 0 3
8 1 0
9 0 1
10 1 0
11 1 0
12 1 0
13 1 0
14 0 6
15 0 6
16 0 6
17 0 6
18 0 6
19 0 6
Let's say I have a dataframe:
index day
0 21
1 2
2 7
To each day I want to assign 3 values, 0, 1, 2; in the end the dataframe should look like this:
index day value
0 21 0
1 21 1
2 21 2
3 2 0
4 2 1
5 2 2
6 7 0
7 7 1
8 7 2
Does anyone have any idea?
You could introduce a column containing (0, 1, 2)-tuples and then explode the dataframe on that column:
import pandas as pd
df = pd.DataFrame({'day': [21, 2, 7]})
df['value'] = [(0, 1, 2)] * len(df)
df = df.explode('value')
df.index = range(len(df))
print(df)
day value
0 21 0
1 21 1
2 21 2
3 2 0
4 2 1
5 2 2
6 7 0
7 7 1
8 7 2
Try:
N = 3
df = df.assign(value=[range(N) for _ in range(len(df))]).explode("value")
print(df)
Prints:
index day value
0 0 21 0
0 0 21 1
0 0 21 2
1 1 2 0
1 1 2 1
1 1 2 2
2 2 7 0
2 2 7 1
2 2 7 2
A reindex option:
df = (
df.reindex(index=pd.MultiIndex.from_product([df.index, [0, 1, 2]]),
level=0)
.droplevel(0)
.rename_axis(index='value')
.reset_index()
)
df:
value day
0 0 21
1 1 21
2 2 21
3 0 2
4 1 2
5 2 2
6 0 7
7 1 7
8 2 7
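Another common idiom for this repeat-and-label task (a sketch under the same assumption of N = 3 values per day, not one of the answers above) pairs Index.repeat with np.tile:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'day': [21, 2, 7]})
N = 3

# Repeat every row N times, then lay 0..N-1 alongside each repeated block.
out = (df.loc[df.index.repeat(N)]
         .assign(value=np.tile(np.arange(N), len(df)))
         .reset_index(drop=True))
print(out)
```

Unlike explode, this keeps the value column as an integer dtype with no intermediate object column.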
So I am trying to count the number of consecutive equal values in a dataframe and put that information into a new column, but I want the count to build up row by row within each run.
Here is what I have so far:
df = pd.DataFrame(np.random.randint(0,3, size=(15,4)), columns=list('ABCD'))
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()
dfg = df.groupby(by='subgroupA', as_index=False).apply(lambda grp: len(grp))
dfg.rename(columns={None: 'numConsec'}, inplace=True)
df = df.merge(dfg, how='left', on='subgroupA')
df
Here is the result:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 2
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 2
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 4
9 0 0 0 2 7 4
10 0 2 1 1 7 4
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
The problem is, in the numConsec column, I don't want the full run length on every row; I want it to reflect a running count as you move down the dataframe row by row. My dataframe is too large to loop through and count iteratively, as that would be too slow, so I need a vectorized way to produce this:
A B C D subgroupA numConsec
0 2 1 1 1 1 1
1 1 2 1 0 2 1
2 1 0 2 1 2 2
3 0 1 2 0 3 1
4 1 0 0 1 4 1
5 0 2 2 1 5 1
6 0 2 1 1 5 2
7 1 0 0 1 6 1
8 0 2 0 0 7 1
9 0 0 0 2 7 2
10 0 2 1 1 7 3
11 0 2 2 0 7 4
12 1 2 0 1 8 1
13 0 1 1 0 9 1
14 1 1 1 0 10 1
Any ideas?
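One vectorized way to get this running count (a sketch building on the subgroupA labelling already shown above, illustrated here on a small fixed column rather than the random data) is groupby.cumcount:

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 1, 1, 0, 1, 0, 0]})

# Label each run of consecutive equal values in A, as in the question.
df['subgroupA'] = (df.A != df.A.shift(1)).cumsum()

# cumcount numbers rows 0, 1, 2, ... within each group; add 1 so each
# run counts up 1, 2, 3, ... instead of repeating the full run length.
df['numConsec'] = df.groupby('subgroupA').cumcount() + 1
print(df['numConsec'].tolist())  # → [1, 1, 2, 1, 1, 1, 2]
```

This avoids the groupby/merge round-trip entirely: no intermediate dfg is needed.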
I have a dataframe with a column populated with groups of 1s and 0s. How can I assign each group a consecutive number beginning from 1?
I have tried a for loop across rows, but I need a column operation for fast performance.
d = {'col1': [1,1,1,0,0,1,1,0,0,0,1,1]}
df1 = pd.DataFrame(data=d)
df1
col1
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 0
8 0
9 0
10 1
11 1
I need the following output:
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
You can compare shifted values for inequality with Series.ne and then take the cumulative sum with Series.cumsum:
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()
print(df1)
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5