For my quarter columns, I get NaN instead of the expected values (for example 1, 0, 0, 0).
How do I fix the code below so that the DataFrame contains the actual values?
qrt_1 = {'q1':[1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]}
qrt_2 = {'q2':[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0]}
qrt_3 = {'q3':[0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0]}
qrt_4 = {'q4':[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}
year = {'year': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9]}
value = data_1['Sales']
data = [year, qrt_1, qrt_2, qrt_3, qrt_4]
dataframes = []
for x in data:
    dataframes.append(pd.DataFrame(x))
df = pd.concat(dataframes)
I am expecting a DataFrame that contains the values of qrt_1, qrt_2, etc. under their corresponding column names.
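By default, pd.concat uses axis=0 and stacks the frames on top of each other; since each frame has a different column name, the cells that do not line up are filled with NaN. Use axis=1 in pd.concat to place the frames side by side instead: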
df = pd.concat(dataframes, axis=1)
print(df)
Prints:
year q1 q2 q3 q4
0 1 1 0 0 0
1 1 0 1 0 0
2 1 0 0 1 0
3 1 0 0 0 1
4 2 1 0 0 0
5 2 0 1 0 0
6 2 0 0 1 0
7 2 0 0 0 1
8 3 1 0 0 0
9 3 0 1 0 0
10 3 0 0 1 0
11 3 0 0 0 1
12 4 1 0 0 0
13 4 0 1 0 0
14 4 0 0 1 0
15 4 0 0 0 1
16 5 1 0 0 0
17 5 0 1 0 0
18 5 0 0 1 0
19 5 0 0 0 1
20 6 1 0 0 0
21 6 0 1 0 0
22 6 0 0 1 0
23 6 0 0 0 1
24 7 1 0 0 0
25 7 0 1 0 0
26 7 0 0 1 0
27 7 0 0 0 1
28 8 1 0 0 0
29 8 0 1 0 0
30 8 0 0 1 0
31 8 0 0 0 1
32 9 1 0 0 0
33 9 0 1 0 0
34 9 0 0 1 0
35 9 0 0 0 1
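To see where the NaN values come from, here is a minimal sketch with shortened, made-up lists (four rows instead of 36). The names parts, stacked, side_by_side and merged are only for illustration. It also shows one possible alternative that skips pd.concat by merging the dicts first.

```python
import pandas as pd

# Shortened stand-ins for the dicts in the question (assumption: 4 rows only).
year = {'year': [1, 1, 1, 1]}
qrt_1 = {'q1': [1, 0, 0, 0]}
qrt_2 = {'q2': [0, 1, 0, 0]}

parts = [pd.DataFrame(d) for d in (year, qrt_1, qrt_2)]

# Default axis=0: frames are stacked row-wise, unmatched columns become NaN.
stacked = pd.concat(parts)

# axis=1: frames are aligned on the index and placed side by side.
side_by_side = pd.concat(parts, axis=1)

# Alternative: merge the dicts and build a single frame directly.
merged = pd.DataFrame({**year, **qrt_1, **qrt_2})

print(stacked.shape, int(stacked.isna().sum().sum()))
print(side_by_side.shape, int(side_by_side.isna().sum().sum()))
```

The stacked frame has three times as many rows as the inputs, and every row has a value in only one column, which is the pattern visible in the question.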
I have the following dataframe:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})
Now I would like to set to zero every row that belongs to a run of fewer than four consecutive 1s, i.e. I would like to have the following resulting DataFrame:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0]})
I was not able to find a way to achieve this nicely...
Try with groupby and where:
streaks = df.groupby(df["col"].ne(df["col"].shift()).cumsum()).transform("sum")
output = df.where(streaks.ge(4), 0)
>>> output
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
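The one-liner is easier to follow when split into steps. Below is a sketch that wraps the same idea in a helper; the function name keep_long_runs and the parameter min_len are made up for illustration, and it assumes the column holds only 0s and 1s (so the sum of a run equals the number of 1s in it).

```python
import pandas as pd

def keep_long_runs(s: pd.Series, min_len: int = 4) -> pd.Series:
    """Zero out runs of 1s shorter than min_len (assumes 0/1 values only)."""
    # A new run id starts whenever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum()
    # Per-row sum of its run: run length for runs of 1s, 0 for runs of 0s.
    run_sum = s.groupby(run_id).transform("sum")
    # Keep the original value only where the run is long enough.
    return s.where(run_sum.ge(min_len), 0)

col = pd.Series([0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0])
result = keep_long_runs(col)
print(result.tolist())
```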
We can do
df.loc[df.groupby(df.col.eq(0).cumsum()).transform('count')['col']<5,'col'] = 0
df
Out[77]:
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
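One thing worth checking with this variant: the grouping key starts a new group at every 0, so each group is a 0 followed by the 1s after it, which is why the comparison uses 5 rather than 4. If the column starts with a run of 1s, that first group has no leading 0, so a run of exactly four 1s is counted as 4 and gets zeroed. The sketch below uses a made-up input to compare both approaches on such a case.

```python
import pandas as pd

# Made-up input that starts with a run of exactly four 1s.
df = pd.DataFrame({"col": [1, 1, 1, 1, 0, 1, 1, 0]})

# Variant 1: label each run of equal values, then sum the 1s within the run.
run_id = df["col"].ne(df["col"].shift()).cumsum()
run_sum = df["col"].groupby(run_id).transform("sum")
by_runs = df["col"].where(run_sum.ge(4), 0)

# Variant 2: start a new group at every 0 and count the rows in each group.
group_size = df["col"].groupby(df["col"].eq(0).cumsum()).transform("count")
by_zeros = df["col"].copy()
by_zeros[group_size < 5] = 0

print(by_runs.tolist())   # [1, 1, 1, 1, 0, 0, 0, 0]
print(by_zeros.tolist())  # [0, 0, 0, 0, 0, 0, 0, 0]
```

For data that always begins with a 0, as in the question, both variants give the same result.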
I want to count consecutive 0s: every row in a run of 0s should get the length of that run in a count column, rows containing 1 should get 0, and the count should restart after each 1.
I tried several approaches, but none of them produced the result I need.
An example of my Dataframe is as follows:
import numpy as np
import pandas as pd
np.random.seed(2021)
a = np.random.randint(0, 2, 20)
df = pd.DataFrame(a, columns=['No.'])
print(df)
No.
0 0
1 1
2 1
3 0
4 1
5 0
6 0
7 0
8 1
9 0
10 1
11 1
12 1
13 1
14 0
15 0
16 0
17 0
18 0
19 0
The result I need:
No. count
0 0 1
1 1 0
2 1 0
3 0 1
4 1 0
5 0 3
6 0 3
7 0 3
8 1 0
9 0 1
10 1 0
11 1 0
12 1 0
13 1 0
14 0 6
15 0 6
16 0 6
17 0 6
18 0 6
19 0 6
I tried the following methods, but none of them produced the expected result. What should I do?
groups = df['No.'].ne(0).cumsum()
df['count'] = df['No.'].eq(0).groupby(groups).count()
df['count'] = df['No.'].eq(0).groupby(groups).agg(len)
df['count'] = df['No.'].groupby(groups).agg(len)
df['count'] = df['No.'].groupby(groups).count()
For your groups variable, compare each value with the previous one first (diff followed by ne(0)), so that cumsum assigns a separate id to each run of consecutive equal values. To get a count Series with the same length as the original DataFrame, so that it can be assigned back, use transform instead of agg:
df['count'] = 0
groups = df['No.'].diff().ne(0).cumsum()
df.loc[df['No.'] == 0, 'count'] = df['No.'].groupby(groups).transform('size')
df
No. count
0 0 1
1 1 0
2 1 0
3 0 1
4 1 0
5 0 3
6 0 3
7 0 3
8 1 0
9 0 1
10 1 0
11 1 0
12 1 0
13 1 0
14 0 6
15 0 6
16 0 6
17 0 6
18 0 6
19 0 6
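The same idea can be written as a small helper that returns a new Series instead of modifying the frame in place. This is only a sketch: the name zero_run_lengths is made up, it uses ne(shift()) rather than diff() to detect the start of a run, and the input list is copied from the printed example above rather than regenerated from the random seed.

```python
import pandas as pd

def zero_run_lengths(s: pd.Series) -> pd.Series:
    """For each row: length of its run if the value is 0, otherwise 0."""
    # A new run id starts whenever the value differs from the previous row.
    run_id = s.ne(s.shift()).cumsum()
    # Broadcast each run's size back to every row of that run.
    run_len = s.groupby(run_id).transform("size")
    # Keep the run length only on rows that hold a 0.
    return run_len.where(s.eq(0), 0)

df = pd.DataFrame({'No.': [0,1,1,0,1,0,0,0,1,0,1,1,1,1,0,0,0,0,0,0]})
df['count'] = zero_run_lengths(df['No.'])
print(df['count'].tolist())
```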
I would like to split a pandas DataFrame into groups in order to process each group separately. My 'value.csv' file contains the following numbers:
num tID y x height width
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
2 0 0 0 1 16
2 1 1 0 1 16
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
I would like to split the data wherever the tID column starts again at 0. The first four groups would look like this:
First:
2 0 0 0 1 16
2 1 1 0 1 16
Second:
5 0 1 0 1 16
5 1 0 0 1 8
5 2 0 8 1 8
Third:
6 0 0 0 1 16
6 1 1 0 1 8
6 2 1 8 1 8
Fourth:
2 0 0 0 1 16
2 1 1 0 1 16
I tried to split it using an if statement inside a loop, but without success. Are there any efficient ideas?
import pandas as pd
statQuality = 'value.csv'
df = pd.read_csv(statQuality, names=['num','tID','y','x','height','width'])
df2 = df.copy()
df2.drop(['num'], axis=1, inplace=True)
x = []
for index, row in df2.iterrows():
    if row['tID'] == 0:
        x = []
        x.append(row)
        print(x)
    else:
        x.append(row)
Use:
#create groups by consecutive values
s = df['num'].ne(df['num'].shift()).cumsum()
#create helper count Series for duplicated groups like `2_0`, `2_1`...
g = s.groupby(df['num']).transform(lambda x: x.factorize()[0])
#dictionary of DataFrames
d = {'{}_{}'.format(i,j): v.drop('num', axis=1) for (i, j), v in df.groupby(['num', g])}
print (d)
{'2_0': tID y x height width
0 0 0 0 1 16
1 1 1 0 1 16, '2_1': tID y x height width
8 0 0 0 1 16
9 1 1 0 1 16, '5_0': tID y x height width
2 0 1 0 1 16
3 1 0 0 1 8
4 2 0 8 1 8, '5_1': tID y x height width
10 0 1 0 1 16
11 1 0 0 1 8
12 2 0 8 1 8, '6_0': tID y x height width
5 0 0 0 1 16
6 1 1 0 1 8
7 2 1 8 1 8, '6_1': tID y x height width
13 0 0 0 1 16
14 1 1 0 1 8
15 2 1 8 1 8}
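If the goal is literally to start a new group at every row where tID is 0, as described in the question, a cumulative sum over that condition gives a group label directly. A minimal sketch, with the first ten rows of the sample data inlined instead of read from 'value.csv' (the names block and parts are illustrative):

```python
import pandas as pd

# First ten rows of the sample data, inlined for a self-contained example.
df = pd.DataFrame(
    [[2, 0, 0, 0, 1, 16],
     [2, 1, 1, 0, 1, 16],
     [5, 0, 1, 0, 1, 16],
     [5, 1, 0, 0, 1, 8],
     [5, 2, 0, 8, 1, 8],
     [6, 0, 0, 0, 1, 16],
     [6, 1, 1, 0, 1, 8],
     [6, 2, 1, 8, 1, 8],
     [2, 0, 0, 0, 1, 16],
     [2, 1, 1, 0, 1, 16]],
    columns=['num', 'tID', 'y', 'x', 'height', 'width'],
)

# The label increases by one at every row where tID is 0.
block = df['tID'].eq(0).cumsum()

# One DataFrame per block, in order of appearance, without the num column.
parts = [g.drop(columns='num') for _, g in df.groupby(block)]

for i, part in enumerate(parts, start=1):
    print(f"Group {i}: rows {part.index.tolist()}")
```

Unlike the dictionary keyed by num above, this keeps the groups in a list in their original order, which may or may not be what is needed downstream.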
I have the following column in a DataFrame:
0
0
0
0
0
5
I would like to check for values greater than a threshold. Each such value should be set to zero, and the threshold should be written (value - threshold) rows further up. Let's say threshold=3, then the resulting column has to be:
0
0
0
3
0
0
Any ideas for a fast transformation?
For this DataFrame:
df
Out:
A
0 0
1 0
2 0
3 0
4 0
5 5
6 0
7 0
8 0
9 0
10 6
11 0
12 0
threshold = 3
above_threshold = df['A'] > threshold
df.loc[df[above_threshold].index - (df.loc[above_threshold, 'A'] - threshold).values, 'A'] = threshold
df.loc[above_threshold, 'A'] = 0
df
Out:
A
0 0
1 0
2 0
3 3
4 0
5 0
6 0
7 3
8 0
9 0
10 0
11 0
12 0
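The same steps can be sketched as a reusable function that works on row positions rather than index labels. The name move_up_excess is made up for illustration, and the sketch assumes integer values; the check for targets above the first row is an addition that the answer does not have.

```python
import numpy as np
import pandas as pd

def move_up_excess(s: pd.Series, threshold: int) -> pd.Series:
    """Replace each value above threshold by 0 and write threshold
    (value - threshold) rows higher up. Assumes integer values."""
    values = s.to_numpy().copy()
    # Positions of the values that exceed the threshold.
    pos = np.flatnonzero(values > threshold)
    # Each one moves up by its excess over the threshold.
    target = pos - (values[pos] - threshold)
    if (target < 0).any():
        raise ValueError("a value would be moved above the first row")
    # Same order as above: write the threshold first, then clear the originals.
    values[target] = threshold
    values[pos] = 0
    return pd.Series(values, index=s.index, name=s.name)

df = pd.DataFrame({'A': [0, 0, 0, 0, 0, 5, 0, 0, 0, 0, 6, 0, 0]})
df['A'] = move_up_excess(df['A'], threshold=3)
print(df['A'].tolist())
```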