For my quarter columns, I get NaN instead of values such as 1, 0, 0, 0.
How do I fix the code below so that the dataframe contains the actual values?
qrt_1 = {'q1':[1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0]}
qrt_2 = {'q2':[0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0]}
qrt_3 = {'q3':[0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0]}
qrt_4 = {'q4':[0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1]}
year = {'year': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,6,6,6,6,7,7,7,7,8,8,8,8,9,9,9,9]}
value = data_1['Sales']
data = [year, qrt_1, qrt_2, qrt_3, qrt_4]
dataframes = []
for x in data:
    dataframes.append(pd.DataFrame(x))
df = pd.concat(dataframes)
I am expecting a dataframe that contains year, q1, q2, etc. as columns, each with its corresponding values.
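By default, pd.concat stacks the dataframes along the rows (axis=0), so each column is NaN in the rows that came from the other dataframes. Use axis=1 in pd.concat to place them side by side: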
df = pd.concat(dataframes, axis=1)
print(df)
Prints:
year q1 q2 q3 q4
0 1 1 0 0 0
1 1 0 1 0 0
2 1 0 0 1 0
3 1 0 0 0 1
4 2 1 0 0 0
5 2 0 1 0 0
6 2 0 0 1 0
7 2 0 0 0 1
8 3 1 0 0 0
9 3 0 1 0 0
10 3 0 0 1 0
11 3 0 0 0 1
12 4 1 0 0 0
13 4 0 1 0 0
14 4 0 0 1 0
15 4 0 0 0 1
16 5 1 0 0 0
17 5 0 1 0 0
18 5 0 0 1 0
19 5 0 0 0 1
20 6 1 0 0 0
21 6 0 1 0 0
22 6 0 0 1 0
23 6 0 0 0 1
24 7 1 0 0 0
25 7 0 1 0 0
26 7 0 0 1 0
27 7 0 0 0 1
28 8 1 0 0 0
29 8 0 1 0 0
30 8 0 0 1 0
31 8 0 0 0 1
32 9 1 0 0 0
33 9 0 1 0 0
34 9 0 0 1 0
35 9 0 0 0 1
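To see where the NaN values come from, here is a minimal sketch with two small made-up frames (the names a and b are only for illustration, they are not from the question). It compares the default row-wise concat with axis=1:

```python
import pandas as pd

# Two tiny illustrative frames with different column names.
a = pd.DataFrame({"year": [1, 1]})
b = pd.DataFrame({"q1": [1, 0]})

# Default axis=0: rows are appended, and a column that a frame
# does not have is filled with NaN for that frame's rows.
stacked = pd.concat([a, b])

# axis=1: columns are placed side by side, aligned on the index.
side_by_side = pd.concat([a, b], axis=1)

print(stacked.shape)               # (4, 2)
print(stacked.isna().sum().sum())  # 4
print(side_by_side.shape)          # (2, 2)
```

Since axis=1 aligns on the index, this only lines up cleanly when all frames share the same index, as they do here and in the question.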
I have the following dataframe:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,1,1,0,0,1,1,1,0,1,1,1,1,0,0,0]})
Now I would like to set to zero every run of fewer than four consecutive 1s, i.e. I would like to have the following resulting DataFrame:
df = pd.DataFrame({"col":[0,0,1,1,1,1,0,0,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0]})
I was not able to find a way to achieve this nicely...
Try with groupby and where:
streaks = df.groupby(df["col"].ne(df["col"].shift()).cumsum()).transform("sum")
output = df.where(streaks.ge(4), 0)
>>> output
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
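The groupby key in this answer can be hard to read at first, so here is a sketch that splits it into steps on a shortened version of the column. The intermediate names run_id and run_sum are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"col": [0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0]})

# A new run starts wherever the value differs from the previous row;
# the cumulative sum of those change points labels each run.
run_id = df["col"].ne(df["col"].shift()).cumsum()

# Sum of each run, broadcast back to every row of that run.
# For a run of 1s this is its length; for a run of 0s it is 0.
run_sum = df.groupby(run_id)["col"].transform("sum")

# Keep the original value only in runs whose sum is at least 4.
output = df["col"].where(run_sum.ge(4), 0)

print(run_id.tolist())   # [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 5]
print(run_sum.tolist())  # [0, 0, 4, 4, 4, 4, 0, 0, 2, 2, 0]
print(output.tolist())   # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Using the run sum as the run length works here because the column only contains 0 and 1.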
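We can do the following. Each group starts at a 0 and contains the 1s that follow it, so a group with fewer than 5 rows holds fewer than four 1s. Note that this relies on the column starting with 0; a leading run of exactly four 1s would have no 0 in its group and would be zeroed as well: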
df.loc[df.groupby(df.col.eq(0).cumsum()).transform('count')['col']<5,'col'] = 0
df
Out[77]:
col
0 0
1 0
2 1
3 1
4 1
5 1
6 0
7 0
8 0
9 0
10 0
11 0
12 0
13 0
14 0
15 0
16 1
17 1
18 1
19 1
20 0
21 0
22 0
I'm aiming to replace values in the df column Num. Specifically:
wherever Num is 1, I want to replace the preceding 0s with 1, working backwards (backfilling) until the nearest row where Item is 1.
where Num == 1, the corresponding row in Item will always be 0.
Also, Num == 0 will always follow Num == 1.
Input and code:
df = pd.DataFrame({
'Item' : [0,1,2,3,4,4,0,1,2,3,1,1,2,3,4,0],
'Num' : [0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0]
})
df['Num'] = np.where((df['Num'] == 1) & (df['Item'].shift() > 1), 1, 0)
Item Num
0 0 0
1 1 0
2 2 0
3 3 0
4 4 0
5 4 1
6 0 0
7 1 0
8 2 0
9 3 0
10 1 0
11 1 0
12 2 0
13 3 0
14 4 1
15 0 0
intended output:
Item Num
0 0 0
1 1 1
2 2 1
3 3 1
4 4 1
5 4 1
6 0 0
7 1 0
8 2 0
9 3 0
10 1 0
11 1 1
12 2 1
13 3 1
14 4 1
15 0 0
First, create groups of the rows according to the two start and end conditions using cumsum. Then we can group by this new column and sum over the Num column. In this way, all groups that contain a 1 in the Num column will get the value 1 while all other groups will get 0.
groups = ((df['Num'].shift() == 1) | (df['Item'] == 1)).cumsum()
df['Num'] = df.groupby(groups)['Num'].transform('sum')
Result:
Item Num
0 0 0
1 1 1
2 2 1
3 3 1
4 4 1
5 4 1
6 0 0
7 1 0
8 2 0
9 3 0
10 1 0
11 1 1
12 2 1
13 3 1
14 4 1
15 0 0
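To make the grouping step easier to follow, this sketch prints the intermediate group labels for the question's data. The name starts is only for illustration; the rest mirrors the code above:

```python
import pandas as pd

df = pd.DataFrame({
    "Item": [0, 1, 2, 3, 4, 4, 0, 1, 2, 3, 1, 1, 2, 3, 4, 0],
    "Num":  [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
})

# A new group begins on the row right after a Num == 1,
# or on any row where Item == 1.
starts = (df["Num"].shift() == 1) | (df["Item"] == 1)
groups = starts.cumsum()
print(groups.tolist())
# [0, 1, 1, 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6]

# Every row receives the sum of Num within its group.
result = df.groupby(groups)["Num"].transform("sum")
print(result.tolist())
# [0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
```

The sum yields only 0 or 1 because each group contains at most one row with Num == 1 in this data. If that were not guaranteed, transform("max") would be a more defensive choice.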
You could try:
for a, b in zip(df[df['Item'] == 0].index, df[df['Num'] == 1].index):
    df.loc[(df.loc[a+1:b-1, 'Item'] == 1)[::-1].idxmax():b-1, 'Num'] = 1
I have a dataframe with about 60 columns and the following structure:
A B C Y
0 12 1 0 1
1 13 1 0 [....] 0
2 14 0 1 1
3 15 1 0 0
4 16 0 1 1
I want to create a new column Z that is the sum of the values in columns B through Y.
How can I proceed?
To create a copy of the dataframe that includes the new column, use assign:
df.assign(Z=df.loc[:, 'B':'Y'].sum(1))
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2
To assign it to the same dataframe, in place, use
df['Z'] = df.loc[:, 'B':'Y'].sum(1)
df
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2
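For a self-contained version, here is a sketch that rebuilds only the four columns visible in the question (the elided columns are left out, so this is an assumption about the shape of the data). It also shows the positional equivalent with iloc:

```python
import pandas as pd

# Only the columns shown in the question; the real frame has about 60.
df = pd.DataFrame({
    "A": [12, 13, 14, 15, 16],
    "B": [1, 1, 0, 1, 0],
    "C": [0, 0, 1, 0, 1],
    "Y": [1, 0, 1, 0, 1],
})

# Label slices with .loc include both endpoints, so 'B':'Y' covers B, C and Y.
by_label = df.loc[:, "B":"Y"].sum(axis=1)

# Positional slices with .iloc exclude the stop, so 1:4 covers positions 1, 2, 3.
by_position = df.iloc[:, 1:4].sum(axis=1)

df["Z"] = by_label
print(df["Z"].tolist())  # [2, 1, 2, 1, 2]
```

The label-based slice is usually the safer choice, since it keeps working if columns are added before B or after Y.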
Try this (it sums every column after the first, so it assumes B is the second column and Y is the last):
df['z']=df.iloc[:,1:].sum(1)
You could use assign:
In [2361]: df.assign(Z=df.loc[:, 'B':'Y'].sum(1))
Out[2361]:
A B C Y Z
0 12 1 0 1 2
1 13 1 0 0 1
2 14 0 1 1 2
3 15 1 0 0 1
4 16 0 1 1 2