I'm aiming to subset a pandas df using a condition and append the matching rows to the right of the df. For example, where Num2 equals 1, I want to take the following row and append it to the right of the df. The code below appends every row, whereas I only want to append the row that follows a 1 in Num2. I'd also like to be able to append only specific columns; using the example below, this could be just Num1 and Num2.
df = pd.DataFrame({
'Num1' : [0,1,2,3,4,4,0,1,2,3,1,1,2,3,4,0],
'Num2' : [0,0,0,0,0,1,3,0,1,2,0,0,0,0,1,4],
'Value' : [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
})
df1 = df.add_suffix('1').join(df.shift(-1).add_suffix('2'))
# grab all rows after a 1 in Num2
ones = df.loc[df["Num2"].shift().isin([1])]
# append these to the right
Intended output:
Num1 Num2 Value Num12 Num22
0 0 0 0
1 1 0 0
2 2 0 0
3 3 0 0
4 4 0 0
5 4 1 0 0 3
6 0 3 0
7 1 0 0
8 2 1 0 3 2
9 3 2 0
10 1 0 0
11 1 0 0
12 2 0 0
13 3 0 0
14 4 1 0 0 4
15 0 4 0
You can try:
df = df.join(df.shift(-1).mask(df['Num2'].ne(1)).drop(columns='Value').add_suffix('2'))
OR
ones.index = ones.index - 1
df = df.join(ones.drop(columns='Value').add_suffix('2'))
# OR (use either one, since both methods do the same thing)
df = pd.concat([df, ones.drop(columns='Value').add_suffix('2')], axis=1)
If needed, use fillna():
df[["Num12", "Num22"]] = df[["Num12", "Num22"]].fillna('')
We can do this by making new columns that are the -1 shifts of Num1 and Num2, then setting them to "" where Num2 isn't 1.
mask = df.Num2 != 1
df[["Num12", "Num22"]] = df[["Num1", "Num2"]].shift(-1)
df.loc[mask, ["Num12", "Num22"]] = ""
I got a warning on this, but it works nevertheless:
>>> df[["Num12", "Num22"]] = np.where(df[['Num1', "Num2"]]['Num2'][:,np.newaxis] == 1, df[['Num1', 'Num2']].shift(-1), [np.nan, np.nan])
<stdin>:1: FutureWarning: Support for multi-dimensional indexing (e.g. `obj[:, None]`) is deprecated and will be removed in a future version. Convert to a numpy array before indexing instead.
>>> df
Num1 Num2 Value Num12 Num22
0 0 0 0 NaN NaN
1 1 0 0 NaN NaN
2 2 0 0 NaN NaN
3 3 0 0 NaN NaN
4 4 0 0 NaN NaN
5 4 1 0 0.0 3.0
6 0 3 0 NaN NaN
7 1 0 0 NaN NaN
8 2 1 0 3.0 2.0
9 3 2 0 NaN NaN
10 1 0 0 NaN NaN
11 1 0 0 NaN NaN
12 2 0 0 NaN NaN
13 3 0 0 NaN NaN
14 4 1 0 0.0 4.0
15 0 4 0 NaN NaN
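The warning can be avoided by following its own advice and converting the Series to a NumPy array before adding the axis (a sketch):
# build the condition as a (n, 1) NumPy array instead of indexing the Series
cond = df['Num2'].to_numpy()[:, np.newaxis] == 1
df[["Num12", "Num22"]] = np.where(cond, df[['Num1', 'Num2']].shift(-1), np.nan)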
Task
I have a df where I compute some ratios grouped by date and id. I want to fill column c with NaN wherever the sum of a and b is 0. Any help would be awesome!
df
date id a b c
0 2001-09-06 1 3 1 1
1 2001-09-07 1 3 1 1
2 2001-09-08 1 4 0 1
3 2001-09-09 2 6 0 1
4 2001-09-10 2 0 0 2
5 2001-09-11 1 0 0 2
6 2001-09-12 2 1 1 2
7 2001-09-13 2 0 0 2
8 2001-09-14 1 0 0 2
Try this:
df['new_c'] = df.c.where(df[['a','b']].sum(axis=1).ne(0))
Out[75]:
date id a b c new_c
0 2001-09-06 1 3 1 1 1.0
1 2001-09-07 1 3 1 1 1.0
2 2001-09-08 1 4 0 1 1.0
3 2001-09-09 2 6 0 1 1.0
4 2001-09-10 2 0 0 2 NaN
5 2001-09-11 1 0 0 2 NaN
6 2001-09-12 2 1 1 2 2.0
7 2001-09-13 2 0 0 2 NaN
8 2001-09-14 1 0 0 2 NaN
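If you prefer numpy, an equivalent sketch:
import numpy as np

df['new_c'] = np.where(df['a'] + df['b'] == 0, np.nan, df['c'])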
Another option is to build a new dataframe with the same shape and fill it row by row:
import numpy as np

new_df = df.copy()
for i, row in df.iterrows():
    new_df.loc[i, 'date'] = row['date']
    new_df.loc[i, 'a'] = row['a']
    new_df.loc[i, 'b'] = row['b']
    if row['a'] + row['b'] == 0:
        new_df.loc[i, 'c'] = np.nan
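That said, the same result comes cheaper without an explicit loop; a vectorized sketch (numpy as np, as above):
new_df = df.copy()
new_df.loc[new_df['a'] + new_df['b'] == 0, 'c'] = np.nan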
I have the following data:
one_dict = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four"}
two_dict = {0: "light", 1: "calc", 2: "line", 3: "blur", 4: "color"}
np.random.seed(2)
n = 15
a_df = pd.DataFrame(dict(a=np.random.randint(0, 4, n), b=np.random.randint(0, 3, n)))
a_df["c"] = np.nan
a_df = a_df.sort_values("b").reset_index(drop=True)
where the dataframe looks as:
In [45]: a_df
Out[45]:
a b c
0 3 0 NaN
1 1 0 NaN
2 0 0 NaN
3 2 0 NaN
4 3 0 NaN
5 1 0 NaN
6 2 1 NaN
7 2 1 NaN
8 3 1 NaN
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
I would like to replace values in c with those from dictionaries one_dict
and two_dict, with the result as follows:
In [45]: a_df
Out[45]:
a b c
0 3 0 three
1 1 0 one
2 0 0 zero
3 2 0 .
4 3 0 .
5 1 0 .
6 2 1 calc
7 2 1 calc
8 3 1 blur
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
Attempt
I'm not sure what a good approach to this would be though.
I thought that I might do something along the following lines:
merge_df = pd.DataFrame(dict(one = one_dict, two=two_dict)).reset_index()
merge_df['zeros'] = 0
merge_df['ones'] = 1
giving
In [62]: merge_df
Out[62]:
index one two zeros ones
0 0 zero light 0 1
1 1 one calc 0 1
2 2 two line 0 1
3 3 three blur 0 1
4 4 four color 0 1
Then merge this into the a_df, but I'm not sure how to merge in and update
at the same time, or if this is a good approach.
Edit
The dictionary keys correspond to the values of column a.
The . is just shorthand; those cells should be filled in with values like the others.
This is just a matter of creating a new dataframe with the correct structure and merging:
(a_df.drop('c', axis=1)
.merge(pd.DataFrame([one_dict,two_dict])
.rename_axis(index='b',columns='a')
.stack().reset_index(name='c'),
on=['a','b'],
how='left')
)
Output:
a b c
0 3 0 three
1 1 0 one
2 0 0 zero
3 2 0 two
4 3 0 three
5 1 0 one
6 2 1 line
7 2 1 line
8 3 1 blur
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
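To see why this works, it helps to print the long-format lookup table the inner expression builds (the name lookup is mine, just for illustration):
lookup = (pd.DataFrame([one_dict, two_dict])
            .rename_axis(index='b', columns='a')
            .stack()
            .reset_index(name='c'))
#    b  a      c
# 0  0  0   zero
# 1  0  1    one
# 2  0  2    two
# 3  0  3  three
# 4  0  4   four
# 5  1  0  light
# 6  1  1   calc
# 7  1  2   line
# 8  1  3   blur
# 9  1  4  color
Left-merging a_df onto this table on ['a', 'b'] then leaves the b == 2 rows as NaN, since neither dictionary covers them.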
Following is what my dataframe looks like. Expected_Output is my desired column:
Group Signal Ready Value Expected_Output
0 1 0 0 3 NaN
1 1 0 1 72 NaN
2 1 0 0 0 NaN
3 1 4 0 0 72.0
4 1 4 0 0 72.0
5 1 4 0 0 72.0
6 2 0 0 0 NaN
7 2 7 0 0 NaN
8 2 7 0 0 NaN
9 2 7 0 0 NaN
If Signal > 1, then I am trying to fetch the most recent non-zero Value in the previous rows within the Group where Ready = 1. So in row 3, Signal = 4, so I want to fetch the most recent non-zero Value of 72 from row 1 where Ready = 1.
Once I can fetch the value, I can do df.groupby(['Group','Signal']).Value.transform('first'), since Signals appear repeatedly like 444, but I'm not sure how to fetch the Value in the first place.
IIUC, groupby + ffill with a Boolean mask:
df['Help'] = df.Value.where(df.Ready == 1).replace(0, np.nan)
df['New'] = df.groupby('Group').Help.ffill()[df.Signal > 1]
df
Out[1006]:
Group Signal Ready Value Expected_Output Help New
0 1 0 0 3 NaN 3.0 NaN
1 1 0 1 72 NaN 72.0 NaN
2 1 0 0 0 NaN NaN NaN
3 1 4 0 0 72.0 NaN 72.0
4 1 4 0 0 72.0 NaN 72.0
5 1 4 0 0 72.0 NaN 72.0
6 2 0 0 0 NaN NaN NaN
7 2 7 0 0 NaN NaN NaN
8 2 7 0 0 NaN NaN NaN
9 2 7 0 0 NaN NaN NaN
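If Help was only scaffolding, it can be dropped once New is computed:
df = df.drop(columns='Help')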
Create a series via GroupBy + ffill, then mask the resultant series:
s = df.assign(Value_mask=df['Value'].where(df['Ready'].eq(1)))\
.groupby('Group')['Value_mask'].ffill()
df['Value'] = s.where(df['Signal'].gt(1))
Group Signal Ready Value
0 1 0 0 NaN
1 1 0 1 NaN
2 1 0 0 NaN
3 1 4 0 72.0
4 1 4 0 72.0
5 1 4 0 72.0
6 2 0 0 NaN
7 2 7 0 NaN
8 2 7 0 NaN
9 2 7 0 NaN
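An equivalent sketch starting from the original frame, skipping the intermediate assign and writing to a new column so Value stays intact:
# mask Value where Ready != 1, forward-fill within each Group, then keep only Signal > 1 rows
s = df['Value'].where(df['Ready'].eq(1)).groupby(df['Group']).ffill()
df['Expected_Output'] = s.where(df['Signal'].gt(1))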
I'm using pandas.
Input:
import pandas as pd
a=pd.Series([0,0,1,0,0,0,0])
output:
0 0
1 0
2 1
3 0
4 0
5 0
6 0
I want the value to carry over into the following rows:
output:
0 0
1 0
2 1
3 1
4 1
5 1
6 0
I can use
a + a.shift(1) + a.shift(2) + a.shift(3)
but I don't think this is a smart solution. Does anyone have a smarter one?
You can try this, assuming index 6 should be 1 too:
a=pd.Series([0,0,1,0,0,0,0])
a.eq(1).cumsum()
Out[19]:
0 0
1 0
2 1
3 1
4 1
5 1
6 1
dtype: int32
Updated: if there is more than one value not equal to 0:
a = pd.Series([0,0,1,0,1,3,0])
# a.ne(0).cumsum() labels each run starting at a non-zero value
A = pd.DataFrame({'a': a, 'Id': a.ne(0).cumsum()})
A.groupby('Id').a.cumsum()
Out[58]:
0 0
1 0
2 1
3 1
4 1
5 3
6 3
Or you can use ffill:
import numpy as np

a[a.eq(0)] = np.nan
a.ffill().fillna(0)
Out[64]:
0 0.0
1 0.0
2 1.0
3 1.0
4 1.0
5 3.0
6 3.0
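Note that routing through NaN upcasts the Series to float, which is why the output shows 0.0, 1.0 and 3.0; if integers are needed, cast back at the end (a sketch):
a.ffill().fillna(0).astype(int)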
1. Filter the series for your value (SearchValue).
2. Re-index the series to a stated length (LengthOfIndex) and forward fill your value a given number of times (LengthOfFillRange).
3. Fill the remaining NaNs with zeros again.
import pandas as pd
import numpy as np
a=pd.Series([0,0,1,0,0,0,0])
SearchValue = 1
LengthOfIndex = 7
LengthOfFillRange = 4
a = a[a == SearchValue]\
    .reindex(np.arange(LengthOfIndex),  # 0-based, matching the original index
             method='ffill',
             limit=LengthOfFillRange)\
    .fillna(0)
If you need to repeat a value into only a limited number of following rows, use replace to turn the 0s into NaN, then ffill (fillna with method='ffill') with a limit, and finally fillna to convert the remaining NaNs back to the original value (and if necessary convert back to int):
import numpy as np

a = pd.Series([0,0,1,0,0,0,0,1,0,0,0])
print (a)
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
dtype: int64
b = a.replace(0, np.nan).ffill(limit=2).fillna(0).astype(a.dtype)
print (b)
0 0
1 0
2 1
3 1
4 1
5 0
6 0
7 1
8 1
9 1
10 0
dtype: int64
I have a dataframe df:
AID JID CID
0 1 A NaN
1 1 A NaN
2 1 B NaN
3 1 NaN X
4 3 A NaN
5 4 NaN NaN
6 4 C X
7 5 C Y
8 5 C X
9 6 A NaN
10 6 B NaN
I want to calculate how many times each AID has used each JID or CID value.
The resulting dataframe should look like this, where the index is the AID values and the columns are the CID and JID values:
A B C X Y
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0
I know how to do it by looping and counting manually, but I was wondering what a more efficient way would be.
I'd melt and then use pivot_table:
In [80]: d2 = pd.melt(df, id_vars="AID")
In [81]: d2.pivot_table(index="AID", columns="value", values="variable",
aggfunc="count", fill_value=0)
Out[81]:
value A B C X Y
AID
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0
This works because melt "flattens" the dataframe into something where we can more easily access the values together, and pivot_table is for exactly the type of aggregation you have in mind:
In [90]: pd.melt(df, "AID")
Out[90]:
AID variable value
0 1 JID A
1 1 JID A
2 1 JID B
3 1 JID NaN
4 3 JID A
[... skipped]
17 4 CID X
18 5 CID Y
19 5 CID X
20 6 CID NaN
21 6 CID NaN
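From the melted frame, pd.crosstab is an equivalent shortcut, since it ignores the NaNs and fills missing combinations with 0 by default (a sketch):
pd.crosstab(d2["AID"], d2["value"])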
You can first create a Series with stack, then use groupby with value_counts, and finally reshape with unstack:
df = df.set_index('AID').stack().groupby(level=0).value_counts().unstack(1, fill_value=0)
print (df)
A B C X Y
AID
1 2 1 0 1 0
3 1 0 0 0 0
4 0 0 1 1 0
5 0 0 2 1 1
6 1 1 0 0 0