Python Pandas: populate a column until a different value appears?


I am working on an algo trading project using the pandas and numpy libraries and would like to achieve the following result:
Current output:
1
0
0
2
0
2
0
0
4
0
0
0
5
Desired output:
1
1
1
2
2
2
2
2
4
4
4
4
5
How do I go about this?

Replace 0 by NA then fill forward:
df['col1'] = df['col1'].replace(0, pd.NA).ffill()
print(df)
# Output
col1
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 4
9 4
10 4
11 4
12 5

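You can try the method argument of Series.replace. Note that this argument is deprecated as of pandas 2.1, so on recent versions prefer replacing the zeros with NA and forward-filling instead: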
df['col'] = df['col'].replace(to_replace=0, method='ffill')
print(df)
col
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 4
9 4
10 4
11 4
12 5
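For reference, here is a minimal self-contained sketch of the same idea that does not rely on the method argument. It assumes the column is called col, that 0 is the only placeholder value, and that the first entry is non-zero (otherwise the leading entries stay NaN and the final cast to int fails). The column name filled is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": [1, 0, 0, 2, 0, 2, 0, 0, 4, 0, 0, 0, 5]})

# Hide the zeros (they become NaN), carry the last seen value forward,
# then cast back to int because NaN forces a float column.
df["filled"] = df["col"].mask(df["col"].eq(0)).ffill().astype(int)

print(df["filled"].tolist())
```

Using mask instead of replace keeps the intermediate column numeric (float with NaN) rather than object dtype.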

Related

How to fill by counting last and forward N values with static window in pandas

I have calendar data with a flag for each day saying whether it is a holiday or not.
I want to create two new features:
feature_1: the number of holidays in the week that contains the day.
feature_2: the number of holidays within a window that extends N values to the left and to the right, where N includes the current value. In the example, N=5.
Example:
is_holiday feature_1 feature_2
idx
0 0 2 0
1 0 2 1
2 0 2 2
3 0 2 2
4 0 2 2
5 1 2 2
6 1 2 2
7 0 3 3
8 0 3 4
9 0 3 5
10 0 3 4
11 1 3 3
12 1 3 3
13 1 3 3
...

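I think you need to group every 7 rows and aggregate with sum for the first feature, and use Series.rolling with a centered window for the second. A centered window of 9 covers 4 values on each side plus the current one, which matches N=5: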
df['f1'] = df.groupby(df.index // 7)['is_holiday'].transform('sum')
df['f2'] = df['is_holiday'].rolling(9, center=True, min_periods=1).sum().astype(int)
print (df)
is_holiday feature_1 feature_2 f1 f2
idx
0 0 2 0 2 0
1 0 2 1 2 1
2 0 2 2 2 2
3 0 2 2 2 2
4 0 2 2 2 2
5 1 2 2 2 2
6 1 2 2 2 2
7 0 3 3 3 3
8 0 3 4 3 4
9 0 3 5 3 5
10 0 3 4 3 4
11 1 3 3 3 3
12 1 3 3 3 3
13 1 3 3 3 3
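The two features above can be sketched as a small reusable function. The function name holiday_features, its parameters n and week, and the output column names are hypothetical; the sketch assumes rows are already in calendar order and that each block of week consecutive rows is one week.

```python
import numpy as np
import pandas as pd

def holiday_features(is_holiday: pd.Series, n: int = 5, week: int = 7) -> pd.DataFrame:
    """Count holidays per block of `week` rows and per centered window."""
    out = pd.DataFrame({"is_holiday": is_holiday})
    # Positional week id: rows 0..6 -> 0, rows 7..13 -> 1, and so on
    block = np.arange(len(is_holiday)) // week
    out["per_week"] = is_holiday.groupby(block).transform("sum")
    # n values on each side, counting the current row once: 2*n - 1 in total
    out["in_window"] = (
        is_holiday.rolling(2 * n - 1, center=True, min_periods=1).sum().astype(int)
    )
    return out

flags = pd.Series([0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1])
features = holiday_features(flags)
print(features)
```

min_periods=1 lets the window shrink at both ends of the series instead of producing NaN there.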

Count the most frequent values in a row pandas and make a column with that most frequent value

I have a data frame like this below:
a b c
0 3 3 3
1 3 3 3
2 3 3 3
3 3 3 3
4 2 3 2
5 3 3 3
6 1 2 1
7 2 3 2
8 0 0 0
9 0 1 0
I want to find the most frequent value in each row and add a column result containing it, like this:
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0
I tried transposing and looping through the transposed columns to get the value_counts, but could not get the right result.
Any help is highly appreciated.

Use DataFrame.mode with axis=1, then select the first column by position with DataFrame.iloc:
df['result'] = df.mode(axis=1).iloc[:, 0]
print (df)
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0
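A self-contained version of the row-wise mode, using the data from the question, might look like the sketch below. The intermediate name modes is only illustrative. When a row has a tie, DataFrame.mode returns the tied values in sorted order across several columns, so taking column 0 picks the smallest of them; the sample data has no ties.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [3, 3, 3, 3, 2, 3, 1, 2, 0, 0],
    "b": [3, 3, 3, 3, 3, 3, 2, 3, 0, 1],
    "c": [3, 3, 3, 3, 2, 3, 1, 2, 0, 0],
})

# One row of modes per input row; extra columns appear only when rows tie
modes = df.mode(axis=1)

# Column 0 always holds a mode; cast in case NaN padding produced floats
df["result"] = modes.iloc[:, 0].astype(int)

print(df)
```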

Padding and reshaping pandas dataframe

I have a dataframe with the following form:
data = pd.DataFrame({'ID':[1,1,1,2,2,2,2,3,3],'Time':[0,1,2,0,1,2,3,0,1],
'sig':[2,3,1,4,2,0,2,3,5],'sig2':[9,2,8,0,4,5,1,1,0],
'group':['A','A','A','B','B','B','B','A','A']})
print(data)
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 2 0 4 0 B
4 2 1 2 4 B
5 2 2 0 5 B
6 2 3 2 1 B
7 3 0 3 1 A
8 3 1 5 0 A
I want to reshape and pad so that each 'ID' has the same number of Time values, the signal columns are padded with zeros (or the mean value within the ID), and group carries the same letter for every row of an ID. The output after padding would be:
data_pad = pd.DataFrame({'ID':[1,1,1,1,2,2,2,2,3,3,3,3],'Time':[0,1,2,3,0,1,2,3,0,1,2,3],
'sig1':[2,3,1,0,4,2,0,2,3,5,0,0],'sig2':[9,2,8,0,0,4,5,1,1,0,0,0],
'group':['A','A','A','A','B','B','B','B','A','A','A','A']})
print(data_pad)
ID Time sig1 sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
My end goal is to ultimately reshape this into something with shape (number of ID, number of time points, number of sequences {2 here}).
It seems that if I pivot data, it fills in with nan values, which is fine for the signal values, but not the groups. I am also hoping to avoid looping through data.groupby('ID'), since my actual data has a large number of groups and the looping would likely be very slow.

Here's one approach creating the new index with pd.MultiIndex.from_product and using it to reindex on the Time column:
df = data.set_index(['ID', 'Time'])
# define the new index
ix = pd.MultiIndex.from_product([df.index.levels[0],
df.index.levels[1]],
names=['ID', 'Time'])
# reindex using the above multiindex
df = df.reindex(ix, fill_value=0)
# forward fill the missing values in group
df['group'] = df.group.mask(df.group.eq(0)).ffill()
print(df.reset_index())
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A

IIUC:
(data.pivot_table(columns='Time', index=['ID','group'], fill_value=0)
.stack('Time')
.sort_index(level=['ID','Time'])
.reset_index()
)
Output:
ID group Time sig sig2
0 1 A 0 2 9
1 1 A 1 3 2
2 1 A 2 1 8
3 1 A 3 0 0
4 2 B 0 4 0
5 2 B 1 2 4
6 2 B 2 0 5
7 2 B 3 2 1
8 3 A 0 3 1
9 3 A 1 5 0
10 3 A 2 0 0
11 3 A 3 0 0
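To get from the padded frame to the 3-D array mentioned in the question, one possible sketch is shown below. It assumes zero padding, one group label per ID, and the column names from the question; the names ids, times, padded, arr and groups are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2, 2, 3, 3],
    "Time": [0, 1, 2, 0, 1, 2, 3, 0, 1],
    "sig": [2, 3, 1, 4, 2, 0, 2, 3, 5],
    "sig2": [9, 2, 8, 0, 4, 5, 1, 1, 0],
    "group": ["A", "A", "A", "B", "B", "B", "B", "A", "A"],
})

ids = sorted(data["ID"].unique())
times = sorted(data["Time"].unique())

# Every (ID, Time) combination, ordered ID-major so a plain reshape works
full_index = pd.MultiIndex.from_product([ids, times], names=["ID", "Time"])
padded = (
    data.set_index(["ID", "Time"])[["sig", "sig2"]]
    .reindex(full_index, fill_value=0)
)

# Shape: (number of IDs, number of time points, number of signals)
arr = padded.to_numpy().reshape(len(ids), len(times), 2)

# Keep the group label separately, one entry per ID
groups = data.groupby("ID")["group"].first()

print(arr.shape)
```

Keeping group out of the numeric array avoids mixing strings and numbers in one ndarray.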

Filling in sequence previous values based on current value in pandas

I have a pandas data frame which looks like below:
ID Value
1 2
2 6
3 3
4 5
I want a new dataframe which gives
ID Value
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 1
3 2
3 3
3 4
Any kind of suggestions would be appreciated.

Use reindex with index.repeat to duplicate each row Value+1 times, then groupby cumcount to number the copies:
df.reindex(df.index.repeat(df.Value+1)).assign(Value=lambda x : x.groupby('ID').cumcount())
Out[611]:
ID Value
0 1 0
0 1 1
0 1 2
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
2 3 0
2 3 1
2 3 2
2 3 3
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
3 4 5

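Try:
# assumes one row per ID; drop() no longer accepts the axis positionally in pandas 2.0+
new_df = (df.groupby('ID').Value
.apply(lambda x: pd.Series(np.arange(x.iloc[0] + 1)))
.reset_index().drop(columns='level_1'))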
ID Value
0 1 0
1 1 1
2 1 2
3 2 0
4 2 1
5 2 2
6 2 3
7 2 4
8 2 5
9 2 6
10 3 0
11 3 1
12 3 2
13 3 3
14 4 0
15 4 1
16 4 2
17 4 3
18 4 4
19 4 5

Using stack and a list comprehension:
vals = [np.arange(i+1) for i in df.Value]
(pd.DataFrame(vals, index=df.ID)
.stack().reset_index(1, drop=True).astype(int).to_frame('Value'))
Value
ID
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
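Another way to express the same expansion, not taken from the answers above, is to build a list per row and use DataFrame.explode (available since pandas 0.25). This is only a sketch on the question's sample data; the name expanded is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ID": [1, 2, 3, 4], "Value": [2, 6, 3, 5]})

expanded = (
    # Replace each Value v with the list [0, 1, ..., v]
    df.assign(Value=df["Value"].apply(lambda v: list(range(v + 1))))
    # One output row per list element; ID is repeated alongside
    .explode("Value")
    # explode leaves an object column, so restore the integer dtype
    .astype({"Value": int})
    .reset_index(drop=True)
)

print(expanded)
```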

Transform dataframe to have a row for every observation at every time point

I have the following short dataframe:
A B C
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
I want the output to look like this:
A B C
1 1 3
2 1 3
3 0 0
4 0 0
5 0 0
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0

Use pd.MultiIndex.from_product with the unique values of A and B, then reindex:
cols = list('AB')
mux = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=cols)
df.set_index(cols).reindex(mux, fill_value=0).reset_index()
A B C
0 1 1 3
1 1 2 0
2 1 0 0
3 2 1 3
4 2 2 0
5 2 0 0
6 3 1 0
7 3 2 3
8 3 0 0
9 4 1 0
10 4 2 3
11 4 0 0
12 5 1 0
13 5 2 0
14 5 0 0
