I have calendar data describing the type of each day: whether the day is a holiday or not.
I want to create two new features:
feature_1: the number of holidays in the week.
feature_2: the number of holidays in an N-window to the left and to the right of the current row (including the current value). In the example, N=5.
Example:
is_holiday feature_1 feature_2
idx
0 0 2 0
1 0 2 1
2 0 2 2
3 0 2 2
4 0 2 2
5 1 2 2
6 1 2 2
7 0 3 3
8 0 3 4
9 0 3 5
10 0 3 4
11 1 3 3
12 1 3 3
13 1 3 3
...
I think you need to group every 7 values and aggregate with sum for the first feature, and use Series.rolling for the second:
df['f1'] = df.groupby(df.index // 7)['is_holiday'].transform('sum')
df['f2'] = df['is_holiday'].rolling(9, center=True, min_periods=1).sum().astype(int)
print (df)
is_holiday feature_1 feature_2 f1 f2
idx
0 0 2 0 2 0
1 0 2 1 2 1
2 0 2 2 2 2
3 0 2 2 2 2
4 0 2 2 2 2
5 1 2 2 2 2
6 1 2 2 2 2
7 0 3 3 3 3
8 0 3 4 3 4
9 0 3 5 3 5
10 0 3 4 3 4
11 1 3 3 3 3
12 1 3 3 3 3
13 1 3 3 3 3
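The window size of 9 follows from N=5: four rows to the left, the current row, and four rows to the right, i.e. 2 * (N - 1) + 1. A minimal self-contained sketch of both features, assuming the sample values from the question and assuming the data starts on the first day of a week (the variable names N and window are only illustrative):

```python
import pandas as pd

# Sample values copied from the question's is_holiday column
is_holiday = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1]
df = pd.DataFrame({"is_holiday": is_holiday})

N = 5                     # rows on each side, counting the current row
window = 2 * (N - 1) + 1  # 4 left + current + 4 right = 9

# Holidays per block of 7 consecutive rows (assumes row 0 is the first day of a week)
df["f1"] = df.groupby(df.index // 7)["is_holiday"].transform("sum")

# Holidays in the centered window; min_periods=1 lets the edges use a shorter window
df["f2"] = (
    df["is_holiday"]
    .rolling(window, center=True, min_periods=1)
    .sum()
    .astype(int)
)
print(df)
```

If the index were a DatetimeIndex rather than a plain range, grouping by calendar week instead of index // 7 would probably be the safer choice.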
I have a data frame like this below:
a b c
0 3 3 3
1 3 3 3
2 3 3 3
3 3 3 3
4 2 3 2
5 3 3 3
6 1 2 1
7 2 3 2
8 0 0 0
9 0 1 0
I want to count the frequency of each value within a row and add a column result containing the most frequent value, like this:
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0
I tried transposing and looping through the transposed columns to get the value_counts, but could not get the right result.
Any help is highly appreciated.
Use DataFrame.mode and select the first column by position with DataFrame.iloc:
df['result'] = df.mode(axis=1).iloc[:, 0]
print (df)
a b c result
0 3 3 3 3
1 3 3 3 3
2 3 3 3 3
3 3 3 3 3
4 2 3 2 2
5 3 3 3 3
6 1 2 1 1
7 2 3 2 2
8 0 0 0 0
9 0 1 0 0
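One thing worth knowing about this approach: when a row has tied values, DataFrame.mode returns one column per tied mode (padded with NaN), and the first column holds the smallest of them. A small sketch with a hypothetical frame whose last row is a three-way tie:

```python
import pandas as pd

# Hypothetical frame: the last row has three distinct values, so every value ties
df = pd.DataFrame({"a": [3, 2, 1, 0], "b": [3, 3, 2, 1], "c": [3, 2, 1, 2]})

# One column per tied mode; rows with fewer modes are padded with NaN
modes = df.mode(axis=1)
print(modes)

# The first column is the smallest mode of each row; cast back to int
# because the NaN padding can turn the mode frame into floats
df["result"] = modes.iloc[:, 0].astype(int)
print(df)
```

If ties should be resolved differently (for example, by taking the largest mode), modes.max(axis=1) would be one option.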
I have a dataframe with the following form:
data = pd.DataFrame({'ID':[1,1,1,2,2,2,2,3,3],'Time':[0,1,2,0,1,2,3,0,1],
'sig':[2,3,1,4,2,0,2,3,5],'sig2':[9,2,8,0,4,5,1,1,0],
'group':['A','A','A','B','B','B','B','A','A']})
print(data)
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 2 0 4 0 B
4 2 1 2 4 B
5 2 2 0 5 B
6 2 3 2 1 B
7 3 0 3 1 A
8 3 1 5 0 A
I want to reshape and pad the data so that each 'ID' has the same number of Time values, sig1 and sig2 are padded with zeros (or with the mean value within the ID), and the group carries the same letter value. The output after padding would be:
data_pad = pd.DataFrame({'ID':[1,1,1,1,2,2,2,2,3,3,3,3],'Time':[0,1,2,3,0,1,2,3,0,1,2,3],
'sig1':[2,3,1,0,4,2,0,2,3,5,0,0],'sig2':[9,2,8,0,0,4,5,1,1,0,0,0],
'group':['A','A','A','A','B','B','B','B','A','A','A','A']})
print(data_pad)
ID Time sig1 sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
My end goal is to ultimately reshape this into something with shape (number of ID, number of time points, number of sequences {2 here}).
It seems that if I pivot data, it fills in with nan values, which is fine for the signal values, but not the groups. I am also hoping to avoid looping through data.groupby('ID'), since my actual data has a large number of groups and the looping would likely be very slow.
Here's one approach: create the new index with pd.MultiIndex.from_product and use it to reindex the frame on ID and Time:
df = data.set_index(['ID', 'Time'])
# define the new index
ix = pd.MultiIndex.from_product([df.index.levels[0],
df.index.levels[1]],
names=['ID', 'Time'])
# reindex using the above multiindex
df = df.reindex(ix, fill_value=0)
# forward fill the missing values in group
df['group'] = df.group.mask(df.group.eq(0)).ffill()
print(df.reset_index())
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
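The question also mentions padding with the mean value within each ID instead of zeros. A possible variant of the reindex approach, sketched under the assumption that each ID has exactly one group label: reindex without fill_value so the new rows are NaN, then fill the signals with per-ID means and the group with the ID's own label.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
                     'Time': [0, 1, 2, 0, 1, 2, 3, 0, 1],
                     'sig': [2, 3, 1, 4, 2, 0, 2, 3, 5],
                     'sig2': [9, 2, 8, 0, 4, 5, 1, 1, 0],
                     'group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A']})

df = data.set_index(['ID', 'Time'])
ix = pd.MultiIndex.from_product(df.index.levels, names=['ID', 'Time'])

# Without fill_value, the added rows are NaN in every column
df = df.reindex(ix)

# group: broadcast each ID's first non-null label to all of its rows
df['group'] = df.groupby(level='ID')['group'].transform('first')

# signals: replace the NaN padding with the mean of the observed values per ID
sig_cols = ['sig', 'sig2']
df[sig_cols] = df[sig_cols].fillna(df.groupby(level='ID')[sig_cols].transform('mean'))

print(df.reset_index())
```

Filling group per ID rather than with a global forward fill also covers the case where the missing row is the first Time value of an ID.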
IIUC:
(data.pivot_table(columns='Time', index=['ID','group'], fill_value=0)
.stack('Time')
.sort_index(level=['ID','Time'])
.reset_index()
)
Output:
ID group Time sig sig2
0 1 A 0 2 9
1 1 A 1 3 2
2 1 A 2 1 8
3 1 A 3 0 0
4 2 B 0 4 0
5 2 B 1 2 4
6 2 B 2 0 5
7 2 B 3 2 1
8 3 A 0 3 1
9 3 A 1 5 0
10 3 A 2 0 0
11 3 A 3 0 0
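For the stated end goal, an array of shape (number of IDs, number of time points, number of signals), the padded signal columns can be reshaped directly, because a from_product index is ordered with ID as the outer level and Time as the inner one. A minimal sketch, assuming zero padding and the two signal columns sig and sig2:

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
                     'Time': [0, 1, 2, 0, 1, 2, 3, 0, 1],
                     'sig': [2, 3, 1, 4, 2, 0, 2, 3, 5],
                     'sig2': [9, 2, 8, 0, 4, 5, 1, 1, 0],
                     'group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A']})

df = data.set_index(['ID', 'Time'])
ix = pd.MultiIndex.from_product(df.index.levels, names=['ID', 'Time'])

# Pad only the numeric signal columns with zeros
signals = df[['sig', 'sig2']].reindex(ix, fill_value=0)

n_ids = len(ix.levels[0])
n_times = len(ix.levels[1])

# Rows are ordered ID-major, Time-minor, so a plain reshape lines up correctly
arr = signals.to_numpy().reshape(n_ids, n_times, signals.shape[1])
print(arr.shape)
```

The group labels are left out of the array on purpose; they could be kept separately as one label per ID.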
I have a pandas data frame which looks like below:
ID Value
1 2
2 6
3 3
4 5
I want a new dataframe which gives
ID Value
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 1
3 2
3 3
3 4
Any kind of suggestions would be appreciated.
Use reindex with repeat to duplicate the rows, then cumcount to get the updated values:
df.reindex(df.index.repeat(df.Value+1)).assign(Value=lambda x : x.groupby('ID').cumcount())
Out[611]:
ID Value
0 1 0
0 1 1
0 1 2
1 2 0
1 2 1
1 2 2
1 2 3
1 2 4
1 2 5
1 2 6
2 3 0
2 3 1
2 3 2
2 3 3
3 4 0
3 4 1
3 4 2
3 4 3
3 4 4
3 4 5
Try,
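# each ID group holds a single Value, so take it as a scalar before building the range
new_df = (df.groupby('ID').Value.apply(lambda x: pd.Series(np.arange(x.iloc[0] + 1)))
.reset_index().drop(columns='level_1'))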
ID Value
0 1 0
1 1 1
2 1 2
3 2 0
4 2 1
5 2 2
6 2 3
7 2 4
8 2 5
9 2 6
10 3 0
11 3 1
12 3 2
13 3 3
14 4 0
15 4 1
16 4 2
17 4 3
18 4 4
19 4 5
Using stack and a list comprehension:
vals = [np.arange(i+1) for i in df.Value]
# dropna() removes the NaN padding of the shorter rows in case stack() keeps it
(pd.DataFrame(vals, index=df.ID)
.stack().dropna().reset_index(1, drop=True).astype(int).to_frame('Value'))
Value
ID
1 0
1 1
1 2
2 0
2 1
2 2
2 3
2 4
2 5
2 6
3 0
3 1
3 2
3 3
4 0
4 1
4 2
4 3
4 4
4 5
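Another option, not taken from the answers above, is DataFrame.explode: build the list 0..Value for each row and let explode turn every list element into its own row. A sketch using the question's input:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4], 'Value': [2, 6, 3, 5]})

out = (
    # replace each Value with the list [0, 1, ..., Value]
    df.assign(Value=df['Value'].map(lambda v: list(range(v + 1))))
      # one row per list element; the ID is repeated automatically
      .explode('Value')
      # explode leaves an object column, so cast back to int
      .astype({'Value': int})
      .reset_index(drop=True)
)
print(out)
```

This keeps any other columns of the original frame alongside ID without extra work.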
I have the following short dataframe:
A B C
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
I want the output to look like this:
A B C
1 1 3
2 1 3
3 0 0
4 0 0
5 0 0
1 1 3
2 1 3
3 2 3
4 2 3
5 0 0
Use pd.MultiIndex.from_product with the unique values of A and B, then reindex.
cols = list('AB')
mux = pd.MultiIndex.from_product([df.A.unique(), df.B.unique()], names=cols)
df.set_index(cols).reindex(mux, fill_value=0).reset_index()
A B C
0 1 1 3
1 1 2 0
2 1 0 0
3 2 1 3
4 2 2 0
5 2 0 0
6 3 1 0
7 3 2 3
8 3 0 0
9 4 1 0
10 4 2 3
11 4 0 0
12 5 1 0
13 5 2 0
14 5 0 0
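Note that reindex needs the (A, B) pairs to be unique; with duplicate pairs it raises an error. A self-contained sketch of the same idea with that check added, assuming the input frame from the question:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                   'B': [1, 1, 2, 2, 0],
                   'C': [3, 3, 3, 3, 0]})

cols = ['A', 'B']

# reindex cannot handle duplicate index labels, so check the key pairs first
if df.duplicated(cols).any():
    raise ValueError('duplicate (A, B) pairs; aggregate them before reindexing')

# every combination of the observed A and B values
mux = pd.MultiIndex.from_product([df['A'].unique(), df['B'].unique()], names=cols)

# combinations missing from the data get C = 0
out = df.set_index(cols).reindex(mux, fill_value=0).reset_index()
print(out)
```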