python pandas shift next rows for values - python

I use pandas:
input:
import pandas as pd
a=pd.Series([0,0,1,0,0,0,0])
output:
0 0
1 0
2 1
3 0
4 0
5 0
6 0
I want the 1 to be carried forward into the next rows (the 1 itself plus the three rows after it), so that the output becomes:
0 0
1 0
2 1
3 1
4 1
5 1
6 0
I currently use:
a+a.shift(1)+a.shift(2)+a.shift(3)
but I don't think this is a smart solution. Does anyone have a smarter one?

You can try this, assuming index 6 should also be 1:
a=pd.Series([0,0,1,0,0,0,0])
a.eq(1).cumsum()
Out[19]:
0 0
1 0
2 1
3 1
4 1
5 1
6 1
dtype: int32
Update: if there is more than one nonzero value, group by the running count of nonzero values and take the cumulative sum within each group:
a=pd.Series([0,0,1,0,1,3,0])
a.ne(0).cumsum()
A=pd.DataFrame({'a':a,'Id':a.ne(0).cumsum()})
A.groupby('Id').a.cumsum()
Out[58]:
0 0
1 0
2 1
3 1
4 1
5 3
6 3
Or you can use ffill: mask the zeros as NaN, forward fill, then turn the leading NaNs back into 0:
a.mask(a.eq(0)).ffill().fillna(0)
Out[64]:
0 0.0
1 0.0
2 1.0
3 1.0
4 1.0
5 3.0
6 3.0
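Because each block defined by a.ne(0).cumsum() starts with its nonzero value (only the block of leading zeros starts with 0), the grouped cumulative sum in the update above should be equivalent to broadcasting the first value of each block. A minimal sketch of that variant, assuming the Series contains no NaNs:

```python
import pandas as pd

a = pd.Series([0, 0, 1, 0, 1, 3, 0])

# Each nonzero value opens a new block; the zeros after it belong to that block.
# Block 0 holds any leading zeros.
blocks = a.ne(0).cumsum()

# The first element of each block is the nonzero value that opened it
# (or a leading 0 for block 0), so broadcasting it forward-fills the values.
out = a.groupby(blocks).transform('first')
print(out.tolist())  # [0, 0, 1, 1, 1, 3, 3]
```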

1 Filter the Series for the value you are looking for (SearchValue).
2 Reindex the filtered Series to the full index of length LengthOfIndex and forward fill that value a given number of times (LengthOfFillRange).
3 Fill the remaining gaps with zeros again.
import pandas as pd
import numpy as np
a=pd.Series([0,0,1,0,0,0,0])
SearchValue = 1
LengthOfIndex = 7
LengthOfFillRange = 3  # number of rows after each match to fill
a=a[a==SearchValue]\
.reindex(np.arange(LengthOfIndex),  # assumes a 0-based index of length LengthOfIndex
method='ffill',
limit=LengthOfFillRange)\
.fillna(0)

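If the Series holds only two values (0 and 1) and you need to repeat the 1 for a limited number of rows, replace 0 with NaN, forward fill with a limit using ffill, then use fillna(0) to turn the remaining NaNs back into the original zeros (and convert back to the original dtype if necessary):
import numpy as np
import pandas as pd
a=pd.Series([0,0,1,0,0,0,0,1,0,0,0])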
print (a)
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 1
8 0
9 0
10 0
dtype: int64
b= a.replace(0,np.nan).ffill(limit=2).fillna(0).astype(a.dtype)
print (b)
0 0
1 0
2 1
3 1
4 1
5 0
6 0
7 1
8 1
9 1
10 0
dtype: int64
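One more option, sketched here as an alternative rather than taken from the answers above: a rolling maximum over the current row and the n_after rows before it. It assumes a 0/1 Series (with larger values, overlapping windows keep the maximum), and n_after is a hypothetical parameter name for how many rows after each 1 should also become 1:

```python
import pandas as pd

a = pd.Series([0, 0, 1, 0, 0, 0, 0])

# Hypothetical parameter: rows after each 1 that should also be 1.
n_after = 3

# Each row takes the max over itself and the n_after rows above it,
# so it becomes 1 if a 1 occurred in that window.
out = a.rolling(n_after + 1, min_periods=1).max().astype(a.dtype)
print(out.tolist())  # [0, 0, 1, 1, 1, 1, 0]
```

With n_after = 2 on the longer Series from the previous answer, this should reproduce the replace/ffill output.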

Related

rolling most recent index where a value occurred

I have a dataframe
pd.DataFrame([1,2,3,4,1,2,3])
0
0 1
1 2
2 3
3 4
4 1
5 2
6 3
I want to create another column that records the most recent index at which the value 1 occurred:
d={'data':[1,2,3,4,1,2,3], 'desired_new_col': [0,0,0,0,4,4,4]}
pd.DataFrame(d)
data desired_new_col
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4
I have some idea of using df.expanding().apply(func), but I'm not sure what an appropriate function for this would be.
Thanks
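Using a mask on the index and ffill (the downcast argument of ffill is deprecated in recent pandas, so convert the dtype explicitly instead):
df = pd.DataFrame({'data': [1,2,3,4,1,2,3]})
df['new'] = (df.index.to_series()
.where(df['data'].eq(1))  # keep the index label only where data == 1
.ffill()                  # carry the latest such label forward
.astype('Int64')          # back to integers; rows before the first 1 would be <NA>
)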
Output:
data new
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4
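If you prefer to stay in NumPy, a running maximum of row positions gives the same column. A minimal sketch, assuming a default RangeIndex (so positions equal index labels) and using -1 as a hypothetical sentinel for rows before the first 1:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'data': [1, 2, 3, 4, 1, 2, 3]})

# Row position where data == 1, and the sentinel -1 everywhere else.
positions = np.where(df['data'].eq(1), np.arange(len(df)), -1)

# A running maximum carries the latest matching position forward
# until the next 1 appears.
df['new'] = np.maximum.accumulate(positions)
print(df['new'].tolist())  # [0, 0, 0, 0, 4, 4, 4]
```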
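You can take a cumsum of the boolean mask to split the rows into sub-groups that each start at a 1, then group by it and use transform('idxmax'), which returns the index of the first True (the 1) in each group. Rows before the first 1 would form a group with no True, and idxmax would return that group's first index, so this assumes the data starts with a 1, as it does here:
s = df['data'].eq(1)
df['out'] = s.groupby(s.cumsum()).transform('idxmax')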
Out[293]:
0 0
1 0
2 0
3 0
4 4
5 4
6 4
Name: data, dtype: int64
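You can also do this with a list comprehension. Note that it only uses the last occurrence of 1 (index 4) and sets every earlier row to 0, so it matches the desired output here only because the other 1 is at index 0: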
idx = [i for i in df.index if df[0][i] == 1][-1]
df['desired_new_col'] = [idx if idx <= df.index[i] else 0 for i in df.index]
Output:
df
0 desired_new_col
0 1 0
1 2 0
2 3 0
3 4 0
4 1 4
5 2 4
6 3 4

Modify time column to exclusive and inclusive time in Pandas DataFrame

I have the following DataFrame of individuals and the time of an event.
id time
1 0
2 0
3 0
4 0
2 1
3 1
1 2
4 2
1 3
2 3
1 4
2 4
3 4
4 4
I want a column of left-exclusive time points (start: the time of the previous event for the same id). The right-inclusive time points (stop) are simply the time column.
id start stop
1 0 0
2 0 0
3 0 0
4 0 0
2 0 1
3 0 1
1 0 2
4 0 2
1 2 3
2 1 3
1 3 4
2 3 4
3 1 4
4 2 4
Is there a straightforward function that accomplishes this?
Use DataFrameGroupBy.shift inside DataFrame.insert to add the new column in the second position, then rename the time column:
df.insert(1, 'start', df.groupby('id')['time'].shift(fill_value=0))
df = df.rename(columns={'time':'stop'})
print (df)
id start stop
0 1 0 0
1 2 0 0
2 3 0 0
3 4 0 0
4 2 0 1
5 3 0 1
6 1 0 2
7 4 0 2
8 1 2 3
9 2 1 3
10 1 3 4
11 2 3 4
12 3 1 4
13 4 2 4
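The shift relies on the rows being in chronological order within each id. A self-contained sketch that rebuilds the question's data, sorts by time first with a stable sort (an added step, assumed harmless here because the data is already ordered), and then applies the same insert and rename:

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 2, 3, 4, 2, 3, 1, 4, 1, 2, 1, 2, 3, 4],
    'time': [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4],
})

# Make sure events are in time order; 'stable' keeps the original order of ties.
df = df.sort_values('time', kind='stable')

# Previous event time of the same id; the first event of each id gets 0.
df.insert(1, 'start', df.groupby('id')['time'].shift(fill_value=0))
df = df.rename(columns={'time': 'stop'})
print(df)
```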
To get the previous value of every id, you want to group by 'id' and retrieve the previous value by using shift as your new column 'start':
df['start'] = df.groupby('id').time.shift(1, fill_value=0)
id time start
0 1 0 0
1 2 0 0
2 3 0 0
3 4 0 0
4 2 1 0
5 3 1 0
6 1 2 0
7 4 2 0
8 1 3 2
9 2 3 1
10 1 4 3
11 2 4 3
12 3 4 1
13 4 4 2
Then you might want to rename your 'time' column to 'end':
df.rename({'time':'end'}, axis=1, inplace=True)
If you want start to come before end, reorder the columns like this:
df = df[['id', 'start', 'end']]

how to add a DataFrame to some columns of another DataFrame

I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column). So some columns (load profiles) of b should be overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
P[kW]
0 0
1 0
2 0
3 8
4 8
5 0
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 4 4
4 2 2 2
5 2 2 2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
but what I actually get:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 nan 2
1 3 nan 3
2 3 nan 3
3 4 nan 4
4 2 nan 2
5 2 nan 2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses, what could be the problem?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
load = [0,0,0,0,0,0,0]
data = pd.DataFrame(load)
for i in range(1, 10):
data[i] = data[0]
data
overlay = pd.DataFrame([0,0,0,0,6,6,0])
overlay
data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are nans. What am I missing?
That is what I would expect the result to look like:
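Pass the column position 1 of dataframe 'b' as a scalar, not as a list. b.iloc[:, 1] is a Series, so the addition aligns on the row index as expected; b.iloc[:, [1]] is a DataFrame, and adding a Series to a DataFrame aligns the Series index with the DataFrame's columns instead, which produces NaN.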
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
P1[kW] P2[kW] Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
Edit
It seems this is what we are looking for, i.e. adding the overlay df to certain columns of the data df.
Two options:
Option 1
cols=[1,2,4,5,7,8]
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols=[1,2,4,5,7,8]
data[cols] = data.iloc[:,cols] + overlay.iloc[:].values
data
Output
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 6 6 0 6 6 0 6 6 0
5 0 6 6 0 6 6 0 6 6 0
6 0 0 0 0 0 0 0 0 0 0
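The "crazy" result in the EDIT comes from label alignment: adding a Series to a DataFrame matches the Series index against the DataFrame columns, not its rows. A minimal sketch that reproduces this with data mirroring the question's EDIT example, then uses DataFrame.add with axis=0, which aligns the Series on the rows instead:

```python
import pandas as pd

data = pd.DataFrame({i: [0] * 7 for i in range(10)})
overlay = pd.DataFrame([0, 0, 0, 0, 6, 6, 0])
cols = [1, 2, 4, 5, 7, 8]

# DataFrame + Series aligns the Series index (row labels 0..6) with the
# DataFrame columns: column 4 gets overlay row 4 (== 6) in every row,
# while columns 7 and 8 have no matching label and become NaN.
wrong = data.iloc[:, cols] + overlay.iloc[:, 0]
print(wrong)

# axis=0 aligns the Series with the row index instead.
data[cols] = data[cols].add(overlay[0], axis=0)
print(data)
```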

Padding and reshaping pandas dataframe

I have a dataframe with the following form:
data = pd.DataFrame({'ID':[1,1,1,2,2,2,2,3,3],'Time':[0,1,2,0,1,2,3,0,1],
'sig':[2,3,1,4,2,0,2,3,5],'sig2':[9,2,8,0,4,5,1,1,0],
'group':['A','A','A','B','B','B','B','A','A']})
print(data)
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 2 0 4 0 B
4 2 1 2 4 B
5 2 2 0 5 B
6 2 3 2 1 B
7 3 0 3 1 A
8 3 1 5 0 A
I want to reshape and pad such that each 'ID' has the same number of Time values, the signal columns are padded with zeros (or the mean value within the ID), and group keeps the same letter value. The output after padding would be:
data_pad = pd.DataFrame({'ID':[1,1,1,1,2,2,2,2,3,3,3,3],'Time':[0,1,2,3,0,1,2,3,0,1,2,3],
'sig1':[2,3,1,0,4,2,0,2,3,5,0,0],'sig2':[9,2,8,0,0,4,5,1,1,0,0,0],
'group':['A','A','A','A','B','B','B','B','A','A','A','A']})
print(data_pad)
ID Time sig1 sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
My end goal is to ultimately reshape this into something with shape (number of ID, number of time points, number of sequences {2 here}).
It seems that if I pivot data, it fills in with nan values, which is fine for the signal values, but not the groups. I am also hoping to avoid looping through data.groupby('ID'), since my actual data has a large number of groups and the looping would likely be very slow.
Here's one approach creating the new index with pd.MultiIndex.from_product and using it to reindex on the Time column:
df = data.set_index(['ID', 'Time'])
# define the new index
ix = pd.MultiIndex.from_product([df.index.levels[0],
df.index.levels[1]],
names=['ID', 'Time'])
# reindex using the above multiindex
df = df.reindex(ix, fill_value=0)
# forward fill the missing values in group
df['group'] = df.group.mask(df.group.eq(0)).ffill()
print(df.reset_index())
ID Time sig sig2 group
0 1 0 2 9 A
1 1 1 3 2 A
2 1 2 1 8 A
3 1 3 0 0 A
4 2 0 4 0 B
5 2 1 2 4 B
6 2 2 0 5 B
7 2 3 2 1 B
8 3 0 3 1 A
9 3 1 5 0 A
10 3 2 0 0 A
11 3 3 0 0 A
IIUC:
(data.pivot_table(columns='Time', index=['ID','group'], fill_value=0)
.stack('Time')
.sort_index(level=['ID','Time'])
.reset_index()
)
Output:
ID group Time sig sig2
0 1 A 0 2 9
1 1 A 1 3 2
2 1 A 2 1 8
3 1 A 3 0 0
4 2 B 0 4 0
5 2 B 1 2 4
6 2 B 2 0 5
7 2 B 3 2 1
8 3 A 0 3 1
9 3 A 1 5 0
10 3 A 2 0 0
11 3 A 3 0 0
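For the end goal of an array shaped (number of IDs, number of time points, number of signals), here is a sketch that builds on the MultiIndex reindex from the first answer. It assumes group is constant within each ID, so the labels are kept in a separate per-ID Series rather than in the numeric array:

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 1, 1, 2, 2, 2, 2, 3, 3],
                     'Time': [0, 1, 2, 0, 1, 2, 3, 0, 1],
                     'sig': [2, 3, 1, 4, 2, 0, 2, 3, 5],
                     'sig2': [9, 2, 8, 0, 4, 5, 1, 1, 0],
                     'group': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'A']})

df = data.set_index(['ID', 'Time'])
ix = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]],
                                names=['ID', 'Time'])

# Pad only the signal columns with zeros.
signals = df[['sig', 'sig2']].reindex(ix, fill_value=0)

# from_product orders rows by ID, then Time, so a row-major reshape gives
# (number of IDs, number of time points, number of signals).
n_id, n_time = len(ix.levels[0]), len(ix.levels[1])
arr = signals.to_numpy().reshape(n_id, n_time, signals.shape[1])

# One group label per ID (assumes group is constant within an ID).
groups = data.groupby('ID')['group'].first()
print(arr.shape)  # (3, 4, 2)
```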

python - Converting pandas Matrix to DataFrame

I have created a matrix:
items = [0, 1, 2, 3]
item_to_item = pd.DataFrame(index=items, columns=items)
I've put values in it so:
It's symmetric
Its diagonal is all 0s
for example:
0 1 2 3
0 0 4 5 9
1 4 0 3 7
2 5 3 0 3
3 9 7 3 0
I want to create a DataFrame of all possible pairs (from [0, 1, 2, 3]) such that there are no pairs of the form (x, x), and if (x, y) is included I don't want (y, x), because the matrix is symmetric and both hold the same value.
In the end I will have the following Dataframe (or numpy 2d array)
item, item, value
0 1 4
0 2 5
0 3 9
1 2 3
1 3 7
2 3 3
Here's a NumPy solution with np.triu_indices -
In [453]: item_to_item
Out[453]:
0 1 2 3
0 0 4 5 9
1 4 0 3 7
2 5 3 0 3
3 9 7 3 0
In [454]: r,c = np.triu_indices(len(items),1)
In [455]: pd.DataFrame(np.column_stack((r,c, item_to_item.values[r,c])))
Out[455]:
0 1 2
0 0 1 4
1 0 2 5
2 0 3 9
3 1 2 3
4 1 3 7
5 2 3 3
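The answer above uses the positions r and c directly as item labels, which works because items is [0, 1, 2, 3]. A sketch that maps the positions back to the index and column labels instead, using hypothetical string labels and hypothetical output column names item1, item2 and value:

```python
import numpy as np
import pandas as pd

# Hypothetical labels, to show the approach does not rely on labels 0..n-1.
items = ['a', 'b', 'c', 'd']
item_to_item = pd.DataFrame([[0, 4, 5, 9],
                             [4, 0, 3, 7],
                             [5, 3, 0, 3],
                             [9, 7, 3, 0]], index=items, columns=items)

# Positions strictly above the diagonal (k=1 skips the (x, x) pairs).
r, c = np.triu_indices(len(item_to_item), k=1)

pairs = pd.DataFrame({
    'item1': item_to_item.index[r],    # map row positions back to labels
    'item2': item_to_item.columns[c],  # map column positions back to labels
    'value': item_to_item.to_numpy()[r, c],
})
print(pairs)
```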
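NumPy's np.triu gives you the upper triangle with all other elements set to zero (here df is the item_to_item matrix). You can use that to construct your DataFrame and replace the zeros with NaNs so that they are dropped when you stack the columns. Note that this also drops any genuine zero values above the diagonal: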
pd.DataFrame(np.triu(df), index=df.index, columns=df.columns).replace(0, np.nan).stack()
Out:
0 1 4.0
2 5.0
3 9.0
1 2 3.0
3 7.0
2 3 3.0
dtype: float64
You can use reset_index at the end to convert indices to columns.
Another alternative is to stack, reset the index, and then filter the rows with a callable:
df.stack().reset_index()[lambda x: x['level_0'] < x['level_1']]
Out:
level_0 level_1 0
1 0 1 4
2 0 2 5
3 0 3 9
6 1 2 3
7 1 3 7
11 2 3 3
This one requires pandas 0.18.0.
