Python - Cbind previous and next row to current row

I have a Pandas data frame like so:
d = {'doc': [0, 0, 0, 1, 1], 'sent': [0, 1, 2, 0, 1],
     'col1': [5, 6, 1, 6, 5], 'col2': [4, 3, 2, 1, 1], 'col3': [8, 2, 9, 6, 5]}
df = pd.DataFrame(data=d)
Which looks like:
   doc  sent  col1  col2  col3
0    0     0     5     4     8
1    0     1     6     3     2
2    0     2     1     2     9
3    1     0     6     1     6
4    1     1     5     1     5
I'd like to bind the previous row and the next row to each row, like so (the "doc" and "sent" columns act as indices: nothing comes before the first or after the last row of a doc, so those slots are filled with zeros):
   doc  sent  col1  col2  col3  p_col1  p_col2  p_col3  n_col1  n_col2  n_col3
0    0     0     5     4     8       0       0       0       6       3       2
1    0     1     6     3     2       5       4       8       1       2       9
2    0     2     1     2     9       6       3       2       6       1       6
3    1     0     6     1     6       0       0       0       5       1       5
4    1     1     5     1     5       6       1       6       0       0       0

Use pd.DataFrame.shift (within each doc group) to get the previous/next rows, pd.concat to merge the dataframes, and fillna to set the nulls to zero.
The presence of nulls upcasts the ints to floats, since NumPy integer arrays cannot hold null values; after the nulls are replaced with 0, the columns are cast back to ints.
cs = ['col1', 'col2', 'col3']
g = df.groupby('doc')

pd.concat([
    df,
    g[cs].shift(-1).add_prefix('n'),  # next row within each doc
    g[cs].shift().add_prefix('p'),    # previous row within each doc
], axis=1).fillna(0).astype(int)
outputs:
   doc  sent  col1  col2  col3  ncol1  ncol2  ncol3  pcol1  pcol2  pcol3
0    0     0     5     4     8      6      3      2      0      0      0
1    0     1     6     3     2      1      2      9      5      4      8
2    0     2     1     2     9      0      0      0      6      3      2
3    1     0     6     1     6      5      1      5      0      0      0
4    1     1     5     1     5      0      0      0      6      1      6
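If you want the exact p_/n_ column names and column order from the question's expected output, the same pattern works with underscore prefixes. A minimal, self-contained sketch (using the frame shown in the question):

import pandas as pd

d = {'doc': [0, 0, 0, 1, 1], 'sent': [0, 1, 2, 0, 1],
     'col1': [5, 6, 1, 6, 5], 'col2': [4, 3, 2, 1, 1], 'col3': [8, 2, 9, 6, 5]}
df = pd.DataFrame(data=d)

cs = ['col1', 'col2', 'col3']
g = df.groupby('doc')

out = pd.concat([
    df,
    g[cs].shift().add_prefix('p_'),    # previous row within each doc
    g[cs].shift(-1).add_prefix('n_'),  # next row within each doc
], axis=1).fillna(0).astype(int)
print(out)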

regrouping similar column values in pandas

I have a dataframe with many rows and columns, looking like this:
index  col1  col2
1      0     1
2      5     1
3      5     4
4      5     4
5      3     4
6      2     4
7      2     1
8      2     2
I would like to keep only the values that differ from the previous index and replace the others by 0. On the example dataframe, that would be:
index  col1  col2
1      0     1
2      5     0
3      0     4
4      0     0
5      3     0
6      2     0
7      0     1
8      0     2
What is a solution that works for any number of rows/columns?
So you'd like to keep the values where the difference to the previous row is non-zero (i.e., they're not the same), and put 0 everywhere else:
>>> df.where(df.diff().ne(0), other=0)
       col1  col2
index
1         0     1
2         5     0
3         0     4
4         0     0
5         3     0
6         2     0
7         0     1
8         0     2
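As a self-contained check (a sketch reconstructing the example frame from the question): diff() is NaN for the first row, and NaN != 0 evaluates as True, so the first row is always kept as-is.

import pandas as pd

df = pd.DataFrame({'col1': [0, 5, 5, 5, 3, 2, 2, 2],
                   'col2': [1, 1, 4, 4, 4, 4, 1, 2]},
                  index=pd.Index(range(1, 9), name='index'))

# diff() compares each row with the previous one; the first row's
# diff is NaN, which ne(0) treats as "changed", so it is kept.
result = df.where(df.diff().ne(0), other=0)
print(result)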

Increment the value in a new column based on a condition using an existing column

I have a pandas dataframe with two columns:
temp_1  flag
1       0
1       0
1       0
2       0
3       0
4       0
4       1
4       0
5       0
6       0
6       1
6       0
and I wanted to create a new column named "final" based on the following rule:
whenever "flag" has the value 1, "temp_1" and all following values are incremented by 1; each further 1 in the flag column increments "final" by one more (please refer to the expected output).
I have tried using .cumsum() with filters but am not getting the desired result.
Expected output
temp_1  flag  final
1       0     1
1       0     1
1       0     1
2       0     2
3       0     3
4       0     4
4       1     5
4       0     5
5       0     6
6       0     7
6       1     8
6       0     8
Just take the cumulative sum of flag: each 1 adds a permanent +1 offset to every subsequent row, which is exactly the increment behavior described:
>>> df['final'] = df['temp_1'] + df['flag'].cumsum()
>>> df
    temp_1  flag  final
0        1     0      1
1        1     0      1
2        1     0      1
3        2     0      2
4        3     0      3
5        4     0      4
6        4     1      5
7        4     0      5
8        5     0      6
9        6     0      7
10       6     1      8
11       6     0      8
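To see the mechanics, it can help to print the intermediate running total, which is the per-row offset added to temp_1 (a sketch using the frame above):

# flag:    0 0 0 0 0 0 1 0 0 0 1 0
# cumsum:  0 0 0 0 0 0 1 1 1 1 2 2
print(df['flag'].cumsum().tolist())
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]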

how to add a DataFrame to some columns of another DataFrame

I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column), so some columns (load profiles) of b should be overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
   P[kW]
0      0
1      0
2      0
3      8
4      8
5      0
b:
   P1[kW]  P2[kW]  ...  Pn[kW]
0       2       2          2
1       3       3          3
2       3       3          3
3       4       4          4
4       2       2          2
5       2       2          2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
   P1[kW]  P2[kW]  ...  Pn[kW]
0       2       2          2
1       3       3          3
2       3       3          3
3       4      12          4
4       2      10          2
5       2       2          2
but what I actually get:
b:
   P1[kW]  P2[kW]  ...  Pn[kW]
0       2     NaN          2
1       3     NaN          3
2       3     NaN          3
3       4     NaN          4
4       2     NaN          2
5       2     NaN          2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses what the problem could be?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
import pandas as pd

load = [0, 0, 0, 0, 0, 0, 0]
data = pd.DataFrame(load)
for i in range(1, 10):
    data[i] = data[0]       # columns 0..9, all zeros
data

overlay = pd.DataFrame([0, 0, 0, 0, 6, 6, 0])
overlay

data.iloc[:, [1, 2, 4, 5, 7, 8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are NaN. What am I missing?
This is what I would expect the result to look like:
   0  1  2  3  4  5  6  7  8  9
0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0
4  0  6  6  0  6  6  0  6  6  0
5  0  6  6  0  6  6  0  6  6  0
6  0  0  0  0  0  0  0  0  0  0
Pass the column index 1 of dataframe b as a scalar, not as a list. b.iloc[:, [1]] returns a one-column DataFrame, and DataFrame += Series aligns the Series index against the DataFrame's columns (the labels P[kW] and P2[kW] never match, hence NaN), whereas b.iloc[:, 1] returns a Series that aligns with a.iloc[:, 0] on the row index.
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
   P1[kW]  P2[kW]  Pn[kW]
0       2       2       2
1       3       3       3
2       3       3       3
3       4      12       4
4       2      10       2
5       2       2       2
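A minimal sketch of the alignment difference (assuming fresh copies of the frames a and b from the question):

# DataFrame + Series matches the Series index against the DataFrame's
# COLUMN labels -> 'P[kW]' vs 'P2[kW]' never match -> all NaN:
print(b.iloc[:, [1]] + a.iloc[:, 0])

# Series + Series matches on the ROW index -> element-wise sum:
print(b.iloc[:, 1] + a.iloc[:, 0])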
Edit
It seems this is what we are looking for, i.e., summing certain columns of the data df with the overlay df.
Two Options
Option 1
cols = [1, 2, 4, 5, 7, 8]
# .values strips the labels, so plain positional broadcasting applies
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols = [1, 2, 4, 5, 7, 8]
data[cols] = data.iloc[:, cols] + overlay.iloc[:].values
data
Output
   0  1  2  3  4  5  6  7  8  9
0  0  0  0  0  0  0  0  0  0  0
1  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0
4  0  6  6  0  6  6  0  6  6  0
5  0  6  6  0  6  6  0  6  6  0
6  0  0  0  0  0  0  0  0  0  0
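For the record, the "crazy" result in the edit is also alignment: with data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0], the Series index (0-6) is matched against the selected column labels. A sketch showing what each column received (using overlay from the edit):

s = overlay.iloc[:, 0]  # index 0..6, values [0, 0, 0, 0, 6, 6, 0]

# What pandas aligned against each selected column label:
print(s.reindex([1, 2, 4, 5, 7, 8]))
# 1    0.0   <- columns 1 and 2: +0 in every row (look unchanged)
# 2    0.0
# 4    6.0   <- columns 4 and 5: +6 in every row
# 5    6.0
# 7    NaN   <- columns 7 and 8: no match -> NaN
# 8    NaN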

Padding and reshaping pandas dataframe

I have a dataframe with the following form:
data = pd.DataFrame({'ID': [1,1,1,2,2,2,2,3,3],
                     'Time': [0,1,2,0,1,2,3,0,1],
                     'sig': [2,3,1,4,2,0,2,3,5],
                     'sig2': [9,2,8,0,4,5,1,1,0],
                     'group': ['A','A','A','B','B','B','B','A','A']})
print(data)
   ID  Time  sig  sig2 group
0   1     0    2     9     A
1   1     1    3     2     A
2   1     2    1     8     A
3   2     0    4     0     B
4   2     1    2     4     B
5   2     2    0     5     B
6   2     3    2     1     B
7   3     0    3     1     A
8   3     1    5     0     A
I want to reshape and pad so that each 'ID' has the same number of Time values, sig and sig2 are padded with zeros (or the mean value within the ID), and group carries the same letter value. The output after padding would be:
data_pad = pd.DataFrame({'ID': [1,1,1,1,2,2,2,2,3,3,3,3],
                         'Time': [0,1,2,3,0,1,2,3,0,1,2,3],
                         'sig': [2,3,1,0,4,2,0,2,3,5,0,0],
                         'sig2': [9,2,8,0,0,4,5,1,1,0,0,0],
                         'group': ['A','A','A','A','B','B','B','B','A','A','A','A']})
print(data_pad)
    ID  Time  sig  sig2 group
0    1     0    2     9     A
1    1     1    3     2     A
2    1     2    1     8     A
3    1     3    0     0     A
4    2     0    4     0     B
5    2     1    2     4     B
6    2     2    0     5     B
7    2     3    2     1     B
8    3     0    3     1     A
9    3     1    5     0     A
10   3     2    0     0     A
11   3     3    0     0     A
My end goal is to ultimately reshape this into something with shape (number of IDs, number of time points, number of signals; 2 here).
It seems that if I pivot data, it fills in with NaN values, which is fine for the signal values but not for the groups. I am also hoping to avoid looping through data.groupby('ID'), since my actual data has a large number of groups and the looping would likely be very slow.
Here's one approach: create the new index with pd.MultiIndex.from_product and use it to reindex on the Time column:
df = data.set_index(['ID', 'Time'])

# define the new index as the full ID x Time product
ix = pd.MultiIndex.from_product([df.index.levels[0],
                                 df.index.levels[1]],
                                names=['ID', 'Time'])

# reindex using the above multiindex, padding missing rows with 0
df = df.reindex(ix, fill_value=0)

# forward fill the missing values in group (the padded 0s)
df['group'] = df.group.mask(df.group.eq(0)).ffill()

print(df.reset_index())
    ID  Time  sig  sig2 group
0    1     0    2     9     A
1    1     1    3     2     A
2    1     2    1     8     A
3    1     3    0     0     A
4    2     0    4     0     B
5    2     1    2     4     B
6    2     2    0     5     B
7    2     3    2     1     B
8    3     0    3     1     A
9    3     1    5     0     A
10   3     2    0     0     A
11   3     3    0     0     A
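Since the stated end goal is an array of shape (number of IDs, number of time points, number of signals), a possible final step after the padding above (a sketch; the variable names padded, n_ids, and n_times are illustrative):

padded = df.reset_index()
n_ids = padded['ID'].nunique()      # 3
n_times = padded['Time'].nunique()  # 4

# rows are already sorted by (ID, Time) after the reindex,
# so a plain reshape yields a (ID, Time, signal) array
arr = padded[['sig', 'sig2']].to_numpy().reshape(n_ids, n_times, 2)
print(arr.shape)  # (3, 4, 2)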
IIUC:
(data.pivot_table(columns='Time', index=['ID', 'group'], fill_value=0)
     .stack('Time')
     .sort_index(level=['ID', 'Time'])
     .reset_index()
)
Output:
    ID group  Time  sig  sig2
0    1     A     0    2     9
1    1     A     1    3     2
2    1     A     2    1     8
3    1     A     3    0     0
4    2     B     0    4     0
5    2     B     1    2     4
6    2     B     2    0     5
7    2     B     3    2     1
8    3     A     0    3     1
9    3     A     1    5     0
10   3     A     2    0     0
11   3     A     3    0     0

splitting/grouping pandas dataframe column

I have a dataframe with a column populated with groups of 1s and 0s. How can I assign each group a consecutive number beginning from 1?
I have tried a for loop across rows, but I need a column operation for fast performance.
d = {'col1': [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]}
df1 = pd.DataFrame(data=d)
df1
    col1
0      1
1      1
2      1
3      0
4      0
5      1
6      1
7      0
8      0
9      0
10     1
11     1
I need the following output:
    col1  col2
0      1     1
1      1     1
2      1     1
3      0     2
4      0     2
5      1     3
6      1     3
7      0     4
8      0     4
9      0     4
10     1     5
11     1     5
You can compare each value with the previous (shifted) value using ne, then take the cumulative sum with Series.cumsum; every change starts a new group number:
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()
print(df1)
    col1  col2
0      1     1
1      1     1
2      1     1
3      0     2
4      0     2
5      1     3
6      1     3
7      0     4
8      0     4
9      0     4
10     1     5
11     1     5
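The group ids in col2 are handy as a groupby key; for example, to get each run's value and length (a sketch using df1 from above; the variable name runs is illustrative):

runs = df1.groupby('col2')['col1'].agg(['first', 'size'])
print(runs)
#       first  size
# col2
# 1         1     3
# 2         0     2
# 3         1     2
# 4         0     3
# 5         1     2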
