regrouping similar column values in pandas - python

I have a dataframe with many rows and columns, looking like this:
index col1 col2
1 0 1
2 5 1
3 5 4
4 5 4
5 3 4
6 2 4
7 2 1
8 2 2
I would like to keep only the values that differ from the value in the previous row of the same column, and replace the others with 0. For the example dataframe, the result would be:
index col1 col2
1 0 1
2 5 0
3 0 4
4 0 0
5 3 0
6 2 0
7 0 1
8 0 2
What is a solution that works for any number of rows and columns?

So you'd like to keep the values where the difference to previous row is not equal to 0 (i.e., they're not the same), and put 0 to other places:
>>> df.where(df.diff().ne(0), other=0)
col1 col2
index
1 0 1
2 5 0
3 0 4
4 0 0
5 3 0
6 2 0
7 0 1
8 0 2
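For reference, here is a self-contained sketch of the same idea that rebuilds the example frame and compares each row with the previous one via shift() rather than diff(). The variable names changed and result are only illustrative. Comparing with shift() is a swapped-in variant, assumed here because it also works for non-numeric columns; the first row is kept in both versions because it has no predecessor to compare against.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {"col1": [0, 5, 5, 5, 3, 2, 2, 2], "col2": [1, 1, 4, 4, 4, 4, 1, 2]},
    index=pd.RangeIndex(1, 9, name="index"),
)

# True where a value differs from the one directly above it.
# The first row is compared with NaN, so it always counts as "changed".
changed = df.ne(df.shift())

# Keep changed values, put 0 everywhere else.
result = df.where(changed, other=0)
print(result)
```

Because both shift() and where() operate column by column, the same two lines apply unchanged to a frame with any number of rows and columns.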

Related

Increment the value in a new column based on a condition using an existing column

I have a pandas dataframe with two columns:
temp_1 flag
1 0
1 0
1 0
2 0
3 0
4 0
4 1
4 0
5 0
6 0
6 1
6 0
and I wanted to create a new column named "final" based on :
If "flag" has the value 1, then "temp_1" is incremented by 1 for that row and for all following rows. If "flag" is 1 again later on, the values in "final" are incremented by 1 once more from that row onwards. Please refer to the expected output.
I have tried using .cumsum() with filters, but I am not getting the desired result.
Expected output
temp_1 flag final
1 0 1
1 0 1
1 0 1
2 0 2
3 0 3
4 0 4
4 1 5
4 0 5
5 0 6
6 0 7
6 1 8
6 0 8
Just add the cumulative sum of flag to temp_1:
>>> df['final'] = df['temp_1'] + df['flag'].cumsum()
>>> df
temp_1 flag final
0 1 0 1
1 1 0 1
2 1 0 1
3 2 0 2
4 3 0 3
5 4 0 4
6 4 1 5
7 4 0 5
8 5 0 6
9 6 0 7
10 6 1 8
11 6 0 8
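A runnable sketch of the above, with the frame rebuilt from the question. The intermediate name offset is only illustrative; it holds the number of flags seen so far, which is exactly the amount each row has to be shifted by.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "temp_1": [1, 1, 1, 2, 3, 4, 4, 4, 5, 6, 6, 6],
    "flag":   [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0],
})

# Running count of flags: 0 before the first flag, 1 after it, 2 after the second.
offset = df["flag"].cumsum()

# Every flag raises this row and all later rows by one.
df["final"] = df["temp_1"] + offset
print(df)
```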

how to add a DataFrame to some columns of another DataFrame

I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column). So some columns (load profiles) of b should be overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
P[kW]
0 0
1 0
2 0
3 8
4 8
5 0
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 4 4
4 2 2 2
5 2 2 2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
but what I actually get:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 nan 2
1 3 nan 3
2 3 nan 3
3 4 nan 4
4 2 nan 2
5 2 nan 2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses, what could be the problem?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
import pandas as pd
load = [0,0,0,0,0,0,0]
data = pd.DataFrame(load)
for i in range(1, 10):
    data[i] = data[0]
data
overlay = pd.DataFrame([0,0,0,0,6,6,0])
overlay
data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are nans. What am I missing?
That is what I would expect the result to look like:
Pass the column position of dataframe b as a scalar (1) rather than as a list ([1]), so that a single column is selected as a Series and the addition is aligned row by row:
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
P1[kW] P2[kW] Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
Edit
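It seems that what we are looking for is to add the overlay df to certain columns of the data df. Adding a Series to a DataFrame aligns the Series index with the DataFrame's column labels, which is why columns 4 and 5 were changed in every row and columns 7 and 8 became NaN. Adding the underlying array instead avoids that alignment.
Two options: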
Option 1
cols=[1,2,4,5,7,8]
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols=[1,2,4,5,7,8]
data[cols] = data.iloc[:,cols] + overlay.iloc[:].values
data
Output
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 6 6 0 6 6 0 6 6 0
5 0 6 6 0 6 6 0 6 6 0
6 0 0 0 0 0 0 0 0 0 0
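As an alternative sketch that keeps index alignment instead of dropping to the raw array, DataFrame.add with axis=0 matches the overlay Series against the row index rather than the column labels. This is a swapped-in variant, not one of the two options above, and it assumes data and overlay share the same row index.

```python
import pandas as pd

# Ten all-zero columns labelled 0..9, seven rows each, as in the question.
data = pd.DataFrame({i: [0] * 7 for i in range(10)})
overlay = pd.DataFrame([0, 0, 0, 0, 6, 6, 0])

cols = [1, 2, 4, 5, 7, 8]

# axis=0 aligns the Series with the rows, so each selected column
# receives the overlay value of its own row.
data[cols] = data[cols].add(overlay.iloc[:, 0], axis=0)
print(data)
```

If the two frames had different row indexes, this version would produce NaN for unmatched rows, whereas the .values approach would add purely by position.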

Find first non-zero element within a group in pandas

I have the dataframe shown below. The column named target is the one I am interested in:
group value target
1 1 0
1 2 0
1 3 2
1 4 0
1 5 1
2 1 0
2 2 0
2 3 0
2 4 1
2 5 3
Now I want to find the first non-zero value in the target column for each group and remove rows before that row in each group. So the output should be like this:
group value target
1 3 2
1 4 0
1 5 1
2 4 1
2 5 3
I have seen this post, but I don't know how to change the code to get my desired result.
How can I do this?
Group by group (with sort set to False), take the cumulative sum of target within each group, then keep the rows where that cumulative sum is not equal to 0:
df.loc[df.groupby(["group"], sort=False).target.cumsum() != 0]
group value target
2 1 3 2
3 1 4 0
4 1 5 1
8 2 4 1
9 2 5 3
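The cumulative-sum filter relies on target never being negative, since positive and negative values could otherwise cancel back to 0. A hedged variant, assuming that case matters for your data, marks non-zero rows first and then takes a running maximum per group. The names seen_nonzero and trimmed are only illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "group":  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "value":  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "target": [0, 0, 2, 0, 1, 0, 0, 0, 1, 3],
})

# 1 where target is non-zero, 0 elsewhere; the running maximum per group
# switches to 1 at the first non-zero row and stays there.
seen_nonzero = (
    df["target"].ne(0).astype(int).groupby(df["group"]).cummax().astype(bool)
)

# Keep only the rows from the first non-zero target onwards.
trimmed = df[seen_nonzero]
print(trimmed)
```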
This should do. I'm sure it can be done with fewer reset_index() calls, but this shouldn't affect the speed much if your dataframe isn't too big:
idx = dff[dff.target.ne(0)].reset_index().groupby('group').index.first()
mask = (dff.reset_index().set_index('group')['index'].ge(idx.to_frame()['index'])).values
df_final = dff[mask]
Output:
0 group value target
3 1 3 2
4 1 4 0
5 1 5 1
9 2 4 1
10 2 5 3

splitting/grouping pandas dataframe column

I have a dataframe with a column populated with groups of 1s and 0s. How can I assign each group a consecutive number beginning from 1?
I have tried a for loop across rows, but I need a column operation for fast performance.
d = {'col1': [1,1,1,0,0,1,1,0,0,0,1,1]}
df1 = pd.DataFrame(data=d)
df1
col1
0 1
1 1
2 1
3 0
4 0
5 1
6 1
7 0
8 0
9 0
10 1
11 1
I need the following output:
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
You can compare each value with the shifted (previous) value for inequality, which marks the start of every group, and then number the groups with a cumulative sum using Series.cumsum:
df1['col2'] = df1['col1'].ne(df1['col1'].shift()).cumsum()
print(df1)
col1 col2
0 1 1
1 1 1
2 1 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 4
8 0 4
9 0 4
10 1 5
11 1 5
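The two steps can be separated to make the logic easier to follow. The sketch below is the same technique with an intermediate starts mask, plus a hypothetical follow-up (the runs summary) showing how the new column can be used to describe each run.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1]})

# True on the first row of every run of identical values
# (the very first row is compared with NaN, so it is always True).
starts = df1["col1"].ne(df1["col1"].shift())

# Each True bumps the counter by one, giving run ids 1, 2, 3, ...
df1["col2"] = starts.cumsum()

# Illustrative use: value and length of every run.
runs = df1.groupby("col2")["col1"].agg(["first", "size"])
print(df1)
print(runs)
```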

Python - Cbind previous and next row to current row

I have a Pandas data frame like so:
d = {'doc': [0, 0, 0, 1, 1], 'sent': [0, 1, 2, 0, 1], 'col1': [5, 6, 1, 6, 5], 'col2': [4, 3, 2, 1, 1], 'col3': [8, 2, 9, 6, 5]}
df = pd.DataFrame(data=d)
Which looks like:
doc sent col1 col2 col3
0 0 0 5 4 8
1 0 1 6 3 2
2 0 2 1 2 9
3 1 0 6 1 6
4 1 1 5 1 5
I'd like to bind the previous row and the next row to each row, like so (accounting for the "doc" and "sent" columns in my example, which act as indices that nothing can come before or after, as seen below):
doc sent col1 col2 col3 p_col1 p_col2 p_col3 n_col1 n_col2 n_col3
0 0 0 5 4 8 0 0 0 6 3 2
1 0 1 6 3 2 5 4 8 1 2 9
2 0 2 1 2 9 6 3 2 6 1 6
3 1 0 6 1 6 0 0 0 5 1 5
4 1 1 5 1 5 6 1 6 0 0 0
Use shift on a groupby over doc to get the previous and next rows within each document, pd.concat to combine the dataframes, and fillna to set nulls to zero.
The nulls introduced by shift upcast the integer columns to floats, because NumPy integer arrays cannot hold null values; once the nulls are replaced with 0, astype(int) casts the values back to integers.
cs = ['col1', 'col2', 'col3']
g = df.groupby('doc')
pd.concat([
    df,
    g[cs].shift(-1).add_prefix('n'),
    g[cs].shift().add_prefix('p')
], axis=1).fillna(0).astype(int)
outputs:
doc sent col1 col2 col3 ncol1 ncol2 ncol3 pcol1 pcol2 pcol3
0 0 0 5 4 8 6 3 2 0 0 0
1 0 1 6 3 2 1 2 9 5 4 8
2 0 2 1 2 9 0 0 0 6 3 2
3 1 0 6 1 6 5 1 5 0 0 0
4 1 1 5 1 5 0 0 0 6 1 6
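A self-contained sketch of the same approach that uses the p_ and n_ prefixes from the question. It assumes the installed pandas version supports the fill_value argument of the groupby shift, which fills the gaps with 0 directly. The names prev_rows, next_rows and out are only illustrative.

```python
import pandas as pd

# Rebuild the example frame shown in the question.
df = pd.DataFrame({
    "doc":  [0, 0, 0, 1, 1],
    "sent": [0, 1, 2, 0, 1],
    "col1": [5, 6, 1, 6, 5],
    "col2": [4, 3, 2, 1, 1],
    "col3": [8, 2, 9, 6, 5],
})

cs = ["col1", "col2", "col3"]
g = df.groupby("doc")[cs]

# Shift within each doc so that rows never leak across documents;
# positions with no neighbour are filled with 0.
prev_rows = g.shift(1, fill_value=0).add_prefix("p_")
next_rows = g.shift(-1, fill_value=0).add_prefix("n_")

out = pd.concat([df, prev_rows, next_rows], axis=1)
print(out)
```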
