I want to subtract a value from a slice so that those rows are updated; however, the rows never change.
df
A B C
1 1 3
2 3 4
5 6 8
2 3 4
idx = 1
val = 2
df.iloc[idx:-1,0].sub(val)
Desired result:
A B C
1 1 3
0 3 4
3 6 8
0 3 4
I've tried the following as well
df.iloc[idx:-1,0] = df.iloc[idx:-1,0].sub(val)
Easier with -=:
>>> df.iloc[idx:, 0] -= val
>>> df
A B C
0 1 1 3
1 0 3 4
2 3 6 8
3 0 3 4
>>>
Your code doesn't work because the -1 at the end of the slice excludes the last row (and your first snippet never assigns the result back to df at all). To fix your code, try:
df.iloc[idx:, 0] = df.iloc[idx:, 0].sub(val)
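Putting it together as a runnable sketch (the frame is reconstructed from the question's data for illustration):

```python
import pandas as pd

# reconstruction of the question's frame
df = pd.DataFrame({'A': [1, 2, 5, 2],
                   'B': [1, 3, 6, 3],
                   'C': [3, 4, 8, 4]})
idx, val = 1, 2

# .sub() alone returns a new Series and leaves df untouched;
# assign back (or use -=), and drop the -1 so the last row is included
df.iloc[idx:, 0] -= val
```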
I have a DataFrame with multiple columns. I'll provide code for an artificial df for reproduction:
import pandas as pd
from itertools import product
df = pd.DataFrame(data=list(product([0,1,2], [0,1,2], [0,1,2])), columns=['A', 'B','C'])
df['D'] = range(len(df))
This results in the following dataframe:
A B C D
0 0 0 0 0
1 0 0 1 1
2 0 0 2 2
3 0 1 0 3
4 0 1 1 4
5 0 1 2 5
6 0 2 0 6
7 0 2 1 7
8 0 2 2 8
9 1 0 0 9
I want to get a new column new_D that takes the D value where C fulfills a condition and spreads it over all rows with matching values in columns A and B.
The following code does exactly that:
new_df = df[['A','B', 'D']].loc[df['C'] == 0]
new_df.columns = ['A', 'B','new_D']
df = df.merge(new_df, on=['A', 'B'], how= 'outer')
However, I strongly believe there is a better solution, where I do not have to introduce a whole new DataFrame and merge it back together.
Preferably a one-liner.
Thanks in advance.
Desired Output:
A B C D new_D
0 0 0 0 0 0
1 0 0 1 1 0
2 0 0 2 2 0
3 0 1 0 3 3
4 0 1 1 4 3
5 0 1 2 5 3
6 0 2 0 6 6
7 0 2 1 7 6
8 0 2 2 8 6
9 1 0 0 9 9
EDIT:
Adding other example:
A B C D
0 0 4 foo 0
1 0 4 bar 1
2 0 4 baz 2
3 0 5 foo 3
4 0 5 bar 4
5 0 5 baz 5
6 0 6 foo 6
7 0 6 bar 7
8 0 6 baz 8
9 1 4 foo 9
Should be turned into the following, with the condition being df['C'] == 'bar':
A B C D new_D
0 0 4 foo 0 1
1 0 4 bar 1 1
2 0 4 baz 2 1
3 0 5 foo 3 4
4 0 5 bar 4 4
5 0 5 baz 5 4
6 0 6 foo 6 7
7 0 6 bar 7 7
8 0 6 baz 8 7
9 1 4 foo 9 10
Meaning all numbers are arbitrary. The order is also not the same; it just happens to work to take the first number.
If you want to get a new baseline every time C equals zero, you can use:
df['new_D'] = df['D'].where(df['C'].eq(0)).ffill(downcast='infer')
old answer
What you want is not fully clear, but it looks like you want to repeat the first item per group of A and B. You can easily achieve this with:
df['new_D'] = df.groupby(['A', 'B'])['D'].transform('first')
Even simpler, if your data is really composed of consecutive integers:
df['D'] = df['D']//3*3
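For the edited example with the condition df['C'] == 'bar', the two ideas combine: mask D where the condition fails, then broadcast the group's first non-null value over each (A, B) group. A sketch (row 9 stays NaN here, since its group contains no 'bar' row; as noted above, the 10 in the desired output is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'A': [0]*9 + [1],
                   'B': [4, 4, 4, 5, 5, 5, 6, 6, 6, 4],
                   'C': ['foo', 'bar', 'baz'] * 3 + ['foo'],
                   'D': range(10)})

# keep D only on the 'bar' rows, then spread the single surviving
# value over all rows of its (A, B) group; 'first' skips NaN
df['new_D'] = (df['D'].where(df['C'].eq('bar'))
                      .groupby([df['A'], df['B']])
                      .transform('first'))
```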
I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column). So some columns (load profiles) of b should be overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
P[kW]
0 0
1 0
2 0
3 8
4 8
5 0
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 4 4
4 2 2 2
5 2 2 2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
but what I actually get:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 nan 2
1 3 nan 3
2 3 nan 3
3 4 nan 4
4 2 nan 2
5 2 nan 2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses, what could be the problem?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
load = [0,0,0,0,0,0,0]
data = pd.DataFrame(load)
for i in range(1, 10):
    data[i] = data[0]
data
overlay = pd.DataFrame([0,0,0,0,6,6,0])
overlay
data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are NaNs. What am I missing?
That is what I would expect the result to look like:
Please do not pass the column index '1' of dataframe 'b' as a list but as a scalar: b.iloc[:, [1]] returns a one-column DataFrame, and DataFrame + Series aligns the Series index against the column labels (producing NaN), whereas b.iloc[:, 1] returns a Series that aligns row by row.
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
P1[kW] P2[kW] Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
Edit
It seems like this is what we are looking for, i.e. summing certain columns of the data df with the overlay df.
Two Options
Option 1
cols=[1,2,4,5,7,8]
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols=[1,2,4,5,7,8]
data[cols] = data.iloc[:,cols] + overlay.iloc[:].values
data
Output
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 6 6 0 6 6 0 6 6 0
5 0 6 6 0 6 6 0 6 6 0
6 0 0 0 0 0 0 0 0 0 0
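For completeness, a sketch of why the original += produced that strange pattern; it's plain pandas alignment, not a bug. When a DataFrame meets a Series, the Series index is matched against the DataFrame's column labels and the result is broadcast down the rows (frame reconstructed from the question's example):

```python
import pandas as pd

# 7 rows of zeros in 10 columns, as in the question
data = pd.DataFrame(0, index=range(7), columns=range(10))
overlay = pd.DataFrame([0, 0, 0, 0, 6, 6, 0])

# DataFrame + Series matches the Series INDEX (0..6) against the
# selected COLUMN labels:
#   columns 1, 2 -> + overlay[1], overlay[2] (both 0: "unchanged")
#   columns 4, 5 -> + overlay[4], overlay[5] (6 added to EVERY row)
#   columns 7, 8 -> no label 7 or 8 in the Series index -> NaN
# (the result's columns are the union of both label sets, so
# all-NaN columns 0, 3, 6 appear as well)
result = data.iloc[:, [1, 2, 4, 5, 7, 8]] + overlay.iloc[:, 0]
```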
How to drop duplicate in that specific way:
Index B C
1 2 1
2 2 0
3 3 1
4 3 1
5 4 0
6 4 0
7 4 0
8 5 1
9 5 0
10 5 1
Desired output :
Index B C
3 3 1
5 4 0
So drop duplicates on B, but only when C is the same in all of that B group's rows, and keep one sample/record.
For example, B = 3 for indexes 3/4, but since C = 1 for both, I do not drop them all.
But for B = 5 at indexes 8/9/10, C takes both 1 and 0, so they all get dropped.
Try this, using transform with nunique and drop_duplicates:
df[df.groupby('B')['C'].transform('nunique') == 1].drop_duplicates(subset='B')
Output:
B C
Index
3 3 1
5 4 0
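A runnable version of the answer on the question's data (the Index column is used as the DataFrame index here):

```python
import pandas as pd

df = pd.DataFrame({'B': [2, 2, 3, 3, 4, 4, 4, 5, 5, 5],
                   'C': [1, 0, 1, 1, 0, 0, 0, 1, 0, 1]},
                  index=pd.Index(range(1, 11), name='Index'))

# keep only the B groups where C has a single unique value,
# then collapse each surviving group to one row
out = (df[df.groupby('B')['C'].transform('nunique') == 1]
       .drop_duplicates(subset='B'))
```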
See the example below.
Given a dataframe whose index has values repeated, how can I get a new dataframe with a hierarchical index whose first level is the original index and whose second level is 0, 1, 2, ..., n?
Example:
>>> df
0 1
a 2 4
a 4 6
b 7 8
b 2 4
c 3 7
>>> df2 = df.some_operation()
>>> df2
0 1
a 0 2 4
1 4 6
b 0 7 8
1 2 4
c 0 3 7
You can use cumcount:
df.assign(level2=df.groupby(level=0).cumcount()).set_index('level2',append=True)
Out[366]:
0 1
level2
a 0 2 4
1 4 6
b 0 7 8
1 2 4
c 0 3 7
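As a runnable sketch on the example data (the rename_axis call is optional and only drops the leftover 'level2' level name):

```python
import pandas as pd

df = pd.DataFrame({0: [2, 4, 7, 2, 3],
                   1: [4, 6, 4, 4, 7]},
                  index=list('aabbc'))

# number the repeats within each index value, then push that
# counter in as a second index level
df2 = (df.assign(level2=df.groupby(level=0).cumcount())
         .set_index('level2', append=True)
         .rename_axis([None, None]))
```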
You can do it the fake way (totally not recommended, don't use this):
>>> df.index=[v if i%2 else '' for i,v in enumerate(df.index)]
>>> df.insert(0,'',([0,1]*3)[:-1])
>>> df
0 1
0 2 4
a 1 4 6
0 7 8
b 1 2 4
0 3 7
>>>
This changes the index labels and creates a column whose name is '' (the empty string).
I have a Pandas DataFrame with two columns. In some of the rows the columns are swapped. If they're swapped, then column "a" will be negative. What would be the best way to check that and then swap the values of the two columns?
def swap(a, b):
    if a < 0:
        return b, a
    else:
        return a, b
Is there some way to use apply with this function to swap the two values?
Try this, using np.where:
ary=np.where(df.a<0,[df.b,df.a],[df.a,df.b])
pd.DataFrame({'a':ary[0],'b':ary[1]})
Out[560]:
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4
Data input :
df
Out[561]:
a b
0 -1 3
1 -1 3
2 -1 8
3 2 9
4 0 7
5 0 4
And using apply
def swap(x):
    if x[0] < 0:
        return [x[1], x[0]]
    else:
        return [x[0], x[1]]
df.apply(swap,1)
Out[568]:
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4
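Another common vectorized idiom, sketched on the same data: mask the swapped rows and assign the two columns back in reversed order, going through .values to defeat label alignment:

```python
import pandas as pd

df = pd.DataFrame({'a': [-1, -1, -1, 2, 0, 0],
                   'b': [3, 3, 8, 9, 7, 4]})

mask = df['a'] < 0
# without .values, pandas would align column 'b' back onto 'b'
# and column 'a' back onto 'a', and nothing would change
df.loc[mask, ['a', 'b']] = df.loc[mask, ['b', 'a']].values
```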
Out of boredom:
df.values[:] = df.values[
    np.arange(len(df))[:, None],
    np.eye(2, dtype=int)[(df.a.values >= 0).astype(int)]
]
df
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4