I have a Pandas DataFrame with two columns. In some of the rows the columns are swapped; when they are swapped, column "a" is negative. What would be the best way to check for that and then swap the values of the two columns?
def swap(a, b):
    if a < 0:
        return b, a
    else:
        return a, b
Is there some way to use apply with this function to swap the two values?
Try this, using np.where:
ary = np.where(df.a < 0, [df.b, df.a], [df.a, df.b])
pd.DataFrame({'a': ary[0], 'b': ary[1]})
Out[560]:
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4
Data input:
df
Out[561]:
a b
0 -1 3
1 -1 3
2 -1 8
3 2 9
4 0 7
5 0 4
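For completeness, an in-place alternative with boolean masking (a minimal sketch, assuming the same df); the .values on the right-hand side bypasses column alignment so the values actually cross over:
mask = df.a < 0
df.loc[mask, ['a', 'b']] = df.loc[mask, ['b', 'a']].values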
And using apply:
def swap(x):
    # x is one row; put 'b' first when 'a' is negative
    if x['a'] < 0:
        return pd.Series([x['b'], x['a']], index=['a', 'b'])
    else:
        return pd.Series([x['a'], x['b']], index=['a', 'b'])

df.apply(swap, axis=1)
Out[568]:
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4
Out of boredom:
# fancy indexing: for each row pick the column order [0, 1] (keep, when a >= 0)
# or [1, 0] (swap, when a < 0)
df.values[:] = df.values[
    np.arange(len(df))[:, None],
    np.eye(2, dtype=int)[(df.a.values >= 0).astype(int)]
]
df
a b
0 3 -1
1 3 -1
2 8 -1
3 2 9
4 0 7
5 0 4
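To unpack the trick: indexing np.eye(2, dtype=int) with a 0/1 flag per row yields a per-row column order, [1, 0] to swap or [0, 1] to keep. A small standalone sketch of just that step:
import numpy as np

flags = np.array([0, 1, 0])          # 0 = swap this row, 1 = keep it
order = np.eye(2, dtype=int)[flags]  # -> [[1, 0], [0, 1], [1, 0]]
print(order)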
Related
I want to subtract a value from a slice so that those rows are updated; however, the rows never change.
df
A B C
1 1 3
2 3 4
5 6 8
2 3 4
idx = 1
val = 2
df.iloc[idx:-1,0].sub(val)
Desired result:
A B C
1 1 3
0 3 4
3 6 8
0 3 4
I've tried the following as well:
df.iloc[idx:-1,0] = df.iloc[idx:-1,0].sub(val)
Easier with -=:
>>> df.iloc[idx:, 0] -= val
>>> df
A B C
0 1 1 3
1 0 3 4
2 3 6 8
3 0 3 4
The reason your first attempt does nothing is that .sub returns a new Series without assigning it back. Your second attempt does assign, but the -1 at the end of the slice excludes the last row. To fix your code, drop the -1:
df.iloc[idx:, 0] = df.iloc[idx:, 0].sub(val)
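To see the endpoint difference, a quick sketch using the df and idx from the question:
print(df.iloc[idx:-1, 0])  # rows 1..2 only; the last position is excluded
print(df.iloc[idx:, 0])    # rows 1 through the end, as intended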
I want to add a DataFrame a (containing a load profile) to some of the columns of another DataFrame b (also containing one load profile per column). So some columns (load profiles) of b should be overlaid with the load profile of a.
So let's say my DataFrames look like:
a:
P[kW]
0 0
1 0
2 0
3 8
4 8
5 0
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 4 4
4 2 2 2
5 2 2 2
Now I want to overlay some columns of b:
b.iloc[:, [1]] += a.iloc[:, 0]
I would expect this:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
but what I actually get:
b:
P1[kW] P2[kW] ... Pn[kW]
0 2 nan 2
1 3 nan 3
2 3 nan 3
3 4 nan 4
4 2 nan 2
5 2 nan 2
That's not exactly what my code and data look like, but the principle is the same as in this abstract example.
Any guesses, what could be the problem?
Many thanks for any help in advance!
EDIT:
I actually have to overlay more than one column. Another example:
load = [0,0,0,0,0,0,0]
data = pd.DataFrame(load)
for i in range(1, 10):
    data[i] = data[0]
data
overlay = pd.DataFrame([0,0,0,0,6,6,0])
overlay
data.iloc[:, [1,2,4,5,7,8]] += overlay.iloc[:, 0]
data
WHAT??! The result is completely crazy. Columns 1 and 2 aren't changed at all. Columns 4 and 5 are changed, but in every row. Columns 7 and 8 are all NaN. What am I missing?
The result I would expect is the overlay added to each of the selected columns 1, 2, 4, 5, 7 and 8.
Pass the column index 1 of DataFrame b as a scalar, not as a list. b.iloc[:, [1]] returns a one-column DataFrame, and DataFrame + Series arithmetic aligns the Series index against the DataFrame's column labels, which yields NaN everywhere; b.iloc[:, 1] returns a Series, which aligns on the row index as intended.
Code
b.iloc[:, 1] += a.iloc[:, 0]
b
Output
P1[kW] P2[kW] Pn[kW]
0 2 2 2
1 3 3 3
2 3 3 3
3 4 12 4
4 2 10 2
5 2 2 2
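A small sketch of the difference, with frames built as in the question:
import pandas as pd

a = pd.DataFrame({'P[kW]': [0, 0, 0, 8, 8, 0]})
b = pd.DataFrame({'P1[kW]': [2, 3, 3, 4, 2, 2],
                  'P2[kW]': [2, 3, 3, 4, 2, 2],
                  'Pn[kW]': [2, 3, 3, 4, 2, 2]})

print(b.iloc[:, [1]] + a.iloc[:, 0])  # DataFrame + Series: aligns on column labels -> all NaN
print(b.iloc[:, 1] + a.iloc[:, 0])    # Series + Series: aligns on the row index -> works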
Edit
Seems like this is what we are looking for, i.e. summing certain columns of the data df with the overlay df.
Two Options
Option 1
cols=[1,2,4,5,7,8]
data[cols] = data[cols] + overlay.values
data
Option 2, if we want to use iloc
cols=[1,2,4,5,7,8]
data[cols] = data.iloc[:,cols] + overlay.iloc[:].values
data
Output
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0
4 0 6 6 0 6 6 0 6 6 0
5 0 6 6 0 6 6 0 6 6 0
6 0 0 0 0 0 0 0 0 0 0
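Both options work because overlay.values has shape (7, 1) and NumPy broadcasting stretches that single column across the six selected columns. A minimal sketch of the same mechanism:
import numpy as np

block = np.zeros((7, 6))             # six selected columns
column = np.arange(7).reshape(7, 1)  # one overlay column
print(block + column)                # the (7, 1) column broadcasts across all six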
Here is an example of my df:
2000-02-01 2000-03-01 ...
sub_col_one sub_col_two sub_col_one sub_col_two ...
idx_one idx_two
2 a 5 2 3 3
0 b 0 5 8 1
2 x 0 0 6 1
0 d 8 3 5 5
3 x 5 6 5 9
2 e 2 5 0 5
3 x 1 7 4 4
The question:
How could I get all rows of that df where idx_two is not equal to 'x'?
I've tried get_level_values, but can't get what I need.
Use Index.get_level_values with the name of the level, combined with boolean indexing:
df1 = df[df.index.get_level_values('idx_two') != 'x']
Or with the position of the level, here 1, because Python counts from 0:
df1 = df[df.index.get_level_values(1) != 'x']
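A reproducible sketch, with a shortened version of the frame and the index names from the question:
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [(2, 'a'), (0, 'b'), (2, 'x'), (0, 'd'), (3, 'x')],
    names=['idx_one', 'idx_two'])
df = pd.DataFrame({'val': [5, 0, 0, 8, 5]}, index=idx)

print(df[df.index.get_level_values('idx_two') != 'x'])  # keeps a, b and d
df.query("idx_two != 'x'") should give the same result, since query can refer to index level names.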
I am trying to create a new column in a Pandas DataFrame based on the values of three columns: if the value in each of the columns ['A','B','C'] is at least 5, the output should be 1, and it should be 0 if any one of the columns ['A','B','C'] holds a value below 5.
The data frame looks like this:
A B C
5 8 6
9 2 1
6 0 0
2 2 6
0 1 2
5 8 10
5 5 1
9 5 6
Expected output:
A B C new_column
5 8 6 1
9 2 1 0
6 0 0 0
2 2 6 0
0 1 2 0
5 8 10 1
5 5 1 0
9 5 6 1
I tried using this code, but it is not giving me the desired output:
conditions = [(df['A'] >= 5) , (df['B'] >= 5) , (df['C'] >= 5)]
choices = [1,1,1]
df['new_colum'] = np.select(conditions, choices, default=0)
You need to chain the conditions with & for bitwise AND; np.select returns the choice of the first condition that is True, so with three separate conditions a single column being >= 5 already yields 1:
conditions = (df['A'] >= 5) & (df['B'] >= 5) & (df['C'] >= 5)
Or use DataFrame.all to check whether all values in a row are True:
conditions = (df[['A','B','C']] >= 5).all(axis=1)
# if every column must be >= 5
conditions = (df >= 5).all(axis=1)
Then convert the boolean mask to integers, so that True, False become 1, 0:
df['new_colum'] = conditions.astype(int)
Or use numpy.where:
df['new_colum'] = np.where(conditions, 1, 0)
print(df)
A B C new_colum
0 5 8 6 1
1 9 2 1 0
2 6 0 0 0
3 2 2 6 0
4 0 1 2 0
5 5 8 10 1
6 5 5 1 0
7 9 5 6 1
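To see why the original attempt misfires, a tiny sketch: np.select evaluates the conditions in order and returns the choice of the first one that is True, so a row passing only the A test still gets 1:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [9], 'B': [2], 'C': [1]})
conditions = [(df['A'] >= 5), (df['B'] >= 5), (df['C'] >= 5)]
# A >= 5 is True, so np.select returns 1 even though B and C fail
print(np.select(conditions, [1, 1, 1], default=0))  # [1]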
I have a dataframe like so:
ID A B
0 7 4
0 5 2
0 0 3
1 6 7
1 8 9
2 5 5
I would like to select the first x rows for each ID, but only when there are at least x rows for that ID, like so:
If x == 2:
ID A B
0 7 4
0 5 2
1 6 7
1 8 9
If x == 3:
ID A B
0 7 4
0 5 2
0 0 3
... and so on.
Using df.groupby("ID").head(2) approximates what I want, but includes the first row for ID "2", which I don't want:
ID A B
0 7 4
0 5 2
1 6 7
1 8 9
2 5 5
Is there an efficient way to do that, without having to resort to counting rows for each ID?
Use groupby + duplicated with keep=False: after head(2), the IDs that had at least two rows appear exactly twice, so duplicated(keep=False) marks all of their rows:
v = df.groupby('ID').head(2)
v[v.ID.duplicated(keep=False)]
ID A B
0 0 7 4
1 0 5 2
3 1 6 7
4 1 8 9
You could also do a 2x groupby (nah... wouldn't recommend):
df[df.groupby('ID').ID.transform('size').gt(1)].groupby('ID').head(2)
ID A B
0 0 7 4
1 0 5 2
3 1 6 7
4 1 8 9
Use the following code:
x = 2
gr = df.groupby('ID', as_index=False)\
       .apply(lambda grp: grp.head(x) if len(grp) >= x else None)\
       .reset_index(drop=True)
The lambda function applied here checks whether the group length is at least x (a kind of filter on group length) and, for such groups, outputs the first x rows; groups for which it returns None are dropped from the result.
This way you avoid the second groupby.
The result is:
ID A B
0 0 7 4
1 0 5 2
2 1 6 7
3 1 8 9
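For a version that avoids apply entirely, a small parametrized sketch (the helper name take_first is my own), built on the transform('size') idea from the first answer:
import pandas as pd

def take_first(df, x, key='ID'):
    # keep only groups with at least x rows, then take the first x of each
    big_enough = df.groupby(key)[key].transform('size') >= x
    return df[big_enough].groupby(key).head(x)

df = pd.DataFrame({'ID': [0, 0, 0, 1, 1, 2],
                   'A': [7, 5, 0, 6, 8, 5],
                   'B': [4, 2, 3, 7, 9, 5]})
print(take_first(df, 2))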