Python & Pandas: How to do conditional calculation

Python & Pandas: How to do conditional calculation - python

df['direction'] is the number of direction of the wind, ranging from 1-16. I want to convert it into 360-degree system.
#1 direction is 90, and #2 is 67.5, they run in clockwise.
I can dodf['degree'] = 90-(df.direction-1)*22.5, but this would produce negative value, you can see the output below
]1
But I don't know how to use the conditional here, for df['degree']<0, then + 360. How can I do that?

df['degree'] = df['degree'].apply(lambda x: x + 360 if x < 0 else x)

You can use .loc for conditional indexing
df.loc[ df.degree < 0 , 'degree'] += 360

If I understand correctly then you want the following:
In [36]:
df = pd.DataFrame({'direction':np.random.randint(1,17,20)})
df
Out[36]:
direction
0 13
1 12
2 4
3 5
4 16
5 6
6 3
7 16
8 12
9 2
10 14
11 13
12 8
13 9
14 8
15 12
16 9
17 2
18 7
19 8
In [37]:
df['degrees'] = (90 - (df['direction'] -1) * 22.5)
df.loc[df['degrees']<0,'degrees'] += 360.0
df
Out[37]:
direction degrees
0 13 180.0
1 12 202.5
2 4 22.5
3 5 0.0
4 16 112.5
5 6 337.5
6 3 45.0
7 16 112.5
8 12 202.5
9 2 67.5
10 14 157.5
11 13 180.0
12 8 292.5
13 9 270.0
14 8 292.5
15 12 202.5
16 9 270.0
17 2 67.5
18 7 315.0
19 8 292.5

Short way to solve your problem will be using modulo.
df['degree'] = (90-(df.direction-1)*22.5)%360

Use df[degree]<0 as condition statement:
df[degree] += (df[degree]<0) * 360

Related

How do I classify a dataframe in a specific case?

I have a pandas.DataFrame of the form. I'll show you a simple example.(In reality, it consists of hundreds of millions of rows of data.).
I want to change the number as the letter in column '2' changes. Numbers in the remaining columns (columns:1,3 ~) should not change.
df=
index 1 2 3
0 0 a100 1
1 1.04 a100 2
2 32 a100 3
3 5.05 a105 4
4 1.01 a105 5
5 155 a105 6
6 3155.26 a105 7
7 354.12 a100 8
8 5680.13 a100 9
9 125.55 a100 10
10 13.32 a100 11
11 5656.33 a156 12
12 456.61 a156 13
13 23.52 a1235 14
14 35.35 a1235 15
15 350.20 a100 16
16 30. a100 17
17 13.50 a100 18
18 323.13 a231 19
19 15.11 a1111 20
20 11.22 a1111 21
Here is my expected result:
df=
index 1 2 3
0 0 0 1
1 1.04 0 2
2 32 0 3
3 5.05 1 4
4 1.01 1 5
5 155 1 6
6 3155.26 1 7
7 354.12 2 8
8 5680.13 2 9
9 125.55 2 10
10 13.32 2 11
11 5656.33 3 12
12 456.61 3 13
13 23.52 4 14
14 35.35 4 15
15 350.20 5 16
16 30 5 17
17 13.50 5 18
18 323.13 6 19
19 15.11 7 20
20 11.22 7 21
How do I solve this problem?

Use consecutive groups created by compare for not equal shifted values with cumulative sum and then subtract 1:
#if column is string '2'
df['2'] = df['2'].ne(df['2'].shift()).cumsum().sub(1)
#if column is number 2
df[2] = df[2].ne(df[2].shift()).cumsum().sub(1)
print (df)
index 1 2 3
0 0 0.00 0 1
1 1 1.04 0 2
2 2 32.00 0 3
3 3 5.05 1 4
4 4 1.01 1 5
5 5 155.00 1 6
6 6 3155.26 1 7
7 7 354.12 2 8
8 8 5680.13 2 9
9 9 125.55 2 10
10 10 13.32 2 11
11 11 5656.33 3 12
12 12 456.61 3 13
13 13 23.52 4 14
14 14 35.35 4 15
15 15 350.20 5 16
16 16 30.00 5 17
17 17 13.50 5 18
18 18 323.13 6 19
19 19 15.11 7 20
20 20 11.22 7 21

how to Replace column values with several conditions

I have a column as follows:
A B
0 0 20.00
1 1 35.00
2 2 75.00
3 3 29.00
4 4 125.00
5 5 16.00
6 6 52.50
7 7 NaN
8 8 NaN
9 9 NaN
10 10 NaN
11 11 NaN
12 12 NaN
13 13 239.91
14 14 22.87
15 15 52.74
16 16 37.20
17 17 27.44
18 18 57.01
19 19 29.88
I want to change the values of the column as follows
if 0<B<10.0, then Replace the cell value of B by "0 to 10"
if 10.1<B<20.0, then Replace the cell value of B by "10 to 20"
continue like this until the maximum range achieved.
I have tried
ds['B'] = np.where(ds['B'].between(10.0,20.0), "10 to 20", ds['B'])
But once I perform this operation, the DataFrame is occupied by the string "10 to 20" so I cannot perform this operation again for the remaining values of the DataFrame. After this step, the DataFrame looks like this:
A B
0 0 10 to 20
1 1 35.0
2 2 75.0
3 3 29.0
4 4 125.0
5 5 10 to 20
6 6 52.5
7 7 nan
8 8 nan
9 9 nan
10 10 nan
11 11 nan
12 12 nan
13 13 239.91
14 14 22.87
15 15 52.74
16 16 37.2
17 17 27.44
18 18 57.01
19 19 29.88
And the following line: ds['B'] = np.where(ds['B'].between(20.0,30.0), "20 to 30", ds['B']) will throw TypeError: '>=' not supported between instances of 'str' and 'float'
How can i code this to change all of the values in the DataFrame to these strings of ranges all at once?

Build your bins and labels and use pd.cut:
bins = np.arange(0, df["B"].max() // 10 * 10 + 10, 10).astype(int)
labels = [' to '.join(t) for t in zip(bins[:-1].astype(str), bins[1:].astype(str))]
df["B"] = pd.cut(df["B"], bins=bins, labels=labels)
>>> df
A B
0 0 10 to 20
1 1 30 to 40
2 2 70 to 80
3 3 20 to 30
4 4 120 to 130
5 5 10 to 20
6 6 50 to 60
7 7 NaN
8 8 NaN
9 9 NaN
10 10 NaN
11 11 NaN
12 12 NaN
13 13 NaN
14 14 20 to 30
15 15 50 to 60
16 16 30 to 40
17 17 20 to 30
18 18 50 to 60
19 19 20 to 30

This can be done with much less code as this is actually just a matter of string formatting.
ds['B'] = ds['B'].apply(lambda x: f'{int(x/10) if x>=10 else ""}0 to {int(x/10)+1}0' if pd.notnull(x) else x)

You can create a custom function that maps each range to a string. For example, 19.0 will be mapped to "10 to 20", and then apply this function to each row.
I've written the code so that the minimum and maximum of the range is generalizable to the DataFrame, and takes on values that are multiples of 10.
import numpy as np
import pandas as pd
## copy and paste your DataFrame
ds = pd.read_clipboard()
# floor to nearest multiple of 10
ds_min = ds['B'].min()//10*10
# ceiling to the nearest multiple of 10
ds_max = round(ds['B'].max(),-1)
ranges = np.linspace(ds_min, ds_max, ((ds_max-ds_min)/10)+1)
def map_value_to_string(value):
for idx in range(1,len(ranges)):
low_value, high_value = ranges[idx-1], ranges[idx]
if low_value < value <= high_value:
return f"{int(low_value)} to {int(high_value)}"
else:
continue
ds['B'] = ds['B'].apply(lambda x: map_value_to_string(x))
Output:
>>> ds
A B
0 0 10 to 20
1 1 30 to 40
2 2 70 to 80
3 3 20 to 30
4 4 120 to 130
5 5 10 to 20
6 6 50 to 60
7 7 None
8 8 None
9 9 None
10 10 None
11 11 None
12 12 None
13 13 230 to 240
14 14 20 to 30
15 15 50 to 60
16 16 30 to 40
17 17 20 to 30
18 18 50 to 60
19 19 20 to 30

How to populate rows of pandas dataframe column based with previous row based on a multiple conditions?

Disclaimer: This might be possible duplicate but I cannot find the exact solution. Please feel free to mark this question as duplicate and provide link to duplicate question in comments.
I am still learning python dataframe operations and this possibly has a very simple solution which I am not able to figure out.
I have a python dataframe with a single columns. Now I want to change value of each row to value of previous row if certain conditions are satisfied. I have created a loop solution to implement this but I was hoping for a more efficient solution.
Creation of initial data:
import numpy as np
import pandas as pd
data = np.random.randint(5,30,size=20)
df = pd.DataFrame(data, columns=['random_numbers'])
print(df)
random_numbers
0 6
1 24
2 29
3 18
4 22
5 17
6 12
7 7
8 6
9 27
10 29
11 13
12 23
13 6
14 25
15 24
16 16
17 15
18 25
19 19
Now lets assume two condition are 1) value less than 10 and 2) value more than 20. In any of these cases, set row value to previous row value. This has been implement in loop format as follows:
for index,row in df.iterrows():
if index == 0:
continue;
if(row.random_numbers<10):
df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']
if(row.random_numbers>20):
df.loc[index,'random_numbers']=df.loc[index-1,'random_numbers']
random_numbers
0 6
1 6
2 6
3 18
4 18
5 17
6 12
7 12
8 12
9 12
10 12
11 13
12 13
13 13
14 13
15 13
16 16
17 15
18 15
19 19
Please suggest a more efficient way to implement this logic as I am using large number of rows.

You can replace the values less than 10 and values more than 20 with NaN then use pandas.DataFrame.ffill() to fill nan with previous row value.
mask = (df['random_numbers'] < 10) | (df['random_numbers'] > 20)
# Since you escape with `if index == 0:`
mask[df.index[0]] = False
df.loc[mask, 'random_numbers'] = np.nan
df['random_numbers'].ffill(inplace=True)
# Original
random_numbers
0 7
1 28
2 8
3 14
4 12
5 20
6 21
7 11
8 16
9 27
10 19
11 23
12 18
13 5
14 6
15 11
16 6
17 8
18 17
19 8
# After replaced
random_numbers
0 7.0
1 7.0
2 7.0
3 14.0
4 12.0
5 20.0
6 20.0
7 11.0
8 16.0
9 16.0
10 19.0
11 19.0
12 18.0
13 18.0
14 18.0
15 11.0
16 11.0
17 11.0
18 17.0
19 17.0

We can also do it in a simpler way by using .mask() together with .ffill() and slicing on [1:] as follows:
df['random_numbers'][1:] = df['random_numbers'][1:].mask((df['random_numbers'] < 10) | (df['random_numbers'] > 20))
df['random_numbers'] = df['random_numbers'].ffill(downcast='infer')
.mask() tests for the condition and replace with NaN when the condition is true (default to replace with NaN if the parameter other= is not supplied). Retains the original values for rows with condition not met.
Note that the resulting numbers are maintained as integer instead of transformed unexpectedly to float type by supplying the downcast='infer' in the call to .ffill().
We use [1:] on the first line to ensure the data on row 0 is untouched without transformation.
# Original data: (reusing your sample data)
random_numbers
0 6
1 24
2 29
3 18
4 22
5 17
6 12
7 7
8 6
9 27
10 29
11 13
12 23
13 6
14 25
15 24
16 16
17 15
18 25
19 19
# After transposition:
random_numbers
0 6
1 6
2 6
3 18
4 18
5 17
6 12
7 12
8 12
9 12
10 12
11 13
12 13
13 13
14 13
15 13
16 16
17 15
18 15
19 19

How to rolling non-overlapping window in pandas

My dataframe looks like:
c1
0 10
1 11
2 12
3 13
4 14
5 15
6 16
7 17
I want to find the minimum for every 3 rows. which looks like:
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16
and the number of rows might not be divisible by 3. I can't achieve it with rolling function.

If there is default index values use integer division by 3 and pass to GroupBy.transform with min:
df['min'] = df['c1'].groupby(df.index // 3).transform('min')
Or if any index generate helper np.arange:
df['min'] = df['c1'].groupby(np.arange(len(df)) // 3).transform('min')
print (df)
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16

You can also do this:
>>> df['min'] = df['c1'][::3]
>>> df.ffill().astype(int)
c1 min
0 10 10
1 11 10
2 12 10
3 13 13
4 14 13
5 15 13
6 16 16
7 17 16

Cutting every nth row in dataframe in Python

I have a dataframe with columns like so:
x y z
1 10 20
2 10 18
3 11 16.5
4 11 12
5 12 23
6 11 21
7 10 19
8 10 26
.
.
Every time z_n+1 is greater than z_n I want to cut that z_n.
The output would be:
x y z
1 10 20
2 10 18
3 11 16.5
5 12 23
6 11 21
8 10 26
.
.
It doesn't occur every x-many times - the index of each change from smaller to larger z_n is not 'regular'.
Is there an easy way to do this?

We can use shift to look one row back and take the reverse with ~:
df[~(df['z'].shift() < df['z'])]
x y z
0 1 10 20.0
1 2 10 18.0
2 3 11 16.5
3 4 11 12.0
5 6 11 21.0
6 7 10 19.0

Try:
df[~(df.z.diff(-1) < 0)]
Output:
x y z
0 1 10 20.0
1 2 10 18.0
2 3 11 16.5
4 5 12 23.0
5 6 11 21.0
7 8 10 26.0

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python & Pandas: How to do conditional calculation - python

df['degree'] = df['degree'].apply(lambda x: x + 360 if x < 0 else x)

You can use .loc for conditional indexing df.loc[ df.degree < 0 , 'degree'] += 360

Short way to solve your problem will be using modulo. df['degree'] = (90-(df.direction-1)*22.5)%360

Use df[degree]<0 as condition statement: df[degree] += (df[degree]<0) * 360

Related

How do I classify a dataframe in a specific case?

how to Replace column values with several conditions

How to populate rows of pandas dataframe column based with previous row based on a multiple conditions?

How to rolling non-overlapping window in pandas

Cutting every nth row in dataframe in Python

Categories

Resources