Python - problem with changing values to groups - python

I have a dataset with several attributes, one of which is temperature. The temperatures range from about -30 to about 30 degrees. For a machine learning study I want to group the temperature into classes on this principle: below -30: 0, -30 to -10: 1, and so on. I wrote the code below, but it doesn't work the way I want it to. The column's data type is int32; I converted it from float64.
dane = [treningowy_df]
for zbior in dane:
    zbior['temperatura'] = zbior['temperatura'].astype(int)
    zbior.loc[ zbior['temperatura'] <= -30, 'temperatura'] = 0
    zbior.loc[(zbior['temperatura'] > -30) & (zbior['temperatura'] <= -10), 'temperatura'] = 1
    zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
    zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
    zbior.loc[(zbior['temperatura'] > 10) & (zbior['temperatura'] <= 20), 'temperatura'] = 4
    zbior.loc[(zbior['temperatura'] > 20) & (zbior['temperatura'] <= 30), 'temperatura'] = 5
    zbior.loc[ zbior['temperatura'] > 30, 'temperatura'] = 6
For example, before the code runs record 1 has a temperature of -3, and afterwards it has 3. Why? A record with a temperature of 22 before the change has 5 afterwards, i.e. that assignment was executed correctly.

It looks like you're manipulating a dataframe. Have you tried using the apply function?
Personally I would go about it like this (in fact, with a new column).
1. Write a function to process the value
def _check_temperature_range(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    # so on and so forth...
2. Apply the function to the column of the dataframe
df[new_column] = df[column].apply(lambda x: _check_temperature_range(x))
The results will then appear in new_column, or in the old column if you assign back to the same column.

I think your code is being applied multiple times to the same row.
With your example from the first line:
temp = -3 gives 2
but then temp = 2 gives 3
So I recommend creating a new column in your dataframe.

I believe it has to do with the sequence of your code.
A record with temperature -3, gets assigned as 2 -
zbior.loc[(zbior['temperatura'] > -10) & (zbior['temperatura'] <= 0), 'temperatura'] = 2
Then in the next line, it is found again as being between 0 and 10, and so assigned again as 3 -
zbior.loc[(zbior['temperatura'] > 0) & (zbior['temperatura'] <= 10), 'temperatura'] = 3
One solution is to assign a number that doesn't make you "jump" a category.
So, for -3, I'd assign 0 so it sticks around.
After that you can do another pass, and change to the actual numbers you wanted, eg 0->3 etc.
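The effect described in these answers can be reproduced without pandas. The sketch below is only an illustration: plain lists stand in for the DataFrame column, and the names RULES, group_overwriting and group_from_original are made up for this example. It contrasts overwriting the values rule by rule with testing every rule against the untouched temperatures.

```python
# Each rule is (lower bound exclusive, upper bound inclusive, group label),
# matching the conditions used in the question.
RULES = [
    (float("-inf"), -30, 0),
    (-30, -10, 1),
    (-10, 0, 2),
    (0, 10, 3),
    (10, 20, 4),
    (20, 30, 5),
    (30, float("inf"), 6),
]


def group_overwriting(temps):
    """Apply the rules one after another to the same list (the buggy way)."""
    values = list(temps)
    for low, high, label in RULES:
        for i, v in enumerate(values):
            if low < v <= high:
                # Later rules now see the label, not the temperature.
                values[i] = label
    return values


def group_from_original(temps):
    """Test every rule against the untouched temperatures."""
    result = list(temps)
    for low, high, label in RULES:
        for i, v in enumerate(temps):
            if low < v <= high:
                result[i] = label
    return result


print(group_overwriting([-3, 22, -35]))    # [3, 5, 3]
print(group_from_original([-3, 22, -35]))  # [2, 5, 0]
```

In pandas the same idea amounts to comparing against a saved copy of the column, or writing the labels to a new column as suggested above. pandas also provides pd.cut for this kind of binning.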

If zbior is a pandas.DataFrame, you can use the map function
def my_func(x):
    if x <= -30:
        return 0
    elif x <= -10:
        return 1
    elif x <= 0:
        return 2
    elif x <= 10:
        return 3
    elif x <= 20:
        return 4
    elif x <= 30:
        return 5
    else:
        return 6
zbior.temperatura=zbior.temperatura.map(my_func)
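The elif chain in my_func can also be written as a lookup in a sorted list of thresholds. This is a minimal sketch using bisect from the standard library; EDGES and temperature_group are names invented here, and the edges are the ones from the question.

```python
from bisect import bisect_left

# Upper edges of groups 0..5, taken from the question; anything above 30 is group 6.
EDGES = [-30, -10, 0, 10, 20, 30]


def temperature_group(x):
    """Return the group index for x, using right-closed intervals like my_func."""
    # bisect_left counts how many edges are strictly smaller than x,
    # which is exactly the group number for bins of the form (low, high].
    return bisect_left(EDGES, x)


print([temperature_group(t) for t in (-35, -30, -3, 0, 22, 31)])  # [0, 0, 2, 2, 5, 6]
```

It should be usable the same way as my_func, for example zbior.temperatura.map(temperature_group), and adding a group only means adding an edge to the list.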

Related

How to create a function to output ranges for a list (in python)

I need to create a function that inputs a list of integers and outputs a count of one of three ranges. I know there are easier ways to do it, but this is the way it needs to be done for this project.
let's say the ranges need to be: (x < 10, 10 <= x < 100, x >= 100)
So far I've tried...
list = (1, 2, 10, 20, 50, 100, 200)
low = 0
mid = 0
high = 0
final_list = list()
def func(x):
    if x < 10:
        low = low + 1
    elif x < 100:
        mid = mid + 1
    else:
        high = high + 1
for i in range(len(list)):
    x = func(list[i])
    final_list.append(x)
This is the best I've been able to come up with, but obviously it's not correct. Again, I realize there are easier ways to accomplish this, but the created function and for loop are required for this specific problem.
So... any ideas?
You have two problems:
Your function doesn't return anything
Your accumulators (counts) are separated from your loop
Move the loop inside the function:
def func(values):
    low = 0
    mid = 0
    high = 0
    for x in values:
        if x < 10:
            low = low + 1
        elif x < 100:
            mid = mid + 1
        else:
            high = high + 1
    return low, mid, high
print(func([1, 2, 10, 20, 50, 100, 200]))
Output:
(2, 3, 2)
Or another way, depending on what you need to use the counters for:
my_list = (1, 2, 10, 20, 50, 100, 200)
# renamed the variable so it does not shadow the builtin list() type
def main():
    # created a function main() to allow the counters to be available to func() as nonlocal variables
    low = 0
    mid = 0
    high = 0
    final_list = list()
    def func(x): # embedded func() in main()
        nonlocal low, mid, high # declared the counters as nonlocal variables in func()
        if x < 10:
            low += 1
        elif x < 100:
            mid += 1
        else:
            high += 1
    for x in my_list: # simplified the iteration over my_list
        func(x)
        final_list.append(x) # final_list is just a copy of my_list - may not be necessary
    print(low, mid, high) # to display the final values of the counters
main()
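A third variant keeps the one-value-at-a-time function and the separate for loop from the question, but has the function return a label instead of touching outer counters. This is only a sketch; classify and count_ranges are hypothetical names.

```python
def classify(x):
    """Return which of the three ranges x falls into."""
    if x < 10:
        return "low"
    elif x < 100:
        return "mid"
    else:
        return "high"


def count_ranges(values):
    """Count how many values fall into each range."""
    counts = {"low": 0, "mid": 0, "high": 0}
    for x in values:
        counts[classify(x)] += 1
    return counts


print(count_ranges([1, 2, 10, 20, 50, 100, 200]))
# {'low': 2, 'mid': 3, 'high': 2}
```

Because classify only returns a value, no nonlocal or global declaration is needed.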

Code simplification. Between positive and negative conditions

I have a question, how to simplify this code? I have the impression that it can be done in 3 conditional instructions and not in 6 ...
if (PID > 10 and self.last_pid > 0):
    if (PID >= self.last_pid):
        self.setKp(self.Kp+self.increase_val)
        self.increase_val = self.increase_val*2
    else:
        percent_last = PID/self.last_pid*100
        self.increase_val + (percent_last/100*self.increase_val)
        self.setKp(self.Kp+self.increase_val)
if (PID < -10 and self.last_pid < 0):
    if (PID <= self.last_pid):
        self.setKp(self.Kp+self.increase_val)
        self.increase_val = self.increase_val*2
    else:
        percent_last = PID/self.last_pid*100
        self.increase_val + (percent_last/100*self.increase_val)
        self.setKp(self.Kp+self.increase_val)
(Which might be simplified to:)
if A > 10 and B > 0:
    if A >= B:
        # do block A
    else:
        # do block B
if A < -10 and B < 0:
    if A <= B:
        # do block A
    else:
        # do block B
This should be equivalent to your two cases for positive and negative values:
if abs(A) > 10 and A * B > 0:
    if abs(A) >= abs(B):
        # do block A
    else:
        # do block B
Explanation:
abs(A) > 10 corresponds to A > 10 and A < -10 respectively
A * B > 0 means that both have the same sign and B != 0
abs(A) >= abs(B) means A <= B if both are < 0 and A >= B if both are > 0
Now that's shorter and less repetitive, but whether it's easier to understand is for you to decide. In any case, you should add a comment explaining the code and what it is supposed to do.
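One way to gain confidence in the equivalence is to compare both versions over a range of inputs. The sketch below does that for integers only; original and combined are stand-in functions that return "A", "B" or None (no block executed) instead of running the real blocks.

```python
def original(a, b):
    """The two separate cases for positive and negative values."""
    result = None
    if a > 10 and b > 0:
        result = "A" if a >= b else "B"
    if a < -10 and b < 0:
        result = "A" if a <= b else "B"
    return result


def combined(a, b):
    """The single condition using abs() and the sign of the product."""
    if abs(a) > 10 and a * b > 0:
        return "A" if abs(a) >= abs(b) else "B"
    return None


# Collect every integer pair on which the two versions disagree.
mismatches = [
    (a, b)
    for a in range(-50, 51)
    for b in range(-50, 51)
    if original(a, b) != combined(a, b)
]
print(mismatches)  # []
```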
With your original variables and procedures, this would be:
if abs(PID) > 10 and PID * self.last_pid > 0:
    if abs(PID) >= abs(self.last_pid):
        self.setKp(self.Kp+self.increase_val)
        self.increase_val = self.increase_val*2
    else:
        percent_last = PID/self.last_pid*100
        self.increase_val + (percent_last/100*self.increase_val)
        self.setKp(self.Kp+self.increase_val)
Some more points that I just noticed:
your line self.increase_val + (percent_last / 100 * self.increase_val) does not do anything. I guess the + should be = or +=?
it is kind of pointless to first * 100 to get percent just to then / 100 again
it's odd how in one case you add increase_val to KP before increasing it, but after increasing it in the other case; is this intentional?
In fact, I think that this could be further simplified to this, provided that the inner if is used to cap the increase to the increase_val; not sure whether it should be added to Kp before or after being increased itself, though, or if that should actually depend on the case.
if abs(PID) > 10 and PID * self.last_pid > 0:
    self.setKp(self.Kp + self.increase_val)
    self.increase_val *= 1 + min(PID/self.last_pid, 1)
It's a bit long, but it takes fewer lines:
if (A > 10 and B > 0 and A >= B) or (A < -10 and B < 0 and A <= B):
    # do block a
else:
    # do block b (note: unlike the original, this also runs when neither outer condition holds)
If you don't like it being so long, I would recommend turning each side of the or on the first line into a boolean variable and then using those variables in the if statement. Like so:
condA = A > 10 and B > 0 and A >= B
condB = A < -10 and B < 0 and A <= B
if condA or condB:
    # do block a
else:
    # do block b

combining a range for pandas

I have data in Excel; one of the variables is temp (ranging from -3 to 3), the other is wind_speed.
For my catplot, how do I set the range of temp on the x axis?
g = sns.catplot(x="temp", y="wind_speed", data=ieq_data, kind="bar")
When I run this code, the x axis shows -3, -2, -1, 0, 1, 2, 3.
I want to combine the values into (-3,-2), (-1), (0), (1), (2,3).
edited:
def temp(x):
    if -3 <= x < -2:
        return "(-3,-2)"
    elif -2 <= x < 0:
        return "(-1)"
    elif 0 <= x < 2:
        return "(1)"
    elif 2 <= x <= 3:
        return "(2,3)"
ieq_data["new_Thermal"] = ieq_data.temp.apply(temp)
g = sns.catplot (x="new_Thermal", y="wind_speed", data=ieq_data, kind="bar")
Maybe create an extra column like the one below and use this column as x.
You'll have to revisit the categorization in the function. This is as per my understanding of your question.
def grouping_func(val):
    if -3 <= val < -2:
        return "(-3, -2)"
    elif -2 <= val < 0:
        return "(-1)"
    elif 0 <= val < 2:
        return "(1)"
    elif 2 <= val <=3:
        return "(2, 3)"
ieq_data["new_col"] = ieq_data.temp.apply(grouping_func)
g = sns.catplot(x="new_col", y="wind_speed", data=ieq_data, kind="bar")
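As for revisiting the categorization: if temp only takes the integer values -3 to 3 (an assumption, the question does not say so explicitly), the five requested groups can be spelled out in a dictionary. TEMP_GROUPS, GROUP_ORDER and temp_group are names made up for this sketch.

```python
# Assumption: temp only takes the integer values -3..3.
TEMP_GROUPS = {
    -3: "(-3,-2)", -2: "(-3,-2)",
    -1: "(-1)",
    0: "(0)",
    1: "(1)",
    2: "(2,3)", 3: "(2,3)",
}

# The order in which the groups should appear on the x axis.
GROUP_ORDER = ["(-3,-2)", "(-1)", "(0)", "(1)", "(2,3)"]


def temp_group(x):
    """Map a single temp value to its group label."""
    # int() also accepts whole-number floats such as -3.0.
    return TEMP_GROUPS[int(x)]


print([temp_group(t) for t in range(-3, 4)])
```

The function can be applied like grouping_func above, for example ieq_data.temp.apply(temp_group), and passing order=GROUP_ORDER to sns.catplot should keep the bars in the intended order.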
You can explicitly replace the ticks using your arbitrary list.
g.set(xticks=my_ticks_list)

Find enclosed spaces in 2D array

So, I'm generating an array of spaces, which have the property that they can be either red or black. However, I want to prevent red from being enclosed by black. I have some examples to show exactly what I mean:
0 0 0 0 0 0 0 1
0 1 0 0 0 0 1 0
1 0 1 0 0 0 0 1
0 1 0 0 1 1 1 0
0 0 0 0 1 0 1 0
1 1 1 0 1 1 1 0
0 0 1 0 0 0 0 0
0 0 1 0 0 0 0 0
If red is 0 and black is 1, then this example contains four enclosures, all of which I want to avoid when I generate the array. The inputs I have are the size of the array and the number of 1s I can generate.
How would I go about doing this?
Does this code work for you?
Basically I fill a matrix from left to right, from top to bottom.
When I have to assign 0 or 1 to a cell, I check (north and west) if adding a 1 could enclose a 0; in this case I put a 0, else a random 0 or 1.
import sys, random
n = int(sys.argv[1])
m = int(sys.argv[2])
# fill matrix with zeroes
matrix = [[0 for _ in range(m)] for _ in range(n)]
# functions to get north, south, west and east
# cell wrt this cell.
# If we are going out of bounds, we suppose the matrix
# is surrounded by 1s.
def get_n(r, c):
    if r <= 0: return 1
    return matrix[r - 1][c]
def get_s(r, c):
    if r >= n - 1: return 1
    return matrix[r + 1][c]
def get_w(r, c):
    if c <= 0: return 1
    return matrix[r][c - 1]
def get_e(r, c):
    if c >= m - 1: return 1
    return matrix[r][c + 1]
# Checks if the cell is already enclosed by 3 1s.
def enclosed(r, c):
    enclosing = get_n(r, c) + get_s(r, c) + get_w(r, c) + get_e(r, c)
    if (enclosing > 3): raise Exception('Got a 0 enclosed by 1s')
    return enclosing == 3
for r in range(n):
    for c in range(m):
        # check west and north
        if enclosed(r, c - 1) or enclosed(r - 1, c):
            matrix[r][c] = 0
        else:
            matrix[r][c] = random.randint(0, 1)
        print(str(matrix[r][c]) + ' ', end=' ')
    print('')
Sample run: python spaces.py 10 10
So you can do the following:
1. Fill the array with zeroes.
2. Randomly select a point.
3. If the condition holds, flip its color.
4. Repeat from step 2 or exit.
The condition holds for the all-zeros array and is preserved by every iteration, so by induction it also holds for the final array.
In step 4 you can decide whether to stop or continue by running, say, N=a*b*1000 iterations, or by checking whether the red/black ratio is close to 1. In both cases the result will be slightly biased, since you start from all zeros.
Now, what is the condition? You have to ensure that all black points are connected and all red points are connected as well. In other words, there are at most 2 connected clusters. Flipping a color could create more connected clusters, so you keep a flip only when the resulting number of clusters is one or two. You can do the check quite efficiently using the Union-Find algorithm.
Edit: if however you want to permit black points to be surrounded by red ones but not vice-versa, you may change the condition to have any number of black clusters but only 0 or 1 of red clusters.
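The cluster check could look roughly like this. It is a sketch under stated assumptions, not a tuned implementation: count_clusters and try_flip are hypothetical names, cells are 4-connected, and the clusters are recounted from scratch after every flip, which is simple but slower than maintaining the structure incrementally.

```python
def count_clusters(grid, color):
    """Count 4-connected clusters of `color` in `grid` using union-find."""
    rows, cols = len(grid), len(grid[0])
    parent = {}

    def find(p):
        # Walk up to the root, halving the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(p, q):
        root_p, root_q = find(p), find(q)
        if root_p != root_q:
            parent[root_p] = root_q

    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != color:
                continue
            parent[(r, c)] = (r, c)
            # Merge with the already-visited neighbours (north and west).
            if r > 0 and grid[r - 1][c] == color:
                union((r, c), (r - 1, c))
            if c > 0 and grid[r][c - 1] == color:
                union((r, c), (r, c - 1))

    return len({find(p) for p in parent})


def try_flip(grid, r, c):
    """Flip grid[r][c]; keep the flip only if each color forms at most one cluster."""
    grid[r][c] = 1 - grid[r][c]
    if count_clusters(grid, 0) <= 1 and count_clusters(grid, 1) <= 1:
        return True
    grid[r][c] = 1 - grid[r][c]  # undo the flip
    return False
```

For the relaxed condition from the edit above, the check on count_clusters(grid, 1) would simply be dropped.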
This would be a possible way to check the condition:
def findStart(myArr):
    # Return the coordinates of the first 0 found, or None if there is none
    for i in range(len(myArr)):
        for j in range(len(myArr[0])):
            if myArr[i][j] == 0:
                return (i, j)
    return None
def checkCon(myArr, number_Ones):
    width = len(myArr[0])
    height = len(myArr)
    pen = [] # A list of all points that are waiting to get a visit
    vis = [] # A list of all points that are already visited
    start = findStart(myArr)
    if start is not None:
        pen.append(start) # Begin the search at the first 0
    while len(pen) != 0: # Visit points as long as there are points left
        p = pen.pop() # Pick a point to visit
        if p in vis:
            continue # do nothing since this point already was visited
        vis.append(p)
        x, y = p
        # A vertical check (bounds are tested before indexing)
        if x - 1 >= 0 and myArr[x-1][y] == 0:
            pen.append((x-1, y))
        if x + 1 < height and myArr[x+1][y] == 0:
            pen.append((x+1, y))
        # A horizontal check
        if y - 1 >= 0 and myArr[x][y-1] == 0:
            pen.append((x, y-1))
        if y + 1 < width and myArr[x][y+1] == 0:
            pen.append((x, y+1))
    print((height*width - number_Ones) == len(vis)) # if True, all zeros are connected and not enclosed
To clarify this is just a concept to check the condition. The idea is to visit all connected zeros and see if there are any left (that are not connected). If that is the case, there are some enclosed.
This method also doesn't work when the 1's form a frame around the matrix like this:
1 1 1 1
1 0 0 1
1 0 0 1
1 1 1 1
Again, just a concept :)
The problem has two parts actually. Generating the board state, and then checking if it is correct. I realised that checking the correctness was actually worse than just being sure correct states were always generated. This is what I did:
Note that I have defined self.WallSpaces to be an array equal in length to the height of my array, comprised of integers with the number of bits equal to the width of my array. self.Width and self.Height provide the end indices for the array. Basically, Intersects works by checking all the spaces surrounding a point for 1s, except the direction the space was "built from" (see below) and returning True if any of these are the edge of the array or a 1.
def Intersects(self, point, direction):
    if (point[0] > 0):
        if (direction != [1, 0] and self.WallSpaces[point[0] - 1] & (1 << point[1]) != 0):
            return True
        if (point[1] == 0 or self.WallSpaces[point[0] - 1] & (1 << (point[1] - 1)) != 0):
            return True
        if (point[1] == self.Width or self.WallSpaces[point[0] - 1] & (1 << (point[1] + 1)) != 0):
            return True
    else:
        return True
    if (point[0] < self.Height):
        if (direction != [-1, 0] and self.WallSpaces[point[0] + 1] & (1 << point[1]) != 0):
            return True
        if (point[1] == 0 or self.WallSpaces[point[0] + 1] & (1 << (point[1] - 1)) != 0):
            return True
        if (point[1] == self.Width or self.WallSpaces[point[0] + 1] & (1 << (point[1] + 1)) != 0):
            return True
    else:
        return True
    if (point[1] == 0 or (direction != [0, 1] and self.WallSpaces[ point[0] ] & (1 << (point[1] - 1)) != 0)):
        return True
    if (point[1] == self.Width or (direction != [0, -1] and self.WallSpaces[ point[0] ] & (1 << (point[1] + 1)) != 0)):
        return True
    return False
The directions GPacW.Left, GPacW.Right, GPacW.Up, and GPacW.Down represent the cardinal directions for movement. This function works by constructing "walls" in the array from random points, which can turn in random directions, ending when they have intersected twice.
def BuildWalls(self):
    numWalls = 0
    directions = [ [GPacW.Left, GPacW.Right], [GPacW.Up, GPacW.Down] ]
    start = [ random.randint(0, self.Height), random.randint(0, self.Width) ]
    length = 0
    horizontalOrVertical = random.randint(0, 1)
    direction = random.randint(0, 1)
    d = directions[horizontalOrVertical][direction]
    intersected = False
    while (numWalls < self.Walls):
        while (start == [0, 0] or start == [self.Height, self.Width] or self.Intersects(start, d)):
            start = [ random.randint(0, self.Height), random.randint(0, self.Width) ]
        if (length == 0):
            horizontalOrVertical = not horizontalOrVertical
            direction = random.randint(0, 1)
            length = random.randint(3, min(self.Height, self.Width))
            d = directions[horizontalOrVertical][direction]
        if (self.WallSpaces[ start[0] ] & (1 << start[1] ) == 0):
            self.WallSpaces[ start[0] ] |= 1 << start[1]
            numWalls += 1
            length -= 1
        if (0 <= (start[0] + d[0]) <= self.Height and 0 <= (start[1] + d[1]) <= self.Width):
            start[0] += d[0]
            start[1] += d[1]
        else:
            start = [0,0]
        if (self.Intersects(start, d)):
            if (intersected):
                intersected = False
                start = [0,0]
                length = 0
            else:
                intersected = True
    return

Is there a way to let Python generate multiple if statements?

I just need to add +30 to every number in every if statement. I need 36 of these, is there a way to let turtle make more if statements or something similar? I'm really stuck and the manual way would be crazy.
For example:
if 0 <= x <=30 and 0 <= y <= 30:
    turtle.drawsstuff
if 30 <= x <=60 and 0 <= y <= 60:
    etc.
Use a for loop.
for n in range(0, 36 * 30, 30):
    if n <= x <= n + 30 and 0 <= y <= n + 30:
        pass #do something
for n in range(0, 36 * 30, 30):
    if n <= x <= (n+30) and n <= y <= (n+30):
        pass # (do stuff)
range can take an optional third argument for the "step" value. For reference, see Python's documentation on range.
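If the 36 conditions describe a 6 x 6 grid of 30-unit squares (an assumption, the question does not spell out the layout), the square containing a point can also be computed directly instead of being searched for in a loop. CELL, GRID and cell_for are hypothetical names for this sketch.

```python
CELL = 30  # side length of one square, taken from the question
GRID = 6   # assumption: the 36 squares form a 6 x 6 grid


def cell_for(x, y):
    """Return (column, row) of the square containing (x, y), or None if outside."""
    col, row = int(x // CELL), int(y // CELL)
    if 0 <= col < GRID and 0 <= row < GRID:
        return col, row
    return None


print(cell_for(45, 10))  # (1, 0)
print(cell_for(200, 5))  # None
```

Unlike the chained comparisons in the question, a point lying exactly on a shared edge (for example x == 30) is assigned to only one square here, the one with the higher index.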
