How to call a function with data passed in later - python

I am working on a school project where I calculate different predictors for some data. I created functions that build these predictors so I can use them in a for loop.
predictor_day = (
    [(f"Median {x}", create_median_predictor(x)) for x in (10, 30, 60, 80, 120)]
    + [(f"Average {x}", create_average_predictor(x)) for x in (10, 30, 60, 80, 120)]
    + [
        (f"Weighted average {x}", create_weighted_average_predictor(x))
        for x in (10, 30, 60, 80, 120)
    ]
)
This is what one of the predictor functions looks like:
def create_median_predictor(window_size):
    def median_predictor(train_data):
        return median(train_data[-window_size:])
    return median_predictor
Now I also wanted to create a predictor, which takes all the data and returns a median of it, this is what it looks like:
def all_data_median_predictor(train_data):
    return median(train_data)
and this is where I am calling it:
for predictor in predictor_day:
    prediction = predictor(train_data)
But I can't figure out how to add this one to my predictor_day variable, as it is always missing the parameter train_data. Is there any way I can add it to this variable?

Based on the other lists, I assume the type of predictor_day is List[Tuple[str, Callable]]. In that case, add the function object itself, without calling it:
predictor_day = (
    [(f"Median {x}", create_median_predictor(x)) for x in (10, 30, 60, 80, 120)]
    + [(f"Average {x}", create_average_predictor(x)) for x in (10, 30, 60, 80, 120)]
    + [
        (f"Weighted average {x}", create_weighted_average_predictor(x))
        for x in (10, 30, 60, 80, 120)
    ]
    + [("all data median predictor", all_data_median_predictor)]  # the change is in this line
)
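As a minimal sketch of how such a list might be consumed, assuming each entry really is a (name, callable) pair: the loop unpacks the tuple and only then calls the predictor with the data. The window sizes, the sample train_data and the results dictionary are made up for illustration.

```python
from statistics import median


def create_median_predictor(window_size):
    # Factory: returns a predictor that only looks at the last `window_size` values.
    def median_predictor(train_data):
        return median(train_data[-window_size:])
    return median_predictor


def all_data_median_predictor(train_data):
    # Already has the (train_data) signature, so it needs no factory.
    return median(train_data)


# Functions are stored uncalled; the data is supplied later, inside the loop.
predictor_day = (
    [(f"Median {x}", create_median_predictor(x)) for x in (2, 4)]
    + [("All data median", all_data_median_predictor)]
)

train_data = [1, 2, 3, 4, 100]  # hypothetical sample data

results = {}
for name, predictor in predictor_day:
    results[name] = predictor(train_data)

print(results)
```

Writing all_data_median_predictor() with parentheses inside the list would call it immediately with no arguments, which is one likely source of the "missing parameter" error.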

Related

How to split a list into 2 unsorted groupings based on the median

I am aiming to sort a list into two subsections that don't need to be sorted.
Imagine I have a list of length 10 that has values 0-9 in it.
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
I would want to sort it in a way that indices 0 through 4 contain values 10, 20, 30, 40, and 50 in any ordering.
For example:
# SPLIT HERE V
[40, 30, 20, 50, 10, 70, 60, 80, 90, 100]
I've looked into various divide and conquer sorting algorithms, but I'm uncertain which one would be the best to use in this case.
My current thought is to use quicksort, but I believe there is a better way to do what I am searching to do since everything does not need to be sorted exactly, but sorted in a "general" sense that all values are on their respective side of the median in any ordering.
To me this seems to do the trick, unless you specifically need the output to be unordered:
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
sorted_arr = sorted(arr)
median_index = len(arr)//2
sub_list1, sub_list2 = sorted_arr[:median_index],sorted_arr[median_index:]
this outputs :
[10, 20, 30, 40, 50] [60, 70, 80, 90, 100]
The statistics package has a method for finding the median of a list of numbers. From there, you can use a for loop to separate the values into two separate lists based on whether or not it is greater than the median:
from statistics import median
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
med = median(arr)
result1 = []
result2 = []
for item in arr:
    if item <= med:
        result1.append(item)
    else:
        result2.append(item)
print(result1)
print(result2)
This outputs:
[50, 30, 20, 10, 40]
[90, 100, 70, 60, 80]
If you would like to solve the problem from scratch, you could implement the Median of Medians algorithm to find the median of an unsorted array in linear time. What comes next depends on your goal.
If you want to reorder in place, you could use the result of Median of Medians to select a pivot for the partition algorithm (the partitioning step of quicksort).
On the other hand, in Python you could simply iterate through the array and append each value to a left or right list accordingly.
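If a library call is acceptable instead of a hand-written selection, NumPy's np.partition gives the same kind of "each side of the median, in any order" result. This is a swapped-in technique (NumPy's default selection algorithm is introselect, not Median of Medians), shown here only as a sketch; the names k, part, lower and upper are illustrative.

```python
import numpy as np

arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]

k = len(arr) // 2  # index where the upper half starts

# After partitioning, the element at index k is in its sorted position,
# everything before it is smaller or equal, everything after is greater or equal.
# The order inside each half is unspecified.
part = np.partition(np.array(arr), k)

lower, upper = part[:k], part[k:]
print(lower.tolist(), upper.tolist())
```

np.partition returns a copy; calling the partition method on an existing array would reorder it in place.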
The other answers so far split the list into two lists. Based on your example, my impression is that there are two groupings but the output is a single list.
import numpy as np
# setup
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
# output array
unsorted_grouping = []
# get median
median = np.median(arr)
# loop over array, if greater than median, append. Append always assigns
# values at the end of array
# else insert it at position 0, the beginning / left side
for val in arr:
    if val >= median:
        unsorted_grouping.append(val)
    else:
        unsorted_grouping.insert(0, val)
# output
unsorted_grouping
[40, 10, 20, 30, 50, 90, 100, 70, 60, 80]
You can use the statistics module to calculate the median, and then use it to add each value to one group or the other:
import statistics
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
median = statistics.median(arr)
bins = [], [] # smaller and bigger values
for value in arr:
    bins[value > median].append(value)
print(bins[0]) # -> [50, 30, 20, 10, 40]
print(bins[1]) # -> [90, 100, 70, 60, 80]
You can do this with numpy (which is significantly faster if arr is large):
import numpy as np
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
arr = np.array(arr)
median = np.median(arr)
result1 = arr[arr <= median]
result2 = arr[arr > median]
Output:
array([50, 30, 20, 10, 40])
array([ 90, 100, 70, 60, 80])
And if you want one list as the output, you can do:
[*result1, *result2]
Output:
[50, 30, 20, 10, 40, 90, 100, 70, 60, 80]
My first Python program, so please bear with me.
Basically does QuickSort, as you suggest, but only sub-sorts the partition that holds the median index.
arr = [50, 30, 20, 10, 90, 40, 100, 70, 60, 80]
def partition(a, left, right):
    pivot = (left + right) // 2
    a[left], a[pivot] = a[pivot], a[left]  # swap
    pivot = left
    left += 1
    while right >= left:
        while left <= right and a[left] <= a[pivot]:
            left += 1
        while left <= right and a[right] > a[pivot]:
            right -= 1
        if left <= right:
            a[left], a[right] = a[right], a[left]
            left += 1
            right -= 1
        else:
            break
    a[pivot], a[right] = a[right], a[pivot]
    return right
def medianSplit(array):
    left = 0
    right = len(array) - 1
    med = len(array) // 2
    while left < right:
        pivot = partition(array, left, right)
        if pivot > med:
            right = pivot - 1
        else:
            left = pivot + 1
def main():
    medianSplit(arr)
    print(arr)
main()

How to define a function that will check any data frame for Age column and return bins?

I am trying to define a function that will take any dataframe with an 'Age' column, bin the ages, and return how many Xs are in each age category.
Consider the following:
def age_range():
    x = input("Enter Dataframe Name: ")
    df = x
    df['Age']
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s','100s']
    pd.df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return print("Age Ranges:", result)
I keep getting a TypeError: string indices must be integers.
I thought that by calling the df['Age'], it would return a one-column series from which the binning and labelling would work effectively. But it isn't working for me.
The problem lies here:
x = input("Enter Dataframe Name: ")  # input() returns a string
df = x  # so df is a string too, not a DataFrame
df['Age']  # indexing a string with a string raises the TypeError
Passing the DataFrame in as an argument would resolve your problem:
def age_range(df):
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    result = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    return result
For example, you can run it like this:
import random
import pandas as pd
df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
df["AgeRange"] = age_range(df)
or
df = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(500)]})
AgeRangeDf = pd.DataFrame({"Age_Range" :age_range(df)})
Assuming you want the total bin counts over the DataFrame:
from numpy import random
import pandas as pd
df1 = pd.DataFrame({'Age' : [random.randint(1, 99) for i in range(100)]})
def age_range(df):
    bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
    labels=['0-9', '10-19', '20s', '30s', '40s', '50s', '60s', '70s', '80s', '90s']
    # label every row with its age bin (this adds an 'AgeGroup' column to df)
    df['AgeGroup'] = pd.cut(df['Age'], bins=bins, labels=labels, right=False)
    # count how many rows fall into each bin
    result = pd.DataFrame(df['AgeGroup'].groupby(df['AgeGroup']).count())
    return result
print(age_range(df1))
This returns a single-column DataFrame.

Numpy array not getting updated

I have a numpy array data3 of size 640x480.
I have written this code to update the values that meet a specific condition, and it works well.
data4=np.where((data3<=119) & (data3>110),13,data3)
This is the list of bin edges:
b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]
To update the data for every bin, I use the following code:
for i in range(2,17):
    data4=np.where((data3<=b[i]) & (data3>b[i-1]),i+1,data3)
Any pointers on why the data doesn't get updated?
When i=16, the condition becomes:
data4=np.where((data3<=150) & (data3>150),i+1,data3)
The condition "(data3<=150) & (data3>150)" is False everywhere, so every element is taken from data3. Each iteration also rebuilds data4 from data3 and discards the previous iteration's result, so only the last iteration counts.
So, at the end of the loop, you get data4=data3.
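One way to keep the updates from every iteration, sketched under the assumption that the conditions should always be tested against the original data3 while the results accumulate in data4: copy data3 once, then assign through a boolean mask. The small data3 array here is made up for illustration.

```python
import numpy as np

b = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 150]

# Hypothetical stand-in for the 640x480 array.
data3 = np.array([[5, 15, 115],
                  [125, 150, 200]])

data4 = data3.copy()  # start from the original values once, outside the loop
for i in range(2, 17):
    # Test against the untouched data3 so earlier writes cannot match later bins.
    mask = (data3 <= b[i]) & (data3 > b[i - 1])
    data4[mask] = i + 1  # in-place update, so earlier iterations are kept

print(data4)
```

Values outside every bin (5 and 200 in this sample) are left unchanged, and the final empty bin (150, 150] simply matches nothing.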

How to use tweening in Python, without losing accuracy?

I've been struggling to use tweening to make mouse movements smooth in Python. I am currently trying to automate some repetitive tasks.
I've tried to use tweening to remove some of the roughness that occurs without smoothing. However, doing so loses a noticeable amount of accuracy: my dx and dy values get split by a number, and I end up with remainders. This could possibly be solved by taking the greatest common divisor of both values (since both dx and dy need to be split by the same number), but unfortunately the GCD turns out too small.
Since the mouse cannot move a fraction of a pixel on screen, I end up with a noticeable loss of accuracy.
Question: How to apply tweening on mouse movements, without losing accuracy?
import pytweening
import win32api
import win32con
from time import sleep
dy = [50, 46, 42, 38, 33, 29, 24, 20, 15, 10, 10]
dx = [-35, 6, -55, -43, 0, 17, 29, 38, 42, 42, 38]
while True:
    count = 0
    values = [(pytweening.getPointOnLine(0, 0, x, y, 0.20)) for x, y in zip(dx, dy)]
    while win32api.GetAsyncKeyState(win32con.VK_RBUTTON) and win32api.GetAsyncKeyState(win32con.VK_LBUTTON):
        if count < len(dx):
            for _ in range(5):
                win32api.mouse_event(1, int(values[count][0]), int(values[count][1]), 0, 0)
                sleep(0.134 / 5)
            count += 1
The fundamental problem here is that you are using relative movement in integer amounts, which will not add up to the total movement you are looking for. If you only want to move linearly, you also don't need PyTweening at all. How about this solution?
import win32api
import win32con
from time import sleep
Npoints = 5
sleeptime = 0.134 / Npoints
dys = [50, 46, 42, 38, 33, 29, 24, 20, 15, 10, 10]
dxs = [-35, 6, -55, -43, 0, 17, 29, 38, 42, 42, 38]
x, y = win32api.GetCursorPos()
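for dx, dy in zip(dxs, dys):
    ddx = dx/Npoints
    ddy = dy/Npoints
    for _ in range(Npoints):
        # keep x and y as floats so the fractional parts are not lost
        x += ddx
        y += ddy
        win32api.SetCursorPos((int(x), int(y)))  # SetCursorPos takes one (x, y) tuple
        sleep(sleeptime)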
Note that there will still be some very small round-off error and that the cursor will move in a straight line between the points. If the cursor starts at (0, 0), this is the shape it will make (the red crosses are the points where the cursor will be set to):
If you wanted to move in smooth curves through the points and you're OK with using numpy and scipy, this will handle that:
# reuses win32api, sleep, sleeptime, dxs and dys from the previous snippet
import numpy as np
import scipy.interpolate as sci
totalpoints = 50 # you can set this to a larger number to get closer spaced points
x, y = win32api.GetCursorPos()
# work out absolute coordinates of new points
xs = np.cumsum([x, *dxs])
ys = np.cumsum([y, *dys])
# fit spline between the points (s=0 makes the spline hit all the points)
tck, u = sci.splprep([xs, ys], s=0)
# Evaluate the spline and move to those points
for x, y in zip(*sci.splev(np.linspace(0, 1, totalpoints), tck)):
    win32api.SetCursorPos((int(x), int(y)))  # SetCursorPos takes one (x, y) tuple
    sleep(sleeptime)
This results in positions as shown below:
Question: Tweening, without losing accuracy?
Reference:
PyTweening - getLinePoint()
x, y = getLinePoint(startPoint x, startPoint y, endPoint x, endPoint y, intervall)
The getLinePoint() function finds a point on the provided line.
Combine your lists, dx and dy, into a list of (x, y) tuples:
dx = [-35, 6, -55, -43, 0, 17, 29, 38, 42, 42, 38]
dy = [50, 46, 42, 38, 33, 29, 24, 20, 15, 10, 10]
points = list(zip(dx, dy))
print(points)
Output:
[(-35, 50), (6, 46), (-55, 42), (-43, 38), (0, 33), (17, 29), (29, 24), (38, 20), (42, 15), (42, 10), (38, 10)]
Process this list of points in a double for loop.
import pytweening
for startPoint in points:
    for endPoint in points:
        x, y = pytweening.getPointOnLine(startPoint[0], startPoint[1],
                                         endPoint[0], endPoint[1],
                                         0.20)
        x, y = int(x), int(y)
        print('{}, '.format((x, y)), end='')
        # win32api.mouse_event(1, x, y, 0, 0)
        # sleep(0.134)
Output: The end points are always reached!
First move from (-35, 50) to (6, 46):
(-35, 50), (-26, 49), (-39, 48), (-36, 47), (-28, 46), (-24, 45),(-22, 44),
(-20, 44), (-19, 43), (-19, 42), (-20, 42), (-2, 46), (6, 46)
... (omitted for brevity)
Last move from (42, 10) to (38, 10):
(42, 10), (41, 10), (23, 18), (31, 17), (19, 16), (21, 15), (30, 14),
(33, 13), (36, 12), (38, 12), (38, 11), (38, 10), (38, 10)
Tested with Python: 3.6 - pytweening: 1.0.3
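For the relative-movement variant in the question, one way to avoid losing the remainders is to round the cumulative target at each step and move by the difference from the previous target, so rounding errors cannot pile up. This is only a sketch: split_move is a hypothetical helper, not part of pytweening or pywin32, and the linear interpolation could be replaced by any easing function that maps 0..1 to 0..1 and ends at 1.

```python
def split_move(dx, dy, steps):
    """Split an integer move (dx, dy) into `steps` integer sub-moves.

    The sub-moves always sum exactly to (dx, dy).
    """
    moves = []
    prev_x = prev_y = 0
    for i in range(1, steps + 1):
        # Round where the cursor should be by now, relative to the start ...
        target_x = round(dx * i / steps)
        target_y = round(dy * i / steps)
        # ... and move only by the difference from the last rounded target.
        moves.append((target_x - prev_x, target_y - prev_y))
        prev_x, prev_y = target_x, target_y
    return moves


dys = [50, 46, 42, 38, 33, 29, 24, 20, 15, 10, 10]
dxs = [-35, 6, -55, -43, 0, 17, 29, 38, 42, 42, 38]

for dx, dy in zip(dxs, dys):
    sub_moves = split_move(dx, dy, 5)
    # Each (sx, sy) pair could be passed to a relative mouse move call.
    print((dx, dy), sub_moves)
```

Each sub-move is at most one pixel away from the ideal fractional step, and the last one always lands on the exact end point.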

Building a set of tile coordinates

I have an image which I want to divide into tiles of specific size (and cropping tiles that don't fit).
The output of this operation should be a list of coordinates in tuples [(x, y, width, height),...]. For example, dividing a 50x50 image in tiles of size 20 would give: [(0,0,20,20),(20,0,20,20),(40,0,10,20),(0,20,20,20),(20,20,20,20),(40,20,10,20),...] etc.
Given a height, width and tile_size, it seems like I should be able to do this in a single list comprehension, but I can't wrap my head around it. Any help would be appreciated. Thanks!
Got it with:
output = [(x,y,w,h) for x,w in zip(range(width)[::tile_size],[tile_size]*(w_tiles-1) + [w_padding]) for y,h in zip(range(height)[::tile_size],[tile_size]*(h_tiles-1) + [h_padding])]
import itertools
def tiles(h, w, ts):
    # one list comprehension producing (x, y, width, height) tuples;
    # edge tiles are cropped to whatever remains of the image
    return [tuple(list(ele) + [ts if w-ele[0] > ts else w-ele[0], ts if h-ele[1] > ts else h-ele[1]]) for ele in itertools.product(*[filter(lambda x: x % ts == 0, range(w)), filter(lambda x: x % ts == 0, range(h))])]
print(tiles(50, 50, 20))
[(0, 0, 20, 20), (0, 20, 20, 20), (0, 40, 20, 10), (20, 0, 20, 20), (20, 20, 20, 20), (20, 40, 20, 10), (40, 0, 10, 20), (40, 20, 10, 20), (40, 40, 10, 10)]
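A shorter variant of the same idea, as a sketch: step through the image with range(0, size, tile_size) and crop each tile with min(). The function name tile_coords is illustrative; rows are generated first so the order matches the example in the question.

```python
def tile_coords(width, height, tile_size):
    # (x, y, tile_width, tile_height), row by row; edge tiles are cropped
    # to the part that still fits inside the image.
    return [
        (x, y, min(tile_size, width - x), min(tile_size, height - y))
        for y in range(0, height, tile_size)
        for x in range(0, width, tile_size)
    ]


coords = tile_coords(50, 50, 20)
print(coords)
```

Swapping the two for clauses yields the column-by-column order used in the answer above.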
