I would like to know how I can round a number in numpy to an upper or lower threshold which is a function of a predefined step size. Stated more clearly: if I have the number 123 and a step size of 50, I need to round 123 to the closest of either 100 or 150, in this case 100. I came up with the function below, which does the job, but I wonder if there is a better, more succinct way to do this.
def getRoundedThresholdv1(a, MinClip):
    import numpy as np
    import math
    digits = int(math.log10(MinClip)) + 1
    b = np.round(a, -digits)
    if b > a:  # rounded up
        c = b - MinClip
        UpLow = np.array((b, c))
    else:  # rounded down
        c = b + MinClip
        UpLow = np.array((c, b))
    AbsDelta = np.abs(a - UpLow)
    return UpLow[AbsDelta.argmin()]

getRoundedThresholdv1(143, 50)
The solution by pb360 is much better; it uses the second argument of the built-in round in Python 3.
I think you don't need numpy:
def getRoundedThresholdv1(a, MinClip):
    return round(float(a) / MinClip) * MinClip
Here a is a single number; if you want to vectorize this function, you only need to replace round with np.round and float(a) with np.array(a, dtype=float).
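For example, a minimal vectorized sketch of that substitution (my example values, not from the original post):

import numpy as np

def getRoundedThresholdv1(a, MinClip):
    return np.round(np.array(a, dtype=float) / MinClip) * MinClip

getRoundedThresholdv1([123, 143, 177], 50)  # array([100., 150., 200.])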
Summary: this is a correct way to do it; the top answer has cases that do not work:
import math
from decimal import Decimal
from typing import Union

def round_step_size(quantity: Union[float, Decimal], step_size: Union[float, Decimal]) -> float:
    """Rounds a given quantity to a specific step size

    :param quantity: required
    :param step_size: required
    :return: decimal
    """
    precision: int = int(round(-math.log(step_size, 10), 0))
    return float(round(quantity, precision))
My reputation is too low to post a comment on the top answer from Ruggero Turra and point out the issue. However, it has cases which do not work, for example:
def getRoundedThresholdv1(a, MinClip):
    return round(float(a) / MinClip) * MinClip

getRoundedThresholdv1(13.200000000000001, 0.0001)
This returns 13.200000000000001 right back, whether using numpy or the standard library round. I didn't even find this by stress testing the function; it just came up when using it in production code and spat out an error.
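To illustrate in a REPL session (values taken from the example above; outputs as reported in this answer):

>>> round(13.200000000000001 / 0.0001) * 0.0001  # divide-round-multiply comes right back
13.200000000000001
>>> import math
>>> int(round(-math.log(0.0001, 10), 0))  # the precision used by round_step_size
4
>>> round(13.200000000000001, 4)
13.2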
Note: full credit for this answer goes to an open source GitHub repo, which is not mine, found here.
Note that round() in Ruggero Turra's answer rounds halves to the nearest even integer ("banker's rounding"). Meaning:

a = 0.5
round(a)
# Out: 0
Which may not be what you expect.
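A couple more cases showing the round-half-to-even behaviour:

round(0.5), round(1.5), round(2.5)
# (0, 2, 2): exact halves go to the nearest even integer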
In case you want 'classical' rounding, you can use this function, which supports both scalars and NumPy arrays:
import numpy as np

def getRoundedThresholdv1(a, MinClip):
    scaled = a / MinClip
    return np.where(scaled % 1 >= 0.5, np.ceil(scaled), np.floor(scaled)) * MinClip
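A quick sanity check of the 'classical' behaviour (my example values):

getRoundedThresholdv1(np.array([123, 125, 143]), 50)
# array([100., 150., 150.]): the .5 case (125) rounds up rather than to even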
Alternatively, you could use NumPy's digitize. It requires you to define the array of your steps. digitize essentially ceils your value to the next step, so in order to round in a 'classical' way we need an intermediate step.
You can use this:
import numpy as np

def getRoundedThresholdv1(a, MinClipBins):
    intermediate = (MinClipBins[1:] + MinClipBins[:-1]) / 2
    return MinClipBins[np.digitize(a, intermediate)]
You can then call it like:
bins = np.array([0, 50, 100, 150])
test1 = getRoundedThresholdv1(74, bins)
test2 = getRoundedThresholdv1(125, bins)
Which gives:
test1 = 50
test2 = 150
I'm starting with numba, and my first goal is to try to accelerate a not-so-complicated function with a nested loop.
Given the following class:
class TestA:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def get_mult(self):
        return self.a * self.b
and a numpy ndarray that contains TestA objects, with dimension (N,) where N is usually ~3 million.
Now given the following function:
def test_no_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in range(container_length):
        for j in range(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried playing around with numba to get it to work with the function above, but I cannot seem to get it to work with the nopython=True flag, and if it's set to False, the runtime is higher than the no-jit function.
Here is my latest try in trying to jit the function (also using nb.prange):
@nb.jit(nopython=False, parallel=True)
def test_jit(custom_class_obj_container):
    container_length = len(custom_class_obj_container)
    sum = 0
    for i in nb.prange(container_length):
        for j in nb.prange(i + 1, container_length):
            obj_i = custom_class_obj_container[i]
            obj_j = custom_class_obj_container[j]
            sum += (obj_i.get_mult() + obj_j.get_mult())
    return sum
I've tried to search around, but I cannot find a tutorial on how to declare a custom class in the signature, or on how to accelerate a function of this sort and get it to run on the GPU, possibly with the CUDA libraries, which are installed and ready to use (previously used with TensorFlow). Any info regarding that would be highly appreciated.
The numba docs give an example of creating a custom type, even for nopython mode: https://numba.pydata.org/numba-doc/latest/extending/interval-example.html
In your case, though, unless this is a really slimmed-down version of what you actually want to do, it seems like the easiest approach would be to re-use existing types. Additionally, constructing a 3M-length object array is going to be slow and produce fragmented memory (as the objects are not stored in contiguous blocks).
An example of how using record arrays might be used to solve the problem:
import numba
import numpy as np

x_dt = np.dtype([('a', np.float64),
                 ('b', np.float64)])
n = 30000
buf = np.arange(n * 2).reshape((n, 2)).astype(np.float64)
vec3 = np.recarray(n, dtype=x_dt, buf=buf)

@numba.njit
def mult(a):
    return a.a * a.b

@numba.jit(nopython=True, parallel=True)
def sum_of_prod(vector):
    sum = 0
    vector_len = len(vector)
    for i in numba.prange(vector_len):
        for j in numba.prange(i + 1, vector_len):
            sum += mult(vector[i]) + mult(vector[j])
    return sum

sum_of_prod(vec3)
FWIW, I'm no numba expert. I found this question when searching for how to implement a custom type in numba for non-numerical stuff. In your case, because this is highly numerical, I think a custom type is probably overkill.
I am writing a scientific code in python to calculate the energy of a system.
Here is my function: cte1, cte2, cte3, cte4 are constants computed beforehand; pii is np.pi (also precomputed, since looking it up slows the loop otherwise). I calculate the 3 components of the total energy, then sum them up.
def calc_energy(diam):
    Energy1 = cte2 * ((pii * diam**2 / 4) * t)
    Energy2 = cte4 * (pii * diam) * t
    d = diam / t
    u = np.sqrt(d**2 / (1 + d**2))
    cc = u**2
    E = sp.special.ellipe(cc)
    K = sp.special.ellipk(cc)
    Id = cte3 * d * (d**2 + (1 - d**2) * E / u - K / u)
    Energy3 = cte * t**3 * Id
    total_energy = Energy1 + Energy2 + Energy3
    return (total_energy, Energy1)
My first idea was to simply loop over all values of the diameter :
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9  # Diametre
diametres = np.arange(start_diam, stop_diam, step_diam)

totalEnergy, Energy1 = [], []  # result lists (initialization added for completeness)
for d in diametres:
    res1, res2 = calc_energy(d)
    totalEnergy.append(res1)
    Energy1.append(res2)
In an attempt to speed up calculations, I decided to use numpy to vectorize, as shown below :
diams = diametres.reshape(-1,1) #If not reshaped, calculations won't run
r1 = np.apply_along_axis(calc_energy,1,diams)
However, the "vectorized" solution is actually slower: when timing, I get 5 seconds for the first solution and 18 seconds for the second one.
I guess I'm doing something wrong, but I can't figure out what.
With your current approach, you're applying a Python function to each element of your array, which carries additional overhead. Instead, you can pass the whole array to your function and get an array of answers back. Your existing function appears to work fine without any modification.
import numpy as np
from scipy import special

cte = 2
cte1 = 2
cte2 = 2
cte3 = 2
cte4 = 2
pii = np.pi
t = 2

def calc_energy(diam):
    Energy1 = cte2 * ((pii * diam**2 / 4) * t)
    Energy2 = cte4 * (pii * diam) * t
    d = diam / t
    u = np.sqrt(d**2 / (1 + d**2))
    cc = u**2
    E = special.ellipe(cc)
    K = special.ellipk(cc)
    Id = cte3 * d * (d**2 + (1 - d**2) * E / u - K / u)
    Energy3 = cte * t**3 * Id
    total_energy = Energy1 + Energy2 + Energy3
    return (total_energy, Energy1)

start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9  # Diametre
diametres = np.arange(start_diam, stop_diam, step_diam)

a = calc_energy(diametres)  # Pass the whole array
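Since the function returns a tuple, each element of the result is now a full array; a small usage sketch (names taken from the code above):

total_energy, energy1 = calc_energy(diametres)  # one entry per diameter
print(total_energy[:3])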
As I'm really struggling to get from R code to Python code, I would like to ask for some help. The code I want to use was provided to me on the Mathematics forum of Stack Exchange:
https://math.stackexchange.com/questions/2205573/curve-fitting-on-dataset
I do understand what is going on, but I'm really having a hard time translating the R code, as I have never seen any of it before. I have written the function to return the sum of squares, but I'm stuck on how to use something similar to the optim function. I also don't really like the guesswork in the initial values; I would prefer to run and re-run a type of optim function until I get the wanted result, because my needs for a nearly perfect curve fit are really high.
def model(par, x):
    n = len(x)
    res = []
    for i in range(1, n):
        A0 = par[3] + (par[4] - par[1]) * par[6] + (par[5] - par[2]) * par[6]**2
        if x[i] == par[6]:
            res[i] = A0 + par[1] * x[i] + par[2] * x[i]**2
        else:
            res[i] = par[3] + par[4] * x[i] + par[5] * x[i]**2
    return res
This is my model function...
def sum_squares(par, x, y):
    ss = sum((y - model(par, x))**2)  # ** is Python's power operator; ^ (as in R) would be XOR
    return ss
And this is the sum of squares
But I have no idea how to convert this:
# I found these initial values with a few minutes of guess and check.
par0 <- c(7, -1, -395, 70, -2.3, 10)
sol <- optim(par=par0, fn=sqerror, x=x, y=y)$par
To Python code...
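For reference, a close direct translation of that call is scipy.optimize.minimize with method='Nelder-Mead' (the default method of R's optim); this is only a sketch, assuming sum_squares, x, and y are defined as above:

import numpy as np
from scipy.optimize import minimize

par0 = np.array([7, -1, -395, 70, -2.3, 10], dtype=float)
sol = minimize(sum_squares, par0, args=(x, y), method='Nelder-Mead').x  # analogue of optim(...)$par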
I wrote an open source Python package (BSD license) that has a genetic algorithm (Differential Evolution) front end to the scipy Levenberg-Marquardt solver; it functions similarly to what you describe in your question. The GitHub URL is:
https://github.com/zunzun/pyeq3
It comes with a "user-defined function" example that's fairly easy to use:
https://github.com/zunzun/pyeq3/blob/master/Examples/Simple/FitUserDefinedFunction_2D.py
along with command-line, GUI, cluster, parallel, and web-based examples. You can install the package with "pip3 install pyeq3" to see if it might suit your needs.
Seems like I have been able to fix the problem.
def model(par, x):
    n = len(x)
    res = np.array([])
    for i in range(0, n):
        A0 = par[2] + (par[3] - par[0]) * par[5] + (par[4] - par[1]) * par[5]**2
        if x[i] <= par[5]:
            res = np.append(res, A0 + par[0] * x[i] + par[1] * x[i]**2)
        else:
            res = np.append(res, par[2] + par[3] * x[i] + par[4] * x[i]**2)
    return res

def sum_squares(par, x, y):
    ss = sum((y - model(par, x))**2)
    print('Sum of squares = {0}'.format(ss))
    return ss
And then I used the functions as follow:
parameter = np.array([0.0, -8.0, 0.0018, 0.0018, 0, 200])
res = least_squares(sum_squares, parameter, bounds=(-360, 360), args=(x1, y1), verbose=1)
The only problem is that it doesn't produce the results I'm looking for, mainly because my x values span [0, 360] while the y values only vary by about 0.2, so it's a hard nut to crack for this function, and it produces this poor result:
[plot of the resulting fit]
I think that the range of x values [0, 360] and of y values (which you say vary by only ~0.2) is probably not the problem. Getting good initial values for the parameters is probably much more important.
In Python with numpy/scipy, you would definitely want to avoid looping over values of x and instead do something more like:
def model(par, x):
    res = par[2] + par[3]*x + par[4]*x**2
    A0 = par[2] + (par[3] - par[0])*par[5] + (par[4] - par[1])*par[5]**2
    mask = x <= par[5]  # boolean mask so both sides of the assignment have matching shapes
    res[mask] = A0 + par[0]*x[mask] + par[1]*x[mask]**2
    return res
It's not clear to me that that form is really what you want: why should A0 (a value independent of x added to a portion of the model) be so complicated and interdependent on the other parameters?
More importantly, your sum_of_squares() function is actually not what least_squares() wants: you should return the residual array, not do the sum of squares yourself. So, that should be:
def sum_of_squares(par, x, y):
    return (y - model(par, x))
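A sketch of the corresponding call, reusing the names from the question:

from scipy.optimize import least_squares

res = least_squares(sum_of_squares, parameter, bounds=(-360, 360), args=(x1, y1))
best_par = res.x  # least_squares itself squares and sums the returned residuals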
But most importantly, there is a conceptual problem that is probably going to plague this model: your par[5] is meant to represent a breakpoint where the model changes form, and that is going to be very hard for these optimization routines to find. These routines generally make a very small change to each parameter value to estimate the derivative of the residual array with respect to that variable, in order to figure out how to change that variable. With a parameter that is essentially used as an integer, the small change in the initial value will have no effect at all, and the algorithm will not be able to determine its value. With some of the scipy.optimize algorithms (notably, leastsq) you can specify a scale for the relative change to make; with leastsq that is called epsfcn. You may need to set this as high as 0.3 or 1.0 for fitting the breakpoint to work. Unfortunately, this cannot be set per variable, only per fit. You might need to experiment with this and other options to least_squares or leastsq.
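With leastsq, setting that scale might look like this (a sketch; 0.3 is only a starting point to experiment with):

from scipy.optimize import leastsq

best_par, ier = leastsq(sum_of_squares, parameter, args=(x1, y1), epsfcn=0.3)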
I'm trying to pretty-print some divisions with Sympy, but I noticed the output isn't aligned.
import sympy
sympy.init_printing(use_unicode=True)

sympy.pprint(sympy.Mul(-1, sympy.Pow(-5, -1, evaluate=False), evaluate=False))
# Output:
# -1
# ───
# -5    <- note that "-5" is displayed slightly more to the right than "-1"
Reason/fix for this?
EDIT: I did a lot of reverse-engineering using inspect.getsource and inspect.getsourcefile but it didn't really help out in the end.
Pretty printing in Sympy seems to rely on the PrettyPrinter by Jurjen Bos.
import sympy
from sympy.printing.pretty.stringpict import *
sympy.init_printing(use_unicode=True)

prettyForm("-1") / prettyForm("-5")
# Displays:
# -1
# --
# -5
So it does display aligned, but I can't get it to use unicode.
The PrettyPrinter is called from the file sympy/printing/pretty/pretty.py in the method PrettyPrinter._print_Mul, which simply returns prettyForm.__mul__(*a)/prettyForm.__mul__(*b), with, I thought, a and b simply being ['-1'] and ['-5'], but it wouldn't work.
Found out where the weird part is coming from:
In stringpict.py, line 417:

if num.binding == prettyForm.NEG:
    num = num.right(" ")[0]
This is done ONLY for the numerator: it adds a space after the numerator if the numerator is negative… Weird!
I'm not sure there is a fix other than directly editing the file. I'm going to report this on GitHub.
Thanks all for your help and suggestion.
PS: In the end, I used pdb to help me debug and figure out what was actually going on!
EDIT: Hotfix if you can't or don't want to edit the source code:
import sympy
sympy.init_printing(use_unicode=True)
from sympy.printing.pretty.stringpict import prettyForm, stringPict

def newDiv(self, den, slashed=False):
    if slashed:
        raise NotImplementedError("Can't do slashed fraction yet")
    num = self
    if num.binding == prettyForm.DIV:
        num = stringPict(*num.parens())
    if den.binding == prettyForm.DIV:
        den = stringPict(*den.parens())
    return prettyForm(binding=prettyForm.DIV, *stringPict.stack(
        num,
        stringPict.LINE,
        den))

prettyForm.__div__ = newDiv

sympy.pprint(sympy.Mul(-1, sympy.Pow(-5, -1, evaluate=False), evaluate=False))
# Displays properly:
# -1
# ──
# -5
I just copied the function from the source code and removed the offending line.
A possible improvement would be to functools.wraps the new function with the original one.
Negative denominators are not standard and are badly handled. If you really need them, you can modify the string output given by the pretty function:
import sympy
sympy.init_printing(use_unicode=True)

def ppprint(expr):
    p = sympy.pretty(expr)
    s = p.split('\n')
    if len(s) == 3 and int(s[2]) < 0:
        s[0] = " " + s[0]
        s[1] = s[1][0] + s[1]
        p2 = "\n".join(s)
        print(p2)
    else:
        print(p)
This extends the bar and shifts the numerator by one unit for negative denominators. No warranty of robustness on big expressions.
>>> ppprint(sympy.Mul(sympy.Pow(-5, -1, evaluate=False), -1, evaluate=False))
 -1
────
 -5
I was not quite sure what you are searching for, but I think I was dealing with something similar a while ago.
I used a list comprehension and pprint for the printing.
You may find it useful.
x = amp * np.sin(2 * np.pi * 200 * times) + nse1
x2 = np.array_split(x, epochs)
Rxy[i], freqs_xy = mlab.csd(x2[i], y2[i], NFFT=nfft, Fs=sfreq)
Rxy_mean0 = [complex(sum(x)/len(x)) for x in Rxy]

import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(Rxy_mean0)
I'm really struggling with the Pandas rolling_apply function. I'm trying to apply a filter to some time series data like below and make a new series flagging outliers. I want the value to be True when the value is an outlier.
ts = pd.Series(np.random.randn(1000), index=pd.date_range('1/1/2000', periods=1000))
window, alpha, gamma = 60, .05, .03

def trim_moments(arr, alpha):
    arr = np.sort(arr)  # np.sort returns a sorted copy rather than sorting in place
    n = len(arr)
    k = int(round(n * float(alpha)) / 2)
    return np.mean(arr[k+1:n-k]), np.std(arr[k+1:n-k])

# First function that tests whether the criterion is met.
def bg_test(arr, alpha, gamma):
    local_mean, local_std = trim_moments(arr, alpha)
    return np.abs(arr - local_mean) < 3 * local_std + gamma
This is the call that I run:
outliers = pd.rolling_apply(ts, window, bg_test, args=(alpha,gamma))
Returns the error:
TypeError: only length-1 arrays can be converted to Python scalars
My troubleshooting indicates that the problem lies in the boolean return statement. I keep getting a similar error when I simplify the function and use np.mean/np.std rather than my own functions. Previous questions about this TypeError seem to involve performing non-vectorized operations on NumPy arrays, but that doesn't seem to be the issue here.
What am I doing wrong here?
It's a less than helpful message, but I believe the error happens because rolling_apply currently expects a like-typed return value (it may even have to be a float). But if you break your three operations (mean, std, outlier logic) into steps, it should work OK.
ts.name = 'value'
df = pd.DataFrame(ts)

def trimmed_apply(arr, alpha, f):
    arr = np.sort(arr)  # np.sort returns a sorted copy rather than sorting in place
    n = len(arr)
    k = int(round(n * float(alpha)) / 2)
    return f(arr[k+1:n-k])

df['trimmed_mean'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.mean))
df['trimmed_std'] = pd.rolling_apply(df['value'], window, trimmed_apply, args=(alpha, np.std))
df['outlier'] = np.abs(df['value'] - df['trimmed_mean']) < 3 * df['trimmed_std'] + gamma
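Note that pd.rolling_apply was removed in later pandas versions; the same stepwise approach would use the .rolling(...).apply(...) method instead. A sketch (here the comparison is inverted to >, so that True flags outliers as the question asks; the answer above used <, which flags inliers):

df['trimmed_mean'] = df['value'].rolling(window).apply(trimmed_apply, args=(alpha, np.mean), raw=True)
df['trimmed_std'] = df['value'].rolling(window).apply(trimmed_apply, args=(alpha, np.std), raw=True)
df['outlier'] = np.abs(df['value'] - df['trimmed_mean']) > 3 * df['trimmed_std'] + gamma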