What is the fastest way to remove the N rightmost digits of a Python integer?
Here is some of my code:
def f_int(dividend,n):
    return int(dividend/(10 ** n))
def f_str_to_int(dividend,n):
    return int(str(dividend)[:-n])
If we don't care whether the output is a str or an int, we can skip the int() call in f_str_to_int():
def f_str(dividend,n):
    return str(dividend)[:-n]
We can also gain speed by passing the divisor in as a precomputed power of ten:
divisor_in_power_of_ten = 10 ** n #outside function run time
def f_int_hack (dividend,divisor_in_power_of_ten):
    return int(dividend/(divisor_in_power_of_ten))
My question is: is there a faster way (maybe using bit manipulation)? I need to optimize this because it will be used as part of a real-time web application. Note: I am restricted to Python only, not JS or Cython (feel free to extend your answer to JS and Cython, but that is not the main question).
Here are my results:
FUNCTION: f_int_hack Used 200170 times
MEDIAN 1.1000000004202093e-06
MEAN 1.5179877104460484e-06
STDEV 3.600025074234889e-05
FUNCTION: f_int Used 199722 times
MEDIAN 1.8999999999991246e-06
MEAN 2.420203582980709e-06
STDEV 3.482858342790541e-05
FUNCTION: f_str Used 200132 times
MEDIAN 1.4999999997655777e-06
MEAN 1.7462234924949252e-06
STDEV 1.4733864640549157e-05
FUNCTION: f_str_to_int Used 199639 times
MEDIAN 2.000000000279556e-06
MEAN 2.751038624717222e-06
STDEV 6.383386278143267e-05
Edit:
Function for the benchmark (adjust as needed, since I have edited some of the functions):
import time
import random
import statistics
def benchmark(functions, iteration, *args):
    times = {f.__name__: [] for f in functions}
    for i in range(iteration):
        # Pick a random function each round so that drift affects all of them equally
        func = random.choice(functions)
        t0 = time.perf_counter()
        func(*args)
        t1 = time.perf_counter()
        times[func.__name__].append(t1 - t0)
    for name, numbers in times.items():
        print('FUNCTION:', name, 'Used', len(numbers), 'times')
        print('\tMEDIAN', statistics.median(numbers))
        print('\tMEAN ', statistics.mean(numbers))
        print('\tSTDEV ', statistics.stdev(numbers))
if __name__=="__main__":
    # Variables
    dividend = 12345600000
    n = 3
    iteration = 1000000
    # The functions to compare
    def f_int(dividend,n):
        return int(dividend/(10 ** n))
    def f_str_to_int(dividend,n):
        return int(str(dividend)[:-n])
    functions = f_int, f_str_to_int
    benchmark(functions, iteration, dividend, n)
More clarification: the input is a Python integer. I don't actually care about the output format (whether it's str or int), but let's make the output an int first and treat str as a "bonus question".
Edit:
From the comment section, using a//b:
FUNCTION: f_int_double_slash Used 166028 times
MEDIAN 1.2000000002565514e-06
MEAN 1.4399938564575845e-06
STDEV 1.2767417156171526e-05
YES, IT'S FASTER THAN int(a/b)
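Besides speed, a//b also differs from int(a/b) in correctness. Here is a minimal sketch of the difference; the name drop_digits and the sample values are mine, not from the question. True division goes through a float, so very large integers lose their low digits, while floor division stays in exact integer arithmetic.

```python
def drop_digits(value, n):
    """Remove the n rightmost decimal digits of a non-negative integer."""
    return value // 10 ** n

big = 12345678901234567890123

exact = drop_digits(big, 3)     # pure integer arithmetic, always exact
via_float = int(big / 10 ** 3)  # passes through a float with a 53-bit mantissa

print(exact)               # 12345678901234567890
print(exact == via_float)  # False: the float result lost the low digits

# For negative inputs the two also disagree: // floors, int() truncates.
print(-1234 // 10, int(-1234 / 10))  # -124 -123
```

As for bit manipulation: shifts divide by powers of two, so >> cannot drop decimal digits directly.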
Edit:
From a comment: if we accept float, the fastest way is:
def f_float_hack(dividend,divisor_in_power_of_ten):
    return dividend//divisor_in_power_of_ten
With result:
FUNCTION: f_float_hack Used 142983 times
MEDIAN 7.000000001866624e-07
MEAN 9.574040270508322e-07
STDEV 3.725603760159355e-05
Improving the int double-slash version by precomputing the power of ten:
FUNCTION: f_int_double_slash_hack Used 143082 times
MEDIAN 7.999999995789153e-07
MEAN 1.1596266476572136e-06
STDEV 4.9442788346866335e-05
Current result: float_hack is the fastest (if we accept float).
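If you want to re-run the comparison, timeit from the standard library is one option. This is only a sketch: the function names and the repeat counts are my own choices, and the lambdas add the same small call overhead to every candidate. Taking the best of several repeats tends to be less sensitive to scheduler noise than the mean of single-call timings.

```python
import timeit

def f_floordiv(dividend, n):
    return dividend // 10 ** n

def f_floordiv_precomputed(dividend, divisor):
    return dividend // divisor

def f_str_to_int(dividend, n):
    return int(str(dividend)[:-n])

dividend, n = 12345600000, 3
divisor = 10 ** n  # computed once, outside the timed call

candidates = {
    "f_floordiv": lambda: f_floordiv(dividend, n),
    "f_floordiv_precomputed": lambda: f_floordiv_precomputed(dividend, divisor),
    "f_str_to_int": lambda: f_str_to_int(dividend, n),
}

calls = 20_000
results = {}
for name, call in candidates.items():
    # Best of 5 repeats, converted to seconds per call
    results[name] = min(timeit.repeat(call, number=calls, repeat=5)) / calls
    print(f"{name}: {results[name]:.3e} s per call")
```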
I have the following question. I have a function get_time that returns the travel time between two coordinates, and I would like to create a time matrix.
Here is my code:
def time_matrix(coordinates):
    times = np.zeros((len(coordinates), len(coordinates)), dtype=float)
    for i in range(len(coordinates)):
        for j in range(len(coordinates)):
            time = get_time(
                coordinates[i][0], coordinates[i][1], coordinates[j][0], coordinates[j][1]
            ) / 60
            times[i][j] = time
    return times.tolist()
My function works, but it is very inefficient. times is symmetric, so it would be better to compute each time once and use it twice. In other words, I don't want to compute the result row by row. Can you help me modify my function, please?
If you just want to use each evaluated time twice, you can simply change the assignment line and limit the second loop to j >= i:
def time_matrix(coordinates):
    times = np.zeros((len(coordinates), len(coordinates)), dtype=float)
    for i in range(len(coordinates)):
        for j in range(i,len(coordinates)):
            time = get_time(
                coordinates[i][0], coordinates[i][1], coordinates[j][0], coordinates[j][1]
            ) / 60
            # Fill both mirrored cells from a single get_time call
            times[i][j] = times[j][i] = time
    return times.tolist()
This should work, right?
But a vectorized get_time() would be better, of course.
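A rough sketch of what the vectorized variant could look like. Note the assumption: I don't know what get_time does, so the version below is a hypothetical stand-in (straight-line distance at unit speed) that happens to accept arrays. Your real get_time would need to accept arrays as well for this to work.

```python
import numpy as np

def get_time(x1, y1, x2, y2):
    # Hypothetical stand-in for the asker's function: works on scalars or arrays
    return np.hypot(x2 - x1, y2 - y1)

def time_matrix(coordinates):
    coords = np.asarray(coordinates, dtype=float)
    n = len(coords)
    # Index pairs (i, j) with i < j, i.e. the strict upper triangle
    i, j = np.triu_indices(n, k=1)
    upper = get_time(coords[i, 0], coords[i, 1], coords[j, 0], coords[j, 1]) / 60
    times = np.zeros((n, n), dtype=float)
    times[i, j] = upper  # upper triangle
    times[j, i] = upper  # mirrored lower triangle
    return times.tolist()

matrix = time_matrix([(0, 0), (3, 4), (6, 8)])
```

get_time is called once for all pairs instead of once per pair, and the diagonal stays at zero.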
I'm performing Data Science and am calculating the Log Likelihood of a Poisson Distribution of arrival times.
def LogLikelihood(arrival_times, _lambda):
    """Calculate the likelihood that _lambda predicts the arrival_times well."""
    ll = 0
    for t in arrival_times:
        ll += -(_lambda) + t*np.log(_lambda) - np.log(factorial(t))
    return ll
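Mathematically, the expression is:
$$\ln L(\lambda) = \sum_{t} \left( -\lambda + t \ln \lambda - \ln(t!) \right)$$
where the sum runs over the values in arrival_times.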
Is there a more pythonic way to perform this sum? Perhaps in one line?
Seems perfectly Pythonic to me, but since numpy is already here, why not vectorize the whole thing?
    # np.math is gone in NumPy 2.0, so use the standard math module (needs "import math")
    return (
        -_lambda
        + arrival_times * np.log(_lambda)
        - np.log(np.vectorize(math.factorial)(arrival_times))
    ).sum()
If you have scipy available, use loggamma, which is more robust than chaining log and factorial:
import numpy as np
from scipy import special
def loglikeli(at,l):
    # loggamma(at+1) equals log(at!) without computing the huge factorial
    return (np.log(l)*at-l-special.loggamma(at+1)).sum()
### example
rng = np.random.default_rng()
at = rng.integers(1,3,10).cumsum()
l = rng.uniform(0,1)
### check against OP's implementation
np.isclose(loglikeli(at,l),LogLikelihood(at,l))
# True
This looks ugly to me but it fits a single line:
def LogLikelihood(arrival_times, _lambda):
    return np.cumsum(list(map(lambda t: -(_lambda) + t*np.log(_lambda) - np.log(factorial(t)),arrival_times)))[-1]
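For completeness, a one-line sum is also possible with only the standard library. This is a sketch under my own naming (log_likelihood, lam); it uses math.lgamma(t + 1), which equals log(t!) and avoids building the large factorial first.

```python
import math

def log_likelihood(arrival_times, lam):
    """Poisson log-likelihood, summed over all observations."""
    log_lam = math.log(lam)
    # lgamma(t + 1) == log(t!)
    return sum(t * log_lam - lam - math.lgamma(t + 1) for t in arrival_times)

value = log_likelihood([1, 2, 3], 0.5)
```

A generator expression inside sum() is probably the closest thing to "the Pythonic one-liner" here.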
I'm writing a python module to allow me to make unit-based calculations, and I'm trying to implement unit-sensitive integration of functions. My idea is basically to write a wrapper for scipy.integrate -- take the function and arguments given, including the limits of integration, nondimensionalize them all, pass to scipy.integrate.quad or some such thing, get the answer, and then multiply by the correct units at the end.
To accomplish this, I'm trying to figure out how to nondimensionalize an arbitrary function. I've implemented units so that if you divide two quantities with the same units, it returns an ordinary number, so my first thought was to just do this:
def nonDimensionalize(func, *args):
    val = func(*args)
    dimensions = val / val.value
    return lambda args : (func(args) / dimensions)
This works like a charm to nondimensionalize the function's output, but I'm having a harder time with the input. What I really need is to return a function that takes in ordinary numbers, multiplies them by the correct SI dimensions (which I can figure out how to do), gets the output, divides it by the correct SI dimensions, and returns that value as an ordinary number. Then I can pass said function to scipy.integrate (or scipy.optimize.fsolve, etc.). I tried the following:
def nonDimensionalize(func, *args):
    argDims = []
    for arg in args:
        aDim = arg / arg.value
        argDims.append(aDim)
    nDargs = []
    index = 0
    for arg in args:
        nDargs.append(arg / argDims[index])
        index += 1
    val = func(*args)
    dimensions = val / val.value
    return lambda args : (func(args) / dimensions)
but it doesn't work; it has exactly the same effect as my four-line function above. I'm not sure how to proceed at this point. Help?
What I really need is to return a function that takes in ordinary numbers, multiplies them by the correct SI dimensions (which I can figure out how to do), gets the output, divides it by the correct SI dimensions, and returns that value as an ordinary number.
I'm not sure I understand exactly how you dimensionalize/non-dimensionalize values, so just modify the corresponding functions as necessary, but you could do it like this:
def dimensionalizeValue(nonDimValue, dimensions):
    return nonDimValue * dimensions
def nonDimensionalizeValue(dimValue):
    dimensions = dimValue / dimValue.value
    return dimValue / dimensions
def nonDimensionalizeFunction(function):
    def wrapper(*nonDimArgs):
        # Figure out the correct dimensions.
        dimensions = None
        # Transform/dimensionalize the arguments.
        dimArgs = [dimensionalizeValue(arg, dimensions) for arg in nonDimArgs]
        # Get output using dimensionalized arguments.
        dimVal = function(*dimArgs)
        # Non-dimensionalize the output.
        nonDimVal = nonDimensionalizeValue(dimVal)
        return nonDimVal
    return wrapper
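To make the idea concrete, here is a self-contained sketch. Everything in it is an assumption on my part: Quantity is a toy stand-in for your unit type (I only assume it has a .value attribute and supports multiplication), and I pass the unit of each argument explicitly instead of inferring it.

```python
class Quantity:
    """Hypothetical, minimal stand-in for the asker's unit-aware type."""

    def __init__(self, value, dims):
        self.value = value
        self.dims = dims  # e.g. {"m": 1, "s": -1}

    def __mul__(self, other):
        if isinstance(other, Quantity):
            dims = dict(self.dims)
            for name, power in other.dims.items():
                dims[name] = dims.get(name, 0) + power
            return Quantity(self.value * other.value, dims)
        # Plain number times quantity keeps the dimensions
        return Quantity(self.value * other, self.dims)

    __rmul__ = __mul__


def non_dimensionalize(func, *arg_units):
    """Wrap func so that it takes and returns ordinary numbers."""
    def wrapper(*numbers):
        # Attach the unit to each plain number before calling func
        dim_args = [x * unit for x, unit in zip(numbers, arg_units)]
        # Strip the dimensions from the result
        return func(*dim_args).value
    return wrapper


metre = Quantity(1.0, {"m": 1})

def area(side):
    return side * side

plain_area = non_dimensionalize(area, metre)
print(plain_area(3.0))  # 9.0
```

A wrapper like plain_area takes and returns plain floats, which is the kind of callable the scipy routines expect; the units of the final answer would still have to be multiplied back on by your outer wrapper.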
Is it possible to have a function that takes another function as an argument?
For example, I have two functions which follow exactly the same process, except that one calculates the average using np.mean and the other calculates the standard deviation, so only np.std is different.
i.e.
it would be defined:
def calculate(function):
you would call one in the script like:
calculate(mean)
and the other
calculate(std)
I'm just wondering if it is possible to do something like this, as it would greatly reduce my script length.
EDIT
Sorry I should have said that I wanted the mean and std to be the ones predefined in numpy. getattr() in Xu's answer worked
Use getattr to get the method object according to the method name:
def calculate(function):
    func = getattr(np, function)
    func(...) # do what you want
calculate("mean") # calculate the average number
calculate("std") # calculate the standard deviation
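If the name does not have to be a string, the function object itself can be passed, which skips the getattr lookup and works with any callable. A small sketch, using the standard library statistics module as a stand-in for np.mean and np.std (the data parameter and the None filtering are made up to represent your shared processing):

```python
import statistics

def calculate(data, func):
    """Run the shared processing once, then apply whichever statistic is passed in."""
    cleaned = [x for x in data if x is not None]  # stand-in for the common steps
    return func(cleaned)

samples = [1, 2, None, 3, 4]
print(calculate(samples, statistics.mean))    # 2.5
print(calculate(samples, statistics.pstdev))  # population standard deviation
```

With numpy you would call calculate(samples, np.mean) or calculate(samples, np.std) in the same way; np.std also computes the population standard deviation by default.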
Yes, this is possible.
Example:
def addIt(x):
    return x+x
def test(fn):
    for x in range(5):
        print(fn(x))
test(addIt)
Output:
0
2
4
6
8
def f1(t): return t * 2
def f2(t): return t * t
def comb(t,f): return f(t)
print(comb(10, f1))
print(comb(10, f2))
I would like to know how I can round a number in numpy to an upper or lower threshold which is a function of a predefined step size. Stated more clearly: if I have the number 123 and a step size of 50, I need to round 123 to the closest of either 150 or 100, in this case 100. I came up with the function below, which does the job, but I wonder if there is a better, more succinct way to do this.
Thanks in advance,
Paolo
def getRoundedThresholdv1(a, MinClip):
    import numpy as np
    import math
    digits = int(math.log10(MinClip))+1
    b = np.round(a, -digits)
    if b > a: # rounded-up
        c = b - MinClip
        UpLow = np.array((b,c))
    else: # rounded-down
        c = b + MinClip
        UpLow = np.array((c,b))
    AbsDelta = np.abs(a - UpLow)
    return UpLow[AbsDelta.argmin()]
getRoundedThresholdv1(143, 50)
The solution by pb360 is much better; it uses the second argument of the builtin round in Python 3.
I think you don't need numpy:
def getRoundedThresholdv1(a, MinClip):
    return round(float(a) / MinClip) * MinClip
Here a is a single number. If you want to vectorize this function, you only need to replace round with np.round and float(a) with np.array(a, dtype=float).
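The vectorized variant described above could look like the following sketch (the name round_to_step is mine):

```python
import numpy as np

def round_to_step(a, step):
    """Round a scalar or array-like to the nearest multiple of step."""
    a = np.asarray(a, dtype=float)
    return np.round(a / step) * step

print(round_to_step([123, 143, 180], 50))  # [100. 150. 200.]
```

Keep in mind that np.round, like the builtin round, sends exact halves to the nearest even integer, so a value sitting exactly between two steps may round down.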
Summary: This is a correct way to do it; the top answer has cases that do not work:
import math
from decimal import Decimal
from typing import Union
def round_step_size(quantity: Union[float, Decimal], step_size: Union[float, Decimal]) -> float:
    """Rounds a given quantity to a specific step size
    :param quantity: required
    :param step_size: required
    :return: float
    """
    # Number of decimal places implied by the step size, e.g. 0.0001 -> 4
    precision: int = int(round(-math.log(step_size, 10), 0))
    return float(round(quantity, precision))
My reputation is too low to comment on the top answer from Ruggero Turra and point out the issue. However, it has cases that do not work, for example:
def getRoundedThresholdv1(a, MinClip):
    return round(float(a) / MinClip) * MinClip
getRoundedThresholdv1(13.200000000000001, 0.0001)
This returns 13.200000000000001 right back, whether using numpy or the standard library round. I didn't even find this by stress testing the function; it came up in production code and caused an error.
Note: full credit for this answer goes to an open-source GitHub repo, which is not mine, found here
Note that round() in Ruggero Turra's answer rounds halves to the nearest even integer. Meaning:
a= 0.5
round(a)
Out: 0
Which may not be what you expect.
In case you want 'classical' rounding, you can use this function, which supports both scalars and NumPy arrays:
import numpy as np
def getRoundedThresholdv1(a, MinClip):
    scaled = a/MinClip
    # Fractional part >= 0.5 rounds up, otherwise down
    return np.where(scaled % 1 >= 0.5, np.ceil(scaled), np.floor(scaled))*MinClip
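For plain scalars, the same half-up behaviour can also be had from the standard library decimal module. This is only a sketch under my own naming (round_half_up_to_step); going through str() is a deliberate choice so that the decimal digits you see are the ones that get rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up_to_step(a, step):
    """Round a scalar to the nearest multiple of step; ties go away from zero."""
    a, step = Decimal(str(a)), Decimal(str(step))
    n_steps = (a / step).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return float(n_steps * step)

print(round(2.5))                       # 2, builtin rounds halves to even
print(round_half_up_to_step(125, 50))   # 150.0
print(round_half_up_to_step(0.5, 1))    # 1.0
print(round_half_up_to_step(13.200000000000001, 0.0001))  # 13.2
```

Because the multiplication by the step happens in decimal arithmetic, the last example does not pick up the binary floating-point residue mentioned in the other answer.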
Alternatively, you could use NumPy's digitize function. It requires you to define the array of your steps. digitize returns the index of the bin each value falls into, which effectively rounds the value up to the next step. To round in the 'classical' way we therefore need an intermediate step: the midpoints between consecutive steps.
You can use this:
import numpy as np
def getRoundedThresholdv1(a, MinClipBins):
    # Midpoints between consecutive steps act as the rounding boundaries
    intermediate = (MinClipBins[1:] + MinClipBins[:-1])/2
    return MinClipBins[np.digitize(a, intermediate)]
You can then call it like:
bins = np.array([0, 50, 100, 150])
test1 = getRoundedThresholdv1(74, bins)
test2 = getRoundedThresholdv1(125, bins)
Which gives:
test1 = 50
test2 = 150