How to find the nth derivative given the first derivative with SymPy? - python

Given some f and the differential equation x'(t) = f(x(t)), how do I compute the n-th derivative x^(n)(t) in terms of x(t)?
For example, given f(x(t)) = sin(x(t)),
I want to obtain x^(3)(t) = (cos(x(t))^2 − sin(x(t))^2) * sin(x(t)).
So far I've tried
>>> from sympy import diff, sin
>>> from sympy.abc import x, t
>>> diff(sin(x(t)), t, 2)
which gives me
-sin(x(t))*Derivative(x(t), t)**2 + cos(x(t))*Derivative(x(t), t, t)
but I'm not sure how to tell SymPy what Derivative(x(t), t) is and have it figure out Derivative(x(t), t, t), etc. automatically.
Answer:
Here's my final solution based on the answers I received below:
import sympy

def diff(x_derivs_known, t, k, simplify=False):
    try: n = len(x_derivs_known)
    except TypeError: n = None
    if n is None:
        result = sympy.diff(x_derivs_known, t, k)
        if simplify: result = result.simplify()
    elif k < n:
        result = x_derivs_known[k]
    else:
        i = n - 1
        result = x_derivs_known[i]
        while i < k:
            result = result.diff(t)
            j = len(x_derivs_known)
            x0 = None
            while j > 1:
                j -= 1
                result = result.subs(sympy.Derivative(x_derivs_known[0], t, j), x_derivs_known[j])
            i += 1
        if simplify: result = result.simplify()
    return result
Example:
>>> diff((x(t), sympy.sin(x(t))), t, 3, True)
sin(x(t))*cos(2*x(t))

Here is one approach that returns a list of all derivatives up to the n-th order:
import sympy as sp
x = sp.Function('x')
t = sp.symbols('t')
f = lambda x: x**2   # alternatives: sp.exp, sp.sin
n = 4                # or 3, 5, ...
deriv_list = [x(t), f(x(t))]  # list of derivatives [x(t), x'(t), x''(t), ...]
for i in range(1, n):
    df_i = deriv_list[-1].diff(t).replace(sp.Derivative, lambda *args: f(x(t)))
    deriv_list.append(df_i)
print(deriv_list)
[x(t), x(t)**2, 2*x(t)**3, 6*x(t)**4, 24*x(t)**5]
With f=sp.sin it returns
[x(t), sin(x(t)), sin(x(t))*cos(x(t)), -sin(x(t))**3 + sin(x(t))*cos(x(t))**2, -5*sin(x(t))**3*cos(x(t)) + sin(x(t))*cos(x(t))**3]
EDIT: A recursive function for the computation of the n-th derivative:
def der_xt(f, n):
    if n == 1:
        return f(x(t))
    else:
        return der_xt(f, n-1).diff(t).replace(sp.Derivative, lambda *args: f(x(t)))
print(der_xt(sp.sin,3))
-sin(x(t))**3 + sin(x(t))*cos(x(t))**2

Declare f and use substitution:
>>> f = diff(x(t))
>>> diff(sin(x(t)), t, 2).subs(f, sin(x(t)))
-sin(x(t))**3 + cos(x(t))*Derivative(sin(x(t)), t)
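To take that substitution idea all the way, you can also substitute the known first derivative back in after each differentiation, starting from the highest-order Derivative term. A minimal sketch (the names expr and first are illustrative, not from the answers above):
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')
first = sp.sin(x(t))                              # the given x'(t) = sin(x(t))

expr = sp.diff(sp.sin(x(t)), t, 2)                # equals x'''(t), but still contains x'(t) and x''(t)
expr = expr.subs(x(t).diff(t, 2), first.diff(t))  # replace x''(t) by d/dt sin(x(t))
expr = expr.subs(x(t).diff(t), first)             # replace x'(t) by sin(x(t))
print(sp.simplify(expr))  # an expression equivalent to (cos(x(t))**2 - sin(x(t))**2)*sin(x(t))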

Related

Python is giving ZeroDivisionError. Can anyone fix this?

from math import sqrt

S1 = [1,0,0,0,1,0,0,2]
S3 = [0,1,1,2,0,1,2,0]
sum = 0
sums1 = 0
sums3 = 0
for i, j in zip(S1, S3):
    sums1 += i*i
    sums3 += j*j
    sum += i*j
    cosine_similarity = sum / ((sqrt(sums1)) * (sqrt(sums3)))
    print(cosine_similarity)
How can I remove this error from the code? I want to compute the cosine similarity of the vectors.
The error is due to the indentation level of the last two lines (as mentioned in the comments by j1-lee):
# ...
    sum += i*j

# de-indent the last two lines so they run once, after the loop has finished
cosine_similarity = sum / ((sqrt(sums1)) * (sqrt(sums3)))
print(cosine_similarity)
Here is another implementation, decomposing the definition of cosine similarity into smaller operations:
def scalar_product(a, b):
    return sum(a_i*b_i for a_i, b_i in zip(a, b))

def norm(a):
    return sum(a_i**2 for a_i in a)**.5

def cosine_similarity(a, b):
    return scalar_product(a, b) / (norm(a)*norm(b))

S1 = [1,0,0,0,1,0,0,2]
S3 = [0,1,1,2,0,1,2,0]

cs = cosine_similarity(S1, S3)
print(cs)
# 0.0   # orthogonality

cs = cosine_similarity(S1, S1)
print(cs)
# 1.0...  # parallelism
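For comparison, a minimal NumPy-based sketch of the same computation (an addition, not part of the original answer; assumes numpy is available):
import numpy as np

S1 = np.array([1, 0, 0, 0, 1, 0, 0, 2])
S3 = np.array([0, 1, 1, 2, 0, 1, 2, 0])

cos_sim = S1 @ S3 / (np.linalg.norm(S1) * np.linalg.norm(S3))
print(cos_sim)  # 0.0 for these orthogonal vectors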

Get a specific value from a function when the function is given as an argument?

I have a function that needs to give a specific value to another function depending on the current iteration in a for loop. get_change_vector returns a tuple of 4 elements; depending on the iteration, I want to get a specific value from it.
def get_change_vector(x, r):
    Xp = r*x + x**3 - x**5
    Xp2 = r*x + x**2 - x**3
    Xp3 = r*x - x/(1+x**2)
    Xp4 = x - r + (2-x)/(1+x**2)
    return (Xp, Xp2, Xp3, Xp4)

def main():
    for n in range(4):
        Xs = [i for i in np.arange(-2, 2, 0.001)]
        rs = [funcR(x)[n] for x in Xs]
        i1, i2 = 0 if n < 2 else 1, 0 if n % 2 == 0 else 1
        ax = axes[i1][i2]
        for x, r in zip(Xs, rs):
            clr = 'g' if is_stable(get_change_vector, x, r) else 'r'
            ax.plot(r, x, 'o', color=clr, markersize=0.1)
I tried to give a specific index to get_change_vector, but it returns an error saying the function is not subscriptable.
I also tried making a variable out of the needed component:
function = get_change_vector(x, r)[n]
but this returned an error because of what is_stable does when it reaches func(*args):
'numpy.float64' object is not callable
def get_derivative(func, n, i):
    '''
    Wrapper around our change_vector function
    so derivative can handle multiple parameters
    '''
    def wraps(n):
        args = i, n
        return func(*args)
    return derivative(wraps, n, dx=1e-6)

def is_stable(func, n, i):
    return get_derivative(func, n, i) < 0
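One possible way around "function is not subscriptable", sketched with hypothetical names (a suggestion, not the asker's final code): keep passing a callable, but make it a small wrapper that calls get_change_vector and indexes the returned tuple, so is_stable still receives a function that returns a single number.
def nth_component(n):
    """Return a callable giving only the n-th entry of get_change_vector's tuple."""
    return lambda x, r: get_change_vector(x, r)[n]

# inside the plotting loop, pass the wrapped single-component function instead:
# clr = 'g' if is_stable(nth_component(n), x, r) else 'r'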

SymPy division doesn't cancel what it can when using symbolic denominator

I have some code using sympy.solvers.solve() that basically leads to the following:
>>> k, u, p, q = sympy.symbols('k u p q')
>>> solution = (k*u + p*u + q)/(k+p)
>>> solution.simplify()
(k*u + p*u + q)/(k + p)
Now, my problem is that it is not simplified enough/correctly. It should be giving the following:
q/(k + p) + u
From the original equation q = (k + p)*(m - u) this is more obvious (when you solve it manually, which my students will be doing).
I have tried many combinations of sol.simplify(), sol.cancel(), sol.collect(u) but I haven't found what can make it work (btw, the collect I can't really use, as I won't know beforehand which symbol will have to be collected, unless you can make something that collects all the symbols in the solution).
I am working with BookWidgets, which automatically corrects the answers that students give, which is why it's important that I have an output which will match what the students will enter.
First things first:
There is no "standard" output to a simplification step.
If the output of a simplification step doesn't suit your needs, you might want to manipulate the expression further with simplify, expand, collect, ...
Two or more sequences of operations (simplify, expand, collect, ...) might lead to different results or to the same result; it depends on the expression being manipulated.
Let me show you with your example:
from sympy import symbols, fraction, Add

k, u, p, q = symbols('k u p q')
solution = (k*u + p*u + q)/(k + p)
# out1: (k*u + p*u + q)/(k + p)
solution = solution.collect(u)
# out2: (q + u*(k + p))/(k + p)
num, den = fraction(solution)
# use the linearity of addition
solution = Add(*[t / den for t in num.args])
# out3: q/(k + p) + u
In the above code, out1, out2, out3 are mathematically equivalent.
Instead of spending time simplifying outputs, I would test for mathematical equivalence with the equals method. For example:
verified_solution = (k*u + p*u + q)/(k+p)
num, den = fraction(verified_solution)
first_student_sol = Add(*[t / den for t in num.args])
print(verified_solution.equals(first_student_sol))
# True
second_student_solution = q/(k + p) + u
print(verified_solution.equals(second_student_solution))
# True
third_student_solution = q/(k + p) + u + 2
print(verified_solution.equals(third_student_solution))
# False
It looks like you want the expression in quotient/remainder form:
>>> n, d = solution.as_numer_denom()
>>> div(n, d)
(u, q)
>>> _[0] + _[1]/d
q/(k + p) + u
But that SymPy function may give unexpected results when the symbol names are changed, as described here. Here is an alternative (for which I did not find an existing function in SymPy) that attempts something closer to a synthetic-division result:
from sympy import S, fraction, signsimp

def sdiv(p, q):
    """return w, r if p = w*q + r else 0, p

    Examples
    ========

    >>> from sympy.abc import x, y, z
    >>> sdiv(x, x)
    (1, 0)
    >>> sdiv(x, y)
    (0, x)
    >>> sdiv(2*x + 3, x)
    (2, 3)
    >>> a, b = x + 2*y + z, x + y
    >>> sdiv(a, b)
    (1, y + z)
    >>> sdiv(a, -b)
    (-1, y + z)
    >>> sdiv(-a, -b)
    (1, -y - z)
    >>> sdiv(-a, b)
    (-1, -y - z)
    """
    from sympy.core.function import _mexpand
    P, Q = map(lambda i: _mexpand(i, recursive=True), (p, q))
    r, wq = P.as_independent(*Q.free_symbols, as_Add=True)
    # quick exit if no full division possible
    if Q.is_Add and not wq.is_Add:
        return S.Zero, P
    # check multiplicative cancellation
    w, bot = fraction((wq/Q).cancel())
    if bot != 1 and wq.is_Add and Q.is_Add:
        # try maximal additive extraction
        s1 = s2 = 1
        if signsimp(Q, evaluate=False).is_Mul:
            wq = -wq
            r = -r
            Q = -Q
            s1 = -1
        if signsimp(wq, evaluate=False).is_Mul:
            wq = -wq
            s2 = -1
        xa = wq.extract_additively(Q)
        if xa:
            was = wq.as_coefficients_dict()
            now = xa.as_coefficients_dict()
            dif = {k: was[k] - now.get(k, 0) for k in was}
            n = min(was[k]//dif[k] for k in dif)
            dr = wq - n*Q
            w = s2*n
            r = s1*(r + s2*dr)
            assert _mexpand(p - (w*q + r)) == 0
            bot = 1
    return (w, r) if bot == 1 else (S.Zero, p)
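For completeness, a short usage sketch of sdiv on the expression from the question (this check is an addition; it re-declares the symbols so it is self-contained):
from sympy.abc import k, u, p, q

num, den = ((k*u + p*u + q)/(k + p)).as_numer_denom()
w, r = sdiv(num, den)
print(w + r/den)  # expected: an expression equivalent to q/(k + p) + u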
The more general suggestion from Davide_sd about using equals is good if you are only testing the equality of two expressions in different forms.

Modular multiplicative inverse function in Python

Does some standard Python module contain a function to compute modular multiplicative inverse of a number, i.e. a number y = invmod(x, p) such that x*y == 1 (mod p)? Google doesn't seem to give any good hints on this.
Of course, one can come up with home-brewed 10-liner of extended Euclidean algorithm, but why reinvent the wheel.
For example, Java's BigInteger has modInverse method. Doesn't Python have something similar?
Python 3.8+
y = pow(x, -1, p)
Python 3.7 and earlier
Maybe someone will find this useful (from wikibooks):
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)

def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        raise Exception('modular inverse does not exist')
    else:
        return x % m
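A quick sanity check of these helpers (this example is an addition, not part of the original answer):
print(modinv(3, 11))  # 4, since 3*4 = 12 ≡ 1 (mod 11)
print(egcd(3, 11))    # (1, 4, -1), since 4*3 + (-1)*11 = 1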
If your modulus is prime (you call it p) then you may simply compute:
y = x**(p-2) mod p # Pseudocode
Or in Python proper:
y = pow(x, p-2, p)
Here is someone who has implemented some number theory capabilities in Python: http://www.math.umbc.edu/~campbell/Computers/Python/numbthy.html
Here is an example done at the prompt:
m = 1000000007
x = 1234567
y = pow(x,m-2,m)
y
989145189L
x*y
1221166008548163L
x*y % m
1L
You might also want to look at the gmpy module. It is an interface between Python and the GMP multiple-precision library. gmpy provides an invert function that does exactly what you need:
>>> import gmpy
>>> gmpy.invert(1234567, 1000000007)
mpz(989145189)
Updated answer
As noted by @hyh, gmpy.invert() returns 0 if the inverse does not exist. That matches the behavior of GMP's mpz_invert() function. gmpy.divm(a, b, m) provides a general solution to a = bx (mod m).
>>> gmpy.divm(1, 1234567, 1000000007)
mpz(989145189)
>>> gmpy.divm(1, 0, 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 8)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: not invertible
>>> gmpy.divm(1, 4, 9)
mpz(7)
divm() will return a solution when gcd(b,m) == 1 and raises an exception when the multiplicative inverse does not exist.
Disclaimer: I'm the current maintainer of the gmpy library.
Updated answer 2
gmpy2 now properly raises an exception when the inverse does not exist:
>>> import gmpy2
>>> gmpy2.invert(0,5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: invert() no inverse exists
As of 3.8, Python's pow() function can take a modulus and a negative integer. See here. Their example of how to use it is:
>>> pow(38, -1, 97)
23
>>> 23 * 38 % 97 == 1
True
Here is a one-liner for CodeFights; it is one of the shortest solutions:
MMI = lambda A, n,s=1,t=0,N=0: (n < 2 and t%N or MMI(n, A%n, t, s-A//n*t, N or n),-1)[n<1]
It will return -1 if A has no multiplicative inverse in n.
Usage:
MMI(23, 99) # returns 56
MMI(18, 24) # return -1
The solution uses the Extended Euclidean Algorithm.
SymPy, a Python module for symbolic mathematics, has a built-in modular inverse function if you don't want to implement your own (or if you're using SymPy already):
from sympy import mod_inverse
mod_inverse(11, 35) # returns 16
mod_inverse(15, 35) # raises ValueError: 'inverse of 15 (mod 35) does not exist'
This doesn't seem to be documented on the Sympy website, but here's the docstring: Sympy mod_inverse docstring on Github
Here is a concise 1-liner that does it, without using any external libraries.
# Given 0<a<b, returns the unique c such that 0<c<b and a*c == gcd(a,b) (mod b).
# In particular, if a,b are relatively prime, returns the inverse of a modulo b.
def invmod(a,b): return 0 if a==0 else 1 if b%a==0 else b - invmod(b%a,a)*b//a
Note that this is really just egcd, streamlined to return only the single coefficient of interest.
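A quick check of the one-liner (these calls are an addition, not part of the original answer):
print(invmod(3, 7))    # 5, since 3*5 = 15 ≡ 1 (mod 7)
print(invmod(10, 17))  # 12, since 10*12 = 120 ≡ 1 (mod 17)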
I tried different solutions from this thread, and in the end I used this one:
def egcd(a, b):
    lastremainder, remainder = abs(a), abs(b)
    x, lastx, y, lasty = 0, 1, 1, 0
    while remainder:
        lastremainder, (quotient, remainder) = remainder, divmod(lastremainder, remainder)
        x, lastx = lastx - quotient*x, x
        y, lasty = lasty - quotient*y, y
    return lastremainder, lastx * (-1 if a < 0 else 1), lasty * (-1 if b < 0 else 1)

def modinv(a, m):
    g, x, y = egcd(a, m)  # egcd is a plain function here, so no "self." prefix
    if g != 1:
        raise ValueError('modinv for {} does not exist'.format(a))
    return x % m
Modular_inverse in Python
Here is my code, it might be sloppy but it seems to work for me anyway.
# a is the number you want the inverse for
# b is the modulus
def mod_inverse(a, b):
    r = -1
    B = b
    A = a
    eq_set = []
    full_set = []
    mod_set = []

    # euclid's algorithm
    while r != 1 and r != 0:
        r = b % a
        q = b // a
        eq_set = [r, b, a, q*-1]
        b = a
        a = r
        full_set.append(eq_set)

    for i in range(0, 4):
        mod_set.append(full_set[-1][i])

    mod_set.insert(2, 1)
    counter = 0

    # extended euclid's algorithm
    for i in range(1, len(full_set)):
        if counter % 2 == 0:
            mod_set[2] = full_set[-1*(i+1)][3]*mod_set[4] + mod_set[2]
            mod_set[3] = full_set[-1*(i+1)][1]
        elif counter % 2 != 0:
            mod_set[4] = full_set[-1*(i+1)][3]*mod_set[2] + mod_set[4]
            mod_set[1] = full_set[-1*(i+1)][1]
        counter += 1

    if mod_set[3] == B:
        return mod_set[2] % B
    return mod_set[4] % B
The code above will not run in Python 3 and is less efficient compared to the GCD variants. However, this code is very transparent. It triggered me to create a more compact version:
def imod(a, n):
    c = 1
    while (c % a > 0):
        c += n
    return c // a
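A brief check, with one caveat worth noting (both the example and the caveat are additions, not the original answer's):
print(imod(3, 7))  # 5: the smallest c = 1 + 7*k divisible by 3 is 15, and 15 // 3 = 5
# caveat: the loop never terminates when a and n are not coprime, e.g. imod(4, 8)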
from the cpython implementation source code:
def invmod(a, n):
    b, c = 1, 0
    while n:
        q, r = divmod(a, n)
        a, b, c, n = n, c, b - q*c, r

    # at this point a is the gcd of the original inputs
    if a == 1:
        return b
    raise ValueError("Not invertible")
According to the comment above this code, it can return small negative values, so you could check whether the result is negative and add n before returning b.
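A small sketch of that normalization (the wrapper name is mine, not from CPython):
def invmod_nonneg(a, n):
    b = invmod(a, n)  # may be a small negative value
    return b % n      # map it into 0..n-1; equivalent to adding n when b is negative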
To figure out the modular multiplicative inverse I recommend using the Extended Euclidean Algorithm like this:
def multiplicative_inverse(a, b):
    # extended Euclidean algorithm; returns the inverse of b modulo a (assuming gcd(a, b) == 1)
    origA = a
    X = 0
    prevX = 1
    Y = 1
    prevY = 0
    while b != 0:
        temp = b
        quotient = a // b  # integer division (a / b would give a float in Python 3)
        b = a % b
        a = temp

        temp = X
        X = prevX - quotient * X
        prevX = temp

        temp = Y
        Y = prevY - quotient * Y
        prevY = temp

    return prevY % origA  # reduce modulo origA so the result is always non-negative
Well, here's a function in C which you can easily convert to Python. The C function below finds the modular inverse by brute force: it searches for the first number of the form n*i + 1 that is divisible by a.
int imod(int a, int n){
    int c, i = 1;
    while(1){
        c = n * i + 1;
        if(c % a == 0){
            c = c / a;
            break;
        }
        i++;
    }
    return c;
}
This translates to the following Python function:
def imod(a, n):
    i = 1
    while True:
        c = n * i + 1
        if c % a == 0:
            c = c // a  # integer division, so the result stays an int in Python 3
            break
        i = i + 1
    return c
Reference to the above C function is taken from the following link C program to find Modular Multiplicative Inverse of two Relatively Prime Numbers

Hash function family generator in Python

I am looking for a hash functions family generator that could generate a family of hash functions given a set of parameters. I haven't found any such generator so far.
Is there a way to do that with the hashlib package ?
For example I'd like to do something like :
h1 = hash_function(1)
h2 = hash_function(2)
...
and h1 and h2 would be different hash functions.
For those of you who might know about it, I am trying to implement a min-hashing algorithm on a very large dataset.
Basically, I have a very large set of features (100 million to 1 billion) for a given document, and I need to create 1000 to 10000 different random permutations of this set of features.
I do NOT want to build the random permutations explicitly, so the technique I would like to use is the following:
generate a hash function h and consider that, for two indices r and s,
r appears before s in the permutation if h(r) < h(s); do that for 100 to 1000 different hash functions.
Are there any known libraries that I might have missed ? Or any standard way of generating families of hash functions with python that you might be aware of ?
I'd just do something like (if you don't need thread-safety -- not hard to alter if you DO need thread safety -- and assuming a 32-bit Python version):
import random

_memomask = {}

def hash_function(n):
    mask = _memomask.get(n)
    if mask is None:
        random.seed(n)
        mask = _memomask[n] = random.getrandbits(32)
    def myhash(x):
        return hash(x) ^ mask
    return myhash
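For illustration, a small usage sketch (an addition, not part of the original answer): each n yields its own deterministic mask, so the resulting functions differ from one another. Note that hash() itself is randomized across interpreter runs for strings unless PYTHONHASHSEED is fixed.
h1 = hash_function(1)
h2 = hash_function(2)
print(h1("abc"), h2("abc"))  # same input, (almost always) different outputs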
As mentioned above, you can use universal hashing for minhash.
For example:
import random

def minhash():
    d1 = set(random.randint(0, 2000) for _ in range(1000))
    d2 = set(random.randint(0, 2000) for _ in range(1000))
    jacc_sim = len(d1.intersection(d2)) / len(d1.union(d2))
    print("jaccard similarity: {}".format(jacc_sim))

    N_HASHES = 200
    hash_funcs = []
    for i in range(N_HASHES):
        hash_funcs.append(universal_hashing())

    m1 = [min([h(e) for e in d1]) for h in hash_funcs]
    m2 = [min([h(e) for e in d2]) for h in hash_funcs]
    minhash_sim = sum(int(m1[i] == m2[i]) for i in range(N_HASHES)) / N_HASHES
    print("min-hash similarity: {}".format(minhash_sim))

def universal_hashing():
    def rand_prime():
        while True:
            p = random.randrange(2 ** 32, 2 ** 34, 2)
            if all(p % n != 0 for n in range(3, int((p ** 0.5) + 1), 2)):
                return p
    m = 2 ** 32 - 1
    p = rand_prime()
    a = random.randint(0, p)
    if a % 2 == 0:
        a += 1
    b = random.randint(0, p)
    def h(x):
        return ((a * x + b) % p) % m
    return h
Reference
@alex's answer is great and concise, but the hash functions it generates are not "very different from each other".
Let's look at the Pearson correlation between 10000 samples of 10000 hashes, putting the results in 100 bins:
%%time # 1min 14s
n=10000
hashes = [hash_function(i) for i in range(n)]
median_pvalue(hashes, n=n)
# 1.1614081043690444e-06
I.e. the median p-value is 1e-06, which is far from random. Here's an example if it were truly random:
%%time # 4min 15s
hashes = [lambda _ : random.randint(0,100) for _ in range(n)]
median_pvalue(hashes, n=n)
# 0.4979718236429698
Using the Carter and Wegman method you could get:
%%time # 1min 43s
hashes = HashFamily(100).draw_hashes(n)
median_pvalue(hashes, n=n)
# 0.841929288037321
Code to reproduce:
from scipy.stats.stats import pearsonr
import numpy as np
import random

_memomask = {}

def hash_function(n):
    mask = _memomask.get(n)
    if mask is None:
        random.seed(n)
        mask = _memomask[n] = random.getrandbits(32)
    def myhash(x):
        return hash(x) ^ mask
    return myhash
class HashFamily():
    r"""Universal hash family as proposed by Carter and Wegman.

    .. math::

        \begin{array}{ll}
        h_{{a,b}}(x)=((ax+b)~{\bmod ~}p)~{\bmod ~}m \ \mid p > m\\
        \end{array}

    Args:
        bins (int): Number of bins to hash to. Better if a prime number.
        moduler (int, optional): Temporary hashing. Has to be a prime number.
    """
    def __init__(self, bins, moduler=None):
        if moduler and moduler <= bins:
            raise ValueError("p (moduler) should be >> m (buckets)")

        self.bins = bins
        self.moduler = moduler if moduler else self._next_prime(np.random.randint(self.bins + 1, 2**32))

        # do not allow same a and b, as it could mean shifted hashes
        self.sampled_a = set()
        self.sampled_b = set()

    def _is_prime(self, x):
        """Naive is prime test."""
        for i in range(2, int(np.sqrt(x)) + 1):  # include int(sqrt(x)) itself in the trial divisors
            if x % i == 0:
                return False
        return True

    def _next_prime(self, n):
        """Naively gets the next prime larger than n."""
        while not self._is_prime(n):
            n += 1
        return n

    def draw_hash(self, a=None, b=None):
        """Draws a single hash function from the family."""
        if a is None:
            while a is None or a in self.sampled_a:
                a = np.random.randint(1, self.moduler - 1)
                assert len(self.sampled_a) < self.moduler - 2, "please give a bigger moduler"
            self.sampled_a.add(a)
        if b is None:
            while b is None or b in self.sampled_b:
                b = np.random.randint(0, self.moduler - 1)
                assert len(self.sampled_b) < self.moduler - 1, "please give a bigger moduler"
            self.sampled_b.add(b)
        return lambda x: ((a * x + b) % self.moduler) % self.bins

    def draw_hashes(self, n, **kwargs):
        """Draws n hash functions from the family."""
        return [self.draw_hash() for i in range(n)]

def median_pvalue(hashes, buckets=100, n=1000):
    p_values = []
    for j in range(n-1):
        a = [hashes[j](i) % buckets for i in range(n)]
        b = [hashes[j+1](i) % buckets for i in range(n)]
        p_values.append(pearsonr(a, b)[1])
    return np.median(p_values)
Note that my implementation of Carter and Wegman is very naive (e.g. the generation of prime numbers). It could be made shorter and quicker.
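One way to make the prime generation quicker, sketched under the assumption that SymPy is available (my suggestion, not part of the answer above):
from sympy import nextprime

def _next_prime_fast(n):
    # smallest prime >= n, matching the intent of HashFamily._next_prime
    return nextprime(n - 1)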
You should consider using universal hashing. My answer and code can be found here: https://stackoverflow.com/a/25104050/207661
The universal hash family is a set of hash functions H of size m, such that any two (distinct) inputs collide with probability at most 1/m when the hash function h is drawn randomly from the set H.
Based on the formulation in Wikipedia, you can use the following code:
import random
def is_prime(n):
    if n == 2 or n == 3: return True
    if n % 2 == 0 or n < 2: return False
    for i in range(3, int(n**0.5)+1, 2):
        if n % i == 0:
            return False
    return True
# universal hash functions
class UniversalHashFamily:
    def __init__(self, number_of_hash_functions, number_of_buckets, min_value_for_prime_number=2, bucket_value_offset=0):
        self.number_of_buckets = number_of_buckets
        self.bucket_value_offset = bucket_value_offset

        primes = []
        number_to_check = min_value_for_prime_number
        while len(primes) < number_of_hash_functions:
            if is_prime(number_to_check):
                primes.append(number_to_check)
            number_to_check += random.randint(1, 1000)

        self.hash_function_attrs = []
        for i in range(number_of_hash_functions):
            p = primes[i]
            a = random.randint(1, p)
            b = random.randint(0, p)
            self.hash_function_attrs.append((a, b, p))

    def __call__(self, function_index, input_integer):
        a, b, p = self.hash_function_attrs[function_index]
        return (((a*input_integer + b) % p) % self.number_of_buckets) + self.bucket_value_offset
Example usage:
We can create a hash family consisting of 20 hash functions, each one mapping the input to 100 buckets.
hash_family = UniversalHashFamily(20, 100)
And get the hashed values like:
input_integer = 1234567890 # sample input
hash_family(0, input_integer) # the output of the first hash function, i.e. h0(input_integer)
hash_family(1, input_integer) # the output of the second hash function, i.e. h1(input_integer)
# ...
hash_family(19, input_integer) # the output of the last hash function, i.e. h19(input_integer)
If you are interested in a universal hash family for string inputs, you can use the following code. But please note that this code may not be an optimized solution for string hashing.
class UniversalStringHashFamily:
    def __init__(self, number_of_hash_functions, number_of_buckets, min_value_for_prime_number=2, bucket_value_offset=0):
        self.number_of_buckets = number_of_buckets
        self.bucket_value_offset = bucket_value_offset

        primes = []
        number_to_check = max(min_value_for_prime_number, number_of_buckets)
        while len(primes) < number_of_hash_functions:
            if is_prime(number_to_check):
                primes.append(number_to_check)
            number_to_check += random.randint(1, 1000)

        self.hash_function_attrs = []
        for i in range(number_of_hash_functions):
            p = primes[i]
            a = random.randint(1, p)
            a2 = random.randint(1, p)
            b = random.randint(0, p)
            self.hash_function_attrs.append((a, b, p, a2))

    def hash_int(self, int_to_hash, a, b, p):
        return (((a*int_to_hash + b) % p) % self.number_of_buckets) + self.bucket_value_offset

    def hash_str(self, str_to_hash, a, b, p, a2):
        str_to_hash = "1" + str_to_hash  # this will ensure that universality is not affected, see wikipedia for more detail
        l = len(str_to_hash) - 1

        int_to_hash = 0
        for i in range(l + 1):
            int_to_hash += ord(str_to_hash[i]) * (a2 ** (l - i))
        int_to_hash = int_to_hash % p

        return self.hash_int(int_to_hash, a, b, p)

    def __call__(self, function_index, str_to_hash):
        a, b, p, a2 = self.hash_function_attrs[function_index]
        return self.hash_str(str_to_hash, a, b, p, a2)
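A usage sketch mirroring the integer example above (an addition; the concrete bucket values will vary because the family is drawn at random):
string_hash_family = UniversalStringHashFamily(10, 50)
print(string_hash_family(0, "hello"))  # bucket index from the first hash function, in range 0..49
print(string_hash_family(9, "hello"))  # bucket index from the last of the 10 hash functions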
