While dealing with memory issues, I was wondering whether there is a Python library that lets me define a matrix M in terms of matrix operations, e.g. in a simple case:
M = A.dot(B)
... or perhaps with some threshold t:
M = (A.dot(B.T) / C) > t
But it should not actually compute M; it should compute elements only when needed, i.e. when I ask for M[i,j], M[i,:] or M[i:j,:] it computes only those values.
I guess TensorFlow could be an answer, but I am not sure about that.
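To make the request concrete, here is a minimal sketch of what such a lazy matrix could look like in plain NumPy. The class name LazyThresholdedProduct is hypothetical (it is not from any library), C is assumed to be a stored array with the same shape as M, and only integer and slice indices are handled.

```python
import numpy as np


class LazyThresholdedProduct:
    """Hypothetical helper: acts like (A.dot(B.T) / C) > t, evaluated on demand."""

    def __init__(self, A, B, C, t):
        self.A, self.B, self.C, self.t = A, B, C, t
        self.shape = (A.shape[0], B.shape[0])

    def __getitem__(self, key):
        # Normalise the key to a (rows, cols) pair; M[i] means M[i, :].
        # Assumption: rows and cols are ints or slices, not index lists.
        rows, cols = key if isinstance(key, tuple) else (key, slice(None))
        # Only the requested rows of A and rows of B take part in the product.
        block = self.A[rows].dot(self.B[cols].T)
        return (block / self.C[rows, cols]) > self.t


rng = np.random.default_rng(0)
A = rng.random((6, 4))
B = rng.random((5, 4))
C = rng.random((6, 5)) + 0.5
M = LazyThresholdedProduct(A, B, C, 1.0)
print(M[2, 3], M[1, :], M[1:4, :].shape)
```

Nothing is cached here, so asking for the same block twice computes it twice; whether that trade-off is acceptable depends on the access pattern.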
I am new to Python and would appreciate your help.
I have three matrices, in particular:
Matrix M (class of the matrix: scipy.sparse.csc.csc_matrix), dimensions: N x C;
Matrix G (class of the matrix: numpy.ndarray), dimensions: C x T;
Matrix L (class of the matrix: numpy.ndarray), dimensions: T x N.
Where: N = 10000, C = 1000, T = 20.
I would like to calculate this score:
$$\text{score} = \sum_{i=1}^{N} \sum_{c=1}^{C} M_{ic} \sum_{t=1}^{T} L_{ti}\, G_{ct}$$
I tried using two for loops, one over the index i and one over c, and a dot product for the last sum in the equation. But my implementation takes too long to produce the result.
This is what I implemented:
score = 0.0
for i in range(N):
    for c in range(C):
        Mic = M[i,c]
        score += np.outer(Mic,(np.dot(L[:,i],G[c,:])))
Is there a way to avoid the two for loops?
Thank you in advance!
Best
Try this:
score = np.einsum("ic,ti,ct->", M, L, G)
EDIT1
By the way, in your case score = np.sum(np.diag(M @ G @ L)) is faster than einsum (in Python 3.5 and later, the @ operator performs matmul), especially in the form np.trace((L @ M) @ G), due to efficient use of memory; maybe @hpaulj meant this in his comment. But einsum is easier to use for complex tensor products (to write the einsum call I transcribed your math expression directly, without thinking about optimization).
Generally, using for loops with numpy results in a dramatic slowdown in computation speed (think "vectorize your computations" in the case of numpy).
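As a quick check that these forms agree, here is a small sketch on random data. The sizes are shrunk stand-ins for the ones in the question, and M is a dense array here; as far as I know np.einsum does not accept a scipy sparse matrix directly, so that case would need a conversion such as M.toarray() first.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, T = 50, 20, 5        # small stand-ins for N = 10000, C = 1000, T = 20
M = rng.random((N, C))     # dense stand-in for the sparse matrix
G = rng.random((C, T))
L = rng.random((T, N))

# Reference: the double loop from the question.
score_loops = 0.0
for i in range(N):
    for c in range(C):
        score_loops += M[i, c] * np.dot(L[:, i], G[c, :])

score_einsum = np.einsum("ic,ti,ct->", M, L, G)

# trace(M G L) equals trace(L M G); the second form only builds a T x T result.
score_trace = np.trace((L @ M) @ G)

print(score_loops, score_einsum, score_trace)
```

The trace form never materialises the N x N product M @ G @ L, which is where the memory saving comes from.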
I have the following two algorithms. My analysis says that both of them are O(m^2+4^n), i.e. they are equivalent for big numbers. Is this right? Note that m and n are the numbers of bits in x and y.
def pow1(x, y):
    if y == 0:
        return 1
    temp = x
    while y > 1:
        y -= 1
        temp *= x
    return temp
def pow2(x, y):
    if y == 0:
        return 1
    temp = pow2(x, y//2)
    if y & 1: return temp * temp * x
    return temp * temp
Whether the divide-and-conquer algorithm is more efficient depends on a ton of factors. In Python it is more efficient.
Your analysis is right; assuming standard grade-school multiplication, divide-and-conquer does fewer, more expensive multiplications, and asymptotically that makes total runtime a wash (constant factors probably matter -- I'd still guess divide-and-conquer would be faster because the majority of the work is happening in optimized C rather than in Python loop overhead, but that's just a hunch, and it'd be hard to test given that Python doesn't use an elementary multiplication algorithm).
Before going further, note that big-integer multiplication in Python is little-o of m^2. In particular, it uses Karatsuba, which is around O(m^0.58 n) for an m-bit integer and an n-bit integer with m<=n.
The small terms using ordinary multiplication won't matter asymptotically, so focusing on the large ones we can replace the multiplication cost and find that your iterative algorithm is around O(4^n m^1.58) and your divide-and-conquer solution is around O(3^n m^1.58).
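To illustrate the "fewer, more expensive multiplications" point, here is a small sketch that only counts how many multiplications each version performs for a given exponent. The helper names are mine, and the counts say nothing about how costly each individual multiplication is.

```python
def mults_iterative(y):
    """Multiplications performed by pow1 for exponent y."""
    # The while loop runs once for each value of y from the start down to 2.
    return max(y - 1, 0)


def mults_divide_and_conquer(y):
    """Multiplications performed by pow2 for exponent y."""
    if y == 0:
        return 0
    # One squaring per level, plus one extra multiplication by x when y is odd.
    return mults_divide_and_conquer(y // 2) + (2 if y & 1 else 1)


for y in (10, 1000, 10**6):
    print(y, mults_iterative(y), mults_divide_and_conquer(y))
```

The iterative count grows linearly in y (exponentially in its bit length n), while the divide-and-conquer count grows linearly in n; the asymptotic comparison above comes from weighting each multiplication by the size of its operands.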
I need help with evaluating the expression. I have just started, but I am at a loss for what to do next, and all the for loops I am using seem unnecessary. It has sums, products and combinations:
What I tried is incomplete and, in my opinion, not accurate. I tried several things, but this is all I can come up with for now. I don't have the denominator yet.
i = 10
N = 3.1
j = []
for x in range(1, i + 1):
    for y in range(1, i):
        for z in range(1, n - i):
            l = N * y * z
            j.append(l)
ll = sum(j)
Any help is appreciated. I want to be able to understand it so I can do more complex examples.
Here are some hints. If you try them and are still stuck, ask for more help.
First, you know that the expression involves "combinations," also called "binomial coefficients." So you will need a routine that calculates those. Here is a question with multiple answers on how to calculate these numbers. Briefly, you can use the scipy package or make your own routine that uses Python's factorial function or that uses iteration.
Next, you see that the expression involves sums and products and is written as a single expression. Python has a sum function which works on generator expressions (as well as list and set comprehensions and other iterables). Your conversion from math to Python will be easier if you know how to set up such expressions. If you do not understand these generators/iterables and how to sum them, do research on this topic. This approach is not necessary, since you could use loops rather than the generators, but this approach will be easier. Study until you can understand an expression (including why the final number in the range has 1 added to it) such as
sum(N * f(x) for x in range(1, 5+1))
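For instance, here is that expression next to the equivalent explicit loop. This is only a sketch: f is a hypothetical stand-in (here x squared) for whatever term your real expression uses, and N = 3.1 is taken from your attempt.

```python
import math

N = 3.1


def f(x):
    # Hypothetical stand-in for the real term of the expression.
    return x ** 2


# range(1, 5 + 1) yields 1, 2, 3, 4, 5: the stop value is excluded, hence the +1.
total = sum(N * f(x) for x in range(1, 5 + 1))

# The same computation written as an explicit loop.
total_loop = 0.0
for x in range(1, 5 + 1):
    total_loop += N * f(x)

print(total, total_loop)
```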
Last, your expression has products. Before Python 3.8, which added math.prod, Python had no built-in way to take the product of an iterable. Here is such a function in Python 3.
from operator import mul
from functools import reduce
def prod(iterable):
    """Return the product of the numbers in an iterable."""
    return reduce(mul, iterable, 1)
With all of that, your desired expression will look like this (you will need to finish the job by replacing the ... with something more useful):
numerator = sum(N * prod(... for y in range(1, 1+1)) for x in range(1, 5+1))
denominator = prod(y + N for y in range(1, 5+1))
result = numerator / denominator
Note that your final result is a function of N.
I was looking for a Python library function which computes multinomial coefficients.
I could not find any such function in any of the standard libraries.
For binomial coefficients (of which multinomial coefficients are a generalization) there is scipy.special.binom and also scipy.misc.comb. Also, numpy.random.multinomial draws samples from a multinomial distribution, and sympy.ntheory.multinomial.multinomial_coefficients returns a dictionary related to multinomial coefficients.
However, I could not find a multinomial coefficients function proper, which given a,b,...,z returns (a+b+...+z)!/(a! b! ... z!). Did I miss it? Is there a good reason there is none available?
I would be happy to contribute an efficient implementation to SciPy say. (I would have to figure out how to contribute, as I have never done this).
For background, they do come up when expanding (a+b+...+z)^n. Also, they count the ways of depositing a+b+...+z distinct objects into distinct bins such that the first bin contains a objects, etc. I need them occasionally for a Project Euler problem.
BTW, other languages do offer this function: Mathematica, MATLAB, Maple.
To partially answer my own question, here is my simple and fairly efficient implementation of the multinomial function:
def multinomial(lst):
    res, i = 1, 1
    for a in lst:
        for j in range(1,a+1):
            res *= i
            res //= j
            i += 1
    return res
It seems from the comments so far that no efficient implementation of the function exists in any of the standard libraries.
Update (January 2020). As Don Hatch has pointed out in the comments, this can be further improved by looking for the largest argument (especially for the case that it dominates all others):
def multinomial(lst):
    res, i = 1, sum(lst)
    i0 = lst.index(max(lst))
    for a in lst[:i0] + lst[i0+1:]:
        for j in range(1,a+1):
            res *= i
            res //= j
            i -= 1
    return res
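For checking an implementation like this, the definition itself can be coded directly with exact integers. A minimal reference sketch follows; the name multinomial_by_definition is mine, and it is meant for testing rather than speed.

```python
from math import factorial


def multinomial_by_definition(lst):
    """Reference value: (a + b + ... + z)! / (a! b! ... z!), in exact integers."""
    result = factorial(sum(lst))
    for a in lst:
        # Each division is exact because the running result is always
        # a product of integer multinomial and factorial terms.
        result //= factorial(a)
    return result


print(multinomial_by_definition([1, 2, 3]))
print(multinomial_by_definition([10, 20, 30]))
```

Comparing the faster versions against this one on a handful of inputs is a cheap way to catch off-by-one mistakes in the loop bounds.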
No, there is not a built-in multinomial library or function in Python.
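Anyway, this time math can help you. A simple method for calculating the multinomial coefficient while keeping an eye on performance is to rewrite it using the characterization of the multinomial coefficient as a product of binomial coefficients:
$$\binom{n_1 + n_2 + \dots + n_k}{n_1, n_2, \dots, n_k} = \prod_{i=1}^{k} \binom{n_1 + n_2 + \dots + n_i}{n_i}$$
where of course
$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$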
Thanks to scipy.special.binom and the magic of recursion you can solve the problem like this:
from scipy.special import binom
def multinomial(params):
    if len(params) == 1:
        return 1
    return binom(sum(params), params[-1]) * multinomial(params[:-1])
where params = [n1, n2, ..., nk].
Note: Splitting the multinomial into a product of binomials also helps prevent overflow in general.
You wrote "sympy.ntheory.multinomial.multinomial_coefficients returns a dictionary related to multinomial coefficients", but it is not clear from that comment if you know how to extract the specific coefficients from that dictionary. Using the notation from the wikipedia link, the SymPy function gives you all the multinomial coefficients for the given m and n. If you only want a specific coefficient, just pull it out of the dictionary:
In [39]: from sympy import ntheory
In [40]: def sympy_multinomial(params):
    ...:     m = len(params)
    ...:     n = sum(params)
    ...:     return ntheory.multinomial_coefficients(m, n)[tuple(params)]
    ...:
In [41]: sympy_multinomial([1, 2, 3])
Out[41]: 60
In [42]: sympy_multinomial([10, 20, 30])
Out[42]: 3553261127084984957001360
Busy Beaver gave an answer written in terms of scipy.special.binom. A potential problem with that implementation is that binom(n, k) returns a floating point value. If the coefficient is large enough, it will not be exact, so it would probably not help you with a Project Euler problem. Instead of binom, you can use scipy.special.comb, with the argument exact=True. This is Busy Beaver's function, modified to use comb:
In [46]: from scipy.special import comb
In [47]: def scipy_multinomial(params):
    ...:     if len(params) == 1:
    ...:         return 1
    ...:     coeff = (comb(sum(params), params[-1], exact=True) *
    ...:              scipy_multinomial(params[:-1]))
    ...:     return coeff
    ...:
In [48]: scipy_multinomial([1, 2, 3])
Out[48]: 60
In [49]: scipy_multinomial([10, 20, 30])
Out[49]: 3553261127084984957001360
Here are two approaches, one using factorials, one using Stirling's approximation.
Using factorials
You can define a function to return multinomial coefficients in a single line using vectorised code (instead of for-loops) as follows:
from scipy.special import factorial
def multinomial_coeff(c):
    return factorial(c.sum()) / factorial(c).prod()
(Where c is an np.ndarray containing the number of counts for each different object). Usage example:
>>> import numpy as np
>>> coeffs = np.array([2, 3, 4])
>>> multinomial_coeff(coeffs)
1260.0
In some cases this might be slower because you will be computing certain factorial expressions multiple times, in other cases this might be faster because I believe that numpy naturally parallelises vectorised code. Also this reduces the required number of lines in your program and is arguably more readable. If someone has the time to run speed tests on these different options then I'd be interested to see the results.
Using Stirling's approximation
In fact the logarithm of the multinomial coefficient is much faster to compute (based on Stirling's approximation) and allows computation of much larger coefficients:
from scipy.special import gammaln
def log_multinomial_coeff(c):
    return gammaln(c.sum()+1) - gammaln(c+1).sum()
Usage example:
>>> import numpy as np
>>> coeffs = np.array([2, 3, 4])
>>> np.exp(log_multinomial_coeff(coeffs))
1259.999999999999
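To show what "much larger coefficients" can mean in practice, here is a sketch of the same idea using math.lgamma from the standard library in place of scipy's gammaln (my substitution, so that it runs without SciPy). The function name log_multinomial and the large counts are made up for illustration.

```python
import math


def log_multinomial(counts):
    """Natural log of the multinomial coefficient, via the log-gamma function."""
    return math.lgamma(sum(counts) + 1) - sum(math.lgamma(c + 1) for c in counts)


# Small case: exponentiating recovers the coefficient up to rounding error.
small = math.exp(log_multinomial([2, 3, 4]))

# Large case: the coefficient itself is far beyond the range of a float,
# but its logarithm is an ordinary finite number.
huge_log = log_multinomial([1000, 2000, 3000])

print(small, huge_log)
```

The result is approximate, so this suits probabilities and likelihoods better than problems that need the exact integer.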
Your own answer (the accepted one) is quite good, and is especially simple. However, it does have one significant inefficiency: your outer loop for a in lst is executed one more time than is necessary. In the first pass through that loop, the values of i and j are always identical, so the multiplications and divisions do nothing. In your example multinomial([123, 134, 145]), there are 123 unneeded multiplications and divisions, adding time to the code.
I suggest finding the maximum value in the parameters and removing it, so those unneeded operations are not done. That adds complexity to the code but reduces the execution time, especially for short lists of large numbers. My code below executes multcoeff(123, 134, 145) in 111 microseconds, while your code takes 141 microseconds. That is not a large increase, but that could matter. So here is my code. This also takes individual values as parameters rather than a list, so that is another difference from your code.
def multcoeff(*args):
    """Return the multinomial coefficient
    (n1 + n2 + ...)! / n1! / n2! / ..."""
    if not args:  # no parameters
        return 1
    # Find and store the index of the largest parameter so we can skip
    # it (for efficiency)
    skipndx = args.index(max(args))
    newargs = args[:skipndx] + args[skipndx + 1:]
    result = 1
    num = args[skipndx] + 1  # a factor in the numerator
    for n in newargs:
        for den in range(1, n + 1):  # a factor in the denominator
            result = result * num // den
            num += 1
    return result
Starting with Python 3.8, since the standard library now includes the math.comb function (binomial coefficient) and since the multinomial coefficient can be computed as a product of binomial coefficients, we can implement it without external libraries:
import math
def multinomial(*params):
    return math.prod(math.comb(sum(params[:i]), x) for i, x in enumerate(params, 1))
multinomial(10, 20, 30) # 3553261127084984957001360
Thank you for your help.
I am very new to programming, but have decided to learn Python. I am writing a program that checks whether a number is prime. Mathematically, this is done by checking whether (x-1)^p - (x^p-1) is divisible by p (capable of being divided, with no remainder); if it is, then p is a prime.
However, I have run into trouble. This is my code so far:
from sympy import *
x=symbols('x')
p=11
f=(pow(x - 1, p)) - (pow(x, p) - 1) # (x-1)^p -(x^p-1)
f1=expand(f)
>>> -11*x**10 + 55*x**9 - 165*x**8 + 330*x**7 - 462*x**6 + 462*x**5 - 330*x**4 + 165*x**3 - 55*x**2 + 11*x
f2= f1/p
>>> -x**10 + 5*x**9 - 15*x**8 + 30*x**7 - 42*x**6 + 42*x**5 - 30*x**4 + 15*x**3 - 5*x**2 + x
To tell whether the number p is a prime, I need to check whether the coefficients of the polynomial are divisible by p, so I have to check whether the coefficients of f2 are whole numbers or not.
This is what I would like my program to check: https://www.youtube.com/watch?v=HvMSRWTE2mI
I have tried converting it to int, but it still shows fractions like 1/2 and 3/7. I would like it to show only whole numbers.
How do I do that?
What the method effectively does is expand the polynomial and drop the first (x^p) and last (x^0) coefficients. Then you have to iterate through the rest and check for divisibility. Since a polynomial expansion of power p produces p+1 terms (from 0 to p), we want to collect the p-1 terms in between (from 1 to p-1). This is all summed up in the following code.
from sympy.abc import x
def is_prime_sympy(p):
    poly = pow((x - 1), p).expand()
    # check the coefficients of x^1 .. x^(p-1) for divisibility by p
    return not any(poly.coeff(x, i) % p for i in range(1, p))
This works, but the higher the number you input, e.g. 1013, the longer you'll notice it takes. Sympy is slow because internally it stores all expressions as some classes and all multiplications and additions take a long time. We can simply generate the coefficients using Pascal's triangle. For the polynomial (x - 1)^p, the coefficients are supposed to change sign, but we don't care about that. We just want the raw numbers. Credits to Copperfield for pointing out you only need half of the coefficients because of symmetry.
import math
def combination(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))
def pascals_triangle(row):
    # only generate half of the coefficients because of symmetry;
    # for an even row the middle term is included (without it, 4 would pass)
    return (combination(row, term) for term in range(1, row // 2 + 1))
def is_prime_math(p):
    return not any(c % p for c in pascals_triangle(p))
We can time both methods now to see which one is faster.
import time
def benchmark(p):
    t0 = time.time()
    is_prime_math(p)
    t1 = time.time()
    is_prime_sympy(p)
    t2 = time.time()
    print('Math: %.3f, Sympy: %.3f' % (t1-t0, t2-t1))
And some tests.
>>> benchmark(512)
Math: 0.001, Sympy: 0.241
>>> benchmark(2003)
Math: 3.852, Sympy: 41.695
We know that 512 is not a prime. The very second term we have to check for divisibility fails the test, so most of the time is actually spent generating the coefficients. Python lazily computes them, while sympy must expand the whole polynomial before we can start collecting them. This shows that a generator approach is preferable.
2003 is prime and here we notice sympy performs 10 times as slowly. In fact, all of the time is spent generating the coefficients, as iterating over 2000 elements for a modulo operation takes no time. So if there are any further optimisations, that's where one should focus.
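One possible optimisation of the coefficient generation, offered as a sketch rather than a measured result: build each binomial coefficient from the previous one instead of recomputing three factorials per term. The function name is_prime_incremental is mine, and the p < 2 guard is an addition not present in the code above.

```python
def is_prime_incremental(p):
    """Binomial-coefficient primality check with incrementally built coefficients."""
    if p < 2:
        return False
    coeff = 1  # C(p, 0)
    # By symmetry only the first half of the row, up to the middle, is needed.
    for k in range(1, p // 2 + 1):
        # C(p, k) = C(p, k - 1) * (p - k + 1) / k, and the division is exact.
        coeff = coeff * (p - k + 1) // k
        if coeff % p:
            return False  # stop at the first coefficient not divisible by p
    return True


print([n for n in range(30) if is_prime_incremental(n)])
```

Each step is one multiplication and one division on the running coefficient, so the cost per term no longer involves recomputing factorials of numbers as large as p.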
numpy.poly1d()
Numpy has a class that can manipulate polynomial coefficients and it's exactly what we want. It even works relatively fast for powers up to 50k. However, in its original implementation it's useless to us. That is because the coefficients are stored as signed int32, which means very quickly they will overflow and our modulo operations will be thrown off. In fact, it'll fail for even 37.
But it's fast, though, right? Maybe if we can hack it so it accepts infinite precision integers... Maybe it's possible, maybe it isn't. But even if it is, we have to consider that maybe the reason why it is so fast is exactly because it uses a fixed precision type under the hood.
For the sake of curiosity, this is what the implementation would look like if it were of any use.
import numpy as np
def is_prime_numpy(p):
    poly = pow(np.poly1d([1, -1]), p)
    return not any(c % p for c in poly.coeffs[1:-1])
And for the curious ones, the source code is located in ...\numpy\lib\polynomial.py.
I am not sure I understood what you mean, but to check whether a number is an integer or a float you can use isinstance:
>>> isinstance(1/2.0, float)
True
>>> isinstance(1//2, float)
False