Can you use numpy to calculate the magnitude of difference? - python

Say I have two matrices, A and B.
I want to calculate the magnitude of difference between the two matrices. That is, without using iteration.
Here's what I have so far:
def mymagn(A, B):
    i = 0
    j = 0
    x = np.shape(A)
    y = np.shape(B)
    while i < x[1]:
        while j < y[1]:
            np.sum((A[i][j] - B[i][j]) * (A[i][j] - B[i][j]))
            j += 1
        i += 1
As I understand it, generally the value should be small with two similar matrices but I'm not getting that, can anyone help? Is there any way to get rid of the need to iterate?

This should do it:
def mymagn(A, B):
    return np.sum((B - A) ** 2)
For arrays/matrices of the same size, addition/subtraction are element-wise (like in MATLAB). Exponentiation with scalar exponent is also element-wise. And np.sum will by default sum all elements (along all axes).
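As a quick check of the element-wise behaviour described above, here is a minimal sketch on two small arrays. The name squared_difference and the sample values are made up for illustration; taking the square root to get a norm-style magnitude is an assumption about what "magnitude" should mean here.

```python
import numpy as np

def squared_difference(A, B):
    """Sum of squared element-wise differences, with no explicit loops."""
    return np.sum((B - A) ** 2)

# Two small sample matrices (illustrative values only).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[1.5, 2.0], [3.0, 3.0]])

total = squared_difference(A, B)   # 0.5**2 + 1.0**2
magnitude = np.sqrt(total)         # same as the Frobenius norm of B - A
print(total, magnitude)
```

If a single "distance" number is wanted, np.linalg.norm(B - A) gives the same value as the square root above.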

Related

Summation function for large integers

I am an amateur Python coder trying to find an efficient solution for the Project Euler Digit Sum problem. My code returns the correct result but is inefficient for large integers such as 1234567890123456789. I know that the inefficiency lies in my sigma_sum function, where there is a 'for' loop.
I have tried various alternative solutions, such as loading the values into a numpy array, but ran out of memory with large integers with this approach. I am eager to learn more efficient solutions.
import math
def sumOfDigits(n: int) :
    digitSum = 0
    if n < 10: return n
    else:
        for i in str(n): digitSum += int(i)
        return digitSum
def sigma_sum(start, end, expression):
    return math.fsum(expression(i) for i in range(start, end))
def theArguement(n: int):
    return n / sumOfDigits(n)
def F(N: int) -> float:
    """
    >>> F(10)
    19
    >>> F(123)
    1.187764610390e+03
    >>> F(12345)
    4.855801996238e+06
    """
    s = sigma_sum(1, N + 1, theArguement)
    if s.is_integer():
        print("{:0.0f}".format(s))
    else:
        print("{:.12e}".format(s))
print(F(123))
if __name__ == '__main__':
    import doctest
    doctest.testmod()
Try solving a different problem.
Define G(n) to be a dictionary. Its keys are integers representing digit sums and its values are the sum of all positive integers < n whose digit sum is the key. So
F(n) = sum(v / k for k, v in G(n + 1).items())
[Using < instead of ≤ simplifies the calculations below]
Given the value of G(a) for any value, how would you calculate G(10 * a)?
This gives you a nice easy way to calculate G(x) for any value of x. Calculate G(x // 10) recursively, use that to calculate the value G((x // 10) * 10), and then manually add the few remaining elements in the range (x // 10) * 10 ≤ i < x.
Getting from G(a) to G(10 * a) is mildly tricky, but not overly so. If your code is correct, you can use calculating G(12346) as a test case to see if you get the right answer for F(12345).
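One possible way to fill in the recursion hinted at above is sketched below. It is not the answerer's code: the names digit_sum_table and digit_sum are hypothetical, and the sketch assumes that, besides the per-digit-sum totals, you also track how many integers have each digit sum, because appending a digit d to a prefix m gives 10*m + d and the d part has to be counted once per prefix.

```python
from collections import defaultdict

def digit_sum(n):
    return sum(int(ch) for ch in str(n))

def digit_sum_table(x):
    """Return (count, total), both keyed by digit sum, over integers 1 <= i < x."""
    count, total = defaultdict(int), defaultdict(int)
    a = x // 10
    if a > 0:
        sub_count, sub_total = digit_sum_table(a)
        sub_count[0] += 1  # the prefix 0 stands for the one-digit numbers
        for k, c in sub_count.items():
            for d in range(10):
                # each prefix m with digit sum k yields 10*m + d with digit sum k + d
                count[k + d] += c
                total[k + d] += 10 * sub_total[k] + d * c
        count.pop(0, None)  # drop the integer 0 itself
        total.pop(0, None)
    # add the few remaining integers in 10*a <= i < x by hand
    for i in range(max(10 * a, 1), x):
        s = digit_sum(i)
        count[s] += 1
        total[s] += i
    return count, total

def F(n):
    _, total = digit_sum_table(n + 1)
    return sum(v / k for k, v in total.items())

print(F(10))
```

The recursion depth is the number of decimal digits of n, so the work grows with the digit count rather than with n itself.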

Manual fft not giving me same results as fft

import numpy as np
import matplotlib.pyplot as pp
curve = np.genfromtxt('C:\Users\latel\Desktop\kool\Neuro\prax2\data\curve.csv',dtype = 'float', delimiter = ',')
curve_abs2 = np.empty_like(curve)
z = 1j
N = len(curve)
for i in range(0,N-1):
    curve_abs2[i] =0
    for k in range(0,N-1):
        curve_abs2[i] += (curve[i]*np.exp((-1)*z*(np.pi)*i*((k-1)/N)))
for i in range(0,N):
    curve_abs2[i] = abs(curve_abs2[i])/(2*len(curve_abs2))
#curve_abs = (np.abs(np.fft.fft(curve)))
#pp.plot(curve_abs)
pp.plot(curve_abs2)
pp.show()
The code behind # gives me 3 values. But this is just ... different
Wrong ^^ this code: http://www.upload.ee/image/3922681/Ex5problem.png
Correct using numpy.fft.fft(): http://www.upload.ee/image/3922682/Ex5numpyformulas.png
There are several problems:
You are assigning complex values to the elements of curve_abs2, so it should be declared to be complex, e.g. curve_abs2 = np.empty_like(curve, dtype=np.complex128). (And I would recommend using the name, say, curve_fft instead of curve_abs2.)
In python, range(low, high) gives the sequence [low, low + 1, ..., high - 2, high - 1], so instead of range(0, N - 1), you must use range(0, N) (which can be simplified to range(N), if you want).
You are missing a factor of 2 in your formula. You could fix this by using z = 2j.
In the expression that is being summed in the inner loop, you are indexing curve as curve[i], but this should be curve[k].
Also in that expression, you don't need to subtract 1 from k, because the k loop ranges from 0 to N - 1.
Because k and N are integers and you are using Python 2.7, the division in the expression (k-1)/N will be integer division, and you'll get 0 for all k. To fix this and the previous problem, you can change that term to k / float(N).
If you fix those issues, when the first double loop finishes, the array curve_abs2 (now a complex array) should match the result of np.fft.fft(curve). It won't be exactly the same, but the differences should be very small.
You could eliminate that double loop altogether using numpy vectorized calculations, but that is a topic for another question.
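For reference, here is a minimal sketch of the double loop with the fixes listed above applied. It is written for Python 3 (where / is true division, so the float(N) workaround is not needed) and uses a synthetic sine wave instead of the original curve.csv, which is not available here; the names manual_dft and sample are made up for this example.

```python
import numpy as np

def manual_dft(curve):
    """Direct O(N^2) discrete Fourier transform, kept as explicit loops."""
    N = len(curve)
    curve_fft = np.empty(N, dtype=np.complex128)  # complex output array
    for i in range(N):                            # range(N) covers 0 .. N-1
        total = 0j
        for k in range(N):
            # index the input by k, and use the full factor 2*pi in the exponent
            total += curve[k] * np.exp(-2j * np.pi * i * k / N)
        curve_fft[i] = total
    return curve_fft

# Synthetic test signal (an assumption, standing in for the real data).
sample = np.sin(np.linspace(0.0, 4.0 * np.pi, 32))
print(np.allclose(manual_dft(sample), np.fft.fft(sample)))
```

Taking np.abs of the result afterwards gives the magnitude spectrum that the commented-out line in the question computes.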

Trapezoid Rule in Python

I am trying to write a program using Python v. 2.7.5 that will compute the area under the curve y=sin(x) between x = 0 and x = pi. Perform this calculation varying the n divisions of the range of x between 1 and 10 inclusive and print the approximate value, the true value, and the percent error (in other words, increase the accuracy by increasing the number of trapezoids). Print all the values to three decimal places.
I am not sure what the code should look like. I was told that I should only have about 12 lines of code for these calculations to be done.
I am using Wing IDE.
This is what I have so far
# base_n = (b-a)/n
# h1 = a + ((n-1)/n)(b-a)
# h2 = a + (n/n)(b-a)
# Trap Area = (1/2)*base*(h1+h2)
# a = 0, b = pi
from math import pi, sin
def TrapArea(n):
    for i in range(1, n):
        deltax = (pi-0)/n
        sum += (1.0/2.0)(((pi-0)/n)(sin((i-1)/n(pi-0))) + sin((i/n)(pi-0)))*deltax
    return sum
for i in range(1, 11):
    print TrapArea(i)
I am not sure if I am on the right track. I am getting an error that says "local variable 'sum' referenced before assignment". Any suggestions on how to improve my code?
The problem with your original code, and with Shashank Gupta's answer, is that /n does integer division in Python 2. You need to convert n to float first:
from math import pi, sin
def TrapArea(n):
    sum = 0
    for i in range(1, n):
        deltax = (pi-0)/n
        sum += (1.0/2.0)*(((pi-0)/float(n))*(sin((i-1)/float(n)*(pi-0))) + sin((i/float(n))*(pi-0)))*deltax
    return sum
for i in range(1, 11):
    print TrapArea(i)
Output:
0
0.785398163397
1.38175124526
1.47457409274
1.45836902046
1.42009115659
1.38070223089
1.34524797198
1.31450259385
1.28808354
Note that you can heavily simplify the sum += ... part.
First change all (pi-0) to pi:
sum += (1.0/2.0)*((pi/float(n))*(sin((i-1)/float(n)*pi)) + sin((i/float(n))*pi))*deltax
Then do pi/n wherever possible, which avoids needing to call float as pi is already a float:
sum += (1.0/2.0)*(pi/n * (sin((i-1) * pi/n)) + sin(i * pi/n))*deltax
Then change the (1.0/2.0) to 0.5 and remove some brackets:
sum += 0.5 * (pi/n * sin((i-1) * pi/n) + sin(i * pi/n)) * deltax
Much nicer, eh?
You have some indentation issues with your code, but that could just be because of copy and paste. Anyway, adding a line sum = 0 at the beginning of your TrapArea function should solve your current error. But as @Blender pointed out in the comments, you have another issue, which is the lack of a multiplication operator (*) after your floating point division expression (1.0/2.0).
Remember that in Python expressions are not always evaluated as you would expect mathematically. Thus (a op b)(c) will not automatically multiply the result of a op b by c like you would expect with a mathematical expression. Instead this is the function call notation in Python.
Also remember that you must initialize a variable before using its value. Python has no default value for unassigned names, so when you write sum += expr, which is equivalent to sum = sum + expr, you are trying to read a name (sum) that is not bound to any object at all.
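The call-notation point can be seen in a small sketch. The variable names are made up for illustration, and the snippet is written for Python 3.

```python
half = 1.0 / 2.0

try:
    half(3.0)              # parsed as a function call, not as half * 3.0
except TypeError as exc:
    message = str(exc)     # the float object cannot be called

product = half * 3.0       # explicit multiplication does what was intended
print(message)
print(product)
```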
The following revision to your function should do the trick. Notice how I place multiplication operators (*) between every expression that you intend to multiply.
def TrapArea(n):
    sum = 0
    for i in range(1, n):
        i = float(i)
        deltax = (pi-0)/n
        sum += (1.0/2.0)*(((pi-0)/n)*(sin((i-1)/n*(pi-0))) + sin((i/n)*(pi-0)))*deltax
    return sum
EDIT: I also dealt with the float division issue by converting i to float(i) within every iteration of the loop. In Python 2.x, if you divide one integer type object with another integer type object, the expression evaluates to an integer regardless of the actual value.
A "nicer" way to do the trapezoid rule with equally-spaced points...
Let dx = pi/n be the width of the interval. Also, let f(i) be sin(i*dx) to shorten some expressions below. Then interval i (for i in range(1, n+1)) contributes:
dA = 0.5*dx*( f(i) + f(i-1) )
...to the sum (which is an area, so I'm using dA for "delta area"). Factoring out the 0.5*dx makes the whole sum look like:
A = 0.5*dx * ( (f(0) + f(1)) + (f(1) + f(2)) + .... + (f(n-1) + f(n)) )
Notice that there are two f(1) terms, two f(2) terms, on up to two f(n-1) terms. Combine those to get:
A = 0.5*dx * ( f(0) + 2*f(1) + 2*f(2) + ... + 2*f(n-1) + f(n) )
The 0.5 and 2 factors cancel except in the first and last terms:
A = 0.5*dx*(f(0) + f(n)) + dx*(f(1) + f(2) + ... + f(n-1))
Finally, you can factor dx out entirely to do just one multiplication at the end. Converting back to sin() calls, then:
def TrapArea(n):
    dx = pi/n
    asum = 0.5*(sin(0) + sin(pi)) # this is 0 for this problem, but not others
    for i in range(1, n): # interior points f(1) .. f(n-1)
        asum += sin(i*dx)
    return asum*dx
That changed "sum" to "asum", or maybe "area" would be better. That's mostly because sum() is a built-in function, which I'll use below the line.
Extra credit: The loop part of the sum can be done in one step with a generator expression and the sum builtin function:
def TrapArea2(n):
    dx = pi/n
    asum = 0.5*(sin(0) + sin(pi))
    asum += sum(sin(i*dx) for i in range(1, n))
    return asum*dx
Testing both of those:
>>> for n in [1, 10, 100, 1000, 10000]:
...     print n, TrapArea(n), TrapArea2(n)
...
1 1.92367069372e-16 1.92367069372e-16
10 1.98352353751 1.98352353751
100 1.99983550389 1.99983550389
1000 1.99999835507 1.99999835507
10000 1.99999998355 1.99999998355
That first line is a "numerical zero", since math.sin(math.pi) evaluates to about 1.2e-16 instead of exactly zero. Draw the single interval from 0 to pi and the endpoints are indeed both 0 (or nearly so.)
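To connect this back to the original assignment (n from 1 to 10, with the approximate value, the true value, and the percent error to three decimal places), here is a minimal Python 3 sketch. The name trap_area and the output layout are assumptions; the true value 2 is the exact integral of sin(x) from 0 to pi.

```python
from math import pi, sin

def trap_area(n):
    """Trapezoid-rule estimate of the integral of sin(x) on [0, pi] with n intervals."""
    dx = pi / n
    total = 0.5 * (sin(0.0) + sin(pi))                  # end points count half
    total += sum(sin(i * dx) for i in range(1, n))      # interior points 1 .. n-1
    return total * dx

TRUE_VALUE = 2.0  # exact value of the integral

for n in range(1, 11):
    approx = trap_area(n)
    percent_error = abs(approx - TRUE_VALUE) / TRUE_VALUE * 100.0
    print(f"{n:2d}  {approx:.3f}  {TRUE_VALUE:.3f}  {percent_error:.3f}%")
```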

Function to calculate the difference between sum of squares and square of sums

I am trying to write a function called sum_square_difference which takes a number n and returns the difference between the sum of the squares of the first n natural numbers and the square of their sum.
I think I know how to write a function that computes the sum of squares:
def sum_of_squares(numbers):
    total = 0
    for num in numbers:
        total += (num ** 2)
    return(total)
I have tried to implement a square of sums function:
def square_sum(numbers):
    total = 0
    for each in range:
        total = total + each
    return total**2
I don't know how to combine the functions to get the difference, and I don't know if my functions are correct.
Any suggestions please? I am using Python 3.3
Thank you.
The function can be written with pure math like this:
sum(i^2 for i = 1..n) - (sum(i for i = 1..n))^2 = (3n^2 + 2n)(1 - n^2) / 12
Translated into Python:
def square_sum_difference(n):
    return int((3*n**2 + 2*n) * (1 - n**2) / 12)
The formula is a simplification of two other formulas:
def square_sum_difference(n):
    return int(n*(n+1)*(2*n+1)/6 - (n*(n+1)/2)**2)
n*(n+1)*(2*n+1)/6 is the formula described here, which returns the sum of the squares of the first n natural numbers.
(n*(n+1)/2)**2 uses the triangle number formula, which is the sum of the first n natural numbers, and which is then squared.
This can also be done with the built in sum function. Here it is:
def sum_square_difference(n):
    r = range(1, n+1) # first n natural numbers
    return sum(i**2 for i in r) - sum(r)**2
The range(1, n+1) produces a range object (an iterable) covering the first n natural numbers.
>>> list(range(1, 4+1))
[1, 2, 3, 4]
sum(i**2 for i in r) returns the sum of the squares of the numbers in r, and sum(r)**2 returns the square of the sum of the numbers in r.
# As beta says,
# (sum(i))^2 - (sum(i^2)) is very easy to calculate :)
# A = sum(i) = n*(n+1)/2
# B = sum(i^2) = n*(n+1)*(2*n + 1)/6
# A^2 - B = n(n+1)(3(n^2) - n - 2) / 12
# :)
# no loops... just a formula!
This is a case where it pays to do the math beforehand. You can derive closed-form solutions for both the sum of the squares and the square of the sum. Then the code is trivial (and O(1)).
Need help with the two solutions?
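The two closed forms can be combined as in the following sketch. It uses integer division (//), which is an editorial choice rather than something the answers specify: both divisions are exact, and staying in integers avoids float rounding for large n. The helper brute_force is only there to cross-check the formula.

```python
def sum_square_difference(n):
    """Sum of squares of 1..n minus the square of their sum, in O(1)."""
    sum_of_squares = n * (n + 1) * (2 * n + 1) // 6   # 1^2 + 2^2 + ... + n^2
    square_of_sum = (n * (n + 1) // 2) ** 2           # (1 + 2 + ... + n)^2
    return sum_of_squares - square_of_sum

def brute_force(n):
    r = range(1, n + 1)
    return sum(i * i for i in r) - sum(r) ** 2

print(sum_square_difference(10), brute_force(10))
```

The result is negative for n > 1 because the square of the sum grows faster; swap the operands if the positive difference is wanted.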
def sum_square_difference(n):
    r = range(1,n+1)
    sum_of_squares = sum(map(lambda x: x*x, r))
    square_sum = sum(r)**2
    return sum_of_squares - square_sum
In Ruby, you can achieve this as follows:
def diff_btw_sum_of_squars_and_squar_of_sum(from=1,to=100) # use default values from 1..100.
  ((from..to).inject(:+)**2) - (from..to).map {|num| num ** 2}.inject(:+)
end
diff_btw_sum_of_squars_and_squar_of_sum #call for above method

Generating random numbers under very specific constraints

I am faced with the following programming problem. I need to generate n (a, b) tuples for which the sum of all a's is a given A and sum of all b's is a given B and for each tuple the ratio of a / b is in the range (c_min, c_max). A / B is within the same range, too. I am also trying to make sure there is no bias in the result other than what is introduced by the constraints and the a / b values are more-or-less uniformly distributed in the given range.
Some clarifications and meta-constraints:
A, B, c_min, and c_max are given.
The ratio A / B is in the (c_min, c_max) range. This has to be so if the problem is to have a solution given the other constraints.
a and b are >0 and non-integer.
I am trying to implement this in Python but ideas in any language (English included) are much appreciated.
We look for tuples a_i and b_i such that
(a_1, ... a_n) and (b_1, ... b_n) have a distribution which is invariant under permutation of indices (what you would call "unbiased")
the ratios a_i / b_i are uniformly distributed on [cmin, cmax]
sum(a_i) = A, sum(b_i) = B
If c_min and c_max are not too ill-conditioned (i.e. they are not very close to one another), and n is not very large, the following works:
Generate a_i "uniformly" such that sum a_i = A:
Draw n samples aa_i (i = 1..n) from some distribution (eg. uniform)
Divide them by their sum and multiply by A: a_i = A * aa_i / sum(aa_i) has desired properties.
Generate b_i such that sum b_i = B by the same method.
If there exists i such that a_i / b_i is not in the interval [cmin, cmax], throw away all the a_i and b_i and try again from the beginning.
It doesn't scale well with n, because the set of a_i and b_i satisfying the constraints gets more and more narrow as n increases (and so you reject more candidates).
To be honest, I don't see any other simple solution. If n gets large and cmin ~ cmax, then you will have to use a sledgehammer (eg. MCMC) to generate samples from your distribution, unless there is some trick we did not see.
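The rejection scheme described above can be sketched as follows. The function names, the max_tries cap, and the example parameters are assumptions added for illustration, and the sketch only implements the generate-and-reject steps; it makes no claim about how uniform the resulting ratios are.

```python
import random

def normalized_sample(total, n, rng):
    """Draw n positive values and rescale them so that they sum to total."""
    raw = [1.0 - rng.random() for _ in range(n)]   # values in (0, 1], never zero
    s = sum(raw)
    return [total * x / s for x in raw]

def rejection_sample(A, B, c_min, c_max, n, rng, max_tries=100_000):
    """Retry until every ratio a_i / b_i lies in [c_min, c_max]."""
    for _ in range(max_tries):
        a = normalized_sample(A, n, rng)
        b = normalized_sample(B, n, rng)
        if all(c_min <= ai / bi <= c_max for ai, bi in zip(a, b)):
            return a, b
    raise RuntimeError("no sample accepted; the constraints may be too tight for this n")

rng = random.Random(0)
a, b = rejection_sample(100.0, 200.0, 0.25, 0.75, 3, rng)
print(a, b)
```

As the answer warns, the acceptance rate drops quickly as n grows, which is why the cap on attempts is there.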
If you really want to use MCMC algorithms, note that you can change cmin to cmin * B / A (likewise for cmax) and assume A == B == 1. The problem is then to draw uniformly on the product of two unit n-simplices (u_1...u_n, v_1...v_n) such that
u_i / v_i \in [cmin, cmax].
So you have to use a MCMC algorithm (Metropolis-Hastings seems more suited) on the product of two unit n-simplices with the density
f(u_1, ..., u_n, v_1, ..., v_n) = \prod indicator_{u_i/v_i \in [cmin, cmax]}
which is definitely doable (albeit involved).
Start by generating as many identical tuples, n, as you need:
(A/n, B/n)
Now pick two tuples at random. Make a random change to the a value of one, and a compensating change to the a value of the other, keeping everything within the given constraints. Put the two tuples back.
Now pick another random pair. This time twiddle with the b values.
Lather, rinse, repeat.
I think the simplest thing is to
Use your favorite method to throw n-1 values such that \sum_i=0,n-1 a_i < A, and set a_n to get the right total. There are several SO questions about doing that, though I've never seen an answer I'm really happy with yet. Maybe I'll write a paper or something.
Get the n-1 b's by throwing the c_i uniformly on the allowed range, and set the final b to get the right total and check on the final c (I think it must be OK, but I haven't proven it yet).
Note that since we have 2 hard constraints we should expect to throw 2n-2 random numbers, and this method does exactly that (on the assumption that you can do step 1 with n-1 throws).
Blocked Gibbs sampling is pretty simple and converges to the right distribution (this is along the lines of what Alexandre is proposing).
For all i, initialize ai = A / n and bi = B / n.
Select i ≠ j uniformly at random. With probability 1/2, update ai and aj with uniform random values satisfying the constraints. The rest of the time, do the same for bi and bj.
Repeat Step 2 as many times as seems to be necessary for your application. I have no idea what the convergence rate is.
Lots of good ideas here. Thanks! Rossum's idea seemed the most straightforward implementation-wise so I went for it. Here is the code for posterity:
import random
c_min = 0.25
c_max = 0.75
a_sum = 100.0
b_sum = 200.0
n = 1000
a = [a_sum / n] * n
b = [b_sum / n] * n
while not good_enough(a, b):
    # move a random amount q from a[j] to a[i], keeping both ratios in range
    i, j = random.sample(range(n), 2)
    li, ui = c_min * b[i] - a[i], c_max * b[i] - a[i]
    lj, uj = a[j] - c_min * b[j], a[j] - c_max * b[j]
    llim = max((li, uj))
    ulim = min((ui, lj))
    q = random.uniform(llim, ulim)
    a[i] += q
    a[j] -= q
    # same idea for the b values of another random pair
    i, j = random.sample(range(n), 2)
    li, ui = a[i] / c_max - b[i], a[i] / c_min - b[i]
    lj, uj = b[j] - a[j] / c_max, b[j] - a[j] / c_min
    llim = max((li, uj))
    ulim = min((ui, lj))
    q = random.uniform(llim, ulim)
    b[i] += q
    b[j] -= q
The good_enough(a, b) function can be a lot of things. I tried:
Standard deviation, which is hit or miss, as you don't know what is a good enough value.
Kurtosis, where a large negative value would be nice. However, it is relatively slow to calculate and is undefined with the seed values of (a_sum / n, b_sum / n) (though that's trivial to fix).
Skewness, where a value close to 0 is desirable. But it has the same drawbacks as kurtosis.
A number of iterations proportional to n. 2n sometimes wasn't enough, n ^ 2 is a little bit of overkill and is, well, quadratic.
Ideally, a heuristic using a combination of skewness and kurtosis would be best but I settled for making sure each value has been changed from the initial (again, as rossum suggested in a comment). Though there is no theoretical guarantee that the loop will complete, it seemed to work well enough for me.
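For a self-contained run of the pairwise-adjustment idea, the sketch below replaces good_enough with a fixed number of steps. That stopping rule, the function name pairwise_shuffle, and the seed parameter are assumptions made so the example terminates deterministically; they are not part of the original code.

```python
import random

def pairwise_shuffle(a_sum, b_sum, c_min, c_max, n, steps, seed=None):
    """Start from identical tuples and repeatedly shift mass between random pairs."""
    rng = random.Random(seed)
    a = [a_sum / n] * n
    b = [b_sum / n] * n
    for _ in range(steps):
        # shift q from a[j] to a[i]; bounds keep a[i]/b[i] and a[j]/b[j] in range
        i, j = rng.sample(range(n), 2)
        lo = max(c_min * b[i] - a[i], a[j] - c_max * b[j])
        hi = min(c_max * b[i] - a[i], a[j] - c_min * b[j])
        q = rng.uniform(lo, hi)
        a[i] += q
        a[j] -= q
        # shift q from b[j] to b[i] under the same ratio constraints
        i, j = rng.sample(range(n), 2)
        lo = max(a[i] / c_max - b[i], b[j] - a[j] / c_min)
        hi = min(a[i] / c_min - b[i], b[j] - a[j] / c_max)
        q = rng.uniform(lo, hi)
        b[i] += q
        b[j] -= q
    return a, b

a, b = pairwise_shuffle(100.0, 200.0, 0.25, 0.75, n=50, steps=2000, seed=1)
print(min(x / y for x, y in zip(a, b)), max(x / y for x, y in zip(a, b)))
```

Because q = 0 is always feasible, each interval is non-empty as long as the constraints held before the step, so the sums and the ratio bounds are preserved up to floating-point rounding.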
So here's what I think from mathematical point of view. We have sequences a_i and b_i such that sum of a_i is A and sum of b_i is B. Furthermore A/B is in (x,y) and so is a_i/b_i for each i. Furthermore you want a_i/b_i to be uniformly distributed in (x,y).
So do it starting from the end. Choose c_i from (x,y) such that they are uniformly distributed. Then we want to have the following equality a_i/b_i = c_i, so a_i = b_i*c_i.
Therefore we only need to find b_i. But we have the following system of linear equations:
A = (sum)b_i*c_i
B = (sum)b_i
where b_i are variables. Solve it (some fancy linear algebra tricks) and you're done!
Note that for large enough n this system will have lots of solutions. They will be dependent on some parameters which you can choose randomly.
Enough of the theoretical approach, let's see some practical solution.
// EDIT 1: Here's some hard core Python code :D
import random
min = 0.0
max = 10.0
A = 500.0
B = 100.0
def generate(n):
    C = [min + i*(max-min)/(n+1) for i in range(1, n+1)]
    Y = [0]
    for i in range(1,n-1):
        # This line should be changed in order to always get positive numbers
        # It should be relatively easy to figure out some good random generator
        Y.append(random.random())
    val = A - C[0]*B
    for i in range(1, n-1):
        val -= Y[i] * (C[i] - C[0])
    val /= (C[n-1] - C[0])
    Y.append(val)
    val = B
    for i in range(1, n):
        val -= Y[i]
    Y[0] = val
    result = []
    for i in range(0, n):
        result.append([ Y[i]*C[i], Y[i] ])
    return result
The result is a list of pairs (X,Y) satisfying your conditions with the exception that they may be negative (see the random generator line in code) i.e. the first and the last pair may contain negative numbers.
// EDIT 2:
To ensure that they are positive you may try something like
Y.append(random.random() * B / n)
instead of
Y.append(random.random())
I'm not sure though.
// EDIT 3:
In order to have better results try something like this:
avrg = B / n
ran = avrg / 20
for i in range(1, n-1):
    Y.append(random.gauss(avrg, ran))
instead of
for i in range(1, n-1):
    Y.append(random.random())
This will make all b_i be near B / n. Unfortunately the last term will still sometimes jump high. I'm sorry, but there is no way to avoid this (mathematics) since the last and the first terms depend on the others. For small n (~100) it looks good though. Unfortunately some negative values may appear.
The choice of a correct generator is not so simple if you additionally want b_i to be uniformly distributed.
