Realize FFT and IFFT using Python 3

When I multiply two big integers using the FFT, the results of my FFT and IFFT are always wrong.
method
To implement the FFT, I just follow this pseudocode:
[the pseudocode of the FFT]
The equations of the FFT and IFFT are as follows, with $\omega_n = e^{2\pi i/n}$.
For the FFT: $y_k = \sum_{j=0}^{n-1} a_j \omega_n^{jk}$
For the IFFT: $a_j = \frac{1}{n} \sum_{k=0}^{n-1} y_k \omega_n^{-jk}$
So, to implement the IFFT, I just replace $a$ with $y$, replace $\omega$ with $\omega^{-1}$, and divide the result by $n$, using a flag to distinguish the two cases in my function.
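For example, with n = 2 and a = [1, 1], $\omega_2 = e^{\pi i} = -1$, so $y_0 = 1 + 1 = 2$ and $y_1 = 1 + \omega_2 \cdot 1 = 0$, which matches case1 below.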
problem
To find the problem, I compared the results of numpy.fft with those of my function.
FFT.
The results of numpy and my function look the same, but the signs of the imaginary parts are opposite. For example (the second element of case2 below):
my function result: -4-9.65685424949238j
numpy result: -4+9.65685424949238j
IFFT. The results are simply wrong, and I can't find any pattern in them.
python code
Here is my FFT function and the comparison:
from typing import List
from cmath import pi, exp
from numpy.fft import fft, ifft

def FFT(a: List, flag: bool) -> List:
    """realize DFT using FFT"""
    n = len(a)
    if n == 1:
        return a
    # complex root
    omg_n = exp(2 * pi * 1j / n)
    if flag:
        # IFFT
        omg_n = 1 / omg_n
    omg = 1
    # split a into 2 parts
    a0 = a[::2]   # even
    a1 = a[1::2]  # odd
    # corresponding y
    y0 = FFT(a0, flag)
    y1 = FFT(a1, flag)
    # result y
    y = [0] * n
    for k in range(n // 2):
        y[k] = y0[k] + omg * y1[k]
        y[k + n // 2] = y0[k] - omg * y1[k]
        omg = omg * omg_n
    # IFFT
    if flag:
        y = [i / n for i in y]
    return y
if __name__ == '__main__':
    test_cases = [
        [1, 1],
        [1, 2, 3, 4, 5, 6, 7, 8],
        [1, 4, 2, 9, 0, 0, 3, 8, 9, 1, 4, 0, 0, 0, 0, 0],
    ]
    print("test FFT")
    for i, case in enumerate(test_cases):
        print(f"case{i + 1}", case)
        manual_result = FFT(case, False)
        numpy_result = fft(case).tolist()
        print("manual_result:", manual_result)
        print("numpy_result:", numpy_result)
        print("difference:", [i - j for i, j in zip(manual_result, numpy_result)])
        print()
    print("test IFFT")
    for i, case in enumerate(test_cases):
        print(f"case{i + 1}", case)
        manual_result = FFT(case, True)
        numpy_result = ifft(case).tolist()
        print("manual_result:", manual_result)
        print("numpy_result:", numpy_result)
        print("difference:", [i - j for i, j in zip(manual_result, numpy_result)])
        print()
The FFT output:
test FFT
case1 [1, 1]
manual_result: [2, 0]
numpy_result: [(2+0j), 0j]
difference: [0j, 0j]
case2 [1, 2, 3, 4, 5, 6, 7, 8]
manual_result: [36, (-4-9.65685424949238j), (-4-4.000000000000001j), (-4-1.6568542494923815j), -4, (-4+1.6568542494923806j), (-4+4.000000000000001j), (-3.999999999999999+9.656854249492381j)]
numpy_result: [(36+0j), (-4+9.65685424949238j), (-4+4j), (-4+1.6568542494923806j), (-4+0j), (-4-1.6568542494923806j), (-4-4j), (-4-9.65685424949238j)]
difference: [0j, -19.31370849898476j, -8j, -3.313708498984762j, 0j, 3.313708498984761j, 8j, (8.881784197001252e-16+19.31370849898476j)]
case3 [1, 4, 2, 9, 0, 0, 3, 8, 9, 1, 4, 0, 0, 0, 0, 0]
manual_result: [41, (-12.710780677203363+13.231540329804117j), (12.82842712474619+7.2426406871192865j), (-14.692799048494296+7.4256307475248935j), (1.0000000000000013-12j), (5.763866860359768+6.0114171851517995j), (7.171572875253808+1.2426406871192839j), (-10.360287134662114+11.817326767431025j), -3, (-10.360287134662112-11.817326767431021j), (7.17157287525381-1.2426406871192848j), (5.763866860359771-6.011417185151798j), (0.9999999999999987+12j), (-14.692799048494292-7.425630747524895j), (12.828427124746192-7.242640687119286j), (-12.710780677203362-13.23154032980412j)]
numpy_result: [(41+0j), (-12.710780677203363-13.231540329804115j), (12.82842712474619-7.242640687119286j), (-14.692799048494292-7.4256307475248935j), (1+12j), (5.763866860359768-6.011417185151798j), (7.17157287525381-1.2426406871192857j), (-10.360287134662112-11.81732676743102j), (-3+0j), (-10.360287134662112+11.81732676743102j), (7.17157287525381+1.2426406871192857j), (5.763866860359768+6.011417185151798j), (1-12j), (-14.692799048494292+7.4256307475248935j), (12.82842712474619+7.242640687119286j), (-12.710780677203363+13.231540329804115j)]
difference: [0j, 26.46308065960823j, 14.485281374238571j, (-3.552713678800501e-15+14.851261495049787j), (1.3322676295501878e-15-24j), 12.022834370303597j, (-1.7763568394002505e-15+2.4852813742385695j), (-1.7763568394002505e-15+23.634653534862046j), 0j, -23.63465353486204j, -2.4852813742385704j, (3.552713678800501e-15-12.022834370303595j), (-1.3322676295501878e-15+24j), -14.851261495049789j, (1.7763568394002505e-15-14.485281374238571j), (1.7763568394002505e-15-26.463080659608238j)]
The IFFT result:
test IFFT
case1 [1, 1]
manual_result: [1.0, 0.0]
numpy_result: [(1+0j), 0j]
difference: [0j, 0j]
case2 [1, 2, 3, 4, 5, 6, 7, 8]
manual_result: [0.5625, (-0.0625+0.15088834764831843j), (-0.0625+0.062499999999999986j), (-0.0625+0.025888347648318405j), -0.0625, (-0.0625-0.025888347648318433j), (-0.0625-0.062499999999999986j), (-0.062499999999999986-0.1508883476483184j)]
numpy_result: [(4.5+0j), (-0.5-1.2071067811865475j), (-0.5-0.5j), (-0.5-0.20710678118654757j), (-0.5+0j), (-0.5+0.20710678118654757j), (-0.5+0.5j), (-0.5+1.2071067811865475j)]
difference: [(-3.9375+0j), (0.4375+1.357995128834866j), (0.4375+0.5625j), (0.4375+0.23299512883486598j), (0.4375+0j), (0.4375-0.232995128834866j), (0.4375-0.5625j), (0.4375-1.357995128834866j)]
case3 [1, 4, 2, 9, 0, 0, 3, 8, 9, 1, 4, 0, 0, 0, 0, 0]
manual_result: [0.0400390625, (-0.01241287175508141-0.012921426103324331j), (0.012527760864009951-0.007072891296014926j), (-0.014348436570795205-0.007251592526879778j), (0.0009765625000000013+0.01171875j), (0.005628776230820083-0.005870524594874804j), (0.007003489135990047-0.0012135162960149274j), (-0.01011746790494347-0.011540358171319353j), -0.0029296875, (-0.010117467904943469+0.011540358171319355j), (0.007003489135990049+0.0012135162960149274j), (0.005628776230820081+0.005870524594874803j), (0.0009765624999999987-0.01171875j), (-0.014348436570795205+0.0072515925268797805j), (0.012527760864009953+0.007072891296014926j), (-0.012412871755081408+0.01292142610332433j)]
numpy_result: [(2.5625+0j), (-0.7944237923252102+0.8269712706127572j), (0.8017766952966369+0.45266504294495535j), (-0.9182999405308933+0.46410192172030584j), (0.0625-0.75j), (0.3602416787724855+0.37571357407198736j), (0.44822330470336313+0.07766504294495535j), (-0.647517945916382+0.7385829229644387j), (-0.1875+0j), (-0.647517945916382-0.7385829229644387j), (0.44822330470336313-0.07766504294495535j), (0.3602416787724855-0.37571357407198736j), (0.0625+0.75j), (-0.9182999405308933-0.46410192172030584j), (0.8017766952966369-0.45266504294495535j), (-0.7944237923252102-0.8269712706127572j)]
difference: [(-2.5224609375+0j), (0.7820109205701288-0.8398926967160816j), (-0.7892489344326269-0.45973793424097026j), (0.903951503960098-0.47135351424718563j), (-0.0615234375+0.76171875j), (-0.3546129025416654-0.38158409866686216j), (-0.4412198155673731-0.07887855924097029j), (0.6374004780114385-0.7501232811357581j), (0.1845703125+0j), (0.6374004780114385+0.7501232811357581j), (-0.4412198155673731+0.07887855924097029j), (-0.3546129025416654+0.38158409866686216j), (-0.0615234375-0.76171875j), (0.903951503960098+0.47135351424718563j), (-0.7892489344326269+0.45973793424097026j), (0.7820109205701288+0.8398926967160816j)]
@pjs, thank you for the reminder that this FFT requires len(data) to be a power of 2.

As was pointed out in the comments, you used a positive sign in the computation of omg_n. There are different definitions of the DFT, so that isn't wrong by itself. However, it naturally leads to differences if you compare your results with an implementation that uses a negative sign, as numpy.fft.fft does. Adjusting your implementation to also use a negative sign covers all the forward-transform cases (leaving only small roundoff errors on the order of ~1e-16).
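The two conventions differ only by conjugation: for real input, the positive-sign transform is the complex conjugate of numpy's negative-sign result, which is exactly the flipped imaginary signs observed above. A quick check (assuming the FFT function from the question is in scope):
import numpy as np
case = [1, 2, 3, 4, 5, 6, 7, 8]
# for real input, the positive-sign DFT is the conjugate of numpy's negative-sign DFT
assert np.allclose(FFT(case, False), np.conj(np.fft.fft(case)))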
For the inverse-transform cases, your implementation ends up dividing by the subproblem size at every level of the recursion instead of dividing by n only once at the end, so the results are scaled far too small. To correct this, simply remove the scaling from the recursion and normalize only in the final stage:
def FFTrecursion(a: List, flag: bool) -> List:
    """Recursion of the FFT implementation"""
    n = len(a)
    if n == 1:
        return a
    # complex root
    omg_n = exp(-2 * pi * 1j / n)
    if flag:
        # IFFT
        omg_n = 1 / omg_n
    omg = 1
    # split a into 2 parts
    a0 = a[::2]   # even
    a1 = a[1::2]  # odd
    # corresponding y
    y0 = FFTrecursion(a0, flag)
    y1 = FFTrecursion(a1, flag)
    # result y
    y = [0] * n
    for k in range(n // 2):
        y[k] = y0[k] + omg * y1[k]
        y[k + n // 2] = y0[k] - omg * y1[k]
        omg = omg * omg_n
    return y

def FFT(a: List, flag: bool) -> List:
    """realize DFT using FFT"""
    y = FFTrecursion(a, flag)
    # IFFT final scaling
    if flag:
        n = len(a)
        y = [i / n for i in y]
    return y
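With those two changes, the results agree with numpy; a quick sanity check (assuming the imports from the question):
import numpy as np
for case in [[1, 1], [1, 2, 3, 4, 5, 6, 7, 8]]:
    assert np.allclose(FFT(case, False), np.fft.fft(case))  # forward
    assert np.allclose(FFT(case, True), np.fft.ifft(case))  # inverse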

Related

Optimize non-trivial function on tensors

I am looking for a way to speed up a specific operation on tensors in PyTorch. Since it is a general operation on matrices, I am open to answers in NumPy as well.
Let's say I have a tensor with values from 0 to N-1 (N=4) where each value repeats the same number of times (R=2).
import torch
x = torch.Tensor([0, 0, 1, 1, 2, 2, 3, 3])
In this case, it is sorted, but any permutation of x is also in the set of considered tensors X.
I am getting an input tensor with values from 0 to N-1 but without any constraints on the repetition.
z = torch.tensor([3, 2, 3, 0, 2, 3, 1, 2])
And I would like to find an efficient implementation of foo such that y = foo(z). y should be some permutation of x (from the set X) that changes as few elements of z as possible (in terms of Hamming distance), for example
y = torch.tensor([3, 2, 3, 0, 2, 0, 1, 1])
The trivial solution is to keep counting the number of elements with each value, but it is extremely inefficient to process elements one by one for larger tensors:
def foo(z):
    R = 2
    N = 4
    counters = [0] * N
    # first, we replace extra elements with -1
    y = []
    for elem in z:
        if counters[elem] < R:
            counters[elem] += 1
            y.append(elem)
        else:
            y.append(-1)
    y = torch.tensor(y)
    assert torch.equal(y, torch.tensor([3, 2, 3, 0, 2, -1, 1, -1]))
    # second, we replace -1 by "unfilled" counters
    for i in range(len(y)):
        if y[i] == -1:
            first_unfilled = [n for n in range(N) if counters[n] < R][0]
            counters[first_unfilled] += 1
            y[i] = first_unfilled
    return y
assert torch.equal(y, foo(z))
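One way to avoid the per-element Python loop is to compute each position's occurrence index among equal values, keep the first R occurrences of each value, and overwrite the extras with the still-unfilled values in position order. A sketch along those lines (foo_vectorized is a hypothetical name; assumes z is an integer tensor with len(z) == N * R):
import torch

def foo_vectorized(z, N=4, R=2):
    y = z.clone()
    # occurrence index: for each position, how many earlier positions hold the same value
    occ = torch.zeros_like(y)
    for v in range(N):  # short loop over the N distinct values, not over all elements
        mask = y == v
        occ[mask] = torch.arange(int(mask.sum()))
    keep = occ < R  # the first R occurrences of each value stay in place
    # how many slots each value still needs
    deficit = R - torch.bincount(y[keep], minlength=N)
    # overwrite the extras, in position order, with the under-represented values
    y[~keep] = torch.repeat_interleave(torch.arange(N), deficit)
    return y

z = torch.tensor([3, 2, 3, 0, 2, 3, 1, 2])
assert torch.equal(foo_vectorized(z), torch.tensor([3, 2, 3, 0, 2, 0, 1, 1]))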

Finding if there are n data points in a row that are less than a certain number

I am working with a spectrum in Python, and I have fit a line to that spectrum. I want code that can detect whether there have been, let's say, 10 data points on the spectrum in a row that are less than the fitted line. Does anyone know a simple and quick way to do this?
I currently have something like this:
count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
if count > 15:
    *do whatever*
If I changed the first if statement line to be:
if spectrum[i] < fittedline[i] and spectrum[i+1] < fittedline[i+1] and so on
I'm sure the algorithm would work, but is there a smarter way for me to automate this in the case where I want the user to input a number for how many data points in a row must be less than the fitted line?
Your attempt is pretty close to working! For consecutive points, all you need to do is reset the count if one point doesn't satisfy your condition.
num_points = int(input("How many points must be less than the fitted line? "))
count = 0
for i in range(lowerbound, upperbound):
    if spectrum[i] < fittedline[i]:
        count += 1
    else:  # If the current point is NOT below the threshold, reset the count
        count = 0
    if count >= num_points:
        print(f"{count} consecutive points found at location {i - count + 1}-{i}!")
Let's test this:
lowerbound = 0
upperbound = 10
num_points = 5
spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]
Running the code with these values gives:
5 consecutive points found at location 2-6!
My recommendation would be to research and use existing libraries before developing ad-hoc functionality.
In this case, some super smart people developed the numerical Python library numpy. This library, widely used in science projects, has a ton of useful, off-the-shelf functionality that is tested and optimized.
Your needs can be covered with the following line:
number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()
But let's go step by step:
spectrum = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fittedline = [1, 2, 10, 10, 10, 10, 10, 8, 9, 10]

# Import the numerical python module
import numpy as np

# Convert your lists to numpy arrays
spectrum_array = np.array(spectrum)
fittedline_array = np.array(fittedline)

# Subtract the fitted line from the spectrum
difference = spectrum_array - fittedline_array
# >>> array([ 0, 0, -7, -6, -5, -4, -3, 0, 0, 0])

# Identify points where the condition is met
condition_check_array = difference < 0.0
# >>> array([False, False, True, True, True, True, True, False, False, False])

# Get the number of points where the condition is met
number_of_points = condition_check_array.sum()
# >>> 5

# Get the indices of points where the condition is met
index_of_points = np.where(difference < 0)
# >>> (array([2, 3, 4, 5, 6], dtype=int64),)

print(f"{number_of_points} points found at location {index_of_points[0][0]}-{index_of_points[0][-1]}!")

# Now the same functionality in a simple function
def get_point_count(spectrum, fittedline):
    return (np.array(spectrum) < np.array(fittedline)).sum()

get_point_count(spectrum, fittedline)
Now let's consider that instead of having 10 points in your spectrum, you have 1M. Code efficiency is a key thing to consider, and numpy can help there:
import time

number_of_samples = 1000000
spectrum = [1] * number_of_samples
# >>> [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
fittedline = [0] * number_of_samples
fittedline[2:7] = [2] * 5
# >>> [0, 0, 2, 2, 2, 2, 2, 0, 0, 0, ...]

# With numpy
start_time = time.time()
number_of_points = (np.array(spectrum) < np.array(fittedline)).sum()
numpy_time = time.time() - start_time
print("--- %s seconds ---" % (numpy_time))

# With an ad hoc loop and ifs
start_time = time.time()
count = 0
for i in range(0, len(spectrum)):
    if spectrum[i] < fittedline[i]:
        count += 1
    else:  # If the current point is NOT below the threshold, reset the count
        count = 0
adhoc_time = time.time() - start_time
print("--- %s seconds ---" % (adhoc_time))

print("Ad hoc is {:3.1f}% slower".format(100 * (adhoc_time / numpy_time - 1)))
>>>--- 0.20999646186828613 seconds ---
>>>--- 0.28800177574157715 seconds ---
>>>Ad hoc is 37.1% slower
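Note that this one-liner counts all points below the fitted line, not consecutive runs of them. If the consecutive requirement matters, runs can still be found in a vectorized way; a sketch (run_locations is a hypothetical helper built on np.diff over a padded boolean mask):
def run_locations(spectrum, fittedline, num_points):
    # boolean mask of points strictly below the fitted line
    below = np.asarray(spectrum) < np.asarray(fittedline)
    # pad with False so every run has a visible start and end edge
    edges = np.diff(np.concatenate(([False], below, [False])).astype(int))
    starts = np.flatnonzero(edges == 1)   # index where each run begins
    ends = np.flatnonzero(edges == -1)    # one past the last index of each run
    long_enough = (ends - starts) >= num_points
    return list(zip(starts[long_enough], ends[long_enough] - 1))

# with the original 10-point spectrum and fittedline, run_locations(spectrum, fittedline, 5) gives [(2, 6)]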

How to stretch specific items of numpy array with decrement?

Given a boundary value k, is there a vectorized way to replace each number n with consecutive descending numbers from n-1 down to k? For example, if k is 0 then I'd like to replace np.array([3,4,2,2,1,3,1]) with np.array([2,1,0,3,2,1,0,1,0,1,0,0,2,1,0,0]). Every item of the input array is greater than k.
I have tried a combination of np.repeat and np.cumsum, but it seems like an evasive solution:
x = np.array([3, 4, 2, 2, 1, 3, 1])
y = np.repeat(x, x)                     # [3 3 3 4 4 4 4 2 2 2 2 1 3 3 3 1]
t = -np.ones(y.shape[0])                # start with a step of -1 everywhere
t[np.r_[0, np.cumsum(x)[:-1]]] = x - 1  # each run starts at n - 1
np.cumsum(t)                            # [2. 1. 0. 3. 2. 1. 0. 1. 0. 1. 0. 0. 2. 1. 0. 0.]
Is there any other way? I expect something like an inverse of np.add.reduceat that broadcasts integers into decreasing sequences instead of reducing them.
Here's another way with array-assignment to skip the repeat part -
def func1(a):
    l = a.sum()
    out = np.full(l, -1, dtype=int)
    out[0] = a[0] - 1
    idx = a.cumsum()[:-1]
    out[idx] = a[1:] - 1
    return out.cumsum()
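For example, with the array from the question (and numpy imported as np), this reproduces the expected output:
a = np.array([3, 4, 2, 2, 1, 3, 1])
print(func1(a))
# [2 1 0 3 2 1 0 1 0 1 0 0 2 1 0 0]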
Benchmarking
# OP's soln
def OP(x):
    y = np.repeat(x, x)
    t = -np.ones(y.shape[0], dtype=int)
    t[np.r_[0, np.cumsum(x)[:-1]]] = x - 1
    return np.cumsum(t)
Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
import benchit
a = np.array([3,4,2,2,1,3,1])
in_ = [np.resize(a,n) for n in [10, 100, 1000, 10000]]
funcs = [OP, func1]
t = benchit.timings(funcs, in_)
t.plot(logx=True, save='timings.png')
Extend to take k as arg
def func1(a, k):
    l = a.sum() + len(a) * (-k)
    out = np.full(l, -1, dtype=int)
    out[0] = a[0] - 1
    idx = (a - k).cumsum()[:-1]
    out[idx] = a[1:] - 1 - k
    return out.cumsum()
Sample run -
In [120]: a
Out[120]: array([3, 4, 2, 2, 1, 3, 1])
In [121]: func1(a, k=-1)
Out[121]:
array([ 2,  1,  0, -1,  3,  2,  1,  0, -1,  1,  0, -1,  1,  0, -1,  0, -1,
        2,  1,  0, -1,  0, -1])
This is concise and probably OK for efficiency; I don't think apply is vectorized here, so you will be limited mostly by the number of elements in the original array (less so by their values, is my guess):
import numpy as np
import pandas as pd

x = np.array([3, 4, 2, 2, 1, 3, 1])
values = pd.Series(x).apply(lambda val: np.arange(val - 1, -1, -1)).values
output = np.concatenate(values)

Extract sub arrays based on kernel in numpy

I would like to know if there is an efficient method to get sub-arrays from a larger numpy array.
What I have is an application of np.where: I iterate 'manually' over x and y as offsets, applying where with a kernel to each rectangle of the proper dimensions extracted from the larger array.
But is there a more direct approach in numpy's collection of methods?
import numpy as np
example = np.arange(20).reshape((5, 4))
# e.g. a cross kernel
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
np.where(a_kernel, example[1:4, 1:4], 0)
# returns
# array([[ 0, 6, 0],
# [ 9, 10, 11],
# [ 0, 14, 0]])
def arrays_from_kernel(a, a_kernel):
    width, height = a_kernel.shape
    y_max, x_max = a.shape
    return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)]
sub_arrays = arrays_from_kernel(example, a_kernel)
This returns the arrays I need for further processing.
# [array([[0, 1, 0],
# [4, 5, 6],
# [0, 9, 0]]),
# array([[ 0, 2, 0],
# [ 5, 6, 7],
# [ 0, 10, 0]]),
# ...
# array([[ 0, 9, 0],
# [12, 13, 14],
# [ 0, 17, 0]]),
# array([[ 0, 10, 0],
# [13, 14, 15],
# [ 0, 18, 0]])]
The context: similar to 2D convolution I would like to apply a custom function on each of the subarrays (e.g. product of squared numbers).
At the moment, you're manually advancing a sliding window over the data - stride tricks to the rescue! (And no, I didn't just make that up - there's actually a submodule called stride_tricks in numpy!) Instead of manually building windows into the data and calling np.where() on them, if you had the windows in an array, you could call np.where() just once. Stride tricks allow you to create such an array without even having to copy the data.
Let me explain. Normal slices in numpy create views into the original data instead of copies. This is done by referring to the original data but changing the strides used to access the data (i.e. how much to jump between two elements or two rows, and so on). Stride tricks allow you to modify those strides more freely than just slicing and reshaping do, so you can e.g. iterate over the same data more than once, which is useful here.
Let me demonstrate:
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def sliding_window(data, win_shape, **kwargs):
    assert data.ndim == len(win_shape)
    shape = tuple(dn - wn + 1 for dn, wn in zip(data.shape, win_shape)) + win_shape
    strides = data.strides * 2
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides, **kwargs)

def arrays_from_kernel(a, a_kernel):
    windows = sliding_window(a, a_kernel.shape)
    return np.where(a_kernel, windows, 0)

sub_arrays = arrays_from_kernel(example, a_kernel)
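Here sub_arrays is a single 4-D array rather than a list: the first two axes index the window's top-left corner, and the last two axes are the window itself. A quick look at the result for the 5x4 example and 3x3 kernel:
print(sub_arrays.shape)  # (3, 2, 3, 3): 3 x 2 window positions, each 3 x 3
print(sub_arrays[0, 0])
# [[0 1 0]
#  [4 5 6]
#  [0 9 0]]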
The scipy.ndimage module offers a number of filters -- one of which might meet your needs. If none of those filters does what you want, you could use ndimage.generic_filter to call a custom function on each subarray. ndimage.generic_filter is not as fast as the other ndimage filters, however.
For example,
import numpy as np
example = np.arange(20).reshape((5, 4))
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
# def arrays_from_kernel(a, a_kernel):
#     width, height = a_kernel.shape
#     y_max, x_max = a.shape
#     return [np.where(a_kernel, a[y:(y + height), x:(x + width)], 0)
#             for y in range(y_max - height + 1)
#             for x in range(x_max - width + 1)]
# sub_arrays = arrays_from_kernel(example, a_kernel)
# for arr in sub_arrays:
#     print(arr)
#     print('-' * 80)

import scipy.ndimage as ndimage

def func(x):
    # reject subarrays that extend beyond the border of the `example` array
    if not np.isnan(x).any():
        y = np.zeros_like(a_kernel, dtype=example.dtype)
        np.put(y, np.flatnonzero(a_kernel), x)
        print(y)
    # Instead of returning 0, you can perform your desired computation on the subarray here.
    # Note that you may not need the 2D array y; often, you only need the values in the 1D array x.
    return 0

result = ndimage.generic_filter(example, func, footprint=a_kernel, mode='constant', cval=np.nan)
For the particular problem of computing the product of squares for each subarray, you could convert the product into a sum by taking advantage of the fact that A * B = exp(log(A) + log(B)). This lets you express the computation as a normal convolution, and using ndimage.convolve then improves performance a lot. The amount of improvement depends on the size of example:
import numpy as np
import scipy.ndimage as ndimage
import perfplot
a_kernel = np.asarray([[0, 1, 0], [1, 1, 1], [0, 1, 0]])
def orig(example, a_kernel=a_kernel):
    def arrays_from_kernel(a, a_kernel):
        width, height = a_kernel.shape
        y_max, x_max = a.shape
        return [
            np.where(a_kernel, a[y : (y + height), x : (x + width)], 1)
            for y in range(y_max - height + 1)
            for x in range(x_max - width + 1)
        ]
    return [np.prod(x) ** 2 for x in arrays_from_kernel(example, a_kernel)]

def alt(example, a_kernel=a_kernel):
    logged = np.log(example)
    result = ndimage.convolve(logged, a_kernel, mode="constant", cval=0)[1:-1, 1:-1]
    return (np.exp(result) ** 2).ravel()

def make_example(N):
    return np.random.random(size=(N, N))

def check(A, B):
    return np.allclose(A, B)

perfplot.show(
    setup=make_example,
    kernels=[orig, alt],
    n_range=[2 ** k for k in range(2, 11)],
    logx=True,
    logy=True,
    xlabel="len(example)",
    equality_check=check,
)

Pythonic way to vectorize double summation

I'm attempting to convert a double summation formula into code, but can't figure out the correct matrix/vector representation of it.
The sum is $\sum_{i=1}^{n} \sum_{j=i+1}^{n} w_i\, w_j\, \sigma_i\, \sigma_j$: the first summation runs over i from 1 to n, and the second over j > i up to n (here the $w$ are the weights and the $\sigma$ are the vols).
I'm guessing there is a much more efficient and pythonic way of writing this?
I resorted to nested for loops to just get it working but, as expected, it runs very slowly with a large dataset:
def wapc_denom(weights, vols):
    x = []
    y = []
    for i, wi in enumerate(weights):
        for j, wj in enumerate(weights):
            if j > i:
                x.append(wi * wj * vols[i] * vols[j])
        y.append(np.sum(x))
    return np.sum(y)
Edit:
Using guidance from smci's answer I think I have a potential solution:
def wapc_denom2(weights, vols):
    return np.sum(np.tril(np.outer(weights, vols.T)**2, k=-1))
Assuming you want to count every term only once (for that you have to move the x = [] into the outer loop), one cheap way of computing the sum would be:
Create mock data
weights = np.random.random(10)
vols = np.random.random(10)
Do the calculation
wv = weights * vols
result = (wv.sum()**2 - wv @ wv) / 2
Check that it's the same
def wapc_denom(weights, vols):
    y = []
    for i, wi in enumerate(weights):
        x = []
        for j, wj in enumerate(weights):
            if j > i:
                x.append(wi * wj * vols[i] * vols[j])
        y.append(np.sum(x))
    return np.sum(y)
assert np.allclose(result, wapc_denom(weights, vols))
Why does it work?
What we are doing is computing the sum of the full outer-product matrix, subtracting the diagonal, and dividing by two. This is cheap because it is easy to verify that the sum of an outer product is just the product of the summed factors.
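Concretely, with wv = weights * vols and $s = \sum_i wv_i$:
$$s^2 = \sum_i wv_i^2 + 2 \sum_{i<j} wv_i\, wv_j \quad\Rightarrow\quad \sum_{i<j} wv_i\, wv_j = \frac{s^2 - wv \cdot wv}{2},$$
which is exactly (wv.sum()**2 - wv @ wv) / 2.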
wi * wj * vols[i] * vols[j] is a telltale: vols is just another vector, so first you want to compute the vector wv = w * vols.
Then (wj * vols[j]) * (wi * vols[i]) = wv^T * wv is your (matrix outer product) expression; that's a column vector * a row vector. But actually you only want the sum, so I don't see a need to construct a vector with y.append(np.sum(x)); you're only going to sum it anyway with np.sum(y).
Also, the if j > i part means you only want the sum of the lower-triangular part, excluding the diagonal.
EDIT: the result is fully determined just from wv; I didn't think we needed the matrix to get the sum, and we didn't need the diagonal. @PaulPanzer found the most compact expression.
You can use triangular matrices in numpy; check np.triu and np.meshgrid. Do:
np.product(np.triu(np.meshgrid(weights,weights), 1) * np.triu(np.meshgrid(vols,vols), 1),0).sum(1).cumsum().sum()
Example:
w = np.arange(4) +1
v = np.array([1,3,2,2])
print(np.triu(np.meshgrid(w,w), k=1))
>>array([[[0, 2, 3, 4],
[0, 0, 3, 4],
[0, 0, 0, 4],
[0, 0, 0, 0]],
[[0, 1, 1, 1],
[0, 0, 2, 2],
[0, 0, 0, 3],
[0, 0, 0, 0]]])
# example of product + triu + meshgrid (your x values):
print(np.product(np.triu(np.meshgrid(w,w), 1) * np.triu(np.meshgrid(v,v), 1),0))
>>array([[ 0, 6, 6, 8],
[ 0, 0, 36, 48],
[ 0, 0, 0, 48],
[ 0, 0, 0, 0]])
print(np.product(np.triu(np.meshgrid(w,w), 1) * np.triu(np.meshgrid(v,v), 1),0).sum(1).cumsum().sum())
>> 428
print(wapc_denom(w, v))
>> 428
