Improving distribution of array elements - python

I am trying to write a command in Python that calculates a distribution.
For example, the call:
n = 10
x = [1]*3                     # returns [1, 1, 1]
y = [s/sum(x)*n for s in x]   # returns [3.33, 3.33, 3.33]
The problem is that this way, the program I am using rounds the values down to 3, so the sum of the array would be 9 instead of 10. How could I improve the call so that I get whole values like [4.0, 3.0, 3.0]?

The problem is to find which elements to round up while rounding all the rest down. One way to decide is that the element with the largest fractional part gets rounded up but no element is rounded up more than once.
import numpy as np

def distribute(weights, total):
    # calculate the distribution as you have done
    dist = np.array(weights) / sum(weights) * total
    # separate the fractional parts from the integral parts
    fracs, dist = np.modf(dist)
    # adjust if necessary
    while dist.sum() != total:
        # find the index of the maximum value of the fractional parts
        max_idx = np.argmax(fracs)
        # increment just that one value
        dist[max_idx] += 1
        # zero out that fractional part so we don't use it twice
        fracs[max_idx] = 0
    return dist
>>> distribute([1]*3, 10)
array([4., 3., 3.])
>>> distribute([1, 2, 3], 16)
array([3., 5., 8.])
>>> distribute([1, 2, 3, 1, 1], 21)
array([3., 5., 8., 3., 2.])
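If you prefer to avoid the while loop, the same largest-remainder idea can be expressed with argsort. This is just an alternative sketch of the approach above (the helper name distribute_argsort is made up for illustration):
import numpy as np

def distribute_argsort(weights, total):
    # round everything down first
    dist = np.array(weights, dtype=float) / sum(weights) * total
    floors = np.floor(dist)
    # how many whole units are still missing
    remainder = int(round(total - floors.sum()))
    # hand out one extra unit to the largest fractional parts
    order = np.argsort(dist - floors)[::-1]
    floors[order[:remainder]] += 1
    return floors

>>> distribute_argsort([1]*3, 10)
array([4., 3., 3.])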

Related

Fastest way to find nearest neighbours in NumPy array

What is the fastest way to perform operations on adjacent elements of an m x n array within distance $l$ (where m and n are large)? If this were an image, it would equate to an operation on the surrounding pixels. To make things clearer, I've created a new array with the neighbours of each element of the source.
Given some array like
x = [[1,2,3],
     [4,5,6],
     [7,8,9]]
if I were to take the [0,0] element and want the surrounding elements at $l$=1, I'd need the [0,1] and [1,0] elements (namely 2 and 4). The desired output would look something like this
y = [[[2,4],   [1,3,5],   [2,6]],
     [[1,5,7], [4,6,2,8], [3,9,5]],
     [[4,8],   [7,5,9],   [8,6]]]
I've tried playing around with kdTree from scipy.spatial, and am aware of https://stackoverflow.com/a/45742628/20451990, but as far as I can tell this is actually finding the nearest data points, whereas I want to find the nearest array elements. I guess it could be naively done by iterating through, but that is very slow...
The end goal here is to generate combinations of nearby array elements which I will be taking the product of. For the example above this could be
[[1*2, 1*4], [2*1, 2*3, 2*5], [3*2, 3*6]],...]
Key takeaways
With numba, it is possible to get roughly 690x faster execution than with naïve python code using for-loops and list appends.
With numba, functions have signatures; you tell it explicitly what the datatypes are.
Avoid memory (re-)allocations. Try to allocate memory for any arrays in advance, and reuse the data containers whenever possible (see: cell_result in the numbafied process_cell()).
Numba is not super handy with classes (at least OOP-style code), dynamically typed stuff, containers with mixed types, or containers changing in size. Prefer simple functions and typed structures with a defined size. See also: Supported Python features
Numba likes for-loops, and they're fast!
Prewords
You asked for the fastest way to calculate this. I had no baseline, so I first created a pure python for-loop solution as a baseline. Then, I used numba to make the code run fast. It is most probably not the fastest implementation, but at least it is way faster than the naïve pure python for-loop approach.
So, if you are not familiar with numba this is a good way to learn about it a bit :)
Used test data
I use two pieces of test data. First, the simple array given in the question. I call this myarr, and it is used for easy comparison of the output:
import numpy as np
myarr = np.array(
    [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
    ],
    dtype=np.float32,
)
The second dataset is for benchmarking. You mentioned that the arrays will be of size 30 x 30 and the distance I will be less than 4.
arr_large = np.arange(1, 30 * 30 + 1, 1, dtype=np.float32).reshape(30, 30)
In other words, the arr_large is a 30 x 30 2d-array:
>>> arr_large
array([[  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
         12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
         23.,  24.,  25.,  26.,  27.,  28.,  29.,  30.],
       ...
       [871., 872., 873., 874., 875., 876., 877., 878., 879., 880., 881.,
        882., 883., 884., 885., 886., 887., 888., 889., 890., 891., 892.,
        893., 894., 895., 896., 897., 898., 899., 900.]], dtype=float32)
I specified the dtype because specifying datatype is needed at the optimization step. For the pure python solution this is of course not necessary at all.
Baseline solution: Pure python with for-loops
I implemented the baseline solution with a python class and for-loops. The output from it looks like this (source for NeighbourProcessor below):
Example output with 3 x 3 input array (I=1)
n = NeighbourProcessor()
output = n.process(myarr, max_distance=1)
The output is then
>>> output
{(0, 0): [2, 4],
(0, 1): [2, 6, 10],
(0, 2): [6, 18],
(1, 0): [4, 20, 28],
(1, 1): [10, 20, 30, 40],
(1, 2): [18, 30, 54],
(2, 0): [28, 56],
(2, 1): [40, 56, 72],
(2, 2): [54, 72]}
which is same as
{(0, 0): [1 * 2, 1 * 4],
(0, 1): [2 * 1, 2 * 3, 2 * 5],
(0, 2): [3 * 2, 3 * 6],
(1, 0): [4 * 1, 4 * 5, 4 * 7],
(1, 1): [5 * 2, 5 * 4, 5 * 6, 5 * 8],
(1, 2): [6 * 3, 6 * 5, 6 * 9],
(2, 0): [7 * 4, 7 * 8],
(2, 1): [8 * 5, 8 * 7, 8 * 9],
(2, 2): [9 * 6, 9 * 8]}
This is basically what was asked in the question; the target output was
[[1*2, 1*4], [2*1, 2*3, 2*5], [3*2, 3*6]],...]
Here I used a dictionary with (row, column) as the key because that way you can more easily find the output for each cell.
Baseline performance
For the largest input of 30 x 30, and largest distance (I=4), the calculation takes about 0.188 seconds on my laptop:
>>> %timeit n.process(arr_large, max_distance=4)
188 ms ± 19.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Code for NeighbourProcessor
import math

import numpy as np


class NeighbourProcessor:
    def __init__(self):
        self.arr = None

    def process(self, arr, max_distance=1):
        self.arr = arr
        output = dict()
        rows, columns = self.arr.shape
        for current_row in range(rows):
            for current_col in range(columns):
                cell_result = self.process_cell(current_row, current_col, max_distance)
                output[(current_row, current_col)] = cell_result
        return output

    def row_col_is_within_array(self, row, col):
        if row < 0 or col < 0:
            return False
        if row > self.arr.shape[0] - 1 or col > self.arr.shape[1] - 1:
            return False
        return True

    def distance(self, row, col, current_row, current_col):
        distance_squared = (current_row - row) ** 2 + (current_col - col) ** 2
        return np.sqrt(distance_squared)

    def are_neighbours(self, row, col, current_row, current_col, max_distance):
        if row == current_row and col == current_col:
            return False
        if not self.row_col_is_within_array(row, col):
            return False
        return self.distance(row, col, current_row, current_col) <= max_distance

    def neighbours(self, current_row, current_col, max_distance):
        start_row = math.floor(current_row - max_distance)
        start_col = math.floor(current_col - max_distance)
        end_row = math.ceil(current_row + max_distance)
        end_col = math.ceil(current_col + max_distance)
        for row in range(start_row, end_row + 1):
            for col in range(start_col, end_col + 1):
                if self.are_neighbours(
                    row, col, current_row, current_col, max_distance
                ):
                    yield row, col

    def process_cell(self, current_row, current_col, max_distance):
        cell_output = []
        current_cell_value = self.arr[current_row][current_col]
        for row, col in self.neighbours(current_row, current_col, max_distance):
            neighbour_cell_value = self.arr[row][col]
            cell_output.append(current_cell_value * neighbour_cell_value)
        return cell_output
Short explanation
NeighbourProcessor.process goes through the rows and columns of the input array, starting from (0,0), the top-left corner, and processes from left to right, top to bottom, until the bottom-right corner (n_rows, n_columns), each time marking the cell as the current cell, (current_row, current_column).
Each current cell is processed in process_cell. That builds an iterator with neighbours(), which iterates over all the neighbours within the maximum distance I of the current cell. You can check how the logic goes in are_neighbours.
Faster solution: Using numba and memory pre-allocation
Now I will make a functions-only version with numba and try to make the processing as fast as possible. It is also possible to use classes in numba, but they are still a bit more experimental and complex, and this problem can be solved with functions only. The readability of the code suffers a bit, but that's the price we sometimes pay for speed optimization.
I'll start with the process function. Now it will have to create a three-dimensional array instead of a dict. The reason we want to create the array ahead of time is that memory allocation is a costly process and we want to do it exactly once. So, instead of having this as output for myarr:
# output[(row,column)]
#
output[(0,0)] # [2,4]
output[(0,1)] # [2, 6, 10]
#..etc
I want constant-sized output:
# output[row][column]
#
output[0][0] # [2, 4, nan, nan]
output[0][1] # [2, 6, 10, nan]
#..etc
Notice that after all the "pairs", the output is padded with np.nan (not a number). Any postprocessing script must then simply ignore the extra nans.
Solving for the required size for the pre-allocated array
How do I know the size of the third dimension, i.e. the number of neighbours for a given max distance I? Well, I don't. It seems this is quite a complicated problem. See, for example, this, this or the Gauss circle problem on Wikipedia. Nevertheless, I can quite easily calculate an upper bound for the number of neighbours. In the following I assume that a cell is a neighbour if and only if the distance between the middle points of the cells is less than or equal to I. If you make sketches with pen and paper, you will notice that when you increase the maximum distance, the maximum number of neighbours grows as:
I = 1 -> max_number_neighbours = 4
I = 2 -> max_number_neighbours = 12
I = 3 -> max_number_neighbours = 28
Here is an example sketch with a 10 x 10 2d-array and distance I=3: when the current cell is (4,5), the number of neighbours must be less than or equal to 28.
This pattern is represented as a function of max distance (I): (2*I-1)**2 + 4 -1, or
n_third_dimension = max_number_neighbours = (2*I-1)**2 + 3
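If you want to double-check the upper bound, a small brute-force count of the offsets within distance I of a cell far from the borders can be compared against the formula. This snippet is just a sanity check, not part of the solution:
import math

def count_neighbours(max_distance):
    # brute-force count of cells whose centre is within max_distance
    # of the current cell (the current cell itself is excluded)
    r = math.ceil(max_distance)
    count = 0
    for dr in range(-r, r + 1):
        for dc in range(-r, r + 1):
            if (dr, dc) != (0, 0) and math.hypot(dr, dc) <= max_distance:
                count += 1
    return count

def upper_bound(max_distance):
    return (2 * math.ceil(max_distance) - 1) ** 2 + 3

# count_neighbours(1) -> 4,  upper_bound(1) -> 4
# count_neighbours(2) -> 12, upper_bound(2) -> 12
# count_neighbours(3) -> 28, upper_bound(3) -> 28
# count_neighbours(4) -> 48, upper_bound(4) -> 52 (an upper bound, not exact)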
Refactoring the code to work with numba
We start with creating the function signature of the entry point. In this case, we create a function process with the function signature:
@numba.jit("f4[:,:,:](f4[:,:], f4)")
def process(arr, max_distance):
    ...
See the docs for the other available types. The f4[:,:] just means that the input is a 2d-array of float32, and f4[:,:,:](....) means that the function output is a 3d-array of float32. Next, we create the output with the formula we derived above. Here is one part of the magic: memory pre-allocation with np.empty:
n_third_dimension = (2 * math.ceil(max_distance) - 1) ** 2 + 3
output = np.empty((*arr.shape, n_third_dimension), dtype=np.float32)
cell_result = np.empty(n_third_dimension, dtype=np.float32)
Numbafied code
I will not walk through the rest of the code hand-in-hand, but you can see below that it is a slightly modified version of the pure python for-loop baseline.
import math

import numba
import numpy as np


@numba.njit("f4(i4,i4,i4,i4)")
def distance(row, col, current_row, current_col):
    distance_squared = (current_row - row) ** 2 + (current_col - col) ** 2
    return np.sqrt(distance_squared)


@numba.njit("boolean(i4,i4,i4,i4)")
def row_col_is_within_array(
    row,
    col,
    arr_rows,
    arr_cols,
):
    if row < 0 or col < 0:
        return False
    if row > arr_rows - 1 or col > arr_cols - 1:
        return False
    return True


@numba.njit("boolean(i4,i4,i4,i4,f4,i4,i4)")
def are_neighbours(
    neighbour_row,
    neighbour_col,
    current_row,
    current_col,
    max_distance,
    arr_rows,
    arr_cols,
):
    if neighbour_row == current_row and neighbour_col == current_col:
        return False
    if not row_col_is_within_array(
        neighbour_row,
        neighbour_col,
        arr_rows,
        arr_cols,
    ):
        return False
    return (
        distance(neighbour_row, neighbour_col, current_row, current_col) <= max_distance
    )


@numba.njit("f4[:](f4[:,:], f4[:], i4,i4,i4,f4)")
def process_cell(
    arr, cell_result, current_row, current_col, n_third_dimension, max_distance
):
    for i in range(n_third_dimension):
        cell_result[i] = np.nan
    current_cell_value = arr[current_row][current_col]
    # Potential cell neighbour area
    start_row = math.floor(current_row - max_distance)
    start_col = math.floor(current_col - max_distance)
    end_row = math.ceil(current_row + max_distance)
    end_col = math.ceil(current_col + max_distance)
    arr_rows, arr_cols = arr.shape
    cell_pointer = 0
    for neighbour_row in range(start_row, end_row + 1):
        for neighbour_col in range(start_col, end_col + 1):
            if are_neighbours(
                neighbour_row,
                neighbour_col,
                current_row,
                current_col,
                max_distance,
                arr_rows,
                arr_cols,
            ):
                neighbour_cell_value = arr[neighbour_row][neighbour_col]
                cell_result[cell_pointer] = current_cell_value * neighbour_cell_value
                cell_pointer += 1
    return cell_result


@numba.njit("f4[:,:,:](f4[:,:], f4)")
def process(arr, max_distance):
    n_third_dimension = (2 * math.ceil(max_distance) - 1) ** 2 + 3
    output = np.empty((*arr.shape, n_third_dimension), dtype=np.float32)
    cell_result = np.empty(n_third_dimension, dtype=np.float32)
    rows, columns = arr.shape
    for current_row in range(rows):
        for current_col in range(columns):
            cell_result = process_cell(
                arr,
                cell_result,
                current_row,
                current_col,
                n_third_dimension,
                max_distance,
            )
            output[current_row][current_col][:] = cell_result
    return output
Example output
>>> output = process(myarr, max_distance=1.0)
>>> output
array([[[ 2.,  4., nan, nan],
        [ 2.,  6., 10., nan],
        [ 6., 18., nan, nan]],

       [[ 4., 20., 28., nan],
        [10., 20., 30., 40.],
        [18., 30., 54., nan]],

       [[28., 56., nan, nan],
        [40., 56., 72., nan],
        [54., 72., nan, nan]]], dtype=float32)
>>> output[0]
array([[ 2.,  4., nan, nan],
       [ 2.,  6., 10., nan],
       [ 6., 18., nan, nan]], dtype=float32)
>>> output[0][1]
array([ 2., 6., 10., nan], dtype=float32)
# Above is the same as target: [2 * 1, 2 * 3, 2 * 5]
Speed of the numbafied code and closing words
The baseline approach's execution time was 188 ms. Now it is 271 µs. That is only 0.00144 times what the original code took (a 99.85% reduction in execution time; some would say 693x faster).
>>> %timeit process(arr_large, max_distance=4.0)
271 µs ± 2.88 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Note that you might want to calculate the distance differently, add weighting, or use some more complex logic, aggregation functions, etc. This could still be optimized a bit further, for example by creating a better estimate for the maximum number of neighbours. Have fun with numba, and I hope you learned something! :)
Bonus tip: There is also ahead-of-time compilation in numba, which you can use to make the first function call fast as well!
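For reference, here is a minimal sketch of what the ahead-of-time route could look like with numba.pycc. The module name and exported function are purely illustrative, and the pycc API has been marked for deprecation in newer numba releases, so check the docs for your version:
# save as e.g. build_aot.py and run it once to build the extension module
import numpy as np
from numba.pycc import CC

cc = CC('neighbour_aot')   # name of the compiled module (illustrative)

@cc.export('scale', 'f4[:](f4[:], f4)')
def scale(arr, factor):
    # trivial exported function: multiply a 1d float32 array by a scalar
    out = np.empty_like(arr)
    for i in range(arr.shape[0]):
        out[i] = arr[i] * factor
    return out

if __name__ == '__main__':
    cc.compile()   # afterwards: import neighbour_aot; neighbour_aot.scale(...)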

Square root of all values in numpy array, preserving sign

I'd like to take the square root of every value in a numpy array, while preserving the sign of the value (and not returning complex numbers when negative) - a signed square root.
The code below demonstrates the desired functionality w/ lists, but is not taking advantage of numpy's optimized array manipulating superpowers.
import math

def signed_sqrt(values):
    new_list = []
    for v in values:
        sign = 1
        if v < 0:
            sign = -1
        sqrt = math.sqrt(abs(v))
        new_v = sqrt * sign
        new_list.append(new_v)
    return new_list

values = [1., 81., -7., 4., -16.]
values = signed_sqrt(values)
# [1., 9., -2.6457, 2., -4.]
For some context, I'm computing the Hellinger Kernel for [thousands of] image comparisons.
Any smooth way to do this with numpy? Thanks.
You can try using the numpy.sign function to capture the sign, and just take the square root of the absolute value.
import numpy as np
x = np.array([-1, 1, 100, 16, -100, -16])
y = np.sqrt(np.abs(x)) * np.sign(x)
# [-1, 1, 10, 4, -10, -4]
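An equivalent one-liner, if you prefer it, uses np.copysign to copy the sign of x onto the square roots of the absolute values:
import numpy as np

x = np.array([-1., 1., 100., 16., -100., -16.])
y = np.copysign(np.sqrt(np.abs(x)), x)
# [-1., 1., 10., 4., -10., -4.]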

Python: get average of neighbours in matrix with na value

I have a very large matrix, so I don't want to sum by going through each row and column.
a = [[1,2,3],[3,4,5],[5,6,7]]
def neighbors(i, j, a):
    return [a[i][j-1], a[i][(j+1)%len(a[0])], a[i-1][j], a[(i+1)%len(a)][j]]

[[np.mean(neighbors(i, j, a)) for j in range(len(a[0]))] for i in range(len(a))]
This code works well for a 3x3 or similarly small matrix, but for a large matrix like 2k x 2k it is not feasible. It also does not work if any value in the matrix is missing (na). If any of the neighbour values is na, that neighbour should be skipped when computing the average.
Shot #1
This assumes you are looking to get sliding windowed average values in an input array with a window of 3 x 3 and considering only the north-west-east-south neighborhood elements.
For such a case, signal.convolve2d with an appropriate kernel could be used. At the end, you need to divide those summations by the number of ones in kernel, i.e. kernel.sum() as only those contributed to the summations. Here's the implementation -
import numpy as np
from scipy import signal
# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]
# Convert to numpy array
arr = np.asarray(a,float)
# Define kernel for convolution
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])
# Perform 2D convolution with input data and kernel
out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
Shot #2
This makes the same assumptions as in shot #1, except that we are looking to find average values in a neighborhood of only zero elements with the intention to replace them with those average values.
Approach #1: Here's one way to do it using a manual selective convolution approach -
import numpy as np
# Convert to numpy array
arr = np.asarray(a,float)
# Pad around the input array to take care of boundary conditions
arr_pad = np.lib.pad(arr, (1,1), 'wrap')
R,C = np.where(arr==0) # Row, column indices for zero elements in input array
N = arr_pad.shape[1] # Number of columns in padded array (stride between rows when flattened)
offset = np.array([-N, -1, 1, N])
idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset
arr_out = arr.copy()
arr_out[R,C] = arr_pad.ravel()[idx].sum(1)/4
Sample input, output -
In [587]: arr
Out[587]:
array([[ 4.,  0.,  3.,  3.,  3.,  1.,  3.],
       [ 2.,  4.,  0.,  0.,  4.,  2.,  1.],
       [ 0.,  1.,  1.,  0.,  1.,  4.,  3.],
       [ 0.,  3.,  0.,  2.,  3.,  0.,  1.]])

In [588]: arr_out
Out[588]:
array([[ 4.  ,  3.5 ,  3.  ,  3.  ,  3.  ,  1.  ,  3.  ],
       [ 2.  ,  4.  ,  2.  ,  1.75,  4.  ,  2.  ,  1.  ],
       [ 1.5 ,  1.  ,  1.  ,  1.  ,  1.  ,  4.  ,  3.  ],
       [ 2.  ,  3.  ,  2.25,  2.  ,  3.  ,  2.25,  1.  ]])
To take care of the boundary conditions, there are other options for padding. Look at numpy.pad for more info.
Approach #2: This is a modified version of the convolution-based approach listed earlier in Shot #1. It is the same as that earlier approach, except that at the end, we selectively replace the zero elements with the convolution output. Here's the code -
import numpy as np
from scipy import signal
# Inputs
a = [[1,2,3],[3,4,5],[5,6,7],[4,8,9]]
# Convert to numpy array
arr = np.asarray(a,float)
# Define kernel for convolution
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])
# Perform 2D convolution with input data and kernel
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
# Initialize output array as a copy of input array
arr_out = arr.copy()
# Setup a mask of zero elements in input array and
# replace those in output array with the convolution output
mask = arr==0
arr_out[mask] = conv_out[mask]
Remarks: Approach #1 would be the preferred way when you have a small number of zero elements in the input array; otherwise, go with Approach #2.
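If the missing entries are actual NaNs rather than zeros, a similar convolution trick should still work: convolve the data with NaNs replaced by 0, convolve a validity mask with the same kernel, and divide, so that each cell averages only its non-NaN neighbours. A sketch, assuming the same kernel and the array a from above (now possibly containing np.nan):
import numpy as np
from scipy import signal

arr = np.asarray(a, float)                 # may contain np.nan
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]], dtype=float)

valid = ~np.isnan(arr)                     # True where a value is present
filled = np.where(valid, arr, 0.0)         # NaNs replaced by 0 for the sums

neigh_sum = signal.convolve2d(filled, kernel, boundary='wrap', mode='same')
neigh_cnt = signal.convolve2d(valid.astype(float), kernel, boundary='wrap', mode='same')

with np.errstate(invalid='ignore', divide='ignore'):
    neigh_mean = neigh_sum / neigh_cnt     # NaN where a cell has no valid neighbours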
This is an appendix to comments under @Divakar's answer (rather than an independent answer).
Out of curiosity I tried different 'pseudo' convolutions against the scipy convolution. The fastest one was the % (modulus) wrapping one, which surprised me: numpy evidently does something clever with its indexing, though of course not having to pad saves time as well.
fn3 -> 9.5ms, fn1 -> 21ms, fn2 -> 232ms
import timeit
setup = """
import numpy as np
from scipy import signal
N = 1000
M = 750
P = 5 # i.e. small number -> bigger proportion of zeros
a = np.random.randint(0, P, M * N).reshape(M, N)
arr = np.asarray(a,float)"""
fn1 = """
arr_pad = np.lib.pad(arr, (1,1), 'wrap')
R,C = np.where(arr==0)
N = arr_pad.shape[1]
offset = np.array([-N, -1, 1, N])
idx = np.ravel_multi_index((R+1,C+1),arr_pad.shape)[:,None] + offset
arr[R,C] = arr_pad.ravel()[idx].sum(1)/4"""
fn2 = """
kernel = np.array([[0,1,0],
                   [1,0,1],
                   [0,1,0]])
conv_out = signal.convolve2d(arr, kernel, boundary='wrap', mode='same')/kernel.sum()
mask = arr == 0.0
arr[mask] = conv_out[mask]"""
fn3 = """
R,C = np.where(arr == 0.0)
arr[R, C] = (arr[(R-1)%M,C] + arr[R,(C-1)%N] + arr[R,(C+1)%N] + arr[(R+1)%M,C]) / 4.0
"""
print(timeit.timeit(fn1, setup, number = 100))
print(timeit.timeit(fn2, setup, number = 100))
print(timeit.timeit(fn3, setup, number = 100))
Using numpy and scipy.ndimage, you can apply a "footprint" that defines where you look for the neighbours of each element and apply a function to those neighbours:
import numpy as np
import scipy.ndimage as ndimage
# Getting neighbours horizontally and vertically,
# not diagonally
footprint = np.array([[0,1,0],
                      [1,0,1],
                      [0,1,0]])
a = [[1,2,3],[3,4,5],[5,6,7]]
# Need to make sure that dtype is float or the
# mean won't be calculated correctly
a_array = np.array(a, dtype=float)
# Can specify that you want neighbour selection to
# wrap around at the borders
ndimage.generic_filter(a_array, np.mean,
                       footprint=footprint, mode='wrap')
Out[36]:
array([[ 3.25, 3.5 , 3.75],
[ 3.75, 4. , 4.25],
[ 4.25, 4.5 , 4.75]])
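If the array contains NaNs that should simply be skipped when averaging, np.nanmean can be passed as the filter function instead of np.mean, since it ignores NaNs within each footprint:
ndimage.generic_filter(a_array, np.nanmean,
                       footprint=footprint, mode='wrap')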

Performing operations on variables before assignment of variable values in Python

Ok, so basically my problem is shifting my frame of mind from solving math problems "on paper" to solving them by programming. Let me explain: I want to know whether it is possible to perform operations on a variable before assigning it a value. For example, if I have something like (1-x)**n, can I first assign n a value, turn the expression into the specific form for that degree, and only then give x a value or values? If I wasn't clear enough: if n=2, can I first turn the equation into the form 1-2x+x**2 and then, in the next step, take care of the x value?
I want to write code for calculating and drawing an n-th degree Bezier curve. I am using Bernstein polynomials for this, so I realized that the equation consists of 3 parts: the first part is the polynomial coefficients, which all come from Pascal's triangle; I am calculating those and putting them in one list. The second part is the coordinates of the control points, which are also a kind of coefficient, and I put them in a separate list. Now comes the hard part: the part of the equation that has a variable. Bernstein polynomials work with barycentric coordinates (meaning u and 1-u). The n-th degree formula for this part of the equation is:
u**i * (1-u)**(n-i)
where n is the curve degree, i goes from 0 to n, and u is the variable. u is actually a normalised variable, meaning its value goes from 0 to 1, and I want to iterate it later in a certain number of steps (like 1000). But the problem is that if I try to use the mentioned equation, I keep getting an error, because Python doesn't know what to do with u. I thought about nested loops, where the first would iterate the value of u from 0 to 1 and the second would take care of the mentioned equation from 0 to n, but I'm not sure if that is the right solution, and I have no idea how to check the results. What do you think?
PS: I have not uploaded the code because I cannot even start the part I'm having a problem with, and I think (but could be wrong) that it is separate from the rest of the code; but if you think it can help solve the problem, I can upload it.
You can do this with higher-order functions, that is, functions that return functions, as in
def Bernstein(n, i):
    def f(t):
        return t**i * (1.0 - t)**(n - i)
    return f
that you could use like this
b52 = Bernstein(5,2)
val = b52(0.74)
but instead you'll rather use lists
Bernstein_ni = [Bernstein(n,i) for i in range(n+1)]
to be used in a higher order function to build the Bezier curve function
def mk_bezier(Px, Py):
    "Input: lists of control points; output: a function of t that returns (x, y)"
    n = len(Px)
    binomials = {0: [1], 1: [1, 1], 2: [1, 2, 1],
                 3: [1, 3, 3, 1], 4: [1, 4, 6, 4, 1], 5: [1, 5, 10, 10, 5, 1]}
    binomial = binomials[n-1]
    bPx = [b*x for b, x in zip(binomial, Px)]
    bPy = [b*y for b, y in zip(binomial, Py)]
    bns = [Bernstein(n-1, i) for i in range(n)]
    def f(t):
        x = 0; y = 0
        for i in range(n):
            berns = bns[i](t)
            x = x + bPx[i]*berns
            y = y + bPy[i]*berns
        return x, y
    return f
eventually, in your program, you can use the function factory like this
linear = mk_bezier([0.0, 1.0], [1.0, 0.0])
quadra = mk_bezier([0.0, 1.0, 2.0], [1.0, 3.0, 1.0])
for t in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0):
    l = linear(t); q = quadra(t)
    print("%3.1f (%6.4f,%6.4f) (%6.4f,%6.4f)" % (t, l[0], l[1], q[0], q[1]))
and this is the testing output
0.0 (0.0000,1.0000) (0.0000,1.0000)
0.1 (0.1000,0.9000) (0.2000,1.3600)
0.2 (0.2000,0.8000) (0.4000,1.6400)
0.3 (0.3000,0.7000) (0.6000,1.8400)
0.4 (0.4000,0.6000) (0.8000,1.9600)
0.5 (0.5000,0.5000) (1.0000,2.0000)
0.6 (0.6000,0.4000) (1.2000,1.9600)
0.7 (0.7000,0.3000) (1.4000,1.8400)
0.8 (0.8000,0.2000) (1.6000,1.6400)
0.9 (0.9000,0.1000) (1.8000,1.3600)
1.0 (1.0000,0.0000) (2.0000,1.0000)
Edit
I think that the right way to do it is at the module level, with a top-level sort-of-defaultdict that memoizes all the different lists required to perform the actual computations; but defaultdict doesn't pass a variable to its default_factory, and I don't feel like subclassing dict (not now) for the sake of this answer, the main reason being that I've never subclassed it before...
In response to OP comment
You say that the curve degree is the main parameter? But it is implicitly defined by the length of the list of control points...
N = user_input()
P0x = user_input()
P0y = user_input()
PNx = user_input()
PNy = user_input()
# code that computes P1, ..., PNminus1
orderN = mk_bezier([P0x, P1x, ..., PNminus1x, PNx],
                   [P0y, P1y, ..., PNminus1y, PNy])
x077, y077 = orderN(0.77)
But the customer is always right, so I'll never try again to convince you that my solution works for you if you state that it does things differently from your expectations.
There are Python packages for doing symbolic math, but it might be easier to use some of the polynomial functions available in Numpy. These functions use the convention that a polynomial is represented as an array of coefficients, starting with the lowest order coefficient. So a polynomial a*x^2 + b*x + c would be represented as array([c, b, a]).
Some examples:
In [49]: import numpy.polynomial.polynomial as poly
In [50]: p = [-1, 1] # -1 + x (its square is the same as that of 1 - x)
In [51]: p = poly.polypow(p, 2)
In [52]: p # should be 1 - 2x + x^2
Out[52]: array([ 1., -2., 1.])
In [53]: x = np.arange(10)
In [54]: poly.polyval(x, p) # evaluate polynomial at points x
Out[54]: array([ 1., 0., 1., 4., 9., 16., 25., 36., 49., 64.])
And you could calculate your Bernstein polynomial in a way similar to this (there is still a binomial coefficient missing):
In [55]: def Bernstein(n, i):
...: part1 = poly.polypow([0, 1], i) # (0 + u)**i
...: part2 = poly.polypow([1, -1], n - i) # (1 - u)**(n - i)
...: return poly.polymul(part1, part2)
In [56]: p = Bernstein(3, 2)
In [57]: p
Out[57]: array([ 0., 0., 1., -1.])
In [58]: poly.polyval(x, p) # evaluate polynomial at points x
Out[58]: array([ 0., 0., -4., -18., ..., -448., -648.])
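For completeness, one way to fold in the missing binomial coefficient is sketched below, using math.comb (Python 3.8+); the helper name bernstein_basis is made up for illustration:
import numpy.polynomial.polynomial as poly
from math import comb

def bernstein_basis(n, i):
    # coefficient array (lowest order first) of C(n, i) * u**i * (1 - u)**(n - i)
    part1 = poly.polypow([0, 1], i)        # u**i
    part2 = poly.polypow([1, -1], n - i)   # (1 - u)**(n - i)
    return comb(n, i) * poly.polymul(part1, part2)

# e.g. bernstein_basis(3, 2) -> array([ 0.,  0.,  3., -3.]), i.e. 3*u**2 - 3*u**3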

Python plot frequency of fft.rfft

this is my first question here on Stack Overflow and I hope I won't make any huge mistakes.
I am analyzing a set of time series with a sampling rate of 1 Hz. I need to plot their Fourier transforms in order to study their spectra.
Here is my piece of code:
from obspy.core import read
import numpy as np
import matplotlib.pyplot as plt

st = read('../SC_noise/*HEC_109C*_s', format='SAC')
stp = st.copy()
stp.detrend('linear')
stp.taper('cosine')

for tr in stp:
    dataonly = tr.data
    spec = np.fft.rfft(dataonly)
    plt.plot(abs(spec))
    plt.show()
This works just fine: the plot is the same as the one I get using SAC. But the x-axis does not show frequencies. I've looked around a little bit and found different ideas, but none of them is working.
For example, in the case of an fft (here I am using an rfft), this should do the job
samp_rate=1
freq = np.fft.fftfreq(len(spec), d=1./samp_rate)
But if I use it, it gives me negative frequencies.
Does anybody have an idea?
Thank you very much in advance for all the help!
Piero
If your NumPy version is new enough (1.8 or better), use numpy.fft.rfftfreq. Otherwise, here is the definition:
import numpy as np

def rfftfreq(n, d=1.0):
    """
    Return the Discrete Fourier Transform sample frequencies
    (for usage with rfft, irfft).

    The returned float array `f` contains the frequency bin centers in cycles
    per unit of the sample spacing (with zero at the start). For instance, if
    the sample spacing is in seconds, then the frequency unit is cycles/second.

    Given a window length `n` and a sample spacing `d`::

      f = [0, 1, ..., n/2-1, n/2] / (d*n)         if n is even
      f = [0, 1, ..., (n-1)/2-1, (n-1)/2] / (d*n) if n is odd

    Unlike `fftfreq` (but like `scipy.fftpack.rfftfreq`)
    the Nyquist frequency component is considered to be positive.

    Parameters
    ----------
    n : int
        Window length.
    d : scalar, optional
        Sample spacing (inverse of the sampling rate). Defaults to 1.

    Returns
    -------
    f : ndarray
        Array of length ``n//2 + 1`` containing the sample frequencies.

    Examples
    --------
    >>> signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5, -3, 4], dtype=float)
    >>> fourier = np.fft.rfft(signal)
    >>> n = signal.size
    >>> sample_rate = 100
    >>> freq = np.fft.fftfreq(n, d=1./sample_rate)
    >>> freq
    array([  0.,  10.,  20.,  30.,  40., -50., -40., -30., -20., -10.])
    >>> freq = np.fft.rfftfreq(n, d=1./sample_rate)
    >>> freq
    array([  0.,  10.,  20.,  30.,  40.,  50.])
    """
    if not isinstance(n, (int, np.integer)):
        raise ValueError("n should be an integer")
    val = 1.0 / (n * d)
    N = n // 2 + 1
    results = np.arange(0, N, dtype=int)
    return results * val
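With that in place, the x-axis in the original loop could be built with np.fft.rfftfreq (or the rfftfreq defined above). Note that the frequency array is computed from the length of the signal, len(dataonly), not from len(spec). A sketch, assuming the stp stream from the question:
samp_rate = 1.0
for tr in stp:
    dataonly = tr.data
    spec = np.fft.rfft(dataonly)
    # rfftfreq takes the signal length; len(freq) then equals len(spec)
    freq = np.fft.rfftfreq(len(dataonly), d=1.0/samp_rate)
    plt.plot(freq, abs(spec))
    plt.show()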
