Manual FFT not giving me the same results as np.fft.fft - python

import numpy as np
import matplotlib.pyplot as pp

curve = np.genfromtxt('C:\Users\latel\Desktop\kool\Neuro\prax2\data\curve.csv',dtype = 'float', delimiter = ',')
curve_abs2 = np.empty_like(curve)
z = 1j
N = len(curve)

for i in range(0,N-1):
    curve_abs2[i] = 0
    for k in range(0,N-1):
        curve_abs2[i] += (curve[i]*np.exp((-1)*z*(np.pi)*i*((k-1)/N)))

for i in range(0,N):
    curve_abs2[i] = abs(curve_abs2[i])/(2*len(curve_abs2))

#curve_abs = (np.abs(np.fft.fft(curve)))
#pp.plot(curve_abs)
pp.plot(curve_abs2)
pp.show()
The code behind the # gives me 3 values, but what my loop produces is just ... different.
Wrong (the code above): http://www.upload.ee/image/3922681/Ex5problem.png
Correct, using numpy.fft.fft(): http://www.upload.ee/image/3922682/Ex5numpyformulas.png

There are several problems:
- You are assigning complex values to the elements of curve_abs2, so it should be declared to be complex, e.g. curve_abs2 = np.empty_like(curve, dtype=np.complex128). (And I would recommend using the name, say, curve_fft instead of curve_abs2.)
- In Python, range(low, high) gives the sequence [low, low + 1, ..., high - 2, high - 1], so instead of range(0, N - 1), you must use range(0, N) (which can be simplified to range(N), if you want).
- You are missing a factor of 2 in your formula. You could fix this by using z = 2j.
- In the expression that is being summed in the inner loop, you are indexing curve as curve[i], but this should be curve[k].
- Also in that expression, you don't need to subtract 1 from k, because the k loop ranges from 0 to N - 1.
- Because k and N are integers and you are using Python 2.7, the division in the expression (k-1)/N will be integer division, and you'll get 0 for all k. To fix this and the previous problem, you can change that term to k / float(N).
If you fix those issues, when the first double loop finishes, the array curve_abs2 (now a complex array) should match the result of np.fft.fft(curve). It won't be exactly the same, but the differences should be very small.
You could eliminate that double loop altogether using numpy vectorized calculations, but that is a topic for another question.
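Putting those fixes together, a minimal sketch of the corrected double loop might look like this (assuming curve and N = len(curve) from the question; this is the naive DFT, not an optimized FFT):

import numpy as np

curve_fft = np.empty(N, dtype=np.complex128)
for i in range(N):
    curve_fft[i] = 0
    for k in range(N):
        # factor 2j, index curve by k, and force float division for Python 2.7
        curve_fft[i] += curve[k] * np.exp(-2j * np.pi * i * k / float(N))

# curve_fft should now agree with np.fft.fft(curve) up to tiny rounding differences.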

Numerical approximation of forward difference in an interval

How can python be used for numerical finite difference calculation without using numpy?
For example I want to find multiple function values numerically in a certain interval with a step size 0.05 for a first order and second order derivatives.
Why don't you want to use Numpy? It's a good library and very fast for doing numerical computations because it's written in C (which is generally faster for numerical stuff than pure Python).
If you're curious how these methods work and how they look in code, here's some sample code:
def linspace(a, b, step):
    if a > b:
        # see if going backwards?
        if step < 0:
            return linspace(b, a, -1*step)[::-1]
        # step isn't negative so no points
        return []
    pt = a
    res = [pt]
    while pt <= b:
        pt += step
        res.append(pt)
    return res
def forward(data, step):
    if not data:
        return []
    res = []
    i = 0
    while i+1 < len(data):
        delta = (data[i+1] - data[i])/step
        res.append(delta)
        i += 1
    return res
# example usage
size = 0.1
ts = linspace(0, 1, size)
y = [t*t for t in ts]
dydt = forward(y, size)
d2ydt2 = forward(dydt, size)
Note: this will still use normal floating point numbers and so there are still odd rounding errors that happen because some numbers don't have an exact binary representation.
Another library to check out is mpmath which has a lot of cool math functions like integration and special functions AND it allows you to specify how much precision you want. Of course using 100 digits of precision is going to be a lot slower than normal floats, but it is still a very cool library!
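For comparison, here is a rough NumPy sketch of the same forward differences (the 0.05 step from the question is assumed); np.diff takes successive differences of an array:

import numpy as np

ts = np.linspace(0.0, 1.0, 21)        # grid from 0 to 1 with step 0.05
step = ts[1] - ts[0]
y = ts**2                             # example function values
dydt = np.diff(y) / step              # first-order forward difference
d2ydt2 = np.diff(dydt) / step         # second derivative via two forward differences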

Recreating R Quantile Type 2 in Numpy

I'm migrating some legacy code from R to Python and I'm having trouble matching the quantile results with numpy percentile.
Given the following list of numbers:
a1 = [
5.75,6.13333333333333,7.13636363636364,9,10.1,4.80952380952381,8.82926829268293,4.7906976744186,3.83333333333333,6,6.1,
8.88235294117647,30,5.7,3.98507462686567,6.83333333333333,8.39805825242718,4.78260869565217,7.26356589147287,5.67857142857143,
3.58333333333333,6.69230769230769,14.3333333333333,14.3333333333333,5.125,5.16216216216216,5.36363636363636,10.7142857142857,
4.90909090909091,7.5,8,6,6.93939393939394,10.4,6,6.8,5.33333333333333,10.3076923076923,4.5625,5.4,6.44,3.36363636363636,
11.1666666666667,4.5,7.35714285714286,10.6363636363636,9.26746031746032,3.83333333333333,5.75,9.14285714285714,8.27272727272727,
5,5.92307692307692,5.23076923076923,4.09375,6.25,4.63888888888889,6.07142857142857,5,5.42222222222222,3.93892045454545,4.8,
8.71428571428571,6.25925925925926,4.12,5.30769230769231,4.26086956521739,5.22222222222222,4.64285714285714,5,3.64705882352941,
5.33333333333333,3.65217391304348,3.54166666666667,10.0952380952381,3.38235294117647,8.67123287671233,2.66666666666667,3.5,4.875,
4.5,6.2,5.45454545454545,4.89189189189189,4.71428571428571,1,5.33333333333333,6.09090909090909,4.36756756756757,6,5.17197452229299,
4.48717948717949,5.01219512195122,4.83098591549296,5.25,8.52,5.47692307692308,5.45454545454545,8.6578947368421,8.35714285714286,3.25,
8.5,4,5.95652173913043,7.05882352941176,7.5,8.6,8.49122807017544,5.14285714285714,4,13.3294117647059,9.55172413793103,5.57446808510638,
4.5,8,4.11764705882353,3.9,5.14285714285714,6,4.66666666666667,6,3.75,4.93333333333333,4.5,5.21666666666667,6.53125,6,7,7.28333333333333,
7.34615384615385,7.15277777777778,8.07936507936508,11.609756097561
]
Using quantile in R such that
quantile(a1, probs=.05, type=2)
Gives a result of 3.541667
Trying all of the interpolation methods in numpy to find the same result:
{x:np.percentile(a1,q=5, interpolation=x) for x in ['linear','lower','higher','nearest','midpoint']}
Yields
{'linear': 3.566666666666666,
'lower': 3.54166666666667,
'higher': 3.58333333333333,
'nearest': 3.58333333333333,
'midpoint': 3.5625}
As we can see, the lower interpolation method returns the same result as R's quantile type 2.
However, with a different quantile in R we again get different results:
quantile(a1, probs=.95, type=2)
Gives a result of 10.71429
And with numpy:
{x:np.percentile(a1,q=95, interpolation=x) for x in ['linear','lower','higher','nearest','midpoint']}
Yields
{'linear': 10.667532467532439,
'lower': 10.6363636363636,
'higher': 10.7142857142857,
'nearest': 10.6363636363636,
'midpoint': 10.67532467532465}
In this case the higher interpolation method returns the same result.
I'm hoping that someone familiar enough with the R quantile types can help me reproduce the same quantile logic in numpy.
You can implement this yourself. With type=2 it's a rather simple calculation: you either take the next highest order statistic or, at a discontinuity (e.g. with 100 values and p=0.06, n*p falls exactly on the 6th value), you take the average of that order statistic and the next greatest one.
import numpy as np

def R_type2(arr, p):
    """
    arr : array-like
    p : float between [0, 1]
    """
    # m=0 for Q_2(p) in R
    x = np.sort(arr)
    n = len(x)
    aleph = n*p
    k = np.floor(np.array(aleph).clip(1, n-1)).astype(int)
    gamma = {False: 1, True: 0.5}.get(aleph==k)  # Discontinuity or not
    # Deal with case where it should be smallest value
    if aleph < 1:
        return x[k-1]  # x[0]
    else:
        return (1.-gamma)*x[k-1] + gamma*x[k]
R_type2(a1, 0.05)
#3.54166666666667
R_type2(a1, 0.95)
#10.7142857142857
A word of caution. k will be an integer while n*p is a float. In general it's a very bad idea to do aleph==k because this leads to problems with floating point inaccuracies. For instance with 100 numbers p=0.07 is NOT considered a discontinuity because 0.07 cannot be represented precisely. However, because R seems to implement a pure equality check I left it like the above for consistency.
I personally would favor changing from the equality check {False: 1, True: 0.5}.get(aleph==k)
to {False: 1, True: 0.5}.get(np.isclose(aleph, k)), so that floating point issues don't become a problem.
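As a side note, and only if you can require a recent NumPy (1.22 or later), np.quantile's method argument exposes the R quantile types directly; type 2 corresponds to 'averaged_inverted_cdf', so something like the following should reproduce the R values above:

import numpy as np

np.quantile(a1, 0.05, method='averaged_inverted_cdf')  # should give 3.54166666666667
np.quantile(a1, 0.95, method='averaged_inverted_cdf')  # should give 10.7142857142857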

Avoid Mean of Floating Point Error

When I calculate the mean of a list of floats the following way
def mean(x):
    return sum(x) / len(x)
then I usually do not care about tiny errors in floating point operations. Though, I am currently facing an issue where I want to get all elements in a list that are equal or above the list's average.
Again, this is usually no issue, but when I face cases where all elements in the list are equal floating point numbers, then the mean value calculated by the function above actually returns a value above all the elements. That, in my case, obviously is an issue.
I need a workaround that does not rely on Python 3.x libraries (e.g. statistics).
Edit:
It has been suggested in the comments to use rounding. This interestingly resulted in errors being rarer, but they still occur, as e.g. in this case:
[0.024484987, 0.024484987, 0.024484987, 0.024484987, ...] # x
0.024485 # mean
[] # numbers above mean
I believe you should be using math.fsum() instead of sum. For example:
>>> a = [0.024484987, 0.024484987, 0.024484987, 0.024484987] * 1360001
>>> math.fsum(a) / len(a)
0.024484987
This is, I believe, the answer you are looking for. It produces more consistent results, irrespective of the length of a, than the equivalent using sum().
>>> sum(a) / len(a)
0.024484987003073517
One neat solution is to use compensated summation, combined with double-double tricks to perform the division accurately:
def mean_kbn(X):
    # 1. Kahan-Babuska-Neumaier summation
    s = c = 0.0
    n = 0
    for x in X:
        t = s + x
        if abs(s) >= abs(x):
            c -= ((s-t) + x)
        else:
            c -= ((x-t) + s)
        s = t
        n += 1
    # sum is now s - c
    # 2. double-double division from Dekker (1971)
    # https://link.springer.com/article/10.1007%2FBF01397083
    u = s / n  # first guess of division
    # Python doesn't have an fma function, so do mul2 via Veltkamp splitting
    v = 1.34217729e8  # 0x1p27 + 1
    uv = u*v
    u_hi = (u - uv) + uv
    u_lo = u - u_hi
    nv = n*v
    n_hi = (n - nv) + nv
    n_lo = n - n_hi
    # r = s - u*n exactly
    r = (((s - u_hi*n_hi) - u_hi*n_lo) - u_lo*n_hi) - u_lo*n_lo
    # add correction
    return u + (r-c)/n
Here's a sample case I found, comparing with the sum, math.fsum and numpy.mean:
>>> mean_kbn([0.2,0.2,0.2])
0.2
>>> sum([0.2,0.2,0.2])/3
0.20000000000000004
>>> import math
>>> math.fsum([0.2,0.2,0.2])/3
0.20000000000000004
>>> import numpy
>>> numpy.mean([0.2,0.2,0.2])
0.20000000000000004
How about not using the mean but just multiplying each element by the length of the list and comparing it directly to the sum of the original list?
I think this should do what you want without relying on division.
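A minimal sketch of that idea (the helper name is just for illustration; note that sum(xs) itself can still accumulate rounding error, so math.fsum from the answer above is a safer total):

def at_or_above_mean(xs):
    # multiply each element by len(xs) and compare against the total,
    # so no division (and no division rounding) is involved
    total = sum(xs)
    n = len(xs)
    return [v for v in xs if v * n >= total]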

Capturing all data in non-whole train, test, and validate splits

just wondering if a better solution exists for this sort of problem.
We know that for an X/Y percentage split of an even number we can get an exact split of the data - for example for data size 10:
10 * .6 = 6
10 * .4 = 4
10
Splitting data this way is easy, and we can guarantee we have all of the data and nothing is lost. However where I am struggling is on less friendly numbers - take 11
11 * .6 = 6.6
11 * .4 = 4.4
11
However we can't index into an array at i = 6.6 for example. So we have to decide how to do this. If we take JUST the integer portion we lose 1 data point -
First set = 0..6
Second set = 6..10
This would be the same case if we floored the numbers.
However, if we take the ceiling of the numbers:
First set = 0..7
Second set = 7..12
And we've read past the end of our array.
This gets even worse when we throw in a 3rd or 4th split (30,30,20,20 for example).
Is there a standard splitting procedure for these kinds of problems? Is data loss accepted? It seems like data loss would be unacceptable for dependent data, such as time series.
Thanks!
EDIT: The values .6 and .4 are chosen by me. They could be any two numbers that sum to 1.
First of all, notice that your problem is not limited to odd-sized arrays as you claim, but any-sized arrays. How would you make the 56%-44% split of a 10 element array? Or a 60%-40% split of a 4 element array?
There is no standard procedure. In many cases, programmers do not care that much about an exact split and they either do it by flooring or rounding one quantity (the size of the first set), while taking the complement (array length - rounded size) for the other (the size of the second).
This might be OK in most cases when this is a one-off calculation and accuracy is not required. You have to ask yourself what your requirements are. For example: are you taking thousands of 10-sized arrays, splitting each one 56%-44%, doing some calculations and returning a result? You have to ask yourself what accuracy you want. Do you care if your result effectively ends up being
the 60%-40% split or the 50%-50% split?
As another example imagine that you are doing a 4-way equal split of 25%-25%-25%-25%. If you have 10 elements and you apply the rounding technique you end up with 3,3,3,1 elements. Surely this will mess up your results.
If you do care about all these inaccuracies, then the first step is to consider whether you can adjust either the array size and/or the split ratio(s).
If these are set in stone then the only way to have an accurate split of any ratios of any sized array is to make it probabilistic. You have to split multiple arrays for this to work (meaning you have to apply the same split ratio to same-sized arrays multiple times). The more arrays the better (or you can use the same array multiple times).
So imagine that you have to make a 56%-44% split of a 10 sized array. This means that you need to split it in 5.6 elements and 4.4 elements on the average.
There are many ways you can achieve a 5.6 element average. The easiest one (and the one with the smallest variance in the sequence of tries) is to have 60% of the time a set with 6 elements and 40% of the time a set that has 5 elements.
0.6*6 + 0.4*5 = 5.6
In terms of code this is what you can do to decide on the size of the set each time:
import random

array_size = 10
first_split = 0.56
avg_split_size = array_size * first_split
floored_split_size = int(avg_split_size)

if avg_split_size > floored_split_size:
    if random.uniform(0,1) > avg_split_size - floored_split_size:
        this_split_size = floored_split_size
    else:
        this_split_size = floored_split_size + 1
else:
    this_split_size = avg_split_size
You could make the code more compact; I just made an outline here so you get the idea. I hope this helps.
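As a quick sanity check (purely illustrative, reusing the variables from the snippet above), repeating the draw many times should average out close to the target 5.6:

sizes = []
for _ in range(100000):
    if random.uniform(0, 1) > avg_split_size - floored_split_size:
        sizes.append(floored_split_size)
    else:
        sizes.append(floored_split_size + 1)
print(sum(sizes) / float(len(sizes)))  # close to avg_split_size == 5.6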
Instead of using ceil() or floor(), use round(). For example:
>>> round(6.6)
7.0
The value returned will be of float type in Python 2 (Python 3's round() already returns an int). For getting the integer value, type-cast it to int as:
>>> int(round(6.6))
7
This will be the value of your first split. For the second split, calculate it as len(data) - split1_val. This applies to the 2-split problem.
In the case of a 3-way split, take the round() value for two of the splits and compute the third as len(my_list) - val_split_1 - val_split_2.
In a generic way, for an N-way split:
Take the round() value of the first N-1 splits. And for the last value, use len(data) minus the sum of those N-1 rounded values.
where len() gives the length of the list.
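A small sketch of that recipe (the helper name split_sizes is just for illustration):

def split_sizes(n, proportions):
    # round() the sizes of the first N-1 splits; the last split absorbs the remainder
    sizes = [int(round(n * p)) for p in proportions[:-1]]
    sizes.append(n - sum(sizes))
    return sizes

split_sizes(11, [0.6, 0.4])              # [7, 4]
split_sizes(10, [0.3, 0.3, 0.2, 0.2])    # [3, 3, 2, 2]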
Let's first consider just splitting the set into two pieces.
Let n be the number of elements we are splitting, and p and q be the proportions, so that
p+q == 1
I assert that the parts after the decimal point will always sum to either 1 or 0, so we should use floor on one and ceil on the other, and we will always be right.
Here is a function that does that, along with a test. I left the print statements in but they are commented out.
import math

def simpleSplitN(n, p, q):
    "split n into proportions p and q and return indices"
    np = math.ceil(n*p)
    nq = math.floor(n*q)
    #print n, sum([np, nq])  # np and nq are the proportions
    return [0, np]  # these are the indices we would use

# test for simpleSplitN
for i in range(1, 10):
    p = i/10.0
    q = 1-p
    simpleSplitN(37, p, q)
For the mathematically inclined, here is the proof that the decimal proportions will sum to 1
-----------------------
We can express p*n as n/(1/p), and so by the division algorithm we get integers k and r
n == k*(1/p) + r with 0 <= r < (1/p)
Thus r/(1/p) == p*r < 1
We can do exactly the same for q, getting
q*r < 1 (this is a different r)
It is important to note that p*r and q*r are the parts after the decimal point of p*n and q*n respectively.
Now we can add them together (we've added subscripts now)
0 <= p*(r_1) < 1
0 <= q*(r_2) < 1
=> 0 <= p*r_1 + q*r_2 == (p*n - k_1) + (q*n - k_2) == n - (k_1 + k_2) < 2
But by closure of the integers, n - (k_1 + k_2) is an integer and so
0 <= n - (k_1 + k_2) < 2
means that p*r_1 + q*r_2 must be either 0 or 1. It will only be 0 in the case that our n is divided evenly.
Otherwise we can now see that our fractional parts will always sum to 1.
-----------------------
We can do a very similar (but slightly more complicated) proof for splitting n into an arbitrary number (say N) parts, but instead of them summing to 1, they will sum to an integer less than N.
Here is the general function; it has uncommented print statements for verification purposes.
import math
import random

def splitN(n, c):
    """Compute indices that can be used to split
    a dataset of n items into a list of proportions c
    by first dividing them naively and then distributing
    the decimal parts of said division randomly
    """
    nc = [n*i for i in c]
    nr = [n*i - int(n*i) for i in c]  # the decimal parts
    N = int(round(sum(nr)))  # sum of all decimal parts
    print N, nc
    for i in range(0, len(nc)):
        nc[i] = math.floor(nc[i])
    for i in range(N):  # randomly distribute leftovers
        nc[random.randint(1, len(nc)) - 1] += 1
    print n, sum(nc)  # nc now contains the proportions
    out = [0]  # compute a cumulative sum
    for i in range(0, len(nc) - 1):
        out.append(out[-1] + nc[i])
    print out
    return out

# test for splitN with various proportions
c = [.1,.2,.3,.4]
c = [.2,.2,.2,.2,.2]
c = [.3, .2, .2, .3]
for n in range(10, 40):
    print splitN(n, c)
If we have leftovers, we will never get an even split, so we distribute them randomly, like @Thanassis said. If you don't like the dependency on random, then you could just add them all at the beginning or at even intervals.
Both of my functions output indices but they compute proportions and thus could be slightly modified to output those instead per user preference.

How to do a Sigma in python 3

I'm trying to make a calculator for something, but the formulas use a sigma. I have no idea how to do a sigma in Python; is there an operator for it?
I'll put a link here to a page that has the formulas on it, for illustration: http://fromthedepths.gamepedia.com/User:Evil4Zerggin/Advanced_cannon
A sigma (∑) is a Summation operator. It evaluates a certain expression many times, with slightly different variables, and returns the sum of all those expressions.
For example, in the Ballistic coefficient formula
The Python implementation would look something like this:
# Just guessing some values. You have to search the actual values in the wiki.
ballistic_coefficients = [0.3, 0.5, 0.1, 0.9, 0.1]
total_numerator = 0
total_denominator = 0
for i, coefficient in enumerate(ballistic_coefficients):
    total_numerator += 2**(-i) * coefficient
    total_denominator += 2**(-i)
print('Total:', total_numerator / total_denominator)
You may want to look at the enumerate function, and beware precision problems.
The easiest way to do this is to create a sigma function that returns the summation. You don't need to use a library; you just need to understand the logic.
def sigma(first, last, const):
    # first : the first value of n (the index of summation)
    # last  : the last value of n
    # const : the constant that each n is multiplied by in the sum
    total = 0
    for i in range(first, last + 1):
        total += const * i
    return total
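For example, a quick usage check of this sigma function:

sigma(1, 100, 2)  # 2*1 + 2*2 + ... + 2*100 = 10100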
An efficient way to do this in Python is to use reduce().
To solve
3
Σ i
i=1
You can use the following:
from functools import reduce
result = reduce(lambda a, x: a + x, [0]+list(range(1,3+1)))
print(result)
reduce() will take arguments of a callable and an iterable, and return one value as specified by the callable. The accumulator is a and is set to the first value (0), and then the current sum following that. The current value in the iterable is set to x and added to the accumulator. The final accumulator is returned.
The formula to the right of the sigma is represented by the lambda. The sequence we are summing is represented by the iterable. You can change these however you need.
For example, if I wanted to solve:
Σ π*i^2
i
For the sequence i in [2, 3, 5], I could do the following:
reduce(lambda a, x: a + 3.14*x*x, [0]+[2,3,5])
You can see the following two code lines produce the same result:
>>> reduce(lambda a, x: a + 3.14*x*x, [0]+[2,3,5])
119.32
>>> (3.14*2*2) + (3.14*3*3) + (3.14*5*5)
119.32
I've looked at all the answers that different programmers and coders have tried to give to your query, but I was unable to understand any of them, maybe because I am a high school student. In my opinion, using a list will definitely reduce some of the pain of coding, so here is what I think is the simplest way to form a sigma function.
# creating a sigma function
a = int(input("enter a number for sigma "))
mylst = []
for i in range(1, a+1):
    mylst.append(i)
b = sum(mylst)
print(mylst)
print(b)
Capital sigma (Σ) applies the expression after it to all members of a range and then sums the results.
In Python, sum will take the sum of a range, and you can write the expression as a comprehension:
For example
Speed Coefficient
A factor in muzzle velocity is the speed coefficient, which is a weighted average of the speed modifiers s_i of the (non-casing) parts, where each component i starting at the head has 3/4 the weight of the previous:
The head will thus always determine at least 25% of the speed coefficient.
For example, suppose the shell has a Composite Head (speed modifier
1.6), a Solid Warhead Body (speed modifier 1.3), and a Supercavitation
Base (speed modifier 0.9). Then we have
s0=1.6
s1=1.3
s2=0.9
From the example we can see that i starts from 0, not the usual 1, and so we can do:
def speed_coefficient(parts):
    return (
        sum(0.75 ** i * si for i, si in enumerate(parts))
        /
        sum(0.75 ** i for i, si in enumerate(parts))
    )
>>> speed_coefficient([1.6, 1.3, 0.9])
1.3324324324324326
import numpy as np

def sigma(s, e):
    # sum of the integers from s to e inclusive (arange excludes the endpoint, hence e + 1)
    x = np.arange(s, e + 1)
    return np.sum(x)

sigma(1, 3)  # 6
