I want to do something quite simple but I'm unable to find it in the depths of numpy. I want to numerically and continuously integrate a function given by its values (not by its formula!). That means I simply want an array holding the running sums of the input array, from its start up to each position. Example:
Input:
[ 4, 3, 5, 8 ]
Output:
[ 4, 7, 12, 20 ] # [ sum(i[0:1]), sum(i[0:2]), sum(i[0:3]), sum(i[0:4]) ]
Sounds pretty straightforward, so I'm hopeful this must be easy with some numpy functionality I'm currently unable to find.
I found stuff like scipy.integrate.quad(), but that seems to integrate over a given range (from a to b) and then returns a single value. I need an array as output.
You're looking for numpy.cumsum:
>>> numpy.cumsum([ 4, 3, 5, 8 ])
array([ 4, 7, 12, 20])
You would simply need numpy.cumsum().
import numpy as np
a = np.array([ 4, 3, 5, 8 ])
print(np.cumsum(a))  # prints [ 4  7 12 20]
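If the samples are spaced some step dx apart rather than 1 apart, the running sum has to be scaled by that spacing to approximate the integral. The following is a minimal sketch under that assumption; the sample function f(x) = 2x, the value of dx and all variable names are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical sample data: f(x) = 2x on a uniform grid with spacing dx.
dx = 0.5
x = np.arange(0.0, 3.0, dx)   # 0.0, 0.5, ..., 2.5
y = 2.0 * x

# Left Riemann sum: running total of the samples, scaled by the spacing.
riemann = np.cumsum(y) * dx

# Trapezoidal rule: average neighbouring samples, then accumulate.
# The leading 0.0 keeps the result the same length as x.
trapezoid = np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) / 2.0) * dx))

print(riemann)
print(trapezoid)   # matches x**2 here, since the trapezoidal rule is exact for linear f
```

For a plain array of values with unit spacing, as in the question, np.cumsum alone is all that is needed.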
You can use quadpy (pip install quadpy), a project of mine, which, as opposed to scipy.integrate.quad(), does vectorized computation. Provide it with many intervals and get the integral values over all of these intervals back.
import numpy
import quadpy
a = 0.0
b = 3.0
h = 1.0e-2
n = int((b-a) / h)
x0 = numpy.linspace(a, b, num=n, endpoint=False)
x1 = x0 + h
intervals = numpy.stack([x0, x1])
vals = quadpy.line_segment.integrate(
lambda x: numpy.sin(x),
intervals,
quadpy.line_segment.GaussLegendre(5)
)
res = numpy.cumsum(vals)
import matplotlib.pyplot as plt
plt.plot(x1, numpy.sin(x1), label='f')
plt.plot(x1, res, label='F')
plt.legend()
plt.show()
You don't need numpy to get the output. Using standard itertools we get the following:
from itertools import accumulate
a = [4, 3, 5, 8]
*b, = accumulate(a)
print(b)
# [4, 7, 12, 20]
I have x,y,v arrays of data points and I am binning v on x-y plane. I am trying to get the x,y,v values back after binning but I want them as arrays corresponding to each bin. My code can get them individually but that will not work for large data sets with many bins. Maybe I need to use loops of some kind but my understanding of loops is weak. Code:
from scipy import stats
import numpy as np
x=np.array([-10,-2,4,12,3,6,8,14,3])
y=np.array([5,5,-6,8,-20,10,2,2,8])
v=np.array([4,-6,-10,40,22,-14,20,8,-10])
ret = stats.binned_statistic_2d(x,
                                y,
                                v,
                                'count',
                                bins=2,
                                expand_binnumbers=True)
print('counts=',ret.statistic)
print('binnumber=', ret.binnumber)
binnumber = ret.binnumber
statistic = ret.statistic
# get the bin numbers according to some condition
idx_bin_x, idx_bin_y = np.where(statistic==statistic[1][1])#[0]
print('idx_binx=',idx_bin_x)
print('idx_bin_y=',idx_bin_y)
# A binnumber of i means the corresponding value is
# between (bin_edges[i-1], bin_edges[i]).
# -> increment the bin indices by one
idx_bin_x += 1
idx_bin_y += 1
print('idx_binx+1=',idx_bin_x)
print('idx_bin_y+1=',idx_bin_y)
# get the boolean mask and apply it
is_event_x = np.in1d(binnumber[0], idx_bin_x)
print('eventx=',is_event_x)
is_event_y = np.in1d(binnumber[1], idx_bin_y)
print('eventy=',is_event_y)
is_event_xy = np.logical_and(is_event_x, is_event_y)
print('event_xy=', is_event_xy)
events_x = x[is_event_xy]
events_y = y[is_event_xy]
event_v=v[is_event_xy]
print('x=', events_x)
print('y=', events_y)
print('v=',event_v)
This outputs x, y, v for the bin with count=5, but I want all 4 bins, each returning its own arrays for x, y and v, e.g. for bin 1: x_bin1=[...], y_bin1=[...], v_bin1=[...], and so on for all 4 bins.
Also, feel free to suggest if you think there are easier ways to bin 2d planes (x,y) with values (v) like mine and getting binned values. Thank you!
Boolean masks on the NumPy arrays give a compact way to recover the arrays you are after:
import numpy as np
from scipy import stats
# coordinates
x = np.array([-10,-2,4,12,3,6,8,14,3])
y = np.array([5,5,-6,8,-20,10,2,2,8])
v = np.array([4,-6,-10,40,22,-14,20,8,-10])
ret = stats.binned_statistic_2d(x, y, None, 'count', bins=2, expand_binnumbers=True)
b = ret.binnumber
# bin numbers start at 1 along each axis
for i in [1,2]:
    for j in [1,2]:
        m = (b[0] == i) & (b[1] == j) # mask
        print((x[m].tolist(), y[m].tolist(), v[m].tolist()))
which gives for each of the four bins a tuple of 3 lists corresponding to x, y and v values:
([], [], [])
([-10, -2], [5, 5], [4, -6])
([4, 3], [-6, -20], [-10, 22])
([12, 6, 8, 14, 3], [8, 10, 2, 2, 8], [40, -14, 20, 8, -10])
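For larger data sets it may be more convenient to keep the per-bin arrays instead of printing them. The sketch below stores them in a dictionary keyed by the (i, j) bin number and takes the number of bins from the shape of ret.statistic; the name bins_xyv is hypothetical, and passing v as the values argument is an arbitrary choice since 'count' does not depend on it.

```python
import numpy as np
from scipy import stats

x = np.array([-10, -2, 4, 12, 3, 6, 8, 14, 3])
y = np.array([5, 5, -6, 8, -20, 10, 2, 2, 8])
v = np.array([4, -6, -10, 40, 22, -14, 20, 8, -10])

ret = stats.binned_statistic_2d(x, y, v, 'count', bins=2,
                                expand_binnumbers=True)
bx, by = ret.binnumber            # per-point bin number along x and along y
n_x, n_y = ret.statistic.shape    # number of bins along each axis

# Hypothetical container: (i, j) -> (x values, y values, v values) in that bin.
bins_xyv = {}
for i in range(1, n_x + 1):       # bin numbers start at 1
    for j in range(1, n_y + 1):
        m = (bx == i) & (by == j)
        bins_xyv[(i, j)] = (x[m], y[m], v[m])

x_bin, y_bin, v_bin = bins_xyv[(2, 2)]
print(x_bin, y_bin, v_bin)
```

Empty bins simply map to empty arrays, so every (i, j) key is present.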
I have an array of data-points, for example:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
and I need to perform the following sum on the values:
However, the problem is that I need to perform this sum on each value > i. For example, using the last 3 values in the set the sum would be:
and so on up to 10.
If I run something like:
import numpy as np
x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1/np.log(2)
for i in x:
    y = sum(x**(alpha)*np.log(x))
print(y)
It returns a single value of y = 247.7827060452275, whereas I need an array of values. I think I need to reverse the order of the data to achieve what I want but I'm having trouble visualising the problem (hope I explained it properly) as a whole so any suggestions would be much appreciated.
The following computes all the partial sums of the grand sum in your formula:
import numpy as np
# Generate numpy array of the integers 1 through 10
x = np.arange(1, 11)
alpha = 1 / np.log(2)
# Compute parts of the sum
parts = x ** alpha * np.log(x)
# Compute all partial sums
part_sums = np.cumsum(parts)
print(part_sums)
You really do not need any explicit loop or any non-numpy operation (like sum()) here. numpy takes care of all your needs.
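If the data has to stay in the descending order given in the question, the same idea can be applied to the reversed terms. This is only a sketch, assuming the intended sums run over the last 1, 2, 3, ... values of the array, which the formula missing from the question cannot confirm; the name tail_sums is made up.

```python
import numpy as np

x = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
alpha = 1 / np.log(2)

# One term of the sum per data point
parts = x ** alpha * np.log(x)

# Reverse so accumulation starts at the last value:
# tail_sums[k - 1] is the sum over the last k values of x.
tail_sums = np.cumsum(parts[::-1])

print(tail_sums)
```

The final entry of tail_sums is the sum over all ten values, i.e. the single number the loop in the question produced.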
I'm currently converting some old fortran code into python and looking to use numpy-style operations as much as I can, for speed.
The code calls for finding the products of all elements of two arrays, like so:
do i=1, nx
do j=1, ny
si(i,j) = xarray(i) * yarray(j)
enddo
enddo
so instead I have vectorized it like so:
for i, x in enumerate(xarray):
    si[i] = x * yarray
but is there a way to remove that loop over x and generate the whole "nx x ny" array in one line, which would presumably be faster?
I think you are looking for np.outer
>>> nx = np.array([1,2,3,4])
>>> ny = np.array([2,3,4,5])
>>> np.outer(nx, ny)
array([[ 2, 3, 4, 5],
[ 4, 6, 8, 10],
[ 6, 9, 12, 15],
[ 8, 12, 16, 20]])
Try:
si = xarray.reshape(-1,1) * yarray
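To check that both suggestions reproduce the loop, here is a small sketch with made-up input values; xarray and yarray follow the question's names, while si_loop, si_outer and si_bcast are hypothetical.

```python
import numpy as np

# Made-up inputs, purely for illustration
xarray = np.array([1.0, 2.0, 3.0])
yarray = np.array([10.0, 20.0])

# Row-by-row loop, as in the question
si_loop = np.empty((xarray.size, yarray.size))
for i, xv in enumerate(xarray):
    si_loop[i] = xv * yarray

# Outer product in one call
si_outer = np.outer(xarray, yarray)

# Broadcasting: an (nx, 1) column times an (ny,) row gives an (nx, ny) array
si_bcast = xarray[:, np.newaxis] * yarray

print(np.array_equal(si_loop, si_outer), np.array_equal(si_loop, si_bcast))
```

xarray[:, np.newaxis] is equivalent to xarray.reshape(-1, 1) for a one-dimensional array.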
I want to perform an SVD on a 12*12 matrix. numpy.linalg.svd itself works fine, but when I try to get the 12*12 matrix A back by computing u*s*v, I don't get it back.
import numpy as np
a_matrix=np.zeros((12,12))
with open('/home/koustav/Documents/ComputerVision/A2/codes/Points0.txt','r') as f:
    for (j,line) in enumerate(f):
        i=2*j
        if(i%2==0):
            values=np.array(list(map(np.double,line.strip('\n').split(' '))))
            a_matrix[i,4]=-values[2]
            a_matrix[i,5]=-values[3]
            a_matrix[i,6]=-values[4]
            a_matrix[i,7]=-1
            a_matrix[i,8]=values[1]*values[2]
            a_matrix[i,9]=values[1]*values[3]
            a_matrix[i,10]=values[1]*values[4]
            a_matrix[i,11]=values[1]*1
            a_matrix[i+1,0]=values[2]
            a_matrix[i+1,1]=values[3]
            a_matrix[i+1,2]=values[4]
            a_matrix[i+1,3]=1
            a_matrix[i+1,8]=-values[0]*values[2]
            a_matrix[i+1,9]=-values[0]*values[3]
            a_matrix[i+1,10]=-values[0]*values[4]
            a_matrix[i+1,11]=-values[0]*1
s_matrix=np.zeros((12,12))
u, s, v = np.linalg.svd(a_matrix,full_matrices=1)
k=0
while (k<12):
    s_matrix[k,k]=s[k]
    k+=1
print(u)
print('\n')
print(s_matrix)
print('\n')
print(u*s_matrix*v)
These are the points that I have used:
285.12 14.91 2.06655 -0.807071 -6.06083
243.92 100.51 2.23268 -0.100774 -5.63975
234.7 176.3 2.40898 0.230613 -5.10977
-126.59 -152.59 -1.72487 4.96296 -10.4564
-173.32 -164.64 -2.51852 4.95202 -10.3569
264.81 28.03 2.07303 -0.554853 -6.05747
Please suggest something...
Apart from the code and time you could save by using built-in functions like numpy.diag, your problem seems to be the * operator. On numpy arrays, * multiplies element by element; for matrix multiplication you have to use numpy.dot (or the @ operator in Python 3.5+). See the code below for a working example...
In [16]: import numpy as np
In [17]: A = np.arange(15).reshape(5,3)
In [18]: A
Out[18]:
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
In [19]: u, s, v = np.linalg.svd(A)
In [20]: S = np.diag(s)
In [21]: S = np.vstack([S, np.zeros((2,3)) ])
In [22]: #fill in zeros to get the right shape
In [23]: np.allclose(A, np.dot(u, np.dot(S,v)))
Out[23]: True
numpy.allclose checks whether two arrays are numerically close...
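The zero-padding step can be avoided by asking for the reduced decomposition. The following is a sketch of that variant on the same 5x3 example, assuming Python 3.5+ for the @ operator; the names vh and A_rebuilt are made up.

```python
import numpy as np

A = np.arange(15).reshape(5, 3)

# Reduced SVD: u is (5, 3), s is (3,), vh is (3, 3)
u, s, vh = np.linalg.svd(A, full_matrices=False)

# u * s scales the columns of u by the singular values (broadcasting),
# which equals u @ np.diag(s); @ then performs the matrix product.
A_rebuilt = (u * s) @ vh

print(np.allclose(A, A_rebuilt))  # True
```

For a square matrix such as the 12*12 one in the question, the full and reduced decompositions have the same shapes, so np.dot(u, np.dot(np.diag(s), v)) works without any padding.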
I'm trying to calculate the Pearson correlation between every pair of items in my list. I'm trying to get the correlations between data[0] and data[1], data[0] and data[2], and data[1] and data[2].
import scipy
from scipy import stats
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)
h = [pearson(x,y) for x,y in range(0, len(data))]
This returns the error TypeError: 'int' object is not iterable on h. Could someone please explain the error here? Thanks.
range yields a single int per iteration, while your comprehension tries to unpack each item into a pair (x, y) as if it were a tuple. Try itertools.combinations over the indices instead:
import scipy
from scipy import stats
from itertools import combinations
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
def pearson(x, y):
    series1 = data[x]
    series2 = data[y]
    if x != y:
        return scipy.stats.pearsonr(series1, series2)
h = [pearson(x,y) for x,y in combinations(range(len(data)), 2)]
Or as @Marius suggested:
h = [stats.pearsonr(data[x], data[y]) for x,y in combinations(range(len(data)), 2)]
Why not use numpy.corrcoef?
import numpy as np
data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
Result:
>>> np.corrcoef(data)
array([[ 1. , -0.98198051, -0.75592895],
[-0.98198051, 1. , 0.8660254 ],
[-0.75592895, 0.8660254 , 1. ]])
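To pull just the three pairwise values out of that matrix, one option is to read its upper triangle. This is a sketch only; the dictionary name pairs is hypothetical.

```python
import numpy as np

data = [[1, 2, 4], [9, 5, 1], [8, 3, 3]]
r = np.corrcoef(data)

# Indices of the entries strictly above the diagonal: (0, 1), (0, 2), (1, 2)
rows, cols = np.triu_indices(len(data), k=1)

# Map each index pair to its correlation coefficient
pairs = {(int(i), int(j)): float(r[i, j]) for i, j in zip(rows, cols)}

print(pairs)
```

Unlike scipy.stats.pearsonr, numpy.corrcoef returns only the coefficients, not the p-values.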
The range() function gives you only a single int on each iteration, and you can't unpack an int into a pair of values.
If you want to go through every possible pair of ints in that range, you could try
import itertools
h = [pearson(x,y) for x,y in itertools.product(range(len(data)), repeat=2)]
That builds every ordered pair of indices in the given range as 2-element tuples.
Remember that, with the function you defined, the pairs with x==y will return None. To skip those pairs you could use:
import itertools
h = [pearson(x,y) for x,y in itertools.permutations(range(len(data)), 2)]
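The three itertools helpers mentioned in this thread differ only in which index pairs they generate. A short sketch for three items, with made-up variable names:

```python
from itertools import combinations, permutations, product

idx = range(3)

# Unordered pairs, no repeats: each pair of series appears once
combos = list(combinations(idx, 2))

# Ordered pairs without x == y: each pair appears in both orders
perms = list(permutations(idx, 2))

# All ordered pairs, including x == y
prods = list(product(idx, repeat=2))

print(combos)                   # [(0, 1), (0, 2), (1, 2)]
print(len(perms), len(prods))   # 6 9
```

Since the Pearson correlation is symmetric, combinations already covers every distinct pair.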