I am using numpy.bincount with a vector of indices ind and a vector of weights coef, trying to run np.bincount(ind, coef). The problem is that my weight vector is not of type float64; it is a non-built-in class supporting the arithmetic operator __add__.
How can I do this? Directly running np.bincount(ind, coef) gives me the error
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'
The specific type I am considering is an element of a LaurentPolynomialRing from Sagemath.
bincount is compiled code, so we can't (readily) see what it does; we can only deduce things from its behavior.
The basic count, with x = np.array([0, 1, 1, 2, 2, 2]) as defined below:
In [303]: np.bincount(x)
Out[303]: array([1, 2, 3])
But adapting the weights example to provide an int array of weights:
In [304]: #w = np.array([0.3, 0.5, 0.2, 0.7, 1., -0.6]) # weights
     ...: w = np.array([3,5,2,7,10,-6])
     ...: x = np.array([0, 1, 1, 2, 2, 2])
     ...: np.bincount(x, weights=w)
Out[304]: array([ 3., 7., 11.])
This is consistent with your error: the result is float even when the weights are int, so the weights have been converted to float.
It might do something like this, but in compiled code:
In [306]: res = np.zeros(3)
In [307]: for i,v in zip(x,w):
     ...:     res[i] += v
     ...:
In [308]: res
Out[308]: array([ 3., 7., 11.])
I'm guessing this because it returns a result for each integer value from 0 through x.max(). Written like this it just requires w to support __add__. But this kind of iteration over an object-dtype array is slow, even in compiled code, since it has to use the __add__ of each element object. It can't just zip through the byte data-buffer of the w array.
Without the consecutive bin value constraint, a defaultdict is an easy tool for collecting like values.
In [309]: from collections import defaultdict
In [310]: dd = defaultdict(float)
In [311]: for i,v in zip(x,w):
     ...:     dd[i] += v
     ...:
In [312]: dd
Out[312]: defaultdict(float, {0: 3.0, 1: 7.0, 2: 11.0})
Another way, again where the x values are indices into the result array:
In [313]: res = np.zeros(3)
In [315]: np.add.at(res, x, w)
In [316]: res
Out[316]: array([ 3., 7., 11.])
I think all of these will work with objects that implement __add__.
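For instance, here is a minimal sketch with a toy class standing in for a ring element (the Sym class and its string-building __add__ are invented for illustration). The key point is that the accumulator array is object dtype, so no cast to float64 is ever attempted:

import numpy as np

class Sym:
    """Toy stand-in for an object with __add__; it just records the sums."""
    def __init__(self, s):
        self.s = s
    def __add__(self, other):
        return Sym(f"({self.s}+{other.s})")
    def __repr__(self):
        return self.s

x = np.array([0, 1, 1, 2, 2, 2])
w = np.array([Sym(c) for c in "pqrstu"], dtype=object)

# object-dtype result array, seeded with a "zero" element
res = np.full(x.max() + 1, Sym("0"), dtype=object)
np.add.at(res, x, w)
print(res)   # [(0+p) ((0+q)+r) (((0+s)+t)+u)]

With Sage you would presumably seed the array with the ring's own zero rather than this toy placeholder.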
I know about numpy.interp and scipy.interpolate.interp1d, but I can't seem to figure out how to just do a very simple linear interpolation between two lists based on some kind of [0, 1] range. For example, if I have lists
x = [2., 3., 4.]
y = [3., 4., 8.5]
I want a function that will accept 0.5 as an argument and give me
[2.5, 3.5, 6.25]
or will accept 0.1 as an argument and give me
[2.1, 3.1, 4.45], etc.
Why am I blanking on this? The answer must be quite easy...
Thanks!
You can use zip to iterate over multiple lists simultaneously.
Put this in an explicit for loop or in a list comprehension (or pass it to map, as suggested in another answer). I think the code is quite self-explanatory:
def with_explicit_loop(x_list, y_list, alpha=0.5):
    z_list = []
    for a, b in zip(x_list, y_list):
        z_list.append(a * (1 - alpha) + b * alpha)
    return z_list
def with_list_comprehension(x_list, y_list, alpha=0.5):
    return [a * (1 - alpha) + b * alpha for a, b in zip(x_list, y_list)]
Both functions are equivalent, but I think the first is slightly easier to read and the second is slightly faster.
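For example, with the lists from the question:

x = [2., 3., 4.]
y = [3., 4., 8.5]

print(with_list_comprehension(x, y, 0.5))   # [2.5, 3.5, 6.25]
print(with_list_comprehension(x, y, 0.1))   # [2.1, 3.1, 4.45]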
Here's a one-liner:
In [10]: lin = lambda x, y, mult: [a * (1 - mult) + b * mult for a, b in zip(x, y)]
In [11]: lin(x, y, .1)
Out[11]: [2.1, 3.1, 4.45]
In [12]: lin(x, y, .5)
Out[12]: [2.5, 3.5, 6.25]
I have a function like this:
def foo(v, w):
    return sum(np.exp(v/w))
where v is a numpy array and w is a number. Now I want to plot the value of this function for many values of w, so I need a version that works when w is a vector of arbitrary length.
My solution for now is the obvious one:
r = []
for e in w:
    r.append(foo(v, e))
but I wonder if there is a better way to do it. Also, I want to stay low on memory, so I need to avoid creating a big matrix, applying the function to every value, and summing over the columns (the length of v is more than 5e+4 and the length of w is 1e+3).
Thanks
If you cannot determine an upper bound for the length of v and ensure that you don't exceed the memory requirements, I think you will have to stay with your solution.
If you can determine an upper bound for the length of v and meet your memory requirements with an Mx1000 array, you can do this:
import numpy as np
v = np.array([1,2,3,4,5])
w = np.array([10.,5.])
c = v / w[:, np.newaxis]
d = np.exp(c)
e = d.sum(axis = 1)
>>>
>>> v
array([1, 2, 3, 4, 5])
>>> w
array([ 10., 5.])
>>> c
array([[ 0.1,  0.2,  0.3,  0.4,  0.5],
       [ 0.2,  0.4,  0.6,  0.8,  1. ]])
>>> d
array([[ 1.10517092,  1.22140276,  1.34985881,  1.4918247 ,  1.64872127],
       [ 1.22140276,  1.4918247 ,  1.8221188 ,  2.22554093,  2.71828183]])
>>> e
array([ 6.81697845,  9.47916901])
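If even the full len(w)-by-len(v) temporary is too big, a middle ground is to process w in fixed-size chunks, so the temporary never exceeds chunk*len(v) elements. A sketch (foo_chunked and the chunk size of 100 are my own choices, to be tuned against your memory budget):

import numpy as np

def foo_chunked(v, w, chunk=100):
    # evaluate sum(exp(v / w_i)) for every w_i, a block of rows at a time
    out = np.empty(len(w))
    for start in range(0, len(w), chunk):
        block = w[start:start + chunk]
        out[start:start + chunk] = np.exp(v / block[:, np.newaxis]).sum(axis=1)
    return out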
I have two arrays:
a = [[a11, a12],
     [a21, a22]]
b = [[b11, b12],
     [b21, b22]]
What I would like to do is build up a matrix as follows:
xx = np.mean(a[:,0]*b[:,0])
xy = np.mean(a[:,0]*b[:,1])
yx = np.mean(b[:,1]*a[:,0])
yy = np.mean(a[:,1]*b[:,1])
and return an array c such that
c = [[xx, xy],
     [yx, yy]]
Is there a nice pythonic way to do this in numpy? Because at the moment I have done it by hand, exactly as above, so the dimensions of the output array are coded in by hand, rather than determined according to the size of the input arrays a and b.
Is there an error in your third element? If, as seems reasonable, you want yx = np.mean(a[:,1]*b[:,0]) instead of yx = np.mean(b[:,1]*a[:,0]), then you can try the following:
a = np.random.rand(2, 2)
b = np.random.rand(2, 2)
With c built up element by element as in the question (using the corrected yx), the broadcasting expression reproduces it:
>>> c
array([[ 0.26951488,  0.19019219],
       [ 0.31008754,  0.1793523 ]])
>>> np.mean(a.T[:, None, :]*b.T, axis=-1)
array([[ 0.26951488,  0.19019219],
       [ 0.31008754,  0.1793523 ]])
It will actually be faster to avoid the intermediate array and express your result as a matrix multiplication:
>>> np.dot(a.T, b) / a.shape[0]
array([[ 0.26951488,  0.19019219],
       [ 0.31008754,  0.1793523 ]])
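Self-contained, the comparison looks like this (a small sketch that rebuilds c with explicit loops and checks both vectorized forms against it):

import numpy as np

a = np.random.rand(2, 2)
b = np.random.rand(2, 2)

# the question's by-hand construction, generalized over columns
c_loop = np.array([[np.mean(a[:, i] * b[:, j]) for j in range(b.shape[1])]
                   for i in range(a.shape[1])])

print(np.allclose(c_loop, np.mean(a.T[:, None, :] * b.T, axis=-1)))  # True
print(np.allclose(c_loop, np.dot(a.T, b) / a.shape[0]))              # True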
I have a list of tuples e.g. like this:
l=[ (2,2,1), (2,4,0), (2,8,0),
    (4,2,0), (4,4,1), (4,8,0),
    (8,2,0), (8,4,0), (8,8,1) ]
and want to transform it to a numpy array like this (only the z values go into the matrix, in the order given by the x, y coordinates; the coordinates themselves should be stored separately):
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
I'm posting my solution below, but it's pretty low-level and I think there should be some higher-level solution for this, either using matplotlib or numpy. Any idea?
One needs this kind of conversion to provide the arrays to matplotlib plotting functions like pcolor, imshow, contour.
It looks like np.unique with the return_inverse option fits the bill. For example, with l as the numpy array defined in the full code below:
In [203]: l[:,0]
Out[203]: array([2, 2, 2, 4, 4, 4, 8, 8, 8])
In [204]: np.unique(l[:,0], return_inverse = True)
Out[204]: (array([2, 4, 8]), array([0, 0, 0, 1, 1, 1, 2, 2, 2]))
np.unique returns a 2-tuple. The first array in the 2-tuple holds all the unique values in l[:,0]. The second array holds the index values that associate the values in array([2, 4, 8]) with the values in the original array l[:,0]. It also happens to be the rank, since np.unique returns the unique values in sorted order.
import numpy as np
import matplotlib.pyplot as plt
l = np.array([ (2,2,1), (2,4,0), (2,8,0),
               (4,2,0), (4,4,1), (4,8,0),
               (8,2,0), (8,4,0), (8,8,1) ])
x, xrank = np.unique(l[:,0], return_inverse = True)
y, yrank = np.unique(l[:,1], return_inverse = True)
a = np.zeros((max(xrank)+1, max(yrank)+1))
a[xrank,yrank] = l[:,2]
fig = plt.figure()
ax = plt.subplot(111)
ax.pcolor(x, y, a)
plt.show()
This yields a pcolor plot of a over the x and y coordinates.
My solution first ranks the x and y values, and then creates the array.
l=[ (2,2,1), (2,4,0), (2,8,0),
    (4,2,0), (4,4,1), (4,8,0),
    (8,2,0), (8,4,0), (8,8,1) ]
def rankdata_ignoretied(data):
    """ranks data counting all tied values as one"""
    # first translate the data values to integers in increasing order
    counter=0
    encountered=dict()
    for e in sorted(data):
        if e not in encountered:
            encountered[e]=counter
            counter+=1
    # then map the original sequence of the data values
    result=[encountered[e] for e in data]
    return result
x=[e[0] for e in l]
y=[e[1] for e in l]
z=[e[2] for e in l]
xrank=rankdata_ignoretied(x)
yrank=rankdata_ignoretied(y)
import numpy
a=numpy.zeros((max(xrank)+1, max(yrank)+1))
for i in range(len(l)):
    a[xrank[i],yrank[i]]=l[i][2]
To use the resulting array for plotting, one also needs the original x and y values, e.g.:
ax=plt.subplot(511)
ax.pcolor(sorted(set(x)), sorted(set(y)), a)
Does anyone have a better idea of how to achieve this?
I don't understand why you're making this so complex. You can do it simply with:
np.array([
    [cell[2] for cell in row] for row in zip(*[iter(l)] * 3)
])
Or perhaps more readably:
np.array([
    [a[2], b[2], c[2]] for a, b, c in zip(l[0::3], l[1::3], l[2::3])
])
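Both versions hardcode the row length 3. Since the tuples in the question are already sorted by x and then y, the row length can be inferred and the whole thing becomes a reshape (a sketch assuming a complete, sorted grid):

import numpy as np

l = [(2,2,1), (2,4,0), (2,8,0),
     (4,2,0), (4,4,1), (4,8,0),
     (8,2,0), (8,4,0), (8,8,1)]

n = len(set(t[0] for t in l))   # number of distinct x values
a = np.array([t[2] for t in l], dtype=float).reshape(n, -1)
print(a)   # the 3x3 identity-like matrix of z values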
A solution using the standard Python constructs set, list and sorted. If you don't have a lot of points, it gains in readability, even if it is slower than the numpy solution given by unutbu:
import numpy as np
import matplotlib.pyplot as plt

l=[ (2,2,1), (2,4,0), (2,8,0),
    (4,2,0), (4,4,1), (4,8,0),
    (8,2,0), (8,4,0), (8,8,1) ]

#get the ranks of the values for x and y
xi = sorted(list(set( i[0] for i in l )))
yi = sorted(list(set( i[1] for i in l )))
a = np.zeros((len(xi),len(yi)))

#fill the matrix using the list.index
for x,y,v in l:
    a[xi.index(x),yi.index(y)]=v

ax = plt.subplot(111)
ax.pcolor(np.array(xi), np.array(yi), a)
plt.show()