np.around for array with none and integer values - python

I have an array:
MDP= [[0.705,.655,0.614,0.388],[0.762,None,0.660,-1],[0.812,.868,0.918,+1]]
How can I apply np.around to the above array without getting an error for the None, -1 and +1 values?
TIA

Make sure that you work with a numpy array, not a list of lists:
np.around(np.array(MDP).astype(float))
#array([[ 1., 1., 1., 0.],
# [ 1., nan, 1., -1.],
# [ 1., 1., 1., 1.]])
You can convert the result back to a nested list with .tolist(), if needed.
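If you also want to control the number of decimals and get None back instead of nan, the same idea could be extended roughly as follows. This is only a sketch: the names arr, rounded and result are made up for illustration, and it assumes None is the only non-numeric value in the list.

```python
import numpy as np

MDP = [[0.705, .655, 0.614, 0.388],
       [0.762, None, 0.660, -1],
       [0.812, .868, 0.918, +1]]

# Casting the object array to float turns None into nan
arr = np.array(MDP, dtype=object).astype(float)

# Round to two decimals; nan entries simply stay nan
rounded = np.around(arr, decimals=2)

# Back to a nested list, restoring None wherever the value is nan
result = [[None if np.isnan(x) else x for x in row]
          for row in rounded.tolist()]

print(result[1])  # [0.76, None, 0.66, -1.0]
```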

My solution is to special-case the values that are None. This can be done compactly with a list comprehension inside a lambda.
If your array is 1D:
flex_round = lambda array: [None if x is None else np.round(x) for x in array]
If your array is 2D:
flex_round = lambda array: [[None if x is None else np.round(x) for x in y] for y in array]
Do not forget to pass the decimals argument to np.round to specify how many digits should remain after the decimal point.
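For what it's worth, the 1D and 2D variants can probably be folded into one small recursive function that also takes decimals. This is a sketch only; the decimals parameter and the recursion over nested lists are my own additions, not part of the answer above.

```python
import numpy as np

def flex_round(values, decimals=0):
    """Round every number in a (possibly nested) list, leaving None untouched."""
    if values is None:
        return None
    if isinstance(values, (list, tuple)):
        # Recurse into nested lists so 1D and 2D inputs are handled alike
        return [flex_round(v, decimals) for v in values]
    # Plain number: round it and return a regular Python float
    return float(np.round(values, decimals))

MDP = [[0.705, .655, 0.614, 0.388],
       [0.762, None, 0.660, -1],
       [0.812, .868, 0.918, +1]]

rounded = flex_round(MDP, decimals=1)
```

Here rounded keeps the nested-list structure of MDP, with None still in place at rounded[1][1].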

Related

How to stack numpy arrays alternately/slicewise along a specific axis?

How can I stack arrays in an alternating fashion? Consider the following example with three arrays:
import numpy as np
one = np.ones((5, 2, 2))
two = np.ones((5, 2, 2))*2
three = np.ones((5, 2, 2))*3
I would like to create a new array result with shape (15, 2, 2) which is formed by alternately taking a slice from each of the given arrays, i.e. the result should look like:
result[0] = one[0]
result[1] = two[0]
result[2] = three[0]
result[3] = one[1]
result[4] = two[1]
result[5] = three[1]
result[6] = one[2]
etc...
The arrays above are just an example to illustrate the question; I am not looking for a way to create this specific result array. What is the easiest way to achieve this, ideally with a way to specify the stacking axis?
Of course, it is possible to do some loops but it seems rather inconvenient...
You may want to take a look at np.stack(), e.g.:
np.stack([one, two, three], axis=1).reshape(15, 2, 2)
With np.hstack followed by a reshape (using -1 for the first axis, followed by the lengths of the last two axes, for a generic solution):
np.hstack([one,two,three]).reshape((-1,)+one.shape[1:])
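np.vstack also gives an array of shape (15, 2, 2), but note that it concatenates the arrays block by block (all slices of one, then two, then three) instead of alternating between them:
np.vstack((one,two,three))
See the np.vstack documentation for details.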
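To see how these suggestions differ, one can compare the first element of each slice along axis 0. A quick check, where the names interleaved and blockwise are just illustrative:

```python
import numpy as np

one = np.ones((5, 2, 2))
two = np.ones((5, 2, 2)) * 2
three = np.ones((5, 2, 2)) * 3

# Stack along a new axis 1, then merge axes 0 and 1: slices alternate
interleaved = np.stack([one, two, three], axis=1).reshape(15, 2, 2)

# Plain vertical stacking: all of `one`, then all of `two`, then `three`
blockwise = np.vstack((one, two, three))

print(interleaved[:6, 0, 0])  # [1. 2. 3. 1. 2. 3.]
print(blockwise[:6, 0, 0])    # [1. 1. 1. 1. 1. 2.]
```

Both results have shape (15, 2, 2); only the stack-and-reshape variant produces the alternating order asked for.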
With selectable axis:
# example arrays
a,b,c = np.multiply.outer([1,2,3],np.ones((5,2,2)))
# axis
k = 1
np.stack([a,b,c],k+1).reshape(*(-(k==j) or s for j,s in enumerate(a.shape)))
# array([[[1., 1.],
# [2., 2.],
# [3., 3.],
# [1., 1.],
# [2., 2.],
# [3., 3.]],
#
# [[1., 1.],
...
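The one-liner above is compact but a bit hard to read. A more explicit version of the same stack-then-reshape idea might look like this; the function name interleave is hypothetical, and the sketch assumes equally shaped inputs and a non-negative axis.

```python
import numpy as np

def interleave(arrays, axis=0):
    """Interleave equally shaped arrays slice by slice along `axis`."""
    first = arrays[0]
    # Put the arrays next to each other on a new axis right after `axis`
    stacked = np.stack(arrays, axis=axis + 1)
    # Merge the new axis into `axis`; -1 lets NumPy infer the merged length
    new_shape = list(first.shape)
    new_shape[axis] = -1
    return stacked.reshape(new_shape)

a, b, c = np.multiply.outer([1, 2, 3], np.ones((5, 2, 2)))

along_0 = interleave([a, b, c], axis=0)  # shape (15, 2, 2)
along_1 = interleave([a, b, c], axis=1)  # shape (5, 6, 2)
```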

How to copy list of tuples to EXISTING numpy array in shared memory

I have a structured numpy array in shared memory, that's only one "layer" of a higher dimensional array.
And I have a list of tuples whose values I want to copy to this (sub) array.
I've found how to make a new numpy structured array out of a list of tuples.
But I can't find out how to convert this list of tuples to an EXISTING numpy (sub) array.
The sizes already match, of course.
Of course I can copy elementwise in a Python for-loop, but this seems awfully inefficient. I'd like the looping to be done in the C++ that underlies numpy.
Explanation: The reason my array is in shared memory is that I use it as a common data structure with a C++ process, guarded by mutex semaphores.
My list of tuples looks like:
[(25141156064, 5.3647, 221.32287846), (25141157138, 5.3647, 73.70348602), (25141155120, 5.3646, 27.77147382), (25141160388, 5.3643, 55.5000024), (25141160943, 5.3636, 166.49511561), (25141154452, 5.3578, 92), (25141154824, 5.3539, 37.22246003), (25141155187, 5.3504, 37.22246003), (25141157611, 5.34, 915), (25141157598, 5.3329, 1047.32982582), (25140831246, 5.3053, 915), (25141165780, 5.2915, 2000), (25141165781, 5.2512, 2000), (25140818946, 5.2483, 915), (25138992274, 5.1688, 458), (25121724934, 5.1542, 458), (25121034787, 4.8993, 3.47518861), (24402133353, 2.35, 341), (24859679064, 0.8, 1931.25), (24046377720, 0.5, 100), (25141166091, 5.3783, -650.51242432), (25141165779, 5.3784, -1794.28608778), (25141157632, 5.3814, -2000), (25141157601, 5.3836, -2000), (25141164181, 5.3846, -499.65636506), (25141164476, 5.4025, -91), (25141157766, 5.4026, -634.80061236), (25141153364, 5.4034, -2000), (25141107806, 5.4035, -1601.88882309), (25141157694, 5.4136, -1047.32982582), (25141148874, 5.4278, -266), (25141078136, 5.4279, -48.4864096), (25141165317, 5.4283, -2000), (25141097109, 5.4284, -914), (25141110492, 5.4344, -774.75614589), (25141110970, 5.4502, -928.32048159), (25141166045, 5.4527, -2000), (25141166041, 5.493, -2000), (25139832350, 5.5, -10.2273)]
My numpy array has elements that are defined as follows:
Id = np.uint64
Price = np.float64
Amount = np.float64
Quotation = np.dtype([
    ('id', Id),
    ('price', Price),
    ('amount', Amount),
])
self._contents = np.ndarray(
    shape=(
        maxNrOfMarkets,
        maxNrOfItemKindsPerMarket,
        maxNrOfQuotationsPerItemKind,
    ),  # this comma was missing in the original snippet
    dtype=Quotation,
    buffer=self.sharedMemory.buf,
    offset=offset,
)
Same way you'd do it if the array wasn't backed by shared memory. Just make sure you synchronize access properly.
your_array[:] = your_list
Say you have an array of shape (list_length, tuples_length).
Is this what you're looking for?
my_sub_array[:] = my_list_of_tuples
As an example:
>>> my_sub_array = np.zeros((5, 3))
>>> my_list_of_tuples = [(i, i + 1, i + 2) for i in range(5)]
>>> my_sub_array
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
>>> my_sub_array[:] = my_list_of_tuples
>>> my_sub_array
array([[0., 1., 2.],
       [1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.],
       [4., 5., 6.]])
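The same slice assignment should also work for a structured dtype like the one in the question. Below is a rough sketch in which a plain bytearray stands in for the shared-memory buffer; the shape (2, 3, 3) and the names contents, quotes and layer are assumptions made purely for illustration, and any locking around the write is left out.

```python
import numpy as np

Quotation = np.dtype([
    ('id', np.uint64),
    ('price', np.float64),
    ('amount', np.float64),
])

# Stand-in for sharedMemory.buf: a zero-filled, writable buffer
shape = (2, 3, 3)  # hypothetical sizes
buffer = bytearray(int(np.prod(shape)) * Quotation.itemsize)
contents = np.ndarray(shape=shape, dtype=Quotation, buffer=buffer)

quotes = [
    (25141156064, 5.3647, 221.32287846),
    (25141157138, 5.3647, 73.70348602),
    (25141155120, 5.3646, 27.77147382),
]

# Indexing gives a view on the existing memory, not a copy
layer = contents[1, 2]

# Slice assignment copies the tuples into that memory in place
layer[:] = quotes
```

Because layer is a view, the values end up in the original buffer, so another process mapping the same memory would see them.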

numpy concatenate not appending new array to empty multidimensional array

I bet I am doing something very simple wrong. I want to start with an empty 2D numpy array and append arrays to it (with dimensions 1 row by 4 columns).
open_cost_mat_train = np.matrix([])
for i in range(10):
    open_cost_mat = np.array([i,0,0,0])
    open_cost_mat_train = np.vstack([open_cost_mat_train,open_cost_mat])
my error trace is:
File "/Users/me/anaconda/lib/python2.7/site-packages/numpy/core/shape_base.py", line 230, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What am I doing wrong? I have tried append, concatenate, defining the empty 2D array as [[]], as [], array([]) and many others.
You need to reshape your original matrix so that its number of columns matches that of the appended arrays:
open_cost_mat_train = np.matrix([]).reshape((0,4))
After which, it gives:
open_cost_mat_train
# matrix([[ 0., 0., 0., 0.],
# [ 1., 0., 0., 0.],
# [ 2., 0., 0., 0.],
# [ 3., 0., 0., 0.],
# [ 4., 0., 0., 0.],
# [ 5., 0., 0., 0.],
# [ 6., 0., 0., 0.],
# [ 7., 0., 0., 0.],
# [ 8., 0., 0., 0.],
# [ 9., 0., 0., 0.]])
If open_cost_mat_train is large I would encourage you to replace the for loop with a vectorized algorithm. I will use the following functions to show how efficiency is improved by vectorizing loops:
def fvstack():
    import numpy as np
    np.random.seed(100)
    ocmt = np.matrix([]).reshape((0, 4))
    for i in range(10):
        x = np.random.random()
        ocm = np.array([x, x + 1, 10*x, x/10])
        ocmt = np.vstack([ocmt, ocm])
    return ocmt
def fshape():
    import numpy as np
    from numpy.matlib import empty
    np.random.seed(100)
    ocmt = empty((10, 4))
    for i in range(ocmt.shape[0]):
        ocmt[i, 0] = np.random.random()
    ocmt[:, 1] = ocmt[:, 0] + 1
    ocmt[:, 2] = 10*ocmt[:, 0]
    ocmt[:, 3] = ocmt[:, 0]/10
    return ocmt
I've assumed that the values that populate the first column of ocmt (shorthand for open_cost_mat_train) are obtained from a for loop, and the remaining columns are a function of the first column, as stated in your comments to my original answer. As real costs data are not available, in the forthcoming example the values in the first column are random numbers, and the second, third and fourth columns are the functions x + 1, 10*x and x/10, respectively, where x is the corresponding value in the first column.
In [594]: fvstack()
Out[594]:
matrix([[ 5.43404942e-01, 1.54340494e+00, 5.43404942e+00, 5.43404942e-02],
[ 2.78369385e-01, 1.27836939e+00, 2.78369385e+00, 2.78369385e-02],
[ 4.24517591e-01, 1.42451759e+00, 4.24517591e+00, 4.24517591e-02],
[ 8.44776132e-01, 1.84477613e+00, 8.44776132e+00, 8.44776132e-02],
[ 4.71885619e-03, 1.00471886e+00, 4.71885619e-02, 4.71885619e-04],
[ 1.21569121e-01, 1.12156912e+00, 1.21569121e+00, 1.21569121e-02],
[ 6.70749085e-01, 1.67074908e+00, 6.70749085e+00, 6.70749085e-02],
[ 8.25852755e-01, 1.82585276e+00, 8.25852755e+00, 8.25852755e-02],
[ 1.36706590e-01, 1.13670659e+00, 1.36706590e+00, 1.36706590e-02],
[ 5.75093329e-01, 1.57509333e+00, 5.75093329e+00, 5.75093329e-02]])
In [595]: np.allclose(fvstack(), fshape())
Out[595]: True
In order for the calls to fvstack() and fshape() to produce the same results, the random number generator is seeded in both functions through np.random.seed(100). Notice that the equality test has been performed using numpy.allclose instead of fvstack() == fshape() to avoid the round-off errors associated with floating point arithmetic.
As for efficiency, the following interactive session shows that initializing ocmt with its final shape is significantly faster than repeatedly stacking rows:
In [596]: import timeit
In [597]: timeit.timeit('fvstack()', setup="from __main__ import fvstack", number=10000)
Out[597]: 1.4884241055042366
In [598]: timeit.timeit('fshape()', setup="from __main__ import fshape", number=10000)
Out[598]: 0.8819408006311278
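Two other common ways to avoid repeated vstack calls are to collect the rows in a Python list and convert once, or to drop the loop entirely when the first column can be generated in one go. The following is a sketch under those assumptions; the function names are hypothetical, it returns plain arrays rather than np.matrix, and default_rng produces different random numbers than np.random.seed(100) above.

```python
import numpy as np

def rows_from_list(n=10):
    """Collect rows in a list and build the array once at the end."""
    rows = []
    for i in range(n):
        rows.append([i, 0, 0, 0])
    return np.array(rows, dtype=float)

def fully_vectorized(n=10, seed=100):
    """Generate the first column at once and derive the other columns."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    # Columns: x, x + 1, 10*x, x/10, as in fvstack() and fshape()
    return np.column_stack([x, x + 1, 10 * x, x / 10])

table = rows_from_list()
costs = fully_vectorized()
```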

Split a multidimensional numpy array using a condition

I have a multidimensional numpy array.
The first array indicates the quality of the data. 0 is good, 1 is not so good.
For a first check I only want to use good data.
How do I split the array into two new ones?
My own idea does not work:
good_data = [x for x in data[0,:] if x = 1.0]
bad_data = [x for x in data[0,:] if x = 0.0]
Here is a small example indicating my problem:
import numpy as np
flag = np.array([0., 0., 0., 1., 1., 1.])
temp = np.array([300., 310., 320., 300., 222., 333.])
pressure = np.array([1013., 1013., 1013., 900., 900., 900.])
data = np.array([flag, temp, pressure])
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
print(good_data)
The print statement gives me [1., 1., 1.].
But I am looking for [[1., 1., 1.], [300., 222., 333.], [900., 900., 900.]].
Is this what you are looking for?
good_data = data[0,:][data[0,:] == 1.0]
bad_data = data[0,:][data[0,:] == 0.0]
This returns a numpy.array.
Alternatively, you can do as you suggested, but convert the resulting list to numpy.array:
good_data = np.array([x for x in data[0,:] if x == 1.0])
Notice the comparison operator == in place of the assignment operator =.
For your particular example, subset data using flag == 1 while iterating over the first index:
good_data = [data[n,:][flag == 1] for n in range(data.shape[0])]
If you really want the elements of good_data to be lists, convert inside the comprehension:
good_data = [data[n,:][flag == 1].tolist() for n in range(data.shape[0])]
Thanks to Jaime who pointed out that the easy way to do this is:
good_data = data[:, data[0] == 1]
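Put together with the example data from the question, the column mask could be used like this to get both halves at once. The names mask, selected and rest are illustrative only; which flag value counts as "good" is up to you.

```python
import numpy as np

flag = np.array([0., 0., 0., 1., 1., 1.])
temp = np.array([300., 310., 320., 300., 222., 333.])
pressure = np.array([1013., 1013., 1013., 900., 900., 900.])
data = np.array([flag, temp, pressure])

# Boolean mask over the columns, built from the flag row
mask = data[0] == 1

selected = data[:, mask]   # columns where the flag is 1
rest = data[:, ~mask]      # all remaining columns

print(selected.tolist())
# [[1.0, 1.0, 1.0], [300.0, 222.0, 333.0], [900.0, 900.0, 900.0]]
```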

Python: An elegant/efficient way to evaluate function over bi-dimensional indexes?

I am very new to Python (in the past I used Mathematica, Maple, or Matlab scripts). I am very impressed how NumPy can evaluate functions over arrays but having problems trying to implement it in several dimensions. My question is very simple (please don't laugh): is there a more elegant and efficient way to evaluate some function f (which is defined over R^2) without using loops?
import numpy
def build_matrix(f):  # hypothetical wrapper, added so that `return M` is valid
    M = numpy.zeros((10, 10))
    for i in range(0, 10):
        for j in range(0, 10):
            M[i, j] = f(i, j)
    return M
The goal when coding with numpy is to implement your computation on the whole array, as much as possible. So if your function is, for example, f(x,y) = x**2 + 2*y and you want to apply it to all integer pairs x, y in [0, 9] x [0, 9], do:
x,y = np.mgrid[0:10, 0:10]
fxy = x**2 + 2*y
If you can't find a way to express your function like this, then either:
ask how to do it (and state the function definition explicitly), or
use numpy.vectorize.
Same example using vectorize:
def f(x,y): return x**2 + 2*y
x,y = np.mgrid[0:10, 0:10]
fxy = np.vectorize(f)(x.ravel(),y.ravel()).reshape(x.shape)
Note that in practice I only use vectorize, much like Python's map, when the contents of the arrays are not numbers. A typical example is computing the length of every list in an array of lists:
# construct a sample array of lists (dtype=object is needed because the lists differ in length)
list_of_lists = np.array([list(range(i)) for i in range(1000)], dtype=object)
print(np.vectorize(len)(list_of_lists))
# [0,1 ... 998,999]
Yes, many numpy functions operate on N-dimensional arrays. Take this example:
>>> M = numpy.zeros((3,3))
>>> M[0][0] = 1
>>> M[2][2] = 1
>>> M
array([[ 1., 0., 0.],
[ 0., 0., 0.],
[ 0., 0., 1.]])
>>> M > 0.5
array([[ True, False, False],
[False, False, False],
[False, False, True]], dtype=bool)
>>> numpy.sum(M)
2.0
Note the difference between numpy.sum, which operates on N-dimensional arrays, and sum, which only goes 1 level deep:
>>> sum(M)
array([ 1., 0., 1.])
So if you build your function f() out of operations that work on n-dimensional arrays, then f() itself will work on n-dimensional arrays.
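As an illustration of that last point, here are two loop-free ways to fill the 10 x 10 matrix from the question, using broadcasting and np.fromfunction. The function f below is just the example x**2 + 2*y from the earlier answer; both approaches assume f is built from operations that accept arrays.

```python
import numpy as np

def f(x, y):
    # Works element-wise on scalars and on arrays alike
    return x**2 + 2*y

# Broadcasting: a (10, 1) column against a (1, 10) row gives a (10, 10) grid
i = np.arange(10)[:, None]
j = np.arange(10)[None, :]
M_broadcast = f(i, j)

# np.fromfunction calls f once with arrays holding the row and column indices
M_fromfunction = np.fromfunction(f, (10, 10))

print(np.array_equal(M_broadcast, M_fromfunction))  # True
```

Note that np.fromfunction passes float index arrays by default, so its result is a float array, while the broadcasting version keeps integers here.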
You can also use numpy multi-dimension slicing, like below. You just provide slices for each dimension:
arr = np.zeros((5,5)) # 5 rows, 5 columns
# update only first column
arr[:,0] = 1
# update only last row ... same as arr[-1] = 1
arr[-1,:] = 1
# update center
arr[1:-1, 1:-1] = 1
print(arr)
output:
array([[ 1., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 0.],
[ 1., 1., 1., 1., 1.]])
A pure python answer, not depending upon numpy tools, is to make the Cartesian Product of two sequences:
from itertools import product
for i, j in product(range(0, 10), range(0, 10)):
    M[i,j]=f(i,j)
Edit: Actually, I should have read the question properly. This still uses loops, just one less loop.
