I am trying to solve a "very simple" problem. Not so simple in Python. Given a large matrix A and another smaller matrix B I want to substitute certain elements of A with B.
In Matlab it would look like this:
Given A, row_coord = [1,5,6], col_coord = [2,4], and a matrix B of size (3x2): A[row_coord, col_coord] = B
In Python I tried to use product(row_coord, col_coord) from itertools to generate the set of all indices that need to be accessed in A, but it does not work. All the examples of submatrix substitution I could find deal with block-wise row_coord = col_coord cases. Nothing concrete except http://comments.gmane.org/gmane.comp.python.numeric.general/11912 seems to relate to the problem I am facing, and the code in that link does not work.
Note: I know that I can implement what I need with a double for-loop, but on my data such a loop adds 9 seconds to each iteration, so I am looking for a faster way.
Any help will be greatly appreciated.
Assuming you're using numpy arrays, then (in the case where your B is a scalar) the following code should work to assign the value of B to the chosen elements.
itertools.product will create all of the coordinate pairs which we then convert into a numpy array and use in indexing your original array:
import numpy as np
from itertools import product
A = np.zeros([20,20])
col_coord = [0,1,3]
row_coord = [1,2]
coords = np.array(list(product(row_coord, col_coord)))
B = 1
A[coords[:,0], coords[:,1]] = B
I used this excellent answer by unutbu to work out how to do the indexing.
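As a follow-up: if B is a full matrix rather than a scalar, numpy's `np.ix_` builds the same open mesh of row/column indices without materializing the product list. A sketch using the coordinates from the question (the B values here are made up):

```python
import numpy as np

A = np.zeros((20, 20))
row_coord = [1, 5, 6]
col_coord = [2, 4]
B = np.arange(6).reshape(3, 2)  # an arbitrary 3x2 matrix

# np.ix_ turns the index lists into an open mesh, so the assignment
# targets the full 3x2 cross product of rows and columns at once
A[np.ix_(row_coord, col_coord)] = B
```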
Related
I have two arrays (which are very big) with the same dimensions. In one of the two arrays (in my code it's called "minMap") I want to save the smaller of the two values at each position.
My current code looks like this:
for y in range(loadedMap.shape[1]):
    for x in range(loadedMap.shape[0]):
        if loadedMap[x][y] < minMap[x][y]:
            minMap[x][y] = loadedMap[x][y]
It's working, but I'm pretty sure it's a clumsy solution, because I haven't used any numpy functionality. Maybe a solution with vectorization would be faster? I don't know how to do that, though.
(Sorry for my bad English)
There is a function np.minimum() which does exactly what you want:
# a and b are 2 arrays with same shape
c = np.minimum(a, b)
# c will contain minimum values from a and b
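For instance (the array values here are just illustrative):

```python
import numpy as np

loadedMap = np.array([[1, 5], [3, 0]])
minMap = np.array([[2, 2], [2, 2]])

# element-wise minimum, replacing the double for-loop entirely
minMap = np.minimum(minMap, loadedMap)
```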
For example, say that I have a list in which the index of the list is a particular cell number and the value at that index is the temperature of that particular cell. Let's say that the list looks like this:
fine_mesh = [600,625,650,675,700,725,750,775,800,825]
Then, let's say that we want to create a coarser mesh by reducing the number of cells by a factor of 2, so we take the average temperature of sequential groups of two cells. The factor by which the finer mesh is reduced could be any number, though.
So in this example case,
coarse_mesh = [612.5,662.5,712.5,762.5,812.5]
What is the fastest way to do this with python? Speed matters because there could be hundreds of thousands of cells. It is OK to use open source libraries like numpy.
Thanks in advance! :)
Using numpy you can vectorize addition (and multiplication, etc.) and use slices, so you can do the following:
import numpy as np
# ... snip ...
fine_mesh = np.array(fine_mesh)
coarse_mesh = 0.5 * (fine_mesh[::2] + fine_mesh[1::2])
Since it's numpy it'll likely be quicker than a list comprehension.
You could use a list comprehension:
fine_mesh=[600,625,650,675,700,725,750,775,800,825]
coarse_mesh = [(a + b) / 2 for a, b in zip(fine_mesh[::2], fine_mesh[1::2])]
print(coarse_mesh)
Or if you prefer numpy, you could use numpy.mean:
import numpy as np
fine_mesh=[600,625,650,675,700,725,750,775,800,825]
coarse_mesh = np.mean(np.array(fine_mesh).reshape(-1, 2), 1)
print(coarse_mesh.tolist())
Output
[612.5, 662.5, 712.5, 762.5, 812.5]
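Since the reduction factor could be any number, the reshape trick generalizes directly; the helper below is a sketch assuming the mesh length is a multiple of the factor:

```python
import numpy as np

def coarsen(mesh, factor):
    # group the mesh into rows of `factor` cells, then average each row
    return np.asarray(mesh, dtype=float).reshape(-1, factor).mean(axis=1)

fine_mesh = [600, 625, 650, 675, 700, 725, 750, 775, 800, 825]
```

Here `coarsen(fine_mesh, 2)` reproduces the output above, and `coarsen(fine_mesh, 5)` averages groups of five instead.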
I am having a small issue understanding indexing in Numpy arrays. I think a simplified example is best to get an idea of what I am trying to do.
So first I create an array of zeros of the size I want to fill:
x = range(0,10,2)
y = range(0,10,2)
a = zeros(len(x),len(y))
so that will give me a 5x5 array of zeros. Now, I want to fill the array with a rather complicated function that I can't get to work with grids. My problem is that I'd like to iterate as:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        .........
        "do function and fill the array corresponding to (i,j)"
however, right now what I would like to be a[2,10], a function of 2 and 10, is instead stored at an index like a[1,4] or whatever.
Again, maybe this is elementary, I've gone over the docs and find myself at a loss.
EDIT:
In the end I vectorized as much as possible and wrote the simulation loops that I couldn't vectorize in Cython. Further, I used Joblib to parallelize the operation. I stored the results in a list because an array was not filling correctly when running in parallel. I then used itertools to split the list into individual results and Pandas to organize them.
Thank you for all the help
Some tips for you to get things done while keeping good performance:
- avoid Python `for` loops
- create a function that can deal with vectorized inputs
Example:
def f(xs, ys):
    return xs**2 + ys**2 + xs*ys
where you can pass xs and ys as arrays, and the operation will be done element-wise:
xs = np.random.random((100,200))
ys = np.random.random((100,200))
f(xs,ys)
You should read more about numpy broadcasting to get a better understanding of how array operations work. This will help you design a function that handles arrays properly.
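As a minimal broadcasting illustration (the shapes are chosen arbitrarily), a column vector and a row vector combine into a full 2-D grid with no loop:

```python
import numpy as np

xs = np.arange(3).reshape(3, 1)  # shape (3, 1), a column
ys = np.arange(4).reshape(1, 4)  # shape (1, 4), a row

# broadcasting stretches both operands to shape (3, 4) element-wise
grid = xs**2 + ys**2 + xs * ys
```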
First, you're missing some parentheses in zeros; the first argument should be a tuple:
a = zeros((len(x),len(y)))
Then, the corresponding indices for your table are i/2 and j/2:
for i in xrange(0,10,2):
    for j in xrange(0,10,2):
        # do function and fill the array corresponding to (i,j)
        a[i/2, j/2] = 1
But I second Saullo Castro, you should try to vectorize your computations.
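To sketch what that vectorization could look like here: the whole 5x5 table can be filled in one shot by evaluating the function on coordinate grids instead of looping (the function body below is only a placeholder):

```python
import numpy as np

x = np.arange(0, 10, 2)
y = np.arange(0, 10, 2)

# 'ij' indexing makes xx vary along rows and yy along columns,
# matching a[i/2, j/2] from the loop version
xx, yy = np.meshgrid(x, y, indexing='ij')
a = xx**2 + yy**2  # placeholder for the real function of (i, j)
```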
Say I have a list of vectors
VectorList = [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]
and say I have a vector x
x = [0,3,0,1]
then x can in this case be written as a linear combination of the other vectors:
[0,3,0,1] = a*[1,0,0,0] + b*[0,1,0,0] + c*[0,0,1,0] + d*[0,0,0,1]
where a = 0, b = 3, c = 0 and d = 1
Is there a general way in Python to check whether a vector can be written as a linear combination of a list of vectors, and to find the scalar multiples?
I am a beginner in Python, and while I can do it intuitively, I don't know how to solve this in code. I have worked on this problem for many hours.
Remember, the list of vectors might not always be as clear cut as this.
So you want to solve a linear system of equations? I would suggest having a look at numpy.matrix and numpy.linalg. Numpy has a lot of useful math stuff in it :).
EDIT:
It looks like the function you want is numpy.linalg.solve(a, b) (where b is x in your case), which will raise a LinAlgError if no solution can be found.
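One caveat: numpy.linalg.solve only handles square, non-singular systems, so for a general list of vectors numpy.linalg.lstsq is the safer choice; you can then check whether the least-squares solution actually reproduces x. A sketch along those lines, using the data from the question:

```python
import numpy as np

VectorList = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
x = [0, 3, 0, 1]

# columns of M are the candidate vectors, so M @ coeffs should equal x
M = np.array(VectorList, dtype=float).T
coeffs, residuals, rank, _ = np.linalg.lstsq(M, np.array(x, dtype=float), rcond=None)

# x is a linear combination of the vectors iff the reconstruction matches
is_combination = np.allclose(M @ coeffs, x)
```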
I would like to apply a function to a one-dimensional array, three elements at a time, and output a single element for each group.
for example I have an array of 13 elements:
a = np.arange(13)**2
and I want to apply a function, let's say np.std as an example.
Here is the equivalent list comprehension:
[np.std(a[i:i+3]) for i in range(0, len(a),3)]
[1.6996731711975948,
6.5489609014628334,
11.440668201153674,
16.336734339790461,
0.0]
does anyone know a more efficient way using numpy functions?
The simplest way is to reshape it and apply the function along an axis.
import numpy as np
a = np.arange(12)**2
b = a.reshape(4,3)
print(np.std(b, axis=1))
If you need a little better performance than that, you could try stride_tricks. Below is the same as above except using stride_tricks. I was wrong about the performance gain, because as you can see below, b becomes exactly the same view as b above. I wouldn't be surprised if they compiled to exactly the same thing.
import numpy as np
a = np.arange(12)**2
b = np.lib.stride_tricks.as_strided(a, shape=(4,3), strides=(a.itemsize*3, a.itemsize))
print(np.std(b, axis=1))
Are you talking about something like vectorize? http://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html
You can reshape it. But that does require that the size not change. If you can tack on some bogus entries at the end you can do this:
[np.std(s) for s in a.reshape(-1,3)]
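To make the "bogus entries" idea concrete: padding with NaN and using np.nanstd keeps the partial last group without skewing its statistic (a sketch, assuming the trailing group of fewer than 3 elements should still be included):

```python
import numpy as np

a = np.arange(13)**2
width = 3
pad = (-len(a)) % width  # entries needed to complete the last group

# pad with NaN so the reshape is legal, then take std ignoring the NaNs
padded = np.concatenate([a.astype(float), np.full(pad, np.nan)])
result = np.nanstd(padded.reshape(-1, width), axis=1)
```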