Coming from Matlab, I am unable to even think in terms of single data points or variables. Everything I deal with is a matrix or array. After a week of searching and unsuccessful trial and error, I realise that I ABSOLUTELY do NOT get the concept of dealing with matrices in (plain) Python.
I created
In[]: A = [[1,2,3], [9,8,7], [5,5,5]]
In[]: A
Out[]: [[1, 2, 3], [9, 8, 7], [5, 5, 5]]
Trying to extract the vectors in the matrix along the two dimensions:
In[]: A[:][1]
Out[]: [9, 8, 7]
In[]: A[1][:]
Out[]: [9, 8, 7]
'surprisingly' gives the same! No way to get a specific column (of course, except with one by one iteration).
Consequently, I am unable to manage merging matrix A with another vector, i.e. extending A with another column. Matlab style approach obviously is odd:
In[]: B = A, [4,6,8]
In[]: B
Out[]: ([[1, 2, 3], [9, 8, 7], [5, 5, 5]], [4, 6, 8])
Results in something nested, not an extension of A.
Same for
B = [A, [4,6,8]]
Ok, more Python-like:
A.append([11,12,13])
This easily adds a row. But is there a similar way to add a column??
(The frustrating thing is that Python doc gives all kinds of fancy examples but apparently these focus on demonstrating 'pythonic' solutions for one-dimensional lists.)
Coming from MATLAB myself, I understand your point.
The problem is that Python lists are not designed to serve as matrices. Indexing a list always operates on its top-level elements. A[:] returns a (shallow) copy of the whole outer list, i.e. all three elements [1, 2, 3], [9, 8, 7] and [5, 5, 5]; the following [1] then selects the second of those, [9, 8, 7]. A[1][:] does the same in the opposite order: it first selects the second element and then takes a full slice of it, which is again [9, 8, 7].
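For completeness, here is a minimal sketch of how a column can be read from, or appended to, a plain nested list without NumPy. The variable names (col, columns, new_col, B) are only illustrative; the idea is simply to touch every row once, either with a list comprehension or with zip.

```python
# A is the nested list from the question; all other names are illustrative.
A = [[1, 2, 3], [9, 8, 7], [5, 5, 5]]

# One column: take element 1 from every row.
col = [row[1] for row in A]

# All columns at once: zip(*A) transposes the nested list.
columns = [list(c) for c in zip(*A)]

# "Append a column": build a new nested list with one extra value per row.
new_col = [4, 6, 8]
B = [row + [value] for row, value in zip(A, new_col)]

print(col)      # [2, 8, 5]
print(columns)  # [[1, 9, 5], [2, 8, 5], [3, 7, 5]]
print(B)        # [[1, 2, 3, 4], [9, 8, 7, 6], [5, 5, 5, 8]]
```

This works, but every column operation is a loop over rows, which is the main reason NumPy is usually the more comfortable choice for matrix-style code.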
This being said, you can still use nested lists for simple indexing tasks, as A[1][1] gives the expected result (8). However, if you are planning to migrate your whole MATLAB code to Python or to work on non-trivial matrix problems, you should definitely consider using NumPy. There is even a NumPy guide for former MATLAB users.
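As a rough illustration of what the NumPy route looks like for the matrix from the question (a sketch, assuming NumPy is installed and imported as np): rows and columns are selected with a single pair of brackets, and np.column_stack returns a new array with the extra column.

```python
import numpy as np

A = np.array([[1, 2, 3], [9, 8, 7], [5, 5, 5]])

row = A[1, :]   # second row
col = A[:, 1]   # second column

# Returns a new array; A itself is left unchanged.
B = np.column_stack([A, [4, 6, 8]])

print(row)  # [9 8 7]
print(col)  # [2 8 5]
print(B)
```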
If you do e.g. the following:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]])
print(a[2:10])
Python won't complain and prints the array as in a[2:], which would be great in my use case. I want to loop through a large array and slice it into equally sized chunks until the array is "used up". The last chunk can thus be smaller than the rest, which doesn't matter to me.
However, I'm concerned about security leaks, performance leaks, the possibility of this behaviour being deprecated in the near future, etc. Is it safe and intended to use slicing like this, or should it be avoided, so that I have to go the extra mile to make sure the last chunk is sliced as a[2:] or a[2:len(a)]?
There are related Answers like this but I haven't found anything addressing my concerns
Slice resolution is not done in numpy. slice objects have a convenience method called indices, which is only documented in the C API under PySlice_GetIndices. In fact, the Python documentation states that they have no functionality besides storing indices.
When you run a[2:10], the slice object is slice(2, 10), and the length of the axis is a.shape[0] == 5:
>>> slice(2, 10).indices(5)
(2, 5, 1)
This is builtin python behavior, at a lower level than numpy. The linked question has an example of getting an error for the corresponding index:
>>> a[np.arange(2, 10)]
In this case, the passed object is not a slice, so it does get handled by numpy, and raises an error:
IndexError: index 5 is out of bounds for axis 0 with size 5
This is the same error that you would get if you tried accessing the invalid index individually:
>>> a[5]
...
IndexError: index 5 is out of bounds for axis 0 with size 5
Incidentally, python lists and tuples will check the bounds on a scalar index as well:
>>> a.tolist()[5]
...
IndexError: list index out of range
You can implement your own bounds checking, for example to create a fancy index using slice.indices:
>>> a[np.arange(*slice(2, 10).indices(a.shape[0]))]
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15]])
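Applied to the chunking use case from the question, the clamping behaviour of slices means a plain loop is enough. The following is a minimal sketch; iter_chunks is a hypothetical helper name, not something NumPy provides.

```python
import numpy as np

def iter_chunks(arr, size):
    """Yield consecutive slices of at most `size` rows (hypothetical helper)."""
    for start in range(0, len(arr), size):
        # The stop value may run past the end; the slice is clamped, no error.
        yield arr[start:start + size]

a = np.arange(1, 16).reshape(5, 3)
chunks = list(iter_chunks(a, 2))
print([c.shape for c in chunks])  # [(2, 3), (2, 3), (1, 3)]
```

The last chunk simply comes out shorter, which matches what the question asks for.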
I'm new to Python and I need a dynamic matrix that I can manipulate adding more columns and rows to it. I read about numpy.matrix, but I can't find a method in there that does what I mentioned above. It occurred to me to use lists but I want to know if there is a simpler way to do it or a better implementation.
Example of what I look for:
matrix.addrow ()
matrix.addcolumn ()
matrix.changeValue (0, 0, "$200")
Am I asking for too much? If so, any ideas of how to implement something like that? Thanks!
You can do all of that in numpy (np.concatenate for example) or native python (my_list.append()). Which one is more efficient will depend on what else your program does: numpy will probably be less efficient if all you are doing is adding / changing values one at a time, or doing a lot of column 'adding' or 'removing'. However, if you do matrix or column operations, the overhead of adding new columns to a numpy array may be offset by the vectorized computation speed offered by numpy. So pick whichever you prefer, and if speed is an issue, you need to experiment with both approaches yourself...
There are several ways to represent matrices in Python. You can use lists of lists or numpy arrays. For example, if you were to use numpy arrays:
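One common compromise between the two approaches, sketched here under the assumption that rows arrive one at a time, is to collect them in a plain list (cheap appends) and convert to a numpy array once, when the vectorized operations are actually needed. The names rows and m are illustrative.

```python
import numpy as np

rows = []
for i in range(4):
    rows.append([i, i * 2, i * 3])   # cheap list append while the data grows

m = np.array(rows)                   # convert once, shape (4, 3)
column_sums = m.sum(axis=0)          # vectorized operation on the finished matrix

print(m.shape)       # (4, 3)
print(column_sums)   # [ 6 12 18]
```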
>>> import numpy as np
>>> a = np.array([[1,2,3], [2,3,4]])
>>> a
array([[1, 2, 3],
[2, 3, 4]])
To add a row
>>> np.vstack([a, [7,8,9]])
array([[1, 2, 3],
[2, 3, 4],
[7, 8, 9]])
To add a column
>>> np.hstack((a, [[7],[8]]))
array([[1, 2, 3, 7],
[2, 3, 4, 8]])
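The remaining operation from the question, changing a single value, is plain index assignment. The sketch below also uses np.insert to place a column at a chosen position. Note that a numeric array cannot hold a string such as "$200", so the example stores the number 200 instead; that substitution is my assumption about what is wanted.

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 3, 4]])

# Change one value in place (row 0, column 0).
a[0, 0] = 200

# Insert a new column before column index 1; this returns a new array.
b = np.insert(a, 1, [9, 8], axis=1)

print(a)  # [[200   2   3]
          #  [  2   3   4]]
print(b)  # [[200   9   2   3]
          #  [  2   8   3   4]]
```

Like vstack and hstack, np.insert does not modify a; only the index assignment does.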
I would like to take a matrix and modify blocks of it. For example, with a 4x4 matrix the {1,2},{1,2} block is the top left quadrant ([0,1;4,5] below). The {4,1},{4,1} block is the top left quadrant if we rearrange the matrix so that the 4th row/column is in position 1 and the 1st in position 2.
Let's make such a 4x4 matrix:
a = np.arange(16).reshape(4, 4)
print(a)
## [[ 0 1 2 3]
## [ 4 5 6 7]
## [ 8 9 10 11]
## [12 13 14 15]]
Now one way of selecting the block, where I specify which rows/columns I want beforehand, is as follows:
C=[3,0]
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]
## array([[15, 12],
## [ 3, 0]])
Here's another way:
a[C,:][:,C]
## array([[15, 12],
## [ 3, 0]])
Yet, if I have a 2x2 array, call it b, setting
a[C,:][:,C]=b
doesn't work but
a[[[C[0],C[0]],[C[1],C[1]]],[[C[0],C[1]],[C[0],C[1]]]]=b
does.
Why is this? And is this second way the most efficient possible? Thanks!
The relevant section from the numpy docs is
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing
Advanced array indexing.
Adapting that example to your case:
In [213]: rows=np.array([[C[0],C[0]],[C[1],C[1]]])
In [214]: cols=np.array([[C[0],C[1]],[C[0],C[1]]])
In [215]: rows
array([[3, 3],
[0, 0]])
In [216]: cols
array([[3, 0],
[3, 0]])
In [217]: a[rows,cols]
array([[15, 12],
[ 3, 0]])
due to broadcasting, you don't need to repeat duplicate indices, thus:
a[[[3],[0]],[3,0]]
does just fine. np.ix_ is a convenience function to produce just such a pair:
np.ix_(C,C)
(array([[3],
[0]]),
array([[3, 0]]))
thus a short answer is:
a[np.ix_(C,C)]
A related function is meshgrid, which constructs full indexing arrays:
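a[tuple(np.meshgrid(C,C,indexing='ij'))]
np.meshgrid(C,C,indexing='ij') produces the same arrays as your rows and cols; the result is wrapped in tuple() so that it is treated as one index array per axis, since recent numpy versions no longer accept a list of arrays as a multi-axis index. See the function's doc for the significance of the 'ij' parameter.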
np.meshgrid(C,C,indexing='ij',sparse=True) produces the same pair of arrays as np.ix_.
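A quick way to convince yourself that these spellings select the same block is to compare them directly. This is only a sanity-check sketch; the variable names are illustrative.

```python
import numpy as np

a = np.arange(16).reshape(4, 4)
C = [3, 0]

block_ix = a[np.ix_(C, C)]                      # broadcastable index pair
rows, cols = np.meshgrid(C, C, indexing='ij')   # full index arrays
block_mesh = a[rows, cols]
block_broadcast = a[[[3], [0]], [3, 0]]         # hand-written broadcast form

print(block_ix)
# [[15 12]
#  [ 3  0]]
```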
I don't think there's a serious difference in computational speed. Obviously some require less typing on your part.
a[:,C][C,:] works for viewing values, but not for modifying them. The details have to do with which actions make views and which make copies. The simple answer is, use only one layer of indexing if you want to modify values.
The indexing documentation:
Thus, x[ind1,...,ind2,:] acts like x[ind1][...,ind2,:] under basic slicing.
Thus a[1][3] += 7 works. But the doc also warns
Warning
The above is not true for advanced indexing.
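The difference between the two assignments can be seen in a small experiment (a sketch; b is an arbitrary 2x2 array chosen for illustration). The chained form writes into a temporary copy produced by the first advanced index, while the single-index form writes into a itself.

```python
import numpy as np

C = [3, 0]
b = np.array([[100, 101], [102, 103]])

# Chained indexing: a1[C, :] is a copy, so the assignment is lost.
a1 = np.arange(16).reshape(4, 4)
a1[C, :][:, C] = b
unchanged = np.array_equal(a1, np.arange(16).reshape(4, 4))

# One layer of indexing: the assignment reaches the original array.
a2 = np.arange(16).reshape(4, 4)
a2[np.ix_(C, C)] = b

print(unchanged)  # True
print(a2)
# [[103   1   2 102]
#  [  4   5   6   7]
#  [  8   9  10  11]
#  [101  13  14 100]]
```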
What is the difference between vectorize and frompyfunc in numpy?
Both seem very similar. What is a typical use case for each of them?
Edit: As JoshAdel indicates, the class vectorize seems to be built upon frompyfunc. (see the source). It is still unclear to me whether frompyfunc may have any use case that is not covered by vectorize...
As JoshAdel points out, vectorize wraps frompyfunc. Vectorize adds extra features:
Copies the docstring from the original function
Allows you to exclude an argument from broadcasting rules.
Returns an array of the correct dtype instead of dtype=object
Edit: After some brief benchmarking, I find that vectorize is significantly slower (~50%) than frompyfunc for large arrays. If performance is critical in your application, benchmark your use-case first.
>>> import numpy
>>> a = numpy.indices((3,3)).sum(0)
>>> print(a, a.dtype)
[[0 1 2]
 [1 2 3]
 [2 3 4]] int32
>>> def f(x,y):
...     """Returns 2 times x plus y"""
...     return 2*x+y
...
>>> f_vectorize = numpy.vectorize(f)
>>> f_frompyfunc = numpy.frompyfunc(f, 2, 1)
>>> f_vectorize.__doc__
'Returns 2 times x plus y'
>>> f_frompyfunc.__doc__
'f (vectorized)(x1, x2[, out])\n\ndynamic ufunc based on a python function'
>>> f_vectorize(a,2)
array([[ 2,  4,  6],
       [ 4,  6,  8],
       [ 6,  8, 10]])
>>> f_frompyfunc(a,2)
array([[2, 4, 6],
       [4, 6, 8],
       [6, 8, 10]], dtype=object)
I'm not sure what the different use cases for each are, but if you look at the source code (/numpy/lib/function_base.py), you'll see that vectorize wraps frompyfunc. My reading of the code is mostly that vectorize is doing proper handling of the input arguments. There might be particular instances where you would prefer one over the other, but it would seem that frompyfunc is just a lower-level version of vectorize.
Although both methods give you a way to build your own ufunc, numpy.frompyfunc always returns an array of Python objects, while numpy.vectorize lets you specify a return type.
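To make the return-type point concrete, here is a small sketch using the same f as above. With frompyfunc the result has dtype=object and has to be cast afterwards; with vectorize the output type can be requested up front through its otypes argument.

```python
import numpy as np

def f(x, y):
    """Returns 2 times x plus y"""
    return 2 * x + y

a = np.indices((3, 3)).sum(0)

# frompyfunc: object array, cast explicitly if a numeric dtype is needed.
obj_result = np.frompyfunc(f, 2, 1)(a, 2)
int_result = obj_result.astype(int)

# vectorize: ask for the output dtype directly.
float_result = np.vectorize(f, otypes=[float])(a, 2)

print(obj_result.dtype)    # object
print(float_result.dtype)  # float64
```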
Good day to all.
Please help me understand the theory behind the function scipy.ndimage.convolve for 1D arrays. I know the formula from http://lagrange.univ-lyon1.fr/docs/scipy/0.17.1/generated/scipy.ndimage.convolve.html
C_i = \sum_j{I_{i+j-k} W_j},
but I can't understand how to get the results manually.
For example: test_1 = scipy.ndimage.convolve([1, 2, 3], [1, 2, 3, 4, 5])
result is [24 24 30]
Or test_2 = scipy.ndimage.convolve([1, 2, 3], [3, 4, 5])
result is [15 22 31]
If I wrote down all the attempts I have made, it would take a lot of space.
Please give me step-by-step instructions on how to work through these examples manually.
Two tricky things are going on here:
1) ndimage.convolve has an argument called "mode", which is set to "reflect" by default
2) convolution internally reverses one of the inputs
Try comparing the output of
scipy.ndimage.convolve([1, 2, 3][::-1], [1, 2, 3, 4, 5], mode='constant')
with your by-hand solution (drop the "[::-1]" if you have already accounted for the reversal).
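The two numbers from the question can also be reproduced by hand with plain numpy, which makes both effects visible. This is a sketch under two assumptions: the weights have odd length, and numpy's 'symmetric' padding is used as the equivalent of ndimage's 'reflect' mode (edge values are repeated: 2 1 | 1 2 3 | 3 2). convolve_reflect_1d is a hypothetical helper name.

```python
import numpy as np

def convolve_reflect_1d(signal, weights):
    """Manual 1D convolution with reflect-style padding (odd-length weights)."""
    signal = np.asarray(signal)
    weights = np.asarray(weights)
    k = len(weights) // 2
    # Step 1: extend the input by k values on each side, mirroring at the edges.
    padded = np.pad(signal, k, mode='symmetric')
    # Step 2: np.convolve flips the weights and slides them over the padded input.
    return np.convolve(padded, weights, mode='valid')

print(convolve_reflect_1d([1, 2, 3], [1, 2, 3, 4, 5]))  # [24 24 30]
print(convolve_reflect_1d([1, 2, 3], [3, 4, 5]))        # [15 22 31]
```

For the first example the padded input is [2, 1, 1, 2, 3, 3, 2] and the flipped weights are [5, 4, 3, 2, 1], so the first output value is 2*5 + 1*4 + 1*3 + 2*2 + 3*1 = 24.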