In Matlab, a matrix's buffer is contiguous along columns. What about numpy arrays in Python? Which one is better: numpy.empty((n,1)) or numpy.empty((1,n))?
In numpy you can choose between Fortran-contiguous (along the column, like in Matlab) and C-contiguous (along the row, the numpy default) order by passing the order argument when you create an array, so you have more flexibility.
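For example, a minimal sketch showing how the order argument controls the layout (the (3, 2) shape is just an illustration):

import numpy as np

# C-contiguous (row-major) is the numpy default.
c = np.empty((3, 2), order='C')
print(c.flags['C_CONTIGUOUS'])  # True

# Fortran-contiguous (column-major), like Matlab.
f = np.empty((3, 2), order='F')
print(f.flags['F_CONTIGUOUS'])  # True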
As @user2357112 already said, for a 1xN or Nx1 array it does not matter, but for an MxN array it does, and you should be aware of that.
They do different things. One makes an Nx1 array; the other makes a 1xN array. Neither is "better". (In fact, the memory layout will be identical for both arrays, even if you specify column-major storage.)
To answer the question about storage layout, though, numpy defaults to row-major layout, a.k.a. C-contiguous. You can see this clearly reflected in the docs.
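You can check this yourself with the flags attribute; a quick sketch for the two shapes from the question:

import numpy as np

a = np.empty((5, 1))
b = np.empty((1, 5))

# Both are C-contiguous by default, and for a single row or column
# the underlying buffer is the same: five consecutive float64 values.
print(a.flags['C_CONTIGUOUS'], b.flags['C_CONTIGUOUS'])  # True True
print(a.nbytes == b.nbytes)  # True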
Related
I'm currently working on a helper class to transfer data from a Java ND-Array to a Python numpy nd-array. The Java array uses ND4J and I'm able to ascertain shape, stride, and row/column ordering from the ND4J INDArray.
Py4j allows me to natively transmit a bytearray back from the JVM. However, I'm not too familiar with numpy and I don't quite know whether it has preference for row or column ordering and how I can provide shape information if I give it a bytearray representing a 1D array of data.
The closest question I could find was this: Quickest way to convert 1D byte array to 2D numpy array
However, it doesn't tell me much about providing explicit shape information - it only applies to RGB image data.
So my question is, how can I do something like np.array(bytearray, shape) and how can I know numpy's preferred ordering so I can prepare the incoming data?
Edit
Half-answered my question. Looks like numpy does indeed allow for specific ordering via an extra parameter on many of its array creation methods: https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html
Edit 2
Learning more, I need to make sure that the bytearray (converted from byte[]) is the right datatype. It's almost always going to be double, so should I pass a float type or a numpy.float64?
What you can do is
np.frombuffer(bytearray, dtype=np.float64).reshape(shape)
where np.frombuffer() interprets the raw bytes as a 1D array of the given dtype (np.array(bytearray) would instead give you one uint8 element per byte), which you then reshape to the shape you want. Note that reshaping does not change the order in memory, only how your data is viewed.
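A small round-trip sketch (the 2x3 shape and little-endian doubles are assumptions for illustration; pick the byte order that matches what the JVM sends):

import struct
import numpy as np

# Simulate a bytearray arriving from the JVM: six little-endian doubles.
raw = bytearray(struct.pack('<6d', 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))

# Interpret the raw bytes as float64 and reshape; '<f8' pins the byte order.
a = np.frombuffer(raw, dtype='<f8').reshape(2, 3)
print(a)
# [[1. 2. 3.]
#  [4. 5. 6.]]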
When iterating linearly through a default C-style NumPy array, the last dimension of your array varies the fastest. This means that
a[0,0,0]
a[0,0,1]
are next to each other in memory, while
a[0,0,0]
a[0,1,0]
are not. Knowing this you should be able to figure out the shape argument.
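You can verify this with the strides attribute (a sketch with an arbitrary 2x3x4 float64 array):

import numpy as np

a = np.zeros((2, 3, 4))

# For C order, the strides (in bytes) shrink from the first axis to the last:
# stepping along the last axis moves just one 8-byte float.
print(a.strides)  # (96, 32, 8)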
Thirdly, dtype=float and dtype=np.float64 are interchangeable, which you can confirm by comparing
print(np.arange(1, dtype=float).dtype)
print(np.arange(1, dtype=np.float64).dtype)
I'm trying to understand the differences between what people call matrices and what people call lists within lists.
Are they the same in that, once created, you can do identical things to them (reference elements the same way within them, etc.)?
Examples:
Making lists within a list:
ListsInLists = [[1,2],[3,4],[5,6]]
Making a multidimensional array:
np.random.rand(3,2)
Stacking arrays to make a matrix:
Array1 = [1,2,3,4]
Array2 = [5,6,7,8]
CompleteArray = np.vstack((Array1, Array2))
A list of lists is very different from a two-dimensional Numpy array.
A list has dynamic size and can hold any type of object, whereas an array has a fixed size and entries of uniform type.
In a list of lists, each sublist can have different sizes. An array has fixed dimensions along each axis.
An array is stored in a contiguous block of memory, whereas the objects in a list can be stored anywhere on the heap.
Numpy arrays are more restrictive, but offer greater performance and memory efficiency. They also provide convenient functions for vectorised mathematical operations.
Internally, a list is represented as an array of pointers that point to arbitrary Python objects. The array uses exponential over-allocation to achieve linear performance when appending repeatedly at the end of the list. A Numpy array on the other hand is typically represented as a C array of numbers.
(This answer does not cover the special case of Numpy object arrays, which can hold any kind of Python object as well. They are rarely used, since they have the restrictions of Numpy arrays, but don't have the performance advantages.)
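To make the contrast concrete, a small sketch:

import numpy as np

# A list of lists can be ragged and hold mixed types...
lol = [[1, 2], [3, 4, 'five']]
lol[0].append(99)  # sublists grow independently

# ...while an array has a fixed shape and a single dtype,
# and supports vectorised operations on all elements at once.
a = np.array([[1, 2], [3, 4]])
print(a * 10)         # elementwise multiply
print(a.sum(axis=0))  # column sums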
They are not the same. Arrays are more memory efficient in Python than lists, and there are additional functions you can perform on arrays, thanks to the numpy module, that you cannot perform on lists.
For calculations, working with arrays in numpy tends to be a lot faster than using built-in list functions.
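As a rough illustration, a timing sketch (absolute numbers depend on your machine):

import timeit

setup = "import numpy as np; xs = list(range(100000)); a = np.arange(100000)"

# Summing a Python list vs. summing a NumPy array.
print(timeit.timeit("sum(xs)", setup=setup, number=100))
print(timeit.timeit("a.sum()", setup=setup, number=100))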
You can read a bit more into it if you want in the answers to this question.
I often use theano.tensor.dimshuffle. Is there an equivalent function for Numpy?
I guess I could do the same with several calls to numpy.swapaxes, numpy.newaxis (for broadcast dimensions), and numpy.reshape, but is there a simpler or more direct way, just like dimshuffle?
The function numpy.transpose permits any permutation of the axes of an array.
The shorthand array.T is a special case of this, corresponding to array.transpose() without arguments, which defaults to array.transpose(range(array.ndim)[::-1]).
numpy.swapaxes is numpy.transpose restricted to permutations of two axes.
theano.tensor.dimshuffle essentially corresponds to numpy.transpose, but in addition, it permits the creation of new axes of length 1 for broadcasting, by adding 'x' wherever an axis should be created. In numpy, this can be achieved using a combination of transpose and reshape.
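For example, a sketch of one possible translation (the dimshuffle pattern (1, 'x', 0) is just an illustration):

import numpy as np

x = np.arange(6).reshape(2, 3)

# theano: x.dimshuffle(1, 'x', 0) -- permute the axes and insert a broadcast axis.
# numpy equivalent: transpose first, then add a length-1 axis with np.newaxis.
y = x.transpose(1, 0)[:, np.newaxis, :]
print(y.shape)  # (3, 1, 2)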
Note that in numpy, care is taken to make transpose return a view on the data whenever possible. In theano this is probably the case, too, but may depend on how the code is optimized.
I am building a class with some iteration over incoming data. The data are in array form, without the use of numpy objects. In my code I often use .append to build up another array. At some point I changed one of the big 1000x2000 arrays to a numpy.array. Now I get error after error. I started to convert all of the arrays into ndarrays, but methods like .append no longer work. I am starting to have problems with indexing rows, columns or cells, and have to rebuild all my code.
I tried to google an answer to the question "what is the advantage of using ndarray over a normal array", but I can't find a sensible one. Can you explain when I should start to use ndarrays, and whether in your practice you use both of them or stick to one only?
Sorry if the question is at a novice level, but I am new to Python, just trying to move from Matlab, and want to understand the pros and cons. Thanks.
NumPy and Python arrays share the property of being efficiently stored in memory.
NumPy arrays can be added together, multiplied by a number, you can calculate, say, the sine of all their values in one function call, etc. As HYRY pointed out, they can also have more than one dimension. You cannot do this with Python arrays.
On the other hand, Python arrays can indeed be appended to. Note that NumPy arrays can however be concatenated together (hstack(), vstack(),…). That said, NumPy arrays are mostly meant to have a fixed number of elements.
It is common to first build a list (or a Python array) of values iteratively and then convert it to a NumPy array (with numpy.array(), or, more efficiently, with numpy.frombuffer(), as HYRY mentioned): this allows mathematical operations on arrays (or matrices) to be performed very conveniently (simple syntax for complex operations). Alternatively, numpy.fromiter() might be used to construct the array from an iterator. Or loadtxt() to construct it from a text file.
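A sketch of the build-then-convert pattern:

import numpy as np

# Collect values iteratively in a plain list...
values = []
for i in range(5):
    values.append(i ** 2)

# ...then convert once for fast vectorised math.
a = np.array(values, dtype=float)
print(a.mean())

# Or build directly from a generator with a known dtype.
b = np.fromiter((i ** 2 for i in range(5)), dtype=float)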
There are at least two main reasons for using NumPy arrays:
NumPy arrays require less space than Python lists. So you can deal with more data in a NumPy array (in-memory) than you can with Python lists.
NumPy arrays have a vast library of functions and methods unavailable to Python lists or Python arrays (a brief demonstration follows below).
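A minimal sketch of both points (exact sizes vary by platform):

import sys
import numpy as np

xs = list(range(1000))
a = np.arange(1000)

# 1. Memory: the array stores raw integers (8 bytes each on most 64-bit
#    platforms), while the list stores pointers to full Python int objects
#    (and getsizeof does not even count the pointed-to objects).
print(a.nbytes)           # 8000
print(sys.getsizeof(xs))  # list overhead + 1000 pointers

# 2. Methods: vectorised operations with no explicit Python loop.
print(np.sin(a[:5]))
print(a.mean(), a.std())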
You cannot simply convert lists to NumPy arrays and expect your code to continue to work. The methods are different, and so are the bool semantics. For the best performance, even the algorithm may need to change.
However, if you are looking for a Python replacement for Matlab, you will definitely find uses for NumPy. It is worth learning.
array.array can change size dynamically, so if you are collecting data from some source, it's better to use array.array. But array.array is only one-dimensional, and there are no calculation functions for it. So, when you want to do some calculation on your data, convert it to numpy.ndarray and use the functions in numpy.
numpy.frombuffer can create a numpy.ndarray that shares the same data buffer as an array.array object; it's fast because it doesn't need to copy the data.
Here is a demo:
import numpy as np
import array
import random

a = array.array("d")
# a loop that collects 10 channels of data, 100 samples each
for x in range(100):
    a.extend([random.random() for _ in range(10)])

# create an ndarray that shares the same data buffer as array a, and reshape it to 2D
na = np.frombuffer(a, dtype=float).reshape(-1, 10)

# call a numpy function to do the calculation
np.sum(na, axis=0)
Another great advantage of using NumPy arrays over built-in lists is the fact that NumPy has a C API that allows native C and C++ code to access NumPy arrays directly. Hence, many Python libraries written in low-level languages like C are expecting you to work with NumPy arrays instead of Python lists.
Reference: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
I am using quite a lot of Fortran libraries to do some mathematical computation, so all the arrays in numpy need to be Fortran-contiguous.
Currently I accomplish this with numpy.asfortranarray().
My questions are:
Is this a fast way of telling numpy that the array should be stored in Fortran style, or is there a faster one?
Is there a possibility to set some numpy flag, so that every array that is created is in Fortran style?
Use the optional argument order='F' (the default is 'C') when generating numpy.array objects. This is the way I do it, and it probably does the same thing you are doing. About number 2, I am not aware of a way to set a default order, but it's easy enough to just include the order optional argument when generating arrays.
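A sketch of both approaches, creating a Fortran-ordered array directly versus converting afterwards:

import numpy as np

# Create Fortran-ordered from the start.
a = np.zeros((3, 4), order='F')
print(a.flags['F_CONTIGUOUS'])  # True

# Convert an existing C-ordered array (asfortranarray copies only if needed).
c = np.arange(12).reshape(3, 4)
f = np.asfortranarray(c)
print(f.flags['F_CONTIGUOUS'])  # True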
Regarding question 2: you may be concerned about retaining Fortran ordering after performing array transformations and operations. I had a similar issue with endianness. I loaded a big-endian raw array from file, but when I applied a log transformation, the resultant array would be little-endian. I got around the problem by first allocating a second big-endian array, then writing the result of the log directly into it:
b = np.zeros(a.shape, dtype=a.dtype)
np.log10(1 + 100*a, out=b)
In your case you would allocate b with Fortran ordering.
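In sketch form (a here stands in for your input array):

import numpy as np

a = np.random.rand(3, 4)  # stand-in for your input array

# Allocate the output with Fortran ordering; the ufunc writes into it,
# so the result keeps the layout you chose.
b = np.zeros(a.shape, dtype=a.dtype, order='F')
np.log10(1 + 100*a, out=b)
print(b.flags['F_CONTIGUOUS'])  # True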