I am new to cython and I am just looking for an easy way of casting a numpy array to a tuple that can then be added to and/or looked up in a dictionary.
In CPython, I can use PyTuple_New and iterate over the values of the array (adding each one to the tuple as though I were appending them to a list).
Cython does not seem to come with the usual CPython functions. How might I turn an array:
array([1,2,3])
into a tuple:
(1, 2, 3)
Cython is a superset of Python, so any valid Python code is valid Cython code. In this case, if you have a NumPy array, just passing it to the tuple constructor works fine (just as you would do in regular Python).
a = np.array([1, 2, 3])
t = tuple(a)
Cython will take care of converting these constructs to appropriate C function calls.
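For the dictionary use case from the question, the resulting tuple is hashable and works directly as a key (a minimal sketch):
import numpy as np

counts = {}
a = np.array([1, 2, 3])

key = tuple(a)                 # (1, 2, 3) -- hashable, unlike the array itself
counts[key] = counts.get(key, 0) + 1

# looking it up again from another array with the same values works too
print(counts[tuple(np.array([1, 2, 3]))])   # 1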
I am trying to convert a list of objects (GeoJSON) to shapely objects using Cython, but I am running into an error:
This piece of code seems to be the issue: cdef object result[N]. How do I declare a list/array from a given list?
Here is my current code:
def convert_geoms(list array):
    cdef int i, N = len(array)
    cdef double x, s = 0.0
    cdef object result[N]  # ERROR HERE
    for i in range(N):
        g = build_geometry_objects2(array[i])
        result[i] = g
    return result
There are two issues with cdef object result[N]:
It creates a C array of Python objects. This doesn't really work because C arrays aren't easily integrated with Python object reference counting (and in this case you'd need to copy the whole array into something else when you return it anyway, since it's a local variable scoped to the function).
For C arrays of the form sometype result[N], N must be known at compile-time. In this case N is different for each function call, so the variable definition is invalid anyway.
There are multiple solutions. Most of them involve accepting that you're using Python objects, so not worrying about specifying the types and just writing valid Python code. I'd probably write it as a list comprehension; I suspect Cython will do surprisingly well at producing optimized code for that:
return [ build_geometry_objects2(array[i]) for i in range(len(array)) ]
# or
return [ build_geometry_objects2(a) for a in array ]
The second version is probably better, but if it matters you can time it.
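If you do want to compare them, here is a rough pure-Python timing sketch with timeit; the build_geometry_objects2 below is only a trivial stand-in for the real conversion function, and in practice you would time the compiled Cython versions:
import timeit

def build_geometry_objects2(x):   # stand-in for the real conversion function
    return (x, x)

array = list(range(10_000))

by_index = timeit.timeit(
    "[build_geometry_objects2(array[i]) for i in range(len(array))]",
    globals=globals(), number=100)
by_item = timeit.timeit(
    "[build_geometry_objects2(a) for a in array]",
    globals=globals(), number=100)

print(f"indexing: {by_index:.3f}s  iteration: {by_item:.3f}s")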
If the performance really matters you can use Python C API calls which you can cimport from cpython.list. See Cythonize list of all splits of a string for an example of something similar where list creation is optimized this way. The advantage of PyList_New is that it creates an appropriately sized list at the start filled with NULL pointers, which you can then fill in.
I'm confused about how numpy methods are applied to ndarrays. For example:
import numpy as np
a = np.array([[1,2,2],[5,2,3]])
b = a.transpose()
a.sort()
Here the transpose() method does not change a but returns the transposed version of a, while the sort() method sorts a in place and returns None. Does anybody have an idea why this is, and what the purpose of this different behaviour is?
Because the numpy authors decided that some methods operate in place and some don't. Why? I don't know if anyone but them can answer that question.
'in-place' operations have the potential to be faster, especially when dealing with large arrays, as there is no need to re-allocate and copy the entire array, see answers to this question
BTW, most if not all arr methods have a function counterpart that returns a new array. For example, arr.sort has the counterpart numpy.sort(arr), which accepts an array and returns a new, sorted one (much like the relation between the built-in sorted function and list.sort()).
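For example, the two sort variants side by side:
import numpy as np

a = np.array([3, 1, 2])

b = np.sort(a)     # returns a new sorted array; a is untouched
print(b, a)        # [1 2 3] [3 1 2]

ret = a.sort()     # sorts a in place and returns None
print(ret, a)      # None [1 2 3]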
In a Python class (OOP) methods which operate in place (modify self or its attributes) are acceptable, and if anything, more common than ones that return a new object. That's also true for built in classes like dict or list.
For example, in numpy we often recommend the list append approach to building a new array:
In [296]: alist = []
In [297]: for i in range(3):
...: alist.append(i)
...:
In [298]: alist
Out[298]: [0, 1, 2]
This is common enough that we can readily write it as a list comprehension:
In [299]: [i for i in range(3)]
Out[299]: [0, 1, 2]
alist.sort operates in-place, sorted(alist) returns a new list.
In numpy, methods that return a new array are much more common. In fact, sort is about the only in-place method I can think of offhand; that, and direct modification of shape: arr.shape = (...).
A number of basic numpy operations return a view. That shares data memory with the source, but the array object wrapper is new. In fact even indexing an element returns a new object.
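A small illustration of a view sharing memory with its source:
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
v = a.T                          # transpose returns a view: a new array object over the same data
print(np.shares_memory(a, v))    # True

v[0, 0] = 99                     # writing through the view changes the original
print(a[0, 0])                   # 99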
So while you ultimately need to check the documentation, it's usually safe to assume a numpy function or method returns a new object, as opposed to operating in-place.
More often users are confused by the numpy functions that have the same name as a method. In most of those cases the function makes sure the argument is an array, and then delegates the action to the method. Also keep in mind that in Python operators are translated into method calls: + to __add__, [index] to __getitem__(), etc.; += is a kind of in-place operation.
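For example, the augmented assignment += modifies the existing array's buffer, while plain + creates a new array (a small sketch):
import numpy as np

a = np.arange(3)
orig = id(a)
a = a + 1                  # __add__: allocates a new array and rebinds the name
print(id(a) == orig)       # False

b = np.arange(3)
orig = id(b)
b += 1                     # __iadd__: modifies b's data buffer in place
print(id(b) == orig)       # True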
I'm learning numpy; however, I don't understand why, for example:
import numpy as np
ints = np.array([3,3,3,2,2,1,1,4,4])
ints.unique() # this won't work
np.unique(ints) # this works
however, some functions work both ways:
ints.sum()
np.sum(ints)
I was also reading the numpy documentation; what's the difference between attributes and methods? Attributes return something, just as methods do.
unique, unlike sum, is only a free function and not a class method (an instance method, to be precise). The difference between the two is:
obj.foo() # instance method, obj is implicitly passed to foo()
foo(obj) # free function, obj is explicitly passed to foo()
Have a look here for some explanation of the different variants of methods. In NumPy this is mainly a design decision, I believe; however, there are reasons for some functions to be free functions. One reason that comes to mind is that, unlike in other technical languages (such as MATLAB), numpy arrays can be regular (rectangular) or ragged and can be flexible in the types of objects they contain, for example:
a = np.array([[1,2],[3,4]])          # regular 2-D array
b = np.array([[1,2],[3,4,5]])        # ragged array (recent NumPy versions require dtype=object here)
c = np.array([[1,2],["abc",True]])   # array mixing element types
In such scenarios, having to make every function an instance method would lead to confusing behaviour. Even the sum method behaves differently on regular and ragged arrays:
In [18]: a.sum() # sums all elements of the array
Out[18]: 10
In [19]: b.sum() # concatenates all elements of the array
Out[19]: [1, 2, 3, 4, 5]
In contrast, some functions like unique have a much narrower scope. For example, unique only works on arrays/buffers of uniform data type and operates on the flattened (1-D) version of the array.
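A quick illustration of the flattening behaviour (a minimal sketch):
import numpy as np

m = np.array([[3, 1, 2],
              [3, 2, 4]])
print(np.unique(m))   # operates on the flattened array: [1 2 3 4]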
attributes of numpy arrays typically tell you about the underlying data type, shape, dimensionality, memory layout/strides and data ownership of the array, for instance:
In [20]: a=np.random.rand(3,4)
In [21]: a.flags
Out[21]:
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [22]: a.shape
Out[22]: (3, 4)
In [23]: a.dtype
Out[23]: dtype('float64')
These are all attributes and not array methods per se; in other words, they are properties.
np.sum is a function that takes an array, or anything that can be turned into an array, and applies its sum method. See np.source(np.sum) for details.
arr.sum is a method of the arr array. For an ndarray it is compiled code. A subclassed array may have a different sum method.
In most cases where there are like-named functions and methods, a relationship like this holds.
Look at the source for np.unique to see a different design. One difference that comes to mind is that unique only works with 1d arrays, or with a flattened array. It's not as general-purpose a method as sum or mean.
Some of these differences follow a pattern, or are explained, others are probably more the result of a development history. Often it is easier to add new functionality by writing a 'stand-alone' function, rather than adding a method to an existing class. The method is more closely integrated with the class.
To get into more details you'll have to spend time reading the development archives. For roughly the last 5 years, much of that can be found by searching the respective GitHub repository and its issues.
I am trying to rewrite some Fortran code as a Python script.
The original Fortran code declares real arrays as:
real a(n),b(n),D(64)
How can I convert this D(64) into Python code?
a(n) and b(n) hold values from the data I use, but D(64) does not.
I need to pass this into the Fortran submodule that I wrapped with f2py.
That submodule code looks like the following; M is just an integer that will be defined in the main code.
subroutine multires (a,b,M, D)
  implicit none
  real a(*),b(*),D(*)
  ...
  if(nw.gt.1) D(ms+1)=(sumab/nw)
If your code needs good performance, stay away from Python lists; use numpy arrays instead.
import numpy as np
D = np.zeros(64, dtype=np.float32)
This constructs a numpy ndarray of 64 32-bit reals initialized to 0. Using this kind of array rather than a list can greatly improve the performance of your Python code. It also gives you finer control over typing for interoperability, and you incur less overhead, especially if you get into Cython.
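You can then pass that preallocated D straight to the f2py wrapper. A rough sketch, where the module name multires_mod, the dummy sizes, and the exact call are assumptions based on the snippet above:
import numpy as np
# import multires_mod              # hypothetical name of the f2py-built extension module

n, M = 128, 4                      # sizes chosen only for illustration
a = np.zeros(n, dtype=np.float32)  # corresponds to real a(n)
b = np.zeros(n, dtype=np.float32)  # corresponds to real b(n)
D = np.zeros(64, dtype=np.float32) # corresponds to real D(64), filled by the subroutine

# multires_mod.multires(a, b, M, D)  # D is modified in place by the Fortran code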
Python is dynamically typed, so you don't have to declare variables with their type. A Fortran array can be translated into a Python list, but since you shouldn't add elements to it dynamically, you should initialize it to its full size. So the Fortran real D(64) could become, in Python:
D = [ 0. for i in range(64) ]
(declares D to be a list and initializes it with 64 0. values)
Of course, if you can use numpy and not just plain Python, you could use the numpy.ndarray type. Among other qualities (efficiency, type control, ...), it can be created with F (Fortran) order, meaning the first index varies fastest, as Fortran programmers are used to.
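A minimal sketch of creating such arrays (for a 1-D array like D the memory order makes no difference; it matters for multi-dimensional arrays):
import numpy as np

D = np.zeros(64, dtype=np.float32)                  # 1-D: same layout either way
A = np.zeros((3, 4), dtype=np.float32, order='F')   # column-major, as Fortran expects
print(A.flags['F_CONTIGUOUS'])                      # True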
I'm quite new at Python.
After using Matlab for many years, I recently started studying numpy/scipy.
It seems like the most basic element of numpy is the ndarray.
In ndarray, there are following attributes:
ndarray.ndim
ndarray.shape
ndarray.size
...etc
I'm quite familiar with C++/JAVA classes, but I'm a novice at Python OOP.
Q1: My first question is what is the identity of the above attributes?
At first, I assumed that the above attributes might be public member variables. But soon, I found that a.ndim = 10 doesn't work (assuming a is an ndarray object), so it seems they are not public member variables.
Next, I guessed that they might be public methods, similar to getter methods in C++. However, when I tried a.ndim() with parentheses, it doesn't work. So, it seems they are not public methods either.
The other possibility might be that they are private member variables, but print a.ndim works, so they cannot be private data members.
So, I cannot figure out what is the true identity of the above attributes.
Q2: Where can I find the Python implementation of ndarray? Since I installed numpy/scipy on my local PC, I guess there must be some way to look at the source code; then I think everything would become clear.
Could you give some advice on this?
numpy is implemented as a mix of C code and Python code. The source is available for browsing on GitHub, and can be downloaded as a git repository. But digging your way into the C source takes some work. A lot of the files are marked as .c.src, which means they pass through one or more layers of preprocessing before compiling.
And Python is written in a mix of C and Python as well. So don't try to force things into C++ terms.
It's probably better to draw on your MATLAB experience, with adjustments to allow for Python. And numpy has a number of quirks that go beyond Python. It is using Python syntax, but because it has its own C code, it isn't simply a Python class.
I use IPython as my usual working environment. With that I can use foo? to see the documentation for foo (same as the Python help(foo)), and foo?? to see the code, if it is written in Python (like the MATLAB/Octave type(foo)).
Python objects have attributes, and methods. Also properties which look like attributes, but actually use methods to get/set. Usually you don't need to be aware of the difference between attributes and properties.
x.ndim # as noted, has a get, but no set; see also np.ndim(x)
x.shape # has a get, but can also be set; see also np.shape(x)
x.<tab> in IPython shows me all the completions for an ndarray; there are 4*18 of them. Some are methods, some attributes. x._<tab> shows a bunch more that start with __. These are 'private', not meant for public consumption, but that's just a convention; you can look at them and use them if needed.
Off hand x.shape is the only ndarray property that I set, and even with that I usually use reshape(...) instead. Read their docs to see the difference. ndim is the number of dimensions, and it doesn't make sense to change that directly. It is len(x.shape); change the shape to change ndim. Likewise x.size shouldn't be something you change directly.
Some of these properties are accessible via functions. np.shape(x) == x.shape, similar to MATLAB size(x). (MATLAB doesn't have . attribute syntax).
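For example, setting shape in place versus calling reshape (a small sketch):
import numpy as np

x = np.arange(12)
x.shape = (3, 4)            # reshapes x in place (no copy; raises if a copy would be needed)
print(x.ndim)               # 2 -- ndim simply follows from the shape
print(np.shape(x))          # (3, 4), same as x.shape

y = x.reshape(2, 6)         # returns a new view with the requested shape
print(y.shape, x.shape)     # (2, 6) (3, 4)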
x.__array_interface__ is a handy property that gives a dictionary with a number of these attributes:
In [391]: x.__array_interface__
Out[391]:
{'descr': [('', '<f8')],
'version': 3,
'shape': (50,),
'typestr': '<f8',
'strides': None,
'data': (165646680, False)}
The docstring for ndarray(shape, dtype=float, buffer=None, offset=0, strides=None, order=None) (the __new__ method) lists these attributes:
Attributes
----------
T : ndarray
Transpose of the array.
data : buffer
The array's elements, in memory.
dtype : dtype object
Describes the format of the elements in the array.
flags : dict
Dictionary containing information related to memory use, e.g.,
'C_CONTIGUOUS', 'OWNDATA', 'WRITEABLE', etc.
flat : numpy.flatiter object
Flattened version of the array as an iterator. The iterator
allows assignments, e.g., ``x.flat = 3`` (See `ndarray.flat` for
assignment examples; TODO).
imag : ndarray
Imaginary part of the array.
real : ndarray
Real part of the array.
size : int
Number of elements in the array.
itemsize : int
The memory use of each array element in bytes.
nbytes : int
The total number of bytes required to store the array data,
i.e., ``itemsize * size``.
ndim : int
The array's number of dimensions.
shape : tuple of ints
Shape of the array.
strides : tuple of ints
The step-size required to move from one element to the next in
memory. For example, a contiguous ``(3, 4)`` array of type
``int16`` in C-order has strides ``(8, 2)``. This implies that
to move from element to element in memory requires jumps of 2 bytes.
To move from row-to-row, one needs to jump 8 bytes at a time
(``2 * 4``).
ctypes : ctypes object
Class containing properties of the array needed for interaction
with ctypes.
base : ndarray
If the array is a view into another array, that array is its `base`
(unless that array is also a view). The `base` array is where the
array data is actually stored.
All of these should be treated as properties, though I don't think numpy actually uses the property mechanism. In general they should be considered to be 'read-only'. Besides shape, I only recall changing data (pointer to a data buffer), and strides.
Regarding your first question, Python has syntactic sugar for properties, including fine-grained control of getting, setting, deleting them, as well as restricting any of the above.
So, for example, if you have
class Foo(object):
    @property
    def shmip(self):
        return 3
then you can write Foo().shmip to obtain 3, but, if that is the class definition, you've disabled setting Foo().shmip = 4.
In other words, those are read-only properties.
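To see the read-only behaviour in action, mirroring the a.ndim = 10 attempt from the question (a small sketch; the exact error messages vary by Python and NumPy version):
import numpy as np

class Foo(object):
    @property
    def shmip(self):
        return 3

f = Foo()
print(f.shmip)        # 3
try:
    f.shmip = 4       # no setter was defined, so assignment fails
except AttributeError as e:
    print(e)          # e.g. "property 'shmip' of 'Foo' object has no setter"

a = np.array([[1, 2], [3, 4]])
try:
    a.ndim = 10       # ndarray.ndim is likewise read-only
except AttributeError as e:
    print(e)          # e.g. "attribute 'ndim' of 'numpy.ndarray' objects is not writable"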
Question 1
The list you're mentioning is one that contains attributes for a Numpy array.
For example:
a = np.array([1, 2, 3])
print(type(a))
# <class 'numpy.ndarray'>
Since a is a numpy.ndarray, you're able to use those attributes to find out more about it (e.g. a.size will return 3). To get information about what each one does, visit the SciPy documentation on array attributes.
Question 2
You can start here to familiarize yourself with some of the basic tools of NumPy, as well as the Reference Manual (assuming you're using v1.9). For information specific to the NumPy array you can go to Array Objects.
Their documentation is very extensive and very helpful, with examples provided throughout the site.