Any memory-efficient but "encapsulated, slice-friendly" way to pass a "linspace"-type array to functions in Python/numpy? - python

I am modifying software that plots and processes time series. The time/X values are evenly spaced, whereas the Y values are arbitrary. The way the architecture is currently designed, both time/X and Y values are generated upon initializing the program and explicitly stored in numpy arrays. Upon events being received by the software, these are passed to functions that follow the pattern do_something(x,y,params). These calls may be chained.
However, the downstream "do_something" functions often need only a fraction of the total length of the arrays. Since the entire array of Y values needs to be explicit anyway, the implementations of "do_something" incur no additional overhead with respect to Y by being passed a reference to this entire array and slicing it as needed. However, it would save a great deal of memory to only generate the X values within the slice, on the fly. The obvious way to do this, though, is to not pass a single parameter "X", but to pass a starting value that "aligns with" y[0], along with a step size as a separate parameter. Then whenever Y is sliced, the new starting X is calculated as old_start + step_size*low_index, and finally an X array is computed right as it's needed.
While very doable, this is kind of "ugly" because it breaks the symmetry: instead of X and Y both being single reference arguments, Y remains a single reference argument while X becomes a pair of value arguments, and all the downstream code has to "remember" to handle them differently. It would be nicer and "more Pythonic" to have some sort of object that does this automatically through operator overloading, i.e. it "holds" the start and interval "inside" it, and an operation that "looks like" slicing outputs an array of just those values, which can be passed to matplotlib or whatever as an argument. The overhead of doing this would need to be small, though, because when scrolling through time the slice is constantly "moving".
Is there a natural way to create such a thing?
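One natural-looking way to build such a thing (a sketch only; the class name LazyLinspace and its exact API are assumptions, not an existing NumPy feature) is a small object that stores only start, step and length, and overloads __getitem__ so that slicing materializes just the requested window with np.arange:
import numpy as np

class LazyLinspace:
    # Holds start/step/length only; X values are generated on demand.
    def __init__(self, start, step, length):
        self.start = start
        self.step = step
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, key):
        if isinstance(key, slice):
            lo, hi, stride = key.indices(self.length)
            # Build only the requested window -- cost is O(window), not O(total).
            return self.start + self.step * np.arange(lo, hi, stride)
        return self.start + self.step * key  # scalar index -> single value
Downstream code can then keep the symmetric call shape, e.g. do_something(x[lo:hi], y[lo:hi], params) or ax.plot(x[lo:hi], y[lo:hi]), with only the sliced window of X ever being allocated.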

Related

I set 3 arrays to the same thing, changing a single entry in one of them also changes the other two arrays. How can I make the three arrays separate?

I am making a puzzle game in a command terminal. I have three arrays for the level: originlevel, which is the unaltered level that the game will return to if you restart the level; emptylevel, which is the level without the player; and level, which is just the level. I need all 3, because I will be changing the space around the player.
def Player(matrix, width):
    originlevel = matrix
    emptylevel = matrix
    emptylevel[PlayerPositionFind(matrix)] = "#"
    level = matrix
The expected result is that it would set one entry to "#" in the emptylevel array, but it actually sets all 3 arrays to the same thing! My theory is that the arrays are somehow linked because they were originally set to the same thing, but this ruins my code entirely! How can I make the arrays separate, so changing one would not change the others?
I should note that matrix is an array; it is not an actual matrix.
I tried a function which would take the array matrix, and then just return it, thinking that this layer would unlink the arrays. It did not. (I called the function IHATEPYTHON).
I've also read that setting them to the same array is supposed to do this, but I didn't actually find an answer on how to make them NOT do that. Do I make a function which is just something like
for i in range(0, len(array)):
    newarray.append(array[i])
return newarray
I feel like that would solve the issue but that's so stupid, can I not do it in another way?
This issue is caused by the way variables work in Python. If you want more background on why this is happening, you should look up 'pass by value versus pass by reference'.
In order for each of these arrays to be independent, you need to create a copy each time you assign it. The easiest way to do that is to use an array slice. This means you will get a new copy of the array each time.
def Player(matrix, width):
    originlevel = matrix[:]
    emptylevel = matrix[:]
    emptylevel[PlayerPositionFind(matrix)] = "#"
    level = matrix[:]
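With the slice copies in place, changing one array no longer changes the others; a quick illustration with a plain list:
level = [".", ".", "@", "."]
emptylevel = level[:]    # independent copy, not another name for the same list
emptylevel[2] = "#"
print(level)             # ['.', '.', '@', '.']  -- unchanged
print(emptylevel)        # ['.', '.', '#', '.']
Note that [:] makes a shallow copy; if the level ever becomes a list of lists, copy.deepcopy would be needed instead.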

"Direct" numpy functions on an array vs numpy array functions

I have a question about the design of Python. I have realised that some functions are implemented directly on container classes (e.g. numpy arrays) while other functions that act on these containers must be called from numpy itself. An example would be:
import numpy as np
y = np.array([4,7,9,1])
m1 = np.mean(y) # Ok
m2 = y.mean() # Ok
print(m1 == m2) # True
x = [2,3]
r1 = np.concatenate([x, y]) # Ok
r2 = y.concatenate(x) # AttributeError: 'numpy.ndarray' object has no attribute 'concatenate'
print(r1 == r2)
Why can the mean be calculated directly from the array while the array has no method to concatenate another one to it? Is there a general rule for which functions can be called directly on the array and which ones cannot? And if both are possible, what is the Pythonic way to do it?
The overview of NumPy history gives an indication of why not everything is consistent: it has two predecessors that were developed independently. Backward compatibility requires the project to keep array methods like max. Ongoing development favors the function syntax np.fun(array). I suppose one reason for the latter is that it allows array_like input (the term used throughout NumPy documentation): anything that NumPy can turn into an ndarray.
The question of why there are both methods and functions of the same name has been discussed and links provided.
But to focus on your two examples:
mean uses just one array. Logically it can be an ndarray method.
concatenate takes a list of arrays, and doesn't give priority to any one of them.
There is an np.append function that looks superficially like the list .append method. But it just passes the task on to concatenate with a few modifications. And it causes all kinds of newbie errors - it isn't in-place, it ravels, and it is slow compared to the list method.
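A quick illustration of those pitfalls: np.append returns a new, raveled array rather than modifying its argument, and growing an array with it reallocates everything on every call.
import numpy as np

a = np.array([[1, 2], [3, 4]])
np.append(a, 5)              # returns a new array; does NOT modify a
print(a)                     # [[1 2] [3 4]]  -- unchanged
print(np.append(a, 5))       # [1 2 3 4 5]    -- result is raveled to 1-D

# Growing element by element copies the whole array every iteration,
# unlike list.append, which is amortized O(1).
arr = np.array([])
lst = []
for i in range(5):
    arr = np.append(arr, i)  # new allocation each time -- slow for large n
    lst.append(i)            # cheap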
Or consider the large family of ufunc. Those are functions, some take one array, others two. They share a common ufunc functionality.
np.add(a,b) <=> a+b <=> a.__add__(b)
np.sin(a) # no a.sin()
I suspect the choice to make sin a ufunc rather than a method has been influenced by common mathematical notation.
To me a big plus to the function approach is that it can be applied to a list or scalar. np.sin(1) works just as well as np.sin([0,.5,1]) or np.sin(np.arange(0,1,.5)).
Yes, history goes a long way toward excusing the mix of functions and methods, but many of the choices are logical.

Why isn't there any special method for __max__ in python?

As the title asks. Python has a lot of special methods - __add__, __len__, __contains__, etc. Why is there no __max__ method that is called when doing max()? Example code:
class A:
    def __max__(self):
        return 5

a = A()
max(a)
It seems like range() and other constructs could benefit from this. Am I missing some other effective way to do max?
Addendum 1:
As a trivial example, max(range(1000000000)) takes a long time to run.
I have no authoritative answer but I can offer my thoughts on the subject.
There are several built-in functions that have no corresponding special method. For example:
max
min
sum
all
any
One thing they have in common is that they are reduce-like: They iterate over an iterable and "reduce" it to one value. The point here is that these are more of a building block.
For example you often wrap the iterable in a generator (or another comprehension, or transformation like map or filter) before applying them:
sum(abs(val) for val in iterable) # sum of absolutes
any(val > 10 for val in iterable) # is one value over 10
max(person.age for person in iterable) # the oldest person
That means most of the time it wouldn't even call the __max__ of the iterable but try to access it on the generator (which isn't implemented and cannot be implemented).
So there is simply not much of a benefit if these were implemented. And in the few cases when it makes sense to implement them it would be more obvious if you create a custom method (or property) because it highlights that it's a "shortcut" or that it's different from the "normal result".
For example these functions (min, etc.) have O(n) run-time, so if you can do better (for example if you have a sorted list you could access the max in O(1)) it might make sense to document that explicitly.
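As a small illustration of that last point (a sketch, not a standard protocol): a container that keeps its items sorted can expose the maximum as an explicit O(1) property, alongside the ordinary O(n) max() over its iterator.
class SortedItems:
    # Keeps items sorted; the maximum is simply the last element.
    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    @property
    def max(self):
        # Documented O(1) shortcut, unlike max(self), which walks everything.
        return self._items[-1]

s = SortedItems([3, 1, 7, 2])
print(s.max)    # 7, constant time
print(max(s))   # 7, but O(n)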
Some operations are not basic operations. Take max as an example: it is really an operation built on comparison. In other words, when you get a max value, you are getting the biggest value under some comparison.
So in this case, why should we implement a dedicated max method rather than overriding the behavior of comparison?
Think about it from another direction: what does max really mean? For example, when we execute max(list), what are we doing?
I think we are actually inspecting the list's elements, and the max operation is not related to the list itself at all.
The list is just a container, and the container is not essential to the max operation. Whether it is a list or a set or something else doesn't matter; what really matters is the elements inside the container.
So if we defined a __max__ action for list, we would actually be doing another, totally different operation: asking the container to report its maximum value.
I think in this case, since it is a totally different operation, it should be an ordinary method of the container rather than an override of the built-in function's behavior.

How to cache the function that is returned by scipy interpolation

Trying to speed up a potential flow aerodynamic solver. Instead of calculating velocity at an arbitrary point using a relatively expensive formula I tried to precalculate a velocity field so that I could interpolate the values and (hopefully) speed up the code. Result was a slow-down due (I think) to the scipy.interpolate.RegularGridInterpolator method running on every call. How can I cache the function that is the result of this call? Everything I tried gets me hashing errors.
I have a method that implements the interpolator and a second 'factory' method to reduce the argument list so that it can be used in an ODE solver.
x_panels and y_panels are 1D arrays/tuples, vels is a 2D array/tuple, x and y are floats.
def _vol_vel_factory(x_panels, y_panels, vels):
    # Function factory method
    def _vol_vel(x, y, t=0):
        return _volume_velocity(x, y, x_panels, y_panels, vels)
    return _vol_vel

def _volume_velocity(x, y, x_panels, y_panels, vels):
    velfunc = sp_int.RegularGridInterpolator(
        (x_panels, y_panels), vels
    )
    return velfunc(np.array([x, y])).reshape(2)
By passing tuples instead of arrays as inputs I was able to get a bit further but converting the method output to a tuple did not make a difference; I still got the hashing error.
In any case, caching the result of the _volume_velocity method is not really what I want to do, I really want to somehow cache the result of _vol_vel_factory, whose result is a function. I am not sure if this is even a valid concept.
The hashing errors come from the numpy arrays themselves: ndarrays do not implement __hash__, so they cannot be used as cache keys by tools like functools.lru_cache.
You can store other, hashable representations of the numpy arrays (nested tuples, for example), cache on those, and convert them back to numpy arrays inside the function. For details on how to do that, look at the following.
How to hash a large object (dataset) in Python?
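A sketch along those lines, assuming vels is 2-D as described above and the tuple conversion is cheap relative to building the interpolator (the helper name _vol_vel_factory_cached is made up): the factory arguments are converted to nested tuples so they are hashable, functools.lru_cache memoizes the factory on those tuples, and the interpolator is built once per distinct set of inputs instead of on every call.
import functools
import numpy as np
import scipy.interpolate as sp_int

@functools.lru_cache(maxsize=None)
def _vol_vel_factory_cached(x_panels_t, y_panels_t, vels_t):
    # Tuples are hashable, so lru_cache can key on them; rebuild the
    # arrays and construct the interpolator once per distinct input.
    velfunc = sp_int.RegularGridInterpolator(
        (np.array(x_panels_t), np.array(y_panels_t)), np.array(vels_t)
    )
    def _vol_vel(x, y, t=0):
        return velfunc(np.array([x, y])).reshape(2)
    return _vol_vel

def _vol_vel_factory(x_panels, y_panels, vels):
    # Thin wrapper doing the array -> tuple conversion before the cached call.
    return _vol_vel_factory_cached(
        tuple(x_panels),
        tuple(y_panels),
        tuple(map(tuple, np.asarray(vels))),
    )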

Passing subset reference of array/list as an argument in Python

I'm kind of new (1 day) to Python so maybe my question is stupid. I've already looked here but I can't find my answer.
I need to modify the content of an array at a random offset with a random size.
I have a Python API to interface a DLL for a USB device, which I can't modify. There is a function just like this one:
def read_device(in_array):
    # The DLL accesses the USB device, reads it into in_array of type array('B').
    # in_array can be an int (the function will create an array with that
    # size and return it), or can be an array, or can be a tuple (array, int).
In MY code, I create an array of, let's say, 64 bytes and I want to read 16 bytes starting from the 32nd byte. In C, I'd give &my_64_array[31] to the read_device function.
In Python, if I give:
read_device(my_64_array[31:31+16])
it seems that in_array receives a copy of the given subset, so my_64_array is not modified.
What can I do? Do I have to split my_64_array and recombine it afterwards?
Seeing as you are not able to update or change the API code, the best method is to pass the function a small temporary array that you then assign back into your existing 64-byte array after the function call.
So that would be something like the following, not knowing the exact specifics of your API call.
the_64_array[31:31+16] = read_device(16)
It's precisely as you say: when you pass a slice of a list into a function, what the function receives is a copy of that slice.
Two possible methods to add it later (assuming read_device returns the relevant slice):
my_64_array = my_64_array[:32] + read_device(my_64_array[31:31+16]) + my_64_array[31+16:]
# equivalently, but around 33% faster for even small arrays (length 10), 3 times faster for (length 50)...
my_64_array[31:31+16] = read_device(my_64_array[31:31+16])
So I think you should be using the latter.
If it were a modifiable function (but it's not in this case!) you could instead change your function's arguments (one being the entire array):
def read_device(the_64_array, start=31, end=47):
    # some code
    the_64_array[start:end] = ...  # modifies `the_64_array` in place
and call read_device(my_64_array) or read_device(my_64_array, 31, 31+16).
When reading a list subset you're calling __getitem__ with a slice(x, y) argument of that list. In your case these statements are equal:
my_64_array[31:31+16]
my_64_array.__getitem__(slice(31, 31+16))
This means that the __getitem__ function can be overridden in a subclass to obtain different behaviour.
You can also set the same subset using a[1:3] = [1,2,3] in which case it'd call a.__setitem__(slice(1, 3), [1,2,3])
So I'd suggest either of these:
pass the list (my_64_array) and a slice object to read_device instead of passing the result of __getitem__, after which you could read the necessary data and set the corresponding offsets. No subclassing. This is probably the best solution in terms of readability and ease of development.
subclassing list, overriding __getitem__ and __setitem__ to return instances of that subclass holding a reference to the parent, and then changing all modifying or reading methods to operate on the parent list instead. This might be a little tricky if you're new to Python, but basically you'd exploit the fact that Python list behavior is largely defined by the methods on the list instance. This is probably better in terms of performance as you can create references.
If read_device returns the resulting list, and that list is of equal size, you can do this: a[x:y] = read_device(a[x:y])
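A minimal sketch of the first suggestion combined with the write-back idea from the earlier answer (read_into is a made-up wrapper name, and it relies on read_device(int) returning a fresh array('B') as described in the question): pass the full buffer and a slice object, read into a temporary of the right length, and assign it back into the slice.
from array import array

def read_into(buf, sl):
    # Hypothetical wrapper: read into a temporary array of the slice's
    # length, then copy the result back into the requested slice of buf.
    lo, hi, _ = sl.indices(len(buf))
    buf[sl] = read_device(hi - lo)
    return buf

my_64_array = array('B', [0] * 64)
read_into(my_64_array, slice(31, 31 + 16))  # bytes 31..46 now hold device data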
