Passing subset reference of array/list as an argument in Python - python

I'm kind of new (1 day) to Python so maybe my question is stupid. I've already looked here but I can't find my answer.
I need to modify the content of an array at a random offset with a random size.
I have a Python API, which I can't modify, that interfaces a DLL for a USB device. It has a function just like this one:
def read_device(in_array):
    # The DLL accesses the USB device, reads it into in_array of type array('B')
    # in_array can be an int (the function will create an array with the int
    # size and return it), or can be an array, or can be a tuple (array, int)
    ...  # implementation not shown
In MY code, I create an array of, let's say, 64 bytes and I want to read 16 bytes starting from the 32nd byte. In C, I'd give &my_64_array[31] to the read_device function.
In Python, if I give:
read_device(my_64_array[31:31+16])
it seems that in_array is a reference to a copy of the given subset, therefore my_64_array is not modified.
What can I do ? Do I have to split my_64_array and recombine it after ??

Seeing as you are not able to update or change the API code, the best method is to have the function fill a small temporary array, which you then assign to the matching slice of your existing 64-byte array after the function call.
So that would be something like the following, not knowing the exact specifics of your API call.
the_64_array[31:31+16] = read_device(16)
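A minimal sketch of this approach. The `fake_read_device` function is a hypothetical stand-in for the real API, whose exact behaviour is not known here; it only mimics the documented case where an int is passed and a new array('B') of that size comes back.

```python
from array import array

def fake_read_device(size):
    # Hypothetical stand-in for the real read_device(int) call:
    # returns a new array('B') holding `size` bytes "read" from the device.
    return array('B', [0xAB] * size)

my_64_array = array('B', [0] * 64)

# Slice assignment writes into the existing array; its length stays 64.
my_64_array[31:31 + 16] = fake_read_device(16)
```

Only the 16-byte temporary is created; the bytes outside the slice are left alone.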

It's precisely as you say: slicing creates a new object (a copy of that part of the array), and the function receives a reference to that copy rather than to the original.
Two possible methods to add it later (assuming read_device returns the relevant slice):
my_64_array = my_64_array[:31] + read_device(my_64_array[31:31+16]) + my_64_array[31+16:]
# equivalently, but around 33% faster for even small arrays (length 10), 3 times faster for (length 50)...
my_64_array[31:31+16] = read_device(my_64_array[31:31+16])
So I think you should be using the latter.
If the function were modifiable (but it's not in this case!), you could change its arguments so that it takes the entire array plus the bounds:
def read_device(the_64_array, start=31, end=47):
    # some code
    the_64_array[start:end] = ...  # modifies the_64_array in place
and call read_device(my_64_array) or read_device(my_64_array, 31, 31+16).

When reading a list subset you're calling __getitem__ with a slice(x, y) argument of that list. In your case these statements are equal:
my_64_array[31:31+16]
my_64_array.__getitem__(slice(31, 31+16))
This means that the __getitem__ function can be overridden in a subclass to obtain different behaviour.
You can also set the same subset using a[1:3] = [1,2,3] in which case it'd call a.__setitem__(slice(1, 3), [1,2,3])
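The equivalence can be checked directly in the interpreter. This is a small illustration with a plain list; the values are made up.

```python
a = [10, 20, 30, 40, 50]

# Both spellings call the same method with the same slice object.
same = a[1:3] == a.__getitem__(slice(1, 3))

# Reading a slice gives a new list, so changing it leaves `a` untouched.
part = a[1:3]
part[0] = 99

# Slice assignment goes through __setitem__ and modifies `a` in place.
a.__setitem__(slice(1, 3), [1, 2, 3])
```

After the last line `a` is `[10, 1, 2, 3, 40, 50]`, while `part` is still the independent copy `[99, 30]`.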
So I'd suggest either of these:
pass the list (my_64_array) and a slice object to read_device instead of passing the result of __getitem__, after which you could read the necessary data and set the corresponding offsets. No subclassing. This is probably the best solution in terms of readability and ease of development.
subclassing list, overriding __getitem__ and __setitem__ to return instances of that subclass with a parent reference, and then change all modifying or reading methods of a list to reference a parent list instead. This might be a little tricky if you're new to python, but basically, you'd exploit that python list properties are largely defined by the methods inside a list instance. This is probably better in terms of performance as you can create references.
If read_device returns the resulting list, and that list is of equal size, you can do this: a[x:y] = read_device(a[x:y])
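Since read_device itself cannot be changed here, the first suggestion could be approximated with a small wrapper. This is only a sketch: `read_into` and `fake_read_device` are hypothetical names, and the stand-in assumes the documented behaviour that passing an int returns a new array of that size.

```python
from array import array

def read_into(buffer, region, read_func):
    # Hypothetical wrapper: `read_func` stands in for the unmodifiable
    # read_device and is assumed to return `n` bytes when given the int `n`.
    start, stop, step = region.indices(len(buffer))
    count = len(range(start, stop, step))
    buffer[region] = read_func(count)

def fake_read_device(size):
    # Stand-in for the real device read; returns an array('B') of `size` bytes.
    return array('B', range(size))

my_64_array = array('B', [0] * 64)
read_into(my_64_array, slice(31, 31 + 16), fake_read_device)
```

The caller keeps a single buffer and only describes which region to fill, which is close to what passing `&my_64_array[31]` expresses in C.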

Related

How do I keep the dtype intact of a kwarg?

For a script that I am working on I want to make it optional to pass on an array to a function. The way in which I have attempted to do this is by making the variable in question (residue) a kwarg.
The problem is that when I do it this way, Python changes the type of the kwarg from a numpy.ndarray to a dict. The simplest solution is to convert the variable back to an np.array using:
residue = np.array(residue.values())
But I do not find this a very elegant solution. So I was wondering if someone could show me a "prettier" way to accomplish this and possibly explain to my why python does this?
The function in question is:
# Returns a function for a 2D Gaussian model
def Gaussian_model2D(data,x_box,y_box,amplitude,x_stddev,y_stddev,theta,**residue):
    if not residue:
        x_mean, y_mean = max_pixel(data) # Returns location of maximum pixel value
    else:
        x_mean, y_mean = max_pixel(residue) # Returns location of maximum pixel value
    g_init = models.Gaussian2D(amplitude,x_mean,y_mean,x_stddev,y_stddev,theta)
    return g_init
# end of Gaussian_model2D
The function is called with the following command:
g2_init = Gaussian_model2D(cut_out,x_box,y_box,amp,x_stddev,y_stddev,theta,residue=residue1)
The version of Python that I am working in is 2.7.15
See the accepted answer here for why you always get a mapping object (aka a dict) if you pass arguments via **kwargs; the language spec says:
If the form “**identifier” is present, it is initialized to a new
ordered mapping receiving any excess keyword arguments, defaulting to
a new empty mapping of the same type.
In other words, the behavior you described is exactly what the language guarantees.
One of the reasons for this behavior is that all functions, wrappers, and implementations in the underlying language (e.g. C / J) will understand that **kwargs is part of the arguments and should be expanded to its key-value combinations.
If you want to preserve your extra-arguments as an object of a certain type, you can't use **kwargs to do so; pass it via an explicit argument, e.g. extra_args which has no special meaning.
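A small sketch of the difference. The function names are made up for illustration, and nested lists stand in for the numpy arrays of the question so that the example has no dependencies.

```python
def show_kwargs(**residue):
    # **residue always collects extra keyword arguments into a dict.
    return type(residue).__name__, residue

def show_explicit(data, residue=None):
    # An ordinary optional parameter keeps whatever object was passed.
    source = data if residue is None else residue
    return type(source).__name__, source

image = [[1, 2], [3, 4]]       # stands in for a numpy array
residue1 = [[0, 1], [1, 0]]

kw_type, kw_value = show_kwargs(residue=residue1)
ex_type, ex_value = show_explicit(image, residue=residue1)
```

With real numpy arrays, the check should stay `residue is None` rather than `not residue`, because the truth value of an array with more than one element is ambiguous and raises a ValueError.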

How do you know in advance if a method (or function) will alter the variable when called?

I am new to Python from R. I have recently spent a lot of time reading up on how everything in Python is an object, objects can call methods on themselves, methods are functions within a class, yada yada yada.
Here's what I don't understand. Take the following simple code:
mylist = [3, 1, 7]
If I want to know how many times the number 7 occurs, I can do:
mylist.count(7)
That, of course, returns 1. And if I want to save the count number to another variable:
seven_counts = mylist.count(7)
So far, so good. Other than the syntax, the behavior is similar to R. However, let's say I am thinking about adding a number to my list:
mylist.append(9)
Wait a minute, that method actually changed the variable itself! (i.e., "mylist" has been altered and now includes the number 9 as the fourth digit in the list.) Assigning the code to a new variable (like I did with seven_counts) produces garbage:
newlist = mylist.append(9)
I find the inconsistency in this behavior a bit odd, and frankly undesirable. (Let's say I wanted to see what the result of the append looked like first and then have the option to decide whether or not I want to assign it to a new variable.)
My question is simple:
Is there a way to know in advance if calling a particular method will actually alter your variable (object)?
Aside from reading the documentation (which for some methods will include type annotations specifying the return value) or playing with the method in the interactive interpreter (including using help() to check the docstring for a type annotation), no, you can't know up front just by looking at the method.
That said, the behavior you're seeing is intentional. Python methods either return a new modified copy of the object or modify the object in place; at least among built-ins, they never do both (some methods mutate the object and return a non-None value, but it's never the object just mutated; the pop method of dict and list is an example of this case).
This either/or behavior is intentional; if they didn't obey this rule, you'd have had an even more confusing and hard to identify problem, namely, determining whether append mutated the value it was called on, or returned a new object. You definitely got back a list, but is it a new list or the same list? If it mutated the value it was called on, then
newlist = mylist.append(9)
is a little strange; newlist and mylist would be aliases to the same list (so why have both names?). You might not even notice for a while; you'd continue using newlist, thinking it was independent of mylist, only to look at mylist and discover it was all messed up. By having all such "modify in place" methods return None (or at least, not the original object), the error is discovered more quickly/easily; if you try and use newlist, mistakenly believing it to be a list, you'll immediately get TypeErrors or AttributeErrors.
Basically, the only way to know in advance is to read the documentation. For methods whose name indicates a modifying operation, you can check the return value and often get an idea as to whether they're mutating. It helps to know what types are mutable in the first place; list, dict, set and bytearray are all mutable, and the methods they have that their immutable counterparts (aside from dict, which has no immutable counterpart) lack tend to mutate the object in place.
The default tends to be to mutate the object in place simply because that's more efficient; if you have a 100,000 element list, a default behavior for append that made a new 100,001 element list and returned it would be extremely inefficient (and there would be no obvious way to avoid it). For immutable types (e.g. str, tuple, frozenset) this is unavoidable, and you can use those types if you want a guarantee that the object is never mutated in place, but it comes at a cost of unnecessary creation and destruction of objects that will slow down your code in most cases.
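The either/or convention is easy to see with a few built-in list operations, reusing the list from the question:

```python
mylist = [3, 1, 7]

result_of_append = mylist.append(9)   # mutates mylist, returns None
sorted_copy = sorted(mylist)          # returns a new list, mylist unchanged
result_of_sort = mylist.sort()        # mutates mylist, returns None
last = mylist.pop()                   # mutates mylist and returns the removed item
```

`append` and `sort` hand back None, `sorted` hands back a fresh list, and `pop` is the mixed case mentioned above: it mutates but returns the removed element, not the list.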
Just check out the docstring:
>>> list.count.__doc__
'L.count(value) -> integer -- return number of occurrences of value'
>>> list.append.__doc__
'L.append(object) -> None -- append object to end'
There isn't really an easy way to tell, but:
immutable object --> no way of changing through method calls
So, for example, tuple has no methods which affect the tuple as it is unchangeable so methods can only return new instances.
And if you "wanted to see what the result of the append looked like first and then have the option to decide whether or not I want to assign it to a new variable" then you can concatenate the list with a new list with one element.
i.e.
>>> l = [1,2,3]
>>> k = l + [4]
>>> l
[1, 2, 3]
>>> k
[1, 2, 3, 4]
Not from merely your invocation (your method call). You can guarantee that the method won't change the object if you pass in only immutable objects, but some methods are defined to change the object -- and will either not be defined for the one you use, or will fault in execution.
In real life, you look at the method's documentation: that will tell you exactly what happens.
[I was about to include what Joe Iddon's answer covers ...]

Why isn't there any special method for __max__ in python?

As the title asks. Python has a lot of special methods: __add__, __len__, __contains__, etc. Why is there no __max__ method that is called when doing max? Example code:
class A:
    def __max__(self):
        return 5
a = A()
max(a)
It seems like range() and other constructs could benefit from this. Am I missing some other effective way to do max?
Addendum 1:
As a trivial example, max(range(1000000000)) takes a long time to run.
I have no authoritative answer but I can offer my thoughts on the subject.
There are several built-in functions that have no corresponding special method. For example:
max
min
sum
all
any
One thing they have in common is that they are reduce-like: They iterate over an iterable and "reduce" it to one value. The point here is that these are more of a building block.
For example you often wrap the iterable in a generator (or another comprehension, or transformation like map or filter) before applying them:
sum(abs(val) for val in iterable) # sum of absolutes
any(val > 10 for val in iterable) # is one value over 10
max(person.age for person in iterable) # the oldest person
That means most of the time it wouldn't even call the __max__ of the iterable but try to access it on the generator (which isn't implemented and cannot be implemented).
So there is simply not much of a benefit if these were implemented. And in the few cases when it makes sense to implement them it would be more obvious if you create a custom method (or property) because it highlights that it's a "shortcut" or that it's different from the "normal result".
For example these functions (min, etc.) have O(n) run-time, so if you can do better (for example if you have a sorted list you could access the max in O(1)) it might make sense to document that explicitly.
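A minimal sketch of such an explicit shortcut. `SortedInts` and its `maximum` method are hypothetical names invented for this illustration.

```python
import bisect

class SortedInts:
    """Hypothetical container that keeps its items in ascending order."""

    def __init__(self, items=()):
        self._items = sorted(items)

    def add(self, value):
        bisect.insort(self._items, value)

    def __iter__(self):
        return iter(self._items)

    def maximum(self):
        # O(1): the largest item is always last. Raises IndexError if empty.
        return self._items[-1]

s = SortedInts([5, 1, 9])
s.add(7)
fast = s.maximum()   # constant time
slow = max(s)        # still works via __iter__, but scans every item
```

For the range example from the question, `range(1000000000)[-1]` returns the last element directly, which is the maximum of a non-empty range with a positive step.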
Some operations are not basic operations. Take max as an example: it is actually an operation based on comparison. In other words, when you ask for the max value, you are really asking for the biggest value according to comparison.
So in this case, why should we implement a dedicated max method rather than override the behavior of comparison?
Think in another direction, what does max really mean? For example, when we execute max(list), what are we doing?
I think we are actually checking list's elements, and the max operation is not related to list itself at all.
The list is just a container, and the container is irrelevant to the max operation. Whether it is a list, a set or something else doesn't matter. What is really useful is the elements inside the container.
So if we define a __max__ action for list, we are actually doing another totally different operation. We are asking a container to give us advice about max value.
I think in this case, as it is a totally different operation, it should be a method of the container instead of overriding the built-in function's behavior.

Custom class which is a dict, but initialized without dict copy?

For legibility purposes, I would like to have a custom class that behaves exactly like a dict (but carries a meaningful type instead of the more general dict type):
class Derivatives(dict):
"Dictionary that represents the derivatives."
Now, is there a way of building new objects of this class in a way that does not involve copies? The naive usage
derivs = Derivatives({var: 1}) # var is a Python object
in fact creates a copy of the dictionary passed as an argument, which I would like to avoid, for efficiency reasons.
I tried to bypass the copy but then the class of the dict cannot be changed, in CPython:
class Derivatives(dict):
    def __new__(cls, init_dict):
        init_dict.__class__ = cls  # Fails with __class__ assignment: only for heap types
        return init_dict
I would like to have both the ability to give an explicit class name to the dictionaries that the program manipulates and an efficient way of building such dictionaries (instead of being forced to copy a Python dict). Is this doable efficiently in Python?
PS: The use case is maybe 100,000 creations of single-key Derivatives, where the key is a variable (not a string, so no keyword initialization). This is actually not slow, so "efficiency reasons" here means more something like "elegance": there is ideally no need to waste time doing a copy when the copy is not needed. So, in this particular case the question is more about the elegance/clarity that Python can bring here than about running speed.
By inheriting from dict you are given three possibilities for constructor arguments (barring the {} literal):
class dict(**kwarg)
class dict(mapping, **kwarg)
class dict(iterable, **kwarg)
This means that, in order to instantiate your instance you must do one of the following:
Pass the variables as keywords D(x=1) which are then packed into an intermediate dictionary anyway.
Create a plain dictionary and pass it as a mapping.
Pass an iterable of (key,value) pairs.
So in all three of these cases you will need to create intermediate objects to satisfy the dict constructor.
For a single pair, the third option would look like D(((var,1),)), which I strongly recommend against for readability's sake.
So if you want your class to inherit from a dictionary, using Derivatives({var: 1}) is your most efficient and most readable option.
As a personal note if you will have thousands of single pair dictionaries I'm not sure how the dict setup is the best in the first place, you may just reconsider the basis of your class.
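A possibility the three constructor forms above do not cover is to create an empty instance and store the pair afterwards. The sketch below assumes a hypothetical `single` helper and a stand-in `Variable` class; whether it is actually faster than Derivatives({var: 1}) has not been measured here.

```python
class Derivatives(dict):
    "Dictionary that represents the derivatives."

    @classmethod
    def single(cls, var, value):
        # Hypothetical helper (not part of the question's code): build an
        # empty instance and store one pair, so no plain dict is created first.
        derivs = cls()
        derivs[var] = value
        return derivs

class Variable:
    # Stand-in for the question's variable objects; hashable by identity.
    pass

var = Variable()
derivs = Derivatives.single(var, 1)
```

This trades the dict literal for a named constructor, which may or may not read better than the original one-liner.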
TL;DR: There's no general-purpose way to do it unless you do it in C.
Long answer:
The dict class is implemented in C. Thus, there is no way to access its internal properties (most importantly, its internal hash table) unless you use C.
In C, you could simply copy the pointer representing the hash table into your object without having to iterate over the dict (key, value) pairs and insert them into your object. (Of course, it's a bit more complicated than this. Note that I omit memory management details).
Longer answer:
I'm not sure why you are concerned about efficiency.
Python passes arguments as references. It rarely ever copies unless you explicitly tell it to.
I read in the comments that you can't use named parameters, as the keys are actual Python objects. That leaves me to understand that you're worried about copying the dict keys (and maybe values). However, even the dictionary keys are not copied, and passed by reference! Consider this code:
class Test:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __hash__(self):
        return self.x
t = Test(1, 2)
print(t.y) # prints 2
d = {t: 1}
print(d[t]) # prints 1
keys = list(d.keys())
keys[0].y = 10
print(t.y) # prints 10! No copying was made when inserting object into dictionary.
Thus, the only remaining area of concern is iterating through the dict and inserting the values in your Derivatives class. This is unavoidable, unless you can somehow set the internal hash table of your class to the dict's internal hash table. There is no way to do this in pure python, as the dict class is implemented in C (as mentioned above).
Note that others have suggested using generators. This seems like a good idea too, say if you were reading the derivatives from a file or generating them with a simple formula. It would avoid creating the dict object in the first place. However, there will be no noticeable improvement in efficiency if the generators are just wrappers around lists (or any other data structure that can contain an arbitrary set of values).
Your best bet is to stick with your original method. Generators are great, but they can't efficiently represent an arbitrary set of values (which might be the case in your scenario). It's also not worth it to do it in C.
EDIT: It might be worth it to do it in C, after all!
I'm not too big on the details of the Python C API, but consider defining a class in C, for example DerivativesBase (deriving from dict). All you do is define an __init__ function in C for DerivativesBase that takes a dict as a parameter and copies the hash table pointer from the dict into your DerivativesBase object. Then, in Python, your Derivatives class derives from DerivativesBase and implements the bulk of the functionality.

Avoiding Python sum default start arg behavior

I am working with a Python object that implements __add__, but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) led to a TypeError, because sum() first attempts 0 + MyObj. In order to use sum(), my object needs __radd__ to handle 0 + MyObj or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
Before anyone asks, the object is not list-like or string-like, so use of join() or itertools would not help.
Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding SimpleLoc yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.
The numbers can be relatively large (say, smaller than 2^32 but commonly 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Questions I've looked at:
python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method
I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.
My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.
# ...
    def __radd__(self, other):
        # This allows sum() to work (the default start value is zero)
        if other == 0:
            return self
        return self.__add__(other)
In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?
Instead of sum, use:
import operator
from functools import reduce
reduce(operator.add, seq)
in Python 2 reduce was built-in so this looks like:
import operator
reduce(operator.add, seq)
Reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element while sum always uses one.
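A short illustration of the difference, using a made-up `Span` class that supports + but has no empty value and no __radd__:

```python
import operator
from functools import reduce

class Span:
    """Hypothetical object that supports + but has no empty value."""

    def __init__(self, parts):
        self.parts = list(parts)

    def __add__(self, other):
        if not isinstance(other, Span):
            return NotImplemented
        return Span(self.parts + other.parts)

seq = [Span([1]), Span([2]), Span([3])]

total = reduce(operator.add, seq)   # computes (seq[0] + seq[1]) + seq[2]

try:
    sum(seq)                        # starts with 0 + seq[0]
    sum_failed = False
except TypeError:
    sum_failed = True
```

Note that reduce without an initial element raises a TypeError on an empty sequence, so that case still has to be handled by the caller.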
Also note: (Warning: maths rant ahead)
Providing support for add w/r/t objects that have no neutral element is a bit awkward from the algebraic point of view.
Note that all of:
naturals
reals
complex numbers
N-d vectors
NxM matrices
strings
together with addition form a Monoid - i.e. they are associative and have some kind of neutral element.
If your operation isn't associative and doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.
In such case, you might be better off with using a function or a method instead of an operator. This may be less confusing since the users of your class, seeing that it supports +, are likely to expect that it will behave in a monoidic way (as addition normally does).
Thanks for expanding, I'll refer to your particular module now:
There are 2 concepts here:
Simple locations,
Compound locations.
It indeed makes sense that simple locations could be added, but they don't form a monoid because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It's, generally, a CompoundLoc.
OTOH, CompoundLocs with addition looks like a monoid to me (a commutative monoid, while we're at it): A sum of those is a CompoundLoc too, and their addition is associative, commutative and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.
If you agree with me (and the above matches your implementation), then you'll be able to use sum as following:
sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())
Indeed, this appears to work.
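A rough sketch of how that could look. The two classes below are hypothetical reconstructions from the description in the question (one right-open interval versus a list of them), not the asker's actual module, and they assume the empty compound location is spelled `CompoundLoc()`.

```python
class CompoundLoc:
    """Hypothetical sketch: a list of [start, end) intervals, possibly empty."""

    def __init__(self, intervals=()):
        self.intervals = list(intervals)

    def __add__(self, other):
        if isinstance(other, SimpleLoc):
            return CompoundLoc(self.intervals + [(other.start, other.end)])
        if isinstance(other, CompoundLoc):
            return CompoundLoc(self.intervals + other.intervals)
        return NotImplemented

class SimpleLoc:
    """Hypothetical sketch: one right-open interval [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __add__(self, other):
        return CompoundLoc([(self.start, self.end)]) + other

locs = [SimpleLoc(3, 6), SimpleLoc(10, 13)]

# The empty CompoundLoc is the neutral element, so sum() never sees 0.
total = sum(locs, CompoundLoc())
empty = sum([], CompoundLoc())
```

Passing the start value positionally keeps the call working on older Python versions, where sum() does not accept `start` as a keyword.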
I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.
Well, locations are some sets of numbers, so it makes sense to throw a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc).
As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. You could throw in a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc if you feel it's going to help.
I think that the best way to accomplish this is to provide the __radd__ method, or pass the start object to sum explicitly.
In case you really do not want to override __radd__ or provide a start object, how about redefining sum()?
>>> from __builtin__ import sum as builtin_sum
>>> def sum(iterable, startobj=MyCustomStartObject):
...     return builtin_sum(iterable, startobj)
...
Preferably use a function with a name like my_sum(), but I guess that is one of the things you want to avoid (even though globally redefining builtin functions is probably something that a future maintainer will curse you for)
Actually, implementing __add__ without the concept of an "empty object" makes little sense. sum needs a start parameter to support the sums of empty and one-element sequences, and you have to decide what result you expect in these cases:
sum([o1, o2]) => o1 + o2 # obviously
sum([o1]) => o1 # But how should __add__ be called here? Not at all?
sum([]) => ? # What now?
You could use an object that's universally neutral wrt. addition:
class Neutral:
    def __add__(self, other):
        return other
print(sum("A BC D EFG".split(), Neutral())) # ABCDEFG
You could do something like:
from functools import reduce  # needed in Py3.x; reduce is a built-in in Py2
from operator import add
try:
    total = reduce(add, whatever)
except TypeError as e:
    # I'm not 100% happy about branching on the exception text, but
    # figure this msg isn't likely to be changed after so long...
    if e.args[0] == 'reduce() of empty sequence with no initial value':
        pass # do something appropriate here if necessary
    else:
        pass # Most likely that + isn't usable between objects...
