How do I keep the dtype intact of a kwarg? - python

For a script that I am working on I want to make it optional to pass on an array to a function. The way in which I have attempted to do this is by making the variable in question (residue) a kwarg.
The problem is that when I do it this way, Python changes the dtype of the kwarg from numpy.ndarray to dict. The simplest solution is to convert the variable back to a np.array using:
residue = np.array(residue.values())
But I do not find this a very elegant solution. So I was wondering if someone could show me a "prettier" way to accomplish this and possibly explain to me why Python does this?
The function in question is:
# Returns a function for a 2D Gaussian model
def Gaussian_model2D(data,x_box,y_box,amplitude,x_stddev,y_stddev,theta,**residue):
    if not residue:
        x_mean, y_mean = max_pixel(data) # Returns location of maximum pixel value
    else:
        x_mean, y_mean = max_pixel(residue) # Returns location of maximum pixel value
    g_init = models.Gaussian2D(amplitude,x_mean,y_mean,x_stddev,y_stddev,theta)
    return g_init
# end of Gaussian_model2D
The function is called with the following command:
g2_init = Gaussian_model2D(cut_out,x_box,y_box,amp,x_stddev,y_stddev,theta,residue=residue1)
The version of Python that I am working in is 2.7.15

See the accepted answer here for why you always get a mapping object (i.e. a dict) if you pass arguments via **kwargs; the language spec says:
If the form “**identifier” is present, it is initialized to a new
ordered mapping receiving any excess keyword arguments, defaulting to
a new empty mapping of the same type.
In other words, the behavior you described is exactly what the language guarantees.
One of the reasons for this behavior is that all functions, wrappers, and implementations in the underlying language (e.g. C / J) will understand that **kwargs is part of the arguments and should be expanded to its key-value combinations.
If you want to preserve your extra-arguments as an object of a certain type, you can't use **kwargs to do so; pass it via an explicit argument, e.g. extra_args which has no special meaning.
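A minimal sketch of the explicit-argument approach, assuming a default of None marks the argument as omitted. The names collect_kwargs and choose_source are hypothetical, and plain nested lists stand in for the NumPy arrays from the question.

```python
def collect_kwargs(**residue):
    # **residue always arrives as a dict: {"residue": <whatever was passed>}
    return residue


def choose_source(data, residue=None):
    # An explicit keyword argument keeps the object exactly as the caller passed it.
    # Compare with `is None`: truth-testing a multi-element NumPy array raises ValueError.
    return data if residue is None else residue


data = [[1, 2], [3, 4]]        # stand-in for the image array
residue1 = [[0, 9], [0, 0]]    # stand-in for the residue array

packed = collect_kwargs(residue=residue1)       # a dict wrapping residue1
chosen = choose_source(data, residue=residue1)  # residue1 itself, type untouched
fallback = choose_source(data)                  # no residue given, falls back to data
```

With a signature like this, a call of the form Gaussian_model2D(..., residue=residue1) would not need to change; only the function header and the `if not residue` test would.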


Argument convention in PyTorch

I am new to PyTorch and while going through the examples, I noticed that sometimes functions have a different convention when accepting arguments. For example transforms.Compose receives a list as its argument:
transform=transforms.Compose([ # Here we pass a list of elements
    transforms.ToTensor(),
    transforms.Normalize(
        (0.4915, 0.4823, 0.4468),
        (0.2470, 0.2435, 0.2616)
    )
])
At the same time, other functions receive the arguments individually (i.e. not in a list). For example torch.nn.Sequential:
torch.nn.Sequential( # Here we pass individual elements
    torch.nn.Linear(1, 4),
    torch.nn.Tanh(),
    torch.nn.Linear(4, 1)
)
This has been a common typing mistake for me while learning.
I wonder if we are implying something when:
the arguments are passed as a list
the arguments are passed as individual items
Or is it simply the preference of the contributing author and should be memorized as is?
Update 1: Note that I do not claim that either format is better. I am merely complaining about lack of consistency. Of course (as Ivan stated in his answer) it makes perfect sense to follow one format if there is a good reason for it (e.g. transforms.Normalize). But if there is not, then I would vote for consistency.
This is not a convention, it is a design decision.
Yes, torch.nn.Sequential (source) receives individual items, whereas torchvision.transforms.Compose (source) receives a single list of items. Those are arbitrary design choices. I believe PyTorch and Torchvision are maintained by different groups of people, which might explain the difference. One could argue it is more coherent to pass the inputs as a list since the number of items varies; this is the approach used in more conventional programming languages such as C++ and Java. On the other hand, you could argue it is more readable to pass them as a sequence of separate arguments instead, which is what languages such as Python allow.
In this particular case we would have
>>> fn1([element_a, element_b, element_c]) # single list
vs
>>> fn2(element_a, element_b, element_c) # separate args
Which would have an implementation that resembles:
def fn1(elements):
    pass
vs using the star argument:
def fn2(*elements):
    pass
However, it is not always a matter of taste; sometimes the implementation makes the choice clear. For instance, the list approach is much preferred when the function has other arguments (whether positional or keyword). In this case it makes more sense to implement it as fn1 instead of fn2. Here is a second example with keyword arguments. Look at the difference in interface for the first set of arguments in both scenarios:
>>> fn1([element_a, element_b], option_1=True, option_2=True) # list
vs
>>> fn2(element_a, element_b, option_1=True, option_2=True) # separate
Which would have a function header which looks something like:
def fn1(elements, option_1=False, option_2=False):
    pass
While the other would be using a star argument under the hood:
def fn2(*elements, option_1=False, option_2=False):
    pass
If an argument is positioned after the star argument it essentially forces the user to use it as a keyword argument...
With that in mind, you can check out the source code for both Compose and Sequential and you will notice that both only expect a list of elements and no additional arguments afterwards. So in this scenario, it might have been preferable to go with Sequential's approach using the star argument... but this is just personal preference!
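A small runnable sketch of the two interfaces, using plain strings instead of real layers or transforms. The bodies of fn1 and fn2 are assumptions made only so the difference is observable; they do not mirror the PyTorch or Torchvision implementations.

```python
def fn1(elements, option_1=False, option_2=False):
    # elements is one list; options may be given positionally or by keyword
    return list(elements), option_1, option_2


def fn2(*elements, option_1=False, option_2=False):
    # every positional argument lands in elements; options are keyword-only
    return list(elements), option_1, option_2


# With fn1 a trailing positional value is read as option_1.
result_list = fn1(["a", "b"], True)

# With fn2 the same trailing value is swallowed as a third element.
result_star = fn2("a", "b", True)

# To set an option on fn2 it has to be named explicitly.
result_star_kw = fn2("a", "b", option_1=True)
```

The second call is the one that tends to surprise: the star argument collects everything positional, so option_1 keeps its default.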

In Python, is it better to have two parameters for different types of the same thing, or handle one parameter of multiple types?

Please help resolve this debate on my team:
If we have a function that takes either a point (list of 3 values) or a list of points, then we have either:
def f(point=None, points=None):
    if points is None: points = []
    if point is not None: points.append(point)
    .... stuff with "points" ....
or
def g(points):
    if len(points) > 0 and not isinstance(points[0],list):
        points = [points]
    .... stuff with "points" ....
to be called as,
x = f(points=[[1,2,3],[4,5,6]])
y = f(point=[1,2,3])
z = g(points=[[1,2,3],[4,5,6]])
w = g(points=[1,2,3])
where x,y,z, and w should all be the same (if f and g are the same function).
That is to say, internally both f and g want a 2-dimensional array, but provide mechanisms to accept a single-dimensional array.
The difference is that f has an argument for each type, and therefore each argument expects a single, specific type, but g has a single argument that may be of multiple types.
The question is, which style is more "Pythonic"? (References to PEPs more than welcome.. I couldn't find anything.)
Edit: I forgot to mention that ostensibly the reason for supporting 1-dimensional input is backwards compatibility. That is to say, f(point, axis) already exists, and we are changing it to f(points, axes), and we are trying to avoid changing the callers as they are external to this code base. I agree that perhaps the right answer is to not do this at all and force the caller to be consistent with their types, but I ask the reader to assume this would be difficult and to provide some clarity on the two choices I am asking about, if at all possible. Thanks!
As the signatures of your f and g differ, I'm assuming you control how the function is or will be called. In this case, the most logical thing to do would be to accept only two dimensional lists and call it like this:
x = h([[1,2,3], [4,5,6]])
y = h([[1,2,3]])
A list of points with only one element is no special case unless it is semantically different from a multipoint list in your context or it would change the function's behavior.
I'd suggest using *args (or, well, call it *points possibly) and making the user unpack his points on his end.
This allows you to call f(point), f(point1, point2, point3), or f(*list_of_points), with all three options working fine and you can simply do:
def f(*points):
    ...
    for point in points:
        do_pointy_stuff(point)
...which is similar to g2 but without hacky isinstance checks which would fail if someone decided to throw in a tuple or something instead.
However, there is a minor potential pitfall here - the user must knowingly unpack his lists of points. If you don't pay attention when using a single argument, you may pass a *[points] rather than a *[[list of points]], which would result in the function treating each coordinate as a separate argument in itself and most likely throwing an error.
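A short sketch of the three call styles and of the pitfall just described. The function sum_coordinates is a hypothetical stand-in for the real per-point work.

```python
def sum_coordinates(*points):
    # points is always a tuple holding every positional argument
    return [sum(point) for point in points]


single = sum_coordinates([1, 2, 3])              # one point
several = sum_coordinates([1, 2, 3], [4, 5, 6])  # several points

list_of_points = [[1, 2, 3], [4, 5, 6]]
unpacked = sum_coordinates(*list_of_points)      # caller unpacks a list of points

# Pitfall: unpacking a single point turns each coordinate into its own "point".
try:
    sum_coordinates(*[1, 2, 3])
    pitfall = None
except TypeError as exc:
    pitfall = exc  # sum() cannot iterate over a bare int
```

Here the mistake fails loudly, but a function that does not iterate over each point might silently accept the coordinates as points.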

Passing subset reference of array/list as an argument in Python

I'm kind of new (1 day) to Python so maybe my question is stupid. I've already looked here but I can't find my answer.
I need to modify the content of an array at a random offset with a random size.
I have a Python API to interface a DLL for a USB device, which I can't modify. There is a function just like this one:
def read_device(in_array):
    # The DLL accesses the USB device, reads it into in_array of type array('B')
    # in_array can be an int (the function will create an array with the int
    # size and return it), or can be an array or, can be a tuple (array, int)
In MY code, I create an array of, let's say, 64 bytes and I want to read 16 bytes starting from the 32nd byte. In C, I'd give &my_64_array[31] to the read_device function.
In Python, if I give:
read_device(my_64_array[31:31+16])
it seems that in_array is a reference to a copy of the given subset, therefore my_64_array is not modified.
What can I do ? Do I have to split my_64_array and recombine it after ??
Seeing as you are not able to change the API code, the best method is to have the function fill a small temporary array that you then assign to your existing 64-byte array after the function call.
So that would be something like the following, not knowing the exact specifics of your API call.
the_64_array[31:31+16] = read_device(16)
It's precisely as you say: slicing creates a new object holding a copy of that range, and the function receives a reference to that copy rather than to the original array.
Two possible methods to add it later (assuming read_device returns the relevant slice):
my_64_array = my_64_array[:31] + read_device(my_64_array[31:31+16]) + my_64_array[31+16:]
# equivalently, but around 33% faster for even small arrays (length 10), 3 times faster for (length 50)...
my_64_array[31:31+16] = read_device(my_64_array[31:31+16])
So I think you should be using the latter.
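A minimal sketch showing both the copy made by slicing and the slice-assignment fix. The function fake_read_device is a hypothetical stand-in for the real read_device, assumed here to return a new array('B') of the requested length.

```python
from array import array


def fake_read_device(size):
    # Hypothetical stand-in: returns `size` bytes with the values 1..size
    return array('B', range(1, size + 1))


my_64_array = array('B', [0] * 64)

# Slicing copies: changing the slice leaves the original untouched.
chunk = my_64_array[31:31 + 16]
chunk[0] = 255
original_after_slice_edit = my_64_array[31]  # still 0

# Slice assignment writes the returned data back in place.
my_64_array[31:31 + 16] = fake_read_device(16)
```

Note that slice assignment on an array.array requires the right-hand side to be an array with the same typecode, which is the case when the device function returns array('B').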
If it were a modifiable function (but it's not in this case!), you could change the function's arguments so that it takes the entire array plus the range to fill:
def read_device(the_64_array, start=31, end=47):
    # some code
    the_64_array[start:end] = ... # modifies `the_64_array` in place
and call read_device(my_64_array) or read_device(my_64_array, 31, 31+16).
When reading a list subset you're calling __getitem__ with a slice(x, y) argument of that list. In your case these statements are equal:
my_64_array[31:31+16]
my_64_array.__getitem__(slice(31, 31+16))
This means that the __getitem__ function can be overridden in a subclass to obtain different behaviour.
You can also set the same subset using a[1:3] = [1,2,3] in which case it'd call a.__setitem__(slice(1, 3), [1,2,3])
So I'd suggest either of these:
pass the list (my_64_array) and a slice object to read_device instead of passing the result of __getitem__, after which you could read the necessary data and set the corresponding offsets. No subclassing. This is probably the best solution in terms of readability and ease of development.
subclassing list, overriding __getitem__ and __setitem__ to return instances of that subclass with a parent reference, and then change all modifying or reading methods of a list to reference a parent list instead. This might be a little tricky if you're new to python, but basically, you'd exploit that python list properties are largely defined by the methods inside a list instance. This is probably better in terms of performance as you can create references.
If read_device returns the resulting list, and that list is of equal size, you can do this: a[x:y] = read_device(a[x:y])

How do you paint a stroke in Gimp with Python-fu?

I'm using Python-fu's gimp.pdb.gimp_paintbrush_default(layer, 2, [10,10, 20,20]), but no matter how many strokes I tell it to paint, it only ever paints the first (x,y) (in this case, (10,10)). Is it expecting a different format? The documentation for the function isn't for the Python plugin, and simply says that the third parameter expects a variable of type FLOATARRAY. I assume the Python version uses a list here, but it doesn't seem to look ahead to any values after the first two. How can I get it to paint more than one control point?
The second parameter you pass, in this case 2, tells GIMP the length of the list in the following parameter. Since your list holds four values, a length of 2 only covers the first (x, y) pair. In Python we are used to the called function finding the length of a list on its own, but these calls in GIMP Python scripting are a 1:1 mapping of the GIMP API, which is written in C and shared by several other languages.
In C there is no way to know the length of an array passed to a function unless it is passed explicitly, hence the need for this parameter.
Try doing this instead:
points = [10,10, 20,20]
pdb.gimp_paintbrush_default(layer, len(points), points)

Vector in python

I'm working on this project which deals with vectors in python. But I'm new to python and don't really know how to crack it. Here's the instruction:
"Add a constructor to the Vector class. The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these, then consider this argument to be the length of the Vector instance. In this case, construct a Vector of the specified length with each element is initialized to 0.0. If the length is negative, raise a ValueError with an appropriate message. If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence. If the argument is not used as the length of the vector and if it is not a sequence, then raise a TypeError with an appropriate message.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
I'm not sure how to do the class type checking, as well as how to initialize the vector based on the given object. Could someone please help me with this? Thanks!
Your instructor seems not to "speak Python as a native language". ;) The entire concept for the class is pretty silly; real Python programmers just use the built-in sequence types directly. But then, this sort of thing is normal for academic exercises, sadly...
Add a constructor to the Vector class.
In Python, the common "this is how you create a new object and say what it's an instance of" stuff is handled internally by default, and then the baby object is passed to the class' initialization method to make it into a "proper" instance, by setting the attributes that new instances of the class should have. We call that method __init__.
The constructor should take a single argument. If this argument is either an int or a long or an instance of a class derived from one of these
This is tested by using the builtin function isinstance. You can look it up for yourself in the documentation (or try help(isinstance) at the REPL).
In this case, construct a Vector of the specified length with each element is initialized to 0.0.
In our __init__, we generally just assign the starting values for attributes. The first parameter to __init__ is the new object we're initializing, which we usually call "self" so that people understand what we're doing. The rest of the arguments are whatever was passed when the caller requested an instance. In our case, we're always expecting exactly one argument. It might have different types and different meanings, so we should give it a generic name.
When we detect that the generic argument is an integer type with isinstance, we "construct" the vector by setting the appropriate data. We just assign to some attribute of self (call it whatever makes sense), and the value will be... well, what are you going to use to represent the vector's data internally? Hopefully you've already thought about this :)
If the length is negative, raise a ValueError with an appropriate message.
Oh, good point... we should check that before we try to construct our storage. Some of the obvious ways to do it would basically treat a negative number the same as zero. Other ways might raise an exception that we don't get to control.
If the argument is not considered to be the length, then if the argument is a sequence (such as a list), then initialize with vector with the length and values of the given sequence.
"Sequence" is a much fuzzier concept; lists and tuples and what-not don't have a "sequence" base class, so we can't easily check this with isinstance. (After all, someone could easily invent a new kind of sequence that we didn't think of). The easiest way to check if something is a sequence is to try to create an iterator for it, with the built-in iter function. This will already raise a fairly meaningful TypeError if the thing isn't iterable (try it!), so that makes the error handling easy - we just let it do its thing.
Assuming we got an iterator, we can easily create our storage: most sequence types (and I assume you have one of them in mind already, and that one is certainly included) will accept an iterator for their __init__ method and do the obvious thing of copying the sequence data.
Next implement the __repr__ method to return a string of python code which could be used to initialize the Vector. This string of code should consist of the name of the class followed by an open parenthesis followed by the contents of the vector represented as a list followed by a close parenthesis."
Hopefully this is self-explanatory. Hint: you should be able to simplify this by making use of the storage attribute's own __repr__. Also consider using string formatting to put the string together.
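The steps above can be put together roughly as follows. This is only a sketch under a few assumptions: the storage attribute is a plain list named _data (a hypothetical choice), the code targets Python 3 where int covers what Python 2 split into int and long, and iter() is used as the "is it a sequence" test, so any iterable is accepted.

```python
class Vector(object):
    def __init__(self, arg):
        if isinstance(arg, int):
            # Integer (or subclass of it): treat it as the length.
            if arg < 0:
                raise ValueError("Vector length cannot be negative")
            self._data = [0.0] * arg
        else:
            # iter() raises a TypeError on its own if arg is not iterable.
            self._data = list(iter(arg))

    def __repr__(self):
        # Reuse the list's own repr for the contents.
        return "Vector({!r})".format(self._data)


zeros = repr(Vector(3))
from_list = repr(Vector([1, 2, 3]))
```

On Python 2 the isinstance check would need to be isinstance(arg, (int, long)) to match the assignment text.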
Everything you need to get started is here:
http://docs.python.org/library/functions.html
There are many examples of how to check types in Python on StackOverflow (see my comment for the top-rated one).
To initialize a class, use the __init__ method:
class Vector(object):
    def __init__(self, sequence):
        self._internal_list = list(sequence)
Now you can call:
my_vector = Vector([1, 2, 3])
And inside other functions in Vector, you can refer to self._internal_list. I put _ before the variable name to indicate that it shouldn't be changed from outside the class.
The documentation for the list function may be useful for you.
You can do the type checking with isinstance.
The initialization of a class is done with an __init__ method.
Good luck with your assignment :-)
This may or may not be appropriate depending on the homework, but in Python programming it's not very usual to explicitly check the type of an argument and change the behaviour based on that. It's more normal to just try to use the features you expect it to have (possibly catching exceptions if necessary to fall back to other options).
In this particular example, a normal Python programmer implementing a Vector that needed to work this way would try using the argument as if it were an integer/long (hint: what happens if you multiply a list by an integer?) to initialize the Vector and if that throws an exception try using it as if it were a sequence, and if that failed as well then you can throw a TypeError.
The reason for doing this is that it leaves your class open to working with other objects types people come up with later that aren't integers or sequences but work like them. In particular it's very difficult to comprehensively check whether something is a "sequence", because user-defined classes that can be used as sequences don't have to be instances of any common type you can check. The Vector class itself is quite a good candidate for using to initialize a Vector, for example!
But I'm not sure if this is the answer your teacher is expecting. If you haven't learned about exception handling yet, then you're almost certainly not meant to use this approach so please ignore my post. Good luck with your learning!
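A sketch of that try-it-first approach, written as a standalone helper so it stays short. The name make_storage is hypothetical, and the negative-length check is kept explicit because multiplying a list by a negative integer quietly yields an empty list rather than an error.

```python
def make_storage(arg):
    try:
        # Works for anything that behaves like an integer.
        storage = [0.0] * arg
    except TypeError:
        try:
            # Not integer-like: try to use it as a sequence instead.
            return list(arg)
        except TypeError:
            raise TypeError(
                "expected a length or a sequence, got %r" % (arg,))
    if arg < 0:
        raise ValueError("length cannot be negative")
    return storage


by_length = make_storage(2)
by_sequence = make_storage((1, 2, 3))
```

A Vector.__init__ could then store the result of such a helper on self, and another Vector would work as an argument as soon as the class supports iteration.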
