Python 3 kwargs insight

This has been a source of confusion and frustration for years now. Say you import a particularly poorly documented module, and some method that you need has only **kwargs for its arguments. How are you supposed to know which keys that method is checking for?
def test(**kwargs):
    if 'greeting' in kwargs:
        print(kwargs['greeting'])
If I were to call test, how would I know that 'greeting' is something the method was looking for?
test(greeting='hi')
The IDE can help with some simplistic cases, but most use cases seem to be out of the IDE's scope.

Think of kwargs as a dictionary. There is no way to tell from the outside what key-value combinations the method will accept (in your case the test method is essentially a black box) but this is the point of having documentation. Without kwargs, some function headers would get extremely cluttered.
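To illustrate the "black box" point, here is a small sketch using the standard inspect module on the test function from the question. It only shows that introspection reports a catch-all **kwargs parameter and nothing about the individual keys; the docstring is added here purely for illustration.

```python
import inspect


def test(**kwargs):
    """Print a greeting if one was supplied (illustrative docstring)."""
    if 'greeting' in kwargs:
        print(kwargs['greeting'])


# The signature only reveals that arbitrary keyword arguments are accepted.
signature = inspect.signature(test)
print(signature)  # (**kwargs)

# The keys the body checks for ('greeting') are not part of the signature.
param = signature.parameters['kwargs']
print(param.kind is inspect.Parameter.VAR_KEYWORD)  # True
```

This is why the docstring (or the source) ends up being the only place where the accepted keys can be listed.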

Use documentation!
The subprocess module's docs are a good example. If you are using a newer version of Python (3.7, or 3.6 with the backport), consider using dataclasses as an alternative to kwargs, if it fits your use case.

If it's not documented, your only recourse is to read the source.

A **kwargs parameter is used when you don't want to explicitly define the arguments that must be passed by name.
A trivial example: a function takes another function as an argument, and that function is not known in advance and may accept different keyword arguments each time.
def foo(func, **kwargs):
    print(func)
    return func(**kwargs)
Here you won't know what the function is explicitly looking for.
In your example, you could instead write
def foo(greeting=None):
which shows that the function is looking for greeting, but that it can be None.
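A small sketch of the forwarding case described above. The helper functions greet and add are hypothetical and exist only to show that foo itself never needs to know which keywords are valid; the callee decides.

```python
def foo(func, **kwargs):
    # foo does not inspect kwargs; it only forwards them to func.
    print(func)
    return func(**kwargs)


def greet(greeting=None):
    # Explicit parameter: the signature documents what is accepted.
    return greeting


def add(a, b=0):
    return a + b


print(foo(greet, greeting='hi'))  # hi
print(foo(add, a=1, b=2))         # 3

# A keyword the callee does not accept is rejected by the callee, not by foo.
try:
    foo(greet, greting='hi')
except TypeError as exc:
    print('rejected:', exc)
```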

Related

**kwargs.pop(x) versus defining x as a function parameter

Let's say you're writing a child class whose constructor passes its unused kwargs up to the parent constructor, but your class has an argument x that it needs to store and that shouldn't be passed to the parent.
I have seen two different approaches to this:
def __init__(self, **kwargs):
    self.x = kwargs.pop('x', 'default')
    super().__init__(**kwargs)
and
def __init__(self, x='default', **kwargs):
    self.x = x
    super().__init__(**kwargs)
Is there ever any functional difference between these two constructors? Is there any reason to use one over the other?
The only difference I can see is that the second form, which defines x in the signature, allows the user to better see it as a possible argument, or an IDE to offer it as an autocomplete option. Or in Python 3.5+, you could add a type annotation to x. Does that make the first form objectively worse?

As already mentioned by Giacomo Alzetta in a comment, the second version allows x to be passed as a positional argument, while the first only allows it as a named argument. In other words, with the second form you can use both Child(x=2) and Child(2), while the first only supports Child(x=2).
Also, when using inspection to check the method's signature, the second form will clearly show the existence of the x parameter, while the first won't.
And finally, the second version will yield a slightly clearer exception if x is required (declared without a default) and not passed: a TypeError naming the missing argument, rather than a KeyError from kwargs.pop('x').
And that's for the functional differences.
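The differences listed above can be checked with a short sketch. The class names Parent, PopChild and ExplicitChild are invented for this example; the two child constructors are the ones from the question.

```python
import inspect


class Parent:
    def __init__(self, **kwargs):
        self.extra = kwargs


class PopChild(Parent):
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)


class ExplicitChild(Parent):
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)


# Both forms accept x by keyword.
print(PopChild(x=2).x, ExplicitChild(x=2).x)  # 2 2

# Only the explicit form accepts x positionally.
print(ExplicitChild(2).x)  # 2
try:
    PopChild(2)
except TypeError as exc:
    print('PopChild(2) failed:', exc)

# Only the explicit form exposes x to introspection.
print(inspect.signature(PopChild.__init__))       # (self, **kwargs)
print(inspect.signature(ExplicitChild.__init__))  # (self, x='default', **kwargs)
```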
Is there any reason to use one over the other?
Well... As a general rule, it's cleaner (best practice) to use explicit arguments whenever possible, even if only for readability, and from experience it does usually make maintenance easier indeed. So from this point of view, the second form can be seen as "objectively better" than the first.
This being said, when the parent method has dozens of mostly optional and rarely used arguments (django.forms.Form, I'm looking at you) AND you also want to preserve the positional argument order, it can be convenient to just use the generic *args, **kwargs signature for the child and force the additional param(s) to be passed as kwargs. Assuming you clearly document this in the docstring, it's still explicit enough (as far as I'm concerned, YMMV), and it also avoids a lot of clutter (you can have a look at django.forms.Form for a concrete example of what I mean here).
So as always with "best practices" and other golden rules, you have to understand and weigh the pros and cons with respect to the concrete case at hand.
PS: just to make things clear, django's Form class signature makes perfect sense so I'm not ranting here - it's just one of those cases where there's no "beautiful" solution to the problem, period.
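A minimal sketch of the *args, **kwargs variant described above. The Parent class here is a hypothetical stand-in with a few optional parameters; it is not Django's Form and its parameter names are only illustrative.

```python
class Parent:
    # Hypothetical parent with several optional positional parameters.
    def __init__(self, data=None, prefix=None, label_suffix=None, **options):
        self.data = data
        self.prefix = prefix
        self.label_suffix = label_suffix
        self.options = options


class Child(Parent):
    def __init__(self, *args, **kwargs):
        """Accept the same arguments as Parent, plus:

        x -- keyword-only extra value stored on the child
             (defaults to 'default'); it is not passed to Parent.
        """
        self.x = kwargs.pop('x', 'default')
        # Positional arguments keep the parent's order untouched.
        super().__init__(*args, **kwargs)


child = Child({'name': 'a'}, 'form', x=2)
print(child.x, child.prefix)  # 2 form
```

The child never has to repeat the parent's parameter list, at the cost of x being discoverable only through the docstring.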

Aside from the obvious differences in code clarity, there might be a small difference in the speed of calling the function, in this case the __init__() method.
If you can, define all necessary arguments explicitly in both methods, with default values where you have them, pass them on in the usual way, and leave out the ones you do not want to forward.
This way the code stays clear and the call speed stays the same.
If you need some micro-optimization, use timeit to check what works faster.
I expect that the version with x added as a parameter will perhaps be the winner, because getting its value directly from the local variables is faster and the kwargs dict is smaller.
When you use "normal" arguments, they are automatically bound as local variables of the function.
When you use *args and/or **kwargs, an additional tuple and/or dict is created from the arguments you passed in the call and added as a new local variable.
When you pass them on to the next function, they are unpacked to match that function's call signature. In both operations you lose a tiny bit of speed.
If you also remove something from the kwargs dictionary (x = kwargs.pop("x")), you lose some more speed.
Looking at both versions, it seems that their call speed would be about equal, but you should check. If you do not need an extra 0.000001 seconds when initializing your instances, both options are fine; just choose the one you like most.
But again, if you are free to do so, and if it will not greatly impair the code's maintainability, define all arguments and their default values and pass them on one by one.
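A minimal timeit sketch for checking this yourself. The class names are invented for the example and the parent does nothing, so the numbers only compare the overhead of the two constructor styles; results will vary by machine and Python version, which is why no expected output is shown.

```python
import timeit


class Parent:
    def __init__(self, **kwargs):
        pass


class PopChild(Parent):
    def __init__(self, **kwargs):
        self.x = kwargs.pop('x', 'default')
        super().__init__(**kwargs)


class ExplicitChild(Parent):
    def __init__(self, x='default', **kwargs):
        self.x = x
        super().__init__(**kwargs)


# Time the same call against both constructor styles.
pop_time = timeit.timeit(lambda: PopChild(x=1, y=2), number=100_000)
explicit_time = timeit.timeit(lambda: ExplicitChild(x=1, y=2), number=100_000)

print(f'kwargs.pop: {pop_time:.4f}s  explicit x: {explicit_time:.4f}s')
```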

What do the args and kwargs do in pandas.DataFrame.clip?

I've been working on the documentation for pandas.DataFrame.clip. I need to document what the *args and **kwargs do for that function.
Here is a link to the branch I am working on. The *args and **kwargs are passed to a function called validate_clip_with_axis. Here is the code for that function.
I'm not really sure what validate_clip_with_axis is doing or how the *args and **kwargs play a role in pandas.DataFrame.clip. In particular, I'm not even sure what sorts of arguments I can include in *args and **kwargs.
What does validate_clip_with_axis do? How does it relate to pandas.DataFrame.clip? Could someone provide me with an example?

They seem to be used for compatibility with numpy libraries [1] in this file here.
In the original file, args, kwargs are being passed into nv.validate_clip_with_axis. Note that nv is imported here.
Since these are only used internally, and, as jpp pointed out, not even exposed in the Pandas docs, you probably don't need to worry about documenting them.
[1] https://github.com/pandas-dev/pandas/blob/fb556ed64cd0e905e31fe39723a8a4bca9cb112d/pandas/compat/numpy/function.py#L1-L19
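To give a feel for what a compatibility check of this kind can look like, here is a generic sketch. It is not pandas's code: the names COMPAT_DEFAULTS, validate_compat_kwargs and clip are hypothetical, and the assumption is that extra numpy-style keywords are accepted only so that a numpy-style call does not fail, while anything other than the default value is rejected.

```python
# Hypothetical table of tolerated keywords and their only allowed values.
COMPAT_DEFAULTS = {'out': None}


def validate_compat_kwargs(kwargs, defaults=COMPAT_DEFAULTS):
    """Reject unknown keywords and any non-default value."""
    for name, value in kwargs.items():
        if name not in defaults:
            raise TypeError(f'unexpected keyword argument {name!r}')
        if value is not defaults[name]:
            raise ValueError(f'the {name!r} parameter is not supported')


def clip(values, lower, upper, **kwargs):
    # The extra keywords are validated and then ignored.
    validate_compat_kwargs(kwargs)
    return [min(max(v, lower), upper) for v in values]


print(clip([1, 5, 10], 2, 8, out=None))  # [2, 5, 8]

try:
    clip([1, 5, 10], 2, 8, out=[])
except ValueError as exc:
    print('rejected:', exc)
```

Under that assumption, there is nothing useful a caller can put in the extra arguments, which matches the advice above that they do not need user-facing documentation.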

Python Method Signature for Different Runtime Execution Data

Could someone tell me whether this idea is feasible in Python?
I want to have a method whose parameters do not have a fixed data type.
For example:
Foo(data1, data2) <-- Method Definition in Code
Foo(2,3) <---- Example of what would be executed in runtime
Foo(s,t) <---- Example of what would be executed in runtime
I know the code could work if I change Foo(s,t) to Foo("s","t"). But I am trying to make the code smarter so that it recognizes the command without the "" ...

singledispatch might be an answer. It transforms a function into a generic function, which can have different behaviors depending on the type of its first argument.
You can see a concrete example in the functools documentation. You will need to do some extra work if you want generic dispatch on more than one argument.
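A short sketch of functools.singledispatch applied to the Foo example from the question (written as foo here). The int and str behaviors are invented for illustration. Note that bare names such as s and t still have to be bound to values before the call, otherwise Python raises NameError; dispatch happens on the type of the first argument only.

```python
from functools import singledispatch


@singledispatch
def foo(data1, data2):
    # Fallback for types without a registered implementation.
    raise NotImplementedError(f'unsupported type: {type(data1).__name__}')


@foo.register(int)
def _(data1, data2):
    return data1 + data2


@foo.register(str)
def _(data1, data2):
    return f'{data1}-{data2}'


print(foo(2, 3))  # 5

# s and t must be defined names; here they are bound to strings.
s, t = 's', 't'
print(foo(s, t))  # s-t
```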

How can a Python function determine if (and how) a kwarg default was explicitly passed?

Suppose a function spam has signature spam(ham=None). The following three calls will all cause the local variable ham in spam's namespace to have value None:
spam()
spam(None)
spam(ham=None)
How can spam find out which of these three alternatives was actually used?

It can't. This question describes a way to use a decorator to wrap the function and set the passed arguments as attributes on it. But there is no way to find out from within spam without help from outside. You have to intercept the call with a function that accepts **kwargs and use that to store the information.
However, you should be cautious of doing this. The different ways of passing in arguments are supposed to work the same. If you do something that makes them work differently, you will confuse many people who try to use your function.
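A minimal sketch of the decorator approach described above. The decorator name record_call and the attributes last_args and last_kwargs are made up for this example; the wrapper intercepts the call with *args and **kwargs and stores what was actually passed.

```python
import functools


def record_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Remember exactly how the caller supplied the arguments.
        wrapper.last_args = args
        wrapper.last_kwargs = dict(kwargs)
        return func(*args, **kwargs)

    wrapper.last_args = ()
    wrapper.last_kwargs = {}
    return wrapper


@record_call
def spam(ham=None):
    return ham


spam()
print(spam.last_args, spam.last_kwargs)  # () {}

spam(None)
print(spam.last_args, spam.last_kwargs)  # (None,) {}

spam(ham=None)
print(spam.last_args, spam.last_kwargs)  # () {'ham': None}
```

Inside spam itself, ham is None in all three cases; only the wrapper can tell them apart.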

SQLAlchemy sqlalchemy.sql.expression.select vs. sqlalchemy.sql.expression.Select

So I'm brand new to SQLAlchemy, and I'm trying to use the SQL Expression API to create a SELECT statement that specifies the exact columns to return. I found both a class and a function defined in the sqlalchemy.sql.expression module and I'm not too sure which to use... Why do they have both a class and a function? When would you use one over the other? And would anyone be willing to explain why they need to have both in their library? It doesn't really make much sense to me to be honest, other than just to confuse me. :) JK
Thanks for the help in advance!

Use the source.
Here's the implementation of the select function, from the source code:
def select(columns=None, whereclause=None, from_obj=[], **kwargs):
    """Returns a ``SELECT`` clause element.
    (... long docstring ...)
    """
    return Select(columns, whereclause=whereclause, from_obj=from_obj, **kwargs)
So, it is exactly the same.

The expression package provides Python functions to do everything. In some cases these functions return a class instance built verbatim from the function's arguments, and in other cases they compose an object from several components. The original idea was that the functions would do a lot more composition than they ended up doing. In any case, the package prefers to stick to PEP 8, with classes in CamelCase and functions in all lowercase, and wanted the front-end API to be all lowercase, so you have the public "constructor" functions.
The SQL expression language is very easy to grok if you start with the tutorial.
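The "lowercase constructor function in front of a CamelCase class" pattern described above can be sketched generically. The Select class below is a hypothetical stand-in written for this example, not SQLAlchemy's implementation, and the distinct keyword is just a sample option.

```python
class Select:
    # Hypothetical stand-in class; not SQLAlchemy's implementation.
    def __init__(self, columns=None, whereclause=None, **kwargs):
        self.columns = list(columns or [])
        self.whereclause = whereclause
        self.options = kwargs


def select(columns=None, whereclause=None, **kwargs):
    """Public lowercase front end that simply builds a Select."""
    return Select(columns, whereclause=whereclause, **kwargs)


stmt = select(['id', 'name'], distinct=True)
print(type(stmt).__name__, stmt.columns, stmt.options)
# Select ['id', 'name'] {'distinct': True}
```

Because the function forwards **kwargs unchanged, anything the class constructor accepts can also be passed through the function.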

I think it's pretty much the same. The documentation says for select (the function):
The returned object is an instance of Select.
As you can pass the select function the same parameters that Select.__init__() accepts, I don't really see a difference. At first glance the arguments of the class constructor seem to be a superset of the function's. But the function can be passed any of the constructor's keyword arguments.
