I'm trying to use pyspark's applyInPandas in my Python code. The problem is that the function I want to pass to it is defined in the same class, so its signature is def func(self, key, df). This becomes an issue because applyInPandas errors out, saying I'm passing too many arguments to the underlying func (it allows at most a key and a df param, so the self is causing the issue). Is there any way around this?
The underlying goal is to run a pandas function on DataFrame groups in parallel.
As OP mentioned, one way is to just use @staticmethod, which may not be desirable in some cases.
The pyspark source code for creating pandas_udf uses inspect.getfullargspec().args (lines 386 and 436); this includes self even when the class method is called on an instance. I would think this is a bug on their part (it may be worthwhile to raise a ticket).
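You can verify this behavior outside pyspark with a quick check (a minimal reproduction, not the pyspark code itself):
import inspect

class A:
    def func(self, key, df):
        pass

# even on a bound method, getfullargspec still reports 'self'
print(inspect.getfullargspec(A().func).args)  # ['self', 'key', 'df']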
To overcome this, the easiest way is to use functools.partial, which changes the argspec, i.e. removes the self argument and restores the number of args to 2.
This is based on the idea that calling an instance method is the same as calling the method directly on the class and supplying the instance as the first argument (because of the descriptor magic):
A.func(A(), *args, **kwargs) == A().func(*args, **kwargs)
In a concrete example,
import functools
import inspect
class A:
def __init__(self, y):
self.y = y
def sum(self, a: int, b: int):
return (a + b) * self.y
def x(self):
        # call the method via the class, supplying the instance as the first argument
f = functools.partial(A.sum, self)
print(f(1, 2))
print(inspect.getfullargspec(f).args)
A(2).x()
This will print
6 # can still use 'self.y'
['a', 'b'] # 2 arguments (without 'self')
Then, in OP's case, one can simply do the same for key, df parameters:
class A:
def __init__(self):
...
def func(self, key, df):
...
def x(self):
f = functools.partial(A.func, self)
        self.df.groupby(...).applyInPandas(f, schema=...)  # applyInPandas also requires an output schema
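For completeness, here is a minimal self-contained sketch of the whole pattern; the column names, the schema string, and the group-wise demeaning logic are illustrative, not from the question:
import functools

import pandas as pd
from pyspark.sql import SparkSession

class A:
    def __init__(self, df):
        self.df = df

    def func(self, key, pdf: pd.DataFrame) -> pd.DataFrame:
        # illustrative group-wise transform: subtract the group mean
        return pdf.assign(v=pdf["v"] - pdf["v"].mean())

    def run(self):
        # partial() hides 'self', so pyspark sees exactly (key, df)
        f = functools.partial(A.func, self)
        return self.df.groupby("id").applyInPandas(f, schema="id long, v double")

spark = SparkSession.builder.getOrCreate()
sdf = spark.createDataFrame([(1, 1.0), (1, 2.0), (2, 3.0)], ["id", "v"])
A(sdf).run().show()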
I have the following init function, which receives a lot of args to run the class (the args have default values if the user doesn't input anything, or can be values input by the user). What's the most elegant way to reduce the number of variables (i.e. not show a lot of args in the init) without losing readability? Should I use *args (like def __init__(self, *args))?
class World(object):
    def __init__(self, grid_size=(GRID_WIDTH, GRID_HEIGHT),
                 cell_size=(CELL_WIDTH, CELL_HEIGHT),
                 obstacles_position=OBSTACLES,
                 recharge_position=RECHARGE_ZONE,
                 treadmill_position=TREADMILL_ZONE,
                 workers_positions=WORKERS_POS,
                 delivery_positions=DELIVERY_ZONE):
        # some code below

def main():
    # init some libraries
    world = World()
    # Do a while loop with the input variables from the World class

if __name__ == '__main__':
    main()
Note: I'm using Python 3+.
In my opinion, you should probably stick with all of the function parameters in the function header (as you currently have it). This makes your code more readable, lets Python tell you which arguments you may have omitted, plays nicely with Python's built-in help() function, allows third-party IDE code hinting, and so on.
If you really want to shorten the function header, you could use *args and **kwargs which will take any variadic arguments, e.g.:
def func(*args, **kwargs):
    print("args:", args)
    print("kwargs:", kwargs)
Usage would look like this:
>>> func(1, 2, 3, one="one", two="two")
args: (1, 2, 3)
kwargs: {'one': 'one', 'two': 'two'}
Therefore, you could theoretically refactor your class to look something like the below. This code doesn't handle default values or do any error checking at all -- it just sets any keyword arguments as attributes on the instance:
class World(object):
def __init__(self, **kwargs):
for key, value in kwargs.items():
setattr(self, key, value)
And usage:
>>> w = World(one=1, two=2, three=3)
>>> w.one
1
>>> w.two
2
>>> w.three
3
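If you want to keep defaults and some basic validation while still collapsing the header, one possibility is to merge kwargs into a dict of defaults (a sketch only; the default values here are placeholders, not the question's real constants):
class World(object):
    # placeholder defaults; the real ones would be the module-level constants
    DEFAULTS = {
        "grid_size": (10, 10),
        "cell_size": (32, 32),
        "obstacles_position": (),
    }

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(self.DEFAULTS)
        if unknown:
            raise TypeError("unexpected arguments: %s" % ", ".join(sorted(unknown)))
        for key, default in self.DEFAULTS.items():
            setattr(self, key, kwargs.get(key, default))

w = World(grid_size=(20, 20))
print(w.grid_size)  # (20, 20)
print(w.cell_size)  # (32, 32)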
If I have a class like this:
class foo(object):
def __init__(self, a, b=None, c=None, d=None):
print a, b, c, d
and a derived class like this:
class bar(foo):
def __init__(self, *args, **kwargs):
if "c" in kwargs:
kwargs['c'] = 'else' # CHANGE C IFF IT IS PRESENT
super(bar, self).__init__(*args, **kwargs)
when someone calls this constructor, they could do it like this:
bar('a','b','c','d')
or they could call it like this:
bar('a', c='something')
In the second case, my constructor works as planned, but in the first call c sneaks through in the args tuple. It looks like I would have to watch the length of the args tuple as well as kwargs, and that seems brittle to the point of being unusable. Is there anything you can do to make this situation better, other than just enumerating the arguments from foo in bar? (A somewhat brittle practice itself, but easier to recognize.)
How about populating kwargs with args?
class bar(foo):
def __init__(self, *args, **kwargs):
for name, value in zip(['a', 'b', 'c', 'd'], args): # <---
kwargs[name] = value # <---
args = () # <---
if "c" in kwargs:
kwargs['c'] = 'else'
super(bar, self).__init__(*args, **kwargs)
UPDATE
An alternative that uses inspect.getcallargs:
import inspect
class bar(foo):
def __init__(self, *args, **kwargs):
kwargs = inspect.getcallargs(foo.__init__, self, *args, **kwargs)
kwargs.pop('self')
if 'c' in kwargs:
kwargs['c'] = 'else'
super(bar, self).__init__(**kwargs)
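A quick check, assuming the foo from the question; both call styles now funnel through kwargs, so c is rewritten either way:
bar('a', 'b', 'c', 'd')   # prints: a b else d
bar('a', c='something')   # prints: a None else None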
This is more brittle than you think. Have you considered the situation where someone passes a keyword argument not present in foo.__init__'s argument list (e.g. bar('a', f='something'))? My recommendation would be to require bar to take keyword arguments only, and then filter out the keys not present in foo.__init__'s argument list (which you can determine via introspection using inspect.getargspec, or the newer Signature and Parameter objects starting in Python 3.3, if the arguments may change). That has its own form of brittleness, of course, since a programmer using bar would need to know the relevant argument names for foo's constructor; but depending on what bar is being used for, they may need to know what arguments foo takes anyway, and someone who knows that usually knows the argument names as well.
Now that I'm looking at inspect again, I realize that there's another method you could use: inspect.getcallargs, which might be more useful for you. With this, you could do, say, inspect.getcallargs(super(bar, self).__init__, *[self, 1], **{'c':3, 'd':4}) and obtain the following dict: {'a': 1, 'self': <bar object>, 'b': None, 'c': 3, 'd': 4}. Then you can modify that dictionary and supply it as super(bar, self).__init__(**fixed_dict) or something of that sort. You'd still have the issue with keyword arguments not present in foo.__init__'s argument list, though (getcallargs raises the same errors foo.__init__ will when passed invalid arguments).
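A rough sketch of that keyword-only filtering idea (Python 3 syntax, using inspect.signature rather than the older getargspec; the foo is the one from the question):
import inspect

class foo(object):
    def __init__(self, a, b=None, c=None, d=None):
        print(a, b, c, d)

class bar(foo):
    def __init__(self, **kwargs):  # keyword arguments only
        # keep only the names that foo.__init__ actually accepts
        allowed = set(inspect.signature(foo.__init__).parameters) - {'self'}
        kwargs = {k: v for k, v in kwargs.items() if k in allowed}
        if 'c' in kwargs:
            kwargs['c'] = 'else'
        super().__init__(**kwargs)

bar(a='a', c='something', f='ignored')  # prints: a None else None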
I am trying to understand the use of *args and **kwds when creating subclasses in Python.
I want to understand why this code behaves the way it does. If I leave out the *args and **kwds in a call to super().__init__, I get some strange argument unpacking.
Here is my test case:
class Animal(object):
def __init__(self, moves, num_legs):
self.moves = moves
self.num_legs = num_legs
def describe(self):
print "Moves :{} , num_legs : {}".format(self.moves, self.num_legs)
class Snake(Animal):
def __init__(self, poisonous, *args, **kwds):
self.poisonous = poisonous
print "I am poisonous:{}".format(self.poisonous)
        # This next line is key. You have to use *args, **kwds.
        # But here I have deliberately used the incorrect form,
        # `args` and `kwds`, and am surprised at what it does.
super(Snake, self).__init__(args, kwds)
Now, when I create instances of the Snake subclass, which contains the erroneous call to super(…).__init__ (where I use args and kwds instead of *args and **kwds), I get some interesting “argument unpacking”.
s1 = Snake(False, moves=True, num_legs=0)
s2 = Snake(poisonous=False, moves=True, num_legs=1)
s3 = Snake(False, True, 3)
s1.describe()
s2.describe()
s3.describe()
What I get is:
Moves :() , num_legs : {'moves': True, 'num_legs': 0}
Moves :() , num_legs : {'moves': True, 'num_legs': 1}
Moves :(True, 3) , num_legs : {}
So why is it that in s1 and s2, __init__ assumes that moves = True and num_legs = 0 or 1 are keyword arguments, and sets the num_legs to a dict?
In s3, it passes both of the values into moves (in class Animal) as a tuple.
I stumbled into this as I was trying to understand argument unpacking. Sorry in advance—I don't know how to frame this question any better.
In Snake.__init__, args is a tuple of all positional arguments after poisonous and kwds is a dict of all the keyword arguments apart from poisonous. By calling
super(Snake,self).__init__(args,kwds)
you assign args to moves and kwds to num_legs in Animal.__init__. That’s exactly what you are seeing in your output.
The first two calls don't have any positional arguments apart from poisonous, so args, and consequently moves, is an empty tuple. The third call has no keyword arguments, so kwds, and consequently num_legs, is an empty dict.
In short: def __init__(self,poisonous,*args,**kwds): means: capture positional arguments in a tuple args and keyword arguments in a dictionary kwds. Similarly, super(Snake,self).__init__(*args, **kwds) means: unpack the tuple args and the dictionary kwds into arguments so that they're passed separately to __init__.
If you don't use the * and ** then you're passing args and kwds as they are, which means you're getting a tuple and a dictionary.
As you've said, you'd need to write:
super(Snake,self).__init__(*args, **kwds)
to properly pack / unpack the arguments. In your current code you're not packing / unpacking the arguments so it sets num_legs to a dictionary as that's what kwds is at the moment.
If you don't give the arguments names then they're positional arguments. Hence Snake(False,True,3) are all positional arguments.
If you do give the arguments names then they're keyword arguments: Snake(poisonous=False,moves=True,num_legs=1).
In the first case you're combining both one positional argument and two keyword arguments: Snake(False,moves=True,num_legs=0).
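Putting it all together, here is a corrected version of the question's classes (a sketch in Python 3 syntax):
class Animal(object):
    def __init__(self, moves, num_legs):
        self.moves = moves
        self.num_legs = num_legs

    def describe(self):
        print("Moves: {}, num_legs: {}".format(self.moves, self.num_legs))

class Snake(Animal):
    def __init__(self, poisonous, *args, **kwds):
        self.poisonous = poisonous
        super().__init__(*args, **kwds)  # unpack; don't pass the containers

Snake(False, moves=True, num_legs=0).describe()  # Moves: True, num_legs: 0
Snake(False, True, 3).describe()                 # Moves: True, num_legs: 3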
Keyword arguments are nicer and more intuitive than a call like Snake(False, True, 3):
Snake("Python", constrictor=True, poisonous=False)
Animal("Snail") # Snail has a foot but no leg. Defaults are good for it.
# Cobras eat other snakes, including poisonous ones: fast attacks, snake fights.
Snake("Indian cobra", moves=True, poisonous=True)
Animal("Myriapod", num_legs=750) # Changes for an idividual after every molting.
Oh, a really exciting question about Python, not only about programming. :)
It is a good idea to put the parameters that are common to all subclasses first, just as the universal self itself comes first. The next most common is a name, as in this example.
If you believe that your classes will never be modified, that they will always be used with all implemented parameters, and that you will never make a mistake in their order, then you don't need any variability; you can keep using fixed positional parameters as you are used to. That assumption is frequently not fulfilled: tomorrow nobody will remember which False comes first and which True comes second without seeing them together with their keywords.
If you need to call your class with fixed positional parameters, as in Snake(False, True, 3), you cannot use **kwds for any of those parameters.
A)
Let us now suppose that your example Snake(False, True, 3) is a required test case. Then you can't use **kwds for any of your positional parameters (poisonous, moves, num_legs). You have only these four possible __init__ headers, none of them good enough:
# the most fragile solution - easily extensible, but the order is hard to keep track of
class Snake(Animal):
    def __init__(self, *args):
        args = list(args)             # args is a tuple; tuples have no pop()
        self.poisonous = args.pop(0)
        # or better args.pop(-1), which allows adding new parameters to the end
        super(Snake, self).__init__(*args)
        # args may be partially consumed by ancestors,
        # but everything ends up stored on self
# the most naive solution - easily readable, but not easily extensible because it is not DRY
class Snake(Animal):
def __init__(self, poisonous, moves, num_legs):
self.poisonous = poisonous
super(Snake,self).__init__(moves, num_legs)
# anything between these two combines the disadvantages of both
class Snake(Animal):
def __init__(self, poisonous, *args):
self.poisonous = poisonous
super(Snake,self).__init__(*args)
class Snake(Animal):
def __init__(self, poisonous, moves, *args):
self.poisonous = poisonous
super(Snake,self).__init__(moves, *args)
B)
Keyword parameters are more robust because some of their errors can be reported automatically.
Suppose you redefine Animal to increase its variability:
class Animal(object):
def __init__(self,name, moves=True, num_legs=None):
self.name = name
self.moves = moves
self.num_legs = num_legs
# The recommended Snake!
class Snake(Animal):
def __init__(self, *args, **kwds):
"""Snake: Implements.. (Docs important, otherwise real keywords not seen in help)
kwds: (only what defined here)
poisonous: Bla bla. default=True
constrictor: Bla bla bla. default=False
"""
        # A copy of kwds can be made if mutating the original is not allowed.
self.poisonous = kwds.pop('poisonous', True) # default: poisonous snake
self.constrictor = kwds.pop('constrictor', False)
        # OK. This reports an error if some keyword is misspelled and therefore not consumed.
super(Snake,self).__init__(*args, **kwds)
# This Snake is more readable, but its descendants would be more complicated;
# otherwise you can get: TypeError: got multiple values for keyword argument 'xy'.
class Snake(Animal):
def __init__(self, name, poisonous=True, constrictor=False, *args, **kwds):
self.poisonous = poisonous
self.constrictor = constrictor
super(Snake,self).__init__(name, *args, **kwds)
Now you have great variability, and the order of the keyword arguments is not important.
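A quick demonstration of the first variant above (assuming the redefined Animal; output shown with Python 3 print):
cobra = Snake('Indian cobra', poisonous=True, num_legs=0)
print(cobra.name, cobra.poisonous, cobra.constrictor, cobra.num_legs)
# Indian cobra True False 0

Snake('Boa', constrictr=True)  # the misspelled keyword reaches Animal.__init__:
# TypeError: __init__() got an unexpected keyword argument 'constrictr'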
Suppose I have a generic function f. I want to programmatically create a function f2 that behaves the same as f, but has a customized signature.
More detail
Given a list l and a dictionary d, I want to be able to:
Set the non-keyword arguments of f2 to the strings in l
Set the keyword arguments of f2 to the keys in d and the default values to the values of d
i.e. suppose we have
l = ["x", "y"]
d = {"opt": None}
def f(*args, **kwargs):
# My code
Then I would want a function with signature:
def f2(x, y, opt=None):
# My code
A specific use case
This is just a simplified version of my specific use case. I am giving this as an example only.
My actual use case (simplified) is as follows. We have a generic initiation function:
def generic_init(self, *args, **kwargs):
"""Function to initiate a generic object"""
for name, arg in zip(self.__init_args__, args):
setattr(self, name, arg)
    for name, default in self.__kw_init_args__.items():
if name in kwargs:
setattr(self, name, kwargs[name])
else:
setattr(self, name, default)
We want to use this function in a number of classes. In particular, we want to create a function __init__ that behaves like generic_init, but has the signature defined by some class variables at creation time:
class my_class:
    __init_args__ = ["x", "y"]
    __kw_init_args__ = {"my_opt": None}

my_class.__init__ = create_initiation_function(my_class, generic_init)
We want create_initiation_function to create a new function with the signature defined using __init_args__ and __kw_init_args__. Is it possible to write create_initiation_function?
Please note:
If I just wanted to improve the help, I could set __doc__.
We want to set the function signature on creation. After that, it doesn't need to be changed.
Instead of creating a function like generic_init, but with a different signature we could create a new function with the desired signature that just calls generic_init
We want to define create_initiation_function. We don't want to manually specify the new function!
Related
Preserving signatures of decorated functions: this shows how to preserve a signature when decorating a function. We need to be able to set the signature to an arbitrary value.
From PEP-0362, there actually does appear to be a way to set the signature in py3.3+, using the fn.__signature__ attribute:
from inspect import signature
from functools import wraps
def shared_vars(*shared_args):
"""Decorator factory that defines shared variables that are
passed to every invocation of the function"""
def decorator(f):
        @wraps(f)
def wrapper(*args, **kwargs):
full_args = shared_args + args
return f(*full_args, **kwargs)
# Override signature
sig = signature(f)
sig = sig.replace(parameters=tuple(sig.parameters.values())[1:])
wrapper.__signature__ = sig
return wrapper
return decorator
Then:
>>> @shared_vars({"myvar": "myval"})
... def example(_state, a, b, c):
...     return _state, a, b, c
>>> example(1, 2, 3)
({'myvar': 'myval'}, 1, 2, 3)
>>> str(signature(example))
'(a, b, c)'
Note: the PEP is not exactly right; Signature.replace moved the params from a positional arg to a kw-only arg.
For your usecase, having a docstring in the class/function should work -- that will show up in help() okay, and can be set programmatically (func.__doc__ = "stuff").
I can't see any way of setting the actual signature. I would have thought the functools module would have done it if it was doable, but it doesn't, at least in py2.5 and py2.6.
You can also raise a TypeError exception if you get bad input.
Hmm, if you don't mind being truly vile, you can use compile()/eval() to do it. If your desired signature is specified by arglist=["foo","bar","baz"], and your actual function is f(*args, **kwargs), you can manage:
argstr = ", ".join(arglist)
fakefunc = "def func(%s):\n return real_func(%s)\n" % (argstr, argstr)
fakefunc_code = compile(fakefunc, "fakesource", "exec")
fakeglobals = {}
eval(fakefunc_code, {"real_func": f}, fakeglobals)
f_with_good_sig = fakeglobals["func"]
help(f) # f(*args, **kwargs)
help(f_with_good_sig) # func(foo, bar, baz)
Changing the docstring and func_name should get you a complete solution. But, uh, eww...
I wrote a package named forge that solves this exact problem for Python 3.5+:
With your current code looking like this:
l=["x", "y"]
d={"opt":None}
def f(*args, **kwargs):
#My code
And your desired code looking like this:
def f2(x, y, opt=None):
    # My code
Here is how you would solve that using forge:
import forge

f2 = forge.sign(
    forge.arg('x'),
    forge.arg('y'),
    forge.arg('opt', default=None),
)(f)
As forge.sign is a wrapper, you could also use it directly as a decorator:
@forge.sign(
    forge.arg('x'),
    forge.arg('y'),
    forge.arg('opt', default=None),
)
def func(*args, **kwargs):
# signature becomes: func(x, y, opt=None)
return (args, kwargs)
assert func(1, 2) == ((), {'x': 1, 'y': 2, 'opt': None})
Have a look at makefun, which was made for exactly that (exposing variants of functions with more or fewer parameters and an accurate signature); it works in both Python 2 and 3.
Your example would be written like this:
try: # python 3.3+
from inspect import signature, Signature, Parameter
except ImportError:
from funcsigs import signature, Signature, Parameter
from makefun import create_function
def create_initiation_function(cls, gen_init):
# (1) check which signature we want to create
params = [Parameter('self', kind=Parameter.POSITIONAL_OR_KEYWORD)]
for mandatory_arg_name in cls.__init_args__:
params.append(Parameter(mandatory_arg_name, kind=Parameter.POSITIONAL_OR_KEYWORD))
for default_arg_name, default_arg_val in cls.__opt_init_args__.items():
params.append(Parameter(default_arg_name, kind=Parameter.POSITIONAL_OR_KEYWORD, default=default_arg_val))
sig = Signature(params)
# (2) create the init function dynamically
return create_function(sig, generic_init)
# ----- let's use it
def generic_init(self, *args, **kwargs):
"""Function to initiate a generic object"""
assert len(args) == 0
for name, val in kwargs.items():
setattr(self, name, val)
class my_class:
__init_args__ = ["x", "y"]
__opt_init_args__ = {"my_opt": None}
my_class.__init__ = create_initiation_function(my_class, generic_init)
and works as expected:
# check
o1 = my_class(1, 2)
assert vars(o1) == {'y': 2, 'x': 1, 'my_opt': None}
o2 = my_class(1, 2, 3)
assert vars(o2) == {'y': 2, 'x': 1, 'my_opt': 3}
o3 = my_class(my_opt='hello', y=3, x=2)
assert vars(o3) == {'y': 3, 'x': 2, 'my_opt': 'hello'}
You can't do this with live code.
That is, you seem to be wanting to take an actual, live function that looks like this:
def f(*args, **kwargs):
print args[0]
and change it to one like this:
def f(a):
print a
The reason this can't be done--at least without modifying actual Python bytecode--is because these compile differently.
The former results in a function that receives two containers: a tuple and a dict, and the code you're writing operates on that tuple and dict. The latter results in a function that receives one parameter, which is accessed directly as a local variable. If you changed the function "signature", so to speak, it'd result in a function like this:
def f(a):
print a[0]
which obviously wouldn't work.
If you want more detail (though it doesn't really help you), a function that takes *args or **kwargs has one or two bits set in f.func_code.co_flags; you can examine this yourself. A function that takes a regular parameter has f.func_code.co_argcount set to 1; the *args version has 0. This is what Python uses to figure out how to set up the function's stack frame when it's called, to check parameters, etc.
If you want to play around with modifying the function directly--if only to convince yourself that it won't work--see this answer for how to create a code object and live function from an existing one to modify bits of it. (This stuff is documented somewhere, but I can't find it; it's nowhere in the types module docs...)
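You can inspect those bits yourself; a quick check using the Python 3 attribute names (in Python 2 it is f.func_code rather than f.__code__):
import inspect

def varargs_func(*args, **kwargs):
    pass

def plain_func(a):
    pass

print(bool(varargs_func.__code__.co_flags & inspect.CO_VARARGS))      # True
print(bool(varargs_func.__code__.co_flags & inspect.CO_VARKEYWORDS))  # True
print(varargs_func.__code__.co_argcount)  # 0
print(plain_func.__code__.co_argcount)    # 1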
That said, you can dynamically change the docstring of a function. Just assign to func.__doc__. Be sure to only do this at load time (from the global context or--most likely--a decorator); if you do it later on, tools that load the module to examine docstrings will never see it.
Maybe I didn't understand the problem well, but if it's about keeping the same behavior while changing the function signature, then you can do something like:
# define a function
def my_func(name, age) :
print "I am %s and I am %s" % (name, age)
# label the function with a backup name
save_func = my_func
# rewrite the function with a different signature
def my_func(age, name) :
# use the backup name to use the old function and keep the old behavior
save_func(name, age)
# you can use the new signature
my_func(35, "Bob")
This outputs:
I am Bob and I am 35
We want create_initiation_function to change the signature
Please don't do this.
We want to use this function in a number of classes
Please use ordinary inheritance.
There's no value in having the signature "changed" at run time.
You're creating a maintenance nightmare. No one else will ever bother to figure out what you're doing. They'll simply rip it out and replace it with inheritance.
Do this instead. It's simple and obvious and makes your generic init available in all subclasses in an obvious, simple, Pythonic way.
class Super(object):
    def __init__(self, *args, **kwargs):
        # the generic __init__ that we want every subclass to use
        pass

class SomeSubClass(Super):
    def __init__(self, this, that, **kwdefaults):
        super(SomeSubClass, self).__init__(this, that, **kwdefaults)

class AnotherSubClass(Super):
    def __init__(self, x, y, **kwdefaults):
        super(AnotherSubClass, self).__init__(x, y, **kwdefaults)
Edit 1: Answering new question:
You ask how you can create a function with this signature:
def fun(a, b, opt=None):
pass
The correct way to do that in Python is thus:
def fun(a, b, opt=None):
pass
Edit 2: Answering explanation:
"Suppose I have a generic function f. I want to programmatically create a function f2 that behaves the same as f, but has a customised signature."
def f(*args, **kw):
pass
OK, then f2 looks like so:
def f2(a, b, opt=None):
f(a, b, opt=opt)
Again, the answer to your question is so trivial that you obviously want to know something different from what you are asking. You really do need to stop asking abstract questions and explain your concrete problem.