Accept different types in python function?

I have a Python function that does a lot of major work on an XML file.
When using this function, I want two options: either pass it the name of an XML file, or pass it a pre-parsed ElementTree instance.
I'd like the function to be able to determine what it was given in its variable.
Example:
def doLotsOfXmlStuff(xmlData):
    if xmlData != ...:  # if xmlData is not an ET instance
        xmlData = ET.parse(xmlData)
    # do a bunch of stuff
    return stuff
The app calling this function may need to call it just once, or it may need to call it several times. Calling it several times and parsing the XML each time is hugely inefficient and unnecessary. Creating a whole class just to wrap this one function seems a bit overkill and would end up requiring some code refactoring. For example:
ourResults = doLotsOfXmlStuff(myObject)
would have to become:
xmlObject = XMLProcessingObjectThatHasOneFunction("data.xml")
ourResult = xmlObject.doLotsOfXmlStuff()
And if I had to run this on lots of small files, a class would be created each time, which seems inefficient.
Is there a simple way to detect the type of the variable coming in? I know a lot of Pythoners will say "you shouldn't have to check", but here's one good instance where you would want to.
In other statically-typed languages I could do this with method overloading, but that's obviously not the Pythonic way of things...

The principle of "duck typing" is that you shouldn't care so much about the specific type of an object; rather, you should check whether it supports the APIs you're interested in.
In other words, if the object passed to your function through the xmlData argument has some method or attribute indicative of an ElementTree that's already been parsed, then just use those methods or attributes... if it doesn't have the necessary attribute, then you are free to pass it through some parsing.
So which functions/methods/attributes of the resulting ET are you looking to use? You can use hasattr() to check for them. Alternatively, you can wrap your call to any such functionality in a try: ... except AttributeError: block.
Personally I think if not hasattr(...): is a bit cleaner. (If it doesn't have the attribute I want, then rebind the name to something which has been prepared, parsed, or whatever, as I need it.)
This approach has advantages over isinstance() because it allows users of your functionality to pass in objects of their own classes which have extended ET through composition rather than inheritance. In other words, if I wrap an ET-like object in my own class and expose the necessary functionality, then I should be able to pass references to your function and have you treat my object as if it were a "duck", even if it wasn't a descendant of a duck. If you need feathers, a bill, and webbed feet, then just check for one of those and try to use the rest. I may be a black box containing a duck, with holes through which the feet, bill, and feathers are accessible.
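For example, here is a minimal sketch of that approach applied to the original question; checking for getroot() is just one plausible marker of an already-parsed ElementTree, so check for whichever attribute your code actually uses:
import xml.etree.ElementTree as ET

def doLotsOfXmlStuff(xmlData):
    # If it doesn't quack like an already-parsed tree, assume it's a
    # filename (or file object) and parse it ourselves.
    if not hasattr(xmlData, 'getroot'):
        xmlData = ET.parse(xmlData)
    root = xmlData.getroot()
    # ... do a bunch of stuff with root ...
    return root.tag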

This is a fairly normal pattern (e.g. Python function that accepts file object or path). Just use isinstance:
def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
    ...
If you need to do cleanup (e.g. closing files) then calling your function recursively is OK:
def doLotsOfXmlStuff(xmlData):
    if not isinstance(xmlData, ET.ElementTree):
        xmlData = ET.parse(xmlData)
        ret = doLotsOfXmlStuff(xmlData)
        ...  # cleanup (or use a context manager)
        return ret
    ...

You can use isinstance to determine the type of a variable.

Can you try to put an if statement to check the type and determine what to run from there?
if type(xmlData).__name__ == 'ElementTree':
    # do stuff
else:
    # do some other stuff

I think you can just compare the data types:
if type(xmlData) == something:
    function1()
else:
    function2()

Related

How to make a deep copy of a Python class?

I'd like to make a copy of a class, while updating all of its methods to refer to a new set of __globals__
I was thinking something like below, however unlike types.FunctionType, the constructor for types.UnboundMethodType does not accept __globals__, any suggestions how to work around this?
def copy_class(old_class, new_module):
    """Copies a class, updating __globals__ of all methods to point to new_module"""
    new_dict = {}
    for name, entry in old_class.__dict__.items():
        if isinstance(entry, types.UnboundMethodType):
            entry = types.UnboundMethodType(name, None, old_class.__class__, globals=new_module.__dict__)
        new_dict[name] = entry
    return type(old_class.__name__, old_class.__bases__, new_dict)
The __dict__ values are functions, not unbound methods. The unbound method objects only get created on attribute access. If you are seeing unbound method objects in the __dict__, something weird happened with your class object before this function got to it.
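A quick way to see this (Python 2, where unbound methods exist):
class C(object):
    def m(self):
        pass

print type(C.__dict__['m'])  # <type 'function'> -- a plain function in __dict__
print type(C.m)              # <type 'instancemethod'> -- wrapped on attribute access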
I don't know about you, but I generally don't like to use types for anything other than type checking (which I don't do very often ;-). I'd much rather inspect...
I have to preface this code by saying that I hope you have a really good reason for wanting to do this ;-) -- to me, it seems like just subclassing and overriding class properties should get the job done much more elegantly ... However, if you really want to copy a class, why not just execute its source again in the new namespace?
I've put together the following simple modules:
# test.py
# Just some test data
FOO = 1

class Bar(object):
    def subclass_method(self):
        print('Hello World!')

class Foo(Bar):
    def method(self):
        return FOO
And then something to do the heavy lifting:
import sys
import inspect

def copy_class(cls, new_globals):
    source = inspect.getsource(cls)
    globs = {}
    globs.update(sys.modules[cls.__module__].__dict__)
    globs.update(new_globals)
    exec source in globs
    return globs[cls.__name__]
# Check that it works...
import test

NewFoo = copy_class(test.Foo, {'FOO': 2})
print NewFoo().method()
NewFoo().subclass_method()
print test.Foo().method()
test.Foo().subclass_method()
This has some possibly desirable properties and undesirable... First, it only works on classes that are inspectable. That's pretty much anything user-defined so probably not too restrictive... It also might be a bit slower than other solutions that don't involve re-parsing the source string -- But again, it doesn't seem like this should be executed too frequently, so that's probably Ok.
Now the "advantages"...
If a global is requested by a function but not supplied, this will use the global from the old namespace. If this behavior isn't desirable (i.e. you'd rather get the NameError), you can modify the function easily to remove it.
The "copy" doesn't inherit from the original. For most purposes, that probably doesn't matter, but it's a bit weird to have the copy of something inherit from the original ...
Some people might see the exec in here and immediately think "Oh no! exec!?!?! The world is about to end!!!". Frankly, that's a good default response. However, I'd argue that if you're copying a class you plan to use later in the code anyway, exec is no less safe than that (after all, the class's code has already been executed once).

Python: store expected Exceptions in function attributes

Is it Pythonic to store the expected exceptions of a function as attributes of the function itself? Or is it just a stinking bad practice?
Something like this
class MyCoolError(Exception):
    pass

def function(*args):
    """
    :raises: MyCoolError
    """
    # do something here
    if some_condition:
        raise MyCoolError

function.MyCoolError = MyCoolError
And then, in another module:
try:
    function(...)
except function.MyCoolError:
    # ...
Pro: Anywhere I have a reference to my function, I have also a reference to the exception it can raise, and I don't have to import it explicitly.
Con: I "have" to repeat the name of the exception to bind it to the function. This could be done with a decorator, but it is also added complexity.
EDIT
The reason I am doing this is that I append some methods in an irregular way to some classes, where I think a mixin is not worth it. Let's call it "tailored added functionality". For instance, let's say:
Class A uses method fn1 and fn2
Class B uses method fn2 and fn3
Class C uses fn4 ...
And like this for about 15 classes.
So when I call obj_a.fn2(), I have to explicitly import the exception it may raise (and it lives not in the module where classes A, B or C are defined, but in another one where the shared methods live)... which I think is a little bit annoying. Apart from that, the standard style in the project I'm working on forces one import per line, so it gets pretty verbose.
In some code I have seen exceptions stored as class attributes, and I have found it pretty useful, like:
try:
    obj.fn()
except obj.MyCoolError:
    ....
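A minimal sketch of that class-attribute pattern (names are illustrative):
class Obj(object):
    class MyCoolError(Exception):
        pass

    def fn(self):
        raise self.MyCoolError("something went wrong")

obj = Obj()
try:
    obj.fn()
except obj.MyCoolError:
    pass  # handled without a separate import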
I think it is not Pythonic. I also think that it does not provide much advantage over the standard way, which is to just import the exception along with the function.
There is a reason (besides helping the interpreter) why Python programs use import statements to state where their code comes from: it helps you find the code behind the facilities (e.g. your exception, in this case) that you are using.
The whole idea has the smell of the declaration of exceptions as it is possible in C++ and partly mandatory in Java. There are discussions amongst the language lawyers whether this is a good idea or a bad one, and in the Python world the designers decided against it, so it is not Pythonic.
It also raises a whole bunch of further questions. What happens if your function A is using another function B which then, later, is changed so that it can throw an exception (a valid thing in Python). Are you willing to change your function A then to reflect that (or catch it in A)? Where would you want to draw the line — is using int(text) to convert a string to int reason enough to "declare" that a ValueError can be thrown?
All in all I think it is not Pythonic, no.

In python is there a way to know if an object "implements an interface" before I pass it to a function?

I know this may sound like a stupid question, especially to someone who knows Python's nature, but I was just wondering: is there a way to know if an object "implements an interface", so to speak?
To give an example of what I want to say:
let's say I have this function:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
My question is: Is there a way to make sure that the object passed to the function is iterable? I know that in Java or C# I could do this by having the method accept any object that implements a specific interface, let's say (for example) iIterable like this: void get_counts(iIterable sequence)
My guess is that in Python I would have to employ preemptive introspection checks (in a decorator perhaps?) and raise a custom exception if the object doesn't have an __iter__ attribute. But is there a more Pythonic way to do this?
Use polymorphism and duck-typing before isinstance() or interfaces
You generally define what you want to do with your objects, then either use polymorphism to adjust how each object responds to what you want to do, or you use duck typing: test whether the object at hand can do the thing you want to do in the first place. This is the invocation-versus-introspection trade-off; conventional wisdom states that invocation is preferable to introspection, but in Python, duck typing is preferred over isinstance testing.
So you need to work out why you want to filter on whether or not something is iterable in the first place; why do you need to know this? Just use a try: iter(object) / except TypeError: # not iterable test.
Or perhaps you just need to throw an exception if whatever that was passed was not an iterable, as that would signal an error.
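A minimal sketch of that test, applied to the get_counts() example from the question:
def get_counts(sequence):
    try:
        iterator = iter(sequence)  # EAFP: just try to get an iterator
    except TypeError:
        raise TypeError('get_counts() expects an iterable')
    counts = {}
    for x in iterator:
        counts[x] = counts.get(x, 0) + 1
    return counts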
ABCs
With duck typing, you may find that you have to test for multiple methods, and so an isinstance() test may look like a better option. In such cases, using an Abstract Base Class (ABC) could also be an option; an ABC lets you 'paint' several different classes as being the right type for a given operation, for example. Using an ABC lets you focus on the tasks that need to be performed rather than the specific implementations used; you can have a Paintable ABC, a Printable ABC, etc.
Zope interfaces and component architecture
If you find your application is using an awful lot of ABCs or you keep having to add polymorphic methods to your classes to deal with various different situations, the next step is to consider using a full-blown component architecture, such as the Zope Component Architecture (ZCA).
zope.interface interfaces are ABCs on steroids, especially when combined with ZCA adapters. Interfaces document the expected behaviour of a class:
if IFrobnarIterable.providedBy(yourobject):
    # it'll support iteration and yield Frobnars.
but they also let you look up adapters; instead of putting all the behaviours for every use of shapes in your classes, you implement adapters to provide polymorphic behaviours for specific use-cases. You can adapt your objects to be printable, or iterable, or exportable to XML:
class FrobnarsXMLExport(object):
    adapts(IFrobnarIterable)
    provides(IXMLExport)

    def __init__(self, frobnariterator):
        self.frobnars = frobnariterator

    def export(self):
        entries = []
        for frobnar in self.frobnars:
            entries.append(
                u'<frobnar><width>{0}</width><height>{1}</height></frobnar>'.format(
                    frobnar.width, frobnar.height))
        return u''.join(entries)
and your code merely has to look up adapters for each shape:
for obj in setofobjects:
    self.result.append(IXMLExport(obj).export())
Python (since 2.6) has abstract base classes (aka virtual interfaces), which are more flexible than Java or C# interfaces. To check whether an object is iterable, use collections.Iterable:
if isinstance(obj, collections.Iterable):
    ...
However, if your else block would just raise an exception, then the most Pythonic answer is: don't check! It's up to your caller to pass in an appropriate type; you just need to document that you're expecting an iterable object.
The Pythonic way is to use duck typing and, "ask forgiveness, not permission". This usually means performing an operation in a try block assuming it behaves the way you expect it to, then handling other cases in an except block.
I think this is the way that the community would recommend you do it:
def do_something(foo):
    try:
        for i in foo:
            process(i)
    except TypeError:
        print "Is not iterable"

do_something(True)
Or, you could use something like zope.interface.

Creating a function object from a string

Question: Is there a way to make a function object in python using strings?
Info: I'm working on a project which stores data in a sqlite3 backend. Nothing too crazy about that. A DAL class is very commonly done through code generation because the code is so incredibly mundane. But that gave me an idea. In Python, when an attribute is not found, if you define the method __getattr__ it will be called before the lookup errors out. So the way I figure it, through a parser and a logic tree I could dynamically generate the code I need on its first call, then save the function object as a local attribute. For example:
DAL.getAll()
# getAll() not found, call __getattr__
DAL.__getattr__(self, attrib)  # in this case attrib = 'getAll'
## parser logic magic takes place here and I end up with a string for a new function
## convert string to function
DAL.getAll = newFunc
return newFunc
I've tried the compile function, but exec and eval are far from satisfactory in terms of being able to accomplish this kind of feat. I need something that will allow a function body of multiple lines. Is there another way to do this, besides those two, that doesn't involve writing it to disk? Again, I'm trying to make a function object dynamically.
P.S.: Yes, I know this has horrible security and stability problems. Yes, I know this is a horribly inefficient way of doing this. Do I care? No. This is a proof of concept. "Can Python do this? Can it dynamically create a function object?" is what I want to know, not some superior alternative. (Though feel free to tack on superior alternatives after you've answered the question at hand.)
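As a sketch of the caching idea in the question (the DAL class and the generated source are illustrative only; a real implementation would generate the body from the attribute name via the parser):
class DAL(object):
    def __getattr__(self, attrib):
        # Only called when normal attribute lookup fails.
        # Pretend the parser/logic tree produced this source:
        source = "def %s(self):\n    return 'result of %s'" % (attrib, attrib)
        namespace = {}
        exec source in namespace  # Python 2 syntax; exec(source, namespace) in Python 3
        func = namespace[attrib]
        setattr(type(self), attrib, func)  # cache it so __getattr__ isn't hit again
        return getattr(self, attrib)

dal = DAL()
print dal.getAll()  # generated and cached on first access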
The following puts the symbols that you define in your string in the dictionary d:
d = {}
exec "def f(x): return x" in d
Now d['f'] is a function object. If you want to use variables from your program in the code in your string, you can pass them in via d:
d = {'a':7}
exec "def f(x): return x + a" in d
Now d['f'] is a function object that is dynamically bound to d['a']. When you change d['a'], you change the output of d['f']().
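For example (Python 2 syntax, matching the snippets above):
d = {'a': 7}
exec "def f(x): return x + a" in d
print d['f'](1)  # 8
d['a'] = 100
print d['f'](1)  # 101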
Can't you do something like this?
>>> def func_builder(name):
...     def f():
...         # multiline code here, using name, and using the logic you have
...         return name
...     return f
...
>>> func_builder("ciao")()
'ciao'
basically, assemble a real function instead of assembling a string and then trying to compile that into a function.
If it is simply proof of concept then eval and exec are fine; you can also do this with pickle strings, YAML strings, and anything else you decide to write a constructor for.

Parameter names in Python functions that take single object or iterable

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
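In other words, the wrapping happens at the call site and the function itself stays simple; a trivial sketch (process_foos is illustrative):
def process_foos(foos):
    for foo in foos:
        print foo

process_foos(["just one"])     # wrap the singleton at the call site
process_foos(["a", "b", "c"])  # the normal case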
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f

@withmany
def process_foo(foo):
    return foo + 1

processed_foo = process_foo(foo)
for processed_foo in process_foo.many(foos):
    print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]
    for widget in widget_thing:
        # ...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    # ...

def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
    print args

# many ways to call it
foo(1)
foo(1, 2, 3)

args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))

foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be described in the function's documentation string, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do one thing:
def myFunc(manyFoos):
    if not isinstance(manyFoos, (list, tuple)):
        manyFoos = [manyFoos]
    # do stuff here
so then you don't need to worry anymore about its name.
A function should try to do one thing, accept the same parameter type, and return the same type. Instead of filling the function with ifs, you could have two functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]
    for foo in foos:
        pass # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    if isinstance(iterable, basestring):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable

def doIt(foos):
    for foo in iterfy(foos):
        pass # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say that instead of naming it for what it is, you should name it for what it's used for. Also, be careful that you can't use the in operator on something that isn't iterable.
