Using isinstance() versus duck typing - python

I'm writing an interface to matplotlib, which requires that lists of floats are treated as corresponding to a colour map, but other types of input are treated as specifying a particular colour.
To do this, I planned to use matplotlib.colors.colorConverter, which is an instance of a class that converts the other types of input to matplotlib RGBA colour tuples. However, it will also convert floats to a grayscale colour map. This conflicts with the existing functionality of the package I'm working on and I think that would be undesirable.
My question is: is it appropriate/Pythonic to use an isinstance() check prior to using colorConverter to make sure that I don't incorrectly handle lists of floats? Is there a better way that I haven't thought of?
I've read that I should generally code to an interface, but in this case, the interface has functionality that differs from what is required.

It's a little subjective, but I'd say that in general it's not a good idea. Here, though, you're distinguishing between a container and an instance of a class, so it is appropriate (especially when those classes may themselves be iterable, like tuples or strings, and doing it the duck-typing way would get quite tricky).
Aside: coding to an interface is generally recommended, but it's far more applicable to the Java-style static languages than Python, where interfaces don't really exist, unless you count abstract base classes and the abc module etc. (much deeper discussion in What's the Python version for “Code against an interface, not an object”?)
Hard to say without more details, but it sounds like you're closer to building a facade here than anything, and as such you should be free to use your own (neater / tighter / different) API, insulating users from your underlying implementation (Matplotlib).
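A minimal sketch of the isinstance() check discussed in this answer, assuming that "list of floats" means a Python list whose items are all real numbers. The names is_scalar_data and resolve_colour_arg are hypothetical, and to_rgba is passed in as a parameter (in the question it would be the matplotlib colour converter) so that the example runs without matplotlib:

```python
from numbers import Real


def is_scalar_data(value):
    """Return True for a non-empty list whose items are all real numbers.

    bool is excluded explicitly because it is a subclass of int.
    """
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(v, Real) and not isinstance(v, bool) for v in value)
    )


def resolve_colour_arg(value, to_rgba):
    """Route lists of numbers to the colour map path, everything else to to_rgba."""
    if is_scalar_data(value):
        return ("colormap", value)
    return ("colour", to_rgba(value))
```

Under this assumption a tuple such as (0.1, 0.2, 0.3) is still handed to the converter as an RGB colour; whether that is the right rule depends on the package's existing behaviour.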

Why not write two separate functions, one that treats its input as a color map, and another that treats its input as a color? This would be the simplest way to deal with the problem, and would both avoid surprises, and leave you room to expand functionality in the future.

Related

How to represent free functions in a Class Diagram Python

I have a number of free functions in a couple of Python modules and I need to create a UML Class Diagram to represent my entire program.
Can I represent free functions in a Class Diagram somehow, or do I need to create a Utility Class so that I can represent them in my Class Diagram?
You will need to have some class in order to represent a "free function". You are quite free in how to do that. What I usually do is to create a stereotyped class. And it would be ok to use «utility» for that. Anything else would work, but of course you need to document that in your domain.
Usually a stereotype is bound to a profile, but most tools allow you to use freely defined stereotypes. Though that is not 100% UML compliant, it is quite common practice.
Even though UML was conceived at a time when object orientation was hyped, that doesn't mean it cannot be used for functions. What many don't realize is that Behavior in the UML is a Class. Therefore, any Behavior can be shown in a class diagram. Just put the metaclass in guillemets above the name, e.g. «activity». If you plan to describe the function with an activity diagram, that makes perfect sense. However, if you plan to describe it in (pseudo) code or in natural language, you can use «function behavior», which is defined as a behavior without side effects. Or, if it can have side effects, just use «opaque behavior».

ABAQUS python scripting inconsistencies when selecting regions

This may sound like a rant to some extent, but I would also like your opinion on how to deal with the inconsistencies when using Python scripting in ABAQUS.
Here is my example: in my rootAssembly (ra) I have three instances called a, b and c. In the script below I assign a general seed, then mesh controls and element types, and finally I generate the mesh:
ra.seedPartInstance(regions=(a, b, c), size=1.0)
ra.setMeshControls(elemShape=QUAD,
                   regions=(a.faces + b.faces + c.faces),
                   technique=STRUCTURED)
ra.setElementType(
    elemTypes=eltyp,
    regions=(a.faces, b.faces, c.faces))
ra.generateMesh(regions=(a, b, c))
As you can see, ABAQUS requires you to define the same region in several different modes.
Even though the argument is always called "regions", ABAQUS asks for a Set, a Vertex, or a GeomSequence, depending on the command.
How do you deal with this? Scripting feels a lot like trial and error, as there is no way to know in advance what is expected.
Any suggestions?
Yes, there is clearly "a way to know in advance what is expected" - the docs. These spell out exactly what arguments are allowed.
But seriously - I see no inconsistency in your example. In practice, the reuse of the argument regions makes complete sense when you consider the context of what each of the functions actually does. Consider how the word "region" is a useful conceptual framework that can be adapted to let the user specify the necessary info for a variety of different tasks.
Now consider the complexity of the underlying system that the Python API exposes, and the variety of tasks that different users want to control and do with that underlying system. I doubt it would be simpler if the args were named something like seq_of_geomCells_geomFaces_or_geomSets. Or even worse, if there were a different argument for each allowable model entity that the function was designed to handle - that would be a nightmare. In this respect, the reuse of the keyword regions as a logical conceptual framework makes complete sense.
OK, I have now read the documentation for the three commands used above:
seedPartInstance(...)
regions: A sequence of PartInstance objects specifying the part instances to seed.
setMeshControls(...)
regions: A sequence of Face or Cell regions specifying the regions for which to set the mesh control parameters.
setElementType(...)
regions: A sequence of Geometry regions or MeshElement objects, or a Set object containing either geometry regions or elements, specifying the regions to which element types are to be assigned.
OK, I get the difference between PartInstances and faces, but it's still not clear why one is appended (using commas) and the other is added (using +), since both call for a sequence. And at this point, how does setElementType even work when face sequences are passed to it?
I will take some more time to learn ABAQUS and think it through; hopefully I can truly understand these differences.
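At the Python level, the comma and + spellings build differently shaped arguments. A small sketch using plain lists as stand-ins (these are not real Abaqus objects, and the face names are made up), assuming the face sequences concatenate with + the way lists do:

```python
# Plain Python stand-ins for a.faces, b.faces and c.faces (hypothetical data).
a_faces = ["a.f0", "a.f1"]
b_faces = ["b.f0"]
c_faces = ["c.f0", "c.f1"]

# "+" concatenates: the result is ONE flat sequence holding every face.
concatenated = a_faces + b_faces + c_faces

# Commas build a tuple: the result is a sequence of THREE separate sequences.
grouped = (a_faces, b_faces, c_faces)

print(len(concatenated))  # number of individual faces
print(len(grouped))       # number of face sequences
```

Which of the two shapes a given command accepts is decided by its documentation; this only shows why the two calls in the question are not passing the same thing.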

How to get a type of the variable from the Python's AST?

Suppose I want to get the type of all variables from the AST tree that I have generated from some source code -- how would I go about doing that?
For example, suppose in my source code I have something like i = 5. How would I determine, from the abstract syntax tree, that the type of i is integer?
I tried the type() function; however, it does not work in this situation.
As explained in other posts, there isn't an easy way to achieve this without heavy analysis of the syntax tree, for which Python's ast module provides no facilities.
You can still use logilab's astng, which is the basis for pylint and provides static inference capabilities.
Here is a quick example:
from logilab.astng.builder import ASTNGBuilder
builder = ASTNGBuilder()
astng = builder.string_build('i = 1', __name__, '<string>')
assnode = astng['']
print [(inf.value, type(inf.value)) for inf in assnode.infer()]
Of course you'll have to dig into the API for more real-life usage. You can still write to python-projects#lists.logilab.org for help on this.
As other posters have noted, this isn't so easy in a dynamically typed language. You can't just trace the assignment back to a static type declaration, as you can in C or Java.
However, one can often make a reasonable determination of the type.
Presumably the scoping rules allow one to determine which i (or which set of i's) might be accessed/updated/bound where the question is asked ("what is the type of i at this point in the code?"). Then one can do an analysis of all the values that might be assigned (a particularly trivial case is when i is bound only to a function definition). The upper bound in the type lattice on those types is the "type" of i. Yes, it might be "anything" in some cases, but in most well-written programs even dynamic variables have a "narrow" type intended by the programmer, and often it's a primitive language type (like, er, "int"). Otherwise the programmer wouldn't be able to reasonably write an algorithm (What, your array index isn't an integer sometimes?).
You need to do some kind of conservative analysis of the program to determine this upper-bound type. (You can obviously do the trivial analysis and uselessly conclude that a variable can be "any" type. I think that's an unsatisfactory answer.)
The machinery to do all this analysis is pretty complicated (you need global flow analysis and some determination of what can be dynamically loaded to do this really well) and I doubt that Python's AST package does it.
You can't, because Python's variables don't have a type. Values have types.
That's how dynamic typing works.
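For the trivial case in the question (i = 5), the standard ast module is enough, because the right-hand side is a literal whose value, and therefore type, can be read off the tree. A rough sketch with a hypothetical helper name, literal_assignment_types; it skips every assignment whose right-hand side is not a plain literal and does no flow analysis, so it is far from the inference discussed in the answers above:

```python
import ast


def literal_assignment_types(source):
    """Map each simple variable name to the set of types of literals assigned to it."""
    found = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        try:
            # literal_eval accepts an AST node and only evaluates literals.
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # right-hand side is not a plain literal: give up on it
        for target in node.targets:
            if isinstance(target, ast.Name):
                found.setdefault(target.id, set()).add(type(value))
    return found


print(literal_assignment_types("i = 5\nname = 'x'\ni = 2.5\nj = f()"))
```

Note that i ends up with two candidate types here, which is the "values have types, variables don't" point in practice.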

How do we use sin,cos,tan generically (including user-defined types) in Python?

Edit: Let me try to reword and improve my question. The old version is attached at the bottom.
What I am looking for is a way to express and use free functions in a type-generic way. Examples:
abs(x) # maps to x.__abs__()
next(x) # maps to x.__next__() at least in Python 3
-x # maps to x.__neg__()
In these cases the functions have been designed in a way that allows users with user-defined types to customize their behaviour by delegating the work to a non-static method call. This is nice. It allows us to write functions that don't really care about the exact parameter types as long as they "feel" like objects that model a certain concept.
Counter examples: Functions that can't be easily used generically:
math.exp # only for reals
cmath.exp # takes complex numbers
Suppose, I want to write a generic function that applies exp on a list of number-like objects. What exp function should I use? How do I select the correct one?
import math

def listexp(lst):
    return [math.exp(x) for x in lst]
Obviously, this won't work for lists of complex numbers even though there is an exp for complex numbers (in cmath). And it also won't work for any user-defined number-like type which might offer its own special exp function.
So, what I'm looking for is a way to deal with this on both sides -- ideally without special-casing a lot of things. As the writer of a generic function that does not care about the exact types of its parameters, I want to use the correct mathematical functions for the types involved without having to deal with this explicitly. As the writer of a user-defined type, I would like to expose special mathematical functions that have been augmented to deal with the additional data stored in those objects (similar to the imaginary part of complex numbers).
What is the preferred pattern/protocol/idiom for doing that? I did not yet test numpy. But I downloaded its source code. As far as I know, it offers a sin function for arrays. Unfortunately, I haven't found its implementation yet in the source code. But it would be interesting to see how they managed to pick the right sin function for the right type of numbers the array currently stores.
In C++ I would have relied on function overloading and ADL (argument-dependent lookup). With C++ being statically typed, it should come as no surprise that this (name lookup, overload resolution) is handled completely at compile-time. I suppose, I could emulate this at runtime with Python and the reflective tools Python has to offer. But I also know that trying to import a coding style into another language might be a bad idea and not very idiomatic in the new language. So, if you have a different idea for an approach, I'm all ears.
I guess, somewhere at some point I need to manually do some type-dependent dispatching in an extensible way. Maybe write a module "tgmath" (type generic math) that comes with support for real and complex support as well as allows others to register their types and special case functions... Opinions? What do the Python masters say about this?
TIA
Edit: Apparently, I'm not the only one who is interested in generic functions and type-dependent overloading. There is PEP 3124, but it has been in draft state for 4 years.
Old version of the question:
I have a strong background in Java and C++ and just recently started learning Python. What I'm wondering about is: How do we extend mathematical functions (at least their names) so they work on other user-defined types? Do these kinds of functions offer any kind of extension point/hook I can leverage (similar to the iterator protocol where next(obj) actually delegates to obj.__next__, etc) ?
In C++ I would have simply overloaded the function with the new parameter type and have the compiler figure out which of the functions was meant using the argument expressions' static types. But since Python is a very dynamic language there is no such thing as overloading. What is the preferred Python way of doing this?
Also, when I write custom functions, I would like to avoid long chains of
if isinstance(arg,someClass):
    suchandsuch
elif ...
What are the patterns I could use to make the code look prettier and more Pythonish?
I guess I'm basically trying to deal with the lack of function overloading in Python. At least in C++, overloading and argument-dependent lookup are an important part of good style.
Is it possible to make
x = udt(something) # object of user-defined type that represents a number
y = sin(x) # how do I make this invoke custom type-specific code for sin?
t = abs(x) # works because abs delegates to __abs__() which I defined.
work? I know I could make sin a non-static method of the class. But then I lose genericity because for every other kind of number-like object it's sin(x) and not x.sin().
Adding a __float__ method is not acceptable since I keep additional information in the object such as derivatives for "automatic differentiation".
TIA
Edit: If you're curious about what the code looks like, check this out. In an ideal world I would be able to use sin/cos/sqrt in a type-generic way. I consider these functions part of the objects interface even if they are "free functions". In __somefunction I did not qualify the functions with math. nor __main__.. It just works because I manually fall back on math.sin (etc) in my custom functions via the decorator. But I consider this to be an ugly hack.
You can do this, but it works backwards. You implement __float__() in your new type and then sin() will work with your class.
In other words, you don't adapt sine to work on other types; you adapt those types so that they work with sine.
This is better because it forces consistency. If there is no obvious mapping from your object to a float, then there probably isn't a reasonable interpretation of sin() for that type.
[Sorry if I missed the "__float__ won't work" part earlier; perhaps you added that in response to this? Anyway, for convincing proof that what you want isn't possible: Python has the cmath library to add sin() etc. for complex numbers...]
If you want the return type of math.sin() to be your user-defined type, you appear to be out of luck. Python's math library is basically a thin wrapper around a fast native IEEE 754 floating point math library. If you want to be internally consistent and duck-typed, you can at least put the extensibility shim that python is missing into your own code.
import math

def sin(x):
    try:
        return x.__sin__()
    except AttributeError:
        return math.sin(x)
Now you can import this sin function and use it indiscriminately wherever you used math.sin previously. It's not quite as pretty as having math.sin pick up your duck-typing automatically but at least it can be consistent within your codebase.
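A usage sketch of such a shim with a user-defined type. Two things here are assumptions rather than anything Python defines: __sin__ is only this answer's naming convention (Python has no such special method), and Dual is a hypothetical value/derivative pair of the kind the question mentions for automatic differentiation. This variant guards only the attribute lookup, so an AttributeError raised inside __sin__ itself is not silently swallowed:

```python
import math


def sin(x):
    """Use the object's own __sin__ hook if it has one, else fall back to math.sin."""
    hook = getattr(x, "__sin__", None)  # only the lookup is guarded
    if hook is None:
        return math.sin(x)
    return hook()


class Dual:
    """Hypothetical number that carries a value and its derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __sin__(self):
        # Chain rule: d/dt sin(u) = cos(u) * u'
        return Dual(math.sin(self.value), math.cos(self.value) * self.deriv)


plain = sin(0.0)             # goes through math.sin
dual = sin(Dual(0.0, 1.0))   # goes through Dual.__sin__
```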
Define your own versions in a module. This is what's done in cmath for complex numbers and in numpy for arrays.
Typically the answer to questions like this is "you don't" or "use duck typing". Can you provide a little more detail about what you want to do? Have you looked at the remainder of the protocol methods for numeric types?
http://docs.python.org/reference/datamodel.html#emulating-numeric-types
Ideally, you will derive your user-defined numeric types from a native Python type, and the math functions will just work. When that isn't possible, perhaps you can define __int__() or __float__() or __complex__() or __long__() on the object so it knows how to convert itself to a type the math functions can handle.
When that isn't feasible, for example if you wish to take a sin() of an object that stores x and y displacement rather than an angle, you will need to provide either your own equivalents of such functions (usually as a method of the class) or a function such as to_angle() to convert the object's internal representation to the one needed by Python.
Finally, it is possible to provide your own math module that replaces the built-in math functions with your own varieties, so if you want to allow math on your classes without any syntax changes to the expressions, it can be done in that fashion, although it is tricky and can reduce performance, since you'll be doing (e.g.) a fair bit of preprocessing in Python before calling the native implementations.
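The "module where others can register their types" idea from the question can also be sketched with functools.singledispatch from the standard library. This is one possible approach, not something the answers above propose; the generic exp and the Dual class are hypothetical names used for illustration:

```python
import cmath
import math
from functools import singledispatch


@singledispatch
def exp(x):
    """Generic exp; types with no registered implementation end up here."""
    raise TypeError("no exp() registered for %s" % type(x).__name__)


@exp.register(int)
@exp.register(float)
def _exp_real(x):
    return math.exp(x)


@exp.register(complex)
def _exp_complex(x):
    return cmath.exp(x)


class Dual:
    """Hypothetical value/derivative pair for automatic differentiation."""

    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv


@exp.register(Dual)
def _exp_dual(x):
    e = math.exp(x.value)
    return Dual(e, e * x.deriv)  # d/dt exp(u) = exp(u) * u'


def listexp(lst):
    """Type-generic version of listexp: dispatches on each element's type."""
    return [exp(x) for x in lst]
```

Dispatch happens on the type of the first argument only, so this covers one-argument functions such as sin and exp but not mixed-type binary operations.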

Data Structures in Python

All the books I've read on data structures so far seem to use C/C++, and make heavy use of the "manual" pointer control that they offer. Since Python hides that sort of memory management and garbage collection from the user, is it even possible to implement efficient data structures in this language, and is there any reason to do so instead of using the built-ins?
Python gives you some powerful, highly optimized data structures, both as built-ins and as part of a few modules in the standard library (lists and dicts, of course, but also tuples, sets, arrays in module array, and some other containers in module collections).
Combinations of these data structures (and maybe some of the functions from helper modules such as heapq and bisect) are generally sufficient to implement most richer structures that may be needed in real-life programming; however, that's not invariably the case.
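As a small illustration of that point, a plain list plus heapq behaves as a priority queue, and a plain list plus bisect behaves as a sorted collection. The task names and scores are made-up sample data:

```python
import bisect
import heapq

# Priority queue: a list kept in heap order by heapq.
tasks = []
heapq.heappush(tasks, (2, "write tests"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))
first = heapq.heappop(tasks)  # smallest priority comes out first

# Sorted collection: a list kept in order by bisect.
scores = [10, 20, 40]
bisect.insort(scores, 30)                  # insert while keeping the list sorted
position = bisect.bisect_left(scores, 20)  # binary search for 20
```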
When you need something more than the rich library provides, consider the fact that an object's attributes (and items in collections) are essentially "pointers" to other objects (without pointer arithmetic), i.e., "reseatable references", in Python just like in Java. In Python, you normally use a None value in an attribute or item to represent what NULL would mean in C++ or null would mean in Java.
So, for example, you could implement binary trees via, e.g.:
class Node(object):
    __slots__ = 'payload', 'left', 'right'
    def __init__(self, payload=None, left=None, right=None):
        self.payload = payload
        self.left = left
        self.right = right
plus methods or functions for traversal and similar operations (the __slots__ class attribute is optional -- mostly a memory optimization, to avoid each Node instance carrying its own __dict__, which would be substantially larger than the three needed attributes/references).
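A sketch of what such traversal functions might look like, using the same Node layout as a binary search tree. The insert and in_order helpers are illustrative additions, not part of the answer above, and no balancing is attempted:

```python
class Node:
    __slots__ = ("payload", "left", "right")

    def __init__(self, payload=None, left=None, right=None):
        self.payload = payload
        self.left = left
        self.right = right


def insert(root, payload):
    """Insert payload into the binary search tree rooted at root; return the root."""
    if root is None:
        return Node(payload)
    if payload < root.payload:
        root.left = insert(root.left, payload)
    else:
        root.right = insert(root.right, payload)
    return root


def in_order(node):
    """Yield payloads in ascending order (left subtree, node, right subtree)."""
    if node is not None:
        yield from in_order(node.left)
        yield node.payload
        yield from in_order(node.right)


root = None
for item in [5, 2, 8, 1]:
    root = insert(root, item)
ordered = list(in_order(root))
```

None plays the role of the null pointer here, exactly as described above.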
Other examples of data structures that may best be represented by dedicated Python classes, rather than by direct composition of other existing Python structures, include tries (see e.g. here) and graphs (see e.g. here).
For some simple data structures (eg. a stack), you can just use the builtin list to get your job done. With more complex structures (eg. a bloom filter), you'll have to implement them yourself using the primitives the language supports.
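Both cases can be sketched briefly. The stack is just a list; the Bloom filter below is a deliberately simple illustration built on a bytearray and hashlib, with arbitrary default sizes and a hypothetical class layout, not a tuned implementation:

```python
import hashlib

# A stack is just a list: append() pushes, pop() pops from the same end.
stack = []
stack.append("a")
stack.append("b")
top = stack.pop()


class BloomFilter:
    """Set-like structure that may report false positives but never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        data = repr(item).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("apple")
```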
You really should use the builtins if they serve your purpose, since they have been debugged and optimised by a horde of people over a long time. Doing it from scratch by yourself will probably produce an inferior data structure.
If however, you need something that's not available as a primitive or if the primitive doesn't perform well enough, you'll have to implement your own type.
The details like pointer management etc. are just implementation talk and don't really limit the capabilities of the language itself.
C/C++ data structure books are only attempting to teach you the underlying principles behind the various structures - they are generally not advising you to actually go out and re-invent the wheel by building your own library of stacks and lists.
Whether you're using Python, C++, C#, Java, whatever, you should always look to the built in data structures first. They will generally be implemented using the same system primitives you would have to use doing it yourself, but with the advantage of having been tried and tested.
Only when the provided data structures do not allow you to accomplish what you need, and there isn't an alternative and reliable library available to you, should you be looking at building something from scratch (or extending what's provided).
How Python handles objects at a low level isn't too strange anyway. This document should disambiguate it a tad; it's basically all the pointer logic you're already familiar with.
With Python you have access to a vast assortment of library modules written and debugged by other people. Odds are very good that somewhere out there, there is a module that does at least part of what you want, and odds are even good that it might be implemented in C for performance.
For example, if you need to do matrix math, you can use NumPy, which was written in C and Fortran.
Python is slow enough that you won't be happy if you try to write really compute-intensive code (for example, a Fast Fourier Transform) in pure Python. On the other hand, you can get a C-coded Fourier Transform as part of SciPy, and just use it.
I have never had a situation where I wanted to solve a problem in Python and said "darn, I just can't express the data structure I need."
If you are a pioneer, and you are doing something in Python for which there just isn't any library module out there, then you can try writing it in pure Python. If it is fast enough, you are done. If it is too slow, you can profile it, figure out where the slow parts are, and rewrite them in C using the Python C API. I have never needed to do this yet.
It's not possible to implement something like a C++ vector in Python, since you don't have array primitives the way C/C++ do. However, anything more complicated can be implemented (efficiently) on top of the built-in types, including, but not limited to: linked lists, hash tables, multisets, bloom filters, etc.
