Given an arbitrary collection, is there a way to tell if it is ordered? - python

Here's what I have so far:
def is_ordered(collection):
    if isinstance(collection, set):
        return False
    if isinstance(collection, list):
        return True
    if isinstance(collection, dict):
        return False
    raise Exception("unknown collection")
Is there a much better way to do this?
NB: I do mean ordered and not sorted.
Motivation:
I want to iterate over an ordered collection. e.g.
def most_important(priorities):
    for p in priorities:
        print p
In this case the fact that priorities is ordered is important. What kind of collection it is is not. I'm trying to live by duck typing here. I have frequently been dissuaded from type checking by Pythonistas.

If the collection is truly arbitrary (meaning it can be of any class whatsoever), then the answer has to be no.
Basically, there are two possible approaches:
know about every possible class that can be presented to your method, and whether it's ordered;
test the collection yourself by inserting into it every possible combination of keys, and seeing whether the ordering is preserved.
The latter is clearly infeasible. The former is along the lines of what you already have, except that you have to know about every derived class such as collections.OrderedDict; checking for dict is not enough.
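For instance, collections.OrderedDict is a subclass of dict, so the check in the question would wrongly report it as unordered:

import collections

d = collections.OrderedDict()
print(isinstance(d, dict))   # True, so is_ordered(d) above returns False, which is wrong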
Frankly, I think the whole is_ordered check is a can of worms. Why do you want to do this anyway?

Update: In essence, you are trying to unittest the argument passed to you. Stop doing that, and unittest your own code. Test your consumer (make sure it works with ordered collections), and unittest the code that calls it, to ensure it is getting the right results.
In a statically-typed language you would simply restrict yourself to specific types. If you really want to replicate that, simply specify the only types you accept, and test for those. Raise an exception if anything else is passed. It's not pythonic, but it reliably achieves what you want to do.
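A minimal sketch of that approach, reusing the most_important() function from the question and assuming (for the sake of the example) that only lists and tuples are acceptable:

def most_important(priorities):
    # Accept only types we know preserve order; reject everything else.
    if not isinstance(priorities, (list, tuple)):
        raise TypeError("expected an ordered collection, got %s" % type(priorities).__name__)
    for p in priorities:
        print(p)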
Well, you have two possible approaches:
Anything with an append method is almost certainly ordered; and
If it only has an add method, you can try adding a nonce-value, then iterating over the collection to see if the nonce appears at the end (or, perhaps at one end); you could try adding a second nonce and doing it again just to be more confident.
Of course, this won't work where e.g. the collection is empty, or there is an ordering function that doesn't result in addition at the ends.
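For what it's worth, here is a very rough sketch of that heuristic. It mutates the collection, assumes a set-like discard() method for cleanup, and shares all the caveats just mentioned, so treat it as an illustration rather than something to rely on:

def seems_ordered(collection):
    if hasattr(collection, 'append'):       # list-like: almost certainly ordered
        return True
    if hasattr(collection, 'add'):          # set-like: add a nonce and see where it lands
        nonce = object()
        collection.add(nonce)
        items = list(collection)
        collection.discard(nonce)           # clean up after ourselves
        return bool(items) and (items[0] is nonce or items[-1] is nonce)
    return False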
Probably a better solution is simply to specify that your code requires ordered collections, and only pass it ordered collections.

I think that enumerating the 90% case is about as good as you're going to get (if using Python 3, replace basestring with str). Probably also want to consider how you would handle generator expressions and similar ilk, too (again, if using Py3, skip the xrangor):
import collections

generator = type((i for i in xrange(0)))
enumerator = type(enumerate(range(0)))
xrangor = type(xrange(0))

is_ordered = lambda seq: isinstance(seq, (tuple, list, collections.OrderedDict,
                                          basestring, generator, enumerator, xrangor))
If your callers start using itertools, then you'll also need to add itertools types as returned by islice, imap, groupby. But the sheer number of these special cases really starts to point to a code smell.
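For what it's worth, here is what that growth looks like in practice (Python 2, continuing the snippet above; islicer, imapper, and grouper are names made up for this sketch):

import itertools

islicer = type(itertools.islice([], 0))
imapper = type(itertools.imap(None, []))    # Python 2 only; gone in Python 3
grouper = type(itertools.groupby([]))

is_ordered = lambda seq: isinstance(seq, (tuple, list, collections.OrderedDict,
                                          basestring, generator, enumerator, xrangor,
                                          islicer, imapper, grouper))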

What if the list is not ordered, e.g. [1,3,2]?

Related

Use of a temporary variable vs repeatedly read same key/value from a dictionary

Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A': 1}
...
if sample_map.get('A', None) is not None:
    print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g.,
sample_map = {'A': 1}
...
ret_val = sample_map.get('A', None)
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
    print("A's value in map is {}".format(sample_map['A']))
except KeyError:
    pass
Of course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
    ret_val = sample_map['A']
except KeyError:
    pass
else:
    print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample_map.get('A')  # None is the default value for the second argument
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
Python: List vs Dict for look up table
Lookups in a dict are about O(1). Same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
But in a case like the one below, where there are a lot of dict lookups going on, I have plans to refactor the code to make things more intelligible, as all of the lookups clutter or obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
    """
    Making sure the variable is one that we have a threshold for
    """
    # We might care about it
    # Don't check the equal case, because that won't matter
    if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
        # Check lower tolerance
        # Distinction is important because tolerances are not always the same +/-
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][2]):
            # Difference from default is greater than tolerance
            # vname : current value, default value, negative tolerance, tolerance units, change date
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
    else:
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][1]):
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
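To make the while-loop case concrete, here is a sketch; do_something_with() is a made-up helper standing in for the loop body (assume it eventually removes 'A' from the map):

# Repeating the lookup keeps the loop condition simple:
while sample_map.get('A') is not None:
    do_something_with(sample_map.get('A'))

# Pulling it into a temporary forces the while True / if ... break shape:
while True:
    ret_val = sample_map.get('A')
    if ret_val is None:
        break
    do_something_with(ret_val)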
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
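A sketch of what that looks like, assuming the lock is the same one the writer thread uses:

import threading

lock = threading.Lock()   # must be shared with whatever thread mutates the dict

with lock:
    ret_val = sample_map.get('A')
    if ret_val is not None:
        print("A's value in map is {}".format(ret_val))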
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.

Why isn't there any special method for __max__ in python?

As the title asks. Python has a lot of special methods, __add__, __len__, __contains__ etc. Why is there no __max__ method that is called when doing max? Example code:
class A:
    def __max__(self):
        return 5

a = A()
max(a)
It seems like range() and other constructs could benefit from this. Am I missing some other effective way to do max?
Addendum 1:
As a trivial example, max(range(1000000000)) takes a long time to run.
I have no authoritative answer but I can offer my thoughts on the subject.
There are several built-in functions that have no corresponding special method. For example:
max
min
sum
all
any
One thing they have in common is that they are reduce-like: They iterate over an iterable and "reduce" it to one value. The point here is that these are more of a building block.
For example you often wrap the iterable in a generator (or another comprehension, or transformation like map or filter) before applying them:
sum(abs(val) for val in iterable) # sum of absolutes
any(val > 10 for val in iterable) # is one value over 10
max(person.age for person in iterable) # the oldest person
That means most of the time it wouldn't even call the __max__ of the iterable but try to access it on the generator (which isn't implemented and cannot be implemented).
So there is simply not much of a benefit if these were implemented. And in the few cases when it makes sense to implement them it would be more obvious if you create a custom method (or property) because it highlights that it's a "shortcut" or that it's different from the "normal result".
For example these functions (min, etc.) have O(n) run-time, so if you can do better (for example if you have a sorted list you could access the max in O(1)) it might make sense to document that explicitly.
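For example, a hypothetical container that keeps its items sorted could expose that shortcut as an explicit, documented attribute instead of hoping for a __max__ hook:

class SortedBag(object):
    """Keeps its items in sorted order."""

    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    @property
    def largest(self):
        # O(1) and explicitly documented, unlike max(self), which is O(n).
        return self._items[-1]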
Some operations are not basic operations. Take max as an example: it is actually an operation based on comparison. In other words, when you get a max value, you are actually getting the biggest value.
So in this case, why should we implement a special max function rather than override the behavior of comparison?
Think in another direction: what does max really mean? For example, when we execute max(list), what are we doing?
I think we are actually checking the list's elements, and the max operation is not related to the list itself at all.
The list is just a container, which is unnecessary to the max operation. Whether it is a list or a set or something else doesn't matter; what is really useful is the elements inside the container.
So if we defined a __max__ action for list, we would actually be doing another, totally different operation: asking a container to give us advice about the max value.
I think in this case, as it is a totally different operation, it should be a method of the container instead of an override of the built-in function's behavior.

Flexible Arguments in Python

This has probably been asked before, but I don't know how to look up the answer, because I'm not sure what's a good way to phrase the question.
The topic is function arguments that can semantically be expressed in many different ways. For example, to give a file to a function, you could either give the file directly, or you could give a string, which is the path to the file. To specify a number, you might allow an integer as an argument, or maybe you might allow a string (the numeral), or you might even allow a string such as "one". Another example might be a function that takes a list (of numbers, say), but as a convenience, it will convert a number into a list containing one element: that number.
Is there a more or less standard approach in Python to allowing this sort of flexibility? It certainly complicates the code for a program if you're not certain what types the arguments are, so my guess would be to try factor out the convenience functions into just one place, instead of scattered everywhere, but I don't really know how best to do that kind of factoring.
No, there is not more or less a "standard" approach to this.
Don't do that! ;)
But if you still want to, I'd suggest that you have an intermediate class or function handling this for you:
Pseudocode:
def printTheNumber(num):
    print num

def intermediatePrintTheNumber(input):
    num_int_dict = {'one': 1, "two": 2, ...}
    if input.isstring():
        printTheNumber(num_int_dict[input])
    elif input.isint():
        printTheNumber(input)
    else:
        print "Sorry Dave, I don't understand you"
If this is pythonic I don't know, but that's how I'd solve it if I had to, of course with some more checking of the input to deem it valid.
When it comes to your comment, you mention semantic similarity, i.e. "one" and 1 might mean the same thing.
Where should this kind of conversion be made, you ask.
Well, that depends on the design of your system, but I can tell you that it should not be done in the same function that I call printTheNumber, for one very simple reason: that would give that function way too much responsibility.
Depending on the complexity of the input, it could be the integer 1 or the string "1" or, in the worst case, "one" or maybe even worse "uno"|"one"|"yxi"|"ett" .. and so on. This should be handled by a function that has only that responsibility, maybe with a database handling the mapping.
I would split it up so that I have one function handling the strings "one", "two" ... etc., one handling integers, and a third function that checks whether the input can be converted to an integer or not.
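A sketch of that split (all of the names here are made up for illustration):

WORD_TO_INT = {'one': 1, 'two': 2}   # ... extend, or back with a database

def number_from_word(word):
    return WORD_TO_INT[word]

def number_from_digits(text):
    # Returns an int if the string can be converted, otherwise None.
    try:
        return int(text)
    except (TypeError, ValueError):
        return None

def to_number(value):
    if isinstance(value, int):
        return value
    converted = number_from_digits(value)
    return converted if converted is not None else number_from_word(value)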
As I see it, there is a warning for a fundamental flaw in the design if you have to take measures for this kind of complexity, but you seem to be aware of that so I won't go on about it.
A good way to factor out code common to several functions is through decorators. For example,
from functools import wraps

def takes_list(func):
    @wraps(func)
    def wrapper(arg):
        if not isinstance(arg, list):
            arg = [arg]
        return func(arg)
    return wrapper

@takes_list
def my_func(x):
    "Does something with list x."
I should note that for cases such as files, you don't want to get in the way of Python's duck typing: doing a check isinstance(arg, file) has the problem that it won't allow file-like things such as io.StringIO. Instead, check against str (or basestring) or even let open do the checking for you, with try-except.
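One possible shape of the "let open do the checking" idea (read_data() is a made-up example):

def read_data(file_or_path):
    try:
        f = open(file_or_path)        # works if we were given a path string
    except TypeError:
        return file_or_path.read()    # assume it's already a file-like object
    with f:
        return f.read()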
However, it's generally better practice just to let the caller pass what they like into the function, and fail if it isn't valid.
Since you cannot add methods at runtime to built-in classes such as int or str, you would make a switch-case-like structure as mentioned by Daniel Figueroa.
Another way would be to just convert:
def func(i):
    if not isinstance(i, int):
        i = int(i)  # objects can overwrite __int__ if needed.
If you have own classes that you can add methods to, you may use double dispatch to do the same thing for you. Smalltalk uses this for the Integer-Float-... conversion.
Another way would be to use subject oriented programming for which I have not found an implementation yet but I tried: https://gist.github.com/niccokunzmann/4971938

Best / most pythonic way to get an ordered list of unique items

I have one or more unordered sequences of (immutable, hashable) objects with possible duplicates and I want to get a sorted sequence of all those objects without duplicates.
Right now I'm using a set to quickly gather all the elements discarding duplicates, convert it to a list and then sort that:
result = set()
for s in sequences:
    result = result.union(s)
result = list(result)
result.sort()
return result
It works but I wouldn't call it "pretty". Is there a better way?
This should work:
sorted(set(itertools.chain.from_iterable(sequences)))
I like your code just fine. It is straightforward and easy to understand.
We can shorten it just a little bit by chaining off the list():
result = set()
for s in sequences:
    result = result.union(s)
return sorted(result)
I really have no desire to try to boil it down beyond that, but you could do it with reduce():
result = reduce(lambda s, x: s.union(x), sequences, set())
return sorted(result)
Personally, I think this is harder to understand than the above, but people steeped in functional programming might prefer it.
EDIT: @agf is much better at this reduce() stuff than I am. From the comments below:
return sorted(reduce(set().union, sequences))
I had no idea this would work. If I correctly understand how this works, we are giving reduce() a callable which is really a method function on one instance of a set() (call it x for the sake of discussion, but note that I am not saying that Python will bind the name x with this object). Then reduce() will feed this function the first two iterables from sequences, returning x, the instance whose method function we are using. Then reduce() will repeatedly call the .union() method and ask it to take the union of x and the next iterable from sequences. Since the .union() method is likely smart enough to notice that it is being asked to take the union with its own instance and not bother to do any work, it should be just as fast to call x.union(x, some_iterable) as to just call x.union(some_iterable). Finally, reduce() will return x, and we have the set we want.
This is a bit tricky for my personal taste. I had to think this through to understand it, while the itertools.chain() solution made sense to me right away.
EDIT: @agf made it less tricky:
return sorted(reduce(set.union, sequences, set()))
What this is doing is much simpler to understand! If we call the instance returned by set() by the name of x again (and just like above with the understanding that I am not claiming that Python will bind the name x with this instance); and if we use the name n to refer to each "next" value from sequences; then reduce() will be repeatedly calling set.union(x, n). And of course this is exactly the same thing as x.union(n). IMHO if you want a reduce() solution, this is the best one.
--
If you want it to be fast, ask yourself: is there any way we can apply itertools to this? There is a pretty good way:
from itertools import chain
return sorted(set(chain(*sequences)))
itertools.chain() called with *sequences serves to "flatten" the list of lists into a single iterable. It's a little bit tricky, but only a little bit, and it's a common idiom.
EDIT: As @Jbernardo wrote in the most popular answer, and as @agf observes in comments, itertools.chain has a .from_iterable() classmethod, and the documentation says it evaluates the iterable lazily. The * notation forces building a list, which may consume considerable memory if the iterable is a long sequence. In fact, you could have a never-ending generator, and with itertools.chain.from_iterable() you would be able to pull values from it for as long as you want to run your program, while the * notation would just run out of memory.
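A small demonstration of that difference, using a deliberately never-ending generator (the function name is made up):

import itertools

def sequences_forever():
    while True:
        yield [1, 2, 3]

# Lazy: from_iterable pulls one sequence at a time, so we can take just what we need.
lazy = itertools.chain.from_iterable(sequences_forever())
print(list(itertools.islice(lazy, 5)))   # [1, 2, 3, 1, 2]

# The * form would try to unpack every sequence up front and never get started:
# itertools.chain(*sequences_forever())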
As @Jbernardo wrote:
sorted(set(itertools.chain.from_iterable(sequences)))
This is the best answer, and I already upvoted it.

Parameter names in Python functions that take single object or iterable

I have some functions in my code that accept either an object or an iterable of objects as input. I was taught to use meaningful names for everything, but I am not sure how to comply here. What should I call a parameter that can be a single object or an iterable of objects? I have come up with two ideas, but I don't like either of them:
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Normally I call iterables of objects just the plural of what I would call a single object. I know this might seem a little bit compulsive, but Python is supposed to be (among others) about readability.
I have some functions in my code that accept either an object or an iterable of objects as input.
This is a very exceptional and often very bad thing to do. It's trivially avoidable.
i.e., pass [foo] instead of foo when calling this function.
The only time you can justify doing this is when (1) you have an installed base of software that expects one form (iterable or singleton) and (2) you have to expand it to support the other use case. So. You only do this when expanding an existing function that has an existing code base.
If this is new development, Do Not Do This.
I have come up with two ideas, but I don't like either of them:
[Only two?]
FooOrManyFoos - This expresses what goes on, but I could imagine that someone not used to it could have trouble understanding what it means right away
What? Are you saying you provide NO other documentation, and no other training? No support? No advice? Who is the "someone not used to it"? Talk to them. Don't assume or imagine things about them.
Also, don't use Leading Upper Case Names.
param - Some generic name. This makes clear that it can be several things, but does explain nothing about what the parameter is used for.
Terrible. Never. Do. This.
I looked in the Python library for examples. Most of the functions that do this have simple descriptions.
http://docs.python.org/library/functions.html#isinstance
isinstance(object, classinfo)
They call it "classinfo" and it can be a class or a tuple of classes.
You could do that, too.
You must consider the common use case and the exceptions. Follow the 80/20 rule.
80% of the time, you can replace this with an iterable and not have this problem.
In the remaining 20% of the cases, you have an installed base of software built around an assumption (either iterable or single item) and you need to add the other case. Don't change the name, just change the documentation. If it used to say "foo" it still says "foo" but you make it accept an iterable of "foo's" without making any change to the parameters. If it used to say "foo_list" or "foo_iter", then it still says "foo_list" or "foo_iter" but it will quietly tolerate a singleton without breaking.
80% of the code is the legacy ("foo" or "foo_list")
20% of the code is the new feature ("foo" can be an iterable or "foo_list" can be a single object.)
I guess I'm a little late to the party, but I'm surprised that nobody suggested a decorator.
def withmany(f):
    def many(many_foos):
        for foo in many_foos:
            yield f(foo)
    f.many = many
    return f

@withmany
def process_foo(foo):
    return foo + 1

processed_foo = process_foo(foo)

for processed_foo in process_foo.many(foos):
    print processed_foo
I saw a similar pattern in one of Alex Martelli's posts but I don't remember the link off hand.
It sounds like you're agonizing over the ugliness of code like:
def ProcessWidget(widget_thing):
    # Infer if we have a singleton instance and make it a
    # length 1 list for consistency
    if isinstance(widget_thing, WidgetType):
        widget_thing = [widget_thing]
    for widget in widget_thing:
        # ...
My suggestion is to avoid overloading your interface to handle two distinct cases. I tend to write code that favors re-use and clear naming of methods over clever dynamic use of parameters:
def ProcessOneWidget(widget):
    # ...

def ProcessManyWidgets(widgets):
    for widget in widgets:
        ProcessOneWidget(widget)
Often, I start with this simple pattern, but then have the opportunity to optimize the "Many" case when there are efficiencies to gain that offset the additional code complexity and partial duplication of functionality. If this convention seems overly verbose, one can opt for names like "ProcessWidget" and "ProcessWidgets", though the difference between the two is a single easily missed character.
You can use *args magic (varargs) to make your params always be iterable.
Pass a single item or multiple known items as normal function args like func(arg1, arg2, ...) and pass iterable arguments with an asterisk before, like func(*args)
Example:
# magic *args function
def foo(*args):
    print args

# many ways to call it
foo(1)
foo(1, 2, 3)

args1 = (1, 2, 3)
args2 = [1, 2, 3]
args3 = iter((1, 2, 3))

foo(*args1)
foo(*args2)
foo(*args3)
Can you name your parameter in a very high-level way? People who read the code are more interested in knowing what the parameter represents ("clients") than what its type is ("list_of_tuples"); the type can be defined in the function documentation string, which is a good thing since it might change in the future (the type is sometimes an implementation detail).
I would do 1 thing,
def myFunc(manyFoos):
    if not type(manyFoos) in (list, tuple):
        manyFoos = [manyFoos]
    # do stuff here
so then you don't need to worry anymore about its name.
In a function you should try to have one action, accept the same parameter type, and return the same type.
Instead of filling the functions with ifs you could have 2 functions.
Since you don't care exactly what kind of iterable you get, you could try to get an iterator for the parameter using iter(). If iter() raises a TypeError exception, the parameter is not iterable, so you then create a list or tuple of the one item, which is iterable and Bob's your uncle.
def doIt(foos):
    try:
        iter(foos)
    except TypeError:
        foos = [foos]
    for foo in foos:
        pass  # do something here
The only problem with this approach is if foo is a string. A string is iterable, so passing in a single string rather than a list of strings will result in iterating over the characters in a string. If this is a concern, you could add an if test for it. At this point it's getting wordy for boilerplate code, so I'd break it out into its own function.
def iterfy(iterable):
    if isinstance(iterable, basestring):
        iterable = [iterable]
    try:
        iter(iterable)
    except TypeError:
        iterable = [iterable]
    return iterable

def doIt(foos):
    for foo in iterfy(foos):
        pass  # do something
Unlike some of those answering, I like doing this, since it eliminates one thing the caller could get wrong when using your API. "Be conservative in what you generate but liberal in what you accept."
To answer your original question, i.e. what you should name the parameter, I would still go with "foos" even though you will accept a single item, since your intent is to accept a list. If it's not iterable, that is technically a mistake, albeit one you will correct for the caller since processing just the one item is probably what they want. Also, if the caller thinks they must pass in an iterable even of one item, well, that will of course work fine and requires very little syntax, so why worry about correcting their misapprehension?
I would go with a name explaining that the parameter can be an instance or a list of instances. Say one_or_more_Foo_objects. I find it better than the bland param.
I'm working on a fairly big project now and we're passing maps around and just calling our parameter map. The map contents vary depending on the function that's being called. This probably isn't the best situation, but we reuse a lot of the same code on the maps, so copying and pasting is easier.
I would say instead of naming it what it is, you should name it what it's used for. Also, just be careful that you can't use "in" on something that is not iterable.
