What is the easiest way to check if something is a list?
A method doSomething has the parameters a and b. In the method, it will loop through the list a and do something. I'd like a way to make sure a is a list, before looping through - thus avoiding an error or the unfortunate circumstance of passing in a string then getting back a letter from each loop.
This question must have been asked before - however my googles failed me. Cheers.
To enable more usecases, but still treat strings as scalars, don't check for a being a list, check that it isn't a string:
if not isinstance(a, basestring):
...
Typechecking hurts the generality, simplicity, and maintainability of your code. It is seldom used in good, idiomatic Python programs.
There are two main reasons people want to typecheck:
To issue errors if the caller provides the wrong type.
This is not worth your time. If the user provides an incompatible type for the operation you are performing, an error will already be raised when the compatibility is hit. It is worrisome that this might not happen immediately, but it typically doesn't take long at all and results in code that is more robust, simple, efficient, and easier to write.
Oftentimes people insist on this with the hope they can catch all the dumb things a user can do. If a user is willing to do arbitrarily dumb things, there is nothing you can do to stop him. Typechecking mainly has the potential of keeping a user who comes in with his own types that are drop-in replacements for the ones replaced or when the user recognizes that your function should actually be polymorphic and provides something different that can accept the same operation.
If I had a big system where lots of things made by lots of people should fit together right, I would use a system like zope.interface to make testing that everything fits together right.
To do different things based on the types of the arguments received.
This makes your code worse because your API is inconsistent. A function or method should do one thing, not fundamentally different things. This ends up being a feature not usually worth supporting.
One common scenario is to have an argument that can either be a foo or a list of foos. A cleaner solution is simply to accept a list of foos. Your code is simpler and more consistent. If it's an important, common use case only to have one foo, you can consider having another convenience method/function that calls the one that accepts a list of foos and lose nothing. Providing the first API would not only have been more complicated and less consistent, but it would break when the types were not the exact values expected; in Python we distinguish between objects based on their capabilities, not their actual types. It's almost always better to accept an arbitrary iterable or a sequence instead of a list and anything that works like a foo instead of requiring a foo in particular.
As you can tell, I do not think either reason is compelling enough to typecheck under normal circumstances.
I'd like a way to make sure a is a list, before looping through
Document the function.
Usually it's considered not a good style to perform type-check in Python, but try
if isinstance(a, list):
...
(I think you may also check if a.__iter__ exists.)
Related
Is it a bad practice to modify function arguments?
_list = [1,2,3]
def modify_list(list):
list.append(4)
print(_list)
modify_list(_list)
print(_list)
At first it was supposed to be a comment, but it needed more formatting and space to explain the example. ;)
If you:
know what you're doing
can justify the use
don't use mutable default arguments (they are way too confusing in the way they behave, I can't imagine the reason their use would ever be justified)
don't use global mutables anywhere near that thing (modifying mutable global's contents AND modifying mutable argument's contents?)
and, most importantly, document this thing,
this thing shouldn't cause much harm (but still might bite you if you only think you know what you're doing, but in fact you don't) and can be useful!
Example:
I've worked with scripts (made by other programmers) that used mutable arguments. In this case: dictionaries.
The script was supposed to be run with threads but also allowed single-thread run. Using dictionaries instead of return values removed the difference of getting the result in single- and multiple-thread runs:
Normally value returned by a thread is encapsulated, but we only used the value after .join anyway and didn't care about threads killed by exceptions (single-thread run was mostly for debugging/local run).
That way, dictionaries (more than one in a single function) were used for appending new results in each run, without the need of collecting returned values manually and filtering them (the called function knew in which dict to put the result in, used lock to ensure thread safety).
Was it a "good" or "wrong" way of doing things?
In my opinion it was a pythonic way of doing things:
easily readable in both forms - dealing with the result was the same in single- and multi-threaded
data was "automatically" nicely formatted - as opposed to de-capsulating thread results, manual collecting and parsing them
and fairly easy to understand - with first and last point in my list above ;)
Whenever I chain conditions in Python (or any other language tbh) I stumble upon asking myself this, kicking me out of the productive "Zone".
When I chain conditions I can, by ordering them correctly, check conditions that without checking for the other conditions first, may produce an Error.
As an example lets assume the following snippet:
if "attr" in some_dictionary and some_value in some_dictionary["attr"]:
print("whooohooo")
If the first condition wasnt in the first place or even absent, the second condition my produce an KeyError
I do this pretty often to simply save space in the code, but I always wondered, if this is good style, if it comes with a risk or if its simply "pythonic".
A more Pythonic way is to "ask for forgivness rather than permission". In other words, use a try-except block:
try:
if some_value in some_dictionary["attr"]:
print("Woohoo")
except KeyError:
pass
Python is a late binding language, which is reflected in these kind of checks. The behavior is called short-circuiting. One thing I often do is:
def do(condition_check=None):
if condition_check is not None and condition_check():
# do stuff
Now, many people will argue that try: except: is more appropriate. This really depends on the use case!
if expressions are faster when the check is likely to fail, so use them when you know what is happening.
try expressions are faster when the check is likely to succeed, so use them to safeguard against exceptional circumstances.
if is explicit, so you know precisely what you are checking. Use it if you know what is happening, i.e. strongly typed situations.
try is implicit, so you only have to care about the outcome of a call. Use it when you don't care about the details, i.e. in weakly typed situations.
if works in a well-defined scope - namely right where you are performing the check. Use it for nested relations, where you want to check the top-most one.
try works on the entire contained call stack - an exception may be thrown several function calls deeper. Use it for flat or well-defined calls.
Basically, if is a precision tool, while try is a hammer - sometimes you need precision, and sometimes you just have nails.
I have a class with few methods - each one is setting some internal state, and usually requires some other method to be called first, to prepare stage.
Typical invocation goes like this:
c = MyMysteryClass()
c.connectToServer()
c.downloadData()
c.computeResults()
In some cases only connectToServer() and downloadData() will be called (or even just connectToServer() alone).
The question is: how should those methods behave when they are called in wrong order (or, in other words, when the internal state is not yet ready for their task)?
I see two solutions:
They should throw an exception
They should call correct previous method internally
Currently I'm using second approach, as it allows me to write less code (I can just write c.computeResults() and know that two other methods will be called if necessary). Plus, when I call them multiple times, I don't have to keep track of what was already called and so I avoid multiple reconnecting or downloading.
On the other hand, first approach seems more predictable from the caller perspective, and possibly less error prone.
And of course, there is a possibility for a hybrid solution: throw and exception, and add another layer of methods with internal state detection and proper calling of previous ones. But that seems to be a bit of an overkill.
Your suggestions?
They should throw an exception. As said in the Zen of Python: Explicit is better than implicit. And, for that matter, Errors should never pass silently. Unless explicitly silenced. If the methods are called out of order that's a programmer's mistake, and you shouldn't try to fix that by guessing what they mean. You might accidentally cover up an oversight in a way that looks like it works, but is not actually an accurate reflection of the programmer's intent. (That programmer may be future you.)
If these methods are usually called immediately one after another, you could consider collating them by adding a new method that simply calls them all in a row. That way you can use that method and not have to worry about getting it wrong.
Note that classes that handle internal state in this way are sometimes called for but are often not, in fact, necessary. Depending on your use case and the needs of the rest of your application, you may be better off doing this with functions and actually passing connection objects, etc. from one method to another, rather than using a class to store internal state. See for instance Stop Writing Classes. This is just something to consider and not an imperative; plenty of reasonable people disagree with the theory behind Stop Writing Classes.
You should write exceptions. It is good programming practice to write Exceptions to make your code easier to understand for the following reasons:
What you are describe fits the literal description of "exception" -- it is an exception to normal proceedings.
If you build in some kind of work around, you will likely have "spaghetti code" = BAD.
When you, or someone else goes back and reads this code later, it will be difficult to understand if you do not provide the hint that it is an exception to have these methods executed out of order.
Here's a good source:
http://jeffknupp.com/blog/2013/02/06/write-cleaner-python-use-exceptions/
As my CS professor always said "Good programmers can write code that computers can read, but great programmers write code that humans and computers can read".
I hope this helps.
If it's possible, you should make the dependencies explicit.
For your example:
c = MyMysteryClass()
connection = c.connectToServer()
data = c.downloadData(connection)
results = c.computeResults(data)
This way, even if you don't know how the library works, there's only one order the methods could be called in.
I am looking for example where things in python would be easier to program just because it is dynamically typed?
I want to compare it with Haskell type system because its static typing doesn't get in the way like c# or java. Can I program in Haskell as I can in python without static typing being a hindrance?
PS: I am a python user and have played around little bit with ML and Haskell.. ... I hope it is clear now..
Can I program in Haskell as I can in python without static typing being a hindrance
Yes.
To elaborate, I would say the main gotcha will be the use of existential types in Haskell for heterogeneous data structures (regular data structures holding lists of variously typed elements). This often catches OO people used to a top "Object" type. It often catches Lisp/Scheme programmers. But I'm not sure it will matter to a Pythonista.
Try to write some Haskell, and come back when you get a confusing type error.
You should think of static typing as a benefit -- it checks a lot of things for you, and the more you lean on it, the less things you have to test for. In addition, it enables the compiler to make your code much faster.
Well for one you can't create a list containing multiple types of values without wrappers (like to get a list that may contain a string or an int, you'd have to create a list of Either Int String and wrap each item in a Left or a Right).
You also can't define a function that may return multiple types of values (like if someCondition then 1 else "this won't compile"), again, without using wrappers.
Like Chris said, this is one objective question (what can a dynamically typed language do that a statically typed one can't?) and one subjective question (can I use Haskell without static typing being a hindrance). So you're going to get mostly subjective answers, because the first question is not as interesting.
For me, the biggest hindrance was Haskell's IO type, because I had to stop and think about what code does I/O and what code doesn't, and explicitly pass information between the two. Everything else was pretty easy. If you commonly write
if someCondition:
return 1
else:
return "other"
Then you're making your own problems, Python just doesn't stop you from doing it. Haskell will, and that's about the only difference. The only exception is that this is sort of common in Python:
if someErrorCondition:
return None
else:
return NewItem(Success)
You can't do that in Haskell because there is no common None object. But there are easy ways to work around it.
I did find the type errors confusing at first, but I learned to read them in about a week.
I want to echo Don's advice: just try writing some Haskell and come back when you get a confusing type error.
Are there any security exploits that could occur in this scenario:
eval(repr(unsanitized_user_input), {"__builtins__": None}, {"True":True, "False":False})
where unsanitized_user_input is a str object. The string is user-generated and could be nasty. Assuming our web framework hasn't failed us, it's a real honest-to-god str instance from the Python builtins.
If this is dangerous, can we do anything to the input to make it safe?
We definitely don't want to execute anything contained in the string.
See also:
Funny blog post about eval safety
Previous Question
Blog: Fast deserialization in Python
The larger context which is (I believe) not essential to the question is that we have thousands of these:
repr([unsanitized_user_input_1,
unsanitized_user_input_2,
unsanitized_user_input_3,
unsanitized_user_input_4,
...])
in some cases nested:
repr([[unsanitized_user_input_1,
unsanitized_user_input_2],
[unsanitized_user_input_3,
unsanitized_user_input_4],
...])
which are themselves converted to strings with repr(), put in persistent storage, and eventually read back into memory with eval.
Eval deserialized the strings from persistent storage much faster than pickle and simplejson. The interpreter is Python 2.5 so json and ast aren't available. No C modules are allowed and cPickle is not allowed.
It is indeed dangerous and the safest alternative is ast.literal_eval (see the ast module in the standard library). You can of course build and alter an ast to provide e.g. evaluation of variables and the like before you eval the resulting AST (when it's down to literals).
The possible exploit of eval starts with any object it can get its hands on (say True here) and going via .__class_ to its type object, etc. up to object, then gets its subclasses... basically it can get to ANY object type and wreck havoc. I can be more specific but I'd rather not do it in a public forum (the exploit is well known, but considering how many people still ignore it, revealing it to wannabe script kiddies could make things worse... just avoid eval on unsanitized user input and live happily ever after!-).
If you can prove beyond doubt that unsanitized_user_input is a str instance from the Python built-ins with nothing tampered, then this is always safe. In fact, it'll be safe even without all those extra arguments since eval(repr(astr)) = astr for all such string objects. You put in a string, you get back out a string. All you did was escape and unescape it.
This all leads me to think that eval(repr(x)) isn't what you want--no code will ever be executed unless someone gives you an unsanitized_user_input object that looks like a string but isn't, but that's a different question--unless you're trying to copy a string instance in the slowest way possible :D.
With everything as you describe, it is technically safe to eval repred strings, however, I'd avoid doing it anyway as it's asking for trouble:
There could be some weird corner-case where your assumption that only repred strings are stored (eg. a bug / different pathway into the storage that doesn't repr instantly becmes a code injection exploit where it might otherwise be unexploitable)
Even if everything is OK now, assumptions might change at some point, and unsanitised data may get stored in that field by someone unaware of the eval code.
Your code may get reused (or worse, copy+pasted) into a situation you didn't consider.
As Alex Martelli pointed out, in python2.6 and higher, there is ast.literal_eval which will safely handle both strings and other simple datatypes like tuples. This is probably the safest and most complete solution.
Another possibility however is to use the string-escape codec. This is much faster than eval (about 10 times according to timeit), available in earlier versions than literal_eval, and should do what you want:
>>> s = 'he\nllo\' wo"rld\0\x03\r\n\tabc'
>>> repr(s)[1:-1].decode('string-escape') == s
True
(The [1:-1] is to strip the outer quotes repr adds.)
Generally, you should never allow anyone to post code.
So called "paid professional programmers" have a hard-enough time writing code that actually works.
Accepting code from the anonymous public -- without benefit of formal QA -- is the worst of all possible scenarios.
Professional programmers -- without good, solid formal QA -- will make a hash of almost any web site. Indeed, I'm reverse engineering some unbelievably bad code from paid professionals.
The idea of allowing a non-professional -- unencumbered by QA -- to post code is truly terrifying.
repr([unsanitized_user_input_1,
unsanitized_user_input_2,
...
... unsanitized_user_input is a str object
You shouldn't have to serialise strings to store them in a database..
If these are all strings, as you mentioned - why can't you just store the strings in a db.StringListProperty?
The nested entries might be a bit more complicated, but why is this the case? When you have to resort to eval to get data from the database, you're probably doing something wrong..
Couldn't you store each unsanitized_user_input_x as it's own db.StringProperty row, and have group them by an reference field?
Either of those may not be applicable, since I've no idea what you're trying to achieve, but my point is - can you not structure the data in a way you where don't have to rely on eval (and also rely on it not being a security issue)?