According to duck-typing advice, you shouldn't check types in Python but simply see whether an operation succeeds or fails. In that case, how do I sum a list of (mainly) numbers while omitting the strings?
sum([1,2,3,4,'']) #fails
sum(filter(lambda x: type(x)==int, [1,2,3,4,''])) #bad style
I would do something like this:
a = [1,2,3,4,'']
print(sum(x if not isinstance(x, str) else 0 for x in a))
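To take the duck-typing advice literally, one can try the operation and catch the failure (EAFP) instead of checking types. A minimal sketch, using a hypothetical sum_numbers helper that is not from the question:

```python
# EAFP sketch: attempt the addition and skip elements that don't support it.
# sum_numbers is a hypothetical helper, not part of the original question.
def sum_numbers(items):
    total = 0
    for item in items:
        try:
            total += item
        except TypeError:  # e.g. a string mixed in with the numbers
            pass
    return total

print(sum_numbers([1, 2, 3, 4, '']))  # 10
```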
Well, I see two main solutions here:
Pre-processing: filter the input data in order to prevent occurrences of 'missing data'; this might be quite complex, and we can't help you on this point without more information.
Post-processing: filter the result list and remove the 'missing data'; easy, but it isn't really scalable.
About post-processing, here is a solution using list comprehension, and another using your filter-based approach:
a = [1,2,3,4,'']
filtered_a = [x for x in a if isinstance(x, int)]
filtered_a = filter(lambda x: isinstance(x, int), a)
Then, you can simply do sum(filtered_a)
We can also argue that you could check for data consistency during processing and simply avoid adding strings to your list in the first place.
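If the numbers may include floats as well as ints, a sketch that checks against the abstract numbers.Number base class is more general than type(x) == int:

```python
from numbers import Number

a = [1, 2.5, 3, 4, '']
# isinstance(x, Number) accepts ints, floats, complex, Decimal, etc.,
# while still rejecting strings.
print(sum(x for x in a if isinstance(x, Number)))  # 10.5
```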
When doing data processing tasks I often find myself applying a series of compositions, vectorized functions, etc. to some input iterable of data to generate a final result. Ideally I would like something that will work for both lists and generators (in addition to any other iterable). I can think of a number of approaches to structuring code to accomplish this, but every one of them feels unclean or unidiomatic to me in one or more ways. I have outlined below the different methods I can think of, but my question is: is there a recommended, idiomatic way to do this?
Methods I can think of, illustrated with a simple example that is generally representative of my use case:
Write it as one large expression
result = [sum(group)
          for key, group in itertools.groupby(
              filter(lambda x: x <= 2, [x**2 for x in input]),
              keyfunc=lambda x: x % 3)]
This is often quite difficult to read for any non-trivial sequence of steps. When reading through the code one also encounters each step in reverse order.
Save each step into a different variable name
squared = [x**2 for x in input]
filtered = filter(lambda x: x < 2, squared)
grouped = itertools.groupby(filtered, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in grouped]
This introduces a number of local variables that can often be hard to name descriptively; additionally, if the result of some or all of the intermediate steps is especially large, keeping them around could be very wasteful of memory. If one wants to add a step to this process, care must be taken that all variable names get updated correctly: for example, if we wished to divide every number by two, we would add the line halved = [x / 2.0 for x in filtered], but we would also have to remember to change filtered to halved in the following line.
Store each step into the same variable name
tmp = [x**2 for x in input]
tmp = filter(lambda x: x < 2, tmp)
tmp = itertools.groupby(tmp, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in tmp]
This seems to me the least bad of these options, but storing things in a generically named placeholder variable feels unpythonic and makes me suspect that there is some better way out there.
Code Review often is a better place for style questions. SO is more for problem solving. But CR can be picky about the completeness of the example.
But I can offer a few observations:
if you wrap this calculation in a function, naming isn't such a big deal. The names don't have to be globally meaningful.
a number of your expressions are generators, and itertools functions tend to return lazy iterators, so memory use shouldn't be much of an issue.
import itertools

def better_name(input):
    squared = (x**2 for x in input)  # generator expression
    filtered = filter(lambda x: x < 2, squared)
    grouped = itertools.groupby(filtered, lambda x: x % 3)
    result = (sum(group) for key, group in grouped)
    return result
list(better_name(input))
Using def functions instead of lambdas can also make the code clearer. There's a trade off. Your lambdas are simple enough that I'd probably keep them.
Your 2nd option is much more readable than the 1st. The order of the expressions guides my reading and mental evaluation. In the 1st it's hard to identify the inner-most or first evaluation. And groupby is a complex operation, so any help in compartmentalizing the action is welcome.
Following the filter docs, these are equivalent:
filtered = filter(lambda x: x < 2, squared)
filtered = (x for x in squared if x < 2)
Initially I was missing the return statement. The function could return a generator, as I show, or an evaluated list.
groupby's key function is not named keyfunc; the parameter is called key, and it can also be passed positionally.
groupby is a complex function. It returns an iterator that produces tuples, the second element of which is itself an iterator. Returning something like this makes that structure more obvious:
((key, list(group)) for key, group in grouped)
So a code style that clarifies its use is desirable.
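For instance, materializing each group as a list makes the two-level structure visible (a standalone sketch with sample data, not the pipeline from the question):

```python
import itertools

data = [1, 1, 2, 2, 3, 1]
# groupby yields (key, group) pairs, where each group is itself an iterator
# over a run of consecutive equal elements.
pairs = [(key, list(group)) for key, group in itertools.groupby(data)]
print(pairs)  # [(1, [1, 1]), (2, [2, 2]), (3, [3]), (1, [1])]
```

Note that the final run of 1 forms its own group: groupby only merges consecutive equal keys, which is why the input usually needs to be sorted by the key first.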
From this answer I have a flattened list.
Now I want to remove duplicates and sort the list. Currently I have the following:
x = itertools.chain.from_iterable(results[env].values()) #From the linked answer
y = sorted(list(set(x)), key=lambda s:s.lower())
Is there a better way of accomplishing this? In my case x is of size ~32,000 and y ends up being of size ~1,100. What I have works, but I'd like to see if there's anything better (faster, more readable, etc)
Actually, if you just remove the list() call, which isn't needed, you've got a nice neat solution to your original problem. Your code is perfectly readable and efficient, I think.
y = sorted(set(x), key=lambda s:s.lower())
Since results[env] is a dictionary, you can take the union of its values as a set instead of flattening them, then sort the result:
>>> sorted(set().union(*results[env].values()), key=str.lower)
Also note that you don't need a lambda function as your key; you can simply use the str.lower method.
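With a hypothetical stand-in for results[env], the union approach looks like this:

```python
# results_env stands in for results[env]: a dict whose values are lists.
results_env = {'a': ['Apple', 'banana'], 'b': ['banana', 'Cherry']}

# set().union(*...) merges all the value lists and deduplicates in one step.
y = sorted(set().union(*results_env.values()), key=str.lower)
print(y)  # ['Apple', 'banana', 'Cherry']
```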
I have a string list
[str1, str2, str3.....]
and I also have a def to check the format of the strings, something like:
def CheckIP(strN):
    if formatCorrect(strN):
        return True
    return False
Now I want to check every string in list, and of course I can use for to check one by one. But could I use map() to make code more readable...?
You can map your list to your function and then use all to check if it returns True for every item:
if all(map(CheckIP, list_of_strings)):
# All strings are good
Actually, it would be cleaner to just get rid of the CheckIP function and use formatCorrect directly:
if all(map(formatCorrect, list_of_strings)):
# All strings are good
Also, as an added bonus, all short-circuits: it only checks as many items as necessary before returning a result.
Note however that a more common approach would be to use a generator expression instead of map:
if all(formatCorrect(x) for x in list_of_strings):
In my opinion, generator expressions are always better than map because:
They are slightly more readable.
They are just as fast if not faster than using map. Also, in Python 2.x, map creates a list object that is often unnecessary (wastes memory). Only in Python 3.x does map use lazy-computation like a generator expression.
They are more powerful. In addition to just mapping items to a function, generator expressions allow you to perform operations on each item as they are produced. For example:
sum(x * 2 for x in (1, 2, 3))
They are preferred by most Python programmers. Keeping with convention is important when programming because it eases maintenance and makes your code more understandable.
There is talk of removing functions like map, filter, etc. from a future version of the language. Though this is not set in stone, it has come up many times in the Python community.
Of course, if you are a fan of functional programming, there isn't much chance you'll agree to points one and four. :)
An example of how you could do it:
in_str = ['str1', 'str2', 'str3', 'not']
in_str2 = ['str1', 'str2', 'str3']
def CheckIP(strN):
    # different from yours, just to show the example
    if 'str' in strN:
        return True
    else:
        return False
print(all(map(CheckIP, in_str))) # gives false
print(all(map(CheckIP, in_str2))) # gives true
L = [str1, str2, str3.....]
answer = list(map(CheckIP, L))
answer is a list of booleans such that answer[i] is CheckIP(L[i]). If you want to further check if all of those values are True, you could use all:
all(answer)
This returns True if and only if all the values in answer are True. However, you may do this without listifying:
all(map(CheckIP, L))
since, in Python 3, map returns an iterator, not a list. This way you don't waste space turning everything into a list. You also save time: the first False value makes all return False, stopping map from computing any remaining values.
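The short-circuiting is easy to observe by recording which items actually get checked (a sketch with a hypothetical predicate standing in for CheckIP):

```python
checked = []

def check(s):
    # stand-in for CheckIP; records each call so we can see short-circuiting
    checked.append(s)
    return s.startswith('str')

strings = ['str1', 'nope', 'str3', 'str4']
result = all(map(check, strings))
print(result)   # False
print(checked)  # ['str1', 'nope'] -- later items were never checked
```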
I've got a list like this: z = ['aaaaaa','bbbbbbbbbb','cccccccc']. I would like to cut off the first 6 chars of each element and, if the result is empty, not put it in the other list. So I made this code:
[x[6:] if x[6:] is not '' else pass for x in z]
I've tried with
pass
continue
and still get a syntax error. Maybe someone could help me with it? Thanks.
Whenever you need to filter items in a list comprehension, the condition goes at the end. So you need to filter out the empty items, like this:
[x[6:] for x in z if x[6:] != ""]
# ['bbbb', 'cc']
Since an empty string is falsy, we can write the same condition more succinctly as follows:
[x[6:] for x in z if x[6:]]
As an alternative, as tobias_k suggested, you can check the length of the string like this
[x[6:] for x in z if len(x) > 6]
If you are learning to work with lambda, you could try map and filter like this:
filter(None, map(lambda y: y[6:], z))
Here, map(lambda y: y[6:], z) keeps only the part of each string from the 7th character onward, producing an empty string for any element with 6 or fewer characters. filter(None, ...) then removes those falsy empty strings from the result.
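Inspecting the intermediate result makes this clearer (using the z from the question):

```python
z = ['aaaaaa', 'bbbbbbbbbb', 'cccccccc']

sliced = [y[6:] for y in z]        # the mapping step, spelled out
print(sliced)                      # ['', 'bbbb', 'cc']
print(list(filter(None, sliced)))  # ['bbbb', 'cc'] -- empty strings dropped
```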
Take this only for learning purposes, as it is downright ugly by PEP 8 standards. A list comprehension is the way to go, as mentioned above:
[y[6:] for y in z if y[6:]]
Or the traditional for loop as
output = []
for y in z:
    if isinstance(y, str) and y[6:]:
        output.append(y[6:])
Please note that even though the traditional way looks bigger, it can handle more cases (like here, taking only the strings from the list if the list contains other data types such as lists, tuples, dictionaries, etc.).
So I would suggest either sticking with list comprehensions for simple, controlled lists, or the traditional way when you need more control over the output.
Is there a more pythonic way to do the following code? I would like to do it in one line
parse_row is a function that can return a tuple of size 3, or None.
parsed_rows = [ parse_row(tr) for tr in tr_els ]
data = [ x for x in parsed_rows if x is not None ]
Doing this in one line won't make it more Pythonic; it will make it less readable. If you really want to, you can always translate it directly by substitution like this:
data = [x for x in [parse_row(tr) for tr in tr_els] if x is not None]
… which can obviously be flattened as Doorknob of Snow shows, but it's still hard to understand. However, he didn't get it quite right: clauses nest from left to right, and you want x to be each parse_row result, not each element of each parse_row result (as Volatility points out), so the flattened version would be:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
I think the fact that a good developer got it backward and 6 people upvoted it before anyone realized the problem, and then I missed a second problem and 7 more people upvoted that before anyone caught it, is pretty solid proof that this is not more pythonic or more readable, just as Doorknob said. :)
In general, when faced with either a nested comp or a comp with multiple for clauses, if it's not immediately obvious what it does, you should translate it into nested for and if statements with an innermost append expression statement, as shown in the tutorial. But if you need to do that with a comprehension you're trying to write, it's a pretty good sign you shouldn't be trying to write it…
However, there is a way to make this more Pythonic, and also more efficient: change the first list comprehension to a generator expression, like this:
parsed_rows = (parse_row(tr) for tr in tr_els)
data = [x for x in parsed_rows if x is not None]
All I did was change the square brackets to parentheses, and that's enough to compute the first one lazily, calling parse_row on each tr as needed, instead of calling it on all of the rows and building up a list in memory that you don't actually need before you even get started on the real work.
In fact, if the only reason you need data is to iterate over it once (or to convert it into some other form, like a CSV file or a NumPy array), you can make that a generator expression as well.
Or, even better, replace the list comprehension with a map call. When your expression is just "call this function on each element", map is generally more readable (whereas when you have to write a new function, especially with lambda, just to wrap up some more complex expression, it's usually not). So:
parsed_rows = map(parse_row, tr_els)
data = [x for x in parsed_rows if x is not None]
And now it actually is readable to sub in:
data = [x for x in map(parse_row, tr_els) if x is not None]
You could similarly turn the second comprehension into a filter call. However, just as with map, if the predicate isn't just "call this function and see if it returns something truthy", it usually ends up being less readable. In this case:
data = filter(lambda x: x is not None, map(parse_row, tr_els))
But notice that you really don't need to check is not None in the first place. The only non-None values you have are 3-tuples, which are always truthy. So you can replace the if x is not None with if x, which simplifies your comprehension:
data = [x for x in map(parse_row, tr_els) if x]
… and which can be written in two different ways with filter:
data = filter(bool, map(parse_row, tr_els))
data = filter(None, map(parse_row, tr_els))
Asking which of those two is better will start a religious war on any of the Python lists, so I'll just present them both and let you decide.
Note that if you're using Python 2.x, map is not lazy; it will generate the whole intermediate list. So, if you want to get the best of both worlds and can't use Python 3, use itertools.imap instead of map. And in the same way, in 3.x, filter is lazy, so if you want a list, use list(filter(…)).
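A quick Python 3 check (with a hypothetical parse function standing in for parse_row) shows that nothing runs until the result is actually consumed:

```python
def parse(x):
    # stand-in for parse_row: returns a tuple for odd x, None otherwise
    print('parsing', x)
    return (x,) if x % 2 else None

data = filter(None, map(parse, [1, 2, 3]))
print('nothing parsed yet')  # printed before any 'parsing' line in Python 3
print(list(data))            # [(1,), (3,)]
```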
You can nest one in the other:
data = [x for tr in tr_els for x in parse_row(tr) if x is not None]
(Also, #Volatility points out that this will give an error if parse_row(tr) is None, which can be solved like this:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
)
However, in my opinion this is much less readable. Shorter is not always better.