Pythonic Improvement Of Function In List Comprehension? - python

Is there a more Pythonic way to write the following code? I would like to do it in one line.
parse_row is a function that can return a tuple of size 3, or None.
parsed_rows = [ parse_row(tr) for tr in tr_els ]
data = [ x for x in parsed_rows if x is not None ]

Doing this in one line won't make it more Pythonic; it will make it less readable. If you really want to, you can always translate it directly by substitution like this:
data = [x for x in [parse_row(tr) for tr in tr_els] if x is not None]
… which can obviously be flattened as Doorknob of Snow shows, but it's still hard to understand. However, he didn't get it quite right: clauses nest from left to right, and you want x to be each parse_row result, not each element of each parse_row result (as Volatility points out), so the flattened version would be:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
I think the fact that a good developer got it backward and 6 people upvoted it before anyone realized the problem, and then I missed a second problem and 7 more people upvoted that before anyone caught it, is pretty solid proof that this is not more pythonic or more readable, just as Doorknob said. :)
In general, when faced with either a nested comp or a comp with multiple for clauses, if it's not immediately obvious what it does, you should translate it into nested for and if statements with an innermost append expression statement, as shown in the tutorial. But if you need to do that with a comprehension you're trying to write, it's a pretty good sign you shouldn't be trying to write it…
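As a rough sketch of that translation, the flattened comprehension above corresponds to the nested statements below. Note that parse_row here is only a hypothetical stand-in (the real one isn't shown in the question), and tr_els is made-up sample data.

```python
def parse_row(tr):
    # Hypothetical stand-in: a 3-tuple for a well-formed row, otherwise None.
    parts = tr.split(",")
    return tuple(parts) if len(parts) == 3 else None

tr_els = ["a,b,c", "bad row", "d,e,f"]  # made-up sample input

# The comprehension with two for clauses and a filter...
flat = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]

# ...and the same logic as nested statements. The clauses appear in the
# same left-to-right order, with the append innermost.
data = []
for tr in tr_els:
    for x in (parse_row(tr),):
        if x is not None:
            data.append(x)
```

If the loop form is what finally makes the logic clear, that is a hint the loop (or the two-step version) is the one to keep.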
However, there is a way to make this more Pythonic, and also more efficient: change the first list comprehension to a generator expression, like this:
parsed_rows = (parse_row(tr) for tr in tr_els)
data = [x for x in parsed_rows if x is not None]
All I did is change the square brackets to parentheses, and that's enough to compute the first one lazily, calling parse_row on each tr as needed, instead of calling it on all of the rows, and building up a list in memory that you don't actually need, before you even get started on the real work.
In fact, if the only reason you need data is to iterate over it once (or to convert it into some other form, like a CSV file or a NumPy array), you can make that a generator expression as well.
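To make the laziness visible, here is a small sketch in which a hypothetical parse_row records every call it receives. The function body and the sample rows are assumptions made purely for illustration.

```python
calls = []

def parse_row(tr):
    # Hypothetical stand-in that logs each call so we can see when it runs.
    calls.append(tr)
    return (tr, tr, tr) if tr else None

tr_els = ["a", "", "b"]  # made-up sample input

parsed_rows = (parse_row(tr) for tr in tr_els)
calls_before = len(calls)   # creating the generator has not parsed anything

data = [x for x in parsed_rows if x is not None]
calls_after = len(calls)    # each row was parsed exactly once, on demand
```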
Or, even better, replace the list comprehension with a map call. When your expression is just "call this function on each element", map is generally more readable (whereas when you have to write a new function, especially with lambda, just to wrap up some more complex expression, it's usually not). So:
parsed_rows = map(parse_row, tr_els)
data = [x for x in parsed_rows if x is not None]
And now it actually is readable to sub in:
data = [x for x in map(parse_row, tr_els) if x is not None]
You could similarly turn the second comprehension into a filter call. However, just as with map, if the predicate isn't just "call this function and see if it returns something truthy", it usually ends up being less readable. In this case:
data = filter(lambda x: x is not None, map(parse_row, tr_els))
But notice that you really don't need to check is not None in the first place. The only non-None values you have are 3-tuples, which are always truthy. So, you can replace the if x is not None with if x, which simplifies your comprehension:
data = [x for x in map(parse_row, tr_els) if x]
… and which can be written in two different ways with filter:
data = filter(bool, map(parse_row, tr_els))
data = filter(None, map(parse_row, tr_els))
Asking which of those two is better will start a religious war on any of the Python lists, so I'll just present them both and let you decide.
Note that if you're using Python 2.x, map is not lazy; it will generate the whole intermediate list. So, if you want to get the best of both worlds, and can't use Python 3, use itertools.imap instead of map. And in the same way, in 3.x, filter is lazy, so if you want a list, use list(filter(…)).
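For reference, a minimal Python 3 sketch of the variants discussed above. The parse_row body and the sample input are assumptions; the truthiness shortcut is only safe because this stand-in, like the original, never returns a falsy non-None value.

```python
def parse_row(tr):
    # Hypothetical stand-in: a 3-tuple for non-empty input, otherwise None.
    return (tr, len(tr), tr.upper()) if tr else None

tr_els = ["a", "", "bc"]  # made-up sample input

lazy = filter(None, map(parse_row, tr_els))  # Python 3: an iterator, nothing computed yet
data = list(lazy)                            # materialise it once into a list

same = list(filter(bool, map(parse_row, tr_els)))
by_comp = [x for x in map(parse_row, tr_els) if x is not None]

leftover = list(lazy)                        # the iterator is already exhausted
```

The last line shows the usual catch with lazy iterators: they can be consumed only once.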

You can nest one in the other:
data = [x for tr in tr_els for x in parse_row(tr) if x is not None]
(Also, @Volatility points out that this will give an error if parse_row(tr) is None, which can be solved like this:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
)
However, in my opinion this is much less readable. Shorter is not always better.

Related

Is there a more elegant way to remove Nones from list after list comprehension

I'm using a list comprehension to map a function that either returns a value or None.
My function looks like this (this is extremely simplified, just to give you a general idea)
def convertline(x):
    if x == 'undesirablevalue':
        return None
    else:
        # do some logic
        # do some logic
        # do some logic
        return somecalculatedvalue
and I have it iterating over a list in a list comprehension, like so. To filter out the Nones, I use a second list comprehension.
items = [convertline(line) for line in sample2.splitlines()]
items = [x for x in items if x is not None]
But the above code seems bulky.
I realized I could also do this:
items = [convertline(line) for line in sample2.splitlines() if convertline(line) is not None]
But this seems garbled, and I also do the math twice. Is there a better, more elegant way to do this? Both solutions seem kind of bulky. Thanks
There's really nothing wrong with your original approach. I would greatly prefer it over the approach that calls the function twice, which is definitely wasteful, especially if the function does a lot of work.
If you are using Python 3.8 or later, you can use an assignment expression:
[result for x in data if (result := foo(x)) is not None]
Alternatively, the following, which uses map, does only a single pass and doesn't build an intermediate list:
[x for x in map(foo, data) if x is not None]
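A small sketch to check that both forms call the function once per element. Here foo is a hypothetical stand-in that logs its calls, and the sample data is made up; the assignment expression needs Python 3.8 or later.

```python
calls = []

def foo(x):
    # Hypothetical stand-in for the expensive conversion; None marks a rejected item.
    calls.append(x)
    return None if x < 0 else x * 10

data = [1, -2, 3]  # made-up sample input

with_walrus = [result for x in data if (result := foo(x)) is not None]
calls_walrus = len(calls)

calls.clear()
with_map = [y for y in map(foo, data) if y is not None]
calls_map = len(calls)
```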
You could do it the other way around:
def convertline(x):
    # do some logic
    # do some logic
    # do some logic
    return somecalculatedvalue
items = [convertline(line) for line in sample2.splitlines() if line != 'undesirablevalue']
A function that returns None is a bit weird in my opinion.

Python List Comprehension: Using "if" Statement on Result of the Comprehension

Can you filter a list comprehension based on the result of the transformation in the comprehension?
For example, suppose you want to strip each string in a list, and remove strings that are just whitespace. I could easily do the following:
filter(None, [x.strip() for x in str_list])
But that iterates over the list twice. Alternatively you could do the following:
[x.strip() for x in str_list if x.strip()]
But that implementation performs strip twice. I could also use a generator:
def stripped(str_list):  # wrapper added so that yield is valid; the name is illustrative
    for x in str_list:
        x = x.strip()
        if x:
            yield x
But now that's a bunch of lines of code. Is there any way of doing the above (1) only iterating once; (2) only performing the transformation once; and (3) in a single list comprehension? The example above is a toy example, but I'd like to do this with longer lists and non-trivial transforms.
Update: I'm using Python 2.7.X, and prefer answers in that, but if Python 3 has some new features that make this easy, I'd be happy to learn about them as well.
Don't pass a list to filter, pass a generator expression, and it will only be iterated once:
filter(None, (x.strip() for x in str_list))
This is exactly the same idea as using a nested generator like
[y for y in (x.strip() for x in str_list) if y]
Both cases rely on the lazy evaluation of generators: each element of str_list will be processed exactly once, when the corresponding output element is created. No intermediate lists will be made.
The comprehension approach is nice for small one-shot transformations like these. Even the simple example here, which filters after the transformation, is pushing the limits of readability in my opinion. With any non-trivial sequence of transformations and filters, I would recommend using a for loop.
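Since the question also asks about Python 3, here is a hedged sketch of how these options look there. The sample list is made up. In Python 3 filter returns a lazy iterator, so it is wrapped in list(), and the last line relies on the assignment expression available from Python 3.8.

```python
str_list = ["  a ", "   ", "b", ""]  # made-up sample data

# Nested generator expression: one pass, one strip() per element.
nested = [y for y in (x.strip() for x in str_list) if y]

# Python 3: filter is lazy, so wrap it in list() to get a list back.
filtered = list(filter(None, (x.strip() for x in str_list)))

# Python 3.8+: the assignment expression keeps the stripped value for reuse,
# which gives a single comprehension with a single strip() per element.
walrus = [s for x in str_list if (s := x.strip())]
```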
Why not nest the comprehensions?
result = (x for x in (y.strip() for y in str_list) if len(x))
The inner () makes a generator that is just the stripped versions of strings in str_list. The outer () makes a second generator that consumes the first and produces only the non-empty elements. You traverse the list only once and strip each string only once.

Python Idiom for applying sequential steps to an iterable

When doing data processing tasks I often find myself applying a series of compositions, vectorized functions, etc. to some input iterable of data to generate a final result. Ideally I would like something that will work for both lists and generators (in addition to any other iterable). I can think of a number of approaches to structuring code to accomplish this, but every way I can think of has one or more ways where it feels unclean/unidiomatic to me. I have outlined below the different methods I can think of to do this, but my question is—is there a recommended, idiomatic way to do this?
Methods I can think of, illustrated with a simple example that is generally representative:
Write it as one large expression
result = [sum(group)
          for key, group in itertools.groupby(
              filter(lambda x: x < 2, [x**2 for x in input]),
              keyfunc=lambda x: x % 3)]
This is often quite difficult to read for any non-trivial sequence of steps. When reading through the code one also encounters each step in reverse order.
Save each step into a different variable name
squared = [x**2 for x in input]
filtered = filter(lambda x: x < 2, squared)
grouped = itertools.groupby(filtered, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in grouped]
This introduces a number of local variables that can often be hard to name descriptively; additionally, if the result of some or all of the intermediate steps is especially large keeping them around could be very wasteful of memory. If one wants to add a step to this process, care must be taken that all variable names get updated correctly—for example, if we wished to divide every number by two we would add the line halved = [x / 2.0 for x in filtered], but would also have to remember to change filtered to halved in the following line.
Store each step into the same variable name
tmp = [x**2 for x in input]
tmp = filter(lambda x: x < 2, tmp)
tmp = itertools.groupby(tmp, keyfunc=lambda x: x % 3)
result = [sum(group) for key, group in tmp]
I guess this seems to me as the least-bad of these options, but storing things in a generically named placeholder variable feels un-pythonic to me and makes me suspect that there is some better way out there.
Code Review often is a better place for style questions. SO is more for problem solving. But CR can be picky about the completeness of the example.
But I can make a few observations:
if you wrap this calculation in a function, naming isn't such a big deal. The names don't have to be globally meaningful.
a number of your expressions are generators. Itertools tends to produce generators or gen. expressions. So memory use shouldn't be much of an issue.
def better_name(input):
    squared = (x**2 for x in input)  # gen expression
    filtered = filter(lambda x: x < 2, squared)
    grouped = itertools.groupby(filtered, lambda x: x % 3)
    result = (sum(group) for key, group in grouped)
    return result
list(better_name(input))
Using def functions instead of lambdas can also make the code clearer. There's a trade off. Your lambdas are simple enough that I'd probably keep them.
Your 2nd option is much more readable than the 1st. The order of the expressions guides my reading and mental evaluation. In the 1st it's hard to identify the inner-most or first evaluation. And groupby is a complex operation, so any help in compartmentalizing the action is welcome.
Following the filter docs, these are equivalent:
filtered = filter(lambda x: x < 2, squared)
filtered = (x for x in squared if x<2)
I was missing the return. The function could return a generator as I show, or an evaluated list.
groupby has no keyfunc keyword argument; pass the key function positionally, or by its actual keyword name, key.
groupby is a complex function. It returns a generator that produces tuples, an element of which is a generator itself. Returning this makes it more obvious.
((key, list(group)) for key, group in grouped)
So a code style that clarifies its use is desirable.
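Putting these observations together, a possible sketch of the pipeline as a function. The function name, the threshold of 50, and the sample input are all illustrative choices, not taken from the question. One assumption is added on purpose: groupby only groups consecutive items with equal keys, so this version sorts by the key first, which does materialise a list at that step.

```python
import itertools

def group_sums(values):
    # Illustrative name; each step reads top to bottom in evaluation order.
    squared = (x ** 2 for x in values)            # lazy generator expression
    filtered = (x for x in squared if x < 50)     # lazy; threshold chosen for the demo
    # groupby groups only *consecutive* equal keys, so sort by the key first.
    ordered = sorted(filtered, key=lambda x: x % 3)
    grouped = itertools.groupby(ordered, key=lambda x: x % 3)
    return [(key, sum(group)) for key, group in grouped]

result = group_sums([1, 2, 3, 4, 5, 6])
```

Because the intermediate names are local to the function, they don't need to be globally meaningful, and only the sorted step holds a full list in memory.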

Sum a python list without the string values

So according to duck-typing advice, you aren't advised to check types in Python, but simply see if an operation succeeds or fails. In which case, how do I sum a list of (mainly) numbers while omitting the strings?
sum([1,2,3,4,'']) #fails
sum(filter(lambda x: type(x)==int, [1,2,3,4,''])) #bad style
I will do something like this
a = [1,2,3,4,'']
print sum(x if not isinstance(x,str) else 0 for x in a)
Well, I see two main solutions here:
Pre-processing: Filter the input data in order to prevent occurrences of 'missing data', which might be quite complex. We can't help you on this point without more information.
Post-processing: Filter the result list and remove 'missing data', easy but it isn't really scalable.
About post-processing, here is a solution using list comprehension, and another using your filter-based approach:
a = [1,2,3,4,'']
filtered_a = [x for x in a if isinstance(x, int)]
filtered_a = filter(lambda x: isinstance(x, int), a)
Then, you can simply do sum(filtered_a)
One could also argue that you should check for data consistency during processing, and simply not add strings to your list in the first place.
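As a hedged variation on the filtering approach, a generator expression can be passed straight to sum, which avoids building the intermediate list. The sample data is made up (the 2.5 is added here to show non-integer numbers), and using numbers.Number as the test is an assumption about what should count as a number.

```python
from numbers import Number

a = [1, 2, 3, 4, '', 2.5]  # sample data; the 2.5 is added for illustration

# Generator expression: no intermediate list, and floats are kept too.
total = sum(x for x in a if isinstance(x, Number))

# int-only variant matching the answer above (note that bool is a subclass of int).
int_total = sum(x for x in a if isinstance(x, int))
```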

python list comprehensions invalid syntax while if statement

I've got a list like this: z = ['aaaaaa','bbbbbbbbbb','cccccccc']. I would like to cut off the first 6 chars from all elements and, if the resulting element is empty, not put it in the new list. So I wrote this code:
[x[6:] if x[6:] is not '' else pass for x in z]
I've tried with pass and with continue, and I still get a syntax error. Maybe someone could help me with it? Thanks.
Whenever you need to filter items from a list, the condition has to be at the end. So you need to filter the empty items, like this
[x[6:] for x in z if x[6:] != ""]
# ['bbbb', 'cc']
Since an empty string is falsy, we can write the same condition succinctly as follows
[x[6:] for x in z if x[6:]]
As an alternative, as tobias_k suggested, you can check the length of the string like this
[x[6:] for x in z if len(x) > 6]
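To illustrate why the position of the if matters, here is a short sketch using the list from the question. A conditional expression before the for picks a value for every item, so it needs an else with an actual value (pass is a statement, which is why it is a syntax error there). A trailing if clause is what drops items.

```python
z = ['aaaaaa', 'bbbbbbbbbb', 'cccccccc']

# Conditional expression: chooses a value, so the result has one item per input.
mapped = [x[6:] if x[6:] else None for x in z]

# Trailing if clause: filters, so strings with nothing left are dropped.
filtered = [x[6:] for x in z if x[6:]]

# Same filter expressed through the length check.
by_length = [x[6:] for x in z if len(x) > 6]
```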
If you are learning to work with lambda, you could try map and filter like this:
filter(None, map(lambda y: y[6:], x))
Here, map(lambda y: y[6:], x) keeps only the characters from the 7th onward, so any string of six characters or fewer becomes an empty string, which is falsy. filter(None, ...) then removes these empty strings from the result.
Treat this as a learning exercise only, as it is downright ugly by Python style standards. A list comprehension is the way to go, as mentioned above.
[y[6:] for y in x if y[6:]]
Or the traditional for loop as
output = []
for y in x:
    if isinstance(y, str) and y[6:]:
        output.append(y[6:])
Please note that even though the traditional loop is longer, it leaves room for extra checks (here, keeping only the strings when the list may also contain other data types such as lists, tuples, or dictionaries).
So I would suggest sticking with list comprehensions for simple, well-controlled lists, and using the traditional loop when you need finer control over the output.
