python list comprehension invalid syntax with if statement

I've got a list like this: z = ['aaaaaa','bbbbbbbbbb','cccccccc']. I would like to cut off the first 6 chars from every element, and if the result is empty, not put it in the new list. So I wrote this code:
[x[6:] if x[6:] is not '' else pass for x in z]
I've tried with
pass
continue
and still get a syntax error. Maybe someone could help me with it? Thanks.

Whenever you need to filter items in a list comprehension, the condition goes at the end. So you need to filter out the empty items, like this:
[x[6:] for x in z if x[6:] != ""]
# ['bbbb', 'cc']
Since an empty string is falsy, we can write the same condition more succinctly as follows
[x[6:] for x in z if x[6:]]
As an alternative, as tobias_k suggested, you can check the length of the string like this
[x[6:] for x in z if len(x) > 6]

If you are learning to work with lambda, you could try map and filter like this:
filter(None, map(lambda y: y[6:], z))
Here, map(lambda y: y[6:], z) keeps each string from the 7th character onwards; strings of 6 characters or fewer become empty strings, which are falsy. The filter(None, ...) call then removes those empty strings from the result.
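For instance, a quick check (a sketch using the question's list z, under Python 2) shows the intermediate and final values:
z = ['aaaaaa', 'bbbbbbbbbb', 'cccccccc']
print map(lambda y: y[6:], z)                  # ['', 'bbbb', 'cc'] -- empty strings, not False
print filter(None, map(lambda y: y[6:], z))    # ['bbbb', 'cc']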
Take this only for learning purposes, though, as it is downright ugly by Python (PEP 8) style standards. A list comprehension is the way to go, as mentioned above:
[y[6:] for y in z if y[6:]]
Or the traditional for loop:
output = []
for y in z:
    if isinstance(y, str) and y[6:]:
        output.append(y[6:])
Please note that even though the traditional way looks longer, it can accommodate extra checks (here, taking only the strings from the list, in case the list also contains other data types such as lists, tuples, dictionaries, etc.).
So I would suggest sticking with list comprehensions for simple, controlled lists, or the traditional way when you need more control over the output.

Related

Python List Comprehension: Using "if" Statement on Result of the Comprehension

Can you filter a list comprehension based on the result of the transformation in the comprehension?
For example, suppose you want to strip each string in a list, and remove strings that are just whitespace. I could easily do the following:
filter(None, [x.strip() for x in str_list])
But that iterates over the list twice. Alternatively you could do the following:
[x.strip() for x in str_list if x.strip()]
But that implementation performs strip twice. I could also use a generator:
for x in str_list:
    x = x.strip()
    if x:
        yield x
But now that's a bunch of lines of code. Is there any way of doing the above (1) only iterating once; (2) only performing the transformation once; and (3) in a single list comprehension? The example above is a toy example, but I'd like to do this with longer lists and non-trivial transforms.
Update: I'm using Python 2.7.X, and prefer answers in that, but if Python 3 has some new features that make this easy, I'd be happy to learn about them as well.
Don't pass a list to filter, pass a generator expression, and it will only be iterated once:
filter(None, (x.strip() for x in str_list))
This is exactly the same idea as using a nested generator like
[y for y in (x.strip() for x in str_list) if y]
Both cases rely on the lazy evaluation of generators: each element of str_list will be processed exactly once, when the corresponding output element is created. No intermediate lists will be made.
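If you want to convince yourself of the single-pass behaviour, one rough way (a sketch with a hypothetical noisy_strip helper, not from the original post) is to make the transformation announce itself:
def noisy_strip(s):
    # report each call so we can see it happens exactly once per element
    print 'stripping %r' % s
    return s.strip()

str_list = ['  a  ', '   ', 'b']
result = filter(None, (noisy_strip(x) for x in str_list))
# prints three "stripping ..." lines (one per element); result == ['a', 'b']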
The comprehension approach is nice for small one-shot transformations like these. Even the simple example here, which filters after the transformation, is pushing the limits of readability in my opinion. With any non-trivial sequence of transformations and filters, I would recommend using a for loop.
Why not nest the comprehensions?
result = (x for x in (y.strip() for y in str_list) if len(x))
The inner () makes a generator that is just the stripped versions of strings in str_list. The outer () makes a second generator that consumes the first and produces only the non-empty elements. You traverse the list only once and strip each string only once.

Sum a python list without the string values

So according to duck-typing advice, you aren't supposed to check types in Python, but simply see if an operation succeeds or fails. In that case, how do I sum a list of (mainly) numbers while omitting the strings?
sum([1,2,3,4,'']) #fails
sum(filter(lambda x: type(x)==int, [1,2,3,4,''])) #bad style
I would do something like this:
a = [1,2,3,4,'']
print sum(x if not isinstance(x,str) else 0 for x in a)
Well, I see two main solutions here:
Pre-processing: filter the input data to prevent occurrences of 'missing data'. This might be quite complex; we can't help you on this point without more information.
Post-processing: filter the result list and remove 'missing data'. This is easy, but it isn't really scalable.
About post-processing, here is a solution using list comprehension, and another using your filter-based approach:
a = [1,2,3,4,'']
filtered_a = [x for x in a if isinstance(x, int)]
filtered_a = filter(lambda x: isinstance(x, int), a)
Then, you can simply do sum(filtered_a)
You could also check for data consistency during processing, and simply not add strings to your list in the first place.
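For example, a minimal sketch of that idea, filtering while summing rather than building a separate filtered list (same list a as above):
a = [1, 2, 3, 4, '']
print sum(x for x in a if isinstance(x, int))   # 10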

python list modification to list of lists

I am trying to learn python (just finished Learn Python the Hard Way book!), but I seem to be struggling a bit with lists. Specifically speaking, I have a list like so:
x = ["/2.ext", "/4.ext", "/5.ext", "/1.ext"]
I would like to operate on the above list, so that it returns a list (somehow!) like so:
y = [ ["/1.ext", "/2.ext"], ["/1.ext", "/2.ext", "/3.ext", "/4.ext"], ["/1.ext", "/2.ext", "/3.ext", "/4.ext", "/5.ext"], ["/1.ext"] ]
So, essentially, each element in x is turned into a list, giving a list of lists. I could probably loop over x, store all the sequence lists in another list and then merge them together, but it seems like there must be a better way to do it.
Would be grateful if someone could point me in the right direction to solve this problem.
EDIT (taking into account Martijn's comments):
Specifically, I want to generate the intermediate filenames in the sequence, ending at the number in each element of x.
You can do it as follows:
x = ["/2.ext", "/4.ext", "/5.ext", "/1.ext"]
print [['/{}.ext'.format(j) for j in range(1,int(i[1])+1)] for i in x]
[OUTPUT]
[['/1.ext', '/2.ext'], ['/1.ext', '/2.ext', '/3.ext', '/4.ext'], ['/1.ext', '/2.ext', '/3.ext', '/4.ext', '/5.ext'], ['/1.ext']]
This only works for numbers up to 9. I'll post an update with a more general solution.
Here is the more general solution, which works for any number:
import re
x = ["/2.ext", "/4.ext", "/5.ext", "/1.ext"]
print [['/{}.ext'.format(j) for j in range(1,int(re.search(r'\d+',i).group(0))+1)] for i in x]
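For example, a quick check with a two-digit number (illustrative input, same pattern as above):
x = ["/12.ext", "/3.ext"]
print [['/{}.ext'.format(j) for j in range(1, int(re.search(r'\d+', i).group(0)) + 1)] for i in x]
# first inner list runs from '/1.ext' through '/12.ext'; second is ['/1.ext', '/2.ext', '/3.ext']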

Pythonic Improvement Of Function In List Comprehension?

Is there a more Pythonic way to write the following code? I would like to do it in one line.
parse_row is a function that can return a tuple of size 3, or None.
parsed_rows = [ parse_row(tr) for tr in tr_els ]
data = [ x for x in parsed_rows if x is not None ]
Doing this in one line won't make it more Pythonic; it will make it less readable. If you really want to, you can always translate it directly by substitution like this:
data = [x for x in [parse_row(tr) for tr in tr_els] if x is not None]
… which can obviously be flattened as Doorknob of Snow shows, but it's still hard to understand. However, he didn't get it quite right: clauses nest from left to right, and you want x to be each parse_row result, not each element of each parse_row result (as Volatility points out), so the flattened version would be:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
I think the fact that a good developer got it backward and 6 people upvoted it before anyone realized the problem, and then I missed a second problem and 7 more people upvoted that before anyone caught it, is pretty solid proof that this is not more pythonic or more readable, just as Doorknob said. :)
In general, when faced with either a nested comp or a comp with multiple for clauses, if it's not immediately obvious what it does, you should translate it into nested for and if statements with an innermost append expression statement, as shown in the tutorial. But if you need to do that with a comprehension you're trying to write, it's a pretty good sign you shouldn't be trying to write it…
However, there is a way to make this more Pythonic, and also more efficient: change the first list comprehension to a generator expression, like this:
parsed_rows = (parse_row(tr) for tr in tr_els)
data = [x for x in parsed_rows if x is not None]
All I did is change the square brackets to parentheses, and that's enough to compute the first one lazily, calling parse_row on each tr as needed, instead of calling it on all of the rows, and building up a list in memory that you don't actually need, before you even get started on the real work.
In fact, if the only reason you need data is to iterate over it once (or to convert it into some other form, like a CSV file or a NumPy array), you can make that a generator expression as well.
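For example, that might look like this (same names as above, shown as a sketch for the case where data is only consumed once):
parsed_rows = (parse_row(tr) for tr in tr_els)
data = (x for x in parsed_rows if x is not None)   # also lazy; iterate it once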
Or, even better, replace the list comprehension with a map call. When your expression is just "call this function on each element", map is generally more readable (whereas when you have to write a new function, especially with lambda, just to wrap up some more complex expression, it's usually not). So:
parsed_rows = map(parse_row, tr_els)
data = [x for x in parsed_rows if x is not None]
And now it actually is readable to sub in:
data = [x for x in map(parse_row, tr_els) if x is not None]
You could similarly turn the second comprehension into a filter call. However, just as with map, if the predicate isn't just "call this function and see if it returns something truthy", it usually ends up being less readable. In this case:
data = filter(lambda x: x is not None, map(parse_row, tr_els))
But notice that you really don't need to check is not None in the first place. The only non-None values you have are 3-tuples, which are always truthy. So, you can replace the if x is not None with if x, which simplifies your comprehension:
data = [x for x in map(parse_row, tr_els) if x]
… and which can be written in two different ways with filter:
data = filter(bool, map(parse_row, tr_els))
data = filter(None, map(parse_row, tr_els))
Asking which of those two is better will start a religious war on any of the Python lists, so I'll just present them both and let you decide.
Note that if you're using Python 2.x, map is not lazy; it will generate the whole intermediate list. So, if you want to get the best of both worlds and can't use Python 3, use itertools.imap instead of map. And in the same way, in 3.x, filter is lazy, so if you want a list, use list(filter(…)).
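For example, a Python 2 version of the fully lazy pipeline might look like this (a sketch; imap and ifilter are the lazy itertools counterparts of map and filter):
from itertools import ifilter, imap
data = list(ifilter(None, imap(parse_row, tr_els)))   # force a list at the end if you need one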
You can nest one in the other:
data = [x for tr in tr_els for x in parse_row(tr) if x is not None]
(Also, @Volatility points out that this will give an error if parse_row(tr) is None, which can be solved like this:
data = [x for tr in tr_els for x in (parse_row(tr),) if x is not None]
)
However, in my opinion this is much less readable. Shorter is not always better.

Check if string in strings

I have a huge list containing many strings like:
['xxxx','xx','xy','yy','x',......]
Now I am looking for an efficient way to remove all strings that are present within another string. For example, 'xx' and 'x' fit in 'xxxx'.
As the dataset is huge, I was wondering if there is an efficient method for this besides
if a in b:
The complete code, which could maybe use some optimization:
for x in range(len(taxlistcomplete)):
    if delete == True:
        x = x - 1
        delete = False
    for y in range(len(taxlistcomplete)):
        if taxlistcomplete[x] in taxlistcomplete[y]:
            if x != y:
                print x, y
                print taxlistcomplete[x]
                del taxlistcomplete[x]
                delete = True
                break
print x, len(taxlistcomplete)
An updated version of the code:
for x in enumerate(taxlistcomplete):
    if delete == True:
        # If an element is removed, I need to step 1 back and continue looping.....
        delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                print x[1], y[1]
                print taxlistcomplete[x[0]]
                del taxlistcomplete[x[0]]
                delete = True
                break
print x, len(taxlistcomplete)
Now implemented with enumerate; I am wondering whether this is more efficient, and how to implement the delete step so that I have less to search through as well.
Just a short thought...
Basically what I would like to see...
If an element does not match any other element in the list, write it to a file.
Thus if 'xxxxx' is not in 'xx', 'xy', 'wfirfj', etc., print/save it.
A new, simple version, as I don't think I can optimize it much further anyway...
print 'comparison'
file = open('output.txt', 'a')
for x in enumerate(taxlistcomplete):
    delete = False
    for y in enumerate(taxlistcomplete):
        if x[1] in y[1]:
            if x[1] != y[1]:
                taxlistcomplete[x[0]] = ''
                delete = True
                break
    if delete == False:
        file.write(str(x))
x in <string> is fast, but checking each string against all other strings in the list will take O(n^2) time. Instead of shaving a few cycles by optimizing the comparison, you can achieve huge savings by using a different data structure so that you can check each string in just one lookup: For two thousand strings, that's two thousand checks instead of four million.
There's a data structure called a "prefix tree" (or trie) that allows you to very quickly check whether a string is a prefix of some string you've seen before. Google it. Since you're also interested in strings that occur in the middle of another string x, index all substrings of the form x, x[1:], x[2:], x[3:], etc. (So: only n substrings for a string of length n). That is, you index substrings that start in position 0, 1, 2, etc. and continue to the end of the string. That way you can just check if a new string is an initial part of something in your index.
You can then solve your problem in O(n) time like this:
Order your strings in order of decreasing length. This ensures that no string could be a substring of something you haven't seen yet. Since you only care about length, you can do a bucket sort in O(n) time.
Start with an empty prefix tree and loop over your ordered list of strings. For each string x, use your prefix tree to check whether it is a substring of a string you've seen before. If not, add its substrings x, x[1:], x[2:] etc. to the prefix tree.
Deleting in the middle of a long list is very expensive, so you'll get a further speedup if you collect the strings you want to keep into a new list (the actual string is not copied, just the reference). When you're done, delete the original list and the prefix tree.
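A minimal sketch of that idea (illustrative only, plain Python 2, using nested dicts as the trie; build_filter, add_suffixes and is_substring_of_kept are names made up for this sketch):
def build_filter(strings):
    trie = {}  # nested dicts: each level maps one character to the next level

    def add_suffixes(s):
        # index s, s[1:], s[2:], ... so every substring of s is a prefix of some entry
        for start in range(len(s)):
            node = trie
            for ch in s[start:]:
                node = node.setdefault(ch, {})

    def is_substring_of_kept(s):
        # s is a substring of a kept string iff we can walk all of s from the root
        node = trie
        for ch in s:
            if ch not in node:
                return False
            node = node[ch]
        return True

    kept = []
    for s in sorted(strings, key=len, reverse=True):  # longest first
        if not is_substring_of_kept(s):
            kept.append(s)
            add_suffixes(s)
    return kept

print build_filter(['xxxx', 'xx', 'xy', 'yy', 'x'])   # ['xxxx', 'xy', 'yy']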
If that's too complicated for you, at least don't compare everything with everything. Sort your strings by size (in decreasing order), and only check each string against the ones that have come before it. This will give you a 50% speedup with very little effort. And do make a new list (or write to a file immediately) instead of deleting in place.
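And a rough sketch of that simpler fallback (assuming the result order doesn't matter, and collecting survivors into a new list kept instead of deleting in place):
kept = []
for s in sorted(taxlistcomplete, key=len, reverse=True):   # longest first
    if not any(s in longer for longer in kept):
        kept.append(s)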
Here is a simple approach, assuming you can identify a character (I will use '$' in my example) that is guaranteed not to be in any of the original strings:
result = ''
for substring in taxlistcomplete:
if substring not in result: result += '$' + substring
taxlistcomplete = result.split('$')[1:]  # [1:] drops the empty string created by the leading '$'
This leverages Python's internal optimizations for substring searching by just making one big string to substring-search :)
Here is my suggestion. First I sort the elements by length, because the shorter a string is, the more likely it is to be a substring of another string. Then I run through the list with two for loops and remove every element el that is a substring of some other element. Note that the outer loop passes each element only once.
By sorting the list first, we destroy the order of elements in the list. So if the order is important, you can't use this solution.
Edit: I assume there are no identical elements in the list, so that when el == el2, it's the same element.
a = ["xyy", "xx", "zy", "yy", "x"]
a.sort(key=len)
for el in a[:]:                 # iterate over a copy so removals don't disturb the loop
    for el2 in a:
        if el in el2 and el != el2:
            a.remove(el)        # el is contained in el2, so drop el
            break
Using a list comprehension -- note the in -- is the fastest and most Pythonic way of solving your problem:
[element for element in arr if 'xx' in element]
