In this example:
sorted_data = [files.data[ind] for ind in sort_inds]
May someone please provide an explanation as to how the expression behind the for loop is related or how it is working, thanks.
It's called a List Comprehension
In other words
sorted_data = [files.data[ind] for ind in sort_inds]
is equivalent to:
sorted_data = []
for ind in sort_inds:
    sorted_data.append(files.data[ind])
It's just a lot more readable using the comprehension
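To make the equivalence concrete, here is a minimal sketch. The question does not show the `files` object, so the `Files` class, its contents, and the `sort_inds` values below are made up for illustration.

```python
# Hypothetical stand-in for the `files` object from the question.
class Files:
    def __init__(self, data):
        self.data = data


files = Files(data=["c.txt", "a.txt", "b.txt"])
sort_inds = [1, 2, 0]  # assumed: indices that put `data` in sorted order

# List comprehension form.
sorted_data = [files.data[ind] for ind in sort_inds]

# Equivalent explicit loop.
sorted_data_loop = []
for ind in sort_inds:
    sorted_data_loop.append(files.data[ind])

print(sorted_data)       # ['a.txt', 'b.txt', 'c.txt']
print(sorted_data_loop)  # ['a.txt', 'b.txt', 'c.txt']
```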
OK, so here is a simple example.
Say I have a list of ints:
nums = [1,2,3]
and I do this:
[i**2 for i in nums]
It will output:
[1, 4, 9]
This is equivalent to:
squares = []
for i in nums:
    squares.append(i**2)
because it iterates through the list and squares each item.
Another example:
Say I have a list of strings like this:
list1 = ['hey, jim','hey, pam', 'hey dwight']
and I do this:
[phrase.split(',') for phrase in list1]
This will output this list:
[['hey', ' jim'], ['hey', ' pam'], ['hey dwight']]
This is equivalent to:
result = []
for phrase in list1:
    new_phrase = phrase.split(',')
    result.append(new_phrase)
It goes through the list and calls split on each item, collecting the resulting lists.
It's basically a compact for loop: instead of calling append() yourself, the comprehension builds the list directly. It is often more readable and takes fewer lines.
It means: for every item (here called ind) in sort_inds, use it as an index into files.data, that is, evaluate files.data[ind].
The results are collected in a list (sorted_data).
I'm trying to do an exercise where I have a list:
list_1 = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
And I have to make a second list out of it that looks like this:
list_2 = [['chocolate', 1.20], ['book', 5.50], ['hat', 3.25]]
In the second list the numbers have to be floats and without the ' '
So far I've come up with this code:
list_2 = []
for item in list_1:
    list_2.append(item.split(';'))
The output looks about right:
[['chocolate', '1.20'], ['book', '5.50'], ['hat', '3.25']]
But how do I convert those numbers into floats and remove the double quotes?
I tried:
for item in list_2:
    if(item.isdigit()):
        item = float(item)
Getting:
AttributeError: 'list' object has no attribute 'isdigit'
list_1 = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
list_2 = [x.split(';') for x in list_1]
list_3 = [[x[0], float(x[1])] for x in list_2]
item is a list like ['chocolate', '1.20']. You should be calling isdigit() on item[1], not item. But isdigit() isn't true when the string contains ., so that won't work anyway.
Put the split string in a variable, then call float() on the second element.
list_2 = []
for item in list_1:
    words = item.split(';')
    words[1] = float(words[1])
    list_2.append(words)
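As a small illustration of the point about isdigit(), the sketch below shows why it rejects '1.20' and how trying float() directly can be used instead. The helper name `to_float_or_none` is invented for this example.

```python
def to_float_or_none(text):
    """Return float(text), or None if text is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


print("120".isdigit())           # True
print("1.20".isdigit())          # False, because '.' is not a digit
print(to_float_or_none("1.20"))  # 1.2
print(to_float_or_none("hat"))   # None
```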
I don't know if this is helpful for you, but I think using functions is better than just using a simple for loop.
Just try it:
def list_map(string_val,float_val):
    return [string_val,float_val]
def string_spliter(list_1):
    string_form=[]
    float_form=[]
    for string in list_1:
        str_val,float_val=string.split(";")
        string_form.append(str_val)
        float_form.append(float_val)
    return string_form,float_form
list_1 = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
string_form,float_form=string_spliter(list_1)
float_form=list(map(float,float_form))
output=list(map(list_map,string_form,float_form))
print(output)
Your way of creating list_2 is fine. To then make your new list, you can use final_list = [[i[0], float(i[1])] for i in list_2]
You could also do it in the for loop like this:
list_2 = []
for item in list_1:
    split_item = item.split(';')
    list_2.append([split_item[0], float(split_item[1])])
This can be achieved in two lines of code using list comprehensions.
list_1 = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
list_2 = [[a, float(b)] for x in list_1 for a, b in [x.split(';', 1)]]
The second "dimension" of the list comprehension iterates over a list containing a single sublist. This lets us essentially save the result of splitting each item and then bind those two items to a and b, which is cleaner than having to specify indexes.
Note: by calling split with a second argument of 1 we ensure the string is only split at most once.
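To show what the second argument to split changes, here is a short sketch. The record with an extra ';' is made up; rsplit is shown as an alternative that may suit records where the price always comes last.

```python
record = "fish;chips;4.50"  # made-up record containing an extra ';'

all_parts = record.split(";")       # splits at every ';'
first_split = record.split(";", 1)  # splits only at the first ';'
last_split = record.rsplit(";", 1)  # splits only at the last ';'

print(all_parts)    # ['fish', 'chips', '4.50']
print(first_split)  # ['fish', 'chips;4.50']
print(last_split)   # ['fish;chips', '4.50']

# With exactly two parts, unpacking into two names is safe.
name, price = last_split
print([name, float(price)])  # ['fish;chips', 4.5]
```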
You can use a function map to convert each value.
def modify_element(el):
    name, value = el.split(';')
    return [name, float(value)]
list_1 = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
result = list(map(modify_element, list_1))
For a problem like this, you can unpack the result of split into two variables, then append a list of both values, calling the built-in float function on the second one.
a_list = ['chocolate;1.20', 'book;5.50', 'hat;3.25']
array = []
for i in a_list:
    string, number = i.split(";")
    array.append([string, float(number)])
print(array)
I have a list containing integers like this (not in order):
list1 = [2,1,3]
I have a second list like this:
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
These lists come from fasta files. The items in list2 always start with "Contig_", but may not always be in sorted order. I'd like to return a list like this:
list3 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400']
list3 contains only the contigs whose number appears in list1.
How can I do this in Python?
Thank you very much!
You can create a dictionary from the second list for an O(n) (linear) solution:
import re
list1 = [2,1,3]
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
new_result = {int(re.findall(r'(?<=^Contig_)\d+', i)[0]):i for i in list2}
final_result = [new_result[i] for i in list1]
Output:
['Contig_2_Length_500', 'Contig_1_Length_1000', 'Contig_3_Length_400']
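A variation on the same dictionary idea, sketched without a regular expression. It assumes every name has the form 'Contig_<number>_Length_<length>', and the extra 9 in list1 is added here only to show that numbers with no matching contig are skipped.

```python
list1 = [2, 1, 3, 9]  # 9 has no matching contig (added for illustration)
list2 = ['Contig_1_Length_1000', 'Contig_2_Length_500',
         'Contig_3_Length_400', 'Contig_4_Length_300']

# Map contig number -> full name; the number is the second '_'-separated field.
by_number = {int(name.split('_')[1]): name for name in list2}

# Keep the order of list1 and skip numbers that have no contig.
selected = [by_number[n] for n in list1 if n in by_number]
print(selected)
# ['Contig_2_Length_500', 'Contig_1_Length_1000', 'Contig_3_Length_400']
```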
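You can use a list comprehension like this (it assumes list1 holds strings such as 'Contig_1'; with the integers shown in the question, j in i raises a TypeError):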
list3 = [i for i in list2 if any(j in i for j in list1)]
You can use startswith, which accepts a tuple of several prefixes and scans for them efficiently (again, list1 must contain strings, because startswith rejects a tuple of integers):
[i for i in list2 if i.startswith(tuple(list1))]
['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400']
A pretty simple list comprehension like:
list1 = ['Contig_1','Contig_2','Contig_3']
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
list3 = [s for s in list2 for k in list1 if k in s]
print(list3)
gives an output of:
['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400']
You'll have to iterate over the two input lists, and see for each combination whether there's a match. One way to do this is
[list2_item for list2_item in list2 if any([list1_item in list2_item for list1_item in list1])]
I tried Ajax1234's method of using re, blhsing's code, which is nearly the same as mine except that it uses a generator rather than a list (and has more opaque variable names), jeremycg's startswith method, and bilbo_strikes_back's zip method. The zip method was by far the fastest, but it just takes the first three elements of list2 without regard for the contents of list1, so we might as well do list3 = list2[:3], which was even faster. Ajax1234's method took about twice as long as blhsing's, which took slightly longer than mine. jeremycg's took slightly more than half as much time, but keep in mind that it assumes the substring is at the beginning.
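For anyone who wants to repeat such a comparison, here is a rough timeit sketch for three of the approaches. The function names are invented for this example, list1 is assumed to hold string prefixes, and the timings depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit

list1 = ['Contig_1', 'Contig_2', 'Contig_3']
list2 = ['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400',
         'Contig_4_Length_300', 'Contig_5_Length_200', 'Contig_6_Length_100']


def with_generator():
    # any() over a generator expression
    return [s for s in list2 if any(k in s for k in list1)]


def with_list():
    # any() over a fully built list
    return [s for s in list2 if any([k in s for k in list1])]


def with_startswith():
    # only matches when the substring is at the start of the string
    prefixes = tuple(list1)
    return [s for s in list2 if s.startswith(prefixes)]


for func in (with_generator, with_list, with_startswith):
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__}: {seconds:.4f} s")
```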
Try zip:
list1 = ['Contig_1','Contig_2','Contig_3']
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
list3 = [x[1] for x in zip(list1, list2)]
print(list3)
I want to check if strings in a list of strings contain a certain substring. If they do I want to save that list item to a new list:
list = ["Maurice is smart","Maurice is dumb","pie","carrots"]
I have tried using the following code:
new_list = [s for s in list if 'Maurice' in list]
but this just replicates the list if one of its items is 'Maurice'.
So I was wondering if, maybe, there was a way to solve this by using the following syntax:
if "Maurice" in list:
# Code that saves all list items containing the substring "Maurice" to a new list
Result should then be:
new_list = ["Maurice is smart", "Maurice is dumb"]
I've been looking for a way to do this, but I cannot find anything.
You could do this:
list = ["Maurice is smart","Maurice is dumb","pie","carrots"]
new_list = [x for x in list if "Maurice" in x]
print(new_list)
Output:
['Maurice is smart', 'Maurice is dumb']
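The difference between the two conditions can be seen in a short sketch. The variable is called `items` here rather than `list`, to avoid shadowing the built-in.

```python
items = ["Maurice is smart", "Maurice is dumb", "pie", "carrots"]

# `in` on a list compares whole elements, and no element equals "Maurice".
whole_list_test = "Maurice" in items
print(whole_list_test)  # False

# This condition never looks at s, so the result is all-or-nothing.
all_or_nothing = [s for s in items if "Maurice" in items]
print(all_or_nothing)  # []

# `in` on a string is a substring test, applied to each element in turn.
matches = [s for s in items if "Maurice" in s]
print(matches)  # ['Maurice is smart', 'Maurice is dumb']
```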
You could use Python's builtin filter:
data = ["Maurice is smart", "Maurice is dumb", "pie", "carrots"]
res = list(filter(lambda s: 'Maurice' in s, data))  # list() is needed in Python 3, where filter returns an iterator
print(res)
Output:
['Maurice is smart', 'Maurice is dumb']
The first argument is a predicate function (a simple lambda here) which must evaluate to True for the element of the iterable to be considered as a match.
filter is useful whenever an iterable must be filtered based on a predicate.
Also, a little extra, imagine now this data to be filtered:
data = ["Maurice is smart","Maurice is dumb","pie","carrots", "maurice in bikini"]
res = list(filter(lambda s: 'maurice' in s.lower(), data))
print(res)
Output:
['Maurice is smart', 'Maurice is dumb', 'maurice in bikini']
You can use a list comprehension.
Also, make sure not to use the built in list as variable name.
my_list = ["Maurice is smart", "Maurice is dumb", "pie", "carrots"]
[e for e in my_list if 'Maurice' in e]
I'm trying to figure out how to delete duplicates from 2D list. Let's say for example:
x= [[1,2], [3,2]]
I want the result:
[1, 2, 3]
in this order.
Actually I don't understand why my code doesn't do that :
def removeDuplicates(listNumbers):
    finalList=[]
    finalList=[number for numbers in listNumbers for number in numbers if number not in finalList]
    return finalList
If I write it in nested for-loop form, it looks the same to me:
def removeDuplicates(listNumbers):
    finalList=[]
    for numbers in listNumbers:
        for number in numbers:
            if number not in finalList:
                finalList.append(number)
    return finalList
The "problem" is that this second version runs perfectly, while the comprehension does not. Also, order is important. Thanks
In your list comprehension, finalList is still the empty list the whole time the comprehension runs, even though you expect items to be appended to it along the way. That is not the case in the second version (the double for loop), which really does append on every iteration.
What I would do instead, is use set:
>>> set(i for sub_l in x for i in sub_l)
{1, 2, 3}
EDIT:
Otherwise, if order matters, and staying close to your attempt:
>>> final_list = []
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> list(filter(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[] # useless list, thrown away, and it consumes memory
>>> final_list
[1, 2, 3]
Or
>>> list(map(lambda x: final_list.append(x) if x not in final_list else None, x_flat))
[None, None, None, None] # useless list, thrown away, and it consumes memory
>>> final_list
[1, 2, 3]
EDIT2:
As mentioned by timgeb, map and filter build lists here that are useless in the end and, worse, consume memory. So I would go with the nested for loop from your last code example, but if you want to flatten with a list comprehension first, then:
>>> x_flat = [i for sub_l in x for i in sub_l]
>>> final_list = []
>>> for number in x_flat:
...     if number not in final_list:
...         final_list.append(number)
The expression on the right-hand side is evaluated first, before the result of the list comprehension is assigned to finalList.
Whereas in your second approach you write to this list all the time between the iterations. That's the difference.
That may be similar to the considerations why the manuals warn about unexpected behaviour when writing to the iterated iterable inside a for loop.
You could use the built-in set() to remove duplicates (you have to flatten your list first), although a set does not preserve the original order.
You declare finalList as the empty list first, so
if number not in finalList
will be True all the time.
The right hand side of your comprehension will be evaluated before the assignment takes place.
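A quick sketch of this evaluation order, using the data from the question (the snake_case names are just for this example):

```python
x = [[1, 2], [3, 2]]

final_list = []
# While the comprehension runs, the name final_list still refers to the
# empty list above, so `n not in final_list` is True for every n.
final_list = [n for sub in x for n in sub if n not in final_list]

print(final_list)  # [1, 2, 3, 2], so the duplicate 2 is still there
```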
Iterate over the iterator chain.from_iterable gives you and remove duplicates in the usual way:
>>> from itertools import chain
>>> x=[[1,2],[3,2]]
>>>
>>> seen = set()
>>> result = []
>>> for item in chain.from_iterable(x):
...     if item not in seen:
...         result.append(item)
...         seen.add(item)
...
>>> result
[1, 2, 3]
Further reading: How do you remove duplicates from a list in Python whilst preserving order?
edit:
You don't need the import to flatten the list, you could just use the generator
(item for sublist in x for item in sublist)
instead of chain.from_iterable(x).
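A shorter alternative, offered as a sketch rather than as part of the answer above: dict.fromkeys keeps the first occurrence of each key, and since Python 3.7 dictionaries are guaranteed to preserve insertion order. Like the set-based version, it assumes the items are hashable.

```python
from itertools import chain

x = [[1, 2], [3, 2]]

# Keys of a dict are unique and keep insertion order (Python 3.7+).
result = list(dict.fromkeys(chain.from_iterable(x)))
print(result)  # [1, 2, 3]
```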
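There is no way in Python to refer to the list being built by the current comprehension. In fact, if you remove the line finalList=[], which only defines the name, you would get a NameError.
You can do it in two steps (note that converting to a set does not preserve the original order):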
finalList = [number for numbers in listNumbers for number in numbers]
finalList = list(set(finalList))
or if you want a one-liner:
finalList = list(set(number for numbers in listNumbers for number in numbers))
I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].
Stackoverflow was helpful as ever and I found a post with this ingenious list comprehension:
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
I thought I understood how list comprehensions work, but apparently I haven't got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn't give the same result):
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Can someone explain how Python interprets these things? Based on the second comprehension, I would expect that Python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because 'sublist' does not exist. My mind is completely warped, help!
Let's take a look at your list comprehension then, but first let's start with a list comprehension at its simplest.
l = [1,2,3,4,5]
print([x for x in l]) # prints [1, 2, 3, 4, 5]
You can look at this the same as a for loop structured like so:
for x in l:
    print(x)
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a) # prints [2, 4]
That is exactly the same as this:
a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a) # prints [2, 4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l) # prints [1, 2, 3, 4, 5, 6]
For a list comprehension, start at the leftmost for loop and work your way to the right. The expression at the front (item, in this case) is what gets added. It produces this equivalent:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
Now for the last one
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge we can create a for loop and see how it would behave:
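exactly_the_same_as_l = []
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)
Now, the only reason the above ran for you is that sublist already existed: in Python 2, the loop variables of a list comprehension leak into the enclosing scope, so building flattened_l also left sublist defined. If you ran it without defining flattened_l first, you would get a NameError. In Python 3, comprehension variables no longer leak, so the error appears either way unless sublist was defined some other way.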
The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.
The second list comprehension you wrote will raise a NameError, as 'sublist' has not yet been defined. You can see this by writing the list comprehension as a for loop:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
    for sublist in l:
        flattened_l.append(item)
The only reason you didn't see the error when you ran your code was because you had previously defined sublist when implementing your first list comprehension.
For more information, you may want to check out Guido's tutorial on list comprehensions.
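The NameError can be reproduced in isolation with a small sketch. The function name `wrong_order` is made up; putting the comprehension inside a function makes sure no earlier `sublist` variable is lying around in the same scope.

```python
def wrong_order(nested):
    # The leftmost `for` clause runs first, so `sublist` is looked up
    # before the second clause has had any chance to bind it.
    return [item for item in sublist for sublist in nested]


error_message = None
try:
    wrong_order([[1, 2, 3], [4, 5, 6]])
except NameError as exc:
    error_message = str(exc)

print("NameError:", error_message)
```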
For the lazy dev that wants a quick answer:
>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]
While this approach definitely works for flattening lists, I wouldn't recommend it unless your sublists are known to be very small (1 or 2 elements each).
I've done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…
def flatten(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened
While it's not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
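A rough way to check this on your own machine is sketched below. The data shape (100 sublists of 1000 items) and the function names are arbitrary choices for this example, and the ratio you see may differ from the 2-3x reported above depending on the Python version and sublist sizes.

```python
import timeit


def flatten_comprehension(nested):
    return [item for sublist in nested for item in sublist]


def flatten_extend(nested):
    flattened = []
    for sublist in nested:
        flattened.extend(sublist)
    return flattened


# Made-up test data: 100 sublists of 1000 integers each.
nested = [list(range(1000)) for _ in range(100)]

for func in (flatten_comprehension, flatten_extend):
    seconds = timeit.timeit(lambda: func(nested), number=100)
    print(f"{func.__name__}: {seconds:.4f} s")
```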
One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you'll be sorting it later, for example. If you're ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…
def flatten(l):
    return (item for sublist in l for item in sublist) # note the parens
Note, of course, that this sort of comprehension will only "flatten" a list of lists (or a list of other iterables). Also, if you pass it a list of strings, you'll "flatten" it into a list of characters.
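A two-line sketch of the string case (the sample words are made up):

```python
words = ["ab", "cd"]

# Each string is itself iterable, so the inner loop yields its characters.
chars = [ch for word in words for ch in word]
print(chars)  # ['a', 'b', 'c', 'd']
```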
To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let's start with a simple function:
import collections.abc
def non_str_seq(p):
    '''p is putatively a sequence and not a string nor bytearray'''
    return isinstance(p, collections.abc.Iterable) and not (isinstance(p, str) or isinstance(p, bytearray))
Using that, we can then build a recursive function to flatten any nested sequence:
def flatten(s):
    '''Recursively flatten any sequence of objects
    '''
    results = list()
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results
There are probably more elegant ways to do this, but this works for all the Python built-in types that I know of. Simple objects (numbers, strings, None, True, False) are all returned wrapped in a list. Dictionaries are returned as lists of keys (in the dictionary's iteration order, which is insertion order on Python 3.7+).
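A short usage sketch of the recursive approach. The two functions are repeated in compact form so that the example runs on its own, using collections.abc.Iterable as the iterable check; the sample input is made up.

```python
from collections.abc import Iterable


def non_str_seq(p):
    """True for iterables other than str and bytearray."""
    return isinstance(p, Iterable) and not isinstance(p, (str, bytearray))


def flatten(s):
    """Recursively flatten any nesting of non-string iterables."""
    results = []
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results


print(flatten([1, [2, [3, "ab"]], {"k": 1}]))  # [1, 2, 3, 'ab', 'k']
print(flatten(5))                              # [5]
```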