Optimal and efficient way to examine python lists

I was asked this on an interview this past week, and I didn't have the answer (the correct answer anyways). Say for instance you have list A which has the following elements [1,3,5,7,9,10] and then you have list B, which has the following elements: [3,4,5,6,7], and you want to know which elements in list B are in list A. My answer was:
listC = []
for item in listA:
    for item1 in listB:
        if item1 == item:
            listC.append(item1)  # put item1 in a third list
But I know this is bad, because say listA is one million elements, and listB is a hundred thousand, this solution is just rubbish.
What's the best way to achieve something like this without iteration of both lists?

set(listA) & set(listB) is simplest.

I'd suggest converting them both to sets and doing an intersection:
setA = set(listA)
setB = set(listB)
setA.intersection(setB)
Edit: Note that this will remove any duplicate elements that were in both lists. So if we had listA = [1,1,2,2,3] and listB = [1,1,2,3] then the intersection will only be set([1,2,3]). Also, for a worst-case estimate, this will be as slow as the list comprehension - O(n * m), where n and m are the respective lengths of the lists. Average case is a far better O(n) + O(m) + O(min(m,n)) == O(max(m,n)), however.
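If the order of listB or duplicate elements matter, a set can still do the fast lookups while a list holds the result. The following is a minimal sketch rather than part of the original answers. The function names common_in_order and common_with_duplicates are made up for illustration, and the Counter variant assumes that "keeping duplicates" means keeping each value as many times as it appears in both lists.

```python
from collections import Counter

def common_in_order(list_a, list_b):
    """Elements of list_b that also occur in list_a, in list_b's order."""
    lookup = set(list_a)  # built once; membership tests are O(1) on average
    return [x for x in list_b if x in lookup]

def common_with_duplicates(list_a, list_b):
    """Multiset intersection: each value kept min(count in a, count in b) times."""
    shared = Counter(list_a) & Counter(list_b)
    return sorted(shared.elements())

print(common_in_order([1, 3, 5, 7, 9, 10], [3, 4, 5, 6, 7]))  # [3, 5, 7]
print(common_with_duplicates([1, 1, 2, 2, 3], [1, 1, 2, 3]))  # [1, 1, 2, 3]
```

Only one list is turned into a set here, so the second list is walked exactly once and its order is preserved.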

Well, I may as well throw filter into the mix (in Python 3 it returns an iterator, so wrap it in list() if you need a list):
filter(lambda x: x in listb, lista)

Using list comprehension and using the in operator to test for membership:
[i for i in lista if i in listb]
would yield:
[3, 5, 7]
Alternatively, one could use set operations and see what the intersection of both lists (converted to sets) would be.

You can use sets (preferred):
listC = list(set(listA) & set(listB))
Or a list comprehension:
listC = [i for i in listA if i in listB]

Related

extract strings from a list based on another string in python

I have a list containing integers like this (not in order):
list1 = [2,1,3]
I have a second list like this:
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
These lists are from FASTA files. The entries in list2 always start with "Contig_", but may not always be in sorted order. I'd like to return a list like this:
list3 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400']
list3 contains only the contigs whose number appears in list1.
How can I do this in Python?
Thank you very much!

You can create a dictionary from the second list for an O(n) (linear) solution:
import re
list1 = [2,1,3]
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
new_result = {int(re.findall(r'(?<=^Contig_)\d+', i)[0]):i for i in list2}
final_result = [new_result[i] for i in list1]
Output:
['Contig_2_Length_500', 'Contig_1_Length_1000', 'Contig_3_Length_400']
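The dictionary lookup above returns the contigs in the order of list1. If the result should follow the order of list2 instead, as in the question's expected list3, one possible sketch is shown below. The helper name select_contigs is hypothetical, and the code assumes every name has the form Contig_<number>_Length_<number>.

```python
def select_contigs(numbers, names):
    """Return the names whose contig number is in `numbers`, in the order of `names`."""
    wanted = set(numbers)
    selected = []
    for name in names:
        # 'Contig_12_Length_300' -> ['Contig', '12', 'Length', '300']
        contig_number = int(name.split('_')[1])
        if contig_number in wanted:
            selected.append(name)
    return selected

list1 = [2, 1, 3]
list2 = ['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400',
         'Contig_4_Length_300', 'Contig_5_Length_200', 'Contig_6_Length_100']
print(select_contigs(list1, list2))
# ['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400']
```

Comparing the parsed integer rather than a substring avoids accidental matches such as 1 matching Contig_10.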

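You can use a list comprehension like this (it assumes list1 holds strings such as 'Contig_1' rather than integers, because j in i needs both operands to be strings):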
list3 = [i for i in list2 if any(j in i for j in list1)]

You can use startswith - it takes a tuple of multiple starting strings to scan efficiently (so list1 must hold string prefixes such as 'Contig_1', not integers):
[i for i in list2 if i.startswith(tuple(list1))]
['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400']

A pretty simple list comprehension like:
list1 = ['Contig_1','Contig_2','Contig_3']
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
list3 = [s for s in list2 for k in list1 if k in s]
print(list3)
gives an output of:
['Contig_1_Length_1000', 'Contig_2_Length_500', 'Contig_3_Length_400']

You'll have to iterate over the two input lists, and see for each combination whether there's a match. One way to do this is
[list2_item for list2_item in list2 if any([list1_item in list2_item for list1_item in list1])]
I tried Ajax1234's method of using re, blhsing's code, which is close to the same as mine except that it uses a generator rather than a list (and has more opaque variable names), jeremycg's method of startswith, and bilbo_strikes_back's method of zip. The zip method was by far the fastest, but it just takes the first three elements of list2 without regard for the contents of list1, so we might as well do list3 = list2[:3], which was even faster. Ajax1234's method took about twice as long as blhsing's, which took slightly longer than mine. jeremycg's took slightly more than half as much time, but keep in mind that it assumes that the substring will be at the beginning.

Try zip and slicing:
list1 = ['Contig_1','Contig_2','Contig_3']
list2 = ['Contig_1_Length_1000','Contig_2_Length_500','Contig_3_Length_400','Contig_4_Length_300','Contig_5_Length_200','Contig_6_Length_100']
list3 = [x[1] for x in zip(list1, list2)]
print(list3)

Summing 2nd list items in a list of lists of lists

My data is a list of lists of lists of varying size:
data = [[[1, 3],[2, 5],[3, 7]],[[1,11],[2,15]],.....]]]
What I want to do is return a list of lists with the values of the 2nd element of each list of lists summed - so, 3+5+7 is a list, so is 11+15, etc:
newdata = [[15],[26],...]
Or even just a list of the sums would be fine as I can take it from there:
newdata2 = [15,26,...]
I've tried accessing the items in the list through different forms and structures of list comprehensions, but I can't seem to get it into the format I want.

Try this one-line approach using list comprehension:
[sum([x[1] for x in i]) for i in data]
Output:
data = [[[1, 3],[2, 5],[3, 7]],[[1,11],[2,15]]]
[sum([x[1] for x in i]) for i in data]
Out[19]: [15, 26]
If you want the output to be a list of list, then use
[[sum([x[1] for x in i])] for i in data]
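The same idea can be written with a generator expression, which avoids building a temporary list for each sum. This is only a sketch: the function name sum_second_items is made up, and the tuple unpacking assumes every innermost list has exactly two items.

```python
def sum_second_items(data):
    """Sum the second element of every pair in each group."""
    return [sum(second for _first, second in group) for group in data]

data = [[[1, 3], [2, 5], [3, 7]], [[1, 11], [2, 15]]]
print(sum_second_items(data))                 # [15, 26]
print([[s] for s in sum_second_items(data)])  # [[15], [26]]
```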

@mathmax and @BrendanAbel have offered Pythonic and performant solutions. However, I want to throw this transpositional approach into the ring for the sake of its brevity and trickiness (in Python 3, zip returns an iterator, so it has to be wrapped in list() before indexing):
[sum(list(zip(*x))[1]) for x in data]

Something like this is pretty short and concise, using list comprehensions and map (I try not to use nested comprehensions if I can avoid it)
import operator
f = operator.itemgetter(1)
[sum(map(f, x)) for x in data]

I need to make two lists the same

I have two quite long lists and I know that all of the elements of the shorter are contained in the longer, yet I need to isolate the elements in the longer list which are not in the shorter so that I can remove them individually from the dictionary I got the longer list from.
What I have so far is:
for e in range(len(lst_ck)):
    if lst_ck[e] not in lst_rk:
        del currs[lst_ck[e]]
        del lst_ck[e]
lst_ck is the longer list and lst_rk is the shorter, currs is the dictionary from which came lst_ck. If it helps, they are both lists of 3 digit keys from dictionaries.

Use sets to find the difference:
l1 = [1,2,3,4]
l2 = [1,2,3,4,6,7,8]
print(set(l2).difference(l1))
set([6, 7, 8]) # in l2 but not in l1
Then remove the elements.
diff = set(l2).difference(l1)
your_list[:] = [ele for ele in your_list if ele not in diff]
If your lists are very big you may prefer a generator expression:
your_list[:] = (ele for ele in your_list if ele not in diff)

If you don't care about multiple occurrences of the same item, use set.
diff = set(lst_ck) - set(lst_rk)
If you care, try this:
diff = [e for e in lst_ck if e not in lst_rk]
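Since the goal in the question is to delete the extra keys from the dictionary itself, the difference can be applied to the dictionary directly. The sketch below uses a hypothetical helper, prune_to_keys, and made-up sample data. The names currs and lst_rk follow the question.

```python
def prune_to_keys(mapping, keys_to_keep):
    """Delete every key of `mapping` that is not in `keys_to_keep` (in place)."""
    keep = set(keys_to_keep)
    # Collect the doomed keys first: a dict must not change size while iterated.
    extra = [key for key in mapping if key not in keep]
    for key in extra:
        del mapping[key]
    return extra

currs = {101: 'a', 102: 'b', 103: 'c', 104: 'd'}  # made-up sample data
lst_rk = [101, 103]
removed = prune_to_keys(currs, lst_rk)
print(sorted(removed))  # [102, 104]
print(sorted(currs))    # [101, 103]
```

Collecting the keys before deleting avoids the problem in the question's loop, where the list shrinks while it is still being indexed.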

How does the list comprehension to flatten a python list work? [duplicate]

I recently looked for a way to flatten a nested python list, like this: [[1,2,3],[4,5,6]], into this: [1,2,3,4,5,6].
Stackoverflow was helpful as ever and I found a post with this ingenious list comprehension:
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
I thought I understood how list comprehensions work, but apparently I haven't got the faintest idea. What puzzles me most is that besides the comprehension above, this also runs (although it doesn't give the same result):
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Can someone explain how Python interprets these things? Based on the second comprehension, I would expect that Python interprets it back to front, but apparently that is not always the case. If it were, the first comprehension should throw an error, because 'sublist' does not exist. My mind is completely warped, help!

Let's take a look at your list comprehension then, but first let's start with a list comprehension at its simplest.
l = [1,2,3,4,5]
print([x for x in l]) # prints [1, 2, 3, 4, 5]
You can look at this the same as a for loop structured like so:
for x in l:
    print(x)
Now let's look at another one:
l = [1,2,3,4,5]
a = [x for x in l if x % 2 == 0]
print(a) # prints [2, 4]
That is exactly the same as this:
a = []
l = [1,2,3,4,5]
for x in l:
    if x % 2 == 0:
        a.append(x)
print(a) # prints [2, 4]
Now let's take a look at the examples you provided.
l = [[1,2,3],[4,5,6]]
flattened_l = [item for sublist in l for item in sublist]
print(flattened_l) # prints [1, 2, 3, 4, 5, 6]
For a list comprehension, start at the leftmost for clause and work your way inwards: the leftmost for is the outermost loop. The expression at the front, item in this case, is what gets added to the list. The comprehension is equivalent to this:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
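Now for the last one:
exactly_the_same_as_l = [item for item in sublist for sublist in l]
Using the same knowledge we can create a for loop and see how it would behave:
exactly_the_same_as_l = []
for item in sublist:
    for sublist in l:
        exactly_the_same_as_l.append(item)
Now the only reason the above one works is that when flattened_l was created, it also created sublist. It is a matter of scoping that it did not throw an error. If you ran that without creating flattened_l first, you would get a NameError. (This leaking of the loop variable is Python 2 behaviour; in Python 3 a comprehension's loop variables are local to the comprehension, so the second comprehension raises a NameError unless sublist has been defined some other way.)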

The for loops are evaluated from left to right. Any list comprehension can be re-written as a for loop, as follows:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for sublist in l:
    for item in sublist:
        flattened_l.append(item)
The above is the correct code for flattening a list, whether you choose to write it concisely as a list comprehension, or in this extended version.
The second list comprehension you wrote will raise a NameError, as 'sublist' has not yet been defined. You can see this by writing the list comprehension as a for loop:
l = [[1,2,3],[4,5,6]]
flattened_l = []
for item in sublist:
    for sublist in l:
        flattened_l.append(item)
The only reason you didn't see the error when you ran your code was that sublist had already been defined when you ran your first list comprehension.
For more information, you may want to check out Guido's tutorial on list comprehensions.
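The left-to-right rule can be checked in a fresh scope, where no earlier code has defined sublist. This is a small sketch with made-up function names (flatten_once, clauses_swapped). It only demonstrates the clause order and is not a recommended pattern.

```python
def flatten_once(nested):
    # Clauses read left to right like nested for loops: outer loop first.
    return [item for sublist in nested for item in sublist]

def clauses_swapped(nested):
    # 'sublist' is looked up before any clause has bound it, so in a scope
    # where no such name exists yet this raises NameError.
    return [item for item in sublist for sublist in nested]

nested = [[1, 2, 3], [4, 5, 6]]
print(flatten_once(nested))  # [1, 2, 3, 4, 5, 6]

try:
    clauses_swapped(nested)
except NameError as error:
    print("NameError:", error)
```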

For the lazy dev that wants a quick answer:
>>> a = [[1,2], [3,4]]
>>> [i for g in a for i in g]
[1, 2, 3, 4]

While this approach definitely works for flattening lists, I wouldn't recommend it unless your sublists are known to be very small (1 or 2 elements each).
I've done a bit of profiling with timeit and found that this takes roughly 2-3 times longer than using a single loop and calling extend…
def flatten(l):
    flattened = []
    for sublist in l:
        flattened.extend(sublist)
    return flattened
While it's not as pretty, the speedup is significant. I suppose this works so well because extend can more efficiently copy the whole sublist at once instead of copying each element, one at a time. I would recommend using extend if you know your sublists are medium-to-large in size. The larger the sublist, the bigger the speedup.
One final caveat: obviously, this only holds true if you need to eagerly form this flattened list. Perhaps you'll be sorting it later, for example. If you're ultimately going to just loop through the list as-is, this will not be any better than using the nested loops approach outlined by others. But for that use case, you want to return a generator instead of a list for the added benefit of laziness…
def flatten(l):
    return (item for sublist in l for item in sublist) # note the parens
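The standard library offers itertools.chain.from_iterable for the same one-level flattening, either lazily or as a list. The sketch below makes no claim about its speed relative to extend, and the wrapper names flatten_lazy and flatten_eager are made up for illustration.

```python
from itertools import chain

def flatten_lazy(nested):
    """Lazily yield the items of each sublist in turn."""
    return chain.from_iterable(nested)

def flatten_eager(nested):
    """Same traversal, materialised into a list."""
    return list(chain.from_iterable(nested))

print(flatten_eager([[1, 2, 3], [4, 5, 6]]))  # [1, 2, 3, 4, 5, 6]
```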

Note, of course, that this sort of comprehension will only "flatten" a list of lists (or a list of other iterables). Also, if you pass it a list of strings you'll "flatten" it into a list of characters.
To generalize this in a meaningful way you first want to be able to cleanly distinguish between strings (or bytearrays) and other types of sequences (or other Iterables). So let's start with a simple function:
from collections.abc import Iterable
def non_str_seq(p):
    '''p is putatively a sequence and not a string nor bytearray'''
    return isinstance(p, Iterable) and not isinstance(p, (str, bytearray))
Using that we can then build a recursive function to flatten any sequence:
def flatten(s):
    '''Recursively flatten any sequence of objects
    '''
    results = list()
    if non_str_seq(s):
        for each in s:
            results.extend(flatten(each))
    else:
        results.append(s)
    return results
There are probably more elegant ways to do this. But this works for all the Python built-in types that I know of. Simple objects (numbers, strings, instances of None, True, False) are all returned wrapped in a list. Dictionaries are returned as lists of keys (in the dictionary's iteration order).
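The same recursion can also be written as a generator, so that no intermediate lists are built. This is a sketch under the assumption that str, bytes and bytearray objects should be treated as leaves. The name iter_flatten is made up for illustration.

```python
from collections.abc import Iterable

def iter_flatten(obj):
    """Yield the leaves of arbitrarily nested iterables, treating text and bytes as leaves."""
    if isinstance(obj, Iterable) and not isinstance(obj, (str, bytes, bytearray)):
        for element in obj:
            yield from iter_flatten(element)
    else:
        yield obj

print(list(iter_flatten([1, [2, [3, 'abc']], (4, 5)])))  # [1, 2, 3, 'abc', 4, 5]
```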

Delete all occurrences of specific values from list of lists python

As far as I can see this question (surprisingly?) has not been asked before - unless I am failing to spot an equivalent question due to lack of experience. (Similar questions have been asked about 1D lists.)
I have a list_A that has int values in it.
I want to delete all occurrences of all the values specified in list_A from my list_of_lists. As a novice coder I can hack something together here myself using list comprehensions and for loops, but given what I have read about the inefficiency of deleting elements from within lists, I am looking for advice from more experienced users about the fastest way to go about this.
list_of_lists= [
[1,2,3,4,5,6,8,9],
[0,2,4,5,6,7],
[0,1,6],
[0,4,9],
[0,1,3,5],
[0,1,4],
[0,1,2],
[1,8],
[0,7],
[0,3]
]
Further info
I am not looking to eliminate duplicates (there is already a question on here about that). I am looking to eliminate all occurrences of selected values.
list_A may typically have 200 values in it
list_of_lists will have a similar (long tailed) distribution to that shown above but in the order of up to 10,000 rows by 10,000 columns
Output can be a modified version of original list_of_lists or completely new list - whichever is quicker
Last but not least (thanks to RemcoGerlich for drawing attention to this) - I need to eliminate empty sublists from within the list of lists
Many thanks

Using a list comprehension should work:
new_list = [[i for i in l if i not in list_A] for l in list_of_lists]
After that, if you want to remove empty lists, you can do:
for i in new_list[:]:  # iterate over a copy, because new_list is modified in the loop
    if not i:
        new_list.remove(i)
or, as @ferhatelmas pointed out in the comments:
new_list = [i for i in new_list if i]
To avoid duplicates in list_A you can convert it to a set beforehand with list_A = set(list_A)

I'd write a function that just takes one list (or iterable) and a set, and returns a new list with the values from the set removed:
def list_without_values(L, values):
    return [l for l in L if l not in values]
Then I'd turn list_A into a set for fast checking, and loop over the original list:
set_A = set(list_A)
list_of_lists = [list_without_values(L, set_A) for L in list_of_lists]
Should be fast enough, and readability is what matters most.
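The question also asks for empty sublists to be dropped, and both steps can be combined in one pass. The following is a minimal sketch, not a benchmarked solution. The function name remove_values is made up, and the sample data is a shortened, invented version of the question's input.

```python
def remove_values(rows, values):
    """Drop every occurrence of `values` from each row, then drop rows left empty."""
    blocked = set(values)  # average O(1) membership tests
    filtered = ([x for x in row if x not in blocked] for row in rows)
    return [row for row in filtered if row]

list_of_lists = [[1, 2, 3], [0, 1], [0, 4, 9], [1, 8]]  # shortened sample
list_A = [0, 1]                                         # made-up values
print(remove_values(list_of_lists, list_A))  # [[2, 3], [4, 9], [8]]
```

This builds a new list of lists rather than deleting elements in place, which avoids the repeated shifting of elements that in-place deletion causes.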
