Remove an item with certain property value from a list - python

I have a list of objects, each of which has a property named x, and I want to remove from the list all the objects whose x property has the value v.
One way to do it is a list comprehension: [item for item in mylist if item.x != v]. Another way is to iterate through the list in a loop and check every single item. My list is small (usually fewer than 10 items).
Is there a third way that is equally fast or even faster?

You can also use a generator or the filter function. Choose what you find the most readable; efficiency doesn't really matter at this point (especially not if you're dealing with just a few elements).
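A minimal sketch of these options side by side, assuming a hypothetical Item class with an x attribute (dataclasses need Python 3.7+). All of them produce the same filtered items; pick whichever reads best.

```python
from dataclasses import dataclass


@dataclass
class Item:
    # Hypothetical stand-in for the objects in the question
    x: int


mylist = [Item(1), Item(2), Item(1), Item(3)]
v = 1

# 1. List comprehension: builds a new list
kept = [item for item in mylist if item.x != v]

# 2. filter(): lazy in Python 3, so wrap it in list() when you need a list
kept_filter = list(filter(lambda item: item.x != v, mylist))

# 3. Generator expression: lazy, so no intermediate list is built
#    (here it is consumed straight away by sum())
total = sum(item.x for item in mylist if item.x != v)

# In-place variant: slice assignment replaces the contents of the same list object
mylist[:] = [item for item in mylist if item.x != v]
```

The slice assignment in the last line matters only if other code holds a reference to the same list object; otherwise rebinding the name is enough.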

Create a new list using list comprehension syntax. I don't think you can do anything faster than that. It doesn't matter that your list is small; if anything, that's even better.

Related

Why does Python's set() return a set object instead of a list

I quite often use set() to remove duplicates from lists. After doing so, I always directly change it back to a list.
a = [0,0,0,1,2,3,4,5]
b = list(set(a))
Why does set() return a set object instead of simply a list?
type(set(a)) == set  # is True
Is there a use for sets that I have failed to understand?
Yes, sets have many uses. They have lots of nice operations documented here which lists don't have. One very useful difference is that membership testing (x in a) can be much faster than for a list.
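A short illustration of the set operations that lists lack, plus the membership test (the example values are made up for illustration):

```python
a = [0, 0, 0, 1, 2, 3, 4, 5]
b = [4, 5, 6, 7]
sa, sb = set(a), set(b)

union = sa | sb            # {0, 1, 2, 3, 4, 5, 6, 7}
common = sa & sb           # {4, 5}
only_a = sa - sb           # {0, 1, 2, 3}
either_not_both = sa ^ sb  # {0, 1, 2, 3, 6, 7}

# Membership: a set uses a hash lookup (O(1) on average),
# while a list is scanned element by element (O(n))
print(3 in sa)  # True
```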
Okay, by doubles you mean duplicates? set() will always return a set, because set is a built-in data structure in Python, just like list. When you call set(), you are creating a set object.
You can find the rest of the information about sets here:
https://docs.python.org/2/library/sets.html
As already mentioned, I won't go into why set does not return a list, but, as you stated:
I quite often use set() to remove doubles from lists. After doing so, I always directly change it back to a list.
You could use OrderedDict if you'd rather not convert a set back to a list; unlike a set, it also keeps the elements in their original order:
from collections import OrderedDict
source_list = [0,0,0,1,2,3,4,5]
print(OrderedDict((x, True) for x in source_list).keys())
OUTPUT:
odict_keys([0, 1, 2, 3, 4, 5])
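On Python 3.7 and later, plain dicts are guaranteed to preserve insertion order, so dict.fromkeys gives the same order-preserving de-duplication and returns a real list once wrapped in list(). A minimal sketch (OrderedDict.fromkeys works the same way on older versions):

```python
source_list = [3, 0, 0, 1, 3, 2, 5, 4, 5]

# dict keys are unique and (from Python 3.7) keep insertion order,
# so this keeps the first occurrence of each value
deduped = list(dict.fromkeys(source_list))
print(deduped)  # [3, 0, 1, 2, 5, 4]
```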
As said before, for certain operations a set is faster than a list. The Python wiki has a page called TimeComplexity that lists the cost of operations on various data types. Note that if you have only a few elements in your list or set, you will most probably not notice a difference, but it becomes more important as the number of elements grows.
Notice, for example, that in-place removal of an element is O(n) for a list, meaning that a list 10 times longer needs about 10 times more time. For a set s, s.difference_update(t), where t is a set containing the one element to be removed from s, is O(1), i.e. independent of the number of elements in s.
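A small sketch of the in-place removals being compared. The costs in the comments follow the TimeComplexity page; the data is illustrative only.

```python
items_list = list(range(10))
items_set = set(items_list)

# list.remove scans for the value, then shifts every later element: O(n)
items_list.remove(7)

# set.discard is a single hash lookup: O(1) on average, and no error if the value is missing
items_set.discard(7)

# difference_update removes every element of its argument from the set in place;
# its cost grows with the size of the argument, not with the size of the set
items_set.difference_update({8})
```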

Compare whether the first two elements of a nested list are equal to a comparison list in python

In Python 2.7, I would like to check whether a short list of elements is present in a longer nested list when comparing, let's say, only the first two elements.
Let's say we have a big list of nested elements (this big_list will have over 10k elements, so looping over it for every comparison is very inefficient and I'd like to avoid this). For this example, let's say we only have 4 nested lists in big_list:
big_list = ((2,3,5,6,7), (4,5,6,7,8), (6,7,8,8), (8,4,2,7))
If I have a single list, let's say (4,5,11,11,11), I am looking for an operation that returns True when compared to big_list, since the second list in big_list starts with (4,5,...) and matches the first two elements of my single list. Essentially, I want to know whether the first two elements of a single list (e.g. (4,5,11,11,11)) are repeated in my big list, regardless of the numbers that follow (e.g. 11,11, ...).
My operation should also return False for another single list (e.g. (4,8,11,11,11)) whose first two elements don't match those of any entry in big_list.
I hope this is clearer. Any help?
Thanks in advance,
Since you have a huge list, you can avoid iterating over the whole thing on every search (O(n) time per lookup) by building a set once and then doing constant-time lookups:
tup_truth_set = set(tup[:2] for tup in big_list)  # set of the first two elements of each tuple
Then you would simply do something like this to check in constant time (on average):
tuple_of_interest[:2] in tup_truth_set
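A complete, runnable version of this approach using the sample data from the question. The helper name starts_like_any is made up for illustration; the tuple() calls are an addition so the lookup also works if the inputs are lists (list slices are not hashable).

```python
big_list = ((2, 3, 5, 6, 7), (4, 5, 6, 7, 8), (6, 7, 8, 8), (8, 4, 2, 7))

# Build the set of two-element prefixes once: O(n) up front
prefixes = set(tuple(seq[:2]) for seq in big_list)


def starts_like_any(single, prefixes=prefixes):
    # tuple() makes the prefix hashable even if single is a list
    return tuple(single[:2]) in prefixes


print(starts_like_any((4, 5, 11, 11, 11)))  # True
print(starts_like_any((4, 8, 11, 11, 11)))  # False
```

The one-off cost of building the set pays for itself as soon as you do more than a handful of lookups against the same big_list.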
I don't think you can avoid looping over your list. Even if you don't write the loop yourself, and there were a built-in function I'm not aware of that does what you are asking, I'm pretty sure it would loop over the list in the background. So I suggest a single line of code that does it, including a loop, obviously:
(4,5,11,11,11)[:2] in [i[:2] for i in big_list]
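If you only need a single check (so building a set isn't worth it), a variant of this one-liner with any() and a generator expression avoids building the intermediate list and stops at the first match. A sketch using the question's sample data:

```python
big_list = ((2, 3, 5, 6, 7), (4, 5, 6, 7, 8), (6, 7, 8, 8), (8, 4, 2, 7))
single = (4, 5, 11, 11, 11)

# any() consumes the generator lazily and stops at the first matching prefix,
# whereas the list comprehension slices every tuple before the `in` test runs
found = any(tup[:2] == single[:2] for tup in big_list)
print(found)  # True
```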

Is there a built-in python method that iterates over a list using index values instead of elements

I am looking for a Python method that iterates over a list using index values instead of the elements themselves. This code does exactly that:
for index in range(len(list)-1):
    # do stuff
I am looking to see if there is already a function that does this or if I should add it myself.
Use enumerate:
for index, ignore_this_its_the_value in enumerate(list):
    # do stuff
But note that range(len(list)-1) stops one short and skips the last index. This version:
for index in range(len(list)):
    # do stuff
produces all the proper indexes.
for i in x: works on any iterable, such as a list or generator.
In Python 2, range(r) returns a list of numbers and xrange(r) a lazy sequence of them (in Python 3, range itself is lazy). enumerate is a convenient way of generating both counters and values from a list; you could do the same thing with zip(range(len(alist)), alist).
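A quick check that enumerate and the zip(range(len(...))) construction produce the same index/value pairs, plus enumerate's optional start argument:

```python
alist = ['a', 'b', 'c']

pairs_enum = list(enumerate(alist))               # [(0, 'a'), (1, 'b'), (2, 'c')]
pairs_zip = list(zip(range(len(alist)), alist))   # the same pairs, built by hand

# enumerate also accepts a starting value for the counter
pairs_from_one = list(enumerate(alist, start=1))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```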
for i, value in enumerate(alist):
    alist[i] = foo(value)
is, I think, a good example of using enumerate. It provides both an efficient way of 'reading' the values, and a way of modifying them.
Of course, if you need more control over the iteration, there is always the while loop. It's particularly useful if you need to continue or break, or if you need to change the index in unusual ways.
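One case where changing the index "in unusual ways" comes up is deleting items while walking the list: after a deletion the next element slides into the current position, so the index must not advance. A minimal sketch with made-up data (removing even numbers):

```python
items = [1, 2, 4, 5, 6, 7]

i = 0
while i < len(items):
    if items[i] % 2 == 0:
        # Deleting shifts the next element into position i, so don't advance
        del items[i]
    else:
        i += 1
```

A for loop over range(len(items)) would skip elements here (or hit an IndexError), because the list shrinks under it. For simple filtering, a list comprehension is usually still the clearer choice.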
Generators are another good tool for packaging an iteration.
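For example, a generator function can package an unusual iteration pattern behind a normal for loop. The helper every_other is hypothetical, written only to illustrate the idea:

```python
def every_other(seq, start=0):
    # Hypothetical helper: yields (index, value) pairs for every second element
    for i in range(start, len(seq), 2):
        yield i, seq[i]


letters = ['a', 'b', 'c', 'd', 'e']
for i, value in every_other(letters):
    print(i, value)  # prints "0 a", then "2 c", then "4 e"
```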
Personally I like the list comprehension (and the generator and dictionary analogs) best. It's a good blend of control and compactness.

Efficient use of Python list comprehensions

I have a Python list of objects that could be pretty long. At particular times, I'm interested in all of the elements in the list that have a certain attribute, say flag, that evaluates to False. To do so, I've been using a list comprehension, like this:
objList = list()
# ... populate list
[x for x in objList if not x.flag]
This seems to work well. After forming the sublist, I have a few different operations that I might need to do:
1. Subscript the sublist to get the element at index ind.
2. Calculate the length of the sublist (i.e. the number of elements that have flag == False).
3. Search the sublist for the first instance of a particular object (i.e. using the list's .index() method).
I've implemented these using the naive approach of just forming the sublist and then using its methods to get at the data I want. I'm wondering if there are more efficient ways to go about these. #1 and #3 at least seem like they could be optimized, because in #1 I only need the first ind + 1 matching elements of the sublist, not necessarily the entire result set, and in #3 I only need to search through the sublist until I find a matching element.
Is there a good Pythonic way to do this? I'm guessing I might be able to use the () syntax in some way to get a generator instead of creating the entire list, but I haven't happened upon the right way yet. I obviously could write loops manually, but I'm looking for something as elegant as the comprehension-based method.
If you need to do any of these operations a couple of times, the list is the best way, because the overhead of the other methods will be higher. It's also probably the clearest, so if memory isn't a problem, I'd recommend just going with it.
If memory/speed is a problem, then there are alternatives - note that speed-wise, these might actually be slower, depending on the common case for your software.
For your scenarios:
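from itertools import dropwhile, islice

def nth(iterable, n, default=None):
    # nth() recipe from the itertools docs
    return next(islice(iterable, n, None), default)

#value = sublist[n]
value = nth((x for x in objList if not x.flag), n)
#value = len(sublist)
value = sum(not x.flag for x in objList)
#value = sublist.index(target)  (enumerate keeps the position; [0] extracts it)
value = next(dropwhile(lambda p: p[1] != target, enumerate(x for x in objList if not x.flag)))[0]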
Using itertools.dropwhile() and the nth() recipe from the itertools docs.
I'm going to assume you might do any of these three things, and you might do them more than once.
In that case, what you want is basically to write a lazily evaluated list class. It would keep two pieces of data: a real list caching the items evaluated so far, and a generator for the rest. You could then do ll[10] and it would evaluate up to index 10; ll.index('spam') and it would evaluate until it finds 'spam'; and len(ll) and it would evaluate the rest of the list. All the while, it caches in the real list what it has seen, so nothing is done more than once.
Constructing it would look like this:
LazyList(x for x in obj_list if not x.flag)
But nothing would actually be computed until you actually start using it as above.
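A minimal sketch of such a class. LazyList is hypothetical (not a standard library type); this version supports integer indexing, len() and index() only, and leaves out slicing and mutation.

```python
from itertools import islice


class LazyList:
    """Hypothetical sketch: a list that evaluates its source only as far as needed."""

    def __init__(self, iterable):
        self._cache = []             # items evaluated so far
        self._rest = iter(iterable)  # source of the items not yet evaluated

    def _fill_to(self, n):
        # Evaluate items until the cache holds n of them (or the source runs out)
        missing = n - len(self._cache)
        if missing > 0:
            self._cache.extend(islice(self._rest, missing))
        return len(self._cache) >= n

    def _fill_all(self):
        self._cache.extend(self._rest)

    def __getitem__(self, i):
        if i < 0:
            self._fill_all()  # a negative index needs the full length
        elif not self._fill_to(i + 1):
            raise IndexError("LazyList index out of range")
        return self._cache[i]

    def __len__(self):
        self._fill_all()
        return len(self._cache)

    def index(self, value):
        # Check what is already cached, then keep evaluating until value appears
        for i, item in enumerate(self._cache):
            if item == value:
                return i
        for item in self._rest:
            self._cache.append(item)
            if item == value:
                return len(self._cache) - 1
        raise ValueError("%r is not in LazyList" % (value,))
```

Usage follows the construction shown above: ll = LazyList(x for x in obj_list if not x.flag), then ll[n], ll.index(target) or len(ll). Each call evaluates only as much of the generator as it needs, and the cache means later calls never re-evaluate items.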
Since you commented that your objList can change, if you don't also need to index or search objList itself, then you might be better off just storing two different lists, one with .flag = True and one with .flag = False. Then you can use the second list directly instead of constructing it with a list comprehension each time.
If this works in your situation, it is likely the most efficient way to do it.
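A minimal sketch of keeping the two lists, assuming objects only enter through a hypothetical add() helper and that an object's flag doesn't change after it is added (if it can, you'd also need to move the object between lists when the flag changes):

```python
from types import SimpleNamespace

flagged = []    # objects with .flag == True
unflagged = []  # objects with .flag == False


def add(obj):
    # Hypothetical helper: route each new object to the matching list
    (flagged if obj.flag else unflagged).append(obj)


for name, flag in [('a', True), ('b', False), ('c', False)]:
    add(SimpleNamespace(name=name, flag=flag))

# The three operations from the question now work directly on unflagged
first = unflagged[0]
count = len(unflagged)
pos = unflagged.index(first)
```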

Fastest Way To Remove Duplicates In Lists Python

I have two very large lists; looping through them once takes at least a second, and I need to do it 200,000 times. What's the fastest way to remove duplicates from the two lists to form one?
This is the fastest way I can think of:
import itertools
output_list = list(set(itertools.chain(first_list, second_list)))
Slight update: As jcd points out, depending on your application, you probably don't need to convert the result back to a list. Since a set is iterable by itself, you might be able to just use it directly:
output_set = set(itertools.chain(first_list, second_list))
for item in output_set:
    # do something
Beware though that any solution involving the use of set() will probably reorder the elements in your list, so there's no guarantee that elements will be in any particular order. That said, since you're combining two lists, it's hard to come up with a good reason why you would need a particular ordering over them anyway, so this is probably not something you need to worry about.
I'd recommend something like this:
def combine_lists(list1, list2):
    s = set(list1)
    s.update(list2)
    return list(s)
This eliminates the problem of creating a monster list of the concatenation of the first two.
Depending on what you're doing with the output, don't bother to convert back to a list. If ordering is important, you might need some sort of decorate/sort/undecorate shenanigans around this.
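If ordering does matter, one alternative to decorate/sort/undecorate is a single pass with a "seen" set, which keeps the first occurrence of each element in its original order. A sketch, assuming the elements are hashable; combine_ordered is a made-up name:

```python
from itertools import chain


def combine_ordered(list1, list2):
    # Hypothetical helper: de-duplicate while keeping first-seen order
    seen = set()
    result = []
    for item in chain(list1, list2):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

It is still one pass over both lists with average O(1) set lookups, at the cost of holding both the set and the result list in memory.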
As Daniel states, a set cannot contain duplicate entries - so concatenate the lists:
list1 + list2
Then convert the new list to a set:
set(list1 + list2)
Then back to a list:
list(set(list1 + list2))
result = list(set(list1).union(set(list2)))
That's how I'd do it. I'm not so sure about performance, but it is certainly better than doing it by hand.
