Optimized method of cutting/slicing sorted lists - python

Is there any pre-made optimized tool/library in Python to cut/slice lists for values "less than" something?
Here's the issue: Let's say I have a list like:
a=[1,3,5,7,9]
and I want to delete all the numbers which are <= 6, so the resulting list would be
[7,9]
6 is not in the list, so I can't use the list's a.index(6) method. I can do things like:
#!/usr/bin/env python
a = [1, 3, 5, 7, 9]
cut = 6
for i in range(len(a) - 1, -2, -1):
    if a[i] <= cut:
        break
b = a[i + 1:]
print("Cut list: %s" % b)
which would be a fairly quick method if the index to cut from is close to the end of the list, but which will be inefficient if it is close to the beginning (say I want to delete all the items which are > 2; there would be a lot of iterations).
I can also implement my own find method using binary search or such, but I was wondering if there's a more... general built-in library to handle this type of thing that I could reuse in other cases (for instance, if I need to delete all the numbers which are >= 6).
Thank you in advance.

You can use the bisect module to perform a sorted search:
>>> import bisect
>>> a[bisect.bisect_left(a, 6):]
[7, 9]
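Note the distinction between bisect_left and bisect_right when the cut value is actually present in the list: bisect_left returns the insertion point before any equal entries, bisect_right the point after them. So to drop everything <= 6 you want bisect_right, and for the ">= 6" variant of the question you can keep everything below 6 with bisect_left:
>>> a = [1, 3, 5, 6, 7, 9]  # now 6 is in the list
>>> a[bisect.bisect_right(a, 6):]  # drop everything <= 6
[7, 9]
>>> a[bisect.bisect_left(a, 6):]   # drop everything < 6
[6, 7, 9]
>>> a[:bisect.bisect_left(a, 6)]   # keep everything < 6 (deletes all >= 6)
[1, 3, 5]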

bisect.bisect_left is what you are looking for, I guess.

If you just want to filter the list for all elements that fulfil a certain criterion, then the most straightforward way is to use the built-in filter function.
Here is an example:
a_list = [10, 2, 3, 8, 1, 9]
# keep only the elements smaller than 6 (on Python 3, filter returns a
# lazy iterator, so wrap it in list()):
filtered_list = list(filter(lambda x: x < 6, a_list))
filtered_list will then contain:
[2, 3, 1]
Note: this method does not rely on the list being sorted. For very large sorted lists, a method optimised for ordered data (such as bisect) is likely to perform better, since filter must examine every element.
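As a rough illustration (a minimal timeit sketch; absolute numbers will vary by machine), on a large sorted list the bisect approach avoids calling a predicate on every element:
import bisect
import timeit

a = list(range(1_000_000))  # a large, already-sorted list

# filter-style: examines every element regardless of ordering
t_filter = timeit.timeit(lambda: [x for x in a if x > 6], number=10)

# bisect: O(log n) to find the cut point (the slice copy still costs O(k))
t_bisect = timeit.timeit(lambda: a[bisect.bisect_right(a, 6):], number=10)

print(t_filter, t_bisect)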

Bisect left and right helper function
#!/usr/bin/env python3
import bisect
def get_slice(list_, left, right):
    return list_[
        bisect.bisect_left(list_, left):
        bisect.bisect_left(list_, right)
    ]
assert get_slice([0, 1, 1, 3, 4, 4, 5, 6], 1, 5) == [1, 1, 3, 4, 4]
Tested in Ubuntu 16.04, Python 3.5.2.

Adding to Jon's answer: if you need to actually delete the elements less than or equal to 6, and want to keep the same reference to the list rather than returning a new one, use del with a slice:
del a[:bisect.bisect_right(a,6)]
You should note as well that bisect will only work on a sorted list.
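A quick sketch showing that other references to the list see the change, since the object is mutated in place:
>>> import bisect
>>> a = [1, 3, 5, 7, 9]
>>> alias = a                       # second reference to the same object
>>> del a[:bisect.bisect_right(a, 6)]
>>> alias                           # the alias sees the change
[7, 9]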

Related

Extract index of Non duplicate elements in python list

I have a list:
input = ['a','b','c','a','b','d','e','d','g','g']
I want the indices of all elements except the duplicates, i.e. the index of the first occurrence of each element:
output = [0,1,2,5,6,8]
You should iterate over the enumerated list and add each element to a set of "seen" elements and add the index to the output list if the element hasn't already been seen (is not in the "seen" set).
Note: the name input shadows the built-in input() function, so I renamed it input_list.
output = []
seen = set()
for i, e in enumerate(input_list):
    if e not in seen:
        output.append(i)
        seen.add(e)
which gives output as [0, 1, 2, 5, 6, 8].
why use a set?
You could be thinking, why use a set when you could do something like:
[i for i,e in enumerate(input_list) if input_list.index(e) == i]
which works because .index returns the index of the first element in the list with that value; checking an element's enumerate position against this asserts that it is the first occurrence, filtering out the elements which aren't first occurrences.
However, this is not as efficient as using a set, because list.index requires Python to iterate over the list until it finds the element (or doesn't). This operation is O(n) complexity and since we are calling it for every element in input_list, the whole solution would be O(n^2).
On the other hand, using a set, as in the first solution, yields an O(n) solution, because checking whether an element is in a set is O(1) on average. This comes down to how sets are implemented: each element is stored at a position derived from its hash, so membership can be checked by computing the hash and looking at that position instead of scanning the whole collection (an oversimplification, but it captures the idea).
Thus, since each check for membership is O(1), and we do this for each element, we get an O(n) solution which is much better than an O(n^2) solution.
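A quick illustration you can run yourself (exact timings will vary by machine):
import timeit

data_list = list(range(100_000))
data_set = set(data_list)

# worst case for the list: the element sits at the very end
print(timeit.timeit(lambda: 99_999 in data_list, number=100))  # O(n) scan
print(timeit.timeit(lambda: 99_999 in data_set, number=100))   # O(1) hash lookup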
You could do something like this, checking counts (although this is computation-heavy):
indexes = []
for i, x in enumerate(inputlist):
    if (inputlist.count(x) == 1
            or x not in inputlist[:i]):
        indexes.append(i)
This adds the index i to the results if either:
the item appears only once in the whole list, or
the item hasn't appeared before in the list up to this point (i.e. this is its first occurrence).
In case you don't mind indexes of the last occurrences of duplicates instead and are using Python 3.6+, here's an alternative solution:
list(dict(map(reversed, enumerate(input))).values())
This returns:
[3, 4, 2, 7, 6, 9]
Here is a one-liner using zip and reversed
>>> input = ['a','b','c','a','b','d','e','d','g','g']
>>> sorted(dict(zip(reversed(input), range(len(input)-1, -1, -1))).values())
[0, 1, 2, 5, 6, 8]
This question is missing a pandas solution. 😉
>>> import pandas as pd
>>> inp = ['a','b','c','a','b','d','e','d','g','g']
>>>
>>> pd.DataFrame(list(enumerate(inp))).groupby(1).first()[0].tolist()
[0, 1, 2, 5, 6, 8]
Yet another version, using a side effect in a list comprehension.
>>> xs=['a','b','c','a','b','d','e','d','g','g']
>>> seen = set()
>>> [i for i, v in enumerate(xs) if v not in seen and not seen.add(v)]
[0, 1, 2, 5, 6, 8]
The list comprehension filters indices of values that have not been seen already.
The trick is that not seen.add(v) is always true because seen.add(v) returns None.
Because of short circuit evaluation, seen.add(v) is performed if and only if v is not in seen, adding new values to seen on the fly.
At the end, seen contains all the values of the input list.
>>> seen
{'a', 'c', 'g', 'b', 'd', 'e'}
Note: it is usually a bad idea to use side effects in list comprehension,
but you might see this trick sometimes.
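If you like this result but would rather avoid the side effect, here's a sketch of the same idea using dict.setdefault (dicts preserve insertion order in Python 3.7+):
xs = ['a', 'b', 'c', 'a', 'b', 'd', 'e', 'd', 'g', 'g']
first = {}
for i, v in enumerate(xs):
    first.setdefault(v, i)  # records an index only the first time v is seen
print(list(first.values()))  # [0, 1, 2, 5, 6, 8]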

Rearrange list in-place by modifying the original list, put even-index values at front

I am relatively new to python and I am still trying to learn the basics of the language. I stumbled upon a question which asks you to rearrange the list by modifying the original. What you are supposed to do is move all the even index values to the front (in reverse order) followed by the odd index values.
Example:
l = [0, 1, 2, 3, 4, 5, 6]
l = [6, 4, 2, 0, 1, 3, 5]
My initial approach was to just use the following:
l = l[::-2] + l[1::2]
However, apparently this is considered 'creating a new list' rather than looping through the original list to modify it.
As such, I was hoping to get some ideas or hints as to how I should approach this particular question. I know that I can use a for loop or a while loop to cycle through the elements / index, but I don't know how to do a swap or anything else for that matter.
You can do it by assigning to a list slice instead of a variable:
l[:] = l[::2][::-1] + l[1::2]
Your expression for the reversed even-index elements was also not quite right: l[::-2] only yields the even indices when the list length is odd. Use l[::2] to get all the even-index elements, then reverse that with [::-1].
This is effectively equivalent to:
templ = l[::2][::-1] + l[1::2]
for i in range(len(l)):
    l[i] = templ[i]
The for loop modifies the original list in place.
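To see that slice assignment really mutates the original object rather than rebinding the name, compare a second reference before and after:
l = [0, 1, 2, 3, 4, 5, 6]
alias = l                        # second name for the same list object
l[:] = l[::2][::-1] + l[1::2]
print(alias)                     # [6, 4, 2, 0, 1, 3, 5] -- same object, changed
print(alias is l)                # True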

How to simultaneously iterate and modify list, set, etc?

In my program there are many places where I need to both iterate over something and modify it in that same for loop.
However, I know that modifying the thing you're iterating over is bad because it may (and probably will) produce an undesired result.
So I've been doing something like this:
for el_idx, el in enumerate(theList):
    if theList[el_idx].IsSomething() is True:
        theList[el_idx].SetIt(False)
Is this the best way to do this?
This is a conceptual misunderstanding.
It is dangerous to modify the list itself from within the iteration, because of the way Python translates the loop to lower-level code. This can cause unexpected side effects during the iteration; there's a good example here:
https://unspecified.wordpress.com/2009/02/12/thou-shalt-not-modify-a-list-during-iteration/
But modifying mutable objects stored in the list is acceptable, and common practice.
I suspect you're thinking that because the list is made up of those objects, modifying those objects modifies the list. This is understandable; it's just not how it's normally thought of. If it helps, consider that the list only really contains references to those objects. When you modify the objects within the loop, you are merely using the list to modify the objects, not modifying the list itself.
What you should not do is add or remove items from the list during the iteration.
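If you do need to remove items, a common pattern is to iterate over a shallow copy while mutating the original; a minimal sketch:
s = [1, 4, 1, 4, 1, 4, 1, 1, 0, 1]
for x in s[:]:        # s[:] is a copy, so the iteration is undisturbed
    if x == 1:
        s.remove(x)
print(s)              # [4, 4, 4, 0]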
Your problem is a bit unclear to me, but when it comes to the harm of modifying a list during a for loop iteration in Python, I can think of two scenarios.
First, you modify elements of the list that are supposed to be used in a later round of the computation at their original values.
e.g. You want to write a program with inputs and outputs like these:
Input:
[1, 2, 3, 4]
Expected output:
[1, 3, 6, 10] #[1, 1 + 2, 1 + 2 + 3, 1 + 2 + 3 + 4]
But...you write a code in this way:
#!/usr/bin/env python
mylist = [1, 2, 3, 4]
for idx, n in enumerate(mylist):
    mylist[idx] = sum(mylist[:idx + 1])
print(mylist)
Result is:
[1, 3, 7, 15] # undesired result
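For the intended running sums, one way to avoid reading values you have already overwritten is not to mutate the input mid-loop at all, e.g. with itertools.accumulate:
from itertools import accumulate

mylist = [1, 2, 3, 4]
print(list(accumulate(mylist)))  # [1, 3, 6, 10]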
Second, you change the size of the list during the for loop iteration.
e.g. From python-delete-all-entries-of-a-value-in-list:
>>> s=[1,4,1,4,1,4,1,1,0,1]
>>> for i in s:
... if i ==1: s.remove(i)
...
>>> s
[4, 4, 4, 0, 1]
The example shows an undesired result caused by the side effect of changing the list's size: a for loop in Python cannot properly handle a list whose size changes during iteration. Below is one simple way to overcome this problem:
#!/usr/bin/env python
s = [1, 4, 1, 4, 1, 4, 1, 1, 0, 1]
list_size = len(s)
i = 0
while i != list_size:
    if s[i] == 1:
        del s[i]
        list_size = len(s)
    else:
        i = i + 1
print(s)
Result:
[4, 4, 4, 0]
Conclusion: it's not harmful to modify elements of a list during a loop iteration, provided you don't 1) change the size of the list, or 2) introduce side effects of your own into the computation.
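For the size-changing case specifically, a common idiom is to build the filtered list and write it back through a slice assignment, which keeps the same list object; a sketch:
s = [1, 4, 1, 4, 1, 4, 1, 1, 0, 1]
s[:] = [x for x in s if x != 1]  # rebuild the contents in place
print(s)                         # [4, 4, 4, 0]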
You could get the indices first, then do the modification in a plain loop (a list comprehension evaluated only for its side effects is best avoided):
idx = [el_idx for el_idx, el in enumerate(theList) if el.IsSomething()]
for i in idx:
    theList[i].SetIt(False)

Does Python keep track of when something has been sorted, internally?

For example, if I call
L = [3,4,2,1,5]
L = sorted(L)
I get a sorted list. Now, in the future, if I want to perform some other kind of sort on L, does Python automatically know "this list has been sorted before and not modified since, so we can perform some internal optimizations on how we perform this other kind of sort" such as a reverse-sort, etc?
Nope, it doesn't. The sorting algorithm is designed to exploit (partially) sorted inputs, but the list itself doesn't "remember" being sorted in any way.
(This is actually a CPython implementation detail, and future versions/different implementations could cache the fact that a list was just sorted. However, I'm not convinced that could be done without slowing down all operations that modify the list, such as append.)
As the commenters pointed out, normal Python lists are inherently ordered and efficiently sortable (thanks, Timsort!), but do not remember or maintain sorting status.
If you want lists that invariably retain their sorted status, you can install the SortedContainers package from PyPI.
>>> from sortedcontainers import SortedList
>>> L = SortedList([3,4,2,1,5])
>>> L
SortedList([1, 2, 3, 4, 5])
>>> L.add(3.3)
>>> L
SortedList([1, 2, 3, 3.3, 4, 5])
Note the normal append method becomes add, because the item isn't added on the end. It's added wherever appropriate given the sort order. There is also a SortedListWithKey type that allows you to set your sort key/order explicitly.
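For example, a sketch with a key function (assuming a sortedcontainers version that provides SortedListWithKey; newer releases expose the same thing as SortedKeyList):
>>> from sortedcontainers import SortedListWithKey
>>> L = SortedListWithKey([3, 4, 2, 1, 5], key=lambda x: -x)  # descending order
>>> list(L)
[5, 4, 3, 2, 1]
>>> L.add(3.3)
>>> list(L)
[5, 4, 3.3, 3, 2, 1]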
Some of this, at least the specific reverse sort question, could be done using numpy:
import numpy as np

L = np.array([3, 4, 2, 1, 5])
a = np.argsort(L)    # indices that would sort L
b = L[a]             # normal sort
r = L[a[::-1]]       # reverse sort
print(L)   # [3 4 2 1 5]
print(b)   # [1 2 3 4 5]
print(r)   # [5 4 3 2 1]
That is, we do the sort only once (to create a, the array of sorting indices), and can then manipulate a to produce various orderings: the normal sort b, the reverse sort r, and many others, such as every other element of the sorted order.
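For instance, a small self-contained sketch of that last case:
import numpy as np

L = np.array([3, 4, 2, 1, 5])
a = np.argsort(L)
print(L[a[::2]])  # every other element of the sorted order: [1 3 5]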

Sorting based on one of the lists in a nested list in python

I have a list [[4,5,6],[2,3,1]]. Now I want to sort it based on list[1], i.e. the output should be [[6,4,5],[1,2,3]]. So basically I am sorting [2,3,1] and reordering list[0] to match.
While searching I found a function which sorts based on the first element of every sublist, but not this. Also, I do not want to recreate the list as [[4,2],[5,3],[6,1]] and then use that function.
Since [4, 5, 6] and [2, 3, 1] serve two different purposes, I will make a function taking two arguments: the list to be reordered, and the list whose sorting determines the order. I'll only return the reordered list.
This answer has timings of three different solutions for creating a permutation list for a sort. Using the fastest option gives this solution:
def pyargsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def using_pyargsort(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return [a[i] for i in pyargsort(b)]

print(using_pyargsort([4, 5, 6], [2, 3, 1]))  # [6, 4, 5]
The pyargsort method is inspired by the numpy argsort method, which does the same thing much faster. Numpy also has advanced indexing operations whereby an array can be used as an index, making possible very quick reordering of an array.
So if your need for speed is great, one would assume that this numpy solution would be faster:
import numpy as np
def using_numpy(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return np.array(a)[np.argsort(b)].tolist()

print(using_numpy([4, 5, 6], [2, 3, 1]))  # [6, 4, 5]
However, for short lists (length < 1000), this solution is in fact slower than the first. This is because we're first converting the a and b lists to array and then converting the result back to list before returning. If we instead assume you're using numpy arrays throughout your application so that we do not need to convert back and forth, we get this solution:
def all_numpy(a, b):
    "Reorder array a the same way as array b would be reordered by a normal sort"
    return a[np.argsort(b)]

print(all_numpy(np.array([4, 5, 6]), np.array([2, 3, 1])))  # [6 4 5]
The all_numpy function executes up to 10 times faster than the using_pyargsort function.
The following logarithmic graph compares these three solutions with the two alternative solutions from the other answers. The arguments are two randomly shuffled ranges of equal length, and the functions all receive identically ordered lists. I'm timing only the time the function takes to execute. For illustrative purposes I've added an extra graph line for each numpy solution where the 60 ms overhead for loading numpy is added to the time.
As we can see, the all-numpy solution beats the others by an order of magnitude. Converting from python list and back slows the using_numpy solution down considerably in comparison, but it still beats pure python for large lists.
For a list length of about 1,000,000, using_pyargsort takes 2.0 seconds, using_numpy + overhead is only 1.3 seconds, while all_numpy + overhead is 0.3 seconds.
The sorting you describe is not very easy to accomplish. The only way that I can think of to do it is to use zip to create the list you say you don't want to create:
lst = [[4,5,6],[2,3,1]]
# key = operator.itemgetter(1) works too, and may be slightly faster ...
transpose_sort = sorted(zip(*lst), key=lambda x: x[1])
lst = list(zip(*transpose_sort))  # list() needed on Python 3, where zip is lazy
Is there a reason for this constraint?
Also note that you could do this all in one line if you really want to:
lst = list(zip(*sorted(zip(*lst), key=lambda x: x[1])))
This also results in a list of tuples. If you really want a list of lists, you can map the result:
lst = list(map(list, lst))
Or a list comprehension would work as well:
lst = [ list(x) for x in lst ]
If the second list doesn't contain duplicates, you could just do this:
l = [[4,5,6],[2,3,1]] #the list
l1 = l[1][:] #a copy of the to-be-sorted sublist
l[1].sort() #sort the sublist
l[0] = [l[0][l1.index(x)] for x in l[1]] #order the first sublist accordingly
(As this copies the sublist l[1], and l1.index does a linear scan for every element, it might be a bad idea if your input list is huge.)
How about this one:
a = [[4,5,6],[2,3,1]]
[a[0][i] for i in sorted(range(len(a[1])), key=lambda x: a[1][x])]
It uses the same principle as numpy's argsort, without having to use numpy and without the zip trick.
Neither using numpy nor the zipping around seems to be the cheapest way for giant structures. Unfortunately the .sort() method is built into the list type and uses hard-wired access to the elements in the list (overriding __getitem__() or similar does not have any effect here).
So you can implement your own sort() which sorts two or more lists according to the values in one; this is basically what numpy does.
Or you can create a list of values to sort, sort that, and recreate the sorted original list out of it.
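A minimal sketch of that last approach: sort (value, index) pairs once, then rebuild both sublists from the resulting permutation:
lst = [[4, 5, 6], [2, 3, 1]]
# sorting (value, index) pairs yields the permutation induced by lst[1]
order = [i for _, i in sorted((v, i) for i, v in enumerate(lst[1]))]
lst = [[row[i] for i in order] for row in lst]
print(lst)  # [[6, 4, 5], [1, 2, 3]]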
