Selective flattening of a Python list

Suppose I have a list containing (among other things) sublists of different types:
[1, 2, [3, 4], {5, 6}]
that I'd like to flatten in a selective way, depending on the type of its elements (i.e. I'd like to only flatten sets, and leave the rest unflattened):
[1, 2, [3, 4], 5, 6]
My current solution is a function, but just for my intellectual curiosity, I wonder if it's possible to do it with a single list comprehension?

List comprehensions aren't designed for flattening (since they don't have a way to combine the values corresponding to multiple input items).
While you can get around this with nested list comprehensions, that requires every element of your top-level list to be iterable.
Honestly, just use a function for this. It's the cleanest way.
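A minimal sketch of such a function (the name flatten_sets is mine, not from the question):

```python
def flatten_sets(items):
    """Flatten only the set elements, leaving everything else (including lists) as-is."""
    result = []
    for item in items:
        if isinstance(item, set):
            result.extend(item)  # unpack the set's members into the result
        else:
            result.append(item)  # keep the element untouched
    return result

print(flatten_sets([1, 2, [3, 4], {5, 6}]))  # [1, 2, [3, 4], 5, 6] (set order may vary)
```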

Amber is probably right that a function is preferable for something like this. On the other hand, there's always room for a little variation. I'm assuming the nesting is never more than one level deep; if it ever is, you should definitely prefer a function. But if not, this is a potentially viable approach.
>>> from itertools import chain
>>> from collections.abc import Set
>>> l = [1, 2, [3, 4], {5, 6}]
>>> list(chain.from_iterable(x if isinstance(x, Set) else (x,) for x in l))
[1, 2, [3, 4], 5, 6]
The non-itertools way to do this would involve nested list comprehensions. Better to break that into two lines:
>>> packaged = (x if isinstance(x, Set) else (x,) for x in l)
>>> [x for y in packaged for x in y]
[1, 2, [3, 4], 5, 6]
I don't have a strong intuition about whether either of these would be faster or slower than a straightforward function. They create lots of singleton tuples -- that's kind of a waste -- but the iteration happens at list-comprehension speed, which is usually pretty good.

You can use flatten function from funcy library:
from funcy import flatten, isa
flat_list = flatten(your_list, follow=isa(set))
You can also peek at its implementation.

Related

Does Python keep track of when something has been sorted, internally?

For example, if I call
L = [3,4,2,1,5]
L = sorted(L)
I get a sorted list. Now, in the future, if I want to perform some other kind of sort on L, does Python automatically know "this list has been sorted before and not modified since, so we can perform some internal optimizations on how we perform this other kind of sort" such as a reverse-sort, etc?
Nope, it doesn't. The sorting algorithm is designed to exploit (partially) sorted inputs, but the list itself doesn't "remember" being sorted in any way.
(This is actually a CPython implementation detail, and future versions/different implementations could cache the fact that a list was just sorted. However, I'm not convinced that could be done without slowing down all operations that modify the list, such as append.)
As the commenters pointed out, normal Python lists are inherently ordered and efficiently sortable (thanks, Timsort!), but do not remember or maintain sorting status.
If you want lists that invariably retain their sorted status, you can install the SortedContainers package from PyPI.
>>> from sortedcontainers import SortedList
>>> L = SortedList([3,4,2,1,5])
>>> L
SortedList([1, 2, 3, 4, 5])
>>> L.add(3.3)
>>> L
SortedList([1, 2, 3, 3.3, 4, 5])
Note the normal append method becomes add, because the item isn't added on the end. It's added wherever appropriate given the sort order. There is also a SortedListWithKey type that allows you to set your sort key/order explicitly.
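If pulling in a third-party package is overkill, the standard library's bisect.insort offers a lighter-weight way to keep a plain list sorted as you insert (a sketch; note each insert is O(n) because the list shifts elements, whereas SortedList scales better for large lists):

```python
import bisect

L = [1, 2, 3, 4, 5]    # must already be sorted
bisect.insort(L, 3.3)  # inserts at the correct position to keep L sorted
print(L)               # [1, 2, 3, 3.3, 4, 5]
```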
Some of this, at least the specific reverse sort question, could be done using numpy:
import numpy as np
L = np.array([3,4,2,1,5])
a = np.argsort(L)
b = L[a]
r = L[a[::-1]]
print(L)
[3 4 2 1 5]
print(b)
[1 2 3 4 5]
print(r)
[5 4 3 2 1]
That is, here we just do the sort once (to create a, the sorting indices), and then we can manipulate a to produce other orderings, such as the normal sort b and the reverse sort r. Many other orderings would be similarly easy, such as taking every other element.

How can I skip creating tuples with certain values, when using the zip function in Python?

I am using the zip() function in Python.
I am zipping two lists; however, the lists sometimes contain the value 0, and I would like to avoid producing any tuple with a zero in it.
x = [1, 2, 3]
y = [0, 0, 6]
zipped = zip(x, y)
The result I'd like to end up with is:
>>> zipped
[(3, 6)]
In the process of trying to figure this out, I have also come to realize that any possible solution is probably going to be slow, if it involves zipping then removing.
Is there any way to incorporate a condition on zipping? Or should I explore a different, faster method?
Any advice is gratefully appreciated.
You could izip and only pass through the proper values:
from itertools import izip
zipped = [(xx, yy) for xx, yy in izip(x, y) if yy]
I chose izip over zip because in Python 2.x, zip builds a whole new list, which isn't necessary here; izip avoids that overhead.
In Python 3.x, the builtin zip is already lazy, so you should use it directly (in fact, itertools.izip no longer exists in Python 3.x).
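For reference, the Python 3 equivalent of the snippet above is just (a sketch using the lazy builtin zip):

```python
x = [1, 2, 3]
y = [0, 0, 6]
# builtin zip is lazy in Python 3, so no izip is needed
zipped = [(xx, yy) for xx, yy in zip(x, y) if yy]
print(zipped)  # [(3, 6)]
```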
zip() by itself won't cover this particular requirement. Nevertheless you can use it along with list comprehensions to get the desired result:
>>> x = [1, 2, 3]
>>> y = [0, 0, 6]
>>> zipped = [(a, b) for a, b in zip(x, y) if a and b]
>>> zipped
[(3, 6)]
Regarding the speed of this method: it will generally beat an equivalent handwritten loop that appends to a list, because list comprehensions run on specialized bytecode and avoid the repeated attribute lookup of list.append. As always, measure with timeit if performance really matters.

Optimized method of cutting/slicing sorted lists

Is there any pre-made optimized tool/library in Python to cut/slice lists for values "less than" something?
Here's the issue: Let's say I have a list like:
a=[1,3,5,7,9]
and I want to delete all the numbers which are <= 6, so the resulting list would be
[7,9]
6 is not in the list, so I can't use the built-in index(6) method of the list. I can do things like:
#!/usr/bin/env python
a = [1, 3, 5, 7, 9]
cut = 6
for i in range(len(a) - 1, -2, -1):
    if a[i] <= cut:
        break
b = a[i + 1:]
print("Cut list: %s" % b)
which would be fairly quick method if the index to cut from is close to the end of the list, but which will be inefficient if the item is close to the beginning of the list (let's say, I want to delete all the items which are >2, there will be a lot of iterations).
I can also implement my own find method using binary search or such, but I was wondering if there's a more... wide-scope built in library to handle this type of things that I could reuse in other cases (for instance, if I need to delete all the number which are >=6).
Thank you in advance.
You can use the bisect module to perform a sorted search:
>>> import bisect
>>> a[bisect.bisect_left(a, 6):]
[7, 9]
bisect.bisect_left is what you are looking for, I guess.
If you just want to filter the list for all elements that fulfil a certain criterion, then the most straightforward way is to use the built-in filter function.
Here is an example:
a_list = [10,2,3,8,1,9]
# filter all elements smaller than 6:
filtered_list = list(filter(lambda x: x < 6, a_list))  # list() is needed on Python 3, where filter is lazy
the filtered_list will contain:
[2, 3, 1]
Note: this method does not rely on the list being sorted, so for very large sorted lists a method optimised for ordered data (such as bisect) will likely be faster.
Bisect left and right helper function
#!/usr/bin/env python3
import bisect
def get_slice(list_, left, right):
    return list_[
        bisect.bisect_left(list_, left):
        bisect.bisect_left(list_, right)
    ]

assert get_slice([0, 1, 1, 3, 4, 4, 5, 6], 1, 5) == [1, 1, 3, 4, 4]
Tested in Ubuntu 16.04, Python 3.5.2.
Adding to Jon's answer, if you need to actually delete the elements <= 6 in place, keeping the same reference to the list rather than creating a new one:
del a[:bisect.bisect_right(a,6)]
You should note as well that bisect will only work on a sorted list.
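For the opposite case mentioned in the question (deleting everything >= 6 instead of <= 6), the same module applies; roughly:

```python
import bisect

a = [1, 3, 5, 7, 9]
# bisect_left returns the index of the first element >= 6,
# so slicing up to it keeps only the elements below the cut
a = a[:bisect.bisect_left(a, 6)]
print(a)  # [1, 3, 5]
```

The left/right variants only differ when the cut value itself is present: bisect_left drops elements equal to the cut value from the kept prefix, while bisect_right would keep them.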

Sorting based on one of the list among Nested list in python

I have a list as [[4,5,6],[2,3,1]]. Now I want to sort the list based on list[1] i.e. output should be [[6,4,5],[1,2,3]]. So basically I am sorting 2,3,1 and maintaining the order of list[0].
While searching I found a function which sorts based on the first element of every sublist, but not this. Also, I do not want to recreate the list as [[4,2],[5,3],[6,1]] and then use that function.
Since [4, 5, 6] and [2, 3, 1] serve two different purposes, I will make a function taking two arguments: the list to be reordered, and the list whose sorting will decide the order. I'll only return the reordered list.
This answer has timings of three different solutions for creating a permutation list for a sort. Using the fastest option gives this solution:
def pyargsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)

def using_pyargsort(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return [a[i] for i in pyargsort(b)]

print(using_pyargsort([4, 5, 6], [2, 3, 1]))  # [6, 4, 5]
The pyargsort method is inspired by the numpy argsort method, which does the same thing much faster. Numpy also has advanced indexing operations whereby an array can be used as an index, making possible very quick reordering of an array.
So if your need for speed is great, one would assume that this numpy solution would be faster:
import numpy as np
def using_numpy(a, b):
    "Reorder the list a the same way as list b would be reordered by a normal sort"
    return np.array(a)[np.argsort(b)].tolist()

print(using_numpy([4, 5, 6], [2, 3, 1]))  # [6, 4, 5]
However, for short lists (length < 1000), this solution is in fact slower than the first. This is because we're first converting the a and b lists to array and then converting the result back to list before returning. If we instead assume you're using numpy arrays throughout your application so that we do not need to convert back and forth, we get this solution:
def all_numpy(a, b):
    "Reorder array a the same way as array b would be reordered by a normal sort"
    return a[np.argsort(b)]

print(all_numpy(np.array([4, 5, 6]), np.array([2, 3, 1])))  # [6 4 5]
The all_numpy function executes up to 10 times faster than the using_pyargsort function.
The following logarithmic graph compares these three solutions with the two alternative solutions from the other answers. The arguments are two randomly shuffled ranges of equal length, and the functions all receive identically ordered lists. I'm timing only the time the function takes to execute. For illustrative purposes I've added an extra graph line for each numpy solution where the 60 ms overhead for loading numpy is added to the time.
As we can see, the all-numpy solution beats the others by an order of magnitude. Converting from python list and back slows the using_numpy solution down considerably in comparison, but it still beats pure python for large lists.
For a list length of about 1,000,000, using_pyargsort takes 2.0 seconds, using_numpy + overhead only 1.3 seconds, while all_numpy + overhead takes 0.3 seconds.
The sorting you describe is not very easy to accomplish. The only way that I can think of to do it is to use zip to create the list you say you don't want to create:
lst = [[4,5,6],[2,3,1]]
# key = operator.itemgetter(1) works too, and may be slightly faster ...
transpose_sort = sorted(zip(*lst), key=lambda x: x[1])
lst = list(zip(*transpose_sort))
Is there a reason for this constraint?
(Also note that you could do this all in one line if you really want to:
lst = list(zip(*sorted(zip(*lst), key=lambda x: x[1])))
This also results in a list of tuples. If you really want a list of lists, you can map the result:
lst = list(map(list, lst))
Or a list comprehension would work as well:
lst = [ list(x) for x in lst ]
If the second list doesn't contain duplicates, you could just do this:
l = [[4,5,6],[2,3,1]] #the list
l1 = l[1][:] #a copy of the to-be-sorted sublist
l[1].sort() #sort the sublist
l[0] = [l[0][l1.index(x)] for x in l[1]] #order the first sublist accordingly
(As this saves the sublist l[1] it might be a bad idea if your input list is huge)
How about this one:
a = [[4,5,6],[2,3,1]]
[a[0][i] for i in sorted(range(len(a[1])), key=lambda x: a[1][x])]
It uses the same principle as numpy's argsort without requiring numpy, and without the zip shuffling.
Neither using numpy nor the zipping around seems to be the cheapest way for giant structures. Unfortunately the .sort() method is built into the list type and uses hard-wired access to the elements in the list (overriding __getitem__() or similar does not have any effect here).
So you can implement your own sort() which sorts two or more lists according to the values in one; this is basically what numpy does.
Or you can create a list of values to sort, sort that, and recreate the sorted original list out of it.
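A sketch of that last approach: decorate the sort keys with their positions, sort the pairs, then rebuild both sublists in the resulting order (variable names are mine):

```python
lst = [[4, 5, 6], [2, 3, 1]]
# pair each value of the key sublist with its index, sort, and
# extract the index order that a normal sort would produce
order = [i for _, i in sorted((v, i) for i, v in enumerate(lst[1]))]
# reorder every sublist by that index order
lst = [[row[i] for i in order] for row in lst]
print(lst)  # [[6, 4, 5], [1, 2, 3]]
```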

Does Ruby have something like Python's list comprehensions?

Python has a nice feature:
print([j**2 for j in [2, 3, 4, 5]]) # => [4, 9, 16, 25]
In Ruby it's even simpler:
puts [2, 3, 4, 5].map{|j| j**2}
but if it's about nested loops Python looks more convenient.
In Python we can do this:
digits = [1, 2, 3]
chars = ['a', 'b', 'c']
print([str(d)+ch for d in digits for ch in chars if d >= 2 if ch == 'a'])
# => ['2a', '3a']
The equivalent in Ruby is:
digits = [1, 2, 3]
chars = ['a', 'b', 'c']
list = []
digits.each do |d|
  chars.each do |ch|
    list.push d.to_s << ch if d >= 2 && ch == 'a'
  end
end
puts list
Does Ruby have something similar?
The common way in Ruby is to properly combine Enumerable and Array methods to achieve the same:
digits.product(chars).select{ |d, ch| d >= 2 && ch == 'a' }.map(&:join)
This is only a few characters longer than the list comprehension and just as expressive (IMHO, of course; since list comprehensions are just a special application of the list monad, one could argue it's possible to rebuild them adequately with Ruby's collection methods), while not needing any special syntax.
As you know, Ruby has no syntactic sugar for list comprehensions, so the closest you can get is by using blocks in imaginative ways. People have proposed different ideas; take a look at the lazylist and verstehen approaches, both of which support nested comprehensions with conditions:
require 'lazylist'
list { [x, y] }.where(:x => [1, 2], :y => [3, 4]) { x+y>4 }.to_a
#=> [[1, 4], [2, 3], [2, 4]]
require 'verstehen'
list { [x, y] }.for(:x).in { [1, 2] }.for(:y).in { [3, 4] }.if { x+y>4 }.comprehend
#=> [[1, 4], [2, 3], [2, 4]]
Of course that's not what you'd call idiomatic Ruby, so it's usually safer to use the typical product + select + map approach.
As suggested by RBK above, List comprehension in Ruby provides a whole slew of different ways to do things sort of like list comprehensions in Ruby.
None of them explicitly describe nested loops, but at least some of them can be nested quite easily.
For example, the accepted answer by Robert Gamble suggests adding an Array#comprehend method.
class Array
  def comprehend(&block)
    return self if block.nil?
    self.collect(&block).compact
  end
end
Having done that, you can write your code as:
digits.comprehend { |d| chars.comprehend { |ch| d.to_s + ch if ch == 'a' } if d >= 2 }
Compare to the Python code:
[str(d)+ch for d in digits for ch in chars if d >= 2 if ch == 'a']
The differences are pretty minor:
The Ruby code is a bit longer. But that's mostly just the fact that "comprehend" is spelled out; you can always call it something shorter if you want.
The Ruby code puts things in a different order—the arrays come at the beginning instead of in the middle. But if you think about it, that's exactly what you'd expect, and want, because of the "everything is a method" philosophy.
The Ruby code requires nested braces for nested comprehensions. I can't think of an obvious way around this that doesn't make things worse (you don't want to call "[str,digits].comprehend2" or anything…).
Of course the real strength of Python here is that if you decide you want to evaluate the list lazily, you can convert your comprehension into a generator expression just by removing the brackets (or turning them into parentheses, depending on the context). But even there, you could create an Array#lazycomprehend or something.
