I am looking for a way to easily split a python list in half.
So that if I have an array:
A = [0,1,2,3,4,5]
I would be able to get:
B = [0,1,2]
C = [3,4,5]
A = [1,2,3,4,5,6]
B = A[:len(A)//2]
C = A[len(A)//2:]
If you want a function:
def split_list(a_list):
    half = len(a_list)//2
    return a_list[:half], a_list[half:]
A = [1,2,3,4,5,6]
B, C = split_list(A)
A little more generic solution (you can specify the number of parts you want, not just split 'in half'):
def split_list(alist, wanted_parts=1):
    length = len(alist)
    return [ alist[i*length // wanted_parts: (i+1)*length // wanted_parts]
             for i in range(wanted_parts) ]
A = [0,1,2,3,4,5,6,7,8,9]
print(split_list(A, wanted_parts=1))
print(split_list(A, wanted_parts=2))
print(split_list(A, wanted_parts=8))
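To see how the integer arithmetic spreads the remainder over the parts, here is a small Python 3 sketch of the same slicing idea. The name split_into_parts is chosen here for illustration only and is not part of the answer above.

```python
def split_into_parts(alist, wanted_parts=1):
    """Split alist into wanted_parts contiguous slices of near-equal length."""
    length = len(alist)
    return [alist[i * length // wanted_parts:(i + 1) * length // wanted_parts]
            for i in range(wanted_parts)]


A = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(split_into_parts(A, wanted_parts=2))
# [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(split_into_parts(A, wanted_parts=8))
# [[0], [1], [2], [3, 4], [5], [6], [7], [8, 9]]
```

With 10 items and 8 parts, the two leftover items end up in the fourth and the last slice, because the slice boundaries are computed as i*length // wanted_parts.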
f = lambda A, n=3: [A[i:i+n] for i in range(0, len(A), n)]
f(A)
n - the predefined length of result arrays
def split(arr, size):
    arrs = []
    while len(arr) > size:
        piece = arr[:size]
        arrs.append(piece)
        arr = arr[size:]
    arrs.append(arr)
    return arrs
Test:
x=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
print(split(x, 5))
result:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13]]
If you don't care about the order...
def split(list):
    return list[::2], list[1::2]
list[::2] gets every second element in the list starting from the 0th element.
list[1::2] gets every second element in the list starting from the 1st element.
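A quick sketch of what the two step-slices return on an odd-length list. The function and variable names here are illustrative assumptions, not taken from the answer.

```python
def split_alternating(items):
    """Return (items at even indices, items at odd indices)."""
    return items[::2], items[1::2]


even_positions, odd_positions = split_alternating([0, 1, 2, 3, 4, 5, 6])
print(even_positions)  # [0, 2, 4, 6]
print(odd_positions)   # [1, 3, 5]
```

When the length is odd, the first result is one element longer than the second.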
Using list slicing. The syntax is basically my_list[start_index:end_index]
>>> i = [0,1,2,3,4,5]
>>> i[:3] # same as i[0:3] - grabs from first to third index (0->2)
[0, 1, 2]
>>> i[3:] # same as i[3:len(i)] - grabs from fourth index to end
[3, 4, 5]
To get the first half of the list, you slice from the first index to len(i)//2 (where // is integer division, so 3//2 gives the floored result 1 instead of the invalid list index 1.5):
>>> i[:len(i)//2]
[0, 1, 2]
...and then swap the slice bounds around to get the second half:
>>> i[len(i)//2:]
[3, 4, 5]
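Because // rounds down, an odd-length list puts its extra element in the second half. If you would rather have it in the first half, you can round the midpoint up instead. This is a minimal sketch; the helper name halves and its extra_first flag are assumptions made for illustration.

```python
def halves(seq, extra_first=False):
    """Split seq in two; extra_first decides which half gets the odd element."""
    mid = (len(seq) + 1) // 2 if extra_first else len(seq) // 2
    return seq[:mid], seq[mid:]


i = [0, 1, 2, 3, 4]
print(halves(i))                    # ([0, 1], [2, 3, 4])
print(halves(i, extra_first=True))  # ([0, 1, 2], [3, 4])
```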
B,C=A[:len(A)//2],A[len(A)//2:]
Here is a common solution that splits arr into count parts (the elements are dealt out round-robin, so each part takes every count-th element):
def split(arr, count):
    return [arr[i::count] for i in range(count)]
def splitter(A):
    B = A[0:len(A)//2]
    C = A[len(A)//2:]
    return (B,C)
I tested, and the double slash is required to force int division in python 3. My original post was correct, although wysiwyg broke in Opera, for some reason.
If you have a big list, it's better to use itertools and write a function to yield each part as needed:
from itertools import islice
def make_chunks(data, SIZE):
    it = iter(data)
    # use `xrange` if you are in python 2.7:
    for i in range(0, len(data), SIZE):
        yield [k for k in islice(it, SIZE)]
You can use this like:
A = [0, 1, 2, 3, 4, 5, 6]
size = len(A) // 2
for sample in make_chunks(A, size):
    print(sample)
The output is:
[0, 1, 2]
[3, 4, 5]
[6]
Thanks to @thefourtheye and @Bede Constantinides
This is similar to other solutions, but a little faster.
# Usage: split_half([1,2,3,4,5]) Result: ([1, 2], [3, 4, 5])
def split_half(a):
    half = len(a) >> 1
    return a[:half], a[half:]
There is an official Python recipe for the more generalized case of splitting an array into smaller arrays of size n.
from itertools import izip_longest  # Python 2; in Python 3 this is zip_longest
def grouper(n, iterable, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)
This code snippet is from the python itertools doc page.
10 years later.. I thought - why not add another:
arr = 'Some random string' * 10; n = 4
print([arr[e:e+n] for e in range(0,len(arr),n)])
While the answers above are more or less correct, you may run into trouble because in Python 3 the result of a / 2 is always a float (3.0 for a = 6, 3.5 for a = 7), and a float is not a valid slice index. The same applies in earlier versions if you specify from __future__ import division at the beginning of your script. You are in any case better off going for integer division, i.e. a // 2, in order to get "forward" compatibility of your code.
#for python 3
A = [0,1,2,3,4,5]
l = len(A)/2
B = A[:int(l)]
C = A[int(l):]
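The difference between the two operators can be checked directly. The sketch below assumes Python 3 and uses illustrative variable names.

```python
A = [0, 1, 2, 3, 4, 5, 6]

true_div = len(A) / 2    # 3.5, always a float in Python 3
floor_div = len(A) // 2  # 3, an int

# A float is rejected as a slice index.
try:
    A[:true_div]
    slice_error = None
except TypeError as exc:
    slice_error = exc

print(type(true_div).__name__, type(floor_div).__name__)  # float int
print(slice_error is not None)  # True

# Integer division works without any int() conversion.
B, C = A[:floor_div], A[floor_div:]
print(B, C)  # [0, 1, 2] [3, 4, 5, 6]
```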
General solution: split a list into n parts, with parameter verification:
def sp(l,n):
    # split list l into n parts
    if l:
        p = len(l) if n < 1 else len(l) // n # no split
        p = p if p > 0 else 1 # split down to elements
        for i in range(0, len(l), p):
            yield l[i:i+p]
    else:
        yield [] # empty list split returns empty list
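Since there was no restriction on which packages we can use: NumPy has a function called split with which you can easily split an array. Note that np.split raises a ValueError when the array cannot be divided into equal parts, so the 7-element array below needs np.array_split, which allows unequal parts.
Example
import numpy as np
A = np.array(list('abcdefg'))
np.array_split(A, 2)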
With hints from @ChristopheD
def line_split(N, K=1):
    length = len(N)
    return [N[i*length//K:(i+1)*length//K] for i in range(K)]
A = [0,1,2,3,4,5,6,7,8,9]
print(line_split(A,1))
print(line_split(A,2))
Another take on this problem in 2020. Here's a generalization of the problem. I interpret 'divide a list in half' to mean exactly two lists (i.e. there shall be no spillover into a third list in case of an odd one out). For instance, if the array length is 19, division by two using the // operator gives 9, and chunking by that size yields two arrays of length 9 plus a third array of length 1 (three arrays in total). For a general solution that always gives two arrays, I will assume that we are happy with two resulting arrays that are not equal in length (one will be longer than the other), and that it is OK to have the order mixed (alternating in this case).
"""
arrayinput --> is an array of length N that you wish to split into 2 parts
"""
ctr = 1 # let's initialize a counter
holder_1 = []
holder_2 = []
for i in range(len(arrayinput)):
    if ctr == 1:
        holder_1.append(arrayinput[i])
    elif ctr == 2:
        holder_2.append(arrayinput[i])
    ctr += 1
    if ctr > 2: # if it exceeds 2 then we reset
        ctr = 1
This concept works for any number of partitions (you'd have to tweak the code depending on how many parts you want), and it is rather straightforward to interpret. To speed things up, you can even write this loop in Cython / C / C++. Then again, I've tried this code on relatively small lists (~10,000 rows) and it finishes in a fraction of a second.
Just my two cents.
Thanks!
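The counter-based loop above can be generalized to any number of parts by replacing the two holders with a list of holders and the counter with a modulo on the index. This is a sketch under that assumption; round_robin_split and its parameters are hypothetical names, not from the answer.

```python
def round_robin_split(items, parts=2):
    """Deal items out to `parts` lists in alternating (round-robin) order."""
    holders = [[] for _ in range(parts)]
    for index, item in enumerate(items):
        holders[index % parts].append(item)
    return holders


first, second = round_robin_split(list(range(19)))
print(len(first), len(second))  # 10 9
print(round_robin_split(list(range(7)), parts=3))
# [[0, 3, 6], [1, 4], [2, 5]]
```

A list of length 19 always yields exactly two lists here (of lengths 10 and 9), at the cost of interleaving the original order.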
from itertools import islice
Input = [2, 5, 3, 4, 8, 9, 1]
small_list_length = [1, 2, 3, 1]
Input1 = iter(Input)
Result = [list(islice(Input1, elem)) for elem in small_list_length]
print("Input list :", Input)
print("Split length list: ", small_list_length)
print("List after splitting", Result)
You can try something like this with numpy
import numpy as np
np.array_split([1,2,3,4,6,7,8], 2)
result:
[array([1, 2, 3, 4]), array([6, 7, 8])]
I have a random list like this
X = [0, 1, 5, 6, 7, 10, 15]
and need to find and replace every climbing sequence with its average.
In the end it should look like this:
X = [0, 6, 10, 15] #the 0 and 1 to 0; and the 5,6,7 to 6
I tried to find the sequence by subtracting the second value from the first like this:
y = 0
z = []
while X[y +1] -X[y] == 1:
    z.append(X[y])
    y = y +1
And now I don't know how to delete, for example, 5, 6 and 7 and replace them with the average 6.
You can use itertools.groupby on the list with a key function that returns each item's difference with an incremental counter:
from itertools import groupby, count
from statistics import mean
X = [0, 1, 5, 6, 7, 10, 15]
c = count()
X = [int(mean(g)) for _, g in groupby(X, key=lambda i: i - next(c))]
X becomes:
[0, 6, 10, 15]
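To see why the key function works, it can help to look at the keys themselves: inside a climbing run, the value and the counter both grow by 1, so their difference stays constant. The sketch below only illustrates that intermediate step; the names keys and runs are assumptions.

```python
from itertools import count, groupby

X = [0, 1, 5, 6, 7, 10, 15]

# Value minus position: constant within each run of consecutive numbers.
keys = [x - i for i, x in enumerate(X)]
print(keys)  # [0, 0, 3, 3, 3, 5, 9]

# Grouping by that key recovers the runs.
c = count()
runs = [list(g) for _, g in groupby(X, key=lambda x: x - next(c))]
print(runs)  # [[0, 1], [5, 6, 7], [10], [15]]
```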
You can iterate over the list, collect each climbing sequence into its own sublist, and then take the mean of each sublist.
>>> res = [[X[0]]]
>>> for i in range(1, len(X)):
...     if X[i] == X[i-1] + 1:
...         res[-1].append(X[i])
...     else:
...         res.append([X[i]])
...
>>> res
[[0, 1], [5, 6, 7], [10], [15]]
>>> [int(sum(l)/len(l)) for l in res]
[0, 6, 10, 15]
Here's a starting technique: make a new list that's the difference of adjacent elements in the list:
diff = [X[i] - X[i-1] for i in range(1, len(X)) ]
There are more "Pythonic" ways to do this, but I want to make sure this is accessible to newer programmers.
You now have diff as
[1, 4, 1, 1, 3, 5]
Where you have a 1 in diff, you have a climbing pair in X. Iterate through diff to find a sequence of 1 values. Where you find this, take the slice of X that corresponds to the 1 values. The middle element of that slice is your mean.
If the value is not 1, then you simply take the corresponding element of X, as you've been doing.
Append the identified values to z, and there's your desired result.
Can you take it from there?
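For readers who want to compare their attempt, here is one possible way to finish the diff-based approach. It is a sketch, not the answerer's code: the function name average_runs is hypothetical, and it assumes integer input with the mean rounded down, as in the question's expected output.

```python
def average_runs(xs):
    """Replace each run of consecutive integers with its floored mean."""
    if not xs:
        return []
    diff = [xs[i] - xs[i - 1] for i in range(1, len(xs))]
    result = []
    start = 0  # index where the current run begins
    for i, d in enumerate(diff):
        if d != 1:  # the run ends at index i
            run = xs[start:i + 1]
            result.append(sum(run) // len(run))
            start = i + 1
    run = xs[start:]  # the last run reaches the end of the list
    result.append(sum(run) // len(run))
    return result


print(average_runs([0, 1, 5, 6, 7, 10, 15]))  # [0, 6, 10, 15]
```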
This doesn't really answer the question, which is a fairly basic CS 101 question that people should try to figure out themselves, but I noticed that the nice answer by @blhsing appeared fairly slow. I found that mean() is incredibly slow!
from itertools import groupby, count
from statistics import mean
from timeit import timeit
def generate_1step_seq1(xs):
    result = []
    n = 0
    while n < len(xs):
        # sequences with step of 1 only
        if not result or xs[n] == result[-1] + 1:
            result += [xs[n]]
        else:
            # int result, rounding down
            yield sum(result) // len(result)
            result = [xs[n]]
        n += 1
    if result:
        yield sum(result) // len(result)
def generate_1step_seq2(xs):
    c = count()
    return [int(sum(xs) // len(xs)) for xs in [list(g) for _, g in groupby(xs, key=lambda i: i - next(c))]]
def generate_1step_seq3(xs):
    c = count()
    return [int(mean(g)) for _, g in groupby(xs, key=lambda i: i - next(c))]
values = [0, 1, 5, 6, 7, 10, 15]
print(list(generate_1step_seq1(values)))
print(generate_1step_seq2(values))
print(generate_1step_seq3(values))
print(timeit(lambda: list(generate_1step_seq1(values)), number=10000))
print(timeit(lambda: list(generate_1step_seq2(values)), number=10000))
print(timeit(lambda: list(generate_1step_seq3(values)), number=10000))
Initially I figured that was probably due to the tiny list size, but even for large lists, mean() is horribly slow. Does anyone happen to know why? It appears to be due to the very safe nature of the statistics module's _sum, which tries to avoid float rounding errors?
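One way to probe that suspicion, assuming Python 3.8 or newer, is to time statistics.fmean next to mean. According to the statistics documentation, fmean converts the data to floats and runs faster than mean, at the cost of always returning a float. The sketch below prints timings without claiming specific numbers, since those depend on the machine.

```python
from statistics import fmean, mean
from timeit import timeit

values = list(range(1000))

# All three compute the same arithmetic mean for this input.
print(mean(values), fmean(values), sum(values) / len(values))

# Timings vary by machine and Python version; compare them yourself.
print("mean :", timeit(lambda: mean(values), number=100))
print("fmean:", timeit(lambda: fmean(values), number=100))
print("plain:", timeit(lambda: sum(values) / len(values), number=100))
```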
I had the following code:
return [p.to_dict() for p in points]
I changed it to only return every nth row:
n = 100
count = 0
output = []
for p in points:
    if (count % n == 0):
        output.append(p.to_dict())
    count += 1
return output
Is there a more pythonic way to write this, to achieve the same result?
Use enumerate and a modulo on the index in a filtered list comprehension to keep the items whose index is divisible by n:
return [p.to_dict() for i,p in enumerate(points) if i % n == 0]
List comprehension filtering is good, but in this case eduffy's answer, which suggests slicing with a step, is better since the indices are computed directly. Use the filter part only when you cannot predict the indices.
Improving this answer even more: it's even better to use itertools.islice so that no temporary list is generated:
import itertools
return [p.to_dict() for p in itertools.islice(points,None,None,n)]
itertools.islice(points,None,None,n) is equivalent to points[::n] but performing lazy evaluation.
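A small sketch of that equivalence. It also shows that islice works on a generator, which cannot be sliced with [::n] at all. The generator numbers is a made-up stand-in for points.

```python
from itertools import islice


def numbers():
    """A stand-in for any iterable that does not support slicing."""
    for i in range(10):
        yield i


every_third = list(islice(numbers(), None, None, 3))
print(every_third)  # [0, 3, 6, 9]

# On a real list, the result matches step slicing.
as_list = list(range(10))
print(list(islice(as_list, None, None, 3)) == as_list[::3])  # True
```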
The list slicing syntax takes an optional third argument to define the "step". This takes every 3rd element in a list (in Python 3, range is lazy, so wrap it in list() to see the values):
>>> list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(range(10))[::3]
[0, 3, 6, 9]
you can use enumerate with list comprehension.
[p.to_dict() for i, p in enumerate(points) if i %100 == 0]
Suppose I have the following list of lists:
a = [
[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]
]
I want to have the average of each n-th element in the arrays. However, when wanting to do this in a simple way, Python generated out-of-bounds errors because of the different lengths. I solved this by giving each array the length of the longest array, and filling the missing values with None.
Unfortunately, doing this made it impossible to compute an average, so I converted the arrays into masked arrays. The code shown below works, but it seems rather cumbersome.
import numpy as np
import numpy.ma as ma
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6] ]
# Determine the length of the longest list
lenlist = []
for i in a:
    lenlist.append(len(i))
max = np.amax(lenlist)
# Fill each list up with None's until required length is reached
for i in a:
    if len(i) <= max:
        for j in range(max - len(i)):
            i.append(None)
# Fill temp_array up with the n-th element
# and add it to temp_array
temp_list = []
masked_arrays = []
for j in range(max):
    for i in range(len(a)):
        temp_list.append(a[i][j])
    masked_arrays.append(ma.masked_values(temp_list, None))
    del temp_list[:]
# Compute the average of each array
avg_array = []
for i in masked_arrays:
    avg_array.append(np.ma.average(i))
print(avg_array)
Is there a way to do this more quickly? The final list of lists will contain 600000 'rows' and up to 100 'columns', so efficiency is quite important :-).
itertools.izip_longest would do all the padding with None's for you, so your code can be reduced to:
import numpy as np
import numpy.ma as ma
from itertools import izip_longest
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6] ]
averages = [np.ma.average(ma.masked_values(temp_list, None)) for temp_list in izip_longest(*a)]
print(averages)
[2.0, 3.0, 4.0, 6.0]
I have no idea what the fastest way is with regard to the numpy logic, but this is definitely going to be a lot more efficient than your own code.
If you wanted a faster pure python solution:
from itertools import izip_longest, imap
a = [[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]]
def avg(x):
    x = filter(None, x)
    return sum(x, 0.0) / len(x)
filt = imap(avg, izip_longest(*a))
print(list(filt))
[2.0, 3.0, 4.0, 6.0]
If you have 0's in the arrays, that won't work, as 0 is treated as falsy; in that case you will have to use a list comprehension to filter, but it will still be faster:
def avg(x):
    x = [i for i in x if i is not None]
    return sum(x, 0.0) / len(x)
filt = imap(avg, izip_longest(*a))
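The snippets above are written for Python 2 (izip_longest, imap). A Python 3 sketch of the same idea, assuming only the standard library, could look like this. The names avg and averages follow the answer; the rest is illustrative.

```python
from itertools import zip_longest

a = [[1, 2, 3],
     [2, 3, 4],
     [3, 4, 5, 6]]


def avg(column):
    """Average the values in a column, skipping the None padding."""
    present = [value for value in column if value is not None]
    return sum(present) / len(present)


# zip_longest pads the shorter rows with None, giving one tuple per column.
averages = [avg(column) for column in zip_longest(*a)]
print(averages)  # [2.0, 3.0, 4.0, 6.0]
```

Filtering with "is not None" rather than truthiness keeps zeros in the data.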
Here's an almost* fully vectorized solution based on np.bincount and np.cumsum -
# Store lengths of each list and their cumulative and entire summations
lens = np.array([len(i) for i in a]) # Only loop to get lengths
C = lens.cumsum()
N = lens.sum()
# Create ID array such that the first element of each list is 0,
# the second element as 1 and so on. This is needed in such a format
# for use with bincount later on.
shifts_arr = np.ones(N,dtype=int)
shifts_arr[C[:-1]] = -lens[:-1]+1
id_arr = shifts_arr.cumsum()-1
# Use bincount to get the summations and thus the
# averages across all lists based on their positions.
avg_out = np.bincount(id_arr,np.concatenate(a))/np.bincount(id_arr)
* Almost, because we are getting the lengths of the lists with a loop, but with minimal computation involved there, it should not affect the total runtime much.
Sample run -
In [109]: a = [ [1, 2, 3],
...: [2, 3, 4],
...: [3, 4, 5, 6] ]
In [110]: lens = np.array([len(i) for i in a])
...: C = lens.cumsum()
...: N = lens.sum()
...:
...: shifts_arr = np.ones(N,dtype=int)
...: shifts_arr[C[:-1]] = -lens[:-1]+1
...: id_arr = shifts_arr.cumsum()-1
...:
...: avg_out = np.bincount(id_arr,np.concatenate(a))/np.bincount(id_arr)
...:
In [111]: avg_out
Out[111]: array([ 2., 3., 4., 6.])
You can already simplify the code that computes the max length; this single line does the job:
len(max(a,key=len))
Combining this with the other answers, you get the result like so:
[np.mean([x[i] for x in a if len(x) > i]) for i in range(len(max(a,key=len)))]
You can also avoid the masked array and use np.nan instead:
def replaceNoneTypes(x):
    return tuple(np.nan if isinstance(y, type(None)) else y for y in x)
a = [np.nanmean(replaceNoneTypes(temp_list)) for temp_list in zip_longest(*df[column], fillvalue=np.nan)]
On your test array:
[np.mean([x[i] for x in a if len(x) > i]) for i in range(4)]
returns
[2.0, 3.0, 4.0, 6.0]
If you are using Python version >= 3.4, then import the statistics module
from statistics import mean
If you are using a lower version, create a function to calculate the mean:
def mean(array):
    sum = 0
    if (not(type(array) == list)):
        print("there is some bad format in your input")
    else:
        for elements in array:
            try:
                sum = sum + float(elements)
            except:
                print("non numerical entry found")
        average = (sum + 0.0) / len(array)
        return average
Create a list of lists, for example
myList = [[1,2,3],[4,5,6,7,8],[9,10],[11,12,13,14],[15,16,17,18,19,20,21,22],[23]]
Iterate through myList:
for i, lists in enumerate(myList):
    print(i, mean(lists))
This prints the index n and the average of the nth list.
To find the average of only the nth list, create a function:
def mean_nth(array, n):
    if((type(n) == int) and n >= 1 and type(array) == list):
        return mean(array[n-1])
    else:
        print("there is some bad format of your input")
Note that indexing starts from zero, so for instance if you are looking for the mean of the 5th list, it will be at index 4. This explains the n-1 in the code.
And then call the function, for example
avg_5thList = mean_nth(myList, 5)
print(avg_5thList)
Running the above code on myList yields following result:
0 2.0
1 6.0
2 9.5
3 12.5
4 18.5
5 23.0
18.5
where the first six lines are generated from the iterative loop, and display the index of nth list and list average. Last line (18.5) displays the average of 5th list as a result of mean_nth(myList, 5) call.
Further, for a list like yours,
a = [
[1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6]
]
Let's say you want the average of the 1st elements, i.e. (1+2+3)/3 = 2, or of the 2nd elements, i.e. (2+3+4)/3 = 3, or of the 4th elements, i.e. 6/1 = 6. You will need to find the length of each list so that you can identify whether the nth element exists in a list or not.
You can either
1) first sort the main list according to size of constituent lists iteratively, and then go through the sorted list to identify if the constituent lists are of sufficient length
2) or you can iteratively look into the original list for length of constituent lists.
(I can definitely get back with working out a faster recursive algorithm if needed)
Computationally second one is more efficient, so assuming that your 5th element means 4th in the index(0, 1, 2, 3, 4), or nth element means (n-1)th element, lets go with that and create a function
def find_nth_average(array, n):
    if(not(type(n) == int and (int(n) >= 1))):
        return "Bad input format for n"
    else:
        if (not(type(array) == list)):
            return "Bad input format for main list"
        else:
            total = 0
            count = 0
            for i, elements in enumerate(array):
                if(not(type(elements) == list)):
                    return("non list constituent found at location " + str(i+1))
                else:
                    listLen = len(elements)
                    if(int(listLen) >= n):
                        try:
                            total = total + elements[n-1]
                            count = count + 1
                        except:
                            return ("non numerical entity found in constituent list " + str(i+1))
            if(int(count) == 0):
                return "No such n-element exists"
            else:
                average = float(total)/float(count)
                return average
Now lets call this function on your list a
print(find_nth_average(a, 0))
print(find_nth_average(a, 1))
print(find_nth_average(a, 2))
print(find_nth_average(a, 3))
print(find_nth_average(a, 4))
print(find_nth_average(a, 5))
print(find_nth_average(a, 'q'))
print(find_nth_average(a, 2.3))
print(find_nth_average(5, 5))
The corresponding results are:
Bad input format for n
2.0
3.0
4.0
6.0
No such n-element exists
Bad input format for n
Bad input format for n
Bad input format for main list
If you have an erratic list, like
a = [[1, 2, 3], 2, [3, 4, 5, 6]]
that contains a non-list element, you get the output:
non list constituent found at location 2
If your constituent list is erratic, like:
a = [[1, 'p', 3], [2, 3, 4], [3, 4, 5, 6]]
that contains a non-numerical entity in a list, and you compute the average of the 2nd elements with print(find_nth_average(a, 2)),
you get an output:
non numerical entity found in constituent list 1
So I want to create a list which is a sublist of some existing list.
For example,
L = [1, 2, 3, 4, 5, 6, 7], I want to create a sublist li such that li contains all the elements in L at odd positions.
While I can do it by
L = [1, 2, 3, 4, 5, 6, 7]
li = []
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1
But I want to know if there is another way to do the same efficiently and in fewer number of steps.
Solution
Yes, you can:
l = L[1::2]
And this is all. The result will contain the elements placed on the following positions (0-based, so first element is at position 0, second at 1 etc.):
1, 3, 5
so the result (actual numbers) will be:
2, 4, 6
Explanation
The [1::2] at the end is just a notation for list slicing. Usually it is in the following form:
some_list[start:stop:step]
If we omitted start, the default (0) would be used, so the first element (at position 0, because the indexes are 0-based) would be selected. In this case start is 1, so the second element is selected first.
Because the second argument (stop) is omitted, the default is used (the end of the list). So the list is iterated from the second element to the end.
We also provided the third argument (step), which is 2. This means that one element is selected, the next is skipped, and so on.
So, to sum up, in this case [1::2] means:
1. take the second element (which, by the way, is an odd element, if you judge from the index),
2. skip one element (because we have step=2, so we are skipping one, as opposed to step=1, which is the default),
3. take the next element,
4. repeat steps 2.-3. until the end of the list is reached.
EDIT: @PreetKukreti gave a link for another explanation on Python's list slicing notation. See here: Explain Python's slice notation
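A few concrete slices may make the start:stop:step defaults easier to remember. This is only an illustrative sketch using the list from the question; the variable names are assumptions.

```python
L = [1, 2, 3, 4, 5, 6, 7]

odd_index = L[1::2]     # start=1, stop=end of list, step=2
even_index = L[::2]     # start=0 (default), stop=end of list, step=2
bounded = L[1:5:2]      # stop=5 excludes index 5 and beyond

print(odd_index)   # [2, 4, 6]
print(even_index)  # [1, 3, 5, 7]
print(bounded)     # [2, 4]

# The bracket notation is shorthand for a slice object.
print(L[slice(1, None, 2)] == L[1::2])  # True
```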
Extras - replacing counter with enumerate()
In your code, you explicitly create and increase the counter. In Python this is not necessary, as you can enumerate through some iterable using enumerate():
for count, i in enumerate(L):
    if count % 2 == 1:
        li.append(i)
The above serves exactly the same purpose as the code you were using:
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1
More on emulating for loops with counter in Python: Accessing the index in Python 'for' loops
For the odd positions, you probably want:
>>> list_ = list(range(10))
>>> print(list_[1::2])
[1, 3, 5, 7, 9]
I like List comprehensions because of their Math (Set) syntax. So how about this:
L = [1, 2, 3, 4, 5, 6, 7]
odd_numbers = [y for x,y in enumerate(L) if x%2 != 0]
even_numbers = [y for x,y in enumerate(L) if x%2 == 0]
Basically, if you enumerate over a list, you'll get the index x and the value y. What I'm doing here is putting the value y into the output list (even or odd) and using the index x to find out if that point is odd (x%2 != 0).
You can also use itertools.islice if you don't need to create a list but just want to iterate over the odd/even elements
import itertools
L = [1, 2, 3, 4, 5, 6, 7]
li = itertools.islice(L, 1, len(L), 2)
You can make use of bitwise AND operator &:
>>> x = [1, 2, 3, 4, 5, 6, 7]
>>> y = [i for i in x if i&1]
>>> y
[1, 3, 5, 7]
This will give you the odd elements in the list. Now to extract the elements at odd indices you just need to change the above a bit:
>>> x = [10, 20, 30, 40, 50, 60, 70]
>>> y = [j for i, j in enumerate(x) if i&1]
>>> y
[20, 40, 60]
Explanation
The bitwise AND operator is used with 1, and the reason it works is that an odd number, when written in binary, must have 1 as its last (least significant) digit. Let's check:
23 = 1 * (2**4) + 0 * (2**3) + 1 * (2**2) + 1 * (2**1) + 1 * (2**0) = 10111
14 = 1 * (2**3) + 1 * (2**2) + 1 * (2**1) + 0 * (2**0) = 1110
AND operation with 1 will only return 1 (1 in binary will also have last digit 1), iff the value is odd.
Check the Python Bitwise Operator page for more.
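The binary expansions above can be checked with the built-in bin function. The snippet is a small illustrative sketch; the name odd_index_items is an assumption.

```python
for n in (23, 14):
    # n & 1 keeps only the least significant bit: 1 for odd, 0 for even.
    print(n, bin(n), n & 1)
# 23 0b10111 1
# 14 0b1110 0

x = [10, 20, 30, 40, 50, 60, 70]
odd_index_items = [value for index, value in enumerate(x) if index & 1]
print(odd_index_items)  # [20, 40, 60]
```

For integers, i & 1 gives the same truth value as i % 2, so the choice between them is mostly a matter of style.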
P.S: You can tactically use this method if you want to select odd and even columns in a dataframe. Let's say x and y coordinates of facial key-points are given as columns x1, y1, x2, etc... To normalize the x and y coordinates with width and height values of each image you can simply perform:
for i in range(df.shape[1]):
    if i&1:
        df.iloc[:, i] /= heights
    else:
        df.iloc[:, i] /= widths
This is not exactly related to the question but for data scientists and computer vision engineers this method could be useful.
Python has a range method, which allows for stuff like:
>>> range(1, 6)
[1, 2, 3, 4, 5]
What I’m looking for is kind of the opposite: take a list of numbers, and return the start and end.
>>> magic([1, 2, 3, 4, 5])
[1, 5] # note: 5, not 6; this differs from `range()`
This is easy enough to do for the above example, but is it possible to allow for gaps or multiple ranges as well, returning the range in a PCRE-like string format? Something like this:
>>> magic([1, 2, 4, 5])
['1-2', '4-5']
>>> magic([1, 2, 3, 4, 5])
['1-5']
Edit: I’m looking for a Python solution, but I welcome working examples in other languages as well. It’s more about figuring out an elegant, efficient algorithm. Bonus question: is there any programming language that has a built-in method for this?
A nice trick to simplify the code is to look at the difference of each element of the sorted list and its index:
a = [4, 2, 1, 5]
a.sort()
print([x - i for i, x in enumerate(a)])
prints
[1, 1, 2, 2]
Each run of the same number corresponds to a run of consecutive numbers in a. We can now use itertools.groupby() to extract these runs. Here's the complete code:
from itertools import groupby
def sub(x):
    return x[1] - x[0]
a = [5, 3, 7, 4, 1, 2, 9, 10]
ranges = []
for k, iterable in groupby(enumerate(sorted(a)), sub):
    rng = list(iterable)
    if len(rng) == 1:
        s = str(rng[0][1])
    else:
        s = "%s-%s" % (rng[0][1], rng[-1][1])
    ranges.append(s)
print(ranges)
printing
['1-5', '7', '9-10']
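Wrapped into the magic function the question asks for, the same technique might look like the sketch below. It targets Python 3 and assumes the input holds distinct integers, since duplicates would break the value-minus-index trick. The helper is illustrative, not the answerer's code.

```python
from itertools import groupby


def magic(numbers):
    """Collapse distinct integers into 'start-end' range strings."""
    ranges = []
    indexed = enumerate(sorted(numbers))
    # value - index is constant within a run of consecutive integers
    for _, group in groupby(indexed, key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        if len(run) == 1:
            ranges.append(str(run[0]))
        else:
            ranges.append("%s-%s" % (run[0], run[-1]))
    return ranges


print(magic([1, 2, 4, 5]))               # ['1-2', '4-5']
print(magic([1, 2, 3, 4, 5]))            # ['1-5']
print(magic([5, 3, 7, 4, 1, 2, 9, 10]))  # ['1-5', '7', '9-10']
```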
Sort numbers, find consecutive ranges (remember RLE compression?).
Something like this:
input = [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50]
output = []
first = last = None # first and last number of current consecutive range
for item in sorted(input):
    if first is None:
        first = last = item # bootstrap
    elif item == last + 1: # consecutive
        last = item # extend the range
    else: # not consecutive
        output.append((first, last)) # pack up the range
        first = last = item
# the last range ended by iteration end
output.append((first, last))
print(output)
Result: [(1, 3), (5, 9), (20, 23), (50, 50)]. You figure out the rest :)
I thought you might like my generalised clojure solution.
(def r [1 2 3 9 10])
(defn successive? [a b]
(= a (dec b)))
(defn break-on [pred s]
(reduce (fn [memo n]
(if (empty? memo)
[[n]]
(if (pred (last (last memo)) n)
(conj (vec (butlast memo))
(conj (last memo) n))
(conj memo [n]))))
[]
s))
(break-on successive? r)
Since 9000 beat me to it, I'll just post the second part of the code, that prints pcre-like ranges from the previously computed output plus the added type check:
for i in input:  # check the raw numbers; output holds (first, last) tuples
    if not isinstance(i, int) or i < 0:
        raise Exception("Only positive ints accepted in pcre_ranges")
result = [ str(x[0]) if x[0] == x[1] else '%s-%s' % (x[0], x[1]) for x in output ]
print(result)
Output: ['1-3', '5-9', '20-23', '50']
Let's try generators!
# ignore duplicate values
l = sorted( set( [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50] ) )
# get the value differences
d = (i2-i1 for i1,i2 in zip(l,l[1:]))
# get the gap indices
gaps = (i for i,e in enumerate(d) if e != 1)
# get the range boundaries
def get_ranges(gaps, l):
    last_idx = -1
    for i in gaps:
        yield (last_idx+1, i)
        last_idx = i
    yield (last_idx+1,len(l)-1)
# make a list of strings in the requested format (thanks Frg!)
ranges = [ "%s-%s" % (l[i1],l[i2]) if i1!=i2 else str(l[i1]) \
           for i1,i2 in get_ranges(gaps, l) ]
This has become rather scary, I think :)
This is kind of elegant but also kind of disgusting, depending on your point of view. :)
import itertools
def rangestr(iterable):
    end = start = next(iterable)  # next() works in Python 2.6+ and 3
    for end in iterable:
        pass
    return "%s" % start if start == end else "%s-%s" % (start, end)
class Rememberer(object):
    last = None
class RangeFinder(object):
    def __init__(self):
        self.o = Rememberer()
    def __call__(self, x):
        if self.o.last is not None and x != self.o.last + 1:
            self.o = Rememberer()
        self.o.last = x
        return self.o
def magic(iterable):
    return [rangestr(vals) for k, vals in
            itertools.groupby(sorted(iterable), RangeFinder())]
>>> magic([5,7,9,8,6, 21,20, 3,2,1, 22,23, 50])
['1-3', '5-9', '20-23', '50']
Explanation: it uses itertools.groupby to group the sorted elements together by a key, where the key is a Rememberer object. The RangeFinder class keeps a Rememberer object as long as a consecutive bunch of items belongs to the same range block. Once you've passed out of a given block, it replaces the Rememberer so that the key won't compare equal and groupby will make a new group. As groupby walks over the sorted list, it passes the elements one-by-one into rangestr, which constructs the string by remembering the first and the last element and ignoring everything in between.
Is there any practical reason to use this instead of 9000's answer? Probably not; it's basically the same algorithm.