Splitting a list of lists in specific intervals in Python - python

I have a really big list of lists of integers that is also sorted from low to high. Also, every nested list is an arithmetic sequence increasing by 1. To make it clearer, it could look something like:
f = [[0,1], [3], [7,8,9,10,11,12], [15,16], [18], [22,23,24], [39,40,41], [49,50,51]]
My goal is to split the nested big list into smaller nested lists. My first list must have numbers between 0 and 10, my second list must have numbers between 20 and 30, my third between 40 and 50, etc. So I was wondering if there is a way in Python to get the following lists:
f1 = [[0,1],[3],[7,8,9,10]]
f2 = [[22,23,24]]
f3 = [[40,41],[49,50]]

Here is one way to do so:
data = []
for i in range(0, f[-1][-1], 20):
    new_seqs = []
    for seq in f:
        if i - len(seq) + 1 <= seq[0] <= i + 10:
            new_nums = []
            for num in seq:
                if i <= num <= i + 10:
                    new_nums.append(num)
            new_seqs.append(new_nums)
    data.append(new_seqs)
print(data)
The same using list comprehension:
data = [[[num for num in seq if i <= num <= i + 10] for seq in f if i - len(seq) + 1 <= seq[0] <= i + 10] for i in range(0, f[-1][-1], 20)]
print(data)
Output:
[[[0, 1], [3], [7, 8, 9, 10]], [[22, 23, 24]], [[40, 41], [49, 50]]]
We run a for loop from 0 up to the largest element in the list (f[-1][-1]), increasing by 20 each time.
For each sub-list we check whether at least one element lies between i and i + 10. Because it is an arithmetic sequence with common difference 1, its last term is seq[0] + len(seq) - 1, so we only have to look at the first term: the sub-list overlaps the interval exactly when i - len(seq) + 1 <= seq[0] <= i + 10.
If there is at least one term in the interval, we compare each number with i and i + 10.
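Since each sub-list is a run of consecutive integers, the per-number filtering can also be replaced by clipping each run to the window with max/min. This is a sketch assuming that property (as the question states); split_runs and its width/step parameters are hypothetical names for the window size (10) and spacing (20). It loops to top + 1 so that a window starting exactly at the largest value is not skipped, which range(0, f[-1][-1], 20) would do for a list ending in, say, [60].

```python
def split_runs(runs, width=10, step=20):
    """Clip consecutive-integer runs to the windows [start, start + width].

    Assumes every inner list is a run of consecutive integers in ascending
    order and that the outer list is sorted too (as in the question).
    """
    if not runs:
        return []
    top = runs[-1][-1]  # largest value overall, since everything is sorted
    result = []
    # top + 1 so a window starting exactly at the largest value is included
    for start in range(0, top + 1, step):
        end = start + width
        window = []
        for run in runs:
            # Intersection of [run[0], run[-1]] with [start, end]
            lo = max(run[0], start)
            hi = min(run[-1], end)
            if lo <= hi:
                window.append(list(range(lo, hi + 1)))
        result.append(window)
    return result


f = [[0, 1], [3], [7, 8, 9, 10, 11, 12], [15, 16], [18], [22, 23, 24], [39, 40, 41], [49, 50, 51]]
print(split_runs(f))
# [[[0, 1], [3], [7, 8, 9, 10]], [[22, 23, 24]], [[40, 41], [49, 50]]]
```

Each run is handled in constant time per window instead of once per number, which may matter for the "really big" lists in the question.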

I guess something similar to this is what you are looking for?
from collections import defaultdict
mem = defaultdict(list)
# expectation: each sub-list lies strictly between x * 10 and (x + 1) * 10, where x is an integer
def find_list_number(num, i=10):
    if num < i:
        return int(i / 10)
    else:
        return find_list_number(num, i + 10)
for sub_list in f:  # f is the list of lists from the question
    my_max = max(sub_list)
    key = find_list_number(my_max)
    mem[key].append(sub_list)
for k, v in mem.items():
    print((k, v))
Sample output for the above:
(1, [[0, 1], [3]])
(2, [[7, 8, 9, 10, 11, 12], [15, 16], [18]])
(3, [[22, 23, 24]])
(5, [[39, 40, 41]])
(6, [[49, 50, 51]])
Note: [7,8,9,10,11,12] lands in a different group because sub-lists are keyed by their maximum; that is intended, not a bug. You can modify the conditions and add more as it suits you. This is only a sample to guide you.
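For non-negative integers, find_list_number(num) returns the smallest multiple of 10 above num divided by 10, which is the same as num // 10 + 1. A minimal sketch under that assumption (group_by_decade is a hypothetical name) that drops the recursion; it also avoids Python's default recursion limit, which the recursive version would hit for large values since it makes about num / 10 calls:

```python
from collections import defaultdict


def group_by_decade(runs):
    """Group sub-lists by max(sub_list) // 10 + 1, like find_list_number.

    Assumes non-negative integers.
    """
    groups = defaultdict(list)
    for sub_list in runs:
        groups[max(sub_list) // 10 + 1].append(sub_list)
    return dict(groups)


f = [[0, 1], [3], [7, 8, 9, 10, 11, 12], [15, 16], [18], [22, 23, 24], [39, 40, 41], [49, 50, 51]]
for k, v in group_by_decade(f).items():
    print((k, v))  # same five lines as the sample output
```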

Related

Pythonic way to separate a list into several sublists where each sublist can only take values within a given range

I have a list of integers that span several useful ranges. In
Split a list based on a condition?
I found a way to do that, and my code works, but I wondered if there is a better way to do it while keeping it readable.
EDIT: I would encourage anyone with a similar need to explore all of the answers given; they all have different methods and merits.
Thank you to all who helped on this.
Working inelegant code
my_list = [1, 2, 11, 29, 37]
r1_lim = 10
r2_lim = 20
r3_lim = 30
r4_lim = 40
r1_goodvals = list(range(1, r1_lim+1))
print("r1_goodvals : ", r1_goodvals)
r2_goodvals = list(range(r1_lim+1, r2_lim+1))
print("r2_goodvals : ", r2_goodvals)
r3_goodvals = list(range(r2_lim+1, r3_lim+1))
print("r3_goodvals : ", r3_goodvals)
r4_goodvals = list(range(r3_lim+1, r4_lim+1))
print("r4_goodvals : ", r4_goodvals)
r1, r2, r3, r4 = [], [], [], []
for x in my_list:
    if x in r1_goodvals:
        r1.append(x)
    elif x in r2_goodvals:
        r2.append(x)
    elif x in r3_goodvals:
        r3.append(x)
    elif x in r4_goodvals:
        r4.append(x)
print("r1 : ", r1)
print("r2 : ", r2)
print("r3 : ", r3)
print("r4 : ", r4)
Output
r1_goodvals : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
r2_goodvals : [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
r3_goodvals : [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
r4_goodvals : [31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
r1 : [1, 2]
r2 : [11]
r3 : [29]
r4 : [37]
You can achieve linear time complexity with minimal code by storing the limits in a list in descending order, so that the lowest remaining limit is always at the end, and comparing that limit with each item of the input list in turn. While the current item is above that limit, append a new sub-list to the result list and pop the limit list to move on to the next lowest limit; using while means a range with no items still gets its own (empty) sub-list. Then append the current item to the last sub-list, which will always be the one for the current limit.
my_list = [1, 2, 11, 29, 37]
limits = [40, 30, 20, 10]
r = [[]]
for i in my_list:
    # move past every limit the current item exceeds
    while limits and limits[-1] < i:
        r.append([])
        limits.pop()
    r[-1].append(i)
r becomes:
[[1, 2], [11], [29], [37]]
As @Drecker mentions in the comments, however, this solution assumes that my_list is pre-sorted. If that assumption isn't valid, sorting the list first would cost O(n log n).
Also, as @Drecker mentions in the comments, some may find an iterator more Pythonic than a list with pop. In that case the limits can be listed in the more intuitive ascending order, but an additional variable is needed to keep track of the current lowest limit, since calling next on an iterator consumes the item:
my_list = [1, 2, 11, 29, 37]
limits = iter((10, 20, 30, 40))
limit = next(limits)
r = [[]]
for i in my_list:
    while limit < i:
        r.append([])
        # once the limits run out, use infinity so the while loop stops
        limit = next(limits, float("inf"))
    r[-1].append(i)
If I needed this to be more readable while keeping the same functionality, I would probably do it this way, although if you start adding more r_n limits this approach would quickly become cluttered.
my_list = [1, 2, 11, 29, 37]
r1, r2, r3, r4 = [], [], [], []
for x in my_list:
    if x in range(1, 11):
        r1.append(x)
    elif x in range(11, 21):
        r2.append(x)
    elif x in range(21, 31):
        r3.append(x)
    elif x in range(31, 41):
        r4.append(x)
print("r1 : ", r1)
print("r2 : ", r2)
print("r3 : ", r3)
print("r4 : ", r4)
Using list comprehensions in this case would make the runtime O(n * number_of_r_n), since you would need to loop over the entire my_list once for each range, whereas this loop makes a single pass and runs in O(n), with n being the length of the list (for a fixed number of ranges).
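To keep the if/elif chain from becoming cluttered as more r_n limits are added, the same first-match logic can be driven by a list of range objects. A sketch; split_by_ranges is a hypothetical helper name, and like the chain it drops values outside every range:

```python
def split_by_ranges(values, ranges):
    """Put each value into the bucket of the first range that contains it."""
    buckets = [[] for _ in ranges]
    for x in values:
        for bucket, r in zip(buckets, ranges):
            if x in r:  # membership in a range object is O(1) for ints
                bucket.append(x)
                break
    return buckets


ranges = [range(1, 11), range(11, 21), range(21, 31), range(31, 41)]
r1, r2, r3, r4 = split_by_ranges([1, 2, 11, 29, 37], ranges)
print("r1 : ", r1)
print("r2 : ", r2)
print("r3 : ", r3)
print("r4 : ", r4)
```

Adding a range is then a one-line change to the ranges list rather than a new elif branch.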
Another solution would involve using binary search:
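from bisect import bisect
my_list = [1, 2, 11, 29, 37, 100]
limits = [10, 20, 30, 40]
groups = [[] for _ in range(len(limits) + 1)]
for x in my_list:
    # bisect (bisect_right) puts a value equal to a limit, e.g. 10, in the next group;
    # use bisect_left if each limit should be an inclusive upper bound (1-10, 11-20, ...)
    groups[bisect(limits, x)].append(x)
print(groups)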
[[1, 2], [11], [29], [37], [100]]
This is quite a fast solution even for a large number of limits, O(number_of_elements * log(number_of_limits)), and in a certain sense it is as fast as you can get for arbitrary limits.
However, if you have additional information (for example, you want to group the numbers by their rounding to tens, and the list is pre-sorted), you could use itertools.groupby:
from itertools import groupby
my_list = [1, 2, 11, 29, 37, 100]
groups = {key: list(lst) for key, lst in groupby(my_list, lambda x: round(x, -1))}
# We could use 'x // 10' instead of 'round(x, -1)' to get 'flooring'
# hence, results would be more similar to your original usage
print(groups)
{0: [1, 2], 10: [11], 30: [29], 40: [37], 100: [100]}
You can drop the pre-sorting requirement by replacing the comprehension with a full for loop and using collections.defaultdict(list).
This solution is just O(number_of_elements) in terms of time complexity.
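A minimal sketch of that defaultdict variant, using the x // 10 flooring key mentioned in the code comment above (group_by_tens is a hypothetical name). Because each item is appended straight to its key's list, the input no longer needs to be sorted:

```python
from collections import defaultdict


def group_by_tens(values):
    """Group integers by x // 10; works on unsorted input."""
    groups = defaultdict(list)
    for x in values:
        groups[x // 10].append(x)  # items keep their input order within a group
    return dict(groups)


print(group_by_tens([37, 1, 11, 2, 29, 100]))
# {3: [37], 0: [1, 2], 1: [11], 2: [29], 10: [100]}
```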

how to print elements of a list whose sum is equal to a given number in python?

Consider a list of numbers from 0 to 50. I want to print all the combinations of elements from that list whose sum is equal to a given number, say 41.
One combination is [4,5,7,9,16]. I want to print other combinations like this.
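You can use the combinations() function from itertools:
import itertools
nums = range(51)
# Warning: this tries every non-empty subset of nums (about 2**51), so it is only practical for much smaller lists
for n in range(1, len(nums) + 1):
    for p in itertools.combinations(nums, n):
        if sum(p) == 41:
            print(*p)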
The following code is best used in a subroutine, but it can also be standalone code. To change the average, just change the number in the while loop condition. To change how many numbers are generated, change the number of zeros in the lists. The random part can also be changed to alter the range of numbers.
import random
def stats():
    x = [0, 0, 0, 0, 0, 0]
    while mean(x) != 14.5:
        x = [0, 0, 0, 0, 0, 0]
        for i in range(len(x)):
            a = random.randint(9, 17)
            x[i] = a
    return x
The mean in this code is a separate subroutine that looks like this:
def mean(x):
    y = sum(x) / len(x)  # average over however many numbers x holds
    return y
At the start of the code, you need to import the random library for the code to work properly. The only limitation is that this code outputs just one combination, such as:
stats()
[11, 12, 16, 17, 16, 15]
You could write a recursive generator to efficiently produce only the combinations of positive integers that sum up to the target number:
def sumSets(numbers,target):
    if not target : yield []
    if not numbers or target <= 0: return
    yield from sumSets(numbers[1:],target)
    yield from (numbers[:1]+ss for ss in sumSets(numbers[1:],target-numbers[0]))
nums = list(range(1,50,3))
for ss in sumSets(nums,41): print(ss)
[19, 22]
[16, 25]
[13, 28]
[10, 31]
[7, 34]
[4, 37]
[1, 40]
[1, 4, 7, 13, 16]
[1, 4, 7, 10, 19]
Note that, if you're looking for all combinations of numbers from 1 to 50 that sum up to 41, you're going to get a lot of them:
nums = list(range(1,51))
print(sum(1 for _ in sumSets(nums,41))) # 1260
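If you only need how many such combinations exist, a standard 0/1 subset-sum dynamic programme counts them without generating each one. This is a swapped-in technique, not part of the answer above; count_subsets_with_sum is a hypothetical name, and the sketch assumes positive integers. It runs in O(len(numbers) * target) time:

```python
def count_subsets_with_sum(numbers, target):
    """Count subsets of distinct positive integers that add up to target.

    ways[s] holds the number of subsets seen so far whose sum is s.
    """
    ways = [0] * (target + 1)
    ways[0] = 1  # the empty subset
    for x in numbers:
        # iterate sums downwards so each number is used at most once
        for s in range(target, x - 1, -1):
            ways[s] += ways[s - x]
    return ways[target]


print(count_subsets_with_sum(range(1, 51), 41))     # 1260, matching the generator count
print(count_subsets_with_sum(range(1, 50, 3), 41))  # 9
```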

Python lists (getting triplets of identical numbers from the list)

Assume I have a list [12,12,12,12,13,13,13,13,14,14,14,14,14,14,14,15,15,15, etc]
I would like my result to be the following:
[12,12,12,13,13,13,14,14,14,15,15,15]
The number of identical numbers in the first list can vary, but I want to get triplets for each range of the identical numbers. I assume I could iterate through the list starting from the first number (12) and get the first 3 identical numbers (12,12,12), and then compare the numbers and once the number 12 changes to 13, get the next 3 numbers (13,13,13), and so on. But I cannot think of a good approach to do it correctly. Thank you for any suggestions.
I would use itertools.groupby() to isolate the strings of identical numbers, then use a list comprehension to create the triplets:
import itertools
some_list = [12,12,12,12,13,13,13,13,14,14,14,14,14,14,14,15,15,15,]
updated_list = [i for k,_ in itertools.groupby(some_list) for i in [k]*3]
assert updated_list == [12,12,12,13,13,13,14,14,14,15,15,15]
Alternatively, with an explicit loop that keeps at most three of each number:
updated_list = []
curr_number = some_list[0]
curr_count = 0
for n in some_list:
    if n == curr_number:
        curr_count += 1
        if not (curr_count > 3):
            updated_list.append(n)
    else:
        curr_number = n
        curr_count = 1
        updated_list.append(n)
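The groupby one-liner always emits exactly three copies ([k]*3), even for a run shorter than three. A variant that, like the loop, takes at most three items from each run can use itertools.islice on each group. A sketch; first_n_of_each_run is a hypothetical name:

```python
from itertools import groupby, islice


def first_n_of_each_run(values, n=3):
    """Keep at most n items from each run of equal, adjacent values."""
    return [x for _, run in groupby(values) for x in islice(run, n)]


some_list = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15]
print(first_n_of_each_run(some_list))
# [12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15]
```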
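It seems a set-based approach is a bit faster than itertools. If you need the result sorted, the gain is smaller, but it is still faster. Note that list(set(A)) * 3 repeats the whole list of unique values three times (12, 13, 14, 15, 12, 13, ...) rather than placing each triplet together, and set order is not guaranteed, so sort the result if you need the layout from the question.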
A = [12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15]
def with_set(A):
    A = set(A)
    return list(A) * 3
import itertools
def with_iter(A):
    return [i for k,_ in itertools.groupby(A) for i in [k]*3]
import timeit
print("Result with set: ", timeit.timeit(lambda:with_set(A),number = 1000))
print("Result with iter: ", timeit.timeit(lambda:with_iter(A),number = 1000))
Result with set: 0.008438773198370306
Result with iter: 0.018557160246834882
The code below is self-explanatory:
A = [1, 2, 3, 4, 1, 4]
A = list(set(A))  # removes duplicates
A *= 3  # repeats the unique list 3 times
print(sorted(A))  # prints a new sorted list
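If the first-seen order should be kept without a separate sort, dict.fromkeys can stand in for set, since dicts preserve insertion order (Python 3.7+). A sketch; triplets_in_order is a hypothetical name. Like the set approach, and unlike groupby, it merges duplicates that are not adjacent:

```python
def triplets_in_order(values, n=3):
    """Each distinct value, in first-seen order, repeated n times in a row."""
    return [x for x in dict.fromkeys(values) for _ in range(n)]


print(triplets_in_order([12, 12, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 15, 15, 15]))
# [12, 12, 12, 13, 13, 13, 14, 14, 14, 15, 15, 15]
print(triplets_in_order([1, 2, 3, 4, 1, 4]))
# [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```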

Split a Python list logarithmically

I am trying to do the following:
I have a list of n elements. I want to split this list into 32 separate lists which contain more and more elements as we go towards the end of the original list. For example from:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
I want to get something like this:
b = [[1],[2,3],[4,5,6,7],[8,9,10,11,12]]
I've done the following for a list containing 1024 elements:
b = []
for i in range(0, 32):
    c = a[i**2:(i+1)**2]
    b.append(c)
But I am stupidly struggling to find a reliable way to do it for other numbers like 256, 512, 2048 or for another number of lists instead of 32.
Use an iterator, a for loop with enumerate and itertools.islice:
import itertools
def logsplit(lst):
    iterator = iter(lst)
    for n, e in enumerate(iterator):
        # chunk n has n + 1 items; it is lazy, so consume it before asking for the next one
        yield itertools.chain([e], itertools.islice(iterator, n))
Works with any number of elements. Example:
for r in logsplit(range(50)):
    print(list(r))
Output:
[0]
[1, 2]
[3, 4, 5]
[6, 7, 8, 9]
... some more ...
[36, 37, 38, 39, 40, 41, 42, 43, 44]
[45, 46, 47, 48, 49]
In fact, this is very similar to this problem, except it's using enumerate to get variable chunk sizes.
This is incredibly messy, but gets the job done. Note that you're going to get some empty bins at the beginning if you're logarithmically slicing the list. Your examples give arithmetic index sequences.
from math import log, exp
def split_list(_list, divs):
    n = float(len(_list))
    log_n = log(n)
    indices = [0] + [int(exp(log_n*i/divs)) for i in range(divs)]
    # indices[divs] is the last boundary (in Python 3 the comprehension variable i does not leak)
    unfiltered = [_list[indices[i]:indices[i+1]] for i in range(divs)] + [_list[indices[divs]:]]
    filtered = [sublist for sublist in unfiltered if sublist]
    return [[] for _ in range(divs - len(filtered))] + filtered
print(split_list(list(range(1024)), 32))
Edit: After looking at the comments, here's an example that may fit what you want:
def split_list(_list):
    copy, output = list(_list), []  # list() so pop works on inputs such as range
    length = 1
    while copy:
        output.append([])
        for _ in range(length):
            if len(copy) > 0:
                output[-1].append(copy.pop(0))
        length *= 2
    return output
print(split_list(range(15)))
# [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
Note that this code is not efficient, but it can be used as a template for writing a better algorithm.
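One way to make that template efficient is to slice by index instead of calling pop(0), which shifts the whole remaining list on every call. A sketch of the same doubling scheme; doubling_split is a hypothetical name:

```python
def doubling_split(items):
    """Split items into consecutive chunks of length 1, 2, 4, 8, ...

    The last chunk holds whatever is left over.
    """
    result = []
    start, size = 0, 1
    while start < len(items):
        result.append(items[start:start + size])
        start += size
        size *= 2
    return result


print(doubling_split(list(range(15))))
# [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
print(doubling_split([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]))
# [[1], [2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12]]
```

The second call reproduces the b from the question.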
Something like this should solve the problem.
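import numpy as np
b = []
# chunk i covers indices i**2 up to (i+1)**2, so ceil(sqrt(len(a))) chunks cover the list
for i in range(0, int(np.ceil(np.sqrt(len(a))))):
    c = a[i**2:min( (i+1)**2, len(a) )]
    b.append(c)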
Not very pythonic but does what you want.
def splitList(a, n, inc):
    """
    a    list to split
    n    number of sublists
    inc  ideal difference between the number of elements in two successive sublists
    """
    zr = len(a) # remaining number of elements to split into sublists
    st = 0 # starting index in the full list of the next sublist
    nr = n # remaining number of sublists to construct
    nc = 1 # number of elements in the next sublist
    #
    b=[]
    while (zr/nr >= nc and nr>1):
        b.append( a[st:st+nc] )
        st, zr, nr, nc = st+nc, zr-nc, nr-1, nc+inc
    #
    nc = int(zr/nr)
    for i in range(nr-1):
        b.append( a[st:st+nc] )
        st = st+nc
    #
    b.append( a[st:max(st+nc,len(a))] )
    return b
# Example of call
# b = splitList(a, 32, 2)
# to split a into 32 sublists, where each list ideally has 2 more elements
# than the previous one
There's always this.
>>> def log_list(l):
...     if len(l) == 0:
...         return [] #If the list is empty, return an empty list
...     new_l = [] #Initialise new list
...     new_l.append([l[0]]) #Add the first item to the new list, inside its own list
...     for i in l[1:]: #For each other item,
...         if len(new_l) == len(new_l[-1]):
...             new_l.append([i]) #Start a new list if the previous one is full
...         else:
...             new_l[-1].append(i) #If the previous one is not full, add to it
...     return new_l
...
>>> log_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
[[1], [2, 3], [4, 5, 6], [7, 8, 9, 10]]

Remove items from a list in Python based on previous items in the same list

Say I have a simple list of numbers, e.g.
simple_list = range(100)
I would like to shorten this list such that the gaps between the values are greater than or equal to 5 for example, so it should look like
[0, 5, 10...]
FYI the actual list does not have regular increments but it is ordered
I'm trying to use list comprehension to do it but the below obviously returns an empty list:
simple_list2 = [x for x in simple_list if x-simple_list[max(0,x-1)] >= 5]
I could do it in a loop by appending to a list if the condition is met but I'm wondering specifically if there is a way to do it using list comprehension?
This is not a use case for a comprehension; you have to use a loop, because any number of consecutive elements could be less than five apart. You cannot just check the next element, or any fixed number of elements, unless you knew the data had some very specific format:
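simple_list = list(range(100))  # a list, so slice assignment works in Python 3
def f(l):
    it = iter(l)
    try:
        i = next(it)
    except StopIteration:
        return  # empty input: nothing to yield
    for ele in it:
        if abs(ele - i) >= 5:
            yield i
            i = ele
    yield i
simple_list[:] = f(simple_list)
print(simple_list)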
[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
A better example to use would be:
l = [1, 2, 2, 2, 3, 3, 3, 10, 12, 13, 13, 18, 24]
l[:] = f(l)
print(l)
Which would return:
[1, 10, 18, 24]
If your data is always in ascending order, you can remove the abs and just use if ele - i >= 5.
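The generator f keeps each element that is at least 5 away from the last element it kept, so for ascending data the same greedy rule can be written as a plain loop that checks the end of the result list. A sketch assuming ascending input; thin_out and gap are hypothetical names:

```python
def thin_out(values, gap=5):
    """Greedily keep values that are at least `gap` above the last kept value.

    Assumes `values` is sorted in ascending order.
    """
    kept = []
    for x in values:
        if not kept or x - kept[-1] >= gap:
            kept.append(x)
    return kept


print(thin_out([1, 2, 2, 2, 3, 3, 3, 10, 12, 13, 13, 18, 24]))  # [1, 10, 18, 24]
print(thin_out(range(100))[:4])  # [0, 5, 10, 15]
```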
If I understand your question correctly, which I'm not sure I do (please clarify), you can do this easily. Assume that a is the list you want to process.
[v for i,v in enumerate(a) if abs(a[i] - a[i - 1]) >= 5]
This gives all elements whose difference from the previous one (or should it be the next one?) is greater than or equal to 5. There are some variations of this, depending on what you need. Should the first element not be compared and be excluded? The previous implementation compares it with index -1 and includes it if the criterion is met; this one excludes it from the result:
[v for i,v in enumerate(a) if i != 0 and abs(a[i] - a[i - 1]) >= 5]
On the other hand, should it always be included? Then use this:
[v for i,v in enumerate(a) if (i != 0 and abs(a[i] - a[i - 1]) >= 5) or (i == 0)]
