Python - summing and grouping through a list - python

I have a big list of numbers like so:
a = [133000, 126000, 123000, 108000, 96700, 96500, 93800,
93200, 92100, 90000, 88600, 87000, 84300, 82400, 80700,
79900, 79000, 78800, 76100, 75000, 15300, 15200, 15100,
8660, 8640, 8620, 8530, 2590, 2590, 2580, 2550, 2540, 2540,
2510, 2510, 1290, 1280, 1280, 1280, 1280, 951, 948, 948,
947, 946, 945, 609, 602, 600, 599, 592, 592, 592, 591, 583]
What I want to do is cycle through this list one by one checking if a value is above a certain threshold (for example 40000). If it is above this threshold we put that value in a new list and forget about it. Otherwise we wait until the sum of the values is above the threshold and when it is we put the values in a list and then continue cycling. At the end, if the final values don't sum to the threshold we just add them to the last list.
If I'm not being clear consider the simple example, with the threshold being 15
[20, 10, 9, 8, 8, 7, 6, 2, 1]
The final list should look like this:
[[20], [10, 9], [8, 8], [7, 6, 2, 1]]
I'm really bad at maths and python and I'm at my wits end. I have some basic code I came up with but it doesn't really work:
def sortthislist(list):
list = a
newlist = []
for i in range(len(list)):
while sum(list[i]) >= 40000:
newlist.append(list[i])
return newlist
Any help at all would be greatly appreciated. Sorry for the long post.

The function below will accept your input list and some limit to check and then output the sorted list:
a = [20, 10, 9, 8, 8, 7, 6, 2, 1]
def func(a, lim):
out = []
temp = []
for i in a:
if i > lim:
out.append([i])
else:
temp.append(i)
if sum(temp) > lim:
out.append(temp)
temp = []
return out
print(func(a, 15))
# [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
With Python you can iterate over the list itself, rather than iterating over it's indices, as such you can see that I use for i in a rather than for i in range(len(a)).
Within the function out is the list that you want to return at the end; temp is a temporary list that is populated with numbers until the sum of temp exceeds your lim value, at which point this temp is then appended to out and replaced with an empty list.

def group(L, threshold):
answer = []
start = 0
sofar = L[0]
for i,num in enumerate(L[1:],1):
if sofar >= threshold:
answer.append(L[start:i])
sofar = L[i]
start = i
else:
sofar += L[i]
if i<len(L) and sofar>=threshold:
answer.append(L[i:])
return answer
Output:
In [4]: group([20, 10, 9, 8, 8, 7, 6, 2, 1], 15)
Out[4]: [[20], [10, 9], [8, 8], [7, 6, 2]]

Hope this will help :)
vlist = [20, 10,3,9, 7,6,5,4]
thresold = 15
result = []
tmp = []
for v in vlist:
if v > thresold:
tmp.append(v)
result.append(tmp)
tmp = []
elif sum(tmp) + v > thresold:
tmp.append(v)
result.append(tmp)
tmp = []
else:
tmp.append(v)
if tmp != []:
result.append(tmp)
Here what's the result :
[[20], [10, 3, 9], [7, 6, 5], [4]]

Here's yet another way:
def group_by_sum(a, lim):
out = []
group = None
for i in a:
if group is None:
group = []
out.append(group)
group.append(i)
if sum(group) > lim:
group = None
return out
print(group_by_sum(a, 15))

We already have plenty of working answers, but here are two other approaches.
We can use itertools.groupby to collect such groups, given a stateful accumulator that understands the contents of the group. We end up with a set of (key,group) pairs, so some additional filtering gets us only the groups. Additionally since itertools provides iterators, we convert them to lists for printing.
from itertools import groupby
class Thresholder:
def __init__(self, threshold):
self.threshold=threshold
self.sum=0
self.group=0
def __call__(self, value):
if self.sum>self.threshold:
self.sum=value
self.group+=1
else:
self.sum+=value
return self.group
print [list(g) for k,g in groupby([20, 10, 9, 8, 8, 7, 6, 2, 1], Thresholder(15))]
The operation can also be done as a single reduce call:
def accumulator(result, value):
last=result[-1]
if sum(last)>threshold:
result.append([value])
else:
last.append(value)
return result
threshold=15
print reduce(accumulator, [20, 10, 9, 8, 8, 7, 6, 2, 1], [[]])
This version scales poorly to many values due to the repeated call to sum(), and the global variable for the threshold is rather clumsy. Also, calling it for an empty list will still leave one empty group.
Edit: The question logic demands that values above the threshold get put in their own groups (not sharing with collected smaller values). I did not think of that while writing these versions, but the accepted answer by Ffisegydd handles it. There is no effective difference if the input data is sorted in descending order, as all the sample data appears to be.

Related

adding rows based on values of other rows

I have a list (in a dataframe) that looks like this:
oddnum = [1, 3, 5, 7, 9, 11, 23]
I want to create a new list that looks like this:
newlist = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23]
I want to test if the distance between two numbers is 2 (if oddnum[index+1]-oddnum[index] == 2)
If the distance is 2, then I want to add the number following oddnum[index] and create a new list (oddnum[index] + 1)
If the distance is greater than two, keep the list as is
I keep getting key error because (I think) the list runs out of [index] and [index+1] no longer exists once it reaches the end of the list. How do I do this?
To pass errors, the best method is to use try and except conditions. Here's my code:
oddnum = [1, 3, 5, 7, 9, 11, 23]
res = [] # The new list
for i in range(len(oddnum)):
res.append(oddnum[i]) # Append the first value by default
try: # Tries to run the code
if oddnum[i] + 2 == oddnum[i+1]: res.append(oddnum[i]+1) # Appends if the condition is met
except: pass # Passes on exception (in our case KeyError)
print(res)
oddnum = [1, 3, 5, 7, 9, 11, 23]
new_list = []
for pos, num in enumerate(oddnum):
new_list.append(num)
try:
if num-oddnum[pos+1] in [2, -2]:
new_list.append(num+1)
except:
pass
print(new_list)
Use try: except: to prevent exceptions popping up and ignore it

How find all pairs equal to N in a list

I have a problem with this algorithm- I have to find pairs in list:
[4, 8, 9, 0, 12, 1, 4, 2, 12, 12, 4, 4, 8, 11, 12, 0]
which are equal to 12. The thing is that after making a pair those numbers (elements) can not be used again.
For now, I have code which you can find below. I have tried to delete numbers from the list after matching, but I feel that there is an issue with indexing after this.
It looks very easy but still not working. ;/
class Pairs():
def __init__(self, sum, n, arr ):
self.sum = sum
self.n = n
self.arr = arr
def find_pairs(self):
self.n = len(self.arr)
for i in range(0, self.n):
for j in range(i+1, self.n):
if (self.arr[i] + self.arr[j] == self.sum):
print("[", self.arr[i], ",", " ", self.arr[j], "]", sep = "")
self.arr.pop(i)
self.arr.pop(j-1)
self.n = len(self.arr)
i+=1
def Main():
sum = 12
arr = [4, 8, 9, 0, 12, 1, 4, 2, 12, 12, 4, 4, 8, 11, 12, 0]
n = len(arr)
obj_Pairs = Pairs(sum, n, arr)
obj_Pairs.find_pairs()
if __name__ == "__main__":
Main()
update:
Thank you guys for the fast answers!
I've tried your solutions, and unfortunately, it is still not exactly what I'm looking for. I know that the expected output should look like this: [4, 8], [0, 12], [1, 11], [4, 8], [12, 0]. So in your first solution, there is still an issue with duplicated elements, and in the second one [4, 8] and [12, 0] are missing. Sorry for not giving output at the beginning.
With this problem you need to keep track of what numbers have already been tried. Python has a Counter class that will hold the count of each of the elements present in a given list.
The algorithm I would use is:
create counter of elements in list
iterate list
for each element, check if (target - element) exists in counter and count of that item > 0
decrement count of element and (target - element)
from collections import Counter
class Pairs():
def __init__(self, target, arr):
self.target = target
self.arr = arr
def find_pairs(self):
count_dict = Counter(self.arr)
result = []
for num in self.arr:
if count_dict[num] > 0:
difference = self.target - num
if difference in count_dict and count_dict[difference] > 0:
result.append([num, difference])
count_dict[num] -= 1
count_dict[difference] -= 1
return result
if __name__ == "__main__":
arr = [4, 8, 9, 0, 12, 1, 4, 2, 12, 12, 4, 4, 8, 11, 12, 0]
obj_Pairs = Pairs(12, arr)
result = obj_Pairs.find_pairs()
print(result)
Output:
[[4, 8], [8, 4], [0, 12], [12, 0], [1, 11]]
Demo
Brief
If you have learned about hashmaps and linked lists/deques, you can consider using auxiliary space to map values to their indices.
Pro:
It does make the time complexity linear.
Doesn't modify the input
Cons:
Uses extra space
Uses a different strategy from the original. If this is for a class and you haven't learned about the data structures applied then don't use this.
Code
from collections import deque # two-ended linked list
class Pairs():
def __init__(self, sum, n, arr ):
self.sum = sum
self.n = n
self.arr = arr
def find_pairs(self):
mp = {} # take advantage of a map of values to their indices
res = [] # resultant pair list
for idx, elm in enumerate(self.arr):
if mp.get(elm, None) is None:
mp[elm] = deque() # index list is actually a two-ended linked list
mp[elm].append(idx) # insert this element
comp_elm = self.sum - elm # value that matches
if mp.get(comp_elm, None) is not None and mp[comp_elm]: # there is no match
# match left->right
res.append((comp_elm, elm))
mp[comp_elm].popleft()
mp[elm].pop()
for pair in res: # Display
print("[", pair[0], ",", " ", pair[1], "]", sep = "")
# in case you want to do further processing
return res
def Main():
sum = 12
arr = [4, 8, 9, 0, 12, 1, 4, 2, 12, 12, 4, 4, 8, 11, 12, 0]
n = len(arr)
obj_Pairs = Pairs(sum, n, arr)
obj_Pairs.find_pairs()
if __name__ == "__main__":
Main()
Output
$ python source.py
[4, 8]
[0, 12]
[4, 8]
[1, 11]
[12, 0]
To fix your code - few remarks:
If you iterate over array in for loop you shouldn't be changing it - use while loop if you want to modify the underlying list (you can rewrite this solution to use while loop)
Because you're iterating only once the elements in the outer loop - you only need to ensure you "popped" elements in the inner loop.
So the code:
class Pairs():
def __init__(self, sum, arr ):
self.sum = sum
self.arr = arr
self.n = len(arr)
def find_pairs(self):
j_pop = []
for i in range(0, self.n):
for j in range(i+1, self.n):
if (self.arr[i] + self.arr[j] == self.sum) and (j not in j_pop):
print("[", self.arr[i], ",", " ", self.arr[j], "]", sep = "")
j_pop.append(j)
def Main():
sum = 12
arr = [4, 8, 9, 0, 12, 1, 4, 2, 12, 12, 4, 4, 8, 11, 12, 0]
obj_Pairs = Pairs(sum, arr)
obj_Pairs.find_pairs()
if __name__ == "__main__":
Main()

Trying to For Loop a list and append to new list, but it returns after the first item

I'm trying to solve this task:
Loop through list A and create a new list with only items form list A that's between 0-5.
What am I doing wrong here?
a = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]
def new_list(x):
for item in range(len(x)):
new = []
if x[item] < 5 and x[item] > 0:
(new.append(item))
return new
print(new_list(a))
I'm just getting [1] as an answer.
You return command is inside the loop so as soon as it goes through the first case it returns the value exiting the function.
Here is an example of what your code should look like
a = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]
def new_list(x):
new = []
for item in range(len(x)):
if x[item] < 5 and x[item] > 0:
new.append(x[item])
return new
print new_list(a)
You can achieve the same result by using a list comprehension
def new_list(x):
return [item for item in x if 0 < item < 5]
You're resetting new to a brand new empty list each time through the loop, which discards any work done in prior iterations.
Also, in the if statement you're calling return, which exits your function immediately, so you never process the remainder of the list.
You probably wanted something like this instead:
def new_list(x):
new = []
for item in x:
if 0 < item < 5:
new.append(item)
return new
Just my recommendation. You could use filter() here instead of a making your own loop.
a = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]
def new_list(x, low=0, high=5):
return filter(lambda f: f in range(low, high), x)
Filter returns a new list with elements passing a given predicate and it's equivalent to
[item for item in iterable if function(item)]
as per the documentation.
Therefore
print new_list(a)
Results in:
[1, 2, 3, 5]
This way you can check any values such as:
print new_list(a, 5, 10)
[5, 8]
Three errors:
you are reinstantiating new with each iteration of the for loop.
you should return new when the list is finished building, at the end of the function.
You are appending item, but this is your index. In your code, you would have to append x[item].
Code with corrections:
a = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]
def new_list(x):
new = []
for item in range(len(x)):
if x[item] < 5 and x[item] > 0:
new.append(x[item])
return new
print(new_list(a))
Output:
[1, 2, 3]
Suggestions:
Don't index, loop over the items of x directly (for item in x: ...).
Use chained comparisons, e.g. 0 < item < 5.
Consider a list comprehension.
Code with all three suggestions:
>>> [item for item in a if 0 < item < 5]
>>> [1, 2, 3]
Just a suggestion!
The empty list is inside the For Loop meaning that a new empty list is created every iteration
The 'return' is also inside the for loop which is less than ideal, you want it to be returned after the loop has been exhausted and all suitable elements have been appended.
a = [100, 1, 10, 2, 3, 5, 8, 13, 21, 34, 55, 98]
def new_list(x):
new = []
for item in range(len(x)):
if x[item] < 5 and x[item] > 0:
new.append(item)
return new
print(new_list(a))

Python: split list into indices based on consecutive identical values

If you could advice me how to write the script to split list by number of values I mean:
my_list =[11,11,11,11,12,12,15,15,15,15,15,15,20,20,20]
And there are 11-4,12-2,15-6,20-3 items.
So in next list for exsample range(0:100)
I have to split on 4,2,6,3 parts
So I counted same values and function for split list, but it doen't work with list:
div=Counter(my_list).values() ##counts same values in the list
def chunk(it, size):
it = iter(it)
return iter(lambda: tuple(islice(it, size)), ())
What do I need:
Out: ([0,1,2,3],[4,5],[6,7,8,9,10,11], etc...]
You can use enumerate, itertools.groupby, and operator.itemgetter:
In [45]: import itertools
In [46]: import operator
In [47]: [[e[0] for e in d[1]] for d in itertools.groupby(enumerate(my_list), key=operator.itemgetter(1))]
Out[47]: [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14]]
What this does is as follows:
First it enumerates the items.
It groups them, using the second item in each enumeration tuple (the original value).
In the resulting list per group, it uses the first item in each tuple (the enumeration)
Solution in Python 3 , If you are only using counter :
from collections import Counter
my_list =[11,11,11,11,12,12,15,15,15,15,15,15,20,20,20]
count = Counter(my_list)
div= list(count.keys()) # take only keys
div.sort()
l = []
num = 0
for i in div:
t = []
for j in range(count[i]): # loop number of times it occurs in the list
t.append(num)
num+=1
l.append(t)
print(l)
Output:
[[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11], [12, 13, 14]]
Alternate Solution using set:
my_list =[11,11,11,11,12,12,15,15,15,15,15,15,20,20,20]
val = set(my_list) # filter only unique elements
ans = []
num = 0
for i in val:
temp = []
for j in range(my_list.count(i)): # loop till number of occurrence of each unique element
temp.append(num)
num+=1
ans.append(temp)
print(ans)
EDIT:
As per required changes made to get desired output as mention in comments by #Protoss Reed
my_list =[11,11,11,11,12,12,15,15,15,15,15,15,20,20,20]
val = list(set(my_list)) # filter only unique elements
val.sort() # because set is not sorted by default
ans = []
index = 0
l2 = [54,21,12,45,78,41,235,7,10,4,1,1,897,5,79]
for i in val:
temp = []
for j in range(my_list.count(i)): # loop till number of occurrence of each unique element
temp.append(l2[index])
index+=1
ans.append(temp)
print(ans)
Output:
[[54, 21, 12, 45], [78, 41], [235, 7, 10, 4, 1, 1], [897, 5, 79]]
Here I have to convert set into list because set is not sorted and I think remaining is self explanatory.
Another Solution if input is not always Sorted (using OrderedDict):
from collections import OrderedDict
v = OrderedDict({})
my_list=[12,12,11,11,11,11,20,20,20,15,15,15,15,15,15]
l2 = [54,21,12,45,78,41,235,7,10,4,1,1,897,5,79]
for i in my_list: # maintain count in dict
if i in v:
v[i]+=1
else:
v[i]=1
ans =[]
index = 0
for key,values in v.items():
temp = []
for j in range(values):
temp.append(l2[index])
index+=1
ans.append(temp)
print(ans)
Output:
[[54, 21], [12, 45, 78, 41], [235, 7, 10], [4, 1, 1, 897, 5, 79]]
Here I use OrderedDict to maintain order of input sequence which is random(unpredictable) in case of set.
Although I prefer #Ami Tavory's solution which is more pythonic.
[Extra work: If anybody can convert this solution into list comprehension it will be awesome because i tried but can not convert it to list comprehension and if you succeed please post it in comments it will help me to understand]

Split a Python list logarithmically

I am trying to do the following..
I have a list of n elements. I want to split this list into 32 separate lists which contain more and more elements as we go towards the end of the original list. For example from:
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
I want to get something like this:
b = [[1],[2,3],[4,5,6,7],[8,9,10,11,12]]
I've done the following for a list containing 1024 elements:
for i in range (0, 32):
c = a[i**2:(i+1)**2]
b.append(c)
But I am stupidly struggling to find a reliable way to do it for other numbers like 256, 512, 2048 or for another number of lists instead of 32.
Use an iterator, a for loop with enumerate and itertools.islice:
import itertools
def logsplit(lst):
iterator = iter(lst)
for n, e in enumerate(iterator):
yield itertools.chain([e], itertools.islice(iterator, n))
Works with any number of elements. Example:
for r in logsplit(range(50)):
print(list(r))
Output:
[0]
[1, 2]
[3, 4, 5]
[6, 7, 8, 9]
... some more ...
[36, 37, 38, 39, 40, 41, 42, 43, 44]
[45, 46, 47, 48, 49]
In fact, this is very similar to this problem, except it's using enumerate to get variable chunk sizes.
This is incredibly messy, but gets the job done. Note that you're going to get some empty bins at the beginning if you're logarithmically slicing the list. Your examples give arithmetic index sequences.
from math import log, exp
def split_list(_list, divs):
n = float(len(_list))
log_n = log(n)
indices = [0] + [int(exp(log_n*i/divs)) for i in range(divs)]
unfiltered = [_list[indices[i]:indices[i+1]] for i in range(divs)] + [_list[indices[i+1]:]]
filtered = [sublist for sublist in unfiltered if sublist]
return [[] for _ in range(divs- len(filtered))] + filtered
print split_list(range(1024), 32)
Edit: After looking at the comments, here's an example that may fit what you want:
def split_list(_list):
copy, output = _list[:], []
length = 1
while copy:
output.append([])
for _ in range(length):
if len(copy) > 0:
output[-1].append(copy.pop(0))
length *= 2
return output
print split_list(range(15))
# [[0], [1, 2], [3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13, 14]]
Note that this code is not efficient, but it can be used as a template for writing a better algorithm.
Something like this should solve the problem.
for i in range (0, int(np.sqrt(2*len(a)))):
c = a[i**2:min( (i+1)**2, len(a) )]
b.append(c)
Not very pythonic but does what you want.
def splitList(a, n, inc):
"""
a list to split
n number of sublist
inc ideal difference between the number of elements in two successive sublists
"""
zr = len(a) # remaining number of elements to split into sublists
st = 0 # starting index in the full list of the next sublist
nr = n # remaining number of sublist to construct
nc = 1 # number of elements in the next sublist
#
b=[]
while (zr/nr >= nc and nr>1):
b.append( a[st:st+nc] )
st, zr, nr, nc = st+nc, zr-nc, nr-1, nc+inc
#
nc = int(zr/nr)
for i in range(nr-1):
b.append( a[st:st+nc] )
st = st+nc
#
b.append( a[st:max(st+nc,len(a))] )
return b
# Example of call
# b = splitList(a, 32, 2)
# to split a into 32 sublist, where each list ideally has 2 more element
# than the previous
There's always this.
>>> def log_list(l):
if len(l) == 0:
return [] #If the list is empty, return an empty list
new_l = [] #Initialise new list
new_l.append([l[0]]) #Add first iteration to new list inside of an array
for i in l[1:]: #For each other iteration,
if len(new_l) == len(new_l[-1]):
new_l.append([i]) #Create new array if previous is full
else:
new_l[-1].append(i) #If previous not full, add to it
return new_l
>>> log_list([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
[[1], [2, 3], [4, 5, 6], [7, 8, 9, 10]]

Categories

Resources