I'm attempting to equalize a list of elements. In short, I have an array of x length with each element y being in range from 0 - 100000. To accomplish what I'm trying to do, all elements in the list must be relatively equal to each other (as far as is possible).
Here's some sample output of my current function:
>>> equal([1, 4, 1])
[2, 2, 2]
>>> equal([2, 4, 5, 9])
[5, 5, 5, 5]
>>> equal([2, 2])
[2, 2]
>>> equal([1, 2, 3, 4, 5, 6, 7, 8, 9])
[5, 5, 5, 5, 5, 5, 5, 5, 5]
>>> equal([2, 4, 6, 8, 10, 20, 30, 40])
[15, 15, 15, 15, 15, 15, 15, 15]
>>> equal([343, 452, 948, 283, 394, 238, 283, 984, 236, 847, 203])
[474, 474, 474, 474, 474, 474, 474, 474, 473, 473, 473]
And the associated code:
def equal(l):
    # The number of times we distribute to the new loop
    loops = reduce(lambda x, y: x + y, l)
    # Initializes a new list as such: [0, 0, 0 .. len(l)]
    nl = [0 for x in range(len(l))]
    # Counts how far we've iterated into our new list
    x = 0
    for i in range(loops):
        # Add 1 to this element and move on to the next element in the list
        nl[x] += 1
        x += 1
        # Ensure we don't traverse past the end of the list
        if x > len(nl) - 1:
            x = 0
    return nl
The problem right now is that once you get to extremely large lists (100+ elements with large values) it gets really slow. Is there some idiom I'm missing here that would make this more efficient?
All of your examples have output lists with all equal integers [*], which of course is not possible in the general case.
[*] except one, where it seems you want earlier items to be bigger and later ones smaller, independently of which original items were big or small; see later if that is in fact your spec.
Assuming (you never actually bother to say that, you know!-) that "all items integers" is a constraint, I'd start with
minel = sum(l) // len(l)
which is the minimum value to assign to each output element. This leaves a "shortage" of
numex = sum(l) - minel * len(l)
items that must be set to minel + 1 to keep the sum equal (another constraint you never clearly express...:-).
Which ones? Presumably those that were largest in the first place.
(Added: or maybe not, according to your last example, where it seems the items that need to be made larger are just the earliest ones. If that's the case, then obviously:
[minel+1] * numex + [minel] * (len(l) - numex)
will be fine. The rest of the answer assumes you may want some connection of input items to corresponding output ones, a harder problem).
So pick those, e.g. as bigs = set(heapq.nlargest(numex, l)), and
[minel + (x in bigs) for x in l]
is going to be a pretty close approximation to the "equalized" list you seek (exploiting the fact that a bool is worth 0 or 1 in arithmetic:-).
The one little glitch is for a list with duplicates -- more than numex items might satisfy the x in bigs test!
In this case you presumably want to randomly pick which numex items get to be incremented by 1 (the others staying at minel). But guessing exactly what you want for such a case in your totally under-specified desires is stretching guesses a point too much, so I'll let it rest here for now -- you can clarify your specs, ideally with examples where this formula doesn't do what you exactly want and what you would want instead, by editing your question appropriately.
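One way to avoid the duplicate glitch, as a rough sketch: choose the numex positions of the largest items rather than their values, so that exactly numex items get incremented. The name equalize_keep_big is made up for illustration, ties are broken in whatever order heapq.nlargest happens to return them (an assumption, since the question does not specify this), and the list is assumed to be non-empty.

```python
import heapq

def equalize_keep_big(l):
    # Hypothetical helper: assumes a non-empty list of integers.
    minel, numex = divmod(sum(l), len(l))
    # Pick the *positions* of the numex largest items, so duplicates
    # in the input cannot cause more than numex increments.
    big_idx = set(heapq.nlargest(numex, range(len(l)), key=l.__getitem__))
    # A bool counts as 0 or 1 in arithmetic, as in the expression above.
    return [minel + (i in big_idx) for i in range(len(l))]

print(equalize_keep_big([2, 2, 1]))
```

The sum of the output always equals the sum of the input, and the items that end up one larger are those that were largest to begin with.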
Here you go. Simply find the whole number average and remainder and return the appropriate number of average+1 and average elements to give the same sum. The code below leverages the builtin divmod to give the whole quotient and remainder, and the fact that multiplying a list such as [x]*5 returns [x,x,x,x,x]:
def equal(l):
    q,r = divmod(sum(l),len(l))
    return [q+1]*r + [q]*(len(l)-r)
print(equal([2,1]))
print(equal([1, 4, 1]))
print(equal([2, 4, 5, 9]))
print(equal([2, 2]))
print(equal([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(equal([2, 4, 6, 8, 10, 20, 30, 40]))
print(equal([343, 452, 948, 283, 394, 238, 283, 984, 236, 847, 203]))
Output:
[2, 1]
[2, 2, 2]
[5, 5, 5, 5]
[2, 2]
[5, 5, 5, 5, 5, 5, 5, 5, 5]
[15, 15, 15, 15, 15, 15, 15, 15]
[474, 474, 474, 474, 474, 474, 474, 474, 473, 473, 473]
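If you want to convince yourself that the divmod version really matches the original round-robin loop, a small comparison along these lines should do. This is only a sketch: equal_slow and equal_fast are names invented here, and both assume a non-empty list of non-negative integers.

```python
def equal_slow(l):
    # The question's approach: hand out one unit at a time, round-robin.
    nl = [0] * len(l)
    x = 0
    for _ in range(sum(l)):
        nl[x] += 1
        x = (x + 1) % len(nl)
    return nl

def equal_fast(l):
    # The divmod approach: r items get q + 1, the rest get q.
    q, r = divmod(sum(l), len(l))
    return [q + 1] * r + [q] * (len(l) - r)

samples = [[2, 1], [1, 4, 1], [2, 4, 5, 9], [2, 4, 6, 8, 10, 20, 30, 40]]
print(all(equal_slow(s) == equal_fast(s) for s in samples))
```

The slow version does work proportional to the sum of the values, while the fast one only depends on the length of the list, which is why the original gets slow for large values.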
Related
I'm trying to extract lists/sublists from one bigger integer-list with Python2.7 by using start- and end-patterns. I would like to do it with a function, but I cant find a library, algorithm or a regular expression for solving this problem.
def myFunctionForSublists(data, startSequence, endSequence):
    # ... todo
data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
startSequence = [1,2,3]
endSequence = [4,5,6]
sublists = myFunctionForSublists(data, startSequence, endSequence)
print sublists[0] # [1, 2, 3, 99, 99, 99, 4, 5, 6]
print sublists[1] # [1, 2, 3, 99, 4, 5, 6]
Any ideas how I can realize it?
Here's a more general solution that doesn't require the lists being sliceable, so you can use it on other iterables, like generators.
We keep a deque the size of the start sequence until we come across it. Then we add those values to a list, and keep iterating over the sequence. As we do, we keep a deque the size of the end sequence, until we see it, also adding the elements to the list we're keeping. If we come across the end sequence, we yield that list and set the deque up to scan for the next start sequence.
from collections import deque
def gen(l, start, stop):
    start_deque = deque(start)
    end_deque = deque(stop)
    curr_deque = deque(maxlen=len(start))
    it = iter(l)
    for c in it:
        curr_deque.append(c)
        if curr_deque == start_deque:
            potential = list(curr_deque)
            curr_deque = deque(maxlen=len(stop))
            for c in it:
                potential.append(c)
                curr_deque.append(c)
                if curr_deque == end_deque:
                    yield potential
                    curr_deque = deque(maxlen=len(start))
                    break
print(list(gen([99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99], [1,2,3], [4,5,6])))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
Here is an itertools approach that uses a collections.deque of limited length to keep a buffer of the last elements of appropriate size. It assumes that your sublists don't overlap and that your start and end sequences don't overlap either.
It works for any sequence for data, start, end (even generators).
from collections import deque
from itertools import islice
def sublists(data, start, end):
    it = iter(data)
    start, end = deque(start), deque(end)
    try:
        while True:
            x = deque(islice(it, len(start)), len(start))
            # move forward until start is found
            while x != start:
                x.append(next(it))
            out = list(x)
            x = deque(islice(it, len(end)), len(end))
            # move forward until end is found, storing the sublist
            while x != end:
                out.append(x[0])
                x.append(next(it))
            out.extend(end)
            yield out
    except StopIteration:
        # Input exhausted. Since Python 3.7 (PEP 479) a StopIteration that
        # escapes a generator becomes a RuntimeError, so stop explicitly.
        return
data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
startSequence = [1,2,3]
endSequence = [4,5,6]
print(list(sublists(data, startSequence, endSequence)))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
If you really want to use regular expressions, you can change the lists of integers to strings and use the regex that way
import re
def find_span(numbers, start, end):
    # Create strings from the start and end lists. re.escape guards against
    # values that map to regex metacharacters (e.g. 42 is '*').
    start_pattern = re.escape(''.join(map(chr, start)))
    end_pattern = re.escape(''.join(map(chr, end)))
    # convert the list to search into one string.
    s = ''.join(map(chr, numbers))
    # Create a pattern that starts and ends with the correct sublists,
    # and match all sublists. Then convert each match back to a list of
    # integers
    # The '?' is to make the regex non-greedy
    return [
        [ord(c) for c in match]
        for match in re.findall(rf'{start_pattern}.*?{end_pattern}', s, re.DOTALL)
    ]
>>> find_span(search, start, end) # Using OP's sample values
[[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
Note this is not really efficient, since it requires dynamically building a regex each time it's called. And you need to use re.DOTALL because otherwise it won't match anything containing 10 (which is the ascii encoding of newline). However, if you really want to use regexes, this would work.
Just iterate over all indices in the list and compare the slice to the startSequence or the endSequence, respectively. Assuming that the sublists are not supposed to overlap, you can use the same iterator for both loops.
def myFunctionForSublists(data, startSequence, endSequence):
    positions = iter(range(len(data)))
    for start in positions:
        if data[start:start+len(startSequence)] == startSequence:
            for end in positions:
                if data[end:end+len(endSequence)] == endSequence:
                    yield data[start:end+len(endSequence)]
                    break
This way, the start loop will continue where the end loop left off. If they can overlap, use two separate iterators for the loops, i.e. for start in range(len(data)): and for end in range(start+1, len(data)):
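For the overlapping case, the variant with two separate ranges might look roughly like this. It is only a sketch of the idea described above; the function name overlapping_sublists is made up, and data is assumed to be a list (it needs slicing).

```python
def overlapping_sublists(data, start_seq, end_seq):
    # Every start position is examined, even if it lies inside
    # a sublist that was already yielded.
    for start in range(len(data)):
        if data[start:start + len(start_seq)] == start_seq:
            # A fresh range per start, so the search for the end
            # always begins right after this start position.
            for end in range(start + 1, len(data)):
                if data[end:end + len(end_seq)] == end_seq:
                    yield data[start:end + len(end_seq)]
                    break

print(list(overlapping_sublists([1, 2, 3, 1, 2, 3, 4, 5, 6], [1, 2, 3], [4, 5, 6])))
# [[1, 2, 3, 1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]]
```

On the sample data from the question, where nothing overlaps, this yields the same two sublists as the single-iterator version.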
Use below method:
def find_sub_list(sl,l):
    sll=len(sl)
    for ind in (i for i,e in enumerate(l) if e==sl[0]):
        if l[ind:ind+sll]==sl:
            return ind,ind+sll-1
find_sub_list([1,2,3], data)
>>>(2, 4)
find_sub_list([4,5,6], data)
>>>(8, 10)
data[2:10+1]
>>>[1, 2, 3, 99, 99, 99, 4, 5, 6]
You can follow similar approach for sublists[1]
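To get sublists[1] and any later ones without doing it by hand, the search has to resume after the previous match. A possible sketch, assuming non-overlapping sublists; find_sub_list_from and all_spans are hypothetical names, not part of the linked answer:

```python
def find_sub_list_from(sl, l, begin=0):
    # Like find_sub_list, but only looks at positions >= begin.
    # Returns (first_index, last_index) or None if there is no match.
    n = len(sl)
    for ind in range(begin, len(l) - n + 1):
        if l[ind:ind + n] == sl:
            return ind, ind + n - 1
    return None

def all_spans(data, start_seq, end_seq):
    result = []
    pos = 0
    while True:
        s = find_sub_list_from(start_seq, data, pos)
        if s is None:
            break
        # Look for the end pattern only after the start pattern.
        e = find_sub_list_from(end_seq, data, s[1] + 1)
        if e is None:
            break
        result.append(data[s[0]:e[1] + 1])
        pos = e[1] + 1  # resume after this sublist
    return result

data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
print(all_spans(data, [1, 2, 3], [4, 5, 6]))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
```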
Courtesy : find-starting-and-ending-indices-of-sublist-in-list
Here is an O(n) solution that finds matches by keeping track of matching patterns of startSequence and endSequence. Note that both patterns are compared against windows of len(startSequence) elements, so it assumes the two sequences have the same length.
def myFunctionForSublists(data, startSequence, endSequence):
    start,end = tuple(startSequence), tuple(endSequence)
    l1, l2 = len(start), len(end)
    s = -1
    result = []
    for i,v in enumerate(zip(*[data[i:] for i in range(0,l1)])):
        if v == start:
            s = i
        if v == end and s != -1:
            result.append(data[s:i+l2])
            s = -1
    return result
print (myFunctionForSublists(data, startSequence, endSequence))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
Given a list lst = [121, 4, 37, 441, 7, 16], I would like to remove from it all numbers that repeat themselves, resulting in a new list that would be lst = [37,7] (the prime numbers of the original list).
So far I'd only managed to put out this code:
def func(lst,x):
    y = []
    for i in lst:
        for x in range (1, i):
            if (i % x) == 0 :
                y.append(i)
    return y
print(func(lst,3))
Instead of getting lst = [37,7], I'm getting this weird looking list:
[121, 121, 4, 4, 37, 441, 441, 441, 441, 441, 441, 441, 441, 7, 16, 16, 16, 16]
Is there any way I can make this work ?
As this feels like a homework question, I won't give working code, but a strategy. You want to ensure that only the prime numbers of the original list remain, or, equivalently, filter out the numbers that are not prime.
Slightly more formally, "for each number in the list, determine if it's prime, and if so, include it in a new list".
Your code is 90% there, but your kernel (the primality test) is not correct. The key to testing primality is to ensure that each possible integer divisor does not evenly divide the number in question.
For example, if testing 6, the list of possible "0 remainder" integer divisors is
[1, 2, 3, 4, 5, 6]
The first and last numbers (1 and 6) don't mean anything as far as primality (6/1 is 6, and 6/6 is 1). So, your list of possible divisors to test is now
[2, 3, 4, 5]
From here, I think an insight you're missing in your code is that for a number to be prime, none of its possible divisors may divide it evenly (i.e., leave a remainder of 0).
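from math import sqrt
def func(n):
    list1=[]
    list2=[]
    # list1[k] is True when n[k] has no divisor between 2 and sqrt(n[k]);
    # note that 0 and 1 also pass this test
    for z in (all(i % x for x in range(2,int(sqrt(i)+1))) for i in n):
        list1.append(z)
    for i,j in enumerate(list1):
        if j == True:
            list2.append(n[i])
    return list2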
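A more spelled-out variant of the same idea, as a sketch: test each number with a small helper and keep the ones that pass. The names is_prime and keep_primes are invented for illustration, and unlike the code above this version deliberately rejects 0 and 1.

```python
def is_prime(n):
    # Numbers below 2 are not prime by definition.
    if n < 2:
        return False
    # Only divisors up to sqrt(n) need checking; x * x <= n avoids floats.
    x = 2
    while x * x <= n:
        if n % x == 0:
            return False  # found a divisor that leaves remainder 0
        x += 1
    return True

def keep_primes(lst):
    # Build a new list with only the prime entries, in original order.
    return [n for n in lst if is_prime(n)]

print(keep_primes([121, 4, 37, 441, 7, 16]))
# [37, 7]
```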
I was tasked with creating a program that finds all perfect numbers within a defined range. I'm currently in school, so I am limited to using only loops and functions to make it work (also note that I have just started learning to use functions).
I've uploaded the picture of it.
My problem is that when I run it, instead of printing only the perfect numbers, it also prints out 10 000 blank lines. I want that to not be the case.
I think it has something to do with the second else statement.
def getDivisors(number):
    divList = []
    for x in range(1,number):
        if number % x == 0:
            divList.append(x)
    return divList
def isPerfectNumber(divList,number):
    if sum(divList) == number:
        return True
    else:
        return False
for x in range(2,10001):
    divList = getDivisors(x)
    if isPerfectNumber(divList,x):
        print(x,divList)
Looks like that code works. I also had it print out the divisor list so you can verify the numbers for yourself.
Here is my output:
6 [1, 2, 3]
28 [1, 2, 4, 7, 14]
496 [1, 2, 4, 8, 16, 31, 62, 124, 248]
8128 [1, 2, 4, 8, 16, 32, 64, 127, 254, 508, 1016, 2032, 4064]
I'm traversing a two-dimensional list (my representation of a matrix) in an unusual order: counterclockwise around the outside starting with the top-left element.
I need to do this more than once, but each time I do it, I'd like to do something different with the values I encounter. The first time, I want to note down the values so that I can modify them. (I can't modify them in place.) The second time, I want to traverse the outside of the matrix and modify the values of the matrix as I go, perhaps getting my new values from some generator.
Is there a way I can abstract this traversal to a function and still achieve my goals? I was thinking that this traverse-edge function could take a function and a matrix and apply the function to each element on the edge of the matrix. However, the problems with this are two-fold. If I do this, I don't think I can modify the matrix that's given as an argument, and I can't yield the values one by one because yield isn't a function.
Edit: I want to rotate a matrix counterclockwise (not 90 degrees) where one rotation moves, for example, the top-left element down one spot. To accomplish this, I'm rotating one "level" (or shell) of the matrix at a time. So if I'm rotating the outermost level, I want to traverse it once to build a list which I can shift to the left, then I want to traverse the outermost level again to assign it those new values which I calculated.
Just create 4 loops, one for each side of the array, that counts through the values of the index that changes for that side. For example, the first side, whose x index is always 0, could vary the y from 0 to n-2 (from the top-left corner to just shy of the bottom-left); repeat for the other sides.
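The four loops could be wrapped in a generator of index pairs, something like the sketch below. The name border_indices is hypothetical, indices are written as (row, column), and the single-row and single-column cases are handled separately as an assumption about what you would want there.

```python
def border_indices(rows, cols):
    # Yields (row, col) pairs counterclockwise, starting at the top-left.
    if rows == 1:
        for j in range(cols):
            yield 0, j
        return
    if cols == 1:
        for i in range(rows):
            yield i, 0
        return
    for i in range(rows - 1):            # left side, top to just above bottom
        yield i, 0
    for j in range(cols - 1):            # bottom side, left to just before right
        yield rows - 1, j
    for i in range(rows - 1, 0, -1):     # right side, bottom to just below top
        yield i, cols - 1
    for j in range(cols - 1, 0, -1):     # top side, right to just after left
        yield 0, j

matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, 10, 11, 12],
          [13, 14, 15, 16]]
print([matrix[i][j] for i, j in border_indices(4, 4)])
# [1, 5, 9, 13, 14, 15, 16, 12, 8, 4, 3, 2]
```

Each side stops one short of its far corner, so every border cell is visited exactly once.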
I think there are two approaches you can take to solving your problem.
The first option is to create a function that returns an iterable of indexes into the matrix. Then you'd write your various passes over the matrix with for loops:
for i, j in matrix_border_index_gen(len(matrix), len(matrix[0])): # pass in dimensions
    # do something with matrix[i][j]
The other option is to write a function that works more like map that applies a given function to each appropriate value of the matrix in turn. If you sometimes need to replace the current values with new ones, I'd suggest doing that all the time (the times when you don't want to replace the value, you can just have your function return the previous value):
def func(value):
    # do stuff with value from matrix
    return new_value # new_value can be the same value, if you don't want to change it
matrix_border_map(func, matrix) # replace each value on border of matrix with func(value)
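Here is one way matrix_border_map could be filled in, plus the two-pass rotation from the question built on top of it. Treat it as a sketch: the traversal order (counterclockwise from the top-left), the helper names _border and rotate_border_ccw, and the requirement of at least two rows and two columns are all assumptions made for this example.

```python
def _border(rows, cols):
    # Counterclockwise border walk; assumes rows >= 2 and cols >= 2.
    for i in range(rows - 1):
        yield i, 0
    for j in range(cols - 1):
        yield rows - 1, j
    for i in range(rows - 1, 0, -1):
        yield i, cols - 1
    for j in range(cols - 1, 0, -1):
        yield 0, j

def matrix_border_map(func, matrix):
    # Replace each border value with func(value), modifying matrix in place.
    for i, j in _border(len(matrix), len(matrix[0])):
        matrix[i][j] = func(matrix[i][j])

def rotate_border_ccw(matrix):
    seen = []
    def record(value):
        seen.append(value)
        return value              # first pass: leave the matrix unchanged
    matrix_border_map(record, matrix)
    # Each value moves one step along the walk (top-left moves down).
    shifted = iter(seen[-1:] + seen[:-1])
    # Second pass: ignore the old value, take the next shifted one.
    matrix_border_map(lambda _old: next(shifted), matrix)

m = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
rotate_border_ccw(m)
for row in m:
    print(row)
```

Lists are passed by reference, so assigning to matrix[i][j] inside the function does change the caller's matrix, and the lambda pulling from an iterator covers the "new values from some generator" case.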
I have added a few lines of python 3 code here. It has the mirror function and a spiral iterator (not sure, if that's what you meant). No doc strings (sorry). It is readable though. Change print statement for python 2.
EDIT : FIXED A BUG
class Matrix():
    def __init__(self, rows=5, cols=5):
        self.cells = [[None for c in range(cols)] for r in range(rows)]

    def transpose(self):
        self.cells = list(map(list, zip(*self.cells)))

    def mirror(self):
        for row in self.cells:
            row.reverse()

    def invert(self):
        self.cells.reverse()

    def rotate(self, clockwise=True):
        self.transpose()
        self.mirror() if clockwise else self.invert()

    def iter_spiral(self, grid=None):
        grid = grid or self.cells
        next_grid = []
        for cell in reversed(grid[0]):
            yield cell
        for row in grid[1:-1]:
            yield row[0]
            next_grid.append(row[1:-1])
        if len(grid) > 1:
            for cell in grid[-1]:
                yield cell
        for row in reversed(grid[1:-1]):
            yield row[-1]
        if next_grid:
            for cell in self.iter_spiral(grid=next_grid):
                yield cell

    def show(self):
        for row in self.cells:
            print(row)

def test_matrix():
    m = Matrix()
    m.cells = [[1,2,3,4],
               [5,6,7,8],
               [9,10,11,12],
               [13,14,15,16]]
    print("We expect the spiral to be:", "4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11", sep='\n')
    print("What the iterator yields:")
    for cell in m.iter_spiral():
        print(cell, end=', ')
    print("\nThe matrix looks like this:")
    m.show()
    print("Now this is how it looks rotated 90 deg clockwise")
    m.rotate()
    m.show()
    print("Now we'll rotate it back")
    m.rotate(clockwise=False)
    m.show()
    print("Now we'll transpose it")
    m.transpose()
    m.show()
    print("Inverting the above")
    m.invert()
    m.show()
    print("Mirroring the above")
    m.mirror()
    m.show()

if __name__ == '__main__':
    test_matrix()
This is the output:
We expect the spiral to be:
4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11
What the iterator yields:
4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11,
The matrix looks like this:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
Now this is how it looks rotated 90 deg clockwise
[13, 9, 5, 1]
[14, 10, 6, 2]
[15, 11, 7, 3]
[16, 12, 8, 4]
Now we'll rotate it back
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
Now we'll transpose it
[1, 5, 9, 13]
[2, 6, 10, 14]
[3, 7, 11, 15]
[4, 8, 12, 16]
Inverting the above
[4, 8, 12, 16]
[3, 7, 11, 15]
[2, 6, 10, 14]
[1, 5, 9, 13]
Mirroring the above
[16, 12, 8, 4]
[15, 11, 7, 3]
[14, 10, 6, 2]
[13, 9, 5, 1]
I would go with generator functions. They can be used to create iterators over which we can iterate. An Example of a generator function -
def genfunc():
    i = 0
    while i < 10:
        yield i
        i = i + 1
>>> for x in genfunc():
... print(x)
...
0
1
2
3
4
5
6
7
8
9
When calling the generator function, it returns a generator object -
>>> genfunc()
<generator object genfunc at 0x00553AD0>
It does not start running the function at that point. When you start iterating over the generator object and ask for its first element, it runs the function until it reaches the first yield statement, and at that point it returns the value (in the above case, the value of i). It also saves the state of the function at that point (that is, where execution was when the value was yielded, what the values of the variables in the local namespace were, etc.).
Then when it tries to get the next value, again execution starts from where it stopped last time, till it again yield another value. And this continues on.
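The pause-and-resume behaviour described above can be watched directly by driving a generator with next() instead of a for loop. A small sketch, using a shortened version of genfunc that stops at 3 to keep it brief:

```python
def genfunc():
    i = 0
    while i < 3:
        yield i
        i = i + 1

g = genfunc()        # nothing in the body has run yet
first = next(g)      # runs until the first yield
second = next(g)     # resumes after the yield, with i preserved
rest = list(g)       # drains whatever is left
print(first, second, rest)
# 0 1 [2]

try:
    next(g)          # the function body has finished
except StopIteration:
    print("generator exhausted")
```

A for loop does the same thing behind the scenes: it calls next() repeatedly and stops quietly when StopIteration is raised.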
I have a big list of numbers like so:
a = [133000, 126000, 123000, 108000, 96700, 96500, 93800,
93200, 92100, 90000, 88600, 87000, 84300, 82400, 80700,
79900, 79000, 78800, 76100, 75000, 15300, 15200, 15100,
8660, 8640, 8620, 8530, 2590, 2590, 2580, 2550, 2540, 2540,
2510, 2510, 1290, 1280, 1280, 1280, 1280, 951, 948, 948,
947, 946, 945, 609, 602, 600, 599, 592, 592, 592, 591, 583]
What I want to do is cycle through this list one by one checking if a value is above a certain threshold (for example 40000). If it is above this threshold we put that value in a new list and forget about it. Otherwise we wait until the sum of the values is above the threshold and when it is we put the values in a list and then continue cycling. At the end, if the final values don't sum to the threshold we just add them to the last list.
If I'm not being clear consider the simple example, with the threshold being 15
[20, 10, 9, 8, 8, 7, 6, 2, 1]
The final list should look like this:
[[20], [10, 9], [8, 8], [7, 6, 2, 1]]
I'm really bad at maths and python and I'm at my wits end. I have some basic code I came up with but it doesn't really work:
def sortthislist(list):
    list = a
    newlist = []
    for i in range(len(list)):
        while sum(list[i]) >= 40000:
            newlist.append(list[i])
    return newlist
Any help at all would be greatly appreciated. Sorry for the long post.
The function below accepts your input list and a limit to check against, and outputs the grouped list:
a = [20, 10, 9, 8, 8, 7, 6, 2, 1]
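def func(a, lim):
    out = []
    temp = []
    for i in a:
        if i > lim:
            out.append([i])
        else:
            temp.append(i)
            if sum(temp) > lim:
                out.append(temp)
                temp = []
    # leftover values that never exceeded lim go into the last list,
    # as the question asks
    if temp:
        if out:
            out[-1].extend(temp)
        else:
            out.append(temp)
    return out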
print(func(a, 15))
# [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
With Python you can iterate over the list itself rather than over its indices, which is why I use for i in a rather than for i in range(len(a)).
Within the function out is the list that you want to return at the end; temp is a temporary list that is populated with numbers until the sum of temp exceeds your lim value, at which point this temp is then appended to out and replaced with an empty list.
def group(L, threshold):
    answer = []
    start = 0
    sofar = L[0]
    for i,num in enumerate(L[1:],1):
        if sofar >= threshold:
            answer.append(L[start:i])
            sofar = L[i]
            start = i
        else:
            sofar += L[i]
    # handle the items after the last completed group
    if sofar >= threshold or not answer:
        answer.append(L[start:])
    else:
        answer[-1].extend(L[start:])
    return answer
Output:
In [4]: group([20, 10, 9, 8, 8, 7, 6, 2, 1], 15)
Out[4]: [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
Hope this will help :)
vlist = [20, 10,3,9, 7,6,5,4]
threshold = 15
result = []
tmp = []
for v in vlist:
    if v > threshold:
        tmp.append(v)
        result.append(tmp)
        tmp = []
    elif sum(tmp) + v > threshold:
        tmp.append(v)
        result.append(tmp)
        tmp = []
    else:
        tmp.append(v)
if tmp != []:
    result.append(tmp)
Here's the result:
[[20], [10, 3, 9], [7, 6, 5], [4]]
Here's yet another way:
def group_by_sum(a, lim):
    out = []
    group = None
    for i in a:
        if group is None:
            group = []
            out.append(group)
        group.append(i)
        if sum(group) > lim:
            group = None
    return out
print(group_by_sum(a, 15))
We already have plenty of working answers, but here are two other approaches.
We can use itertools.groupby to collect such groups, given a stateful accumulator that understands the contents of the group. We end up with a set of (key,group) pairs, so some additional filtering gets us only the groups. Additionally since itertools provides iterators, we convert them to lists for printing.
from itertools import groupby
class Thresholder:
    def __init__(self, threshold):
        self.threshold=threshold
        self.sum=0
        self.group=0
    def __call__(self, value):
        if self.sum>self.threshold:
            self.sum=value
            self.group+=1
        else:
            self.sum+=value
        return self.group
print([list(g) for k,g in groupby([20, 10, 9, 8, 8, 7, 6, 2, 1], Thresholder(15))])
The operation can also be done as a single reduce call:
from functools import reduce  # builtin on Python 2, must be imported on Python 3
def accumulator(result, value):
    last=result[-1]
    if sum(last)>threshold:
        result.append([value])
    else:
        last.append(value)
    return result
threshold=15
print(reduce(accumulator, [20, 10, 9, 8, 8, 7, 6, 2, 1], [[]]))
This version scales poorly to many values due to the repeated call to sum(), and the global variable for the threshold is rather clumsy. Also, calling it for an empty list will still leave one empty group.
Edit: The question logic demands that values above the threshold get put in their own groups (not sharing with collected smaller values). I did not think of that while writing these versions, but the accepted answer by Ffisegydd handles it. There is no effective difference if the input data is sorted in descending order, as all the sample data appears to be.
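For completeness, here is a sketch that addresses the points raised above: it keeps a running total instead of calling sum() repeatedly, takes the threshold as a parameter, puts values above the threshold in their own groups, and folds leftover values into the last list as the question asks. The name group_by_threshold is made up, and "above the threshold" is read as strictly greater, which is an assumption.

```python
def group_by_threshold(values, threshold):
    groups = []
    current = []   # small values collected so far
    total = 0      # running sum of current, avoids repeated sum() calls
    for v in values:
        if v > threshold:
            groups.append([v])        # large value gets its own group
            continue
        current.append(v)
        total += v
        if total > threshold:
            groups.append(current)
            current, total = [], 0
    if current:
        # Leftovers that never exceeded the threshold join the last list.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

print(group_by_threshold([20, 10, 9, 8, 8, 7, 6, 2, 1], 15))
# [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
```

An empty input returns an empty list here rather than a list containing one empty group.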