Knapsack problem (optimized version doesn't work correctly) - Python

I am working on Python code to solve the knapsack problem.
Here is my code:
import time
start_time = time.time()
#reading the data:
values = []
weights = []
test = []
with open("test.txt") as file:
    W, size = map(int, next(file).strip().split())
    for line in file:
        value, weight = map(int, line.strip().split())
        values.append(int(value))
        weights.append(int(weight))
weights = [0] + weights
values = [0] + values
#Knapsack Algorithm:
hash_table = {}
for x in range(0,W +1):
    hash_table[(0,x)] = 0
for i in range(1,size + 1):
    for x in range(0,W +1):
        if weights[i] > x:
            hash_table[(i,x)] = hash_table[i - 1,x]
        else:
            hash_table[(i,x)] = max(hash_table[i - 1,x],hash_table[i - 1,x - weights[i]] + values[i])
print("--- %s seconds ---" % (time.time() - start_time))
This code works correctly, but on big files my program crashes due to RAM issues.
So I decided to change the following part:
for i in range(1,size + 1):
    for x in range(0,W +1):
        if weights[i] > x:
            hash_table[(1,x)] = hash_table[0,x]
            #hash_table[(0,x)] = hash_table[1,x]
        else:
            hash_table[(1,x)] = max(hash_table[0,x],hash_table[0,x - weights[i]] + values[i])
        hash_table[(0,x)] = hash_table[(1,x)]
As you can see, instead of using n rows I am using only two (copying the second row into the first one in order to reproduce the line hash_table[(i,x)] = hash_table[i - 1,x]), which should solve the RAM issue.
But unfortunately it gives me a wrong result.
I have used the following test case:
190 6
50 56
50 59
64 80
46 64
50 75
5 17
It should give a total value of 150 and a total weight of 190 using 3 items:
item with value 50 and weight 75,
item with value 50 and weight 59,
item with value 50 and weight 56,
More test cases: https://people.sc.fsu.edu/~jburkardt/datasets/knapsack_01/knapsack_01.html

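The problem here is that row 0 has to keep the results for the previous item during the whole pass over x, because larger values of x still read hash_table[0, x - weights[i]]. So row 1 should be copied back into row 0 for every x only after the pass for item i is finished, and for that you can use another loop: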
for i in range(1,size + 1):
    for x in range(0,W +1):
        if weights[i] > x:
            hash_table[(1,x)] = hash_table[0,x]
        else:
            hash_table[(1,x)] = max(hash_table[0,x],hash_table[0,x - weights[i]] + values[i])
    for x in range(0, W+1): # Make sure to reset after working on item i
        hash_table[(0,x)] = hash_table[(1,x)]
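A possible further reduction, sketched here as a variation and not as part of the answer above: a single list indexed by capacity is enough if x runs downwards, because best[x - weight] then still holds the value from the previous item when it is read. The function name knapsack_max_value and the (value, weight) tuple layout are illustrative assumptions; the data is the test case from the question.

```python
def knapsack_max_value(capacity, items):
    """Return the best total value for the 0/1 knapsack problem.

    items is a list of (value, weight) pairs; capacity is the weight limit.
    """
    best = [0] * (capacity + 1)
    for value, weight in items:
        # Walk x downwards so best[x - weight] has not yet been updated
        # for the current item (each item is used at most once).
        for x in range(capacity, weight - 1, -1):
            candidate = best[x - weight] + value
            if candidate > best[x]:
                best[x] = candidate
    return best[capacity]


# Test case from the question: capacity 190, six (value, weight) items.
items = [(50, 56), (50, 59), (64, 80), (46, 64), (50, 75), (5, 17)]
print(knapsack_max_value(190, items))
```

This keeps W + 1 integers in memory instead of two dictionary rows, at the cost of no longer being able to read back which items were chosen.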

Related

any tip to improve performance when using nested loops with python

So, I had this exercise where I would receive a list of integers and had to find how many pairs have a sum that is a multiple of 60.
example:
input: list01 = [10,90,50,40,30]
result = 2
explanation: 10 + 50, 90 + 30
example2:
input: list02 = [60,60,60]
result = 3
explanation: list02[0] + list02[1], list02[0] + list02[2], list02[1] + list02[2]
seems pretty easy, so here is my code:
def getPairCount(numbers):
    total = 0
    cont = 0
    for n in numbers:
        cont+=1
        for n2 in numbers[cont:]:
            if (n + n2) % 60 == 0:
                total += 1
    return total
It works; however, for a big input with 100k+ numbers it takes too long to run, and I need it to run in under 8 seconds. Any tips on how to solve this issue?
Either with another library that I'm unaware of, or by solving this without a nested loop.
Here's a simple solution that should be extremely fast (it runs in O(n) time). It makes use of the following observation: We only care about each value mod 60. E.g. 23 and 143 are effectively the same.
So rather than making an O(n**2) nested pass over the list, we instead count how many of each value we have, mod 60, so each value we count is in the range 0 - 59.
Once we have the counts, we can consider the pairs that sum to 0 or 60. The pairs that work are:
0 + 0
1 + 59
2 + 58
...
29 + 31
30 + 30
After this, the order is reversed, but we only want to count each pair once.
There are two cases where the values are the same: 0 + 0 and 30 + 30. For each of these, the number of pairs is (count * (count - 1)) // 2. Note that this works when count is 0 or 1, since in both cases we're multiplying by zero.
If the two values are different, then the number of cases is simply the product of their counts.
Here's the code:
def getPairCount(numbers):
    # Count how many of each value we have, mod 60
    count_list = [0] * 60
    for n in numbers:
        n2 = n % 60
        count_list[n2] += 1
    # Now find the total
    total = 0
    c0 = count_list[0]
    c30 = count_list[30]
    total += (c0 * (c0 - 1)) // 2
    total += (c30 * (c30 - 1)) // 2
    for i in range(1, 30):
        j = 60 - i
        total += count_list[i] * count_list[j]
    return total
This runs in O(n) time, due to the initial one-time pass we make over the list of input values. The loop at the end is just iterating from 1 through 29 and isn't nested, so it should run almost instantly.
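A quick way to gain confidence in the counting argument is to compare it against the nested-loop version on random input. The sketch below is only a sanity check; the names pair_count_bruteforce and pair_count_by_residue are made up for this example, and collections.Counter is used in place of the fixed-size list.

```python
import random
from collections import Counter


def pair_count_bruteforce(numbers):
    """Reference O(n**2) count of index pairs i < j whose sum is a multiple of 60."""
    total = 0
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            if (numbers[i] + numbers[j]) % 60 == 0:
                total += 1
    return total


def pair_count_by_residue(numbers):
    """O(n) count using residues mod 60, following the reasoning above."""
    counts = Counter(n % 60 for n in numbers)
    # Residues 0 and 30 pair with themselves.
    total = counts[0] * (counts[0] - 1) // 2
    total += counts[30] * (counts[30] - 1) // 2
    # Every other residue r pairs with 60 - r; a missing residue counts as 0.
    for r in range(1, 30):
        total += counts[r] * counts[60 - r]
    return total


random.seed(0)
sample = [random.randint(0, 1000) for _ in range(300)]
assert pair_count_by_residue(sample) == pair_count_bruteforce(sample)
print(pair_count_by_residue([10, 90, 50, 40, 30]))  # first example from the question
print(pair_count_by_residue([60, 60, 60]))          # second example from the question
```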
Below is a translation of Tom Karzes's answer but using numpy. I benchmarked it and it is only faster if the input is already a numpy array, not a list. I still want to write it here because it nicely shows how loops in python can be one-liners in numpy.
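import numpy as np
def get_pairs_count(numbers, /):
    # Count how many of each value we have, modulo 60.
    numbers_mod60 = np.mod(numbers, 60)
    # bincount with minlength=60 keeps one slot per residue, even for residues that never occur
    # (np.unique would only return counts for the residues actually present, so counts[30] could be wrong or out of range).
    counts = np.bincount(numbers_mod60, minlength=60)
    # Now find the total.
    total = 0
    c0 = counts[0]
    c30 = counts[30]
    total += (c0 * (c0 - 1)) // 2
    total += (c30 * (c30 - 1)) // 2
    total += np.dot(counts[1:30:+1], counts[59:30:-1]) # Notice the slicing indices used.
    return total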

How python [internally] retrieves elements from array and finds minimum

For this question http://www.spoj.com/problems/ACPC10D/ on SPOJ, I wrote a python solution as below:
count = 1
while True:
    no_rows = int(raw_input())
    if no_rows == 0:
        break
    grid = [[None for x in range(3)] for y in range(2)]
    input_arr = map(int, raw_input().split())
    grid[0][0] = 10000000
    grid[0][1] = input_arr[1]
    grid[0][2] = input_arr[1] + input_arr[2]
    r = 1
    for i in range(0, no_rows-1):
        input_arr = map(int, raw_input().split())
        _r = r ^ 1
        grid[r][0] = input_arr[0] + min(grid[_r][0], grid[_r][1])
        grid[r][1] = input_arr[1] + min(min(grid[_r][0], grid[r][0]), min(grid[_r][1], grid[_r][2]))
        grid[r][2] = input_arr[2] + min(min(grid[_r][1], grid[r][1]), grid[_r][2])
        r = _r
    print str(count) + ". " + str(grid[(no_rows -1) & 1][1])
    count += 1
The above code exceeds the time limit. However, when I change the line
grid[r][2] = input_arr[2] + min(min(grid[_r][1], grid[r][1]), grid[_r][2])
to
grid[r][2] = input_arr[2] + min(min(grid[_r][1], grid[_r][2]), grid[r][1])
the solution is accepted. The difference is that the first line takes the minimum of grid[_r][1] and grid[r][1] first (i.e. the row numbers are different), while the second line takes the minimum of grid[_r][1] and grid[_r][2] first (i.e. the row numbers are the same).
This behaviour is consistent. I want to understand how Python processes those two lines, so that one exceeds the time limit while the other is fine.
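One way to investigate locally, offered only as a measurement sketch and not as an explanation of the judge's verdict: time the two orderings in isolation with timeit. The lists prev_row and cur_row and the variable cell are stand-ins for grid[_r], grid[r] and input_arr[2]; the snippet is written for Python 3, so timings on the judge's Python 2 interpreter may differ.

```python
import timeit

prev_row = [7, 3, 9]   # stands in for grid[_r]
cur_row = [5, 4, 0]    # stands in for grid[r]; only index 1 is read here
cell = 2               # stands in for input_arr[2]


def mixed_rows_first():
    # Ordering from the version that exceeded the time limit.
    return cell + min(min(prev_row[1], cur_row[1]), prev_row[2])


def same_row_first():
    # Ordering from the accepted version.
    return cell + min(min(prev_row[1], prev_row[2]), cur_row[1])


# min is associative, so both orderings must agree on the value.
assert mixed_rows_first() == same_row_first()

for fn in (mixed_rows_first, same_row_first):
    seconds = timeit.timeit(fn, number=200_000)
    print(fn.__name__, round(seconds, 4))
```

If the two timings come out essentially equal, the difference seen on the judge is more likely to come from somewhere other than these two lines.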

Interpolate between elements in an array of floats

I'm getting a list of 5 floats which I would like to use as values to send pwm to an LED. I want to ramp smoothly in a variable amount of milliseconds between the elements in the array.
So if this is my array...
list = [1.222, 3.111, 0.456, 9.222, 22.333]
I want to ramp from 1.222 to 3.111 over say 3000 milliseconds, then from 3.111 to 0.456 over the same amount of time, and when it gets to the end of the list I want the 5th element of the list to ramp to the 1st element of the list and continue indefinitely.
Are you thinking of something like this?
import time
l = [1.222, 3.111, 0.456, 9.222, 22.333]
def play_led(value):
    #here should be the led- code
    print value
def calc_ramp(given_list, interval_count):
    new_list = []
    len_list = len(given_list)
    for i in range(len_list):
        first = given_list[i]
        second = given_list[(i+1) % len_list]
        delta = (second - first) / interval_count
        for j in range(interval_count):
            new_list.append(first + j * delta)
    return new_list
def endless_play_led(ramp_list,count):
    endless = count == 0
    count = abs(count)
    while endless or count!=0:
        for i in range(len(ramp_list)):
            play_led(ramp_list[i])
            #time.sleep(1)
        if not endless:
            count -= 1
        print '##############',count
endless_play_led(calc_ramp(l, 3),2)
endless_play_led(calc_ramp(l, 3),-2)
endless_play_led(calc_ramp(l, 3),0)
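The same ramp can also be written as a generator, which avoids building the whole list up front and wraps from the last element to the first indefinitely. This is a Python 3 sketch; ramp_values and steps_per_segment are names chosen for the example, and the round numbers are used only to make the output easy to check.

```python
import itertools


def ramp_values(levels, steps_per_segment):
    """Yield interpolated values forever, wrapping from the last level to the first."""
    n = len(levels)
    for i in itertools.cycle(range(n)):
        start = levels[i]
        end = levels[(i + 1) % n]
        # Each segment yields its start value and the intermediate steps;
        # the end value is yielded as the start of the next segment.
        for step in range(steps_per_segment):
            yield start + (end - start) * step / steps_per_segment


levels = [0.0, 3.0, 6.0]
first_cycle = list(itertools.islice(ramp_values(levels, 3), 9))
print(first_cycle)
```

A caller driving an LED would typically sleep for the segment duration divided by steps_per_segment between values, for example 3000 ms / steps_per_segment for the timing described in the question.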
another version, similar to the version of dsgdfg (based on his/her idea), but without timing lag:
import time
list_of_ramp = [1.222, 3.111, 0.456, 9.222, 22.333]
def play_LED(value):
    s = ''
    for i in range(int(value*4)):
        s += '*'
    print s, value
def interpol(first, second, fract):
    return first + (second - first)*fract
def find_borders(list_of_values, total_time, time_per_step):
    len_list = len(list_of_values)
    total_steps = total_time // time_per_step
    fract = (total_time - total_steps * time_per_step) / float(time_per_step)
    index1 = int(total_steps % len_list)
    return [list_of_values[index1], list_of_values[(index1 + 1) % len_list], fract]
def start_program(list_of_values, time_per_step, relax_time):
    total_start = time.time()
    while True:
        last_time = time.time()
        while time.time() - last_time < relax_time:
            pass
        x = find_borders(list_of_values,time.time(),time_per_step)
        play_LED(interpol(x[0],x[1],x[2]))
start_program(list_of_ramp,time_per_step=5,relax_time=0.5)

leading number groups between two numbers

(Python) Given two numbers A and B, I need to find all nested "groups" of numbers:
range(2169800, 2171194)
leading numbers: 21698XX, 21699XX, 2170XX, 21710XX, 217110X, 217111X,
217112X, 217113X, 217114X, 217115X, 217116X, 217117X, 217118X, 2171190X,
2171191X, 2171192X, 2171193X, 2171194X
or like this:
range(1000, 1452)
leading numbers: 10XX, 11XX, 12XX, 13XX, 140X, 141X, 142X, 143X,
144X, 1450, 1451, 1452
Harder than it first looked - pretty sure this is solid and will handle most boundary conditions. :) (There are few!!)
def leading(a, b):
    # generate digit pairs a=123, b=456 -> [(1, 4), (2, 5), (3, 6)]
    zip_digits = zip(str(a), str(b))
    zip_digits = map(lambda (x,y):(int(x), int(y)), zip_digits)
    # this ignores problems where the last matching digits are 0 and 9
    # leading (12000, 12999) is same as leading(12, 12)
    while(zip_digits[-1] == (0,9)):
        zip_digits.pop()
    # start recursion
    return compute_leading(zip_digits)
def compute_leading(zip_digits):
    if(len(zip_digits) == 1): # 1 digit case is simple!! :)
        (a,b) = zip_digits.pop()
        return range(a, b+1)
    #now we partition the problem
    # given leading(123,456) we decompose this into 3 problems
    # lows -> leading(123,129)
    # middle -> leading(130,449) which we can recurse to leading(13,44)
    # highs -> leading(450,456)
    last_digits = zip_digits.pop()
    low_prefix = reduce(lambda x, y : 10 * x + y, [tup[0] for tup in zip_digits]) * 10 # base for lows e.g. 120
    high_prefix = reduce(lambda x, y : 10 * x + y, [tup[1] for tup in zip_digits]) * 10 # base for highs e.g. 450
    lows = range(low_prefix + last_digits[0], low_prefix + 10)
    highs = range(high_prefix + 0, high_prefix + last_digits[1] + 1)
    #check for boundary cases where lows or highs have all ten digits
    (a,b) = zip_digits.pop() # pop last digits of middle so they can be adjusted
    if len(lows) == 10:
        lows = []
    else:
        a = a + 1
    if len(highs) == 10:
        highs = []
    else:
        b = b - 1
    zip_digits.append((a,b)) # push back last digits of middle after adjustments
    return lows + compute_leading(zip_digits) + highs # and recurse - woohoo!!
print leading(199,411)
print leading(2169800, 2171194)
print leading(1000, 1452)
def foo(start, end):
    index = 0
    is_lower = False
    while index < len(start):
        if is_lower and start[index] == '0':
            break
        if not is_lower and start[index] < end[index]:
            first_lower = index
            is_lower = True
        index += 1
    return index-1, first_lower
start = '2169800'
end = '2171194'
result = []
while int(start) < int(end):
    index, first_lower = foo(start, end)
    range_end = index > first_lower and 10 or int(end[first_lower])
    for x in range(int(start[index]), range_end):
        result.append(start[:index] + str(x) + 'X'*(len(start)-index-1))
    if range_end == 10:
        start = str(int(start[:index])+1)+'0'+start[index+1:]
    else:
        start = start[:index] + str(range_end) + start[index+1:]
result.append(end)
print "Leading numbers:"
print result
I tested it with the examples you gave and the results are right. Hope this helps.
This should give you a good starting point :
def leading(start, end):
    leading = []
    hundreds = start // 100
    while (end - hundreds * 100) > 100:
        i = hundreds * 100
        leading.append(range(i,i+100))
        hundreds += 1
    c = hundreds * 100
    tens = 1
    while (end - c - tens * 10) > 10:
        i = c + tens * 10
        leading.append(range(i, i + 10))
        tens += 1
    c += tens * 10
    ones = 1
    while (end - c - ones) > 0:
        i = c + ones
        leading.append(i)
        ones += 1
    leading.append(end)
    return leading
Ok, the whole could be one loop-level deeper. But I thought it might be clearer this way. Hope, this helps you...
Update :
Now I see what you want. Furthermore, maria's code doesn't seem to be working for me. (Sorry...)
So please consider the following code :
def leading(start, end):
    depth = 2
    while 10 ** depth > end : depth -=1
    leading = []
    const = 0
    coeff = start // 10 ** depth
    while depth >= 0:
        while (end - const - coeff * 10 ** depth) >= 10 ** depth:
            leading.append(str(const / 10 ** depth + coeff) + "X" * depth)
            coeff += 1
        const += coeff * 10 ** depth
        coeff = 0
        depth -= 1
    leading.append(end)
    return leading
print leading(199,411)
print leading(2169800, 2171194)
print leading(1000, 1453)
print leading(1,12)
Now, let me try to explain the approach here.
The algorithm will try to find "end" starting from value "start" and check whether "end" is in the next 10^2 (which is 100 in this case). If it fails, it will make a leap of 10^2 until it succeeds. When it succeeds it will go one depth level lower. That is, it will make leaps one order of magnitude smaller. And loop that way until the depth is equal to zero (= leaps of 10^0 = 1). The algorithm stops when it reaches the "end" value.
You may also notice that I have implemented the wrapping loop I mentioned, so it is now possible to define the starting depth (or leap size) in a variable.
The first while loop makes sure the first leap does not overshoot the "end" value.
If you have any questions, just feel free to ask.
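For comparison, here is a different and shorter approach, sketched in Python 3 under the assumption that both endpoints are inclusive (as in the 1000 to 1452 example) and that start is a positive integer: at each position take the largest block of size 10**k that starts exactly at the current number and still fits inside the range. The name leading_groups is made up for this example.

```python
def leading_groups(start, end):
    """Cover the inclusive range [start, end] with the largest aligned blocks of size 10**k."""
    groups = []
    current = start
    while current <= end:
        k = 0
        # Grow the block while it stays aligned at `current` and inside the range.
        while current % 10 ** (k + 1) == 0 and current + 10 ** (k + 1) - 1 <= end:
            k += 1
        # The block covers current .. current + 10**k - 1: prefix digits plus k wildcards.
        groups.append(str(current // 10 ** k) + "X" * k)
        current += 10 ** k
    return groups


print(leading_groups(1000, 1452))
```

Because each step picks the biggest block that fits, the prefixes first get shorter while moving away from start and then longer again when approaching end.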

how to group a data in python

I have a file with data like:
Entry Freq.
2 4.5
3 3.4
5 4.9
8 9.1
12 11.1
16 13.1
18 12.2
22 11.2
now the problem I am trying to solve is: I want to make it a grouped data (with range 10) based on the Entry and want to add up the frequencies falling within the range.
e.g. for above table if I group it then it should be like:
Range SumFreq.
0-10 21.9(i.e. 4.5 + 3.4 + 4.9 + 9.1)
11-20 36.4
I got as far as separating the columns with the following code, but I can't work out how to do the range grouping:
My code is:
inp = open("c:/usr/ovisek/desktop/file.txt",'r').read().strip().split('\n')
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
So far it is fine, but how can I group the data into ranges of 10?
One way would be to [ab]use the fact that integer division will give you the right bins:
import collections
bin_size = 10
d = collections.defaultdict(float)
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    d[bin_size * (k // bin_size)] += l
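To see what the integer-division trick produces, here is a self-contained Python 3 sketch that uses the table from the question as an in-memory list instead of reading the file. The variable names rows and sums are chosen for the example.

```python
from collections import defaultdict

# (Entry, Freq.) rows copied from the question's table.
rows = [(2, 4.5), (3, 3.4), (5, 4.9), (8, 9.1),
        (12, 11.1), (16, 13.1), (18, 12.2), (22, 11.2)]
bin_size = 10

sums = defaultdict(float)
for entry, freq in rows:
    # Integer division maps 0-9 to bin 0, 10-19 to bin 10, 20-29 to bin 20, ...
    sums[bin_size * (entry // bin_size)] += freq

for low in sorted(sums):
    print("%d-%d\t%.1f" % (low, low + bin_size - 1, sums[low]))
```

Note that with this binning an entry of exactly 10 falls into the 10-19 bin, whereas the ranges in the question (0-10, 11-20) would put it in the first one; using (entry - 1) // bin_size instead shifts the boundaries to 1-10, 11-20 and so on.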
How about, just adding to your code there:
def group_data(range):
    grouped_data = {}
    inp = open("c:/usr/ovisek/desktop/file.txt",'r').read().strip().split('\n')
    for line in map(str.split,inp):
        k = int(line[0])
        l = float(line[-1])
        range_value = k // range
        if grouped_data.has_key(range_value):
            grouped_data[range_value]['freq'] = grouped_data[range_value]['freq'] + l
        else:
            grouped_data[range_value] = {'freq':l, 'value':[str(range_value * range) + ':' + str((range_value + 1) * range )]}
    return grouped_data
This should give you a dictionary like:
{0 : {'value':['0:10'], 'freq':21.9} , .... }
This should get you started, tested fine:
inp = open("/tmp/input.txt",'r').read().strip().split('\n')
interval = 10
index = 0
resultDict = {}
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    rangeNum = (int) ((k-1)/10 )
    rangeKeyName = str(rangeNum*10+1)+"-"+str((rangeNum+1)*10)
    if(rangeKeyName in resultDict):
        resultDict[rangeKeyName] += l
    else:
        resultDict[rangeKeyName] = l
print(str(resultDict))
Would output:
{'21-30': 11.199999999999999, '11-20': 36.399999999999999, '1-10': 21.899999999999999}
You can do something like this:
fr = {}
inp = open("file.txt",'r').read().strip().split('\n')
for line in map(str.split,inp):
    k = int(line[0])
    l = float(line[-1])
    key = abs(k-1) / 10 * 10
    if fr.has_key(key):
        fr[key] += l
    else:
        fr[key] = l
for k in sorted(fr.keys()):
    sum = fr[k]
    print '%d-%d\t%f' % (k+1 if k else 0, k+10, sum)
output:
0-10 21.900000
11-20 36.400000
21-30 11.200000
