The problem is similar to the coin change problem, but a little different.
The problem is stated as: You have a collection of coins, and you know the value of each coin type and how many coins of each type you have. You want to know how many distinct sums you can make from non-empty groupings of these coins.
For example, with coins = [1, 2, 3] and quantity = [1, 2, 2], there are 11 possible sums: every number from 1 to 11.
The array coins has at most 20 elements, but quantity[x] can go up to 10^5.
What would be an efficient algorithm for this? Enumerating all possible combinations for such large quantities would take forever. Is there a mathematical formula that can determine the answer? I don't see how one would work, especially since the problem asks for distinct sums.
I was thinking of generating an array based on the coins and their quantities, basically each coin's multiples:
[[1],
 [2, 4],
 [3, 6]]
Then I would have to select one element or none from each of the arrays:
1
1,2
1,4
1,3
...
1,4,6
I can't seem to think of a good algorithm to perform that, though. Nested loops might be too slow, since there could be 20 different coins and each coin could have a large quantity.
Another possible solution is looping from 1 to maximum, where maximum is the sum of each coin's value times its quantity. The problem then is determining whether a subset exists that adds up to each number. I know there is a dynamic programming algorithm (subset sum) that determines whether a subset adds up to a given value, but what would the array be?
For this example it works fine: with the list [1, 2, 4, 3, 6] and a target sum of 11, counting the True entries in the DP table gives 11. But for coins = [10, 50, 100] and quantity = [1, 2, 1], the answer is 9 possible sums, whereas the subset sum DP gives 21 True entries, whether the list provided is [10, 50, 100, 100] or [10, 50, 100], based on [[10], [50, 100], [100]].
A Python solution would be preferred, but is not necessary.
Below is my current code, which returns 21 for the [10, 50, 100] coins example.
def possibleSums(coins, quantity):
    def subsetSum(arr,s):
        dp = [False] * (s + 1)
        dp[0] = True
        for num in sorted(arr):
            for i in range(1, len(dp)):
                if num <= i:
                    dp[i] = dp[i] or dp[i - num]
        return sum(dp)

    maximum = sum((map(lambda t: t[0] * t[1], zip(coins, quantity))))
    combinations = [[]]*len(coins)
    for i,c in enumerate(coins):
        combinations[i] = [ j for j in range(c,(c*quantity[i])+1,c) ]
    array = []
    for item in combinations:
        array.extend(item)
    print(subsetSum(array,maximum) - 1)
Guaranteed constraints:
1 ≤ coins.length ≤ 20,
1 ≤ coins[i] ≤ 10^4.
quantity.length = coins.length,
1 ≤ quantity[i] ≤ 10^5.
It is guaranteed that (quantity[0] + 1) * (quantity[1] + 1) * ... * (quantity[quantity.length - 1] + 1) <= 10^6.
Bug fix
Your original solution is fine, except that you need to iterate in reverse order so that the same entry of arr cannot be added more than once.
Simply change the loops to:
for num in sorted(arr):
    for i in range(len(dp)-1,-1,-1):
        if num <= i:
            dp[i] = dp[i] or dp[i - num]
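For reference, a complete version with the reversed loop might look like the sketch below. One assumption differs from the question's code: the item list holds every coin once per available copy instead of the multiples c, 2c, ..., because a 0/1 pass treats each entry as a separate item, and multiples such as 2 and 4 could otherwise be combined into 6 even when only two 2-coins exist. The name possible_sums_01 is made up for this illustration.

```python
def possible_sums_01(coins, quantity):
    """Count distinct non-empty sums with a 0/1 subset-sum pass."""
    # Assumption: one list entry per physical coin, not per multiple.
    items = [c for c, q in zip(coins, quantity) for _ in range(q)]
    maximum = sum(items)
    dp = [False] * (maximum + 1)
    dp[0] = True  # the empty selection
    for num in items:
        # Scan downwards so each entry is used at most once.
        for i in range(maximum, num - 1, -1):
            if dp[i - num]:
                dp[i] = True
    return sum(dp) - 1  # drop the empty sum 0


print(possible_sums_01([10, 50, 100], [1, 2, 1]))  # 9
print(possible_sums_01([1, 2, 3], [1, 2, 2]))      # 11
```

The running time grows with maximum times the total number of coins, so this is only practical when the quantities are small.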
More efficient solution
You can also reduce the complexity by taking advantage of the multiple coins with the same value: for each coin, scan up each possible remainder (modulo the coin value) in turn:
def possibleSums2(coins, quantity):
    maximum = sum((map(lambda t: t[0] * t[1], zip(coins, quantity))))
    dp = [False] * (maximum + 1)
    dp[0] = True
    for coin,q in zip(coins,quantity):
        for b in range(coin):
            # num = coins of this type added since the last sum that was
            # reachable without them (-1: no such sum seen yet)
            num = -1
            for i in range(b,maximum+1,coin):
                if dp[i]:
                    num = 0
                elif num>=0:
                    num += 1
                dp[i] = 0 <= num <= q
    print(sum(dp) - 1)
This has complexity O(maximum * coins) instead of O(maximum * coins * quantity), where coins is the number of coin types.
Don't gather all the combinations, just the sums.
Your set of sums starts with [0]. Cycle through the coins, one at a time. For each coin, iterate through its quantity, adding that multiple to each item of the set. Set-add each of these sums to the set. For example, let's take that original case: coins = [1, 2, 3], quant = [1, 2, 2]. Walking through this ...
sum_set = {0}
current_coin = 1; # coin[0]
current_quant = 1; # quant[0]
This step is trivial ... add 1 to each element of the set. This gives you {1}.
Add that to the existing set. You now have
sum_set = {0, 1}
Next coin:
current_coin = 2; # coin[1]
current_quant = 2; # quant[1]
Now, you have two items to add to each set element: 1*2, giving you {2, 3}; and 2*2, giving you {4, 5}.
Add these to the original set:
sum_set = {0, 1, 2, 3, 4, 5}
Final coin:
current_coin = 3; # coin[2]
current_quant = 2; # quant[2]
You add 1*3 and 2*3 to each set element, giving you {3, 4, 5, 6, 7, 8} and {6, 7, 8, 9, 10, 11}.
Adding these to the sum_set gives you the set of integers 0 through 11.
Remove 0 from the set (since we're not interested in that sum) and take the size of the remaining set. 11 is your answer.
Is that enough to let you turn this into an algorithm? I'll leave the various efficiencies up to you.
I was going to put up a solution using generating functions, but then you added
It is guaranteed that (quantity[0] + 1) * (quantity[1] + 1) * ... * (quantity[quantity.length - 1] + 1) <= 10^6
In that case, just brute force it! Go through every possible set of coins, compute the sum, and use a set to find how many unique sums you get. 10^6 possibilities is trivial.
As for the generating function solution, we can represent the sums possible with a quantity Q of coins of value V through the polynomial
1 + x^V + x^(2V) + ... + x^(QV)
where a term with exponent N means a sum of value N can be achieved.
If we then multiply two polynomials, for example
(1 + x^(V1) + x^(2*V1) + ... + x^(Q1*V1))(1 + x^(V2) + x^(2*V2) + ... + x^(Q2*V2))
the presence of a term with exponent N in the product means that a sum of value N can be achieved by combining the coins corresponding to the input polynomials.
Efficiency then comes down to how we multiply polynomials. If we use dicts or sets to efficiently look up terms by exponent, we can win over brute force by combining like terms to eliminate some of the redundant work brute force does. We can discard the coefficients, since we don't need them. Advanced polynomial multiplication algorithms based on a number-theoretic transform may give further savings in some cases.
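As a rough illustration of the coefficient-free multiplication described above, a Python integer can stand in for the polynomial, with bit N set when a term x^N is present. Multiplying by x^V is then a left shift by V, and adding polynomials while discarding coefficients is a bitwise OR. The function name count_sums_bitset is hypothetical, and this is only one possible encoding, not the number-theoretic transform mentioned above.

```python
def count_sums_bitset(values, counts):
    """Count distinct non-empty sums; bit N of `poly` marks a term x^N."""
    poly = 1  # the polynomial "1": only the empty sum 0 is reachable
    for v, q in zip(values, counts):
        # Multiply by (1 + x^v) q times, dropping coefficients each time.
        # Exponent-wise this matches multiplying by 1 + x^v + ... + x^(q*v).
        for _ in range(q):
            poly |= poly << v
    return bin(poly).count("1") - 1  # ignore the x^0 term


print(count_sums_bitset([10, 50, 100], [1, 2, 1]))  # 9
print(count_sums_bitset([1, 2, 3], [1, 2, 2]))      # 11
```

Each shift touches the whole integer, so the cost is roughly the total number of coins times the size of the largest sum in machine words.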
Here's a concise brute-force solution (Python 3):
def numsums(values, counts):
    from itertools import product
    choices = [range(0, v*c+1, v) for v, c in zip(values, counts)]
    sums = {sum(p) for p in product(*choices)}
    return len(sums) - 1 # sum "0" isn't interesting
Then, e.g.,
print(numsums([10,50,100], [1, 2, 1])) # 9
print(numsums([1, 2, 3], [1, 2, 2])) # 11
print(numsums([1, 2, 4, 8, 16, 32], [1]*6)) # 63
Eliminating duplicates along the way
This variation is functionally equivalent to some other answers; it's just showing how to do it as a variation of the brute-force way:
def numsums(values, counts):
    sums = {0}
    for v, c in zip(values, counts):
        sums |= {i + choice
                 for choice in range(v, v*c+1, v)
                 for i in sums}
    return len(sums) - 1 # sum "0" isn't interesting
In fact, if you squint just right ;-) , you can view it as one way of implementing @user2357112's polynomial multiplication idea, where "multiplication" has been redefined just to keep track of "is a term with this exponent present or not?" ("yes" if and only if the exponent is in the sums set). Then the outer loop is "multiplying" the polynomial so far by the polynomial corresponding to the current (value, count) pair, and the multiplication by the x**0 term is implicit in the |= union. Although, ya, it's easier to understand if you skip that "explanation" ;-)
This one is even more optimized
function possibleSums(coins, quantity) {
  // calculate the maximum possible sum
  var max = coins.reduce(function(s, c, i) {
    s += c * quantity[i];
    return s;
  }, 0);

  var sums = [0];
  var seen = new Map();
  for (var j = 0; j < coins.length; j++) {
    var coin = coins[j];
    var n = sums.length;
    for (var i = 0; i < n; i++) {
      var s = sums[i];
      for (var k = 0; k < quantity[j]; k++) {
        s += coin;
        if (max < s) break;
        if (!seen.has(s)) {
          seen.set(s, true);
          sums.push(s);
        }
      }
    }
  }
  return Array.from(seen.keys()).length;
}
Easy Python solution
Note: using dynamic programming and finding all sums may exceed the time limit.
def possibleSums(coins, quantity):
    combinations = {0}
    for c,q in zip(coins, quantity):
        combinations = {j+i*c for j in combinations for i in range(q+1)}
    return len(combinations)-1
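Hmm, it's a very interesting problem.
If you just want the count, use possibleSums(). Note that it counts the non-empty selections of coins, not the distinct sums: for coins 10, 50, 100 with quantities 1, 2, 1 it returns 11, while only 9 of those selections have different totals.
To view all cases, use possibleCases().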
import itertools

coins = ['10', '50', '100']
quantity = [1, 2, 1]
# coins = ['A', 'B', 'C', 'D']
# quantity = [1, 2, 2, 1]

def possibleSums(coins, quantity):
    totalcnt=1
    for i in quantity:
        totalcnt = totalcnt * (i+1)
    return totalcnt-1 # empty case remove

def possibleCases(coins, quantity):
    coinlist = []
    for i in range(len(coins)):
        cset=[]
        for j in range(quantity[i]+1):
            val = [coins[i]] * j
            cset.append(val)
        coinlist.append(cset)
    print('coinlist=', coinlist)
    # combination the coinlist
    # cases=combcase(coinlist)
    # return cases
    alllist = list(itertools.product(*coinlist))
    caselist = []
    for x in alllist:
        mergelist = list(itertools.chain(*x))
        if len(mergelist)==0 : # skip empty select.
            continue
        caselist.append(mergelist)
    return caselist

total = possibleSums(coins, quantity)
print( 'sum=', total)
cases = possibleCases(coins, quantity)
cases.sort(key=len, reverse=True)
cases.reverse()
print('count=', len(cases))
for i, x in enumerate(cases):
    print('case',(i+1), x)
The output is:
sum= 11
coinlist= [[[], ['10']], [[], ['50'], ['50', '50']], [[], ['100']]]
count= 11
case 1 ['10']
case 2 ['50']
case 3 ['100']
case 4 ['10', '50']
case 5 ['10', '100']
case 6 ['50', '50']
case 7 ['50', '100']
case 8 ['10', '50', '50']
case 9 ['10', '50', '100']
case 10 ['50', '50', '100']
case 11 ['10', '50', '50', '100']
you can test other cases.
coins = ['A', 'B', 'C', 'D']
quantity = [1, 3, 2, 1]
sum= 47
coinlist= [[[], ['A']], [[], ['B'], ['B', 'B'], ['B', 'B', 'B']], [[], ['C'], ['C', 'C']], [[], ['D']]]
count= 47
case 1 ['A']
case 2 ['B']
case 3 ['C']
case 4 ['D']
case 5 ['A', 'B']
case 6 ['A', 'C']
case 7 ['A', 'D']
case 8 ['B', 'B']
case 9 ['B', 'C']
case 10 ['B', 'D']
case 11 ['C', 'C']
case 12 ['C', 'D']
case 13 ['A', 'B', 'B']
case 14 ['A', 'B', 'C']
case 15 ['A', 'B', 'D']
case 16 ['A', 'C', 'C']
case 17 ['A', 'C', 'D']
case 18 ['B', 'B', 'B']
case 19 ['B', 'B', 'C']
case 20 ['B', 'B', 'D']
case 21 ['B', 'C', 'C']
case 22 ['B', 'C', 'D']
case 23 ['C', 'C', 'D']
case 24 ['A', 'B', 'B', 'B']
case 25 ['A', 'B', 'B', 'C']
case 26 ['A', 'B', 'B', 'D']
case 27 ['A', 'B', 'C', 'C']
case 28 ['A', 'B', 'C', 'D']
case 29 ['A', 'C', 'C', 'D']
case 30 ['B', 'B', 'B', 'C']
case 31 ['B', 'B', 'B', 'D']
case 32 ['B', 'B', 'C', 'C']
case 33 ['B', 'B', 'C', 'D']
case 34 ['B', 'C', 'C', 'D']
case 35 ['A', 'B', 'B', 'B', 'C']
case 36 ['A', 'B', 'B', 'B', 'D']
case 37 ['A', 'B', 'B', 'C', 'C']
case 38 ['A', 'B', 'B', 'C', 'D']
case 39 ['A', 'B', 'C', 'C', 'D']
case 40 ['B', 'B', 'B', 'C', 'C']
case 41 ['B', 'B', 'B', 'C', 'D']
case 42 ['B', 'B', 'C', 'C', 'D']
case 43 ['A', 'B', 'B', 'B', 'C', 'C']
case 44 ['A', 'B', 'B', 'B', 'C', 'D']
case 45 ['A', 'B', 'B', 'C', 'C', 'D']
case 46 ['B', 'B', 'B', 'C', 'C', 'D']
case 47 ['A', 'B', 'B', 'B', 'C', 'C', 'D']
This is a JavaScript version of Peter de Rives's answer, but a little more efficient: when scanning the remainders for a coin, it only iterates up to the running maximum reachable with the coins processed so far, not up to the overall maximum.
function possibleSums(coins, quantity) {
  // calculate running max sums
  var prevmax = 0;
  var maxs = [];
  for (var i = 0; i < coins.length; i++) {
    maxs[i] = prevmax + coins[i] * quantity[i];
    prevmax = maxs[i];
  }

  var dp = [true];
  for (var i = 0; i < coins.length; i++) {
    var max = maxs[i];
    var coin = coins[i];
    var qty = quantity[i];
    for (var j = 0; j < coin; j++) {
      var num = -1;
      // only find remainders in range 0 to maxs[i];
      for (var k = j; k <= max; k += coin) {
        if (dp[k]) {
          num = 0;
        }
        else if (num >= 0) {
          num++;
        }
        dp[k] = 0 <= num && num <= qty;
      }
    }
  }
  return dp.filter(e => e).length - 1;
}
def possibleSums(coins, quantity) -> int:
    from itertools import combinations
    flat_list = []
    for coin, q in zip(coins, quantity):
        flat_list += [coin]*q
    uniq_sums = set()
    for i in range(1, len(flat_list)+1):
        for c in combinations(flat_list, i):
            uniq_sums.add(sum(c))
    return len(uniq_sums)
Translating jumarov's code to Python gives the following:
def possibleSums4(coins, quantity=None):
    if quantity is None:
        coins, quantity = zip(*coins.items())
    maximum = sum(i*j for i,j in zip(coins, quantity))
    dp = {0}
    for c, q in zip(coins, quantity):
        for b in range(c):
            num = -1
            for i in range(b, maximum + 1, c):
                if i in dp:
                    num = 0
                elif num >= 0:
                    num += 1
                if 0 <= num <= q:
                    dp.add(i)
    return len(dp) - 1
What interests me about this is that for a given set of coins, once they all reach a certain multiplicity, the number of sub-sums appears to behave very regularly: adding another coin increases the number of possible sums by the value of that coin.
Consider the coin set {4, 5, 7}. When each is used at most once, the possible sums are {0, 4, 5, 7, 9, 11, 12, 16}. When each is used up to twice, there are 25 possibilities:
25: 0; 4-5; 7-25; 27-28; 32
If any more coins are added, the number of possibilities increases by the value of the coin(s) added. Here I use a couple of routines to help display the values (which can be confirmed with other routines presented in answers to this question):
>>> show(Set(4,5,7)**2*Set(4)) # Set(1,2)**2 -> exponents in ((1+x)*(1+x^2))^2
'29: 0; 4-5; 7-29; 31-32; 36'
>>> show(Set(4,5,7)**2*Set(5))
'30: 0; 4-5; 7-30; 32-33; 37'
>>> show(Set(4,5,7)**2*Set(7))
'32: 0; 4-5; 7-32; 34-35; 39'
>>> show(Set(4,5,7)**2*Set(4,5,7))
'41: 0; 4-5; 7-41; 43-44; 48'
Notice that the first number exceeds 25 (the number of sums present at multiplicity 2, when the structure of the sums becomes "stable") by the value of the coin(s) added. For example, adding 1 more of each (4, 5 and 7, which sum to 16) gives a total of 25 + 16 = 41 possible sums. By "stable" I mean that the basic structure of the ranges is fixed and only the length of the varying range changes. In this case, the structure at (and after) a multiplicity of 2 is "singleton, range of 2, varying range, range of 2, singleton", which is symmetric.
Not every set becomes stable so quickly, however, though even large sets of "coin" denominations may do so: the following 42 denominations give 10366 possible sums when the multiplicity is 5:
{3, 4, 5, 9, 16, 18, 20, 21, 23, 24, 25, 29, 31, 33, 34, 38, 39, 44,
47, 49, 50, 52, 55, 56, 57, 60, 61, 63, 64, 65, 68, 69, 70, 75, 78,
80, 81, 85, 88, 94, 95, 96}
Since the denominations sum to 2074, the number of sums increases by 2074 each time the multiplicity of every coin is increased by 1. So if each coin is used 10 more times (above the stable 5), the number of sums is 10366 + 10*2074 = 31106.
The small set of coins {18, 93, 100} gives 4886 unique sums at multiplicity 26; here a significantly larger multiplicity than 5 is needed to achieve stability.
So is there a formula? I don't know how to predict the number of sums when coins are used at sub-stable counts (i.e. fewer than the number needed for the number of possible sums to increase in a predictable way), and I don't know how to predict when the structure will become stable (though, for two coins, it seems to be at a value 1 less than the largest coin value). But once any coins are used more than the stable multiplicity value (a function of the coins in use), the number of sums (and the actual sums) appears to be easy to predict.
Of course, this all applies to determining which exponents are present when a polynomial with positive coefficients is raised to some power. What is special about the "coin polynomial" is that it is a product of binomial powers (1 + x^c_1)^n_1 * ... * (1 + x^c_k)^n_k, where the coin values are c_i and the multiplicities are n_i. The "stable multiplicity" is the exponent m for ((1 + x^c_1) * ... * (1 + x^c_k))^m which gives a predictable structure for the exponents that are present when any coin has a multiplicity greater than m.
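The show and Set routines used above are not listed in the answer. A rough stand-in, assuming every coin is available the same number of times m, could look like the sketch below; sums_at_multiplicity and this version of show are reconstructions for illustration only, not the author's helpers.

```python
def sums_at_multiplicity(coins, m):
    """All sums (including 0) when each coin may be used 0..m times."""
    sums = {0}
    for c in coins:
        sums = {s + k * c for s in sums for k in range(m + 1)}
    return sums


def show(sums):
    """Format a set of sums as 'count: run; run; ...' with a-b ranges."""
    values = sorted(sums)
    runs = []
    start = prev = values[0]
    for v in values[1:]:
        if v != prev + 1:  # a gap closes the current run
            runs.append((start, prev))
            start = v
        prev = v
    runs.append((start, prev))
    text = "; ".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)
    return f"{len(values)}: {text}"


print(show(sums_at_multiplicity([4, 5, 7], 2)))  # 25: 0; 4-5; 7-25; 27-28; 32
print(show(sums_at_multiplicity([4, 5, 7], 3)))  # 41: 0; 4-5; 7-41; 43-44; 48
```

Calling it for increasing m is a simple way to check empirically where the count starts growing by the sum of the coin values.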
Related
If there are two lists:
One being the items:
items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
The other being their indexes:
index = [0, 15, 20, 2, 16, 7, 17]
ie. The first 'A' is in index 0, the second 'A' is in index 15, etc.
How would I be able to get the closest indexes for the unique items, A, B, and C
ie. Get 15, 16, 17?
You can achieve this with a simple script. Taking those two lists as input, you just find the first position of each letter in the letter list and its corresponding entry in the number list:
# list_letter and list_number are assumed to be the question's items and index lists
list_of_repeated=[]
list_of_closest=[]
for letter in list_letter:
    if letter in list_of_repeated:
        continue
    else:
        list_of_repeated.append(letter)
        list_of_closest.append(list_number[list_letter.index(letter)])
What you are trying to do is minimize the sum of differences between indices.
You can find the minimal combination like this:
import numpy as np
from itertools import product
items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
index = [0, 15, 20, 2, 16, 7, 17]
cost = np.inf
for combination in product(*[list(filter(lambda x: x[0] == i, zip(items, index))) for i in set(items)]):
    diff = sum(abs(np.ediff1d([i[1] for i in combination])))
    if diff < cost:
        cost = diff
        idx = combination
print(idx)
This brute-forces the solution; there may be more elegant or faster ways to do it, but this is what came to mind on the fly.
I am trying to do the following:
Given a DataFrame of distances, I want to identify the k nearest neighbours of each element.
Example:
   A  B  C  D
A  0  1  3  2
B  5  0  2  2
C  3  2  0  1
D  2  3  4  0
If k=2, it should return:
A: B D
B: C D
C: D B
D: A B
Distances are not necessarily symmetric.
I am thinking there must be something somewhere that does this in an efficient way using Pandas DataFrames. But I cannot find anything?
Homemade code is also very welcome! :)
Thank you!
The way I see it, I simply find the n + 1 smallest distances for each row and remove the 0, which leaves the n nearest neighbours. Keep in mind that the code will not work if any off-diagonal distance is zero! Only the diagonal entries are allowed to be 0.
import pandas as pd
import numpy as np
X = pd.DataFrame([[0, 1, 3, 2],[5, 0, 2, 2],[3, 2, 0, 1],[2, 3, 4, 0]])
X.columns = ['A', 'B', 'C', 'D']
X.index = ['A', 'B', 'C', 'D']
X = X.T
for i in X.index:
    Y = X.nsmallest(3, i)
    Y = Y.T
    Y = Y[Y.index.str.startswith(i)]
    Y = Y.loc[:, Y.any()]
    for j in Y.index:
        print(i + ": ", list(Y.columns))
This prints out:
A: ['B', 'D']
B: ['C', 'D']
C: ['D', 'B']
D: ['A', 'B']
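If zero distances between different elements can occur, one alternative is to exclude each element by its label instead of by its value. The sketch below is library-free and works on a nested dict of the shape that DataFrame.to_dict(orient="index") produces; the function name k_nearest is made up for this example.

```python
def k_nearest(dist, k):
    """Return {label: [k nearest other labels]} for a nested-dict distance table."""
    result = {}
    for label, row in dist.items():
        # Exclude the element itself by label, so genuine zero distances
        # to other elements are not lost.
        others = [(d, other) for other, d in row.items() if other != label]
        # list.sort() is stable, so ties keep the original column order.
        others.sort(key=lambda pair: pair[0])
        result[label] = [other for _, other in others[:k]]
    return result


dist = {
    "A": {"A": 0, "B": 1, "C": 3, "D": 2},
    "B": {"A": 5, "B": 0, "C": 2, "D": 2},
    "C": {"A": 3, "B": 2, "C": 0, "D": 1},
    "D": {"A": 2, "B": 3, "C": 4, "D": 0},
}
for label, neighbours in k_nearest(dist, 2).items():
    print(label + ":", " ".join(neighbours))
```

Rows are read in the "from" direction, so asymmetric distances are handled the same way as in the question's example.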
I've a list like this one:
categories_list = [
    ['a', array([ 12994, 1262824, 145854, 92469]),
     'b', array([273300]),
     'c', array([341395, 32857711])],
    ['a', array([ 356424311, 165573412, 2032850784]),
     'b', array([2848105, 228835]),
     'c', array([])],
    ['a', array([1431689, 30655043, 1739919]),
     'b', array([597, 251911, 246600]),
     'c', array([35590])]
]
where each array belongs to the letter before.
Example: a -> array([ 12994, 1262824, 145854, 92469]), b -> array([273300]), 'a' -> array([1431689, 30655043, 1739919]) and so on...
So, is it possible to get the total number of items for each letter?
Desiderata:
----------
a 10
b 6
c 3
All suggestions are welcome
pd.DataFrame(
[dict(zip(x[::2], [len(y) for y in x[1::2]])) for x in categories_list]
).sum()
a 10
b 6
c 3
dtype: int64
I'm aiming at creating a list of dictionaries, so I have to fill in the ...... with something that turns each sub-list into a dictionary
[ ...... for x in categories_list]
If I use dict on a list or generator of tuples, it will magically turn that into a dictionary with keys as the first value in the tuple and values as the second value in the tuple.
dict(...list of tuples...)
zip will give me that generator of tuples
zip(list one, list two)
I know that in each sub-list, my keys are at the even indices [0, 2, 4...] and values are at the odd indices [1, 3, 5, ...]
# even odd
zip(x[::2], x[1::2])
but x[1::2] will be arrays, and I don't want the arrays. I want the length of the arrays.
# even odd
zip(x[::2], [len(y) for y in x[1::2]])
pandas.DataFrame will take a list of dictionaries and create a dataframe.
Finally, use sum to count the lengths.
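The same even/odd pairing can also be written without pandas. The following is a minimal sketch using collections.Counter; the function name count_items is hypothetical, and plain lists stand in for the NumPy arrays of the question, since only len() is needed.

```python
from collections import Counter


def count_items(categories_list):
    """Total number of items per key across all sub-lists."""
    totals = Counter()
    for row in categories_list:
        # Keys sit at even positions, their arrays at odd positions.
        for key, values in zip(row[::2], row[1::2]):
            totals[key] += len(values)
    return dict(totals)


sample = [
    ['a', [12994, 1262824, 145854, 92469], 'b', [273300], 'c', [341395, 32857711]],
    ['a', [356424311, 165573412, 2032850784], 'b', [2848105, 228835], 'c', []],
    ['a', [1431689, 30655043, 1739919], 'b', [597, 251911, 246600], 'c', [35590]],
]
print(count_items(sample))  # {'a': 10, 'b': 6, 'c': 3}
```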
I use groupby to group on the keys in columns 0, 2 and 4 (which hold the keys a, b and c respectively) and then count the distinct items in the column that follows each key. The count for each group here is len(set(group)) (or len(group) if you want the total length of the group instead of the number of distinct items). See the code below:
from itertools import groupby, chain

count_distincts = []
cols = [0, 2, 4]
for c in cols:
    for gid, group in groupby(categories_list, key=lambda x: x[c]):
        group = list(chain(*[list(g[c + 1]) for g in group]))
        count_distincts.append([gid, len(set(group))])
Output: [['a', 10], ['b', 6], ['c', 3]]
list_1 = ['a', 'a', 'a', 'b']
list_2 = ['a', 'b', 'b', 'b', 'c']
In the lists above, the items at index 0 and index 3 are the same, while the items at index 1 and 2 differ. Also, list_2 has an extra item 'c'.
I want to count the number of positions at which the two lists differ. In this case I should get 3.
I tried doing this:
x = 0
for i in max(len(list_1),len(list_2)):
    if list_1[i]==list_2[i]:
        continue
    else:
        x+=1
I am getting an error.
Use the zip() function to pair up the lists, counting all the differences, then add the difference in length.
zip() will only iterate over the items that can be paired up, but there is little point in iterating over the remainder; you know those are all to be counted as different:
differences = sum(a != b for a, b in zip(list_1, list_2))
differences += abs(len(list_1) - len(list_2))
The sum() sums up True and False values; this works because Python's boolean type is a subclass of int and False equals 0, True equals 1. Thus, for each differing pair of elements, the True values produced by the != tests add up as 1s.
Demo:
>>> list_1 = ['a', 'a', 'a', 'b']
>>> list_2 = ['a', 'b', 'b', 'b', 'c']
>>> sum(a != b for a, b in zip(list_1, list_2))
2
>>> abs(len(list_1) - len(list_2))
1
>>> difference = sum(a != b for a, b in zip(list_1, list_2))
>>> difference += abs(len(list_1) - len(list_2))
>>> difference
3
You can try this:
list1 = [1,2,3,5,7,8,23,24,25,32]
list2 = [5,3,4,21,201,51,4,5,9,12,32,23]
list3 = []
for i in range(len(list2)):
    if list2[i] not in list1:
        pass
    else :
        list3.append(list2[i])
print(list3)
print(len(list3))
As ZdaR commented, you should get 3 as the result and zip_longest can help here if you don't have Nones in the lists.
from itertools import zip_longest
list_1=['a', 'a', 'a', 'b']
list_2=['a', 'b', 'b', 'b', 'c']
x = sum(a != b for a,b in zip_longest(list_1,list_2))
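If the lists may contain None, the default fill value of zip_longest would make a missing item look equal to a real None. One way around that, sketched here with a made-up helper count_differences, is to pass a unique sentinel object as fillvalue.

```python
from itertools import zip_longest

# Hypothetical sentinel: a fresh object never compares equal to a list item.
_MISSING = object()


def count_differences(list_1, list_2):
    """Count positions that differ, treating missing positions as different."""
    pairs = zip_longest(list_1, list_2, fillvalue=_MISSING)
    return sum(a != b for a, b in pairs)


print(count_differences(['a', 'a', 'a', 'b'], ['a', 'b', 'b', 'b', 'c']))  # 3
print(count_differences([None], [None, None]))                             # 1
```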
You can also do it this way, using a for loop:
>>> count = 0
>>> ls1 = ['a', 'a', 'a', 'b']
>>> ls2 = ['a', 'b', 'b', 'b', 'c']
>>> for i in range(0, max(len(ls1),len(ls2)), 1):
...     if ls1[i:i+1] != ls2[i:i+1]:
...         count += 1
...
>>> print(count)
3
>>>
Or try this (didn't change the lists):
dif = 0
# compare lengths explicitly: min()/max() on the lists themselves compare their contents
for i in range(min(len(list_1), len(list_2))):
    if list_1[i]!=list_2[i]:
        dif+=1
        #print(list_1[i], " != ", list_2[i], " --> Dif = ", dif)
dif+=abs(len(list_1) - len(list_2))
print("Difference = ", dif)
(Output: Difference = 3)
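Not much better, but here's another option. It counts the matching positions and assumes a and b are NumPy arrays, since a == b on plain lists yields a single bool:
if len(a) < len(b):
    b = b[0:len(a)]
else:
    a = a[0:len(b)]
correct = sum(a == b)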
I have the following list of lists that contains 6 entries:
lol = [['a', 3, 1.01],
['x', 5, 1.00],
['k', 7, 2.02],
['p', 8, 3.00],
['b', 10, 1.09],
['f', 12, 2.03]]
Each sublist in lol contains 3 elements:
['a', 3, 1.01]
  e1  e2  e3
The list above is already sorted according to e2 (i.e., the 2nd element).
I'd like to 'cluster' the above list following roughly these steps:
1. Pick the lowest entry (wrt. e2) in lol as the key of the first cluster.
2. Assign it as the first member of that cluster (a dictionary of lists).
3. Calculate the difference between e3 of the next list and e3 of the first member of each existing cluster.
4. If the difference is less than the threshold, assign that list as a member of the corresponding cluster.
5. Otherwise, create a new cluster with the current list as the new key.
6. Repeat for the rest until finished.
The final result will look like this, with threshold <= 0.1.
dol = {'a':['a', 'x', 'b'],
'k':['k', 'f'],
'p':['p']}
I'm stuck with this. What's the right way to do it?
import json
from collections import defaultdict

thres = 0.1
tmp_e3 = 0
tmp_e1 = "-"
lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
dol = defaultdict(list)
for thelist in lol:
    e1, e2, e3 = thelist
    if tmp_e1 == "-":
        tmp_e1 = e1
    else:
        diff = abs(tmp_e3 - e3)
        if diff > thres:
            tmp_e1 = e1
    dol[tmp_e1].append(e1)
    tmp_e1 = e1
    tmp_e3 = e3
print(json.dumps(dol, indent=4))
I would first ensure lol is sorted on the second element, then iterate, keeping in the list only what is not within the threshold of the first element:
import json

thres = 0.1
tmp_e3 = 0
tmp_e1 = "-"
lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],
       ['p',8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
# ensure lol is sorted
lol.sort(key = (lambda x: x[1]))
dol = {}
while len(lol) > 0:
    x = lol.pop(0)
    lol2 = []
    dol[x[0]] = [ x[0] ]
    for i in lol:
        if abs(i[2] - x[2]) < thres:
            dol[x[0]].append(i[0])
        else:
            lol2.append(i)
    lol = lol2
print(json.dumps(dol, indent=4))
Result :
{
"a": [
"a",
"x",
"b"
],
"p": [
"p"
],
"k": [
"k",
"f"
]
}
Leaving e2/e3 aside, here's a rough draft.
First, a generator groups the data by value; it does need the data to be sorted by value, though.
Then an example use, first on the raw data and then with the data re-sorted by value.
In [32]: def cluster(lol, threshold=0.1):
    cl, start = None, None
    for e1, e2, e3 in lol:
        if cl and abs(start - e3) <= threshold:
            cl.append(e1)
        else:
            if cl: yield cl
            cl = [e1]
            start = e3
    if cl: yield cl

In [33]: list(cluster(lol))
Out[33]: [['a', 'x'], ['k'], ['p'], ['b'], ['f']]

In [34]: list(cluster(sorted(lol, key = lambda ar:ar[-1])))
Out[34]: [['x', 'a', 'b'], ['k', 'f'], ['p']]