How to get the closest indexes within a list - python

If there are two lists, one being the items:
items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
and the other being their indexes:
index = [0, 15, 20, 2, 16, 7, 17]
i.e., the first 'A' is at index 0, the second 'A' is at index 15, etc.
How would I get the closest indexes for the unique items A, B, and C,
i.e., get 15, 16, 17?

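You can achieve this with a simple script. Taking the two lists as input, you just find each letter's position in the letter list and its corresponding value in the number list. Note that this returns the index of each letter's first occurrence ([0, 2, 7] for the example above), not the closest group [15, 16, 17]:
list_letter = items   # the letters from the question
list_number = index   # their indexes
list_of_repeated = []
list_of_closest = []
for letter in list_letter:
    if letter in list_of_repeated:
        continue
    list_of_repeated.append(letter)
    list_of_closest.append(list_number[list_letter.index(letter)])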

What you are trying to do is minimize the sum of differences between indices.
You can find the minimal combination like this:
import numpy as np
from itertools import product

items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
index = [0, 15, 20, 2, 16, 7, 17]

cost = np.inf
# One list of (item, index) candidates per unique item; sorted() fixes the item order
groups = [[pair for pair in zip(items, index) if pair[0] == i] for i in sorted(set(items))]
for combination in product(*groups):
    # Sum of absolute differences between consecutive chosen indexes
    diff = sum(abs(np.ediff1d([i[1] for i in combination])))
    if diff < cost:
        cost = diff
        idx = combination
print(idx)  # (('A', 15), ('B', 16), ('C', 17))
This brute-forces the solution; there may be more elegant or faster ways to do it, but this is what came to mind on the fly.
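If "closest" is taken to mean the picks whose spread max(picked) - min(picked) is smallest, this is the classic "smallest window containing every item" problem, and a sliding window avoids enumerating every combination. The sketch below assumes that interpretation; the function name closest_indexes is hypothetical. It sorts the (index, item) pairs once and then moves two pointers, so it runs in O(n log n) rather than over the full product of candidates.

```python
from collections import Counter


def closest_indexes(items, index):
    """Pick one index per distinct item so that max(picked) - min(picked) is minimal."""
    # Sort (index, item) pairs by index so a window can slide over them.
    pairs = sorted(zip(index, items))
    need = len(set(items))
    counts = Counter()
    best = None  # (width, left, right) of the narrowest covering window so far
    left = 0
    for right, (pos, item) in enumerate(pairs):
        counts[item] += 1
        # While the window pairs[left..right] contains every item, record it and shrink it.
        while len(counts) == need:
            width = pos - pairs[left][0]
            if best is None or width < best[0]:
                best = (width, left, right)
            old_item = pairs[left][1]
            counts[old_item] -= 1
            if counts[old_item] == 0:
                del counts[old_item]
            left += 1
    _, lo, hi = best
    # Keep the first occurrence of each item inside the best window.
    chosen = {}
    for pos, item in pairs[lo:hi + 1]:
        chosen.setdefault(item, pos)
    return [chosen[item] for item in sorted(chosen)]


items = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
index = [0, 15, 20, 2, 16, 7, 17]
print(closest_indexes(items, index))  # [15, 16, 17]
```

For the example, the candidate windows have spreads 7, 13, 9, 2 and 4, and the window 15..17 wins.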

Related

Extracting certain elements from array of each row for a specific column

I'm trying to extract values from array rows of a specific column with specified indices.
As a dummy example, suppose my dataframe has a column called 'arr' where each row holds one of these arrays:
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]
[11, 12, 13, 14, 15]
[16, 17, 18, 19, 20]
I've tried:
for row in df.itertuples():
    i1 = [0,1,2]
    r1 = np.array(df.arr)[i1]
    i2 = [2,3]
    r2 = np.array(df.arr)[i2]
which gives the rows 0, 1 and 2 from the dataframe.
And I've tried:
for row in df.itertuples():
    i1 = [0,1,2]
    r1 = np.array(row.arr)[i1]
    i2 = [2,3]
    r2 = np.array(row.arr)[i2]
which gives the values from only the last row. I don't understand why.
What I want are the elements at the positions in i1 and i2, as two different variables (r1 and r2), for each row. So:
r1 should give:
[1, 2, 3]
[6, 7, 8]
[11, 12, 13]
[16, 17, 18]
And r2 should give:
[3, 4]
[8, 9]
[13, 14]
[18, 19]
I've also used iterrows() with no luck.
If you want columns r1 and r2 in the same dataframe, you can use:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
df['r1'] = df['arr'].apply(lambda x: x[0:3])  # elements 0, 1, 2 of each row
df['r2'] = df['arr'].apply(lambda x: x[2:4])  # elements 2, 3 of each row
I applied a lambda that does the slicing; is this what you want?
If you want a new dataframe with columns r1 and r2, you can use:
from operator import itemgetter
import numpy as np
import pandas as pd

a = [0, 1, 2]
b = [2, 3]
df = pd.DataFrame(np.random.randn(10, 5), columns=['a', 'b', 'c', 'd', 'e'])
df['arr'] = df[['b', 'c', 'd', 'e']].values.tolist()
data = pd.DataFrame()
# itemgetter(*a)(x) returns the tuple (x[0], x[1], x[2])
data['r1'] = df['arr'].apply(lambda x: itemgetter(*a)(x))
data['r2'] = df['arr'].apply(lambda x: itemgetter(*b)(x))
data  # displays the frame in a notebook
Does this edit help?
Try:
import numpy as np

i1, i2 = [0, 1, 2], [2, 3]
number_rows = len(df)
r1, r2 = np.zeros((number_rows, 3)), np.zeros((number_rows, 2))
for i in range(number_rows):
    row = np.asarray(df.arr.iloc[i])  # the array stored in row i
    r1[i] = row[i1]
    r2[i] = row[i2]
The problem with your first attempt was that np.array(df.arr) is indexed along its first axis, so the index list selects whole rows rather than elements within each row.
In your second attempt, you do get the results you want for each row, but you overwrite the results of earlier rows, so only the last row's values remain.
You can fix this by storing each row's results in preallocated result arrays, as done above.
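When every row holds the same number of elements, the loop can be avoided entirely: stack the rows into one 2-D array and index the columns. This is a minimal sketch that assumes equal-length rows; the toy frame reproduces the question's data with plain lists.

```python
import numpy as np
import pandas as pd

# Toy frame matching the question: each cell of 'arr' holds one row of values.
df = pd.DataFrame({'arr': [[1, 2, 3, 4, 5],
                           [6, 7, 8, 9, 10],
                           [11, 12, 13, 14, 15],
                           [16, 17, 18, 19, 20]]})
i1, i2 = [0, 1, 2], [2, 3]

# Stack the per-row sequences into one 2-D array (all rows must have the same length).
mat = np.vstack(df['arr'].tolist())

# Select columns for every row at once: [:, i1] keeps all rows, picks positions i1.
r1 = mat[:, i1]
r2 = mat[:, i2]
print(r1)
print(r2)
```

Here r1 holds [1, 2, 3], [6, 7, 8], [11, 12, 13], [16, 17, 18] and r2 holds [3, 4], [8, 9], [13, 14], [18, 19], as requested. If the rows can differ in length, np.vstack raises an error and a per-row approach like the ones above is needed.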

Count and sort pandas dataframe

I have a dataframe with column 'code' which I have sorted based on frequency.
In order to see what each code means, there is also a column 'note'.
For each group of 'code', I display the count and the first note attached to that code:
df.groupby('code')['note'].agg(['count', 'first']).sort_values('count', ascending=False)
Now my question is, how do I display only those rows that have frequency of e.g. >= 30?
Add a query call before you sort. Also, if you only want the rows whose count EQUALS <insert frequency here>, sort_values isn't needed (right?!).
df.groupby('code')['note'].agg(['count', 'first']).query('count == 30')
If the question is for all groups with AT LEAST <insert frequency here>, then:
(
    df.groupby('code')
    .note.agg(['count', 'first'])
    .query('count >= 30')
    .sort_values('count', ascending=False)
)
Why do I use query? It's a lot easier to pipe and chain with it.
You can just filter your aggregated result (grp, the frame returned by your groupby/agg) accordingly:
grp = grp[grp['count'] >= 30]
Example with data:
import pandas as pd

df = pd.DataFrame({'code': [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
                   'note': ['A', 'B', 'A', 'A', 'C', 'C', 'C', 'A', 'A',
                            'B', 'B', 'C', 'A', 'B']})
res = df.groupby('code')['note'].agg(['count', 'first']).sort_values('count', ascending=False)
#       count first
# code
# 2         5     C
# 3         5     B
# 1         4     A
res2 = res[res['count'] >= 5]
#       count first
# code
# 2         5     C
# 3         5     B

Complete search algorithm for combinations of coins

The problem is similar to coin change problem, but a little different.
The problem is stated as: You have a collection of coins, and you know the values of the coins and the quantity of each type of coin in it. You want to know how many distinct sums you can make from non-empty groupings of these coins.
For example, with coins = [1, 2, 3] and quantity = [1, 2, 2], there are 11 possible sums: every integer from 1 to 11.
The length of the array coins can only go up to 20 but a quantity[x] can go up to 10^5.
What would be an efficient algorithm? Gathering all possible combinations of such large quantities would take forever. Is there a mathematical formula that can determine the answer? I don't see how that would work, especially since it asks for distinct sums.
I was thinking of generating an array based on the coins and their quantities, basically their multiples:
[ [1],
[2, 4],
[3, 6]]
Then have to select 1 or none from each of the arrays.
1
1,2
1,4
1,3
...
1,4,6
I can't seem to think of a good algorithm to do that, though. Nested loops might be too slow, since there could be 20 different coins and each coin could have a large quantity.
Another possible solution is looping from 1 to the maximum, where the maximum is the sum of each coin times its quantity. But the problem would be determining whether there is a subset equal to that number. I know there is a dynamic programming algorithm (subset sum) to determine whether a subset adds up to a certain value, but what would the array be?
For this example it works fine: with the list [1,2,4,3,6] and target sum 11, counting the 'True' values in the DP gives 11. But for coins = [10,50,100] and quantity = [1,2,1], the answer is 9 possible sums, while the subset-sum DP gives 21 'True' values, whether the list provided is [10,50,100,100] or [10,50,100] (based on [[10], [50, 100], [100]]).
A Python solution would be preferred, but is not necessary.
Below is my current code, which gives 21 for the [10,50,100] coins example.
def possibleSums(coins, quantity):
    def subsetSum(arr, s):
        dp = [False] * (s + 1)
        dp[0] = True
        for num in sorted(arr):
            for i in range(1, len(dp)):
                if num <= i:
                    dp[i] = dp[i] or dp[i - num]
        return sum(dp)

    maximum = sum((map(lambda t: t[0] * t[1], zip(coins, quantity))))
    combinations = [[]]*len(coins)
    for i, c in enumerate(coins):
        combinations[i] = [j for j in range(c, (c*quantity[i])+1, c)]
    array = []
    for item in combinations:
        array.extend(item)
    print(subsetSum(array, maximum) - 1)
Guaranteed constraints:
1 ≤ coins.length ≤ 20,
1 ≤ coins[i] ≤ 10^4.
quantity.length = coins.length,
1 ≤ quantity[i] ≤ 10^5.
It is guaranteed that (quantity[0] + 1) * (quantity[1] + 1) * ... * (quantity[quantity.length - 1] + 1) <= 10^6.
Bug fix
Your original solution is fine, except that you need to iterate in reverse order to avoid being able to keep adding the same coin multiple times.
Simply change the inner loop to:
for num in sorted(arr):
    for i in range(len(dp)-1, -1, -1):
        if num <= i:
            dp[i] = dp[i] or dp[i - num]
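One caveat with the multiples array, even after the reverse-order fix: two multiples of the same coin can be combined into more coins than are available. For coins = [2, 10] and quantity = [2, 1] the array is [2, 4, 10]; the subset {2, 4} reaches 6, which would need three 2-coins, so the fixed DP reports 6 sums where only 5 exist (2, 4, 10, 12, 14). A standard remedy for this bounded-knapsack situation, swapped in here and not taken from the answer above, is binary splitting: represent q copies of a coin as bundles of 1, 2, 4, ... copies plus a remainder, so every count from 0 to q is reachable and none above. possible_sums_binary is a hypothetical name.

```python
def possible_sums_binary(coins, quantity):
    """Count distinct non-zero sums with 0/1 subset-sum over binary-split bundles."""
    maximum = sum(c * q for c, q in zip(coins, quantity))
    # Split q copies of coin c into bundles c*1, c*2, c*4, ..., c*rest.
    # Subsets of these bundles reach every count 0..q and never exceed q.
    items = []
    for c, q in zip(coins, quantity):
        k = 1
        while q > 0:
            take = min(k, q)
            items.append(c * take)
            q -= take
            k *= 2
    dp = [False] * (maximum + 1)
    dp[0] = True
    for num in items:
        # Reverse order so each bundle is used at most once.
        for i in range(maximum, num - 1, -1):
            if dp[i - num]:
                dp[i] = True
    return sum(dp) - 1  # drop the empty sum 0


print(possible_sums_binary([10, 50, 100], [1, 2, 1]))  # 9
print(possible_sums_binary([2, 10], [2, 1]))           # 5
```

This uses about log2(q) items per coin instead of q, so the DP does O(maximum * sum(log q)) work.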
More efficient solution
You can also reduce the complexity by taking advantage of the multiple coins with the same value, scanning each possible remainder in turn:
def possibleSums2(coins, quantity):
    maximum = sum((map(lambda t: t[0] * t[1], zip(coins, quantity))))
    dp = [False] * (maximum + 1)
    dp[0] = True
    for coin, q in zip(coins, quantity):
        for b in range(coin):
            # num: copies of this coin added since the last sum that was
            # already reachable in this residue class (-1 if none yet)
            num = -1
            for i in range(b, maximum+1, coin):
                if dp[i]:
                    num = 0
                elif num >= 0:
                    num += 1
                dp[i] = 0 <= num <= q
    print(sum(dp) - 1)
This will have complexity O(maximum * coins) instead of O(maximum * coins * quantity)
Don't gather all the combinations, just the sums.
Your set of sums starts with [0]. Cycle through the coins, one at a time. For each coin, iterate through its quantity, adding that multiple to each item of the set. Set-add each of these sums to the set. For example, let's take that original case: coins = [1, 2, 3], quant = [1, 2, 2]. Walking through this ...
sum_set = {0}
current_coin = 1   # coins[0]
current_quant = 1  # quant[0]
This step is trivial ... add 1 to each element of the set. This gives you {1}.
Add that to the existing set. You now have
sum_set = {0, 1}
Next coin:
current_coin = 2   # coins[1]
current_quant = 2  # quant[1]
Now, you have two items to add to each set element: 1*2, giving you {2, 3}; and 2*2, giving you {4, 5}.
Add these to the original set:
sum_set = {0, 1, 2, 3, 4, 5}
Final coin:
current_coin = 3   # coins[2]
current_quant = 2  # quant[2]
You add 1*3 and 2*3 to each set element, giving you {3, 4, 5, 6, 7, 8} and {6, 7, 8, 9, 10, 11}.
Adding these to the sum_set gives you the set of integers 0 through 11.
Remove 0 from the set (since we're not interested in that sum) and take the size of the remaining set. 11 is your answer.
Is that enough to let you turn this into an algorithm? I'll leave the various efficiencies up to you.
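The walk-through above translates almost line for line into a set-based loop. This sketch (sum_set_steps is a hypothetical name) records the set after each coin so the intermediate states can be compared with the ones described.

```python
def sum_set_steps(coins, quant):
    """Return the sorted sum set after each coin, following the walk-through."""
    sum_set = {0}
    steps = []
    for coin, q in zip(coins, quant):
        # Add 1*coin, 2*coin, ..., q*coin to every sum found so far, then merge.
        new_sums = {s + k * coin for s in sum_set for k in range(1, q + 1)}
        sum_set |= new_sums
        steps.append(sorted(sum_set))
    return steps


steps = sum_set_steps([1, 2, 3], [1, 2, 2])
for step in steps:
    print(step)
print(len(steps[-1]) - 1)  # remove the sum 0: 11
```

The three printed sets are {0, 1}, {0, ..., 5} and {0, ..., 11}, matching the steps above.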
I was going to put up a solution using generating functions, but then you added
It is guaranteed that (quantity[0] + 1) * (quantity[1] + 1) * ... * (quantity[quantity.length - 1] + 1) <= 10^6
In that case, just brute force it! Go through every possible set of coins, compute the sum, and use a set to find how many unique sums you get. 10^6 possibilities is trivial.
As for the generating function solution, we can represent the sums possible with a quantity Q of coins of value V through the polynomial
1 + x^V + x^(2V) + ... + x^(QV)
where a term with exponent N means a sum of value N can be achieved.
If we then multiply two polynomials, for example
(1 + x^(V1) + x^(2*V1) + ... + x^(Q1*V1))(1 + x^(V2) + x^(2*V2) + ... + x^(Q2*V2))
the presence of a term with exponent N in the product means that a sum of value N can be achieved by combining the coins corresponding to the input polynomials.
Efficiency then comes down to how we multiply polynomials. If we use dicts or sets to efficiently look up terms by exponent, we can win over brute force by combining like terms to eliminate some of the redundant work brute force does. We can discard the coefficients, since we don't need them. Advanced polynomial multiplication algorithms based on a number-theoretic transform may give further savings in some cases.
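To make the generating-function idea concrete, here is a sketch that stores each polynomial as a NumPy coefficient array (entry N is the coefficient of x^N) and multiplies with np.convolve, which is direct convolution rather than the transform-based multiplication mentioned above. After each product the coefficients are clipped to 0/1, since only the presence of an exponent matters. possible_sums_poly is a hypothetical name.

```python
import numpy as np


def possible_sums_poly(coins, quantity):
    """Multiply the coin polynomials and count the exponents that appear."""
    product = np.array([1], dtype=np.int64)  # the polynomial 1: only the empty sum
    for v, q in zip(coins, quantity):
        # Coefficients of 1 + x^v + x^(2v) + ... + x^(qv)
        poly = np.zeros(q * v + 1, dtype=np.int64)
        poly[::v] = 1
        # Multiply, then keep only "is this exponent present?"
        product = (np.convolve(product, poly) > 0).astype(np.int64)
    return int(product.sum()) - 1  # exclude the exponent 0


print(possible_sums_poly([10, 50, 100], [1, 2, 1]))  # 9
print(possible_sums_poly([1, 2, 3], [1, 2, 2]))      # 11
```

Clipping after every step also keeps the coefficients from growing, so there is no risk of integer overflow.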
Here's a concise brute-force solution (Python 3):
def numsums(values, counts):
    from itertools import product
    choices = [range(0, v*c+1, v) for v, c in zip(values, counts)]
    sums = {sum(p) for p in product(*choices)}
    return len(sums) - 1  # sum "0" isn't interesting
Then, e.g.,
print(numsums([10,50,100], [1, 2, 1])) # 9
print(numsums([1, 2, 3], [1, 2, 2])) # 11
print(numsums([1, 2, 4, 8, 16, 32], [1]*6)) # 63
Eliminating duplicates along the way
This variation is functionally equivalent to some other answers; it's just showing how to do it as a variation of the brute-force way:
def numsums(values, counts):
    sums = {0}
    for v, c in zip(values, counts):
        sums |= {i + choice
                 for choice in range(v, v*c+1, v)
                 for i in sums}
    return len(sums) - 1  # sum "0" isn't interesting
In fact, if you squint just right ;-), you can view it as one way of implementing @user2357112's polynomial multiplication idea, where "multiplication" has been redefined just to keep track of "is a term with this exponent present or not?" ("yes" if and only if the exponent is in the sums set). Then the outer loop is "multiplying" the polynomial so far by the polynomial corresponding to the current (value, count) pair, and the multiplication by the x**0 term is implicit in the |= union. Although, yeah, it's easier to understand if you skip that "explanation" ;-)
This one is even more optimized:
function possibleSums(coins, quantity) {
  // upper bound: total value of all coins
  var max = coins.reduce(function(s, c, i) {
    s += c * quantity[i];
    return s;
  }, 0);
  var sums = [0];
  var seen = new Map();
  for (var j = 0; j < coins.length; j++) {
    var coin = coins[j];
    var n = sums.length;  // only extend sums found before this coin
    for (var i = 0; i < n; i++) {
      var s = sums[i];
      for (var k = 0; k < quantity[j]; k++) {
        s += coin;
        if (max < s) break;
        if (!seen.has(s)) {
          seen.set(s, true);
          sums.push(s);
        }
      }
    }
  }
  return Array.from(seen.keys()).length;
}
Easy Python solution
Note: using dynamic programming to find all sums may exceed the time limit.
def possibleSums(coins, quantity):
    combinations = {0}
    for c, q in zip(coins, quantity):
        # every sum so far plus 0..q copies of coin c
        combinations = {j + i*c for j in combinations for i in range(q+1)}
    return len(combinations) - 1
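Hmm, it's a very interesting problem.
possibleSums() returns the number of non-empty coin combinations, (quantity[0] + 1) * ... * (quantity[n-1] + 1) - 1; equal totals such as 50+50 and 100 are counted separately, so this is the number of combinations rather than of distinct sums.
To view all cases, use possibleCases().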
import itertools

coins = ['10', '50', '100']
quantity = [1, 2, 1]
# coins = ['A', 'B', 'C', 'D']
# quantity = [1, 2, 2, 1]

def possibleSums(coins, quantity):
    totalcnt = 1
    for i in quantity:
        totalcnt = totalcnt * (i+1)
    return totalcnt - 1  # remove the empty case

def possibleCases(coins, quantity):
    coinlist = []
    for i in range(len(coins)):
        cset = []
        for j in range(quantity[i]+1):
            val = [coins[i]] * j
            cset.append(val)
        coinlist.append(cset)
    print('coinlist=', coinlist)
    # combine one choice from each coin's list
    alllist = list(itertools.product(*coinlist))
    caselist = []
    for x in alllist:
        mergelist = list(itertools.chain(*x))
        if len(mergelist) == 0:  # skip the empty selection
            continue
        caselist.append(mergelist)
    return caselist

total = possibleSums(coins, quantity)
print('sum=', total)
cases = possibleCases(coins, quantity)
cases.sort(key=len, reverse=True)
cases.reverse()
print('count=', len(cases))
for i, x in enumerate(cases):
    print('case', (i+1), x)
Output:
sum= 11
coinlist= [[[], ['10']], [[], ['50'], ['50', '50']], [[], ['100']]]
count= 11
case 1 ['10']
case 2 ['50']
case 3 ['100']
case 4 ['10', '50']
case 5 ['10', '100']
case 6 ['50', '50']
case 7 ['50', '100']
case 8 ['10', '50', '50']
case 9 ['10', '50', '100']
case 10 ['50', '50', '100']
case 11 ['10', '50', '50', '100']
You can test other cases, e.g.:
coins = ['A', 'B', 'C', 'D']
quantity = [1, 3, 2, 1]
sum= 47
coinlist= [[[], ['A']], [[], ['B'], ['B', 'B'], ['B', 'B', 'B']], [[], ['C'], ['C', 'C']], [[], ['D']]]
count= 47
case 1 ['A']
case 2 ['B']
case 3 ['C']
case 4 ['D']
case 5 ['A', 'B']
case 6 ['A', 'C']
case 7 ['A', 'D']
case 8 ['B', 'B']
case 9 ['B', 'C']
case 10 ['B', 'D']
case 11 ['C', 'C']
case 12 ['C', 'D']
case 13 ['A', 'B', 'B']
case 14 ['A', 'B', 'C']
case 15 ['A', 'B', 'D']
case 16 ['A', 'C', 'C']
case 17 ['A', 'C', 'D']
case 18 ['B', 'B', 'B']
case 19 ['B', 'B', 'C']
case 20 ['B', 'B', 'D']
case 21 ['B', 'C', 'C']
case 22 ['B', 'C', 'D']
case 23 ['C', 'C', 'D']
case 24 ['A', 'B', 'B', 'B']
case 25 ['A', 'B', 'B', 'C']
case 26 ['A', 'B', 'B', 'D']
case 27 ['A', 'B', 'C', 'C']
case 28 ['A', 'B', 'C', 'D']
case 29 ['A', 'C', 'C', 'D']
case 30 ['B', 'B', 'B', 'C']
case 31 ['B', 'B', 'B', 'D']
case 32 ['B', 'B', 'C', 'C']
case 33 ['B', 'B', 'C', 'D']
case 34 ['B', 'C', 'C', 'D']
case 35 ['A', 'B', 'B', 'B', 'C']
case 36 ['A', 'B', 'B', 'B', 'D']
case 37 ['A', 'B', 'B', 'C', 'C']
case 38 ['A', 'B', 'B', 'C', 'D']
case 39 ['A', 'B', 'C', 'C', 'D']
case 40 ['B', 'B', 'B', 'C', 'C']
case 41 ['B', 'B', 'B', 'C', 'D']
case 42 ['B', 'B', 'C', 'C', 'D']
case 43 ['A', 'B', 'B', 'B', 'C', 'C']
case 44 ['A', 'B', 'B', 'B', 'C', 'D']
case 45 ['A', 'B', 'B', 'C', 'C', 'D']
case 46 ['B', 'B', 'B', 'C', 'C', 'D']
case 47 ['A', 'B', 'B', 'B', 'C', 'C', 'D']
This is a JavaScript version of Peter de Rives's answer, but a little more efficient, since for each coin it only scans remainders up to the running maximum (the total value of that coin and the coins before it) instead of the overall maximum:
function possibleSums(coins, quantity) {
  // calculate running max sums
  var prevmax = 0;
  var maxs = [];
  for (var i = 0; i < coins.length; i++) {
    maxs[i] = prevmax + coins[i] * quantity[i];
    prevmax = maxs[i];
  }
  var dp = [true];
  for (var i = 0; i < coins.length; i++) {
    var max = maxs[i];
    var coin = coins[i];
    var qty = quantity[i];
    for (var j = 0; j < coin; j++) {
      var num = -1;
      // only find remainders in range 0 to maxs[i];
      for (var k = j; k <= max; k += coin) {
        if (dp[k]) {
          num = 0;
        }
        else if (num >= 0) {
          num++;
        }
        dp[k] = 0 <= num && num <= qty;
      }
    }
  }
  return dp.filter(e => e).length - 1;
}
def possibleSums(coins, quantity) -> int:
    from itertools import combinations
    flat_list = []
    for coin, q in zip(coins, quantity):
        flat_list += [coin]*q
    uniq_sums = set()
    for i in range(1, len(flat_list)+1):
        for c in combinations(flat_list, i):
            uniq_sums.add(sum(c))
    return len(uniq_sums)
Translating jumarov's code to Python gives the following:
def possibleSums4(coins, quantity=None):
    # also accepts a single dict mapping coin value -> quantity
    if quantity is None:
        coins, quantity = zip(*coins.items())
    max_sum = sum(i*j for i, j in zip(coins, quantity))
    dp = {0}
    for c, q in zip(coins, quantity):
        for b in range(c):
            num = -1
            for i in range(b, max_sum + 1, c):
                if i in dp:
                    num = 0
                elif num >= 0:
                    num += 1
                if 0 <= num <= q:
                    dp.add(i)
    return len(dp) - 1
What interests me about this is that for a given set of coins it looks like once they all reach a certain multiplicity, the behavior of the number of sub-sums becomes very regular: the addition of another coin introduces an increase in the number of possible sums by the value of the coin.
Consider the coin set {4, 5, 7}. When each is used at most once, the possible sums are {0, 4, 5, 7, 9, 11, 12, 16}. When used up to twice there are 25 possibilities:
25: 0; 4-5; 7-25; 27-28; 32
If any more coins are added, the number of possibilities increases by the value of the coin(s) added. Here I use a couple of helper routines (Set and show, whose code is not included here) to display the values, which can be confirmed with other routines presented in answers to this question:
>>> show(Set(4,5,7)**2*Set(4)) # Set(1,2)**2 -> exponents in ((1+x)*(1+x^2))^2
'29: 0; 4-5; 7-29; 31-32; 36'
>>> show(Set(4,5,7)**2*Set(5))
'30: 0; 4-5; 7-30; 32-33; 37'
>>> show(Set(4,5,7)**2*Set(7))
'32: 0; 4-5; 7-32; 34-35; 39'
>>> show(Set(4,5,7)**2*Set(4,5,7))
'41: 0; 4-5; 7-41; 43-44; 48'
Notice that the first number is higher than 25 (the number of sums present at multiplicity 2, where the structure of the sums becomes "stable") by the value of the coin(s) added. E.g., adding 1 more of each coin (4, 5 and 7, which sum to 16) gives a total of 25 + 16 = 41 possible sums. By "stable" I mean that the basic pattern of ranges is fixed and only the length of the ranges varies; in this case, the structure at (and after) a multiplicity of 2 is "singleton, range of 2, varying range, range of 2, singleton", which is symmetric.
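The counts quoted above can be reproduced without the Set and show helpers. The sketch below uses its own hypothetical count_sums, which counts distinct sums including 0, to match the "25: ..." convention used here.

```python
def count_sums(coins, counts):
    """Number of distinct sums (including 0) using coins[i] at most counts[i] times."""
    sums = {0}
    for c, q in zip(coins, counts):
        # every sum so far plus 0..q copies of coin c
        sums = {s + k * c for s in sums for k in range(q + 1)}
    return len(sums)


print(count_sums([4, 5, 7], [1, 1, 1]))  # 8
print(count_sums([4, 5, 7], [2, 2, 2]))  # 25
print(count_sums([4, 5, 7], [3, 2, 2]))  # 29 = 25 + 4
print(count_sums([4, 5, 7], [3, 3, 3]))  # 41 = 25 + 16
```

The 8 corresponds to the single-use set {0, 4, 5, 7, 9, 11, 12, 16} listed earlier.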
Not every set becomes stable so quickly, however, although even large sets of "coin" denominations can: the following 42 denominations give 10366 possible sums when the multiplicity is 5:
{3, 4, 5, 9, 16, 18, 20, 21, 23, 24, 25, 29, 31, 33, 34, 38, 39, 44,
47, 49, 50, 52, 55, 56, 57, 60, 61, 63, 64, 65, 68, 69, 70, 75, 78,
80, 81, 85, 88, 94, 95, 96}
With a sum of 2074, the number of sums increases by 2074 as the multiplicity of each is increased by 1. So if each coin is used 10 more times (above the stable 5), the number of sums is 10366 + 10*2074 = 31106.
The small set of coins {18, 93, 100} gives 4886 unique sums at multiplicity 26, the point where it becomes stable; that is a much larger multiplicity than 5.
So is there a formula? I don't know how to predict the number of sums when coins are used at sub-stable counts (i.e. fewer than needed for the number of possible sums to start increasing in a predictable way), and I don't know how to predict when the structure will become stable (though, for two coins, it seems to be at a value 1 less than the largest coin value). But once any coins are used more than the stable multiplicity (a function of the coins in use), the number of sums (and the actual sums) appear to be easy to predict.
Of course, this all applies to determining which exponents are present when a polynomial with positive coefficients is raised to some power. What is special about the "coin polynomial" is that it is a product of binomial powers (1 + x^(c_1))^(n_1) * ... * (1 + x^(c_k))^(n_k), where the coin values are c_i and the multiplicities are n_i. The "stable multiplicity" is the exponent m for ((1 + x^(c_1)) * ... * (1 + x^(c_k)))^m that gives a predictable structure for the exponents present when any coin has a multiplicity greater than m.

python pandas: list of sublist: total items number

I have a list like this:
from numpy import array

categories_list = [
    ['a', array([ 12994, 1262824, 145854, 92469]),
     'b', array([273300]),
     'c', array([341395, 32857711])],
    ['a', array([ 356424311, 165573412, 2032850784]),
     'b', array([2848105, 228835]),
     'c', array([])],
    ['a', array([1431689, 30655043, 1739919]),
     'b', array([597, 251911, 246600]),
     'c', array([35590])]
]
where each array belongs to the letter before.
Example: 'a' -> array([ 12994, 1262824, 145854, 92469]), 'b' -> array([273300]), 'a' -> array([1431689, 30655043, 1739919]), and so on.
So, is it possible to retrieve the total number of items for each letter?
Desired output:
----------
a 10
b 6
c 3
All suggestions are welcome
pd.DataFrame(
[dict(zip(x[::2], [len(y) for y in x[1::2]])) for x in categories_list]
).sum()
a 10
b 6
c 3
dtype: int64
I'm aiming to create a list of dictionaries, so I have to fill in the ...... with something that parses each sub-list into a dictionary:
[ ...... for x in categories_list]
If I use dict on a list or generator of tuples, it will magically turn that into a dictionary with keys as the first value in the tuple and values as the second value in the tuple.
dict(...list of tuples...)
zip will give me that generator of tuples
zip(list one, list two)
I know that in each sub-list, my keys are at the even indices [0, 2, 4...] and values are at the odd indices [1, 3, 5, ...]
# even odd
zip(x[::2], x[1::2])
but x[1::2] will be arrays, and I don't want the arrays. I want the length of the arrays.
# even odd
zip(x[::2], [len(y) for y in x[1::2]])
pandas.DataFrame will take a list of dictionaries and create a dataframe.
Finally, use sum to count the lengths.
I use itertools.groupby to group by the keys in columns 0, 2 and 4 (which hold the keys a, b and c respectively) and then count the distinct items in the next column. The number of items in a group is len(set(group)) (or len(group) if you just want the total length of the group). See the code below:
from itertools import groupby, chain

count_distincts = []
cols = [0, 2, 4]
for c in cols:
    for gid, group in groupby(categories_list, key=lambda x: x[c]):
        group = list(chain(*[list(g[c + 1]) for g in group]))
        count_distincts.append([gid, len(set(group))])
print(count_distincts)
Output: [['a', 10], ['b', 6], ['c', 3]]
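If pandas is not otherwise needed, a collections.Counter can accumulate the lengths directly, using the same even/odd slicing idea as the first answer. This is a sketch on the question's data; it counts all items (like len(group)), not distinct ones.

```python
from collections import Counter

import numpy as np

categories_list = [
    ['a', np.array([12994, 1262824, 145854, 92469]),
     'b', np.array([273300]),
     'c', np.array([341395, 32857711])],
    ['a', np.array([356424311, 165573412, 2032850784]),
     'b', np.array([2848105, 228835]),
     'c', np.array([])],
    ['a', np.array([1431689, 30655043, 1739919]),
     'b', np.array([597, 251911, 246600]),
     'c', np.array([35590])],
]

totals = Counter()
for row in categories_list:
    # Keys sit at even positions; each key's array follows at the next odd position.
    for key, arr in zip(row[::2], row[1::2]):
        totals[key] += len(arr)

print(dict(totals))  # {'a': 10, 'b': 6, 'c': 3}
```

Unlike itertools.groupby, this does not depend on equal keys being adjacent or at fixed columns.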

Get only unique elements from two lists

If I have two lists (possibly of different lengths):
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
result = [11,22,33,44]
I'm doing:
for element in f:
    if element in x:
        f.remove(element)
I'm getting
result = [11,22,33,44,4]
UPDATE:
Thanks to @Ahito:
In : list(set(x).symmetric_difference(set(f)))
Out: [33, 2, 22, 11, 44]
This article has a neat diagram that explains what the symmetric difference does.
OLD answer:
Using this piece of Python's documentation on sets:
>>> # Demonstrate set operations on unique letters from two words
...
>>> a = set('abracadabra')
>>> b = set('alacazam')
>>> a # unique letters in a
{'a', 'r', 'b', 'c', 'd'}
>>> a - b # letters in a but not in b
{'r', 'd', 'b'}
>>> a | b # letters in a or b or both
{'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'}
>>> a & b # letters in both a and b
{'a', 'c'}
>>> a ^ b # letters in a or b but not both
{'r', 'd', 'b', 'm', 'z', 'l'}
I came up with this piece of code to obtain unique elements from two lists:
(set(x) | set(f)) - (set(x) & set(f))
or slightly modified to return list:
list((set(x) | set(f)) - (set(x) & set(f))) #if you need a list
Here:
| operator returns elements in x, f or both
& operator returns elements in both x and f
- operator subtracts the result of & from the result of | and gives us the elements that are present in only one of the lists
If you want the unique elements from both lists, this should work:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
res = list(set(x+f))
print(res)
# res = [1, 2, 3, 4, 33, 11, 44, 22]
Based on the clarification of this question in a new (closed) question:
If you want all items from the second list that do not appear in the first list you can write:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
result = set(f) - set(x) # correct elements, but not yet in sorted order
print(sorted(result)) # sort and print
# Output: [11, 22, 33, 44]
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]
list(set(x) ^ set(f))
[33, 2, 22, 11, 44]
If you want the elements of one list that are not in the other, you can use set difference:
a = [1, 2, 3, 4, 5]
b = [2, 4, 1]
list(set(a) - set(b))
Output: [3, 5]
Input:
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
Code:
l = list(set(x).symmetric_difference(set(f)))
print(l)
Output:
[2, 22, 33, 11, 44]
Your method won't get the unique element "2". What about:
list(set(x).intersection(f))
Simplified version, in support of @iopheam's answer.
Use set subtraction.
# original list values
x = [1,2,3,4]
f = [1,11,22,33,44,3,4]
# convert to sets
y = set(x) # {1, 2, 3, 4}
z = set(f) # {1, 33, 3, 4, 11, 44, 22}
# set difference: elements of f that are not in x
result = z - y # {33, 11, 44, 22}
# print with sorted() to display the result in order, as requested by the OP
print(f"Result of f - x: {sorted(result)}")
# Result of f - x: [11, 22, 33, 44]
v_child_value = [{'a': 1}, {'b': 2}, {'v': 22}, {'bb': 23}]
shop_by_cat_sub_cats = [{'a': 1}, {'b': 2}, {'bbb': 222}, {'bb': 23}]
unique_sub_cats = []
for ind in shop_by_cat_sub_cats:
    if ind not in v_child_value:
        unique_sub_cats.append(ind)
# unique_sub_cats == [{'bbb': 222}]
Python code to build a list of the unique values of a that also appear in b (in order of first appearance, skipping 0):
a = [1,1,2,3,5,1,8,13,6,21,34,55,89,1,2,3]
b = [1,2,3,4,5,6,7,8,9,10,11,12,2,3,4]
m = list(dict.fromkeys(v for v in a if v in b and v))
print(m)  # [1, 2, 3, 5, 8, 6]
def unique_elements(x, f):
    L = []
    for i in x:
        if i not in f:
            L.append(i)
    for i in f:
        if i not in x:
            L.append(i)
    return L
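Why the loop in the question returns [11, 22, 33, 44, 4]: removing an element from f while iterating over f shifts the remaining elements left, so the loop skips the element right after each removal (here 4, which follows 3). The sketch below reproduces that on a copy and then shows a safer pattern, assuming the goal is the elements of f that are not in x, in their original order.

```python
x = [1, 2, 3, 4]
f = [1, 11, 22, 33, 44, 3, 4]

# What the question's loop does: it mutates the list it is iterating over.
buggy = list(f)
for element in buggy:
    if element in x:
        buggy.remove(element)
print(buggy)  # [11, 22, 33, 44, 4]: 4 is skipped after 3 is removed

# Safer: build a new list instead of mutating the one being iterated.
exclude = set(x)  # set for fast membership tests
result = [element for element in f if element not in exclude]
print(result)  # [11, 22, 33, 44]
```

Unlike the set-difference answers, the list comprehension keeps f's order and any duplicates it contains.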
