Finding number of pairs of items from same list? - python

I am trying to write a function that, given a list of numbers, tells me how many pairs of socks there are. E.g. for the list [10, 20, 20, 10, 10, 30, 50, 10, 20], it should tell me there are 3 pairs (10 with 10, 20 with 20, and 10 with 10 again), with 30, 50 and one 20 left over. It is enough for the answer to simply be '3'.
So this is my code so far, where
n: the number of socks in the pile
ar: the colors of each sock
def sockMerchant(n, ar):
    ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
    n = len(ar)
    pair = []
    for i in ar:
        if ar.count >= 2 and (ar.count % 2) == 0:
            pair.append(i)
        if ar.count < 0:
            return False
    return (n,ar)
print(len(pair))
However, the code is not quite there yet. Am I making a mistake in how I call the function? And is my approach to checking for pairs sound, namely first testing whether the number appears at least twice and an even number of times? Please advise!

A simple approach is to count the occurrences of each number in a dictionary and sum the number of pairs found, where every pair uses up two occurrences of the same number.
More specifically, you can sum() up the pairs from a collections.Counter() object. Remember that we use // for floor division to round down to the correct number of pairs.
Sample Implementation
from collections import Counter
def sum_pairs(lst):
    return sum(v // 2 for v in Counter(lst).values())
Tests
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20, 20])
4
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20, 20, 20])
4
>>> sum_pairs([10, 20, 20, 10, 10, 30, 50, 10, 20])
3
>>> sum_pairs([10, 20, 30, 50])
0
>>> sum_pairs([10, 20, 30, 50, 10])
1
Note: Just for clarity, Counter is a subclass of dict. It's one of the simplest ways to count the items in a list.
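For comparison with the loop in the question, here is a minimal sketch that keeps the question's function name and signature, sockMerchant(n, ar), but counts each distinct color once with list.count(). Note that ar.count is a method and has to be called with a value, and that a count does not need to be even: a color that appears three times still contributes one pair. This is only one possible fix, and it will be slower than the Counter version on long lists because count() rescans the list for every color.

```python
def sockMerchant(n, ar):
    # n is kept only to match the signature in the question; it is not needed.
    pairs = 0
    for color in set(ar):              # visit each distinct color once
        pairs += ar.count(color) // 2  # floor division drops the odd sock
    return pairs

ar = [10, 20, 20, 10, 10, 30, 50, 10, 20]
print(sockMerchant(len(ar), ar))  # 3
```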

Related

Sample irregular list of numbers with a set delta

Is there a simpler way, using e.g. numpy, to get samples for a given X and delta than the below code?
>>> X = [1, 4, 5, 6, 11, 13, 15, 20, 21, 22, 25, 30]
>>> delta = 5
>>> samples = [X[0]]
>>> for x in X:
...     if x - samples[-1] >= delta:
...         samples.append(x)
>>> samples
[1, 6, 11, 20, 25, 30]
If you are aiming to "vectorize" the process for performance reasons (e.g. using numpy), you could compute the number of elements that are less than each element plus the delta. This will give you indices for the items to select with the items that need to be skipped getting the same index as the preceding ones to be kept.
import numpy as np
X = np.array([1, 4, 5, 6, 11, 13, 15, 20, 21, 22, 25, 30])
delta = 5
i = np.sum(X<X[:,None]+delta,axis=1) # index of first to keep
i = np.insert(i[:-1],0,0) # always want the first, never the last
Y = X[np.unique(i)] # extract values as unique indexes
print(Y)
[ 1 6 11 20 25 30]
This assumes that the numbers are in ascending order.
[EDIT]
As indicated in my comment, the above solution is flawed and will only work some of the time. Although wrapping a Python function with np.frompyfunc does not get the real benefit of vectorization (and is slower than the plain Python loop), it is possible to implement the filter like this:
X = np.array([1, 4, 5, 6, 10,11,12, 13, 15, 20, 21, 22, 25, 30])
delta = 5
fdelta = np.frompyfunc(lambda a,b:a if a+delta>b else b,2,1)
# np.object was removed in NumPy 1.24; the builtin object works as the dtype
Y = X[X==fdelta.accumulate(X,dtype=object)]
print(Y)
[ 1 6 11 20 25 30]
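The same running-anchor idea can be sketched without NumPy using itertools.accumulate from the standard library. The function name sample_with_delta is made up for this illustration. Like the frompyfunc version, it assumes the input is sorted and contains no repeated values (a duplicate of a kept value would be kept as well).

```python
from itertools import accumulate

def sample_with_delta(xs, delta):
    # The accumulated value is the most recently kept element: keep the old
    # anchor while the current x is closer than delta, otherwise move to x.
    anchors = accumulate(xs, lambda kept, x: kept if kept + delta > x else x)
    # An element was kept exactly when it became the anchor.
    return [x for x, kept in zip(xs, anchors) if x == kept]

X = [1, 4, 5, 6, 10, 11, 12, 13, 15, 20, 21, 22, 25, 30]
print(sample_with_delta(X, 5))  # [1, 6, 11, 20, 25, 30]
```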

Generating discrete random numbers under constraints

I have a problem that I'm not sure how to solve properly.
Suppose we have to generate 1 <= n <= 40 numbers: X[1], X[2], ..., X[n].
For each number, we have some discrete space we can draw a number from. This space is not always a range and can be quite large (thousands/millions of numbers).
Another constraint is that the resulting array of numbers should be sorted in ascending order: X[1] <= X[2] <= ... <= X[n].
As an example for three numbers:
X[1] in {8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31}
X[2] in {10, 20, 30, 50}
X[3] in {1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003}
Examples of valid outputs for this test: [9, 20, 2001], [18, 30, 1995]
Example of invalid outputs for this test: [25, 10, 1998] (not increasing order)
I already tried different methods but what I'm not satisfied with is that they all yield not uniformly distributed results, i.e. there is a strong bias in all my solutions and some samples are underrepresented.
One of the methods is to try to randomly generate numbers one by one and at each iteration reduce the space for the upcoming numbers to satisfy the increasing order condition. This is bad because this solution always biases the last numbers towards the higher end of their possible range.
I already gave up on looking for an exact solution that could yield samples uniformly. I would really appreciate any reasonable solution (preferably, on Python, but anything will do, really).
I won't code it for you but here's the logic to do the non brute force approach:
Let's define N(i,x) as the number of possible samples of X[1],...,X[i] where X[i]=x, and S(i) as the set of possible values for X[i]. The base case is N(1,x) = 1 for every x in S(1), and for i > 1 you have the recursion formula N(i,x) = Sum over y in S(i-1) with y<=x of N(i-1,y). This allows you to compute all N(i,x) very quickly. It is then easy to build up your sample from the end:
Knowing all N(n,x), you can draw X[n] from S(n) with probability N(n,X[n]) / (Sum over x in S(n) of N(n,x))
And then you keep building down: given that you have already drawn X[n],X[n-1],...,X[i+1], you draw X[i] from S(i) with X[i]<=X[i+1] with probability N(i,X[i]) / (Sum over x in S(i) with x<=X[i+1] of N(i,x))
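The recursion above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the answerer's own code: the function name sample_sorted_uniform and the input format (a list of iterables of numbers, one per position, with at least one position) are made up here. It also sorts every space into a list, so memory grows with the total size of the spaces, and random.choices works with floating-point weights internally, so for very large counts the draw is only approximately uniform.

```python
import bisect
import random
from itertools import accumulate

def sample_sorted_uniform(spaces):
    values = [sorted(set(s)) for s in spaces]
    # counts[i][k] plays the role of N(i, x) for x = values[i][k]
    counts = [[1] * len(values[0])]
    for i in range(1, len(values)):
        prev_cum = list(accumulate(counts[i - 1]))  # prefix sums over S(i-1)
        row = []
        for x in values[i]:
            k = bisect.bisect_right(values[i - 1], x)  # number of y <= x
            row.append(prev_cum[k - 1] if k else 0)
        counts.append(row)
    if sum(counts[-1]) == 0:
        raise ValueError("no non-decreasing sequence exists")

    result = [None] * len(values)
    upper = None  # value drawn for the next position, once there is one
    for i in range(len(values) - 1, -1, -1):
        # only candidates that do not exceed the value drawn for X[i+1]
        k = len(values[i]) if upper is None else bisect.bisect_right(values[i], upper)
        upper = random.choices(values[i][:k], weights=counts[i][:k])[0]
        result[i] = upper
    return result

spaces = [set(range(8, 32)), {10, 20, 30, 50}, set(range(1995, 2004))]
print(sample_sorted_uniform(spaces))
```

Each complete tuple is drawn with probability 1 / (total number of valid tuples), because the conditional probabilities telescope when multiplied together.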
Here is an implementation of the heuristic I suggested in the comments:
import random
def rand_increasing(sets):
    #assume: sets is list of sets
    sets = [s.copy() for s in sets]
    n = len(sets)
    indices = list(range(n))
    random.shuffle(indices)
    chosen = [0]*n
    for i,k in enumerate(indices):
        chosen[k] = random.choice(list(sets[k]))
        for j in indices[(i+1):]:
            if j > k:
                sets[j] = {x for x in sets[j] if x > chosen[k]}
            else:
                sets[j] = {x for x in sets[j] if x < chosen[k]}
    return chosen
#test:
sets = [{8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31},
        {10, 20, 30, 50},
        {1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003}]
for _ in range(10):
    print(rand_increasing(sets))
Typical output:
[24, 50, 1996]
[26, 30, 2001]
[17, 30, 1995]
[11, 20, 2000]
[12, 20, 1996]
[11, 50, 2003]
[14, 20, 2002]
[9, 10, 2001]
[8, 30, 1999]
[8, 10, 1998]
Of course, if you can get uniform sampling with Julien's approach, that is preferable. (This heuristic might give a uniform distribution, but that would require proof.) Also note that poor choices in the earlier stages might leave some of the later sets in the permutation empty, raising an error. The function could be called in a loop with proper error trapping, yielding a hit-or-miss approach.
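A hit-or-miss wrapper along those lines might look like the sketch below. The heuristic is repeated so that the block runs on its own; rand_increasing_retry and its max_tries parameter are names invented for this example. It relies on random.choice raising IndexError when it is given an empty sequence.

```python
import random

def rand_increasing(sets):
    # Same heuristic as above: fix positions in random order, prune the rest.
    sets = [set(s) for s in sets]
    order = list(range(len(sets)))
    random.shuffle(order)
    chosen = [0] * len(sets)
    for i, k in enumerate(order):
        chosen[k] = random.choice(list(sets[k]))  # IndexError if sets[k] is empty
        for j in order[i + 1:]:
            if j > k:
                sets[j] = {x for x in sets[j] if x > chosen[k]}
            else:
                sets[j] = {x for x in sets[j] if x < chosen[k]}
    return chosen

def rand_increasing_retry(sets, max_tries=1000):
    # Retry whenever an early choice emptied one of the remaining sets.
    for _ in range(max_tries):
        try:
            return rand_increasing(sets)
        except IndexError:
            continue
    raise RuntimeError("no valid sample found within max_tries attempts")

# Picking 2 or 3 for the first position leaves no option for the middle one,
# so single attempts fail fairly often on this input.
print(rand_increasing_retry([{1, 2, 3}, {2}, {1, 2, 3}]))  # [1, 2, 3]
```

Retrying changes the distribution compared with a single run, so this does not make the heuristic any more uniform; it only hides the failures.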

How to iterate through two lists of different length and then save only those values which are not similar in a new list

array1 = [40, 30, 20, 10]
array2 = [40, 30, 40, 20, 40, 10, 30, 40, 30, 20, 30, 10, 20, 40, 20, 30, 20, 10, 10, 40, 10, 30, 10, 20]
I want to iterate array2 through array1 in the following index order:
array2[0] (40) -> array2[1] (30) -> array2[0] (40) -> array2[1] (30), then move on to array2[2] (40) -> ... and repeat the same thing until the end of array2. While this is happening, it goes over each element of array1 and looks for any values that do not match (20 and 10 here); these get saved in a new array.
If I understood correctly, what you want is to iterate over the arrays two by two (in pairs), each time comparing the pair from array 1 with the pair from array 2 to determine whether it counts as a difference, and to return all the differing pairs at the end.
(I used an itertools recipe, available in its official documentation, or in the more_itertools library.)
This is my proposal:
from itertools import zip_longest
from more_itertools import grouper
def compare_arrays(a, b):
    differences = []
    # we start by grouping each list by twos, and zipping them together
    for pair_a, pair_b in zip_longest(grouper(a, 2), grouper(b, 2)):
        # print(pair_a, "|", pair_b)
        if pair_a is None or pair_b is None:
            break  # if the lists do not have the same length, it does not count as a difference
        elif pair_a != pair_b:
            differences.extend(pair_a)
    return differences
array1 = [40, 30, 20, 10]
array2 = [40, 30, 40, 20, 40, 10, 30, 40, 30, 20, 30, 10, 20, 40, 20, 30, 20, 10, 10, 40, 10, 30, 10, 20]
print(compare_arrays(array1, array2))
# [20, 10]
array1 = [20, 20, 30, 40, 50, 50, 60, 70]
array2 = [20, 20, 30, 0, 50, 0]
print(compare_arrays(array1, array2))
# [30, 40, 50, 50]
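If installing more_itertools is not an option, the grouper recipe from the itertools documentation is short enough to inline. The sketch below does that; compare_pairs is a name chosen for this example. It uses plain zip, which stops at the shorter input and so has the same effect as the zip_longest plus break combination above. It assumes both lists have an even length; with an odd length the last pair is padded with None.

```python
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # itertools recipe: n references to one iterator yield consecutive n-tuples
    iterators = [iter(iterable)] * n
    return zip_longest(*iterators, fillvalue=fillvalue)

def compare_pairs(a, b):
    differences = []
    # zip stops as soon as the shorter list runs out of pairs
    for pair_a, pair_b in zip(grouper(a, 2), grouper(b, 2)):
        if pair_a != pair_b:
            differences.extend(pair_a)
    return differences

array1 = [40, 30, 20, 10]
array2 = [40, 30, 40, 20, 40, 10, 30, 40, 30, 20, 30, 10,
          20, 40, 20, 30, 20, 10, 10, 40, 10, 30, 10, 20]
print(compare_pairs(array1, array2))  # [20, 10]
```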

Obtaining a list of ordered integers from a list of "pairs" in Python

Hello, I am currently working with a large set of data which contains an even number of integers, all of which have a matching value. I am trying to create a list which is made up of "one of each pair" in Python. I can have multiple pairs of the same value, so simply using the set function does not work. For example, if I have a list:
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
In this example, indices 0 and 1 would be a pair, then 2 and 7, 3 and 5, 4 and 6, 8 and 9.
I want to extract from that list the values that make up each pair and create a new list with said values to produce something such as:
newList = [10, 11, 20, 15, 10]
Using the set function means that each value is put into the list only once, whereas I need half of the total data from List. For situations where I have more than one pair of the same value, it would look something like:
List = [10, 10, 11, 10, 11, 10]
Would need to produce a list such as:
newList = [10, 11, 10]
Any insight would be great as I am new to Python and there are a lot of functions I may not be aware of.
Thank you
Just try:
new_list = set(list)
This should return your desired output.
If I've understood correctly, you don't want to have any duplicated value, want to retain a list with unique values from a particular list.
If I'm right, a simple way to do so would be:
List = [10, 10, 11, 11, 15, 20, 15, 20]
newList = []
for x in List:
    if x not in newList:
        newList.append(x)
print(newList)
A python-like way to do so would be:
newList = set(List)
Here is a slight variation on one of @Alain T's answers:
[i for s in [set()] for i in List if (s.remove(i) if i in s else (not s.add(i)))]
NB: the following was my answer before you added the ordering requirement
sorted(List)[::2]
This sorts the input List and then takes one value out of every two consecutive values.
As a general approach, this'll do:
l = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
i = 0
while i < len(l):
    del l[l.index(l[i], i + 1)]
    i += 1
It iterates through the list one by one, finding the index of the next occurrence of the current value, and deletes it, shortening the list. This can probably be dressed up in various ways, but is a simple algorithm. Should a number not have a matching pair, this will raise a ValueError.
The following code creates a new list with half the number of items occurring in the input list. The values appear in order of first occurrence in the input list.
>>> from collections import Counter
>>> d = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
>>> c = Counter(d)
>>> c
Counter({10: 4, 11: 2, 20: 2, 15: 2})
>>> answer = sum([[key] * (val // 2) for key, val in c.items()], [])
>>> answer
[10, 10, 11, 20, 15]
>>>
If you need to preserve the order of the first occurrence of each pair, you could use a set with an XOR operation on values to alternate between first and second occurrences.
List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
paired = [ i for pairs in [set()] for i in List if pairs.symmetric_difference_update({i}) or i in pairs]
print(paired)
# [10, 11, 20, 15, 10]
You could also do this with the accumulate function from itertools:
from itertools import accumulate
paired = [a for a,b in zip(List,accumulate(({n} for n in List),set.__xor__)) if a in b]
print(paired)
# [10, 11, 20, 15, 10]
Or use a bitmap instead of a set (if your values are relatively small non-negative integers, e.g. between 0 and 64):
paired = [ n for n,m in zip(List,accumulate((1<<n for n in List),int.__xor__)) if (1<<n)&m ]
print(paired)
# [10, 11, 20, 15, 10]
Or you could use a Counter from collections
from collections import Counter
paired = [ i for c in [Counter(List)] for i in List if c.update({i:-1}) or c[i]&1 ]
print(paired)
# [10, 11, 20, 15, 10]
And, if you're not too worried about efficiency, a double sort with a two-step stride could do it:
paired = [List[i] for i,_ in sorted(sorted(enumerate(List),key=lambda n:n[1])[::2])]
print(paired)
# [10, 11, 20, 15, 10]
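For readers who find the one-liners above hard to follow, the set-toggling idea can be spelled out as an explicit loop. This is a sketch of the same technique; the function name first_of_each_pair is made up for this example.

```python
def first_of_each_pair(values):
    unmatched = set()  # values seen an odd number of times so far
    result = []
    for v in values:
        if v in unmatched:
            unmatched.remove(v)  # second half of a pair: skip it
        else:
            unmatched.add(v)     # first half of a pair: keep it
            result.append(v)
    return result

List = [10, 10, 11, 20, 15, 20, 15, 11, 10, 10]
print(first_of_each_pair(List))  # [10, 11, 20, 15, 10]
```

Unlike the index/del approach, a value without a partner does not raise an error here; it simply stays in the result.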

How to print the sum of the current and previous element in a list

I am trying to iterate through a list of numbers and print the sum of the current element and the previous element using python. For example,
Given numbers = [5,10,15,20,25,30,30], the output should be 5, 15, 25, 35, 45, 55, 60,. This is the code that I have tried; it is very close to the answer, but the first element is wrong.
numbers = [5, 10, 15, 20, 25, 30, 30]
i = 0
for x in range(1, 8):
    print(numbers[i] + numbers[i - 1], end=", ")
    i += 1
I am getting the output 35, 15, 25, 35, 45, 55, 60,. What am I doing wrong?
You can pair adjacent items of numbers by zipping it with itself but padding one with a 0, so that you can iterate through the pairs to output the sums in a list comprehension:
[a + b for a, b in zip([0] + numbers, numbers)]
or by mapping the pairs to the sum function:
list(map(sum, zip([0] + numbers, numbers)))
Both would return:
[5, 15, 25, 35, 45, 55, 60]
You are starting at index 0, where it seems your intended output starts at index 1:
Here is a better solution:
numbers = [5, 10, 15, 20, 25, 30, 30]
for i in range(len(numbers)):
    if i == 0:
        print(numbers[i])
    else:
        print(numbers[i - 1] + numbers[i])
Outputs:
5
15
25
35
45
55
60
This should work:
numbers = [5, 10, 15, 20, 25, 30, 30]
output = [numbers[i]+numbers[i-1] if i > 0 else numbers[i] for i in range(len(numbers))]
print(output)
You are starting at i = 0, so the first sum adds the elements at index 0 and index -1 (the last one, in this case). That's why you are getting 35 (5+30).
This list comprehension works:
numbers = [5, 10, 15, 20, 25, 30, 30]
output = [value + numbers[i-1] if i else value for i, value in enumerate(numbers)]
print(output)
>>> [5, 15, 25, 35, 45, 55, 60]
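The wraparound is easy to check directly, and slicing offers a way to avoid index arithmetic altogether. A small sketch; the variable names first_sum and sums are only for illustration.

```python
numbers = [5, 10, 15, 20, 25, 30, 30]

# Index -1 wraps around to the last element, so the first sum is 5 + 30.
first_sum = numbers[0] + numbers[0 - 1]
print(first_sum)  # 35

# Keep the first element as-is, then add each element to its successor.
sums = numbers[:1] + [a + b for a, b in zip(numbers, numbers[1:])]
print(*sums, sep=", ")  # 5, 15, 25, 35, 45, 55, 60
```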
Cheat, and add a [0] at the start to prevent the first sum from being wrong.
You'll run into problems at the end, though, because then the list in the enumerate is one item longer than the original, so also clip off its last number:
print ([a+numbers[index] for index,a in enumerate([0]+numbers[:-1])])
Result:
[5, 15, 25, 35, 45, 55, 60]
If you want to see how it works, print the original numbers before addition:
>>> print ([(a,numbers[index]) for index,a in enumerate([0]+numbers[:-1])])
[(0, 5), (5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 30)]
The enumerate loops over the changed list [0, 5, 10, .. 30], where everything is shifted up a place, but numbers[index] still returns the correct element from the original list. Adding them up yields the correct result.
