Given a tuple list a:
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
We can count how many appearances of each tuple we have using Counter:
>>> from collections import Counter
>>> b = Counter(a)
>>> b
Counter({(4, 15): 1, (10, 16): 3, (12, 3): 1, (13, 11): 1, (23, 11): 1})
Now, the idea is to select 3 random tuples from the list, without repetition, such that the count determines the probability that a particular tuple is chosen.
For instance, (10, 16) is more likely to be chosen than the others - its weight is 3/7 while the other four tuples have weight 1/7.
I have tried to use np.random.choice:
a[np.random.choice(len(a), 3, p=b/len(a))]
But I'm not able to generate the tuples.
I'm trying:
from collections import Counter
import numpy as np
a = [(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(a)
c = []
print("counter list")
print(b)
for item in b:
    print("item from current list")
    print(item)
    print("prob of the item")
    print(float(b[item]) / float(len(a)))
    c.append(float(b[item]) / float(len(a)))
print("prob list")
print(c)
print(np.random.choice(np.arange(len(b)), 3, p=c, replace=False))
In this case I'm getting random indexes into the array rather than the tuples themselves.
Is there a more efficient way that avoids calculating the probabilities array?
Another issue is that the prob array does not correspond to the Counter array.
If you aren't interested in the intermediate step of calculating frequencies, you could use random.shuffle (either on the list or a copy) and then slice off as many items as you need.
e.g.
import random
a =[(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
random.shuffle(a)
random_sample = a[0:3]
print(random_sample)
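As shuffle reorders in place, no list position is picked twice, and each draw is weighted by how often a tuple occurs (excluding differences in random number generation between np and random). Note, however, that a tuple occurring several times in the list, such as (10, 16), can still appear more than once in the sample.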
This will do the trick:
from collections import Counter
import numpy as np
listOfNumbers = [(23, 11), (10, 16), (13, 11), (10, 16), (10, 16), (10, 16), (10, 16)]
b = Counter(listOfNumbers)
c = []     # probability of each distinct tuple
pres = []  # distinct tuples, in the same order as c
for k, v in b.most_common():
    c.append(float(v) / float(len(listOfNumbers)))
    pres.append(k)
# draw 3 distinct indexes, weighted by the probabilities in c
resultIndex = np.random.choice(np.arange(len(b)), 3, p=c, replace=False)
ass = []
for res in resultIndex:
    ass.append(pres[res])
print(ass)
Now it just remains to see whether there is any way to optimize it.
You can repeat the following steps 3 times:
Randomly choose a number i in the range [0..n-1], where n is the current number of elements in a.
Find the tuple at the i-th position in the current list a. Add the tuple to the resulting triplet.
Remove all occurrences of that tuple from a.
Pay attention to the corner case where a becomes empty.
The overall time complexity will be O(n) for a list.
In the first step, the number i should be generated according to a uniform distribution, which the regular random module provides. The more occurrences of a particular tuple there are in a, the more likely it is to be chosen.
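The steps above could be sketched roughly as follows. This is only an illustration under my own assumptions: the function name sample_distinct and the optional rng parameter are made up for the example, and the input list is copied so the caller's list is left alone.

```python
import random

def sample_distinct(items, k, rng=random):
    """Pick up to k distinct values; each draw is weighted by the remaining counts."""
    pool = list(items)  # work on a copy so the caller's list is untouched
    chosen = []
    while pool and len(chosen) < k:
        # A uniform position in the pool means values with more copies are likelier.
        pick = pool[rng.randrange(len(pool))]
        chosen.append(pick)
        # Remove every occurrence so the same value cannot be drawn again.
        pool = [item for item in pool if item != pick]
    return chosen

a = [(23, 11), (10, 16), (13, 11), (12, 3), (4, 15), (10, 16), (10, 16)]
print(sample_distinct(a, 3))
```

If the list holds fewer than k distinct tuples, the loop simply stops early and returns what it found, which covers the empty-list corner case.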
Is there any standard library Python or Numpy operation for doing the following:
my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
new_array = magic_function(my_array)
print(new_array)
> [(1, 7), (10, 17), (21, 24)]
I feel like something in itertools should be able to do this, seems like something a lot of people would use. We can assume the list is sorted by onset times already. It wouldn't be hard to do that anyway, you'd just use the sorted function with a key on the first element.
Apologies if this question has already been asked, wasn't sure how to word this problem, but this could be seen as a list of onsets and offsets and I want to merge elements with adjacent/equivalent timing.
EDIT: Inspired by @chris-charley's answer below, which relies on a third-party module, I wrote a small function that does what I wanted.
import re
def magic_function(mylist):
    # convert list to intspan
    intspan = ','.join([f'{int(a)}-{int(b)}' for (a, b) in mylist])
    # collapse adjacent ranges; the lookahead stops "-4,4" from matching inside "-4,45"
    intspan = re.sub(r'-(\d+),\1(?=-)', '', intspan)
    # convert back to list
    return [tuple(map(int, _.split('-'))) for _ in intspan.split(',')]
Here is the same function generalized for floats also:
import re
def magic_function(mylist):
    # convert list to floatspan
    floatspan = ','.join([f'{float(a)}-{float(b)}' for (a, b) in mylist])
    # collapse adjacent ranges; the lookahead stops "-3.0,3.0" from matching inside "-3.0,3.05"
    floatspan = re.sub(r'-(\d+\.\d+),\1(?=-)', '', floatspan)
    # convert back to list
    return [tuple(map(float, _.split('-'))) for _ in floatspan.split(',')]
intspan has the methods from_ranges() and ranges() to produce the results you need.
>>> from intspan import intspan
>>> my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
>>> intspan.from_ranges(my_array).ranges()
[(1, 7), (10, 17), (21, 24)]
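For anyone who would rather avoid both the regex and the third-party module, a plain loop seems to be enough. This is a minimal sketch, assuming each tuple is (start, end) with start <= end; the name merge_adjacent is made up for the example. Note that it also merges ranges that overlap, not only ones that touch.

```python
def merge_adjacent(ranges):
    """Merge (start, end) pairs whose start touches or overlaps the previous end."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Extend the last merged range instead of starting a new one.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

my_array = [(1, 3), (3, 4), (4, 5), (5, 7), (10, 12), (12, 17), (21, 24)]
print(merge_adjacent(my_array))  # [(1, 7), (10, 17), (21, 24)]
```

Because it compares numbers rather than strings, the same function works for floats and negative values without changes.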
I am a newbie to Python and programming. I am trying to come up with combinations and weed out those that meet certain conditions.
So in the case below, I have tried to generate all possible combinations of the numbers 1-100, but I don't know where to go after this.
import itertools
i_list = []
for i in range (1, 101):
    i_list.append(i)
comb = itertools.combinations(i_list,2)
for combinations in list(comb):
    print (combinations)
This runs fine and will generate a list from 1-100, and give me an output of
(1,2) (1,3).........(98,99) (98,100) (99,100)
Now my goal is to weed out the combinations with a difference < 5. For example, (1,2) has a difference less than 5, so it should not be output; (1,8) has a difference greater than 5, so it should be output. I hope that makes sense.
Can anyone guide me through the thought process and suggest an easy approach?
You can use itertools.filterfalse for this and then iterate over the result.
Also, with iterators, you want to wait until you really need a list before you convert to a list with list(). There's no reason to ever do that in this case because you are always iterating. This allows you to work with very large sets without taking up the memory and time of running through the iterator just to make a list to then iterate the list:
from itertools import combinations, filterfalse
comb = combinations(range(1, 101),2)
filtered = filterfalse(lambda x: abs(x[0] - x[1]) < 5, comb)
for combinations in filtered:
    print (combinations)
The iterators produced by range(), combinations and filterfalse are all lazy, so they never start evaluating until you start looping over them. This allows you to defer any work until it needs to be done or to iterate over part of a large set without calculating the entire thing.
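To see the laziness in action, you can pull just a few items with itertools.islice. A small sketch (the variable name first_three is only for illustration):

```python
from itertools import combinations, filterfalse, islice

comb = combinations(range(1, 101), 2)
filtered = filterfalse(lambda pair: abs(pair[0] - pair[1]) < 5, comb)

# Only as many combinations as needed to find three matches are generated.
first_three = list(islice(filtered, 3))
print(first_three)  # [(1, 6), (1, 7), (1, 8)]
```

The rest of the pairs are still waiting in filtered, so a later loop over it picks up where islice stopped.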
You can use a list comprehension to restrict the generated values to be kept inside the list:
from itertools import combinations
comb = [ x for x in combinations(range(1,101),2) if x[1]-x[0]>4 ]
print (comb)
Output:
[(1, 6), (1, 7), (1, 8), ... snipp ..., (93, 99), (93, 100), (94, 99), (94, 100), (95, 100)]
combinations respects the order of numbers so no abs() around x[1]-x[0] needed - range itself is a sequence and your resulting list weeds out all numbers you do not want due to the if x[1]-x[0]>4 condition.
This should accomplish what you are asking:
>>> import itertools
>>> combinations = itertools.combinations(range(1, 101), 2)
>>> generator = ((a, b) for a, b in combinations if b - a >= 5)
>>> for pair in generator:
...     print(pair, end=' ')
...
(1, 6) (1, 7) (1, 8) (1, 9) (1, 10) (1, 11) (1, 12) (1, 13) (1, 14) (1, 15) ...
Alternatively, you can try this instead to do the exact same thing:
>>> generator = ((a, b) for a in range(1, 96) for b in range(a + 5, 101))
>>> for pair in generator:
...     print(pair, end=' ')
...
(1, 6) (1, 7) (1, 8) (1, 9) (1, 10) (1, 11) (1, 12) (1, 13) (1, 14) (1, 15) ...
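As a quick sanity check, the two generators can be compared directly. This is just a sketch with list names of my own choosing:

```python
from itertools import combinations

# Filter all pairs, keeping those at least 5 apart.
filtered_pairs = [(a, b) for a, b in combinations(range(1, 101), 2) if b - a >= 5]
# Build only the wanted pairs in the first place.
direct_pairs = [(a, b) for a in range(1, 96) for b in range(a + 5, 101)]

print(filtered_pairs == direct_pairs)  # True
print(len(direct_pairs))               # 4560
```

The second form never creates the 390 pairs that would be thrown away, though for a range this small the difference is negligible.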
I am trying to get the highest 4 values in a list of tuples and put them into a new list. However, if there are two tuples with the same value I want to take the one with the lowest number.
The list originally looks like this:
[(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)...]
And I want the new list to look like this:
[(9,20), (3,16), (54, 13), (2,10)]
This is my current code. Any suggestions?
sorted_y = sorted(sorted_x, key=lambda t: t[1], reverse=True)[:5]
sorted_z = []
while n < 4:
    n = 0
    x = 0
    y = 0
    if sorted_y[x][y] > sorted_y[x+1][y]:
        sorted_z.append(sorted_y[x][y])
        print(sorted_z)
        print(n)
        n = n + 1
    elif sorted_y[x][y] == sorted_y[x+1][y]:
        a = sorted_y[x]
        b = sorted_y[x+1]
        if a > b:
            sorted_z.append(sorted_y[x+1][y])
        else:
            sorted_z.append(sorted_y[x][y])
        n = n + 1
        print(sorted_z)
        print(n)
Edit: To clarify, I want the tuples with the highest second values, and if two tuples have the same second value, I want the one with the lowest first value.
How about groupby?
from itertools import groupby, islice
from operator import itemgetter
data = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
pre_sorted = sorted(data, key=itemgetter(1), reverse=True)
result = [sorted(group, key=itemgetter(0))[0] for key, group in islice(groupby(pre_sorted, key=itemgetter(1)), 4)]
print(result)
Output:
[(9, 20), (3, 16), (54, 13), (2, 10)]
Explanation:
This first sorts the data by the second element's value in descending order. groupby then puts them into groups where each tuple in the group has the same value for the second element.
Using islice, we take the top four groups and sort each by the value of the first element in ascending order. Taking the first value of each group, we arrive at our answer.
You can try this :
l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
asv = set([i[1] for i in l]) # The set of unique second elements
new_l = [(min([i[0] for i in l if i[1]==k]),k) for k in asv]
OUTPUT :
[(3, 16), (2, 10), (9, 20), (54, 13)]
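Since a set has no guaranteed order, the result above still needs sorting and trimming to the top four. One way to do it with a dictionary is sketched below; the function name top_by_second and the parameter n are made up for the example.

```python
def top_by_second(pairs, n=4):
    """Keep the lowest first value per second value, then take the n highest seconds."""
    best = {}
    for first, second in pairs:
        if second not in best or first < best[second]:
            best[second] = first
    # Sort by the second value, highest first, and keep the top n.
    top = sorted(best.items(), key=lambda kv: kv[0], reverse=True)[:n]
    return [(first, second) for second, first in top]

l = [(9, 20), (3, 16), (54, 13), (67, 10), (2, 10)]
print(top_by_second(l))  # [(9, 20), (3, 16), (54, 13), (2, 10)]
```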
VERY sorry for the vagueness, but I don't actually know what part of what I'm doing is inefficient.
I've made a program that takes a list of positive integers (example*):
[1, 1, 3, 5, 16, 2, 4, 6, 6, 8, 9, 24, 200,]
*the real lists can be up to 2000 in length and the elements between 0 and 100,000 exclusive
And creates a dictionary where each number tupled with its index (like so: (number, index)) is a key and the value for each key is a list of every number (and that number's index) in the input that it goes evenly into.
So the entry for the 3 would be: (3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]
My code is this:
num_dict = {}
sorted_list = sorted(beginning_list)
for a2, a in enumerate(sorted_list):
    num_dict[(a, a2)] = []
for x2, x in enumerate(sorted_list):
    for y2, y in enumerate(sorted_list[x2 + 1:]):
        if y % x == 0:
            pair = (y, y2 + x2 + 1)
            num_dict[(x, x2)].append(pair)
But, when I run this script, I hit a MemoryError.
I understand that this means that I am running out of memory, but in my situation, adding more RAM or switching to a 64-bit version of Python is not an option.
I am certain that the problem is not coming from the list sorting or the first for loop. It has to be the second for loop. I just included the other lines for context.
The full output for the list above would be (sorry for the unsortedness, that's just how dictionaries do):
(200, 12): []
(6, 7): [(24, 11)]
(16, 10): []
(6, 6): [(6, 7), (24, 11)]
(5, 5): [(200, 12)]
(4, 4): [(8, 8), (16, 10), (24, 11), (200, 12)]
(9, 9): []
(8, 8): [(16, 10), (24, 11), (200, 12)]
(2, 2): [(4, 4), (6, 6), (6, 7), (8, 8), (16, 10), (24, 11), (200, 12)]
(24, 11): []
(1, 0): [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(1, 1): [(2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (6, 7), (8, 8), (9, 9), (16, 10), (24, 11), (200, 12)]
(3, 3): [(6, 6), (6, 7), (9, 9), (24, 11)]
Is there a better way of going about this?
EDIT:
This dictionary will then be fed into this:
ans_set = set()
for x in num_dict:
    for y in num_dict[x]:
        for z in num_dict[y]:
            ans_set.add((x[0], y[0], z[0]))
return len(ans_set)
to find all unique possible triplets in which the 3rd value can be evenly divided by the 2nd value which can be evenly divided by the 1st.
If you think you know of a better way of doing the entire thing, I'm open to redoing the whole of it.
Final Edit
I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.
def foo(l):
    llen = len(l)
    total = 0
    cache = {}
    for i in range(llen):
        cache[i] = 0
    for x in range(llen):
        for y in range(x + 1, llen):
            if l[y] % l[x] == 0:
                cache[y] += 1
                total += cache[x]
    return total
And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):
def bar(l):
    list_length = len(l)
    total_triples = 0
    cache = {}
    for i in range(list_length):
        cache[i] = 0
    for x in range(list_length):
        print("\n\nfor index[{}]: {}".format(x, l[x]))
        for y in range(x + 1, list_length):
            print("\n\ttry index[{}]: {}".format(y, l[y]))
            if l[y] % l[x] == 0:
                print("\n\t\t{} can be evenly divided by {}".format(l[y], l[x]))
                cache[y] += 1
                total_triples += cache[x]
                print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
                print("\t\tcount is now {}".format(total_triples))
                print("\t\t(+{} from cache[{}])".format(cache[x], x))
            else:
                print("\n\t\tfalse")
    print("\ntotal number of triples:", total_triples)
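For anyone wanting to convince themselves that the cache trick counts the right thing, it can be compared against a brute-force count on small inputs. A sketch, with function names of my own choosing; note that both versions count triples of positions (i < j < k), so repeated values are counted once per position.

```python
from itertools import combinations

def count_triples(values):
    """Count position triples i < j < k where values[i] divides values[j] divides values[k]."""
    divisors_before = [0] * len(values)  # earlier entries that divide values[y]
    total = 0
    for x in range(len(values)):
        for y in range(x + 1, len(values)):
            if values[y] % values[x] == 0:
                divisors_before[y] += 1
                # Every earlier divisor of values[x] completes a triple ending at y.
                total += divisors_before[x]
    return total

def count_triples_brute(values):
    """Cubic-time reference: test every position triple directly."""
    return sum(1 for a, b, c in combinations(values, 3)
               if b % a == 0 and c % b == 0)

print(count_triples([1, 2, 3, 4, 5, 6]))  # 3
```

The brute-force version is far too slow for 2000 elements, but it is handy for testing the fast one on short lists.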
Well, you could start by not unnecessarily duplicating information.
Storing full tuples (number and index) for each multiple is inefficient when you already have that information available.
For example, rather than:
(3, 2): [(16, 4), (6, 7), (6, 8), (9, 10), (24, 11)]
(the 16 appears to be wrong there as it's not a multiple of 3 so I'm guessing you meant 15) you could instead opt for:
(3, 2): [15, 6, 9, 24]
(6, 7): ...
That pretty much halves your storage needs since you can go from the 6 in the list and find all its indexes by searching the tuples. That will, of course, be extra processing effort to traverse the list but it's probably better to have a slower working solution than a faster non-working one :-)
You could reduce the storage even more by not storing the multiples at all, instead running through the tuple list using % to see if you have a multiple.
But, of course, this all depends on your actual requirements. You would be better off stating the intent of what you're trying to achieve rather than presupposing a solution.
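As a rough illustration of not storing the multiples at all, a generator can compute them on demand with %. This is a sketch only: the name multiples_after is made up, and it assumes a sorted list of positive integers.

```python
def multiples_after(values, index):
    """Yield (value, position) for later entries divisible by values[index]."""
    base = values[index]
    for pos in range(index + 1, len(values)):
        if values[pos] % base == 0:
            yield values[pos], pos

sorted_list = sorted([1, 1, 3, 5, 16, 2, 4, 6, 6, 8, 9, 24, 200])
# The value 3 sits at index 3 of the sorted list.
print(list(multiples_after(sorted_list, 3)))  # [(6, 6), (6, 7), (9, 9), (24, 11)]
```

Nothing is kept in memory beyond the list itself, at the cost of redoing the modulo checks each time the multiples are needed.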
You rebuild tuples in places like pair = (y, y2 + x2 + 1) and num_dict[(x, x2)].append(pair) when you could build a canonical set of tuples early on and then just put references in the containers. I cobbled up a 2000-item test on my machine that works. I have Python 3.4 64-bit with a relatively modest 3.5 GB of RAM...
import random
# a test list that should generate longish lists
# (starts at 1: a 0 in the list would make the % below raise ZeroDivisionError)
l = list(random.randint(1, 2000) for _ in range(2000))
# setup canonical index and sort ascending
sorted_index = sorted((v,i) for i,v in enumerate(l))
num_dict = {}
for idx, vi in enumerate(sorted_index):
    v = vi[0]
    num_dict[vi] = [vi2 for vi2 in sorted_index[idx+1:] if not vi2[0] % v]
for item in num_dict.items():
    print(item)
If I have a list of tuples of numbers:
locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
I want to group them with a tolerance range of +/- 3.
aFunc(locSet)
which returns
[(62.5, 121.0), (144.0, 41.5)]
I have seen "Identify groups of continuous numbers in a list", but that is for continuous integers.
If I have understood correctly, you are looking for tuples whose values differ by an absolute amount that lies in the tolerance range: [0, 1, 2, 3]
Assuming this, my solution returns a list of lists, where every internal list contains tuples that satisfy the condition.
def aFunc(locSet):
    # Sort the list.
    locSet = sorted(locSet,key=lambda x: x[0]+x[1])
    toleranceRange = 3
    resultLst = []
    for i in range(len(locSet)):
        sum1 = locSet[i][0] + locSet[i][1]
        tempLst = [locSet[i]]
        for j in range(i+1,len(locSet)):
            sum2 = locSet[j][0] + locSet[j][1]
            if (abs(sum1-sum2) in range(toleranceRange+1)):
                tempLst.append(locSet[j])
        if (len(tempLst) > 1):
            for lst in resultLst:
                if (list(set(tempLst) - set(lst)) == []):
                    # This solution is part of a previous solution.
                    # Doesn't include it.
                    break
            else:
                # Valid solution.
                resultLst.append(tempLst)
    return resultLst
Here are two usage examples:
locSet1 = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),(63.000003814697266, 121.00001525878906), (144.0, 41.5)]
locSet2 = [(10, 20), (12, 20), (13, 20), (14, 20)]
print(aFunc(locSet1))
[[(62.5, 121.0), (144.0, 41.5)]]
print(aFunc(locSet2))
[[(10, 20), (12, 20), (13, 20)], [(12, 20), (13, 20), (14, 20)]]
I hope to have been of help.
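A different reading of the question is that points within +/- 3 of each other should collapse into a single representative. The sketch below follows that reading and is only an assumption about what was wanted: the name group_representatives is made up, the tolerance is applied to each coordinate separately, and the first point seen in each group is the one kept.

```python
def group_representatives(points, tolerance=3):
    """Keep a point only if no already-kept point is within tolerance on both axes."""
    representatives = []
    for x, y in points:
        near_existing = any(
            abs(x - rx) <= tolerance and abs(y - ry) <= tolerance
            for rx, ry in representatives
        )
        if not near_existing:
            representatives.append((x, y))
    return representatives

locSet = [(62.5, 121.0), (62.50000762939453, 121.00001525878906), (63.0, 121.0),
          (63.000003814697266, 121.00001525878906), (144.0, 41.5)]
print(group_representatives(locSet))  # [(62.5, 121.0), (144.0, 41.5)]
```

This is a greedy pass, so the result can depend on the order of the input when groups chain together.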