Run a random algorithm multiple times and average over the results - python

I have the following random selection script:
import random
length_of_list = 200
my_list = list(range(length_of_list))
num_selections = 10
numbers = random.sample(my_list, num_selections)
It looks at a list of predetermined size and randomly selects 10 numbers. Is there a way to run this section 500 times and then get the top 10 numbers which were selected the most? I was thinking that I could feed the numbers into a dictionary and then get the top 10 numbers from there. So far, I've done the following:
for run in range(0, 500):
    numbers = random.sample(my_list, num_selections)
    for number in numbers:
        current_number = my_dict.get(number)
        key_number = number
        my_dict.update(number = number+1)
print(my_dict)
Here I want the code to take the current number assigned to that key and add 1, but I cannot make it work. It seems the key for the dictionary update has to be a literal key name; I cannot use a variable. Also, I think this nested loop might not be very efficient, since I have to run it 500 × 1500 × 23 times, so I am concerned about performance. If anyone has an idea of what I should try, it would be great! Thanks
SOLUTION:
import random
from collections import defaultdict
from collections import OrderedDict
length_of_list = 50
my_list = list(range(length_of_list))
num_selections = 10
# defaultdict(int) gives every new key a starting count of 0
di = defaultdict(int)
for run in range(0, 500):
    numbers = random.sample(my_list, num_selections)
    for number in numbers:
        di[number] += 1
def get_top_numbers(data, n, order=False):
    """Gets the top n numbers from the dictionary"""
    # sort (number, count) pairs by count, highest first, and keep the first n
    top = sorted(data.items(), key=lambda x: x[1], reverse=True)[:n]
    if order:
        return OrderedDict(top)
    return dict(top)
print(get_top_numbers(di, n=10))

In the line my_dict.update(number = number+1), number is passed as a keyword argument. dict.update does accept keyword arguments, but the keyword's name becomes a string key, so this sets my_dict['number'] (the literal string 'number') to number+1 on every iteration instead of incrementing the entry for the integer number.
Also, dict.update doesn't accept a bare integer: it takes another dictionary (or an iterable of key/value pairs), optionally together with keyword arguments. You should read the documentation for this method: https://www.tutorialspoint.com/python3/dictionary_update.htm
It says dict.update(dict2) merges the key/value pairs of the dictionary dict2 into dict. See the example below:
dict1 = {'Name': 'Zara', 'Age': 17}
dict2 = {'Gender': 'female'}
dict1.update(dict2)
print("updated dict : ", dict1)
Gives as result (Python 3.7+, where dictionaries keep insertion order):
updated dict :  {'Name': 'Zara', 'Age': 17, 'Gender': 'female'}
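To make the difference concrete, here is a small sketch (the dictionary name counts is just for illustration) showing what update(number=...) actually does compared with incrementing the entry for an integer key. Keyword arguments to dict.update always become string keys, so an integer key has to be updated by indexing or by passing a dictionary:

```python
counts = {}
number = 7

# Keyword form: the key is the *string* 'number', not the value of the variable.
counts.update(number=number + 1)
print(counts)  # {'number': 8}

# Passing a dict uses the variable's value (7) as the key.
counts.update({number: 1})
print(counts)  # {'number': 8, 7: 1}

# Incrementing a count is simplest with indexing and dict.get (default 0).
counts[number] = counts.get(number, 0) + 1
print(counts)  # {'number': 8, 7: 2}
```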
That covers the errors in your code. A good answer has already been given, so I won't repeat it.

Check out defaultdict from the collections module.
Basically, you create a defaultdict whose default value is 0, then iterate over your numbers list and increment the count for each number with += 1:
import random
from collections import defaultdict
di = defaultdict(int)
for run in range(0, 500):
    numbers = random.sample(my_list, num_selections)
    for number in numbers:
        di[number] += 1
print(di)

You can use collections.Counter for this task, since it supports addition. You use two counters: one holding the running sum over all runs, and one holding the counts for the current sample.
import collections
import random
counter = collections.Counter()  # running total over all runs
for run in range(500):
    samples = random.sample(my_list, num_selections)
    sample_counter = collections.Counter(samples)  # counts for this run only
    counter = counter + sample_counter
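To finish the question's actual task (the ten most frequently selected numbers), Counter.most_common(n) can do the ranking. A minimal sketch, assuming the question's list of 200 numbers and 10 selections per run; it uses counter.update(samples), which adds the counts in place instead of building a new Counter on every iteration. The seed is optional and only makes runs reproducible:

```python
import random
from collections import Counter

length_of_list = 200
num_selections = 10
num_runs = 500

my_list = list(range(length_of_list))
random.seed(0)  # optional: reproducible results

counter = Counter()
for _ in range(num_runs):
    # update() adds 1 for each sampled number, modifying counter in place
    counter.update(random.sample(my_list, num_selections))

# (number, count) pairs, highest count first
top_10 = counter.most_common(10)
print(top_10)
```

Note that numbers with equal counts are returned in the order they were first counted, so ties at the tenth place are broken somewhat arbitrarily.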

Related

How do I use a while loop to access all the 2nd elements of lists which are the values stored in a dictionary?

If I have a dictionary like this, filled with similar lists, how can I use a while loop to extract a list of the second elements:
racoona_valence = {"rs13283416": ["7:87345874365-839479328749+", "BOBB7"]}
I need to print the part that says "BOBB7", i.e. the second element of each list, for a larger dictionary. There are ten key-value pairs in it, so I am starting like this, but I'm unsure what to do because none of the examples I can find relate to my problem:
n=10
gene_list = []
while n>0:
Any help greatly appreciated.
Well, there's a bunch of ways to do it depending on how well-structured your data is.
racoona_valence={"rs13283416": ["7:87345874365-839479328749+","BOBB7"], "rs13283414": ["7:87345874365-839479328749+","BOBB4"]}
output = []
for key in racoona_valence.keys():
    output.append(racoona_valence[key][1])
print(output)
other_output = []
for key, value in racoona_valence.items():
    other_output.append(value[1])
print(other_output)
list_comprehension = [value[1] for value in racoona_valence.values()]
print(list_comprehension)
values = list(racoona_valence.values())  # build the list once, outside the loop
n = len(values) - 1
counter = 0
gene_list = []
while counter <= n:
    gene_list.append(values[counter][1])  # index with counter, not n
    counter += 1
print(gene_list)
Here is a list comprehension that does what you want:
second_element = [x[1] for x in racoona_valence.values()]
Here is a for loop that does what you want:
second_element = []
for value in racoona_valence.values():
    second_element.append(value[1])
Here is a while loop that does what you want:
# don't use a while loop to loop over iterables, it's a bad idea
i = 0
second_element = []
dict_values = list(racoona_valence.values())
while i < len(dict_values):
    second_element.append(dict_values[i][1])
    i += 1
Regardless of which approach you use, you can see the results by doing the following:
for item in second_element:
    print(item)
For the example that you gave, this is the output:
BOBB7

How to find the highest value element in a list with reference to a dictionary on python

How do I write a function in Python which can:
iterate through a list of word strings, which may contain duplicate words, looking each word up in a dictionary,
find the word with the highest absolute sum, and
output it along with the corresponding absolute value.
The function also has to ignore words which are not in the dictionary.
For example,
Assume the function is called H_abs_W().
Given the following list and dict:
list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
Then calling the function as:
H_abs_W(list_1,Dict_1)
Should give the output:
'apples',10.46
EDIT:
I managed to do it in the end with the code below. Looking over the answers, turns out I could have done it in a shorter fashion, lol.
def H_abs_W(list_1,Dict_1):
    freqW = {}
    for char in list_1:
        if char in freqW:
            freqW[char] += 1
        else:
            freqW[char] = 1
    ASum_W = 0
    i_word = ''
    for a,b in freqW.items():
        x = 0
        d = Dict_1.get(a,0)
        x = abs(float(b)*float(d))
        if x > ASum_W:
            ASum_W = x
            i_word = a
    return(i_word,ASum_W)
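list_1 = ['apples','oranges','pears','apples']
Dict_1 = {'apples':5.23,'pears':-7.62}
# sum each word's dictionary value over every occurrence in the list
d = {k: 0 for k in list_1 if k in Dict_1}
for x in list_1:
    if x in Dict_1:
        d[x] += Dict_1[x]
# pick the word with the highest absolute sum
m = max(d, key=lambda k: abs(d[k]))
print(m, abs(d[m]))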
Try this:
key, value = max(Dict_1.items(), key=lambda x: abs(list_1.count(x[0]) * x[1]))
print(f"{key}, {abs(list_1.count(key) * value)}")
# apples, 10.46
You can use Counter to calculate the frequency (number of occurrences) of each word in the list, then multiply each count by the word's value in the dictionary.
Note that max(counter, key=counter.get) on its own only gives the most frequent item, which is not necessarily the word with the highest absolute sum, and Dict_1.get(item) returns None for a word that is missing from the dictionary, so score only the words that are in Dict_1:
from collections import Counter
def H_abs_W(list_1, Dict_1):
    counter = Counter(list_1)
    # absolute sum for each word that appears in the dictionary
    scores = {word: abs(count * Dict_1[word]) for word, count in counter.items() if word in Dict_1}
    item = max(scores, key=scores.get)
    return item, scores[item]

Speed up dictionary merging with soft conjunction logic

I have a look-up table which contains <word: dictionary> pairs. Then, given a word list, I can produce a list of dictionaries using this look-up table. (The length of the word list varies each time.)
Values in these dictionaries represent log probability of some keys.
Here is an example:
Given a word list
['fruit','animal','plant'],
we can check out the look-up table and have
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}].
We can see from the list that we have a set of keys: {'apple', 'flower', 'dog'}
For each key, I want the sum of its values across dict_list. If a key does not exist in one of the dictionaries, a small value of -10 is added instead (you can regard -10 as a very small log probability).
The result dictionary looks like:
dict_merge = {'apple':-6, 'flower':-13, 'dog':-21},
because 'apple' = (-1) + (-3) + (-2), 'flower' = (-2) + (-10) + (-1), 'dog' = (-10) + (-1) + (-10)
Here is my python3 code:
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}]
key_list = []
for dic in dict_list:
    key_list.extend(dic.keys())
dict_merge = dict.fromkeys(key_list, 0)
for key in dict_merge:
    for dic in dict_list:
        dict_merge[key] += dic.get(key, -10)
This code works, but if the sizes of some dictionaries in dict_list are super large (for example 100,000), then it could take over 200ms, which is not acceptable in practice.
The main computation is in the for key in dict_merge loop, imagine it is a loop of size 100,000.
Are there any ways to speed this up? Thanks, and thanks for reading; maybe it's too long and too annoying...
P.S.
Only a few dictionaries in the look-up table are very large, so there may be room for optimization here.
As I understand it, sum(len(d) for d in dict_list) is much smaller than len(dict_merge) * len(dict_list) (the number of distinct keys times the number of dictionaries), so it is cheaper to loop only over the keys each dictionary actually contains.
from collections import defaultdict
dict_list = [{'apple':-1, 'flower':-2}, {'apple':-3, 'dog':-1}, {'apple':-2, 'flower':-1}]
default_value = len(dict_list) * (-10)
dict_merge = defaultdict(lambda: default_value)
for d in dict_list:
    for key, value in d.items():
        dict_merge[key] += value + 10
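The trick is that every key starts at the penalty it would get if it were missing from all n dictionaries, n * (-10), and each dictionary that does contain the key replaces one -10 with the real value, i.e. adds value + 10. Only keys that actually appear are touched, so the work is proportional to sum(len(d) for d in dict_list). A small sketch that wraps both versions in functions (merge_dense and merge_sparse are illustrative names, not from the question) and checks they agree on the question's example:

```python
from collections import defaultdict

MISSING = -10  # log-probability used when a key is absent from a dictionary

def merge_dense(dict_list):
    """Original approach: visit every (key, dictionary) pair."""
    keys = set()
    for d in dict_list:
        keys.update(d)
    return {k: sum(d.get(k, MISSING) for d in dict_list) for k in keys}

def merge_sparse(dict_list):
    """Start each key at n * MISSING and correct only where the key exists."""
    default_value = len(dict_list) * MISSING
    merged = defaultdict(lambda: default_value)
    for d in dict_list:
        for key, value in d.items():
            merged[key] += value - MISSING  # swap one MISSING for the real value
    return dict(merged)

dict_list = [{'apple': -1, 'flower': -2}, {'apple': -3, 'dog': -1}, {'apple': -2, 'flower': -1}]
print(merge_sparse(dict_list))  # {'apple': -6, 'flower': -13, 'dog': -21}
```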

Calculating means of values for subgroups of keys in python dictionary

I have a dictionary which looks like this:
cq={'A1_B2M_01':2.04, 'A2_B2M_01':2.58, 'A3_B2M_01':2.80, 'B1_B2M_02':5.00,
'B2_B2M_02':4.30, 'B3_B2M_02':2.40, ...}
I need to calculate the mean of each triplet whose keys agree from the third character on (key[2:]). So, ideally, I would like to get another dictionary like this:
new={'_B2M_01': 2.47, '_B2M_02': 3.9}
The data is/should be in triplets so in theory I could just get the means of the consecutive values, but first of all, I have it in a dictionary so the keys/values will likely get reordered, besides I'd rather stick to the names, as a quality check for the triplets assigned to names (I will later add a bit showing error message when there will be more than three per group).
I've tried creating a dictionary where the keys would be _B2M_01 and _B2M_02 and then loop through the original dictionary to first append all the values that are assigned to these groups of keys so I could later calculate an average, but I am getting errors even in the first step and anyway, I am not sure if this is the most effective way to do this...
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
trips=set([x[2:] for x in cq.keys()])
new={}
for each in trips:
    for k,v in cq.iteritems():
        if k[2:]==each:
            new[each].append(v)
Traceback (most recent call last):
File "<pyshell#28>", line 4, in <module>
new[each].append(v)
KeyError: '_B2M_01'
I would be very grateful for any suggestions. It seems like a fairly easy operation but I got stuck.
An alternative result which would be even better would be to get a dictionary which contains all the names used as in cq, but with values being the means of the group. So the end result would be:
final={'A1_B2M_01':2.47, 'A2_B2M_01':2.47, 'A3_B2M_01':2.47, 'B1_B2M_02':3.9,
'B2_B2M_02':3.9, 'B3_B2M_02':3.9}
Something like this should work. You can probably make it a little more elegant.
cq = {'A1_B2M_01':2.04, 'A2_B2M_01':2.58, 'A3_B2M_01':2.80, 'B1_B2M_02':5.00, 'B2_B2M_02':4.30, 'B3_B2M_02':2.40}
totals = {}  # named totals to avoid shadowing the built-in sum
count = {}
mean = {}
for k in cq:
    if k[2:] in totals:
        totals[k[2:]] += cq[k]
        count[k[2:]] += 1
    else:
        totals[k[2:]] = cq[k]
        count[k[2:]] = 1
for k in totals:
    mean[k] = totals[k] / count[k]
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
sums = dict()
for k, v in cq.items():
    _, p2 = k.split('_', 1)
    if p2 not in sums:
        sums[p2] = [0, 0]  # [running total, count]
    sums[p2][0] += v
    sums[p2][1] += 1
res = {}
for k, v in sums.items():
    res[k] = v[0] / v[1]
print(res)
This could also be done in a single pass.
Grouping:
SEPARATOR = '_'
cq={'A1_B2M_01':2.4, 'A2_B2M_01':5, 'A3_B2M_01':4, 'B1_B2M_02':3, 'B2_B2M_02':7, 'B3_B2M_02':6}
groups = {}
for key in cq:
    group_key = SEPARATOR.join(key.split(SEPARATOR)[1:])
    if group_key in groups:
        groups[group_key].append(cq[key])
    else:
        groups[group_key] = [cq[key]]
Generate means:
def means(groups):
    for group, group_vals in groups.items():
        yield (group, sum(group_vals) / len(group_vals))
print(list(means(groups)))
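For the alternative result the question asks for (every original key mapped to its group's mean), plus the "more than three per group" check mentioned there, one possible Python 3 sketch. Like the answers above, it treats everything after the first underscore as the group name (so the group key is 'B2M_01' rather than '_B2M_01'); the function name group_means, the expected_size parameter and raising ValueError are illustrative choices, not from the question:

```python
from collections import defaultdict
from statistics import mean

def group_means(cq, expected_size=3):
    """Return (per-group means, per-key means) for keys like 'A1_B2M_01'."""
    groups = defaultdict(list)
    for key, value in cq.items():
        _, group = key.split('_', 1)  # 'A1_B2M_01' -> 'B2M_01'
        groups[group].append(value)

    # Quality check: each group is expected to be a triplet.
    for group, values in groups.items():
        if len(values) != expected_size:
            raise ValueError(f"group {group!r} has {len(values)} values, expected {expected_size}")

    new = {group: mean(values) for group, values in groups.items()}
    final = {key: new[key.split('_', 1)[1]] for key in cq}
    return new, final

cq = {'A1_B2M_01': 2.04, 'A2_B2M_01': 2.58, 'A3_B2M_01': 2.80,
      'B1_B2M_02': 5.00, 'B2_B2M_02': 4.30, 'B3_B2M_02': 2.40}
new, final = group_means(cq)
print(new)
print(final)
```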

count occurrences of timeframes in a list

I need to create a tally dictionary of time stamps on our server log files with the hours as keys
I don't want to do a long-winded, case-by-case regular-expression check and append (it's Python, there must be a better way).
e.g. say I have a list:
times = ['02:49:04', '02:50:03', '03:21:23', '03:21:48', '03:24:29', '03:30:29', '03:30:30', '03:44:54', '03:50:11', '03:52:03', '03:52:06', '03:52:30', '03:52:48', '03:54:50', '03:55:21', '03:56:50', '03:57:31', '04:05:10', '04:35:59', '04:39:50', '04:41:47', '04:46:43']
How do I (in a pythonic manner) produce something like so:
where "0200" would hold the number of times a value between 02:00:00 to 02:59:59 occurs
result = { "0200":2, "0300":15, "0400":5 }
something like:
from collections import Counter
counts = Counter(time[:2]+'00' for time in times)
from collections import defaultdict
countDict = defaultdict(int)
for t in times:
    countDict[t[:2] + "00"] += 1
print(countDict)
If you don't want to use Counter, you can do:
counts = {}
for i in times:
    try:
        counts[i.split(':')[0] + "00"] += 1
    except KeyError:
        counts[i.split(':')[0] + "00"] = 1
Here's one more way, with itertools.
import itertools
key = lambda x: x[:2]
result = {}
for hour, group in itertools.groupby(sorted(times, key=key), key=key):
    result[hour + '00'] = len(list(group))
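All of the answers above slice the first two characters, which assumes every timestamp is a zero-padded "HH:MM:SS" string. If the log might contain other forms (for example "3:21:23"), a slightly more defensive sketch parses each time with datetime.strptime, which raises ValueError on malformed input, and formats the hour back into the "0200"-style key. The helper name hour_key is hypothetical:

```python
from collections import Counter
from datetime import datetime

times = ['02:49:04', '02:50:03', '03:21:23', '03:21:48', '03:24:29', '03:30:29',
         '03:30:30', '03:44:54', '03:50:11', '03:52:03', '03:52:06', '03:52:30',
         '03:52:48', '03:54:50', '03:55:21', '03:56:50', '03:57:31', '04:05:10',
         '04:35:59', '04:39:50', '04:41:47', '04:46:43']

def hour_key(t):
    """Map 'HH:MM:SS' to an hour bucket such as '0300'."""
    hour = datetime.strptime(t, "%H:%M:%S").hour  # raises ValueError if malformed
    return f"{hour:02d}00"                        # 3 -> '0300'

result = dict(Counter(hour_key(t) for t in times))
print(result)  # {'0200': 2, '0300': 15, '0400': 5}
```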
