Simulating the sample space for a probability problem in Python

I am interested in simulating the sample space for the following question on a probability assignment:
A man will carve pumpkins for his two daughters and three sons. His wife will bring each kid's pumpkin in a completely random order. The man has decided that as soon as he has carved pumpkins for two of his sons, he will ask his wife to carve the remaining pumpkins. Let W denote the number of pumpkins he will carve.
So the resulting sample space (the possible carving sequences, where W is the length of each sequence) would look something like this:
sample_space=[['S','S'],
['S','D','S'],
['S','D','D','S'],
['D','S','S'],
['D','S','D','S'],
['D','D','S','S']]
I was thinking about having two lists, one of sons, one of daughters:
son_list1=['S','S','S']
daughter_list1=['D','D']
And then combining them in every possible order:
result_list1=[['S','S','S','D','D'],
['S','S','D','S','D'],
['S','S','D','D','S'],
['S','D','S','S','D'],
['S','D','S','D','S'],
['S','D','D','S','S'],
['D','S','S','S','D'],
['D','S','S','D','S'],
['D','S','D','S','S'],
['D','D','S','S','S']]
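A sketch (not from the original question) of how result_list1 could be generated instead of written out by hand: choose which 3 of the 5 positions hold a son with itertools.combinations and fill the rest with daughters. Each choice of positions is one distinct arrangement, so no duplicates need removing, and lexicographic order of the positions happens to reproduce the listing above. The function name arrangements is my own.

```python
from itertools import combinations

def arrangements(n_sons=3, n_daughters=2):
    """Return every distinct S/D arrangement as a list of lists."""
    n = n_sons + n_daughters
    result = []
    # Each set of son positions yields exactly one distinct arrangement.
    for son_positions in combinations(range(n), n_sons):
        result.append(['S' if i in son_positions else 'D' for i in range(n)])
    return result

result_list1 = arrangements()
print(len(result_list1))  # 10
```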
Or would it be easier to number each son and each daughter and then combine them? For example:
son_list2=['S1','S2','S3']
daughter_list2=['D1','D2']
The resulting list would then be something like:
result_list2=[['S1','S2','S3','D1','D2'],
['S1','S3','S2','D1','D2'],
['S2','S1','S3','D1','D2'],
['S2','S3','S1','D1','D2'],
['S3','S1','S2','D1','D2'],
['S3','S2','S1','D1','D2'],
...
['D2','D1','S3','S2','S1']]
If this method is easier, I could strip the numbers after result_list2 is generated and then delete the repeats.
Either way, once I have the result in the form of result_list1, I could create a "son counter", go through each list, stop when the counter reaches 2, and then delete the repeats to get the sample_space list.
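The son-counter idea can be sketched as follows. truncate_after_sons is a hypothetical helper name, and result_list1 is rebuilt with itertools.combinations so the block runs on its own. Keeping only the first occurrence of each prefix also preserves the order of sample_space shown above.

```python
from itertools import combinations

# All 10 distinct arrangements of 3 sons and 2 daughters (same as result_list1).
result_list1 = [['S' if i in pos else 'D' for i in range(5)]
                for pos in combinations(range(5), 3)]

def truncate_after_sons(arrangement, sons_needed=2):
    """Cut an arrangement right after the second son's pumpkin."""
    son_counter = 0
    for idx, child in enumerate(arrangement):
        if child == 'S':
            son_counter += 1
            if son_counter == sons_needed:
                return arrangement[:idx + 1]
    return arrangement  # fewer sons than needed: keep everything

sample_space = []
for arrangement in result_list1:
    prefix = truncate_after_sons(arrangement)
    if prefix not in sample_space:  # drop repeats, keep first-seen order
        sample_space.append(prefix)

print(sample_space)
```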
Is there any better logic?

To solve this problem, I think the best solution would be to get all of the permutations of the order in which he carves each pumpkin.
I used the following code from GeeksforGeeks for getting all permutations of a list; I only changed some of the variable names to make it clearer.
def permutation(passed_list):
    # If passed_list is empty then there are no permutations
    if len(passed_list) == 0:
        return []
    # If there is only one element in passed_list then only
    # one permutation is possible
    if len(passed_list) == 1:
        return [passed_list]
    # Find the permutations for passed_list if there is
    # more than one element
    perm_list = []  # empty list that will store all permutations
    # Iterate over the input (passed_list) and calculate the permutations
    for i in range(len(passed_list)):
        item = passed_list[i]
        # Extract passed_list[i] (item) from the list; remaining_list is
        # what is left over
        remaining_list = passed_list[:i] + passed_list[i + 1:]
        # Generate all permutations where item is the first
        # element
        for p in permutation(remaining_list):
            perm_list.append([item] + p)
    return perm_list
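For reference, the standard library already provides this: itertools.permutations yields the same permutations (as tuples rather than lists). A short check, assuming Python 3, shows that it treats the five children as distinct by position, so there are 5! = 120 permutations but only 10 distinct S/D patterns (5! / (3! * 2!)):

```python
from itertools import permutations

children = ['S', 'S', 'S', 'D', 'D']

# Elements are treated as distinct by position, so this yields 5! tuples.
perms = list(permutations(children))
print(len(perms))  # 120

# Collapsing identical S/D patterns leaves 5! / (3! * 2!) of them.
distinct = set(perms)
print(len(distinct))  # 10
```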
Then you can just iterate through all of the permutations, keeping track of the order as you go. Once you get to two sons, you stop going through that iteration, add that order to your sample space, and then go to the next permutation.
if __name__ == '__main__':
    # List of all children. It doesn't matter what order this list is in
    children = ['S', 'S', 'S', 'D', 'D']
    # perms is the list of all permutations of the children list
    perms = permutation(children)
    # This list will hold the resulting sample space you are looking for
    total_set = []
    # For each permutation
    for perm in perms:
        order = []  # Contains the order of whose pumpkin he carves
        son_counter = 0
        for child in perm:
            # Compare strings with ==, not `is` (which tests identity)
            if child == 'S':
                son_counter += 1
            # Update the order
            order.append(child)
            if son_counter == 2:
                # To keep from adding duplicate orders
                if order not in total_set:
                    total_set.append(order)
                # Reset the following two variables for the next iteration
                order = []
                son_counter = 0
                break
    print(total_set)
This gave me the following output:
[['S', 'S'], ['S', 'D', 'S'], ['S', 'D', 'D', 'S'], ['D', 'S', 'S'], ['D', 'S', 'D', 'S'], ['D', 'D', 'S', 'S']]
I believe this is the answer you are looking for.
Let me know if you have any questions!
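Since the wife brings the pumpkins in a completely random order, each of the 120 orderings of the five (distinguishable) pumpkins is equally likely, so the distribution of W can be read off by counting. A sketch of that step (my own addition; carved_count is a hypothetical name), using Counter and Fraction for exact probabilities:

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

children = ['S', 'S', 'S', 'D', 'D']

def carved_count(order, sons_needed=2):
    """Number of pumpkins carved before handing over (this is W)."""
    son_counter = 0
    for w, child in enumerate(order, start=1):
        if child == 'S':
            son_counter += 1
            if son_counter == sons_needed:
                return w
    return len(order)

perms = list(permutations(children))  # 120 equally likely orderings
counts = Counter(carved_count(p) for p in perms)
distribution = {w: Fraction(c, len(perms)) for w, c in sorted(counts.items())}
print(distribution)  # {2: Fraction(3, 10), 3: Fraction(2, 5), 4: Fraction(3, 10)}
```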

You could use recursion to build up the sample space one pumpkin at a time. For example,
def create_samples(n_sons, n_daughters):
    if n_sons == 0:
        # stop carving
        yield []
    elif n_daughters == 0:
        # must carve n_sons more pumpkins
        yield ['S'] * n_sons
    else:
        # choose to carve for a son
        for sample in create_samples(n_sons - 1, n_daughters):
            yield ['S'] + sample
        # choose to carve for a daughter
        for sample in create_samples(n_sons, n_daughters - 1):
            yield ['D'] + sample
samples = list(create_samples(2, 2))
# [['S', 'S'],
# ['S', 'D', 'S'],
# ['S', 'D', 'D', 'S'],
# ['D', 'S', 'S'],
# ['D', 'S', 'D', 'S'],
# ['D', 'D', 'S', 'S']]
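The generator create_samples(n_sons, n_daughters) yields all samples that meet your condition, assuming that n_sons more sons must still be carved before stopping and n_daughters daughters' pumpkins remain. That is why it is called with (2, 2): the third son never appears in a sample, because the man stops after the second son.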
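The six samples are not equally likely. A hedged sketch that extends the same recursion to attach a probability to each sequence, assuming each next pumpkin is equally likely to be any of the ones not yet carved. The name samples_with_prob and its parameters are my own; unlike create_samples, it also tracks how many sons' pumpkins are left in total.

```python
from fractions import Fraction

def samples_with_prob(sons_needed, sons_left, daughters_left):
    """Yield (sequence, probability) pairs.

    sons_needed: sons still to carve before stopping.
    sons_left / daughters_left: pumpkins of each kind not yet carved.
    """
    if sons_needed == 0:
        yield [], Fraction(1)
        return
    total = sons_left + daughters_left
    if sons_left:
        p_son = Fraction(sons_left, total)  # next pumpkin is a son's
        for seq, p in samples_with_prob(sons_needed - 1, sons_left - 1, daughters_left):
            yield ['S'] + seq, p_son * p
    if daughters_left:
        p_daughter = Fraction(daughters_left, total)  # next is a daughter's
        for seq, p in samples_with_prob(sons_needed, sons_left, daughters_left - 1):
            yield ['D'] + seq, p_daughter * p

for seq, p in samples_with_prob(2, 3, 2):
    print(seq, p)
```

With 3 sons and 2 daughters the six probabilities come out as 3/10, 1/5, 1/10, 1/5, 1/10 and 1/10, in the same order as create_samples(2, 2), and they sum to 1.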

Related

Rearrange a list of strings

I want to rearrange or modify the sequence of elements (strings) in a list. This is the original list:
['A', 'B', 'C', 'D', 'E', 'F', 'G']
I want to move E and F behind (i.e. right after) B.
['A', 'B', 'E', 'F', 'C', 'D', 'G']
The decision about what to move comes from the user. There is no rule behind it and no way to formulate it as an algorithm. In other words, the action "move something behind something else" is input from the user; e.g. the user marks two elements with the mouse and drags and drops them behind another element.
My code works, but I wonder if there is a more efficient and Pythonic way to do this. Maybe I missed some of Python's nice built-in features.
#!/usr/bin/env python3
# input data
original = list('ABCDEFG')
# move "EF" behind "B" (this is user input)
to_move = 'EF'
behind = 'B'
# expected result
rearanged = list('ABEFCDG')
# index for insertion
idx_behind = original.index(behind)
# each element to move
for c in reversed(to_move):  # "reverse!"
    # remove from original position
    original.remove(c)
    # add to new position
    original.insert(idx_behind + 1, c)
# True
print(original == rearanged)
You can assume:
Elements in original are unique.
to_move always exists in original.
behind always exists in original.
The elements in to_move are always adjacent.
Other examples of possible input:
Move ['B'] behind F
Move ['A', 'B'] behind C
This is not possible:
Move ['A', 'F'] behind D
Don't use .remove when the goal is to erase from a specific position; though you may know what is at that position, .remove a) will search for it again, and b) remove the first occurrence, which is not necessarily the one you had in mind.
Don't remove elements one at a time if you want to remove several consecutive elements; that's why slices exist, and why the del operator works the way that it does. Iterating is harder than simply saying what you want directly, and you also have to watch out for the usual problems with modifying a list while iterating over it.
Don't add elements one at a time if you want to add several elements that will be consecutive; instead, insert them all at once by slice assignment. Same reasons apply here.
Especially don't try to interleave insertion and removal operations. That's far more complex than necessary, and could cause problems if the insertion location overlaps the source location.
Thus:
original = list('ABCDEFG')
start = original.index('E')
# grabbing two consecutive elements:
to_move = original[start:start+2]
# removing them:
del original[start:start+2]
# now figure out where to insert in that result:
insertion_point = original.index('B') + 1
# and insert:
original[insertion_point:insertion_point] = to_move
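The same slice-based steps can be wrapped in a function that takes the user's input directly. A sketch; the name move_behind and its argument order are my own, and it assumes, as the question states, that elements are unique and that to_move is adjacent and given in list order. If behind is itself one of the moved elements, the second index() call raises ValueError.

```python
def move_behind(items, to_move, behind):
    """Return a copy of items with the adjacent block to_move placed right after behind."""
    result = list(items)
    start = result.index(to_move[0])
    end = start + len(to_move)
    # The question guarantees adjacency; fail loudly if that is violated.
    if result[start:end] != list(to_move):
        raise ValueError('elements to move must be adjacent and in list order')
    block = result[start:end]
    del result[start:end]                        # remove the whole block at once
    insertion_point = result.index(behind) + 1   # look up behind *after* deleting
    result[insertion_point:insertion_point] = block  # slice assignment inserts
    return result

print(move_behind('ABCDEFG', 'EF', 'B'))         # ['A', 'B', 'E', 'F', 'C', 'D', 'G']
print(move_behind('ABCDEFG', ['B'], 'F'))        # ['A', 'C', 'D', 'E', 'F', 'B', 'G']
print(move_behind('ABCDEFG', ['A', 'B'], 'C'))   # ['C', 'A', 'B', 'D', 'E', 'F', 'G']
```

Looking up behind only after the deletion is what keeps the last example correct when the moved block sits before the insertion point.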
If it is just a small number of items you want to rearrange, just swap the relevant elements:
lst = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
lst[2], lst[4] = lst[4], lst[2] # switch 'C' and 'E'
lst[3], lst[5] = lst[5], lst[3] # switch 'D' and 'F'
lst
['A', 'B', 'E', 'F', 'C', 'D', 'G']

Do loop and print result if found, else do loop again: how to get rid of multiple nesting if's in Python

I have a dictionary with thousands of elements of the following structure:
full_dict={'A': ['B','C','D','E'],
           'B': ['A','C','D','E','F','X'],
           'X': ['W','Y','Z','S'],
           'S': ['W','K','T'],
           ...}
where every letter is a word.
Every word can be both a key and a value (together with other words) for another dict element.
I am trying to find a "path" from one word to another. For example a path from 'A' to 'S' is A-B-X-S as S is among values of X, X in B and B in A.
Currently, I am using this code:
query=['A','S']
if 'S' in full_dict['A']:
    print ('Found during 1st iteration')
else:
    for i in full_dict['A']:
        if 'S' in full_dict[i]:
            print ('Found during 2nd iteration')
        else:
            for ii in full_dict[i]:
                etc.
I go 10 levels deep and it works fine, but I wonder if there is a better way to do it.
Thank you in advance.
You can use the networkx package:
import networkx as nx
full_dict={'A': ['B','C','D','E'],
'B': ['A','C','D','E','F','X'],
'X': ['W','Y','Z','S'],
'S': ['W','K','T'],
}
g = nx.DiGraph() # directed graph
for k, v in full_dict.items():
    g.add_edges_from((k, to) for to in v)
print(*nx.all_simple_paths(g, 'A', 'S')) # ['A', 'B', 'X', 'S']
A (nested) loop wouldn't work in general, because at best it involves many levels of nesting, and at worst we don't know in advance how many levels we would need.
Because of this, one could think of recursion instead. But this could be complicated since you might need to put much effort into things like breaking potential infinite loops or remembering intermediate results (to reduce computations).
So in my opinion, the best approach is a package that readily makes use of the graph structure of your data.
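As a sketch of the "remember what you have already visited" idea without an external library, a breadth-first search with collections.deque needs no nesting, handles cycles such as 'B' linking back to 'A', and returns a shortest path. The name find_path_bfs is my own, and it assumes that a word with no entry in the dict simply has no outgoing links.

```python
from collections import deque

def find_path_bfs(graph, source, target):
    """Breadth-first search; return a shortest path as a list, or None."""
    if source == target:
        return [source]
    parent = {source: None}  # how each visited word was reached
    queue = deque([source])
    while queue:
        word = queue.popleft()
        for nxt in graph.get(word, []):
            if nxt in parent:  # already visited: this also breaks cycles
                continue
            parent[nxt] = word
            if nxt == target:
                # Walk the parent links back to the source, then reverse.
                path = [nxt]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None

full_dict = {'A': ['B', 'C', 'D', 'E'],
             'B': ['A', 'C', 'D', 'E', 'F', 'X'],
             'X': ['W', 'Y', 'Z', 'S'],
             'S': ['W', 'K', 'T']}
print(find_path_bfs(full_dict, 'A', 'S'))  # ['A', 'B', 'X', 'S']
```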
External libs are useful if you're doing this for production and need the high performance, optimizations, all possible paths, etc.
If you're doing this as an exercise or something where you can be sure of clean input (no loops within the word lists) or don't want an external lib for just one operation, then you can try a recursion:
full_dict = {
'A': ['B','C','D','E'],
'B': ['A','C','D','E','F','X'],
'X': ['W','Y','Z','S'],
'S': ['W','K','T'],
}
def find_path(source, target):
    for key, words in full_dict.items():
        if target in words:
            yield target
            if key == source:  # check if it's source
                yield source
            else:
                # otherwise, look for this key as new target
                yield from find_path(source, key)
            break  # prevent further checking of remaining items
Usage:
list(find_path('A', 'S'))
# ['S', 'X', 'B', 'A']
# to get it in the right order, reverse it
list(find_path('A', 'S'))[::-1]
# ['A', 'B', 'X', 'S']
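The approach works backwards: it looks for the target word in the word lists and, once found, treats the key of that list as the new target, yielding each word along the way until it reaches the source. This avoids checking the word lists of every word reachable from the source, which creates many chains that may or may not reach the target.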
More info on yield expressions.
Note that Python3's recursion limit is 1000, by default. So if the word chain may be longer, either increase the limit or use another method. Setting the limit to 4 and using this modified dict full_dict = {'A': ['B', 'D', 'E'], 'B': ['A', 'D', 'E', 'F', 'X'], 'X': ['W', 'Y', 'Z', 'C'], 'C': ['W', 'K', 'T', 'S'], 'S': []}, you'd hit the limit for A to S which is now a chain of 5 (['A', 'B', 'X', 'C', 'S']).
You could make it a recursive function:
def findLink(path, target, links):
    if not isinstance(path, list): path = [path]  # start path with initial word
    for linked in links.get(path[-1], []):  # go through links of word
        if linked == target: return path + [target]  # done, return full path
        if linked in path: continue  # avoid circular linking
        foundPath = findLink(path + [linked], target, links)  # dig deeper
        if foundPath: return foundPath  # reached end on that path
Usage:
full_dict={'A': ['B','C','D','E'],
           'B':['A','C','D','E','F','X'],
           'X':['W','Y','Z','S'],
           'S':['W','K','T'] }
print(findLink('A','S',full_dict))
# ['A', 'B', 'X', 'S']

Rank aggregation: Merge local subrankings into global ranking

I have a dataset of multiple local store rankings that I'm looking to aggregate / combine into one national ranking, programmatically. I know that the local rankings are by sales volume, but I am not given the sales volume so must use the relative rankings to create as accurate a national ranking as possible.
As a short example, let's say that we have 3 local ranking lists, from best ranking (1st) to worst ranking (last), that represent different geographic boundaries that can overlap with one another.
ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']
We know that J or Q is the highest ranked store, as both are highest in ranking_1 and ranking_3, respectively, and they appear above A, which is the highest in ranking_2. We know that O is next, as it's above A in ranking_3. A comes next, and so on...
If I did this correctly on paper, the output of this short example would be:
global_ranking = [('J',1.5),('Q',1.5),('O',3),('A',4),('H',6),('N',6),('Z',6),('K',8),('B',9),('C',10)]
Note that when we don't have enough data to determine which of two stores is ranked higher, we consider it a tie (i.e. we know that one of J or Q is the highest ranked store, but don't know which is higher, so we put them both at 1.5). In the actual dataset, there are 100+ lists of 1000+ items in each.
I've had fun trying to figure out this problem and am curious if anyone has any smart approaches to it.
A modified merge sort algorithm would help here. The modification should take incomparable stores into account and thus build groups of incomparable elements which you are willing to consider equal (like Q and J).
This method analyzes all of the stores at the front of the rankings. If a front store does not appear below first position in any other ranking list, then it belongs at this front level and is added to a 'level' list. Next, these stores are removed from the front of the lists, so that all of the lists have new front runners. Repeat the process until there are no stores left.
def rank_stores(rankings):
    """
    Rank stores by sales volume, given ranking lists that overlap.
    :param rankings: list of rankings of stores; note that the lists
        are consumed (popped) in place.
    :return: Ordered list with sets of items at same rankings.
    """
    rank_global = []
    # Evaluate all stores in the number one position; if they are not below
    # number one somewhere else, then they belong at this level.
    # Then remove them from the front of the lists, and repeat.
    while sum([len(x) for x in rankings]) > 0:
        tops = []
        # Find out which of the number one stores are not in a lower position
        # somewhere else.
        for rank in rankings:
            if not rank:
                continue
            else:
                top = rank[0]
                add = True
                for rank_test in rankings:
                    if not rank_test:
                        continue
                    elif not rank_test[1:]:
                        continue
                    elif top in rank_test[1:]:
                        add = False
                        break
                    else:
                        continue
                if add:
                    tops.append(top)
        # Contradictory rankings (e.g. A above B in one list, B above A in
        # another) leave no valid top; stop instead of looping forever.
        if not tops:
            raise ValueError('rankings are inconsistent')
        # Now add tops to total rankings list,
        # then go through the rankings and pop the top if in tops.
        rank_global.append(set(tops))
        # Remove the stores that just made it to the top.
        for rank in rankings:
            if not rank:
                continue
            elif rank[0] in tops:
                rank.pop(0)
            else:
                continue
    return rank_global
For the rankings provided:
ranking_1 = ['J','A','Z','B','C']
ranking_2 = ['A','H','K','B']
ranking_3 = ['Q','O','A','N','K']
rankings = [ranking_1, ranking_2, ranking_3]
Then calling the function:
rank_stores(rankings)
Results in:
[{'J', 'Q'}, {'O'}, {'A'}, {'H', 'N', 'Z'}, {'K'}, {'B'}, {'C'}]
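To turn these levels into the (store, rank) format from the question, with tied ranks averaged, a small helper could be used. A sketch; levels_to_ranks is a hypothetical name, and stores within a tie are sorted alphabetically only to make the output deterministic:

```python
def levels_to_ranks(levels):
    """Turn ordered tie groups into (store, rank) pairs, averaging tied ranks."""
    ranked = []
    position = 1  # 1-based rank of the first store in the current level
    for level in levels:
        k = len(level)
        avg_rank = position + (k - 1) / 2  # mean of position .. position+k-1
        for store in sorted(level):  # alphabetical within a tie
            ranked.append((store, avg_rank))
        position += k
    return ranked

levels = [{'J', 'Q'}, {'O'}, {'A'}, {'H', 'N', 'Z'}, {'K'}, {'B'}, {'C'}]
print(levels_to_ranks(levels))
# [('J', 1.5), ('Q', 1.5), ('O', 3.0), ('A', 4.0), ('H', 6.0), ('N', 6.0), ('Z', 6.0), ('K', 8.0), ('B', 9.0), ('C', 10.0)]
```

This reproduces the global_ranking worked out on paper in the question.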
In some circumstances there may not be enough information to determine definite rankings. For example, suppose the true order is:
['Z', 'A', 'B', 'J', 'K', 'F', 'L', 'E', 'W', 'X', 'Y', 'R', 'C']
We can derive the following rankings:
a = ['Z', 'A', 'B', 'F', 'E', 'Y']
b = ['Z', 'J', 'K', 'L', 'X', 'R']
c = ['F', 'E', 'W', 'Y', 'C']
d = ['J', 'K', 'E', 'W', 'X']
e = ['K', 'F', 'W', 'R', 'C']
f = ['X', 'Y', 'R', 'C']
g = ['Z', 'F', 'W', 'X', 'Y', 'R', 'C']
h = ['Z', 'A', 'E', 'W', 'C']
i = ['L', 'E', 'Y', 'R', 'C']
j = ['L', 'E', 'W', 'R']
k = ['Z', 'B', 'K', 'L', 'W', 'Y', 'R']
rankings = [a, b, c, d, e, f, g, h, i, j, k]
Calling the function:
rank_stores(rankings)
results in:
[{'Z'},
{'A', 'J'},
{'B'},
{'K'},
{'F', 'L'},
{'E'},
{'W'},
{'X'},
{'Y'},
{'R'},
{'C'}]
In this scenario there is not enough information to determine where 'J' should be relative to 'A' and 'B'. We only know that it lies between 'Z' and 'K'.
When multiplied among hundreds of rankings and stores, some of the stores will not be properly ranked on an absolute volume basis.
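The level-by-level procedure in this answer is essentially Kahn's topological sort on a graph whose edges are "store directly above store" within each list. Assuming Python 3.9+, the standard library's graphlib.TopologicalSorter can produce the same levels without mutating the input lists, and prepare() raises graphlib.CycleError when the rankings contradict each other. A sketch (rank_levels is my own name):

```python
from graphlib import TopologicalSorter  # Python 3.9+

def rank_levels(rankings):
    """Group stores into levels using graphlib's topological sorter."""
    ts = TopologicalSorter()
    for ranking in rankings:
        for store in ranking:
            ts.add(store)  # include stores that have no neighbours
        # each store depends on (comes after) the store directly above it
        for above, below in zip(ranking, ranking[1:]):
            ts.add(below, above)
    ts.prepare()  # raises graphlib.CycleError for contradictory rankings
    levels = []
    while ts.is_active():
        ready = ts.get_ready()  # stores with nothing unranked above them
        levels.append(set(ready))
        ts.done(*ready)
    return levels

ranking_1 = ['J', 'A', 'Z', 'B', 'C']
ranking_2 = ['A', 'H', 'K', 'B']
ranking_3 = ['Q', 'O', 'A', 'N', 'K']
print(rank_levels([ranking_1, ranking_2, ranking_3]))
```

On the three example lists this gives the same levels as rank_stores, and the input lists are left unchanged.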

Nested list loop indexing

I am trying to create a list of characters based on a list of words
i.e.
["BOARD", "GAME"] -> [["B","O",...], ["G","A","M",...]]
From my understanding, I get an IndexError because my initial boardlist does not contain a predetermined number of lists.
Is there a way to create a new list in boardlist for each element of board?
I don't know if I'm being clear.
Thank you.
board=["BOARD", "GAME"]
boardlist=[[]]
i=0
for word in board:
    for char in word:
        boardlist[i].append(char)
    i=i+1
print(boardlist)
IndexError: list index out of range
Note that this can be done in a much simpler way by calling list() on each string in board, since the list constructor turns any iterable, in this case a string, into a list of its elements (here, its characters):
l = ["BOARD", "GAME"]
[list(i) for i in l]
# [['B', 'O', 'A', 'R', 'D'], ['G', 'A', 'M', 'E']]
Let's also fix your current approach. boardlist=[[]] creates a list that contains only one inner list, so boardlist[1] raises IndexError as soon as you reach the second word. Instead, create one inner list per word up front. Also, instead of incrementing a counter yourself, you can use enumerate:
boardlist = [[] for _ in range(len(board))]
for i, word in enumerate(board):
    for char in word:
        boardlist[i].append(char)
print(boardlist)
# [['B', 'O', 'A', 'R', 'D'], ['G', 'A', 'M', 'E']]
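To answer the question of creating a new list in boardlist for each word directly, a minimal sketch is to build each inner list as you go and append it, so boardlist never needs to be pre-sized and no index is needed:

```python
board = ["BOARD", "GAME"]

boardlist = []
for word in board:
    chars = []  # a fresh inner list for this word
    for char in word:
        chars.append(char)
    boardlist.append(chars)  # boardlist grows by one list per word

print(boardlist)  # [['B', 'O', 'A', 'R', 'D'], ['G', 'A', 'M', 'E']]
```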

How can a method that checks whether a list contains specific consecutive items be improved?

I have a nested list of tens of millions of lists (I can use tuples also). Each list is 2-7 items long. Each item in a list is a string of 1-5 characters and occurs no more than once per list. (I use single char items in my example below for simplicity)
#Example nestedList:
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
I need to find which lists in my nested list contain a given pair of items next to each other (in either order) so I can do stuff to these lists while ignoring the rest. This needs to be as efficient as possible.
I am using the following function but it seems pretty slow and I just know there has to be a smarter way to do this.
def isBadInList(bad, checkThisList):
    numChecks = len(checkThisList) - 1
    for x in range(numChecks):
        if checkThisList[x] == bad[0] and checkThisList[x + 1] == bad[1]:
            return True
        elif checkThisList[x] == bad[1] and checkThisList[x + 1] == bad[0]:
            return True
    return False
I then call it like this:
bad = ['O', 'I']
for checkThisList in nestedList:
    result = isBadInList(bad, checkThisList)
    if result:
        doStuffToList(checkThisList)
#The function isBadInList() only returns true for the first and third list in nestedList and false for all else.
I need a way to do this faster if possible. I can use tuples instead of lists, or whatever it takes.
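For comparison, here are two plain-Python variants of the same check (the function names are my own, and neither has been benchmarked against the original, so time them on real data, e.g. with timeit). The first pairs each item with its successor via zip; the second relies on the question's guarantee that an item occurs at most once per list, so list.index finds the only occurrence. Both work on lists or tuples.

```python
def has_adjacent_pair(items, a, b):
    """True if a and b appear next to each other, in either order."""
    targets = {(a, b), (b, a)}
    # zip pairs each item with its successor: (x0, x1), (x1, x2), ...
    return any(pair in targets for pair in zip(items, items[1:]))

def has_adjacent_pair_idx(items, a, b):
    """Same check, assuming each item occurs at most once in items."""
    try:
        return abs(items.index(a) - items.index(b)) == 1
    except ValueError:  # a or b is not in items
        return False

nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g']
]
print([has_adjacent_pair(lst, 'O', 'I') for lst in nestedList])  # [True, False, True]
```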
nestedList = [
['a', 'e', 'O', 'I', 'g', 's'],
['w', 'I', 'u', 'O', 's', 'g'],
['e', 'z', 's', 'I', 'O', 'g']
]
#first create a map
pairdict = dict()
for i in range(len(nestedList)):
    for j in range(len(nestedList[i])-1):
        pair1 = (nestedList[i][j],nestedList[i][j+1])
        if pair1 in pairdict:
            pairdict[pair1].append(i+1)
        else:
            pairdict[pair1] = [i+1]
        pair2 = (nestedList[i][j+1],nestedList[i][j])
        if pair2 in pairdict:
            pairdict[pair2].append(i+1)
        else:
            pairdict[pair2] = [i+1]
del nestedList
print(pairdict.get(('e','z'),None))
This creates a map whose keys are the adjacent pairs (in both orders) and whose values are the 1-based indices of the lists containing them, and then deletes your list (the map itself may take a lot of memory).
After that, you can use the dict for lookups and print the indices where a pair appears.
I think you could use a regex here to speed this up, although it is still a sequential scan: every sublist has to be joined and searched, so the work grows linearly with the total number of items across all sublists.
import re
p = re.compile('OI|IO')  # match only OI or IO
def is_bad(pattern, to_check):
    for item in to_check:
        # joining without a separator is only safe for single-character items
        maybe_found = pattern.search(''.join(item))
        if maybe_found:
            yield True
        else:
            yield False
l = list(is_bad(p, nestedList))
print(l)
# [True, False, True]
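The question says items can be 1 to 5 characters long, and joining them without a separator can create false matches: ['xO', 'Iy'] joins to 'xOIy', which contains 'OI'. A sketch of a separator-aware pattern, assuming the separator (a comma here) never occurs inside an item; make_pair_pattern and is_bad_sep are my own names:

```python
import re

def make_pair_pattern(a, b, sep=','):
    """Regex matching a and b as adjacent whole items, in either order.

    Assumes sep never occurs inside an item.
    """
    a, b, s = re.escape(a), re.escape(b), re.escape(sep)
    # item boundaries are the start/end of the string or the separator
    return re.compile(rf'(?:^|{s})(?:{a}{s}{b}|{b}{s}{a})(?:{s}|$)')

def is_bad_sep(pattern, lists, sep=','):
    for items in lists:
        yield pattern.search(sep.join(items)) is not None

nestedList = [
    ['a', 'e', 'O', 'I', 'g', 's'],
    ['w', 'I', 'u', 'O', 's', 'g'],
    ['e', 'z', 's', 'I', 'O', 'g']
]
p = make_pair_pattern('O', 'I')
print(list(is_bad_sep(p, nestedList)))  # [True, False, True]
print(list(is_bad_sep(p, [['xO', 'Iy']])))  # [False]
```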
