How to prevent duplicates in random selection? - python

I wrote a small program to populate my game with NPCs whose names are random selections from lists of first names and last names.
It works, but sometimes duplicate names are selected. How can I prevent duplicates?
I could use a dict, but I prefer a list. Is that a big disadvantage?
The commented block in addding_male_NPC is my attempt to solve this problem.
import random
women_names = ["Jennifer", "Jenna", "Judith", "Becky", "Kelly"]
man_names = ["Adam", "John", "Jack", "Jim", ]
surnames =["Salzinger", "Jefferson", "Blunt", "Jigsaw", "Elem"]
marriage_status = ["Single", "In couple", "Engaged", "Married", "Divorced", "Widow"]
male_NPCs = []
list = []
def clr_list(list):
    del list
def randomizer(list):
    random_choice = random.choice(list)
    clr_list(list)
    return random_choice
def random_male():
    male_surname = randomizer(surnames)
    male_name = randomizer(man_names)
    male_NPC = male_name + " " + male_surname
    return (male_NPC)
def add_one_man():
    male_NPCs.append(random_male())
    return
def addding_male_NPC(count_of_NPC_males):
    while count_of_NPC_males > 1:
        add_one_man()
        # for m in male_NPCs:
        #     unique_count = male_NPCs.count(m)
        #     if unique_count > 1:
        #         male_NPCs.pop(unique)
        #         count_of_NPC_males +=1
        #     else:
        count_of_NPC_males -= 1
count_of_NPC_males = int(input("How many males should create?: "))
addding_male_NPC(count_of_NPC_males)
print(male_NPCs)
print(len(male_NPCs))
So I tried this, but either it is impossible to count strings or, more likely, I am not using .count correctly.
I had the idea of recording the indexes before joining the strings and using them to check for duplicates, but I feel I am going in circles.
I understand that with such short lists of names and surnames, asking for a large number of NPCs makes duplicates unavoidable, but you get the point.
def addding_male_NPC(count_of_NPC_males):
    while count_of_NPC_males > 1:
        add_one_man()
        for m in male_NPCs:
            unique_count = male_NPCs.count(m)
            if unique_count > 1:
                male_NPCs.pop(unique)
                count_of_NPC_males +=1
            else:
                count_of_NPC_males -= 1
Edit
This is so sad :(
mylist = ["a", "b", "a", "c", "c"]
mylist = list(dict.fromkeys(mylist))
print(mylist)
But this cuts the list below the planned and needed number of items, so the question is still open. This is only half an answer; I am waiting for a better one.
==========================
Yes! I finally found an answer!
Thanks to >>Parnav<< (he is The guy!)
Following his suggestion, I wrote code that generates more names from text files than I could have imagined:
import random
import itertools
with open('stock_male_names.txt', 'r') as mn, open('stock_female_names.txt', 'r') as wn, open('stock_surnames.txt', 'r') as sn:
    broken_male_names, broken_female_names, broken_surnames = mn.readlines(), wn.readlines(), sn.readlines()
male_names = [name.strip() for name in broken_male_names]
female_names = [name.strip() for name in broken_female_names]
surnames = [name.strip() for name in broken_surnames]
male_persons = [f"{fname} {lname}" for fname, lname in itertools.product(male_names, surnames)]
female_persons = [f"{fname} {lname}" for fname, lname in itertools.product(female_names, surnames)]
print(male_names)
print(len(male_names)) #1001
print(female_names)
print(len(female_names)) #1000
print(surnames)
print(len(surnames)) #1003
print(male_persons)
print(len(male_persons)) #1004003
print(female_persons)
print(len(female_persons)) #1003000
So from three text files of about 1,000 items each I made about one million unique NPC names per list with almost no load time, and with room to expand.
I am amazingly Happy :)
Case closed!

First, we want all possible combinations of the first and last names. We can get this using itertools.product:
import itertools
import random
male_names = [f"{fname} {lname}" for fname, lname in itertools.product(man_names, surnames)]
print(male_names)
# ['Adam Salzinger', 'Adam Jefferson', ..., 'John Salzinger', 'John Jefferson', ..., 'Jim Jigsaw', 'Jim Elem']
Since you want to randomly get names from this list, shuffle it.
random.shuffle(male_names)
print(male_names)
# ['Jim Jefferson', 'Jack Jigsaw', 'Adam Jefferson', ..., 'Adam Blunt', 'John Blunt']
Every time you want to add a NPC, pop the last element from this list. Since you shuffled the list earlier, you're guaranteed a random element even if you always pop the last element. Popping the element removes it from the list, so you don't have to worry about duplicates. Take care not to pop more than the list contains (you have indicated elsewhere that this isn't a problem). I prefer to pop from the end of the list because that is an O(1) operation. Popping from a different location would be more expensive.
def add_male_npcs(count=1):
    for _ in range(count):
        male_NPCs.append(male_names.pop())
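The three steps above (build every combination, randomize, take without replacement) can also be sketched in one helper. This is only a minimal illustration: `unique_full_names` is a hypothetical name, and `random.sample` is swapped in for the shuffle-then-pop approach because it also draws without replacement.

```python
import itertools
import random

man_names = ["Adam", "John", "Jack", "Jim"]
surnames = ["Salzinger", "Jefferson", "Blunt", "Jigsaw", "Elem"]

def unique_full_names(first_names, last_names, count):
    """Return `count` distinct "First Last" strings (hypothetical helper)."""
    # Every possible first/last pairing, each appearing exactly once
    pool = [f"{fname} {lname}"
            for fname, lname in itertools.product(first_names, last_names)]
    if count > len(pool):
        raise ValueError(f"only {len(pool)} unique names are available")
    # random.sample draws without replacement, so no name can repeat
    return random.sample(pool, count)

male_NPCs = unique_full_names(man_names, surnames, 6)
print(male_NPCs)
```

Unlike popping, this leaves the pool untouched, so it suits a single batch; if NPCs are added in several rounds, the shuffle-and-pop version keeps track of which names are already taken.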

At the end, convert the list of strings into a set and then back to a list to remove any duplicates. Then, use the len() function to determine the length of the list compared to the desired length and call the function again this time adding to the list.
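A rough sketch of that idea, collecting names in a set and topping up until the desired count is reached. The helper name `draw_unique` is hypothetical, and the size check assumes that no individual name contains a space, so that distinct first/last pairs always give distinct strings.

```python
import random

def draw_unique(first_names, last_names, count):
    """Keep drawing random names until `count` distinct ones exist (hypothetical helper)."""
    # Upper bound on distinct full names; guards against an endless loop
    limit = len(set(first_names)) * len(set(last_names))
    if count > limit:
        raise ValueError(f"only {limit} unique names are possible")
    chosen = set()
    while len(chosen) < count:
        # A set silently ignores a name that was already drawn
        chosen.add(f"{random.choice(first_names)} {random.choice(last_names)}")
    return list(chosen)

npcs = draw_unique(["Adam", "John"], ["Blunt", "Elem"], 4)
print(npcs)
```

This gets slow when `count` is close to the number of possible names, because most draws are then repeats; in that situation building the full list of combinations first is the safer choice.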

Related

How can I generate a new, random value in a list in Python 3 repeatedly until all values have been randomly picked (once)

from easygui import *
from time import *
from statedict import *
import random
correctnum = 0
questionsanswered = 0
print("Indian Map Quiz v1")
sleep(0.5)
print("Note all features might not work correctly, this is a work in progress.")
msgbox("Welcome to Indian Map Quiz", "INDIA", "Continue")
title = "Guess The State"
msg = "What Indian state is this?"
stateFactList = [APdict, ARdict, ASdict, BRdict, CTdict, GAdict, GJdict, HPdict, HRdict, JHdict,
KAdict, KLdict, MHdict, MLdict, MNdict, MPdict, MZdict, NLdict, ODdict, PBdict,
RJdict, SKdict, TGdict, TNdict, TRdict, UPdict, UTdict, WBdict]
stateID = random.choice(stateFactList)
print(stateID["state"])
stateQuestion = buttonbox(msg=msg, title=title, choices=stateID["choices"], image=stateID["image file"])
if stateQuestion == stateID["correct"]:
    print("it's correct")
    correctnum += 1
    questionsanswered += 1
else:
    print("incorrect")
    questionsanswered += 1
Here is the code, essentially when you run the program, it should randomly pick a state and provide some multiple choice answers based on the state. It randomly picks a state from the stateFactlist list and matches it with a dictionary stored in another file. Whenever the user answers a question, I want it to generate a new, random state to be displayed to the user, along with the respective multiple choice answers, but I can't find a way to implement it. Help is appreciated.
To help clear up the confusion, random.sample() includes a parameter, k, which lets you specify the number of unique elements to randomly choose from the specified sequence. It can work well for what OP has in mind. Here is a simplified example for illustration purposes:
import random
arr = ["A", "B", "C", "D"]
for x in random.sample(arr, k=len(arr)):
    print(x)
Output:
C
D
A
B
EDIT: In response to OP's question on how to implement this in their code, here is a rough approximation. The key is to move the state Q&A code inside of a loop that is iterating (in random order) over stateFactList. Note, I wasn't able to run this code since I don't have access to OP's data structures or GUI library, so take it as a rough guide, not working code.
stateFactList = [APdict, ARdict, ASdict, BRdict, CTdict, GAdict, GJdict, HPdict, HRdict, JHdict,
KAdict, KLdict, MHdict, MLdict, MNdict, MPdict, MZdict, NLdict, ODdict, PBdict,
RJdict, SKdict, TGdict, TNdict, TRdict, UPdict, UTdict, WBdict]
for stateID in random.sample(stateFactList, k=len(stateFactList)): # iterate over stateFactList in random order
    msg = "What Indian state is this?"
    # This statement no longer needed
    # stateID = random.choice(stateFactList)
    print(stateID["state"])
    stateQuestion = buttonbox(msg=msg, title=title, choices=stateID["choices"], image=stateID["image file"])
    if stateQuestion == stateID["correct"]:
        print("it's correct")
        correctnum += 1
        questionsanswered += 1
    else:
        print("incorrect")
        questionsanswered += 1
Just shuffle the list, then iterate normally.
random.shuffle(stateFactList)
for state in stateFactList:
    ...
You can remove each element from the list once it has been picked, so that it cannot be chosen again.
For example:
>>> a = [1,2,3,4,5]
>>> choice = random.choice(a)
>>> choice
4
>>> a.remove(choice) # removes 4 from the list
>>> a
[1, 2, 3, 5]
>>> choice = random.choice(a)
>>> choice
2

Python insertion sorting a csv by row

My objective is to use an insertion sort to sort the contents of a CSV file by the numbers in the first column. For example, I want to take this:
[[7831703, Christian, Schmidt]
[2299817, Amber, Cohen]
[1964394, Gregory, Hanson]
[1984288, Aaron, White]
[9713285, Alexander, Kirk]
[7025528, Janice, Lee]
[6441979, Sarah, Browning]
[8815776, Rick, Wallace]
[2395480, Martin, Weinstein]
[1927432, Stephen, Morrison]]
and sort it to:
[[1927432, Stephen, Morrison]
[1964394, Gregory, Hanson]
[1984288, Aaron, White]
[2299817, Amber, Cohen]
[2395480, Martin, Weinstein]
[6441979, Sarah, Browning]
[7025528, Janice, Lee]
[7831703, Christian, Schmidt]
[8815776, Rick, Wallace]
[9713285, Alexander, Kirk]]
based on the numbers in the first column. My current Python code looks like this:
import csv
with open('EmployeeList.csv', newline='') as File:
    reader = csv.reader(File)
    readList = list(reader)
    for row in reader:
        print(row)
def insertionSort(readList):
    #Traverse through 1 to the len of the list
    for row in range(len(readList)):
        # Traverse through 1 to len(arr)
        for i in range(1, len(readList[row])):
            key = readList[row][i]
            # Move elements of arr[0..i-1], that are
            # greater than key, to one position ahead
            # of their current position
            j = i-1
            while j >=0 and key < readList[row][j] :
                readList[row] = readList[row]
                j -= 1
            readList[row] = key
insertionSort(readList)
print ("Sorted array is:")
for i in range(len(readList)):
    print ( readList[i])
The code can already sort the contents of a 2d array, but as it is it tries to sort everything.
I think if I got rid of the [] it would work but in testing it hasn't given what I needed.
To try to clarify again I want to sort the rows positions based off of the first columns numerical value.
Sorry if I have misunderstood what you need. But you have a list and you need to sort it? Why don't you just use the sort method of the list object?
>>> data = [[7831703, "Christian", "Schmidt"],
... [2299817, "Amber", "Cohen"],
... [1964394, "Gregory", "Hanson"],
... [1984288, "Aaron", "White"],
... [9713285, "Alexander", "Kirk"],
... [7025528, "Janice", "Lee"],
... [6441979, "Sarah", "Browning"],
... [8815776, "Rick", "Wallace"],
... [2395480, "Martin", "Weinstein"],
... [1927432, "Stephen", "Morrison"]]
>>> data.sort()
>>> from pprint import pprint
>>> pprint(data)
[[1927432, 'Stephen', 'Morrison'],
[1964394, 'Gregory', 'Hanson'],
[1984288, 'Aaron', 'White'],
[2299817, 'Amber', 'Cohen'],
[2395480, 'Martin', 'Weinstein'],
[6441979, 'Sarah', 'Browning'],
[7025528, 'Janice', 'Lee'],
[7831703, 'Christian', 'Schmidt'],
[8815776, 'Rick', 'Wallace'],
[9713285, 'Alexander', 'Kirk']]
>>>
Note that here the first element of each row is parsed as an integer. This is important if you want to sort by numerical value (99 comes before 100).
And don't be confused by the pprint import. You don't need it to sort; I just used it to get nicer output in the console.
Also note that list.sort() is an in-place method. It doesn't return a sorted list but sorts the list itself.
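Since csv.reader yields every field as a string, the integer conversion has to happen somewhere. One way, sketched here under assumptions, is a `key` function. The `io.StringIO` object stands in for EmployeeList.csv so that the snippet is self-contained, and the row with ID 99 is a made-up example added only to show the difference between numeric and text ordering.

```python
import csv
import io

# Stand-in for the real file; the third row is a hypothetical short ID
sample = io.StringIO(
    "7831703,Christian,Schmidt\n"
    "2299817,Amber,Cohen\n"
    "99,Example,Person\n"
)
rows = list(csv.reader(sample))

# Convert the first column to int only for comparison; the rows keep their strings
rows_sorted = sorted(rows, key=lambda row: int(row[0]))
print(rows_sorted)

# A plain sort compares text, so "99" would land after "7831703"
print(sorted(rows))
```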
*** EDIT ***
Here are two different approaches to a sort function. Both could be heavily optimized, but I hope they give you some ideas about how this can be done. Both should work, and you can add some print commands in the loops to see what happens there.
First, a recursive version. It orders the list a little more on every run until it is fully ordered.
def recursiveSort(readList):
    # You don't want to mess original data, so we handle copy of it
    data = readList.copy()
    changed = False
    res = []
    while len(data): #while 1 should work here as well because eventually we break the loop
        if len(data) == 1:
            # There is only one element left. Let's add it to end of our result.
            res.append(data[0])
            break
        if data[0][0] > data[1][0]:
            # We compare first two elements in list.
            # If first one is bigger, we remove second element from original list and add it next to the result set.
            # Then we raise changed flag to tell that we changed the order of original list.
            res.append(data.pop(1))
            changed = True
        else:
            # otherwise we remove first element from the list and add next to the result list.
            res.append(data.pop(0))
    if not changed:
        #if no changes has been made, the list is in order
        return res
    else:
        #if we made changes, we sort list one more time.
        return recursiveSort(res)
And here is an iterative version, closer to your original function.
def iterativeSort(readList):
    res = []
    for i in range(len(readList)):
        print (res)
        #loop through the original list
        if len(res) == 0:
            # if we don't have any items in our result list, we add first element here.
            res.append(readList[i])
        else:
            done = False
            for j in range(len(res)):
                #loop through the result list this far
                if res[j][0] > readList[i][0]:
                    #if our item in list is smaller than element in res list, we insert it here
                    res.insert(j, readList[i])
                    done = True
                    break
            if not done:
                #if our item in list is bigger than all the items in result list, we put it last.
                res.append(readList[i])
    print(res)
    return res
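For comparison, here is a minimal sketch of a textbook in-place insertion sort that moves whole rows and compares only the first column. The function name `insertion_sort_rows` is hypothetical, and the `int()` conversion assumes the IDs arrive as strings, as they would from csv.reader.

```python
def insertion_sort_rows(rows):
    """Sort rows in place by the integer value of their first column."""
    for i in range(1, len(rows)):
        current = rows[i]          # the row being inserted
        key = int(current[0])      # compare numerically, not as text
        j = i - 1
        # Shift every row with a larger key one slot to the right
        while j >= 0 and int(rows[j][0]) > key:
            rows[j + 1] = rows[j]
            j -= 1
        # Drop the current row into the gap that opened up
        rows[j + 1] = current
    return rows

data = [["7831703", "Christian", "Schmidt"],
        ["2299817", "Amber", "Cohen"],
        ["1964394", "Gregory", "Hanson"]]
insertion_sort_rows(data)
print(data)
```

The key difference from the code in the question is that the loops index rows, not the fields inside a row, so each row stays intact while its position changes.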

How to find closely matching unique elements in two lists? (Using a distance function here)

I am trying to create a name matcher to compare say, 'JOHN LEWIS' to 'JOHN SMITH LEWIS'. They are clearly the same person and I want to create a function where when you enter those names, it turns it into a list then gives you the matching names.
The problem is that my loop is returning that 'LEWIS' matches with 'LEWIS' and 'SMITH' matches with 'LEWIS' because of the order that it is in.
from pyjarowinkler import distance
entered_name = 'JOHN LEWIS'.split(' ') # equals ['JOHN','LEWIS']
system_name = 'JOHN SMITH LEWIS'.split(' ') # equals ['JOHN','SMITH','LEWIS']
ratio = []
for i in entered_name:
    maximum = 0
    for j in system_name:
        score = distance.get_jaro_distance(i, j, winkler=True,
                                           scaling=0.1)
        while score > maximum:
            maximum = score
            new = (i, j, maximum)
            system_name.remove(i)
            #removes that name from the original list
    ratio.append(new)
would return something like: [('JOHN', 'JOHN', 1.0), ('LEWIS', 'SMITH', 0.47)]
and not: [('JOHN', 'JOHN', 1.0), ('LEWIS', 'LEWIS', 1.0)] <- this is what I want.
Also, if you try something like 'ALLY A ARM' with 'ALLY ARIANA ARMANI', it matches 'ALLY' twice if you don't do that remove(i) line. This is why I only want unique matches!
I just keep getting errors or the answers that I am not looking for.
The issue is with your system_name.remove(i) line. First of all, it's usually a bad idea to modify a list while you're iterating through that list. This can lead to unexpected behavior. In your case, here's what your code is doing:
First time through, matches 'JOHN', and 'JOHN'. No problem.
Removes 'JOHN' from system_name. Now system_name = ['SMITH', 'LEWIS'].
Second time through, i = 'LEWIS', j = 'SMITH', score = .47 which is greater than 0, so your check score > maximum passes
We set maximum = score
We set new = ('LEWIS', 'SMITH', 0.47)
We remove 'LEWIS' from system_name. Now system_name = ['SMITH']. Uh oh...
Simple rewrite below, using an if instead of a while loop because the while loop is totally unnecessary:
for i in entered_name:
    maximum = 0
    for j in system_name:
        score = distance.get_jaro_distance(i, j, winkler=True,
                                           scaling=0.1)
        if score > maximum:
            maximum = score
            new = (i, j, maximum)
    system_name.remove(new[1]) # want to remove 'SMITH' in the example, not 'LEWIS'
    ratio.append(new)
All I did was move the system_name.remove() call outside of the loop over system_name, and replace i with j (using new[1] since I'm outside of the j loop).
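The same "pick the best candidate, then remove it" idea can be sketched as a small function that works on a copy of the list, so nothing is modified while it is being iterated. To keep the snippet runnable without pyjarowinkler, `difflib.SequenceMatcher` from the standard library is used as a stand-in scorer; it is not Jaro-Winkler and its scores will differ. The names `similarity` and `match_unique` are hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    # Stand-in scorer; replace with distance.get_jaro_distance(a, b, ...)
    # if pyjarowinkler is installed
    return SequenceMatcher(None, a, b).ratio()

def match_unique(entered, system):
    """Greedily pair each entered name part with its best unused system part."""
    remaining = list(system)   # copy, so the caller's list is untouched
    matches = []
    for name in entered:
        if not remaining:
            break
        best = max(remaining, key=lambda cand: similarity(name, cand))
        matches.append((name, best, similarity(name, best)))
        remaining.remove(best)  # each system part can be matched only once
    return matches

result = match_unique("JOHN LEWIS".split(), "JOHN SMITH LEWIS".split())
print(result)
```

Because the matching is greedy, an early name part can still claim a candidate that a later part would have matched better; that trade-off is the same as in the loop above.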
Jaro-Winkler distance is for comparison of sequences, there is no need to compare individual elements as if you were trying to find an edit distance between individual characters rather than whole words.
With that in mind, one should probably treat parts of a name as individual letters, and the whole name as a word, comparing, say, "JL" vs. "JSL" instead of "JOHN LEWIS" and "JOHN SMITH LEWIS":
import string
import itertools
from pyjarowinkler import distance
WORDS_CACHE = {}
def next_letter():
    base = ""
    while True:
        for ch in string.ascii_lowercase:
            yield base + ch
        # after 26 words the codes become longer than one letter
        base += ch
GENERATOR = next_letter()
def encode(word):
    if word not in WORDS_CACHE:
        # next(GENERATOR) works in Python 3; GENERATOR.next() was Python 2 only
        WORDS_CACHE[word] = next(GENERATOR)
    return WORDS_CACHE[word]
def score(first_name, second_name):
    return distance.get_jaro_distance(
        "".join(map(encode, first_name.split())),
        "".join(map(encode, second_name.split())),
    )

Longest cycle of key-value pairs in a dictionary without recursion

I have a question regarding dictionaries. I would like to know how to solve this without using recursive functions (since that is a requirement).
The code creates a random dictionary in which the names in nameslist are connected to each other. I know what the code should do, but not how to do it.
I need the starting key, which I successfully extract (probably in an incorrect/ugly manner). Then the code should follow the entire cycle, as shown in the comment at the bottom of my code, until the starting key has been found again as a value. The loop should then end and return the length of this cycle.
The code below is what I managed to come up with, even though it is wrong.
As said before, I would prefer an answer without recursive functions.
from random import seed, choice
import time
seed(0)
nameslist = [ "Liam", "Emma", "Noah", "Olivia", ]
# Creates random couples dictionary from a list
def create_dictionary(nlist):
    dict = {}
    nlistcopy = nlist[:]
    for item in nlist:
        dict[item] = choice(nlistcopy)
        nlistcopy.remove(dict[item])
    return dict
# Generates the longest cycle in the couples dictionary, however, the code does not seem to work.
def longest_cycle(dict):
    longest = 0
    for each in dict:
        start = dict[each]
        break
    each = 0
    while each != start :
        for each in dict:
            each = dict[each]
            print(each)
            longest += 1
            time.sleep(5)
namesdict = create_dictionary(nameslist)
print(longest_cycle(namesdict))
# Dictionary = {'Liam': 'Olivia', 'Noah': 'Liam', 'Olivia': 'Noah', 'Emma': 'Emma'}
# Liam --> Olivia --> Noah --> Liam (longest cycle = 3)!
The eventual list of names will contain many more names; this shorter version is just for testing purposes. The sleep is there to prevent the infinite loop from crashing my notebook (I'm using Jupyter notebook to tackle the issue). Thanks in advance!
I don't know if there is a better solution but anyway, this is very simple and does not use any recursive function:
names_dict = {'Liam': 'Olivia', 'Noah': 'Liam', 'Olivia': 'Noah', 'Emma': 'Emma'}
result = 0
for key in names_dict:
    dest = key
    key = names_dict[key]
    longest = 1 # cycle length for this key, always starts at 1
    while dest != key:
        key = names_dict[key]
        longest += 1
    if longest > result:
        result = longest
print(result)
First, avoid using dict as a variable name: like list, it is a built-in name (not a reserved word), and reusing it shadows the built-in type. Second, let's simplify the code:
import random
def create_names_dict(names_list):
    names_list_random = names_list[:]
    random.shuffle(names_list_random)
    return {k:v for k,v in zip(names_list, names_list_random)}
Next step:
def longest_cycle(names_list, names_dict):
    start = names_list[0]
    key = start
    value = names_dict[start]
    longest = [start]
    while start != value:
        longest.append(value)
        key, value = value, names_dict[value]
    longest.append(value)
    print('%s (longest cycle: %d)' % (' --> '.join(longest), len(longest) - 1))
Test:
>>> names_list = [ "Liam", "Emma", "Noah", "Olivia", ]
>>> names_dict = create_names_dict(names_list)
>>> names_dict
{'Noah': 'Noah', 'Liam': 'Emma', 'Olivia': 'Liam', 'Emma': 'Olivia'}
>>> longest_cycle(names_list, names_dict)
Liam --> Emma --> Olivia --> Liam (longest cycle: 3)
Cheers!
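The function above follows only the cycle that contains the first name in the list. A minimal non-recursive sketch that checks every cycle is shown below; `longest_cycle_length` is a hypothetical name, and the code assumes the dictionary is a permutation of its keys (every name appears exactly once as a value), which is what the random pairing in the question produces.

```python
def longest_cycle_length(mapping):
    """Return the length of the longest cycle in a permutation-style dict."""
    visited = set()
    longest = 0
    for start in mapping:
        if start in visited:
            continue  # this name belongs to a cycle that was already measured
        length = 0
        current = start
        # Walk the cycle once, marking every name on the way
        while current not in visited:
            visited.add(current)
            current = mapping[current]
            length += 1
        longest = max(longest, length)
    return longest

couples = {'Liam': 'Olivia', 'Noah': 'Liam', 'Olivia': 'Noah', 'Emma': 'Emma'}
print(longest_cycle_length(couples))  # 3
```

The visited set means each name is followed only once, so the work grows linearly with the number of names.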

Get all possible combinations of rows in a matrix

I'm setting up a simple sentence generator in Python to create as many word combinations as possible to describe a generic set of images involving robots. (It's a long story :D)
It outputs something like this: 'Cyborg Concept Downloadable Illustration'
Amazingly, the random generator I wrote only goes up to 255 unique combinations. Here is the script:
import numpy
from numpy import matrix
from numpy import linalg
import itertools
from pprint import pprint
import random
m = matrix( [
    ['Robot','Cyborg','Android', 'Bot', 'Droid'],
    ['Character','Concept','Mechanical Person', 'Artificial Intelligence', 'Mascot'],
    ['Downloadable','Stock','3d', 'Digital', 'Robotics'],
    ['Clipart','Illustration','Render', 'Image', 'Graphic'],
])
used = []
i = 0
def make_sentence(m, used):
    sentence = []
    i = 0
    while i <= 3:
        word = m[i,random.randrange(0,4)]
        sentence.append(word)
        i = i+1
    return ' '.join(sentence)
def is_used(sentence, used):
    if sentence not in used:
        return False
    else:
        return True
sentences = []
i = 0
while i <= 1000:
    sentence = make_sentence(m, used)
    if(is_used(sentence, used)):
        continue
    else:
        sentences.append(sentence)
        print str(i) + ' ' +sentence
        used.append(sentence)
        i = i+1
Using randint instead of randrange, I get up to 624 combinations (instantly) then it hangs in an infinite loop, unable to create more combos.
I guess the question is, is there a more appropriate way of determining all possible combinations of a matrix?
You can use itertools to get all possible combinations of the matrix rows. Here is one example showing how itertools.product works.
import itertools
mx = [
    ['Robot','Cyborg','Android', 'Bot', 'Droid'],
    ['Character','Concept','Mechanical Person', 'Artificial Intelligence', 'Mascot'],
    ['Downloadable','Stock','3d', 'Digital', 'Robotics'],
    ['Clipart','Illustration','Render', 'Image', 'Graphic'],
]
for combination in itertools.product(*mx):
    print(combination)
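To connect this back to the sentence generator, here is a small sketch that joins each combination into a sentence and counts the results. With four rows of five words there are 5 ** 4 = 625 combinations, which explains both limits in the question: randrange(0,4) never picks the fifth column (4 ** 4 = 256 sentences), while randint(0,4) can reach all 625 but the loop asks for more than exist. The variable names are illustrative only.

```python
import itertools

words = [
    ['Robot', 'Cyborg', 'Android', 'Bot', 'Droid'],
    ['Character', 'Concept', 'Mechanical Person', 'Artificial Intelligence', 'Mascot'],
    ['Downloadable', 'Stock', '3d', 'Digital', 'Robotics'],
    ['Clipart', 'Illustration', 'Render', 'Image', 'Graphic'],
]

# One word from each row, in row order, joined into a sentence
sentences = [" ".join(combo) for combo in itertools.product(*words)]

print(len(sentences))  # 625
print(sentences[0])    # Robot Character Downloadable Clipart
```

If the sentences should come out in random order, random.shuffle(sentences) can be applied to the finished list without any risk of repeats.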
Your code can make use of recursion. Without itertools, here is one strategy:
def make_sentences(m, choices = []):
    output = []
    if len(choices) == len(m):
        sentence = ""
        i = 0
        #Go through the rows of the matrix
        #and choose words for the sentence
        for j in choices:
            sentence += " " + m[i][j]
            i += 1
        return [sentence.strip()] #must be returned as a list
    #Try every column of the next row that has no choice yet
    for i in range(len(m[len(choices)])):
        output += make_sentences(m, choices+[i])
    return output #this could be changed to a yield statement
This is quite different from your original function, and it expects m to be a plain list of lists rather than a numpy matrix.
The choices list keeps track of the column index that has been selected for each row in m. When the recursive method finds that choices holds one index per row, it returns a list with just ONE sentence.
When the choices list is still shorter than the number of rows, the method recursively calls itself once for each column of the next row, each time with a new, longer choices list. The results of these recursive calls are added to the output list.
