The usage of radix sort

The usage of radix sort - python

Let's say i have a file containing data on users and their favourite movies.
Ace: FANTASTIC FOUR, IRONMAN
Jane: EXOTIC WILDLIFE, TRANSFORMERS, NARNIA
Jack: IRONMAN, FANTASTIC FOUR
and based of that, the program I'm about to write returns me the name of the users that likes the same movies.
Since Ace and Jack likes the same movie, they will be partners hence the program would output:
Movies: FANTASTIC FOUR, IRONMAN
Partners: Ace, Jack
Jane would be exempted since she doesn't have anyone who shares the same interest in movies as her.
The problem I'm having now is figuring out on how Radix Sort would help me achieve this as I've been thinking whole day long. I don't have much knowledge on radix sort but i know that it compares elements one by one but I'm terribly confused in cases such as FANTASTIC FOUR being arranged first in Ace's data and second in Jack's data.
Would anyone kindly explain some algorithms that i could understand to achieve the output?

Can you show us how you sort your lists ? The quick and dirty code below give me the same output for sorted Ace and Jack.
Ace = ["FANTASTIC FOUR", "IRONMAN"]
Jane = ["EXOTIC WILDLIFE", "TRANSFORMERS", "NARNIA"]
Jack = ["IRONMAN", "FANTASTIC FOUR"]
sorted_Ace = sorted(Ace)
print (sorted_Ace)
sorted_Jack = sorted(Jack)
print (sorted_Jack)
You could start comparing elements one by one from here.
I made you a quick solution, it can show you how you can proceed as it's not optimized at all and not generalized.
Ace = ["FANTASTIC FOUR", "IRONMAN"]
Jane = ["EXOTIC WILDLIFE", "TRANSFORMERS", "NARNIA"]
Jack = ["IRONMAN", "FANTASTIC FOUR"]
Movies = []
Partners = []
sorted_Ace = sorted(Ace)
sorted_Jane = sorted(Jane)
sorted_Jack = sorted(Jack)
for i in range(len(sorted_Ace)):
if sorted_Ace[i] == sorted_Jack[i]:
Movies.append(sorted_Ace[i])
if len(Movies) == len(sorted_Ace):
Partners.append("Ace")
Partners.append("Jack")
print(Movies)
print(Partners)

Related

Code printing the incorrect output (Python problem)

My code is not printing the correct result for the test case
2 6
1 alex
1 Alex
2 sam
1 alix
1 Alix
2 caM
properly as it should result in "alex, sam", yet it results in "alex, Alex" (After reading the following instructions youll understand why). Was hoping I can get some insight as to what is wrong with my code.
Exercise: Dan’s recently announced that he’s teaching n top-secret courses next semester.
Instead of enrolling in them through ACORN, students need to email Dan to
express their interests. These courses are numbered from 1 to n in some arbitrary
order.
In particular, if a student named s is interested in taking a course c, they need to
send an email to Dan containing the message c s. Note that if a student is
interested in taking multiple courses, they need to send multiple emails, one per
course.
Upon receiving a message c s, Dan looks at the list of students already enrolled in
course c. If there’s already a student on the list whose name is too similar to s,
Dan assumes s is the same student and ignores the message. Otherwise, he
enrolls s in the course.
Dan considers two names too similar if and only if they have the same length and
differ in at most one letter(note that “a” and “A” are considered the same letter).
For example, “Josh” and “Josh” are too similar. “Sam” and “CaM” are too similar
as well. However, neither “Max” and “Cat” nor “Ann” and “Anne” are too similar.
Dan has a lot of students and teaches a lot of courses. Consequently, it would take
him forever to process the messages sent by the students one-by-one manually.
Instead, he’s asking you to help him out by writing a program that takes in the
messages as the input and outputs, for every course, the list of the students
enrolled in that course in the order of their enrolments.
My code thus far: `
u = input()
u, w = u.split()
courses = int(u)
students = int(w)
names = []
classes = []
for i in range(students):
names_input = input()
selection = names_input.split()
course_num = selection[0]
student_name = selection[1]
if course_num not in classes:
classes.append(course_num)
names.append(student_name)
else:
if student_name not in names:
names.append(student_name)
print(classes)
print(names)
for i in range(0, len(classes)):
print(names[i])
`Image of the question
image of the question here as well Question in a screenshot, if its easier to read feel free to see it here as well.

understanding a python snippet about a boolean variable

I beginner in python & I don't know this line in the following is used for what? It's a Boolean value?(for & if in a sentence?) If anybody know it, please explain. Thanks
taken_match = [couple for couple in tentative_engagements if woman in couple]
###
for woman in preferred_rankings_men[man]:
#Boolean for whether woman is taken or not
taken_match = [couple for couple in tentative_engagements if woman in couple]
if (len(taken_match) == 0):
#tentatively engage the man and woman
tentative_engagements.append([man, woman])
free_men.remove(man)
print('%s is no longer a free man and is now tentatively engaged to %s'%(man, woman))
break
elif (len(taken_match) > 0):
...

Python has some pretty sweet syntax to create lists quickly. What you're seeing here is list comprehension-
taken_match = [couple for couple in tentative_engagements if woman in couple]
taken_match will be a list of all the couples where the woman is in the couple- basically, this filters out all the couples where the woman is NOT in the couple.
If we were to write this out without list comprehension:
taken_match = []
for couple in couples:
if woman in couple:
taken_match.append(couple)
As you can see.. list comprehension is way cooler :)
After that line, you're checking if the length of the taken_match is 0- if it is, no couples were found with that woman in them, so we add in an engagement between what the man and the woman, and then move on. If you have any other lines you didn't understand, feel free to ask!

break paragraph into sentences in python and link back to an ID

I have two lists, one with ids and one with corresponding comments for each id.
list_responseid = ['id1', 'id2', 'id3', 'id4']
list_paragraph = [['I like working and helping them reach their goals.'],
['The communication is broken.',
'Information that should have come to me is found out later.'],
['Try to promote from within.'],
['I would relax the required hours to be available outside.',
'We work a late night each week.']]
The ResponseID 'id1' is related to the paragraph ('I like working and helping them reach their goals.') and so on.
I can break paragraph into sentences using the following function:
list_sentence = list(itertools.chain(*list_paragraph))
What would be the syntax to get the end result that is data frame (or list) with separate entry for a sentence and have an ID associated with that sentence (which is now linked to paragraph). The final result would look like this (I will convert list to panda data frame at the end).
id1 'I like working with students and helping them reach their goals.'
id2 'The communication from top to bottom is broken.'
id2 'Information that should have come to me is found out later and in some cases students know more about what is going on than we do!'
id3 'Try to promote from within.'
id4 'I would relax the required 10 hours to be available outside of 8 to 5 back to 9 to 5 like it used to be.'
id4 'We work a late night each week and rarely do students take advantage of those extended hours.'
Thanks.

If you do it often it would be clearer, and probably more efficient depending on the size of the arrays, if you make a dedicated function for that with two regular nested loops, but if you need a quick one liner for it (it's doing just that):
id_sentence_tuples = [(list_responseid[id_list_idx], sentence) for id_list_idx in range(len(list_responseid)) for sentence in list_paragraph[id_list_idx]]
id_sentence_tuples will then be a list of tupples where each of the elements is a pair like (paragraph_id, sentence) just as the result you expect.
Also i would advise you to check that both lists have the same length before doing it so in case they don't you get a meaningful error.
if len(list_responseid) != len(list_paragraph):
IndexError('Lists must have same cardinality')

I had a dataframe with an ID and a review (col = ['ID','Review']). If you can combine these lists to make a dataframe then you can use my approach. I split these reviews into sentences using nltk and then linked back the IDs within the loop. Following is the code that you can use.
## Breaking feedback into sentences
import nltk
count = 0
df_sentences = pd.DataFrame()
for index, row in df.iterrows():
feedback = row['Reviews']
sent_text = nltk.sent_tokenize(feedback) # this gives us a list of sentences
for j in range(0,len(sent_text)):
# print(index, "-", sent_text[j])
df_sentences = df_sentences.append({'ID':row['ID'],'Count':int(count),'Sentence':sent_text[j]}, ignore_index=True)
count = count + 1
print(df_sentences)

Two way matching for Secret Santa game [duplicate]

This question already has answers here:
Secret santa algorithm
(9 answers)
Closed 7 years ago.
I am trying to write a script that pairs up men and women for a secret Santa type event. So I have 2 lists of boys and girls, and want to carry out 2 way matching, but at the moment I can only seem to figure out how to do 1 way matching.
Furthermore the problem I have is this... in the example below if Kedrick gets Annabel, then Annabel can't get Kedrick. Kedrick has to get someone else from the list.
My current implementation is as follows, how can I extend its functionality to meet the abovementioned requirements?
boys = ['Kedrick','Jonathan','Tim','Philip','John','Quincy'];
girls = ['Annabel','Janet','Jocelyn','Pamela','Priscilla','Viviana'];
matches = []
for i in boys:
rand - randint(0, len(girls-1)
fullname = "{} matched with {}".format(i, girls(rand)
del girls(rand)
matches.append(fullname)
print matches

This could probably be done with fewer loops and a lot less code but here is my solution! Created 2 dict's to store names and their targets (dict's could be combined or done at the same time to cut down on memory issues, but with a program this size I don't think you would ever run into this issue!
boys = ['Kedrick','Jonathan','Tim','Philip','John','Quincy'];
girls = ['Annabel','Janet','Jocelyn','Pamela','Priscilla','Viviana'];
matchesBoys = {i:{'to':''} for i in boys}
matchesGirls = {i:{'to':''} for i in girls}
for name in boys:
giveTo = girls[random.randint(0, len(girls)-1)]
girls.remove(giveTo)
matchesBoys[name]['to']=giveTo
for name in matchesGirls:
giveTo = boys[random.randint(0, len(boys)-1)]
boys.remove(giveTo)
matchesGirls[name]['to']=giveTo
del boys, girls
for i in matchesBoys:
print "%s matched with %s"%(i, matchesBoys[i]['to'])
for i in matchesGirls:
print '%s matched with %s'%(i, matchesGirls[i]['to'])

Shuffle both lists and put them in a ring, with every other element being from the first or second list. Each person gives a gift to the one on their right. Something similar to a list like this:
[Girl, Boy, Girl, Boy, ..., Boy]
The last element gives a gift to the first.
It works under the assumption that both lists have the same amount of elements and that there are at least four elements in total, otherwise the problem is unsolvable.
This gives one solution that fulfills your constraints. The general solution to the problem is to find a directed bipartite graph between the sets where each vertex have exactly two edges, one incoming and one outgoing. Perhaps the solution to that problem also always creates a ring?

This is an implementation that creates a circle with alternate boy and girl. See #Emil Vickstom's answer for an explanation of the idea.
from random import shuffle
boys = ['Kedrick','Jonathan','Tim','Philip','John','Quincy'];
girls = ['Annabel','Janet','Jocelyn','Pamela','Priscilla','Viviana'];
shuffle(boys)
shuffle(girls)
circle = [person for pair in zip(boys, girls) for person in pair]
print(' -> '.join(circle + circle[:1]))
Output:
Tim -> Priscilla -> Quincy -> Annabel -> John -> Janet -> Kedrick ->
Jocelyn -> Philip -> Pamela -> Jonathan -> Viviana -> Tim

Compensating for "variance" in a survey

The title for this one was quite tricky.
I'm trying to solve a scenario,
Imagine a survey was sent out to XXXXX amount of people, asking them what their favourite football club was.
From the response back, it's obvious that while many are favourites of the same club, they all "expressed" it in different ways.
For example,
For Manchester United, some variations include...
Man U
Man Utd.
Man Utd.
Manchester U
Manchester Utd
All are obviously the same club however, if using a simple technique, of just trying to get an extract string match, each would be a separate result.
Now, if we further complication the scenario, let's say that because of the sheer volume of different clubs (eg. Man City, as M. City, Manchester City, etc), again plagued with this problem, its impossible to manually "enter" these variances and use that to create a custom filter such that converters all Man U -> Manchester United, Man Utd. > Manchester United, etc. But instead we want to automate this filter, to look for the most likely match and converter the data accordingly.
I'm trying to do this in Python (from a .cvs file) however welcome any pseudo answers that outline a good approach to solving this.
Edit: Some additional information
This isn't working off a set list of clubs, the idea is to "cluster" the ones we have together.
The assumption is there are no spelling mistakes.
There is no assumed length of how many clubs
And the survey list is long. Long enough that it doesn't warranty doing this manually (1000s of queries)

Google Refine does just this, but I'll assume you want to roll your own.
Note, difflib is built into Python, and has lots of features (including eliminating junk elements). I'd start with that.
You probably don't want to do it in a completely automated fashion. I'd do something like this:
# load corrections file, mapping user input -> output
# load survey
import difflib
possible_values = corrections.values()
for answer in survey:
output = corrections.get(answer,None)
if output = None:
likely_outputs = difflib.get_close_matches(input,possible_values)
output = get_user_to_select_output_or_add_new(likely_outputs)
corrections[answer] = output
possible_values.append(output)
save_corrections_as_csv

Please edit your question with answers to the following:
You say "we want to automate this filter, to look for the most likely match" -- match to what?? Do you have a list of the standard names of all of the possible football clubs, or do the many variations of each name need to be clustered to create such a list?
How many clubs?
How many survey responses?
After doing very light normalisation (replace . by space, strip leading/trailing whitespace, replace runs of whitespace by a single space, convert to lower case [in that order]) and counting, how many unique responses do you have?
Your focus seems to be on abbreviations of the standard name. Do you need to cope with nicknames e.g. Gunners -> Arsenal, Spurs -> Tottenham Hotspur? Acronyms (WBA -> West Bromwich Albion)? What about spelling mistakes, keyboard mistakes, SMS-dialect, ...? In general, what studies of your data have you done and what were the results?
You say """its impossible to manually "enter" these variances""" -- is it possible/permissible to "enter" some "variances" e.g. to cope with nicknames as above?
What are your criteria for success in this exercise, and how will you measure it?

It seems to me that you could convert many of these into a standard form by taking the string, lower-casing it, removing all punctuation, then comparing the start of each word.
If you had a list of all the actual club names, you could compare directly against that as well; and for strings which don't match first-n-letters to any actual team, you could try lexigraphical comparison against any of the returned strings which actually do match.
It's not perfect, but it should get you 99% of the way there.
import string
def words(s):
s = s.lower().strip(string.punctuation)
return s.split()
def bestMatchingWord(word, matchWords):
score,best = 0., ''
for matchWord in matchWords:
matchScore = sum(w==m for w,m in zip(word,matchWord)) / (len(word) + 0.01)
if matchScore > score:
score,best = matchScore,matchWord
return score,best
def bestMatchingSentence(wordList, matchSentences):
score,best = 0., []
for matchSentence in matchSentences:
total,words = 0., []
for word in wordList:
s,w = bestMatchingWord(word,matchSentence)
total += s
words.append(w)
if total > score:
score,best = total,words
return score,best
def main():
data = (
"Man U",
"Man. Utd.",
"Manch Utd",
"Manchester U",
"Manchester Utd"
)
teamList = (
('arsenal',),
('aston', 'villa'),
('birmingham', 'city', 'bham'),
('blackburn', 'rovers', 'bburn'),
('blackpool', 'bpool'),
('bolton', 'wanderers'),
('chelsea',),
('everton',),
('fulham',),
('liverpool',),
('manchester', 'city', 'cty'),
('manchester', 'united', 'utd'),
('newcastle', 'united', 'utd'),
('stoke', 'city'),
('sunderland',),
('tottenham', 'hotspur'),
('west', 'bromwich', 'albion'),
('west', 'ham', 'united', 'utd'),
('wigan', 'athletic'),
('wolverhampton', 'wanderers')
)
for d in data:
print "{0:20} {1}".format(d, bestMatchingSentence(words(d), teamList))
if __name__=="__main__":
main()
run on sample data gets you
Man U (1.9867767507647776, ['manchester', 'united'])
Man. Utd. (1.7448074166742613, ['manchester', 'utd'])
Manch Utd (1.9946817328797555, ['manchester', 'utd'])
Manchester U (1.989100008901989, ['manchester', 'united'])
Manchester Utd (1.9956787398647866, ['manchester', 'utd'])

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.