Alphabetize a List of Multiline Values in Python? - python

I have a list of names and addresses organized in the following format:
Mr and Mrs Jane Doe
Candycane Lane
Magic Meadows, SC
I have several blocks of data written like this, and I want to alphabetize the blocks by last name (Doe, in this case). After some digging, my best guess is that I need to make a "list of lists" and then use the last name as the key by which to sort the blocks. However, given my freshness to Python and lack of Google skills, the closest I could find was this. I'm confused about converting each block to a list and then slicing it; I can't seem to find a way to do this and still be able to alphabetize properly. Any and all guidance is greatly appreciated.

If I understood correctly, what you want basically is to sort values by "some computation done on the value", in this case the extracted last name.
For that, use the key keyword argument to .sort() or sorted():
def my_key_function(original_name):
    # do something to extract the last name, for example:
    try:
        return original_name.split(',')[1].strip()
    except IndexError:
        return original_name

my_sorted_values = sorted(my_original_values, key=my_key_function)
The only requirement is that your "key" function is deterministic, i.e. it always returns the same output for a given input.
You might also want to sort by last name and then first name: in this case, just return a tuple (last, first). If last is the same for two given items, first will be used to order them.
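As a minimal sketch of the tuple key, assuming each value is a single "First Last" string (the sample names and the helper name last_then_first are made up for illustration):

```python
def last_then_first(full_name):
    # Assumes "First ... Last"; the last word is treated as the surname.
    parts = full_name.split()
    return (parts[-1], parts[0])

people = ["Jane Doe", "Adam Smith", "Aaron Doe"]
sorted_people = sorted(people, key=last_then_first)
print(sorted_people)
```

The two Doe entries share a surname, so their relative order is decided by the first name.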
Update
For your specific case, this function should do the trick:
def my_key_function(original_name):
    return original_name.splitlines()[0].split()[-1]

Assuming you already have the data in a list
l = ['Mr and Mrs Jane Smith\nCandycane Lane\nMagic Meadows, SC',
'Mr and Mrs Jane Doe\nCandycane Lane\nMagic Meadows, SC',
'Mr and Mrs Jane Atkins\nCandycane Lane\nMagic Meadows, SC']
You can specify the key to sort on.
l.sort(key=lambda x: x.split('\n')[0].split(' ')[-1])
In this case, get the last word (.split(' ')[-1]) on the first line (.split('\n')[0])

You want to make a new list where each entry is a tuple containing the sort key you want and the whole thing. Sort that list and then take the second component of each entry:
def get_sort_name(address):
    name, street, city = address.split('\n')
    return (name.split(' ')[-1], address)  # last word of first line, and the whole block, as a tuple

keyed_list = sorted(map(get_sort_name, addresses))
sorted_addresses = [item[1] for item in keyed_list]
This could be more compact using lambdas of course, but it's better to be readable :)

Related

Creating dictionary from list of lists

I am working on an online course exercise (practice problem before the final test).
The test involves working with a big csv file (not downloadable) and answering questions about the dataset. You're expected to write code to get the answers.
The data set is a list of all documented baby names each year, along with how often each name was used for boys and for girls.
A sample list of the first 10 lines is also given:
Isabella,42567,Girl
Sophia,42261,Girl
Jacob,42164,Boy
and so on.
Questions you're asked include things like 'how many names in the data set', 'how many boys' names beginning with z' etc.
I can get all the data into a list of lists:
[['Isabella', '42567', 'Girl'], ['Sophia', '42261', 'Girl'], ['Jacob', '42164', 'Boy']]
My plan was to convert into a dictionary, as that would probably be easier for answering some of the other questions. The list of lists is saved to the variable 'data':
names = {}
for d in data:
    names[d[0]] = d[1:]
print(names)
{'Isabella': ['42567', 'Girl'], 'Sophia': ['42261', 'Girl'], 'Jacob': ['42164', 'Boy']}
Works perfectly.
Here's where it gets weird. If, instead of opening the sample file with 10 lines, I open the real csv file with around 16,000 lines, everything works perfectly right up to the very last bit.
I get the complete list of lists, but when I go to create the dictionary, it breaks (here I'm just showing the first three items, but the full 16,000 lines are all wrong in a similar way):
names = {}
for d in data:
    names[d[0]] = d[1:]
print(names)
{'Isabella': ['56', 'Boy'], 'Sophia': ['48', 'Boy'], 'Jacob': ['49', 'Girl']}
I know the data is there and correct, since I can read it directly:
for d in data:
    print(d[0], d[1], d[2])
Isabella 42567 Girl
Sophia 42261 Girl
Jacob 42164 Boy
Why would this dictionary work fine with the csv file with 10 lines, but completely break with the full file? I can't find any
Follow the comments to create two dicts, or a single dictionary with tuple keys. Using tuples as keys is fine if you keep your variables inside Python, but you might get into trouble when exporting to JSON, for example.
Try a dictionary comprehension with list unpacking:
names = {(name, sex): freq for name, freq, sex in data}
Or a for loop, as you started:
names = dict()
for name, freq, sex in data:
    names[(name, sex)] = freq
I'd go with something like
results = {}
for d in data:
    name, amount, gender = d  # each row is already a [name, amount, gender] list
    results[name] = results.get(name, {})
    results[name].update({gender: amount})
This way you'll get results in something like
{
    'Isabella': {'Girl': '42567', 'Boy': '67'},
    'Sophia': {'Girl': '42261'},
    'Jacob': {'Boy': '42164'}
}
However, a repeated name and gender pair will override the previous value, so you need to take that into account if there are any. It also assumes that the whole file matches the format you've provided.
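The original dictionary goes wrong because a name can appear once for girls and again for boys, and a plain names[name] = ... keeps only the last row seen. Here is a minimal sketch of the nested-dictionary idea, assuming the rows are already lists of three strings as in the question; the fourth sample row and the variable names are made up for illustration, and the counts are converted to int so they can be compared or summed later:

```python
from collections import defaultdict

# Sample rows in the question's list-of-lists shape (the last row is invented).
data = [
    ["Isabella", "42567", "Girl"],
    ["Sophia", "42261", "Girl"],
    ["Jacob", "42164", "Boy"],
    ["Isabella", "56", "Boy"],
]

counts = defaultdict(dict)
for name, amount, gender in data:
    # One inner dict per name, so a Boy row no longer replaces the Girl row.
    counts[name][gender] = int(amount)

distinct_names = len(counts)
boys_names = [n for n, by_gender in counts.items() if "Boy" in by_gender]
print(distinct_names, sorted(boys_names))
```

With this shape, a question such as "how many boys' names begin with z" becomes a filter over boys_names.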

Sorting random element into dictionary? Python

I am trying to append a random choice into a dictionary, but my code doesn't seem to be working.
The file I am using (mood.txt):
happy, Jennifer Clause
happy, Jake Foster
sad, Jonathan Bower
mad, Penny
excited, Logan
awkward, Mason Tyme
my code:
def theFile():
    moodFile = open("mood.txt")
    theMood = moodFile.readlines()
    moodFile.close()
    return(theMood)

def makeTheDict(myFile):
    moodDict = {}
    for lines in myFile:
        (mood, name) = lines.split(",")
        moodDict[mood] = name.strip()
    return(moodDict)

def randomMood(name, mood, moodDict):
if mood in moodDict:
    randomMood = random.choice(mood)
    moodDict[mood] = randomMood
    moodDict.append(name, randomMood)
    print(name, "has been put in the", randomMood, "group")

def main():
    moodFile = theFile()
    moodDict = makeTheDict(moodFile)
    name = input("Choose a name: ")
    newMood = input("Choose a mood: ")
    randomMood(name, newMood, moodDict)
For example, I want to add a "Jamie Green" into a random group, and if it randomly chose "sad" then -
happy, Jennifer Clause
happy, Jake Foster
sad, Jonathan Bower
mad, Penny
excited, Logan
awkward, Mason Tyme
#sad, Jamie Green
How would I append this into the dictionary randomly?
Thank you!
It seems that you want to map strings to lists of strings, but instead of that you are mapping strings to strings.
Look at this line:
moodDict[mood] = name.strip()
Here you are mapping the string mood to the string name.strip(). If at this point, there was already a name mapped to the current mood, the old value would be replaced and lost. In your file sample, both Jennifer and Jake are happy. At the first iteration of the for loop you have:
moodDict["happy"] = "Jennifer Clause"
Then, at the second step, you have:
moodDict["happy"] = "Jake Foster"
Here "Jake Foster" replaces "Jennifer Clause". Since the moods can be repeated, what you probably want is something like this:
if mood in moodDict:
    moodDict[mood].append(name.strip())
else:
    moodDict[mood] = [name.strip()]
This way, for each mood key you have a list of name values.
Regarding the randomMood function, there are many things that don't look right:
The if statement should be indented since it is part of the function. This should throw an IndentationError, but I will assume it happened when you copied the code into StackOverflow.
mood is a string, so what random.choice(mood) actually does is choose a random character from that string, which doesn't make any sense. You probably want to choose from the list of moods, which would be something like this: randomMood = random.choice(list(moodDict.keys())).
Because of what I explained in the previous point, the following line just replaces the value under the mood key with a random character from the mood, which doesn't make sense either.
Dictionaries don't have a method named append, so this should throw an error too. You probably want to replace it with this: moodDict[randomMood].append(name)
Finally, I don't understand why you ask the user to input a mood when it is supposed to be chosen randomly.
It seems you are a bit confused about what a Python dictionary is and how it works. Remember that it maps keys to values. In your code the keys are the moods and the values are the names, both represented as strings. The keys are unique. This means that if you assign a new value to an existing key, the old value mapped under that key gets lost. If you want to deal with multiple values under the same key, you should map the key to a collection of values, like a list, instead of a single value.
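Putting those points together, here is a minimal sketch of a mood-to-list-of-names dictionary with a random group assignment. It reads from an in-memory list instead of mood.txt so that it is self-contained, and the function names make_mood_dict and add_to_random_mood are made up for illustration:

```python
import random

# A few lines in the same "mood, name" format as mood.txt.
lines = [
    "happy, Jennifer Clause",
    "happy, Jake Foster",
    "sad, Jonathan Bower",
    "mad, Penny",
]

def make_mood_dict(lines):
    mood_dict = {}
    for line in lines:
        mood, name = line.split(",")
        # setdefault creates the list the first time a mood is seen.
        mood_dict.setdefault(mood.strip(), []).append(name.strip())
    return mood_dict

def add_to_random_mood(name, mood_dict):
    # Pick one of the existing moods at random and add the name to its list.
    chosen = random.choice(list(mood_dict))
    mood_dict[chosen].append(name)
    return chosen

mood_dict = make_mood_dict(lines)
chosen = add_to_random_mood("Jamie Green", mood_dict)
print("Jamie Green has been put in the", chosen, "group")
```

Because the mood is picked at random, there is no need to ask the user for one.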

Keeping track of iterations through list done inside function in Python

I have an interesting question that very well may be answered with "Do it another way."
I have a function that iterates through a list with a for loop. I call the function within itself on certain parameters, and keep iterating through the list from the point it was at. The issue is I would like to be able to jump out of the recursive call back into the top function but keep track of how far I went in the list and go on from there.
Basically I want something like this:
def iterate_list(listA):
    dictA = {}
    for pos, item in enumerate(listA):
        if item == 1:
            dictA[pos] = iterate_list(listA[pos])
            # At this point I want to go back to the for loop (like continue does), except I want
            # to be at the pos I was at when I left the sub function
            continue  # ? Don't think continue is what I want, but it's the closest thing I could
            # find so I left it in for now
        elif item == 2:
            return dictA
        else:
            dictA[pos] = item
    return dictA

dictFinal = iterate_list(original_list)
So the end result is a dictionary of whatever is in the list (integers in this example, but not always), except at some points where the key points to a sub-dictionary. This is a simplified version of the code where I took out all the extra code that gets the keys and values (that bit works; I tested it extensively), so what I'm putting in the dictionary here looks a little silly (but simulates what I'm doing well enough). Thanks for the help.
edit: A little more detail on the input and output, as requested. The input is a list of strings that are mostly written as word : word; the output uses the first word as the key and the second as the value in the dictionary. The string-parsing code is written and works. But there are some areas with repeated keys, so I want those to go into a sub-dictionary. So for example
Input = [Name: Bob, ID: 12345, Age: 99, Job: Dentist, Patient Name: John, Patient ID: 321, Patient Name: Susan, Patient ID: 666, Patient Name: Lucy, Patient ID: 087, Employees: 5, Address: 233 Main St, Phone: 555-5555]
Output = {Name : Bob, ID : 12345, Age : 99, Job : Dentist, Patient1 : {Patient Name : John, Patient ID : 321}, Patient2 : {Patient Name : Susan, Patient ID : 666}, Patient3 : {Patient Name : Lucy, Patient ID : 087}, Employees : 5, Address : 233 Main St, Phone : 555-5555}
If that makes sense. Let me know if more detail is needed.
One simple answer would be to use an iterator. If you pass an iterator to a recursive call and it consumes some elements, when the recursive call returns, you'll continue where it left off:
def function(iterable):
    iterable = iter(iterable)
    for thing in iterable:
        if some_condition(thing):
            function(iterable)
            continue
            # The next iteration of the loop will use
            # the first item the recursive call didn't.
This might not cut it, though; for example, you might need to go back a position or two in the list, and most Python iterators don't support that. In that case, you could write an iterator that allows you to unconsume elements, or you could iterate with an explicit index and put the index you stopped at into the return value.
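The explicit-index alternative could look roughly like this. It is only a sketch that reuses the question's convention of 1 as "start a sub-dictionary" and 2 as "end it"; the function name build and the sample list are made up for illustration:

```python
def build(items, start=0):
    """Return (result_dict, index of the first item not yet consumed)."""
    result = {}
    pos = start
    while pos < len(items):
        item = items[pos]
        if item == 1:
            # Recurse on the items after the marker, then resume where the callee stopped.
            sub, next_pos = build(items, pos + 1)
            result[pos] = sub
            pos = next_pos
        elif item == 2:
            # End of this nested section: tell the caller where to continue.
            return result, pos + 1
        else:
            result[pos] = item
            pos += 1
    return result, pos

nested, stopped_at = build([5, 1, 7, 8, 2, 9])
print(nested, stopped_at)
```

Since the position is an ordinary integer, the caller is free to step back a position or two before continuing, which a plain iterator would not allow.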
It seems like you are parsing an input list. The list elements are tokens and the grammar is simple, but syntactic analysis is better implemented with several (possibly recursive) functions, following the grammar definition.
From your example, the BNF seems to be:
<patient> ::= <patient-id> <patient-name> # Patient has 2 fields
<doctor-data> ::= Name | ID | Age | Job # Personal doctor data is combination of these data
<doctor> ::= <doctor-data> <patient>* # Doctor data is personal data and some patients
<hospital_data> ::= Employees | Address | Phone # Hospital data fields
<hospital> ::= <doctor>* <hospital_data> # Hospital has doctors and additional data
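A rough sketch of the "one function per grammar rule" idea, under some assumptions: every string is "key: value", and a "Patient Name" line is always followed directly by its "Patient ID" line (the order used in the question's sample input). The function names split_field, parse_patient and parse_all are made up for illustration:

```python
def split_field(line):
    # "Patient ID: 321" -> ("Patient ID", "321")
    key, value = line.split(":", 1)
    return key.strip(), value.strip()

def parse_patient(first_line, lines):
    # One patient: the "Patient Name" line plus the line that follows it.
    patient = dict([split_field(first_line)])
    patient.update([split_field(next(lines))])
    return patient

def parse_all(raw_lines):
    lines = iter(raw_lines)  # shared iterator, so sub-parsers consume from it
    result, patient_count = {}, 0
    for line in lines:
        if line.startswith("Patient Name"):
            patient_count += 1
            result["Patient%d" % patient_count] = parse_patient(line, lines)
        else:
            key, value = split_field(line)
            result[key] = value
    return result

record = parse_all([
    "Name: Bob", "Job: Dentist",
    "Patient Name: John", "Patient ID: 321",
    "Patient Name: Susan", "Patient ID: 666",
    "Employees: 5",
])
print(record)
```

Values are kept as strings here, so an ID such as 087 keeps its leading zero.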

Sorting a list using a regex in Python

I have a list of email addresses with the following format:
name####email.com
But the number is not always present. For example: john45#email.com, bob#email.com, joe2#email.com, etc. I want to sort these names by the number, with those without a number coming first. I have come up with something that works, but being new to Python, I'm curious as to whether there's a better way of doing it. Here is my solution:
import re
def sortKey(name):
    m = re.search(r'(\d+)#', name)
    return int(m.expand(r'\1')) if m is not None else 0

names = [ ... a list of emails ... ]
for name in sorted(names, key = sortKey):
    print(name)
This is the only time in my script that I am ever using "sortKey", so I would prefer it to be a lambda function, but I'm not sure how to do that. I know this will work:
for name in sorted(names, key = lambda n: int(re.search(r'(\d+)#', n).expand(r'\1')) if re.search(r'(\d+)#', n) is not None else 0):
    print(name)
But I don't think I should need to call re.search twice to do this. What is the most elegant way of doing this in Python?
It is better to use re.findall: if no numbers are found, it returns an empty list, which sorts before a populated list. The sort key is the list of numbers found (converted to ints), followed by the string itself:
emails = 'john45#email.com bob#email.com joe2#email.com'.split()
import re
print(sorted(emails, key=lambda L: (list(map(int, re.findall(r'(\d+)#', L))), L)))
# ['bob#email.com', 'joe2#email.com', 'john45#email.com']
And using john1 instead, the output is ['bob#email.com', 'john1#email.com', 'joe2#email.com'], which shows that although john sorts lexicographically after joe, the number has been taken into account first, shifting john ahead.
There is a somewhat hackish way if you wanted to keep your existing method of using re.search in a one-liner (but yuck):
getattr(re.search(r'(\d+)#', s), 'groups', lambda: ('0',))()
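On Python 3.8 or later, an assignment expression is another way to keep the lambda while calling search only once per item. This is a sketch under that version assumption; the name NUMBER_BEFORE_HASH is made up for illustration:

```python
import re

NUMBER_BEFORE_HASH = re.compile(r"(\d+)#")

emails = ["john45#email.com", "bob#email.com", "joe2#email.com"]

# (m := ...) stores the match object, so the pattern is searched once per item.
by_number = sorted(
    emails,
    key=lambda n: int(m.group(1)) if (m := NUMBER_BEFORE_HASH.search(n)) else 0,
)
print(by_number)
```

Addresses without a number get the key 0 and therefore come first, as in the original sortKey function.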

Group related objects in Django

I'm building an app where you can search for objects in a database (let's assume the objects you search for are persons). What I want to do is to group related objects, for example married couples. In other words, if two people share the same last name, we assume that they are married (not a great example, but you get the idea). The last name is the only thing that identifies two people as married.
In the search results I want to display the married couples next to each other, and all the other persons by themselves.
Let's say you search for "John", this is what I want:
John Smith - Jane Smith
John Adams - Nancy Adams
John Washington
John Andersson
John Ryan
Each name is then a link to that person's profile page.
What I have right now is a function that finds all pairs, and returns a list of tuples, where each tuple is a pair. The problem is that on the search results, every name that is in a pair is listed twice.
I do a query for the search query (Person.objects.filter(name__contains="John")), and the result of that query is sent to the match function. I then send both the original queryset and the match function result to the template.
I guess I could just exclude every person that the match function finds a match for, but is that the most efficient solution?
Edit:
As I wrote in a comment, the actual strings that I want to match are not identical. To quote myself:
In fact, the strings I want to match are not identical, instead they
look more like this: "foo2(bar13)" - "foo2(bar14)". That is, if two
strings have the same foo id (2), and if the bar id is an odd number
(13), then its match is the bar id + 1 (14). I have a regular
expression to find these matches
First get your objects sorted by last name:
def keyfun(p):
    return p.name.split()[-1]

persons = sorted(Person.objects.all(), key=keyfun)
Then use groupby:
from itertools import groupby
for lname, group in groupby(persons, keyfun):
    print(' - '.join(p.name for p in group))
Update: Yes, this solution works for your new requirement too. All you need is a stable way to generate a key for each item; replace the body of keyfun with it:
from re import findall
def keyfun(p):
    v1, v2 = findall(r'\d+', p.name)
    tot = int(v1) + int(v2) % 2
    return tot
Your description of how to generate the key for each item is not clear enough, although you should be able to figure it out yourself with the above example.
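One possible reading of the rule from the question's edit (same foo id, and an odd bar id pairs with the next even one) can be sketched as a key that returns the foo id together with a pair number. This is an assumption about the intended rule, not a confirmed one, and it works on plain strings rather than Person objects so that it is self-contained; pair_key and the sample names are made up for illustration:

```python
import re
from itertools import groupby

def pair_key(name):
    # Assumes names look like "foo2(bar13)"; returns (foo id, pair number).
    foo_id, bar_id = (int(n) for n in re.findall(r"\d+", name))
    # An odd bar id and the next even one map to the same pair number.
    return (foo_id, (bar_id + 1) // 2)

names = ["foo2(bar14)", "foo3(bar13)", "foo2(bar13)", "foo2(bar15)"]
groups = [list(g) for _, g in groupby(sorted(names, key=pair_key), key=pair_key)]
print(groups)
```

groupby only merges adjacent items, which is why the list is sorted with the same key first.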
