Searching a Parallel Array in Python

Searching a Parallel Array in Python - python

This is a homework question, I got the basics down, but I can't seem to find the correct method of searching two parallel arrays.
Original Question: Design a program that has two parallel arrays: a String array named people that is initialized with the names of seven people, and a String array named phoneNumbers that is initialized with your friends' phone numbers. The program should allow the user to enter a person's name (or part of a person's name). It should then search for that person in the people array. If the person is found, it should get that person's phone number from the phoneNumbers array and display it. If the person is not found, program should display a message indicating so.
My current code:
# create main
def main():
# take in name or part of persons name
person = raw_input("Who are you looking for? \n> ")
# convert string to all lowercase for easier searching
person = person.lower()
# run people search with the "person" as the parameters
peopleSearch(person)
# create module to search the people list
def peopleSearch(person):
# create list with the names of the people
people = ["john",
"tom",
"buddy",
"bob",
"sam",
"timmy",
"ames"]
# create list with the phone numbers, indexes are corresponding with the names
# people[0] is phoneNumbers[0] etc.
phoneNumbers = ["5503942",
"9543029",
"5438439",
"5403922",
"8764532",
"8659392",
"9203940"]
Now, my entire problem begins here. How do I conduct a search (or partial search) on a name, and return the index of the persons name in the people array and print the phone number accordingly?
Update: I added this to the bottom of the code in order to conduct the search.
lookup = dict(zip(people, phoneNumbers))
if person in lookup:
print "Name: ", person ," \nPhone:", lookup[person]
But this only works for full matches, I tried using this to get a partial match.
[x for x in enumerate(people) if person in x[1]]
But when I search it on 'tim' for example, it returns [(5, 'timmy')]. How do I get that index of 5 and apply it in print phoneNumbers[the index returned from the search]?
Update 2: Finally got it to work perfectly. Used this code:
# conduct a search for the person in the people list
search = [x for x in enumerate(people) if person in x[1]]
# for each person that matches the "search", print the name and phone
for index, person in search:
# print name and phone of each person that matches search
print "Name: ", person , "\nPhone: ", phoneNumbers[index]
# if there is nothing that matches the search
if not search:
# display message saying no matches
print "No matches."

Since this is homework, I'll refrain from giving the code outright.
You can create a dict that works as a lookup table with the name as the key and the phone number as its value.
Creating the lookup table:
You can easily convert the parallel arrays into a dict using dict() and zip(). Something along the lines of:
lookup = dict(zip(people, phoneNumbers))
To see how that works, have a look at this example:
>>> people = ["john", "jacob", "bob"]
>>> phoneNumbers = ["5503942", "8659392", "8659392"]
>>> zip(people, phoneNumbers)
[('john', '5503942'), ('jacob', '8659392'), ('bob', '8659392')]
>>> dict(zip(people, phoneNumbers))
{'jacob': '8659392', 'bob': '8659392', 'john': '5503942'}
Finding if a person exist:
You can quickly figure out if a person (key) exist in the lookup table using:
if name in lookup:
# ... phone number will be lookup[name]
List of people whose name matches substring:
This answer should put you on the right track.
And of course, if the search returns an empty list there are no matching names and you can display an appropriate message.
Alternative suggestion
Another approach is to search the list directly and obtain the index of matches which you can then use to retrieve the phone number.
I'll offer you this example and leave it up to you to expand it into a viable solution.
>>> people = ["john", "jacob", "bob", "jacklyn", "cojack", "samantha"]
>>> [x for x in enumerate(people) if "jac" in x[1]]
[(1, 'jacob'), (3, 'jacklyn'), (4, 'cojack')]
If you hit a snag along the way, share what you've done and we'll be glad to assist.
Good luck, and have fun.
Response to updated question
Note that I've provided two alternative solutions, one using a dict as a lookup table and another searching the list directly. Your updates indicate you're trying to mix both solutions together, which is not necessary.
If you need to search through all the names for substring matches, you might be better off with the second solution (searching the listdirectly). The code example I provided returns a list (since there may be more than one name that contain that substring), with each item being a tuple of (index, name). You'll need to iterate throught the list and extract the index and name. You can then use the index to retrieve the phone number.
To avoid just giving you the solution, here's related example:
>>> people = ["john", "jacob", "bob", "jacklyn", "cojack", "samantha"]
>>> matches = [x for x in enumerate(people) if "jac" in x[1]]
>>> for index, name in matches:
... print index, name
...
1 jacob
3 jacklyn
4 cojack
>>> matches = [x for x in enumerate(people) if "doesnotexist" in x[1]]
>>> if not matches:
... print "no matches"
...
no matches

You might want to look here for the answer to How do I ... return the index of the persons name in the people array.

Related

Python - Iterating through python list using another list

I'm stuck on the following problem:
I have a list with a ton of duplicative data. This includes entry numbers and names.
The following gives me a list of unique (non duplicative) names of people from the Data2014 table:
tablequery = c.execute("SELECT * FROM Data2014")
tablequery_results = list(people2014)
people2014_count = len(tablequery_results)
people2014_list = []
for i in tablequery_results:
if i[1] not in people2014_list:
people2014_list.append(i[1])
people2014_count = len(people2014_list)
# for i in people2014_list:
# print(i)
Now that I have a list of people. I need to iterate through tablequery_results again, however, this time I need to find the number of unique entry numbers each person has. There are tons of duplicates in the tablequery_results list. Without creating a block of code for each individual person's name, is there a way to iterate through tablequery_results using the names from people2014_list as the unique identifier? I can replicate the code from above to give me a list of unique entry numbers, but I can't seem to match the names with the unique entry numbers.
Please let me know if that does not make sense.
Thanks in advance!

I discovered my answer after delving into SQL a bit more. This gives me a list with two columns. The person's name in the first column, and then the numbers of entries that person has in the second column.
def people_data():
data_fetch = c.execute("SELECT person, COUNT(*) AS `NUM` FROM Data2014 WHERE ACTION='UPDATED' GROUP BY Person ORDER BY NUM DESC")
people_field_results = list(data_fetch)
people_field_results_count = len(people_field_results)
for i in people_field_results:
print(i)
print(people_field_results_count)

Regular expressions matching words which contain the pattern but also the pattern plus something else

I have the following problem:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
I need to find which elements of list2 are in list1. In actual fact the elements of list1 correspond to a numerical value which I need to obtain then change. The problem is that 'xyz2' contains 'xyz' and therefore matches also with a regular expression.
My code so far (where 'data' is a python dictionary and 'specie_name_and_initial_values' is a list of lists where each sublist contains two elements, the first being specie name and the second being a numerical value that goes with it):
all_keys = list(data.keys())
for i in range(len(all_keys)):
if all_keys[i]!='Time':
#print all_keys[i]
pattern = re.compile(all_keys[i])
for j in range(len(specie_name_and_initial_values)):
print re.findall(pattern,specie_name_and_initial_values[j][0])
Variations of the regular expression I have tried include:
pattern = re.compile('^'+all_keys[i]+'$')
pattern = re.compile('^'+all_keys[i])
pattern = re.compile(all_keys[i]+'$')
And I've also tried using 'in' as a qualifier (i.e. within a for loop)
Any help would be greatly appreciated. Thanks
Ciaran
----------EDIT------------
To clarify. My current code is below. its used within a class/method like structure.
def calculate_relative_data_based_on_initial_values(self,copasi_file,xlsx_data_file,data_type='fold_change',time='seconds'):
copasi_tool = MineParamEstTools()
data=pandas.io.excel.read_excel(xlsx_data_file,header=0)
#uses custom class and method to get the list of lists from a file
specie_name_and_initial_values = copasi_tool.get_copasi_initial_values(copasi_file)
if time=='minutes':
data['Time']=data['Time']*60
elif time=='hour':
data['Time']=data['Time']*3600
elif time=='seconds':
print 'Time is already in seconds.'
else:
print 'Not a valid time unit'
all_keys = list(data.keys())
species=[]
for i in range(len(specie_name_and_initial_values)):
species.append(specie_name_and_initial_values[i][0])
for i in range(len(all_keys)):
for j in range(len(specie_name_and_initial_values)):
if all_keys[i] in species[j]:
print all_keys[i]
The table returned from pandas is accessed like a dictionary. I need to go to my data table, extract the headers (i.e. the all_keys bit), then look up the name of the header in the specie_name_and_initial_values variable and obtain the corresponding value (the second element within the specie_name_and_initial_value variable). After this, I multiply all values of my data table by the value obtained for each of the matched elements.
I'm most likely over complicating this. Do you have a better solution?
thanks
----------edit 2 ---------------
Okay, below are my variables
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
species = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])

You don't need a regex to find common elements, set.intersection will find all elements in list2 that are also in list1:
list1=['xyz','xyz2','other_randoms']
list2=['xyz']
print(set(list2).intersection(list1))
set(['xyz'])
Also if you wanted to compare 'xyz' to 'xyz2' you would use == not in and then it would correctly return False.
You can also rewrite your own code a lot more succinctly, :
for key in data:
if key != 'Time':
pattern = re.compile(val)
for name, _ in specie_name_and_initial_values:
print re.findall(pattern, name)
Based on your edit you have somehow managed to turn lists into strings, one option is to strip the []:
all_keys = set([u'Cyp26_G_R1', u'Cyp26_G_rep1', u'Time'])
specie_name_and_initial_values = set(['[Cyp26_R1R2_RARa]', '[Cyp26_SRC3_1]', '[18-OH-RA]', '[p38_a]', '[Cyp26_G_rep1]', '[Cyp26]', '[Cyp26_G_a]', '[SRC3_p]', '[mRARa]', '[np38_a]', '[mRARa_a]', '[RARa_pp_TFIIH]', '[RARa]', '[Cyp26_G_L2]', '[atRA]', '[atRA_c]', '[SRC3]', '[RARa_Ser369p]', '[p38]', '[Cyp26_mRNA]', '[Cyp26_G_L]', '[TFIIH]', '[Cyp26_SRC3_2]', '[Cyp26_G_R1R2]', '[MSK1]', '[MSK1_a]', '[Cyp26_G]', '[Basal_Kinases]', '[Cyp26_R1_RARa]', '[4-OH-RA]', '[Cyp26_G_rep2]', '[Cyp26_Chromatin]', '[Cyp26_G_R1]', '[RXR]', '[SMRT]'])
specie_name_and_initial_values = set(s.strip("[]") for s in specie_name_and_initial_values)
print(all_keys.intersection(specie_name_and_initial_values))
Which outputs:
set([u'Cyp26_G_R1', u'Cyp26_G_rep1'])
FYI, if you had lists inside the set you would have gotten an error as lists are mutable so are not hashable.

Sorting a list using a regex in Python

I have a list of email addresses with the following format:
name####email.com
But the number is not always present. For example: john45#email.com, bob#email.com joe2#email.com, etc. I want to sort these names by the number, with those without a number coming first. I have come up with something that works, but being new to Python, I'm curious as to whether there's a better way of doing it. Here is my solution:
import re
def sortKey(name):
m = re.search(r'(\d+)#', name)
return int(m.expand(r'\1')) if m is not None else 0
names = [ ... a list of emails ... ]
for name in sorted(names, key = sortKey):
print name
This is the only time in my script that I am ever using "sortKey", so I would prefer it to be a lambda function, but I'm not sure how to do that. I know this will work:
for name in sorted(names, key = lambda n: int(re.search(r'(\d+)#', n).expand(r'\1')) if re.search(r'(\d+)#', n) is not None else 0):
print name
But I don't think I should need to call re.search twice to do this. What is the most elegant way of doing this in Python?

Better using re.findall as if no numbers are found, then it returns an empty list which will sort before a populated list. The key used to sort is any numbers found (converted to ints), followed by the string itself...
emails = 'john45#email.com bob#email.com joe2#email.com'.split()
import re
print sorted(emails, key=lambda L: (map(int, re.findall('(\d+)#', L)), L))
# ['bob#email.com', 'joe2#email.com', 'john45#email.com']
And using john1 instead the output is: ['bob#email.com', 'john1#email.com', 'joe2#email.com'] which shows that although lexicographically after joe, the number has been taken into account first shifting john ahead.
There is a somewhat hackish way if you wanted to keep your existing method of using re.search in a one-liner (but yuck):
getattr(re.search('(\d+)#', s), 'groups', lambda: ('0',))()

Combining two lists and sorting through reference to dictionary Python

I have (what seems to me is) a pretty convoluted problem. I'm going to try to be as succinct as possible - though in order to understand the issue fully, you might have to click on my profile and look at the (only other) two questions I've posted on StackOverflow. In short: I have two lists -- one is comprised of email strings that contain a facility name, and a date of incident. The other is comprised of the facility ids for each email (I use one of the following regex functions to get this list). I've used Regex to be able to search each string for these pieces of information. The 3 Regex functions are:
def find_facility_name(incident):
pattern = re.compile(r'Subject:.*?for\s(.+?)\n')
findPat1 = re.search(pattern, incident)
facility_name = findPat1.group(1)
return facility_name
def find_date_of_incident(incident):
pattern = re.compile(r'Date of Incident:\s(.+?)\n')
findPat2 = re.search(pattern, incident)
incident_date = findPat2.group(1)
return incident_date
def find_facility_id(incident):
pattern = re.compile('(\d{3})\n')
findPat3 = re.search(pattern, incident)
f_id = findPat3.group(1)
return f_id
I also have a dictionary that is formatted like this:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
I'm trying to COMBINE the two lists and sort by the Key values in the dictionary, followed by the Date of Incident. Since the key values are attached to the facility name, this should automatically caused emails from the same facilities to be grouped together. In order to do that, I've tried to use these two functions:
def get_facility_ids(incident_list):
'''(lst) -> lst
Return a new list from incident_list that inserts the facility IDs from the
get_facilities dictionary into each incident.
'''
f_id = []
for incident in incident_list:
find_facility_name(incident)
for k in d:
if find_facility_name(incident) == d[k]:
f_id.append(k)
return f_id
id_list = get_facility_ids(incident_list)
def combine_lists(L1, L2):
combo_list = []
for i in range(len(L1)):
combo_list.append(L1[i] + L2[i])
return combo_list
combination = combine_lists(id_list, incident_list)
def get_sort_key(incident):
'''(str) -> tup
Return a tuple from incident containing the facility id as the first
value and the date of the incident as the second value.
'''
return (find_facility_id(incident), find_date_of_incident(incident))
final_list = sorted(combination, key=get_sort_key)
Here is an example of what my input might be and the desired output:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
input: first_list = ['email_1', 'email_2', etc.]
first output: next_list = ['facility_id_for_1+email_1', 'facility_id_for_2 + email_2', etc.]
DESIRED OUTPUT: FINAL_LIST = sorted(next_list, key=facility_id, date of incident)
The only problem is, the key values are not matching properly with what's found in each individual email string. Some DO, others are completely random. I have no idea why this is happening, but I have a feeling it has something to do with the way I'm combining the two lists. Can anyone help this lowly n00b? Thanks!!!

First off, I would suggest reversing your ID-to-name dictionary. Looking up a value by key is very fast but finding a key by value is very slow.
rd = { name: id_num for id_num, name in d.items() }
Then your first function can be replaced by a list comprehension:
id_list = [rd[find_facility_name(incident)] for incident in incident_list]
This might also expose why you're getting messed up values in your results. If an incident has a facility name that's not in your dictionary, this code will raise a KeyError (whereas your old function would just skip it).
Your combine function is very similar to Python's built in zip function. I'd replace it with:
combination = [id+incident for id, incident in zip(id_list, incident_list)]
However, since you're building the first list from the second one, it might make sense to build the combined version directly, rather than making separate lists and then combining them in a separate step. Here's an update to the list comprehension above that goes right to the combination result:
combination = [rd[find_facility_name(incident)] + incident
for incident in incident_list]
To do the sort, you can use the ID string that we just prepended to the email message, rather than parsing to find the ID again:
combination.sort(key=lambda x: (x[0:3], get_date_of_incident(x)))
The 3 in the slice is based off of your example of "001" and "002" as the id values. If the actual ids are longer or shorter you'll need to adjust that.

So, here is what I think is going on. Please correct me if possible.
The 'incident_list' is a list of email strings. You go in and find the facility names in each email because you have an external dictionary that has the (key:value)=(facility id: facility name). From the dictionary, you can extract the facility id in this 'id_list'.
You combine the lists so that you get a list of strings [facility id + email,...]
Then you want it to sort by a tuple( facility id, date of incidence ).
It looks like you are searching for the facility id and the facility name twice. You can skip a step if they are the same. Then, the best way is to do it all at once with tuples:
incident_list = ['email1', 'email2',...]
unsorted_list = []
for email in incident list:
id = find_facility_id(email)
date = find_date_of_incident(email)
mytuple = ( id, date, id + email )
unsorted_list.append(mytuple)
final_list = sorted(unsorted_list, key=lambda mytup:(mytup[0], mytup[1]))
Then you get an easy list of tuples sorted by first element (id as a string), and then second element (date as a string). If you just need a list of strings ( id + email ), then you need a list with the last element of each tuple part
FINALLIST = [ tup[-1] for tup in final_list ]

Group related objects in Django

I'm building an app where you can search for objects in a database (let's assume the objects you search for are persons). What I want to do is to group related objects, for example married couples. In other words, if two people share the same last name, we assume that they are married (not a great example, but you get the idea). The last name is the only thing that identifies two people as married.
In the search results I want to display the married couples next to each other, and all the other persons by themselves.
Let's say you search for "John", this is what I want:
John Smith - Jane Smith
John Adams - Nancy Adams
John Washington
John Andersson
John Ryan
Each name is then a link to that person's profile page.
What I have right now is a function that finds all pairs, and returns a list of tuples, where each tuple is a pair. The problem is that on the search results, every name that is in a pair is listed twice.
I do a query for the search query (Person.objects.filter(name__contains="John")), and the result of that query is sent to the match function. I then send both the original queryset and the match function result to the template.
I guess I could just exclude every person that the match function finds a match for, but I don't know, but is that the most efficient solution?
Edit:
As I wrote in a comment, the actual strings that I want to match are not identical. To quote myself:
In fact, the strings I want to match are not identical, instead they
look more like this: "foo2(bar13)" - "foo2(bar14)". That is, if two
strings have the same foo id (2), and if the bar id is an odd number
(13), then its match is the bar id + 1 (14). I have a regular
expression to find these matches

First get your objects sorted by last name:
def keyfun(p):
return p.name.split()[-1]
persons = sorted(Person.objects.all(), key = keyfun)
Then use groupby:
from itertools import groupby
for lname, persons in groupby(persons, keyfun):
print ' - '.join(p.name for p in persons)
Update Yes, this solution works for your new requirement too. All you need is a stable way to generate keys for each item, and replace the body of the keyfun with it:
from re import findall
def keyfun(p):
v1, v2 = findall(p.name, '\d+')
tot = int(v1) + int(v2) % 2
return tot
Your description for how to generate the key for each item is not clear enough, although you should be able to figure it out yourself with the above example.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Searching a Parallel Array in Python - python

You might want to look here for the answer to How do I ... return the index of the persons name in the people array.

Related

Python - Iterating through python list using another list

Regular expressions matching words which contain the pattern but also the pattern plus something else

Sorting a list using a regex in Python

Combining two lists and sorting through reference to dictionary Python

Group related objects in Django

Categories

Resources