Query dictionary based on a criteria and skip values that are missing - python

data = [
{'firstname': 'Tom ', 'lastname': 'Frank', 'title': 'Mr',
'education': 'B.Sc'},{'firstname': 'Anne ', 'middlename': 'David', 'lastname': 'Frank', 'title': 'Doctor',
'education': 'Ph.D'} , {'firstname': 'Ben ', 'lastname': 'William', 'title': 'Mr'}
]
I want to query the list of dictionaries based on the key 'education'. If a person's details do not have this key, the entire dictionary should be skipped. The desired output is
[('Mr Tom Frank', 'B.Sc'),
('Doctor Anne David Frank', 'Ph.D')]
My attempt leaves extra spaces between Tom and Frank (as in 'Mr Tom   Frank') as well as between Anne and David. Here is the actual output:
[('Mr Tom   Frank', 'B.Sc'), ('Doctor Anne  David Frank', 'Ph.D')]
I would like to avoid this if possible.
Here is the code I have written. I apologize if the code does not seem to be readable enough and I am ready to take any comments.
def qualified_applicants(data):
    full_name_education = []
    keys = ['title', 'firstname', 'middlename', 'lastname']
    for record in data:
        # check to see if 'education' is one of the keys
        if 'education' in record.keys():
            full_name = [' '.join([record.get(key, '') for key in keys])]
            # make a tuple of full name and education
            full_name_education.append(tuple(full_name + [record['education']]))
    return full_name_education

You can use regex:
import re
data = [
{'firstname': 'Tom ', 'lastname': 'Frank', 'title': 'Mr',
'education': 'B.Sc'},{'firstname': 'Anne ', 'middlename': 'David', 'lastname': 'Frank', 'title': 'Doctor',
'education': 'Ph.D'} , {'firstname': 'Ben ', 'lastname': 'William', 'title': 'Mr'}
]
new_data = [(re.sub(r'\s{2,}', ' ', ' '.join(re.sub(r'\s+$', '', i.get(b, '')) for b in ['title', 'firstname', 'middlename', 'lastname'])), i['education']) for i in data if 'education' in i]
Output:
[('Mr Tom Frank', 'B.Sc'), ('Doctor Anne David Frank', 'Ph.D')]

The 'firstname' entries for your data appear to have a trailing blank. You can trim such leading and trailing white space using the strip method of the string returned by record.get(). This would make your list comprehension line be:
full_name = [' '.join([record.get(key,'').strip() for key in keys])]
to be tolerant of the extra whitespace.
FWIW, I think you would probably be better off having full_name not be a list but a plain string.

The code seems to work with the addition of one line, like so:
temp=[' '.join(record.get(key,'') for key in keys)]
full_name=[' '.join(full_name.split() ) for full_name in temp ]
The rest of the lines didn't need any change.
This could be verbose but it is working. What is the most pythonic way of achieving the same result?
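One possibly more idiomatic variant, offered only as a sketch: filter on 'education' inside a single list comprehension and normalize whitespace with split() followed by join(). The constant name NAME_KEYS is an assumption made for illustration; the function name and data are taken from the question.

```python
NAME_KEYS = ('title', 'firstname', 'middlename', 'lastname')  # same key order as in the question

def qualified_applicants(data):
    """Return (full_name, education) tuples for records that have 'education'."""
    return [
        # str.split() with no argument drops leading, trailing and repeated
        # whitespace, so re-joining leaves exactly one space between words.
        (' '.join(' '.join(record.get(key, '') for key in NAME_KEYS).split()),
         record['education'])
        for record in data
        if 'education' in record
    ]

data = [
    {'firstname': 'Tom ', 'lastname': 'Frank', 'title': 'Mr', 'education': 'B.Sc'},
    {'firstname': 'Anne ', 'middlename': 'David', 'lastname': 'Frank',
     'title': 'Doctor', 'education': 'Ph.D'},
    {'firstname': 'Ben ', 'lastname': 'William', 'title': 'Mr'},
]
result = qualified_applicants(data)
print(result)
```

Returning plain tuples directly also avoids the intermediate one-element list used in the original attempt.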

Related

How to convert list of lists to dict key value pairs python

I have a list of lists like so:
splitted = [['OID:XXXXXXXXXXX1',
' street:THE ROAD',
'town:NEVERPOOL',
'postcode:M1 2DD',
'Name:SOMEHWERE',
'street:THE ROAD',
'town:NEVERLAND',
'postcode:M1 2DD'],
['OID:XXXXXXXXXXX2',
' Name:30',
'street:DA PLACE',
'town:PERTH',
'postcode:PH1 2DD',
'Name:30',
'street:DA PLACE',
'town:PERTH',
'postcode:PH1 2DD']]
I'd like to convert these to key values pairs like so:
{'OID': 'XXXXXXXXXXX1', ' street': 'THE ROAD', 'town': 'NEVERPOOL', 'postcode': 'M1 2DD', 'Name': 'SOMEWHERE', 'street': 'THE ROAD', 'town': 'NEVERPOOL', 'postcode': 'M1 2DD'}, {'MPXN': 'XXXXXXXXXXX2', ' Name': '30', 'street': 'DA PLACE', 'town': 'PERTH', 'postcode': 'PH1 2DD', 'primaryName': '30', 'street1': 'DA PLACE', 'town': 'PERTH', 'postcode': 'PH1 2DD'}
I am unable to find a way online to convert a list of lists into key-value pairs as a dict to then be consumed by pandas. The purpose of this is to convert the dicts into a pandas DataFrame so I can then consume it and work with it in a tabular format
The code I have used thus far is here:
output = []
for list in splitted:
    for key_value in list:
        key, value = key_value.split(':', 1)
        if not output or key in output[0]:
            output.append({})
        output[-1][key] = value
The problem with the above code is that it does not maintain the list of lists and mixes up the OID field with other data items, I'd like a dict starting from each OID.
Any help would be greatly appreciated :)
The problem is that you append a new dict to output every time that the first element of output has the key you're looking for. This causes your code to fail, because after the first list in splitted has been processed, the first element of output looks like this:
{'OID': 'XXXXXXXXXXX1',
' street': 'THE ROAD',
'town': 'NEVERPOOL',
'postcode': 'M1 2DD',
'Name': 'SOMEHWERE',
'street': 'THE ROAD'}
and all key values you will see henceforth already exist in said element.
What you actually want to do is to add a new dict every time you encounter a new list in splitted.
output = []
for l in splitted:
    output.append(dict())
    for key_value in l:
        key, value = key_value.split(':', 1)
        output[-1][key] = value
And now you get what you expected:
[{'OID': 'XXXXXXXXXXX1',
' street': 'THE ROAD',
'town': 'NEVERLAND',
'postcode': 'M1 2DD',
'Name': 'SOMEHWERE',
'street': 'THE ROAD'},
{'OID': 'XXXXXXXXXXX2',
' Name': '30',
'street': 'DA PLACE',
'town': 'PERTH',
'postcode': 'PH1 2DD',
'Name': '30'}]
While I have your attention:
dicts preserve insertion order in Python 3.7+ (and were unordered before that), so "mixes up the OID field with other data items" isn't really a thing. You wanted to create a single dict with all those keys, but you ended up creating a bunch of dicts, each with one key (after the first one).
list is a built-in class in Python, so creating a variable called list shadows this class. You shouldn't do this, because later you might encounter errors if you want to use the list class.
Debugging is a crucial skill for a programmer to have. I encourage you to take a look at these links: "How to debug small programs" and "What is a debugger and how can it help me diagnose problems?". You can use a debugger to step through your code and observe how each statement affects the state of your program, and this helps you figure out where you're going wrong.
I believe it is simpler:
splitted = [['OID:XXXXXXXXXXX1',
' street:THE ROAD',
'town:NEVERPOOL',
'postcode:M1 2DD',
'Name:SOMEHWERE',
'street:THE ROAD',
'town:NEVERLAND',
'postcode:M1 2DD'],
['OID:XXXXXXXXXXX2',
' Name:30',
'street:DA PLACE',
'town:PERTH',
'postcode:PH1 2DD',
'Name:30',
'street:DA PLACE',
'town:PERTH',
'postcode:PH1 2DD']]
output = []
for i in range(len(splitted)):
    output.append(dict())
    for j in splitted[i]:
        k, v = j.split(':')
        output[i][k] = v
the output is:
[{'OID': 'XXXXXXXXXXX1',
' street': 'THE ROAD',
'town': 'NEVERLAND',
'postcode': 'M1 2DD',
'Name': 'SOMEHWERE',
'street': 'THE ROAD'},
{'OID': 'XXXXXXXXXXX2',
' Name': '30',
'street': 'DA PLACE',
'town': 'PERTH',
'postcode': 'PH1 2DD',
'Name': '30'}]
You could also try to use dictionary comprehension:
[{x.split(":")[0]:x.split(":")[1] for x in splitted[0]},
{x.split(":")[0]:x.split(":")[1] for x in splitted[1]}]
The output is:
[{'OID': 'XXXXXXXXXXX1', ' street': 'THE ROAD', 'town': 'NEVERLAND', 'postcode': 'M1 2DD', 'Name': 'SOMEHWERE', 'street': 'THE ROAD'}, {'OID': 'XXXXXXXXXXX2', ' Name': '30', 'street': 'DA PLACE', 'town': 'PERTH', 'postcode': 'PH1 2DD', 'Name': '30'}]
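The comprehension above hard-codes two sublists. A sketch that handles any number of them could look like the following; the helper name rows_to_dicts is made up for illustration, and stripping whitespace from keys is an assumption that changes behavior (' street' and 'street' become the same key).

```python
def rows_to_dicts(rows):
    """Build one dict per inner list of 'key:value' strings."""
    return [
        # partition(':') splits on the first colon only, so values that
        # contain a colon stay intact; strip() drops stray spaces.
        {key.strip(): value.strip()
         for key, _, value in (item.partition(':') for item in row)}
        for row in rows
    ]

splitted = [
    ['OID:XXXXXXXXXXX1', ' street:THE ROAD', 'town:NEVERPOOL', 'postcode:M1 2DD'],
    ['OID:XXXXXXXXXXX2', ' Name:30', 'street:DA PLACE', 'town:PERTH'],
]
records = rows_to_dicts(splitted)
print(records)
```

A list of dicts like this can be passed straight to pandas.DataFrame, which is what the question was ultimately after.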

How do i split this list into pairs or single elements. Is there an easier way of doing this?

In python I have a list of names, however some have a second name and some do not, how would I split the list into names with surnames and names without?
I don't really know how to explain it so please look at the code and see if you can understand (sorry if I have worded it really badly in the title)
See code below :D
names = ("Donald Trump James Barack Obama Sammy John Harry Potter")
# the names with surnames are the famous ones
# names without are regular names
list = names.split()
# I want to separate them into a list of separate names so I use split()
# but now surnames like "Trump" are counted as a name
print("Names are:",list)
This outputs
['Donald', 'Trump', 'James', 'Barack', 'Obama', 'Sammy', 'John', 'Harry', 'Potter']
I would like it to output something like ['Donald Trump', 'James', 'Barack Obama', 'Sammy', 'John', 'Harry Potter']
Any help would be appreciated
As said in the comments, you need a list of famous names.
# complete list of famous people
US_PRESIDENTS = (('Donald', 'Trump'), ('Barack', 'Obama'), ('Harry', 'Potter'))
def splitfamous(namestring):
    words = namestring.split()
    # create all tuples of 2 adjacent words and compare them to US_PRESIDENTS
    for i, name in enumerate(zip(words, words[1:])):
        if name in US_PRESIDENTS:
            words[i] = ' '.join(name)
            words[i+1] = None
    # remove empty fields and return the result
    return list(filter(None, words))
names = "Donald Trump James Barack Obama Sammy John Harry Potter"
print(splitfamous(names))
The resulting list:
['Donald Trump', 'James', 'Barack Obama', 'Sammy', 'John', 'Harry Potter']
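The same idea can be written as an explicit scan that consumes two words when they form a known pair and one word otherwise. This is only a sketch: the names KNOWN_FULL_NAMES and group_names are invented here, and the lookup set still has to be supplied by you.

```python
# Hypothetical lookup set of (first name, surname) pairs.
KNOWN_FULL_NAMES = {('Donald', 'Trump'), ('Barack', 'Obama'), ('Harry', 'Potter')}

def group_names(namestring, known=KNOWN_FULL_NAMES):
    """Join adjacent words that form a known full name; keep the rest as-is."""
    words = namestring.split()
    result = []
    i = 0
    while i < len(words):
        pair = tuple(words[i:i + 2])  # a 1-tuple at the last word, never in `known`
        if pair in known:
            result.append(' '.join(pair))
            i += 2  # both words consumed
        else:
            result.append(words[i])
            i += 1
    return result

grouped = group_names("Donald Trump James Barack Obama Sammy John Harry Potter")
print(grouped)
```

Using a set makes each membership test cheap, and the explicit index avoids modifying the list while iterating over it.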

How to split and remove a string in a list?

Here's my example code:
list1 = [{'name': 'foobar', 'parents': 'John Doe and Bartholomew Shoe'},
{'name': 'Wisteria Ravenclaw', 'parents': 'Douglas Lyphe and Jackson Pot'
}]
I need to split parent into a list and remove 'and' string. So the output should look like this:
list1 = [{'name': 'foobar', 'parents': ['John Doe', 'Bartholomew Shoe']},
{'name': 'Wisteria Ravenclaw', 'parents': ['Douglas Lyphe', 'Jackson Pot']
}]
Please help me figure this out.
for people in list1:
    people['parents'] = people['parents'].split('and')
I'm not sure how to move that ', ' string.
You should use people inside loop, not the iterator itself.
for people in list1:
    people['parents'] = people['parents'].split(' and ')
and then when you print list1, you get:
[{'name': 'foobar', 'parents': ['John Doe', 'Bartholomew Shoe']}, {'name': 'Wisteria Ravenclaw', 'parents': ['Douglas Lyphe', 'Jackson Pot']}]
Expanding on what others said: you may want to split on a regular expression so that you don't split on 'and' when a name happens to contain that substring, and so that the whitespace around 'and' is removed as well.
Like so:
import re
list1 = [
{'name': 'foobar', 'parents': 'John Doe and Bartholomew Shoe'},
{'name': 'Wisteria Ravenclaw', 'parents': 'Douglas Lyphe and Jackson Pot'}
]
for people in list1:
    people['parents'] = re.split(r'\s+and\s+', people['parents'])
print(list1)
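To see why the word-boundary behavior matters, here is a small comparison using made-up names that contain the substring "and". It is an illustration only, not part of the original data.

```python
import re

parents = 'Amanda Sandler and Roland Anderson'  # hypothetical names containing "and"

# Plain substring split cuts inside the names as well.
naive = parents.split('and')

# Requiring whitespace on both sides only matches the standalone word.
by_word = re.split(r'\s+and\s+', parents)

print(naive)
print(by_word)
```

The regular-expression version yields the two full names, while the plain split breaks the names into fragments.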

Python Variable Amount Of Input

I'm working on a program that determines whether a graph is strongly connected.
I am reading standard input on a sequence of lines.
The lines have two or three whitespace-delimited tokens, the name of the source and destination vertices, and an optional decimal edge weight.
Input might look like this:
'''
Houston Washington 1000
Vancouver Houston 300
Dallas Sacramento 800
Miami Ames 2000
SanFrancisco LosAngeles
ORD PVD 1000
'''
How can I read in this input and add it to my graph?
I believe I will be using a collection like this:
flights = collections.defaultdict(dict)
Thank you for any help!
With d as your data, you can split it on '\n', strip the trailing whitespace from each line, and find the last occurrence of ' ' with rfind. With that index you can slice the string to get the name and the number associated with it.
Here I've stored the data in a list of dictionaries. You can modify it according to your requirements!
Use re.sub from the regular expression module re to remove the extra spaces.
>>> import re
>>> d
'\nHouston Washington 1000\nVancouver Houston 300\nDallas Sacramento 800\nMiami Ames 2000\nSanFrancisco LosAngeles\nORD PVD 1000\n'
>>> [{'Name':re.sub(r' +',' ',each[:each.strip().rfind(' ')]).strip(),'Flight Number':each[each.strip().rfind(' '):].strip()} for each in filter(None,d.split('\n'))]
[{'Flight Number': '1000', 'Name': 'Houston Washington'}, {'Flight Number': '300', 'Name': 'Vancouver Houston'}, {'Flight Number': '800', 'Name': 'Dallas Sacramento'}, {'Flight Number': '2000', 'Name': 'Miami Ames'}, {'Flight Number': 'LosAngeles', 'Name': 'SanFrancisco'}, {'Flight Number': '1000', 'Name': 'ORD PVD'}]
Edit:
To match your flights dict,
>>> import collections
>>> flights = collections.defaultdict(dict)
>>> for each in filter(None, d.split('\n')):
...     src, dst, *weight = each.split()
...     flights[src][dst] = weight[0] if weight else ''  # the weight is optional
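As a fuller sketch of reading such lines into the flights structure from the question: the function name parse_edges, the use of None for a missing weight, and skipping malformed lines are all assumptions made for illustration.

```python
import collections

def parse_edges(lines):
    """Build a nested dict {source: {destination: weight}} from text lines."""
    flights = collections.defaultdict(dict)
    for line in lines:
        tokens = line.split()
        if len(tokens) not in (2, 3):
            continue  # skip blank or malformed lines (assumed policy)
        source, destination = tokens[0], tokens[1]
        # The weight is optional; None marks an edge without one (an assumption).
        weight = float(tokens[2]) if len(tokens) == 3 else None
        flights[source][destination] = weight
    return flights

sample = """
Houston Washington 1000
Vancouver Houston 300
SanFrancisco LosAngeles
"""
flights = parse_edges(sample.splitlines())
print(dict(flights))
```

To read from standard input instead of a string, the same function can be called as parse_edges(sys.stdin) after importing sys, since a file object yields its lines when iterated.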

turn key="value" string into a dict

I have a string with the following format:
author="PersonsName" date="1183050420" format="1.1" version="1.2"
I want to turn it in to a Python dict, a la:
{'author': 'PersonsName', 'date': '1183050420', 'format': '1.1', 'version': '1.2'}
I have tried to do so using re.split on the string as so:
attribs = (re.split('(=?" ?)', twikiattribs))
thinking I would get a list back like:
['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2']
that then I could turn into a dict, but instead I'm getting:
['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']
So, before I follow the re.split line further, is there generally a better way to achieve what I'm trying to do, and/or if the solution involves re.split, how can I write a regex that will split on any of the strings =", "_ (where "_" is a space char) or just " to just yield a list with the keys in the odd indices, and values in the even?
Use re.findall():
dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))
re.findall(), when presented with a pattern with multiple capturing groups, returns a list of tuples, each nested tuple containing the captured groups. dict() happily takes that output and interprets each nested tuple as a key-value pair.
Demo:
>>> import re
>>> twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> re.findall(r'(\w+)="([^"]+)"', twikiattribs)
[('author', 'PersonsName'), ('date', '1183050420'), ('format', '1.1'), ('version', '1.2')]
>>> dict(re.findall(r'(\w+)="([^"]+)"', twikiattribs))
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
re.split() also behaves differently based on capturing groups; the text on which you split is included in the output if grouped. Compare the output with and without the capturing group:
>>> re.split('(=?" ?)', twikiattribs)
['author', '="', 'PersonsName', '" ', 'date', '="', '1183050420', '" ', 'format', '="', '1.1', '" ', 'version', '="', '1.2', '"', '']
>>> re.split('=?" ?', twikiattribs)
['author', 'PersonsName', 'date', '1183050420', 'format', '1.1', 'version', '1.2', '']
The re.findall() output is far easier to convert to a dictionary however.
you can also do it without re in one line:
>>> data = '''author="PersonsName" date="1183050420" format="1.1" version="1.2"'''
>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split(" ")]}
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
if whitespaces are allowed inside the values you can use this line:
>>> {k:v.strip('"') for k,v in [i.split("=",1) for i in data.split('" ')]}
The way I'd personally parse it:
import shlex
s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
dict(x.split('=') for x in shlex.split(s))
Out[12]:
{'author': 'PersonsName',
'date': '1183050420',
'format': '1.1',
'version': '1.2'}
A non-regex list comprehension one liner:
>>> s = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> print(dict([tuple(x.split('=')) for x in s.split()]))
{'date': '"1183050420"', 'format': '"1.1"', 'version': '"1.2"', 'author': '"PersonsName"'}
The problem is that you included parentheses in your regex, which turns it into a capturing group, so the matched separators are included in the split result. Assign attribs like this
attribs = (re.split('=?" ?', twikiattribs))
and it will work as expected. This does return a blank string (due to the final " in your input string), so you'll want to use attribs[:-1] when creating the dictionary.
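Completing that approach, a short sketch of turning the split result into a dictionary by pairing even-indexed items (keys) with odd-indexed items (values). The variable names parts and attribs are chosen here for illustration.

```python
import re

twikiattribs = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'

# Split without a capturing group, then drop the trailing empty string.
parts = re.split('=?" ?', twikiattribs)[:-1]

# Even positions hold keys, odd positions hold values.
attribs = dict(zip(parts[0::2], parts[1::2]))
print(attribs)
```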
Try
>>> str = 'author="PersonsName" date="1183050420" format="1.1" version="1.2"'
>>> eval ('dict(' + str.replace(" ",",") + ')')
{'date': '1183050420', 'format': '1.1', 'version': '1.2', 'author': 'PersonsName'}
assuming as earlier the values have no space in them.
Beware of using eval() though. Bad things may happen for funny input. Don't use it on user input.
This might help people for whom re.findall() doesn't work.
# grabbing input: a string holding a dict, list, etc.
input1 = '[[1, 2], [3, 4]]'
# creating a phantom variable
Phantom = 'variable_name = ' + input1
# executing the phantom at module level (exec() itself returns None)
exec(Phantom)
# storing the phantom variable in a live one
output = variable_name
# printing the stored phantom variable
print(output)
What it essentially does is add a variable name to your input and create that variable.
For example, if your input is "[[1, 2], [3, 4]]", this executes as variable_name = [[1, 2], [3, 4]],
which turns the string back into the object it represents.
It does trigger an undefined-name warning in linters, since the variable doesn't exist until the code runs. As with eval(), don't use it on untrusted input.
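If the input string only ever contains Python literals (lists, dicts, numbers, strings and so on), the standard library's ast.literal_eval is a commonly used alternative that does not execute arbitrary code. A minimal sketch, with an example input chosen here for illustration:

```python
import ast

text = "[[1, 2], [3, 4]]"  # example input: a string holding a list literal

# literal_eval parses literals only; anything else raises an exception.
value = ast.literal_eval(text)
print(value)
```

Unlike exec() or eval(), this rejects input such as function calls instead of running them.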
