Compare dictionary with a variable

Compare dictionary with a variable - python

I am trying to compare a dictionary value with a variable, but for some reason I can't output the part that I want from the dictionary.
The dictionary is the output from an HTML table.
This is the code that I use to parse the HTML table into a dictionary:
import csv
with open('output.csv') as fd:
    rd = csv.DictReader(fd, skipinitialspace=True)
    for row in rd:
        # keep only the two columns of interest from each row
        lista = {k: row[k] for k in row if k in ['Name', 'Clan Days']}
This is the output:
{'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
{'Name': 'Darius', 'Clan Days': '127'}
How do I compare, for example, the Clan Days value from the first dictionary with a value that I set in a variable and, if they match, get the name as a string so I can use it later in another line?

Assuming you first read the data into a list of dictionaries:
data = [{k: row[k] for k in row if k in ['Name', 'Clan Days']}
        for row in rd]
You may use next() to search for the first dictionary in data matching the Clan Days value, defaulting to None if no entries match your search query:
desired_clan_days = '140'
clan_name = next((entry["Name"] for entry in data
                  if entry["Clan Days"] == desired_clan_days), None)
next() returns only the first match. If you need all of the matches, use a list comprehension instead:
clan_names = [entry["Name"] for entry in data
              if entry["Clan Days"] == desired_clan_days]
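As a minimal, self-contained sketch of the two lookups above: the CSV text below is made up to mirror the question's output (the third row and the Rank column are invented), and io.StringIO stands in for the real output.csv file.

```python
import csv
import io

# Hypothetical sample standing in for output.csv. The third row is invented
# so that two rows share the same Clan Days value.
SAMPLE_CSV = """Name, Clan Days, Rank
SirFulgeruL2k19, 140, Leader
Darius, 127, Member
Example, 140, Member
"""

wanted = ('Name', 'Clan Days')
rd = csv.DictReader(io.StringIO(SAMPLE_CSV), skipinitialspace=True)
data = [{k: row[k] for k in wanted} for row in rd]

# CSV values are read as strings, so compare against a string.
desired_clan_days = '140'

# First match, or None when nothing matches.
first_name = next((entry['Name'] for entry in data
                   if entry['Clan Days'] == desired_clan_days), None)

# Every match.
all_names = [entry['Name'] for entry in data
             if entry['Clan Days'] == desired_clan_days]

print(first_name)
print(all_names)
```

Building data inside the with block (or from an in-memory string, as here) matters, because the reader can no longer be iterated once the file is closed.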
Note that this kind of search requires you to, in the worst case (entry not found), loop through all the entries in data. If this kind of search is the primary use case of this data structure, consider re-designing it to better fit the problem - e.g. having clan_days value as a key with a list of clan names:
data = {
    "140": ["SirFulgeruL2k19"],
    "127": ["Darius"]
}
In that state, getting a match would be a constant-time operation on average and as easy as data[desired_clan_days]. collections.defaultdict(list) would help you make that transformation.
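One possible way to do that transformation with defaultdict(list) is sketched below. The rows list reuses the two dictionaries from the question, and names_by_days is a name chosen here for illustration.

```python
from collections import defaultdict

# Rows in the shape produced by the list comprehension above.
rows = [
    {'Name': 'SirFulgeruL2k19', 'Clan Days': '140'},
    {'Name': 'Darius', 'Clan Days': '127'},
]

# Group names under their Clan Days value; missing keys start as empty lists.
names_by_days = defaultdict(list)
for entry in rows:
    names_by_days[entry['Clan Days']].append(entry['Name'])

# .get() looks a key up without inserting an empty list for unknown values.
matches = names_by_days.get('140', [])
missing = names_by_days.get('999', [])

print(matches)
print(missing)
```

Using .get() for lookups is a deliberate choice here: indexing a defaultdict with an unknown key would silently add that key.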

Not really sure what exactly you want, but if it's just comparing a dictionary value to a variable and getting the Name part if they match, you would get something like this:
>>> entry = {'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
>>> target = 140
>>> if int(entry['Clan Days']) == target:
...     name = entry['Name']
...
>>> name
'SirFulgeruL2k19'
Edit: I read your post too quickly. Considering it's all the rows from an HTML table, this code is too simple. Use alecxe's answer :)

Related

How do i retrieve the variable name for each dict in a list of dicts?

I may be missing something fundamental here but consider the following:
import networkx as nx
graph = nx.read_graphml('path here...')
dDict = dict(nx.degree_centrality(graph))  # create dict
lDict = dict(nx.load_centrality(graph))
new_lists = [dDict, lDict]
for i in new_lists:
    print(...)  # how to get the variable name, i.e. dDict?
How do I iterate through the list of dicts so that printing gives me the name of the variable each dict is bound to, i.e. I want to be able to retrieve 'dDict' and 'lDict'? I do not want a quick hack such as
dDict['name'] = 'dDict'
Any ideas..?
EDIT: the reason I want to do this is so that I can append these centrality measures to a dataframe under a new column name, i.e.:
for idx in range(len(new_lists)):
    for i in range(len(df)):
        rowIndex = df.index[i]
        # instead of idx, how do I dynamically name the column according to the list item name?
        df.loc[rowIndex, idx] = new_lists[idx][rowIndex]

You can iterate over globals() and get the variable name of the object that matches the content you are looking for.
More info on https://docs.python.org/3/library/functions.html?highlight=globals#globals
However, this is a rather cumbersome trick, you should not do that! Rather, redesign your software so you don't have to look for the variable names in the first place.
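def get_var_name_of(x):
    # compare by identity: with ==, a different dict with equal contents would also match
    return [k for k, v in globals().items() if v is x][0]
dDict = {1: 'asd'}
lDict = {2: 'qwe'}
new_list = [dDict, lDict]
for d in new_list:
    print(get_var_name_of(d))
This prints:
dDict
lDict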
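As a sketch of the redesign suggested above, the names can live in a dictionary as keys instead of being recovered from variable names. The centrality values below are made up and stand in for the results of nx.degree_centrality(graph) and nx.load_centrality(graph); the key names 'degree' and 'load' are chosen here for illustration.

```python
# Made-up centrality values standing in for the networkx results.
measures = {
    'degree': {'a': 0.5, 'b': 1.0},
    'load': {'a': 0.0, 'b': 0.25},
}

# Build one row per node, with one column per named measure.
table = {}
for name, values in measures.items():
    for node, score in values.items():
        table.setdefault(node, {})[name] = score

# The name of each measure is simply its key.
for name in measures:
    print(name)
print(table)
```

With pandas, passing a dictionary of dictionaries like measures to pd.DataFrame should give one column per outer key, which covers the dataframe use case from the question without any variable-name lookup.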

Short way to add dictionary item to a set only when its string value isn't empty?

I currently have this piece of code:
name_set = set()
reader = [{'name':'value1'}, {'name':''}, {'name':'value2'}]
for row in reader:
    name = row.get('name', None)
    if name:
        name_set.add(name)
print(name_set)
In the real code the reader is a DictReader, but I use a list with dicts to represent this.
Note that the if name: check is there to catch:
An empty string present in the dictionary (thus "")
Not the case where the key does not exist in the dictionary
Although I think this code is easy to read, I'm wondering if there is a shorter way, as it takes 6 lines simply to extract values from dicts and save them in a set.

Your existing code is fine.
But since you asked for a "short" way, you could just use set comprehensions/arithmetic:
>>> reader = [{'name':'value1'}, {'name':''}, {'name':'value2'}]
>>> {d['name'] for d in reader} - {''}
{'value1', 'value2'}
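Note that d['name'] raises a KeyError if a row lacks the key. As a hedged variant for that case, the sketch below uses .get() with a filter. The third row, which has no 'name' key, is invented to show the difference.

```python
# The {'other': 'x'} row is made up to represent a row without a 'name' key.
reader = [{'name': 'value1'}, {'name': ''}, {'other': 'x'}, {'name': 'value2'}]

# .get() returns None for the missing key, and the truthiness test
# drops both None and the empty string.
name_set = {d.get('name') for d in reader if d.get('name')}

print(name_set)
```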

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
    Accession_13mers[group[0]] = []
    # Series.items() replaces iteritems(), which was removed in pandas 2.0
    for item in group[1].items():
        Accession_13mers[group[0]].append(item[1])
However, now I would like to go back and iterate through the keys for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer), which finds the 13mer in a reference sequence and returns its position. I would then like to store the position as the value, with the 13mer as the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin

I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in H1_Access_13mers:
    position_nmers[key] = {}  # replicate each key in the new dictionary, with a dictionary as its value
    for value in H1_Access_13mers[key]:
        position_nmers[key][value] = ...  # do something, e.g. call find_match_position here
To introspect the dictionary and make sure it's okay:
print(position_nmers)
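To make the nested-dictionary idea concrete, here is a small sketch. The find_match_position below is a hypothetical stand-in for the asker's function (it uses str.find, which returns -1 when the substring is absent), and the sequences are short made-up strings rather than real 13mers.

```python
# Hypothetical stand-in for the asker's find_match_position().
def find_match_position(reference_sequence, nmer):
    return reference_sequence.find(nmer)

# Made-up data: short strings instead of real 13mers.
reference_sequence = 'AAAIGYQLGESSTVL'
H1_Access_13mers = {
    'JO2176': ['IGY', 'QLG', 'ESS'],
    'CYO21709': ['IGY', 'TVL'],
}

# Outer key: accession code. Inner dictionary: nmer -> position.
position_nmers = {
    accession: {nmer: find_match_position(reference_sequence, nmer)
                for nmer in nmers}
    for accession, nmers in H1_Access_13mers.items()
}

print(position_nmers)
```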

You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
    d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
    d2[val] = pos

Combining two lists and sorting through reference to dictionary Python

I have (what seems to me is) a pretty convoluted problem. I'm going to try to be as succinct as possible - though in order to understand the issue fully, you might have to click on my profile and look at the (only other) two questions I've posted on StackOverflow. In short: I have two lists -- one is comprised of email strings that contain a facility name, and a date of incident. The other is comprised of the facility ids for each email (I use one of the following regex functions to get this list). I've used Regex to be able to search each string for these pieces of information. The 3 Regex functions are:
import re
def find_facility_name(incident):
    pattern = re.compile(r'Subject:.*?for\s(.+?)\n')
    findPat1 = re.search(pattern, incident)
    facility_name = findPat1.group(1)
    return facility_name
def find_date_of_incident(incident):
    pattern = re.compile(r'Date of Incident:\s(.+?)\n')
    findPat2 = re.search(pattern, incident)
    incident_date = findPat2.group(1)
    return incident_date
def find_facility_id(incident):
    # raw string so that \d is not treated as a string escape
    pattern = re.compile(r'(\d{3})\n')
    findPat3 = re.search(pattern, incident)
    f_id = findPat3.group(1)
    return f_id
I also have a dictionary that is formatted like this:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
I'm trying to COMBINE the two lists and sort by the key values in the dictionary, followed by the Date of Incident. Since the key values are attached to the facility name, this should automatically cause emails from the same facilities to be grouped together. In order to do that, I've tried to use these functions:
def get_facility_ids(incident_list):
    '''(lst) -> lst
    Return a new list from incident_list that inserts the facility IDs from the
    get_facilities dictionary into each incident.
    '''
    f_id = []
    for incident in incident_list:
        find_facility_name(incident)
        for k in d:
            if find_facility_name(incident) == d[k]:
                f_id.append(k)
    return f_id
id_list = get_facility_ids(incident_list)
def combine_lists(L1, L2):
    combo_list = []
    for i in range(len(L1)):
        combo_list.append(L1[i] + L2[i])
    return combo_list
combination = combine_lists(id_list, incident_list)
def get_sort_key(incident):
    '''(str) -> tup
    Return a tuple from incident containing the facility id as the first
    value and the date of the incident as the second value.
    '''
    return (find_facility_id(incident), find_date_of_incident(incident))
final_list = sorted(combination, key=get_sort_key)
Here is an example of what my input might be and the desired output:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
input: first_list = ['email_1', 'email_2', etc.]
first output: next_list = ['facility_id_for_1+email_1', 'facility_id_for_2 + email_2', etc.]
DESIRED OUTPUT: FINAL_LIST = sorted(next_list, key=facility_id, date of incident)
The only problem is, the key values are not matching properly with what's found in each individual email string. Some DO, others are completely random. I have no idea why this is happening, but I have a feeling it has something to do with the way I'm combining the two lists. Can anyone help this lowly n00b? Thanks!!!

First off, I would suggest reversing your ID-to-name dictionary. Looking up a value by key is very fast but finding a key by value is very slow.
rd = { name: id_num for id_num, name in d.items() }
Then your first function can be replaced by a list comprehension:
id_list = [rd[find_facility_name(incident)] for incident in incident_list]
This might also expose why you're getting messed up values in your results. If an incident has a facility name that's not in your dictionary, this code will raise a KeyError (whereas your old function would just skip it).
Your combine function is very similar to Python's built in zip function. I'd replace it with:
combination = [id+incident for id, incident in zip(id_list, incident_list)]
However, since you're building the first list from the second one, it might make sense to build the combined version directly, rather than making separate lists and then combining them in a separate step. Here's an update to the list comprehension above that goes right to the combination result:
combination = [rd[find_facility_name(incident)] + incident
               for incident in incident_list]
To do the sort, you can use the ID string that we just prepended to the email message, rather than parsing to find the ID again:
combination.sort(key=lambda x: (x[0:3], find_date_of_incident(x)))
The 3 in the slice is based off of your example of "001" and "002" as the id values. If the actual ids are longer or shorter you'll need to adjust that.
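Putting the steps above together, a minimal end-to-end sketch might look like this. The email strings are invented to fit the question's regular expressions, and the dates are written in ISO format (an assumption) so that sorting them as strings is also chronological.

```python
import re

def find_facility_name(incident):
    return re.search(r'Subject:.*?for\s(.+?)\n', incident).group(1)

def find_date_of_incident(incident):
    return re.search(r'Date of Incident:\s(.+?)\n', incident).group(1)

d = {'001': 'Facility #1', '002': 'Another Facility'}
rd = {name: id_num for id_num, name in d.items()}  # reversed: name -> id

# Made-up emails shaped to match the two patterns above.
incident_list = [
    'Subject: Report for Another Facility\nDate of Incident: 2014-03-02\n',
    'Subject: Report for Facility #1\nDate of Incident: 2014-03-05\n',
    'Subject: Report for Facility #1\nDate of Incident: 2014-01-20\n',
]

# Prepend the 3-character id to each email, then sort by (id, date).
combination = [rd[find_facility_name(incident)] + incident
               for incident in incident_list]
combination.sort(key=lambda x: (x[0:3], find_date_of_incident(x)))

summary = [(x[0:3], find_date_of_incident(x)) for x in combination]
print(summary)
```

If the real dates use another format, parsing them (for example with datetime.strptime) inside the sort key would be needed for a chronological order.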

So, here is what I think is going on. Please correct me if possible.
The 'incident_list' is a list of email strings. You go in and find the facility names in each email because you have an external dictionary that has the (key:value)=(facility id: facility name). From the dictionary, you can extract the facility id in this 'id_list'.
You combine the lists so that you get a list of strings [facility id + email,...]
Then you want it to sort by a tuple( facility id, date of incidence ).
It looks like you are searching for the facility id and the facility name twice. You can skip a step if they are the same. Then, the best way is to do it all at once with tuples:
incident_list = ['email1', 'email2',...]
unsorted_list = []
for email in incident_list:
    facility_id = find_facility_id(email)
    date = find_date_of_incident(email)
    # keep the sort fields next to the combined string
    mytuple = (facility_id, date, facility_id + email)
    unsorted_list.append(mytuple)
final_list = sorted(unsorted_list, key=lambda mytup: (mytup[0], mytup[1]))
Then you get a simple list of tuples sorted by the first element (id as a string) and then by the second element (date as a string). If you just need a list of strings (id + email), take the last element of each tuple:
FINALLIST = [ tup[-1] for tup in final_list ]

Mapping data from excel with Python

I am reading data from an xls spreadsheet with xlrd. First, I gather the index for the column that contains the data that I need (may not always be in the same column in every instance):
amr_list, pssr_list, inservice_list = [], [], []
for i in range(sh.ncols):
    for j in range(sh.nrows):
        if 'amrprojectnumber' in sh.cell_value(j,i).lower():
            amr_list.append(sh.cell_value(j,i))
        if 'pssrnumber' in sh.cell_value(j,i).lower():
            pssr_list.append(sh.cell_value(j,i))
        if 'inservicedate' in sh.cell_value(j,i).lower():
            inservice_list.append(sh.cell_value(j,i))
Now I have three lists, which I need to use for writing data to a new workbook. The values in a row are related. So the index of an item in one list corresponds to the same index of the items in the other lists.
The amr_list has repeating string values. For example:
['4006BA','4006BA','4007AC','4007AC','4007AC']
The pssr_list always shares the same value as the amr_list but with additional info:
['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']
Finally, the inservice_list may or may not contain a variable date (as read from excel):
[40780.0, '', 40749.0, 40764.0, '']
This is the result I want from the data:
amr = { '4006BA':[('4006BA(1)',40780.0),('4006BA(2)','')], '4007AC':[('4007AC(1)',40749.0),('4007AC(2)',40764.0),('4007AC(3)','')] }
But I am having a hard time figuring out an easy way to get there. Thanks in advance.

Maybe this can help:
A = ['4006BA','4006BA','4007AC','4007AC','4007AC']
B = ['4006BA(1)','4006BA(2)','4007AC(1)','4007AC(2)','4007AC(3)']
C = [40780.0, '', 40749.0, 40764.0, '']
result = dict()
for item in range(len(A)):
    key = A[item]
    result.setdefault(key, [])  # start an empty list the first time a key is seen
    result[key].append((B[item], C[item]))
print(result)
This will print you the data in the format you are looking for.

Look into itertools.groupby and
zip(amr_list, pssr_list, inservice_list)
For your case:
dict((x, list(a[1:] for a in y)) for x, y in
     itertools.groupby(zip(amr_list, pssr_list, inservice_list), lambda z: z[0]))
Note that this assumes your input is sorted by amr_list.
Another approach would be:
combined = {}
for k, v in zip(amr_list, zip(pssr_list, inservice_list)):
    combined.setdefault(k, []).append(v)
Which does not require your input to be sorted.
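The difference between the two approaches can be checked with the sample lists from the question. In this sketch, shuffled is a reordering made up here to show what happens to groupby when equal keys are not adjacent.

```python
import itertools

amr_list = ['4006BA', '4006BA', '4007AC', '4007AC', '4007AC']
pssr_list = ['4006BA(1)', '4006BA(2)', '4007AC(1)', '4007AC(2)', '4007AC(3)']
inservice_list = [40780.0, '', 40749.0, 40764.0, '']

rows = list(zip(amr_list, pssr_list, inservice_list))

def group_rows(triples):
    # groupby only merges runs of adjacent rows that share a key.
    return {key: [row[1:] for row in group]
            for key, group in itertools.groupby(triples, key=lambda row: row[0])}

grouped = group_rows(rows)

# setdefault accumulates per key regardless of input order.
combined = {}
for key, value in zip(amr_list, zip(pssr_list, inservice_list)):
    combined.setdefault(key, []).append(value)

# Made-up reordering in which equal keys are no longer adjacent:
# each later run overwrites the earlier one for the same key.
shuffled = [rows[2], rows[0], rows[3], rows[1], rows[4]]
broken = group_rows(shuffled)

print(grouped == combined)
print(broken)
```

On unsorted input, sorting by the key first or using the setdefault loop avoids the silent loss of rows seen in broken.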
