Nested dictionaries from file in Python - python

So I'm trying to create a nested dictionary but I can't seem to wrap my head around the logic.
So say I have input coming in from a csv:
1,2,3
2,3,4
1,4,5
Now I'd like to create a dictionary as follows:
d = {1: {2: 3, 4: 5}, 2: {3: 4}}
That is, the first column is an ID that becomes the outer key, and each sub-dictionary maps the second value to the third.
The way I tried it was to go:
d[row[0]] = {row[1]:row[2]}
But that overwrites the existing entry instead of adding to it. How would I go about this problem? I can't seem to wrap my mind around what keys to use.
Any guidance is appreciated.

Yes, because d[row[0]] = ... assigns a whole new dictionary to d[1], which replaces the previous value of d[1].
You should use:
d.setdefault(row[0], {})[row[1]] = row[2]
Remember that row[1] must not repeat within the same row[0], otherwise the later value overwrites the earlier one.
Or:
d.setdefault(row[0], {}).update({row[1]: row[2]})

# Test membership: indexing a missing key would raise KeyError.
if row[0] not in d:
    d[row[0]] = {row[1]: row[2]}
else:
    d[row[0]][row[1]] = row[2]

You can use defaultdict, which is similar to the built-in dictionary but creates a default value for any missing key. In your case the default value will be a dictionary:
from collections import defaultdict
res = defaultdict(dict)
This creates a defaultdict whose default value is an empty dict, so next we would do (where l is the list of rows):
for row in l:
    res[row[0]][row[1]] = row[2]
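For completeness, here is a minimal end-to-end sketch of the defaultdict approach applied to the CSV from the question. The in-memory csv_text and the int conversion are my assumptions (csv.reader yields strings, and the desired output shows integers); with a real file you would pass an open file object instead.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the CSV input from the question (assumption: no header row).
csv_text = "1,2,3\n2,3,4\n1,4,5\n"

nested = defaultdict(dict)
for row in csv.reader(io.StringIO(csv_text)):
    # csv.reader yields strings, so convert to int to match the desired output.
    outer_key, inner_key, value = (int(field) for field in row)
    nested[outer_key][inner_key] = value

# Convert to a plain dict once the defaultdict behaviour is no longer needed.
result = dict(nested)
print(result)  # {1: {2: 3, 4: 5}, 2: {3: 4}}
```

The same loop works with d.setdefault(outer_key, {})[inner_key] = value on a plain dict if you would rather avoid defaultdict.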

Related

Python Remove Duplicate Dict

I am trying to find a way to remove duplicates from a dict list. I don't have to test the entire object contents because the "name" value in a given object is enough to identify duplication (i.e., duplicate name = duplicate object). My current attempt is this;
newResultArray = []
for i in range(0, len(resultArray)):
    for j in range(0, len(resultArray)):
        if(i != j):
            keyI = resultArray[i]['name']
            keyJ = resultArray[j]['name']
            if(keyI != keyJ):
                newResultArray.append(resultArray[i])
This is wildly incorrect. Grateful for any suggestions. Thank you.
If name is unique, you should just use a dictionary to store your inner dictionaries, with name being the key. Then you won't even have the issue of duplicates, and you can remove an entry from the dictionary in O(1) time.
Since I don't have access to the code that populates resultArray, I'll simply show how you can convert it into a dictionary in linear time. Although the best option would be to use a dictionary instead of resultArray in the first place, if possible.
new_dictionary = {}
for item in resultArray:
    new_dictionary[item['name']] = item
If you must have a list in the end, then you can convert back into a list as such:
new_list = list(new_dictionary.values())
Since "name" provides uniqueness... and assuming "name" is a hashable object, you can build an intermediate dictionary keyed by "name". Any like-named dicts will simply overwrite their predecessor in the dict, giving you a list of unique dictionaries.
tmpDict = {result["name"]:result for result in resultArray}
newArray = list(tmpDict.values())
del tmpDict
You could shrink that down to
newArray = list({result["name"]:result for result in resultArray}.values())
which may be a bit obscure.
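One thing worth spelling out: the dictionary approaches above keep the last dict seen for each name. If you want the first one instead, setdefault does that. A small sketch with made-up data (the sample dicts and variable names are mine, not from the question):

```python
# Hypothetical sample data: two entries share the name "a".
result_array = [
    {"name": "a", "value": 1},
    {"name": "b", "value": 2},
    {"name": "a", "value": 3},
]

# Later entries overwrite earlier ones, so the LAST dict per name survives.
keep_last = list({item["name"]: item for item in result_array}.values())

# setdefault only stores a value when the key is new, so the FIRST dict survives.
first_by_name = {}
for item in result_array:
    first_by_name.setdefault(item["name"], item)
keep_first = list(first_by_name.values())

print(keep_last)   # [{'name': 'a', 'value': 3}, {'name': 'b', 'value': 2}]
print(keep_first)  # [{'name': 'a', 'value': 1}, {'name': 'b', 'value': 2}]
```

Both versions run in linear time and, on Python 3.7+, preserve the order in which each name first appeared.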

Appending to list in Python dictionary [duplicate]

Is there a more elegant way to write this code?
What I am doing: I have keys and dates. There can be a number of dates assigned to a key and so I am creating a dictionary of lists of dates to represent this. The following code works fine, but I was hoping for a more elegant and Pythonic method.
dates_dict = dict()
for key, date in cur:
    if key in dates_dict:
        dates_dict[key].append(date)
    else:
        dates_dict[key] = [date]
I was expecting the below to work, but I keep getting a NoneType has no attribute append error.
dates_dict = dict()
for key, date in cur:
    dates_dict[key] = dates_dict.get(key, []).append(date)
This probably has something to do with the fact that
print([].append(1))
None
but why?
list.append returns None because it modifies the list in place, and you are assigning that None back to dates_dict[key]. So the next time you call dates_dict.get(key, []).append for the same key, you are actually calling None.append. That is why it fails. Instead, you can simply do
dates_dict.setdefault(key, []).append(date)
But collections.defaultdict exists for exactly this purpose. You can do something like this:
from collections import defaultdict
dates_dict = defaultdict(list)
for key, date in cur:
    dates_dict[key].append(date)
This creates a new list object whenever the key is not found in the dictionary.
Note: Because the defaultdict creates a new list whenever a key is not found, this can have unintended side effects. For example, if you simply look up a key that is not there with dates_dict[key], it will create a new list, store it under that key, and return it.
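To make that side effect concrete, here is a short sketch (the keys and the date string are made up). Indexing a missing key inserts it, while .get() and the in operator do not:

```python
from collections import defaultdict

dates_dict = defaultdict(list)
dates_dict["a"].append("2014-01-01")

# Merely indexing a missing key calls the default factory and stores the result.
missing = dates_dict["b"]
keys_after_index = sorted(dates_dict)

# .get() and membership tests do not trigger the default factory.
safe = dates_dict.get("c")
has_c = "c" in dates_dict

print(missing)           # []
print(keys_after_index)  # ['a', 'b']
print(safe, has_c)       # None False
```

If the accidental keys matter, use .get() for read-only lookups, or convert with dict(dates_dict) once the dictionary has been built.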
Is there a more elegant way to write this code?
Use collections.defaultdict:
from collections import defaultdict
dates_dict = defaultdict(list)
for key, date in cur:
    dates_dict[key].append(date)
dates_dict[key] = dates_dict.get(key, []).append(date) sets dates_dict[key] to None as list.append returns None.
In [5]: l = [1,2,3]
In [6]: var = l.append(3)
In [7]: print(var)
None
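Here is a small reproduction of the failure and two ways around it that stay with a plain dict. The sample pairs are made up and stand in for the cursor from the question:

```python
# Hypothetical stand-in for the (key, date) pairs coming from the cursor.
pairs = [("k1", "2014-01-01"), ("k1", "2014-01-02"), ("k2", "2014-02-01")]

# The failing pattern: append returns None, so None is what gets stored.
broken = {}
broken["k1"] = broken.get("k1", []).append("2014-01-01")

# Fix 1: setdefault stores the list itself, then append mutates it in place.
with_setdefault = {}
for key, date in pairs:
    with_setdefault.setdefault(key, []).append(date)

# Fix 2: build a new list with + instead of relying on append's return value.
with_concat = {}
for key, date in pairs:
    with_concat[key] = with_concat.get(key, []) + [date]

print(broken)           # {'k1': None}
print(with_setdefault)  # {'k1': ['2014-01-01', '2014-01-02'], 'k2': ['2014-02-01']}
```

The concatenation variant copies the list on every iteration, so setdefault or defaultdict is usually the better choice for long inputs.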
You should use collections.defaultdict
import collections
dates_dict = collections.defaultdict(list)

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
    Accession_13mers[group[0]] = []
    # Series.items() replaces iteritems(), which newer pandas versions removed
    for item in group[1].items():
        Accession_13mers[group[0]].append(item[1])
However, now I would like to go back and iterate through the 13mers for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer), which finds the 13mer in a reference sequence and returns its position. I would then like to store that position as the value, with the 13mer as the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in H1_Access_13mers:
    position_nmers[key] = {}  # one inner dictionary per accession code
    for value in H1_Access_13mers[key]:
        # map each 13mer to the position returned by your function
        position_nmers[key][value] = find_match_position(reference_sequence, value)
To inspect the dictionary and make sure it's okay:
print(position_nmers)
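A self-contained sketch of the same idea, written as a nested dict comprehension. Everything here is illustrative: find_match_position is a hypothetical stand-in built on str.find (your real function may behave differently), and the sequence, accession codes, and short 3-letter "mers" are made up to keep the example small.

```python
def find_match_position(reference_sequence, nmer):
    # Hypothetical stand-in: 0-based index of the first match, or -1 if absent.
    return reference_sequence.find(nmer)

# Made-up data for illustration only.
reference_sequence = "MKTIIALSYIFCLVFA"
accession_nmers = {"ACC1": ["KTI", "SYI"], "ACC2": ["FCL", "ZZZ"]}

# Outer key: accession code. Inner dict: nmer -> position in the reference.
position_nmers = {
    accession: {
        nmer: find_match_position(reference_sequence, nmer) for nmer in nmers
    }
    for accession, nmers in accession_nmers.items()
}

print(position_nmers)
# {'ACC1': {'KTI': 1, 'SYI': 7}, 'ACC2': {'FCL': 10, 'ZZZ': -1}}
```

Note that using the nmer as the inner key means a repeated nmer within one accession keeps only one position.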
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
    d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
    d2[val] = pos

Adding Multiple Values to a Single Key in Python Dictionary

Python dictionaries really have me today. I've been poring over Stack Overflow, trying to find a way to do a simple append of a new value to an existing key in a Python dictionary, and I'm failing at every attempt even though I'm using the same syntax I see on here.
This is what I am trying to do:
# cursor search of an xls file
definitionQuery_Dict = {}
for row in arcpy.SearchCursor(xls):
    # set some source paths from strings in the xls file
    dataSourcePath = str(row.getValue("workspace_path")) + "\\" + str(row.getValue("dataSource"))
    dataSource = row.getValue("dataSource")
    # add items to dictionary. The keys are the datasource table and the values will be definition (SQL) queries. First test is to see if a definition query exists in the row and if it does, we want to add the key, value pair to a dictionary.
    if row.getValue("Definition_Query") is not None:
        # if key already exists, then append a new value to the value list
        if row.getValue("dataSource") in definitionQuery_Dict:
            definitionQuery_Dict[row.getValue("dataSource")].append(row.getValue("Definition_Query"))
        else:
            # otherwise, add a new key, value pair
            definitionQuery_Dict[row.getValue("dataSource")] = row.getValue("Definition_Query")
I get an attribute error:
AttributeError: 'unicode' object has no attribute 'append'
But I believe I am doing the same as the answer provided here
I've tried various other methods with no luck, getting various other error messages. I know this is probably simple and maybe I couldn't find the right source on the web, but I'm stuck. Anyone care to help?
Thanks,
Mike
The issue is that you're originally setting the value to be a string (i.e. the result of row.getValue) but then trying to append to it if the key already exists. You need to set the original value to a list containing a single string. Change the last line to this:
definitionQuery_Dict[row.getValue("dataSource")] = [row.getValue("Definition_Query")]
(notice the brackets round the value).
ndpu has a good point with the use of defaultdict, but if you're using that, you should always do append, i.e. replace the whole if/else statement with the append you're currently doing in the if clause.
Your dictionary has keys and values. If you want to add to the values as you go, then each value has to be a type that can be extended/expanded, like a list or another dictionary. Currently each value in your dictionary is a string, where what you want instead is a list containing strings. If you use lists, you can do something like:
mydict = {}
records = [('a', 2), ('b', 3), ('a', 4)]
for key, data in records:
    # If this is a new key, create a list to store
    # the values
    if key not in mydict:
        mydict[key] = []
    mydict[key].append(data)
Output:
mydict
Out[4]: {'a': [2, 4], 'b': [3]}
Note that even though 'b' only has one value, that single value still has to be put in a list, so that it can be added to later on.
Use collections.defaultdict:
from collections import defaultdict
definitionQuery_Dict = defaultdict(list)
# ...
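Since the rest is elided above, here is a rough sketch of how the loop might look with defaultdict. I can't run arcpy here, so rows is a hypothetical list of dicts standing in for arcpy.SearchCursor(xls), and row["..."] stands in for row.getValue("..."); the table names and queries are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for the rows returned by arcpy.SearchCursor(xls).
rows = [
    {"dataSource": "roads", "Definition_Query": "TYPE = 'A'"},
    {"dataSource": "roads", "Definition_Query": "TYPE = 'B'"},
    {"dataSource": "rivers", "Definition_Query": None},
    {"dataSource": "parcels", "Definition_Query": "AREA > 100"},
]

definitionQuery_Dict = defaultdict(list)
for row in rows:
    query = row["Definition_Query"]
    # Skip rows without a definition query, as in the original code.
    if query is not None:
        # No if/else needed: a missing key starts out as an empty list.
        definitionQuery_Dict[row["dataSource"]].append(query)

print(dict(definitionQuery_Dict))
# {'roads': ["TYPE = 'A'", "TYPE = 'B'"], 'parcels': ['AREA > 100']}
```

Every value is a list, even for a data source with a single query, which is what makes the later appends safe.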

How to go from a values_list to a dictionary of lists

I have a django queryset that returns a list of values:
[(client pk, timestamp, value, task pk), (client pk, timestamp, value, task pk),....,].
I am trying to get it to return a dictionary of this format:
{client pk:[[timestamp, value],[timestamp, value],...,], client pk:[list of lists],...,}
The values_list may have multiple records for each client pk. I have been able to get dictionaries of lists for client or task pk using:
def dict_from_vl(vls_list):
    keys = [vls_list[x][3] for x in range(0, len(vls_list), 1)]
    values = [[vls_list[x][1], vls_list[x][2]] for x in range(0, len(vls_list), 1)]
    target_dict = dict(zip(keys, values))
    return target_dict
However, with this method, values for the same key overwrite previous values as it iterates through the values_list, rather than being appended to a list. That works great for getting the most recent record when the values list is sorted oldest to newest, but not for creating a list of lists as the dict value.
Instead of target_dict=dict(zip(keys,values)), do
target_dict = defaultdict(list)
for i, key in enumerate(keys):
    target_dict[key].append(values[i])
(defaultdict is available in the standard module collections.)
from collections import defaultdict
d = defaultdict(list)
for x in vls_list:
    # key on the client pk (x[0]) and keep only [timestamp, value]
    d[x[0]].append(list(x[1:3]))
Although I'm not sure if I got the question right.
I know in Python you're supposed to cram everything into a single line, but you could do it the old fashioned way...
def dict_from_vl(vls_list):
    target_dict = {}
    for v in vls_list:
        if v[0] not in target_dict:
            target_dict[v[0]] = []
        target_dict[v[0]].append([v[1], v[2]])
    return target_dict
For better speed, I suggest you don't create the keys and values lists separately but simply use only one loop:
from collections import defaultdict
tgt_dict = defaultdict(list)
for row in vls_list:
    tgt_dict[row[0]].append([row[1], row[2]])
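Here is a runnable sketch of that single-loop version with tuple unpacking, which makes the field positions explicit. The sample tuples are made up and follow the (client pk, timestamp, value, task pk) layout from the question:

```python
from collections import defaultdict

# Hypothetical values_list output: (client pk, timestamp, value, task pk).
vls_list = [
    (1, "2013-01-01", 10, 7),
    (2, "2013-01-01", 20, 8),
    (1, "2013-01-02", 11, 7),
]

by_client = defaultdict(list)
for client_pk, timestamp, value, _task_pk in vls_list:
    # Append rather than assign, so earlier records for a client are kept.
    by_client[client_pk].append([timestamp, value])

print(dict(by_client))
# {1: [['2013-01-01', 10], ['2013-01-02', 11]], 2: [['2013-01-01', 20]]}
```

To group by task pk instead, swap which unpacked name is used as the key.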
