Finding all keys of a multi-key dictionary based on one key - python

I have a dictionary in which 3 keys are assigned to each value: dictTest[c,pH,T] = value. I would like to retrieve all values corresponding to a given, single key: dictTest[c,*,*] = value(s)
I looked online but could not find any solutions in Python, only C#. I've tried using dictTest[c,*,*] but get a syntax error. Another option I can see is using multi-level keys, i.e. have the first level as c, second as pH and so on, i.e. dictTest[c][pH][T] = value (from http://python.omics.wiki/data-structures/dictionary/multiple-keys)
Here is some test code:
dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
The following gives a syntax error:
print(dictTest[1,*,*])
Whilst trying to specify only one key gives a key error:
print(dictTest[1])
I've also tried the above mentioned multi-level keys, but it raises a syntax error when I try and define the dictionary:
dictTest[1][100][10]=10
In the above example, I would like to specify only the first key (i.e. key1=1) and return both values of the dictionary, as the first key of both entries is 1.
Thanks,
Mustafa.

dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
dictTest[2,102,11]=12
print([dictTest[i] for i in dictTest.keys() if i[0]==1])
print([dictTest[i] for i in dictTest if i[0]==1]) #more pythonic way
#creating a function instead of printing directly
def get_first(my_dict,val):
    return [my_dict[i] for i in my_dict if i[0]==val]
print(get_first(dictTest,1))
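The same linear scan can be generalised to a wildcard in any position, which is closer to the dictTest[c,*,*] notation in the question. This is only a sketch: the helper name match_keys and the choice of None as the wildcard marker are assumptions, not something from the original answer.

```python
def match_keys(d, pattern):
    """Return the values whose tuple key matches pattern; None is a wildcard."""
    return [
        value
        for key, value in d.items()
        if len(key) == len(pattern)
        and all(p is None or p == k for p, k in zip(pattern, key))
    ]

dictTest = {(1, 100, 10): 10, (1, 101, 11): 11, (2, 102, 11): 12}

first_is_1 = match_keys(dictTest, (1, None, None))     # like dictTest[1,*,*]
third_is_11 = match_keys(dictTest, (None, None, 11))   # like dictTest[*,*,11]
print(first_is_1)
print(third_is_11)
```

This is still a full scan of the dictionary on every call, so it suits occasional lookups rather than repeated ones.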

The key of your dictionary is a tuple of 3 values. It's not a "multi-key" dict that you can search efficiently based on one of the elements of the tuple.
You could perform a linear search based on the first key OR you could create another dictionary with the first key only, which would be much more efficient if the access is repeated.
Since the key repeats, you need a list as value. For instance, let the value be a tuple containing the rest of the key and the current value. Like this:
dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
dictTest[2,101,11]=30
import collections
newdict = collections.defaultdict(list)
for (newkey,v2,v3),value in dictTest.items():
    newdict[newkey].append(((v2,v3),value))
Now newdict[1] is [((101, 11), 11), ((100, 10), 10)] (the list of all values matching this key, together with the rest of the original key, so no data is lost; on Python 3.7+ the entries appear in insertion order instead)
and the whole dict:
>>> dict(newdict)
{1: [((101, 11), 11), ((100, 10), 10)], 2: [((101, 11), 30)]}

To create a multi-level nested dictionary, you can use recursively created defaultdicts:
from collections import defaultdict
def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)
dictTest = recursive_defaultdict()
dictTest[1][100][10] = 10
dictTest[1][101][11] = 11
print(dictTest[1][100])
Output:
defaultdict(<function recursive_defaultdict at 0x1061fe848>, {10: 10})

Another option to implement is:
from collections import defaultdict
dictTest = defaultdict(lambda: defaultdict(dict))
dictTest[1][100][10] = 10
dictTest[1][101][11] = 11
print(dict(dictTest[1]))
The output is:
{100: {10: 10}, 101: {11: 11}}
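With the nested layout, getting every value under one first-level key means walking the sub-dictionaries. A minimal sketch, assuming the recursive defaultdict from above; the helper name leaf_values is made up for illustration:

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

def leaf_values(node):
    """Yield every non-dict value stored anywhere below node."""
    for child in node.values():
        if isinstance(child, dict):
            yield from leaf_values(child)
        else:
            yield child

dictTest = recursive_defaultdict()
dictTest[1][100][10] = 10
dictTest[1][101][11] = 11
dictTest[2][102][11] = 12

# All values whose first key is 1, regardless of the second and third keys
values_for_1 = list(leaf_values(dictTest[1]))
print(values_for_1)
```

Note that indexing a defaultdict with a missing key creates that key, so dictTest.get(1, {}) may be preferable when the key might not exist.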

Related

How to start from second key when iterating over dictionary using for loop in Python

I am computing returns from data in a dictionary. My keys are dates and for every key I have a dataframe with data to compute my returns. To compute the returns I need data today and yesterday (t and t-1), hence I want to initiate from the second observation (key).
Since I do not have much experience my initial thought was to execute like this:
dict_returns = {}
for t, value in dict_data.items()[1:]:
    returns = 'formula'
    dict_returns[t] = returns
Which gave me the error:
TypeError: 'dict_items' object is not subscriptable
Searching for an answer, the only discussion I could find was skipping the first item, e.g. like this:
from itertools import islice
for key, value in islice(largeSet.items(), 1, None):
Is there a simple approach to skip the first key?
Thank you
In Python 3, dict.items() returns a dict_items view object rather than a list. A view does not support indexing or slicing, which is why you get this error. Convert it to a list first, e.g. list(dict_data.items())[1:].
I can think of 2 options, both require an extra step:
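Option 1: Create a second dict without your key and loop over that
loop_dict = dict(dict_data)  # copy, so the original dict is left untouched
loop_dict.pop(<key_to_remove>)  # pop() returns the removed value, not a dict
Then loop over loop_dict as you have done above.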
Option 2: Create a list of keys from your dict and loop over that
keys = list(dict_data)  # dict_data.keys() is a view and cannot be sliced in Python 3
loop_keys = keys[1:]
for key in loop_keys:
    ...  # etc.
If you pass a reference to your dictionary to list() you will get a list of the dictionary's keys. This is because dictionaries are iterable. In your code you're not interested in the key's value so:
dict_data = {'a': 1, 'b': 2} # or whatever
dict_data[list(dict_data)[1]] = 3
print(dict_data)
Output:
{'a': 1, 'b': 3}
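Since the returns need both today's and yesterday's data, one way is to walk the items in consecutive pairs, which skips the first key as a side effect. This is a sketch under assumptions: plain floats stand in for the dataframes, the return formula is a simple percentage change, and the dict is assumed to be in date order (dicts keep insertion order on Python 3.7+).

```python
from itertools import islice

# Made-up sample data; in the question each value is a dataframe
dict_data = {"2020-01-01": 100.0, "2020-01-02": 110.0, "2020-01-03": 99.0}

dict_returns = {}
previous_items = dict_data.items()                  # starts at the first key
current_items = islice(dict_data.items(), 1, None)  # starts at the second key
for (_, prev_value), (t, value) in zip(previous_items, current_items):
    # zip stops when the shorter iterator (current_items) is exhausted
    dict_returns[t] = value / prev_value - 1

print(dict_returns)
```

The resulting dict has one entry fewer than dict_data, keyed by the later date of each pair.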

Long list/array into a dictionary with indices as key

I am trying to solve a coding exercise.
Part of it is creating a dictionary from a random list of integers.
The dictionary must have as key the index of the element in the original list and as value the element of the list.
This is my function:
import numpy as np
def my_funct(pricesLst):
    price_dict = {}
    for i in range(0, len(pricesLst)):
        price_dict[i] = pricesLst[i]
    print(price_dict)
a = np.random.randint(1,100,5)
my_funct(a)
The output I get is the right one:
{0: 42, 1: 23, 2: 38, 3: 27, 4: 61}
HOWEVER, if the list is longer, I get a weird result as output.
Example:
a = np.random.randint(1,1000000000,5000000)
my_funct(a)
The output is:
{2960342: 133712726, 2960343: 58347003, 2960344: 340350742, 949475: 944928187.........4999982: 417669027, 4999983: 650062265, 4999984: 656764316, 4999985: 32618345, 4999986: 213384749, 4999987: 383964739, 4999988: 229138815, 4999989: 203341047, 4999990: 54928779, 4999991: 139476448, 4999992: 244547714, 4999993: 790982769, 4999994: 298507070, 4999995: 715927973, 4999996: 365280953, 4999997: 543382916, 4999998: 532161768, 4999999: 598932697}
I am not sure why this occurs.
Why aren't the keys of my dictionary starting from 0, as they do for the shorter list?
The only thing I can think of is that the list is too long, so Python, instead of using the index starting from 0 as the key, uses something related to the location in memory.
Because dicts in python are not necessarily ordered. You should use an ordered dictionary which is declared as:
my_ordered_dict=OrderedDict()
Dictionaries preserve insertion order as of Python 3.7. If you are on an older Python version (<3.7), you will have to use an ordered dictionary.
You can use ordered dictionary as follows:
from collections import OrderedDict
import numpy as np
def my_funct(pricesLst):
    price_dict = OrderedDict()
    for i in range(0, len(pricesLst)):
        price_dict[i] = pricesLst[i]
    print(price_dict)
a = np.random.randint(1,10000,10000)
my_funct(a)
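For the index-to-element mapping itself, the loop can be replaced by dict(enumerate(...)). A minimal sketch with a plain list standing in for the NumPy array (the sample values are taken from the question's first output):

```python
prices = [42, 23, 38, 27, 61]

# enumerate yields (index, element) pairs, which dict() accepts directly
price_dict = dict(enumerate(prices))
print(price_dict)  # {0: 42, 1: 23, 2: 38, 3: 27, 4: 61}
```

On Python 3.7+ the keys come out in index order because dicts preserve insertion order.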

Maintain ordered list in Python with O(logN) lookup/insertion? [duplicate]

I am writing some code that requires me to fetch the lower bound of a key (for simplicity, ignore keys that lie below the smallest key in the collection).
In C++, using std::map (as the most comparable data type) I would simply use the lower_bound() to return the iterator.
My Pythonfoo is not that great, but I am guessing that (in case Python does not already have a way of doing this), this would be a good use of a lambda function ...
What is the Pythonic way of retrieving the lower bound key for a given index?
In case the question is too abstract, this is what I am actually trying to do:
I have a Python dict indexed by date. I want to be able to use a date to look up the dict, and return the value associated with the lowerbound of the specified key.
Snippet follows:
import datetime
mymap = { datetime.date(2007, 1, 5): 'foo',
          datetime.date(2007, 1, 10): 'foofoo',
          datetime.date(2007, 2, 2): 'foobar',
          datetime.date(2007, 2, 7): 'foobarbar' }
mydate = datetime.date(2007, 1, 7)
# fetch lbound key for mydate from mymap
def mymap_lbound_key(orig):
    pass # return the lbound for the key
I don't really want to loop through the keys, looking for the first key <= provided key, unless there is no better alternative ...
Python's dict class doesn't have this functionality; you'd need to write it yourself. It sure would be convenient if the keys were already sorted, wouldn't it, so you could do a binary search on them and avoid iterating over them all? In this vein, I'd have a look at the sorteddict class in the blist package. http://pypi.python.org/pypi/blist/
If your date objects support comparison (datetime.date objects do), look into the bisect module.
A minimal example with integer keys:
from bisect import bisect_left
data = {
    200 : -100,
    -50 : 0,
    51 : 100,
    250 : 200
}
keys = sorted(data)  # bisect requires a sorted sequence
print(data[keys[bisect_left(keys, -79)]])
When I want something that resembles a C++ map, I use SortedDict. You can use irange to get an iterator over keys greater than or equal to a given key, which I think is how std::lower_bound works.
code:
from sortedcontainers import SortedDict
sd = SortedDict()
sd[105] = 'a'
sd[102] = 'b'
sd[101] = 'c'
#SortedDict is sorted on insert, like std::map
print(sd)
# sd.irange(minimum=<key>) returns an iterator beginning with the first key not less than <key>
print("min = 100", list(sd.irange(minimum=100)))
print("min = 102", list(sd.irange(minimum=102)))
print("min = 103", list(sd.irange(minimum=103)))
print("min = 106", list(sd.irange(minimum=106)))
output:
SortedDict(None, 1000, {101: 'c', 102: 'b', 105: 'a'})
min = 100 [101, 102, 105]
min = 102 [102, 105]
min = 103 [105]
min = 106 []
Still not sure what the "lower bound" is: The latest date before/after the query date?
Anyway since a dict doesn't impose an inherent order on its keys, you need a different structure. Store your keys in some structure that keeps them sorted and allows fast searches.
The simplest solution would be to just store the dates sorted in a list of (date, value), and do a binary search to zoom in on the region you want. If you need/want better performance, I think a b-tree is what you need.
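The sorted-list-plus-binary-search idea can be sketched with the standard bisect module. This assumes the question's own wording, "the first key <= provided key"; the function name value_at_or_before is made up for illustration.

```python
import datetime
from bisect import bisect_right

mymap = {
    datetime.date(2007, 1, 5): 'foo',
    datetime.date(2007, 1, 10): 'foofoo',
    datetime.date(2007, 2, 2): 'foobar',
    datetime.date(2007, 2, 7): 'foobarbar',
}
sorted_keys = sorted(mymap)  # build once, reuse for every lookup

def value_at_or_before(query):
    """Return the value for the greatest key <= query, or None if there is none."""
    index = bisect_right(sorted_keys, query)
    if index == 0:
        return None  # query is earlier than every key
    return mymap[sorted_keys[index - 1]]

result = value_at_or_before(datetime.date(2007, 1, 7))
print(result)  # foo
```

For the C++ std::map::lower_bound meaning (first key not less than the query), bisect_left(sorted_keys, query) gives the index of that key instead. The sorted key list has to be rebuilt or updated whenever the dict changes.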

Pandas Dataframe to Dictionary with Multiple Keys

I am currently working with a dataframe consisting of a column of 13 letter strings ('13mer') paired with ID codes ('Accession') as such:
However, I would like to create a dictionary in which the Accession codes are the keys with values being the 13mers associated with the accession so that it looks as follows:
{'JO2176': ['IGY....', 'QLG...', 'ESS...', ...],
'CYO21709': ['IGY...', 'TVL...',.............],
...}
Which I've accomplished using this code:
Accession_13mers = {}
for group in grouped:
    Accession_13mers[group[0]] = []
    for item in group[1].iteritems():
        Accession_13mers[group[0]].append(item[1])
However, now I would like to go back and iterate through the 13mers for each Accession code and run a function I've defined as find_match_position(reference_sequence, 13mer), which finds the 13mer in a reference sequence and returns its position. I would then like to store the position as the value, with the 13mer as the key.
If anyone has any ideas for how I can expedite this process that would be extremely helpful.
Thanks,
Justin
I would suggest creating a new dictionary, whose values are another dictionary. Essentially a nested dictionary.
position_nmers = {}
for key in H1_Access_13mers:
    position_nmers[key] = {} # replicate key, val in new dictionary, as a dictionary
    for value in H1_Access_13mers[key]:
        position_nmers[key][value] = None # replace None with the computed position
To introspect the dictionary and make sure it's okay:
print(position_nmers)
You can iterate over the groupby more cleanly by unpacking:
d = {}
for key, s in df.groupby('Accession')['13mer']:
    d[key] = list(s)
This also makes it much clearer where you should put your function!
... However, I think that it might be better suited to an enumerate:
d2 = {}
for pos, val in enumerate(df['13mer']):
    d2[val] = pos
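Putting the nested-dictionary idea together with the position lookup might look like the sketch below. Everything concrete here is an assumption: find_match_position is not shown in the question, so str.find is used as a stand-in, and the reference sequence, accession codes and short fragments are made-up sample data rather than real 13mers.

```python
def find_match_position(reference_sequence, nmer):
    # Hypothetical stand-in for the asker's function: index of the first
    # occurrence of nmer in the reference, or -1 if it is absent.
    return reference_sequence.find(nmer)

reference_sequence = "MKTIIALSYIFCLVFA"  # made-up reference
accession_13mers = {"ACC1": ["KTI", "SYI"], "ACC2": ["LVF"]}  # made-up data

# Outer key: accession code; inner key: fragment; inner value: its position
position_nmers = {
    accession: {
        nmer: find_match_position(reference_sequence, nmer) for nmer in nmers
    }
    for accession, nmers in accession_13mers.items()
}
print(position_nmers)
```

If the same fragment occurs more than once under an accession, the inner dict keeps a single entry for it.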

Adding Multiple Values to a Single Key in Python Dictionary

Python dictionaries really have me stumped today. I've been poring over Stack Overflow, trying to find a way to do a simple append of a new value to an existing key in a Python dictionary, and I'm failing at every attempt while using the same syntax I see on here.
This is what i am trying to do:
# cursor search an xls file
definitionQuery_Dict = {}
for row in arcpy.SearchCursor(xls):
    # set some source paths from strings in the xls file
    dataSourcePath = str(row.getValue("workspace_path")) + "\\" + str(row.getValue("dataSource"))
    dataSource = row.getValue("dataSource")
    # add items to dictionary. The keys are the datasource table and the values will be definition (SQL) queries. First test is to see if a definition query exists in the row and if it does, we want to add the key,value pair to a dictionary.
    if row.getValue("Definition_Query") is not None:
        # if key already exists, then append a new value to the value list
        if row.getValue("dataSource") in definitionQuery_Dict:
            definitionQuery_Dict[row.getValue("dataSource")].append(row.getValue("Definition_Query"))
        else:
            # otherwise, add a new key, value pair
            definitionQuery_Dict[row.getValue("dataSource")] = row.getValue("Definition_Query")
I get an attribute error:
AttributeError: 'unicode' object has no attribute 'append'
But I believe I am doing the same as the answer provided here
I've tried various other methods with no luck with various other error messages. i know this is probably simple and maybe I couldn't find the right source on the web, but I'm stuck. Anyone care to help?
Thanks,
Mike
The issue is that you're originally setting the value to be a string (ie the result of row.getValue) but then trying to append it if it already exists. You need to set the original value to a list containing a single string. Change the last line to this:
definitionQuery_Dict[row.getValue("dataSource")] = [row.getValue("Definition_Query")]
(notice the brackets round the value).
ndpu has a good point with the use of defaultdict: but if you're using that, you should always do append - ie replace the whole if/else statement with the append you're currently doing in the if clause.
Your dictionary has keys and values. If you want to add to the values as you go, then each value has to be a type that can be extended/expanded, like a list or another dictionary. Currently each value in your dictionary is a string, where what you want instead is a list containing strings. If you use lists, you can do something like:
mydict = {}
records = [('a', 2), ('b', 3), ('a', 4)]
for key, data in records:
    # If this is a new key, create a list to store
    # the values
    if key not in mydict:
        mydict[key] = []
    mydict[key].append(data)
Output:
mydict
Out[4]: {'a': [2, 4], 'b': [3]}
Note that even though 'b' only has one value, that single value still has to be put in a list, so that it can be added to later on.
Use collections.defaultdict:
from collections import defaultdict
definitionQuery_Dict = defaultdict(list)
# ...
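A fuller sketch of the defaultdict approach, with a list of (data source, query) tuples standing in for the arcpy cursor rows; the sample rows and variable names are made up for illustration:

```python
from collections import defaultdict

# Stand-in for the cursor rows: (dataSource, Definition_Query) pairs
rows = [("roads", "TYPE = 1"), ("rivers", None), ("roads", "TYPE = 2")]

definition_queries = defaultdict(list)
for data_source, query in rows:
    if query is not None:
        # A missing key is created with an empty list, so append always works
        definition_queries[data_source].append(query)

print(dict(definition_queries))  # {'roads': ['TYPE = 1', 'TYPE = 2']}
```

With defaultdict(list) there is no if/else on key existence at all; every row goes through the same append.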
