I'm working on determining the maximum value (third value in the tuple) shared between the first two values presented in the tuple.
I created a defaultdict that utilizes the sorted concatenated values of the first two values of the tuple as the dic key and assign the dic value as the third value of the tuple.
How can I impose a condition so that when I come across the same pairing I replace the dic value with the larger value? I only want to read through my list once to be efficient.
users = [
('2','1',0.7),
('1','2', 0.5),
('3','2', 0.99),
('1','3', 0.78),
('2','1', 0.5),
('2','3', 0.99),
('3','1', 0.78),
('3','2', 0.96)]
#The above list is much longer ~10mill+, thus the need to only read through it once.
#Current code
from collections import defaultdict
user_pairings = defaultdict()
for us1, us2, maxval in users:
    user_pairings[''.join(sorted(us1+us2))] = maxval ##-> How to impose the condition here?
print(user_pairings)
EDIT
Just realized a major flaw in my approach. If the values used for keys are not single digits, my output will be incorrect because sorted is applied to the individual characters of the concatenated string.
You can use the dictionary get method to check if a key already exists in the dictionary, returning 0 if it doesn't, and then assign the max of that value and the current value to the key:
user_pairings = {}
for us1, us2, maxval in users:
    key = '-'.join(sorted([us1, us2]))
    user_pairings[key] = max(maxval, user_pairings.get(key, 0))
print(user_pairings)
Output for your sample data:
{'1-3': 0.78, '2-3': 0.99, '1-2': 0.7}
Note I don't see much point in converting us1 and us2 into a string so that sorted can then split it back out into a list. May as well just use a list [us1, us2] to begin with.
By using a list and joining with a character (I've used - but any would do), we can avoid the issue that can arise when the us1 and us2 values have multiple digits (e.g. if us1, us2 = 1, 23 and us1, us2 = 12, 3).
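Another way to sidestep the multi-digit problem is to skip string joining entirely and use a sorted tuple as the key. The following is only a minimal sketch, assuming the ids are mutually comparable (they are all strings here); max_per_pair is a hypothetical helper name, not something from the original code:

```python
def max_per_pair(records):
    """Return {(id_a, id_b): largest value}, treating (a, b) and (b, a) as one pair."""
    best = {}
    for a, b, value in records:
        key = tuple(sorted((a, b)))  # order-independent key, no separator needed
        # Explicit membership test, so no default value has to be assumed
        if key not in best or value > best[key]:
            best[key] = value
    return best


users = [
    ('2', '1', 0.7),
    ('1', '2', 0.5),
    ('3', '2', 0.99),
    ('1', '3', 0.78),
    ('2', '1', 0.5),
    ('2', '3', 0.99),
    ('3', '1', 0.78),
    ('3', '2', 0.96),
]
result = max_per_pair(users)
print(result)
```

Unlike get(key, 0), the membership test does not rely on the values being non-negative, and the list is still read only once.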
One way of doing it would be to replace:
user_pairings[''.join(sorted(us1+us2))] = maxval
With:
key = ''.join(sorted(us1 + us2))
user_pairings[key] = max(maxval, user_pairings[key] if key in user_pairings else 0)
Related
I have a dictionary in which 3 keys are assigned to each value: dictTest[c,pH,T] = value. I would like to retrieve all values corresponding to a given, single key: dictTest[c,*,*] = value(s)
I looked online but could not find any solutions in Python, only C#. I've tried using dictTest[c,*,*] but get a syntax error. Another option I can see is using multi-level keys, i.e. have the first level as c, second as pH and so on, i.e. dictTest[c][pH][T] = value (from http://python.omics.wiki/data-structures/dictionary/multiple-keys)
Here is some test code:
dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
The following gives a syntax error:
print(dictTest[1,*,*])
Whilst trying to specify only one key gives a key error:
print(dictTest[1])
I've also tried the above mentioned multi-level keys, but it raises a syntax error when I try and define the dictionary:
dictTest[1][100][10]=10
In the above example, I would like to specify only the first key (i.e. key1=1) and return both values of the dictionary, as the first key value of both is 1.
Thanks,
Mustafa.
dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
dictTest[2,102,11]=12
print([dictTest[i] for i in dictTest.keys() if i[0]==1])
print([dictTest[i] for i in dictTest if i[0]==1]) #more pythonic way
#creating a function instead of printing directly
def get_first(my_dict,val):
    return [my_dict[i] for i in my_dict if i[0]==val]
print(get_first(dictTest,1))
The key of your dictionary is a tuple of 3 values. It's not a "multi-key" dict that you can search efficiently based on one of the elements of the tuple.
You could perform a linear search based on the first key OR you could create another dictionary with the first key only, which would be much more efficient if the access is repeated.
Since the key repeats, you need a list as value. For instance, let the value be a tuple containing the rest of the key and the current value. Like this:
dictTest={}
dictTest[1,100,10]=10
dictTest[1,101,11]=11
dictTest[2,101,11]=30
import collections
newdict = collections.defaultdict(list)
for (newkey,v2,v3),value in dictTest.items():
    newdict[newkey].append(((v2,v3),value))
Now newdict[1] is [((101, 11), 11), ((100, 10), 10)]: the list of all values matching this key, each paired with the rest of the original key so no data is lost (the order of the entries can differ between Python versions),
and the whole dict:
>>> dict(newdict)
{1: [((101, 11), 11), ((100, 10), 10)], 2: [((101, 11), 30)]}
To create a multi-level nested dictionary, you can use recursively created defaultdicts:
from collections import defaultdict
def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)
dictTest = recursive_defaultdict()
dictTest[1][100][10] = 10
dictTest[1][101][11] = 11
print(dictTest[1][100])
Output:
defaultdict(<function recursive_defaultdict at 0x1061fe848>, {10: 10})
Another option to implement is:
from collections import defaultdict
dictTest = defaultdict(lambda: defaultdict(dict))
dictTest[1][100][10] = 10
dictTest[1][101][11] = 11
print(dict(dictTest[1]))
The output is:
{100: {10: 10}, 101: {11: 11}}
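With the nested layout, the original request (all values for a given first key, i.e. dictTest[c,*,*]) becomes a walk over the two inner levels. A rough sketch, assuming exactly three levels of nesting as in the example; values_under is a hypothetical helper name:

```python
from collections import defaultdict

dict_test = defaultdict(lambda: defaultdict(dict))
dict_test[1][100][10] = 10
dict_test[1][101][11] = 11
dict_test[2][102][11] = 12


def values_under(nested, first_key):
    """Collect every leaf value stored below first_key (three-level nesting assumed)."""
    # .get does not trigger the default factory, so a missing key is not created
    second_level = nested.get(first_key, {})
    return [value for inner in second_level.values() for value in inner.values()]


print(values_under(dict_test, 1))
```

Using .get rather than indexing matters here: indexing a defaultdict with a missing key would silently insert an empty entry.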
I was wondering if there's a way for me to pull a value at a specific index. Let's say I have a key with multiple values associated with it. But in my dictionary I have multiple keys, each key with multiple values. I want to iterate through the keys and then each respective value associated with that key. I want to be able to pull the value at the first index and subtract it from the value at the second index.
d= {108572791: [200356.77, 200358], 108577388: [19168.7, 19169]}
output for key 108572791 would be -1.23
output for key 108577388 would be -0.30
I've tried reading up on dict and how it works; apparently you can't really index it. I just wanted to know if there's a way to get around that.
for key, values in total_iteritems():
    for value in values:
        value[0]-value[1]:
Edit:
Since the question is way different now, I'll address the new subject:
d= {108572791: [200356.77, 200358], 108577388: [19168.7, 19169]}
for i in d:
    # print() already separates its arguments with a single space
    print("Output for key", i, "would be", d[i][1] - d[i][0])
Output:
Output for key 108572791 would be 1.2300000000104774
Output for key 108577388 would be 0.2999999999992724
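The trailing digits in that output come from binary floating-point representation, not from the dictionary lookup. If two decimal places are enough, the differences can be rounded; this is just a sketch that assumes every key holds exactly two numbers:

```python
d = {108572791: [200356.77, 200358], 108577388: [19168.7, 19169]}

# Second value minus first value, rounded to two decimal places
differences = {key: round(values[1] - values[0], 2) for key, values in d.items()}

for key, diff in differences.items():
    print(f"Output for key {key} would be {diff:.2f}")
```

Swap the operands (values[0] - values[1]) if the negative differences are what you are after.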
Original answer
Yes. When a dict has a list as its value and you want one specific item, you first look up the key and then index into the list. An example is:
a = {'Name':['John','Max','Robert']}
This means that:
print(a['Name'])
Output:
['John', 'Max', 'Robert']
Since a['Name'] is a list:
for i in range(len(a['Name'])):
    print(a['Name'][i])
Output:
John #(Because it's the index 0)
Max #(Index = 1)
Robert #(Index = 2)
If you want a specific value (for instance 'Max' which is index = 1)
print(a['Name'][1])
Output:
Max
It depends on how many values each key holds, but with two values per key this does the trick:
for x in d:
    print(x)
    print(d[x][0]-d[x][1])
You can use a list of tuples if you want to use indexing.
my_list = [(108572791, [200356.77, 200358]), (108577388, [19168.7, 19169])]
for pair in my_list:
    # pair[0] is the key, pair[1] is the list of values
    print(pair[0])
    for value in pair[1]:
        print(value)
I have been struggling on how to get the key(id) of the dictionary that has the minimum difference with a specific value.
for example,
I have a dictionary like,
dummy_w = {'Time': 1006120000,'T_id' : ''}
and the following,
dummy_R001 = {'Filename':"home/abc/de.csv",'Time':1006090000,'t_id':'x'}
dummy_R002 = {'Filename':"home/abc/df.csv",'Time':1006100000,'t_id':'y'}
dummy_R003 = {'Filename':"home/abc/d.csv",'Time':1026030000,'t_id':'z'}
dummy_R004 = {'Filename':"home/abc/ef.csv",'Time':1026080000,'t_id':'p'}
dummy_R005 = {'Filename':"home/abc/f.csv",'Time':1026120000,'t_id':'q'}
I want to assign the T_id for dummy_w based on the difference between its Time and the Time of each of the five dictionaries (dummy_R001 to dummy_R005).
I want to assign the one that has the minimum abs(time difference).
In this case the id assigned to dummy_w['T_id'] should be 'y'.
Any suggestions would be highly appreciated. thanks.
You could use a one-liner where you iterate over all the 'Time' values, calculate the abs() difference, and take the min() difference. Then assign it to dummy_w['T_id']:
min_diff = min((abs(dummy_w['Time']-d['Time']),d['t_id']) for d in [dummy_R001,dummy_R002,dummy_R003,dummy_R004,dummy_R005])
# (20000, 'y')
dummy_w['T_id'] = min_diff[1]
# {'T_id': 'y', 'Time': 1006120000}
Note, I'm not sure where dummy_R001 etc. came from, but consider that you might have been better off starting with a nested dictionary where these were the keys instead of variable names.
I would use the key keyword argument to the min function:
>>> result = min(list_of_all_dicts, key=lambda d: abs(d['Time'] - dummy_w['Time']))
>>> result['t_id']
'y'
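For completeness, here is a self-contained sketch of the key= approach. It assumes list_of_all_dicts is simply a list holding the five dictionaries from the question; that list is not defined anywhere in the original post.

```python
dummy_w = {'Time': 1006120000, 'T_id': ''}

# Assumed container for the five records from the question
list_of_all_dicts = [
    {'Filename': "home/abc/de.csv", 'Time': 1006090000, 't_id': 'x'},
    {'Filename': "home/abc/df.csv", 'Time': 1006100000, 't_id': 'y'},
    {'Filename': "home/abc/d.csv", 'Time': 1026030000, 't_id': 'z'},
    {'Filename': "home/abc/ef.csv", 'Time': 1026080000, 't_id': 'p'},
    {'Filename': "home/abc/f.csv", 'Time': 1026120000, 't_id': 'q'},
]

# min() returns the whole record whose Time is closest to dummy_w['Time']
closest = min(list_of_all_dicts, key=lambda d: abs(d['Time'] - dummy_w['Time']))
dummy_w['T_id'] = closest['t_id']
print(dummy_w)
```

Because min() returns the record itself rather than a (difference, id) tuple, the other fields such as Filename remain available afterwards.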
The current code I have is category1[name]=(number). However, if the same name comes up again, the value in the dictionary is replaced by the new number. How would I make it so that the original value is kept and the new value is also added, giving the key two values? Thanks.
You would have to make the dictionary point to lists instead of numbers, for example if you had two numbers for category cat1:
categories["cat1"] = [21, 78]
To make sure you add the new numbers to the list rather than replacing them, check whether the key is already in the dictionary before adding:
cat_val = 12  # Some value (12 is only an example)
if cat_key in categories:
    categories[cat_key].append(cat_val)
else:
    # Initialise it to a list containing one item
    categories[cat_key] = [cat_val]
To access the values, you simply use categories[cat_key] which would return [12] if there was one key with the value 12, and [12, 95] if there were two values for that key.
Note that if you don't want to store duplicate values you can use a set rather than a list:
cat_val = 12  # Some value (12 is only an example)
if cat_key in categories:
    categories[cat_key].add(cat_val)
else:
    # Initialise it to a set containing one item;
    # set(cat_val) would fail for a number, so use a set literal
    categories[cat_key] = {cat_val}
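The membership check can also be left to collections.defaultdict, which creates the empty container the first time a key is used. A small sketch with made-up data (the readings list is purely illustrative):

```python
from collections import defaultdict

readings = [("cat1", 21), ("cat2", 5), ("cat1", 78), ("cat1", 21)]  # hypothetical input

categories = defaultdict(list)      # keeps every value, duplicates included
unique_categories = defaultdict(set)  # keeps each distinct value once

for cat_key, cat_val in readings:
    categories[cat_key].append(cat_val)
    unique_categories[cat_key].add(cat_val)

print(dict(categories))
print(dict(unique_categories))
```

For a plain dict, categories.setdefault(cat_key, []).append(cat_val) has the same effect in one line.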
A key only has one value, so you would need to make that value a tuple, list, or similar container.
If you know you are going to have multiple values for a key, then I suggest you make the values capable of handling this when they are created.
It's a little hard to understand your question.
I think you want this:
>>> d[key] = [4]
>>> d[key].append(5)
>>> d[key]
[4, 5]
Depending on what you expect, you could check if name - a key in your dictionary - already exists. If so, you might be able to change its current value to a list, containing both the previous and the new value.
I didn't test this, but maybe you want something like this:
mydict = {'key_1' : 'value_1', 'key_2' : 'value_2'}
another_key = 'key_2'
another_value = 'value_3'
if another_key in mydict.keys():
    # another_key does already exist in mydict
    mydict[another_key] = [mydict[another_key], another_value]
else:
    # another_key doesn't exist in mydict
    mydict[another_key] = another_value
Be careful when doing this more than one time! If it could happen that you want to store more than two values, you might want to add another check - to see if mydict[another_key] already is a list. If so, use .append() to add the third, fourth, ... value to it.
Otherwise you would get a collection of nested lists.
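The extra check described above could look roughly like this. It is only a sketch: add_value is a hypothetical helper, and it assumes the stored values are never lists themselves, since a list is used as the marker for "already holds several values".

```python
def add_value(mapping, key, value):
    """Store value under key, promoting a single value to a list on the second insert."""
    if key not in mapping:
        mapping[key] = value                  # first value: stored as-is
    elif isinstance(mapping[key], list):
        mapping[key].append(value)            # third, fourth, ... value
    else:
        mapping[key] = [mapping[key], value]  # second value: promote to a list


mydict = {'key_1': 'value_1', 'key_2': 'value_2'}
add_value(mydict, 'key_2', 'value_3')
add_value(mydict, 'key_2', 'value_4')
add_value(mydict, 'key_3', 'value_5')
print(mydict)
```

Mixing bare values and lists means every reader of the dictionary has to handle both shapes, which is why storing a list from the start is usually simpler.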
You can create a dictionary that maps each key to a list of values, and then extend the list stored at a key whenever a new value arrives.
d = {}
d["name"] = [1]
x = d["name"]
d["name"] = x + [2]  # 2 stands in for the new value
I guess this is the easiest way:
category1 = {}
category1['firstKey'] = [7]
category1['firstKey'] += [9]
category1['firstKey']
should give you:
[7, 9]
So, just use lists of numbers instead of numbers.
I currently have a Python dictionary with keys assigned to multiple values (which have come from a CSV), in a format similar to:
{
'hours': ['4', '2.4', '5.8', '2.4', '7'],
'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
'salary': ['55000', '30000', '55000', '30000', '80000']
}
(The actual dictionary is significantly larger in both keys and values.)
I am looking to find the mode* for each set of values, with the stipulation that sets where all values occur only once do not need a mode. However, I'm not sure how to go about this (and I can't find any other examples similar to this). I am also concerned about the different (implied) data types for each set of values (e.g. 'hours' values are floats, 'name' values are strings, 'salary' values are integers), though I have a rudimentary conversion function included but not used yet.
import csv
f = 'blah.csv'
# Conducts type conversion
def conversion(value):
    try:
        value = float(value)
    except ValueError:
        pass
    return value
reader = csv.DictReader(open(f))
# Places csv into a dictionary
csv_dict = {}
for row in reader:
    # items() is the Python 3 spelling of iteritems()
    for column, value in row.items():
        csv_dict.setdefault(column, []).append(value.strip())
*I'm wanting to attempt other types of calculations as well, such as averages and quartiles- which is why I'm concerned about data types- but I'd mostly like assistance with modes for now.
EDIT: the input CSV file can change; I'm unsure if this has any effect on potential solutions.
Ignoring all the csv file stuff, which seems tangential to your question, let's say you have a list salary. You can use the Counter class from collections to count the unique list elements.
From that you have a number of different options about how to get from a Counter to your mode.
For example:
from collections import Counter
salary = ['55000', '30000', '55000', '30000', '80000']
counter = Counter(salary)
# This returns all unique list elements and their count, sorted by count, descending
mc = counter.most_common()
print(mc)
# This returns the unique list elements and their count, where their count equals
# the count of the most common list element.
gmc = [(k,c) for (k,c) in mc if c == mc[0][1]]
print(gmc)
# If you just want an arbitrary (list element, count) pair that has the most occurrences
amc = counter.most_common()[0]
print(amc)
For the salary list in the code, this outputs:
[('55000', 2), ('30000', 2), ('80000', 1)] # mc
[('55000', 2), ('30000', 2)] # gmc
('55000', 2) # amc
Of course, for your case you'd probably use Counter(csv_dict["salary"]) instead of Counter(salary).
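Applied to the whole dictionary, the same idea might look like the sketch below. It treats a column in which every value occurs only once as having no mode (an empty list), as the question asks; column_modes is a hypothetical helper, and the values are compared as the raw strings read from the CSV.

```python
from collections import Counter

csv_dict = {
    'hours': ['4', '2.4', '5.8', '2.4', '7'],
    'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
    'salary': ['55000', '30000', '55000', '30000', '80000'],
}


def column_modes(values):
    """Return all values tied for the highest count, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:  # every value occurs once: no mode
        return []
    return [item for item, count in counts.items() if count == highest]


modes = {column: column_modes(values) for column, values in csv_dict.items()}
print(modes)
```

Since the mode only needs equality, the string-versus-number question does not matter here; it would matter for averages and quartiles, where the values need converting first.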
I'm not sure I understand the question, but you could manually build a dictionary that maps each key to the kind of mode you want, or you could check each value with type() and, if it turns out to be a string, apply further tests such as the length of the item.