Consider this string: "{'a': A, 'b': B, 'c': 10}". I want to update this "string" and add a new key d with, let's say, the value 20, so the result would be "{'a': A, 'b': B, 'c': 10, 'd': 20}".
Normally, you could just evaluate the string (with eval or literal_eval) into a dict, update it the way you want, and convert it back to a string. But in this case there are placeholders (A and B), which would not be recognized when evaluating.
What would be the best way to update it so that the old values are kept the same but the "dict-string" is updated properly?
For a more robust solution that properly parses the dict, you can subclass lib2to3.refactor.RefactoringTool to refactor the code using a fixer that is a subclass of lib2to3.fixer_base.BaseFix with a pattern that looks for a dictsetmaker node, and a transform method that extends the children list with leaf nodes that consist of the tokens that will make for a new key-value pair in the dict:
from lib2to3 import fixer_base, refactor, pytree
from lib2to3.pgen2 import token

class AddKeyValue(fixer_base.BaseFix):
    PATTERN = "dictsetmaker"

    def transform(self, node, results):
        node.children.extend((
            pytree.Leaf(token.COMMA, ','),
            pytree.Leaf(token.STRING, "'d'", prefix=' '),
            pytree.Leaf(token.COLON, ':'),
            pytree.Leaf(token.NUMBER, 20, prefix=' ')
        ))
        return node

class Refactor(refactor.RefactoringTool):
    def __init__(self, fixers):
        self._fixers= [cls(None, None) for cls in fixers]
        super().__init__(None)

    def get_fixers(self):
        return self._fixers, []

s = "{'a': A, 'b': B, 'c': 10}"
print(Refactor([AddKeyValue]).refactor_string(s + '\n', ''))
This outputs:
{'a': A, 'b': B, 'c': 10, 'd': 20}
lib2to3 is round-trip stable, so all whitespace is preserved after the transformation, and a new node should be given a prefix if whitespace is to be inserted before it.
You can find the definition of the Python grammar in Grammar.txt of the lib2to3 module.
Demo: https://repl.it/#blhsing/RudeLimegreenConcentrate
This is by no means the best solution, but here is one approach:
dict_str = "{'a': A, 'b': B, 'c': 10}"

def update_dict(dict_str, **keyvals):
    """Create an updated dict_str.

    Parameters:
        dict_str (str): current dict_str
        **keyvals: variable number of key-values
    Returns:
        str: updated string
    """
    # create a "'key': value" string for each key-value pair and join them with ', '
    new_entries = ", ".join(f"'{key}': {val}" for key, val in keyvals.items())
    # cut at the last '}' only (str.replace would also rewrite any nested '}')
    end = dict_str.rindex("}")
    return dict_str[:end] + ", " + new_entries + "}"
Output:
updated = update_dict(dict_str,
    d = 20,
    e = 30
)
print(updated)
{'a': A, 'b': B, 'c': 10, 'd': 20, 'e': 30}
some_dict = {
    'g': 2,
    'h': 3
}
updated = update_dict(dict_str,
    **some_dict
)
print(updated)
{'a': A, 'b': B, 'c': 10, 'g': 2, 'h': 3}
I think that you can:
Option 1 - Adding
Insert the new string ", key: value" at the end of the string, before the "}".
Option 2 - RegEx for adding/updating
1 - Use find() to search for the key. If it exists, use a regex to substitute the value:
re.sub(regex_search, regex_replace, contents)
So using something like:
string = re.sub(r"'key': [^,}]+", "'key': value", string)
2 - If find() fails (returns -1), fall back to the insertion from Option 1.
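As a rough illustration, the two options can be combined into one helper. This is only a sketch: the name upsert_entry is made up for this example, and it assumes string keys written with single quotes and simple values that contain no commas or braces.

```python
import re

def upsert_entry(dict_str, key, value_src):
    """Replace the value of `key` if present, else append the pair before the final '}'.

    Hypothetical helper; assumes flat "dict-strings" whose values hold no ',' or '}'.
    """
    # Group 1 captures "'key': " so only the value part gets replaced.
    pattern = re.compile("(" + re.escape(repr(key)) + r"\s*:\s*)[^,}]+")
    if pattern.search(dict_str):
        # A function replacement avoids backslash escapes in value_src being interpreted.
        return pattern.sub(lambda m: m.group(1) + value_src, dict_str, count=1)
    body = dict_str.rstrip()[:-1].rstrip()      # everything before the closing '}'
    sep = "" if body.endswith("{") else ", "    # no separator for an empty dict
    return f"{body}{sep}{key!r}: {value_src}}}"

s = "{'a': A, 'b': B, 'c': 10}"
print(upsert_entry(s, 'd', '20'))   # adds a new key
print(upsert_entry(s, 'c', '99'))   # updates an existing key
```

The value is passed as source text (for example '20' or 'B') so that placeholders survive untouched.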
If it's just about adding at the end of the string...
this_string = "{'a': A, 'b': B, 'c': 10}"
this_add = "'d': 20"
this_string = f"{this_string[:-1]}, {this_add}{this_string[-1]}"
print(this_string)
will output
{'a': A, 'b': B, 'c': 10, 'd': 20}
If you need to insert the new string somewhere in between, you can do something similar: use str.find to locate the index and slice at that index instead.
This basically rewrites the entire string, but strings are immutable, so there is no way around that.
I have multiple sets (the number is unknown) and I would like to find the commonality between them. If two sets match (at least 80% overlap), I would like to merge them and then rerun the new merged set against all the other sets from the beginning.
For example:
A : {1,2,3,4}
B : {5,6,7}
C : {1,2,3,4,5}
D : {2,3,4,5,6,7}
A runs first: there is no commonality between A and B, but A against C hits the commonality target, so we now have a new set AC = {1,2,3,4,5}. Comparing AC to B doesn't hit the threshold, but D does, so we have a new set ACD; running again, we now have a hit with B.
I'm currently using two loops, but this only solves the comparison between two sets.
To calculate the commonality, I'm using the following calculation:
overlap = a_set & b_set
universe = a_set | b_set
per_overlap = (len(overlap)/len(universe))
I think the solution should be a recursive function, but I'm not sure how to write it; I'm fairly new to Python. Maybe there is a different, simpler way to do this.
I believe this does what you are looking for. The complexity is awful because it starts over each time it gets a match. No recursion is needed.
def commonality(s1, s2):
    overlap = s1 & s2
    universe = s1 | s2
    return (len(overlap)/len(universe))

def set_merge(s, threshold=0.8):
    used_keys = set()
    out = s.copy()
    incomplete = True
    while incomplete:
        incomplete = False
        restart = False
        for k1, s1 in list(out.items()):
            if restart:
                incomplete = True
                break
            if k1 in used_keys:
                continue
            for k2, s2 in s.items():
                if k1==k2 or k2 in used_keys:
                    continue
                print(k1, k2)
                if commonality(s1, s2) >= threshold:
                    out.setdefault(k1+k2, s1 | s2)
                    out.pop(k1)
                    if k2 in out:
                        out.pop(k2)
                    used_keys.add(k1)
                    used_keys.add(k2)
                    restart = True
                    break
    out.update({k:v for k,v in s.items() if k not in used_keys})
    return out
For your particular example, it only merges A and C, as any other combination is below the threshold.
set_dict = {
'A' : {1,2,3,4},
'B' : {5,6,7},
'C' : {1,2,3,4,5},
'D' : {2,3,4,5,6,7},
}
set_merge(set_dict)
# returns:
{'B': {5, 6, 7},
'D': {2, 3, 4, 5, 6, 7},
'AC': {1, 2, 3, 4, 5}}
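For comparison, here is a smaller sketch of the same "merge, then start over" idea that works on a plain list of sets instead of a keyed dict. The names jaccard and merge_similar are invented for this example, and it uses the same overlap/universe ratio as above; it is not the answer's code, just one possible simplification.

```python
def jaccard(a, b):
    """Overlap divided by universe, as in the question (1.0 for two empty sets)."""
    universe = a | b
    return len(a & b) / len(universe) if universe else 1.0

def merge_similar(sets, threshold=0.8):
    """Merge any pair at or above the threshold, then restart from the beginning."""
    pending = [set(s) for s in sets]   # copy so the inputs are not modified
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(pending)):
            for j in range(i + 1, len(pending)):
                if jaccard(pending[i], pending[j]) >= threshold:
                    pending[i] |= pending[j]   # fold the second set into the first
                    del pending[j]
                    merged_any = True
                    break
            if merged_any:
                break                          # restart the scan with the merged set
    return pending

example = [{1, 2, 3, 4}, {5, 6, 7}, {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6, 7}]
print(merge_similar(example))        # only A and C merge at 0.8
print(merge_similar(example, 0.5))   # a lower threshold lets the merged set absorb D as well
```

Like the version above, this restarts after every merge, so the worst case is still roughly cubic in the number of sets.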
I need to swap two random values in a dictionary.
import random

def alphabetcreator():
    letters = random.sample(range(97,123), 26)
    newalpha = []
    engalpha =['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    alphasmerged = {}
    for i in letters:
        newalpha.append(chr(i))
    alphasmerged = dict(zip(engalpha, newalpha))
    return alphasmerged
This code gives me my two different alphabets, putting them into a dictionary so I can translate between one and the other. I now need to randomly swap two of the values whilst keeping all the rest the same. How can I do this?
You can first use random.sample to randomly pick two different values from a collection.
From the doc:
Return a k length list of unique elements chosen from the population sequence or set. Used for random sampling without replacement.
Use this function on the keys of your dictionary to get two distinct keys.
In Python 3, convert the keys to a list first: older Python 3 releases accepted a dict_keys object directly, but since Python 3.11 the population must be a sequence.
In Python 2, you can either convert d.keys() into a list, or directly pass the dictionary to the sample.
>>> import random
>>> d = {'a': 1, 'b': 2}
>>> k1, k2 = random.sample(list(d), 2) # Python 3
>>> k1, k2 = random.sample(d, 2) # Python 2
>>> k1, k2
('a', 'b')
Then you can swap the two values in place.
>>> d[k1], d[k2] = d[k2], d[k1]
>>> d
{'b': 1, 'a': 2}
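Wrapped into a function for the alphabet case, this could look roughly as follows. The function name and the optional rng parameter are assumptions made for this sketch; the technique is the same sample-then-swap shown above.

```python
import random
import string

def swap_two_random_values(mapping, rng=random):
    """Swap the values of two distinct, randomly chosen keys in place.

    Returns the two keys so the caller can see what changed.
    """
    k1, k2 = rng.sample(list(mapping), 2)   # list() keeps this valid on Python 3.11+
    mapping[k1], mapping[k2] = mapping[k2], mapping[k1]
    return k1, k2

# Build a shuffled alphabet mapping like the one in the question.
shuffled = random.sample(string.ascii_lowercase, 26)
alphabet = dict(zip(string.ascii_lowercase, shuffled))

swapped_keys = swap_two_random_values(alphabet)
print(swapped_keys)
```

All other entries stay exactly as they were, because only two assignments are made.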
To swap a randomly chosen key with its value instead:
import random
d = {12: 34, 67: 89}
k, v = random.choice(list(d.items()))
d[v] = k
d.pop(k)
which, when run, gave this (random) output for d:
{12: 34, 89: 67}
You can try this:
import random
engalpha =['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
new_dict = {a:b for a, b in zip(engalpha, map(chr, random.sample(range(97,123), 26)))}
key_val = random.choice(list(new_dict.keys()))
final_dict = {b if a == key_val else a:a if a == key_val else b for a, b in new_dict.items()}
Regarding your recent comment:
import random
s = {'a': 'h', 'b': 'd', 'c': 'y'}
random_dict = [(a, b) for a, b in random.sample(list(s.items()), 2)]
new_dict = {a:b for a, b in zip([i[0] for i in sorted(random_dict, key=lambda x:x[0])], [i[-1] for i in sorted(random_dict, key=lambda x:x[-1])][::-1])}
final_dict = {a:new_dict.get(a, b) for a, b in s.items()}
Output (randomly generated):
{'a': 'y', 'c': 'h', 'b': 'd'}
Is there any way to refer to the dict keys "a" and "b" in the initialization body, in one line?
Example:
def func(a,b):
    return {"a":longComputation1(), "b":longComputation2(), "sum_a_b":?????}
Please don't change the semantics of the code. This is just an example.
Use the function parameters:
>>> def func(a, b):
... return {"a": a, "b": b, "sum_a_b": a + b}
...
>>> func(1, 2)
{'a': 1, 'b': 2, 'sum_a_b': 3}
UPDATE Question changed after I posted the above code; Use jonrsharpe's solution.
Short answer: no.
This would have to be done over multiple lines:
def func():
    d = {"a": longComputation1(),
         "b": longComputation2()}
    d.update(sum_a_b = d['a'] + d['b'])
    return d
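On Python 3.8 or later, assignment expressions offer a way to keep this in a single dict display. The sketch below is an alternative to the multi-line version, not a way to refer to the dict's own keys: it binds the intermediate results to local names instead. The two long_computation functions are hypothetical stand-ins for longComputation1() and longComputation2().

```python
def long_computation_1():
    return 10   # hypothetical stand-in for longComputation1()

def long_computation_2():
    return 20   # hypothetical stand-in for longComputation2()

def func():
    # Key/value pairs in a dict display are evaluated left to right,
    # so a and b are already bound when the third value is computed.
    return {"a": (a := long_computation_1()),
            "b": (b := long_computation_2()),
            "sum_a_b": a + b}

print(func())   # {'a': 10, 'b': 20, 'sum_a_b': 30}
```

Each computation still runs only once, which is the point of not writing longComputation1() + longComputation2() a second time.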
Use a function to create the dict and define the names of the keys for the sum in a key named sum:
def sum_dict(**kwargs):
    result = {}
    total = 0
    sum_keys = kwargs["sum"]
    del kwargs["sum"]
    for key, value in kwargs.items():
        val = value()
        result[key] = val
        if key in sum_keys:
            total += val
    result["sum_" + "_".join(sum_keys)] = total
    return result
print(sum_dict(a=lambda: 3,b=lambda: 2,c=lambda: 14, sum=["a", "b"]))
# {'a': 3, 'c': 14, 'b': 2, 'sum_a_b': 5}
Accessing the keys of a dict that has not been created yet is not possible.
Another way would be to create your own dict class.
I wonder what the practical application of this is, but if you mean that the key is dynamically constructed at initialisation time from other keys already present in the dictionary:
d = {"a":longComputation1(), "b":longComputation2()}
d['_'.join(['sum'] + list(d))] = sum(d.values()) # assumes that keys are strings
>>> d
{'a': 10, 'b': 20, 'sum_a_b': 30} # assuming that longComputation1() == 10 and longComputation2() == 20
Sorry that it is not a single line (why the constraint??), but AFAIK you can't refer to the dict's keys during initialisation.
I have a dictionary where each key has a list of variable length, eg:
d = {
'a': [1, 3, 2],
'b': [6],
'c': [0, 0]
}
Is there a clean way to get a random dictionary key, weighted by the length of its value?
random.choice(d.keys()) will weight the keys equally, but in the case above I want 'a' to be returned roughly half the time.
This would work:
random.choice([k for k in d for x in d[k]])
Do you always know the total number of values in the dictionary? If so, this might be easy to do with the following algorithm, which can be used whenever you want to make a probabilistic selection of some items from an ordered list:
Iterate over your list of keys.
Generate a uniformly distributed random value between 0 and 1 (aka "roll the dice").
Assuming that this key has N_VALS values associated with it and there are TOTAL_VALS total values in the entire dictionary, accept this key with a probability N_VALS / N_REMAINING, where N_REMAINING is the number of items left in the list.
This algorithm has the advantage of not having to generate any new lists, which is important if your dictionary is large. Your program only pays for the loop over K keys to calculate the total, another loop over the keys which will on average end halfway through, and whatever it costs to generate a random number between 0 and 1. Generating such a random number is a very common operation in programming, so most languages have a fast implementation of such a function. In Python the random number generator is a C implementation of the Mersenne Twister algorithm, which should be very fast. Additionally, the documentation claims that this implementation is thread-safe.
Here's the code. I'm sure that you can clean it up if you'd like to use more Pythonic features:
#!/usr/bin/python
import random

def select_weighted( d ):
    # calculate total
    total = 0
    for key in d:
        total = total + len(d[key])
    accept_prob = float( 1.0 / total )
    # pick a weighted value from d
    n_seen = 0
    for key in d:
        current_key = key
        for val in d[key]:
            dice_roll = random.random()
            accept_prob = float( 1.0 / ( total - n_seen ) )
            n_seen = n_seen + 1
            if dice_roll <= accept_prob:
                return current_key

# named d rather than dict, to avoid shadowing the built-in
d = {
    'a': [1, 3, 2],
    'b': [6],
    'c': [0, 0]
}
counts = {}
for key in d:
    counts[key] = 0
for s in range(1,100000):
    k = select_weighted(d)
    counts[k] = counts[k] + 1
print(counts)
After running this 99,999 times (range(1,100000)), each key was selected this number of times:
{'a': 49801, 'c': 33548, 'b': 16650}
Those are fairly close to your expected values of:
{'a': 0.5, 'c': 0.33333333333333331, 'b': 0.16666666666666666}
Edit: Miles pointed out a serious error in my original implementation, which has since been corrected. Sorry about that!
Without constructing a new, possibly big list with repeated values:
def select_weighted(d):
    # assumes d maps each key to an integer weight (e.g. the length of its list)
    offset = random.randint(0, sum(d.values())-1)
    for k, v in d.items():
        if offset < v:
            return k
        offset -= v
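Applied directly to the question's dictionary, where each value is a list and its length is the weight, the same offset walk might be written like this in Python 3. The function name and the optional rng argument are assumptions for this sketch.

```python
import random

def select_weighted_by_length(d, rng=random):
    """Pick a key with probability proportional to len(d[key]), without building a list."""
    total = sum(len(values) for values in d.values())
    if total == 0:
        raise ValueError("all value lists are empty")
    offset = rng.randrange(total)          # uniform integer in [0, total)
    for key, values in d.items():
        if offset < len(values):
            return key
        offset -= len(values)              # skip past this key's share of the range

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}
print(select_weighted_by_length(d))
```

Keys with an empty list can never be returned, since their share of the range has width zero.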
Given that your dict fits in memory, the random.choice method should be reasonable. But assuming otherwise, the next technique is to use a list of increasing weights, and use bisect to find a randomly chosen weight.
>>> import random, bisect
>>> items, total = [], 0
>>> for key, value in d.items():
...     total += len(value)
...     items.append((total, key))
...
>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'a'
>>> items[bisect.bisect_left(items, (random.randint(1, total),))][1]
'c'
Make a list in which each key is repeated a number of times equal to the length of its value. In your example: ['a', 'a', 'a', 'b', 'c', 'c']. Then use random.choice().
Edit: or, less elegantly but more efficiently, try this: take the sum of the lengths of all values in the dictionary, S (you can cache and invalidate this value, or keep it up to date as you edit the dictionary, depending on the exact usage pattern you anticipate). Generate a random integer from 0 to S - 1, and do a linear search through the dictionary keys to find the range into which your random number falls.
I think that's the best you can do without changing or adding to your data representation.
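If the dictionary changes rarely and many draws are needed, the cached-total idea can be taken one step further by caching the running totals and replacing the linear search with a binary search. The class below is a sketch under that assumption; WeightedKeyPicker is a made-up name, and the cache must be rebuilt whenever the dictionary is edited.

```python
import bisect
import itertools
import random

class WeightedKeyPicker:
    """Caches cumulative list lengths so each draw costs O(log n)."""

    def __init__(self, d):
        self._keys = list(d)
        # Running totals of the list lengths, e.g. [3, 4, 6] for lengths 3, 1, 2.
        self._cumulative = list(itertools.accumulate(len(d[k]) for k in self._keys))

    def pick(self, rng=random):
        total = self._cumulative[-1]        # raises IndexError for an empty dict
        offset = rng.randrange(total)       # raises ValueError if every list is empty
        # bisect_right finds the first running total strictly greater than offset.
        return self._keys[bisect.bisect_right(self._cumulative, offset)]

picker = WeightedKeyPicker({'a': [1, 3, 2], 'b': [6], 'c': [0, 0]})
print(picker.pick())
```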
Here is some code based on a previous answer I gave for probability distributions in Python, but using the length to set the weight. It uses an iterative accept/reject loop (rejection sampling), so it does not need to know the total of all of the weights. Currently it calculates the max length, but if that is too slow just change
    self._maxw = 1
to
    self._maxw = <max length>
and remove
    for k in self._odata:
        if len(self._odata[k])> self._maxw:
            self._maxw=len(self._odata[k])
Here is the code.
import random

class RandomDict:
    """
    The weight is the length of each object in the dict.
    """
    def __init__(self,odict,n=0):
        self._odata = odict
        self._keys = list(odict.keys())
        self._maxw = 1 # to increase speed set me to max length
        self._len=len(odict)
        if n==0:
            self._n=self._len
        else:
            self._n=n
        # to increase speed set above max value and comment out next 3 lines
        for k in self._odata:
            if len(self._odata[k])> self._maxw:
                self._maxw=len(self._odata[k])

    def __iter__(self):
        return self.next()

    def next(self):
        while (self._len > 0) and (self._n>0):
            self._n -= 1
            for i in range(100):
                k=random.choice(self._keys)
                rx=random.uniform(0,self._maxw)
                if rx <= len(self._odata[k]): # test to see if that is the value we want
                    break
            # if you do not find one after 100 tries then just get a random one
            yield k

    def GetRdnKey(self):
        for i in range(100):
            k=random.choice(self._keys)
            rx=random.uniform(0,self._maxw)
            if rx <= len(self._odata[k]): # test to see if that is the value we want
                break
        # if you do not find one after 100 tries then just get a random one
        return k

#test code
d = {
    'a': [1, 3, 2],
    'b': [6],
    'c': [0, 0]
}
rd=RandomDict(d)
dc = {
    'a': 0,
    'b': 0,
    'c': 0
}
for i in range(100000):
    k=rd.GetRdnKey()
    dc[k]+=1
print("Key count=",dc)

#iterate over the objects
dc = {
    'a': 0,
    'b': 0,
    'c': 0
}
for k in RandomDict(d,100000):
    dc[k]+=1
print("Key count=",dc)
Test results
Key count= {'a': 50181, 'c': 33363, 'b': 16456}
Key count= {'a': 50080, 'c': 33411, 'b': 16509}
I'd say this:
random.choice("".join([k * len(d[k]) for k in d]))
This makes it clear that each k in d gets as many chances as the length of its value. Of course, it is relying on dictionary keys of length 1 that are characters....
Much later:
table = "".join([key * len(value) for key, value in d.iteritems()])
random.choice(table)
I modified some of the other answers to come up with this. It's a bit more configurable. It takes 2 arguments: a list and a function (for example a lambda) that returns the weight of each item.
def select_weighted(lst, weight):
    """ Usage: select_weighted([0,1,10], weight=lambda x: x) """
    thesum = sum([weight(x) for x in lst])
    if thesum == 0:
        return random.choice(lst)
    offset = random.randint(0, thesum - 1)
    for k in lst:
        v = weight(k)
        if offset < v:
            return k
        offset -= v
Thanks to sth for the base code for this.
import numpy as np
my_dict = {
"one": 5,
"two": 1,
"three": 25,
"four": 14
}
probs = []
elements = [my_dict[x] for x in my_dict.keys()]
total = sum(elements)
probs[:] = [x / total for x in elements]
r = np.random.choice(len(my_dict), p=probs)
print(list(my_dict.values())[r])
# 25
Need to mention random.choices for Python 3.6+:
import random
raffle_dict = {"Person 1": [1,2], "Person 2": [1]}
random.choices(list(raffle_dict.keys()), [len(w[1]) for w in raffle_dict.items()], k=1)[0]
random.choices returns a list of samples, so k=1 if you only need one and we'll take the first item in the list. If your dictionary already has the weights, just get rid of the len or better yet:
raffle_dict = {"Person 1": 1, "Person 2": 10}
random.choices(list(raffle_dict.keys()), raffle_dict.values(), k=1)[0]
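For the original question, where the weight of a key is the length of its list, a small wrapper around random.choices could look like this. The function name weighted_key and the rng parameter are assumptions for the sketch; random.choices itself needs Python 3.6 or later.

```python
import random
from collections import Counter

def weighted_key(d, rng=random):
    """Return one key, chosen with probability proportional to len(d[key])."""
    keys = list(d)
    weights = [len(d[key]) for key in keys]
    return rng.choices(keys, weights=weights, k=1)[0]

d = {'a': [1, 3, 2], 'b': [6], 'c': [0, 0]}

# Rough sanity check: 'a' should come up about half the time.
tally = Counter(weighted_key(d) for _ in range(6000))
print(tally)
```

When many keys are needed at once, passing a larger k to random.choices in a single call avoids rebuilding the weights for every draw.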