Create a new dictionary using an existing one and list - python

If I had:
adict = {'a':3, 'b':6, 'c':9, 'd':12}
alist = ['a', 'z', 't', 's']
How would I create a new dict with the keys of the first dict and the items of the list, resulting in this?
bdict = {'a': 'a', 'b': 'z', 'c': 't', 'd': 's'}

To pair the keys of adict with the values from alist, use the zip() function.
>>> from collections import OrderedDict
>>> adict = OrderedDict([('a', 3), ('b', 6), ('c', 9), ('d', 12)])
>>> alist = ['a', 'z', 't', 's']
>>> bdict = OrderedDict(zip(adict, alist))
>>> bdict
OrderedDict([('a', 'a'), ('b', 'z'), ('c', 't'), ('d', 's')])
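I've used ordered dictionaries here because the question only makes sense if the dictionaries preserve order; otherwise, you can't guarantee the pairwise one-to-one correspondence between adict and alist. (Since Python 3.7, plain dicts also preserve insertion order, so OrderedDict is only strictly needed on older versions.)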
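On Python 3.7 or later, where plain dicts preserve insertion order, the same pairing can be sketched without OrderedDict. This assumes adict was built in the key order shown in the question and that alist has at least as many items as adict has keys; zip() silently stops at the shorter input.

```python
adict = {'a': 3, 'b': 6, 'c': 9, 'd': 12}
alist = ['a', 'z', 't', 's']

# Iterating over a dict yields its keys, so zip pairs each key with a list item
bdict = dict(zip(adict, alist))
print(bdict)
```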

Related

Convert values-as-list of a dictionary to values-as-tuple (python)

Consider a dictionary:
{'a': ['b', 'c'], 'b':['a', 'c', 'e'], 'c':['a', 'b', 'f']}
How can I convert the values to tuples in one line?
{'a': ('b', 'c'), 'b':('a', 'c', 'e'), 'c':('a', 'b', 'f')}
First I converted the values of this dictionary to a list of tuples using a comprehension:
list_of_tuple = [tuple(val) for val in dict.values()]
Iterating over the values of dict and items in list_of_tuple, then equating nth element of dict to nth element of list_of_tuple doesn't work.
Is there a better, compact way of doing this?
You can use a dict comprehension:
out = {k:tuple(v) for k,v in d.items()}
or map with lambda:
out = dict(map(lambda x: (x[0],tuple(x[1])), d.items()))
or map with zip:
out = dict(zip(d.keys(), map(tuple, d.values())))
Output:
{'a': ('b', 'c'), 'b':('a', 'c', 'e'), 'c':('a', 'b', 'f')}
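As a quick check, the three one-liners above can be run side by side. The sketch below assumes the dictionary is bound to the name d (the question does not name it) and confirms that all three produce the same result.

```python
d = {'a': ['b', 'c'], 'b': ['a', 'c', 'e'], 'c': ['a', 'b', 'f']}

# Dict comprehension: rebuild each value as a tuple
by_comprehension = {k: tuple(v) for k, v in d.items()}
# map with lambda: transform each (key, value) pair
by_map_lambda = dict(map(lambda item: (item[0], tuple(item[1])), d.items()))
# map with zip: convert the values, then re-pair them with the keys
by_map_zip = dict(zip(d.keys(), map(tuple, d.values())))

assert by_comprehension == by_map_lambda == by_map_zip
print(by_comprehension)
```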

How to convert nested dict of dict to nested OrderedDict

I need to convert a nested dict of dicts to a nested OrderedDict:
user_dict = {"a": {"b": {"c":
{'d': 'e',
'f': 'g',
'h': 'i'
}}}}
Expected output:
cfg_opts = OrderedDict([('a', OrderedDict([('b', OrderedDict([('c', OrderedDict([('d', 'e'), ('f','g'), ('h', 'i')]))]))]))])
I would use a recursive function for this task, as follows:
import collections
user_dict = {'a': {'b': {'c': {'d': 'e', 'f': 'g', 'h': 'i'}}}}
def orderify(d):
    # Recurse into nested dicts; return any other value unchanged
    if isinstance(d, dict):
        return collections.OrderedDict({k: orderify(v) for k, v in d.items()})
    else:
        return d
ordered_user_dict = orderify(user_dict)
print(ordered_user_dict)
Output:
OrderedDict([('a', OrderedDict([('b', OrderedDict([('c', OrderedDict([('d', 'e'), ('f', 'g'), ('h', 'i')]))]))]))])
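If the nested data contains only JSON-compatible values (string keys, with strings, numbers, lists, booleans, or None as values), a JSON round trip is a possible alternative to the recursive function. This is a different technique from the one above, sketched here under that assumption; it relies on the object_pairs_hook parameter of json.loads.

```python
import json
from collections import OrderedDict

user_dict = {'a': {'b': {'c': {'d': 'e', 'f': 'g', 'h': 'i'}}}}

# Serialize to JSON text, then decode every JSON object as an OrderedDict
ordered_user_dict = json.loads(json.dumps(user_dict), object_pairs_hook=OrderedDict)
print(ordered_user_dict)
```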

Pandas dataframe to dict of list of tuples

Suppose I have the following dataframe:
df = pd.DataFrame({'id': [1,2,3,3,3], 'v1': ['a', 'a', 'c', 'c', 'd'], 'v2': ['z', 'y', 'w', 'y', 'z']})
df
id v1 v2
1 a z
2 a y
3 c w
3 c y
3 d z
And I want to transform it to this format:
{1: [('a', 'z')], 2: [('a', 'y')], 3: [('c', 'w'), ('c', 'y'), ('d', 'z')]}
I basically want to create a dict where the keys are the ids and each value is a list of the (v1, v2) tuples for that id.
I tried using groupby on id:
df.groupby('id')[['v1', 'v2']].apply(list)
But this didn't work.
Create tuples first and then pass to groupby with aggregate list:
d = df[['v1', 'v2']].agg(tuple, 1).groupby(df['id']).apply(list).to_dict()
print (d)
{1: [('a', 'z')], 2: [('a', 'y')], 3: [('c', 'w'), ('c', 'y'), ('d', 'z')]}
Another idea is using MultiIndex:
d = df.set_index(['v1', 'v2']).groupby('id').apply(lambda x: x.index.tolist()).to_dict()
You can use defaultdict from the collections library:
from collections import defaultdict
d = defaultdict(list)
for k, v, s in df.to_numpy():
    d[k].append((v, s))
defaultdict(list,
{1: [('a', 'z')],
2: [('a', 'y')],
3: [('c', 'w'), ('c', 'y'), ('d', 'z')]})
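Another option is to build a column of tuples and index it by id:
df['New'] = [tuple(x) for x in df[['v1','v2']].to_records(index=False)]
df=df[['id','New']]
df=df.set_index('id')
df.to_dict()
Output (note that rows sharing an id overwrite each other, so only the last tuple per id survives):
{'New': {1: ('a', 'z'), 2: ('a', 'y'), 3: ('d', 'z')}}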
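For comparison, here is a minimal sketch that iterates over the groups directly and zips the two columns of each group into tuples. It assumes pandas is installed and rebuilds the same df as in the question.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 3, 3],
                   'v1': ['a', 'a', 'c', 'c', 'd'],
                   'v2': ['z', 'y', 'w', 'y', 'z']})

# Iterating over a groupby yields (group key, sub-frame) pairs;
# zip turns the two columns of each sub-frame into (v1, v2) tuples
d = {key: list(zip(group['v1'], group['v2'])) for key, group in df.groupby('id')}
print(d)
```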

Calculating total and relative frequency of values in a dict representing a Markov-chain rule

I have made a function make_rule(text, scope=1) that simply goes over a string and generates a dictionary that serves as a rule for a Markovian text-generator (where the scope is the number of linked characters, not words).
>>> rule = make_rule("abbcad", 1)
>>> rule
{'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
I have been tasked with calculating the entropy of this system. In order to do that I think I would need to know:
How often a value appears in the dictionary in total, i.e. its total frequency.
How often a value appears given a key in the dictionary, i.e. its relative frequency.
Is there a quick way to get both of these numbers for each of the values in the dictionary?
For the above example I would need this output:
'a' total: 1, 'a'|'a': 0, 'a'|'b': 0, 'a'|'c': 1
'b' total: 2, 'b'|'a': 1, 'b'|'b': 1, 'b'|'c': 0
'c' total: 1, 'c'|'a': 0, 'c'|'b': 1, 'c'|'c': 0
'd' total: 1, 'd'|'a': 1, 'd'|'b': 0, 'd'|'c': 0
I guess the 'a' total is easily inferred, so maybe instead just output a list of triples for every unique item that appears in the dictionary:
[[('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1)], [('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0)], ...]
I'll just deal with "How often a value appears given a key in the dictionary", since you've said that "How often a value appears in the dictionary in total" is easily inferred.
If you just want to be able to look up the relative frequency of a value for a given key, it's easy to get that with a dict of Counter objects:
from collections import Counter
rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
freq = {k: Counter(v) for k, v in rule.items()}
… which gives you a freq like this:
{
'a': Counter({'b': 1, 'd': 1}),
'b': Counter({'b': 1, 'c': 1}),
'c': Counter({'a': 1})
}
… so that you can get the relative frequency of 'a' given the key 'c' like this:
>>> freq['c']['a']
1
Because Counter objects return 0 for nonexistent keys, you'll also get zero frequencies as you would expect:
>>> freq['a']['c']
0
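The total frequency of each value, which this answer otherwise sets aside, can be sketched by adding the per-key Counter objects together. The name totals is introduced here purely for illustration.

```python
from collections import Counter

rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
freq = {k: Counter(v) for k, v in rule.items()}

# Accumulate the per-key counts into one overall Counter
totals = Counter()
for counts in freq.values():
    totals.update(counts)
print(totals)
```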
If you need a list of 3-tuples as specified in your question, you can get that with a little extra work. Here's a function to do it:
def triples(rule):
    freq = {k: Counter(v) for k, v in rule.items()}
    all_values = sorted(set().union(*rule.values()))
    sorted_keys = sorted(rule)
    return [(v, k, freq[k][v]) for v in all_values for k in sorted_keys]
The only thing here which I think may not be self-explanatory is the all_values = ... line, which:
creates an empty set()
produces the union() of that set with all the individual elements of the lists in rule.values() (note the use of the argument-unpacking * operator)
converts the result into a sorted() list.
If you still have the original text, you can avoid all that work by using e.g. all_values = sorted(set(original_text)) instead.
Here it is in action:
>>> triples({'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']})
[
('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1),
('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0),
('c', 'a', 0), ('c', 'b', 1), ('c', 'c', 0),
('d', 'a', 1), ('d', 'b', 0), ('d', 'c', 0)
]
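Since the stated goal is entropy, one possible next step is to normalize each key's counts into conditional probabilities and compute the Shannon entropy per key. This is only a sketch: the helper names conditional_probs and entropy_bits are made up for illustration, and it assumes entropy in bits (base-2 logarithm) computed separately for each key, which may not be the exact definition the task calls for.

```python
from collections import Counter
from math import log2

rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
freq = {k: Counter(v) for k, v in rule.items()}

def conditional_probs(counter):
    # Divide each count by the number of observations for this key
    total = sum(counter.values())
    return {value: count / total for value, count in counter.items()}

def entropy_bits(probs):
    # Shannon entropy in bits; zero-probability terms contribute nothing
    return -sum(p * log2(p) for p in probs.values() if p > 0)

probs = {k: conditional_probs(c) for k, c in freq.items()}
entropies = {k: entropy_bits(p) for k, p in probs.items()}
print(probs)
print(entropies)
```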
I cannot think of a quick way other than iterating over the word's characters, counting the occurrences in each list of the dictionary, and summing them at the end:
alphabet = sorted(set("abbcad"))
rule = {'a': ['b', 'd'], 'b': ['b', 'c'], 'c': ['a']}
totalMatrix = []
for elem in alphabet:
    total = 0
    occurrences = []
    for key in rule.keys():
        # Count how often elem follows this key
        currentCount = rule[key].count(elem)
        total += currentCount
        occurrences.append((elem, key, currentCount))
    totalMatrix.append([elem, total] + occurrences)
for elem in totalMatrix:
    print(elem)
The content of totalMatrix will be:
['a', 1, ('a', 'a', 0), ('a', 'b', 0), ('a', 'c', 1)]
['b', 2, ('b', 'a', 1), ('b', 'b', 1), ('b', 'c', 0)]
['c', 1, ('c', 'a', 0), ('c', 'b', 1), ('c', 'c', 0)]
['d', 1, ('d', 'a', 1), ('d', 'b', 0), ('d', 'c', 0)]

Most Pythonic way for creating a defaultdictionary counter

I am trying to count occurrences of various items based on a condition. What I have so far is a function that, given pairs of items, increments a counter like this:
given [('a', 'a'), ('a', 'b'), ('b', 'a')], it will output defaultdict(<class 'collections.Counter'>, {'a': Counter({'a': 1, 'b': 1}), 'b': Counter({'a': 1})})
The function can be seen below:
from collections import Counter, defaultdict
def freq(samples=None):
    out = defaultdict(Counter)
    if samples:
        for (c, s) in samples:
            out[c][s] += 1
    return out
It is limited, though, to working only with pairs, while I would like it to be more generic and work with tuples of any length, e.g. [('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')] would still work and I would be able to query the result for, let's say, res['a']['b'] and get the count for 'c', which is one.
What would be the best way to do this in Python?
Assuming all tuples in the list have the same length:
from collections import Counter
from itertools import groupby
from operator import itemgetter
def freq(samples=[]):
    # Sort first: groupby only groups consecutive items with equal keys
    sorted_samples = sorted(samples)
    if sorted_samples and len(sorted_samples[0]) > 2:
        # More than two fields left: recurse on the tail of each tuple
        return {key: freq(value[1:] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
    else:
        return {key: Counter(value[1] for value in values) for key, values in groupby(sorted_samples, itemgetter(0))}
That gives:
>>> freq([('a', 'a'), ('a', 'b'), ('b', 'a'), ('a', 'c')])
{'a': Counter({'a': 1, 'b': 1, 'c': 1}), 'b': Counter({'a': 1})}
>>> freq([('a', 'a', 'a'), ('a', 'b', 'c'), ('b', 'a', 'a'), ('a', 'c', 'c')])
{'a': {'a': Counter({'a': 1}), 'b': Counter({'c': 1}), 'c': Counter({'c': 1})}, 'b': {'a': Counter({'a': 1})}}
One option is to use the full tuples as keys:
from collections import Counter
def freq(samples=[]):
    out = Counter()
    for sample in samples:
        out[sample] += 1
    return out
which would then return something like
Counter({('a', 'a', 'b'): 1, ('a', 'b', 'c'): 1, ('b', 'a', 'a'): 1})
You could convert the tuples to strings to select certain slices, e.g. "('a', 'b',". For example in a new dictionary {k: v for k,v in out.items() if str(k)[:10] == "('a', 'b',"}.
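An alternative to the string comparison is to compare a slice of the tuple key directly, which avoids depending on the exact repr of the tuple. A minimal sketch, assuming out is a Counter keyed by full tuples as returned by the function above; the names prefix and matches are illustrative.

```python
from collections import Counter

out = Counter([('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')])

prefix = ('a', 'b')
# Keep only the entries whose key starts with the given prefix
matches = {k: v for k, v in out.items() if k[:len(prefix)] == prefix}
print(matches)
```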
If the groups are indeed either 2 or 3 long, but never both, you can change to:
from collections import defaultdict
def freq(samples):
    n = len(samples[0])
    if n == 2:
        # One nested level per leading key, with an int count at the bottom
        out = defaultdict(lambda: defaultdict(int))
        for a, b in samples:
            out[a][b] += 1
    elif n == 3:
        out = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        for a, b, c in samples:
            out[a][b][c] += 1
    return out
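To handle tuples of any fixed length without one branch per length, the nesting can be built recursively. The following is a sketch rather than a drop-in replacement: make_level is a hypothetical helper, and it assumes all tuples in samples are non-empty and have the same length.

```python
from collections import Counter, defaultdict

def make_level(n_keys):
    # n_keys nested defaultdict levels with a Counter at the bottom
    if n_keys == 0:
        return Counter()
    return defaultdict(lambda: make_level(n_keys - 1))

def freq(samples):
    samples = list(samples)
    if not samples:
        return {}
    # All elements but the last act as nested keys; the last one is counted
    out = make_level(len(samples[0]) - 1)
    for *path, last in samples:
        node = out
        for key in path:
            node = node[key]  # walk down one level per leading element
        node[last] += 1
    return out

res = freq([('a', 'a', 'b'), ('a', 'b', 'c'), ('b', 'a', 'a')])
print(res['a']['b']['c'])
```

With pairs as input, the result has the same shape as the defaultdict(Counter) from the question.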
