Explanation:
I wrote the following function to shorten the hostnames of machines used in production. The names have been changed here, but the same structure and format have been preserved. The code below is clunky, and I would like to make it DRY (Don't Repeat Yourself). Readability is also important, as this code might need to be maintained or understood by more than just myself.
Code:
def shorten_hostnames(machines):
    # split items
    d = {k: v.split('.') for k, v in machines.items()}
    # trim end
    while all(d.values()):
        if not len(set([v[-1] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[:-1] for k, v in d.items()}
    # trim start
    while all(d.values()):
        if not len(set([v[0] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[1:] for k, v in d.items()}
    # join items
    d = {k: '.'.join(v) for k, v in d.items()}
    # return shortened hostnames
    return d
Sample Input:
machines = {'a.ace.site.info': 'a.ace.site.info',
'b.ace.site.info': 'b.ace.site.info',
'a.bob.site.info': 'a.bob.site.info',
'b.bob.site.info': 'b.bob.site.info',}
Output:
>>> for k, v in shorten_hostnames(machines).items():
print k, '-->', v
b.ace.site.info --> b.ace
a.ace.site.info --> a.ace
b.bob.site.info --> b.bob
a.bob.site.info --> a.bob
Where and why I need your help:
I was trying to embed a function that would do the trimming from either end based on supplied parameters, but I can't figure out how to modify the slice notation to trim from either the start or the end of the list. I am sure there is a simple solution I am overlooking, either with slice notation or something else.
Gotchas:
There are a few gotchas worth mentioning. If only one hostname is passed into the function (for example machines = {'a.ace.site.info': 'a.ace.site.info'}), it should return only the first part (a in this example). There should be no duplicate results in the final answer. The hostnames can also differ in length from each other (they need not have the same number of segments).
Afterthought:
Once a proper solution can be found, I will edit the question title and tags to better reflect how this can apply to future visitors of the site. For instance, if slice notation is the solution (and it can be applied dynamically) I would probably modify the question to reflect that dynamic slice notation is the topic of the question.
More Sample Input and Expected Output:
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.info': 'ace.b.site.info',}
# Out
ace.b.site.info --> b
ace.a.site.info --> a
# In
machines = {'a.ace.site.info': 'a.ace.site.info',}
# Out
a.ace.site.info --> a
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.com': 'ace.b.site.com',}
# Out
ace.b.site.com --> b.site.com
ace.a.site.info --> a.site.info
At the very least, split out the values and keys into separate lists, then process just the values before reconstituting your dictionary, and use a short loop to pick an index for start and end trimming:
def shorten_hostnames(machines):
    keys, values = zip(*machines.items())
    values = [v.split('.') for v in values]
    for i, s in ((-1, slice(-1)), (0, slice(1, None))):
        while all(values):
            if not len(set(v[i] for v in values)) == 1:
                break
            if any(len(v) <= 1 for v in values):
                break
            values = [v[s] for v in values]
    return {k: '.'.join(v) for k, v in zip(keys, values)}
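The "dynamic slice notation" asked about in the question is what the built-in slice object provides: v[slice(-1)] is the same as v[:-1], and v[slice(1, None)] is the same as v[1:]. A minimal sketch of a single trimming helper built on that idea; trim_common and its from_end parameter are names made up for illustration, not part of the answer above:

```python
def trim_common(values, from_end):
    """Trim labels shared by every list from one end, leaving at least one label each."""
    index = -1 if from_end else 0
    # slice(None, -1) is v[:-1]; slice(1, None) is v[1:]
    keep = slice(None, -1) if from_end else slice(1, None)
    while all(len(v) > 1 for v in values) and len({v[index] for v in values}) == 1:
        values = [v[keep] for v in values]
    return values


hosts = [name.split('.') for name in ('a.ace.site.info', 'b.ace.site.info')]
hosts = trim_common(hosts, from_end=True)   # drops 'info', 'site' and 'ace'
hosts = trim_common(hosts, from_end=False)  # each list is down to one label, so nothing more is trimmed
print(hosts)  # [['a'], ['b']]
```

Because the slice is just a value, the same loop body serves both directions and only the index and the slice change.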
I'd use a utility function to remove a common prefix from a sequence of sequences, then pass in reversed sequences to remove trailing parts:
from itertools import dropwhile, izip_longest

def remove_common_prefix(*parts):
    # always leaves a last common element in place
    zipped = izip_longest(*(p[:-1] for p in parts), fillvalue=None)
    stripped = dropwhile(lambda v: len(set(v)) == 1, zipped)
    res = [filter(None, part) + (old[-1],) for part, old in zip(zip(*stripped), parts)]
    # filtered everything away? Then return just the last parts
    return res or [p[-1:] for p in parts]

def shorten_hostnames(machines):
    # edge-case; faster to just return the first part
    if len(machines) == 1:
        return {k: v.split('.', 1)[0] for k, v in machines.items()}
    keys, values = zip(*machines.items())  # for easier processing and re-assembling
    parts = remove_common_prefix(*(v.split('.')[::-1] for v in values))
    parts = remove_common_prefix(*(part[::-1] for part in parts))
    return {k: '.'.join(v) for k, v in zip(keys, parts)}
This handles both your input and names of uneven length:
>>> shorten_hostnames(machines)
{'b.ace.site.info': 'b.ace', 'a.ace.site.info': 'a.ace', 'b.bob.site.info': 'b.bob', 'a.bob.site.info': 'a.bob'}
>>> shorten_hostnames({'foo': 'a.ace.site', 'bar': 'a.ace.site.info'})
{'foo': 'site', 'bar': 'site.info'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info', 'ace.b.site.info': 'ace.b.site.info'})
{'ace.b.site.info': 'b', 'ace.a.site.info': 'a'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info'})
{'ace.a.site.info': 'ace'}
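The utility-function answer above relies on Python 2 behaviour: izip_longest was renamed zip_longest in Python 3, and filter no longer returns a tuple there. For readers on Python 3, here is a rough sketch of the same "strip a common prefix, then reverse and strip again" idea. It is a simplified index-counting variant rather than a port of the itertools pipeline, and strip_common_prefix and shorten_hostnames_py3 are hypothetical names:

```python
def strip_common_prefix(seqs):
    """Drop leading items shared by all sequences, keeping at least one item in each."""
    limit = min(len(s) for s in seqs) - 1  # never strip a sequence down to nothing
    n = 0
    while n < limit and len({s[n] for s in seqs}) == 1:
        n += 1
    return [s[n:] for s in seqs]


def shorten_hostnames_py3(machines):
    # edge case from the question: a single hostname keeps only its first label
    if len(machines) == 1:
        return {k: v.split('.', 1)[0] for k, v in machines.items()}
    keys = list(machines)
    # reverse the labels so the common suffix becomes a common prefix
    parts = strip_common_prefix([machines[k].split('.')[::-1] for k in keys])
    # restore the order and strip the real common prefix
    parts = strip_common_prefix([p[::-1] for p in parts])
    return {k: '.'.join(p) for k, p in zip(keys, parts)}


print(shorten_hostnames_py3({'foo': 'a.ace.site', 'bar': 'a.ace.site.info'}))
# {'foo': 'site', 'bar': 'site.info'}
```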
def shorten_hostnames(machines):
    def trim(hostnames, head=True):
        while all(len(v) > 1 for v in hostnames) and len(set(v[0 if head else -1] for v in hostnames)) == 1:
            hostnames[:] = [v[1:] if head else v[:-1] for v in hostnames]
    keys, values = zip(*machines.items())
    hostnames = [v.split('.') for v in values]
    trim(hostnames, False)
    trim(hostnames)
    return {k: '.'.join(v) for k, v in zip(keys, hostnames)}
Related
I have list of strings as input
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
expected outputs
str_list1_output = [e1, e2] [g1, o1]
str_list2_output = [e1] [o1] [e2] [g1]
I wrote the code below, but it fails for str_list2:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
entities, properties = zip(*(s.split(":") for s in str_list1))
print(set(entities)) # {'e1', 'e2'}
print(set(properties)) # {'g1', 'o1'}
I have two questions:
What is wrong with the code? It returns the same kind of output for str_list2 instead of four separate lists, "[e1] [o1] [e2] [g1]".
Can I use zip to generate two separate sets of entities and properties directly, instead of converting the tuples to sets when printing?
Not sure if this is an exact answer to your question (namely, why the code you have tried is not giving the result you want), but I believe the code below will give the desired result, and perhaps the insight needed to answer your question can be found here:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
str_list3 = ['e1:g1', 'e1:o1', 'e2:o1']
def foo(str_list):
    from collections import defaultdict
    dct = defaultdict(list)
    for s in str_list:
        entity, property = s.split(":")
        dct[entity] += [property]
    for k in dct:
        dct[k] = tuple(dct[k])
    rdct = defaultdict(list)
    for k, v in dct.items():
        rdct[v] += [k]
    L = []
    for k, v in rdct.items():
        L += [v, list(k)]
    print(L)
foo(str_list1)
foo(str_list2)
foo(str_list3)
Output:
[['e1', 'e2'], ['g1', 'o1']]
[['e1'], ['o1'], ['e2'], ['g1']]
[['e1'], ['g1', 'o1'], ['e2'], ['o1']]
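On the two questions as asked: zip(*...) only transposes the pairs into one tuple of entities and one tuple of properties, so it always produces exactly two groups, regardless of which entity goes with which property. That is why str_list2 cannot come out as four lists from that code. The two sets themselves can be built directly by mapping set over the transposed tuples. A small sketch (split_columns is a made-up name):

```python
def split_columns(str_list):
    """Return (set of entities, set of properties) from 'entity:property' strings."""
    entities, properties = map(set, zip(*(s.split(":") for s in str_list)))
    return entities, properties


print(split_columns(['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']))
print(split_columns(['e1:o1', 'e2:g1']))
```

Both calls return the same two sets, which shows that the pairing information is lost after transposing; grouping by shared properties, as foo does above, is what keeps it.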
I am trying to access a specific key in a nested dictionary, then match its value against the strings in a list. If a string in the list contains the dictionary value, I want to overwrite the dictionary value with that list item. Below is an example.
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'
}
The key I'm looking for is B. The objective is to overwrite string6 with string6~, string4 with string4~, and so on for all B keys found in my_iterable.
I have written a function to compute the Levenshtein distance between two strings, but I am struggling to write an efficient way to overwrite the values of the keys.
def find_and_replace(key, dictionary, original_list):
    for k, v in dictionary.items():
        if k == key:
            # function to check if original_list item contains v
            yield v
        elif isinstance(v, dict):
            # recurse into nested dictionaries
            for result in find_and_replace(key, v, original_list):
                yield result
        elif isinstance(v, list):
            # recurse into dictionaries stored inside lists
            for d in v:
                if isinstance(d, dict):
                    for result in find_and_replace(key, d, original_list):
                        yield result
If I call
updated_dict = find_and_replace('B', my_iterable, my_list)
I want updated_dict to return the below:
{'A':'xyz',
'B':'string6~',
'C':[{'B':'string4~', 'D':'123'}],
'E':[{'F':'321', 'B':'string1~'}],
'G':'jkl'
}
Is this the right approach to the most efficient solution, and how can I modify it to return a dictionary with the updated values for B?
You can use the code below. I have assumed that the structure of the input dict stays the same throughout.
# Input List
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
# Input Dict
# Removed duplicate key "B" from the dict
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl',
}
# setting search key
search_key = "B"
# Main code
for i, v in my_iterable.items():
    if i == search_key:
        if not isinstance(v, list):
            search_in_list = [item for item in my_list if v in item]
            if search_in_list:
                my_iterable[i] = search_in_list[0]
    else:
        try:
            # assumes a nested value is a list whose first element is a dict
            for j, k in v[0].items():
                if j == search_key:
                    search_in_list = [l for l in my_list if k in l]
                    if search_in_list:
                        v[0][j] = search_in_list[0]
        except (AttributeError, IndexError, KeyError, TypeError):
            # plain values such as 'xyz' have no nested dict; skip them
            continue
# print output
print(my_iterable)
# Result -> {'A': 'xyz', 'B': 'string6~', 'C': [{'B': 'string4~', 'D': '123'}], 'E': [{'F': '321', 'B': 'string1~'}], 'G': 'jkl'}
The above has scope for optimization, for example with a list comprehension or by wrapping it in a function.
I hope this helps and counts!
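For nesting of arbitrary depth, and to get a new dictionary back rather than a generator of values, one option is a recursive function that rebuilds the structure as it goes. The following is only a sketch under assumptions: replace_values is a hypothetical name, and plain substring containment stands in for whatever matching rule (such as a Levenshtein threshold) is really wanted.

```python
def replace_values(obj, key, candidates):
    """Return a copy of obj in which every string stored under `key` is replaced
    by the first candidate containing it (unchanged if no candidate matches)."""
    if isinstance(obj, dict):
        result = {}
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                result[k] = next((c for c in candidates if v in c), v)
            else:
                result[k] = replace_values(v, key, candidates)
        return result
    if isinstance(obj, list):
        return [replace_values(item, key, candidates) for item in obj]
    return obj  # leaf value that is not under `key`


my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A': 'xyz',
               'B': 'string6',
               'C': [{'B': 'string4', 'D': '123'}],
               'E': [{'F': '321', 'B': 'string1'}],
               'G': 'jkl'}
updated_dict = replace_values(my_iterable, 'B', my_list)
print(updated_dict)
```

The original my_iterable is left untouched, which makes it easy to compare before and after.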
In some cases, if your nesting is fairly complex, you can treat the dictionary like a JSON string and do all sorts of replacements on it. It's probably not what people would call very Pythonic, but it gives you a little more flexibility.
import re, json
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'}
json_str = json.dumps(my_iterable, ensure_ascii=False)
for val in my_list:
json_str = re.sub(re.compile(f"""("[B]":\\W?")({val[:-1]})(")"""), r"\1" + val + r"\3", json_str)
my_iterable = json.loads(json_str)
print(my_iterable)
What I am trying to do is output the string "33k22k11k", which is the last value followed by the reversed last key, then the second-to-last value followed by the reversed second-to-last key, and so on. I'm not sure how to get the reversed key for the specific loop iteration I am in. With the code I currently have, I get the output shown below it:
dict = {"k1":1, "k2":2, "k3":3}
current=""
current_k=""
for k,v in dict.items():
    for i in k:
        current_k=i+current_k
    current=str(v)+current_k+current
print(current)
print(current_k)
33k2k1k22k1k11k
3k2k1k
Edited
First of all, if you are on python < 3.6, dict does not keep the order of items. You might want to use collections.OrderedDict for your purpose.
d = {"k1":1, "k2":2, "k3":3}
d.keys()
# dict_keys(['k2', 'k1', 'k3'])
whereas,
from collections import OrderedDict
d = OrderedDict()
d['k1'] = 1
d['k2'] = 2
d['k3'] = 3
d.keys()
# odict_keys(['k1', 'k2', 'k3'])
With our new d, you can either add the key and values and reverse it:
res = ''
for k, v in d.items():
    res += str(k) + str(v)
res[::-1]
# '33k22k11k'
or iterate in reverse:
res = ''
for k, v in reversed(d.items()):
    res += str(v)[::-1] + str(k)[::-1]
res
# '33k22k11k'
I may be wrong, but it seems like you would want to reset the value of current_k each time you access a new key:
dict = {"k1":1, "k2":2, "k3":3}
current=""
for k,v in dict.items():
    current_k=""
    for i in k:
        current_k=i+current_k
    current=str(v)+current_k+current
print(current)
print(current_k)
Why not simply do:
print(''.join([a+str(b) for a,b in dict.items()])[::-1])
Output:
"33k22k11k"
But if the values should not be reversed along with the keys (for example, multi-digit values), do:
print(''.join([str(b)+a[::-1] for a,b in list(dict.items())[::-1]]))
You can use the Python map function to create the reversed string (using an f-string) for each key/value pair, walking the pairs from last to first, and then join the results.
dict1 = {"k1":1, "k2":2, "k3":3}
new_dict = "".join(map(lambda k, v: f'{k}{v}'[::-1], reversed(list(dict1)), reversed(list(dict1.values()))))
Output:
33k22k11k
You can do something like this perhaps:
dict = {"k1":1, "k2":2, "k3":3}
print("".join(list(reversed([str(v)+"".join(reversed(k)) for k, v in dict.items()]))))
Output:
33k22k11k
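Several of the one-liners above reverse the whole concatenated string, which also reverses the digits of multi-digit values. Taking the stated goal literally (value, then reversed key, last pair first), a sketch that reverses only the key might look like this. It assumes a dict that preserves insertion order (Python 3.7+), and value_then_reversed_key is a made-up name:

```python
def value_then_reversed_key(d):
    """Join value + reversed key for each item, starting from the last item."""
    return "".join(f"{v}{k[::-1]}" for k, v in reversed(list(d.items())))


print(value_then_reversed_key({"k1": 1, "k2": 2, "k3": 3}))  # 33k22k11k
print(value_then_reversed_key({"ab": 12, "cd": 34}))         # 34dc12ba
```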
I would like to recognize and group duplicate values in a dictionary. To do this I build a pseudo-hash (better read: signature) of my data set as follows:
from collections import defaultdict
from pickle import dumps
taxonomy = {}
binder = defaultdict(list)
for key, value in ds.items():
    signature = dumps(value)
    taxonomy[signature] = value
    binder[signature].append(key)
For a concrete use-case see this question.
Unfortunately I realized that if the following statement is True:
>>> ds['key1'] == ds['key2']
True
This one is not always True anymore:
>>> dumps(ds['key1']) == dumps(ds['key2'])
False
I noticed that the key order in the dumped output differs between the two dicts. If I copy/paste the output of ds['key1'] and ds['key2'] into new dictionaries, the comparison succeeds.
As an overkill alternative I could traverse my dataset recursively and replace dict instances with OrderedDict:
import collections.abc
import copy

def faithfulrepr(od):
    od = copy.deepcopy(od)
    if isinstance(od, collections.abc.Mapping):
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)
>>> faithfulrepr(ds['key1']) == faithfulrepr(ds['key2'])
True
I am worried about this naive approach because I do not know whether I cover all the possible situations.
What other (generic) alternative can I use?
The first thing is to remove the call to deepcopy, which is your bottleneck here:
def faithfulrepr(ds):
    if isinstance(ds, collections.abc.Mapping):
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)
However, sorted and repr have their drawbacks:
you can't truly compare custom types;
you can't use mappings with different types of keys.
So the second thing is to get rid of faithfulrepr and compare objects with __eq__:
binder, values = [], []
for key, value in ds.items():
    try:
        index = values.index(value)
    except ValueError:
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))
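If the values happen to be JSON-serialisable (nested dicts with string keys, lists, strings, numbers), another way to get an order-independent signature is json.dumps with sort_keys=True. This is only a sketch under that assumption, not a generic replacement for the __eq__ comparison above: it fails on types json cannot encode, and values that compare equal but serialise differently (such as 1 and 1.0) end up in different groups. group_duplicates is a hypothetical name.

```python
import json
from collections import defaultdict


def group_duplicates(ds):
    """Group the keys of ds whose values serialise to the same sorted-key JSON."""
    binder = defaultdict(list)
    for key, value in ds.items():
        signature = json.dumps(value, sort_keys=True)  # key order no longer matters
        binder[signature].append(key)
    return list(binder.values())


ds = {
    'key1': {'a': 1, 'b': [1, 2]},
    'key2': {'b': [1, 2], 'a': 1},   # same content, different key order
    'key3': {'a': 2},
}
print(group_duplicates(ds))  # [['key1', 'key2'], ['key3']]
```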
I have a dictionary of string keys and float values.
mydict = {}
mydict["joe"] = 20
mydict["bill"] = 20.232
mydict["tom"] = 0.0
I want to filter the dictionary to only include pairs that have a value greater than zero.
In C#, I would do something like this:
dict = dict.Where(r=>r.Value > 0);
What is the equivalent code in Python?
d = dict((k, v) for k, v in d.iteritems() if v > 0)
In Python 2.7 and up, there's nicer syntax for this:
d = {k: v for k, v in d.items() if v > 0}
Note that this is not strictly a filter because it does create a new dictionary.
Assuming your original dictionary is d1 you could use something like:
d2 = dict((k, v) for k, v in d1.items() if v > 0)
By the way, note that dict is the name of a built-in type in Python, so it is best not to reuse it as a variable name.
The dict constructor can take a sequence of (key,value) pairs, and the iteritems method of a dict produces a sequence of (key,value) pairs. It's two great tastes that taste great together.
newDict = dict([item for item in oldDict.iteritems() if item[1] > 0])
foo = {}
foo["joe"] = 20
foo["bill"] = 20.232
foo["tom"] = 0.0
bar = dict((k,v) for k,v in foo.items() if v>0)
dict is the name of a built-in type in Python (not a keyword, but still worth avoiding as a variable name), so I replaced it with foo.
First of all, you should not use dict as a variable name: it shadows the built-in and prevents you from referencing the dict class in the current or any enclosed scope.
d = {}
d["joe"] = 20
d["bill"] = 20.232
d["tom"] = 0.0
# create an intermediate generator expression that is fed into the dict constructor
# this is more efficient than the list-comprehension "[...]" variant,
# which builds a full list of pairs first
d2 = dict(((k, v) for (k, v) in d.iteritems() if v > 0))
print d2
# {'bill': 20.232, 'joe': 20}
Alternatively, you could just create the generator and iterate over it directly. This is more like a "filter", because the generator only references the values in the original dict instead of making a subset copy, which avoids the cost of building a new dictionary:
filtered = ((k, v) for (k, v) in d.iteritems() if v > 0)
print filtered
# <generator object <genexpr> at 0x034A18F0>
for k, v in filtered:
    print k, v
# bill 20.232
# joe 20
Try:
y = filter(lambda x: dict[x] > 0.0, dict.keys())
The lambda is fed the keys from the dict and compares the value stored under each key against the criterion, so the result contains only the keys that pass, not the key/value pairs.
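To get a dictionary back from filter rather than just the keys, one option in Python 3 is to filter the (key, value) pairs and pass the result to the dict constructor. A minimal sketch using the question's data (the name positive is made up here):

```python
mydict = {"joe": 20, "bill": 20.232, "tom": 0.0}

# filter() yields the (key, value) pairs whose value is positive;
# dict() turns those pairs back into a dictionary
positive = dict(filter(lambda item: item[1] > 0, mydict.items()))
print(positive)  # {'joe': 20, 'bill': 20.232}
```

The dict comprehension shown earlier does the same job and is usually easier to read; the filter form is closest in spirit to the C# Where call.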