Remove common elements of hostnames (shorten hostnames) - DRY

Explanation:
I wrote the following function to shorten the hostnames of machines used in production. The names have been changed here, but the same structure and format has been preserved. The code below is clunky, and I would like to make the code DRY (Don't Repeat Yourself). Readability is also important, as this is code that might need to be maintained or understood by more than just myself.
Code:
def shorten_hostnames(machines):
    # split items
    d = {k: v.split('.') for k, v in machines.items()}
    # trim end
    while all(d.values()):
        if not len(set([v[-1] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[:-1] for k, v in d.items()}
    # trim start
    while all(d.values()):
        if not len(set([v[0] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[1:] for k, v in d.items()}
    # join items
    d = {k: '.'.join(v) for k, v in d.items()}
    # return shortened hostnames
    return d
Sample Input:
machines = {'a.ace.site.info': 'a.ace.site.info',
            'b.ace.site.info': 'b.ace.site.info',
            'a.bob.site.info': 'a.bob.site.info',
            'b.bob.site.info': 'b.bob.site.info',}
Output:
>>> for k, v in shorten_hostnames(machines).items():
...     print k, '-->', v
b.ace.site.info --> b.ace
a.ace.site.info --> a.ace
b.bob.site.info --> b.bob
a.bob.site.info --> a.bob
Where and why I need your help:
I was trying to embed a function that would do the trimming from either end based on supplied parameters, but I can't figure out how to modify the slice notation to trim from either the start or the end of the list. I am sure there is a simple solution I am overlooking, either with slice notation or something else.
Gotchas:
A few edge cases need to be mentioned. If only one hostname is passed into the function (for example machines = {'a.ace.site.info': 'a.ace.site.info'}), it should return only the first part (a in the example). There should be no duplicate results in the final answer. The hostnames can also have different lengths from each other (a different number of segments).
Afterthought:
Once a proper solution can be found, I will edit the question title and tags to better reflect how this can apply to future visitors of the site. For instance, if slice notation is the solution (and it can be applied dynamically) I would probably modify the question to reflect that dynamic slice notation is the topic of the question.
More Sample Input and Expected Output:
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.info': 'ace.b.site.info',}
# Out
ace.b.site.info --> b
ace.a.site.info --> a
# In
machines = {'a.ace.site.info': 'a.ace.site.info',}
# Out
a.ace.site.info --> a
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.com': 'ace.b.site.com',}
# Out
ace.b.site.com --> b.site.com
ace.a.site.info --> a.site.info

At the very least, split out the values and keys into separate lists, then process just the values before reconstituting your dictionary, and use a short loop to pick an index for start and end trimming:
def shorten_hostnames(machines):
    keys, values = zip(*machines.items())
    values = [v.split('.') for v in values]
    for i, s in ((-1, slice(-1)), (0, slice(1, None))):
        while all(values):
            if not len(set(v[i] for v in values)) == 1:
                break
            if any(len(v) <= 1 for v in values):
                break
            values = [v[s] for v in values]
    return {k: '.'.join(v) for k, v in zip(keys, values)}
I'd use a utility function to remove a common prefix from a sequence of sequences, then pass in reversed sequences to remove trailing parts:
# Python 2 code: izip_longest is named zip_longest in Python 3,
# and filter() on a tuple only returns a tuple in Python 2
from itertools import dropwhile, izip_longest
def remove_common_prefix(*parts):
    # always leaves a last common element in place
    zipped = izip_longest(*(p[:-1] for p in parts), fillvalue=None)
    stripped = dropwhile(lambda v: len(set(v)) == 1, zipped)
    res = [filter(None, part) + (old[-1],) for part, old in zip(zip(*stripped), parts)]
    # filtered everything away? Then return just the last parts
    return res or [p[-1:] for p in parts]
def shorten_hostnames(machines):
    # edge-case; faster to just return the first part
    if len(machines) == 1:
        return {k: v.split('.', 1)[0] for k, v in machines.items()}
    keys, values = zip(*machines.items())  # for easier processing and re-assembling
    parts = remove_common_prefix(*(v.split('.')[::-1] for v in values))
    parts = remove_common_prefix(*(part[::-1] for part in parts))
    return {k: '.'.join(v) for k, v in zip(keys, parts)}
This handles both your input and names of uneven length:
>>> shorten_hostnames(machines)
{'b.ace.site.info': 'b.ace', 'a.ace.site.info': 'a.ace', 'b.bob.site.info': 'b.bob', 'a.bob.site.info': 'a.bob'}
>>> shorten_hostnames({'foo': 'a.ace.site', 'bar': 'a.ace.site.info'})
{'foo': 'site', 'bar': 'site.info'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info', 'ace.b.site.info': 'ace.b.site.info'})
{'ace.b.site.info': 'b', 'ace.a.site.info': 'a'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info'})
{'ace.a.site.info': 'ace'}

def shorten_hostnames(machines):
    def trim(hostnames, head=True):
        while all(len(v) > 1 for v in hostnames) and len(set(v[0 if head else -1] for v in hostnames)) == 1:
            hostnames[:] = [v[1:] if head else v[:-1] for v in hostnames]
    keys, values = zip(*machines.items())
    hostnames = [v.split('.') for v in values]
    trim(hostnames, False)
    trim(hostnames)
    return {k: '.'.join(v) for k, v in zip(keys, hostnames)}
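The "dynamic slice" the question asks about can be expressed with slice objects: slice(None, -1) stands for v[:-1] and slice(1, None) for v[1:]. What follows is only a minimal Python 3 sketch of that idea. The helper name trim_common and its from_end parameter are made up for illustration, and the "no duplicate results" gotcha is not handled.

```python
def trim_common(parts, from_end=False):
    """Drop leading (or trailing) segments shared by every list in parts."""
    index = -1 if from_end else 0
    # slice(None, -1) is v[:-1]; slice(1, None) is v[1:]
    cut = slice(None, -1) if from_end else slice(1, None)
    parts = [list(p) for p in parts]
    # stop as soon as any name is down to one segment or the segments differ
    while all(len(p) > 1 for p in parts) and len({p[index] for p in parts}) == 1:
        parts = [p[cut] for p in parts]
    return parts


def shorten_hostnames(machines):
    keys = list(machines)
    parts = [machines[k].split('.') for k in keys]
    parts = trim_common(parts, from_end=True)  # trim the common suffix first
    parts = trim_common(parts)                 # then the common prefix
    return {k: '.'.join(p) for k, p in zip(keys, parts)}


print(shorten_hostnames({'ace.a.site.info': 'ace.a.site.info',
                         'ace.b.site.com': 'ace.b.site.com'}))
# {'ace.a.site.info': 'a.site.info', 'ace.b.site.com': 'b.site.com'}
```

Because the index and the slice are chosen once from a single flag, the loop body is written only once for both directions.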

Related

Split list of strings based on occurrence count

I have list of strings as input
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
expected outputs
str_list1_output = [e1, e2] [g1, o1]
str_list2_output = [e1] [o1] [e2] [g1]
I wrote below code but it is failing for str_list2
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
entities, properties = zip(*(s.split(":") for s in str_list1))
print(set(entities)) # {'e1', 'e2'}
print(set(properties)) # {'g1', 'o1'}
I have 2 questions:
What is wrong with the code? The code above returns the same kind of output for str_list2 as well, instead of 4 separate lists "[e1] [o1] [e2] [g1]".
Can I use zip to generate 2 separate sets of entities and properties directly, instead of converting the lists to sets when printing them?

Not sure if this is an exact answer to your question (namely, why the code you have tried is not giving the result you want), but I believe the code below will give the desired result, and perhaps the insight needed to answer your question can be found here:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
str_list3 = ['e1:g1', 'e1:o1', 'e2:o1']
def foo(str_list):
    from collections import defaultdict
    dct = defaultdict(list)
    for s in str_list:
        entity, property = s.split(":")
        dct[entity] += [property]
    for k in dct:
        dct[k] = tuple(dct[k])
    rdct = defaultdict(list)
    for k, v in dct.items():
        rdct[v] += [k]
    L = []
    for k, v in rdct.items():
        L += [v, list(k)]
    print(L)
foo(str_list1)
foo(str_list2)
foo(str_list3)
Output:
[['e1', 'e2'], ['g1', 'o1']]
[['e1'], ['o1'], ['e2'], ['g1']]
[['e1'], ['g1', 'o1'], ['e2'], ['o1']]

Get specific key of a nested iterable and check if its value exists in a list

I am trying to access a specific key in a nested dictionary, then match its value to a string in a list. If the string in the list contains the string in the dictionary value, I want to override the dictionary value with the list value. Below is an example.
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'
}
The key I'm looking for is B, the objective is to override string6 with string6~, string4 with string4~, and so on for all B keys found in the my_iterable.
I have written a function to compute the Levenshtein distance between two strings, but I am struggling to write an efficient way to override the values of the keys.
def find_and_replace(key, dictionary, original_list):
    for k, v in dictionary.items():
        if k == key:
            # function to check if original_list item contains v
            yield v
        elif isinstance(v, dict):
            for result in find_and_replace(key, v, original_list):
                yield result
        elif isinstance(v, list):
            for d in v:
                if isinstance(d, dict):
                    for result in find_and_replace(key, d, original_list):
                        yield result
If I call
updated_dict = find_and_replace('B', my_iterable, my_list)
I want updated_dict to return the below:
{'A':'xyz',
'B':'string6~',
'C':[{'B':'string4~', 'D':'123'}],
'E':[{'F':'321', 'B':'string1~'}],
'G':'jkl'
}
Is this the right approach to the most efficient solution, and how can I modify it to return a dictionary with the updated values for B?

You can use the code below. I have assumed that the structure of the input dict stays the same throughout.
# Input List
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
# Input Dict
# Removed duplicate key "B" from the dict
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl',
}
# setting search key
search_key = "B"
# Main code
for i, v in my_iterable.items():
    if i == search_key:
        if not isinstance(v,list):
            search_in_list = [i for i in my_list if v in i]
            if search_in_list:
                my_iterable[i] = search_in_list[0]
    else:
        try:
            for j, k in v[0].items():
                if j == search_key:
                    search_in_list = [l for l in my_list if k in l]
                    if search_in_list:
                        v[0][j] = search_in_list[0]
        except:
            continue
# print output
print (my_iterable)
# Result -> {'A': 'xyz', 'B': 'string6~', 'C': [{'B': 'string4~', 'D': '123'}], 'E': [{'F': '321', 'B': 'string1~'}], 'G': 'jkl'}
The above has scope for optimization, for example by using a list comprehension or a function.
I hope this helps!

In some cases, if your nesting is fairly complex, you can treat the dictionary as a JSON string and do all sorts of replacements on it. It's probably not what people would call very Pythonic, but it gives you a little more flexibility.
import re, json
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'}
json_str = json.dumps(my_iterable, ensure_ascii=False)
for val in my_list:
    json_str = re.sub(re.compile(f"""("[B]":\\W?")({val[:-1]})(")"""), r"\1" + val + r"\3", json_str)
my_iterable = json.loads(json_str)
print(my_iterable)
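If the nesting depth is not known in advance, a recursive function that rebuilds the structure is another option. This is only a sketch under a few assumptions: the name replace_key is hypothetical, "match" means plain substring containment (v in candidate) as in the answers above, and the first matching list entry wins.

```python
def replace_key(obj, key, candidates):
    """Return a copy of obj in which every string stored under `key` is
    replaced by the first candidate containing it (unchanged if none match)."""
    if isinstance(obj, dict):
        result = {}
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                # first candidate that contains v, or v itself as a fallback
                result[k] = next((c for c in candidates if v in c), v)
            else:
                result[k] = replace_key(v, key, candidates)
        return result
    if isinstance(obj, list):
        return [replace_key(item, key, candidates) for item in obj]
    return obj  # scalars are returned as-is


my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A': 'xyz',
               'B': 'string6',
               'C': [{'B': 'string4', 'D': '123'}],
               'E': [{'F': '321', 'B': 'string1'}],
               'G': 'jkl'}
updated_dict = replace_key(my_iterable, 'B', my_list)
print(updated_dict)
```

Unlike the in-place loop, this leaves my_iterable untouched and returns a new dictionary, which is what the question asks updated_dict to hold.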

Reversing the key values in a dictionary (advanced reverse string in Python)

So what I was trying to do was output the string "33k22k11k", which is the last value followed by the reversed last key, then the second-to-last value followed by the reversed second-to-last key, and so on. I'm not sure how to get the reversed key for the specific loop iteration I am in. With the code I currently have, I get the following output:
dict = {"k1":1, "k2":2, "k3":3}
current=""
current_k=""
for k,v in dict.items():
    for i in k:
        current_k=i+current_k
    current=str(v)+current_k+current
print(current)
print(current_k)
33k2k1k22k1k11k
3k2k1k

Edited
First of all, if you are on Python < 3.7, dict is not guaranteed to keep the insertion order of items (CPython 3.6 happens to preserve it as an implementation detail). You might want to use collections.OrderedDict for your purpose.
d = {"k1":1, "k2":2, "k3":3}
d.keys()
# dict_keys(['k2', 'k1', 'k3'])
whereas,
from collections import OrderedDict
d = OrderedDict()
d['k1'] = 1
d['k2'] = 2
d['k3'] = 3
d.keys()
# odict_keys(['k1', 'k2', 'k3'])
With our new d, you can either add the key and values and reverse it:
res = ''
for k, v in d.items():
    res += str(k) + str(v)
res[::-1]
# '33k22k11k'
or reversely iterate:
res = ''
for k, v in reversed(d.items()):
    res += str(v)[::-1] + str(k)[::-1]
res
# '33k22k11k'

I may be wrong but it seems like you would want to reset the value of current_k each time you access a new key
dict = {"k1":1, "k2":2, "k3":3}
current=""
for k,v in dict.items():
    current_k=""
    for i in k:
        current_k=i+current_k
    current=str(v)+current_k+current
print(current)
print(current_k)

Why not simply do:
print(''.join([a+str(b) for a,b in dict.items()])[::-1])
Output:
"33k22k11k"
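But if the values are different from the keys, do the following (in Python 3, dict.items() must be converted to a list before it can be sliced, and the key has to be reversed as well):
print(''.join([str(b)[::-1]+a[::-1] for a,b in list(dict.items())[::-1]]))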

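You can use the Python map function to create the reversed string (using an f-string) for each key/value pair and then join the results. The pairs have to be visited in reverse order as well, otherwise the result is 11k22k33k:
dict1 = {"k1":1, "k2":2, "k3":3}
result = "".join(map(lambda k, v: f'{k}{v}'[::-1], reversed(list(dict1.keys())), reversed(list(dict1.values()))))
print(result)
Output:
33k22k11k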

You can do something like this perhaps:
dict = {"k1":1, "k2":2, "k3":3}
print("".join(list(reversed([str(v)+"".join(reversed(k)) for k, v in dict.items()]))))
Output:
33k22k11k
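The approaches above can be condensed into one small function that follows the wording of the question literally (value first, then the reversed key, walking the pairs from last to first). This is a sketch only: the name reversed_pairs is made up for illustration, and it relies on dict preserving insertion order, which is guaranteed from Python 3.7.

```python
def reversed_pairs(d):
    """Join str(value) + reversed key for each pair, last pair first."""
    # list(...) lets us reverse the items on any Python 3 version
    return "".join(str(v) + k[::-1] for k, v in reversed(list(d.items())))


print(reversed_pairs({"k1": 1, "k2": 2, "k3": 3}))
# 33k22k11k
```

Note that the value itself is not reversed here, so a multi-digit value such as 12 stays "12". Reverse str(v) as well if the whole string should be mirrored.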

Find duplicates for mixed type values in dictionaries

I would like to recognize and group duplicate values in a dictionary. To do this I build a pseudo-hash (or rather, a signature) of my data set as follows:
from collections import defaultdict
from pickle import dumps
taxonomy = {}
binder = defaultdict(list)
for key, value in ds.items():
    signature = dumps(value)
    taxonomy[signature] = value
    binder[signature].append(key)
For a concrete use-case see this question.
Unfortunately I realized that if the following statement is True:
>>> ds['key1'] == ds['key2']
True
This one is not always True anymore:
>>> dumps(ds['key1']) == dumps(ds['key2'])
False
I noticed that the key order in the dumped output differs between the two dicts. If I copy/paste the output of ds['key1'] and ds['key2'] into new dictionaries, the comparison succeeds.
As an overkill alternative I could traverse my dataset recursively and replace dict instances with OrderedDict:
import collections
import copy
def faithfulrepr(od):
    od = copy.deepcopy(od)
    # collections.Mapping lives in collections.abc on Python 3.3+
    if isinstance(od, collections.Mapping):
        res = collections.OrderedDict()
        for k, v in sorted(od.items()):
            res[k] = faithfulrepr(v)
        return repr(res)
    if isinstance(od, list):
        for i, v in enumerate(od):
            od[i] = faithfulrepr(v)
        return repr(od)
    return repr(od)
>>> faithfulrepr(ds['key1']) == faithfulrepr(ds['key2'])
True
I am worried about this naive approach because I do not know whether I cover all the possible situations.
What other (generic) alternative can I use?

The first thing is to remove the call to deepcopy which is your bottleneck here:
def faithfulrepr(ds):
    if isinstance(ds, collections.Mapping):
        res = collections.OrderedDict(
            (k, faithfulrepr(v)) for k, v in sorted(ds.items())
        )
    elif isinstance(ds, list):
        res = [faithfulrepr(v) for v in ds]
    else:
        res = ds
    return repr(res)
However, sorted and repr have their drawbacks:
you can't truly compare custom types;
you can't use mappings with different types of keys.
So the second thing is to get rid of faithfulrepr and compare objects with __eq__:
binder, values = [], []
for key, value in ds.items():
    try:
        index = values.index(value)
    except ValueError:
        values.append(value)
        binder.append([key])
    else:
        binder[index].append(key)
grouped = dict(zip(map(tuple, binder), values))
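To see the equality-based grouping in action, the loop can be wrapped in a function and run on a small data set. The function name group_equal_values and the sample ds below are invented for illustration. The comparison is a linear scan, so this sketch takes quadratic time in the number of distinct values.

```python
def group_equal_values(ds):
    """Group the keys of ds whose values compare equal with ==."""
    groups = []  # list of (value, [keys]) pairs, in first-seen order
    for key, value in ds.items():
        for known, keys in groups:
            if known == value:      # relies on __eq__, not on repr or pickle
                keys.append(key)
                break
        else:
            groups.append((value, [key]))
    return {tuple(keys): value for value, keys in groups}


# hypothetical sample data: key1 and key2 are equal but built in different key order
ds = {'key1': {'a': 1, 'b': [1, 2]},
      'key2': {'b': [1, 2], 'a': 1},
      'key3': 'x'}
grouped = group_equal_values(ds)
print(grouped)
```

Since dict equality ignores key order, key1 and key2 end up in the same group even though their pickled forms may differ.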

The best way to filter a dictionary in Python [duplicate]

This question already has answers here:
How to filter a dictionary according to an arbitrary condition function?
(7 answers)
Closed 7 years ago.
I have a dictionary of string keys and float values.
mydict = {}
mydict["joe"] = 20
mydict["bill"] = 20.232
mydict["tom"] = 0.0
I want to filter the dictionary to only include pairs that have a value greater than zero.
In C#, I would do something like this:
dict = dict.Where(r=>r.Value > 0);
What is the equivalent code in Python?

d = dict((k, v) for k, v in d.iteritems() if v > 0)
In Python 2.7 and up, there's nicer syntax for this:
d = {k: v for k, v in d.items() if v > 0}
Note that this is not strictly a filter because it does create a new dictionary.

Assuming your original dictionary is d1 you could use something like:
d2 = dict((k, v) for k, v in d1.items() if v > 0)
By the way, note that dict is the name of a built-in type in Python, so it is best not to reuse it as a variable name.

The dict constructor can take a sequence of (key,value) pairs, and the iteritems method of a dict produces a sequence of (key,value) pairs. It's two great tastes that taste great together.
newDict = dict([item for item in oldDict.iteritems() if item[1] > 0])

foo = {}
foo["joe"] = 20
foo["bill"] = 20.232
foo["tom"] = 0.0
bar = dict((k,v) for k,v in foo.items() if v>0)
dict is the name of a built-in type in Python (not a keyword, but still best left alone), so I replaced it with foo.

First of all, you should not use the built-in name dict as a variable name: it shadows the built-in and prevents you from referencing the dict class in the current or enclosed scope.
d = {}
d["joe"] = 20
d["bill"] = 20.232
d["tom"] = 0.0
# create a generator expression and feed it to the dict constructor
# this avoids building the intermediate list that the "[...]" variant creates
d2 = dict(((k, v) for (k, v) in d.iteritems() if v > 0))
print d2
# {'bill': 20.232, 'joe': 20}
Alternatively, you could just create the generator and iterate over it directly. This is more like a "filter", because the generator only references the values in the original dict instead of making a subset copy, and hence is more efficient than creating a new dictionary:
filtered = ((k, v) for (k, v) in d.iteritems() if v > 0)
print filtered
# <generator object <genexpr> at 0x034A18F0>
for k, v in filtered:
    print k, v
# bill 20.232
# joe 20

Try:
y = filter(lambda x:dict[x] > 0.0,dict.keys())
The lambda is fed the keys of the dict and compares each key's value against the criterion, so the result contains only the acceptable keys, not the key/value pairs.
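On Python 3, where iteritems() no longer exists, the approaches from the answers above might be condensed as follows. This is a minimal sketch; the names positive and via_filter are illustrative only.

```python
mydict = {"joe": 20, "bill": 20.232, "tom": 0.0}

# dict comprehension: builds a new dictionary with the matching pairs
positive = {name: value for name, value in mydict.items() if value > 0}

# filter() over the (key, value) pairs, then back into a dict
via_filter = dict(filter(lambda item: item[1] > 0, mydict.items()))

print(positive)
print(via_filter)
```

Both variants create a new dictionary and leave mydict unchanged. The comprehension is usually the more readable of the two.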
