Split list of strings based on occurrence count

I have lists of strings as input:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
Expected outputs:
str_list1_output = [e1, e2] [g1, o1]
str_list2_output = [e1] [o1] [e2] [g1]
I wrote the code below, but it fails for str_list2:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
entities, properties = zip(*(s.split(":") for s in str_list1))
print(set(entities)) # {'e1', 'e2'}
print(set(properties)) # {'g1', 'o1'}
I have two questions:
What is wrong with the code? It returns the same kind of output for str_list2 as well, instead of four separate lists "[e1] [o1] [e2] [g1]".
Can I use zip to generate two separate sets of entities and properties directly, instead of converting the lists to sets when printing them?

Not sure if this is an exact answer to your question (namely, why the code you have tried is not giving the result you want), but I believe the code below will give the desired result, and perhaps the insight needed to answer your question can be found here:
str_list1 = ['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']
str_list2 = ['e1:o1', 'e2:g1']
str_list3 = ['e1:g1', 'e1:o1', 'e2:o1']
def foo(str_list):
    from collections import defaultdict
    dct = defaultdict(list)
    for s in str_list:
        entity, property = s.split(":")
        dct[entity] += [property]
    for k in dct:
        dct[k] = tuple(dct[k])
    rdct = defaultdict(list)
    for k, v in dct.items():
        rdct[v] += [k]
    L = []
    for k, v in rdct.items():
        L += [v, list(k)]
    print(L)
foo(str_list1)
foo(str_list2)
foo(str_list3)
Output:
[['e1', 'e2'], ['g1', 'o1']]
[['e1'], ['o1'], ['e2'], ['g1']]
[['e1'], ['g1', 'o1'], ['e2'], ['o1']]
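On the first question: `zip(*...)` followed by `set(...)` only collects the distinct entities and the distinct properties, so the information about which entity was paired with which property is lost. That is why both inputs produce one set of entities and one set of properties. The sketch below is a variant of the approach above that returns the grouping instead of printing it. The name `group_entities` is made up for illustration, and it assumes every string contains exactly one ":" and that two entities belong together only if they list the same properties in the same order.

```python
from collections import defaultdict


def group_entities(str_list):
    """Group entities that share exactly the same sequence of properties."""
    # entity -> properties, in the order they appear in the input
    props_by_entity = defaultdict(list)
    for s in str_list:
        entity, prop = s.split(":")
        props_by_entity[entity].append(prop)

    # tuple of properties -> entities that have exactly those properties
    entities_by_props = defaultdict(list)
    for entity, props in props_by_entity.items():
        entities_by_props[tuple(props)].append(entity)

    # flatten into [entities, properties, entities, properties, ...]
    result = []
    for props, entities in entities_by_props.items():
        result += [entities, list(props)]
    return result


print(group_entities(['e1:g1', 'e2:g1', 'e1:o1', 'e2:o1']))
print(group_entities(['e1:o1', 'e2:g1']))
```

If the order of properties should not matter, sorting the properties before building the tuple key would be one way to handle it.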

Related

How can I update all entries of a dictionary in Python with certain logic?

I have a dictionary whose key-value pairs map a person's name to a list of domain names, like so:
dictionary = {
'Trent':['help.google.com', 'smooth.google.com', 'bob.google.com'],
'Bill':['help.google.com', 'smooth.google.com', 'bob.google.com', 'trent.awesome.net']}
I want the dictionary to contain only the parent domain (instead of smooth.google.com, just google.com). Ordinarily, with a regular list, I'd use code like this to get the parent domain names:
domains = ['help.google.com', 'smooth.google.com', 'bob.google.com', 'trent.awesome.net']
parents = []
for domain in domains:
    parents.append(domain[domain.index('.') + 1:])
Now I'm trying to combine that logic with logic that makes sure that in the dictionary, among values no matter the key, there are no duplicates using Counter and list comprehension. That code is this:
cnt = Counter()
for idx in result.values():
    cnt.update(idx)
res = {idx: [key for key in j if cnt[key] == 1]
       for idx, j in result.items()}
When I try to combine the logic, the BEST I'll get is an empty list next to the name. Using the above example of a dictionary the result will be
'Trent':[]
I tried using two for loops like so:
cnt = Counter()
for idx in result.values():
    for x in idx:
        x = x[x.index('.') + 1:]
    cnt.update(idx)
res = {idx: [key for key in j if cnt[key] == 1]
       for idx, j in result.items()}
Any help is greatly appreciated. I hope I've provided sufficient detail in my question.

This script reduces each list in the dictionary to its distinct parent domains:
dictionary = {
'Trent':['help.google.com', 'smooth.google.com', 'bob.google.com'],
'Bill':['help.google.com', 'smooth.google.com', 'bob.google.com', 'trent.awesome.net']}
out = {k: [*set(vv.split('.', maxsplit=1)[-1] for vv in v)] for k, v in dictionary.items()}
print(out)
Prints:
{'Trent': ['google.com'], 'Bill': ['google.com', 'awesome.net']}
EDIT: To filter out duplicates across all keys, you can use this:
out, seen = {}, set()
for k, v in dictionary.items():
    for vv in v:
        domain = vv.split('.', maxsplit=1)[-1]
        if domain not in seen:
            out.setdefault(k, []).append(domain)
            seen.add(domain)
print(out)
Prints:
{'Trent': ['google.com'], 'Bill': ['awesome.net']}
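As for why the two-loop attempt in the question changes nothing: `x = x[x.index('.') + 1:]` only rebinds the loop variable `x`, so the lists themselves are never modified and `cnt.update(idx)` still counts the full subdomain names. The sketch below shows one possible reading of the requirement, namely keeping a parent domain only if it appears under exactly one person (the `cnt[key] == 1` rule from the question). This differs from the first-seen rule used in the EDIT above. The function names `parent_domain` and `unique_parent_domains` are made up for illustration.

```python
from collections import Counter


def parent_domain(domain):
    # Everything after the first dot; a name without a dot is returned unchanged.
    return domain.split(".", maxsplit=1)[-1]


def unique_parent_domains(mapping):
    # Step 1: reduce each person's list to distinct parent domains,
    # keeping first-seen order (dict.fromkeys drops repeats).
    parents = {
        name: list(dict.fromkeys(parent_domain(d) for d in domains))
        for name, domains in mapping.items()
    }
    # Step 2: count under how many people each parent domain appears.
    counts = Counter(p for plist in parents.values() for p in plist)
    # Step 3: keep only the domains that belong to exactly one person.
    return {
        name: [p for p in plist if counts[p] == 1]
        for name, plist in parents.items()
    }


dictionary = {
    'Trent': ['help.google.com', 'smooth.google.com', 'bob.google.com'],
    'Bill': ['help.google.com', 'smooth.google.com', 'bob.google.com', 'trent.awesome.net'],
}
print(unique_parent_domains(dictionary))
```

Under this reading 'Trent' ends up with an empty list, because google.com is shared with 'Bill'.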

Get specific key of a nested iterable and check if its value exists in a list

I am trying to access a specific key in a nested dictionary, then match its value to a string in a list. If the string in the list contains the string in the dictionary value, I want to override the dictionary value with the list value. Below is an example.
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'
}
The key I'm looking for is B. The objective is to override string6 with string6~, string4 with string4~, and so on for all B keys found in my_iterable.
I have written a function to compute the Levenshtein distance between two strings, but I am struggling to write an efficient way to override the values of the keys.
def find_and_replace(key, dictionary, original_list):
    for k, v in dictionary.items():
        if k == key:
            # function to check if original_list item contains v
            yield v
        elif isinstance(v, dict):
            for result in find_and_replace(key, v, original_list):
                yield result
        elif isinstance(v, list):
            for d in v:
                if isinstance(d, dict):
                    for result in find_and_replace(key, d, original_list):
                        yield result
If I call
updated_dict = find_and_replace('B', my_iterable, my_list)
I want updated_dict to return the below:
{'A':'xyz',
'B':'string6~',
'C':[{'B':'string4~', 'D':'123'}],
'E':[{'F':'321', 'B':'string1~'}],
'G':'jkl'
}
Is this the right approach to the most efficient solution, and how can I modify it to return a dictionary with the updated values for B?

You can use the code below. I have assumed that the structure of the input dict stays the same throughout.
# Input List
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
# Input Dict
# Removed duplicate key "B" from the dict
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl',
}
# setting search key
search_key = "B"
# Main code
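for i, v in my_iterable.items():
    if i == search_key:
        # top-level match: replace the value directly
        if not isinstance(v, list):
            search_in_list = [item for item in my_list if v in item]
            if search_in_list:
                my_iterable[i] = search_in_list[0]
    else:
        # other keys: look inside the first dict of a list value
        try:
            for j, k in v[0].items():
                if j == search_key:
                    search_in_list = [l for l in my_list if k in l]
                    if search_in_list:
                        v[0][j] = search_in_list[0]
        except (AttributeError, IndexError, KeyError, TypeError):
            # value is not a list of dicts; skip it
            continue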
# print output
print (my_iterable)
# Result -> {'A': 'xyz', 'B': 'string6~', 'C': [{'B': 'string4~', 'D': '123'}], 'E': [{'F': '321', 'B': 'string1~'}], 'G': 'jkl'}
The code above could be tidied up further with list comprehensions or by wrapping it in a function.
I hope this helps!

In some cases, if your nesting is fairly complex, you can treat the dictionary as a JSON string and do all sorts of replacements on it. It's probably not what people would call very Pythonic, but it gives you a little more flexibility.
import re, json
my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A':'xyz',
'B':'string6',
'C':[{'B':'string4', 'D':'123'}],
'E':[{'F':'321', 'B':'string1'}],
'G':'jkl'}
json_str = json.dumps(my_iterable, ensure_ascii=False)
for val in my_list:
    json_str = re.sub(re.compile(f"""("[B]":\\W?")({val[:-1]})(")"""), r"\1" + val + r"\3", json_str)
my_iterable = json.loads(json_str)
print(my_iterable)
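For nesting of arbitrary depth, a recursive function that builds and returns a new structure may be closer to what the question asks for. The following is only a sketch: the name `replace_key_values` is made up, it uses plain substring containment (as the first answer does) rather than Levenshtein distance, and it takes the first list entry that contains the old value.

```python
def replace_key_values(obj, key, candidates):
    """Return a copy of obj in which every string stored under `key`
    is replaced by the first candidate that contains it."""
    if isinstance(obj, dict):
        result = {}
        for k, v in obj.items():
            if k == key and isinstance(v, str):
                # fall back to the original value when nothing matches
                result[k] = next((c for c in candidates if v in c), v)
            else:
                result[k] = replace_key_values(v, key, candidates)
        return result
    if isinstance(obj, list):
        return [replace_key_values(item, key, candidates) for item in obj]
    # strings, numbers, etc. are returned unchanged
    return obj


my_list = ['string1~', 'string2~', 'string3~', 'string4~', 'string5~', 'string6~']
my_iterable = {'A': 'xyz',
               'B': 'string6',
               'C': [{'B': 'string4', 'D': '123'}],
               'E': [{'F': '321', 'B': 'string1'}],
               'G': 'jkl'}
updated_dict = replace_key_values(my_iterable, 'B', my_list)
print(updated_dict)
```

Because it returns a new dictionary, the original `my_iterable` is left untouched.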

Finding all the common and sub common values in a dictionary

I am new to using dictionaries in Python and have a simple problem at hand. I have a dictionary named "Input".
Input={'VAR1':['K1','K2','K3','K4','K5...'],
'VAR2':['K3','K4',...],
'VAR3':['K2','K4','K5',...]}
The number of keys in the dictionary "Input" can vary. The output I want is a list of all common values, plus the sub-common values as a dictionary.
'K4' is common to all the lists (for all keys).
'K3' is only present in the lists with keys 'VAR1' and 'VAR2'.
So it helps if I have the corresponding keys.
Output:
Common_Value=['K4',....]
Subcommon_Values1=['VAR1':['K3....'],'VAR2':['K3....']]
Subcommon_values2=['VAR1':['K5',...],'VAR3':['K5',....]]
Can anyone help me with this?
Thank you

This will get you all of the common values:
sect = None
for k,v in Input.items():
    if sect is None:
        sect = set( v )
    else:
        sect = sect.intersection( set(v) )
Common_Value = list( sect )
Until you clarify how your different SubCommonValues results differ and what their actual structure is (you seem to have keys inside lists), I can't be sure whether this is what you want:
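# all_values (renamed from "all" to avoid shadowing the builtin) is the union of every list
all_values = None
for k,v in Input.items():
    if all_values is None:
        all_values = set( v )
    else:
        all_values = all_values.union( set(v) )
# values that are not common to every key
diff = all_values.difference( sect )
uncom = { x:list() for x in diff }
for x in diff:
    for k,v in Input.items():
        if x in v:
            uncom[x].append(k)
# group the values by the tuple of keys they appear under
grps = {}
for k,v in uncom.items():
    kv = tuple(v)
    if kv not in grps:
        grps[kv] = [k]
    else:
        grps[kv].append(k)
for k,v in grps.items():
    print({ x:v for x in k })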
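The same idea can be written more compactly with `set.intersection` and `collections.defaultdict`. This is a sketch under assumptions, not a definitive answer: the function name `common_and_subcommon` is made up, the sample data fills in the "..." from the question with nothing, and "sub-common" is taken to mean a value shared by more than one key but not by all of them.

```python
from collections import defaultdict


def common_and_subcommon(mapping):
    value_sets = {key: set(values) for key, values in mapping.items()}
    # values present under every key
    common = set.intersection(*value_sets.values()) if value_sets else set()

    # value -> keys whose list contains it (common values excluded)
    keys_by_value = defaultdict(list)
    for key, values in value_sets.items():
        for value in values - common:
            keys_by_value[value].append(key)

    # tuple of keys -> values shared by exactly those keys
    groups = defaultdict(list)
    for value, keys in keys_by_value.items():
        if len(keys) > 1:  # assumption: values unique to one key are ignored
            groups[tuple(keys)].append(value)

    return sorted(common), {keys: sorted(vals) for keys, vals in groups.items()}


Input = {'VAR1': ['K1', 'K2', 'K3', 'K4', 'K5'],
         'VAR2': ['K3', 'K4'],
         'VAR3': ['K2', 'K4', 'K5']}
common, subcommon = common_and_subcommon(Input)
print(common)
print(subcommon)
```

Each key of the second result is the tuple of dictionary keys that share the listed values, which can be reshaped into whatever output structure is actually needed.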

You can do something like:
result=[i for i in Input if "K4" in Input[i]]
Example:
>>> Input={'VAR1':['K1','K2','K3','K4','K5'],'VAR2':['K3','K5'],'VAR3':['K2','K4','K5']}
>>> result=[i for i in Input if "K4" in Input[i]]
>>> result
['VAR1', 'VAR3']
EDIT:
Maybe something like:
commonValues=['K4', 'K2']
result=[i for i in Input if set(commonValues).issubset(set(Input[i]))]
This returns the list of keys in your Input dict whose lists contain every element of the commonValues list.

Remove common elements of hostnames (shorten hostnames) - DRY

Explanation:
I wrote the following function to shorten the hostnames of machines used in production. The names have been changed here, but the same structure and format has been preserved. The code below is clunky, and I would like to make the code DRY (Don't Repeat Yourself). Readability is also important, as this is code that might need to be maintained or understood by more than just myself.
Code:
def shorten_hostnames(machines):
    # split items
    d = {k: v.split('.') for k, v in machines.items()}
    # trim end
    while all(d.values()):
        if not len(set([v[-1] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[:-1] for k, v in d.items()}
    # trim start
    while all(d.values()):
        if not len(set([v[0] for v in d.values()])) == 1:
            break
        if not all(len(v) > 1 for v in d.values()):
            break
        d = {k: v[1:] for k, v in d.items()}
    # join items
    d = {k: '.'.join(v) for k, v in d.items()}
    # return shortened hostnames
    return d
Sample Input:
machines = {'a.ace.site.info': 'a.ace.site.info',
'b.ace.site.info': 'b.ace.site.info',
'a.bob.site.info': 'a.bob.site.info',
'b.bob.site.info': 'b.bob.site.info',}
Output:
>>> for k, v in shorten_hostnames(machines).items():
...     print k, '-->', v
b.ace.site.info --> b.ace
a.ace.site.info --> a.ace
b.bob.site.info --> b.bob
a.bob.site.info --> a.bob
Where and why I need your help:
I was trying to embed a function that would do the trimming from either end based on supplied parameters, but I can't figure out how to modify the slice notation to trim from either the start or the end of the list. I am sure there is a simple solution I am overlooking, either with slice notation or something else.
Gotchas:
There are a few things that need to be mentioned here that are what you would call a "Gotcha". If only one hostname is passed into the function, (example machines = {'a.ace.site.info': 'a.ace.site.info'}) it should return only the first part (in the example a). Also - there should be no duplicate results in the final answer. Also - the hostnames can have different lengths from each other (not the same amount of segments)
Afterthought:
Once a proper solution can be found, I will edit the question title and tags to better reflect how this can apply to future visitors of the site. For instance, if slice notation is the solution (and it can be applied dynamically) I would probably modify the question to reflect that dynamic slice notation is the topic of the question.
More Sample Input and Expected Output:
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.info': 'ace.b.site.info',}
# Out
ace.b.site.info --> b
ace.a.site.info --> a
# In
machines = {'a.ace.site.info': 'a.ace.site.info',}
# Out
a.ace.site.info --> a
# In
machines = {'ace.a.site.info': 'ace.a.site.info',
'ace.b.site.com': 'ace.b.site.com',}
# Out
ace.b.site.com --> b.site.com
ace.a.site.info --> a.site.info

At the very least, split out the values and keys into separate lists, then process just the values before reconstituting your dictionary, and use a short loop to pick an index for start and end trimming:
def shorten_hostnames(machines):
    keys, values = zip(*machines.items())
    values = [v.split('.') for v in values]
    for i, s in ((-1, slice(-1)), (0, slice(1, None))):
        while all(values):
            if not len(set(v[i] for v in values)) == 1:
                break
            if any(len(v) <= 1 for v in values):
                break
            values = [v[s] for v in values]
    return {k: '.'.join(v) for k, v in zip(keys, values)}
I'd use a utility function to remove a common prefix from a sequence of sequences, then pass in reversed sequences to remove trailing parts:
# Python 2 code; on Python 3, import zip_longest instead of izip_longest
# and wrap the filter(...) call in tuple(...).
from itertools import dropwhile, izip_longest
def remove_common_prefix(*parts):
    # always leaves a last common element in place
    zipped = izip_longest(*(p[:-1] for p in parts), fillvalue=None)
    stripped = dropwhile(lambda v: len(set(v)) == 1, zipped)
    res = [filter(None, part) + (old[-1],) for part, old in zip(zip(*stripped), parts)]
    # filtered everything away? Then return just the last parts
    return res or [p[-1:] for p in parts]
def shorten_hostnames(machines):
    # edge-case; faster to just return the first part
    if len(machines) == 1:
        return {k: v.split('.', 1)[0] for k, v in machines.items()}
    keys, values = zip(*machines.items())  # for easier processing and re-assembling
    parts = remove_common_prefix(*(v.split('.')[::-1] for v in values))
    parts = remove_common_prefix(*(part[::-1] for part in parts))
    return {k: '.'.join(v) for k, v in zip(keys, parts)}
This handles both your input and names of uneven length:
>>> shorten_hostnames(machines)
{'b.ace.site.info': 'b.ace', 'a.ace.site.info': 'a.ace', 'b.bob.site.info': 'b.bob', 'a.bob.site.info': 'a.bob'}
>>> shorten_hostnames({'foo': 'a.ace.site', 'bar': 'a.ace.site.info'})
{'foo': 'site', 'bar': 'site.info'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info', 'ace.b.site.info': 'ace.b.site.info'})
{'ace.b.site.info': 'b', 'ace.a.site.info': 'a'}
>>> shorten_hostnames({'ace.a.site.info': 'ace.a.site.info'})
{'ace.a.site.info': 'ace'}

def shorten_hostnames(machines):
    def trim(hostnames, head=True):
        while all(len(v) > 1 for v in hostnames) and len(set(v[0 if head else -1] for v in hostnames)) == 1:
            hostnames[:] = [v[1:] if head else v[:-1] for v in hostnames]
    keys, values = zip(*machines.items())
    hostnames = [v.split('.') for v in values]
    trim(hostnames, False)
    trim(hostnames)
    return {k: '.'.join(v) for k, v in zip(keys, hostnames)}
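The question asked how to choose the slice dynamically. The key is that `v[:-1]` and `v[1:]` can be written as `v[slice(None, -1)]` and `v[slice(1, None)]`, so the slice can be stored in a variable and the trimming loop written once. Below is a Python 3 sketch of that idea, checked against the sample inputs from the question. It is one way to do it, not the only one, and it does not try to guarantee unique results when two hostnames are identical.

```python
def shorten_hostnames(machines):
    keys = list(machines)
    parts = [machines[k].split(".") for k in keys]
    # Each pass is (index to compare, slice that drops that element):
    # trim the common end first, then the common start.
    for index, keep in ((-1, slice(None, -1)), (0, slice(1, None))):
        # Stop as soon as any hostname is down to one segment
        # or the compared segments are no longer all equal.
        while (all(len(p) > 1 for p in parts)
               and len({p[index] for p in parts}) == 1):
            parts = [p[keep] for p in parts]
    return {k: ".".join(p) for k, p in zip(keys, parts)}


machines = {'a.ace.site.info': 'a.ace.site.info',
            'b.ace.site.info': 'b.ace.site.info',
            'a.bob.site.info': 'a.bob.site.info',
            'b.bob.site.info': 'b.bob.site.info'}
for k, v in shorten_hostnames(machines).items():
    print(k, '-->', v)
```

A single hostname is trimmed from the end until one segment is left, which gives the "first part only" behaviour described in the question.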

Combine Python dictionaries that have the same Key name

I have two separate Python lists whose dictionaries share common key names. The second list, recordList, has multiple dictionaries with the same key name, and I want to append them to the matching entries of the first list, clientList. Here are example lists:
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
The end result would be something like this, with the records collected in a new list of dictionaries within clientList:
clientList = [{'client1': [['c1','f1'], [{'rec_1':['t1','s1']},{'rec_2':['t2','s2']}]]}, {'client2': [['c2','f2']]}]
Seems simple enough but I'm struggling to find a way to iterate both of these dictionaries using variables to find where they match.

If you are sure that the key names are identical in both structures, and both are plain dictionaries rather than lists of dictionaries:
clientlist = dict([(k, [clientList[k], recordList[k]]) for k in clientList])
Like here:
>>> a = {1:1,2:2,3:3}
>>> b = {1:11,2:12,3:13}
>>> c = dict([(k,[a[k],b[k]]) for k in a])
>>> c
{1: [1, 11], 2: [2, 12], 3: [3, 13]}

Assuming you want a list of values that correspond to each key in the two lists, try this as a start:
from pprint import pprint
clientList = [{'client1': ['c1','f1']}, {'client2': ['c2','f2']}]
recordList = [{'client1': {'rec_1':['t1','s1']}}, {'client1': {'rec_2':['t2','s2']}}]
clientList.extend(recordList)
outputList = {}
for rec in clientList:
    # each entry is a one-key dict; list() makes this work on Python 3 as well
    k = list(rec.keys())[0]
    v = list(rec.values())[0]
    if k in outputList:
        outputList[k].append(v)
    else:
        outputList[k] = [v,]
pprint(outputList)
It will produce this:
{'client1': [['c1', 'f1'], {'rec_1': ['t1', 's1']}, {'rec_2': ['t2', 's2']}],
'client2': [['c2', 'f2']]}

This could work but I am not sure I understand the rules of your data structure.
# join all the dicts for better lookup and update
clientDict = {}
for d in clientList:
    for k, v in d.items():
        clientDict[k] = clientDict.get(k, []) + v
recordDict = {}
for d in recordList:
    for k, v in d.items():
        recordDict[k] = recordDict.get(k, []) + [v]
for k, v in recordDict.items():
    clientDict[k] = [clientDict[k]] + v
# I don't know why you need a list of one-key dicts but here it is
clientList = [dict([(k, v)]) for k, v in clientDict.items()]
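With the sample data you provided this gives a result close to the one you wanted: the records are appended directly after the client's own list rather than nested in a list of their own, and clients without records keep their original list unwrapped. Hope it helps.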
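To reproduce the exact structure from the question (the client's own list first, then one nested list holding all of its records, with record-less clients wrapped in a single outer list), something along these lines might work. It is a sketch only: the function name `merge_clients` is made up, and it assumes every dictionary in both lists follows the one-key shape shown in the sample data.

```python
from collections import defaultdict


def merge_clients(client_list, record_list):
    # client name -> all record dicts found for that client
    records = defaultdict(list)
    for entry in record_list:
        for client, record in entry.items():
            records[client].append(record)

    merged = []
    for entry in client_list:
        for client, details in entry.items():
            value = [details]
            if client in records:
                # nest all records for this client in one list
                value.append(records[client])
            merged.append({client: value})
    return merged


clientList = [{'client1': ['c1', 'f1']}, {'client2': ['c2', 'f2']}]
recordList = [{'client1': {'rec_1': ['t1', 's1']}}, {'client1': {'rec_2': ['t2', 's2']}}]
print(merge_clients(clientList, recordList))
```

Records whose client name does not appear in the client list are silently ignored here; whether that is the right behaviour depends on the data.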
