How to extract incomplete Python objects from string - python

I am attempting to extract valid Python-parsable objects, such as dictionaries and lists, from strings. For example, from the string "[{'a' : 1, 'b' : 2}]", the script will extract [{'a' : 1, 'b' : 2}] since the {} and [] denote completed Python objects.
However, when the string output is incomplete, such as "[{'a' : 1, 'b' : 2}, {'a' : 1'}]", I only attempt to extract {'a' : 1, 'b' : 2} and place it into a list [{'a' : 1, 'b' : 2}], as the second Python object is not yet complete and therefore must be left out.
I tried to write a regex pattern to match completed {} or []; it works for simple output but fails on nested lists or dicts.
Code:
import re
def match_dict_list(string):
    pattern = r"\[?\{[^\}\]]*\}\]?|\[?\[[^\]\[]*\]\]?"
    matches = re.findall(pattern, string)
    return matches
But it fails on """[[1, 2, 3], [11, 12, 21]""": it matches [[1, 2, 3] and [11, 12, 21], while the expected matches are only [1, 2, 3] and [11, 12, 21], which should then be put in a list: [[1, 2, 3], [11, 12, 21]]
Some test cases
Case 1: "[{'a' : 1, 'b' : 2}, {'a' : 1'"
Expected output: [{'a': 1, 'b': 2}]
Case 2: '[[1, 2, 3], [11, 12, 21]'
Expected output: [[1, 2, 3], [11, 12, 21]]
Case 3: """[{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}], 'b': [{'a':"""
Expected output: [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
I am getting the output from APIs but can't do anything from their side; sometimes, the server output is complete, and sometimes, it's incomplete.
I also tried the updated pattern \[?\{[^\}\]]*\}\]?|\[[^\]\[]*\]|\[\[[^\]\[]*\]\] but it fails on the third case. What is the best option to solve this kind of issue?
I can't use ast.literal_eval because as I mentioned above the string output is incomplete such as " [ { 'a' : 1 } , {'b' : ".

I can't use ast.literal_eval because as I mentioned above the string output is incomplete such as " [ { 'a' : 1 } , {'b' : "
But you can iteratively try to apply ast.literal_eval after slicing off the last character and closing the opening bracket(s) [similarly to how json.loads was used in the solution suggested in @mous's comment].
import ast
def eval_brokenLiteral(litStr:str, defaultVal=None, printError=True):
    try: return ast.literal_eval(litStr)
    except SyntaxError as se: evalError = se
    # litStr = litStr[:getattr(evalError, 'offset', len(litStr))]
    # build the candidate closers from the run of opening brackets at the start
    bracketPairs = {'{': '}', '[': ']'}
    closers, curCloser = [], ''
    for c in ''.join(litStr.split()):
        if c not in bracketPairs: break
        curCloser = bracketPairs[c] + curCloser
        closers.append(curCloser)
    # for each closer, keep trimming the last character until the literal parses
    for closer in closers:
        subStr = litStr.strip()
        while subStr[1:]:
            try: return ast.literal_eval(subStr + closer)
            except SyntaxError: subStr = subStr[:-1].strip()
    if printError: print(repr(evalError))
    return defaultVal
[If it can't find a valid literal, it will return None unless some other defaultVal is specified.]
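The trimming loop often stops as soon as it has cut back to the comma after the last complete element, because Python literals allow a trailing comma. A minimal check of that behaviour, using only ast.literal_eval (the variable names are illustrative and not part of the function above):

```python
import ast

complete_part = "[{'a' : 1, 'b' : 2},"      # trimmed back to the last complete element
truncated = "[{'a' : 1, 'b' : 2}, {'a' : 1"  # second dict is still open

# A trailing comma is legal in a list literal, so appending the closer parses.
recovered = ast.literal_eval(complete_part + "]")
print(recovered)

# The untrimmed string still has an unclosed "{", so parsing raises SyntaxError.
try:
    ast.literal_eval(truncated + "]")
    still_broken = False
except SyntaxError:
    still_broken = True
print(still_broken)
```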
Tests:
testCases = [
    "[{'a' : 1, 'b' : 2}, {'a' : 1'}]",
    '[[1, 2, 3], [11, 12, 21]',
    """[{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}], 'b': [{'a':"""
]
for ti, tc in enumerate(testCases, 1):
    print(f'Case {ti}: {repr(tc)}\n - ----> ', end='')
    op = eval_brokenLiteral(tc)
    print(f'Output ---> {repr(op)}' if op else '', '\n\n---\n')
Case 1: "[{'a' : 1, 'b' : 2}, {'a' : 1'}]"
----> Output ---> [{'a': 1, 'b': 2}]
Case 2: '[[1, 2, 3], [11, 12, 21]'
----> Output ---> [[1, 2, 3], [11, 12, 21]]
Case 3: "[{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}], 'b': [{'a':"
----> Output ---> [{'a': [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]}]

Related

Merging dictionaries not including duplicate values in python

I would like to merge two dictionaries, but if they have the same key, I would only merge non-duplicate values.
The following code works, but is it possible to rewrite it as a union using | or {**dict1, **dict2}? When I tried using |, dict_merge({ 'A': [1, 2, 3] }, { 'A': [2, 3, 4] }) gave {'A': [2, 3, 4]}
def dict_merge(dict1, dict2):
    for key in dict2.keys():
        if key in dict1.keys():
            d3 = dict1[key] + dict2[key]
            d3 = set(d3)
            dict1[key] = list(d3)
        else:
            dict1[key] = dict2[key]
    return dict1
dict_merge({ 'A': [1, 2, 3] }, { 'B': [2, 4, 5, 6]})
Output
{ 'A': [1, 2, 3], 'B': [2, 4, 5, 6] }
Giving your two dictionaries names, let's get the union of their keys.
>>> d1 = { 'A': [1, 2, 3] }
>>> d2 = { 'A': [2, 3, 4] }
>>> d1.keys() | d2.keys()
{'A'}
Assuming, based on your code, that the lists are really being used as sets, we can iterate over the union of the keys in a dictionary comprehension, union the two sets for each key, and turn the result back into a list.
>>> {k: list(set(d1.get(k, [])) | set(d2.get(k, []))) for k in d1.keys() | d2.keys()}
{'A': [1, 2, 3, 4]}
If we incorporate some more interesting dictionaries and repeat the same dictionary comprehension:
>>> d1 = {'A': [1,2,3], 'B': [4,5,6]}
>>> d2 = {'B': [5,6,7,8], 'C': [9,10]}
>>> {k: list(set(d1.get(k, [])) | set(d2.get(k, []))) for k in d1.keys() | d2.keys()}
{'C': [9, 10], 'A': [1, 2, 3], 'B': [4, 5, 6, 7, 8]}
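The comprehension above can be wrapped in a small function. This is only a sketch: merge_unique is a hypothetical name, and sorted() is an added assumption (the values must be hashable and mutually comparable) that makes the output order deterministic, since sets do not preserve order.

```python
def merge_unique(d1, d2):
    """Merge two dicts of lists, keeping each value once per key."""
    return {
        # sorted() is an assumption added here for a stable order
        k: sorted(set(d1.get(k, [])) | set(d2.get(k, [])))
        for k in d1.keys() | d2.keys()
    }

merged = merge_unique({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      {'B': [5, 6, 7, 8], 'C': [9, 10]})
print(merged['B'])  # [4, 5, 6, 7, 8]
```

A plain dict union such as d1 | d2 (Python 3.9+) keeps the right-hand value whenever a key appears in both operands, which is why it returned {'A': [2, 3, 4]} in the question.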
Is that the solution you're wanting?
In [61]: my_dict = {}
    ...: d1 = {'A': [1, 2, 3], 'C': 123}
    ...: d2 = {'B': [2, 3, 4], 'A': [1, 2, 3]}
    ...: for i in set(d1).symmetric_difference(set(d2)):
    ...:     my_dict[i] = d1[i] if i in d1 else d2[i]
    ...: print(my_dict)
Output :
{'B': [2, 3, 4], 'C': 123}

Python extract common patterns of length X among a set of sequences

Let's say I have the following :
data = ['ABCD', 'ABABC', 'BCAABCD']
I'm trying to write a function that uses Counter and takes three arguments: the data, the minimum proportion of sequences that must contain a pattern for it to be taken into account, and the maximum pattern length.
A working function should give me the following:
>>> check(data, 0.50, 2)
Counter({'A': 3, 'AB': 3, 'B': 3, 'BC': 3, 'C': 3, 'CD': 2, 'D': 2})
>>> check(data, 0.34, 4)
Counter({'A': 3, 'AB': 3, 'ABC': 3, 'ABCD': 2, 'B': 3, 'BC': 3, 'BCD': 2, 'C': 3, 'CD': 2, 'D': 2})
I'm really lost on this. I only know how to get the combinations of two or more letters, like this:
Counter(combinations(data[0], 2)) & Counter(combinations(data[1], 2)) & Counter(combinations(data[2], 2))
And I also know how to get the sum of the letters in all elements of data with :
Counter(data[0]) + Counter(data[1]) + Counter(data[2])
(Strangely, I couldn't do this sum with a list comprehension as I would have liked, because of an error saying I can't use '+' between 'str' and 'int'.)
If you guys can't give me full code, no problem, I only need some guidance on how to start the whole thing and to get the logic.
Have a nice day to the one who read my whole thing :)
You can use a recursive generator function to get all combinations of the merged substrings (with length <= the maximum) in data and find the substring intersections using collections.defaultdict:
from collections import defaultdict
data = ['ABCD', 'ABABC', 'BCAABCD']
def combos(d, l, c = []):
    if c:
        yield ''.join(c)
    if d and len(c) < l:
        yield from combos(d[1:], l, c+[d[0]])
        if not c:
            yield from combos(d[1:], l, c)
def check(d, p, l):
    _d = defaultdict(set)
    for i in d:
        for j in combos(i, l):
            _d[j].add(i)
    return {a:len(b) for a, b in _d.items() if len(b)/len(d) >= p}
print(check(data, 0.50, 2))
print(check(data, 0.34, 4))
Output:
{'A': 3, 'AB': 3, 'B': 3, 'BC': 3, 'C': 3, 'CD': 2, 'D': 2}
{'A': 3, 'AB': 3, 'ABC': 3, 'ABCD': 2, 'B': 3, 'BC': 3, 'BCD': 2, 'C': 3, 'CD': 2, 'D': 2}
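Since the question asked for a Counter, here is an alternative sketch that uses slicing instead of recursion. It assumes, as the expected output suggests, that a "pattern" is a contiguous substring and that each sequence counts a pattern at most once; check_counter is a hypothetical name.

```python
from collections import Counter

def check_counter(data, min_prop, max_len):
    counts = Counter()
    for seq in data:
        # A set, so a pattern repeated inside one sequence is counted once.
        subs = {seq[i:i + n]
                for n in range(1, max_len + 1)
                for i in range(len(seq) - n + 1)}
        counts.update(subs)
    # Keep only patterns present in at least min_prop of the sequences.
    return Counter({p: c for p, c in counts.items()
                    if c / len(data) >= min_prop})

data = ['ABCD', 'ABABC', 'BCAABCD']
result = check_counter(data, 0.50, 2)
print(result)
```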

Python combine values of identical dictionaries without using looping

I have list of identical dictionaries:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
I need to get something like this:
a = [1, 4, 7]
b = [2, 5, 8]
c = [3, 6, 9]
I know how to do it using for .. in .., but is there a way to do it without looping?
If I do
a, b, c = zip(*my_list)
I'm getting
a = ('a', 'a', 'a')
b = ('b', 'b', 'b')
c = ('c', 'c', 'c')
Any solution?
You need to extract all the values in my_list. You could try:
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
a, b, c = zip(*map(lambda d: d.values(), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
As pointed out by @Alexandre, this works only when the dict is ordered (that is, when every dict yields its values in the same key order). If you can't be sure of the order, consider yatu's answer.
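One way to avoid depending on key order is to name the keys explicitly. The following is a sketch using operator.itemgetter from the standard library; the keys tuple is an assumption that every dict contains exactly these keys.

```python
from operator import itemgetter

my_list = [{'a': 1, 'b': 2, 'c': 3},
           {'c': 6, 'a': 4, 'b': 5},   # different insertion order on purpose
           {'a': 7, 'b': 8, 'c': 9}]

keys = ('a', 'b', 'c')  # assumed to be present in every dict
# itemgetter(*keys)(d) returns (d['a'], d['b'], d['c']) regardless of dict order
a, b, c = zip(*map(itemgetter(*keys), my_list))
print(a, b, c)
# (1, 4, 7) (2, 5, 8) (3, 6, 9)
```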
You will have to loop to obtain the values from the inner dictionaries. Probably the most appropriate structure would be to have a dictionary, mapping the actual letter and a list of values. Assigning to different variables is usually not the best idea, as it will only work with the fixed amount of variables.
You can iterate over the inner dictionaries, and append to a defaultdict as:
from collections import defaultdict
out = defaultdict(list)
for d in my_list:
    for k,v in d.items():
        out[k].append(v)
print(out)
#defaultdict(list, {'a': [1, 4, 7], 'b': [2, 5, 8], 'c': [3, 6, 9]})
pandas has a DataFrame factory method for exactly this, so if you already have it as a dependency or if the input data is large enough:
import pandas as pd
my_list = ...
df = pd.DataFrame.from_records(my_list)
a = list(df['a']) # df['a'] is a pandas Series, essentially a wrapped C array
b = list(df['b'])
c = list(df['c'])
Please find the code below. I believe that the version with a loop is much easier to read.
my_list = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'b': 5, 'c': 6}, {'a': 7, 'b': 8, 'c': 9}]
# we assume that all dictionaries have the sames keys
a, b, c = map(list, map(lambda k: map(lambda d: d[k], my_list), my_list[0]))
print(a,b,c)

Restructuring the hierarchy of dictionaries in Python?

If I have a nested dictionary in Python, is there any way to restructure it based on keys?
I'm bad at explaining, so I'll give a little example.
d = {'A':{'a':[1,2,3],'b':[3,4,5],'c':[6,7,8]},
'B':{'a':[7,8,9],'b':[4,3,2],'d':[0,0,0]}}
Re-organize like this
newd = {'a':{'A':[1,2,3],'B':[7,8,9]},
'b':{'A':[3,4,5],'B':[4,3,2]},
'c':{'A':[6,7,8]},
'd':{'B':[0,0,0]}}
Given some function with inputs like
def mysteryfunc(olddict,newkeyorder):
    ????
mysteryfunc(d,[1,0])
Where the [1,0] list passed means to put the dictionary's 2nd level of keys in the first level and the first level in the 2nd level. Obviously the values need to stay associated with their unique key values.
Edit:
Looking for an answer that covers the general case, with arbitrary unknown nested dictionary depth.
Input:
d = {'A':{'a':[1,2,3],'b':[3,4,5],'c':[6,7,8]},
'B':{'a':[7,8,9],'b':[4,3,2],'d':[0,0,0]}}
inner_dict={}
for k,v in d.items():
    print(k)
    for ka,va in v.items():
        val_list=[]
        if ka not in inner_dict:
            val_dict={}
            val_dict[k]=va
            inner_dict[ka]=val_dict
        else:
            val_dict=inner_dict[ka]
            val_dict[k]=va
            inner_dict[ka]=val_dict
Output:
{'a': {'A': [1, 2, 3], 'B': [7, 8, 9]},
'b': {'A': [3, 4, 5], 'B': [4, 3, 2]},
'c': {'A': [6, 7, 8]},
'd': {'B': [0, 0, 0]}}
You can use two for loops: one to iterate over each key-value pair and a second to iterate over the nested dict. At each step of the second loop you can build your desired output:
from collections import defaultdict
new_dict = defaultdict(dict)
for k0, v0 in d.items():
    for k1, v1 in v0.items():
        new_dict[k1][k0] = v1
print(dict(new_dict))
output:
{'a': {'A': [1, 2, 3], 'B': [7, 8, 9]},
'b': {'A': [3, 4, 5], 'B': [4, 3, 2]},
'c': {'A': [6, 7, 8]},
'd': {'B': [0, 0, 0]}}
You can use recursion with a generator to handle input of arbitrary depth:
def paths(d, c = []):
    for a, b in d.items():
        yield from ([((c+[a])[::-1], b)] if not isinstance(b, dict) else paths(b, c+[a]))
from collections import defaultdict
def group(d):
    _d = defaultdict(list)
    for [a, *b], c in d:
        _d[a].append([b, c])
    return {a:b[-1][-1] if not b[0][0] else group(b) for a, b in _d.items()}
print(group(list(paths(d))))
Output:
{'a': {'A': [1, 2, 3], 'B': [7, 8, 9]}, 'b': {'A': [3, 4, 5], 'B': [4, 3, 2]}, 'c': {'A': [6, 7, 8]}, 'd': {'B': [0, 0, 0]}}
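The question's mysteryfunc(d, [1, 0]) signature takes an explicit key order. A possible sketch of that interface follows: it flattens the dict into (path, value) pairs, permutes each path, and rebuilds the nesting. The names flatten and reorder_keys are hypothetical, and the sketch assumes every leaf sits at the same depth, equal to len(order).

```python
def flatten(d, prefix=()):
    """Yield (key_path, leaf_value) pairs for a nested dict."""
    for k, v in d.items():
        if isinstance(v, dict):
            yield from flatten(v, prefix + (k,))
        else:
            yield prefix + (k,), v

def reorder_keys(d, order):
    """Rebuild d so that new level i uses the keys of old level order[i]."""
    out = {}
    for path, value in flatten(d):
        new_path = [path[i] for i in order]  # assumes len(path) == len(order)
        node = out
        for key in new_path[:-1]:
            node = node.setdefault(key, {})
        node[new_path[-1]] = value
    return out

d = {'A': {'a': [1, 2, 3], 'b': [3, 4, 5], 'c': [6, 7, 8]},
     'B': {'a': [7, 8, 9], 'b': [4, 3, 2], 'd': [0, 0, 0]}}
newd = reorder_keys(d, [1, 0])
print(newd)
```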

Iterate over list of dictionaries and find matching elements from a list and append value of matching key to defaultdict

I have a list of dictionaries. Let's call it: list_of_dict. The dictionaries in the list are in the form:
{'a' : 1,
'b' : 5,
'c' : 3,
'd' : 6}
and
{'a' : 3,
'f' : 2,
'g' : 1,
'h' : 3,
'i' : 5,
'j' : 3}
I have another list called list_to_match which only holds some letters: ['a', 'f', 'x']
I need to iterate over list_of_dict, find the keys that match an element in the list, and append their values to an empty defaultdict of list items. If a key is not found, append 0 to the list.
The defaultdict is initialized as:
d = collections.defaultdict(list)
I want the eventual defaultdict to look like:
{'a' : [1, 3],
'f' : [0, 2],
'x' : [0, 0]}
So far, I have:
for ld in list_of_dict:
    for match in list_to_match:
        for k, v in ld.items():
            d[match].append(v)
    d[match].append(0)
Now all of this works apart from the last line because obviously, match does not exist in that scope.
Now, all I get in the defaultdict is:
{'a' : [1, 3],
'f' : [2]}
The 0 is missing and so is x. How do I fix it?
You can use a dictionary comprehension:
{i: [j[i] if i in j else 0 for j in list_of_dict] for i in list_to_match}
Yields:
{'a': [1, 3], 'f': [0, 2], 'x': [0, 0]}
Even simpler:
{i: [j.get(i, 0) for j in list_of_dict] for i in list_to_match}
You could do this, no need to use defaultdict:
list_of_dict = [{'a': 1, 'b': 5,
'c': 3,
'd': 6}, {'a': 3,
'f': 2,
'g': 1,
'h': 3,
'i': 5,
'j': 3}]
list_to_match = ['a', 'f', 'x']
d = {}
for match in list_to_match:
    for ld in list_of_dict:
        d.setdefault(match, []).append(ld.get(match, 0))
print(d)
Output
{'a': [1, 3], 'x': [0, 0], 'f': [0, 2]}
IIUC, you can just do:
for m in list_to_match:
    d[m] = [ld.get(m, 0) for ld in list_of_dict]
print(d)
#defaultdict(list, {'a': [1, 3], 'f': [0, 2], 'x': [0, 0]})
This would also work for a regular dictionary if you didn't want to use a defaultdict.
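For comparison, here is a sketch that keeps the loop order from the question (outer loop over the dictionaries, inner loop over the letters to match) and still uses the defaultdict. The sample data is copied from the question. dict.get(match, 0) replaces the innermost loop over items.

```python
from collections import defaultdict

list_of_dict = [{'a': 1, 'b': 5, 'c': 3, 'd': 6},
                {'a': 3, 'f': 2, 'g': 1, 'h': 3, 'i': 5, 'j': 3}]
list_to_match = ['a', 'f', 'x']

d = defaultdict(list)
for ld in list_of_dict:
    for match in list_to_match:
        # get() returns the value when the key exists, otherwise 0,
        # so every letter receives exactly one entry per dictionary.
        d[match].append(ld.get(match, 0))

print(dict(d))
# {'a': [1, 3], 'f': [0, 2], 'x': [0, 0]}
```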
