How to merge the lists in a nested dictionary into one list - Python

I have a dictionary like this:
{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
I want to get a merged list of all the lists in the nested dictionary.
Output should be like this:
[55111491410,55111400572,55111186735,55111438755,55111281815,55111461870,55111167133,55111167139,....55111403171,55111461858]

An elegant answer based on a regex, relying on the fact that all the values of interest sit between square brackets (note that it works on the dictionary written out as a string, not on the dict object):
import re
pat = r'(?<=\[).+?(?=\])'
s = """{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}"""
print('[%s]' % ', '.join(map(str, re.findall(pat, s))))
Output
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
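Since re.findall returns strings, the result above is text rather than a list of numbers. A minimal sketch of getting integers back, assuming the input really is a string shaped like the one above (the shortened sample `s_small` and the name `numbers` are only for illustration):

```python
import re

# Shortened sample in the same textual shape as the dictionary above.
s_small = "{'KO,': {u'asdas': [55111186735]}, u'TU': {u'546460165': [55111403171, 55111461858]}}"

# Capture the text between each pair of brackets, split it on commas,
# and convert every piece to int (int() tolerates surrounding spaces).
numbers = [int(piece)
           for chunk in re.findall(r'\[(.*?)\]', s_small)
           for piece in chunk.split(',')]
print(numbers)
```

If the data is already a dict object, there is no need to go through a string at all.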

Just a list comprehension over the dict's values and the inner dicts' values would do the job. But do remember that dicts only preserve insertion order from Python 3.7 on (in CPython 3.6 this is an implementation detail). So if you are using an older version, the resulting list will not be in any particular order.
>>> dct = {'CO,': {u'123456': [55111491410]},
... u'OA,': {u'3215': [55111400572]},
... u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
... u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
... u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
... u'546454793': [55111167133],u'546456387': [55111167139],
... u'546456925': [55111167140],u'546458931': [55111226912],
... u'546458951': [55111226914],u'546459861': [55111226916],
... u'546460165': [55111403171, 55111461858]}}
>>>
>>> [e for idct in dct.values() for lst in idct.values() for e in lst]
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
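The same flattening can also be written with itertools.chain.from_iterable. This is only a sketch of an alternative spelling, using a shortened copy of the dictionary (the names `dct_small` and `merged` are illustrative):

```python
from itertools import chain

# Shortened copy of the dictionary from the question (illustrative only).
dct_small = {
    'CO,': {'123456': [55111491410]},
    'KO,': {'asdas': [55111186735], '5541017924': [55111438755]},
    'TU': {'546460165': [55111403171, 55111461858]},
}

# Walk the outer values (inner dicts), then their values (lists),
# and chain all of those lists into one flat list.
merged = list(chain.from_iterable(
    lst for inner in dct_small.values() for lst in inner.values()
))
print(merged)
```

The order of `merged` follows the dictionary's iteration order, so the ordering caveat above applies here as well.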

d = {'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735], u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815], u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462], u'546454149': [55111128782],
u'546454793': [55111167133], u'546456387': [55111167139],
u'546456925': [55111167140], u'546458931': [55111226912],
u'546458951': [55111226914], u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
z = []
for i in d.keys():
    for j in d[i].keys():
        # extend (not append(...[0])) so lists with several items are kept whole
        z.extend(d[i][j])
print(z)
output:
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]

Related

Python Remove duplicates and original from nested list based on specific key

I'm trying to delete all duplicates, including the original, from a nested list based on a specific column.
Example
list = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
The key column is the first (abc, def, abc), and based on it I want to remove every item whose key value also appears in another item (so both the duplicate and the original go).
So the new list should contain:
newlist = [['def',9834,'another text'],['poi',1234,'text']]
I found many similar topics but not for nested lists...
Any help please?
You can construct a list of keys
keys = [x[0] for x in list]
and select only those records for which the key occurs exactly once
newlist = [x for x in list if keys.count(x[0]) == 1]
Use collections.Counter:
from collections import Counter
lst = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
d = dict(Counter(x[0] for x in lst))
print([x for x in lst if d[x[0]] == 1])
# [['def', 9834, 'another text'],
# ['poi', 1234, 'text']]
Also note that you shouldn't name your list as list as it shadows the built-in list.
Using a list comprehension.
Demo:
l = [['abc',3232,'demo text'],['def',9834,'another text'],['abc', 988,'another another text'],['poi',1234,'text']]
checkVal = [i[0] for i in l]
print( [i for i in l if not checkVal.count(i[0]) > 1 ] )
Output:
[['def', 9834, 'another text'], ['poi', 1234, 'text']]
Using collections.defaultdict for an O(n) solution:
L = [['abc',3232,'demo text'],
['def',9834,'another text'],
['abc',988,'another another text'],
['poi',1234,'text']]
from collections import defaultdict
d = defaultdict(list)
for key, num, txt in L:
    d[key].append([num, txt])
res = [[k, *v[0]] for k, v in d.items() if len(v) == 1]
print(res)
[['def', 9834, 'another text'],
['poi', 1234, 'text']]
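The counting approach used in the answers above can be wrapped in a small helper so the key column is a parameter. This is a sketch only; the function name `drop_repeated_keys` and the `key_index` parameter are made up for illustration:

```python
from collections import Counter

def drop_repeated_keys(rows, key_index=0):
    """Return the rows whose value at key_index occurs exactly once in rows."""
    # One pass to count every key, one pass to filter: O(n) overall.
    counts = Counter(row[key_index] for row in rows)
    return [row for row in rows if counts[row[key_index]] == 1]

rows = [['abc', 3232, 'demo text'],
        ['def', 9834, 'another text'],
        ['abc', 988, 'another another text'],
        ['poi', 1234, 'text']]
print(drop_repeated_keys(rows))
```

Unlike the keys.count(...) versions, which rescan the key list for every row, this counts each key once up front.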

Filter List by Longest Element Containing a String

I want to filter a list: among all items that share the same last 4 digits, I want to print the longest one.
For example:
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
# want to return abcdabcd1234 and poiupoiupoiupoiu7890
In this case, we print the longer of the elements containing 1234, and the longer of the elements containing 7890. Finding the longest element containing a certain element is not hard, but doing it for all items in the list (different last four digits) efficiently seems difficult.
My attempt was to first identify all the different last 4 digits using a loop and a slice:
ids=[]
for x in lst:
    ids.append(x[-4:])
ids = list(set(ids))
Next, I would search through the list by index, with a "max_length" variable and "current_id" to find the largest elements of each id. This is clearly very inefficient, and I was wondering what the best way to do this would be.
Use a dictionary:
>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> d = {} # to keep the longest items for digits.
>>> for item in lst:
...     key = item[-4:] # last 4 characters
...     d[key] = max(d.get(key, ''), item, key=len)
...
>>> d.values() # list(d.values()) in Python 3.x
['abcdabcd1234', 'poiupoiupoiupoiu7890']
from collections import defaultdict
d = defaultdict(str)
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
for x in lst:
    if len(x) > len(d[x[-4:]]):
        d[x[-4:]] = x
To display the results:
for key, value in d.items():
    print key,'=', value
which produces:
1234 = abcdabcd1234
7890 = poiupoiupoiupoiu7890
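itertools is great. Use groupby with a lambda to group the list by ending, and from there it is easy. Note that groupby only groups adjacent items, so this relies on items with the same ending sitting next to each other, as they do here; otherwise, sort by the same key first: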
>>> from itertools import groupby
>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> [max(y, key=len) for x, y in groupby(lst, lambda l: l[-4:])]
['abcdabcd1234', 'poiupoiupoiupoiu7890']
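Because groupby only merges neighbouring items, input whose matching endings are not adjacent needs sorting by the same key first. A minimal sketch, assuming a reordered copy of the list (the names `last4`, `shuffled` and `longest` are illustrative):

```python
from itertools import groupby

def last4(s):
    # Grouping key: the last four characters of the string.
    return s[-4:]

# Same items as in the question, reordered so equal endings are not adjacent.
shuffled = ['abcd1234', 'gqweri7890', 'abcdabcd1234', 'poiupoiupoiupoiu7890']

# Sort by the grouping key, then take the longest item of each group.
longest = [max(group, key=len)
           for _, group in groupby(sorted(shuffled, key=last4), key=last4)]
print(longest)  # ['abcdabcd1234', 'poiupoiupoiupoiu7890']
```

The sort makes this O(n log n); the dictionary-based answers avoid the sort altogether.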
Slightly more generic
import string
import collections
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
z = [(x.translate(None, x.translate(None, string.digits)), x) for x in lst]
x = collections.defaultdict(list)
for a, b in z:
    x[a].append(b)
for k in x:
    print k, max(x[k], key=len)
1234 abcdabcd1234
7890 poiupoiupoiupoiu7890
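The two-argument str.translate(None, ...) call above is Python 2 only. A rough Python 3 equivalent of the same idea, grouping by all the digits found in each item, might look like this (the helper `digits_of` and the names `groups` and `result` are illustrative):

```python
from collections import defaultdict

def digits_of(s):
    # Keep only the ASCII digit characters of s, wherever they occur.
    return ''.join(ch for ch in s if ch in '0123456789')

lst = ['abcd1234', 'abcdabcd1234', 'gqweri7890', 'poiupoiupoiupoiu7890']

# Collect the items under the digits they contain.
groups = defaultdict(list)
for item in lst:
    groups[digits_of(item)].append(item)

# Keep the longest item of each group.
result = {digits: max(items, key=len) for digits, items in groups.items()}
for digits, longest_item in result.items():
    print(digits, longest_item)
```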

How to get the values in split python?

['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
I want to get the values after the :. Currently, if I split, it takes column1, column2, column3 into account as well, which I don't want. I want only the values.
This is similar to key-value pairs in a dictionary. The only difference is that it is a list of strings.
How will I split it?
EDITED
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = widgets.gadgets_list  # [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # yields list index out of range
But when the widgets_list value is copied from the terminal and assigned directly, it runs correctly.
user_widgets = Widgets.objects.filter(user_id = user_id)
if user_widgets:
    for widgets in user_widgets:
        widgets_list = [u'column1:', u'column2:', u'column3:widget_basicLine']
        print [item.split(":")[1].split(',') for item in widgets_list]  # prints correctly.
Where am I going wrong?
You can split items by ":", then split the item with index 1 by ",":
>>> l = ['column1:abc,def', 'column2:hij,klm', 'column3:xyz,pqr']
>>> [item.split(":")[1].split(',') for item in l]
[['abc', 'def'], ['hij', 'klm'], ['xyz', 'pqr']]
Nothing wrong with a 'for' loop and testing whether the right-hand side has actual data:
li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
out=[]
for us in li:
    us1,sep,rest=us.partition(':')
    if rest.strip():
        out.append(rest)
print out # [u'widget_basicLine']
Which can be reduced to a list comprehension if you wish:
>>> li=[u'column1:', u'column2:', u'column3:widget_basicLine', u'column4']
>>> [e.partition(':')[2] for e in li if e.partition(':')[2].strip()]
[u'widget_basicLine']
And you can further split by the comma if you have data:
>>> li=[u'column1:', u'column2:a,b', u'column3:c,d', u'column4']
>>> [e.partition(':')[2].split(',') for e in li if e.partition(':')[2].strip()]
[[u'a', u'b'], [u'c', u'd']]
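Since the question compares the data to key-value pairs, the same partition idea can also build a dictionary. This is a sketch in Python 3 under that assumption; the function name `parse_columns` and the sample `items` are made up for illustration:

```python
def parse_columns(items):
    """Map each 'name:v1,v2' string to name -> list of values."""
    parsed = {}
    for item in items:
        name, sep, rest = item.partition(':')
        if not sep:
            # No ':' in this entry at all, so there is nothing to split.
            continue
        # Drop empty pieces so 'column2:' maps to an empty list.
        parsed[name] = [v for v in rest.split(',') if v.strip()]
    return parsed

items = ['column1:abc,def', 'column2:', 'column3:widget_basicLine', 'column4']
print(parse_columns(items))
```

Using partition instead of split(":")[1] means an entry without a colon is skipped rather than raising an IndexError.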

How to split a list into subsets based on a pattern?

I'm doing this but it feels this can be achieved with much less code. It is Python after all. Starting with a list, I split that list into subsets based on a string prefix.
# Splitting a list into subsets
# expected outcome:
# [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
def func(l, newlist=None, index=0):
    if newlist is None:  # avoid sharing one default list between calls
        newlist = []
    newlist.append([i for i in l if i.startswith('sub_%s' % index)])
    # create a new list without the items in newlist
    l = [i for i in l if i not in newlist[index]]
    if len(l):
        index += 1
        func(l, newlist, index)
    return newlist
func(mylist)
You could use itertools.groupby:
>>> import itertools
>>> mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
>>> for k,v in itertools.groupby(mylist,key=lambda x:x[:5]):
...     print k, list(v)
...
sub_0 ['sub_0_a', 'sub_0_b']
sub_1 ['sub_1_a', 'sub_1_b']
or exactly as you specified it:
>>> [list(v) for k,v in itertools.groupby(mylist,key=lambda x:x[:5])]
[['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Of course, the common caveats apply (Make sure your list is sorted with the same key you're using to group), and you might need a slightly more complicated key function for real world data...
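As an illustration of both caveats, here is a sketch that sorts first and uses a key function that does not depend on a fixed prefix length (with x[:5], 'sub_10_b' would be lumped in with 'sub_1'). The helper `prefix` and the sample `unsorted_names` are hypothetical:

```python
from itertools import groupby

def prefix(name):
    # Everything before the last underscore, e.g. 'sub_10_b' -> 'sub_10'.
    return name.rsplit('_', 1)[0]

unsorted_names = ['sub_1_a', 'sub_0_a', 'sub_10_b', 'sub_0_b', 'sub_1_b']

# Sort with the same key used for grouping, then collect each group.
groups = [list(g)
          for _, g in groupby(sorted(unsorted_names, key=prefix), key=prefix)]
print(groups)
```

The groups come out in lexicographic order of the prefix, not numeric order, so 'sub_10' sorts before 'sub_2'.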
In [28]: mylist = ['sub_0_a', 'sub_0_b', 'sub_1_a', 'sub_1_b']
In [29]: lis=[]
In [30]: for x in mylist:
    i=x.split("_")[1]
    try:
        lis[int(i)].append(x)
    except IndexError:
        lis.append([])
        lis[-1].append(x)
....:
In [31]: lis
Out[31]: [['sub_0_a', 'sub_0_b'], ['sub_1_a', 'sub_1_b']]
Use itertools' groupby:
from itertools import groupby
def get_field_sub(x): return x.split('_')[1]
mylist = sorted(mylist, key=get_field_sub)
[ (x, list(y)) for x, y in groupby(mylist, get_field_sub)]

match/search dict keys in elements of a list

I have a dictionary,
dct = {'slab1': {'name':'myn1', 'age':20}, 'slab2':{'name':'myn2','age':200}}
lst = {'/store/dir1/dir_slab1/tindy', '/store/dir2/dirslab2_fine/tunka','/store/dir1/dirslab3/lunku'}
How can I search for 'slab1' and 'slab2', which are the keys of the dictionary, in the list elements? If there is a match, I would like to print the matched element and the 'age' from the dictionary. So, in the above example, I should get something like:
'/store/dir1/dir_slab1/tindy', 20
'/store/dir2/dirslab2_fine/tunka', 200
Thanks for any suggestion
>>> dct = {'slab1': {'name':'myn1', 'age':20}, 'slab2':{'name':'myn2','age':200}}
>>> lst = {'/store/dir1/dir_slab1/tindy', '/store/dir2/dirslab2_fine/tunka','/store/dir1/dirslab3/lunku'}
>>> for item in lst:
...     for pattern in dct:
...         if pattern in item:
...             print "%s, %s" % (item, dct[pattern]["age"])
...
/store/dir1/dir_slab1/tindy, 20
/store/dir2/dirslab2_fine/tunka, 200
You can also use list comprehension notation to get pairs:
>>> [(item, dct[pattern]["age"]) for item in lst for pattern in dct if pattern in item]
[('/store/dir1/dir_slab1/tindy', 20), ('/store/dir2/dirslab2_fine/tunka', 200)]
P.S. There are some inaccuracies in your question (e.g., the lst variable is a set, not a list), but the solution should look like this.
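For Python 3, the same substring matching could be sketched as a small function. Here the paths are kept in a real list so the output order is predictable; the names `paths`, `match_ages` and `matches` are illustrative:

```python
dct = {'slab1': {'name': 'myn1', 'age': 20},
       'slab2': {'name': 'myn2', 'age': 200}}
paths = ['/store/dir1/dir_slab1/tindy',
         '/store/dir2/dirslab2_fine/tunka',
         '/store/dir1/dirslab3/lunku']

def match_ages(paths, dct):
    """Pair each path with the age of every dict key found inside it."""
    return [(path, info['age'])
            for path in paths
            for key, info in dct.items()
            if key in path]

matches = match_ages(paths, dct)
for path, age in matches:
    print(f"{path}, {age}")
```

This is a plain substring test, so a key such as 'slab1' would also match a path containing 'slab10'.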
