match/search dict keys in elements of a list - python

I have a dictionary,
dct = {'slab1': {'name':'myn1', 'age':20}, 'slab2':{'name':'myn2','age':200}}
lst = {'/store/dir1/dir_slab1/tindy', '/store/dir2/dirslab2_fine/tunka','/store/dir1/dirslab3/lunku'}
How can I search the list elements for 'slab1' and 'slab2', which are the keys of the dictionary? If there is a match, I would like to print the matched element and the 'age' from the dictionary. So, in the above example, I should get something like:
'/store/dir1/dir_slab1/tindy', 20
'/store/dir2/dirslab2_fine/tunka', 200
Thanks for any suggestion

>>> dct = {'slab1': {'name':'myn1', 'age':20}, 'slab2':{'name':'myn2','age':200}}
>>> lst = {'/store/dir1/dir_slab1/tindy', '/store/dir2/dirslab2_fine/tunka','/store/dir1/dirslab3/lunku'}
>>> for item in lst:
...     for pattern in dct:
...         if pattern in item:
...             print("%s, %s" % (item, dct[pattern]["age"]))
...
/store/dir1/dir_slab1/tindy, 20
/store/dir2/dirslab2_fine/tunka, 200
You can also use list comprehension notation to get pairs:
>>> [(item, dct[pattern]["age"]) for item in lst for pattern in dct if pattern in item]
[('/store/dir1/dir_slab1/tindy', 20), ('/store/dir2/dirslab2_fine/tunka', 200)]
P.S. Your question is slightly inconsistent (for example, the variable lst is actually a set, not a list, so its iteration order is not guaranteed), but the solution should look like this.
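For reference, here is a minimal Python 3 sketch of the same idea. It assumes lst is meant to be a real list, so the output order is predictable. The function name match_ages is hypothetical and is not part of the original answer.

```python
def match_ages(paths, records):
    """Return (path, age) pairs for every path that contains a key of records."""
    return [
        (path, info["age"])
        for path in paths
        for key, info in records.items()
        if key in path  # plain substring test, as in the answer above
    ]


dct = {'slab1': {'name': 'myn1', 'age': 20}, 'slab2': {'name': 'myn2', 'age': 200}}
lst = ['/store/dir1/dir_slab1/tindy',
       '/store/dir2/dirslab2_fine/tunka',
       '/store/dir1/dirslab3/lunku']

for path, age in match_ages(lst, dct):
    print("%s, %s" % (path, age))
```

Because this is a substring test, a key such as 'slab1' would also match a path containing 'slab10'. If that matters, a stricter comparison (for example on whole path components) would be needed.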

Related

Pythonic way to count empty and non-empty lists in a dictionary

I have a dictionary of lists.
I want to count the number of empty and non-empty lists and print the first element of the non-empty ones at the same time.
Is there a more elegant (Pythonic) way of doing it?
incorrect = 0
correct = 0
for key, l in dictionary.items():
    try:
        print(l[0])
        correct += 1
    except IndexError:
        incorrect += 1
print("Number of empty lists: ", incorrect)
print("Number of non-empty lists: ", correct)
A list comprehension seems like it would work well here. Assuming I've understood your question correctly you currently have a dictionary which looks something like this:
list_dict = {"list_1": [1], "list_2": [], "list_3": [2, 3]}
So you can do something like this:
first_element_of_non_empty = [l[0] for l in list_dict.values() if l]
Here we make use of the fact that empty lists in Python are falsy, so the condition "if l" keeps only the non-empty ones.
Finding the counts is then straightforward: the number of non-empty lists is the length of the comprehension's output, and the number of empty lists is the difference between that number and the total number of entries in the dictionary.
num_non_empty = len(first_element_of_non_empty)
num_empty = len(list_dict) - num_non_empty
print("Number of empty arrays: ", num_empty)
print("Number of non-empty arrays: ", num_non_empty)
Get the first non-empty item:
next(array for key, array in dictionary.items() if array)
Count empty and non-empty items:
correct = len([array for key, array in dictionary.items() if array])
incorrect = len([array for key, array in dictionary.items() if not array])
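A small variation on the snippets above, offered as a sketch rather than the answerer's own code: next() accepts a default, which avoids a StopIteration when every list is empty, and sum() over a generator counts the non-empty lists without building intermediate lists. The sample dictionary is made up for illustration.

```python
dictionary = {"a": [], "b": [1], "c": [3, 4]}  # illustrative data

# next() with a default returns None instead of raising StopIteration
# when no non-empty list exists.
first_non_empty = next((lst for lst in dictionary.values() if lst), None)

# Count the non-empty lists without materialising them.
correct = sum(1 for lst in dictionary.values() if lst)
incorrect = len(dictionary) - correct

print(first_non_empty, correct, incorrect)
```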
You can use filter(None, [...]) to remove values that evaluate to False, and then use map to retrieve the first elements of those values.
At the Python (3.7.3) command line:
>>> d = {'a': [1], 'b': [], 'c': [3, 4]}
>>>
>>> first_elements = tuple(map(
... lambda v: v[0],
... filter(None, d.values()),
... ))
>>> non_empty_count = len(first_elements)
>>> empty_count = len(d) - non_empty_count
>>>
>>> print(first_elements, non_empty_count, empty_count)
(1, 3) 2 1

How to merge the lists of a nested dictionary into one list

I have a dictionary like this:
{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
I want to get a merged list of all the lists in the nested dictionary.
Output should be like this:
[55111491410,55111400572,55111186735,55111438755,55111281815,55111461870,55111167133,55111167139,....55111403171,55111461858]
A regex-based answer that relies on the fact that all the values of interest sit between square brackets. Note that it works on the text of the dictionary (a string), not on the dictionary object itself:
import re
pat = r'(?<=\[).+?(?=\])'
s = """{'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
u'546454793': [55111167133],u'546456387': [55111167139],
u'546456925': [55111167140],u'546458931': [55111226912],
u'546458951': [55111226914],u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}"""
print('[%s]' % ', '.join(map(str, re.findall(pat, s))))
Output
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
Just a list comprehension over the dict's values and the inner dicts' values will do the job. Remember, though, that dicts are only guaranteed to preserve insertion order from Python 3.7 onwards (in CPython 3.6 this was an implementation detail), so on older versions the resulting list will not be in any particular order.
>>> dct = {'CO,': {u'123456': [55111491410]},
... u'OA,': {u'3215': [55111400572]},
... u'KO,': {u'asdas': [55111186735],u'5541017924': [55111438755]},
... u'KU': {u'45645': [55111281815],u'546465238': [55111461870]},
... u'TU': {u'asdfds': [55111161462],u'546454149': [55111128782],
... u'546454793': [55111167133],u'546456387': [55111167139],
... u'546456925': [55111167140],u'546458931': [55111226912],
... u'546458951': [55111226914],u'546459861': [55111226916],
... u'546460165': [55111403171, 55111461858]}}
>>>
>>> [e for idct in dct.values() for lst in idct.values() for e in lst]
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
d = {'CO,': {u'123456': [55111491410]},
u'OA,': {u'3215': [55111400572]},
u'KO,': {u'asdas': [55111186735], u'5541017924': [55111438755]},
u'KU': {u'45645': [55111281815], u'546465238': [55111461870]},
u'TU': {u'asdfds': [55111161462], u'546454149': [55111128782],
u'546454793': [55111167133], u'546456387': [55111167139],
u'546456925': [55111167140], u'546458931': [55111226912],
u'546458951': [55111226914], u'546459861': [55111226916],
u'546460165': [55111403171, 55111461858]}}
z = []
for i in d.keys():
    for j in d[i].keys():
        # extend keeps every element; append(d[i][j][0]) would drop all but the first
        z.extend(d[i][j])
print(z)
output:
[55111491410, 55111400572, 55111186735, 55111438755, 55111281815, 55111461870, 55111161462, 55111128782, 55111167133, 55111167139, 55111167140, 55111226912, 55111226914, 55111226916, 55111403171, 55111461858]
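Another way to write the flattening step, sketched here with itertools.chain.from_iterable from the standard library. The helper name merge_nested_lists and the shortened sample data are assumptions made for illustration. The order of the result follows dict iteration order, which is insertion order on Python 3.7+.

```python
from itertools import chain


def merge_nested_lists(nested):
    """Flatten a dict of dicts of lists into a single list."""
    return list(chain.from_iterable(
        inner_list
        for inner_dict in nested.values()
        for inner_list in inner_dict.values()
    ))


# Shortened sample in the same shape as the question's data.
d = {'KU': {'45645': [55111281815], '546465238': [55111461870]},
     'TU': {'546460165': [55111403171, 55111461858]}}

print(merge_nested_lists(d))
```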

Python Remove duplicates and original from nested list based on specific key

I'm trying to delete all duplicates, together with the original, from a nested list based on a specific column.
Example
list = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
The key column is the first one (abc, def, abc, poi). I want to remove every item whose key occurs more than once, including the first occurrence.
So the new list should contain:
newlist = [['def',9834,'another text'],['poi',1234,'text']]
I found many similar topics but not for nested lists...
Any help please?
You can construct a list of keys
keys = [x[0] for x in list]
and select only those records for which the key occurs exactly once
newlist = [x for x in list if keys.count(x[0]) == 1]
Use collections.Counter:
from collections import Counter
lst = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
d = dict(Counter(x[0] for x in lst))
print([x for x in lst if d[x[0]] == 1])
# [['def', 9834, 'another text'],
# ['poi', 1234, 'text']]
Also note that you shouldn't name your list as list as it shadows the built-in list.
Using a list comprehension.
Demo:
l = [['abc',3232,'demo text'],['def',9834,'another text'],['abc', 988,'another another text'],['poi',1234,'text']]
checkVal = [i[0] for i in l]
print( [i for i in l if not checkVal.count(i[0]) > 1 ] )
Output:
[['def', 9834, 'another text'], ['poi', 1234, 'text']]
Using collections.defaultdict for an O(n) solution:
L = [['abc',3232,'demo text'],
['def',9834,'another text'],
['abc',988,'another another text'],
['poi',1234,'text']]
from collections import defaultdict
d = defaultdict(list)
for key, num, txt in L:
    d[key].append([num, txt])
res = [[k, *v[0]] for k, v in d.items() if len(v) == 1]
print(res)
# [['def', 9834, 'another text'],
#  ['poi', 1234, 'text']]
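The counting approach can be wrapped in a small helper that works for rows of any width and any key column. This is only a sketch: the name drop_repeated and the key_index parameter are hypothetical, not taken from the answers above.

```python
from collections import Counter


def drop_repeated(rows, key_index=0):
    """Keep only the rows whose value at key_index occurs exactly once."""
    counts = Counter(row[key_index] for row in rows)  # one pass to count keys
    return [row for row in rows if counts[row[key_index]] == 1]


rows = [['abc', 3232, 'demo text'],
        ['def', 9834, 'another text'],
        ['abc', 988, 'another another text'],
        ['poi', 1234, 'text']]

print(drop_repeated(rows))
```

Like the defaultdict version, this runs in linear time, and it keeps the surviving rows in their original order.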

if line not startswith item from a list

I would like to know how to take those items from a list that don't start with any of the items from another list.
I want to make something like:
list_results = ['CONisotig124', '214124', '2151235', '235235', 'PLEisotig1235', 'PLEisotig2354', '12512515', 'CONisotig1325', '21352']
identifier_list=['CON','VEN','PLE']
for item in list_results:
    if not item.startswith( "some ID from the identifier_list" ):
        print(item)
So, how do I say:
if not item.startswith( "some ID from the identifier_list" ):
str.startswith() can take a tuple of strings to test for. From the documentation:
"prefix can also be a tuple of prefixes to look for."
Use this together with a list comprehension:
identifier_list = ('CON', 'VEN', 'PLE') # tuple, not list
[elem for elem in list_results if not elem.startswith(identifier_list)]
Demo:
>>> list_results = ['CONisotig124', '214124', '2151235', '235235', 'PLEisotig1235', 'PLEisotig2354', '12512515', 'CONisotig1325', '21352']
>>> identifier_list = ('CON', 'VEN', 'PLE') # tuple, not list
>>> [elem for elem in list_results if not elem.startswith(identifier_list)]
['214124', '2151235', '235235', '12512515', '21352']
That's pretty straightforward; you almost had it:
for item in list_results:
    bad_prefix = False
    for prefix in identifier_list:
        if item.startswith(prefix):
            bad_prefix = True
            break
    if not bad_prefix:
        print(item)
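If identifier_list has to stay a list, as in the question, it can be converted with tuple() at the call site, because str.startswith accepts a string or a tuple but not a list. The sketch below also shows an equivalent spelling with any(). The variable names kept and kept_any are made up for illustration.

```python
list_results = ['CONisotig124', '214124', '2151235', '235235', 'PLEisotig1235',
                'PLEisotig2354', '12512515', 'CONisotig1325', '21352']
identifier_list = ['CON', 'VEN', 'PLE']  # a list, as in the question

# str.startswith raises TypeError for a list, so convert it to a tuple first.
prefixes = tuple(identifier_list)
kept = [item for item in list_results if not item.startswith(prefixes)]

# Equivalent spelling that tests each prefix in turn.
kept_any = [item for item in list_results
            if not any(item.startswith(p) for p in identifier_list)]

print(kept)
```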

Filter List by Longest Element Containing a String

I want to group the items of a list by their last 4 digits and, for each group, print the longest item.
For example:
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
# want to return abcdabcd1234 and poiupoiupoiupoiu7890
In this case, we print the longer of the elements containing 1234, and the longer of the elements containing 7890. Finding the longest element containing a certain element is not hard, but doing it for all items in the list (different last four digits) efficiently seems difficult.
My attempt was to first identify all the distinct last-4-digit endings using a loop and a slice:
ids=[]
for x in lst:
    ids.append(x[-4:])
ids = list(set(ids))
Next, I would search through the list by index, with a "max_length" variable and "current_id" to find the largest elements of each id. This is clearly very inefficient, and I was wondering what the best way to do this would be.
Use a dictionary:
>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> d = {} # to keep the longest items for digits.
>>> for item in lst:
...     key = item[-4:] # last 4 characters
...     d[key] = max(d.get(key, ''), item, key=len)
...
>>> d.values() # list(d.values()) in Python 3.x
['abcdabcd1234', 'poiupoiupoiupoiu7890']
from collections import defaultdict
d = defaultdict(str)
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
for x in lst:
    if len(x) > len(d[x[-4:]]):
        d[x[-4:]] = x
To display the results:
for key, value in d.items():
    print("%s = %s" % (key, value))
which produces:
1234 = abcdabcd1234
7890 = poiupoiupoiupoiu7890
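itertools is great. Use groupby with a key function to group the list by ending, and from there it is easy. Note that groupby only groups consecutive items, so this works here because items with the same ending are already adjacent; otherwise, sort the list by the same key first: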
>>> from itertools import groupby
>>> lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
>>> [max(y, key=len) for x, y in groupby(lst, lambda l: l[-4:])]
['abcdabcd1234', 'poiupoiupoiupoiu7890']
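Since itertools.groupby only merges adjacent items with equal keys, a version that does not depend on the input order can sort by the same key first. The following is a sketch under that assumption. The function name longest_per_suffix and its parameter n are hypothetical, and the sample list is deliberately interleaved.

```python
from itertools import groupby


def longest_per_suffix(items, n=4):
    """Return the longest item for each distinct n-character ending."""
    def suffix(s):
        return s[-n:]

    ordered = sorted(items, key=suffix)  # make equal endings adjacent
    return [max(group, key=len) for _, group in groupby(ordered, key=suffix)]


# Endings are interleaved here, so groupby alone would split the groups.
lst = ['abcd1234', 'gqweri7890', 'abcdabcd1234', 'poiupoiupoiupoiu7890']
print(longest_per_suffix(lst))
```

The sort makes this O(n log n). The dictionary-based answers above stay linear and need no sorting.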
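Slightly more generic: this groups by all the digits in each string rather than by the last 4 characters. It is Python 2 only, since it relies on the two-argument form of str.translate: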
import string
import collections
lst = ['abcd1234','abcdabcd1234','gqweri7890','poiupoiupoiupoiu7890']
z = [(x.translate(None, x.translate(None, string.digits)), x) for x in lst]
x = collections.defaultdict(list)
for a, b in z:
    x[a].append(b)
for k in x:
    print k, max(x[k], key=len)
1234 abcdabcd1234
7890 poiupoiupoiupoiu7890
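The two-argument str.translate call used above does not exist in Python 3. A rough Python 3 equivalent of the same grouping, written as a sketch with a hypothetical helper name longest_by_digits, could look like this:

```python
import string
from collections import defaultdict


def longest_by_digits(items):
    """Group items by the ASCII digits they contain; keep the longest per group."""
    groups = defaultdict(list)
    for item in items:
        # Keep only the digit characters, in order, as the grouping key.
        digits = ''.join(ch for ch in item if ch in string.digits)
        groups[digits].append(item)
    return {digits: max(members, key=len) for digits, members in groups.items()}


lst = ['abcd1234', 'abcdabcd1234', 'gqweri7890', 'poiupoiupoiupoiu7890']
for digits, longest in longest_by_digits(lst).items():
    print(digits, longest)
```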
