Python: Search in an array of dictionaries by a given key

I have an array of dictionaries d which I obtain by parsing a JSON file: d = r.json()
Assuming d then contains
d = [
{'change':'112','end_time':'2020-05-12','hostname':'a,b,c,d,e','ref':'345','start_time':'202-04-2020'},
{'change':'182','end_time':'2020-05-12','hostname':'a1,b1,c1,d1,e1','ref':'325','start_time':'202-04-2020'},
{'change':'122','end_time':'2020-05-12','hostname':'g,h,i,j,k','ref':'315','start_time':'202-04-2020'},
{'change':'112','end_time':'2020-05-12','hostname':'o,t1,h1,e4,n7','ref':'345','start_time':'202-04-2020'},
]
where all the hostnames are different from each other, how can I then perform a search like
if hostname == 'a1':
    print change   # i.e. 182

You need to iterate over the list, split the hostnames into a list and check if the hostname you are searching for exists in that list.
hostname = 'a1'
for row in d:
    hostnames = row['hostname'].split(',')
    if hostname in hostnames:
        print(row['change'])
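If you only need the first matching row and want to stop scanning early, a small variant (my addition, using the same data and variable names as above) with next() and a default:
change = next((row['change'] for row in d
               if hostname in row['hostname'].split(',')), None)  # None if no host matches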

The Pythonic way of solving this (using a comprehension) is also the easiest.
# for your a1 example
change_a1 = [i['change'] for i in d
             if 'a1' in i['hostname'].split(',')]  # split so 'a1' does not match inside e.g. 'a12'
For an arbitrary search, just wrap it in a function:
def find_change(host):
    change = [i['change'] for i in d
              if host in i['hostname'].split(',')]
    return change
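A quick usage check against the sample data above (results shown as comments):
print(find_change('a1'))  # ['182']
print(find_change('a'))   # ['112']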

First of all, you have a lot of JSON structure errors:
d = [{'change':'112','end_time':'2020-05-12','hostname':'a,b,c,d,e','ref':'345','start_time':'202-04-2020'},
     {'change':'182','end_time':'2020-05-12','hostname':'a1,b1,c1,d1,e1','ref':'325','start_time':'202-04-2020'},
     {'change':'122','end_time':'2020-05-12','hostname':'g,h,i,j,k','ref':'315','start_time':'202-04-2020'},
     {'change':'112','end_time':'2020-05-12','hostname':'o,t1,h1,e4,n7','ref':'345','start_time':'202-04-2020'}]
hostname = 'a1'
for row in d:
    arr = row['hostname'].split(",")
    if hostname in arr:
        print(row['change'])

# parse all the keys, for learning
for row in d:
    for k in row.keys():
        if k == "hostname":
            arr = row[k].split(",")
            for s in arr:
                # print(s)
                if s == 'a1':
                    row['change'] = '777'
print(d)
After that, you can use reverse to re-arrange the tuples in the JSON if you need to.
Have fun!

Related

get the file names list by another list

I am new to Python and am a bit confused...
I have a List like:
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd","asakindsbbss","wooedakse","sdadakindwsd","xxxxsdsd"]
and also a keyword list:
Key = ["woo","kind"]
and I want to get something like this:
[
["aawoobbcc", "aawoobbca","wooedakse"],
["aabbcskindd","asakindsbbss","sdadakindwsd"]
]
I have tried
list_sub = [file for file in List_all if Key in file]
or
list_sub = [file for file in List_all if k for k in Key in file]
but neither is right.
How can I go through the elements in Key and match them as substrings of the elements in List_all?
Thanks a lot!
One approach, O(n^2), is the following:
res = [[e for e in List_all if k in e] for k in Key]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
A simpler-to-understand solution (for newcomers) is to use nested for loops:
res = []
for k in Key:
    filtered = []
    for e in List_all:
        if k in e:
            filtered.append(e)
    res.append(filtered)
A more advanced solution, and a more performant one for really long lists, is to use a regular expression in conjunction with a defaultdict:
import re
from collections import defaultdict
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd", "asakindsbbss", "wooedakse", "sdadakindwsd", "xxxxsdsd"]
Key = ["woo", "kind"]
extract_key = re.compile("|".join(Key))
table = defaultdict(list)
for word in List_all:
    if match := extract_key.search(word):
        table[match.group()].append(word)
res = [table[k] for k in Key if k in table]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
Note that this solution assumes that each string matches at most one key (only the first match is recorded).
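One extra hardening step worth considering (my addition, not part of the original answer): if a keyword could ever contain regex metacharacters, escaping the keys keeps the pattern literal:
extract_key = re.compile("|".join(map(re.escape, Key)))  # a key like "c++" would otherwise break the pattern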

Remove file name duplicates in a list

I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But that gives me only the unique file names without their extensions.
How could I achieve it?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
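For instance, with a made-up name that has both a directory and extra dots (my example, not from the question), splitext only strips the real extension:
>>> os.path.splitext("reports/Abc.v2.xlsx")
('reports/Abc.v2', '.xlsx')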
If the order of the end result doesn't matter, we could split each item on the period, regard the first part of the split as the key, and keep the item only if that key has not been seen yet.
oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    if itemKey in setKeys:
        pass
    else:
        setKeys.add(itemKey)
        l.append(item)
Try this:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l[:]:  # iterate over a copy, since l is modified inside the loop
    name = x.split('.')[0]
    find = 0
    for index, d in enumerate(l):
        txt = d.split('.')[0]
        if name == txt:
            find += 1
            if find > 1:
                l.pop(index)  # drop the later duplicate (note: this keeps the first occurrence, 'Abc.xlsx')
                break
print(l)
@Selcuk Definitely the best solution; unfortunately I don't have enough reputation to upvote your answer.
But I would rather use el[:el.rfind('.')] as my dictionary key than os.path.splitext(x)[0] in order to handle the case where we have more sophisticated formats in the name. That would give something like this:
list({x[:x.rfind('.')]: x for x in l}.values())
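One caveat to flag (my note, not part of the original comment): str.rfind returns -1 when a name has no dot at all, which would silently chop off its last character; guarding for that case keeps extension-less names intact:
list({(x[:x.rfind('.')] if '.' in x else x): x for x in l}.values())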

associate 2 lists based on ID

I'm trying to merge data from 2 lists by an ID:
list_a = [
(u'65d92438497c', u'compute-0'),
(u'051df48db621', u'compute-4'),
(u'd6160db0cbcd', u'compute-3'),
(u'23fc20b59bd6', u'compute-1'),
(u'0db2e733520d', u'controller-1'),
(u'89334dac8a59', u'compute-2'),
(u'51cf9d50b02e', u'compute-5'),
(u'f4fe106eaeab', u'controller-2'),
(u'06cc124662dc', u'controller-0')
]
list_b = [
(u'65d92438497c', u'p06619'),
(u'051df48db621', u'p06618'),
(u'd6160db0cbcd', u'p06620'),
(u'23fc20b59bd6', u'p06622'),
(u'0db2e733520d', u'p06612'),
(u'89334dac8a59', u'p06621'),
(u'51cf9d50b02e', u'p06623'),
(u'f4fe106eaeab', u'p06611'),
(u'06cc124662dc', u'p06613')
]
list_ab = [
(u'65d92438497c', u'p06619', u'compute-0'),
(u'051df48db621', u'p06618', u'compute-4'),
(u'd6160db0cbcd', u'p06620', u'compute-3'),
(u'23fc20b59bd6', u'p06622', u'compute-1'),
(u'0db2e733520d', u'p06612', u'controller-1'),
(u'89334dac8a59', u'p06621', u'compute-2'),
(u'51cf9d50b02e', u'p06623', u'compute-5'),
(u'f4fe106eaeab', u'p06611', u'controller-2'),
(u'06cc124662dc', u'p06613', u'controller-0')
]
You can see that the first field is an ID, identical between list_a and list_b, and I need to merge on this value.
I'm not sure what type of data I need for result_ab.
The purpose of this is to find 'compute-0' from 'p06619', so maybe there is a better way than merging.
You are using a one-dimensional list containing tuples, which might not be needed. Anyway, to obtain the output you require:
list_a = [(u'65d92438497c', u'compute-0')]
list_b = [(u'65d92438497c', u'p-06619')]
result_ab = None
if list_a[0][0] == list_b[0][0]:
    result_ab = [tuple(list(list_a[0]) + list(list_b[0][1:]))]
Here is my solution:
merge = []
for i in range(0, len(list_a)):
    if list_a[i][0] == list_b[i][0]:
        merge.append(tuple(list(list_a[i]) + list(list_b[i][1:])))
Note that this assumes both lists are in the same order by ID (which they are in the example above).
The idea is to create a dictionary whose keys are the first elements of both lists and whose values are lists of all the remaining elements matching that key.
Next, just iterate over the dictionary and create the required new list object:
from collections import defaultdict
res = defaultdict(list)
for elt in list_a:
    res[elt[0]].extend([el for el in elt[1:]])
for elt in list_b:
    res[elt[0]].extend([el for el in elt[1:]])
list_ab = []
for key, value in res.items():
    elt = tuple([key, *[val for val in value]])
    list_ab.append(elt)
print(list_ab)
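Since the stated goal is just to get 'compute-0' back from 'p06619', here is a minimal sketch (my addition, assuming the sample data above) that skips the merge entirely and chains two dict lookups on the shared ID:
id_to_host = dict(list_a)               # '65d92438497c' -> 'compute-0'
p_to_id = {p: i for i, p in list_b}     # 'p06619' -> '65d92438497c'
print(id_to_host[p_to_id[u'p06619']])   # compute-0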

Python remove elements matching pattern from array

I have a dictionary that contains strings as keys and lists as values.
I'd like to remove all list elements that contain the strings "food", "staging", "msatl" and "azeus". I have the below code already, but am having a hard time applying the logic I have in filterIP to the rest of the strings I have.
def filterIP(fullList):
    regexIP = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')  # dots escaped so only literal dots match
    return filter(lambda i: not regexIP.search(i), fullList)

groups = {key: [domain.replace('fake.com', 'env.fake.com')
                for domain in filterIP(list(set(items)))]
          for (key, items) in groups.iteritems()}
for key, value in groups.iteritems():
    value.sort()
meta = {"_meta": {"hostvars": hostvars}}
groups.update(meta)
print(self.json_format_dict(groups, pretty=True))
Example of current output
"role_thumper": [
"thumper-msatl1-prod-1.env.fake.com",
"thumper-msatl1-prod-2.env.fake.com",
"thumper-rwva1-prod-1.env.fake.com",
"thumper-rwva1-prod-2.env.fake.com",
"thumper-rwva1-prod-3.env.fake.com",
"thumper-rwva1-prod-4.env.fake.com",
"thumper-rwva1-prod-5.env.fake.com",
"thumper-rwva1-prod-6.env.fake.com",
"thumper-staging-1.env.fake.com"
],
"role_thumper_mongo": [
"thumper-mongo-staging-1.env.fake.com",
"thumper-mongo-staging-2.env.fake.com",
"thumpermongo-rwva1-staging-1.env.fake.com",
"thumpermongo-rwva1-staging-2.env.fake.com"
],
"role_thumper_mongo_arb": [
"thumper-mongo-arb-staging-1.env.fake.com",
"thumpermongo-arb-rwva1-staging-1.env.fake.com"
],
A list comprehension is what you're after:
x = ["a", "b", "aa", "aba"]
x_filtered = [i for i in x if "a" not in i]
print(x_filtered)
# ['b']
This is just shorthand for a for loop.
x_filtered = []
for i in x:
    if "a" not in i:
        x_filtered.append(i)
A simple way to accomplish your task would be to iterate over each list in the dictionary, create new lists based upon your criteria, and assign the new lists to the same keys in a new dictionary. Here is how that would look in code:
def filter_words(groups, words):
    d = {}
    for key, domains in groups.iteritems():
        new_domains = []
        for domain in domains:
            if not any(word in domain for word in words):
                new_domains.append(domain)
        d[key] = new_domains
    return d
And you would call it like so:
groups = filter_words(groups, {"food", "staging", "msatl", "azeus"})
The "meat" of the code above is the second for loop:
for domain in domains:
    if not any(word in domain for word in words):
        new_domains.append(domain)
This code goes over each string in the current key's list, and filters out all invalid strings according to a list of invalid words.
If I understand you correctly, this might help.
Set up an exclude list:
exclude= ["food", "staging", "msatl", "azeus"]
Test list ( I couldn't really find instances in your examples)
test= ["food", "staging", "msatl", "azeus", "a", "bstaging"]
Run list comprehension (the name of iterators don't matter, you can pick more appropriate ones)
result= [i for i in test if not any([e for e in exclude if e in i])]
result
['a']
The answer above by @Julian gives a good explanation of what list comprehensions do. This uses two of them; the any part is True if there is any match from the exclude list.
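For example, checking two of the test strings by hand (results as comments):
any([e for e in exclude if e in "bstaging"])  # True, "staging" is a substring, so "bstaging" is dropped
any([e for e in exclude if e in "a"])         # False, so "a" survives the filter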
Hope this helps.

Python: How to evaluate a part of each item in a list, and append matching results?

Problem:
Trying to evaluate the first 4 characters of each item in a list.
If the first 4 chars match the first 4 chars of another item in the list, then append the last three digits to the first four. See the example below.
Notes:
The list values are not hard coded.
The list always has this structure "####.###".
Only need to match first 4 chars in each item of list.
Order is not essential.
Code:
Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
Desired Output:
Grid = ["094G.016\019\032", "194P.005\015", "093T.021\102"]
Research:
I know that sets can find duplicates. Could I use a set to evaluate only the first 4 chars, or would I run into a problem since sets cannot be indexed?
Would it be better to split the list items into 2 parts, the four characters before the period ("094G") and a separate list of the three digits after the period ("093"), compare them, then join them in a new list?
Is there a better way of doing this altogether that I'm not realizing?
Here is one straightforward way to do it.
from collections import defaultdict
grid = ['094G.016', '094G.019', '194P.005', '194P.015', '093T.021', '093T.102', '094G.032']
d = defaultdict(list)
for item in grid:
    k, v = item.split('.')
    d[k].append(v)
result = ['%s.%s' % (k, '/'.join(v)) for k, v in d.items()]
Gives unordered result:
['093T.021/102', '194P.005/015', '094G.016/019/032']
What you'll most likely want is a dictionary mapping the first part of each code to a list of second parts. You can build the dictionary like so:
mappings = {}  # empty dictionary
for code in Grid:  # loop over each code
    first, second = code.split('.')  # separate the code into first.second
    if first in mappings:  # if the first part was already seen
        mappings[first].append(second)  # add the second part to those already collected
    else:
        mappings[first] = [second]  # otherwise, start a new list
Once you have the dictionary, it will be quite simple to loop over it and combine the second parts together (ideally, using '\\'.join)
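A minimal sketch of that final step (my addition, reusing the mappings dict built above):
Grid = [first + '.' + '\\'.join(seconds) for first, seconds in mappings.items()]
# each entry now looks like '094G.016\019\032', with a literal backslash between suffixes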
Sounds like a job for defaultdict.
from collections import defaultdict
grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102"]
d = defaultdict(set)
for item in grid:
    prefix, suffix = item.split(".")
    d[prefix].add(suffix)
output = ["%s.%s" % (prefix, "/".join(d[prefix])) for prefix in d]
Note that using a set means the order of the suffixes within each group is arbitrary.
>>> from itertools import groupby
>>> Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
>>> Grid = sorted(Grid, key=lambda x:x.split(".")[0])
>>> gen = ((k, g) for k, g in groupby(Grid, key=lambda x:x.split(".")[0]))
>>> gen = ((k,[x.split(".") for x in g]) for k, g in gen)
>>> gen = list((k + '.' + '/'.join(x[1] for x in g) for k, g in gen))
>>> for x in gen:
...     print(x)
...
093T.021/102
094G.016/019/032
194P.005/015
