Consider the JSON object below. I need to get the parent key by matching the value against a regular expression.
{
"PRODUCT": {
"attribs": {
"U1": {
"name": "^U.*1$"
},
"U2": {
"name": "^U.*2$"
},
"U3": {
"name": "^U.*3$"
},
"U4": {
"name": "^U.*4$"
},
"U5": {
"name": "^U.*5$"
},
"P1": {
"name": "^P.*1$"
}
}
}
}
If I pass a string like "U10001", it should return the key (U1) because the string matches the regular expression (^U.*1$).
If I pass a string like "P200001", it should return the key (P1) because the string matches the regular expression (^P.*1$).
Any help is appreciated.
I'm not sure how you are getting your JSON, but you added python as a tag, so I'm assuming that at some point you will have it stored as a string in your code.
First, decode the string into a Python dict.
import json
my_dict = json.loads(my_json)["PRODUCT"]["attribs"]
If the JSON is formatted as above, you should get a dict whose keys are U1, U2, etc.
Now you can use Python's filter to apply your regular-expression logic, and the re module to do the actual matching.
import re
test_string = "U10001"
def re_filter(item):
    # item is a (key, value) pair from my_dict.items()
    return re.match(item[1]["name"], test_string)
result = filter(re_filter, my_dict.items())
# Just get the matching attribute names
print([i[0] for i in result])
I haven't run the code, so it might need some syntax fixes, but this should give you the general idea. Of course, you will need to make it more generic to handle multiple products.
How about this:
import re
my_dict = {...}  # the dict parsed from the JSON above
def get_key(dict_, test):
    # Return the first key whose "name" pattern matches test
    return next(k for k, v in dict_.items() if re.match(v['name'], test))
test = "U10001"
result = get_key(my_dict['PRODUCT']['attribs'], test)
print(result)  # U1
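A self-contained sketch of the same idea, assuming the JSON is available as a string (embedded here as MY_JSON, a hypothetical name). Passing a default to next() makes the function return None instead of raising StopIteration when no pattern matches:

```python
import json
import re

# The JSON from the question, embedded as a string for a self-contained example
MY_JSON = """
{"PRODUCT": {"attribs": {
    "U1": {"name": "^U.*1$"}, "U2": {"name": "^U.*2$"},
    "U3": {"name": "^U.*3$"}, "U4": {"name": "^U.*4$"},
    "U5": {"name": "^U.*5$"}, "P1": {"name": "^P.*1$"}
}}}
"""

def find_key(attribs, test, default=None):
    """Return the first key whose "name" regex matches test, else default."""
    # next() with a default avoids StopIteration when nothing matches
    return next(
        (k for k, v in attribs.items() if re.match(v["name"], test)),
        default,
    )

attribs = json.loads(MY_JSON)["PRODUCT"]["attribs"]
print(find_key(attribs, "U10001"))   # U1
print(find_key(attribs, "P200001"))  # P1
print(find_key(attribs, "X123"))     # None
```

Because dicts keep insertion order (Python 3.7+), the first matching key in the JSON wins if several patterns match.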
Can you please elaborate on exactly what you want to design? Here's a quick way to return the desired key:
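import re
def getKey(string):
    # Capture the first character plus the first digit, e.g. "U10001" -> "U1"
    # (note: "P200001" gives "P2", since only the first digit is used)
    return re.search(r'^(.\d)\d+', string).group(1)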
If you want to loop over the whole JSON, load it into a dictionary and then loop over the "PRODUCT" -> "attribs" dictionary to get the required key:
import json, re
with open('../file/path/here') as f:
    d = json.load(f)
patents = d['PRODUCT']['attribs']
for key, val in patents.items():
    patent_group = re.search(r'^(.\d)\d+', val['name']).group(1)  # returns U1,U2,U3,... or P1,P2,P3,...
    # do whatever with patent_group (U1/P1 etc.)
Problem statement:
This is how I invoke prepare_payload.py:
python3 prepare_payload.py ['Test_B1','Test_B2','Test_B3'] [https://10.5.5.1,https://10.5.5.2,https://10.5.5.3] ['abc','efg','sss']
The JSON payload I am trying to prepare:
{
"billing_account_number": "",
"vcenter_url": "",
"cred_header": ""
}
Expected output:
{
"billing_account_number": "Test_B1",
"vcenter_url": "https://10.5.5.1",
"cred_header": "abc"
}
{
"billing_account_number": "Test_B2",
"vcenter_url": "https://10.5.5.2",
"cred_header": "efg"
}
{
"billing_account_number": "Test_B3",
"vcenter_url": "https://10.5.5.3",
"cred_header": "sss"
}
My code:
import json
import os
import sys
master_list = []
billing_account_number = sys.argv[1]
ip_addr = sys.argv[2]
cred_header = sys.argv[3]
res = list(map(str, billing_account_number.strip('[]').split(',')))
ip = list(map(str, ip_addr.strip('[]').split(',')))
cred_headers = list(map(str, cred_header.strip('[]').split(',')))
master_list.append(res)
master_list.append(ip)
master_list.append(cred_headers)
def prepare_payload():
    with open("rabbitmq_payload.json") as fh:
        data = json.load(fh)
    print('================================================')
    return data
data = prepare_payload()
for i in master_list:
    for j in i:
        data['billing_account_number'] = j
        data['vcenter_url'] = j
        data['cred_header'] = j
        print(data)
I can't figure out whether I have to merge the individual lists (res, ip, cred_headers) into a single list, iterate over it like [res[0], ip[0], cred_headers[0]], and then replace the key-value pairs in my data dictionary.
Is there a built-in function I can rely on, or a more efficient approach to this problem? Thanks in advance to all the awesome Python coders!
It's inconvenient to pass lists as command-line arguments; standard input would be better. Nonetheless, take a look at this code:
import sys
import ast
billing_account_number = ast.literal_eval(sys.argv[1])
ip_addr = ast.literal_eval(sys.argv[2])
cred_header = ast.literal_eval(sys.argv[3])
output = []
for i in range(len(billing_account_number)):
    output.append({"billing_account_number": billing_account_number[i], "ip_addr": ip_addr[i], "cred_header": cred_header[i]})
print(output)
The output:
[{'billing_account_number': 'Test_B1', 'ip_addr': 'https://10.5.5.1', 'cred_header': 'abc'}, {'billing_account_number': 'Test_B2', 'ip_addr': 'https://10.5.5.2', 'cred_header': 'efg'}, {'billing_account_number': 'Test_B3', 'ip_addr': 'https://10.5.5.3', 'cred_header': 'sss'}]
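You will need to wrap each command-line argument in quotes, and each list element (including the URLs) must itself be a quoted string literal, so that ast.literal_eval can parse it, e.g. "['https://10.5.5.1','https://10.5.5.2','https://10.5.5.3']". Also, this code assumes that all three lists have the same length. If that's not guaranteed, you'll need to iterate over the longest list and set the missing values to empty strings.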
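For the unequal-length case, the standard library's itertools.zip_longest pads the shorter lists for you. A minimal sketch, using hypothetical sample lists where one URL is missing:

```python
from itertools import zip_longest

# Hypothetical input lists of unequal length (one URL is missing)
billing_account_number = ['Test_B1', 'Test_B2', 'Test_B3']
ip_addr = ['https://10.5.5.1', 'https://10.5.5.2']
cred_header = ['abc', 'efg', 'sss']

# zip_longest runs until the longest list is exhausted,
# filling missing positions with fillvalue
output = [
    {"billing_account_number": b, "ip_addr": ip, "cred_header": c}
    for b, ip, c in zip_longest(billing_account_number, ip_addr, cred_header, fillvalue='')
]
print(output)
```

Here the third dict gets an empty string for ip_addr instead of raising an IndexError.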
(It seems like you're asking two different questions: how to take argv values better is a separate issue from how to merge lists. This answer addresses the question in your text. It's also hard to tell what you're trying to do with your file.)
You have content in 3 lists that you want to both merge, and convert to dictionaries.
The built-in way to combine lists element by element is zip():
zip(res, ip, cred_headers)
There are multiple ways to convert this to dictionaries. One is a list comprehension:
result = [{
    "billing_account_number": res_item,
    "ip_addr": ip_item,
    "cred_header": cred_item,
} for res_item, ip_item, cred_item in zip(res, ip, cred_headers)]
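To get the vcenter_url key from the expected output, one option is to pair the template's keys with each zipped row using dict(zip(...)). A minimal sketch, assuming the payload template has exactly the three keys shown in the question (the keys list is hardcoded here rather than read from rabbitmq_payload.json):

```python
import json

# Lists as they would look after parsing the command line in the question
res = ['Test_B1', 'Test_B2', 'Test_B3']
ip = ['https://10.5.5.1', 'https://10.5.5.2', 'https://10.5.5.3']
cred_headers = ['abc', 'efg', 'sss']

# Keys of the payload template (assumed to match rabbitmq_payload.json)
keys = ["billing_account_number", "vcenter_url", "cred_header"]

# zip(res, ip, cred_headers) yields one (billing, url, cred) tuple per payload;
# zip(keys, values) then pairs each template key with its value
payloads = [dict(zip(keys, values)) for values in zip(res, ip, cred_headers)]

for payload in payloads:
    print(json.dumps(payload, indent=4))
```

Each payload is a fresh dict, which avoids the original loop's problem of overwriting a single data dict.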
payload = {
"data": {
"name": "John",
"surname": "Doe"
}
}
print(payload["data"]["name"])
I want to print the value of 'name' inside the JSON. I know how to do it as shown above, but is there also a way to get the value of 'name' with a single 'search string'?
I'm looking for something like this:
print(payload["data:name"])
Output:
John
If you were dealing with nested attributes of an object, I would suggest operator.attrgetter; however, itemgetter from the same module does not seem to support nested key access. It is fairly easy to implement something similar, though:
payload = {
"data": {
"name": "John",
"surname": "Doe",
"address": {
"postcode": "667"
}
}
}
def get_key_path(d, path):
    # Remember latest object
    obj = d
    # For each key in the given list of keys
    for key in path:
        # Look up that key in the last object
        if key not in obj:
            raise KeyError(f"Object {obj} has no key {key}")
        # now we know the key exists, replace
        # last object with obj[key] to move to
        # the next level
        obj = obj[key]
    return obj
print(get_key_path(payload, ["data"]))
print(get_key_path(payload, ["data", "name"]))
print(get_key_path(payload, ["data", "address", "postcode"]))
Output:
$ python3 ~/tmp/so.py
{'name': 'John', 'surname': 'Doe', 'address': {'postcode': '667'}}
John
667
You can always decide on a separator character later and use a single string instead of a path; however, you need to make sure this character never appears in a valid key. For example, using |, the only change you need to make in get_key_path is:
def get_key_path(d, path):
    obj = d
    for key in path.split("|"):  # Here
        ...
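A more compact variant of the same idea uses functools.reduce with operator.getitem to walk the path. This is a sketch (get_by_path and its sep parameter are hypothetical names); like the loop above, it only works when every path segment is a dict key, since list indices would arrive as strings:

```python
from functools import reduce
import operator

payload = {
    "data": {
        "name": "John",
        "surname": "Doe",
        "address": {"postcode": "667"},
    }
}

def get_by_path(d, path, sep=":"):
    """Follow a separator-delimited key path through nested dicts.

    Raises KeyError if any key along the path is missing.
    """
    # reduce applies operator.getitem repeatedly: d[k1][k2]...
    return reduce(operator.getitem, path.split(sep), d)

print(get_by_path(payload, "data:name"))                        # John
print(get_by_path(payload, "data|address|postcode", sep="|"))   # 667
```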
There isn't really a built-in way to do this with a single 'search string'. You can use the get() method, but, as with square brackets, you first have to get the dictionary stored under the data key.
You could try creating your own function that uses something like:
str.split(sep=None, maxsplit=-1)
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).
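def get_leaf_value(d, search_string):
    # No separator left: this is the final key
    if ":" not in search_string:
        return d[search_string]
    # Split off the first key and recurse into that sub-dictionary
    next_d, next_search_string = search_string.split(':', 1)
    return get_leaf_value(d[next_d], next_search_string)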
payload = {
"data": {
"name": "John",
"surname": "Doe"
}
}
print(payload["data"]["name"])
print(get_leaf_value(payload, "data:name"))
Output:
John
John
This approach only works if your data consists entirely of nested dictionaries, as in your example (i.e., no lists in non-leaf nodes), and, obviously, if ':' is not part of any key.
Here is an alternative; it may be overkill, depending on your needs.
jq uses a single "search string" (an expression its author calls a 'jq program') to extract and transform data. It is a powerful tool, so a jq program can be quite complex; reading a good tutorial is almost a must.
import pyjq
payload = ...  # as posted in the question
expr = '.data.name'
name = pyjq.one(expr, payload) # "John"
The original jq project is written in C, and the Python jq libraries are built on top of that C code.
I have a somewhat complex string: some of its values are bare object (variable) names rather than strings or numbers. I need to convert it to a proper dict or JSON.
foo = '{"Source": "my source", "Input": {"User_id": 18, "some_object": this_is_a_variable_actually}}'
Notice that the value of the last key, some_object, is neither a string nor an int. Hence, when I use json.loads or ast.literal_eval, I get malformed-string errors, so the approaches from "Converting a String to Dictionary" don't work.
I have no control over the source of the string.
Is it possible to convert such strings to a dictionary?
The result I need is a dict like this:
dict = {
"Source" : "Good",
"object1": variable1,
"object2": variable2
}
The catch is that I don't know in advance what variable1 or variable2 will be; there is no pattern.
That said, if the variables can simply be turned into plain strings, that is also fine.
For example,
dict = {
"Source" : "Good",
"object1": "variable1",
"object2": "variable2"
}
This will be good for my purpose. Thanks for all the answers.
It's a bit of a kludge, but the demjson module can parse most of a somewhat non-conforming JSON string and list the errors it found. You can then use those errors to locate the invalid tokens and wrap them in quotes so the string parses correctly, e.g.:
import demjson
import re
foo = '{"Source": "my source", "Input": {"User_id": 18, "some_object": this_is_a_variable_actually}}'
def f(json_str):
    # Non-strict decode that collects errors instead of raising them
    res = demjson.decode(json_str, strict=False, return_errors=True)
    if not res.errors:
        return res.object
    # Quote every invalid bare token reported in the errors, then decode again
    for err in res.errors:
        var = err.args[1]
        json_str = re.sub(r'\b{}\b'.format(var), '"{}"'.format(var), json_str)
    return demjson.decode(json_str, strict=False)
res = f(foo)
Gives you:
{'Input': {'User_id': 18, 'some_object': 'this_is_a_variable_actually'}, 'Source': 'my source'}
Note that while this should work for the example data presented, your mileage may vary if there are other quirks in your input that require further munging.
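If you'd rather avoid the third-party demjson dependency, a standard-library alternative (a different technique from the answer above) is to quote bare identifier values with a regular expression and then call json.loads. This is a sketch under loud assumptions: it only handles bare words that directly follow a colon (not inside arrays), it leaves true/false/null alone, and it could misfire on text inside string values that happens to look like ": word,".

```python
import json
import re

foo = '{"Source": "my source", "Input": {"User_id": 18, "some_object": this_is_a_variable_actually}}'

# A bare identifier directly after a colon and followed by , } or ]
BARE_VALUE = re.compile(r'(:\s*)([A-Za-z_]\w*)(?=\s*[,}\]])')
JSON_LITERALS = {"true", "false", "null"}

def quote_bare_values(json_str):
    """Wrap bare identifier values in double quotes (JSON literals are left alone)."""
    def repl(m):
        prefix, word = m.group(1), m.group(2)
        if word in JSON_LITERALS:
            return m.group(0)  # keep valid JSON literals unchanged
        return '{}"{}"'.format(prefix, word)
    return BARE_VALUE.sub(repl, json_str)

result = json.loads(quote_bare_values(foo))
print(result)
```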
From the geolocation API browser query, I get this:
browser=opera&sensor=true&wifi=mac:B0-48-7A-99-BD-86|ss:-72|ssid:Baldur WLAN|age:4033|chan:6&wifi=mac:00-24-FE-A7-BA-94|ss:-83|ssid:wlan23-k!17|age:4033|chan:10&wifi=mac:90-F6-52-3F-60-64|ss:-95|ssid:Baldur WLAN|age:4033|chan:13&device=mcc:262|mnc:7|rt:3&cell=id:15479311|lac:21905|mcc:262|mnc:7|ss:-107|ta:0&location=lat:52.398529|lng:13.107570
I would like to access all the individual values in a structured form. My approach is to build a nested JSON structure: split the query by "&" first and then by "=" to get an array of all values in the query. Another approach, using the regex (\w+)=(.*) after splitting by "&", ends up at the same depth, but I need the more detailed values accessible as proper data types.
The resulting array should look like:
{
"browser": ["opera"],
...
"location": [{
"lat": 52.398529,
"lng": 13.107570
}],
...
"wifi": [{
"mac": "B0-48-7A-99-BD-86",
"ss": -72,
...
},
{
"mac": "00-24-FE-A7-BA-94",
"ss": -83,
...
}]
}
Or something similar that I can parse with a JSON library to access the values in Python. Can anyone help with this?
Here is a solution that goes through a dictionary:
import re
import json
# transform a string to a dictionary; sepfield is the field separator
def str_to_dict(s, sepfield, sepkv, infields=None):
    """ transform a string to a dictionary
    s: the string to transform
    sepfield: the string with the field separator char
    sepkv: the string with the key value separator
    infields: a function to be applied to the values
    if infields is given, each key maps to a list of parsed values
    (so repeated keys are collected); otherwise each value is
    associated with its key as-is"""
    pattern = "([^%s%s]*?)%s([^%s]*)" % (sepkv, sepfield, sepkv, sepfield)
    matches = re.findall(pattern, s)
    if infields is None:
        return dict(matches)
    else:
        r = dict()
        for k, v in matches:
            parsedval = infields(v)
            if k not in r:
                r[k] = []
            r[k].append(parsedval)
        return r
def second_level_parsing(x):
    return x if "|" not in x else str_to_dict(x, "|", ":")
# s is the query string from the question
json.dumps(str_to_dict(s, "&", "=", second_level_parsing))
You can easily extend this to more levels. The behaviour differs depending on whether infields is given (a list of values per key versus a single value) so that the output matches what you asked for.
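A standard-library alternative for the first level is urllib.parse.parse_qs, which already groups repeated keys (such as wifi) into one list per key. The sketch below swaps in that technique and adds a hypothetical to_number helper so values like ss and lat become numbers, as the question asks. It assumes every "|"-separated item contains a ":"; note that to_number would also turn strings such as "nan" or "1e5" into floats.

```python
import json
from urllib.parse import parse_qs

s = ("browser=opera&sensor=true"
     "&wifi=mac:B0-48-7A-99-BD-86|ss:-72|ssid:Baldur WLAN|age:4033|chan:6"
     "&wifi=mac:00-24-FE-A7-BA-94|ss:-83|ssid:wlan23-k!17|age:4033|chan:10"
     "&wifi=mac:90-F6-52-3F-60-64|ss:-95|ssid:Baldur WLAN|age:4033|chan:13"
     "&device=mcc:262|mnc:7|rt:3"
     "&cell=id:15479311|lac:21905|mcc:262|mnc:7|ss:-107|ta:0"
     "&location=lat:52.398529|lng:13.107570")

def to_number(value):
    """Convert a string to int or float when possible, else return it unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def parse_fields(value):
    """Turn 'k1:v1|k2:v2' into a dict; leave values without ':' as plain strings."""
    if ":" not in value:
        return value
    # split(":", 1) keeps any later ':' as part of the value
    return {k: to_number(v) for k, v in (item.split(":", 1) for item in value.split("|"))}

# parse_qs maps each key to a list of all its values, in order
parsed = {key: [parse_fields(v) for v in values] for key, values in parse_qs(s).items()}
print(json.dumps(parsed, indent=2))
```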
I'm having problems parsing JSON with Python, and now I'm stuck.
The problem is that the entries in my JSON do not always have the same fields. The JSON is something like:
"entries":[
{
"summary": "here is the summary",
"extensions": {
"coordinates":"coords",
"address":"address",
"name":"name",
"telephone":"123123",
"url":"www.blablablah"
}
}
]
I can move through the JSON, for example:
for entrie in entries:
    name = entrie['extensions']['name']
    tel = entrie['extensions']['telephone']
The problem is that the JSON sometimes does not have all the fields. For example, the telephone field is sometimes missing, so the script fails with a KeyError because the key telephone is missing in that entry.
So, my question: how can I run this script and leave a blank value where telephone is missing?
I've tried with:
if entrie['extensions']['telephone']:
    tel = entrie['extensions']['telephone']
but I don't think that's right.
Use dict.get instead of []:
entrie['extensions'].get('telephone', '')
Or, simply:
entrie['extensions'].get('telephone')
get returns the second argument (which defaults to None) instead of raising a KeyError when the key is not found.
If the data is missing in only one place, dict.get can be used to fill in the missing value:
tel = d['entries'][0]['extensions'].get('telephone', '')
If the problem is more widespread, you can have the JSON parser use a defaultdict or custom dictionary instead of a regular dictionary. For example, given the JSON string:
json_txt = '''{
"entries": [
{
"extensions": {
"telephone": "123123",
"url": "www.blablablah",
"name": "name",
"coordinates": "coords",
"address": "address"
},
"summary": "here is the summary"
}
]
}'''
Parse it with:
>>> import json
>>> class BlankDict(dict):
...     def __missing__(self, key):
...         return ''
...
>>> d = json.loads(json_txt, object_hook=BlankDict)
>>> d['entries'][0]['summary']
'here is the summary'
>>> d['entries'][0]['extensions']['color']
''
As a side note, if you want to clean up your datasets and enforce consistency, there is a fine tool called Kwalify that does schema validation on JSON (and YAML).
There are several useful dictionary features that you can use to work with this.
First off, you can use in to test whether or not a key exists in a dictionary:
if 'telephone' in entrie['extensions']:
    tel = entrie['extensions']['telephone']
get might also be useful; it allows you to specify a default value if the key is missing:
tel=entrie['extensions'].get('telephone', '')
Beyond that, you could look into the standard library's collections.defaultdict, but that might be overkill.
Two ways.
One is to make sure that your dictionaries are standard, and when you read them in they have all fields. The other is to be careful when accessing the dictionaries.
Here is an example of making sure your dictionaries are standard:
__reference_extensions = {
# fill in with all standard keys
# use some default value to go with each key
"coordinates" : '',
"address" : '',
"name" : '',
"telephone" : '',
"url" : ''
}
entrie = json.loads(input_string)
d = entrie["extensions"]
# add any missing standard key with its default value
for key, value in __reference_extensions.items():
    if key not in d:
        d[key] = value
Here is an example of being careful when accessing the dictionaries:
for entrie in entries:
    name = entrie['extensions'].get('name', '')
    tel = entrie['extensions'].get('telephone', '')
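A compact variant of the "make the dictionaries standard" approach merges a defaults dict with each entry's extensions, assuming Python 3.5+ for the {**a, **b} syntax. DEFAULT_EXTENSIONS and the sample json_txt (where telephone is missing) are hypothetical, built from the fields in the question:

```python
import json

# Default value for every expected extension key (fields from the question)
DEFAULT_EXTENSIONS = {
    "coordinates": "",
    "address": "",
    "name": "",
    "telephone": "",
    "url": "",
}

# Hypothetical sample input in which "telephone" is missing
json_txt = '{"entries": [{"summary": "s", "extensions": {"name": "name", "url": "www.blablablah"}}]}'

entries = json.loads(json_txt)["entries"]
for entrie in entries:
    # Keys present in the entry override the defaults; missing ones stay ""
    ext = {**DEFAULT_EXTENSIONS, **entrie.get("extensions", {})}
    name = ext["name"]
    tel = ext["telephone"]
    print(repr(name), repr(tel))
```

Unlike the in-place loop above, this leaves the parsed data untouched and builds a complete copy per entry.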