I have the following JSON data:
'[
{"max":0,"min":0,"name":"tom","age":18},
{"max":0,"min":0,"name":"jack","age":28},
.....
]'
I know that name=tom. How can I get the dict containing tom using regular expressions? Or is there a better way?
The result should look like this:
'{"max":0,"min":0,"name":"tom","age":18}'
Thank you very much!!
Assuming this is a list of dicts:
lst=[{"max":0,"min":0,"name":"tom","age":18},
{"max":0,"min":0,"name":"jack","age":28}]
Then
print(list(filter(lambda x:x["name"]=="tom",lst)))
Outputs
[{'max': 0, 'min': 0, 'name': 'tom', 'age': 18}]
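If the data is still a JSON string, as in the question, a minimal sketch is to parse it with json.loads, take the first match with next, and serialize it back. The variable names below are illustrative and not from the question:

```python
import json

# Sample data in the same shape as the question's JSON string.
s = '[{"max":0,"min":0,"name":"tom","age":18},{"max":0,"min":0,"name":"jack","age":28}]'

records = json.loads(s)  # list of dicts

# First dict whose "name" is "tom", or None if there is no such dict.
match = next((r for r in records if r.get("name") == "tom"), None)

# Serialize back to a compact JSON string, as in the question's desired output.
match_json = json.dumps(match, separators=(",", ":")) if match is not None else None
print(match_json)
```

Using next with a default avoids building a full list when only the first match is needed.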
You can deserialize the JSON file with the json module and then iterate over the dictionaries in the usual way, like this:
import json
from typing import Dict, List
with open("file.json", "r") as f:
    data: List[Dict] = json.load(f)  # the top-level JSON value is an array
for d in data:
    if d["name"] == "tom":
        print(d)
Notice that your JSON file is malformed with unusual :.
Here is the correct one:
[
{
"max": 0,
"min": 0,
"name": "tom",
"age": 18
},
{
"max": 0,
"min": 0,
"name": "jack",
"age": 28
}
]
As others have mentioned, this problem is better handled with json.
But if you are limited to regex for whatever reason, here is a simple regex that would work:
{.+\"name\":\"tom\".+}
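One caveat with that pattern: `.+` is greedy, so if several objects sit on the same line it can match across all of them. A hedged sketch of a bounded variant, assuming flat objects with no nested braces and exactly the `"name":"tom"` spacing shown in the question:

```python
import re

# Both objects on a single line, as in a compact JSON dump.
s = '[{"max":0,"min":0,"name":"tom","age":18},{"max":0,"min":0,"name":"jack","age":28}]'

# The greedy pattern runs from the first "{" to the last "}" on the line.
greedy = re.search(r'{.+"name":"tom".+}', s).group(0)

# [^{}]* cannot cross a brace, so the match stays inside one object.
bounded = re.search(r'\{[^{}]*"name":"tom"[^{}]*\}', s).group(0)

print(bounded)
```

The bounded pattern still breaks on nested objects or different whitespace, which is why parsing with json remains the more robust choice.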
You should use the built-in json module to parse a JSON file. A regex is not the best choice because JSON is a standard format and there are many parsers for it. Below you can see how to use Python's built-in JSON parser module.
test.json:
[
{
"max":0,
"min":0,
"name":"tom",
"age":18
},
{
"max":0,
"min":0,
"name":"jack",
"age":28
}
]
Code:
import json
with open("test.json", "r") as opened_file:
    load_data = json.load(opened_file)
for elem in load_data:
    if elem["name"] == "tom":
        print(elem)
Output:
>>> python3 test.py
{'max': 0, 'min': 0, 'name': 'tom', 'age': 18}
Another solution, without regex, using only strings (assuming s has the entire data):
idx = s.find('"name":"tom"') + len('"name":"tom"')
idx2 = s[:idx].rfind('{')
dict_json = s[idx2:idx] + s[idx:idx+s[idx:].find('}')+1]
dict_json will be {"max":0,"min":0,"name":"tom","age":18}
Related
I am not sure if this question has been asked before but I could not find it.
I have a python dictionary where all values are a list. So, for example:
d = {"car" : ["toyota", "honda"], "bus" : ["hackney", "bombardier"]}
When I try to dump this to a JSON file via:
with open("output.json", 'w') as f:
    json.dump(d, f)
I get:
{
"car": [
"toyota",
"honda"
],
"bus": [
"hackney",
"bombardier"
]
}
But I want it to look like:
{
"car": [ "toyota", "honda"]
"bus": [ "hackney", "bombardier"]
}
I tried indent=2 and indent=4, but still no luck. Any ideas? Ideally I want to accomplish this without using any other packages.
The only way you can do this is by printing it yourself.
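A minimal sketch of building the text by hand with only the json module, assuming a flat dict whose values are lists. The helper name dumps_one_line_per_key is made up for illustration:

```python
import json

d = {"car": ["toyota", "honda"], "bus": ["hackney", "bombardier"]}

def dumps_one_line_per_key(mapping, indent=2):
    """Serialize a flat dict with one key per line and each value on a single line."""
    pad = " " * indent
    # json.dumps without indent keeps each list on one line.
    lines = [f"{pad}{json.dumps(key)}: {json.dumps(value)}" for key, value in mapping.items()]
    return "{\n" + ",\n".join(lines) + "\n}"

text = dumps_one_line_per_key(d)
print(text)
```

The result is still valid JSON (the lines are joined with commas), so it can be written to a file with f.write(text) and read back with json.load.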
I'm not certain where the problem is with this and I'm not really fluent with JSON, but here goes.
I have a dataset that I processed in pandas but won't likely be able to use later. I've exported it both as JSON records and JSON splits.
[{'reference': '2019-73', 'Latitude': 1.045,
'Longitude': 103.65, 'date': '2019-09-30T00:00:00.000Z'},
...{etc},{etc}]
To bring this back into a vanilla python file, I have this:
event_ids = set()
with codecs.open(data_directory, encoding='utf_8') as f:  # open the json file
    for event_json in f:  # iterate through each line (json record) in the file
        event = json.loads(event_json)  # convert the json record to a Python dict
        event_ids.add(event(u'index')  # add the event to the event_id set
But I get one of a few types of errors ("SyntaxError: unexpected EOF while parsing" with the code above, but others depending on how I monkey with things).
My sense is that this happens because it's trying to read the entire JSON as a single element, but I don't know for sure, although the error message goes away if I remove the last line of code. What am I doing wrong and, equally importantly, what concept am I missing?
hrokr, one thing you need to add is an iterator for each item:
import codecs
import json
event_latitudes = set()
data_directory = 'events.txt'
with codecs.open(data_directory, encoding='utf_8') as f:
    for event_json in f:
        event = json.loads(event_json)  # each line must hold a complete JSON array
        for item in event:
            event_latitudes.add(item[u'Latitude'])
To deal with errors you can use try/except blocks, especially around the for loop, to handle errors in the JSON. If you post a partial sample file that doesn't work on GitHub, I can look at it and help further.
Source json:
[
{
"data1": 0,
"data2": 1,
"data3": 2
},
{
"data1": 0,
"data2": 1,
"data3": 2
},
{
"data1": 0,
"data2": 1,
"data3": 2
},
{
"data1": 0,
"data2": 1,
"data3": 2
}
]
Python code:
import json
with open("file.json", 'r') as f:
    var = json.load(f)
print(var[0])
print(var[1])
print(var[2])
Result:
{'data1': 0, 'data2': 1, 'data3': 2}
{'data1': 0, 'data2': 1, 'data3': 2}
{'data1': 0, 'data2': 1, 'data3': 2}
You can read the entire file as JSON using json.load. Reading line by line is not recommended because JSON files may be formatted in multiple ways.
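Applied to the event records from the question, a minimal sketch could look like the following. It assumes the export is a single JSON array that may span several lines; io.StringIO stands in for the real file and the sample values are made up:

```python
import io
import json

# Stand-in for the exported file: one JSON array spread over two lines.
raw = io.StringIO(
    '[{"reference": "2019-73", "Latitude": 1.045, "Longitude": 103.65},\n'
    ' {"reference": "2019-74", "Latitude": 2.5, "Longitude": 101.0}]'
)

events = json.load(raw)  # parses the whole document at once, not line by line

# Collect one field from every record into a set.
event_refs = {event["reference"] for event in events}
print(event_refs)
```

Calling json.loads on each line separately would fail here, because neither line is a complete JSON document on its own.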
Consider the JSON object below. I need to get the parent key by matching its value, which is a regular expression.
{
"PRODUCT": {
"attribs": {
"U1": {
"name": "^U.*1$"
},
"U2": {
"name": "^U.*2$"
},
"U3": {
"name": "^U.*3$"
},
"U4": {
"name": "^U.*4$"
},
"U5": {
"name": "^U.*5$"
},
"P1": {
"name": "^P.*1$"
}
}
}
}
If I pass a string like "U10001", it should return the key (U1) whose regular expression (^U.*1$) matches.
If I pass a string like "P200001", it should return the key (P1) whose regular expression (^P.*1$) matches.
Any help is appreciated.
I'm not sure how you are getting your JSON, but you added python as a tag, so I'm assuming that at some point you will have it stored as a string in your code.
First decode the string into a python dict.
import json
my_dict = json.loads(my_json)["PRODUCT"]["attribs"]
If the JSON is formatted as above you should get a dict with keys as your U1, U2, etc.
Now you can use filter in python to apply your regular expression logic, and re to do the actual matching.
import re
test_string = "U10001"
def re_filter(item):
    return re.match(item[1]["name"], test_string)
result = filter(re_filter, my_dict.items())
# Just get the matching attribute names
print([i[0] for i in result])
I haven't run the code, so it might need some syntax fixing, but this should give you the general idea. Of course, you will need to make it more generic to allow multiple products.
How about this:
import re
my_dict = {...}
def get_key(dict_, test):
    return next(k for k, v in dict_.items() if re.match(v['name'], test))
test = "U10001"
result = get_key(my_dict['PRODUCT']['attribs'], test)
print(result) # U1
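Note that next without a default raises StopIteration when nothing matches. A hedged variant that returns None instead, using a trimmed copy of the question's attribs (the name find_key is illustrative):

```python
import re

# Trimmed copy of the "attribs" mapping from the question.
attribs = {
    "U1": {"name": "^U.*1$"},
    "U2": {"name": "^U.*2$"},
    "P1": {"name": "^P.*1$"},
}

def find_key(attribs, text):
    """Return the first key whose "name" pattern matches text, or None."""
    return next(
        (key for key, value in attribs.items() if re.match(value["name"], text)),
        None,
    )

print(find_key(attribs, "U10001"))
print(find_key(attribs, "P200001"))
print(find_key(attribs, "X9"))
```

If several patterns can match the same string, the first key in dict order wins, so the order of the entries matters.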
Can you please elaborate on what exactly you want to design? Here's a quick way to return the desired key.
import re
def getKey(string):
    return re.search(r'^(.\d)\d+', string).group(1)
If you want to loop over the whole JSON, load it into a dictionary and then loop over the "PRODUCT"->"attribs" dictionary to get the required key:
import json, re
with open('../file/path/here') as f:
    d = json.load(f)
patents = d['PRODUCT']['attribs']
for key, val in patents.items():
    patent_group = re.search(r'^(.\d)\d+', val['name']).group(1) #returns U1 U2,U3,.. or P1,P2,P3,..
    #do whatever with patent_group(U1/P1 etc)
Here is the problem: I have a string in the following format (note: there are no line breaks). I simply want this string to be deserialized into a Python dictionary or a JSON object so that I can navigate it easily. I have tried both ast.literal_eval and json, but the end result is either an error or simply another string. I have been scratching my head over this for some time, and I know there must be a simpler and more elegant solution than writing my own parser.
{
table_name:
{
"columns":
[
{
"col_1":{"col_1_1":"value_1_1","col_1_2":"value_1_2"},
"col_2":{"col_2_1":"value_2_1","col_2_2":"value_2_2"},
"col_3":"value_3","col_4":"value_4","col_5":"value_5"}],
"Rows":1,"Total":1,"Flag":1,"Instruction":none
}
}
Note that the JSON decoder expects each property name to be enclosed in double quotes. Use the following approach with the re.sub() and json.loads() functions:
import json, re
s = '{table_name:{"columns":[{"col_1":{"col_1_1":"value_1_1","col_1_2":"value_1_2"},"col_2":{"col_2_1":"value_2_1","col_2_2":"value_2_2"},"col_3":"value_3","col_4":"value_4","col_5":"value_5"}],"Rows":1,"Total":1,"Flag":1,"Instruction":none}}'
s = re.sub(r'\b(?<!\")([_\w]+)(?=\:)', r'"\1"', s).replace('none', '"None"')
obj = json.loads(s)
print(obj)
The output:
{'table_name': {'columns': [{'col_5': 'value_5', 'col_2': {'col_2_1': 'value_2_1', 'col_2_2': 'value_2_2'}, 'col_3': 'value_3', 'col_1': {'col_1_2': 'value_1_2', 'col_1_1': 'value_1_1'}, 'col_4': 'value_4'}], 'Flag': 1, 'Total': 1, 'Instruction': 'None', 'Rows': 1}}
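The approach above turns the bare none into the string "None". If a real Python None is preferred, a hedged variant is to substitute JSON null instead. This sketch uses a shortened copy of the input and assumes that no string value contains a colon or the bare word none:

```python
import json
import re

# Shortened copy of the question's string (unquoted key, bare none).
s = '{table_name:{"columns":[{"col_1":{"col_1_1":"value_1_1"},"col_3":"value_3"}],"Rows":1,"Instruction":none}}'

# Quote bare property names: a word not preceded by '"' and followed by ':'.
quoted = re.sub(r'\b(?<!")(\w+)(?=:)', r'"\1"', s)

# Replace the bare token none with JSON null so it decodes to Python None.
fixed = re.sub(r'\bnone\b', 'null', quoted)

obj = json.loads(fixed)
print(obj["table_name"]["Instruction"])
```

Both substitutions are purely textual, so they can corrupt data if a string value happens to contain text that looks like an unquoted key or the word none.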
I have JSON data as an array of dictionaries which comes as the request payload.
[
{ "Field1": 1, "Feld2": "5" },
{ "Field1": 3, "Feld2": "6" }
]
I tried ijson.items(f, '') which yields the entire JSON object as one single item. Is there a way I can iterate the items inside the array one by one using ijson?
Here is the sample code I tried which is yielding the JSON as one single object.
f = open("metadatam1.json")
objs = ijson.items(f, '')
for o in objs:
    print(str(o) + "\n")
[{'Feld2': u'5', 'Field1': 1}, {'Feld2': u'6', 'Field1': 3}]
I'm not very familiar with ijson, but reading some of its code it looks like calling items with a prefix of "item" should work to get the items of the array, rather than the top-level object:
for item in ijson.items(f, "item"):
    # do stuff with the item dict