I wonder if there is a way to decode a JSON-like string.
I have this string:
'{ hotel: { id: "123", name: "hotel_name"} }'
It's not a valid JSON string, so I can't decode it directly with the python API.
Python will only accept a stringified JSON string like:
'{ "hotel": { "id": "123", "name": "hotel_name"} }'
where properties are quoted to be a string.
Use the demjson module, which can decode in non-strict mode.
In [1]: import demjson
In [2]: demjson.decode('{ hotel: { id: "123", name: "hotel_name"} }')
Out[2]: {u'hotel': {u'id': u'123', u'name': u'hotel_name'}}
You could try a wrapper for a JavaScript engine, such as PyV8.
import PyV8
ctx = PyV8.JSContext()
ctx.enter()
# Note that we need an assignment here ('a = '); a bare object literal is a syntax error.
js = 'a = ' + '{ hotel: { id: "123", name: "hotel_name"} }'
a = ctx.eval(js)
a.hotel.id
>> '123' # Prints
@vartec has already pointed out demjson, which works well for slightly invalid JSON. For data that's even less JSON compliant I've written barely_json:
from barely_json import parse
print(parse('[no, , {complete: yes, where is my value?}]'))
prints
[False, '', {'complete': True, 'where is my value?': ''}]
Not very elegant and not robust (and easy to break), but it may be possible to kludge it with something like:
import re
kludged = re.sub(r'(?i)([a-z_].*?):', r'"\1":', string)
# { "hotel": { "id": "123", "name": "hotel_name"} }
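A minimal sketch of that kludge run end to end with the standard json module. The helper name quote_keys and the second input (a value containing a colon) are made up for illustration; the second input is there to show one way the approach breaks.

```python
import json
import re

def quote_keys(text):
    # Naively wrap everything from a letter/underscore up to the next colon
    # in double quotes. This assumes colons only ever follow keys.
    return re.sub(r'(?i)([a-z_].*?):', r'"\1":', text)

string = '{ hotel: { id: "123", name: "hotel_name"} }'
kludged = quote_keys(string)
hotel = json.loads(kludged)["hotel"]
print(hotel["id"], hotel["name"])

# A colon inside a value is mistaken for a key separator, so the
# result is no longer valid JSON.
broken = quote_keys('{ url: "http://example.com" }')
try:
    json.loads(broken)
    kludge_survived = True
except json.JSONDecodeError as exc:
    kludge_survived = False
    print("kludge failed:", exc.msg)
```

For input you do not control, a tolerant parser such as demjson is likely the safer choice.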
You may find that using pyparsing and the parsePythonValue.py example could do what you want as well... (or modified fairly easily to do so) or the jsonParser.py could be modified to not require quoted key values.
I am trying to cut the last word out of a JSON value using .split()[-1] with pyjq, but it fails with the error: jq: error: syntax error, unexpected '('
The key/value - "subject": "The user has user id: 2432343f3f-34kfert-343mn5788886"
The JSON:
[
{
"id": "The user has user id: 76e195fa-67f1-4ea6-bb0e-29c123855978",
"date": "2018-11-01T08:41:53Z"
},
{
"id": "The user has user id: 195fa76e-67f1-4ea6-bb0e-5597829c1238",
"date": "2018-10-31T14:43:04Z"
}
]
response_read = open('my.json', 'r')
response_read_parsed = json.loads(response_read.read())
rule = pyjq.all('.value[] | { "id": .["subject"].split()[-1], "date": .receivedDateTime }', response_read_parsed)
But this approach works if I write it without pyjq:
myid= (response_read_parsed['subject'].split()[-1])
print json.dumps(myid, indent=4)
As there are multiple entries like the one above, I decided to filter using pyjq.
Have I made a mistake somewhere? I still can't figure it out. Please help. Thank you very much.
jq's split requires an argument, which must be a valid JSON string. In your case, you might want to use splits instead as it takes a regular expression argument. However, splits produces a stream, so you would presumably want to write something along the lines of:
.value[]
| { "id": [.["subject"] | splits(" *")][-1],
"date": .receivedDateTime }
I currently have JSON in the below format.
Some of the keys are not properly formatted: they are missing their double quotes (").
How do I fix these key values to have double-quotes on them?
{
Name: "test",
Address: "xyz",
"Age": 40,
"Info": "test"
}
Required:
{
"Name": "test",
"Address": "xyz",
"Age": 40,
"Info": "test"
}
Using the below post, I was able to find such key values in the above INVALID JSON.
However, I could NOT find an efficient way to replace these found values with double-quotes.
s = "Example: String"
out = re.findall(r'\w+:', s)
How to Escape Double Quote inside JSON
Using Regex:
import re
data = """{ Name: "test", Address: "xyz"}"""
print( re.sub(r"(\w+):", r'"\1":', data) )
Output:
{ "Name": "test", "Address": "xyz"}
You can use PyYAML. Since JSON is (nearly) a subset of YAML, PyYAML may cope with the missing quotes.
Example
import yaml
dirty_json = """
{
key: "value",
"key2": "value"
}
"""
yaml.load(dirty_json, yaml.SafeLoader)
I had a few more issues with my JSON, so I thought I'd share the final solution that worked for me.
jsonStr = re.sub(r"((?=\D)\w+):", r'"\1":', jsonStr)
jsonStr = re.sub(r": ((?=\D)\w+)", r':"\1"', jsonStr)
The first line adds the missing double quotes around a key, e.g.
Name: "test"
The second line adds the missing double quotes around a value, e.g. "Info": test
Because of the (?=\D) lookahead, neither pattern matches a token that starts with a digit, so numeric values and purely numeric time parts such as 10:30:00 are left alone.
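A minimal sketch of the two substitutions followed by json.loads, to check that the repaired text really parses. The helper name quote_bare_tokens and the sample input are made up for illustration.

```python
import json
import re

def quote_bare_tokens(text):
    # Quote bare keys:    Name:   ->  "Name":
    text = re.sub(r'((?=\D)\w+):', r'"\1":', text)
    # Quote bare values:  : test  ->  :"test"
    text = re.sub(r': ((?=\D)\w+)', r':"\1"', text)
    return text

dirty = '{ Name: "test", Address: "xyz", "Age": 40, "Info": test }'
fixed = quote_bare_tokens(dirty)
record = json.loads(fixed)
print(record)
```

The patterns are still brittle. Bare keywords such as true or null would be turned into strings, and an ISO timestamp like "2018-11-01T08:41:53Z" would be altered, because T08: looks like a bare key.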
You can use an online formatter. Most of them throw an error when the double quotes are missing, but the one below seems to handle it nicely.
JSON Formatter
The regex approach can be brittle. I suggest you find a library that can parse the JSON text that is missing quotes.
For example, in Kotlin 1.4, the standard way to parse a JSON string is using Json.decodeFromString. However, you can use Json { isLenient = true }.decodeFromString to relax the requirements for quotes. Here is a complete example in JUnit.
import kotlinx.serialization.Serializable
import kotlinx.serialization.decodeFromString
import kotlinx.serialization.json.Json
import org.junit.jupiter.api.Assertions
import org.junit.jupiter.api.Test
@Serializable
data class Widget(val x: Int, val y: String)
class JsonTest {
@Test
fun `Parsing Json`() {
val w: Widget = Json.decodeFromString("""{"x":123, "y":"abc"}""")
Assertions.assertEquals(123, w.x)
Assertions.assertEquals("abc", w.y)
}
@Test
fun `Parsing Json missing quotes`() {
// Json.decodeFromString("{x:123, y:abc}") failed to decode due to missing quotes
val w: Widget = Json { isLenient = true }.decodeFromString("{x:123, y:abc}")
Assertions.assertEquals(123, w.x)
Assertions.assertEquals("abc", w.y)
}
}
I have JSON data like this:
json_data = '{"data":"[{"Date":"3/17/2017","Steam Total":60},{"Date":"3/18/2017","Steam Total":15},{"Date":"3/19/2017","Steam Total":1578},{"Date":"3/20/2017","Steam Total":1604}]", "data_details": "{"data_key":"Steam Total", "given_high":"1500", "given_low":"1000", "running_info": []}"}'
json_input_data = json_data["data"]
json_input_additional_info = json_data["data_details"]
I am getting an error:
TypeError: string indices must be integers, not str
I think there is an error in the JSON data. Can someone help me with this?
Your code has two issues.
In the line json_input_data = json_data["data"], the variable json_data is not a JSON object but a string, and you are trying to index a string with a string key. To get an object from a JSON string, use the json module (json.loads).
Your JSON string isn't valid either; this is a valid version:
{"data":[{"Date":"3/17/2017","Steam Total":60},{"Date":"3/18/2017","Steam Total":15},{"Date":"3/19/2017","Steam Total":1578},{"Date":"3/20/2017","Steam Total":1604}], "data_details": {"data_key":"Steam Total", "given_high":"1500", "given_low":"1000", "running_info": []}}
With this string, and after parsing it with json.loads, your code works fine.
Try parsing json_data first (with json.loads(json_data)). Currently its type is string, which is exactly what your error says.
As Pongpira Upra pointed out, your JSON is not well formed and should be something like this.
{
"data":[
{
"Date":"3/17/2017",
"Steam Total":60
},
{
"Date":"3/18/2017",
"Steam Total":15
},
{
"Date":"3/19/2017",
"Steam Total":1578
},
{
"Date":"3/20/2017",
"Steam Total":1604
}
],
"data_details":{
"data_key":"Steam Total",
"given_high":"1500",
"given_low":"1000",
"running_info":[]
}
}
To retrieve information, parse the corrected string first (for example with json.loads) and then index into the result:
json.loads(json_data)["data"][0]["Date"]
This evaluates to "3/17/2017".
You declare a string called json_data and, well, then it acts like a string. That is what the exception tells you. Like others here tried to say - you do also have an error in your data, but the exception you supplied is due to accessing the string as if it was a dictionary. You need to add a missing call to e.g. json.loads(...).
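A minimal sketch of the fix, assuming json_data holds the corrected JSON text from the answers here. The final filter (dates whose value exceeds given_high) is only an illustrative use of the parsed data, not something the question asked for.

```python
import json

# Corrected JSON text: the inner array and object are no longer wrapped in quotes.
json_data = '{"data":[{"Date":"3/17/2017","Steam Total":60},{"Date":"3/18/2017","Steam Total":15},{"Date":"3/19/2017","Steam Total":1578},{"Date":"3/20/2017","Steam Total":1604}], "data_details": {"data_key":"Steam Total", "given_high":"1500", "given_low":"1000", "running_info": []}}'

parsed = json.loads(json_data)  # str -> dict; this is the missing call
json_input_data = parsed["data"]
json_input_additional_info = parsed["data_details"]

# Illustrative only: dates whose value is above the given_high threshold.
key = json_input_additional_info["data_key"]
high = int(json_input_additional_info["given_high"])
above_high = [row["Date"] for row in json_input_data if row[key] > high]
print(above_high)
```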
You were right. Your JSON is indeed wrong.
Can you try using this json?
{
"data":[
{
"Date":"3/17/2017",
"Steam Total":60
},
{
"Date":"3/18/2017",
"Steam Total":15
},
{
"Date":"3/19/2017",
"Steam Total":1578
},
{
"Date":"3/20/2017",
"Steam Total":1604
}
],
"data_details":{
"data_key":"Steam Total",
"given_high":"1500",
"given_low":"1000",
"running_info":[]
}
}
I'm having trouble with a simple question.
a = {
"apiVersion": "2.1",
"data": {
"startIndex": 1,
"items": [{
"id": "YVA3UoZM0zU",
"title": "Trailer - Lisbela eo Prisioneiro"
}]
}
}
I don't know how to get the id.
This is a string, so I tried this:
import simplejson as json
js = json.loads(a)
>>> type(js)
<type 'dict'>
print js['data'{'items'[{'id'}]}]
>>> syntax error
This syntax is invalid. How can I get this info? It's supposed to be easy. What am I doing wrong?
Try:
js['data']['items'][0]['id']
It would appear that there may be multiple items in this structure. If you'd like to extract all item ids as a list, the following will do it:
[item['id'] for item in js['data']['items']]
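Put together as a runnable sketch. It uses the standard json module in place of simplejson (an assumption; json.loads behaves the same way for this input), and the .get variant at the end is an optional extra for payloads where keys might be missing.

```python
import json

a = '''
{
  "apiVersion": "2.1",
  "data": {
    "startIndex": 1,
    "items": [{
      "id": "YVA3UoZM0zU",
      "title": "Trailer - Lisbela eo Prisioneiro"
    }]
  }
}
'''

js = json.loads(a)

# Index one level at a time: dict key, dict key, list index, dict key.
first_id = js['data']['items'][0]['id']

# All ids, in case there is more than one item.
all_ids = [item['id'] for item in js['data']['items']]

# Tolerant variant: missing keys yield an empty list or None instead of KeyError.
safe_ids = [item.get('id') for item in js.get('data', {}).get('items', [])]

print(first_id, all_ids, safe_ids)
```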