I'm trying to import some JSON data into Python with urllib and json.load, but I need to cut off the first x and the last y characters, because they make the JSON invalid.
import json
import urllib
thePage = urllib.urlopen("http://datafile.dat")
myData = json.load(thePage)
I want to do something like json.load(thePage[10:-10]), but since urlopen doesn't return a string, I can't slice it. What can I do?
You can get the text of the response by calling .read() on it. Because you are then passing a string, use json.loads() instead of json.load().
Once you have the body as a string, you can slice it as usual.
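A minimal Python 3 sketch of that answer, assuming the body is UTF-8 and that the junk is a fixed number of characters at each end. The helper name load_trimmed_json is hypothetical, and the wrapped string stands in for the downloaded body, since http://datafile.dat is only a placeholder. In Python 3, urlopen lives in urllib.request and .read() returns bytes, so the helper decodes them first.

```python
import json


def load_trimmed_json(raw, head, tail):
    """Parse JSON after dropping `head` leading and `tail` trailing characters.

    `raw` may be bytes (what response.read() returns in Python 3) or str.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")  # assumption: the body is UTF-8
    # Compute the end index explicitly: raw[head:-0] would be empty when tail == 0.
    end = len(raw) - tail
    return json.loads(raw[head:end])


# With a real URL (Python 3) this would look like:
#   from urllib.request import urlopen
#   with urlopen("http://datafile.dat") as resp:
#       myData = load_trimmed_json(resp.read(), 10, 10)

wrapped = 'callback({"a": 1, "b": [2, 3]});'  # stand-in for the downloaded body
print(load_trimmed_json(wrapped, len("callback("), len(");")))
```

Computing the end index from len(raw) rather than slicing with [head:-tail] keeps the helper correct when nothing needs to be trimmed from the end.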
Related
I am writing a program to call an API. I am trying to convert my data payload into json. Thus, I am using json.loads() to achieve this.
However, I have encountered the following problem.
I set my variable as follows:
apiVar = [
    "https://some.url.net/api/call",  # url
    '{"payload1":"email#user.net", "payload2":"stringPayload"}',  # payload
    {"Content-type": "application/json", "Accept": "text/plain"},  # headers
]
Then I tried to convert the value of apiVar[1] into a JSON object:
jsonObj = json.loads(apiVar[1])
However, instead of giving me output like the following:
{"payload1":"email#user.net", "payload2":"stringPayload"}
It gives me this instead:
{'payload1':'email#user.net', 'payload2':'stringPayload'}
I know for sure that this is not valid JSON. What I would like to know is why this happens. I tried searching for a solution but couldn't find anything. All the code examples suggest it should have given me double quotes instead.
How can I fix it so that the output uses double quotes?
json.loads() takes a JSON string and converts it into the equivalent Python data structure, which in this case is a dict containing strings. When Python prints a dict, it shows each string with repr(), which uses single quotes by default.
If you want to convert a Python data structure to JSON, use json.dumps(), which returns a string. If you're writing straight to a file, use json.dump().
In any case, your payload is already valid JSON, so the only reason to load it is if you want to make changes to it before calling the API.
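To illustrate, here is the load, modify, dump round trip in Python 3. The change to payload2 is a hypothetical edit standing in for whatever you would change before calling the API.

```python
import json

payload = '{"payload1":"email#user.net", "payload2":"stringPayload"}'

obj = json.loads(payload)           # a Python dict, not JSON text
obj["payload2"] = "updatedPayload"  # hypothetical change made before the API call
body = json.dumps(obj)              # back to JSON text, with double quotes

print(repr(obj))  # {'payload1': 'email#user.net', 'payload2': 'updatedPayload'}
print(body)       # {"payload1": "email#user.net", "payload2": "updatedPayload"}
```

Only body is JSON; obj is an ordinary dict, which is why printing it shows single quotes.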
Use json.dumps() to convert the object back into JSON text.
The single-quoted string you are referencing is probably the output of str() or repr(), which displays the data as a Python object (a dictionary), not as JSON. Try this:
print(type(jsonObj))
print(str(jsonObj))
print(json.dumps(jsonObj))
I need to make a request to my client's API, and it returns this data:
[6,0,'VT3zrYA',5,'USUeZWA',5,0,0,0,0,0,4,0,0,0,2,0,0,3,0,0,0,0,2,0,1,["portale.titolari.client.config.ShoulderDTO/4121330600","java.util.HashSet/3273092938","MATTEO SBRAGIA","java.util.ArrayList/4159755760","java.util.Date/3385151746","MATTEO"],0,7]
How can I parse this data and extract the following fields?
MATTEO SBRAGIA
MATTEO
I've tried this code, but it's not working:
data = json.load(output_data)
pprint data
This is in fact not a valid JSON string, because it contains single quotes ('). You can replace all single quotes with double quotes and then parse the string, though it's worth asking whether the single quotes are intentional or a mistake on the API side:
import json
s = '[6,0,\'VT3zrYA\',5,\'USUeZWA\',5,0,0,0,0,0,4,0,0,0,2,0,0,3,0,0,0,0,2,0,1,["portale.titolari.client.config.ShoulderDTO/4121330600","java.util.HashSet/3273092938","MATTEO SBRAGIA","java.util.ArrayList/4159755760","java.util.Date/3385151746","MATTEO"],0,7]'
data = json.loads(s.replace("\'", '"'))
print(data[26][2])
print(data[26][5])
prints:
$ python test.py
MATTEO SBRAGIA
MATTEO
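An alternative that avoids editing the text: this response also happens to be a valid Python literal, so ast.literal_eval() can parse it directly, single and double quotes alike. This is a swapped-in technique, not the answer's. It also sidesteps a weakness of the blanket replace(), which would corrupt any double-quoted string containing an apostrophe; the tricky value below is a made-up example of that case.

```python
import ast

s = '[6,0,\'VT3zrYA\',5,\'USUeZWA\',5,0,0,0,0,0,4,0,0,0,2,0,0,3,0,0,0,0,2,0,1,["portale.titolari.client.config.ShoulderDTO/4121330600","java.util.HashSet/3273092938","MATTEO SBRAGIA","java.util.ArrayList/4159755760","java.util.Date/3385151746","MATTEO"],0,7]'

# literal_eval only accepts literals (numbers, strings, lists, ...), never calls.
data = ast.literal_eval(s)

names = data[26]   # the nested list of strings
print(names[2])    # MATTEO SBRAGIA
print(names[5])    # MATTEO

# Hypothetical input where replace("'", '"') would break the JSON:
tricky = '[\'ok\', "O\'Brien"]'
print(ast.literal_eval(tricky))  # ['ok', "O'Brien"]
```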
I am getting this as my response:
b'{"userdetails":[["{\\"user_id\\":[\\"54562af66ffd\\"],\\"user_name\\":[\\"bewwrking\\"],\\"room\\":[\\"31\\"]}'
I want to convert it into proper JSON without the doubled backslashes.
Is there a built-in function for that, or do I need to do a string replace?
If you have control over how the data is sent, I would recommend converting any relevant fields/keys to strings before sending them as JSON. I got some weird JSON responses before I sanitized the input to json_dump.
Alternatively, remove the leading b and run replace() as shown below:
s = '{"userdetails":[["{\\"user_id\\":[\\"54562af66ffd\\"],\\"user_name\\":[\\"bewwrking\\"],\\"room\\":[\\"31\\"]}'
s = s.replace('\\', '')  # remove every backslash
print(s)
{"userdetails":[["{"user_id":["54562af66ffd"],"user_name":["bewwrking"],"room":["31"]}
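Doubled backslashes like these usually mean that one JSON document was encoded as a string inside another (double encoding). The response in the question does not look complete (its brackets are unbalanced), so the sketch below assumes a hypothetical, complete response of the same shape. Instead of stripping backslashes, it parses twice with json.loads(): once for the outer document and once for the embedded string, which removes the escapes correctly.

```python
import json

# Hypothetical complete response in the same shape as the one above:
# "userdetails" holds JSON *text* stored as a string.
raw = b'{"userdetails":[["{\\"user_id\\":[\\"54562af66ffd\\"],\\"user_name\\":[\\"bewwrking\\"],\\"room\\":[\\"31\\"]}"]]}'

outer = json.loads(raw.decode("utf-8"))  # first pass: the outer document
inner_text = outer["userdetails"][0][0]  # still a str that contains JSON
inner = json.loads(inner_text)           # second pass: the embedded document

clean = json.dumps(inner)  # re-serialize as ordinary JSON, no stray backslashes
print(clean)  # {"user_id": ["54562af66ffd"], "user_name": ["bewwrking"], "room": ["31"]}
```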
I get this string from stdin.
{u'trades': [Custom(time=1418854520, sn=47998, timestamp=1418854517,
price=322, amount=0.269664, tid=48106793, type=u'ask',
start=1418847319, end=1418847320), Custom(time=1418854520, sn=47997,
timestamp=1418854517, price=322, amount=0.1, tid=48106794,
type=u'ask', start=1418847319, end=1418847320),
Custom(time=1418854520, sn=47996, timestamp=1418854517, price=321.596,
amount=0.011, tid=48106795, type=u'ask', start=1418847319,
end=1418847320)]}
My program fails when I try to access jsonload["trades"]. If I use jsonload[0], I only get one character: {.
I checked that reading the text from stdin isn't the problem, but I don't know whether the problem is the format I receive (I used the Incursion library) or my Python code. I have tried many combinations of json.load/loads and json.dump/dumps, without success.
inputdata = sys.stdin.read()
jsondump = json.dumps(inputdata)
jsonload = json.loads(jsondump)
print jsonload
print type(jsonload) # return me "<type 'unicode'>"
print repr(jsonload) # return me same but with u" ..same string.... "
for row in jsonload["trades"]: # error here: TypeError: string indices must be integers
You read the input data into a string. json.dumps then turns that string into a JSON-encoded string, and json.loads turns it back into the same plain string. At no point have you interpreted the original data as JSON.
Try parsing the input data as JSON directly:
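A short Python 3 demonstration of that round trip, using a small stand-in for the stdin text (an assumption, since the real input is not valid JSON): dumps followed by loads gives back the original str, which is why indexing it returns single characters.

```python
import json

inputdata = '{"trades": []}'      # stand-in for sys.stdin.read()
jsondump = json.dumps(inputdata)  # the text "{\"trades\": []}", a JSON string literal
jsonload = json.loads(jsondump)   # decodes back to the original str

print(type(jsonload))             # <class 'str'>
print(jsonload == inputdata)      # True
print(jsonload[0])                # {

print(json.loads(inputdata)["trades"])  # parsing the input directly: []
```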
inputdata = sys.stdin.read()
jsonload = json.loads(inputdata)
However, this will not work either, because the data in your snippet is not valid JSON; it looks like serialized Python objects. You can check the input data using http://jsonlint.com.
The u'trades' prefix shows that you have a Python unicode string; the JSON equivalent would be "trades". To convert this Python code, you can eval() it, but that is a dangerous operation if the data comes from an untrusted source.
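If you'd rather check the input locally than paste it into a website, json.loads() itself reports where parsing failed. A small sketch (check_json is a hypothetical helper name) using the lineno, colno, and msg attributes of json.JSONDecodeError, available since Python 3.5:

```python
import json


def check_json(text):
    """Return None if `text` is valid JSON, else where and why parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


print(check_json('{"trades": []}'))   # None
print(check_json("{u'trades': []}"))  # an error pointing at line 1, column 2 (the u)
```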
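A sketch of the eval() route in Python 3 (which accepts the u'' prefix), under a loud assumption: Custom is not defined anywhere in the question, so it is modelled here as a namedtuple with the field names that appear in the input; the real class produced by the Incursion library may differ. Only one trade is included to keep it short. Passing an empty __builtins__ limits which names the expression can reach, but it does not make eval() safe, so use this only on trusted input.

```python
from collections import namedtuple

# Assumption: Custom is a plain record with these fields (hypothetical stand-in).
Custom = namedtuple("Custom", "time sn timestamp price amount tid type start end")

text = """{u'trades': [Custom(time=1418854520, sn=47998, timestamp=1418854517,
price=322, amount=0.269664, tid=48106793, type=u'ask',
start=1418847319, end=1418847320)]}"""

# eval() executes code: only acceptable for input you fully trust.
data = eval(text, {"__builtins__": {}}, {"Custom": Custom})

for row in data["trades"]:
    print(row.tid, row.type, row.price, row.amount)  # 48106793 ask 322 0.269664
```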
I'm writing a Python script that extracts the URL of a Facebook video. But in the source of the video page, I see some characters of the form \uxxxxxx in the URL.
For instance, the URL looks like this:
https\u00253A\u00255C\u00252F\u00255C\u00252Ffbcdn-video-a.akamaihd.net\u00255C\u00252Fhvideo-ak-prn2\u00255C\u00252Fv\u00255C\u00252F753002_318048581647953_53890_n.mp4\u00253Foh\u00253D64e3e8ecf7e88f1da335d88949b2dc1f\u002526oe\u00253D52226D10\u002526__gda__\u00253D1377987338_9e37fb163a1d37d4b06ab7cff668f7dc\u002522\u00252C\u002522
\u00253A is a colon (:), but how do I convert it?
When I did this:
>>> x.decode('unicode_escape').encode('ascii','ignore')
I get:
'https%3A%5C%2F%5C%2Ffbcdn-video-a.akamaihd.net%5C%2Fhvideo-ak-prn2%5C%2Fv%5C%2F753002_318048581647953_53890_n.mp4%3Foh%3D64e3e8ecf7e88f1da335d88949b2dc1f%26oe%3D52226D10%26__gda__%3D1377987338_9e37fb163a1d37d4b06ab7cff668f7dc%22%2C%22
I want the exact URL, not the percent-encoded form.
I searched a lot but couldn't find any help.
Thanks in advance
Edit
Is there any way to pass in the whole source of the Facebook page and convert all such complex Unicode escapes to plain characters?
>>> import urllib
>>> s = b'https\u00253A\u00255C\u00252F\u00255C\u00252Ffbcdn-video'
>>> print urllib.unquote_plus(s.decode('unicode_escape'))
https:\/\/fbcdn-video
It seems that your string also has backslash-escaped slashes (\/), so remove those backslashes too:
>>> import re
>>> import urllib
>>> s = b'https\u00253A\u00255C\u00252F\u00255C\u00252Ffbcdn-video'
>>> re.sub(r'\\(.)', r'\1', urllib.unquote_plus(s.decode('unicode_escape')))
u'https://fbcdn-video'
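The same steps in Python 3, where urllib.unquote_plus moved to urllib.parse.unquote_plus. A raw bytes literal (rb'...') keeps the \u sequences as literal backslashes, as they appear in the page source. The order matters: unicode_escape first (\u0025 becomes %), then percent-decoding, then removing the escaping backslashes.

```python
import re
from urllib.parse import unquote_plus

s = rb'https\u00253A\u00255C\u00252F\u00255C\u00252Ffbcdn-video'

step1 = s.decode("unicode_escape")     # -> https%3A%5C%2F%5C%2Ffbcdn-video
step2 = unquote_plus(step1)            # -> https:\/\/fbcdn-video
url = re.sub(r"\\(.)", r"\1", step2)   # drop each escaping backslash

print(url)  # https://fbcdn-video
```

Applying this to a whole page is riskier: unicode_escape decodes any non-ASCII bytes as Latin-1, which would garble UTF-8 text elsewhere in the source.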