Scrapy javascript json object loads - python

I am scraping a website that embeds a JavaScript object literal. How can I convert that JavaScript object to pure JSON? I need something like JavaScript's JSON.stringify method. How can I do that in Python?
{
title: 'Erkek Gri Sandalet',
description: '<ul><li>Asıl Dış Materyal: Suni Deri</li><li>İç Materyal: Suni Deri</li><li>Taban Materyali: Kauçuk Taban</li></ul>',
url: '/erkek-gri-sandalet-3500250',
code: 'C-362686',
id: '3500250'
}
I get an error when I pass the above string to json.loads(). The error is:
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
How can I convert this JavaScript object to native JSON? I used the https://www.freeformatter.com/json-formatter.html website to validate the object above, and it validated it and converted it to native JSON easily.

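You could do something funky with regular expressions, but it is simpler to take advantage of the fact that, even though this is not valid JSON, it is valid YAML. Install PyYAML and you can parse it with yaml.safe_load(data). (Recent PyYAML versions require an explicit Loader argument for yaml.load, and safe_load is the safer choice for scraped input.)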
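As a minimal sketch of that approach, assuming PyYAML is installed and the scraped text has already been isolated into a string (the variable names here are illustrative), the object can be parsed as YAML and then re-serialized as JSON:

```python
import json

import yaml  # third-party: pip install pyyaml

# The JavaScript object literal from the question, as scraped text.
data = """{
title: 'Erkek Gri Sandalet',
description: '<ul><li>Asıl Dış Materyal: Suni Deri</li><li>İç Materyal: Suni Deri</li><li>Taban Materyali: Kauçuk Taban</li></ul>',
url: '/erkek-gri-sandalet-3500250',
code: 'C-362686',
id: '3500250'
}"""

# Unquoted keys and single-quoted strings are accepted by the YAML parser.
product = yaml.safe_load(data)

# Re-serialize as strict JSON; ensure_ascii=False keeps the Turkish characters readable.
json_text = json.dumps(product, ensure_ascii=False)
print(json_text)
```

This works for this snippet, but YAML and JavaScript literals are not the same grammar, so other objects (for example ones containing function calls or comments) may still need different handling.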

Related

How to extract a javascript object as json from a HTML page using python or nodejs?

https://yeastmine.yeastgenome.org/yeastmine/customQuery.do
The above webpage contains something like the following. As far as I understand, JSON does not support single quotes; only double quotes are allowed. So the content inside {} is not a valid JSON object. What is the best way to extract this object from the resulting HTML page and convert it to JSON? Thanks.
var helpMap = {'NcRNAGene': ...
The following question mentions JSON.stringify, but I am not sure how to get helpMap as a JS object in the first place in Python or Node.js.
Convert JS object to JSON string
In the browser console on that website you can write JavaScript. You are right that JSON.stringify is what you want here: pass the JavaScript object helpMap to it as a parameter, and the result is the JSON-encoded string:
jsonString = JSON.stringify(helpMap)
console.log(jsonString)
You should be able to copy that JSON string out of your console (in Chrome there will be a "Copy" button at the end of it).
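Alternatively, suppose the webpage has been downloaded to x.html. Run the following:
grep '^ \+var helpMap' < x.html | node main.js
where main.js contains the following code:
const fs = require('fs');
// Read the extracted "var helpMap = ..." line from stdin.
const data = fs.readFileSync(process.stdin.fd);
// eval() executes whatever it is given, so only use this on pages you trust.
eval(data.toString());
console.log(helpMap);
Then use JSON.stringify() on helpMap if necessary.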
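For the Python side of the question, here is a rough sketch under two assumptions that the page may not satisfy: the object is assigned on a single `var helpMap = {...};` statement, and it contains only quoted string keys and values, so it also happens to be a valid Python literal. The `html` string and its values below are hypothetical stand-ins, not the real page content.

```python
import ast
import json
import re

# Hypothetical page content; the real helpMap is much larger.
html = """
<script>
    var helpMap = {'NcRNAGene': 'A non-coding RNA gene', 'ORF': 'An open reading frame'};
</script>
"""

# Capture the object literal assigned to helpMap, up to the first "};".
match = re.search(r"var\s+helpMap\s*=\s*(\{.*?\})\s*;", html, re.DOTALL)
if match is None:
    raise ValueError("helpMap assignment not found")

# literal_eval only evaluates literals, so it does not execute page code.
help_map = ast.literal_eval(match.group(1))

json_text = json.dumps(help_map)
print(json_text)
```

If the object uses unquoted keys, `true`/`false`/`null`, or nested expressions, `ast.literal_eval` will reject it and a JavaScript engine (as in the Node.js answer) is the more robust route.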

Get JSON response in Python, but in original JavaScript format

I am putting a JSON response into a variable via requests.json() like this:
response = requests.get(some_url, params=some_params).json()
However, this converts JSON's original " to Python's ', true to True, and null to None.
This poses a problem when I try to save the response as text and then convert it back to JSON. Sure, I can use .replace() for all the conversions mentioned above, but even once I do that, I get other odd JSON decoder errors.
Is there any way in Python to get JSON response and keep original JavaScript format?
json() is the JSON decoder method. You are looking at a Python object, that is why it looks like Python.
Other formats are listed on the same page, starting from Response Content
.text: text - it has no separate link/paragraph, it is right under "Response Content"
.content: binary, as bytes
.json(): decoded JSON, as Python object
.raw: streamed bytes (so you can get parts of content as it comes)
You need .text for getting text, including JSON data.
You can get the raw text of your response with requests.get(some_url, params=some_params).text
It is the json method which converts to a Python friendly format.
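A small sketch of the difference, with the network call replaced by a hard-coded string so it runs offline (`raw` stands in for `response.text`, and the field names are made up for illustration):

```python
import json

# Stand-in for response.text: the JSON exactly as the server sent it.
raw = '{"name": "example", "active": true, "parent": null}'

# This is what response.json() would return: a Python dict.
data = json.loads(raw)

# str() gives Python's repr, which is NOT JSON (single quotes, True, None).
as_repr = str(data)

# json.dumps() serializes the Python object back to valid JSON.
as_json = json.dumps(data)

print(as_repr)  # {'name': 'example', 'active': True, 'parent': None}
print(as_json)  # {"name": "example", "active": true, "parent": null}
```

So to save a response as text, either write `response.text` directly or write `json.dumps(response.json())`; saving `str(response.json())` is what leads to the decoder errors.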

JSON load twitter API error

I have used tweepy user timeline API to extract information for some users. Here is the link to the file.
The content of the file is in string format. I tried to load it using json.loads(<string>), but it shows the error
ValueError: Expecting property name: line 1 column 2 (char 1)
I need to make the string to work as a dict/json so that I can iterate on the keys.
Can we see the full code showing how you implemented the API call? Also, try converting it into a collection first. Try:
from collections import Counter
Counter(data)

Expecting value: line 1 column 1 (char 0) python

I'm a newbie in Python and am trying to parse data in my application using these lines of code:
json_str = request.body.decode('utf-8')
py_str = json.loads(json_str)
But I'm getting this error on json.loads
Expecting value: line 1 column 1 (char 0)
This is the JSON-formatted data that I send from my Angular app (Updated):
Object { ClientTypeId: 6, ClientName: "asdasd", ClientId: 0, PhoneNo: "123", FaxNo: "123", NTN: "1238", GSTNumber: "1982", OfficialAddress: "sads", MailingAddress: "asdasd", RegStartDate: "17-Aug-2016", 15 more… }
These are the values that I get in json_str:
ClientTypeId=5&ClientName=asdasd&ClientId=0&PhoneNo=123&FaxNo=123&NTN=123&GSTNumber=12&OfficialAddress=adkjh&MailingAddress=adjh&RegStartDate=09-Aug-2016&RegEndDate=16-Aug-2016&Status=1&CanCreateUser=true&UserQuotaFor=11&UserQuotaType=9&MaxUsers=132123&ApplyUserCharges=true&ApplyReportCharges=true&EmailInvoice=true&BillingType=1&UserCharges=132&ReportCharges=123&MonthlyCharges=123&BillingDate=16-Aug-2016&UserSessionId=324
I don't know what's wrong with it. Can anyone point out the mistake?
Your data is not JSON-formatted, not even the sample you included in your updated question. Your data is a JavaScript object, not an encoded string. Note the "N" in JSON: Notation. It is a format inspired by how data is written in JavaScript code, but runtime JavaScript data is not represented in JSON. The "JSON" you pasted is how your browser displays the object to you; it is not proper JSON (that would be {"ClientTypeId": 6, ...}, with quotes around the property names).
When sending this data to the server, you have to encode it. You think you are sending it JSON-encoded, but you aren't. You are sending it "web form encoded" (data of type application/x-www-form-urlencoded).
Now either you have to learn how to send the data in JSON format from Angular, or use the correct parsing routine in Python: urllib.parse.parse_qs. Depending on the library you are using, there might be a convenience method to access the data as well, as this is a common use case.
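A minimal sketch of the Python-side option, using a shortened copy of the request body from the question (the variable names are illustrative):

```python
from urllib.parse import parse_qs

# A shortened version of the form-encoded body shown in the question.
body = "ClientTypeId=5&ClientName=asdasd&ClientId=0&PhoneNo=123&CanCreateUser=true"

# parse_qs returns a dict that maps each key to a LIST of values,
# because a key may legally appear more than once in a query string.
fields = parse_qs(body)

# If every key is known to appear once, flatten to single values.
flat = {key: values[0] for key, values in fields.items()}

print(flat["ClientName"])    # asdasd
print(flat["ClientTypeId"])  # 5 (still a string; convert with int() if needed)
```

Note that all values come back as strings, including "true" and the numeric fields, so any type conversion has to be done explicitly.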

python requests json returns single quote

I'm playing around a little with the Google Places API and requests.
I got :
r = requests.get(self.url, params={'key': KEY, 'location': self.location, 'radius': self.radius, 'types': "airport"}, proxies=proxies)
r returns a 200 code, fine, but I'm confused by what r.json() returns compared to r.content
extract of r.json() :
{u'html_attributions': [],
u'next_page_token': u'CoQC-QAAABT4REkkX9NCxPWp0JcGK70kT4C-zM70b11btItnXiKLJKpr7l2GeiZeyL5y6NTDQA6ASDonIe5OcCrCsUXbK6W0Y09FqhP57ihFdQ7Bw1pGocLs_nAJodaS4U7goekbnKDlV3TaL8JMr4XpQBvlMN2dPvhFayU6RcF5kwvIm1YtucNOAUk-o4kOOziaJfeLqr3bk_Bq6DoCBwRmSEdZj34RmStdrX5RAirQiB2q_fHd6HPuHQzZ8EfdggqRLxpkFM1iRSnfls9WlgEJDxGB91ILpBsQE3oRFUoGoCfpYA-iW7E3uUD_ufby-JRqxgjD2isEIn8tntmFDjzQmjOraFQSEC6RFpAztLuk7l2ayfXsvw4aFO9gIhcXtG0LPucJkEa2nj3PxUDl',
u'results': [{u'geometry': {u'location': {u'lat': -33.939923,
u'lng': 151.175276}},
while extract of r.content :
'{\n "html_attributions" : [],\n "next_page_token" : "CoQC-QAAABT4REkkX9NCxPWp0JcGK70kT4C-zM70b11btItnXiKLJKpr7l2GeiZeyL5y6NTDQA6ASDonIe5OcCrCsUXbK6W0Y09FqhP57ihFdQ7Bw1pGocLs_nAJodaS4U7goekbnKDlV3TaL8JMr4Xp
so r.content has the double quotes like a "correct" json object while r.json() seems to have changed all double-quotes in single-quotes.
Should I care about it or not ? I can still access r.json() contents fine, just wondered if this was normal for requests to return an object with single quotes.
The json() method doesn't actually return JSON. It returns a python object (read: dictionary) that contains the same information as the json data. When you print it out, the quotes are added for the sake of readability, they are not actually in your data.
Should I care about it or not?
Not.
What you can do, however, is add
jsonresponse = json.dumps(requests.get(xxx).json())
in order to get valid json in jsonresponse.
Python accepts either single or double quotes for strings. By default, it displays strings with single quotes.
However, the JSON specification only allows double quotes to delimit strings.
Note that requests' response.json() will return native Python types which are slightly different from their JSON representation you can see with response.content.
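To illustrate that the quotes are only part of how a string is displayed and not part of the data, here is a short sketch using plain Python strings (no request is made; the values are illustrative):

```python
import json

# The same string written with either quote style is the same value.
single = 'airport'
double = "airport"
same_value = single == double  # True

# repr() is what the interactive prompt shows; it prefers single quotes...
shown = repr(single)

# ...but switches to double quotes when the string itself contains a single quote.
shown_with_apostrophe = repr("it's")

# json.dumps() always uses double quotes, as the JSON specification requires.
as_json = json.dumps(single)

print(shown, shown_with_apostrophe, as_json)
```

In every case the string itself is seven characters long; the surrounding quotes are added only when the value is rendered.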
You are seeing the single quotes because you are looking at Python, not JSON.
Calling Response.json attempts to parse the content of the Response as JSON. If it is successful, it will return a combination of dicts, lists and native Python types, as @Two-Bit Alchemist alluded to in his comment.
Behind the scenes, the json method just calls complexjson.loads on the response text (see here). If you dig further into the requests.compat module to figure out what complexjson is, it is the simplejson package if that is importable on the system (i.e. installed) and the standard library json package otherwise (see here). So, modulo considerations about the encoding, you can read a call to Response.json as equivalent to:
import requests
import json
response = requests.get(...)
json.loads(response.text)
TL;DR: nothing exciting is happening and no, what is returned from Response.json is not intended to be valid JSON but rather valid JSON transformed into Python data structures and types.
