Python-Can't Pull JSON Format from JSON Source

I'm trying to scrape data from Verizon's buyback pricing site. I found the source of the information while going through the "Net" requests in my browser. The endpoint returns JSON, but nothing I do lets me download that data: https://www.verizonwireless.com/vzw/browse/tradein/ajax/deviceSearch.jsp?act=models&car=Verizon&man=Apple&siz=large
I can't remember everything I've tried, but here are the issues I'm having. Also, I'm not sure how to insert multiple code blocks.
import json,urllib,requests
res=urllib.request.urlopen(url)
data=json.loads(res)
TypeError: the JSON object must be str, not 'bytes'
import codecs
reader=codecs.getreader('utf-8')
obj=json.load(reader(res))
ValueError: Expecting value: line 1 column 1 (char 0)
#this value error happens with other similar attempts, such as....
res=requests.get(url)
res.json()#Same error Occurs
At this point I've researched many hours and can't find a solution. I'm assuming that the site is not formatted normally or I'm missing something obvious. I see the JSON requests/structure in my web developer tools.
Does anybody have any ideas or solutions for this? Please let me know if you have questions.

You need to send a User-Agent HTTP header field. Try this program:
import requests
url='https://www.verizonwireless.com/vzw/browse/tradein/ajax/deviceSearch.jsp?act=models&car=Verizon&man=Apple&siz=large'
# Put your own contact info in next line
headers = {'User-Agent': 'MyBot/0.1 (+user@example.com)'}
r = requests.get(url, headers=headers)
print(r.json()['models'][0]['name'])
Result:
iPhone 6S
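
For readers who prefer the standard library used in the question, the same idea can be sketched with urllib. This is only a sketch under assumptions: the helper names build_request and parse_json_body are made up for illustration, and the sample bytes are invented to mirror the result shown above. It also decodes the bytes before parsing, which avoids the "must be str, not 'bytes'" TypeError from the question.

```python
import json
import urllib.request

URL = ("https://www.verizonwireless.com/vzw/browse/tradein/ajax/"
       "deviceSearch.jsp?act=models&car=Verizon&man=Apple&siz=large")


def build_request(url, user_agent):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def parse_json_body(body, charset="utf-8"):
    """Decode raw response bytes to str, then parse the JSON text."""
    return json.loads(body.decode(charset))


req = build_request(URL, "MyBot/0.1 (+user@example.com)")
# urllib stores header names capitalized as "User-agent"
print(req.get_header("User-agent"))

# Made-up bytes shaped like the answer's result, for illustration only
sample = b'{"models": [{"name": "iPhone 6S"}]}'
print(parse_json_body(sample)["models"][0]["name"])

# To actually fetch (network access needed):
# with urllib.request.urlopen(req) as res:
#     data = parse_json_body(res.read())
```

Whether the server still accepts this particular User-Agent string is not something the sketch can guarantee.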

Related

Receiving Error message: JSONDecodeError when attempting to use API

I am following along with Python for Data Analysis and am on Chapter 6 looking at using APIs.
I wish to connect to sources provided by National Grid on their Data Portal. They provide a number of URLs (e.g. several found here, https://data.nationalgrideso.com/ancillary-services/obligatory-reactive-power-service-orps-utilisation). I want to read these directly into pandas rather than download the Excel/csv file and then open that.
I am receiving the error message
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
after attempting the following:
import requests
import codecs
import json
url = 'https://data.nationalgrideso.com/backend/dataset/7e142b03-8650-4f46-8420-7ce1e84e1e5b/resource/a61c6c26-62ec-41e1-ae25-ed95f4562274/download/reactive-utilisation-data-apr-2020-mar-2022.csv'
resp = requests.get(url)
decoded_data = codecs.decode(resp.text.encode(), 'utf-8-sig')
data = json.loads(decoded_data)
I understand that I need to use 'utf-8-sig' due to a particular BOM appearing on the first line otherwise.
I have looked at answers regarding the same error message but nothing is working for me at present. The API is working in the browser and I am receiving a response of 200 and data is being returned. Perhaps I am missing something more fundamental in the approach?
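
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the URL above ends in .csv, so the body is presumably CSV text, and json.loads would fail on it at the first character. A minimal sketch of parsing such a body with the standard library follows. The helper name parse_csv_bytes and the sample columns are hypothetical, not taken from the real dataset.

```python
import csv
import io


def parse_csv_bytes(body):
    """Decode CSV bytes (dropping a UTF-8 BOM if present) into a list of dict rows."""
    text = body.decode("utf-8-sig")
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample with a leading BOM; the column names are hypothetical
sample = b"\xef\xbb\xbfDate,Value\r\n2020-04-01,12.5\r\n"
rows = parse_csv_bytes(sample)
print(rows[0]["Date"], rows[0]["Value"])
```

With a live response, resp.content could be passed to parse_csv_bytes. Since the goal is pandas, note that pandas.read_csv also accepts a URL and an encoding argument, which may make the manual decoding unnecessary.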

Scraping data over websockets

I am trying to get the daily price data from this specific webpage:
https://www.londonstockexchange.com/stock/CS1/amundi/company-page
Those data are represented in the chart.
I have run out of ideas for how to reach those data. I assume that they are transferred through one of the websocket connections that is made, which can be inspected in the browser console.
I tried to simulate the websocket connection and send the same binary messages as the front-end app.
import binascii
from websocket import create_connection
ws = create_connection("wss://82-99-29-151.infrontservices.com/wsrt/2/4")
hex_1 = "3e000000010..."
hex_2 = "13000000010..."
hex_3 = "1e000000010..."
ws.send(binascii.unhexlify(hex_1))
ws.send(binascii.unhexlify(hex_2))
ws.send(binascii.unhexlify(hex_3))
result = ws.recv()
Then I tried to decode this response with every available encoding, as follows:
import binascii
from encodings.aliases import aliases
for v in [v for k, v in aliases.items()]:
    try:
        print(result.decode(v))
    except:
        print(f"ERROR {v}")
And naturally, I get no interpretable output that I can use. It may be that a cipher is used here, but I have no idea how to investigate further.
Do you have any ideas about that? :)
Thanks in advance !
AL Ko
EDIT 1
We can see one of the datapoints, with the value 16990 for a given date. What I am looking for is the whole time series of the chart.

Once you have read my comment, informed yourself about scraping, and decided to proceed carefully, Python can retrieve this JSON with just a few lines of code:
import requests
url = "https://api.londonstockexchange.com/api/gw/lse/instruments/alldata/CS1"
response_json = requests.get(url=url).json()
# print some data from the json
print(response_json)
print(response_json.get("description"))
print(response_json.get("bid"))
I found this data using the "Network" tab. A few more requests show up when you hit "reload", but they seem to be empty.

How can I add a header to urllib.request.urlretrieve keeping my variables?

I'm trying to download a file from a website but it looks like it is detecting urllib and doesn't allow it to download (I'm getting the error "urllib.error.HTTPError: HTTP Error 403: Forbidden").
How can I fix this? I found on the internet that I have to add a header, but the answers didn't fit what I need (they used Request, and I didn't find any argument for adding a header in urllib.request.urlretrieve()).
I'm using Python 3.6
Here's the code:
import urllib.request
filelink = 'https://randomwebsite.com/changelog.txt'
filename = filelink.rsplit('/', 1)
filename = str(filename[1])
urllib.request.urlretrieve(filelink, filename)
I want to include a header that gives me permission to download the file, but I need to keep a line like the last one, using the two variables (one for the link to the file and one for the name, which depends on the link).
Thanks already for your help !

Check the below link:
https://stackoverflow.com/a/7244263/5903276
The most correct way to do this would be to use the urllib.request.urlopen function to return a file-like object that represents an HTTP response and copy it to a real file using shutil.copyfileobj.
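
A minimal sketch of that approach, keeping the two variables from the question. The function name download and the default User-Agent value "Mozilla/5.0" are assumptions chosen for illustration, and which header value a given site accepts may differ.

```python
import shutil
import urllib.request


def download(filelink, filename, user_agent="Mozilla/5.0"):
    """Fetch filelink with a custom User-Agent and stream the body to filename."""
    req = urllib.request.Request(filelink, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as response, open(filename, "wb") as out_file:
        shutil.copyfileobj(response, out_file)


# Same two variables as in the question; the call itself needs network access
filelink = 'https://randomwebsite.com/changelog.txt'
filename = filelink.rsplit('/', 1)[1]
# download(filelink, filename)
```

The last line plays the role of the urlretrieve call: it takes the link and the derived file name, and the header is attached inside the helper.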

Python urllib module TypeError

I'm trying to get into CTFs and I found a cool website meant for practicing some web-based CTF skills, called ctf.slothparadise.com. I've managed to get 4 of the flags, but two of them are giving me the finger and sadly I've had to dust off the good ol' Python skills.
import urllib.error
import urllib.request
import urllib.parse
import urllib
import sys
while True:
    about_page = urllib.request.urlopen("http://ctf.slothparadise.com/about.php").read()
    if "KEY" in about_page:
        print(about_page)
        sys.exit(0)
ctf.slothparadise.com/about.php is the page I'm programming for, and it spits out the key in the source code every 1000 visitors. Instead of being a moron and refreshing it 1000 times, I wrote that code in the hope that it would keep opening the page until the phrase "KEY" appeared in the page's source code.
I'm getting this: (TypeError: 'str' does not support the buffer interface)
From what I know about TypeErrors, I'm guessing that I may have "KEY" in the wrong format? I'm not really sure. I also may not even be using the right modules, but the old urllib2 module I would typically use for this got split up into different modules, so I'm learning as I go with these new modules.
Any help in fixing this issue is appreciated. Also, if my interpretation of TypeErrors is wrong, feel free to correct me.

The object returned by urlopen() can be used as a context manager, and its read() method returns bytes, not str.
Your code compares a str against those bytes.
Try something like this:
import sys
import urllib.request
while True:
    with urllib.request.urlopen('http://ctf.slothparadise.com/about.php') as response:
        html = response.read()
    if b"KEY" in html:
        print(html)
        sys.exit(0)

urllib.request.urlopen returns an http.client.HTTPResponse object, and that object's read returns an encoded bytes object. The encoding needed to decode it may be given in the returned HTTP header or, in your case, embedded in an HTML meta tag. You likely don't want to parse the HTML for this particular test, so just look for the bytes object b'KEY'.
I don't know what you want to do with the data next, but if you want it to print nicely or scan the html, then you will have to do some parsing.
import urllib.error
import urllib.request
import urllib.parse
import urllib
import sys
while True:
    about_page = urllib.request.urlopen("http://ctf.slothparadise.com/about.php").read()
    if b"KEY" in about_page:
        print(about_page)
        sys.exit(0)
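
If the text is needed as a str later on, the bytes can be decoded first. The sketch below assumes UTF-8 when the server declares no charset. The helper name decode_body and the sample bytes are made up for illustration.

```python
def decode_body(body, charset=None):
    """Decode response bytes, falling back to UTF-8 when no charset is declared."""
    return body.decode(charset or "utf-8", errors="replace")


# Made-up page bytes for illustration
sample = b"<html><!-- KEY: example --></html>"

print(b"KEY" in sample)               # bytes needle against bytes haystack
print("KEY" in decode_body(sample))   # str needle against decoded str

# With a live response, the declared charset (if any) is available as:
# response.headers.get_content_charset()
```

Either test works as long as both sides have the same type. Mixing str and bytes is what raises the TypeError.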

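Make about_page a string by decoding the bytes (calling str() on them would give the b'...' representation instead of the text):
about_page = urllib.request.urlopen("http://ctf.slothparadise.com/about.php").read().decode('utf-8', errors='replace')
This should make your code work. Hope this helps!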

How to use Python to pipe a .htm file to a website

I have a file, gather.htm which is a valid HTML file with header/body and forms. If I double click the file on the Desktop, it properly opens in a web browser, auto-submits the form data (via <SCRIPT LANGUAGE="Javascript">document.forms[2].submit();</SCRIPT>) and the page refreshes with the requested data.
I want Python to make a requests.post(url) call using gather.htm. However, my research and my trial-and-error have produced no solution.
How is this accomplished?
I've tried things along these lines (based on examples found on the web). I suspect I'm missing something simple here!
myUrl = 'www.somewhere.com'
filename='/Users/John/Desktop/gather.htm'
f = open (filename)
r = requests.post(url=myUrl, data = {'title':'test_file'}, files = {'file':f})
print r.status_code
print r.text
And:
htmfile = 'file:///Users/John/Desktop/gather.htm'
files = {'file':open('gather.htm')}
webbrowser.open(url,new=2)
response = requests.post(url)
print response.text
Note that in the 2nd example above, the webbrowser.open() call works correctly but the requests.post does not.
It appears that everything I tried failed in the same way - the URL is opened and the page returns default data. It appears the website never receives the gather.htm file.

Since your request is returning 200 OK, there is nothing wrong getting your post request to the server. It's hard to give you an exact answer, but the problem lies with how the server is handling the request. Either your post request is being formatted in a way that the server doesn't recognise, or the server hasn't been set up to deal with them at all. If you're managing the website yourself, some additional details would help.
Just as a final check, try the following:
r = requests.post(url=myUrl, data={'title':'test_file', 'file':f})
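
One thing that may be worth checking: when a browser submits a form, it sends the form's field names and values, not the .htm file itself. A rough sketch of pulling those fields out of the HTML with the standard library follows. The class name FormFieldParser, the sample markup, and its field names are all hypothetical stand-ins for whatever gather.htm really contains.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collect name/value pairs from <input> tags, grouped per <form>."""

    def __init__(self):
        super().__init__()
        self.forms = []  # one dict per <form>: {"action": ..., "fields": {...}}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action"), "fields": {}})
        elif tag == "input" and self.forms and attrs.get("name"):
            # Simplification: an input is attributed to the most recent <form>
            self.forms[-1]["fields"][attrs["name"]] = attrs.get("value") or ""


# Made-up stand-in for gather.htm; the field names are hypothetical
sample_html = """
<form action="https://www.somewhere.com/search" method="post">
  <input type="hidden" name="title" value="test_file">
  <input type="hidden" name="region" value="north">
</form>
"""

parser = FormFieldParser()
parser.feed(sample_html)
form = parser.forms[0]
print(form["action"], form["fields"])

# The extracted fields could then be posted (network access needed):
# requests.post(form["action"], data=form["fields"])
```

Since the script in the question submits document.forms[2], the matching entry for the real file would presumably be parser.forms[2]. Fields filled in by JavaScript, select boxes, and textareas are not covered by this sketch.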
