Sending a GitHub Markdown API request with urllib in Python 3 raises BadStatusLine - python

I have the following Python code to convert a Markdown file using the GitHub API.
gfm.py (Python 3 code)
import json
import sys
import traceback
import urllib.request
import http.client

try:
    # Read the Markdown file named on the command line
    with open(sys.argv[1], 'r') as f:
        content = f.read()
    data = {"text": content, "mode": 'gfm'}
    headers = {'Content-Type': 'application/json'}
    # JSON-encode the payload; the request body must be bytes
    body = json.dumps(data).encode('utf-8')
    url = "https://api.github.com/markdown"
    request = urllib.request.Request(url, data=body, headers=headers)
    result = urllib.request.urlopen(request).read().decode('utf-8')
    print(result)
except http.client.BadStatusLine:
    traceback.print_exc()
except Exception:
    traceback.print_exc()
The scripts and test Markdown files used below are in this gist: https://gist.github.com/xpol/6332952
Converting a small Markdown file (e.g. gfm.py Sample.md from the gist) works fine.
Converting a large Markdown file (e.g. gfm.py Cheatsheet.md from the gist) raises http.client.BadStatusLine: '' at the urllib.request.urlopen line.
Does anyone know what's wrong with this?
Thanks a lot!

I cannot reproduce this, but regardless I would suggest using a library like python-requests, or even one of the API wrappers listed in the GitHub API docs.
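For illustration, here is a minimal sketch of what the same request might look like with python-requests. The function names build_markdown_payload and render_markdown and the 30-second timeout are my own choices, not anything from the question; the URL and the "gfm" mode are taken from the code above. I have not verified that this avoids the BadStatusLine on large files.

```python
def build_markdown_payload(text, mode="gfm"):
    """Return the JSON-serializable body for a POST to the Markdown endpoint."""
    return {"text": text, "mode": mode}


def render_markdown(text, url="https://api.github.com/markdown", timeout=30):
    """POST Markdown text to the endpoint and return the rendered HTML."""
    # Imported lazily so build_markdown_payload stays usable without requests.
    import requests

    # json= serializes the payload and sets the Content-Type header for us.
    response = requests.post(url, json=build_markdown_payload(text), timeout=timeout)
    # Turn 4xx/5xx replies into exceptions instead of returning an error body.
    response.raise_for_status()
    return response.text
```

Calling render_markdown(open("Sample.md").read()) would then return the HTML as a str, and any HTTP error surfaces as a requests.exceptions.HTTPError rather than a low-level http.client exception.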

Related

How to print selected text from JSON file using Python

I'm new to Python and have undertaken my first project to automate something for my role (I'm in the network space, so forgive me if this is terrible!).
I'm required to download a .json file from the link below:
https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
My script goes through and retrieves the manual download link.
The reason I'm getting the URL in this way is that the download link changes every fortnight when MS update the file.
My preference is to extract the "addressPrefixes" contents from the entries named "AzureCloud.australiacentral", "AzureCloud.australiacentral2", "AzureCloud.australiaeast" and "AzureCloud.australiasoutheast".
I then want to strip out the " and , characters.
Each of the subnet ranges should then sit on its own line in a text file.
If I run the script below, I get the output I want for a single entry.
Am I correct in thinking that I can use a for loop to achieve this? If so, would it be better to use a Python dictionary as opposed to using JSON formatted output?
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Script to check Azure IPs

# Import modules for script
import json
import re
import urllib.request

import requests

# Non-greedy match for the first download link ending in .json
search = r'https://download.*?\.json'
ms_dl_centre = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
requests_get = requests.get(ms_dl_centre)
json_url_search = re.search(search, requests_get.text)
json_file = json_url_search.group(0)
with urllib.request.urlopen(json_file) as url:
    contents = json.loads(url.read().decode())
# Use this to print the contents of JSON entry 1
print(json.dumps(contents['values'][1]['properties']['addressPrefixes'], indent=0))
I'm not convinced that using re to parse HTML is a good idea; BeautifulSoup is better suited to the task. Inspecting the HTML response, I note that there's a span element of class file-link-view1 that seems to uniquely identify the URL of the JSON download. Assuming that to be a robust approach (i.e. Microsoft doesn't change the way the download URL is presented), this is how I'd do it:
import requests
from bs4 import BeautifulSoup

namelist = ["AzureCloud.australiacentral", "AzureCloud.australiacentral2",
            "AzureCloud.australiaeast", "AzureCloud.australiasoutheast"]
baseurl = 'https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519'

with requests.Session() as session:
    response = session.get(baseurl)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # The span with class file-link-view1 wraps the link to the JSON download
    downloadurl = soup.find('span', class_='file-link-view1').find('a')['href']
    response = session.get(downloadurl)
    response.raise_for_status()
    data = response.json()
    for n in data['values']:
        if n['name'] in namelist:
            print(n['name'])
            for ap in n['properties']['addressPrefixes']:
                print(ap)
@andyknight, thanks for your direction. I'd upvote you, but as I'm a noob the site doesn't permit me to.
I've taken the basis of your Python script and added some additional components.
I removed the print statement for the region name from the .txt output, as this file is referenced by a firewall, which is looking for IP addresses.
I've added try/except/else to a portion of the script, to identify whether there is ever an error reaching the URL, or some other unspecified error. I've leveraged logging to send an email based on the status of the script. If an exception is thrown I get an email with traceback information, otherwise I receive an email advising that the script was successful.
This writes out the specific prefixes for the AU regions into a .txt file.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
import logging
import logging.handlers
from bs4 import BeautifulSoup

# Email log records (including tracebacks) to the administrator
smtp_handler = logging.handlers.SMTPHandler(mailhost=("sanitised.smtp[.]xyz", 25),
                                            fromaddr="UpdateIPs@sanitised[.]xyz",
                                            toaddrs="FriendlyAdmin@sanitised[.]xyz",
                                            subject=u"Check Azure IP Script completion status.")
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
logger.addHandler(smtp_handler)

namelist = ["AzureCloud.australiacentral", "AzureCloud.australiacentral2",
            "AzureCloud.australiaeast", "AzureCloud.australiasoutheast"]
baseurl = 'https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519'

with requests.Session() as session:
    response = session.get(baseurl)
    try:
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        downloadurl = soup.find('span', class_='file-link-view1').find('a')['href']
        response = session.get(downloadurl)
        response.raise_for_status()
        data = response.json()
        for n in data['values']:
            if n['name'] in namelist:
                for ap in n['properties']['addressPrefixes']:
                    # Append each prefix on its own line
                    with open('Check_Azure_IPs.txt', 'a') as file:
                        file.write(ap + "\n")
    except requests.exceptions.HTTPError as e:
        logger.exception(
            "URL is no longer valid, please check the URL that's defined in this script with MS, as this may have changed.\n\n")
    except Exception as e:
        logger.exception("Unknown error has occurred, please review script")
    else:
        logger.info("Script has run successfully! Azure IPs have been updated.")
Please let me know if you think there is a better way to handle this, otherwise this is marked as answered. I appreciate your help greatly!
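One possible refinement, offered only as a sketch: separate the filtering from the file writing, and open the output file once instead of once per prefix. This assumes the file should hold only the current prefixes, so it uses 'w' mode where the script above appends. The helper names select_prefixes and write_prefixes and the sample data (documentation-range addresses, not real Azure ranges) are made up for illustration.

```python
def select_prefixes(payload, names):
    """Return the addressPrefixes of every entry whose name is in names."""
    wanted = set(names)
    prefixes = []
    for entry in payload["values"]:
        if entry["name"] in wanted:
            prefixes.extend(entry["properties"]["addressPrefixes"])
    return prefixes


def write_prefixes(prefixes, path):
    """Write one prefix per line, replacing any previous contents."""
    with open(path, "w") as handle:
        for prefix in prefixes:
            handle.write(prefix + "\n")


# Made-up sample shaped like the downloaded file; the addresses are illustrative.
sample = {
    "values": [
        {"name": "AzureCloud.australiaeast",
         "properties": {"addressPrefixes": ["192.0.2.0/24", "198.51.100.0/24"]}},
        {"name": "AzureCloud.westeurope",
         "properties": {"addressPrefixes": ["203.0.113.0/24"]}},
    ]
}
print(select_prefixes(sample, ["AzureCloud.australiaeast"]))
```

In the real script, response.json() would take the place of sample, and write_prefixes(select_prefixes(...), 'Check_Azure_IPs.txt') would replace the nested loop inside the try block.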

Convert urllib2 python code to use urllib module

I have the code below, which runs using the urllib2 module, but I have a requirement to upgrade to Python 3.x and this prevents the use of urllib2. I am aware it is split across urllib.request and urllib.error, but I am struggling to convert the code to use the urllib module instead, even after reading through the docs and other relevant questions. Any help is greatly appreciated.
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)
All you need to do is replace urllib2 with urllib.request. You are not using anything that has moved to other urllib.* modules:
import urllib.request
opener = urllib.request.build_opener(urllib.request.HTTPHandler)
request = urllib.request.Request(url=event['ResponseURL'], data=data)
request.add_header('Content-Type', '')
request.get_method = lambda: 'PUT'
url = opener.open(request)
You can always run the 2to3 command-line tool on your Python 2 code and see what changes it makes; the default action is to output changes on stdout in unified diff format.
The urllib fixer will then also add imports for urllib.error and urllib.parse at the top, because it knows that code that imported urllib2 could need any of the 3 urllib.* modules; it isn't smart enough to limit the import only to those that are actually needed after transforming the rest of the urllib2 references in the module.
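As a side note, since Python 3.3 urllib.request.Request accepts a method argument, so the get_method override is no longer needed. A small sketch follows; the helper name build_put_request, the example URL and the body are placeholders of mine, standing in for event['ResponseURL'] and data from the question.

```python
import urllib.request


def build_put_request(url, data, content_type=""):
    """Build a PUT request without overriding get_method."""
    request = urllib.request.Request(url=url, data=data, method="PUT")
    request.add_header("Content-Type", content_type)
    return request


# Hypothetical URL and body, for illustration only; nothing is sent here.
request = build_put_request("https://example.com/response", b"{}")
print(request.get_method())
```

The request can then be passed to opener.open(request) or urllib.request.urlopen(request) exactly as before.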

How do I get the HTML of a website using Python 3?

I've been trying to do this with repl.it and have tried several solutions on this site, but none of them work. Right now, my code looks like
import urllib
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
print (urllib.urlopen(url).read())
but it just says "AttributeError: module 'urllib' has no attribute 'urlopen'".
If I add import urllib.urlopen, it tells me there's no module named that. How can I fix my problem?
The syntax you are using for the urllib library is from Python v2. The library has changed somewhat for Python v3. The new notation would look something more like:
import urllib.request
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
The html object is a bytes object containing the returned HTML of the site; call html.decode() to get a str. Much like with the original urllib library, you should not expect images or other data files to be included in this returned object.
The confusing part here is that, in Python 3, this would fail if you did:
import urllib
response = urllib.request.urlopen("http://www.google.com")
html = response.read()
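This strange module-importing behavior is, I am told, as intended and working: importing a package does not automatically import its submodules, so urllib.request is only available once something has imported it explicitly. BUT it is non-intuitive and awkward. More importantly, for you, it makes the situation harder to debug. Enjoy.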
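To make the explicit import and the bytes-to-str step concrete, here is a short sketch. The helper name fetch_text and the UTF-8 fallback are my assumptions; the data: URL is used only so the example runs without network access.

```python
import urllib.request


def fetch_text(url, default_charset="utf-8"):
    """Fetch a URL and return its body decoded to str."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # bytes in Python 3
        # Use the charset from the Content-Type header when one is sent.
        charset = response.headers.get_content_charset() or default_charset
    return raw.decode(charset)


# A data: URL keeps the demonstration offline; an http(s) URL is fetched the same way.
print(fetch_text("data:text/plain;charset=utf-8,hello"))
```

Passing the pythonchallenge URL from the question to fetch_text should return the page source as a str.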
Python 3:
import urllib
# Importing requests loads urllib.request as a side effect,
# which is the only reason the urllib.request call below works
import requests
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)
or
import urllib.request
url = "http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing=12345"
r = urllib.request.urlopen(url).read()
print(r)

Python reading json from a url [duplicate]

I am trying to GET a URL using Python and the response is JSON. However, when I run
import urllib2
response = urllib2.urlopen('https://api.instagram.com/v1/tags/pizza/media/XXXXXX')
html=response.read()
print html
The html is of type str and I am expecting JSON. Is there any way I can capture the response as JSON or a Python dictionary instead of a str?
If the URL is returning valid JSON-encoded data, use the json library to decode that:
import urllib2
import json
response = urllib2.urlopen('https://api.instagram.com/v1/tags/pizza/media/XXXXXX')
data = json.load(response)
print data
import json
import urllib.request

url = 'http://example.com/file.json'
r = urllib.request.urlopen(url)
# Decode with the charset from the response headers, falling back to UTF-8
data = json.loads(r.read().decode(r.info().get_param('charset') or 'utf-8'))
print(data)
References: urllib, for Python 3.4; HTTPMessage, returned by r.info()
"""
Return JSON to webpage
Adding to wonderful answer by @Sanal
For Django 3.4
Adding a working url that returns a json (Source: http://www.jsontest.com/#echo)
"""
import json
import urllib.request

from django.http import HttpResponse


def json_view(request):  # view name is illustrative; the original gave only the body
    url = 'http://echo.jsontest.com/insert-key-here/insert-value-here/key/value'
    response = urllib.request.urlopen(url)
    data = json.loads(response.read().decode(response.info().get_param('charset') or 'utf-8'))
    return HttpResponse(json.dumps(data), content_type="application/json")
Be careful about validation and so on, but the straightforward solution is this:
import json
the_dict = json.load(response)
resource_url = 'http://localhost:8080/service/'
response = json.loads(urllib2.urlopen(resource_url).read())
Python 3 standard library one-liner:
load(urlopen(url))
# imports (place these above the code before running it)
from json import load
from urllib.request import urlopen
url = 'https://jsonplaceholder.typicode.com/todos/1'
You can also get JSON by using requests, as below:
import requests
r = requests.get('http://yoursite.com/your-json-pfile.json')
json_response = r.json()
Though I guess this has already been answered, I would like to add my little bit:
# Python 2 code (urllib2, print statement)
import json
import urllib2

class Website(object):
    def __init__(self, name):
        self.name = name

    def dump(self):
        # Return the open response object, not its contents
        self.data = urllib2.urlopen(self.name)
        return self.data

    def convJSON(self):
        data = json.load(self.dump())
        print data

domain = Website("https://example.com")
domain.convJSON()
Note: the object passed to json.load() must support .read(), therefore urllib2.urlopen(self.name).read() would not work.
The domain passed in must include the protocol, e.g. http://.
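The difference between the two functions can be shown without any network access. In this sketch an in-memory io.BytesIO stands in for the response object, and the sample JSON is made up.

```python
import io
import json

raw = b'{"key": "value"}'

# json.load reads from any object with a .read() method, such as a response.
from_file_like = json.load(io.BytesIO(raw))

# json.loads parses data that has already been read into memory.
from_bytes = json.loads(raw)

print(from_file_like == from_bytes)
```

So json.load(response) and json.loads(response.read()) are two routes to the same result; passing already-read data to json.load is what fails, because it has no .read() method.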
This is another, simpler solution to your question:
pd.read_json(json_data)
where json_data is the str output from the following code:
response = urlopen("https://data.nasa.gov/resource/y77d-th95.json")
json_data = response.read().decode('utf-8', 'replace')
None of the examples provided here worked for me. They were either for Python 2 (urllib2), or the ones for Python 3 returned the error "ImportError: No module named request". I googled the error message and it apparently requires me to install a module, which is obviously unacceptable for such a simple task.
This code worked for me:
# Python 2 only: urllib.urlopen does not exist in Python 3
import json,urllib
data = urllib.urlopen("https://api.github.com/users?since=0").read()
d = json.loads(data)
print (d)
