How can I request (get) and read an xml file using python? - python

I tried requesting an RSS feed from Treasury Direct using Python. In the past I've used the urllib or requests libraries for this and they worked fine. This time, however, I keep getting a 406 status error, which I understand is the server's way of telling me it doesn't accept the header details of my request. I've tried altering the headers, but to no avail.
This is what I tried:
import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers = user_agent)
print(response.text)
Environments: Python 2.7 and 3.4.
I also tried accessing via curl with the same exact error.
I believe this to be page specific, but can't figure out how to appropriately frame a request to read this page.
I found an API on the site that serves the same data as JSON, so this issue is now more of a curiosity to me than a true problem.
Any answers would be greatly appreciated!
Header Details
{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}

You need to add an Accept header to the request:
import requests
url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)
print(response.text)
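Once the request succeeds, the body still has to be read as XML. The following is a minimal sketch of that step using the standard library's xml.etree.ElementTree. It assumes the feed follows the usual RSS 2.0 layout (item elements that each contain a title); SAMPLE_RSS and item_titles are hypothetical names used only for illustration, and the sample document is not the real Treasury Direct feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS 2.0 document standing in for the real feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First entry</title><link>https://example.com/1</link></item>
    <item><title>Second entry</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def item_titles(xml_data):
    """Return the <title> text of every <item> in an RSS document."""
    root = ET.fromstring(xml_data)
    # findtext("title") only looks at direct children, so the channel's
    # own <title> is not included.
    return [item.findtext("title") for item in root.iter("item")]


print(item_titles(SAMPLE_RSS))
```

With the request above, item_titles(response.content) should work the same way; passing the raw bytes rather than response.text lets the parser apply whatever encoding the XML itself declares.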

Related

Python - Why does requests.post return 200 without actually uploading the file?

Note: I previously posted this question, but it was associated with another question where my question is not actually answered.
I'm trying to learn Python by converting some bash scripts into Python. I'm stuck on uploading a file to my web host. Here's what I'm trying along with the result:
import requests
user = '....'
password = '....'
myfile={'file': open('/Users/mnewman/Desktop/myports.txt' ,'rb')}
myurl = 'https://www.example.com/'
headers = {"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
           "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15"}
r = requests.post(url=myurl, data={}, files=myfile,
                  auth=(user, password), headers=headers)
print(r.status_code)
print(r.headers)
200
{'Date': 'Sat, 12 Dec 2020 22:51:04 GMT', 'Server': 'Apache', 'Upgrade': 'h2,h2c',
'Connection': 'Upgrade, Keep-Alive', 'Last-Modified': 'Sun, 30 Aug 2020 23:31:39 GMT',
'Accept-Ranges': 'none', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip',
'Content-Length': '1227', 'Keep-Alive': 'timeout=5, max=75', 'Content-Type':
'text/html'}
Even though the response code is 200, the file was not actually uploaded. How can I figure out why the file was not uploaded even though the server returned a status code of 200?

Why am I getting a different response from server with URL passed from list to requests.get() vs. hard-coded string?

I'm trying to scrape pre-rendered JSON from multiple URLs on a particular server.
When I use requests.get() with a hardcoded URL, or a string-type variable containing a hard-coded URL, I get the JSON I want from server.
requests.get("https://example.url/example.cgi/example")
The .headers property on the response object returns:
{'Server': 'nginx', 'Date': 'Wed, 28 Oct 2020 00:49:22 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'X-Confex-Backend': 'es-director', 'Content-Encoding': 'gzip'}
But, when I pass the exact same url from a list to requests.get():
url_list = ['https://example.url/example.cgi/example']
for url in url_list:
    requests.get(url)
I do not get the JSON response from the server. I get HTML instead with none of the JSON I want, shown here by the header to the response object (can't post the contents or the server URL here):
{'Server': 'nginx', 'Date': 'Wed, 28 Oct 2020 00:49:22 GMT', 'Content-Type': 'text/html; charset=UTF-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Pragma': 'no-cache', 'Cache-Control': 'no-cache,no-store', 'Expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'X-Confex-Backend': 'weba13', 'Content-Encoding': 'gzip'}
I'm stumped. I've tried converting the list item variable to string, re-encoding it, etc... I've even tried switching up the order of the get requests in my testing and I get the same results. What is going on with requests.get() that a URL passed as a list item to the method gets a very different response from the server than the same URL when it's hard-coded into the method, or into a variable passed to the method? What am I missing? Suffice to say, for obvious reasons it would be great to iterate requests.get() through a list of URLs for this particular purpose...
I figured it out. -Facepalm-
In this particular case I had compiled the URLs in the list by scraping HTML on the same site. It turns out that those URLs are one character off from the hardcoded URLs I had seen the XHR GET requests use to get the JSON: app.cgi vs. api.cgi in the middle of the URL. Now I know to run checks on that sort of thing in the future.
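A check of that kind could look something like the sketch below, which splits two URLs into their components and reports only the parts that differ. The function name url_differences and both example URLs are hypothetical; the real URLs were not posted.

```python
from urllib.parse import urlsplit


def url_differences(expected, actual):
    """Return {component: (expected_value, actual_value)} for parts that differ."""
    fields = ("scheme", "netloc", "path", "query", "fragment")
    a, b = urlsplit(expected), urlsplit(actual)
    return {
        name: (getattr(a, name), getattr(b, name))
        for name in fields
        if getattr(a, name) != getattr(b, name)
    }


# Hypothetical URLs that differ by a single character in the script name.
hardcoded = "https://example.url/api.cgi/example"
scraped = "https://example.url/app.cgi/example"
print(url_differences(hardcoded, scraped))
```

An empty dictionary means the two URLs are identical component by component, so anything else is worth a closer look before blaming the HTTP library.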

Missing headers on POST request with python requests

I'm currently using the python requests module to perform automated HTTP tasks on a website.
The problem is that I don't get the same results on my console as on my browser.
This is what I get when making a POST request on my browser:
This is what I get when making the POST request through the python requests module and reading the .headers attribute of the response:
{
'Date': 'Fri, 14 Jul 2017 15:19:22 GMT',
'Content-Type': 'text/html; charset=utf-8',
'Transfer-Encoding': 'chunked',
'Connection': 'keep-alive',
'Cache-Control': 'private',
'Location': '/cart/view',
'Set-Cookie': 'png.notice=9Hz8GWQ38JQZqTrqcsnn1J5nfgIZt71orHtf71mI+rwqFpQg4RnV7BqZni/GgIS/SmUnC4jgnhjQuDhZNW2adxeLctG+bToT0wTTbgxe40t5RmbVv1viuH2gkL1eH2xN3IavOUBhVXm+JlQrmVnHLocqjgvWi8wAClLYmrShY1U2ege9; expires=Fri, 14-Jul-2017 15:34:03 GMT; path=/; HttpOnly',
'X-Powered-By': 'ASP.NET',
'X-UA-Compatible': 'IE=Edge,chrome=1',
'Server': 'cloudflare-nginx',
'CF-RAY': '37e575befbf43c35-CDG'
}
Notice how the two results are completely different.
I'm trying to get the "Location" header inside the Response headers (the one beginning with "https://live.adyen.com/hpp...").
What am I doing wrong here?
EDIT: This is my source code:
request = session.post('https://www.nakedcph.com/cart/process', data=user_info)
request.url
# outputs 'https://www.nakedcph.com/cart/view' (probably the issue)
request.headers
# outputs the headers (but not all of them?)
PS: After making the POST request, the website redirects to the URL inside the "Location" header from the Response headers.
I figured it out. Missed some parameters in the post request. My bad.

Failed to upload file to server using Python Requests

I use Requests and python2.7 in order to fill some values in a form and upload (submit) an image to the server.
I actually execute the script from the server pointing to the file I need to upload. The file is located in the /home directory and I have made sure it has full permissions.
Although I get a 200 Response, nothing is uploaded. This is part of my code:
import requests
try:
    headers = {
        "Referer": 'url_for_upload_form',
        "sessionid": sessionid  # retrieved earlier
    }
    files = {'file': ('doc_file', open('/home/test.png', 'rb'))}
    payload = {
        'title': 'test',
        'source': 'source',
        'date': '2016-10-26 02:13',
        'csrfmiddlewaretoken': csrftoken  # retrieved earlier
    }
    r = c.post(upload_url, files=files, data=payload, headers=headers)
    print(r.headers)
    print(r.status_code)
except:
    print("Error uploading file")
As I said I get a 200 Response and the headers returned are:
{'Content-Language': 'en', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'csrftoken=fNfJU8vrvOLAnJ5h7QriPIQ7RkI755VQ; expires=Tue, 17-Oct-2017 08:04:58 GMT; Max-Age=31449600; Path=/', 'Vary': 'Accept-Language, Cookie', 'Server': 'nginx/1.6.0', 'Connection': 'keep-alive', 'Date': 'Tue, 18 Oct 2016 08:04:58 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'text/html; charset=utf-8'}
Does anyone have any idea what I am doing wrong? Am I missing something basic here?

Python code to do GET request from pipedrive API

I am using python-pipedrive to wrap Pipedrive's API, though it doesn't quite work out of the box on python3 (which I'm using), so I modified it. I'm having trouble with just the HTTP requests portion.
This is what taught me how to use httplib2: https://github.com/jcgregorio/httplib2/wiki/Examples-Python3
Basically, I just want to send a GET request to this:
https://api.pipedrive.com/v1/persons/123?api_token=1234abcd1234abcd
This works:
from httplib2 import Http
from urllib.parse import urlencode
http = Http()
PIPEDRIVE_API_URL = "https://api.pipedrive.com/v1/persons/123?api_token=1234abcd1234abcd"
response, data = http.request(PIPEDRIVE_API_URL, method='GET',
                              headers={'Content-Type': 'application/x-www-form-urlencoded'})
However, Pipedrive returns an error 401 with 'You need to be authorized to make this request.' if I do this:
PIPEDRIVE_API_URL = "https://api.pipedrive.com/v1/"
parameters = 'persons/123'
api_token = '1234abcd1234abcd'
response, data = http.request(PIPEDRIVE_API_URL + parameters,
method='GET', body=urlencode(api_token),
headers={'Content-Type': 'application/x-www-form-urlencoded'})
The actual response is:
response =
{'server': 'nginx',
'status': '401',
'connection': 'keep-alive',
'set-cookie': 'pipe-session=7b6ddadbc67abdadb6a67dbadcb; path=/; domain=.pipedrive.com; secure; httponly',
'date': 'Sat, 11 Jun 2016 06:50:13 GMT',
'transfer-encoding': 'chunked',
'x-frame-options': 'SAMEORIGIN',
'content-type': 'application/json, charset=UTF-8',
'x-xss-protection': '1; mode=block'}
data =
{'success': False,
'error': 'You need to be authorized to make this request.'}
How do I properly provide the api_token as a parameter (body) to the GET request? Anyone know what I'm doing wrong?
You need to provide the api_token as a query parameter. Concatenate the strings like this:
PIPEDRIVE_API_URL = "https://api.pipedrive.com/v1/"
route = 'persons/123'
api_token = '1234abcd1234abcd'
response, data = http.request(PIPEDRIVE_API_URL + route + '?api_token=' + api_token,
method='GET',
headers={'Content-Type': 'application/x-www-form-urlencoded'})
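As an alternative to concatenating by hand, the query string can be built with urlencode, which also escapes any characters that are not safe in a URL. This is only a sketch: build_url is a hypothetical helper, not part of python-pipedrive or httplib2, and the token is the placeholder value from the question.

```python
from urllib.parse import urlencode, urljoin

PIPEDRIVE_API_URL = "https://api.pipedrive.com/v1/"


def build_url(route, **params):
    """Join the base URL and route, then append URL-encoded query parameters."""
    url = urljoin(PIPEDRIVE_API_URL, route)
    if params:
        # urlencode expects a mapping (or a sequence of pairs), not a bare string.
        url += "?" + urlencode(params)
    return url


print(build_url("persons/123", api_token="1234abcd1234abcd"))
```

The result can be passed straight to http.request(..., method='GET'). If the requests library is an option, its params argument (for example requests.get(url, params={'api_token': api_token})) does the same encoding.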
