I'm a total newbie in Python. I had to upload a gzip file using the requests module and Python 3.5, launched from a FreeBSD shell.
Here is my upload.py, put together from what I managed to gather on the Internet:
#!/usr/bin/env python3.5
import requests, os.path, sys, glob, time, re, datetime

urlPydio = 'https://remote.pydio.server.fr'
certPydio = 'remote.pydio.server.ssl.crt'
depotDistant = 'ws'
urlComplet = urlPydio + "/api/" + depotDistant + "/"
loginPydio = 'user.name'
passwordPydio = 'PASSWORD'
repLocal = os.path.abspath('/var/squid/logs/access_log_gz/')
yesterday = datetime.date.today() - datetime.timedelta(days=1)
listFichiers = os.listdir(repLocal)
for fichier in listFichiers:
    if re.search(r'^\w{5}_\w{4}_\w+_' + str(yesterday) + r'\.gz$', fichier):
        nomFichierComplet = repLocal + '/' + fichier
        headers = {'x-File-Name': fichier}
        urlAction = urlComplet + "upload/input_stream" + "/remote_folder/remote_folder/"
        print(nomFichierComplet, urlAction)
        files = {fichier: open(nomFichierComplet, 'rb')}
        try:
            rest = requests.put(urlAction, files=files, auth=(loginPydio, passwordPydio), headers=headers, verify=False)
            codeRetour = rest.status_code
            if codeRetour == 200:
                print('file sent successfully', codeRetour, rest.headers)
            else:
                print('error sending file', codeRetour, rest.text)
        except requests.exceptions.RequestException as e:
            print(e)
As a result, I do get the log.gz file on the remote Pydio server, but some header material is added inside the file, making it impossible to unzip. Opened in a text editor, the 3 lines to remove are:
--223ef42df08f4792ba6ef6e71cdf749c
Content-Disposition: form-data; name="log.gz"; filename="log.gz"
I tried some requests.post variations without success, and I tried setting the header values to None, but then there was no way to send the file at all.
I also tried rest.prepare() with del headers['Content-Disposition'], unsuccessfully too.
rest.headers returns :
{'Content-Type': 'text/html; charset=UTF-8', 'Date': 'Thu, 05 Jan 2017 10:10:16 GMT', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'AjaXplorer=i6m1ud1cgocokaehc58gelc8m2; path=/; HttpOnly', 'Vary': 'Accept-Encoding', 'Connection': 'keep-alive', 'Server': 'nginx', 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'Content-Encoding': 'gzip', 'Expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'Pragma': 'no-cache'}
Now I'm here, asking for some help or for some papers/posts to read.
I just want to upload a log.gz file without any change to the file itself so it can be downloaded and unzipped.
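If I read the symptoms right, the boundary line and the Content-Disposition line are the multipart/form-data framing that requests adds whenever files= is used. Passing the raw bytes with data= instead avoids multipart encoding entirely. A sketch, reusing the question's URL, header, and credentials (whether Pydio's input_stream endpoint accepts a raw body is an assumption; the byte string stands in for the real file contents):

```python
import requests

# Dummy bytes standing in for open(nomFichierComplet, 'rb').read()
body = b'\x1f\x8b\x08\x00dummy-gzip-bytes'

# Same PUT as in the question, but data= instead of files=
req = requests.Request(
    'PUT',
    'https://remote.pydio.server.fr/api/ws/upload/input_stream/remote_folder/remote_folder/',
    data=body,
    headers={'x-File-Name': 'log.gz'},
    auth=('user.name', 'PASSWORD'),
)
prepared = req.prepare()

# The prepared body is exactly the bytes passed in: no multipart
# boundary and no Content-Disposition part header.
print(b'Content-Disposition' in prepared.body)   # False
print(prepared.body == body)                     # True
```

Building a PreparedRequest this way shows the wire body without touching the network; the real upload would be requests.put(..., data=open(nomFichierComplet, 'rb'), ...).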
Related
Note: I previously posted this question, but it was associated with another question where my question is not actually answered.
I'm trying to learn Python by converting some bash scripts into Python. I'm stuck on uploading a file to my web host. Here's what I'm trying along with the result:
import requests

user = '....'
password = '....'
myfile = {'file': open('/Users/mnewman/Desktop/myports.txt', 'rb')}
myurl = 'https://www.example.com/'
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
                         'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0.1 Safari/605.1.15'}
r = requests.post(url=myurl, data={}, files=myfile,
                  auth=(user, password), headers=headers)
print(r.status_code)
print(r.headers)
200
{'Date': 'Sat, 12 Dec 2020 22:51:04 GMT', 'Server': 'Apache', 'Upgrade': 'h2,h2c',
'Connection': 'Upgrade, Keep-Alive', 'Last-Modified': 'Sun, 30 Aug 2020 23:31:39 GMT',
'Accept-Ranges': 'none', 'Vary': 'Accept-Encoding', 'Content-Encoding': 'gzip',
'Content-Length': '1227', 'Keep-Alive': 'timeout=5, max=75', 'Content-Type':
'text/html'}
Even though the response code is 200, the file was not actually uploaded. How can I figure out why?
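A 200 only means the server returned a page for the POST; it doesn't mean anything processed the upload (posting to the site root, as here, typically just renders the homepage — the POST has to target a script that handles multipart form data). Two things help when debugging: printing r.text to see the server's actual reply, and inspecting the prepared request to see exactly what was sent. A sketch of the latter, with the file replaced by in-memory dummy bytes so it runs anywhere:

```python
import requests

# Dummy in-memory file in place of open('/Users/mnewman/Desktop/myports.txt', 'rb')
myfile = {'file': ('myports.txt', b'port 22\nport 80\n')}
req = requests.Request('POST', 'https://www.example.com/', data={}, files=myfile)
prepared = req.prepare()

# Content-Type carries the multipart boundary the server must parse:
print(prepared.headers['Content-Type'])
# The body shows the form field name ('file') the server-side
# upload script has to look for:
print(prepared.body[:120])
```

If the server-side handler expects a different field name than 'file', or a different endpoint than the root URL, the upload silently goes nowhere even with a 200.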
I tried requesting an RSS feed on Treasury Direct using Python. In the past I've used urllib, or requests libraries to serve this purpose and it's worked fine. This time however, I continue to get the 406 status error, which I understand is the page's way of telling me it doesn't accept my header details from the request. I've tried altering it however to no avail.
This is what I've tried:
import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
user_agent = {'User-agent': 'Mozilla/5.0'}
response = requests.get(url, headers=user_agent)
print(response.text)
Environments: Python 2.7 and 3.4.
I also tried accessing via curl with the same exact error.
I believe this to be page specific, but can't figure out how to appropriately frame a request to read this page.
I found an API on the page which I can read the same data in json so this issue is now more of a curiosity to me than a true problem.
Any answers would be greatly appreciated!
Header Details
{'surrogate-control': 'content="ESI/1.0",no-store', 'content-language': 'en-US', 'x-content-type-options': 'nosniff', 'x-powered-by': 'Servlet/3.0', 'transfer-encoding': 'chunked', 'set-cookie': 'BIGipServerpl_www.treasurydirect.gov_443=3221581322.47873.0000; path=/; Httponly; Secure, TS01598982=016b0e6f4634928e3e7e689fa438848df043a46cb4aa96f235b0190439b1d07550484963354d8ef442c9a3eb647175602535b52f3823e209341b1cba0236e4845955f0cdcf; Path=/', 'strict-transport-security': 'max-age=31536000; includeSubDomains', 'keep-alive': 'timeout=10, max=100', 'connection': 'Keep-Alive', 'cache-control': 'no-store', 'date': 'Sun, 23 Apr 2017 04:13:00 GMT', 'x-frame-options': 'SAMEORIGIN', '$wsep': '', 'content-type': 'text/html;charset=ISO-8859-1'}
You need to add an Accept header to the request:
import requests

url = 'https://www.treasurydirect.gov/TA_WS/securities/announced/rss'
headers = {'accept': 'application/xml;q=0.9, */*;q=0.8'}
response = requests.get(url, headers=headers)
print(response.text)
I use Requests and python2.7 in order to fill some values in a form and upload (submit) an image to the server.
I actually execute the script from the server pointing to the file I need to upload. The file is located in the /home directory and I have made sure it has full permissions.
Although I get a 200 Response, nothing is uploaded. This is part of my code:
import requests

try:
    headers = {
        "Referer": 'url_for_upload_form',
        "sessionid": sessionid  # retrieved earlier
    }
    files = {'file': ('doc_file', open('/home/test.png', 'rb'))}
    payload = {
        'title': 'test',
        'source': 'source',
        'date': '2016-10-26 02:13',
        'csrfmiddlewaretoken': csrftoken  # retrieved earlier
    }
    r = c.post(upload_url, files=files, data=payload, headers=headers)
    print r.headers
    print r.status_code
except:
    print "Error uploading file"
As I said I get a 200 Response and the headers returned are:
{'Content-Language': 'en', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'csrftoken=fNfJU8vrvOLAnJ5h7QriPIQ7RkI755VQ; expires=Tue, 17-Oct-2017 08:04:58 GMT; Max-Age=31449600; Path=/', 'Vary': 'Accept-Language, Cookie', 'Server': 'nginx/1.6.0', 'Connection': 'keep-alive', 'Date': 'Tue, 18 Oct 2016 08:04:58 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'Content-Type': 'text/html; charset=utf-8'}
Does anyone have any idea what I am doing wrong? Am I missing something basic here?
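One thing that stands out in the code above: sessionid is sent as an HTTP header, but Django reads the session ID from the Cookie header, and the CSRF check also wants the csrftoken cookie alongside the csrfmiddlewaretoken form field. A sketch of carrying both as cookies with a requests.Session (URL and cookie values are placeholders; in the real flow they come from the login and form-page responses):

```python
import requests

session = requests.Session()
# Placeholder values; in practice these come from earlier responses.
session.cookies.set('sessionid', 'abc123sessionid')
session.cookies.set('csrftoken', 'abc123csrftoken')

payload = {
    'title': 'test',
    'csrfmiddlewaretoken': session.cookies['csrftoken'],
}
req = requests.Request(
    'POST', 'https://example.com/upload/',   # placeholder upload_url
    data=payload,
    headers={'Referer': 'https://example.com/upload/'},
)
# The session merges its cookie jar into the prepared request:
prepared = session.prepare_request(req)

# Both values now travel in the Cookie header, where Django looks for them.
print(prepared.headers['Cookie'])
```

Using session.post(...) throughout (including for the earlier GET that yields the form) keeps the cookies flowing automatically instead of managing them by hand.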
I am using Python to download files from a webpage, and after I submit the request, I get headers like the ones below:
r.headers
{'Content-Disposition': 'attachment; filename=SABR_Download.xls',
 'Content-Encoding': 'UTF-8',
 'Transfer-Encoding': 'chunked',
 'Expires': '0',
 'Keep-Alive': 'timeout=5, max=100',
 'Server': 'Apache',
 'Connection': 'Keep-Alive',
 'Pragma': 'no-cache',
 'Cache-Control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0',
 'Date': 'Wed, 22 Jun 2016 04:21:54 GMT',
 'Content-Type': 'application/excel; charset=UTF-8'}
and here is the response header from Chrome developer tool:
Response Headers
Cache-Control:no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection:Keep-Alive
Content-Disposition:attachment; filename=SABR_Download.xls
Content-Encoding:UTF-8
Content-Type:application/excel; charset=UTF-8
Date:Wed, 22 Jun 2016 04:19:53 GMT
Expires:0
Keep-Alive:timeout=5, max=100
Pragma:no-cache
Server:Apache
Transfer-Encoding:chunked
The default download format is Excel. My question is: how can I download the file in CSV format instead of Excel format? Thank you.
Unless the site you're downloading from provides a specific URL for downloading in a CSV format, you will have to download the Excel spreadsheet and parse it to get what you think is a reasonable translation of the data.
There is a Python module called xlrd that can help you with this: https://github.com/python-excel/xlrd
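For example, a minimal sketch of that translation with xlrd plus the standard csv module (file names are placeholders; note that xlrd 2.x reads only the legacy .xls format, which is what's being downloaded here):

```python
import csv
import xlrd  # pip install xlrd; reads the legacy .xls format

def xls_to_csv(xls_path, csv_path):
    """Copy the first sheet of an .xls workbook into a CSV file."""
    book = xlrd.open_workbook(xls_path)
    sheet = book.sheet_by_index(0)
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for rownum in range(sheet.nrows):
            writer.writerow(sheet.row_values(rownum))

# Placeholder usage, assuming the downloaded attachment was saved first:
# xls_to_csv('SABR_Download.xls', 'SABR_Download.csv')
```

Numeric cells come back as floats from row_values, so dates and integers may need extra conversion (xlrd.xldate_as_datetime) depending on how faithful the CSV needs to be.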
I am trying to authenticate users of my Django application into Facebook via the oauth2 Python package.
def myView(request):
    consumer = oauth2.Consumer(
        key=settings.FACEBOOK_APP_ID,
        secret=settings.FACEBOOK_APP_SECRET)
    # Request token URL for Facebook.
    request_token_url = "https://www.facebook.com/dialog/oauth/"
    # Create client.
    client = oauth2.Client(consumer)
    # The OAuth Client request works just like httplib2 for the most part.
    resp, content = client.request(request_token_url, "GET")
    # Return a response that prints out the Facebook response and content.
    return HttpResponse(str(resp) + '\n\n ------ \n\n' + content)
However, when I go to this view I am directed to a page that contains an error. The response from Facebook is:
{'status': '200', 'content-length': '16418', 'x-xss-protection': '0',
'content-location': u'https://www.facebook.com/dialog/oauth/?oauth_body_hash=2jmj7l5rSw0yVb%2FvlWAYkK%2FYBwk%3D&oauth_nonce=53865791&oauth_timestamp=1342666292&oauth_consumer_key=117889941688718&oauth_signature_method=HMAC-SHA1&oauth_version=1.0&oauth_signature=XD%2BZKqhJzbOD8YBJoU1WgQ4iqtU%3D',
'x-content-type-options': 'nosniff',
'transfer-encoding': 'chunked',
'expires': 'Sat, 01 Jan 2000 00:00:00 GMT',
'connection': 'keep-alive',
'-content-encoding': 'gzip',
'pragma': 'no-cache',
'cache-control': 'private, no-cache, no-store, must-revalidate',
'date': 'Thu, 19 Jul 2012 02:51:33 GMT',
'x-frame-options': 'DENY',
'content-type': 'text/html; charset=utf-8',
'x-fb-debug': 'yn3XYqMylh3KFcxU9+FA6cQx8+rFtP/9sJICRgj3GOQ='}
Does anyone see anything awry in my code? I have tried concatenating arguments as strings to request_token_url to no avail. I am sure that my Facebook app ID and secret string are correct.
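One thing worth checking: the content-location in that response is full of oauth_signature/oauth_nonce parameters, i.e. OAuth 1.0a request signing, which is what the oauth2 package implements, while Facebook's /dialog/oauth endpoint speaks OAuth 2.0. Under OAuth 2.0 the first step is simply redirecting the user to an authorization URL built from the app ID, with no signing at all. A sketch of building that URL (the redirect URI is a placeholder; the app ID is the one visible in the response above):

```python
from urllib.parse import urlencode

FACEBOOK_APP_ID = '117889941688718'  # app ID from the error response

params = {
    'client_id': FACEBOOK_APP_ID,
    'redirect_uri': 'https://example.com/oauth/callback',  # placeholder
    'response_type': 'code',
}
# The user's browser is redirected here; Facebook then sends the
# authorization code back to redirect_uri.
authorize_url = 'https://www.facebook.com/dialog/oauth?' + urlencode(params)
print(authorize_url)
```

In a Django view this would be returned as a redirect (HttpResponseRedirect(authorize_url)) rather than fetched server-side the way client.request() does.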