Unparsed data in response headers - requests module - Python

I have a problem with a response from a web app.
smth = requests.get('httpxxx')
Then I get:
requests.packages.urllib3.exceptions.HeaderParsingError:
[MissingHeaderBodySeparatorDefect()], unparsed data: 'Compression
Index: 1\r\nContent-Type: text/plain;charset=UTF-8\r\nContent-Length:
291\r\nDate: Wed, 01 Jun 2016 15:03:05 GMT\r\n\r\n'
but some headers are parsable:
smth.headers
{'Cache-Control': 'no-cache', 'Server': 'Apache-Coyote/1.1', 'Pragma': 'no-cache', 'Last-Modified': 'Wed, 01 Jun 2016 15:24:58 GMT'}
When the Content-Length is greater than 1000, the response is compressed, which breaks everything.
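urllib3 stops parsing at the malformed 'Compression Index' line (the space in the header name is what triggers MissingHeaderBodySeparatorDefect), and everything after it lands in the "unparsed data" string. A rough workaround, assuming you can capture that string from the warning, is to split it back into header lines yourself:

```python
# Minimal sketch: re-parse the "unparsed data" block by hand.
# The raw string is taken from the error message above.
raw = ('Compression Index: 1\r\n'
       'Content-Type: text/plain;charset=UTF-8\r\n'
       'Content-Length: 291\r\n'
       'Date: Wed, 01 Jun 2016 15:03:05 GMT\r\n\r\n')

recovered = {}
for line in raw.split('\r\n'):
    name, sep, value = line.partition(':')
    if sep:  # skip the empty terminator lines
        recovered[name.strip()] = value.strip()

print(recovered['Content-Length'])  # -> 291
```

This only recovers the trailing headers; the server sending a header name with a space in it is the real bug.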

Related

Download large number of byte ranges from file using Python requests

I've got a script that identifies byte ranges in a very large file that I'd like to download. I'm using Python's requests library to download the content, specifying the byte ranges of interest in the range header. Here's a simplified version of the code (without the logic that constructs the byte range string):
import requests
URL = 'https://ncei.noaa.gov/data/rapid-refresh/access/historical/analysis/201602/20160217/rap_130_20160217_2200_000.grb2'
byte_range = '0-33510, 110484-147516, 219121-253904, 421175-454081, 685402-719065, 1039572-1076567, 1299982-1333158, 1398139-1429817, 1492109-1522167, \
1662765-1689414, 1870865-1896117, 2120537-2145725, 2301018-2335355, 2404445-2439381, 2511283-2547104, 2717931-2750956, 2971504-3001716, 3268591-3295610, \
3395201-3395200, 3395201-3461393, 3593639-3593638, 3593639-3659732, 3792859-3792858, 3792859-3859312, 4183232-4183231, 4183232-4245378, 4668359-4668358, \
4668359-4728450, 5283559-5283558, 5283559-5344745, 7251508-7317016, 7498496-7558460'
response = requests.get(URL, headers={"Range": "bytes={}".format(byte_range)})
print(response.headers)
As far as I can tell, this is a valid request and I don't get any errors. However, it downloads the entire file rather than the specified ranges. The output:
{'Date': 'Sat, 16 Oct 2021 15:26:17 GMT', 'Server': 'Apache', 'Strict-Transport-Security': 'max-age=31536000', 'Last-Modified': 'Thu, 18 Feb 2016 15:22:25 GMT', 'ETag': '"cfde25-52c0cef11064f"', 'Accept-Ranges': 'bytes', 'Content-Length': '13622821', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'X-Requested-With, Content-Type', 'Connection': 'close'}
To debug, I tried shortening the list of byte ranges, and that worked. It seems the maximum number of ranges for which the request returns a subset of the file is in the low 20s.
# Shorter list of byte ranges
URL = 'https://ncei.noaa.gov/data/rapid-refresh/access/historical/analysis/201602/20160217/rap_130_20160217_2200_000.grb2'
byte_range = '0-33510, 110484-147516, 219121-253904, 421175-454081, 685402-719065, 1039572-1076567, 1299982-1333158, 1398139-1429817, 1492109-1522167, \
1662765-1689414, 1870865-1896117, 2120537-2145725, 2301018-2335355, 2404445-2439381, 2511283-2547104, 2717931-2750956, 2971504-3001716, 3268591-3295610'
response = requests.get(URL, headers={"Range": "bytes={}".format(byte_range)})
print(response.headers)
In this case, the content type is a multipart byte range, as expected.
{'Date': 'Sat, 16 Oct 2021 15:26:41 GMT', 'Server': 'Apache', 'Strict-Transport-Security': 'max-age=31536000', 'Last-Modified': 'Thu, 18 Feb 2016 15:22:25 GMT', 'ETag': '"cfde25-52c0cef11064f"', 'Accept-Ranges': 'bytes', 'Content-Length': '577544', 'Content-Type': 'multipart/byteranges; boundary=ccc35e3764d85dea', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Headers': 'X-Requested-With, Content-Type', 'Connection': 'close'}
My question now is where's the limitation - is this an issue with requests, the server, or just an issue with HTTP headers? I could break this up into multiple requests, but I need to try to avoid spamming the server with a lot of requests near the same time (this byte range list could get pretty long depending on what I want from the file). If I had to break up the request, what's the most efficient way to do so? I really don't want to download more data than I need as these files can be quite large.
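One workaround is to split the range list into batches under the observed ceiling and pace the requests with a small delay. This is a sketch, not a tested limit - the ~20-range ceiling and the batch size of 18 are assumptions drawn from the experiment above:

```python
import time

def batch_range_headers(ranges, batch_size=18):
    """Turn a list of (start, end) pairs into Range header values,
    batch_size ranges per request (kept under the observed ~20 limit)."""
    headers = []
    for i in range(0, len(ranges), batch_size):
        chunk = ranges[i:i + batch_size]
        headers.append('bytes=' + ', '.join('{}-{}'.format(s, e) for s, e in chunk))
    return headers

def fetch_batches(url, ranges, delay=1.0):
    """Issue one request per batch, pausing between them to avoid
    hammering the server; returns the raw response bodies."""
    import requests
    bodies = []
    for header in batch_range_headers(ranges):
        resp = requests.get(url, headers={'Range': header})
        bodies.append(resp.content)
        time.sleep(delay)
    return bodies
```

Each response body still has to be split on the boundary string from its multipart/byteranges Content-Type before the pieces can be reassembled.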

What is the 'retry-after' timer on steam's marketplace?

I'm working on a python script that grabs the prices of items from the steam marketplace.
My problem is that if I let it run for too long, it gets an HTTP 429 error.
I want to avoid this, but the Retry-After header is not present in the server's response.
Here's a sample of the response headers:
('Server', 'nginx')
('Content-Type', 'application/json; charset=utf-8')
('X-Frame-Options', 'DENY')
('Expires', 'Mon, 26 Jul 1997 05:00:00 GMT')
('Cache-Control', 'no-cache')
('Vary', 'Accept-Encoding')
('Date', 'Wed, 08 May 2019 03:58:30 GMT')
('Content-Length', '6428')
('Connection', 'close')
('Set-Cookie', 'sessionid=14360f3a5309bb1531932884; path=/; secure')
('Set-Cookie', 'steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure')
EDIT: here's the code and sample output.
Note that nothing inside the try statement runs in this example.
def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url + card + " " + game
    url = url.replace(" ", "+")
    print(url)
    try:
        data = urllib.request.urlopen(url)
        h = data.getheaders()
        for item in h:
            print(item)
        #print(data.getheaders())
        #k = data.headers.keys()
        json_data = json.loads(data.read())
        pprint.pprint(json_data)
    except Exception as e:
        print(e.headers)
        return 0
Sample output on 3 different calls:
https://steamcommunity.com/market/search/render/?query=Glub+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=5d1ea46f5095d9c28e141dd5; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Qaahl+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=64e7956224b18e6d89cc45c0; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Odshan+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:50 GMT
Connection: close
Set-Cookie: sessionid=a7acd1023b4544809914dc6e; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
Try this.
def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url + card + " " + game
    url = url.replace(" ", "+")
    print(url)
    while True:
        try:
            data = urllib.request.urlopen(url)
            h = data.getheaders()
            for item in h:
                print(item)
            json_data = json.loads(data.read())
            pprint.pprint(json_data)
            return json_data
        except Exception as e:
            import time
            time.sleep(10)
Adjust the delay to taste; note that time.sleep() takes seconds, not milliseconds.
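Since Steam does not send Retry-After here, any fixed delay is guesswork. A common pattern is exponential backoff that still honours Retry-After if it ever appears; a sketch (the base delay of 5 seconds is an assumption):

```python
import time
import urllib.error
import urllib.request

def backoff_delay(attempt, retry_after=None, base=5.0):
    """Seconds to wait: the server's Retry-After value if present,
    otherwise base * 2**attempt (5s, 10s, 20s, ...)."""
    if retry_after is not None:
        return float(retry_after)
    return base * 2 ** attempt

def fetch_with_backoff(url, max_tries=5):
    """Retry on HTTP 429, doubling the wait each time."""
    for attempt in range(max_tries):
        try:
            return urllib.request.urlopen(url)
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise
            time.sleep(backoff_delay(attempt, e.headers.get('Retry-After')))
    raise RuntimeError('still rate-limited after %d tries' % max_tries)
```

This avoids pinning the request rate to one guessed constant: if a run of requests gets through without a 429, no time is wasted sleeping.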

How to insert a variable into MySQL inside Python

I have tried commands such as .format and %s in every possible combination without any progress.
It works correctly when I use it this way:
last_issue = jira.search_issues('assignee = "ahmet" order by created desc')[0]
But I need assignee to be a variable, and if I use it this way or something like:
assignee = "ahmet"
last_issue = jira.search_issues('assignee =', assignee, 'order by created desc')[0]
It gives an error like:
response headers = {'Vary': 'User-Agent', 'X-AREQUESTID': '578x1623860x1', 'X-ASESSIONID': 'x0ubjs', 'X-ASEN': 'SEN-L0000000', 'Cache-Control': 'no-cache, no-store, no-transform', 'X-Content-Type-Options': 'nosniff', 'X-AUSERNAME': 'ekaterina', 'X-Seraph-LoginReason': 'OK', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'Date': 'Mon, 11 Sep 2017 09:38:10 GMT', 'Content-Type': 'text/html;charset=UTF-8', 'Server': 'nginx/1.13.0', 'Connection': 'keep-alive'}
response text =
How should I build the variable in an appropriate way?
It works!
var = "assignee = '{}' order by created desc".format(assignee)
last_issue = jira.search_issues(var)[0]
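One caveat with .format here: a value containing a double quote would break the JQL string. A small helper can escape it first (the escaping rules below are an assumption; check Atlassian's JQL documentation for the exact syntax):

```python
def jql_latest_for(assignee):
    # Escape backslashes and double quotes so the value cannot
    # terminate the quoted JQL string early.
    safe = assignee.replace('\\', '\\\\').replace('"', '\\"')
    return 'assignee = "{}" order by created desc'.format(safe)

# Hypothetical usage, mirroring the line above:
# last_issue = jira.search_issues(jql_latest_for("ahmet"))[0]
```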

How to download a part of a large file from Drive API using Python script with range header option

Based on
https://developers.google.com/drive/web/manage-downloads#partial_download
I have created a function, but I cannot get it to work.
How should I pass a range option to the headers?
resp, content = service._http.request(download_url, headers={'Range': 'bytes=0-299'})
def download_file(service, file_id):
    drive_file = service.files().get(fileId=file_id).execute()
    download_url = drive_file.get('downloadUrl')
    title = drive_file.get('title')
    originalFilename = drive_file.get('originalFilename')
    if download_url:
        resp, content = service._http.request(download_url, headers={'Range': 'bytes=0-299'})
        if resp.status == 200:
            file = 'tmp.mp4'
            with open(file, 'wb') as f:
                while True:
                    tmp = content.read()
                    if not tmp:
                        break
                    f.write(tmp)
            return title, file
        else:
            print 'An error occurred: %s' % resp
            return None
    else:
        return None
I'm getting:
An error occurred: {'status': '206', 'alternate-protocol':
'443:quic,p=0.02', 'content-length': '300',
'access-control-allow-headers': 'Accept, Accept-Language,
Authorization, Cache-Control, Content-Disposition, Content-Encoding,
Content-Language, Content-Length, Content-MD5, Content-Range,
Content-Type, Date, GData-Version, Host, If-Match, If-Modified-Since,
If-None-Match, If-Unmodified-Since, Origin, OriginToken, Pragma,
Range, Slug, Transfer-Encoding, X-ClientDetails, X-GData-Client,
X-GData-Key, X-Goog-AuthUser, X-Goog-PageId,
X-Goog-Encode-Response-If-Executable, X-Goog-Correlation-Id,
X-Goog-Request-Info, X-Goog-Experiments, x-goog-iam-role,
x-goog-iam-authorization-token, X-Goog-Spatula, X-Goog-Upload-Command,
X-Goog-Upload-Content-Disposition, X-Goog-Upload-Content-Length,
X-Goog-Upload-Content-Type, X-Goog-Upload-File-Name,
X-Goog-Upload-Offset, X-Goog-Upload-Protocol, X-Goog-Visitor-Id,
X-HTTP-Method-Override, X-JavaScript-User-Agent, X-Pan-Versionid,
X-Origin, X-Referer, X-Upload-Content-Length, X-Upload-Content-Type,
X-Use-HTTP-Status-Code-Override, X-YouTube-VVT, X-YouTube-Page-CL,
X-YouTube-Page-Timestamp', 'content-disposition':
'attachment;filename="1981-0930 Public Program, Day 7, Part 1,
Vishuddhi Chakra,
NYC.mpg";filename*=UTF-8\'\'1981-0930%20Public%20Program%2C%20Day%207%2C%20Part%201%2C%20Vishuddhi%20Chakra%2C%20NYC.mpg',
'access-control-allow-credentials': 'false', 'expires': 'Sun, 28 Dec
2014 09:09:35 GMT', 'server': 'UploadServer ("Built on Dec 19 2014
10:24:45 (1419013485)")', 'content-range': 'bytes 0-299/1885163442',
'cache-control': 'private, max-age=0', 'date': 'Sun, 28 Dec 2014
09:09:35 GMT', 'access-control-allow-origin': '*',
'access-control-allow-methods': 'GET,OPTIONS', 'content-type':
'video/mpeg'}
Thank you
The code
resp, content = service._http.request(download_url, headers={'Range': 'bytes=0-299'})
is correct. Note that the response you pasted has status '206' (Partial Content), which is the expected success status for a Range request; the resp.status == 200 check is what sends it into the error branch.
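A sketch of the adjusted check follows. Note also that httplib2 returns the body directly as a byte string, so there is no content.read():

```python
def is_range_success(status):
    # 206 Partial Content = the server honoured the Range header.
    # 200 OK = the server ignored it and returned the full file.
    return int(status) == 206

def save_partial(resp, content, path='tmp.mp4'):
    """Write the partial body to disk if the range was honoured."""
    if is_range_success(resp.status):
        with open(path, 'wb') as f:
            f.write(content)  # httplib2 content is already bytes
        return path
    return None
```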

Python Requests: response object does not contain 'status' header

This is Requests 1.1.0 and Python 2.6.4 (also same behavior on Python 2.7.2).
>>> import requests
>>> response = requests.get('http://www.google.com')
>>> response.status_code
200
>>> print response.headers.get('status')
None
According to the docs, there should be a headers['status'] entry with a string like "200 OK".
Here is the full contents of the headers dict:
>>> response.headers
{'x-xss-protection': '1; mode=block', 'transfer-encoding': 'chunked', 'set-cookie': 'PREF=ID=74b29ee465454efd:FF=0:TM=1362094463:LM=1362094463:S=Xa96iJQX_9BrC-Vm; expires=Sat, 28-Feb-2015 23:34:23 GMT; path=/; domain=.google.com, NID=67=IH21bLPTK2gLTHCyDCMEs3oN5g1uMV99U4Wsc2YA00AbFt4fQCoywQNEQU0pR6VuaNhhQGFCsqdr0FnWbPcym-pizo0xVuS6WBJ9EOTeSFARpzrsiHh6HNnaQeCnxCSH; expires=Fri, 30-Aug-2013 23:34:23 GMT; path=/; domain=.google.com; HttpOnly', 'expires': '-1', 'server': 'gws', 'cache-control': 'private, max-age=0', 'date': 'Thu, 28 Feb 2013 23:34:23 GMT', 'p3p': 'CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."', 'content-type': 'text/html; charset=ISO-8859-1', 'x-frame-options': 'SAMEORIGIN'}
Here is where I got the idea that this dict should contain a 'status' entry.
Am I doing something wrong?
You're looking for the reason attribute:
>>> x=requests.get("http://apple.adam.gs")
>>> x.reason
'OK'
>>>
custom.php contains:
header("HTTP/1.1 200 Testing")
Results in:
>>> x=requests.get("http://apple.adam.gs/custom.php")
>>> print x.reason
Testing
>>>
