Curl vs Python Requests (GET)

I'm currently modernizing a request from a bash curl command to Python requests. Here is the before:
curl -X GET --header 'Accept: text/csv' 'https://URL?afterDate=1650914642000&beforeDate=2000000000000'
Here is the new process:
import requests
import pandas as pd
import io
args = {
    'params': {'afterDate': 1650914642000, 'beforeDate': 2000000000000},
    'data': {},
    'headers': {'Accept': 'text/csv'},
    'timeout': 3600
}
req = 'URL'
r = requests.request("GET", req, **args)
str_buffer = io.StringIO()
str_buffer.write(r.text)
str_buffer.seek(0)
df = pd.read_csv(
    str_buffer,
    escapechar="\\",
    sep=",",
    quotechar='"',
    low_memory=False
)
Surprisingly, this does not work: the curl command succeeds while the requests version does not. Talking to the engineers on the other end, they are seeing all green lights and a bit of latency, but nothing very abnormal. As far as I can tell, the request just sits and never receives the data the endpoint returns.
Additional Context:
This process is for catching a daily delta load, so the 2000000000000 param is just an arbitrarily high value; even for a time window of a single minute I cannot get data back. The endpoint is a Java-based API hitting a Mongo source, and there is a fair amount of latency, even on the curl request.
I'm at a loss as to how to dig deeper than "it's not working on my machine". If anyone knows of anything that could help, help a guy out!
EDIT:
After 30 minutes of running (it should take about 10-15 minutes), when I send a KeyboardInterrupt I get this trace (this is the end):
File "/home/rriley2/anaconda3/envs/env_etl/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1839, in recv_into
result = _lib.SSL_read(self._ssl, buf, nbytes)
KeyboardInterrupt
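One way to dig deeper than "it's not working on my machine" is to stream the response with a split connect/read timeout, so you can see whether any bytes arrive at all before the read stalls. A minimal sketch, reusing the placeholder URL and the parameters from the question:

import requests

url = 'URL'  # placeholder from the question
params = {'afterDate': 1650914642000, 'beforeDate': 2000000000000}

# Separate connect and read timeouts: the connection must come up within 10 s,
# and any single stalled read fails after 300 s instead of hanging for hours.
with requests.get(url, params=params, headers={'Accept': 'text/csv'},
                  timeout=(10, 300), stream=True) as r:
    r.raise_for_status()
    received = 0
    for chunk in r.iter_content(chunk_size=1 << 20):
        received += len(chunk)
        print('received %d bytes so far' % received)

If the byte counter never moves, the server accepted the connection but is not sending the body, which points at the server or an intermediary rather than at requests itself.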

Related

Trying to send Python HTTPConnection content after accepting 100-continue header

I've been trying to debug a Python script I've inherited. It's trying to POST a CSV to a website via HTTPLib. The problem, as far as I can tell, is that HTTPLib doesn't handle receiving a 100-continue response, as per python http client stuck on 100 continue. Similarly to that post, this "Just Works" via Curl, but for various reasons we need this to run from a Python script.
I've tried to employ the work-around as detailed in an answer on that post, but I can't find a way to use that to submit the CSV after accepting the 100-continue response.
The general flow needs to be like this:
-> establish connection
-> send data including "expect: 100-continue" header, but not including the JSON body yet
<- receive "100-continue"
-> using the same connection, send the JSON body of the request
<- receive the 200 OK message, in a JSON response with other information
Here's the code in its current state, with my 10+ other commented remnants of other attempted workarounds removed:
#!/usr/bin/env python
import os
import ssl
import http.client
import binascii
import logging
import json

# classes taken from https://stackoverflow.com/questions/38084993/python-http-client-stuck-on-100-continue
class ContinueHTTPResponse(http.client.HTTPResponse):
    def _read_status(self, *args, **kwargs):
        version, status, reason = super()._read_status(*args, **kwargs)
        if status == 100:
            status = 199
        return version, status, reason

    def begin(self, *args, **kwargs):
        super().begin(*args, **kwargs)
        if self.status == 199:
            self.status = 100

    def _check_close(self, *args, **kwargs):
        return super()._check_close(*args, **kwargs) and self.status != 100

class ContinueHTTPSConnection(http.client.HTTPSConnection):
    response_class = ContinueHTTPResponse

    def getresponse(self, *args, **kwargs):
        logging.debug('running getresponse')
        response = super().getresponse(*args, **kwargs)
        if response.status == 100:
            setattr(self, '_HTTPConnection__state', http.client._CS_REQ_SENT)
            setattr(self, '_HTTPConnection__response', None)
        return response

def uploadTradeIngest(ingestFile, certFile, certPass, host, port, url):
    boundary = binascii.hexlify(os.urandom(16)).decode("ascii")
    headers = {
        "accept": "application/json",
        "Content-Type": "multipart/form-data; boundary=%s" % boundary,
        "Expect": "100-continue",
    }
    context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
    context.load_cert_chain(certfile=certFile, password=certPass)
    connection = ContinueHTTPSConnection(
        host, port=port, context=context)
    with open(ingestFile, "r") as fh:
        ingest = fh.read()
    ## Create form-data boundary
    ingest = "--%s\r\nContent-Disposition: form-data; " % boundary + \
             "name=\"file\"; filename=\"%s\"" % os.path.basename(ingestFile) + \
             "\r\n\r\n%s\r\n--%s--\r\n" % (ingest, boundary)
    print("pre-request")
    connection.request(
        method="POST", url=url, headers=headers)
    print("post-request")
    #resp = connection.getresponse()
    resp = connection.getresponse()
    if resp.status == http.client.CONTINUE:
        resp.read()
        print("pre-send ingest")
        ingest = json.dumps(ingest)
        ingest = ingest.encode()
        print(ingest)
        connection.send(ingest)
        print("post-send ingest")
        resp = connection.getresponse()
    print("response1")
    print(resp)
    print("response2")
    print(resp.read())
    print("response3")
    return resp.read()
But this simply returns a 400 "Bad Request" response. The problem (I think) lies with the formatting and type of the "ingest" variable. If I don't run it through json.dumps() and encode() then the HTTPConnection.send() method rejects it:
ERROR: Got error: memoryview: a bytes-like object is required, not 'str'
I had a look at using the Requests library instead, but I couldn't get it to use my local certificate bundle to accept the site's certificate. I have a full chain with an encrypted key, which I did decrypt, but still ran into constant SSL_VERIFY errors from Requests. If you have a suggestion to solve my current problem with Requests, I'm happy to go down that path too.
How can I use HTTPLib or Requests (or any other libraries) to achieve what I need to achieve?
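On the Requests path mentioned above: not a confirmed fix, but requests can present a client certificate through its cert argument and verify the server against a custom CA bundle through verify, and it builds the multipart/form-data body itself (it never sends Expect: 100-continue, which most servers accept anyway). A rough sketch, with all paths as illustrative assumptions and an unencrypted private key:

import requests

# All paths below are illustrative assumptions, not values from the question.
cert = ('/path/to/client.crt', '/path/to/client.key')  # decrypted private key
ca_bundle = '/path/to/full-chain.pem'                   # CA bundle used to verify the server

with open('/path/to/ingest.csv', 'rb') as fh:
    resp = requests.post(
        'https://example.com/api/ingest',                 # placeholder endpoint
        files={'file': ('ingest.csv', fh, 'text/csv')},   # multipart part named "file"
        headers={'accept': 'application/json'},
        cert=cert,
        verify=ca_bundle,
    )
print(resp.status_code, resp.text)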
In case anyone comes across this problem in future, I ended up working around it with a bit of a kludge. HTTPLib, Requests, and URLLib3 all failed to handle the 100-continue header for me, so... I just wrote a Python wrapper around curl via the subprocess.run() function, like this:
import subprocess

def sendReq(upFile):
    # curl -F expects "@" before a path to attach the file contents
    sendFile = f"file=@{upFile}"
    completed = subprocess.run([
        curlPath,
        '--cert',
        args.cert,
        '--key',
        args.key,
        targetHost,
        '-H',
        'accept: application/json',
        '-H',
        'Content-Type: multipart/form-data',
        '-H',
        'Expect: 100-continue',
        '-F',
        sendFile,
        '-s'
    ], stdout=subprocess.PIPE, universal_newlines=True)
    return completed.stdout
The only issue I had with this was that it fails if Curl was built against the NSS libraries, which I resolved by including a statically-built Curl binary with the package, the path to which is contained in the curlPath variable in the code. I obtained this binary from this Github repo.

Fileupload not working with QNetworkAccessManager

I have a swagger-generated (python-flask) web server, which supports a variety of requests fed from an instance of QNetworkAccessManager (GET, PUT, POST).
They actually work like a charm, so I assume that I got the main usage right, more or less.
What doesn't work, though, is file upload using a POST request. I actually tried several variants:
Providing the file content with QNetworkAccessManager::send,
Providing the file-pointer with QNetworkAccessManager::send (yes, I made sure, that the file is open and valid during the complete operation and cleaned up afterwards),
Using QMultiPart (see example below),
Playing with the request parameters didn't help either
Whatever I try, the request is recorded by the web server, but connexion.request.files.getlist('file') is empty.
On the other hand, using curl -X POST --header 'Content-Type: multipart/form-data' --header 'Accept: application/json' -F "file=@/path/to/file/image.jpg" {"type":"formData"} '$myURL' makes it [<FileStorage: 'image.jpg' ('image/jpeg')>].
void uploadImage(QUrl const& url, std::filesystem::path const& filepath)
{
    QNetworkRequest request {url};
    request.setRawHeader("Content-Type", "multipart/form-data");

    QHttpMultiPart *multiPart = new QHttpMultiPart(QHttpMultiPart::FormDataType);
    QHttpPart imagePart;
    imagePart.setHeader(QNetworkRequest::ContentDispositionHeader, QVariant("form-data; name=\"" + QString::fromStdString(filepath.filename().native()) + "\"; filename=\"" + QString::fromStdString(filepath.native()) + "\""));
    imagePart.setHeader(QNetworkRequest::ContentTypeHeader, QVariant("image/jpeg"));

    QFile *file = new QFile(QString::fromStdString(filepath.native()));
    if (!file->open(QIODevice::ReadOnly)) {
        std::cout << "could not open file" << std::endl;
    }
    QByteArray fileContent(file->readAll());
    imagePart.setBody(fileContent);
    multiPart->append(imagePart);
    request.setHeader(QNetworkRequest::ContentLengthHeader, file->size());

    auto reply = _manager->post(request, multiPart);
    file->setParent(reply);
    connect(reply, &QNetworkReply::finished, reply, &QNetworkReply::deleteLater);
}
Falling back to QProcess with the curl command made neither the code nor me very happy, because even though I'm currently running this on a Linux machine, I don't want to restrict it to Linux only.
Are there any suggestions on how to use Qt-Network in this context?
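One way to narrow this down, since the curl upload already works: reproduce the same multipart request from Python against the connexion server and compare it with what Qt sends. A small sketch with a placeholder URL; if this populates request.files, the problem is on the Qt side, and the usual suspects are the manually set raw Content-Type header (which lacks the boundary QHttpMultiPart would otherwise supply) and the part name, which has to be "file" for getlist('file') to find it.

import requests

# Placeholder URL; substitute the endpoint the Qt client posts to.
url = 'http://localhost:8080/upload'

with open('/path/to/image.jpg', 'rb') as fh:
    # Mirrors the working curl call: one multipart part named "file",
    # with requests generating the Content-Type boundary itself.
    resp = requests.post(url, files={'file': ('image.jpg', fh, 'image/jpeg')})
print(resp.status_code, resp.text)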

Robot Framework: send binary data in POST request body with robotframework-requests

I have a problem getting my test running using Robot Framework and robotframework-requests. I need to send a POST request with binary data in the body. I looked at this question already, but it's not really answered. Here's what my test case looks like:
Upload ${filename} file
    Create Session    mysession    http://${ADDRESS}
    ${data} =    Get Binary File    ${filename}
    &{headers} =    Create Dictionary    Content-Type=application/octet-stream    Accept=application/octet-stream
    ${resp} =    Post Request    mysession    ${CGIPath}    data=${data}    headers=&{headers}
    [Return]    ${resp.status_code}    ${resp.text}
The problem is that my binary data is about 250MB. When the data is read with Get Binary File I see that memory consumption goes up to 2.x GB. A few seconds later, when the Post Request is triggered, my test is killed by OOM. I already looked at the files parameter, but it seems to use a multipart-encoded upload, which is not what I need.
My other thought was to pass an open file handle directly to the underlying requests library, but I guess that would require a robotframework-requests modification. Another idea is to fall back to curl for this test only.
Am I missing something in my test? What is a better way to address this?
I proceeded with the idea of modifying robotframework-requests and added this method:
def post_request_binary(
        self,
        alias,
        uri,
        path=None,
        params=None,
        headers=None,
        allow_redirects=None,
        timeout=None):
    session = self._cache.switch(alias)
    redir = True if allow_redirects is None else allow_redirects
    self._capture_output()
    method_name = "post"
    method = getattr(session, method_name)
    # Pass the open file handle so requests streams the body from disk
    # instead of loading it into memory.
    with open(path, 'rb') as f:
        resp = method(self._get_url(session, uri),
                      data=f,
                      params=self._utf8_urlencode(params),
                      headers=headers,
                      allow_redirects=redir,
                      timeout=self._get_timeout(timeout),
                      cookies=self.cookies,
                      verify=self.verify)
    self._print_debug()
    # Store the last session object
    session.last_resp = resp
    self.builtin.log(method_name + ' response: ' + resp.text, 'DEBUG')
    return resp
I guess I can improve it a bit and create a pull request.
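For what it's worth, the part of the keyword above doing the heavy lifting is data=f: when requests is handed an open file object instead of bytes, it streams the body rather than materialising it in memory, which is exactly what a 250MB upload needs. A standalone sketch with placeholder URL and path:

import requests

# Placeholder URL and path; the point is that handing requests an open file
# object as `data` makes it stream the body from disk instead of reading the
# whole file into memory first.
with open('/path/to/large.bin', 'rb') as f:
    resp = requests.post(
        'http://server/cgi-path',
        data=f,
        headers={'Content-Type': 'application/octet-stream',
                 'Accept': 'application/octet-stream'},
    )
print(resp.status_code)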

Launch Jenkins parametrized build from script

I am trying to launch a Jenkins parametrized job from a Python script. Due to environment requirements, I can't install python-jenkins, so I am using the raw requests module.
This job I am trying to launch has three parameters:
string (let's call it payload)
string (let's call it target)
file (a file, optional)
I've searched and searched, without any success.
I managed to launch the job with two string parameters by running:
import requests
url = "http://myjenkins/job/MyJobName/buildWithParameters"
target = "http://10.44.542.62:20000"
payload = "{payload: content}"
headers = {"Content-Type": "application/x-www-form-urlencoded"}
msg = {
    'token': 'token',
    'payload': [ payload ],
    'target': [ target ],
}
r = requests.post(url, headers=headers, data=msg)
However, I am unable to send a file and those string arguments in a single request.
I've tried the requests.post files argument and failed.
At first it seemed impossible to send both form data and a file in a single request.
One suggested approach is to import jenkinsapi and JenkinsHandler (from jenkinsHandler import JenkinsHandler) in your Python script, then pass the parameters to buildJob() (like <your JenkinsHandler object name>.buildJob()). The JenkinsHandler module has functions like init(), buildJob(), and isRunning() which help in triggering the build.
Here is an example:
curl -vvv -X POST http://127.0.0.1:8080/jenkins/job/jobname/build \
  --form file0='@/tmp/yourfile' \
  --form json='{"parameter": [{"name":"file", "file":"file0"}, {"name":"payload", "value":"123"}, {"name":"target", "value":"456"}]}'
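If you'd rather stay in Python than shell out to curl, the same request can be expressed with requests by putting every parameter into a single json form field and attaching the file as the multipart part it references. This is a hedged sketch of that shape (credentials and paths are placeholders), not a tested recipe:

import json
import requests

url = 'http://127.0.0.1:8080/jenkins/job/jobname/build'
parameters = {'parameter': [
    {'name': 'file', 'file': 'file0'},     # refers to the multipart part below
    {'name': 'payload', 'value': '123'},
    {'name': 'target', 'value': '456'},
]}

with open('/tmp/yourfile', 'rb') as fh:
    r = requests.post(
        url,
        auth=('user', 'api-token'),            # placeholder credentials
        files={'file0': fh},                   # the file parameter
        data={'json': json.dumps(parameters)}, # all parameters in one json field
    )
print(r.status_code)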

Unable to GET entire page with Python request

I'm trying to get a long JSON response (~75 Mbytes) from a web service; however, I can only receive the first 25 Mbytes or so.
I've used urllib2 and python-requests but neither works. I've tried reading the data in parts and streaming it, but this doesn't work either.
An example of the data can be found here:
http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W
My code is as follows:
r = requests.get("http://waterservices.usgs.gov/nwis/iv/?site=14377100&format=json&parameterCd=00060&period=P260W")
usgs_data = r.json() # script breaks here
# Save Longitude and Latitude of river
latitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["latitude"]
longitude = usgs_data["value"]["timeSeries"][0]["sourceInfo"]["geoLocation"]["geogLocation"]["longitude"]
# dictionary of all past river flows in cubic feet per second
river_history = usgs_data['value']['timeSeries'][0]['values'][0]['value']
It breaks with:
ValueError: Expecting object: line 1 column 13466329 (char 13466328)
When the script tries to decode the JSON (i.e. usgs_data = r.json()).
This is because the full data hasn't been received and is therefore not a valid JSON object.
The problem seems to be that the server won't serve more than 13MB of data at a time.
I have tried that URL using a number of HTTP clients including curl and wget, and all of them bomb out at about 13MB. I have also tried enabling gzip compression (as should you), but the results were still truncated at 13MB after decompression.
You are requesting too much data because the period=P260W specifies 260 weeks. If you try setting period=P52W you should find that you are able to retrieve a valid JSON response.
To reduce the amount of data transferred, set the Accept-Encoding header like this:
url = 'http://waterservices.usgs.gov/nwis/iv/'
params = {'site': 11527000, 'format': 'json', 'parameterCd': '00060', 'period': 'P52W'}
r = requests.get(url, params=params, headers={'Accept-Encoding': 'gzip,deflate'})
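If the full 260-week history is actually needed, one workaround is to fetch it in smaller windows and stitch the results together. The sketch below assumes the instantaneous-values service accepts startDT/endDT date ranges (its documented alternative to period) and that each window returns the same timeSeries structure:

import requests

url = 'http://waterservices.usgs.gov/nwis/iv/'
base = {'site': 14377100, 'format': 'json', 'parameterCd': '00060'}

values = []
for year in range(2011, 2016):
    # One year per request keeps each response well under the ~13MB limit.
    params = dict(base, startDT='%d-01-01' % year, endDT='%d-12-31' % year)
    r = requests.get(url, params=params, headers={'Accept-Encoding': 'gzip,deflate'})
    r.raise_for_status()
    series = r.json()['value']['timeSeries']
    if series:  # a window with no data comes back with an empty timeSeries list
        values.extend(series[0]['values'][0]['value'])

print(len(values), 'readings collected')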
