GET request with files argument in requests library Python

I found this code and I really don't understand it: how is it possible to send data (not a query string) with a GET request?
response = requests.get(
    check_all_info_url_2, files=multipart_form_data, timeout=30)
And what is the files= argument doing in a GET request?

Since requests.get is just a wrapper function, this will simply call requests.request. Unless requests.Session implements any checking, it will happily send off a GET request with multipart data in it.
Is this valid? Not to my knowledge, although I'm willing to be proven wrong. No API I have ever written would accept a file upload on a GET request. But not every server even checks the method, so perhaps this code is interacting with a badly written server that doesn't reject requests made with the wrong method, or perhaps it's interacting with an even worse server that expects file uploads via GET. There are lots of broken servers out there ;)
In any case, the reason this works with requests is that it just passes keyword arguments through to the underlying session without performing any kind of validation.
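A quick way to convince yourself of this pass-through behaviour (a minimal sketch; the payload is a made-up stand-in, and httpbin.org simply echoes the request back):

import requests

# A made-up multipart payload; any file-like data works
multipart_form_data = {'report': ('report.csv', b'a,b,c\n1,2,3\n')}

# requests.get(...) is equivalent to requests.request('GET', ...);
# the files= keyword is forwarded to the Session without validation
response = requests.get('https://httpbin.org/anything',
                        files=multipart_form_data, timeout=30)

# httpbin reports the method and the multipart body it received
print(response.json()['method'])  # GET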

Related

'CORS request did not succeed' when uploading an image and Flask raises error (Firefox only)

I am trying to debug a CORS issue with my app. Specifically, it fails only in Firefox and, it seems, only with somewhat bigger files.
I am using Flask on the backend and I am trying to upload a "faulty" image to my service. When I say faulty, I mean that the backend should reject the image with a 400 (it only accepts PNG, not JPG). Uploading a PNG of any size works fine. However, when I reject a JPG file, the browser request fails with a network error and I cannot capture the 400 error to display a user-friendly message. From the backend's side everything is the same: the same headers are always returned, be it an accepted or a rejected request, POST or OPTIONS.
However, I have noticed that it only fails with somewhat bigger files. If I send a JPG of a few KBs, it works. If I send a JPG of a few MBs, it fails.
I have looked at everything:
- curl-ing the backend gives all the right headers
- there are no OPTIONS requests logged by the browsers, but if there were, I've also checked those with curl for the right headers
- I'm only using HTTP (not HTTPS), so no problems with certificates
- I have disabled all extensions, so no possible blocking from the browser
- maybe other things that I cannot remember
What can possibly be the cause? Note that everything works as expected in other browsers. The error Firefox logs is:
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://localhost:8083/api/image. (Reason: CORS request did not succeed).
Well, after a couple of hours of trials, it appears this has nothing to do with CORS. This is probably the most confusing error message. To cite from Firefox's documentation on this (emphasis mine):
The HTTP request which makes use of CORS failed because the HTTP connection failed at either the network or protocol level. The error is not directly related to CORS, but is a fundamental network error of some kind. In many cases, it is caused by a browser plugin (e.g. an ad blocker or privacy protector) blocking the request.
So, this should actually indicate that the problem is on the backend, although it is very subtle.
Since in my code I am rejecting the request based on the transmitted filename, I never read the content of the request if the name ends with .jpg. Instead, I reject it immediately. This is a problem with Flask's development server, which does not empty the input stream in such cases (issue here).
So, if you want to deal with this while keeping the development server, you should consume the input. In my case, I added a custom error handler, like so:
from flask import Flask, jsonify, request

app = Flask(__name__)

class BadRequestError(ValueError):
    """Raised when a request does not conform to the protocol"""
    pass

@app.errorhandler(BadRequestError)
def bad_request_handler(error):
    # throw away the request data to avoid closing the connection
    # before receiving all of it
    # http://flask.pocoo.org/snippets/47/
    _ = request.data
    _ = request.form
    response = jsonify(str(error))
    response.status_code = 400
    return response
and then, in the code, I always raise BadRequestError('...'), instead of just returning a 400-response.
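For completeness, a minimal sketch of how the handler gets triggered, building on the code above (the route and field names here are hypothetical, modelled on the upload endpoint described in the question):

@app.route('/api/image', methods=['POST'])
def upload_image():
    uploaded = request.files.get('image')  # hypothetical field name
    # reject based on the transmitted filename, without reading the body
    if uploaded is None or not uploaded.filename.lower().endswith('.png'):
        raise BadRequestError('only PNG images are accepted')
    return jsonify('ok')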

Unable to get complete source code of web page using Python [duplicate]

I would like to try sending a requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but I still fail.
All other websites are fine.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to an http://httpbin.org endpoint, have it record the request, and then experiment.
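For example, something along these lines (httpbin.org/get echoes back exactly what it received):

import requests

response = requests.get('https://httpbin.org/get', params={'q': 'test'})
# httpbin echoes the request, so you can inspect the headers
# that requests actually sent
print(response.json()['headers'])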
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did).
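A minimal sketch of that Session flow (the login URL and form fields here are hypothetical):

import requests

session = requests.Session()
session.headers['User-Agent'] = 'Mozilla/5.0 (compatible; my-script/1.0)'

# the initial GET picks up any cookies the server sets
session.get('https://example.com/')

# log in the way the browser does; the Session stores the cookies
session.post('https://example.com/login',
             data={'username': 'me', 'password': 'secret'})

# later requests automatically send the stored cookies
response = session.get('https://example.com/members')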
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent; it looks like they are blacklisting Python. Setting the User-Agent header to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only a HTTP client, a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're reading multiple links from a file rather than hard-coding them as Python strings, make sure to strip any \r or \n characters before you call requests.get(link). In my case, I used:
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

How to send Http response to specific requestLine in python?

Can I send an HTTP response for a specific request line outside the function in which I received the request?
For example: I receive the request, then pass it along to some other functions, and I want to send the response to that request from there. Is that allowed?
If you're going to split the path your program takes, so that it is doing two different things at the same time, then you'll need to use control-flow constructs that support this, such as threads, processes, events, or async.
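As a bare-bones illustration of the idea (a hand-rolled socket server, not any particular framework; all names here are made up): the connection object travels along with the request line, and whichever function ends up holding it can write the response.

import socket
import threading

def respond_later(conn, request_line):
    # ... decide what to do based on request_line ...
    body = b'done'
    conn.sendall(b'HTTP/1.1 200 OK\r\n'
                 b'Content-Length: ' + str(len(body)).encode() + b'\r\n'
                 b'\r\n' + body)
    conn.close()

def serve():
    srv = socket.socket()
    srv.bind(('127.0.0.1', 8080))
    srv.listen(5)
    while True:
        conn, _ = srv.accept()
        request_line = conn.recv(4096).split(b'\r\n', 1)[0].decode()
        # hand the connection off; the response is sent outside this loop
        threading.Thread(target=respond_later,
                         args=(conn, request_line)).start()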
You might want to look at Celery (http://www.celeryproject.org/) and Jobtastic (http://policystat.github.io/jobtastic/).

How to perform Tornado requests within a Tornado request

I am using the Tornado Web Server (version 4.1) with Python 2.7 to create a REST web application. One of my request handlers (web.RequestHandler) handles batch requests consisting of multiple HTTP requests combined into one HTTP request using the multipart/mixed content type. I currently have the batch request handler able to receive the POST request and parse the multipart/mixed content into individual requests that look like this:
GET /contacts/3 HTTP/1.1
Accept: application/json
My question is, what would be a good way of converting these inner batched requests into requests that Tornado can service from within my request handler? I would like to collect the responses within my batch request handler and, once these requests are all complete, return a single multipart/mixed response containing all the batched responses.
Using an HTTPClient to execute the batched requests feels like overkill. It seems like I should be able to build a request object and inject it into the web.Application for processing; I'm at a loss as to how to do this, however. Thanks!
Tornado doesn't have any direct support for this. Going through an HTTP client is probably going to be the simplest solution. However, if you're really interested in avoiding that route, here's a sketch of a solution, which relies on the interfaces defined in the tornado.httputil module.
Define a class that implements the HTTPConnection interface by saving the arguments to write and write_headers into internal buffers.
The Application is an HTTPServerConnectionDelegate. Call its start_request method with an instance of your connection class as both arguments (the first argument doesn't really matter, but it should be unique, and since we won't be reusing "connections", that object is fine).
start_request returns an HTTPMessageDelegate. Call its headers_received, data_received (for POST/PUT), and finish methods to make your request. Once you have called finish, the handler will run and make calls back into your connection object.
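A rough, untested sketch of those three steps against the Tornado 4.x interfaces (treat it as a starting point, not a drop-in solution):

from tornado import httputil

class InMemoryConnection(httputil.HTTPConnection):
    """Collects handler output in buffers instead of writing to a socket."""
    def __init__(self):
        self.start_line = None
        self.headers = None
        self.chunks = []
        self.finished = False

    def write_headers(self, start_line, headers, chunk=None, callback=None):
        self.start_line = start_line
        self.headers = headers
        if chunk:
            self.chunks.append(chunk)
        if callback:
            callback()

    def write(self, chunk, callback=None):
        self.chunks.append(chunk)
        if callback:
            callback()

    def finish(self):
        self.finished = True

    def set_close_callback(self, callback):
        pass  # an in-memory "connection" never drops

def run_batched_request(application, method, path, headers=None, body=None):
    conn = InMemoryConnection()
    # the Application is an HTTPServerConnectionDelegate; the connection
    # object doubles as the unique first argument
    delegate = application.start_request(conn, conn)
    start_line = httputil.RequestStartLine(method, path, 'HTTP/1.1')
    delegate.headers_received(start_line, httputil.HTTPHeaders(headers or {}))
    if body:
        delegate.data_received(body)
    delegate.finish()  # the handler runs and calls back into conn
    return conn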

What is the difference between HTTP Post URL with /post and without using Python requests module?

I am using Python 2.7 with the requests module to send an HTTP POST with parameters. I encountered a strange problem.
To do an HTTP POST, it takes just one line:
x = requests.post(URL, params)
I have no problem with the params. It is the URL that puzzles me.
Sometimes, this URL http://hostname/path/post works. Sometimes, I use http://hostname/path without the /post to get the HTTP POST to work. I am puzzled why this is so. What is the difference between the two? Under what conditions do I use which one?
'http://hostname/path/post' is just a path. You could in principle issue an HTTP GET request to that same path (although you probably wouldn't get anything meaningful back).
In general, you should look at the site's API documentation and post to the url that they say you should post to without adding anything extra to the url.
There are two different concepts here: the URL and the HTTP method. You are getting confused by mixing them up.
url - the address you talk to
The URL addresses something on some server. If you are given a valid URL, treat it as an opaque string: do not read any meaning into it, just use it.
To compare it to visiting a friend, the URL is the address of the door you arrive at.
HTTP method (POST, GET, DELETE...)
There are multiple HTTP methods, which differ in how you talk to the given URL.
Continuing the visiting-a-friend analogy, the method is the way you try to get the door opened (ring the bell, knock, or use a hammer).
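For example, httpbin.org happens to expose a path named /post that only answers to the POST method; the word "post" in the path is just a name the server picked:

import requests

# same path, different methods; the server decides which are allowed
print(requests.post('https://httpbin.org/post', data={'a': 1}).status_code)  # 200
print(requests.get('https://httpbin.org/post').status_code)                  # 405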
