If, for example, I send a request to Google, it looks like this in Fiddler:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
and I want to make it look like this:
GET http://www.google.com HTTP/1.1
Accept-Encoding: identity
Host: google.com
Connection: close
User-Agent: Python-urllib/2.7
a=112&b=2335
How can I do it using urllib2/urllib?
Thanks!!
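For what it's worth, here is a minimal sketch using Python 2's urllib2: passing a data argument puts the urlencoded string in the request body, though note that urllib2 then sends the request as a POST rather than a GET.
import urllib
import urllib2
# Build the urlencoded body (a=112&b=2335).
data = urllib.urlencode({'a': 112, 'b': 2335})
# Passing data makes urllib2 put it in the request body;
# it also switches the method from GET to POST.
req = urllib2.Request('http://www.google.com', data)
response = urllib2.urlopen(req)
print response.read()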
On the instagram login page, if one inspects the element of the POST call for the url 'https://www.instagram.com/accounts/web_create_ajax/', it lists the following as headers:
Host: www.instagram.com
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://www.instagram.com/
X-CSRFToken: 7dmO9F3JuVGvSXumd79yByPxnHoWHz1A
X-Instagram-AJAX: c2d8f4380025
Content-Type: application/x-www-form-urlencoded
X-Requested-With: XMLHttpRequest
Content-Length: 102
Cookie: csrftoken=7dmO9F3JuVGvSXumd79yByPxnHoWHz1A; mid=W30zsQAEAAErXHJ3iUojfTceCd53; mcd=3; csrftoken=7dmO9F3JuVGvSXumd79yByPxnHoWHz1A; rur=FTW
Connection: keep-alive
I am wondering if anyone has any idea what X-Instagram-AJAX is and how I can generate it each time. Is it connected as a pair with X-CSRFToken? Thanks.
Follow, like, and similar requests work without this header. I don't know exactly what it is, but I think Instagram uses it to detect suspicious requests and then logs them. You can get this value from any page on Instagram; it is the X-Instagram-AJAX value.
You can parse it out of the page and use it.
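As a rough sketch of the "grab it from any page" idea, the snippet below fetches the Instagram home page and pulls the value out with a regex; the rollout_hash field name is an assumption about where the value appears in the page source, and it may change at any time.
import re
import requests
# Fetch any Instagram page and look for the value in its source.
# Assumption: it is exposed as a "rollout_hash" field in the embedded JSON.
html = requests.get('https://www.instagram.com/').text
match = re.search(r'"rollout_hash":"([^"]+)"', html)
if match:
    x_instagram_ajax = match.group(1)
    print(x_instagram_ajax)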
I'm trying to connect to a website with Python requests, but not with my real IP. So, I found a proxy on the internet and wrote this code:
import requests
proksi = {
    'http': 'http://5.45.64.97:3128'
}
x = requests.get('http://www.whatsmybrowser.org/', proxies=proksi)
print(x.text)
When I look at the output, the proxy simply doesn't work: the site returns my real IP address. What did I do wrong? Thanks.
The answer is quite simple. Although it is a proxy service, it doesn't guarantee 100% anonymity. When you send the HTTP GET request via the proxy server, the request sent by your program to the proxy server is:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Now, when the proxy server sends this request to the actual destination, it sends:
GET http://www.whatsmybrowser.org/ HTTP/1.1
Host: www.whatsmybrowser.org
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.10.0
Via: 1.1 naxserver (squid/3.1.8)
X-Forwarded-For: 122.126.64.43
Cache-Control: max-age=18000
Connection: keep-alive
As you can see, it puts your IP (in my case, 122.126.64.43) in the X-Forwarded-For HTTP header, and hence the website knows that the request was sent on behalf of 122.126.64.43.
Read more about this header at: https://www.rfc-editor.org/rfc/rfc7239
If you want to host your own squid proxy server and want to disable setting X-Forwarded-For header, read: http://www.squid-cache.org/Doc/config/forwarded_for/
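A quick way to see exactly which headers the destination receives through a given proxy is to call an echo service such as httpbin (assuming httpbin.org is reachable; the proxy address below is the one from the question and is only a placeholder):
import requests
# Placeholder proxy from the question; substitute your own.
proxies = {'http': 'http://5.45.64.97:3128'}
# httpbin echoes back the headers it received, so any Via or
# X-Forwarded-For header added by the proxy will show up here.
r = requests.get('http://httpbin.org/headers', proxies=proxies)
print(r.json()['headers'])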
We have our own private cloud where I have installed OpenStack Swift. I have a working node (proxy and storage) that lets me store and retrieve files using the openstack and swift Python CLIs. Additionally, I am able to use the Python API on a remote machine to store and retrieve files.
The root of my question is how to debug temp url and crossdomain filter issues. Is there a way to turn on detailed debug logging for these filters?
I have the default logging set to
log_name = swift
log_facility = LOG_LOCAL0
log_level = DEBUG
The situation I am trying to troubleshoot is as follows. When I try to use temp URLs and cross-domain (for CORS), I get a 401. I debugged the code and it appears to be an invalid HMAC error. Based on research, this appears to be a date/time issue where the client and the server have mismatched times. However, both are running the ntpd service, so the time should be in sync.
For CORS, it appears that preflight OPTIONS request is succeeding. The subsequent PUT is failing with a 401....
"No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://blahost' is therefore not allowed access. The response had HTTP status code 401."
The strange part is the OPTIONS request is returning "access-control-allow-origin" instead of 'Access-Control-Allow-Origin'... the case is off.
Preflight request:
OPTIONS /v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016?temp_url_sig=4a953c34372e37b2a22bb31fb0581a7eb7f02cee&temp_url_expires=1441508891 HTTP/1.1
Host: 23.253.200.41:8080
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Access-Control-Request-Method: PUT
Origin: http://blahhost
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
Access-Control-Request-Headers: accept, content-type
Accept: */*
Referer: http://blahost/binder/436/site/419/folder/17560/file
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Preflight Response:
HTTP/1.1 200 OK
access-control-allow-origin: http://blahost
access-control-allow-methods: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
access-control-allow-headers: content-type, accept
Allow: HEAD, GET, PUT, POST, COPY, OPTIONS, DELETE
Content-Length: 0
X-Trans-Id: tx9e359777dfb94148858cd-0055eba012
Date: Sun, 06 Sep 2015 02:08:18 GMT
Connection: keep-alive
Subsequent PUT request (note that its response is missing the Access-Control-Allow-Origin header):
PUT /v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016?temp_url_sig=4a953c34372e37b2a22bb31fb0581a7eb7f02cee&temp_url_expires=1441508891 HTTP/1.1
Host: 23.253.200.41:8080
Connection: keep-alive
Content-Length: 16231
Pragma: no-cache
Cache-Control: no-cache
Accept: */*
Origin: http://blahost
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36
Content-Type: application/pdf
Referer: http://blahost/binder/436/site/419/folder/17560/file
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
I would appreciate any advice on how to troubleshoot.
Thanks
Greg
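One thing that can help with the invalid-HMAC part is recomputing the signature by hand and comparing it with the temp_url_sig in the URL. The sketch below follows the signature scheme documented for Swift's tempurl middleware (method, expiry, and path joined by newlines, HMAC-SHA1 with the account's Temp-URL key); the key and path are placeholders.
import hmac
from hashlib import sha1
from time import time
# Placeholders: the key is whatever was set via X-Account-Meta-Temp-URL-Key,
# and the path is the object path without the query string.
key = b'MYKEY'
path = '/v1/AUTH_99cf99f26aaa4b2c923806231b03334c/436/88b6d895-6dbf-4f29-904d-96c9b7959016'
method = 'PUT'               # must match the method of the actual request
expires = int(time() + 300)  # five minutes from now
# Swift's tempurl middleware signs "METHOD\nEXPIRES\nPATH" with HMAC-SHA1.
hmac_body = '%s\n%s\n%s' % (method, expires, path)
sig = hmac.new(key, hmac_body.encode(), sha1).hexdigest()
print('%s?temp_url_sig=%s&temp_url_expires=%s' % (path, sig, expires))
Since temp_url_expires is an absolute Unix timestamp carried in the URL and included in the signed string, a genuine HMAC mismatch usually points at the key, path, or method used on one side rather than at clock skew; skewed clocks would instead make a valid URL look expired.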
There are a couple of default headers that HTTPie sets. I'm wondering if there is a way to remove some of them, like Accept-Encoding?
The reason I want to unset Accept-Encoding is to check our server's behavior with respect to HTTP compression.
Per https://github.com/jakubroztocil/httpie#http-headers , you can override those headers. For example, set Accept-Encoding to empty to achieve the same effect as if you had removed it -- per the rules at http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3 .
To unset a header, pass its name followed by a colon with nothing after it.
No headers:
http -v https://jsonplaceholder.typicode.com/todos/1 \
Accept: \
Accept-Encoding: \
Connection: \
Host: \
User-Agent:
Request:
GET /todos/1 HTTP/1.1
Host: jsonplaceholder.typicode.com
Response:
HTTP/1.1 200 OK
...
Standard:
http -v https://jsonplaceholder.typicode.com/todos/1
Request:
GET /todos/1 HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: jsonplaceholder.typicode.com
User-Agent: HTTPie/0.9.8
Response:
HTTP/1.1 200 OK
...
The -v option displays the request. Also, remember that there should be no spaces after the \ in multi-line bash commands.
I have a question about Python regex; I don't have much experience with it yet. I am working with HTTP request messages and parsing them with regex. As you know, HTTP GET messages are in this format:
GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive
I want to parse the URI, method, User-Agent, and Host parts of the message. My regex for this job is:
re.compile(r'^({0})\s+(\S+)\s+[^\n]*$\n.*^User-Agent:\s*(\S+)[^\n]*$\n.*^Host:\s*(\S+)[^\n]*$\n'.format('|'.join(methods)), re.MULTILINE | re.DOTALL)
But when a message comes in like this:
GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive
I cannot catch the fields, because the positions of Host and User-Agent have changed. So I need a generic regex that will catch all of them, even if the positions of Host, method, and URI change in the message.
Readability Counts (The Zen of Python)
Use findall() for each subexpression you want to find. This way your regex will be short, readable, and independent of the location of the subexpression.
Define a simple, readable regex:
>>> user=re.compile("User-Agent: (.*?)\n")
Test it with two different http headers:
>>> s1='''GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive'''
>>> s2='''GET / HTTP/1.0
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Host: 10.2.0.12
Connection: Keep-Alive'''
>>> user.findall(s1)
['Wget/1.12 (linux-gnu)']
>>> user.findall(s2)
['Wget/1.12 (linux-gnu)']
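Extending that idea to everything the question asks for, here is a sketch with one small pattern per field, so the order of the headers never matters (the method list in the first pattern is just illustrative):
import re
request_line = re.compile(r'^(GET|POST|HEAD|PUT|DELETE|OPTIONS)\s+(\S+)\s+HTTP/\d\.\d', re.MULTILINE)
user_agent = re.compile(r'^User-Agent:\s*(.*?)\s*$', re.MULTILINE)
host = re.compile(r'^Host:\s*(\S+)', re.MULTILINE)
def parse_request(msg):
    # Each field is looked up independently, so it does not matter
    # where Host or User-Agent appear in the message.
    method, uri = request_line.findall(msg)[0]
    return {
        'method': method,
        'uri': uri,
        'user_agent': user_agent.findall(msg)[0],
        'host': host.findall(msg)[0],
    }
parse_request(s1) and parse_request(s2) above return the same dictionary either way.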
Why not parse all of the headers into a dictionary, like so?
headers = """GET / HTTP/1.0
Host: 10.2.0.12
User-Agent: Wget/1.12 (linux-gnu)
Accept: */*
Connection: Keep-Alive"""
headers = headers.splitlines()
firstLine = headers.pop(0)
(verb, url, version) = firstLine.split()
d = {'verb': verb, 'url': url, 'version': version}
for h in headers:
    # Split only on the first ": " so header values containing colons stay intact.
    h = h.split(': ', 1)
    if len(h) < 2:
        continue
    field = h[0]
    value = h[1]
    d[field] = value
print d
print d['User-Agent']
print d['url']