I was trying to write a SPDY proxy server, but I am having trouble compressing/decompressing SPDY name/value blocks.
I am using the Python 3.3 zlib library to compress/decompress with a preset dictionary.
When receiving SPDY frames from Chrome 31, the frames can be parsed most of the time, but some name/value blocks cannot be decompressed correctly.
I have 3 test cases:
import zlib
dictionary = (
b"optionsgetheadpostputdeletetraceacceptaccept-charsetaccept-encodingaccept-"
b"languageauthorizationexpectfromhostif-modified-sinceif-matchif-none-matchi"
b"f-rangeif-unmodifiedsincemax-forwardsproxy-authorizationrangerefererteuser"
b"-agent10010120020120220320420520630030130230330430530630740040140240340440"
b"5406407408409410411412413414415416417500501502503504505accept-rangesageeta"
b"glocationproxy-authenticatepublicretry-afterservervarywarningwww-authentic"
b"ateallowcontent-basecontent-encodingcache-controlconnectiondatetrailertran"
b"sfer-encodingupgradeviawarningcontent-languagecontent-lengthcontent-locati"
b"oncontent-md5content-rangecontent-typeetagexpireslast-modifiedset-cookieMo"
b"ndayTuesdayWednesdayThursdayFridaySaturdaySundayJanFebMarAprMayJunJulAugSe"
b"pOctNovDecchunkedtext/htmlimage/pngimage/jpgimage/gifapplication/xmlapplic"
b"ation/xhtmltext/plainpublicmax-agecharset=iso-8859-1utf-8gzipdeflateHTTP/1"
b".1statusversionurl\0")
def decompress(buf):
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(buf)

def compress(buf):
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(buf)
if __name__ == '__main__':
    # Test 1: buf -(compress)-> cb -(decompress)-> buf2; buf2 becomes b''
    buf = b'\x00\x01\x00\x06status\x00\x1a200 Connection established'
    print(buf)
    cb = compress(buf)
    print(cb)  # b'x\xbb\xdf\xa2Q\xb2'
    buf = decompress(cb)
    print(buf)  # b''
    # Test 2: this name/value block data was sent by Chrome and decompresses correctly
    print(decompress(b'8\xea\xdf\xa2Q\xb2b`e`\x01\xe5\x12\x06\x9e4`\xc6K\x02\x06\x83^r~.\x03[.0o\xe6\xa70\xb0;\xfb\xfb\xf9\xb9:\x8700\x83\x14\x0b\x00\x04PZbrjR~~\xb6^r~\xae\x95\x89\x891#\x001p!\x12<C\x8eo~UfNN\xa2\xbe\xa9\x9e\x81\x82Fxf^J~y\xb1\x82_\x88\x82\x99\x9e\xa1\xb5B\xb8\x7f\xb8\x99\x89\xa6\x82#\xd0K\xa9\xe1\xa9I\xde\x99%\xfa\xa6\xc6\xe6z\xc6f\n\x1a\xde\x1e!\xbe>:\n9\x99\xd9\xa9\n\xee\xa9\xc9\xd9\xf9\x9a\n\xce\x19\xc0\xdc\x9b\xaaol\xa8g\xa0ghfj\xa0gf\xac\x10\x9c\x98\x96X\x94\t\xd5\xc5\xc0\x0e\xf5\x04\x03\x07\xcco\x00\x00\x00\x00\xff\xff'))
    # b'\x00\x05\x00\x04host\x00\x0cfacebook.com\x00\x06method\x00\x07CONNECT\x00\x03url\x00\x10facebook.com:443\x00\nuser-agent\x00lMozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\x00\x07version\x00\x08HTTP/1.1'
    # Test 3: another name/value block sent by Chrome, which cannot be decompressed
    print(decompress(b'"\xcd+\x00\x01\x94\x96\x98\x9c\x9a\x94\x9f\x9f\xad\x97\x9c\x9fkebb\x0c\x10#\x83\xca+\x00\x00\x00\x00\xff\xff'))
    # zlib.error: Error -3 while decompressing data: incorrect header check
I'm new to Python 3 and zlib (I used Python 2.7 before this project) and to SPDY.
I'd really appreciate your help.
You need to flush both the compressor and the decompressor; otherwise some or all of the data remains buffered inside the object. I.e.:
def decompress(buf):
    decompressor = zlib.decompressobj(zdict=dictionary)  # keep the preset dictionary here too
    result = decompressor.decompress(buf)
    return result + decompressor.flush()

def compress(buf):
    compressor = zlib.compressobj(zdict=dictionary)
    result = compressor.compress(buf)
    return result + compressor.flush()
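With the flushes in place (and the same preset dictionary on both sides), the Test 1 round trip works. A self-contained sketch, using a shortened stand-in dictionary instead of the full SPDY one:

```python
import zlib

# Stand-in for the (much longer) SPDY header dictionary; any shared
# byte string works, as long as compressor and decompressor agree on it.
dictionary = b"hostmethodstatusversionurlHTTP/1.1\0"

def compress(buf):
    compressor = zlib.compressobj(zdict=dictionary)
    return compressor.compress(buf) + compressor.flush()

def decompress(buf):
    decompressor = zlib.decompressobj(zdict=dictionary)
    return decompressor.decompress(buf) + decompressor.flush()

original = b'\x00\x01\x00\x06status\x00\x1a200 Connection established'
assert decompress(compress(original)) == original
```

Note that this only covers the single-frame round trip; a SPDY session additionally shares one compression context across all frames of the connection, so a later frame is not an independent zlib stream on its own.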
I am trying to work on a website that has simple captcha. Here's the link.
Steps:
One is supposed to type a case number, e.g. 200078510, then type the numbers in the captcha, then click the Search button.
Progress:
I could solve the captcha part, but when trying to use the POST method of the requests library, I didn't get a valid response. I got the string حدث خطأ ما, which means "Something went wrong". A successful response would have included the case number, e.g. 200078510.
Question:
The captcha (myCaptcha) is correct about 90% of the time, so I think the problem is with the POST request. Can anyone see what is wrong with it?
I provide a working VBA example at the end, as additional info, in case that helps.
Here's the code I have so far:
import requests
import cv2
import numpy as np
import pytesseract
from PIL import Image
sNumber = 'Number.png'
sTemp = 'Temp.png'
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
def getCaptcha():
    response = requests.get("https://eservices.moj.gov.kw/captcha/imgCaptcha.jsp")
    with open(sNumber, "wb") as f:
        f.write(response.content)
    img = cv2.imread(sNumber)
    lower = np.array([0, 0, 0])
    upper = np.array([46, 46, 255])
    thresh = cv2.inRange(img, lower, upper)
    thresh = 255 - thresh
    cv2.imwrite(sTemp, thresh)
    img = Image.open(sTemp)
    text = pytesseract.image_to_string(img, lang='eng', config='--psm 10 --oem 3 -c tessedit_char_whitelist=0123456789')
    return text
myCaptcha = getCaptcha()
print(myCaptcha)
payload = {'txtCaseNo': '200078510', 'txtCaptcha2': myCaptcha, 'searchType': '0'}
r = requests.post("https://eservices.moj.gov.kw/viewResults/validateCase.jsp", data=payload)
print(r.url)
print(r.text)
I even tried adding headers like this, and had the same problem:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
           'Content-Type': 'application/x-www-form-urlencoded'}
payload = {'txtCaseNo': '200078510', 'txtCaptcha2': myCaptcha, 'searchType': '0'}
r = requests.post("https://eservices.moj.gov.kw/viewResults/validateCase.jsp", headers=headers, data=payload)
Simply put, I need to be able to use the POST method of the requests package so that I can send the suitable arguments and then navigate to the multiple sections that are related to the searched number.
Supplementary Information (working example reference in VBA):
I have working code in VBA for the entire process. The code navigates to a URL, enters a number, and enters the numbers from the captcha. Here's the code:
Public vCaptcha

Sub Test()
    Dim wsIndex As Worksheet, wsData As Worksheet
    Dim http As New XMLHTTP60, html As New HTMLDocument, htmlData As New HTMLDocument
    Dim postCasePane As Object, oTables As Object, postTable As Object, postWrongSec As Object
    Dim strArg As String, xTemp As String, sTemp As String
    Dim r As Long, lr As Long, i As Long, ii As Long, vMAX As Long, cnt As Long

    Set wsIndex = ThisWorkbook.Worksheets("Index")
    Set wsData = ThisWorkbook.Worksheets("Data")
    wsData.Range("A1").CurrentRegion.Offset(1).ClearContents

    For r = 2 To wsIndex.Cells(Rows.Count, 1).End(xlUp).Row
        If r Mod 10 = 0 Then ThisWorkbook.Save
        lr = wsData.Cells(Rows.Count, 1).End(xlUp).Row + 1
        If wsIndex.Cells(r, 1).Value = "" Then GoTo Skipper
sPoint:
        Application.StatusBar = "Case Number: " & wsIndex.Cells(r, 1).Value & " ------- Row " & r
        DecryptCaptcha
        strArg = "txtCaseNo=" & wsIndex.Cells(r, 1).Value & "&txtCaptcha2=" & vCaptcha & "&searchType=0"
        With http
            .Open "POST", "https://eservices.moj.gov.kw/viewResults/validateCase.jsp", False
            .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
            .send strArg
            html.body.innerHTML = .responseText
            Set postWrongSec = html.querySelector("span[lang='AR-KW']")
            If Not postWrongSec Is Nothing Then
                'The literal below is the mojibake form of the Arabic "wrong captcha" message
                If postWrongSec.innerText = "ÚÝæÇ: ÑãÒ ÇáÍãÇíÉ ÛíÑ ÕÍíÍ !!!" Then
                    cnt = cnt + 1
                    Debug.Print "Wrong Captcha " & cnt: GoTo sPoint
                End If
            End If
            Set postCasePane = html.querySelector("#caseViewPane span h4")
            'The literal below is the mojibake form of the Arabic "wrong case number" message
            If postCasePane Is Nothing Then wsData.Range("A" & lr).Value = wsIndex.Cells(r, 1).Value: wsData.Range("C" & lr).Value = "ÑÞã ÇáÞÖíÉ ÛíÑ ÕÍíÍ": GoTo Skipper
            .Open "POST", "https://eservices.moj.gov.kw/viewResults/viewLastEvents.jsp", False
            .setRequestHeader "Content-type", "application/x-www-form-urlencoded"
            .send
            html.body.innerHTML = .responseText
        End With
        Set html = Nothing: Set htmlData = Nothing
Skipper:
        Application.Wait Now + TimeValue("00:00:05")
    Next r

    Application.StatusBar = Empty
    MsgBox "Done...", 64
End Sub
And this is the part that is responsible for the captcha:
Private Sub DecryptCaptcha()
    Dim res, sDestFolder As String, strFile As String, sURL As String
    sDestFolder = ThisWorkbook.Path & "\"
    strFile = "Number.png"
    sURL = "https://eservices.moj.gov.kw/captcha/imgCaptcha.jsp"
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", sURL, False
        .send
        res = .responseBody
    End With
    With CreateObject("ADODB.Stream")
        .Type = 1
        .Open
        .write res
        .SaveToFile sDestFolder & strFile, 2
    End With
    vCaptcha = CleanNumber(ScriptFile(sDestFolder & strFile))
End Sub
Function ScriptFile(strImage As String) As String
    Dim wshShell As Object, sOutput As String, strCommand As String
    sOutput = ThisWorkbook.Path & "\OutputNumber.txt"
    strCommand = "Powershell.exe -File ""C:\Users\" & Environ("USERNAME") & "\Desktop\ConvertImage.ps1"" " & strImage
    Set wshShell = CreateObject("WScript.Shell")
    wshShell.Run strCommand, 0, True
    ScriptFile = CreateObject("Scripting.FileSystemObject").OpenTextFile(sOutput).ReadAll
End Function
Function CleanNumber(ByVal strText As String) As String
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True
        .Global = True
        .Pattern = "[^0-9]"
        If .Test(strText) Then
            CleanNumber = WorksheetFunction.Trim(.Replace(strText, vbNullString))
        Else
            CleanNumber = strText
        End If
    End With
End Function
And as for the PowerShell file, these are its contents:
$image=$args[0]
$desktop= (Join-Path $env:USERPROFILE 'Desktop')
$imagefile=(Join-Path $desktop 'NumberNew.png')
$textfile=(Join-Path $desktop 'OutputNumber')
cd (Join-Path $desktop '\')
magick convert $image -resize 300x160 -density 300 -quality 100 $imagefile
magick convert $imagefile -negate -lat 300x160+40% -negate $imagefile
tesseract.exe $imagefile $textfile -l eng
Of course the code requires Tesseract to be installed, and also ImageMagick to manipulate the image. The code works in VBA, but I would like to use Python for this to improve my skills. For now I am stuck, with no further leads. Thanks in advance for your help.
How can I improve the multithreading speed in my code?
My code takes 130 seconds with 100 threads to do 700 requests, which is really slow and frustrating given that I am using 100 threads.
My code edits the parameter values of a URL and makes a request to each modified URL as well as to the original (unedited) URL. The URLs are read from a file (urls.txt).
Let me show you an example:
Let's consider the following url:
https://www.test.com/index.php?parameter=value1&parameter2=value2
The URL contains 2 parameters, so my code will make 3 requests:
1 request to the original URL:
https://www.test.com/index.php?parameter=value1&parameter2=value2
1 request with the first value modified:
https://www.test.com/index.php?parameter=replaced_value&parameter2=value2
1 request with the second value modified:
https://www.test.com/index.php?parameter=value1&parameter2=replaced_value
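As a side note, the per-parameter URL variants described above can be generated with urllib.parse instead of manual string splitting. A sketch (the example URL and the 'replaced_value' placeholder are taken from the question):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def modified_urls(url, replacement='replaced_value'):
    """Yield one URL per query parameter, with that parameter's value replaced."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query)
    for i in range(len(params)):
        edited = list(params)
        edited[i] = (edited[i][0], replacement)
        yield urlunsplit(parts._replace(query=urlencode(edited)))

url = 'https://www.test.com/index.php?parameter=value1&parameter2=value2'
for u in modified_urls(url):
    print(u)
```

This also handles values that appear as substrings of other values, which a plain str.replace on the whole URL can corrupt.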
I have tried using asyncio for this, but I had more success with concurrent.futures.
I even tried increasing the number of threads, which I thought was the issue at first, but it wasn't: increasing the threads considerably just made the script freeze at the start for 30-50 seconds and didn't really increase the speed as I expected.
I assume this is an issue with how I build up the multithreading in my code, because I have seen other people achieve incredible speeds with concurrent.futures.
import requests
import uuid
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

start = time.time()

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

def make_request(url2):
    try:
        if '?' in url2 and '=' in url2:  # was `if '?' and '=':`, which is always true
            request_1 = requests.get(url2, headers=headers, timeout=10)
            url2_modified = url2.split("?")[1]
            times = url2_modified.count("&") + 1
            for x in range(0, times):
                split1 = url2_modified.split("&")[x]
                value = split1.split("=")[1]
                parameter = split1.split("=")[0]
                url = url2.replace('=' + value, '=1')
                request_2 = requests.get(url, stream=True, headers=headers, timeout=10)
                html_1 = request_1.text
                html_2 = request_2.text
                print(str(request_1.status_code) + ' - ' + url2)  # status_code is an int
                print(str(request_2.status_code) + ' - ' + url)
    except requests.exceptions.RequestException as e:
        return e

def runner():
    threads = []
    with ThreadPoolExecutor(max_workers=100) as executor:
        file1 = open('urls.txt', 'r', errors='ignore')
        Lines = file1.readlines()
        count = 0
        for line in Lines:
            count += 1
            threads.append(executor.submit(make_request, line.strip()))

runner()
end = time.time()
print(end - start)
Inside the loop in make_request you run a normal requests.get, which doesn't use a thread (or any other method) to make it faster, so it has to wait for the previous request to finish before running the next one.
In make_request I use another ThreadPoolExecutor to run every requests.get (created in the loop) in a separate thread:
executor.submit(make_modified_request, modified_url)
and it gives me a time of ~1.2s.
If I instead use a normal
make_modified_request(modified_url)
then it gives me a time of ~3.2s.
Minimal working example:
I use the real URL https://httpbin.org/get so everyone can simply copy and run it.
from concurrent.futures import ThreadPoolExecutor
import requests
import time
#import urllib.parse

# --- constants --- (PEP8: UPPER_CASE_NAMES)

HEADERS = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

# --- functions ---

def make_modified_request(url):
    """Send modified url."""
    print('send:', url)
    response = requests.get(url, stream=True, headers=HEADERS)
    print(response.status_code, '-', url)
    html = response.text  # ???
    # ... code to process HTML ...

def make_request(url):
    """Send normal url and create threads with modified urls."""
    threads = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        print('send:', url)
        # send base url
        response = requests.get(url, headers=HEADERS)
        print(response.status_code, '-', url)
        html = response.text  # ???

        #parts = urllib.parse.urlparse(url)
        #print('query:', parts.query)
        #arguments = urllib.parse.parse_qs(parts.query)
        #print('arguments:', arguments)  # dict {'a': ['A'], 'b': ['B'], 'c': ['C'], 'd': ['D'], 'e': ['E']}

        arguments = url.split("?")[1]
        arguments = arguments.split("&")
        arguments = [arg.split("=") for arg in arguments]
        print('arguments:', arguments)  # list [['a', 'A'], ['b', 'B'], ['c', 'C'], ['d', 'D'], ['e', 'E']]

        for name, value in arguments:
            modified_url = url.replace('=' + value, '=1')
            print('modified_url:', modified_url)
            # run thread with modified url
            threads.append(executor.submit(make_modified_request, modified_url))
            # run normal function with modified url
            #make_modified_request(modified_url)

    print('[make_request] len(threads):', len(threads))

def runner():
    threads = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        #fh = open('urls.txt', errors='ignore')
        fh = [
            'https://httpbin.org/get?a=A&b=B&c=C&d=D&e=E',
            'https://httpbin.org/get?f=F&g=G&h=H&i=I&j=J',
            'https://httpbin.org/get?k=K&l=L&m=M&n=N&o=O',
            'https://httpbin.org/get?a=A&b=B&c=C&d=D&e=E',
            'https://httpbin.org/get?f=F&g=G&h=H&i=I&j=J',
            'https://httpbin.org/get?k=K&l=L&m=M&n=N&o=O',
        ]
        for line in fh:
            url = line.strip()
            # create thread with url
            threads.append(executor.submit(make_request, url))

    print('[runner] len(threads):', len(threads))

# --- main ---

start = time.time()
runner()
end = time.time()
print('time:', end - start)
BTW:
I was thinking of using a single
executor = ThreadPoolExecutor(max_workers=10)
and then using the same executor in all functions; maybe it would run a little faster, but at the moment I don't have working code for that.
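For what it's worth, a minimal sketch of that single-shared-executor idea, with a sleep standing in for requests.get (all names and timings here are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=10)  # one pool shared by everything

def fetch(url):
    # stand-in for requests.get(url); just simulates network latency
    time.sleep(0.1)
    return 'done: ' + url

def make_request(url):
    # submit the modified urls to the same shared pool
    futures = [executor.submit(fetch, url + '&mod=' + str(i)) for i in range(2)]
    inner = [f.result() for f in futures]
    return [fetch(url)] + inner

urls = ['https://example.com/?q=' + str(i) for i in range(5)]
outer = [executor.submit(make_request, u) for u in urls]
results = [f.result() for f in outer]
executor.shutdown()
print(len(results))  # 5 urls, 3 results each
```

One caveat with a shared pool: a task that blocks on futures submitted to the same pool can deadlock if every worker is already occupied by such blocking tasks, so keep the pool comfortably larger than the number of outer tasks in flight.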
I am able to generate and stream text on the fly, but I am unable to generate and stream a compressed file on the fly.
from flask import Flask, request, Response, stream_with_context
import zlib
import gzip

app = Flask(__name__)

def generate_text():
    for x in range(10000):
        yield f"this is my line: {x}\n".encode()

@app.route('/stream_text')
def stream_text():
    response = Response(stream_with_context(generate_text()))
    return response

def generate_zip():
    for x in range(10000):
        yield zlib.compress(f"this is my line: {x}\n".encode())

@app.route('/stream_zip')
def stream_zip():
    response = Response(stream_with_context(generate_zip()), mimetype='application/zip')
    response.headers['Content-Disposition'] = 'attachment; filename=data.gz'
    return response

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8000, debug=True)
Then, using curl and gunzip:
curl http://127.0.0.1:8000/stream_zip > data.gz
gunzip data.gz
gunzip: data.gz: not in gzip format
I don't care if it is zip, gzip, or any other type of compression.
generate_text in my real code generates over 4 GB of data, so I would like to compress on the fly.
Saving the text to a file, zipping it, returning the zip file, and then deleting it is not the solution I'm after.
I need to be in a loop: generate some text -> compress that text -> stream the compressed data, until I'm done.
zip/gzip ... anything is fine as long as it works.
You are yielding a series of separate compressed documents, not a single compressed stream. Don't use zlib.compress(); it includes a header and forms a self-contained document.
You need to create a zlib.compressobj() object instead, and use the Compress.compress() method on that object to produce a stream of data (followed by a final call to Compress.flush()):
def generate_zip():
    compressor = zlib.compressobj()
    for x in range(10000):
        chunk = compressor.compress(f"this is my line: {x}\n".encode())
        if chunk:
            yield chunk
    yield compressor.flush()
The compressor can produce empty blocks when there is not yet enough data to produce a full compressed-data chunk; the above only yields when there is actually something to send. Because your input data is so highly repetitive, it compresses very efficiently, and this yields only 3 times: once with the 2-byte header, once with about 21kb of compressed data covering the first 8288 iterations over range(), and finally with the remaining 4kb for the rest of the loop.
In aggregate, this produces the same data as a single zlib.compress() call with all the inputs concatenated. The correct MIME type for this data format is application/zlib, not application/zip.
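That aggregate claim is easy to verify offline: joining the chunks from a compressobj gives a stream that zlib.decompress() restores to the concatenated input. A small self-contained check:

```python
import zlib

lines = [f"this is my line: {x}\n".encode() for x in range(10000)]

compressor = zlib.compressobj()
chunks = [compressor.compress(line) for line in lines]
chunks.append(compressor.flush())
stream = b"".join(chunks)

assert zlib.decompress(stream) == b"".join(lines)
```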
This format is not readily decompressible with gunzip, however, not without some trickery. That's because the above doesn't yet produce a GZIP file; it just produces a raw zlib-compressed stream. To make it GZIP compatible, you need to configure the compression correctly, send a gzip header first, and add a CRC checksum and a data-length value at the end:
import zlib
import struct
import time

def generate_gzip():
    # Yield a gzip file header first.
    yield bytes([
        0x1F, 0x8B, 0x08, 0x00,  # Gzip file, deflate, no filename
        *struct.pack('<L', int(time.time())),  # compression start time
        0x02, 0xFF,  # maximum compression, no OS specified
    ])
    # bookkeeping: the compression state, running CRC and total length
    compressor = zlib.compressobj(
        9, zlib.DEFLATED, -zlib.MAX_WBITS, zlib.DEF_MEM_LEVEL, 0)
    crc = zlib.crc32(b"")
    length = 0
    for x in range(10000):
        data = f"this is my line: {x}\n".encode()
        chunk = compressor.compress(data)
        if chunk:
            yield chunk
        crc = zlib.crc32(data, crc) & 0xFFFFFFFF
        length += len(data)
    # Finishing off, send the remainder of the compressed data, and the CRC and length
    yield compressor.flush()
    yield struct.pack("<2L", crc, length & 0xFFFFFFFF)
Serve this as application/gzip:
@app.route('/stream_gzip')
def stream_gzip():
    response = Response(stream_with_context(generate_gzip()), mimetype='application/gzip')
    response.headers['Content-Disposition'] = 'attachment; filename=data.gz'
    return response
and the result can be decompressed on the fly:
curl http://127.0.0.1:8000/stream_gzip | gunzip -c | less
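The hand-built header and trailer can also be sanity-checked offline: everything generate_gzip() yields, joined together, should decompress with the standard gzip module. A self-contained restatement, parameterized over the input lines purely for testability:

```python
import gzip
import struct
import time
import zlib

def generate_gzip(lines):
    # gzip header: magic, deflate, no flags, mtime, XFL=2, OS=unknown
    yield bytes([0x1F, 0x8B, 0x08, 0x00,
                 *struct.pack('<L', int(time.time())),
                 0x02, 0xFF])
    # raw deflate stream (negative wbits = no zlib wrapper)
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS,
                                  zlib.DEF_MEM_LEVEL, 0)
    crc, length = zlib.crc32(b""), 0
    for data in lines:
        chunk = compressor.compress(data)
        if chunk:
            yield chunk
        crc = zlib.crc32(data, crc) & 0xFFFFFFFF
        length += len(data)
    # trailer: CRC32 and uncompressed size, both little-endian
    yield compressor.flush()
    yield struct.pack("<2L", crc, length & 0xFFFFFFFF)

lines = [f"this is my line: {x}\n".encode() for x in range(100)]
assert gzip.decompress(b"".join(generate_gzip(lines))) == b"".join(lines)
```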
While I was extremely impressed by Martijn's solution, I decided to roll my own, using pigz for better performance:
def yield_pigz(results, compresslevel=1):
    cmd = ['pigz', '-%d' % compresslevel]
    pigz_proc = subprocess.Popen(cmd, bufsize=0,
                                 stdin=subprocess.PIPE, stdout=subprocess.PIPE)

    def f():
        for result in results:
            pigz_proc.stdin.write(result)
            pigz_proc.stdin.flush()
        pigz_proc.stdin.close()

    try:
        t = threading.Thread(target=f)
        t.start()
        while True:
            buf = pigz_proc.stdout.read(4096)
            if len(buf) == 0:
                break
            yield buf
    finally:
        t.join()
        pigz_proc.wait()
Keep in mind that you'll need to import subprocess and threading for this to work. You will also need to install the pigz program (it is already in the repositories of most Linux distributions; on Ubuntu, just use sudo apt install -y pigz).
Example usage:
from flask import Flask, Response
import subprocess
import threading
import random

app = Flask(__name__)

def yield_something_random():
    for i in range(10000):
        seq = [chr(random.randint(ord('A'), ord('Z'))) for c in range(1000)]
        yield ''.join(seq).encode()  # pigz's stdin expects bytes, not str

@app.route('/')
def index():
    return Response(yield_pigz(yield_something_random()))
I think that currently you are just sending the generator object instead of the data!
You may want to do something like this (I haven't tested it, so it may need some changes):
def generate_zip():
    import io
    buff = io.BytesIO()
    with gzip.GzipFile(fileobj=buff, mode='w') as gfile:
        for x in range(10000):
            gfile.write(f"this is my line: {x}\n".encode())
    return buff.getvalue()
Working generate_zip() with low memory consumption :) (the buffer's position sits at its end after each write, so it must be rewound with seek(0) before truncating, or nothing is ever streamed out):
import io

def generate_zip():
    buff = io.BytesIO()
    gz = gzip.GzipFile(mode='w', fileobj=buff)
    for x in range(10000):
        gz.write(f"this is my line: {x}\n".encode())
        chunk = buff.getvalue()
        if chunk:
            yield chunk
            buff.seek(0)
            buff.truncate()
    gz.close()
    yield buff.getvalue()
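A quick offline check of this chunked approach; the generator is restated here with the input lines as a parameter so the snippet is self-contained (note the seek(0) before truncate(), which the buffer needs because its position sits at the end after each write):

```python
import gzip
import io

def generate_zip(lines):
    buff = io.BytesIO()
    gz = gzip.GzipFile(mode='w', fileobj=buff)
    for line in lines:
        gz.write(line)
        chunk = buff.getvalue()
        if chunk:
            yield chunk       # hand out whatever has been compressed so far
            buff.seek(0)
            buff.truncate()   # and drop it, keeping the buffer small
    gz.close()                # flushes remaining data and writes the gzip trailer
    yield buff.getvalue()

lines = [f"this is my line: {x}\n".encode() for x in range(10000)]
stream = b"".join(generate_zip(lines))
assert gzip.decompress(stream) == b"".join(lines)
```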
I am using python requests to obtain a file's source code, and then parse a string from the source. The string I am trying to parse is magic: 8susjdhdyrhsisj3864jsud (not always the same string). If I print the source to the screen, it shows up just fine. But when I parse the string, sometimes I get a result and other times I get nothing. Please see the following screenshots: http://i.imgur.com/NW1zFZK.png, http://i.imgur.com/cb9e2cb.png. The string I want always appears in the source, so it must be a regex issue? I've tried both findall and search, but both methods give me the same outcome: results sometimes, and other times nothing. What seems to be my issue?
import re
import requests

class Solvemedia():
    def __init__(self, key):
        self.key = key

    def timestamp(self, source):
        timestamp_regex = re.compile(ur'chalstamp:\s+(\d+),')
        print re.findall(timestamp_regex, source)

    def magic(self, source):
        magic_regex = re.compile(ur'magic:\s+\'(\w+)\',')
        print re.findall(magic_regex, source)

    def source(self):
        solvemedia = requests.Session()
        solvemedia.headers.update({
            'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
        })
        source = solvemedia.get('http://api.solvemedia.com/papi/challenge.script?k={}'.format(self.key)).text
        return source

    def test(self):
        js_source = self.source()
        print js_source
        self.magic(js_source)
        self.timestamp(js_source)

solvemedia = Solvemedia('HUaZ-6d2wtQT3-LkLVDPJB5C.E99j9ZK')
solvemedia.test()
There is a . in one of the values, but \w doesn't match dots. Compare:
magic: 'AZJEXYx.ZsExcTHvjH9mwQ',
// ^
with:
magic: 'xfF9i4YBAQP1EgoNhgEBAw',
A better bet is to allow all characters except a quote:
magic_regex = re.compile(ur"magic:\s+'([^']+)',")
Demo:
>>> import re
>>> samples = [
...     u"magic: 'xfF9i4YBAQP1EgoNhgEBAw',",
...     u"magic: 'AZJEXYx.ZsExcTHvjH9mwQ',",
... ]
>>> magic_regex = re.compile(ur"magic:\s+'([^']+)',")
>>> for sample in samples:
...     print magic_regex.search(sample).group(1)
...
xfF9i4YBAQP1EgoNhgEBAw
AZJEXYx.ZsExcTHvjH9mwQ
I was wondering how I would go about checking HTTP headers to determine whether a request is valid or malformed. How can I do this in Python, and more specifically, on GAE?
For debugging and viewing the request together with its headers, I use the following DDTHandler class.
import cgi
import wsgiref.handlers
import webapp2
class DDTHandler(webapp2.RequestHandler):
    def __start_display(self):
        self.response.out.write("<!--\n")

    def __end_display(self):
        self.response.out.write("-->\n")

    def __show_dictionary_items(self, dictionary, title):
        if (len(dictionary) > 0):
            request = self.request
            out = self.response.out
            out.write("\n" + title + ":\n")
            for key, value in dictionary.iteritems():
                out.write(key + " = " + value + "\n")

    def __show_request_members(self):
        request = self.request
        out = self.response.out
        out.write(request.url + "\n")
        out.write("Query = " + request.query_string + "\n")
        out.write("Remote = " + request.remote_addr + "\n")
        out.write("Path = " + request.path + "\n\n")
        out.write("Request payload:\n")
        if (len(request.arguments()) > 0):
            for argument in request.arguments():
                value = cgi.escape(request.get(argument))
                out.write(argument + " = " + value + "\n")
        else:
            out.write("Empty\n")
        self.__show_dictionary_items(request.headers, "Headers")
        self.__show_dictionary_items(request.cookies, "Cookies")

    def view_request(self):
        self.__start_display()
        self.__show_request_members()
        self.__end_display()

    def view(self, aString):
        self.__start_display()
        self.response.out.write(aString + "\n")
        self.__end_display()
Example:
class RootPage(DDTHandler):
    def get(self):
        self.view_request()
This will output the request, including the headers.
So check the output and grab what you need. Though, as said, a malformed ("invalid") request will probably never reach your app.
<!--
http://localhost:8081/
Query =
Remote = 127.0.0.1
Path = /
Request payload:
Empty
Headers:
Referer = http://localhost:8081/_ah/login?continue=http%3A//localhost%3A8081/
Accept-Charset = ISO-8859-7,utf-8;q=0.7,*;q=0.3
Cookie = hl=en_US; dev_appserver_login="test@example.com:False:185804764220139124118"
User-Agent = Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.52 Safari/537.17
Host = localhost:8081
Accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language = en-US,en;q=0.8,el;q=0.6
Cookies:
dev_appserver_login = test@example.com:False:185804764220139124118
hl = en_US
-->