I am trying to extract the HttpOnly, Secure, domain, and path attributes from a cookie in Python. How do I do that?
import requests

target_url = "https://www.google.com/"
try:
    response1 = requests.get(target_url)
    if response1.status_code == 200:
        response2 = response1.headers['Set-Cookie']
        print(response2)
except Exception as e:
    print(str(e))

targets = response2.split('; ')
for target in targets:
    print(target)
Result
1P_JAR=2020-01-26-18; expires=Tue, 25-Feb-2020 18:45:29 GMT; path=/; domain=.google.com; Secure, NID=196=vCD6Y6ltvTmjf_VRFN9SUuqEN7OEJKjEoJg4XhiBc8Xivdez5boKQ8QzcCYung7EKe58kso1333yCrqq_Wq2QXwCZPAIrwHbo1lITA8lvqRtJERF-S6t9mMVEOg_o_Jpne5oRL3vwn8ReeV8f3Exx6ScJipPsm9MlXXir1fisho; expires=Mon, 27-Jul-2020 18:45:29 GMT; path=/; domain=.google.com; HttpOnly
1P_JAR=2020-01-26-18
expires=Tue, 25-Feb-2020 18:45:29 GMT
path=/
domain=.google.com
Secure, NID=196=vCD6Y6ltvTmjf_VRFN9SUuqEN7OEJKjEoJg4XhiBc8Xivdez5boKQ8QzcCYung7EKe58kso1333yCrqq_Wq2QXwCZPAIrwHbo1lITA8lvqRtJERF-S6t9mMVEOg_o_Jpne5oRL3vwn8ReeV8f3Exx6ScJipPsm9MlXXir1fisho
expires=Mon, 27-Jul-2020 18:45:29 GMT
path=/
domain=.google.com
HttpOnly
A little string manipulation should do the trick:
targets = response2.split('; ')
for target in targets:
    print(target)
Output:
1P_JAR=2020-01-26-17
expires=Tue, 25-Feb-2020 17:36:40 GMT
path=/
domain=.google.com
Secure, NID=196=dXGexcdgLL0Ndy85DQj-Yg5aySfe_th__wZRtmnu2V2alQQdl807dMDLSTeEKb2CEfGpV17fIej7uXIp6w5Nb0Npab4nrf38fQwi480iYF8DYxa-ggSN-PTXVXeGvrwKRnmDYWmfYynSvpD-C9UUiXI59baq1dsdDtwsIL-zzq0
expires=Mon, 27-Jul-2020 17:36:40 GMT
path=/
domain=.google.com
HttpOnly
To get only, for example, the "domain" attribute, use:
print(targets[3])  # relies on the attribute order shown above
If you don't know the order of cookies, you can try a dictionary:
cookies = dict()
for target in targets:
    if '=' in target:
        # split only on the first '=' so values that themselves contain '=' stay intact
        key, value = target.split('=', 1)
        cookies[key] = value
    else:
        cookies[target] = target
cookies.get('domain')
Output:
.google.com
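Manual splitting works, but the standard library can parse Set-Cookie values directly. Here is a sketch using http.cookies.SimpleCookie on a single cookie taken from the output above (note that requests folds multiple Set-Cookie headers into one comma-joined string, which confuses most parsers, so this handles one cookie at a time):

```python
from http.cookies import SimpleCookie

# One cookie taken from the Set-Cookie output above
raw = ("1P_JAR=2020-01-26-18; expires=Tue, 25-Feb-2020 18:45:29 GMT; "
       "path=/; domain=.google.com; Secure")

cookie = SimpleCookie()
cookie.load(raw)

morsel = cookie["1P_JAR"]   # a Morsel holds the value plus its attributes
print(morsel.value)         # 2020-01-26-18
print(morsel["domain"])     # .google.com
print(morsel["path"])       # /
print(morsel["secure"])     # truthy when the Secure flag is present
```

Flag attributes such as Secure and HttpOnly are exposed as truthy entries on the Morsel rather than as name=value pairs.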
Related
I am using requests and I need to extract a certain value from the response's Set-Cookie header. I can't use r.cookies because that doesn't include expiration, path, domain, etc., and I need those values.
When I do
test = r.headers['set-cookie']
print(test)
I get a response as so:
'cookie1 = cookie1value; expires=datehere; path=/; domain=domainhere, cookie2 = cookie2value; expires=datehere; path=/; domain=domainhere,cookie3 = cookie3value; Domain=.domain.com; Path=/; Expires=Wed, 04 Nov 2020 19:44:17 GMT; Max-Age=31536000; Secure'
I need to extract the value of cookie3 with all of its tags.
You could use re:
import re

test = 'cookie1 = cookie1value; expires=datehere; path=/; domain=domainhere, cookie2 = cookie2value; expires=datehere; path=/; domain=domainhere,cookie3 = cookie3value; expires=datehere; path=/; domain=domainhere,cookie4 = cookie4value; expires=datehere; path=/; domain=domainhere'
# non-greedy (.*?) with a lookahead stops at the next "name =" pair,
# so cookie4's attributes aren't swallowed by a greedy .*
p = re.compile(r'cookie3 = (.*?)(?=,\s*\w+ =|$)')
print(p.findall(test)[0])
I have many cookie strings that I get from an HTTP response and save in a set. For example:
cookies = set()
cookies.add("__cfduid=123456789101112131415116; expires=Thu, 27-Aug-20 10:10:10 GMT; path=/; domain=.example.com; HttpOnly; Secure")
cookies.add("MUID=16151413121110987654321; domain=.bing.com; expires=Mon, 21-Sep-2020 10:10:11 GMT; path=/;, MUIDB=478534957198492834; path=/; httponly; expires=Mon, 21-Sep-2020 10:10:11 GMT")
Now I would like to parse those strings into an array or something else, to access the data (domain, expires, ...) more easily. For example:
cookie['MUID']['value']
cookie['MUID']['domain']
cookie['MUIDB']['path']
cookie['__cfduid']['Secure']
...
But I don't know how to do this. I tried SimpleCookie from http.cookies, but I did not get the expected result.
You can build a Python dictionary for this.
from collections import defaultdict

cookies = defaultdict(str)
# this is your list of cookie strings
list_of_strings = ["__cfduid=123456789101112131415116; expires=Thu, 27-Aug-20 10:10:10 GMT; path=/; domain=.example.com; HttpOnly; Secure"]
for string in list_of_strings:
    parts = string.split(";")
    for part in parts:
        # strip surrounding whitespace and split only on the first '='
        temp = part.strip().split("=", 1)
        if len(temp) == 2:
            cookies[temp[0]] = temp[1]
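Since the goal is cookie['MUID']['domain']-style access, the flat dictionary above can be extended to one dict per cookie. A sketch (parse_set_cookie is a made-up helper name, not a library function); it splits the folded header on commas that start a new name=value pair, so the commas inside expires dates survive:

```python
import re

def parse_set_cookie(raw):
    """Parse a (possibly comma-folded) Set-Cookie string into nested dicts.
    A sketch, not a full RFC 6265 parser: quoted values are not handled."""
    cookies = {}
    # Split on commas that start a new "name=value" cookie, but not on the
    # commas inside "expires=Mon, 21-Sep-2020 ..." dates.
    for chunk in re.split(r',(?=\s*\w+=)', raw):
        parts = [p.strip() for p in chunk.split(';') if p.strip()]
        name, _, value = parts[0].partition('=')
        entry = {'value': value}
        for attr in parts[1:]:
            key, sep, val = attr.partition('=')
            # flag attributes like HttpOnly/Secure have no value; store True
            entry[key.strip().lower()] = val.strip() if sep else True
        cookies[name.strip()] = entry
    return cookies

c = parse_set_cookie("MUID=16151413121110987654321; domain=.bing.com; "
                     "expires=Mon, 21-Sep-2020 10:10:11 GMT; path=/;, "
                     "MUIDB=478534957198492834; path=/; httponly; "
                     "expires=Mon, 21-Sep-2020 10:10:11 GMT")
print(c['MUID']['domain'])   # .bing.com
print(c['MUIDB']['path'])    # /
```

Attribute names are lowercased on the way in, so lookups don't depend on whether the server sent "Domain" or "domain".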
I'm working on a Python script that grabs item prices from the Steam marketplace.
My problem is that if I let it run for too long, it gets an HTTP 429 error.
I want to avoid this, but the Retry-After header is not present in the server's response.
Here's a sample of the response headers
('Server', 'nginx')
('Content-Type', 'application/json; charset=utf-8')
('X-Frame-Options', 'DENY')
('Expires', 'Mon, 26 Jul 1997 05:00:00 GMT')
('Cache-Control', 'no-cache')
('Vary', 'Accept-Encoding')
('Date', 'Wed, 08 May 2019 03:58:30 GMT')
('Content-Length', '6428')
('Connection', 'close')
('Set-Cookie', 'sessionid=14360f3a5309bb1531932884; path=/; secure')
('Set-Cookie', 'steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure')
EDIT: here's the code and sample output.
Note that nothing inside the try statement runs in this example.
def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url + card + " " + game
    url = url.replace(" ", "+")
    print(url)
    try:
        data = urllib.request.urlopen(url)
        h = data.getheaders()
        for item in h:
            print(item)
        #print(data.getheaders())
        #k = data.headers.keys()
        json_data = json.loads(data.read())
        pprint.pprint(json_data)
    except Exception as e:
        print(e.headers)
    return 0
sample output on 3 different calls:
https://steamcommunity.com/market/search/render/?query=Glub+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=5d1ea46f5095d9c28e141dd5; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Qaahl+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=64e7956224b18e6d89cc45c0; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Odshan+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:50 GMT
Connection: close
Set-Cookie: sessionid=a7acd1023b4544809914dc6e; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
Try this.
def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url + card + " " + game
    url = url.replace(" ", "+")
    print(url)
    while True:
        try:
            data = urllib.request.urlopen(url)
            for item in data.getheaders():
                print(item)
            json_data = json.loads(data.read())
            pprint.pprint(json_data)
            return json_data
        except Exception:
            import time
            time.sleep(10)  # time.sleep() takes seconds, not milliseconds
Adjust the delay to whatever the server tolerates. Note that time.sleep() expects seconds, so sleep(10000) would pause for almost three hours, and the loop needs a return (or break) after a successful request or it will never end.
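A fixed sleep works, but for HTTP 429 the usual pattern is exponential backoff. Here is a generic sketch (retry_with_backoff is a made-up helper; fn stands in for the urlopen call above):

```python
import time

def retry_with_backoff(fn, retries=5, base_delay=1.0):
    """Call fn() and retry on failure, doubling the delay each attempt."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

The body of getPrice could then be wrapped as retry_with_backoff(lambda: urllib.request.urlopen(url)), which backs off progressively instead of hammering the server at a fixed interval.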
I have a problem with a response from a web app.
smth = requests.get('httpxxx')
then I get:
requests.packages.urllib3.exceptions.HeaderParsingError:
[MissingHeaderBodySeparatorDefect()], unparsed data: 'Compression
Index: 1\r\nContent-Type: text/plain;charset=UTF-8\r\nContent-Length:
291\r\nDate: Wed, 01 Jun 2016 15:03:05 GMT\r\n\r\n'
but some headers are parsable:
smth.headers
{'Cache-Control': 'no-cache', 'Server': 'Apache-Coyote/1.1', 'Pragma': 'no-cache', 'Last-Modified': 'Wed, 01 Jun 2016 15:24:58 GMT'}
When the content length is more than 1000, the response is compressed, which breaks the header parsing.
This is Requests 1.1.0 and Python 2.6.4 (the same behavior occurs on Python 2.7.2).
>>> import requests
>>> response = requests.get('http://www.google.com')
>>> response.status_code
200
>>> print response.headers.get('status')
None
According to the docs, there should be a headers['status'] entry with a string like "200 OK".
Here is the full contents of the headers dict:
>>> response.headers
{'x-xss-protection': '1; mode=block', 'transfer-encoding': 'chunked', 'set-cookie': 'PREF=ID=74b29ee465454efd:FF=0:TM=1362094463:LM=1362094463:S=Xa96iJQX_9BrC-Vm; expires=Sat, 28-Feb-2015 23:34:23 GMT; path=/; domain=.google.com, NID=67=IH21bLPTK2gLTHCyDCMEs3oN5g1uMV99U4Wsc2YA00AbFt4fQCoywQNEQU0pR6VuaNhhQGFCsqdr0FnWbPcym-pizo0xVuS6WBJ9EOTeSFARpzrsiHh6HNnaQeCnxCSH; expires=Fri, 30-Aug-2013 23:34:23 GMT; path=/; domain=.google.com; HttpOnly', 'expires': '-1', 'server': 'gws', 'cache-control': 'private, max-age=0', 'date': 'Thu, 28 Feb 2013 23:34:23 GMT', 'p3p': 'CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."', 'content-type': 'text/html; charset=ISO-8859-1', 'x-frame-options': 'SAMEORIGIN'}
Here is where I got the idea that this dict should contain a 'status' entry.
Am I doing something wrong?
You're looking for the "reason" attribute:
>>> x=requests.get("http://apple.adam.gs")
>>> x.reason
'OK'
>>>
custom.php contains:
header("HTTP/1.1 200 Testing");
Results in:
>>> x=requests.get("http://apple.adam.gs/custom.php")
>>> print x.reason
Testing
>>>
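If you want the "200 OK"-style string the old docs describe, it can be rebuilt from the two attributes requests does expose (plain values stand in for a live response here):

```python
# In modern requests: response.status_code is an int, response.reason a str.
status_code, reason = 200, "OK"   # e.g. response.status_code, response.reason
status_line = "%s %s" % (status_code, reason)
print(status_line)  # 200 OK
```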