Parse cookie string to array or something else in Python

I have many cookie strings that I get from HTTP responses and save in a set, for example:
cookies = set()
cookies.add("__cfduid=123456789101112131415116; expires=Thu, 27-Aug-20 10:10:10 GMT; path=/; domain=.example.com; HttpOnly; Secure")
cookies.add("MUID=16151413121110987654321; domain=.bing.com; expires=Mon, 21-Sep-2020 10:10:11 GMT; path=/;, MUIDB=478534957198492834; path=/; httponly; expires=Mon, 21-Sep-2020 10:10:11 GMT")
Now I would like to parse these strings into an array or something else so I can access the data (domain, expires, ...) more easily, for example:
cookie['MUID']['value']
cookie['MUID']['domain']
cookie['MUIDB']['path']
cookie['__cfduid']['Secure']
...
But I don't know how to do this. I tried SimpleCookie from http.cookies, but I don't get the expected result.

You should create a python dictionary for this.
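from collections import defaultdict
cookies = defaultdict(str)
list_of_strings = ["__cfduid=123456789101112131415116; expires=Thu, 27-Aug-20 10:10:10 GMT; path=/; domain=.example.com; HttpOnly; Secure"]  # this is your list of strings you want to add
for string in list_of_strings:
    parts = string.split(";")
    for part in parts:
        # strip the leading space and split on the first "=" only,
        # so values that themselves contain "=" stay intact
        temp = part.strip().split("=", 1)
        if len(temp) == 2:
            cookies[temp[0]] = temp[1]
Note that this puts every attribute into one flat dictionary, so the attributes of a second cookie overwrite those of the first, and flags without a value (HttpOnly, Secure) are skipped.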
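If you want the nested shape from the question (cookie['MUID']['domain']), the same idea can be extended to one dictionary per cookie. A minimal sketch, assuming that a comma followed by a new name= pair separates two cookies (as in the MUID/MUIDB string), so the commas inside expires dates are left alone. Attribute names are lower-cased (so the question's cookie['__cfduid']['Secure'] becomes cookie['__cfduid']['secure']) and value-less flags such as HttpOnly and Secure are stored as True. parse_set_cookie is a hypothetical helper name.

```python
import re

# Assumption: a comma followed by "name=" starts a new cookie. The comma in an
# expires date ("Mon, 21-Sep-2020 ...") is followed by a date, not "name=".
_NEXT_COOKIE = re.compile(r",\s*(?=[^;,=\s]+=)")


def parse_set_cookie(header):
    """Parse one or more Set-Cookie values into {name: {attribute: value}}."""
    cookies = {}
    for chunk in _NEXT_COOKIE.split(header):
        parts = [p.strip() for p in chunk.split(";") if p.strip()]
        if not parts or "=" not in parts[0]:
            continue  # no "name=value" pair, nothing to record
        name, value = parts[0].split("=", 1)
        cookie = {"value": value.strip()}
        for attr in parts[1:]:
            if "=" in attr:
                key, val = attr.split("=", 1)
                cookie[key.strip().lower()] = val.strip()
            else:
                # flags such as HttpOnly / Secure carry no value
                cookie[attr.lower()] = True
        cookies[name.strip()] = cookie
    return cookies


# Usage with the set from the question: merge every parsed string into one dict.
raw_cookies = {
    "__cfduid=123456789101112131415116; expires=Thu, 27-Aug-20 10:10:10 GMT; path=/; domain=.example.com; HttpOnly; Secure",
    "MUID=16151413121110987654321; domain=.bing.com; expires=Mon, 21-Sep-2020 10:10:11 GMT; path=/;, MUIDB=478534957198492834; path=/; httponly; expires=Mon, 21-Sep-2020 10:10:11 GMT",
}
cookie = {}
for raw in raw_cookies:
    cookie.update(parse_set_cookie(raw))

print(cookie["MUID"]["domain"])      # .bing.com
print(cookie["MUIDB"]["path"])       # /
print(cookie["__cfduid"]["secure"])  # True
```

The comma heuristic is a simplification: a cookie value that itself contains ", name=" would be split incorrectly.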

Related

Extract Httponly, Secure, domain and path from a cookie

I am trying to extract the HttpOnly, Secure, domain and path attributes from a cookie in Python. How do I do that?
import requests
target_url = "https://www.google.com/"
try:
    response1 = requests.get(target_url)
    if response1.status_code == 200:
        response2 = response1.headers['Set-Cookie']
        print(response2)
except Exception as e:
    print(str(e))
targets = response2.split('; ')
for target in targets:
    print(target)
Result:
1P_JAR=2020-01-26-18; expires=Tue, 25-Feb-2020 18:45:29 GMT; path=/; domain=.google.com; Secure, NID=196=vCD6Y6ltvTmjf_VRFN9SUuqEN7OEJKjEoJg4XhiBc8Xivdez5boKQ8QzcCYung7EKe58kso1333yCrqq_Wq2QXwCZPAIrwHbo1lITA8lvqRtJERF-S6t9mMVEOg_o_Jpne5oRL3vwn8ReeV8f3Exx6ScJipPsm9MlXXir1fisho; expires=Mon, 27-Jul-2020 18:45:29 GMT; path=/; domain=.google.com; HttpOnly
1P_JAR=2020-01-26-18
expires=Tue, 25-Feb-2020 18:45:29 GMT
path=/
domain=.google.com
Secure, NID=196=vCD6Y6ltvTmjf_VRFN9SUuqEN7OEJKjEoJg4XhiBc8Xivdez5boKQ8QzcCYung7EKe58kso1333yCrqq_Wq2QXwCZPAIrwHbo1lITA8lvqRtJERF-S6t9mMVEOg_o_Jpne5oRL3vwn8ReeV8f3Exx6ScJipPsm9MlXXir1fisho
expires=Mon, 27-Jul-2020 18:45:29 GMT
path=/
domain=.google.com
HttpOnly
A little string manipulation should do the trick:
targets = response2.split('; ')
for target in targets:
    print(target)
Output:
1P_JAR=2020-01-26-17
expires=Tue, 25-Feb-2020 17:36:40 GMT
path=/
domain=.google.com
Secure, NID=196=dXGexcdgLL0Ndy85DQj-Yg5aySfe_th__wZRtmnu2V2alQQdl807dMDLSTeEKb2CEfGpV17fIej7uXIp6w5Nb0Npab4nrf38fQwi480iYF8DYxa-ggSN-PTXVXeGvrwKRnmDYWmfYynSvpD-C9UUiXI59baq1dsdDtwsIL-zzq0
expires=Mon, 27-Jul-2020 17:36:40 GMT
path=/
domain=.google.com
HttpOnly
To get only one attribute, for example "domain", use:
print(targets[3])
If you don't know the order of cookies, you can try a dictionary:
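cookies = dict()
for target in targets:
    if '=' in target:
        # split on the first "=" only, so values such as "196=..." survive
        key, value = target.split('=', 1)
        cookies[key] = value
    else:
        # flags such as HttpOnly have no value; store the flag under its own name
        cookies[target] = target
print(cookies.get('domain'))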
Output:
.google.com
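The standard library's http.cookies.SimpleCookie already understands the cookie attributes, including the HttpOnly and Secure flags, but in CPython's implementation it silently loads nothing from a string in which several cookies have been joined with commas, which is what requests returns in headers['Set-Cookie']. A minimal sketch, assuming that a comma followed by a new name= pair starts the next cookie: split the header into one string per cookie, load each into a SimpleCookie, and read the attributes from the resulting Morsel. cookie_attributes is a hypothetical helper name.

```python
import re
from http.cookies import SimpleCookie

# Assumption: a comma followed by "name=" starts the next cookie; the comma in
# an expires date such as "Tue, 25-Feb-2020" is followed by a date instead.
_NEXT_COOKIE = re.compile(r",\s*(?=[^;,=\s]+=)")


def cookie_attributes(set_cookie_header):
    """Map each cookie name to its value, domain, path, Secure and HttpOnly."""
    result = {}
    for chunk in _NEXT_COOKIE.split(set_cookie_header):
        jar = SimpleCookie()
        jar.load(chunk)  # one cookie per chunk, so SimpleCookie can parse it
        for name, morsel in jar.items():
            result[name] = {
                "value": morsel.value,
                "domain": morsel["domain"],
                "path": morsel["path"],
                # Morsel flags are True when present and "" when absent
                "secure": bool(morsel["secure"]),
                "httponly": bool(morsel["httponly"]),
            }
    return result
```

With the request from the question this would be used as cookie_attributes(response1.headers['Set-Cookie'])['NID']['httponly']; keeping one entry per cookie also avoids the second cookie's domain and path overwriting the first.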

extracting a certain "set-cookie" value when multiples are returned in response headers

I am using requests and need to extract a certain value from the Set-Cookie response headers. I can't use r.cookies because that doesn't include expiration, path, domain, etc., and I need those values.
When I do
test = r.headers['set-cookie']
print(test)
I get a response like this:
'cookie1 = cookie1value; expires=datehere; path=/; domain=domainhere, cookie2 = cookie2value; expires=datehere; path=/; domain=domainhere,cookie3 = cookie3value; Domain=.domain.com; Path=/; Expires=Wed, 04 Nov 2020 19:44:17 GMT; Max-Age=31536000; Secure'
I need to extract the value of cookie3 with all of its tags.
You could use re:
import re
test = 'cookie1 = cookie1value; expires=datehere; path=/; domain=domainhere, cookie2 = cookie2value; expires=datehere; path=/; domain=domainhere,cookie3 = cookie3value; expires=datehere; path=/; domain=domainhere,cookie4 = cookie4value; expires=datehere; path=/; domain=domainhere'
# lazily match up to the comma that starts the next "name = value" pair, or the end of the string
p = re.compile(r'cookie3 = (.*?)(?=,\s*[^;,=\s]+\s*=|$)')
print(p.findall(test)[0])  # cookie3value; expires=datehere; path=/; domain=domainhere
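The regex works on the already-joined string. If you can read the headers before they are joined, each Set-Cookie line stays separate and no splitting is needed. With urllib, response.headers is an http.client.HTTPMessage, and its get_all('Set-Cookie') returns one string per header; with requests, r.raw.headers.getlist('Set-Cookie') on the underlying urllib3 response is commonly used for the same purpose, though that call is not exercised here. The sketch below parses a hypothetical raw header block with email.message_from_bytes (HTTPMessage is a subclass of the same email Message class) so it runs offline; set_cookie_headers and find_cookie are hypothetical helper names.

```python
import email


def set_cookie_headers(raw_header_block):
    """Return each Set-Cookie header from a raw HTTP header block, in order."""
    message = email.message_from_bytes(raw_header_block)
    # get_all keeps repeated headers separate instead of joining them with ", "
    return message.get_all("Set-Cookie") or []


def find_cookie(raw_header_block, name):
    """Return the full Set-Cookie value (with all attributes) for one cookie."""
    for header in set_cookie_headers(raw_header_block):
        if header.split("=", 1)[0].strip() == name:
            return header
    return None


# Hypothetical raw header block modelled on the question's cookies.
raw = (
    b"Set-Cookie: cookie1=cookie1value; expires=datehere; path=/; domain=domainhere\r\n"
    b"Set-Cookie: cookie3=cookie3value; Domain=.domain.com; Path=/; "
    b"Expires=Wed, 04 Nov 2020 19:44:17 GMT; Max-Age=31536000; Secure\r\n"
    b"\r\n"
)
print(find_cookie(raw, "cookie3"))
```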

What is the 'retry-after' timer on steam's marketplace?

I'm working on a python script that grabs the prices of items from the steam marketplace.
My problem is that if I let it run for too long, it gets an HTTP 429 error.
I want to avoid this, but the Retry-After header is not present in the server's response.
Here's a sample of the response headers:
('Server', 'nginx')
('Content-Type', 'application/json; charset=utf-8')
('X-Frame-Options', 'DENY')
('Expires', 'Mon, 26 Jul 1997 05:00:00 GMT')
('Cache-Control', 'no-cache')
('Vary', 'Accept-Encoding')
('Date', 'Wed, 08 May 2019 03:58:30 GMT')
('Content-Length', '6428')
('Connection', 'close')
('Set-Cookie', 'sessionid=14360f3a5309bb1531932884; path=/; secure')
('Set-Cookie', 'steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure')
EDIT: here's the code and sample output.
Note that nothing inside the try statement will be run for this example.
import json
import pprint
import urllib.request

def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url+card+" "+game
    url = url.replace(" ", "+")
    print(url)
    try:
        data = urllib.request.urlopen(url)
        h = data.getheaders()
        for item in h:
            print(item)
        #print(data.getheaders())
        #k = data.headers.keys()
        json_data = json.loads(data.read())
        pprint.pprint(json_data)
    except Exception as e:
        print(e.headers)
    return 0
Sample output from three different calls:
https://steamcommunity.com/market/search/render/?query=Glub+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=5d1ea46f5095d9c28e141dd5; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Qaahl+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:49 GMT
Connection: close
Set-Cookie: sessionid=64e7956224b18e6d89cc45c0; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
https://steamcommunity.com/market/search/render/?query=Odshan+Crawl
Server: nginx
Content-Type: application/json; charset=utf-8
X-Frame-Options: DENY
Expires: Mon, 26 Jul 1997 05:00:00 GMT
Cache-Control: no-cache
Content-Encoding: gzip
Vary: Accept-Encoding
Content-Length: 24
Date: Wed, 08 May 2019 04:24:50 GMT
Connection: close
Set-Cookie: sessionid=a7acd1023b4544809914dc6e; path=/; secure
Set-Cookie: steamCountry=CA%7C2020e87b713c54ddc925e4c38b0bf705; path=/; secure
Try this.
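import json
import pprint
import time
import urllib.error
import urllib.request

def getPrice(card, game):
    url = 'https://steamcommunity.com/market/search/render/?query='
    url = url+card+" "+game
    url = url.replace(" ", "+")
    print(url)
    while True:
        try:
            data = urllib.request.urlopen(url)
            h = data.getheaders()
            for item in h:
                print(item)
            json_data = json.loads(data.read())
            pprint.pprint(json_data)
            return json_data  # leave the loop once a request succeeds
        except urllib.error.HTTPError as e:
            if e.code != 429:
                raise  # only retry on 429 Too Many Requests
            seconds = 10
            time.sleep(seconds)  # time.sleep() takes seconds, not milliseconds
Use your own delay value; note that time.sleep() expects seconds, so a value like 10000 would pause for almost three hours.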
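Since Steam's response did not include Retry-After, the client has to choose the delay itself. A minimal sketch of exponential backoff, a common alternative to a fixed sleep: retry on HTTP 429, honour a Retry-After header given in seconds if a server does send one (the HTTP-date form of Retry-After is not handled here), and otherwise double the wait on each attempt. fetch_with_backoff, max_retries and base_delay are hypothetical names and defaults; opener and sleep are parameters only so the function can be exercised without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, max_retries=5, base_delay=1.0,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the body of url, retrying on HTTP 429 (Too Many Requests).

    Waits Retry-After seconds when the server sends that header as a number;
    otherwise waits base_delay * 2**attempt seconds (1, 2, 4, ... by default).
    """
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429 or attempt == max_retries:
                raise  # not a rate limit, or out of retries
            retry_after = err.headers.get("Retry-After")
            if retry_after is not None and retry_after.strip().isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

In getPrice, json.loads(fetch_with_backoff(url)) could then replace the while loop; pausing briefly between items also makes hitting the limit less likely in the first place.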

Get a header with Python and convert in JSON (requests - urllib2 - json)

I’m trying to get the headers from a website and encode them as JSON to write them to a file.
I’ve tried two different ways without success.
FIRST with urllib2 and json
import urllib2
import json
host = ("https://www.python.org/")
header = urllib2.urlopen(host).info()
json_header = json.dumps(header)
print json_header
This way I get the error:
TypeError: is not JSON serializable
So I tried to work around this by converting the object to a string: json_header = str(header)
This way json.dumps() no longer fails, but the output is weird:
"Date: Wed, 02 Jul 2014 13:33:37 GMT\r\nServer: nginx\r\nContent-Type: text/html; charset=utf-8\r\nX-Frame-Options: SAMEORIGIN\r\nContent-Length: 45682\r\nAccept-Ranges: bytes\r\nVia: 1.1 varnish\r\nAge: 1263\r\nX-Served-By: cache-fra1220-FRA\r\nX-Cache: HIT\r\nX-Cache-Hits: 2\r\nVary: Cookie\r\nStrict-Transport-Security: max-age=63072000; includeSubDomains\r\nConnection: close\r\n"
SECOND with requests
import requests
r = requests.get("https://www.python.org/")
rh = r.headers
print rh
{'content-length': '45682', 'via': '1.1 varnish', 'x-cache': 'HIT', 'accept-ranges': 'bytes', 'strict-transport-security': 'max-age=63072000; includeSubDomains', 'vary': 'Cookie', 'server': 'nginx', 'x-served-by': 'cache-fra1226-FRA', 'x-cache-hits': '14', 'date': 'Wed, 02 Jul 2014 13:39:33 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'text/html; charset=utf-8', 'age': '1619'}
This way the output is more JSON-like but still not OK (note the single quotes ' ' instead of double quotes " ", and other things like the = and ;).
Evidently there’s something (or a lot) I’m not doing in the right way.
I’ve tried to read the documentation of the modules but I can’t understand how to solve this problem.
Thank you for your help.
There are more than a couple of ways to encode headers as JSON, but my first thought would be to convert the headers attribute to an actual dictionary instead of accessing it as a requests.structures.CaseInsensitiveDict.
import requests, json
r = requests.get("https://www.python.org/")
rh = json.dumps(r.headers.__dict__['_store'])
print rh
{'content-length': ('content-length', '45474'), 'via': ('via', '1.1 varnish'), 'x-cache': ('x-cache', 'HIT'), 'accept-ranges': ('accept-ranges', 'bytes'), 'strict-transport-security': ('strict-transport-security', 'max-age=63072000; includeSubDomains'), 'vary': ('vary', 'Cookie'), 'server': ('server', 'nginx'), 'x-served-by': ('x-served-by', 'cache-iad2132-IAD'), 'x-cache-hits': ('x-cache-hits', '1'), 'date': ('date', 'Wed, 02 Jul 2014 14:13:37 GMT'), 'x-frame-options': ('x-frame-options', 'SAMEORIGIN'), 'content-type': ('content-type', 'text/html; charset=utf-8'), 'age': ('age', '1483')}
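Depending on exactly what you want from the headers, you can access specific entries after this; this gives you all the information contained in the headers, if in a slightly different format. Note that _store is a private attribute of CaseInsensitiveDict, so this may break in other versions of requests.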
If you prefer a different format, you can also convert your headers to a dictionary:
import requests, json
r = requests.get("https://www.python.org/")
print json.dumps(dict(r.headers))
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-at50-ATL", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:15 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "951"}
If you are only interested in the headers, make a HEAD request, convert the CaseInsensitiveDict to a dict object, and then convert that to JSON.
import requests
import json
r = requests.head('https://www.python.org/')
rh = dict(r.headers)
json.dumps(rh)
import requests
import json
r = requests.get('https://www.python.org/')
rh = r.headers
print json.dumps( dict(rh) ) # use dict()
Result:
{"content-length": "45682", "via": "1.1 varnish", "x-cache": "HIT", "accept-ranges": "bytes", "strict-transport-security": "max-age=63072000; includeSubDomains", "vary": "Cookie", "server": "nginx", "x-served-by": "cache-fra1224-FRA", "x-cache-hits": "5", "date": "Wed, 02 Jul 2014 14:08:04 GMT", "x-frame-options": "SAMEORIGIN", "content-type": "text/html; charset=utf-8", "age": "3329"}
I know this is an old question, but I stumbled across it when trying to put together a quick and dirty Python curl-esque URL getter. I kept getting an error:
TypeError: Object of type 'CaseInsensitiveDict' is not JSON serializable
The above solutions are good if you need to output a JSON string immediately, but in my case I needed to return a Python dictionary of the headers, and I wanted to normalize the capitalization so that all keys are lowercase.
My solution was to use a dict comprehension:
import requests
response = requests.head('https://www.python.org/')
my_dict = {
    'body': response.text,
    'http_status_code': response.status_code,
    'headers': {k.lower(): v for (k, v) in response.headers.items()}
}
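For the original goal of writing headers to a file, one thing the dict() approaches lose is repeated headers: requests joins them with ", " and a plain dict built from (name, value) pairs keeps only the last one. A minimal stdlib sketch, assuming you have the headers as (name, value) pairs, for example from urllib.request's response.getheaders(): lower-case the names as above and collect repeated headers into a list before calling json.dumps. headers_to_json is a hypothetical helper name.

```python
import json


def headers_to_json(header_pairs, indent=2):
    """Serialize (name, value) header pairs as a JSON object string.

    Names are lower-cased; a header that appears more than once (for example
    Set-Cookie) becomes a list so that no value is lost.
    """
    merged = {}
    for name, value in header_pairs:
        key = name.lower()
        if key not in merged:
            merged[key] = value
        elif isinstance(merged[key], list):
            merged[key].append(value)
        else:
            merged[key] = [merged[key], value]
    return json.dumps(merged, indent=indent)
```

For example, inside with urllib.request.urlopen("https://www.python.org/") as resp:, the string returned by headers_to_json(resp.getheaders()) can be written straight to a .json file.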

Python Requests: response object does not contain 'status' header

This is Requests 1.1.0 and Python 2.6.4 (also same behavior on Python 2.7.2).
>>> import requests
>>> response = requests.get('http://www.google.com')
>>> response.status_code
200
>>> print response.headers.get('status')
None
According to the docs, there should be a headers['status'] entry with a string like "200 OK".
Here is the full contents of the headers dict:
>>> response.headers
{'x-xss-protection': '1; mode=block', 'transfer-encoding': 'chunked', 'set-cookie': 'PREF=ID=74b29ee465454efd:FF=0:TM=1362094463:LM=1362094463:S=Xa96iJQX_9BrC-Vm; expires=Sat, 28-Feb-2015 23:34:23 GMT; path=/; domain=.google.com, NID=67=IH21bLPTK2gLTHCyDCMEs3oN5g1uMV99U4Wsc2YA00AbFt4fQCoywQNEQU0pR6VuaNhhQGFCsqdr0FnWbPcym-pizo0xVuS6WBJ9EOTeSFARpzrsiHh6HNnaQeCnxCSH; expires=Fri, 30-Aug-2013 23:34:23 GMT; path=/; domain=.google.com; HttpOnly', 'expires': '-1', 'server': 'gws', 'cache-control': 'private, max-age=0', 'date': 'Thu, 28 Feb 2013 23:34:23 GMT', 'p3p': 'CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."', 'content-type': 'text/html; charset=ISO-8859-1', 'x-frame-options': 'SAMEORIGIN'}
Here is where I got the idea that this dict should contain a 'status' entry.
Am I doing something wrong?
You're looking for the reason attribute:
>>> x=requests.get("http://apple.adam.gs")
>>> x.reason
'OK'
>>>
custom.php contains:
header("HTTP/1.1 200 Testing");
Results in:
>>> x=requests.get("http://apple.adam.gs/custom.php")
>>> print x.reason
Testing
>>>
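If you want the "200 OK" string that the question expected to find in headers['status'], a small sketch: combine response.status_code with response.reason, and fall back to the standard phrase from http.HTTPStatus when no reason is available. status_line is a hypothetical helper; keep in mind that reason is whatever the server sent, so it can differ from the standard phrase, as the "Testing" example shows.

```python
from http import HTTPStatus


def status_line(status_code, reason=None):
    """Build a "200 OK"-style string from a status code and optional reason.

    reason would typically be requests' response.reason; when it is empty,
    fall back to the standard phrase from http.HTTPStatus (if one exists).
    """
    if not reason:
        try:
            reason = HTTPStatus(status_code).phrase
        except ValueError:
            reason = ""  # non-standard code with no known phrase
    return f"{status_code} {reason}".strip()


print(status_line(200))             # 200 OK
print(status_line(200, "Testing"))  # 200 Testing
```

With requests this would be called as status_line(response.status_code, response.reason).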
