I have a CGI script for which I've successfully set a cookie (which I can see in Firefox/Chrome!) with (say) the name uid and the content 1. I don't seem to understand how to access this cookie from another CGI script, and I'm working in Python 2.4, so a lot of the examples I've found may not apply.
This code prints "can't get uid" followed by the rest of the page:
c = Cookie.SimpleCookie(os.environ.get("HTTP_COOKIE"))
print("Content-Type: text/html")
print c.output()
print("\n\n")
uid = c.get("uid")
#uid = c["uid"].value # this would create an error and page would fail totally
if uid is None:
print("can't get uid")
uid = 1 # set manually to prevent the rest of the page from failing
I haven't done anything fishy with the domain the cookie applies to, so I don't understand why this doesn't grab the uid value. By the way, if I try to print c.output(), it's blank.
First things first: are you sure the webserver or the framework is setting the HTTP_COOKIE environment variable?
Otherwise, in one of your scripts you may want to store the cookies in a cookie-jar file on the file system and read the set cookies back from there.
import os
import cookielib

COOKIEFILE = 'Cookies.lwp'
cookiejar = cookielib.LWPCookieJar()
if os.path.exists(COOKIEFILE):
    cookiejar.load(COOKIEFILE)
# the jar is not a dict: build a Cookie object (placeholder domain) and add it
cookie = cookielib.Cookie(0, 'uid', '1', None, False, 'example.com', False, False,
                          '/', False, False, None, False, None, None, {})
cookiejar.set_cookie(cookie)
cookiejar.save(COOKIEFILE)
Load the same cookie jar in the other script and read uid back from it.
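A minimal sketch of the reading side, under the same assumptions (same file name, cookie named uid):

import cookielib

COOKIEFILE = 'Cookies.lwp'
cookiejar = cookielib.LWPCookieJar()
cookiejar.load(COOKIEFILE)
# iterating the jar yields Cookie objects, not dict entries
uid = None
for cookie in cookiejar:
    if cookie.name == 'uid':
        uid = cookie.value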
Okay, I think I figured this out! I confirmed that os.environ.get("HTTP_COOKIE") was getting something, and then played with the order of the elements in my tiny test until it worked. Then I reproduced that order in my more complicated script. (Specifically: content type declaration, two newlines, get cookie, get value from cookie, everything else.)
The main thing I've learned about Python and CGI is that the order of elements (starting with the content type declaration) is very fussy. Thanks very much for the hints in the right direction.
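For the record, a minimal sketch of that working order, assuming a Python 2.4 CGI script:

import os
import Cookie

# 1. content type declaration, then a blank line to end the headers
print "Content-Type: text/html"
print
# 2. only now read the cookie and pull out the value
c = Cookie.SimpleCookie(os.environ.get("HTTP_COOKIE", ""))
uid = None
if "uid" in c:
    uid = c["uid"].value
# 3. everything else follows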
I'm working with the Python requests module and have run into an issue I can't seem to resolve. I'm relatively new to requests, but even with googling and the documentation I have hit a wall.
I am trying to get a link (or "domain") from a specific cookie that is in the response of a GET request I've made. I can only seem to print the cookie's value, not the domain. Explanation below:
Code (See comments):
import pickle
import requests
import time
r = requests.post('https://example.com/AddToCartURL')
print("----------------------------------------")
cookies = r.cookies # keep the cookie jar from the response
print(r.cookies) # prints all cookies
time.sleep(3)
# Code below will now use cookies and do a GET request to checkout URL
r = requests.get("https://example.com/checkout", cookies=cookies)
print("HEADERS")
time.sleep(1)
print(r.headers)
print("---------")
print("Cookie")
print(r.cookies)
print("---------") # THIS IS WHERE MY ISSUE IS :
print(r.cookies['checkout']) # This prints the cookie's value
print(r.cookies['checkout']['domain'])
Outputs & issues:
So here are my issues:
#1 - The CookieJar cookie is shown like so when I print r.cookies:
<Cookie checkout=r3jk43nb42knj32--fjnk3jk2jkn2323njk for www.example.com/checkout/url>
And when I print(r.cookies['checkout']), I get the cookie's value, obviously:
r3jk43nb42knj32--fjnk3jk2jkn2323njk
Well, what I'm trying to do here, essentially, is get the domain from it, which I try to do as:
print(r.cookies['checkout']['domain'])
getting the response:
Traceback (most recent call last):
File "/Users/user/PycharmProjects/project/Main.py", line 29, in <module>
print(r.cookies['checkout']['domain'])
TypeError: string indices must be integers
That TypeError is something you'll find from a quick Google search. Documentation-wise I wasn't able to find a clear answer, probably because I'm still bad at searching. I tried the obvious approach of using an integer, presuming the error was about indexing; however, that just prints a single character of the cookie's value.
What I'm trying to print, specifically, is example.com/checkout/url from the cookie above. I'm trying to interact with it to continue my code on through the checkout process.
This is an example of the full cookie jar: (called by print(r.cookies))
<RequestsCookieJar[<Cookie random_other_cookie= 3j32fj302fj023jfi for example.com/checkoutURL>, <Cookie checkout=r3jk43nb42knj32--fjnk3jk2jkn2323njk for example.com/checkoutURL>]>
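A sketch of a way to get at those fields, assuming the jar iterates like a cookielib jar (each item is then a full Cookie object, unlike jar['name'], which returns only the value string):

# iterate the jar; Cookie objects carry name, value, domain, and path
for cookie in r.cookies:
    if cookie.name == 'checkout':
        print(cookie.domain)  # e.g. example.com
        print(cookie.path)    # e.g. /checkoutURL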
--------------------------------
#2 - Am I tackling this the wrong way?
A little more background: the GET request above gives a response of 301 Moved Permanently. I am 99% sure that the URL I end up at (at least front-end wise) is the URL I need / the same URL as in the cookie above.
My question is should I not be trying to grab the domain from the cookie, and instead somehow try to grab the redirection URL?
(aka the URL that the request ends up at, not the original URL https://example.com/checkout)
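If the redirect route is the right one, a minimal sketch, assuming requests follows the 301 by default:

r = requests.get("https://example.com/checkout", cookies=cookies)
print(r.url)      # the final URL after any redirects
print(r.history)  # the intermediate (redirect) responses, if any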
------------------------------------
I hope I outlined my issues well enough. This is my first post on StackOverflow after months of lurking around for answers.
Thank you.
On Facebook I want to find fb_dtsg to make a status:
import urllib, urllib2, cookielib
jar = cookielib.CookieJar()
cookie = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie)
data = urllib.urlencode({'email':"email",'pass':"password", "Log+In":"Log+In"})
req = urllib2.Request('http://www.facebook.com/login.php')
opener.open(req, data)
opener.open(req, data) #Needs to be opened twice to log on.
req2 = urllib2.Request("http://www.facebook.com/")
page = opener.open(req2).read() # read the body into a string so the slicing below works
fb_dtsg = page[page.find('name="fb_dtsg"') + 22:page.find('name="fb_dtsg"') + 33] #This just finds the value of "fb_dtsg".
Yes, this does find a value, and one that looks the way fb_dtsg should look, but the value is always changing when I open the webpage again, and when I use it to make a status, it does not work. When I record what happens in Google Chrome while making a status normally, I get a working fb_dtsg value that does not change (for a long session) and that does work if I use it to make a status. Please, please show me how I can fix this up without using the API.
The search slice that extracts fb_dtsg truncates the last digit, so change 33 to 34:
fb_dtsg = page[page.find('name="fb_dtsg"') + 22:page.find('name="fb_dtsg"') + 34]
Anyway, a more robust way of finding fb_dtsg is with re:
import re
re.findall('fb_dtsg.+?value="([^"]+)"', page)
As I answered in one of your earlier posts, the form may also require other hidden variables.
If this still doesn't work, can you provide the code where you make the POST, including all the form data?
BTW, sorry for not looking at all your previous posts with the same content :P
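For example, a sketch of collecting every hidden input for re-submission; the regex is illustrative, not Facebook-specific, and assumes the attribute order shown:

import re

# gather all hidden form fields into a name -> value dict; real markup
# may order attributes differently, so treat this as a starting point
hidden = dict(re.findall(
    r'<input type="hidden" name="([^"]+)" value="([^"]*)"', page))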
I am so frustrated trying to use Ruby to fetch the content of a specific URL.
I've tried many different ways, like open-uri and a standard request; none has worked so far. I always get empty HTML. I also tried to use Python to fetch the same URL, which always returned the correct HTML content. I am really not sure why... Please help, as I am a newbie to both Ruby and Python. I want to use Ruby (I prefer the tidy syntax and human-friendly function names, and it's easier to install libs using gem and homebrew (on a Mac) than with Python's easy_install), but I am now considering Python because it just works (yet I'm still trying to get my head around the 2.x and 3.x issue). I may be doing something really stupid, but I think that is very unlikely.
ruby 1.9.2p136 (2010-12-25 revision 30365) [i386-darwin10.6.0]
Implementation 1:
url = URI.parse('http//:www.stackoverflow.com/') req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
puts res.body #empty
Implementation 2:
doc = Nokogiri::HTML(open("http//:www.stackoverflow.com/", "User-Agent" => "Safari"))
#empty
#I tried to use without user agent, without Nokogiri none worked.
Python implementation, which worked perfectly every time:
f = urllib.urlopen("http//:www.stackoverflow.com/")
# Read from the object, storing the page's contents in 's'.
s = f.read()
f.close()
print s
If that is your exact code, it is invalid for several reasons.
http//: should be http://
The URL needs a path. If you want the root page of example.com, it needs to be http://example.com/ (the trailing slash is significant).
If you put 2 lines of code on one line, you need to use ; to denote the end of the first line.
So:
require 'net/http'
url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
puts res.body
Same is true with using open in nokogiri
EDIT: that site is returning bad results many times:
counter = 0
20.times do
url = URI.parse('http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia')
req = Net::HTTP::Get.new(url.path)
res = Net::HTTP.start(url.host, url.port) {|http| http.request(req) }
sleep 1
counter += 1 unless res.body.empty?
end
puts counter
For me this returned a non-empty body only once. If you substitute in another site, it works all the time.
curl "http://www.yellowpages.com.au/search/listings?clue=plumber&locationClue=Australia"
Yields the same inconsistent results.
Two examples with open-uri (standard lib), a wrapper for (among others) the rather cumbersome Net::HTTP:
require 'open-uri'
open("http://www.stackoverflow.com/"){|f| puts f.read}
puts URI::parse("http://www.google.com/").read
I'm trying to learn to use Python to create dynamic web content. The problem I'm having right out of the gate, though, is that when I try to do a MySQL query, absolutely nothing happens. There's no error message... it looks like the script simply stops running when I import the module that enables the connection to the database.
This does do exactly what I'd expect when I try to run it from the command line.
#!/usr/bin/python
print "Content-Type: text/xml"
print
#if I type, for example, print "<b>test</b>" here, it appears in the browser window
#msql contains the credentials for connecting to database
#it is NOT in public_html
import msql
#no print instructions after this point are followed
connex=msql.msqlConn()
db=msql.MySQLdb
cursor=connex.cursor(db.cursors.DictCursor)
cursor.execute("SELECT * FROM userActions")
#run the query
xmlOutput=""
rows=cursor.fetchall()
#output the results
for row in rows:
    xmlOutput+="<action>"
    xmlOutput+="<actionId>"+str(row["actionId"])+"</actionId>"
    xmlOutput+="<userId>"+str(row["userId"])+"</userId>"
    xmlOutput+="<actText>"+str(row["action"])+"</actText>"
    xmlOutput+="<date>"+str(row["dateStamp"])+"</date>"
    xmlOutput+="</action>"
xmlOutput="<list>"+xmlOutput+"</list>"
print xmlOutput
This is my first stab at this, so I'm only assuming it should work; I've found nothing online that would suggest otherwise, though.
Please take 24 hours to learn something like Django. Django has an ORM and an XML serializer that will make your life easier by ensuring proper (and valid) XML; it really pays off.
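For instance, a minimal sketch of the serializer side; the app and model names are hypothetical stand-ins for the userActions table:

from django.core import serializers
from myapp.models import UserAction  # hypothetical app/model

# one call produces well-formed XML for the whole table
xml_output = serializers.serialize("xml", UserAction.objects.all())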
You can enable CGI traceback to see what happens:
import cgitb
cgitb.enable()
(I don't think it causes your problem, and clients understand \n as well, but it is better to use sys.stdout.write("...\r\n") instead of print to emit HTTP headers.)
Edit: try adding Content-Length: XXXX\r\n to the headers.
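Putting those suggestions together, a minimal sketch assuming a Python 2 CGI script:

import sys
import cgitb
cgitb.enable()  # tracebacks render in the browser instead of a blank page

body = "<list></list>"  # placeholder for the real XML
sys.stdout.write("Content-Type: text/xml\r\n")
sys.stdout.write("Content-Length: %d\r\n" % len(body))
sys.stdout.write("\r\n")  # blank line ends the headers
sys.stdout.write(body)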
I'm trying to extract the response headers of a URL request. When I use Firebug to analyze the response output of a URL request, it returns:
Content-Type text/html
However, when I use the Python code:
urllib2.urlopen(URL).info()
the resulting output returns:
Content-Type: video/x-flv
I am new to Python and to web programming in general; any helpful insight is much appreciated. Also, if more info is needed, please let me know.
Thanks in advance for reading this post
Try to request as Firefox does. You can see the request headers in Firebug, so add them to your request object:
import urllib2
request = urllib2.Request('http://your.tld/...')
request.add_header('User-Agent', 'some fake agent string')
request.add_header('Referer', 'fake referrer')
...
response = urllib2.urlopen(request)
# check content type:
print response.info().getheader('Content-Type')
There's also HTTPCookieProcessor, which can make it better, but I don't think you'll need it in most cases. Have a look at Python's documentation:
http://docs.python.org/library/urllib2.html
Content-Type text/html
Really, like that, without the colon?
If so, that might explain it: it's an invalid header, so it gets ignored, so urllib guesses the content-type instead, by looking at the filename. If the URL happens to have ‘.flv’ at the end, it'll guess the type should be video/x-flv.
This peculiar discrepancy might be explained by different headers (maybe ones of the Accept kind) being sent by the two requests -- can you check that...? Or, if JavaScript is running in Firefox (which I assume you're using, since you're running Firebug?) -- since it's definitely NOT running in the Python case -- "all bets are off", as they say ;-).
Keep in mind that a web server can return different results for the same URL based on differences in the request. For example, content-type negotiation: the requestor can specify a list of content-types it will accept, and the server can return different results to try to accommodate different needs.
Also, you may be getting an error page for one of your requests, for example, because it is malformed, or you don't have cookies set that authenticate you properly, etc. Look at the response itself to see what you are getting.
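A quick illustration of steering that negotiation with urllib2; the URL and Accept value are placeholders:

import urllib2
request = urllib2.Request('http://example.com/resource')
request.add_header('Accept', 'text/html')  # state the type we want back
response = urllib2.urlopen(request)
print response.info().getheader('Content-Type')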
According to http://docs.python.org/library/urllib2.html there is only a get_header() method and nothing about getheader.
I'm asking because your code works fine for
response.info().getheader('Set-Cookie')
but once I execute
response.info().get_header('Set-Cookie')
I get:
Traceback (most recent call last):
File "baza.py", line 11, in <module>
cookie = response.info().get_header('Set-Cookie')
AttributeError: HTTPMessage instance has no attribute 'get_header'
Edit:
Moreover, response.headers.get('Set-Cookie') works fine as well; it's not mentioned in the urllib2 docs....
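For reference, a sketch of which accessors exist on the Python 2 response; the URL is just an example:

import urllib2
response = urllib2.urlopen("http://google.com/")
info = response.info()                    # a mimetools.Message in Python 2
print info.getheader('Set-Cookie')        # works: the mimetools API
print response.headers.get('Set-Cookie')  # works: dict-style access
# info.get_header(...) does not exist here; that name belongs to
# urllib2.Request, which is what the linked docs describe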
For getting the raw data of the headers in Python 2: a little bit of a hack, but it works.
"".join(urllib2.urlopen("http://google.com/").info().__dict__["headers"])
basically "".join(list) will the list of headers, which all include "\n" at the end.
__dict__ is a built in python variable for all dicts, basically you can select a list out of a 2d array with it.
and ofcourse ["headers"] is selecting the list value from the .info() response value dict
hope this helped you learn a few ez python tricks :)
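A slightly less hacky spelling of the same trick, since the raw header lines are a plain list attribute on the Python 2 message object:

import urllib2
# headers is a list of raw header lines, newline included
print "".join(urllib2.urlopen("http://google.com/").info().headers)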