How can I read a URL without using urllib2.urlopen in Python? - python

I've been reading and displaying a URL from the Facebook Graph API like this:
facebook_info = urllib2.urlopen("https://graph.facebook.com/%s/me?fields=first_name,last_name,email&access_token=" % settings.FACEBOOK_API_VERSION + access_token)
facebook_info = facebook_info.read()
return facebook_info
I was wondering if there is a better way to do this in Python; I was thinking of something like requests.get(...), where I don't use urllib2.urlopen or the '+' sign to concatenate.

Although the requests library is excellent and recommended for its ease of use, I don't think using requests.get will be inherently better in this case. Your code seems fine and will work for what it does. Why do you want to change it? Style?
Or perhaps you want to build the URL in a clearer way?
import requests

url_template = "https://graph.facebook.com/{api_version}/me?fields=first_name,last_name,email&access_token={token}"
url = url_template.format(
    api_version=settings.FACEBOOK_API_VERSION,
    token=access_token,
)
facebook_info = requests.get(url).json()
return facebook_info
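If you do switch to requests, you could also let it build the query string via its params argument instead of formatting the URL yourself. A minimal sketch, assuming the same settings and access_token as in the question:
import requests

url = "https://graph.facebook.com/%s/me" % settings.FACEBOOK_API_VERSION
params = {
    "fields": "first_name,last_name,email",
    "access_token": access_token,
}
# requests URL-encodes the params dict into the query string for you.
facebook_info = requests.get(url, params=params).json()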

Related

Bottle - Is it possible to retrieve URL without parameters?

I have a URL of the form:
http://www.foo.com/bar?arg1=x&arg2=y
If I do:
request.url
I get:
http://www.foo.com/bar?arg1=x&arg2=y
Is it possible to get just http://www.foo.com/bar?
Looks like request.urlparts might be a way to do it: it gives you the URL's components, so you can rebuild it without the query string. Full documentation is in the Bottle docs.
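For example, a minimal sketch of rebuilding the query-free URL inside a Bottle route (the /bar route is assumed from the question):
from bottle import route, request

@route('/bar')
def bar():
    # request.urlparts is a urlsplit-style tuple:
    # (scheme, netloc, path, query, fragment)
    parts = request.urlparts
    return '%s://%s%s' % (parts.scheme, parts.netloc, parts.path)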
Edit:
There is also a way to do this via the requests library:
r.json()['headers']['Host']
but that only gives you the host, and I personally find the split approach better.
You can use the split function with '?' as the delimiter to do this:
url = request.url.split("?")[0]
I'm not sure if this is the most efficient/correct method, though.
If you just want to remove the parameters to get the base URL, do
url = url.split('?', 1)[0]
This will split the URL at the '?' and give you the base URL. Or even:
url = url[:url.find('?')]
(though note this variant mangles the URL when there is no '?', since find returns -1 and the slice drops the last character). You can also use urlparse; this is explained in the Python docs at https://docs.python.org/2/library/urlparse.html
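A minimal sketch of the urlparse approach on a standalone string (Python 2's urlparse module; in Python 3 the same functions live in urllib.parse):
from urlparse import urlsplit, urlunsplit

url = "http://www.foo.com/bar?arg1=x&arg2=y"
parts = urlsplit(url)
# Rebuild the URL with the query and fragment blanked out.
print urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))  # http://www.foo.com/bar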

Python - Facebook fb_dtsg

On Facebook I want to find fb_dtsg to make a status:
import urllib, urllib2, cookielib
jar = cookielib.CookieJar()
cookie = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie)
data = urllib.urlencode({'email':"email",'pass':"password", "Log+In":"Log+In"})
req = urllib2.Request('http://www.facebook.com/login.php')
opener.open(req, data)
opener.open(req, data) #Needs to be opened twice to log on.
req2 = urllib2.Request("http://www.facebook.com/")
page = opener.open(req2)
fb_dtsg = page[page.find('name="fb_dtsg"') + 22:page.find('name="fb_dtsg"') + 33] #This just finds the value of "fb_dtsg".
Yes, this does find a value, and one that looks like an fb_dtsg value would look. But this value changes every time I open the page again, and when I use it to make a status, it doesn't work. When I record what happens in Google Chrome while making a status normally, I get a working fb_dtsg value that doesn't change (for a long session) and that works when I use it to try to make a status. Please show me how I can fix this without using the API.
The search criterion for fb_dtsg truncates the last digit, so change 33 to 34:
fb_dtsg = page[page.find('name="fb_dtsg"') + 22:page.find('name="fb_dtsg"') + 34]
Anyway, you can search for fb_dtsg in a better way using re:
re.findall('fb_dtsg.+?value="([^"]+)"', page)
As I answered in one of your earlier posts, it may also require other hidden variables.
If this still doesn't work, can you provide the code where you are making the post, including all the POST form data?
BTW, sorry for not looking at all your previous posts with the same content :P
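A small sketch of the re approach, assuming page is the file-like object returned by opener.open() (the regex needs the page source as a string, so read it first):
import re

html = page.read()  # opener.open() returns a file-like object, not a string
matches = re.findall('fb_dtsg.+?value="([^"]+)"', html)
if matches:
    fb_dtsg = matches[0]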

Parse what you google search

I'd like to write a script (preferably in Python, but other languages are not a problem) that can parse what you type into a Google search. Suppose I search 'cats'; then I'd like to be able to parse the string cats and, for example, append it to a .txt file on my computer.
So if my searches were 'cats', 'dogs', 'cows' then I could have a .txt file like so,
cats
dogs
cows
Anyone know any APIs that can parse the search bar and return the string inputted? Or some object that I can cast into a string?
EDIT: I don't want to make a chrome extension or anything, but preferably a python (or bash or ruby) script I can run in terminal that can do this.
Thanks
If you have access to the URL, you can look for "&q=" to find the search term. (http://google.com/...&q=cats..., for example).
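For example, a minimal sketch that pulls the q parameter out of a search URL with the standard library and appends it to a text file (the URL here is just an illustration):
from urlparse import urlparse, parse_qs

url = "http://google.com/search?hl=en&q=cats"
terms = parse_qs(urlparse(url).query).get('q', [])
if terms:
    with open('searches.txt', 'a') as f:
        f.write(terms[0] + '\n')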
I can offer 2 popular solutions.
1) Google has a search-engine API: https://developers.google.com/products/#google-search
(It has a restriction of 100 requests per day.)
Abridged code:
import json
import re
import urllib2

def gapi_parser(args):
    query = args.text; count = args.max_sites
    import config
    api_key = config.api_key
    cx = config.cx
    # Note: This API returns up to the first 100 results only.
    # https://developers.google.com/custom-search/v1/using_rest?hl=ru-RU#WorkingResults
    results = []; domains = set(); errors = []; start = 1
    while True:
        req = 'https://www.googleapis.com/customsearch/v1?key={key}&cx={cx}&q={q}&alt=json&start={start}'.format(key=api_key, cx=cx, q=query, start=start)
        if start >= 100:  # the API cannot go beyond the first 100 results
            break
        con = urllib2.urlopen(req)
        if con.getcode() == 200:
            data = con.read()
            j = json.loads(data)
            start = int(j['queries']['nextPage'][0]['startIndex'])
            for item in j['items']:
                match = re.search(r'^(https?://)?\w(\w|\.|-)+', item['link'])
                if match:
                    domain = match.group(0)
                    if domain not in results:
                        results.append(domain)
                        domains.update([domain])
                else:
                    errors.append("Can't recognize domain: %s" % item['link'])
        if len(domains) >= args.max_sites:
            break
    print
    for error in errors:
        print error
    return (results, domains)
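A hypothetical invocation, since the function expects an args object carrying .text and .max_sites (an argparse namespace, for example):
import argparse

args = argparse.Namespace(text='cats', max_sites=10)
results, domains = gapi_parser(args)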
2) I wrote a Selenium-based script that parses the page in a real browser instance, but this solution has some restrictions, for example CAPTCHAs if you run searches like a robot.
A few options you might consider, with their advantages and disadvantages:
URL:
advantage: as Chris mentioned, accessing the URL and manually changing it is an option. It should be easy to write a script for this, and I can send you my Perl script if you want.
disadvantage: I am not sure if you can do it. I made a Perl script for that before, but it didn't work because Google states that you can't use its services outside the Google interface. You might face the same problem.
Google's search API:
advantage: popular choice. Good documentation. It should be a safe choice.
disadvantage: Google's restrictions.
Research other search engines:
advantage: they might not have the same restrictions as Google. You might find some search engines that let you play around more and have more freedom in general.
disadvantage: you're not going to get results that are as good as Google's.

Performing a Twitter search in Python using OAuth

I'm just being a bit of an idiot here, I think: I've figured out how to fetch my timeline, but not how to modify that into performing a search. I currently have:
import oauth2 as oauth  # the python-oauth2 package

consumer = oauth.Consumer(key=CONSUMER_KEY, secret=CONSUMER_SECRET)
access_token = oauth.Token(key=ACCESS_KEY, secret=ACCESS_SECRET)
client = oauth.Client(consumer, access_token)
response, data = client.request(searchURL)
I'm guessing it's the last line that needs to change to perform a search, but I'm not sure how to format it. If I change the searchURL to the one used for actually searching (it currently points at the timeline), it says it's in the wrong format.
Can anyone help?
Thanks.
Turns out it's of the form:
searchURL = "https://api.twitter.com/1.1/search/tweets.json?q=obama&count=2&result_type=popular"
That's an example search using the keyword "obama", setting the count to 2, and filtering for popular results.
import json

response, data = client.request(searchURL)
tweets = json.loads(data)
The format of the returned tweets is a bit...awkward, but understandable with a bit of playing around.
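If it helps, the v1.1 Search API wraps the results in a 'statuses' list, so pulling the text out of the parsed response looks something like this:
for tweet in tweets['statuses']:
    print tweet['text']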

How to search for a specific file type with the Yahoo search API?

Does anyone know if there is a parameter available for programmatic search on Yahoo that restricts results so only links to files of a specific type are returned (PDF, for example)?
It's possible to do that in the GUI, but how do you make it happen through the API?
I'd very much appreciate a sample code in Python, but any other solutions might be helpful as well.
Yes, there is:
http://developer.yahoo.com/search/boss/boss_guide/Web_Search.html#id356163
Thank you.
I found that something like this works OK (the file type is the first argument, and the query is everything after it):
import sys

format = sys.argv[1]
query = " ".join(sys.argv[2:])
srch = create_search("Web", app_id, query=query, format=format)
Here's what I do for this sort of thing. It exposes more of the parameters so you can tune it to your needs. This should print out the first ten PDF URLs from the query "resume" [mine's not one of them ;) ]. You can download those URLs however you like.
The json dictionary that gets returned from the query is a little gross, but this should get you started. Be aware that in real code you will need to check whether some of the keys in the dictionary exist. When there are no results, this code will probably throw an exception.
The link that Tiago provided is good for knowing what values are supported for the "type" parameter.
from yos.crawl import rest

APPID = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
base_url = "http://boss.yahooapis.com/ysearch/%s/v%d/%s?start=%d&count=%d&type=%s" + "&appid=" + APPID

querystr = "resume"
start = 0
count = 10
type = "pdf"
search_url = base_url % ("web", 1, querystr, start, count, type)
json_result = rest.load_json(search_url)
for url in [recs['url'] for recs in json_result['ysearchresponse']['resultset_web']]:
    print url
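And a hedged follow-on sketch for the download step left open above, assuming the json_result from the code just shown (output file names are just the index, since result URLs may not end in .pdf):
import urllib2

urls = [recs['url'] for recs in json_result['ysearchresponse']['resultset_web']]
for i, url in enumerate(urls):
    pdf = urllib2.urlopen(url).read()
    with open('result_%d.pdf' % i, 'wb') as f:
        f.write(pdf)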
