Python 2 urllib/urllib2 wrong URL issue

I am writing a Python 2 script to perform some automated actions on a website, using urllib/urllib2. The task involves GET and POST requests, custom headers, etc.
I stumbled upon an issue that does not seem to be mentioned in the documentation. Suppose we have the following valid URL: https://stackoverflow.com/index.php?abc=def&fgh=jkl and we need to perform a POST request to it.
Here is what my code looks like (please ignore any typos):
data = urllib.urlencode({ "data": "somedata", "moredata": "somemoredata" })
urllib2.urlopen(urllib2.Request("https://stackoverflow.com/index.php?abc=def&fgh=jkl", data))
No errors are shown, but according to the web server, the request is being received at "https://stackoverflow.com/index.php" and not at "https://stackoverflow.com/index.php?abc=def&fgh=jkl". What is the problem here?
I know that I could use Requests, but I'd like to use urllib/urllib2 first.

If I'm not wrong, you should pass your request data in the data dictionary you pass to the urlopen() function.
data = urllib.urlencode({'abc': 'def', 'fgh': 'jkl'})
urllib2.urlopen(urllib2.Request('http://stackoverflow.com/index.php', data))
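For what it's worth, if the parameters must stay in the query string while other fields travel in the POST body, a minimal sketch (Python 2, reusing the question's example values) could look like this:

import urllib
import urllib2

# Keep the query parameters on the URL and send the form fields as the body.
# urllib2 switches the request to POST as soon as a data argument is supplied.
query = urllib.urlencode({"abc": "def", "fgh": "jkl"})
body = urllib.urlencode({"data": "somedata", "moredata": "somemoredata"})
url = "https://stackoverflow.com/index.php?" + query
response = urllib2.urlopen(urllib2.Request(url, body))
print response.read()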
Also, just like you said, use Requests unless you absolutely need the low-level access urllib provides.
Hope this helps.

Related

How to achieve this python post and get?

I've been trying for hours using requests and urllib. I'm lost, and Google hasn't helped me either. Any tips, or anything at all, would be useful. Thank you.
Goals: Post country code and phone numbers, then get mobile carrier etc.
Problem: Not printing anything. Variable "name" prints out None.
def do_ccc(self): # Part of a bigger class
    """Phone Number Scan"""
    # prefix = input("Phone Number Prefix: ")
    # number = input("Phone Number: ")
    url = "https://freecarrierlookup.com/index.php"
    from bs4 import BeautifulSoup
    import urllib
    data = {'cc': "COUNTRY CODE",
            'phonenum': "PHONE NUMBER"} # .encode('ascii')
    data = json.dump(data, sys.stdout)
    page = urllib.request.urlopen(url, data)
    soup = BeautifulSoup(page, 'html.parser')
    name = soup.find('div', attrs={'class': 'col-sm-6 col-md-8'})
    # ^^^ Test (should print the phone number)
    print(name)
As Zags pointed out, it is not a good idea to scrape a website in violation of its terms of service, especially when the site offers a cheap API.
But answering your original question:
You are using json.dump instead of json.dumps; json.dump writes the JSON to sys.stdout and returns None, so data ends up empty.
If you look at the page, you will see that the URL for POST requests is different, getcarrier.php instead of index.php.
You would also need to convert the str from json.dumps to bytes, and even then the site will reject your calls, since the website adds a hidden token to each submitted request to prevent automated scraping.
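Putting those fixes together, a minimal sketch (Python 3, keeping the question's placeholder values) might look like the following; note that it will still fail against the live site, because the hidden token is not supplied:

import json
import urllib.request

# Sketch of the fixes described above: correct endpoint, json.dumps,
# and str -> bytes. The anti-scraping token is NOT handled here.
url = "https://freecarrierlookup.com/getcarrier.php"
payload = json.dumps({'cc': "COUNTRY CODE", 'phonenum': "PHONE NUMBER"}).encode('utf-8')
page = urllib.request.urlopen(url, payload)
print(page.read())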
The problem is with what your code is trying to do. freecarrierlookup.com is just a website. In order to do what you want, you'd need to do web scraping, which is complicated, unmaintainable, and usually a violation of the site's terms of service.
What you need to do is find an API that provides the data you're looking for. A good API will usually have either sample code or a Python library that you can use to make requests.

Convert Node.js HTTP request to Python

I have this small piece of Node.js code, which makes an API request, and I need to convert it into a Python requests.get() call....
import got from 'got'
got.get(`https://example.com/api`, {
    json: true,
    query: {
        4,
        data: `12345`
    },
})
So my python code would start like this:
import requests
requests.get('https://example.com/api')
But how can I add the parameters

json: true,
query: {
    4,
    data: `12345`
},

in the Python request?
I would highly recommend looking at the available docs when trying to solve problems like this; you'll generally get the answer a lot quicker and learn a lot more. I have linked the docs within this answer to make them easier to explore for future use. I have never used the Node.js got library, but I looked at its docs to identify what each of the parameters means; the npm page has good documentation for this:
json - Sets the content-type header to "application/json", sets the accept header to "application/json", and will automatically run JSON.parse(response). I am not aware of your familiarity with HTTP headers, but more information can be looked up on MDN, and a list of headers can be found in the Wikipedia article on header fields.
query - This sets the query string for the request. I assume you are familiar with this, but more information can be found in the Wikipedia article on query strings.
So, from the above it looks like you are trying to send the following request to the server:

URL (with query string): https://example.com/api?4&data=12345
Headers:
    content-type: application/json
    accept: application/json
I would recommend reading through the python requests library user guide to get a better understanding of how to use the library.
For setting custom headers, the optional "headers" parameter can be used.
For the query string, the optional "params" parameter allows for this. The only problem with params is the lack of support for a valueless key (the 4 in your example); to get around this, encoding the query string directly in the URL may be the best approach until the requests library supports this feature. I'm not sure when support will be available, but I did find a closed issue on GitHub mentioning potential support in a later version.
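Under those assumptions, a rough requests equivalent might look like this (example.com/api is the placeholder URL from the question):

import requests

# The valueless "4" key is encoded directly in the URL, since requests'
# params dict needs key/value pairs; requests appends data=12345 to it.
headers = {
    "content-type": "application/json",
    "accept": "application/json",
}
response = requests.get("https://example.com/api?4",
                        params={"data": "12345"},
                        headers=headers)
print(response.json())  # mirrors got's automatic JSON.parse(response)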

Python Requests Recreate Post Request with Cookies

So I was looking at my chrome console for a post request that I was making, and there is a 'cookie' value in the header file that has this data:
strTradeLastInventoryContext=730_2; bCompletedTradeOfferTutorial=true; steamMachineAuth76561198052177472=3167F37117************B82C2E; steamMachineAuth76561198189250810=E292770040E************B5F97703126DE48E; rgDiscussionPrefs=%7B%22cTopicRepliesPerPage%******%7D; sessionid=053257f1102e4967e2527ced; steamCountry=US%7C708d3************e569cc75495; steamLogin=76561198052177472%7C%7C4EC6FBDFA0****************12DE568; steamLoginSecure=765611*********************44BEC4E8BDA86264E; webTradeEligibility=%7B%22allowed%22%3A1%2C%22allowed_at_time%22%3A0%2C%22steamguard_required_days%22%3A15%2C%22sales_this_year%22%3A9%2C%22max_sales_per_year%22%3A200%2C%22forms_request***************cooldown_days%22%3A7%7D; strInventoryLastContext=730_2; recentlyVisitedAppHubs=42700%2C2***********930%2C440; timezoneOffset=-14400,0; __utma=268881843.1147920287.1419547163.1431887507.1431890089.151; __utmb=268881843.0.10.1431890089; __utmc=268881843; __utmz=268881843.1431885538.149.94.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided)
I starred out some of the cookie's data so my trade account can't be robbed, but you should get the point. How should I go about recreating the cookie? Should I create a dict where the keys are the parts before the '=' in the cookie and the values are what comes after the '=' sign? Sorry if the question is unclear; I'm not sure how to go about doing this. Any help would be great!
Ex. cookie = {strTradeLastInventoryContext: 730_2, ...}
There are really two options here.
If you happen to have the exact Cookie header you want to reproduce exactly as one big string (e.g., to have a requests-driven job take over a session you created in a browser, manually or using selenium or whatever), you can just pass that as an arbitrary header named Cookie instead of figuring out how to break it apart just so requests can (hopefully) reassemble the same header you wanted.
If, on the other hand, you need to create parts of it dynamically, then yes, you will want to do what you're doing—pull it apart to build a dict named cookie, then use it with requests.get(url, cookies=cookie), or req.cookies.update(cookie) or similar (if you're using sessions and prepared requests). Then you can easily modify the dict before sending it.
But the easiest way to do that is not to pull the cookie apart manually. I'm pretty sure the WebKit Developer Tools have a way to do that for you directly within Chrome. Or, if not, you can just copy the cookie as a string and then use the http.cookies module (called Cookie in Python 2.x), like this:
cookie = http.cookies.BaseCookie(cookie_string)
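From there, a small sketch (Python 3 names, with a shortened placeholder cookie string rather than the full header above) of turning the parsed cookie into the dict requests expects:

import http.cookies  # the module is named Cookie in Python 2.x

# Parse a raw Cookie header string into plain key/value pairs.
cookie_string = "sessionid=053257f1102e4967e2527ced; steamCountry=US"
parsed = http.cookies.BaseCookie(cookie_string)
cookies = {key: morsel.value for key, morsel in parsed.items()}
# requests.get(url, cookies=cookies)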
Also, note that in many cases, you really don't even need to do this. If you can drive the login and navigation directly from requests instead of starting off in Chrome, it should end up with the full set of cookies it needs in each request. You may need to use a Session, but that's as hard as it gets.
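For example, a minimal Session sketch (the login URL and form field names here are placeholders, not taken from the question):

import requests

# The Session keeps any cookies set by the server and sends them
# automatically on every subsequent request.
session = requests.Session()
session.post("https://example.com/login",
             data={"user": "me", "password": "secret"})
response = session.get("https://example.com/account")
print(response.status_code)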
You may want to look at the requests documentation for cookies.
You are right in that the cookie value is passed to the get call as a dictionary key/value.
cookies = {'cookie_key': 'somelongstring'}
requests.get(url, cookies=cookies)

404 error while doing an api call to Reddit

According to their documentation:
This should be enough to get the hottest new reddit submissions:
r = client.get(r'http://www.reddit.com/api/hot/', data=user_pass_dict)
But it doesn't, and I get a 404 error. Am I getting the URL for the data request wrong?
http://www.reddit.com/api/login works though.
Your question specifically asks what you need to do to get the "hottest new" submissions. "Hottest new" doesn't really make sense as there is the "hot" view and a "new" view. The URLs for those two views are http://www.reddit.com/hot and http://www.reddit.com/new respectively.
To make those URLs more code-friendly, you can append .json to the end of the URL (any reddit URL for that matter) to get a json-representation of the data. For instance, to get the list of "hot" submissions make a GET request to http://www.reddit.com/hot.json.
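For instance, a small sketch with requests (the User-Agent value is a placeholder; reddit expects a descriptive one):

import requests

# Fetch the "hot" listing as JSON and print the submission titles.
headers = {"User-Agent": "my-script/0.1 by yourusername"}
response = requests.get("http://www.reddit.com/hot.json", headers=headers)
for child in response.json()["data"]["children"]:
    print(child["data"]["title"])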
For completeness, in your example, you attempt to pass in data=user_pass_dict. That's definitely not going to work the way you expect it to. While logging in is not necessary for what you want to do, if you happen to need more complicated use of reddit's API from Python, I strongly suggest using PRAW. With PRAW you can iterate over the "hot" submissions via:
import praw
r = praw.Reddit('<REPLACE WITH A UNIQUE USER AGENT>')
for submission in r.get_frontpage():
    # do something with the submission
    print(vars(submission))
According to the docs, use /hot rather than /api/hot:
r = client.get(r'http://www.reddit.com/hot/', data=user_pass_dict)

How can I use python to "send" data (images) to ImageBam.com

I've read a lot about multipart forms, mechanize, and twill, but I couldn't find out how to implement the code.
Using MultipartPostHandler to POST form-data with Python
First I tried to fill in the forms on
www.imagebam.com/basic-upload
I can fill in the forms, but I can't really send the data, even if I submit() it.
After looking at the source code of the page above, I realized that all I need to do is POST the data with the correct content type directly to the page (please correct me if I'm wrong):
http://www.imagebam.com/sys/upload/save
I tried to use poster.py, but I couldn't understand how this stuff works. I can use mechanize and twill a little bit, but I am stuck, since this is more complex than simple form posting, I think.
So my questions:
- How can I use poster.py (or user-created multipart form classes) to upload images to imagebam.com?
- Or are there any alternative solutions? :)
Don't rely completely on third-party libraries like mechanize. Either implement its official API in Python (API ImageBam), or see this project developed in PyQt4 (pymguploader) that uploads images, and then try to implement it yourself.
Mechanize is not the right tool for the task.
Implementing http://code.google.com/p/imagebam-api/ in python is way more robust.
The examples are in PHP/curl; converting them to Python/urllib2 should be trivial.
Yes! I did it. I used
this question.
Here is the code:
>>> from poster.encode import multipart_encode
>>> from poster.streaminghttp import register_openers
>>> import urllib2
>>> register_openers()
<urllib2.OpenerDirector instance at 0x02CDD828>
>>> datagen, headers = multipart_encode({"file[]": open("D:\hedef\myfile.jpg","rb"),"content_type":"1","thumb_size":"350"})
>>> request = urllib2.Request("http://www.imagebam.com/sys/upload/save", datagen, headers)
>>> print urllib2.urlopen(request).read()
Now all I need to do is use BeautifulSoup to fetch the thumbnail codes :)
