I'm using Python without a server to deploy to. I'm trying to test the accept-payment flow for PayPal.
In the code, after sending a POST request, I store the result in a local file and then open this file using "webbrowser".
However, I suspect that I am missing something here, since once I log in as a user I'm not automatically redirected to authorize the transaction; I just log in.
Now I suspect this is because once I hit the Payment API endpoint, it redirects me to the login API with some parameters:
http://www.paypal.com?hypotheticalredirecturl=etc
I am capturing the response, i.e. the HTML of the PayPal login page. However, the full URL is not captured. Hence the part
hypotheticalredirecturl=etc
is lost, and I think this is what stops the flow from reaching the authorization page after I log in as a user.
I think if I appended the "hypotheticalredirect" part to my web page after opening it using "webbrowser", I might be able to make the flow work normally.
Does anyone know of a way to capture the URL of the response?
I tried looking in the page itself, but I don't think it's there.
Any help will be appreciated.
Thanks,
Ashwin
EDIT: I'm using urllib and urllib2. Should I be looking at httplib?
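For what it's worth, urllib2 follows HTTP redirects, and the response object exposes the final URL via geturl(), so the redirect parameters can be recovered from that instead of from the page HTML. A minimal sketch, with a placeholder endpoint standing in for the real Payments URL:

    import urllib, urllib2, urlparse

    # hypothetical Payments endpoint and form data, for illustration only
    data = urllib.urlencode({'amount': '10.00'})
    response = urllib2.urlopen('https://example.com/payments', data)

    # geturl() returns the URL actually served, after any HTTP redirects
    # (note: it will not see redirects done in JavaScript)
    final_url = response.geturl()
    params = urlparse.parse_qs(urlparse.urlparse(final_url).query)
    print final_url, params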
I want to write a Python script for a website that requires a login to enable some features, and I want to find out what I need to put in the headers of my script's requests (e.g. an authentication token and other parameters) so they are executed the same way as requests from the browser.
Does Wireshark help with this if the website uses HTTPS?
Or is my only option to drive a browser with Selenium after a manual login?
For anyone else with the same issue: you don't need to sniff your traffic from outside the browser. Just:
- use Google Chrome
- open the developer tools
- click on the Network tab
- clear the data
- do a request in the tab while the dev tools are open
You should see the initial request at the top, followed by subsequent ones (advertising, external image servers, etc.). You can right-click the initial request, save it as a .har file, and use something like https://toolbox.googleapps.com/apps/har_analyzer/ to extract the headers of both the request and the response.
Now you know what parameters (key and value) you need in your header, and you can even use submitted values like tokens and cookies in your Python script.
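Once you have the headers out of the HAR file, replaying them is straightforward. A rough sketch with requests, where the URL, header names, and token are made-up placeholders you'd swap for the values from your own capture:

    import requests

    # placeholder values copied out of the HAR analyzer, not real ones
    headers = {
        'User-Agent': 'Mozilla/5.0',
        'X-Auth-Token': 'token-from-har-capture',  # hypothetical header
    }
    cookies = {'session_id': 'cookie-from-har-capture'}

    resp = requests.get('https://example.com/protected',
                        headers=headers, cookies=cookies)
    print(resp.status_code)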
From this question, the last responder seems to think it is possible to use Python to open a web page, let me sign in manually, go through a bunch of menus, and then let Python parse the page once I get where I want. The website has a weird sign-in procedure, so using requests and passing a username and password will not be sufficient.
However, it seems from this question that it's not a possibility.
So the question is: is it possible? If so, do you know of some example code out there?
The way to approach this problem is: when you log in normally, have the developer tools open next to you and see what request is being sent.
When logging in to Bandcamp, you can watch the XHR request that gets sent. From the response you can see that an identity cookie is being set. That's probably how they identify that you are logged in. So once you've got that cookie set, you would be authorized to view logged-in pages.
So in your program you could log in normally using requests, save the cookie in a variable, and then send that cookie along with further requests.
Of course, login procedures and authorization mechanisms differ between sites, but that's the general gist of it.
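A minimal sketch of that idea, assuming a plain form login at a made-up URL (a requests.Session keeps cookies across requests for you):

    import requests

    session = requests.Session()  # the cookie jar is handled automatically

    # hypothetical login endpoint and field names
    session.post('https://example.com/login',
                 data={'username': 'me', 'password': 'secret'})

    # the identity cookie from the login response is sent along here
    page = session.get('https://example.com/account')
    print(page.status_code)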
So when do you actually need Selenium? You need it if a lot of the page is rendered by JavaScript. requests is only able to get the HTML the server sends, so if the menus and such are rendered with JavaScript, you won't ever be able to see that information using requests.
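In that case, something along these lines works (a sketch, assuming Chrome and chromedriver are installed; the URL is a placeholder):

    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get('https://example.com/login')

    # ... sign in manually, or locate and fill the form fields here ...

    # page_source includes markup produced by JavaScript after rendering
    html = driver.page_source
    driver.quit()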
I am writing a web-scraping application. When I enter the URL directly into a browser, it displays the JSON data I want.
However, if I use Python's requests library, or URLDownloadToFile in C++, it simply downloads the HTML of the login page.
The site I am trying to scrape (DraftKings.com) requires a login. The other sites I scrape from don't.
I am 100% sure this is related, since if I paste the URL while logged out, I get the login page rather than the JSON data. Once I log in, pasting the URL gives me the JSON data again.
The thing is, even if I remain logged in in the browser, the Python script or C++ app still downloads the login HTML rather than the JSON data.
Does anyone know how I can fix this issue?
Please don't ask us to help with an activity that violates the terms of service of the site you are trying to (ab-)use:
Using automated means (including but not limited to harvesting bots, robots, parser, spiders or screen scrapers) to obtain, collect or access any information on the Website or of any User for any purpose.
Even if that kind of usage were allowed, the answer would be boring:
You'd need to implement the login functionality in your scraper.
I am using Python and the JavaScript or PHP SDK.
To obtain the access_token, I follow the steps indicated on the docs page (https://developers.facebook.com/docs/authentication/). I pass the redirect URL to dialog/oauth and obtain the access_token. Once this is done, all output HTML sent back to the browser gets rendered into a new page, leaving the Facebook iframe/canvas. (FYI, all output is done through the usual 'self.response.out.write' function call.)
It seems the PHP SDK hides this, and I can't find a way to get the http://www.facebook.com/dialog/oauth?client_id=%s&redirect_uri=%s dialog to send the output from the redirected URL to the iframe/canvas that triggered the application. This is a 'Page Tab' app (not an 'App on Facebook'), so I have set the "Page Tab Name" and "Page Tab Url" on the basic app config page.
I have not implemented sessions yet, and I am wondering if that is necessary in order to pass the iframe target as a state variable and have it passed back along with the redirect to the URI.
I have searched many posts with no luck, and any help would be much appreciated!
Tab Page Application undocumented steps:
1. The confusing part is the CANVAS_PAGE_URL in the example. This needs to be the web-hosted app URL (e.g. https://www.appname.appspot.com/). This is not clearly defined anywhere.
2. If the access_token and user_id are not found in the signed_request, an auth dialog needs to be shown (as per the page documentation). This needs to be done through a script that sets top.location.href, to ensure that it launches as a dialog. This goes to a new page, overwriting the canvas (or the fan page) that triggered the app.
3. When the user allows the permissions on the app, the app is called through the tab-page-canvas-url?code="....". At this point a redirect needs to be done, which is not documented anywhere; I had to look at the PHP SDK code to figure this out (fbmain.php line 17) (redirect() in Python, header() in PHP). The redirect needs to take the URL for the app on the fan page: http://www.facebook.com/FAN_PAGE_NAME?sk=app_nnnnnnn (see the sketch after this list).
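A rough sketch of steps 2 and 3 in Python (the 'self.response.out.write' call in the question suggests Google App Engine, so this assumes a webapp2 handler; the handler name, APP_ID, and CANVAS_PAGE_URL are placeholders):

    import webapp2

    # hypothetical App Engine handler illustrating steps 2 and 3
    class TabPageHandler(webapp2.RequestHandler):
        def get(self):
            if self.request.get('code'):
                # step 3: after the user allows permissions, redirect the
                # browser back to the app's address on the fan page
                self.redirect('http://www.facebook.com/FAN_PAGE_NAME?sk=app_nnnnnnn')
                return
            # step 2: no token yet, so break out to the auth dialog via
            # top.location.href so it launches as a full-page dialog
            dialog = ('https://www.facebook.com/dialog/oauth'
                      '?client_id=APP_ID&redirect_uri=CANVAS_PAGE_URL')
            self.response.out.write(
                '<script>top.location.href = "%s";</script>' % dialog)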
It took many hours of research and digging to understand this, and I hope it helps (I see a lot of questions raised about the page breaking out of the iframe/canvas); the basic problem is the extra redirect step that FB does not document anywhere...
(Mail me and I am happy to share Python code that is now all working nicely.)
It is stated elsewhere, but to make it clear: the reason your app breaks out of Facebook is that the authentication dialogs navigate away from the original apps.facebook.com URL.
This may only happen with extended permissions, as the new permissions screen is two pages instead of one.
Once the authorization process is complete, the browser is redirected to the fully qualified app URL on your server.
The "fix" is to send the browser back to the Facebook app using its http://apps.facebook.com/appname address.
[That doesn't seem like good "flow" to most people, but that is how it is right now. I think there may be a different route using "Authenticated Referrals" on the "Auth Dialog" page of the app settings, but I haven't used it yet.]
I use the PHP SDK and here is what I do:
Check for the "state" request parameter when your redirect_url is called after authorization. Some people have suggested using the "code" parameter, but I do not see it being returned.
    // after completing the first authorization, the redirect url may send
    // users away from Facebook to the redirect url itself;
    // this PHP code redirects them back to the app page
    if (isset($_GET['state'])) {
        header("Location: http://apps.facebook.com/appname");
        exit;
    }
If you know a better way, please let me know!
I am wondering how to log in to a specific site; however, no luck so far.
The way it happens in a browser is that you click on a button, which triggers a jQuery AJAX request to /ajax/authorize_ajax.html with the POST variables login and pass. When it returns result = true, it reloads the document and you are logged in.
When I go to /ajax/authorize_ajax.html in my browser, it gives me {"data": [{"result":false}]} in response. Using C#, I did go to this address and posted login and pass, and it gave me {"data": [{"result":true}]} in response. However, of course, when I then go back to the main folder of the website, I'm not logged in.
Can anyone help me solve this problem? I think the cookies are set via JavaScript; is it even possible in that case? I did some research, and all I could do is this; please help me get around this problem. I used urllib in Python and the web libraries in .NET.
EDIT 0
It is setting a cookie in the response headers: SID, PATH & DOMAIN.
Example: sid=bf32b9ff0dfd24059665bf1d767ad401; path=/; domain=site
I don't know how to save this cookie and go back to / using it. I've never done anything like this before; can someone give me an example using Python?
EDIT 1
All done, thanks to this post - How to use Python to login to a webpage and retrieve cookies for later usage?
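For reference, a minimal sketch of that cookie-handling approach (Python 2's urllib2 and cookielib; the host name is a placeholder for the actual site):

    import cookielib, urllib, urllib2

    # a CookieJar stores the sid cookie from the Set-Cookie response header
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

    # POST the login and pass variables to the AJAX endpoint
    data = urllib.urlencode({'login': 'user', 'pass': 'secret'})
    opener.open('http://example.com/ajax/authorize_ajax.html', data)

    # the jar sends the sid cookie automatically on later requests
    page = opener.open('http://example.com/').read()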
Here's a blog post I did a while ago about using an HttpWebRequest to post to a site when cookies are involved:
http://crazorsharp.blogspot.com/2009/06/c-html-screen-scraping-part-2.html
The idea is that when you get a response using the HttpWebRequest, you can access the cookies that are sent down. For every subsequent request, you can new up a CookieContainer on the request object and add the cookies you got into that container.