My Python program submits a form by loading a URL. There is a security code that seems to change every second, so you have to actually be on the website to submit the form.
For example,
http://www.locationary.com/prizes/index.jsp?ACTION_TOKEN=index_jsp$JspView$BetAction&inTickets=125000000&inSecureCode=091823021&inCampaignId=3060745
The only solution I can think of is something like Selenium. I don't know of any other way to simulate a web browser without it being as heavy and slow as a real browser. Any ideas? Or is there a way I can do this without browser automation?
EDIT:
Response to first answer: I did get the security code using urllib; the problem is that it seems to have already changed by the time I load my submission URL, so I'm guessing/assuming you have to do it in real time.
Yes, you'll need to get the security code programmatically since it changes every time. You can do this manually with urllib, or you can use mechanize or Selenium to make things easier.
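A minimal sketch of that idea: fetch the page and submit within the same session so the code is still fresh. The regex, form field names, and page URL below are guesses based on the example URL above and will need adjusting to the real markup.

import re
import urllib.request
import urllib.parse
import http.cookiejar

# Keep cookies between the two requests so the server sees a single session.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# 1. Load the page that contains the current security code.
html = opener.open("http://www.locationary.com/prizes/index.jsp").read().decode("utf-8")

# 2. Pull the code out of the HTML (hypothetical pattern).
match = re.search(r'inSecureCode["\']?[^>]*value=["\'](\d+)', html)
if match is None:
    raise RuntimeError("Could not find the security code on the page")
secure_code = match.group(1)

# 3. Submit immediately, before the code changes again.
params = urllib.parse.urlencode({
    "ACTION_TOKEN": "index_jsp$JspView$BetAction",
    "inTickets": "125000000",
    "inSecureCode": secure_code,
    "inCampaignId": "3060745",
})
response = opener.open("http://www.locationary.com/prizes/index.jsp?" + params)
print(response.getcode())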
I'm a newbie, so I will try to explain myself in a way that makes sense.
I produced my first ever Python script to scrape data from a web page I use regularly at work. It just prints a couple of values in the console that I previously had to look up manually.
My problem is that every time I execute the script and the browser opens up, the cache seems to be cleared, so I have to log into that work web page with my personal credentials and do the two-factor authentication with my phone.
I'm wondering whether there is a way to keep the cache for that browser (if I have already logged into the web page) so I don't need to go through authentication every time I launch my script.
I'm using Selenium WebDriver with Chrome, and the options I have configured are shown in the screenshot below. Is there perhaps another option I could add to keep the cache?
Current options for browser
I tried to find info on the web, but so far nothing. Many sites offer a guide on how to perform the login by adding the username and password in the code, but I would like to avoid that option: I would still need my phone for the two-factor authentication, and this script could also be used by other colleagues in the future.
Thanks a lot for any tip or info :)
After days of browsing everywhere, I found this post:
How to save and load cookies using Python + Selenium WebDriver
The second answer is the one that saved my life; I just had to add this to my set of options:
chrome_options.add_argument("user-data-dir=selenium")
See the linked post for the complete explanation of the options and imports to use.
With that option added, the first run of the script still requires me to log in manually and go through the authentication, but on the second run no manual input is needed: the data is scraped from the web and the result is returned without any action from me.
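For reference, a minimal sketch of how the options setup might look, assuming chromedriver is on the PATH; the profile directory name and target URL are placeholders.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# Reuse a persistent profile so cookies and session data survive between runs.
chrome_options.add_argument("user-data-dir=selenium")

driver = webdriver.Chrome(options=chrome_options)
driver.get("https://example.com/work-page")  # placeholder URL
# Log in manually (including 2FA) on the first run; later runs reuse the saved session.

Chrome keeps cookies and session data in that profile directory, which is why the second run no longer asks for the login.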
If anybody is interested in the topic please ping me.
Thanks!
I currently scrape a website and take a screenshot when a certain case happens.
I want to consume less bandwidth, so I'm trying to do it via Requests.
I can't figure out how I would take screenshots with it, but I thought of a workaround:
once the certain case happens, open Chrome as usual, take the screenshot, and close Chrome.
Is there a smarter way I'm not thinking of?
Thanks!
Requests is a library for making HTTP requests. You can't "take a screenshot" with it; there is no rendered page to capture.
Maybe try Selenium instead.
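A rough sketch of that hybrid approach: poll cheaply with Requests and only start a browser when the interesting case actually occurs. The URL, the detection check, and the output filename are placeholders.

import requests
from selenium import webdriver

URL = "https://example.com/page-to-watch"  # placeholder

def case_happened(html):
    # Hypothetical condition; replace with your real check.
    return "sold out" in html.lower()

# Cheap path: plain HTTP request, no browser, little bandwidth.
html = requests.get(URL, timeout=30).text

if case_happened(html):
    # Expensive path only when needed: open a browser just for the screenshot.
    driver = webdriver.Chrome()
    try:
        driver.get(URL)
        driver.save_screenshot("capture.png")
    finally:
        driver.quit()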
We are testing videos on our website; in order to play one, the site must authenticate the user, get authorization for the device they are playing on, check their entitlements, and so on.
We have many varieties of network and video types to test, and I am in the process of writing a script which checks that one of those calls works fine for all types of videos.
It is a POST call, and I need to build the params/data to post. There is no direct way of getting one of the param values, and this is how we do it currently: we go to the browser, play the video, open dev tools such as Firebug, capture the param value from the request header of that call, and then use it in my script for the other ~100 calls to verify programmatically.
Is there a way in Python to do the steps we are doing manually, i.e. open a URL and capture all the calls that happen in the background, just like Firebug does?
I am trying firePython, FireLogger, and mechanize to see if they help, but I have invested so much time figuring this out that I thought it was time to ask for some expert advice.
If you haven't looked at the Requests library, it's generally quite pleasant to work with and might make your life easier.
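For example, once you have captured the opaque parameter from dev tools, a requests.Session can replay the POST against all the video variations. Everything below (endpoints, field names, the login step) is a placeholder sketch, not your actual API.

import requests

session = requests.Session()

# Authenticate however your site requires (placeholder endpoint and fields).
session.post("https://example.com/api/login",
             data={"user": "tester", "password": "secret"})

captured_token = "value-copied-from-firebug"  # the param you currently grab manually

video_ids = ["vid-001", "vid-002", "vid-003"]  # the ~100 variations to verify
for vid in video_ids:
    resp = session.post("https://example.com/api/entitlements",
                        data={"videoId": vid, "authToken": captured_token})
    print(vid, resp.status_code)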
As far as I know, for a new request to a web app, you need to reload the page for the app to process and respond to that request.
For example, if you want to show a comment on a post, you need to reload the page, process the comment, and then show it. What I want, however, is to be able to add comments (something like Facebook, where the comment gets added and shown without reloading the whole page) without having to reload the web page. Is it possible to do this with only Django and Python, with no JavaScript/AJAX knowledge?
I have heard it's possible with AJAX (I don't know how), but I was wondering if it was possible to do with Django.
Thanks,
You want to do that without any client-side code (JavaScript and AJAX are just examples) and without reloading your page (or at least part of it)?
If that is your question, then unfortunately the answer is that you can't. You either need client-side code or you need to reload the page.
Think about it: once the client gets the page, it will not change unless either the client requests the page from the server again and the server returns an updated one, or the page has some client-side code (e.g. JavaScript) that updates it.
You definitely want to use AJAX, which means the client will need to run some JavaScript code.
If you don't want to learn JavaScript, you can always try something like Pyjamas. You can check out an example of its HttpRequest here.
But I always feel that using plain JavaScript via a library (like jQuery) is easier to understand than trying to force one language into another.
To do it right, AJAX would be the way to go, but in a limited sense you can achieve the same thing with an iframe. An iframe is like another page embedded inside the main page, so instead of refreshing the whole page you can refresh just the inner iframe, which may give the same effect.
You can read more about iframe patterns at
http://ajaxpatterns.org/IFrame_Call
Maybe a few iFrames and some Comet/long-polling? Have the comment submission in an iFrame (so the whole page doesn't reload), and then show the result in the long-polled iFrame...
Having said that, it's a pretty bad design idea, and you probably don't want to be doing this. AJAX/JavaScript is pretty much the way to go for things like this.
I have heard it's possible with AJAX...but I was wondering if it was possible to do with Django.
There's no reason you can't use both: specifically, AJAX within a Django web application. Django provides your organization and framework needs (including a view that will respond to AJAX requests), and you then use some JavaScript on the client side to make AJAX calls to that Django-backed view.
I suggest you find a basic jQuery tutorial, which should explain enough JavaScript to get this working.
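To make the Django side concrete, here is a minimal sketch of a view an AJAX call could hit; the Comment model, its fields, and the URL wiring are hypothetical.

from django.http import JsonResponse
from django.views.decorators.http import require_POST

from .models import Comment  # hypothetical model with post_id and text fields

@require_POST
def add_comment(request, post_id):
    comment = Comment.objects.create(post_id=post_id,
                                     text=request.POST.get("text", ""))
    # The page's JavaScript (e.g. a jQuery $.post callback) inserts this
    # response into the DOM without reloading.
    return JsonResponse({"id": comment.id, "text": comment.text})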
I've looked at many tutorials on cookiejar, but my problem is that the web page I want to scrape creates the cookie using JavaScript, and I can't seem to retrieve the cookie. Does anybody have a solution to this problem?
If all pages have the same JavaScript then maybe you could parse the HTML to find that piece of code, and from that get the value the cookie would be set to?
That would make your scraping quite vulnerable to changes in the third party website, but that's most often the case while scraping. (Please bear in mind that the third-party website owner may not like that you're getting the content this way.)
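As a rough illustration of that parse-the-HTML idea, assuming the page sets the cookie with something like document.cookie = "name=value"; the regex and URL are placeholders and will need adapting to the real script.

import re
import urllib.request

html = urllib.request.urlopen("http://example.com/page-with-js-cookie").read().decode("utf-8")

# Look for a client-side assignment such as: document.cookie = "session=abc123";
match = re.search(r'document\.cookie\s*=\s*["\']([^=]+)=([^;"\']+)', html)
if match:
    name, value = match.group(1), match.group(2)
    print("cookie:", name, "=", value)
else:
    print("Cookie-setting code not found; the page may have changed.")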
I responded to your other question as well: take a look at mechanize. It's probably the most fully featured scraping module I know: if the cookie is sent, then I'm sure you can get to it with this module.
Maybe you can execute the JavaScript code in a JavaScript engine with Python bindings (like python-spidermonkey or PyV8) and then retrieve the cookie. Or, since the JavaScript is executed client-side anyway, you may be able to port the cookie-generating code to Python.
You could access the page using a real browser, via PAMIE, win32com, or similar; then the JavaScript will be running in its native environment.