TL;DR: Is it feasible to create a web tool/app that relies on server-side Selenium automation?
I created a local script that automates form filling on car insurance quote websites and returns the cost to insure, i.e. you fill in one form and it auto-fills every other provider's quote form and returns the quotes.
But now I want to extend that functionality to others via some sort of web app [Flask/Django?] that handles a client's requests server side by fetching that information and returning it to the client based on their inputs.
What I'm struggling with is that Selenium is, I believe, limited to about 5 web drivers locally, and it is resource intensive, so to me that means you can handle at most 5 website requests at once?
The short answer is YES.
The idea to solve the problem is as follows:
Create a web app which can take the user's inputs and return something back to the user, just like any common website.
Create a service in the web app. The service handles what you described using Selenium, such as filling in the car insurance form and getting the cost to insure.
Create a webpage or API in the web app. The webpage/API calls the service mentioned above; when the user hits the webpage/API, Selenium performs the automation.
So, it's done.
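A minimal Flask sketch of that idea, assuming Selenium and a chromedriver are installed on the server; the insurer URL, form field names and element id below are placeholders, not a real provider:

```python
# Minimal sketch, assuming Selenium and chromedriver are installed on the server.
# The insurer URL, field names and element id are placeholders.
from flask import Flask, jsonify, request
from selenium import webdriver
from selenium.webdriver.common.by import By

app = Flask(__name__)

def get_quote(name, car_model):
    # The "service": drive a headless browser through the provider's quote form.
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get("https://example-insurer.com/quote")          # placeholder URL
        driver.find_element(By.NAME, "name").send_keys(name)     # placeholder fields
        driver.find_element(By.NAME, "model").send_keys(car_model)
        driver.find_element(By.NAME, "submit").click()
        return driver.find_element(By.ID, "quote").text          # placeholder element
    finally:
        driver.quit()

@app.route("/quote")
def quote():
    # The API: the client's inputs come in, the Selenium service runs server side.
    price = get_quote(request.args["name"], request.args["model"])
    return jsonify({"quote": price})
```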
Related
I'm trying to get some tables with specific filters from this QlikView page, for future analysis: http://transferenciasabertas.planejamento.gov.br/QvAJAXZfc/opendoc.htm?document=painelcidadao.qvw&lang=en-US&host=QVS%40srvbsaiasprd01&anonymous=true
I don't want to do it manually (downloading tables for every filter), so I searched for Python APIs on the QlikView website, but only found Qlik Sense APIs for SSE (like this https://github.com/qlik-oss/server-side-extension).
Is there any chance that I could automate the retrieving process that I explained using Python?
Server-side extensions are used for something else. They extend Qlik's functionality to process data (for example, running some statistical functions on top of the displayed data if such functions do not exist in Qlik natively).
Interestingly, the portal link (http://transferenciasabertas.planejamento.gov.br) is a QlikView app that later redirects to Qlik Sense app(s). It seems that anonymous users are allowed on the platform (which makes automating data retrieval easier).
Qlik Sense communicates with the browser via web sockets. So the answer to your question is yes: you can use Python to connect to the underlying Qlik Sense Engine, make some selections, and get the data back.
The less good news is that I don't think there is a dedicated Python library, so you'll have to send the raw web socket requests yourself. The documentation for the Engine API can be found on Qlik's help site.
If you are open to a JS solution, you can use Qlik's enigma.js library for Engine communication.
The web socket traffic can be monitored from the browser (to see what data is being sent/received and its format).
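A rough sketch of sending one such raw request with the websocket-client package (pip install websocket-client); the endpoint URL and app id below are placeholders you would read from the browser's web socket traffic:

```python
# Rough sketch with the websocket-client package (pip install websocket-client).
# The endpoint URL and app id are placeholders taken from watching the browser traffic.
import json
import websocket

ws = websocket.create_connection("wss://your-qlik-server/app/engineData")

# JSON-RPC call to the Global object (handle -1) asking the Engine to open an app.
open_doc = {
    "jsonrpc": "2.0",
    "id": 1,
    "handle": -1,
    "method": "OpenDoc",
    "params": ["your-app-id"],
}
ws.send(json.dumps(open_doc))
print(json.loads(ws.recv()))
ws.close()
```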
I'm developing a web crawler in Python using the Django framework. I want it to work like a web app, meaning that if I open it in two different browser tabs, they should work independently, each having its own data (crawled + queued links). Both of them should start crawling from separate URLs and continue their work.
Currently I have designed a very simple version of it. It works in one tab, but does not work in another browser tab. I have even tried opening a new Chrome window, with the same results.
I'm not sure what feature or library I should use for this purpose. Can somebody help me?
You can pass some key in the URL:
URL pattern: <your_domain>/crawled/<key>
You can then open each URL in a different tab:
TAB1: <your_domain>/crawled/abcd
TAB2: <your_domain>/crawled/xyz
Or you can send some key in request.GET.
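A hypothetical Django sketch of that idea; the view name, URL pattern and query parameter are illustrative only:

```python
# Hypothetical sketch: each tab gets its own crawl "session" keyed by the URL.
# views.py
from django.http import JsonResponse

def crawl(request, key):
    # 'key' distinguishes the crawl started in each tab (e.g. /crawled/abcd vs
    # /crawled/xyz); real crawl state would live in the database keyed by 'key'.
    start_url = request.GET.get("url", "")   # optional start URL via request.GET
    return JsonResponse({"session": key, "start_url": start_url})

# urls.py (illustrative):
# from django.urls import path
# from . import views
# urlpatterns = [path("crawled/<str:key>/", views.crawl, name="crawl")]
```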
I would create a default page for your app which is a form that accepts one or more URLs to crawl.
When the 'submit' button is pressed, the list of URLs is stored in the database and a background process, using something such as Celery, works through the queue of URLs.
You don't say anything about how the results of the crawl are to be stored/presented, so I'm assuming you just want to kickstart the crawl and the pages are stored in some way by the code crawling the sites - with no response sent to the web page.
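A minimal sketch of that queue idea, assuming Celery is already wired into the Django project; the task name and the use of requests are illustrative, not from the question:

```python
# Minimal sketch, assuming Celery is already configured for the Django project.
# The task name and use of 'requests' are illustrative only.
import requests
from celery import shared_task

@shared_task
def crawl_url(url):
    # Fetch the page in a background worker; persisting the result is up to you.
    response = requests.get(url, timeout=30)
    # ... store response.text in the database keyed by url ...
    return response.status_code

# In the form view, after saving the submitted URLs:
# for url in submitted_urls:
#     crawl_url.delay(url)   # queued for the worker, not run in the request
```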
I was going through Google's api-python-client library and the Google Analytics API. I was able to do all the steps mentioned in the official docs, but then I got some doubts. Since I've never done this kind of thing before, I need your valuable suggestions/tips.
My Goal:
I want to design a web application in Python (using Django/Flask) and google-api-python-client. I have a few metrics (coming from my web e-commerce product, which uses GA) and I'm not sure whether the Google Analytics dashboard supports that level of detail by default, so I will use Google's Analytics API to customize the data according to my needs and show it in my analytics web app (which can be accessed by anyone).
Doubts/Queries:
1) First of all, which reporting API would I need for the goal mentioned here: the Core Reporting API or the Metadata API?
2) While I was setting up the project and client key, I chose the 2nd option (OAuth 2.0 client ID).
Is that OK, or should I choose a service account? Once I selected the 2nd option there were a couple of radio buttons (web, android, iOS, other, etc.); I chose 'other', or should I have chosen 'web'?
3) Once I chose the 'other' option from the radio button list, I executed my script and it prompted a browser to ask for permission, which I allowed. My question is: if I put my application into production there would not be any browser, so what would happen in that case?
I would really appreciate it if you could help me with these queries. Sorry for the long question; this is my first one.
PS: The bottom line is how one should structure and develop their analytics web application in general.
The key thing to understand is that Google Analytics is an authenticated API. It is designed to make it easy for the end user to access their own data, and to make it hard for the end user to access data they do not own.
If you are building a web application to allow your users to access their own private data, it is recommended that you use a client-side authentication method, such as in this example or this example.
If you are trying to build a web application that shares your private data with your users, there are a few ways to go about it:
You could collect the data server side in Python using a service account (note that you will have to add the service account to the GA account you want it to have access to); a rough sketch of this follows at the end of this answer.
You could take a hybrid approach and have a service account generate an access token, then use the Embed API to actually make the query.
In the end, I would encourage you to spend some time reading Using OAuth 2.0 to Access Google APIs, understand the scenarios discussed, and ask yourself which of them will work best for your application.
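For the first option, a rough sketch of the server-side service-account approach with google-api-python-client and the Analytics Reporting API; the key file path, view ID and metric are placeholders, and the service account must first be added to the GA account/view:

```python
# Rough sketch of the service-account option, using google-api-python-client and
# the Analytics Reporting API. Key file path, view ID and metric are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)           # placeholder key file

analytics = build("analyticsreporting", "v4", credentials=credentials)
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": "123456789",                        # placeholder view ID
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}],   # placeholder metric
    }]
}).execute()
print(response)
```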
I was wondering if it would be possible to write an application using Google App Engine and Python to create a basic calculator? However, the real question is whether it would be possible to have the calculator do the math without having to refresh the page.
To be more specific, I mean if there is an input box that a formula can be entered into (let's say, for example, the user inputs 2 + 2) and then the user clicks a submit or calculate button, can the answer to the inputted problem be shown without the webpage having to refresh itself? If so, would it be possible to go about this without using AJAX? A very brief suggestion on how to go about this, or a link to an application and its source code that updates things without refreshing the page, would be greatly appreciated!
Thanks in advance for your answers!
To make a calculator in a browser that does not involve a page refresh, your best bet is to learn JavaScript. Searching Google or Stack Overflow for JavaScript tutorials will give you lots of options to work from. You don't need to learn Python or App Engine to create the calculator. You could use App Engine to serve the JavaScript, but you wouldn't need to write any Python just to serve static content like that.
Using GAE you can provide real-time updates with "Channels": http://code.google.com/appengine/docs/python/channel/
The Channel API creates a persistent connection between your application and Google servers, allowing your application to send messages to JavaScript clients in real time without the use of polling.
Interesting enough for your purpose.
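A hedged sketch of the server side of that (the Channel API was a Python 2 App Engine feature and has since been deprecated); the client id and message are illustrative:

```python
# Hedged sketch of the server side, on the (Python 2) App Engine runtime where the
# Channel API lived; it has since been deprecated. The client id is illustrative.
from google.appengine.api import channel

client_id = "calc-client"
token = channel.create_channel(client_id)   # hand this token to the JavaScript client
# Later, push a computed result to the open page without polling or a refresh:
channel.send_message(client_id, '{"answer": 4}')
```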
The site that I'm trying to scrape uses JS to create a cookie. What I was thinking is that I could create the cookie in Python and then use it to scrape the site. However, I don't know of any way of doing that. Does anybody have any ideas?
Please see Python httplib2 - Handling Cookies in HTTP Form Posts for an example of adding a cookie to a request.
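For instance, a small sketch of sending a hand-built cookie with httplib2, assuming you have already worked out the value the site's JavaScript would normally set; the URL and cookie value are placeholders:

```python
# Small sketch: sending a hand-built cookie with httplib2. The URL and cookie value
# are placeholders; you would first work out what the site's JavaScript sets.
import httplib2

http = httplib2.Http()
headers = {"Cookie": "session_token=abc123"}   # illustrative cookie
response, content = http.request("http://example.com/page", "GET", headers=headers)
print(response.status)
```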
I often need to automate tasks in web based applications. I like to do this at the protocol level by simulating a real user's interactions via HTTP. Python comes with two built-in modules for this: urllib (higher level Web interface) and httplib (lower level HTTP interface).
If you want to do more involved browser emulation (including setting cookies), take a look at mechanize. Its simulation capabilities are almost complete (no JavaScript support, unfortunately): I've used it to build several scrapers with much success.
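For illustration, a short mechanize sketch; the URL and form field names are placeholders:

```python
# Short mechanize sketch (pip install mechanize); it keeps cookies across requests
# automatically. URL and form field names are placeholders.
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)           # skip robots.txt for this illustration
br.open("http://example.com/login")   # placeholder URL
br.select_form(nr=0)                  # first form on the page
br["username"] = "user"               # placeholder field names
br["password"] = "secret"
response = br.submit()
print(response.read()[:200])
```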