Making urllib requests in Python from the client side

I've written a Python application that makes web requests using the urllib2 library and then scrapes the data. I could deploy it as a web application, but then all the urllib2 requests would go through my web server, which risks the server's IP being banned because of the high volume of requests made on behalf of many users. The other option is to create a desktop application, which I don't want to do. Is there any way I could deploy my application so that the web requests are made from the client side? One idea was to use Jython to create an applet, but I've read that Java applets can only make web requests to the server they were deployed from, and the only way to circumvent this is a server-side proxy, which brings us back to the problem of the server's IP getting banned.
This might sound like an impossible situation and I'll probably end up creating a desktop application, but I thought I'd ask in case anyone knows of an alternative solution.
Thanks.

You can use a signed Java applet; signed applets can use the Java security mechanism to get access to any site.
This tutorial explains exactly what you have to do: http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html
The same might be possible from a Flash applet. JavaScript, AFAIK, is also restricted to the originating site and doesn't offer signing or security exceptions like this.

You could probably use AJAX requests made from JavaScript running on the client side.
Use server → client communication to send the commands and data needed to make a request,
…and then use AJAX from the client to the third-party server.

This depends on the form of "scraping" you intend to do:
You might run into problems running an AJAX call to a third-party site. Please see Screen scraping through AJAX and javascript.
An alternative would be to do it server-side, but to cache the results so that you don't hit the third-party server unnecessarily.
Check out diggstripper on Google Code.
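A minimal sketch of that caching idea, assuming a hypothetical fetch_page() helper on the server; cached pages expire after ten minutes so the third-party site is only hit when the cached copy is stale:

    # Hypothetical server-side cache in front of urllib2; names and TTL are made up.
    import time
    import urllib2

    CACHE = {}        # url -> (fetch_time, html)
    CACHE_TTL = 600   # seconds

    def fetch_page(url):
        """Return the page for url, reusing a cached copy while it is still fresh."""
        now = time.time()
        cached = CACHE.get(url)
        if cached and now - cached[0] < CACHE_TTL:
            return cached[1]
        html = urllib2.urlopen(url).read()
        CACHE[url] = (now, html)
        return html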

Related

How to make requests from back-end to another server on user’s localhost

I’ve got a standard client-server set-up with ReScript (ReasonML) on the front-end and a Python server on the back-end.
The user is running a separate process on localhost:2000 that I’m connecting to from the browser (UI). I can send requests to their server and receive responses.
Now I need to issue those requests from my back-end server, but cannot do so directly. I’m assuming I need some way of doing it through the browser, which can talk to localhost on the user’s computer.
What are some conceptual ways to implement this (ideally with GraphQL)? Do I need to have a subscription or web sockets or something else?
Are there any specific libraries you can recommend for this (perhaps as examples from other programming languages)?
I think the easiest solution with GraphQL would indeed be to use Subscriptions. The most common ReScript GraphQL clients already support them; at least ReasonRelay, Reason Apollo Hooks and Reason-URQL do.

Python - Sending Data to a .NET Web Service with Authentication

I am working on trying to send XML data to a .NET web service, from an Asterisk Linux box, using Python.
The problems I am running into are:
-- if I use httplib, I can send the XML data, but I cannot authenticate first.
-- if I use httplib2, I can authenticate, but I can't get it to send the data.
In the end, I just need to periodically send data to a client's web service. On my end, I'm not concerned with the response, just sending the data.
Any thoughts on this? Maybe I'm pursuing the wrong path using HTTPlib? Thanks for any assistance at all. I'm not opposed to using a different language; Python just seemed the simplest when this project started.
A very popular module for this is Requests:
http://docs.python-requests.org/en/latest/
It works great for me for all sorts of requests, with and without authentication.
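As a rough sketch of how that could look here (the URL, credentials and payload are placeholders, and the service may require NTLM rather than Basic auth):

    # Hypothetical example: POST XML to a .NET web service with authentication
    # using Requests. The endpoint, credentials and payload are placeholders.
    import requests

    xml_payload = "<data><temperature>21.5</temperature></data>"

    response = requests.post(
        "https://example.com/Service.asmx/Upload",   # placeholder endpoint
        data=xml_payload,
        headers={"Content-Type": "text/xml"},
        auth=("username", "password"),               # HTTP Basic auth; the requests_ntlm package exists for NTLM
        timeout=10,
    )
    response.raise_for_status()   # raise if the service rejected the request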

Python application communicating with a web server? Ideas?

I'm looking for a bit of web development advice. I'm fairly new to the area but I'm sure there are some gurus out there willing to part with some wisdom.
Objective: I'm interested in controlling a Python application on my computer from my personal web hosted site. I know, this question has been asked several times before but in each case the requirements were a bit different from my own. To reduce the length of this post I'll summarize my objective in a few bullet points:
Personal site is hosted by a web hosting company
Site uses HTML, PHP, MySQL, Python and JavaScript; the majority of it is coded by me from the ground up
An application that is coded in Python will run on a PC within my home and will communicate with an Arduino board
The app will receive commands from the internet to control actuation via the Arduino, and will transmit sensor data back to the site (such as temperature)
Looking for the communication to be bi-directional, fast and secure
Securing the connection between site and Python app would be most ideal
I'm not looking to connect to the Python application directly, the web server must serve as the 'middle man'
So far I've considered HTTP Post and HTML forms, using sockets (Python app would run as a web server), an IRC bot and reading/writing to a text file stored on the web server.
I was also hoping to have a way to communicate with the Python app without needing to refresh the webpage, perhaps using AJAX or JavaScript? Maybe with Flash?
Is there something I'm not considering? I feel like I'm missing something. Thanks in advance for the advice!
Just thinking out loud about how I would start out with this. First, regarding the website itself, you can use whatever is easiest for you or for the environment you're in. For example, a basic PHP page will do just fine, but if you can get the site running in Python as well, I'd prefer using the same language throughout.
That said, I'm not sure why you would need a hosted website. Given that you're already forced to have an externally accessible PC at home for the communication, why not run a webserver on that directly (Apache, Nginx, or even something like CherryPy should do)? That webserver can then communicate with the Python process that controls your Arduino (by using e.g. Python's xmlrpclib). If you run things via the hosting company, you still need some process that can handle external requests securely... something a webserver is quite good at. Running it yourself gives you all the freedom you want, and simplifies things by reducing the number of components in your solution.
The updates on your site I'd keep quite basic: commands you want to run can be handled in the webserver's request handlers by making the relevant xmlrpclib calls. Dynamically updating the page is best done with some AJAX calls, I reckon. Based on your description, these updates are easily put in a JSON object, suitable for periodically updating only the relevant segments of your page.
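A minimal sketch of that xmlrpclib idea (Python 2, as in the answer): the Arduino-controlling process exposes a couple of methods over XML-RPC. The method names, port and return values are made up for illustration.

    # Hypothetical XML-RPC service run by the Arduino-controlling process.
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    def set_actuator(state):
        # ...talk to the Arduino over serial here...
        return True

    def read_temperature():
        # ...ask the Arduino for the latest sensor reading...
        return 21.5

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(set_actuator)
    server.register_function(read_temperature)
    server.serve_forever()

The webserver's request handler would then do something like xmlrpclib.ServerProxy("http://localhost:8000").read_temperature() and hand the result back to the page as JSON for the AJAX call.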

Creating a Python web server to receive XML HTTP Requests

I am currently working on a project to create a simple file-uploader site that will update the user on the progress of an upload.
I've been attempting this in pure Python (with CGI) on the server side, but to get the progress of the file I obviously need to send requests to the server continually. I was looking to use AJAX for this, but I was wondering how hard it would be to, instead of changing to some other framework (web.py for instance), just write my own web server for receiving the XML HTTP Requests?
My main problem is that sending the request is done from HTML and Javascript so it all seems like magic trickery at the moment.
Can anyone advise me as to the best way to go about receiving these requests on the server?
EDIT: It seems that a framework would be the way to go. Would web.py be a good route to take?
I would recommend using a microframework, like Sinatra for Ruby. There are several equivalents for Python (see "What Python equivalent of Sinatra would you recommend?").
Such a framework allows you to simply map a single method to a route.
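For example, a route in Bottle (one of the Python Sinatra-equivalents) could answer the progress polls like this; the route and the progress lookup are invented placeholders, not part of any existing project:

    # Hypothetical Bottle route answering the XMLHttpRequest progress polls.
    from bottle import Bottle, run

    app = Bottle()

    def get_upload_progress(upload_id):
        # Placeholder: look up the real progress wherever the upload handler records it.
        return 42

    @app.route("/progress/<upload_id>")
    def progress(upload_id):
        # Bottle serializes the returned dict to JSON automatically.
        return {"upload_id": upload_id, "percent": get_upload_progress(upload_id)}

    run(app, host="localhost", port=8080)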
Writing a very basic HTTP server won't be very hard (see http://docs.python.org/library/simplehttpserver.html for an example), but you will be missing many features that are provided by real servers and web frameworks.
For your project, I suggest you pick one of the many Python web frameworks and run your application behind Apache/mod_wsgi.
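To give an idea of what writing a very basic HTTP server looks like, here is a minimal sketch using Python 2's BaseHTTPServer (the module the linked SimpleHTTPServer builds on); it only answers GET /progress with a hard-coded JSON body:

    # Minimal hand-rolled server (Python 2); everything else a framework gives you
    # (routing, POST parsing, error handling) you would have to add yourself.
    import json
    from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler

    class ProgressHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/progress":
                body = json.dumps({"percent": 42})   # placeholder progress value
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_error(404)

    HTTPServer(("localhost", 8080), ProgressHandler).serve_forever()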
There's absolutely no need to write your own web server. Plenty of options exist, including lightweight ones like nginx.
You should use one of those, and either your own custom WSGI code to receive the request, or (better) one of the microframeworks like Flask or Bottle.
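For the "custom WSGI code" option, the bare minimum is a single callable; a sketch (the JSON body is just a stand-in):

    # Smallest possible WSGI application; run it behind nginx/uWSGI or gunicorn,
    # or with the stdlib server below for local testing.
    from wsgiref.simple_server import make_server

    def application(environ, start_response):
        # environ["PATH_INFO"] and environ["REQUEST_METHOD"] describe the XHR.
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b'{"percent": 42}']

    if __name__ == "__main__":
        make_server("localhost", 8081, application).serve_forever()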

AppEngine fetch through a free proxy

My (Python) AppEngine program fetches a web page from another site to scrape data from it, but it seems the third-party site is blocking requests from Google App Engine: I can fetch the page in development mode, but not when deployed.
Can I get around this by using a free proxy of some sort?
Can I use a free proxy to hide the fact that I am requesting from App Engine?
How do I find/choose a proxy? -- what do I need? -- how do I perform the fetch?
Is there anything else I need to know or watch out for?
Probably the correct approach is to request permission from the owners of the site you are scraping.
Even if you use a proxy, there is still a big chance that requests coming through the proxy will end up blocked as well.
Have you considered changing the user-agent?
from google.appengine.api import urlfetch
result = urlfetch.fetch(u, headers={'User-Agent': 'Mozilla/5.0'}, allow_truncated=True)
The API will always append "AppEngine-Google;" to the user-agent, but this might work if the restriction is not based on an IP address range.
What you are talking about is a confirmed bug in the App Engine SDK. Have a look at http://code.google.com/p/googleappengine/issues/detail?id=544 for bug updates and workarounds for Java and Python.
I'm currently having the same problem and I was thinking about this solution (not yet tried):
-> develop an app that fetches what you want
-> run it locally
-> fetch your local server from your initial app
That way the proxy is your own computer, which you know is not blocked.
Let me know if it works!
Well to be fair, if they don't want you doing that then you probably shouldn't. It's not nice to be mean.
But if you really want to do it, the best approach would be creating a simple proxy script and running it on a VPS or some computer with a decent enough connection.
Basically you expose a REST API from that server to your GAE app; the server then makes the same requests it receives to the target site and returns the output.
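A rough sketch of that relay, assuming Flask on the VPS (the /fetch endpoint, the whitelist and the port are invented, and you would want to add authentication before exposing it):

    # Hypothetical relay: GAE calls /fetch?url=..., the VPS fetches the page
    # with its own (unblocked) IP and returns the body.
    import urllib2
    from flask import Flask, request, abort

    app = Flask(__name__)
    ALLOWED_HOSTS = ("example.com",)   # only relay to sites you expect to scrape

    @app.route("/fetch")
    def fetch():
        url = request.args.get("url", "")
        if not any(host in url for host in ALLOWED_HOSTS):
            abort(403)
        return urllib2.urlopen(url, timeout=10).read()

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)

On the App Engine side you would then urlfetch.fetch() the VPS URL, passing the target URL as a query parameter.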
