My (Python) AppEngine program fetches a web page from another site to scrape data from it -- but it seems like the 3rd party site is blocking requests from Google App Engine! -- I can fetch the page from development mode, but not when deployed.
Can I get around this by using a free proxy of some sort?
Can I use a free proxy to hide the fact that I am requesting from App Engine?
How do I find/choose a proxy? -- what do I need? -- how do I perform the fetch?
Is there anything else I need to know or watch out for?
Probably the correct approach is to request permission from the owners of the site you are scraping.
Even if you use a proxy, there is still a big chance that requests coming through the proxy will end up blocked as well.
Have you considered changing the user-agent?
result = urlfetch.fetch(u,headers = {'User-Agent': "Mozilla/5.0"},allow_truncated=True)
The API will always append "AppEngine-Google;" to the user-agent, but this might work if the restriction is not based on a IP address range.
What you are talking about is a valid bug in app engine sdk. Have a look at http://code.google.com/p/googleappengine/issues/detail?id=544 for bug updates, and workarounds for java and python.
I'm currently having the same problem and i was thinking about this solution (not yet tried) :
-> develop an app that fetch what you want
-> run it locally
-> fetch your local server from your initial
so the proxy is your computer which you know as not blocked
Let me know if it's works !
Well to be fair, if they don't want you doing that then you probably shouldn't. It's not nice to be mean.
But if you really want to do it, the best approach would be creating a simple proxy script and running it on a VPS or some computer with a decent enough connection.
Basically you expose a REST API from your server to your GAE, then the server just makes all the same requests it gets to the target site and returns the output.
Related
Im creating a server in Python that receives POST requests, process the information in the request using some scripts (sometimes using a database) and send back a answer in JSON format. Im searching for a way to run this server and code in the cloud, in a way that i dont need my PC turned on for it to work, because my connection is very unstable.
There are a lot of web hosting companies out there, you just need to find the one that is right for you.
My personal favorite for python apps is heroku, but there are many out there. AWS is another popular one.
In future when asking questions, try to do more research before hand, and try to be more specific with questions. It would have been useful to know what kind of database you are using, or whether you're using flask or django.
I’ve got a standard client-server set-up with ReScript (ReasonML) on the front-end and a Python server on the back-end.
The user is running a separate process on localhost:2000 that I’m connecting to from the browser (UI). I can send requests to their server and receive responses.
Now I need to issue those requests from my back-end server, but cannot do so directly. I’m assuming I need some way of doing it through the browser, which can talk to localhost on the user’s computer.
What are some conceptual ways to implement this (ideally with GraphQL)? Do I need to have a subscription or web sockets or something else?
Are there any specific libraries you can recommend for this (perhaps as examples from other programming languages)?
I think the easiest solution with GraphQL would be to use Subscriptions indeed, the most common Rescript GraphQL clients already have such a feature, at least ReasonRelay, Reason Apollo Hooks and Reason-URQL have it.
This is my first post. I tried to google and find my own solution, but I don’t even know where to start with this. I want to make a phone app (with Unity(C#) ideally) and eventually a website, while also hosting a web app (python) that the phone app can send basic info (ideally a json or dict) that the web app can do some processing and database referencing and just send back a short answer or two (again ideally a json or dict). I looked into bottlepy and just communicating through http but this seems clunky and I’m sure is not the best way. I don't need (although would appreciate) a full explanation, just the term/protocol/technology I am looking for. Thanks in advance for any help!
You should look at the UnityWebRequest : https://docs.unity3d.com/Manual/UnityWebRequest.html
And maybe you should try to use a REST API on your server to communicate with a UnityWebRequest : https://codetolight.wordpress.com/2017/01/18/unity-rest-api-interaction-with-unitywebrequest-and-besthttp/
I'm looking for a bit of web development advice. I'm fairly new to the area but I'm sure there are some gurus out there willing to part with some wisdom.
Objective: I'm interested in controlling a Python application on my computer from my personal web hosted site. I know, this question has been asked several times before but in each case the requirements were a bit different from my own. To reduce the length of this post I'll summarize my objective in a few bullet points:
Personal site is hosted by a web hosting company
Site uses HTML, PHP, MySQL, Python and JavaScript, the majority of everything is coded by me from the ground up
An application that is coded in Python will run on a PC within my home and will communicate with an Arduino board
The app will receive commands from the internet to control actuation via the Arduino, and will transmit sensor data back to the site (such as temperature)
Looking for the communication to be bi-directional, fast and secure
Securing the connection between site and Python app would be most ideal
I'm not looking to connect to the Python application directly, the web server must serve as the 'middle man'
So far I've considered HTTP Post and HTML forms, using sockets (Python app would run as a web server), an IRC bot and reading/writing to a text file stored on the web server.
I was also hoping to have a way to communicate with the Python app without needing to refresh the webpage, perhaps using AJAX or JavaScipt? Maybe with Flash?
Is there something I'm not considering? I feel like I'm missing something. Thanks in advance for the advice!
Just thinking out loud for how I would start out with this. First, regarding the website itself, you can just use what's easiest to you, or to the environment you're in. For example, a basic PHP page will do just fine, but if you can get a site running in Python as well, I'd prefer using the same language all over.
That said, I'm not sure why you would need to use a hosted website? Given that you're already forced to have a externally accessible PC at home for the communication, why not run a webserver on that directly (Apache, Nginx, or even something like CherryPy should do)? That webserver can then communicate with the python process that is running to control your Arduino (by using e.g. Python's xmlrpclib). If you would run things via the hosting company, you would still need some process that can handle external requests securely... something a webserver is quite good at. Just running it yourself gives you all the freedom you want, and simplifies things by lessening the number of components in your solution.
The updates on your site I'd keep quite basic: commands you want to run can be handled in the request handlers of the webserver by just calling the relevant (xmlrpclib) calls. Dynamically updating the page is best done by some AJAX calls I reckon. Based on your story, these updates are easily put in a JSON object, suitable for periodically updating only the relevant segments of your page.
I've written a Python application that makes web requests using the urllib2 library after which it scrapes the data. I could deploy this as a web application which means all urllib2 requests go through my web-server. This leads to the danger of the server's IP being banned due to the high number of web requests for many users. The other option is to create an desktop application which I don't want to do. Is there any way I could deploy my application so that I can get my web-requests through the client side. One way was to use Jython to create an applet but I've read that Java applets can only make web-requests to the server it is deployed on and the only way to to circumvent this is to create a server side proxy which leads us back to the problem of the server's ip getting banned.
This might sounds sound like and impossible situation and I'll probably end up creating a desktop application but I thought I'd ask if anyone knew of an alternate solution.
Thanks.
You can use a signed Java applet, they can use the Java security mechanism to enable access to any site.
This tutorial explains exactly what you have to do: http://www-personal.umich.edu/~lsiden/tutorials/signed-applet/signed-applet.html
The same might be possible from a Flash applet. Javascript is also restricted to the published site and doesn't allow being signed or security exceptions like this, AFAIK.
You probably can use AJAX requests made from JavaScript that is a part of client-side.
Use server → client communication to give commands and necessary data to make a request
…and use AJAX communication from client to 3rd party server then.
This depends on the form of "scraping" you intend to do:
You might run into problems running an AJAX call to a third-party site. Please see Screen scraping through AJAX and javascript.
An alternative would be to do it server-side, but to cache the results so that you don't hit the third-party server unnecessarily.
Check out diggstripper on google code.