I want to create a Python web scraper that gets and formats some data for me and outputs it in JSON format so that other web pages can access it. I want to put this service on one of the free Python hosts out there.
Because this is my first Python project, I have some questions.
Should I use one of the Python web frameworks for this? As I am not really concerned about security (I will have only a couple of pages with one input), I thought about leaving it as just a script.
I do need a small database. What library can you suggest for this?
Are there cron jobs on Python web servers?
Do free servers allow scraping a site every X minutes?
I have Python 2.7 as the default on my Linux machine. Can/should I work with it, or should I try to get a newer version up and running?
Yes, a framework makes life easier. But you have to check which frameworks can be used on the free server; sometimes you can't install your own modules.
SQLite doesn't need installation. MySQL and Postgres are mostly preinstalled on servers, but you have to check.
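For example, SQLite is just a file handled by Python's standard library, so nothing extra has to be installed on the host:

```python
# SQLite ships with Python's standard library -- the database is a plain
# file, so there is nothing to install on the server.
import sqlite3

conn = sqlite3.connect("scraper.db")  # creates the file if it doesn't exist
conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT, body TEXT)")
conn.execute("INSERT INTO pages VALUES (?, ?)", ("http://example.com", "..."))
conn.commit()
conn.close()
```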
Mostly yes, but you have to check.
Some servers may not allow scraping other sites at all, so again you have to check.
Use whichever Python version is installed on the server, so you have to check that too.
Some free servers run your page 18 hours a day and freeze it for the other 6, but you have to check.
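To give you an idea of the framework route, here is a minimal sketch of the scraper-as-a-service idea using Flask, requests and BeautifulSoup (all assumptions on my part -- check that the host lets you install them):

```python
# A minimal sketch, not a production setup: scrape a page and expose the
# result as JSON. The target URL and the "h2" selector are placeholders.
import requests
from bs4 import BeautifulSoup
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/data")
def data():
    html = requests.get("https://example.com").text
    soup = BeautifulSoup(html, "html.parser")
    titles = [h.get_text(strip=True) for h in soup.find_all("h2")]
    return jsonify(titles=titles)  # other pages can fetch this as JSON

if __name__ == "__main__":
    app.run()
```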
My team and I have set up an account with Hostinger and have a VPS set up with its own domain. Our current operating system is CentOS 7 64-bit with Webmin/Virtualmin/LAMP, and we have Webmin set up as our control panel. As of right now we have our HTML pages showing, but our Python code is not working.
We used SSH to install Python 3, MongoDB, pymongo, and Flask, but are still having trouble getting our Python code to work on our web application. From here we are unsure what to do and need guidance on what our next steps should be. Thank you in advance for any help given.
It sounds like what you've gone for on your VPS is a web hosting setup rather than a bare-metal VPS setup. I can see why you'd think you want web hosting, but Flask works differently: it is its own application that needs to run, rather than something served like an HTML page.
There is an excellent tutorial on how to do this here. It is designed for Ubuntu (which is a good setup if you are starting fresh), but there are also versions for other Linux flavours.
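To make the distinction concrete, here is a minimal Flask app (a sketch, assuming nothing about your project): it is a long-running process you start yourself, or hand to a WSGI server such as gunicorn, not a file Apache serves.

```python
# Run with: python app.py -- this starts its own server process;
# Apache doesn't serve this file the way it serves an HTML page.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def index():
    return "Hello from Flask!"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```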
As a training project I have made an activity tracker using Python (no GUI, command-line only).
The script uses win32gui/pywin32 and pyautogui to check which program is currently in use and, if it is a web browser, which website is open.
The window name, the date, and the amount of time spent on each program/website are stored in an sqlite3 database.
Then, with the help of the pandas module, identical names are grouped and the time is summed up.
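The grouping step looks roughly like this (table and column names simplified for illustration):

```python
# Illustrative version of the aggregation step; the real table/column
# names in my database may differ.
import sqlite3
import pandas as pd

conn = sqlite3.connect("tracker.db")
df = pd.read_sql_query("SELECT window_name, seconds FROM activity", conn)
conn.close()

# Group identical window names and sum up the time spent on each.
totals = df.groupby("window_name")["seconds"].sum().sort_values(ascending=False)
print(totals)
```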
I want to convert this script into a web app using Django, but I am a beginner at building web apps, so I am wondering: is it possible to use these modules within Django, and is it even possible to create a web app that works the same way as the script described above?
Sorry if the question is trivial. I will be grateful for every tip on where and what to look for on this topic.
Nope.
You can't have a server that connects through a web client/browser and sees the other processes on the client's machine. That's a security issue, a big one.
By the way, some GNU/Linux desktop environments (Wayland) won't even allow the same user's processes, in the same logged-in session on the same computer, to see each other.
I understand that this is never to be done. But I have a situation where I need to get something done real quick. I have to build a website where maybe 200 people would register for an event. I need to present a simple registration form. Very basic functionality: register and view the list of registrants. Very few hits. It would be live for about a month or so.
I know a little bit of Django, which would allow me to put this together quickly. However, I have only worked with the Django development server.
My problem is setting up Apache to work with Django. I understand that, for Django, I need mod_wsgi installed. I have a VPS, but mod_wsgi is not installed, and I have asked my hosting provider to install it for me. Even if I can get mod_wsgi installed, it appears that configuring it may take me some time.
I have the following questions.
Can I run this website on the Django development server? Will it hold up for very light traffic?
If I do, how do I get traffic to go from port 80 to the development server's port? From the landing page, I can have the port number added to all the subsequent URLs.
I would also appreciate some guidance on getting Django to work with mod_wsgi.
Thanks
I use Cloud9 for development. It is essentially a cloud Ubuntu 14 virtual machine, so it gives you a real URL when the Django server is running (on port 80). Another use case for Cloud9 is university classes, which is similar to your event use case. You can set up your Django project there for free and people can reach the page normally. But there are occasional workspace restarts that prevent it from being a real server. If you pay 20 bucks per month they give you two premium workspaces where they assure this never happens. Still, I'm not sure this is a good idea: I can't even imagine what kind of errors you would get if all 200 people chose to log in at the same time, for example.
Another way to go is to create a free Amazon AWS account (or DigitalOcean) and do your deployment there. AWS has a one-year free tier if you run only one micro instance with a particular setup, which is plenty of time for your use case. I open the instance on AWS and SSH into it with Cloud9, so it feels like development even in production. I'm far from a DevOps expert, but I managed to deploy Nginx, gunicorn and Django on AWS following this tutorial. You can certainly do it too, but it is a lot of work.
I left my preferred choice for your use case to the end: PythonAnywhere. It has a free tier and is really easy to set up. You follow some very basic steps (doing stuff with mod_wsgi that I still don't understand) and make it work in minutes. It's a whole business dedicated to serving Python programs.
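For what it's worth, the mod_wsgi piece mostly boils down to pointing the server at a small Python entry point. A minimal sketch, where myproject is a placeholder for your real project name (recent Django versions generate this file for you with startproject):

```python
# myproject/wsgi.py -- the entry point that mod_wsgi (or gunicorn) calls.
# "myproject" is a placeholder for your actual project name.
import os

from django.core.wsgi import get_wsgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
application = get_wsgi_application()
```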
Hope this helps
I have a web-crawling Python script that takes hours to complete and is infeasible to run in its entirety on my local machine. Is there a convenient way to deploy this to a simple web server? The script basically downloads webpages into text files. How would this best be accomplished?
Thanks!
Since you said that performance is a problem and you are doing web scraping, the first thing to try is the Scrapy framework - a very fast and easy-to-use web-scraping framework. The scrapyd tool would then allow you to distribute the crawling: you can have multiple scrapyd services running on different servers and split the load between them (a minimal spider sketch follows the links below). See:
Distributed crawls
Running Scrapy on Amazon EC2
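For reference, a minimal spider might look like this (a sketch only -- the URLs are placeholders, and it just dumps each page into a text file like your script does):

```python
# Minimal Scrapy spider sketch: download each start URL and save the raw
# body to a text file. Run with: scrapy runspider spider.py
import scrapy

class PageSpider(scrapy.Spider):
    name = "pages"
    start_urls = ["https://example.com/page1", "https://example.com/page2"]

    def parse(self, response):
        # Derive a crude filename from the URL and dump the raw bytes.
        filename = response.url.rstrip("/").split("/")[-1] + ".txt"
        with open(filename, "wb") as f:
            f.write(response.body)
        self.log("Saved %s" % filename)
```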
There is also a Scrapy Cloud service out there:
Scrapy Cloud bridges the highly efficient Scrapy development environment with a robust, fully-featured production environment to deploy and run your crawls. It's like a Heroku for Scrapy, although other technologies will be supported in the near future. It runs on top of the Scrapinghub platform, which means your project can scale on demand, as needed.
As an alternative to the solutions already given, I would suggest Heroku. You can deploy not only a website easily, but also scripts for bots to run.
The basic account is free and pretty flexible.
This blog entry, this one and this video contain practical examples of how to make it work.
There are multiple places where you can do that. Just google for "python in the cloud" and you will come up with a few, for example https://www.pythonanywhere.com/.
In addition, there are also several cloud IDEs that essentially give you a small VM for free, where you can develop your code in a web-based IDE and also run it in the VM; one example is http://www.c9.io.
In 2021, Replit.com makes it very easy to write and run Python in the cloud.
If you have a Google e-mail account you have access to Google Drive and its utilities. Choose Colaboratory (or find it under the "more..." options first). This "Colab" is essentially a Python notebook on Google Drive with full access to the files on your Drive, and with access to your GitHub as well. So, in addition to your local stuff, you can edit your GitHub scripts too.
I have developed a few Python programs that I want to make available online.
I am new to web services, and I am not sure what I need to do to create a service where somebody makes a request to a URL (for example) and the URL triggers a Python program that displays something in the user's browser, or where a set of inputs is given to the program via the browser and Python then does whatever it is supposed to do.
I was playing with Google App Engine, which runs fine with the tutorial, and I was planning to use it because it looks easy, but the problem with GAE is that it does not work well (or does not work at all) with some libraries that I plan to use.
I guess what I am trying to do is some sort of API using my WebFaction account.
Can anybody point me in the right direction? What choices do I have on WebFaction? What are the easiest tools available?
Thank you very much for your help in advance.
Cheers
Well, your question is a little bit generic, but here are a few pointers/tips:
WebFaction allows you to install pretty much anything you want (you may need to compile it yourself or ask the admins to install some CentOS package for you).
They provide a default Apache server with mod_wsgi, so you can run web2py, Django or any other WSGI framework (a minimal example of the callable these frameworks build on follows the tips below).
Most popular Python web frameworks have installers available on WebFaction (web2py, Django...), so I would recommend going with one of them.
I would also install supervisord to keep your service running after a reboot/crash/problem.
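To demystify the WSGI part a little: what Apache/mod_wsgi ultimately calls is a plain Python callable, and every framework mentioned above just builds one of these for you. A minimal hand-written sketch:

```python
# The bare WSGI interface: Apache/mod_wsgi imports this module and calls
# `application` for each request. Frameworks generate this callable for you.
def application(environ, start_response):
    body = b"Hello from Python!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```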
I would be glad to help if you have any more specific questions...