How to automate my Selenium web scraper? - python

I was able to write a scraping script using Selenium. Now I'm trying to automate it so that it runs periodically on a server and I don't have to run it from my local machine. I did a lot of googling but found no clue as to how to do that. Can anyone simplify things for me?

To run a Python script periodically on a Linux server, you can create a cron job. Since you already have a Python script, which most probably scrapes data and saves it to a file, you only need to tell cron when to run it, for instance every 2 hours. Start by opening your crontab:
crontab -e
This opens an editor in your terminal. At the bottom of the file, add a line with the schedule followed by the command to execute. For example, to run the script every 2 hours:
0 */2 * * * python3 /path/to/your/code.py
This link explains how to fill in the five schedule fields: https://crontab.guru/#*_*_*_*_1
If you need more help with cron jobs, take a look at this: https://www.geeksforgeeks.org/scheduling-python-scripts-on-linux/

You can simply recreate your script on PythonAnywhere and schedule it as a task, choosing how often you want it to run. The frequency options currently available are: always running, hourly, or daily.

I don't know if it'll really work, but:
import time

period_required = 2 * 60 * 60  # seconds between runs, e.g. 2 hours

while True:
    # whole code here
    time.sleep(period_required)
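One weakness of a bare loop is that a single unhandled exception ends it for good. A minimal sketch of a more forgiving version follows; `run_periodically`, `job` and `max_runs` are hypothetical names introduced here for illustration, not part of the answer above.

```python
import time
import traceback


def run_periodically(job, interval_seconds, max_runs=None):
    """Call job() repeatedly, sleeping interval_seconds between calls.

    max_runs is a hypothetical knob for testing; None means run forever.
    Returns the number of runs completed.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            job()
        except Exception:
            # Report the error and keep going so one bad run does not stop the loop.
            traceback.print_exc()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs
```

You would call it as `run_periodically(scrape, 2 * 60 * 60)`, where `scrape` stands for your own scraping function. The process still has to be kept alive on the server, which is why a scheduler such as cron is usually the simpler option.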

Related

Is there a python module to indicate if the .py is running?

I am new to python.
I created a simple Selenium bot that basically runs in a while loop.
However, it runs on an external machine that I do not have access to at all times.
It sometimes crashes unexpectedly, and it can take hours or even days until I check it again, only to find a crashed bot. With that said, I'd like to know if there is a monitoring module or program I can build into my bot, so I can easily check whether the program is running or not.
Some have suggested a WebSocket heartbeat. However, even after extensive research, I do not know how I would go about implementing such an intricate system. Thank you for taking the time to read my question.
Any answers appreciated!
You can just put a few lines of code at the appropriate point in your script (after the Selenium bot completes its work successfully) to write the current date and time into a file.
A second script can be scheduled just to read that file and ring an alarm bell (e.g. send you an email) if it is X hours since the timestamp was last updated.
Alternatively you could write a timestamp to a database table or use whatever other shared systems you have.
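The timestamp-file idea could be sketched roughly like this. The file name `heartbeat.txt`, the three-hour threshold and both function names are assumptions made for illustration; how you raise the alarm (email or otherwise) is left out.

```python
import time
from pathlib import Path

HEARTBEAT_FILE = Path("heartbeat.txt")  # hypothetical location


def write_heartbeat(path=HEARTBEAT_FILE):
    """Called by the bot after each successful cycle."""
    path.write_text(str(time.time()))


def is_stale(path=HEARTBEAT_FILE, max_age_seconds=3 * 3600, now=None):
    """Called by the second (scheduled) script.

    Returns True if the heartbeat is missing, unreadable or too old.
    """
    now = time.time() if now is None else now
    try:
        last = float(path.read_text())
    except (FileNotFoundError, ValueError):
        return True
    return now - last > max_age_seconds
```

The bot calls `write_heartbeat()` at the end of every loop iteration, and the scheduled checker sends its alert whenever `is_stale()` returns True.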

In a Python bot, how to run a function only once a day?

I have a Python bot running PRAW for Reddit. It is open source and thus users could schedule this bot to run at any frequency (e.g. using cron). It could run every 10 minutes, or every 6 hours.
I have a specific function (let's call it check_logs) in this bot that should not run on every execution of the bot, but rather only once a day. The bot does not have a database.
Is there a way to accomplish this in Python without external databases/files?
Generally speaking, it's better (and easier) to use an external database or file. But if you absolutely need to avoid that, you could also:
Modify the script itself, e.g. store the date of the last run in a commented-out last line of the script.
Store the date of the last update on the web. In your case, for example, it could be a Reddit post, a Google Doc, a draft email, or a site like Pastebin.
Change the "modified date" of the script itself and use it as a reference.
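The last option could look something like the sketch below. The name `should_run_daily` and its parameters are hypothetical; the code simply treats a file's modification time as the "last run" marker.

```python
import os
import time

ONE_DAY = 24 * 60 * 60


def should_run_daily(marker_path, now=None, min_interval=ONE_DAY):
    """Return True at most once per min_interval, using the file's mtime as state.

    marker_path is the file whose "modified date" serves as the marker;
    the answer suggests the script itself (__file__).
    """
    now = time.time() if now is None else now
    if now - os.path.getmtime(marker_path) < min_interval:
        return False
    # Record this run by moving the access/modification times to "now".
    os.utime(marker_path, (now, now))
    return True
```

In the bot this would be used as `if should_run_daily(__file__): check_logs()`. Note that editing or redeploying the script also resets the marker, so the daily function may be skipped for up to a day after an update.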
If you're using cron, you can run the script with command-line arguments.
Define two entries in cron, e.g. python3 main.py daily for the daily run that you need and python3 main.py frequent for the other version.
I do it that way and it has worked well so far.
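A minimal sketch of how main.py might dispatch on that argument is shown here. `regular_work` is a hypothetical placeholder, and the assumption that the daily run also performs the regular work is mine, not the answer's.

```python
import sys


def check_logs():
    # Placeholder for the once-a-day function named in the question.
    return "check_logs ran"


def regular_work():
    # Hypothetical placeholder for whatever the bot does on every execution.
    return "regular work ran"


def main(argv):
    """Dispatch on the first argument: 'daily' or 'frequent' (the default)."""
    mode = argv[1] if len(argv) > 1 else "frequent"
    results = [regular_work()]
    if mode == "daily":
        results.append(check_logs())
    return results


if __name__ == "__main__":
    print(main(sys.argv))
```

With this layout the script keeps no state at all: the scheduler decides which variant runs, so users who schedule the bot themselves need to add both cron entries.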

How can I get a python code to run over and over again?

I have a scraper that scrapes data from a website, then saves the data in .csv files. What I am looking for is a way to run this code every 10 minutes, without using a loop. I have very little knowledge on how to do this. What approach would you use?
Are you using Windows or a Unix-based system?
If you're using Unix, you can schedule jobs to run at regular intervals with cron.
Execute the following in a terminal window:
# edit crontab
crontab -e
Then add the file you want to execute, prefaced by the following cron schedule (every 10 minutes):
*/10 * * * * /path/to/desired/Python_version /path/to/executable/file
Within the script itself, it's impossible to repeat code without a loop.
I can only think that you want an external scheduler, such as Task Scheduler on Windows (https://msdn.microsoft.com/pt-br/library/windows/desktop/aa383614.aspx) or crontab/timers on Linux.

Interact with a Python script running an infinite loop from the web

I have a Python script on my Raspberry Pi that continuously (every 5 seconds) runs a loop to control the temperature of a pot with some electronics through GPIO.
I monitor the temperature on a web page by having the Python script write the temperature to a text file, which I request with JavaScript over HTTP from the web page.
I would like to pass a parameter to the Python script to change the control settings, for example the target temperature.
What would be the best way to do this?
I'm working on a solution where the Python script looks for parameters in a text file and a second Python script writes changes to this file. This second Python script would be run by an HTTP request from the web page.
Is this the way to go? Or am I missing a more direct way to do this?
This must have been done many times before and described on the web, but I find nothing. Maybe I don't have the right terms to describe the problem.
Any hints are appreciated.
Best regards Kresten
You have to store the configuration for your looping script somewhere. A file or a database are both possible choices, but I would say that a formatted file (ini, yaml, …) is the way to go if you only have a small number of parameters.
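As a rough illustration of the formatted-file approach, the control loop could re-read an ini file on every iteration. The section name `control`, the option `target_temperature` and the default value are assumptions made for this sketch.

```python
import configparser

DEFAULT_TARGET = 60.0  # hypothetical fallback target temperature


def read_target_temperature(path, default=DEFAULT_TARGET):
    """Return the target temperature from an ini file, or default if unavailable."""
    parser = configparser.ConfigParser()
    try:
        # read() silently skips files that do not exist.
        parser.read(path)
        return parser.getfloat("control", "target_temperature", fallback=default)
    except (configparser.Error, ValueError):
        # Malformed file or non-numeric value: keep the default.
        return default
```

The loop would call `read_target_temperature("control.ini")` once per cycle, while the web-triggered script rewrites that file. Falling back to a default keeps the controller running even if it reads the file while it is being written.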
I'm not sure about the Raspberry Pi, but I see these solutions:
OS signals (doc here)
sockets (see Python socket server/client programming)
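For the signal option, one possible sketch is to let a signal tell the running loop that its parameters have changed. The choice of SIGUSR1 and the names `request_reload` and `consume_reload_request` are assumptions for illustration; this works on Unix-like systems only.

```python
import os
import signal

reload_requested = False


def request_reload(signum, frame):
    """Signal handler: only set a flag; the main loop does the real work."""
    global reload_requested
    reload_requested = True


# SIGUSR1 exists on Unix-like systems (such as Linux on a Raspberry Pi), not on Windows.
signal.signal(signal.SIGUSR1, request_reload)


def consume_reload_request():
    """Return True if a reload was requested since the last call, then reset the flag."""
    global reload_requested
    if reload_requested:
        reload_requested = False
        return True
    return False
```

The control loop checks `consume_reload_request()` once per cycle and re-reads its parameters when it returns True. The second script triggers this with `os.kill(pid, signal.SIGUSR1)`, where `pid` is the process ID of the running loop.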

Trigger python script with new RSS item

I have written a small Python parser for a website, in order to extract the main news of a certain section. I would now like to trigger that script every time a new item is added to the website, using the RSS feeds. I am running Raspbian. Is there any utility to warn me of such an event?
Thanks
After a bit of research, I found the rsstail utility thanks to this question here. The only problem was that after a few minutes, it would either fail or quit completely. So I found this, which is exactly the same thing, only written in Python, and it does not crash (at least for me). What I did then was set up a small bash script, which gets executed at startup using crontab. The script is the following:
#!/bin/bash
rsstail -i 15 --initial 1 "http://feeds.bbci.co.uk/news/rss.xml?edition=us" | while read -r line
do
    /Users/aUser/Desktop/myScript.py
done
This means that every time a new item is added, the script myScript.py gets executed. Just remember to make the script executable (chmod +x myScript.py), otherwise it fails saying that you don't have the right permissions.
You can actually write your own such utility using a cron job. This is how you can do it:
Inspect an RSS XML feed and you will find the lastBuildDate tag in it. This tag tells you when the feed was last changed. For example, try viewing the source code for this RSS feed from the BBC.
Modify the script to check the lastBuildDate tag to learn whether the RSS feed has been updated since the last check.
Write a small cron job to trigger the script you have written every n minutes. I have never used Raspbian, but since it's Debian-based, it should support cron jobs. You can use python-crontab to write one. See this or this to start with python-crontab.
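The lastBuildDate check from the steps above might be sketched as follows, using only the standard library. The function names are hypothetical, and downloading the feed and remembering the previous date between runs are left to the caller.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def last_build_date(rss_xml):
    """Return the channel's lastBuildDate as a datetime, or None if absent."""
    root = ET.fromstring(rss_xml)
    text = root.findtext("channel/lastBuildDate")
    return parsedate_to_datetime(text.strip()) if text else None


def feed_updated(rss_xml, previous):
    """True if the feed reports a newer lastBuildDate than the previous check."""
    current = last_build_date(rss_xml)
    return current is not None and (previous is None or current > previous)
```

Because cron starts a fresh process each time, the date returned by `last_build_date` has to be saved somewhere (for example in a small file) so the next run can pass it in as `previous`.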
