I wrote a mini-app that scrapes my school's website, looks for the title of the latest post, and compares it to the old title; if they differ, it sends me an email.
For the app to work properly it needs to keep running 24/7, so that the value of the title variable stays current.
Here's the code:
import requests
from bs4 import BeautifulSoup
import schedule, time
import sys
import smtplib

# Mailing info
from_addr = ''
to_addrs = ['']
message = """From: sender
To: receiver
Subject: New Post

A new post has been published
visit the website to view it:
"""

def send_mail(msg):
    try:
        s = smtplib.SMTP('localhost')
        s.login('email', 'password')
        s.sendmail(from_addr, to_addrs, msg)
        s.quit()
    except smtplib.SMTPException as e:
        print(e)

# Scraping
URL = ''
title = 'Hello World'

def check():
    global title
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    main_section = soup.find('section', id='spacious_featured_posts_widget-2')
    first_div = main_section.find('div', class_='tg-one-half')
    current_title = first_div.find('h2', class_='entry-title').find('a')['title']
    if current_title != title:
        send_mail(message)
        title = current_title
    else:
        send_mail("Nothing New")

schedule.every(6).hours.do(check)

while True:
    schedule.run_pending()
    time.sleep(0.000001)
So my question is: how do I keep this code running on my host using cPanel?
I know I can use cron jobs to run it every 2 hours or so, but I don't know how to keep the script itself running; using a terminal doesn't work, because when I close the page the app gets terminated.
So - generally, to run programs for an extended period, they need to be daemonised: essentially disconnected from your terminal with a double-fork and a setsid. Having said that, I've never actually done it myself, since it was usually either (a) the wrong solution, or (b) re-inventing the wheel (https://github.com/thesharp/daemonize).
In this case, I think a better course of action would be to invoke the script every 6 hours, rather than have it internally do something every 6 hours. Making your program resilient to restarts, and putting it in a 'cradle' that automatically restarts it, is pretty much how most systems are kept reliable.
In your case, I'd suggest saving the title to a file, and reading from and writing to that file when the script is invoked. It would make your script simpler and more robust, and you'd be using battle-hardened tools for the job.
A couple of years down the line, when you're writing code that needs to survive the whole machine crashing and being replaced (within 6 hours, with everything installed), you can use some external form of storage (like a database) instead of a file, to make your system even more resilient.
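For illustration, here is a minimal sketch of that approach, meant to be run from a cron entry every 6 hours. The state file name last_title.txt is an assumption; the scraping is your own check() logic reused, and the mailing call is left as a comment:

# check_once.py - run from cron every 6 hours, e.g.: 0 */6 * * * python3 /path/to/check_once.py
import os
import requests
from bs4 import BeautifulSoup

URL = ''                        # same URL as in your script
STATE_FILE = 'last_title.txt'   # hypothetical file persisting the last seen title between runs

def read_last_title():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return f.read().strip()
    return ''

def write_last_title(title):
    with open(STATE_FILE, 'w') as f:
        f.write(title)

page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
main_section = soup.find('section', id='spacious_featured_posts_widget-2')
first_div = main_section.find('div', class_='tg-one-half')
current_title = first_div.find('h2', class_='entry-title').find('a')['title']

if current_title != read_last_title():
    # send_mail(message) here, exactly as in your original script
    write_last_title(current_title)

Each run starts fresh, compares against the file, and exits; nothing has to stay alive between checks.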
Related
I'm building a bot that logs into Zoom at specified times, and the links are obtained from WhatsApp. So I was wondering if it is possible to retrieve those links from WhatsApp directly instead of having to copy-paste them into Python. Google is filled with guides on sending messages, but is there any way to READ and RETRIEVE those messages and then manipulate them?
You can, at most, try to read WhatsApp messages with Python using Selenium WebDriver, since I strongly doubt that you can access WhatsApp's APIs.
Selenium is basically an automation tool that lets you automate tasks in your browser, so, perhaps, you could write a Python script using Selenium that automatically opens WhatsApp and parses the HTML of your WhatsApp Web client.
We mentioned Selenium, but we will use it only to automate opening and closing WhatsApp; we still need a way to read what's inside the WhatsApp client, and that's where web scraping comes in handy.
Web scraping is the process of extracting data from a website; in this case, the data is the Zoom link you need to obtain automatically, while the website is your WhatsApp client. To perform this process you need a way to extract (parse) information from the page; for that I suggest you use Beautiful Soup, though I warn you that a minimum knowledge of how HTML works is required.
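As a rough, hypothetical sketch of the idea (the 30-second pause for scanning the QR code and the naive scan of all span elements are assumptions, and WhatsApp Web's markup changes often):

from selenium import webdriver
from bs4 import BeautifulSoup
import time

driver = webdriver.Chrome()            # assumes chromedriver is on your PATH
driver.get("https://web.whatsapp.com")
time.sleep(30)                         # time to scan the QR code by hand

# Hand the rendered page over to Beautiful Soup and look for Zoom links.
soup = BeautifulSoup(driver.page_source, "html.parser")
for span in soup.find_all("span"):
    if "zoom.us" in span.get_text():
        print("Possible Zoom link:", span.get_text())

driver.quit()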
Sorry if this doesn't completely answer your question, but this is all the knowledge I have on this specific topic.
You can open WhatsApp in the browser using Selenium (https://selenium-python.readthedocs.io/) in Python.
I learned and use code from https://towardsdatascience.com/complete-beginners-guide-to-processing-whatsapp-data-with-python-781c156b5f0b; go through the details written at that link.
You have to install the external Python library whatsapp-web from https://pypi.org/project/whatsapp-web/. Just type "python -m pip install whatsapp-web" in a command prompt / Windows terminal.
It will show output like this:
python -m pip install whatsapp-web
Collecting whatsapp-web
Downloading whatsapp_web-0.0.1-py3-none-any.whl (21 kB)
Installing collected packages: whatsapp-web
Successfully installed whatsapp-web-0.0.1
You can read all the cookies from WhatsApp Web, add them to the headers and use the requests module, or you can use Selenium along with that.
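A rough sketch of that cookie hand-off (hypothetical; whether plain HTTP requests get anything useful out of the heavily script-driven WhatsApp Web is not guaranteed):

import requests
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://web.whatsapp.com")
# ... log in by scanning the QR code, then:

session = requests.Session()
for cookie in driver.get_cookies():          # copy Selenium's session cookies
    session.cookies.set(cookie['name'], cookie['value'])

response = session.get("https://web.whatsapp.com")
print(response.status_code)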
Update:
Before using the following code, replace each XPath class name with the current class name from WhatsApp Web (find it with the inspect-element tool), because WhatsApp has changed its elements' class names.
I have tried this while creating a WhatsApp bot using Python, but there are still many bugs, because I am also a beginner.
Steps based on my research:
Open a browser using Selenium WebDriver.
Log in to WhatsApp using the QR code.
If you know which number the meeting link will come from, use this step; otherwise use the process described after it.
Find and open the chat room where you are going to receive the Zoom meeting link.
For getting messages from a known chat room to act on:
#user_name = "Name of meeting link sender as in your contact list"
Example:
user_name = "Anurag Kushwaha"
# In the above variable, in place of `Anurag Kushwaha`, pass the name or number of your teacher
# who is going to send you the Zoom meeting link, exactly as it appears in your contact list.
user = webdriver.find_element_by_xpath('//span[@title="{}"]'.format(user_name))
user.click()

# For getting messages to act on
message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
# In the above line, change the XPath's class name to the current one by inspecting the span element
# that contains the received text messages of a chat room.

for i in message:
    try:
        if "zoom.us" in str(i.text):
            # Here you can add your code to perform an action according to your need
            print("Perform Your Action")
    except:
        pass
If you do not know which number you are going to receive the link from:
You can get the div class of any unread contact block and open all the chat rooms in the list that contain that unread div class.
Then check all the unread messages of each opened chat and get the message from its div class.
Use this when you don't know whom you are going to receive the Zoom meeting link from.
# For getting unread chats you can use:
unread_chats = webdriver.find_elements_by_xpath("//span[@class='_38M1B']")
# In the above line, change the XPath's class name to the current one by inspecting the span element
# that contains the number of unread messages shown in the green circle on a contact card, before opening the chat room.

# Open each chat using a loop and read the messages.
for chat in unread_chats:
    chat.click()
    # For getting messages to act on
    message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
    # In the above line, change the XPath's class name to the current one by inspecting the span element
    # that contains the received text messages of a chat room.
    for i in message:
        try:
            if "zoom.us" in str(i.text):
                # Here you can add your code to perform an action according to your need
                print("Perform Your Action")
        except:
            pass
Note: in the above code, 'webdriver' is the driver with which you opened web.whatsapp.com.
Example:
from selenium import webdriver

webdriver = webdriver.Chrome("ChromePath/chromedriver.exe")
webdriver.get("https://web.whatsapp.com")
# This webdriver variable is the one used in the code above.
# If you used a different name, either rename it in my code or assign your variable to it like this:
webdriver = your_webdriver_variable
A complete reference example:

from selenium import webdriver
import time

webdriver = webdriver.Chrome("ChromePath/chromedriver.exe")
webdriver.get("https://web.whatsapp.com")
time.sleep(25)  # time to scan the QR code
# Please make sure that the QR code scan succeeded.
confirm = int(input("Press 1 to proceed if successfully logged in, or press 0 to retry: "))
if confirm == 1:
    print("Continuing...")
elif confirm == 0:
    webdriver.close()
    exit()
else:
    print("Sorry, please try again")
    webdriver.close()
    exit()

while True:
    unread_chats = webdriver.find_elements_by_xpath("//span[@class='_38M1B']")
    # In the above line, change the XPath's class name to the current one by inspecting the span element
    # that contains the number of unread messages shown in the green circle on a contact card.
    # Open each chat using a loop and read the messages.
    for chat in unread_chats:
        chat.click()
        time.sleep(2)
        # For getting messages to act on
        message = webdriver.find_elements_by_xpath("//span[@class='_3-8er selectable-text copyable-text']")
        # In the above line, change the XPath's class name to the current one by inspecting the span element
        # that contains the received text messages of a chat room.
        for i in message:
            try:
                if "zoom.us" in str(i.text):
                    # Here you can add your code to perform an action according to your need
                    print("Perform Your Action")
            except:
                pass
Please make sure the indentation is consistent in the code blocks if you are copying them.
You can read my other answer at the following link for more info about WhatsApp Web with Python:
Line breaks in WhatsApp messages sent with Python
I am developing a WhatsApp bot using Python.
For contributions you can contact me at: anurag.cse016@gmail.com
Please give a star on my https://github.com/4NUR46 if this answer helps you.
Try this. It's a bit of a hassle, but it might work.
import time
import webbrowser
import pyautogui
import pyperclip

link = pyperclip.paste()

def searchforgroup():
    global link
    time.sleep(5)
    webbrowser.open("https://web.whatsapp.com")
    time.sleep(30)  # time to scan the QR code; if you have already done that, you can lower it to 10 or so
    grouporcontact = pyautogui.locateOnScreen("#group/contact", confidence=.6)  # take a snip of the group or contact name/profile photo
    if grouporcontact is None:
        # Do any other option; in my case I just fell back to my usual link
        link = "mymeetlink"
    else:
        x = grouporcontact[0]
        y = grouporcontact[1]
        pyautogui.moveTo(x, y, duration=1)
        pyautogui.click()
# end of searching for the group

def findlink():
    global link
    meetlink = pyautogui.locateOnScreen("#", confidence=.6)  # take another snip of a meet link without the code after the "/"
    if meetlink is None:
        # Do any other option; in my case I just fell back to my usual link
        link = "mymeetlink"
    else:
        f = meetlink[0]
        v = meetlink[1]
        pyautogui.moveTo(f, v, duration=.6)
        pyautogui.rightClick()
        pyautogui.moveRel(0, 0, duration=2)  # You have to play with this; it depends on your screen size, so adjust it until it reaches "Copy Link Address"
        pyautogui.click()
        link = pyperclip.paste()
        webbrowser.open(link)  # to test it out
So now you have it. You have to install pyautogui and pyperclip, then just follow the comments in the snippet and everything should work :)
Does anyone know why this code doesn't do the job? It works perfectly when I scrape smaller files with data from a certain date, e.g. only from 2017, but not with this one. Is this file too big or something? There's no error or anything like that. When I run this script with the smaller file mentioned, it takes about 30 seconds to download everything and save it into the database, so I don't think there are mistakes in the code. After running the script I just get "Process finished with exit code 0" and nothing more.
from bs4 import BeautifulSoup
import urllib.request
from app import db
from models import CveData
from sqlalchemy.exc import IntegrityError

url = "https://cve.mitre.org/data/downloads/allitems.xml"
r = urllib.request.urlopen(url)
xml = BeautifulSoup(r, 'xml')

vuln = xml.findAll('Vulnerability')
for element in vuln:
    note = element.findAll('Notes')
    title = element.find('CVE').text
    for element in note:
        desc = element.find(Type="Description").text
        test_date = element.find(Title="Published")
        if test_date is None:
            pass
        else:
            date = test_date.text
            data = CveData(title, date, desc)
            try:
                db.session.add(data)
                db.session.commit()
                print("adding... " + title)
            # don't stop the stream, ignore the duplicates
            except IntegrityError:
                db.session.rollback()
I downloaded the file you said didn't work and the one you said did, and ran these two greps, with different results:
grep -c "</Vulnerability>" allitems-cvrf-year-2019.xml
21386
grep -c "</Vulnerability>" allitems.xml
0
The program is not stopping on opening the file; it is running to completion. You aren't getting any output because there are no Vulnerability tags in that XML file. (My grep is not technically exact, as there could be whitespace inside the closing Vulnerability tag, but I doubt that is the case here.)
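If you want to double-check from Python rather than grep, a streaming count avoids loading the huge file into memory (a sketch; the endswith() check is there because the tags may be namespaced):

# Count Vulnerability elements in a large XML file without loading it all at once.
import urllib.request
import xml.etree.ElementTree as ET

url = "https://cve.mitre.org/data/downloads/allitems.xml"
count = 0
with urllib.request.urlopen(url) as r:
    for event, elem in ET.iterparse(r):
        if elem.tag.endswith('Vulnerability'):
            count += 1
        elem.clear()  # free memory as we go
print(count)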
I want to make a simple Python program to generate a captcha for a Flask website. I can generate the image, but if I save it, e.g. as /images/captcha_{id}.png, then I would accumulate tons of old captchas as the website gets used.
I've tried to write a script that uses the sleep function to remove the old captchas every N seconds, but the problem is that it blocks all activity on the website for that whole time.
The Captcha system is the following :
import secrets, string
from PIL import Image, ImageFont, ImageDraw

def gen_captcha(id):
    alpha = string.ascii_letters + string.digits
    captcha = "".join(secrets.choice(alpha) for i in range(8))
    img = Image.new("RGBA", (200, 100), (3, 115, 252))
    font = ImageFont.truetype("arial.ttf", 20)
    w, h = font.getsize(captcha)
    draw = ImageDraw.Draw(img)
    draw.text((50, 50), captcha, font=font, fill=(255, 239, 0))
    img.save("captcha_{}.png".format(str(id)))
    return captcha
The Flask app basically requests an input and displays the captcha for the given id, and then says if req_captcha == captcha: return "You solved the captcha"; it also gives an error if you don't solve it.
What I would like to know is whether I can make a little script that runs as a background process and deletes my old captchas.
I think what you're looking for is a cron job. Set one up to run a script that cleans up yesterday's captchas.
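The cleanup can just as well be a small Python script invoked by cron; here is a minimal sketch, where the images directory and the one-day age threshold are assumptions:

# cleanup_captchas.py - run daily from cron, e.g.: 0 3 * * * python3 /path/to/cleanup_captchas.py
import os
import time

CAPTCHA_DIR = "images"           # hypothetical directory where captcha_{id}.png files are saved
MAX_AGE_SECONDS = 24 * 60 * 60   # remove captchas older than one day

now = time.time()
for name in os.listdir(CAPTCHA_DIR):
    if name.startswith("captcha_") and name.endswith(".png"):
        path = os.path.join(CAPTCHA_DIR, name)
        if now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)

Because it runs in its own process, it never blocks the Flask app the way an in-request sleep does.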
One possible approach would be to use either the multiprocessing or the threading module available in Python. They're quite similar in terms of API. I will base my answer on the multiprocessing approach, but you can evaluate for yourself whether the threaded approach suits your needs better. You can refer to this question as an example. Here's a sample implementation:
import os
import time
from multiprocessing import Process

def remove_old_captchas():
    # Fork and return in the parent so that p.join() below comes back
    # immediately, while the forked child keeps looping in the background.
    # (Note: os.fork() is Unix-only.)
    if os.fork() != 0:
        return
    print('Running process to remove captchas every 5 seconds ...')
    while True:
        time.sleep(5)
        print("... Captcha removed")

if __name__ == '__main__':
    p = Process(target=remove_old_captchas)
    p.daemon = True
    p.start()
    p.join()
    print('Main code running as well ...')
    while True:
        time.sleep(1)
        print("... Request served")
In the output you can see the captchas being removed at a regular interval:
Running process to remove captchas every 5 seconds ...
Main code running as well ...
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
In terms of design I would probably still go with a cron job, as mentioned in another answer, but you asked about running a background task, so this is one possible answer. You may also like the subprocess module.
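For completeness, a tiny sketch of the subprocess route, launching the hypothetical cleanup script from the cron answer as a separate process:

# Launch a separate cleanup process without blocking the caller.
# (cleanup_captchas.py is a hypothetical script; substitute your own.)
import subprocess
cleaner = subprocess.Popen(["python3", "cleanup_captchas.py"])

Popen returns immediately, so the web app keeps serving requests while the cleanup runs on its own.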
I've seen a few instances of this question, but I was not sure how to apply the changes to my particular situation. I have code that monitors a webpage for changes and refreshes every 30 seconds, as follows:
import sys
import ctypes
from time import sleep
from Checker import Checker

USERNAME = sys.argv[1]
PASSWORD = sys.argv[2]

def main():
    crawler = Checker()
    crawler.login(USERNAME, PASSWORD)
    crawler.click_data()
    crawler.view_page()
    while crawler.check_page():
        crawler.wait_for_table()
        crawler.refresh()
    ctypes.windll.user32.MessageBoxW(0, "A change has been made!", "Attention", 1)

if __name__ == "__main__":
    main()
The problem is that after the first refresh, Selenium always shows an error stating it is unable to locate the element. The element in question, I suspect, is a table from which I retrieve data using the following function:
def get_data_cells(self):
    contents = []
    table_id = "table.datadisplaytable:nth-child(4)"
    table = self.driver.find_element(By.CSS_SELECTOR, table_id)
    cells = table.find_elements_by_tag_name('td')
    for cell in cells:
        contents.append(cell.text)
    return contents
I can't tell if the issue is in the above function or in the main(). What's an easy way to get Selenium to refresh the page without returning such an error?
Update:
I've added a wait function and adjusted the main() function accordingly:
def wait_for_table(self):
    table_selector = "table.datadisplaytable:nth-child(4)"
    delay = 60
    try:
        wait = ui.WebDriverWait(self.driver, delay)
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, table_selector)))
    except TimeoutError:
        print("Operation timeout! The requested element never loaded.")
Since the same error is still occurring, either my timing function is not working properly or it is not a timing issue.
I've run into the same issue while web scraping before and found that re-sending the GET request (instead of refreshing) seemed to eliminate it. It's not very elegant, but it worked for me.
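In this setup, that would mean something like the following in place of the refresh call (a sketch; current_url simply re-issues the GET for whatever page the driver is currently on):

def refresh(self):
    # Re-send the GET request instead of calling self.driver.refresh()
    self.driver.get(self.driver.current_url)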
I appear to have fixed my own problem.
My refresh() function was written as follows:
def refresh(self):
    self.driver.refresh()
All I did was switch frames right after the refresh() call. That is:
def refresh(self):
    self.driver.refresh()
    self.driver.switch_to.frame("content")
This took care of it. I can see that the page is now refreshing without issues.
I am trying to get a color hex code from an XML page on my website and have a script pick up the change within 5-10 seconds. I am able to read the hex code just fine, and I am able to change the value in the XML file just fine, but the script takes a while to reflect the update.
I want the script to check the XML file on my webserver every 5 seconds, however it takes about a full minute before the code actually sees the update. Is my Python script somehow caching the XML file? Is my webserver possibly sending a cached version? (Viewing the XML file in Chrome reflects changes instantly, though.)
Python code:
import time
import serial
import requests
from bs4 import BeautifulSoup

ser = serial.Serial('/dev/ttyACM0', 9600)
print('Connected to Arduino!')

while True:
    print('Connecting to website...')
    page = requests.get('http://xanderluciano.com/pi/color.xml', timeout=5)
    soup = BeautifulSoup(page.text, 'html.parser')
    print('scraped hexcode: ' + soup.color.string)
    hex = soup.color.string
    ser.write(hex.encode('utf-8'))
    print(ser.readline())
    time.sleep(5)
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<ledstrip>
<color>2196f3</color>
<flash>false</flash>
<freq>15</freq>
</ledstrip>
The solution was that my webserver uses NGINX as a server-side cache controller, and I opted to disable this caching during development so that I could see the results instantly. Most likely there is a better way of pushing data than continually polling the webserver for it.
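As a client-side workaround, you can also ask intermediate caches not to serve a stored copy by sending a Cache-Control request header; whether it is honored depends on the server configuration:

# Hint to caches along the way that we want a fresh copy, not a stored one.
page = requests.get('http://xanderluciano.com/pi/color.xml', timeout=5,
                    headers={'Cache-Control': 'no-cache'})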