I want to make a simple Python program to generate a captcha for a Flask website. I can generate the image, but if I save it, e.g. in /images/captcha_{id}.png, then I will accumulate tons of old captchas as the website gets used.
I've tried writing a script that uses the sleep function to remove the old captchas every N seconds, but the problem is that it then blocks all activity on the website for those N seconds.
The captcha system is the following:
import secrets, string
from PIL import Image, ImageFont, ImageDraw

def gen_captcha(id):
    alpha = string.ascii_letters + string.digits
    captcha = "".join(secrets.choice(alpha) for i in range(8))
    img = Image.new("RGBA", (200, 100), (3, 115, 252))
    font = ImageFont.truetype("arial.ttf", 20)
    w, h = font.getsize(captcha)
    draw = ImageDraw.Draw(img)
    draw.text((50, 50), captcha, font=font, fill=(255, 239, 0))
    img.save("captcha_{}.png".format(id))
    return captcha
The Flask app basically asks for an input and displays the captcha for the given id, then checks if req_captcha == captcha: return "You solved the captcha", and returns an error if you don't solve it.
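For context, a minimal sketch of the kind of route being described (the route name, the captchas dict, and the form field name are assumptions, not the actual app):

from flask import Flask, request

app = Flask(__name__)
captchas = {}  # id -> solution, assumed to be filled in by gen_captcha

@app.route('/check/<id>', methods=['POST'])
def check_captcha(id):
    req_captcha = request.form.get('captcha', '')
    if req_captcha == captchas.get(id):
        return "You solved the captcha"
    return "Captcha failed"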
What I would like to know is whether I can make a little script that runs as a background process and deletes my old captchas.
I think what you're looking for is a cron job. Set one up to run a bash script that cleans up yesterday's captchas.
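For example, a crontab entry along these lines (the images path is an assumption) would run nightly and delete captcha files more than a day old:

# runs daily at 03:00; -mtime +0 matches files modified more than 24 hours ago
0 3 * * * find /path/to/images -name 'captcha_*.png' -mtime +0 -delete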
One possible approach is to use either the multiprocessing or the threading module available in Python. They're quite similar in terms of API. I will base my answer on the multiprocessing approach, but you can evaluate for yourself whether the threaded approach suits your needs better. You can refer to this question as an example. Here's a sample implementation:
import time
from multiprocessing import Process

def remove_old_captchas():
    print('Running process to remove captchas every 5 seconds ...')
    while True:
        time.sleep(5)
        print("... Captcha removed")

if __name__ == '__main__':
    p = Process(target=remove_old_captchas)
    p.daemon = True  # the worker is killed automatically when the main process exits
    p.start()

    print('Main code running as well ...')
    while True:
        time.sleep(1)
        print("... Request served")
In its output you can see the captchas being removed at a regular interval:
Running process to remove captchas every 5 seconds ...
Main code running as well ...
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
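To make the worker actually delete files instead of just printing, the loop body could remove captchas older than some threshold. A minimal sketch, where the images/captcha_*.png layout and the 5-minute threshold are assumptions:

import glob
import os
import time

def remove_old_captchas(max_age_seconds=300):
    # every 5 seconds, delete captcha images older than max_age_seconds
    while True:
        time.sleep(5)
        cutoff = time.time() - max_age_seconds
        for path in glob.glob('images/captcha_*.png'):
            if os.path.getmtime(path) < cutoff:
                os.remove(path)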
In terms of design I would probably still go with a cron job, as mentioned in another answer, but you asked about running a background task, so this is one possible answer. You may also like the subprocess module.
I wrote a mini-app that scrapes my school's website and looks for the title of the last post; it compares it to the old title, and if they differ, it sends me an email.
In order for the app to work properly, it needs to keep running 24/7 so that the value of the title variable stays correct.
Here's the code:
import requests
from bs4 import BeautifulSoup
import schedule, time
import sys
import smtplib

# Mailing info
from_addr = ''
to_addrs = ['']
message = """From: sender
To: receiver
Subject: New Post

A new post has been published
visit the website to view it:
"""

def send_mail(msg):
    try:
        s = smtplib.SMTP('localhost')
        s.login('email', 'password')
        s.sendmail(from_addr, to_addrs, msg)
        s.quit()
    except smtplib.SMTPException as e:
        print(e)

# Scraping
URL = ''
title = 'Hello World'

def check():
    global title
    global message
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    main_section = soup.find('section', id='spacious_featured_posts_widget-2')
    first_div = main_section.find('div', class_='tg-one-half')
    current_title = first_div.find('h2', class_='entry-title').find('a')['title']
    if current_title != title:
        send_mail(message)
        title = current_title
    else:
        send_mail("Nothing New")

schedule.every(6).hours.do(check)

while True:
    schedule.run_pending()
    time.sleep(0.000001)
So my question is: how do I keep this code running on my host using cPanel?
I know I can use cron jobs to run it every 2 hours or so, but I don't know how to keep the script itself running; using a terminal doesn't work, because the app gets terminated when I close the page.
So, generally, to run programs for an extended period, they need to be daemonised: essentially disconnected from your terminal with a double fork and a setsid. Having said that, I've never actually done it myself, since it was usually either (a) the wrong solution, or (b) reinventing the wheel (https://github.com/thesharp/daemonize).
In this case, I think a better course of action would be to invoke the script every 6 hours, rather than have it internally do something every 6 hours. Making your programs resilient to restarts, and putting them in a 'cradle' that automatically restarts them, is pretty much how most systems are kept reliable.
In your case, I'd suggest saving the title to a file, and reading from and writing to that file when the script is invoked. It would make your script simpler and more robust, and you'd be using battle-hardened tools for the job.
A couple of years down the line, when you're writing code that needs to survive the whole machine crashing and being replaced (within 6 hours, with everything installed), you can use some external form of storage (like a database) instead of a file, to make your system even more resilient.
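A rough sketch of the file-based approach; the state file name is an assumption, and scrape_current_title stands in for the BeautifulSoup logic from the question:

import os

STATE_FILE = 'last_title.txt'  # hypothetical location for the persisted title

def load_last_title():
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return f.read().strip()
    return None

def save_last_title(title):
    with open(STATE_FILE, 'w') as f:
        f.write(title)

def check_once():
    current = scrape_current_title()  # the scraping logic from the question
    if current != load_last_title():
        send_mail(message)
        save_last_title(current)

check_once()  # cron invokes this script every 6 hours; no long-running loop needed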
I don't really have any idea about this, so I'd like some advice if you can offer it.
Generally when I use Selenium I search for the element that I'm interested in, but now I'm thinking about developing a kind of performance test, to check how much time a specific webpage (HTML, scripts, etc.) takes to load.
Do you have any idea how to measure the load time of the HTML, scripts, etc. without searching for a specific element of the page?
P.S. I use IE or Firefox.
You could check the underlying JavaScript framework for active connections; when there are no active connections, you could then assume the page has finished loading.
That, however, requires that you either know which framework the page uses, or systematically check for different frameworks and then check their connections.
import logging
import time

def get_js_framework(driver):
    # Probe for a known framework by evaluating its "active requests" counter
    frameworks = [
        'return jQuery.active',
        'return Ajax.activeRequestCount',
        'return dojo.io.XMLHTTPTransport.inFlight.length'
    ]
    for f in frameworks:
        try:
            driver.execute_script(f)
        except Exception:
            logging.debug("{0} didn't work, trying next js framework".format(f))
        else:
            return f
    return None

def load_page(driver, link):
    timeout = 5
    begin = time.time()
    driver.get(link)
    js = get_js_framework(driver)
    if js:
        # poll the framework's active-request counter until it drains or we time out
        while driver.execute_script(js) and time.time() < begin + timeout:
            time.sleep(0.25)
    else:
        time.sleep(timeout)
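A possible usage sketch that times the whole call; the Firefox driver and the URL are placeholders:

import time
from selenium import webdriver

driver = webdriver.Firefox()
start = time.time()
load_page(driver, 'http://example.com')
print('Page load took %.2f seconds' % (time.time() - start))
driver.quit()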
I'm currently using ghost.py, and it has a show() function; when I call it, it shows the website but instantly closes it. How do I keep it open?
from ghost import Ghost
import PySide

ghost = Ghost()

with ghost.start() as session:
    page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
    session.set_field_value("input[name=username]", "joe")
    session.set_field_value("input[name=password]", "test")
    session.show()
    session.evaluate("alert('test')")
The session preview will remain open until the session exits; on leaving the session context, session.exit() is implicitly called. To keep the preview open you need to either not exit the session context, or not use a session context at all.
The former can be achieved as so:
from ghost import Ghost
import PySide

ghost = Ghost()

with ghost.start() as session:
    page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
    session.set_field_value("input[name=username]", "joe")
    session.set_field_value("input[name=password]", "test")
    session.show()
    session.evaluate("alert('test')")
    # other python code
The latter can be achieved like so:
from ghost import Ghost
import PySide
ghost = Ghost()
session = ghost.start()
page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
session.set_field_value("input[name=username]", "joe")
session.set_field_value("input[name=password]", "test")
session.show()
session.evaluate("alert('test')")
# other python code
The session will, however, inevitably exit when the Python process ends. It is also worth noting that some operations return as soon as the initial HTTP request has completed; if you wish to wait until other resources have loaded, you may need to call session.wait_for_page_loaded(). I have also found that some form submissions require a call to session.sleep() to behave as expected.
I am trying to get a color hexcode from an XML page on my website and update a script within 5-10 seconds. I can read the hexcode just fine, and I can change the value in the XML file just fine, but the script takes a while to reflect the update.
I want the script to update every 5 seconds by checking the XML file on my webserver, but it takes about a full minute before the code actually sees the update. Is my Python script somehow caching the XML file? Is my webserver possibly sending a cached version? (Viewing the XML file in Chrome refreshes instantly, though.)
Python code:
import time
import serial
import requests
from bs4 import BeautifulSoup

ser = serial.Serial('/dev/ttyACM0', 9600)
print('Connected to Arduino!')

while True:
    print('Connecting to website...')
    page = requests.get('http://xanderluciano.com/pi/color.xml', timeout=5)
    soup = BeautifulSoup(page.text, 'html.parser')
    print('scraped hexcode: ' + soup.color.string)
    hex = soup.color.string
    ser.write(hex.encode('utf-8'))
    print(ser.readline())
    time.sleep(5)
XML File:
<?xml version="1.0" encoding="UTF-8"?>
<ledstrip>
<color>2196f3</color>
<flash>false</flash>
<freq>15</freq>
</ledstrip>
The solution was that my webserver uses NGINX as a server-side cache; I opted to disable this caching during development so that I could see the results instantly. Most likely there is a better way of pushing data, rather than continually polling the webserver for it.
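If disabling the server-side cache isn't an option, one workaround worth trying is to ask for a fresh copy on every poll, with a no-cache header and/or a cache-busting query parameter; whether the server honours these depends on its configuration:

import time
import requests

page = requests.get('http://xanderluciano.com/pi/color.xml',
                    params={'_': int(time.time())},         # cache-busting query string
                    headers={'Cache-Control': 'no-cache'},  # ask caches for a fresh copy
                    timeout=5)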
The following code is a sample of non-asynchronous code; is there any way to get the images asynchronously?
import urllib

for x in range(0, 10):
    urllib.urlretrieve("http://test.com/file %s.png" % (x), "temp/file %s.png" % (x))
I have also seen the grequests library, but I couldn't figure out from the documentation whether that is possible or how to do it.
You don't need any third-party library. Just create a thread for every request, start the threads, and then wait for all of them to finish in the background, or continue your application while the images are being downloaded.
import threading
import urllib

results = []

def getter(url, dest):
    results.append(urllib.urlretrieve(url, dest))

threads = []
for x in range(0, 10):
    t = threading.Thread(target=getter, args=('http://test.com/file %s.png' % x,
                                              'temp/file %s.png' % x))
    t.start()
    threads.append(t)

# You can continue doing whatever you want and
# join the threads when you finally need the results.
# They will fetch your urls in the background without
# blocking your main application.
for t in threads:
    t.join()
Optionally you can create a thread pool that will get urls and dests from a queue.
If you're using Python 3, this is already implemented for you in the concurrent.futures module.
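A minimal sketch with concurrent.futures on Python 3; the URLs mirror the question, and urllib.request.urlretrieve replaces the Python 2 urllib.urlretrieve:

from concurrent.futures import ThreadPoolExecutor
import urllib.request

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(urllib.request.urlretrieve,
                           'http://test.com/file %s.png' % x,
                           'temp/file %s.png' % x)
               for x in range(10)]

# leaving the with-block waits for all downloads to finish
results = [f.result() for f in futures]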
Something like this should help you:
import grequests

urls = ['url1', 'url2', ...]  # this should be the list of urls
requests = (grequests.get(u) for u in urls)
responses = grequests.map(requests)

for response in responses:
    if 199 < response.status_code < 400:
        name = generate_file_name()  # generate some name for your image file with extension like example.jpg
        with open(name, 'wb') as f:  # or save to S3 or something like that
            f.write(response.content)
Here, only the downloading of the images happens in parallel; writing each image to a file is sequential. To make that part parallel or asynchronous as well, you could hand the writes to a thread pool, as in the sketch below.
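For instance (save_response is a hypothetical helper, and responses comes from the grequests example above):

from concurrent.futures import ThreadPoolExecutor

def save_response(response, name):
    with open(name, 'wb') as f:
        f.write(response.content)

# hand each write to a small pool so disk IO overlaps
with ThreadPoolExecutor(max_workers=4) as pool:
    for i, response in enumerate(responses):
        if 199 < response.status_code < 400:
            pool.submit(save_response, response, 'image_%d.jpg' % i)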