I'm currently using ghost.py, which has a show() function. When I call it, it shows the website but then closes it instantly. How do I keep it open?
from ghost import Ghost
import PySide

ghost = Ghost()

with ghost.start() as session:
    page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
    session.set_field_value("input[name=username]", "joe")
    session.set_field_value("input[name=password]", "test")
    session.show()
    session.evaluate("alert('test')")
The session preview will remain open until the session exits - by leaving the session context, session.exit() is implicitly called. To keep the preview open you need to either not exit the session context, or not use a session context.
The former can be achieved like so:
from ghost import Ghost
import PySide

ghost = Ghost()

with ghost.start() as session:
    page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
    session.set_field_value("input[name=username]", "joe")
    session.set_field_value("input[name=password]", "test")
    session.show()
    session.evaluate("alert('test')")
    # other python code
The latter can be achieved like so:
from ghost import Ghost
import PySide

ghost = Ghost()

session = ghost.start()
page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
session.set_field_value("input[name=username]", "joe")
session.set_field_value("input[name=password]", "test")
session.show()
session.evaluate("alert('test')")
# other python code
The session will, however, inevitably exit when the Python process ends. Also worth noting is that some operations return as soon as the initial HTTP request has completed. If you wish to wait until other resources have loaded, you may need to call session.wait_for_page_loaded(). I have also found that some form submissions require a call to session.sleep() to behave as expected.
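For example, a minimal sketch (reusing only the calls already shown above) of waiting until the page has fully loaded before interacting with it:

page, resources = session.open("https://www.instagram.com/accounts/login/?force_classic_login")
session.wait_for_page_loaded()  # block until the page, not just the initial request, has loaded
session.sleep()                 # extra pause that some form submissions need to behave as expected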
I wrote a mini-app that scrapes my school's website, looks for the title of the latest post, compares it to the old title, and, if it's different, sends me an email.
In order for the app to work properly, it needs to keep running 24/7 so that the value of the title variable stays correct.
Here's the code:
import requests
from bs4 import BeautifulSoup
import schedule, time
import sys
import smtplib

# Mailing info
from_addr = ''
to_addrs = ['']
message = """From: sender
To: receiver
Subject: New Post
A new post has been published
visit the website to view it:
"""

def send_mail(msg):
    try:
        s = smtplib.SMTP('localhost')
        s.login('email',
                'password')
        s.sendmail(from_addr, to_addrs, msg)
        s.quit()
    except smtplib.SMTPException as e:
        print(e)

# Scraping
URL = ''
title = 'Hello World'

def check():
    global title
    global message
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    main_section = soup.find('section', id='spacious_featured_posts_widget-2')
    first_div = main_section.find('div', class_='tg-one-half')
    current_title = first_div.find('h2', class_='entry-title').find('a')['title']
    if current_title != title:
        send_mail(message)
        title = current_title
    else:
        send_mail("Nothing New")

schedule.every(6).hours.do(check)

while True:
    schedule.run_pending()
    time.sleep(0.000001)
So my question is: how do I keep this code running on my host using cPanel?
I know I can use a cron job to run it every 2 hours or so, but I don't know how to keep the script itself running. Using a terminal doesn't work: when I close the page, the app gets terminated.
So: generally, to run programs for an extended period, they need to be daemonised, essentially disconnected from your terminal with a double fork and a setsid. Having said that, I've never actually done it myself, since it was usually either (a) the wrong solution, or (b) re-inventing the wheel (https://github.com/thesharp/daemonize).
In this case, I think a better course of action would be to invoke the script every 6 hours (for example from a cron job), rather than have it internally do something every 6 hours. Making your program resilient to restarts, and putting it in a 'cradle' that automatically restarts it, is pretty much how most systems are kept reliable.
In your case, I'd suggest saving the title to a file, and reading from and writing to that file each time the script is invoked. It would make your script simpler and more robust, and you'd be using battle-hardened tools for the job.
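A minimal sketch of that approach, assuming the title is kept in a plain text file (the name "last_title.txt" is just an example) and the script is run once per invocation, e.g. from cron, instead of looping forever:

import os

import requests
from bs4 import BeautifulSoup

URL = ''                       # same URL as in the question
TITLE_FILE = 'last_title.txt'  # assumed state file

def read_last_title():
    if os.path.exists(TITLE_FILE):
        with open(TITLE_FILE) as f:
            return f.read().strip()
    return ''

def write_last_title(title):
    with open(TITLE_FILE, 'w') as f:
        f.write(title)

def check():
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, 'html.parser')
    main_section = soup.find('section', id='spacious_featured_posts_widget-2')
    first_div = main_section.find('div', class_='tg-one-half')
    current_title = first_div.find('h2', class_='entry-title').find('a')['title']
    if current_title != read_last_title():
        send_mail(message)  # send_mail() and message are the ones defined in the question
        write_last_title(current_title)

if __name__ == '__main__':
    check()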
A couple of years down the line, when you're writing code that needs to survive the whole machine crashing and being replaced (within 6 hours, with everything installed), you can use some external form of storage (like a database) instead of a file, to make your system even more resilient.
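If you do reach that point, a minimal sketch using the standard-library sqlite3 module (the database file and table names are assumptions) could look like this:

import sqlite3

conn = sqlite3.connect('titles.db')  # assumed database file
conn.execute('CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)')

def read_last_title():
    row = conn.execute("SELECT value FROM state WHERE key = 'last_title'").fetchone()
    return row[0] if row else ''

def write_last_title(title):
    conn.execute("INSERT OR REPLACE INTO state (key, value) VALUES ('last_title', ?)", (title,))
    conn.commit()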
I couldn't find a proper answer, so I'm posting this question.
The fastest way to understand the question is the goal:
There is a main process and a subprocess (the one I want to create). The main process inspects several websites via a webdriver, but sometimes it gets stuck at a low Selenium level, and I don't want to change the official code. So at the moment I occasionally check the monitor manually to see whether the process got stuck; if so, I change the url in the browser by hand and it runs smoothly again. I don't want to be a human checker, so I'd like to automate the task with a subprocess that shares the same webdriver, inspects the url via webdriver.current_url, and does the work for me.
Here is my attempt, as a minimal representative example in which the sub-process only detects a change in the webdriver's url:
import time
import multiprocessing

from selenium import webdriver

def test_sub(driver):
    str_site0 = driver.current_url  # get the site0 url
    time.sleep(4)  # give the main process some time to change to site1
    str_site1 = driver.current_url  # get the site1 url (changed by the main process)
    if str_site0 == str_site1:
        print('sub: no change detected')
    else:
        print('sub: change detected')
    # endif
# enddef sub

def test_main():
    """The main process changes from site0 (stackoverflow) to site1 (youtube).
    The sub-process detects this change of the webdriver object's url (same pointer)
    using the ".current_url" attribute.
    """
    # init driver
    pat_webdriver = r"E:\WPy64-3680\python-3.6.8.amd64\Lib\site-packages\selenium\v83_chromedriver\chromedriver.exe"
    driver = webdriver.Chrome(executable_path=pat_webdriver)
    time.sleep(2)
    # open initial site
    str_site0 = 'https://stackoverflow.com'
    driver.get(str_site0)
    time.sleep(2)
    # init sub and try to pass the webdriver object
    p = multiprocessing.Process(target=test_sub, args=(driver,))  # PROBLEM HERE! PYTHON INCAPABLE
    p.daemon = False
    p.start()
    # change site
    time.sleep(0.5)  # give the sub some time to query the webdriver with site0
    str_site1 = 'https://youtube.com'  # site 1 (this needs to be detected by sub)
    driver.get(str_site1)
    # wait for the sub to detect the change in url, then kill the process (non-daemon insufficient, don't know why..)
    time.sleep(3)
    p.terminate()
# enddef test_main

# init the program (main process)
test_main()
The corresponding error from executing $ python test_multithread.py (the name of the test script) is the following:
I'm using requests_html to scrape some sites:
from requests_html import HTMLSession

for i in range(0, 30):
    session = HTMLSession()
    r = session.get('https://www.google.com')
    r.html.render()
    del session
Now this code creates more than 30 Chromium sub-processes as Python sub-processes, and they keep consuming memory, so how can I remove them?
I don't want to use psutil, as it would add one more dependency, and Python may well have a built-in way to kill its own sub-processes; I'd like to be enlightened if there is one.
I can't even use exit(), as I have to return and then exit (inside a method), and of course I can't exit and then return.
You might want to try closing the session:
session = HTMLSession()
session.close()
See requests_html.HTMLSession.close.
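As a hedged sketch, applying that to the loop from the question and closing each session before creating the next one:

from requests_html import HTMLSession

for i in range(0, 30):
    session = HTMLSession()
    r = session.get('https://www.google.com')
    r.html.render()
    session.close()  # shuts down the underlying Chromium process instead of relying on `del`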
I want to make a simple Python program that generates a captcha for a Flask website. I can generate the image, but if I save it in, for example, /images/captcha_{id}.png, then I end up with tons of old captchas as the website gets used.
I've tried to create a script that uses the sleep function to remove the old captchas every N time units, but the problem then is that it blocks all activity on the website for that time.
The Captcha system is the following :
import secrets, string
from PIL import Image, ImageFont, ImageDraw

def gen_captcha(id):
    alpha = string.ascii_letters + string.digits
    captcha = "".join(secrets.choice(alpha) for i in range(8))
    img = Image.new("RGBA", (200, 100), (3, 115, 252))
    font = ImageFont.truetype("arial.ttf", 20)
    w, h = font.getsize(captcha)
    draw = ImageDraw.Draw(img)
    draw.text((50, 50), captcha, font=font, fill=(255, 239, 0))
    img.save("captcha_{}.png".format(str(id)))
    return captcha
The Flask app basically requests an input and displays the captcha for the given id, and then effectively does if req_captcha == captcha: return "You solved the captcha"; it also shows an error if you don't solve it.
What I would like to know is whether I can make a little script that runs as a background process and deletes my old captchas.
I think what you're looking for is a cron job. Set one up to run a bash script that cleans up yesterday's captchas.
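As an illustration, here is a minimal Python sketch of that cleanup which a cron job could invoke, say, once a day (the images/ directory, the captcha_*.png name pattern, and the 24-hour cutoff are assumptions):

import os
import time

CAPTCHA_DIR = 'images'   # assumed location of the saved captchas
MAX_AGE = 24 * 60 * 60   # assumed cutoff: delete captchas older than one day

now = time.time()
for name in os.listdir(CAPTCHA_DIR):
    if name.startswith('captcha_') and name.endswith('.png'):
        path = os.path.join(CAPTCHA_DIR, name)
        if now - os.path.getmtime(path) > MAX_AGE:
            os.remove(path)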
One possible approach would be to use either the multiprocessing or threading module available in Python. They're quite similar in terms of API. I will base my answer on the multiprocessing approach, but you can evaluate for yourself whether the threaded approach suits your needs better. You can refer to this question as an example. Here's a sample implementation:
import os
import time
from multiprocessing import Process

def remove_old_captchas():
    if os.fork() != 0:
        return
    print('Running process to remove captchas every 5 seconds ...')
    while True:
        time.sleep(5)
        print("... Captcha removed")

if __name__ == '__main__':
    p = Process(target=remove_old_captchas)
    p.daemon = True
    p.start()
    p.join()

    print('Main code running as well ...')
    while True:
        time.sleep(1)
        print("... Request served")
In the output you can see the captchas being removed at a regular time interval:
Running process to remove captchas every 5 seconds ...
Main code running as well ...
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
... Request served
... Request served
... Captcha removed
... Request served
... Request served
... Request served
In terms of design I would probably still go with a cron job as mentioned in the other answer, but you asked about running a background task, so this is one possible answer. You may also like the subprocess module.
I'm building a Django app and I'm using Spynner for web crawling. I have this problem and I hope someone can help me.
I have this function in the module "crawler.py":
import spynner

def crawling_js(url):
    br = spynner.Browser()
    br.load(url)
    text_page = br.html
    br.close  # (*)
    return text_page
(*) I tried with br.close() too
In another module (e.g. "import.py") I call the function in this way:
from crawler import crawling_js

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]

for url in l_url:
    mytextpage = crawling_js(url)
    # .. parse mytextpage ..
When I pass the first url to the function, everything works correctly; when I pass the second url, Python crashes. It crashes on this line: br.load(url). Can someone help me? Thanks a lot.
I have:
Django 1.3
Python 2.7
Spynner 1.1.0
PyQt4 4.9.1
Why do you need to instantiate br = spynner.Browser() and close it every time you call crawling_js()? In a loop this uses a lot of resources, which I think is the reason it crashes. Let's think of it like this: br is a browser instance, so you can make it browse any number of websites without closing and reopening it. Adjust your code this way:
import spynner

br = spynner.Browser()  # you open it only once

def crawling_js(url):
    br.load(url)
    text_page = br._get_html()  # _get_html() to make sure you get the updated html
    return text_page
Then, if you insist on closing br later, you simply do:
from crawler import crawling_js, br

l_url = ["https://www.google.com/", "https://www.tripadvisor.com/", ...]

for url in l_url:
    mytextpage = crawling_js(url)
    # .. parse mytextpage ..

br.close()