Parsing HTML with BeautifulSoup in Python freezes my script - python

I'm parsing a web page every 5 seconds to detect changes. The code runs around the clock without problems, except that a few times per day it freezes and I have to restart it. I could work around the problem with a timer that automatically restarts my script, but I'd like to understand the cause.
Code:
while 1:
    [...]
    print(">>debug 1")
    soup = BeautifulSoup(response, 'html.parser')
    print(">>debug 2")
    [...]
Output:
[...]
>>debug 1
>>debug 2
>>debug 1
This is where the script freezes. The last output is always ">>debug 1", then it hangs without crashing.
Why would that line freeze the script randomly, roughly once every 3000 executions? How can I investigate further?
Thank you
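One way to investigate a freeze like this is to have Python dump the stack of every thread when a step overruns its time budget, which shows the exact call that is blocking. The sketch below uses the standard-library faulthandler module; run_with_watchdog and its parameters are hypothetical names, not part of the original script. If response is a file-like object (for example the return value of urlopen), BeautifulSoup reads from it during parsing, so a stalled network read would show up in the dump as a socket call rather than parser code.

```python
import faulthandler
import sys


def run_with_watchdog(step, timeout_s, log_file=None):
    """Run step() and dump all thread stacks if it is still running after timeout_s.

    Hypothetical helper: the dump only reports where the script is stuck,
    it does not interrupt the blocked call.
    """
    target = log_file if log_file is not None else sys.stderr
    # Arm a one-shot timer; faulthandler writes straight to the file descriptor.
    faulthandler.dump_traceback_later(timeout_s, file=target)
    try:
        return step()
    finally:
        # Disarm the timer once the step has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

In the loop from the question this could wrap the parse call, for example soup = run_with_watchdog(lambda: BeautifulSoup(response, 'html.parser'), 30), where 30 seconds is an arbitrary budget.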

Related

APScheduler Running 'date' early using Python with Selenium

I am trying to use APScheduler to automate a function call. It is supposed to read a date from a website I scrape, and then at a certain time relative to that date call the function (2 minutes before). The code for this portion of the project is here:
for mtp, url, track in zip(mtp_num_final, url_list, race_list):
    scheduled_time = datetime.now() + relativedelta(minutes =+ (mtp - 2))
    print(scheduled_time.strftime('%m-%d-%Y %I:%M%p'))
    scheduler.add_job(two_minute_scrape(url, track, 5), 'date', scheduled_time)
The basic gist of the above code is that race_list and url_list hold data which the two_minute_scrape function needs. I get the number of minutes I want to wait, plus 2, from mtp_num_final. When I run this code, the printed date is correct, so it knows exactly when I want to run the code, but then it IMMEDIATELY runs two_minute_scrape instead of waiting for that time. Is there any way to solve this?
Thanks!
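The immediate run is consistent with how Python evaluates the add_job line: two_minute_scrape(url, track, 5) is called on the spot, and only its return value is handed to the scheduler. A scheduler needs the function object and its arguments passed separately. The sketch below shows the difference with the standard-library sched module as a stand-in, not APScheduler itself; fake_scrape and its arguments are placeholders. In APScheduler 3.x the corresponding call would be scheduler.add_job(two_minute_scrape, 'date', run_date=scheduled_time, args=[url, track, 5]).

```python
import sched
import time

calls = []  # records (url, track, retries) tuples when the job really runs


def fake_scrape(url, track, retries):
    """Placeholder for two_minute_scrape; it only records that it was called."""
    calls.append((url, track, retries))


scheduler = sched.scheduler(time.monotonic, time.sleep)

# Pass the function object and its arguments separately.
# Writing fake_scrape("http://example.com", "track-a", 5) here instead
# would run the function right now, before anything is scheduled.
scheduler.enter(0.2, 1, fake_scrape, argument=("http://example.com", "track-a", 5))

ran_before_due_time = len(calls)  # nothing has run at this point
scheduler.run()                   # blocks until the job's due time, then calls it
ran_after_due_time = len(calls)   # the job has run once
```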

Python Modules/Libraries to correctly work in CronJobs

I'm having an issue getting a cron job to execute a function from the Python module Pyautogui, called from a Python script.
I'm currently running this on macOS and running Python through an Anaconda environment. After reading many StackOverflow & StackExchange posts, I found one that was super helpful in getting my PATHs and env variables set, and I was able to get the Python script to run with the job specified in the crontab.
However, just one line of the script (the one that depends on the Pyautogui module) is not executing. As most of the posts mention, the script runs with no issues when run manually from the terminal but does not give the same result through cron.
Here is my crontab to run at 730am Mon-Fri;
SHELL=/bin/bash
HOME=/Users/harrisonw
PYTHONPATH=/Users/harrisonw/anaconda3/lib/python3.7/site-packages
30 7 * * 1-5 cd /Users/harrisonw/Documents/cron_jobs && /Users/harrisonw/anaconda3/bin/python3.7 online_status_pyautoguyi.py >> ~/Documents/cron_jobs/online_status_cron_output.txt
Here is my script w/ the shebang at the top line ; super simple logic to open a url then refresh that webpage every five minutes for 2 hours on a loop.
#!/Users/harrisonw/anaconda3/bin/python3.7
import os
import time
import pyautogui as py
refresh_counter = 0  # counter for the while loop to break after a certain number
url = "https://www.facebook.com"  # url to access and refresh
os.system("open " + url)  # opens url using os library
time.sleep(10)  # wait 10 secs for webpage to load
while True:  # loop refresh command for 2 hours
    time.sleep(300)  # wait 5 mins
    py.hotkey('command', 'r')  # calls hotkey function "Command+R" to refresh page
    print("Refreshed")
    refresh_counter += 1  # count +1 for each refresh
    if refresh_counter == 24:  # condition to reach 24 refreshes in 5 min intervals = 2hrs
        break
    else:  # continue loop if 24 is not reached.
        continue
print(refresh_counter)
print("\nComplete")
The line py.hotkey('command', 'r') is the issue I'm seeking help for.
Here is the output in the file online_status_cron_output.txt, as specified in the crontab above, which confirms the script was run.
Refreshed
Refreshed
2
Complete
I suspect that I'm missing an additional PATH to the Pyautogui module or an env variable in the crontab, but I'm not sure how to proceed from here.
Might be a silly question, but is Pyautogui compatible with cron jobs?
Any insight and advice around this is appreciated. Thanks!

Reading a file every 30 minutes in Python

As you can see in the code below, it is possible to open a file in a directory and read it. Now I want to read the file into live_token every 30 minutes and print it. Can anyone help me with this?
I found the code below for scheduling a job, but I don't know how to adapt it.
schedule.every(30).minutes.do()
Sorry if this question is too basic; I am very new to Python.
def read_key():
    live_key_file_loc = r'C:\key.txt'
    live_key_file = open(live_key_file_loc , 'r')
    global key_token
    time.sleep(6)
    live_token=live_key_file.read()
    print(live_token)
import time
sleep_time = 30 * 60  # 30 minutes, converted to seconds
def read_key():
    live_key_file_loc = r'C:\key.txt'
    time.sleep(6)
    # The with block closes the file after each read.
    with open(live_key_file_loc, 'r') as live_key_file:
        live_token = live_key_file.read()
    print(live_token)
while True:  # This loop runs forever! Feel free to add some conditions if you want!
    # If you want to read first, then wait for 30 minutes, use this:
    read_key()
    time.sleep(sleep_time)
    # If you want to wait first, then read, use this instead:
    # time.sleep(sleep_time)
    # read_key()
@jonrsharpe is right. Refer to the schedule usage documentation. You need a script that keeps running in order to fetch the token from the file every 30 minutes. I have put a script below which should work for you. If you don't want to keep this file running in Python all the time, look into implementing a scheduled job.
import schedule
import time
def read_key():
    # Open the file on every call so the current contents are read each time.
    with open('C:\\key.txt', 'r') as live_key_file:
        live_token = live_key_file.read()
    print(live_token)
schedule.every(30).minutes.do(read_key)
while True:
    schedule.run_pending()
    time.sleep(1)
There are a few steps in this process.
Search for “Task Scheduler” and open Windows Task Scheduler GUI.
Go to Actions > Create Task…
Name your action.
Under the Actions tab, click New
Find your Python path by entering "where python" in the command line. Copy the result and put it in the Program/Script input.
In the "Add arguments (optional)" box, put the name of your script. Ex. - in "C:\user\your_python_project_path\yourFile.py", put "yourFile.py".
In the "Start in (optional)" box, put the path to your script. Ex. - in "C:\user\your_python_project_path\yourFile.py", put "C:\user\your_python_project_path".
Click “OK”.
Go to “Triggers” > New and choose the repetition that you want.
For more details check this site -
https://www.jcchouinard.com/python-automation-using-task-scheduler/
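With Task Scheduler doing the repetition, the script itself no longer needs a loop or a sleep: it can read the file once and exit, and the trigger starts it again 30 minutes later. A minimal sketch, assuming the same C:\key.txt path from the question; the read_key signature with a path parameter and the main function are hypothetical variations on the original code.

```python
from pathlib import Path


def read_key(path):
    """Return the current contents of the key file at path."""
    # read_text opens, reads, and closes the file in one call.
    return Path(path).read_text()


def main(path=r"C:\key.txt"):
    # One read per run; Task Scheduler is responsible for running this again.
    live_token = read_key(path)
    print(live_token)
    return live_token
```

Calling main() at the end of yourFile.py makes each scheduled run print the token once.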

Python3: print statement of a variable leads to a different output

I wrote a Python script to manage my account on a webpage automatically.
Code Description:
The script has a while loop and at the end of the loop, it waits 12 hours before starting again.
Before the while loop starts, it's logging in to my account, and when entering the while loop, it checks if I'm still logged in. If not, it's logging in to my account again.
Problem:
After re-entering the while loop (the first pass always goes fine), the script only works when print("name is:") and print(name) are at the very beginning of the loop. I tested it several times. Maybe it is just a glitch that happened to occur only when the print statements weren't there, but I'm confused about how those print statements could fix my issue. I would like to know what is or could be causing the issue, and how to solve it properly.
Some side info:
The webpage keeps the login through session cookies with a lifetime of ~6 hours, so when the loop starts again after 12 hours I am certainly no longer logged in. If I reduce the wait time to 30 minutes instead of 12 hours, the script also works without the print statements.
General notes:
The script is running through nohup on my Raspberry Pi 3
Python version is 3.7.3
Code related notes:
I'm using the post method from requests to log in to my account
For checking, if I'm still logged in, I'm using beautifulSoup4
The following code is abbreviated and in a very basic shape.
"account" is an instance of a self-made class. When instantiated with arguments, it logs itself in.
This is the core code:
import time
import requests
from account import Account  # custom-made class
from bs4 import BeautifulSoup
# login credentials
name = "lol"  # I replaced them with placeholders
pw = "lol"
account = Account(name, pw)  # instantiating an account class and log in itself with given arguments
while True:  # script loop
    print("name is:")  # Without those both print statements,
    print(name)  # the code won't work
    if not account.stillAlive():  # if not signed in anymore ...
        account.login(name, pw)  # ... sign in again
    account.doStuff()  # Do the automating stuff
    time.sleep(43200)  # Wait 12 hours, before entering the while loop again
This is the doStuff() method from the Account class:
def doStuff(self):
    html = requests.get("example.com").text  # Note: example.com is for demonstration purposes only
    crawler = BeautifulSoup(html, "lxml")
    lol = crawler.find("input", attrs={"name": "Submit2"}).get("value")
    # ...
Error message:
So, if I'm executing the program without the print statements, I'm getting this error:
lol = crawler.find("input", attrs={"name": "Submit2"}).get("value")
AttributeError: 'NoneType' object has no attribute 'get'
This does not occur when executing with the print statements. With the print statements, the code runs fine.
My guess
My guess is that Python's memory management is deleting the name variable. When entering the script loop for the first time, I'm already logged in, so it skips the account.login(name, pw) part. Since this is the only place where name is used again, maybe Python treats it as dead code after too much time has passed without the line being executed, sees no reason to keep the name/pw variables, and deletes them. Still, I'm just an amateur and I don't have any expertise in this area.
Side notes:
This is the first question I'm submitting; if I forgot something or did something wrong, please tell me.
I already searched for this problem, but I didn't find anything similar. Maybe I just searched badly, even though I searched for a few hours. If so, I apologize. (I had to wait 12 hours for every test, and since I tried it several times, you can tell I had some time available to search.)
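On the guess above: Python does not discard a module-level variable such as name while the script is running, so the print statements are unlikely to be what keeps it alive. The AttributeError itself says that crawler.find(...) returned None, meaning the fetched page had no input named Submit2, which would happen, for instance, if the request came back with a login or error page. A minimal sketch of a guard that makes this visible; get_required_value is a hypothetical helper, and it accepts any object with a BeautifulSoup-style find method.

```python
def get_required_value(soup, tag, attrs, key="value"):
    """Return attribute key of the first matching element, or raise a clear error.

    Hypothetical helper; soup is expected to be a BeautifulSoup object
    (anything with a compatible find method works).
    """
    element = soup.find(tag, attrs=attrs)
    if element is None:
        # find returns None when nothing matches; fail with context
        # instead of an AttributeError on the next attribute access.
        raise LookupError(f"no <{tag}> matching {attrs!r} in the fetched page")
    return element.get(key)
```

In doStuff() this would replace the chained call, for example lol = get_required_value(crawler, "input", {"name": "Submit2"}). Catching the LookupError and logging the first few hundred characters of html would then show what the server actually returned.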

Python script with cron only finished the first loop

I have a python script which runs a for loop. I made it executable and put it in a cron job.
It posts a few tweets on twitter. For each loop, it sleeps a few seconds with random times.
However, it appears it only runs the very first loop and then stops. Every time, I only got ONE tweet. I could not figure out why.
Here is the core part of the code.
def post_message(url):
    d = parse(url)
    entries = d.entries
    for entry in entries:
        str = entry.title
        tweet(str)
        t = random.randint(start, stop)
        time.sleep(t)
This is how I set it in cron.
0 23 * * * /home/demo/post_message.py
It only posts the very first one and then stops. I am wondering whether the time.sleep function stops the remaining iterations of the loop under cron?
Thanks.
Have you tried to run this python script from shell?
Is any exception raised in the second tweet() call? For example, posting too frequently could lead the script to a 403 page, which keeps the function from finding the appropriate HTML element.
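time.sleep itself behaves the same under cron as in a shell, so an exception raised on the second iteration, as the comment above suggests, is a more likely reason for the loop to end. Under cron the traceback is easy to miss because it is not printed to a terminal. A minimal sketch that logs each failure and carries on with the next entry; post_all and its send and pause parameters are hypothetical stand-ins for the code in the question.

```python
import logging
import random
import time


def post_all(titles, send, min_wait=0.0, max_wait=0.0, pause=time.sleep):
    """Call send(title) for each title, logging failures instead of stopping.

    Returns the list of titles that could not be sent.
    """
    failed = []
    for title in titles:
        try:
            send(title)
        except Exception:
            # logging.exception records the full traceback for later inspection.
            logging.exception("sending %r failed", title)
            failed.append(title)
        # Wait a random interval between posts, as in the original loop.
        pause(random.uniform(min_wait, max_wait))
    return failed
```

Calling logging.basicConfig(filename="post_message.log", level=logging.INFO) once at startup (the file name is an assumption) would send these tracebacks to a file that can be read after the cron run.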
