I'm passing a link as an argument to a thread, and I want to scrape the timestamp on that page. But inside the function the thread runs, the timestamp value never changes, no matter how many times I rescrape it. How do I make timeLink dynamic so the page is fetched fresh on every pass through the while loop? Here is the code:
def abcStart(timeLink):
    while True:
        res = timeLink
        res.raise_for_status()
        timestamp = BeautifulSoup(res.content, 'html.parser').find_all('b')
        if timestamp[0].text == otherTimestamp[0].text:
            # work on something
            break
        if timestamp[0].text > otherTimestamp[0].text:
            continue
        else:
            print('not yet')
            time.sleep(30)
            break

timelink = requests.get('http://example.com/somelink')
threadobj = threading.Thread(target=abcStart, args=(timelink))
threadobj.start()
threadobj.join()
It looks like only one HTTP request is ever sent. On this line:

timelink = requests.get('http://example.com/somelink')

the request is performed once, and abcStart() receives that single response object and reuses it for as long as it runs. That means you parse the same response on every iteration. To scrape a fresh copy of the page on each pass through the loop, you need to perform a new HTTP request every time. Something like this:
def abcStart(timeLink):
    while True:
        res = requests.get(timeLink)  # send the request here, once per iteration
        res.raise_for_status()
        timestamp = BeautifulSoup(res.content, 'html.parser').find_all('b')
        if timestamp[0].text == otherTimestamp[0].text:
            # work on something
            break
        if timestamp[0].text > otherTimestamp[0].text:
            continue
        else:
            print('not yet')
            time.sleep(30)
            break

timeLink = 'http://example.com/somelink'  # declare the url
threadobj = threading.Thread(target=abcStart, args=(timeLink,))
threadobj.start()
threadobj.join()
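Note that threading.Thread expects args to be a tuple, which is why the trailing comma in args=(timeLink,) matters: args=(timeLink) is just a parenthesized string, and Thread would try to unpack its characters as separate arguments.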
I guess you should move the timeLink request inside your function:
def abcStart():
    while True:
        res = requests.get('http://example.com/somelink')
        res.raise_for_status()
        timestamp = BeautifulSoup(res.content, 'html.parser').find_all('b')
        if timestamp[0].text == otherTimestamp[0].text:
            # work on something
            break
        if timestamp[0].text > otherTimestamp[0].text:
            continue
        else:
            print('not yet')
            time.sleep(30)
            break

threadobj = threading.Thread(target=abcStart, args=())
threadobj.start()
threadobj.join()
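With the URL hard-coded, the timeLink parameter is no longer needed, which is why it was dropped above; if you want to keep the function reusable, pass the URL string through args=('http://example.com/somelink',) as in the previous answer instead.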
I am trying to write some Python code that checks whether a webpage has loaded correctly, and handles a couple of exception (but not error) scenarios.
So the code would do the following:
While the webpage is not loaded correctly (I have set some variables to control this)
try to check whether the page had loaded correctly
try to check whether it hasn't loaded correctly as the account is logged out
try to check whether it hasn't loaded correctly as it needs an update
otherwise wait 5 seconds and loop through the block again, until the maximum number of retries has been hit
I tried doing try/except/except, but since the errors are not specific (TypeError etc.), I get an "only one catch-all except allowed" rejection.
I also considered if/elif, but then I don't know how to loop back to the start when none of the scenarios match and it needs to wait and try again.
The code I have tried
while not loaded and attempts < maxattempts and not loggedout:  # confirm that the page is not loaded, not too many attempts, and not logged out
    try:
        x1, y1 = pygu.center(pygu.locateOnScreen("/whatsappopened.png", confidence=0.8))
        time.sleep(2)
        pygu.moveTo(x1, y1)
        current_time = now.strftime("%H-%M-%S")
        loaded = True
    except:
        x1, y1 = pygu.center(pygu.locateOnScreen("whatsapploggedout.png", confidence=0.8))
        time.sleep(2)
        pygu.moveTo(x1, y1)
        loggedout = True
    except:
        x1, y1 = pygu.center(pygu.locateOnScreen("/whatsappupdate.png", confidence=0.8))
        time.sleep(2)
        pygu.hotkey('ctrl', 'w')
        loggedout = True
    except:
        time.sleep(5)
        attempts += 1

print("page not loaded after %s attempts" % (attempts))
Any guidance appreciated!
I came up with a potential solution, with comments that may help you think about how to continue the program and adapt it to your needs.
attempts = 1
maxattempts = 5
loaded = False
loggedout = False
update = False
time_now = datetime.now()

# Keep looping until some state is recognized or we run out of attempts.
while not loaded and not loggedout and not update and attempts < maxattempts:
    # Check if WhatsApp is open
    try:
        x1, y1 = pygu.center(pygu.locateOnScreen("whatsappopened.png", confidence=0.8))
        print("WhatsApp opened")
        time.sleep(2)
        pygu.moveTo(x1, y1)
        time_now = datetime.now().strftime("%H:%M:%S")
        loaded = True
    except Exception as e:
        # Here because WhatsApp is not open or logged out
        # Check if WhatsApp is logged out
        try:
            x2, y2 = pygu.center(pygu.locateOnScreen("whatsapploggedout.png", confidence=0.8))
            print("WhatsApp logged out")
            time.sleep(2)
            pygu.moveTo(x2, y2)
            loggedout = True
        except Exception as e:
            # Check if it needs an update
            try:
                x3, y3 = pygu.center(pygu.locateOnScreen("whatsappupdate.png", confidence=0.8))
                print("WhatsApp update")
                time.sleep(2)
                pygu.moveTo(x3, y3)
                update = True
            except Exception as e:
                # Not open
                print("WhatsApp not open")
                time.sleep(2)
                attempts += 1
                time_now = datetime.now().strftime("%H:%M:%S")
                loaded = False
                loggedout = False
                update = False
                print("Attempts: " + str(attempts))
I'm experiencing an issue when calling a function inside a while loop.
The purpose of the while loop is to perform an action, but it may only perform this action once a certain threshold has appeared. This threshold is the result of another function.
When running this for the first time, everything works fine: no threshold, no run.
The problem is that this threshold is affected by other parameters, and when it changes, it usually blocks the main program from running.
But at certain times, which I cannot pinpoint precisely, there is a "slip" and the threshold does not prevent the main program from running.
My question is: could there be a memory leak of some sort?
Code is below, thanks.
def pre_run_check():
    if check_outside() != 1:
        return 0
    else:
        return 1

if __name__ == '__main__':
    while True:
        time.sleep(0.5)
        allow_action = None
        while allow_action == None:
            print("cannot run")
            try:
                allow_action = pre_run_check()
            except:
                allow_action = 0
            else:
                if allow_action == 1:
                    print("running")
                    # take action of some sort
                    allow_action = None
You could restructure it like this:

def pre_run_check():
    if check_outside() != 1:
        return False
    else:
        return True

while True:
    time.sleep(0.5)
    allow_action = False
    while not allow_action:
        print("cannot run")
        try:
            allow_action = pre_run_check()
            if allow_action:
                print("running")
                # take action of some sort.
                # You actually need to wait for the end of the subprocess here,
                # otherwise you can get corrupted data/handles.
                allow_action = False
                break
        except:
            allow_action = False
        time.sleep(.5)
The point is to make the process sequential: keep checking until the action is allowed, run it, then start over.
Hope it helps.
I'm very new to Python, but I've made a lot of progress over the last few days. The below script works fine, but I just can't figure out how to implement code that would print an incremented number every time 'avail' is equal to NO. I'd like it to print something like 'None Available 1' on the first loop, then 'None Available 2' on the second loop, then 'None Available 3' on the third loop, etc.
import requests
import time
import subprocess
from bs4 import BeautifulSoup

def get_page(url):
    response = requests.get(url)
    if not response.ok:
        print('Server responded:', response.status_code)
    else:
        soup = BeautifulSoup(response.text, 'lxml')
        return soup

def get_detail_data(soup):
    avail = soup.find('span', id='availability').text.strip()
    if avail == "YES":
        return True
    elif avail == "NO":
        print('None Available')
        return False
    else:
        print("Unexpected value")
        return None

def main():
    url = 'https://www.blahblah.com'
    while True:
        is_available = get_detail_data(get_page(url))
        if is_available:
            subprocess.call(["C:\\temp\\filename.bat"], shell=False)
            break
        time.sleep(2)

if __name__ == '__main__':
    main()
The following would probably work, but there might be a better way to structure it.
_not_avail_counter = 0

def get_detail_data(soup):
    global _not_avail_counter  # we assign to the module-level counter below
    avail = soup.find('span', id='availability').text.strip()
    if avail == "YES":
        return True
    elif avail == "NO":
        _not_avail_counter += 1
        print('None Available ' + str(_not_avail_counter))
        return False
    else:
        print("Unexpected value")
        return None
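The global declaration matters here: without it, the augmented assignment makes Python treat _not_avail_counter as a local variable, and the function raises UnboundLocalError the first time avail is "NO".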
I would suggest changing your while True loop into a for loop over an itertools.count iterator. You can pass the value from the count to the get_detail_data function as an argument.
import itertools

def get_detail_data(soup, count):  # take the count as an argument
    avail = soup.find('span', id='availability').text.strip()
    if ...
        # ...
    elif avail == "NO":
        print('None Available', count)  # include count here (and anywhere else you want)
    # ...

def main():
    url = 'https://www.blahblah.com'
    for c in itertools.count():  # produce the count in a loop
        is_available = get_detail_data(get_page(url), c)
        # ...
Note that itertools.count starts counting at zero. If you want to start at 1 (like a human usually would when counting things), pass 1 as the start argument: for c in itertools.count(1).
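Filled in with the rest of the question's main(), the loop might look like this (the url and the batch file path are the placeholders from the question):

def main():
    url = 'https://www.blahblah.com'
    for c in itertools.count(1):  # start the attempt count at 1
        is_available = get_detail_data(get_page(url), c)
        if is_available:
            subprocess.call(["C:\\temp\\filename.bat"], shell=False)
            break
        time.sleep(2)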
What I want to do is build a sort of monitor for a website that picks a random number. Before it does anything, it needs to request the website to see whether it is valid or not. When the site is live it generates random numbers from 1 to 100, and I want to check it every random 3-6 seconds, print the number, and repeat until the website goes down.
What I have tried is the following:
def get_product_feed(url_site):
    thread = url_site
    password = False  # We start by giving a false/true value
    while not password:  # If it is not True then we do the loop
        available = False
        while not available:
            try:
                checkpass = requests.get(thread, timeout=12)  # We request the site to see if it's alive or not
                if 'deadsite' in checkpass.url:  # If it contains e.g. deadsite then we enter the while True
                    while True:
                        contcheck = requests.get(thread, timeout=12)  # We make a new request to see if it's dead
                        if 'deadsite' in contcheck.url:  # If it is, then we sleep 3-7 sec and try again
                            randomtime = random.randint(3, 7)
                            time.sleep(randomtime)
                        else:  # If it no longer contains it, we return the bs4 value
                            available = True
                            bs4 = soup(contcheck.text, 'lxml')
                            return bs4
                else:  # If it's alive right away, we return immediately
                    bs4 = soup(checkpass.text, 'lxml')
                    return bs4
            except Exception as err:
                randomtime = random.randint(1, 2)
                time.sleep(randomtime)
                continue

def get_info(thread):
    while True:
        try:
            url = thread
            resp = requests.get(url, timeout=12)  # We make the request to the website here
            resp.raise_for_status()
            json_resp = resp.json()  # We grab the json value
        except Exception as err:
            randomtime = random.randint(1, 3)
            time.sleep(randomtime)
            continue
        metadata = json_resp['number']  # We return the metadata value back to get_identifier
        return metadata

def get_identifier(thread):
    new = get_info(thread)  # We call get_info(thread)
    try:
        thread_number = new
    except KeyError:
        thread_number = None
    identifier = ('{}').format(thread_number)  # We return back to the script
    return identifier

def script():
    url_site = 'https://www.randomsitenumbergenerator.com/'  # The url we are going to use
    old_list = []
    while True:
        for thread in get_product_feed(url_site):  # We loop through get_product_feed to see if it's alive or not
            if get_identifier(thread) not in old_list:  # We ask get_identifier(thread) for the value and see if it's in old_list or not
                print(get_identifier(thread))
                old_list.append(get_identifier(thread))
I added comments to make it easier to understand what is going on.
The issue I am having now is that I cannot get get_identifier(thread) to keep running until the website goes down: I want it to keep printing numbers for as long as the website is live, until it dies. That is my question: what do I need to do to make that happen?
My thought was eventually to add threads, maybe 10 threads checking at the same time whether the website is dead or not and printing the value back, but I am not sure that is the solution to this question.
So there's this website that posts something I want to buy at a random time of day for a limited amount of time and I want to write something to send a message to my phone when a new url is posted to that webpage.
I planned on doing this by counting the number of links on the page (since it's rarely updated) and checking it every 5 minutes against what it was 5 minutes before that, then 5 minutes later checking it against what it was 10 minutes before that, then 5 minutes later against what it was 15 minutes before that, and so on; if the count is ever greater than what it originally was, send a message to my phone. Here's what I have so far:
class url_alert:
    url = ''

    def link_count(self):
        notifyy = True
        while notifyy:
            try:
                page = urllib.request.urlopen(self.url)
                soup = bs(page, "lxml")
                links = []
                for link in soup.findAll('a'):
                    links.append(link.get('href'))
                notifyy = False
                print('found', int(len(links)), 'links')
            except:
                print('Stop making so many requests')
                time.sleep(60*5)
        return len(links)

    def phone(self):
        self = phone
        phone.message = client.messages.create(to="", from_="", body="")
        print('notified')

    def looper(self):
        first_count = self.link_count()
        print('outside while')
        noty = True
        while noty:
            try:
                second_count = self.link_count()
                print('before compare')
                if second_count == first_count:
                    self.phone()
                    noty = False
            except:
                print('not quite...')
            time.sleep(60)

alert = url_alert()
alert.looper()
As a test, I set the if statement that decides whether to send a message to compare for equality, but the loop kept on running. Am I calling the functions within the looper function the right way?
It looks like you need to eliminate the try block: as it is now, if self.phone() raises an exception you will never leave the loop.
def looper(self):
    first_count = self.link_count()
    while True:
        if first_count != self.link_count():
            self.phone()
            break
        time.sleep(60)
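One small refinement: since the question is about new urls appearing, comparing with self.link_count() > first_count instead of != would avoid sending a notification when a link is removed.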