Do-while emulation in Python is not working properly

testurl = '{}/testplan/Plans/{}/Suites/{}/Testpoint?includePointDetails=true&api-version=5.1-preview.2'.format(base, planId, suiteId)
print(testurl)
while True:
    c = count_testpoints(testplanAPI(base, planId, suiteId, callAPI(testurl)))
    if c < 200:
        break
Here callAPI() is a function used to return a header from the response, which is passed as an argument to testplanAPI() to build a new testurl using that argument as a URL parameter. testplanAPI() returns the testurl, while count_testpoints() returns the count of test points.
I have to exit the loop once it gets the first count less than 200.
With the above code the URL is built only once, and the loop keeps evaluating the same condition infinitely; the URL is not updated after the first iteration.
Can you please suggest a better way, or what can be rectified here?

As @deceze correctly wrote, you have to set the URL inside the loop, and you most likely have to save the new base and IDs...
c = 200  # start at (or above) the threshold so the loop body runs at least once
while c >= 200:
    testurl = '{}/testplan/Plans/{}/Suites/{}/Testpoint?includePointDetails=true&api-version=5.1-preview.2'.format(base, planId, suiteId)
    print(testurl)
    c = count_testpoints(testplanAPI(base, planId, suiteId, callAPI(testurl)))
    # sth like: base, planId, suiteId = new values for these...
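For completeness, here is a minimal sketch that keeps the original do-while emulation (while True plus break) and rebuilds the URL on every iteration. It assumes, as the question describes, that callAPI() returns the header from the current response and that testplanAPI() builds the next URL from that header:

testurl = '{}/testplan/Plans/{}/Suites/{}/Testpoint?includePointDetails=true&api-version=5.1-preview.2'.format(base, planId, suiteId)
while True:
    header = callAPI(testurl)                              # header from the current response
    next_url = testplanAPI(base, planId, suiteId, header)  # next URL built from that header
    print(next_url)
    c = count_testpoints(next_url)
    if c < 200:         # first count below 200 ends the loop
        break
    testurl = next_url  # continue paging from the newly built URL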

Related

Using Try Except to iterate through a list in Python

I'm trying to iterate through a list of NFL QBs (over 100) and create a list of links that I will use later.
The links follow a standard format, however if there are multiple players with the same name (such as 'Josh Allen') the link format needs to change.
I've been trying to do this with different nested while/for loops with Try/Except with little to no success. This is what I have so far:
import pandas as pd

test = ['Josh Allen', 'Lamar Jackson', 'Derek Carr']
empty_list = []
name_int = 0
for names in test:
    try:
        q_b_name = names.split()
        link1 = q_b_name[1][0].capitalize()
        link2 = q_b_name[1][0:4].capitalize() + q_b_name[0][0:2].capitalize() + f'0{name_int}'
        q_b = pd.read_html(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
        q_b1 = q_b[0]
        # filter_stats is a function that only works with QB data
        df = filter_stats(q_b1)
        # triggers the except if the link wasn't a QB
        df.head(5)
        empty_list.append(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
    except:
        # adds one to the variable to change the link to find the proper QB link
        name_int += 1
The result only appends the final correct link. I need to append each correct link to the empty list.
Still a beginner in Python and trying to challenge myself with different projects. Thanks!
As stated, the try/except will work in that it will try the code under the try block. If at any point within that block it fails or raises an exception/error, it goes and executes the block of code under the except.
There are better ways to go about this problem (for example, I'd use BeautifulSoup to simply check the HTML for the "QB" position; a rough sketch of that idea is at the end of this answer), but since you are a beginner, I think trying to learn this process will help you understand the loops.
So what this code does:
1. It formats your player name into the link format.
2. It initializes a while loop that it will enter.
3. It gets the table.
4. a) It enters a function that checks if the table contains 'passing' stats by looking at the column headers.
   b) If it finds 'passing' in the columns, it returns True to indicate it is a "QB" type of table (keep in mind sometimes there might be running backs or other positions who have passing stats, but we'll ignore that). If it returns True, the while loop stops and we move on to the next name in your test list.
   c) If it returns False, it increments your name_int and checks the next link.
5. To take care of the case where it never finds a QB table, the while loop gives up after 10 attempts.
Code:
import pandas as pd

def check_stats(q_b1):
    for col in q_b1.columns:
        if 'passing' in col.lower():
            return True
    return False

test = ['Josh Allen', 'Lamar Jackson', 'Derek Carr']
empty_list = []

for names in test:
    name_int = 0
    q_b_name = names.split()
    link1 = q_b_name[1][0].capitalize()
    qbStatsInTable = False
    while qbStatsInTable == False:
        link2 = q_b_name[1][0:4].capitalize() + q_b_name[0][0:2].capitalize() + f'0{name_int}'
        url = f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/'
        try:
            q_b = pd.read_html(url, header=0)
            q_b1 = q_b[0]
        except Exception as e:
            print(e)
            break
        # Check if "passing" is in the table columns
        qbStatsInTable = check_stats(q_b1)
        if qbStatsInTable == True:
            print(f'{names} - Found QB Stats in {link1}/{link2}/gamelog/')
            empty_list.append(f'https://www.pro-football-reference.com/players/{link1}/{link2}/gamelog/')
        else:
            name_int += 1
            if name_int == 10:
                print(f'Did not find a link for {names}')
                break
Output:
print(empty_list)
['https://www.pro-football-reference.com/players/A/AlleJo02/gamelog/', 'https://www.pro-football-reference.com/players/J/JackLa00/gamelog/', 'https://www.pro-football-reference.com/players/C/CarrDe02/gamelog/']
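For reference, a rough, untested sketch of the BeautifulSoup alternative mentioned at the start: fetch the player page and look for the position marker instead of parsing the stats table. The div id 'meta' is an assumption about the page layout, not something verified here.

import requests
from bs4 import BeautifulSoup

def looks_like_qb(url):
    # Hypothetical check: the player info box (assumed to have id="meta")
    # usually states the position, e.g. "Position: QB".
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    info_box = soup.find(id='meta')
    return info_box is not None and 'QB' in info_box.get_text()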

How to call this function right?

Hello guys, I'm currently trying to build a phone number generator with two methods. The first one is phone_number and should be called every time you need newly generated numbers. You can get the numbers from next_phone_number. This function should also call phone_number if there are no numbers available. However, I always get an error message when I call it. Here is my code:
def phone_number(self):
    request = requests.get('https://www.bestrandoms.com/random-at-phone-number')
    soup = BeautifulSoup(request.content, 'html.parser')
    self.phone_numbers = soup.select_one('#main > div > div.col-xs-12.col-sm-9.main > ul > textarea').text
    self.phone_numbers = self.phone_numbers.splitlines()
    for phone in range(len(self.phone_numbers)):
        self.phone_numbers[phone] = '+43' + self.phone_numbers[phone][1:].replace(' ', '')
    self.phone_numbers.extend(self.phone_numbers)
    return self.phone_numbers

@property
def next_phone_number(self):
    self.phone_index += 1
    if self.phone_index >= len(self.phone_numbers):
        self.phone_number()
    return self.phone_numbers[self.phone_index]
Error message (screenshot omitted): an index-out-of-range error when accessing self.phone_numbers[self.phone_index].
The issue is that the phone_number() function is not extending the self.phone_numbers list enough to cover the fact that the self.phone_index is out of range. Perhaps consider doing this to extend until it is large enough:
@property
def next_phone_number(self):
    self.phone_index += 1
    while self.phone_index >= len(self.phone_numbers):
        self.phone_number()
    return self.phone_numbers[self.phone_index]
When you do self.phone_numbers = soup.select_one(...), you're clobbering the old list of phone numbers you had in self.phone_numbers and replacing it with the data you've parsed from your web page. You later refine this into a new list, but it's still not doing what you want in terms of adding new numbers (I've not tried the code, I'd guess you always get the same amount of them from the web).
You should use a different variable name for the new data. That way you can extend the existing list with the new data without overwriting anything you already have:
def phone_number(self):
    request = requests.get('https://www.bestrandoms.com/random-at-phone-number')
    soup = BeautifulSoup(request.content, 'html.parser')
    # these lines all use new local variables instead of clobbering self.phone_numbers
    new_data = soup.select_one('#main > div > div.col-xs-12.col-sm-9.main > ul > textarea').text
    new_numbers = new_data.splitlines()
    new_numbers_reformatted = ['+43' + number[1:].replace(' ', '') for number in new_numbers]
    # so now we can extend the list as desired
    self.phone_numbers.extend(new_numbers_reformatted)
    return self.phone_numbers
It's possible that you'll also need to change the initialization code in your class to make sure it initializes self.phone_numbers to an empty list, if it's not doing that already. The bad behavior of your method might have been covering up a bug if the list was not being created anywhere else.
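A minimal sketch of such an initializer, assuming the attribute names used above (phone_index starts at -1 because next_phone_number increments before indexing):

def __init__(self):
    self.phone_numbers = []  # empty list so extend() has something to grow
    self.phone_index = -1    # incremented to 0 on the first next_phone_number access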

Python BeautifulSoup - Improve readability of find by Id function?

I would like to improve the readability of the following code, especially lines 8 to 11:
import requests
from bs4 import BeautifulSoup
URL = 'https://docs.google.com/forms/d/e/1FAIpQLSd5tU8isVcqd02ymC2n952LC2Nz_FFPd6NT1lD4crDeSsJi2w/viewform?usp=sf_link'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
question1 = str(soup.find(id='i1'))
question1 = question1.split('>')[1].lstrip().split('.')[1]
question1 = question1[1:]
question1 = question1.replace("_", "")
print(question1)
Thanks in advance :)
You could use the following
question1 = soup.find(id='i1').getText().split(".")[1].replace("_","").strip()
to replace lines 8 to 11.
.getText() takes care of removing the html-tags. Rest is pretty much the same.
In Python you can almost always just chain operations. So your code would also be valid as a one-liner:
question1 = str(soup.find(id='i1')).split('>')[1].lstrip().split('.')[1][1:].replace("_", "")
But in most cases it is better to leave the code in a more readable form than to reduce the line-count.
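For example, the same chain split over named intermediate steps (purely a readability choice; it produces the same result as the one-liner above):

raw = str(soup.find(id='i1'))
text_after_tag = raw.split('>')[1].lstrip()
question1 = text_after_tag.split('.')[1][1:].replace("_", "")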
Abhinav, it's not very clear what you want to achieve; the script is actually already very simple, which is a good thing and follows the Pythonic principle of The Zen of Python:
"Simple is better than complex."
It's also not clear what you actually mean:
Make it simpler, as in understandable and clear for human beings?
Make it simpler for the machine to compute, hence improve performance?
Reduce the number of lines of code and follow programming guidelines more closely?
I point this out because next time it would be better to make this explicit in the question. Having said that, as I don't know exactly what you mean, I came up with an answer that more or less covers all 3 points:
ANSWER
import requests
from bs4 import BeautifulSoup

URL = 'https://docs.google.com/forms/d/e/1FAIpQLSd5tU8isVcqd02ymC2n952LC2Nz_FFPd6NT1lD4crDeSsJi2w/viewform?usp=sf_link'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')

# ========= < FUNCTION TO GET ALL QUESTIONS DYNAMICALLY > ========= #
def clean_string_by_id(page, id):
    content = str(page.find(id=id))  # Get content of the page by different ids
    if content != 'None':  # Check if there is actual content or not
        find_question = content.split('>')  # NOTE: Split at tag closings
        if len(find_question) >= 2 and find_question[1][0].isdigit():  # NOTE: If len is 1 the element was not found; also check that the first character is a digit
            cleaned_question = find_question[1].split('.')[1].strip()  # We get the actual question and strip it already!
            result = cleaned_question.replace('_', '')
            return result
    else:
        return

# ========= < Scan the entire page dynamically + add results to a list > ========= #
all_questions = []
for i in range(1, 50):  # NOTE: I went up to 50 but there may be many more, I let you test it
    get_question = clean_string_by_id(soup, f'i{i}')
    if get_question:  # Append result to list only if there is actual content
        all_questions.append(get_question)

# ========= < Show all results > ========= #
for question in all_questions:
    print(question)
NOTE
Here I'm assuming that you want to get all the elements from this page, hence you don't want to write 2000 variables; as you can see I left the logic basically the same as yours, but I wrapped everything in a function instead.
In fact the steps you followed were pretty good, and yes, you may "improve it" or make it "smarter", however comprehensibility wins over complexity. Also keep in mind that I assumed that getting all the 'questions' from that Google Form was your goal.
EDIT
As pointed out by @wuerfelfreak and as he explains in his answer, further improvement can be achieved by using the getText() function.
Hence, here is the above function rewritten using getText:
def clean_string_by_id(page, id):
    content = page.find(id=id)
    if content:  # NOTE: Check if there is actual content or not
        find_question = content.getText()  # NOTE: getText() strips the tags for us
        if find_question:  # NOTE: skip empty strings
            cleaned_question = find_question.split('.')[1].strip()  # Same as before
            result = cleaned_question.replace('_', '')
            return result
Documentation & Guides
Zen of Python
getText
geeksforgeeks.org | isdigit()

How to increment between pages using Selenium and BeautifulSoup?

I'm trying to get my code to increment through the pages of this website and I can't seem to get it to loop and increment; instead it does the first page and gives up. Is there something I'm doing wrong?
if pageExist is not None:
    if countitup != pageNum:
        countitup = countitup + 1
        driver.get('http://800notes.com/Phone.aspx/%s/%s' % (tele800, countitup))
        delay = 4
        scamNum = soup.find_all(text=re.compile(r"Scam"))
        spamNum = soup.find_all(text=re.compile(r"Call type: Telemarketer"))
        debtNum = soup.find_all(text=re.compile(r"Call type: Debt Collector"))
        hospitalNum = soup.find_all(text=re.compile(r"Hospital"))
        scamCount = len(scamNum) + scamCount
        spamCount = len(spamNum) + spamCount
        debtCount = len(debtNum) + debtCount
        hospitalCount = len(hospitalNum) + hospitalCount
        block = soup.find(text=re.compile(r"OctoNet HTTP filter"))
        extrablock = soup.find(text=re.compile(r"returning an unknown error"))
        type(block) is str
        type(extrablock) is str
        if block is not None or extrablock is not None:
            print("\n Damn. Gimme an hour to fix this.")
            time.sleep(2000)
Repo: https://github.com/GarnetSunset/Haircuttery/tree/Experimental
pageExist is not None seems to be the problem, since it checks whether the page is None, and it will most likely never be None. There is no official way to check for HTTP response codes, but we can use something like this:
if driver.find_element_by_xpath("/html/body/p[contains(text(), '400')]"):
    # this will check if there's a 400 code in the p tag.
or
if '400' in driver.find_element_by_xpath('/html/body/p[1]').text:
I'm sure there are other ways to do this, but this is one of them, and that's the only issue here. You can then increment and keep the rest of your code pretty much the same once you fix that first if.
I might have made some (syntax) mistakes in my code since I'm not testing it, but the logic applies. Great code though!
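A rough, untested sketch of how the loop could be structured once the error-page check is in place (variable names are taken from the question, and the 400-page check from above):

countitup = 1
while True:
    driver.get('http://800notes.com/Phone.aspx/%s/%s' % (tele800, countitup))
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    error_p = driver.find_elements_by_xpath('/html/body/p[1]')
    if error_p and '400' in error_p[0].text:
        break  # error page reached, stop paging
    # ... count the Scam / Telemarketer / Debt Collector / Hospital matches here, as before ...
    countitup += 1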
Also, instead of
type(block) is str
type(extrablock) is str
the Pythonic way is to use isinstance:
isinstance(block, str)
isinstance(extrablock, str)
And as for time.sleep, you can use WebDriverWait; there are two available methods, implicit and explicit wait, please take a look here.
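For instance, an explicit wait might look like this (a sketch that waits up to 10 seconds for the page body to be present instead of sleeping for a fixed time):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 10)  # give up after 10 seconds
wait.until(EC.presence_of_element_located((By.TAG_NAME, "body")))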

How to use offset in VKontakte with Python?

I am trying to build a script where I can get the check-ins for a specific location. For some reason when I specify lat, long coords VK never returns any check-ins so I have to fetch location IDs first and then request the check-ins from that list. However I am not sure on how to use the offset feature, which I presume is supposed to work somewhat like a pagination function.
So far I have this:
import vk
import json

app_id =        # enter app id
login_nr =      # enter your login phone or email
password = ''   # enter password

vkapi = vk.API(app_id, login_nr, password)
vkapi.getServerTime()

def get_places(lat, lon, rad):
    name_list = []
    try:
        locations = vkapi.places.search(latitude=lat, longitude=lon, radius=rad)
        name_list.append(locations['items'])
    except Exception, e:
        print '*********------------ ERROR ------------*********'
        print str(e)
    return name_list

# Returns last check-ins up to a maximum of 100
# Define the number of check-ins you want, 100 being the maximum
def get_checkins_id(place_id, check_count):
    checkin_list = []
    try:
        checkins = vkapi.places.getCheckins(place=place_id, count=check_count)
        checkin_list.append(checkins['items'])
    except Exception, e:
        print '*********------------ ERROR ------------*********'
        print str(e)
    return checkin_list
What I would like to do eventually is combine the two into a single function but before that I have to figure out how offset works, the current VK API documentation does not explain that too well. I would like the code to read something similar to:
def get_users_list_geo(lat, lon, rad, count):
    users_list = []
    locations_lists = []
    users = []
    locations = vkapi.places.search(latitude=lat, longitude=lon, radius=rad)
    for i in locations[0]:
        locations_list.append(i['id'])
    for i in locations:
        # Get each location ID
        # Get check-ins for the location
        # Append the check-in and ID to the list
From what I understand I have to count the offset when getting the check-ins and then somehow account for locations that have more than 100 check-ins. Anyways, I would greatly appreciate any type of help, advice, or anything. If you have any suggestions on the script I would love to hear them as well. I am teaching myself Python so clearly I am not very good so far.
Thanks!
I've worked with the VK API in JavaScript, but I think the logic is the same.
TL;DR: Offset is the number of results (starting with the first) which the API should skip in the response.
For example, say you make a query which should return 1000 results (let's imagine that you know the exact number of results).
But VK returns only 100 per request. So, how do you get the other 900?
You say to the API: give me the next 100 results. "Next" is the offset: the number of results you want to skip because you've already handled them. So, the VK API takes the 1000 results, skips the first 100, and returns to you the next (second) 100.
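In code, paging through all check-ins for one place could look roughly like this (untested; it assumes places.getCheckins accepts an offset parameter as described above):

def get_all_checkins(place_id, page_size=100):
    all_items = []
    offset = 0
    while True:
        response = vkapi.places.getCheckins(place=place_id, count=page_size, offset=offset)
        items = response['items']
        if not items:
            break               # nothing left to fetch
        all_items.extend(items)
        offset += page_size     # skip the results we have already handled
    return all_items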
Also, if you are talking about this method (http://vk.com/dev/places.getCheckins) in the first paragraph, please check that your lat/long are floats, not integers. And it could be useful to try swapping lat/long; maybe you got them mixed up?
