import re
from selenium.webdriver.common.by import By

def Item_Finder():
    # Read the markup percentage from the listing and return it as a float
    item_finder = driver.find_element(By.XPATH, "XPATH HERE").text
    item_finder = re.sub('[%+]', '', item_finder)  # strip '%' and '+' characters
    item_finder = float(item_finder)
    return item_finder
while Item_Finder() <= 5:
    driver.find_element(By.XPATH, "XPATH HERE").click()
else:
    Item_Finder()
    print("No item found. Retrying...")
Can't seem to get this while loop working; it only runs once. The code just looks at the item markup on people's listings. First post on here as well, and not too sure how to get the indentations to show, but they are there. Any help appreciated, I've only recently started learning.
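For what it's worth, the else on a while loop runs its body once when the condition becomes false and then falls through; it does not restart the loop. A minimal sketch of a loop that keeps re-checking (assuming the listing page stays loaded and Item_Finder can simply be called again; the 1-second pause is an arbitrary choice):

import time

# A retry-loop sketch: click while the markup is low, otherwise wait and check again
while True:
    if Item_Finder() <= 5:
        driver.find_element(By.XPATH, "XPATH HERE").click()
    else:
        print("No item found. Retrying...")
    time.sleep(1)  # brief pause before re-reading the listing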
I'm trying to make a little Facebook group auto-post script using Python and Selenium.
First of all, I'm uploading 3-4 photos using the following code:
l = driver.find_elements_by_tag_name('input')
for g in l:
    print(g)
    try:
        # Match the hidden file input ('@type', not '#type', in the XPath)
        if g == driver.find_element_by_xpath("//input[@type='file']"):
            print("Found")
            logging.debug("Found input for image uploading")
            # A newline-separated list of paths uploads several files at once
            g.send_keys(
                '/var/www/html/v1/insta-post/AutoPostFB/images/0.jpg \n/var/www/html/v1/insta-post/AutoPostFB/images/1.jpg \n/var/www/html/v1/insta-post/AutoPostFB/images/2.jpg')
            print("File/s Uploaded")
            logging.debug("Images uploaded")
            time.sleep(5)
            # break
    except:
        print("Element not found after upload")
        logging.debug("Element for upload not found")
Then I send the post text, since the textbox is already in focus:
actions = ActionChains(driver)
actions.send_keys(info)
actions.perform()
time.sleep(1)
The issue is that every space in the info variable is abnormally converted into a Return key, so if the text is, for example:
info = "HEI HOW'S IS GOING?"
what I get in the textbox is
IS
GOING
HEY
HOW'S
Can someone help me out? I've tried just about everything.
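One workaround that is often suggested for contenteditable boxes like this is to bypass send_keys for the body text and paste it from the clipboard instead. A minimal sketch, assuming the composer is already focused (pyperclip is an assumption here; any clipboard helper would do):

import pyperclip
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

pyperclip.copy(info)  # put the post text on the clipboard
actions = ActionChains(driver)
# Ctrl+V pastes the whole string in one go, so no space gets re-interpreted
actions.key_down(Keys.CONTROL).send_keys('v').key_up(Keys.CONTROL)
actions.perform()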
I am scraping a YouTube page with a program I found online. The code runs and returns correct results. However, as I work through the code line by line, I find that I cannot locate one of the attributes it uses anywhere in the source code. I searched for it in the page source and in the inspect-element view, and even copied and pasted the raw code into Word. Nowhere could I find it.
How did this happen?
Code below:
soup = BeautifulSoup(result.text, "lxml")
# cannot find yt-lockup-meta-info anywhere......
view_element = soup.find_all("ul", class_="yt-lockup-meta-info")
totalview = 0
for objects in view_element:
    view_list = obj.findChildren()
    for element in view_list:
        if element.string.endwith("views"):
            videoviews = element.text.replace("views", "").replace(",", "")
            totalview = totalview + int(videoviews)
            print(videoviews)
print("----------------------")
print("Total_Views" + str(totalview))
The attribute I searched for is "yt-lockup-meta-info".
I see a few problems, which I think might be cleared up if I saw the full code. However, there are some things that need to be fixed within this block.
For example, this line should read:
for obj in view_element:
instead of:
for objects in view_element:
You are only referencing one "obj", not multiple objects, when traversing through "view_element".
Also, there is no need to search for the word "views" when there is a class you can search directly.
Here is how I would address this problem. Hope this helps.
# Go to the website and convert the page source to soup
import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.youtube.com/results?search_query=web+scraping+youtube')
soup = BeautifulSoup(response.text, 'lxml')

videos = soup.find_all('ytd-video-renderer')  # Find all videos
total_view_count = 0
for video in videos:
    video_meta = video.find('div', {'id': 'metadata'})  # The text under the video title
    view_count_text = video_meta.find_all('span', {'class': 'ytd-video-meta-block'})[0].text.replace('views', '').strip()  # The view counter
    # Convert the abbreviated view count ('1.2K', '3M', ...) to an integer
    if 'K' in view_count_text:
        video_view_count = int(float(view_count_text.split('K')[0]) * 1000)
    elif 'M' in view_count_text:
        video_view_count = int(float(view_count_text.split('M')[0]) * 1000000)
    elif 'B' in view_count_text:
        video_view_count = int(float(view_count_text.split('B')[0]) * 1000000000)
    else:
        video_view_count = int(view_count_text)
    print(video_view_count)
    total_view_count += video_view_count
print(total_view_count)
I am trying to get the like count from someone's latest posts (and also get the Instagram link for each post) using Python, but I can't seem to figure it out. I have tried every method posted online, but none of them seem to work anymore.
My idea was to let Python open a browser tab and go to the www.instagram.com/p/ link, but that also doesn't seem to work anymore.
I have no code to upload because it was all a big mess of different strategies, so I just deleted it and decided to start over.
Just grab an HTTPS proxy list and here you go; I wrote this a while back and it can easily be adapted to your needs.
import requests, sys, time
from random import choice

if len(sys.argv) < 3:  # the script needs both a post link and a proxy list
    sys.exit(f"Usage: {sys.argv[0]} <Post Link> <Proxy List>")

Comments = 0
ProxList = []
ReadProx = open(sys.argv[2], "r").readlines()
for line in ReadProx:
    ProxList.append(line.strip('\n'))

while True:
    try:
        # Route each request through a random proxy from the list
        g = requests.get(sys.argv[1], proxies={'https': 'https://' + choice(ProxList)})
        print(g.json())
        time.sleep(5)
        like_count = g.json()['data']['shortcode_media']['edge_liked_by']['count']  # an int, so no len()
        if like_count > Comments:
            print(f"[+] Like | {g.json()['data']['shortcode_media']['edge_liked_by']['edges'][int(Comments)]['node']['username']}")
        if like_count == Comments - 1:
            pass
        else:
            Comments += 1
        time.sleep(1.5)
    except KeyboardInterrupt:
        sys.exit("Done")
    except Exception as e:
        print(e)
Sorry if this is kind of vague, I do not know how to explain it well, but basically I am trying to run a function which checks if an ID is on a page, and I do not know how to do it. Here is what I've attempted so far:
def checkoutpage():
    driver1.find_element_by_id('test')

try:
    if checkoutpage == True:
        print("Working")
    else:
        print("Not working")
except:
    print("ERROR")
It returns Not working no matter whether the ID is on the page or not; help is appreciated.
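For context, checkoutpage == True compares the function object itself to True (which is always False), and find_element_by_id raises an exception rather than returning False when the ID is absent. A minimal sketch of one way it could work, assuming driver1 is the active WebDriver:

from selenium.common.exceptions import NoSuchElementException

def checkoutpage():
    # Return True when the element exists, False when the lookup raises
    try:
        driver1.find_element_by_id('test')
        return True
    except NoSuchElementException:
        return False

if checkoutpage():  # note the parentheses: the function must be called
    print("Working")
else:
    print("Not working")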
Hello once again, fellow stack'ers. Short description: I am web scraping some data from an automotive forum using Python and saving all the data into CSV files. With some help from other Stack Overflow members, I managed to get as far as mining through all the pages for a certain topic and gathering the date, title, and link for each post.
I also have a separate script I am now struggling to implement (for every link found, Python creates a new soup for it, scrapes through all the posts, and then goes back to the previous link).
Would really appreciate any other tips or advice on how to make this better, as it's my first time working with Python. I think it might be my nested loop logic that's messed up, but checking through multiple times it seems right to me.
Here's the code snippet:
link += (div.get('href'))
savedData += "\n" + title + ", " + link
tempSoup = make_soup('http://www.automotiveforums.com/vbulletin/' + link)

while tempNumber < 3:
    for tempRow in tempSoup.find_all(id=re.compile("^td_post_")):
        for tempNext in tempSoup.find_all(title=re.compile("^Next Page -")):
            tempNextPage = ""
            tempNextPage += (tempNext.get('href'))
        post = ""
        post += tempRow.get_text(strip=True)
        postData += post + "\n"
    tempNumber += 1
    tempNewUrl = "http://www.automotiveforums.com/vbulletin/" + tempNextPage
    tempSoup = make_soup(tempNewUrl)
    print(tempNewUrl)

tempNumber = 1
number += 1
print(number)
newUrl = "http://www.automotiveforums.com/vbulletin/" + nextPage
soup = make_soup(newUrl)
My main issue with it so far is that tempSoup = make_soup('http://www.automotiveforums.com/vbulletin/' + link) does not seem to create a new soup after it has finished scraping all the posts for a forum thread.
This is the output I'm getting:
http://www.automotiveforums.com/vbulletin/showthread.php?s=6a2caa2b46531be10e8b1c4acb848776&t=1139532&page=2
http://www.automotiveforums.com/vbulletin/showthread.php?s=6a2caa2b46531be10e8b1c4acb848776&t=1139532&page=3
1
So it does seem to find the correct links for the new pages and scrape them; however, on the next iteration it prints the new dates AND the same exact pages. There's also a really weird 10-12 second delay after the last link is printed, and only then does it hop down to print the number 1 and then bash out all the new dates.
But after going to the next forum thread's link, it scrapes the same exact data every time.
Sorry if it looks really messy; it is sort of a side project and my first attempt at doing something useful, so I am very new at this. Any advice or tips would be much appreciated. I'm not asking you to solve the code for me; even some pointers about my possibly wrong logic would be greatly appreciated!
So after spending a little bit more time, I have managed to ALMOST crack it. It's now at the point where Python finds every thread and its link on the forum, then goes to each link, reads all of its pages, and continues on to the next link.
This is the fixed code, if anyone can make any use of it.
link += (div.get('href'))
savedData += "\n" + title + ", " + link
soup3 = make_soup('http://www.automotiveforums.com/vbulletin/' + link)

while tempNumber < 4:
    for postScrape in soup3.find_all(id=re.compile("^td_post_")):
        post = ""
        post += postScrape.get_text(strip=True)
        postData += post + "\n"
        print(post)
    for tempNext in soup3.find_all(title=re.compile("^Next Page -")):
        tempNextPage = ""
        tempNextPage += (tempNext.get('href'))
        print(tempNextPage)
    soup3 = ""
    soup3 = make_soup('http://www.automotiveforums.com/vbulletin/' + tempNextPage)
    tempNumber += 1

tempNumber = 1
number += 1
print(number)
newUrl = "http://www.automotiveforums.com/vbulletin/" + nextPage
soup = make_soup(newUrl)
All I had to do was separate the two for loops that were nested within each other into their own loops. Still not a perfect solution, but hey, it ALMOST works.
The non-working bit: the first 2 threads of the provided link have multiple pages of posts; the following 10+ threads do not. I cannot figure out a way to check the for tempNext in soup3.find_all(title=re.compile("^Next Page -")):
value outside of the loop to see if it's empty or not, because if it does not find a next-page element/href, it just reuses the last one. But if I reset the value after each run, it no longer mines each page =l A solution that just created another problem :D.
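One way around the stale tempNextPage value (a sketch, assuming make_soup and the variables from the snippet above) is to capture the find_all result in a list first; find_all returns an empty list when there is no next-page anchor, so the pagination loop can simply stop:

while tempNumber < 4:
    for postScrape in soup3.find_all(id=re.compile("^td_post_")):
        postData += postScrape.get_text(strip=True) + "\n"

    next_links = soup3.find_all(title=re.compile("^Next Page -"))
    if not next_links:  # single-page thread: no next-page link, stop paginating
        break
    tempNextPage = next_links[0].get('href')
    soup3 = make_soup('http://www.automotiveforums.com/vbulletin/' + tempNextPage)
    tempNumber += 1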
Many thanks, dear Norbis, for sharing your ideas, insights, and concepts.
Since you offer only a snippet, I will just try to provide an approach that shows how to log in to a phpBB board using a payload:
import requests

forum = "the forum name"  # base URL of the board, ending in '/'
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'username': 'username', 'password': 'password',
           'redirect': 'index.php', 'sid': '', 'login': 'Login'}

# A Session keeps the login cookies for the requests that follow
session = requests.Session()
r = session.post(forum + "ucp.php?mode=login", headers=headers, data=payload)
print(r.text)
But wait: instead of manipulating the website with requests, we can also use browser automation, which a package such as mechanize offers.
That way we don't have to manage our own session, and we only need a few lines of code to craft each request.
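A sketch of the same login with mechanize (untested; the form index and field names are assumptions that depend on the board's markup):

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)  # many boards disallow robots in robots.txt
br.addheaders = [('User-agent', 'Mozilla/5.0')]

br.open(forum + "ucp.php?mode=login")  # 'forum' as in the snippet above
br.select_form(nr=0)  # pick the login form; the index may differ per board
br['username'] = 'username'
br['password'] = 'password'
response = br.submit()
print(response.read())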
An interesting example is on GitHub: https://github.com/winny-/sirsi/blob/317928f23847f4fe85e2428598fbe44c4dae2352/sirsi/sirsi.py#L74-L211