Python Selenium Instagram Bot

I'm building an Instagram bot and I have a question.
I want the bot to keep hitting like on the pictures in my feed.
I did something like this:
self.driver.find_element_by_css_selector('article._8Rm4L:nth-child(1) > div:nth-child(3) > section:nth-child(1) > span:nth-child(1) > button:nth-child(1)').click()
time.sleep(3)
self.driver.find_element_by_css_selector('article._8Rm4L:nth-child(2) > div:nth-child(3) > section:nth-child(1) > span:nth-child(1) > button:nth-child(1)').click()
time.sleep(3)
And it keeps going like this. Is there any way I can do this more easily than writing 1, 2, 3, 4, ...?

You need to add scrolling to it in order to keep liking images further down the feed.
My code:

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = self.driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    self.driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = self.driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

Using XPath is better => find_element_by_xpath()
Or
You can use Selenium IDE to record everything and save it as a Python file that you can then edit.
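To avoid hard-coding nth-child(1), nth-child(2), ... as in the question, you can also drop the index from the article part of the selector and use find_elements (plural) to grab every like button that is currently loaded, then click them in a loop. A rough sketch based on the question's own selector (the _8Rm4L class is whatever Instagram generated at the time and will change):

import time

def like_loaded_posts(self):
    # Same selector as in the question, minus :nth-child(N) on the article,
    # so it matches the like button of every post currently in the DOM.
    buttons = self.driver.find_elements_by_css_selector(
        'article._8Rm4L > div:nth-child(3) > section:nth-child(1) '
        '> span:nth-child(1) > button:nth-child(1)')
    for button in buttons:
        button.click()
        time.sleep(3)  # same pause between likes as in the original snippet

Combined with the scroll loop above, this lets the bot like whatever is loaded, scroll, and repeat, instead of enumerating posts by hand.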

Related

python selenium refuses to scroll down page, how to force / fix?

I need Selenium, using the Firefox driver, to scroll down the page to the very bottom, but my code no longer works. It's like the page recognizes I am trying to scroll with Selenium and forces the scroll back to the top of the screen...
def scroll_to_bottom():
    '''
    Scroll down the page the whole way. This must be done to show all images on page to download.
    '''
    print("[+] Starting scroll down of page. This needs to be performed to view all images.")
    count = 0
    start_time = time.perf_counter()
    try:
        last_height = driver.execute_script("return document.body.scrollHeight")
        while True:
            if count % 10 == 0:
                elapsed_time = time.perf_counter() - start_time
                elapsed_time_clean = time.strftime("%H:%M:%S", time.gmtime(elapsed_time))
                print("[i] Still scrolling, it's been " + str(elapsed_time_clean) + ", please continue to standby... C: " + str(count))
            time.sleep(3)
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(3)
            # Check if height has changed
            new_height = driver.execute_script("return document.body.scrollHeight")
            vprint("last_height: " + str(last_height) + ", new_height: " + str(new_height))
            if new_height == last_height:
                break
            last_height = new_height
            count += 1
    except Exception as exception:
        print('[!] Exception exception: ' + str(exception))
        pass
    print("[i] Scroll down of whole page complete.")
    return
It will go to the page, start to scroll once, then pop back up to the top of the screen and no longer scroll. Then my code thinks it's at the bottom of the page because the page size did not change. This worked about 3 weeks ago but no longer works. I can't figure out why.
Is there a way to force scrolling?
BTW, I tried using "DOWN" and "Page Down" key presses; that does not work either. Anyone have any ideas?
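For reference, the key-press approach mentioned above is usually written along these lines (a sketch, not the asker's actual code; sending the keys to the body element and the fixed repeat count are assumptions):

import time
from selenium.webdriver.common.keys import Keys

body = driver.find_element_by_tag_name("body")
for _ in range(20):                 # press Page Down a fixed number of times
    body.send_keys(Keys.PAGE_DOWN)
    time.sleep(0.5)                 # give the page a moment to load new content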

Not scrolling down in a website having dynamic scroll

I'm scraping news articles from a website where a specific category page has no load-more button; the article links are generated as I scroll down. I wrote a function that takes category_page_url and limit_page (how many times I want to scroll down) as input and returns all the links of the news articles displayed on that page.
Category page link = https://www.scmp.com/topics/trade
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

def get_article_links(url, limit_loading):
    options = webdriver.ChromeOptions()
    lists = ['disable-popup-blocking']
    caps = DesiredCapabilities().CHROME
    caps["pageLoadStrategy"] = "normal"
    options.add_argument("--window-size=1920,1080")
    options.add_argument("--disable-extensions")
    options.add_argument("--disable-notifications")
    options.add_argument("--disable-Advertisement")
    options.add_argument("--disable-popup-blocking")
    driver = webdriver.Chrome(executable_path=r"E:\chromedriver\chromedriver.exe", options=options)  # add your chrome path
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")

    loading = 0
    while loading < limit_loading:
        loading += 1
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(8)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    article_links = []
    bsObj = BeautifulSoup(driver.page_source, 'html.parser')
    for i in bsObj.find('div', {'class': 'content-box'}).find('div', {'class': 'topic-article-container'}).find_all('h2', {'class': 'article__title'}):
        article_links.append(i.a['href'])

    return article_links
Assuming I want to scroll 5 times in this category page,
get_article_links('https://www.scmp.com/topics/trade', 5)
But even if I change my limit_page value, it returns only the links from the first page, so there must be some mistake in how I wrote the scrolling part. Please help me with this.
Instead of scrolling based on the body's scrollHeight property, I checked whether there was an appropriate element after the list of articles to scroll to. I noticed this appropriately named div:
<div class="topic-content__load-more-anchor" data-v-db98a5c0=""></div>
Accordingly, I mainly changed the while loop in your function get_article_links so that it scrolls to this div using location_once_scrolled_into_view, finding the div once before the loop starts, as follows:
loading = 0
end_div = driver.find_element('class name', 'topic-content__load-more-anchor')
while loading < limit_loading:
    loading += 1
    print(f'scrolling to page {loading}...')
    end_div.location_once_scrolled_into_view
    time.sleep(2)
If we now call the function with different values of limit_loading, we get a different count of unique news links. Here are a couple of runs:
>>> ar_links = get_article_links('https://www.scmp.com/topics/trade', 2)
>>> len(ar_links)
scrolling to page 1...
scrolling to page 2...
90
>>> ar_links = get_article_links('https://www.scmp.com/topics/trade', 3)
>>> len(ar_links)
scrolling to page 1...
scrolling to page 2...
scrolling to page 3...
120

Not able to scrape the element which contains a specific text using Selenium

Could someone assist me with an issue? I am trying to scrape the dish names that have the MUST TRY tag, but I don't know why it is printing the list of all dishes.
CODE :
import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path='./chromedriver.exe')
driver.get("https://www.zomato.com/pune/bedekar-tea-stall-sadashiv-peth/order")
screen_height = driver.execute_script("return window.screen.height;")  # get the screen height of the web
i = 1
count = 0
scroll_pause_time = 1
while True:
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break
driver.execute_script("window.scrollTo(0, 0);")

# block of code where I am struggling
dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):
        name = items.find_element(By.CSS_SELECTOR, 'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue
driver.close()
OUTPUT :
['Misal Slice', 'Shev Chivda', 'Kharvas [Sugar]', 'Extra Rassa [1 Vati]', 'Taak', 'Extra Slice', 'Misal Slice', 'Kharvas [Jaggery]', 'Solkadhi', 'Kokam', 'Nimboo Sharbat', 'Shev Chivda', 'Batata Chivda', 'Misal Slice', 'Extra Kanda [1 Vati]', 'Extra Slice', 'Extra Rassa [1 Vati]', 'Coffee Kharvas', 'Rose Kharvas', 'Shengdana Ladoo', 'Chirota', 'Kharvas [Sugar]', 'Kharvas [Jaggery]', 'Chocolate Fudge', 'Taak', 'Kokam', 'Flavored Milk', 'Nimboo Sharbat', 'Solkadhi', 'Dahi']
EXPECTED OUTPUT :
the list of dishes with the MUST TRY tag, like in the image below. My script is getting all the names, not just the selected ones.
just try this xpath :
//div[text()='MUST TRY']/../../../h4
and use in code like this :
for name in driver.find_elements(By.XPATH, "//div[text()='MUST TRY']/../../../h4"):
    print(name.text)
Instead of
dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    if items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"):
        name = items.find_element(By.CSS_SELECTOR, 'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)
    else:
        continue
You can use
dish_divs = driver.find_elements_by_xpath('//div[@class="sc-1s0saks-1 dpXgPd"]/preceding-sibling::h4')
for items in dish_divs:
    print(items.text)
This will make your code more readable and easier to maintain.
Here, in items.find_element(By.XPATH, "//div[contains(text(),'MUST TRY')]"), you're using an absolute XPath (it searches all elements from the root). In fact you need a relative XPath (one that searches only within the current element):
items.find_element(By.XPATH, ".//div[contains(text(),'MUST TRY')]")
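Applied to the loop from the question, that fix would look roughly like this (a sketch; find_elements, plural, is used for the check because find_element raises NoSuchElementException when nothing matches):

dish_divs = driver.find_elements_by_xpath("//div[@class = 'sc-1s0saks-11 cYGeYt']")
for items in dish_divs:
    # relative XPath: only look inside the current dish card
    if items.find_elements(By.XPATH, ".//div[contains(text(),'MUST TRY')]"):
        name = items.find_element(By.CSS_SELECTOR, 'h4.sc-1s0saks-15.iSmBPS')
        print(name.text)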
You can get the same result using a single XPath:
//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()
Also, I don't recommend parsing HTML with Selenium; it's really slow for this. I recommend using lxml or BeautifulSoup instead.
You can use the above XPath like this:
from lxml import html

....

content = driver.page_source
tree = html.fromstring(content)
titles = tree.xpath('//div[div/div[@type="tag"][.="MUST TRY"]]/preceding-sibling::h4[1]/text()')

Can't get all xpath elements from dynamic webpage

First time here asking. Hope someone can help me with this, it's driving me crazy!
I'm trying to scrape a used-car webpage from my country. The data loads as you start to scroll down, so the first part of the code scrolls down to load the webpage.
I'm trying to get the link of every car published here, that's why I'm using find_elements_by_xpath in the try-except part.
Well, the problem is that the cars show up in packs of 11 on every load (scroll down), so the same 11 XPaths repeat each time I scroll down;
meaning XPaths from
"//*[@id='w1']/div[1]/div/div[1]/a"
to
"//*[@id='w11']/div[1]/div/div[1]/a"
All libraries are imported at the start of the code, don't worry.
from selenium import webdriver
from bs4 import BeautifulSoup
import time

links = []
url = ('https://buy.olxautos.cl/buscar?VehiculoEsSearch%5Btipo_valor%5D=1&VehiculoEsSearch%5Bprecio_range%5D=3990000%3B15190000')
driver = webdriver.Chrome('')
driver.get(url)
time.sleep(5)

SCROLL_PAUSE_TIME = 3

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
    try:
        zelda = driver.find_elements_by_xpath("//*[@id='w1']/div[1]/div/div[1]/a").get_attribute('href')
        links.append(zelda)
    except:
        pass

print(links)
So the expected output of this code would be something like this:
['link_car_1', 'link_car_12', 'link_car_23', '...']
But when I run this code, it returns an empty list, whereas when I run it with find_element_by_xpath it returns the first link. What am I doing wrong 😭😭? I just can't figure it out!
Thanks!
You get only one link because the XPath is not the same for all the links. You can use bs4 to extract the links from the driver's page source, as shown below.
import time

from bs4 import BeautifulSoup
from selenium import webdriver

links = []
url = ('https://buy.olxautos.cl/buscar?VehiculoEsSearch%5Btipo_valor%5D=1&VehiculoEsSearch%5Bprecio_range%5D=3990000%3B15190000')
driver = webdriver.Chrome(executable_path=Path)
driver.get(url)
time.sleep(5)

SCROLL_PAUSE_TIME = 3

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    page_source_ = driver.page_source
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)
    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")

    # use BeautifulSoup to extract links
    sup = BeautifulSoup(page_source_, 'lxml')
    sub_ = sup.findAll('div', {'class': 'owl-item active'})
    for link_ in sub_:
        link = link_.find('a', href=True)
        # link = 'https://buy.olxautos.cl' + link  # if needed (adding prefix)
        links.append(link['href'])

    if new_height == last_height:
        break
    last_height = new_height

print('>> Total length of list : ', len(links))
print('\n', links)
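As a side note, part of the reason the original snippet returns an empty list is that find_elements_by_xpath returns a list, so calling .get_attribute('href') on it raises an AttributeError that the bare except silently swallows. A pure-Selenium variant could collect all hrefs after the scroll loop, roughly like this (the generalized XPath is an assumption based on the w1..w11 ids from the question and may need adapting):

# Collect every car link in one go after the scroll loop has finished.
# starts-with(@id, 'w') generalizes the w1..w11 ids from the question.
anchors = driver.find_elements_by_xpath("//*[starts-with(@id, 'w')]/div[1]/div/div[1]/a")
links = [a.get_attribute('href') for a in anchors]
print(len(links), links)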

Scrolling in a different layer in Chrome using Selenium in Python

I am writing code in Python using the Selenium module, and I want to scroll a list that is on a different layer in the same window. Imagine you go to Instagram, click on followers, and then wish to scroll down to the bottom so that Selenium can make a list of all the users who follow that page.
My problem is my code scrolls on the layer below, which is the wall of the user.
def readingFollowers(self):
    self.driver.find_element_by_xpath("//a[contains(@href, '/followers')]").click()
    sleep(2.5)
    scroll_box = self.driver.find_element_by_xpath('/html/body/div[4]/div/div[2]')
    # Get scroll height
    last_height = self.driver.execute_script("return arguments[0].scrollHeight", scroll_box)
    while True:
        # Scroll down to bottom
        self.driver.execute_script("window.scrollTo(0, arguments[0].scrollHeight);", scroll_box)
        # Wait to load page
        sleep(1)
        # Calculate new scroll height and compare with last scroll height
        new_height = self.driver.execute_script("return arguments[0].scrollHeight", scroll_box)
        if new_height == last_height:
            break
        last_height = new_height
I have used Google Chrome, and the inspected elements should be the same on all systems (most probably).
If you cannot understand the problem from this, leave a comment and I can give you the complete code required to recreate the situation.
I assume that you are already logged in to the IG account.
def readingFollowers(self):
    # click followers
    self.driver.find_element_by_xpath('//a[@class="-nal3 "]').click()
    time.sleep(5)
    pop_up = self.driver.find_element_by_xpath('//div[@class="isgrP"]')
    height = self.driver.execute_script("return arguments[0].scrollHeight", pop_up)
    initial_height = height
    # default follower count is 12
    followers_count = 12
    while True:
        self.driver.execute_script("arguments[0].scrollBy(0, arguments[1])", pop_up, initial_height)
        time.sleep(5)
        # count loaded followers
        count = len(self.driver.find_elements_by_xpath('//div[@class="PZuss"]/li'))
        if count == followers_count:
            break
        followers_count = count
        # add height because the list is expanding
        initial_height += initial_height
It took me some time but it works.
