I'm trying to take a screenshot of the product details of an Amazon item. I found that the div with id="aplus" holds the product description I'm looking for.
So I wrote code using Python and Selenium to take a full screenshot of that div.
However, the result is cropped and shows only the top part of the div.
options = webdriver.ChromeOptions()
options.headless = True
driver = webdriver.Chrome()
URL = "https://www.amazon.co.jp/-/en/Figuarts-Dragon-Saiyan-Approx-Painted/dp/B08S7KVHMP/ref=sr_1_1?crid=3O3TF6V9FJHS5&currency=JPY&keywords=b08s7kvhmp&qid=1668143838&qu=eyJxc2MiOiIwLjAwIiwicXNhIjoiMC4wMCIsInFzcCI6IjAuMDAifQ%3D%3D&sprefix=%2Caps%2C140&sr=8-1"
driver.get(URL)
time.sleep(5)
S = lambda X: driver.execute_script('return document.body.parentNode.scroll' +X)
time.sleep(1)
driver.set_window_size(S('Width'), S('Height'))
image = driver.find_element('id','aplus')
image.screenshot('yes.png')
If I pass
options=options
to webdriver.Chrome(), then depending on the product it captures the full div, but the screenshot does not contain any of the images.
I have no idea how to take a full screenshot of the div :S
This example requires the PIL library, which you can install with:
pip install Pillow
from selenium import webdriver
from PIL import Image
from io import BytesIO
options = webdriver.ChromeOptions()
options.headless = True
driver = webdriver.Chrome()
URL = "https://www.amazon.co.jp/-/en/Figuarts-Dragon-Saiyan-Approx-Painted/dp/B08S7KVHMP/ref=sr_1_1?crid=3O3TF6V9FJHS5&currency=JPY&keywords=b08s7kvhmp&qid=1668143838&qu=eyJxc2MiOiIwLjAwIiwicXNhIjoiMC4wMCIsInFzcCI6IjAuMDAifQ%3D%3D&sprefix=%2Caps%2C140&sr=8-1"
driver.get(URL)
# now that we have the preliminary stuff out of the way time to get that image :D
element = driver.find_element('id', 'aplus') # find part of the page you want image of
location = element.location
size = element.size
png = driver.get_screenshot_as_png() # screenshot of the visible browser window
driver.quit()
im = Image.open(BytesIO(png)) # uses PIL library to open image in memory
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = im.crop((left, top, right, bottom)) # defines crop points
im.save('screenshot.png') # saves new cropped image
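One thing to watch with the crop approach: element.location and element.size are reported in CSS pixels, while the screenshot is in device pixels, and the screenshot only covers the visible window. Below is a minimal sketch of a helper that scales the box and clamps it to the image. The name crop_box and the scale parameter are my own (not part of Selenium or PIL), and I'm assuming you read the scale from window.devicePixelRatio.

```python
def crop_box(location, size, image_size, scale=1.0):
    """Return a (left, top, right, bottom) box in screenshot pixels.

    location / size: dicts as returned by element.location / element.size
    image_size:      (width, height) of the screenshot, e.g. im.size
    scale:           assumed device pixel ratio (1.0 on standard displays)
    """
    img_w, img_h = image_size
    left = round(location['x'] * scale)
    top = round(location['y'] * scale)
    right = round((location['x'] + size['width']) * scale)
    bottom = round((location['y'] + size['height']) * scale)
    # Clamp to the image so PIL does not pad the crop with black pixels.
    box = (max(0, left), max(0, top), min(img_w, right), min(img_h, bottom))
    if box[2] <= box[0] or box[3] <= box[1]:
        raise ValueError('element lies outside the captured screenshot')
    return box

# Hypothetical usage with the variables from the answer above:
# scale = driver.execute_script('return window.devicePixelRatio')
# im = im.crop(crop_box(location, size, im.size, scale))
```

If the helper raises, the element was not inside the captured area at all, which usually means the window has to be enlarged or the element scrolled into view before taking the screenshot.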
The link contains a map showing the current location of the bus. I want to capture the map every few minutes with Python and save it as an image. I tried the following code, but the output shows only the route, not the map. Moreover, if I run it repeatedly with Selenium, it opens a lot of browsers in the background. Is there any other way to do this? Thanks
Code I tried:
from PIL import Image
from selenium import webdriver
driver = webdriver.Chrome('./chromedriver')
driver.maximize_window() # maximize window
driver.get("https://mobi.mit.edu/default/transit/route?feed=nextbus&direction=loop&agency=mit&route=tech&_tab=map")
element = driver.find_element("xpath", "/html/body/div/div/main/div[2]/div/div[2]/div/div[3]/div/div/div/div/div/div") # this is the map xpath
location = element.location
size = element.size
driver.save_screenshot("canvas.png")
left = location['x']
top = location['y']
right = location['x'] + size['width']
bottom = location['y'] + size['height']
im = Image.open('canvas.png')
im = im.crop((int(left), int(top), int(right), int(bottom)))
im.save('canvas_el.png') # your file
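On the "lots of browsers" part: one option is to start a single headless browser, reuse it for every capture, and quit it at the end. The sketch below is only that, a sketch: capture_repeatedly, take_shot and the file names are names I made up, the 5 second wait for the map tiles is a guess, and I have not verified that the map renders in headless mode on this site.

```python
import time

URL = ("https://mobi.mit.edu/default/transit/route"
       "?feed=nextbus&direction=loop&agency=mit&route=tech&_tab=map")
MAP_XPATH = ("/html/body/div/div/main/div[2]/div/div[2]/div/div[3]"
             "/div/div/div/div/div/div")  # XPath taken from the question


def capture_repeatedly(take_shot, count, interval_s, sleep=time.sleep):
    """Call take_shot(i) `count` times, pausing interval_s between calls."""
    paths = []
    for i in range(count):
        paths.append(take_shot(i))
        if i < count - 1:  # no need to wait after the last capture
            sleep(interval_s)
    return paths


def main():
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--window-size=1280,1024')
    driver = webdriver.Chrome(options=options)  # one browser for all captures
    try:
        driver.get(URL)

        def take_shot(i):
            driver.refresh()
            time.sleep(5)  # assumed time for the map to finish drawing
            element = driver.find_element(By.XPATH, MAP_XPATH)
            path = f'map_{i:03d}.png'
            element.screenshot(path)  # screenshot of just this element
            return path

        capture_repeatedly(take_shot, count=3, interval_s=180)
    finally:
        driver.quit()  # always close the browser, even after an error
```

Calling main() would take three captures three minutes apart. The try/finally is what keeps stray browser processes from piling up when something fails halfway.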
I have working code that opens Tenor.com, scrolls through the website and scrapes GIFs. My issue is that it only scrapes and saves up to 24 GIFs, no matter how many it scrolls past.
This exact same code works for saving images on other websites, without the issue described here.
I've also tried using BeautifulSoup to find all divs with the class "Gif " and then extract the img from each one, but that leads to the exact same result (only 24 GIFs downloaded).
Here's my code. What might the issue be?
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time
import requests
from urllib.parse import urljoin
from selenium.webdriver.common.by import By
import urllib.request
options = Options()
options.add_experimental_option("detach", True)
options.add_argument("--disable-notifications")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
search_url = 'https://tenor.com/'
driver.get(search_url)
time.sleep(5) # Allow 5 seconds for the web page to open
scroll_pause_time = 2 # You can set your own pause time. My laptop is a bit slow so I use 2 sec
screen_height = driver.execute_script("return window.screen.height;") #get the screen height of the web
i = 1
start_time = time.time()
while True:
    if time.time() - start_time >= 60:
        break
    # scroll one screen height each time
    driver.execute_script("window.scrollTo(0, {screen_height}*{i});".format(screen_height=screen_height, i=i))
    i += 1
    time.sleep(scroll_pause_time)
    # update scroll height each time after scrolled, as the scroll height can change after we scrolled the page
    scroll_height = driver.execute_script("return document.body.scrollHeight;")
    # Break the loop when the height we need to scroll to is larger than the total scroll height
    if (screen_height) * i > scroll_height:
        break
media = []
media_elements = driver.find_elements(By.XPATH,"//div[contains(@class,'Gif ')]//img")
for m in media_elements:
    src = m.get_attribute("src")
    media.append(src)
print("Total Number of Animated GIFs and Videos Stored is", len(media))
print("The Sequence of Pages we Have Scrolled is", i)
for i in range(len(media)):
    urllib.request.urlretrieve(str(media[i]),"tenor/media{}.gif".format(i))
If you scroll down with DevTools open, you can see that the number of figure elements stops increasing after a certain point, i.e. old images are removed from the HTML as new ones are added.
So you have to call .get_attribute("src") inside the scrolling loop. I also suggest using a set instead of a list to store the URLs, since set.add(url) only adds a URL that is not already in the set.
The code below collects the image URLs on each pass and then scrolls to the last visible image.
media = set()
for i in range(6):
    images = driver.find_elements(By.XPATH, "//div[contains(@class,'Gif ')]//img")
    # collect the src of every image currently in the DOM; the set ignores duplicates
    for img in images:
        media.add(img.get_attribute('src'))
    driver.execute_script('arguments[0].scrollIntoView({block: "center", behavior: "smooth"});', images[-1])
    time.sleep(1)
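If you don't want to hard-code the number of passes, the same idea can be written so that it stops once scrolling no longer turns up new URLs. This is a rough sketch; collect_until_stable, get_visible, scroll_once and patience are names I picked for illustration, and the two callables are where the Selenium calls would go.

```python
def collect_until_stable(get_visible, scroll_once, max_rounds=50, patience=3):
    """Collect unique URLs while scrolling.

    get_visible: callable returning the src values currently in the DOM
    scroll_once: callable that scrolls the page one step
    patience:    stop after this many consecutive rounds with nothing new
    """
    seen = set()
    idle = 0
    for _ in range(max_rounds):
        before = len(seen)
        # Skip None/empty values (images whose src is not set yet).
        seen.update(url for url in get_visible() if url)
        idle = idle + 1 if len(seen) == before else 0
        if idle >= patience:
            break
        scroll_once()
    return seen

# Hypothetical wiring with Selenium (driver and By as in the question):
# get_visible = lambda: [img.get_attribute('src') for img in
#     driver.find_elements(By.XPATH, "//div[contains(@class,'Gif ')]//img")]
# scroll_once = lambda: driver.execute_script(
#     'window.scrollBy(0, window.innerHeight);')
# media = collect_until_stable(get_visible, scroll_once)
```

In practice scroll_once should also sleep briefly so the page has time to load the next batch before the next read.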
I am working on a machine learning project and need a LOT of pictures for the data set that will train my program. The website https://its.txdot.gov/ITS_WEB/FrontEnd/default.html?r=SAT&p=San%20Antonio&t=cctv has pictures that are updated every six minutes. I need to save the image at LP 1604 at Kyle Seale Pkwy, but can't figure out how. I'm trying to right-click on the image using action chains to save the image. Here's what I have so far:
driver.get('https://its.txdot.gov/ITS_WEB/FrontEnd/default.html?r=SAT&p=San%20Antonio&t=cctv')
time.sleep(5) #to let the site load
driver.find_element_by_id('LP-1604').click() #to get to the 1604 tab
time.sleep(5) #to let the site load
pic = driver.find_element_by_id('LP 1604 at Kyle Seale Pkwy__SAT')
action = ActionChains(driver)
action.context_click(pic)
The context menu that usually pops up when you right-click is not showing up, and I feel like there has to be a better way to do this than right-clicking. I know how to wrap this in a loop that runs every six minutes, so I don't need help there; it's just the image-downloading part. One of the problems I run into is that all the images are under the same URL, and most examples out there rely on each image having its own URL. Any suggestions would be helpful.
I think this could help you save the images on your PC:
from PIL import Image
from selenium.webdriver.common.by import By
def save_image_on_disk(driver, element, path):
    location = element.location
    size = element.size
    # saves a screenshot of the visible browser window
    driver.save_screenshot(path)
    # uses PIL library to open image in memory
    image = Image.open(path)
    left = location['x']
    top = location['y']
    right = location['x'] + size['width']
    bottom = location['y'] + size['height']
    image = image.crop((left, top, right, bottom)) # defines crop points
    image = image.convert('RGB')
    image.save(path, 'png') # saves new cropped image
def your_main_method(driver):
    some_element_img = driver.find_element(By.XPATH, '//*[@id="id-of-image"]')
    save_image_on_disk(driver, some_element_img, 'my-image.png')
As for the timing, you can call time.sleep(6*60) between captures.
The image data is located in the src property of the currentSnap element. It's encoded in base64, so you need to capture it and decode it. Then using PIL you can do anything you like with the image.
Also you can use selenium's built in wait functions instead of hardcoding sleeps. In this case the image sometimes loads even after the image element loads, so there's an additional short sleep still in the code to allow it to load.
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from PIL import Image
from io import BytesIO
import base64
import re
import time
# Max time to wait for page to load
timeout=10
driver = webdriver.Chrome()
driver.get('https://its.txdot.gov/ITS_WEB/FrontEnd/default.html?r=SAT&p=San%20Antonio&t=cctv')
# Wait for element to load before clicking
element_present = EC.presence_of_element_located((By.ID, 'LP-1604'))
WebDriverWait(driver, timeout).until(element_present)
driver.find_element(By.ID, 'LP-1604').click() #to get to the 1604 tab
# Wait for image to load before capturing data
element_present = EC.presence_of_element_located((By.ID, 'currentSnap'))
WebDriverWait(driver, timeout).until(element_present)
# Sometimes the image still loads after the element is present, give it a few more seconds
time.sleep(4)
# Get base64 encoded image data from src
pic = driver.find_element(By.ID, 'currentSnap').get_attribute('src')
# Strip prefix
pic = re.sub(r'^data:image/.+;base64,', '', pic)
# Load image file to memory
im = Image.open(BytesIO(base64.b64decode(pic)))
# Write to disk
im.save('image.jpg')
# Display image in Jupyter
im
# Open in your default image viewer
im.show()
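A small variation on the decoding step, in case it's useful: instead of stripping the prefix and always saving as .jpg, you can parse the data URL and pick the file extension from its MIME type. This is a sketch using only the standard library; decode_data_url and extension_for are helper names I made up, and the extension table only covers a few common types.

```python
import base64
import re

# data:[<mime>][;param=value]*;base64,<payload>
_DATA_URL = re.compile(
    r'data:(?P<mime>[^;,]*)(?:;[^;,]*)*?;base64,(?P<data>.*)', re.DOTALL)

_EXTENSIONS = {'image/jpeg': '.jpg', 'image/png': '.png', 'image/gif': '.gif'}


def decode_data_url(src):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    match = _DATA_URL.match(src or '')
    if match is None:
        # e.g. a normal http(s) URL, or the attribute was still empty
        raise ValueError('src is not a base64 data URL')
    return match.group('mime'), base64.b64decode(match.group('data'))


def extension_for(mime, default='.bin'):
    """Map a MIME type to a file extension (small, illustrative table)."""
    return _EXTENSIONS.get(mime, default)

# Hypothetical usage with the src string read from the page:
# mime, raw = decode_data_url(pic)
# with open('image' + extension_for(mime), 'wb') as fh:
#     fh.write(raw)
```

Writing the raw bytes directly also avoids re-encoding the image, which matters if the files are meant as training data.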
Consider the following task:
Open a given URL
Find the first image tag in the URL
Substitute it for an image in your local drive
Save the resulting webpage as a png
I want to automate this task with a Python script, and I am unsure of the best approach.
I have been using Selenium to convert URLs into screenshots, but I am unsure how to add the step that modifies the first image tag to load a local file.
You can use execute_script to replace the image. It should look something like this:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
url = 'https://www.aircanada.com/en/'
browser.get(url)
my_image = browser.find_element(By.XPATH, '//*[@id="pagePromoBanner-wrapper"]/div/a/img')
# or
# my_image = browser.find_element(By.XPATH, 'any XPath')
link_to_new_image = "https://images.pexels.com/photos/67636/rose-blue-flower-rose-blooms-67636.jpeg?auto=compress&cs=tinysrgb&dpr=2&h=750&w=1260"
# if you are using python 3.6 and up:
browser.execute_script(f"arguments[0].src = '{link_to_new_image}'", my_image )
# else:
# browser.execute_script("arguments[0].src = '"+link_to_new_image+"'", my_image )
Hope this helps you!
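For the local-file and save-as-PNG parts of the question, here is one possible sketch. Browsers generally refuse to load file:// images from an https page, so this embeds the local file as a base64 data URL instead. file_to_data_url and replace_first_image_and_capture are names I invented, and taking "first image" to mean the first img tag in document order is my assumption.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path):
    """Encode a local image file as a data: URL usable in an img src."""
    path = Path(path)
    mime, _ = mimetypes.guess_type(path.name)
    if mime is None or not mime.startswith('image/'):
        raise ValueError(f'{path} does not look like an image file')
    encoded = base64.b64encode(path.read_bytes()).decode('ascii')
    return f'data:{mime};base64,{encoded}'


def replace_first_image_and_capture(driver, local_image, out_png):
    """Swap the first <img> on the current page and save a screenshot."""
    # Imported here so file_to_data_url works without Selenium installed.
    from selenium.webdriver.common.by import By

    first_img = driver.find_element(By.TAG_NAME, 'img')
    # Pass the URL as a script argument instead of formatting it into the
    # JavaScript source, so quotes in the value cannot break the script.
    # srcset is removed because it would otherwise take priority over src.
    driver.execute_script(
        "arguments[0].removeAttribute('srcset');"
        "arguments[0].src = arguments[1];",
        first_img, file_to_data_url(local_image))
    driver.save_screenshot(out_png)  # PNG of the visible window

# Hypothetical usage:
# replace_first_image_and_capture(browser, 'my_picture.png', 'page.png')
```

A short wait between setting src and taking the screenshot may be needed so the browser has repainted with the new image.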
I have been able to capture screenshots of some elements as PNGs, for example with the following code:
from selenium import webdriver
from PIL import Image
from io import BytesIO
from os.path import expanduser
from time import sleep
# Define url and driver
url = 'https://www.formula1.com/'
driver = webdriver.Chrome('chromedriver')
# Go to url, scroll down to right point on page and find correct element
driver.get(url)
driver.execute_script('window.scrollTo(0, 4100)')
sleep(4) # Wait a little for page to load
element = driver.find_element_by_class_name('race-list')
location = element.location
size = element.size
png = driver.get_screenshot_as_png()
driver.quit()
# Store image as bytes, crop it and save to desktop
im = Image.open(BytesIO(png))
im = im.crop((200, 150, 700, 725))
path = expanduser('~/Desktop/')
im.save(path + 'F1-info.png')
This produces a screenshot that shows what I want, but not exactly how I want it. I had to hard-code the scroll offset, and because I couldn't locate the element I wanted (class='race step-1 step-2 step-3') I had to crop the image manually too.
Any better solutions?
In case someone is wondering, this is how I managed it in the end. First I found the element and scrolled to the right part of the page like this:
element = browser.find_element_by_css_selector('.race.step-1.step-2.step-3')
browser.execute_script('arguments[0].scrollIntoView()', element)
browser.execute_script('window.scrollBy(0, -80)')
and then cropped the image
im = im.crop((200, 80, 700, 560))
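The hard-coded crop numbers could probably be replaced by the element's own position in the window after scrolling. A minimal sketch follows, assuming the rectangle is read with getBoundingClientRect() (which is relative to the visible window, like the screenshot); rect_to_crop_box, scale and padding are names of my own, not Selenium or PIL API.

```python
import math


def rect_to_crop_box(rect, scale=1.0, padding=0):
    """Turn a viewport rectangle into a PIL crop box.

    rect:    dict with 'left', 'top', 'right', 'bottom' in CSS pixels
    scale:   assumed device pixel ratio of the screenshot
    padding: extra CSS pixels to keep around the element
    """
    left = math.floor((rect['left'] - padding) * scale)
    top = math.floor((rect['top'] - padding) * scale)
    right = math.ceil((rect['right'] + padding) * scale)
    bottom = math.ceil((rect['bottom'] + padding) * scale)
    # Negative coordinates would mean the element starts off-screen.
    return (max(0, left), max(0, top), right, bottom)

# Hypothetical usage after the scrolling shown above:
# rect = browser.execute_script(
#     'const r = arguments[0].getBoundingClientRect();'
#     'return {left: r.left, top: r.top, right: r.right, bottom: r.bottom};',
#     element)
# im = im.crop(rect_to_crop_box(rect))
```

Rounding outward (floor for left/top, ceil for right/bottom) keeps the element's edge pixels inside the crop when the rectangle has fractional coordinates.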