How to find YouTube video duration with Python Selenium? - python

I am trying to get the video duration using Selenium with Python 3. The code works properly with short videos (I've tried up to 30 minutes), but with longer videos nothing is shown. I can't find a solution.
My code:
from selenium import webdriver
import time, os
firefox = webdriver.Chrome()
#youtube_url = "https://www.youtube.com/watch?v=oEx-SBpZP_M" # Short Video
youtube_url = "https://www.youtube.com/watch?v=EMWM2uN8WCQ" # Long Video
firefox.get(youtube_url)
number_of_views = firefox.find_element_by_css_selector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer')
print(number_of_views.text)
duration = firefox.find_element_by_css_selector('#movie_player > div.ytp-chrome-bottom > div.ytp-chrome-controls > div.ytp-left-controls > div > span.ytp-time-duration')
print(duration)
print(duration.text)

The issue with all these solutions is whether the element is visible or not.
For example:
cur_time = driver.find_element_by_class_name("ytp-time-current").text
print(cur_time)
It will only print cur_time if my mouse is hovering over the video and the element is showing. Otherwise, if the video playback time isn't showing, Selenium will not be able to grab the element. Here's a GIF showing this to be the case.
https://i.imgur.com/bmWdC7A.gif
You need to execute JavaScript on the page to get the current time and duration. The YouTube Player API has functions for both.
video_dur = self.driver.execute_script(
"return document.getElementById('movie_player').getCurrentTime()")
video_len = self.driver.execute_script(
"return document.getElementById('movie_player').getDuration()")
video_len = int(video_len) / 60  # getDuration() returns seconds; convert to minutes
print(f"{video_dur}/{video_len}")
https://i.imgur.com/TEdFZ0z.gif
This will continue to work even if I am not on the page.
https://i.imgur.com/qXcbPDG.gif
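Since getDuration() reports the length in seconds, a small helper can turn that number into the familiar clock format. This is only a minimal sketch: format_duration is a hypothetical helper name, not part of Selenium or the YouTube Player API, and the sample value stands in for whatever execute_script returns.

```python
def format_duration(total_seconds):
    """Format a duration in seconds as H:MM:SS (or M:SS when under an hour)."""
    total = int(total_seconds)  # drop fractional seconds
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


# 3725.4 is a made-up value standing in for the result of getDuration()
print(format_duration(3725.4))
```

The value stored in video_len above could be passed to this helper before the division by 60.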

Just use this:
duration = firefox.find_element_by_class_name('ytp-cued-thumbnail-overlay-duration')
print(duration)
print(duration.text)

duration = driver.find_elements_by_xpath("//span[@class='ytp-time-duration']")[0]
print(duration.text)
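If the element's text is a clock string such as 12:34 or 1:02:05 (an assumption about what the player shows), it may be handy to convert it to seconds. A rough sketch follows; duration_text_to_seconds is a hypothetical helper name.

```python
def duration_text_to_seconds(text):
    """Convert 'M:SS' or 'H:MM:SS' text into a number of seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        # Each field to the right is worth 1/60 of the one before it
        seconds = seconds * 60 + int(part)
    return seconds


# Sample strings standing in for duration.text
print(duration_text_to_seconds("12:34"))
print(duration_text_to_seconds("1:02:05"))
```

An empty string (which is what an invisible element tends to give) would raise ValueError here, so the caller may want to check for that first.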

Related

Making an alarm in python with the data I read from the site with Selenium

The code I wrote is very basic. It takes a value from a site using Selenium, writes it to a txt file, then reads the necessary part of the value and should sound an alarm according to this value. The code terminates before reaching the alarm part, or it does not see the alarm part. The problem may be related to the value I read from the txt file, but I could not solve it despite my attempts. How can I solve this?
Note: There is no problem with the vlc library; it works when used separately. In this example the value in the txt file is 12 feb 2022, and the code only reads the first character.
from selenium import webdriver
import time
import vlc
driver = webdriver.Chrome()
driver.get("https://demoqa.com/automation-practice-form")
driver.maximize_window()
print("Site Title:",driver.title)
#####################################################
nameElement =driver.find_element_by_id("dateOfBirthInput")
nameElement.click()
time.sleep(5)
taleptAttribute = nameElement.get_attribute('value')
print(taleptAttribute)
#print("Talep Sayısı: " + nameElement.get_attribute('value'))
################################################################
talep_satırı = open("talep_satiri.txt", "w")
talep_satırı.write(taleptAttribute)
talep_satırı = open("talep_satiri.txt","r")
talepsayisi=talep_satırı.read(1)
print(talepsayisi)
alarm = vlc.MediaPlayer("path")
if (talepsayisi == 1 ):
    alarm.play()
    time.sleep(10)
    alarm.stop()
else:
    alarm.play()
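The note above says only the first character is read back. One possible explanation, sketched here under the assumption that the whole stored value is what should be compared: read(1) returns a one-character string, and a string never equals the integer 1. The helper name save_and_reload and the temporary directory are illustrative only.

```python
import os
import tempfile


def save_and_reload(value, path):
    """Write value to path, then read the whole file back as a string."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(value)  # leaving the with-block closes and flushes the file
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read().strip()


path = os.path.join(tempfile.mkdtemp(), "talep_satiri.txt")
stored = save_and_reload("12 feb 2022", path)
first_char = stored[:1]  # what read(1) would have returned

print(repr(stored))
print(first_char == 1)    # a str is never equal to an int
print(first_char == "1")  # comparing against a str behaves as expected
```

Closing the file after writing also matters: without it, the text may still be sitting in a buffer when the file is reopened for reading.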

Is there a way to get YT url or video ID from playlist with pafy?

I am trying to make a program that takes a YT playlist and plays all its content.
I've installed all the components needed for pafy to run with python3. Everything I've tried works as expected, except the part of the code below.
plurl = "https://www.youtube.com/playlist?list=PL634F2B56B8C346A2"
playlist = pafy.get_playlist(plurl)
url = playlist['items'][21]['pafy'].getbest().url
video = pafy.new(url)
When pafy.new() is called, it gives an error because the URL is too long:
Need 11 character video id or the URL of the video. Got https://r2---sn-bavc5aoxu-nv4l.googlevideo.com/videoplayback?ms=au%2Crdu&sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cexpire&mv=m&mt=1554899146&requiressl=yes&ip=37.157.173.53&pl=19&id=o-AGQZkyoEvykUGae7O4v_Ycmuj4jJBYdgafcfLBQ5S4Dd&mn=sn-bavc5aoxu-nv4l%2Csn-nv47lnsr&mm=31%2C29&source=youtube&lmt=1387649403290510&ei=POGtXJzdIo_ugAeEiL_wAQ&c=WEB&key=yt6&mime=video%2Fmp4&gir=yes&itag=18&clen=5461830&fvip=2&expire=1554920860&ratebypass=yes&dur=206.100&initcwndbps=1573750&ipbits=0&signature=AAA8B36CD3B402F587F874956595ACB928806C4F.D36C0A79E7F1727DB872425E696DBFC550AA7DF6
Is there a way I can get the normal URL or the video ID?
The videoid is also available in the url object. You can use
dir(<object>)
to see what properties are available.
id = playlist['items'][2]['pafy'].videoid
video = pafy.new('https://www.youtube.com/watch?v='+id)
Wrap pafy.new in try/except, as some of the videos might not be available in your region.
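The try/except advice could look roughly like the sketch below. To keep it runnable without pafy or network access, the loader is passed in as a parameter; in real use it would presumably be pafy.new. The names load_available and fake_loader are hypothetical, and no particular pafy exception type is assumed.

```python
def load_available(video_ids, loader):
    """Call loader on each watch URL, skipping videos that raise an error."""
    loaded, skipped = [], []
    for video_id in video_ids:
        url = "https://www.youtube.com/watch?v=" + video_id
        try:
            loaded.append(loader(url))
        except Exception as error:  # exact exception types are not assumed here
            skipped.append((video_id, str(error)))
    return loaded, skipped


def fake_loader(url):
    """Stand-in for pafy.new so the sketch runs offline."""
    if url.endswith("blocked0000"):
        raise OSError("video not available in this region")
    return url


# Both IDs are made up for the demonstration
loaded, skipped = load_available(["abcdefghijk", "blocked0000"], fake_loader)
print(loaded)
print(skipped)
```

Collecting the skipped IDs instead of silently ignoring them makes it easier to see which playlist entries could not be played.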

Selenium chromedriver returns empty data for canvas

I have a selenium set up as:
options = webdriver.ChromeOptions()
options.add_argument('headless')
options.add_argument('window-size=1200x600')
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(10)
driver.get('file:///path_to_file')
When I execute the script:
data = driver.execute_script('return document.getElementsByClassName("runner-canvas")[1].getContext("2d").getImageData(0,0,600,150);')['data']
Data is all zeros: [0,0,0,0, 0,0,0,0 ..., 0,0,0,0].
But when I take a screenshot at the same time, with:
driver.save_screenshot(os.path.join(os.path.dirname(os.path.realpath(__file__)), '.', 'screenshot.png'))
I can see that the canvas is populated.
Canvas loads a game that doesn't start till the SPACE is pressed.
Function that is responsible for collecting the canvas data looks similar to this:
# Somewhere before the __get_data is called
self.document.send_keys(Keys.SPACE) # self.document is set to html document
def __get_data(self):
    while self.driver.execute_script("return Runner.instance_.started") == False:
        print('Waiting to start')
    # data is always empty at this stage
    data = self.driver.execute_script('return document.getElementsByClassName("runner-canvas")[1].getContext("2d").getImageData(0,0,600,150);')['data']
    rgba = np.array(data).reshape((90000, 4))
    b = a[:, 2]
    return a.reshape((150, 600))
When I run it I can see a lot of 'Waiting to start' in the console, so I don't think it is a timing issue: by the time the while loop breaks, everything should be drawn because the game has already started.
I'm on a Mac running ChromeDriver 2.46.628411.
It looks like the image you are trying to capture has not loaded properly, which is why the data is empty. Find out how long the image takes to load, wait that long, and then get the data and the screenshot.
Let us know whether that works.
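Rather than guessing a fixed delay, one option is to poll a condition with a timeout. The sketch below is generic Python, not a Selenium API: wait_until is a hypothetical helper, and the counter-based predicate only stands in for a real readiness check.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate until it returns a truthy value or timeout seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)  # avoid a busy loop between checks


# Stand-in for a check such as "the game has started"
calls = {"count": 0}


def ready_on_third_call():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(ready_on_third_call, timeout=2.0, interval=0.01))
```

With Selenium, the predicate could be something like lambda: driver.execute_script("return Runner.instance_.started"), followed by a second wait on whatever signals that the canvas has actually been drawn.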

Python Selenium: Unable to Find Element After First Refresh

I've seen a few instances of this question, but I was not sure how to apply the changes to my particular situation. I have code that monitors a webpage for changes and refreshes every 30 seconds, as follows:
import sys
import ctypes
from time import sleep
from Checker import Checker
USERNAME = sys.argv[1]
PASSWORD = sys.argv[2]
def main():
    crawler = Checker()
    crawler.login(USERNAME, PASSWORD)
    crawler.click_data()
    crawler.view_page()
    while crawler.check_page():
        crawler.wait_for_table()
        crawler.refresh()
    ctypes.windll.user32.MessageBoxW(0, "A change has been made!", "Attention", 1)
if __name__ == "__main__":
    main()
The problem is that Selenium will always show an error stating it is unable to locate the element after the first refresh has been made. The element in question, I suspect, is a table from which I retrieve data using the following function:
def get_data_cells(self):
    contents = []
    table_id = "table.datadisplaytable:nth-child(4)"
    table = self.driver.find_element(By.CSS_SELECTOR, table_id)
    cells = table.find_elements_by_tag_name('td')
    for cell in cells:
        contents.append(cell.text)
    return contents
I can't tell if the issue is in the above function or in the main(). What's an easy way to get Selenium to refresh the page without returning such an error?
Update:
I've added a wait function and adjusted the main() function accordingly:
def wait_for_table(self):
    table_selector = "table.datadisplaytable:nth-child(4)"
    delay = 60
    try:
        wait = ui.WebDriverWait(self.driver, delay)
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, table_selector)))
    except TimeoutError:
        print("Operation timeout! The requested element never loaded.")
Since the same error is still occurring, either my timing function is not working properly or it is not a timing issue.
I've run into the same issue while doing web scraping before and found that re-sending the GET request (instead of refreshing) seemed to eliminate it.
It's not very elegant, but it worked for me.
I appear to have fixed my own problem.
My refresh() function was written as follows:
def refresh(self):
    self.driver.refresh()
All I did was switch frames right after the refresh() call. That is:
def refresh(self):
    self.driver.refresh()
    self.driver.switch_to.frame("content")
This took care of it. I can see that the page is now refreshing without issues.
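A likely reason this works is that a refresh leaves the driver focused on the top-level document, so lookups for elements inside the frame fail until the driver switches back in. The sketch below exercises that sequence with a mock object in place of a real WebDriver, so it runs without a browser; refresh_and_enter_frame is a hypothetical helper name.

```python
from unittest.mock import MagicMock


def refresh_and_enter_frame(driver, frame_name):
    """Refresh the page, then switch back into the named frame."""
    driver.refresh()
    driver.switch_to.frame(frame_name)


# MagicMock stands in for a real WebDriver instance
driver = MagicMock()
refresh_and_enter_frame(driver, "content")
print(driver.method_calls)
```

With a real driver, the table lookup in get_data_cells would run after this call, once the driver is back inside the "content" frame.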

How to extract real time from time.gov in Python?

I want to show the real time in my program from time.gov. I saw the ntplib module and this example:
import ntplib
from time import ctime
c = ntplib.NTPClient()
response = c.request('europe.pool.ntp.org', version=3)
ctime(response.tx_time)
but I can't use time.gov instead of 'europe.pool.ntp.org' because time.gov is not an NTP server. I also saw some JavaScript code in the page source. Is there a way to extract the real time from time.gov in Python, with or without ntplib?
Assuming the goal is just to get official US government time, you could stick with using NTP, and refer to time.nist.gov, instead of time.gov. They're both run by NIST.
Use urllib to retrieve
http://time.gov/actualtime.cgi
which returns something like this:
<timestamp time="1433396367767836" delay="0"/>
The time value looks like microseconds since the Unix epoch:
>>> time.ctime(1433396367.767836)
'Thu Jun 4 15:39:27 2015'
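Parsing that response could be sketched as follows, assuming the time attribute really is microseconds since the Unix epoch, as the ctime check suggests. The sketch works on the sample string from this answer instead of fetching the URL, and parse_timestamp is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_timestamp(xml_text):
    """Convert a <timestamp time="..."/> element into an aware UTC datetime."""
    element = ET.fromstring(xml_text)
    microseconds = int(element.attrib["time"])
    # Integer microseconds avoid the rounding a float division could introduce
    return EPOCH + timedelta(microseconds=microseconds)


sample = '<timestamp time="1433396367767836" delay="0"/>'
print(parse_timestamp(sample).isoformat())
```

The result is in UTC, whereas time.ctime prints local time, so the two can differ by the machine's UTC offset.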
In our institute's system, the NTP time server was blocked by the firewall. An alternative in this case is to scrape the time from the website.
You will need ChromeDriver to run this, which can be downloaded from https://chromedriver.chromium.org/.
Here is the working code:
from selenium import webdriver
import time
from datetime import datetime
# Chromedriver can be downloaded from https://chromedriver.chromium.org/
driver_path = r'pathtochromedriver\chromedriver.exe'
driver = webdriver.Chrome(driver_path)
# Reference website for datetime
url = 'https://www.time.gov/'
driver.get(url)
# Wait to respond
time.sleep(4)
# Correct time
timedata = driver.find_element_by_xpath('//*[@id="timeUTC"]')
# Correct date
datedata = driver.find_element_by_xpath('//*[@id="myDate"]')
result_time = timedata.text
result_date = datedata.text
driver.close() # close the webpage
result_datetime = result_date[7:]+result_time
datetime_now = datetime.strptime(result_datetime, '%m/%d/%Y%H:%M:%S')
print(datetime_now)
