I am trying to write code that downloads an entire playlist from YouTube. It works for some playlists but not for others. One of the failing playlists is shown in my code below. Feel free to suggest more features for this code.
If code to download a playlist already exists, please share the link with me.
`
from bs4 import BeautifulSoup
from pytube import YouTube
import urllib.request
import time
import os

## list of links parsed by bs4
s = []

## playlist folder name and download path, respectively
directory = 'Hacker101'
savePath = "G:/Download/video/"
path = os.path.join(savePath, directory)

## link parser
past_link_here = "https://www.youtube.com/playlist?list=PLxhvVyxYRviZd1oEA9nmnilY3PhVrt4nj"
html_page = urllib.request.urlopen(past_link_here)
x = html_page.read()
soup = BeautifulSoup(x, 'html.parser')
for link in soup.findAll('a'):
    k = link.get('href')
    # href can be None for anchors without a link
    if k and 'watch' in k:
        s.append(k)

## to create playlist folder
def create_project_dir(x):
    if not os.path.exists(x):
        print('Creating directory ' + x)
        os.makedirs(x)

create_project_dir(path)

## downloading videos by using links from list s
for x in set(s):
    link = "https://www.youtube.com" + x
    yt = YouTube(link)
    k = yt.title
    file_path = os.path.join(path, k + '.mp4')
    try:
        if os.path.exists(file_path):
            print(k + ' is \n' + "already downloaded")
        else:
            l = yt.streams.first()
            print(k + ' is downloading....')
            l.download(path)
            time.sleep(1)
            print('downloading complete')
    ## except Exception:
    ##     print('error')
    except KeyError as e:
        print('KeyError: %s' % e)
`
Your issue appears to be related to a bug that was fixed today by giacaglia. Based on the GitHub commit, you can fix the bug by modifying your mixins.py as detailed in the link. Your playlists should then work without running into the KeyError: 'url_encoded_fmt_stream_map' issue you had above.
I asked this question before the new version of pytube was released. The problem is solved in pytube3; you just need to install it with pip, i.e. pip install pytube3
Kindly check out the pytube3 docs. I've used the methods in them and they have actually worked.
The first step is to upgrade the pytube library:
pip3 install pytube3 --upgrade
Then load your code.
For a normal YouTube video download:
from pytube import YouTube
url = input("Paste the URL here -->>")
yt = YouTube(url)
yt.streams[0].download()
For the whole playlist:
from pytube import Playlist
url = input("Paste the URL here -->>")
playlist = Playlist(url)
# playlist.videos yields a YouTube object for each entry
for my_video in playlist.videos:
    my_video.streams.get_highest_resolution().download()
Have fun, make something cool!
Related
I am new to web scraping and I am using python to build a Google Images Web Scraper. This is a snippet of my code.
import requests
import os
import bs4 as bs

query = 'kittens'
url = 'https://www.google.co.in/search?q='+query+'&source=lnms&tbm=isch'
res = requests.get(url)
res.raise_for_status()
# parse the response; the snippet as posted used soup without defining it
soup = bs.BeautifulSoup(res.text, 'html.parser')
os.makedirs('new1')
imgElem = soup.select('div img')
print(len(imgElem))
for i in range(1,len(imgElem)):
    if imgElem == []: #if not found print error
        print('could not find any image')
    else:
        try:
            imgUrl = imgElem[i].get('src')
            print(imgElem[i].get('src'))
            print('Downloading image %s.....' %(imgUrl))
            res = requests.get(imgUrl)
            res.raise_for_status()
        #except requests.exceptions.MissingSchema:
        except Exception as e:
            #skip if not a normal image file
            print(e)
        num = str(i) + ".jpg"
        imageFile = open(os.path.join('.\\new1', num),'wb')
        #write downloaded image to hard disk
        for chunk in res.iter_content(10000):
            imageFile.write(chunk)
        imageFile.close()
len(imgElem) returns 21 for me.
I can currently only download 20 images.
Why do I get only 20 images and what would be a good way to overcome this?
You are having this issue because not all the src attribute values in imgElem are valid URLs.
Try this:
for el in imgElem:
    print(el['src'])
You will see that the first output line is
/images/branding/searchlogo/1x/googlelogo_desk_heirloom_color_150x55dp.gif
while all the others are valid URLs. So the statement:
res = requests.get(imgUrl)
fails in that case; hence only 20 images are downloaded.
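One way to avoid the failing request is to resolve each src against the page URL and keep only http(s) results before downloading. The following is a minimal sketch, not part of the original answer; the helper name absolute_image_urls and the sample values are hypothetical.

```python
from urllib.parse import urljoin, urlsplit

def absolute_image_urls(page_url, srcs):
    """Resolve src values against page_url and keep only http(s) URLs."""
    resolved = []
    for src in srcs:
        if not src:
            continue  # skip missing or empty src attributes
        full = urljoin(page_url, src)  # relative paths become absolute
        if urlsplit(full).scheme in ("http", "https"):
            resolved.append(full)
    return resolved

# Hypothetical sample values standing in for el['src'] of each <img>.
urls = absolute_image_urls(
    "https://www.google.co.in/search?q=kittens",
    ["/images/branding/logo.gif", "https://example.com/cat.jpg", None],
)
print(urls)
```

With the relative logo path resolved this way, requests.get receives a full URL instead of raising a MissingSchema error.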
I'm using the following Python code to download images from a certain website. It's part of a code that I'm using to make a web scraper.
for url in links:
    # Invoke wget download method to download specified url image.
    local_image_filename = wget.download(url)
    # Print out local image file name.
    print(local_image_filename)
    continue
It's working well, but I want to know if it's possible to add a string as a prefix to each file.
My idea is to get the page title via XPath and add it as a prefix to each file.
I don't know where to add a string in this code. Can someone help me?
For example, I'm downloading these files:
logo.jpg, plans.jpg, circle.jpg
And I need to add a prefix, like these:
Beautiful_Plan_logo.jpg, Beautiful_Plan_plans.jpg, Beautiful_Plan_circle.jpg
The entire code follows:
import requests
import bs4 as bs
import urllib.request
import wget

##################################################
#               getting url images               #
##################################################
url = "https://tyreehouseplans.com/shop/house-plans/blackberry-blossom/"
opener = urllib.request.build_opener()
# the attribute is addheaders and it takes a list of (name, value) tuples
opener.addheaders = [('User-Agent', 'Mozilla')]
urllib.request.install_opener(opener)
raw = requests.get(url).text
soup = bs.BeautifulSoup(raw, 'html.parser')
imgs = soup.find_all('img')
links = []
for img in imgs:
    link = img.get('src')
    links.append(link)
print(links)

################################################
#              downloading images              #
################################################
for url in links:
    # Invoke wget download method to download specified url image.
    local_image_filename = wget.download(url)
    # Print out local image file name.
    print(local_image_filename)
    continue
Thank you for any help!
The Python wget module has an out option, which sets the name of the output file. For example, the following script downloads 3 images, adding the prefix Beautiful_Plan_ to each.
import wget
base_url = 'https://homepages.cae.wisc.edu/~ece533/images/'
image_names = ['airplane.png', 'arctichare.png', 'baboon.png']
prefix = 'Beautiful_Plan_'
for image_name in image_names:
    wget.download(base_url + image_name, out = prefix + image_name)
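When only full URLs are available, as in the links list from the question, the output name has to be derived from each URL first. Below is a minimal sketch under that assumption; prefixed_name is a hypothetical helper, and the example URL is made up.

```python
import posixpath
from urllib.parse import urlsplit

def prefixed_name(url, prefix):
    """Return prefix + the last path component of url (query string ignored)."""
    filename = posixpath.basename(urlsplit(url).path)
    return prefix + filename

name = prefixed_name("https://example.com/img/logo.jpg?v=2", "Beautiful_Plan_")
print(name)
# Intended use with the wget module (not executed here):
#   wget.download(url, out=prefixed_name(url, prefix))
```

The prefix itself could come from the page title once it has been cleaned of characters that are not valid in file names.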
You can use shutil for this:
import shutil
prefix = "prefix_"
# your piece of code
for url in links:
    # Invoke wget download method to download specified url image.
    local_image_filename = wget.download(url)
    # Print out local image file name.
    print(local_image_filename)
    # Copy the download to a new file with the prefix added.
    shutil.copy(local_image_filename, prefix+local_image_filename)
Alternatively, use os.rename, as described in its documentation.
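A rename avoids leaving two copies of each file behind. The sketch below shows the idea on a throwaway file in a temporary directory; add_prefix is a hypothetical helper name, and in the question's script it would be called on the file name returned by wget.download.

```python
import os
import tempfile

def add_prefix(path, prefix):
    """Rename the file at path so its name starts with prefix; return the new path."""
    folder, name = os.path.split(path)
    new_path = os.path.join(folder, prefix + name)
    os.rename(path, new_path)
    return new_path

# Demonstration on a dummy file so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "logo.jpg")
    with open(original, "wb") as f:
        f.write(b"fake image bytes")
    renamed = add_prefix(original, "Beautiful_Plan_")
    renamed_name = os.path.basename(renamed)
    renamed_exists = os.path.exists(renamed)
    original_exists = os.path.exists(original)
print(renamed_name)
```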
I wrote code that makes a separate file with the extra information up front, followed by a separator.
import requests
import bs4 as bs
import urllib.request
import wget

##################################################
#               getting url images               #
##################################################
url = "https://tyreehouseplans.com/shop/house-plans/blackberry-blossom/"
opener = urllib.request.build_opener()
# the attribute is addheaders and it takes a list of (name, value) tuples
opener.addheaders = [('User-Agent', 'Mozilla')]
urllib.request.install_opener(opener)
raw = requests.get(url).text
soup = bs.BeautifulSoup(raw, 'html.parser')
imgs = soup.find_all('img')
links = []
for img in imgs:
    link = img.get('src')
    links.append(link)
# print(links)

################################################
#              downloading images              #
################################################
for url in links:
    # Invoke wget download method to download specified url image.
    try:
        local_image_filename = wget.download(url)
    except ValueError:
        break
    # Print out local image file name.
    print(local_image_filename)
    # The with block closes the file, so no explicit close() is needed.
    with open(local_image_filename, 'r') as myFile:
        try:
            data = myFile.read()
        except UnicodeDecodeError:
            data = "UNICODE DECODE ERROR"
        except ValueError:
            data = "VALUE ERROR"
        print(data)
        print(type(data))
    newSaveString = str(local_image_filename) + "SeperatorOfSomeKind" + str(data)
    newFileName = "NEW_" + local_image_filename
    with open(newFileName, 'w') as myFile:
        myFile.write(newSaveString)
    continue
I am looking to download a YouTube playlist using the PyTube library. Currently, I am able to download a single video at a time. I cannot download more than one video at once.
Currently, my implementation is
import pytube
link = input('Please enter a url link\n')
yt = pytube.YouTube(link)
stream = yt.streams.first()
finished = stream.download()
print('Download is complete')
This results in the following output
>> Download is complete
And the YouTube file is downloaded. When I try this with a playlist link (An example) only the first video is downloaded. There is no error outputted.
I would like to be able to download an entire playlist without re-prompting the user.
You can import Playlist to achieve this. There is no reference to Playlist in the redoc, though there is a section in the GitHub repo found here. The source of the script is in the repo here.
from pytube import Playlist
playlist = Playlist('https://www.youtube.com/watch?v=58PpYacL-VQ&list=UUd6MoB9NC6uYN2grvUNT-Zg')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
playlist.download_all()
NOTE: I've found the supporting method Playlist.video_urls does not work. The videos are still downloaded however, as evidenced here
The solutions above no longer work. Here is code that downloads the sound stream of the videos referenced in a YouTube playlist. Pytube3 is used, not pytube. Note that the playlist must be public for the download to succeed. Also, if you want to download the full video instead of only the soundtrack, you have to modify the value of the stream itag constant (YOUTUBE_STREAM_AUDIO). The fix for the empty Playlist.videos list was taken from this Stack Overflow post: PyTube3 Playlist returns empty list
import re
from pytube import Playlist
YOUTUBE_STREAM_AUDIO = '140' # modify the value to download a different stream
DOWNLOAD_DIR = 'D:\\Users\\Jean-Pierre\\Downloads'
playlist = Playlist('https://www.youtube.com/playlist?list=PLzwWSJNcZTMSW-v1x6MhHFKkwrGaEgQ-L')
# this fixes the empty playlist.videos list
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print(len(playlist.video_urls))
for url in playlist.video_urls:
    print(url)
# physically downloading the audio track
for video in playlist.videos:
    audioStream = video.streams.get_by_itag(YOUTUBE_STREAM_AUDIO)
    audioStream.download(output_path=DOWNLOAD_DIR)
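To see what the overridden regex actually captures, it can be run on its own against a small piece of text. This is only an illustrative sketch: the sample string below is a hypothetical fragment shaped like the JSON embedded in a playlist page, not real page content.

```python
import re

# Same pattern as the _video_regex override above.
video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")

# Hypothetical fragment; the \\u0026 sequences are literal text in the page.
sample = (
    '{"url":"/watch?v=HjuHHI60s44\\u0026list=PLx","x":1},'
    '{"url":"/watch?v=-DHCm9AlXvo\\u0026list=PLx"}'
)

# Each match is a relative watch path; the capture stops at the first
# character that is not a letter, digit, underscore or hyphen.
paths = video_regex.findall(sample)
watch_urls = ["https://www.youtube.com" + p for p in paths]
print(watch_urls)
```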
If you need to download the highest-quality progressive video of each item in a playlist:
import os
from pytube import Playlist
cur_dir = os.getcwd()  # assumption: save into the current working directory
playlist = Playlist('https://www.youtube.com/watch?v=VZclsCzhzt4&list=PLk-w4cD8sJ6N6ffzp5A4PQaD76RvdpHLP')
for video in playlist.videos:
    print('downloading : {} with url : {}'.format(video.title, video.watch_url))
    video.streams.\
        filter(type='video', progressive=True, file_extension='mp4').\
        order_by('resolution').\
        desc().\
        first().\
        download(cur_dir)
This code lets you download a playlist to a folder of your choice:
import re
from pytube import Playlist
playlist = Playlist('https://www.youtube.com/playlist?list=Pd5k1hvD2apA0DwI3XMiSDqp')
DOWNLOAD_DIR = r'D:\Video'  # raw string so the backslash is not treated as an escape
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print(len(playlist.video_urls))
for url in playlist.video_urls:
    print(url)
for video in playlist.videos:
    print('downloading : {} with url : {}'.format(video.title, video.watch_url))
    video.streams.\
        filter(type='video', progressive=True, file_extension='mp4').\
        order_by('resolution').\
        desc().\
        first().\
        download(DOWNLOAD_DIR)
from pytube import Playlist
playlist = Playlist('https://www.youtube.com/playlist?list=PL6gx4Cwl9DGCkg2uj3PxUWhMDuTw3VKjM')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
for video_url in playlist.video_urls:
    print(video_url)
playlist.download_all()
https://www.youtube.com/watch?v=HjuHHI60s44
https://www.youtube.com/watch?v=Z40N7b9NHTE
https://www.youtube.com/watch?v=FvziRqkLrEU
https://www.youtube.com/watch?v=XN2-87haa8k
https://www.youtube.com/watch?v=VgI4UKyL0Lc
https://www.youtube.com/watch?v=BvPIgm2SMG8
https://www.youtube.com/watch?v=DpdmUmglPBA
https://www.youtube.com/watch?v=BmVmJi5dR9c
https://www.youtube.com/watch?v=pYNuKXjcriM
https://www.youtube.com/watch?v=EWONqLqSxYc
https://www.youtube.com/watch?v=EKmLXiA4zaQ
https://www.youtube.com/watch?v=-DHCm9AlXvo
https://www.youtube.com/watch?v=7cRaGaIZQlo
https://www.youtube.com/watch?v=ZkcEB96iMFk
https://www.youtube.com/watch?v=5Fcf-8LPvws
https://www.youtube.com/watch?v=xWLgdSgsBFo
https://www.youtube.com/watch?v=QcKYFEgfV-I
https://www.youtube.com/watch?v=BtSQIxDPnLc
https://www.youtube.com/watch?v=O5kh_-6e4kk
https://www.youtube.com/watch?v=RuWVDz-48-o
https://www.youtube.com/watch?v=-yjc5Y7Wbmw
https://www.youtube.com/watch?v=C5T59WsrNCU
https://www.youtube.com/watch?v=MWldNGdX9zE
I'm using pytube3 9.6.4, not pytube.
Playlist.video_urls now works well,
and the Playlist.populate_video_urls() function has been deprecated.
from pytube import YouTube
from pytube import Playlist
SAVE_PATH = "E:/YouTube" #to_do
# link of the playlist to be downloaded
links = "https://youtube.com/playlist?list=PLblh5JKOoLUL3IJ4-yor0HzkqDQ3JmJkc"
playlist = Playlist(links)
PlayListLinks = playlist.video_urls
N = len(PlayListLinks)
#print('Number of videos in playlist: %s' % len(PlayListLinks))
print(f"This link was found to be a playlist link with {N} videos")
print(f"\n Let's download all {N} videos")
for i,link in enumerate(PlayListLinks):
    yt = YouTube(link)
    d_video = yt.streams.filter(progressive=True, file_extension='mp4').order_by('resolution').desc().first()
    d_video.download(SAVE_PATH)
    print(i+1, ' Video is Downloaded.')
You can check this repository to download playlists and individual videos and keep them in different directories.
https://github.com/pushpendra050/Pytube-for-Playlist-download
This works for me on Windows 11 or 10.5.
Original code: Jean-Pierre Schnyder
import re
from pytube import Playlist
DOWNLOAD_DIR = input("Download dir")
# wrap the pasted link in a Playlist object; a plain string has no video_urls
playlist = Playlist(input("Link:"))
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print(len(playlist.video_urls))
for url in playlist.video_urls:
    print(url)
for video in playlist.videos:
    stream = video.streams.get_highest_resolution()
    stream.download(output_path=DOWNLOAD_DIR)
Though this seems like a solved problem, here is one of my solutions, which may help you download the playlist specifically as 1080p 30 FPS video, or 720p 30 FPS if 1080p is not available.
from pytube import Playlist
playlist = Playlist('https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
# Loop through all videos in the playlist and download them
for video in playlist.videos:
    try:
        print(video.streams.filter(file_extension='mp4'))
        stream = video.streams.get_by_itag(137) # 137 = 1080P30
        stream.download()
    except AttributeError:
        stream = video.streams.get_by_itag(22) # 22, 136 = 720P30; if 22 still doesn't work, try 136
        stream.download()
    except:
        print("Something went wrong.")
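The try/except chain above can also be written as an ordered list of itags to try. The sketch below is one possible shape, assuming (as the AttributeError handling above implies) that get_by_itag returns None when a stream is missing; pick_stream is a hypothetical helper, and a plain dict stands in for video.streams so the example runs without network access.

```python
def pick_stream(get_by_itag, preferred_itags):
    """Return the first non-None stream for the itags, tried in order."""
    for itag in preferred_itags:
        stream = get_by_itag(itag)
        if stream is not None:
            return stream
    return None  # nothing in the preference list is available

# Stand-in for video.streams.get_by_itag: dict.get returns None for
# missing keys, mimicking a video that has no 1080p stream.
fake_streams = {22: "720p stream", 136: "720p video-only stream"}
chosen = pick_stream(fake_streams.get, [137, 22, 136])
print(chosen)
```

With pytube the call would look like pick_stream(video.streams.get_by_itag, [137, 22, 136]), followed by a None check before calling download().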
This works for me. It just takes the URLs in the playlist and downloads them one by one:
from pytube import Playlist, YouTube
playlist = Playlist('URL')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
urls = []
for video_url in playlist.video_urls:
    print(video_url)
    urls.append(video_url)
for url in urls:
    my_video = YouTube(url)
    print("*****************DOWNLOAD VID*************")
    print(my_video.title)
    my_video = my_video.streams.get_highest_resolution()
    path = "PATH"
    my_video.download(path)
    print("VIDEO DOWNLOAD DONNNNE")
import pytube
from pytube import Playlist
playlist = Playlist('playlist link')
num = 0
for v in playlist.videos:
    print(v.watch_url)
    one = pytube.YouTube(v.watch_url)
    one_v = one.streams.get_highest_resolution()
    # prefix each file with its position in the playlist
    one_v.download(filename_prefix=f"{num}_")
    num = num + 1
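If the numbered files should also sort in playlist order in a file manager, the counter can be zero-padded. This is a small sketch of that idea only; numbered_name is a hypothetical helper, not something pytube provides.

```python
def numbered_name(index, filename, total):
    """Prefix filename with index, zero-padded to the width of total."""
    width = len(str(total))
    return f"{index:0{width}d}_{filename}"

# With 23 videos in the playlist, two digits are enough.
example = numbered_name(3, "Intro.mp4", 23)
print(example)
```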
Download the whole playlist
This works flawlessly for downloading a complete playlist.
Check the pytube GitHub page for installation instructions:
pip install pytube
Now go to this folder and open cipher.py:
D:\ProgramData\Anaconda3\Lib\site-packages\pytube\
Replace the function_patterns list at line 273 with:
function_patterns = [
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&\s*.*\|\|\s*(.*)\(',
    r'\([a-z]\s*=\s*([a-zA-Z0-9$]{3})(\[\d+\])?\([a-z]\)',
]
main.py
import re
from pytube import Playlist
playlist = Playlist('https://www.youtube.com/playlist?list=PLwdnzlV3ogoXUifhvYB65lLJCZ74o_fAk')
playlist._video_regex = re.compile(r"\"url\":\"(/watch\?v=[\w-]*)")
print(len(playlist.video_urls))
for url in playlist.video_urls:
    print(url)
for video in playlist.videos:
    video.streams.get_highest_resolution().download()
This may not work; the code below works for every case:
from pytube import Playlist
from pytube import YouTube
playlist = Playlist('https://www.youtube.com/watch?v=UPFKAG9rYOE&list=PLknwEmKsW8OtK_n48UOuYGxJPbSFrICxm')
print('Number of videos in playlist: %s' % len(playlist.video_urls))
for video_url in playlist.video_urls:
    print(video_url)
    video=YouTube(video_url)
    try:
        #video.streams.first().download()
        video.streams.filter(res="720p").first().download()
    except:
        # skip videos that have no 720p stream or fail to download
        continue
I'm trying to use the pytube library to download a bunch of links I have in a .csv file.
EDIT:
WORKING CODE:
import sys
reload(sys)
sys.setdefaultencoding('Cp1252')
import os.path
from pytube import YouTube
from pprint import pprint
import csv
with open('onedialectic.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        try:
            yt = YouTube(row[1])
            path = os.path.join('/videos/',row[0])
            path2 = os.path.join(path + '.mp4')
            print(path2)
            if not os.path.exists(path2) :
                print(row[0] + '\n')
                pprint(yt.get_videos())
                yt.set_filename(row[0])
                video = yt.get('mp4', '360p')
                video.download('/videos')
        except Exception as e:
            print("Passing on exception %s" % e)
            continue
To install it you need to use
pip install pytube
and then in your code run
from pytube import YouTube
I haven't seen any code examples of using this with csv though, are you sure it's supported?
You can download via command line directly using e.g.
$ pytube -e mp4 -r 720p -f Dancing Scene from Pulp Fiction http://www.youtube.com/watch?v=Ik-RsDGPI5Y
-e, -f and -r are optional, (extension, filename and resolution)
However, for you I would suggest that the best approach may be to put them all in a playlist and then use Jordan Mear's excellent Python Youtube Playlist Downloader
On a footnote, usually all [external] libraries need to be imported. You can read more about importing here, in the python online tutorials
You could maybe do something like this:
import csv
from pytube import YouTube
vidcsvreader = csv.reader(open("videos.csv"), delimiter=",")
header1 = next(vidcsvreader) #header
for id, url in vidcsvreader:
    yt = YouTube(url) #build a YouTube object from the url
    #set resolution and filetype
    video = yt.get('mp4', '720p')
    # set a destination directory for download
    video.download('/tmp/')
    # stops after the first row; remove the break to download every row
    break
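The CSV-reading half of this can be tried separately from the download step. Below is a minimal Python 3 sketch, assuming a two-column file with a header row (name, url); read_video_rows is a hypothetical helper, and an in-memory string stands in for videos.csv.

```python
import csv
import io

def read_video_rows(csv_file):
    """Yield (name, url) pairs from a CSV file object, skipping the header row."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip header; no error on an empty file
    for row in reader:
        # ignore rows that lack a second column or have an empty URL
        if len(row) >= 2 and row[1].strip():
            yield row[0].strip(), row[1].strip()

# Hypothetical in-memory CSV standing in for videos.csv.
sample = io.StringIO(
    "name,url\n"
    "Dance,http://www.youtube.com/watch?v=Ik-RsDGPI5Y\n"
    "broken-row\n"
)
rows = list(read_video_rows(sample))
print(rows)
```

Each (name, url) pair can then be handed to whichever pytube download call matches the installed version.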
I want to save all images from a site. wget is horrible, at least for http://www.leveldesigninspirationmachine.tumblr.com, since in the image folder it just drops HTML files with no extension.
I found a Python script; the usage is like this:
[python] ImageDownloader.py URL MaxRecursionDepth DownloadLocationPath MinImageFileSize
Finally I got the script running after some BeautifulSoup problems.
However, I can't find the files anywhere. I also tried "/" as the output dir, hoping the images would land in the root of my HD, but no luck. Can someone either help me simplify the script so it outputs to the directory I cd'd into in the terminal, or give me a command that should work? I have zero Python experience and I don't really want to learn Python for a 2-year-old script that maybe doesn't even work the way I want.
Also, how can I pass an array of websites? A lot of scrapers give me only the first few results of the page. Tumblr loads more on scroll, but that has no effect, so I would like to add /page1 etc.
Thanks in advance.
# imageDownloader.py
# Finds and downloads all images from any given URL recursively.
# FB - 201009094
import urllib2
from os.path import basename
import urlparse
#from BeautifulSoup import BeautifulSoup # for HTML parsing
import bs4
from bs4 import BeautifulSoup

global urlList
urlList = []

# recursively download images starting from the root URL
def downloadImages(url, level, minFileSize): # the root URL is level 0
    # do not go to other websites
    global website
    netloc = urlparse.urlsplit(url).netloc.split('.')
    if netloc[-2] + netloc[-1] != website:
        return

    global urlList
    if url in urlList: # prevent using the same URL again
        return

    try:
        urlContent = urllib2.urlopen(url).read()
        urlList.append(url)
        print url
    except:
        return

    soup = BeautifulSoup(''.join(urlContent))
    # find and download all images
    imgTags = soup.findAll('img')
    for imgTag in imgTags:
        imgUrl = imgTag['src']
        # download only the proper image files
        if imgUrl.lower().endswith('.jpeg') or \
            imgUrl.lower().endswith('.jpg') or \
            imgUrl.lower().endswith('.gif') or \
            imgUrl.lower().endswith('.png') or \
            imgUrl.lower().endswith('.bmp'):
            try:
                imgData = urllib2.urlopen(imgUrl).read()
                if len(imgData) >= minFileSize:
                    print " " + imgUrl
                    fileName = basename(urlsplit(imgUrl)[2])
                    output = open(fileName,'wb')
                    output.write(imgData)
                    output.close()
            except:
                pass
    print
    print

    # if there are links on the webpage then recursively repeat
    if level > 0:
        linkTags = soup.findAll('a')
        if len(linkTags) > 0:
            for linkTag in linkTags:
                try:
                    linkUrl = linkTag['href']
                    downloadImages(linkUrl, level - 1, minFileSize)
                except:
                    pass

# main
rootUrl = 'http://www.leveldesigninspirationmachine.tumblr.com'
netloc = urlparse.urlsplit(rootUrl).netloc.split('.')
global website
website = netloc[-2] + netloc[-1]
downloadImages(rootUrl, 1, 50000)
As Frxstream has commented, this program creates the files in the current directory (i.e. where you run it). After running the program, run ls -l (or dir) to find the files it has created.
If it seemingly hasn't created any files, then most probably it really hasn't created any files, most probably because there was an exception which your except: pass has hidden. To see what was going wrong, replace try: ... except: pass with just ..., and rerun the program. (If you can't understand and fix that, ask a separate StackOverflow question.)
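If removing the try/except entirely is too disruptive, a middle ground is to keep catching the exception but report it instead of discarding it. The following Python 3 sketch illustrates this with hypothetical names (run_and_report, broken_save); the deliberately broken function calls a name that was never imported, so the hidden error becomes visible.

```python
import traceback

def run_and_report(action, *args):
    """Call action(*args); on failure print the traceback and return the error."""
    try:
        action(*args)
        return None
    except Exception as exc:  # still broad, but no longer silent
        traceback.print_exc()
        return exc

def broken_save(url):
    # Deliberate bug for the demo: basename was never imported here.
    return basename(url)  # noqa: F821

error = run_and_report(broken_save, "http://example.com/a.png")
print(type(error).__name__)
```

With a bare except: pass around the same call, the loop would simply produce no files and no message.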
It's hard to tell without looking at the errors (+1 to turning off your try/except block so you can see the exceptions), but I do see one typo here:
fileName = basename(urlsplit(imgUrl)[2])
You didn't do "from urlparse import urlsplit"; you have "import urlparse", so you need to refer to it as urlparse.urlsplit(), as you do in other places. It should look like this:
fileName = basename(urlparse.urlsplit(imgUrl)[2])