I'm trying to use the pytube library to download a batch of links listed in a .csv file.
EDIT:
WORKING CODE:
import sys
# Python 2 only: reload(sys) and sys.setdefaultencoding do not exist in Python 3
reload(sys)
sys.setdefaultencoding('Cp1252')
import os.path
from pytube import YouTube
from pprint import pprint
import csv
with open('onedialectic.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        try:
            yt = YouTube(row[1])
            path = os.path.join('/videos/', row[0])
            path2 = path + '.mp4'
            print(path2)
            # only download videos that are not already on disk
            if not os.path.exists(path2):
                print(row[0] + '\n')
                pprint(yt.get_videos())
                yt.set_filename(row[0])
                video = yt.get('mp4', '360p')
                video.download('/videos')
        except Exception as e:
            # % formats the message; a comma would print a tuple instead
            print("Passing on exception %s" % e)
            continue
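The same skip-if-exists loop can be sketched for Python 3 without the sys.setdefaultencoding hack. This is a minimal sketch, assuming each CSV row is name,url as in the code above; read_jobs and download_all are hypothetical helper names, and the actual download step is passed in as a function so the CSV handling can be tried without network access.

```python
import csv
import os
from typing import Callable, List, Tuple


def read_jobs(csv_path: str, out_dir: str) -> List[Tuple[str, str]]:
    """Return (url, target_path) pairs from rows of the form name,url."""
    jobs = []
    # newline="" is what the csv module expects; utf-8 replaces the Cp1252 hack
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            name, url = row[0].strip(), row[1].strip()
            jobs.append((url, os.path.join(out_dir, name + ".mp4")))
    return jobs


def download_all(jobs: List[Tuple[str, str]],
                 download: Callable[[str, str], None]) -> List[str]:
    """Call download(url, target) for every job whose file is missing.

    Returns the URLs that raised, instead of silently dropping them.
    """
    failed = []
    for url, target in jobs:
        if os.path.exists(target):
            continue  # already downloaded on a previous run
        try:
            download(url, target)
        except Exception as exc:
            print("Passing on exception %s" % exc)
            failed.append(url)
    return failed
```

With a recent pytube release, download could be something like lambda url, target: YouTube(url).streams.get_highest_resolution().download(output_path=os.path.dirname(target), filename=os.path.basename(target)); check that signature against the pytube version you have installed, since its API has changed several times.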
To install it you need to use
pip install pytube
and then in your code run
from pytube import YouTube
I haven't seen any code examples of using it with a CSV file, though. Are you sure that's supported?
You can download via command line directly using e.g.
$ pytube -e mp4 -r 720p -f "Dancing Scene from Pulp Fiction" "http://www.youtube.com/watch?v=Ik-RsDGPI5Y"
-e, -f and -r are optional (extension, filename and resolution, respectively).
However, for your case the best approach may be to put all the videos in a playlist and then use Jordan Mear's excellent Python YouTube Playlist Downloader.
As a footnote, external libraries generally need to be imported before you can use them. You can read more about importing in the Python online tutorial.
You could do something like this:
import csv
from pytube import YouTube
with open("videos.csv") as f:
    vidcsvreader = csv.reader(f, delimiter=",")
    header1 = next(vidcsvreader)  # skip the header row
    for video_id, url in vidcsvreader:
        yt = YouTube(url)  # build a YouTube object from the URL string
        # set resolution and filetype
        video = yt.get('mp4', '720p')
        # set a destination directory for download
        video.download('/tmp/')
        break  # stops after the first row; remove it to download every video
import PyPDF2
from PyDF2 import PdfFileReader, PdfFileWriter
file_path="sample.pdf"
pdf = PdfFileReader(file_path)
with open("sample.pdf", "w") as f:'
for page_num in range(pdf.numPages):
pageObj = pdf.getPage(page_num)
try:
txt = pageObj.extractText()
txt = DocumentInformation.author
except:
pass
else:
f.write(txt)
f.close()
Error Received:
ModuleNotFoundError: No module named 'PyPDF2'
I'm writing my first ever script: I want to scan in a PDF, extract the text, and write it to a .txt file. I was trying to use PyPDF2, but I'm not sure how to use it in a script like this.
EDIT: I had success importing os and sys like so:
import os
import sys
There are multiple issues:
from PyDF2 import ...: A typo. You meant PyPDF2 instead of PyDF2
PdfFileWriter was imported, but never used (side-note: It's PdfReader and PdfWriter in the latest version of PyPDF2)
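with open("sample.pdf", "w") as f:': A syntax error (the stray trailing quote). Also, opening sample.pdf in "w" mode truncates the very PDF you are reading; write the text to a separate .txt file instead.
The lines that follow it lack indentation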
Side-note: Did you know that you can simply write for page in pdf.pages?
DocumentInformation.author is wrong. I guess you meant pdf.metadata.author
You overwrite the txt variable before using it, so the extracted text is thrown away when you reassign it.
Maybe this is what you want:
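from PyPDF2 import PdfReader
def get_text(pdf_file_path: str) -> str:
    text = ""
    reader = PdfReader(pdf_file_path)
    # concatenate the text of every page
    for page in reader.pages:
        text += page.extract_text()
    return text
text = get_text("example.pdf")
# an explicit encoding avoids UnicodeEncodeError for non-ASCII text on Windows
with open("example.txt", "w", encoding="utf-8") as f:
    f.write(text)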
Installation issues
If you have installation issues, the docs on installing PyPDF2 may help.
If you run your script from the console as python your_script_name.py, check the output of
python -c "import PyPDF2; print(PyPDF2.__version__)"
That should show your PyPDF2 version. If it doesn't, then the Python environment you're using doesn't have PyPDF2 installed. Please note that your system may have arbitrarily many Python environments.
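A quick way to see which interpreter is actually running and whether it can find a module is to ask that interpreter directly. A minimal sketch using only the standard library; module_status is a hypothetical helper name:

```python
import importlib.util
import sys
from typing import Tuple


def module_status(name: str) -> Tuple[str, bool]:
    """Return (path of the running interpreter, whether `name` is importable)."""
    # find_spec returns None when no installed top-level module matches the name
    found = importlib.util.find_spec(name) is not None
    return sys.executable, found
```

Calling module_status("PyPDF2") from the same interpreter that runs your script shows both the interpreter path and whether PyPDF2 is visible to it. If it is not, running python -m pip install PyPDF2 with that same interpreter installs the package into the environment that runs your script.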
I've downloaded some files using requests:
import requests
url = 'https://www.youtube.com/watch?v=gp5tziO5lXg&feature=youtu.be'
video_name = url.split('/')[-1]
print("Downloading file:%s" % video_name)
# download the url contents in binary format
r = requests.get(url)
# open method to open a file on your system and write the contents
with open('saved.mp4', 'wb') as f:
f.write(r.content)
and using urllib.request:
import urllib.request
url = 'https://www.youtube.com/watch?v=gp5tziO5lXg&feature=youtu.be'
video_name = url.split('/')[-1]
print("Downloading file:%s" % video_name)
# Copy a network object to a local file
urllib.request.urlretrieve(url, "saved2.mp4")
When I then try to open the .mp4 file I get the following error
Cannot play
This file cannot be played. This can happen because the file type is
not supported, the file extension is incorrect or the file is
corrupted.
0xc00d36c4
If I test it with pytube it works fine.
What's wrong with the other methods?
To answer your question: with the other methods you are not downloading the video but the web page, so what you are most likely getting is an HTML file with an .mp4 file extension.
That is why the player shows that error when it tries to open the file.
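One way to confirm this is to look at the first bytes of the saved file. A minimal sketch, assuming the usual layout of MP4 files, whose first box is normally an ftyp box (so bytes 4 to 8 read ftyp); that is typical but not guaranteed, so treat it as a heuristic. sniff_download is a hypothetical helper name:

```python
def sniff_download(data: bytes) -> str:
    """Guess whether downloaded bytes are an MP4 video or an HTML page.

    Heuristic only: most MP4 files start with an 'ftyp' box, and
    HTML pages usually start with a doctype or an <html> tag.
    """
    if data[4:8] == b"ftyp":
        return "mp4"
    # ignore a leading UTF-8 BOM or whitespace and compare case-insensitively
    head = data[:512].lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

For example, opening saved.mp4 in "rb" mode and passing the first 512 bytes to sniff_download should report "html" for the files produced by the requests and urllib versions above, if the explanation above is right.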
If pytube works for what you need, I would suggest using that one.
If you want to download videos from other platforms, you might consider youtube-dl.
Hello, you can use IPython.display to play audio in a notebook:
import IPython.display as ipd
ipd.Audio("saved.mp4")  # pass the path of the file saved on disk
Regards, and I hope this solves your problem.
I am trying to write code that downloads an entire YouTube playlist. It works for some playlists but not for others; one of the failing playlists is in my code below. Feel free to add more features to this code.
If there is already code to download a playlist, please share the link with me.
from bs4 import BeautifulSoup
from pytube import YouTube
import urllib.request
import time
import os
## list of links parsed by bs4
s = []
## name of the playlist folder and the download path, respectively
directory = 'Hacker101'
savePath = "G:/Download/video/"
path = os.path.join(savePath, directory)
## link parser
past_link_here = "https://www.youtube.com/playlist?list=PLxhvVyxYRviZd1oEA9nmnilY3PhVrt4nj"
html_page = urllib.request.urlopen(past_link_here)
x = html_page.read()
soup = BeautifulSoup(x, 'html.parser')
for link in soup.findAll('a'):
    k = link.get('href')
    # some <a> tags have no href, so check for None first
    if k and 'watch' in k:
        s.append(k)
## create the playlist folder
def create_project_dir(x):
    if not os.path.exists(x):
        print('Creating directory ' + x)
        os.makedirs(x)
create_project_dir(path)
## download the videos using the links collected in s
for x in set(s):
    link = "https://www.youtube.com" + x
    yt = YouTube(link)
    k = yt.title
    file_path = os.path.join(path, k + '.mp4')
    try:
        if os.path.exists(file_path):
            print(k + ' is \n' + "already downloaded")
        else:
            j = yt.streams.filter(progressive=True).all()
            l = yt.streams.first()
            print(k + ' is downloading....')
            l.download(path)
            time.sleep(1)
            print('download complete')
    ## except Exception:
    ##     print('error')
    except KeyError as e:
        print('KeyError: %s' % e)
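The link-collecting step can be made order-preserving and tolerant of <a> tags without an href. A minimal sketch using the standard library's html.parser instead of BeautifulSoup (a swap made only so the example has no third-party dependency); WatchLinkParser and watch_links are hypothetical names:

```python
from html.parser import HTMLParser
from typing import List


class WatchLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at /watch pages."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # href can be missing, or None for a bare <a href>, so check first
        if href and "watch" in href:
            self.links.append(href)


def watch_links(html: str) -> List[str]:
    parser = WatchLinkParser()
    parser.feed(html)
    # dict.fromkeys removes duplicates but, unlike set(), keeps page order
    return list(dict.fromkeys(parser.links))
```

Each returned entry can then be prefixed with https://www.youtube.com, as in the download loop above, and the videos are processed in playlist order rather than in set order.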
Your issue appears to be related to a bug that was fixed today by giacaglia. Based on the GitHub commit, the bug can be fixed by modifying your mixins.py as detailed in the link. After that, your playlists should work without running into the KeyError: 'url_encoded_fmt_stream_map' issue you had above.
I asked this question before the new version of pytube was released. This problem is solved in pytube3; you just need to install it with pip, i.e. pip install pytube3
Kindly check out the pytube3 docs. I've used the methods in them and they actually worked.
The first step is to upgrade the pytube library:
pip3 install pytube3 --upgrade
Then write your code:
For normal YouTube video downloads:
from pytube import YouTube
url = input("Paste the URL here -->>")
yt = YouTube(url)
# reuse yt instead of fetching the page a second time with YouTube(url)
yt.streams[0].download()
For the whole playlist:
from pytube import Playlist
url = input("Paste the URL here -->>")
playlist = Playlist(url)
# in current pytube, iterating a Playlist yields URL strings; .videos yields YouTube objects
for my_video in playlist.videos:
    my_video.streams.get_highest_resolution().download()
Have fun, and make something cool!
How would I go about navigating to a URL that's stored in a list and downloading the file? Preferably, I'd like to store the MP4 file under its clip title. I've used requests to retrieve the URLs.
Thanks
list_clips = ['https://clips.twitch.tv/SpeedySneakyHeronKappaClaus', 'https://clips.twitch.tv/SplendidGiantPuffinThunBeast', 'https://clips.twitch.tv/ArtsyAuspiciousHamburgerThisIsSparta', 'https://clips.twitch.tv/BoringNiceHerbsSaltBae']
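You can use Python's requests module to download the file. Please refer to the code below:
import os
import requests
for clips in list_clips:
    # file name = last part of the URL (the clip slug, not its displayed title)
    clip_title = os.path.basename(clips)
    r = requests.get(clips)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    with open(clip_title + '.mp4', 'wb') as f:
        f.write(r.content)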
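os.path.basename works on these particular URLs, but it keeps any query string and returns an empty string when the URL ends with a slash. A minimal sketch that takes the last path segment with urllib.parse instead; clip_filename is a hypothetical helper, and the result is still the URL slug, not the clip's displayed title. As with the YouTube example earlier, a clip page URL may return an HTML page rather than the video itself, so it is worth checking what was actually saved.

```python
import posixpath
from urllib.parse import urlsplit


def clip_filename(url: str, ext: str = ".mp4") -> str:
    """Build a file name from the last path segment of a URL.

    Query strings and fragments are dropped, and a trailing slash is ignored.
    """
    path = urlsplit(url).path.rstrip("/")
    slug = posixpath.basename(path)
    if not slug:
        raise ValueError("no path segment to use as a name: %r" % url)
    return slug + ext
```

In the loop above, clip_title + '.mp4' would become clip_filename(clips).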
I have tried downloading a .pdf file using the following code, but I can't open the downloaded file: it shows a PDF error. I also tried doing the same with urllib2 and requests; neither helped. Please help me resolve this.
import urllib
import os
pdf_link = "https://www.indeed.com/resumes/account/login?dest=%2Fr%2F23c59475ad19d393/pdf"
pdf_file = "sample.pdf"
response = urllib.urlopen(pdf_link)
file = open(pdf_file, 'wb')
file.write(response.read())
file.close()
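The URL here points at an Indeed login page (/account/login?dest=...), so the server most likely returns an HTML login form rather than a PDF, which would explain the error. PDF files normally begin with the header %PDF-, so a minimal Python 3 sketch can check for it before saving; fetch_pdf and looks_like_pdf are hypothetical names, and the sketch does not handle logging in.

```python
import urllib.request


def looks_like_pdf(data: bytes) -> bool:
    """PDF files normally start with a '%PDF-' header line."""
    return data.startswith(b"%PDF-")


def fetch_pdf(url: str, path: str) -> None:
    """Download url to path, refusing to save anything that is not a PDF."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
        content_type = response.headers.get("Content-Type", "")
    if not looks_like_pdf(data):
        raise ValueError("expected a PDF but got %r (starts with %r)"
                         % (content_type, data[:20]))
    with open(path, "wb") as f:
        f.write(data)
```

If fetch_pdf raises with a text/html content type, the request reached a web page (here, presumably the login form) rather than the document.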