I have tried downloading a .pdf file using the following code but I can't open the downloaded file, it shows pdf error. I also tried doing the same with urllib2, requests none of them helped. Please help in resolving this.
import urllib
import os
pdf_link = "https://www.indeed.com/resumes/account/login?dest=%2Fr%2F23c59475ad19d393/pdf"
pdf_file = "sample.pdf"
response = urllib.urlopen(pdf_link)
file = open(pdf_file, 'wb')
file.write(response.read())
file.close()
Related
I want to scrape pdf files from this site
https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_pilote_7b.pdf
I tried this code for that but it doesn't work. Can anybody tell me why, please?
res = requests.get('https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf')
with open('C:\\Users\\sioud\\Desktop\\Manuels scolaires TN\\1\\test.pdf', 'wb') as f:
f.write(ress.content)
res = requests.get('https://www.sigmaths.net/manuels/ph/physique_7b.pdf',stream=True)
with open('test.pdf', 'wb') as f:
f.write(res.content)
your url is pointing to a reader https://www.sigmaths.net/Reader.php?var=manuels/ph/physique_7b.pdf, remove the 'reader.php?var= for the actual pdf
You can also use urlretrieve.
Check out my solution code.
from urllib.request import urlretrieve
pdfurl = u"https://www.sigmaths.net/manuels/ph/physique_7b.pdf";
urlretrieve(pdfurl, "test.pdf")
And you will find the required pdf download under the name test.pdf
I've downloaded some files using requests
url = 'https://www.youtube.com/watch?v=gp5tziO5lXg&feature=youtu.be'
video_name = url.split('/')[-1]
print("Downloading file:%s" % video_name)
# download the url contents in binary format
r = requests.get(url)
# open method to open a file on your system and write the contents
with open('saved.mp4', 'wb') as f:
f.write(r.content)
and using urllib.requests
url = 'https://www.youtube.com/watch?v=gp5tziO5lXg&feature=youtu.be'
video_name = url.split('/')[-1]
print("Downloading file:%s" % video_name)
# Copy a network object to a local file
urllib.request.urlretrieve(url, "saved2.mp4")
When I then try to open the .mp4 file I get the following error
Cannot play
This file cannot be played. This can happen because the file type is
not supported, the file extension is incorrect or the file is
corrupted.
0xc00d36c4
If I test it with pytube it works fine.
What's wrong with the other methods?
To answer your question, with the other methods it is not downloading the video but the page. What you may be obtaining is an html file with an mp4 file extension.
Therefore, it gives that error when trying to open the file.
If pytube works for what you need, I would suggest using that one.
If you want to download videos from other platforms, you might consider youtube-dl.
Hello you can import IPython.display for audio diplay
import IPython.display as ipd
ipd.Audio(video_name)
regards
I hope I can have solved your problem
I am writing small python code to download a file from follow link and retrieve original filename
and its extension.But I have come across one such follow link for which python downloads the file but it is without any extension whereas file has .txt extension when downloads using browser.
Below is the code I am trying :
from urllib.request import urlopen
from urllib.parse import unquote
import wget
filePath = 'D:\\folder_path'
followLink = 'http://example.com/Reports/Download/c4feb46c-8758-4266-bec6-12358'
response = urlopen(followLink)
if response.code == 200:
print('Follow Link(response url) :' + response.url)
print('\n')
unquote_url = unquote(response.url)
file_name = wget.detect_filename(response.url).replace('|', '_')
print('file_name - '+file_name)
wget.download(response.url,filePa
th)
file_name variable in above code is just giving 'c4feb46c-8758-4266-bec6-12358' as filename.
Where I want to download it as c4feb46c-8758-4266-bec6-12358.txt.
I have also tried to read file name from header i.e. response.info(). But not getting proper file name.
Anyone can please help me with this.I am stucked in my work.Thanks in advance.
Wget gets the filename from the URL itself. For example, if your URL was https://someurl.com/filename.pdf, it is saved as filename.pdf. If it was https://someurl.com/filename, it is saved as filename. Since wget.download returns the filename of the downloaded file, you can rename it to any extension you want with os.rename(filename, filename+'.<extension>').
I work on a project and I want to download a csv file from a url. I did some research on the site but none of the solutions presented worked for me.
The url offers you directly to download or open the file of the blow I do not know how to say a python to save the file (it would be nice if I could also rename it)
But when I open the url with this code nothing happens.
import urllib
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
testfile = urllib.request.urlopen(url)
Any ideas?
Try this. Change "folder" to a folder on your machine
import os
import requests
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
f.write(response.content)
You can adapt an example from the docs
import urllib.request
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
with urllib.request.urlopen(url) as testfile, open('dataset.csv', 'w') as f:
f.write(testfile.read().decode())
I'm trying to dynamically download pdf's from a web site. I am sure I'm listing them correctly but I am not sure I'm doing the actual file I/O correctly. I get the following error
File "download.py", line 22, in <module>
with open("'"+url+"'", "wb") as pdf:
IOError: [Errno 2] No such file or directory: "'http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-179.pdf'"
Here is my code:
import requests
import re
from bs4 import BeautifulSoup
origin = requests.get("http://freehaven.net/anonbib")
soup=BeautifulSoup(origin.text)
results = soup.find_all(href=re.compile("(http).*(pdf)"))
for link in results:
url = (link.get('href'))
r = requests.get(url)
with open("'"+url+"'", "wb") as pdf:
try:
pdf.write(r.content)
finally:
pdf.close
If url is set to 'http://www.lcs.mit.edu/publications/pubs/pdf/MIT-LCS-TR-179.pdf', your code fails because it is trying to open a file with that name on your filesystem.
Instead, try something like this:
fileForUrl = '/tmp/' + url.split('/')[-1]
with open(fileForUrl, 'wb') as pdf:
# Rest of the code as before