I am quite sure this is something very common. I want to download files from an HTTPS server and keep the original (last changed) date, so the saved file should not show the date on which it was downloaded.
Here is how I am doing it:
filename = file.text
response = session.get(url, verify=False)
if filename.endswith('arg'):
    with open('C:/RD/M/' + filename, 'wb') as out:
        out.write(response.content)
else:
    pass  # do something
Is there any way to add something after out.write(response.content) to keep the original date?
Thanks for info in advance
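One possible approach, sketched under the assumption that the server sends a Last-Modified response header (not every server does): parse that header and apply it to the saved file with os.utime once the file has been written and closed. The helper name apply_last_modified is hypothetical.

```python
import os
from email.utils import parsedate_to_datetime


def apply_last_modified(path, last_modified):
    """Set the modification time of `path` from an HTTP Last-Modified value.

    Returns True if the timestamp was applied, False if the header was missing.
    """
    if not last_modified:
        return False
    # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT"
    mtime = parsedate_to_datetime(last_modified).timestamp()
    # Keep the current access time and replace only the modification time
    atime = os.stat(path).st_atime
    os.utime(path, (atime, mtime))
    return True


# Usage with a requests response (names are illustrative):
# apply_last_modified(saved_path, response.headers.get('Last-Modified'))
```

If the header is absent, the function leaves the file untouched, so the download date remains.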
The goal is to download GTFS data through python web scraping, starting with https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download
Currently, I'm using requests like so:
def download(url):
    fpath = "prov/city/GTFS"
    r = requests.get(url)
    if r.ok:
        print("Saving file.")
        open(fpath, "wb").write(r.content)
    else:
        print("Download failed.")
Unfortunately, the content that requests returns for the above URL renders as unreadable output.
You can see the names of the files of interest within the output (e.g. stops.txt), but how might I access them to read/write?
I fear you're trying to read a zip file with a text editor; perhaps you should try using the "zipfile" module.
The following worked:
def download(url):
    fpath = "path/to/output/"
    # headers must be defined elsewhere before calling this function
    f = requests.get(url, stream=True, headers=headers)
    if f.ok:
        print("Saving to {}".format(fpath))
        with open(fpath + 'output.zip', 'wb') as g:
            g.write(f.content)
    else:
        print("Download failed with error code: ", f.status_code)
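Note that reading f.content loads the whole body into memory even when stream=True is passed. A minimal sketch of writing the response chunk by chunk instead, using the iter_content method of a requests response; the helper name save_streamed and the chunk size are illustrative choices, not part of the original answer.

```python
def save_streamed(response, path, chunk_size=8192):
    """Write a streamed requests response to `path` chunk by chunk.

    Returns the number of bytes written.
    """
    written = 0
    with open(path, "wb") as out:
        # iter_content yields the body in pieces instead of loading it all at once
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty chunks
                out.write(chunk)
                written += len(chunk)
    return written


# Usage (requires the requests package and network access):
# import requests
# with requests.get(url, stream=True) as r:
#     r.raise_for_status()
#     save_streamed(r, "path/to/output/output.zip")
```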
You need to write this file into a zip.
import requests
url = "https://transitfeeds.com/p/agence-metropolitaine-de-transport/129/latest/download"
fname = "gtfs.zip"
r = requests.get(url)
open(fname, "wb").write(r.content)
Now fname exists and has several text files inside. If you want to programmatically extract this zip and then read the content of a file, for example stops.txt, then you need first to extract a single file, or simply extractall.
import zipfile
# this will extract only a single file, and
# raise a KeyError if the file is missing from the archive
zipfile.ZipFile(fname).extract("stops.txt")
# this will extract all the files found from the archive,
# overwriting files in the process
zipfile.ZipFile(fname).extractall()
Now you just need to work with your file(s).
thefile = "stops.txt"
# just plain text
text = open(thefile).read()
# csv file
import csv
reader = csv.reader(open(thefile))
for row in reader:
    ...
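If you would rather not extract anything to disk, a member of the archive can also be read in place. A minimal sketch, assuming the member is a UTF-8 CSV file (the utf-8-sig codec also tolerates a leading byte order mark); the function name read_csv_from_zip is hypothetical.

```python
import csv
import io
import zipfile


def read_csv_from_zip(zip_source, member):
    """Return the rows of a CSV member of a zip archive as a list of lists.

    `zip_source` may be a file name or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # ZipFile.open yields bytes, so wrap it to decode text for csv
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.reader(text))


# Usage (file names as in the answer above):
# rows = read_csv_from_zip("gtfs.zip", "stops.txt")
# or, without saving the download at all:
# rows = read_csv_from_zip(io.BytesIO(r.content), "stops.txt")
```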
This is an example URL: "https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
If you open it in a browser, a file starts downloading to your system.
I want to download this file using Python and store it somewhere on my computer.
This is what I tried:
import requests
# first_url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
second_url="https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
myfile = requests.get(second_url , allow_redirects=True)
# this works for the first URL
# open('example.pdf' , 'wb').write(myfile.content)
# this didn't work for either of them
# open('example.txt' , 'wb').write(myfile.content)
# this works for the second URL
open('example.doc' , 'wb').write(myfile.content)
First: if I put first_url in the browser, it downloads a PDF file; second_url downloads a .doc file. How can I know what type of file a URL will give us, so that I use the correct file name in open(...)?
Second: if I use the second URL in the browser, a file with the name "T__proc_notices_notices_080_k_notice_doc_79545_770020123.docx" starts downloading. How can I find out this file name when I download the file with Python?
If you know a better solution, kindly let me know.
Kindly also have a quick look at the question "Downloading Files from URLs and zip downloaded files in python".
myfile.headers['content-type'] will give you the MIME-type of the URL's content and myfile.headers['content-disposition'] gives you info like filename etc. (if the response contains this header at all)
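A minimal sketch of pulling the file name out of that header with the standard library's email parser, falling back to a default when the header is absent. The function name filename_from_headers and the default name are hypothetical.

```python
from email.message import Message


def filename_from_headers(headers, default="download.bin"):
    """Return the filename given in a Content-Disposition header, or `default`."""
    value = headers.get("Content-Disposition")
    if not value:
        return default
    # Let the standard library parse the header parameters
    msg = Message()
    msg["Content-Disposition"] = value
    return msg.get_filename() or default


# Usage with a requests response:
# name = filename_from_headers(myfile.headers)
# open(name, 'wb').write(myfile.content)
```

Since the name comes from the server, it may be worth sanitising it (for example with os.path.basename) before using it as a local path.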
You can use the Content-Type response header: for the first URL it is application/pdf and for the second URL it is application/msword, and you can save the file accordingly. You can build an extension dictionary that maps the possible MIME types to their file extensions and look the type up in it. Your second question is much the same as this one, so I am taking the two URLs from that question, and for the file names I am just using integers.
import requests

all_Urls = ['https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx',
            'https://procurement-notices.undp.org/view_file.cfm?doc_id=257280']
# map MIME types to file extensions
extension_dict = {'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
                  'application/vnd.openxmlformats-officedocument.wordprocessingml.template': '.dotx',
                  'application/vnd.ms-word.document.macroEnabled.12': '.docm',
                  'application/vnd.ms-word.template.macroEnabled.12': '.dotm',
                  'application/pdf': '.pdf',
                  'application/msword': '.doc'}
for i, url in enumerate(all_Urls):
    resp = requests.get(url)
    response_headers = resp.headers
    file_extension = extension_dict[response_headers['Content-Type']]
    # the extension already starts with a dot
    with open(f"{i}{file_extension}", 'wb') as f:
        f.write(resp.content)
for MIME-Type see this answer
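As an alternative to a hand-written dictionary, the standard library's mimetypes module can guess an extension from a MIME type. This is only a sketch: the result depends on the MIME tables available on the system, and the helper name extension_for and its fallback value are hypothetical.

```python
import mimetypes


def extension_for(content_type, default=".bin"):
    """Guess a file extension from a Content-Type header value."""
    if not content_type:
        return default
    # Drop parameters such as "; charset=utf-8" before the lookup
    mime = content_type.split(";", 1)[0].strip().lower()
    return mimetypes.guess_extension(mime) or default


# Usage with a requests response:
# ext = extension_for(resp.headers.get('Content-Type'))
```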
I have the following code to get messages from group:
getmessage = client.get_messages(dialog, limit=1000)
for message in getmessage:
    if message.media is None:
        print("message")
        continue
    else:
        print("Media**********")
        client.download_media(message)
The code above writes the media to disk.
I need to know the file name and file type before I write the file. How can I get them?
You can use a filename of your choosing to ensure that said filename will be used:
filename = 'some-file'
filename = client.download_media(message, filename)
Then filename will be some-file with the correct extension.
Otherwise, Telethon will generate a file name for the file (it will try the original name, and if it exists, append (n) to avoid overwriting existing files), so you can't really know where it will be saved beforehand (but the method does return the final filename).
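To inspect the name and type before downloading, one option is the file property of a message. The sketch below assumes a Telethon version in which Message.file exposes name, ext and mime_type attributes and is None for messages without a photo or document; the helper name describe_media is hypothetical.

```python
def describe_media(message):
    """Return (name, ext, mime_type) for a message's file, or None if it has none."""
    file = getattr(message, "file", None)
    if file is None:
        return None
    # name may be None when no file name was attached (for example photos)
    return file.name, file.ext, file.mime_type


# Usage inside the loop over messages:
# info = describe_media(message)
# if info is not None:
#     name, ext, mime_type = info
```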
I'm quite new to Django and I'm looking for a way to download a zip file from my Django site, but I run into an issue when running this piece of code:
def download(self):
    dirName = settings.DEBUG_FOLDER
    name = 'test.zip'
    with ZipFile(name, 'w') as zipObj:
        # Iterate over all the files in directory
        for folderName, subfolders, filenames in os.walk(dirName):
            for filename in filenames:
                # create complete filepath of file in directory
                filePath = os.path.join(folderName, filename)
                # Add file to zip
                zipObj.write(filePath, basename(filePath))
    path_to_file = 'http://' + sys.argv[-1] + '/' + name
    resp= {}
    # Grab ZIP file from in-memory, make response with correct MIME-type
    resp = HttpResponse(content_type='application/zip')
    # ..and correct content-disposition
    resp['Content-Disposition'] = 'attachment; filename=%s' % smart_str(name)
    resp['X-Sendfile'] = smart_str(path_to_file)
    return resp
I get:
Exception Value:
<HttpResponse status_code=200, "application/zip"> is not JSON serializable
I tried changing the content_type to octet-stream, but it doesn't work.
I also tried using a wrapper, as follows:
wrapper = FileWrapper(open('test.zip', 'rb'))
content_type = 'application/zip'
content_disposition = 'attachment; filename=name'
# Grab ZIP file from in-memory, make response with correct MIME-type
resp = HttpResponse(wrapper, content_type=content_type)
# ..and correct content-disposition
resp['Content-Disposition'] = content_disposition
I haven't found a useful answer so far, but maybe I didn't search well, so if my problem has already been dealt with, feel free to let me know.
Thank you very much for any help.
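You have to send the zip file as bytes. Read the finished archive back from disk once the with ZipFile(...) block has closed it:
with open(name, 'rb') as fh:
    response = HttpResponse(fh.read(), content_type="application/zip")
response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(name)
return response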
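If the archive does not need to exist on disk at all, it can be built in memory instead. A minimal sketch using io.BytesIO; the function name zip_directory_to_bytes is hypothetical, and the Django part is shown only as a comment because it needs a configured project.

```python
import io
import os
import zipfile


def zip_directory_to_bytes(directory):
    """Zip every file under `directory` (flattened by base name) and return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for folder, _subfolders, filenames in os.walk(directory):
            for filename in filenames:
                full_path = os.path.join(folder, filename)
                # Store under the base name only, as in the question's code
                archive.write(full_path, os.path.basename(full_path))
    # The archive is complete only after the with block has closed it
    return buffer.getvalue()


# In a Django view (not run here):
# data = zip_directory_to_bytes(settings.DEBUG_FOLDER)
# response = HttpResponse(data, content_type="application/zip")
# response["Content-Disposition"] = 'attachment; filename="test.zip"'
# return response
```

Because files are stored by base name, two files with the same name in different subfolders would end up as duplicate entries in the archive.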
I would do it like this:
(Caveat: I use WSL, so the Python function shells out to the command line.)
In the view:
import os
from django.conf import settings
from django.shortcuts import render

def zipdownfun(request):
    """Please establish in settings.py where media files should be downloaded from.
    In my case it is media, with a series of other folders inside. The media folder is at the same level as the project root folder, where settings.py is."""
    file_name = os.path.join(settings.MEDIA_URL, 'folder_where_your_file_is', 'file_name.zip')
    # Suppose the folder to zip lives in the media folder
    file_folder_path = os.path.join(settings.MEDIA_URL, 'saving_folder')
    # The command line takes the name of the future zip file as its first
    # argument and the folder to zip as its second
    cmd = f'zip {file_name} {file_folder_path}'
    # Run the zip command through the shell
    os.system(cmd)
    # I placed the URL of the file in a download button, so the template
    # needs the URL string for the href of an <a> element
    return render(request, 'your_html_file.html', {'url': file_name})
The database I created is updated very often. I used a slightly different version of this function, with the -r flag, since I had to zip a folder each time. Why did I do this? The database has to allow downloading this zipped folder, which is updated daily. So this function basically overwrites the zip file each time it is downloaded, and the download always contains fresh data.
Please refer to this page to understand how to create a button for downloading the generated file.
Take approach 2 as a reference. The URL variable that you pass to the Django template should be used in place of the file (screenshot attached).
I hope this helps!
I'm trying to create and serve Excel files using Django. I have a jar file which takes parameters and produces an Excel file according to them, and it works with no problem. But when I try to take the produced file and serve it to the user for download, the file comes out broken: it has a size of 0 KB. This is the piece of code I'm using for Excel generation and serving.
def generateExcel(request,id):
    if os.path.exists('./%s_Report.xlsx' % id):
        excel = open("%s_Report.xlsx" % id, "r")
        output = StringIO.StringIO(excel.read())
        out_content = output.getvalue()
        output.close()
        response = HttpResponse(out_content,content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
        response['Content-Disposition'] = 'attachment; filename=%s_Report.xlsx' % id
        return response
    else:
        args = ['ServerExcel.jar', id]
        result = jarWrapper(*args) # this creates the excel file with no problem
        if result:
            excel = open("%s_Report.xlsx" % id, "r")
            output = StringIO.StringIO(excel.read())
            out_content = output.getvalue()
            output.close()
            response = HttpResponse(out_content,content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
            response['Content-Disposition'] = 'attachment; filename=%s_Report.xlsx' % id
            return response
        else:
            return HttpResponse(json.dumps({"no":"excel","no one": "cries"}))
I have searched for possible solutions and also tried to use FileWrapper, but the result did not change. I assume I have a problem with reading the xlsx file into the StringIO object, but I don't have any idea how to fix it.
Why on earth are you passing your file's content to a StringIO just to assign StringIO.getvalue() to a local variable? What's wrong with assigning file.read() to your variable directly?
def generateExcel(request,id):
    path = './%s_Report.xlsx' % id # this should live elsewhere, definitely
    if os.path.exists(path):
        with open(path, "r") as excel:
            data = excel.read()
        response = HttpResponse(data,content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
        response['Content-Disposition'] = 'attachment; filename=%s_Report.xlsx' % id
        return response
    else:
        # quite some duplication to fix down there
        pass
Now you may want to check whether you actually have any content in your file: the fact that the file exists doesn't mean it has anything in it. Remember that you're in a concurrent context, so one thread or process can be trying to read the file while another (that is, another request) is trying to write it.
In addition to what Bruno says, you probably need to open the file in binary mode:
excel = open("%s_Report.xlsx" % id, "rb")
You can use this library to create excel sheets on the fly.
http://xlsxwriter.readthedocs.io/
For more information see this page. Thanks to #alexcxe
XlsxWriter object save as http response to create download in Django
my answer is:
def generateExcel(request,id):
    if os.path.exists('./%s_Report.xlsx' % id):
        with open('./%s_Report.xlsx' % id, "rb") as file:
            response = HttpResponse(file.read(),content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
            response['Content-Disposition'] = 'attachment; filename=%s_Report.xlsx' % id
            return response
    else:
        # quite some duplication to fix down there
        pass
Why use "rb"? Because the HttpResponse initializer has the signature (self, content=b'', *args, **kwargs), so we should open the file with "rb" and call .read() to get bytes.
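The two points raised in this thread (read in binary mode, and do not assume an existing file has content) can be combined in a small helper. This is only a sketch; the name read_binary_if_ready is hypothetical, and it does not protect against a file that is still being written by another process.

```python
def read_binary_if_ready(path):
    """Return the file's bytes, or None if the file is missing or empty."""
    try:
        # "rb" returns the bytes unchanged; text mode would try to decode them
        with open(path, "rb") as fh:
            data = fh.read()
    except FileNotFoundError:
        return None
    # Treat an empty file the same as a missing one
    return data or None


# In the Django view (not run here):
# data = read_binary_if_ready('./%s_Report.xlsx' % id)
# if data is not None:
#     response = HttpResponse(data, content_type='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet')
```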