I've been trying for a while to make tablib work with web2py without luck. The code is delivering a .xls file as expected, but it's corrupted and empty.
import tablib
data = []
headers = ('first_name', 'last_name')
data = tablib.Dataset(*data, headers=headers)
data.append(('John', 'Adams'))
data.append(('George', 'Washington'))
response.headers['Content-Type']= 'application/vnd.ms-excel;charset=utf-8'
response.headers['Content-disposition']='attachment; filename=test.xls'
response.write(data.xls, escape=False)
Any ideas??
Thanks!
Per http://en.wikipedia.org/wiki/Process_state , response.write is documented as serving
to write text into the output page body
(my emphasis). data.xls is not text -- it's binary stuff! To verify that is indeed the cause of your problem, try using data.csv instead, and that should work, since it is text.
I believe you'll need to use response.stream instead, to send "binary stuff" as your response (or as an attachment thereto).
Related
I found a tutorial and I'm trying to run this script, I did not work with python before.
tutorial
I've already seen what is running through logging.debug, checking whether it is connecting to google and trying to create csv file with other scripts
from urllib.parse import urlencode, urlparse, parse_qs
from lxml.html import fromstring
from requests import get
import csv
def scrape_run():
with open('/Users/Work/Desktop/searches.txt') as searches:
for search in searches:
userQuery = search
raw = get("https://www.google.com/search?q=" + userQuery).text
page = fromstring(raw)
links = page.cssselect('.r a')
csvfile = '/Users/Work/Desktop/data.csv'
for row in links:
raw_url = row.get('href')
title = row.text_content()
if raw_url.startswith("/url?"):
url = parse_qs(urlparse(raw_url).query)['q']
csvRow = [userQuery, url[0], title]
with open(csvfile, 'a') as data:
writer = csv.writer(data)
writer.writerow(csvRow)
print(links)
scrape_run()
The TL;DR of this script is that it does three basic functions:
Locates and opens your searches.txt file.
Uses those keywords and searches the first page of Google for each
result.
Creates a new CSV file and prints the results (Keyword, URLs, and
page titles).
Solved
Google add captcha couse i use to many request
its work when i use mobile internet
Assuming the links variable is full and contains data - please verify.
if empty - test the api call itself you are making, maybe it returns something different than you expected.
Other than that - I think you just need to tweak a little bit your file handling.
https://www.guru99.com/reading-and-writing-files-in-python.html
here you can find some guidelines regarding file handling in python.
in my perspective, you need to make sure you create the file first.
start on with a script which is able to just create a file.
after that enhance the script to be able to write and append to the file.
from there on I think you are good to go and continue with you're script.
other than that I think that you would prefer opening the file only once instead of each loop, it could mean much faster execution time.
let me know if something is not clear.
I've attempted to use urllib, requests, and wget. All three don't work.
I'm trying to download a 300KB .npz file from a URL. When I download the file with wget.download(), urllib.request.urlretrieve(), or with requests, an error is not thrown. The .npz file downloads. However, this .npz file is not 300KB. The file size is only 1 KB. Also, the file is unreadable - when I use np.load(), the error OSError: Failed to interpret file 'x.npz' as a pickle shows up.
I am also certain that the URL is valid. When I download the file with a browser, it is correctly read by np.load() and has the right file size.
Thank you very much for the help.
Edit 1:
The full code was requested. This was the code:
loadfrom = "http://example.com/dist/x.npz"
savedir = "x.npz"
wget.download(loadfrom, savedir)
data = np.load(savedir)
I've also used variants with urllib:
loadfrom = "http://example.com/dist/x.npz"
savedir = "x.npz"
urllib.request.urlretrieve(loadfrom, savedir)
data = np.load(savedir)
and requests:
loadfrom = "http://example.com/dist/x.npz"
savedir = "x.npz"
r = requests.get(loadfrom).content
with open("x.npz",'wb') as f:
f.write(r)
data = np.load(savedir)
They all produce the same result, with the aforementioned conditions.
Kindly show the full code and the exact lines you use to download the file. Remember you need to use
r=requests.get("direct_URL_of_your_file.npz").content
with open("local_file.npz",'wb') as f:
f.write(r)
Also make sure the URL is a direct download link.
The issue was that the server needed javascript to run as a security precaution. So, when I send the request, all I got was html with "This Site Requres Javascript to Work". I found out that there was a __test cookie that needed to be passed during the request.
This answer explains it fully. This video may also be helpful.
I use python-pptx v0.6.2 to generate powerpoint. I read a exist powerpoint into BytesIO, then do some modification and save it. I can download the file successfully, and I'm sure the content can be write into the file. But when I open the powerpoint, it will popup a error message "Powerpoint found a problem with content in foo.pptx. Powerpoint can attempt to repair the presatation.", then I have to click "repair" button, the powerpoint will display as "repaired" mode. My Python version is 3.5.2 and Django version is 1.10. Below is my code:
with open('foo.pptx', 'rb') as f:
source_stream = BytesIO(f.read())
prs = Presentation(source_stream)
first_slide = prs.slides[0]
title = first_slide.shapes.title
subtitle = first_slide.placeholders[1]
title.text = 'Title'
subtitle.text = "Subtitle"
response = HttpResponse(content_type='application/vnd.ms-powerpoint')
response['Content-Disposition'] = 'attachment; filename="sample.pptx"'
prs.save(source_stream)
ppt = source_stream.getvalue()
source_stream.close()
response.write(ppt)
return response
Any help is appreciate, thanks in advance!
It looks like you've got problems with the IO.
The first three lines can be replaced by:
prs = Presentation('foo.pptx')
Placing the file into a memory-based stream just uses unnecessary resources.
On the writing, you're writing to that original (unnecessary) stream, which is dicey. I suspect that because you didn't seek(0) that you're appending onto the end of it. Also it's conceptually more complicated to deal with reuse.
If you use a fresh BytesIO buffer for the save I think you'll get the proper behavior. It's also better practice because it decouples the open, modify, and save, which you can then factor into separate methods later.
If you eliminate the first BytesIO you should just need the one for the save in order to get the .pptx "file" into the HTTP response.
I have a file, gather.htm which is a valid HTML file with header/body and forms. If I double click the file on the Desktop, it properly opens in a web browser, auto-submits the form data (via <SCRIPT LANGUAGE="Javascript">document.forms[2].submit();</SCRIPT>) and the page refreshes with the requested data.
I want to be able to have Python make a requests.post(url) call using gather.htm. However, my research and my trail-and-error has provided no solution.
How is this accomplished?
I've tried things along these lines (based on examples found on the web). I suspect I'm missing something simple here!
myUrl = 'www.somewhere.com'
filename='/Users/John/Desktop/gather.htm'
f = open (filename)
r = requests.post(url=myUrl, data = {'title':'test_file'}, files = {'file':f})
print r.status_code
print r.text
And:
htmfile = 'file:///Users/John/Desktop/gather.htm'
files = {'file':open('gather.htm')}
webbrowser.open(url,new=2)
response = requests.post(url)
print response.text
Note that in the 2nd example above, the webbrowser.open() call works correctly but the requests.post does not.
It appears that everything I tried failed in the same way - the URL is opened and the page returns default data. It appears the website never receives the gather.htm file.
Since your request is returning 200 OK, there is nothing wrong getting your post request to the server. It's hard to give you an exact answer, but the problem lies with how the server is handling the request. Either your post request is being formatted in a way that the server doesn't recognise, or the server hasn't been set up to deal with them at all. If you're managing the website yourself, some additional details would help.
Just as a final check, try the following:
r = requests.post(url=myUrl, data={'title':'test_file', 'file':f})
Why isn't the code below working? The email is received, and the file comes through with the correct filename (it's a .png file). But when I try to open the file, it doesn't open correctly (Windows Gallery reports that it can't open this photo or video and that the file may be unsupported, damaged or corrupted).
When I download the file using a subclass of blobstore_handlers.BlobstoreDownloadHandler (basically the exact handler from the GAE docs), and the same blob key, everything works fine and Windows reads the image.
One more bit of info - the binary files from the download and the email appear very similar, but have a slightly different length.
Anyone got any ideas on how I can get email attachments sending from GAE blobstore? There are similar questions on S/O, suggesting other people have had this issue, but there don't appear to be any conclusions.
from google.appengine.api import mail
from google.appengine.ext import blobstore
def send_forum_post_notification():
blob_reader = blobstore.BlobReader('my_blobstore_key')
blob_info = blobstore.BlobInfo.get('my_blobstore_key')
value = blob_reader.read()
mail.send_mail(
sender='my.email#address.com',
to='my.email#address.com',
subject='this is the subject',
body='hi',
reply_to='my.email#address.com',
attachments=[(blob_info.filename, value)]
)
send_forum_post_notification()
I do not understand why you use a tuple for the attachment. I use :
message = mail.EmailMessage(sender = ......
message.attachments = [blob_info.filename,blob_reader.read()]
I found that this code doesn't work on dev_appserver but does work when pushed to production.
I ran into a similar problem using the blobstore on a Python Google App Engine application. My application handles PDF files instead of images, but I was also seeing a "the file may be unsupported, damaged or corrupted" error using code similar to your code shown above.
Try approaching the problem this way: Call open() on the BlobInfo object before reading the binary stream. Replace this line:
value = blob_reader.read()
... with these two lines:
bstream = blob_info.open()
value = bstream.read()
Then you can remove this line, too:
blob_reader = blobstore.BlobReader('my_blobstore_key')
... since bstream above will be of type BlobReader.
Relevant documentation from Google is located here:
https://cloud.google.com/appengine/docs/python/blobstore/blobinfoclass#BlobInfo_filename