django test file download - "ValueError: I/O operation on closed file" - python

I have code for a view which serves a file download, and it works fine in the browser. Now I am trying to write a test for it, using the internal django Client.get:
response = self.client.get("/compile-book/", {'id': book.id})
self.assertEqual(response.status_code, 200)
self.assertEquals(response.get('Content-Disposition'),
                  "attachment; filename=book.zip")
so far so good. Now I would like to test if the downloaded file is the one I expect it to download. So I start by saying:
f = cStringIO.StringIO(response.content)
Now my test runner responds with:
Traceback (most recent call last):
File ".../tests.py", line 154, in test_download
f = cStringIO.StringIO(response.content)
File "/home/epub/projects/epub-env/lib/python2.7/site-packages/django/http/response.py", line 282, in content
self._consume_content()
File "/home/epub/projects/epub-env/lib/python2.7/site-packages/django/http/response.py", line 278, in _consume_content
self.content = b''.join(self.make_bytes(e) for e in self._container)
File "/home/epub/projects/epub-env/lib/python2.7/site-packages/django/http/response.py", line 278, in <genexpr>
self.content = b''.join(self.make_bytes(e) for e in self._container)
File "/usr/lib/python2.7/wsgiref/util.py", line 30, in next
data = self.filelike.read(self.blksize)
ValueError: I/O operation on closed file
Even when I do simply: self.assertIsNotNone(response.content) I get the same ValueError
The only topic on the entire internet (including django docs) I could find about testing downloads was this stackoverflow topic: Django Unit Test for testing a file download. Trying that solution led to these results. It is old and rare enough for me to open a new question.
Anybody knows how the testing of downloads is supposed to be handled in Django? (btw, running django 1.5 on python 2.7)

This works for us. We return rest_framework.response.Response but it should work with regular Django responses, as well.
import io
response = self.client.get(download_url, {'id': archive_id})
downloaded_file = io.BytesIO(b"".join(response.streaming_content))
Note:
streaming_content is only available on StreamingHttpResponse (link is to the Django 1.10 docs):
https://docs.djangoproject.com/en/1.10/ref/request-response/#django.http.StreamingHttpResponse.streaming_content
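To check that the streamed bytes really are the expected archive, you can reassemble them and open the result with zipfile. A minimal plain-Python sketch (no Django required; `chunks` stands in for `response.streaming_content`, which is an iterator of byte strings):

```python
import io
import zipfile

def read_zip_names(chunks):
    # Reassemble the streamed chunks into one buffer and list the archive members.
    data = io.BytesIO(b"".join(chunks))
    with zipfile.ZipFile(data) as zf:
        return zf.namelist()

# Simulate a streamed zip: build one in memory, then split it into small chunks
# the way a streaming response would deliver it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("book.txt", "contents")
payload = buf.getvalue()
chunks = [payload[i:i + 4] for i in range(0, len(payload), 4)]

print(read_zip_names(chunks))  # ['book.txt']
```

With a real response you would pass `response.streaming_content` instead of the simulated chunks.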

I had some file download code and a corresponding test that worked with Django 1.4. The test failed when I upgraded to Django 1.5 (with the same ValueError: I/O operation on closed file error that you encountered).
I fixed it by changing my non-test code to use a StreamingHttpResponse instead of a standard HttpResponse. My test code used response.content, so I first migrated to CompatibleStreamingHttpResponse, then changed my test code to use response.streaming_content instead, which allowed me to drop CompatibleStreamingHttpResponse in favour of StreamingHttpResponse.
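For context, the underlying failure can be reproduced without Django at all: as the traceback shows, wsgiref's file wrapper reads the file lazily, and by the time the test client forces response.content the file has already been closed. A minimal sketch of that failure mode:

```python
import io

def chunked(filelike, blksize=4):
    # Lazy chunk reader, analogous to wsgiref's FileWrapper:
    # nothing is read from the file until the iterator is consumed.
    while True:
        data = filelike.read(blksize)
        if not data:
            break
        yield data

f = io.BytesIO(b"zip bytes here")
lazy = chunked(f)   # no read happens yet
f.close()           # the file is closed once the response is "sent"
try:
    b"".join(lazy)  # consuming now fails, like accessing response.content
except ValueError as e:
    print(e)        # the familiar "I/O operation on closed file"
```

StreamingHttpResponse sidesteps this because streaming_content is consumed while the file is still open.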

Related

can't get page title from notion using api

I'm using notion.py and I'm new to Python. I want to get a page title from one page and post it to another page, but when I try I get this error:
Traceback (most recent call last):
File "auto_notion_read.py", line 16, in <module>
page_read = client.get_block(list_url_read)
File "/home/lotfi/.local/lib/python3.6/site-packages/notion/client.py", line 169, in get_block
block = self.get_record_data("block", block_id, force_refresh=force_refresh)
File "/home/lotfi/.local/lib/python3.6/site-packages/notion/client.py", line 162, in get_record_data
return self._store.get(table, id, force_refresh=force_refresh)
File "/home/lotfi/.local/lib/python3.6/site-packages/notion/store.py", line 184, in get
self.call_load_page_chunk(id)
File "/home/lotfi/.local/lib/python3.6/site-packages/notion/store.py", line 286, in call_load_page_chunk
recordmap = self._client.post("loadPageChunk", data).json()["recordMap"]
File "/home/lotfi/.local/lib/python3.6/site-packages/notion/client.py", line 262, in post
"message", "There was an error (400) submitting the request."
requests.exceptions.HTTPError: Invalid input.
The code I'm using is:
from notion.client import NotionClient
import time
token_v2 = "my page token"
client = NotionClient(token_v2=token_v2)
list_url_read = 'the url of the page to read'
page_read = client.get_block(list_url_read)
list_url_post = 'the url of the page'
page_post = client.get_block(list_url_post)
print (page_read.title)
It isn't recommended to edit the source code of dependencies, as you will almost certainly cause conflicts when updating them in the future.
Fix PR 294 has been open since the 6th of March 2021 and has not been merged.
To fix this issue with the currently open PR (pull request) on GitHub, do the following:
pip uninstall notion
Then either:
pip install git+https://github.com/jamalex/notion-py.git#refs/pull/294/merge
OR in your requirements.txt add
git+https://github.com/jamalex/notion-py.git#refs/pull/294/merge
Source for PR 294 fix
You can find the fix here
In a nutshell you need to modify two files in the library itself:
store.py
client.py
Find the "limit" value and change it to 100 in both.

How to download a dataset from The Humanitarian Data Exchange (hdx api python)

I don't quite understand how I can download data from a dataset. Only one file gets downloaded, but there are several of them. How can I solve this problem?
I am using the hdx api library. There is a small example in the documentation. A list is returned to me and I use the download method. But only the first file from the list is downloaded, not all of them.
My code
from hdx.hdx_configuration import Configuration
from hdx.data.dataset import Dataset
Configuration.create(hdx_site='prod', user_agent='A_Quick_Example', hdx_read_only=True)
dataset = Dataset.read_from_hdx('novel-coronavirus-2019-ncov-cases')
resources = dataset.get_resources()
print(resources)
url, path = resources[0].download()
print('Resource URL %s downloaded to %s' % (url, path))
I tried to use different methods, but only this one turned out to work; it seems there is some kind of error in the loop, but I do not understand how to solve it.
Result
Resource URL https://data.humdata.org/hxlproxy/api/data-preview.csv?url=https%3A%2F%2Fraw.githubusercontent.com%2FCSSEGISandData%2FCOVID-19%2Fmaster%2Fcsse_covid_19_data%2Fcsse_covid_19_time_series%2Ftime_series_covid19_confirmed_global.csv&filename=time_series_covid19_confirmed_global.csv downloaded to C:\Users\tred1\AppData\Local\Temp\time_series_covid19_confirmed_global.csv.CSV
Forgot to add that I get a list of strings containing a download url value. Probably the problem is in the loop.
When I use a for-loop I get this:
for res in resources:
    print(res)
    res[0].download()
Traceback (most recent call last):
File "C:/Users/tred1/PycharmProjects/pythonProject2/HDXapi.py", line 31, in <module>
main()
File "C:/Users/tred1/PycharmProjects/pythonProject2/HDXapi.py", line 21, in main
res[0].download()
File "C:\Users\tred1\AppData\Local\Programs\Python\Python38\lib\collections\__init__.py", line 1010, in __getitem__
raise KeyError(key)
KeyError: 0
Datasets
You can get the download link as follows:
dataset = Dataset.read_from_hdx('acled-conflict-data-for-africa-1997-lastyear')
lista_resources = dataset.get_resources()
dictio = lista_resources[1]
url = dictio['download_url']
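As for the KeyError: 0 in the for-loop above, it comes from res[0]: each res is already a single resource (a dict-like object), so [0] is a dictionary key lookup, not list indexing. A minimal sketch of the mistake and the fix (plain dicts stand in for hdx Resource objects; with the real library you would call res.download() on each one):

```python
# Stand-ins for hdx Resource objects, which behave like dicts.
resources = [
    {'download_url': 'https://example.com/a.csv'},
    {'download_url': 'https://example.com/b.csv'},
]

# The mistake: res is already one resource, so res[0] looks up the key 0.
errors = 0
for res in resources:
    try:
        res[0]          # what res[0].download() did -> KeyError: 0
    except KeyError:
        errors += 1

# The fix: use each resource directly inside the loop.
urls = [res['download_url'] for res in resources]
print(errors, urls)
```

With real hdx objects the fixed loop body would simply be `url, path = res.download()`.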

running a simple pdblp code to extract BBG data

I am currently logged on to my BBG Anywhere (web login) on my Mac. So the first question is: would I still be able to extract data using tia (as I am not actually on my terminal)?
import pdblp
con = pdblp.BCon(debug=True, port=8194, timeout=5000)
con.start()
I got this error
pdblp.pdblp:WARNING:Message Received:
SessionStartupFailure = {
    reason = {
        source = "Session"
        category = "IO_ERROR"
        errorCode = 9
        description = "Connection failed"
    }
}
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Users/prasadkamath/anaconda2/envs/Pk36/lib/python3.6/site-packages/pdblp/pdblp.py", line 147, in start
raise ConnectionError('Could not start blpapi.Session')
ConnectionError: Could not start blpapi.Session
I am assuming that I need to be on the terminal to be able to extract data, but wanted to confirm that.
This is a duplicate of this issue here on SO. It is not an issue with pdblp per se, but with blpapi not finding a connection. You mention that you are logged in via the web, which only allows you to use the terminal (or Excel add-in) within the browser, but not outside of it, since this way of accessing Bloomberg lacks a data feed and an API. More details and alternatives can be found here.

How to serve pdf (or non-image) files through google app engine with python flask? [duplicate]

This question already has an answer here:
serving blobs from GAE blobstore in flask
(1 answer)
Closed 7 years ago.
Here is the code that I currently use to upload the file that I get:
header = file.headers['Content-Type']
parsed_header = parse_options_header(header)
blob_key = parsed_header[1]['blob-key']
UserObject(file = blobstore.BlobKey(blob_key)).put()
On the serving side, I tried doing the same thing that I do for images. That is, I get the UserObject and then just do
photoUrl = images.get_serving_url(str(userObject.file)).
To my surprise, this hack worked perfectly in local server. However, on production, it raises an exception:
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/images/__init__.py", line 1794, in get_serving_url
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/images/__init__.py", line 1892, in get_serving_url_hook
raise _ToImagesError(e, readable_blob_key)
TransformationError
What is a proper way to store/serve non-image files such as pdf?
Don't use the images API; that won't work, since (at least in production) it performs image content/format checks and you're not serving images.
You can have your own request path namespace with a handler, just like any other handler in your app, maybe something along these lines:
def pdfServingUrl(request, blob_key):
    from urlparse import urlsplit, urlunsplit
    split = list(urlsplit(request.url))
    split[2] = '/PDF?key=%s' % blob_key.urlsafe()  # rewrite the url 'path'
    return urlunsplit(tuple(split))
And in the handler for /PDF*:
blob_key = self.request.get('key')
self.response.write(blobstore.BlobReader(blob_key).read())
If you use the Blobstore API on top of Google Cloud Storage you can also use direct GCS access paths, as mentioned in this answer: https://stackoverflow.com/a/34023661/4495081.
Note: the above code is a barebone python suggestion just for accessing the blob data. Additional stuff like headers, etc. is not addressed.
Per @NicolaOtasevic's comment, for flask the more complete code might look like this:
blob_info = blobstore.get(request.args.get('key'))
response = make_response(blob_info.open().read())
response.headers['Content-Type'] = blob_info.content_type
response.headers['Content-Disposition'] = 'filename="%s"' % blob_info.filename
return response

Syntax error after uploading GAE Python app

I have created a GAE app that parses RSS feeds using cElementTree. Testing on my local installation of GAE works fine. When I uploaded this app and tried to test it, I get a SyntaxError.
The error is :
Traceback (most recent call last):
File "/base/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 509, in __call__
handler.post(*groups)
File "/base/data/home/apps/palmfeedparser/1-6.339910418736930444/pipes.py", line 285, in post
tree = ET.parse(urlopen(URL))
File "<string>", line 45, in parse
File "<string>", line 32, in parse
SyntaxError: no element found: line 14039, column 45
I did what Mr. Alex Martelli suggested and it printed out the following on my local machine:
[
' <ac:tag><![CDATA[Mobilit\xc3\xa4t]]></ac:tag>\n',
' </ac:tags>\n',
' <ac:images>\n',
' <ac:image ac:number="1">\n',
' <ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>\n'
]
I uploaded the app and it printed out:
[
' <ac:tag><![CDATA[Mobilit\xc3\xa4t]]></ac:tag>\n',
' </ac:tags>\n',
' <ac:images>\n',
' <ac:image ac:number="1">\n',
' <ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>\n'
]
These lines correspond to the following lines in the RSS feed I am reading:
<ac:tags>
<ac:tag><![CDATA[Mobilität]]></ac:tag>
</ac:tags>
<ac:images>
<ac:image ac:number="1">
<ac:asset_url ac:type="app">http://cdn.downloads.example.com/public/1198/de/images/1/A/01.png</ac:asset_url>
I notice that there is a newline before the closing ac:tags. Line 14039 corresponds to this new line.
Update:
I use urllib.urlopen to access the URL of the feed. I displayed the contents it fetches both locally and on GAE proper. Locally, no content is truncated. Testing after uploading the app shows that the feed, which has 15289 lines, is truncated to 14185 lines.
What method can I use to fetch this huge feed? Would urlfetch work?
Thanks in advance for your help!
A_iyer
You may have run into one of the mysterious limits placed on GAE.
urlopen has been overridden by Google to use its urlfetch method, so there shouldn't be any difference there (though it might be worth trying; there are a lot of hidden things in GAE).
Newline characters shouldn't affect cElementTree.
Are there any other logging messages coming through in your App Engine logs? (Relating to the urlopen request?)
