While working on a Python scraper/spider, I encountered a URL that exceeds the filename character limit and raises the IOError in the title. I'm using httplib2, and when I attempt to retrieve the URL I receive a "File name too long" error. I prefer to keep all of my projects within my home directory since I am using Dropbox. Is there any way around this issue, or should I just set up my working directory outside of home?
You are probably hitting a limitation of the encrypted file system (eCryptfs), which allows at most 143 characters in a file name.
Here is the bug:
https://bugs.launchpad.net/ecryptfs/+bug/344878
The solution for now is to use any directory outside your encrypted home directory. To double-check that this is the cause, run:
mount | grep ecryptfs
and see if your home dir is listed.
If it is, either use some other directory outside your home, or create a new home directory without encryption.
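The same check can be done from Python by reading the mount table. This is a minimal sketch that assumes the Linux /proc/mounts format (device, mount point, filesystem type, ...); the function names are hypothetical, and escaped characters such as "\040" for spaces in mount points are not decoded.

```python
import os


def ecryptfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is ecryptfs.

    mounts_text is expected in /proc/mounts format:
    "device mountpoint fstype options dump pass", one mount per line.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] == "ecryptfs":
            points.append(fields[1])
    return points


def is_on_ecryptfs(path, mounts_text):
    """True if path is an ecryptfs mount point or lies below one."""
    path = os.path.abspath(path)
    for point in ecryptfs_mount_points(mounts_text):
        # compare whole path components so /home/alicex is not under /home/alice
        if path == point or path.startswith(point.rstrip("/") + "/"):
            return True
    return False
```

On a Linux machine you would call it as is_on_ecryptfs(os.path.expanduser("~"), open("/proc/mounts").read()).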
The fact that the filename that's too long starts with '.cache/www.example.com' explains the problem.
httplib2 optionally caches requests that you make. You've enabled caching, and you've given it .cache as the cache directory.
The easy solution is to put the cache directory somewhere else.
Without seeing your code, it's impossible to tell you how to fix it. But it should be trivial. The documentation for FileCache shows that it takes a dir_name as the first parameter.
Or, alternatively, you can pass a safe function (FileCache's second parameter) that generates a filename from the URI, overriding the default. That would let you generate filenames that fit within the 143-character limit of Ubuntu's encrypted filesystem.
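A minimal sketch of such a safe function, assuming a fixed-length hash is acceptable as a cache filename; short_safe_name is a hypothetical name, not part of httplib2.

```python
import hashlib


def short_safe_name(uri):
    """Map a URI to a fixed-length cache filename.

    A SHA-256 hex digest is always 64 characters, well below the
    143-character filename limit of an eCryptfs home directory.
    """
    if isinstance(uri, str):
        uri = uri.encode("utf-8")
    return hashlib.sha256(uri).hexdigest()
```

It would be passed as httplib2.Http(httplib2.FileCache(dir_name, safe=short_safe_name)), where dir_name can also point outside the encrypted home. The trade-off is that the cache files are no longer human-readable, unlike the default names that start with the host name.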
Or, alternatively, you can create your own object with the same interface as FileCache and pass that to the Http object to use as a cache. For example, you could use tempfile to create random filenames, and store a mapping of URLs to filenames in an anydbm or sqlite3 database.
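Here is a rough sketch of that idea. It assumes the cache interface httplib2 relies on is get(key), set(key, value) and delete(key), as on FileCache; the class name and the index filename are hypothetical, and sqlite3 stands in for anydbm (which is Python 2 only).

```python
import os
import sqlite3
import tempfile


class ShortNameCache:
    """Hypothetical FileCache-like object with short, random filenames.

    Cached bodies live in files created by tempfile.mkstemp(); an sqlite3
    table maps each (possibly very long) URL key to its file.
    """

    def __init__(self, dir_name):
        os.makedirs(dir_name, exist_ok=True)
        self.dir_name = dir_name
        self.db = sqlite3.connect(os.path.join(dir_name, "index.sqlite3"))
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, path TEXT)"
        )
        self.db.commit()

    def _path_for(self, key):
        row = self.db.execute(
            "SELECT path FROM entries WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def get(self, key):
        path = self._path_for(key)
        if path is None:
            return None
        try:
            with open(path, "rb") as f:
                return f.read()
        except OSError:
            return None  # index entry exists but the file was removed

    def set(self, key, value):
        path = self._path_for(key)
        if path is None:
            # mkstemp picks a short random name such as tmpk3j9x2ab.cache
            fd, path = tempfile.mkstemp(dir=self.dir_name, suffix=".cache")
            os.close(fd)
            self.db.execute(
                "INSERT INTO entries (key, path) VALUES (?, ?)", (key, path)
            )
            self.db.commit()
        with open(path, "wb") as f:
            f.write(value)

    def delete(self, key):
        path = self._path_for(key)
        if path is None:
            return
        self.db.execute("DELETE FROM entries WHERE key = ?", (key,))
        self.db.commit()
        if os.path.exists(path):
            os.remove(path)
```

An instance would be handed to the Http constructor in place of the directory name, e.g. httplib2.Http(ShortNameCache(dir_name)).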
A final alternative is to just turn off caching, of course.
As you have apparently passed '.cache' to the httplib2.Http constructor, you should change it to something more appropriate or disable the cache.
I need a component that's a browser-based file browser, and I expect some Django app already provides this. Is there such a thing?
The full story:
I'm building a Django app that is used for testing. I want to use it to serve files (and strings, etc.) and attach custom headers to them.
Currently, I have a model FileSource which has a single file_path field, which is of type django.db.models.FileField.
When creating a FileSource from the admin, the user gets a nice file upload dialog, and when saving, the file they chose is saved on the server (in a really weird location, inside the directory where Django is installed or something like that, because I didn't customize the storage, and customizing it wouldn't help me anyway).
My problem: I only want to use the file dialog for the user to select a full path on the server. The file the user chooses must only be referenced, not copied (as it is now), and it must reside on the server.
The server must therefore be able to list the files it has, so I basically need a little browser-based file browser.
At that point, I expect to be able to save a full path in my DB, and then access that file and serve it (together with whatever custom headers the user chooses in my app).
As you might know, browsers never reveal the real full path of the file: Chromium prepends "C:\fakepath\" to the file name, so I need support from the backend to accomplish this.
Also, I checked out django-filebrowser and django-filer, and from what I understood, they weren't built for this. If I'm wrong, a little assistance in configuring them would be awesome.
You can use a FilePathField for that. It won't upload a file, but rather lets you choose a pre-existing file. A caveat is that you can only use one directory (plus its subdirectories if you set recursive=True). If you need multiple directories, you'd need to go with something like django-filer.
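In Django the field would be declared along the lines of models.FilePathField(path="/srv/test-files", match=r"\.xml$", recursive=True), where the path and pattern are hypothetical. To show roughly what kind of listing backs its choices, here is a plain-Python sketch (not Django's actual code) that walks a directory and filters by a regex applied to the base filename only, as the FilePathField documentation describes for match.

```python
import os
import re


def list_server_files(path, match=None, recursive=False):
    """Return full paths of files under path, optionally filtered by a regex.

    The regex is searched against the base filename, not the full path.
    """
    pattern = re.compile(match) if match else None
    results = []
    if recursive:
        for root, dirs, files in os.walk(path):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                if pattern is None or pattern.search(name):
                    results.append(os.path.join(root, name))
    else:
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full) and (pattern is None or pattern.search(name)):
                results.append(full)
    return results
```

The stored value is a full server-side path, which is what the question needs in order to serve the file later without copying it.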
I was working with the Python os module and ran into some obstacles involving symbolic links:
linkdir = os.path.dirname(filepath)
if not os.path.isdir(linkdir):
    if os.path.exists(linkdir):
        os.unlink(linkdir)
    os.makedirs(linkdir)
This is the code I had trouble fully understanding. According to the explanation in the book, it means:
If we enter the if clause, the directory either does not exist or is a plain file.
In the latter case, the file is erased. Finally, the target directory is created.
However, I do not understand exactly how the directory (linkdir) could be a plain file. I tried to google it but only got the answer 'Because it is a symbolic link'. I honestly don't get such a short answer... Would you be kind enough to explain it in an understandable way?
The code tries to clear the way for a directory to be created. The value in filepath is just a string. It isn't actually connected to anything on the filesystem, but you cannot just create a directory without first checking that nothing is in the way.
If you have the value /foo/bar/spam.html in filepath, the code does this:
Extract the directory portion of that path, /foo/bar. This is still just a string, with nothing to do with the actual filesystem.
Test if /foo/bar is an actual directory on your filesystem with os.path.isdir(). If there is an existing directory at that location, you are done, mission accomplished.
If it is not a directory, then test if /foo/bar exists at all. We already discounted that it is a directory, so if /foo/bar exists anyway it must be something else. Usually that means it is a file. The code then will delete whatever is there to make way for the directory.
This doesn't have all that much to do with symbolic links; /foo/bar could be a pre-existing symbolic link too, but that doesn't really matter here. All that matters is that whatever actually exists on your filesystem at /foo/bar better be a directory already, otherwise it needs to be removed before you can create a directory there.
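The steps above can be wrapped in a small function. This is a sketch, not the book's code; ensure_parent_dir is a hypothetical name. One symlink detail does matter in practice: os.path.exists() returns False for a broken (dangling) symlink, so the original snippet would then skip the unlink and os.makedirs() would fail. Using os.path.lexists(), which is True for any existing link, avoids that.

```python
import os


def ensure_parent_dir(filepath):
    """Make sure the directory part of filepath exists as a directory."""
    linkdir = os.path.dirname(filepath)
    if not linkdir:
        return  # bare filename: it lives in the current directory
    # isdir() follows symlinks, so a link to an existing directory is fine
    if not os.path.isdir(linkdir):
        # lexists() is also True for a dangling symlink, unlike exists()
        if os.path.lexists(linkdir):
            os.unlink(linkdir)  # remove the plain file or link in the way
        os.makedirs(linkdir)
```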
That's because os.path.dirname(filepath) only splits the string filepath into head and tail at the last slash.
It doesn't check whether the head is an existing directory.
For example, say we have a file named "a" in the working directory.
(1) the code
os.path.dirname("a/a")
returns "a".
(2) However, os.path.isdir("a") returns False.
(3) os.path.isfile("a") returns True.
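The same experiment as a self-contained script; it creates the file "a" inside a temporary directory instead of the real working directory, so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    a = os.path.join(workdir, "a")
    open(a, "w").close()  # create a plain file named "a"

    # dirname is pure string manipulation: it never touches the disk
    head = os.path.dirname("a/a")
    # the same head, but resolved inside the directory that holds file "a"
    real_head = os.path.dirname(os.path.join(a, "a"))
    head_is_dir = os.path.isdir(real_head)
    head_is_file = os.path.isfile(real_head)

print(head, head_is_dir, head_is_file)  # a False True
```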
I'd like to download all of the files in a particular directory at a known URL. The names of the files won't necessarily be known, but their names will all contain a common keyword, and will have the same extension (.xml).
Is there an equivalent of "os.walk" for urllib2, such that I can simply walk through whatever files exist in the directory and open them for parsing?
The only examples of this I have seen online involve a file with a known name that contains a list of all the filenames in the directory. I do NOT want to do this...
Other possibly relevant info:
The files are on an apache server, and they are publicly accessible.
This is impossible without knowing the filenames - you'd have to randomly try every possible name, because your only way of knowing if a file with this name exists is requesting the url and seeing if you get a response. But you could let the Apache webserver generate a directory index for you (with mod_autoindex) and parse this to get the filenames.
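A minimal sketch of that approach, assuming mod_autoindex is enabled so the directory URL returns an HTML listing of links. It uses Python 3's urllib.request (the successor to urllib2) and html.parser; the function names are hypothetical, and base_url should end with a slash so relative links resolve inside the directory.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class _LinkCollector(HTMLParser):
    """Collect the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def matching_links(index_html, base_url, keyword, extension=".xml"):
    """Absolute URLs of index entries containing keyword and ending in extension."""
    collector = _LinkCollector()
    collector.feed(index_html)
    urls = []
    for href in collector.hrefs:
        name = href.split("?", 1)[0]  # skips sort links such as "?C=N;O=D"
        if keyword in name and name.endswith(extension):
            urls.append(urljoin(base_url, href))
    return urls


def fetch_matching(base_url, keyword):
    """Download the index page at base_url, then yield (url, bytes) per match."""
    with urlopen(base_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        index_html = resp.read().decode(charset, errors="replace")
    for url in matching_links(index_html, base_url, keyword):
        with urlopen(url) as resp:
            yield url, resp.read()
```

Each downloaded body can then be handed to an XML parser. If the server has indexes disabled, this approach has nothing to parse.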
In my App Engine script (using the Python API), I'm using this code to dynamically generate zip files and serve them back to the user. When I download and extract the generated zip file on OS X, the permissions of each file extracted from the archive are 0, forcing me to chmod them. I'd rather not make my users do the same. Is there a way to fix this?
Yup, see the docs for the Python zipfile module, specifically the signature of the writestr method, which is:
ZipFile.writestr(zinfo_or_arcname, bytes[, compress_type])
The first argument can be the filename, or a ZipInfo object, which allows you to specify information about the file to be stored. I believe the relevant field to set to change the permissions of the file is the external_attr, but some experimentation reading existing zip files may be required to determine this.
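A minimal sketch of that approach for Python 3: the Unix mode goes into the upper 16 bits of ZipInfo.external_attr, with the regular-file bit set. Setting create_system = 3 (Unix) is an assumption on my part, so that extractors interpret the attributes as Unix permissions even if the archive is built on Windows; zip_with_permissions is a hypothetical helper.

```python
import io
import stat
import time
import zipfile


def zip_with_permissions(files, mode=0o644):
    """Build a zip in memory from {arcname: bytes}, giving each entry Unix mode."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, data in files.items():
            info = zipfile.ZipInfo(arcname, date_time=time.localtime()[:6])
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # assumption: mark attributes as Unix-style
            # upper 16 bits hold st_mode: regular file plus permission bits
            info.external_attr = (stat.S_IFREG | mode) << 16
            zf.writestr(info, data)
    return buf.getvalue()
```

The returned bytes can be written straight into the HTTP response; reading an existing zip with zipfile and printing info.external_attr >> 16 is an easy way to compare against archives that extract correctly.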
My application caches some data on disk. Because the cache may be large, it should not be stored on a network drive. It should persist between invocations of the application. I have a mechanism for the user to choose a location, but would like the default to be sensible and "the right thing" for the platform.
What is the appropriate location for such a cache? Is there an API for determining the appropriate location? How do I call it from Python?
There are a number of places you can put your application files in Windows. This page shows a list (this enum is .NET-specific, but most of the special folders are standard on Windows in general). Basically, you'll need to decide whether you need a cache per user or only for the local machine, per application or shared, etc.
I don't have much experience with python so I cannot specifically help with how to get these paths, but I'm sure someone more knowledgeable here can.
The standard location for Windows applications to store their (permanent) application data is referenced by the %APPDATA% (current user) or %ALLUSERSPROFILE% (all users) environment variables. You can access them like this, for example (with only rudimentary and not very elegant error checking!):
import os
app_path = os.path.join(os.getenv("APPDATA"), "MyApplicationData")
try:
    os.mkdir(app_path)
except FileExistsError:
    # already exists
    pass
Now you have your own directory for your app.
Have a look here: http://en.wikipedia.org/wiki/Environment_variable#User_management_variables. Anything under the user's directory is good. If it's for all users, it should be %ALLUSERSPROFILE%. If it's for a specific user, make sure the permissions are right.
Check MSDN for more info about other Windows versions. Environment variables can vary from system to system.
Perhaps the tempfile module provides what you need. It uses the Windows Temp directory (which is probably not on a network drive), but you can specify a directory if you want to. It is also a good choice for security reasons: if you use tempfile.mkstemp(), the file is readable and writable only by the creating user ID.
Oh, I see you have just edited your question to say that the cache must persist between invocations of the app. In that case tempfile is not ideal (even though you could choose not to delete your cache between invocations).
The wx.StandardPaths class provides methods that return various standard locations in the file system and transparently tries to do "the right thing" under Unix, Mac OS X, and Windows.
A handy package for this is appdirs, which supports Windows and most other operating systems. For example, to create a cache directory for "AppName" version "v1.0" published by "Author":
import os
from appdirs import user_cache_dir
dirname = user_cache_dir("AppName", "Author", "v1.0")
# C:\Users\username\AppData\Local\Author\AppName\Cache\v1.0
# create it, if it doesn't exist
os.makedirs(dirname, exist_ok=True)
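If adding a dependency is not an option, a stdlib-only approximation is possible. This is a rough sketch of common conventions, not appdirs' exact rules: %LOCALAPPDATA% on Windows (chosen over %APPDATA% because the latter is the roaming profile, which may be synchronized with a server), ~/Library/Caches on macOS, and $XDG_CACHE_HOME or ~/.cache elsewhere. default_cache_dir is a hypothetical helper; platform and environ are parameters only so it can be tested.

```python
import os
import sys


def default_cache_dir(app_name, platform=None, environ=None):
    """Guess a per-user, local (non-roaming) cache directory for app_name."""
    platform = platform or sys.platform
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~")
    if platform.startswith("win"):
        base = environ.get("LOCALAPPDATA") or os.path.join(home, "AppData", "Local")
        return os.path.join(base, app_name, "Cache")
    if platform == "darwin":
        return os.path.join(home, "Library", "Caches", app_name)
    base = environ.get("XDG_CACHE_HOME") or os.path.join(home, ".cache")
    return os.path.join(base, app_name)
```

As with appdirs, the directory still has to be created, e.g. with os.makedirs(path, exist_ok=True).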
Does the app have any preferences, settings or options that the user can specify? If so, add an option where the user can specify the location of the data, with a default of the current Windows temp directory.
There's always a chance they may not have enough space on the drive with the temp directory, and would need to use a different drive/directory.
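The "user choice with a temp-directory default" idea, plus a free-space check, can be sketched as below. choose_cache_dir and required_bytes are hypothetical; shutil.disk_usage needs the location to exist already.

```python
import shutil
import tempfile


def choose_cache_dir(user_choice=None, required_bytes=0):
    """Return the user's chosen cache directory, or the temp dir by default.

    Returns None when the location's drive has less than required_bytes
    free, so the caller can ask the user for a different location.
    """
    location = user_choice or tempfile.gettempdir()
    if shutil.disk_usage(location).free < required_bytes:
        return None
    return location
```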