I am hoping for a little advice on shelves/databases in Python.
Problem: I have a database created on the Mac that I want to use on Windows 7. I use Python 3.2, Mac OS X 10.7, and Windows 7.
When I open and save my shelf on the Mac, all is well: I get a file with a ".db" extension. My Windows Python does not recognize it. I can, however, create a new db on the PC, which gives me files with ".bak", ".dat", and ".dir" extensions.
I am guessing that the Python on the PC does not use the same underlying database module as my Mac Python?
I am not sure which is the correct approach here, but maybe I could:
Change the default-db that my systems uses?
Find out which db my mac-python uses and add that on the pc?
Change the way I store my data all together?
Speed is not an issue, the datasize is a few megabytes, and it's not accessed very often.
Hope to find a helping hand out there. Thanks in advance - any help is much appreciated.
/Esben
What I am doing:
import shelve

db = shelve.open('mydb')
entries = db['list']
db.close()
It's pretty straightforward. I have a working db file called "mydb.db" on the Mac, but when I try to open it with the PC Python I get:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/dbm/__init__.py", line 107, in whichdb
    f = io.open(filename + ".pag", "rb")
IOError: [Errno 2] No such file or directory: 'mydb.pag'
Thank you for your reply!
It seems that shelves in Python are not easily forced to use a specific db; pickle, however, works like a charm, at least from Mac OS -> Windows 7.
So the short answer is: if you want portability, don't use shelves, use pickle directly.
/Esben
The sqlite3 module is a cross-platform module that is even supported by many other languages and tools.
The pickle module is simpler, but also cross-platform. You give it an object and it dumps it to a file. No tables or rows like sqlite3.
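For instance, a minimal pickle round trip looks like this (the file name and data are just examples):

import pickle

data = {'list': [1, 2, 3], 'name': 'example'}

# Write the object out in binary mode...
with open('mydata.pickle', 'wb') as f:
    pickle.dump(data, f)

# ...and read it back, on any platform
with open('mydata.pickle', 'rb') as f:
    restored = pickle.load(f)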
I ran into the same issue and implemented a dict-based class that supports loading and writing the contents of the dict from and to disk.
from pathlib import Path
import pickle

class DiskDict(dict):
    def __init__(self, sync_path: Path):
        self.path = sync_path
        if self.path.exists():
            with open(self.path, "rb") as file:
                tmp_dct = pickle.load(file)
            super().update(tmp_dct)
            print(f"loaded DiskDict with {len(tmp_dct)} items from {self.path}")

    def sync_to_disk(self):
        with open(self.path, "wb") as file:
            tmp_dct = super().copy()
            pickle.dump(tmp_dct, file)
            print(f"saved DiskDict with {len(tmp_dct)} items to {self.path}")
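Usage might look like this (the path and key are just examples):

from pathlib import Path

dd = DiskDict(Path("cache.pkl"))   # loads existing contents, if any
dd["answer"] = 42
dd.sync_to_disk()                  # writes the whole dict back to cache.pkl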
You might try pysos; it's a lightweight shelve alternative that is also cross-platform.
Install it using pip: pip install pysos
Example usage:
import pysos
db = pysos.Dict('myfile.db')
db['hello'] = 'persistence!'
db.close()
Another advantage is that everything is contained in the single file myfile.db, so you can easily just copy it around.
Related
I'm trying to remove a file in Python 3 on Linux (RHEL) the following way:
os.remove(os.getcwd() + '/file.txt')
(sorry not allowed to publish the real paths).
and it gives me the usual error
No such file or directory: '/path/to/file/file.txt'
(I've respected the slash/backslash direction in the path)
What is strange is that when I just ls the file (by copy pasting, so the very same path) the file does exist.
I've read this post, but I'm not on Windows and the slash direction seems correct.
Any idea?
EDIT: as suggested by @DominicPrice, os.system('ls') is showing the file while os.listdir() does not show it (but shows other files in the same directory)
EDIT 2: My issue was due to a bad usage of os.popen. I used this method to copy the file but did not wait for the subprocess to terminate. So my understanding is that the file had not been copied yet when I tried to delete it.
The problem is that, as you have explained in the comments, you are creating the file using os.popen("cp ..."). This works asynchronously, so it may not have had time to complete by the time you call os.remove(). You can force python to wait for it to finish by calling the close method:
proc = os.popen("cp myfile myotherfile")
proc.close() # wait for process to finish
os.remove("myotherfile") # we're all good
I would highly recommend staying away from os.popen in favour of the subprocess library, which has a run function that is much safer to use.
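For example, the copy could be run and waited on like this (the file names are placeholders):

import subprocess

# subprocess.run blocks until the command exits and, with check=True,
# raises CalledProcessError if the copy fails
subprocess.run(["cp", "myfile", "myotherfile"], check=True)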
For the specific task of copying a file, an even better (and cross-platform) solution is to use the shutil library:
import shutil
shutil.copyfile("myfile", "myotherfile")
You should use os.path.dirname(__file__).
This is a built-in function of the os.path module in Python.
You can read more here:
https://www.geeksforgeeks.org/find-path-to-the-given-file-using-python/
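For example, to resolve the file relative to the script's own directory instead of the current working directory (the file name is illustrative):

import os

# Build an absolute path next to this script, rather than relying on
# whatever the current working directory happens to be.
target = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'file.txt')
os.remove(target)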
Let's focus on one dll: C:\Windows\System32\wbem\wmiutils.dll. Why? Because it's the file in which I personally discovered Windows delivers a different dll depending on process architecture.
TLDR; Is there a way to programmatically determine the actual path of the dll that was returned by the file system redirector?
I understand that if launched as an x86 process, I get C:\Windows\SysWOW64\wbem\wmiutils.dll, and if launched as an x64 process, I get C:\Windows\System32\wbem\wmiutils.dll.
I need to determine which wmiutils.dll I'm actually looking at. The redirector makes System32\wbem\wmiutils.dll look and feel identical either way, but it's not. If I use the parent path, I get C:\Windows\System32\wbem even though I may or may not actually be looking at C:\Windows\SysWOW64\wbem.
Any sweet python magic to make this happen? I can't seem to see anything from other languages I can port. Based on my use case, I've come up with a couple hacks but they're just that. Hoping somebody has found a solution as easy as parent path that actually works in this case.
import ctypes, hashlib

k32 = ctypes.windll.kernel32
old_value = ctypes.c_void_p()

# With redirection disabled, System32 resolves to the real (64-bit) directory
k32.Wow64DisableWow64FsRedirection(ctypes.byref(old_value))
with open(r"C:\Windows\System32\wbem\wmiutils.dll", "rb") as f:
    checksum_native = hashlib.md5(f.read()).hexdigest()

# With redirection restored, you get whatever Windows normally gives this process
# (SysWOW64 for a 32-bit process, the real System32 for a 64-bit one)
k32.Wow64RevertWow64FsRedirection(old_value)
with open(r"C:\Windows\System32\wbem\wmiutils.dll", "rb") as f:
    checksum_redirected = hashlib.md5(f.read()).hexdigest()

if checksum_redirected != checksum_native:
    print("This process is being redirected to the 32-bit wmiutils.dll")
I don't have Windows Python to test this, but it should work according to https://msdn.microsoft.com/en-us/library/windows/desktop/aa365745%28v=vs.85%29.aspx.
I think an easier way would be to do a quick test such as checking whether a pointer is 8 bytes or 4 bytes. If it's 8 bytes, you're in a 64-bit process, and you can assume Windows is giving you the 64-bit version of the DLLs.
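A minimal sketch of that check:

import struct
import ctypes

# Both of these report the pointer size of the current Python process:
# 8 means a 64-bit process (no WOW64 redirection of System32),
# 4 means a 32-bit process (System32 is redirected to SysWOW64).
print(struct.calcsize("P"))
print(ctypes.sizeof(ctypes.c_void_p))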
I have data stored in a shelf file created with python 2.7
When I try to access the file from python 3.4, I get an error:
>>> import shelve
>>> population=shelve.open('shelved.shelf')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python34\lib\shelve.py", line 239, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "C:\Python34\lib\shelve.py", line 223, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "C:\Python34\lib\dbm\__init__.py", line 88, in open
    raise error[0]("db type could not be determined")
dbm.error: db type could not be determined
I'm still able to access the shelf with no problem in python 2.7, so there seems to be a backward-compatibility issue. Is there any way to directly access the old format with the new python version?
As I understand now, here is the path that led to my problem:
The original shelf was created with Python 2 in Windows
Python 2 Windows defaults to bsddb as the underlying database for shelving, since dbm is not available on the Windows platform
Python 3 does not ship with bsddb. The underlying database is dumbdbm in Python 3 for Windows.
I at first looked into installing a third party bsddb module for Python 3, but it quickly started to turn into a hassle. It then seemed that it would be a recurring hassle any time I need to use the same shelf file on a new machine. So I decided to convert the file from bsddb to dumbdbm, which both my python 2 and python 3 installations can read.
I ran the following in Python 2, which is the version that contains both bsddb and dumbdbm:
import shelve
import dumbdbm

def dumbdbm_shelve(filename, flag="c"):
    return shelve.Shelf(dumbdbm.open(filename, flag))

out_shelf = dumbdbm_shelve("shelved.dumbdbm.shelf")
in_shelf = shelve.open("shelved.shelf")

key_list = in_shelf.keys()
for key in key_list:
    out_shelf[key] = in_shelf[key]

out_shelf.close()
in_shelf.close()
So far it looks like the dumbdbm.shelf files came out ok, pending a double-check of the contents.
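For that double-check, something along these lines could be run, again under Python 2 since it has to read both formats (a sketch reusing the names from the script above):

in_shelf = shelve.open("shelved.shelf")
out_shelf = dumbdbm_shelve("shelved.dumbdbm.shelf")

# Compare key sets and values between the old and the converted shelf
assert sorted(in_shelf.keys()) == sorted(out_shelf.keys())
for key in in_shelf.keys():
    assert in_shelf[key] == out_shelf[key]

in_shelf.close()
out_shelf.close()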
The shelve module uses Python's pickle, which may require a protocol version when being accessed between different versions of Python.
Try supplying protocol version 2:
population = shelve.open('shelved.shelf', protocol=2)
According to the documentation:
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
This is most likely the protocol used in the original serialization (or pickling).
Edited: You may need to rename your database. Read on...
It seems pickle is not the culprit here. shelve also relies on anydbm (Python 2.x) or dbm (Python 3) to create/open a database and store the pickled information.
I created (manually) a database file using the following:
# Python 2.7
import anydbm
anydbm.open('database2', flag='c')
and
# Python 3.4
import dbm
dbm.open('database3', flag='c')
In both cases, it creates the same kind of database (may be distribution dependent, this is on Debian 7):
$ file *
database2: Berkeley DB (Hash, version 9, native byte-order)
database3.db: Berkeley DB (Hash, version 9, native byte-order)
anydbm can open database3.db without problems, as expected:
>>> anydbm.open('database3')
<dbm.dbm object at 0x7fb1089900f0>
Notice the lack of .db when specifying the database name, though. But dbm chokes on database2, which is weird:
>>> dbm.open('database2')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/dbm/__init__.py", line 88, in open
    raise error[0]("db type could not be determined")
dbm.error: db type could not be determined
unless I change the name of the database to database2.db:
$ mv database2 database2.db
$ python3
>>> import dbm
>>> dbm.open('database2')
<_dbm.dbm object at 0x7fa7eaefcf50>
So, I suspect a regression in the dbm module, but I haven't checked the documentation. It may be intended :-?
NB: Notice that in my case the extension is .db, but that depends on the database used by dbm by default! Create an empty shelf using Python 3 to figure out which one you are using and what it is expecting.
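A quick way to do that check (the shelf name is just an example):

# Python 3
import shelve
import dbm

db = shelve.open('probe_shelf')
db['key'] = 'value'
db.close()

# whichdb reports the backend behind a file: 'dbm.gnu', 'dbm.ndbm', 'dbm.dumb', ...
print(dbm.whichdb('probe_shelf'))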
I don't think it's possible to use a Python 2 shelf with Python 3's shelve module. The underlying files are completely different, at least in my tests.
In Python 2*, a shelf is represented as a single file with the filename you originally gave it.
In Python 3*, a shelf consists of three files: filename.bak, filename.dat, and filename.dir. Without any of these files present, the shelf cannot be opened by the Python 3 library (though it appears that just the .dat file is sufficient for opening, if not actual reading).
@Ricardo Cárdenes has given an overview of why this may be: it's likely an issue with the underlying database modules used in storing the shelved data. It's possible that the databases are backwards compatible, but I don't know, and a quick search hasn't turned up any obvious answers.
I think it's likely that some of the possible databases implemented by dbm are backwards-compatible, whereas others are not: this could be the cause of the discrepancy between answers here, where some people, but not all, are able to open older databases directly by specifying a protocol.
*On every machine I've tested, using Python 2.7.6 vs Pythons 3.2.5, 3.3.4, and 3.4.1
The berkeleydb module includes a backward-compatible implementation of the shelve object. (You will also need to install Oracle Berkeley DB.)
You just need to:
import berkeleydb
from berkeleydb import dbshelve as shelve
population = shelve.open('shelved.shelf')
In Python, how can I identify a file that is a "Windows system file"? From the command line I can do this with the following command:
ATTRIB "c:\file_path_name.txt"
If the output includes the "S" attribute, then it's a Windows system file. I cannot figure out the equivalent in Python. A few examples of similar queries look like this:
Is a file writeable?
import os

filePath = r'c:\testfile.txt'
if os.access(filePath, os.W_OK):
    print 'writable'
else:
    print 'not writable'
another way...
import os
import stat

filePath = r'c:\testfile.txt'
attr = os.stat(filePath)[0]
if not attr & stat.S_IWRITE:
    print 'not writable'
else:
    print 'writable'
But I can't find a function or enum to identify a windows system file. Hopefully there's a built in way to do this. I'd prefer not to have to use win32com or another external module.
The reason I want to do this is because I am using os.walk to copy files from one drive to another. If there was a way to walk the directory tree while ignoring system files that may work too.
Thanks for reading.
Here's the solutions I came up with based on the answer:
Using win32api:
import win32api
import win32con
filePath = r'c:\test_file_path.txt'
if not win32api.GetFileAttributes(filePath) & win32con.FILE_ATTRIBUTE_SYSTEM:
    print filePath, 'is not a windows system file'
else:
    print filePath, 'is a windows system file'
and using ctypes:
import ctypes
import ctypes.wintypes as types
# From pywin32
FILE_ATTRIBUTE_SYSTEM = 0x4
kernel32dll = ctypes.windll.kernel32
class WIN32_FILE_ATTRIBUTE_DATA(ctypes.Structure):
    _fields_ = [("dwFileAttributes", types.DWORD),
                ("ftCreationTime", types.FILETIME),
                ("ftLastAccessTime", types.FILETIME),
                ("ftLastWriteTime", types.FILETIME),
                ("nFileSizeHigh", types.DWORD),
                ("nFileSizeLow", types.DWORD)]

def isWindowsSystemFile(pFilepath):
    GetFileExInfoStandard = 0
    GetFileAttributesEx = kernel32dll.GetFileAttributesExA
    GetFileAttributesEx.restype = ctypes.c_int
    # I can't figure out the correct args here
    #GetFileAttributesEx.argtypes = [ctypes.c_char, ctypes.c_int, WIN32_FILE_ATTRIBUTE_DATA]
    wfad = WIN32_FILE_ATTRIBUTE_DATA()
    GetFileAttributesEx(pFilepath, GetFileExInfoStandard, ctypes.byref(wfad))
    return wfad.dwFileAttributes & FILE_ATTRIBUTE_SYSTEM

filePath = r'c:\test_file_path.txt'
if not isWindowsSystemFile(filePath):
    print filePath, 'is not a windows system file'
else:
    print filePath, 'is a windows system file'
I wonder if pasting the constant "FILE_ATTRIBUTE_SYSTEM" in my code is legit, or can I get its value using ctypes as well?
But I can't find a function or enum to identify a windows system file. Hopefully there's a built in way to do this.
There is no such thing. Python's file abstraction doesn't have any notion of "system file", so it doesn't give you any way to get it. Also, Python's stat is a very thin wrapper around the stat or _stat functions in Microsoft's C runtime library, which doesn't have any notion of "system file". The reason for this is that Python's file abstraction and Microsoft's C library are both designed to be "pretty much like POSIX".
Of course Windows also has a completely different abstraction for files. But this one isn't exposed by the open, stat, etc. functions; rather, there's a completely parallel set of functions like CreateFile, GetFileAttributes, etc. And you have to call those if you want that information.
I'd prefer not to have to use win32com or another external module.
Well, you don't need win32com, because this is just Windows API, not COM.
But win32api is the easiest way to do it. It provides a nice wrapper around GetFileAttributesEx, which is the function you want to call.
If you don't want to use an external module, you can always call Windows API functions via ctypes instead. Or use subprocess to run command-line tools (like ATTRIB—or, if you prefer, like DIR /S /A-S to let Windows do the recursive-walk-skipping-system-files bit for you…).
The ctypes docs show how to call Windows API functions, but it's a little tricky the first time.
First you need to go to the MSDN page to find out what DLL you need to load (kernel32), and whether your function has separate A and W variants (it does), and what values to pass for any constants (you have to follow a link to another page, and know how C enums work, to find out that GetFileExInfoStandard is 0), and then you need to figure out how to define any structs necessary. In this case, something like this:
from ctypes import *

kernel = windll.kernel32
GetFileExInfoStandard = 0
GetFileAttributesEx = kernel.GetFileAttributesExA
GetFileAttributesEx.restype = c_int
GetFileAttributesEx.argtypes = # ...
If you really want to avoid using win32api, you can do the work to finish the ctypes wrapper yourself. Personally, I'd use win32api.
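For what it's worth, one way to finish that wrapper might look like this; it's only a sketch that reuses the WIN32_FILE_ATTRIBUTE_DATA structure defined in the question and assumes the ANSI (A) variant, which takes a byte-string path:

# Sketch only: argtypes for the ANSI variant, reusing the question's struct
GetFileAttributesEx.argtypes = [c_char_p, c_int, POINTER(WIN32_FILE_ATTRIBUTE_DATA)]

FILE_ATTRIBUTE_SYSTEM = 0x4  # value from the Windows SDK headers

def is_system_file(path):
    wfad = WIN32_FILE_ATTRIBUTE_DATA()
    if not GetFileAttributesEx(path, GetFileExInfoStandard, byref(wfad)):
        raise WinError()  # raises with the last Win32 error code
    return bool(wfad.dwFileAttributes & FILE_ATTRIBUTE_SYSTEM)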
Meanwhile:
The reason I want to do this is because I am using os.walk to copy files from one drive to another. If there was a way to walk the directory tree while ignoring system files that may work too.
For that case, especially given your complaint that checking each file was too slow, you probably don't want to use os.walk either. Instead, use FindFirstFileEx, and do the recursion manually. You can distinguish files and directories without having to stat (or GetFileAttributesEx) each file (which os.walk does under the covers), you can filter out system files directly inside the find function instead of having to stat each file, etc.
Again, the options are the same: use win32api if you want it to be easy, use ctypes otherwise.
But in this case, I'd take a look at Ben Hoyt's betterwalk, because he's already done 99% of the ctypes wrapping, and 95% of the rest of the code that you want.
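As a rough illustration of the manual walk (this sketch uses win32api.FindFiles, pywin32's wrapper over the same FindFirstFile family, rather than raw FindFirstFileEx through ctypes; the tuple indices follow the WIN32_FIND_DATA layout pywin32 returns):

import os
import win32api
import win32con

def walk_skipping_system(root):
    # FindFiles returns WIN32_FIND_DATA tuples: item [0] holds the attribute
    # flags and item [8] the file name, so no extra stat call is needed.
    for info in win32api.FindFiles(os.path.join(root, '*')):
        attrs, name = info[0], info[8]
        if name in ('.', '..') or attrs & win32con.FILE_ATTRIBUTE_SYSTEM:
            continue
        path = os.path.join(root, name)
        if attrs & win32con.FILE_ATTRIBUTE_DIRECTORY:
            for sub in walk_skipping_system(path):
                yield sub
        else:
            yield path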
Simply moving the file to ~/.Trash/ will not work: if the file is on an external drive, that would move it to the main system drive.
Also, there are other conditions; for example, files on external drives get moved to /Volumes/<drive>/.Trashes/501/ (or whatever the current user's ID is).
Given a file or folder path, what is the correct way to determine the trash folder? I imagine the language is pretty irrelevant, but I intend to use Python.
Based upon code from http://www.cocoadev.com/index.pl?MoveToTrash I have come up with the following:
import os

def get_trash_path(input_file):
    path, file = os.path.split(input_file)
    if path.startswith("/Volumes/"):
        # /Volumes/driveName/.Trashes/<uid>
        s = path.split(os.path.sep)
        # s[2] is the drive name ([0] is empty, [1] is "Volumes")
        trash_path = os.path.join("/Volumes", s[2], ".Trashes", str(os.getuid()))
        if not os.path.isdir(trash_path):
            raise IOError("Volume appears to be a network drive (%s could not be found)" % (trash_path))
    else:
        trash_path = os.path.join(os.getenv("HOME"), ".Trash")
    return trash_path
Fairly basic, and there are a few things that have to be done separately, particularly checking whether the filename already exists in the trash (to avoid overwriting) and the actual move to the trash (a sketch of both is below), but it seems to cover most things (internal, external and network drives).
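For illustration, those remaining steps might look something like this (a sketch building on get_trash_path above; the renaming scheme is just one possible choice):

import os
import shutil

def move_to_trash(input_file):
    trash_path = get_trash_path(input_file)
    name = os.path.basename(input_file)
    dest = os.path.join(trash_path, name)
    base, ext = os.path.splitext(name)
    counter = 1
    # Avoid overwriting an item already in the trash by appending a counter
    while os.path.exists(dest):
        dest = os.path.join(trash_path, "%s %d%s" % (base, counter, ext))
        counter += 1
    shutil.move(input_file, dest)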
Update: I wanted to trash a file in a Python script, so I re-implemented Dave Dribin's solution in Python:
from AppKit import NSURL
from ScriptingBridge import SBApplication

def trashPath(path):
    """Trashes a path using the Finder, via OS X's Scripting Bridge."""
    targetfile = NSURL.fileURLWithPath_(path)
    finder = SBApplication.applicationWithBundleIdentifier_("com.apple.Finder")
    items = finder.items().objectAtLocation_(targetfile)
    items.delete()
Usage is simple:
trashPath("/tmp/examplefile")
Alternatively, if you're on OS X 10.5, you could use Scripting Bridge to delete files via the Finder. I've done this in Ruby code here via RubyCocoa. The gist of it is:
url = NSURL.fileURLWithPath(path)
finder = SBApplication.applicationWithBundleIdentifier("com.apple.Finder")
item = finder.items.objectAtLocation(url)
item.delete
You could easily do something similar with PyObjC.
A better way is NSWorkspaceRecycleOperation, which is one of the operations you can use with -[NSWorkspace performFileOperation:source:destination:files:tag:]. The constant's name is another artifact of Cocoa's NeXT heritage; its function is to move the item to the Trash.
Since it's part of Cocoa, it should be available to both Python and Ruby.
In Python, without using the scripting bridge, you can do this:
from AppKit import NSWorkspace, NSWorkspaceRecycleOperation
source = "path holding files"
files = ["file1", "file2"]
ws = NSWorkspace.sharedWorkspace()
ws.performFileOperation_source_destination_files_tag_(NSWorkspaceRecycleOperation, source, "", files, None)
The File Manager API has a pair of functions called FSMoveObjectToTrashAsync and FSPathMoveObjectToTrashSync.
Not sure if that is exposed to Python or not.
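It isn't wrapped by the standard library, but it can be reached through ctypes. A rough sketch, assuming the documented C signature OSStatus FSPathMoveObjectToTrashSync(const char *sourcePath, char **targetPath, OptionBits options) and noting that this File Manager API is deprecated on newer systems:

import ctypes

# Load the CoreServices framework, which exports the File Manager functions
CoreServices = ctypes.CDLL(
    "/System/Library/Frameworks/CoreServices.framework/CoreServices")

def trash_with_file_manager(path):
    # targetPath=None and options=0 accept the defaults;
    # a non-zero OSStatus return value indicates failure
    status = CoreServices.FSPathMoveObjectToTrashSync(path.encode("utf-8"), None, 0)
    if status != 0:
        raise OSError("FSPathMoveObjectToTrashSync failed with status %d" % status)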
Another one in ruby:
Appscript.app('Finder').items[MacTypes::Alias.path(path)].delete
You will need the rb-appscript gem; you can read about it here.