Patching over a local JSON file in unit testing - Python

I have some Python code that loads in a local JSON file:
with open("/path/to/file.json") as f:
    json_str = f.read()
# Now do stuff with this JSON string
In testing, I want to patch that JSON file to be a JSON file located in my repo's test directory ("/path/to/repo/test/fake_file.json").
How can I go about doing that?
One other requirement is I actually have a version of "/path/to/file.json" locally, but I don't want to change it. I want it patched over at test time, and unpatched upon test completion.
Note: I use pytest, and it seems like the plug-in pyfakefs would do this. Sadly, I can't figure out how to get it to patch in another local file (from within my repo's test directory). I am open to solutions using vanilla Python 3.10+ and/or pyfakefs.

With pyfakefs, you can map real files into the fake file system. In your case, you can use add_real_file:
import os

def test_json(fs):
    fs.add_real_file("/path/to/repo/test/fake_file.json",
                     target_path="/path/to/file.json")
    assert os.path.exists("/path/to/file.json")
This will map your existing file to target_path in the fake file system (if target_path is not given, the file is mapped to the same path as the source file).
It does not matter whether a real file exists at that location, as the real file system is ignored inside the fake file system. If you read "/path/to/file.json" in your test code, it will actually read "/path/to/repo/test/fake_file.json" (mapped files are only read on demand).
Note that by default the file is mapped read-only, so if you want to change it in your tested code, you have to pass read_only=False in the mapping call. This makes the file in the fake file system writable, though writing to it will not touch the file in the real file system, of course.
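For the scenario in the question, a complete test could look like this (a minimal sketch reusing the question's paths; the fs fixture swaps in the fake file system for the duration of the test and restores the real one afterwards, which covers the "unpatched upon test completion" requirement):

def test_reads_patched_json(fs):
    fs.add_real_file("/path/to/repo/test/fake_file.json",
                     target_path="/path/to/file.json")
    # The code under test is unchanged: it opens the original path,
    # but pyfakefs serves the contents of fake_file.json instead.
    with open("/path/to/file.json") as f:
        json_str = f.read()
    assert json_str  # holds the contents of fake_file.json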
Disclaimer:
I'm a contributor to pyfakefs.

Related

A file that, when run, creates a replica of itself and deletes the original file (Python)

I tried creating code where the file, when run, creates a replica of itself and deletes the original file.
Here is my code:
import shutil
import os
loc = os.getcwd()
shutil.move("./aa/test.py", loc, copy_function=shutil.copy2)
But the issue with this is that:
this code is only usable once; to use it again, I need to change the name of the file or delete the newly created file and then run it again.
Also, if I run it inside a folder, it will always create the new file outside the folder (one directory up from the executing program).
How do I fix this?
Some Notes:
The copy should be made at the exact place where the original file was.
The folder was empty, containing just this file. The file doesn't need to be in a folder; I just used one as a test instance.
Yes, I understand that if I delete the original file it should stop working. I actually have a picture in my mind of how it should work:
First, a new file with the exact same content will be made in the same path as the original file (with a different name, probably).
Then, the original file will be deleted, and the second file (which is the copy of the original file) will be renamed to the exact name and extension of the original file which got deleted.
This should repeat every time I run the .py file (containing this code), thus making the code portable and suitable for multiple uses.
Maybe the code to be executed after the file deletion can be stored in the memory cache (I guess?).
Easiest way (in pseudo code; a Python sketch follows the list):
Get name of current script.
Read all contents in memory.
Delete current script.
Write memory contents into new file with the same name.
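In Python, that pseudocode translates almost line for line (a minimal sketch; it assumes the code runs from a .py file so that __file__ is available):

import os

script = os.path.abspath(__file__)   # 1. name of the current script
with open(script, "rb") as f:
    contents = f.read()              # 2. read all contents into memory
os.remove(script)                    # 3. delete the current script
with open(script, "wb") as f:
    f.write(contents)                # 4. write the contents into a new file with the same name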
this code is only usable once; to use it again, I need to change the name of the file or delete the newly created file and then run it again.
That is of course because the file is now called differently. You could approach this by having no other files in that folder, or by always prefixing the filename in the same way, so that you can find the file even though its name changes each time.
Also, if I run it inside a folder, it will always create the new file outside the folder (one directory up from the executing program).
That is because you move it from ./aa to ./. You can take the path of the file and reuse it, apart from the filename, and then the copy would end up in the same folder.
Hey TheKaushikGoswami,
I believe your code does exactly what you told it to and, as everybody before me stated, it surely only works once. :)
I would like to throw in another idea:
First off, I'd say that shutil.move is really a method for moving a file into another directory, which is what you did by accident.
https://docs.python.org/3/library/shutil.html#shutil.move
So why not parameterize your folder first (makes it easier for GUI/cmd access) and then just copy to a temporary file and copy back from that temporary file? That way you won't get caught in errors raised when you try to create files that already exist, and you have easy-to-understand code!
Like so:
import shutil
import os

try:
    os.mkdir('./aa/')
except FileExistsError:
    print('Folder already exists!')

dest = './aa/'
file = 'test.py'
copypath = dest + 'tmp' + file
srcpath = dest + file

shutil.copy2(srcpath, copypath, follow_symlinks=True)
os.remove(srcpath)
shutil.copy2(copypath, srcpath, follow_symlinks=True)
os.remove(copypath)
But may I ask what your use case is for that, since it really doesn't change anything for me other than recreating the exact same file?

How to save a pickle file in Python to a physical folder

I am new to using pickle in Python.
This is how I dump data into a pickle file:
import pickle

result = "123"
pickle.dump(result, open("result.p", "wb"))
I am able to read this file using
pickle.load(open("result.p", "rb"))
Now what I am not able to see is the physical pickle file result.p after I dump data into it.
My question is: where does this pickle file get stored after the serialized data is dumped into it?
You can do one of two things:
1) Find out what the working directory for your program / Python interpreter session is. This is where the file is created.
One of the ways to do it (from the Python interpreter):
import os
os.getcwd()
2) Provide the full path to the open function instead of only the filename, in which case the file will be created where you specify.
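For example (a sketch; "/tmp/result.p" is just an illustrative path):

import os
import pickle

print(os.getcwd())  # option 1: see where a bare "result.p" ends up

with open("/tmp/result.p", "wb") as f:  # option 2: explicit full path
    pickle.dump("123", f)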

Render hg.tar into in-memory tree-structure after getting back from MongoDB

I am working on a project where I have to store .hg directories. The easiest way is to pack the .hg into hg.tar. I save it in MongoDB's GridFS filesystem.
If I go with this plan, I have to read the tar out.
import io
import tarfile

repo = get_repo(saved_repo.id)
ios = io.BytesIO()
ios.write(repo.hgfile.read())
ios.seek(0)
tar = tarfile.open(mode='r', fileobj=ios)
members = tar.getmembers()
# for info in members:
#     tar.extract(info.name, '/tmp')
for file in members:
    print(file.name, file.isdir())
This code works; I can get all the file and directory names as the loop runs.
My question is: how do I extract this tar into a valid, filesystem-like directory structure? I can .extractfile each member individually into memory, but if I want to feed it into the Mercurial API, I probably need the entire .hg directory in memory as a single tree, just as it exists on the filesystem.
Thoughts?
Mercurial has a concept called opener that's used to abstract filesystem access. I first looked at http://hg.intevation.org/mercurial/crew/file/tip/mercurial/revlog.py to see if you can replace the revlog class (which is the base class for changelog, manifest log and filelogs), but recent versions of Mercurial also have a VFS abstraction layer. It can be found in http://hg.intevation.org/mercurial/crew/file/8c64c4af21a4/mercurial/scmutil.py#l202 and is used by the localrepo.localrepository class for all file access.
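If all you need is a filesystem-like tree in memory, independent of Mercurial's opener/VFS machinery, a generic approach is to unpack every regular file into a dict keyed by path (a sketch in Python 3; the function name is illustrative):

import io
import tarfile

def tar_to_tree(tar_bytes):
    # Returns a {path: bytes} mapping of the tar's regular files,
    # i.e. an in-memory stand-in for the unpacked directory tree.
    tree = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode='r') as tar:
        for member in tar.getmembers():
            if member.isfile():
                tree[member.name] = tar.extractfile(member).read()
    return tree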

Store user-defined data after it is entered

I am making a Python program, and I want to check whether it is the user's first time running the program (firstTime == True). After it has run, however, I want to permanently change firstTime to False. (There are other variables I want to take input for that should persist after the first run, but those should be solved the same way.)
Is there a better way than just reading from a file that contains the data? If not, how can I find where the file is being run from (so the data will be in the same dir)?
If you want to persist data, it will eventually end up in disk files (there might be intermediate steps, e.g. via a network or database system, but if the data is to be persistent, it will eventually live somewhere in disk files).
To "find out where you are",
import os
print(os.path.dirname(os.path.abspath(__file__)))
There are variants, but this is the basic idea. __file__ in any .py script or module gives the file path in which that file resides (won't work on the interactive command line, of course, since there's no file involved then;-).
The os.path module in Python's standard library has many useful functions to manipulate path strings -- here, we're using two: abspath to give an absolute (not relative) version of the file's path, so you don't have to care about what your current working directory is; and dirname to extract just the directory name (actually, the whole directory path;-) and drop the filename proper (you don't care if the module's name is foo.py or bar.py, only what directory it is in;-).
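Putting the two together, a typical usage sketch (the filename state.dat is just an example):

import os

here = os.path.dirname(os.path.abspath(__file__))
state_path = os.path.join(here, "state.dat")  # data file next to the script
first_time = not os.path.exists(state_path)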
It is enough to just create a file in the same directory the first time the program is run (of course that file can be deleted to trigger the first-run behavior again, but that can sometimes be useful):
import os

firstrunfile = 'config.dat'
if not os.path.exists(firstrunfile):
    ## configuration here
    open(firstrunfile, 'w').close()  ## .write(configuration)
    print('First run')
    firstTime = True
else:
    print('Not first run')
    ## read configuration
    firstTime = False

How to safely write to a file?

Imagine you have a library for working with some sort of XML file or configuration file. The library reads the whole file into memory and provides methods for editing the content. When you are done manipulating the content you can call a write to save the content back to file. The question is how to do this in a safe way.
Overwriting the existing file (starting to write to the original file) is obviously not safe. If the write method fails before it is done, you end up with a half-written file and you have lost data.
A better option would be to write to a temporary file somewhere, and when the write method has finished, you copy the temporary file to the original file.
Now, if the copy somehow fails, you still have correctly saved data in the temporary file. And if the copy succeeds, you can remove the temporary file.
On POSIX systems I guess you can use the rename system call which is an atomic operation. But how would you do this best on a Windows system? In particular, how do you handle this best using Python?
Also, is there another scheme for safely writing to files?
If you look at Python's documentation, it mentions that os.rename() is an atomic operation (on POSIX systems). So in your case, writing data to a temporary file and then renaming it to the original file would be quite safe.
Another way could work like this:
let original file be abc.xml
create abc.xml.tmp and write new data to it
rename abc.xml to abc.xml.bak
rename abc.xml.tmp to abc.xml
after new abc.xml is properly put in place, remove abc.xml.bak
As you can see, you keep abc.xml.bak around, which you can use to restore from if there are any issues with the tmp file or with moving it into place.
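A minimal sketch of that scheme (same filenames as above; illustrative rather than hardened against every failure mode):

import os

def safe_write(path, data):
    tmp = path + '.tmp'
    bak = path + '.bak'
    with open(tmp, 'w') as f:   # create abc.xml.tmp and write new data to it
        f.write(data)
    os.rename(path, bak)        # abc.xml -> abc.xml.bak
    os.rename(tmp, path)        # abc.xml.tmp -> abc.xml
    os.remove(bak)              # new abc.xml is in place, remove the backup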
If you want to be POSIXly correct and safe, you have to (see the sketch after these steps):
Write to temporary file
Flush and fsync the file (or fdatasync)
Rename over the original file
Note that calling fsync has unpredictable effects on performance -- Linux on ext3 may stall for disk I/O whole numbers of seconds as a result, depending on other outstanding I/O.
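Those three steps translate directly into Python (a sketch; it assumes the temporary file sits on the same filesystem as the target, so the rename cannot fail for crossing devices):

import os

def posix_safe_write(path, data):
    tmp = path + '.tmp'
    with open(tmp, 'w') as f:
        f.write(data)             # 1. write to a temporary file
        f.flush()                 # 2. flush Python's userspace buffers...
        os.fsync(f.fileno())      #    ...and force the data to disk
    os.rename(tmp, path)          # 3. rename over the original file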
Notice that rename is not an atomic operation in POSIX -- at least not in relation to file data as you might expect. However, most operating systems and filesystems will work this way. But it seems you missed the very large Linux discussion about Ext4 and filesystem guarantees about atomicity. I don't know exactly where to link, but here is a start: ext4 and data loss.
Notice, however, that on many systems rename will be as safe in practice as you expect. Still, it is in a way not possible to get both performance and reliability across all possible Linux configurations!
With a write to a temporary file, then a rename of the temporary file, one would expect the operations to be dependent and executed in order.
The issue, however, is that most, if not all, filesystems separate metadata and data. A rename is only metadata. It may sound horrible to you, but filesystems value metadata over data (take journaling in HFS+ or Ext3/4, for example)! The reason is that metadata is lighter, and if the metadata is corrupt, the whole filesystem is corrupt -- the filesystem must of course preserve itself first, then the user's data, in that order.
Ext4 did break the rename expectation when it first came out, though heuristics were added to resolve it. The issue is not a failed rename, but a successful one: Ext4 might successfully register the rename but fail to write out the file data if a crash comes shortly thereafter. The result is then a 0-length file and neither the original nor the new data.
So in short, POSIX makes no such guarantee. Read the linked Ext4 article for more information!
In the Win API I found the quite nice function ReplaceFile, which does what the name suggests, and even supports an optional backup. There is always the DeleteFile/MoveFile combo as well.
In general, what you want to do is really good, and I cannot think of any better write scheme.
A simplistic solution: use tempfile to create a temporary file, and if writing succeeds, just rename the file to your original configuration file.
Note that rename is not atomic across filesystems. You'll have to resort to a slight workaround (e.g. a temp file on the target filesystem, followed by a rename) in order to be really safe.
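A sketch along those lines, creating the temporary file in the target's own directory so the final rename never crosses filesystems (illustrative; the error handling is deliberately minimal):

import os
import tempfile

def atomic_write(path, data):
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename; also replaces an existing file on Windows
    except BaseException:
        os.remove(tmp)
        raise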
For locking a file, see portalocker.
The standard solution is this.
Write a new file with a similar name. X.ext# for example.
When that file has been closed (and perhaps even read back and checksummed), do two renames:
X.ext (the original) to X.ext~
X.ext# (the new one) to X.ext
(Only for the crazy paranoids) call the OS sync function to force dirty buffer writes.
At no time is anything lost or corruptible. The only glitch can happen during the renames, but even then you haven't lost or corrupted anything: the original is recoverable right up until the final rename.
Per RedGlyph's suggestion, I've added an implementation of ReplaceFile that uses ctypes to access the Windows APIs. I first added this to jaraco.windows.api.filesystem.
from ctypes import windll
from ctypes.wintypes import BOOL, DWORD, LPVOID, LPWSTR

ReplaceFile = windll.kernel32.ReplaceFileW
ReplaceFile.restype = BOOL
ReplaceFile.argtypes = [
    LPWSTR,
    LPWSTR,
    LPWSTR,
    DWORD,
    LPVOID,
    LPVOID,
]

REPLACEFILE_WRITE_THROUGH = 0x1
REPLACEFILE_IGNORE_MERGE_ERRORS = 0x2
REPLACEFILE_IGNORE_ACL_ERRORS = 0x4
I then tested the behavior using this script.
from jaraco.windows.api.filesystem import ReplaceFile
import os
open('orig-file', 'w').write('some content')
open('replacing-file', 'w').write('new content')
ReplaceFile('orig-file', 'replacing-file', 'orig-backup', 0, 0, 0)
assert open('orig-file').read() == 'new content'
assert open('orig-backup').read() == 'some content'
assert not os.path.exists('replacing-file')
While this only works in Windows, it appears to have a lot of nice features that other replace routines would lack. See the API docs for details.
There's now a codified, pure-Python, and I dare say Pythonic solution to this in the boltons utility library: boltons.fileutils.atomic_save.
Just pip install boltons, then:
from boltons.fileutils import atomic_save

with atomic_save('/path/to/file.txt') as f:
    f.write('this will only overwrite if it succeeds!\n')
There are a lot of practical options, all well-documented. Full disclosure, I am the author of boltons, but this particular part was built with a lot of community help. Don't hesitate to drop a note if something is unclear!
You could use the fileinput module to handle the backing-up and in-place writing for you:
import fileinput

for line in fileinput.input(filename, inplace=True, backup='.bak'):
    # inplace=True causes the original file to be moved to a backup;
    # standard output is redirected to the original file.
    # backup='.bak' specifies the extension for the backup file.

    # manipulate line
    newline = process(line)
    print(newline)
If you need to read in the entire contents before you can write the new lines, you can do that first, then print the entire new contents with:
newcontents = process(contents)
for line in fileinput.input(filename, inplace=True, backup='.bak'):
    print(newcontents)
    break
If the script ends abruptly, you will still have the backup.
