How do I cache files within a function app?


I am trying to temporarily store files within a function app. The function is triggered by an HTTP request that contains a file name. I first check whether the file is already in the function app storage and, if not, I write it into the storage.
Like this:
if local_path.exists():
    file = json.loads(local_path.read_text('utf-8'))
else:
    s = AzureStorageBlob(account_name=account_name, container=container,
                         file_path=file_path, blob_contents_only=True,
                         mi_client_id=mi_client_id, account_key=account_key)
    local_path.write_bytes(s.read_azure_blob())
    file = json.loads((s.read_azure_blob()).decode('utf-8'))
This is how I get the local path:
root = pathlib.Path(__file__).parent.absolute()
file_name = pathlib.Path(file_path).name
local_path = root.joinpath('file_directory').joinpath(file_name)
If I upload the files when deploying the function app, everything works as expected and I read the files from the function app storage. But if I try to save or cache a file at runtime, it breaks with this error:
Error: [Errno 30] Read-only file system:
Any help is greatly appreciated.

So, after a year, I discovered that Azure Function Apps can use Python's os module to save files, and that those files persist across calls to the function app. There is no need to implement any separate caching logic: simply use os.write() and it works just fine.
I did have to implement a function to clear the cache after a certain period of time, but that was a much better alternative to the verbose code I had previously.
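A minimal sketch of that idea, under one loud assumption: the writable location is taken to be the directory returned by tempfile.gettempdir(), since the deployed app directory is what raised the read-only error in the question. The names CACHE_DIR, get_cached, clear_stale and the fetch callback are hypothetical; fetch stands in for whatever returns the blob bytes (s.read_azure_blob() in the question). Files written this way are probably best treated as a best-effort cache that may not survive a restart or be shared between instances.

```python
import json
import tempfile
import time
from pathlib import Path

# Hypothetical cache folder inside the system temp directory (assumed writable).
CACHE_DIR = Path(tempfile.gettempdir()) / "file_cache"


def get_cached(file_name, fetch, cache_dir=CACHE_DIR):
    """Return parsed JSON for file_name, calling fetch() only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local_path = cache_dir / Path(file_name).name
    if local_path.exists():
        return json.loads(local_path.read_text("utf-8"))
    data = fetch()  # expected to return bytes
    local_path.write_bytes(data)
    return json.loads(data.decode("utf-8"))


def clear_stale(max_age_seconds, cache_dir=CACHE_DIR):
    """Delete cached files older than max_age_seconds; return how many were removed."""
    if not cache_dir.exists():
        return 0
    cutoff = time.time() - max_age_seconds
    removed = 0
    for path in cache_dir.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed
```

Reading the blob once and reusing the bytes for both the cache file and the parsed result also avoids the second download that the code in the question performs on a cache miss.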

Related

How to create a MoviePipelineMasterConfig instance from file path in Python?

I’m trying to build a movie render queue job and assign it a pre-saved output config. The file is saved under Content\Cinematics\MoviePipeline\Presets\myConfig.uasset.
My guess would be to use the unreal.MoviePipelineExecutorJob.set_configuration(preset) method. But how do I get an instance of MoviePipelineMasterConfig from a file path to apply to the job within a Python script?
Thanks.

So I found the answer I was looking for:
newConfig = unreal.load_asset("/Game/Cinematics/MoviePipeline/Presets/myConfig")
This loads the existing asset and returns a MoviePipelineMasterConfig instance that can be used within Python scripts.

Unit-testing: Mocking a subprocess running "aws s3 sync" with Python

My project needs to download quite a few files regularly before doing treatment on them.
I tried coding it directly in Python but it's horribly slow considering the amount of data in the buckets.
I decided to use a subprocess running aws-cli because boto3 still doesn't have a sync functionality. I know using a subprocess with aws-cli is not ideal, but it really is useful and works extremely well out of the box.
One of the perks of aws-cli is the fact that I can see the progress in stdout, which I am getting with the following code:
import os
import subprocess
from pathlib import Path

def download_bucket(bucket_url, dir_name, dest):
    """Download all the files from a bucket into a directory."""
    path = Path(dest) / dir_name
    bucket_dest = str(os.path.join(bucket_url, dir_name))
    with subprocess.Popen(["aws", "s3", "sync", bucket_dest, path], stdout=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
        for b in p.stdout:
            print(b, end='')
    # returncode is only set once the with block has waited for the process
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)
Now I want to test this function, but I am blocked because I don't know the best way to test this kind of freakish behavior:
Am I supposed to actually create a fake local s3 bucket so that aws s3 sync can hit it?
Am I supposed to mock the subprocess call and not actually call my download_bucket function?
Until now, my attempt was to create a fake bucket and to pass it to my download_bucket function.
This way, I thought that aws s3 sync would still be working, albeit locally:
def test_download_s3(tmpdir):
    tmpdir.join('frankendir').ensure()
    with mock_s3():
        conn = boto3.resource('s3', region_name='us-east-1')
        conn.create_bucket(Bucket='cool-bucket.us-east-1.dev.000000000000')
        s3 = boto3.client('s3', region_name="us-east-1")
        s3.put_object(Bucket='cool-bucket.us-east-1.dev.000000000000', Key='frankendir', Body='has no files')
        body = conn.Object('cool-bucket.us-east-1.dev.000000000000', 'frankendir').get()[
            'Body'].read().decode("utf-8")
        download_bucket('s3://cool-bucket.us-east-1.dev.000000000000', 'frankendir', tmpdir)
        #assert tmpdir.join('frankendir').join('has not files').exists()
        assert body == 'has no files'
But I get the following error:
fatal error: An error occurred (InvalidAccessKeyId) when calling the ListObjects operation: The AWS Access Key Id you provided does not exist in our records.
My questions are the following:
Should I continue to pursue this creation of a fake local s3 bucket?
If so, how am I supposed to get the credentials to work?
Should I just mock the subprocess call and how?
I am having a hard time understanding how mocking works and how it's supposed to be done. From my understanding, I would just fake a call to aws s3 sync and return some files?
Is there another kind of unit test that would be enough that I didn't think of?
After all, I just want to know if when I transmit a well-formed s3://bucketurl, a dir in that bucket and a local dir, the files contained within the s3://bucketurl/dir are downloaded to my local dir.
Thank you for your help, I hope that I am not all over the place.

A much better approach is to use moto when faking or testing S3. You can check out their documentation or look at a test example I wrote: https://github.com/pksol/pycon-go-beyond-mocks/blob/main/test_s3_fake.py.
If you have a few minutes, you can view this short video of me explaining the benefits of using moto versus trying to mock.
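For the other option raised in the question (mocking the subprocess call), here is a minimal sketch using only the standard library's unittest.mock. It assumes the goal is to check that download_bucket builds the expected aws s3 sync command and raises on a non-zero exit code; it does not verify that any file is actually downloaded. The helper run_with_fake_aws, the bucket name and the destination path are hypothetical. As a side note, aws s3 sync runs in a separate process, so moto's in-process mock_s3 presumably cannot intercept it, which would explain the InvalidAccessKeyId error.

```python
import os
import subprocess
from pathlib import Path
from unittest import mock


def download_bucket(bucket_url, dir_name, dest):
    """Same logic as the function in the question."""
    path = Path(dest) / dir_name
    bucket_dest = str(os.path.join(bucket_url, dir_name))
    with subprocess.Popen(
        ["aws", "s3", "sync", bucket_dest, str(path)],
        stdout=subprocess.PIPE, bufsize=1, universal_newlines=True,
    ) as p:
        for line in p.stdout:
            print(line, end="")
    if p.returncode != 0:
        raise subprocess.CalledProcessError(p.returncode, p.args)


def run_with_fake_aws(returncode):
    """Call download_bucket with Popen replaced by a fake; return the command it built."""
    with mock.patch("subprocess.Popen") as popen:
        # The object bound by "with subprocess.Popen(...) as p"
        proc = popen.return_value.__enter__.return_value
        proc.stdout = ["download: s3://bucket/dir/a.txt to dir/a.txt\n"]
        proc.returncode = returncode
        proc.args = ["aws", "s3", "sync"]
        download_bucket("s3://bucket", "dir", "/tmp/dest")
        return popen.call_args[0][0]  # first positional argument: the command list
```

In a pytest test, run_with_fake_aws(0) can be checked against the expected command, and run_with_fake_aws(1) can be wrapped in pytest.raises(subprocess.CalledProcessError).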

Using the paste "call" scheme when starting a Pyramid application

I have a Pyramid application which I can start using pserve some.ini. The ini file contains the usual paste configuration and everything works fine. In production, I use uwsgi, having a paste = config:/path/to/some.ini entry, which works fine too.
But instead of reading my configuration from a static ini file, I want to retrieve it from some external key-value store. Reading the paste documentation and source code, I figured out that there is a call scheme, which calls a Python function to retrieve the "settings".
I implemented a get_conf method and tried to start my application using pserve call:my.module:get_conf. If the module or method does not exist, I get an appropriate error, so the method seems to be used. But whatever I return from the method, I end up with this error message:
AssertionError: Protocol None unknown
I have no idea what return value of the method is expected and how to implement it. I tried to find documentation or examples, but without success. How do I have to implement this method?

While this is not the answer to your exact question, I think it is the answer to what you want to do. When Pyramid starts, the variables from your ini file are simply parsed into the settings object that is set on your registry, and the rest of your app accesses them through the registry. So if you want to get settings from somewhere else (say env vars, or some other third-party source), all you need to do is build yourself a factory component for getting them and use it in the server startup method that typically lives in your base __init__.py file. You don't need to get anything from the ini file if that's not convenient, and if you don't, it doesn't matter how you deploy. The rest of your app doesn't need to know where the settings came from.
Here's an example of how I do this for getting settings from env vars, because I have a distributed app with three separate processes and I don't want to be mucking about with three sets of ini files (instead I have one file of env vars that doesn't go in git and gets sourced before anything gets turned on):
import os
from pyramid.config import Configurator

# the method that runs on server startup, no matter
# how you deploy.
def main(global_config, **settings):
    """ This function returns a Pyramid WSGI application."""
    # settings has your values from the ini file
    # go ahead and stick things in it from any source you'd like
    settings['db_url'] = os.environ.get('DB_URL')
    config = Configurator(
        settings=settings,
        # your other configurator args
    )
    # you can also just stick things directly on the registry
    # for other components to use, as everyone has access to
    # request.registry.
    # here we look in an env var and fall back to the ini file
    amqp_url = os.environ.get('AMQP_URL', settings['amqp.url'])
    config.registry.bus = MessageClient(amqp_url=amqp_url)
    # rest of your server start up code.... below

unable to delete temporary file with python

I am using Django views, and I create a temp_dir using tempfile.gettempdir().
I write a gzipped text file in there, and then scp the file elsewhere. When these tasks are complete I try to delete the temp_dir.
if os.path.exists(temp_dir):
    shutil.rmtree(temp_dir)
However, occasionally I get this error back:
Operation not permitted: '/tmp/.ICE-unix'
Any ideas what this error means and how to best handle this situation?

tempfile.gettempdir() does not create a temp directory; it returns your system's standard tmp directory. DO NOT DELETE IT! That will blow everybody's temp files away. You can delete the file you created inside the temp dir, or you can create your own temp dir, but leave this one alone.
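A minimal sketch of the "create your own temp dir" option, using tempfile.mkdtemp() so that only a private directory is ever removed. The function name write_gzip_in_private_dir, the "myapp-" prefix and the file name are hypothetical, and the scp step from the question is left as a placeholder comment.

```python
import gzip
import os
import shutil
import tempfile


def write_gzip_in_private_dir(text, name="payload.txt.gz"):
    """Create a fresh private temp directory and write a gzipped text file into it."""
    temp_dir = tempfile.mkdtemp(prefix="myapp-")  # new directory, owned by this process
    path = os.path.join(temp_dir, name)
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        fh.write(text)
    return temp_dir, path


temp_dir, gz_path = write_gzip_in_private_dir("some report text")
try:
    pass  # copy gz_path elsewhere here (for example with scp)
finally:
    # Safe: this removes only the directory created above, never the shared /tmp.
    shutil.rmtree(temp_dir)
```

tempfile.TemporaryDirectory() used as a context manager does the same cleanup automatically when the block exits.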

The value for temp_dir is taken from the OS environment variables, and apparently some other process is also using it to create files. The other file might be in use/locked and that will prevent you from deleting it.
Q: What is /tmp/.ICE-unix?
A: It's a directory where X Window System session information is saved.

I am no expert, but try running the Python program (or whatever you're using to do this) as an administrator; it will then most likely be allowed to perform the deletion.

How do I queue FTP commands in Twisted?

I'm writing an FTP client using Twisted that downloads a lot of files and I'm trying to do it pretty intelligently. However, I've been having the problem that I'll download several files very quickly (sometimes ~20 per batch, sometimes ~250) and then the downloading will hang, only to eventually have connections time out and then the download and hang start all over again. I'm using a DeferredSemaphore to only download 3 files at a time, but I now suspect that this is probably not the right way to avoid throttling the server.
Here is the code in question:
def downloadFiles(self, result, directory):
    # make download directory if it doesn't already exist
    if not os.path.exists(directory['filename']):
        os.makedirs(directory['filename'])
    log.msg("Downloading files in %r..." % directory['filename'])
    files = filterFiles(None, self.fileListProtocol)
    # from http://stackoverflow.com/questions/2861858/queue-remote-calls-to-a-python-twisted-perspective-broker/2862440#2862440
    # use a DeferredSemaphore to limit the number of files downloaded simultaneously from the directory to 3
    sem = DeferredSemaphore(3)
    jobs = [sem.run(self.downloadFile, f, directory) for f in files]
    d = gatherResults(jobs)
    return d

def downloadFile(self, f, directory):
    filename = os.path.join(directory['filename'], f['filename']).encode('ascii')
    log.msg('Downloading %r...' % filename)
    d = self.ftpClient.retrieveFile(filename, FTPFile(filename))
    return d
You'll notice that I'm reusing an FTP connection (active, by the way) and using my own FTPFile instance to make sure the local file object gets closed when the file download connection is 'lost' (i.e. completed). Looking at FTPClient, I wonder if I should be using queueCommand directly. To be honest, I got lost following the retrieveFile command to _openDataConnection and beyond, so maybe it's already being used.
Any suggestions? Thanks!

I would go with queueCommand, as you suggested. I suspect the semaphore you're using is what is causing your issues. I believe using queueCommand will limit your FTPClient to a single active connection (though I'm just speculating), so you may want to create a few FTPClient instances and pass download jobs to them if you want things to go quickly. If you use queueStringCommand, you get a Deferred that you can use to determine how far along each client is, and you can even add another job to that client's queue in the callback.
