When creating an AWS Lambda using Python:
Can the Lambda access local imports if the modules are included in the Lambda handler zip; and
What are the implications of including the __pycache__ directories in the zip?
Question 1: Can the runtime access local imports?
The AWS documentation focuses on the Lambda handler Python file containing the handler function itself. This obviously must be included in the deployment zip. But we don't want one big function, or even several functions or classes, in one big file.
If we follow the usual Python approach of creating sub-directories containing modules and packages alongside the handler itself, and include them in the zip that is uploaded to AWS Lambda, will that code be accessible at run time, and therefore importable by the Lambda handler?
I'm not referring to the AWS Lambda support for "layers", which is normally used for providing access to packages that are installed in a virtual environment with pip etc. I (think I) understand that support and am not asking about that.
I specifically just want to clarify: can the Lambda handler import from local files, for instance an adjacent definitions.py referenced by from definitions import * (please, no judgements about star imports :-), as long as it's also in the zip?
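To make the layout concrete, here is a minimal sketch of what I mean; the file and function names are hypothetical:

# Zip layout (handler at the zip root):
#   lambda_function.py
#   definitions.py
#   helpers/
#       __init__.py
#       parsing.py

# lambda_function.py
from definitions import *           # adjacent module at the zip root
from helpers.parsing import parse   # sub-package included in the same zip

def lambda_handler(event, context):
    return parse(event)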
Question 2: Is it good practice to include the __pycache__ directories?
In the AWS Python Lambda deployment documentation, the output of a zip command shows the packages included alongside the Lambda handler Python file, including a __pycache__ directory. Additional libraries are also shown, but it seems these are intended to come from layers.
The AWS documentation shows __pycache__ being included, but makes no mention of it at all.
I believe AWS Lambda runtimes are specific versions of Python running on Amazon Linux images. I'm currently forced to develop on Windows :-(. Will this mismatch cause issues at run time? Would other considerations come into play, such as ensuring the included bytecode matches the Python version of the runtime?
Doesn't Python expect the bytecode for a given module to be in a particular location relative to its source? Presumably the top-level __pycache__ directory in the zip should only contain the handler module's own bytecode?
Does the Lambda runtime even use the __pycache__ directories? If this is workable and working, and given that a Lambda instance may only run once before being destroyed, does that imply that developers should put effort into shipping bytecode in the Lambda zip to improve performance? In that case, is it necessary to exercise enough of the code before zipping it to ensure all the bytecode is generated, or is there a more direct way (a sketch follows)?
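For what it's worth, I assume the bytecode could be generated deterministically with the standard library's compileall rather than by running tests; a minimal sketch, assuming the build machine runs the same Python minor version as the chosen Lambda runtime:

# Pre-compile everything under the staging directory before zipping.
# This writes __pycache__/<module>.cpython-3XX.pyc next to each source
# file; the version tag must match the Lambda runtime's Python.
import compileall

compileall.compile_dir("build/", quiet=1)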
Context
I have reviewed various articles on creating an AWS Lambda zip for Python, including the AWS documentation, but the content is shallow and simplistic, failing to clarify the precise "reach" of the runtime. It's not even covered in the AWS Lambda Handler Cookbook.
I do not (yet) have access to a live AWS environment to test this out, and given this broad omission in the online documentation and community commentary (most articles just parrot the AWS documentation anyway, albeit with worse grammar), I thought it would be good to get clarification on SO.
Related
In the django culture, I have encountered the concept of app reuse but not snippet reuse. Here is an example of what I mean by snippet reuse: I have a function getDateTimeObjFromString( sDateTime ): you pass it a date-time string and it returns a Python datetime object.
Back in the late 1980's or early 1990's, I was exposed to the idea of snippet reuse at a FoxPro developers conference. If you write code for a specific problem and find it is useful elsewhere in your project, move it to a project library. If you find that code is useful for other projects, move it to a generic library that can be accessed by all projects.
(At the FoxPro DevCon, they did not call it snippet reuse. I coined that term to make clear that I am referring to reuse of chunks of code smaller than an entire app. The FoxPro DevCon was long ago, I do not remember exactly what they called it.)
I read the most recent "Two Scoops of Django", and it does mention reusing snippets within a single project but I did not find any mention of the concept of snippet reuse across multiple projects.
I wrote and used getDateTimeObjFromString() long before I tackled my django app. It is in packages I keep under /home/Common/pyPacks. On my computers, I set PYTHONPATH=/home/Common/pyPacks, so every project can access the code there. The code for getDateTimeObjFromString() is under a Time subdirectory in a file named Convert.py. So to use the code in any project:
from Time.Convert import getDateTimeObjFromString
My django app downloads data from an API, and that data includes timestamps. It would be nice if the API sent python date time objects, but what you get are strings. Hence the utility of getDateTimeObjFromString().
This is just one example, there are many little functions under /home/Common/pyPacks that I found convenient to access in my django project.
Yes /home/Common/pyPacks are under version control in github and yes I deploy on any particular machine via git pull.
When working on my django project from a development computer, PYTHONPATH works and I can import the packages. But when I tried running my django app on a server via wsgi.py, PYTHONPATH is ignored. I can set PYTHONPATH at both the OS and Apache2 level, but Python ignores it and the imports fail.
I do not want to bother with making my personal generic library an official python package under PyPI.
Does the django community expect me to copy and paste?
I arrived at a workaround: make /home/Common/pyPacks a pseudo site-package by putting a "pyPks" symlink to /home/Common/pyPacks in the virtual environment's site-packages directory, adding "pyPks" to INSTALLED_APPS, then changing all the import statements as follows:
original:
from Time.Convert import getDateTimeObjFromString
work around update:
from pyPks.Time.Convert import getDateTimeObjFromString
I also had to update all my generic library files to handle both absolute imports (via PYTHONPATH) and package-relative imports, as sketched below.
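The pattern is roughly this (using one of my real module paths, though the exact shape of my code differs):

# Try the pseudo-package import first (works when the pyPks symlink is
# in site-packages), then fall back to the absolute import (works when
# PYTHONPATH points at /home/Common/pyPacks).
try:
    from pyPks.Time.Convert import getDateTimeObjFromString
except ImportError:
    from Time.Convert import getDateTimeObjFromString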
How to fix “Attempted relative import in non-package” even with __init__.py
Is there a better way to reuse snippets in a django project?
I'm struggling to understand how best to manage python packages to get zipped up for an AWS lambda function.
In my project folder I have a number of .py files. As part of my build process I zip these up and use the AWS APIs to create and publish my lambda function supplying the zip file as part of that call.
Therefore, it is my belief that I need to have all the packages my lambda is dependent on within my project folder.
With that in mind, I call pip as follows:
pip install -t . tzlocal
This seems to fill my project folder with lots of stuff, and I'm unsure whether all of it needs to be zipped up into my lambda function deployment, e.g.:
.\pytz
.\pytz-2018.4.dist-info
.\tzlocal
...
...
First question - does all of this stuff need to be zipped up into my lambda?
If not, how do I get a package that gives me just the bits I need to go into my zip file?
Coming from a .Net / Node background: with the former, I NuGet my package in and it goes into a nice packages folder containing just the .dll file I need, which I then reference in my project.
If I do need all of these files is there a way to "put" them somewhere more tidy - like in a packages folder?
Finally, is there a way to just download the binary that I actually need? I've read that the problem here is that the Lambda function will need a different binary from the one I use in my desktop development environment (Windows), so I'm not sure how to solve that problem either.
Binary libraries, used for example by numpy, should be compiled on Amazon Linux to work on lambda. I find this tutorial useful (https://serverlesscode.com/post/deploy-scikitlearn-on-lamba/). There is an even newer version of it which uses a docker container, so you do not need an EC2 instance for compilation and can do everything locally.
As for the packages: the AWS docs say to install everything at the root, but you may instead install them all in a ./packages directory and append it to the path at the beginning of the lambda handler code:
import os
import sys

# Lambda unzips the deployment package into the task root, which is also
# the working directory, so this resolves to <task root>/packages.
cwd = os.getcwd()
package_path = os.path.join(cwd, 'packages')
sys.path.append(package_path)
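For completeness: this assumes the dependencies were installed into that directory in the first place, e.g. pip install -t ./packages tzlocal, and that the packages directory is included in the zip alongside the handler.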
I'm writing a Cocoa application that uses Python to perform some calculations and data manipulation. I've got an Objective-C class that I'm using to run the Python scripts via the Python API. I can currently call Python with no problem using the API and linking to Python.framework.
I'm looking at how to package the code together now. My understanding is that the Python code would be included as part of the .app bundle, possibly in the Resources folder. I've run into py2app being discussed in many places, but it appears to be used only if your app is written wholly in Python; I don't think this is the solution to my problem. How do I properly package the code with my app? Can I ship the .pyc files instead of the .py files?
You can use py2app to compile an NSBundle which can be loaded at runtime (you could add this loadable bundle to your app bundle's PlugIns/ folder). However, while initially quite easy to get working, there appears to be a bug in PyObjC or py2app that leads to significant memory leaks depending on the API of your plugin (see http://sourceforge.net/tracker/?func=detail&aid=1982104&group_id=14534&atid=114534).
The harder but safer approach is to link against the Python.framework. You can then keep your .py files in the app bundle's Resources/ directory and load them via the standard CPython embedding API.
Don't include only the .pyc files. The pyc format is an implementation detail that you shouldn't rely upon for future Python versions.
Currently, I'm deploying a full Python distribution (the original Python 2.7 MSI) with my app, which is an embedded web server made with Delphi.
Reading this, I wonder if it is possible to embed only the necessary Python files with my app, to cut down the files shipped and avoid conflicts with other installed Python versions.
I have previous experience with Python for Delphi, so I only need to know whether shipping just the Python DLL, plus a zip of the standard library, plus my own scripts will work (and whether there are any caveats I should know about, or a sample I can look at).
zipimport should work just fine for you -- I'm not familiar with Python for Delphi, but I doubt it disables that functionality (an embedding application can do that, but it's an unusual choice). Just remember that what you can zip up and import directly are the Python-coded modules (or just their corresponding .pyc or .pyo byte codes) -- DLLs (even if renamed as .pyds;-) need to be on disk to be loaded (so if you have a zipfile with them it will need to be unzipped at the start of the app, e.g. into a temporary directory).
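A minimal sketch of the Python side, assuming a hypothetical archive name; any zip file placed on sys.path is handled transparently by zipimport:

import sys

# Put the archive of pure-Python modules (.py/.pyc) on the import path.
sys.path.insert(0, 'mylib.zip')

import mymodule  # now resolved from inside mylib.zip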
Moreover, you don't even need to zip up all modules, just those you actually need (by transitive closure) -- and you can easily find out exactly which modules those are, with the modulefinder module of the standard Python library. The example on the documentation page I just pointed to should clarify things. Happy zipping!
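And a short sketch of computing that transitive closure with modulefinder (the entry-point script name is hypothetical):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('myapp.py')  # analyze the app's entry point

# finder.modules maps module names to Module objects for everything
# reachable from the script, i.e. exactly what needs zipping.
for name in sorted(finder.modules):
    print(name)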
Yes it is possible.
I'm actually writing automation scripts in Python using zipimport. I included every .py file in my zip, as well as the configuration and xml files needed by those scripts.
Then I call a .command file targeting a __main__.py that redirects towards the desired script according to my sys.argv parameters, which is really useful!
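A sketch of what such a __main__.py dispatcher can look like (the target script names are hypothetical):

import sys

# The first sys.argv parameter selects which bundled script to run.
if len(sys.argv) > 1 and sys.argv[1] == 'backup':
    import backup_script
    backup_script.main()
elif len(sys.argv) > 1 and sys.argv[1] == 'report':
    import report_script
    report_script.main()
else:
    sys.exit('usage: run.command {backup|report}')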
I am working on a project whose main design guiding principle is extensibility.
I implemented a plugin system by defining a metaclass that registers, with a class method, the class name of any plugin that gets loaded (each type of plugin inherits from a specific class defined in the core code, as there are different types of plugins in the application). Basically this means that a developer will have to define his class as
class PieChart(ChartPluginAncestor):
    # Duck typing:
    # Implement the compulsory methods for plugins
    # extending Chart functionality.
    pass
and the main program will know of his presence because PieChart will be included in the list of registered plugins available at ChartPluginAncestor.plugins.
Since the mounting method is a class method, all plugins get registered when their class code is loaded into memory (so even before an object of that class is instantiated).
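Roughly, the mechanism looks like this (a simplified sketch with hypothetical names, not my actual code):

class PluginMount(type):
    def __init__(cls, name, bases, attrs):
        super().__init__(name, bases, attrs)
        if not hasattr(cls, 'plugins'):
            # Runs for the ancestor class itself: create the registry.
            cls.plugins = []
        else:
            # Runs for every concrete plugin as its class body is
            # executed at import time, before any instantiation.
            cls.plugins.append(cls)

class ChartPluginAncestor(metaclass=PluginMount):
    pass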
The system works good enough™ for me (although I am always open to suggestions on how to improve the architecture!) but I am now wondering what would be the best way to manage the plugin files (i.e. where and how the files containing the plugins should be stored).
So far, for development purposes, I am using a package that I called "plugins". I put all my *.py files containing plugin classes in the package directory, and I simply issue import plugins in the main.py file for all the plugins to get mounted properly.
EDIT: Jeff pointed out in the comments that with import plugins the classes contained in the various modules of the package won't be readily available (I did not realise this, as I was, for debugging purposes, importing each class separately with from plugins.myAI import AI).
However this system is only good while I am developing and testing the code, as:
Plugins might come with their own unittests, and I do not want to load those in memory.
All plugins are currently loaded into memory, but some plugins are alternative versions of the same feature, so you really just need to know that you can switch between them, while loading into memory only the one picked in the config pane.
At some point, I will want to have a double location for installing plugins: a system-wide location (for example somewhere under /usr/local/bin/) and a user-specific one (for example somewhere under /home/<user>/.myprogram/).
So my questions are really - perhaps - three:
Plugin container: what is the most sensible choice for my goal? Single files? Packages? A simple directory of .py files?
Recognise the presence of plugins without necessarily loading (importing) them: what is a smart way to use Python introspection to do so?
Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?
The question is hard to address because the needs are complex.
Anyway, I will try to offer some suggestions.
About: "Placing plugins in two different locations: is there a standard way / best practice (under gnu/linux, at least) to do that?"
A good approach is virtualenv. Virtualenv is a Python module for building "isolated" Python installations; it is a good way to keep separate projects working side by side.
You get a brand-new site-packages directory where you can put your plugins alongside the relevant project modules.
Give it a try: http://pypi.python.org/pypi/virtualenv
About: "Plugin container: what is the most sensible choice for my goal? single files? packages? a simple directory of .py files?"
A good approach is a python package which can do a "self registration" upon import: simply define a proper __init__.py inside the package directory, for example along the lines sketched below.
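A sketch of such a self-registering __init__.py, assuming each plugin module registers itself as a side effect of being imported:

# plugins/__init__.py: importing the package imports every plugin
# module it contains, which triggers their registration side effects.
import importlib
import os

_pkg_dir = os.path.dirname(__file__)
for _fname in sorted(os.listdir(_pkg_dir)):
    if _fname.endswith('.py') and _fname != '__init__.py':
        importlib.import_module(f'{__name__}.{_fname[:-3]}')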
An example can be http://www.qgis.org/wiki/Writing_Python_Plugins
and also the API described here http://twistedmatrix.com/documents/current/core/howto/plugin.html
See also http://pypi.python.org/pypi/giblets/0.2.1
Giblets is a simple plugin system based on the component architecture of Trac. In a nutshell, giblets allows you to declare interfaces and discover components that implement them without coupling.
Giblets also includes plugin discovery based on file paths or entry points, along with flexible means to manage which components are enabled or disabled in your application.
I also have a plugin system with three types of plugins, though I don't claim to have done it well. You can see some details here.
For internal plugins, I have a package (e.g., MethodPlugins) and in this package is a module for each plugin (e.g., MethodPlugins.IRV). Here is how I load the plugins:
Load the package (import MethodPlugins)
Use pkgutil.iter_modules to load all the modules there (e.g., MethodPlugins.IRV)
All the plugins descend from a common base class, so I can use __subclasses__ to identify them all (see the sketch after this list).
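A condensed sketch of those three steps; PluginBase stands in for whatever the real common base class is:

import importlib
import pkgutil

import MethodPlugins  # step 1: load the package

# Step 2: import every module found inside the package directory.
for _, mod_name, _ in pkgutil.iter_modules(MethodPlugins.__path__):
    importlib.import_module(f'MethodPlugins.{mod_name}')

# Step 3: every plugin descends from the common base, so enumerate them.
plugins = MethodPlugins.PluginBase.__subclasses__()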
I believe this would allow you to recognize plugins without actually loading them, though I don't do that as I just load them all.
For external plugins, I have a specified directory where users can put them, and I use os.listdir to find and import them. The user is required to derive from the right base class so I can identify the plugins.
I would be interested in improving this as well, but it also works good enough for me. :)