Choosing Python environment for Python execution in Openrefire

Choosing Python environment for Python execution in Openrefire - python

I recently install Openrefine and it's great, especially enjoying the python execution option.
Within the python execution, one can import additional packages, this can be seen in this example where the random package is imported.
Example of Openrefine python execution which returns random word out of the first 50 words
Now, I want to use a special package within the Openrefine tool, which is installed on one of my Conda environments. can I activate a particular Conda env that will be executed in Openrefine tool?

TL;DR: just wrap your Python package with FastAPI and communicate via HTTP requests.
OpenRefine and Jython
OpenRefine is using Jython, a Java implementation of Python.
Therefore, you can not "just" activate a conda environment, but you have to provide a Jython compatible package.
There is a tutorial in the OpenRefine wiki describing how to extend Jython with PyPi modules.
Please note that currently 2.7 is the newest Jython implementation. Jython 3 is still in it's planing and development phase. See Jython 3 Roadmap for details. This makes it difficult to use external libraries, as Python 2 had its end of life on 01.01.2020 and accordingly (most) libraries stopped supporting Python 2.
Also, some Python packages rely on C libraries that are not compatible with Jython. Check the Appendix A of the Definitive Guide to Jython for more details on using external tools and libraries.
Alternative solution using FastAPI
Personally I find it easier to just wrap the Python packages I want to use with FastAPI and communicate from OpenRefine via HTTP requests. Depending on your data you can then add new columns by fetching URLs or use GET/POST requests in Jython.
I recently created a GitHub Gist showing how to wrap the NER component of a spaCy model with FastAPI to be then used via OpenRefine.

Related

Which Python versions do I need to support as the upstream author of a Python library for Debian?

I'm the upstream- and Debian maintainer of a Python library which is installed as a dependency of a required package in the Debian ecosystem, so this library is pretty much installed on every Debian system. I'm trying to support older Python versions as long as possible and my rule of thumb for this package is to support versions down to whatever is currently in Debian/stable. However, there is also Debian/oldstable (even /oldoldstable) and I wonder if there are some guidelines that help to make the decision which Python versions can be dropped which should be supported still?
The most relevant documentation for the issue is probably the Debian Python Policy, but I'm unable to find the information here.

As far as Debian is concerned, the package you publish should support the currently supported Python versions. So if you published a package when oldstable was still stable, the package you published then should have supported the oldstable Python versions. When you publish something now, the current stable is your target.
But going back in time, there is no particular reason for an oldstable user to be disappointed if some current packages are not usable on their platform; the decision to use an old system is usually motivated by a desire to maintain it for a longer time, not make it do new things which were not possible at the time it was configured.
Of course, there are situations where you want to support older systems. If you publish a tool for forensic analysis of hacked systems, you really want to be able to run that on older versions, too; but this is dictated by other factors then.

Multiple Python Version Env Setup

I'm working in the VFX industry and we deal with different software packages that ship with their own Python interpreter. We are running on Linux and use modules to handle our environments to make sure that people are using the correct version of all applications depending on the project they are working on.
Since months, we are trying to setup an environment that supports multiple versions of Python. And what is blocking right now are additional Python libraries that we are using in our in-house tools, like sqlalchemy, psycopg2, openpyxl, zmq, etc.
So far, for each project, we have config file that defines the version of each package to be use including the additional Python modules. And to use the correct Python version of each Python module, we look up the main Python interpreter defined in that same modules definition file. This works as long as the major and minor versions of all Python interpreters do line up.
But now, I would like to start an application that ships with a Python 3.7 interpreter and another application with a Python 3.9 interpreter and so on. All applications do you use our in-house tools which need the additional Python modules. Of course, this fails when trying to import any additional module.
For now, the only solution that I see is to install the corresponding Python modules in the 'site-packages' of each application that comes with its own Python interpreter. That should work. But this means that we have to install for each application version all necessary Python modules (ideally, the same version of it to avoid compatibility issues) and when we decide to update one of them, this needs to be done again for all 3rd party applications.
That does not sound super-efficient to me.
Do you have similar experiences and what did you came up with to handle this? I know, that there are more advanced packages like rez to handle complex environment setups, but although I do not know the details of rez, I could imagine that the problems stays the same. I guess that it is not possible to globally populate PYTHONPATH with additional modules so that it works on multiple Python interpreter versions.
Another solution that I could imagine is to make sure that on startup of each application that needs additional Python modules, we do our own sys.path modification depending on the interpreter version. That would imply some coding but we could keep a global version handling without installing them everywhere.
Anyway, if you have any further hints, please let me know.
Greets,
Carlo

Using python project in c# application

I'm looking for a way to call my python project and display the console on my c# app.
My python project is a bit particular, i use some specific librairy, i even have to install a package whl not avalible with pip, avalible only with python3.7(not with 3.8) etc. So to run the project, i need python 3.7 exactly.
The problem is also that i want to deploy my c# app in ClickOnce to be use by clients, without having them to install a local python version.
I've seen on the net two ways to do work with python and c#, that both doesn't seem working for my need.
using python in c# app
1.Call python in a shell
I've imported the python project in my c# app, i've called the python.exe avalible in the venv and i've deployed the app. Everithing seems to work, but i discover that python executable in the venv refer to the local python installation and doesn't seem autonomus. So it was only working for me and not the clients.
Use IronPython in c#
Really far from working, starting by the encoding blowing from everywhere, even in imported librairy like numpy. I've referenced my venv librairy in SetSearchPaths(), and even with that, it's doesn't seem to work.
Any suggestions? The best way i think, would be to have a python.exe independant following the project that can load my venv.

Python.Included and the Numpy.NET package made with it may work for this:
https://github.com/henon/Python.Included
https://github.com/SciSharp/Numpy.NET
They are introduced in this post from 2019 and seem to be in active development today, https://medium.com/scisharp/using-python-libraries-in-net-without-a-python-installation-11124d6190cf
That solution does not use IronPython, but Python.NET: https://github.com/pythonnet/pythonnet
IronPython is an implementation of Python in C#, championed by others and then Microsoft itself back in 2010 era. Later Microsoft dropped it and largely the whole notion of supporting dynamic languages on .NET (they made a DLR system for it back then). It is very cool and works for pure Python code.
But NumPy and many useful and popular Python modules are written in C, using the C API of Python.org 's default C written Python implementation, aka. CPython. This is what Microsoft decided to back too instead, because C written Python modules don't work (easily and well) with IronPython. Also IronPython remains at Python 2.7.
Python.NET just bridges the .NET land with the normal CPython interpreter, so that you can call code cross the language boundary. So Numpy and everything works the same as with Python usually. Which is what you want.
Python.Included is one way to deploy that in C# projects - which may or may not work for you, but at least provides a starting point:
Python.Included is an automatic deployment mechanism for .NET packages
which depend on the embedded Python distribution. This allows
libraries depending on Python and/or Python packages to be deployed
via Nuget without having to worry about any local Python
installations.
It packages embedded Python (python-3.7.3-embed-amd64.zip) in its .NET
assembly and automatically deploys it in the user's home directory
upon first execution. On subsequent runs, it will find Python already
deployed and therefor doesn't install it again.
If you don't want to use that install mechanism, I think you can just bundle the CPython interpreter with your C# application and use the Python.NET mechanism to call that from your app's directory instead.

Python library deployment

I have created a python library using sklearn and some other dependencies. I want other developers to be able to use it in their programs, in a non-public environment(e.g.within a organization) They will use this library for to write their own applications.
Some questions that I have are -
What is a best way to make it available to other developers?
Let's say , the developers have their own python installation, and they use a version 1.x of a package(e.g. sklearn etc) but my
package uses 2.x, will there be a problem? If yes, how can i ensure they
can use my library.
I want to make my library available for both Python 2.7 and 3.x users. Do I need two different deployments? Currently my library
works(no version specific calls for 2.7/3.x) in both 2.7 and 3.x, if
the correct dependencies are pre-installed by the user.

The best way is to publish at PyPI. That way your user have ti just run pip install $LIB and got it with all dependencies (if you properly configured dependencies). See Python Packaging User Guide.
Just recommend your users to use virtualenv. Virtual environments are the way to separate and install different versions of Python libraries and programs to coexist at one system.
Very much depends on the nature of your library. There are libraries that can be installed to both Python 2 and 3 from one source and there are libraries that require different package for every python version.

Embed Python's grpcio module into a Bazel project

I tried several different ways to embed the Python grpcio module into my Bazel project but unfortunately, none of them works correctly.
As far as I know, Bazel does not support injection of plugins into the Python environment (so you can directly run import grpcio). Note that Bazel does not support virtual-env and other tools (buildout...).
I found a way to include Pypi packages thanks to this proof of concept but unfortunately it does not work for grpc.io (environment variables are missing).
I am trying to debug the plugin, but I wonder if there is a better way to include grpcio module since the code is based on Bazel?

As Nathaniel mentioned in the comments, bazel support for gRPC Python is still work in progress.
However, pubref https://github.com/pubref/rules_protobuf offers rules for bazel that support gRPC for all languages (including Python). I have used them for Java and they worked very well.
There is also a gRPC blog post about those rules: http://www.grpc.io/blog/bazel_rules_protobuf

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.