I'm trying to implement the Python DB-API for a small "database" that we built internally. This database does not expose an ODBC interface (or JDBC, for that matter). My goal is to create a SQLAlchemy dialect for it so that I can use it with an application like Superset, for example. I have created JDBC drivers in the past, and that requires a full Java implementation of the methods from the interfaces. In the case of Python's DB-API, I couldn't find any example. Even the one I saw, psycopg2 (https://github.com/psycopg/psycopg2), is largely written in C, and I'm not an expert on C.
Is there any way to implement the DB-API in Python only? Are there any examples available? (Sorry if my understanding of the DB-API is not correct.)
You can find plenty of DB-API drivers written in pure Python. The specific libraries you need depend on how your database communicates and how it packs/unpacks data.
If your database is listening on a port, you'll probably be using the socket module.
If you're doing any kind of batch inserting or unpacking, you'll want to check out the struct module as well.
If you don't need support for Python 2, Python 3.3+ gives you memoryview().cast(), which may also come in handy for unpacking data.
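For illustration, here is how both might be used, with a made-up fixed-width row format of one int plus one double:

    import struct

    # Pretend these bytes arrived from the socket: a 4-byte int and an 8-byte double
    raw = struct.pack("<id", 42, 9.99)
    row_id, price = struct.unpack("<id", raw)

    # Python 3.3+: reinterpret a buffer of native ints without copying
    buf = struct.pack("4i", 1, 2, 3, 4)
    print(list(memoryview(buf).cast("i")))   # [1, 2, 3, 4]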
Python 3.8 comes with the multiprocessing.shared_memory module, which can help you out when you start optimizing.
If your database runs on a specific platform, ctypes comes in handy for OS-specific tweaks (like manually implementing shared memory if you can't use Python 3.8).
pandas used to support DB-API connections directly. It currently only officially supports the SQLite DB-API, but you can piggyback on it, which will let you test your driver with a known tool.
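As a starting point, here is a minimal sketch of what a pure-Python DB-API 2.0 (PEP 249) module can look like. Everything below the module globals depends on your own wire protocol, so treat the socket handling and the port number as placeholders:

    import socket

    # PEP 249 module globals
    apilevel = "2.0"
    threadsafety = 1          # threads may share the module, but not connections
    paramstyle = "qmark"      # e.g. SELECT * FROM t WHERE id = ?


    class Error(Exception):
        """Base exception class required by PEP 249."""


    class Cursor(object):
        def __init__(self, connection):
            self._conn = connection
            self._rows = iter(())
            self.description = None   # 7-item sequences, set after execute()
            self.rowcount = -1
            self.arraysize = 1

        def execute(self, operation, parameters=None):
            # Encode the statement in your own wire format, send it over the
            # socket, then parse the response into rows and column metadata.
            self._rows = iter(self._conn._roundtrip(operation, parameters))

        def fetchone(self):
            return next(self._rows, None)

        def fetchall(self):
            return list(self._rows)

        def close(self):
            self._rows = iter(())


    class Connection(object):
        def __init__(self, host, port):
            self._sock = socket.create_connection((host, port))

        def cursor(self):
            return Cursor(self)

        def commit(self):
            pass                      # no-op if your database autocommits

        def close(self):
            self._sock.close()

        def _roundtrip(self, operation, parameters):
            # Placeholder: speak your database's wire protocol here, e.g. with
            # self._sock.sendall() / recv() plus struct packing and unpacking.
            raise NotImplementedError


    def connect(host, port=5433):     # the default port number is made up
        return Connection(host, port)

Once connect(), cursor(), execute() and the fetch* methods behave per PEP 249, you can hand a Connection object to other tools, e.g. pandas.read_sql_query("SELECT ...", connect("localhost")), which is the piggybacking mentioned above.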
In order to work with HANA in Python, I've always used the pretty old Python 2.6.4 that came with the HANA client. There are ways to get it working with other Python versions too, but they seem to be very hacky.
Now I've found the very promising-looking PyHDB project on GitHub, which seems to do the same job while being easier to install and working with newer Python versions, too.
Which features that worked with the HANA Python client won't work with PyHDB?
Are there performance drawbacks?
Edit:
Here is what I found comparing the two drivers' DB-API module globals:
PyHDB is more thread-safe (threads may share connections here)
the parameter style is different (PyHDB: format; hdbcli: qmark, named)
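In practice that means the placeholders in your SQL look different depending on the driver (table and values are made up):

    # pyhdb, paramstyle "format":
    cursor.execute("SELECT * FROM users WHERE name = %s", ("Alice",))

    # hdbcli, paramstyle "qmark":
    cursor.execute("SELECT * FROM users WHERE name = ?", ("Alice",))

    # hdbcli, paramstyle "named":
    cursor.execute("SELECT * FROM users WHERE name = :name", {"name": "Alice"})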
Here is an update on the current situation:
pyhdb supports Python 2.7, 3.3, 3.4, 3.5 and also PyPy, on Linux, OS X and Windows.
hdbcli supports Python 2.7 and Python 3.4+, and is fully supported and pushed by SAP.
Regarding performance:
executemany is an order of magnitude faster with hdbcli
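So for bulk loads along these lines, hdbcli is the one to pick (host, credentials and table are placeholders):

    from hdbcli import dbapi

    conn = dbapi.connect(address="hana-host", port=30015,
                         user="SYSTEM", password="secret")
    cursor = conn.cursor()
    rows = [(i, "name-%d" % i) for i in range(100000)]
    # one batched round trip instead of one INSERT per row
    cursor.executemany("INSERT INTO test_table VALUES (?, ?)", rows)
    conn.commit()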
The following blog post contains some further information:
https://blogs.sap.com/2017/07/26/sap-hana-2.0-sps02-new-feature-updated-python-driver/
PyHDB is the younger brother of node-hdb; both implement the same wire protocol as the ODBC-based Python package hdbcli. The hdbcli Python client is part of the standard HANA installation, but to my knowledge it does not belong to the list of officially supported interfaces for building HANA applications.
The orientation for building PyHDB was therefore not the hdbcli client but the above-mentioned protocol specification and the existing node-hdb implementation.
A detailed hdbcli/PyHDB cross-comparison is therefore not currently available, but looking into the protocol specification and the READMEs of PyHDB and node-hdb helps in getting insight into the connectors' features and each implementation's current coverage of the specification.
Among these three HANA connectors (hdbcli, node-hdb and PyHDB), PyHDB, being the youngest one, currently offers the fewest features and capabilities, missing some features already available in node-hdb, such as certain authentication methods and prepared statements. Looking into the node-hdb source helps in getting a rough estimate of the effort required to build the same in Python.
Rather than waiting for full coverage of the protocol, which could take a while, PyHDB is released "as is", but it is open to input and requirements from projects and will offer new features in that direction.
No performance drawbacks discovered so far.
Perhaps worth mentioning here: direct connectivity from Python and Node.js (Go should come soon) is also possible with ABAP, via the SAP RFC protocol, using the PyRFC and node-rfc connectors, which are counterparts of the standard ABAP RFC connectors available for Java and .NET as well.
pyhdb was deprecated in May 2021 and is no longer maintained. Its GitHub repo (https://github.com/SAP-archive/PyHDB) is archived and in read-only mode, and any open issues and pull requests were closed by the SAP maintainer @bsrdjan. hdbcli is now the recommended Python package for interacting with SAP HANA databases, although it's closed source with less documentation.
I recommend using sqlalchemy-hana, which is a SQLAlchemy dialect for SAP HANA databases. It can use hdbcli or pyhdb with a one-line change to the configuration, so you don't need to track down database-driver documentation.
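For example, switching between the two drivers is just a matter of the engine URL (host and credentials are placeholders):

    from sqlalchemy import create_engine

    engine = create_engine("hana+hdbcli://user:password@hana-host:30015")
    # one-line switch to the deprecated pyhdb driver:
    # engine = create_engine("hana+pyhdb://user:password@hana-host:30015")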
I want to use Neo4j for querying a large graph (12M nodes, 1.5G relationships). I have tested the performance, and with Cypher it is unsatisfactory for a web-server backend.
Since a Java API query is >10x faster than Cypher, I want to write the necessary query functions in Java and handle everything else in Python (my website backend is written in Python).
Can you give me some hints on how to approach the problem of running Java functions from Python?
I have managed to run JPype, but only one Java instance can access the embedded database. Because of that, py4j, which uses a background JVM process, may be the better solution. Yet I cannot get py4j working. Do you have any experience with py4j, or with Python and the Neo4j Java API?
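For reference, the basic py4j pattern I've been attempting looks like this (the entry-point method wrapping the Neo4j Java API is made up):

    from py4j.java_gateway import JavaGateway

    # assumes a JVM is already running py4j's GatewayServer (default port 25333)
    gateway = JavaGateway()
    neo4j_wrapper = gateway.entry_point
    print(neo4j_wrapper.shortestPath(12, 34))   # hypothetical Java method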
The best way to achieve this is to write unmanaged extensions for your Neo4j server in Java. These unmanaged extensions expose new REST endpoints, which can then be called from Python via HTTP.
A while ago I created a kind of template for unmanaged extensions, using Gradle as the build system.
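Calling such an endpoint from Python is then plain HTTP, for example with requests (the mount point and path are whatever you configure; these are made up):

    import requests

    resp = requests.get(
        "http://localhost:7474/myextension/shortestpath",
        params={"from": 12, "to": 34},
    )
    print(resp.json())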
I am looking for a pure-Python SQL library that would give access to both MySQL and PostgreSQL.
The only requirements are that it run on Python 2.5+ and be pure Python, so it can be shipped with the script and still run on most platforms (no install).
In fact I am looking for a simple solution that would allow me to write SQL and export the results as CSV files.
Two-part answer:
A) This is absolutely possible.
B) Depending on your exact concerns, a pure-Python approach may or may not be a good fit for your problem.
Explained:
The SQLAlchemy library comes with two components: the more popular ORM, and the Core that it sits on top of. Either one will let you write your SQL commands in the SQLAlchemy format (which is just Python); SQLAlchemy will then compile the statements for MySQL or PostgreSQL and connect to the appropriate database.
SQLAlchemy is a great library, and I recommend it for just about everything. While you do have to write your statements in its format, it's easy to pick up, and you can switch to virtually any underlying database library you want at any time. It's the perfect platform to use in any database project, whether or not you need to support multiple backends.
SQLAlchemy talks to the database via standard DBAPI drivers, and it supports multiple pure-Python options, notably the pymysql and pypostgresql drivers (http://docs.sqlalchemy.org/en/latest/core/engines.html#supported-dbapis).
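A minimal sketch of what that looks like with the Core, assuming one of those pure-Python drivers is installed (credentials and table are placeholders):

    from sqlalchemy import create_engine, text

    # swap the URL to switch backends; the rest of the code stays the same
    engine = create_engine("mysql+pymysql://user:password@localhost/mydb")
    # engine = create_engine("postgresql+pypostgresql://user:password@localhost/mydb")

    with engine.connect() as conn:
        for row in conn.execute(text("SELECT id, name FROM users")):
            print(row)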
As for writing CSV, the standard library has you covered:
import csv
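Continuing the sketch above, dumping a result set to a file might look like this (column names are placeholders; on Python 3, open the file with "w" and newline=""):

    with engine.connect() as conn, open("out.csv", "wb") as f:
        writer = csv.writer(f)                 # "wb" suits Python 2's csv module
        writer.writerow(["id", "name"])        # header row
        writer.writerows(conn.execute(text("SELECT id, name FROM users")))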
So what's the caveat?
The following may or may not apply to your situation:
Most higher-level DB modules in the Python universe still recommend mysql-python and psycopg, both of which are not pure Python and compile/link/configure against the installed database's client libraries. This largely seems to stem from a mix of API/integration concerns and the speed of the various pure-Python packages compared to C extensions when run under CPython.
There are pure-Python drivers like the ones I recommended, but most reviewers describe them as largely experimental. The pymysql authors claim stability and production readiness; some developers who blog have challenged that. As for what "stable" or "experimental" means, that varies between projects: some have a changing API, others are incomplete, some are buggy.
You'd need to ensure that you can find pure-Python drivers for each system that support the exact operations you need. This could be simple, or it could be messy. Whether you use SQLAlchemy or something else, you'll still face this concern when selecting a DBAPI driver.
The PyPy project (a Python interpreter written in Python) has a wiki listing the compatibility of various packages: https://bitbucket.org/pypy/compatibility/wiki/Home. I would defer to them for specific driver suggestions. If PyPy is your intended platform, SQLAlchemy runs perfectly on it as well.
Are you looking for an ORM, or for a single library that lets you write SQL statements directly and converts them where the dialects differ?
Note that psycopg2 is not pure Python (it's a C extension wrapping the libpq client library), but it does work on all major platforms. You'd still have to install at least psycopg2 to communicate with the PostgreSQL database, as Python (as far as I know) doesn't ship with PostgreSQL support natively.
From there, any additional ORM library you want would also need to be installed, but most are pure Python on top of whatever backend they use.
Storm, Django and SQLAlchemy all have abstraction layers on top of their database layer. Based on your description, Django is probably too large a framework for your needs (it was for mine) but is a popular one; SQLAlchemy is a tried-and-true system, though a bit clunky, particularly if you have to deal with inheritance (in my opinion). I have heard that Storm is good, though I haven't tested it much, so I can't fully say.
If you are looking to mix and match (some tables in MySQL and some tables in PostgreSQL), as opposed to a single database that could be either MySQL or PostgreSQL, I've been working on an ORM called ORB that focuses more on object-oriented design and allows for multiple databases and relationships between databases. Right now it only supports PostgreSQL and Mongo, just because I haven't needed MySQL, but I'd be up for writing that backend. The code for it can be found at http://docs.projexsoftware.com/api/orb
Use SQLAlchemy. It works with most database types, and certainly works with PostgreSQL and MySQL.
I'm writing a script to parse some text files and insert the data they contain into a MySQL database. I don't have root access on the server that this script will run on. I've been looking at mysql-python, but it requires a bunch of dependencies that I don't have available. Is there a simpler way to do this?
I would recommend MySQL Connector/Python, a MySQL DB-API adapter that does not use the C client library but instead reimplements the MySQL protocol completely in pure Python (compatible with Python 2.5 to 2.7, as well as 3.1).
To install C-coded extensions to Python you generally need root access (the server you're using might have things arranged differently, but that's not all that likely). With a pure-Python solution, however, you can simply upload the modules in question (e.g. those from the Connector I recommend) just as you upload the modules you write yourself, which (provided you have a valid user ID and password for that MySQL database!) should solve the problem for you.
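A minimal sketch with Connector/Python (connection parameters and table are placeholders):

    import mysql.connector   # pure Python, so it can be uploaded like your own code

    conn = mysql.connector.connect(host="dbhost", user="me",
                                   password="secret", database="mydb")
    cursor = conn.cursor()
    cursor.execute("INSERT INTO parsed_lines (line) VALUES (%s)", ("some text",))
    conn.commit()
    cursor.close()
    conn.close()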
What is the advantage of using a Python VirtualBox API instead of using XPCOM?
The advantage is that pyvb is a lot easier to work with.
By contrast, documentation for the Python API of XPCOM is nonexistent, and the API is not Pythonic at all. You can't use introspection to find the methods/attributes of an object, etc., so you have to check the C++ source to find out how it works, or read some Python scripts already written against it (like vboxshell.py and VBoxWebSrv.py).
On the other hand, pyvb is really just a Python wrapper that calls VBoxManage on the command line. I don't know whether that's a real disadvantage or not.
I would generally recommend against either one. If you need to use virtualization programmatically, take a look at libvirt, which gives you cross-platform and cross-hypervisor support and lets you move to KVM/Xen/OpenVZ/VMware later on.
That said, the SOAP API adds two extra abstraction layers (the client and server sides of the HTTP transaction) and is pretty clearly just calling into the XPCOM interface.
If you need local-host-only support, use XPCOM; the extra indirection of libvirt/SOAP doesn't help you.
If you need to access VirtualBox on various hosts across multiple client machines, use SOAP or libvirt.
If you want cross platform support, or to run your code on Linux, use libvirt.
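A rough sketch of the libvirt route against a local VirtualBox host (the URI and VM name are placeholders):

    import libvirt

    conn = libvirt.open("vbox:///session")   # the same code later works with qemu:///system, etc.
    dom = conn.lookupByName("my-vm")         # hypothetical VM name
    dom.create()                             # start the VM
    conn.close()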
From Sun's site on the VirtualBox Python APIs:
SOAP allows one to control remote VMs over HTTP, while XPCOM is much more high-performing and exposes certain functionality not available with SOAP. They use very different technologies (SOAP is procedural, while XPCOM is OOP), but as both are ultimately APIs to the same VirtualBox functionality, we kept the original semantics in the bindings, so other than connection establishment, code could be written in such a way that people may not care what communication channel with the VirtualBox instance is used.
From that article, I'm having trouble seeing the difference between "python virtualbox API" and "XPCOM". Could you provide a link to the API you're thinking of?