How to treat Ruby symbols in cross-language object serialization


I'm currently working on a project where I need to transfer objects from Ruby to Python and back again, so serialization is the obvious way to go. I looked at formats like YAML but decided to write my own, as I didn't want to deal with library dependencies when it came time to distribute. I've written up how this serialization format works here.
My question is this: since the format is intended to work across languages, between Ruby and Python,
how should I serialize Ruby's symbols? I'm not aware of an object that works the same way in Python. Should a dump containing a symbol fail? Should I just serialize it as a string? What would be best?

Doesn't that depend on what your project needs? If symbols are important, you'll need some way to deal with them.
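I'm not a Ruby programmer, but from what I've just read, I think converting them to strings is probably easiest. CPython, the standard Python interpreter, automatically interns many short, identifier-like strings (and sys.intern lets you intern any string explicitly), so identical strings can share one object in memory, which seems to be a key reason suggested for using symbols.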
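A minimal CPython sketch of that interning behaviour. Automatic interning of literals is an implementation detail, so the example builds the strings at runtime (where they start out as separate objects) and interns them explicitly with `sys.intern`, which returns one canonical object per distinct string value.

```python
import sys

# Build two equal strings at runtime so the compiler cannot fold them
# into one shared constant.
prefix = "status_"
a = prefix + "ok"
b = "".join(["status_", "ok"])
print(a == b, a is b)  # equal values; normally two distinct objects

# After interning, both names refer to the same object, which gives
# Ruby-symbol-like identity semantics for string values.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)  # True
```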
EDIT: If it needs to work for other programmers, passing values back and forth shouldn't change them. So you either have to handle symbols properly, or throw an error straight away. It should be simple enough in Python:
class Symbol(str):
    """A str that should be serialised as a Ruby symbol."""

# In serialising code:
if isinstance(x, Symbol):
    serialise_as_symbol(x)  # placeholder: write your format's symbol type
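The `Symbol` idea can be carried through a full round trip. This sketch assumes a JSON-based wire format instead of the asker's custom one, and uses a hypothetical `{"__symbol__": name}` tag that the Ruby side would also have to understand (the Ruby half is not shown). `to_wire`, `from_wire_hook`, `dumps` and `loads` are illustrative names, not an existing library API.

```python
import json


class Symbol(str):
    """A str subclass standing in for a Ruby symbol."""

    def __repr__(self):
        return ":" + str(self)


SYMBOL_TAG = "__symbol__"  # hypothetical tag that the Ruby side must also use


def to_wire(obj):
    """Recursively replace Symbol values with tagged objects."""
    # json.dumps would emit a Symbol as a plain string (it is a str subclass),
    # so the tagging has to happen before dumping.
    if isinstance(obj, Symbol):
        return {SYMBOL_TAG: str(obj)}
    if isinstance(obj, dict):
        if any(isinstance(k, Symbol) for k in obj):
            # JSON keys must be plain strings; refuse rather than lose information.
            raise TypeError("symbol keys are not supported by this sketch")
        return {k: to_wire(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_wire(v) for v in obj]
    return obj


def from_wire_hook(d):
    """object_hook for json.loads: turn tagged objects back into Symbols."""
    if set(d) == {SYMBOL_TAG}:
        return Symbol(d[SYMBOL_TAG])
    return d


def dumps(obj):
    return json.dumps(to_wire(obj))


def loads(text):
    return json.loads(text, object_hook=from_wire_hook)


original = {"name": "widget", "state": Symbol("active"), "tags": [Symbol("a"), "b"]}
text = dumps(original)
restored = loads(text)
print(text)
print(repr(restored["state"]))  # :active
```

Note that `restored == original` would be true even if the symbols had been lost, because a `Symbol` compares equal to the plain string; check the types instead. A user dict that happens to look exactly like `{"__symbol__": ...}` would also be misread, which is the usual trade-off of in-band tagging.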

Any reason you're not using a standard data interchange format like JSON or XML? They seem to be acceptable to countless applications, services, and programmers.

If symbols are a stumbling block, then you have three choices: don't allow them, convert them to strings on the fly, or figure out a way to make them universal and/or innocuous in other languages.
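The three choices can be expressed as one switch in the encoder. In this minimal Python sketch, `Symbol` follows the `str` subclass idea from the earlier answer, while `encode_symbol`, its `symbol_policy` parameter and the `{"__symbol__": ...}` tag are hypothetical names chosen for illustration.

```python
class Symbol(str):
    """Python stand-in for a Ruby symbol (hypothetical)."""


def encode_symbol(sym, symbol_policy="reject"):
    """Map one Symbol onto a portable value according to the chosen policy."""
    if symbol_policy == "reject":
        # Choice 1: don't allow symbols; fail loudly at dump time.
        raise TypeError(f"cannot serialize symbol {str(sym)!r}")
    if symbol_policy == "string":
        # Choice 2: convert on the fly; the value comes back as a plain string.
        return str(sym)
    if symbol_policy == "tag":
        # Choice 3: tag it so that any language can recognise and rebuild it.
        return {"__symbol__": str(sym)}
    raise ValueError(f"unknown symbol_policy: {symbol_policy!r}")


print(encode_symbol(Symbol("ok"), "string"))  # ok
print(encode_symbol(Symbol("ok"), "tag"))     # {'__symbol__': 'ok'}
```

Defaulting to `"reject"` matches the point made above that values passed back and forth should not silently change.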

Related

How to store and read common data in Redis from Python and Rust?

I have some data stored in a Redis cache that will be read by my application in Rust. The data is being stored by Python. Whenever I store a string or an array, it is stored in a weird form that I was not able to read into Rust. Vice versa, I want to write from Rust and be able to read it in Python.
Using django shell:
In [0]: cache.set("test","abc")
In [1]: cache.get("test")
Out[1]:'abc'
Using redis-cli:
127.0.0.1:6379> GET :1:test
"\x80\x04\x95\a\x00\x00\x00\x00\x00\x00\x00\x8c\x03abc\x94."
Output from Rust:
Err(Invalid UTF-8)
Rust code read data using redis-rs library:
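use redis::Commands; // brings `get` into scope on the connection
fn main() -> redis::RedisResult<()> {
    let client = redis::Client::open("redis://127.0.0.1:6379")?;
    let mut con = client.get_connection()?;
    let q: Result<String, redis::RedisError> = con.get(":1:test");
    println!("{:?}", q);
    Ok(())
}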
I want to be able to read a string or array into Rust as it was written in Python and vice-versa.
Also, data in one key will only ever be written by either Rust or Python, not both.
This question is not a duplicate of this one, as that deals specifically with accent encoding; I want to solve my problem for arrays as well. Moreover, the value that Django sets in Redis for a string is not simply the UTF-8 encoding of the string.

Ah, the joys of trying to throw data across environments. The thing you're being bitten by right now is called Pickle and is the default serializer of django-redis. What a serializer does in this case (in python) is the transformation of your data between python and redis so you can store it, regardless of the type, but more importantly so you can retrieve it with the type it came in.
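You can confirm this from Python. The sketch below pastes in the bytes `redis-cli` printed (its `\a` is the byte `0x07`) and unpickles them; in practice you would fetch the raw value with a plain Redis client rather than through Django's cache.

```python
import pickle

# Raw value copied from the redis-cli output in the question.
raw = b"\x80\x04\x95\x07\x00\x00\x00\x00\x00\x00\x00\x8c\x03abc\x94."

# \x80\x04 = pickle protocol 4, \x95 + 8 bytes = frame length (7),
# \x8c\x03abc = short unicode string "abc", \x94 = memoize, "." = stop.
print(pickle.loads(raw))  # abc

# Pickling the same string with protocol 4 reproduces the stored bytes.
print(pickle.dumps("abc", protocol=4) == raw)  # True
```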
The Python side
Obviously, if you had infinite time and effort, you could rewrite pickle in rust and you'd then be able to read this format. I'm pretty sure you have neither, and depending on the data you're storing, you might not even want to do so.
Instead, what I'm going to suggest is to change the serializer from pickle to JSON. The description of what to change in the config is located at https://django-redis-cache.readthedocs.io/en/latest/advanced_configuration.html#pluggable-serializers , and in particular, the class you want to use is django_redis.serializers.json.JSONSerializer.
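For django-redis, the package that JSONSerializer class belongs to, the switch is a single `SERIALIZER` entry in the cache `OPTIONS`. A minimal settings sketch, assuming the same local Redis instance the Rust code connects to (the location is a placeholder):

```python
# settings.py (sketch): store cached values as JSON instead of pickle.
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://127.0.0.1:6379",  # placeholder: your Redis URL
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
            # Encode values with json.dumps instead of pickle.
            "SERIALIZER": "django_redis.serializers.json.JSONSerializer",
        },
    }
}
```

With this in place, `cache.set("test", "abc")` should store the JSON text `"abc"` (quotes included) under `:1:test`, which the Rust side can read as a `String` and then parse as JSON. Values already written with pickle are not converted, so clear or rewrite them.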
This comes with drawbacks. In particular, there will be some object types you will no longer be able to store on the python side, but if you do really intend to read data on the rust side, this should not concern you.
Sven Marnach mentioned in one of the comments that the serde-pickle crate exists. I have not used it myself, but it does look promising and might save you a ton of interop work if it does function.
The Rust side
To read stuff, now that every key is going to be JSON, you'll be decoding it with either serde (via serde_json) or miniserde. This should be pretty straightforward; do bear in mind that unless you deserialize into a concrete type (such as String, Vec<T> or your own struct), you will not get native types out of this; instead, you'll get members of the serde_json::Value enum (Null, Bool, Number, String, Array, Object), which you will then have to filter through.
Edit your question to indicate what you are trying to store, and I'll happily expand on how to do this on here!

Python to C++ Data structure API

Is there an easy way to pass a complete data structure with multiple data types from C++ to Python and vice versa?
I have a complex class with pointer members of floats, longs, etc. I could convert it into a JSON string and parse it on both sides, but this would be really slow.
However, if there were a special format that takes this data but also stores metadata about the start/end of the JSON string, it would parse much faster. Is there anything like this?
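A format whose header says where each part starts and ends is essentially length-prefixed binary framing. As a hedged illustration of the Python side only, the standard `struct` module can read such a record. The layout below (a little-endian `uint32` count, that many `float64` values, then one `int64`) is invented for this example, and `pack_record`/`unpack_record` are hypothetical helpers; a C++ writer would have to produce exactly the same bytes.

```python
import struct

# Hypothetical record layout (little-endian, no padding):
#   uint32  n      number of doubles that follow
#   float64 x[n]   the payload values
#   int64   id     a trailing integer field
HEADER = struct.Struct("<I")
TAIL = struct.Struct("<q")


def pack_record(values, record_id):
    """Serialize a record; a C++ writer would have to match this layout."""
    return (HEADER.pack(len(values))
            + struct.pack(f"<{len(values)}d", *values)
            + TAIL.pack(record_id))


def unpack_record(buf):
    """Read the length prefix first, so we know where the doubles end."""
    (n,) = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    values = list(struct.unpack_from(f"<{n}d", buf, offset))
    offset += 8 * n
    (record_id,) = TAIL.unpack_from(buf, offset)
    return values, record_id


blob = pack_record([1.5, -2.0, 3.25], 42)
print(len(blob))            # 36  (4 + 3 * 8 + 8)
print(unpack_record(blob))  # ([1.5, -2.0, 3.25], 42)
```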

I would personally recommend serializing your data into JSON in C++ using e.g. rapidjson or Qt, then transferring the resulting string to Python using the Python C API and deserializing it into a Python dictionary there. One-way or two-way transfer should be easy enough.
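On the Python side, the deserializing step is a single `json.loads` call. The payload here is a made-up example of what a C++ JSON writer might emit for a class with float and long members; the field names are hypothetical.

```python
import json

# Hypothetical JSON text produced on the C++ side and handed to Python.
payload = '{"id": 9007199254740993, "weights": [0.5, 1.25], "label": "sample"}'

data = json.loads(payload)  # JSON object -> dict, JSON array -> list

# JSON integers become Python ints, which are arbitrary precision, so a
# 64-bit C++ long keeps every digit; numbers with a fraction become floats.
print(type(data["id"]).__name__, data["id"])
print(data["weights"])
```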
A note about the C API, however: I have used it in the past and it was not a pleasant experience in any way, shape or form. Eventually you will make it work and do what you want, but it will cost you some nerves.
Lastly, do not worry about performance. Since you are using Python (an interpreted language), you are apparently not doing anything performance-critical anyway, so the cost of the JSON (de)serialization can be ignored here.
Good luck, because you are going to need it with Python's C API.

Wrap Scala library in Python

There is a Scala library I'd like to use, namely BIDMach, but I need to be able to use it from Python rather than from Scala. I've been trying to think of different ways for the library and the Python code to communicate, such as creating an HTTP server in Scala and calling it from Python, using something like JPype to use Scala libraries from Python, and different types of interprocess communication. However, none of them seem to work very well, and they would seem to require a large amount of reimplementation of what is already in the library. Does anyone know of a good way of going about this?
edit: In terms of exactly what I'd like to do, ideally I'd be able to use almost all of the library's functionality from Python, though that probably isn't realistic. It would be nice if some of the Scala classes were easily usable in Python without too much repeated implementation effort. The reason I don't think what I've looked into so far will work well is that it requires a fair bit of reimplementation of what is already in the library (i.e. representing something like a matrix in JSON as a way of transporting data to/from Python/Scala).
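To make that reimplementation cost concrete: even a plain dense matrix needs an agreed wire layout before it can cross between Scala and Python as JSON. One hypothetical layout (shape plus a flat row-major list) looks like this on the Python side; BIDMach's own matrix classes and memory order are not assumed here, and the Scala side would need a matching encoder and decoder.

```python
import json


def matrix_to_json(rows):
    """Encode a list-of-lists matrix as shape + flat row-major data (hypothetical layout)."""
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if any(len(r) != n_cols for r in rows):
        raise ValueError("ragged rows are not a matrix")
    flat = [x for r in rows for x in r]
    return json.dumps({"shape": [n_rows, n_cols], "data": flat})


def matrix_from_json(text):
    """Rebuild the list-of-lists matrix from the same layout."""
    obj = json.loads(text)
    n_rows, n_cols = obj["shape"]
    data = obj["data"]
    return [data[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


m = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
text = matrix_to_json(m)
print(text)
print(matrix_from_json(text) == m)  # True
```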

What structured text format is the best supported in Python?

This question may be seen as subjective, but I'd like to ask SO users which common structured textual data format is best supported in Python.
My initial choices are:
XML
JSON
and YAML
Which of these three is easiest to work with in Python (i.e. has the best library support/performance), or is there another format that I haven't mentioned that is better supported in Python?
I cannot use a Python only format (e.g. Pickling) since interop is quite important, but the majority of the code that handles these files will be written in Python, so I'm keen to use a format that has the strongest support in Python.
CSV or fixed column text may also be viable for most use cases, however I'd prefer the flexibility of a more scalable format.
Thank you
Note
Regarding interop I will be generating these files initially from Ruby, using Builder, however Ruby will not be consuming these files again.

I would go with JSON. YAML is awesome, but interop with it is not that great.
XML is just an ugly mess to look at and has too much fat.
Python has had a built-in json module since version 2.6.
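A quick illustration of the built-in module (no third-party install needed). The round trip is not lossless for every Python type, which is worth knowing when another language produces or consumes the files:

```python
import json

record = {"name": "builder", "sizes": (1, 2, 3), 10: True}

text = json.dumps(record)
print(text)

back = json.loads(text)
# Tuples come back as lists and non-string keys come back as strings,
# because JSON only has arrays and string-keyed objects.
print(back["sizes"], back["10"])
```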

JSON has great Python support and it is much more compact than XML (and the API is generally more convenient if you're just trying to dump and load objects). There's no YAML support in the standard library; you need a third-party package such as PyYAML. In the abstract I would suggest using JSON due to the low overhead of the format and the wide range of language support, but it does depend a bit on your application: if you're working in a space that already has established applications, the formats they use might be preferable, even if they're technically deficient.
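To put a rough number on "more compact", here is the same small record encoded both ways using only the standard library (`json` and `xml.etree.ElementTree`). The XML element layout is one arbitrary choice among many, so treat the byte counts as illustrative.

```python
import json
import xml.etree.ElementTree as ET

people = [{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}]

# Compact JSON: no spaces after separators.
as_json = json.dumps(people, separators=(",", ":"))

# One possible XML encoding of the same data.
root = ET.Element("people")
for p in people:
    person = ET.SubElement(root, "person")
    ET.SubElement(person, "name").text = p["name"]
    ET.SubElement(person, "age").text = str(p["age"])
as_xml = ET.tostring(root, encoding="unicode")

print(len(as_json), as_json)
print(len(as_xml), as_xml)
```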

I think it depends a lot on what you need to do with the data. If you're going to be building a complex database and doing processing and transformations on it, I suspect you'd be better off with XML. I've found the lxml module pretty useful in this regard. It has full support for standards like XPath and XSLT, and this support is implemented in native code, so you'll get good performance.
But if you're doing something simpler, then you'd likely be better off using a simpler format like YAML or JSON. I've heard tell of "JSON transforms" but don't know how mature the technology is or how developed Python's access to it is.
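lxml is a third-party package, but the standard library's `xml.etree.ElementTree` supports a limited subset of XPath in `find`/`findall`/`iterfind`, which is often enough for simple queries. A minimal sketch with a made-up document; full XPath 1.0 and XSLT still require lxml.

```python
import xml.etree.ElementTree as ET

doc = """
<catalog>
  <book lang="en"><title>Python Basics</title><price>20</price></book>
  <book lang="de"><title>Ruby Grundlagen</title><price>25</price></book>
  <book lang="en"><title>JSON in Depth</title><price>30</price></book>
</catalog>
"""

root = ET.fromstring(doc.strip())

# ElementTree's XPath subset: child paths plus [@attr='value'] predicates.
english_titles = [b.findtext("title") for b in root.findall("./book[@lang='en']")]
print(english_titles)  # ['Python Basics', 'JSON in Depth']

# Anything beyond the subset (arithmetic, most functions) is done in Python.
total = sum(int(p.text) for p in root.iterfind(".//price"))
print(total)  # 75
```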

It's pretty much all the same, out of those three. Use whichever is easier to inter-operate with.

Haskell vs. Python typing

I am looking for examples where things would be easier to program in Python just because it is dynamically typed.
I want to compare it with Haskell's type system, because its static typing doesn't get in the way as much as C#'s or Java's does. Can I program in Haskell as I can in Python without static typing being a hindrance?
PS: I am a Python user and have played around a little with ML and Haskell. I hope it is clear now.

Can I program in Haskell as I can in python without static typing being a hindrance
Yes.
To elaborate, I would say the main gotcha will be the use of existential types in Haskell for heterogeneous data structures (regular data structures holding lists of variously typed elements). This often catches OO people used to a top "Object" type. It often catches Lisp/Scheme programmers. But I'm not sure it will matter to a Pythonista.
Try to write some Haskell, and come back when you get a confusing type error.
You should think of static typing as a benefit: it checks a lot of things for you, and the more you lean on it, the fewer things you have to test for. In addition, it enables the compiler to make your code much faster.

Well, for one, you can't create a list containing multiple types of values without wrappers (for example, to get a list that may contain a string or an int, you'd have to create a list of Either Int String and wrap each item in a Left or a Right).
You also can't define a function that may return multiple types of values (like if someCondition then 1 else "this won't compile"), again, without using wrappers.
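To show what that wrapper amounts to, here is a Python sketch that mimics Haskell's `Either Int String` with explicit `Left`/`Right` tags. The classes and `describe` are illustrative names, not a library; the point is that Haskell forces this wrapping and unwrapping, whereas Python happily accepts the bare list `[1, "two", 3]`.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Left:
    value: int


@dataclass(frozen=True)
class Right:
    value: str


Either = Union[Left, Right]  # rough analogue of Haskell's Either Int String

# Haskell: [Left 1, Right "two", Left 3]
items: List[Either] = [Left(1), Right("two"), Left(3)]


def describe(e: Either) -> str:
    # Callers must check which wrapper they got, like pattern matching in Haskell.
    if isinstance(e, Left):
        return f"int {e.value}"
    return f"str {e.value!r}"


print([describe(e) for e in items])
```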

Like Chris said, this is one objective question (what can a dynamically typed language do that a statically typed one can't?) and one subjective question (can I use Haskell without static typing being a hindrance). So you're going to get mostly subjective answers, because the first question is not as interesting.
For me, the biggest hindrance was Haskell's IO type, because I had to stop and think about what code does I/O and what code doesn't, and explicitly pass information between the two. Everything else was pretty easy. If you commonly write
if someCondition:
    return 1
else:
    return "other"
Then you're making your own problems; Python just doesn't stop you from doing it. Haskell will, and that's about the only difference. The only exception is that this pattern is fairly common in Python:
if someErrorCondition:
    return None
else:
    return NewItem(Success)
You can't do that in Haskell because there is no None value that belongs to every type. But there are easy ways to work around it, most commonly the Maybe type.
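With Maybe, the function returns `Nothing` on failure or `Just item` on success, and the type forces every caller to handle both. The closest Python counterpart is an `Optional` return annotation: a static checker such as mypy will then flag code that uses the result without checking for `None`, although the interpreter itself does not. `NewItem` and `make_item` are hypothetical names standing in for the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NewItem:
    name: str


def make_item(name: str) -> Optional[NewItem]:
    # Haskell: makeItem :: String -> Maybe NewItem
    if not name:            # stands in for someErrorCondition
        return None         # Haskell: Nothing
    return NewItem(name)    # Haskell: Just (NewItem name)


for candidate in ["widget", ""]:
    item = make_item(candidate)
    # Like a case expression on Maybe: handle both possibilities explicitly.
    if item is None:
        print("failed")
    else:
        print("created", item.name)
```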
I did find the type errors confusing at first, but I learned to read them in about a week.
I want to echo Don's advice: just try writing some Haskell and come back when you get a confusing type error.
