What would be the equivalent of Python's "pickle" in nodejs

One of Python's features is the pickle module, which lets you take almost any arbitrary object, store it, and later restore it exactly to its original form. One common usage is to take a fully instantiated object and pickle it for later use. In my case I have an AMQP Message object that is not serializable, and I want to be able to store it in a session store and retrieve it, which I can do with pickle. The key point is that I need to call a method on the object; I am not just looking for the data.
But this project is in nodejs and it seems like with all of node's low-level libraries there must be some way to save this object, so that it could persist between web calls.
The use case is that a web page picks up a RabbitMQ message and displays the info derived from it. I don't want to acknowledge the message until it has been acted on. Normally I would just save the data in session state, but that's not an option unless I can somehow save the message in its original form.
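To make the Python side concrete, here is a minimal sketch of the behaviour I mean (the Message class is just a stand-in for the real AMQP message object):

import pickle

class Message(object):
    # stand-in for the AMQP message object
    def __init__(self, body):
        self.body = body

    def ack(self):
        print('acknowledged: %s' % self.body)

blob = pickle.dumps(Message('hello'))   # bytes that could live in a session store
restored = pickle.loads(blob)           # later: rebuild the full object...
restored.ack()                          # ...and call methods on it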

See the pickle-js project: https://code.google.com/p/pickle-js/
Also, from findbestopensource.com:
pickle.js is a JavaScript implementation of the Python pickle format. It supports pickles containing a cross-language subset of the primitive types. Key differences between pickle.js and pickle.py: text pickles only; some types are lossily converted (e.g. int); some types are not supported (e.g. class).
More information available here: http://www.findbestopensource.com/product/pickle-js

As far as I am aware, there isn't an equivalent to pickle in JavaScript (or in the standard node libraries).

Check out https://github.com/carlos8f/hydration to see if it fits your needs. I'm not sure it's as complete as pickle but it's pretty terrific.
Disclaimer: The module author and I are coworkers.

Related

Convert GraphQLResponse dictionary to python object

I am running a graphql query using aiographql-client and getting back a GraphQLResponse object, which contains a raw dict as part of the response json data.
This dictionary conforms to a schema, which I am able to parse into a graphql.type.schema.GraphQLSchema type using graphql-core's build_schema method.
I can also correctly get the GraphQLObjectType of the object that is being returned, however I am not sure how to properly deserialize the dictionary into a python object with all the appropriate fields, using the GraphQLObjectType as a reference.
Any help would be greatly appreciated!
I'd recommend using Pydantic to do the heavy lifting in the parsing.
You can then either generate the models beforehand and select the ones you need based on GraphQLObjectType or generate them at runtime based on the definition returned by build_schema.
If you really must define your models at runtime you can do that with pydantic's create_model function described here: https://pydantic-docs.helpmanual.io/usage/models/#dynamic-model-creation
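A rough sketch of the dynamic approach (the field names and types below are purely illustrative, not taken from your schema; in practice you would walk the GraphQLObjectType's fields to build them):

from pydantic import create_model

# pretend this is the raw dict pulled out of the GraphQLResponse
raw = {"id": 1, "name": "Alice"}

# field definitions are hard-coded here for illustration only
UserModel = create_model("UserModel", id=(int, ...), name=(str, ...))

user = UserModel(**raw)
print(user.id, user.name)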
For the static model generation you can probably leverage something like https://jsontopydantic.com/
If you share some code samples I'd be happy to give some more insights on the actual implementation
I faced the same tedious problem while developing a personal project.
Because of that I just published a library which has the purpose of managing the mappings between python objects and graphql objects (python objects -> graphql query and graphql response -> python objects).
https://github.com/dapalex/py-graphql-mapper
So far it manages only basic queries and responses; if it turns out to be useful I will keep implementing more features.
Have a look and see if it can help you.
Coming back to this, there are some projects out there trying to achieve this functionality:
https://github.com/enra-GmbH/graphql-codegen-ariadne
https://github.com/sauldom102/gql_schema_codegen

What is the proper way to split a source into resource name/zone/project?

I'm listing instances using the google cloud python library method:
service.instances().list()
This returns a dict of instances; for each instance it returns a list of disks, and for each disk the source is available in the following format:
https://www.googleapis.com/compute/v1/projects/<project name>/zones/<zone>/disks/<disk name>
There is no other "name" in the disks dict, so that is the closest thing I have to retrieve the disk name.
After looking into other methods, I found that many of them return resources in a similar way.
However, if I want to use any google disk methods from the library, it's expected that I supply the disk name, project and zone as separate arguments to the library's method.
Is there a common method I can write to split the resource parameters?
In this example that would be the project name, zone and disk name, but other resource types might have different components.
I could not find any method in the library that would do the split for me, so I guess it's expected that I write my own.
There is no specific API in GCP that gives you such a result. However, since the URL you are getting has a fixed format (the order of the parts you want is constant), I think the easiest way is to split the string, as in the following code:
disk_url = "https://www.googleapis.com/compute/v1/projects/<project name>/zones/<zone>/disks/<disk name>".split('/')
project = disk_url[6]    # the segment after "projects"
zone = disk_url[8]       # the segment after "zones"
disk = disk_url[10]      # the segment after "disks"
I think this will help, but if you need something more specific I believe you will have to do more of the string handling in Python on your own.
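If you want something slightly more general, here is a sketch of a helper that turns any such self-link into a dict, assuming the parts after the API version always alternate key/value (the example values are made up):

def parse_self_link(url):
    parts = url.split('/')
    # everything after ".../compute/v1/" alternates key, value
    fields = parts[parts.index('v1') + 1:]
    return dict(zip(fields[::2], fields[1::2]))

info = parse_self_link(
    'https://www.googleapis.com/compute/v1/projects/my-project/zones/us-east1-b/disks/my-disk')
# info == {'projects': 'my-project', 'zones': 'us-east1-b', 'disks': 'my-disk'}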

Does a JsonProperty deserialize only upon access?

In Google App Engine NDB, there is a property type JsonProperty which takes a Python list or dictionary and serializes it automatically.
The structure of my model depends on the answer to this question, so I want to know exactly when an object is deserialized. For example:
# a User model has a property "dictionary" which is of type JsonProperty
# will the following deserialize the dictionary?
user = User.get_by_id(someid)
# or will it not get deserialized until I actually access the dictionary?
val = user.dictionary['value']
ndb.JsonProperty follows the docs and does things the same way you would when defining a custom property: it defines make_value_from_datastore and get_value_for_datastore methods.
The documentation doesn't tell you when these methods get called, because it's up to the datastore implementation within App Engine to decide when to call them.
However, it's pretty likely they're going to get called whenever the model has to access the database. For example, from the documentation for get_value_for_datastore:
A property class can override this to use a different data type for the datastore than for the model instance, or to perform other data conversion just prior to storing the model instance.
If you really need to verify what's going on, you can provide your own subclass of JsonProperty like this:
import os
from google.appengine.ext import ndb

class LoggingJsonProperty(ndb.JsonProperty):
    def make_value_from_datastore(self, value):
        # record every time the stored value is converted back into a Python object
        with open(os.path.expanduser('~/test.log'), 'a') as logfile:
            logfile.write('make_value_from_datastore called\n')
        return super(LoggingJsonProperty, self).make_value_from_datastore(value)
You can log the JSON string, the backtrace, etc. if you want. And obviously you can use a standard logging function instead of sticking things in a separate log. But this should be enough to see what's happening.
Another option, of course, is to read the code, which I believe is in appengine/ext/db/__init__.py.
Since it's not documented, the details could change from one version to the next, so you'll have to either re-run your tests or re-read the code each time you upgrade, if you need to be 100% sure.
The correct answer is that it does indeed load the item lazily, upon access:
https://groups.google.com/forum/?fromgroups=#!topic/appengine-ndb-discuss/GaUSM7y4XhQ

Serializing data and unpacking safely from untrusted source

I am using Pyramid as a basis for transfer of data for a turn-based video game. The clients use POST data to present their actions, and GET to retrieve serialized game board data. The game data can sometimes involve strings, but is almost always two integers and two tuples:
gamedata = (userid, gamenumber, (sourcex, sourcey), (destx, desty))
My general client-side flow was to pickle the data, convert it to base64, urlencode it, and submit the POST. The server then receives the POST, unpacks the single-item dictionary, decodes the base64, and unpickles the data object.
I want to use Pickle because I can use classes and values. Submitting game data as POST fields can only give me strings.
However, pickle is regarded as unsafe, so I turned to PyYAML, which serves the same purpose. Using yaml.safe_load(data), I can load data without exposing security flaws. However, safe_load is VERY safe: I cannot even deserialize harmless tuples or lists, even if they only contain integers.
Is there some middle ground here? Is there a way to serialize python structures without at the same time allowing execution of arbitrary code?
My first thought was to write a wrapper for my send and receive functions that uses underscores in value names to recreate tuples, e.g. sending would convert the dictionary value source : (x, y) to source_0 : x, source_1: y. My second thought was that it wasn't a very wise way to develop.
edit: Here's my implementation using JSON... it doesn't seem as powerful as YAML or Pickle, but I'm still concerned there may be security holes.
Client side was constructed a bit more visibly while I experimented:
import urllib, json, base64
arbitrarydata = { 'id':14, 'gn':25, 'sourcecoord':(10,12), 'destcoord':(8,14)}
jsondata = json.dumps(arbitrarydata)
b64data = base64.urlsafe_b64encode(jsondata)
transmitstring = urllib.urlencode( [ ('data', b64data) ] )
urllib.urlopen('http://127.0.0.1:9000/post', transmitstring).read()
Pyramid Server can retrieve the data objects:
json.loads(base64.urlsafe_b64decode(request.POST['data'].encode('ascii')))
On an unrelated note, I'd love to hear some other opinions about the acceptability of using POST data in this method, my game client is in no way browser based at this time.
Why not use colander for your serialization and deserialization? Colander turns an object schema into a simple data structure and vice versa, and you can use JSON to send and receive this information.
For example:
import colander
supported_languages = ['en', 'fr']   # illustrative; use your own list

class Item(colander.MappingSchema):
    thing = colander.SchemaNode(colander.String(),
                                validator=colander.OneOf(['foo', 'bar']))
    flag = colander.SchemaNode(colander.Boolean())
    language = colander.SchemaNode(colander.String(),
                                   validator=colander.OneOf(supported_languages))

class Items(colander.SequenceSchema):
    item = Item()
The above setup defines a list of item objects, but you can easily define game-specific objects too.
Deserialization becomes:
items = Items().deserialize(json.loads(jsondata))
and serialization is:
json.dumps(Items().serialize(items))
Apart from letting you round-trip python objects, it also validates the serialized data to ensure it fits your schema and hasn't been mucked about with.
How about json? The library is part of the standard Python libraries, and it allows serialization of most generic data without arbitrary code execution.
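For example, a quick round trip shows both the convenience and the main caveat for this use case (tuples come back as lists, and custom classes are not handled at all):

import json

gamedata = (14, 25, (10, 12), (8, 14))
restored = json.loads(json.dumps(gamedata))
# restored == [14, 25, [10, 12], [8, 14]] -- lists, not tuples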
I don't see raw JSON providing the answer here, as I believe the question specifically mentioned pickling classes and values. I don't believe using straight JSON can serialize and deserialize python classes, while pickle can.
I use a pickle-based serialization method for almost all server-to-server communication, but always include very serious authentication mechanisms (e.g. RSA key-pair matching). However, that means I only deal with trusted sources.
If you absolutely need to work with untrusted sources, I would at the very least, try to add (much like #MartijnPieters suggests) a schema to validate your transactions. I don't think there is a good way to work with arbitrary pickled data from an untrusted source. You'd have to do something like parse the byte-string with some disassembler and then only allow trusted patterns (or block untrusted patterns). I don't know of anything that can do this for pickle.
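That said, the Python 3 pickle documentation does show a pattern for restricting which globals a pickle may load, by overriding Unpickler.find_class. It doesn't make arbitrary pickles safe, but pure-data pickles (ints, strings, tuples, lists, dicts) never hit find_class at all. A rough sketch, with an illustrative whitelist:

import builtins
import io
import pickle

SAFE_BUILTINS = {'set', 'frozenset', 'complex'}   # illustrative whitelist

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # only consulted for globals; plain containers and scalars never reach it
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

restricted_loads(pickle.dumps((14, 25, (10, 12), (8, 14))))   # fine: pure data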
However, if your class is "simple enough"… you might be able to use the JSONEncoder, which essentially converts your python class to something JSON can serialize… and thus validate…
How to make a class JSON serializable
The catch, however, is that you have to write a JSONEncoder subclass (or a default hook) that knows how to convert your classes.
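A rough sketch of what that looks like (the Move class and its fields are hypothetical, standing in for the game data):

import json

class Move(object):
    # hypothetical game-data class
    def __init__(self, userid, gamenumber, source, dest):
        self.userid = userid
        self.gamenumber = gamenumber
        self.source = source
        self.dest = dest

class MoveEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, Move):
            return {'userid': obj.userid, 'gamenumber': obj.gamenumber,
                    'source': list(obj.source), 'dest': list(obj.dest)}
        return json.JSONEncoder.default(self, obj)

wire = json.dumps(Move(14, 25, (10, 12), (8, 14)), cls=MoveEncoder)
d = json.loads(wire)   # validate these fields before trusting them
restored = Move(d['userid'], d['gamenumber'], tuple(d['source']), tuple(d['dest']))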

How should I append to small file-like objects in mongodb?

I need an interface to mongodb by which I can treat data in a collection like a standard python file-like object. These will be fairly small files (measured in kilobytes, at most) and in particular I need the ability to append to these so-called files. (So this question is not a dupe.)
I have read the GridFS documentation, and in particular it says I should not use it for small files. The only other implementations I've been able to find have all been PHP. I'm not really looking for help writing any specifics of the code, but implementing the entire file api seems a daunting task.
Are there any shortcuts or tools to make it easier to implement file-like objects in python 2?
Am I missing that someone has already done this?
(Why am I doing this? Because I received an eleventh-hour requirement that we deploy a pre-existing application that produces csv files on a multinode cloud environment that cannot transparently handle files.)
For question 1: check out the io module, and especially IOBase. It implements all of the file-likes in terms of a fairly sensible set of methods.
You could just store the data as binary, or text, in a MongoDB collection. But you'd have two problems:
You'd have to implement as much of the Python file protocol as your other code expects to have implemented.
When you append to the "file", the document would grow in MongoDB and possibly need to be moved on disk to a location with enough space to hold the larger document. Moving documents is expensive.
Go with GridFS -- the documentation discourages you from using it for static files, but for your case it's perfect, because PyMongo has done the work for you of implementing Python's file protocol for MongoDB data. To append to a GridFS file you must read it, save a new version with the additional data, and delete the previous version. But this isn't much more expensive than moving a grown document anyway.
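A hedged sketch of that append-by-rewriting approach with PyMongo's GridFS (the database name, file name and payload are illustrative):

import gridfs
from pymongo import MongoClient
from gridfs.errors import NoFile

db = MongoClient().csv_database           # illustrative database name
fs = gridfs.GridFS(db)

def append_to_file(filename, more_bytes):
    # read the current contents (if any), write a new version, drop the old one;
    # GridFS files themselves are immutable, hence the rewrite
    try:
        previous = fs.get_last_version(filename)
        data = previous.read()
        fs.delete(previous._id)
    except NoFile:
        data = b''
    fs.put(data + more_bytes, filename=filename)

append_to_file('report.csv', b'1,2,3\n')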
