I want to serialize a collection of meshes (vertices and faces only). Is there any way to do it in a single .ply file and, if so, does that approach have any advantage over pickle, given that I am only going to access the file from Python later when I need it?
I tried using trimesh for serialization, but the only option it offers is to concatenate the meshes, and then I cannot view them as individual instances when I deserialize.
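For comparison, here is a minimal sketch of the pickle route, assuming the meshes are trimesh.Trimesh objects (the demo meshes from trimesh.creation are illustrative); a single PLY file holds one mesh, so keeping instances separate points toward pickle or a multi-object container:

    import pickle
    import trimesh

    # Illustrative meshes; in practice these would be your own collection.
    meshes = [trimesh.creation.box(), trimesh.creation.icosphere()]

    # Store only the raw arrays so the file does not depend on trimesh internals.
    payload = [{"vertices": m.vertices, "faces": m.faces} for m in meshes]
    with open("meshes.pkl", "wb") as f:
        pickle.dump(payload, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Later: rebuild each mesh as its own instance.
    with open("meshes.pkl", "rb") as f:
        restored = [trimesh.Trimesh(vertices=d["vertices"], faces=d["faces"])
                    for d in pickle.load(f)]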
In Python, I'm reading in a very large 2D grid of data that consists of around 200,000,000 data points in total, where each data point is a tuple of 3 floats. Reading all of this data into a two-dimensional list frequently causes a MemoryError. To get around this, I would like to read the data into some sort of table on the hard drive that can be efficiently accessed given a grid coordinate, e.g. harddrive_table.get(300, 42).
So far in my research I've come across PyTables, which is an implementation of HDF5 and seems like overkill, and the built-in shelve library, which uses a dictionary-like method to access saved data; but shelve keys have to be strings, and converting hundreds of millions of grid coordinates to strings could be too much of a performance hit for my use.
Are there any libraries that allow me to store a 2D table of data on the hard drive with efficient access for a single data point?
This table of data is only needed while the program is running, so I don't care about its interoperability or how it stores the data on the hard drive, as it will be deleted after the program has run.
HDF5 isn't really overkill if it works. In addition to PyTables, there's the somewhat simpler h5py.
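A minimal h5py sketch of that idea (the grid shape and file name are illustrative); a chunked on-disk dataset supports single-point reads without pulling the grid into RAM:

    import h5py
    import numpy as np

    # Illustrative shape: a 20000 x 10000 grid of 3-float points (~200M points).
    with h5py.File("grid.h5", "w") as f:
        dset = f.create_dataset("grid", shape=(20000, 10000, 3),
                                dtype="f4", chunks=True)
        # Write incrementally instead of building the full array in memory.
        dset[300, 42] = np.array([1.0, 2.0, 3.0], dtype="f4")

    with h5py.File("grid.h5", "r") as f:
        point = f["grid"][300, 42]  # reads one point from disk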
Numpy lets you mmap a file directly into a numpy array. The values will be stored in the disk file in the minimum-overhead way, with the numpy array shape providing the mapping between array indices and file offsets. mmap uses the same underlying OS mechanisms that power the disk cache to map a disk file into virtual memory, meaning that the whole thing can be loaded into RAM if memory permits, but parts can be flushed to disk (and reloaded later on demand) if it doesn't all fit at once.
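A minimal numpy.memmap sketch under the same illustrative shape; indexing the array touches only the pages that back those file offsets:

    import numpy as np

    # 'w+' creates (or overwrites) the backing file; ~2.4 GB for this shape.
    table = np.memmap("grid.dat", dtype="float32", mode="w+",
                      shape=(20000, 10000, 3))
    table[300, 42] = (1.0, 2.0, 3.0)  # written through to the file
    table.flush()

    # Reopen later; shape and dtype are not stored, so supply them again.
    table = np.memmap("grid.dat", dtype="float32", mode="r",
                      shape=(20000, 10000, 3))
    print(table[300, 42])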
I have a Python application that creates polygons to identify geographic areas of interest at specific times. To this point I've been using GeoJSON because of the handy geojson library that makes writing it easy, and I put the time information in the file name. However, now I need to publish my polygons via a WMS with TIME (probably using MapServer). As GeoJSON doesn't appear to support a feature time and geojson-events hasn't been accepted yet, I thought I would try converting to GML; however, I cannot seem to locate a library that would make writing GML from Python simple. Does one exist? I tried using the geojson-events format and then ogr2ogr to convert from geojson-events to GML, but the time information gets dropped.
So I'm looking for either:
a) an efficient way to write GML from Python,
b) a way to encode datetime information into GeoJSON such that OGR will recognize it, or
c) another brilliant solution I haven't thought of.
To convert GeoJSON into GML you could use GDAL (the Geospatial Data Abstraction Library). There are numerous ways of using the library, including directly from Python.
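A minimal sketch, assuming GDAL's Python bindings (osgeo) are installed and with illustrative file names; gdal.VectorTranslate is the Python counterpart of the ogr2ogr command line:

    from osgeo import gdal

    # Equivalent to: ogr2ogr -f GML areas.gml areas.geojson
    gdal.VectorTranslate("areas.gml", "areas.geojson", format="GML")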
However, as you want to set up a WMS to serve your data, you might instead set up a spatial database, for example PostgreSQL/PostGIS, import the GeoJSON directly into the database, and let MapServer do the conversion for you.
See Store a GeoJSON FeatureCollection to postgres with postgis for details of how you might do this.
I'm new to the HDF5 file format and have been experimenting successfully in Python with h5py. Now it's time to store real data.
I will need to store a list of objects, where each object can be one of several types and each type will have a number of arrays and strings. The critical part is that the list of objects will be different for each file, and there will be hundreds of files.
Is there a way to automatically export an arbitrary, nested object into (and back from) an HDF5 file? I'm imagining a routine that would automatically walk the hierarchy of a nested object and build the same hierarchy in the HDF5 file.
I've read through the h5py docs and don't see any such traversal routines. Furthermore, Google and SO searches are (strangely) not showing this capability. Am I missing something, or is there another way to look at this?
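For illustration, a minimal recursive sketch of the idea, assuming the nested objects can be reduced to dicts whose leaves are arrays, strings, or scalars (handling arbitrary object types and attributes would take more work):

    import h5py
    import numpy as np

    def save_dict(group, data):
        """Recursively mirror a nested dict as HDF5 groups and datasets."""
        for key, value in data.items():
            if isinstance(value, dict):
                save_dict(group.create_group(key), value)
            else:
                # Arrays, strings, and scalars all become datasets.
                group.create_dataset(key, data=value)

    def load_dict(group):
        """Rebuild the nested dict from an HDF5 group."""
        out = {}
        for key, item in group.items():
            if isinstance(item, h5py.Group):
                out[key] = load_dict(item)
            else:
                out[key] = item[()]  # read the dataset's value
        return out

    with h5py.File("objects.h5", "w") as f:
        save_dict(f, {"obj0": {"kind": "typeA", "points": np.zeros((4, 3))},
                      "obj1": {"kind": "typeB", "label": "hello"}})

    with h5py.File("objects.h5", "r") as f:
        objects = load_dict(f)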
Let's say I have a dictionary of about 100k pairs of strings, and a numpy matrix of shape (100k, 500). I would like to save them to disk in the same file.
What I'm doing right now is using cPickle to dump the dictionary, and scipy.io.savemat to dump the matrix. This way, the dump / load is very fast. But the problem is that since I use different methods I obtain 2 files, and I would like to have just one file containing my 2 objects. How can I do this?
I could cPickle them both in the same file, but cPickle is incredibly slow on big arrays.
You could use dill. dill.dump defers to numpy's own dump method to store an array or matrix object, so it's stored the same way it would be if you called the method on the numpy object directly. You'd just dill.dump the dictionary.
dill also has the ability to store pickles in a compressed format, but it's slower. As mentioned in the comments, there's also joblib, which can do the same as dill; basically, joblib leverages cloudpickle (another serializer), or can also use dill, to do the serialization.
If you have a huge dictionary, and don't need all of the contents at once… maybe a better option would be klepto, which can use advanced serialization methods (from dill) to store a dict to several files on disk (or a database), where you have a proxy dict in memory that enables you to only get the entries you need.
All of these packages give you a fast unified dump for standard python and also for numpy objects.
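A minimal sketch of the single-file route with dill (object sizes and the file name are illustrative):

    import dill
    import numpy as np

    d = {"key%d" % i: "value%d" % i for i in range(100000)}
    m = np.random.rand(100000, 500)

    # One file holding both objects; dill hands the array off to numpy.
    with open("data.pkl", "wb") as f:
        dill.dump({"dict": d, "matrix": m}, f)

    with open("data.pkl", "rb") as f:
        restored = dill.load(f)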
I'm creating a GUI using PyQt4 and Python 2.7 in which I have to process different images with advanced techniques. The images are processed as NumPy arrays; however, the program holds more than one image plus other data (dictionaries, tuples, lists, NumPy arrays), and all of it should preferably be stored in just one file. It would be great if all this data could be stored in a file such as file_name.project (with a custom extension). I don't know how to do this. Is there an easy way to do this? What do you recommend?
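One common approach is to pickle a single dict of all the project state to the custom-extension file; a minimal sketch under that assumption (the contents and file name are illustrative, and on Python 2.7 cPickle is the fast variant):

    import cPickle as pickle  # Python 2.7; on Python 3 just `import pickle`
    import numpy as np

    # Gather everything the GUI needs into one plain dict.
    project = {
        "images": [np.zeros((480, 640), dtype=np.uint8)],  # illustrative
        "settings": {"threshold": 0.5},
        "labels": ["area1", "area2"],
    }

    # The .project extension is arbitrary; pickle does not care about it.
    with open("my_analysis.project", "wb") as f:
        pickle.dump(project, f, protocol=pickle.HIGHEST_PROTOCOL)

    with open("my_analysis.project", "rb") as f:
        project = pickle.load(f)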