Write Raw Numbers to Disk - python

It occurred to me that I have no idea how to write raw numerical values to disk.
How would I do this in Python or C++?
I'm running some simulations and writing intermediate results to disk so that a run doesn't have to start from scratch if it crashes.
Sadly these values chomp up gigabytes upon gigabytes of space on my hard drive.
Would writing the numerical values to disk as floats take up significantly less disk space or is there some other overhead I'm not considering?

The most versatile and powerful option is to use the HDF5 format through its Python interface, h5py. From the website:
It lets you store huge amounts of numerical data, and easily
manipulate that data from NumPy. For example, you can slice into
multi-terabyte datasets stored on disk, as if they were real NumPy
arrays. Thousands of datasets can be stored in a single file,
categorized and tagged however you want.
It also has a C++ API.
The HDF5 format is widely used in the scientific computing community, and many software packages can read and write it. Data in the HDF5 format can also be manipulated quickly with the parallel utility tools.

You can roll your own binary format and use that, but it's probably a bad idea.
If you're using Python to deal with numeric data, you're almost certainly using numpy. If you're not, you should look into it; it's great.
Once you've got your data in a numpy array, you can write it to disk with numpy.save and read it back with numpy.load.
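As a rough illustration of the numpy route, here is a minimal sketch. The file name checkpoint.npy and the sample array are made up for the example, and it assumes numpy is installed. It saves a float64 array, loads it back, and compares the file size with the raw size of the data.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample data: 1000 float64 values (8 bytes each).
values = np.linspace(0.0, 1.0, 1000)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.npy")  # illustrative file name
    np.save(path, values)                       # binary .npy file
    restored = np.load(path)
    size_on_disk = os.path.getsize(path)

raw_bytes = values.nbytes  # size of the raw numbers alone
print(raw_bytes, size_on_disk)
```

The .npy file is the raw bytes plus a small header describing dtype and shape, so the overhead per file is fixed rather than per value.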

The general method in Python is to use the struct module.
import struct
print(struct.pack("!d", 3.14159))
(You can choose the byte order with a prefix character. I use ! for network (big-endian) byte order for portability; with no prefix, struct uses the machine's native byte order, sizes, and alignment. IEEE 754 itself does not specify a byte order, so if the files may move between machines, pick an explicit order and use it for both writing and reading.)
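To make the struct approach concrete, here is a minimal sketch that writes a list of doubles to a file and reads them back. The helper names write_doubles and read_doubles, the file name numbers.dat, and the choice of little-endian order ("<") are assumptions made for this example, not part of the original answer.

```python
import os
import struct
import tempfile

values = [3.14159, 2.71828, 1.41421]  # illustrative data


def write_doubles(path, numbers):
    """Write the numbers as consecutive little-endian 8-byte doubles."""
    with open(path, "wb") as f:
        f.write(struct.pack("<%dd" % len(numbers), *numbers))


def read_doubles(path):
    """Read back every 8-byte double stored in the file."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // struct.calcsize("<d")
    return list(struct.unpack("<%dd" % count, data))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "numbers.dat")
    write_doubles(path, values)
    size = os.path.getsize(path)
    restored = read_doubles(path)

print(size, restored)
```

Each double costs exactly 8 bytes on disk, whereas a text representation with full precision typically needs more than twice that, which is where most of the savings would come from.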

Before you optimize, make sure you are at least doing something like this (storing your numeric type in its binary representation on disk). If you are at this point and the file sizes are still too large, you can consider different types of compressed formats.
#include <cstdint>   // int32_t
#include <fstream>
#include <iostream>

typedef int32_t my_numeric_type;

int main()
{
    using namespace std;
    {
        // Write the values 0..1000 in their raw binary representation.
        ofstream output_file("numbers.dat", ios::binary);
        if( !output_file )
        {
            cout << "Failed to open file for writing" << endl;
            return 1;
        }
        for( my_numeric_type i = 0 ; i <= 1000; ++i )
            output_file.write(reinterpret_cast<const char*>(&i), sizeof(i));
    }   // output_file is closed here, before the file is reopened for reading
    {
        // Read the values back, one sizeof(i)-byte chunk at a time.
        ifstream input_file("numbers.dat", ios::binary);
        if( !input_file )
        {
            cout << "Failed to open file for reading" << endl;
            return 1;
        }
        my_numeric_type i;
        while( input_file.read(reinterpret_cast<char*>(&i), sizeof(i)) )
            cout << i << endl;
    }
    return 0;
}

Related

Need a way to properly determine c data type sizes from within python

The company I work for has a proprietary file format that's old. REALLY old. I'm building a python library to read/write from the files (database type files) and have some questions.
The runtime that originally reads/writes to the files reads out the first XX bytes dynamically based on the sizeof the struct in question. For example:
struct fhdr {
    union {
        unsigned char ifflag[2]; /* file type and psw */
        int fh_flag;             /* alignment to old version */
    } ufh;
    unsigned reclen;             /* record length in bytes */
    DWORD fsize;                 /* byte size/reclen */
    struct {
        short typ;
        short offset;            /* in bytes */
    } fmt[MAXITMS];              /* struct for formatted file; MAXITMS is 65? */
};
My issue is that we have customers across a wide array of platforms. A long is 8 bytes on one customer's system, but on another customer's old SCO 6 box (they're out there!) it might be 4 bytes.
Right now, I have this:
#include <stdio.h>

int main(void){
    /* sizeof yields a size_t, so cast to int to match the %d format. */
    printf("char=%d\n", (int)sizeof(char));
    printf("int=%d\n", (int)sizeof(int));
    printf("short=%d\n", (int)sizeof(short));
    printf("long=%d\n", (int)sizeof(long));
    printf("float=%d\n", (int)sizeof(float));
    printf("double=%d\n", (int)sizeof(double));
    printf("long double=%d\n", (int)sizeof(long double));
    printf("DWORD=%d\n", (int)sizeof(long)); /* DWORD is treated as long here */
    printf("unsigned=%d\n", (int)sizeof(unsigned));
    return 0;
}
It just prints out the sizes in this format:
char=1
int=4
short=2
long=8
float=4
double=8
long double=16
DWORD=8
and it's parsed when the class is instantiated. I can then go and build an array based on the platform's real variable sizes.
My question is: is there a way, in Python 3.x, for me to find an individual server's data type sizes, or am I just better off parsing the output of a simple C program as above?
It's not hard, it just feels tedious and repetitive and feels WRONG to go and create custom functions to retrieve each datatype.
header_fields = {
    'ifflag': IMS.char() * 2,
    'fh_flag': IMS.int(),
    'reclen': IMS.unsigned(),
    'fsize': IMS.DWORD(),
    'typ': IMS.short(),
    'offset': IMS.short()
}
(Yes, I know that a char is always 1 byte. I just like uniformity.)
What I have works and does its job rather well. I just want to learn how to improve on it, if possible.
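One possible way to get these sizes from within Python 3 is the standard ctypes module, sketched below. The dictionary name C_TYPES is made up for this example. Note the assumption involved: ctypes reports the sizes used by the C compiler that built the Python interpreter, which normally matches the platform's ABI but is not guaranteed to match the compiler that built the legacy runtime. DWORD is left out because it is not a standard C type; the C program above treats it as long.

```python
import ctypes

# Map the C type names used above to their ctypes equivalents.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "unsigned": ctypes.c_uint,
    "long": ctypes.c_long,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "long double": ctypes.c_longdouble,
}

# Sizes in bytes, as seen by the interpreter's own C compiler.
sizes = {name: ctypes.sizeof(ctype) for name, ctype in C_TYPES.items()}

for name, size in sizes.items():
    print("%s=%d" % (name, size))
```

The output uses the same name=size format as the C program, so an existing parser could consume either source.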

I want to create something like a python dictionary in C++

I'm using a struct. Is there some way to iterate through all the items of type "number"?
struct number { int value; string name; };
In C++, std::map works like a Python dictionary, but there is a basic difference between the two languages: C++ is statically typed, while Python uses duck typing. A C++ map has fixed key and value types, so it can't accept arbitrary (key, value) pairs the way a Python dictionary can.
Some sample code to make this clearer:
map<int, char> mymap;
mymap[1] = 'a';
mymap[4] = 'b';
cout << "my map is -" << mymap[1] << " " << mymap[4] << endl;
There are tricks for building a map that accepts any type of key; see http://www.cplusplus.com/forum/general/14982/
As I understand it, you want to look up a value and a name by number. You could use an array of structs, such as
number n[5]; with elements n[0], n[1], ..., n[4]
but C++ also provides standard containers such as map and set for this.
You can find lots of examples for map.
You can use std::map (or unordered_map)
// Key type: int, value type: std::string.
std::map<int, std::string> data {{1, "Test"}, {2, "Plop"}, {3, "Kill"}, {4, "Beep"}};
for (const auto& item : data) {
    // item.first is the key, item.second is the value.
    std::cout << item.first << " : " << item.second << "\n";
}
Compile and run:
> g++ -std=c++14 test.cpp
> ./a.out
1 : Test
2 : Plop
3 : Kill
4 : Beep
The difference between std::map and std::unordered_map is that in std::map the items are ordered by key, while in std::unordered_map the items are not ordered (so they are printed in a seemingly arbitrary order).
Internally they use very different structures but I am sure you are not interested in that level of detail.

Python Struct.Unpack in C++ (Bitshift?)

data = struct.unpack('!10H', buf[:20])
Now assuming C++ where buf is a std::string.
Could I just write:
uint8_t f1 = (buf[0] << 8) | buf[1];
uint8_t f2 = (buf[2] << 8) | buf[3];
?
I have to translate a Python ROS IMU driver to ROS C++ and have to deal with a lot of struct unpacks. I have read about different ways to translate the code: some suggest declaring a corresponding struct and using memcpy or reinterpret_cast, others suggest bit shifting. The lines above are what I have compiling so far. Would this do what I want it to do? Or how do I convert a std::string or uint8_t array to the corresponding values?
And what does the percent sign (%) before dB mean in this unpack? In the Python manual this parameter is not listed under the format characters.
data = struct.unpack('!%dB'%length, buf[:-1])
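On the percent sign: it is not a struct format character. '!%dB' % length is ordinary Python string formatting, which substitutes the integer into the format string before struct.unpack ever sees it. The sketch below uses made-up values for length and the buffers to show this, and also checks that the shift-and-or approach gives the same numbers as a big-endian 'H' unpack.

```python
import struct

# '%d' is filled in by Python's % operator, producing the real struct format.
length = 5                                  # illustrative value
fmt = "!%dB" % length                       # becomes '!5B': five unsigned bytes
buf = bytes([1, 2, 3, 4, 5, 0xFF])          # illustrative buffer, last byte dropped
data = struct.unpack(fmt, buf[:-1])

# '!2H' reads two big-endian unsigned 16-bit integers.
raw = bytes([0x12, 0x34, 0xAB, 0xCD])
unpacked = struct.unpack("!2H", raw)

# The same values built by hand: high byte shifted left by 8, OR low byte.
shifted = tuple((raw[i] << 8) | raw[i + 1] for i in range(0, len(raw), 2))

print(fmt, data, unpacked, shifted)
```

In C++ the same shift-and-or works, with two caveats: the result needs a 16-bit type such as uint16_t (a uint8_t would drop the high byte), and because char may be signed, each byte of a std::string should be cast to unsigned char (or uint8_t) before shifting.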

python scipy weave long integer

I'm using scipy.weave to improve the performance of my Python code. Basically, I have to go through a long array of shape (1024^3, 3), i.e. an array containing 1024^3 elements with 3 entries each, compute several things for each element, and then fill another array.
The problem is that I get a segmentation fault when the array is larger than ~(850**3,3). The segmentation fault takes place when I try to read the value of the array at the position (a,3), where a = 715827882. Note that 3*a ~ 2^31. I have carefully explored this issue and it seems to me that I can't go through arrays with a length larger than the maximum value of an integer variable.
In fact, this simple program
################################
import numpy as np
import scipy.weave as wv

def printf():
    a = 3*1024**3
    support = """
#include <iostream>
using namespace std;
"""
    code = """
cout << a << endl;
"""
    wv.inline(code, ['a'],
              type_converters=wv.converters.blitz,
              support_code=support, libraries=['m'])

printf()
#########################################
outputs -1073741824 instead of 3221225472, which means (I think) that the variable a is treated in the C code as a 32-bit integer instead of a 64-bit one.
Does anyone know how to solve this? Of course, I could split my array into pieces smaller than 2^31 elements, but I find that very inefficient.
Thanks.
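The arithmetic behind that output can be checked without weave. This sketch uses the standard ctypes module, which truncates values to the target width without overflow checking, to show what happens when 3*1024**3 is stored in a 32-bit signed integer. It only illustrates the wraparound and is not a fix for the weave type conversion.

```python
import ctypes

a = 3 * 1024**3                        # 3221225472, does not fit in 31 bits

as_int32 = ctypes.c_int32(a).value     # wraps around modulo 2**32
as_int64 = ctypes.c_int64(a).value     # fits without change

# The index where the segfault appears: 3 * 715827882 is just below 2**31.
flat_index = 3 * 715827882

print(as_int32, as_int64, flat_index, 2**31 - 1)
```

The wrapped value equals a - 2**32, which matches the number printed by the weave program, and the flat index at the crash sits right at the edge of the signed 32-bit range.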

Port C's fread(&struct,....) to Python

Hey, I'm really struggling with this one. I'm trying to port a small piece of someone else's code to Python and this is what I have:
typedef struct
{
    uint8_t Y[LUMA_HEIGHT][LUMA_WIDTH];
    uint8_t Cb[CHROMA_HEIGHT][CHROMA_WIDTH];
    uint8_t Cr[CHROMA_HEIGHT][CHROMA_WIDTH];
} __attribute__((__packed__)) frame_t;

frame_t frame;

while (! feof(stdin))
{
    fread(&frame, 1, sizeof(frame), stdin);
    // DO SOME STUFF
}
Later I need to access the data like so: frame.Y[x][y]
So I made a class 'frame' in Python and gave it the corresponding attributes (frame.Y, frame.Cb, frame.Cr).
I have tried to sequentially map the data from Y[0][0] to Cr[MAX][MAX], even printed out the C struct in action but didn't manage to wrap my head around the method used to put the data in there. I've been struggling overnight with this and have to get back to the army tonight, so any immediate help is very welcome and appreciated.
Thanks
You should use Python's standard struct module.
From its documentation (emphasis added):
This module performs conversions
between Python values and C structs
represented as Python strings. It uses
format strings (explained below) as
compact descriptions of the lay-out of
the C structs and the intended
conversion to/from Python values. This
can be used in handling binary data
stored in files or from network
connections, among other sources.
Note: as the data you are reading in the end is of a uniform format, you could also use the array module and then "restructure" the data in Python, but I think the best way to go is by using struct.
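A minimal sketch of how the struct approach could look for this frame layout follows. The dimensions are placeholders, since the post does not give the real LUMA and CHROMA values, and the names Frame, to_rows and read_frames are invented for the example. It relies on the C struct being packed, so the three arrays sit back to back in row-major order with no padding.

```python
import io
import struct

# Placeholder dimensions; substitute the real values from the C header.
LUMA_HEIGHT, LUMA_WIDTH = 4, 6
CHROMA_HEIGHT, CHROMA_WIDTH = 2, 3

LUMA_SIZE = LUMA_HEIGHT * LUMA_WIDTH
CHROMA_SIZE = CHROMA_HEIGHT * CHROMA_WIDTH
FRAME_FORMAT = "%dB" % (LUMA_SIZE + 2 * CHROMA_SIZE)  # all fields are uint8_t
FRAME_SIZE = struct.calcsize(FRAME_FORMAT)


def to_rows(flat, height, width):
    """Split a flat sequence into `height` rows of `width` values."""
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


class Frame:
    def __init__(self, raw):
        values = struct.unpack(FRAME_FORMAT, raw)
        y_end = LUMA_SIZE
        cb_end = y_end + CHROMA_SIZE
        self.Y = to_rows(values[:y_end], LUMA_HEIGHT, LUMA_WIDTH)
        self.Cb = to_rows(values[y_end:cb_end], CHROMA_HEIGHT, CHROMA_WIDTH)
        self.Cr = to_rows(values[cb_end:], CHROMA_HEIGHT, CHROMA_WIDTH)


def read_frames(stream):
    """Yield Frame objects until no complete frame is left in the stream."""
    while True:
        raw = stream.read(FRAME_SIZE)
        if len(raw) < FRAME_SIZE:
            break
        yield Frame(raw)


# Demo with an in-memory stream holding two frames plus one stray byte.
demo = io.BytesIO(bytes(range(FRAME_SIZE)) * 2 + b"\x00")
frames = list(read_frames(demo))
print(len(frames), frames[0].Y[1][0], frames[0].Cb[0][0], frames[0].Cr[1][2])
```

To read from standard input as the C code does, pass sys.stdin.buffer to read_frames so that the data is read as bytes rather than text. Elements are then accessed as frame.Y[row][column], matching the C indexing.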
