Create a Python-like list in C++

I have a Python script that I have to translate into C++, and 80% of my Python script is based on lists.
I have a file that I read, and I put the data from that file in a list:
# Code to translate into C++
bloc = [line]
for b in range(11):
    bloc.append(lines[i + 1])
    i += 1
I do my work with that data and then repeat until I have read the whole file.
And finally I want to be able to get data out of this list by doing something like:
#Python script
var = bloc[0, 1, 2, 3 ...]
I'll respond to any questions if you need more info.

The C++ container closest to a Python list is std::vector. However, contrary to Python, a std::vector contains only one type of element, so you have to declare what the vector will hold.
In your case that would be std::string (you are reading lines from a file).
So:
std::vector<std::string> cpp_list; // container for lines (stored as std::string) from the file
is equivalent to the Python python_list = [] and should get you started.
With a std::vector you do not strictly need to allocate storage upfront, but for performance reasons it is better to do so if you know the required size in advance.
If you use cpp_list.reserve(something) or do not do any memory allocation, you must push into the vector using cpp_list.push_back(...), which is similar to python_list.append(...).
If you allocate the elements upfront, e.g. std::vector<std::string> cpp_list(nb_lines), you must use indexing as in Python, e.g. cpp_list[3] = something.
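To tie this back to the block-reading loop from the question, here is a minimal sketch (the 12-line block size and the file name data.txt are assumptions taken from the Python snippet above):
#include <fstream>
#include <string>
#include <vector>

int main()
{
    std::ifstream in("data.txt"); // placeholder file name
    std::string line;

    std::vector<std::string> bloc; // python: bloc = []
    while (std::getline(in, line)) {
        bloc.push_back(line);      // python: bloc.append(line)
        if (bloc.size() == 12) {   // one block of 12 lines, as in the question
            // ... work with bloc[0], bloc[1], bloc[2], ... here
            bloc.clear();          // start the next block
        }
    }
}
If you knew the total number of lines up front, you could instead construct the vector with that size and assign by index, as described above.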

Related

How to get data from np.array to std::vector in c++ using <numpy/arrayobject.h>?

This is my first question on this site.
First of all, I need to make a Python module with one function, written in C++, which must work with numpy using <numpy/arrayobject.h>. This function takes one numpy array and returns two numpy arrays. All arrays are one-dimensional.
The first question is: how do I get the data from a numpy array? I want to collect the information from the array in a std::vector, so that I can easily work with it in C++.
The second: am I right that the function should return a tuple of arrays, so that the user of my module can write this in Python:
arr1, arr2 = foo(arr)
?
And how do I return it like this?
Thank you very much.
NumPy includes lots of functions and macros that make it pretty easy to access the data of an ndarray object within a C or C++ extension. Given a 1D ndarray called v, one can access element i with PyArray_GETPTR1(v, i). So if you want to copy each element in the array to a std::vector of the same type, you can iterate over each element and copy it, like so (I'm assuming an array of doubles):
npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
for (npy_intp i = 0; i < vsize; i++) {
    out[i] = *reinterpret_cast<double*>(PyArray_GETPTR1(v, i));
}
One could also do a bulk memcpy-like operation, but keep in mind that NumPy ndarrays may be mis-aligned for the data type, have non-native byte order, or other subtle attributes that make such copies less than desirable. But assuming that you are aware of these, one could do:
#include <cstring> // for std::memcpy

npy_intp vsize = PyArray_SIZE(v);
std::vector<double> out(vsize);
std::memcpy(out.data(), PyArray_DATA(v), sizeof(double) * vsize);
Using either approach, out now contains a copy of the ndarray's data, and you can manipulate it however you like. Keep in mind that, unless you really need the data as a std::vector, the NumPy C API may be perfectly fine to use in your extension as a way to access and manipulate the data. That is, unless you need to pass the data to some other function which must take a std::vector or you want to use C++ library code that relies on std::vector, I'd consider doing all your processing directly on the native array types.
As to your last question, one generally uses Py_BuildValue to construct the tuple returned from an extension function. Your tuple would just contain the two ndarray objects.
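For illustration, a minimal sketch of such a function (foo and the fill step are placeholders, and the usual extension-module boilerplate with import_array() is assumed):
static PyObject *foo(PyObject *self, PyObject *args)
{
    PyObject *in_obj;
    if (!PyArg_ParseTuple(args, "O!", &PyArray_Type, &in_obj))
        return NULL;

    npy_intp n = PyArray_SIZE((PyArrayObject *)in_obj);
    PyObject *arr1 = PyArray_SimpleNew(1, &n, NPY_DOUBLE);
    PyObject *arr2 = PyArray_SimpleNew(1, &n, NPY_DOUBLE);
    if (arr1 == NULL || arr2 == NULL) {
        Py_XDECREF(arr1);
        Py_XDECREF(arr2);
        return NULL;
    }

    /* ... fill arr1 and arr2 here ... */

    /* "N" passes each array without adding a reference, so the tuple
       takes ownership and no extra DECREF is needed. */
    return Py_BuildValue("NN", arr1, arr2);
}
From Python this supports exactly the arr1, arr2 = foo(arr) unpacking asked about.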

C++ Pointer to Numpy Array

Briefly:
Is there an efficient way to make a numpy array given a pointer to the data in memory, its type, and the number of elements?
More detail:
I am working with a python framework which has an object.GetData() command that is supposed to return a pointer to the data (an array of 35,000 int8) of this object.
I'm supposed to be able to efficiently load these integers into a numpy array through
arr = numpy.frombuffer(object.GetData(),count=35000,dtype="int8")
but this doesn't seem to work. I get the error message ValueError: buffer is smaller than requested size. Changing the length, I can get it to output an array, but typically one of fewer than 20 integers (usually 0 or 1 integers).
I believe I can access the pointer to the start of the array, in hex form, through
hex(id(object.GetData()))
which looks like it gives addresses (e.g. 0x10fd8c670) but I don't know if this is the actual address.
I'm more comfortable in Python than C++, but there could be a bug in the C++ code. The C++ code for GetData is:
const _Tp* GetData() const
{
    // Return a const pointer to the internal data
    return (fData.size() > 0) ? &(fData)[0] : NULL;
}
where fData is declared as a VecType:
VecType fData;
Right now I can access each element of the object's data through an object.At(i) command, where i is the index into the object's data array, but it is very slow to load each element into a numpy array this way, and I'm dealing with a lot of data. For reference, the At command in the C++ code does this:
_Tp At(size_t i) const
{
    return fData.at(i);
}
Any help would be appreciated. I don't have a ton of experience with pointers, and even less with pointers in python, but I would like to figure this out in python rather than re-write all my code in c++. Thanks!
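One thing that can help on the binding side (a sketch of a hypothetical addition, not part of the original class): expose the element count next to the pointer, so whatever wrapper builds the buffer for numpy.frombuffer knows exactly how many bytes it may read.
// Hypothetical companion to GetData(): the wrapper needs the element
// count to construct a buffer of the right length.
size_t GetSize() const
{
    return fData.size();
}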

Parse C Array Size with Python & LibClang

I am currently using Python to parse a C file using LibClang. I've encountered a problem while reading a C array whose size is defined by a #define directive.
With node.get_children I can read the following array perfectly well:
int myarray[20][30][10];
As soon as the array size is replaced with a variable, the array won't be read correctly. The following array code can't be read.
#define MAX 60;
int myarray[MAX][30][10];
The parser actually stops at MAX, and in the dump there is the error: invalid sloc.
How can I solve this?
Thanks
Run the code through a C preprocessor before trying to parse it. That will cause all preprocessor symbols to be replaced by their values, i.e. your [MAX] will become [60].
Note that C code can also do this:
const int three[] = { 1, 2, 3 };
i.e. let the compiler deduce the length of the array from the number of initializer values given.
Or, from C99, even this:
const int hundred[] = { [99] = 4711 };
So a naive approach might still break, but I don't know anything about the capabilities of the parser you're using, of course.
The semicolon (;) in the #define directive was causing the error.
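To make the failure concrete: with the trailing semicolon, MAX expands to 60; and the declaration becomes int myarray[60;][30][10];, which is invalid. Removing it fixes the parse:
#define MAX 60             /* no trailing semicolon in a #define */
int myarray[MAX][30][10];  /* expands to int myarray[60][30][10]; */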

reading a binary file in python

I have to read a binary file in Python. It is first written by a Fortran 90 program in this way:
open(unit=10,file=filename,form='unformatted')
write(10)table%n1,table%n2
write(10)table%nH
write(10)table%T2
write(10)table%cool
write(10)table%heat
write(10)table%cool_com
write(10)table%heat_com
write(10)table%metal
write(10)table%cool_prime
write(10)table%heat_prime
write(10)table%cool_com_prime
write(10)table%heat_com_prime
write(10)table%metal_prime
write(10)table%mu
if (if_species_abundances) write(10)table%n_spec
close(10)
I can easily read this binary file with the following IDL code:
n1=161L
n2=101L
openr,1,file,/f77_unformatted
readu,1,n1,n2
print,n1,n2
spec=dblarr(n1,n2,6)
metal=dblarr(n1,n2)
cool=dblarr(n1,n2)
heat=dblarr(n1,n2)
metal_prime=dblarr(n1,n2)
cool_prime=dblarr(n1,n2)
heat_prime=dblarr(n1,n2)
mu =dblarr(n1,n2)
n =dblarr(n1)
T =dblarr(n2)
Teq =dblarr(n1)
readu,1,n
readu,1,T
readu,1,Teq
readu,1,cool
readu,1,heat
readu,1,metal
readu,1,cool_prime
readu,1,heat_prime
readu,1,metal_prime
readu,1,mu
readu,1,spec
print,spec
close,1
What I want to do is reading this binary file with Python. But there are some problems.
First of all, here is my attempt to read the file:
import numpy
from numpy import *
import struct
file='name_of_my_file'
with open(file, mode='rb') as lines:
    c = lines.read()
I try to read the first two variables:
dummy, n1, n2, dummy = struct.unpack('iiii',c[:16])
But as you can see, I had to add two dummy variables because, somehow, the Fortran program adds the integer 8 in those positions.
The problem comes when trying to read the other bytes. I don't get the same results as the IDL program.
Here is my attempt to read the array n
double = 8
end = 16+n1*double
nH = struct.unpack('d'*n1,c[16:end])
However, when I print this array I get nonsense values. I mean, I can read the file with the IDL code above, so I know what to expect. So my question is: how can I read this file when I don't know its exact structure? Why is it so simple to read with IDL? I need to read this data set with Python.
What you're looking for is the struct module.
This module allows you to unpack data from strings, treating them as binary data.
You supply a format string and your file's contents, and it will consume the data, returning you Python values.
For example, using your variables:
import struct
content = f.read()  # I'm not sure why in a binary file you were using "readlines",
                    # but if this is too much data, you can supply a size to read()
n, T, Teq, cool = struct.unpack("dddd",content[:32])
This will make n, T, Teq, and cool hold the first four doubles in your binary file. Of course, this is just a demonstration. Your example looks like it wants lists of doubles - conveniently struct.unpack returns a tuple, which I take for your case will still work fine (if not, you can listify them). Keep in mind that struct.unpack needs to consume the whole string passed into it - otherwise you'll get a struct.error. So, either slice your input string, or only read the number of characters you'll use, like I said above in my comment.
For example,
n_content = f.read(8*number_of_ns) #8, because doubles are 8 bytes
n = struct.unpack("d"*number_of_ns,n_content)
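For the record markers the question stumbled over (the stray 8s): gfortran-style sequential unformatted files wrap every record in a 4-byte byte count before and after the payload. A sketch of a record reader in C++, assuming 4-byte markers and native byte order:
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <vector>

// Read one Fortran sequential-unformatted record: a 4-byte length,
// the payload, then the same 4-byte length again.
std::vector<char> read_record(std::ifstream &in)
{
    std::int32_t head = 0, tail = 0;
    in.read(reinterpret_cast<char *>(&head), sizeof head);
    if (!in || head < 0)
        throw std::runtime_error("bad record header");
    std::vector<char> payload(static_cast<std::size_t>(head));
    in.read(payload.data(), head);
    in.read(reinterpret_cast<char *>(&tail), sizeof tail);
    if (!in || head != tail)
        throw std::runtime_error("bad record trailer");
    return payload;
}
The 8 the question keeps seeing is exactly this marker: the first record holds two 4-byte integers (n1, n2), so its length is 8.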
Did you give scipy.io.readsav a try?
Simply read your file like this:
import scipy.io
mydict = scipy.io.readsav('name_of_file')
It looks like you are trying to read the cooling_0000x.out file generated by RAMSES.
Note that the first two integers (n1, n2) give the dimensions of the two-dimensional tables (arrays) that follow in the body of the file, so you need to process those two integers first, before you know how much real*8 data is in the rest of the file.
scipy should be of help -- it lets you read arbitrary dimensioned binary data:
http://wiki.scipy.org/Cookbook/InputOutput#head-e35c7736718209eea00ebf37a7e1dfb91df696e1
If you already have this python code, please let me know as I was going to write it today (17Sep2014).
Rick
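Putting that together with the record reader sketched above (same assumptions: 4-byte markers, native byte order; the file name is a placeholder), reading the header and the first table might look like:
#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

std::vector<char> read_record(std::ifstream &in); // from the sketch above

int main()
{
    std::ifstream in("cooling_00001.out", std::ios::binary);

    // First record: the two 4-byte dimensions n1 and n2.
    std::vector<char> rec = read_record(in);
    std::int32_t n1 = 0, n2 = 0;
    std::memcpy(&n1, rec.data(), sizeof n1);
    std::memcpy(&n2, rec.data() + sizeof n1, sizeof n2);

    // Second record: table%nH, i.e. n1 doubles.
    rec = read_record(in);
    std::vector<double> nH(n1);
    std::memcpy(nH.data(), rec.data(), nH.size() * sizeof(double));

    // ... and so on for T2, cool, heat, etc. (n1*n2 doubles each).
}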

Creating a list with >255 elements

Ok, so I'm writing some Python code (I don't write Python much; I'm more used to Java and C).
Anyway, I have a collection of integer literals I need to store
(ideally >10,000 of them; currently I've only got 1000 of them).
I would have liked to access the literals by file IO, or through their source API, but that is disallowed,
and not on-topic anyway.
So I have the literals put into a list:
src=list(0,1,2,2,2,0,1,2,... ,2,1,2,1,1,0,2,1)
#some code that uses the src
But when I try to run the file, it comes up with an error because there are more than 255 arguments.
So the constructor call is the problem.
How should I do this?
The data is initially available to me as a space-delimited text file.
I just searched and replaced, and copied it in.
If you use [] instead of list(), you won't run into the limit because [] is not a function.
src = [0,1,2,2,2,0,1,2,... ,2,1,2,1,1,0,2,1]
src = [int(value) for value in open('mycsv.csv').read().split(',') if value.strip()]
Or are you not able to save a text file on your system?
