python scipy weave long integer

I'm using scipy.weave to improve the performance of my Python code. Basically, I have to go through a long array of shape (1024^3, 3) (i.e. an array containing 1024^3 elements, each with 3 entries), compute several things for each element, and then fill another array.
The problem is that I get a segmentation fault when the array is larger than ~(850^3, 3). The segmentation fault takes place when I try to read the value of the array at position (a, 3), where a = 715827882. Note that 3*a ~ 2^31. I have explored this issue carefully, and it seems to me that I can't go through arrays longer than the maximum value of an integer variable.
In fact, this simple program
################################
import numpy as np
import scipy.weave as wv
def printf():
    a = 3*1024**3
    support = """
    #include <iostream>
    using namespace std;
    """
    code = """
    cout << a << endl;
    """
    wv.inline(code, ['a'],
              type_converters=wv.converters.blitz,
              support_code=support, libraries=['m'])

printf()
#########################################
outputs -1073741824 instead of 3221225472, which means (I think) that the variable a is treated in the C code as a 32-bit integer instead of a 64-bit one.
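Indeed, reinterpreting the value with 32-bit two's-complement wraparound reproduces the printed number exactly:

a = 3 * 1024**3           # 3221225472, which exceeds 2**31 - 1
wrapped = a & 0xFFFFFFFF  # keep only the low 32 bits
if wrapped >= 2**31:      # reinterpret as a signed 32-bit value
    wrapped -= 2**32
print(wrapped)            # -1073741824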
Does anyone know how to solve this? Of course, I could split my array into pieces smaller than 2^31, but I find that very inefficient.
Thanks.
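One hedged workaround sketch (not from the original post, and untested against scipy.weave): smuggle the value across as a Python float, which weave passes as a C double; doubles represent integers exactly up to 2**53, so the 64-bit value survives the crossing and can be cast back on the C++ side.

import scipy.weave as wv

def printf64():
    # Sketch only: pass the value as a float (C double), exact up to 2**53,
    # then cast back to a 64-bit integer inside the C++ code.
    a = float(3 * 1024**3)
    support = """
    #include <iostream>
    using namespace std;
    """
    code = """
    long long n = (long long)a;
    cout << n << endl;
    """
    wv.inline(code, ['a'], support_code=support)

printf64()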

Related

cython insert element in array.array

I am trying to convert some Python code into Cython. In the Python code I use data of type
array.array('i', [...]) and use the method array.insert to insert an element at a specific index. In Cython, however, when I try to insert an element using the same method I get this error: BufferError: cannot resize an array that is exporting buffers
basically:
from cpython cimport array
cdef array.array[int] a = array.array('i', [1,2,3,3])
a.insert(1, 5)  # insert 5 at index 1 -> throws the error
I have been looking at cyappend3 from this answer, but I am using libcpp and I'm not sure I understand the magic written there.
Any idea how to insert an element at a specific index in an array.array?
Partial answer:
BufferError: cannot resize an array that is exporting buffers
This is telling you that you have a memoryview (or similar) of the array somewhere. It isn't possible to resize it because that memoryview is looking directly into that array's data and resizing the array will require reallocating the data. You can replicate this error in Python too if you do view = memoryview(arr) before you try to insert.
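For example, in plain Python:

import array

arr = array.array('i', [1, 2, 3, 3])
view = memoryview(arr)   # the view exports arr's buffer
arr.insert(1, 5)         # BufferError: cannot resize an array that is exporting buffers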
In your case:
cdef array.array[int] a = array.array('i', [1,2,3,3])
cdef array.array[int] a declares the array together with a fast buffer view of its elements, and it's this exported buffer that prevents you from resizing. If you just do cdef array.array a it works fine. Obviously you lose fast buffer access to individual elements, but that's the price of changing the data out from under the buffer.
I strongly recommend you don't resize arrays, though. Not only does it involve an O(n) copy of every element; unlike a Python list, array doesn't over-allocate, so even append causes a complete reallocation and copy every time (i.e. it is O(n) rather than amortized O(1)).
Instead I'd suggest keeping the data as a Python list (or maybe something else) until you've finalized the length and only then converting to array.
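A minimal sketch of that approach:

import array

# Accumulate in a Python list (amortized O(1) appends, cheap inserts),
# then convert to a typed array.array in a single O(n) step at the end.
data = []
for value in (1, 2, 3, 3):
    data.append(value)
data.insert(1, 5)
final = array.array('i', data)   # array('i', [1, 5, 2, 3, 3])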
What has been answered in this post (https://stackoverflow.com/a/74285371/4529589) is correct, and I have the same recommendation.
However, I want to add that if you need insert while still declaring a typed C-level buffer, you could use std::vector. This will be faster.
from libcpp.vector cimport vector
cdef vector[int] vect = array.array('i', [1,2,3,3])
vect.insert(vect.begin() + 1 ,5)
And I recommend that if you go with this solution, you drop array.array from the beginning and just initialize the vector directly, as in the sketch below.
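A Cython sketch of the vector-only version (Cython coerces a Python list to std::vector on assignment, and coerces it back on return):

from libcpp.vector cimport vector

def demo():
    # Initialize the vector straight from a Python list; no array.array needed.
    cdef vector[int] vect = [1, 2, 3, 3]
    vect.insert(vect.begin() + 1, 5)
    return vect   # coerces back to a Python list: [1, 5, 2, 3, 3]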

How can I make my Python program use 4 bytes for an int instead of 24 bytes?

To save memory, I want to use fewer bytes (4) for each int I have, instead of 24.
I looked at structs, but I don't really understand how to use them.
https://docs.python.org/3/library/struct.html
When I do the following:
myInt = struct.pack('I', anInt)
sys.getsizeof(myInt) doesn't return 4 like I expected.
Is there something that I am doing wrong? Is there another way for Python to save memory for each variable?
ADDED: I have 750,000,000 integers in an array that I wish to be able to access by index.
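(A side note on the struct attempt: the packed payload really is four bytes; sys.getsizeof also counts the bytes object's own header, which is why it doesn't return 4.)

import struct
import sys

packed = struct.pack('I', 123)
print(len(packed))            # 4 -- the packed payload is four bytes
print(sys.getsizeof(packed))  # larger, because the bytes object header is counted too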
If you want to hold many integers in an array, use a numpy ndarray. Numpy is a very popular third-party package that handles arrays more compactly than Python alone does. It is not in the standard library; adding it was considered, but keeping it separate lets it be updated more frequently than Python itself. Numpy is one of the reasons Python has become so popular for Data Science and other scientific uses.
Numpy's np.int32 type uses four bytes for an integer. Declare your array full of zeros with
import numpy as np
myarray = np.zeros((750000000,), dtype=np.int32)
Or if you just want the array and do not want to spend any time initializing the values,
myarray = np.empty((750000000,), dtype=np.int32)
You then fill and use the array as you like. There is some Python overhead for the complete array object, so the array's size will be slightly larger than 4 * 750,000,000 bytes, but it will be close.
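For example, repeating the zeros initialization from above, element access works as usual and nbytes reports the raw data size:

import numpy as np

myarray = np.zeros((750000000,), dtype=np.int32)
myarray[42] = 7
print(myarray[42])      # 7
print(myarray.nbytes)   # 3000000000 == 4 * 750000000 bytes of raw data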

PyCUDA 2D array implementations (or working with strings)

I'm trying to work with an array of strings(words) in CUDA.
I tried flattening it by creating a single string, but then to index it, I'd have to go through some of it each time a kernel runs. If there are 9000 words with a length of 6 characters, I'd have to examine 53994 characters in the worst case for each kernel call. So I'm looking for different ways to do it.
Update: Forgot to mention, the strings are of different lengths, so I'd have to find the end of each one.
The next thing I tried was copying each word to different memory locations, and then collect the addresses, and pass that to the GPU as an array with the following code:
# np = numpy
wordList = ['asd','bsd','csd']
d_words = []
for word in wordList:
    d_words.append(gpuarray.to_gpu(np.array(word, dtype=str)))

d_wordList = gpuarray.to_gpu(np.array([word.ptr for word in d_words], dtype=np.int32))
ker_test(d_wordList, block=(1,1,1), grid=(1,1,1))
and in the kernel:
__global__ void test(char** d_wordList) {
    printf("First character of the first word is: %c \n", d_wordList[0][0]);
}
The kernel should get an int32 array of pointers that point to the beginning of each word, effectively being a char** (or int**), but it doesn't work as I expect.
What is wrong with this approach?
Also what are the "standard" ways to work with strings in PyCUDA (or even in CUDA) in general?
Thanks in advance.
After some further thought, I've concluded that for this case of variable-length strings, using an "offset array" may not be much different than 2D indexing (i.e. double-pointer indexing), when considering the issue of data access within the kernel. Both involve a level of indirection.
Here's a worked example demonstrating both methods:
$ cat t5.py
#!/usr/bin/env python
import time
import numpy as np
from pycuda import driver, compiler, gpuarray, tools
import math
from sys import getsizeof
import pycuda.autoinit

kernel_code1 = """
__global__ void test1(char** d_wordList) {
    (d_wordList[blockIdx.x][threadIdx.x])++;
}
"""
kernel_code2 = """
__global__ void test2(char* d_wordList, size_t *offsets) {
    (d_wordList[offsets[blockIdx.x] + threadIdx.x])++;
}
"""
mod = compiler.SourceModule(kernel_code1)
ker_test1 = mod.get_function("test1")
wordList = ['asd','bsd','csd']
d_words = []
for word in wordList:
    d_words.append(gpuarray.to_gpu(np.array(word, dtype=str)))
d_wordList = gpuarray.to_gpu(np.array([word.ptr for word in d_words], dtype=np.uintp))
ker_test1(d_wordList, block=(3,1,1), grid=(3,1,1))
for word in d_words:
    result = word.get()
    print result
mod2 = compiler.SourceModule(kernel_code2)
ker_test2 = mod2.get_function("test2")
wordlist2 = np.array(['asdbsdcsd'], dtype=str)
d_words2 = gpuarray.to_gpu(np.array(['asdbsdcsd'], dtype=str))
offsets = gpuarray.to_gpu(np.array([0,3,6,9], dtype=np.uint64))
ker_test2(d_words2, offsets, block=(3,1,1), grid=(3,1,1))
h_words2 = d_words2.get()
print h_words2
$ python t5.py
bte
cte
dte
['btectedte']
$
Notes:
For the double-pointer case, the only change from OP's example was to use the numpy.uintp type for the pointers, as suggested in the comments by @talonmies.
I don't think double-pointer access to the data will necessarily be quicker or slower than the indirection associated with the offset lookup method. Another performance consideration is copying data from host to device and vice versa: the double-pointer method effectively involves multiple allocations and multiple copy operations in both directions, I believe. For a large number of strings, this will be noticeable in the host/device copy operations.
Another possible merit of the offset method is that it is easy to determine the length of each string: just subtract two adjacent entries in the offset list. This makes it easy to decide how many threads can operate on a string in parallel, as opposed to having a single thread work on each string sequentially (the alternatives being to determine string length in kernel code or to pass each string's length explicitly).
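For example, on the host side, using the offsets from the example above:

# Each string's length is the difference of adjacent offset entries.
offsets = [0, 3, 6, 9]
lengths = [offsets[i + 1] - offsets[i] for i in range(len(offsets) - 1)]
print(lengths)   # [3, 3, 3]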

How to use the PCACompute function from Python in OpenCV 3?

The cv2.PCACompute function worked well in OpenCV 2.4, using the following syntax:
import cv2
mean, eigvec = cv2.PCACompute(data)
The function exists in OpenCV 3.1, but raises the following exception:
TypeError: Required argument 'mean' (pos 2) not found
The C++ documentation is not very helpful at explaining how I should call it from Python. I'm guessing that InputOutputArray arguments are now also mandatory arguments in the Python function signature, but I am unable to find a way to make them work.
Is there a way I can call it properly?
(Note: I know there are other ways I can run a PCA, and I'll probably end up using one of them. I'm just curious about how the new OpenCV bindings work.)
Simple answer:
mean, eigvec = cv2.PCACompute(data, mean=None)
With details:
Let's search for PCACompute in the source first. We find this:
// modules/core/src/pca.cpp (lines 351-360)
void cv::PCACompute(InputArray data, InputOutputArray mean,
                    OutputArray eigenvectors, int maxComponents)
{
    CV_INSTRUMENT_REGION()

    PCA pca;
    pca(data, mean, 0, maxComponents);
    pca.mean.copyTo(mean);
    pca.eigenvectors.copyTo(eigenvectors);
}
OK, now we read the documentation:
C++: PCA& PCA::operator()(InputArray data, InputArray mean, int flags, int maxComponents=0)
Python: cv2.PCACompute(data[, mean[, eigenvectors[, maxComponents]]]) → mean, eigenvectors
Parameters:
data – input samples stored as the matrix rows or as the matrix columns.
mean – optional mean value; if the matrix is empty (noArray()), the mean is computed from the data.
flags – operation flags; currently the parameter is only used to specify the data layout.
CV_PCA_DATA_AS_ROW indicates that the input samples are stored as matrix rows.
CV_PCA_DATA_AS_COL indicates that the input samples are stored as matrix columns.
maxComponents – maximum number of components that PCA should retain; by default, all the components are retained.
That is to say,
## py
mean, eigvec = cv2.PCACompute(data, mean=None)
is equal to
// cpp
PCA pca;
pca(data, noArray(), CV_PCA_DATA_AS_ROW /* flags */, 0 /* maxComponents */);
...
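Putting it together, a minimal end-to-end sketch (the random input here is a made-up example; samples are stored as rows):

import cv2
import numpy as np

# 100 samples of dimension 5, one sample per row
data = np.random.rand(100, 5).astype(np.float32)
mean, eigvec = cv2.PCACompute(data, mean=None)
print(mean.shape)    # (1, 5)
print(eigvec.shape)  # (5, 5) when all components are retained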

C++ Pointer to Numpy Array

Briefly:
Is there an efficient way to make a numpy array given a pointer in memory to the array, its type, and the number of elements?
More detail:
I am working with a python framework which has an object.GetData() command that is supposed to return a pointer to the data (an array of 35,000 int8) of this object.
I'm supposed to be able to efficiently load these integers into a numpy array through
arr = numpy.frombuffer(object.GetData(),count=35000,dtype="int8")
but this doesn't seem to work. I get the error message ValueError: buffer is smaller than requested size. Lowering the count, I can get it to output an array, but typically one less than 20 integers in length (usually 0 or 1 integers).
I believe I can access the pointer to the start of the array, in hex form, through
hex(id(object.GetData()))
which looks like it gives an address (e.g. 0x10fd8c670), but I don't know if this is the actual address of the data.
I'm more comfortable in Python than C++, but there could be a bug in the C++ code. The C++ code for GetData is:
const _Tp* GetData() const
{
    // Return a const pointer to the internal data
    return (fData.size() > 0) ? &(fData)[0] : NULL;
}
where fData is initialized as a VecType through:
VecType fData;
Right now I can access each element of the object's data through an object.At(i) command, where i is the index into the object's data array, but loading each element into a numpy array this way is very slow, and I'm dealing with a lot of data. For reference, the At command in the C++ code does this:
_Tp At(size_t i) const
{
    return fData.at(i);
}
Any help would be appreciated. I don't have a ton of experience with pointers, and even less with pointers in Python, but I would like to figure this out in Python rather than rewrite all my code in C++. Thanks!
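(A hedged sketch, not from the original post: if GetData() reaches Python as a plain integer address rather than as a buffer object, one common approach is to wrap the address with ctypes before handing it to numpy.frombuffer. The names below are assumptions.)

import ctypes
import numpy as np

# Hypothetical: 'address' is the integer value of the C pointer returned by
# GetData(), and the underlying buffer holds 35000 int8 values.
address = obj.GetData()                              # assumed to be an integer address
buf = (ctypes.c_int8 * 35000).from_address(address)  # ctypes view of that memory
arr = np.frombuffer(buf, dtype=np.int8, count=35000) # zero-copy numpy wrapper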
