PyRun_StringFlags is slow - python

I'm writing a Python C extension and need to convert a string into a Python object, for example the str "(1001,1.0,1)" into the tuple (1001,1.0,1).
I'm currently using the PyRun_StringFlags function to get the object, but it isn't fast enough for me. Is there another way to do this?

Related

Python Multiprocessing & ctype arrays

I'm trying to do some work on a file that has various data in it. I'm reading it in string/raw format and then working on the strings.
I'm trying to make the processing parallel so I can work on several chunks at once, but the files are quite large (several gigabytes), so memory is an issue.
The processes don't need to modify the input data, so they don't need their own copies. However, I don't know how to make an array of strings as a ctype in Python 2.7.
Currently I have:
import multiprocessing, ctypes
from multiprocessing.sharedctypes import Value, Array
with open('test.txt', 'r') as fin:
    rawdata = Array('c', fin.readlines(), lock=False)
But this doesn't work as I'd hoped: it treats the whole thing as one massive char buffer and fails because it wants a single string object. I need to be able to pull out the original lines and work with them using existing Python code that examines the contents of the lines and performs various operations, from substring matching to pulling out integer and float values for mathematical operations. Is there a sensible way to achieve this that I'm missing? Perhaps Array is the wrong thing to use for pushing the data into a shared C format?
Do you want your strings to end up as Python strings, or as C-style strings, a.k.a. null-terminated character arrays? If you're working with Python string processing, then simply reading the file into a non-ctypes Python string and using that everywhere is the way to go; Python doesn't copy strings by default, since they're immutable anyway. If you want to use C-style strings, then you will want to allocate a character buffer using ctypes and use fin.readinto(buffer).
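As a rough illustration of the readinto approach, here is a minimal sketch written for Python 3 (the question targets 2.7, so treat it as an assumption that it carries over). The helper name load_file_into_buffer and the demo file contents are made up for the example. It allocates one ctypes character buffer the size of the file and fills it directly, without building a list of line objects first.

```python
import ctypes
import os
import tempfile


def load_file_into_buffer(path):
    """Read a whole file into a single ctypes char buffer (hypothetical helper)."""
    size = os.path.getsize(path)
    buf = ctypes.create_string_buffer(size)  # writable array of `size` chars
    with open(path, "rb") as fin:            # binary mode, so readinto is available
        n = fin.readinto(buf)                # fills buf in place, returns bytes read
    return buf, n


# Demo with a small throwaway file standing in for the real data.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"alpha 1\nbeta 2.5\n")
demo_path = tmp.name

buf, n = load_file_into_buffer(demo_path)
os.remove(demo_path)

# buf.raw makes a bytes copy; it is used here only to inspect the contents.
lines = buf.raw[:n].splitlines()
```

Note that splitting into lines still creates new Python objects, so for multi-gigabyte inputs you would probably want to split only the chunk a worker is currently handling.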

Efficiently slicing a string in Python3

Since python does slice-by-copy, slicing strings can be very costly.
I have a recursive algorithm that is operating on strings. Specifically, if a function is passed a string a, the function calls itself on a[1:] of the passed string. The hangup is that the strings are so long, the slice-by-copy mechanism is becoming a very costly way to remove the first character.
Is there a way to get around this, or do I need to rewrite the algorithm entirely?
The only way to get around this in general is to make your algorithm use bytes-like types, either Py2 str or Py3 bytes; views of Py2 unicode/Py3 str are not supported. I provided details on how to do this in my answer to a related question, but the short version is: if you can assume bytes-like arguments (or convert to them), wrapping the argument in a memoryview and slicing is a reasonable solution. Once converted to a memoryview, slicing produces new memoryviews with O(1) cost (in both time and memory), rather than the O(n) time/memory cost of text slicing.
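To make the memoryview suggestion concrete, here is a small sketch for Python 3. The function sum_bytes is a made-up stand-in for the asker's recursive algorithm; the only point is that view[1:] yields another view onto the same bytes instead of a copy.

```python
def sum_bytes(view):
    """Recursively sum byte values, dropping the first byte on each call."""
    if len(view) == 0:
        return 0
    # view[1:] is a new memoryview over the same buffer: O(1), no copy.
    return view[0] + sum_bytes(view[1:])


data = "hello".encode("utf-8")   # convert text to a bytes-like object first
view = memoryview(data)
total = sum_bytes(view)

# The sliced view still refers to the original bytes object.
tail = view[1:]
shares_buffer = tail.obj is data
tail_as_bytes = bytes(tail)      # copy out only when a real bytes value is needed
```

The recursion depth limit still applies to very long inputs; the memoryview only removes the per-call copying cost.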

Returning more than one output in C extensions?

Python allows a function to return more than one result by separating the values with commas.
When developing a CPython extension written in C, is it possible to obtain the same result? How?
I'm developing a CPython extension that replaces existing Python code in order to run some performance tests, and I would prefer to keep the same interface so that I don't have to change the existing code base.
I'm using Python 3.6.
Yes, you can create a tuple with PyTuple_New, populate it and return it. The caller will be able to unpack the result as usual.
Python returns multiple values by using containers. A single object is returned that is unpacked. Comma separated means tuple; in square brackets, on the other hand, a list is created. See How does Python return multiple values from a function for more on this.
If you'd like an example of how one might do this in C you can take a look at the implementation of str.partition or array.buffer_info (or any tuple returning built-in method).
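Seen from the Python side, the point above looks like the following sketch (min_max is a made-up function). A C function that returns a tuple built with PyTuple_New should look the same to the caller as this pure-Python version, so existing call sites that unpack the result would not need to change.

```python
def min_max(values):
    """Return two results; the comma builds a single tuple object."""
    return min(values), max(values)


result = min_max([3, 1, 2])   # one object comes back: a tuple
lowest, highest = result      # the caller unpacks it as usual
```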

How do I make my program acknowledge that a variable contains an integer in Python

I'm new to using Python (and dynamically typed languages in general) and I'm having trouble with my variables being incorrectly typed at run time. The program I've written accepts 6 variables (all should be integers) and performs a series of calculations using them. However, the interpreter refuses to perform the first multiplication because it believes the variables are of type 'str'. Even when I enter integers for all values it breaks at run time and claims I've entered strings. Shouldn't Python treat anything that walks and quacks like an int as if it were an int?
Thanks in advance.
PS: I'm running Python 3.4.0, if that helps.
input() always returns a string. If you wanted to have an integer, convert your input.
variable = int(variable)
Python doesn't coerce, you need to convert explicitly. Dynamic typing doesn't mean Python will read your mind. :-)
You can think of it this way: "Duck Typing" applies to the type of a variable, not of the variable's contents. A string variable is something that can for example be indexed with [] or added to other strings with + and even repeated several times with * {some integer}, but you can't add a string to an integer, even if the string happens to be a number.
The number-ness of a string has nothing to do with the type.
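A short sketch of the difference, using literal strings in place of what input() would return (the variable names are made up for the example):

```python
raw_a, raw_b = "6", "7"        # what input() hands back: strings, not ints

# Multiplying two strings fails, even though both look like numbers.
try:
    raw_a * raw_b
    error = None
except TypeError as exc:
    error = str(exc)

repeated = raw_a * int(raw_b)       # str * int repeats the string
product = int(raw_a) * int(raw_b)   # explicit conversion gives arithmetic
```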

how to translate "0x%llx" from C to python

I'm reading some strings from a memory buffer, written by a C program. I need to fetch them using Python and print them. However, when I encounter a string containing %llx, Python does not know how to parse it:
"unsupported format character 'l' (0x6c) at index 14"
I could use replace('%llx','%x'), but then it would not be a long long. Would Python handle this correctly in this case?
"then it would not be a long long"
Python (essentially) doesn't have any concept of a long long. If you're pulling long longs from C code, just use %x and be done with it -- you're not ever going to get values from the C code that are out of the long long range, the only issue that could arise is if you were trying to send them from Python code into C. Just use (with a new-style format string):
print('{0:x}'.format(your_int))
Tested on both Python v3.3.3 and v2.7.6:
>>> print('%x' % 523433939134152323423597861958781271347434)
6023bedba8c47434c84785469b1724910ea
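If the format strings themselves come from the C side, one possible approach is to strip the C length modifiers before applying Python's % operator. This is only a sketch: the helper c_format and its regular expression are made up for illustration, cover integer conversions only, and do not handle a literal "%%" that is immediately followed by something like "llx".

```python
import re

# Matches a C integer conversion such as %llx, %016llx or %hhu.
# Group 1: flags/width/precision, group 2: the conversion letter.
_C_INT_SPEC = re.compile(r"%([-+ #0]*\d*(?:\.\d+)?)(?:hh|h|ll|l|j|z|t)([diouxX])")


def c_format(fmt, *args):
    """Apply a C-style format string after dropping length modifiers (hypothetical helper)."""
    py_fmt = _C_INT_SPEC.sub(r"%\1\2", fmt)   # e.g. "%016llx" becomes "%016x"
    return py_fmt % args


addr = c_format("addr=0x%llx", 2**40)
padded = c_format("%016llx", 255)
count = c_format("%llu items", 18446744073709551615)
```

Because Python integers are not limited to 64 bits, dropping the modifier loses nothing when printing values that originated in C.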
