Why does Python Array Module Process Strings and Lists Differently?

I'm having trouble understanding the result of the following statements:
>>> from array import array
>>> array('L',[0xff,0xff,0xff,0xff])
array('L', [255L, 255L, 255L, 255L])
>>> from array import array
>>> array('L','\xff\xff\xff\xff')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: string length not a multiple of item size

You are running this on a 64-bit build of Python, on which array treats type code 'L' as a 64-bit unsigned integer.
>>> array('L','\xff\xff\xff\xff\xff\xff\xff\xff')
array('L', [18446744073709551615L])
The documentation isn't very clear. All it says is that 'L' is at least four bytes.

In the first case you are initializing the array from a list with 4 elements. That will give you an array with 4 elements: one for each value in the list.
In the second case you are initializing the array from a byte string: the bytes in the string will be copied directly into the array. The 'L' specifier creates an array of unsigned longs which have a minimum size of 4 bytes.
On my machine (Windows 64 bit Python 2.6) initializing from a 4 byte string works fine:
>>> a = array('L','\xff\xff\xff\xff')
>>> a.tostring()
'\xff\xff\xff\xff'
I guess whichever version of Python you are using has unsigned longs that are 8 bytes rather than 4. Try converting the array you created from a list back to a string and see how many bytes that contains:
>>> a = array('L',[0xff,0xff,0xff,0xff])
>>> a.tostring()
'\xff\x00\x00\x00\xff\x00\x00\x00\xff\x00\x00\x00\xff\x00\x00\x00'
P.S. I'm assuming that you are using Python 2.x; on Python 3.x you would have got a TypeError instead.
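On Python 3 the same behaviour can be checked without guessing the platform's item size. The following is a minimal sketch, assuming Python 3 (where tostring() is spelled tobytes()); the variable names are illustrative only:

```python
from array import array

# itemsize is the width in bytes of one 'L' element on this build:
# typically 4 on Windows and 8 on 64-bit Linux/macOS.
item_size = array('L').itemsize

# From a list: one element per list item, whatever the item size is.
from_list = array('L', [0xff, 0xff, 0xff, 0xff])

# From bytes: the raw bytes are copied in, so the length must be a
# multiple of item_size or a ValueError is raised.
from_bytes = array('L', b'\xff' * item_size)

print(item_size, len(from_list), len(from_list.tobytes()), len(from_bytes))
```

Building the byte string from item_size avoids hard-coding 4 or 8, so the same line works on either kind of build.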

Related

Numpy array of integers from decimal to binary in python

I have a numpy array: ch=[1, 2, 3, 4, 20, 25]
How can I write it as: b'\x01\x02\x03\x04\x14\x19'
Note: I do not want to convert each integer to binary. Is there any function available to do it directly in one step?

You can use the bytes built-in and pass it the sequence:
>>> ch=[1, 2, 3, 4, 20, 25]
>>> bytes(ch)
b'\x01\x02\x03\x04\x14\x19'
On a side note, what you are showing is a Python list, not a NumPy array.
But if you want to operate on a NumPy array, you can first convert it to a Python list:
>>> bytes(np.array(ch).tolist())
b'\x01\x02\x03\x04\x14\x19'
If you call tobytes() directly on the NumPy array for the data above:
>>> np.array(ch).tobytes()
b'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00\x14\x00\x00\x00\x19\x00\x00\x00'
The output above is also correct; the difference comes from the data type. Printing the dtype shows numpy.int32 here (the default integer dtype is platform-dependent), which is 32 bits, i.e. 32/8 = 4 bytes for each value:
>>> np.array(ch).dtype
dtype('int32')
If you convert it to an 8-bit, i.e. 1-byte, type, the output is the same as using the bytes built-in on a list (use np.uint8 instead if any value can exceed 127):
>>> np.array(ch).astype(np.int8).tobytes()
b'\x01\x02\x03\x04\x14\x19'
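The effect of the item width can also be reproduced without NumPy. This is a rough sketch using the standard struct module as a stand-in for the dtype (an assumption made for illustration, not what NumPy does internally): the 'i' format with a '<' prefix packs each value as a 4-byte little-endian integer, and 'B' packs it as a single unsigned byte.

```python
import struct

ch = [1, 2, 3, 4, 20, 25]

# Four bytes per value, little-endian: the same layout as a 32-bit
# integer array on a little-endian machine.
wide = struct.pack('<%di' % len(ch), *ch)

# One unsigned byte per value: the same result as bytes(ch).
narrow = struct.pack('%dB' % len(ch), *ch)

print(len(wide), len(narrow), narrow == bytes(ch))
```

The 24-byte and 6-byte lengths correspond to 6 values at 4 bytes and 1 byte each.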

The following will work:
b"".join([int(item).to_bytes(1, "big") for item in ch])
First you iterate over the NumPy array, convert each np.int32 object to int, and call int.to_bytes() on it, which returns 1 byte in big-endian order (you could also use "little" here); then you join them all together.
Alternatively, you could call list() on the array, then pass it to the built-in bytes() constructor:
bytes(list(ch))

Python not passing correct size of string to C

When I try passing a 16 character string from python to C and scramble it, I keep getting random error codes back.
s = ctypes.c_wchar_p("H86ETJJJJHGFTYHr")
print(libc.hash_password(s))
At the start of the code I added a statement to return the size of the string back to python,
however it keeps returning a value of 8
if (sizeof(my_string) != 17) return sizeof(my_string);
If I try to return a single element of the array, it will return a number, which I am assuming is the ascii value of the character, and the code does not error out.
This works for the last element as well, which is correctly recognised as a null.
The code works within C itself perfectly. So how could I get C to accept the correct size string, or python to accept the return string?
EDIT: Forgot to mention, when I do
sizeof(*my_string)
it returns a 1
EDIT 2:
Here is the function definition
unsigned char *hash_password(char *input_string)

In Python 3, "H86ETJJJJHGFTYHr" is a str object made up of Unicode codepoints. Your C function declaration is unsigned char *hash_password(char *input_string). Python str is marshaled as wchar_t* when passed via ctypes, not char*. Use a bytes object for that.
Assuming sizeof is ctypes.sizeof, it works like C and returns the size of the equivalent C object. For a c_wchar_p, that's a wchar_t*, and pointers typically have a size of 4 or 8 bytes (32- or 64-bit OS). It is not the length of the string.
It's also always a good idea to declare the argument types and return type of a function when using ctypes, so it can check the type and number of arguments correctly instead of guessing:
import ctypes
dll = ctypes.CDLL('./your.dll')
dll.hash_password.argtypes = ctypes.c_char_p,
dll.hash_password.restype = ctypes.c_char_p
A quick-and-dirty example (note printf returns length of string printed):
>>> from ctypes import *
>>> dll = CDLL('msvcrt')
>>> dll.printf('hello\n') # ctypes assumes wchar_t* for str, so passes UTF-16-encoded data
h1 # of 68 00 65 00 ... and prints only to first null, 1 char.
>>> dll.printf.argtypes=c_char_p, # tell ctypes the correct argument type
>>> dll.printf('hello\n') # now it detects str is incorrect.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ctypes.ArgumentError: argument 1: <class 'TypeError'>: wrong type
>>> dll.printf(b'hello\n') # pass bytes, and `char*` is marshaled to C
hello
6
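To make the distinction between pointer size and string length concrete on the Python side, here is a small sketch that does not need the asker's library. It assumes the password is plain ASCII, and the variable names are illustrative.

```python
import ctypes

password = "H86ETJJJJHGFTYHr"

# A char* parameter needs bytes, not str, so encode first.
encoded = password.encode("ascii")

# A mutable char buffer holding the 16 characters plus the terminating NUL.
buf = ctypes.create_string_buffer(encoded)

pointer_size = ctypes.sizeof(ctypes.c_char_p)  # size of a char*: 4 or 8
buffer_size = ctypes.sizeof(buf)               # 17: characters plus NUL
string_length = len(buf.value)                 # 16: what strlen would report

print(pointer_size, buffer_size, string_length)
```

A buffer created this way is also writable, which may matter if the C function scrambles the string in place.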

In C, sizeof never returns the length of a string; it returns the size in memory of the variable.
For a string declared as
char *string;
Then string is a pointer to a character, and on your system it seems like pointers are 64 bits (i.e. 8 bytes).
When you do *string in C you get the content of the first element that string points to - i.e. a single character.
To get the length of a string in C, use strlen(my_string).

sizeof returns the size of an object in memory. This is not the same thing as the length of a string.
In your C code, my_string is a pointer to a string. sizeof(my_string) is the size of this pointer: 8 bytes on a 64-bit machine. sizeof(*my_string) is the size of what my_string points to. Since you're getting 1, it likely means that there's another problem in your C code, which is that you're mixing up single-byte characters (char, whose size is always 1 by definition) and wide characters (wchar_t, whose size is almost always 2 or 4).
Your string is a null-terminated wide character string. To obtain its length in C, call wcslen. Note that this means that your whole string processing code must use wchar_t and wcsxxx functions. If your string is a byte string, use char, strlen and other functions that work on byte strings.

String Indices Must be Integers in Python

I am using Python to solve a contest problem. I am getting this error. I am fairly new and inexperienced with Python.
for kek in sorteddic:
    lengthitem = int(len(kek))
    questionstring = start[0, lengthitem]
kek is essentially the "item" in "sorteddic" which is an array of strings.
The error I am getting is:
questionstring = start[0, lengthitem]
TypeError: string indices must be integers
Can someone please help? Thanks.

It's because the item you're trying to use as an index, 0, lengthitem, is not an integer but a tuple of integers, as shown below:
>>> x = 1 ; type(x)
<class 'int'>
>>> x = 1,2 ; type(x)
<class 'tuple'>
If your intent is to get a slice of the array (not entirely clear but I'd warrant it's a fairly safe guess), the correct operator to use is :, as in:
questionstring = start[0:lengthitem]
or, since 0 is the default start point:
questionstring = start[:lengthitem]
The following transcript shows how your current snippet fails and the correct way to do it:
>>> print("ABCDE"[1])
B
>>> print("ABCDE"[1,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> print("ABCDE"[1:3])
BC
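Applied to the loop from the question, the fix could look like the sketch below. The values of start and sorteddic are hypothetical stand-ins, since the question does not show them.

```python
# Hypothetical stand-ins for the asker's data.
start = "ABCDEFGH"
sorteddic = ["ab", "abcd", "a"]

# A colon builds a slice; len() already returns an int, so int() is not needed.
prefixes = [start[:len(kek)] for kek in sorteddic]

# start[0:2] is shorthand for indexing with a slice object.
same_slice = start[slice(0, 2)] == start[0:2]

print(prefixes, same_slice)
```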

Slice notation uses colons, not commas (unless you are in numpy, where commas separate dimensions in slices, although under the hood that is treated as a tuple of slice objects). So use:
questionstring = start[0:lengthitem]

64 bits Symbolic String (Hex) to 32 bit Symbolic String

I'm trying to convert a 64-bit symbolic string to 32-bit. My advisor said that the 64-bit symbolic string is a hex string and that I should simply cut off all the bits after the 8th bit from the left. I decided to do it in Python with a mask:
that_string & 0xFFFFFFFF
But that is impossible without first converting the string with int():
int('0x'+that_string, 16) & 0xFFFFFFFF
But then 'that_string' becomes a real integer and I can't convert it back to a string. Calling chr(int('0x'+that_string, 16) & 0xFFFFFFFF) does not work either; it fails with:
ValueError: chr() arg not in range(256)
Using decode() is not possible either, because of another error:
bash ~$: (that_string).decode('hex')
... File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
output = binascii.a2b_hex(input)
TypeError: Odd-length string
I tried to search for another approach, but didn't find any suitable solution.
And yes, I tried the base64 library, but my adviser said that it's the wrong idea and not a solution.
I'm quite confused. Could you help me, please? Perhaps there are other ways?

Take the rightmost 8 digits with a slice:
hex_string[-8:]
where hex_string is a string containing the hexadecimal representation of your input data (named hex_string here to avoid shadowing the built-in str).
If hex_string has fewer than 8 digits, then they will all be returned by this expression, which is what you want.
For example:
>>> hex_string = '0123456789abcdef'
>>> hex_string[-8:]
'89abcdef'
>>> hex_string = 'abcd'
>>> hex_string[-8:]
'abcd'
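The masking approach from the question can also be made to work by formatting the masked integer back into hex instead of calling chr(). This is a minimal sketch, and the helper name low_32_bits_hex is made up for illustration:

```python
def low_32_bits_hex(hex_string):
    """Return the low 32 bits of a hex string as 8 hex digits."""
    # Parse as base 16, then keep only the lowest 32 bits.
    value = int(hex_string, 16) & 0xFFFFFFFF
    # '08x' formats as lowercase hex, zero-padded to 8 digits.
    return format(value, '08x')

print(low_32_bits_hex('0123456789abcdef'))
print(low_32_bits_hex('abcd'))
```

Unlike the slice, this version always returns exactly 8 digits, padding short inputs with leading zeros, so which one fits depends on whether fixed width is wanted.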

converting hex to int, the 'L' character [duplicate]

I have a 64bit hex number and I want to convert it to unsigned integer. I run
>>> a = "ffffffff723b8640"
>>> int(a,16)
18446744071331087936L
So what is the 'L' at the end of the number?
Using the following commands doesn't help either:
>>> int(a,16)[:-1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'long' object is unsubscriptable
>>> int(a,16).rstrip("L")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'long' object has no attribute 'rstrip'

Python 2.x has two classes of integer (neither of them is unsigned, by the way). There is the usual class int, which is based on your system's concept of an integer (often a 4-byte integer). There's also the arbitrary-precision integer type long. They behave the same in almost all circumstances [1], and int objects automatically convert to long if they overflow. Don't worry about the L in the representation: it just means your integer is too big for int (there was an overflow), so Python automatically created a long instead.
It is also worth pointing out that Python 3.x removed Python 2.x's int in favor of always using long. Since long is now the only integer type, it was renamed to int, as that name is much more common in code. PEP 237 gives more of the rationale behind this decision.
[1] The only time they behave differently that I can think of is that long's __repr__ adds that extra L on the end that you're seeing.

You are trying to apply string methods to an integer. But the string representation of a long integer doesn't have the L at the end:
In [1]: a = "ffffffff723b8640"
In [2]: int(a, 16)
Out[2]: 18446744071331087936L
In [3]: str(int(a, 16))
Out[3]: '18446744071331087936'
The __repr__ does, though (as @mgilson notes):
In [4]: repr(int(a, 16))
Out[4]: '18446744071331087936L'
In [5]: repr(int(a, 16))[:-1]
Out[5]: '18446744071331087936'

You can't call rstrip on an integer; you have to call it on the string representation of the integer.
>>> a = "ffffffff723b8640"
>>> b = int(a,16)
>>> c = repr(b).rstrip("L")
>>> c
'18446744071331087936'
Note however, that this would only be for displaying the number or something. Turning the string back into an integer will append the 'L' again:
>>> int(c)
18446744071331087936L
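For comparison, here is a short sketch of the same conversion on Python 3, where int is the only integer type and no L suffix appears. It assumes Python 3, and the variable names are illustrative.

```python
a = "ffffffff723b8640"
value = int(a, 16)

# On Python 3, str() and repr() of an int agree: there is no trailing L.
as_text = str(value)
as_repr = repr(value)

# Formatting with 'x' recovers the original lowercase hex digits.
round_trip = format(value, 'x')

print(as_text, as_repr == as_text, round_trip == a)
```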
