What is a bytearray? Why was it used? - python

I'm going over other people's code in CoderByte exercises. I was just reviewing the first exercise to review a string.
Here is the code:
def FirstReverse(s):
ar = bytearray(s)
ar.reverse()
return str(ar)
print FirstReverse("Argument goes here")
I printed ar after the first line and just got the string back so I'm unclear how the bytearray helped. I also still didn't understand it after reading the documentation here: https://docs.python.org/2/library/functions.html#bytearray
So what is a bytearray? Did it make sense to use it in this example?

As the doc says,
Return a new array of bytes. ... is a mutable sequence of integers in the range 0 <= x < 256
For example,
>>> s = 'hello world'
>>> print bytearray(s)
hello world
>>> bytearray(s)[0]
104
and 104 is the ASCII side of h.
Class bytearray has the method reverse, but string doesn't. In order to reverse the string, this code first gets its bytes array, and then reserves, finally gets the reversed string by str.
In addition, you can use [::-1] to reverse a string.
>>> 'Argument goes here'[::-1]
'ereh seog tnemugrA'

The difference between a str and a bytearray is that a str is a sequence of Unicode code points, whereas a bytearray is a sequence of bytes. A single Unicode String may be represented by multiple different bytearrays, depending on the encoding format (e.g. there would be different bytearrays for the UTF-8 representation and the UTF-16 representation of the same str). In addition, str is intended to represent text; by contrast, bytearray may be used to represent arbitrary byte sequences that do not correspond to text at all (e.g. sequences of bytes that are not valid Unicode in any standard encoding format and that will, in fact, be interpreted as something completely different from text altogether such as integer sequences, serialized objects, extended precision integers, or anything else you would want to represent as a sequence of bytes).
In addition to this distinction, str is immutable whereas bytearray is mutable. This means that transformations on str necessarily perform copying operations; by contrast, the contents of a bytearray may be updated / modified in place.
In this particular example, there really is no reason to use a bytearray (and in fact, doing that is more dangerous than using a reversed slice of str, because bytearray.reverse() reverses the underlying bytes... for characters that are encoded by more than one byte, this may result in totally invalid Unicode sequences when interpreting back into Unicode code points). However, if you want to examine or manipulate the encoded form of a string or perform something that is totally unrelated to raw text (like populate the bytes of a datagram packet), that would be a use case for bytearray.

I don't see how it helped personally. You can do this type of reversal natively with a string by just slicing it with a step size of -1:
def FirstReverse(s):
return s[::-1]
print FirstReverse("Argument goes here")
I timed the bytearray version and this version using Python 2.7.10 and didn't see one being faster than the other.
So I guess it is a different approach, but I don't see it as a better approach.
The only advantage I could see is if the string were unicode and you are using Python 2.x instead of 3.x (because Python 2.x strings were not natively unicode). However, to pull a unicode string into a bytearray, you need to specify the encoding, which wasn't done here. So it must not have been for that purpose.

Related

Convert a byte array to a literal string [duplicate]

In a python source code I stumbled upon I've seen a small b before a string like in:
b"abcdef"
I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.
What does the b stand for and in which kind of source code is it useful as it seems to be exactly like a plain string without any prefix?
The b prefix signifies a bytes string literal.
If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.
bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41')
Because a bytes object consist of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:
bytes([72, 101, 108, 108, 111])
and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:
strvalue = bytesvalue.decode('utf-8')
Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:
bytesvalue = strvalue.encode('utf-8')
You can also use the constructor, bytes(strvalue, encoding) to do the same.
Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
Python 2, versions 2.6 and 2.7 also support creating string literals using b'..' string literal syntax, to ease code that works on both Python 2 and 3.
bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
This is Python3 bytes literal. This prefix is absent in Python 2.5 and older (it is equivalent to a plain string of 2.x, while plain string of 3.x is equivalent to a literal with u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.

Python convert strings of bytes to byte array

For example given an arbitrary string. Could be chars or just random bytes:
string = '\xf0\x9f\xa4\xb1'
I want to output:
b'\xf0\x9f\xa4\xb1'
This seems so simple, but I could not find an answer anywhere. Of course just typing the b followed by the string will do. But I want to do this runtime, or from a variable containing the strings of byte.
if the given string was AAAA or some known characters I can simply do string.encode('utf-8'), but I am expecting the string of bytes to just be random. Doing that to '\xf0\x9f\xa4\xb1' ( random bytes ) produces unexpected result b'\xc3\xb0\xc2\x9f\xc2\xa4\xc2\xb1'.
There must be a simpler way to do this?
Edit:
I want to convert the string to bytes without using an encoding
The Latin-1 character encoding trivially (and unlike every other encoding supported by Python) encodes every code point in the range 0x00-0xff to a byte with the same value.
byteobj = '\xf0\x9f\xa4\xb1'.encode('latin-1')
You say you don't want to use an encoding, but the alternatives which avoid it seem far inferior.
The UTF-8 encoding is unsuitable because, as you already discovered, code points above 0x7f map to a sequence of multiple bytes (up to four bytes) none of which are exactly the input code point as a byte value.
Omitting the argument to .encode() (as in a now-deleted answer) forces Python to guess an encoding, which produces system-dependent behavior (probably picks UTF-8 on most systems except Windows, where it will typically instead choose something much more unpredictable, as well as usually much more sinister and horrible).
I found a working solution
import struct
def convert_string_to_bytes(string):
bytes = b''
for i in string:
bytes += struct.pack("B", ord(i))
return bytes
string = '\xf0\x9f\xa4\xb1'
print (convert_string_to_bytes(string)))
output:
b'\xf0\x9f\xa4\xb1'

How to convert int to hex (\x) not (0x) [duplicate]

In a python source code I stumbled upon I've seen a small b before a string like in:
b"abcdef"
I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.
What does the b stand for and in which kind of source code is it useful as it seems to be exactly like a plain string without any prefix?
The b prefix signifies a bytes string literal.
If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.
bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41')
Because a bytes object consist of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:
bytes([72, 101, 108, 108, 111])
and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:
strvalue = bytesvalue.decode('utf-8')
Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:
bytesvalue = strvalue.encode('utf-8')
You can also use the constructor, bytes(strvalue, encoding) to do the same.
Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
Python 2, versions 2.6 and 2.7 also support creating string literals using b'..' string literal syntax, to ease code that works on both Python 2 and 3.
bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
This is Python3 bytes literal. This prefix is absent in Python 2.5 and older (it is equivalent to a plain string of 2.x, while plain string of 3.x is equivalent to a literal with u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.

why adding b, before string solves TypeError: startswith first arg must be bytes or a tuple of bytes, not str [duplicate]

In a python source code I stumbled upon I've seen a small b before a string like in:
b"abcdef"
I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.
What does the b stand for and in which kind of source code is it useful as it seems to be exactly like a plain string without any prefix?
The b prefix signifies a bytes string literal.
If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.
bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41')
Because a bytes object consist of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:
bytes([72, 101, 108, 108, 111])
and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:
strvalue = bytesvalue.decode('utf-8')
Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:
bytesvalue = strvalue.encode('utf-8')
You can also use the constructor, bytes(strvalue, encoding) to do the same.
Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
Python 2, versions 2.6 and 2.7 also support creating string literals using b'..' string literal syntax, to ease code that works on both Python 2 and 3.
bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
This is Python3 bytes literal. This prefix is absent in Python 2.5 and older (it is equivalent to a plain string of 2.x, while plain string of 3.x is equivalent to a literal with u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.

What does a b prefix before a python string mean?

In a python source code I stumbled upon I've seen a small b before a string like in:
b"abcdef"
I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.
What does the b stand for and in which kind of source code is it useful as it seems to be exactly like a plain string without any prefix?
The b prefix signifies a bytes string literal.
If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.
bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41')
Because a bytes object consist of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:
bytes([72, 101, 108, 108, 111])
and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:
strvalue = bytesvalue.decode('utf-8')
Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:
bytesvalue = strvalue.encode('utf-8')
You can also use the constructor, bytes(strvalue, encoding) to do the same.
Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
Python 2, versions 2.6 and 2.7 also support creating string literals using b'..' string literal syntax, to ease code that works on both Python 2 and 3.
bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
This is Python3 bytes literal. This prefix is absent in Python 2.5 and older (it is equivalent to a plain string of 2.x, while plain string of 3.x is equivalent to a literal with u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.

Categories

Resources