How to get the size of a string in Python?

For example, I get a string:
str = "please answer my question"
I want to write it to a file.
But I need to know the size of the string before writing the string to the file. What function can I use to calculate the size of the string?

If you are talking about the length of the string, you can use len():
>>> s = 'please answer my question'
>>> len(s) # number of characters in s
25
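If you need the size of the string object in memory, in bytes, use sys.getsizeof(). Note that this includes the interpreter's per-object overhead, so it is not the number of bytes that will be written to a file: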
>>> import sys
>>> sys.getsizeof(s)
58
Also, don't call your string variable str. It shadows the built-in str() function.
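Since the question is about writing the string to a file, the number of bytes the text occupies once encoded may be the most useful figure. A minimal Python 3 sketch, assuming UTF-8 output; the helper name encoded_size is made up for illustration:

```python
import os
import tempfile


def encoded_size(text, encoding="utf-8"):
    """Return how many bytes `text` occupies once encoded (hypothetical helper)."""
    return len(text.encode(encoding))


s = "please answer my question"
size_before_writing = encoded_size(s)  # ASCII text is one byte per character in UTF-8

# Write the encoded bytes to a temporary file (binary mode) and compare with the size on disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(s.encode("utf-8"))
    path = tmp.name
size_on_disk = os.path.getsize(path)
os.remove(path)

print(size_before_writing, size_on_disk)  # 25 25
```

A file opened in text mode may additionally translate newlines, which is why the comparison above writes in binary mode.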

Python 3:
user225312's answer is correct:
A. To count the number of characters in a str object, use the len() function:
>>> print(len('please answer my question'))
25
B. To get the memory size, in bytes, allocated to store a str object, use the sys.getsizeof() function:
>>> from sys import getsizeof
>>> print(getsizeof('please answer my question'))
50
Python 2:
It gets complicated for Python 2.
A. The len() function in Python 2 returns the number of bytes used to store the encoded characters in a str object.
Sometimes it will be equal to character count:
>>> print(len('abc'))
3
But sometimes, it won't:
>>> print(len('йцы')) # String contains Cyrillic symbols
6
That's because a Python 2 str is a sequence of bytes, and the text may be stored in a variable-length encoding such as UTF-8. So, to count the characters in a str you need to know which encoding it uses. Then you can convert it to a unicode object and get the character count:
>>> print(len('йцы'.decode('utf8'))) #String contains Cyrillic symbols
3
B. The sys.getsizeof() function does the same thing as in Python 3: it returns the number of bytes allocated to store the whole string object:
>>> print(getsizeof('йцы'))
27
>>> print(getsizeof('йцы'.decode('utf8')))
32
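For comparison, here is a rough Python 3 counterpart of the Python 2 session above, where text (str) and encoded data (bytes) are separate types. The variable names are illustrative, and the byte counts depend on the encoding chosen:

```python
text = "йцы"                        # three Cyrillic characters

utf8 = text.encode("utf-8")         # two bytes per character for these letters
utf32 = text.encode("utf-32-le")    # always four bytes per character, no BOM

print(len(text))    # 3 characters
print(len(utf8))    # 6 bytes
print(len(utf32))   # 12 bytes

# Decoding with the matching encoding recovers the character count.
print(len(utf8.decode("utf-8")))    # 3
```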

>>> s = 'abcd'
>>> len(s)
4

With pandas, you can also use the .str.len() accessor to get the length of each element in a column:
data['name of column'].str.len()

The most Pythonic way is to use len(). Keep in mind that a backslash that starts an escape sequence is not counted on its own (the whole escape sequence, such as \f, is a single character), and that an invalid escape sequence is an error:
>>> len('foo')
3
>>> len('\foo')
3
>>> len('\xoo')
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
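To make the escape-sequence point concrete, here is a small Python 3 sketch comparing an ordinary literal, an escaped backslash, and a raw string (the variable names are arbitrary). In '\foo' the pair \f is a single form-feed character, which is why the length is 3:

```python
plain = '\foo'      # \f is one form-feed character, followed by 'o', 'o'
escaped = '\\foo'   # \\ is one literal backslash, followed by 'f', 'o', 'o'
raw = r'\foo'       # raw string: the backslash is kept as-is

print(len(plain))    # 3
print(len(escaped))  # 4
print(len(raw))      # 4
print(plain[0] == '\x0c')  # True: form feed is code point 12
print(escaped == raw)      # True
```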

To find the length of a string in Python, use the len() function:
string = input("Enter the string : ")
print("The string length is :", len(string))
Output:
Enter the string : viral
The string length is : 5

Related

Is it possible to chr() a string?

I am sure this question has already been asked, so forgive me for the duplicate.
Python's chr() function returns the unicode string representation of 1 ordinal value. How can I return a unicode string of a string of ordinals? For example:
john:
j - 106
o - 111
h - 104
n - 110
The full unicode string is: 106111104110
My current method is:
from textwrap import wrap
ct = "106111104110" # unicode string
Split = wrap(ct,3) # split into threes list
inInt = list(map(int, Split)) # convert list of string into list of int
answer=''.join([chr(num) for num in inInt]) # return unicode string for each 3 character string
print(answer)
The above works correctly, printing "john".
However, this does not work when the code point is below 100, i.e. has fewer than 3 digits. For example:
apple:
a - 97
p - 112
p - 112
l - 108
e - 101
The full unicode string is: 97112112108101
However doing:
ct="97112112108101"
Split = wrap(ct,3)
inInt = list(map(int, Split))
answer=''.join([chr(num) for num in inInt])
print(answer)
will print ϋyyQ because the code point of a is 97, which is only 2 digits. I would like not to be restricted to characters with code points of 100 or above.
Is there a Python library that has the functionality I am looking for? Many thanks in advance.
Unicode code points can be up to six hexadecimal digits or seven decimal digits, so you could use leading zeros for consistency:
>>> ''.join(format(ord(x),'06x') for x in 'john')
'00006a00006f00006800006e'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6)) # _ gets previous result from REPL.
'john'
>>> ''.join(format(ord(x),'06x') for x in '你好吗')
'004f6000597d005417'
>>> ''.join(chr(int(_[i:i+6],16)) for i in range(0,len(_),6))
'你好吗'
However, typical encoding is performed on byte strings, so encode to UTF-8 first, then you can use bytes methods to get two-digit hex strings:
>>> 'apple'.encode('utf8').hex()
'6170706c65'
>>> bytes.fromhex(_).decode()
'apple'
>>> '你好吗'.encode('utf8').hex()
'e4bda0e5a5bde59097'
>>> bytes.fromhex(_).decode('utf8')
'你好吗'
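The same fixed-width idea also works in decimal, which is closer to the format in the question. A minimal sketch, assuming every code point is zero-padded to seven decimal digits (enough for the largest code point, 1114111); the function names are hypothetical:

```python
WIDTH = 7  # 0x10FFFF == 1114111 has seven decimal digits


def to_ordinals(text):
    """Concatenate the zero-padded decimal code point of each character."""
    return ''.join(format(ord(ch), '0{}d'.format(WIDTH)) for ch in text)


def from_ordinals(digits):
    """Split into WIDTH-digit chunks and map each chunk back to a character."""
    return ''.join(chr(int(digits[i:i + WIDTH]))
                   for i in range(0, len(digits), WIDTH))


encoded = to_ordinals('apple')
print(encoded)                 # 00000970000112000011200001080000101
print(from_ordinals(encoded))  # apple
```

The price of the fixed width is a longer string; a separator-based format (for example, joining the ordinals with spaces) would be an alternative design.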

How to add a character before a string?

I am new and I'm trying to insert a character before a string.
If I have a string like so:
'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA'
I want to add b before the string but not part of the string-like so:
b'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA'
Here's what I tried:
test = 'b' + words[1]
test
but this obviously returns the b within the string, which is not what I want.
That b is not part of the string; it is Python 3 syntax indicating a bytes literal (see this post). If you want to convert a "normal" string into a bytes object, do this:
st = 'abc'
bl = st.encode()
bl
=> b'abc'
I'm not exactly sure what you mean. But assuming words is a list of strings and words[1] is 'wB0JSYuEUshUkgpKi8TRTwv/EABgBAQADAQAAAAAAA', you could use print(f'b {words[1]}').
There is a bit of confusion here. In Python, "" is a string and b"" is a byte string. These are completely different objects. They can be converted to one another, but they are not the same thing, so you can't "add b to a string". Essentially, a byte string b"" holds the bytes that encode a string, whereas a string is the text itself. For example:
x = 'STRING'   # The string itself.
y = x.encode() # The bytes for the string. Note that ASCII bytes are displayed as ASCII characters.
a = 'MyName®'  # The string itself.
b = a.encode() # The bytes for the string. The last character takes two non-ASCII bytes.
c = b.decode() # Convert the bytes back to a string.
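Running a variant of the snippet above in Python 3 and inspecting the results makes the difference visible. This is only an illustrative sketch; the variable names are arbitrary, and UTF-8 (the default for encode() and decode()) is assumed:

```python
text = 'MyName®'             # str: a sequence of Unicode characters
data = text.encode()         # bytes: the UTF-8 encoding of those characters

print(type(text).__name__, len(text))   # str 7
print(type(data).__name__, len(data))   # bytes 8  (® takes two bytes in UTF-8)
print(data)                             # b'MyName\xc2\xae'
print(data.decode() == text)            # True: decoding restores the original string
```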

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all characters correctly (some of them are incorrectly displayed even here, on Stack Overflow) so I have to deal with them with a byte string.
So the above string is in bytes this: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I tried binascii with no success, I've tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning).
Any ideas?
If you're running Python 3.5+, the bytes type has a new bytes.hex() method that returns the hexadecimal string representation:
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise, you can use binascii.hexlify() to do the same thing:
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
"But sometimes the byte string has a 0 next to an x, and it gets removed from the string, resulting in an incorrect hexadecimal string (the string above is missing the 0 at the beginning)."
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the byte string starts with b'\x06', which represents the number 6, hex(6) returns only 0x6. You get a single digit there, which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution does not only work with a bytes string but with really anything that can be formatted as a hexadecimal string (e.g. an integer list):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
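To see the missing-leading-zero problem and the fixes side by side, here is a short Python 3 sketch on a three-byte sample (the sample value and variable names are chosen for illustration only):

```python
import binascii

data = b'\x06\xcf\n'   # bytes 0x06, 0xcf, 0x0a

# Naive approach: hex() does not pad, so 0x06 and 0x0a each lose a digit.
naive = ''.join(hex(b) for b in data).replace('0x', '')

# Zero-padded formatting keeps two digits per byte.
padded = ''.join(format(b, '02x') for b in data)

print(naive)                                   # 6cfa
print(padded)                                  # 06cf0a
print(data.hex())                              # 06cf0a (Python 3.5+)
print(binascii.hexlify(data).decode('ascii'))  # 06cf0a
print(bytes.fromhex(padded) == data)           # True: the padded form round-trips
```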

How to remove '\x' from a hex string in Python?

I'm reading a WAV audio file in Python using the wave module. The readframes() function in this library returns frames as a hex string. I want to remove the \x from this string, but the translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Anyway, I want to divide the resulting values by 2, add the \x back, and generate a new WAV file containing these new values. Does anyone have a better idea?
What's wrong?
Indeed, you don't have backslashes in your string. That's why you can't remove them.
If you inspect each character of this string (using the ord() and len() functions), you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
Use the 'hex' codec (Python 2 only):
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')
One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representations of four characters (in fact, len(s) == 4, not 16).
Use the encode method (Python 2):
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
ff1f00e8
As this is a hexadecimal representation, encode with hex:
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
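The answers above target Python 2. In Python 3, readframes() returns a bytes object, so bytes.hex() gives the same hex string, and the samples can be halved with the struct module. A rough sketch, assuming 16-bit signed little-endian PCM samples (an assumption; check getsampwidth() on the real file) and using the four bytes from the question as stand-in data. The helper name halve_samples is hypothetical:

```python
import struct

frames = b'\xff\x1f\x00\xe8'   # stand-in for the bytes returned by readframes()

print(frames.hex())            # ff1f00e8


def halve_samples(raw):
    """Halve every 16-bit signed little-endian sample in `raw` (assumed format)."""
    count = len(raw) // 2
    samples = struct.unpack('<{}h'.format(count), raw)
    halved = [s // 2 for s in samples]   # floor division keeps values in range
    return struct.pack('<{}h'.format(count), *halved)


quieter = halve_samples(frames)
print(quieter.hex())           # ff0f00f4
```

The resulting bytes could then be passed to writeframes() on a wave file opened for writing.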

how to represent a number value as a string in python?

In Python, how can I represent an integer value (<256) as a string? For example:
i = 10
How can I create a string "s" that is one byte long, where the byte has the value 10?
To clarify, I do not want the string "10". I want a string whose first (and only) byte has the value 10.
By the way, I cannot create the string statically:
s = '\x0A'
because the value is not predefined. It is a dynamic number value.
You can use the chr() function:
>>> chr(60)
'<'
>>> chr(97)
'a'
>>> chr(67)
'C'
To convert back, use the ord() function:
>>> ord('C')
67
In Python 2.x, you want:
s = chr(10)
In Python 3.x, strings are Unicode, so you want:
s = bytes([10])
Why don't you just use chr?
chr(10)
Out[41]: '\n'
chr(255)
Out[42]: '\xff'
I found another answer that uses the struct module and works in Python 2:
import struct
struct.pack('B', i)
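In Python 3, there are a few equivalent ways to build a one-byte bytes object from a runtime value. They are sketched here with an arbitrary value of 10; the variable names are illustrative:

```python
import struct

i = 10  # any integer in range(256)

a = bytes([i])            # from a one-element list of ints
b = i.to_bytes(1, 'big')  # byte order is irrelevant for a single byte
c = struct.pack('B', i)   # 'B' = unsigned char

print(a, b, c)            # b'\n' b'\n' b'\n'
print(a == b == c)        # True
print(len(a), a[0])       # 1 10

# chr(i) returns a one-character str in Python 3, not a bytes object.
print(chr(i) == '\n', isinstance(chr(i), bytes))  # True False
```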
