I'm reading a wav audio file in Python using wave module. The readframe() function in this library returns frames as hex string. I want to remove \x of this string, but translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Any way I want divide the result values by 2 and then add \x again and generate a new wav file containing these new values. Does any one have any better idea?
What's wrong?
Indeed, you don't have backslashes in your string. So, that's why you can't remove them.
If you try to play with each hex character from this string (using ord() and len() functions - you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')
One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representation of for characters (in fact, len(s) == 4, not 24).
Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
'ff1f00e8'
As this is a hexadecimal representation, encode with hex
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
The general problem is that I need the hexadecimal string stays in that format to assign it to a variable and not to save the coding?
no good:
>>> '\x61\x74'
'at'
>>> a = '\x61\x74'
>>> a
'at'
works well, but is not as:
>>> '\x61\x74'
'\x61\x74' ????????
>>> a = '\x61\x74'
>>> a
'\x61\x74' ????????
Use r prefix (explained on SO)
a = r'\x61\x74'
b = '\x61\x74'
print (a) #prints \x61\x74
print (b) # prints at
It is the same data. Python lets you specify a literal string using different methods, one of which is to use escape codes to represent bytes.
As such, '\x61' is the same character value as 'a'. Python just chooses to show printable ASCII characters as printable ASCII characters instead of the escape code, just because that makes working with bytestrings that much easier.
If you need the literal slash, x character and the two digit 6 and 1 characters (so a string of length 4), you need to double the slash or use raw strings.
To illustrate:
>>> '\x61' == 'a' # two notations for the same value
True
>>> len('\x61') # it's just 1 character
1
>>> '\\x61' # escape the escape
'\\x61'
>>> r'\x61' # or use a raw literal instead
'\\x61'
>>> len('\\x61') # which produces 4 characters
4
By hex-string, it is a regular string except every two characters represents some byte, which is mapped to some ASCII char.
So for example the string
abc
Would be represented as
979899
I am looking at the binascii module but don't really know how to take the hex-string and turn it back into the ascii string.
Which method can I use?
Note: I am starting with 979899 and want to convert it back to abc
You can use ord() to get the integer value of each character:
>>> map(ord, 'abc')
[97, 98, 99]
>>> ''.join(map(lambda c: str(ord(c)), 'asd'))
'979899'
>>> ''.join((str(ord(c)) for c in 'abc'))
'979899'
You don't need binascii to get the integer representation of a character in a string, all you need is the built in function ord().
s = 'abc'
print(''.join(map(lambda x:str(ord(x)),s))) # outputs "979899"
To get the string back from the hexadecimal number you can use
s=str(616263)
print "".join([chr(int(s[x:x+2], 16)) for x in range(0,len(s),2)])
See http://ideone.com/dupgs
Is there a way to convert a string to lowercase?
"Kilometers" → "kilometers"
Use str.lower():
"Kilometer".lower()
The canonical Pythonic way of doing this is
>>> 'Kilometers'.lower()
'kilometers'
However, if the purpose is to do case insensitive matching, you should use case-folding:
>>> 'Kilometers'.casefold()
'kilometers'
Here's why:
>>> "Maße".casefold()
'masse'
>>> "Maße".lower()
'maße'
>>> "MASSE" == "Maße"
False
>>> "MASSE".lower() == "Maße".lower()
False
>>> "MASSE".casefold() == "Maße".casefold()
True
This is a str method in Python 3, but in Python 2, you'll want to look at the PyICU or py2casefold - several answers address this here.
Unicode Python 3
Python 3 handles plain string literals as unicode:
>>> string = 'Километр'
>>> string
'Километр'
>>> string.lower()
'километр'
Python 2, plain string literals are bytes
In Python 2, the below, pasted into a shell, encodes the literal as a string of bytes, using utf-8.
And lower doesn't map any changes that bytes would be aware of, so we get the same string.
>>> string = 'Километр'
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.lower()
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.lower()
Километр
In scripts, Python will object to non-ascii (as of Python 2.5, and warning in Python 2.4) bytes being in a string with no encoding given, since the intended coding would be ambiguous. For more on that, see the Unicode how-to in the docs and PEP 263
Use Unicode literals, not str literals
So we need a unicode string to handle this conversion, accomplished easily with a unicode string literal, which disambiguates with a u prefix (and note the u prefix also works in Python 3):
>>> unicode_literal = u'Километр'
>>> print(unicode_literal.lower())
километр
Note that the bytes are completely different from the str bytes - the escape character is '\u' followed by the 2-byte width, or 16 bit representation of these unicode letters:
>>> unicode_literal
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> unicode_literal.lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
Now if we only have it in the form of a str, we need to convert it to unicode. Python's Unicode type is a universal encoding format that has many advantages relative to most other encodings. We can either use the unicode constructor or str.decode method with the codec to convert the str to unicode:
>>> unicode_from_string = unicode(string, 'utf-8') # "encoding" unicode from string
>>> print(unicode_from_string.lower())
километр
>>> string_to_unicode = string.decode('utf-8')
>>> print(string_to_unicode.lower())
километр
>>> unicode_from_string == string_to_unicode == unicode_literal
True
Both methods convert to the unicode type - and same as the unicode_literal.
Best Practice, use Unicode
It is recommended that you always work with text in Unicode.
Software should only work with Unicode strings internally, converting to a particular encoding on output.
Can encode back when necessary
However, to get the lowercase back in type str, encode the python string to utf-8 again:
>>> print string
Километр
>>> string
'\xd0\x9a\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> string.decode('utf-8')
u'\u041a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower()
u'\u043a\u0438\u043b\u043e\u043c\u0435\u0442\u0440'
>>> string.decode('utf-8').lower().encode('utf-8')
'\xd0\xba\xd0\xb8\xd0\xbb\xd0\xbe\xd0\xbc\xd0\xb5\xd1\x82\xd1\x80'
>>> print string.decode('utf-8').lower().encode('utf-8')
километр
So in Python 2, Unicode can encode into Python strings, and Python strings can decode into the Unicode type.
With Python 2, this doesn't work for non-English words in UTF-8. In this case decode('utf-8') can help:
>>> s='Километр'
>>> print s.lower()
Километр
>>> print s.decode('utf-8').lower()
километр
Also, you can overwrite some variables:
s = input('UPPER CASE')
lower = s.lower()
If you use like this:
s = "Kilometer"
print(s.lower()) - kilometer
print(s) - Kilometer
It will work just when called.
Don't try this, totally un-recommend, don't do this:
import string
s='ABCD'
print(''.join([string.ascii_lowercase[string.ascii_uppercase.index(i)] for i in s]))
Output:
abcd
Since no one wrote it yet you can use swapcase (so uppercase letters will become lowercase, and vice versa) (and this one you should use in cases where i just mentioned (convert upper to lower, lower to upper)):
s='ABCD'
print(s.swapcase())
Output:
abcd
I would like to provide the summary of all possible methods
.lower() method.
str.lower()
combination of str.translate() and str.maketrans()
.lower() method
original_string = "UPPERCASE"
lowercase_string = original_string.lower()
print(lowercase_string) # Output: "uppercase"
str.lower()
original_string = "UPPERCASE"
lowercase_string = str.lower(original_string)
print(lowercase_string) # Output: "uppercase"
combination of str.translate() and str.maketrans()
original_string = "UPPERCASE"
lowercase_string = original_string.translate(str.maketrans(string.ascii_uppercase, string.ascii_lowercase))
print(lowercase_string) # Output: "uppercase"
lowercasing
This method not only converts all uppercase letters of the Latin alphabet into lowercase ones, but also shows how such logic is implemented. You can test this code in any online Python sandbox.
def turnIntoLowercase(string):
lowercaseCharacters = ''
abc = ['a','b','c','d','e','f','g','h','i','j','k','l','m',
'n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M',
'N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
for character in string:
if character not in abc:
lowercaseCharacters += character
elif abc.index(character) <= 25:
lowercaseCharacters += character
else:
lowercaseCharacters += abc[abc.index(character) - 26]
return lowercaseCharacters
string = str(input("Enter your string, please: " ))
print(turnIntoLowercase(string = string))
Performance check
Now, let's enter the following string (and press Enter) to make sure everything works as intended:
# Enter your string, please:
"PYTHON 3.11.2, 15TH FeB 2023"
Result:
"python 3.11.2, 15th feb 2023"
If you want to convert a list of strings to lowercase, you can map str.lower:
list_of_strings = ['CamelCase', 'in', 'Python']
list(map(str.lower, list_of_strings)) # ['camelcase', 'in', 'python']
I have a base64 encoded string
When I decode the string this way:
>>> import base64
>>> base64.b64decode("XH13fXM=")
'\\}w}s'
The output is fine.
But when i use it like this:
>>> d = base64.b64decode("XH13fXM=")
>>> print d
\}w}s
some characters are missing
Can anyone advise ?
Thank you in advanced.
It is just a matter of presentation:
>>> '\\}w}s'
'\\}w}s'
>>> print(_, len(_))
\}w}s 5
This string has 5 characters. When you use it in code you need to escape backslash, or use raw string literals:
>>> r'\}w}s'
'\\}w}s'
>>> r'\}w}s' == '\\}w}s'
True
When you print a string, the characters in the string are output. When the interactive shell shows you the value of your last statement, it prints the __repr__ of the string, not the string itself. That's why there are single-quotes around it, and your backslash has been escaped.
No characters are missing from your second example, those are the 5 characters in your string. The first example has had characters add to make the output a legal Python string literal.
If you want to use the print statement and have the output look like the first example, then use:
print repr(d)