Printing a string prints 'u' before the string in Python? - python

'u' before elements in printed list? I didn't type u in my code.
hobbies = []
#prompt user three times for hobbies
for i in range(3):
    hobby = raw_input('Enter a hobby:')
    hobbies.append(hobby)
#print list stored in hobbies
print hobbies
When I run this, it prints the list but it is formatted like this:
Enter a hobby: Painting
Enter a hobby: Stargazing
Enter a hobby: Reading
[u'Painting', u'Stargazing', u'Reading']
None
Where did those 'u' come from before each of the elements of the list?

I think what you're actually surprised by here is that printing a single string doesn't do the same thing as printing a list of strings—and this is true whether they're Unicode or not:
>>> hobby1 = u'Dizziness'
>>> hobby2 = u'Vértigo'
>>> hobbies = [hobby1, hobby2]
>>> print hobby1
Dizziness
>>> print hobbies
[u'Dizziness', u'V\xe9rtigo']
Even without the u, you've got those extra quotes, not to mention that backslash escape. And if you try the same thing with str byte strings instead of unicode strings, you'll still have the quotes and escapes (plus you might have mojibake characters if your source file and your terminal have different encodings… but forget that part).
In Python, every object can have two different representations: the end-user-friendly representation, str, and the programmer-friendly representation, repr. For byte strings, those representations are Painting and 'Painting', respectively. And for Unicode strings, they're Painting and u'Painting'.
The print statement uses the str, so print hobby1 prints out Painting, with no quotes (or u, if it's Unicode).
However, the str of a list uses the repr of each of its elements, not the str. So, when you print hobbies, each element has quotes around it (and a u if it's Unicode).
This may seem weird at first, but it's an intentional design decision, and it makes sense once you get used to it. And it would be ambiguous to print out [foo, bar, baz]—is that a list of three strings, or a list of two strings, one of which has a comma in the middle of it? But, more importantly, a list is already not a user-friendly thing, no matter how you print it out. My hobbies are [Painting, Stargazing] would look just as ugly as My hobbies are ['Painting', 'Stargazing']. When you want to show a list to an end-user, you always want to format it explicitly in some way that makes sense.
Often, what you want is as simple as this:
>>> print 'Hobbies:', ', '.join(hobbies)
Hobbies: Painting, Stargazing
Or, for Unicode strings:
>>> print u'Hobbies:', u', '.join(hobbies)
Hobbies: Painting, Stargazing
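The str/repr distinction described above can be checked directly. This is only a minimal sketch, and it assumes Python 3 (the question uses Python 2, where the same elements would also show a u prefix); the names rebuilt and line are made up for illustration:

```python
# Python 3 sketch: a list's str() is built from the repr() of each element.
hobbies = ['Painting', 'Stargazing']

print(str(hobbies[0]))    # Painting    (end-user form, no quotes)
print(repr(hobbies[0]))   # 'Painting'  (programmer form, with quotes)
print(str(hobbies))       # ['Painting', 'Stargazing']

# Rebuild the list's str() by hand to show where the quotes come from.
rebuilt = '[' + ', '.join(repr(h) for h in hobbies) + ']'
print(rebuilt == str(hobbies))   # True

# Explicit formatting for end users.
line = 'Hobbies: ' + ', '.join(hobbies)
print(line)               # Hobbies: Painting, Stargazing
```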

The 'u' is not part of the string, but indicates that the string is a unicode string.

You're not printing the strings, you're printing the representation of the list holding the strings.
for hobby in hobbies:
    print hobby

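If you want to convert the unicode strings to plain strings, you can simply use
str(unicodedString), or unicode(normalString) for the conversion the other way. Note that in Python 2 both calls use the ASCII codec by default, so they only work when the text is pure ASCII.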
Code
hobbies = []
#prompt user three times for hobbies
for i in range(3):
    hobby = raw_input('Enter a hobby:')
    # converting the normal string to unicode
    hobbies.append(unicode(hobby))
# Printing the unicoded string
print("Unicoded string")
print(hobbies)
hobbies = [str(items) for items in hobbies]
# Printing the converted string
print("Normal string from unicoded string")
print(hobbies)
Output
Enter a hobby:test1
Enter a hobby:Test2
Enter a hobby:Test3
Unicoded string
[u'test1', u'Test2', u'Test3']
Normal string from unicoded string
['test1', 'Test2', 'Test3']
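For comparison, here is a rough sketch of the same kind of conversion in Python 3, where it is written as an explicit encode/decode step. This assumes Python 3 and UTF-8 as the encoding, neither of which comes from the answer above; the variable names are only illustrative:

```python
# Python 3 sketch: text (str) and bytes are converted with an explicit encoding.
text = 'V\u00e9rtigo'               # 'Vértigo' as a text string

data = text.encode('utf-8')         # text -> bytes
print(data)                         # b'V\xc3\xa9rtigo'

round_trip = data.decode('utf-8')   # bytes -> text
print(round_trip == text)           # True

# An ASCII-only conversion fails on the accented character.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print('cannot encode as ASCII:', exc.reason)
```

Naming the encoding avoids depending on a default codec that may not cover the text.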

Related

back slash in token key [duplicate]

In Python, when I print a string with a backslash, it prints the backslash only once:
>>> print(r'C:\hi')
C:\hi
>>> print('C:\\hi')
C:\hi
But I noticed that when you print a tuple of strings with backslashes, it prints a double backslash:
>>> print((r'C:\hi', 'C:\\there'))
('C:\\hi', 'C:\\there')
Why does it behave differently when printing the tuple?
(Note, this happens in both Python 2 and 3, and in both Windows and Linux.)
When you print a tuple (or a list, or many other kinds of items), the representation (repr()) of the contained items is printed, rather than the string value. For simpler types, the representation is generally what you'd have to type into Python to obtain the value. This allows you to more easily distinguish the items in the container from the punctuation separating them, and also to discern their types. (Think: is (1, 2, 3) a tuple of three integers, or a tuple of a string "1, 2" and an integer 3—or some other combination of values?)
To see the repr() of any string:
print(repr(r'C:\hi'))
At the interactive Python prompt, just specifying any value (or variable, or expression) prints its repr().
To print the contents of tuples as regular strings, try something like:
items = (r'C:\hi', 'C:\\there')
print(*items, sep=", ")
str.join() is also useful, especially when you are not printing but instead building a string which you will later use for something else:
text = ", ".join(items)
However, the items must be strings already (join requires this). If they're not all strings, you can do:
text = ", ".join(map(str, items))
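To confirm that the doubled backslash only appears in the representation and not in the string itself, something like the following can be tried. It is a small sketch assuming Python 3; path, as_tuple and joined are illustrative names:

```python
# The string itself holds one backslash; only its repr() shows two.
path = r'C:\hi'

print(len(path))        # 5
print(str(path))        # C:\hi
print(repr(path))       # 'C:\\hi'

items = (path, 'C:\\there')
as_tuple = str(items)   # containers use repr() for each element
joined = ', '.join(items)
print(as_tuple)         # ('C:\\hi', 'C:\\there')
print(joined)           # C:\hi, C:\there
```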

Python \0 in a string followed by a number behaves inconsistently

I can enter an octal value of 'up to 3 characters' in a string.
Is there any way to enter an octal value of only 1 character?
For instance.
If I want to print \0 followed by "Hello", I can do:
"\0Hello"
but if I want to print \0 followed by "12345" I can't do
"\012345"
instead I have to do
"\00012345"
This can, in very obscure scenarios, lead to inconsistent behaviour.
def parseAsString(characters):
    output = ['H','I''!','\\','0'] + characters
    print("".join(output).encode().decode('unicode_escape'))
parseAsString(['Y','O','U'])
#Output:
#>HI! YOU
parseAsString(['1','2','3'])
#Output:
#>HI!
#>3
The answer to this, when you're dealing with \0, is to do one of the following:
Always remember to explicitly use \000 or \x00. This may not be possible if your raw text is coming from another source.
When dealing with raw strings AND concatenating them, always decode each constituent part first, then concatenate them last, not the other way around.
For instance the parser will do this for you if you concatenate strings together:
"\0" + "Hello"
and
"\0" + "12345"
Both work consistently as expected, because "\0" is converted to "\x00" before being concatenated with the rest of the string.
Or, in the more obscure scenario:
def safeParseAsString(characters):
    output = "".join(['H','I''!','\\','0']).encode().decode('unicode_escape')
    output +="".join(characters).encode().decode('unicode_escape')
    print(output)
safeParseAsString(['Y','O','U'])
#Output:
#>HI! YOU
safeParseAsString(['1','2','3'])
#Output:
#>HI! 123
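The underlying rule is that an octal escape consumes up to three octal digits after the backslash. A quick sketch with plain string literals, assuming Python 3 (the variable names are only for illustration):

```python
# An octal escape takes up to three octal digits after the backslash.
short = "\0Hello"       # NUL followed by "Hello"
greedy = "\012345"      # "\012" is newline (octal 12 = decimal 10), then "345"
padded = "\00012345"    # "\000" is NUL, then "12345"
hexed = "\x0012345"     # "\x00" always takes exactly two hex digits

print(repr(short))      # '\x00Hello'
print(repr(greedy))     # '\n345'
print(repr(padded))     # '\x0012345'
print(padded == hexed)  # True
```

Because \x always reads exactly two hex digits, "\x00" is the least surprising way to write a NUL that is followed by digits.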

japanese, korean characters not showing up in lists, but show up fine when printed separately

I have strings of characters in different languages, mainly Japanese, and they show up fine when I try to print them as strings. However, when I add many of them to a python list, and then print out the list, they display as text like this: xe9
for example:
string1 = "西野カナ- NO. 1"
string2 = "첫눈처럼 너에게 가겠다"
list1 = []
list1.append(string1)
list1.append(string2)
print list1
for item in list1:
    print item
These two prints will give me different outputs:
['\xe8\xa5\xbf\xe9\x87\x8e\xe3\x82\xab\xe3\x83\x8a- NO. 1 NEW', '\xec\xb2\xab\xeb\x88\x88\xec\xb2\x98\xeb\x9f\xbc \xeb\x84\x88\xec\x97\x90\xea\xb2\x8c \xea\xb0\x80\xea\xb2\xa0\xeb\x8b\xa4']
西野カナ- NO. 1 NEW
첫눈처럼 너에게 가겠다
How would I get the list to print the actual characters too?
When you print a list (or write it to a file), Python calls str() on the list, and the list in turn calls repr() on each of its elements. So what you are seeing is what repr() returns for each string:
print repr(string1)
'\xe8\xa5\xbf\xe9\x87\x8e\xe3\x82\xab\xe3\x83\x8a- NO. 1'
It is really discouraged. So if you want to avoid encoding problems, you should seriously consider switching to Python 3.
You can check out this or see unicode in python2 and python3
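As a rough illustration of why Python 3 helps here: its repr() leaves printable non-ASCII characters alone, so a list of such strings prints readably. This sketch assumes Python 3 and a UTF-8 capable terminal; titles, as_list, joined and escaped are made-up names:

```python
# Python 3 sketch: repr() keeps printable non-ASCII characters as they are.
titles = ["西野カナ- NO. 1", "첫눈처럼 너에게 가겠다"]

as_list = str(titles)          # list form, quotes but no \x escapes
joined = ", ".join(titles)     # explicit formatting, no quotes
escaped = ascii(titles[0])     # ascii() gives the escaped form on request

print(as_list)
print(joined)
print(escaped)
```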

How to replace items in a list, or string?

def str_to_bin(user_input):
    str_list = list(user_input)
str_to_bin = ('Hello World')
The string 'Hello World' has been turned into a list, so that each character is separated (because using the replace function in strings only replaces words). But from here on, I have no idea how to change the letter 'ah' to, for example, '000001'. I tried multiple ways but nothing seems to work.
And I want a compact way too, because, obviously converting phrases into binary requires a value for each character.
If doing it with a list isn't the best way to go, how can you replace individual characters in strings?
>>> myString = "Hello World"
>>> myString.replace("H","F")
"Fello World"
If you want binary to char (actually, binary to int to char here)
>>> replaceChar = '00010001' #8 bits
>>> int(replaceChar, 2)
17
>>> chr(int(replaceChar, 2))
'\x11'
The replace function is a string method. What exactly has not been working when you try it, and what have you tried?
I'm not quite sure what you are asking; however, if you're trying to say you want to go through your list and replace values with other specific values (ex: Say for example replace letters like "a" with their binary values "01100001") then you could use a dictionary and then just process your way through it. Here's an example I made for you using my binary example:
dictionary = {
    'a': "01100001",
    'b': "01100010",
    'c': "01100011",
    'd': "01100100",
    #etc..
}
def modify(raw_input):
    message = ''
    print("Your new output is: ")
    for character in raw_input:
        message += "%s" % (dictionary[character])
    print message
def main():
    modify(raw_input())
main()
Edit: Input and output for this file would be:
>>> abc
>>> Your new output is:
>>> 011000010110001001100011
I think this is what you are looking for but additional clarification is needed. The function converts each character in the string to a binary value.
def str_to_bin(user_input):
    str_list = list(user_input)
    return [format(ord(x), 'b') for x in str_list]
print str_to_bin('Hello World')
# OUTPUT
# ['1001000', '1100101', '1101100', '1101100', '1101111', '100000', '1010111', '1101111', '1110010', '1101100', '1100100']
I am not clear on exactly what your requirement is: whether to return the binary value of each character in the input string as a list, or to return the equivalent binary representation of the whole string. That is, if you provide the input abc, do you want each binary value separately in a list, as ['1100001', '1100010', '1100011'], or the concatenated binary representation 110000111000101100011?
However, I think you can do it on your own once you know the approach.
But, as mentioned in your code and by #afarber1, you don't even need to convert the input string to a list separately. So the following line is not needed at all:
str_list = list(user_input)
This is because a string is already a sequence of characters in Python: you can iterate over it directly and access individual characters by index.
def str_to_bin(user_input):
    # if you need the binary of each character in a list
    return [format(char, 'b') for char in bytearray(user_input)]
    # if you need the equivalent binary representation of the string itself,
    # use this return instead (it is unreachable after the one above):
    # return ''.join(format(char, 'b') for char in bytearray(user_input))
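One thing worth noting about the concatenated form: without a fixed width it cannot be split back into characters unambiguously. Here is a minimal sketch that pads every code to 8 bits so the conversion can be reversed. It assumes Python 3 and UTF-8 encoding, and bin_to_str is a hypothetical helper that is not part of the answers above:

```python
def str_to_bin(text):
    """Return one zero-padded 8-bit string per byte of the UTF-8 encoded text."""
    return [format(byte, '08b') for byte in text.encode('utf-8')]

def bin_to_str(codes):
    """Inverse of str_to_bin: turn the bit strings back into text."""
    return bytes(int(code, 2) for code in codes).decode('utf-8')

codes = str_to_bin('abc')
print(codes)               # ['01100001', '01100010', '01100011']
print(''.join(codes))      # 011000010110001001100011
print(bin_to_str(codes))   # abc
```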

Python: efficient syntax for creating a list composed of strings

I am new to python and essentially trying to figure out the syntax that replicates this functionality: strings = ["foo", "bar", "apple"] with something similar to strings = [foo, bar, apple] so that I don't have to put quotes around all the entries. Thanks!
strings = "foo bar apple".split()
strings = r"foo, bar, apple".split(", ")
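You could place all the strings in a text file, one string on each line. Then strings = list(open("datafile", "r")). Note that each entry keeps its trailing newline, so you may want to strip it, e.g. strings = [line.rstrip("\n") for line in open("datafile", "r")].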
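A small sketch of the file-based approach, assuming Python 3. To keep it self-contained, io.StringIO stands in for the real open("datafile") call, and the file contents are invented for the example:

```python
import io

# io.StringIO stands in for open("datafile") so the sketch is self-contained.
datafile = io.StringIO("foo\nbar\napple\n")

raw = list(datafile)                           # keeps the line endings
print(raw)                                     # ['foo\n', 'bar\n', 'apple\n']

strings = [line.rstrip("\n") for line in raw]  # strip them explicitly
print(strings)                                 # ['foo', 'bar', 'apple']

print("foo bar apple".split() == strings)      # True
```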
I would say that there is no easier way to create a list of strings than what you're already doing.
As other answers have pointed out, there are ways to put all the strings in one big string or file, then split them, but in my opinion that is more difficult to type than the quotes, particularly if you have a decent IDE that automatically closes string quotes. Also, the syntax you're already using is what anyone else who reads your code will expect; using something else just adds unnecessary confusion for almost zero gain.
There is a major difference between the two forms you posted:
The quotes "" and '' indicate that a value is a string, just as [1,2,3] would be a list of integers. If you remove the quotes, you are writing names that must already refer to Python objects, so you are creating a list of whatever objects those names are bound to. A Python object is basically the foundation of all Python classes, e.g. integers and strings are Python objects at their most basic level.
You can do something like:
foo = "foo"
bar = "bar"
apple = "apple"
strings = [foo,bar,apple]
If you have a lot of elements to write, you can use a program to do that (such an incredible idea from a developer!), and then copy-paste the result:
li = []
ch = ("You have entered an empty string ''\n"
      'Type ENTER alone if you want to stop.\n'
      'Type anything if you want to record the empty string : ')
while True:
    e = raw_input('enter : ')
    if e=='':
        x = raw_input(ch)
        if x=='': break
    li.append(e)
print
print li
Example:
enter : 123
enter : ocean
enter : flower
enter :
You have entered an empty string ''
Type ENTER alone if you want to stop.
Type anything if you want to record the empty string : k
enter : once upon a time
enter : 14 * 4
enter :
You have entered an empty string ''
Type ENTER alone if you want to stop.
Type anything if you want to record the empty string :
['123', 'ocean', 'flower', '', 'once upon a time', '14 * 4']
What you then only have to copy, not type, is:
['123', 'ocean', 'flower', '', 'once upon a time', '14 * 4']
