Python list items encoding - python

Why does the encoding change in Python 2.7 when I iterate over the items of a list?
test_list = ['Hafst\xc3\xa4tter', 'asbds#ages.at']
Printing the list:
print(test_list)
gets me this output:
['Hafst\xc3\xa4tter', 'asbds#ages.at']
So far, so good. But why is it, that when I iterate over the list, such as:
for item in test_list:
    print(item)
I get this output:
Hafstätter
asbds#ages.at
Why does the encoding change (or does it)? And how can I change the encoding within the list?

The encoding isn't changing; these are just two different ways of displaying the same string. Printing a list shows the repr of each item, in which non-ASCII bytes appear as escape codes for debugging:
>>> test_list = ['Hafst\xc3\xa4tter', 'asbds#ages.at']
>>> print(test_list)
['Hafst\xc3\xa4tter', 'asbds#ages.at']
>>> for item in test_list:
...     print(item)
...
Hafstätter
asbds#ages.at
But they are equivalent:
>>> 'Hafst\xc3\xa4tter' == 'Hafstätter'
True
If you want to see lists displayed with the non-debugging output, you have to generate it yourself:
>>> print("['"+"', '".join(item for item in test_list) + "']")
['Hafstätter', 'asbds#ages.at']
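For readers on Python 3, the same distinction can be sketched as follows. This is a minimal illustration, not part of the original answer: printing a list uses repr() of each element, while printing an element uses str(). The helper name plain_list is made up for this example, and ascii() stands in for the escaped Python 2 display.

```python
test_list = ['Hafstätter', 'asbds#ages.at']

# str() of a list is assembled from repr() of each element.
list_display = str(test_list)
manual_display = '[' + ', '.join(repr(item) for item in test_list) + ']'

# ascii() escapes non-ASCII characters, similar to the Python 2 debugging output.
escaped_display = ascii(test_list)


def plain_list(items):
    """Hypothetical helper: format a list of strings without repr() escapes."""
    return "['" + "', '".join(items) + "']"


print(list_display)
print(escaped_display)
print(plain_list(test_list))
```

Note that plain_list does not escape quotes inside the items, so its output is for display only and cannot be parsed back reliably.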
There is a reason for the debugging output:
>>> a = 'a\xcc\x88'
>>> b = '\xc3\xa4'
>>> a
'a\xcc\x88'
>>> print a,b # should look the same, if not it is the browser's fault :)
ä ä
>>> a==b
False
>>> [a,b] # In a list you can see the difference by default.
['a\xcc\x88', '\xc3\xa4']
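The two strings above look the same but differ because one is a base letter plus a combining mark and the other is a single precomposed character. A hedged Python 3 sketch of how such strings could be compared, assuming the bytes are UTF-8 and using the standard unicodedata module (the variable names are illustrative):

```python
import unicodedata

# The two byte strings from the example above, decoded as UTF-8.
decomposed = b'a\xcc\x88'.decode('utf-8')   # 'a' followed by a combining diaeresis
precomposed = b'\xc3\xa4'.decode('utf-8')   # the single code point U+00E4

# They render alike but are different sequences of code points.
same_before = decomposed == precomposed

# NFC normalization composes the pair into the single code point.
same_after = unicodedata.normalize('NFC', decomposed) == precomposed

print(len(decomposed), len(precomposed), same_before, same_after)
```

Normalizing both sides to the same form before comparing is one common way to treat visually identical strings as equal.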

Related

See if strings in list are a part of strings in another list. Python

I'm trying to find out whether the strings in one list are also part of the strings in another list.
I've found this so far, but it doesn't give me what I actually want.
a = ["car", "book","chair"]
b = ["car", "oldbook", "bluechair"]
c = [elem for elem in a if elem in b]
print(c)
This prints ['car'].
I would like a way to get 'book' and 'chair' as well, because 'book' is part of 'oldbook' and 'chair' is part of 'bluechair'.
Thank you!
You can use any() here:
>>> a = ["car", "book","chair"]
>>> b = ["car", "oldbook", "bluechair"]
>>> [elem for elem in a if any(elem in x for x in b)]
['car', 'book', 'chair']
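This works because any() returns True as soon as elem is a substring of at least one string in b, so the comprehension keeps every string in a that occurs inside some string in b.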
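The same idea can be wrapped in a small function. This is only a sketch; the name contained_in_any and its parameter names are invented for illustration.

```python
def contained_in_any(needles, haystacks):
    """Return the needles that occur as a substring of at least one haystack."""
    return [n for n in needles if any(n in h for h in haystacks)]


a = ["car", "book", "chair"]
b = ["car", "oldbook", "bluechair"]

matches = contained_in_any(a, b)
print(matches)
```

Because any() stops at the first match, each needle is only compared against as many haystacks as necessary.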

Issue with join(str(x))

x = [1,2,3]
print '-'.join(str(x))
Expected:
1-2-3
Actual:
[-1-,- -2-,- -3-]
What is going on here?
Because calling str on the list in its entirety gives the entire list as a string:
>>> str([1,2,3])
'[1, 2, 3]'
What you need to do is convert each item in the list to a str, then do the join:
>>> '-'.join([str(i) for i in x])
'1-2-3'
You sent x to str() first, so join() put the given delimiter between each character of the string representation of the whole list. Don't do that. Send each individual item to str().
>>> x = [1,2,3]
>>> print '-'.join(map(str, x))
1-2-3
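To see both behaviours side by side, here is a small sketch (the variable names are illustrative). It runs unchanged on Python 3, where print is a function.

```python
x = [1, 2, 3]

# str(x) is one string, so join() iterates over its individual characters.
joined_chars = '-'.join(str(x))

# Converting each element first gives join() one string per item.
joined_items = '-'.join(str(item) for item in x)

print(joined_chars)
print(joined_items)
```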

How to print list items which contain new line?

These commands:
l = ["1\n2"]
print(l)
print
['1\n2']
I want to print
['1
2']
Is it possible when we generate the list outside of the print() command?
A first attempt:
l = ["1\n2"]
print(repr(l).replace('\\n', '\n'))
The solution above fails in tricky cases. For example, if the string is "1\\n2" (a literal backslash followed by n), the backslash-n in its repr gets replaced even though it is not a newline. Here is how to fix it:
import re
l = ["1\n2"]
print(re.sub(r'\\n|(\\.)', lambda match: match.group(1) or '\n', repr(l)))
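The difference between the two attempts can be checked with a short sketch. The function name show_newlines is made up here; the regular expression is the one from the snippet above.

```python
import re


def show_newlines(items):
    """Return repr(items) with escaped newlines turned into real line breaks."""
    # The first alternative matches an escaped newline; the second consumes any
    # other backslash escape (such as an escaped backslash) and leaves it as is.
    return re.sub(r'\\n|(\\.)', lambda m: m.group(1) or '\n', repr(items))


real_newline = ["1\n2"]         # the element contains a line break
literal_backslash = ["1\\n2"]   # the element contains a backslash followed by 'n'

# The naive replace also rewrites the backslash-n inside the escaped backslash.
naive = repr(literal_backslash).replace('\\n', '\n')

print(show_newlines(real_newline))
print(show_newlines(literal_backslash))
```

The regex version works because the escaped backslash is matched and consumed as a unit, so the n that follows it is never mistaken for part of a newline escape.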
Only if you are printing the element itself (or each element) and not the whole list:
>>> a = ['1\n2']
>>> a
['1\n2']
>>> print a
['1\n2']
>>> print a[0]
1
2
When you print the whole list, Python prints the string representation of the list. Newlines belong to the individual elements, so they are printed as line breaks only when you print that element. Otherwise, you will see them as \n.
You should probably use this if you have more than one element:
>>> test = ['1\n2', '3', '4\n5']
>>> print '[{0}]'.format(','.join(test))
[1
2,3,4
5]
Try this:
s = ["1\n2"]
print("['{}']".format(s[0]))
=> ['1
2']
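The approaches above can be combined into one helper that handles any number of elements and keeps the quotes. This is a sketch under the assumption that every element is a string; the name format_with_newlines is hypothetical.

```python
def format_with_newlines(items):
    """Hypothetical helper: list-like display that keeps real line breaks."""
    return "[" + ", ".join("'{}'".format(item) for item in items) + "]"


single = format_with_newlines(["1\n2"])
several = format_with_newlines(["1\n2", "3", "4\n5"])

print(single)
print(several)
```

Quotes or backslashes inside an element are not escaped, so the result is meant for display rather than for parsing back into a list.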

Replacing an item in a python list by index.. failing?

Any idea why when I call:
>>> hi = [1, 2]
>>> hi[1]=3
>>> print hi
[1, 3]
I can update a list item by its index, but when I call:
>>> phrase = "hello"
>>> for item in "123":
...     list(phrase)[int(item)] = list(phrase)[int(item)].upper()
...
>>> print phrase
hello
It fails?
Should be hELLo
You never store the list you build from phrase in a variable. Each loop iteration creates a new, identical list with list(phrase), modifies it, and then throws it away.
If you intended to change the characters of phrase itself, that's not possible, because strings are immutable in Python.
Perhaps make phraselist = list(phrase), then edit the list in the for-loop. Also, you can use range():
>>> phrase = "hello"
>>> phraselist = list(phrase)
>>> for i in range(1,4):
...     phraselist[i] = phraselist[i].upper()
...
>>> print ''.join(phraselist)
hELLo
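Why the original loop leaves phrase untouched can be sketched as follows. The variable names are illustrative, and the snippet is written for Python 3, although the behaviour shown is the same in Python 2.

```python
phrase = "hello"

# Each call to list() builds a separate list object.
first_copy = list(phrase)
second_copy = list(phrase)
first_copy[1] = first_copy[1].upper()   # changes only first_copy

# Strings do not support item assignment at all.
try:
    phrase[1] = "E"
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(first_copy, second_copy, phrase, assignment_failed)
```

Modifying one copy affects neither the other copy nor the original string, which is why the result has to be kept in a variable and joined back into a new string.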
>>> phrase = "hello"
>>> list_phrase = list(phrase)
>>> for index in (1, 2, 3):
...     list_phrase[index] = phrase[index].upper()
...
>>> ''.join(list_phrase)
'hELLo'
If you prefer one-liner:
>>> ''.join(x.upper() if index in (1, 2, 3) else x
...         for index, x in enumerate(phrase))
'hELLo'
Another answer, just for fun :)
phrase = 'hello'
func = lambda x: x[1].upper() if str(x[0]) in '123' else x[1]
print ''.join(map(func, enumerate(phrase)))
# hELLo
To make this robust, I created a method: (because I am awesome, and bored)
def method(phrase, indexes):
    func = lambda x: x[1].upper() if str(x[0]) in indexes else x[1]
    return ''.join(map(func, enumerate(phrase)))
print method('hello', '123')
# hELLo
Consider that strings are immutable in Python: you can't modify an existing string, only create a new one.
''.join([c if i not in (1, 2, 3) else c.upper() for i, c in enumerate(phrase)])
list() creates a new list. Your loop creates and instantly discards two new lists on each iteration. You could write it as:
phrase = "hello"
L = list(phrase)
L[1:4] = phrase[1:4].upper()
print("".join(L))
Or without a list:
print("".join([phrase[:1], phrase[1:4].upper(), phrase[4:]]))
Strings are immutable in Python, so to change one you need to create a new string.
Or if you are dealing with bytestrings, you could use bytearray which is mutable:
phrase = bytearray(b"hello")
phrase[1:4] = phrase[1:4].upper()
print(phrase.decode())
If the indexes are not consecutive, you could use an explicit for-loop:
indexes = [1, 2, 4]
for i in indexes:
    L[i] = L[i].upper()
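The answers on this question can be summarized in one reusable function. This is a sketch with a hypothetical name, upper_at, that takes integer indexes. It avoids the string membership test (str(index) in '123') used in one of the snippets above, which would also match a multi-digit index such as 12.

```python
def upper_at(phrase, indexes):
    """Hypothetical helper: return a copy of phrase uppercased at the given indexes."""
    wanted = set(indexes)   # set lookup keeps the membership test cheap
    return ''.join(c.upper() if i in wanted else c for i, c in enumerate(phrase))


phrase = "hello"
consecutive = upper_at(phrase, [1, 2, 3])
scattered = upper_at(phrase, [1, 2, 4])

print(consecutive, scattered, phrase)
```

The original string is left unchanged, since the function builds and returns a new one.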

Change array that might contain None to an array that contains "" in python

I have a python function that gets an array called row.
Typically row contains things like:
["Hello","goodbye","green"]
And I print it with:
print "\t".join(row)
Unfortunately, sometimes it contains:
["Hello",None,"green"]
Which generates this error:
TypeError: sequence item 2: expected string or Unicode, NoneType found
Is there an easy way to replace any None elements with ""?
You can use a conditional expression:
>>> l = ["Hello", None, "green"]
>>> [(x if x is not None else '') for x in l]
['Hello', '', 'green']
A slightly shorter way is:
>>> [x or '' for x in l]
But note that the second method also changes 0 and some other objects to the empty string.
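The difference between the two expressions shows up as soon as the list contains a falsy value other than None. A small sketch with an illustrative row:

```python
row = ["Hello", None, 0, "green"]

# Replaces only None; the 0 survives.
strict = [(x if x is not None else '') for x in row]

# Replaces every falsy value, including 0.
loose = [x or '' for x in row]

print(strict)
print(loose)
```

If the row can hold numbers, the explicit `is not None` test is the safer choice.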
You can use a generator expression in place of the array:
print "\t".join(fld or "" for fld in row)
This substitutes the empty string for every value that is considered false (None, False, 0, 0.0, ''…).
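You can also use the built-in filter function, but note that it removes the None elements (and any other false values) instead of replacing them with "":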
>>> l = ["Hello", None, "green"]
>>> filter(None, l)
['Hello', 'green']
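On Python 3 the same call behaves slightly differently, because filter() returns a lazy iterator instead of a list. The sketch below (variable names are illustrative) contrasts dropping None with replacing it before joining:

```python
row = ["Hello", None, "green"]

# In Python 3, wrap filter() in list() to materialize the result.
without_none = list(filter(None, row))

# Replacing None keeps the column positions intact when joining with tabs.
line = "\t".join("" if field is None else field for field in row)

print(without_none)
print(repr(line))
```

Dropping elements shifts the remaining columns to the left, so replacing is usually the better fit for tab-separated output.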
