Python: a string with tabs and integers looks different outside print [duplicate]

This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 1 year ago.
I'm merging some tab delimited files and the printed output is incorrect but if I access the string in a REPL it looks fine. Here's how it looks:
i = 0  # line counter
fh = open('out.vcf')
for line in fh:
    i += 1
    if i == 29401:
        print(line)
AAEX03025909.1 1068 . T C 0 42 5
Then looking at it without print:
line
'AAEX03025909.1\t1405\t.\tC\tT\t\t\t\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t10\t9\n'
When I look at out.vcf in less, it looks like the output of print. Why am I getting different outputs? I want the string that is produced without print. Using a comma instead of a tab solves the problem, but I'd like to keep the file tab-delimited.

There is always a difference between how data is stored and how it is displayed. The values are stored as bytes and shown according to some representation; in this case you are seeing the tab character \t (ASCII code 9) displayed in two different ways.
print() writes the string's characters to the output, so each tab is rendered as whitespace, while evaluating the name in the REPL shows the string's repr(), in which the same tab appears as the escape sequence \t.
>>> "\t"
'\t'
>>> ord("\t")
9
>>> print("\t")
>>> repr("\t")
"'\\t'"
>>> print(repr("\t"))
'\t'
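A minimal sketch of the same distinction on a whole record. The line below is a shortened, made-up stand-in for the real record in out.vcf, and the variable names are only illustrative.

```python
# Hypothetical, shortened tab-delimited record (not the real line from out.vcf).
line = "AAEX03025909.1\t1405\t.\tC\tT\t\t0\t10\n"

# str() is what print() uses: the tab characters are written out as they are,
# and the terminal turns them into whitespace.
shown_by_print = str(line)

# repr() is what the REPL echoes: tabs and newlines appear as escape sequences.
shown_by_repl = repr(line)

# The underlying data is identical either way. Splitting on the tab recovers
# every field, including the empty ones.
fields = line.rstrip("\n").split("\t")

print(shown_by_repl)
print(fields)
```

If the goal is to inspect the fields rather than the raw text, splitting on "\t" as above is usually easier to read than either form.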

Related

Misaligment while formatting with escape t "\t" [duplicate]

This question already has answers here:
Incorrect column alignment when printing table in Python using tab characters
(4 answers)
Closed 1 year ago.
I'm learning Python through "Python Projects for Beginners" by Connor Milliken. In the first project, "Creating a Receipt Printing Program", there is this section:
# creating a product and price for three items
p1_name, p1_price = "Books", 49.95
p2_name, p2_price = "Computer", 579.99
p3_name, p3_price = "Monitor", 124.89
# create a print statement for each product
print("\t{}\t\t${}".format(p1_name.title(), p1_price))
print("\t{}\t\t${}".format(p2_name.title(), p2_price))
print("\t{}\t\t${}".format(p3_name.title(), p3_price))
The three print calls are identical in form, but in the second line the price is misaligned, as if it had an extra \t. The problem was the same in Jupyter Notebook and in Atom + terminal. Deleting one '\t' from that line fixes the alignment, but that doesn't explain what happened.
Don't think of a tab as inserting a specific number of spaces into the string (it doesn't). Instead, you are handing control to whoever displays the string, since they are the ones who decide where the tab stops are.
If you want precise control, use fixed-width padded format specifiers instead. For example,
print(" {:>10} {:>6}".format(p1_name.title(), p1_price))
This assumes that 10 characters is wide enough for any title and 6 characters is wide enough for any price.
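As a rough illustration, the sketch below uses str.expandtabs to imitate a display with tab stops every 8 columns (an assumption; real terminals and notebooks may use other stops), and then builds the same rows with fixed-width fields. The widths 10 and 7 are arbitrary choices for this data.

```python
products = [("Books", 49.95), ("Computer", 579.99), ("Monitor", 124.89)]

# Imitate a display whose tab stops sit every 8 columns (assumed).
tabbed_lines = [
    "\t{}\t\t${}".format(name, price).expandtabs(8) for name, price in products
]
# Column at which the price starts in each line.
dollar_cols = [text.index("$") for text in tabbed_lines]

# "Computer" is exactly 8 characters long, so it ends on a tab stop and the two
# following tabs each advance a full 8 columns. The shorter names end before a
# stop, so their first tab only fills the gap up to it.
for text in tabbed_lines:
    print(text)
print(dollar_cols)

# Fixed-width fields do not depend on tab stops at all.
padded_lines = ["{:<10}${:>7.2f}".format(name, price) for name, price in products]
for text in padded_lines:
    print(text)
```

With the padded version every row has the same length, so the prices line up wherever the text is displayed in a monospaced font.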

Compare strings with unicode characters in python [duplicate]

This question already has answers here:
How do I compare a Unicode string that has different bytes, but the same value?
(3 answers)
Closed 2 years ago.
Here is the substring Ritē
I have two strings, one is from the extracted file name by zipfile. I used filename.encode('cp437').decode('utf-8') to have all the paths extracted correctly. The other one is read from a .plist using plistlib.readPlist(). Both are printed correctly using print(). However, they are not the same in comparison. I tried to encode both of them in utf-8, here is what they look like:
Rite\xcc\x84
Rit\xc4\x93
One is stored as the character e followed by a combining macron, the other as the single character 'LATIN SMALL LETTER E WITH MACRON'. Does anyone have any advice on how to compare the two strings? Thank you in advance.
Based on the comments it sounds like this is what you're looking for:
import unicodedata
foo = 'Rit\u0113'
bar = 'Rite\u0304'
print(foo, bar)
print(unicodedata.normalize('NFD', foo))
print(unicodedata.normalize('NFD', bar))
assert unicodedata.normalize('NFD', foo) == unicodedata.normalize('NFD', bar)
I selected NFD as the form, but you may prefer NFC.
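The same idea can be wrapped in a small helper. This is only a sketch: same_text is a hypothetical name, and NFC is used as the default form purely as an example.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form.

    same_text is a hypothetical helper written for this example.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "Rit\u0113"   # e with macron as one code point
decomposed = "Rite\u0304"   # plain e followed by a combining macron

print(precomposed == decomposed)           # False: different code points
print(same_text(precomposed, decomposed))  # True: equal once normalized
print(len(precomposed), len(decomposed))   # 4 5
```

Whichever form is chosen, both strings need to be normalized to the same one before comparing.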

Python 3: Reading string from file and define the same string in code working differently [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 4 years ago.
I have a text file like below
# 1.txt
who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th
If I read a file and print it, it shows
>>> s = open("1.txt").read()
>>> print(s)
who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th
However, if I define the same string directly in code,
>>> s = "who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th"
>>> print(s)
who is the winners where season result is 7th
I want to read a text file like "1.txt" and print it like the below one. I can not find how to do it. Please help me. Thanks.
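In Python source code, \u00a0 is an escape sequence for a non-breaking space, which is a single character.
In your first example you are reading \u00a0 from the file as six separate characters: a backslash, u, 0, 0, a and 0.
If you want to read a file containing \u00a0 sequences and interpret them as non-breaking spaces, you have to parse the text yourself and replace each \u00a0 sequence with the actual character.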
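One way to do that parsing is a regular expression that turns each literal backslash-u sequence into the character it names. This is a minimal sketch under assumptions: decode_u_escapes is a hypothetical helper, it only handles four-digit \uXXXX sequences, and it does not treat doubled backslashes or surrogate pairs specially.

```python
import re

# Matches a literal backslash, "u", and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace each literal \\uXXXX sequence with the character it names.

    decode_u_escapes is a hypothetical helper written for this example.
    """
    return _U_ESCAPE.sub(lambda match: chr(int(match.group(1), 16)), text)


# A raw string stands in for the text read from 1.txt: the backslashes are
# literal characters here, exactly as they would be after open(...).read().
raw = r"who is the\u00a0winners\u00a0where\u00a0season result\u00a0is 7th"
decoded = decode_u_escapes(raw)

print(decoded)
print(len(raw), len(decoded))
```

Each six-character sequence collapses to one character, so the decoded string is 20 characters shorter than the raw one.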

How to write a string starting with ' and ending with " in Python? [duplicate]

This question already has answers here:
Having both single and double quotation in a Python string
(9 answers)
Closed 5 years ago.
I'd like to save the following characters 'bar" as a string variable, but it seems to be more complicated than I thought :
foo = 'bar" is not a valid string.
foo = ''bar"' is not a valid string either.
foo = '''bar"'' is still not valid.
foo = ''''bar"''' actually saves '\'bar"'
What is the proper syntax in this case?
The last string saves '\'bar"' as the representation, but it is the string you're looking for, just print it:
foo = ''''bar"'''
print(foo)
'bar"
When you hit Enter in the interactive interpreter you get its repr, which escapes the ' inside the string so that the result is a valid single-quoted literal.
Using a triple-quoted literal is the only way to write this as a single literal without explicit escapes. You can get the same result by escaping quotes:
print('\'foo"')
'foo"
print("'foo\"")
'foo"
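A short check that the different spellings really produce the same five-character string, and that the backslash only appears in the repr. The variable names are illustrative.

```python
# Three ways of writing the same string: ' b a r "
triple = ''''bar"'''          # triple-quoted literal, no escapes needed
escaped_single = '\'bar"'     # single-quoted, inner ' escaped
escaped_double = "'bar\""     # double-quoted, inner " escaped

print(triple == escaped_single == escaped_double)  # True
print(len(triple))                                 # 5
print(triple)                                      # the string itself
print(repr(triple))                                # what the REPL would echo
```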

encoding error when reading excel file [duplicate]

This question already has answers here:
Suppress the u'prefix indicating unicode' in python strings
(11 answers)
Closed 8 years ago.
I want to go through the files in my folder, identify them, and rename them according to a list of rules I have in an Excel spreadsheet.
I load the needed libraries,
I make my directory the working directory,
I read in the Excel file (using xlrd),
and when I try to read the data by columns, e.g.:
fname = metadata.col_values(0, start_rowx=1, end_rowx=None)
the list of values comes with a u in front of them - I guess unicode - such as:
fname = [u'file1', u'file2'] and so on
How can I convert fname to a list of ascii strings?
I'm not sure what the big issue behind having unicode filenames is, but assuming that all of your characters are ascii-valid characters the following should do it. This solution will just ignore anything that's non-ascii, but it's worth thinking about why you're doing this in the first place:
ascii_string = unicode_string.encode("ascii", "ignore")
Specifically, for converting a whole list I would use a list comprehension:
ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.
In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.
with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.
You could also just use Python 3, where Unicode is the default and the u isn't normally printed.
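A small Python 3 sketch of the point about representation. The list below is a stand-in for the values returned by col_values(); under Python 2 the list form would additionally show the u prefix.

```python
# Stand-in for the values read from the spreadsheet (assumed contents).
fname = ["file1", "file2"]

# Converting the list itself to text uses each element's repr(), so the
# quotes appear even though they are not part of the strings.
as_list = str(fname)

# Joining the elements uses the strings themselves: no quotes, no prefix.
as_text = "\n".join(fname)

print(as_list)
print(as_text)
```

The quotes in the first output come from the list's display form only; the second output is what would end up in a file written from these strings.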
