How to print text file content with line breaks in Python?

The content of my text file is:
5 7 6 6 15
4 3
When I do
fs.open('path',mode='rb').read()
I get
b'5 7 6 6 15\r\n4 3'
But I want to compare it to the string output
5 7 6 6 15
4 3
I want to do this comparison like:
if fs.open('path',mode='rb').read() == output:
    print("yes")
How should I convert it so that line breaks, spaces, and everything else are preserved?
PS: output is just the string that I am getting through json.

In Python 3, fs.open('path',mode='rb').read() yields a bytes object, which here also contains a carriage return (the file is a Windows text file).
Using Python 2 doesn't help either, because the extra \r isn't removed in binary mode.
You're comparing a bytes object with a str object, and that comparison is always false.
Moreover, it's unclear whether the output string has a line terminator on the last line. I would open the file in text mode and strip trailing blanks/newlines from both sides (the file doesn't seem to end with one, but better safe than sorry):
with open('path') as f:
    if f.read().rstrip() == output.rstrip():
        print("yes")

Change the read mode from rb to r: rb returns bytes, while r returns text (str).
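To make the difference between the two modes concrete, here is a minimal sketch. The temporary file, the helper name file_matches, and the sample value of output are assumptions made for illustration; the built-in open() stands in for the fs.open call from the question.

```python
import os
import tempfile


def file_matches(path, expected):
    """Compare a file's text with `expected`, ignoring trailing whitespace."""
    # Text mode decodes the bytes and translates "\r\n" to "\n" on read.
    with open(path, "r") as f:
        return f.read().rstrip() == expected.rstrip()


# Hypothetical stand-in for the string received through json.
output = "5 7 6 6 15\n4 3"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Recreate the Windows-style file from the question.
    with open(path, "wb") as f:
        f.write(b"5 7 6 6 15\r\n4 3")

    with open(path, "rb") as f:
        raw = f.read()

    bytes_equal = raw == output          # bytes vs str: never equal in Python 3
    text_equal = file_matches(path, output)

print(bytes_equal, text_equal)
```

If the file has to stay open in binary mode, decoding the bytes and replacing "\r\n" with "\n" before comparing would be an alternative.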


How can I loop through a textfile but do something different on the first line, Python

I have a text file that looks something like this :
original-- expected output--
0 1 2 3 4 5 SET : {0,1,2,3,4,5}
1 3 RELATION:{(1,3),(3,1),(5,4),(4,5)}
3 1
5 4 REFLEXIVE : NO
4 5 SYMMETRIC : YES
Part of the code should print the first line in curly braces, and the rest inside one large pair of curly braces with each pair in parentheses. I am still a beginner, but I wanted to know if there is some way in Python to write one loop that treats the first line differently from the rest?
Try this, where filename.txt is your file:
with open("filename.txt", "r") as file:
    # Read the first line on its own; the remaining lines can then be looped over.
    first_string = file.readline()
    # Split on whitespace and convert each token to an integer.
    set_firstline = [int(token) for token in first_string.split()]
    print(set_firstline)
OUTPUT : [0, 1, 2, 3, 4, 5]
I'm new as well, so I hope this helps.
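One way to treat the first line differently is to pull it off the iterator with next() and then loop over whatever remains. The sketch below assumes the input looks like the "original" column in the question; the function name format_set_and_relation and the in-memory sample are hypothetical, and the REFLEXIVE/SYMMETRIC checks are left out.

```python
import io


def format_set_and_relation(lines):
    """Return the SET and RELATION lines for whitespace-separated input."""
    lines = iter(lines)
    # The first line holds the set elements; next() consumes it.
    elements = next(lines).split()
    # Every remaining non-blank line is one pair of the relation.
    pairs = [tuple(line.split()) for line in lines if line.strip()]
    set_text = "SET : {" + ",".join(elements) + "}"
    relation_text = (
        "RELATION:{" + ",".join("(" + ",".join(pair) + ")" for pair in pairs) + "}"
    )
    return set_text, relation_text


# In-memory stand-in for the text file from the question.
sample = io.StringIO("0 1 2 3 4 5\n1 3\n3 1\n5 4\n4 5\n")
set_text, relation_text = format_set_and_relation(sample)
print(set_text)
print(relation_text)
```

An open file object can be passed in place of the StringIO, since both are iterated line by line.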

Python print .psl format without quotes and commas

I am working on a Linux system using python3 with a file in .psl format, which is common in genetics. This is a tab-separated file that contains some cells with comma-separated values. A small example file with some of the features of a .psl is below.
input.psl
1 2 3 x read1 8,9, 2001,2002,
1 2 3 mt read2 8,9,10 3001,3002,3003
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
I need to filter this file to extract only regions of interest. Here, I extract only rows with a value of 9 in the fourth column.
import csv
def read_psl_transcripts():
    psl_transcripts = []
    with open("input.psl") as input_psl:
        csv_reader = csv.reader(input_psl, delimiter='\t')
        # Iterate over the parsed rows (lists of fields), not the raw file lines
        for line in csv_reader:
            #Extract only rows matching chromosome of interest
            if '9' == line[3]:
                psl_transcripts.append(line)
    return psl_transcripts
I then need to be able to print or write these selected lines in a tab-delimited format matching the format of the input file, with no additional quotes or commas added. I can't seem to get this part right: additional brackets, quotes and commas are always added. Below is an attempt using print().
outF = open("output.psl", "w")
for line in read_psl_transcripts():
    print(str(line).strip('"\''), sep='\t')
Any help is much appreciated. Below is the desired output.
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
You might be able to solve your problem with a simple awk statement.
awk '$4 == 9' input.psl > output.psl
But with Python you could solve it like this:
with open("input.psl") as file, open("output.psl", "w") as write_psl:
    for line in file:
        # Split the line into its whitespace-separated columns
        splitted_line = line.split()
        if splitted_line[3] == '9':
            out_line = '\t'.join(splitted_line)
            write_psl.write(out_line + "\n")
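If you would rather stay with the csv module from the question, a csv.writer with a tab delimiter writes each row back without brackets or quotes, because with the default quoting a comma inside a field is not special when the delimiter is a tab. This is a sketch under assumptions: the function name filter_psl_rows and the in-memory sample are made up for illustration, and real .psl files may have header lines that need separate handling.

```python
import csv
import io


def filter_psl_rows(in_file, out_file, chromosome="9"):
    """Copy rows whose fourth column equals `chromosome`, tab-delimited."""
    reader = csv.reader(in_file, delimiter="\t")
    # lineterminator="\n" avoids the csv module's default "\r\n" line ending.
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) > 3 and row[3] == chromosome:
            writer.writerow(row)


# In-memory stand-ins for input.psl and output.psl.
source = io.StringIO(
    "1\t2\t3\tx\tread1\t8,9,\t2001,2002,\n"
    "1\t2\t3\tmt\tread2\t8,9,10\t3001,3002,3003\n"
    "1\t2\t3\t9\tread3\t8,9,10,11\t4001,4002,4003,4004\n"
    "1\t2\t3\t9\tread4\t8,9,10,11\t4001,4002,4003,4004\n"
)
result = io.StringIO()
filter_psl_rows(source, result)
filtered = result.getvalue()
print(filtered, end="")
```

With real files, open both with newline="" as the csv documentation recommends, for example open("output.psl", "w", newline="").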

Decode(), len(), Scandinavian Iso String is printed correctly but length is wrong

svenskaOrd is a list of Swedish words.
I'd like to print the length of each word in letters, and the word with correct formatting, but only if it is 4 or more letters long.
Only the formatting is correct; the lengths are wrong.
swedishWords = open("svenskaOrd.txt","r")
for line in swedishWords:
    if(len(line.decode("iso8859_10")) >= 4):
        print(len(line.decode("iso8859_10")))
        print(line.decode("iso8859_10"))
Output:
....
18
öroninflammation
5
ört
10
örtagård
....
By default, open opens a file in text mode, which decodes the raw bytes into text. You shouldn't need to open a file in text mode and then decode the text again; it doesn't make sense. Python 3 won't even let you do this and reports an error (because str has no decode method).
If you know your text file has a given encoding, pass it to open. Also note that each line still ends with its line terminator, which len() counts, so strip it before measuring:
with open("svenskaOrd.txt", "r", encoding="iso8859_10") as swedishWords:
    for line in swedishWords:
        word = line.rstrip("\n")  # drop the line ending so it is not counted
        if len(word) >= 4:
            print(len(word))
            print(word)
If you really want to operate on raw bytes, open the file in binary mode and decode each line yourself.
swedishBytes = open("svenskaOrd.txt", "rb")
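A possible continuation of the binary-mode approach is sketched below. It assumes the file uses Windows line endings, which would explain why every length in the question's output is two too large (18 for the 16-letter word öroninflammation). The helper name words_with_lengths and the temporary sample file are hypothetical.

```python
import os
import tempfile


def words_with_lengths(path, encoding="iso8859_10", min_length=4):
    """Decode each raw line and return (length, word) for long enough words."""
    results = []
    with open(path, "rb") as swedish_bytes:
        for raw_line in swedish_bytes:
            # Decode first, then strip the line terminator so that
            # "\r\n" is not counted as two extra letters.
            word = raw_line.decode(encoding).rstrip("\r\n")
            if len(word) >= min_length:
                results.append((len(word), word))
    return results


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svenskaOrd.txt")
    # Small stand-in for the word list, written with Windows line endings.
    with open(path, "wb") as f:
        f.write("öroninflammation\r\nört\r\nörtagård\r\n".encode("iso8859_10"))
    lengths = words_with_lengths(path)

print([length for length, _ in lengths])
```

The three-letter word ört is filtered out here, whereas in the question it was printed with a length of 5.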

How to get os.system() output as a string and not a set of characters? [duplicate]

This question already has an answer here:
How can I make a for-loop loop through lines instead of characters in a variable?
(1 answer)
Closed 6 years ago.
I'm trying to get output from os.system using the following code:
p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
ls = p.communicate()[0]
When I print the output I get:
>>> print (ls)
file1.txt
file2.txt
The output displays as two separate lines. However, when I try to print the filenames using a for loop, I get a list of characters instead:
>>> for i in range(len(ls)):
...     print i, ls[i]
Output:
0 f
1 i
2 l
3 e
4 1
5 .
6 t
7 x
8 t
9 f
10 i
11 l
12 e
13 2
14 .
15 t
16 x
17 t
I need help ensuring the os.system() output is returned as strings and not as a set of characters.
p.communicate() returns a tuple, and its first element is a string. It may look like a list of filenames, but it is just one string. You can convert it to a list of filenames by splitting on the newline character:
s = p.communicate()[0]
for line in s.split("\n"):
    print "line:", line
Are you aware that there are built-in functions to get a list of files in a directory?
for i in range(len(...)): is usually a code smell in Python. If you want to iterate over the numbered elements of a collection, the canonical method is for i, element in enumerate(...):.
The code you quote clearly isn't the code you ran, since when you print ls you see two lines separated by a newline, but when you iterate over the characters of the string the newline doesn't appear.
The bottom line is that you are getting a string back from communicate()[0], but you are then iterating over it, giving you the individual characters. I suspect what you would like to do is use the .split() or .splitlines() method on ls to get the individual file names, but you are trying to run before you can walk. First of all, get a clear handle on what the communicate method is returning to you.
Apparently, in Python 3.6, the first element returned by p.communicate() is a bytes object:
In [16]: type(ls)
Out[16]: bytes
Following seems to work better:
In [22]: p = subprocess.Popen([some_directory], stdout=subprocess.PIPE, shell=True)
In [23]: ls = p.communicate()[0].split()
In [25]: for i in range(len(ls)):
...:     print(i, ls[i])
...:
0 b'file1.txt'
1 b'file2.txt'
But I would rather use os.listdir() instead of subprocess:
import os
for line in os.listdir():
    print(line)
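If a subprocess is really needed, the points above can be combined: ask for text instead of bytes, split the output into lines, and number them with enumerate. This is a sketch assuming Python 3.7 or later (for the text parameter). The helper name command_output_lines is hypothetical, and a small Python child process stands in for the directory-listing command so that the example runs on any platform.

```python
import subprocess
import sys


def command_output_lines(args):
    """Run a command and return its standard output as a list of lines."""
    # text=True decodes the output to str; check=True raises on a non-zero exit.
    completed = subprocess.run(
        args, stdout=subprocess.PIPE, text=True, check=True
    )
    return completed.stdout.splitlines()


# Stand-in command that prints two file names, one per line.
child_code = "print('file1.txt'); print('file2.txt')"
names = command_output_lines([sys.executable, "-c", child_code])

for index, name in enumerate(names):
    print(index, name)
```

Passing the command as a list without shell=True avoids shell quoting issues.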

why does python write to a file in gibberish characters

I attempted Problem 10 at Project Euler and passed, but then I decided to write all the prime numbers below 2 million to a text (.txt) file. I made some small adjustments to the main function that solved the problem: instead of adding each prime to a variable (tot), I wrote each prime number produced by the generator to a text file. At first it worked, but I forgot to add spaces after each prime number, so the output was hard to read:
357111317192329313741434753
so I modified my txt.write(str(next_prime)) to txt.write(str(next_prime) + ' ')
after that slight modification, the output was completely gibberish
″‵‷ㄱㄠ″㜱ㄠ‹㌲㈠‹ㄳ㌠‷ㄴ㐠″
here's my complete code for the function:
def solve_number_10():
    total = 2
    txt = open("output.txt","w")
    for next_prime in get_primes(3):
        if next_prime < 2000000:
            txt.write(str(next_prime) + ' ')
            #total += next_prime
        else:
            print "Data written to txt file"
            #print total
            txt.close()
            return
Why does this happen and how could I make the output like
3 5 7 11 13 17 19
This is a bug in Microsoft's Notepad program, not in your code.
>>> a = '‵‷ㄱㄠ″㜱ㄠ‹㌲㈠‹ㄳ㌠‷ㄴ㐠'
>>> a.decode('UTF-8').encode('UTF-16LE')
'5 7 11 13 17 19 23 29 31 37 41 4'
Oh hey, look, they're prime numbers (I assume 4 is just a truncated 43).
You can work around the bug in Notepad in one of two ways:
1. Use a different file viewer that doesn't have the bug.
2. Write a ZWNBSP (U+FEFF), once, at the beginning of the file, encoded in UTF-8:
txt.write(u'\uFEFF'.encode('UTF-8'))
This is incorrectly called a BOM. It would be a BOM in UTF-16, but UTF-8 is not technically supposed to have a BOM. Most programs ignore it, and in other programs it will be harmless.
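The effect can be reproduced in Python 3 without Notepad: reading the ASCII bytes of the primes as UTF-16LE code units yields the same characters as in the question. The sketch below also shows one way to write the leading U+FEFF, using the standard "utf-8-sig" codec; the temporary file and variable names are assumptions for illustration.

```python
import codecs
import os
import tempfile

# An even number of ASCII bytes, so every two bytes form one UTF-16 code unit.
primes_text = "5 7 11 13 "

# What a viewer shows if it wrongly guesses UTF-16LE for plain ASCII bytes.
misread = primes_text.encode("ascii").decode("utf-16-le")

# Reversing the wrong guess recovers the original digits.
recovered = misread.encode("utf-16-le").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")
    # "utf-8-sig" writes the UTF-8 encoding of U+FEFF once, at the start.
    with open(path, "w", encoding="utf-8-sig") as txt:
        txt.write(primes_text)
    with open(path, "rb") as raw:
        starts_with_bom = raw.read().startswith(codecs.BOM_UTF8)

print(ascii(misread))
print(recovered)
print(starts_with_bom)
```

Reading such a file back with encoding="utf-8-sig" strips the marker again, so it does not end up in the data.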
Try this:
txt.write('%i ' % next_prime)
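This formats the number explicitly as its decimal representation. Note that str() also returns the decimal representation, so if the file still looks garbled, the cause lies in how the file is displayed rather than in the conversion.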
