Last line in text file lacks newline; how to cope? - python

I am reading the lines of a file and storing them in an array.
The contents of the file are as follows -
hi
hello
bi
bello
I am writing a program to reverse the order of these words such that they are now written into a new file as
bello
bi
hello
hi
Currently, I am reading the file using
file1 = open('example.txt','r')
s = file1.readlines()
When I examine the array 's', it looks like this:
['hi\n', 'hello\n', 'bi\n', 'bello']
Now, I was using s[::-1] to reverse the order, but that results in output that looks something like this:
bellobi
hello
hi
presumably because 'bello' is not followed by \n in the array where it is stored.
I have tried to fix it by modifying the last element like this:
s[-1]=s[-1]+'\n'
On the surface it works, but am I unknowingly printing an extra line or adding trailing spaces? Is there a better way to do this? Also, why does the last string in the array not have a '\n'?

You can make the addition conditional:
if not s[-1].endswith('\n'):
    s[-1] += '\n'
Or you can normalize by removing any trailing newline and then put one back:
s[-1] = s[-1].rstrip('\n') + '\n'
I'd go with the former, but both approaches work.
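Putting the conditional fix together with the reversal, a minimal sketch of the whole task might look like the following. The function name reverse_lines, the output filename and the temporary directory are illustrative assumptions, not from the question.

```python
import os
import tempfile


def reverse_lines(src_path, dst_path):
    """Write the lines of src_path to dst_path in reverse order (hypothetical helper)."""
    with open(src_path) as f:
        s = f.readlines()
    # The last line may lack a trailing newline; add one only if it is missing.
    if s and not s[-1].endswith('\n'):
        s[-1] += '\n'
    with open(dst_path, 'w') as f:
        f.writelines(s[::-1])


# Demo in a throwaway directory, using the four words from the question.
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'example.txt')
dst = os.path.join(demo_dir, 'reversed.txt')
with open(src, 'w') as f:
    f.write('hi\nhello\nbi\nbello')  # no newline after the last word
reverse_lines(src, dst)
with open(dst) as f:
    result = f.read()
print(repr(result))  # 'bello\nbi\nhello\nhi\n'
```

The output ends with a single '\n' after hi, which is the usual convention for text files; no blank line and no trailing spaces are added.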

See this answer for help to remove the \n from the list elements.
See this answer for help on how to reverse through a list.
This code will get you the list you want. Could be refactored better... but that can be a challenge for you.
s = ['hi\n', 'hello\n', 'bi\n', 'bello']
revList = list(reversed(s))
for i, e in enumerate(revList):
    revList[i] = e.strip("\n")
    if i < len(revList)-1:
        revList[i] += "\n"  # add a newline to every element except the last
print(revList)
When Python reads the file, it reads line breaks as ordinary characters in the strings. The last word in the file has no line break after it, so that's why it doesn't have "\n" in the list.
When you use [::-1] you go backwards through the list, but each element keeps its own line break, so 'bello' (which has none) runs straight into the next word. If you work through your code bit by bit, you will see why you get the output you are getting. Remember that computers take things literally.
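To see this directly, you can simulate the file in memory with io.StringIO, which is used here only so the sketch needs no file on disk. It also shows an alternative approach: str.splitlines() drops every line ending, so the last line is no longer a special case.

```python
import io

# Simulate example.txt in memory: no newline after the last word.
f = io.StringIO('hi\nhello\nbi\nbello')

s = f.readlines()
print(s)  # ['hi\n', 'hello\n', 'bi\n', 'bello']

# splitlines() removes every line ending, so all elements look alike;
# the newlines are then added back uniformly when joining.
f.seek(0)
words = f.read().splitlines()
reversed_text = '\n'.join(reversed(words)) + '\n'
print(repr(reversed_text))  # 'bello\nbi\nhello\nhi\n'
```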

Related

Replace words in list that later will be used in variable

I have a file which currently stores a string eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
which I am trying to pass as a variable to my subprocess command.
My current code looks like this
with open(logfilnavn, 'r') as t:
    test = t.readlines()
print(test)
But this prints ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n'], and I don't want the brackets, the quotes or the \n to be passed into my command, so I'm trying to remove them using replace.
with open(logfilnavn, 'r') as t:
    test = t.readlines()
removestrings = test.replace('[', '').replace('[', '').replace('\\', '').replace("'", '').replace('n', '')
print(removestrings)
I get this exception:
'list' object has no attribute 'replace'
How can I replace these with nothing and store the result as a string for my subprocess command?
readlines() returns a list. Try print(test[0].strip())
You can read the whole file and split lines using str.splitlines:
test = t.read().splitlines()
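A minimal sketch of the splitlines approach, assuming a file that holds the single ID from the question. The temporary file below is only a stand-in for the real logfilnavn.

```python
import os
import tempfile

# Hypothetical stand-in for the question's log file (logfilnavn).
logfilnavn = os.path.join(tempfile.mkdtemp(), 'run_id.txt')
with open(logfilnavn, 'w') as f:
    f.write('eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n')

with open(logfilnavn, 'r') as t:
    test = t.read().splitlines()  # list of lines, without the '\n' endings

print(test)       # ['eeb39d3e-dd4f-11e8-acf7-a6389e8e7978']
run_id = test[0]  # a plain string: no brackets, quotes or newline
print(run_id)     # eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
```

The brackets and quotes you saw were never part of the data; they are how print displays a list.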
Your test variable is a list, because readlines() returns a list of all lines read.
Since you said the file only contains this one line, you probably want to work on only the first line that you read. Note that the brackets and quotes are not part of the string (they only appear because print shows the list), and the newline is a real '\n' character, so that is the only thing you need to replace:
removestrings = test[0].replace('\n', '')
Where you went wrong...
file.readlines() in Python returns a list (what other languages would call an array: an ordered collection of items) of the lines in the file. Here you are treating the list as a string; you must first target the string inside it, then apply the string-only method.
In this case, however, even that would not remove the brackets and quotes, because they are not in the string at all: they are just how the Python interpreter displays a list when you print it. In memory the list is not a string; what you see is only a readable representation of it.
Further information...
The example below works for any number of lines, but it only prints the first element; you will need to change that if you want the others.
This may also be useful: you could keep the variables available after the with block so that other parts of the program can read them. A variable's scope is the region of the program in which it can be used; once nothing can reach the variable any more, the interpreter can remove it from memory. Scopes also mean you only have to worry about names locally: a name like i is used in many loops, and without scopes every variable in the whole project would need a different name. You can picture separate scopes as separate branches of a tree whose variables don't touch. Note that in Python only functions, classes and modules create a new scope; with blocks and for loops do not, so lines below is still available after the with block ends.
Solution, e.g.:
with open(logfilnavn, 'r') as file:
    lines = file.readlines()  # creates a list
# a list comprehension (an in-line for loop) that goes through each item and removes the trailing newline character \n
# this works with any number of lines; unlike line[:-1], rstrip does not cut off a real character if the last line has no \n
strippedLines = [line.rstrip('\n') for line in lines]
# or
strippedLines = [line.replace('\n', '') for line in lines]
# you can now print the string stored within the list
print(strippedLines[0])  # this prints the first element in the list
I hope this helped!
You get the error because readlines() returns a list object. Since you mentioned in the comment that there is just one line in the file, it's better to use readline() instead. Note that readline() keeps the trailing newline, so strip it off:
line = ""  # optional: a with block does not create a new scope, so line is visible afterwards anyway
with open(logfilnavn, 'r') as t:
    line = t.readline().rstrip('\n')
print(line)
# output,
eeb39d3e-dd4f-11e8-acf7-a6389e8e7978
readlines will return a list of lines, and you can't use replace with a list.
If you really want to use readlines, you should know that it doesn't remove the newline character from the end, you'll have to do it yourself.
lines = [line.rstrip('\n') for line in t.readlines()]
But even after removing the newline characters yourself, you'll still have a list of lines. Since, from the question, it looks like you only have one line, you can just access it as lines[0].
Or you can leave out readlines and just use read, which reads the entire contents of the file, and then call rstrip:
contents = t.read().rstrip('\n')
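Once the string is clean, it can go straight into the subprocess call. A minimal sketch, assuming the command is given as a list of arguments; the current Python interpreter echoing its first argument stands in for the real command, which the question doesn't show.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the question's log file (logfilnavn).
logfilnavn = os.path.join(tempfile.mkdtemp(), 'run_id.txt')
with open(logfilnavn, 'w') as f:
    f.write('eeb39d3e-dd4f-11e8-acf7-a6389e8e7978\n')

with open(logfilnavn, 'r') as t:
    contents = t.read().rstrip('\n')

# Pass the string as its own list element; no quoting or bracket removal is needed.
# The "command" here is just Python printing sys.argv[1], i.e. our argument.
result = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.argv[1])', contents],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```

capture_output and text need Python 3.7 or later.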

How to avoid the last comma while writing an integer array into a text file in python

I need to write an array of integers into a text file, but the formatted solution is adding the comma after each item and I'd like to avoid the last one.
The code looks like this:
with open(name, 'a+') as f:
    line = ['FOO ', description, '|Bar|']
    f.writelines(line)
    f.writelines("%d," % item for item in values)
    f.writelines('\n')
Each line starts with a small description of what the array to follow contains, and then a list of integers. New lines are added in the loop as they become available.
The output I get looks something like this:
FOO description|Bar|274,549,549,824,824,824,824,824,794,765,765,736,736,736,736,736,
And I would like to have it look like this, without the last comma:
FOO description|Bar|274,549,549,824,824,824,824,824,794,765,765,736,736,736,736,736
I was unable to find a solution that would work with the writelines() and I need to avoid lengthy processing in additional loops.
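Use join:
f.write(",".join(map(str, values)))
join only accepts strings, so values is first mapped to strings with map. (f.writelines() would also accept the joined string, since it iterates over it character by character, but f.write() is the natural call for a single string.)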
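Applied to the question's code, a minimal sketch might look like this. The helper name write_record is hypothetical, and an in-memory io.StringIO replaces open(name, 'a+') so the example needs no file.

```python
import io


def write_record(f, description, values):
    """Write one 'FOO <description>|Bar|v1,v2,...' line to the file object f."""
    f.write('FOO ' + description + '|Bar|')
    # join puts a comma only *between* items, so there is no trailing comma.
    f.write(','.join(map(str, values)))
    f.write('\n')


buf = io.StringIO()  # stands in for the file opened with open(name, 'a+')
write_record(buf, 'description', [274, 549, 549, 824])
print(buf.getvalue(), end='')  # FOO description|Bar|274,549,549,824
```

No extra loop is needed: join makes a single pass over values.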
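You can slice off the last character, since slicing with [:-1] always drops exactly one character at the end. The slice has to be applied to the string of numbers, though, not to the line list: line[:-1] on the list would drop the '|Bar|' element instead of the comma.
line = ['FOO ', description, '|Bar|']
f.writelines(line)
numbers = "".join("%d," % item for item in values)
f.write(numbers[:-1])  # drop the trailing comma
Slicing works well here, at least in your case.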
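You can use the print function here. sep=',' is placed only between the values, and print's default end='\n' also writes the newline, so the separate f.writelines('\n') call is no longer needed:
print(*values, sep=',', file=f)
If you are using Python 2, import the print function first:
from __future__ import print_function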
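The whole line can be written with two print calls. A minimal sketch, with io.StringIO standing in for the real file and sample values taken from the question's output:

```python
import io

buf = io.StringIO()  # stands in for the file opened with open(name, 'a+')
description = 'description'
values = [274, 549, 549, 824, 824]

# sep='' and end='' write the prefix with no spaces and no line break.
print('FOO ', description, '|Bar|', sep='', end='', file=buf)
# sep=',' goes only between values; the default end='\n' finishes the line.
print(*values, sep=',', file=buf)

print(buf.getvalue(), end='')  # FOO description|Bar|274,549,549,824,824
```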

Python - \n appearing in concatenated strings

I've been having an issue with my Python code. I am trying to concatenate the value of two string objects, but when I run the code it keeps printing a '\n' between the two strings.
My code:
while i < len(valList):
    curVal = valList[i]
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)
Now when I run this, it gives me this error:
OSError: [Errno 22] Invalid argument: 'cornWhiteTrimmed\nmarkup.txt'
See that \n between the two strings? I've dissected the code a bit by printing each string individually, and neither one contains a \n on its own. Any ideas as to what I'm doing wrong?
Thanks in advance!
The concatenation itself certainly doesn't add the \n. valList is probably the result of calling readlines() on a file object, so each element in it will have a trailing \n. Call strip on each element before using it:
while i < len(valList):
    curVal = valList[i].strip()
    print(curVal)
    markupConstant = 'markup.txt'
    markupFileName = curVal + markupConstant
    markupFile = open(markupFileName)
    i += 1  # advance to the next element, otherwise the loop never ends
You don't see the \n when you print the strings because \n is the newline character: printing it shows nothing, it just moves the output to a new line. When it sits between your two strings, though, it becomes part of the filename, which is what causes the error. The solution to your issue is the strip method. You can read its documentation here (https://www.tutorialspoint.com/python/string_strip.htm), but basically it removes leading and trailing whitespace, including the newline character, from a string.
Just to make an addition to the other answers explaining why this came about:
When you need to actually inspect what characters a string contains, you can't simply print it. Many characters are "invisible" when printed.
Turn the string into a list first:
list(curVal)
Or my personal favorite:
[c for c in curVal]
These create lists whose printed form shows every hard-to-see character, because printing a list shows each element's repr.
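The built-in repr() gives the same information without splitting the string into characters. A short sketch, assuming a value like the one in the error message:

```python
cur_val = 'cornWhiteTrimmed\n'  # the kind of element readlines() produces

print(cur_val)        # the newline is invisible; you only get a line break
print(list(cur_val))  # ends with ..., 'd', '\n'] so the newline is visible
print(repr(cur_val))  # 'cornWhiteTrimmed\n' shows the escape in one string

fixed = cur_val.strip() + 'markup.txt'
print(repr(fixed))    # 'cornWhiteTrimmedmarkup.txt'
```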

I want to read in strings to the new line character in Python 2.7

I have a long text file that I am trying to pull certain strings out of. The lengths of these strings vary within the text file, but they are always located after certain identifiers. So, for example, say my text file looks like this:
junk text...
Name:
Age:
Robert
twenty
four.
junk text...
I always know that the "Robert" string is located at "Age:\n\n", but I don't know how long it is, only that it ends at a "\n\n"; the same applies to the "twenty four." string. I have tried using
namepos1 = string.find("Age:")
namepos2 = namepos1 + 6
This gives the starting location of the string I want, but I don't know how to save it into a variable so that it always captures the whole string up to the two newline characters. If it were a fixed length rather than variable, I think I could use:
name = string[namepos2:length]
but any help would be greatly appreciated. I may have to go about doing it completely different, but this is the first way I have thought about it and tried to do it.
Thanks!
You could do this by finding the line that contains "Age" and then moving forward two lines. If you want the text that follows and you know how many lines away it is, this would also work:
lookup = 'Age'  # must match the case used in the file
with open('C:/Users/Luke/Desktop/Summer 2016/Programs/untitled5.txt') as myFile:
    line = myFile.readlines()
interestinglines = ''
for num, text in enumerate(line):
    # take the line two below each match, if the file is long enough
    if lookup in text and num + 2 < len(line):
        interestinglines += line[num + 2].rstrip('\n') + '\n'
You may need to tinker with it a bit, but I believe this should reproduce mostly what you're looking for. Each extracted line is stripped of its own newline and given exactly one '\n', so you can save the result to a new file.
After you've found the location in the string, you can split the rest of the string on \n\n and take the first item.
s = file_str[namepos2 :]
name = s.split('\n\n')[0]
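The same idea can be wrapped in a small function. A minimal sketch: the name value_after is hypothetical, and the sample text assumes a layout in which each value is separated from the next by a blank line. It uses len(marker) instead of a hard-coded offset like + 6, and it runs unchanged on Python 2.7.

```python
def value_after(text, marker):
    """Return the text after marker, up to the next blank line.

    Returns None if marker is not found; if no blank line follows,
    the rest of the text is returned.
    """
    start = text.find(marker)
    if start == -1:
        return None
    rest = text[start + len(marker):]
    return rest.split('\n\n')[0]


# Assumed layout, based on the question's description.
sample = 'junk text...\nAge:\n\nRobert\n\ntwenty four.\n\njunk text...'
print(value_after(sample, 'Age:\n\n'))    # Robert
print(value_after(sample, 'Robert\n\n'))  # twenty four.
```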

python writing a list to a file incorrectly

I am having an issue with writing a list to a file. I am annotating certain files to change them into a certain format, so I read sequence alignment files, store them in lists, do the necessary formatting, and then write them to a new file. The problem is that while my list containing the sequence alignments is structured correctly, the output written to the new files is incorrect (it does not replicate my list structure). I include only a section of my output and what it should look like, because the list itself is far too long to post.
OUTPUT WRITTEN TO FILE:
>
TRFE_CHICK
From XALIGN
MKLILCTVLSLGIAAVCFAAP (seq spans multiple lines) ...
ADYIKAVSNLRKCS--TSRLLEAC*> (end of sequence, * should be on a newline, followed by > on a newline as well)
OUTPUT IS SUPPOSED TO BE WRITTEN AS:
>
TRFE_CHICK
From XALIGN
MKLILCTVLSLGIAAVCFAAP (seq spans many lines) ...
ADYIKAVSNLRKCS--TSRLLEAC
*
>
It does this misformatting multiple times over. I have tried pickling and unpickling the list but that misformats it further.
My code for producing the list and writing to file:
new = []
for line in alignment1:
    if line.endswith('*\n'):
        new.append(line.strip('*\n'))
        new.append('*')
    else:
        new.append(line)
new1 = []
for line in new:
    if line.startswith('>'):
        twolines = line[0] + '\n' + line[1:]
        new1.append(twolines)
        continue
    else:
        new1.append(line)
for line in new1:
    alignfile_annot.write(line)
Basically, I have coded it so that it reads the alignment file, inserts a line between the end of the sequence and the * character and also so that > followed by the ID code are always on new lines. This is the way my list is built but not the way it is written to file. Anyone know why the misformatting?
Apologies for the long text, I tried to keep it as short as possible to make my issue clear
I'm running Python 2.6.5
new.append(line.strip('*\n'))
new.append('*')
You have a list of lines (with newline terminators each), so you need to include \n for these two lines, too:
new.append(line[:-2] + "\n") # slice as you just checked line.endswith("*\n")
new.append("*\n")
Remember the strip (or slice, as I've changed it to) will remove the newline, so splitting a single item in the list with a value of "...*\n" into two items of "..." and "*" actually removes a newline from what you had originally.
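A minimal sketch of the corrected first loop as a function, assuming alignment1 is a list of newline-terminated lines. The function name and the sample lines are hypothetical. The slice line[:-2] removes exactly the two characters just checked, whereas strip('*\n') removes any run of '*' and '\n' characters from both ends of the line.

```python
def split_star_lines(alignment):
    """Move a trailing '*' onto its own line, keeping every line's newline."""
    new = []
    for line in alignment:
        if line.endswith('*\n'):
            new.append(line[:-2] + '\n')  # the sequence text without the '*'
            new.append('*\n')             # '*' on a line of its own
        else:
            new.append(line)
    return new


# Hypothetical sample lines in the shape described in the question.
alignment1 = ['MKLILCTVLSLGIAAVCFAAP\n', 'ADYIKAVSNLRKCS--TSRLLEAC*\n', '>\n']
print(''.join(split_star_lines(alignment1)), end='')
```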
