How to get a list from a file in Python

I am trying to get the contents of a file in a list. For reference: these are the contents of the file:
1. item1
2. item2
3. item3
I am able to open the files, and when I do file.read(), all I get from the interpreter is an empty string:
>>> file = open("C:\\Users\\vivrd\\Desktop\\testing.txt")
>>> file.read()
''
If I do file.readlines(), it displays a list, but even though the python docs say that file.readlines() returns a list, whenever I try to assign a variable to it (to access the list), all I get is an empty list:
>>> file.readlines()
['1. item1\n', '2. item2\n', '3. item3'] #returns the list
>>> item = file.readlines()
>>> item #but the list is empty!
[]
I have also tried to loop through the file, but it doesn't print out anything:
>>> for line in file:
...     print(line)
...
>>>
Also, for some reason, file.readlines() is only working once. I tried it a second time, and the interpreter didn't even display anything:
>>> file.readlines()
[]
My target is to get a list that looks like this: ["1. item1", "2. item2", "3. item3"] or even with the escape sequences (though not preferable): ["1. item1 \n", "2. item2 \n", "3. item3"]
How can I get to this point? Thanks!

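It will work only once because the file object keeps track of its position: after the first readlines() the position is at the end of the file, so any later read(), readlines(), or loop over the file finds nothing left to read. Read it once and store the result in a variable: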
>>> f = open("C:\\Users\\vivrd\\Desktop\\testing.txt")
>>> items = f.readlines() # items should now be a list of lines in the file
>>> f.close() # make sure to close the file
If you want to re-read, move the 'cursor' back to the beginning of the file (before closing) using seek:
>>> items = f.readlines()
>>> f.seek(0)
>>> more_items = f.readlines()
>>> f.close()
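The read-once behaviour is easy to check for yourself. Here is a minimal sketch, assuming a throwaway temporary file stands in for testing.txt; the helper name read_twice is made up for illustration:

```python
import os
import tempfile


def read_twice(path):
    """Read a file twice, rewinding with seek(0) in between."""
    with open(path) as f:
        first = f.readlines()      # consumes the file; the position is now at the end
        exhausted = f.readlines()  # nothing left to read, so this is []
        f.seek(0)                  # move the position back to the start
        second = f.readlines()     # the same lines again
    return first, exhausted, second


# Temporary stand-in for testing.txt, with the contents from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1. item1\n2. item2\n3. item3")
    demo_path = tmp.name

first, exhausted, second = read_twice(demo_path)
os.remove(demo_path)

print(first)            # ['1. item1\n', '2. item2\n', '3. item3']
print(exhausted)        # []
print(second == first)  # True
```

The middle call returns an empty list for the same reason the second file.readlines() in the question did.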

Try these lines of code:
with open(filename, "r") as file:
    content = file.readlines()
for line in content:
    print(line)
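If the goal is the list without the trailing newlines (the first form asked for in the question), one option is read().splitlines(). This is only a sketch: the helper name read_lines is hypothetical, and a temporary file is used in place of testing.txt so the snippet runs anywhere.

```python
import os
import tempfile


def read_lines(path):
    """Return the lines of a text file as a list, without trailing newlines."""
    with open(path) as f:
        return f.read().splitlines()


# Temporary stand-in for testing.txt (same contents as in the question).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1. item1\n2. item2\n3. item3")
    demo_path = tmp.name

items = read_lines(demo_path)
os.remove(demo_path)

print(items)  # ['1. item1', '2. item2', '3. item3']
```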

Related

Is there a way of looking for words from one file in another file and outputting the words not found in the other file, in a new file?

I am trying to compare two files in Python, which both contain some words. I would like the code to look for words from file1 in file2 and put the words that are not found from file1 in a new file as an output.
The code below is what I've tried, but it doesn't do anything. It doesn't even show an error, so I don't know what goes wrong or should be different.
file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt')
file2 = open('C:/Users/Atal/Desktop/School/Project datas/file2.txt')
fileContent = file1.read();
fileContent2 = file2.read();
loglist = file1.readlines()
loglist2 = file2.readlines()
file2.close()
line = file1.readline()
file1.close()
found = False
for line in loglist:
if line in loglist2 :
found = True
if not found:
file1 = open('C:/Users/Atal/Desktop/School/Project datas/file1.txt', 'w')
file1.write(line +"\n")
file1.close()
file1 looks like this:
Peter
Jan
Richard
file2 looks like this:
Floyd
Richard
Bob
The new file should look like this:
Peter
Jan
If there is any way to do this, please let me know. Thanks in advance.
Use set and not in like so:
list_1 = ['Peter', 'Jan', 'Richard']
list_2 = ['Floyd', 'Richard', 'Bob']
set_2 = set(list_2)
main_list = [item for item in list_1 if item not in set_2]
main_list
Output:
['Peter', 'Jan']
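To apply the same idea to the actual files, something like the following could work. It is a sketch that assumes the words are separated by whitespace (one per line, as in the question); the function name write_missing_words and the temporary file names are made up for the demo.

```python
import os
import tempfile


def write_missing_words(path1, path2, out_path):
    """Write the words of path1 that do not occur in path2 to out_path."""
    with open(path1) as f1, open(path2) as f2:
        words1 = f1.read().split()      # list, keeps the order of file1
        words2 = set(f2.read().split())  # set, for fast membership tests
    missing = [word for word in words1 if word not in words2]
    with open(out_path, "w") as out:
        for word in missing:
            out.write(word + "\n")
    return missing


# Demo with temporary copies of the two files from the question.
with tempfile.TemporaryDirectory() as folder:
    p1 = os.path.join(folder, "file1.txt")
    p2 = os.path.join(folder, "file2.txt")
    p3 = os.path.join(folder, "file3.txt")
    with open(p1, "w") as f:
        f.write("Peter\nJan\nRichard")
    with open(p2, "w") as f:
        f.write("Floyd\nRichard\nBob")
    missing = write_missing_words(p1, p2, p3)
    with open(p3) as f:
        written = f.read()

print(missing)  # ['Peter', 'Jan']
```

Splitting on whitespace also sidesteps the trailing-newline problem: 'Richard' at the end of one file and 'Richard\n' in the middle of the other compare equal once the newlines are gone.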
When writing code, you need to keep in mind exactly what you're expecting each variable to contain at every step of your program's execution. For example, this:
loglist = file1.readlines()
...
line = file1.readline()
...
for line in loglist:
Why do that middle statement at all, if you're just going to overwrite line immediately? And within your for loop:
for line in loglist:
    if line in loglist2:
        found = True
if not found:
    # save new file
So, if a line from loglist is found in loglist2, then set the variable found to True. And if that didn't happen (if found remains False) then output to file1. Note here that you're not doing anything else with line, and even if you were, the line file1.write(line +"\n") only ever outputs one line and never repeats with other lines (or so I surmise from the way you indented your code in your question).
So, here's how you would do this more correctly. As you read through this, pay attention to what type (string, list, etc.) each variable is whenever it's used:
with open(".../file1.txt", "r") as file1, open(".../file2.txt", "r") as file2:
    logList1 = file1.read().splitlines()  # list of strings, without trailing newlines
    logList2 = file2.read().splitlines()
# the with block will close the files automatically
for line in logList2:
    if line in logList1:
        logList1.remove(line)  # if the line from file2 is found in file1, remove it from file1's list
with open(".../file3.txt", "w") as file3:
    file3.write("\n".join(logList1) + "\n")  # write what is left of file1: the lines not found in file2
@johnny1995, in his answer, did the middle step in a list comprehension:
logList3 = [line for line in logList1 if line not in logList2]
which is essentially shorthand for what I did above: "make a new list containing every line from logList1, but only if that line doesn't appear in logList2".

How to create a list from a text file in Python

I have a text file called "test", and I would like to create a list in Python and print it. I have the following code, but it does not print a list of words; it prints the whole document in one line.
file = open("test", 'r')
lines = file.readlines()
my_list = [line.split(' , ')for line in open ("test")]
print (my_list)
You could do
my_list = open("filename.txt").readlines()
When you do this:
file = open("test", 'r')
lines = file.readlines()
lines is a list of lines. If you want to get a list of words for each line you can do:
list_word = []
for l in lines:
    list_word.append(l.split(" "))
I believe you are trying to achieve something like this:
data = [word.split(',') for word in open("test", 'r').readlines()]
It would also help if you specified what type of text file you are trying to read, as there are several modules (e.g. csv) that would produce the result in a much simpler way.
As pointed out, you may also want to strip the newline (depending on which line ending you are using), and you'll get something like this:
data = [word.strip('\n').split(',') for word in open("test", 'r').readlines()]
This produces a list of lines, each of which is a list of words.
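For the csv route mentioned above, here is a rough sketch. The question never shows what "test" contains, so the sample text below is invented, and io.StringIO stands in for the open file; rows_from_csv is a hypothetical helper name.

```python
import csv
import io


def rows_from_csv(text_file):
    """Return one list of fields per line of a comma-separated text file."""
    # skipinitialspace=True drops the space that follows each comma
    return [row for row in csv.reader(text_file, skipinitialspace=True)]


# Invented sample content; with a real file, pass open("test", newline="") instead.
sample = io.StringIO("apple, banana, cherry\ndog, cat\n")
rows = rows_from_csv(sample)

print(rows)  # [['apple', 'banana', 'cherry'], ['dog', 'cat']]
```

The csv reader already handles the line endings, so no extra strip('\n') is needed.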

Reading inputs written in different lines from a file and storing each input inside a list

I'm trying to read a file named one.txt which contains the following:
hat
cow
Zu6
This is a sentence
and I'm trying to store each string written on each line inside a list. For example, my output list should contain the following elements:
['hat', 'cow', 'Zu6', 'This is a sentence']
Here's my approach for doing this:
def first(ss):
    f = open(ss, 'r')
    text = f.readline()
    f.close()
    lines = []
    li = [lines.append(line) for line in text]
    print li
first('D:\\abc\\1\\one.txt')
However, when I try to print li, here's what I get as the output:
[None, None, None, None]
What's wrong with my approach?
print(list(open("my_text.txt")))
is probably a pretty easy way to do it ...
Of course, people are going to come screaming about unclosed file handles, so for the sake of good habits:
with open("my_text.txt") as f:
    print(list(f))
Alternatively, call f.readlines() inside the with block. You might need to strip off the newline characters:
[line.strip() for line in f]
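As for what went wrong in the question's code: readline() reads only the first line ('hat\n'), iterating over that string yields its four characters one at a time, and list.append() returns None, which is why li ends up as four Nones. A minimal corrected sketch follows; it keeps the function name first from the question and uses a temporary file in place of one.txt.

```python
import os
import tempfile


def first(path):
    """Return each line of the file as one list element, newlines removed."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


# What the original comprehension actually collected: append() returns None.
nones = [[].append(ch) for ch in "hat\n"]
print(nones)  # [None, None, None, None]

# Temporary stand-in for one.txt with the contents from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hat\ncow\nZu6\nThis is a sentence\n")
    demo_path = tmp.name

result = first(demo_path)
os.remove(demo_path)

print(result)  # ['hat', 'cow', 'Zu6', 'This is a sentence']
```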

Adding each item in list to end of specific lines in FASTA file

I solved this in the comments below.
So essentially what I am trying to do is add each element of a list of strings to the end of specific lines in a different file.
Hard to explain but essentially I want to parse a FASTA file, and every time it reaches a header (line.startswith('>')) I want it to replace parts of that header with an element in a list I've already made.
For example:
File1:
">seq1 unwanted here
AATATTATA
ATATATATA
>seq2 unwanted stuff here
GTGTGTGTG
GTGTGTGTG
>seq3 more stuff I don't want
ACACACACAC
ACACACACAC"
I want it to keep ">seq#" but replace everything after with the next item in the list below:
List:
mylist = "['things1', '', 'things3', 'things4', '', 'things6', 'things7']"
Result (modified file1):
">seq1 things1
AATATTATA
ATATATATA
>seq2 # adds nothing here due to mylist[1] = ''
GTGTGTGTG
GTGTGTGTG
>seq3 things3
ACACACACAC
ACACACACAC
As you can see I want it to add even the blank items in the list.
So once again, I want it to parse this FASTA file, and every time it gets to a header (there are thousands), I want it to replace everything after the first word with the next item in the separate list I have made.
What you have will work, but there are a few unnecessary lines, so I've edited it down to use fewer. Also, an important note is that you don't close your file handles. This could result in errors, specifically when writing to a file; either way it's bad practice. Code:
#!/usr/bin/python
import sys
# gets list of annotations
def get_annos(infile):
    with open(infile, 'r') as fh:  # makes sure the file is closed properly
        annos = []
        for line in fh:
            annos.append(line.rstrip('\n').split('\t')[5])  # tab as separator; drop the newline first
    return annos
# replaces extra info on each header with correct annotation
def add_annos(infile1, infile2, outfile):
    annos = get_annos(infile1)  # contains list of annos
    with open(infile2, 'r') as f2, open(outfile, 'w') as output:
        for line in f2:
            if line.startswith('>'):
                line_split = [line.split()[0]]  # split line on whitespace and store first element in a list
                line_split.append(annos.pop(0))  # append data of interest to current id line
                output.write(' '.join(line_split) + '\n')  # join and write to file with a newline character
            else:
                output.write(line)
anno = sys.argv[1]
seq = sys.argv[2]
out = sys.argv[3]
add_annos(anno, seq, out)
This is not perfect, but it cleans things up a bit. I might veer away from using pop() to associate the annotation data with the sequence IDs unless you are certain the files are in the same order every time.
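One way to avoid relying on file order is to look annotations up by sequence ID instead of popping them off a list. The sketch below assumes the annotation file also carries the sequence ID in its first tab-separated column; that layout is a guess (the question only shows that the annotation is in column 5), and the names build_anno_map and relabel_headers are made up.

```python
def build_anno_map(anno_lines, id_col=0, anno_col=5):
    """Map sequence ID -> annotation from tab-separated lines.

    The column positions are assumptions; adjust them to the real file.
    """
    annos = {}
    for line in anno_lines:
        cols = line.rstrip("\n").split("\t")
        annos[cols[id_col]] = cols[anno_col]
    return annos


def relabel_headers(fasta_lines, annos):
    """Yield FASTA lines with each header reduced to '>id annotation'."""
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id = line.split()[0][1:]  # '>seq1 unwanted here' -> 'seq1'
            anno = annos.get(seq_id, "")  # empty annotation if the ID is unknown
            yield (">" + seq_id + " " + anno).rstrip() + "\n"
        else:
            yield line


# Small in-memory demo; with real files, pass the open file objects instead.
anno_lines = ["seq1\ta\tb\tc\td\tthings1\n", "seq2\ta\tb\tc\td\t\n"]
fasta_lines = [">seq1 unwanted here\n", "AATATTATA\n",
               ">seq2 unwanted stuff here\n", "GTGTGTGTG\n"]
result = list(relabel_headers(fasta_lines, build_anno_map(anno_lines)))

print("".join(result), end="")
```

Because relabel_headers is a generator, the FASTA file is processed one line at a time rather than being loaded whole; only the annotation map is held in memory.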
There is a great Python library, Biopython, for parsing FASTA and other sequence file formats. It is very helpful in bioinformatics, and you can manipulate the data according to your needs.
Here is a simple example taken from the library's website:
from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
You should get something like this on your screen (the SingleLetterAlphabet() part appears only with older Biopython releases):
gi|2765658|emb|Z78533.1|CIZ78533
Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGATGAGACCGTGG...CGC', SingleLetterAlphabet())
740
...
gi|2765564|emb|Z78439.1|PBZ78439
Seq('CATTGTTGAGATCACATAATAATTGATCGAGTTAATCTGGAGGATCTGTTTACT...GCC', SingleLetterAlphabet())
592
***********EDIT*********
I solved this before anyone could help. This is my code, can anyone tell me if I have any bad practices? Is there a way to do this without writing everything to a new file? Seems like it would take a long time/lots of memory.
#!/usr/bin/python
# Script takes unedited FASTA file, removed seq length and
# other header info, adds annotation after sequence name
# run as: $ python addanno.py testanno.out testseq.fasta out.txt
import sys
# gets list of annotations
def get_annos(infile):
    f = open(infile)
    list2 = []
    for line in f:
        columns = line.strip().split('\t')
        list2.append(columns[5])
    return list2
# replaces extra info on each header with correct annotation
def add_annos(infile1, infile2, outfile):
    mylist = get_annos(infile1) # contains list of annos
    f2 = open(infile2, 'r')
    output = open(out, 'w')
    for line in f2:
        if line.startswith('>'):
            l = line.partition(" ")
            list3 = list(l)
            del list3[1:]
            list3.append(' ')
            list3.append(mylist.pop(0))
            final = ''.join(list3)
            line = line.replace(line, final)
            output.write(line)
            output.write('\n')
        else:
            output.write(line)
anno = sys.argv[1]
seq = sys.argv[2]
out = sys.argv[3]
add_annos(anno, seq, out)
get_annos(anno)

Next line escape character not working python

I used the following code to read from a text file line by line and print it on screen.
with open("source.txt") as f:
    content = f.readlines()
    print(content)
    print('\n')
f.close()
But the \n was just getting appended to the output and the output was coming in a single line instead. For example if the file was like this:
abc
def
ghi
the output was:
['abc\n', 'def\n', 'ghi']
Then I tried changing the single quotes with the '\n' with "\n" like this:
with open("source.txt") as f:
    content = f.readlines()
    print(content)
    print("\n")
f.close()
The actual output I need is:
abc
def
ghi
What can I do for that? Operating platform: Mac (Unix). Thanks in advance.
You should do it this way:
with open('source.txt', 'r') as f:
    for line in f:  # iterate over lines
        line = line.strip()  # removes leading/trailing whitespace, including the newline
        print(line)  # print adds a newline itself
readlines() loads the whole file into memory, which can fail if the file is larger than the available memory; iterating over the file reads one line at a time instead.
You can use rstrip():
>>> for i in content:
...     print(i.rstrip())
...
abc
def
ghi
The problem with your code is that it isn't doing what you would expect it to do. content is a list, and printing the list just shows its representation, ['abc\n', etc]. You can use a for-loop (as I have shown) to go through each element in the list and print each one on a separate line.
I'm not exactly sure why you have print('\n'), but I'm presuming that you come from another programming language. print automatically adds a newline, so adding one is not needed :).
Finally, rstrip() is needed to strip the newline that each line already ends with, otherwise a blank line would appear after each item:
>>> for i in content:
...     print(i)
...
abc

def

ghi
The problem is that you were trying to print the list object itself; instead you should loop over the list and print the individual items:
>>> lis = ['abc\n', 'def\n', 'ghi']
>>> print(lis)
['abc\n', 'def\n', 'ghi']
print(lis) actually prints the str representation of the list object:
>>> print(str(lis))
['abc\n', 'def\n', 'ghi']
Loop over the list and print individual items. In Python we can loop over the list itself, unlike C/C++ where we require indexes.
>>> for item in lis:
...     print(item.rstrip('\n'))  # removes the trailing '\n'
...
abc
def
ghi
A for-loop over a list or any other iterable takes the next item from that iterable one by one and assigns it to the variable used in the for-loop:
for x in lis:  # in each iteration x is assigned the next item from lis
    print(x)
Alternatively, read the whole file as a single string and print it:
with open('source.txt', 'r') as f:
    content = f.read()
print(content)
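To tie the answers together, here are a few roughly equivalent ways of getting from the readlines() list to one item per line. This is only a sketch: it starts from the list shown in the question rather than from source.txt, and the variable names are arbitrary.

```python
# The list that readlines() returned in the question.
lines = ['abc\n', 'def\n', 'ghi']

# 1. Glue the pieces back together; the newlines are already in the strings.
joined = "".join(lines)
print(joined)

# 2. Strip the newlines first, then let print add its own after each item.
stripped = [line.rstrip("\n") for line in lines]
for item in stripped:
    print(item)

# 3. Same stripped list, printed in one call with a newline as the separator.
print(*stripped, sep="\n")
```

All three print abc, def and ghi on separate lines; which one to use is mostly a matter of whether you still need the list afterwards.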
