Recently I came across some strange behavior of the with open() statement in Python.
The following code returns output only for the first read operation, leaving the lines list empty.
input_csv = []
with open(self.path, 'r') as f:  # Opening the CSV
    r = csv.DictReader(f)
    for row in r:
        input_csv.append(row)  # Storing its contents in a dictionary for later use
    lines = f.readlines()  # Reading it in as a list too
    f.close()
Splitting it into two open() statements, however, returns both objects as desired.
input_csv = []
with open(self.path, 'r') as f:  # Opening the CSV
    r = csv.DictReader(f)
    for row in r:
        input_csv.append(row)  # Storing its contents in a dictionary for later use
    f.close()
with open(self.path, 'r') as f:  # Opening the CSV
    lines = f.readlines()  # Reading it in as a list too
    f.close()
Why can the f handle only be read from once in the first version?
Many thanks
If you look at the documentation of csv.reader(), which is what DictReader() uses internally (its .reader attribute):
Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its __next__() method is called...
Hence, it relies on the behavior of the file-like object: each iteration is essentially an f.readline() call, an operation that also advances the current position in the file, until EOF is reached, at which point iteration raises StopIteration. It is the same behavior you would observe with:
with open(self.path, 'r') as f:
    for l in f:
        pass  # each line was read
    print(f.readlines())
You can add print(f.tell()) to see how the position changes as you execute each line.
If you (re)open the file, you start at position 0 again. If you've read through once and want to use the same handle again, you need to return to the beginning of the file with f.seek(0).
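For example, a minimal sketch of the snippet from the question using a single open() and a rewind (the filename below is just a stand-in for self.path):
import csv

input_csv = []
with open("input.csv", "r") as f:   # stand-in for self.path
    r = csv.DictReader(f)
    for row in r:
        input_csv.append(row)       # DictReader has now consumed the whole file
    print(f.tell())                 # the position is at EOF here
    f.seek(0)                       # rewind to the beginning
    lines = f.readlines()           # now this returns all the lines again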
Note: you really do not need to perform f.close() in a managed context using with. Once you leave it, it'll close the file handle for you.
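A quick way to see that (this sketch creates a throwaway example.txt):
with open("example.txt", "w") as f:
    f.write("hello\n")

print(f.closed)   # True -- leaving the with block already closed the handle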
I'm trying to make a program that converts all words to uppercase.
a = open("file.txt", encoding='UTF-8')
for b in a:
    c = b.rstrip()
    print(c.upper())
a.close()
This is my code.
It prints the uppercase text, but it doesn't save the result back to 'file.txt'.
I want the file itself to end up with all words in uppercase.
How can I solve this?
Here's how you can do it (provided that you are working with a small file): open the file in read mode and store the uppercase text in a variable; then open another file handle in write mode and write the content into it.
with open('file.txt', 'r') as inp:
    y = inp.read().upper()
with open('file.txt', 'w') as out:
    out.write(y)
You can actually do this "in place" by reading and writing a character at a time.
with open("file.txt", "r") as f:
while (b := f.read(1)) != '':
f.write(b.upper())
This is safe because you process the file one character at a time and write exactly one character for every character read; the only seek is back to the character that was just read, so nothing is overwritten before it has been read. The file object's underlying buffering and your system's disk cache mean this isn't as inefficient as it looks.
(This does make one assumption: that the encoded length of b is always the same as that of b.upper(). That is true for plain ASCII text, but not for every character ('ß'.upper() is 'SS', for example). If it doesn't hold for your data, you should be able to read and write at least a line at a time, though not in place:
with open("input.txt") as inh, open("output.txt", "w") as outh:
for line in inh:
print(line.upper(), file=outh)
)
First read the text file into a string:
with open('file.txt', 'r') as file:
    data = file.read()
Then convert the data to uppercase:
data_revise = data.upper()
Finally write the revised text back to a file:
with open('data/try.txt', 'w') as fout:  # use your own output path here
    fout.write(data_revise)
You can write all the changes to a temporary file and replace the original once all the data has been processed. You can use either map() or a generator expression:
with open(r"C:\original.txt") as inp_f, open(r"C:\temp.txt", "w+") as out_f:
out_f.writelines(map(str.upper, inp_f))
with open(r"C:\original.txt") as inp_f, open(r"C:\temp.txt", "w+") as out_f:
out_f.writelines(s.upper() for s in inp_f)
To replace the original file you can use shutil.move():
import shutil
...
shutil.move(r"C:\temp.txt", r"C:\original.txt")
I have a problem getting values back from readlines() and readline(), but not from read().
Does anyone know why this happens?
Appreciate it.
with open('seatninger.txt', 'r') as f:  # open within a context manager
    f_contents = f.read()
    f_contents_list = f.readlines()
    f_contents_line = f.readline()
    print(f_contents)
    print(f_contents_list)
    print(f_contents_line)
You have exhausted the file with read(), so you need to go back to the beginning to read it again, using seek():
f_contents = f.read()
f.seek(0)
f_contents_list = f.readlines()
f.seek(0)
f_contents_line = f.readline()
Python goes through the file, reads data and remembers where it stopped.
When you use read() it reads the whole file and stops at the end of the file.
When you use readlines() it reads the whole file, splits it on newline characters and returns a list of lines.
When you use readline() it reads and returns the next line, remembering where it stopped reading; lines are distinguished by the newline character.
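A small sketch that makes the difference visible (it first creates a throwaway three-line file so it is self-contained):
with open("sample.txt", "w") as f:
    f.write("first\nsecond\nthird\n")

with open("sample.txt") as f:
    print(f.read())        # prints the whole file; the position is now at EOF
    f.seek(0)              # rewind, otherwise the next calls return nothing
    print(f.readlines())   # ['first\n', 'second\n', 'third\n']
    f.seek(0)
    print(f.readline())    # 'first\n' -- only the next line
    print(f.tell())        # the position has only advanced past that first line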
I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it I was trying to display only a part of it by writing it to a text file.
csvfile = open(file_path, "rb")
rows = csvfile.readlines()
text_file = open("output.txt", "w")
row_num = 0
while row_num < 20:
text_file.write(", ".join(row[row_num]))
row_num += 1
text_file.close()
I want to iterate through the CSV file and write only a small section of it to a text file so I can look at how it does this and see if it would be of any use to me. Currently the text file ends up empty.
A way I thought might do this would be to iterate through the file with a for loop that exits after a certain number of iterations, but I could be wrong and I'm not sure how to do this. Any ideas?
There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular, reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines.
Instead you could use a for loop with enumerate and break when necessary.
csvfile = open(file_path, "rb")
text_file = open("output.txt", "w")
for i, row in enumerate(csvfile):
text_file.write(row)
if row_num >= 20:
break
text_file.close()
You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:
with open(file_path, "rb") as csvfile:
#your code here involving csvfile
#now the csvfile is closed!
Also note that Python might not be the best tool for this - you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.
A simple solution would be to just do:
#!/usr/bin/python
# -*- encoding: utf-8 -*-

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            if i >= 20:  # stop after the first 20 rows
                break
            textfile.write(row)
Explanation:
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
Instead of calling open and close yourself, it is recommended to use these lines. Just write the code you want executed while the files are open at a new level of indentation.
'rb' and 'wb' are the modes you need to open a file for 'reading' and 'writing' respectively, in 'binary mode'.
for i, row in enumerate(csvfile):
This line allows you to read your CSV file line by line, and the tuple (i, row) gives you both the content of the row and its index. That's one of the awesome built-in functions from Python: check out here for more about it.
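As a quick illustration of what enumerate yields:
for i, row in enumerate(["a,b,c", "d,e,f"]):
    print("%d: %s" % (i, row))   # 0: a,b,c   then   1: d,e,f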
Hope this helps!
EDIT: Note that Python has a csv module that can do this without enumerate:
# -*- encoding: utf-8 -*-
import csv

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i < 20:
            row = next(reader)
            writer.writerow(row)
            i += 1
All we need are its reader and writer. They have the functions next() (which reads one row) and writerow() (which writes one). Note that here the variable row is not a string but a list of strings, because the reader does the splitting job by itself. This might be faster than the previous solution.
Also, this makes it easy to copy a slice from anywhere in the file, not necessarily from the beginning (just change the bounds for i); the reader still scans from the start, but only the rows you pick are written, as sketched below.
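For example, a sketch of that idea using itertools.islice instead of a manual counter (the bounds 100 and 120 are just an illustration; earlier rows are read and skipped, not written):
# -*- encoding: utf-8 -*-
import csv
from itertools import islice

file_path = './test.csv'

with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        for row in islice(reader, 100, 120):  # rows 100..119
            writer.writerow(row)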
I am trying to parse a "pseudo-CSV" file with the python CSV reader, and am having some doubts about how to add some extra logic. The reason I call it a "pseudo-CSV" file is because some of the lines in the input file will have text (30-40 chars) before the actual CSV data starts. I am trying to figure out the best way to remove this text.
Currently, I have found 3 options for removing said text:
From Python, call grep and sed and pipe the output to a temp file which can then be fed to the csv reader
(Ugh, I would like to avoid this option)
Create a CSV dialect to remove the unwanted text
(This option just feels wrong)
Extend the File object, implementing the next() function to remove the unwanted text as necessary.
I have no control over how the input file is generated, so it's not an option to modify the generation.
Here is the related code I had when I realized the problem with the input file.
with open('myFile', 'r') as csvfile:
    theReader = csv.reader(csvfile)
    for row in theReader:
        # my logic here
If I go with option 3 above, the solution is quite straightforward, but
then I won't be able to use the with open() syntax.
So, here is my question (two, actually): is option 3 the best way to solve this
problem? If so, how can I combine it with the with open() syntax?
Edit: Forgot to mention that I'm using Python 2.7 on Linux.
csv.reader accepts an arbitrary iterable besides files:
with open('myFile', 'rb') as csvfile:
    reader = csv.reader(filter_line(line) for line in csvfile)
    for row in reader:
        # my logic here
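filter_line here stands for whatever cleanup you need, defined before the with block. A minimal sketch, assuming the extra text is separated from the CSV data by some recognisable marker (the 'DATA:' marker below is purely hypothetical -- adapt the test to your real format):
def filter_line(line):
    # Hypothetical cleanup: drop the leading free text, if present.
    # 'DATA:' is only an example marker; use whatever actually separates
    # the prefix from the CSV fields in your file.
    marker = 'DATA:'
    if marker in line:
        return line.split(marker, 1)[1]
    return line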
You can just use contextlib and create your own context manager.
import csv
from contextlib import contextmanager

@contextmanager
def csv_factory(filename, mode="r"):
    # setup here
    fileobj = open(filename, mode)
    reader = csv.reader(fileobj)
    try:
        yield reader  # return value for usage in with
    finally:
        fileobj.close()  # clean up here
with csv_factory("myFile") as csvfile:
for line in csvfile:
print(line)
So I have a file that contains this:
SequenceName 4.6e-38 810..924
SequenceName_FGS_810..924 VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
SequenceName 1.6e-38 887..992
SequenceName_GYQ_887..992 PLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
I want my program to read only the lines that contain these protein sequences. Up until now I've got this, which skips the first line and reads the second one:
handle = open(filename, "r")
handle.readline()
linearr = handle.readline().split()
handle.close()
fnamealpha = fname + ".txt"
handle = open(fnamealpha, "w")
handle.write(">%s\n%s\n" % (linearr[0], linearr[1]))
handle.close()
But it only processes the first sequence, and I need it to process every line that contains a sequence, so I need a loop. How can I do it?
The part that saves to a txt file is really important too, so I need to find a way to combine these two objectives.
My output with the above code is:
>SequenceName_810..924
VAWNCRQNVFWAPLFQGPYTPARYYYAPEEPKHYQEMKQCFSQTYHGMSFCDGCQIGMCH
Okay, I think I understand your question--you want to iterate over the lines in the file, right? But only the second line in the sequence--the one with the protein sequence--matters, correct? Here's my suggestion:
# context manager `with` takes care of file closing, error handling
with open(filename, 'r') as handle:
    for line in handle:
        if line.startswith('SequenceName_'):
            print(line.split())
            # Write to file, etc.
My reasoning is that you're only interested in the lines that start with SequenceName_; a sketch that also writes the matches out follows.
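A minimal sketch of the "# Write to file, etc." part, reusing the output format and the fname variable from your own snippet (both carried over from the question):
fnamealpha = fname + ".txt"
with open(filename) as handle, open(fnamealpha, "w") as out:
    for line in handle:
        if line.startswith("SequenceName_"):
            name, seq = line.split()              # e.g. 'SequenceName_FGS_810..924' and its sequence
            out.write(">%s\n%s\n" % (name, seq))  # same '>header / sequence' format as your code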
Use readlines and throw it all into a for loop.
with open(filename, 'r') as fh:
    for line in fh.readlines():
        # do processing here
In the # do processing here section, you can just prepare another list of lines to write to the other file. (Using with handles all the proper closing for you.)
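For instance, a minimal sketch of that pattern (the output filename is only an example):
out_lines = []
with open(filename, 'r') as fh:
    for line in fh.readlines():
        if line.startswith('SequenceName_'):
            out_lines.append(line)          # collect the sequence lines

with open('sequences.txt', 'w') as out:     # example output name
    out.writelines(out_lines)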