This question already has answers here:
How to read specific lines from a file (by line number)?
(30 answers)
Closed 8 years ago.
I just found out it's not possible to write to a specific line in a csv file (only the end).
I have just come across another obstacle that I'm having trouble tackling, which is reading from a specific line in a csv file.
One way I have found to accomplish this is:
import csv

with open('file.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    lines = []
    for row in spamreader:
        lines.append(row)

print('What line do you want to read from?')
line = lines[int(input()) - 1]  # the -1 is needed since list indices start at 0
However, I believe this might be a slightly inefficient way to do it, since the more rows stored in the list "lines", the more RAM the program uses.
Could someone tell me whether this is actually an efficient way of doing this? Otherwise, I will just go with it.
Is there any way that I can do something like this?
spamreader.readRow(5) #I just made this up, but is there a similar function?
This is the page that I've been using, it's possible I skipped over it. https://docs.python.org/3/library/csv.html
Also, I'm not very advanced in programming, so if there is an advanced answer, can you try to keep the explanations fairly simple?
If you want to read starting from line 123:
for _ in range(122):
    spamreader.next()
for row in spamreader:
    ...
With Python 3, reader objects have no .next() method; use the built-in instead:
next(spamreader)
One can also navigate in the file by moving the cursor to a specific byte using tell() and seek().
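If you only need one physical line and your rows never contain embedded newlines, the standard library's linecache module is another option. A minimal sketch ('file.csv' here is just the hypothetical name from the question; linecache still reads the file behind the scenes):

```python
import linecache

# getline() is 1-based and returns '' if the line doesn't exist.
line = linecache.getline('file.csv', 5)   # the 5th physical line, as a string
fields = line.rstrip('\n').split(',')     # naive split; quoted fields need csv.reader
```

This avoids keeping your own `lines` list, but note that fields containing quoted commas would need csv.reader rather than a plain split.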
This question already has answers here:
Iterating on a file doesn't work the second time [duplicate]
(4 answers)
Closed 1 year ago.
I was refactoring some code for my program and I have a mistake somewhere in the process. I am reading and writing .csv files.
In the beginning of my program I iterate through a .csv file in order to find which data from the file I need.
import csv

with open(csvPath, mode='r') as inputFile:
    csvReader = csv.reader(inputFile)
    potentialVals = []
    paramVals = {}
    for row in csvReader:
        if row[3] == "Parameter":
            continue
        # Increment values in dict
        if row[3] not in paramVals:
            paramVals[row[3]] = 1
        else:
            paramVals[row[3]] += 1
This iterates and works fine; the for loop gives me every row in the .csv file. I then perform some calculations and later iterate through the same .csv file again to select data to write to a new .csv file. My problem is here: on the second pass, it only gives me the first row of the .csv file, and nothing else.
# Write all of the information to our new csv file
with open(outputPath, mode='w') as outputFile:
    csvWriter = csv.writer(outputFile, delimiter=',', quotechar='"', quoting=csv.QUOTE_ALL)
    inputFile.seek(0)
    rowNum = 0
    for row in csvReader:
        print(row)
Where the print statement is, it only prints the first line of the .csv file, and then exits the for loop. I'm not really sure what is causing this. I thought it might have been the
inputFile.seek(0)
But even when I opened a 2nd reader, the problem persisted. This for loop was working before I refactored it; all the other code is the same except the for loop I'm having trouble with. Here is what it used to look like:
Edit: So I thought maybe it was a variable instance error, so I tried renaming my variables instead of reusing them and the issue persisted. Going to try a new file instance now,
Edit 2: Okay, so this is interesting: when I look at the line_num value for my reader object (when I open a new one instead of using .seek()), it outputs 1, so I am at the beginning of my file. And len(list(csvReader)) is 229703, which shows the .csv is fully there, so I'm still not sure why it won't read anything besides the first row of the .csv.
Edit 3: Just as a hail mary attempt, I tried creating a deep copy of the .csv file and iterating through that, but same results. I also tried just doing an entire separate .csv file and I also got the same issue of only getting 1 row. I guess that eliminates that it's a file issue, the information is there but there is something preventing it from reading it.
Edit 4: Here is where I'm currently at with the same issue. I might just have to rewrite this method completely haha but I'm going to lunch so I won't be able to actively respond now. Thank you for the help so far though!
# TODO: BUG HERE
with open(csvPath, mode='r') as inputFile2:
    csvReader2 = csv.reader(inputFile2)
    ...
    for row2 in csvReader2:
        print("CSV Line Num: " + str(csvReader2.line_num))
        print("CSV Index: " + str(rowNum))
        print("CSV Length: " + str(len(list(csvReader2))))
        print("CSV Row: " + str(row2))
Also incase it helps, here is csvPath:
import os

nameOfInput = input("Please enter the file you'd like to convert: ")
csvPath = os.path.dirname(os.path.realpath(nameOfInput))
csvPath = os.path.join(csvPath, nameOfInput)
If you read the documentation carefully, it says the csv reader is just a parser, and all the heavy lifting is done by the underlying file object.
In your case, you are trying to read from a closed file in the second iteration, and that is why it isn't working.
For the csv reader to work, you need an underlying object that supports the iterator protocol and returns a string each time its next() method is called — file objects and list objects are both suitable.
Link to the documentation: https://docs.python.org/3/library/csv.html
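The file-object behavior described above is easy to demonstrate. A minimal sketch, using io.StringIO to stand in for an open file:

```python
import csv
import io

f = io.StringIO("a,1\nb,2\nc,3\n")  # stands in for an open file object
reader = csv.reader(f)

first_pass = list(reader)    # consumes the underlying "file" to EOF
second_pass = list(reader)   # empty: the cursor is still at EOF
f.seek(0)                    # rewind the underlying object...
third_pass = list(reader)    # ...and the very same reader yields rows again
```

This is also why calling len(list(csvReader2)) inside the loop, as in Edit 4, breaks the iteration: that list() call drags the file cursor to EOF before the next row can be read.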
This question already has answers here:
Insert line at middle of file with Python?
(11 answers)
Closed 4 years ago.
I'm trying to insert some text at specific position of file using this:
with open("test.txt", "r+") as f:
    f.seek(5)
    f.write("B")
But this overwrites the character at position 5 with the new data ("B") instead of inserting it.
For example, if I have
AAAAAAAAAA
in the file test.txt and run the code,
I get AAAAABAAAA instead of AAAAABAAAAA (the five As must come after the B).
How can I insert at the desired position of the file instead of overwriting?
There are three answers for that:
The generic file API (the one you can expect on all OSes) has no interface for 'insert' (read: this is impossible).
You can implement this yourself by reading the whole file into memory, composing the new content, and writing it back. (If the file is big, you may need code that does this in chunks.)
Good news for Linux users: since Linux 3.15 it is possible to insert holes in the middle of a file (basically, shifting everything in the file starting from a specific location by a specific offset). There is a comprehensive article on this topic here: https://lwn.net/Articles/629965/. It is supported for the ext4 and XFS filesystems, and it requires some low-level operations on the fd (i.e. not Python's usual open()). Moreover, as of when I checked (Sep 2018), the fallocate module on PyPI did not support it, so you would need to write low-level code to do the FALLOC_FL_INSERT_RANGE ioctl.
TL;DR: if your file is small, read it into memory and do the insert in memory. If your file is medium-sized (1-2 GB), do it in a temp file and rename it afterwards. If your file is large, use windowed operations or dig down to FALLOC_FL_INSERT_RANGE (if you have a relatively modern Linux).
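For the medium-size case, here is a sketch of the temp-file-plus-rename approach; insert_at and its parameters are illustrative names, not a standard API:

```python
import os
import shutil
import tempfile

def insert_at(path, pos, data, chunk_size=64 * 1024):
    """Insert bytes `data` at byte offset `pos`, streaming through a temp file."""
    dir_ = os.path.dirname(os.path.abspath(path))
    with open(path, "rb") as src, \
         tempfile.NamedTemporaryFile(mode="wb", dir=dir_, delete=False) as dst:
        remaining = pos
        while remaining > 0:                       # copy everything before pos
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break
            dst.write(chunk)
            remaining -= len(chunk)
        dst.write(data)                            # the inserted bytes
        shutil.copyfileobj(src, dst, chunk_size)   # copy the rest
        tmp_name = dst.name
    os.replace(tmp_name, path)                     # atomic rename over the original
```

Only one chunk is ever in memory, and the final os.replace() swaps the files atomically, so a crash mid-copy never corrupts the original.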
This worked for me:
with open("test.txt", "r+") as f:
    f.seek(5)          # first seek to the position
    rest = f.read()    # read everything after it (read(), not readline(), so multi-line files work too)
    f.seek(5)          # the file pointer has moved, so seek back to 5
    f.write("B")       # write the letter
    f.write(rest)      # write the remaining part
Original : AAAAAAAAAA
After : AAAAABAAAAA
f1 = open("test.txt", "r+")
f1.seek(5)
data = "{}{}".format("B", f1.readline())
f1.seek(5)
f1.write(data)
f1.close()
This question already has answers here:
How to read a large file - line by line?
(11 answers)
Closed 6 years ago.
I have a Python script which needs to read a section of a very large text file, starting at line N and ending at N+X.
I don't want to read the entire file into memory (e.g. with open('file').readlines()), because that will both take too long and waste too much memory.
My script runs on a Unix machine, so I currently use the native head and tail functions, i.e.:
section = subprocess.check_output('tail -n-N {filePath} | head -n X')
but it feels like there must be a smarter way of doing it..
is there a way to get lines N through N+X of a text file in Python without opening the entire file?
Thanks!
Python's itertools.islice() works well for doing this:
from itertools import islice

N = 2
X = 5

with open('large_file.txt') as f_input:
    for row in islice(f_input, N - 1, N + X):
        print(row.strip())
This skips over all of the initial lines and just returns the lines you are interested in.
The answer to your question is located here: How to read large file, line by line in python
with open(...) as f:
    for line in f:
        <do something with line>
The with statement handles opening and closing the file, including if
an exception is raised in the inner block. The for line in f treats
the file object f as an iterable, which automatically uses buffered IO
and memory management so you don't have to worry about large files.
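The same lazy iteration can pick out lines N through N+X without islice, by counting with enumerate. A sketch (io.StringIO stands in for the open file here so the snippet is self-contained):

```python
import io

N, X = 3, 2                              # want lines N .. N+X (here lines 3, 4, 5)
f = io.StringIO("".join(f"line{i}\n" for i in range(1, 8)))  # stand-in for a real file

wanted = []
for line_num, line in enumerate(f, 1):   # 1-based line numbers
    if line_num > N + X:
        break                            # stop early; the rest of the file is never read
    if line_num >= N:
        wanted.append(line.rstrip('\n'))

print(wanted)                            # → ['line3', 'line4', 'line5']
```

The early break means the tail of a huge file is never touched, just like the islice version.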
This might be a really dumb question but I've been stuck there for more than an hour.
I am doing some csv-file reading with python using the following code:
with open(filename, 'rb') as csvfile:
    for line in csvfile.readlines():
        print("Line = " + str(line))
        array = line.split(';')
        time = float(array[TIMEPOS])
        print("Initial time = " + str(time))
I have a huge number of lines in this csv file. And I see them all with the print("Line = "+str(line)). However, I only see "Initial Time = XXX" once, even though it should be displayed for every line.
I would very much like to know what I'm doing here that is wrong.
Thanks in advance
As I open your question for editing and "walk" my cursor through your code, I see that your indentation uses a combination of spaces and tabs. This is bad in Python code: the interpreter does have rules for interpreting the mix, but those rules are basically impossible for humans to follow.
Replace all your tabs with spaces and try your code again. Also change your code editor so it uses only spaces, never tab characters.
This is maybe a very basic question, but let's suppose one has a csv file which looks as follows:
a,a,a,a
b,b,b,b
c,c,c,c
d,d,d,d
e,e,e,e
And I am interested in deleting row[1] and row[3] and writing a new file that does not contain those rows. What would be the best way to do this? As the csv module is already loaded in my code, I'd like to know how to do it within that scheme. I'd be glad if somebody could help me with this.
Since each row is on a separate line (assuming there are no newlines within the data items of the rows themselves), you can do this by simply copying the file line by line and skipping any rows you don't want kept. Since I'm unsure whether you number rows starting from zero or one, I've added a symbolic constant at the beginning to control it. You could, of course, hardcode it, as well as ROWS_TO_DELETE, directly into the code.
Regardless, this approach is faster than using, for example, the csv module, because it avoids all the unnecessary parsing and reformatting of the data that the module has to do.
FIRST_ROW_NUM = 1  # or 0
ROWS_TO_DELETE = {1, 3}

with open('infile.csv', 'rt') as infile, open('outfile.csv', 'wt') as outfile:
    outfile.writelines(row for row_num, row in enumerate(infile, FIRST_ROW_NUM)
                       if row_num not in ROWS_TO_DELETE)
Resulting output file's contents:
b,b,b,b
d,d,d,d
e,e,e,e
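Since the question notes the csv module is already loaded, here is an equivalent sketch using csv.reader/csv.writer. It uses the question's 0-based indexing (so rows b and d are dropped, unlike the 1-based output above), and io.StringIO stands in for the real files:

```python
import csv
import io

ROWS_TO_DELETE = {1, 3}   # 0-based, matching the question's row[1] and row[3]

infile = io.StringIO("a,a,a,a\nb,b,b,b\nc,c,c,c\nd,d,d,d\ne,e,e,e\n")
outfile = io.StringIO()

writer = csv.writer(outfile)
for row_num, row in enumerate(csv.reader(infile)):
    if row_num not in ROWS_TO_DELETE:
        writer.writerow(row)  # rows a, c and e survive
```

The csv version is slower than plain line copying, but it correctly round-trips quoted fields that contain commas or newlines.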