Reading CSV file with Python

import os
import time

filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while True:
    time.sleep(1)
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print(var3)
I have this Python code, which reads values from a CSV file every time an external program writes a new line to it. My problem is that the CSV file is periodically rewritten completely, and then Python stops reading new lines. My guess is that Python is stuck at some file position, while the rewrite can add or remove maybe 50 lines. So, for example, Python is waiting for a new line at line 70 while the new line arrives at line 95. I think the solution is to re-run mycsv.seek(0, os.SEEK_END) when that happens, but I am not sure how to do it.

What you want to do is difficult to accomplish without rewinding the file every time to make sure that you are truly on the last line. If you know approximately how many characters each line has, there is a shortcut you can take using mycsv.seek(-end_buf, os.SEEK_END), as outlined in this answer. So your code could work something like this:
import os
import time

avg_len = 50                # use an appropriate number here
end_buf = 3 * avg_len // 2  # integer offset, comfortably longer than one line

filename = 'NTS.csv'
mycsv = open(filename, 'rb')  # binary mode allows end-relative seeks in Python 3
mycsv.seek(-end_buf, os.SEEK_END)
last = mycsv.readlines()[-1]
while True:
    time.sleep(1)
    mycsv.seek(-end_buf, os.SEEK_END)
    line = mycsv.readlines()[-1]
    if line != last:
        last = line
        arr_line = line.decode().split(',')
        var3 = arr_line[3]
        print(var3)
Here, in each iteration of the while loop, you seek to a position close to the end of the file, just far back enough that you know for sure the last line will be contained in what remains. Then you read in all the remaining lines (this will probably include partial fragments of the second- or third-to-last lines) and check whether the last of them differs from what you had before.
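An alternative sketch (my addition, not from the original answer): detect the rewrite directly by comparing the file's size on disk with your current read offset, and reopen the file when it shrinks. This assumes the rewritten file starts out smaller than your previous offset:
import os
import time

filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while True:
    time.sleep(1)
    if os.path.getsize(filename) < mycsv.tell():
        # The file is smaller than our read offset, so it was rewritten:
        # reopen it and start again from the top.
        mycsv.close()
        mycsv = open(filename, 'r')
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        print(arr_line[3])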

There is a simpler way to read the lines in your program. Instead of using seek to get what you need, try using readlines on the file object mycsv.
You can do the following:
mycsv = open('NTS.csv', 'r')
csv_lines = mycsv.readlines()
for line in csv_lines:
    arr_line = line.split(',')
    var3 = arr_line[3]
    print(var3)
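Since the data is CSV, it may also be worth letting the standard csv module do the splitting; unlike a plain split(','), it correctly handles quoted fields that contain commas. A minimal sketch (my addition):
import csv

with open('NTS.csv', 'r', newline='') as f:
    for row in csv.reader(f):
        print(row[3])  # the fourth field, like arr_line[3] above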


Function to divide a text file into two files

I wrote a function that takes a text file and a ratio (e.g. 80%) and splits the first 80% of the file into one file and the remaining 20% into another. The first part is correct, but the second part is empty. Can someone take a look and point out my mistake?
def splitFile(inputFilePatheName, outputFilePathNameFirst, outputFilePathNameRest, splitRatio):
    lines = 0
    buffer = bytearray(2048)
    with open(inputFilePatheName) as f:
        while f.readinto(buffer) > 0:
            lines += buffer.count('\n')
    print lines
    line80 = int(splitRatio * lines)
    print line80

    with open(inputFilePatheName) as originalFile:
        firstNlines = originalFile.readlines()[0:line80]
        restOfTheLines = originalFile.readlines()[(line80+1):lines]
        print len(firstNlines)
        print len(restOfTheLines)

    with open(outputFilePathNameFirst, 'w') as outputFileNLines:
        for item in firstNlines:
            outputFileNLines.write("{}".format(item))

    with open(outputFilePathNameRest, 'w') as outputFileRest:
        for word in restOfTheLines:
            outputFileRest.write("{}".format(word))
I believe this is your problem:
firstNlines = originalFile.readlines()[0:line80]
restOfTheLines=originalFile.readlines()[(line80+1):lines]
When you call readlines() the second time, you don't get anything, because you've already read all the lines from the file. Try:
allLines = originalFile.readlines()
firstNLines, restOfTheLines = allLines[:line80], allLines[line80:]
(Note that the second slice starts at line80, not line80 + 1; starting one element later would silently drop a line.)
Of course, for very large files there is still the problem that you are reading the entire file into memory.
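If memory is a concern, a streaming variant (a sketch I'm adding, with hypothetical names, not part of the original answer) writes each line straight to the appropriate output file instead of holding them all in a list:
def split_file_streaming(input_path, first_path, rest_path, split_ratio):
    # First pass: count the lines without keeping them in memory.
    with open(input_path) as f:
        total = sum(1 for _ in f)
    cutoff = int(split_ratio * total)

    # Second pass: route each line to the right output file.
    with open(input_path) as src, \
         open(first_path, 'w') as first, \
         open(rest_path, 'w') as rest:
        for i, line in enumerate(src):
            (first if i < cutoff else rest).write(line)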

Python - how to get last line in a loop

I have some CSV files that I have to modify which I do through a loop. The code loops through the source file, reads each line, makes some modifications and then saves the output to another CSV file. In order to check my work, I want the first line and the last line saved in another file so I can confirm that nothing was skipped.
What I've done is put all of the lines into a list and then grab the last one using the list's length minus 1. This works, but I'm wondering if there is a more elegant way to accomplish it.
Code sample:
from itertools import islice

def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv', 'wb')
    check = open('C:\\HP\\WS\\check-all.csv', 'wb')
    check_count = 0
    check_list = []
    with open('C:\\HP\\WS\\CVS1-source.csv', 'r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            check_list.append(line)
            check_count += 1
            if check_count == 1:
                check.write(line)
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
    final_check = check_list[len(check_list)-1]
    check.write(final_check)
    fb.close()
If you actually need check_list for something, then, as the other answers suggest, using check_list[-1] is equivalent to but better than check_list[len(check_list)-1].
But do you really need the list? If all you want to keep track of is the first and last lines, you don't. If you keep track of the first line specially, and keep track of the current line as you go along, then at the end, the first line and the current line are the ones you want.
In fact, since you appear to be writing the first line into check as soon as you see it, you don't need to keep track of anything but the current line. And the current line, you've already got that, it's line.
So, let's strip all the other stuff out:
def CVS1():
    fb = open('C:\\HP\\WS\\final-cir.csv', 'wb')
    check = open('C:\\HP\\WS\\check-all.csv', 'wb')
    first_line = True
    with open('C:\\HP\\WS\\CVS1-source.csv', 'r') as infile:
        skip_first_line = islice(infile, 3, None)
        for line in skip_first_line:
            if first_line:
                check.write(line)
                first_line = False
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
    check.write(line)  # after the loop, line is still bound to the last line read
    fb.close()
You can enumerate the CSV rows of the input file and check the index, like this:
def CVS1():
    with open('C:\\HP\\WS\\final-cir.csv', 'wb') as fb, \
         open('C:\\HP\\WS\\check-all.csv', 'wb') as check, \
         open('C:\\HP\\WS\\CVS1-source.csv', 'r') as infile:
        # islice objects have no len(), so materialize the lines first
        lines = list(islice(infile, 3, None))
        for idx, line in enumerate(lines):
            if idx == 0 or idx == len(lines) - 1:
                check.write(line)
            # [CSV modifications become a string called "newline"]
            fb.write(newline)
I've replaced the open statements with a with block, so the interpreter takes care of closing the file handles.
You can access index -1 directly:
final_check = check_list[-1]
which is nicer than what you have now:
final_check = check_list[len(check_list)-1]
If the file is not empty (and has more than one line) you can:
my_file = open(root_to_file, 'r')
my_lines = my_file.readlines()
first_line = my_lines[0]
last_line = my_lines[-1]
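For large files you can also get the first and last lines without holding the whole file in memory, for instance with collections.deque (a sketch I'm adding; the filename is a placeholder):
from collections import deque

with open('my_file.txt') as f:
    first_line = next(f)       # assumes the file has at least one line
    tail = deque(f, maxlen=1)  # keeps only the last line seen
    last_line = tail[0] if tail else first_line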

Include surrounding lines of text file match in output using Python 2.7.3

I've been working on a program which assists in log analysis. It finds error or fail messages using regex and prints them to a new .txt file. However, it would be much more beneficial if the program also included the 4 lines above and below each match. I can't figure out how to do this! Here is part of the existing program:
def error_finder(filepath):
    source = open(filepath, "r").readlines()
    error_logs = set()
    my_data = []
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            error_logs.add(line)
I'm assuming something needs to be added to the very last line, but I've been working on this for a bit and either am not applying myself fully or just can't figure it out.
Any advice or help on this is appreciated.
Thank you!
Why Python?
grep -C4 '^your_regex$' logfile > outfile.txt
Some comments:
I'm not sure why error_logs is a set instead of a list.
Using readlines() will read the entire file into memory, which will be inefficient for large files. You should be able to just iterate over the file one line at a time.
exp (which you're using for re.search) isn't defined anywhere, but I assume that's elsewhere in your code.
Anyway, here's complete code that should do what you want without reading the whole file in memory. It will also preserve the order of input lines.
import re
from collections import deque

exp = r'\d'  # matches numbers; change to what you need

def error_finder(filepath, context_lines=4):
    source = open(filepath, 'r')
    error_logs = []
    buffer = deque(maxlen=context_lines)
    lines_after = 0
    for line in source:
        line = line.strip()
        if re.search(exp, line):
            # add previous lines first
            for prev_line in buffer:
                error_logs.append(prev_line)
            # clear the buffer
            buffer.clear()
            # add current line
            error_logs.append(line)
            # schedule the lines that follow to be added too
            lines_after = context_lines
        elif lines_after > 0:
            # a line that matched the regex came not so long ago
            lines_after -= 1
            error_logs.append(line)
        else:
            buffer.append(line)
    # maybe do something with error_logs? I'll just return it
    return error_logs
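Hypothetical usage of the function above (the log path is a placeholder):
for entry in error_finder('app.log'):
    print(entry)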
I suggest using an index loop instead of a for-each loop; try this:
error_logs = list()
for i in range(len(source)):
    line = source[i].strip()
    if re.search(exp, line):
        # clamp the start so it never goes negative near the top of the file
        error_logs.append((line, max(0, i - 4), i + 4))
In this case your error log will contain tuples of the form ('line of error', line index - 4, line index + 4), so you can get the surrounding lines later from source.
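To recover the actual context lines from those tuples later, a sketch (my addition, assuming source is the list of lines read earlier):
for matched_line, start, end in error_logs:
    # the stored end index is inclusive, so slice one past it
    for ctx_line in source[start:end + 1]:
        print(ctx_line.rstrip())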

getting data out of a txt file

I'm only just beginning my journey into Python. I want to build a little program that will calculate shim sizes for when I do the valve clearances on my motorbike. I will have a file with the target clearances, and I will ask the user to enter the current shim sizes and the current clearances. The program will then spit out the target shim size. It looks simple enough; I have built a spreadsheet that does it, but I want to learn Python, and this seems like a simple enough project...
Anyway, so far I have this:
def print_target_exhaust(f):
    print f.read()

# current_file = open("clearances.txt")
print print_target_exhaust(open("clearances.txt"))
Now I've got it reading the whole file, but how do I make it get ONLY the value on, for example, line 4? I've tried print f.readline(4) in the function, but that seems to just spit out the first four characters... What am I doing wrong?
I'm brand new, please be easy on me!
-d
To read all the lines:
lines = f.readlines()
Then, to print line 4:
print lines[4]
Note that indices in Python start at 0, so that is actually the fifth line in the file.
with open('myfile') as myfile:  # use a with statement so you don't have to remember to close the file
    for line_number, data in enumerate(myfile):  # enumerate yields line numbers starting with 0
        if line_number == 3:
            print(data)
            break  # stop looping when you've found the line you want
More information: the with statement and enumerate.
Not very efficient, but it should show you how it works. Basically it keeps a running counter as it reads each line; when the counter reaches 4, it prints that line.
## Open the file with read only permit
f = open("clearances.txt", "r")
counter = 0

## Read the first line
line = f.readline()

## If the file is not empty, keep reading one line at a time
## till the file is empty
while line:
    counter = counter + 1
    if counter == 4:
        print line
    line = f.readline()
f.close()
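Another standard-library option (my addition): itertools.islice can skip straight to the line you want without reading the rest of the file:
from itertools import islice

with open('clearances.txt') as f:
    # skip the first three lines, then take the fourth (index 3)
    fourth_line = next(islice(f, 3, 4), None)  # None if the file is shorter
print(fourth_line)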

Read a multielement list, look for an element and print it out in python

I am writing a Python script to produce a TeX file, and I need to use some information from another file. That file has a menu name on each line; I use split to get a list for each of those lines. For example, I need to write a section using the second element of each list, but after running the script I get nothing. What could I do?
This is roughly what I am doing:
texfile = open(outputtex.tex', 'w')
infile = open(txtfile.txt, 'r')
for line in infile.readlines():
    linesplit = line.split('^')
    for i in range(1, len(infile.readlines())):
        texfile.write('\section{}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
By the way, in the includegraphics line, I need to increase the number after pg_ from "0001" to "25050". Any clues?
I really appreciate your help.
I don't quite follow your question. But I see several errors in your code. Most importantly:
for line in infile.readlines():
    ...
    ...
    for i in range(1, len(infile.readlines())):
Once you read a file, it's gone. (You can get it back, but in this case there's no point.) That means that the second call to readlines is yielding nothing, so len(infile.readlines()) == 0. Assuming what you've written here really is what you want to do (i.e. write file_len * (file_len - 1) + 1 lines?) then perhaps you should save the file to a list. Also, you didn't put quotes around your filenames, and your indentation is strange. Try this:
with open('txtfile.txt', 'r') as infile:  # with automatically closes infile
    in_lines = infile.readlines()
in_len = len(in_lines)

texfile = open('outputtex.tex', 'w')
for line in in_lines:
    linesplit = line.split('^')
    for i in range(1, in_len):
        texfile.write('\section{{{0}}}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\centering\n')
        texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % i)
        texfile.write('\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
Perhaps you don't actually want nested loops?
infile = open('txtfile.txt', 'r')
texfile = open('outputtex.tex', 'w')
for line_number, line in enumerate(infile):
    linesplit = line.split('^')
    texfile.write('\section{{{0}}}\n'.format(linesplit[1]))
    texfile.write('\\begin{figure*}[h!]\n')
    texfile.write('\centering\n')
    texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % line_number)
    texfile.write('\end{figure*}\n')
    texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
infile.close()
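On the numbering question at the end of the question ("0001" up to "25050"), which neither snippet above addresses: a format specifier can zero-pad the counter to a fixed width, e.g. (a sketch, assuming five digits is wide enough):
for i in range(1, 25051):
    name = 'pg_{0:05d}.pdf'.format(i)  # pg_00001.pdf ... pg_25050.pdf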
