Imagine this situation: a file with 1000 lines. The name of the file is file.txt.
file = file.txt
word = 'error'
for line in file:
    if word in line:
        execute things
If I want the 8 lines BEFORE the line with the word "error", how do I get them?
Read the file and save the lines in a deque of a fixed size
from collections import deque

file = "file.txt"
word = 'error'
lines = deque(maxlen=8)
with open(file) as f:
    for line in f:
        if word in line:
            break
        lines.append(line)
print(lines)
You can use a combination of collections.deque() with a fixed length and itertools.takewhile().
from collections import deque
from itertools import takewhile
with open("file.txt") as f:
lines = deque(takewhile(lambda l: "error" not in l, f), maxlen=8)
print(*lines, sep="", end="")
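If the word can occur more than once and you want the 8 preceding lines for every match, the same fixed-size deque works as a rolling buffer. A minimal sketch, reusing the file and word names from above:

from collections import deque

file = "file.txt"
word = 'error'
prev = deque(maxlen=8)            # always holds at most the last 8 lines seen
with open(file) as f:
    for line in f:
        if word in line:
            # the up-to-8 lines immediately before this match
            print(*prev, sep="", end="")
        prev.append(line)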
I've got some code that lets me open all CSV files in a directory and run through them, removing the top 2 lines of each file. Ideally, during this process I would also like it to add a single comma at the end of the new first line (what would originally have been line 3).
Another possible approach could be to remove the trailing commas on all other rows that appear in each of the CSVs.
Any thoughts or approaches would be gratefully received.
import glob

path = 'P:\pytest'
for filename in glob.iglob(path+'/*.csv'):
    with open(filename, 'r') as f:
        lines = f.read().split("\n")
        f.close()
    if len(lines) >= 1:
        lines = lines[2:]
        o = open(filename, 'w')
        for line in lines:
            o.write(line+'\n')
        o.close()
Adding a counter in there can solve this:
import glob

path = r'C:/Users/dsqallihoussaini/Desktop/dev_projects/stack_over_flow'
for filename in glob.iglob(path+'/*.csv'):
    with open(filename, 'r') as f:
        lines = f.read().split("\n")
        print(lines)
        f.close()
    if len(lines) >= 1:
        lines = lines[2:]
        o = open(filename, 'w')
        counter = 0
        for line in lines:
            counter = counter + 1
            if counter == 1:
                o.write(line+',\n')
            else:
                o.write(line+'\n')
        o.close()
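The same effect can be had with enumerate() instead of a hand-rolled counter; a minimal sketch under the same assumptions (same example path, first two lines dropped, comma appended to the new first line):

import glob

path = r'C:/Users/dsqallihoussaini/Desktop/dev_projects/stack_over_flow'  # example path as above
for filename in glob.iglob(path + '/*.csv'):
    with open(filename, 'r') as f:
        lines = f.read().split("\n")[2:]          # drop the first two lines
    with open(filename, 'w') as o:
        for i, line in enumerate(lines, 1):       # i counts lines from 1
            o.write(line + ',\n' if i == 1 else line + '\n')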
One possible problem with your code is that you are reading the whole file into memory, which might be fine. If you are reading larger files, then you want to process the file line by line.
The easiest way to do that is to use the fileinput module: https://docs.python.org/3/library/fileinput.html
Something like the following should work:
#!/usr/bin/env python3
import glob
import fileinput

# inplace=True makes a backup of the file, then anything written to stdout
# goes into the current file.
# Change the glob as needed; below is just an example.
#
# Iterate through each file in the glob.iglob() results
with fileinput.input(files=glob.iglob('*.csv'), inplace=True) as f:
    for line in f:  # Iterate over each line of the current file.
        if f.filelineno() > 2:  # Skip the first two lines
            # Note: 'line' has the newline in it.
            # Insert the comma if this is line 3 of the file, otherwise output the original line
            print(line[:-1] + ',') if f.filelineno() == 3 else print(line, end="")
I've added some encoding as well, as mine was throwing an error, but encoding fixed that up nicely:
import glob

path = r'C:/whateveryourfolderis'
for filename in glob.iglob(path+'/*.csv'):
    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.read().split("\n")
        #print(lines)
        f.close()
    if len(lines) >= 1:
        lines = lines[2:]
        o = open(filename, 'w', encoding='utf-8')
        counter = 0
        for line in lines:
            counter = counter + 1
            if counter == 1:
                o.write(line+',\n')
            else:
                o.write(line+'\n')
        o.close()
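The question also mentions the alternative of removing the trailing commas on all other rows instead of adding one to the first. A minimal sketch of that variant, under the same path/glob assumptions as above:

import glob

path = r'C:/whateveryourfolderis'   # example path, adjust to your folder
for filename in glob.iglob(path + '/*.csv'):
    with open(filename, 'r', encoding='utf-8') as f:
        lines = f.read().split("\n")[2:]             # drop the first two lines
    with open(filename, 'w', encoding='utf-8') as o:
        for i, line in enumerate(lines, 1):
            if i == 1:
                o.write(line + '\n')                 # keep the new first line as-is
            else:
                o.write(line.rstrip(',') + '\n')     # strip trailing commas from the other rows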
As can be seen in the code, I created two output files: one for the output after splitting, and a second for the actual output after removing duplicate lines.
How can I make only one output file? Sorry if I sound too stupid, I'm a beginner.
import sys

txt = sys.argv[1]
lines_seen = set()  # holds lines already seen
outfile = open("out.txt", "w")
actualout = open("output.txt", "w")

for line in open(txt, "r"):
    line = line.split("?", 1)[0]
    outfile.write(line+"\n")
outfile.close()

for line in open("out.txt", "r"):
    if line not in lines_seen:  # not a duplicate
        actualout.write(line)
        lines_seen.add(line)
actualout.close()
You can add the lines from the input file directly into the set. Since sets cannot have duplicates, you don't even need to check for those. Try this:
import sys

txt = sys.argv[1]
lines_seen = set()  # holds lines already seen
actualout = open("output.txt", "w")

for line in open(txt, "r"):
    line = line.rstrip("\n").split("?", 1)[0]  # drop the newline before splitting
    lines_seen.add(line + "\n")

for line in lines_seen:
    actualout.write(line)
actualout.close()
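Note that a set does not preserve the original line order. If the order matters, the duplicate check can be done in the same single pass; a small sketch, keeping the same output.txt name and argv usage:

import sys

txt = sys.argv[1]
lines_seen = set()  # holds lines already seen

with open(txt, "r") as infile, open("output.txt", "w") as actualout:
    for line in infile:
        line = line.rstrip("\n").split("?", 1)[0]  # keep only the part before the first '?'
        if line not in lines_seen:  # not a duplicate
            actualout.write(line + "\n")
            lines_seen.add(line)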
In the first step you iterate through every line in the file, split each line on your delimiter and store the results in a list. After that you iterate through the list and write each line to your output file.
import sys

txt = sys.argv[1]
lines_seen = set()  # holds lines already seen
actualout = open("output.txt", "w")

data = [line.rstrip("\n").split("?", 1)[0] for line in open(txt, "r")]
for line in data:
    if line not in lines_seen:  # not a duplicate
        actualout.write(line + "\n")
        lines_seen.add(line)
actualout.close()
I am trying to count elements in a text file. I know I am missing an obvious part, but I can't put my finger on it. This is what I currently have, which just produces counts of the letter "f", not of the file:
filename = open("output3.txt")
f = open("countoutput.txt", "w")
import collections

for line in filename:
    for number in line.split():
        print(collections.Counter("f"))
        break
import collections

counts = collections.Counter()  # create a new counter
with open(filename) as infile:  # open the file for reading
    for line in infile:
        for number in line.split():
            counts.update((number,))
            print("Now there are {} instances of {}".format(counts[number], number))

print(counts)
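If you want the totals written to the countoutput.txt file from the question rather than printed, a minimal sketch (assuming the same output3.txt input file):

import collections

counts = collections.Counter()
with open("output3.txt") as infile:             # the input file from the question
    for line in infile:
        counts.update(line.split())             # count every whitespace-separated token

with open("countoutput.txt", "w") as outfile:   # the output file from the question
    for word, count in counts.most_common():
        outfile.write("{} {}\n".format(word, count))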
I have a text file with lots of data in it.
I am trying to read the file line by line, and check if any of the letters in each line are letters in the word "hello".
Then I would like to print any line that does not contain any of the letters h, e, l, or o.
My text file is called data.txt
Here is my code so far:
hello = list('hello')
with open('data.txt', 'r') as file.readlines:
    for line in file:
        if hello not in line:
            print(line)
but currently line 3 produces the error: NameError: name 'file' is not defined
Update:
hello = list('hello')
with open('data.txt', 'r') as f:
    for line in f:
        s = set(line)
        if all(i not in s for i in hello):
            print(line)
Thank you for the help. Now lots of lines of the text file have been eliminated; however, "Epping" still prints, which has an "e" in it, as does the word "hello", and therefore it should be excluded?
You're opening your file wrong.
hello = list('hello')
with open('data.txt', 'r') as f:
    for line in f:
        s = set(line)
        if all(letter not in s for letter in hello):
            print(line)
This isn't quite the correct usage of "with". Try this:
with open('<filename>') as f:
    for line in f:
        ...
If you just want to test whether ANY of the letters appear (not all of them), you can use set intersection:
hello_set = set('hello')
for line in f:
    if not set(line).intersection(hello_set):
        print(line)
or use the "any" function:
for line in f:
    s = set(line)
    if not any(letter in s for letter in hello):
        print(line)
You had a couple of small bugs in your code. I fixed them, please compare.
hello = list('hello')
with open('data.txt', 'r') as file:
    for line in file:
        if all(letter not in line for letter in "hello"):
            print(line)
I am trying to remove specific line numbers from a file in Python, invoked like this:
./foo.py filename.txt 4 5 2919
where 4, 5 and 2919 are line numbers.
What I am trying to do is:
for i in range(len(sys.argv)):
    if i > 1:  # Avoiding sys.argv[0,1]
        newlist.append(int(sys.argv[i]))
Then:
count = 0
with open(sys.argv[1]) as infile:
    for bar in infile:
        count += 1
        if not count in newlist:
            print bar
It prints all the lines of the original file (with blank lines in between).
You can use enumerate to determine the line number:
import sys
exclude = set(map(int, sys.argv[2:]))
with open(sys.argv[1]) as f:
    for num, line in enumerate(f, start=1):
        if num not in exclude:
            sys.stdout.write(line)
You can remove start=1 if you want to count lines from 0 instead. In the above code, the line numbering starts with 1:
$ python3 so-linenumber.py so-linenumber.py 2 4 5
import sys
with open(sys.argv[1]) as f:
            sys.stdout.write(line)
If you want to write the content to the file itself, write it to a temporary file instead of sys.stdout, and then rename that to the original file name (or use sponge on the command-line), like this:
import os
import sys
from tempfile import NamedTemporaryFile

exclude = set(map(int, sys.argv[2:]))
with NamedTemporaryFile('w', delete=False) as outf:
    with open(sys.argv[1]) as inf:
        outf.writelines(line for n, line in enumerate(inf, 1) if n not in exclude)
os.rename(outf.name, sys.argv[1])
You can try something like this:
import sys
import os

filename = sys.argv[1]
lines = [int(x) for x in sys.argv[2:]]

# open two files, one for reading and one for writing
with open(filename) as f, open("newfile", "w") as f2:
    # use enumerate to get the line as well as the line number; enumerate(f, 1) starts the index from 1
    for i, line in enumerate(f, 1):
        if i not in lines:  # `if i not in lines` is clearer than `if not i in lines`
            f2.write(line)

os.rename("newfile", filename)  # rename the new file to the original one
Note that for the generation of temporary files it's better to use the tempfile module.
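A minimal sketch of that variant, assuming the same filename/lines variables as above and creating the temporary file next to the original so the final rename stays on the same filesystem:

import os
import sys
from tempfile import NamedTemporaryFile

filename = sys.argv[1]
lines = [int(x) for x in sys.argv[2:]]

tmpdir = os.path.dirname(os.path.abspath(filename))    # same directory as the original file
with open(filename) as f, NamedTemporaryFile('w', dir=tmpdir, delete=False) as f2:
    for i, line in enumerate(f, 1):
        if i not in lines:
            f2.write(line)

os.replace(f2.name, filename)  # atomically put the filtered content in place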
import sys

# assumes line numbering starts with 1
# enumerate() starts with zero, so we subtract 1 from each line argument
omitlines = set(int(arg)-1 for arg in sys.argv[2:] if int(arg) > 0)

with open(sys.argv[1]) as fp:
    filteredlines = (line for n, line in enumerate(fp) if n not in omitlines)
    sys.stdout.writelines(filteredlines)
The fileinput module has an inplace=True option that redirects stdout to a temporary file, which is automatically renamed over the original for you.
import fileinput
import sys

exclude = set(map(int, sys.argv[2:]))
for i, line in enumerate(fileinput.input('filename.txt', inplace=True), start=1):
    if i not in exclude:
        print line,  # fileinput inplace=True redirects stdout to the temporary file