I have a text file with almost a thousand lines such as:
WorldNews,Current
WorldNews,Current,WorldNews#here',
'WorldNewsPro#here Zebra,Poacher',
'Dock,DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,Dock,ZebraDock#here
Timer33,Timer33#here
Sometimes the line ends without "#here", sometimes it ends with "#here", sometimes it has "#here" in the middle of the line, and sometimes the line ends with "#here'".
I want to strip all the lines that do NOT have "#here" in them at all. I tried RegEx:
> (^(#here$))
> [\W](#here)
etc. with no luck.
How should I pull the lines with "#here" so my new file (or the output) has only:
WorldNews,Current,WorldNews#here',
'WorldNewsProfessional52#here
Zebra,Poacher',
'DocuShare,AC_DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,DocuShare,ZebraDocushare#here
XNTimer,XNTimer#here
I was thinking it should read the whole line from start to end and if it has #here anywhere in the line, print it. If not, ignore and read the next line.
Thanks,
Adrian
Maybe this helps: (assuming filename is the name of your input file)
with open(filename) as stream:
    for line in stream:
        if '#here' in line:
            print line
You don't need regex. You can use string methods for such simple filtering:
def hasstr(lines, s):
    # a generator expression can filter out the lines
    return (line for line in lines if s in line)

# get all lines in the file with #here in them
filtered = hasstr(open(the_file, 'rt'), '#here')
You want the in operator.
import sys

for line in sys.stdin:
    if '#here' in line:
        sys.stdout.write(line)
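Since the question asks for a new file containing only the matching lines, the same `in` test can write to an output file instead of printing. This is a minimal sketch, assuming Python 3; the function name `keep_matching_lines` and its parameters are made up for illustration and do not come from the answers above:

```python
def keep_matching_lines(src_path, dst_path, needle="#here"):
    """Copy only the lines containing `needle` from src_path to dst_path.

    Returns the number of lines that were kept.
    """
    kept = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if needle in line:   # plain substring test, no regex needed
                dst.write(line)  # the line already ends with its own newline
                kept += 1
    return kept
```

Because the substring test does not care where "#here" sits in the line, it covers the end-of-line, middle-of-line, and "#here'" cases alike.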
Related
I want to insert a line into the file "original.txt" (the file contains about 200 lines). The line needs to be inserted two lines after a string is found in one of the existing lines. This is my code. I am using a couple of print calls that show me the line is being added to the list in the spot I need, but the file "original.txt" is not being edited.
with open("original.txt", "r+") as file:
    lines = file.readlines() # makes file into a list of lines
    print(lines) #test
    for number, item in enumerate(lines):
        if testStr in item:
            i = number +2
            print(i) #test
            lines.insert(i, newLine)
            print(lines) #test
            break
    file.close()
I am turning the lines in the text into a list, then I enumerate the lines as I look for the string, assigning the index of the matching line to i and adding 2 so that the new line is inserted two lines after. The print() function shows the line was added in the correct spot, but the file "original.txt" is not modified.
You seem to misunderstand what your code is doing. Let's go line by line:
with open("original.txt", "r+") as file: # open the file for reading and writing
    lines = file.readlines() # read the contents into a list of lines
    print(lines) # print the whole file
    for number, item in enumerate(lines): # iterate over lines
        if testStr in item:
            i = number +2
            print(i) #test
            lines.insert(i, newLine) # insert the new line into the list
            print(lines) #test
            break # get out of the loop
    file.close() # not needed, with statement takes care of closing
You are not modifying the file. You read the file into a list of strings and modify the list. To modify the actual file you need to open it for writing and write the list back into it. Something like this at the end of the code might work
with open("modified.txt", "w") as f:
    for line in lines:
        f.write(line)
You never modified the original text. Your code reads all the lines into a list in memory. When you identify your trigger, you count two lines, and then insert newLine (which is not defined in the code shown) into your in-memory copy. At no point in your code did you modify the original file.
One way is to close the file and then rewrite it from your final value of lines. Do not modify the file while you're reading it -- it's good that you read it all in and then start processing.
Another way is to write to a new file as you go, then use a system command to replace the original file with your new version.
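The two suggestions above can be combined: read everything, modify the list, write the result to a temporary file, and then swap it in for the original. The following is only a sketch under stated assumptions (Python 3, a file small enough to hold in memory); the function name `insert_after_match` and its parameters are hypothetical, and `os.replace` stands in for the "system command" mentioned above:

```python
import os
import tempfile


def insert_after_match(path, marker, new_line, offset=2):
    """Insert new_line at index (match + offset), as in the question's code.

    Returns True if the marker was found and the file rewritten.
    """
    with open(path, encoding="utf-8") as src:
        lines = src.readlines()

    for number, item in enumerate(lines):
        if marker in item:
            lines.insert(number + offset, new_line)
            break
    else:
        return False  # marker not found; leave the file untouched

    # Write to a temporary file in the same directory, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.writelines(lines)
    os.replace(tmp_path, path)
    return True
```

Note that `new_line` should end with "\n", since `writelines` does not add line endings. Writing to a temporary file first means the original stays intact if something fails before the final swap.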
I need to select the first word on each line and make a list from them from a text file:
I would copy the text, but the formatting is quite screwed up. I will try.
All the other text is unnecessary.
I have tried
string=[]
for line in f:
    string.append(line.split(None, 1)[0]) # add only first word
from another solution, but it keeps returning a "Index out of bounds" error.
I can get the first word from the first line using string=text.partition(' ')[0]
but I do not know how to repeat this for the other lines.
I am still new to python and to the site, I hope my formatting is bearable! (when opened, I encode the text to accept symbols, like so
wikitxt=open('racinesPrefixesSuffixes.txt', 'r', encoding='utf-8')
could this be the issue?)
The reason it's raising an IndexError is because the specific line is empty.
You can do this:
words = []
for line in f:
    if line.strip():
        words.append(line.split(maxsplit=1)[0])
Here line.strip() is checking if the line consists of only whitespace. If it does only consist of whitespace, it will simply skip the line.
Or, if you like list comprehension:
words = [line.split(maxsplit=1)[0] for line in f if line.strip()]
My problem is that my input file contains empty lines (this is a must), but when it reaches an empty line, the loop for i, line in enumerate(file): stops reading the file. How do I prevent this?
The file is read like this because I need to do something with every line but the last one, then something else with the last line. (This is also a must.)
Here is what I'm trying to do:
with open(sys.argv[1]) as file:
    i = 0
    for i, line in enumerate(file):
        # Do for all but last line
        if i < linecount-1:
            print "Not last line"
            i += 1
        # Do for last line
        if i == linecount-1:
            print "Last line"
It works fine on files without empty lines.
You do not need to declare or increment i in your code. enumerate does that for you. Incrementing additionally as you do probably triggers your conditionals accidentally; it has nothing to do with empty lines.
The mistake in your implementation is explained in the other answer. To achieve what I think you want, it might be better to process the file as follows, so that you don't need to know its length in advance:
import sys

def process_regular_line(line):
    print 'regular line:', line

def process_last_line(line):
    print 'last line:', line

with open(sys.argv[1]) as file:
    last_line = file.readline()
    while True:
        this_line = file.readline()
        if not this_line:
            # nothing left to read, so last_line really is the last one
            process_last_line(last_line)
            break
        process_regular_line(last_line)
        last_line = this_line
For example, on the test file with 5 lines:
a line
another line

a line after a blank line
The last ever line
You get:
regular line: a line
regular line: another line
regular line:
regular line: a line after a blank line
last line: The last ever line
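The same one-line lookahead can be wrapped in a small generator so the loop body stays a plain for loop. This is a sketch in Python 3 rather than the Python 2 used above, and the name `mark_last` is invented for illustration:

```python
def mark_last(iterable):
    """Yield (item, is_last) pairs without knowing the length in advance."""
    iterator = iter(iterable)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    for current in iterator:
        # Another item exists, so `previous` cannot be the last one.
        yield previous, False
        previous = current
    yield previous, True
```

It could be used as `for line, is_last in mark_last(file): ...`. Blank lines are ordinary items here (they are "\n", not an empty string), so they do not end the loop.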
I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt','w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from original file has \n (new line) character at the end, so do not add another one (right now you are adding one, which means you actually introduce empty lines).
My guess is that the variable "line" already has a newline in it, but you're writing an additional newline with the g.write('%s\n' % line) call.
line has a newline at the end.
Remove the \n from your write, or rstrip line.
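Both options mentioned in these answers can be shown side by side. This is a small sketch working on lists of lines rather than open files; the function names `clean_lines` and `clean_lines_stripped` are made up for illustration:

```python
def clean_lines(lines, prefix="W"):
    """Option 1: keep each line as read, newline included."""
    return [line for line in lines if not line.startswith(prefix)]


def clean_lines_stripped(lines, prefix="W"):
    """Option 2: strip the newline first, then add exactly one back."""
    return ["%s\n" % line.rstrip("\n")
            for line in lines if not line.startswith(prefix)]
```

The two differ only when the final line of the input has no trailing newline: the second variant adds one, the first leaves the line as it was.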
I am trying to write Jython code for deleting spaces from a text file. I have the following scenario.
I have a text file like
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Here all lines are single lines.
But what I want is:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
In short, I want to start checking from the second line until I encounter END, and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried this:
srcfile=open('d:/BUR001.txt','r')
trgtfile=open('d:/BUR002.txt','w')
readfile=srcfile.readline()
while readfile:
    trgtfile.write(readfile.replace('\s',''))
    readfile=srcfile.readline()
srcfile.close()
trgtfile.close()
Thanks,
Mahesh
You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ','')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that the strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of the line.
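If only the trailing spaces should go, as the question asks, `rstrip` is closer to the intent than `replace`. This is a sketch assuming the START and END lines are to be left untouched, as in the code above; the function name `strip_trailing_spaces` is hypothetical:

```python
def strip_trailing_spaces(lines):
    """Yield lines with trailing spaces removed, except START/END lines."""
    for line in lines:
        if line.startswith(("START", "END")):
            yield line  # header and trailer records are passed through as-is
        else:
            # Drop trailing spaces and the line ending, then restore one "\n".
            yield line.rstrip(" \r\n") + "\n"
```

Spaces inside a line, such as the one separating the two fields, are kept. The result can be written out with something like `trgtfile.writelines(strip_trailing_spaces(srcfile))`.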
If I understood your example, all you want is to remove empty lines. In that case, instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')