Find lines with a phrase and print another section of the line - python

I am trying to search through a long text file to locate sections where a phrase is located and then print the phrase in one column and the corresponding data in another in a new text file.
The phrase I am looking for is "Initialize All". The text file will have thousands of lines; the ones I am looking for will look something like this:
14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start
This is what I have so far. I am still trying to print three separate columns: Initialize All, Date, Time
with open('Result.txt', 'w') as wFile:
    with open('Log.txt', 'r') as f:
        for line in f:
            if 'Initialize All' in line:
                date, time = line.split(" ",2)[:2]
                wFile.write(date)

with open('file.txt', 'r') as f:
    for line in f:
        if 'Initialize All' in line:
            # do stuff with line
            pass

You can use a regex:
import re
lines = open('file.txt', 'r').readlines()
# collect the "date time" prefix of every line that contains the phrase
stamps = [re.search(r'\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}', line).group(0) for line in lines if 'Initialize All' in line]

s = "14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start"
if "Initialize All" in s:  # check for substring
    date, time = s.split(" ", 2)[:2]  # split on spaces and keep the first two elements
    print(date, time)
14-09-23 13:47:46.053
The 2 in s.split(" ", 2) sets maxsplit to 2, so the string is split only twice rather than all the way through. s.split()[:2] also works, since split() splits on whitespace by default, but as we only want the first two substrings there is no point in splitting the whole string.
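Putting the pieces together, here is a minimal sketch of the three-column output the question asks for. The names extract_columns and write_columns and the tab separator are my own choices for illustration, not something from the question, and it assumes the date and time are always the first two space-separated fields.

```python
def extract_columns(lines, phrase="Initialize All"):
    """Return (phrase, date, time) tuples for every line containing phrase."""
    rows = []
    for line in lines:
        if phrase in line:
            # The first two space-separated fields are the date and the time.
            date, time = line.split(" ", 2)[:2]
            rows.append((phrase, date, time))
    return rows


def write_columns(rows, path):
    """Write one tab-separated row per match (separator is an assumption)."""
    with open(path, "w") as out:
        for row in rows:
            out.write("\t".join(row) + "\n")


sample = [
    "14-09-23 13:47:46.053 -07 000000027 INF: Initialize All start\n",
    "14-09-23 13:47:47.100 -07 000000028 INF: Something else\n",
]
rows = extract_columns(sample)
print(rows)
```

For the real files you would probably open Log.txt in a with block, pass the file object to extract_columns, and hand the result to write_columns with 'Result.txt' as the path.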

Related

Python: How to remove part of a string starting at a keyword for multiple lines?

Here is my code:
with open('locations.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        fo.write(line.replace('test'[:-1], ''))
I have a file with multiple lines of text:
This is a test the cat jumped around
This is another test the dog jumped under
This is a third test the cow jumped over
I want to be able to open the text file, and remove everything on each line after the word 'test'. So result would look like:
This is a test
This is another test
This is a third test
I am trying to use .replace(), but with the parameter -1 it just removed everything but the last letter of 'test'. I am not sure how to take the word 'test' as the input and then remove the rest of the string after it on each line.
Use a regex to find where "test" first appears in your string:
import re

with open('locations.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        index = re.search("test", line).span()[1]
        fo.write(line[:index] + '\n')  # the slice drops the newline, so add it back
Here's a breakdown:
re.search("test",line) searches for "test" in line
re.search("test",line).span() returns a tuple with the starting position and the ending position of what you wanted to find ("test")
re.search("test",line).span()[1] gives you the ending position of the word "test" in the line
finally line[:index] gives you a slice of line up until the ending position of where it found "test"
You really don't need regex if you know that 'test' appears in every line. Just slice the string at the index where test begins plus the length of test:
with open('locations.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        # the slice ends right after 'test', so the newline has to be added back
        fo.write(line[:line.index('test') + len('test')] + '\n')
Have a look at split(). .split(separator, maxsplit) slices the string at the keyword and returns the pieces as a list.
Set maxsplit to 1 if there are multiple occurrences of the keyword but you only need the first one.
with open('locations.txt', 'r') as f, open('output.txt', 'w') as fo:
    for line in f:
        new_string = line.split('test', 1)[0] + "test"  # split removes the separator keyword
        fo.write(new_string + '\n')  # the newline was in the discarded part
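All three answers above assume 'test' is present on every line. As a hedged alternative, str.partition returns an empty separator when the keyword is missing, so such lines can pass through unchanged. The helper name truncate_after is made up for this sketch.

```python
def truncate_after(line, keyword="test"):
    """Keep everything up to and including the first keyword; drop the rest."""
    head, sep, _tail = line.rstrip("\n").partition(keyword)
    # sep is '' when the keyword is absent, so the whole line is returned as-is.
    return head + sep


lines = [
    "This is a test the cat jumped around\n",
    "This is another test the dog jumped under\n",
    "A line without the keyword\n",
]
for line in lines:
    print(truncate_after(line))
```

When writing to a file, remember to append '\n' to each result, since the function returns the line without its newline.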

Split and print the word before and after the \ of *n of lines, from a txt to two different txt's

I searched around a bit, but I couldn't find a solution that fits my needs.
I'm new to python, so I'm sorry if what I'm asking is pretty obvious.
I have a .txt file (for simplicity I will call it inputfile.txt) with a list of names of folder\files like this:
camisos\CROWDER_IMAG_1.mov
camisos\KS_HIGHENERGY.mov
camisos\KS_LOWENERGY.mov
What I need is to split the first word (the one before the \) and write it to a txt file (for simplicity I will call it outputfile.txt).
Then take the second (the one after the \) and write it in another txt file.
This is what I did so far:
with open("inputfile.txt", "r") as f:
    lines = f.readlines()
with open("outputfile.txt", "w") as new_f:
    for line in lines:
        text = input()
        print(text.split()[0])
This in my mind should print only the first word in the new txt, but I only got an empty txt file without any error.
Any advice is much appreciated, thanks in advance for any help you could give me.
You can read the file in a list of strings and split each string to create 2 separate lists.
with open("inputfile.txt", "r") as f:
    lines = f.readlines()
X = []
Y = []
for line in lines:
    X.append(line.split('\\')[0] + '\n')
    Y.append(line.split('\\')[1])
with open("outputfile1.txt", "w") as f1:
    f1.writelines(X)
with open("outputfile2.txt", "w") as f2:
    f2.writelines(Y)
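The answer above raises an IndexError on any line that has no backslash (a blank line at the end of the file, for instance). A possible variation, assuming such lines should simply be skipped, uses str.partition. The function name split_paths is only illustrative.

```python
def split_paths(lines):
    """Split 'folder\\file' lines into two lists, skipping lines with no backslash."""
    folders, files = [], []
    for line in lines:
        folder, sep, name = line.rstrip("\n").partition("\\")
        if not sep:
            continue  # no backslash on this line (assumption: skip it)
        folders.append(folder)
        files.append(name)
    return folders, files


sample = [
    "camisos\\CROWDER_IMAG_1.mov\n",
    "camisos\\KS_HIGHENERGY.mov\n",
    "\n",
]
folders, files = split_paths(sample)
print(folders)
print(files)
```

Each list can then be written out with "\n".join(...) to its own output file.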

regex replace text in line with multiple tags if the word is in the external txt file in python

I have many lines like this one (please scroll to read the whole line):
<br>  <font size="4">•</font>  3 Point Updated<br>  <font size="4">•</font>  Shape Removed 4<br>  <font size="4">•</font>  Point 3 Added<br>
From every line, I need to remove each <br>...<br> segment (like the ones containing Point) whose text contains a word listed in an external txt file (for example, the word "Point").
My code is now:
import re

with open('input.txt') as input:
    lines = input.readlines()
with open('output.txt', "w") as output:
    for line in lines:
        if "Point" in line:
            output.write(re.sub('(<br>  <font size="4">•</font> .*?)Point(.*?<br>)', '<br>', line, flags=re.DOTALL))
        else:
            output.write(line)
When I use this code, it only deletes the first match if it finds "Point" in the line, and leaves:
<br>  <font size="4">•</font>  Shape Removed 4<br>  <font size="4">•</font>  Point 3 Added<br>
How can I make it replace multiple instances of the Point word between tags?
A second question: right now I only use if "Point" in line, but it would be great if it searched for words loaded from the external txt file.
Thanks for help!
I believe this is the result you want, let me know if it needs to be modified:
bad_words = []
with open('bad_words.txt', 'r') as f:
    for line in f:
        bad_words.append(line.rstrip())
with open('input.txt', 'r') as f:
    with open('output.txt', 'w') as output:
        for line in f:
            kept_parts = []
            # strip the newline first so it is not kept as a chunk and written twice
            for chunk in line.rstrip("\n").split("<br>"):
                if all(bad_word not in chunk for bad_word in bad_words):
                    kept_parts.append(chunk)
            line = "<br>".join(kept_parts)
            output.write(line + "\n")
Result:
<br>  <font size="4">•</font>  Shape Removed 4<br>
In essence, you don't need regex. Just split each line into chunks (<br> marks a chunk boundary), ignore any chunks that contain the undesired text, and rejoin the resulting list.
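The split/filter/rejoin idea can also be wrapped in a small function, which makes it easy to try on a single line. This is only a sketch: the name drop_chunks is hypothetical, the sample line is shortened, and skipping empty entries in the word list is my own assumption (an empty string is contained in every chunk, so a blank line in the word file would otherwise delete everything).

```python
def drop_chunks(line, bad_words, sep="<br>"):
    """Remove every sep-delimited chunk that contains any of bad_words."""
    words = [w for w in bad_words if w]  # ignore empty entries (assumption)
    chunks = line.rstrip("\n").split(sep)
    kept = [c for c in chunks if not any(w in c for w in words)]
    return sep.join(kept)


line = "<br>a 3 Point Updated<br>b Shape Removed 4<br>c Point 3 Added<br>\n"
print(drop_chunks(line, ["Point", ""]))
```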

extracting certain strings from a file using python

I have a file with some lines. Out of those lines I will choose only the lines which start with xxx. The lines which start with xxx have the following pattern:
xxx:(12:"pqrs",223,"rst",-90)
xxx:(23:"abc",111,"def",-80)
I want to extract only the strings inside the first pair of double quotes,
i.e., "pqrs" and "abc".
Any help using regex is appreciated.
My code is as follows:
with open("log.txt","r") as f:
         f = f.readlines()
    for line in f:
        line=line.rstrip()
        for phrase in 'xxx:':
            if re.match('^xxx:',line):
                c=line
                break
This code gives me an error.
Your code is wrongly indented. Your f = f.readlines() has 9 spaces in front while for line in f: has 4 spaces. It should look like below.
import re
list_of_prefixes = ["xxx","aaa"]
resulting_list = []
with open("raw.txt","r") as f:
    f = f.readlines()
    for line in f:
        line=line.rstrip()
        for phrase in list_of_prefixes:
            # capture the word inside the first pair of double quotes
            m = re.match(phrase + r':\(\d+:"(\w+)', line)
            if m is not None:
                resulting_list.append(m.group(1))
Well you are heading in the right direction.
If the input is this simple, you can use regex groups.
import re

with open("log.txt","r") as f:
    f = f.readlines()
    for line in f:
        line=line.rstrip()
        m = re.match(r'^xxx:\(\d*:("[^"]*")',line)
        if m is not None:
            print(m.group(1))
All the magic is in the regular expression.
^xxx:\(\d*:("[^"]*") means
Start from the beginning of the line, match on "xxx:(<any number of digits>:"<anything but ">"
and because the sequence "<anything but ">" is enclosed in round brackets it will be available as a group (by calling m.group(1)).
PS: next time make sure to include the exact error you are getting.
results = []
with open("log.txt","r") as f:
    f = f.readlines()
    for line in f:
        if line.startswith("xxx"):
            parts = line.split(":")  # parts[2] is what follows the second :
            result = parts[2].split(",")[0][1:-1]  # will be pqrs
            results.append(result)
You want to look for lines that start with xxx,
then split the line on the :. There are two colons before the quoted string, so the piece after the second : is what you want, up to the comma. Your result is that string with the quotes removed. There is no need for regex. Python string functions will be fine.
To check if a line starts with xxx do
line.startswith('xxx')
To find the text in first double-quotes do
re.search(r'"(.*?)"', line).group(1)
(as match.group(1) is the first parenthesized subgroup)
So the code will be
import re

with open("file") as f:
    for line in f:
        if line.startswith('xxx'):
            print(re.search(r'"(.*?)"', line).group(1))
re module docs
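For what it's worth, the startswith check and the regex search combine nicely into one function that also tolerates xxx lines with no quoted string at all (re.search returns None there, and calling .group on None would raise). The names FIRST_QUOTED and first_quoted are invented for this sketch.

```python
import re

# Text between the first pair of double quotes on a line.
FIRST_QUOTED = re.compile(r'"([^"]*)"')


def first_quoted(lines, prefix="xxx"):
    """Return the first double-quoted string of every line starting with prefix."""
    found = []
    for line in lines:
        if not line.startswith(prefix):
            continue
        m = FIRST_QUOTED.search(line)
        if m:  # skip prefixed lines that contain no quoted string
            found.append(m.group(1))
    return found


sample = [
    'xxx:(12:"pqrs",223,"rst",-90)\n',
    'yyy:(99:"skip",1,"me",0)\n',
    'xxx:(23:"abc",111,"def",-80)\n',
    'xxx: no quotes here\n',
]
print(first_quoted(sample))
```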

How do I write only certain lines to a file in Python?

I have a file that looks like this(have to put in code box so it resembles file):
text
(starts with parentheses)
	tabbed info
text
(starts with parentheses)
	tabbed info
...repeat
I want to grab only "text" lines from the file(or every fourth line) and copy them to another file. This is the code I have, but it copies everything to the new file:
import sys
def process_file(filename):
    output_file = open("data.txt", 'w')
    input_file = open(filename, "r")
    for line in input_file:
        line = line.strip()
        if not line.startswith("(") or line.startswith(""):
            output_file.write(line)
    output_file.close()
if __name__ == "__main__":
    process_file(sys.argv[1])
Your script copies every line because line.startswith("") is True, no matter what line equals.
You might try using isspace to test whether the line begins with whitespace:
def process_file(filename):
    with open("data.txt", 'w') as output_file:
        with open(filename, "r") as input_file:
            for line in input_file:
                line=line.rstrip()
                # keep the line only if it starts with neither "(" nor whitespace
                if not (line.startswith("(") or line[:1].isspace()):
                    output_file.write(line + "\n")
with open('data.txt','w') as of:
    of.write(''.join(textline
                     for textline in open(filename)
                     if textline[0] not in ' \t('))
To write every fourth line instead, slice the list of lines with [::4]:
with open('data.txt','w') as of:
    # slice the unfiltered lines: indexes 0, 4, 8, ...
    of.write(''.join(list(open(filename))[::4]))
I do not need to rstrip the newlines, since write needs them anyway.
In addition to line.startswith("") always being true, line.strip() removes the leading tab, so the tabbed data gets written as well. Change it to line.rstrip() and use \t to test for a tab. That part of your code should look like:
line = line.rstrip()
if not line.startswith(('(', '\t')):
    #....
In response to your question in the comments:
#edited in response to comments in post
for i, line in enumerate(input_file):
    if i % 4 == 0:
        output_file.write(line)
Try:
if not line.startswith("(") and not line.startswith("\t"):
without calling line.strip() first (it would strip the leading tabs).
So the issue is that (1) you are misusing boolean logic, and (2) every possible line starts with "".
First, the boolean logic:
The way the or operator works is that it returns True if either of its operands is True. The operands are "not line.startswith('(')" and "line.startswith('')". Note that the not only applies to one of the operands. If you want to apply it to the total result of the or expression, you will have to put the whole thing in parentheses.
The second issue is your use of the startswith() method with a zero-length string as an argument. This essentially says "match any string where the first zero characters are nothing". It matches any string you could give it.
See other answers for what you should be doing here.
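To tie the points above together, here is a small sketch of a filter that should behave the way the question wants. It relies on str.startswith accepting a tuple of prefixes; the function name keep_text_lines, the choice to treat a leading space like a leading tab, and the choice to drop blank lines are my assumptions.

```python
def keep_text_lines(lines):
    """Return the lines that start with neither "(" nor whitespace."""
    kept = []
    for line in lines:
        if not line.strip():
            continue  # drop blank lines (assumption)
        # Test the raw line: stripping first would remove the leading tab.
        if not line.startswith(("(", "\t", " ")):
            kept.append(line)
    return kept


sample = [
    "text one\n",
    "(starts with parentheses)\n",
    "\ttabbed info\n",
    "text two\n",
    "(starts with parentheses)\n",
    "\ttabbed info\n",
]
print(keep_text_lines(sample))

# Every string starts with the empty string, which is why the original test always passed.
print("anything".startswith(""))
```

The kept lines still carry their newlines, so they can go straight to output_file.writelines(...).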
