I am using Visual Studio Code to replace text with Python.
I am using a source file with original text and converting it into a new file with new text.
I would like to add quotes to the new text that follows. For example:
Original text: set vlans xxx vlan-id xxx
New text: vlan xxx name "xxx" (add quotes to the remaining portion of the line as seen here)
Here is my code:
with open("SanitizedFinal_E4300.txt", "rt") as fin:
    with open("output6.txt", "wt") as fout:
        for line in fin:
            line = line.replace('set vlans', 'vlan').replace('vlan-id', 'name')
            fout.write(line)
Is there a way to add quotes for text in the line that follows 'name'?
Edit:
I tried this code:
with open("SanitizedFinal_E4300.txt", "rt") as fin:
    with open("output6.txt", "wt") as fout:
        for line in fin:
            line = line.replace('set vlans', 'vlan').replace('vlan-id', 'name')
            words = line.split()
            words[-1] = '"' + words[-1] + '"'
            line = ' '.join(words)
            fout.write(line)
and received this error:
  line 124, in <module>
    words[-1] = '"' + words[-1] + '"'
IndexError: list index out of range
I also tried this code with no success:
with open("SanitizedFinal_E4300.txt", "rt") as fin:
    with open("output6.txt", "wt") as fout:
        for line in fin:
            line = line.replace('set vlans', 'vlan').replace('vlan-id', 'name')
            import re
            t = 'set vlans xxx vlan-id xxx'
            re.sub(r'set vlans(.*)vlan-id (.*)', r'vlan\1name "\2"', t)
            'vlan xxx name "xxx"'
Again, my goal is to automatically add double quotes to the characters (vlan numbers) at the end of a line.
For example:
Original text: set protocols mstp configuration-name Building 2021.Rm402.access.mstp.zzz
Desired text: set protocols mstp configuration-name "Building 2021.Rm402.access.mstp.zzz"
Use the following regular expression:
>>> import re
>>> t = 'set vlans xxx vlan-id xxx'
>>> re.sub(r'set vlans(.*)vlan-id (.*)', r'vlan\1name "\2"', t)
'vlan xxx name "xxx"'
The parentheses in the search pattern (first parameter) are used to create groups that can be used in the replacement pattern (second parameter). So the first (.*) match in the search pattern will be included in the replacement pattern by means of \1; same thing goes with the second one.
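For instance, running the substitution on the sample line from the question shows both groups being carried over:

```python
import re

# \1 receives everything between 'set vlans' and 'vlan-id' (here: ' xxx '),
# \2 receives everything after 'vlan-id ' (here: 'xxx'), which gets quoted.
line = 'set vlans xxx vlan-id xxx'
result = re.sub(r'set vlans(.*)vlan-id (.*)', r'vlan\1name "\2"', line)
print(result)  # vlan xxx name "xxx"
```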
Edit:
The code I shared is just an example of how to use regular expressions. Here's how you should use it.
import re
# whatever imports and code you have down to...
with open("SanitizedFinal_E4300.txt", "rt") as fin, open("output6.txt", "wt") as fout:
    for line in fin:
        line = re.sub(r'set vlans(.*)vlan-id (.*)', r'vlan\1name "\2"', line)
        fout.write(line)
IMPORTANT: if the format of the lines you need to modify is any different from the original text example you shared, you'll need to make adjustments to the regular expression.
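For example, the configuration-name line from the question would need its own pattern. One possible adjustment (this pattern is an illustration, not the only way to write it) that quotes everything after "configuration-name ":

```python
import re

# Group 1 keeps the 'configuration-name ' prefix, group 2 captures the
# remainder of the line, which gets wrapped in double quotes.
line = 'set protocols mstp configuration-name Building 2021.Rm402.access.mstp.zzz'
result = re.sub(r'(configuration-name )(.*)', r'\1"\2"', line)
print(result)
# set protocols mstp configuration-name "Building 2021.Rm402.access.mstp.zzz"
```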
First, we split the text up into words by splitting them by whitespace (which is what split does by default).
Then, we take the last word, add quotes to it, and join it back together with a space between each word:
with open("SanitizedFinal_E4300.txt", "rt") as fin:
    with open("output6.txt", "wt") as fout:
        for line in fin:
            line = line.replace('set vlans', 'vlan').replace('vlan-id', 'name')
            words = line.split()
            # print(words)  # ['vlan', 'xxx', 'name', 'xxx']
            if words:  # if the line is empty, just output the empty line
                words[-1] = '"' + words[-1] + '"'
            line = ' '.join(words)
            # print(line)  # vlan xxx name "xxx"
            fout.write(line + '\n')  # split() drops the newline, so add it back
WARNING: in your question, you say you'd like the output to be vlan xxx name "xxx" which has two spaces after the first xxx. This result would only have one space between each word.
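A minimal sketch of that difference, using a made-up line with a double space:

```python
import re

line = 'vlan xxx  name xxx'  # note the two spaces after the first xxx

# split()/join() collapses every run of whitespace down to a single space:
words = line.split()
words[-1] = '"' + words[-1] + '"'
joined = ' '.join(words)
print(joined)   # vlan xxx name "xxx"

# re.sub only rewrites the matched part, so the double space survives:
subbed = re.sub(r'name (\S+)$', r'name "\1"', line)
print(subbed)   # vlan xxx  name "xxx"
```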
Related
I'm completely new to Python and I'm currently trying to write a small script for finding and replacing different inputs in all files of the same type in a folder. It currently looks like this and works so far:
import glob
import os

z = input('filetype? (i.e. *.txt): ')
x = input('search for?: ')
y = input('replace with?: ')
replacements = {x: y}
lines = []
for filename in glob.glob(z):
    with open(filename, 'r') as infile:
        for line in infile:
            for src, target in replacements.items():
                line = line.replace(src, target)
            lines.append(line)
    with open(filename, 'w') as outfile:
        for line in lines:
            outfile.write(line)
    lines = []
My problem is that I want to replace full words only, so when trying to replace '0' with 'x', '4025' should not become '4x25'. I've tried to incorporate regex, but I couldn't make it work.
Instead of
line.replace(src, target)
try
re.sub(r'\b' + re.escape(src) + r'\b', target, line)
after importing re. \b matches a word boundary.
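A quick sketch with the example from the question, replacing '0' with 'x':

```python
import re

src, target = '0', 'x'
line = '0 4025 0'
# \b only matches at a word boundary, so the '0' inside '4025' is left
# alone: it has digits on both sides, hence no boundary around it.
result = re.sub(r'\b' + re.escape(src) + r'\b', target, line)
print(result)  # x 4025 x
```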
Just use regex (re) and add the ^ and $ anchors to your expression:
line = re.sub('^' + re.escape(src) + '$', target, line)
Don't forget to import re. Note that with these anchors the pattern only substitutes when the entire line consists of src.
I am trying to filter and extract one word from a line.
The patterns look like: GR.C.24 GRCACH GRALLDKD GR_3AD etc.
Input will be: the data is GRCACH got from server.
Output: GRCACH
Problem: the pattern starts with GR<can be any thing> and ends when whitespace is encountered.
I am able to find the pattern but not able to end it when a space is encountered.
code is:
import re

fp_data = []
with open("output", "r") as fp:
    fp_data = fp.readlines()
for da in fp_data:
    match = re.search(r"\sGR.*", da)
    print(da)
    if match:
        print(dir(match))
        print(match.group())
Output: GRCACH got from server
Expected: GRCACH (or a possible word starting with GR)
Use:
(?:\s|^)(GR\S*)
(?:\s|^) matches whitespace or start of string
(GR\S*) matches GR followed by 0 or more non-whitespace characters and places match in Group 1
No need to read the entire file into memory (what if the file were very large?). You can iterate the file line by line.
import re

with open("output", "r") as fp:
    for line in fp:
        matches = re.findall(r"(?:\s|^)(GR\S*)", line)
        print(line, matches)
The readlines() method leaves the trailing newline character "\n", so I used a list comprehension to delete this character using the rstrip() method, and the isspace() method to skip empty lines.
import re

fp_data = []
with open("output", "r") as fp:
    fp_data = [line.rstrip() for line in fp if not line.isspace()]
for line in fp_data:
    match = re.search(r"\sGR.*", line)
    print(line)
    if match:
        print(match)
        print(match.group())
Not sure if I understood your question and your edit after my question about the desired output correctly, but assuming that you want to list all occurrences of words that start with GR, here is a suggestion:
import re

fp_data = []
with open("output", "r") as fp:
    fp_data = fp.readlines()
for da in fp_data:
    print(da)
    match = re.findall(r'\b(GR\S*)\b', da)
    if match:
        print(match)
The usage of word boundaries (\b) has the benefit of matching at the beginning and the end of the line as well.
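A short check on a made-up line that both starts and ends with a GR-word:

```python
import re

# \b also matches at the very start and end of the string, so GR-words in
# those positions are found as well.
matches = re.findall(r'\b(GR\S*)\b', 'GRCACH got from GR_3AD')
print(matches)  # ['GRCACH', 'GR_3AD']
```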
I have a file with some lines. Out of those lines I will choose only lines which start with xxx. The lines that start with xxx have the following pattern:
xxx:(12:"pqrs",223,"rst",-90)
xxx:(23:"abc",111,"def",-80)
I want to extract only the strings that are in the first double quotes,
i.e., "pqrs" and "abc".
Any help using regex is appreciated.
My code is as follows:
with open("log.txt","r") as f:
         f = f.readlines()
    for line in f:
        line=line.rstrip()
        for phrase in 'xxx:':
            if re.match('^xxx:',line):
                c=line
                break
This code is giving me an error.
Your code is wrongly indented. Your f = f.readlines() has 9 spaces in front, while for line in f: has 4 spaces. It should look like below.
import re

list_of_prefixes = ["xxx", "aaa"]
resulting_list = []
with open("raw.txt", "r") as f:
    f = f.readlines()
    for line in f:
        line = line.rstrip()
        for phrase in list_of_prefixes:
            if re.match(phrase + r':\(\d+:\"(\w+)', line) != None:
                resulting_list.append(re.findall(phrase + r':\(\d+:\"(\w+)', line)[0])
Well you are heading in the right direction.
If the input is this simple, you can use regex groups.
import re

with open("log.txt", "r") as f:
    f = f.readlines()
    for line in f:
        line = line.rstrip()
        m = re.match(r'^xxx:\(\d*:("[^"]*")', line)
        if m is not None:
            print(m.group(1))
All the magic is in the regular expression.
^xxx:\(\d*:("[^"]*") means:
start from the beginning of the line, match on "xxx:(<any number of digits>:", then on "<anything but ">" enclosed in double quotes,
and because that quoted sequence is enclosed in round brackets it will be available as a group (by calling m.group(1)).
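Applied to one of the sample lines, the group captures the quoted word including its quotes:

```python
import re

# Group 1 spans the first double-quoted string after the digits and colon.
m = re.match(r'^xxx:\(\d*:("[^"]*")', 'xxx:(12:"pqrs",223,"rst",-90)')
print(m.group(1))  # "pqrs"
```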
PS: next time make sure to include the exact error you are getting
results = []
with open("log.txt", "r") as f:
    f = f.readlines()
    for line in f:
        if line.startswith("xxx"):
            line = line.split(":")  # line[2] will be what follows the second :
            result = line[2].split(",")[0][1:-1]  # will be pqrs
            results.append(result)
You want to look for lines that start with xxx,
then split the line on the :. What follows the second : starts with the quoted word -- take it up to the first comma. Then your result is that string with the surrounding quotes removed. There is no need for regex. Python string functions will be fine.
To check if a line starts with xxx do
line.startswith('xxx')
To find the text in first double-quotes do
re.search(r'"(.*?)"', line).group(1)
(as match.group(1) is the first parenthesized subgroup)
So the code will be
import re

with open("file") as f:
    for line in f:
        if line.startswith('xxx'):
            print(re.search(r'"(.*?)"', line).group(1))
re module docs
I have this tsv file containing some paths of links; each link is separated by a ';'. In the example below we can see that the text in the file is separated into columns,
and I only want to read the last column, which is a path starting with '14th':
6a3701d319fc3754 1297740409 166 14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade NULL
3824310e536af032 1344753412 88 14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade 3
415612e93584d30e 1349298640 138 14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;Atlantic_slave_trade;African_slave_trade
I want to somehow split the path into a chain like this:
['14th_century', 'Niger', 'Nigeria', ...]
How do I read the file and remove the first 3 columns so I only get the last one?
UPDATE:
i have tried this now:
import re

with open('test.tsv') as f:
    lines = f.readlines()
    for line in lines[22:len(lines)]:
        re.sub(r"^\s+", " ", line, flags=re.MULTILINE)
        e_line = line.split(' ')
        real_line = e_line[0]
        print(real_line.split(';'))
But the problem is that it is not deleting the first 3 columns?
If the separator between fields is only a single space and not a series of spaces or a tab, you could do this:
with open('file_name') as f:
    lines = f.readlines()
    for line in lines:
        e_line = line.split(' ')
        real_line = e_line[3]
        print(real_line.split(';'))
Answer to your updated question.
But the problem is that it is not deleting the first 3 columns?
There are several mistakes.
Your code:
import re

with open('test.tsv') as f:
    lines = f.readlines()
    for line in lines[22:len(lines)]:
        re.sub(r"^\s+", " ", line, flags=re.MULTILINE)
        e_line = line.split(' ')
        real_line = e_line[0]
        print(real_line.split(';'))
This line does nothing...
re.sub(r"^\s+", " ", line, flags=re.MULTILINE)
because the re.sub function doesn't change your line variable; it returns the replaced string.
So you may want to do this instead:
line = re.sub(r"^\s+", " ", line, flags=re.MULTILINE)
And your regexp ^\s+ matches only strings that start with whitespace or tabs, because you use ^.
But I think you just want to replace consecutive whitespace or tabs with one space.
So the above code becomes (just remove the ^ in the regexp):
line = re.sub(r"\s+", " ", line, flags=re.MULTILINE)
Now the fields in line are separated by just one space, so line.split(' ') will work as you want.
Next, e_line[0] returns the first element of e_line, which is the 1st column of the line.
But you want to skip the first 3 columns and get the 4th column. You can do it like this:
e_line = line.split(' ')
real_line = e_line[3]
OK. Now the entire code looks like this:
for line in lines:  # <-- I also changed here because there is no need to skip the first 22 lines in your example.
    line = re.sub(r"\s+", " ", line)
    e_line = line.split(' ')
    real_line = e_line[3]
    print(real_line)
output:
14th_century;15th_century;16th_century;Pacific_Ocean;Atlantic_Ocean;Accra;Africa;Atlantic_slave_trade;African_slave_trade
14th_century;Europe;Africa;Atlantic_slave_trade;African_slave_trade
14th_century;Niger;Nigeria;British_Empire;Slavery;Africa;Atlantic_slave_trade;African_slave_trade
P.S.:
This line can become more pythonic.
before:
for line in lines[22:len(lines)]:
after:
for line in lines[22:]:
And you don't need to use flags=re.MULTILINE, because line is a single line inside the for-loop.
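As a further aside, str.split() with no argument already splits on any run of whitespace (spaces or tabs), so the normalization step with re.sub could be dropped entirely; a sketch on a shortened version of the first sample line:

```python
line = '6a3701d319fc3754\t1297740409\t166\t14th_century;Europe;Africa NULL'
# split() with no argument splits on tabs and runs of spaces in one go.
real_line = line.split()[3]
print(real_line.split(';'))  # ['14th_century', 'Europe', 'Africa']
```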
You don't need to use regex for this. The csv module can handle tab-separated files too:
import csv

with open('test.tsv', newline='') as f:
    filereader = csv.reader(f, delimiter='\t')
    path_list = [row[3].split(';') for row in filereader]
print(path_list)
I need to use strings from one text file to search another. Every time a string matches a line in the second text file, I search that line for a specific word, and if that matches too, I write specific columns from the second text file to a third text file; this repeats for every string in the first text file.
Example
Text file 1:
10.2.1.1
10.2.1.2
10.2.1.3
Text file 2:
IP=10.2.1.4 word=apple thing=car name=joe
IP=10.2.1.3 word=apple thing=car name=joe
IP=10.2.1.1 word=apple thing=car name=joe
IP=10.2.1.2 word=apple thing=car name=joe
IP=10.2.1.1 word=apple thing=car name=joe
IP=10.2.1.3 word=apple thing=car name=joe
Result should be three separate text files (named for their string in text file one), one for each string containing the third column:
Result: 10.2.1.3.txt
thing=car
thing=car
etc.
So far my code looks like:
with open(file_1) as list_file:
    for string in (line.strip() for line in list_file):
        if string in file_2:
            if "word" in file_2:
                column2 = line.split()[2]
                x = open(line + ".txt", "a")
                with x as new_file:
                    new_file.write(column2)
My question is: Is this code the best way to do it? I feel as though there's an important 'shortcut' I'm missing.
Final Code (with Olafur Osvaldsson):
for line_1 in [l.rstrip() for l in open(file_1)]:
    with open(line_1 + '.txt', 'a') as my_file:
        for line_2 in open(file_2):
            line_2_split = line_2.split(' ')
            if "word" in line_2:
                if "word 2" in line_2:
                    my_file.write(line_2_split[2] + '\n')
The following code I believe does what you ask:
file_1 = 'file1.txt'
file_2 = 'file2.txt'
my_string = 'word'

for line_1 in [l.rstrip() for l in open(file_1)]:
    with open(line_1 + '.txt', 'a') as my_file:
        for line_2 in open(file_2):
            line_2_split = line_2.split(' ')
            if line_1 == line_2_split[0][3:]:
                if my_string in line_2:
                    my_file.write(line_2_split[2] + '\n')
If you intend on using the last parameter in the lines from file_2, make sure you strip the newline from the end, as is done for the first file with rstrip(); I left it in for the lines from file_2.
Here's an example, with input files in file1.txt and file2.txt. I cache the contents of file 1, and their associated output file handles, in the dictionary files, which I then close at the end after the main loop.
In the main loop, I read each line of file2.txt, strip it, and tokenize it on spaces using the split method. I then find the IP address from the first token and check if it's in files. If so, I write the third column to the respective output file.
The last loop closes the output file handles.
with open('file1.txt') as file1:
    files = {ip: open(ip + '.txt', 'w') for ip in [line.strip() for line in file1]}

with open('file2.txt') as file2:
    for line in file2:
        tokens = line.strip().split(' ')
        ip = tokens[0][3:]
        if ip in files:
            files[ip].write(tokens[2])
            files[ip].write('\r\n')

for f in files.values():
    f.close()
# define files
file1 = "file1.txt"
file2 = "file2.txt"

ip_patterns = set()  # I assume that all patterns fit in memory

# filling ip_patterns
with open(file1) as fp:
    for line in fp:
        ip_patterns.add(line.strip())  # adding pattern to the set

word_to_match = "apple"  # pattern for the "word" field
wanted_fields = ['name', 'thing']  # fields to write

with open(file2) as fp:
    for line in fp:
        values = dict(map(lambda x: x.split('='), line.split()))
        if values['IP'] in ip_patterns and values['word'] == word_to_match:
            out = open(values['IP'] + '.txt', 'a')
            for k in wanted_fields:
                out.write("%s=%s\n" % (k, values[k]))  # writing to file
            out.close()
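The dict(map(...)) line is what turns each key=value pair of a line into a dictionary entry, making the field lookups possible; for instance, on one of the sample lines:

```python
line = 'IP=10.2.1.3 word=apple thing=car name=joe'
# Each whitespace-separated token is split on '=' into a (key, value) pair,
# and dict() collects those pairs into a dictionary.
values = dict(map(lambda x: x.split('='), line.split()))
print(values['IP'], values['word'], values['thing'])  # 10.2.1.3 apple car
```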