My file has something like this
#email = "abc";
%area = (
"abc" => 10,
"xyz" => 10,
);
Is there a regex I can use to match from the line beginning with %area = ( through the following lines until ); is found? I want to remove those lines from the file.
The regex I tried, ^%area = \(.*|\n\), does not continue matching onto the next line.
So my final file will only have
#email = "abc";
Assuming a file named file contains:
#email = "abc";
%area = (
"abc" => 10,
"xyz" => 10,
);
Would you please try the following:
import re
with open("file") as f:
    s = f.read()
print(re.sub(r'^%area =.*?\);', '', s, flags=(re.DOTALL|re.MULTILINE)))
Output:
#email = "abc";
If you want to clean up the remaining empty lines, please try instead:
print(re.sub(r'\n*^%area =.*?\);\n*', '\n', s, flags=(re.DOTALL|re.MULTILINE))
Then the result looks like:
#email = "abc";
The re.DOTALL flag makes . match any character including a newline.
The re.MULTILINE flag allows ^ and $ to match, respectively,
just after and just before newlines within the string.
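To see what each flag contributes, here is a small sketch in Python 3. The sample text is an assumption that mirrors the file from the question, and the variable names are only illustrative.

```python
import re

# Sample text mirroring the file from the question (assumed content).
text = '#email = "abc";\n%area = (\n"abc" => 10,\n"xyz" => 10,\n);\n'

pattern = r'^%area =.*?\);'

# Without re.DOTALL, "." cannot cross a newline, so the block is never closed.
no_dotall = re.search(pattern, text, flags=re.MULTILINE)

# Without re.MULTILINE, "^" only matches at the very start of the string.
no_multiline = re.search(pattern, text, flags=re.DOTALL)

# With both flags the whole block is matched.
both = re.search(pattern, text, flags=re.DOTALL | re.MULTILINE)

print(no_dotall)      # None
print(no_multiline)   # None
print(both.group(0))  # the full %area block, from "%area" to ");"
```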
[EDIT]
If you want to overwrite the original file, please try:
import re
with open("file") as f:
    s = f.read()
with open("file", "w") as f:
    f.write(re.sub(r'\n*^%area =.*?\);\n*', '\n', s, flags=(re.DOTALL|re.MULTILINE)))
To capture and remove your area group, you can use:
re.sub(r'%area = \((.|\n)*\);', '', string)
#'#email = "abc";\n\n'
However, this leaves the two newlines after your #email line. You could add \n\n to the regex to capture those as well:
re.sub(r'\n\n%area = \((.|\n)*\);', '', string)
#'#email = "abc";'
However, if the email line always has the same form, it would be simpler to search for that line only:
re.search(r'(#email = ).*(?=\n)', string).group()
#'#email = "abc";'
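One caveat worth sketching: (.|\n)* is greedy, so if the file held a second block ending in ); the match would run through to the last one. The input below is hypothetical (the %zone block is invented for illustration); adding ? makes the quantifier stop at the first );.

```python
import re

# Hypothetical input: a second block (%zone) follows %area.
string = '#email = "abc";\n\n%area = (\n"abc" => 10,\n);\n%zone = (\n"x" => 1,\n);\n'

# Greedy: consumes everything up to the LAST ");" in the string.
greedy = re.sub(r'%area = \((.|\n)*\);', '', string)

# Lazy: stops at the FIRST ");" after "%area = (".
lazy = re.sub(r'%area = \((.|\n)*?\);', '', string)

print(repr(greedy))
print(repr(lazy))
```

With the greedy pattern the %zone block is removed as well; with the lazy one it is kept.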
Related
I am working with files right now and I want to get the text inside brackets. This is what I mean by getting text from brackets:
{
this is text for a
this is text for a
this is text for a
this is text for a
}
[
this is text for b
this is text for b
this is text for b
this is text for b
]
The content of a is "this is text for a" and the content of b is "this is text for b".
My code does not print the contents of a properly; it shows both a and b from my file.
My code:
with open('file.txt','r') as read_obj:
    for line in read_obj.readlines():
        var = line[line.find("{")+1:line.rfind("}")]
        print(var)
iterate over the file
for each line check the first character
if the first character is either '[' or '{' start accumulating lines
if the first character is either ']' or '}' stop accumulating lines
a_s = []
b_s = []
capture = False
group = None
with open(path) as f:
    for line in f:
        if line[0] in '}]':
            # closing bracket: stop before it gets appended to the group
            capture = False
            group = None
        elif line[0] in '{[':
            capture = True
            group = a_s if line[0] == '{' else b_s
        elif capture:
            group.append(line)
print(a_s)
print(b_s)
Relies on the file to be structured exactly as shown in the example.
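As a slightly more tolerant sketch of the same idea (Python 3), the version below strips each line before comparing it, so indented brackets and trailing whitespace do not matter. The function name collect_groups and the in-memory sample are assumptions made for illustration; in practice you would pass an open file object instead.

```python
import io

def collect_groups(lines):
    """Return (a_lines, b_lines) gathered from '{...}' and '[...]' blocks."""
    a_s, b_s = [], []
    group = None  # list currently being filled, or None when outside a block
    for line in lines:
        stripped = line.strip()
        if stripped in ('{', '['):
            group = a_s if stripped == '{' else b_s
        elif stripped in ('}', ']'):
            group = None
        elif group is not None:
            group.append(stripped)
    return a_s, b_s

# In-memory stand-in for the file from the question.
sample = io.StringIO(
    "{\nthis is text for a\nthis is text for a\n}\n"
    "[\nthis is text for b\n]\n"
)
a_lines, b_lines = collect_groups(sample)
print(a_lines)
print(b_lines)
```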
This is what regular expressions are made for. Python has a built-in module named re to perform regular expression queries.
In your case, simply:
import re
fname = "foo.txt"
# Read data into a single string
with open(fname, "r") as f:
    data = f.read()
# Remove newline characters from the string
data = re.sub(r"\n", "", data)
# Define pattern for type A
pattern_a = r"\{(.*?)\}"
# Print text of type A
print(re.findall(pattern_a, data))
# Define pattern for type B
pattern_b = r"\[(.*?)\]"
# Print text of type B
print(re.findall(pattern_b, data))
Output:
['this is text for athis is text for athis is text for athis is text for a']
['this is text for bthis is text for bthis is text for bthis is text for b']
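If the individual lines should stay separate rather than being glued together, one alternative is to keep the newlines and let . match them with re.DOTALL. This is a sketch on an in-memory string that stands in for the file contents; it assumes one {...} block and one [...] block.

```python
import re

# Stand-in for the data read from the file (assumed content).
data = (
    "{\nthis is text for a\nthis is text for a\n}\n"
    "[\nthis is text for b\nthis is text for b\n]\n"
)

# re.DOTALL lets "." cross line breaks, so the newlines can stay in place.
block_a = re.search(r"\{(.*?)\}", data, flags=re.DOTALL)
block_b = re.search(r"\[(.*?)\]", data, flags=re.DOTALL)

# Split each captured block back into its lines.
lines_a = block_a.group(1).strip().splitlines()
lines_b = block_b.group(1).strip().splitlines()

print(lines_a)
print(lines_b)
```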
Read the file and split the content to a list.
Define a brackets list and exclude them through a loop and write the rest to a file.
with open("content.txt", 'r') as file_obj:
    content_list = file_obj.read().splitlines()
brackets = ['[', ']', '{', '}']
# Open the output once and append every line that is not a lone bracket
with open("new_content.txt", 'a') as writer:
    for i in content_list:
        if i not in brackets:
            writer.write(i + '\n')
inside = False  # True while we are between an opening and a closing bracket
with open('D:\\Tests 1\\t1.txt', 'r') as f1:
    for line in f1:
        stripped = line.strip()
        if stripped in ('{', '['):
            inside = True
        elif stripped in ('}', ']'):
            inside = False
        elif inside:
            print(stripped)
I have a file with following format:
device={
id=1
tag=10
name=device1
}
device={
id=2
tag=20
name=device2
}
device={
id=3
tag=30
name=device3
}
Let's say I am only interested in the device with id=2 and I want to extract its tag number (the id is configurable and will be set by some other code). How can I do this in Python? So far I have the following:
ID='id=2'
with open("file.txt") as file:
    for line in file:
        if line.strip() == ID:
            #Here I do not know what to write
            # to extract 20
Thanks
With the re.search function:
import re
with open('file.txt', 'r') as f:
    id_num = 'id=2'
    tag_num = re.search(id_num + r'\s+tag=([0-9]+)', f.read())
    print(tag_num.group(1))
The output:
20
f.read() - reads the file contents (as text)
id_num + r'\s+tag=([0-9]+)' - constructs the regex pattern, which becomes id=2\s+tag=([0-9]+), where \s+ is one or more whitespace characters (including newlines) and ([0-9]+) is the 1st captured group containing the tag number
tag_num.group(1) - extracts the value of the 1st captured (parenthesized) group from the match object tag_num
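If more than one field or device is needed, a possible alternative is to parse every device block into a dictionary first. The sketch below is Python 3; parse_devices is a hypothetical helper name, the sample text is a shortened copy of the file from the question, and it assumes every non-empty line inside a block has the form key=value.

```python
import re

# Shortened copy of the file from the question (assumed content).
text = """device={
id=1
tag=10
name=device1
}
device={
id=2
tag=20
name=device2
}
"""

def parse_devices(text):
    """Map each device id (as a string) to a dict of its key=value fields."""
    devices = {}
    for block in re.findall(r'device=\{(.*?)\}', text, flags=re.DOTALL):
        fields = dict(
            line.strip().split('=', 1)
            for line in block.strip().splitlines()
        )
        devices[fields['id']] = fields
    return devices

devices = parse_devices(text)
print(devices['2']['tag'])  # 20
```

Looking up a different id then only changes the dictionary key, not the pattern.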
You can read the next line using file.readline(). Try this code:
ID='id=2'
with open("file.txt") as file:
    while True:
        line = file.readline()
        if line == '':  # end of file
            break
        if line.strip() == ID:
            nextline = file.readline()
            result = nextline.strip().split('=')[1]
            print(result)
            break
with open("") as file:
    #print file.read()
    for line in file:
        #print line.split()
        if line.strip()==ID:
            d=next(file) #reads next line
            print(d.strip().split('=')[1])
            break
I have the following input in a log file and I want to capture the whole of each ID. However, my regex returns only part of the ID rather than all of it:
id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤
id:A2uhasan30hamwix160212145302428
id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١
id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢
id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧
id:A2uhasan30hamwix160207145023750
I used the following regular expression with Python 2.7. I have changed sid to id, going from:
RE_SID = re.compile(r'sid:(<<")?(?P<sid>([A-Za-z0-9._+]*))', re.U)
to
>>> RE_SID = re.compile(ur'id:(<<")?(?P<sid>[A-Za-z\d._+]*)', re.U)
>>> sid = RE_SID.search('id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤').group('sid')
>>> sid
'A2uhasan30hamwix'
and this is my result:
is: A2uhasan30hamwix
After edit:
This is how I am reading the log file:
with open(cfg.log_file) as input_file: ...
fields = line.strip().split(' ')
and an example of line in log:
2015-11-30T23:58:13.760950+00:00 calxxx enexxxxce[10476]: INFO consume_essor: user:<<"ailxxxied">> callee_num:<<"+144442567413">> id:<<"A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧">> credits:0.0 result:ok provider:sipovvvv1.yv.vs
I would appreciate help editing my regular expression.
Based on what we discussed in the chat, posting the solution:
import codecs
import re
RE_SID = re.compile(ur'id:(<<")?(?P<sid>[A-Za-z\d._+]*)', re.U) # \d used to match non-ASCII digits, too
input_file = codecs.open(cfg.log_file, encoding='utf-8') # Read the file with UTF8 encoding
for line in input_file:
    fields = line.strip().split(u' ') # u prefix is important!
    if len(fields) >= 11:
        try:
            # ......
            sid = RE_SID.search(fields[7]).group('sid') # Or check if there is a match first
3 things to fix:
use id instead of sid
use \d instead of 0-9 to also catch the Arabic-Indic digits (with re.U on a unicode pattern in Python 2, \d matches any Unicode decimal digit)
there is no need for an extra capturing group inside the sid named group
Fixed version:
id:(<<")?(?P<sid>[A-Za-z\d_.+]+)
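For comparison, here is a small sketch of the difference under Python 3, where str patterns are Unicode-aware by default and the re.U flag and ur prefix are not needed. The name ascii_only is just an illustrative label for the original 0-9 character class.

```python
import re

# \d on a str pattern matches any Unicode decimal digit in Python 3.
RE_SID = re.compile(r'id:(<<")?(?P<sid>[A-Za-z\d_.+]+)')

# The 0-9 range only covers ASCII digits, so it stops at the first
# Arabic-Indic digit.
ascii_only = re.compile(r'id:(<<")?(?P<sid>[A-Za-z0-9_.+]+)')

line = 'id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤'

full = RE_SID.search(line).group('sid')
partial = ascii_only.search(line).group('sid')

print(full)     # the complete ID, including the Arabic-Indic digits
print(partial)  # A2uhasan30hamwix
```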
string = '''
id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤
id:A2uhasan30hamwix160212145302428
id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١
id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢
id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧
id:A2uhasan30hamwix160207145023750
'''
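import re
reObj = re.compile(r'id:.*')
# findall on a compiled pattern takes (string, pos, endpos); passing re.DOTALL
# there would be read as a start position of 16 and skip the first id
ans = reObj.findall(string)
print(ans)
Output (wrapped here for readability):
['id:A2uhasan30hamwix١٦٠٢٢٧١٣٣٣١١٣٥٤',
'id:A2uhasan30hamwix160212145302428',
'id:A2uhasan30hamwix١٦٠٢٠٩١٣٠١٥٠٠١١',
'id:A2uhasan30hamwix١٦٠٢٠٩١٦٤٧٣٩٧٣٢',
'id:A2uhasan30hamwix١٦٠٢٠٨١٩٢٨٠١٩٠٧',
'id:A2uhasan30hamwix160207145023750']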
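A note on compiled patterns, as a hedged sketch: the findall method of a compiled pattern has the signature findall(string, pos, endpos) and takes no flags argument, so a flag constant passed as the second argument is interpreted as a start position. Flags belong in re.compile instead. The short sample text below is made up for illustration.

```python
import re

# Made-up sample; "id:third" starts at index 19.
text = "id:first\nid:second\nid:third\n"
pattern = re.compile(r'id:.*')

# Correct call: search the whole string.
all_ids = pattern.findall(text)

# re.DOTALL has the integer value 16, so here it acts as pos=16 and the
# search starts in the middle of the second line.
skipped = pattern.findall(text, re.DOTALL)

print(all_ids)
print(skipped)
```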
I want to apply a regex to every line in my txt file.
For example
comments={ts=2010-02-09T04:05:20.777+0000,comment_id=529590|2886|LOL|Baoping Wu|529360}
comments={ts=2010-02-09T04:20:53.281+0000, comment_id=529589|2886|cool|Baoping Wu|529360}
comments={ts=2010-02-09T05:19:19.802+0000,comment_id=529591|2886|ok|Baoping Wu|529360}
My Python Code is:
import re
p = re.compile(ur'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)', re.MULTILINE|re.DOTALL)
#open =
test_str = r"comments={ts=2010-02-09T04:05:20.777+0000, comment_id=529590|2886|LOL|Baoping Wu|529360}"
subst = ur"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"
result = re.sub(p, subst, test_str)
print result
I want to solve it with the help of MULTILINE, but it doesn't work.
Can anyone help me?
The Output for the first line should be
comments={ts=2010-02-09T04:05:20.777+0000, comment_id=529590, user_id = 2886, comment='LOL', user= 'Baoping Wu', post_commented=529360}
My only issue is applying the regex to every line and writing the result to a txt file.
Your regex works without MULTILINE or DOTALL. You can run the replacement over the entire document at once:
import re
with open('file.txt', 'r') as f:
    txt = f.read()
pattern = r'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)'
repl = r"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"
result = re.sub(pattern, repl, txt)
with open('file2.txt', 'w') as f:
    f.write(result)
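To check the replacement without touching any files, the same pattern can be run on an in-memory string. This is a sketch in Python 3 using two of the sample lines from the question; re.subn is used only because it also reports how many replacements were made.

```python
import re

# Two of the sample lines from the question, joined into one string.
sample = (
    "comments={ts=2010-02-09T04:05:20.777+0000,comment_id=529590|2886|LOL|Baoping Wu|529360}\n"
    "comments={ts=2010-02-09T05:19:19.802+0000,comment_id=529591|2886|ok|Baoping Wu|529360}\n"
)

pattern = re.compile(r'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)')
repl = r"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"

# subn returns the new string and the number of substitutions.
result, count = pattern.subn(repl, sample)

print(count)
print(result, end="")
```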
I need to search a bunch of text files which may contain content in this form:
//for nv version
tf = new TextField();
tf.width = 600;
tf.height = 300;
tf.textColor = 0x00ffff;
tf.mouseEnabled = false;
this.addChild(tf);
Sys.__consoleWindowFuncBasic = log;
//nv end
and delete the part between the two marker lines, then save the files.
I split the text into lines and check it line by line, which makes the work very heavy. Is there an easier way to do this?
Check this out
beginMarker = "//for nv version"
endMarker = "//nv end"
include = True
with open('path/to/input') as infile, open('path/to/output', 'w') as outfile:
    for line in infile:
        if include:
            if line.strip() != beginMarker:
                outfile.write(line)
            else:
                include = False
        else:
            if line.strip() == endMarker:
                include = True
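The same marker logic can be sketched as a small function that works on any iterable of lines, which makes it easy to try on a list before pointing it at real files. The name strip_marked_block and the sample lines are assumptions for illustration; both marker lines are dropped along with the text between them.

```python
def strip_marked_block(lines, begin="//for nv version", end="//nv end"):
    """Return the lines that fall outside every begin/end marker pair."""
    kept = []
    include = True
    for line in lines:
        marker = line.strip()
        if include and marker == begin:
            include = False      # start skipping; the marker line is dropped
        elif not include and marker == end:
            include = True       # resume copying after the end marker
        elif include:
            kept.append(line)
    return kept

# Made-up sample standing in for one of the text files.
sample = [
    "some words\n",
    "//for nv version\n",
    "tf = new TextField();\n",
    "//nv end\n",
    "yet some other words\n",
]
cleaned = strip_marked_block(sample)
print("".join(cleaned), end="")
```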
Maybe you can use a regular expression to replace the matched lines with an empty string.
@yuwang provided a sed version; here is a Python version:
>>> import re
>>> s = """some words
... other words
... //for nv version
... tf = new TextField();
... tf.width = 600;
... tf.height = 300;
... tf.textColor = 0x00ffff;
... tf.mouseEnabled = false;
... this.addChild(tf);
... Sys.__consoleWindowFuncBasic = log;
... //nv end
... yet some other words
... """
>>> p = r"//for nv version(.*?)//nv end\n" # reluctant instead of greedy
>>> print(re.sub(p, "", s, flags=re.S)) # use re.S to make `.` match `\n`
some words
other words
yet some other words