Output Print is Slow - python

I am writing a script, and this part of the code is making its printed output slow. I think the nested loop (which uses a dictionary) is causing the issue. Is there an alternative way to make the script print its results without the wait?
Log = open("file.txt")
for LogLine in Log:
    flag = True
    for key, ConfLine in Conf.items():
        for patterns in ConfLine:
            patterns = DateString + patterns
            if re.match(patterns, LogLine):
                flag = False
                break
        if not flag:
            break
    if flag:
        print LogLine.strip()

C Panda's answer is good, but it's not obvious that a single regex full of | is the fastest way to try all the patterns. Test the performance of this alternative:
pats = [re.compile(date_string + pat) for conf in Conf.values() for pat in conf]
with open('file.txt') as log:
    for line in log:
        if not any(pat.match(line) for pat in pats):
            print(line.strip())
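Rather than guessing, a quick timeit harness can settle it on your own data. A minimal sketch; the pattern list and file name here are placeholders, not taken from the question:
import timeit

setup = """
import re
pats = [re.compile(p) for p in [r'^foo', r'^bar', r'^baz']]  # stand-ins for your patterns
master = re.compile('|'.join(p.pattern for p in pats))
lines = open('file.txt').readlines()
"""

print(timeit.timeit("[l for l in lines if not any(p.match(l) for p in pats)]",
                    setup=setup, number=100))
print(timeit.timeit("[l for l in lines if not master.match(l)]",
                    setup=setup, number=100))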
On a side note, here's how your current code could be written with a clean break and no need for flag, using the loop's else clause:
for ConfLine, patterns in ((c, p) for c in Conf.values() for p in c):
    patterns = DateString + patterns
    if re.match(patterns, LogLine):
        break
else:
    print LogLine.strip()
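For anyone unfamiliar with the for ... else used above: the else suite runs only when the loop completes without hitting break. A tiny illustration:
for n in [1, 3, 5]:
    if n % 2 == 0:
        print('even number found')
        break
else:
    print('no even number found')  # printed, because no break fired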

Try the following; it should give you a significant speed-up. Apply the appropriate changes for Python 2.x.
pats = (date_string + pat for conf in Conf.values() for pat in conf)
master_pat = re.compile('|'.join(pats))
with open('file.txt') as log:
    for line in log:
        if not master_pat.match(line):
            print(line.strip())
If I misread the logic and it is not working, please comment.
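One caveat worth adding: if any individual pattern contains a top-level |, a plain '|'.join() changes where the alternation boundaries fall. Wrapping each pattern in a non-capturing group keeps them independent; a small defensive tweak, assuming the same Conf layout as above:
pats = (date_string + pat for conf in Conf.values() for pat in conf)
master_pat = re.compile('|'.join('(?:{})'.format(p) for p in pats))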

Related

python nesting loops

I am trying to perform a nested loop that combines data into one line by matching the MAC addresses found in both files.
I am able to run the loop fine without the regex; however, when using the search regex below, it only loops through the MAC_Lines once, prints the correct results using the first entry in MAC_Lines, and stops. I'm unsure how to make it move on to the next line and repeat the process for all the entries in MAC_Lines.
try:
    for mac in MAC_Lines:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', mac, re.I)
        MAC_address_final = MAC_address.group()
        for arp in ARP_Lines:
            ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', arp, re.I)
            ARP_address_final = ARP_address.group()
            if MAC_address_final == ARP_address_final:
                print mac + arp
                continue
except Exception:
    print 'completed.'
Results:
13,64,00:0c:29:36:9f:02,giga-swx 0/213,172.20.13.70, 00:0c:29:36:9f:02, vlan 64
completed.
I learned that the issue was how I opened the file. I should have used the open ... as keywords when opening both files, so that the files are properly closed and reopened for the next loop. Below is the code I was looking for:
with open('MAC_List.txt', 'r') as read0:
    for items0 in read0:
        MAC_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items0, re.I)
        if MAC_address:
            mac_addy = MAC_address.group().upper()
            with open('ARP_List.txt', 'r') as read1:
                for items1 in read1:
                    ARP_address = re.search(r'([a-fA-F0-9]{2}[:|\-]?){6}', items1, re.I)
                    if ARP_address:
                        arp_addy = ARP_address.group()
                        if mac_addy == arp_addy:
                            print(items0.strip() + ' ' + items1.strip())
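As a side note, re-opening ARP_List.txt once per MAC line re-reads the whole file for every entry. Indexing the ARP list in a dictionary keyed by the normalised address cuts this to a single pass over each file. A minimal sketch under the same file layout; note it uses [:\-] for the separator, since the | inside the original character class is matched literally:
import re

MAC_RE = re.compile(r'([a-fA-F0-9]{2}[:\-]?){6}')

# One pass over the ARP list, indexed by upper-cased MAC
arp_by_mac = {}
with open('ARP_List.txt', 'r') as arp_file:
    for arp_line in arp_file:
        m = MAC_RE.search(arp_line)
        if m:
            arp_by_mac[m.group().upper()] = arp_line.strip()

# One pass over the MAC list, joining against the index
with open('MAC_List.txt', 'r') as mac_file:
    for mac_line in mac_file:
        m = MAC_RE.search(mac_line)
        if m and m.group().upper() in arp_by_mac:
            print(mac_line.strip() + ' ' + arp_by_mac[m.group().upper()])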

Python - Error Caused by Space in argv Argument [duplicate]

I'm a Python learner. If I have lines of text in a file that look like this:
"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Can I split the lines around the inverted commas? The only constant is their position relative to the data they enclose; the data lines could range from 10 to 100+ characters (they'll be nested network folders). I can't see any other markers to split on, but my lack of Python knowledge is making this difficult.
I've tried
optfile=line.split("")
and other variations, but I keep getting ValueError: empty separator. I can see why it's saying that; I just don't know how to change it. Any help is, as always, very appreciated.
Many thanks
You must escape the " character:
input.split("\"")
results in
['\n',
'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
' ',
'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
'\n']
To drop the resulting empty lines:
[line for line in [line.strip() for line in input.split("\"")] if line]
results in
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:
import shlex
with open('somefile') as fin:
    for line in fin:
        print shlex.split(line)
Would give:
['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
No regex, no split, just use csv.reader
import csv

sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'

def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print l
The output is
['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
The shlex module can help you:
import shlex
my_string = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)
This will spit out:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
Reference: https://docs.python.org/2/library/shlex.html
Finding all regular expression matches will do it:
import re

input = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
re.findall('".+?"', input)  # or '"[^"]+"'
This will return the list of file names:
["Y:\DATA\00001\SERVER\DATA.TXT", "V:\DATA2\00002\SERVER2\DATA2.TXT"]
To get the file name without quotes use:
[f[1:-1] for f in re.findall('".+?"', input)]
or use re.finditer:
[f.group(1) for f in re.finditer('"(.+?)"', input)]
The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.
[s for s in line.split('"') if s.strip() != '']
There is no need to use regular expressions, an escape character, an extra module, or to assume a certain number of whitespace characters between the paths.
Test:
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
I think what you want is to extract the file paths, which are separated by spaces. That is, you want to split the line on the boundaries between quoted items. I.e., with a line
"FILE PATH" "FILE PATH 2"
You want
["FILE PATH","FILE PATH 2"]
In which case:
import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")', line))
With file.txt:
"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Outputs:
>>>
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
This was my solution. It parses most sane input exactly as if it were passed on the command line directly.
import re

def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
    return [re.sub(r'"{1,2}', reduce_, z.strip()) for z in re.findall(rex, input_)]
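For illustration, a quick usage run (hand-traced rather than exhaustively tested, so treat it as a sketch):
>>> simpleParse('"A B" C "D"')
['A B', 'C', 'D']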
Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.
Edit:
I got hung up on the messy way the command line handles sloppy quoting and wrote the below (trial is the input string to tokenize):
import re

tokens = list()
reading = False
qc = 0      # number of quotes seen in the current token
lq = 0      # index of the last quote seen
begin = 0

for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        elif re.match(r'"', char):
            qc = qc + 1
            lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else:
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial)])
tokens = [re.sub(r'"{1,2}', lambda y: '' if y.group(0) == '"' else '"', z) for z in tokens]
I know this got answered a million years ago, but this works too:
input = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]
It should output a list containing:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
My question Python - Error Caused by Space in argv Argument was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:
repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)
and in SWCore to:
sys.argv = args
The shlex module worked but I prefer this.
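For context, a minimal sketch of the receiving side under that convention; SWCore and its main are the poster's own module, so this body is an assumed shape, not their actual code:
# SWCore.py (hypothetical sketch)
import sys

def main(args):
    sys.argv = args  # downstream code can keep reading sys.argv as usual
    script = args[0]
    params = args[1:]
    print('running %s with %s' % (script, params))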

get the path in a file inside {} by python

I have a file (below) and I just want to get /X/Y/Z/C, /X/Y/Z/D, /X/Y/Z/E back (whatever comes after -trees).
The code should read the file, ignore everything until it sees WFS, then take the information in {}, find -trees, and give me just the paths back.
I am a beginner in Python. A simple match pattern doesn't work, I think because the paths change every day.
Any help will be appreciated.
The file:
DEFAULTS
{
FS
{
-A AAA
-B
} -aaaaaa
C
{
}
}
D "FW0"
{
}
WFS "C:" XXXX:"/C"
{
-trees
"/X/Y/Z/C"
"/X/Y/Z/D"
"/X/Y/Z/E"
-A AAA
}
A state-machine-based lexical analyzer would do the trick reliably.
It recognizes the file constructs that interest us: nested curly braces, named sections (an identifier followed by an opening brace on the next line; this one only cares about top-level sections) and clauses (started by -identifier inside a top-level section, possibly followed by data lines, and terminated by another clause or the section's end).
It then keeps reading the file and prints any data lines found while we are in the section and clause of interest. It also sets a flag upon finding them, so it can quit immediately after that clause ends.
import re

f = open("t.txt")

identifier = None
brace_level = 0
section = None
clause = None
req_clause_found = False

def in_req_clause():
    return section == 'WFS' and clause == 'trees'

for l in (l.strip() for l in f):
    if req_clause_found and not in_req_clause():
        break
    m = re.match(r'[A-Z]+', l)          # adjust if section names can be different
    if m and section is None:
        identifier = m.group(0)
        continue
    m = re.match(r'\{(\s|$)', l)
    if m:
        brace_level += 1
        if identifier is not None and brace_level == 1:
            section = identifier
        identifier = None
        continue
    else:
        identifier = None
    m = re.match(r'\}(\s|$)', l)
    if m:
        brace_level -= 1
        if brace_level == 0:
            section = None
        clause = None
        continue
    m = re.match(r'-([A-Za-z]+)', l)    # adjust if clause names can be different
    if m and brace_level == 1:
        clause = m.group(1)
        continue
    m = re.match(r'"(.*)"$', l)
    if m and in_req_clause():
        print m.group(1)
        req_clause_found = True
        continue
On the sample, this outputs
/X/Y/Z/C
/X/Y/Z/D
/X/Y/Z/E
I'm a little confused by the layout of your file, but is there any reason not to parse it line by line?
def parse():
    with open('data.txt') as fptr:
        for line in fptr:
            if line.startswith('WFS'):
                for line in fptr:
                    if line.strip().startswith('-trees'):
                        result = []
                        for line in fptr:
                            if line.strip().startswith('"'):
                                result.append(line.strip())
                            else:
                                return result
That's not pretty, but I think it'll work! Let's try it:
In [1]: !cat temp.txt
DEFAULTS
{
FS
{
-A AAA
-B
} -aaaaaa
C
{
}
}
D "FW0"
{
}
WFS "C:" XXXX:"/C"
{
-trees
"/X/Y/Z/C"
"/X/Y/Z/D"
"/X/Y/Z/E"
-A AAA
}
In [2]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def parse():
:    with open('temp.txt') as fptr:
:        for line in fptr:
:            if line.startswith('WFS'):
:                for line in fptr:
:                    if line.strip().startswith('-trees'):
:                        result = []
:                        for line in fptr:
:                            if line.strip().startswith('"'):
:                                result.append(line.strip())
:                            else:
:                                return result
:
:--
In [3]: parse()
Out[3]: ['"/X/Y/Z/C"', '"/X/Y/Z/D"', '"/X/Y/Z/E"']
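If the quotes around each path are unwanted, a small follow-up on the same result:
In [4]: [s.strip('"') for s in parse()]
Out[4]: ['/X/Y/Z/C', '/X/Y/Z/D', '/X/Y/Z/E']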
I'm not sure what the exact variations of your patterns are, but you could use regex groups:
import re

myjunk = open("t.txt", "r")
for line in myjunk:
    if re.match('(/[A-Z])*', line):
        print line,
You may need to fiddle with the regex a bit, but the important point here is to invest a bit of time in learning regular expressions, and you won't have to deal with some of the unnecessarily complicated solutions suggested elsewhere. Regex is a mini-language purpose-built for text matching, so it's really essential knowledge, even for the Python newbie. You'll be glad you put the time in! And the Python community is helpful, so why not join IRC, and we'll see you in your favorite Python channel for real-time help.
Best of luck, let me know if you need more help.
PJ

grep in python properly

I am used to scripting in bash, but I am also learning Python.
So, as a way of learning, I am trying to rewrite a few of my old bash scripts in Python. For example, I have a file with lines like:
TOTDOS= 0.38384E+02n_Ef= 0.81961E+02 Ebnd 0.86883E+01
To get the value of TOTDOS in bash, I just do:
grep "TOTDOS=" 630/out-Dy-eos2|head -c 19|tail -c 11
but in Python, I am doing:
#!/usr/bin/python3
import re
import os.path
import sys

f1 = open("630/out-Dy-eos2", "r")
re1 = r'TOTDOS=\s*(.*)n_Ef=\s*(.*)\sEbnd'
for line in f1:
    match1 = re.search(re1, line)
    if match1:
        TD = match1.group(1)
f1.close()
print(TD)
This surely gives the correct result, but it seems like much more work than the bash version (not to mention the trouble with the regex).
The question is: am I overworking this in Python, or am I missing something?
A Python script that matches your bash line would be more like this:
with open('630/out-Dy-eos2', 'r') as f1:
    for line in f1:
        if "TOTDOS=" in line:
            print(line[8:19])
Looks a little bit better now.
[...] but seems to be much more than bash
Maybe (?) generators are the closest Python concept to the "pipe filtering" used in shell.
import itertools

# Simple generator to iterate through a file,
# equivalent to reading the input file line by line
def source(fname):
    with open(fname, "r") as f:
        for l in f:
            yield l

src = source("630/out-Dy-eos2")

# First filter: keep only lines containing the required word
# (equivalent to `grep -F`)
filter1 = (l for l in src if "TOTDOS=" in l)

# Second filter: keep only lines in the required range
# (equivalent of `head -n ... | tail -n ...`)
filter2 = itertools.islice(filter1, 10, 20, 1)

# Finally, output
output = "".join(filter2)
print(output)
Concerning your specific example, if you need it, you could use a regexp in a generator:
re1 = r'TOTDOS=\s*(.*)n_Ef=\s*(.*)\sEbnd'
filter1 = (m.group(1) for m in (re.match(re1, l) for l in src) if m)
Those are only (some of) the basic building blocks available to you.
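Putting those pieces together, a self-contained sketch of the regexp variant; same file name and pattern as above, with re.search (as in the question) so leading whitespace doesn't matter:
import re

def source(fname):
    with open(fname, "r") as f:
        for l in f:
            yield l

re1 = r'TOTDOS=\s*(.*)n_Ef=\s*(.*)\sEbnd'
src = source("630/out-Dy-eos2")
totdos = (m.group(1) for m in (re.search(re1, l) for l in src) if m)
for td in totdos:
    print(td)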

