How to extract various parts of a string into separate strings - python

I am trying to make a function to help me debug. When I do the following:
s = traceback.format_stack()[-2]
print(s)
I get a console output something like:
File "/home/path/to/dir/filename.py", line 888, in function_name
How can I extract filename.py, 888 and function_name into separate strings? Using a regex?

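import re
string = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
# The quotes sit outside group 1 so the path is captured without them.
search = re.search(r'File "(.*)", line (.*), in (.*)', string, re.IGNORECASE)
print(search.group(1))  # full path
print(search.group(2))  # line number
print(search.group(3))  # function name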

You can try to use str.split():
>>> s = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
>>> lst = s.split(',')
>>> lst
['File "/home/path/to/dir/filename.py"', ' line 888', ' in function_name']
so for the 888 and function_name, you can access them like this:
>>> lst[1].split()[1]
'888'
>>> lst[2].split()[1]
'function_name'
and for the filename.py, you can split it by '"'
>>> fst = lst[0].split('"')[1]
>>> fst
'/home/path/to/dir/filename.py'
then you can use os.path.basename:
>>> import os.path
>>> os.path.basename(fst)
'filename.py'
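The split steps above can be gathered into one helper. This is only a sketch: the name parse_frame_line is made up for illustration, and it assumes the line has exactly the three comma-separated parts shown in the question (a path that itself contains a comma would break it).

```python
import os.path

def parse_frame_line(s):
    # Hypothetical helper; assumes the 'File "...", line N, in name' layout.
    file_part, line_part, func_part = s.split(',')
    path = file_part.split('"')[1]   # text between the double quotes
    lineno = line_part.split()[1]    # second word of ' line 888'
    func = func_part.split()[1]      # second word of ' in function_name'
    return os.path.basename(path), lineno, func

s = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
print(parse_frame_line(s))
```

All three values come back as strings; convert the line number with int() if you need to compare it numerically.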

Try with this regular expression:
File \"[a-zA-Z\/_]*\/([a-zA-Z]+\.[a-zA-Z]+)", line ([0-9]+), in (.*)
The dot before the extension is escaped so that it matches a literal "." only. You can test the pattern on this site: https://regex101.com/
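For what it's worth, here is how a regex along those lines might be wrapped up in Python. It is a sketch under assumptions: the names FRAME_RE and parse_frame are invented here, and the pattern is a looser variant that accepts any characters inside the quotes instead of only letters, slashes and underscores.

```python
import os.path
import re

# Assumed layout: File "<path>", line <number>, in <name>
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<lineno>\d+), in (?P<func>\S+)'
)

def parse_frame(s):
    # Hypothetical helper: returns None when the line does not match.
    m = FRAME_RE.search(s)
    if m is None:
        return None
    return os.path.basename(m.group('path')), int(m.group('lineno')), m.group('func')

s = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
print(parse_frame(s))
```

Named groups make it harder to mix up which group is which, and search() rather than match() tolerates the leading spaces that traceback.format_stack() puts in front of "File".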

Related

To add a new line before a set of characters in a line using python

I have one very long line in which a set of characters keeps repeating. The line is: qwethisistheimportantpartqwethisisthesecondimportantpart
There are no spaces in the string. I want to add a new line before each occurrence of 'qwe' so that I can tell the important parts apart.
Output :
qwethisistheimportantpart
qwethisisthesecondimportantpart
I tried using
for line in infile:
    if line.startswith("qwe"):
        line = "\n" + line
but it doesn't seem to work.
str.replace() can do what you want:
line = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
line = line.replace('qwe', '\nqwe')
print(line)
You can use re.split() and then join with \nqwe:
import re
s = "qwethisistheimportantpartqwethisisthesecondimportantpart"
print('\nqwe'.join(re.split('qwe', s)))
Output:
qwethisistheimportantpart
qwethisisthesecondimportantpart
I hope this helps:
string = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
split_factor = 'qwe'
# Works only when 'qwe' occurs exactly twice; a is the empty text before the first 'qwe'.
a, b, c = string.split(split_factor)
print(split_factor + b)
print(split_factor + c)
Implemented in Python 2.7.
This yields the same output as in your question.
output:
qwethisistheimportantpart
qwethisisthesecondimportantpart
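The three-way unpacking above only works when the marker appears exactly twice. A possible generalisation for any number of occurrences is sketched below; split_before is a made-up name, not something from the answers.

```python
def split_before(text, marker):
    # Hypothetical helper: one output piece per occurrence of marker.
    pieces = text.split(marker)
    head, rest = pieces[0], pieces[1:]
    # Text before the first marker is kept only if it is non-empty.
    result = [head] if head else []
    result.extend(marker + p for p in rest)
    return result

line = 'qwethisistheimportantpartqwethisisthesecondimportantpart'
for part in split_before(line, 'qwe'):
    print(part)
```

Joining the returned list with '\n' gives the same text as the str.replace() approach, minus the leading blank line.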

python regex using re.compile and match

eag = 'linux'
rpat = re.compile("^\s*%s\s*=\s*('.*\'')" % eag)
I am trying to grab r'^LIN' from a line in a text file that looks like linux = r'^LIN',
lines = [line.strip() for line in open(myfile, 'r')]
for line in lines:
    if re.match(rpat, line):
        matched = re.match(rpat, line)
        got_it = matched.group(1)
        # do something here
I'm not quite sure my rpat is correct.
There is some space in front of linux, then some space before the =, then some space, then r'^LIN',
I'd use:
re.compile("^\s*%s\s*=\s*(r'[^']+')" % re.escape(eag))
This matches the r as well, which you omitted.
Demo:
>>> import re
>>> sample = "linux = r'^LIN',"
>>> eag = 'linux'
>>> rpat = re.compile("^\s*%s\s*=\s*(r'[^']+')" % re.escape(eag))
>>> rpat.match(sample).group(1)
"r'^LIN'"
Your regex is quite messy.
I have tried to fix it:
rpat = re.compile(r"^\s*%s\s*=\s*(r'.*')\s*" % eag)
You forgot the r needed to match r'^LIN', and you forgot about trailing spaces.
rpat = re.compile("^\s*%s\s*=\s*(.*\')" % eag) works too. Just had to get rid of ''
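Putting the pieces from these answers together, a loop over the lines could look roughly like this. It is a sketch: find_assignment and the sample lines are invented for illustration, and the capture group follows the r'[^']+' idea from the first answer (relaxed to allow an empty literal).

```python
import re

def find_assignment(lines, name):
    # Matches e.g.  linux = r'^LIN',  and captures the r'...' literal.
    # re.escape() guards against regex metacharacters in name.
    pattern = re.compile(r"^\s*%s\s*=\s*(r'[^']*')" % re.escape(name))
    for line in lines:
        m = pattern.match(line)
        if m:
            return m.group(1)
    return None

# Made-up sample lines in the format described in the question.
sample_lines = ["  windows = r'^WIN',", "   linux   =   r'^LIN',"]
print(find_assignment(sample_lines, 'linux'))
```

Calling pattern.match() once and testing the result avoids running the regex twice per line, as the loop in the question does.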

How to print string inside a parentheses in a line in Python?

I have lots of lines in a text file. They look like this, for example:
562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4
711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0
I want to extract the string inside the parentheses. For example, the output in this case should be
"Auto_Gain_ROI_Size" and "Auto_Contrast".
Note that the string is always enclosed by "Parameter()". Thanks.
You can use regex:
>>> import re
>>> s = "562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4"
>>> t = "711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0 "
>>> myreg = re.compile(r'Parameter\((.*?)\)')
>>> print myreg.search(s).group(1)
Auto_Gain_ROI_Size
>>> print myreg.search(t).group(1)
Auto_Contrast
Or, without regex (albeit a bit messier):
>>> print s.split('Parameter(')[1].split(')')[0]
Auto_Gain_ROI_Size
>>> print t.split('Parameter(')[1].split(')')[0]
Auto_Contrast
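To run the regex over many lines at once, something like the following might do. The names PARAM_RE and parameter_names are assumptions made for this sketch, and lines without a Parameter(...) part are simply skipped.

```python
import re

PARAM_RE = re.compile(r'Parameter\(([^)]*)\)')

def parameter_names(lines):
    # Hypothetical helper: collect the text inside Parameter(...) per line.
    names = []
    for line in lines:
        m = PARAM_RE.search(line)
        if m:
            names.append(m.group(1))
    return names

log_lines = [
    '562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4',
    '711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0',
]
print(parameter_names(log_lines))
```

Since an open file object yields its lines when iterated, it can be passed to parameter_names directly in place of the list.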

Python split string on quotes

I'm a Python learner. If I have lines of text in a file that look like this
"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see how I can use any other way to do those markers to split on, but my lack of python knowledge is making this difficult.
I've tried
optfile=line.split("")
and other variations but keep getting ValueError: empty separator. I can see why it's saying that, I just don't know how to change it. Any help is, as always, very much appreciated.
Many thanks
You must escape the ":
input.split("\"")
results in
['\n',
'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
' ',
'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
'\n']
To drop the resulting empty lines:
[line for line in [line.strip() for line in input.split("\"")] if line]
results in
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
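The \x00 in the output above appears because the sample text was typed as an ordinary string literal, where \000 is read as an octal escape. Lines read from a file do not go through that processing. A small sketch of the difference (the variable names are just for illustration):

```python
# Ordinary literal: \000 is an octal escape (the NUL character),
# so the text \00001 becomes NUL followed by "01".
plain = '"Y:\\DATA\00001\\SERVER\\DATA.TXT"'
# Raw literal: every backslash is kept exactly as typed.
raw = r'"Y:\DATA\00001\SERVER\DATA.TXT"'

plain_path = plain.split('"')[1]
raw_path = raw.split('"')[1]
print(repr(plain_path))
print(repr(raw_path))
```

So when testing with a pasted Windows path, use a raw string (r'...') to get the same characters the file would give you.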
I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:
import shlex
with open('somefile') as fin:
    for line in fin:
        print(shlex.split(line))
Would give:
['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
No regex, no split, just use csv.reader
import csv
sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'
def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print(l)
main()
The output is
['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
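Applied to the line from the question, the same csv.reader call might look like this. It is a sketch that assumes exactly one space between the quoted paths; extra spaces would show up as empty fields.

```python
import csv

# Raw string so the backslashes stay as typed.
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# csv.reader takes an iterable of lines, hence the one-element list.
rows = list(csv.reader([line], delimiter=' ', quotechar='"'))
paths = rows[0]
print(paths)
```

With the default settings csv does not treat the backslash as an escape character, so the paths come through unchanged.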
The shlex module can help you.
import shlex
my_string = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)
This returns:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
Reference: https://docs.python.org/2/library/shlex.html
Finding all regular expression matches will do it:
input=r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
re.findall('".+?"', input)  # or '"[^"]+"' as the pattern
This will return the list of file names, still wrapped in their quotes:
['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
To get the file name without quotes use:
[f[1:-1] for f in re.findall('".+?"', input)]
or use re.finditer:
[f.group(1) for f in re.finditer('"(.+?)"', input)]
The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.
[s for s in line.split('"') if s.strip() != '']
There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.
Test:
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
I think what you want is to extract the file paths, which are separated by spaces. That is, you want to split the line between the quoted items. I.e., with a line
"FILE PATH" "FILE PATH 2"
You want
["FILE PATH","FILE PATH 2"]
In which case:
import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))
With file.txt:
"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Outputs:
>>>
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.
import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]
Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.
Edit:
Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:
import re
tokens = list()
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else:
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]
I know this got answered a million years ago, but this works too:
input = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]
Should output it as a list containing:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
My question "Python - Error Caused by Space in argv Arument" was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but gave no example, so I changed things to:-
repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)
and in SWCore to:-
sys.argv = args
The shlex module worked but I prefer this.

print specific word(number) out of a line

I can use Python to print one line of the .txt file, say, "[aln-core]:10000000 sequences have been processed"
But I want to print only the number (10000000, which is the information I need). How can I do that?
thx
Do you mean you want to get the 10000000 out of some line="[aln-core]:10000000 sequences have been processed"?
If you're sure that the line is always going to look like that, try
line.split(':')[1].split()[0]
I mean ...
line = "[aln-core]:10000000 sequences have been processed"
line.split(':')[1]
'10000000 sequences have been processed'
line.split(':')[1].split()[0]
'10000000'
A simple solution:
line = '[aln-core]:10000000 sequences have been processed'
line = ''.join(c for c in line if c.isdigit())
A regex solution:
import re
line = '[aln-core]:10000000 sequences have been processed'
print(re.search(r'\d+', line).group())
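If other digits can appear in the line, anchoring the regex on the prefix may be safer. A sketch, assuming the "[aln-core]:<digits>" layout from the question always holds (processed_count is a made-up name):

```python
import re

def processed_count(line):
    # Hypothetical helper; returns the count as an int, or None if absent.
    m = re.search(r'\[aln-core\]:(\d+)', line)
    return int(m.group(1)) if m else None

line = '[aln-core]:10000000 sequences have been processed'
print(processed_count(line))
```

Unlike the isdigit() join, this does not glue together digits from different places in the line.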
