How to print the string inside parentheses in a line in Python?

I have lots of lines in a text file. They look like this, for example:
562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4
711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0
I want to extract the string inside the parentheses; for example, the output in this case should be "Auto_Gain_ROI_Size" and "Auto_Contrast".
Note that the string is always enclosed by "Parameter()". Thanks.

You can use regex:
>>> import re
>>> s = "562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4"
>>> t = "711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0 "
>>> myreg = re.compile(r'Parameter\((.*?)\)')
>>> print myreg.search(s).group(1)
Auto_Gain_ROI_Size
>>> print myreg.search(t).group(1)
Auto_Contrast
Or, without regex (albeit a bit messier):
>>> print s.split('Parameter(')[1].split(')')[0]
Auto_Gain_ROI_Size
>>> print t.split('Parameter(')[1].split(')')[0]
Auto_Contrast
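For a whole file, the same pattern can be applied line by line. The following is a minimal Python 3 sketch rather than part of the answer above; the function name extract_parameters and the sample list are hypothetical, and it assumes the lines have already been read into a list (for example with f.readlines()).

```python
import re

# Same pattern as in the answer: capture whatever sits inside Parameter(...)
PARAM_RE = re.compile(r'Parameter\((.*?)\)')

def extract_parameters(lines):
    """Return the captured name from every line that contains Parameter(...)."""
    names = []
    for line in lines:
        match = PARAM_RE.search(line)
        if match:  # lines without a Parameter(...) entry are skipped
            names.append(match.group(1))
    return names

sample = [
    "562: DEBUG, CIC, Parameter(Auto_Gain_ROI_Size) = 4",
    "711: DEBUG, VSrc, Parameter(Auto_Contrast) = 0",
    "a line without the marker",
]
print(extract_parameters(sample))
```

Checking the match before calling group(1) avoids an AttributeError on lines that do not contain the marker.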

Related

How to extract various parts of a string into separate strings

I am trying to make a function to help me debug. When I do the following:
s = traceback.format_stack()[-2]
print(s)
I get a console output something like:
File "/home/path/to/dir/filename.py", line 888, in function_name
How can I extract filename.py, 888 and function_name into separate strings? Using a regex?
import re
string = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
search = re.search('File (.*), line (.*), in (.*)', string, re.IGNORECASE)
print(search.group(1))
print(search.group(2))
print(search.group(3))
You can try to use str.split():
>>> s = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
>>> lst = s.split(',')
>>> lst
['File "/home/path/to/dir/filename.py"', ' line 888', ' in function_name']
so for the 888 and function_name, you can access them like this:
>>> lst[1].split()[1]
'888'
>>> lst[2].split()[1]
'function_name'
and for the filename.py, you can split it by '"'
>>> fst = lst[0].split('"')[1]
>>> fst
'/home/path/to/dir/filename.py'
then you can use os.path.basename:
>>> import os.path
>>> os.path.basename(fst)
'filename.py'
Try with this regular expression:
File \"[a-zA-Z\/_]*\/([a-zA-Z]+\.[a-zA-Z]+)", line ([0-9]+), in (.*)
You can test it on this site: https://regex101.com/
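The approaches above can be combined into one helper. This is a sketch under assumptions: the name parse_frame and the group names path, lineno and func are made up for illustration, and the pattern assumes the path itself contains no double quote.

```python
import os.path
import re

# Named groups make the three pieces easy to pull out by name.
FRAME_RE = re.compile(
    r'File "(?P<path>[^"]+)", line (?P<lineno>\d+), in (?P<func>\S+)'
)

def parse_frame(text):
    """Return (file name, line number, function name), or None if no match."""
    match = FRAME_RE.search(text)
    if match is None:
        return None
    return (
        os.path.basename(match.group('path')),  # strip the directory part
        int(match.group('lineno')),
        match.group('func'),
    )

s = 'File "/home/path/to/dir/filename.py", line 888, in function_name'
print(parse_frame(s))
```

Using search rather than match lets the helper tolerate the leading spaces that traceback.format_stack() puts in front of each entry.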

Python regex: How to match a string at the end of a line in a file?

I need to match a string at the end of a line of a file.
The contents of the file are:
network1:
type: Internal
I have made this regex to get the first line but it does not match anything. Note that my code's requirement is that the string which is to be matched is stored in a variable. Therefore:
var1 = 'network1'
re.match('\s+%s:'%var1,line)
However, when I check this regex on the interpreter, it works.
>>> import re
>>> line = ' network1:'
>>> var1 = 'network1'
>>> pat1 = re.match('\s+%s:'%var1,line)
>>> var2 = pat1.group(0)
>>> print var2
 network1:
You need to use the re.search function, since re.match only tries to match at the beginning of the string.
var1 = 'network1'
print(re.search(r'.*(\s+'+ var1 + r':)', line).group(1))
Example:
>>> import re
>>> s = 'foo network1: network1:'
>>> var1 = 'network1'
>>> print(re.search(r'.*(\s+'+ var1 + r':)', s).group(1))
network1:
>>> print(re.search(r'.*(..\s+'+ var1 + r':)', s).group(1)) # to check whether it fetches the last string or not.
1: network1:
So, you should do something like this:
with open(file) as f:
    for line in f:
        if var1 in line:
            print(re.search(r'.*(\s+'+ var1 + r':)', line).group(1))
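Because the string to match comes from a variable, it can help to escape it and to anchor the pattern at the end of the line. The sketch below is one possible variant, not the answer's exact method; the helper name ends_with_key is hypothetical, and it assumes the key is followed by a colon and optional trailing whitespace.

```python
import re

def ends_with_key(line, key):
    """True if the line ends with '<key>:' (trailing whitespace allowed)."""
    # re.escape keeps characters such as '.' in the key from acting as regex syntax.
    # (?:^|\s) requires the key to start the line or follow whitespace.
    pattern = r'(?:^|\s)' + re.escape(key) + r':\s*$'
    return re.search(pattern, line) is not None

var1 = 'network1'
for line in ['network1:\n', '  network1:\n', '  type: Internal\n']:
    print(repr(line), ends_with_key(line, var1))
```

Unlike a pattern that starts with \s+, this version also accepts a line with no leading whitespace.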

How to convert a word in string to binary

I was working on a module (the kind you import) that would convert words in a string to hex and binary (and octal if possible). I finished the hex part, but now I am struggling with the binary. I don't know where to start or what to do. What I want is simple: take an input string such as 'test' and have a function inside the module convert it to binary.
What I have done so far is given below:
import binascii
def string_hex(string):  # Converts a word to hex
    keyword = string.encode()
    hexadecimal = str(binascii.hexlify(keyword), 'ascii')
    formatted_hex = ':'.join(hexadecimal[i:i+2] for i in range(0, len(hexadecimal), 2))
    return formatted_hex
def hex_string(hexa):
    # hexa (so named because there is a built-in function hex()) should be passed as a string. For accuracy on words, avoid symbols (, . !)
    string = bytes.fromhex(hexa)
    formatted_string = string.decode()
    return formatted_string
I saved it as experiment.py in the directory where Python is installed. This is how I call it:
>>> from experiment import string_hex
>>> string_hex('test')
'74:65:73:74'
In the same way, I can convert it back like this:
>>> from experiment import hex_string
>>> hex_string('74657374')
'test'
I want to convert words in strings to binary in the same way. One more thing: I am using Python 3.4.2. Please help me.
You can do it as follows. You don't even have to import binascii.
def string_hex(string):
    return ':'.join(format(ord(c), 'x') for c in string)
def hex_string(hexa):
    hexgen = (hexa[i:i+2] for i in range(0, len(hexa), 2))
    # int(n, 16) parses the hex digits without resorting to eval
    return ''.join(chr(int(n, 16)) for n in hexgen)
def string_bin(string):
    return ':'.join(format(ord(c), 'b') for c in string)
def bin_string(binary):
    # Assumes every character was written with exactly 7 binary digits
    bingen = (binary[i:i+7] for i in range(0, len(binary), 7))
    return ''.join(chr(int(n, 2)) for n in bingen)
And here is the output:
>>> string_hex('test')
'74:65:73:74'
>>> hex_string('74657374')
'test'
>>> string_bin('test')
'1110100:1100101:1110011:1110100'
>>> bin_string('1110100110010111100111110100')
'test'
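Splitting the binary string every 7 digits only works while each character happens to need exactly 7 bits. A possible variant, sketched here with the assumption that the text is encoded as UTF-8, pads every byte to 8 digits so the round trip also works for other characters. It reuses the names string_bin and bin_string for comparison only.

```python
def string_bin(text):
    """Encode text as UTF-8 and write each byte as 8 binary digits."""
    return ':'.join(format(byte, '08b') for byte in text.encode('utf-8'))

def bin_string(binary):
    """Inverse of string_bin; accepts the digits with or without ':' separators."""
    digits = binary.replace(':', '')
    # Every group of 8 digits is one byte of the UTF-8 encoding.
    data = bytes(int(digits[i:i + 8], 2) for i in range(0, len(digits), 8))
    return data.decode('utf-8')

encoded = string_bin('test')
print(encoded)
print(bin_string(encoded))
```

The same format() call with 'o' instead of '08b' would give octal digits, which covers the octal case mentioned in the question.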

Python split string on quotes

I'm a Python learner. If I have lines of text in a file that look like this
"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Can I split the lines around the inverted commas? The only constant would be their position in the file relative to the data lines themselves. The data lines could range from 10 to 100+ characters (they'll be nested network folders). I cannot see how I can use any other way to do those markers to split on, but my lack of python knowledge is making this difficult.
I've tried
optfile=line.split("")
and other variations but keep getting ValueError: empty separator. I can see why it's saying that, I just don't know how to change it. Any help is, as always, very appreciated.
Many thanks
You must escape the ":
input.split("\"")
results in
['\n',
'Y:\\DATA\x0001\\SERVER\\DATA.TXT',
' ',
'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT',
'\n']
To drop the resulting empty lines:
[line for line in [line.strip() for line in input.split("\"")] if line]
results in
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
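The \x0001 in the output above comes from typing the path into an ordinary string literal, where \000 is read as an octal escape for the NUL character. Text read from a file is not affected, and a raw literal avoids the problem when testing by hand. A short sketch of the difference (the variable names plain and raw are only for illustration):

```python
# In an ordinary literal, "\000" is an octal escape for the NUL character,
# so "\00001" turns into NUL followed by the two characters "01".
plain = "Y:\\DATA\00001"

# In a raw literal, every backslash is kept exactly as written.
raw = r"Y:\DATA\00001"

print(repr(plain))
print(repr(raw))
print(raw.split('\\'))  # split on a single backslash
```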
I'll just add that if you were dealing with lines that look like they could be command line parameters, then you could possibly take advantage of the shlex module:
import shlex
with open('somefile') as fin:
    for line in fin:
        print shlex.split(line)
Would give:
['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
No regex, no split, just use csv.reader
import csv
sample_line = '10.0.0.1 foo "24/Sep/2015:01:08:16 +0800" www.google.com "GET /" -'
def main():
    for l in csv.reader([sample_line], delimiter=' ', quotechar='"'):
        print l
The output is
['10.0.0.1', 'foo', '24/Sep/2015:01:08:16 +0800', 'www.google.com', 'GET /', '-']
The shlex module can help you.
import shlex
my_string = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
shlex.split(my_string)
This will output
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
Reference: https://docs.python.org/2/library/shlex.html
Finding all regular expression matches will do it:
import re
input=r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
re.findall('".+?"', input)  # or '"[^"]+"'
This will return the list of file names, still wrapped in quotes:
['"Y:\\DATA\\00001\\SERVER\\DATA.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
To get the file name without quotes use:
[f[1:-1] for f in re.findall('".+?"', input)]
or use re.finditer:
[f.group(1) for f in re.finditer('"(.+?)"', input)]
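When the pattern contains a single capturing group, re.findall returns the group contents directly, so the slicing step can be skipped. A small sketch using the sample line from the question (the variable names line and paths are illustrative):

```python
import re

line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'

# With one capturing group, findall returns what the group matched,
# i.e. the text between each pair of double quotes.
paths = re.findall(r'"([^"]*)"', line)
print(paths)
```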
The following code splits the line at each occurrence of the inverted comma character (") and removes empty strings and those consisting only of whitespace.
[s for s in line.split('"') if s.strip() != '']
There is no need to use regular expressions, an escape character, some module or assume a certain number of whitespace characters between the paths.
Test:
line = r'"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
output = [s for s in line.split('"') if s.strip() != '']
print(output)
>>> ['Y:\\DATA\\00001\\SERVER\\DATA.TXT', 'V:\\DATA2\\00002\\SERVER2\\DATA2.TXT']
I think what you want is to extract the filepaths, which are separated by spaces. That is you want to split the line about items contained within quotations. I.e with a line
"FILE PATH" "FILE PATH 2"
You want
["FILE PATH","FILE PATH 2"]
In which case:
import re
with open('file.txt') as f:
    for line in f:
        print(re.split(r'(?<=")\s(?=")',line))
With file.txt:
"Y:\DATA\00001\SERVER\DATA MINER.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"
Outputs:
>>>
['"Y:\\DATA\\00001\\SERVER\\DATA MINER.TXT"', '"V:\\DATA2\\00002\\SERVER2\\DATA2.TXT"']
This was my solution. It parses most sane input exactly the same as if it was passed into the command line directly.
import re
def simpleParse(input_):
    def reduce_(quotes):
        return '' if quotes.group(0) == '"' else '"'
    rex = r'("[^"]*"(?:\s|$)|[^\s]+)'
    return [re.sub(r'"{1,2}',reduce_,z.strip()) for z in re.findall(rex,input_)]
Use case: Collecting a bunch of single shot scripts into a utility launcher without having to redo command input much.
Edit:
Got OCD about the stupid way that the command line handles crappy quoting and wrote the below:
import re
# trial is the input string to tokenize; the original answer does not define it
tokens = list()
reading = False
qc = 0
lq = 0
begin = 0
for z in range(len(trial)):
    char = trial[z]
    if re.match(r'[^\s]', char):
        if not reading:
            reading = True
            begin = z
            if re.match(r'"', char):
                begin = z
                qc = 1
            else:
                begin = z - 1
                qc = 0
            lc = begin
        else:
            if re.match(r'"', char):
                qc = qc + 1
                lq = z
    elif reading and qc % 2 == 0:
        reading = False
        if lq == z - 1:
            tokens.append(trial[begin + 1: z - 1])
        else:
            tokens.append(trial[begin + 1: z])
if reading:
    tokens.append(trial[begin + 1: len(trial) ])
tokens = [re.sub(r'"{1,2}',lambda y:'' if y.group(0) == '"' else '"', z) for z in tokens]
I know this got answered a million years ago, but this works too:
input = '"Y:\DATA\00001\SERVER\DATA.TXT" "V:\DATA2\00002\SERVER2\DATA2.TXT"'
input = input.replace('" "','"').split('"')[1:-1]
Should output it as a list containing:
['Y:\\DATA\x0001\\SERVER\\DATA.TXT', 'V:\\DATA2\x0002\\SERVER2\\DATA2.TXT']
My question "Python - Error Caused by Space in argv Arument" was marked as a duplicate of this one. We have a number of Python books going back to Python 2.3. The oldest referred to using a list for argv, but with no example, so I changed things to:
repoCmd = ['Purchaser.py', 'task', repoTask, LastDataPath]
SWCore.main(repoCmd)
and in SWCore to:
sys.argv = args
The shlex module worked but I prefer this.

Print string in a form of Unicode codes

How can I print a string as a sequence of unicode codes in Python?
Input: "если" (in Russian).
Output: "\u0435\u0441\u043b\u0438"
This should work:
>>> s = u'если'
>>> print repr(s)
u'\u0435\u0441\u043b\u0438'
Code:
txt = u"если"
print repr(txt)
Output:
u'\u0435\u0441\u043b\u0438'
a = u"\u0435\u0441\u043b\u0438"
print "".join("\u{0:04x}".format(ord(c)) for c in a)
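The snippets above are written for Python 2. In Python 3, repr() no longer escapes non-ASCII letters, and the literal "\u{0:04x}" is a syntax error, so the backslash has to be doubled. A possible Python 3 sketch follows; the helper name to_unicode_escapes is hypothetical, and the four-digit form assumes characters in the Basic Multilingual Plane.

```python
def to_unicode_escapes(text):
    """Write every character as a backslash-u escape with four hex digits."""
    # '\\u' is a literal backslash followed by 'u'; the format spec pads the code point.
    return ''.join('\\u{:04x}'.format(ord(ch)) for ch in text)

s = 'если'
print(to_unicode_escapes(s))

# The built-in codec gives the same result for these characters,
# although it leaves plain ASCII characters unescaped.
print(s.encode('unicode_escape').decode('ascii'))
```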
If you need a specific encoding, you can use:
txt = u'если'
print txt.encode('utf8')
print txt.encode('utf16')
