how to split the text using python?

f_output.write('\n{}, {}\n'.format(filename, summary))
I am printing the file name as part of the output. I get names such as VCALogParser_output_ARW.log, VCALogParser_output_CZC.log and so on, but I only want to print ARW, CZC and so on. Can someone tell me how to split this text?

filename.split('_')[-1].split('.')[0]
For 'VCALogParser_output_ARW.log' this will give you 'ARW'; the same expression applied to 'VCALogParser_output_CZC.log' will give you 'CZC'.

If you are only interested in CZC and ARW without the .log, you can get them with re.search:
>>> import re
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> re.search(r'.*_(.*)\.log', s1).group(1)
'ARW'
>>> re.search(r'.*_(.*)\.log', s2).group(1)
'CZC'
Or, better, compile your pattern as p and then call its search method when formatting your string:
>>> p = re.compile(r'.*_(.*)\.log')
>>>
>>> '\n{}, {}\n'.format(p.search(s1).group(1), p.search(s2).group(1))
'\nARW, CZC\n'
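One caveat with the re.search calls above: search returns None when a name does not match, so calling .group(1) on the result raises AttributeError. A minimal sketch that guards against that, assuming names of the form something_CODE.log (the helper name extract_code is made up for illustration):

```python
import re

# Hypothetical helper; the pattern mirrors the one used above.
_CODE_RE = re.compile(r'.*_(.*)\.log')

def extract_code(name):
    """Return the part between the last underscore and '.log', or None."""
    m = _CODE_RE.search(name)
    return m.group(1) if m else None

print(extract_code('VCALogParser_output_ARW.log'))  # ARW
print(extract_code('notes.txt'))                    # None
```

Returning None leaves it to the caller to decide whether a non-matching name should be skipped or reported.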
Also, it might be helpful to use re.sub with a positive lookbehind and a named group:
>>> p = re.compile(r'.*(?<=_)(?P<mystr>[a-zA-Z0-9]+)\.log$')
>>> p.sub(r'\g<mystr>', s1)
'ARW'
>>> p.sub(r'\g<mystr>', s2)
'CZC'
>>> '\n{}, {}\n'.format(p.sub(r'\g<mystr>', s1), p.sub(r'\g<mystr>', s2))
'\nARW, CZC\n'
If you cannot or do not want to use the re module, you can compute the lengths of the parts you don't need and slice your strings with them:
>>> i1 = len('VCALogParser_output_')
>>> i2 = len('.log')
>>> '\n{}, {}\n'.format(s1[i1:-i2], s2[i1:-i2])
'\nARW, CZC\n'
But keep in mind that the above is valid as long as you have those common strings in all of your string variables.
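On Python 3.9 and later, str.removeprefix and str.removesuffix express the same idea without computing lengths by hand, and they leave the string unchanged when the prefix or suffix is absent. A small sketch, assuming the same fixed prefix and extension (the constant and function names are illustrative):

```python
# Requires Python 3.9+ for removeprefix/removesuffix.
PREFIX = 'VCALogParser_output_'   # assumed common prefix
SUFFIX = '.log'                   # assumed common extension

def strip_fixed(name):
    """Drop the known prefix and suffix if they are present."""
    return name.removeprefix(PREFIX).removesuffix(SUFFIX)

print(strip_fixed('VCALogParser_output_ARW.log'))  # ARW
print(strip_fixed('VCALogParser_output_CZC.log'))  # CZC
```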

fname.split('_')[-1]
is rough, but it will give you 'CZC.log', 'ARW.log' and so on, assuming that all files have the same underscore-delimited format.

If the format of the file is always such that it ends with _ARW.log or _CZC.log this is really easy to do just using the standard string split() method, with two consecutive splits:
shortname = filename.split("_")[-1].split('.')[0]
Or, to make it (arguably) a bit more readable, we can use the os module (this needs import os):
shortname = os.path.splitext(filename)[0].split("_")[-1]
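Wrapping the os.path.splitext variant in a function makes it easy to reuse in the write call from the question. This is only a sketch: the helper name short_name is made up here, and os.path.basename is added on the assumption that a directory part may precede the file name.

```python
import os

def short_name(filename):
    """Return the last underscore-separated field of the name, without extension."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return stem.split('_')[-1]

for name in ['VCALogParser_output_ARW.log', 'VCALogParser_output_CZC.log']:
    print(short_name(name))
# prints ARW, then CZC
```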

You can also try:
>>> s1 = 'VCALogParser_output_ARW.log'
>>> s2 = 'VCALogParser_output_CZC.log'
>>> s1.split('_')[2].split('.')[0]
'ARW'
>>> s2.split('_')[2].split('.')[0]
'CZC'

Parse the file name correctly. My guess is that you want to strip the file extension .log and the prefix VCALogParser_output_; for that, str.replace is enough and str.split is not needed.
Use os.linesep when writing to the file to get cross-platform line endings. Note that for a file opened in text mode Python already translates '\n' to the platform line ending, so a plain '\n' is sufficient there.
The code below produces the desired result after applying the steps listed above:
import os
filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')
linesep = os.linesep
f_output.write('{linesep}{fname}, {summary}{linesep}'
               .format(fname=fname, summary=summary, linesep=linesep))
# or, if the variables in scope are strictly controlled, pass locals() to format
f_output.write('{linesep}{fname}, {summary}{linesep}'.format(**locals()))
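For reference, a self-contained variant of the snippet above. It is a sketch under a few assumptions: the output file name summary.txt and the summary text are placeholders, and the file is opened in text mode, where Python translates '\n' to the platform line ending on its own, so a plain '\n' is written instead of os.linesep.

```python
import os
import tempfile

filename = 'VCALogParser_output_SOME_NAME.log'
summary = 'some summary'   # placeholder text

# Strip the known prefix and extension, as in the answer above.
fname = filename.replace('VCALogParser_output_', '').replace('.log', '')

# Placeholder output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'summary.txt')
with open(out_path, 'w') as f_output:
    f_output.write('\n{}, {}\n'.format(fname, summary))

with open(out_path) as f:
    print(repr(f.read()))  # '\nSOME_NAME, some summary\n'
```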

Related

How to extract a substring from a string in Python?

Suppose I have a string, text2='C:\Users\Sony\Desktop\f.html', and I want to separate "C:\Users\Sony\Desktop" and "f.html" and store them in different variables. What should I do? I tried regular expressions but wasn't successful.

os.path.split does what you want:
>>> import os
>>> help(os.path.split)
Help on function split in module ntpath:
split(p)
    Split a pathname.
    Return tuple (head, tail) where tail is everything after the final slash.
    Either part may be empty.
>>> os.path.split(r'c:\users\sony\desktop\f.html')
('c:\\users\\sony\\desktop', 'f.html')
>>> path,filename = os.path.split(r'c:\users\sony\desktop\f.html')
>>> path
'c:\\users\\sony\\desktop'
>>> filename
'f.html'
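Note that os.path.split follows the conventions of the platform the code runs on, so the backslash example above behaves this way on Windows (the help text shows the ntpath module). As a sketch of platform-independent alternatives from the standard library, ntpath and pathlib.PureWindowsPath always apply Windows rules:

```python
import ntpath
from pathlib import PureWindowsPath

text2 = r'C:\Users\Sony\Desktop\f.html'  # raw string keeps the backslashes literal

# ntpath is the Windows flavour of os.path and works on any platform.
head, tail = ntpath.split(text2)
print(head)  # C:\Users\Sony\Desktop
print(tail)  # f.html

# pathlib exposes the same pieces as attributes.
p = PureWindowsPath(text2)
print(p.parent)  # C:\Users\Sony\Desktop
print(p.name)    # f.html
```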

Complex regex in Python

I am trying to write a generic pattern using regex so that it fetches only particular things from the string. Let's say we have strings like GigabitEthernet0/0/0/0 or FastEthernet0/4 or Ethernet0/0.222. The regex should fetch the first 2 characters and all the numerals. Therefore, the fetched result should be something like Gi0000 or Fa04 or Et00222 depending on the above cases.
x = 'GigabitEthernet0/0/0/2'
m = re.search('([\w+]{2}?)[\\\.(\d+)]{0,}',x)
I am not able to understand how I should write the regular expression. The values can also be fetched as a list. I wrote a few more patterns, but they aren't helping.

With regex, you can use the re.findall function.
>>> import re
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join(re.findall(r'\d', s))
'Gi0000'
OR
>>> ''.join(re.findall(r'^..|\d', s))
'Gi0000'
>>> ''.join(re.findall(r'^..|\d', 'Ethernet0/0.222'))
'Et00222'
OR
>>> s = 'GigabitEthernet0/0/0/0 '
>>> s[:2]+''.join([i for i in s if i.isdigit()])
'Gi0000'
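The first variant can be wrapped in a small function and checked against all three names from the question. A sketch, assuming the first two characters of the name are always letters (the function name abbreviate is illustrative):

```python
import re

def abbreviate(interface):
    """First two characters of the name followed by every digit in it."""
    return interface[:2] + ''.join(re.findall(r'\d', interface))

for name in ['GigabitEthernet0/0/0/0', 'FastEthernet0/4', 'Ethernet0/0.222']:
    print(abbreviate(name))
# prints Gi0000, Fa04, Et00222
```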

import re
z = "Ethernet0/0.222."
print(z[:2] + "".join(re.findall(r"(\d+)(?=[\d\W]*$)", z)))
You can try this. It makes sure that only the digits at the end of the string come into play.

Here is another option:
s = 'Ethernet0/0.222'
"".join(re.findall(r'^\w{2}|\d+', s))

How to delete everything after a certain character in a string?

How would I delete everything after a certain character of a string in Python? For example, I have a string containing a file path and some extra characters. How would I delete everything after .zip? I've tried rsplit and split, but neither kept the .zip when deleting the extra characters.
Any suggestions?

Just take the first portion of the split, and add '.zip' back:
s = 'test.zip.zyz'
s = s.split('.zip', 1)[0] + '.zip'
Alternatively you could use slicing, here is a solution where you don't need to add '.zip' back to the result (the 4 comes from len('.zip')):
s = s[:s.index('.zip')+4]
Or another alternative with regular expressions:
import re
s = re.match(r'^.*?\.zip', s).group(0)

str.partition:
>>> s='abc.zip.blech'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.zip'
>>> ''.join(s.partition('.zip')[0:2])
'abc.zip'
>>> s='abc.py'
>>> ''.join(s.partition('.zip')[0:2])
'abc.py'

Use slices:
s = 'test.zip.xyz'
s[:s.index('.zip') + len('.zip')]
=> 'test.zip'
And it's easy to pack the above in a little helper function:
def removeAfter(string, suffix):
    return string[:string.index(suffix) + len(suffix)]
removeAfter('test.zip.xyz', '.zip')
=> 'test.zip'
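The str.index-based slices above raise ValueError when the marker is missing from the string. A sketch of a variant that returns the string unchanged in that case, using str.find (the function name remove_after is illustrative):

```python
def remove_after(string, marker):
    """Cut everything after the first occurrence of marker; keep the marker."""
    pos = string.find(marker)
    if pos == -1:          # marker not present: nothing to cut
        return string
    return string[:pos + len(marker)]

print(remove_after('test.zip.xyz', '.zip'))  # test.zip
print(remove_after('abc.py', '.zip'))        # abc.py
```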

I think it's easy to create a simple lambda function for this.
mystrip = lambda s, ss: s[:s.index(ss) + len(ss)]
Can be used like this:
mystr = "this should stay.zipand this should be removed."
mystrip(mystr, ".zip") # 'this should stay.zip'

You can use the re module:
import re
re.sub(r'\.zip.*', '.zip', 'test.zip.blah')

Wildcard matching substring in Python

I am completely new to Python and don't know how to get a sub-string which matches some wildcard condition from a string.
I am trying to get a timestamp from the following string:
sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data
I want to get only "1360922654.97671" part out of the string.
Please help.

Because you mentioned wildcards, you can use re:
In [77]: import re
In [78]: s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
In [79]: re.findall(r"\d+\.\d+", s)
Out[79]: ['1360922654.97671']

If the dots and dashes have their specific function within your string, you can use this:
>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
Step by step:
>>> s.rsplit('.', 1)
['sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671', 'data']
>>> s.rsplit('.', 1)[0]
'sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671'
>>> s.rsplit('.', 1)[0].split('-')
['sdc4', '251504', '7f5', 'f59c349f0e516894fc89d2686a0d57f5', '1360922654.97671']
>>> s.rsplit('.', 1)[0].split('-')[-1]
'1360922654.97671'
This will work for any strings in the form:
anything-WHATYOUWANT.stringwithoutdots
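Once extracted, the value can be converted for further use. A short sketch, assuming (the question does not say so explicitly) that the number is seconds since the Unix epoch:

```python
from datetime import datetime, timezone

s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"

stamp = s.rsplit('.', 1)[0].split('-')[-1]   # '1360922654.97671'
seconds = float(stamp)

# Assumption: the value is seconds since the Unix epoch.
when = datetime.fromtimestamp(seconds, tz=timezone.utc)
print(when.date())  # 2013-02-15
```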

>>> s = "sdc4-251504-7f5-f59c349f0e516894fc89d2686a0d57f5-1360922654.97671.data"
>>> s.split('-')[-1][:-5]
'1360922654.97671'
Slightly fewer characters, but this only works where the string ends in .data or another 5-character suffix (dot included).

Python - Extract important string information

I have the following string
http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342
What is the best way to extract the id value, in this case 32434242423423234?
Regardz,
Mladjo

You could just use a regular expression, e.g.:
import re
s = "http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342"
m = re.search(r'controller/id(\d+)\?',s)
if m:
    print("Found the id:", m.group(1))
If you need the value as a number rather than a string, you can use int(m.group(1)). There are plenty of other ways of doing this that might be more appropriate, depending on the larger goal of your code, but without more context it's hard to say.

>>> from urllib.parse import urlparse  # Python 3; in Python 2 this was the urlparse module
>>> res = urlparse("http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342")
>>> res.path
'/variable/controller/id32434242423423234'
>>> import posixpath
>>> posixpath.split(res.path)
('/variable/controller', 'id32434242423423234')
>>> directory,filename=posixpath.split(res.path)
>>> filename[2:]
'32434242423423234'
Using urlparse and posixpath might be too much for this case, but I think it is the clean way to do it.
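A sketch of the same approach as a script, using urllib.parse (the Python 3 home of these functions) and also reading the query parameters with parse_qs. The variable names are illustrative, and the code assumes the last path segment always starts with the literal prefix id:

```python
import posixpath
from urllib.parse import urlsplit, parse_qs

url = "http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342"

parts = urlsplit(url)
last_segment = posixpath.basename(parts.path)   # 'id32434242423423234'

# Assumption: the segment always starts with the literal prefix 'id'.
id_value = last_segment[len('id'):]
params = parse_qs(parts.query)

print(id_value)  # 32434242423423234
print(params)    # {'param1': ['321'], 'param2': ['4324342']}
```

parse_qs returns a list per key because a query string may repeat a parameter.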

>>> s
'http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342'
>>> s.split("id")
['http://example.com/variable/controller/', '32434242423423234?param1=321&param2=4324342']
>>> s.split("id")[-1].split("?")[0]
'32434242423423234'
>>>

While regex is THE way to go, for simple things I have written a string parser. In a way, it is the (incomplete) reverse of a string formatting operation with PEP 3101. This is very convenient because it means that you do not have to learn another way of specifying the strings.
For example:
>>> 'The answer is {:d}'.format(42)
'The answer is 42'
The parser does the opposite:
>>> Parser('The answer is {:d}')('The answer is 42')
42
For your case, if you want an int as output
>>> url = 'http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342'
>>> fmt = 'http://example.com/variable/controller/id{:d}?param1=321&param2=4324342'
>>> Parser(fmt)(url)
32434242423423234
If you want a string:
>>> fmt = 'http://example.com/variable/controller/id{:s}?param1=321&param2=4324342'
>>> Parser(fmt)(url)
'32434242423423234'
If you want to capture more things in a dict:
>>> fmt = 'http://example.com/variable/controller/id{id:s}?param1={param1:s}&param2={param2:s}'
>>> Parser(fmt)(url)
{'id': '32434242423423234', 'param1': '321', 'param2': '4324342'}
Or in a tuple:
>>> fmt = 'http://example.com/variable/controller/id{:s}?param1={:s}&param2={:s}'
>>> Parser(fmt)(url)
('32434242423423234', '321', '4324342')
Give it a try, it is hosted here
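For comparison, a similar dict of fields can be obtained with the standard library alone by using named groups in a regular expression. This is not the Parser class described above, only a sketch of a stdlib equivalent for this one URL layout:

```python
import re

url = 'http://example.com/variable/controller/id32434242423423234?param1=321&param2=4324342'

# Each named group plays the role of a {name:s} field in the format string.
pattern = re.compile(
    r'http://example\.com/variable/controller/id(?P<id>\d+)'
    r'\?param1=(?P<param1>\d+)&param2=(?P<param2>\d+)$'
)

m = pattern.match(url)
fields = m.groupdict() if m else {}
print(fields)  # {'id': '32434242423423234', 'param1': '321', 'param2': '4324342'}
```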
