Equivalent to vlookup for strings in Python? - python

Is there a way to search for a substring in a larger string, and then return a different substring x places left or right of the original substring?
I want to look through a string like blahblahlink:"www.example.com"blahblah for the string "link:" and return the URL that follows it.
Thanks!
Python 3, if that matters.

I think you should use regular expressions. There is a module called re in Python for that.
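A minimal sketch of the regex approach, assuming the URL is wrapped in double quotes right after "link:" as in the example string. The helper name extract_link is hypothetical:

```python
import re

# Capture everything between link:" and the next double quote.
# Assumption: the URL is always enclosed in double quotes after "link:".
LINK_RE = re.compile(r'link:"([^"]*)"')

def extract_link(text):
    """Hypothetical helper: return the quoted URL after link:, or None."""
    match = LINK_RE.search(text)
    return match.group(1) if match else None

print(extract_link('blahblahlink:"www.example.com"blahblah'))  # www.example.com
```

Unlike fixed offsets, the quote acts as the end delimiter, so the URL length does not need to be known in advance.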

A pure-Python solution would be to use str.index, which tells you where in the string the first match occurs, and then use [start:end] slicing to select the string from that point:
"blahblahlink:www.example.comblahblah".index("link")
# returns 8
i = "blahblahlink:www.example.comblahblah".index("link")
"blahblahlink:www.example.comblahblah"[i+5:i+20]
# returns 'www.example.com'
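The hard-coded 5 and 20 above only fit this particular marker and URL. To cover the original "x places left or right" wording, the offset can be measured from the end of the marker instead. A sketch with a hypothetical helper text_near; the length still has to be known in advance:

```python
def text_near(haystack, marker, offset, length):
    """Hypothetical helper: return `length` characters starting `offset`
    characters after the end of `marker` (negative offsets go left).

    Raises ValueError if the marker is missing (str.index does that) or if
    the offset reaches before the start of the string.
    """
    start = haystack.index(marker) + len(marker) + offset
    if start < 0:
        # A negative slice start would silently wrap around to the end.
        raise ValueError("offset reaches before the start of the string")
    return haystack[start:start + length]

s = "blahblahlink:www.example.comblahblah"
print(text_near(s, "link:", 0, 15))   # www.example.com
print(text_near(s, "link:", -9, 4))   # blah
```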

Related

Searching list of strings using regular expressions in python

I have a list of Strings. I want to select the strings which match a certain pattern using regular expression.
Python regular expressions don't take a list, and I don't want to use loops.
Any suggestion?
Try:
def searcher(s):
    if COMPILED_REGEXP_OBJECT.search(s):
        return s
# In Python 3, filter() returns an iterator; list() collects the results.
matching_strings = list(filter(searcher, YOUR_LIST_OF_STRING))
searcher() returns the string if it matches, otherwise it returns None. filter() only keeps "truthy" objects, so it will skip the Nones. It will also skip empty strings, but I doubt that's a problem.
Or, better, as @JonClements pointed out:
matching_strings = list(filter(COMPILED_REGEXP_OBJECT.search, YOUR_LIST_OF_STRING))
Not only is it shorter, it also looks up the .search method only once (instead of once per string).
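A self-contained version of the filter() answer, with a made-up pattern and list standing in for COMPILED_REGEXP_OBJECT and YOUR_LIST_OF_STRING, plus the equivalent list comprehension:

```python
import re

# Hypothetical pattern and data standing in for the placeholders above.
pattern = re.compile(r"^ab")
strings = ["abc", "xyz", "abd", ""]

# filter() keeps the items for which pattern.search returns a match object.
matching = list(filter(pattern.search, strings))

# The same selection as a list comprehension.
matching_lc = [s for s in strings if pattern.search(s)]

print(matching)  # ['abc', 'abd']
```

Both forms still loop internally; they just hide the loop behind a single expression.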

Select a substring before the second occurrence of a character without a string slice

How can I select a substring, using Python, that only contains the characters up to the second colon? For example, say I have a string ABC*01:02:03:04 and another, A*101:23:444. How could I extract the substrings ABC*01:02 and A*101:23 from these strings without slicing the string, that is, without something along the lines of mystring[:5]?
You could write:
':'.join('ABC*01:02:03:04'.split(':')[:2])
It uses slicing, but it gives you the first two groups instead of a fixed number of characters.
You can use regular expressions.
import re
re.match(r'(.*?:.*?):.*', 'ABC*01:02:03:04').groups()[0]
-> 'ABC*01:02'
re.match(r'(.*?:.*?):.*', 'A*01:02:03:0').groups()[0]
-> 'A*01:02'
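If the goal is to avoid slicing entirely, str.split with a maxsplit of 2 plus Python 3 extended unpacking does it. A sketch with a hypothetical function name:

```python
def up_to_second_colon(s):
    """Hypothetical helper: return the part of s before its second colon,
    without any slicing.

    With only one colon the whole string comes back; with no colon the
    unpacking raises ValueError.
    """
    # maxsplit=2 stops after the second colon and leaves the rest in one piece.
    first, second, *_rest = s.split(":", 2)
    return first + ":" + second

print(up_to_second_colon("ABC*01:02:03:04"))  # ABC*01:02
print(up_to_second_colon("A*101:23:444"))     # A*101:23
```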

Parse a string to find and remove a float

I have created a Python method that takes a string of variable length, which will always include a floating-point number at the end:
"adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"
OR
"asdif adfi=9393 adfkdsf:1.84938"
I need to parse the string and return the floating-point number at the end. There is usually a delimiter character before the float, such as :, - or a space.
def findFloat(stringArg):
    stringArg.rstrip()
    stringArg.replace("-",":")
    if stringArg.rfind(":"):
        locateFloat = stringArg.rsplit(":")
    #second element should be the desired float
    magicFloat = locateFloat[1]
    return magicFloat
I am receiving a
magicFloat = locateFloat[1]
IndexError: list index out of range
Any guidance on how to locate the float and return it would be awesome.
In Python, strings are immutable. No matter what function you call on a string, the actual text of that string does not change. Thus, methods like rstrip, replace etc. create a new string representing the modified version. (You would know this if you read the documentation.) In your code, you do not assign the results of these calls anywhere in the first two statements, so the results are lost.
Without specifying a number of splits, rsplit does the exact same thing that split does. It checks for splits from the end, sure, but it still splits at every possible point, so the net effect is the same. You need to specify that you want to split at most one time.
However, you shouldn't do that anyway; a much simpler way to get "everything after the last colon, or everything if there is no colon" is to use rpartition.
You don't actually have to remove whitespace from the end for float conversion. Although you probably should actually, you know, perform the conversion.
Finally, there is no point in assigning to a variable just to return it; just return the expression directly.
Putting that together gives us the exceptionally simple:
def findFloat(stringArg):
return float(stringArg.replace('-', ':').rpartition(':')[2])
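To make the rsplit/rpartition difference concrete, here is the behaviour on the question's first string (outputs as printed by Python 3):

```python
s = "adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"

# Without maxsplit, rsplit() splits at every colon, exactly like split().
all_parts = s.rsplit(":")
# With maxsplit=1 it splits only at the last colon.
last_split = s.rsplit(":", 1)
# rpartition() always returns (before, separator, after); when the separator
# is missing, "after" holds the whole string, so no IndexError is possible.
no_colon = "1.84938".rpartition(":")

print(all_parts)   # ['adsfasdflkdslf', 'asldfasf-adslfk', '1.5698464586546']
print(last_split)  # ['adsfasdflkdslf:asldfasf-adslfk', '1.5698464586546']
print(no_colon)    # ('', '', '1.84938')

def findFloat(stringArg):
    return float(stringArg.replace('-', ':').rpartition(':')[2])

print(findFloat(s))  # 1.5698464586546
```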
re always rocks. Depending on what your floating-point number looks like (leading 0?), something like:
magicFloat = re.search(r'.*([0-9]\.[0-9]+)', st).group(1)
P.S. If you do this a lot, precompile the regex first:
re_float = re.compile(r'.*([0-9]\.[0-9]+)')
# later in your code
magicFloat = re_float.search(st).group(1)
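You could do it in an easier manner:
import re
def findFloat(stringArg):
    s = stringArg.rstrip()
    return float(re.split('[-:]', s)[-1])
rstrip() returns the stripped string; you must store it somewhere.
str.split() treats its argument as one literal separator, so split('-:') would only split on the two-character sequence "-:". re.split() with a character class splits on either character, which also lets you drop the replace().
rsplit() is an optimization, but split()[-1] will always take the last element in the split list.
rfind() returns -1 (which is truthy) when nothing is found and 0 (which is falsy) when the match is at the very start, so your if test is unreliable; when it is false, locateFloat is never defined.
If you need to check for a character, you could write if ':' in stringArg: instead.
Hope those tips help you later :)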
If it's always at the end, you should use $ in your regex:
import re
def findFloat(stringArg):
    fl = re.search(r'([\.0-9]+)$', stringArg)
    return fl and float(fl.group(1))
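The character class [\.0-9]+ also matches a lone trailing ".", which float() then rejects with ValueError. A slightly stricter sketch, assuming the number is unsigned (the question treats "-" as a delimiter) and written like 1.5, 12 or .5; find_float is a hypothetical name:

```python
import re

# Optional digits, an optional dot, then at least one digit, anchored at the end.
# Assumption: no sign and no exponent (e.g. 1e-5 is not handled).
FLOAT_AT_END = re.compile(r"(\d*\.?\d+)$")

def find_float(text):
    """Hypothetical helper: return the trailing number as a float, or None."""
    match = FLOAT_AT_END.search(text)
    return float(match.group(1)) if match else None

print(find_float("adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"))  # 1.5698464586546
print(find_float("asdif adfi=9393 adfkdsf:1.84938"))                 # 1.84938
print(find_float("no number here."))                                 # None
```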
You can use regular expressions.
>>> import re
>>> st = "adsfasdflkdslf:asldfasf-adslfk:1.5698464586546"
>>> float(re.split(r':|\s|-', st)[-1])
1.5698464586546
I have used re.split(pattern, string, maxsplit=0, flags=0), which splits string by the occurrences of pattern.
Here the pattern matches your delimiters: a colon (:), whitespace (\s) or a hyphen (-).

python regular expression replacing part of a matched string

I have a string that might look like this:
"myFunc('element','node','elementVersion','ext',12,0,0)"
I'm currently checking it for validity using the following, which works fine:
myFunc\((.+?)\,(.+?)\,(.+?)\,(.+?)\,(.+?)\,(.+?)\,(.+?)\)
Now I'd like to replace whatever string is in the 3rd parameter.
Unfortunately, I can't just use a string replace on the substring in the 3rd position, since the same substring could appear anywhere else in the string.
With this and re.findall,
myFunc\(.+?\,.+?\,(.+?)\,.+?\,.+?\,.+?\,.+?\)
I was able to get the contents of the substring in the 3rd position, but re.sub does not replace that substring; it just returns the string I want to replace it with :/
Here's my code:
import re
myRe = re.compile(r"myFunc\(.+?\,.+?\,(.+?)\,.+?\,.+?\,.+?\,.+?\)")
val = "myFunc('element','node','elementVersion','ext',12,0,0)"
print(myRe.findall(val))
print(myRe.sub("noVersion", val))
Any idea what I've missed?
Thanks!
Seb
In re.sub, you need to specify a substitution for the whole matching string. That means you need to repeat the parts that you don't want to replace. This works:
myRe = re.compile(r"(myFunc\(.+?\,.+?\,)(.+?)(\,.+?\,.+?\,.+?\,.+?\))")
print(myRe.sub(r'\1"noversion"\3', val))
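The difference between the question's pattern and this answer's is easy to see side by side. In this sketch the replacement uses single quotes around noVersion to match the other arguments; that choice is an assumption, not part of the original answer:

```python
import re

val = "myFunc('element','node','elementVersion','ext',12,0,0)"

# re.sub replaces the WHOLE match, not just the captured group. Because this
# pattern matches the entire call, the result is only the replacement text.
whole = re.sub(r"myFunc\(.+?,.+?,(.+?),.+?,.+?,.+?,.+?\)", "noVersion", val)

# Capturing the surrounding parts and writing them back keeps them intact.
fixed = re.sub(r"(myFunc\(.+?,.+?,)(.+?)(,.+?,.+?,.+?,.+?\))", r"\1'noVersion'\3", val)

print(whole)  # noVersion
print(fixed)  # myFunc('element','node','noVersion','ext',12,0,0)
```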
If your only tool is a hammer, every problem looks like a nail. A regular expression is a powerful hammer, but it is not the best tool for every task.
Some tasks are better handled by a parser. In this case the argument list in the string is just like a Python tuple, so you can cheat and use Python's built-in parser:
>>> import re
>>> strdata = "myFunc('element','node','elementVersion','ext',12,0,0)"
>>> args = re.search(r'\(([^\)]+)\)', strdata).group(1)
>>> eval(args)
('element', 'node', 'elementVersion', 'ext', 12, 0, 0)
If you can't trust the input, ast.literal_eval is safer than eval for this. Once you have deconstructed the argument list in the string, I think you can figure out how to manipulate it and reassemble it again, if needed.
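One way to finish the parse-and-reassemble idea with ast.literal_eval. This is a sketch that assumes the arguments contain no parentheses and that there is more than one argument (so literal_eval returns a tuple):

```python
import ast
import re

strdata = "myFunc('element','node','elementVersion','ext',12,0,0)"

# Split the call into its name and the text between the parentheses.
match = re.match(r"(\w+)\(([^)]*)\)$", strdata)
func_name, arg_text = match.group(1), match.group(2)

# literal_eval only accepts Python literals, so it is safe on untrusted input.
args = list(ast.literal_eval(arg_text))
args[2] = "noVersion"

# repr() re-quotes the strings and leaves the numbers as they are.
rebuilt = "%s(%s)" % (func_name, ",".join(repr(a) for a in args))
print(rebuilt)  # myFunc('element','node','noVersion','ext',12,0,0)
```

Because the string is parsed rather than pattern-matched, commas inside quoted arguments are handled correctly, which the split(",") approach cannot do.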
Read the documentation: re.sub returns a copy of the string where every occurrence of the entire pattern is replaced with the replacement. It cannot in any case modify the original string, because Python strings are immutable.
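Try using a look-ahead assertion so that the regex only consumes the call up to the end of the element itself. Python's re module only supports fixed-width look-behind, so a (?<=myFunc\(.+?\,.+?\,) prefix raises an error; capture the prefix as a group and write it back with \1 instead:
myRe = re.compile(r"(myFunc\(.+?,.+?,)(.+?)(?=,.+?,.+?,.+?,.+?\))")
print(myRe.sub(r"\1'noVersion'", val))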
Have you tried using named groups? http://docs.python.org/howto/regex.html#search-and-replace
Hopefully that will let you just target the 3rd match.
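Named groups do not change the rule that re.sub replaces the whole match, but they make the pattern easier to read: (?P<name>...) names a group and \g<name> refers to it in the replacement. A sketch with group names chosen for illustration:

```python
import re

val = "myFunc('element','node','elementVersion','ext',12,0,0)"

# head/third/tail are illustrative group names.
pattern = re.compile(
    r"(?P<head>myFunc\(.+?,.+?,)(?P<third>.+?)(?P<tail>,.+?,.+?,.+?,.+?\))"
)

# The third argument can be read by name...
third = pattern.search(val).group("third")
# ...and replaced while head and tail are written back unchanged.
replaced = pattern.sub(r"\g<head>'noVersion'\g<tail>", val)

print(third)     # 'elementVersion'
print(replaced)  # myFunc('element','node','noVersion','ext',12,0,0)
```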
If you want to do this without using regex:
>>> s = "myFunc('element','node','elementVersion','ext',12,0,0)"
>>> l = s.split(",")
>>> l[2]="'noVersion'"
>>> s = ",".join(l)
>>> s
"myFunc('element','node','noVersion','ext',12,0,0)"

Pad an integer using a regular expression

I'm using regular expressions with a Python framework to pad a specific number in a version number:
10.2.11
I want to transform the second element to be padded with a zero, so it looks like this:
10.02.11
My regular expression looks like this:
^(\d{2}\.)(\d{1})([\.].*)
If I just regurgitate back the matching groups, I use this string:
\1\2\3
When I use my favorite regular expression test harness (http://kodos.sourceforge.net/), I can't get it to pad the second group. I tried \1\20\3, but that interprets the second reference as 20, and not 2.
Because of the library I'm using this with, I need it to be a one liner. The library takes a regular expression string, and then a string for what should be used to replace it with.
I'm assuming I just need to escape the matching groups string, but I can't figure it out. Thanks in advance for any help.
How about a completely different approach?
nums = version_string.split('.')
print(".".join("%02d" % int(n) for n in nums))
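The same idea can also be written with str.zfill, which pads on the left with zeros and needs no int conversion. One difference: "%02d" % int("007") gives "07", while "007".zfill(2) keeps "007". A sketch with a hypothetical function name:

```python
def pad_version(version, width=2):
    """Hypothetical helper: zero-pad every dot-separated part to `width` digits."""
    return ".".join(part.zfill(width) for part in version.split("."))

print(pad_version("10.2.11"))  # 10.02.11
```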
What about moving the dots out of the capture groups?
^(\d{2})\.(\d{1})[\.](.*)
replace with:
\1.0\2.\3
Try this:
(^\d(?=\.)|(?<=\.)\d(?=\.)|(?<=\.)\d$)
And replace the match with 0\1. This pads every single-digit component, so each number ends up at least two digits long.
Does your library support named groups? That might solve your problem.
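For the original one-liner, Python's re replacement syntax has an unambiguous numbered group reference, \g<number>, which ends the reference explicitly so a following digit stays literal. Whether the library in question passes the replacement string to re.sub unchanged is an assumption:

```python
import re

version = "10.2.11"

# \g<1> ends the group reference explicitly, so the following 0 is a literal
# character rather than part of a group number (\10 would mean group 10).
padded = re.sub(r"^(\d{2}\.)(\d{1})([\.].*)", r"\g<1>0\2\3", version)
print(padded)  # 10.02.11
```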
