Python: extracting text from strings using a key phrase - python

Struggling trying to find a way to do this, any help would be great.
I have a long string – it’s the Title field. Here are some samples.
AIR-LAP1142N-A-K
AIR-LP142N-A-K
Used Airo 802.11n Draft 2.0 SingleAccess Point AIR-LP142N-A-9
Airo AIR-AP142N-A-K9 IOS Ver 15.2
MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n
Genuine Airo 112N AP AIR-LP114N-A-K9 PoE
Wireless AP AIR-LP114N-A-9 Airy 50 availiable
I need to pull the part number out of the Title and assign it to a variable named 'PartNumber'. The part number will always start with the characters 'AIR-'.
So, for example:
Title = 'AIR-LAP1142N-A-K9 W/POWER CORD'
PartNumber = yourformula(Title)
print(PartNumber) will output AIR-LAP1142N-A-K9
I am fairly new to Python and would greatly appreciate help. I would like it to print ONLY the part number, not any of the other text before or after it.

What you're looking for is called regular expressions, implemented in the re module. For instance, you could write something like:
>>> import re
>>> def format_title(title):
...     return re.search(r"(AIR-\S*)", title).group(1)
...
>>> Title = "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"
>>> PartNumber = format_title(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9
The \S* matches every non-whitespace character after AIR-, so the match runs from AIR- up to the next whitespace character.

def yourFunction(title):
    # return the first whitespace-separated word that starts with 'AIR-'
    for word in title.split():
        if word.startswith('AIR-'):
            return word
>>> PartNumber = yourFunction(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9

This is a sensible time to use a regular expression. It looks like the part number consists of upper-case letters, hyphens, and numbers, so this should work:
import re
def extract_part_number(title):
    return re.search(r'(AIR-[A-Z0-9\-]+)', title).groups()[0]
This will raise an AttributeError if it gets a string that doesn't contain something that looks like a part number, because re.search returns None in that case, so you'll probably want to check for None before calling groups.
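
A minimal sketch of the None check mentioned above. The name extract_part_number_or_none and the third sample title are made up for illustration; the pattern is the one from this answer.

```python
import re

# Pattern from the answer above: AIR- followed by upper-case letters, digits, or hyphens.
PART_NUMBER_RE = re.compile(r'AIR-[A-Z0-9-]+')

def extract_part_number_or_none(title):
    """Return the first AIR- part number in title, or None if there is none."""
    match = PART_NUMBER_RE.search(title)
    return match.group(0) if match else None

titles = [
    "AIR-LAP1142N-A-K9 W/POWER CORD",
    "MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n",
    "Wireless AP with no part number",  # hypothetical title without a match
]
for title in titles:
    print(extract_part_number_or_none(title))
```

Returning None leaves it to the caller to decide what a missing part number means (skip the row, log it, and so on).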

You could also use the .split() method, which splits the text on whitespace and returns the pieces as a list.
To do this, I'd make a new variable (named whatever you like); for this example, let's go with titleSplitList, assigned as titleSplitList = Title.split().
If the part number is always the second word of the title, as in "Cisco AIR-LAP1142N-A-K9 W/POWER CORD", you can then assign it to a new variable with:
PartNumber = titleSplitList[1]
Note that in the sample titles from the question the part number appears at different positions, so a fixed index only works when the layout is known in advance.
Hope this helps.

Related

How to grab a specified number of characters after a part of a specified string?

Let's say I have a string defined like this:
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
The goal is to grab the 11 characters (these for example '345jk3n45jk') after a part of the string (this part for example 'randomstring') using a specified search term and the specified number of characters to grab after that search term.
I tried doing something like this:
string2 = substring(string1,'randomstring', 11)
I appreciate any help you guys have to offer!
string2 = string1[string1.find("randomstring")+len("randomstring"):string1.find("randomstring")+len("randomstring")+11]
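
The one-liner above calls find three times and silently returns the wrong slice when the search term is missing (find returns -1). Here is a small sketch of the same idea using str.partition; the helper name substring_after is just an illustration based on the call in the question.

```python
def substring_after(text, term, count):
    """Return up to `count` characters following the first `term` in `text`.

    Returns None when `term` does not occur at all.
    """
    before, found, after = text.partition(term)
    if not found:
        return None
    return after[:count]

string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
string2 = substring_after(string1, 'randomstring', 11)
print(string2)
```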
In one line, using split. This takes the text after the first occurrence of randomstring, so the search term has to occur at least once (otherwise [1] raises an IndexError):
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
randomstring = 'randomstring'
nb_char_to_take = 11
# split on randomstring, take the piece after the first occurrence (index 1 of the list), then keep its first 11 characters
result = string1.split(randomstring)[1][:nb_char_to_take]
You can use a simple regular expression like this
import re
s = "23h4b245hjrandomstring345jk3n45jkotherrandomstring"
result = re.findall("randomstring(.{11})", s)[0]
string1 = '23h4b245hjrandomstring345jk3n45jkotherrandomstring'
string2 = string1[10:22]
print(string2)
randomstring
You could use that. It's called string slicing: you count the positions of the characters, and the number before the colon is the starting index while the number after it is the ending index (exclusive), so you get whatever lies between those positions; an optional third number sets the step. Here the slice [10:22] returns the search term itself; the 11 characters after it would be string1[22:33]. I highly suggest you search for string slicing on YouTube, as my explanation may not help much, and also look up the string find method; those should help you get the idea behind these functions. Sorry I couldn't be of more help; I hope the videos help.

How can I find a specific string in a variable using re and change it?

How can I find a specific string in a variable and change it with regular expressions?
For example:
import re
Variable_To_Change = "This is Variable Number ^NUMBER^"
How can I use re to find the text ^NUMBER^ and change the variable so that it doesn't say ^NUMBER^ but an actual number, like "This is Variable Number 1"?
Why would you use re for this? I don't know, but here you go:
re.sub(r"\^NUMBER\^", "1", my_string)
You could just use:
my_string.replace("^NUMBER^", "1")
I'm now going to make some assumptions:
you have a data structure as follows
data = {"NUMBER":1,"STRING":"hello friend","BOOL":True}
and you have a string as follows
my_string = "I have ^NUMBER^ of apples to share with ^STRING^ and this is ^BOOL^"
and you want to substitute the data from your data dictionary into the string.
This can be done with re or str.replace quite easily (if the original question had been better defined, I would have led with this):
# with replace
for key, value in data.items():
    my_string = my_string.replace("^{key}^".format(key=key), str(value))
print(my_string)
# with re
def match_found(match):
    # re.sub needs the callback to return a string
    return str(data.get(match.group(1), "???UNKNOWN VAR???"))
my_string = re.sub(r"\^([A-Z]+)\^", match_found, my_string)
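
For reference, a self-contained sketch of the re.sub variant. The ^MISSING^ placeholder and the names template and fill_placeholder are additions for illustration, to show what happens with a key that is not in the dictionary. The callback converts each value with str(), since re.sub expects the replacement function to return a string.

```python
import re

data = {"NUMBER": 1, "STRING": "hello friend", "BOOL": True}
# ^MISSING^ is a made-up placeholder with no entry in data.
template = "I have ^NUMBER^ of apples to share with ^STRING^ and this is ^BOOL^ (^MISSING^)"

def fill_placeholder(match):
    """Look up the captured placeholder name; fall back to a marker if unknown."""
    key = match.group(1)
    return str(data.get(key, "???UNKNOWN VAR???"))

result = re.sub(r"\^([A-Z]+)\^", fill_placeholder, template)
print(result)
```

The replace loop, by contrast, would leave ^MISSING^ untouched, which may or may not be what you want.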

I want to split a string by a character on its first occurence, which belongs to a list of characters. How to do this in python?

Basically, I have a list of special characters. I need to split a string by a character if it belongs to this list and exists in the string. Something on the lines of:
def find_char(string):
    if string.find("some_char") != -1:
        pass  # do xyz with some_char
    elif string.find("another_char") != -1:
        pass  # do xyz with another_char
    else:
        return False
and so on. The way I think of doing it is:
def find_char_split(string):
    char_list = [",","*",";","/"]
    for my_char in char_list:
        if string.find(my_char) != -1:
            my_strings = string.split(my_char)
            break
    else:
        my_strings = False
    return my_strings
Is there a more pythonic way of doing this? Or would the above procedure be fine? Please help, I'm not very proficient in Python.
(EDIT): I want it to split on the first occurrence of the character, which is encountered first. That is to say, if the string contains multiple commas, and multiple stars, then I want it to split by the first occurrence of the comma. Please note, if the star comes first, then it will be broken by the star.
I would favor using the re module for this because the expression for splitting on multiple arbitrary characters is very simple:
r'[,*;/]'
The brackets create a character class that matches anything inside of them. The code is like this:
import re
results = re.split(r'[,*;/]', my_string, maxsplit=1)
The maxsplit argument makes it so that the split only occurs once.
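
A quick sketch of that call in action, wrapped so that it returns False when no delimiter is present (mirroring the function in the question). The name split_on_first_delimiter and the sample strings are illustrative only; re.escape is used so the delimiters are safe inside the character class.

```python
import re

def split_on_first_delimiter(text, delimiters=",*;/"):
    """Split `text` once, at whichever delimiter character appears first."""
    pattern = "[" + re.escape(delimiters) + "]"
    parts = re.split(pattern, text, maxsplit=1)
    # With no delimiter in the text, re.split returns the text as a single item.
    return parts if len(parts) == 2 else False

print(split_on_first_delimiter("a*b,c*d"))  # the star comes first, so it splits there
print(split_on_first_delimiter("a,b*c,d"))  # the comma comes first
print(split_on_first_delimiter("abcd"))     # no delimiter at all
```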
If you are doing the same split many times, you can compile the regex and search on that same expression a little bit faster (but see Jon Clements' comment below):
c = re.compile(r'[,*;/]')
results = c.split(my_string)
If this speed-up is important (it probably isn't), you can wrap the compiled version in a function instead of recompiling every time. The function below compiles the expression once and returns an inner function that reuses it:
def split_chars(chars, maxsplit=0, flags=0, string=None):
    # see note about the + symbol below
    c = re.compile('[{}]+'.format(''.join(chars)), flags=flags)
    def f(string, maxsplit=maxsplit):
        return c.split(string, maxsplit=maxsplit)
    return f if string is None else f(string)
Then:
special_split = split_chars(',*;/', maxsplit=1)
result = special_split(my_string)
But also (string has to be passed by keyword, because the second positional parameter is maxsplit):
result = split_chars(',*;/', string=my_string, maxsplit=1)
The purpose of the + character is to treat consecutive delimiters as one if that is desired (thank you Jon Clements). If this is not desired, you can just use re.compile('[{}]'.format(''.join(chars))) above. Note that with maxsplit=1, the + only matters when the first delimiter is immediately followed by another one.
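
A small check of how the + interacts with consecutive delimiters, with and without maxsplit. The sample string is made up; the comments show what each call returns.

```python
import re

text = "a,;b*c"  # hypothetical input with two delimiters in a row

every_delim = re.split(r"[,*;/]", text)            # ['a', '', 'b', 'c']
merged_delims = re.split(r"[,*;/]+", text)         # ['a', 'b', 'c']
first_only = re.split(r"[,*;/]", text, maxsplit=1)      # ['a', ';b*c']
first_merged = re.split(r"[,*;/]+", text, maxsplit=1)   # ['a', 'b*c']

for parts in (every_delim, merged_delims, first_only, first_merged):
    print(parts)
```

So with maxsplit=1 the two patterns differ only when the first delimiter is directly followed by another one.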
Finally: have a look at this talk for a quick introduction to regular expressions in Python, and this one for a much more information packed journey.

Search for string in file while ignoring id and replacing only a substring

I’ve got a master .xml file generated by an external application and want to create several new .xmls by adapting and deleting some rows with python. The search strings and replace strings for these adaptions are stored within an array, e.g.:
replaceArray = [
[u'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"',
u'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"'],
[u'<TOOL_BUFFER RowID="106874" id_tool_base="3651" use="false"/>',
u'<TOOL_BUFFER RowID="106874" id_tool_base="3651" use="true"/>'],
[u'<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="false"/>',
u'<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="true"/>']]
So I'd like to iterate through my file and replace all occurrences of 'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"' with 'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"' and so on.
Unfortunately the ID values of "RowID", "id_tool_base" and "ref_layerid_mapping" might change occasionally. So what I need is to search for matches of the whole string in the master file regardless of which ID value is between the quotation marks, and to replace only the substring that differs between the two strings of the replaceArray (e.g. use="true" instead of use="false"). I'm not very familiar with regular expressions, but I think I need something like this for my search?
re.sub(r'<TOOL_SELECT_LINE RowID="\d+" id_tool_base="\d+" use="false"/>', "", sentence)
I'm happy about any hint that points me in the right direction! If you need any further information or if something is not clear in my question, please let me know.
One way to do this is to have a function for replacing text. The function gets the match object from re.sub and inserts the ID captured from the string being replaced.
import re
s = 'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"'
# capture the quoted ID only, so the match cannot run past the closing quote
pat = re.compile(r'ref_layerid_mapping=("[^"]*") lyvis="off" toc_visible="off"')
def replacer(m):
    # keep the captured ID and switch both flags on
    return "ref_layerid_mapping=" + m.group(1) + ' lyvis="on" toc_visible="on"'
re.sub(pat, replacer, s)
Output:
'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"'
Another way is to use back-references in replacement pattern. (see http://www.regular-expressions.info/replacebackref.html)
For example:
import re
s = "Ab ab"
re.sub(r"(\w)b (\w)b", r"\1d \2d", s)
Output:
'Ad ad'
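
Applied to one of the lines from the question, the back-reference approach might look like the sketch below. It assumes the IDs are always plain digits (as in the \d+ pattern the asker proposed) and that the attribute order is fixed; the variable names are illustrative.

```python
import re

line = '<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="false"/>'

# Group 1 captures everything up to use=, whatever the ID values are;
# group 2 captures the closing of the tag.
pattern = re.compile(
    r'(<TOOL_SELECT_LINE RowID="\d+" id_tool_base="\d+" use=)"false"(/>)'
)

# \1 and \2 put the captured text back unchanged; only "false" is swapped.
updated = pattern.sub(r'\1"true"\2', line)
print(updated)
```

Because the IDs are matched by \d+ rather than written out, the same pattern keeps working when RowID or id_tool_base change.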

Extracting text in the middle of a string - python

I was wondering if anyone has a simpler solution to extract a few letters from the middle of a string. I want to retrieve the 3 letters (in this case, GMB), and all the entries follow the same pattern. I'm struggling to find a simpler way of doing this.
Here is an example of what I've been using:
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
symbol = entry.strip('entries-alphabetical.jsp?raceid13=')
symbol = symbol[0:3]
print symbol
thanks
First of all, the argument passed to str.strip is not a prefix or suffix; it is a set of characters to be stripped from both ends of the string.
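
To illustrate the difference, here is a short sketch with a made-up value (sample) where strip removes too much. The helper strip_prefix is hypothetical; it only removes the text when the string really starts with it.

```python
def strip_prefix(text, prefix):
    """Remove `prefix` from the start of `text` only if it is really there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

sample = "raceid13=ace$1"  # hypothetical entry whose value starts with letters from the prefix

# str.strip treats its argument as a set of characters and trims both ends,
# so it also eats the leading "ace" and the trailing "1".
stripped = sample.strip("raceid13=")
print(stripped)

# Removing an exact prefix leaves the value intact.
prefixed = strip_prefix(sample, "raceid13=")
print(prefixed)
```

The code in the question happens to work only because the value starts and ends with upper-case letters that are not in the stripped set.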
Since the string looks like a URL, you can use urlparse.parse_qsl (Python 2; in Python 3 the same function lives in urllib.parse):
>>> import urlparse
>>> urlparse.parse_qsl(entry)
[('entries-alphabetical.jsp?raceid13', 'GMB$20140313A')]
>>> urlparse.parse_qsl(entry)[0][1][:3]
'GMB'
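
On Python 3, a roughly equivalent sketch uses urllib.parse. This variant splits off the query string first so the value can be looked up by its parameter name, raceid13, taken from the example entry.

```python
from urllib.parse import parse_qs, urlsplit

entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

# Everything after the "?" is the query string: raceid13=GMB$20140313A
query = urlsplit(entry).query

# parse_qs maps each parameter name to a list of its values.
value = parse_qs(query)["raceid13"][0]
symbol = value[:3]
print(symbol)
```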
This is what regular expressions are for. http://docs.python.org/2/library/re.html
import re
val = re.search(r'(GMB.*)', entry)
print val.group(1)
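
Note that the pattern above hard-codes GMB and returns the rest of the string as well. If only the three letters are wanted and they are not known in advance, a sketch along these lines may be closer to the question. It assumes the symbol is always three upper-case letters right after raceid13=.

```python
import re

entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

# Capture exactly three upper-case letters following the parameter name.
match = re.search(r'raceid13=([A-Z]{3})', entry)
symbol = match.group(1) if match else None
print(symbol)
```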
