Extracting text in the middle of a string - python

I was wondering if anyone has a simpler solution to extract a few letters from the middle of a string. I want to retrieve the 3 letters (in this case, GMB), and all the entries follow the same pattern. I'm struggling to find a simpler way of doing this.
Here is an example of what I've been using:
entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"
symbol = entry.strip('entries-alphabetical.jsp?raceid13=')
symbol = symbol[0:3]
print symbol
thanks

First of all, the argument passed to str.strip is not a prefix or suffix; it is a set of characters, and any of them found at either end of the string is stripped off.
Since the string looks like a URL, you can use urlparse.parse_qsl (in Python 3 the same function lives in urllib.parse):
>>> import urlparse
>>> urlparse.parse_qsl(entry)
[('entries-alphabetical.jsp?raceid13', 'GMB$20140313A')]
>>> urlparse.parse_qsl(entry)[0][1][:3]
'GMB'
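On Python 3 the urlparse module was merged into urllib.parse. The following is a minimal sketch of the same idea, together with a demonstration of why str.strip is the wrong tool for removing a prefix. The helper name symbol_from_entry and the second sample string are made up for illustration and do not come from the question.

```python
from urllib.parse import parse_qs, urlsplit

entry = "entries-alphabetical.jsp?raceid13=GMB$20140313A"

def symbol_from_entry(entry, param="raceid13", length=3):
    """Return the first `length` characters of the `param` query value, or None."""
    query = urlsplit(entry).query          # "raceid13=GMB$20140313A"
    values = parse_qs(query).get(param)    # ["GMB$20140313A"], or None if absent
    return values[0][:length] if values else None

print(symbol_from_entry(entry))  # GMB

# str.strip removes *characters*, not a prefix: here it also eats "ace".
sample = "raceid13=ace$2014"  # hypothetical entry with a lowercase symbol
print(sample.strip("raceid13="))  # $2014
```

The original strip call only appears to work because the symbol GMB happens to contain none of the characters in the argument.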

This is what regular expressions are for. http://docs.python.org/2/library/re.html
import re
val = re.search(r'(GMB.*)', entry)
print val.group(1)
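Note that the pattern above has the symbol hard-coded and also captures everything that follows it. A hedged sketch of a more general pattern, assuming the symbol is always three uppercase letters directly after an equals sign (an assumption based on the single example in the question; extract_symbol is a hypothetical name):

```python
import re

# Assumption: the symbol is exactly three uppercase letters right after "=".
SYMBOL = re.compile(r"=([A-Z]{3})")

def extract_symbol(entry):
    """Return the three-letter symbol, or None if the pattern is not found."""
    match = SYMBOL.search(entry)
    return match.group(1) if match else None

print(extract_symbol("entries-alphabetical.jsp?raceid13=GMB$20140313A"))  # GMB
```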

Related

How can I find a specific string in a variable using re and change it?

How can I find a specific string in a variable and change it with regular expressions?
For example:
import re
Variable_To_Change = "This is Variable Number ^NUMBER^"
How can I use re to find the text ^NUMBER^ and change the variable so that it no longer says ^NUMBER^ but contains an actual number, like "This is Variable Number 1"?
Why would you use re for this? I don't know, but here you go:
my_string = re.sub(r"\^NUMBER\^", "1", my_string)
You could just use:
my_string = my_string.replace("^NUMBER^", "1")
I'm now going to make some assumptions.
You have a data structure like the following:
data = {"NUMBER":1,"STRING":"hello friend","BOOL":True}
and you have a string like this:
my_string = "I have ^NUMBER^ of apples to share with ^STRING^ and this is ^BOOL^"
and you want to substitute the values from your data dictionary into the string.
This can be done with re or str.replace quite easily (if the original question had been better defined, I would have led with this):
# with replace
for key, value in data.items():
    my_string = my_string.replace("^{key}^".format(key=key), str(value))
print(my_string)
# with RE
def match_found(match):
    # re.sub needs a string back, so convert the looked-up value
    return str(data.get(match.group(1), "???UNKNOWN VAR???"))
my_string = re.sub(r"\^([A-Z]+)\^", match_found, my_string)
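For reference, here is a self-contained sketch of the dictionary-driven substitution described above. The function name fill_placeholders and the variable template are hypothetical; the data and the sentence mirror the answer's own example.

```python
import re

# Sample data mirroring the answer above.
data = {"NUMBER": 1, "STRING": "hello friend", "BOOL": True}
template = "I have ^NUMBER^ of apples to share with ^STRING^ and this is ^BOOL^"

def fill_placeholders(text, values, missing="???UNKNOWN VAR???"):
    """Replace every ^NAME^ token in text with str(values[NAME])."""
    def lookup(match):
        # The replacement callback must return a string.
        return str(values.get(match.group(1), missing))
    return re.sub(r"\^([A-Z]+)\^", lookup, text)

result = fill_placeholders(template, data)
print(result)  # I have 1 of apples to share with hello friend and this is True
```

Doing the lookup in a callback means the string is scanned once, and unknown placeholders get a visible marker instead of being silently left in place.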

Regular expression to retrieve string parts within parentheses separated by commas

I have a string from which I want to take the values within the parentheses, and then get the individual values, which are separated by commas.
Example: x(142,1,23ERWA31)
I would like to get:
142
1
23ERWA31
Is it possible to get everything with one regex?
I have found a method to do so, but it is ugly.
This is how I did it in python:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
secondResult = re.search(r"(?<=\()(.*?)(?=\))", firstResult.group(0))
finalResult = [x.strip() for x in secondResult.group(0).split(',')]
for i in finalResult:
    print(i)
142
1
23ERWA31
This works for your example string:
import re
string = "x(142,1,23ERWA31)"
l = re.findall(r'([^(,)]+)(?!.*\()', string)
print(l)
Result: a plain list
['142', '1', '23ERWA31']
The expression matches a run of characters that are not (, a comma, or ), and, to prevent the leading x from being picked up, the match may not be followed by a ( anywhere later in the string. This also makes it work if your preamble x consists of more than a single character.
findall rather than search makes sure all items are found, and as a bonus it returns a plain list of the results.
You can make this a lot simpler. You are running your first regex but then not using its captured group. You want .group(1) (the text inside the brackets), not .group(0) (the whole match). Once you have that, you can just split it on commas:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?)\)", string)
for e in firstResult.group(1).split(','):
    print(e)
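The same idea can be wrapped in a small function that also copes with input containing no parentheses. This is only a sketch; parenthesised_values is a hypothetical name, and it assumes only the first pair of parentheses matters.

```python
import re

def parenthesised_values(text):
    """Return the comma-separated items inside the first (...) in text."""
    match = re.search(r"\((.*?)\)", text)
    if match is None:
        return []  # no parentheses found
    # group(1) is the text between the brackets, without the brackets
    return [item.strip() for item in match.group(1).split(",")]

print(parenthesised_values("x(142,1,23ERWA31)"))  # ['142', '1', '23ERWA31']
```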
This is a little wonky looking, and it assumes there will always be exactly 3 values inside the parentheses, but try this regex:
\((.*?),(.*?),(.*?)\)
To extract all the group matches into a single object, your code would then look like:
import re
string = "x(142,1,23ERWA31)"
firstResult = re.search(r"\((.*?),(.*?),(.*?)\)", string).groups()
You can then index the firstResult tuple like a list:
>>> print(firstResult[2])
23ERWA31

How to copy changing substring in string?

How can I copy data from a string whose content changes?
I tried slicing, but the length of the slice changes.
For example, in one case I need to copy the number 128 from the string '"edge_liked_by":{"count":128}', and in another I need to copy 15332 from '"edge_liked_by":{"count":15332}'.
You could use a regular expression:
import re
string = '"edge_liked_by":{"count":15332}'
number = re.search(r'{"count":(\d*)}', string).group(1)
It really depends on the situation, but I find regular expressions to be useful.
To grab the numbers from the string without caring about their location, you would do as follows:
import re
def get_string(string):
    return re.search(r'\d+', string).group(0)
>>> get_string('"edge_liked_by":{"count":128}')
'128'
To get only numbers from the end of the string, you can use an anchor to ensure the result is pulled from the far end. The following example will grab any unbroken run of digits that is both preceded by a colon and ends within 5 characters of the end of the string:
import re
def get_string(string):
    rval = None
    string_match = re.search(r':(\d+).{0,5}$', string)
    if string_match:
        rval = string_match.group(1)
    return rval
>>> get_string('"edge_liked_by":{"count":128}')
'128'
>>> get_string('"edge_liked_by":{"1321":1}')
'1'
In the above example, adding the colon will ensure that we only pick values and don't match keys such as the "1321" that I added in as a test.
If you just want anything after the last colon, but excluding the bracket, try combining split with slicing:
>>> '"edge_liked_by":{"count":128}'.split(':')[-1][0:-1]
'128'
Finally, considering this looks like a JSON object, you can add curly brackets to the string and treat it as such. Then it becomes a nested dict you can query:
>>> import json
>>> string = '"edge_liked_by":{"count":128}'
>>> string = '{' + string + '}'
>>> string = json.loads(string)
>>> string.get('edge_liked_by').get('count')
128
The regex and split approaches return a string, while this final one returns a number because the text is parsed as JSON.
It looks like the type of string you are working with comes from JSON. Maybe you are getting it as the output of some API?
If it is JSON, you've probably gone one step too far in breaking it down to a string like this. If possible, I'd work with the original output.
If not, I'd make it valid JSON by wrapping it in {} and then parse it with json.loads:
import json
string = '"edge_liked_by":{"count":15332}'
string = "{"+string+"}"
json_obj = json.loads(string)
count = json_obj['edge_liked_by']['count']
count will have the desired output. I prefer this option to using regular expressions because you can rely on the structure of the data and reuse the code in case you wish to parse out other attributes, in a very intuitive way. With regular expressions, the code you use will change if the data are decimal, or negative, or contain non-numeric characters.
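As a rough sketch of how the JSON approach can be made reusable and tolerant of bad input, the wrapping and lookup can go into one function. The name count_from_fragment and its default key names are assumptions for illustration; returning None on failure is a design choice, not something the answer prescribes.

```python
import json

def count_from_fragment(fragment, outer="edge_liked_by", inner="count"):
    """Wrap a '"key":{...}' fragment in braces, parse it, and return the nested value.

    Returns None if the wrapped text is not valid JSON or the keys are missing.
    """
    try:
        parsed = json.loads("{" + fragment + "}")
        return parsed[outer][inner]
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError
        return None

print(count_from_fragment('"edge_liked_by":{"count":15332}'))  # 15332
print(count_from_fragment('"edge_liked_by":{"count":-1.5}'))   # -1.5
```

Unlike the digit-only regular expressions, this handles negative and decimal values without any change to the code.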
Does this help?
a='"edge_liked_by":{"count":128}'
import re
b=re.findall(r'\d+', a)[0]
b
Out[16]: '128'

Python: extracting text from strings using a key phrase

Struggling trying to find a way to do this, any help would be great.
I have a long string – it’s the Title field. Here are some samples.
AIR-LAP1142N-A-K
AIR-LP142N-A-K
Used Airo 802.11n Draft 2.0 SingleAccess Point AIR-LP142N-A-9
Airo AIR-AP142N-A-K9 IOS Ver 15.2
MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n
Genuine Airo 112N AP AIR-LP114N-A-K9 PoE
Wireless AP AIR-LP114N-A-9 Airy 50 availiable
I need to pull the part number out of the Title and assign it to a variable named 'PartNumber'. The part number will always start with the characters 'AIR-'.
So, for example:
Title = 'AIR-LAP1142N-A-K9 W/POWER CORD'
PartNumber = yourformula(Title)
print(PartNumber) will output AIR-LAP1142N-A-K9
I am fairly new to Python and would greatly appreciate help. I would like it to print ONLY the part number, not any of the other text before or after it.
What you're looking for is called a regular expression, implemented in the re module. For instance, you could write something like:
>>> import re
>>> def format_title(title):
...     return re.search(r"(AIR-\S*)", title).group(1)
...
>>> Title = "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"
>>> PartNumber = format_title(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9
The \S* matches every non-whitespace character after AIR-, so the match runs from AIR- up to the next whitespace character.
def yourFunction(title):
    for word in title.split():
        if word.startswith('AIR-'):
            return word
>>> PartNumber = yourFunction(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9
This is a sensible time to use a regular expression. It looks like the part number consists of upper-case letters, hyphens, and numbers, so this should work:
import re
def extract_part_number(title):
    return re.search(r'(AIR-[A-Z0-9\-]+)', title).groups()[0]
This will throw an error if it gets a string that doesn't contain something that looks like a part number, so you'll probably want to add some checks to make sure re.search doesn't return None and groups doesn't return an empty tuple.
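One way to add the check mentioned above is to test the match object before using it. This is a sketch under the same assumption as the answer, namely that part numbers consist of upper-case letters, digits, and hyphens; find_part_number is a hypothetical name.

```python
import re

# Assumption: a part number is "AIR-" followed by upper-case letters, digits, or hyphens.
PART_NUMBER = re.compile(r"AIR-[A-Z0-9-]+")

def find_part_number(title):
    """Return the first part number found in title, or None if there is none."""
    match = PART_NUMBER.search(title)
    return match.group(0) if match else None

print(find_part_number("MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n"))
# AIR-LA112N-A-K9
print(find_part_number("Wireless access point, no model given"))
# None
```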
You could also use the .split() method, which splits the text on whitespace into a list of words.
To do this, I'd make a new variable (named whatever you like); for this example, let's go with titleSplitList, set with titleSplitList = Title.split().
If the part number is always the second word, as in "Cisco AIR-LAP1142N-A-K9 W/POWER CORD", you can assign it to a new variable with:
PartNumber = titleSplitList[1]
Hope this helps.

Python Regular Expression Lookahead multiple conditions

My string looks like this:
string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
The ideal output list is:
['#[Type]', 'A,B,C', '#[Type]', '*[EQ](#[Type],D,E,F)']
So I can parse the string as:
if #[Type] in ('A,B,C') then #[Type] else *[EQ](#[Type],D,E,F)
The challenge is to find all the commas followed by #, ' or *. I've tried the following code but it doesn't work:
interM = re.search(r"\*\[EQ\]\((.+)(?=,#|,\*|,\')+,(.+)\)", string)
print(interM.groups())
Edit:
The ultimate goal is to parse out the 4 components of the input string:
*[EQ](Value, Target, ifTrue, ifFalse)
>>> import re
>>> string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
>>> re.split(r"^\*\[EQ\]\(|\)$|,(?=[#'*])", string)[1:-1]
['#[Type]', "'A,B,C'", '#[Type]', '*[EQ](#[Type],D,E,F)']
However, if you are looking for a more robust solution, I'd highly recommend a lexical analyzer such as flex.
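Short of a full lexer, a small hand-written scanner that tracks parenthesis depth and single quotes is sometimes enough. The sketch below is an illustration, not the answerer's method; the function names and the assumption that expressions always have the form *[EQ](...) are made up for this example.

```python
def split_top_level(args):
    """Split args on commas that are outside parentheses and single quotes."""
    parts, current = [], []
    depth, in_quote = 0, False
    for ch in args:
        if ch == "'":
            in_quote = not in_quote
        elif not in_quote and ch == "(":
            depth += 1
        elif not in_quote and ch == ")":
            depth -= 1
        if ch == "," and depth == 0 and not in_quote:
            parts.append("".join(current))  # end of one top-level argument
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

def parse_eq(expr, prefix="*[EQ]("):
    """Return the top-level arguments of an expression shaped like *[EQ](...)."""
    if not (expr.startswith(prefix) and expr.endswith(")")):
        raise ValueError("not an *[EQ](...) expression")
    return split_top_level(expr[len(prefix):-1])

string = "*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
print(parse_eq(string))
# ['#[Type]', "'A,B,C'", '#[Type]', '*[EQ](#[Type],D,E,F)']
```

Because the nested *[EQ](...) comes back as one argument, parse_eq can be called on it again to descend a level.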
x="*[EQ](#[Type],'A,B,C',#[Type],*[EQ](#[Type],D,E,F))"
print re.findall(r"#[^,]+|'[^']+'|\*.*?\([^\)]*\)",re.findall(r"\*\[EQ\]\((.*?)\)$",x)[0])
Output:
['#[Type]', "'A,B,C'", '#[Type]', '*[EQ](#[Type],D,E,F)']
You can try something of this sort. You have not mentioned the logic, so I am not sure whether this can be scaled.
