I am trying to parse some data contained within a file:
>in:12 out:8 John
>in:20 out:12 Fred
>in:8 out:2 Danny
I would like to find the maximum in value, and find who has the maximum in (Fred does in my example).
That's a non-standard data format, so you have to write a non-standard parser (a better idea would be to use a standard exchange format like JSON and a parser from the standard library). I'd
create a Person class with, say, in_ and out attributes (in is a reserved word in Python, so it cannot be used as an attribute name directly)
write a parser function that takes a line from the input file and, if the line contains valid data, creates a new Person
create a list of Persons from your input file called persons
sort this list ascending by in_: persons_sorted = sorted(persons, key=lambda p: p.in_)
get the maximum: max_in_person = persons_sorted[-1] (or skip the sort and call max(persons, key=lambda p: p.in_) directly)
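The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the attribute is spelled in_ because in is a Python keyword, and the names Person, LINE_RE and parse_line are made up for the example. The three lines from the question are inlined instead of being read from a file.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    in_: int  # trailing underscore: "in" is a reserved word
    out: int


# Hypothetical pattern for lines such as ">in:12 out:8 John"
LINE_RE = re.compile(r">in:(\d+) out:(\d+) (.+)")


def parse_line(line: str) -> Optional[Person]:
    """Return a Person for a valid line, or None for anything else."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Person(name=m.group(3), in_=int(m.group(1)), out=int(m.group(2)))


lines = [">in:12 out:8 John", ">in:20 out:12 Fred", ">in:8 out:2 Danny"]
persons = [p for p in map(parse_line, lines) if p is not None]

# max() with a key avoids sorting the whole list just to take one element.
max_in_person = max(persons, key=lambda p: p.in_)
print(max_in_person.name, max_in_person.in_)
```

With a real file, the lines list would come from iterating over the open file object instead.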
Try this
>in:(\d+) out:\d+ (.*)
Group 1 will contain the in score and group 2 the name
You'll still have to pick out the maximum of group 1 in Python code to get the name, as this is not what regexes are for.
I'm not a Python programmer, but this is a good start:
import re
# subject is assumed to hold the file contents as one string
for match in re.finditer(r">in:(\d+) out:\d+ (.*)", subject):
    # match start: match.start()
    # match end (exclusive): match.end()
    # matched text: match.group()
    print(match.group(1), match.group(2))  # in score, name
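To finish the idea, one possible way to pick the maximum from the regex matches is sketched below. The variable names are illustrative, and the sample data from the question is inlined as a string rather than read from a file.

```python
import re

subject = """>in:12 out:8 John
>in:20 out:12 Fred
>in:8 out:2 Danny"""

# Collect (in score, name) pairs; group 1 is converted to int so that
# the comparison is numeric rather than alphabetical.
pairs = [
    (int(m.group(1)), m.group(2))
    for m in re.finditer(r">in:(\d+) out:\d+ (.*)", subject)
]

max_in, name = max(pairs, key=lambda pair: pair[0])
print(name, max_in)
```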
I have a file in.txt
name="XYZ_PP_0" number="0x12" bytesize="4" info="0x0000001A"
name="GK_LMP_2_0" number="0xA5" bytesize="8" info="0x00000000bbae321f"
name="MP_LKO_1_0" number="0x356" bytesize="4" info="0x00000234"
I need to check whether the file satisfies this condition: the info value of number="0x12" plus 0x00000004 equals the info value of number="0x356".
If it does, print that the resulting value matches the info value of number="0x356";
otherwise print "not matching".
How can I do this?
This is my current attempt:
import re
pattern = r'(number=\"\w+\").*(info=\"\w+\")'
# text mode ("r"), so each line is a str that the str pattern can search
with open("in.txt", "r") as fin:
    for line in fin:
        for match_number, match_info in re.findall(pattern, line):
            print(match_number, match_info)
But this simply extracts the number and info values.
Break it into steps.
Look up how to read in a text file, line by line. You'll end up with a list of lines of this file.
Figure out how to extract the value from the "number" field. A simple regular expression would serve you well here I think.
[Optional] Cast this value to the correct data type for your problem.
Do the comparison you're interested in.
You can easily google the syntax for all of these I think.
Edit: posted before there was any code in the original post. I'm not entirely sure what the question is anymore. Do you need help debugging?
Edit 2: Taking another stab at this since I think you're asking for RegEx syntax.
Change your RegEx pattern to have parentheses around the information you want to extract. A RegEx match for such a pattern lets you assign the values inside these parentheses to Python variables.
See this partial example.
import re
pattern = r'number=(\"\w+\").*info=(\"\w+\")'
s = 'name="XYZ_PP_0" number="0x12" bytesize="4" info="0x0000001A"'
m = re.search(pattern, s)
if m:
    number, info = m.groups()
    print("number is", number)
    print("info is", info)
# number is "0x12"
# info is "0x0000001A"
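Putting the pieces together, a minimal sketch of the whole check might look like this. It assumes the quotes are left outside the capture groups so the values can be passed to int(..., 16); the dictionary name info_by_number is made up for the example, and the three lines from the question are inlined instead of being read from in.txt.

```python
import re

lines = [
    'name="XYZ_PP_0" number="0x12" bytesize="4" info="0x0000001A"',
    'name="GK_LMP_2_0" number="0xA5" bytesize="8" info="0x00000000bbae321f"',
    'name="MP_LKO_1_0" number="0x356" bytesize="4" info="0x00000234"',
]

# Capture only the hex text, without the surrounding quotes.
pattern = re.compile(r'number="(\w+)".*info="(\w+)"')

# Map each number to its info value, both converted from hex to int.
info_by_number = {}
for line in lines:
    m = pattern.search(line)
    if m:
        number, info = m.groups()
        info_by_number[int(number, 16)] = int(info, 16)

expected = info_by_number[0x12] + 0x00000004
if expected == info_by_number[0x356]:
    result = "matching"
else:
    result = "not matching"
print(hex(expected), result)
```

For the sample data the check reports "not matching", since 0x1A + 0x4 is 0x1E while the info value for number 0x356 is 0x234.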
I am developing a program to read through a CSV file and create a dictionary of information from it. Each line in the CSV is essentially a new dictionary entry with the delimited objects being the values.
As one subpart of the task, I need to extract an unknown number of numeric digits from within a string. I have a working version, but it does not seem very Pythonic.
An example string looks like this:
variable = Applicaiton.Module_Name.VAR_NAME_ST12.WORD_type[0]
variable is the string's name in the Python code, and represents the variable name within a MODBUS. I want to extract just the digits before .WORD_type[0], which relate to the number of bytes the string is packed into.
Here is my working code. Note that it is nested within a for statement iterating through the lines in the CSV. var_length and var_type are some of the dictionary keys, i.e. {"var_length": var_length}
if re.search(".+_ST[0-9]{1,2}\\.WORD_type.+", variable):
    var_type = "string"
    temp = re.split("\\.", variable)
    temp = re.split("_", temp[2])
    temp = temp[-1]
    var_length = int(str.lstrip(temp, "ST")) / 2
You could maybe try using matching groups like so:
import re
variable = "Applicaiton.Module_Name.VAR_NAME_ST12.WORD_type[0]"
matches = re.match(r".+_ST(\d+)\.WORD_type.+", variable)
if matches:
    print(matches[1])
matches[0] has the full match and matches[1] contains the matched group.
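Building on the matching group, the length calculation from the question could be folded into one small helper. This is only a sketch: the names ST_RE and string_length are made up, and integer division (//) is an assumption about wanting a whole number, whereas the original code used /.

```python
import re

# The digits after "_ST" are captured as group 1.
ST_RE = re.compile(r".+_ST(\d+)\.WORD_type.+")


def string_length(variable):
    """Return half of the number after _ST, or None if the name does not match."""
    m = ST_RE.match(variable)
    if m is None:
        return None
    return int(m.group(1)) // 2


variable = "Applicaiton.Module_Name.VAR_NAME_ST12.WORD_type[0]"
var_length = string_length(variable)
print(var_length)
```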
Struggling trying to find a way to do this, any help would be great.
I have a long string – it’s the Title field. Here are some samples.
AIR-LAP1142N-A-K
AIR-LP142N-A-K
Used Airo 802.11n Draft 2.0 SingleAccess Point AIR-LP142N-A-9
Airo AIR-AP142N-A-K9 IOS Ver 15.2
MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n
Genuine Airo 112N AP AIR-LP114N-A-K9 PoE
Wireless AP AIR-LP114N-A-9 Airy 50 availiable
I need to pull the part number out of the Title and assign it to a variable named 'PartNumber'. The part number will always start with the characters 'AIR-'.
So for example:
Title = 'AIR-LAP1142N-A-K9 W/POWER CORD'
PartNumber = yourformula(Title)
print(PartNumber) will output AIR-LAP1142N-A-K9
I am fairly new to python and would greatly appreciate help. I would like it to ONLY print the part number not all the other text before or after.
What you're looking for is called regular expressions, implemented in the re module. For instance, you'd need to write something like:
>>> import re
>>> def format_title(title):
...     return re.search(r"(AIR-\S*)", title).group(1)
...
>>> Title = "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"
>>> PartNumber = format_title(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9
\S matches any non-whitespace character, so AIR-\S* matches everything from AIR- up to the next whitespace character.
def yourFunction(title):
    # return the first whitespace-separated word that starts with 'AIR-'
    for word in title.split():
        if word.startswith('AIR-'):
            return word
>>> PartNumber = yourFunction(Title)
>>> print(PartNumber)
AIR-LAP1142N-A-K9
This is a sensible time to use a regular expression. It looks like the part number consists of upper-case letters, hyphens, and numbers, so this should work:
import re
def extract_part_number(title):
    return re.search(r'(AIR-[A-Z0-9\-]+)', title).groups()[0]
This will throw an error if it gets a string that doesn't contain something that looks like a part number, so you'll probably want to add some checks to make sure re.search doesn't return None and groups doesn't return an empty tuple.
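A version with that check added might look like the sketch below. Returning None for titles without a part number is just one possible choice, and PART_RE is a made-up name. The sample titles are taken from the question.

```python
import re

PART_RE = re.compile(r"AIR-[A-Z0-9\-]+")


def extract_part_number(title):
    """Return the first AIR-... part number in title, or None if there is none."""
    m = PART_RE.search(title)
    return m.group(0) if m else None


titles = [
    "AIR-LAP1142N-A-K9 W/POWER CORD",
    "MINT Lot of (2) AIR-LA112N-A-K9 - Dual-band-based 802.11a/g/n",
    "Genuine Airo 112N AP AIR-LP114N-A-K9 PoE",
    "No part number in this title",
]
part_numbers = [extract_part_number(t) for t in titles]
print(part_numbers)
```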
You could use the .split() method, which splits the text on whitespace into a list of words.
To do this, I'd make a new variable (named whatever you like); for this example, let's go with titleSplitList, set as titleSplitList = Title.split().
If the part number is always the second word of the title (as in "Cisco AIR-LAP1142N-A-K9 W/POWER CORD"), you can then assign it to a new variable by index. Note that its position varies in the sample titles above, so a fixed index only works when the layout is consistent:
PartNumber = titleSplitList[1]
Hope this helps.
I’ve got a master .xml file generated by an external application and want to create several new .xmls by adapting and deleting some rows with python. The search strings and replace strings for these adaptions are stored within an array, e.g.:
replaceArray = [
[u'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"',
u'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"'],
[u'<TOOL_BUFFER RowID="106874" id_tool_base="3651" use="false"/>',
u'<TOOL_BUFFER RowID="106874" id_tool_base="3651" use="true"/>'],
[u'<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="false"/>',
u'<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="true"/>']]
So I'd like to iterate through my file and replace all occurrences of 'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"' with 'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"' and so on.
Unfortunately the ID values of "RowID", "id_tool_base" and "ref_layerid_mapping" might change occasionally. So what I need is to search for matches of the whole string in the master file, regardless of which ID value is between the quotation marks, and to replace only the substring that differs between the two strings of the replaceArray (e.g. use="true" instead of use="false"). I'm not very familiar with regular expressions, but I think I need something like this for my search?
re.sub(r'<TOOL_SELECT_LINE RowID="\d+" id_tool_base="\d+" use="false"/>', "", sentence)
I'm happy about any hint that points me in the right direction! If you need any further information or if something is not clear in my question, please let me know.
One way to do this is to pass a function as the replacement. The function receives the match object from re.sub and inserts the id captured from the string being replaced.
import re
s = 'ref_layerid_mapping="x4049" lyvis="off" toc_visible="off"'
pat = re.compile(r'ref_layerid_mapping=("\w+") lyvis="off" toc_visible="off"')
def replacer(m):
    # m.group(1) is the quoted id, e.g. "x4049"
    return 'ref_layerid_mapping=' + m.group(1) + ' lyvis="on" toc_visible="on"'
re.sub(pat, replacer, s)
Output:
'ref_layerid_mapping="x4049" lyvis="on" toc_visible="on"'
Another way is to use back-references in replacement pattern. (see http://www.regular-expressions.info/replacebackref.html)
For example:
import re
s = "Ab ab"
re.sub(r"(\w)b (\w)b", r"\1d \2d", s)
Output:
'Ad ad'
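Applied to the TOOL_SELECT_LINE case from the question, the back-reference approach could look roughly like this. It is a sketch that assumes the IDs are always plain digits; the second input line is invented to show that other ID values are handled too.

```python
import re

# Group 1 keeps everything up to use=, group 2 keeps the closing "/>".
pattern = re.compile(
    r'(<TOOL_SELECT_LINE RowID="\d+" id_tool_base="\d+" use=)"false"(/>)'
)

line = '<TOOL_SELECT_LINE RowID="106871" id_tool_base="3658" use="false"/>'
other = '<TOOL_SELECT_LINE RowID="999" id_tool_base="12" use="false"/>'

# \1 and \2 put the captured parts back unchanged; only the value changes.
result = pattern.sub(r'\1"true"\2', line)
other_result = pattern.sub(r'\1"true"\2', other)
print(result)
print(other_result)
```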
Super NOOB to Python (2.4.3): I am executing a function containing a regular expression which searches through a txt file that I'm importing. I am able to read the text file and run re.search on it, and the output is correct. I need to run this for multiple occurrences (the regex occurs 48 times in the text). The code is as follows:
#!/usr/bin/python
import re
dataRead = open('pd_usage_14-04-23.txt', 'r')
dataWrite = open('test_write.txt', 'w')
text = (dataRead.read()) #reads and initializes text for conversion to string
s = str(text) #converts text to string for reading
def user(str):
    re1='((?:[a-z][a-z]+))' # Word 1
    re2='(\\s+)' # White Space 1
    re3='((?:[a-z][a-z]+))' # Word 2
    re4='(\\s+)' # White Space 2
    re5='((?:[a-z][a-z]*[0-9]+[a-z0-9]*))' # Alphanum 1
    rg = re.compile(re1+re2+re3+re4+re5,re.IGNORECASE|re.DOTALL)
    #alphanum1=rg.group(5)
    re.findall(rg, s, flags=0)
    #print "("+alphanum1+")"+"\n"
    #if m:
        #word1=m.group(1)
        #ws1=m.group(2)
        #word2=m.group(3)
        #ws2=m.group(4)
        #alphanum1=m.group(5)
        #print "("+alphanum1+")"+"\n"
    return
user(s)
dataRead.close()
dataWrite.close()
OUTPUT: g706454
THIS OUTPUT IS CORRECT! BUT...!
I need to run it multiple times, reading text that's further down.
I have 2 other functions that need to be run multiple times as well. I need all 3 to run consecutively, and then run again starting from the next line (or similar) to search for and output newer data. All the logic I have tried to implement returns the same output.
So I have something like this:
for count in range (0,47):
    if stop_read:
        date(s)
        usage(s)
        user(s)
stop_read is a function that finds the next line after the data that I'm looking for (date, usage, user). I figured I could call this to say: if you hit stop_read, read the next line and run the functions all over again.
Any help is greatly appreciated!
Here is what I do for a regex in Python 3; it should be similar in Python 2. This is for a multiline search.
regex = re.compile("\\w+-\\d+\\b", re.MULTILINE)
Then later on in code I have something like:
myset.update([m.group(0) for m in regex.finditer(logmsg.text)])
Maybe you might want to update your Python if you can, 2.4 is old, old, and stale.
Looks like re.findall would solve your problem:
re.findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more groups are present in the pattern, return a
list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
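As a rough illustration of that idea, the question's five-part pattern can be run once over the whole text with re.finditer, collecting the fifth group of every match instead of calling the function repeatedly. The sample text below is invented, since the real file contents are not shown, and the sketch is written for Python 3.

```python
import re

# Same five groups as in the question: word, space, word, space, alphanumeric id.
USER_RE = re.compile(
    r"([a-z][a-z]+)(\s+)([a-z][a-z]+)(\s+)([a-z][a-z]*[0-9]+[a-z0-9]*)",
    re.IGNORECASE,
)

# Hypothetical stand-in for the contents of the usage file.
text = "user name g706454\nuser name b123456\n"

# finditer walks through every non-overlapping match in order.
users = [m.group(5) for m in USER_RE.finditer(text)]
print(users)
```

The same loop could call the date and usage patterns on each record if the three values need to be kept together.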