Python: Get count of successfully matched groups for regex

I want to capture data and numbers from a string in python. The string is a measurement from an RF sensor so it might be corrupted from bad transmission. Strings from the sensor look like this PA1015.7 TMPA20.53 HUM76.83.
My regex is:
s = re.search(r'^(\D+)([0-9.]+)', message)
Now before I proceed I want to check if I truly received exactly two matches properly or if the string is garbled.
So I tried :
len(s)
But that errors out :
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type '_sre.SRE_Match' has no len()
I do need access to the match group elements for processing later. (I think that eliminates findall)
key= s.group(1)
data= s.group(2)
What's missing?

Instead of search, you can use findall:
s = re.findall(r'(\D+)([0-9.]+)', message)
print("matched " + str(len(s)))
search returns a match object (or None when there is no match), not a sequence, which is why len(s) fails. findall returns a list with one tuple of groups per match, so len(s) counts the matches.
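If the match groups are still needed afterwards, one option is to keep re.search and test its result before reading the groups. The sketch below assumes the message format quoted in the question; the helper names parse_first and parse_all and the stricter letters-only key pattern are illustrative assumptions, not part of the original code.

```python
import re

# Sample message in the format quoted in the question.
message = "PA1015.7 TMPA20.53 HUM76.83"

# Assumed stricter pattern: letters for the key, digits and dots for the data.
PAIR = re.compile(r"([A-Za-z]+)([0-9.]+)")


def parse_first(text):
    """Return (key, data) for the leading reading, or None if it is garbled."""
    m = re.search(r"^(\D+)([0-9.]+)", text)
    if m is None:  # re.search returns None when nothing matches
        return None
    # Both groups are mandatory, so a successful match always carries two values.
    return m.group(1), m.group(2)


def parse_all(text):
    """Return a list of (key, data) tuples, one per reading found."""
    return PAIR.findall(text)


first = parse_first(message)
pairs = parse_all(message)
print(first)
print(len(pairs), pairs)
```

Because neither group is optional, checking for None is enough here; len(m.groups()) would always be 2 for this pattern.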

Related

Slicing a string from inside a formatted string gives 'TypeError: string indices must be integers'

Shouldn't both these commands do the same thing?
>>> "{0[0:5]}".format("lorem ipsum")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> "{0}".format("lorem ipsum"[0:5])
'lorem'
The commands
>>> "{0[0]}".format("lorem ipsum")
'l'
and
>>> "{0}".format("lorem ipsum"[0])
'l'
evaluate the same. (I know that I can use other methods to do this, I am mainly just curious as to why it doesn't work.)
The str.format syntax is handled by the library and supports only a few “expression” syntaxes that are not the same as regular Python syntax. For example,
"{0[foo]}".format(dict(foo=2)) # "2"
works without quotes around the dictionary key. Of course, there are limitations from this simplicity, like not being able to refer to a key with a ] in it, or interpreting a slice, as in your example.
Note that the f-strings mentioned by kendall are handled by the compiler and (fittingly) use (almost) unrestricted expression syntax. They need that power since they lack the obvious alternative of placing those expressions in the argument list to format.
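A minimal sketch of the difference, assuming Python 3.6 or later so that f-strings are available. The variable names are illustrative only.

```python
text = "lorem ipsum"

# Slicing in the argument list uses ordinary Python syntax.
via_argument = "{0}".format(text[0:5])

# An f-string evaluates the full expression inside the braces.
via_fstring = f"{text[0:5]}"

# Inside a str.format field, a non-numeric index is passed on as a string key,
# which is why an unquoted dictionary key works.
via_key = "{0[foo]}".format(dict(foo=2))

# The same rule turns "0:5" into a string index, which str rejects.
try:
    "{0[0:5]}".format(text)
    slice_error = None
except TypeError as exc:
    slice_error = exc

print(via_argument, via_fstring, via_key, type(slice_error).__name__)
```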

Use Python reserved words in an XML File

I'm currently trying to use python's (3.6) xml.etree.ElementTree commands to write an xml file. Some of the Elements and Subelements I need to write must have "id" and "map" fields, which are reserved python words.
My problem is contained in the following line of code:
ET.SubElement(messages,'trigger',thing='1',bob='a', max='5')
But "max" is a function and I can't use it. Is there a character I can place there to allow me to write this field as I desire? Or some sort of known workaround?
EDIT: I am aware that an '_' stops the python from processing the word, but unfortunately this underscore will show up in my file...so I am trying to see if there is an 'invisible' option for the file I will later be writing.
Thanks much!
Names of Python built-in functions are no problem on the left side of a keyword argument:
>>> def abc(**kwargs):
...     print(kwargs)
...
>>> abc(id=2)
{'id': 2}
id, map, int, float, str, repr, etc. are built-in names, not reserved words. You may use them like any other identifier, but assigning one of them another value replaces the built-in:
>>> int(2.5)
2
>>> int = "5"
>>> int(2.5)
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
int(2.5)
TypeError: 'str' object is not callable
Notice how the assignment int = "5" is entirely legal, although a good IDE like PyCharm will warn about it.
If you want to pass an actual reserved word to a function, like None, yield, or try (or print in Python 2), you can use the double star ** to unpack a dictionary into keyword arguments, for example:
>>> abc(**{"print":2, "None":3})
{'print': 2, 'None': 3}
I hope this answers your question!
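Applied to the ElementTree call from the question, this might look like the sketch below. The attribute name class and the value urgent are hypothetical, chosen only because class is a genuine reserved word; ET.SubElement accepts an attribute dictionary as its third argument as well as keyword arguments.

```python
import xml.etree.ElementTree as ET

messages = ET.Element("messages")

# "max", "id" and "map" are built-in names, not keywords, so they work directly.
trigger = ET.SubElement(messages, "trigger", thing="1", bob="a", max="5")

# A real keyword such as "class" cannot be written as a keyword argument,
# but it can be supplied through the attribute dictionary.
flagged = ET.SubElement(messages, "trigger", {"class": "urgent"}, id="7")

xml_text = ET.tostring(messages, encoding="unicode")
print(xml_text)
```

No underscore or other placeholder character ends up in the written file with either form.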

Get last character of string using `format`

I have to build a path from a given id using this template :
<last digit of id>/<second last digit of id>/<full id>
For instance, if my id is 3412, the expected result would be :
2/1/3412
The id is supposed to have at least 2 digits.
The first thing I tried was:
>>> "{my_id[3]}/{my_id[2]}/{my_id}".format(my_id=str(3412))
'2/1/3412'
But this would work only if the id is 4 digits long.
So what I was expecting to do then was:
>>> "{my_id[-1]}/{my_id[-2]}/{my_id}".format(my_id=str(3412))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
My question here is why can't I use negative indices in my string specifier? And why is Python telling me I'm not using integer indices? I didn't find anything in the documentation about it.
I know there are many other ways to do this, but I'm just curious about why this one does not work.
I'm using python 2.7, but the behaviour seems to be the same under python 3.4.
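As vaultah and Bhargav Rao reported in the comments, this is a known limitation of str.format: an index inside a replacement field is treated as an integer only if it consists solely of digits, so -1 is passed to the string as the key '-1', hence the TypeError. I'll just have to find an alternative solution!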
>>> my_id = str(3412)
>>> "{}/{}/{}".format(my_id[-1], my_id[-2], my_id)
'2/1/3412'
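The same workaround can be wrapped in a small helper. This is only a sketch: the function name id_path and the ValueError for ids shorter than two digits are assumptions, added because the question states that the id has at least 2 digits. The last lines illustrate that str.format hands "-1" over as a string key.

```python
def id_path(my_id):
    """Build '<last digit>/<second last digit>/<full id>' from an id."""
    s = str(my_id)
    if len(s) < 2:
        raise ValueError("id must have at least 2 digits")
    # Negative indices are fine here because this is ordinary Python indexing.
    return "{}/{}/{}".format(s[-1], s[-2], s)


path = id_path(3412)
print(path)

# Inside a format field, "-1" is not all digits, so it is used as a string key.
looked_up = "{d[-1]}".format(d={"-1": "string key"})
print(looked_up)
```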

Python Index Error: string index out of range

## A little helper program that capitalizes the first letter of a word
def Cap (s):
    s = s.upper()[0]+s[1:]
    return s
Giving me this error :
Traceback (most recent call last):
File "\\prov-dc\students\jadewusi\crack2.py", line 447, in <module>
sys.exit(main(sys.argv[1:]))
File "\\prov-dc\students\jadewusi\crack2.py", line 398, in main
foundit = search_method_3("passwords.txt")
File "\\prov-dc\students\jadewusi\crack2.py", line 253, in search_method_3
ourguess_pass = Cap(ourguess_pass)
File "\\prov-dc\students\jadewusi\crack2.py", line 206, in Cap
s = s.upper()[0]+s[1:]
IndexError: string index out of range
As others have already noted, the problem is that you're trying to access an item in an empty string. Instead of adding special handling in your implementation, you can simply use capitalize:
'hello'.capitalize()
=> 'Hello'
''.capitalize()
=> ''
It blows up, presumably, because there is no indexing an empty string.
>>> ''[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
And as it has been pointed out, splitting a string to call str.upper() on a single letter can be supplanted by str.capitalize().
Additionally, if you should regularly encounter a situation where this would be passed an empty string, you can handle it a couple of ways:
…#whatever previous code comes before your function
if my_string:
    Cap(my_string) #or str.capitalize, or…
if my_string being more or less like if len(my_string) > 0.
And there's always ye old try/except, though I think you'll want to consider ye olde refactor first:
#your previous code, leading us to here…
try:
    Cap(my_string)
except IndexError:
    pass
I wouldn't stay married to indexing a string to call str.upper() on a single character, but you may have a unique set of reasons for doing so. All things being equal, though, str.capitalize() performs the same function.
>>> s = 'macGregor'
>>> s.capitalize()
'Macgregor'
>>> s[:1].upper() + s[1:]
'MacGregor'
>>> s = ''
>>> s[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>> s[:1].upper() + s[1:]
''
Why does s[1:] not bail on an empty string?
Tutorial on strings says:
Degenerate slice indices are handled gracefully: an index that is too
large is replaced by the string size, an upper bound smaller than the
lower bound returns an empty string.
See also Python's slice notation.
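Putting the slice-based version into a function gives a helper that is safe for empty strings and leaves the rest of the word untouched. A minimal sketch; the name cap is illustrative.

```python
def cap(s):
    """Upper-case the first character only; an empty string stays empty."""
    # s[:1] and s[1:] are slices, so neither raises on an empty string.
    return s[:1].upper() + s[1:]


print(repr(cap("macGregor")))
print(repr(cap("")))
print(repr("macGregor".capitalize()))
```

Unlike str.capitalize(), this does not lower-case the remaining characters, which matters for inputs such as 'macGregor'.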
I just had the same error while I was sure that my string wasn't empty. So I thought I'd share this here, so people who get that error have as many potential reasons as possible.
In my case, I declared a one character string, and python apparently saw it as a char type. It worked when I added another character. I don't know why it doesn't convert it automatically, but this might be a reason that causes an "IndexError: string index out of range", even if you think that the supposed string is not empty.
It might differ between Python versions, I see the original question refers to Python 3. I used Python 2.6 when this happened.

Python Xlrd Result format

I want to know the result format of xlrd.
See the code:
>>> sh.cell_value(rowx=2, colx=1)
u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx'
Now when I try running re.search:
>>> temp1=sh.cell_value(rowx=2, colx=1)
>>> x=re.search("Adam",'temp1')
>>> x.group()
Traceback (most recent call last):
File "<pyshell#58>", line 1, in <module>
x.group()
AttributeError: 'NoneType' object has no attribute 'group'
I get nothing.
First, I want to know what the 'u' in the result is.
What are the result formats returned by sh.cell_value? Is it an integer, a string, etc.?
Can we run regular expressions on them?
Answering your questions in order:
What is the 'u' in the result? u is the prefix for a Unicode string. So u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx' means the text is a Unicode string.
What are the result formats returned by sh.cell_value? Is it an integer, a string, etc.? Here it's a Unicode string.
Can we run regular expressions on them? Yes, you can, and this is how you do it:
>>> temp1=u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx'
>>> x=re.search(u'Adam',temp1)
>>> x.group()
u'Adam'
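The difference from your code is that temp1 is passed as a variable rather than as the quoted string 'temp1'; the pattern can be written as a Unicode string as well.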
It's a Unicode string.
cell_value returns the value of the cell. The type depends on the type of the cell.
Yes. You can use regular expressions on Unicode strings, but your code isn't right.
Your code passes "temp1" to re.search as a string. It does not pass the variable temp1. You want:
>>> x=re.search(u"Adam",temp1)
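The sketch below shows the difference without opening a workbook. It assumes Python 3, where every str is Unicode, and uses a plain string as a stand-in for the value returned by sh.cell_value. The helper search_cell is hypothetical; it skips values that are not text, since the type depends on the cell.

```python
import re

# Stand-in for sh.cell_value(rowx=2, colx=1); no workbook is opened here.
temp1 = "Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx"


def search_cell(pattern, value):
    """Search a cell value, returning None for values that are not text."""
    if not isinstance(value, str):
        return None
    return re.search(pattern, value)


wrong = re.search("Adam", "temp1")  # searches the literal text "temp1"
right = search_cell("Adam", temp1)  # searches the cell contents

print(wrong)
print(right.group())
```

Checking the result for None before calling group() avoids the AttributeError shown in the question.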
