I want to know the result format of xlrd.
See the code:
>>> sh.cell_value(rowx=2, colx=1)
u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx'
Now when I try running re.search:
>>> temp1=sh.cell_value(rowx=2, colx=1)
>>> x=re.search("Adam",'temp1')
>>> x.group()
Traceback (most recent call last):
File "<pyshell#58>", line 1, in <module>
x.group()
AttributeError: 'NoneType' object has no attribute 'group'
I get nothing.
First, I want to know what the 'u' in the result is.
What types does sh.cell_value return? Is it an integer, a string, etc.?
Can we run regular expressions on them?
Answering your questions in order:
First, what is the 'u' in the result? u is the prefix for a unicode string. So u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx' means the text is in unicode.
What are the result formats returned by sh.cell_value? Is it integer, string, etc.? For a text cell like this one, it's a unicode string.
Can we run regular expressions on them? Yes, you can, and this is how you do it:
temp1=u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx'
x=re.search(u'Adam',temp1)
x.group()
u'Adam'
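The important change is passing the variable temp1 rather than the quoted string 'temp1'. Writing the pattern as unicode as well keeps both arguments the same type.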
It's a Unicode string
cell_value returns the value of the cell. The type depends on the type of the cell.
Yes. You can use regular expressions on Unicode strings, but your code isn't right.
Your code passes "temp1" to re.search as a string. It does not pass the variable temp1. You want:
>>> x=re.search(u"Adam",temp1)
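To make the difference concrete, here is a small sketch in Python 3 syntax. It uses a plain string in place of a real sheet, and search_cell is a hypothetical helper (not part of xlrd) that converts non-text values to text before searching, on the assumption that numeric cells come back as numbers.

```python
import re

# Stand-in for sh.cell_value(rowx=2, colx=1); no workbook is opened here.
temp1 = u'Adam Gilchrist xxxxxxxxxxxxxxxxxxxxx'

# Searching the literal text 'temp1' finds nothing, so the result is None.
wrong = re.search(u"Adam", 'temp1')

# Searching the variable's contents returns a match object.
right = re.search(u"Adam", temp1)


def search_cell(pattern, value):
    """Hypothetical helper: convert a non-text cell value to text, then search it."""
    if not isinstance(value, str):
        value = str(value)
    return re.search(pattern, value)


print(wrong)          # None
print(right.group())  # Adam
```

Checking the result against None before calling group() avoids the AttributeError from the question.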
I want to capture data and numbers from a string in python. The string is a measurement from an RF sensor so it might be corrupted from bad transmission. Strings from the sensor look like this PA1015.7 TMPA20.53 HUM76.83.
My regex is:
s = re.search(r'^(\D+)([0-9.]+)', message)
Now before I proceed I want to check if I truly received exactly two matches properly or if the string is garbled.
So I tried:
len(s)
But that errors out:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type '_sre.SRE_Match' has no len()
I do need access to the match group elements for processing later. (I think that eliminates findall)
key= s.group(1)
data= s.group(2)
What's missing?
Instead of using search, you should use findall:
s = re.findall(r'(\D+)([0-9.]+)', message)
print("matched " + str(len(s)))
search returns a single match object for the first match (or None if there is no match), and a match object has no len(). findall returns a list with one tuple of groups per match, so len() works on it.
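As a rough sketch of how the garbled-message check could look with findall: the pattern, the expected count of three readings, and the parse_reading helper below are all assumptions made for illustration, based only on the sample string in the question.

```python
import re

# Hypothetical pattern: a run of letters followed by a decimal number.
PAIR = re.compile(r"([A-Za-z]+)(\d+(?:\.\d+)?)")


def parse_reading(message, expected=3):
    """Return a list of (key, value) pairs, or None if the message looks garbled."""
    pairs = PAIR.findall(message)
    if len(pairs) != expected:
        return None
    # Rebuild the message from the pairs; leftover characters mean corruption.
    if " ".join(key + value for key, value in pairs) != message.strip():
        return None
    return [(key, float(value)) for key, value in pairs]


print(parse_reading("PA1015.7 TMPA20.53 HUM76.83"))
print(parse_reading("PA10#5.7 TMPA20.53 HUM76.83"))
```

The groups stay available as tuple elements, so findall does not rule out later processing of the key and the data.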
Shouldn't both these commands do the same thing?
>>> "{0[0:5]}".format("lorem ipsum")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string indices must be integers
>>> "{0}".format("lorem ipsum"[0:5])
'lorem'
The commands
>>> "{0[0]}".format("lorem ipsum")
'l'
and
>>> "{0}".format("lorem ipsum"[0])
'l'
evaluate the same. (I know that I can use other methods to do this, I am mainly just curious as to why it doesn't work.)
The str.format syntax is handled by the library and supports only a few “expression” syntaxes that are not the same as regular Python syntax. For example,
"{0[foo]}".format(dict(foo=2)) # "2"
works without quotes around the dictionary key. Of course, there are limitations from this simplicity, like not being able to refer to a key with a ] in it, or interpreting a slice, as in your example.
Note that the f-strings mentioned by kendall are handled by the compiler and (fittingly) use (almost) unrestricted expression syntax. They need that power since they lack the obvious alternative of placing those expressions in the argument list to format.
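A short sketch of the options side by side, assuming Python 3.6 or later for the f-string line; the variable names are only for illustration.

```python
s = "lorem ipsum"

# Indexing with a single integer is part of the format field syntax.
first = "{0[0]}".format(s)

# Slicing is not, so the slice goes in the argument list instead.
sliced = "{0}".format(s[0:5])

# A precision in the format spec truncates a string to that many characters.
truncated = "{0:.5}".format(s)

# f-strings accept ordinary expressions, slices included.
fsliced = f"{s[0:5]}"

print(first, sliced, truncated, fsliced)  # l lorem lorem lorem
```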
I'm currently trying to use python's (3.6) xml.etree.ElementTree commands to write an xml file. Some of the Elements and Subelements I need to write must have "id" and "map" fields, which are reserved python words.
My problem is contained in the following line of code:
ET.SubElement(messages,'trigger',thing='1',bob='a', max='5')
But "max" is a function and I can't use it. Is there a character I can place there to allow me to write this field as I desire? Or some sort of known workaround?
EDIT: I am aware that a trailing '_' stops Python from treating the word specially, but unfortunately this underscore will show up in my file, so I am trying to see if there is an 'invisible' option for the file I will later be writing.
Thanks much!
Names of built-in functions are no problem on the left side of a keyword argument:
>>> def abc(**kwargs):
...     print(kwargs)
...
>>> abc(id=2)
{'id': 2}
>>>
id, map, int, float, str, repr, etc. are built-in names, not reserved words. You may use them like any other identifier, but assigning one of them another value replaces the built-in in that scope:
>>> int(2.5)
2
>>> int = "5"
>>> int(2.5)
Traceback (most recent call last):
File "<pyshell#10>", line 1, in <module>
int(2.5)
TypeError: 'str' object is not callable
Notice that the assignment int = "5" is entirely legal, though a good IDE like PyCharm will warn that it shadows a built-in.
If you want to pass an actual reserved word to a function, like None, yield, or try (or print in Python 2), you can use the double star ** to unpack a dictionary into keyword arguments, for example:
>>> abc(**{"print":2, "None":3})
{'print': 2, 'None': 3}
I hope this answers your question!
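Applied to the ElementTree call from the question, a minimal sketch might look like this. The 'messages' root, the 'item' element and its class attribute are made up for illustration; SubElement accepts both an attrib dictionary and extra keyword arguments.

```python
import xml.etree.ElementTree as ET

messages = ET.Element('messages')

# Built-in names such as max and id are fine as keyword arguments.
trigger = ET.SubElement(messages, 'trigger', thing='1', bob='a', max='5', id='7')

# Real keywords such as class need the attrib dictionary (or ** unpacking).
item = ET.SubElement(messages, 'item', attrib={'class': 'x'})

print(ET.tostring(messages, encoding='unicode'))
```

No underscore or other placeholder character ends up in the written attributes.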
I have this simple regex:
text = re.sub("[إأٱآا]", "ا", text)
However, I get this (Python 2.7) error:
TypeError: expected string or buffer
I'm a regex newbie. I imagine this is a simple thing to fix, but I'm not sure how. Thanks.
Define all your strings as unicode and don't forget to add the encoding line in the header of the file:
#coding: utf-8
import re
text = re.sub(u"[إأٱآا]", u"ا", u"الآلهة")
print text
To get:
الالهة
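For readers on Python 3, where every str is already Unicode, the same substitution needs no u prefix or coding line. The sketch below also shows one way to get a TypeError out of re.sub: passing something that is not a string as the third argument. That is only an assumption about the cause in the question, since the value of text is not shown there. normalize_alef is a hypothetical helper name.

```python
import re

# Alef with hamza below, hamza above, wasla, madda, and the bare alef.
ALEF_VARIANTS = "[\u0625\u0623\u0671\u0622\u0627]"
ALEF = "\u0627"


def normalize_alef(text):
    """Replace the alef variants listed above with a bare alef."""
    return re.sub(ALEF_VARIANTS, ALEF, text)


word = "\u0627\u0644\u0622\u0644\u0647\u0629"
print(normalize_alef(word))

try:
    normalize_alef(None)  # not a string, so re.sub raises TypeError
except TypeError as exc:
    print("TypeError:", exc)
```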
re.sub expects regex as first parameter. You need to escape the left bracket in your patterns. Use \[ instead of [
Sorry, I couldn't fit this in the comments section. As far as I understand, there is nothing wrong with the re.sub call itself: if you write the characters as Unicode escapes, you get the following.
text = re.sub("[\u0625\u0623\u0671\u0622\u0627]", "\u0627", text)
Because it is Arabic, which is written right to left, the characters just look a bit jumbled on screen.
The call is simply replacing any one of a set of characters with a single character.
Why one would replace \u0627 with \u0627, though, I do not know.
I believe the issue is with text. If you can do print(text), we can see whether it contains any of the characters in "[إأٱآا]" == "[\u0625\u0623\u0671\u0622\u0627]".
As an aside, \u0627 is the smallest vertical line on the left ;-)
To see what each character actually is, copy the whole string from the question into mystr and run the following:
for x in mystr: print(x + '-' + str(ord(x)))
http://www.fileformat.info/info/unicode/char/0627/index.htm
EDITED
>>> re.sub(myset,myrep,text)
u'\u0627\u0627\u0627abc'
>>> res=re.sub(myset,myrep,text)
>>> res
u'\u0627\u0627\u0627abc'
>>> myrep
u'\u0627'
>>> myset
u'[\u0625\u0623\u0671\u0622\u0627]'
>>> text
u'\u0625\u0623\u0623abc'
>>> print(res)
اااabc
>>> print(myrep)
ا
>>> print(myset)
[إأٱآا]
>>> print(text)
إأأabc
>>>
So in essence everything works, and the error is elsewhere.
I think I have reproduced the error that is occurring elsewhere. Here it is:
>>> print(u'\u0625'+ord(u'\u0625'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: coercing to Unicode: need string or buffer, int found
Cheers!
This is how I eventually did it:
sText = re.sub(ur"[\u0625\u0623\u0671\u0622\u0627]", ur"\u0627", sText)
Thank you all for your help.
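A side note on character classes: inside [...], a | is an ordinary character rather than alternation, so a class only needs its members listed one after another. A minimal sketch with ASCII stand-ins for the Arabic letters:

```python
import re

# Inside [...], the | is matched literally, so it gets replaced too.
with_pipes = re.sub(r"[a|b]", "x", "a|b c")

# Listing the members without separators matches only a and b.
without_pipes = re.sub(r"[ab]", "x", "a|b c")

print(with_pipes)     # xxx c
print(without_pipes)  # x|x c
```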
I was curious about how ASCII worked in python, so I decided to find out more. I learnt quite a bit, before I began to try to print letters using ASCII numbers. I'm not sure if I am doing it correctly, as I am using the string module, but I keep picking up an error
print(string.ascii_lowercase(104))
This should print out "h", as far as I know, but all that happens is that I receive an error.
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
string.ascii_lowercase(104)
TypeError: 'str' object is not callable
If someone could help me solve this, or tell me a better way, I would be ever grateful. Thanks in advance! :)
ascii_lowercase is a string, not a function. Use chr(104).
I guess what you want is chr
>>> chr(104)
'h'
The chr() function returns the character for the ASCII value you pass in.
The ord() function returns the ASCII value of the character you pass in.
Example:
chr(104) == 'h'
ord('h') == 104
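Putting the pieces together, a small sketch in Python 3 that also shows how string.ascii_lowercase can be used: it is a plain string, so it is indexed with square brackets rather than called.

```python
import string

print(chr(104))  # h
print(ord('h'))  # 104

# ascii_lowercase is a plain string, so it is indexed rather than called.
print(string.ascii_lowercase[7])          # h
print(string.ascii_lowercase.index('h'))  # 7

# Converting an ASCII code to a position in the alphabet.
print(string.ascii_lowercase[104 - ord('a')])  # h
```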