Invalid character in identifier - python

I am working on the letter distribution problem from HP code wars 2012. I keep getting an error message that says "invalid character in identifier". What does this mean and how can it be fixed?
Here is the page with the information.
import string
def text_analyzer(text):
'''The text to be parsed and
the number of occurrences of the letters given back
be. Punctuation marks, and I ignore the EOF
simple. The function is thus very limited.
'''
result = {}
# Processing
for a in string.ascii_lowercase:
result [a] = text.lower (). count (a)
return result
def analysis_result (results):
# I look at the data
keys = analysis.keys ()
values \u200b\u200b= list(analysis.values \u200b\u200b())
values.sort (reverse = True )
# I turn to the dictionary and
# Must avoid that letters will be overwritten
w2 = {}
list = []
for key in keys:
item = w2.get (results [key], 0 )
if item = = 0 :
w2 [analysis results [key]] = [key]
else :
item.append (key)
w2 [analysis results [key]] = item
# We get the keys
keys = list (w2.keys ())
keys.sort (reverse = True )
for key in keys:
list = w2 [key]
liste.sort ()
for a in list:
print (a.upper (), "*" * key)
text = """I have a dream that one day this nation will rise up and live out the true
meaning of its creed: "We hold these truths to be self-evident, that all men
are created equal. "I have a dream that my four little children will one day
live in a nation where they will not be Judged by the color of their skin but
by the content of their character.
# # # """
analysis result = text_analyzer (text)
analysis_results (results)

The error SyntaxError: invalid character in identifier means you have some character in the middle of a variable name, function, etc. that's not a letter, number, or underscore. The actual error message will look something like this:
File "invalchar.py", line 23
values = list(analysis.values ())
^
SyntaxError: invalid character in identifier
That tells you what the actual problem is, so you don't have to guess "where do I have an invalid character"? Well, if you look at that line, you've got a bunch of non-printing garbage characters in there. Take them out, and you'll get past this.
If you want to know what the actual garbage characters are, I copied the offending line from your code and pasted it into a string in a Python interpreter:
>>> s=' values ​​= list(analysis.values ​​())'
>>> s
' values \u200b\u200b= list(analysis.values \u200b\u200b())'
So, that's \u200b, or ZERO WIDTH SPACE. That explains why you can't see it on the page. Most commonly, you get these because you've copied some formatted (not plain-text) code off a site like StackOverflow or a wiki, or out of a PDF file.
If your editor doesn't give you a way to find and fix those characters, just delete and retype the line.
Of course you've also got at least two IndentationErrors from not indenting things, at least one more SyntaxError from stay spaces (like = = instead of ==) or underscores turned into spaces (like analysis results instead of analysis_results).
The question is, how did you get your code into this state? If you're using something like Microsoft Word as a code editor, that's your problem. Use a text editor. If not… well, whatever the root problem is that caused you to end up with these garbage characters, broken indentation, and extra spaces, fix that, before you try to fix your code.

If your keyboard is set to English US (International) rather than English US the double quotation marks don't work. This is why the single quotation marks worked in your case.

Similar to the previous answers, the problem is some character (possibly invisible) that the Python interpreter doesn't recognize. Because this is often due to copy-pasting code, re-typing the line is one option.
But if you don't want to re-type the line, you can paste your code into this tool or something similar (Google "show unicode characters online"), and it will reveal any non-standard characters. For example,
s=' values ​​= list(analysis.values ​​())'
becomes
s=' values U+200B U+200B​​ = list(analysis.values U+200B U+200B ​​())'
You can then delete the non-standard characters from the string.

Carefully see your quotation, is this correct or incorrect! Sometime double quotation doesn’t work properly, it's depend on your keyboard layout.

I got a similar issue. My solution was to change minus character from:
—
to
-

I got that error, when sometimes I type in Chinese language.
When it comes to punctuation marks, you do not notice that you are actually typing the Chinese version, instead of the English version.
The interpreter will give you an error message, but for human eyes, it is hard to notice the difference.
For example, "," in Chinese; and "," in English.
So be careful with your language setting.

Not sure this is right on but when i copied some code form a paper on using pgmpy and pasted it into the editor under Spyder, i kept getting the "invalid character in identifier" error though it didn't look bad to me. The particular line was grade_cpd = TabularCPD(variable='G',\
For no good reason I replaced the ' with " throughout the code and it worked. Not sure why but it did work

A little bit late but I got the same error and I realized that it was because I copied some code from a PDF. Check the difference between these two:
-
−
The first one is from hitting the minus sign on keyboard and the second is from a latex generated PDF.

This error occurs mainly when copy-pasting the code. Try editing/replacing minus(-), bracket({) symbols.

You don't get a good error message in IDLE if you just Run the module. Try typing an import command from within IDLE shell, and you'll get a much more informative error message. I had the same error and that made all the difference.
(And yes, I'd copied the code from an ebook and it was full of invisible "wrong" characters.)

My solution was to switch my Mac keyboard from Unicode to U.S. English.

it is similar for me as well after copying the code from my email.
def update(self, k=1, step = 2):
if self.start.get() and not self.is_paused.get(): U+A0
x_data.append([i for i in range(0,k,1)][-1])
y = [i for i in range(0,k,step)][-1]
There is additional U+A0 character after checking with the tool as recommended by #Jacob Stern.

Related

printing \x1bZ in python is weird, explanation needed

The following python code has some weird behavior which I don't understand
print('\x1bZ')
when I run this code whether in a file or in the interpreter I have a wierd outcome:
actual values as displayed when you write this value to a file as bytes:
Discoveries at time of posting this question:
whether single quotes or double quotes make a difference (they don't)
0x1b is hex for 27 which in ascii is ESC which matches as displayed with the second picture. This lead me to theorize that the letter Z in the string literal can be replaced but as per my test in point number 3 it cant be reproduced with other letters
instead of \x1bZ (ESC and then Z) trying ESC and then some other letter (I haven't checked all possibilities) yielded no apparent result except from replacing Z with c which seems to clear the terminal
Hypothesis that I came up with:
This page may be relevant to the answer: https://pypi.org/project/py100/ because I have found a pattern there that resembles weird result: Esc[?1;Value0c where Value would be replaced by something. also ^[[?1;<n>0c appears in https://espterm.github.io/docs/VT100%20escape%20codes.html
Is this some encoding problem?
Is this related to ANSI character escaping? [?1;0c vs [38;2; which is used when changing background color of text
Questions:
Why is this particular sequence of characters results in this output?
What is VT100 and how it is related if it is related? (I visited it's Wikipedia page)
whether it is possible to print a string that contains this specific sequence without that weird outcome as displayed in the first picture
all help and knowledge about this will be appreciated!!

vscode python avoid indent of multiline string

So here is an example of a multiline string in vscode/python:
Cursor is after the p , and then you press enter, and end up like this:
i.e. the string ends up indented, which seems what you almost never want - why have an arbitratly amount of whitespace on the next line of this string ?
Is there any way change this in vscode, i.e. for multiline strings, it should end up with this:
I think this problem is related to different coding styles of different people.
For example,
def example(x):
if x:
a = '''
This is help
'''
def example(x):
if x:
a = '''This is help
'''
The automatic indenting of vscode line breaks is based on code blocks. If you want Vscode can identify multiline string, I think it would be better to submit future request in github. I've submitted this issue for you.
I am not 100% sure if what OP meant is just to refer to the indentation in the editor (namely, VSC) or if, by this:
i.e. the string ends up indented, which seems what you almost never want - why have an arbitrary amount of white space on the next line of this string?
...they also meant to refer to the actual output of the multi-line string,
(or also, just in case anybody else finds this post looking for a way to avoid this affecting the actual output of the multi-line string), I'd like to add as a complementary answer (cannot comment yet) that this was already beautifully answered here.
If that's the case and you're reading this for that reason, in short, all you want is to import the standard lib 'inspect' and post-process your string with it, using the cleandoc method.
Without breaking the indentation in your IDE, this method makes sure to give you the string output you actually expected:
All leading whitespace is removed from the first line. Any leading whitespace that can be uniformly removed from the second line onwards is removed. Empty lines at the beginning and end are subsequently removed. Also, all tabs are expanded to spaces.
(From the docs link above)
Hope that helps anyone.

Python character constants for chat bot

I'm currently writting a chat bot in python and I would like to be able to type special characters like emoji, etc. my first attempt was just to place the literal character in the code.
add_reaction('🇦')
Unfortunately not many editors support these characters, so they appear mostly as random gibberish. For readability this isn't very good either.
To solve the gibberish issue I used chr(charcode:{int}) which also made them more copy paste save.
Then I put all of them to a separate file special_chars.py so i could give the characters a name
thumbs_up = chr(...)
smiley_face = chr(...)
regional_a_z = [chr(127462+i) for i in range(0,25)]
...
However this file started to grow really long really quickly.
So is there a better way to do this?
Something to keep in mind:
if a long file isn't avoidable could the character codes be moved to a non-python file
potential list for consecutive characters or character groups ex: thumb-up and down, list of regional indicators
The unicodedata module of the standard library already contains names for the special characters:
>>> unicodedata.lookup('THUMBS UP SIGN')
'\U0001f44d'
>>> unicodedata.lookup("REGIONAL INDICATOR SYMBOL LETTER A")
'\U0001f1e6'
You can get the official name of a character by its code:
>>> unicodedata.name('\U0001F600')
'GRINNING FACE'

python 3 regex not finding confirmed matches

So I'm trying to parse a bunch of citations from a text file using the re module in python 3.4 (on, if it matters, a mac running mavericks). Here's some minimal code. Note that there are two commented lines: they represent two alternative searches. (Obviously, the little one, r'Rawls', is the one that works)
def makeRefList(reffile):
print(reffile)
# namepattern = r'(^[A-Z1][A-Za-z1]*-?[A-Za-z1]*),.*( \(?\d\d\d\d[a-z]?[.)])'
# namepattern = r'Rawls'
refsTuplesList = re.findall(namepattern, reffile, re.MULTILINE)
print(refsTuplesList)
The string in question is ugly, and so I stuck it in a gist: https://gist.github.com/paultopia/6c48c398a42d4834f2ae
As noted, the search string r'Rawls' produces expected output ['Rawls', 'Rawls']. However, the other search string just produces an empty list.
I've confirmed this regex (partially) works using the regex101 tester. Confirmation here: https://regex101.com/r/kP4nO0/1 -- this match what I expect it to match. Since it works in the tester, it should work in the code, right?
(n.b. I copied the text from terminal output from the first print command, then manually replaced \n characters in the string with carriage returns for regex101.)
One possible issue is that python has appended the bytecode flag (is the little b called a "flag?") to the string. This is an artifact of my attempt to convert the text from utf-8 to ascii, and I haven't figured out how to make it go away.
Yet re clearly is able to parse strings in that form. I know this because I'm converting two text files from utf-8 to ascii, and the following code works perfectly fine on the other string, converted from the other text file, which also has a little b in front of it:
def makeCiteList(citefile):
print(citefile)
citepattern = r'[\s(][A-Z1][A-Za-z1]*-?[A-Za-z1]*[ ,]? \(?\d\d\d\d[a-z]?[\s.,)]'
rawCitelist = re.findall(citepattern, citefile)
cleanCitelist = cleanup(rawCitelist)
finalCiteList = list(set(cleanCitelist))
print(finalCiteList)
return(finalCiteList)
The other chunk of text, which the code immediately above matches correctly: https://gist.github.com/paultopia/a12eba2752638389b2ee
The only hypothesis I can come up with is that the first, broken, regex expression is puking on the combination of newline characters and the string being treated as a byte object, even though a) I know the regex is correct for newlines (because, confirmation from the linked regex101), and b) I know it's matching the strings (because, confirmation from the successful match on the other string).
If that's true, though, I don't know what to do about it.
Thus, questions:
1) Is my hypothesis right that it's the combination of newlines and b that blows up my regex? If not, what is?
2) How do I fix that?
a) replace the newlines with something in the string?
b) rewrite the regex somehow?
c) somehow get rid of that b and make it into a normal string again? (how?)
thanks!
Addition
In case this is a problem I need to fix upstream, here's the code I'm using to get the text files and convert to ascii, replacing non-ascii characters:
this function gets called on utf-8 .txt files saved by textwrangler in mavericks
def makeCorpoi(citefile, reffile):
citebox = open(citefile, 'r')
refbox = open(reffile, 'r')
citecorpus = citebox.read()
refcorpus = refbox.read()
citebox.close()
refbox.close()
corpoi = [str(citecorpus), str(refcorpus)]
return corpoi
and then this function gets called on each element of the list the above function returns.
def conv2ASCII(bigstring):
def convHandler(error):
return ('1FOREIGN', error.start + 1)
codecs.register_error('foreign', convHandler)
bigstring = bigstring.encode('ascii', 'foreign')
stringstring = str(bigstring)
return stringstring
Aah. I've tracked it down and answered my own question. Apparently one needs to call some kind of encode method on the decoded thing. The following code produces an actual string, with newlines and everything, out the other end (though now I have to fix a bunch of other bugs before I can figure out if the final output is as expected):
def conv2ASCII(bigstring):
def convHandler(error):
return ('1FOREIGN', error.start + 1)
codecs.register_error('foreign', convHandler)
bigstring = bigstring.encode('ascii', 'foreign')
newstring = bigstring.decode('ascii', 'foreign')
return newstring
apparently the str() function doesn't do the same job, for reasons that are mysterious to me. This is despite an answer here How to make new line commands work in a .txt file opened from the internet? which suggests that it does.

Python, not understanding double quotes with nothing inside

got a bit of a noob question.
I'm trying to get Metagoofil working because it keeps saying "error downloading webpage" etc etc.
A google search found that I can change a bit of the code in one of the config files and it will work properly again.
I'm having a problem though: this seems to be the code I want to use.
self.url = url.replace("/url?q=", "", 1).split("&amp")[0]
BUT, it doesn't seem to like me (based on syntax highlighting) have those two quotation marks together with nothing in between. When they are like above, it starts highlighting .split(" up to here thinking that this is the string.
My question is, how can I make the double quotations together without anything in the middle and have it register as its own string, so it doesn't highlight the .split("
The literal "" is a valid string with a length of zero. I guess your syntax highlighter is not working properly.

Categories

Resources