This topic is related to the Parsing a CS:GO script file in Python theme, but there is another problem.
I'm working on a content from CS:GO and now i'm trying to make a python tool importing all data from from /scripts/ folder into Python dictionaries.
The next step after parsing data is parsing Language resource file from /resources and making relations between dictionaries and language.
There is an original file for Eng localization:
https://github.com/spec45as/PySteamBot/blob/master/csgo_english.txt
The file format is similar to the previous task, but I have faced with another problems. All language files is in UTF-16-LE encode, i couldn't understand the way of working with encoded files and strings in Python (I'm mostly working with Java)
I have tried to make some solutions, based on open(fileName, encoding='utf-16-le').read(), but i don't know how to work with such encoded strings in pyparsing.
pyparsing.ParseException: Expected quoted string, starting with "
ending with " (at char 0), (line:1, col:1)
Another problem is lines with \"-like expressions, for example:
"musickit_midnightriders_01_desc" "\"HAPPY HOLIDAYS, ****ERS!\"\n -Midnight Riders"
How to parse these symbols if I want to leave these lines as they are?
There are a few new wrinkles to this input file that were not in the original CS:GO example:
embedded \" escaped quotes in some of the value strings
some of the quoted value strings span multiple lines
some of the values end with a trailing environment condition (such as [$WIN32], [$OSX])
embedded comments in the file, marked with '//'
The first two are addressed by modifying the definition of value_qs. Since values are now more fully-featured than keys, I decided to use separate QuotedString definitions for them:
key_qs = QuotedString('"').setName("key_qs")
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")
The third requires a bit more refactoring. The use of these qualifying conditions is similar to #IFDEF macros in C - they enable/disable the definition only if the environment matches the condition. Some of these conditions were even boolean expressions:
[!$PS3]
[$WIN32||$X360||$OSX]
[!$X360&&!$PS3]
This could lead to duplicate keys in the definition file, such as in these lines:
"Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to Xbox LIVE to view Leaderboards. Please check your connection and try again." [$X360]
"Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to PlayStation®Network and Steam to view Leaderboards. Please check your connection and try again." [$PS3]
"Menu_Dlg_Leaderboards_Lost_Connection" "You must be connected to Steam to view Leaderboards. Please check your connection and try again."
which contain 3 definitions for the key "Menu_Dlg_Leaderboards_Lost_Connection", depending on what environment values were set.
In order to not lose these values when parsing the file, I chose to modify the key at parse time by appending the condition if one was present. This code implements the change:
LBRACK,RBRACK = map(Suppress, "[]")
qualExpr = Word(alphanums+'$!&|')
qualExprCondition = LBRACK + qualExpr + RBRACK
key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
def addQualifierToKey(tokens):
tt = tokens[0]
if 'qual' in tt:
tt[0] += '/' + tt.pop(-1)
key_value.setParseAction(addQualifierToKey)
So that in the sample above, you would get 3 keys:
Menu_Dlg_Leaderboards_Lost_Connection/$X360
Menu_Dlg_Leaderboards_Lost_Connection/$PS3
Menu_Dlg_Leaderboards_Lost_Connection
Lastly, the handling of comments, probably the easiest. Pyparsing has built-in support for skipping over comments, just like whitespace. You just need to define the expression for the comment, and have the top-level parser ignore it. To support this feature, several common comment forms are pre-defined in pyparsing. In this case, the solution is just to change the final parser defintion to:
parser.ignore(dblSlashComment)
And LASTLY lastly, there is a minor bug in the implementation of QuotedString, in which standard whitespace string literals like \t and \n are not handled, and are just treated as an unnecessarily-escaped 't' or 'n'. So for now, when this line is parsed:
"SFUI_SteamOverlay_Text" "This feature requires Steam Community In-Game to be enabled.\n\nYou might need to restart the game after you enable this feature in Steam:\nSteam -> File -> Settings -> In-Game: Enable Steam Community In-Game\n" [$WIN32]
For the value string you just get:
This feature requires Steam Community In-Game to be enabled.nnYou
might need to restart the game after you enable this feature in
Steam:nSteam -> File -> Settings -> In-Game: Enable Steam Community
In-Gamen
instead of:
This feature requires Steam Community In-Game to be enabled.
You might need to restart the game after you enable this feature in Steam:
Steam -> File -> Settings -> In-Game: Enable Steam Community In-Game
I will have to fix this behavior in the next release of pyparsing.
Here is the final parser code:
from pyparsing import (Suppress, QuotedString, Forward, Group, Dict,
ZeroOrMore, Word, alphanums, Optional, dblSlashComment)
LBRACE,RBRACE = map(Suppress, "{}")
key_qs = QuotedString('"').setName("key_qs")
value_qs = QuotedString('"', escChar='\\', multiline=True).setName("value_qs")
# use this code to convert integer values to ints at parse time
def convert_integers(tokens):
if tokens[0].isdigit():
tokens[0] = int(tokens[0])
value_qs.setParseAction(convert_integers)
LBRACK,RBRACK = map(Suppress, "[]")
qualExpr = Word(alphanums+'$!&|')
qualExprCondition = LBRACK + qualExpr + RBRACK
value = Forward()
key_value = Group(key_qs + value + Optional(qualExprCondition("qual")))
def addQualifierToKey(tokens):
tt = tokens[0]
if 'qual' in tt:
tt[0] += '/' + tt.pop(-1)
key_value.setParseAction(addQualifierToKey)
struct = (LBRACE + Dict(ZeroOrMore(key_value)) + RBRACE).setName("struct")
value <<= (value_qs | struct)
parser = Dict(key_value)
parser.ignore(dblSlashComment)
sample = open('cs_go_sample2.txt').read()
config = parser.parseString(sample)
print (config.keys())
for k in config.lang.keys():
print ('- ' + k)
#~ config.lang.pprint()
print (config.lang.Tokens.StickerKit_comm01_burn_them_all)
print (config.lang.Tokens['SFUI_SteamOverlay_Text/$WIN32'])
Related
I'm writing some code that reads words from a text file and sorts them into a dictionary. It actually all runs fine, but for reference here it is:
def find_words(file_name, delimiter = " "):
"""
A function for finding the number of individual words, and the most popular words, in a given file.
The process will stop at any line in the file that starts with the word 'finish'.
If there is no finish point, the process will go to the end of the file.
Inputs: file_name: Name of file you want to read from, e.g. "mywords.txt"
delimiter: The way the words in the file are separated e.g. " " or ", "
: Delimiter will default to " " if left blank.
Output: Dictionary with all the words contained in the given file, and how many times each word appears.
"""
words = []
dictt = {}
with open(file_name, 'r') as wordfile:
for line in wordfile:
words = line.split(delimiter)
if words[0]=="finish":
break
# This next part is for filling the dictionary
# and correctly counting the amount of times each word appears.
for i in range(len(words)):
a = words[i]
if a=="\n" or a=="":
continue
elif dictt.has_key(a)==False:
dictt[words[i]] = 1
else:
dictt[words[i]] = int(dictt.get(a)) + 1
return dictt
The problem is that it only works if the arguments are given as string literals, e.g, this works:
test = find_words("hello.txt", " " )
But this doesn't:
test = find_words(hello.txt, )
The error message is undefined name 'hello'
I don't know how to alter the function arguments such that I can enter them without speech marks.
Thanks!
Simple, you define that name:
class hello:
txt = "hello.txt"
But joking aside, all the argument values in a function call are expressions. If you want to pass a string literally you'll have to make a string literal, using the quotes. Python is not a text preprocessor like m4 or cpp, and expects the entire program text to follow its syntax.
So it turns out I just misunderstood what was being asked. I've had it clarified by the course leader now.
As I am now fully aware, a function definition needs to be told when a string is being entered, hence the quote marks being required.
I admit full ignorance over my depth of understanding of how it all works - I thought you could pretty much put any assortment of letters and/or numbers in as an argument and then you can manipulate them within the function definition.
My ignorance may stem from the fact that I'm quite new to Python, having learned my coding basics on C++ where, if I remember correctly (it was well over a year ago), functions are defined with each argument being specifically set up as their type, e.g.
int max(int num1, int num2)
Whereas in Python you don't quite do it like that.
Thanks for the attempts at help (and ridicule!)
Problem is sorted now.
Is there a way to pretty-print Lisp-style code string (in other words, a bunch of balanced parentheses and text within) in Python without re-inventing a wheel?
Short answer
I think a reasonable approach, if you can, is to generate Python lists or custom objects instead of strings and use the pprint module, as suggested by #saulspatz.
Long answer
The whole question look like an instance of an XY-problem. Why? because you are using Python (why not Lisp?) to manipulate strings (why not data-structures?) representing generated Lisp-style code, where Lisp-style is defined as "a bunch of parentheses and text within".
To the question "how to pretty-print?", I would thus respond "I wouldn't start from here!".
The best way to not reinvent the wheel in your case, apart from using existing wheels, is to stick to a simple output format.
But first of all all, why do you need to pretty-print? who will look at the resulting code?
Depending on the exact Lisp dialect you are using and the intended usage of the code, you could format your code very differently. Think about newlines, indentation and maximum width of your text, for example. The Common Lisp pretty-printer is particulary evolved and I doubt you want to have the same level of configurability.
If you used Lisp, a simple call to pprint would solve your problem, but you are using Python, so stick with the most reasonable output for the moment because pretty-printing is a can of worms.
If your code is intended for human readers, please:
don't put closing parenthesis on their own lines
don't vertically align open and close parenthesis
don't add spaces between opening parenthesis
This is ugly:
( * ( + 3 x )
(f
x
y
)
)
This is better:
(* (+ 3 x)
(f x y))
Or simply:
(* (+ 3 x) (f x y))
See here for more details.
But before printing, you have to parse your input string and make sure it is well-formed. Maybe you are sure it is well-formed, due to how you generate your forms, but I'd argue that the printer should ignore that and not make too many assumptions. If you passed the pretty-printer an AST represented by Python objects instead of just strings, this would be easier, as suggested in comments. You could build a data-structure or custom classes and use the pprint (python) module. That, as said above, seems to be the way to go in your case, if you can change how you generate your Lisp-style code.
With strings, you are supposed to handle any possible input and reject invalid ones.
This means checking that parenthesis and quotes are balanced (beware of escape characters), etc.
Actually, you don't need to really build an intermediate tree for printing (though it would probably help for other parts of your program), because Lisp-style code is made of forms that are easily nested and use a prefix notation: you can scan your input string from left-to-right and print as required when seeing parenthesis (open parenthesis: recurse; close parenthesis, return from recursion). When you first encounter an unescaped double-quote ", read until the next one ", ...
This, coupled with a simple printing method, could be sufficient for your needs.
I think the easiest method would be to use triple quotations. If you say:
print """
(((This is some lisp code))) """
It should work.
You can format your code any way you like within the triple quotes and it will come out the way you want it to.
Best of luck and happy coding!
I made this rudimentary pretty printer once for prettifying CLIPS, which is based on Lisp. Might help:
def clips_pprint(clips_str: str) -> str:
"""Pretty-prints a CLIPS string.
Indents a CLIPS string for easier visual confirmation during development
and verification.
Assumes the CLIPS string is valid CLIPS, i.e. braces are paired.
"""
LB = "("
RB = ")"
TAB = " " * 4
formatted_clips_str = ""
tab_count = 0
for c in clips_str:
if c == LB:
formatted_clips_str += os.linesep
for _i in range(tab_count):
formatted_clips_str += TAB
tab_count += 1
elif c == RB:
tab_count -= 1
formatted_clips_str += c
return formatted_clips_str.strip()
I am working on the letter distribution problem from HP code wars 2012. I keep getting an error message that says "invalid character in identifier". What does this mean and how can it be fixed?
Here is the page with the information.
import string
def text_analyzer(text):
'''The text to be parsed and
the number of occurrences of the letters given back
be. Punctuation marks, and I ignore the EOF
simple. The function is thus very limited.
'''
result = {}
# Processing
for a in string.ascii_lowercase:
result [a] = text.lower (). count (a)
return result
def analysis_result (results):
# I look at the data
keys = analysis.keys ()
values \u200b\u200b= list(analysis.values \u200b\u200b())
values.sort (reverse = True )
# I turn to the dictionary and
# Must avoid that letters will be overwritten
w2 = {}
list = []
for key in keys:
item = w2.get (results [key], 0 )
if item = = 0 :
w2 [analysis results [key]] = [key]
else :
item.append (key)
w2 [analysis results [key]] = item
# We get the keys
keys = list (w2.keys ())
keys.sort (reverse = True )
for key in keys:
list = w2 [key]
liste.sort ()
for a in list:
print (a.upper (), "*" * key)
text = """I have a dream that one day this nation will rise up and live out the true
meaning of its creed: "We hold these truths to be self-evident, that all men
are created equal. "I have a dream that my four little children will one day
live in a nation where they will not be Judged by the color of their skin but
by the content of their character.
# # # """
analysis result = text_analyzer (text)
analysis_results (results)
The error SyntaxError: invalid character in identifier means you have some character in the middle of a variable name, function, etc. that's not a letter, number, or underscore. The actual error message will look something like this:
File "invalchar.py", line 23
values = list(analysis.values ())
^
SyntaxError: invalid character in identifier
That tells you what the actual problem is, so you don't have to guess "where do I have an invalid character"? Well, if you look at that line, you've got a bunch of non-printing garbage characters in there. Take them out, and you'll get past this.
If you want to know what the actual garbage characters are, I copied the offending line from your code and pasted it into a string in a Python interpreter:
>>> s=' values = list(analysis.values ())'
>>> s
' values \u200b\u200b= list(analysis.values \u200b\u200b())'
So, that's \u200b, or ZERO WIDTH SPACE. That explains why you can't see it on the page. Most commonly, you get these because you've copied some formatted (not plain-text) code off a site like StackOverflow or a wiki, or out of a PDF file.
If your editor doesn't give you a way to find and fix those characters, just delete and retype the line.
Of course you've also got at least two IndentationErrors from not indenting things, at least one more SyntaxError from stay spaces (like = = instead of ==) or underscores turned into spaces (like analysis results instead of analysis_results).
The question is, how did you get your code into this state? If you're using something like Microsoft Word as a code editor, that's your problem. Use a text editor. If not… well, whatever the root problem is that caused you to end up with these garbage characters, broken indentation, and extra spaces, fix that, before you try to fix your code.
If your keyboard is set to English US (International) rather than English US the double quotation marks don't work. This is why the single quotation marks worked in your case.
Similar to the previous answers, the problem is some character (possibly invisible) that the Python interpreter doesn't recognize. Because this is often due to copy-pasting code, re-typing the line is one option.
But if you don't want to re-type the line, you can paste your code into this tool or something similar (Google "show unicode characters online"), and it will reveal any non-standard characters. For example,
s=' values = list(analysis.values ())'
becomes
s=' values U+200B U+200B = list(analysis.values U+200B U+200B ())'
You can then delete the non-standard characters from the string.
Carefully see your quotation, is this correct or incorrect! Sometime double quotation doesn’t work properly, it's depend on your keyboard layout.
I got a similar issue. My solution was to change minus character from:
—
to
-
I got that error, when sometimes I type in Chinese language.
When it comes to punctuation marks, you do not notice that you are actually typing the Chinese version, instead of the English version.
The interpreter will give you an error message, but for human eyes, it is hard to notice the difference.
For example, "," in Chinese; and "," in English.
So be careful with your language setting.
Not sure this is right on but when i copied some code form a paper on using pgmpy and pasted it into the editor under Spyder, i kept getting the "invalid character in identifier" error though it didn't look bad to me. The particular line was grade_cpd = TabularCPD(variable='G',\
For no good reason I replaced the ' with " throughout the code and it worked. Not sure why but it did work
A little bit late but I got the same error and I realized that it was because I copied some code from a PDF. Check the difference between these two:
-
−
The first one is from hitting the minus sign on keyboard and the second is from a latex generated PDF.
This error occurs mainly when copy-pasting the code. Try editing/replacing minus(-), bracket({) symbols.
You don't get a good error message in IDLE if you just Run the module. Try typing an import command from within IDLE shell, and you'll get a much more informative error message. I had the same error and that made all the difference.
(And yes, I'd copied the code from an ebook and it was full of invisible "wrong" characters.)
My solution was to switch my Mac keyboard from Unicode to U.S. English.
it is similar for me as well after copying the code from my email.
def update(self, k=1, step = 2):
if self.start.get() and not self.is_paused.get(): U+A0
x_data.append([i for i in range(0,k,1)][-1])
y = [i for i in range(0,k,step)][-1]
There is additional U+A0 character after checking with the tool as recommended by #Jacob Stern.
I am trying to do a program that evaluates if a propositional logic formula is valid or invalid using the semantic three method.
I managed to evaluate if a formula is well formed or not so far:
from pyparsing import *
from string import lowercase
def fbf():
atom = Word(lowercase, max=1) #alfabeto minusculas
op = oneOf('^ V => <=>') #Operadores
identOp = oneOf('( [ {')
identCl = oneOf(') ] }')
form = Forward() #Iniciar de manera recursiva
#Gramatica
form << ( (Group(Literal('~') + form)) | ( Group(identOp + form + op + form + identCl) ) | ( Group(identOp + form + identCl) ) | (atom) )
return form
#Haciendo todo lo que se debe
entrada = raw_input("Entrada: ")
try:
print fbf().parseString(entrada, parseAll=True)
except ParseException as error: #Manejando error
print error.markInputline()
print error
print
Now I need to convert the negated forumla ~(form) acording to the Monrgan's Law, The BNF of Morgan's Law its something like this:
~((form) V (form)) = (~(form) ^ ~(form))
~((form) ^ (form)) = (~(form) V ~(form))
http://en.wikipedia.org/wiki/De_Morgans_laws
Parsing must be recursive; I was reading about Parseactions, but I don't really understand I'm new to python and very unskilled.
Can somebody help me on how to get this to work?
Juan Jose -
You are asking for a lot of work on the part of this audience, whether you realize it or not. Here are some suggestions on how to make progress on this problem:
Recognize that parsing the input is only the first step in this overall program. You can't just write any parser that gets through the input, and then declare yourself ready for the next step. You need to anticipate what you will do with the parsed output, and try to parse the data in such a way that it readies you to take the next step - which in your case is to do some logical transformations to apply DeMorgans Laws. In fact, you may be best off working backwards - assume you have a parser, what would you need your transformation code to work with, how would an expression look, and how would you perform the transform itself? This will naturally structure your thinking to the application domain, and give you a target result format when you start writing the parser.
When you start to write your parser, look at other pyparsing examples that do similar tasks, such as SimpleBool.py on the pyparsing wiki. See how they parse the input to create a set of evaluatable objects, which can then be acted upon in the application domain (whether it is to evaluate them, transform them, or whatever). Think about what kind of objects you want to create in your parser that will work with the transformation methods you outlined in the last step.
Take time to write a BNF for the syntax you will parse. Write out some sample test strings that you would parse to help you anticipate syntax issues. Is "~~p ^ q V r" a valid string? Can identifiers be multiple characters, or will you restrict to just single characters (single will be easier to work with at the beginning, and you can expand it later easily)? Keep your syntax simple if you can, such as just supporting ()'s for grouping, instead of any matched pair of ()'s, []'s, or {}'s.
When you implement your parser, start with simple test cases first and work your way up. You may have to backtrack a bit if you find that you made some assumptions early on that more complicated strings don't support, but that's pretty typical for most programming projects.
As an implementation tip, read up on using the operatorPrecedence helper, as it is specifically designed for these types of parsing jobs. Look at how it is used in SimpleBool.py to create an object hierarchy that mirrors the structure of the input string. Then think about what objects would do in your transformation process.
Good luck!
I am expecting users to upload a CSV file of max size 1MB to a web form that should fit a given format similar to:
"<String>","<String>",<Int>,<Float>
That will be processed later. I would like to verify the file fits a specified format so that the program that shall later use the file doesnt receive unexpected input and that there are no security concerns (say some injection attack against the parsing script that does some calculations and db insert).
(1) What would be the best way to go about doing this that would be fast and thorough? From what I've researched I could go the path of regex or something more like this. I've looked at the python csv module but that doesnt appear to have any built in verification.
(2) Assuming I go for a regex, can anyone direct me to towards the best way to do this? Do I match for illegal characters and reject on that? (eg. no '/' '\' '<' '>' '{' '}' etc.) or match on all legal eg. [a-zA-Z0-9]{1,10} for the string component? I'm not too familiar with regular expressions so pointers or examples would be appreciated.
EDIT:
Strings should contain no commas or quotes it would just contain a name (ie. first name, last name). And yes I forgot to add they would be double quoted.
EDIT #2:
Thanks for all the answers. Cutplace is quite interesting but is a standalone. Decided to go with pyparsing in the end because it gives more flexibility should I add more formats.
Pyparsing will process this data, and will be tolerant of unexpected things like spaces before and after commas, commas within quotes, etc. (csv module is too, but regex solutions force you to add "\s*" bits all over the place).
from pyparsing import *
integer = Regex(r"-?\d+").setName("integer")
integer.setParseAction(lambda tokens: int(tokens[0]))
floatnum = Regex(r"-?\d+\.\d*").setName("float")
floatnum.setParseAction(lambda tokens: float(tokens[0]))
dblQuotedString.setParseAction(removeQuotes)
COMMA = Suppress(',')
validLine = dblQuotedString + COMMA + dblQuotedString + COMMA + \
integer + COMMA + floatnum + LineEnd()
tests = """\
"good data","good2",100,3.14
"good data" , "good2", 100, 3.14
bad, "good","good2",100,3.14
"bad","good2",100,3
"bad","good2",100.5,3
""".splitlines()
for t in tests:
print t
try:
print validLine.parseString(t).asList()
except ParseException, pe:
print pe.markInputline('?')
print pe.msg
print
Prints
"good data","good2",100,3.14
['good data', 'good2', 100, 3.1400000000000001]
"good data" , "good2", 100, 3.14
['good data', 'good2', 100, 3.1400000000000001]
bad, "good","good2",100,3.14
?bad, "good","good2",100,3.14
Expected string enclosed in double quotes
"bad","good2",100,3
"bad","good2",100,?3
Expected float
"bad","good2",100.5,3
"bad","good2",100?.5,3
Expected ","
You will probably be stripping those quotation marks off at some future time, pyparsing can do that at parse time by adding:
dblQuotedString.setParseAction(removeQuotes)
If you want to add comment support to your input file, say a '#' followed by the rest of the line, you can do this:
comment = '#' + restOfline
validLine.ignore(comment)
You can also add names to these fields, so that you can access them by name instead of index position (which I find gives more robust code in light of changes down the road):
validLine = dblQuotedString("key") + COMMA + dblQuotedString("title") + COMMA + \
integer("qty") + COMMA + floatnum("price") + LineEnd()
And your post-processing code can then do this:
data = validLine.parseString(t)
print "%(key)s: %(title)s, %(qty)d in stock at $%(price).2f" % data
print data.qty*data.price
I'd vote for parsing the file, checking you've got 4 components per record, that the first two components are strings, the third is an int (checking for NaN conditions), and the fourth is a float (also checking for NaN conditions).
Python would be an excellent tool for the job.
I'm not aware of any libraries in Python to deal with validation of CSV files against a spec, but it really shouldn't be too hard to write.
import csv
import math
dataChecker = csv.reader(open('data.csv'))
for row in dataChecker:
if len(row) != 4:
print 'Invalid row length.'
return
my_int = int(row[2])
my_float = float(row[3])
if math.isnan(my_int):
print 'Bad int found'
return
if math.isnan(my_float):
print 'Bad float found'
return
print 'All good!'
Here's a small snippet I made:
import csv
f = csv.reader(open("test.csv"))
for value in f:
value[0] = str(value[0])
value[1] = str(value[1])
value[2] = int(value[2])
value[3] = float(value[3])
If you run that with a file that doesn't have the format your specified, you'll get an exception:
$ python valid.py
Traceback (most recent call last):
File "valid.py", line 8, in <module>
i[2] = int(i[2])
ValueError: invalid literal for int() with base 10: 'a3'
You can then make a try-except ValueError to catch it and let the users know what they did wrong.
There can be a lot of corner-cases for parsing CSV, so you probably don't want to try doing it "by hand". At least start with a package/library built-in to the language that you're using, even if it doesn't do all the "verification" you can think of.
Once you get there, then examine the fields for your list of "illegal" chars, or examine the values in each field to determine they're valid (if you can do so). You also don't even need a regex for this task necessarily, but it may be more concise to do it that way.
You might also disallow embedded \r or \n, \0 or \t. Just loop through the fields and check them after you've loaded the data with your csv lib.
Try Cutplace. It verifies that tabluar data conforms to an interface control document.
Ideally, you want your filtering to be as restrictive as possible - the fewer things you allow, the fewer potential avenues of attack. For instance, a float or int field has a very small number of characters (and very few configurations of those characters) which should actually be allowed. String filtering should ideally be restricted to only what characters people would have a reason to input - without knowing the larger context it's hard to tell you exactly which you should allow, but at a bare minimum the string match regex should require quoting of strings and disallow anything that would terminate the string early.
Keep in mind, however, that some names may contain things like single quotes ("O'Neil", for instance) or dashes, so you couldn't necessarily rule those out.
Something like...
/"[a-zA-Z' -]+"/
...would probably be ideal for double-quoted strings which are supposed to contain names. You could replace the + with a {x,y} length min/max if you wanted to enforce certain lengths as well.