Python - How to fix CWE-117: Improper Output Neutralization for Logs - python

I have tried using clean = message.replace('\n', '').replace('\r', '').
But it still fails the veracode scan.
Is there any better way to do it?
Also I read about ESAPI Python but not getting any good documentation to fix this issue.

ESAPI Python unfortunately is an extremely long dead project. I don't have any Python alternatives, but the library hasn't had a commit since August 9, 2010.
I'm not familiar with Python web frameworks, but I do teach regularly about secure programming.
Every time you make a function call, you need to ask yourself the question:
"Am I passing this off of to an interpreter?"
If the answer is "yes," then you need to encode to ensure that THAT function is going to treat the input purely as data and not executable code.
In this particular case, if your logging has the potential to be interpreted by a browser (which is common) you'd also want to encode for HTML. However if your environment never uses html-based logging and is always raw text, this could be considered "mitigated" or a "false positive" depending on how your company has agreed to those terms.
Also, I like the practice of wrapping any user input with a unique delimiter. You can use square brackets, but I like strong visual cues in my logs.
Picture: [USERACTION] + userId + ' ►' + encode(userInput) + '◀'
Or even:
[USERACTION] + userId + ' ( ͡ʘ ͜ʖ ͡ʘ)' + encode(userInput) + '<:::::[]=¤ (▀̿̿Ĺ̯̿̿▀̿ ̿)'
Anything that helps make it obvious that someone has tampered with your logs is good. The louder the better!

Related

How to place the iterated string variable in another string

Quick overview: I am writing a a very simple script using Python and Selenium to view Facebook Metrics for multiple Facebook pages.
I am trying to find a clean way to loop through the pages and output their results (it's only one number that I am collecting).
Here is what I have right now but it is not working.
# Navigate to metrics page
pages = ["page_example_1", "page_example_2", "page_example_3"]
for link in pages:
browser.get(('https://www.facebook.com/{link}/insights/?section=navVideos'))
# Navigate to metrics page
pages = ["page_example_1", "page_example_2", "page_example_3"]
for link in pages:
browser.get('https://www.facebook.com/'+ link + '/insights/?section=navVideos')
its just string concatenation
or if you are so much inclined to use that syntax, have a look at the comment by #heather
It didn't work for you, because you aimed to use Python 3.6's f-strings, but forgot to prepend your string with the f char - crucial for this syntax. E.g. your code should be (only the relevant part):
browser.get(f'https://www.facebook.com/{link}/insights/?sights/?section=navVideos')
Alternatively you could use string formatting (e.g. the established approach before 3.6):
browser.get('https://www.facebook.com/{}/insights/?sights/?section=navVideos'.format(link))
In general, string concatenation - 'string1' + variable + 'string2' - is discouraged in python for performance and readability reasons.
BTW, in your sample code you had brackets around the get()'s argument - it is browser.get((arg)), which essentially turned it to a tuple, and might've caused error in the call. Not sure was it a typo or on purpose, as you can see I and the other responders have removed it.

python lxml xpath AttributeError (NoneType) with correct xpath and usually working

I am trying to migrate a forum to phpbb3 with python/xpath. Although I am pretty new to python and xpath, it is going well. However, I need help with an error.
(The source file has been downloaded and processed with tagsoup.)
Firefox/Firebug show xpath: /html/body/table[5]/tbody/tr[position()>1]/td/a[3]/b
(in my script without tbody)
Here is an abbreviated version of my code:
forumfile="morethread-alte-korken-fruchtweinkeller-89069-6046822-0.html"
XPOSTS = "/html/body/table[5]/tr[position()>1]"
t = etree.parse(forumfile)
allposts = t.xpath(XPOSTS)
XUSER = "td[1]/a[3]/b"
XREG = "td/span"
XTIME = "td[2]/table/tr/td[1]/span"
XTEXT = "td[2]/p"
XSIG = "td[2]/i"
XAVAT = "td/img[last()]"
XPOSTITEL = "/html/body/table[3]/tr/td/table/tr/td/div/h3"
XSUBF = "/html/body/table[3]/tr/td/table/tr/td/div/strong[position()=1]"
for p in allposts:
unreg=0
username = None
username = p.find(XUSER).text #this is where it goes haywire
When the loop hits user "tompson" / position()=11 at the end of the file, I get
AttributeError: 'NoneType' object has no attribute 'text'
I've tried a lot of try except else finallys, but they weren't helpful.
I am getting much more information later in the script such as date of post, date of user registry, the url and attributes of the avatar, the content of the post...
The script works for hundreds of other files/sites of this forum.
This is no encode/decode problem. And it is not "limited" to the XUSER part. I tried to "hardcode" the username, then the date of registry will fail. If I skip those, the text of the post (code see below) will fail...
#text of getpost
text = etree.tostring(p.find(XTEXT),pretty_print=True)
Now, this whole error would make sense if my xpath would be wrong. However, all the other files and the first numbers of users in this file work. it is only this "one" at position()=11
Is position() uncapable of going >10 ? I don't think so?
Am I missing something?
Question answered!
I have found the answer...
I must have been very tired when I tried to fix it and came here to ask for help. I did not see something quite obvious...
The way I posted my problem, it was not visible either.
the HTML I downloaded and processed with tagsoup had an additional tag at position 11... this was not visible on the website and screwed with my xpath
(It probably is crappy html generated by the forum in combination with tagsoups attempt to make it parseable)
out of >20000 files less than 20 are afflicted, this one here just happened to be the first...
additionally sometimes the information is in table[4], other times in table[5]. I did account for this and wrote a function that will determine the correct table. Although I tested the function a LOT and thought it working correctly (hence did not inlcude it above), it did not.
So I made a better xpath:
'/html/body/table[tr/td[#width="20%"]]/tr[position()>1]'
and, although this is not related, I ran into another problem with unxpected encoding in the html file (not utf-8) which was fixed by adding:
parser = etree.XMLParser(encoding='ISO-8859-15')
t = etree.parse(forumfile, parser)
I am now confident that after adjusting for strange additional and multiple , and tags my code will work on all files...
Still I will be looking into lxml.html, as I mentioned in the comment, I have never used it before, but if it is more robust and may allow for using the files without tagsoup, it might be a better fit and save me extensive try/except statements and loops to fix the few files screwing with my current script...

how to implement python spell checker using google's "did you mean?"

I'm looking for a way to make a function in python where you pass in a string and it returns whether it's spelled correctly. I don't want to check against a dictionary. Instead, I want it to check Google's spelling suggestions. That way, celebrity names and other various proper nouns will count as being spelled correctly.
Here's where I'm at so far. It works most of the time, but it messes up with some celebrity names. For example, things like "cee lo green" or "posner" get marked as incorrect.
import httplib
import xml.dom.minidom
data = """
<spellrequest textalreadyclipped="0" ignoredups="0" ignoredigits="1" ignoreallcaps="1">
<text> %s </text>
</spellrequest>
"""
def spellCheck(word_to_spell):
con = httplib.HTTPSConnection("www.google.com")
con.request("POST", "/tbproxy/spell?lang=en", data % word_to_spell)
response = con.getresponse()
dom = xml.dom.minidom.parseString(response.read())
dom_data = dom.getElementsByTagName('spellresult')[0]
if dom_data.childNodes:
for child_node in dom_data.childNodes:
result = child_node.firstChild.data.split()
for word in result:
if word_to_spell.upper() == word.upper():
return True;
return False;
else:
return True;
Peter Norvig tells you how implement spell checker in Python.
Rather than sticking to Mr. Google, try out other big fellows.
If you really want to stick with search engines which count page requests, Yahoo and Bing are providing some excellent features. Yahoo is directly providing spell checking services using YQL tables (Free: 5000 request/day and non-commercial).
You have good number of Python API's which are capable to do a lot similar magic including on nouns that you mentioned (sometimes may turn around - after all its somewhere based upon probability)
So, in the second case, you got a good list (totally free)
GNU - Aspell (Even got python bindings)
PyEnchant
Whoosh (It does a lot more than spell checking but I think it has some edge on it.)
I hope they should give you a clear idea of how things work.
Actually spell checking involves very complex mechanisms in the areas of Machine learning, AI, NLP.. etc a lot more. So, companies like Google/ Yahoo don't really offer their API entirely free.

Python: Matching & Stripping port number from socket data

I have data coming in to a python server via a socket. Within this data is the string '<port>80</port>' or which ever port is being used.
I wish to extract the port number into a variable. The data coming in is not XML, I just used the tag approach to identifying data for future XML use if needed. I do not wish to use an XML python library, but simply use something like regexp and strings.
What would you recommend is the best way to match and strip this data?
I am currently using this code with no luck:
p = re.compile('<port>\w</port>')
m = p.search(data)
print m
Thank you :)
Regex can't parse XML and shouldn't be used to parse fake XML. You should do one of
Use a serialization method that is nicer to work with to start with, such as JSON or an ini file with the ConfigParser module.
Really use XML and not something that just sort of looks like XML and really parse it with something like lxml.etree.
Just store the number in a file if this is the entirety of your configuration. This solution isn't really easier than just using JSON or something, but it's better than the current one.
Implementing a bad solution now for future needs that you have no way of defining or accurately predicting is always a bad approach. You will be kept busy enough trying to write and maintain software now that there is no good reason to try to satisfy unknown future needs. I have never seen a case where "I'll put this in for later" has led to less headache later on, especially when I put it in by doing something completely wrong. YAGNI!
As to what's wrong with your snippet other than using an entirely wrong approach, angled brackets have a meaning in regex.
Though Mike Graham is correct, using regex for xml is not 'recommended', the following will work:
(I have defined searchType as 'd' for numerals)
searchStr = 'port'
if searchType == 'd':
retPattern = '(<%s>)(\d+)(</%s>)'
else:
retPattern = '(<%s>)(.+?)(</%s>)'
searchPattern = re.compile(retPattern % (searchStr, searchStr))
found = searchPattern.search(searchStr)
retVal = found.group(2)
(note the complete lack of error checking, that is left as an exercise for the user)

How can I use Microsoft Word's spelling/grammar checker programmatically?

I want to process a medium to large number of text snippets using a spelling/grammar checker to get a rough approximation and ranking of their "quality." Speed is not really of concern either, so I think the easiest way is to write a script that passes off the snippets to Microsoft Word (2007) and runs its spelling and grammar checker on them.
Is there a way to do this from a script (specifically, Python)? What is a good resource for learning about controlling Word programmatically?
If not, I suppose I can try something from Open Source Grammar Checker (SO).
Update
In response to Chris' answer, is there at least a way to a) open a file (containing the snippet(s)), b) run a VBA script from inside Word that calls the spelling and grammar checker, and c) return some indication of the "score" of the snippet(s)?
Update 2
I've added an answer which seems to work, but if anyone has other suggestions I'll keep this question open for some time.
It took some digging, but I think I found a useful solution. Following the advice at http://www.nabble.com/Edit-a-Word-document-programmatically-td19974320.html I'm using the win32com module (if the SourceForge link doesn't work, according to this Stack Overflow answer you can use pip to get the module), which allows access to Word's COM objects. The following code demonstrates this nicely:
import win32com.client, os
wdDoNotSaveChanges = 0
path = os.path.abspath('snippet.txt')
snippet = 'Jon Skeet lieks ponies. I can haz reputashunz? '
snippet += 'This is a correct sentence.'
file = open(path, 'w')
file.write(snippet)
file.close()
app = win32com.client.gencache.EnsureDispatch('Word.Application')
doc = app.Documents.Open(path)
print "Grammar: %d" % (doc.GrammaticalErrors.Count,)
print "Spelling: %d" % (doc.SpellingErrors.Count,)
app.Quit(wdDoNotSaveChanges)
which produces
Grammar: 2
Spelling: 3
which match the results when invoking the check manually from Word.

Categories

Resources