Extra line breaks being displayed after SQL storage - python

I'm currently developing this in Python (with web.py) on Windows, and using latest Chrome.
Simple test:
User is shown a basic web form with a textarea component.
When form is submitted, the content of this textarea is placed into a MySql table, unmodified.
Later, the user returns to edit their last submission.
I then present a new form, with the textarea populated directly from the database for modification - HTML is prevented from being processed so tags are displayed.
However, when re-displayed to the user, every line now has an extra (unwanted) line-break between each line.
How can I prevent this?
eg:
Submitted Text:
Line 1
Line 2
When re-displayed, the text looks like:
Line 1

Line 2
I'm aware that this is going to be some kind of CR LF issue but can't quite get to the solution.
I tried a conversion to <br /> but that just displays the <br /> text not an actual line break.
I don't really want to modify the text before putting it into the database either.
But I guess I do need something that would compensate for various OS that display line breaks differently.
I've read through many of the similar questions here, but they are primarily PHP, or talk about nl2br which wouldn't be a solution here anyway.

If you are using print to output the text, append a comma at the end of your statement to remove the new-line character.
e.g.
print 'Some Text',
It may be that a new-line is already in your printed text and doesn't require the extra appended one from print.
If not, try .rstrip('\n') on your string to remove any new-lines.
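As a rough illustration of the .rstrip('\n') suggestion, the sketch below also normalises line endings before output. It assumes the stored text contains Windows-style \r\n pairs, which the question suspects but does not confirm; normalize_newlines and the sample value are hypothetical. Note that the trailing-comma form of print shown above is Python 2 syntax; in Python 3 the equivalent is the end="" argument used here.

```python
def normalize_newlines(text):
    # Convert Windows (CR LF) and bare CR line endings to a single LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical value as it might come back from the database.
stored = "Line 1\r\nLine 2\r\n"

# Normalise first, then drop trailing newlines as the answer suggests.
cleaned = normalize_newlines(stored).rstrip("\n")

# end="" stops print from appending its own newline (Python 3 syntax).
print(cleaned, end="")
```

The database content stays untouched in this approach; only the copy sent back to the form is changed.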

Related

How to parse and preserve text formatting (Python-Docx)?

I'm using Python-Docx to export all the data from a 500-page Docx file into a spreadsheet using pandas. So far so good except that the process is removing all character styles. I have written the following to preserve superscript, but I can't seem to get it working.
for para in document.paragraphs:
    content = para.text
    for run in para.runs:
        if run.font.superscript:
            r.font.superscript = True
        r = para.add_run(run.text)
        scripture += r.text
My input text might be, for example:
Genesis 1:1 1 In the beginning God created the heavens and the earth.
But my output into the Xlsx file is:
Genesis 1:1 1 In the beginning God created the heavens and the earth. (Still losing the superscript formatting).
How do I preserve the font.style of each run for export? Perhaps more specifically, how do I get the text formatting from each run to be encoded into the "scripture" string?
Any help is greatly appreciated!
You cannot encode font information in a str object. A str object is a sequence of characters and that's that. It cannot indicate "make these five characters bold and the following three characters italic". There's just no place to put that sort of thing and the str data type is not made for that job.
Font (character-formatting) information must be stored in a container object of some sort. In Word, that's a run. In HTML it can be a <span> element. If you want character-formatting in your spreadsheet, you'll need to know how character formatting is stored in the target format (Excel maybe) and then apply it to text in that export format on a run-by-run basis.
There are some other problems with your code you should be aware of:
The r in r.font.superscript = True is used before it is defined. The r = para.add_run(run.text) line would need to appear before that line to avoid problems. I wouldn't bother fixing it, because it turns out that line isn't actually doing anything, but names need to be defined before use.
You are doubling the size of the source paragraph by adding runs to it. This part actually contributes nothing because you then call run.text which as we mentioned cannot contain any character-formatting information and so it gets stripped back out.
The same result as your current code can be achieved by this:
scripture = "".join(p.text for p in document.paragraphs)
but I think you'll want an approach like:
Parse out bits that go in separate cells
Within the text that goes into a single cell, write a "rich-text" cell something like that described here for XlsxWriter: https://xlsxwriter.readthedocs.io/example_rich_strings.html
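A minimal sketch of the rich-text idea, assuming XlsxWriter is the export library. The helper names build_fragments and write_rich_cell are hypothetical; the calls used (add_format with font_script, write_rich_string, write_string) are XlsxWriter's own. Each run is represented as a (text, is_superscript) pair so the formatting travels alongside the text instead of inside a str.

```python
def build_fragments(runs, superscript_format):
    """Turn (text, is_superscript) pairs into write_rich_string() arguments.

    A format object placed before a string applies to that string only.
    """
    fragments = []
    for text, is_superscript in runs:
        if not text:
            continue  # skip empty runs; rich strings cannot hold empty fragments
        if is_superscript:
            fragments.append(superscript_format)
        fragments.append(text)
    return fragments


def write_rich_cell(path, runs):
    """Write the runs into cell A1 of a new workbook (needs XlsxWriter)."""
    import xlsxwriter  # third-party: pip install XlsxWriter

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    superscript = workbook.add_format({"font_script": 1})  # 1 = superscript
    fragments = build_fragments(runs, superscript)
    if len(fragments) > 2:
        worksheet.write_rich_string(0, 0, *fragments)
    else:
        # Too few fragments for a rich string; fall back to plain text.
        worksheet.write_string(0, 0, "".join(text for text, _ in runs))
    workbook.close()
```

With python-docx, the pairs for one paragraph could be gathered as [(run.text, bool(run.font.superscript)) for run in para.runs], then passed to write_rich_cell("out.xlsx", pairs).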

Detecting line breaks within Python/Django textarea fields

I have a textarea field in Django from which I want to detect line breaks. I can see that the line breaks are stored, because reloading the page with the user's input shows the breaks they entered, but I can't detect them in Django/Python.
When I look at the output from PostgreSQL I can see \r characters, but these don't seem to show up within the Python environment.
I'd like to do something like this:
text_blocks = text_area.split("\r")
But this doesn't work and I suspect is naive.
How can I detect these seemingly invisible line breaks within Python/Django?
Try splitlines(), this is a built-in string method of python:
text_blocks = text_area.splitlines()
From the docs:
Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is
given and true.
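A short illustration of how splitlines() behaves on mixed line endings. The sample value is made up; it assumes the textarea content arrives with \r\n line breaks, which is typical for browser form submissions.

```python
# Hypothetical textarea value with CRLF line breaks and one blank line.
text_area = "first line\r\nsecond line\r\n\r\nfourth line"

text_blocks = text_area.splitlines()
print(text_blocks)
# ['first line', 'second line', '', 'fourth line']

# keepends=True keeps the original terminators, which helps when debugging.
print(text_area.splitlines(keepends=True))
# ['first line\r\n', 'second line\r\n', '\r\n', 'fourth line']

# By comparison, split("\r") leaves the "\n" half attached to later pieces.
print(text_area.split("\r"))
# ['first line', '\nsecond line', '\n', '\nfourth line']
```

Printing repr(text_area) is a quick way to see which terminators are really present in the value Django hands over.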

Macro to get document contents preserving hyphenation in libreoffice writer

I need to access the text in a LibreOffice document.
The document has automatic hyphenation,
and I need to know the hyphen positions as they are displayed on screen.
The following code returns clear text without automatic hyphens:
XSCRIPTCONTEXT.getDocument().getText().getString()
This is the documentation I read:
https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Working_with_Text_Documents
Also I looked at this extension: https://github.com/voikko/libreoffice-voikko
I also ran the Capitalise.py example under pyCharm remote debugger, but couldn't find any hints.
Automatic hyphens do not actually occur in the text in LibreOffice. Instead, they are displayed as needed. When a format such as PDF is exported, or if the document is printed, then hyphens are shown in the output.
The Hyphenator service is fairly easy to use in macros, and allows a word to be split up according to possible hyphenation positions.
To really determine where hyphens are getting displayed on screen, the following may work:
Traverse the document with a word cursor. Andrew Pitonyak's Macro Document section 7.3.8.5 gives an example of this in Basic.
Move the view cursor to the beginning of each word and check the Y position. For example, if self.oVC is the view cursor, then check the value of self.oVC.getPosition().Y.
Move the cursor to the end of the word, and see if the Y position changed.
If it did, then presumably the word was hyphenated.
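The steps above can be sketched in Python roughly as follows. This is an untested outline under stated assumptions, not a verified macro: find_wrapped_words is a hypothetical name, the document is passed in as a parameter, and the sketch assumes the body text cursor supports the word-cursor methods and that a change in Y between the start and end of a word means the word was broken across lines.

```python
def find_wrapped_words(doc):
    """Return words whose start and end sit at different Y positions."""
    text_cursor = doc.getText().createTextCursor()
    view_cursor = doc.getCurrentController().getViewCursor()

    wrapped = []
    text_cursor.gotoStart(False)
    while True:
        # Select the current word with the (model) word cursor.
        text_cursor.gotoStartOfWord(False)
        text_cursor.gotoEndOfWord(True)
        word = text_cursor.getString()
        if word:
            # Compare the on-screen Y position at both ends of the word.
            view_cursor.gotoRange(text_cursor.getStart(), False)
            y_start = view_cursor.getPosition().Y
            view_cursor.gotoRange(text_cursor.getEnd(), False)
            y_end = view_cursor.getPosition().Y
            if y_start != y_end:
                wrapped.append(word)
        # Stop once the word cursor cannot advance any further.
        if not text_cursor.gotoNextWord(False):
            break
    return wrapped
```

Inside a LibreOffice Python macro, doc would be XSCRIPTCONTEXT.getDocument(). Moving the view cursor scrolls the visible document, so this is likely to be slow on long texts.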

Cannot post Python code to the website rosalind.info

I am trying to post a sample solution, written in Python, to rosalind.info.
I tried following their instructions:
"To highlight code, add a shebang styled first line :::lexername to your code. Replace lexername with the lexer keyword for the language that you want to be highlighted as shown in the list of Pygments lexers."
However, I can't get it to work.
I have tried setting the first line to:
:::python
:::PythonLexer
#!:::python
#!:::PythonLexer
but it just appears as ordinary text.
It seems your first attempt was correct, but you did not click the 'Submit' button to view your code with the lexer applied.
In order to see the code with syntax highlighting, you must first submit your response. The WYSIWYG editor provided below the markdown box does not perform syntax highlighting. In order to see your code with proper highlighting you would type something like the following into the box.
:::python
print "Hello World"
which will look something like
print "Hello World"
once you click the 'Submit' button and view your response. You will have the option to edit your submission if you want to change things later.
Joshua's answer has linked you to the place where you can determine which lexer name you want to use. Simply type the corresponding 'short name' for the highlighting you would like to apply.
You could try
#!:::python3
Source: http://pygments.org/docs/lexers/
Two ways that worked for me were:
Indent the whole code block with 4 spaces. Make the first indented line either
#! python or
:::python.
The first version also adds line numbers to the formatted code.
Instead, you can also surround the code block with lines containing only triple backticks ```, in which case you don't need to indent the code with spaces. As in the previous case, make the first line after the opening backtick line either
#! python or
:::python.
The first version adds line numbers, as mentioned above.
As mentioned before by others, you need to "Submit" before you see the fully formatted result.

error parsing XML file using ElementTree.parse

I am using Python's elementtree library to parse an .XML file that I exported from MySQL query browser. When I export the result set to a .XML it includes this really weird character that shows up as the letters "BS" highlighted in a green rounded rectangle in my editor. (see screen shot) Anyway I iterate through the file and try to manually replace the character, but it must not be matching because after I do this:
for lines in file:
    lines.replace("<Weird Char>", "").strip();
I get an error from the parse method. However if I replace the character manually in wordpad/notepad etc... the parse call works correctly. I am looking for a way to parse out the character without having to do it manually.
any help would be great: I included two screen shots, one of how the character appears in my editor, and another how it appears in Chrome.
Thanks
EDIT: You will probably have to zoom in on the images, sorry.
The backspace character is not a valid XML character and needs to be escaped (). I'm surprised MySQL is not doing that here, but I'm not familiar with MySQL. You can also check your data and clean it up with an update statement to get rid of that character if it is not valid data for the table.
As far as parsing it out in python, this should work:
lines = lines.replace("\b", "")  # str.replace returns a new string, so keep the result
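A small self-contained sketch of that cleanup followed by parsing. The XML snippet is invented for illustration; with a real export the string would come from reading the file, for example open(path).read(), before handing the cleaned text to ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical export containing a backspace (\x08) inside a field value.
raw_xml = "<ROW><field name='title'>bad\x08value</field></ROW>"

# str.replace returns a new string, so the result has to be kept.
cleaned = raw_xml.replace("\b", "")

# Parse the cleaned text instead of the original file contents.
root = ET.fromstring(cleaned)
print(root.find("field").text)  # badvalue
```

The loop in the question discards the return value of replace(), which would explain why the character was still present when parse() ran.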
