Long text as a string in Python

Hey, I would like to declare a string in Python that holds a long text (with line breaks and paragraphs). Is this possible?
If I just copy-paste the text between quotation marks, Python only recognizes the first line, and I have to manually remove all the line breaks if I want the entire text. If this is possible, would it still work if the text itself contains quotation marks ("")?

Use triple quotes:
mytext = """Some text
Some more text
etc...
"""

Surround your string content with """ to indicate a multi-line string.
>>> a = """
... this
... is
... a
... multi-line
... string
... """
>>> a
'\nthis\nis\na\nmulti-line\nstring\n'
Note the leading \n: the line break right after the opening quotes is part of the string.
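To address the second part of the question, here is a minimal sketch showing that a triple-quoted string can contain both kinds of quotation marks without escaping. The sample text and the variable names are made up for illustration.

```python
# Triple double quotes let the text contain both " and ' without escaping.
long_text = """She said, "It's a long story."

This is a second paragraph with 'single' and "double" quotes.
"""

# A backslash right after the opening quotes suppresses the first newline.
no_leading_newline = """\
first line
second line
"""

print(long_text)
print(repr(no_leading_newline))
```

The one thing the text cannot contain unescaped is three double quotes in a row; in that case the string can be delimited with ''' instead.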

Related

Is there a method to replace text that is not in double or single quotation python

I want to replace text that is not inside single or double quotation marks in Python.
def replace2(IN, target, replacement):
    # some magic code
print(replace2("hi'hi'", "i", "d"))
hd'hi'
print(replace2('hi"hi"', "i", "d"))
hd"hi"
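One possible way to fill in the "magic code" is sketched below. It is not from the original thread; it assumes quoted segments are balanced, are not nested, and contain no escaped quotes. The name QUOTED is a helper introduced here for illustration.

```python
import re

# Matches one complete double-quoted or single-quoted segment.
QUOTED = re.compile(r"""("[^"]*"|'[^']*')""")

def replace2(text, target, replacement):
    # re.split with a capturing group keeps the quoted segments in the result:
    # even indexes hold unquoted text, odd indexes hold quoted text.
    parts = QUOTED.split(text)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(target, replacement)
    return "".join(parts)

print(replace2("hi'hi'", "i", "d"))   # hd'hi'
print(replace2('hi"hi"', "i", "d"))   # hd"hi"
```

An unmatched quote is simply treated as unquoted text under this approach, which may or may not be what you want.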

How do I import lines from a text file without format characters (backslashes/apostrophes) in Python3

First post and newbie to coding. Apologies if this is redundant. I've searched high and low and have yet to come across something specific to this particular issue.
I'm designing a choose your own adventure game using Python3. It's a bit heavy on text, so I wanted to keep the dialogue and other story aspects in a separate text file so that I can keep my code as lean and clean as possible. I've tried a few different tactics unsuccessfully, including an attempt at implementing a dictionary in the text file to use in the script like indices. I'm not discounting that tactic, but have moved on and here is where I'm currently at:
def intro():
    intro = open("try.txt", "r")
    print(intro.readlines(0))
intro()
Where 0 is the first line of text in the text file. I'm hoping that I can select individual lines to grab and display when navigating through the script through prompts, but the output includes square brackets around the bulk of the text, and backslashes before apostrophes. As an example: ['Tensions are rising on the high seas. There\'s trouble about.']
Is there any way to preclude the addition of escape characters and/or the square brackets in the printed output using this method?
Thanks in advance!
The escaped characters you're seeing aren't coming from the file. What's happening is that you get a list of strings when you invoke readlines():
>>> f = open("foo.txt", 'r')
>>> f.readlines()
['hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n', 'hello world!\n']
That's what the function returns when you call it. When you print that list, Python shows the repr() of each element, which is intended mostly to display objects as structured, code-like text, not as regular strings. It's useful in error messages, log output, and that sort of thing.
To get a single line of text from a file after invoking readlines(), which returns all the lines as a list, you index it like any other list:
print(myfile.readlines()[0])
Using print(...) this way ^, you get the line content without any extra characters.
print(repr(myfile.readlines()[0]))
Using print(...) this way ^, the quotes and escaped newlines etc. will be displayed.
So there's no file text codec strangeness happening here. That's how it's supposed to work.
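The difference between printing the list and printing one element can be seen without a file at all. The sketch below uses a hard-coded list as a stand-in for what readlines() would return; the second sentence is split onto its own line purely for the demo.

```python
# Stand-in for the result of readlines(): one string per line, each ending in "\n".
lines = ["Tensions are rising on the high seas.\n", "There's trouble about.\n"]

as_list = str(lines)              # what print(lines) displays: brackets, quotes, escapes
as_text = lines[1].rstrip("\n")   # one element, with the trailing newline removed

print(as_list)
print(as_text)
```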
readlines reads all the lines from a file and returns a list that has one element for each line. Its optional argument is a size hint that caps roughly how much text is read in total; it does not indicate which line to read, and a value of 0 means no limit.
If you want to access a particular element by index, you have to use square brackets: [0].
repr is used to get a string representation of an object. Since you're working with strings, it's unnecessary: a string's representation is just the string itself inside quotes, with special characters escaped.
What you need is most likely:
def intro():
    intro = open("try.txt", "r")
    print(intro.readlines()[0])
intro()
When you do print(intro.readlines(0)) you're printing the whole list of strings, not just one of the strings. That's why you're seeing brackets and quotes around the string you want to print. I would try something like this:
def intro():
    intro = open("try.txt", "r")
    line_list = intro.readlines()
    print(line_list[0])
intro()
For the backslash in front of the apostrophes, you could process each string in the line_list to try to remove it. I'm not totally sure why you're seeing those backslashes though.
def intro():
    intro = open("try.txt", "r")
    line_list = intro.readlines()
    new_line_list = []
    for line in line_list:
        new_line_list.append(line.replace("\\", ""))
    print(new_line_list[0])
intro()
def intro():
    intro = open("try.txt", "r")
    print(intro.readlines()[0]) # here
intro()
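For a text-heavy game, a variation on the answers above is to load the file once and keep the lines in a list. The following is only a sketch: load_lines is a hypothetical helper, the second story line is placeholder text, and a temporary file stands in for try.txt so the example runs on its own.

```python
import os
import tempfile

def load_lines(path):
    # Read the whole file and split on line breaks; splitlines() drops the "\n".
    # The with block closes the file automatically.
    with open(path, "r", encoding="utf-8") as f:
        return f.read().splitlines()

# Stand-in for the story file, written to a temporary location for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Tensions are rising on the high seas. There's trouble about.\n")
    tmp.write("You spot a sail on the horizon.\n")
    story_path = tmp.name

story = load_lines(story_path)
print(story[0])   # first line, no brackets or escapes
print(story[1])   # second line

os.remove(story_path)
```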

Is there a way to get rid of the \n\n character after converting a PDF with slate?

I am using Slate to convert a PDF to text, but when I convert it to a string it prints newline characters \n\n between just about every line. I have tried just about everything to remove them, but Python does not seem to recognize that they are there.
I have tried .replace("\n\n", " ") and .split("\n\n") and .splitlines(), as well as just about every combination and variation of those, including the Windows newline \r\n.
I am using Spyder as my IDE, but I have tested printing to a text file as well to be sure it wasn't just the console.
def Submit():
    MakeDirs()
    newlineChar = '\n\n'
    global EOD_text
    global EODFilename
    with open(EODFilename, 'rb') as EODF:
        EOD_text = str(slate.PDF(EODF))
    EOD_text = EOD_text.replace("\n\n"," ")
    print(EOD_text)
Example Output:
["End Of Day Report\n\nFor Sunday, 12/29/2019\n\nDivision Sales\n\nTotal Sales\n\nDivision\n\nOnline Sales\n\nGeneral Information\n\nDay Temp:\n\nNight Weather:\n\nNight Temp:\n\nOpening Mgr:\n\nClosing Mgr:\n\nNotes:\n\nDay Weather:\n\nCategory Sales\n\nCategory/Sub-Category\n\nTotal Sales\n\nConcessions\n\n
In this problem you have a piece of text containing \n\n, like this:
example_text = 'example text \n\n example text'
and you would like to remove those characters. The easiest way to do this is:
print(example_text.replace('\n\n', ''))
This works fine for me, but for some reason not for you. I assume (but cannot verify) that in your text the \n actually contains an escaped \, so it is really the two characters \ and n, written \\n in Python source. This could be the case, for instance, if you're reading a text file which contains the literal text \n\n. To replace these, use:
print(example_text.replace('\\n\\n', ''))
Nathan got it to work with the suggestion to add a second \ in front of each \n. I believe Python was reading the "\n\n" passed to replace() as two newline characters instead of the literal text \n\n.
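A likely explanation, given that the example output starts with [, is that str() was applied to a list-like object, which produces the repr of each element and turns real newlines into a backslash followed by n. The sketch below assumes slate.PDF(...) behaves like a list with one string per page and uses a plain list as a stand-in.

```python
# Stand-in for the object returned by slate.PDF(...): a list with one string per page.
pages = ["End Of Day Report\n\nFor Sunday, 12/29/2019\n\nDivision Sales"]

as_repr = str(pages)          # repr of the list: newlines become backslash + n
print("\n\n" in as_repr)      # False: no real newlines left to replace
print("\\n\\n" in as_repr)    # True: literal backslash-n pairs instead

# Joining the pages keeps real newlines, so the plain replace works.
text = "\n".join(pages).replace("\n\n", " ")
print(text)
```

If that assumption holds, joining the pages avoids the need for the doubled backslashes entirely.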

Removing quotes from text files

I need to read a pipe(|)-separated text file.
One of the fields contains a description that may contain double quotes.
I noticed that all lines that contain a " are missing from the resulting dict.
To avoid this, I tried to read the entire line and use string.replace() to remove the quotes, as shown below, but it looks like the presence of those quotes causes a problem at the line-reading stage, i.e. before the string.replace() call.
The code is below, and the question is: how do I force Python not to use any separator and keep the line whole?
with open(fileIn) as txtextract:
    readlines = csv.reader(txtextract,delimiter="µ")
    for line in readlines:
        (...)
        LI_text = newline[107:155]
        LI_text.replace("|","/")
        LI_text.replace("\"","") # use of escape char don't work.
Note: I am using version 3.6
You may use regex
In [1]: import re
In [2]: re.sub(r"\"", "", '"remove all "double quotes" from text"')
Out[2]: 'remove all double quotes from text'
In [3]: re.sub(r"(^\"|\"$)", "", '"remove all "only surrounding quotes" from text"')
Out[3]: 'remove all "only surrounding quotes" from text'
or pass quoting=csv.QUOTE_NONE to csv.reader() so that double quotes are treated as ordinary characters, like:
with open(fileIn) as txtextract:
    readlines = csv.reader(txtextract, delimiter="µ", quoting=csv.QUOTE_NONE)
    for line in readlines:
        (...)
Lesson: the str.replace() method does not change the string itself; it returns a new string. The modified text must be stored back (string = string.replace(...)).
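Both points can be combined in a small, self-contained sketch. The sample data is invented, and it uses "|" directly as the delimiter (rather than the "µ" trick from the question) together with csv.QUOTE_NONE; whether that fits the real file layout is an assumption.

```python
import csv
import io

# Hypothetical sample data: pipe-separated, with a double quote inside a field.
sample = 'A1|Pipe 3" wide|10\nB2|Plain description|20\n'

rows = []
# QUOTE_NONE tells the reader to treat " as an ordinary character.
for row in csv.reader(io.StringIO(sample), delimiter="|", quoting=csv.QUOTE_NONE):
    # str.replace returns a new string, so the result must be assigned.
    row[1] = row[1].replace('"', "")
    rows.append(row)

print(rows)
```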

Python save text to variable without interpreting it

I am fetching a text block and saving it to a variable.
Then I split the text block on whitespace, look for a hotword, and save the word next to the hotword in a new variable.
The word I am trying to save is a math function in MATLAB notation.
Python always interprets the brackets and slashes in the text block before I can even process it.
Text block example:
"This is a text with the hotword function x**2(3*x)+3*x"
The text should be split on whitespace and saved to an array, but Python always messes up the operators (, ), /, - and +.
How can I escape a text without knowing what it will contain?
This line causes the error (Twitter API):
textVar= tweet['text']
Your example works fine for me on both Python 2.7 and Python 3.5:
>>> line = "This is a text with the hotword function x**2(3*x)+3*x"
>>> line.split()
['This', 'is', 'a', 'text', 'with', 'the', 'hotword', 'function', 'x**2(3*x)+3*x']
Try using a raw string:
var1 = r"This is a text with the hotword function x**2(3*x)+3*x"
or for an example that has slashes in it:
var2 = r"This is text with \slashes \n and \t escape sequences that aren't interpreted as such"
Try evaluating var2 in the interpreter and you'll see that Python doubles each backslash in the displayed value, which shows that they are stored as literal backslashes rather than as escape sequences.
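The sketch below illustrates the difference between a normal and a raw string literal, and that text obtained at runtime is not re-interpreted when it is split. Treating "function" as the hotword is an assumption made for the example, and the hard-coded string stands in for tweet['text'].

```python
normal = "a\tb"    # \t is converted to a single tab character
raw = r"a\tb"      # backslash and t stay as two separate characters

print(len(normal))  # 3
print(len(raw))     # 4
print(repr(raw))    # 'a\\tb'

# Stand-in for text fetched at runtime; split() leaves the operators untouched.
fetched = "This is a text with the hotword function x**2(3*x)+3*x"
words = fetched.split()
formula = words[words.index("function") + 1]   # the word right after the hotword
print(formula)      # x**2(3*x)+3*x
```

Escape processing only applies to string literals written in source code, so raw strings matter for test data typed into a script, not for text that arrives from an API.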
