Python how to split messages - python

Discord limits each chat message to 2000 characters. Is there any way to work around that?
For example, in the code below, when someone types !ping the bot sends an embed message. Is it possible to split the message before or after a certain line, so that the bot hides the rest and offers an option to view it or to click through to the next page?
@bot.command(pass_context=True)
async def ping(ctx):
    embed=discord.Embed(title="Something Title", description="something anything goes here")
    await bot.say(embed=embed)

You can split your text yourself, or take the easy way suggested by @Prashant Godhani here and use the textwrap.wrap() function:
# easy way
import textwrap
import lorem
def sayLongLine(text, wrap_at=200):
    for line in textwrap.wrap(text, wrap_at):
        # use await bot.say - maybe add a delay if you have max says/second
        print(line)
sayLongLine(lorem.paragraph(), 40)
If you'd rather replicate the functionality of the textwrap module yourself, split the text at spaces into words and keep combining words until the next one would overshoot the allowed length. Join the collected words into one string and store it in a list, then start the next line with the word that did not fit. Loop until done, add the remaining words as a last line if there are any, and return the list:
# slightly more complex self-made wrapper:
import lorem
print("----------------------")
def sayLongLineSplitted(text,wrap_at=200):
    """Splits text at spaces and joins it to strings that are as long as
    possible without overshooting wrap_at.
    Returns a list of strings shorter than wrap_at."""
    splitted = text.split(" ")
    def gimme():
        """Yields sentences of correct length."""
        len_parts = 0
        parts = []
        for p in splitted:
            len_p = len(p)
            if len_parts + len_p < wrap_at:
                parts.append(p)
                len_parts += len_p + 1
            else:
                yield ' '.join(parts).strip()
                parts = [p]
                len_parts = len_p + 1  # count the joining space, as in the branch above
        if parts:
            yield ' '.join(parts).strip()
    return list(gimme())
for part in sayLongLineSplitted(lorem.paragraph(),40):
    print(part)
Output of self-made wrapper:
# 234567890123456789012345678901234567890
Ut velit magnam sed sed. Eius modi
quiquia numquam. Quaerat eius tempora
tempora consectetur etincidunt est. Sit
dolor quaerat quaerat amet voluptatem
dolorem dolore. Sit adipisci non
etincidunt est aliquam etincidunt sit.
Quaerat porro sed sit.
Output of textwrap-example:
# 234567890123456789012345678901234567890
Etincidunt aliquam etincidunt velit
numquam. Quisquam porro labore velit.
Modi modi porro quaerat dolor etincidunt
quisquam. Ut ipsum quiquia non quisquam
magnam ut sit. Voluptatem non non
dolorem. Tempora quaerat neque quaerat
dolorem velit magnam ipsum.
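For the Discord case specifically, the same idea can be applied with the 2000-character limit from the question. This is only a minimal sketch: `chunk_message` and its `limit` parameter are illustrative names and not part of discord.py, and sending each chunk is left to whatever send call your bot version uses.

```python
import textwrap

def chunk_message(text, limit=2000):
    """Split text into pieces of at most `limit` characters, breaking at whitespace.

    Hypothetical helper for illustration; it only prepares the strings.
    """
    # replace_whitespace=False keeps newlines inside a chunk instead of
    # turning them into spaces.
    return textwrap.wrap(text, width=limit, replace_whitespace=False)

chunks = chunk_message("word " * 1000)  # 5000 characters of input
print(len(chunks), max(len(c) for c in chunks))
```

Each element of `chunks` can then be sent as its own message, or placed in its own embed, one after the other.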

Related

How can I wrap text into a paragraph of x characters without importing any modules?

I have a list of words (lowercase) parsed from an article. I joined them together using .join() with a space into a long string. Punctuation will be treated like words (ie. with spaces before and after).
I want to write this string into a file with at most X characters (in this case, 90 characters) per line, without breaking any words. Each line cannot start with a space or end with a space.
As part of the assignment I am not allowed to import modules, which from my understanding, textwrap would've helped.
I have basically a while loop nested in a for loop that goes through every 90 characters of the string, and firstly checks if it is not a space (ie. in the middle of a word). The while loop would then iterate through the string until it reaches the next space (ie. incorporates the word unto the same line). I then check if this line, minus the leading and trailing whitespaces, is longer than 90 characters, and if it is, the while loop iterates backwards and reaches the character before the word that extends over 90 characters.
x = 0
for i in range(89, len(text), 90):
    while text[i] != " ":
        i += 1
    if len(text[x:i].strip()) > 90:
        while text[i - 1] != " ":
            i = i - 1
    file.write("".join(text[x:i]).strip() + "\n")
    x = i
The code works for 90% of the file after comparing with the file with correct outputs. Occasionally there are lines where it would exceed 90 characters without wrapping the extra word into the next line.
EX:
Actual Output on one line (93 chars):
extraordinary thing , but i never read a patent medicine advertisement without being impelled
Expected Output with "impelled" on new line (84 chars + 8 chars):
extraordinary thing , but i never read a patent medicine advertisement without being\nimpelled
Are there better ways to do this? Any suggestions would be appreciated.
You could consider using a "buffer" to hold the data as you build each line to output. As you read each new word check if adding it to the "buffer" would exceed the line length, if it would then you print the "buffer" and then reset the "buffer" starting with the word that couldn't fit in the sentence.
data = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis a risus nisi. Nunc arcu sapien, ornare sit amet pretium id, faucibus et ante. Curabitur cursus iaculis nunc id convallis. Mauris at enim finibus, fermentum est non, fringilla orci. Proin nibh orci, tincidunt sed dolor eget, iaculis sodales justo. Fusce ultrices volutpat sapien, in tincidunt arcu. Vivamus at tincidunt tortor. Sed non cursus turpis. Sed tempor neque ligula, in elementum magna vehicula in. Duis ultricies elementum pellentesque. Pellentesque pharetra nec lorem at finibus. Pellentesque sodales ligula sed quam iaculis semper. Proin vulputate, arcu et laoreet ultrices, orci lacus pellentesque justo, ut pretium arcu odio at tellus. Maecenas sit amet nisi vel elit sagittis tristique ac nec diam. Suspendisse non lacus purus. Sed vulputate finibus facilisis."""
sentence_limit = 40
buffer = ""
for word in data.split():
    word_length = len(word)
    buffer_length = len(buffer)
    if word_length > sentence_limit:
        print(f"ERROR: the word '{word}' is longer than the sentence limit of {sentence_limit}")
        break
    if buffer_length + word_length < sentence_limit:
        if buffer:
            buffer += " "
        buffer += word
    else:
        print(buffer)
        buffer = word
print(buffer)
OUTPUT
Lorem ipsum dolor sit amet, consectetur
adipiscing elit. Duis a risus nisi. Nunc
arcu sapien, ornare sit amet pretium id,
faucibus et ante. Curabitur cursus
iaculis nunc id convallis. Mauris at
enim finibus, fermentum est non,
fringilla orci. Proin nibh orci,
tincidunt sed dolor eget, iaculis
sodales justo. Fusce ultrices volutpat
sapien, in tincidunt arcu. Vivamus at
tincidunt tortor. Sed non cursus turpis.
Sed tempor neque ligula, in elementum
magna vehicula in. Duis ultricies
elementum pellentesque. Pellentesque
pharetra nec lorem at finibus.
Pellentesque sodales ligula sed quam
iaculis semper. Proin vulputate, arcu et
laoreet ultrices, orci lacus
pellentesque justo, ut pretium arcu odio
at tellus. Maecenas sit amet nisi vel
elit sagittis tristique ac nec diam.
Suspendisse non lacus purus. Sed
vulputate finibus facilisis.
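The same buffer idea can be wrapped in a function that needs no imports, which fits the assignment's constraint. This is a sketch under assumptions: `wrap_words` and `limit` are illustrative names, and a word longer than the limit is simply placed on a line of its own rather than reported as an error.

```python
def wrap_words(text, limit=90):
    """Return lines of at most `limit` characters without breaking words."""
    lines = []
    buffer = ""
    for word in text.split():
        if not buffer:
            buffer = word                      # first word on the line
        elif len(buffer) + 1 + len(word) <= limit:
            buffer += " " + word               # word plus separating space fits
        else:
            lines.append(buffer)               # line is full: flush it
            buffer = word
    if buffer:
        lines.append(buffer)                   # flush the final partial line
    return lines

sample = ("extraordinary thing , but i never read a patent medicine "
          "advertisement without being impelled")
for line in wrap_words(sample, 90):
    print(len(line), line)
```

With the 93-character example from the question, "impelled" moves to the second line. Because `split()` discards surrounding whitespace, no line can start or end with a space.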
Using a regular expression:
import re
with open('f0.txt', 'r') as f:
    # file must be 1 long single line of text
    text = f.read().rstrip()
for line in re.finditer(r'(.{1,70})(?:$|\s)', text):
    print(line.group(1))
Another approach, without regex:
# Constant
J = 70
# output list
out = []
with open('f0.txt', 'r') as f:
    # assumes file is 1 long line of text
    line = f.read().rstrip()
i = 0
while i+J < len(line):
    idx = line.rfind(' ', i, i+J)
    if idx != -1:
        out.append(line[i:idx])
        i = idx+1
    else:
        out.append(line[i:i+J] + '-')
        i += J
out.append(line[i:]) # get ending line portion
for line in out:
    print(line)
Here are the file contents (1 long single string):
I have basically a while loop nested in a for loop that goes through every 90 characters of the string, and firstly checks if it is not a space (ie. in the middle of a word). The while loop would then iterate through the string until it reaches the next space (ie. incorporates the word unto the same line). I then check if this line, minus the leading and trailing whitespaces, is longer than 90 characters, and if it is, the while loop iterates backwards and reaches the character before the word that extends over 90 characters.
Output:
I have basically a while loop nested in a for loop that goes through
every 90 characters of the string, and firstly checks if it is not a
space (ie. in the middle of a word). The while loop would then
iterate through the string until it reaches the next space (ie.
incorporates the word unto the same line). I then check if this line,
minus the leading and trailing whitespaces, is longer than 90
characters, and if it is, the while loop iterates backwards and
reaches the character before the word that extends over 90 characters.
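When comparing output against the expected file, it can help to check the two stated requirements directly: the length limit, and no leading or trailing space. A small sketch, assuming the lines are already in a list; `find_bad_lines` is a hypothetical helper name.

```python
def find_bad_lines(lines, limit=90):
    """Return (line_number, reason) pairs for lines that break the rules."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > limit:
            problems.append((number, "too long: %d characters" % len(line)))
        if line != line.strip(" "):
            problems.append((number, "starts or ends with a space"))
    return problems

print(find_bad_lines(["fits", " leading space", "x" * 91], limit=90))
```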

How to split string to substrings with given length but not breaking sentences?

I have a string with a large text and need to split it into multiple substrings with length <= N characters (as close to N as it's possible; N is always bigger than the largest sentence), but I also need not to break the sentences.
For example, if I have N = 80 and given text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam. Nam sit amet iaculis lacus, non sagittis nulla. Nam blandit quam eget velit maximus, eu consectetur sapien sodales. Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel.
I want to get list of strings:
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam."
"Nam sit amet iaculis lacus, non sagittis nulla."
"Nam blandit quam eget velit maximus, eu consectetur sapien sodales."
"Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
And also I want this to work with English and Russian.
How to achieve this?
The steps I'd take:
1. Initiate a list to store the lines and a current line variable to store the string of the current line.
2. Split the paragraph into sentences - this requires you to .split on '.', remove the trailing empty sentence (""), strip leading and trailing whitespace (.strip) and then add the full stops back.
3. Loop through these sentences and:
   - if the sentence can be added onto the current line, add it
   - otherwise add the current working line string to the list of lines and set the current line string to be the current sentence
So, in Python, something like:
para = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam. Nam sit amet iaculis lacus, non sagittis nulla. Nam blandit quam eget velit maximus, eu consectetur sapien sodales. Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
lines = []
line = ''
for sentence in (s.strip()+'.' for s in para.split('.')[:-1]):
    if line and len(line) + len(sentence) + 1 > 80: #can't fit on that line => start new one
        lines.append(line)
        line = sentence
    elif line: #can fit on => add a space then this sentence
        line += ' ' + sentence
    else: #first sentence => no leading space
        line = sentence
if line: #don't forget the last line
    lines.append(line)
giving lines as:
[
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer in tellus quam.",
"Nam sit amet iaculis lacus, non sagittis nulla.",
"Nam blandit quam eget velit maximus, eu consectetur sapien sodales.",
"Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
]
There's no built-in for this that I can find, so here's a start. You can make it smarter by checking before and after for where to move the sentences, instead of just before. Length includes spaces, because I'm splitting naïvely instead of with regular expressions or something.
def get_sentences(text, min_length):
    sentences = (sentence + ". "
                 for sentence in text.split(". "))
    current_line = ""
    for sentence in sentences:
        if len(current_line) >= min_length:
            yield current_line
            current_line = sentence
        else:
            current_line += sentence
    yield current_line
It's slow for long lines, but it does the job.
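A sketch that combines both answers and also keeps `!` and `?` as sentence endings. It assumes sentences end with `.`, `!` or `?` followed by whitespace, which holds for plain English and Russian prose but not for abbreviations such as "e.g."; `pack_sentences` and `max_length` are illustrative names.

```python
import re

def pack_sentences(text, max_length):
    """Group whole sentences into strings of at most `max_length` characters."""
    # Split after sentence-ending punctuation; the lookbehind keeps the
    # punctuation attached to its sentence.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    lines = []
    line = ''
    for sentence in sentences:
        if not line:
            line = sentence                    # first sentence on this line
        elif len(line) + 1 + len(sentence) <= max_length:
            line += ' ' + sentence             # sentence plus a space still fits
        else:
            lines.append(line)                 # line is full: start a new one
            line = sentence
    if line:
        lines.append(line)
    return lines

para = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
    "Integer in tellus quam. Nam sit amet iaculis lacus, non sagittis nulla. "
    "Nam blandit quam eget velit maximus, eu consectetur sapien sodales. "
    "Etiam efficitur blandit arcu, quis rhoncus mauris elementum vel."
)
for piece in pack_sentences(para, 80):
    print(piece)
```

The first two sentences together are exactly 80 characters long, so the comparison has to be `<=` rather than `<` for them to share a line.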

Python .replace() function not working correctly

I'm trying to figure out why the .replace function in python isn't functioning correctly. I have spent the entire day yesterday searching for an answer but alas have not found one.
I'm trying to open and read a file, copy it into a list, count the number of lines in the list and remove all the punctuation (ie , . ! ? etc). I can do everything except remove the punctuation (and I must use the .replace function instead of importing a module).
with open('Small_text_file.txt', 'r') as myFile: #adding lines from file to list
    contents = myFile.readlines()
fileList= []
# punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
for i in contents:
    fileList.append(i.rstrip())
print('The Statistics are:\n','Number of lines:', len(fileList)) #first part of question
for item in fileList:
    fileList = item.replace(',', "")
    fileList = item.replace('.', "")
print(fileList)
The "Small text file" is:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Vivamus condimentum sagittis lacus? laoreet luctus ligula laoreet ut.
Vestibulum ullamcorper accumsan velit vel vehicula?
Proin tempor lacus arcu. Nunc at elit condimentum, semper nisi et, condimentum mi.
In venenatis blandit nibh at sollicitudin. Vestibulum dapibus mauris at orci maximus pellentesque.
Nullam id elementum ipsum. Suspendisse
Running the code returns the following:
The Statistics are:
Number of lines: 6
Nullam id elementum ipsum Suspendisse
So the code DOES remove the comma and period characters but it also removes the preceding 5 lines of the text and only prints the very last line. What am I doing wrong here?
Use enumerate:
for x, item in enumerate(fileList):
    fileList[x] = item.replace(',', "").replace('.', "")
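Note: item.replace() returns a new string, which you need to store back at the right index of the list. In the original loop, each assignment rebinds the name fileList to that single string, so only the last line survives. enumerate gives you the index while iterating through the list.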
It should be
for i,item in enumerate(fileList):
    fileList[i] = item.replace(',', "").replace('.', "")
Without enumerate,
for i in range(len(fileList)):
    fileList[i] = fileList[i].replace(',', "").replace('.', "")
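To remove every character from the commented-out `punctuation` list rather than only commas and periods, the replace calls can be driven by a loop. A sketch using only `.replace()`, as the question requires; `strip_punctuation` is an illustrative name.

```python
def strip_punctuation(lines, punctuation):
    """Return a new list with every punctuation character removed from each line."""
    cleaned = []
    for line in lines:
        for mark in punctuation:
            line = line.replace(mark, "")  # str.replace returns a new string
        cleaned.append(line)
    return cleaned

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]
fileList = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
            "Vestibulum ullamcorper accumsan velit vel vehicula?"]
print(strip_punctuation(fileList, punctuation))
```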

get words from large file, using low memory in python

I need to iterate over the words in a file. The file could be very big (over 1TB), the lines could be very long (maybe just one line). Words are English, so reasonable in size. So I don't want to load in the whole file or even a whole line.
I have some code that works, but may explode if lines are too long (longer than ~3GB on my machine).
def words(file):
    for line in file:
        words=re.split(r"\W+", line)
        for w in words:
            word=w.lower()
            if word != '': yield word
Can you tell me how I can, simply, rewrite this iterator function so that it does not hold more than needed in memory?
Don't read line by line, read in buffered chunks instead:
import re
def words(file, buffersize=2048):
    buffer = ''
    for chunk in iter(lambda: file.read(buffersize), ''):
        words = re.split(r"\W+", buffer + chunk)
        buffer = words.pop() # partial word at end of chunk or empty
        for word in (w.lower() for w in words if w):
            yield word
    if buffer:
        yield buffer.lower()
I'm using the callable-and-sentinel version of the iter() function to handle reading from the file until file.read() returns an empty string; I prefer this form over a while loop.
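As a small illustration of that two-argument form, `iter(callable, sentinel)` keeps calling the callable until it returns the sentinel. The sketch below uses `io.StringIO` as a stand-in for a real file and an arbitrary chunk size of 4.

```python
from io import StringIO

source = StringIO("abcdefghij")
# source.read(4) returns '' at end of file, which is the sentinel that stops iteration.
chunks = list(iter(lambda: source.read(4), ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```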
If you are using Python 3.3 or newer, you can use generator delegation here:
def words(file, buffersize=2048):
    buffer = ''
    for chunk in iter(lambda: file.read(buffersize), ''):
        words = re.split(r"\W+", buffer + chunk)
        buffer = words.pop() # partial word at end of chunk or empty
        yield from (w.lower() for w in words if w)
    if buffer:
        yield buffer.lower()
Demo using a small chunk size to demonstrate this all works as expected:
>>> from io import StringIO
>>> demo = StringIO('''\
... Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque in nulla nec mi laoreet tempus non id nisl. Aliquam dictum justo ut volutpat cursus. Proin dictum nunc eu dictum pulvinar. Vestibulum elementum urna sapien, non commodo felis faucibus id. Curabitur
... ''')
>>> for word in words(demo, 32):
...     print(word)
...
lorem
ipsum
dolor
sit
amet
consectetur
adipiscing
elit
pellentesque
in
nulla
nec
mi
laoreet
tempus
non
id
nisl
aliquam
dictum
justo
ut
volutpat
cursus
proin
dictum
nunc
eu
dictum
pulvinar
vestibulum
elementum
urna
sapien
non
commodo
felis
faucibus
id
curabitur
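One way to gain confidence that no word is split at a chunk boundary is to compare the chunked result with a plain split of the whole string for several buffer sizes. A sketch for Python 3 that repeats the generator from above so the block is self-contained; the sample text and the buffer sizes are arbitrary choices.

```python
import re
from io import StringIO

def words(file, buffersize=2048):
    buffer = ''
    for chunk in iter(lambda: file.read(buffersize), ''):
        parts = re.split(r"\W+", buffer + chunk)
        buffer = parts.pop()  # partial word at end of chunk, or empty
        yield from (w.lower() for w in parts if w)
    if buffer:
        yield buffer.lower()

text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Curabitur"
expected = [w.lower() for w in re.split(r"\W+", text) if w]
for size in (1, 3, 7, 32, 2048):
    # Every buffer size must give the same words as splitting the full text.
    assert list(words(StringIO(text), size)) == expected
print(expected)
```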

How would one limit characters per line when printing a raw_input to a text file?

I'm attempting to write a report-creating script. Simply put, I have the user submit strings via a few raw_input()s. These strings are assigned to global variables, and when the user is finished I need the script to write the strings to a file, limited to 80 characters per line. I've looked at the textwrap module and looked around for anyone else who has asked this, but I've only found people trying to limit the characters printed within the script, from a raw input or from a pre-existing file, never people trying to print out to a new file. Here's some code that is basically a shorter version of what I'm trying to do.
Here's the code:
def start():
    global file_name
    file_name = raw_input("\nPlease Specify a filename:\n>>> ")
    print "The filename chosen is: %r" % file_name
    create_report()
    note_type()
def create_report():
    global new_report
    new_report = open(file_name, 'w')
    print "Report created as: %r" % file_name
    new_report.write("Rehearsal Report\n")
    note_type()
def note_type():
    print "\nPlease select which type of note you wish to make."
    print """
    1. Type1
    2. Print
    """
    answer = raw_input("\n>>> ")
    if answer in "1 1. type1 Type1 TYPE1":
        type1_note()
    elif answer in "2 2. print Print PRINT":
        print_notes()
    else:
        print "Unacceptable Response"
        note_type()
def type1_note():
    print "Please Enter your note:"
    global t1note_text
    t1note_text = raw_input(">>> ")
    print "\nNote Added."
    note_type()
def print_notes():
    new_report.write("\nType 1: %r" % t1note_text)
    new_report.close
    print "Printed. Goodbye!"
    exit(0)
start()
And Here is my terminal input
---
new-host-4:ism Bean$ python SO_Question.py
Please Specify a filename:
">>> " test3.txt
The filename chosen is: 'test3.txt'
Report created as: 'test3.txt'
Please select which type of note you wish to make.
1. Type1
2. Print
">>> " 1
Please Enter your note:
">>> "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam at dignissim diam. Donec aliquam consectetur pretium. Sed ac sem eu nulla tincidunt accumsan. Praesent vel velit odio. Donec porta mauris ut eros bibendum consequat. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Integer adipiscing nibh in turpis placerat non interdum magna convallis. Phasellus porta mauris at nibh laoreet ac vulputate elit semper.
Note Added.
Please select which type of note you wish to make.
1. Type1
2. Print
">>> "2
Printed. Goodbye!
new-host-4:ism Bean$
The only problem being that when I open the file (test3.txt) the entire paragraph of lorem ipsum is all printed to one line. Like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam at dignissim diam. Donec aliquam consectetur pretium. Sed ac sem eu nulla tincidunt accumsan. Praesent vel velit odio. Donec porta mauris ut eros bibendum consequat. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Integer adipiscing nibh in turpis placerat non interdum magna convallis. Phasellus porta mauris at nibh laoreet ac vulputate elit semper.
Anybody got any advice to get textwrap to print 80chars per line to the file?
If you don't want to use any additional modules, you could split the value of your user input into 80 character chunks yourself:
def split_input(string, chunk_size):
    num_chunks = len(string) // chunk_size  # integer division, also under Python 3
    if (len(string) % chunk_size != 0):
        num_chunks += 1
    output = []
    for i in range(0, num_chunks):
        output.append(string[chunk_size*i:chunk_size*(i+1)])
    return output
Then you could print the output list to a file:
input_chunks = split_input(user_input, 80)
for chunk in input_chunks:
    outFile.write(chunk + "\n")
UPDATE:
This version will respect space-separated words:
def split_input(user_string, chunk_size):
    output = []
    words = user_string.split(" ")
    total_length = 0
    while (total_length < len(user_string) and len(words) > 0):
        line = []
        next_word = words[0]
        line_len = len(next_word) + 1
        while (line_len < chunk_size) and len(words) > 0:
            words.pop(0)
            line.append(next_word)
            if (len(words) > 0):
                next_word = words[0]
                line_len += len(next_word) + 1
        line = " ".join(line)
        output.append(line)
        total_length += len(line)
    return output
In Python 3, you can use textwrap.fill to produce lines of at most 80 characters:
import textwrap
print(textwrap.fill(your_text, width=80))
See https://docs.python.org/3.6/library/textwrap.html
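Since the question is about writing to a file rather than printing, the wrapped string can be passed to `write()` directly. A minimal Python 3 sketch; `write_wrapped` and the file name `report.txt` are made up for the example, and the file is placed in the system temporary directory only to keep the demo tidy.

```python
import os
import tempfile
import textwrap

def write_wrapped(path, text, width=80):
    """Write `text` to `path`, wrapped to at most `width` characters per line."""
    with open(path, 'w') as report:  # the with block closes the file
        report.write(textwrap.fill(text, width=width) + "\n")

note = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 5
path = os.path.join(tempfile.gettempdir(), "report.txt")
write_wrapped(path, note)
with open(path) as report:
    print(max(len(line.rstrip("\n")) for line in report))
```

The printed number is the length of the longest line in the file, which should not exceed the chosen width.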
You can try using the textwrap module:
from textwrap import TextWrapper
def print_notes(t1note_text):
    wrapper = TextWrapper(width=80)
    splittext = "\n".join(wrapper.wrap(t1note_text))
    new_report.write("\nType 1: %s" % splittext)  # %s, not %r: %r would write the newlines as literal \n
    new_report.close()  # the parentheses are needed to actually close the file
    print "Printed. Goodbye!"
    exit(0)
