I've got an exercise where I have two text, "left" and "right".
I need to make a function to make them side by side given a width as parameter and all of this using itertools and textwrap.
Here's my code :
import textwrap
import itertools
def sidebyside(left,right,width=79):
width = round((width+1)/2)
leftwrapped = textwrap.wrap(left,width = width-1)
for i in range(0,len(leftwrapped)):
leftwrapped[i] = leftwrapped[i].ljust(width)
rightwrapped = textwrap.wrap(right,width = width-1)
for i in range(0,len(rightwrapped)):
rightwrapped[i] = rightwrapped[i].ljust(width)
pipes = ["|"]*max(len(leftwrapped),len(rightwrapped))
paragraph = itertools.zip_longest(leftwrapped,pipes,rightwrapped, fillvalue="".ljust(width))
result = ""
for a in paragraph:
result = result + a[0] + a[1] + a[2] + "\n"
return(result)
Here's a sample of "left" & "right" :
left = (
"Lorem ipsum dolor sit amet, consectetur adipiscing elit. "
"Sed non risus. "
"Suspendisse lectus tortor, dignissim sit amet, "
"adipiscing nec, utilisez sed sin dolor."
)
right = (
"Morbi venenatis, felis nec pretium euismod, "
"est mauris finibus risus, consectetur laoreet "
"sem enim sed arcu. Maecenas sit amet eleifend sem. "
"Nullam ac libero metus. Praesent ac finibus nulla, vitae molestie dolor."
" Aliquam vestibulum viverra nisl, id porta mi viverra hendrerit."
" Ut et porta augue, et convallis ante."
)
My problem is that I'm getting some spacing issues, i.e: for the first line, for a given length of 20, I have this output :
'Lorem |Morbi ven '
But I need this output :
'Lorem |Morbi ven'
Found it, my round function was not good, I had to make two width, the first one being the round of the division and a second one being the result of width - round(width/2).
Talk is cheap, code is better :
from itertools import zip_longest
import textwrap
def sidebyside(left, right, width=79):
mid_width = (width - (1 - width%2)) // 2
return "\n".join(
f"{l.ljust(mid_width)}|{r.ljust(mid_width)}"
for l, r in zip_longest(
*map(lambda t: textwrap.wrap("".join(t), mid_width), (left, right)),
fillvalue=""
)
)
The goal of the original post was to solve a programming puzzle that required the sidebyside() method be implemented using only itertools.zip_longest() and textwrap.wrap().
This method works but has some disadvantages. For instance, it does not support line breaks, within the left and right texts (all spacing is removed before the texts are reflowed to be side-by-side).
Since this is a really useful method, I wrote an improved version of it, see the Gist itself for more information.
For example:
# some random text
LOREM = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
# split into paragraphs
LOREM_PARA = LOREM.replace(". ", ".\n\n").split("\n")
# arbitrarily truncate the first two lines for text B
TEXT_A = LOREM_PARA[:]
TEXT_B = LOREM_PARA[2:]
# reflow as side-by-side
print(side_by_side(TEXT_A, TEXT_B, width=50, as_string=True))
will output:
Lorem ipsum dolor sit | Ut enim ad minim
amet, consectetur | veniam, quis nostrud
adipiscing elit, sed do | exercitation ullamco
eiusmod tempor | laboris nisi ut aliquip
incididunt ut labore et | ex ea commodo
dolore magna aliqua. | consequat.
|
Ut enim ad minim | Duis aute irure dolor
veniam, quis nostrud | in reprehenderit in
exercitation ullamco | voluptate velit esse
laboris nisi ut aliquip | cillum dolore eu fugiat
ex ea commodo | nulla pariatur.
consequat. |
| Excepteur sint occaecat
Duis aute irure dolor | cupidatat non proident,
in reprehenderit in | sunt in culpa qui
voluptate velit esse | officia deserunt mollit
cillum dolore eu fugiat | anim id est laborum.
nulla pariatur. |
|
Excepteur sint occaecat |
cupidatat non proident, |
sunt in culpa qui |
officia deserunt mollit |
anim id est laborum. |
Related
I have tried many methods but it hasn't worked for me. I want to split a text files lines into multiple chunks. Specifically 50 lines per chunk.
Like this [['Line1', 'Line2' -- up to 50] and so on.
data.txt (example):
Line2
Line2
Line3
Line4
Line5
Line6
Line7
Line8
Python code:
with open('data.txt', 'r') as file:
sample = file.readlines()
chunks = []
for i in range(0, len(sample), 3): # replace 3 with 50 in your case
chunks.append(sample[i:i+3]) # replace 3 with 50 in your case
chunks (in my example, chunks of 3 lines):
[['Line1\n', 'Line2\n', 'Line3\n'], ['Line4\n', 'Line5\n', 'Line6\n'], ['Line7\n', 'Line8']]
You can apply the string.rstrip('\n') method on those lines to remove the \n at the end.
Alternative:
Without reading the whole file in memory (better):
chunks = []
with open('data.txt', 'r') as file:
while True:
chunk = []
for i in range(3): # replace 3 with 50 in your case
line = file.readline()
if not line:
break
chunk.append(line)
# or 'chunk.append(line.rstrip('\n')) to remove the '\n' at the ends
if not chunk:
break
chunks.append(chunk)
print(chunks)
Produces same result
A good way to do it would be to create a generic generator function that could break any sequence up into chunks of any size. Here's what I mean:
from itertools import zip_longest
def grouper(n, iterable): # Generator function.
"s -> (s0, s1, ...sn-1), (sn, sn+1, ...s2n-1), (s2n, s2n+1, ...s3n-1), ..."
FILLER = object() # Unique object
for group in zip_longest(*([iter(iterable)]*n), fillvalue=FILLER):
limit = group.index(FILLER) if group[-1] is FILLER else len(group)
yield group[:limit] # Sliced to remove any filler.
if __name__ == '__main__':
from pprint import pprint
with open('lorem ipsum.txt') as inf:
for chunk in grouper(3, inf):
pprint(chunk, width=90)
If the lorem ipsum.txt file contained these lines:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ut volutpat sem.
In felis nibh, efficitur id orci tristique, sollicitudin rhoncus nibh. In
elementum suscipit est, et varius mi aliquam ac. Duis fringilla neque urna,
dapibus volutpat ex ullamcorper eget. Duis in mauris vitae neque porttitor
facilisis. Nulla ornare leo ac nibh facilisis, in feugiat eros accumsan.
Suspendisse elementum elementum libero, sed tempor ex sollicitudin ac. Cras
pharetra, neque eu porttitor mattis, odio quam interdum diam, quis aliquam ex
arcu non nisl. Duis consequat lorem metus. Mauris vitae ex ante. Duis vehicula.
The result will be the following chunks each composed of 3 lines or less:
('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ut volutpat sem.\n',
'In felis nibh, efficitur id orci tristique, sollicitudin rhoncus nibh. In\n',
'elementum suscipit est, et varius mi aliquam ac. Duis fringilla neque urna,\n')
('dapibus volutpat ex ullamcorper eget. Duis in mauris vitae neque porttitor\n',
'facilisis. Nulla ornare leo ac nibh facilisis, in feugiat eros accumsan.\n',
'Suspendisse elementum elementum libero, sed tempor ex sollicitudin ac. Cras\n')
('pharetra, neque eu porttitor mattis, odio quam interdum diam, quis aliquam ex\n',
'arcu non nisl. Duis consequat lorem metus. Mauris vitae ex ante. Duis vehicula.\n')
Update
If you want to remove the newline characters from the end of the lines of the file, you could do it with the same generic grouper() function by passing it a generator expression to preprocess the lines being read without needing to read them all into memory first:
if __name__ == '__main__':
from pprint import pprint
with open('lorem ipsum.txt') as inf:
lines = (line.rstrip() for line in inf) # Generator expr - cuz outer parentheses.
for chunk in grouper(3, lines):
pprint(chunk, width=90)
Output using generator expression:
('Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis ut volutpat sem.',
'In felis nibh, efficitur id orci tristique, sollicitudin rhoncus nibh. In',
'elementum suscipit est, et varius mi aliquam ac. Duis fringilla neque urna,')
('dapibus volutpat ex ullamcorper eget. Duis in mauris vitae neque porttitor',
'facilisis. Nulla ornare leo ac nibh facilisis, in feugiat eros accumsan.',
'Suspendisse elementum elementum libero, sed tempor ex sollicitudin ac. Cras')
('pharetra, neque eu porttitor mattis, odio quam interdum diam, quis aliquam ex',
'arcu non nisl. Duis consequat lorem metus. Mauris vitae ex ante. Duis vehicula.')
You can split the text by each newline using the str.splitlines() method. Then, using a list comprehension, you can use list slices to slice the list at increments of the chunk_size (50 in your case). Below, I used 3 as the chunk_size variable, but you can replace that with 50:
text = '''Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate
velit esse cillum dolore eu fugiat nulla pariatur.
Excepteur sint occaecat cupidatat non proident,
sunt in culpa qui officia deserunt mollit anim id est laborum.'''
lines = text.splitlines()
chunk_size = 3
chunks = [lines[i: i + chunk_size] for i in range(0, len(lines), chunk_size)]
print(chunks)
Output:
[['Lorem ipsum dolor sit amet, consectetur adipiscing elit,', 'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.', 'Ut enim ad minim veniam,'],
['quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.', 'Duis aute irure dolor in reprehenderit in voluptate', 'velit esse cillum dolore eu fugiat nulla pariatur.'],
['Excepteur sint occaecat cupidatat non proident,', 'sunt in culpa qui officia deserunt mollit anim id est laborum.']]
I want to print the abstract of a paper in the middle of Terminal screen of linux. The abstract is a continues long paragraph. I tried:
print(colored(text.center(80), 'blue'))
but since the string is long it still occupy the whole width of screen, while I want to justify the text between say columns 10 to 70 ( for 80 columns screen)
You can use textwrap module:
import textwrap
abstract = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
abstract = "\n".join(textwrap.wrap(abstract, 60)) # wrap at 60 characters
print(textwrap.indent(abstract, " "*10)) # indent with 10 spaces
I am trying to create a program that simulates word wrapping text found in programs like Word or Notepad. If I have a long text, I would like to print out 64 characters (or less) per line, followed by a newline return, without truncating words. Using Windows 10, PyCharm 2018.2.4 and Python 3.6, I've tried the following code:
long_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit," \
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." \
"Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris" \
"nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in" \
"reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur." \
"Excepteur sint occaecat cupidatat non proident, sunt in culpa qui" \
"officia deserunt mollit anim id est laborum."
concat_str = long_str[:64] # The first 64 characters
rest_str = long_str[65:] # The rest of the string
rest_str_len = len(rest_str)
while rest_str_len > 64:
print(concat_str.lstrip() + " (" + str(len(concat_str)) + ")" + "\n")
concat_str = rest_str[:64]
rest_str = rest_str[65:]
rest_str_len = len(rest_str)
print(concat_str.lstrip() + " (" + str(len(concat_str)) + ")" + "\n")
print(rest_str.lstrip() + " (" + str(len(rest_str)) + ")")
This is so close, but there are two problems. First, the code truncates off letters at the end or beginning of lines, such as the following output:
# I've added the total len() at the end of each line just to check-sum.
'Lorem ipsum dolor sit amet, consectetur adipiscing elit,sed do e (64)'
'usmod tempor incididunt ut labore et dolore magna aliqua. Ut enim (64)'
'ad minim veniam, quis nostrud exercitation ullamco laborisnisi u (64)'
'aliquip ex ea commodo consequat. Duis aute irure dolor inrepreh (64)'
'nderit in voluptate velit esse cillum dolore eu fugiat nulla par (64)'
'atur. Excepteur sint occaecat cupidatat non proident, sunt in cul (64)'
'a quiofficia deserunt mollit anim id est laborum. (49)'
The second problem is that I need the code to print a newline only after a whole word (or punctuation), instead of chopping up the word at 64 characters.
Use textwrap.wrap:
import textwrap
long_str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit," \
"sed do eiusmod tempor incididunt ut labore et dolore magna aliqua." \
"Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris" \
"nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in" \
"reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur." \
"Excepteur sint occaecat cupidatat non proident, sunt in culpa qui" \
"officia deserunt mollit anim id est laborum."
lines = textwrap.wrap(long_str, 64, break_long_words=False)
print('\n'.join(lines))
This takes long string and splits it into lines of a particular width. Also, set break_long_words to False to prevent splitting of words.
I have a 1000 character long text string and I want to split this text in chunks smaller than 100 characters without splitting a whole word (99 characters are fine but 100 not). The wrapping/splitting should only be made on whitespaces:
Example:
text = "... this is a test , and so on..."
^
#position: 100
should be splitted to:
newlist = ['... this is a test ,', ' and so on...', ...]
I want to get a list newlist of the text splitted properly into readable (not word-cropped) chunks. How would you do this?
Use the textwrap module's wrap function. The below example splits the lines 10 characters wide:
In [1]: import textwrap
In [2]: textwrap.wrap("... this is a test , and so on...", 10)
Out[2]: ['... this', 'is a test', ', and so', 'on...']
You can use the textwrap module:
In [2]: import textwrap
In [3]: textwrap.wrap("""Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
...: tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
...: quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
...: consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
...: cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
...: proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
""", 40)
Out[3]:
['Lorem ipsum dolor sit amet, consectetur',
'adipisicing elit, sed do eiusmod tempor',
'incididunt ut labore et dolore magna',
'aliqua. Ut enim ad minim veniam, quis',
'nostrud exercitation ullamco laboris',
'nisi ut aliquip ex ea commodo consequat.',
'Duis aute irure dolor in reprehenderit',
'in voluptate velit esse cillum dolore eu',
'fugiat nulla pariatur. Excepteur sint',
'occaecat cupidatat non proident, sunt in',
'culpa qui officia deserunt mollit anim',
'id est laborum.']
Wordwrap like the other guys said, however for an alternative option:
def splitter(s, n):
for start in range(0, len(s), n):
yield s[start:start+n]
data = "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij"
for splitee in splitter(data, 10):
print splitee
When using the html2text python package to convert html to markdown it adds '\n' to the text. I also see this behaviour when trying the demo at http://www.aaronsw.com/2002/html2text/
Is there any way to turn this off? Of course I can remove them myself, but there might be occurrences of '\n' in the original text which I don't want to remove.
html2text('Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.')
u'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\ntempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\nquis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\nconsequat. Duis aute irure dolor in reprehenderit in voluptate velit esse\ncillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non\nproident, sunt in culpa qui officia deserunt mollit anim id est laborum.\n\n'
In the latest version of html2text do this:
import html2text
h = html2text.HTML2Text()
h.body_width = 0
note = h.handle("<p>Hello, <a href='http://earth.google.com/'>world</a>!")
This removes the word wrapping that html2text otherwise does
Looking at the source to html2text.py, it looks like you can disable the wrapping behavior by setting BODY_WIDTH to 0. Something like this:
import html2text
html2text.BODY_WIDTH = 0
text = html2text.html2text('...')
Of course, resetting BODY_WIDTH globally changes the module's behavior. If I had a need to access this functionality, I'd probably seek to patch the module, creating a parameter to html2text() to modify this behavior per-call, and provide this patch back to the author.