rstrip() not working as expected

rstrip() not working as expected - python

I have written following python code. What i am expecting it to do is add a random word from the file "noise" to each line of "raw" and print it to the file "dataset"
#! /usr/bin/python
from random import randint
raw = open("raw_dataset_1", "r")
noise = open("random", "r")
dataset = open("raw_noisy", "w")
lines = noise.readlines()
for line in raw:
a = randint(1, 5449)
addNoise = lines[a-1]
#print a
#print addNoise
noisy = (line + addNoise)
noisy1= noisy.rstrip()
#print noisy1
dataset.write(noisy1)
My expected "dataset" file is :
city mountain sky sun chalk
bay lake sun tree discussions
beach sea sky sun background
But i'm getting:
city mountain sky sun
chalk
bay lake sun tree
discussions
beach sea sky sun
background
Can someone please point out my mistake?

I think you want to do noisy = (line.rstrip("\n") + " " + addNoise)
I tested it and it worked for me.

While reading each line using:
for line in raw:
line contains the newline at the end. You need to remove it.
Try using:
noisy = line.rstrip() + " " + addNoise

Related

Why wont my python print on the same line?

question = "question:"
# QUESTION
f = open('dopeit.rtf', encoding="utf8", errors='ignore')
line = f.readline()
while line:
if question in line:
newline = line.replace('Question: ', '"')
print(newline + '"', end=",")
# use realine() to read next line
line = f.readline()
f.close()
the output is something like this
","Who directed Star Wars?
","Who was the only non Jedi in the original Star Wars trilogy to use a lightsaber?
","What kind of flower was enchanted and dying in Beauty and the Beast?
","Which is the longest movie ever made?
I want it to be like this:
"Who directed Star Wars?","Who was the only non Jedi in the original Star Wars trilogy to use a lightsaber?","What kind of flower was enchanted and dying in Beauty and the Beast?","Which is the longest movie ever made?
So how can I make these changes? I tried using the "end" command but it still seems to bring it to the next line? Am I doing something wrong???

newline = line.replace('Question: ', '"').replace("\n", "")

How to reverse individual lines and words in Python3 specifically working on each paragraph in a text file?

Basically, I have a text file: -
Plants are mainly multi-cellular. Green plants obtain most of their
energy from sunlight via photosynthesis. There are about 320,000
species of plants. Some 260–290 thousand, produce seeds. Green plants
produce oxygen.
Green plants occupy a significant amount of land today. We should
conserve this greenery around us.
I wanted the output to be:-
oxygen. produce plants Green seeds. produce thousand, 260-290 Some
plants. of species 320,000 about are There photosynthesis. via
sunlight from energy their of most obtain plants Green multi-cellular.
mainly are Plants
us. around greenery this conserve should we today. land of amount
significant a occupy plants Green.
I used split() and then used .join() to combine the file, but it ended up reversing the whole thing and not paragraph-wise.

Change open("testp.txt") to open("[path to your file]")
import re
text = open("testp.txt").read()
rtext = ""
for p in re.split("\n", text):
for w in reversed(re.split(" ", p)):
rtext += w + " "
rtext = rtext[:-1] + "\n"
rtext = rtext[:-1]
print(rtext)
Update: this one is so simple:
import re
with open("testp.txt") as f:
print("\n".join(
" ".join(reversed(re.split(" ", p))) for p in re.split("\n", f.read())
))
Update: code without using regex:
with open("testp.txt") as f:
print("\n".join(
" ".join(reversed(p.split())) for p in f.read().splitlines()
))
Note that you can use .split("\n") instead of .splitlines()
The result for all versions is:
Input:
Plants are mainly multi-cellular. Green plants obtain most of their energy from sunlight via photosynthesis. There are about 320,000 species of plants. Some 260–290 thousand, produce seeds. Green plants produce oxygen.
Green plants occupy a significant amount of land today. We should conserve this greenery around us.
Output:
oxygen. produce plants Green seeds. produce thousand, 260-290 Some plants. of species 320,000 about are There photosynthesis. via sunlight from energy their of most obtain plants Green multi-cellular. mainly are Plants
us. around greenery this conserve should we today. land of amount significant a occupy plants Green

Read the file and use splitlines() to separate paragraphs. Then iterate the paragraphs, reversing the words.
with open("input.txt") as f:
read_data = f.read().splitlines()
for p in read_data:
words = p.split()
print(' '.join(reversed(words)))
To read and write to file you can do this:
with open("input.txt", 'r') as f:
read_data = f.read().splitlines()
with open("output.txt", 'w') as fout:
for p in read_data:
words = p.split()
fout.write(' '.join(reversed(words)))
fout.write('\n')

Python Code to Write to file with Left and Right Margins and Fixed Line Length

I am writing a Python program that reads a file and then writes its contents to another one, with added margins. The margins are user-input and the line length must be at most 80 characters.
I wrote a recursive function to handle this. For the most part, it is working. However, the 2 lines before any new paragraph display the indentation that was input for the right side, instead of keeping the left indentation.
Any clues on why this happen?
Here's the code:
left_Margin = 4
right_Margin = 5
# create variable to hold the number of characters to withhold from line_Size
avoid = right_Margin
num_chars = left_Margin
def insertNewlines(i, line_Size):
string_length = len(i) + avoid + right_Margin
if len(i) <= 80 + avoid + left_Margin:
return i.rjust(string_length)
else:
i = i.rjust(len(i)+left_Margin)
return i[:line_Size] + '\n' + ' ' * left_Margin + insertNewlines(i[line_Size:], line_Size)
with open("inputfile.txt", "r") as inputfile:
with open("outputfile.txt", "w") as outputfile:
for line in inputfile:
num_chars += len(line)
string_length = len(line) + left_Margin
line = line.rjust(string_length)
words = line.split()
# check if num of characters is enough
outputfile.write(insertNewlines(line, 80 - avoid - left_Margin))
For input of left_Margin=4 and right_Margin = 5, I expect this:
____Poetry is a form of literature that uses aesthetic and rhythmic
____qualities of language—such as phonaesthetics, sound symbolism, and
____metre—to evoke meanings in addition to, or in place of, the prosai
____c ostensible meaning.
____Poetry has a very long history, dating back to prehistorical ti
____mes with the creation of hunting poetry in Africa, and panegyric an
____d elegiac court poetry was developed extensively throughout the his
____tory of the empires of the Nile, Niger and Volta river valleys.
But The result is:
____Poetry is a form of literature that uses aesthetic and rhythmic
______qualities of language—such as phonaesthetics, sound symbolism, and
______metre—to evoke meanings in addition to, or in place of, the prosai
________c ostensible meaning.
_____Poetry has a very long history, dating back to prehistorical ti
_____mes with the creation of hunting poetry in Africa, and panegyric an
_____d elegiac court poetry was developed extensively throughout the his
_____tory of the empires of the Nile, Niger and Volta river valleys.

This isn't really a good fit for a recursive solution in Python. Below is an imperative/iterative solution of the formatting part of your question (I'm assuming you can take this and write it to a file instead). The code assumes that paragraphs are indicated by two consecutive newlines ('\n\n').
txt = """
Poetry is a form of literature that uses aesthetic and rhythmic qualities of language—such as phonaesthetics, sound symbolism, and metre—to evoke meanings in addition to, or in place of, the prosaic ostensible meaning.
Poetry has a very long history, dating back to prehistorical times with the creation of hunting poetry in Africa, and panegyric and elegiac court poetry was developed extensively throughout the history of the empires of the Nile, Niger and Volta river valleys.
"""
def format_paragraph(paragraph, length, left, right):
"""Format paragraph ``p`` so the line length is at most ``length``
with ``left`` as the number of characters for the left margin,
and similiarly for ``right``.
"""
words = paragraph.split()
lines = []
curline = ' ' * (left - 1) # we add a space before the first word
while words:
word = words.pop(0) # process the next word
# +1 in the next line is for the space.
if len(curline) + 1 + len(word) > length - right:
# line would have been too long, start a new line
lines.append(curline)
curline = ' ' * (left - 1)
curline += " " + word
lines.append(curline)
return '\n'.join(lines)
# we need to work on one paragraph at a time
paragraphs = txt.split('\n\n')
print('0123456789' * 8) # print a ruler..
for paragraph in paragraphs:
print(format_paragraph(paragraph, 80, left=4, right=5))
print() # next paragraph
the output of the above is:
01234567890123456789012345678901234567890123456789012345678901234567890123456789
Poetry is a form of literature that uses aesthetic and rhythmic
qualities of language such as phonaesthetics, sound symbolism, and
metre to evoke meanings in addition to, or in place of, the prosaic
ostensible meaning.
Poetry has a very long history, dating back to prehistorical times with
the creation of hunting poetry in Africa, and panegyric and elegiac
court poetry was developed extensively throughout the history of the
empires of the Nile, Niger and Volta river valleys.

Replace Single Character in a line of a Text file with Python

I have a text file with all of them currently having the same end character (N), which is being used to identify progress the system makes. I want to change the end character to "Y" in case the program ends via an error or other interruptions so that upon restarting the program will search until a line has the end character "N" and begin working from there. Below is my code as well as a sample from the text file.
UPDATED CODE:
def GeoCode():
f = open("geocodeLongLat.txt", "a")
with open("CstoGC.txt",'r') as file:
print("Geocoding...")
new_lines = []
for line in file.readlines():
check = line.split('~')
print(check)
if 'N' in check[-1]:
geolocator = Nominatim()
dot_number, entry_name, PHY_STREET,PHY_CITY,PHY_STATE,PHY_ZIP = check[0],check[1],check[2],check[3],check[4],check[5]
address = PHY_STREET + " " + PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
f.write(dot_number + '\n')
try:
location = geolocator.geocode(address)
f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
except AttributeError:
try:
address = PHY_CITY + " " + PHY_STATE + " " + PHY_ZIP
location = geolocator.geocode(address)
f.write(dot_number + "," + entry_name + "," + str(location.longitude) + "," + str(location.latitude) + "\n")
except AttributeError:
print("Cannot Geocode")
check[-1] = check[-1].replace('N','Y')
new_lines.append('~'.join(check))
with open('CstoGC.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
for line in new_lines:
file.writelines(line)
f.close()
Output:
2967377~DARIN COLE~22112 TWP RD 209~ALVADA~OH~44802~Y
WAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
143608~LARRY A PETERSON & DONNA M PETERSON~W6359 450TH AVE~ELLSWORTH~WI~54011~N
635528~JAMES E WEBB~3926 GREEN ROAD~SPRINGFIELD~TN~37172~N
805496~WAYNE MLADY~22272 135TH ST~CRESCO~IA~52136~N
704996~SAVINA C MUNIZ~814 W LA QUINTA DR~PHARR~TX~78577~N
893169~BINDEWALD MAINTENANCE INC~213 CAMDEN DR~SLIDELL~LA~70459~N
948130~LOGISTICIZE LTD~861 E PERRY ST~PAULDING~OH~45879~N
438760~SMOOTH OPERATORS INC~W8861 CREEK ROAD~DARIEN~WI~53114~N
518872~A B C RELOCATION SERVICES INC~12 BOCKES ROAD~HUDSON~NH~03051~N
576143~E B D ENTERPRISES INC~29 ROY ROCHE DRIVE~WINNIPEG~MB~R3C 2E6~N
968264~BRIAN REDDEMANN~706 WESTGOR STREET~STORDEN~MN~56174-0220~N
721468~QUALITY LOGISTICS INC~645 LEONARD RD~DUNCAN~SC~29334~N
As you can see I am already keeping track of which line I am at just by using x. Should I use something like file.readlines()?
Sample of text document:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~N
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~N
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~N
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~N
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~N
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~N
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~N
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
Thank you!
Edit: updated code thanks to #idlehands

There are a few ways to do this.
Option #1
My original thought was to use the tell() and seek() method to go back a few steps but it quickly shows that you cannot do this conveniently when you're not opening the file in bytes and definitely not in a for loop of readlines(). You can see the reference threads here:
Is it possible to modify lines in a file in-place?
How to solve "OSError: telling position disabled by next() call"
The investigation led to this piece of code:
with open('file.txt','rb+') as file:
line = file.readline() # initiate the loop
while line: # continue while line is not None
print(line)
check = line.split(b'~')[-1]
if check.startswith(b'N'): # carriage return is expected for each line, strip it
# ... do stuff ... #
file.seek(-len(check), 1) # place the buffer at the check point
file.write(check.replace(b'N', b'Y')) # replace "N" with "Y"
line = file.readline() # read next line
In the first referenced thread one of the answers mentioned this could lead you to potential problems, and directly modifying the bytes on the buffer while reading it is probably considered a bad idea™. A lot of pros probably will scold me for even suggesting it.
Option #2a
(if file size is not horrendously huge)
with open('file.txt','r') as file:
new_lines = []
for line in file.readlines():
check = line.split('~')
if 'N' in check[-1]:
# ... do stuff ... #
check[-1] = check[-1].replace('N','Y')
new_lines.append('~'.join(check))
with open('file.txt','r+') as file: # IMPORTANT to open as 'r+' mode as 'w/w+' will truncate your file!
for line in new_lines:
file.writelines(line)
This approach loads all the lines into memory first, so you do the modification in memory but leave the buffer alone. Then you reload the file and write the lines that were changed. The caveat is that technically you are rewriting the entire file line by line - not just the string N even though it was the only thing changed.
Option #2b
Technically you could open the file as r+ mode from the onset and then after the iterations have completed do this (still within the with block but outside of the loop):
# ... new_lines.append('~'.join(check)) #
file.seek(0)
for line in new_lines:
file.writelines(line)
I'm not sure what distinguishes this from Option #1 since you're still reading and modifying the file in the same go. If someone more proficient in IO/buffer/memory management wants to chime in please do.
The disadvantage for Option 2a/b is that you always end up storing and rewriting the lines in the file even if you are only left with a few lines that needs to be updated from 'N' to 'Y'.
Results (for all solutions):
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~Y
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~Y
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~Y
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~Y
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~Y
And if you were to say, encountered a break at the line starting with 220940, the file would become:
570772~CORPORATE BANK TRANSIT OF KENTUCKY INC~3157 HIGHWAY 64 SUITE 100~EADS~TN~38028~Y
384767~MILLER FARMS TRANS LLC~1103 COURT ST~BEDFORD~IA~50833~Y
986150~R G S TRUCKING LTD~1765 LOMBARDIE DRIVE~QUESNEL~BC~V2J 4A8~Y
1012987~DONALD LARRY KIVETT~4509 LANSBURY RD~GREENSBORO~NC~27406-4509~Y
735308~ALZEY EXPRESS INC~2244 SOUTH GREEN STREET~HENDERSON~KY~42420~Y
870337~RIES FARMS~1613 255TH AVENUE~EARLVILLE~IA~52057~Y
148428~P R MASON & SON LLC~HWY 70 EAST~WILLISTON~NC~28589~Y
220940~TEXAS MOVING CO INC~908 N BOWSER RD~RICHARDSON~TX~75081-2869~N
854042~ARMANDO ORTEGA~6590 CHERIMOYA AVENUE~FONTANA~CA~92337~N
940587~DIAMOND A TRUCKING INC~192285 E COUNTY ROAD 55~HARMON~OK~73832~N
1032455~INTEGRITY EXPRESS LLC~380 OLMSTEAD AVENUE~DEPEW~NY~14043~N
889931~DUNSON INC~33 CR 3581~FLORA VISTA~NM~87415~N
There are pros and cons to these approaches. Try and see which one fits your use case the best.

I would read the entire input file into a list and .pop() the lines off one at a time. In case of an error, append the popped item to the list and write overwrite the input file. This way it will always be up to date and you won't need any other logic.

Continue writing in same line of file

I have opened the file I want to write to using:
data = open('input','a')
using a loop, I want to write some words to the file in the same line. And after every loop iteration I want to add a newline character.
while loop:
for loop:
/* do something */
if some_condition:
data.write(str(tag)+"")
data.write("\n")
My expected output was:
city mountain sky sun
bay lake sun tree
But I'm getting:
city
mountain
sky
sun
bay
lake
sun
tree
How can I change my code to get the expected output? Thanks.

Remove the newline at the end of tag before writing it.
data.write(str(tag).rstrip('\n'))

while loop:
for loop:
/*do something
*/
if some_condition:
data.write(str(tag)+"")
data.write(" ")
In other words, remove the data.write("\n");

Try removing data.write("\n") .

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

rstrip() not working as expected - python

I think you want to do noisy = (line.rstrip("\n") + " " + addNoise) I tested it and it worked for me.

While reading each line using: for line in raw: line contains the newline at the end. You need to remove it. Try using: noisy = line.rstrip() + " " + addNoise

Related

Why wont my python print on the same line?

How to reverse individual lines and words in Python3 specifically working on each paragraph in a text file?

Python Code to Write to file with Left and Right Margins and Fixed Line Length

Replace Single Character in a line of a Text file with Python

Continue writing in same line of file

Categories

Resources