Jython code for deleting spaces from a text file

I am trying to write Jython code that deletes spaces from a text file. Here is my scenario.
I have a text file like this:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
Here, each record is on a single line.
But what I want is:
STARTBUR001 20120416
20120416MES201667 20120320000000000201203210000000002012032200000000020120323000000000201203240000000002012032600000000020120327000000000201203280000000002012032900000000020120330000000000
20120416MES202566 2012030500000000020120306000000000201203070000000002012030800000000020120309000000000201203100000000002012031100000000020120312000000000201203130000000002012031400000000020
20120416MES275921 20120305000000000201203060000000002012030700000000020120308000000000201203090000000002012031000000000020120311000000000201203120000000002012031300000000020120314000000000
END 0000000202
So, in short, I want to start checking from the second line until I encounter END, and delete all spaces at the end of each line.
Can someone guide me in writing this code?
I tried this:
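srcfile = open('d:/BUR001.txt', 'r')
trgtfile = open('d:/BUR002.txt', 'w')
readfile = srcfile.readline()
while readfile:
    # note: str.replace() does not understand regex, so '\s' is matched
    # literally (a backslash followed by 's') and nothing is removed
    trgtfile.write(readfile.replace('\s', ''))
    readfile = srcfile.readline()
srcfile.close()
trgtfile.close()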
Thanks,
Mahesh

You can use the fact that those special lines start with special values:
line = srcfile.readline()
while line:
    line2 = line
    if not line2.startswith('START') and not line2.startswith('END'):
        line2 = line2.replace(' ', '')
    trgtfile.write(line2)
    line = srcfile.readline()
Also note that the strings returned by readline() end with \n (or are empty at the end of the input file), and that this code removes all spaces from the line, not only those at the end of it.
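If only the trailing spaces should go, the behaviour described in the question (leave the header alone, stop at END) can be sketched as below. This is a minimal sketch that assumes the header is always the first line and the trailer line starts with END. The names `strip_trailing_spaces` and `process_file` are hypothetical. It avoids the `with` statement so that it should also run on older Jython releases, which is an assumption about your Jython version.

```python
def strip_trailing_spaces(lines):
    """Return a copy of `lines` with trailing spaces/tabs removed from every
    line after the first (header) line, up to but not including the END line.
    Line endings are preserved; the header, the END line and anything after
    END are passed through unchanged."""
    out = []
    in_body = False
    for i, line in enumerate(lines):
        if i == 0:
            out.append(line)  # header line, e.g. "STARTBUR001 20120416"
            in_body = True
            continue
        if line.startswith('END'):
            in_body = False  # stop processing at the trailer line
        if in_body:
            body = line.rstrip('\r\n')  # text without the line ending
            ending = line[len(body):]   # '\n', '\r\n' or '' on the last line
            out.append(body.rstrip(' \t') + ending)
        else:
            out.append(line)
    return out


def process_file(src_path, dst_path):
    """Hypothetical helper: read src_path, clean it, write dst_path."""
    src = open(src_path, 'r')
    try:
        lines = src.readlines()
    finally:
        src.close()
    dst = open(dst_path, 'w')
    try:
        dst.writelines(strip_trailing_spaces(lines))
    finally:
        dst.close()
```

With the paths from the question, the call would be `process_file('d:/BUR001.txt', 'd:/BUR002.txt')`. Stripping only `' \t'` after cutting off the line ending keeps each line's `\n` intact, so the output keeps the same line structure as the input.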
If I understood your example correctly, all you want is to remove the empty lines, so instead of reading the file line by line, read it all at once:
content = srcfile.read()
and then remove empty lines from content:
while '\n\n' in content:
    content = content.replace('\n\n', '\n')
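A one-pass alternative to the `while` loop splits the content into lines and keeps only the non-blank ones. This sketch assumes that lines containing only spaces should also count as empty; the `replace` loop keeps those. Unlike the loop, it also drops a blank line at the very start of the file. `remove_blank_lines` is a hypothetical name.

```python
def remove_blank_lines(content):
    """Drop every line that is empty or contains only whitespace,
    keeping the line endings of the lines that remain."""
    # splitlines(True) keeps each line's own '\n' / '\r\n' attached
    kept = [line for line in content.splitlines(True) if line.strip()]
    return ''.join(kept)
```

Used with the files from the question, this would be `trgtfile.write(remove_blank_lines(srcfile.read()))`.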

Related

Load a text file paragraph into a string without libraries

Sorry if this question looks a bit dumb to some of you, but I'm a total beginner at programming in Python, so I still have a lot to learn.
Basically, I have a long text file separated into paragraphs. To make the task harder for us, the paragraph break is sometimes a double or triple newline, so I added a little check that seems to work: I have a variable called "paragraph" that tells me which paragraph I am currently in.
Now I need to scan this text file and search it for some sequences of words, but the newline character is the worst enemy here. For example, if I have the string "dummy text" and I'm looking in this:
"random questions about files with a dummy
text and strings
hey look a new paragraph here"
As you can see, there is a newline between dummy and text, so reading the file line by line doesn't work. So I was wondering whether I could load an entire paragraph directly into a string. That way I could also remove punctuation and similar characters more easily, and check directly whether those sequences of words are contained in it.
All this must be done without libraries.
However, my paragraph-counter code works while the file is being read. If loading a whole paragraph into a string is possible, should I basically use something like "".join until the paragraph counter increases by 1, since we are then on the next paragraph? Any ideas?
This should do the trick. It is very short and elegant:
with open('dummy text.txt') as f:
    # replace each newline with a space so "dummy\ntext" becomes "dummy text"
    data = f.read().replace('\n', ' ')
print(data)  # prints out the whole file as one line
The output is:
"random questions about files with a dummy text and strings hey look a new paragraph here"
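Replacing each newline with one space still leaves double spaces wherever the file has blank lines, or wherever a line already ends in a space. A minimal alternative is to split on whitespace and rejoin, which assumes that any run of whitespace should count as a single word separator. It uses only built-in string methods, so it meets the no-libraries requirement. `one_line` is a hypothetical name.

```python
def one_line(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so no empty strings end up in the list
    return ' '.join(text.split())
```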
I don't think you need to approach it in a complicated way. Here is a very commonly used pattern for this kind of problem:
paragraphs = []
lines = []
for line in open('text.txt'):
    if not line.strip():  # empty line
        if lines:
            paragraphs.append("".join(lines))
            lines = []
    else:
        lines.append(line)
if lines:
    paragraphs.append("".join(lines))
If a stripped line is empty, you have hit a blank line (a second consecutive \n), which means you have to join the previous lines into a paragraph.
After joining, lines is reset to [], so when you encounter a 3rd consecutive \n, the `if lines:` check prevents joining again. This way you never append the same paragraph twice, or an empty one.
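The goal is to search for phrases that may span a line break. A variant of this pattern that joins a paragraph's lines with single spaces, rather than with `"".join`, which keeps the embedded `\n`, might look like the sketch below. It assumes paragraphs are separated by at least one blank line. `split_paragraphs` and `paragraphs_containing` are hypothetical names, and no imports are needed.

```python
def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines,
    joining the lines of each paragraph with single spaces."""
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:  # first blank line after some text ends a paragraph
            paragraphs.append(' '.join(current))
            current = []
    if current:  # the last paragraph may not be followed by a blank line
        paragraphs.append(' '.join(current))
    return paragraphs


def paragraphs_containing(paragraphs, phrase):
    """Return the indexes of the paragraphs that contain `phrase`."""
    return [i for i, p in enumerate(paragraphs) if phrase in p]
```

To use it, read the whole file with `f.read()`, pass the result to `split_paragraphs`, and then call `paragraphs_containing(paras, 'dummy text')`. Punctuation could be removed from each paragraph string before searching.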
To check the last line, try this pattern.
f = open('text.txt')
line0 = f.readline()
while True:
    # do what you have to do with the previous line, `line0`
    line = f.readline()
    if not line:  # `line0` was the last line
        # do what you have to do with the last line
        break
    line0 = line
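The same look-ahead idea can be wrapped in a small generator, so that the loop body receives each line together with a flag saying whether it is the last one. This is a minimal sketch; `with_last_flag` is a hypothetical name, and it works on any iterable, including an open file.

```python
def with_last_flag(lines):
    """Yield (line, is_last) pairs, looking one line ahead."""
    it = iter(lines)
    try:
        prev = next(it)
    except StopIteration:  # empty input: nothing to yield
        return
    for line in it:
        yield prev, False
        prev = line
    yield prev, True
```

A typical use is `for line, is_last in with_last_flag(open('text.txt')):`. This keeps the "previous line" bookkeeping out of the main loop.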
You can strip the newline character. Here is an example from a different problem.
data = open('resources.txt', 'r')
book_list = []
for line in data:
    new_line = line.rstrip('\n')
    book_list.append(new_line)

List the first words per line from a text file in Python

I need to select the first word on each line of a text file and make a list of them:
I would paste the text, but its formatting is quite messed up. I will try.
All the other text is unnecessary.
I have tried:
string = []
for line in f:
    string.append(line.split(None, 1)[0])  # add only the first word
(taken from another solution), but it keeps raising an IndexError ("list index out of range").
I can get the first word of the first line using string=text.partition(' ')[0],
but I do not know how to repeat this for the other lines.
I am still new to Python and to the site, so I hope my formatting is bearable! When opening the file, I specify an encoding so that it accepts symbols, like so:
wikitxt=open('racinesPrefixesSuffixes.txt', 'r', encoding='utf-8')
Could this be the issue?
It raises an IndexError because some lines are empty: split() returns an empty list for such a line, so [0] is out of range.
You can do this:
words = []
for line in f:
    if line.strip():
        words.append(line.split(maxsplit=1)[0])
Here, line.strip() checks whether the line consists only of whitespace. If it does, strip() returns an empty (falsy) string and the line is simply skipped.
Or, if you like list comprehension:
words = [line.split(maxsplit=1)[0] for line in f if line.strip()]
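The question mentions `partition`, and the same result can be reached by applying it to each line in a loop. This is a minimal sketch that assumes words are separated by a single space: a tab would not count as a separator here, unlike with `split()`. `first_words` is a hypothetical name.

```python
def first_words(lines):
    """Return the first space-separated word of every non-blank line."""
    words = []
    for line in lines:
        stripped = line.strip()  # drop the newline and surrounding spaces
        if stripped:             # skip empty lines, which caused the IndexError
            words.append(stripped.partition(' ')[0])
    return words
```

Because it takes any iterable of lines, it can be called directly on the opened file, e.g. `first_words(wikitxt)`.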

Writing to file with unwanted empty lines

I have a piece of code that's removing some unwanted lines from a text file and writing the results to a new one:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt', 'w')
for line in f:
    if not line.startswith('W'):
        g.write('%s\n' % line)
The original file is single-spaced, but the new file has an empty line between each line of text. How can I lose the empty lines?
You're not removing the newline from the input lines, so you shouldn't be adding one (\n) on output.
Either strip the newlines off the lines you read or don't add new ones as you write it out.
Just do:
f = open('messyParamsList.txt')
g = open('cleanerParamsList.txt', 'w')
for line in f:
    if not line.startswith('W'):
        g.write(line)
Every line that you read from the original file has a \n (newline) character at the end, so do not add another one (right now you are adding one, which is what introduces the empty lines).
My guess is that the variable "line" already has a newline in it, but you're writing an additional newline with g.write('%s\n' % line).
line already has a newline at the end.
Remove the \n from your write() call, or rstrip('\n') the line first.
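Both suggestions can be combined into one sketch that filters the lines and guarantees exactly one newline per output line. This also covers a last input line that has no trailing newline. `drop_prefixed` is a hypothetical name.

```python
def drop_prefixed(lines, prefix='W'):
    """Yield the lines that do not start with `prefix`, each ending in
    exactly one newline."""
    for line in lines:
        if not line.startswith(prefix):
            # remove any existing newline, then add back exactly one
            yield line.rstrip('\n') + '\n'
```

With the file names from the question, this would be used as `with open('messyParamsList.txt') as f, open('cleanerParamsList.txt', 'w') as g: g.writelines(drop_prefixed(f))`. The `with` block also closes both files, which the original loop never does.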

Keep only lines that end with "#here" (RegEx, Python)

I have a text file with almost a thousand lines such as:
WorldNews,Current
WorldNews,Current,WorldNews#here',
'WorldNewsPro#here Zebra,Poacher',
'Dock,DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,Dock,ZebraDock#here
Timer33,Timer33#here
Sometimes the line ends without "#here", sometimes it ends with "#here", sometimes it has "#here" in the middle of the line, and sometimes the line ends with "#here'".
I want to strip all the lines that do NOT have "#here" in them at all. I tried RegEx:
(^(#here$))
[\W](#here)
etc. with no luck.
How should I pull the lines with "#here" so my new file (or the output) has only:
WorldNews,Current,WorldNews#here',
'WorldNewsProfessional52#here
Zebra,Poacher',
'DocuShare,AC_DS_URLs#here'
Zebra,Poacher,ZebraPoacher#here
Zebra,DocuShare,ZebraDocushare#here
XNTimer,XNTimer#here
I was thinking it should read the whole line from start to end and if it has #here anywhere in the line, print it. If not, ignore and read the next line.
Thanks,
Adrian
Maybe this helps: (assuming filename is the name of your input file)
with open(filename) as stream:
    for line in stream:
        if '#here' in line:
            print(line.rstrip('\n'))
You don't need regex. You can use string methods for such simple filtering:
def hasstr(lines, s):
    # a generator expression can filter out the lines
    return (line for line in lines if s in line)

# get all lines in the file with #here in them
filtered = hasstr(open(the_file, 'rt'), '#here')
You want the in operator.
import sys

for line in sys.stdin:
    if '#here' in line:
        sys.stdout.write(line)
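If a regular expression is wanted anyway, the pattern needs no anchors at all, because `re.search` looks for a match anywhere in the line. The `(^(#here$))` attempt in the question only matches a line consisting of nothing but `#here`. The `[\W](#here)` attempt demands a non-word character right before `#here`, and lines such as `Zebra,Poacher,ZebraPoacher#here` do not have one. This is a minimal sketch; `lines_with_here` is a hypothetical name.

```python
import re

# No ^ or $ anchors: re.search scans the whole line for the literal text.
HERE = re.compile(r'#here')


def lines_with_here(lines):
    """Return the lines that contain '#here' anywhere."""
    return [line for line in lines if HERE.search(line)]
```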

Blank lines in file after sorting the content of a text file in Python

I have this small script that sorts the content of a text file:
# The built-in function `open` opens a file and returns a file object.
# Read mode opens a file for reading only.
try:
    f = open("tracks.txt", "r")
    try:
        # Read the entire contents of a file at once.
        # string = f.read()
        # OR read one line at a time.
        # line = f.readline()
        # OR read all the lines into a list.
        lines = f.readlines()
        lines.sort()
        f.close()
        f = open('tracks.txt', 'w')
        f.writelines(lines)  # Write a sequence of strings to a file
    finally:
        f.close()
except IOError:
    pass
The only problem is that every time the file is sorted, the text ends up at the bottom of the file...
I assume it also sorts the blank lines... does anybody know why?
And could you suggest some tips on how to avoid this?
Thanks in advance.
An "empty" line read from a text file is represented in Python by a string containing only a newline ("\n"). You may also want to avoid lines whose "data" consists only of spaces, tabs, etc ("whitespace"). The str.strip() method lets you detect both cases (a newline is whitespace).
f = open("tracks.txt", "r")
# omit empty lines and lines containing only whitespace
lines = [line for line in f if line.strip()]
f.close()
lines.sort()
# now write the output file
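For the final step, here is a sketch that also guards against one more pitfall. If the last line of the file has no trailing newline and sorts into the middle, `writelines` would glue it to the line that follows. The helper below drops blank lines and gives every kept line exactly one newline before sorting. `sorted_nonblank` is a hypothetical name.

```python
def sorted_nonblank(lines):
    """Drop blank/whitespace-only lines, make each remaining line end in
    exactly one newline, and return them sorted."""
    kept = [line.rstrip('\r\n') + '\n' for line in lines if line.strip()]
    kept.sort()
    return kept
```

Used on the file from the question, this would be `with open("tracks.txt") as f: lines = sorted_nonblank(f)`, followed by `with open("tracks.txt", "w") as f: f.writelines(lines)`. Reading fully before reopening for writing matters here, because opening with `"w"` truncates the file.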
This is a perfect opportunity to do some test-driven development (see below). Some observations:
In the example below, I omit the aspect of reading from and writing to a file. That's not essential to this question, in my opinion.
I assume you want to strip trailing newlines and omit blank lines. If not, you'll need to adjust. (But you'll have the framework for asserting/confirming the expected behavior.)
I agree with chryss above that you generally don't need to reflexively wrap things in try blocks in Python. That's an anti-pattern that comes from Java (which forces it), I believe.
Anyway, here's the test:
import unittest
def sort_lines(text):
    """Return text sorted by line, remove empty lines and strip trailing whitespace."""
    lines = text.split('\n')
    non_empty = [line.rstrip() for line in lines if line.strip()]
    non_empty.sort()
    return '\n'.join(non_empty)
class SortTest(unittest.TestCase):
    def test(self):
        data_to_sort = """z some stuff
c some other stuff
d more stuff after blank lines
b another line
a the last line"""
        actual = sort_lines(data_to_sort)
        expected = """a the last line
b another line
c some other stuff
d more stuff after blank lines
z some stuff"""
        self.assertEqual(actual, expected, "no match!")
if __name__ == '__main__':
    unittest.main()
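The reason it sorts the blank lines is that they are there. A blank line is an empty string followed by \n (or \r\n or \r, depending on the OS), which is perfectly sortable. Since \n sorts before any printable character, the blank lines end up at the top and push the text to the bottom.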
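A quick check shows where blank lines land in a plain sort. The track names are made up purely for illustration.

```python
# '\n' is code point 10, lower than any printable character
# (a space is 32, 'a' is 97), so a plain sort puts blank lines first.
lines = ['b track\n', '\n', 'a track\n', '\n']
print(sorted(lines))  # ['\n', '\n', 'a track\n', 'b track\n']
```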
I should like to note that "try:" nested into a "try:... except" block is a bit ugly, and I'd close the file after reading, for style's sake.
