Removing specific text from every line - python

I have a txt file with this format:
something text1 pm,bla1,bla1
something text2 pm,bla2,bla2
something text3 am,bla3,bla3
something text4 pm,bla4,bla4
and in a new file I want to keep only:
bla1,bla1
bla2,bla2
bla3,bla3
bla4,bla4
I have this, which keeps (for example) the first 10 characters of every line. Can I adapt it, or is there another approach?
with open('example1.txt', 'r') as input_handle:
    with open('example2.txt', 'w') as output_handle:
        for line in input_handle:
            output_handle.write(line[:10] + '\n')

This is what the csv module was made for.
import csv

reader = csv.reader(open('file.csv'))
for row in reader:
    print(','.join(row[1:]))  # everything after the first column
You can then just redirect the script's output to the new file using your shell, or do something like this instead of the last line (open the output file once, outside the loop, so it is not truncated on every iteration):
with open('out.csv', 'w') as f:
    for row in reader:
        f.write(','.join(row[1:]) + '\n')

To remove the first ","-separated column from the file:
first, sep, rest = line.partition(",")
if rest:  # don't write lines with fewer than 2 columns
    output_handle.write(rest)
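For context, a minimal sketch (using the file names from the question) of how that snippet fits into the read/write loop:
with open('example1.txt', 'r') as input_handle:
    with open('example2.txt', 'w') as output_handle:
        for line in input_handle:
            first, sep, rest = line.partition(",")
            if rest:  # skip lines with fewer than 2 columns
                output_handle.write(rest)  # rest still ends with the original newline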

If the format is fixed:
with open('example1.txt', 'r') as input_handle:
    with open('example2.txt', 'w') as output_handle:
        for line in input_handle:
            if ',' in line:  # and maybe some other format check
                od = line.split(',', 1)
                output_handle.write(od[1])  # od[1] keeps the original trailing newline

Here is how I would write it.
Python 2.7
import csv
with open('example1.txt', 'rb') as f_in, open('example2.txt', 'wb') as f_out:
    writer = csv.writer(f_out)
    for row in csv.reader(f_in):
        writer.writerow(row[-2:])  # keeps the last two columns
Python 3.x (note the differences in arguments to open)
import csv
with open('example1.txt', 'r', newline='') as f_in:
    with open('example2.txt', 'w', newline='') as f_out:
        writer = csv.writer(f_out)
        for row in csv.reader(f_in):
            writer.writerow(row[-2:])  # keeps the last two columns

Try:
output_handle.write(line.split(",", 1)[1])
From the docs:
str.split([sep[, maxsplit]])
Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).
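As a quick illustration with a sample line from the question, maxsplit=1 splits off only the first field:
line = "something text1 pm,bla1,bla1"
print(line.split(",", 1))     # ['something text1 pm', 'bla1,bla1']
print(line.split(",", 1)[1])  # 'bla1,bla1'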


Sorting names in a text file, writing results to another text file

I have a csv file containing a few names written on one line, separated by commas with no spaces, e.g. "maho,baba,fika,anst,koka,root". What I would like to do is sort these names alphabetically and write them to a new text file so the result looks like this:
anst
baba
fika
etc.
This is my attempt at it, which did not work:
names = list()
filename = 'users.csv'
with open(filename) as fin:
    for line in fin:
        names.append(line.strip())
names.sort()
print(names)
filename = 'names_sorted1.txt'
with open(filename, 'w') as fout:
    for name in names:
        fout.write(name + '\n')
You are trying to sort names, which will only contain one string: the entire chunk of comma-separated text. What you need is a way to separate it into a list of individual names, which can be done with the split method:
in_filename = 'users.csv'
with open(in_filename) as fin:
    names = sorted(fin.read().strip().split(','))
Then, we can use the join method to combine the list into one long string again, where each element from the list is separated from the next by '\n':
out_filename = 'names_sorted1.txt'
with open(out_filename, 'w') as fout:
    fout.write('\n'.join(names) + '\n')
You can use this one-liner:
with open("names.csv") as f, open("new.csv", "w") as fw:
    [fw.write(x.strip() + "\n") for x in sorted(",".join(f.readlines()).split(","))]
new.csv
anst
baba
fika
koka
maho
root
You could use the csv module like this:
$ cat names.csv
maho,baba,fika,anst,koka,root
$ cat sort_names.py
import csv

with open('names.csv') as csvfile, open('new.txt', 'w') as f:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print(row)
        for word in sorted(row):
            f.write("{}\n".format(word))
$ python sort_names.py
$ cat new.txt
anst
baba
fika
koka
maho
root

Split into two columns and convert txt text into a csv file

I have the following data:
Graudo. A selection of Pouteria caimito, a minor member...
TtuNextrecod. A selection of Pouteria caimito, a minor member of the Sapotaceae...
I want to split it into two columns
Column1 Column2
------------------------------------------------------------------------------
Graudo A selection of Pouteria caimito, a minor member...
TtuNextrecod A selection of Pouteria caimito, a minor member of the Sapotaceae...
I need help with the code. Thanks.
import csv  # convert
import itertools  # functions for efficient looping

with open('Abiutxt.txt', 'r') as in_file:
    lines = in_file.read().splitlines()  # returns a list of all the lines as strings, without the line breaks
    test = [line.split('. ') for line in lines]  # split on the period....but...needs work
    print(test)
    stripped = [line.replace('', '').split('. ') for line in lines]
    grouped = itertools.izip(*[stripped]*1)
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Column1', 'Column2'))
        for group in grouped:
            writer.writerows(group)
I am not sure you need zipping here at all. Simply iterate over every line of the input file, skip empty lines, split by the period and write to the csv file:
import csv
with open('Abiutxt.txt', 'r') as in_file:
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(['Column1', 'Column2'])
        for line in in_file:
            if not line.strip():
                continue
            writer.writerow(line.strip().split(". ", 1))
Notes:
I specified a tab as the delimiter, but you can change it as appropriate.
Thanks to @PatrickHaugh for the idea of splitting on the first occurrence of ". " only, since your second column may contain periods as well.
This should get you what you want. This will handle all the escaping.
import csv
with open('Abiutxt.txt', 'r') as in_file:
    x = in_file.read().splitlines()
x = [line.split('. ', 1) for line in x if line]
with open('logtestAbiutxt.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Column1', 'Column2'])
    writer.writerows(x)

Write newline to csv in Python

I want to end each iteration of a for loop by writing a new line of content (including the newline) to a csv file. I have this:
# Set up an output csv file with column headers
with open('outfile.csv', 'w') as f:
    f.write("title; post")
    f.write("\n")
This does not appear to write an actual \n (newline) to the file. Further:
# Concatenate into a row to write to the output csv file
csv_line = topic_title + ";" + thread_post
with open('outfile.csv', 'w') as outfile:
    outfile.write(csv_line + "\n")
This also does not move the cursor in the outfile to the next line. Each new line, with every iteration of the loop, just overwrites the most recent one.
I also tried outfile.write(os.linesep), but that did not work either.
Change 'w' to 'a'; mode 'w' truncates the file every time it is opened, while 'a' appends:
with open('outfile.csv', 'a') as f:
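Alternatively, a minimal sketch (threads is a hypothetical iterable yielding your title/post pairs): open the file once in 'w' mode before the loop and write every row inside it, so nothing gets truncated:
with open('outfile.csv', 'w') as outfile:
    outfile.write("title; post\n")  # header row
    for topic_title, thread_post in threads:  # 'threads' is a placeholder for your own data source
        outfile.write(topic_title + ";" + thread_post + "\n")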
with open('outfile.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(...)
Alternatively:
writer = csv.writer(open('outfile.csv', 'w', newline=''), lineterminator='\n')
I ran into the same problem; all I needed was:
writer = csv.writer(open('outfile.csv', 'w', newline=''), lineterminator='\n')
If you're using Python 2, then use:
with open('xxx.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(fields)
If you are using Python 3, make use of newline=''.
Please try:
with open('output_file_name.csv', 'a+', newline='') as f:
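Putting it together for Python 3, a minimal sketch (the file name and rows are placeholders):
import csv

rows = [["title", "post"], ["some title", "some post"]]  # placeholder data

with open('outfile.csv', 'w', newline='') as f:  # newline='' lets the csv module control line endings
    writer = csv.writer(f, delimiter=';')
    for row in rows:
        writer.writerow(row)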

Pipe delimiter file, but no pipe inside data

Problem
I need to re-format a text from comma (,) separated values to pipe (|) separated values. Pipe characters within the values of the original (comma separated) text shall be replaced by a space for representation in the (pipe separated) result text.
The pipe separated result text shall be written back to the same file from which the original comma separated text has been read.
I am using Python 2.6.
Possible Solution
I could read the file first, replace all pipes with spaces, and then replace every (,) with (|).
Is there a better way to achieve this?
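For reference, a minimal sketch of that plain string-replacement idea (the path is a placeholder); the csv-based answers below are more robust because they handle quoting for you:
# naive approach: treat the whole file as plain text
inputfile = 'InputFile.csv'  # placeholder path
with open(inputfile, 'r') as f:
    text = f.read()
text = text.replace('|', ' ').replace(',', '|')
with open(inputfile, 'w') as f:
    f.write(text)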
Don't reinvent the value-separated file parsing wheel. Use the csv module to do the parsing and the writing for you.
The csv module will add "..." quotes around values that contain the separator, so in principle you don't need to replace the | pipe symbols in the values. To replace the original file, write to a new (temporary) outputfile then move that back into place.
import csv
import os
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    writer.writerows(reader)
os.remove(inputfile)
os.rename(outputfile, inputfile)
For an input file containing:
foo,bar|baz,spam
this produces
foo|"bar|baz"|spam
Note that the middle column is wrapped in quotes.
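Those quotes are harmless as long as the file is read back with the csv module and the same delimiter; a small sketch, reusing the inputfile path from above:
import csv

with open(inputfile, 'rb') as f:  # Python 2 style, matching the answer above
    for row in csv.reader(f, delimiter='|'):
        print(row)  # ['foo', 'bar|baz', 'spam'] -- the quoted field comes back intact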
If you do need to replace the | characters in the values, you can do so as you copy the rows:
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    for row in reader:
        writer.writerow([col.replace('|', ' ') for col in row])
os.remove(inputfile)
os.rename(outputfile, inputfile)
Now the output for my example becomes:
foo|bar baz|spam
Sounds like you're trying to work with a variation of CSV - in that case, Python's CSV library might as well be what you need. You can use it with custom delimiters and it will auto-handle escaping for you (this example was yanked from the manual and modified):
import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|')
    spamwriter.writerow(['One', 'Two', 'Three'])
There are also ways to modify quoting and escaping and other options. Reading works similarly.
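For example, reading the same file back could look like this (a minimal sketch in the same Python 2 style):
import csv

with open('eggs.csv', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter='|')
    for row in spamreader:
        print(row)  # e.g. ['One', 'Two', 'Three']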
You can create a temporary file from the original that has the pipe characters replaced, and then replace the original file with it when the processing is done:
import csv
import tempfile
import os
filepath = 'C:/Path/InputFile.csv'
with open(filepath, 'rb') as fin:
    reader = csv.DictReader(fin)
    fout = tempfile.NamedTemporaryFile(dir=os.path.dirname(filepath),
                                       delete=False)
    temp_filepath = fout.name
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    # writer.writeheader()  # requires Python 2.7
    header = dict(zip(reader.fieldnames, reader.fieldnames))
    writer.writerow(header)
    for row in reader:
        for k, v in row.items():
            row[k] = v.replace('|', ' ')
        writer.writerow(row)
    fout.close()
os.remove(filepath)
os.rename(temp_filepath, filepath)

Using multiple re.sub() calls in one file with Python

I have a file with a large amount of random strings contained within it. There are certain patterns that I want to remove, so I decided to use regex to check for them. So far this code does exactly what I want it to:
#!/usr/bin/python
import csv
import re
import sys
import pdb
f = open('output.csv', 'w')
with open('retweet.csv', 'rb') as inputfile:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        f.write(re.sub(r'#\s\w+', ' ', row[0]))
        f.write("\n")
f.close()

f = open('output2.csv', 'w')
with open('output.csv', 'rb') as inputfile2:
    read2 = csv.reader(inputfile2, delimiter='\n')
    for row in read2:
        a = re.sub('[^a-zA-Z0-9]', ' ', row[0])
        b = str.split(a)
        c = "+".join(b)
        f.write("http://www.google.com/webhp#q=" + c + "&btnI\n")
f.close()
The problem is, I would like to avoid having to open and close a file as this can get messy if I need to check for more patterns. How can I perform multiple re.sub() calls on the same file and write it out to a new file with all substitutions?
Thanks for any help!
Apply all your substitutions in one go on the current line:
with open('retweet.csv', 'rb') as inputfile:
    read = csv.reader(inputfile, delimiter=',')
    for row in read:
        text = row[0]
        text = re.sub(r'#\s\w+', ' ', text)
        text = re.sub(another_expression, another_replacement, text)
        # etc.
        f.write(text + '\n')
Note that opening a file with csv.reader(..., delimiter='\n') sounds very much as if you are treating that file as a sequence of lines; you could just loop over the file directly:
with open('output.csv', 'rb') as inputfile2:
    for line in inputfile2:
        # ... apply the second round of substitutions to line here
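To generalise the idea of applying several substitutions in one pass, here is a minimal sketch (file names follow the question; the pattern list is illustrative):
import csv
import re

# illustrative (pattern, replacement) pairs; add as many as you need
substitutions = [
    (r'#\s\w+', ' '),
    (r'[^a-zA-Z0-9]', ' '),
]

with open('retweet.csv', 'rb') as inputfile, open('output.csv', 'w') as outfile:
    for row in csv.reader(inputfile, delimiter=','):
        text = row[0]
        for pattern, replacement in substitutions:
            text = re.sub(pattern, replacement, text)
        outfile.write(text + '\n')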
