I am attempting to remove special characters from a specific column in my CSV file, but I can't figure out how to specify the column I want to change. Here is what I have:
import csv
input_file = open('src/list.csv', 'r')
output_file = open('src/list_new.csv', 'w')
data = csv.reader(input_file)
writer = csv.writer(output_file, quoting=csv.QUOTE_ALL) # dialect='excel')
specials = '#'
for line in data:
    line = str(line)
    new_line = str.replace(line, specials, '')
    writer.writerow(new_line.split(','))
input_file.close()
output_file.close()
Instead of searching through the whole file, how can I specify the column ("Names") that I want to remove the special characters from?
Maybe use csv.DictReader? Then you can refer to the column by name.
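A minimal sketch of the DictReader idea, assuming the file has a header row containing "Names". The helper name strip_chars_in_column and the in-memory sample data are made up for illustration; with real files you would pass open('src/list.csv', newline='') and open('src/list_new.csv', 'w', newline='') instead of the StringIO objects.

```python
import csv
import io

def strip_chars_in_column(rows, column, chars="#"):
    """Yield copies of the dict rows with every character in `chars` removed from one column."""
    table = str.maketrans("", "", chars)
    for row in rows:
        cleaned = dict(row)
        cleaned[column] = cleaned[column].translate(table)
        yield cleaned

# Hypothetical sample data standing in for src/list.csv.
sample = io.StringIO("Names,Notes\n#Al#ice,keep #this\nBo#b,and #that\n")
reader = csv.DictReader(sample)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                        quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writeheader()
writer.writerows(strip_chars_in_column(reader, "Names"))
print(out.getvalue())
```

Only the "Names" column is touched; the '#' characters in the other column are left alone.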
I'm trying to write code that checks whether a CSV file has more than 3 lines and, if so, deletes its first line.
I want to delete the first line from the existing file instead of saving the result as a new file.
To do this, I had to delete the existing file and create a file with the same name, but
only one line is saved and the commas disappear.
I'm using a pandas DataFrame, but if it isn't needed, I'd rather not use it.
The function names might be odd because I'm a beginner.
Thanks.
file = open("./csv/selllist.csv", encoding="ANSI")
reader = csv.reader(file)
lines= len(list(reader))
if lines > 3:
    df = pd.read_csv('./csv/selllist.csv', 'r+', encoding="ANSI")
    dfa = df.iloc[1:]
    print(dfa)
    with open("./csv/selllist.csv", 'r+', encoding="ANSI") as x:
        x.truncate(0)
    with open('./csv/selllist.csv', 'a', encoding="ANSI", newline='') as fo:
        # Pass the CSV file object to the writer() function
        wo = writer(fo)
        # Result - a writer object
        # Pass the data in the list as an argument into the writerow() function
        wo.writerow(dfa)
        # Close the file object
        fo.close()
    print()
This is the type of csv file I deal with
string, string, string, string, string
string, string, string, string, string
string, string, string, string, string
string, string, string, string, string
Take a 2-step approach.
Open the file for reading and count the number of lines. If there are more than 3 lines, re-open the file (for writing) and update it.
For example:
lines = []
with open('./csv/selllist.csv') as csv:
    lines = csv.readlines()
if len(lines) > 3:
    with open('./csv/selllist.csv', 'w') as csv:
        for line in lines[1:]: # skip first line
            csv.write(line)
With pandas, you can just specify header=None while reading and writing:
import pandas as pd
if lines > 3:
    df = pd.read_csv("data.csv", header=None)
    df.iloc[1:].to_csv("data.csv", header=None, index=None)
With the csv module:
import csv
with open("data.csv") as infile:
    reader = csv.reader(infile)
    lines = list(reader)
if len(lines)>3:
    with open("data.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile, delimiter=",")
        writer.writerows(lines[1:])
With one open call and using seek and truncate.
Setup
out = """\
Look at me
I'm a file
for sure
4th line, woot!"""
with open('filepath.csv', 'w') as fh:
    fh.write(out)
Solution
I'm aiming to minimize the stuff I'm doing. I'll only open one file and only one time. I'll only split one time.
with open('filepath.csv', 'r+') as csv:
    top, rest = csv.read().split('\n', 1)  # Only necessary to pop off first line
    if rest.count('\n') > 1:  # If 4 or more lines, there will be at
                              # least two more newline characters
        csv.seek(0)  # Once we're done reading, we need to
                     # go back to beginning of file
        csv.truncate()  # truncate to reduce size of file as well
        csv.write(rest)
I have the following data:
Graudo. A selection of Pouteria caimito, a minor member...
TtuNextrecod. A selection of Pouteria caimito, a minor member of the Sapotaceae...
I want to split it into two columns:
Column1        Column2
------------------------------------------------------------------------------
Graudo         A selection of Pouteria caimito, a minor member...
TtuNextrecod   A selection of Pouteria caimito, a minor member of the Sapotaceae...
I need help with the code. Thanks.
import csv # convert
import itertools # functions for efficient looping
with open('Abiutxt.txt', 'r') as in_file:
    lines = in_file.read().splitlines() # returns a list of all the lines as strings, without the line breaks
    test = [line.split('. ')for line in lines ] #split period....but...need work
    print(test)
    stripped = [line.replace('', '').split('. ')for line in lines ]
    grouped = itertools.izip(*[stripped]*1)
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Column1', 'Column2'))
        for group in grouped:
            writer.writerows(group)
I am not sure you need zipping here at all. Simply iterate over every line of the input file, skip empty lines, split by the period and write to the csv file:
import csv
with open('Abiutxt.txt', 'r') as in_file:
    with open('logtestAbiutxt.csv', 'w') as out_file:
        writer = csv.writer(out_file, delimiter="\t")
        writer.writerow(['Column1', 'Column2'])
        for line in in_file:
            if not line.strip():
                continue
            writer.writerow(line.strip().split(". ", 1))
Notes:
I specified a tab as the delimiter, but you can change it as appropriate.
Thanks to @PatrickHaugh for the idea to split on the first occurrence of ". " only, as your second column may contain periods as well.
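To see why the second argument to split matters, here is a small sketch. The sample line is made up for illustration (it is not from the asker's file) and deliberately has a second ". " inside the description.

```python
# Hypothetical line whose description itself contains ". "
line = "Graudo. A selection of Pouteria caimito. A minor member"

# Without a limit, every ". " is a split point, giving three fields.
all_parts = line.split(". ")

# With maxsplit=1, only the first ". " is used, giving exactly two fields.
two_parts = line.split(". ", 1)

print(all_parts)
print(two_parts)
```

With the limit in place, each row written to the output always has a name and a single description field, however many periods the description contains.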
This should get you what you want. This will handle all the escaping.
import csv
with open('Abiutxt.txt', 'r') as in_file:
    x = in_file.read().splitlines()
    x = [line.split('. ', 1) for line in x if line]
with open('logtestAbiutxt.csv', "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['Column1', 'Column2'])
    writer.writerows(x)
Currently, I take in a CSV file that uses a custom delimiter, "|". I then read it in and modify it using the code below:
import csv
ChangedDate = '2018-10-31'
firstfile = open('example.csv',"r")
firstReader = csv.reader(firstfile, delimiter='|')
firstData = list(firstReader)
outputFile = open("output.csv","w")
iteration = 0
for row in firstData:
    firstData[iteration][25] = ChangedDate
    iteration+=1
outputwriter = csv.writer(open("output.csv","w"))
outputwriter.writerows(firstData)
outputFile.close()
However, when I write the rows to my output file, they are comma-separated. This is a problem because I am dealing with large financial data in which commas appear naturally, such as $8,000.00, hence the "|" delimiter in the original file. Is there a way to "re-delimit" my list before I write it to an output file?
You can provide the delimiter to the csv.writer:
with open("output.csv", "w") as f:
    outputwriter = csv.writer(f, delimiter='|')
    outputwriter.writerows(firstData)
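For a fuller picture, here is a rough sketch of the whole round trip: read with delimiter='|', change one column, write with delimiter='|'. The helper set_column and the three-column sample data are invented for illustration (the real file evidently has at least 26 columns), and StringIO stands in for the real files.

```python
import csv
import io

def set_column(rows, index, value):
    """Return copies of the rows with the field at `index` replaced by `value`."""
    updated = []
    for row in rows:
        row = list(row)
        row[index] = value
        updated.append(row)
    return updated

# Hypothetical three-column sample; commas inside the amounts are plain data here.
source = io.StringIO("ACME|$8,000.00|2018-01-01\nInitech|$12,500.50|2018-02-01\n")
rows = list(csv.reader(source, delimiter="|"))

target = io.StringIO()
csv.writer(target, delimiter="|", lineterminator="\n").writerows(
    set_column(rows, 2, "2018-10-31")
)
print(target.getvalue())
```

Because the writer uses "|" as its delimiter, the commas in the amounts need no quoting and pass through unchanged.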
Problem
I need to re-format a text from comma (,) separated values to pipe (|) separated values. Pipe characters within the values of the original (comma separated) text shall be replaced by a space for representation in the (pipe separated) result text.
The pipe separated result text shall be written back to the same file from which the original comma separated text has been read.
I am using Python 2.6.
Possible Solution
I could read the file first, replace every pipe in it with a space, and then replace each comma (,) with a pipe (|).
Is there a better way to achieve this?
Don't reinvent the value-separated file parsing wheel. Use the csv module to do the parsing and the writing for you.
The csv module will add "..." quotes around values that contain the separator, so in principle you don't need to replace the | pipe symbols in the values. To replace the original file, write to a new (temporary) outputfile then move that back into place.
import csv
import os
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    writer.writerows(reader)
os.remove(inputfile)
os.rename(outputfile, inputfile)
For an input file containing:
foo,bar|baz,spam
this produces
foo|"bar|baz"|spam
Note that the middle column is wrapped in quotes.
If you do need to replace the | characters in the values, you can do so as you copy the rows:
outputfile = inputfile + '.tmp'
with open(inputfile, 'rb') as inf, open(outputfile, 'wb') as outf:
    reader = csv.reader(inf)
    writer = csv.writer(outf, delimiter='|')
    for row in reader:
        writer.writerow([col.replace('|', ' ') for col in row])
os.remove(inputfile)
os.rename(outputfile, inputfile)
Now the output for my example becomes:
foo|bar baz|spam
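The code above targets Python 2 (binary file modes). For readers on Python 3, the same write-to-a-temporary-file-then-swap idea might look roughly like this. It is only a sketch under that assumption: the function name commas_to_pipes is made up, and os.replace (Python 3.3+) is swapped in for the os.remove/os.rename pair.

```python
import csv
import os
import tempfile

def commas_to_pipes(path):
    """Rewrite the comma-separated file at `path` in place as pipe-separated text.

    Pipes inside values are replaced by spaces, as the question asks.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap
    # stays on one filesystem.
    with open(path, newline="") as inf, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False
    ) as outf:
        writer = csv.writer(outf, delimiter="|", lineterminator="\n")
        for row in csv.reader(inf):
            writer.writerow([col.replace("|", " ") for col in row])
    # Both files are closed here; move the temporary file over the original.
    os.replace(outf.name, path)
```

Calling commas_to_pipes on a file containing the example line foo,bar|baz,spam should leave it containing foo|bar baz|spam.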
Sounds like you're trying to work with a variation of CSV. In that case, Python's csv library might well be what you need. You can use it with custom delimiters and it will handle escaping for you automatically (this example was yanked from the manual and modified):
import csv
with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter='|')
    spamwriter.writerow(['One', 'Two', 'Three'])
There are also ways to modify quoting and escaping and other options. Reading works similarly.
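As a small illustration of the quoting and reading side, here is a sketch in Python 3 syntax using an in-memory buffer rather than a real file. The sample values are made up; csv.QUOTE_MINIMAL is the module's default and is spelled out only to show where the option goes.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="|", quoting=csv.QUOTE_MINIMAL,
                    lineterminator="\n")
# The middle value contains the delimiter, so the writer quotes it.
writer.writerow(["One", "Two|2", "Three"])
print(buffer.getvalue())

# Reading back with the same delimiter undoes the quoting.
buffer.seek(0)
rows = list(csv.reader(buffer, delimiter="|"))
print(rows)
```

Passing quoting=csv.QUOTE_ALL instead would wrap every field in quotes, not just the ones that need it.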
You can create a temporary file from the original that has the pipe characters replaced, and then replace the original file with it when the processing is done:
import csv
import tempfile
import os
filepath = 'C:/Path/InputFile.csv'
with open(filepath, 'rb') as fin:
    reader = csv.DictReader(fin)
    fout = tempfile.NamedTemporaryFile(dir=os.path.dirname(filepath),
                                       delete=False)
    temp_filepath = fout.name
    writer = csv.DictWriter(fout, reader.fieldnames, delimiter='|')
    # writer.writeheader() # requires Python 2.7
    header = dict(zip(reader.fieldnames, reader.fieldnames))
    writer.writerow(header)
    for row in reader:
        for k,v in row.items():
            row[k] = v.replace('|', ' ')
        writer.writerow(row)
    fout.close()
os.remove(filepath)
os.rename(temp_filepath, filepath)
In Python, I'm opening a CSV file that looks like this:
jamie,london,uk,600087
matt,paris,fr,80092
john,newyork,ny,80071
How do I enclose each value in quotes so that the CSV file looks like this:
"jamie","london","uk","600087"
etc...
What I have right now is just the basic stuff:
filename = "data.csv"
file = open(filename, "r")
Not sure what I would do next.
If you are just trying to convert the file, use the QUOTE_ALL constant from the csv module, like this:
import csv
with open('data.csv') as input, open('out.csv','w') as output:
    reader = csv.reader(input)
    writer = csv.writer(output, delimiter=',', quoting=csv.QUOTE_ALL)
    for line in reader:
        writer.writerow(line)