CSV Writing to File Difficulties - python

I am supposed to add a specific label to my CSV file based on conditions. The CSV file has 10 columns; the third, fourth, and fifth columns are the ones that drive the conditions, and I add my label in the tenth column. I have code here which ended in an infinite loop:
import csv
import sys

w = open(sys.argv[1], 'w')
r = open(sys.argv[1], 'r')
reader = csv.reader(r)
writer = csv.writer(w)
for row in reader:
    if row[2] or row[3] or row[4] == '0':
        row[9] == 'Label'
    writer.writerow(row)
w.close()
r.close()
I do not know why it would end in an infinite loop.
EDIT: I made a mistake; my original infinite-loop program had this line:
w = open(sys.argv[1], 'a')
I changed 'a' to 'w', but that ended up erasing the entire CSV file. So now I have a different problem.

You have a problem here: if row[2] or row[3] or row[4] == '0': only compares row[4] to '0'; the other two are just truth-tested. You can use any to check whether several values equal the same thing. You have another problem here: row[9] == 'Label' is a comparison, not an assignment; use = to assign a value. I would also recommend using with open.
Additionally, you can't read and write the same CSV file at the same time, so you need to save your changes to a new CSV file. You can remove the original afterwards and rename the new one using os.remove and os.rename:
import csv
import sys
import os

with open('some_new_file.csv', 'w') as w, open(sys.argv[1], 'r') as r:
    reader, writer = csv.reader(r), csv.writer(w)
    for row in reader:
        if any(x == '0' for x in (row[2], row[3], row[4])):
            row[9] = 'Label'
        writer.writerow(row)
os.remove('{}'.format(sys.argv[1]))
os.rename('some_new_file.csv', '{}'.format(sys.argv[1]))

You can write to a tempfile.NamedTemporaryFile and just use in to test for the "0". You are matching a full string, not a substring, so you won't gain anything by using any: it builds a tuple of three elements anyway, so you may as well slice the row and test for membership. Then you just replace the original file with shutil.move:
import csv
import sys
from shutil import move
from tempfile import NamedTemporaryFile

with NamedTemporaryFile("w", dir=".", delete=False) as w, open(sys.argv[1]) as r:
    reader, writer = csv.reader(r), csv.writer(w)
    writer.writerows(row[:-1] + ['Label'] if "0" in row[2:5] else row
                     for row in reader)
move(w.name, sys.argv[1])
sys.argv[1] is already your file name and a string, so that is all you need to pass; there is no need for '{}'.format(sys.argv[1]).

I think the problem is in these lines:
w = open(sys.argv[1], 'w')
r = open(sys.argv[1], 'r')
You are opening the same file for reading and writing, so try using a different file name for the output.
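A minimal sketch of that approach, wrapped in a function so the names are explicit (add_label and the file paths are just illustrative, not from the original post):

```python
import csv

def add_label(in_path, out_path):
    """Copy in_path to out_path, labeling rows where any of
    columns 3-5 equals '0'."""
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # = assigns; == would only compare and discard the result
            if any(x == '0' for x in row[2:5]):
                row[9] = 'Label'
            writer.writerow(row)
```

Because the output is a different file, the reader never sees the rows being written, which is what caused the original infinite loop in append mode.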

Related

How do I delete a row from my csv file that fits my condition and write to a new output file

For example, I have the following dataset:
Date,Category,Amount,Description
06-06-2022,Food,300.0,celebration
02-09-2021,transport,3300.0,operation
I am required to delete one of the entries when condition 1 (e.g. description) and condition 2 (e.g. category) are met, to get the following:
Date,Category,Amount,Description
06-06-2022,Food,300.0,celebration
So far, I have no issues with the condition statements. However, when I execute, the output has the first line split with commas between each individual character and no other info.
import os
import csv

with open("filename.csv", "r+") as r, open("output.csv", "w") as f:
    writer = csv.writer(f)
    for line in r:
        if condition_1 in line and condition_2 in line:
            print(line)  # show the line that will be deleted
        else:
            writer.writerow(line)  # write other lines to new file
os.remove('filename.csv')
os.rename('output.csv', "filename.csv")
Any help or tips would be appreciated!
csv is a useful built-in standard library. As for what went wrong: you forgot to include csv.reader. Simply pass the file object to csv.reader, similar to what you've done with csv.writer.
This is a working solution that you might want:
# import os
import csv

with open("filename.csv", "r+") as r, open("output.csv", "w") as f:
    # pass the file object to reader() to get the reader object
    reader = csv.reader(r)
    writer = csv.writer(f)
    # iterate over each row in the csv using the reader object
    for row in reader:
        # row is a list that represents one row of the csv
        # print(row)
        # print row as the original text line:
        # print(', '.join(row))
        if row[1] == 'transport' and row[3] == 'operation':
            print(f'{row} to be deleted')  # show the row that will be deleted
        else:
            writer.writerow(row)
# os.remove('filename.csv')
# os.rename('output.csv', "filename.csv")
Try it here
https://replit.com/#huydhoang/Delete-row-from-csv#main.py
You already have a string of comma-separated values in your line variable, so you don't need the csv writer.
Instead, you could open the file in append mode, open('output.csv', 'a'), and use f.write(line).
EDIT:
full code (working)
import os

with open("filename.csv", "r") as r, open("output.csv", "a") as f:
    for line in r:
        if 'transport' in line and '3300.0' in line:
            print(line)  # show the line that will be deleted
        else:
            f.write(line)  # write other lines to new file
os.remove('filename.csv')
os.rename('output.csv', "filename.csv")

Reading data from one CSV and displaying parsed data on to another CSV file

I am very new to Python. I am trying to read a CSV file and write the result to another CSV file. What I want to do is write only selected rows from the input file to the output file. Below is the code I wrote so far; it reads every single row from the input file, 1.csv, and writes it to the output file, out.csv.
How can I tweak this code so that the output file contains only rows which start with READ in column 8 and are not equal to 0000 in column 10? Both of these conditions need to be met. Also, this block of code handles a single CSV file. Can anyone please tell me how I can do it for, say, 10000 CSV files? Finally, when I execute the code I see blank lines between rows in my output csv. How can I remove those?
import csv

f1 = open("1.csv", "r")
reader = csv.reader(f1)
header = reader.next()
f2 = open("out.csv", "w")
writer = csv.writer(f2)
writer.writerow(header)
for row in reader:
    writer.writerow(row)
f1.close()
f2.close()
Something like:
import os
import csv
import glob

class CSVReadWriter(object):
    def munge(self, filename, suffix):
        name, ext = os.path.splitext(filename)
        return '{0}{1}{2}'.format(name, suffix, ext)

    def is_valid(self, row):
        return row[8].startswith('READ') and row[10] != '0000'

    def filter_csv(self, fin, fout):
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        writer.writerow(reader.next())  # header
        for row in reader:
            if self.is_valid(row):
                writer.writerow(row)

    def read_write(self, iname, suffix):
        with open(iname, 'rb') as fin:
            oname = self.munge(iname, suffix)
            with open(oname, 'wb') as fout:
                self.filter_csv(fin, fout)

work_directory = r"C:\Temp\Data"
for filename in glob.glob(os.path.join(work_directory, '*.csv')):
    csvrw = CSVReadWriter()
    csvrw.read_write(filename, '_out')
I've made it a class so that you can override the munge and is_valid methods to suit different cases. Being a class also means that you can store state better, for example if you wanted to output lines between certain criteria.
The extra spaces between lines that you mention are to do with \r\n carriage-return and line-feed line endings. Opening the output with 'wb' (binary mode, on Python 2) should resolve it.
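On Python 3 the equivalent fix is newline='' rather than binary mode. A small self-contained sketch (the file names are just examples; the sample data is made up):

```python
import csv

# Create a small sample input (illustrative only)
with open("in.csv", "w", newline='') as f:
    f.write("h1,h2\na,b\nc,d\n")

# On Python 3, open CSV files with newline='' so the csv module
# controls line endings itself; without it, Windows produces \r\r\n,
# which readers display as a blank line between rows.
with open("in.csv", newline='') as fin, \
     open("out.csv", "w", newline='') as fout:
    writer = csv.writer(fout)
    for row in csv.reader(fin):
        writer.writerow(row)
```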

Writing output to csv file [in correct format]

I realize this question has been asked a million times and there is a lot of documentation on it. However, I am unable to output the results in the correct format.
The below code was adopted from: Replacing empty csv column values with a zero
# Save below script as RepEmptyCells.py
# Add #!/usr/bin/python to script
# Make executable by chmod +x prior to running the script on desired .csv file
# Below code will look through your .csv file and replace empty cells with 0s
# This can be particularly useful for genetic distance matrices
import csv
import sys

reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    print(','.join(str(x) for x in row))
Currently, to get the correct output .csv file [i.e. in correct format] one can run the following command in bash:
#After making the script executable
./RepEmptyCells.py input.csv > output.csv # this produces the correct output
I've tried to use csv.writer function to produce the correctly formatted output.csv file (similar to ./RepEmptyCells.py input.csv > output.csv) without much luck.
I'd like to learn how to add this last part to the code to automate the process without having to do it in bash.
What I have tried:
import csv
import sys

f = open('output2.csv', 'w')
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    f.write(','.join(str(x) for x in row))
f.close()
When looking at the raw files from this code and the one before, they look the same.
However, when I open them in either Excel or iNumbers, the latter (i.e. output2.csv) shows only a single row of the data.
It's important that both output.csv and output2.csv can be opened in Excel.
2 options:
1. Just do a f.write('\n') after your current f.write statement.
2. Use csv.writer. You mention it, but it isn't in your code:
writer = csv.writer(f)
...
writer.writerow([int(x) for x in row])  # note the difference in parameter format
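A sketch of option 2 put together, written for Python 3 (so newline='' replaces the old 'rb' mode; fill_empty and the file names are just illustrative):

```python
import csv

def fill_empty(in_path, out_path):
    """Copy a CSV file, replacing empty cells with 0."""
    with open(in_path, newline='') as src, \
         open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # writerow adds the row terminator itself, so no
            # manual '\n' is needed and Excel sees separate rows.
            writer.writerow([x if x else 0 for x in row])
```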
A humble proposition
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys

# Use the with statement to properly close the files
# Use newline='', which is the right option for Python 3.x
with open(sys.argv[1], 'r', newline='') as fin, open(sys.argv[2], 'w', newline='') as fout:
    reader = csv.reader(fin)
    # You may need to redefine the dialect for versions of Excel that
    # split cells on semicolons (for _Comma_ Separated Values, yes...)
    writer = csv.writer(fout, dialect="excel")
    for row in reader:
        # Write as we read; let the OS do the caching alone.
        # Process the data as it comes, in a generator, checking every
        # cell in the row. If a cell is empty, the `or` returns "0".
        # Keep strings all the time: converting to int would fail on
        # non-numeric cells, the writer would convert it back to str
        # anyway, and Excel doesn't make any difference when loading.
        writer.writerow(cell or "0" for cell in row)
Sample in.csv
1,2,3,,4,5,6,
7,,8,,9,,10
Output out.csv
1,2,3,0,4,5,6,0
7,0,8,0,9,0,10
import csv
import sys

with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        # row is a list, so fix each cell rather than calling replace on it
        print ','.join(cell or '0' for cell in row)
and I don't understand your need for using the shell and redirecting.
a csv writer is just:
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

re.sub for a csv file

I am receiving an error on this code: "TypeError: expected string or buffer". I looked around and found out that the error occurs because I am passing re.sub a list, and it does not take lists. However, I wasn't able to figure out how to turn the line from the csv file into something it would read.
I am trying to change all the periods in a csv file into commas. Here is my code:
import csv
import re

in_file = open("/test.csv", "rb")
reader = csv.reader(in_file)
out_file = open("/out.csv", "wb")
writer = csv.writer(out_file)
for row in reader:
    newrow = re.sub(r"(\.)+", ",", row)
    writer.writerow(newrow)
in_file.close()
out_file.close()
I'm sorry if this has already been answered somewhere. There was certainly a lot of answers regarding this error, but I couldn't make any of them work with my csv file. Also, as a side note, this was originally an .xslb excel file that I converted into csv in order to be able to work with it. Was that necessary?
You could use a list comprehension to apply your substitution to each item in row:
for row in reader:
    newrow = [re.sub(r"(\.)+", ",", item) for item in row]
    writer.writerow(newrow)
for row in reader does not return a single element to parse; it returns the list of elements in that row, so you have to unpack that list and parse each item individually, just like @Trii showed you:
[re.sub(r'(\.)+', ',', s) for s in row]
In this case, we are using glob to access all the csv files in the directory.
The code below overwrites the source csv file, so there is no need to create an output file.
NOTE:
If you want to get a second file with the substitutions applied instead, replace write = open(i, 'w') with write = open('secondFile.csv', 'w').
import re
import glob

for i in glob.glob("*.csv"):
    read = open(i, 'r')
    contents = read.read()
    csvRe = re.sub(r"(\.)+", ",", contents)
    write = open(i, 'w')
    write.write(csvRe)
    read.close()
    write.close()

Python reading files in a directory

I have a .csv with 3000 rows of data in 2 columns, like this:
uc007ayl.1 ENSMUSG00000041439
uc009mkn.1 ENSMUSG00000031708
uc009mkn.1 ENSMUSG00000035491
In another folder I have graphs with names like this:
uc007csg.1_nt_counts.txt
uc007gjg.1_nt_counts.txt
You should notice those graphs have names in the same format as my 1st column.
I am trying to use Python to identify the rows that have a graph and print the name in the 2nd column to a new .txt file.
This is the code I have:
import csv

with open("C:/*my dir*/UCSC to Ensembl.csv", "r") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print row[0]
But this is as far as I can get, and I am stuck.
You're almost there:
import csv
import os.path

with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        graph_filename = os.path.join("C:/folder", row[0] + "_nt_counts.txt")
        if os.path.exists(graph_filename):
            print(row[1])
Note that the repeated calls to os.path.exists may slow down the process, especially if the directory lies on a remote filesystem and does not contain significantly more files than the number of lines in the CSV file. You may want to use os.listdir instead:
import csv
import os

graphs = set(os.listdir("C:/graph folder"))
with open("C:/*my dir*/UCSC to Ensembl.csv", "rb") as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        if row[0] + "_nt_counts.txt" in graphs:
            print(row[1])
First, try to see if print row[0] really gives the correct file identifier.
Second, concatenate the path to the files with row[0] and check whether this full path exists (that is, whether the file exists) with os.path.exists(path) (see http://docs.python.org/library/os.path.html#os.path.exists ).
If it exists, you can write row[1] (the second column) to a new file with f2.write("%s\n" % row[1]) (first you have to open f2 for writing, of course).
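A sketch of those steps put together (written for Python 3; the function name and paths are just illustrative):

```python
import csv
import os.path

def matching_ids(csv_path, graph_dir, out_path):
    """Write the 2nd column of rows whose 1st column has a graph file."""
    with open(csv_path, newline='') as f, open(out_path, 'w') as f2:
        for row in csv.reader(f):
            # build the expected graph file name from the 1st column
            graph = os.path.join(graph_dir, row[0] + "_nt_counts.txt")
            if os.path.exists(graph):
                f2.write("%s\n" % row[1])
```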
Well, the next step would be to check whether the file exists. There are a few ways, but I like the EAFP approach:
try:
    with open(os.path.join(the_dir, row[0])) as f:
        pass
except IOError:
    print 'Oops, no file'
the_dir is the directory where the files are.
result = open('result.txt', 'w')
for line in open('C:/*my dir*/UCSC to Ensembl.csv', 'r'):
    line = line.split(',')
    try:
        open('/path/to/dir/' + line[0] + '_nt_counts.txt', 'r')
    except:
        continue
    else:
        result.write(line[1] + '\n')
result.close()
import csv
import os

# get prefixes of all graphs in the other directory
suff = '_nt_counts.txt'
graphs = set(fn[:-len(suff)] for fn in os.listdir('another dir') if fn.endswith(suff))
with open(r'c:\path to\file.csv', 'rb') as f:
    # extract the 2nd column if the 1st one is a known graph prefix
    names = (row[1] for row in csv.reader(f, delimiter='\t') if row[0] in graphs)
    # write one name per line
    with open('output.txt', 'w') as output_file:
        for name in names:
            print >>output_file, name
