I have a CSV file with hundreds of rows, and I would like to select and export every 3 rows to a new CSV file, with each output file named after the first row of its selection.
For example, in the following CSV file:
1980 10 12
1 2 3 4 5 6 7
4 6 8 1 0 8 6
1981 10 12
2 4 9 7 5 4 1
8 9 3 8 3 7 3
I would like to select the first 3 rows and export them to a new CSV named "1980 10 12" (taken from the first row), then select the next 3 rows and export them to a new CSV named "1981 10 12" (taken from the first of those rows), and so on. I would like to do this using Python.
Using the csv module, plus itertools.islice() to select 3 rows each time:
import csv
import os.path
from itertools import islice
# inputfilename and directory are placeholders for your input path and output folder
with open(inputfilename, 'rb') as infh:
    reader = csv.reader(infh)
    for row in reader:
        filename = row[0].replace(' ', '_') + '.csv'
        filename = os.path.join(directory, filename)
        with open(filename, 'wb') as outfh:
            writer = csv.writer(outfh)
            writer.writerow(row)  # the row that names the file
            writer.writerows(islice(reader, 2))  # the next 2 rows of the block
The writer.writerows(islice(reader, 2)) line takes the next 2 rows from the reader and copies them to the output CSV, right after the current row (the one with the date) has been written. Because islice() consumes those rows from the same reader, the for loop then continues with the first row of the next block.
You may need to adjust the delimiter argument for the csv.reader() and csv.writer() objects; the default is a comma, but you didn't specify the exact format, so perhaps you need to set it to a tab ('\t') instead.
If you are using Python 3, open the files in 'r' and 'w' text mode and set newline='' for both: open(inputfilename, 'r', newline='') and open(filename, 'w', newline='').
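Putting the Python 3 notes together, the islice() approach might look roughly like the sketch below. The function name split_every_three is hypothetical, the default comma delimiter is assumed, and blank lines are skipped so that row[0] does not fail on them.

```python
import csv
import os.path
from itertools import islice


def split_every_three(inputfilename, directory):
    """Write each block of 3 rows to <directory>/<first field of block>.csv.

    Hypothetical helper: assumes every block starts with a row whose first
    field (e.g. "1980 10 12") can be turned into a file name.
    """
    written = []
    with open(inputfilename, 'r', newline='') as infh:
        reader = csv.reader(infh)
        for row in reader:
            if not row:
                continue  # blank lines come through as []; skip them
            filename = row[0].replace(' ', '_') + '.csv'
            filename = os.path.join(directory, filename)
            with open(filename, 'w', newline='') as outfh:
                writer = csv.writer(outfh)
                writer.writerow(row)  # the row that names the file
                # islice pulls the next 2 rows from the same reader, so the
                # outer for loop resumes at the start of the next block
                writer.writerows(islice(reader, 2))
            written.append(filename)
    return written
```

On the sample data, calling split_every_three('in.csv', 'out') with an existing out folder would write out/1980_10_12.csv and out/1981_10_12.csv.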
import csv
with open("in.csv", newline="") as f:
    reader = csv.reader(f)
    chunks = []
    for ind, row in enumerate(reader, 1):
        chunks.append(row)
        if ind % 3 == 0:  # if we have three new rows, create a file using the first row as the name
            with open("{}.csv".format(chunks[0][0].strip()), "w", newline="") as f1:
                wr = csv.writer(f1)
                wr.writerows(chunks)  # write all rows
            chunks = []  # reset chunks to an empty list
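One caveat with the chunks approach: if the number of rows is not a multiple of 3, the last partial group is never written. A minimal sketch that also flushes that trailing group follows; write_chunks and its parameters are hypothetical, and it takes any iterable of rows (for example a csv.reader) so it can be tried without an input file.

```python
import csv
import os.path


def write_chunks(rows, size=3, directory='.'):
    """Write every `size` rows to <directory>/<first field of the group>.csv.

    Hypothetical helper; unlike the loop above, a final group with fewer
    than `size` rows is still written instead of being dropped.
    """
    names = []

    def flush(chunk):
        # the first field of the group's first row names the output file
        name = os.path.join(directory, '{}.csv'.format(chunk[0][0].strip()))
        with open(name, 'w', newline='') as f1:
            csv.writer(f1).writerows(chunk)
        names.append(name)

    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == size:  # a full group of `size` rows is ready
            flush(chunk)
            chunk = []
    if chunk:  # leftover rows when the total is not a multiple of `size`
        flush(chunk)
    return names
```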
Using slight iterator trickery:
with open('in.csv', 'r') as infh:
    for block in zip(*[infh]*3):
        filename = block[0].strip() + '.csv'
        with open(filename, 'w') as outfh:
            outfh.writelines(block)
On Python 2.x you would use itertools.izip instead. The docs actually mention izip(*[iter(s)]*n) as an idiom for clustering a data series into n-length groups.
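The trick works because the list [infh]*3 holds the same iterator three times, so each tuple that zip() builds takes three consecutive lines. The sketch below (the function names clustered and clustered_padded are hypothetical) shows the idiom on a plain range in Python 3, where a final short group is silently dropped, and itertools.zip_longest() as one way to keep it, padded, instead.

```python
from itertools import zip_longest


def clustered(iterable, n):
    """Group items into n-tuples; a short final group is dropped."""
    it = iter(iterable)
    # the same iterator object appears n times, so zip pulls n consecutive items per tuple
    return list(zip(*[it] * n))


def clustered_padded(iterable, n, fillvalue=None):
    """Same idea, but zip_longest pads the final group instead of dropping it."""
    it = iter(iterable)
    return list(zip_longest(*[it] * n, fillvalue=fillvalue))


print(clustered(range(7), 3))         # [(0, 1, 2), (3, 4, 5)]
print(clustered_padded(range(7), 3))  # [(0, 1, 2), (3, 4, 5), (6, None, None)]
```

A file object is already its own iterator, which is why the answer can pass infh directly instead of iter(infh).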
I have a TSV file and I'd like to delete the rows whose third value is not c:
1 7 c
5 2 q
4 5 a
5 0 c
So I'd like it to look like this:
1 7 c
5 0 c
So far I know how to separate the values; I just don't know how to delete a row based on the third tab-separated value:
for i, line in enumerate(read_tsv):
    first = read_tsv[i][0]
    second = read_tsv[i][1]
    letter = read_tsv[i][2]
    if i == 2:
You can open the file, read it and filter out the unwanted rows, then open it again in write mode and write the remaining data back:
import csv
with open('filename.tsv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter='\t')
    data = [row for row in reader if row[2] == 'c']  # keep only rows whose third field is 'c'
with open('filename.tsv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(data)
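To try the filter without touching a file, here is a minimal sketch that runs the same row[2] == 'c' test on an in-memory string. The function name keep_rows is hypothetical, and the extra len(row) > 2 check is an addition that skips blank or short lines, which would otherwise raise IndexError.

```python
import csv
import io


def keep_rows(tsv_text, letter='c'):
    """Return the TSV text with only the rows whose third field equals `letter`."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter='\t')
    # len(row) > 2 skips blank or short lines instead of raising IndexError
    data = [row for row in reader if len(row) > 2 and row[2] == letter]
    out = io.StringIO()
    # lineterminator='\n' keeps plain Unix line endings in the result
    csv.writer(out, delimiter='\t', lineterminator='\n').writerows(data)
    return out.getvalue()


result = keep_rows('1\t7\tc\n5\t2\tq\n4\t5\ta\n5\t0\tc\n')
print(repr(result))  # '1\t7\tc\n5\t0\tc\n'
```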
I have the CSV file below. I read it and store each column in a separate variable (Name, Sub, Idx, Isd):
Name Sub Idx Isd
AAB YAH 2 7
AB VF 5
YHJ 3 4
YAH HJY 25 23
Now I want to store them in tabular form (like the original data in the CSV), but only the rows in which every cell has data (no empty cells).
My final output:
Name Sub Idx Isd
AAB YAH 2 7
YAH HJY 25 23
I use the following code:
import csv
from collections import defaultdict
columns = defaultdict(list)
with open('inputCSV.csv', 'r') as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k, v) in row.items():
            columns[k].append(v)
name = columns['Name']
Sub = columns['Sub']
Idx = columns['Idx']
Isd = columns['Isd']
This would probably be easier to do just using a standard csv.reader() rather than a dictionary reader:
import csv
with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    header = next(csv_input)
    csv_output.writerow(header)  # copy the header to the output
    for row in csv_input:
        if len(row) and '' not in row:
            csv_output.writerow(row)
This first copies the header from the input CSV to the output CSV. Then for each row, it first checks to make sure it is not an empty row. It then tests the whole row to see if there are any empty values. If there are none, it writes the whole row to the output CSV file.
Giving you an output.csv containing:
Name,Sub,Idx,Isd
AAB,YAH,2,7
YAH,HJY,25,23
Note: this assumes you are using Python 2.x. If you are using Python 3.x, replace the open() line with:
with open('input.csv', newline='') as f_input, open('output.csv', 'w', newline='') as f_output:
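If you would rather keep the csv.DictReader from the question, a minimal sketch is to test all(row.values()) for each row. When a line has fewer fields than the header, DictReader fills the missing keys with None (its restval default), and both None and '' are falsy, so all() rejects those rows too. The helper name complete_rows is hypothetical, and the sample assumes where the empty cells sit, since the flattened table in the question does not show it.

```python
import csv
import io


def complete_rows(csv_text):
    """Return the header and only those rows in which every cell is non-empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # missing trailing fields become None, empty fields are '', both fail all()
    rows = [row for row in reader if all(row.values())]
    return reader.fieldnames, rows


# assumed layout: AB/VF/5 is missing Isd, and YHJ/3/4 is missing Name
sample = 'Name,Sub,Idx,Isd\nAAB,YAH,2,7\nAB,VF,5,\n,YHJ,3,4\nYAH,HJY,25,23\n'
header, rows = complete_rows(sample)
print(header)                     # ['Name', 'Sub', 'Idx', 'Isd']
print([r['Name'] for r in rows])  # ['AAB', 'YAH']
```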
I am looking for a Python script to add a new data column to an existing CSV file. I have a file (e.g. file.csv) with many rows and a few columns. From a for loop calculation, I get a new array (A in my code here), and I want to append that array as the last column of the existing CSV file. I used the code below:
for xxx in xxxx:
    A = xxx
    f = open("file.csv")
    data = [item for item in csv.reader(f)]
    f.close()
    new_column = [A]
    new_data = []
    for i, item in enumerate(data):
        try:
            item.append(new_column[i])
        except IndexError, e:
            item.append(A)
        new_data.append(item)
    f = open('outfilefinal1.csv', 'w')
    csv.writer(f).writerows(new_data)
    f.close()
It does append a new column as the last column, but the problem is that the whole column gets the same value (the A value from the last loop). How can I get the A value from each iteration of the for loop into my last column? Thanks.
Example input file:
1 2
2 4
0 9
4 8
The A value from each loop:
3
4
0
9
So the final file should show:
1 2 3
2 4 4
0 9 0
4 8 9
but in my case it shows:
1 2 9
2 4 9
0 9 9
4 8 9
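The problem with your code is that you rewrite the whole output file on every iteration of the outer for loop, and each pass starts again from the unmodified file.csv, so only the file written with the last A value survives.
Is A really an array that does not depend on file.csv? If so, you could do something like this: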
import csv
A = compute_new_values_list()  # placeholder for your own calculation: one value per data row
with open('file.csv', newline='') as fpi, open('out.csv', 'w', newline='') as fpo:
    reader = csv.reader(fpi)
    writer = csv.writer(fpo)
    # optionally handle the CSV header
    headers = next(reader)
    headers.append('new_column')
    writer.writerow(headers)
    for index, row in enumerate(reader):
        row.append(A[index])
        writer.writerow(row)
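Because A is computed independently of the file, zip() can pair each row with its value, which avoids the index bookkeeping. A minimal sketch using the example from the question (space-delimited, no header row, so the header step is left out); append_column is a hypothetical name, and in-memory strings stand in for file.csv and out.csv.

```python
import csv
import io


def append_column(csv_text, values, delimiter=' '):
    """Return csv_text with values[i] appended as the last field of row i."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator='\n')
    # zip pairs row i with values[i] and stops at whichever runs out first
    for row, value in zip(reader, values):
        row.append(value)
        writer.writerow(row)
    return out.getvalue()


A = [3, 4, 0, 9]  # the A value from each loop in the question
print(append_column('1 2\n2 4\n0 9\n4 8\n', A), end='')
# 1 2 3
# 2 4 4
# 0 9 0
# 4 8 9
```

With the A[index] version, a value list shorter than the file raises IndexError; zip() instead drops the extra rows silently, so pick whichever failure mode suits you.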
EDIT:
If you need a row of file.csv to compute your new value, you can use the same code; just compute the new value inside the for loop:
import csv
with open('file.csv', newline='') as fpi, open('out.csv', 'w', newline='') as fpo:
    reader = csv.reader(fpi)
    writer = csv.writer(fpo)
    # optionally handle the CSV header
    headers = next(reader)
    headers.append('new_column')
    writer.writerow(headers)
    for row in reader:
        new_value = compute_from_row(row)  # placeholder for your own per-row calculation
        row.append(new_value)
        writer.writerow(row)
I am splitting a CSV file into separate files based on a column with dates. However, some rows contain a date but their other cells are empty. I want to remove these rows with empty cells from the CSV, but I'm not sure how to do this.
Here is my code:
import collections
import csv
import sys
csv.field_size_limit(sys.maxsize)
# main_file and src_path are defined elsewhere in the script
with open(main_file, "r") as fp:
    root = csv.reader(fp, delimiter='\t', quotechar='"')
    result = collections.defaultdict(list)
    next(root)
    for row in root:
        year = row[0].split("-")[0]
        result[year].append(row)
for i, j in result.items():
    row_count = sum(1 for row in j)
    print(row_count)
    file_path = "%s%s-%s.csv" % (src_path, i, row_count)
    with open(file_path, 'w') as fp:
        writer = csv.writer(fp, delimiter='\t', quotechar='"')
        writer.writerows(j)
Pandas is perfect for this, especially if you want the code to be easy to adapt to, say, other file formats. Of course, one could consider it overkill.
To just remove rows with empty cells:
>>> import pandas as pd
>>> data = pd.read_csv('example.csv', sep='\t')
>>> print(data)
     A  B  C
0  1.0  2  5
1  NaN  1  9
2  3.0  4  4
>>> data.dropna()
     A  B  C
0  1.0  2  5
2  3.0  4  4
>>> data.dropna().to_csv('example_clean.csv', sep='\t', index=False)
I leave splitting the data and saving it into separate files with pandas as an exercise, if you want to start learning this great package :)
This would skip all rows with at least one empty cell:
with open(main_file, "r") as fp:
    ....
    for row in root:
        if not all(map(len, row)):
            continue
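Putting that check into the splitting loop from the question, a minimal sketch might look like this. It reads from an in-memory string instead of main_file and returns the per-year rows instead of writing files; the name split_by_year is hypothetical. Because all(map(len, row)) is False as soon as any cell is the empty string, rows with an empty cell are dropped before they are grouped. The extra `not row` test is an addition: a blank line arrives as [], which all() would otherwise accept.

```python
import collections
import csv
import io


def split_by_year(tsv_text):
    """Group complete rows by the year prefix of their first (date) column."""
    root = csv.reader(io.StringIO(tsv_text), delimiter='\t', quotechar='"')
    result = collections.defaultdict(list)
    next(root)  # skip the header row, as in the question
    for row in root:
        if not row or not all(map(len, row)):
            continue  # blank line, or at least one empty cell: leave it out
        year = row[0].split("-")[0]
        result[year].append(row)
    return dict(result)


sample = 'date\tvalue\n2015-01-02\t5\n2015-03-04\t\n2016-05-06\t7\n'
print(split_by_year(sample))
# {'2015': [['2015-01-02', '5']], '2016': [['2016-05-06', '7']]}
```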
Pandas is one of the best Python tools for data processing. For help, you can go through this introduction: http://pandas.pydata.org/pandas-docs/stable/10min.html
I am trying to create an importable module to delete a range of columns (specifically columns 73-177 in the file I am working with). I am attempting to edit this file I/O code, which is written to remove a column based on its field name. I want to modify it to delete columns 73-177 in a CSV file. What do I need to do to accomplish this?
def removeColumns(num1, num2, inputFILE, FileName):
    inPUTfile = open(inputFILE, 'r')
    outPUTfile = open(FileName, 'w')
    line = inPUTfile.readline()
    while line:
        # Delete Specified columns. First column range number, second column range number (+1)
        lineList = line.split('\t')
        removeCOL = "Calendar-Year"
        i = 0
        while lineList[i] != removeCOL: #(linesout?):
            i = i + 1
        lineList.pop(i) #remove these fields from the list.append
        #write modified fields
        remove = "\t".join(lineList)
        outPUTfile.write(line) #write the new field names outfile
        for line in inPUTfile: #remove field i from each remaining line and write it in the output file &modify input line
            lineList = line.split( ) #convert to a list
            lineList.pop(i) #remove fields from the list
            line = '\t'.join(lineList)
            line = line + '\n' #add a carriage return to the end of the row
            outPUTfile.write(line)# Write the modified line in the output file
    inPUTfile.close() #close the input file
    outPUTfile.close() #close the output file
    return outPUTfile
    print outPUTfile
I realize that you asked how to modify the original code, but here I honestly think it'd be easier to understand how to do it a different way. Python has a useful csv module which handles a lot of the work for you. Something like:
import csv
remove_from = 2
remove_to = 5
with open("to_delete.csv", "r", newline="") as fp_in, open("newfile.csv", "w", newline="") as fp_out:
    reader = csv.reader(fp_in, delimiter="\t")
    writer = csv.writer(fp_out, delimiter="\t")
    for row in reader:
        del row[remove_from:remove_to]  # drop 0-based indices remove_from up to, not including, remove_to
        writer.writerow(row)
will turn
$ cat to_delete.csv
a b c d e f g
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
into
$ cat newfile.csv
a b f g
1 2 6 7
8 9 13 14
15 16 20 21
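For the columns 73-177 in the question, the only subtlety is converting to Python's slice bounds. Assuming those numbers are 1-based and inclusive, column 73 is index 72 and the slice end is exclusive, so the deletion is del row[72:177]. A small hypothetical helper that does the conversion:

```python
def remove_columns(row, first, last):
    """Delete 1-based, inclusive columns first..last from a list in place."""
    # column `first` lives at index first - 1; the slice end is exclusive,
    # so the 1-based inclusive `last` is already the right stop index
    del row[first - 1:last]
    return row


print(remove_columns(list('abcdefg'), 3, 5))  # ['a', 'b', 'f', 'g']

row = list(range(1, 201))  # 200 columns numbered 1..200
remove_columns(row, 73, 177)
print(len(row), row[71], row[72])  # 95 72 178
```

Inside the csv loop above, this would replace the del line with remove_columns(row, 73, 177).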