I must be missing something, but I don't get it. I have a CSV with 1,200 fields, and I'm only interested in 30 of them. How do I get that to work? I can read and write the whole shebang, which is OK, but I'd really like to write out just the 30. I have a list of the field names, and right now I'm kind of hacking the header.
How would I translate the code below to use DictReader/DictWriter?
for file in glob.glob(os.path.join(raw_path, 'P12*.csv')):
    fileReader = csv.reader(open(file, 'rb'))
    fileLength = len(file)
    fileGeom = file[fileLength-7:fileLength-4]
    table = TableValues[fileGeom]
    filename = file.split(os.sep)[-1]
    with open(out_path + filename, "w") as fileout:
        for line in fileReader:
            writer = csv.writer(fileout, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
            if 'ID' in line:
                line.insert(0, "geometryTable")
            else:
                line.insert(0, table)  # "%s,%s\n" % (line, table)
            writer.writerow(line)
Here's an example of using DictWriter to write out only fields you care about. I'll leave the porting work to you:
import csv

headers = ['a', 'b', 'd', 'g']

with open('in.csv', 'rb') as _in, open('out.csv', 'wb') as out:
    reader = csv.DictReader(_in)
    writer = csv.DictWriter(out, headers, extrasaction='ignore')
    writer.writeheader()
    for line in reader:
        writer.writerow(line)
in.csv
a,b,c,d,e,f,g,h
1,2,3,4,5,6,7,8
2,3,4,5,6,7,8,9
Result (out.csv)
a,b,d,g
1,2,4,7
2,3,5,8
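If it helps, here is a rough sketch of what the ported loop might look like (Python 3 style, assuming raw_path, out_path and TableValues from your code; the field names below are placeholders for the 30 you actually want):
import csv
import glob
import os

wanted = ['ID', 'FIELD_A', 'FIELD_B']             # placeholder names for your 30 fields
out_fields = ['geometryTable'] + wanted           # extra column, as in the original insert(0, ...)

for path in glob.glob(os.path.join(raw_path, 'P12*.csv')):
    table = TableValues[path[-7:-4]]              # same geometry lookup as the original
    filename = path.split(os.sep)[-1]
    with open(path, newline='') as fin, \
         open(os.path.join(out_path, filename), 'w', newline='') as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, out_fields, extrasaction='ignore')
        writer.writeheader()                      # replaces the hand-rolled header hacking
        for row in reader:
            row['geometryTable'] = table          # add the geometry value to every row
            writer.writerow(row)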
Related
I am trying to output a CSV file, but the headers are missing. I've gone through my code line by line and I can't figure out what's wrong with it.
My sample data is:
ABC.csv (assuming there are duplicate entries in it, so I also added the code for removing them)
KeyID,GeneralID
145258,KL456
145259,BG486
145260,HJ789
145261,KL456
145259,BG486
145259,BG486
My code:
import csv
import fileinput
from collections import Counter

file_path_1 = "ABC.csv"
key_id = []
general_id = []
total_general_id = []

with open(file_path_1, 'rU') as f:
    reader = csv.reader(f)
    header = next(reader)
    lines = [line for line in reader]
    counts = Counter([l[1] for l in lines])
    new_lines = [l + [str(counts[l[1]])] for l in lines]

with open(file_path_1, 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(header + ['Total_GeneralID'])
    writer.writerows(new_lines)
with open(file_path_1, 'rU') as f:
    reader = csv.DictReader(f)
    for row in reader:
        key_id.append(row['KeyID'])
        general_id.append(row['GeneralID'])
        total_general_id.append(row['Total_GeneralID'])

New_List = [[] for _ in range(len(key_id))]
for attr in range(len(key_id)):
    New_List[attr].append(key_id[attr])
    New_List[attr].append(general_id[attr])
    New_List[attr].append(total_general_id[attr])

with open('result_id_with_total.csv', 'wb+') as newfile:
    header = ['KEY ID', 'GENERAL ID', 'TOTAL GENERAL ID']
    wr = csv.writer(newfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    wr.writerow(header)  # I already add the headers but it won't work.
    for item in New_List:
        if item not in newfile:
            wr.writerow(item)
Unfortunately, my output ends up like this (result_id_with_total.csv):
145258,KL456,2
145259,BG486,1
145260,HJ789,1
145261,KL456,2
What I am trying to achieve:
KEY ID,GENERAL ID,TOTAL GENERAL ID
145258,KL456,2
145259,BG486,1
145260,HJ789,1
145261,KL456,2
My main problem in this code:
wr.writerow(header)
won't work.
This is to do with opening the file with 'wb+' (write binary). When you write to a file in binary mode, you need to pass it bytes, not strings.
I get this error in the console when I run it:
TypeError: a bytes-like object is required, not 'str'
Try changing 'wb+' to just 'w'; that does the trick.
with open('result_id_with_total.csv', 'w') as newfile:
    header = ['KEY ID', 'GENERAL ID', 'TOTAL GENERAL ID']
    wr = csv.writer(newfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
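For what it's worth, here is a minimal sketch of the rest of that block (assuming New_List as built in the question). Passing newline='' avoids blank lines between rows on Windows, and note that the `if item not in newfile` membership test against the file handle won't reliably skip duplicate rows; keeping a set of already-written rows is one way to do that:
seen = set()  # assumption: duplicate rows should be skipped, as in the question's expected output
with open('result_id_with_total.csv', 'w', newline='') as newfile:
    wr = csv.writer(newfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
    wr.writerow(['KEY ID', 'GENERAL ID', 'TOTAL GENERAL ID'])  # header row
    for item in New_List:            # New_List as built in the question
        key = tuple(item)
        if key not in seen:          # skip rows already written
            seen.add(key)
            wr.writerow(item)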
I have many CSV files and need to read them all in a loop, then write the file name and all of its columns (the header in row 1) to an output file.
Example
Input csv file 1 (test1.csv)
Id, Name, Age, Location
1, A, 25, India
Input csv file 2 (test2.csv)
Id, ProductName
1, ABC
Output file
test1.csv Id
test1.csv Name
test1.csv Age
test1.csv Location
test2.csv Id
test2.csv ProductName
Many thanks for your help.
Update:
This code works fine for this purpose:
import os
import csv

ofile = open('D:\Anuj\Personal\OutputFile/AHS_File_Columns_Info.csv', 'w')
directory = os.path.join('D:\Anuj\Personal\Python')
for root, dirs, files in os.walk(directory):
    for file in files:
        fullfilepath = directory + "/" + file
        with open(fullfilepath, 'r') as f:
            output = file + ',' + f.readline()
            ofile.write(output)
A clean solution using the csv module for both reading and writing:
- open the output file and create a csv.writer instance on its handle
- open each input file and create a csv.reader instance on its handle
- get the first row using next on the csv.reader iterator: that gives the titles as a list (with a small post-processing step to strip the spaces)
- write each title alongside the current filename in a loop
The code:
import csv

files = ["test1.csv", "test2.csv"]

with open("output.tsv", "w", newline='') as fw:
    cw = csv.writer(fw, delimiter="\t")  # output is tab delimited
    for filename in files:
        with open(filename, 'r') as f:
            cr = csv.reader(f)
            # get titles from the first row
            for column_name in (x.strip() for x in next(cr)):
                cw.writerow([filename, column_name])
There are several advantages to using the csv module, the most important being that quoting and multi-line fields/titles are handled properly.
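For instance (a small illustration, with a made-up header): a title containing a comma or an embedded newline still comes back as a single item from csv.reader, where a plain split(',') would break it apart:
import csv
import io

raw = '"Last, First","Address\nLine 2",Age\n'   # hypothetical header row
print(next(csv.reader(io.StringIO(raw))))
# ['Last, First', 'Address\nLine 2', 'Age']
print(raw.split(','))
# ['"Last', ' First"', '"Address\nLine 2"', 'Age\n']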
But I'm not sure I understand you correctly.
import csv
from typing import List
from typing import Tuple

TableType = List[List[str]]


def load_csv_table(file_name: str) -> Tuple[List[str], TableType]:
    with open(file_name) as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        headers = next(csv_reader)
        data_table = list(csv_reader)
        return headers, data_table


def save_csv_table(file_name: str, headers: List[str], data_table: TableType):
    with open(file_name, 'w', newline='') as csv_file:
        writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(headers)
        for row in data_table:
            writer.writerow(row)


input_files = ['file1.csv', 'file2.csv', 'file3.csv']
new_table = []
new_headers = []
for file_name in input_files:
    headers, data_table = load_csv_table(file_name)
    if not new_headers:
        new_headers = ['Source'] + headers
    new_table.extend(([file_name] + line for line in data_table))

save_csv_table('output.csv', new_headers, new_table)
A simple method is to use readline() on the file object:
files=["test1.csv","test2.csv"]
for my_file in files:
with open(my_file,'r') as f:
print my_file, f.readline()
I am transforming JSON-like data to CSV and having a few issues.
The code is here:
import json
import csv

def parse_file(inputed_file):
    with open(input_file, 'r') as inputed_file:
        content = inputed_file.readlines()
        split_file = open('test.csv', 'w')
        for line in content:
            lines = line.split('\t')
            data = json.loads(lines[0])
            writer = csv.DictWriter(split_file, fieldnames=["title", "firstname"], delimiter=',')
            writer.writeheader()
The problem is that this writes a header for every row of data, and I want the header written only once. Then I add this so the data goes below the headers:
writer.writerow(data)
I have looked at this and tried it, but failed: How can I convert JSON to CSV?
Create the DictWriter outside the loop, and just call writer.writeheader() there. Then call writer.writerow() inside the loop.
def parse_file(inputed_file):
    with open(input_file, 'r') as inputed_file:
        content = inputed_file.readlines()
        split_file = open('test.csv', 'w')
        writer = csv.DictWriter(split_file, fieldnames=["title", "firstname"], delimiter=',')
        writer.writeheader()
        for line in content:
            lines = line.split('\t')
            data = json.loads(lines[0])
            writer.writerow(data)
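One caveat, depending on what the JSON objects look like: DictWriter.writerow() raises a ValueError if a row dict contains keys that aren't in fieldnames, so if each object carries more fields than title and firstname, pass extrasaction='ignore' when creating the writer:
# drop any JSON keys beyond the two columns we keep
writer = csv.DictWriter(split_file, fieldnames=["title", "firstname"],
                        delimiter=',', extrasaction='ignore')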
Hi, I'm trying to finish this small piece of code for modifying CSV files. I've got this far with some help:
Edit: some more info.
Basically, what I'm looking to do is make some small changes to the CSV file depending on the project and parent issue in JIRA. Python will make the changes to the CSV file before it is read into JIRA - that's the second part of the program, which I've not really looked at yet.
I’m only looking to change the BOX-123 type cells and leave the blank ones blank.
But the idea of the program is that I can use it to make some small changes to a template which will then automatically create some issues in JIRA.
import os
import csv

project = 'Dudgeon'
parent = 'BOX-111'
rows = (1, 1007)

current = os.getcwd()
filename = 'test.csv'
filepath = os.path.join(os.getcwd(), filename)
#print(current)
#print(filename)
print(filepath)

with open(filepath, 'r') as csvfile:
    readCSV = csv.reader(csvfile)
    next(readCSV, None)
    for row in readCSV:
        print(row[16])
    row_count = sum(1 for row in readCSV)
    print(row_count)

with open(filepath, 'r') as infile, open('out.csv', 'w') as outfile:
    outfile.write(infile.readline())  # write out the 1st line
    for line in infile:
        cols = line.strip().split(',')
        cols[16] = project
        outfile.write(','.join(cols) + '\n')

with open('out.csv', 'r') as infile, open('out1.csv', 'w') as outfile:
    for row in infile:
        if row % 2 != 0:
            cols[15] = parent
            outfile.write()
Any help really appreciated.
You want to compare the row's index, not the row itself, against 0. Use enumerate():
with open('out.csv', 'r') as infile, open('out1.csv', 'w') as outfile:
    for rowidx, row in enumerate(infile):
        cols = row.strip().split(',')
        if rowidx % 2 != 0:
            cols[15] = parent
        outfile.write(','.join(cols) + '\n')  # write every line back out
You really should be using the csv module here, though. Untested but should get you started.
with open('out.csv', 'r') as infile, open('out1.csv', 'w') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for rowidx, row in enumerate(reader):
        if rowidx % 2 != 0:
            row[15] = parent
        writer.writerow(row)
A friend helped me last night and this is what they came up with:
with open(filepath, 'r') as infile, open('out.csv', 'w') as outfile:
    outfile.write(infile.readline())  # write out the 1st line
    for line in infile:
        cols = line.strip().split(',')
        cols[16] = project
        outfile.write(','.join(cols) + '\n')

with open('out.csv', 'r') as infile, open('out1.csv', 'w') as outfile:
    outfile.write(infile.readline())  # write out the 1st line
    lineCounter = 0
    for line in infile:
        lineCounter += 1
        cols = line.strip().split(',')
        if lineCounter % 2 != 0:
            cols[15] = parent
        outfile.write(','.join(cols) + '\n')
EDIT: Thanks for the answers guys, got what I needed!!
Basically, I am trying to take what I have stored in my text file and write it into a .csv file. The file contains tweets that I have stored, and I am trying to get one tweet into each cell of the .csv file.
Right now it is only taking one tweet and creating a .csv file with it, and I need it to take all of them. Any help is greatly appreciated. Here is what I have so far.
with open('reddit.txt', 'rb') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        print row
        cr = csv.writer(open('reddit.csv', 'wb'))
        cr.writerow(row)
You'll need to create the writer outside of the loop:
with open('reddit.txt', 'rb') as input_file:
    reader = csv.reader(input_file, delimiter=':', quoting=csv.QUOTE_NONE)
    with open('reddit.csv', 'wb') as output_file:
        writer = csv.writer(output_file)
        for row in reader:
            writer.writerow(row)
Although here it might be cleaner to open the files without with:
input_file = open('reddit.txt', 'rb')
output_file = open('reddit.csv', 'wb')
reader = csv.reader(input_file, delimiter=':', quoting=csv.QUOTE_NONE)
writer = csv.writer(output_file)
for row in reader:
    writer.writerow(row)
input_file.close()
output_file.close()
Or you can still use with and just have a really long line:
with open('reddit.txt', 'rb') as input_file, open('reddit.csv', 'wb') as output_file:
    reader = csv.reader(input_file, delimiter=':', quoting=csv.QUOTE_NONE)
    writer = csv.writer(output_file)
    for row in reader:
        writer.writerow(row)
The line cr = csv.writer(open('reddit.csv', 'wb')) is inside the for loop, so the output file is reopened (and truncated) on every iteration, which is why only one tweet survives. You need to open the file just once: place that line after
reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
Then write to it inside the loop as you did before.
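In other words, roughly (keeping the Python 2 style of the question):
with open('reddit.txt', 'rb') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    cr = csv.writer(open('reddit.csv', 'wb'))  # opened once, before the loop
    for row in reader:
        cr.writerow(row)                       # every row ends up in reddit.csv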