Python: Adding a column and row of data to an existing csv file

I am able to create a csv file with the necessary header and then append it with output. Now I would like to reopen the csv, create another header and then add data to the rows based on an if, else condition.
When I print the results out to console, I get the desired output (as seen below), but when I try to append the output to the csv file, I'm not seeing the same results.
Title: update 1
Added or Deleted Files: True
Title: update 2
Added or Deleted Files: False
Title: update 3
Added or Deleted Files: False
I believe it's how the if condition is executed when the csv file is reopened, but I can't figure out where I'm going wrong. The new column, Added or Deleted Files, is created, but the values added to the rows beneath it don't match the console output, which is correct: the values under Added or Deleted Files are all True instead of True, False, False as shown in the console output. The Title column and the titles of the pull requests are captured correctly in the new csv file, as is the new column; it's only the values under Added or Deleted Files that are incorrect (as seen in the output below).
Title,Added or Deleted Files
update 1,True
update 2,True
update 3,True
The code below both prints to the console and writes to csv. Thanks in advance for any help. It's the last pair of with open statements, which read the existing csv, create a new one, and add the column (with the incorrect row data), that's giving me trouble.
with open(filename, 'w+', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(['Title'])

for prs in repo.pull_requests():
    getlabels = repo.issue(prs.number).as_dict()
    if 'ready-to-merge' in [getlabels['name'] for getlabels in getlabels['labels']] and 'Validation Succeeded' in [getlabels['name'] for getlabels in getlabels['labels']]:
        changes = repo.pull_request(prs.number).as_dict()
        # print to console statement
        print('Title: ', changes['title'])
        # output to csv
        with open(filename, 'a+', newline='') as f:
            csv_writer = csv.writer(f)
            csv_writer.writerow([changes['title']])
        # print to console
        if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
            print('Added or Deleted Files: True')
        else:
            print('Added or Deleted Files: False')
        # output to new csv with added column and new data
        with open(filename, 'r') as csvinput:
            with open(filename2, 'w') as csvoutput:
                writer = csv.writer(csvoutput, lineterminator="\n")
                reader = csv.reader(csvinput)
                all = []
                row = next(reader)
                row.append('Added or Deleted Files')
                all.append(row)
                for row in reader:
                    all.append(row)
                    if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
                        row.append('True')
                    else:
                        row.append('False')
                writer.writerows(all)

The structure of your code is broken. Here is what happens:
open csv file, write header line, close
loop over requests:
    add request to csv file (by open, append, close)
    open second file and erase it (because of "w" mode)
    for each line in csv:
        copy field from csv file
        copy status of current request
So your result file is rewritten in its entirety on the last request iteration, and the value of the 2nd column for that last request is consistently copied to every line.
Your code should be:
with open(filename, 'w+', newline='') as f, open(filename2, 'w') as csvoutput:
    csv_writer = csv.writer(f)
    writer = csv.writer(csvoutput, lineterminator="\n")
    row = ['Title']
    csv_writer.writerow(row)
    row.append('Added or Deleted Files')
    writer.writerow(row)
    for prs in repo.pull_requests():
        ...
        row = [changes['title']]
        csv_writer.writerow(row)
        ...
        if 'added' in (data.status for data in repo.pull_request(prs.number).files()) or 'removed' in (data.status for data in repo.pull_request(prs.number).files()):
            row.append('True')
        else:
            row.append('False')
        writer.writerow(row)
That is:
open the files once at the beginning of the block and only close them at the end.
write to the two files one row at a time, while processing elements from repo.pull_requests().
append the second column to row after writing to the first csv file and before writing to the second file.
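As a minimal, self-contained sketch of that pattern (dummy titles and statuses standing in for the repo.pull_requests() loop, which this snippet does not call):
import csv

# Hypothetical stand-in for the pull-request loop: (title, has_added_or_removed_files)
pull_requests = [('update 1', True), ('update 2', False), ('update 3', False)]

with open('titles.csv', 'w', newline='') as f, \
        open('titles_with_status.csv', 'w', newline='') as csvoutput:
    csv_writer = csv.writer(f)
    writer = csv.writer(csvoutput, lineterminator="\n")
    row = ['Title']
    csv_writer.writerow(row)              # header of the one-column file
    row.append('Added or Deleted Files')
    writer.writerow(row)                  # header of the two-column file
    for title, changed in pull_requests:
        row = [title]
        csv_writer.writerow(row)          # write the title row
        row.append(str(changed))          # then extend the same row
        writer.writerow(row)              # and write it to the second file
This produces True, False, False in the second column, one row per pull request, because each row is written while the corresponding status is in hand.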

Related

copy all except first row into new csv file python

I generate several csv files each day, and I'm trying to write a python script that will open each csv file, one by one, rewrite the csv to a new location and fill in the mission information that is in the filename.
I've been able to cobble together separate python scripts that can do most of it, but I'm hitting brick walls.
each CSV has a standard filename:
"Number_Name_Location.csv"
Row 1 has a header that labels each column (Name, Location, Date, etc.), and each .csv can have any number of pre-filled rows.
What I'm trying to automate are the following steps:
Open the 1st .csv in the "/ToParse" folder
Fill in the remaining details for all rows, which are gathered from the filename itself:
Column A: Number
Column B: Date of Parsing
Column K: Name
Column L: Location
Write a new csv to the folder "/Parsed"
Continue parsing the next .csv until no .csv files are left in the "/ToParse" folder
Move all original parsed files to /Parsed/Original
Below is the cobbled-together code, which is overwriting the header row. How can I adjust the code to ignore the first row of the csv it's opening and just copy the rows after it?
import glob
import csv
from datetime import date

with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            filename_parts = filename.split("_")
            #print(filename_parts[0])
            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)
The newline='' in open(filename, newline='') is fine, by the way; the csv docs recommend opening files that way so the reader handles embedded newlines correctly.
You can skip the first line by calling f_input.readline() manually before the for loop. This reads (and thereby skips) the first line, so the for loop starts on the next line. Calling next(csv_input) on the reader itself also works and skips the header as parsed by csv.
Slicing like csv_input[1:] does not work, because a csv reader is an iterator, not a sequence; if you want that style, itertools.islice(csv_input, 1, None) does the same thing.
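Applied to your loop, that is a one-line change (a sketch of your code with the skip added; write a header row once before the outer loop if you want one in the output):
import glob
import csv
from datetime import date

with open('Parsed/output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            next(csv_input)  # skip the header row of each input file
            filename_parts = filename.split("_")
            for row in csv_input:
                row[0] = filename_parts[0]
                row[1] = date.today()
                row[10] = filename_parts[1]
                row[11] = filename_parts[2]
                csv_output.writerow(row)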

python csv file add to field based off another field

I have a csv file that looks like this:
There's a column called "Inventory"; I pulled the data in that column from another source, and it's in a dictionary-like format.
What I need to do is iterate through the 1000+ lines; if the keywords comforter, sheets, and pillow all exist, write "bedding" to the "Location" column for that row, otherwise write "home-fashions".
I have been able to get the if statement to tell me whether a row goes into bedding or "home-fashions"; I just don't know how to write the corresponding result to the "Location" field for that line.
In my script I'm printing just to see my results, but in the end I want to write back to the same CSV file.
from csv import DictReader

with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])
If the last column of your csv contains unquoted commas, you cannot read it with DictReader. You can pull the columns apart with a regular expression instead:
import re

data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can apply your conditions.
for item in data:
    if all(x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv

with open('output.csv', 'w') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
csv.DictReader returns a dict for each row, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
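If DictReader does parse the file correctly (i.e. the Inventory column is quoted), a complete read-modify-write pass might look like the sketch below; it writes to a new file rather than editing test.csv in place:
from csv import DictReader, DictWriter

with open('test.csv', 'r', newline='') as read_obj:
    rows = list(DictReader(read_obj))   # read everything into memory

for line in rows:
    # all three keywords present -> Bedding, otherwise home-fashions
    if all(word in line['Inventory'] for word in ('comforter', 'sheets', 'pillow')):
        line['Location'] = 'Bedding'
    else:
        line['Location'] = 'home-fashions'

with open('output.csv', 'w', newline='') as write_obj:
    writer = DictWriter(write_obj, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)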

Too many open output files while splitting a CSV

Very novice attempt at python here.
I tried implementing something like what was discussed in this question: Splitting csv file based on a particular column using Python
My goal is to take a file with 15 million lines covering 500 ticker symbols and put each ticker in its own file.
However, when I'm running it, I'm getting
OSError: [Errno 24] Too many open files: 'APH.csv'
All of the lines of data are in order (i.e. all of the lines of data for ticker "A" come one right after another, so I could close a file before going on to the next one). I'm not sure where in this code I would close the file before going on to the next one. FYI - this is on a Mac, if that matters.
My code is:
import csv

with open('WIKI_PRICES_big.csv') as fin:
    csvin = csv.DictReader(fin)
    # Category -> open file lookup
    outputs = {}
    for row in csvin:
        cat = row['ticker']
        # Open a new file and write the header
        if cat not in outputs:
            fout = open('{}.csv'.format(cat), 'w')
            dw = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
            dw.writeheader()
            outputs[cat] = fout, dw
        # Always write the row
        outputs[cat][1].writerow(row)
    # Close all the files
    for fout, _ in outputs.values():
        fout.close()
Based on the file structure you describe, the following should do it.
The trick is that if the ticker values are always in order, you only need to keep a single output file open at any one time. You can then close the old one and open the new one when you come across a new ticker value.
import csv

fout = False
with open('WIKI_PRICES_big.csv') as fin:
    csvin = csv.DictReader(fin)
    seen = []
    for row in csvin:
        cat = row['ticker']
        # Open a new file and write the header.
        if cat not in seen:
            seen.append(cat)
            if fout:  # Close old file if we have one.
                fout.close()
            fout = open('{}.csv'.format(cat), 'w')
            dw = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
            dw.writeheader()
        # Always write the row
        dw.writerow(row)
if fout:
    fout.close()
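Since the rows are already grouped by ticker, itertools.groupby can express the same idea without tracking seen tickers by hand; a sketch under the same assumptions (sorted input, a 'ticker' column):
import csv
from itertools import groupby
from operator import itemgetter

with open('WIKI_PRICES_big.csv') as fin:
    csvin = csv.DictReader(fin)
    # groupby yields one (ticker, rows) chunk per consecutive run of equal tickers
    for ticker, rows in groupby(csvin, key=itemgetter('ticker')):
        with open('{}.csv'.format(ticker), 'w', newline='') as fout:
            dw = csv.DictWriter(fout, fieldnames=csvin.fieldnames)
            dw.writeheader()
            dw.writerows(rows)
The with block guarantees each file is closed before the next one is opened.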

Skip first row in a file

I'm trying to open 2 files, read the contents of both files, and export the rows of both files to an output file.
The issue I'm having is that the 2nd file I'm reading in has the same headers in its first row as the 1st, so I'm trying to skip writing the 1st row of the 2nd file to my new outputFile, but the code written below doesn't do that.
I know my code says if row[1]=='Full Name' and not column[1], but if I skip writing row[1] to the outputFile, it skips the first column and not the first row, so I figured I'd use that first column in my if statement. The first column's header in my input data is Full Name, so that's why I used that particular if statement.
I didn't include the incoming data because I didn't think it was necessary to answer this question, but if you feel it would be helpful, I'm more than glad to post it up here.
If anyone can help me skip writing the first row of the second incoming file into my outputFile, it would be greatly appreciated.
import csv, sys

firstFile = open(sys.argv[1], 'rU')  # U to parse
reader1 = csv.reader(firstFile, delimiter=',')
outputFile = open((sys.argv[3]), 'w')

for row in reader1:
    row = str(row)
    row = row.replace("'", "")
    row = row.replace("[", "")
    row = row.replace("]", "")
    outputFile.write(row)
    outputFile.write('\n')

secondFile = open(sys.argv[2], 'rU')
reader2 = csv.reader(secondFile, delimiter=',')

for row in reader2:
    row = str(row)
    row = row.replace("'", "")
    row = row.replace("[", "")
    row = row.replace("]", "")
    if row[1] == 'Full Name':
        next(reader2)
    else:
        outputFile.write(row)
        outputFile.write('\n')
You can use the next function:
secondFile = open(sys.argv[2],'rU')
# skip first line
next(secondFile)
# csv parser does not see the first line of the file
reader2 = csv.reader(secondFile, delimiter=',')
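For reference, the whole merge can also be done with csv.writer instead of the str()/replace() round trip; a sketch assuming Python 3 and the same argv layout:
import csv
import sys

with open(sys.argv[3], 'w', newline='') as output_file:
    writer = csv.writer(output_file)
    with open(sys.argv[1], newline='') as first_file:
        writer.writerows(csv.reader(first_file))   # copy file 1, header included
    with open(sys.argv[2], newline='') as second_file:
        reader2 = csv.reader(second_file)
        next(reader2)                              # skip file 2's duplicate header
        writer.writerows(reader2)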

add a new column to an existing csv file

I have a csv file with 5 columns and I want to add data in a 6th column. The data I have is in an array.
Right now, the code that I have will insert the data I would want in the 6th column only AFTER all the data that already exists in the csv file.
For instance I have:
wind, site, date, time, value
10, 01, 01-01-2013, 00:00, 5.1
89.6 ---> this is the value I want to add in a 6th column but it puts it after all the data from the csv file
Here is the code I am using:
csvfile = 'filename'
with open(csvfile, 'a') as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in data:
        writer.writerow([val])
I thought using 'a' would append the data in a new column, but instead it just puts it after ('under') all the other data... I don't know what to do!
Appending writes data to the end of a file, not to the end of each row.
Instead, create a new file and append the new value to each row.
csvfile = 'filename'
with open(csvfile, 'r', newline='') as fin, open('new_' + csvfile, 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator='\n')
    if you_have_headers:
        writer.writerow(next(reader) + [new_heading])
    for row, val in zip(reader, data):
        writer.writerow(row + [val])
On Python 2.x, remove the newline='' arguments and change the filemodes from 'r' and 'w' to 'rb' and 'wb', respectively.
Once you are sure this is working correctly, you can replace the original file with the new one:
import os
os.remove(csvfile) # not needed on unix
os.rename('new_'+csvfile, csvfile)
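On Python 3.3+, os.replace does both steps in one call and overwrites an existing destination even on Windows:
import os
os.replace('new_' + csvfile, csvfile)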
The csv module does not support appending a column in place, so what you can do is read from one file, append the 6th-column data to each row, and write to another file, like this:
# data is a list of ints, one per row of in.txt
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    for index, line in enumerate(fin):
        fout.write(line.rstrip('\n') + ', ' + str(data[index]) + '\n')
I tested this code in Python and it runs fine.
We have a CSV file i.e. data.csv and its contents are:
#data.csv
1,Joi,Python
2,Mark,Laravel
3,Elon,Wordpress
4,Emily,PHP
5,Sam,HTML
Now we want to add a column in this csv file and all the entries in this column should contain the same value i.e. Something text.
Example
from csv import writer
from csv import reader
new_column_text = 'Something text'
with open('data.csv', 'r') as read_object, \
        open('data_output.csv', 'w', newline='') as write_object:
    csv_reader = reader(read_object)
    csv_writer = writer(write_object)
    for row in csv_reader:
        row.append(new_column_text)
        csv_writer.writerow(row)
Output
#data_output.csv
1,Joi,Python,Something text
2,Mark,Laravel,Something text
3,Elon,Wordpress,Something text
4,Emily,PHP,Something text
5,Sam,HTML,Something text
The append mode of opening files is meant to add data to the end of a file. What you need is random access when writing to your file; for that you use the seek() method.
You can see an example here: http://www.tutorialspoint.com/python/file_seek.htm
Or read the Python docs on it here: https://docs.python.org/2.4/lib/bltin-file-objects.html (which aren't terribly useful).
If you want to add to the end of a column, you may want to open the file, read a line to figure out its length, then seek to the end. Note, though, that writing after a seek() overwrites bytes in place rather than inserting them, so for adding a CSV column the read-and-rewrite approaches above are usually more practical.
