Cannot write a file - python

I have the following code:
import csv
import operator
import sys

with open('countryInfo.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile, delimiter='\t')
    result = sorted(reader, key=lambda d: float(d['population']), reverse=True)
    # for row in result:
    #     print(row)
    # for row in result:
    #     print(row['name'], row['capital'], row['population'])
    writer = csv.DictWriter(open('country_simple_info.csv', 'w', encoding='utf-8'), reader.fieldnames)
    # with open("country_simple_info.csv", "w", encoding='utf-8') as csvoutfile:
    writer.writeheader()
    writer.writerows(result)
The goal of this code is to open a countryInfo.csv file, extract the country name, capital city and population from each row, and then write a new file named country_simple_info.csv with country, capital and population in each row, sorted by population size, largest first. The file has columns with other information such as continent, languages, etc., but the code should ignore those. In my code above, when I uncomment the print statements, it prints the expected output - something in the following format:
country,capital,population
China,Beijing,1330044000
India,New Delhi,1173108018
United States,Washington,310232863
.......
However, I cannot get the file to be written. Any ideas? And also, I am not allowed to use pandas.

with open('country_simple_info.csv', 'w', encoding='utf-8') as outputFile:
    writer = csv.DictWriter(outputFile, reader.fieldnames)
    writer.writeheader()
    writer.writerows(result)
Using the "with open" statement will force the outputFile to close when it goes out of scope and thus saving the text written to it.
Hope this helps.

Related

CSV reading and writing; outputted CSV is blank

My program needs a function that reads data from a CSV file ("all.csv") and extracts all the data pertaining to 'Virginia' (i.e., each row that has 'Virginia' in it), then writes the extracted data to another CSV file named "Virginia.csv". The program runs without error; however, when I open the "Virginia.csv" file, it is blank. My guess is that the issue is with my nested for loop, but I am not entirely sure what is causing it.
Here is the data within the all.csv file:
https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv
Here is my code:
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  # split elements
        for row in range(len(contents)):
            for word in range(len(contents[row])):
                if contents[row][2] == state:
                    writer.writerow(row)

extract_records_for_state(input_file, output_file, state)
I ran your code and it gave me an error
Traceback (most recent call last):
File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 27, in
extract_records_for_state(input_file, output_file, state)
File "c:\Users\Dolimight\Desktop\Stack Overflow\Geraldo\main.py", line 24, in extract_records_for_state
writer.writerow(row)
_csv.Error: iterable expected, not int,
I fixed the error by passing the row's contents (contents[row]) to the writerow() function and ran it again, and the data showed up in Virginia.csv. It gave me duplicates, so I also removed the word for-loop.
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r') as infile:
        contents = infile.readlines()
    with open(output_file, 'w') as outfile:
        writer = csv.writer(outfile)
        for row in range(len(contents)):
            contents[row] = contents[row].split(',')  # split elements
        print(contents)
        for row in range(len(contents)):
            if contents[row][2] == state:
                writer.writerow(contents[row])  # this is what I changed

extract_records_for_state(input_file, output_file, state)
You have two errors. The first is that you try to write the row index in writer.writerow(row) - the row itself is contents[row]. The second is that you leave the newline on the final column when reading but don't strip it when writing. Instead, you could leverage the csv module more fully: let the reader parse the rows, and instead of reading everything into a list, which uses a fair amount of memory, filter and write row by row.
import csv

input_file = 'all.csv'
output_file = 'Virginia.csv'
state = 'Virginia'
mylist = []

def extract_records_for_state(input_file, output_file, state):
    with open(input_file, 'r', newline='') as infile, \
         open(output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile)
        writer = csv.writer(outfile)
        # add header
        writer.writerow(next(reader))
        # filter for state
        writer.writerows(row for row in reader if row[2] == state)

extract_records_for_state(input_file, output_file, state)
Looking at your code, two things jump out at me:
I see a bunch of nested statements (logic)
I see you reading a CSV as plain text, then interpreting it as CSV yourself (contents[row] = contents[row].split(',')).
I recommend two things:
break up logic into distinct chunks: all that nesting can be hard to interpret and debug; do one thing, prove that works; do another thing, prove that works; etc...
use the CSV API to its fullest: use it to both read and write your CSVs
I don't want to try to replicate or fix your code; instead, I'm offering this general approach to achieve those two goals:
import csv

# Read in
all_rows = []
with open('all.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header (I didn't see you keep it)
    for row in reader:
        all_rows.append(row)

# Process
filtered_rows = []
for row in all_rows:
    if row[2] == 'Virginia':
        filtered_rows.append(row)

# Write out
with open('filtered.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(filtered_rows)
Once you understand both the logic and the API of those discrete steps, you can move on to composing something more complex, like the following, which reads a row, decides whether it should be written, and if so writes it:
import csv

with open('filtered.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)
    with open('all.csv', 'r', newline='') as f_in:
        reader = csv.reader(f_in)
        next(reader)  # discard header
        for row in reader:
            if row[2] == 'Virginia':
                writer.writerow(row)
Using either of those two pieces of code on this (really scaled-down) sample of all.csv:
date,county,state,fips,cases,deaths
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
2020-03-09,Chelan,Washington,53007,1,1
2020-03-09,Clark,Washington,53011,1,0
gets me a filtered.csv that looks like:
2020-03-09,Fairfax,Virginia,51059,4,0
2020-03-09,Virginia Beach city,Virginia,51810,1,0
Given the size of this dataset, the second approach, writing on demand inside the read loop, is both faster (about 5x on my machine) and uses significantly less memory (about 40x less on my machine), because there is no intermediate all_rows storage. But please take the time to run both, read them carefully, and see how each works the way it does.
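If you want to check those numbers on your own data, a rough measurement sketch along these lines should do (the helper names here are made up, and 'all.csv' is assumed to be in the working directory):

import csv
import time
import tracemalloc

def read_then_write():
    # approach 1: read everything into memory, filter, then write
    with open('all.csv', 'r', newline='') as f:
        reader = csv.reader(f)
        next(reader)
        all_rows = list(reader)
    filtered_rows = [row for row in all_rows if row[2] == 'Virginia']
    with open('filtered.csv', 'w', newline='') as f:
        csv.writer(f).writerows(filtered_rows)

def write_while_reading():
    # approach 2: filter and write row by row, no intermediate list
    with open('all.csv', 'r', newline='') as f_in, \
         open('filtered.csv', 'w', newline='') as f_out:
        reader = csv.reader(f_in)
        writer = csv.writer(f_out)
        next(reader)
        for row in reader:
            if row[2] == 'Virginia':
                writer.writerow(row)

for func in (read_then_write, write_while_reading):
    tracemalloc.start()
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    print('%s: %.2fs, peak memory %.0f KiB' % (func.__name__, elapsed, peak / 1024))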

Getting the average of some digits from a CSV input file and writing the averages to an output CSV file in Python 3

I am learning Python 3 :), and I am trying to read a CSV file with different rows, take the average of the scores for each person (one per row), and write the averages to an output CSV file.
The input file is like below:
David,5,2,3,1,6
Adele,3,4,1,5,2,4,2,1
...
The output file should seem like below:
David,4.75
Adele,2.75
...
It seems that I am reading the file correctly, as I can print the average for each name in the terminal, but the output CSV file contains only the average for the last name in the input file, while I want all names and their corresponding averages written to the output CSV file. Can anybody help me with it?
import csv
from statistics import mean

these_grades = []
name_list = []
reader = csv.reader(open('input.csv', newline=''))
for row in reader:
    name = row[0]
    name_list.append(name)
    with open('result.csv', 'w', newline='\n') as f:
        writer = csv.writer(f,
                            delimiter=',',
                            quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for grade in row[1:]:
            these_grades.append(int(grade))
        for item in name_list:
            writer.writerow([''.join(item), mean(these_grades)])
            print('%s,%f' % (name, mean(these_grades)))
There are several issues in your code:
You're not using a context manager (with) when you read the input file. There's no reason to use one when writing but not when reading - as a result, you never close "input.csv".
You're using a list to store data from rows. This doesn't easily distinguish between the person's name and the scores associated with the person. It would be better to use a dictionary in which the key is the person's name, and the values stored against that key are the individual scores
You repeatedly open the file within a for loop in 'w' mode. Every time you open a file in write mode, it wipes all the previous contents. You actually do write each row to the file, but you wipe it again when you open the file on the next iteration - the small sketch below illustrates this in isolation.
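A minimal, made-up example of that third point (the file name and rows are just placeholders):

import csv

rows = [['David', 4.75], ['Adele', 2.75]]
for row in rows:
    # 'w' truncates the file on every iteration, discarding everything written before
    with open('demo.csv', 'w', newline='') as f:
        csv.writer(f).writerow(row)
# demo.csv now contains only the last row: Adele,2.75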
You can use:
import csv
import statistics

# use a context manager to read the data in too, not just for writing
with open('input.csv') as infile:
    reader = csv.reader(infile)
    data = list(reader)

# Create a dictionary to store the scores against the name
scores = {}
for row in data:
    scores[row[0]] = row[1:]  # first item in the row is the key (name), the rest are the values

with open('output.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    # iterate the dictionary and average the scores on each iteration
    for name, grades in scores.items():
        ave_score = statistics.mean([int(item) for item in grades])
        writer.writerow([name, ave_score])
This can be further consolidated, but it's less easy to see what's happening:
with open('input.csv') as infile, open('output.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        name = row[0]
        values = row[1:]
        ave_score = statistics.mean(map(int, values))
        writer.writerow([name, ave_score])

Generate a new csv file and order data in ascending numeric order?

I have written code that applies the regex below to every postcode in the 'import_data.csv' file. It then generates a new CSV file, 'failed_validation.csv', which contains all the postcodes for which validation fails. Both files have the following structure:
row_id postcode
134534 AABC 123
243534 AACD 4PQ
534345 QpCD 3DR
... ...
Following is my code:
import csv
import re

regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []

with open('../import_data.csv', 'r') as f:
    r = csv.reader(f, delimiter=',')
    for row in r:
        if not re.findall(regex, row[1]):
            codes.append([row[0], row[1]])

with open('failed_validation.csv', 'w', newline='') as fp:
    a = csv.writer(fp)
    a.writerows(codes)
The code works fine, but what I actually want is for the postcodes in the new file to be ordered by row_id, in ascending numeric order. I know how to generate a new file with Python, but I don't know how to order the data inside that file.
This will do it and preserve the header row:
import csv
import re

regex = r"(GIR\s0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9]((BR|FY|HA|HD|HG|HR|HS|HX|JE|LD|SM|SR|WC|WN|ZE)[0-9])[0-9])|([A-PR-UWYZ][A-HK-Y](AB|LL|SO)[0-9])|(WC[0-9][A-Z])|(([A-PR-UWYZ][0-9][A-HJKPSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]))))\s[0-9][ABD-HJLNP-UW-Z]{2})"
codes = []

with open('import_data.csv', 'r', newline='') as fp:
    reader = csv.reader(fp, delimiter=',')
    header = next(reader)
    for row in reader:
        if not re.findall(regex, row[1]):
            codes.append([row[0], row[1]])

with open('failed_validation.csv', 'w', newline='') as fp:
    writer = csv.writer(fp)
    writer.writerow(header)
    writer.writerows(sorted(codes))
Sort your codes list before writing to the file.
header = codes[0]
codes = sorted(codes[1:])

with open('failed_validation.csv', 'w', newline='') as fp:
    a = csv.writer(fp)
    a.writerow(header)
    a.writerows(codes)
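One caveat with both answers: sorted() compares the row_id strings lexicographically, which only matches numeric order while every id has the same number of digits (as in the sample above). If the ids can differ in length, pass a numeric key - a small illustration with made-up ids:

ids = [['9', 'AA1 1AA'], ['10', 'BB2 2BB'], ['2', 'CC3 3CC']]
print(sorted(ids))                               # ['10', ...], ['2', ...], ['9', ...] - string order
print(sorted(ids, key=lambda row: int(row[0])))  # ['2', ...], ['9', ...], ['10', ...] - numeric order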

Create subset of large CSV file and write to new CSV file

I would like to create a subset of a large CSV file using the rows whose 4th column is "DOT", and output it to a new file.
This is the code I currently have:
import csv

outfile = open('DOT.csv', 'w')
with open('Service_Requests_2015_-_Present.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    for row in reader:
        if row[3] == "DOT":
            outfile.write(row)
outfile.close()
The error is:
outfile.write(row)
TypeError: must be str, not list
How can I manipulate row so that I can just do write(row)? If that's not possible, what is the easiest way?
You can combine your two open calls, as the with statement accepts multiple context managers, like this:
import csv

infile = 'Service_Requests_2015_-_Present.csv'
outfile = 'DOT.csv'

with open(infile, encoding='utf-8') as f, open(outfile, 'w', newline='') as o:
    reader = csv.reader(f)
    writer = csv.writer(o, delimiter=',')  # adjust as necessary
    for row in reader:
        if row[3] == "DOT":
            writer.writerow(row)
    # no need for close statements
print('Done')
Make your outfile a csv.writer and use writerow instead of write.
outcsv = csv.writer(outfile, ...other_options...)
...
outcsv.writerow(row)
That is how I would do it... OR
outfile.write(",".join(row)) # comma delimited here...
In the code above you are trying to write a list with a file object; a file's write() only accepts strings, which is why you get "TypeError: must be str, not list". You could convert the list to a string first, e.g. outfile.write(str(row)) - though note that this writes the Python list representation rather than comma-separated values.
or
import csv

def csv_writer(input_path, out_path):
    # text mode with newline='' is what the csv module expects in Python 3 ('ab' would fail here)
    with open(out_path, 'a', newline='') as outfile:
        writer = csv.writer(outfile)
        with open(input_path, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            for row in reader:
                if row[3] == "DOT":
                    writer.writerow(row)

csv_writer('Service_Requests_2015_-_Present.csv', 'DOT.csv')
[This code is for Python 3. In Python 2.7, the open function does not take a newline argument, which would raise a TypeError.]
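As a general rule (not specific to this question's files), the csv module wants files opened in binary mode on Python 2 and in text mode with newline='' on Python 3 - a small sketch with an illustrative file name:

import csv

# Python 2: open CSV files in binary mode
#   with open('example.csv', 'wb') as f:
#       csv.writer(f).writerow(['a', 'b'])

# Python 3: open in text mode and suppress newline translation
with open('example.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['a', 'b'])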

How to Sort a Column in an Excel sheet

So I understand how sorting works in Python. If I put...
a = (["alpha2A", "hotel2A", "bravo2C", "alpha2B", "tango3B", "alpha3A", "zulu.A1", "foxtrot8F", "zulu.B1"]
a.sort()
print a
I will get...
['alpha2A', 'alpha2B', 'alpha3A', 'bravo2C', 'foxtrot8F', 'hotel2A', 'tango3B', 'zulu.A1', 'zulu.B1']
However, I want to sort a column in an Excel sheet, so I tried...
isv = open("case_name.csv", "w+")
a = (["case_name.csv"[2]])
a.sort()
print a
And got a return of...
['s']
I understand that it is returning the 3rd letter in the file name but how do I make it sort and return the entire column of the Excel sheet?
Update: New Code
import csv
import operator

with open('case_name.csv') as infile:
    data = list(csv.reader(infile, dialect=csv.excel_tab))

data.sort(key=operator.itemgetter(2))

with open('case_name_sorted.csv', 'w') as outfile:
    writer = csv.writer(outfile, dialect='excel')
    writer.writerows(data)

print(sum(1 for row in data if len(row) < 3))
And it returns
data = list(csv.reader(infile, dialect=csv.excel_tab))
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
import csv
import operator

# read the data from the source file
with open('case_name.csv') as infile:
    data = list(csv.reader(infile, dialect='excel'))

# sort the list of sublists by the item at index 2
data.sort(key=operator.itemgetter(2))

# time to write the results into a file
with open('case_name_sorted.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile, dialect='excel')
    writer.writerows(data)
