Python: adding the same string to the end of each row - python

I'm modifying a CSV file with two fieldnames and I would like to add a string to the end of each row under an additional fieldname. I've figured out how to add the fieldname, but not how to add things to it. It seems like it should be simple, but I'm stumped.
import csv

with open('test2l.csv', 'r') as inny:
    reader = csv.DictReader(inny)
    with open('outfile.csv', 'w') as outty:
        fieldnames = ["Genus", "Species", "Source"]
        writer = csv.DictWriter(outty, fieldnames=fieldnames)
        writer.writeheader()
        for record in reader:
            g = record['Genus']
            s = record['Species']
Everything I tried has just added the string to the existing string in 'Species' and I haven't been able to create a record for 'Source', I assume because it's empty.
Thanks!

If you haven't already, check out the documentation for csv.Reader, csv.DictReader, and csv.DictWriter.
The documentation indicates that reader objects operate on an iterable object using the iterator protocol. Iterating once over the reader (with for row in reader:, for example) to add a "Source" value exhausts the underlying iterator, so attempting to reuse the same reader with a writer later will silently produce no rows.
To work around this, you could create a list:
rows = list(csv.DictReader(inny))
While this exhausts the iterator, you now have a list to work with. However, this might not be ideal if the list is very long.
Another solution would be to add the Source and write the row during the same iteration:
for row in reader:
    row['Source'] = 'same string every time'
    writer.writerow(row)
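Putting the pieces together, a minimal end-to-end sketch might look like this (file names taken from the question; the "Source" string is a placeholder):

```python
import csv

def add_source(in_path, out_path, source_value):
    """Copy a two-column CSV, appending a constant "Source" column to every row."""
    with open(in_path, newline='') as inny, \
            open(out_path, 'w', newline='') as outty:
        reader = csv.DictReader(inny)
        writer = csv.DictWriter(outty, fieldnames=["Genus", "Species", "Source"])
        writer.writeheader()
        for record in reader:
            record["Source"] = source_value  # same string every time
            writer.writerow(record)
```

Reading and writing happen in the same pass, so the reader is only iterated once.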

Simply write:
for record in reader:
    record["Source"] = "whateversource"
And do so before the step where you are writing to a file.


Accessing Data in csv.reader

I'm trying to access a csv file of currency pairs using csv.reader. The first column shows dates; the first row shows the currency pair, e.g. USD/CAD. I can read in the file but cannot access the currency pairs data to perform simple calculations.
I've tried using next(x) to skip the header row (the currency pairs). If I do this, I get a TypeError saying the csv reader object is not subscriptable.
path = x
file = open(path)
dataset = csv.reader(file, delimiter='\t')
header = next(dataset)
header
Output shows the header row which is
['Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR']
I expect to be able to access the underlying currency pairs, but I'm getting the type error as noted above. Is there a simple way to access the currency pairs? For example, I want to use USD.describe() to get simple statistics on the USD currency pair.
How can I move from this stage to accessing the data underlying the header row?
Try this example (note that it's csv.reader, lowercase, not csv.Reader):
import csv

with open('file.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        print(f'\t{row[0]} {row[1]} {row[3]}')
It's apparent from the output of your header row that the columns are comma-delimited rather than tab-delimited, so instead of passing delimiter = '\t' to csv.reader, you should let it use the default delimiter ',' instead:
dataset = csv.reader(file)
If you need to compute some statistics, pandas is your friend. No need to use the csv module, use pandas.read_csv.
import pandas
filename = 'path/of/file.csv'
dataset = pandas.read_csv(filename, sep = '\t') #or whatever the separator is
pandas.read_csv uses the first line as the header automatically.
To see statistics, simply do:
dataset.describe()
Or for a single column:
dataset['column_name'].describe()
Are you sure that your delimiter is '\t'? In the first row your delimiter is ','... Anyway, you can skip the first row by calling file.readline() before handing the file to csv.reader:
import csv
example = """Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR
1-2-3\tabc\t1.1\t1.2
4-5-6\txyz\t2.1\t2.2
"""
with open('demo.csv', 'w') as f:
    f.write(example)

with open('demo.csv') as f:
    f.readline()  # consume the header line
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)
        # ['1-2-3', 'abc', '1.1', '1.2']
        # ['4-5-6', 'xyz', '2.1', '2.2']
I think that you need something else... Can you add to your question:
example of first 3 lines in your csv
Example of what you'd like to access:
is using row[0], row[1] enough for you?
or do you want "named" access like row['Date'], row['USD']?
or do you want something more complex like data_by_date['2019-05-01']['USD']?
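If "named" access is what you're after, csv.DictReader already provides it once the correct (comma) delimiter is used. A small in-memory sketch with made-up numbers:

```python
import csv
import io

# Hypothetical sample in the same shape as the question's header (comma-separated).
sample = "Date,USD,CNY\n2019-05-01,1.34,6.70\n2019-05-02,1.35,6.82\n"

reader = csv.DictReader(io.StringIO(sample))  # default delimiter is ','
usd = [float(row['USD']) for row in reader]
print(usd)  # [1.34, 1.35]
```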

Making Python ignore CSV separator instruction [duplicate]

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in incsv)
    least_value = min(data)
    print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
    print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
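As a quick in-memory check of the skip-the-header-then-take-the-minimum pattern (sample numbers are made up):

```python
import csv
import io

sample = "col0,col1\nx,3.5\ny,1.25\nz,2.0\n"
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
least_value = min(float(row[1]) for row in reader)
print(least_value)  # 1.25
```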
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
Borrowed from the Python Cookbook, a more concise template might look like this:
import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)
    for row in f_csv:
        ...  # process row
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.
with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:
from itertools import islice

for row in islice(incsv, 30, None):
    ...  # process
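For instance, skipping the first two rows of a small in-memory CSV (sample data made up):

```python
import csv
import io
from itertools import islice

sample = "h1,h2\nskip,me\na,b\nc,d\n"
reader = csv.reader(io.StringIO(sample))
rows = list(islice(reader, 2, None))  # drop the first 2 rows
print(rows)  # [['a', 'b'], ['c', 'd']]
```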
Use csv.DictReader instead of csv.reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names; you would then be able to access field values by name, e.g. row["1"] (since the top row here holds column numbers).
Python 2.x
csvreader.next()
Return the next row of the reader’s iterable object as a list, parsed
according to the current dialect.
csv_data = csv.reader(open('sample.csv'))
csv_data.next()  # skip first row
for row in csv_data:
    print(row)  # should print the second row
Python 3.x
csvreader.__next__()
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
csv_data = csv.reader(open('sample.csv'))
csv_data.__next__()  # skip first row
for row in csv_data:
    print(row)  # should print the second row
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
sniffer = csv.Sniffer()
csvfile.seek(0)
has_header = sniffer.has_header(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
if has_header:
    next(reader)  # skip the header row
for data_row in reader:
    ...  # do something with the row
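A runnable sketch combining sniff() and has_header() (the function name is my own):

```python
import csv

def read_data_rows(path):
    """Return the data rows of a CSV file, skipping the first row
    only when Sniffer thinks it is a header."""
    with open(path, newline='') as csvfile:
        sample = csvfile.read(1024)
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample)
        csvfile.seek(0)
        reader = csv.reader(csvfile, dialect)
        if sniffer.has_header(sample):
            next(reader)  # skip the header row
        return list(reader)
```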
This might be a very old question, but with pandas we have a very easy solution:
import pandas as pd

data = pd.read_csv('all16.csv', skiprows=1)
data['column'].min()
With skiprows=1 we skip the first row, and we can then find the least value using data['column'].min().
The new 'pandas' package might be more relevant than 'csv'. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Because this is related to something I was doing, I'll share here.
What if we're not sure if there's a header and you also don't feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
# Let's say there are 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # read the first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know the name of the header at column index one, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Starting with the data, skipping the header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
I would convert csvreader to list, then pop the first element
import csv
with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
    data.pop(0)  # Removes the first row
    for row in data:
        print(row)
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
Just add [1:], example below:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]
That works for me in IPython.
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header; there is also a bug with the UTF-8 BOM (the first character in the file).
This works for me using only the csv module:
import csv

def read_csv(csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove the UTF-8 BOM (the first character in the file).
        txt = f.read()[1:]
        # Split off the header line.
        header = txt.splitlines()[:1]
        lines = txt.splitlines()[1:]
        # Convert to a list of rows.
        csv_rows = list(csv.reader(lines, delimiter=delimiter))
        for row in csv_rows:
            value = row[INDEX_HERE]
A simple solution is to use csv.DictReader():
import csv

def read_csv(path):
    with open(path, 'r') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"])  # replace with the name of a column header

Code swap. How would I swap the value of one CSV file column to another?

I have two CSV files. The first file(state_abbreviations.csv) has only states abbreviations and their full state names side by side(like the image below), the second file(test.csv) has the state abbreviations with additional info.
I want to replace each state abbreviation in test.csv with its associated state full name from the first file.
My approach was to read each file and build a dict of the first file (state_abbreviations.csv), then read the second file (test.csv) and, where an abbreviation matches the first file, replace it with the full name.
Any help is appreciated.
import csv

state_initials = ("state_abbr")
state_names = ("state_name")

state_file = open("state_abbreviations.csv", "r")
state_reader = csv.reader(state_file)
headers = None
final_state_initial = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_initial.append(row[0])
print final_state_initial

headers = None
final_state_abbre = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_abbre.append(row[1])
print final_state_abbre

state_dictionary = dict(zip(final_state_initial, final_state_abbre))
print state_dictionary
You almost got the approach right: building a dict out of the abbreviations is the easiest way to do this:
with open("state_abbreviations.csv", "r") as f:
    # you can use csv.DictReader() instead, but let's strive for performance
    reader = csv.reader(f)
    next(reader)  # skip the header
    # assuming the first column holds the abbreviation, the second the full state name
    state_map = {state[0]: state[1] for state in reader}
Now you have state_map containing a map of all your state abbreviations, for example: state_map["FL"] contains Florida.
To replace the values in your test.csv, though, you'll either have to load the whole file into memory, parse it, do the replacement and save it, or create a temporary file, stream-write the changes to it, and then overwrite the original file with the temporary file. Assuming that test.csv is not too big to fit into your memory, the first approach is much simpler:
with open("test.csv", "r+U") as f:  # open the file in read-write mode
    # again, you can use csv.DictReader() for convenience, but this is significantly faster
    reader = csv.reader(f)
    header = next(reader)  # get the header
    rows = []  # hold our rows
    if "state" in header:  # proceed only if a `state` column is found in the header
        state_index = header.index("state")  # find the state column index
        for row in reader:  # read the CSV row by row
            current_state = row[state_index]  # get the abbreviated state value
            # replace the abbreviation if it exists in our state_map
            row[state_index] = state_map.get(current_state, current_state)
            rows.append(row)  # append the processed row to our `rows` list
        # now let's overwrite the file with the updated data
        f.seek(0)  # seek to the file beginning
        f.truncate()  # truncate the rest of the content
        writer = csv.writer(f)  # create a CSV writer
        writer.writerow(header)  # write back the header
        writer.writerows(rows)  # write our modified rows
It seems like you are trying to go through the file twice? This is absolutely not necessary: the first time you go through you are already reading all the lines, so you can then create your dictionary items directly.
In addition, a comprehension can be very useful when creating lists or dictionaries, though in this case it might be a bit less readable. The alternative would be to create an empty dictionary, write a regular for-loop, and add all the key:value pairs manually (i.e. with state_dict[row['state_abbr']] = row['state_name']).
Finally, I used the with statement when opening the file to ensure it is safely closed when we're done with it. This is good practice when opening files.
import csv

with open("state_abbreviations.csv") as state_file:
    state_reader = csv.DictReader(state_file)
    state_dict = {row['state_abbr']: row['state_name'] for row in state_reader}

print(state_dict)
Edit: note that, like the code you showed, this only creates the dictionary that maps abbreviations to state names. Actually replacing them in the second file would be the next step.
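That next step might look roughly like this (the column name 'state' and the file names are assumptions; writing to a separate output file keeps the sketch simple):

```python
import csv

def replace_abbreviations(state_dict, in_path, out_path):
    """Rewrite in_path to out_path, expanding the 'state' column via state_dict."""
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # fall back to the original value if the abbreviation is unknown
            row['state'] = state_dict.get(row['state'], row['state'])
            writer.writerow(row)
```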
Step 1: Ask Python to remember the abbreviations and their full names; we use a dictionary for that
with open('state_abbreviations.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)  # skip the header
    abs = {r[0]: r[1] for r in csvreader}  # note: this name shadows the built-in abs()
Step 2: Replace the abbreviations with full names and write the result to an output file, here "test_output.csv"
with open('test.csv', 'r') as reading:
    csvreader = csv.reader(reading)
    next(csvreader)  # skip the header
    header = ['name', 'gender', 'birthdate', 'address', 'city', 'state']
    with open('test_output.csv', 'w') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for a in csvreader:
            # writerow() takes a single sequence, so wrap the fields in a list
            writer.writerow([a[0], a[1], a[2], a[3], a[4], abs[a[5]]])

Sorting a table in python

I am creating a league table for a 6 a side football league and I am attempting to sort it by the points column and then display it in easygui. The code I have so far is this:
data = csv.reader(open('table.csv'), delimiter=',')
sortedlist = sorted(data, key=operator.itemgetter(7))
with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    for row in sortedlist:
        fileWriter.writerow(row)
        os.remove("table.csv")
        os.rename("Newtable.csv", "table.csv")
        os.close
The number 7 refers to the points column in my csv file. The problem is that Newtable only contains the information for the team with the highest points, and table.csv is apparently being used by another process and so cannot be removed.
If anyone has any suggestions on how to fix this it would be appreciated.
If the indentation in your post is actually the indentation in your script (and not a copy-paste error), then the problem is obvious:
os.rename() is executed during the for loop (which means that it's called once per line in the CSV file!), at a point in time where Newtable.csv is still open (not by a different process but by your script itself), so the operation fails.
You don't need to close f, by the way - the with statement takes care of that for you. What you do need to close is data - that file is also still open when the call occurs.
Finally, since a csv object contains strings, and strings are sorted alphabetically, not numerically (so "10" comes before "2"), you need to sort according to the numerical value of the string, not the string itself.
You probably want to do something like
with open('table.csv', 'rb') as infile:
    data = csv.reader(infile, delimiter=',')
    # next(data) reads the header before sorting the rest
    sortedlist = [next(data)] + sorted(data, key=lambda x: int(x[7]))  # or float?

with open("Newtable.csv", "wb") as f:
    fileWriter = csv.writer(f, delimiter=',')
    fileWriter.writerows(sortedlist)  # No for loop needed :)
os.remove("table.csv")
os.rename("Newtable.csv", "table.csv")
I'd suggest using pandas:
Assuming an input file like this:
team,points
team1, 5
team2, 6
team3, 2
You could do:
import pandas as pd

a = pd.read_csv('table.csv')
b = a.sort_values('points', ascending=False)  # DataFrame.sort was removed; use sort_values
b.to_csv('table.csv', index=False)

Add rows to a csvfile without creating an intermediate copy

How can I add rows to a csvfile by editing in place? I want to avoid the pattern of writing to a temp file and then replacing the original, (pseudocode):
add_records_to_csv(newdata, infile, tmpfile)
delete(infile)
rename(tmpfile, infile)
Here's the actual function. The lines "# <--" are what I want to get rid of and/or condense into something more straightforward:
def add_records_to_csv(dic, csvfile):
    """ Append a dictionary to a CSV file.
    Adapted from http://pymotw.com/2/csv/
    """
    f_old = open(csvfile, 'rb')                        # <--
    csv_old = csv.DictReader(f_old)                    # <--
    fpath, fname = os.path.split(csvfile)              # <--
    csvfile_new = os.path.join(fpath, 'new_' + fname)  # <--
    print(csvfile_new)                                 # <--
    f = open(csvfile_new, 'wb')                        # <--
    try:
        fieldnames = sorted(set(dic.keys() + csv_old.fieldnames))
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        headers = dict((n, n) for n in fieldnames)
        writer.writerow(headers)
        for row in csv_old:
            writer.writerow(row)
        writer.writerow(dic)
    finally:
        f_old.close()
        f.close()
    return csvfile_new
This is not going to be possible in general. Here is the reason, from your code:
fieldnames = sorted(set(dic.keys() + csv_old.fieldnames))
To me, this says that at least in some cases your new row contains columns that were not in the previous rows. When you add a row like this, you will have to update the header of the file (the first line), in addition to appending new rows at the end. If you need to have the column names in alphabetized order, then you may have to rearrange the fields in all the other rows in order to retain the ordering of the columns.
Because you may need to edit the first line of the file, in addition to appending new lines at the end and possibly editing all the lines in-between, there isn't a reasonable way to make this work in-place.
My suggestion is to try and figure out, ahead of time, all the fields/columns that you may need to include so that you guarantee your program will never have to edit the header and can simply add new rows.
If your new row has the same structure as the existing records the following will work:
import csv

def append_record_to_csv(dic, csvfile):
    with open(csvfile, 'rb') as f:
        # discover the order of the field names in the header row
        fieldnames = next(csv.reader(f))
    with open(csvfile, 'ab') as f:
        # assumes that dic contains only field names present in the csv file
        dwriter = csv.DictWriter(f, fieldnames=fieldnames)
        dwriter.writerow(dic)
On the other hand, if your new row has a different structure than the existing rows, a csv file is probably the wrong file format. In order to add a new column to a csv file, every row needs to be edited. The performance of this approach is very bad and will be quite noticeable with a large csv file.
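To illustrate the rewrite such a change forces, here is a hedged sketch (the helper is my own invention) of the merge step: union the field names and pad missing values with '' so csv.DictWriter could emit every column:

```python
import csv

def merge_rows(rows, dic):
    """Union the field names of existing dict-rows with a new dict-row,
    padding missing values with '' so csv.DictWriter can emit every column."""
    fieldnames = sorted(set().union(dic, *rows))
    padded = [{name: r.get(name, '') for name in fieldnames} for r in rows + [dic]]
    return fieldnames, padded
```

Note that every existing row gains the new column, which is exactly why the whole file must be rewritten.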
