I have an Excel spreadsheet saved as a CSV file, but I cannot find a way to read individual cell values into Python using the csv module. Any help would be greatly appreciated.
There is also a Python library capable of reading xls data. Have a look at python-xlrd.
For writing xls data, you can use python-xlwt.
The csv module provides readers that iterate over the rows of a CSV file; each row is a list of strings. One way to get access to individual cells would be to:
Read the entire file in as a list of lists
import csv
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access the individual cells by indexing into the_whole_file. The first index is the row and the second index is the column; both are zero-based. To access the cell at the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
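The same indexing can be tried without a file on disk. This is a minimal sketch, assuming a small made-up table held in an io.StringIO in place of test.csv; the sample values are hypothetical:

```python
import csv
import io

# Hypothetical data standing in for test.csv.
sample = "name,qty,price,city\nbolt,10,0.25,Oslo\nnut,40,0.10,Bergen\n"

# Read everything into a list of lists, exactly as with a real file.
the_whole_file = list(csv.reader(io.StringIO(sample)))

row = 1     # second row (zero-based)
column = 3  # fourth column (zero-based)
cell = the_whole_file[row][column]
print(cell)  # Oslo
```

Every cell comes back as a string, so a numeric value such as the_whole_file[2][1] needs an explicit int() or float() before it can be used in calculations.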
If you have the Excel file as a CSV, you can use csv.reader:
import csv
myFilePath = "/Path/To/Your/File"
with open(myFilePath, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # 'row' has all the cells (thanks to wwii for the fix!). Get the first 4 columns
        a, b, c, d = row[:4]
I'm a beginner in Python. I'm trying to read a CSV file and extract some of the results into another file:
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range. It happens when I select a row which doesn't exist. However, my CSV has 5 columns and I can't isolate any of them.
Use the pandas library to read the file.
Make sure you use the correct encoding for the CSV file.
import pandas as pd
data = pd.read_csv("file_name.csv")
data.head()  # returns the first 5 rows
# for only the first row
data.head(1)
Check this, and you'll get the answer to your question.
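As for the IndexError in the question itself, one possible cause (an assumption, since the file is not shown) is a blank line in the CSV: csv.reader returns an empty list for a blank line, so row[0] fails on it. A minimal sketch that skips such rows, using made-up sample data in place of test.csv:

```python
import csv
import io

# Hypothetical sample standing in for test.csv; note the blank line.
sample = "a,b,c,d,e\n1,2,3,4,5\n\n6,7,8,9,10\n"

first_column = []
for row in csv.reader(io.StringIO(sample), delimiter=','):
    if not row:  # a blank line yields an empty list
        continue
    first_column.append(row[0])

print(first_column)
```

If the file actually uses another delimiter, such as a semicolon, each row comes back as a single cell, so row[0] works but row[1] raises the same error; in that case pass the matching delimiter to csv.reader.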
I'm trying to convert a CSV file into a dictionary for calculation purposes and print it in exactly the same order (e.g. a, b, c, d, etc.) as in the original CSV file.
I just learnt about csv.DictReader(file) and tried it, but I realized that every time I print, the order of the rows and columns keeps changing, or I would say it gets mixed up randomly.
Is there a solution for this, or is there a part I did wrong?
import csv
with open("breast_cancer_v1.csv", 'r') as file:
    csv_file = csv.DictReader(file)
    for row in csv_file:
        print(dict(row))
I had named my column headers "a, b, c ,d ,e ,f ,h, i, j, k, l, m" inside my CSV, but the printed order gets mixed up as below:
output
Notice that the dictionary keys are not in order, and my rows have the same problem if I have too many rows in the CSV (e.g. 100+ rows).
My original csv data looks like this:
Or if you really want to print it as a dictionary, then you have to give each column a name with the fieldnames option:
import csv
with open("artist.csv", newline='') as file:
    fieldnames = ['first_col', 'sec_col',...,'last_col']
    csv_file = csv.DictReader(file, fieldnames=fieldnames)
    for row in csv_file:
        print(row)
The order you list the fieldnames in will be the order they appear in. You'll have to remove the original column header in your csv file.
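Instead of editing the file to remove the header, the header line can also be skipped in code before the reader is created. This is a minimal sketch, assuming a made-up three-column file whose header fits on one line; the sample data and field names are hypothetical:

```python
import csv
import io

# Hypothetical three-column sample; the first line is the original header.
sample = "a,b,c\n1,2,3\n4,5,6\n"
fieldnames = ['first_col', 'sec_col', 'last_col']

with io.StringIO(sample) as file:
    next(file)  # skip the original header line instead of deleting it
    rows = list(csv.DictReader(file, fieldnames=fieldnames))

for row in rows:
    print(list(row.items()))
```

On Python 3.7 and later, plain dictionaries keep their insertion order, so the keys print in the order given in fieldnames; the shuffled output described in the question is what older versions with unordered dictionaries produce.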
If you want to print in order, just use this:
import csv
with open("breast_cancer_v1.csv", 'r') as file:
    for row in file:
        print(row)
I'm trying to combine a bunch of csv files. Each csv file has a different number of columns. This is not a problem, I can easily loop through the files and pull in all the column headers, pasting them into an empty file to use as a base.
The problem I'm having is that the column headers are on different rows in each file.
For example:
Table1
Random Text
!,Header1,Header2,Header3
*,123,124,5235
*,124,15,23624
*,135,677,234
Table2
Random Text
Random Text
!,Header1,Header2,Header4
*,124,2156,7478
*,126,12357,547
*,237,12,267
Output:
Table,Header1,Header2,Header3,Header4
Table1,123,124,5235
Table1,124,15,23624
Table1,135,677,234
Table2,124,2156,7478
Table2,126,12357,547
Table2,237,12,267
My existing code looks something like this:
import csv
import glob
files = glob.glob(r'//Directory/*.csv')
# This block goes through each file and works out which variables exist
variablelist = []
for f in files:
    with open(f, 'r') as csvfile:
        read_rows = csv.reader(csvfile)
        for row in read_rows:
            if row[0] != "*":  # The last row with no * in column 1 is the header row
                rowlist = row
    variablelist.extend(x for x in rowlist if x not in variablelist)
variablelist.sort()
I use the fact that the header row is the last row without a * in the first column. I work out which row the headers are on and then store the header names in a list - combining the same list from all files.
I then try and combine the files together using this code that I found by searching this website:
with open("out.csv", "w", newline="") as f_out:  # Comment 2 below
    writer = csv.DictWriter(f_out, fieldnames=variablelist)
    for f in files:
        with open(f, "r", newline="",) as f_in:
            reader = csv.DictReader(f_in)  # Uses the field names in this file
            for line in reader:
                # Comment 3 below
                writer.writerow(line)
The problem is, I don't know how to deal with the headers being on different lines. I tried using code to find the header row number, but I don't know how to work this into the code above. (Can DictReader skip a dynamic number of rows before finding the headers?)
with open(f,'r') as csvfile:
    read_rows = csv.reader(csvfile)
    header_row_number = 0
    for row in read_rows:
        if row[0]!="*":
            header_row_number=read_rows.line_num
Any help would be much appreciated
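One way to handle headers on different lines, sketched under the assumptions stated in the question (data rows start with *, and the header is the last row that does not): read each file with a plain csv.reader, remember the most recent non-* row as the header, and build the dictionaries by hand instead of relying on csv.DictReader. The function names parse_table and combine are hypothetical, the table name is assumed to be supplied by the caller (for example from the file name), and in-memory strings stand in for the real files:

```python
import csv
import io

def parse_table(lines):
    """Return (header, data_rows), dropping the marker column."""
    header, data = [], []
    for row in csv.reader(lines):
        if not row:
            continue              # ignore blank lines
        if row[0] == "*":
            data.append(row[1:])  # a data row
        else:
            header = row[1:]      # the last non-* row wins
    return header, data

def combine(tables, out):
    """Write all tables to out; tables maps a name to its lines."""
    parsed = {name: parse_table(lines) for name, lines in tables.items()}
    fieldnames = ["Table"]
    for header, _ in parsed.values():
        fieldnames.extend(h for h in header if h not in fieldnames)
    # restval fills columns that a given table does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for name, (header, data) in parsed.items():
        for values in data:
            record = dict(zip(header, values))
            record["Table"] = name
            writer.writerow(record)

# Shortened versions of the two sample files from the question.
table1 = "Table1\nRandom Text\n!,Header1,Header2,Header3\n*,123,124,5235\n"
table2 = ("Table2\nRandom Text\nRandom Text\n"
          "!,Header1,Header2,Header4\n*,124,2156,7478\n")

out = io.StringIO()
combine({"Table1": io.StringIO(table1), "Table2": io.StringIO(table2)}, out)
combined_lines = out.getvalue().splitlines()
print("\n".join(combined_lines))
```

With real files, each value in the tables mapping would be a file object opened with newline="", and out would be the output file opened with "w" and newline="". Columns missing from a table are left empty, so every value stays under its own header.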
I am trying to copy a few columns from a CSV file to a new CSV file. I have written the code below, but it does not give me the expected output. Can someone please help me get the required results?
import csv
f = csv.reader(open("C:/Users/...../file.csv","rb"))
f2= csv.writer(open("C:/Users/.../test123.csv","wb"))
for row in f:
    for column in row:
        f2.writerow((column[1],column[2],column[3],column[7]))
f.close()
f2.close()
The second iteration over each row is not necessary. Just access the columns in that row with the column index.
Also, csv reader and writer objects have no close method; it is the underlying file objects that need closing.
import csv
with open("file.csv", newline="") as fin, open("test123.csv", "w", newline="") as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow((row[1], row[2], row[3], row[7]))
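To see why the inner loop in the question goes wrong, the fix can be run on in-memory data. This is a minimal sketch, assuming a made-up single row of eight cells in place of file.csv:

```python
import csv
import io

# Hypothetical 8-column sample row standing in for file.csv.
source = io.StringIO("c0,c1,c2,c3,c4,c5,c6,c7\n")
target = io.StringIO()

writer = csv.writer(target)
for row in csv.reader(source):
    # row is a list of strings, so row[1] is a whole cell.
    # Looping over row again would make column[1] a single character.
    writer.writerow((row[1], row[2], row[3], row[7]))

result = target.getvalue()
print(result)
```

Rows with fewer than eight cells would still raise an IndexError at row[7], so real data may need a length check first.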
I have 125 data files containing two columns and 21 rows of data and I'd like to import them into a single .csv file (as 125 pairs of columns and only 21 rows).
This is what my data files look like:
I am fairly new to python but I have come up with the following code:
import glob
Results = glob.glob('./*.data')
fout='c:/Results/res.csv'
fout=open ("res.csv", 'w')
for file in Results:
    g = open( file, "r" )
    fout.write(g.read())
    g.close()
fout.close()
The problem with the above code is that all the data are copied into only two columns with 125*21 rows.
Any help is very much appreciated!
This should work:
import glob
files = [open(f) for f in glob.glob('./*.data')]  # Make list of open files
with open("res.csv", 'w') as fout:
    for row in range(21):
        # strip removes the trailing newline; join avoids a trailing comma
        fout.write(','.join(f.readline().strip() for f in files))
        fout.write('\n')
for f in files:
    f.close()
Note that this method will probably fail if you try a large number of files, since the operating system limits how many files a process can have open at once; I believe the default is 256 on some systems.
You may want to try the python CSV module (http://docs.python.org/library/csv.html), which provides very useful methods for reading and writing CSV files. Since you stated that you want only 21 rows with 250 columns of data, I would suggest creating 21 python lists as your rows and then appending data to each row as you loop through your files.
Something like:
import csv
rows = []
for i in range(0,21):
    row = []
    rows.append(row)
# Not sure about the structure of your input files or how they are delimited,
# but for each one, as you have it open and iterate through its rows, you would
# want to append the values in each row to the end of the corresponding list
# contained within the rows list.
# Then, write each row to the new csv:
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f, delimiter=',')
    for row in rows:
        writer.writerow(row)
(Sorry, I cannot add comments, yet.)
[Edited later: the following statement is wrong!] "davesnitty's loop that generates the rows can be replaced by rows = [[]] * 21." It is wrong because this would create a list whose 21 elements all refer to one and the same empty list, rather than 21 separate empty lists.
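The difference is easy to check. A minimal sketch comparing the multiplied list with a list comprehension; the names shared and separate are only for illustration:

```python
# All 21 elements refer to the same inner list.
shared = [[]] * 21
shared[0].append('x')

# A list comprehension creates 21 separate inner lists.
separate = [[] for _ in range(21)]
separate[0].append('x')

print(len(shared[20]), len(separate[20]))  # prints: 1 0
```

Appending to one element of shared shows up in every row, which is why the explicit loop (or the list comprehension) is needed here.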
My +1 for using the standard csv module. But the files should always be closed, especially when you open that many of them. Also, there is a bug: the code only writes the result, and the part that reads the rows from the files is actually missing. Basically, each row read from a file should be appended to the sublist for its line number. The line number can be obtained via enumerate(reader), where reader is csv.reader(fin, ...).
[added later] Try the following code, and fix the paths for your purpose:
import csv
import glob
import os
datapath = './data'
resultpath = './result'
if not os.path.isdir(resultpath):
    os.makedirs(resultpath)
# Initialize the empty rows. It does not check how many rows are
# in the file.
rows = []
# Read data from the files to the above matrix.
for fname in glob.glob(os.path.join(datapath, '*.data')):
    with open(fname, newline='') as f:
        reader = csv.reader(f)
        for n, row in enumerate(reader):
            if len(rows) < n+1:
                rows.append([])  # add another row
            rows[n].extend(row)  # append the elements from the file
# Write the data from memory to the result file.
fname = os.path.join(resultpath, 'result.csv')
with open(fname, 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)