I'm a beginner in Python. I'm trying to read a CSV file and extract some of the results into another file:
import csv

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range, which happens when an index that doesn't exist is selected. However, my CSV has 5 columns and I can't isolate any of them.
Use the pandas library for reading the file. Make sure the encoding of the CSV file is correct.
import pandas as pd

data = pd.read_csv("file_name.csv")
data.head()   # prints the first 5 rows
data.head(1)  # for 1 row
Check this, and you'll get the answer to your question.
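As a rough sketch of the original task (extracting one column into another file), where the output file name, the column position, and the encoding used here are placeholders rather than values from the question:

import pandas as pd

data = pd.read_csv("test.csv", encoding="utf-8")   # pass the file's actual encoding here
first_column = data.iloc[:, 0]                     # select the first column by position
first_column.to_csv("extracted.csv", index=False)  # write it out to a new CSV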
Good evening,
I'm having a problem with some code I'm writing, and I would love to get advice. I want to do the following:
Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
Write a new .csv
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns; the first 5 columns are numerical values and the last column contains text.
Here is my code:
import csv

directory = "/media/gman/Folder1/processed/test_removal1.csv"

with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)
When I run this I get the following error message:
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
    for i in reader:
Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
#IMPORT LIBRARIES
import pandas as pd
#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()
data.drop(data[data.iloc[:,2] < -100000].index, inplace=True) #remove rows whose altitude value is below -100,000 meters.
# this is to remove the -3.402823E038 meter altitude values that keep coming up.
#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv') #export good data to this file.
I went with pandas to remove rows based on a logical condition, which gave me a DataFrame. I then exported the DataFrame to a new CSV file.
I am working on an implementation of a data mining algorithm in Python. I have a large CSV file which I am using as the input file to get the itemsets. I want to split the CSV file into rows programmatically. Can someone tell me how to do that?
import pandas as pd

pd.read_csv(file_name, sep=',')  # sep is the field delimiter; each line of the file becomes one row
See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html for details.
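If the goal is then to walk through the data one row at a time, a rough sketch (the file name here is only a placeholder) could be:

import pandas as pd

df = pd.read_csv("itemsets.csv")  # placeholder file name
for _, row in df.iterrows():      # iterate over the DataFrame one row at a time
    items = row.tolist()          # the row as a plain Python list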
I assume the rows are delimited by newlines and the columns are delimited by commas. In that case Python already knows how to read the file line by line, which in your case means row by row. Each row can then be split where the commas are.
item_sets = []  # will put the data in here

with open(filename, "r") as file:  # open the file
    for data_row in file:  # get data one row at a time
        # split up the row into columns, stripping whitespace from each one
        # and store it in item_sets
        item_sets.append([x.strip() for x in data_row.split(",")])
import csv

with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(row)
This will print out all rows of a CSV file as lists.
I assume the pandas implementation of read_csv is more efficient, but the csv module is built into Python, so if you don't want any dependencies you can use it.
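For comparison, a rough pandas equivalent of the loop above (reusing the example's file name and space delimiter; header=None is an assumption that the file has no header row) might be:

import pandas as pd

df = pd.read_csv('eggs.csv', sep=' ', header=None)  # header=None: treat the first line as data
print(df.values.tolist())                           # every row of the file as a list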
I am trying to copy a few columns from a CSV file to a new CSV file. I have written the code below to do this, but it is not giving me the expected output. Can someone please help me get the required results?
import csv

f = csv.reader(open("C:/Users/...../file.csv", "rb"))
f2 = csv.writer(open("C:/Users/.../test123.csv", "wb"))

for row in f:
    for column in row:
        f2.writerow((column[1], column[2], column[3], column[7]))

f.close()
f2.close()
The second iteration over each row is not necessary. Just access the columns in that row by their index. Also, csv reader and writer objects don't have a close method; that belongs to the underlying file objects.
import csv

f = csv.reader(open("file.csv", "rb"))
f2 = csv.writer(open("test123.csv", "wb"))

for row in f:
    f2.writerow((row[1], row[2], row[3], row[7]))
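If you also want the files closed cleanly, a variation using with blocks (written for Python 3, where the csv module expects text-mode files opened with newline='' instead of "rb"/"wb") could look like this:

import csv

with open("file.csv", newline='') as fin, open("test123.csv", "w", newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for row in reader:
        writer.writerow((row[1], row[2], row[3], row[7]))  # keep only these four columns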
I have some code to read a CSV file by row:
import csv

with open('example.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row)
        print(row[0])
But I want only selected columns. What is the technique? Could anyone give me a script?
import csv

with open('example.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    column_one = [row[0] for row in readCSV]
This will give you a list of the values from the first column. That being said, you'll still have to read the entire file anyway.
You can't do that, because files are written byte by byte to your filesystem. To know where one line ends, you have to read the whole line and detect the line-break character. There's no way around this with a CSV.
So you'll have to read all the file -- but you can choose which parts of each row you want to keep.
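A minimal sketch of that idea, using the csv module (the file name and column positions are only examples):

import csv

wanted = [0, 2]  # example: keep only the first and third columns
with open('example.csv', newline='') as csvfile:
    for row in csv.reader(csvfile):
        selected = [row[i] for i in wanted]  # the whole line is read, but only these cells are kept
        print(selected)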
I would definitely use pandas for that.
However, in plain Python this is one way to do it.
In this example I am extracting the content of row 3, column 4.
import csv

target_row = 3
target_col = 4

with open('yourfile.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for n, row in enumerate(reader):
        if n == target_row:
            data = row[target_col]  # the row is already split into columns
            break

print(data)
read_csv in the pandas module can load a subset of the columns.
Assume you only want to load columns 1 and 3 in your .csv file.
import pandas as pd
usecols = [1, 3]
df = pd.read_csv('example.csv', usecols=usecols, sep=',')
Here is the documentation for read_csv.
In addition, if your file is big, you can read it piece by piece by specifying chunksize in read_csv.
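A small sketch of the chunked approach (the chunk size and the file name here are only examples):

import pandas as pd

pieces = []
# read 100,000 rows at a time instead of loading the whole file at once
for chunk in pd.read_csv('example.csv', usecols=[1, 3], chunksize=100000):
    pieces.append(chunk)       # or do any per-chunk processing here
result = pd.concat(pieces)     # recombine once every piece has been handled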
I have an Excel spreadsheet saved as a CSV file, but cannot find a way to read individual cell values into Python using the csv module. Any help would be greatly appreciated.
There is also a Python library capable of reading xls data. Have a look at python-xlrd.
For writing xls data, you can use python-xlwt.
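If you end up reading the .xls file directly rather than the CSV, a minimal xlrd sketch (the file name and the cell position are just examples) would be:

import xlrd

book = xlrd.open_workbook('spreadsheet.xls')  # placeholder file name
sheet = book.sheet_by_index(0)                # first worksheet
value = sheet.cell_value(1, 3)                # row 2, column 4 (zero-based indices)
print(value)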
The csv module provides readers that iterate over the rows of a CSV file; the rows are lists of strings. One way to get access to individual cells would be to:
Read the entire file in as a list of lists
import csv

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access the individual cells by indexing into the_whole_file. The first index is the row and the second index is the column - both are zero based. To access the cell at the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
If you have the Excel file saved as a CSV, you can use csv.reader:
import csv

myFilePath = "/Path/To/Your/File"

with open(myFilePath, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # 'row' has all the cells (thanks to wwii for the fix!). Get the first 4 columns
        a, b, c, d = row[:4]