I want to read only the first column from a CSV file. I tried the code below, based on an available solution, but didn't get the result I expected.
data = open('data.csv')
reader = csv.reader(data)
interestingrows = [i[1] for i in reader]
The error I got is:
Traceback (most recent call last):
File "G:/Setups/Python/pnn-3.py", line 12, in <module>
interestingrows = [i[1] for i in reader]
File "G:/Setups/Python/pnn-3.py", line 12, in <listcomp>
interestingrows = [i[1] for i in reader]
IndexError: list index out of range
You can also use DictReader to access columns by their header.
For example, if you had a file called "stackoverflow.csv" with the headers "Oopsy", "Daisy", "Rough", and "Tumble",
you could access the first column with this script:
import csv
with open("stackoverflow.csv") as csvFile:
    # Works if the file is in the same folder;
    # otherwise include the full path.
    reader = csv.DictReader(csvFile)
    for row in reader:
        print(row["Oopsy"])
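Here is a self-contained sketch of the same idea that runs without a file on disk. It assumes the four headers from the example above; the data rows are made up for illustration, and io.StringIO stands in for the opened file.

```python
import csv
import io

# Hypothetical file contents; a real script would use
# open("stackoverflow.csv", newline="") instead of StringIO.
sample = io.StringIO(
    "Oopsy,Daisy,Rough,Tumble\n"
    "1,2,3,4\n"
    "5,6,7,8\n"
)

reader = csv.DictReader(sample)
# Each row is a dict keyed by the header names taken from the first line.
first_column = [row["Oopsy"] for row in reader]
print(first_column)
```

Accessing by header name keeps working if the column order in the file changes, which index-based access does not.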
To get the first item from an indexable iterable, use 0 as the index (i[1] is the second column, and it raises IndexError on any row with fewer than two fields). In this case you can also use zip() to get an iterator of columns; since zip() returns an iterator, you can use next() to get the first column.
with open('data.csv') as data:
    reader = csv.reader(data)
    first_column = next(zip(*reader))
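One caveat with the zip() approach: zip() stops at the shortest row, so a single blank line in the file leaves no columns at all and next() raises StopIteration. A minimal sketch, assuming made-up sample data with a blank line in it, that filters empty rows first:

```python
import csv
import io

# Hypothetical contents of data.csv; note the blank line in the middle.
sample = io.StringIO("a,ab\nb,cd\n\nc,nav\n")

# csv.reader yields [] for a blank line, so keep only non-empty rows.
rows = [row for row in csv.reader(sample) if row]

# Index 0 is the first column.
first_column = [row[0] for row in rows]

# The zip() variant returns the same values as a tuple.
first_via_zip = next(zip(*rows))
```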
Related
I am having trouble with parsing a specific key from a json string in a table. Below is my code to read a csv file and extract "employee_id" from the json column in each row:
with open('data.csv') as csvFile:
    csv_reader = csv.reader(csvFile, delimiter=',')
    next(csv_reader, None) # skips the header row
    for row in csv_reader:
        event_data = row[4]
        data = json.loads(event_data)
        print(data['employee_id'])
Here is a sample event_data output:
"{\"py/object\": \"employee_information.event_types.EmployeeCreated\", \"employee_id\": \"98765\", \"employee_first_name\": \"Jonathan\", \"employee_last_name\": \"Smith\", \"application_id\": \"1234\", \"address\": \"1234 street\"}"
But I get an error that says:
Traceback (most recent call last):
File "/Users/user/Documents/python_test/main.py", line 14, in <module>
print(data['employee_id'])
TypeError: string indices must be integers
I checked the type of data and it is a str. I thought that json.loads was supposed to convert the JSON string into a Python dict?
event_data is doubly-encoded for some reason, so you need to decode it twice.
data = json.loads(json.loads(event_data))
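To see why one json.loads call is not enough, here is a small sketch that builds a doubly-encoded string on purpose. The field values are taken from the sample in the question; the variable names inner and once are just for illustration.

```python
import json

# Encode a dict to JSON text, then encode that text again as a JSON string.
inner = json.dumps({"employee_id": "98765", "application_id": "1234"})
event_data = json.dumps(inner)

once = json.loads(event_data)   # first decode: still a str holding JSON text
data = json.loads(once)         # second decode: now a dict

print(type(once).__name__, type(data).__name__)
print(data["employee_id"])
```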
I am trying to find a few items in a CSV file. When I run the code, sometimes it works, but sometimes it fails with the error "list index out of range".
def find_check_in(name,date):
    x = 0
    f = open('employee.csv','r')
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        id = row[0]
        dt = row[1]
        v = row[2]
        a = datetime.strptime(dt,"%Y-%m-%d")
        if v == "Check-In" and id=="person":
            x = 1
    f.close()
    return x
Traceback (most recent call last):
File "", line 51, in
x=find_check_in(name,date)
File "", line 21, in find_check_in
id = row[0]
IndexError: list index out of range
Your CSV file contains blank lines, resulting in row becoming an empty list, in which case there is no index 0, hence the error. Make sure your input CSV has no blank lines, or add a condition to process the row only if it isn't empty:
for row in reader:
    if row:
        # the rest of your code
It seems reader is returning a row with no elements. Does your data contain any blank lines? Or perhaps you need to pass newline='' to open(), as the csv module documentation recommends?
https://docs.python.org/3/library/csv.html#csv.reader
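Putting both suggestions together, a minimal sketch might look like the following. It writes a small made-up employee.csv into a temporary directory so it can run on its own; the file contents and the checked_in list are assumptions for illustration, not the asker's data.

```python
import csv
import os
import tempfile

# Create a hypothetical employee.csv containing a blank line.
path = os.path.join(tempfile.mkdtemp(), "employee.csv")
with open(path, "w", newline="") as f:
    f.write("person,2020-01-01,Check-In\r\n\r\nother,2020-01-02,Check-Out\r\n")

checked_in = []
# newline="" is an argument to open(), not to csv.reader.
with open(path, newline="") as f:
    for row in csv.reader(f):
        if not row:
            # A blank line comes back as an empty list; skip it.
            continue
        emp_id, dt, status = row[0], row[1], row[2]
        if status == "Check-In":
            checked_in.append(emp_id)

print(checked_in)
```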
Following is my input csv file contents
file3.csv:
a,ab
b,cd
c,nav
d,test
name,port
I want to write this into an existing CSV file, in specific columns.
For example:
I want to write a, b, c, d, name into column AA.
And I need to write ab, cd, nav, test, port into column AB.
Python Script:
import csv
f1 = open ("file3.csv","r") # open input file for reading
with open('file4.csv', 'wb') as f: # output csv file
    writer = csv.writer(f)
    with open('file3.csv','r') as csvfile: # input csv file
        reader = csv.reader(csvfile, delimiter=',')
        for row in reader:
            row[7] = f1.readline() # edit the 8th column
            writer.writerow(row)
f1.close()
I am getting following error:
MacBook-Pro:test$ python three.py
Traceback (most recent call last):
File "three.py", line 10, in
row[7] = f1.readline() # edit the 8th column
IndexError: list assignment index out of range
You cannot assign to a list index that does not already exist. You will need to increase the length of the row before assigning elements to specific indices.
If you want to assign to row[7], do this first:
if len(row) < 8:
    row += [None] * (8 - len(row))
So, your inner loop will likely need to look something like:
for row in reader:
    if len(row) < 8:
        row += [None] * (8 - len(row))
    new_values = f1.readline().strip().split(',')
    row[7:7+len(new_values)] = new_values
    writer.writerow(row)
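The padding step can be wrapped in a small helper. This is only a sketch: set_cells is a hypothetical name, the "existing" rows are made up, and io.StringIO replaces the real files. It pads with empty strings rather than None, and only up to the start index, because slice assignment at the end of a list extends it.

```python
import csv
import io

def set_cells(row, start, values):
    """Pad row with empty strings up to start, then overwrite from start."""
    if len(row) < start:
        row += [""] * (start - len(row))
    row[start:start + len(values)] = values
    return row

existing = io.StringIO("x,y\nz\n")        # hypothetical rows already in the target file
new_data = io.StringIO("a,ab\nb,cd\n")    # first two lines of file3.csv

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row, new_values in zip(csv.reader(existing), csv.reader(new_data)):
    # Index 7 is the 8th column, as in the question.
    writer.writerow(set_cells(row, 7, new_values))

result = out.getvalue()
print(result)
```

In Python 3 the output file would be opened with open('file4.csv', 'w', newline='') rather than 'wb'.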
I am 99% of the way there...
def xl_to_csv(xl_file):
    wb = xlrd.open_workbook(xl_file)
    sh = wb.sheet_by_index(0)
    output = 'output.csv'
    op = open(output, 'wb')
    wr = csv.writer(op, quoting=csv.QUOTE_ALL)
    for rownum in range(sh.nrows):
        part_number = sh.cell(rownum,1)
        #wr.writerow(sh.row_values(rownum)) #writes entire row
        wr.writerow(part_number)
    op.close()
Using wr.writerow(sh.row_values(rownum)) I can write the entire row from the Excel file to a CSV, but there are like 150 columns and I only want one of them. So, I'm grabbing the one column that I want using part_number = sh.cell(rownum,1), but I can't seem to get the syntax correct to just write this variable out to a CSV file.
Here's the traceback:
Traceback (most recent call last):
File "test.py", line 61, in <module>
xl_to_csv(latest_file)
File "test.py", line 32, in xl_to_csv
wr.writerow(part_number)
_csv.Error: sequence expected
Try this:
wr.writerow([part_number.value])
The argument must be a list-like object.
The quickest fix is to throw your partnum in a list (and as per Abdou you need to add .value to get the value out of a cell):
for rownum in range(sh.nrows):
    part_number = sh.cell(rownum,1).value # added '.value' to get value from cell
    wr.writerow([part_number]) # added brackets to give writerow the list it wants
More generally, you can use a list comprehension to grab the columns you want:
cols = [1, 8, 110]
for rownum in range(sh.nrows):
    wr.writerow([sh.cell(rownum, colnum).value for colnum in cols])
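The writerow behaviour can be checked without xlrd or an Excel file. In this sketch the part_numbers list is a made-up stand-in for the values you would get from sh.cell(rownum, 1).value, and io.StringIO replaces the output file.

```python
import csv
import io

# Hypothetical stand-ins for sh.cell(rownum, 1).value
part_numbers = ["PN-001", "PN-002"]

out = io.StringIO()
wr = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
for part_number in part_numbers:
    # A one-element list produces a one-column row.
    wr.writerow([part_number])

csv_text = out.getvalue()
print(csv_text)
```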
I'm trying to create a list out of the first row (the column headers) of a CSV file using Python. I put together this little script, but it prints two different lists. The first thing printed is the first row of the CSV and the second thing printed is the second row.
What am I doing wrong?
import csv
import sys
with open('agentsFullOutput.csv') as csvFile:
    reader = csv.reader(csvFile)
    print csvFile.next()
    field_names_list = []
    field_names_list = csvFile.next()
    print field_names_list
Every time you call .next() it'll move on to the next row in the file. So you'll only get the headers of the CSV file in the first .next() call:
import csv
with open('agentsFullOutput.csv') as csvFile:
    reader = csv.reader(csvFile)
    field_names_list = reader.next()
Any subsequent .next() call will read the next row in the file so that would be a row of data.
Each time you call next() the next line from the file is yielded until the end of the file is reached.
In your code example, you call next() twice and assign the result of the second call to field_names_list, so it holds the second row, not the first.
The following will assign the first row to the variable field_names_list.
with open('agentsFullOutput.csv') as csvFile:
    reader = csv.reader(csvFile)
    field_names_list = next(reader)
Using next(reader) instead of reader.next() means that the code is portable to Python 3. In Python 3, the iterator's next() method was renamed to __next__() for consistency.
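A short runnable sketch of the header-then-data pattern, using made-up contents in place of agentsFullOutput.csv. It also shows the optional default argument to next(), which avoids a StopIteration on an empty file.

```python
import csv
import io

# Hypothetical file contents standing in for agentsFullOutput.csv
sample = io.StringIO("name,port\nalpha,80\nbeta,443\n")

reader = csv.reader(sample)
field_names_list = next(reader)   # first call consumes the header row
data_rows = list(reader)          # the iterator carries on after the header

# With a default, next() returns it instead of raising on an empty file.
empty_header = next(csv.reader(io.StringIO("")), None)

print(field_names_list)
print(data_rows)
```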
You can use the Pandas library to read just the first few lines of a huge dataset.
import pandas as pd
data = pd.read_csv("names.csv", nrows=1)
You can specify the number of lines to read with the nrows parameter.
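If you would rather not depend on Pandas, itertools.islice from the standard library gives a similar effect with csv.reader. This is a different technique from nrows, shown here only as a sketch; the sample contents are made up.

```python
import csv
import io
from itertools import islice

# Hypothetical contents standing in for names.csv
sample = io.StringIO("name\nAda\nGrace\nLinus\n")

reader = csv.reader(sample)
header = next(reader)                  # header row
first_rows = list(islice(reader, 1))   # stop after one data row, like nrows=1

print(header, first_rows)
```

Because the reader is consumed lazily, the rest of the file is never parsed.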
with open(__csvfile) as csvFile:
    reader = csv.DictReader(csvFile)