I am trying to read a 3-column CSV file into a dictionary with the code below. The first column is the unique identifier, and the other two hold related information.
d = dict()
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    mydict = dict((rows[0:3]) for rows in reader)
    print mydict
When I run this code I get this error:
Traceback (most recent call last):
File "commissionsecurity.py", line 34, in <module>
mydict = dict((rows[0:3]) for rows in reader)
ValueError: dictionary update sequence element #0 has length 3; 2 is required
Dictionaries need to get a key along with a value. When you have
mydict = dict((rows[0:3]) for rows in reader)
               ^^^^^^^^^
               ambiguous
You are passing in a list of length 3, but dict() expects each element to be a (key, value) pair of length 2. The error message hints at this: the required length is 2, not the 3 that was provided. To fix this, make the key rows[0] and the associated value rows[1:3]:
mydict = dict((rows[0], rows[1:3]) for rows in reader)
               ^^^^^^^  ^^^^^^^^^
               key      value
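As a minimal sketch of that fix, the snippet below runs the same idea on in-memory data. The sample rows and the use of io.StringIO are assumptions made only so the example is self-contained; with a real file you would pass the open file object to csv.reader instead.

```python
import csv
import io

# Hypothetical three-column data standing in for the real file.
sample = io.StringIO("id1,alice,10\nid2,bob,20\n")

reader = csv.reader(sample)
# Key: the first column. Value: a list holding the remaining two columns.
mydict = {row[0]: row[1:3] for row in reader}

print(mydict)
```

Each element handed to the dictionary is now a two-part (key, value) pair, which is the shape dict() expects.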
You can do something along the lines of:
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    d = {row[0]: row[1:] for row in reader}
Or,
d = dict()
with open('filemane.csv', 'r') as infile:
    reader = csv.reader(infile)
    for row in reader:
        d[row[0]] = row[1:]
Related
I am trying to find a few items in a CSV file. When I run the code, it sometimes works, but sometimes it fails with the error "list index out of range".
def find_check_in(name,date):
    x = 0
    f = open('employee.csv','r')
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        id = row[0]
        dt = row[1]
        v = row[2]
        a = datetime.strptime(dt,"%Y-%m-%d")
        if v == "Check-In" and id=="person":
            x = 1
    f.close()
    return x
Traceback (most recent call last):
File "", line 51, in
x=find_check_in(name,date)
File "", line 21, in find_check_in
id = row[0]
IndexError: list index out of range
Your CSV file contains blank lines, so row becomes an empty list. An empty list has no index 0, hence the error. Make sure your input CSV has no blank lines, or add a condition to process the row only if it isn't empty:
for row in reader:
    if row:
        # the rest of your code
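A small sketch of that check, using made-up in-memory data (the sample rows and the variable name kept are assumptions for illustration only):

```python
import csv
import io

# Hypothetical data with a blank line in the middle and another at the end.
sample = io.StringIO(
    "person,2020-01-01,Check-In\n"
    "\n"
    "other,2020-01-02,Check-Out\n"
    "\n"
)

kept = []
for row in csv.reader(sample):
    if not row:  # csv.reader yields a blank line as an empty list
        continue
    kept.append(row[0])  # safe: the row has at least one field

print(kept)
```

The early continue keeps the rest of the loop body at its original indentation level, which can be easier to read than nesting everything under if row.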
It seems that reader is returning a row with no elements. Does your data contain any such rows? Or perhaps you need to pass the newline='' argument to open(), as the csv documentation recommends?
https://docs.python.org/3/library/csv.html#csv.reader
I have a homework problem where we have to sort a specific CSV file by column 3 in descending order. Then we have to return all the rows that have the max value in column 3. We can't use pandas because we haven't learned it yet. We have to write it as a function so the professor can call it from his code and see it run.
def bigRow():
    new_row = []
    with open('assignment2Data.csv', 'rU', newline='') as f:
        reader = csv.reader(f, delimiter='^')
        data = [x for x in reader]
        max_thirdcol_val = max([x[2] for x in data])
        for row in data:
            if row[2] == max_thirdcol_val:
                new_row.append(row)
    return new_row
It errors out when I submit it because:
File "UNITTEST.py", line 21, in test_unit
assert(bigRow('assignment2Data.csv', 3)==answer or bigRow('assignment2Data.csv', 3)==answer2)
TypeError: bigRow() takes 0 positional arguments but 2 were given
The test code is trying to call your function with two arguments (the filename and the column index). So you need to write your function as:
def bigRow(filename, column_index):
    # 'r' rather than 'rU': the 'U' flag was removed in Python 3.11
    with open(filename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter='^')
        data = [x for x in reader]
    # TODO: Complete this function :)
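One possible way to fill in the rest is sketched below. It is not the professor's reference solution. The name rows_with_max is hypothetical, and the sketch assumes that column_index is 1-based (the test passes 3 for "column 3"), that the column holds numbers, and that the file has no header row. None of these is confirmed by the question.

```python
import csv
import os
import tempfile


def rows_with_max(filename, column_index, delimiter='^'):
    """Return every row whose value in the given column equals the maximum.

    Assumptions (not confirmed by the question): column_index is 1-based,
    the column holds numbers, and the file has no header row.
    """
    with open(filename, 'r', newline='') as f:
        # Skip blank lines, which csv.reader yields as empty lists.
        data = [row for row in csv.reader(f, delimiter=delimiter) if row]
    idx = column_index - 1
    # Compare as numbers: compared as strings, '9' would rank above '10'.
    largest = max(float(row[idx]) for row in data)
    return [row for row in data if float(row[idx]) == largest]


# Usage with a small temporary file of made-up data.
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False, newline='') as tmp:
    tmp.write("a^x^9\nb^y^10\nc^z^10\n")
result = rows_with_max(tmp.name, 3)
os.remove(tmp.name)
print(result)
```

Converting to float before comparing matters here because csv.reader returns every field as a string.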
I want to read only the first column from a CSV file. I tried the code below, taken from an available solution, but didn't get the expected result.
data = open('data.csv')
reader = csv.reader(data)
interestingrows = [i[1] for i in reader]
The error I got is:
Traceback (most recent call last):
File "G:/Setups/Python/pnn-3.py", line 12, in <module>
interestingrows = [i[1] for i in reader]
File "G:/Setups/Python/pnn-3.py", line 12, in <listcomp>
interestingrows = [i[1] for i in reader]
IndexError: list index out of range
You can also use DictReader to access columns by their header.
For example, if you had a file called "stackoverflow.csv" with the headers "Oopsy", "Daisy", "Rough", and "Tumble",
you could access the first column with this script:
import csv
with open("stackoverflow.csv") as csvFile:
    # Works if the file is in the same folder;
    # otherwise include the full path
    reader = csv.DictReader(csvFile)
    for row in reader:
        print(row["Oopsy"])
If you want the first item from an indexable sequence, use 0 as the index. In this case you can also use zip() to transpose the rows into an iterator of columns, and then call next() on it to get the first column:
with open('data.csv') as data:
    reader = csv.reader(data)
    first_column = next(zip(*reader))
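One caveat with zip(*reader): zip stops at the shortest row, so a single blank line in the file leaves it with nothing to yield. A more defensive sketch, assuming made-up in-memory data, indexes each non-empty row directly:

```python
import csv
import io

# Hypothetical data; the blank line mimics a stray empty row in a real file.
sample = io.StringIO("a,1\nb,2\n\nc,3\n")

# Index 0 is the first column; `if row` skips rows parsed as empty lists.
first_column = [row[0] for row in csv.reader(sample) if row]

print(first_column)
```

This also avoids the IndexError from the question, which can come either from an empty row or from asking for index 1 on a row with a single field.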
I have a csv file whose structure is like this:
Year-Sem,Course,Studentid,Score
201001,CS301,100,363
201001,CS301,101,283
201001,CS301,102,332
201001,CS301,103,254
201002,CS302,101,466
201002,CS302,102,500
Here each year is divided into two semesters, 01 (for fall) and 02 (for spring), and the data covers the years 2008 through 2014 (a total of 14 semesters). What I want to do is form a dictionary where course and studentid become the key and their respective scores, ordered by year-sem, become the values. So the output should be something like this for each student:
[(studentid,course):(year-sem1 score,year-sem2 score,...)]
I first tried to build a dictionary of [(studentid,course):(score)] using this code, but I get IndexError: list index out of range:
with open('file1.csv', mode='rU') as infile:
    reader = csv.reader(infile,dialect=csv.excel_tab)
    with open('file2.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {(rows[2],rows[1]): rows[3] for rows in reader}
        writer.writerows(mydict)
When I was not using dialect=csv.excel_tab and rU, I was getting the error _csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?.
How can I resolve this error and form the dictionary with the structure [(studentid,course):(year-sem1 score,year-sem2 score,...)] that I described above?
The dialect you've chosen seems to be wrong. csv.excel_tab uses the tab character as the delimiter. For your data, the default dialect should work.
You got the error message about newlines earlier because of the missing U in the rU mode.
# On Python 3.11+ drop the "U": universal newlines are the default in text mode
with open(r"test.csv", "rU") as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
This example seems to work for me (Python 3).
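To see why the wrong dialect leads to the IndexError, here is a small sketch that parses one line of the question's data both ways. The in-memory io.StringIO input is an assumption used only to keep the example self-contained.

```python
import csv
import io

# One comma-separated line in the same layout as the question's data.
line = "201001,CS301,100,363\n"

# With excel_tab the delimiter is '\t', so the whole line becomes one field.
tab_row = next(csv.reader(io.StringIO(line), dialect=csv.excel_tab))
# The default dialect ('excel') splits on commas.
comma_row = next(csv.reader(io.StringIO(line)))

print(len(tab_row), len(comma_row))
```

With only one field per row, rows[1], rows[2] and rows[3] do not exist, which is what triggers "list index out of range".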
If you have repeating keys, you need to store the values in some container. If you want the data ordered, you will need to use an OrderedDict:
import csv
from collections import OrderedDict
with open("in.csv") as infile, open('file2.csv', mode='w') as outfile:
    d = OrderedDict()
    reader, writer = csv.reader(infile), csv.writer(outfile)
    header = next(reader) # skip header
    # choose whatever column names you want
    writer.writerow(["id-crse","score"])
    # unpack the values from each row
    for yr, cre, stid, scr in reader:
        # use id and course as keys and append scores
        d.setdefault("{} {}".format(stid, cre),[]).append(scr)
    # iterate over the dict keys and values and write each new row
    for k,v in d.items():
        writer.writerow([k] + v)
Which will give you something like:
id-crse,score
100 CS301,363
101 CS301,283
102 CS301,332
103 CS301,254
101 CS302,466
102 CS302,500
In your own code you would only store the last value for each key. You also only write the keys with writer.writerows(mydict), because iterating over a dict yields its keys, not its keys and values. If the data is not all in chronological order, you will have to call sorted on the reader object using operator.itemgetter (after import operator), sorting on the year-sem column, which is index 0:
for yr, cre, stid, scr in sorted(reader,key=operator.itemgetter(0)):
    ...
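If you want the exact structure from the question, with (studentid, course) tuples as keys and a tuple of scores ordered by year-sem as values, a sketch along these lines may help. The sample rows are made up and deliberately out of order, and the names scores and result are illustrative.

```python
import csv
import io
from collections import defaultdict

# Sample rows in the question's layout, deliberately out of order.
sample = io.StringIO(
    "Year-Sem,Course,Studentid,Score\n"
    "201002,CS301,100,400\n"
    "201001,CS301,100,363\n"
    "201001,CS301,101,283\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

scores = defaultdict(list)
# Sorting on column 0 (Year-Sem) puts each key's scores in chronological order.
for year_sem, course, student_id, score in sorted(reader, key=lambda r: r[0]):
    scores[(student_id, course)].append(score)

# Freeze each list into a tuple to match the requested structure.
result = {key: tuple(values) for key, values in scores.items()}

print(result)
```

Sorting the year-sem strings works here because they all share the same fixed-width YYYYSS format.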
My file is formatted into three columns of numbers:
2 12345 1.12345
1 54321 1.54321
3 12345 1.12345
I would like to have Python use the first two columns as keys and use the third column as the values. The file is large, meaning that I can't format it by hand. So how do I have Python automatically convert my large file into a dictionary?
Here is my code:
with open ('file name.txt' 'r') as f:
    rows = ( line.split('\t') for line in f )
    d = { row[0]:row[:3] for row in rows}
    print(d)
The output prints the numbers diagonally all over the place. How do I format it properly?
Banana, you're close.
You need a comma separating the arguments of open.
You want to assign the third member of row, i.e. row[2].
You need to decide how to group the first two members of row into a hashable key. Making a tuple out of them, i.e. (row[0],row[1]) works.
Print the dictionary line by line.
Try:
with open('filename.txt','r') as f:
    rows = ( line.split('\t') for line in f )
    d = { (row[0],row[1]):row[2] for row in rows}
for key in d.keys():
    print(key, d[key])
I'm not sure exactly how you want the keys laid out. Regardless, you should use the csv module, with '\t' as your delimiter.
import csv
with open('data.txt') as file:
    tsvfile = csv.reader(file, delimiter='\t')
    d = { "{},{}".format(row[0], row[1]): row[2] for row in tsvfile }
print(d)
Prints out:
{'3,12345': '1.12345', '1,54321': '1.54321', '2,12345': '1.12345'}
Alternatively, you have this:
with open('data.txt') as file:
    tsvfile = csv.reader(file, delimiter='\t')
    d = {}
    for row in tsvfile:
        d[row[0]] = row[2]
        d[row[1]] = row[2]
print(d)
Prints out:
{'54321': '1.54321', '3': '1.12345', '1': '1.54321', '12345': '1.12345', '2': '1.12345'}
You should try:
import pprint
d = {}
with open ('file name.txt','r') as f:
    for line in f:
        row = line.split('\t')
        if len(row) == 3:
            d[(row[0], row[1])] = row[2]
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(d)
First of all, your slicing is wrong. You can get the first two columns with line[:2] and the third with line[2].
Also, you don't need to build your rows in a separate data structure; you can use an unpacking operation and the map function within a dict comprehension:
with open ('ex.txt') as f:
    d={tuple(i):j.strip() for *i,j in map(lambda line:line.split('\t'),f)}
print(d)
Result:
{('2', '12345'): '1.12345', ('3', '12345'): '1.12345', ('1', '54321'): '1.54321'}
Note that because *i collects the leading items into a list, and lists are unhashable, you cannot use it directly as a dictionary key; convert it to a tuple first.
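A short sketch of that point, using one made-up line of input (the names key_parts, value and error are illustrative):

```python
# One tab-separated line in the same shape as the question's data.
row = "2\t12345\t1.12345\n".split('\t')
*key_parts, value = row  # key_parts is a list: ['2', '12345']

d = {}
try:
    d[key_parts] = value  # lists are mutable, hence unhashable
except TypeError as exc:
    error = str(exc)

# A tuple of strings is hashable, so it works as a key.
d[tuple(key_parts)] = value.strip()

print(error)
print(d)
```

The strip() call removes the trailing newline that split('\t') leaves on the last field.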
And if you want to preserve the order, you can use collections.OrderedDict, filled from a generator of (key, value) pairs so that the file order is kept:
from collections import OrderedDict
with open ('ex.txt') as f:
    d=OrderedDict((tuple(i),j.strip()) for *i,j in map(lambda line:line.split('\t'),f))
print(d)
OrderedDict([(('2', '12345'), '1.12345'), (('1', '54321'), '1.54321'), (('3', '12345'), '1.12345')])