Convert a csv to a dictionary with multiple values? - python

I have a csv file like this:
pos,place
6696,266835
6698,266835
938,176299
940,176299
941,176299
947,176299
948,176299
949,176299
950,176299
951,176299
770,272944
2751,190650
2752,190650
2753,190650
I want to convert it to a dictionary like the following:
{266835:[6696,6698],176299:[938,940,941,947,948,949,950,951],190650:[2751,2752,2753]}
And then, fill the missing numbers in the range in the values:
{266835:[6696,6697,6698],176299:[938,939,940,941,942,943,944,945,946,947,948,949,950,951],190650:[2751,2752,2753]}
Right now I have tried to build the dictionary using the solution suggested here, but it overwrites the old value with the new one.
Any help would be great.
Here is the csv2dict function I wrote:
def csv2dict(filename):
    """
    reads in a two column csv file, and the converts it into dictionary
    """
    import csv
    with open(filename) as f:
        f.readline()  # ignore first line
        reader = csv.reader(f, delimiter=',')
        mydict = dict((rows[1], rows[0]) for rows in reader)
    return mydict

The easiest approach is to use collections.defaultdict() with list as the default factory:
import csv
from collections import defaultdict

data = defaultdict(list)
with open(inputfilename, newline='') as infh:  # inputfilename: path to your CSV
    reader = csv.reader(infh)
    next(reader, None)  # skip the header
    for col1, col2 in reader:
        place = int(col2)
        data[place].append(int(col1))
        if len(data[place]) > 1:
            # list() is needed on Python 3, where range() is not a list
            data[place] = list(range(min(data[place]), max(data[place]) + 1))
This also expands the ranges on the fly as you read the data.
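If you would rather keep grouping and range-filling as two separate steps (for example, to inspect the raw groups first), a two-pass sketch could look like this. It assumes the two-column pos,place layout from the question; group_and_fill is a hypothetical name, and io.StringIO stands in for the real file.

```python
import csv
import io
from collections import defaultdict


def group_and_fill(lines):
    """Group 'pos' values by 'place', then fill each list to a contiguous range."""
    groups = defaultdict(list)
    reader = csv.reader(lines)
    next(reader, None)  # skip the "pos,place" header
    for pos, place in reader:
        groups[int(place)].append(int(pos))
    # Second pass: replace each list with every integer between its min and max
    return {place: list(range(min(v), max(v) + 1)) for place, v in groups.items()}


sample = io.StringIO("pos,place\n6696,266835\n6698,266835\n770,272944\n")
print(group_and_fill(sample))  # {266835: [6696, 6697, 6698], 272944: [770]}
```

Separating the passes computes min() and max() once per group instead of on every appended value.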

Based on what you have tried:
import csv
from collections import defaultdict

# open archive reader
with open("myfile.csv", newline="") as myFile:
    archive = csv.reader(myFile, delimiter=',')
    next(archive, None)  # skip the "pos,place" header
    arch_dict = defaultdict(list)
    for row in archive:
        arch_dict[row[1]].append(row[0])
print(arch_dict)

Related

How to edit a CSV file row by row in Python without using Pandas

I have a CSV file, and when I read it with the csv module I get the following output:
['exam', 'id_student', 'grade']
['maths', '573834', '7']
['biology', '573834', '8']
['biology', '578833', '4']
['english', '581775', '7']
# goes on...
I need to edit it by creating a 4th column called 'Passed' with two possible values: True or False depending on whether the grade of the row is >= 7 (True) or not (False), and then count how many times each student passed an exam.
If it's not possible to edit the CSV file that way, I would need to just read the CSV file and then create a dictionary of lists with the following output:
dict = {'id_student':[573834, 578833, 581775], 'passed_count': [2,0,1]}
# goes on...
Thanks
Try importing the CSV as a pandas DataFrame:
import pandas as pd
data=pd.read_csv('data.csv')
And then use:
data['Passed'] = data['grade'] >= 7  # the column is 'grade'; the comparison already yields booleans
And then save dataframe to csv as:
data.to_csv('final.csv',index=False)
It is entirely possible to "edit" a CSV file.
Assuming you have a file students.csv with the following content:
exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
Iterate over input rows, augment the field list of each row with an additional item, and save it back to another CSV:
import csv

with open('students.csv', 'r', newline='') as source, open('result.csv', 'w', newline='') as result:
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)

    # Deal with the header
    header = next(csvreader)
    header.append('Passed')
    csvwriter.writerow(header)

    # Process data rows
    for row in csvreader:
        row.append(str(int(row[2]) >= 7))
        csvwriter.writerow(row)
Now result.csv has the content you need.
If you need to replace the original content, use os.remove() and os.rename() to do that:
import os
os.remove('students.csv')
os.rename('result.csv', 'students.csv')
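A variant of the same approach, sketched under a few assumptions: it uses csv.DictReader and csv.DictWriter so columns are addressed by name, and os.replace() (Python 3.3+) to swap the temporary file over the original in a single call. The function name add_passed_column and the '.tmp' suffix are illustrative choices, not part of the answer above.

```python
import csv
import os


def add_passed_column(path, threshold=7):
    """Append a 'Passed' column to the CSV at `path`, then replace the original file."""
    tmp_path = path + '.tmp'  # hypothetical temporary file name
    with open(path, newline='') as source, open(tmp_path, 'w', newline='') as result:
        reader = csv.DictReader(source)
        writer = csv.DictWriter(result, fieldnames=reader.fieldnames + ['Passed'])
        writer.writeheader()
        for row in reader:
            # booleans are written as the strings True / False
            row['Passed'] = int(row['grade']) >= threshold
            writer.writerow(row)
    # os.replace overwrites the destination in one step (instead of remove + rename)
    os.replace(tmp_path, path)
```

Calling add_passed_column('students.csv') would leave students.csv with the extra column and no result.csv to clean up.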
Counting can be done independently; you don't need to modify the CSV for that:
import csv
from collections import defaultdict

with open('students.csv', 'r', newline='') as source:
    csvreader = csv.reader(source)
    next(csvreader)  # Skip header
    stats = defaultdict(int)
    for row in csvreader:
        if int(row[2]) >= 7:
            stats[row[1]] += 1

print(stats)
You can fold the counting into the code above and have both pieces in one place. The defaultdict (stats) has the same interface as a dict if you need to access it.
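The defaultdict above only records students with at least one pass. If you need exactly the dict-of-lists layout from the question, including 0 for students who never passed, a sketch could look like this. passed_counts is a hypothetical helper; it assumes the three-column exam,id_student,grade layout and relies on dicts keeping insertion order (Python 3.7+).

```python
import csv
import io


def passed_counts(source, threshold=7):
    """Return {'id_student': [...], 'passed_count': [...]} in first-seen order."""
    counts = {}  # insertion-ordered, so students appear in file order
    reader = csv.reader(source)
    next(reader)  # skip the header row
    for _exam, student, grade in reader:
        counts.setdefault(int(student), 0)  # students with no passes still get 0
        if int(grade) >= threshold:
            counts[int(student)] += 1
    return {'id_student': list(counts), 'passed_count': list(counts.values())}


sample = io.StringIO("exam,id_student,grade\nmaths,573834,7\nbiology,573834,8\n"
                     "biology,578833,4\nenglish,581775,7\n")
print(passed_counts(sample))
# {'id_student': [573834, 578833, 581775], 'passed_count': [2, 0, 1]}
```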

Reorder the rows of a specific column in CSV

From a CSV file, I'm trying to sort the rows of a large column (named CRIM) in ascending order so I can do other manipulations afterwards. First, I tried this:
import csv

def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        for line in data:
            print(sorted(line['CRIM']))
But this sorted the characters inside each value instead of sorting the values against each other.
For example, with the values 1.96 and 0.92, I got output like this:
['.', '1', '6', '9']
['.', '0', '2', '9']
but I wanted
['0.92']
['1.96']
I read about using a lambda as the sort key and tried this, but I didn't get any output:
import csv

def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        sorted(data, key=lambda line: line['CRIM'])
        for line in data:
            print(line['CRIM'])
Use pandas:
import pandas as pd
from pathlib import Path

file_path = Path('data.csv')
dataframe = pd.read_csv(file_path)  # pass other required parameters.
# sort_values() returns a new DataFrame, so assign the result
dataframe = dataframe.sort_values('CRIM')
First load all the data into a list and then sort the list using 'CRIM' as the key:
import csv

def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        lines = []  # all the lines
        for line in data:
            lines.append(line)
        # or skip the for loop and do:
        # lines = list(data)
    # lines is a list of dictionaries
    # now sort `lines` in-place using 'CRIM' as a float
    lines.sort(key=lambda d: float(d['CRIM']))
    return lines
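The lambda attempt in the question prints nothing because sorted() returns a new list, which was discarded, and building that list consumed the DictReader, so the following for loop had no rows left. A minimal sketch that keeps the sorted result and compares numerically; sorted_crim is a hypothetical name and io.StringIO stands in for data.csv.

```python
import csv
import io


def sorted_crim(csvfile):
    reader = csv.DictReader(csvfile)
    # sorted() does not sort in place, so keep its return value;
    # float() makes the comparison numeric instead of character by character
    rows = sorted(reader, key=lambda row: float(row['CRIM']))
    return [row['CRIM'] for row in rows]


sample = io.StringIO("CRIM,ZN\n2.5,0\n0.92,18\n10.5,0\n")
print(sorted_crim(sample))  # ['0.92', '2.5', '10.5']
```

Without float(), the strings would sort as '0.92', '10.5', '2.5', because '1' compares before '2' character by character.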

how to select a specific column of a csv file in python

I am a Python beginner and would like your opinion.
I wrote this code that reads the only column in a file on my pc and puts it in a list.
I have difficulties understanding how I could modify the same code with a file that has multiple columns and select only the column of my interest.
Can you help me?
list = []
with open(r'C:\Users\Desktop\mydoc.csv') as file:
    for line in file:
        item = int(line)
        list.append(item)

results = []
for i in range(0, 1086):
    a = list[i-1]
    b = list[i]
    c = list[i+1]
    results.append(b)
print(results)
You can use the pandas.read_csv() function like this:
import pandas as pd
my_data_frame = pd.read_csv('path/to/your/data')
results = my_data_frame['name_of_your_wanted_column'].values.tolist()
A useful module for the kind of work you are doing is the imaginatively named csv module.
Many CSV files have a "header" at the top; by convention, this is a useful way of labeling the columns of your file. Assuming you can insert a line at the top of your CSV file with comma-delimited field names, you could replace your program with something like:
import csv

with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    csv_reader = csv.DictReader(myfile)
    for row in csv_reader:
        print(row['column_name_of_interest'])
The above will print to the terminal all the values that match your specific 'column_name_of_interest' after you edit it to match your particular file.
It's common to work with many columns at once, so packing a whole row into a single dictionary, addressable by column name, can be very convenient later on.
For a pure-Python implementation, use the csv module.
data.csv
Project1,folder1/file1,data
Project1,folder1/file2,data
Project1,folder1/file3,data
Project1,folder1/file4,data
Project1,folder2/file11,data
Project1,folder2/file42a,data
Project1,folder2/file42b,data
Project1,folder2/file42c,data
Project1,folder2/file42d,data
Project1,folder3/filec,data
Project1,folder3/fileb,data
Project1,folder3/filea,data
Your Python program can read it line by line:
import csv

a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        print(row)
        # ['Project1', 'folder1/file1', 'data']
If you print each row, you will see that it is a list like this:
['Project1', 'folder1/file1', 'data']
To collect every element of column 1 (the second column, since indexing starts at 0), append that element to the list inside the loop:
a.append(row[1])
The list a then contains:
['folder1/file1', 'folder1/file2', 'folder1/file3', 'folder1/file4', 'folder2/file11', 'folder2/file42a', 'folder2/file42b', 'folder2/file42c', 'folder2/file42d', 'folder3/filec', 'folder3/fileb', 'folder3/filea']
Here is the complete code:
import csv

a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        a.append(row[1])
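The two csv answers above can be combined into one small helper that selects a column either by header name or by 0-based index. This is a sketch: read_column and its has_header parameter are hypothetical, selecting by name requires a header row, and values are returned as strings (wrap them in int() as in the question's own code if you need numbers).

```python
import csv
import io


def read_column(csvfile, column, has_header=True):
    """Return one column as a list; `column` is a header name or a 0-based index."""
    reader = csv.reader(csvfile)
    if has_header:
        header = next(reader)
        if isinstance(column, str):
            column = header.index(column)  # raises ValueError for an unknown name
    return [row[column] for row in reader if row]  # `if row` skips blank lines


sample = io.StringIO("Project1,folder1/file1,data\nProject1,folder1/file2,data\n")
print(read_column(sample, 1, has_header=False))  # ['folder1/file1', 'folder1/file2']
```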

How to get specific columns in a certain range from a csv file without using pandas

For some reason the pandas module does not work, so I have to find another way to read a large CSV file and output only specific columns within a certain range of rows (e.g. the first 1000 lines). I have code that reads the entire CSV file, but I haven't found a way to display just specific columns.
Any help is much appreciated!
import csv
fileObj = open('apartment-data-all-4-xaver.2018.csv')
csvReader = csv.reader( fileObj )
for row in csvReader:
    print(row)
fileObj.close()
I created a small csv file with the following contents:
first,second,third
11,12,13
21,22,23
31,32,33
41,42,43
You can use the following helper function, which uses namedtuple from the collections module and yields objects that let you access your columns as attributes:
import csv
from collections import namedtuple

def get_first_n_lines(file_name, n):
    with open(file_name) as file_obj:
        csv_reader = csv.reader(file_obj)
        header = next(csv_reader)
        Tuple = namedtuple('Tuple', header)
        for i, row in enumerate(csv_reader, start=1):
            yield Tuple(*row)
            if i >= n:
                break
To print the first and third columns of the first n=3 lines, call the function like this (Python 3.6+):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print(f'{line.first}, {line.third}')
Or like this (Python 3.0 - 3.5):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print('{}, {}'.format(line.first, line.third))
Outputs:
11, 13
21, 23
31, 33
Use csv.DictReader and then pick out the specific rows and columns:
import csv

data = []
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)

colnames = ['col1', 'col2']
# slicing avoids an IndexError when the file has fewer than 1000 rows
for row in data[:1000]:
    print(row[colnames[0]], row[colnames[1]])
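The DictReader answer above stores every row of the file in a list before taking the first 1000. For a large file, itertools.islice can stop after n rows so the remaining rows are never processed. A sketch, assuming the file has a header row naming the columns; first_rows is a hypothetical name and io.StringIO stands in for the file.

```python
import csv
import io
from itertools import islice


def first_rows(csvfile, columns, n):
    """Yield tuples of the chosen columns for the first n data rows only."""
    reader = csv.DictReader(csvfile)
    # islice stops iterating after n rows instead of reading the whole file
    for row in islice(reader, n):
        yield tuple(row[name] for name in columns)


sample = io.StringIO("first,second,third\n11,12,13\n21,22,23\n31,32,33\n41,42,43\n")
for first, third in first_rows(sample, ['first', 'third'], 3):
    print(f'{first}, {third}')
```

With the small file from the previous answer, this prints the same three lines as the namedtuple version.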

Write dictionary of lists (varying length) to csv in Python

I am currently struggling with dictionaries of lists.
Given a dictionary like that:
GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
'Seq_B': ['GO:7777', 'GO:8888']}
Now I want to write this dictionary to a CSV file as follows:
EDIT: I have added the whole function to give more information:
def map_GI2GO(gilist, mapped, gi_to_go):
    with open(gilist) as infile:
        read_gi = csv.reader(infile)
        GI_list = {rows[0]: rows[1] for rows in read_gi}  # read GI list into dictionary
        GO_list = defaultdict(list)  # set up GO list as empty dictionary of lists
        infile.close()
    with open(gi_to_go) as mapping:
        read_go = csv.reader(mapping, delimiter=',')
        for k, v in GI_list.items():  # iterate over GI list and mapping file
            for row in read_go:
                if len(set(row[0]).intersection(v)) > 0:
                    GO_list[k].append(row[1])  # write found GOs into dictionary
                    break
        mapping.close()
    with open(mapped, 'wb') as outfile:  # save mapped SeqIDs plus GOs
        looked_up_go = csv.writer(outfile, delimiter='\t', quoting=csv.QUOTE_MINIMAL)
        for key, val in GO_list.iteritems():
            looked_up_go.writerow([key] + val)
        outfile.close()
However this gives me the following output:
Seq_A,GO:1234;GO2345;GO:3456
Seq_B,GO:7777;GO:8888
I would prefer to have the list entries in separate columns, separated by a defined delimiter. I am having a hard time getting rid of the ;, which apparently separates the list entries.
Any ideas are welcome.
If I were you, I would try itertools.izip_longest (itertools.zip_longest in Python 3) to match up columns of varying length:
from csv import writer
from itertools import zip_longest  # izip_longest in Python 2

GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
           'Seq_B': ['GO:7777', 'GO:8888']}
with open("test.csv", "w", newline="") as csvfile:
    wr = writer(csvfile)
    wr.writerow(GO_list.keys())  # writes title row
    for each in zip_longest(*GO_list.values()):
        wr.writerow(each)
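The zip_longest answer writes one column per key. If instead you want one row per key with each list entry in its own column, csv.writer already does that when given [key] + values, provided each entry is a separate string. If (an assumption about your mapping file) an entry arrives as a single ';'-joined string, split it with str.split(';') first. A sketch; write_rows is a hypothetical name.

```python
import csv
import io


def write_rows(go_list, outfile, delimiter=','):
    """Write one row per key: the key, then each list entry in its own column."""
    writer = csv.writer(outfile, delimiter=delimiter)
    for key, values in go_list.items():
        writer.writerow([key] + list(values))


GO_list = {'Seq_A': ['GO:1234', 'GO:2345', 'GO:3456'],
           'Seq_B': ['GO:7777', 'GO:8888']}
buf = io.StringIO()
write_rows(GO_list, buf, delimiter='\t')
print(buf.getvalue())
```

For a real file, open it with open('mapped.tsv', 'w', newline='') (hypothetical path) and pass that handle instead of the StringIO buffer.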
