I am trying to simply import a .csv into Python. I've read numerous documents but for the life of me I can't figure out how to do the following.
The CSV format is as follows
NYC,22,55
BOSTON,39,22
I'm trying to generate the following: {'NYC': [22, 55], 'BOSTON': [39, 22]} so that I can call i[0] and i[1] in a loop for each key.
I've tried
import csv
input_file = csv.DictReader(open(r"C:\Python\Sandbox\longlat.csv"))
for row in input_file:
    print(row)
Which prints my rows, but I don't know how to nest the two numeric values under the city name and generate the dict that I'm hoping to get.
Thanks for your help, sorry for my rookie question -
If you are not familiar with python comprehensions, you can use the following code that uses a for loop:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', 'r') as f:
    reader = csv.reader(f)
    result = {}
    for row in reader:
        result[row[0]] = row[1:]
The previous code works if you want the numbers to be strings; if you want them to be numbers, use:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', 'r') as f:
    reader = csv.reader(f)
    result = {}
    for row in reader:
        result[row[0]] = [int(e) for e in row[1:]]  # float instead of int is also valid
Use dictionary comprehension:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', mode='r') as csvfile:
    csvread = csv.reader(csvfile)
    result = {k: [int(c) for c in cs] for k, *cs in csvread}
This works in python-3.x, and produces on my machine:
>>> result
{'NYC': [22, 55], 'BOSTON': [39, 22]}
It also works for an arbitrary number of columns.
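As a quick check, the same comprehension run over in-memory data (io.StringIO standing in for the file, with an extra made-up column) gives:

```python
import csv
import io

# In-memory stand-in for the CSV file, with one extra made-up column
sample = io.StringIO("NYC,22,55,7\nBOSTON,39,22,3\n")

csvread = csv.reader(sample)
result = {k: [int(c) for c in cs] for k, *cs in csvread}
print(result)  # {'NYC': [22, 55, 7], 'BOSTON': [39, 22, 3]}
```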
In case you use python-2.7, you can use indexing and slicing instead of extended sequence unpacking:
import csv
with open(r'C:\Python\Sandbox\longlat.csv', mode='r') as csvfile:
    csvread = csv.reader(csvfile)
    result = {row[0]: [int(c) for c in row[1:]] for row in csvread}
Each row will have 3 values. You want the first as the key and the rest as the value.
>>> row
['NYC','22','55']
>>> {row[0]: row[1:]}
{'NYC': ['22', '55']}
You can create the whole dict:
lookup = {row[0]: row[1:] for row in input_file}
You can also use pandas like so:
import pandas as pd

# header=None because the file has no header row
df = pd.read_csv(r'C:\Python\Sandbox\longlat.csv', header=None)
result = {}
for index, row in df.iterrows():
    result[row[0]] = list(row[1:])
Here's a hint. Try familiarizing yourself with the str.split() method:
strVar = "NYC,22,55"
listVar = strVar.split(',')  # ["NYC", "22", "55"]
cityVar = listVar[0]         # "NYC"
restVar = listVar[1:]        # ["22", "55"]
# If you want to convert `restVar` into integers
restVar = list(map(int, restVar))
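Putting the hint together, here is a minimal sketch that parses the whole file by hand with split(), no csv module needed; io.StringIO stands in for the real file handle:

```python
import io

# Stand-in for open(r'C:\Python\Sandbox\longlat.csv'); swap in a real file handle
f = io.StringIO("NYC,22,55\nBOSTON,39,22\n")

result = {}
for line in f:
    parts = line.strip().split(',')
    result[parts[0]] = list(map(int, parts[1:]))

print(result)  # {'NYC': [22, 55], 'BOSTON': [39, 22]}
```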
Related
I am very new to python and am really struggling with this problem. I have a csv file with different columns, labeled "height", "weight", "full_name", etc. I'm trying to create a function that will look through the full_name column and return the longest name. (So if the longest name in the file was Rachel Smith, I'm trying to return that value.)
Here's the code that's worked best so far:
import csv
file = "personal_data.csv"
f = open(file)
reader = csv.reader(f, delimiter=",")
col_index = next(reader).index('full_name')
highest = max(rec[col_index] for rec in reader)
print(highest) #using this statement to test if it works
f.close()
I think it's not working because it's only printing Rachel, not her full name, Rachel Smith. I'm not really sure though.
You can try using the key= parameter of the max() function:
import csv
with open("personal_data.csv", "r") as f_in:
    reader = csv.reader(f_in, delimiter=",")
    col_index = next(reader).index("full_name")
    highest = max([rec[col_index] for rec in reader], key=len)  # <-- use key=len here
    print(highest)  # using this statement to test if it works
Use csv.DictReader to eliminate the need to find the full_name column index. Use max()'s key argument so that names are compared by length while the name itself is returned.
import csv
with open('personal_data.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    longest_name = max([row['full_name'] for row in reader], key=len)
    print(longest_name)
If the file is large enough that you care about memory usage, use map() and itemgetter() to get the names lazily and pass the resulting iterator as the iterable argument to max().
import csv
import operator
with open('personal_data.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)
    names = map(operator.itemgetter('full_name'), reader)
    longest_name = max(names, key=len)
    print(longest_name)
Packaged into a function:
import csv
def get_longest_value_from_col(filename, column_name):
    with open(filename, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        longest_name = max([row[column_name] for row in reader], key=len)
    return longest_name
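As a quick sanity check, the function can be exercised against a small throwaway file; the column values here are made up, and the function is repeated so the sketch is self-contained:

```python
import csv
import os
import tempfile

def get_longest_value_from_col(filename, column_name):
    # Same function as above, repeated for self-containment
    with open(filename, 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        return max((row[column_name] for row in reader), key=len)

# Write a small throwaway CSV with made-up data to exercise the function
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write("full_name,height\nAl Li,170\nRachel Smith,165\n")
    path = tmp.name

longest = get_longest_value_from_col(path, 'full_name')
print(longest)  # Rachel Smith
os.remove(path)
```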
From a csv file, I'm trying to sort the rows of a big column (named CRIM) in ascending order so I can do other manipulations afterwards. First, I tried this:
def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        for line in data:
            print(sorted(line['CRIM']))
But it sorted the individual characters of each value instead of comparing the values themselves.
For example, the numbers 1.96 and 0.92 would give me an output like this:
['1', '.', '6', '9']
['0', '.', '2', '9']
but I wanted
['0.92']
['1.96']
I read something about using a lambda and I tried this, but I didn't get any output.
def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        sorted(data, key=lambda line: line['CRIM'])
        for line in data:
            print(line['CRIM'])
Use pandas:
import pandas as pd
from pathlib import Path

file_path = Path('data.csv')
dataframe = pd.read_csv(file_path)  # pass other required parameters.
dataframe = dataframe.sort_values('CRIM')  # assign the sorted result back
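Note that sort_values returns a new, sorted DataFrame rather than sorting in place, so the result usually needs to be assigned back; a small sketch with made-up inline data instead of the real data.csv:

```python
import pandas as pd

# Made-up stand-in for pd.read_csv('data.csv')
dataframe = pd.DataFrame({'CRIM': [1.96, 0.92, 10.5]})

# sort_values returns a new DataFrame; assign it back to keep the sorted order
dataframe = dataframe.sort_values('CRIM')
print(dataframe['CRIM'].tolist())  # [0.92, 1.96, 10.5]
```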
First load all the data into a list and then sort the list using 'CRIM' as the key:
import csv

def house_data():
    with open('data.csv', newline='') as csvfile:
        data = csv.DictReader(csvfile)
        lines = []  # all the lines
        for line in data:
            lines.append(line)
        # or skip the for loop and do:
        # lines = list(data)
        # lines is a list of dictionaries
        # now sort `lines` in-place using 'CRIM' as a float
        lines.sort(key=lambda d: float(d['CRIM']))
    return lines
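For comparison, sorted() returns a new sorted list instead of sorting in place; a tiny sketch with made-up rows shaped like csv.DictReader output:

```python
# Made-up rows shaped like csv.DictReader output
lines = [{'CRIM': '1.96'}, {'CRIM': '0.92'}, {'CRIM': '10.5'}]

# Sort numerically, not lexicographically ('10.5' < '1.96' as strings!)
ordered = sorted(lines, key=lambda d: float(d['CRIM']))
print([d['CRIM'] for d in ordered])  # ['0.92', '1.96', '10.5']
```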
I have a csv file with some contents as shown below:
name,x,y
N1,30.2356,12.5263
N2,30.2452,12.5300
...and it goes on.
This is what I tried: I read them from the .csv and separately appended them to different lists.
import csv

nn = []
xkoor = []
ykoor = []
coord = []
with open('C:/Users/Mert/Desktop/py/transformation/1.csv', 'r') as f:
    reader = csv.reader(f)
    for row in reader:
        nn.append(row[0].split(','))
        xkoor.append(row[1].split(','))
        ykoor.append(row[2].split(','))
j = 1
for i in range(len(xkoor)):
    for j in range(len(ykoor)):
I'm trying to make a list as:
coord = [30.2356,12.5263],[30.2452,12.5300],....
and I couldn't understand how to do it. Any ideas?
The csv reader splits rows on commas for you by default:
import csv
with open('somefile.csv') as fh:
    reader = csv.reader(fh)
    for row in reader:
        print(row)

# outputs
['name', 'x', 'y']
['N1', '30.2356', '12.5263']
['N2', '30.2452', '12.5300']
With this in mind, if you are just looking to loop over coords, you can use unpacking to get your x and y, then build your list by appending tuples:
import csv

coords = []
with open('somefile.csv') as fh:
    reader = csv.reader(fh)
    next(reader)  # skips the headers
    for row in reader:
        name, x, y = row
        coords.append((float(x), float(y)))

# then you can iterate over that list like so
for x, y in coords:
    ...  # do something
Coords will then look like:
[(30.2356, 12.5263), (30.2452, 12.53)]
You should not split the strings by commas yourself since csv.reader already does it for you. Simply iterate over the csv.reader generator and unpack the columns as desired:
reader = csv.reader(f)
next(reader)
coord = [[float(x), float(y)] for _, x, y in reader]
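The same comprehension, made self-contained with io.StringIO standing in for the open file handle:

```python
import csv
import io

# Stand-in for the opened CSV file from the question
f = io.StringIO("name,x,y\nN1,30.2356,12.5263\nN2,30.2452,12.5300\n")

reader = csv.reader(f)
next(reader)  # skip the header row
coord = [[float(x), float(y)] for _, x, y in reader]
print(coord)  # [[30.2356, 12.5263], [30.2452, 12.53]]
```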
Seems like you're over-complicating things.
If all you're trying to do is create an array of coordinates containing only the X and Y values, this is how you would accomplish that:
import csv

coord = []
with open('C:/Users/Mert/Desktop/py/transformation/1.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        coord.append(row[1:3])  # row is already a list; no need to split

print(coord)
All you need to do is extract a subset of each row and append it to your coord array. There's no need to split anything yourself (csv.reader does that), or to create separate arrays for each axis.
K.I.S.S!
(Also, a word of advice - keep PII out of your questions. No need to use your whole windows file path, just indicate that it's a CSV file. I didn't need to know your name to answer the question!)
Why not pandas?!
read_csv will read your file and convert it to a dataframe; iterate over the rows and access columns x and y; combine them into a list of lists. It is also easier to use.
import pandas as pd
df = pd.read_csv('1.csv', header=0)
[[r.x, r.y] for _, r in df.iterrows()]
Result:
[[30.2356, 12.5263], [30.2452, 12.53]]
I'd go about it something like this:
import csv

# coordinates as strings
with open('some.csv', 'r') as f:
    next(f)  # skip the header row
    cord = [a for _, *a in csv.reader(f)]

# coordinates as floats
with open('some.csv', 'r') as f:
    next(f)  # skip the header row
    cord = [[float(x), float(y)] for _, x, y in csv.reader(f)]

for xy in cord:
    print(xy)
If you are into oneliners:
data = """name,x,y
N1,30.2356,12.5263
N2,30.2452,12.5300"""
coords = [[x, y]
          for line in data.split("\n")[1:]
          for _, x, y in [line.split(",")]]
print(coords)
This yields
[['30.2356', '12.5263'], ['30.2452', '12.5300']]
I want to add up lines in a csv file (it's a BOM, a bill of materials) if they are identical and in the same part, but not if they are of a specific type.
Here is the example to make it more clear:
LevelName,Type,Amount
Part_1,a,1
Part_1,a,1
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,1
Part_2,c,1
Part_2,d,1
Part_2,a,1
Part_2,a,1
Part_2,d,1
Part_2,d,1
So I need to sum up all types within a part, but not if the type is 'd'.
Result should look like this:
LevelName,Type,Amount
Part_1,a,2
Part_1,b,1
Part_1,c,1
Part_1,d,1
Part_1,f,1
Part_2,a,3
Part_2,c,1
Part_2,d,1
Part_2,d,1
Part_2,d,1
Unfortunately I cannot use any external libs, so pandas is not an option here.
This is how far I got:
import csv

map = {}
with open('infile.csv', 'rt') as f:
    reader = csv.reader(f, delimiter=',')
    with open('outfile.csv', 'w', newline='') as fout:
        writer = csv.writer(fout, delimiter=';', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(next(reader))
        for row in reader:
            (level, type, count) = row
            if not type == 'd':
Well, here I just can't get any further...
Thanks for any hint!
OK, sorry about suggesting pandas. Then first read the file, saving the results in a defaultdict:
from collections import defaultdict

grouped = defaultdict(int)

# inside your loop over the rows:
if not type == 'd':
    grouped[(level, type)] += int(count)
Then you can save the result of that dict to a file
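For completeness, here is one way the whole approach might be assembled (a sketch, not the only way): rows with type 'd' pass through unsummed, everything else accumulates per (LevelName, Type). io.StringIO stands in for the real input and output files:

```python
import csv
import io
from collections import defaultdict

# Stand-in for infile.csv, using a subset of the data from the question
infile = io.StringIO(
    "LevelName,Type,Amount\n"
    "Part_1,a,1\nPart_1,a,1\nPart_1,b,1\n"
    "Part_2,a,1\nPart_2,d,1\nPart_2,d,1\n"
)

reader = csv.reader(infile)
header = next(reader)

grouped = defaultdict(int)
d_rows = []  # 'd' rows are kept as-is, never summed
for level, type_, count in reader:
    if type_ == 'd':
        d_rows.append([level, type_, count])
    else:
        grouped[(level, type_)] += int(count)

# Stand-in for outfile.csv
outfile = io.StringIO()
writer = csv.writer(outfile)
writer.writerow(header)
for (level, type_), total in grouped.items():
    writer.writerow([level, type_, total])
for row in d_rows:
    writer.writerow(row)

print(outfile.getvalue())
```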
import csv
import os

cwd = os.getcwd()
master = {}
with open(os.path.join(cwd, 'infile.csv'), 'r', newline='') as f:
    data = list(csv.DictReader(f, delimiter=','))

for row in data:
    master.setdefault(row['LevelName'], {})
    if row['Type'] != 'd':
        master[row['LevelName']].setdefault(row['Type'], 0)
        master[row['LevelName']][row['Type']] += int(row['Amount'])

print(master)
Not as simple as the solution above, but this shows how to iterate over the data.
Or I suppose you could concatenate the 'LevelName' and the 'Type' so that you have one less line of code. It depends on what you want.
for row in data:
    if row['Type'] != 'd':
        master.setdefault(row['LevelName'] + row['Type'], 0)
        master[row['LevelName'] + row['Type']] += int(row['Amount'])

print(master)
EDIT
To write back to the original format, something like:
with open(os.path.join(cwd, 'outfile.csv'), 'w', newline='') as out:
    out.write('LevelName,Type,Amount\n')
    for k, v in master.items():
        for z in v:
            out.write('%s,%s,%s\n' % (k, z, str(v[z])))
I have data in a csv file that is imported like this:
import csv
with open('Half-life.csv', 'r') as f:
    data = list(csv.reader(f))
The data comes out row by row, e.g. data[0] = ['10', '2', '2'], and so on.
What I want, though, is to retrieve the data as columns instead of rows; in this case there are 3 columns.
You can create three separate lists, and then append to each using csv.reader.
import csv

c1 = []
c2 = []
c3 = []
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        c1.append(row[0])
        c2.append(row[1])
        c3.append(row[2])
A little more automatic and flexible version of Alexander's answer:
import csv
from collections import defaultdict

columns = defaultdict(list)
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for i in range(len(row)):
            columns[i].append(row[i])

# Following line is only necessary if you want a key error for invalid column numbers
columns = dict(columns)
You could also modify this to use column headers instead of column numbers.
import csv
from collections import defaultdict

columns = defaultdict(list)
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)
    column_nums = range(len(headers))  # Do NOT change to xrange
    for row in reader:
        for i in column_nums:
            columns[headers[i]].append(row[i])

# Following line is only necessary if you want a key error for invalid column names
columns = dict(columns)
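If the whole file fits in memory, zip() gives the same column-wise view in one step; this sketch assumes every row has the same number of fields and uses made-up data (io.StringIO stands in for the file):

```python
import csv
import io

# Made-up stand-in for Half-life.csv, with a header row
f = io.StringIO("a,b,c\n10,2,2\n15,4,1\n")

reader = csv.reader(f)
headers = next(reader)
# zip(*rows) transposes the remaining rows into per-column tuples
columns = dict(zip(headers, zip(*reader)))
print(columns)  # {'a': ('10', '15'), 'b': ('2', '4'), 'c': ('2', '1')}
```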
Another option: if you have numpy installed, you can use loadtxt to read a csv file into a numpy array. You can then transpose the array if you want columns instead of rows (I wasn't quite clear on how you wanted the data to look). For example:
import numpy as np
# Load data
data = np.loadtxt('csv_file.csv', delimiter=',')
# Transpose data if needs be
data = np.transpose(data)