I want to know how to turn a .csv file into a Python list that I can use for plotting and calculating.
I used:
fpath = r'C:112017\temp\tT.csv'
with open(fpath, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        print(list(reader))
it gives me lists like this (which I don't want):
[['2014-12-30', '18.34244791666665'], ['2014-12-31', '18.540224913494818'], ['2015-01-01', '18.15729166666666'],......
If I use
print(row)
it gives me lists like this (looks better, but I still cannot use it for calculating):
...
['2016-07-27', '20.434809022479584']
['2016-07-28', '21.395138886239796']
['2016-07-29', '20.81571181284057']
['2016-07-30', '20.565711801250778']
...
How can I use pandas to make a list? Or is there an easier way to achieve this? Is it possible to use something like:
date = row[0]
temp = row[1]
lis = pd.DataFrame(date,temp)
I guess there are some basic mistakes, but I can't fix them by myself.
Thank you for taking the time to help.
There is the pandas.read_csv() method, which will read the csv file and return a DataFrame, e.g.:
fpath = r'C:112017\temp\tT.csv'
df = pd.read_csv(fpath, delimiter=',', names=['date', 'temp'])
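From that DataFrame you can pull the columns straight out for plotting or arithmetic. A minimal sketch along those lines (the column names 'date' and 'temp' come from the read_csv call above; parse_dates is my addition, not part of the original answer):

import pandas as pd

fpath = r'C:112017\temp\tT.csv'
df = pd.read_csv(fpath, names=['date', 'temp'], parse_dates=['date'])

dates = df['date'].tolist()   # list of Timestamps
temps = df['temp'].tolist()   # list of floats, ready for calculations
print(df['temp'].mean())      # e.g. the average temperature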
I guess you are trying to get the columns as lists, i.e. a list of dates and a list of temperatures.
import csv

fpath = r'C:112017\temp\tT.csv'
with open(fpath, 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    data = list(reader)

# transpose the rows into two columns
date, temp = list(map(list, zip(*data)))
# in Python 2, map(list, zip(*data)) already returns a list

# convert temp to float
temp = list(map(float, temp))
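If the goal is plotting (as in the question), these two lists can go straight into matplotlib; a rough sketch, not part of the original answer:

import matplotlib.pyplot as plt

plt.plot(date, temp)   # date strings on the x-axis, floats on the y-axis
plt.xlabel('date')
plt.ylabel('temperature')
plt.show()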
In my opinion, if you want to perform calculations with data from a .csv file, you should consider using pandas and numpy.
import pandas as pd
import numpy as np
# importing dataframe
df = pd.read_csv('filename.csv', delimiter=',')
# check the dataframe
print(df)
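Once the data is in a DataFrame, calculations work column-wise without any manual conversion. A small sketch, assuming the columns have been named 'date' and 'temp' (e.g. via names=['date', 'temp'] as in the earlier answer):

# column access returns a Series you can compute on directly
print(df['temp'].mean(), df['temp'].max())

# or pull plain Python lists / numpy arrays if you prefer
temps = df['temp'].to_numpy()
dates = df['date'].tolist()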
I'm trying to sum a column in a csv file using Python. Here's a sample of the csv data:
Date,Profit/Losses
Jan-2010,867884
Feb-2010,984655
Mar-2010,322013
Apr-2010,-69417
May-2010,310503
Jun-2010,522857
Jul-2010,1033096
Aug-2010,604885
Sep-2010,-216386
I want to sum the Profit/Losses column.
I am using the following code, but it's returning 0. Where could I be going wrong?
import os
import csv

# Path to collect data from the csv file in the Resources folder
pybank_csv = os.path.join("resources", "budget_data.csv")

with open(pybank_csv, 'r') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    next(csvfile, None)
    t = sum(float(row[1]) for row in csvreader)

# print the results
print(f"Total: {t}")
The easiest way is to use the pandas library.
Use pip install pandas to install it on your machine, and then:
import pandas as pd
df = pd.read_csv('your_filename.csv')
sumcol = df['Profit/Losses'].sum()
print(sumcol)
The sum is now in the sumcol variable. For future reference, if your task is to work with data provided in a csv file, pandas is a blessing. The library provides thousands of different operations you can perform on your data. Refer to the pandas website for more info.
If you want to use only the csv package, you can read the csv as dicts and then sum the Profit/Losses entry for each row:
import csv

total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.DictReader(csvfile)
    for row in data:
        total = total + int(row['Profit/Losses'])
print(total)
Or, if you want to use reader instead of DictReader, you need to ignore the first row. Something like this:
import csv

total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.reader(csvfile)
    for row in data:
        # skip the header row, whose second field starts with 'P' (Profit/Losses)
        if not str(row[1]).startswith('P'):
            total = total + int(row[1])
print(total)
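A slightly more robust variant (my addition, not part of the original answer) skips the header explicitly with next() instead of checking the first character:

import csv

total = 0
with open('your_filename.csv', newline='') as csvfile:
    data = csv.reader(csvfile)
    next(data)   # skip the 'Date,Profit/Losses' header row
    for row in data:
        total += int(row[1])
print(total)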
I am a beginner in Python and would like to have your opinion.
I wrote this code, which reads the only column in a file on my PC and puts it in a list.
I have difficulty understanding how to modify the same code for a file that has multiple columns and select only the column I am interested in.
Can you help me?
list = []
with open(r'C:\Users\Desktop\mydoc.csv') as file:
    for line in file:
        item = int(line)
        list.append(item)

results = []
for i in range(0, 1086):
    a = list[i-1]
    b = list[i]
    c = list[i+1]
    results.append(b)

print(results)
You can use the pandas.read_csv() method very simply, like this:
import pandas as pd
my_data_frame = pd.read_csv('path/to/your/data')
results = my_data_frame['name_of_your_wanted_column'].values.tolist()
A useful module for the kind of work you are doing is the imaginatively named csv module.
Many csv files have a "header" at the top; by convention this is a useful way of labeling the columns of your file. Assuming you can insert a line at the top of your csv file with comma-delimited field names, you could replace your program with something like:
import csv

with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    csv_reader = csv.DictReader(myfile)
    for row in csv_reader:
        print(row['column_name_of_interest'])
The above will print to the terminal all the values that match your specific 'column_name_of_interest' after you edit it to match your particular file.
It's normal to work with lots of columns at once, so that dictionary method of packing a whole row into a single object, addressable by column name, can be very convenient later on.
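To actually collect the chosen column into a list, as the question asks, a sketch along the same lines (same placeholder column name, which you would edit to match your file):

import csv

column = []
with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    for row in csv.DictReader(myfile):
        column.append(int(row['column_name_of_interest']))
print(column)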
For a pure Python implementation, you should use the csv package.
data.csv
Project1,folder1/file1,data
Project1,folder1/file2,data
Project1,folder1/file3,data
Project1,folder1/file4,data
Project1,folder2/file11,data
Project1,folder2/file42a,data
Project1,folder2/file42b,data
Project1,folder2/file42c,data
Project1,folder2/file42d,data
Project1,folder3/filec,data
Project1,folder3/fileb,data
Project1,folder3/filea,data
Your Python program should read it line by line:
import csv

a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        print(row)
        # ['Project1', 'folder1/file1', 'data']
If you print the row element, you will see it is a list like this:
['Project1', 'folder1/file1', 'data']
If I want to collect all the elements in column 1, I need to append that element to my list, doing:
a.append(row[1])
Now list a will look like:
['folder1/file1', 'folder1/file2', 'folder1/file3', 'folder1/file4', 'folder2/file11', 'folder2/file42a', 'folder2/file42b', 'folder2/file42c', 'folder2/file42d', 'folder3/filec', 'folder3/fileb', 'folder3/filea']
Here is the complete code:
import csv

a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        a.append(row[1])
I have a file 'data.csv' that looks something like
ColA, ColB, ColC
1,2,3
4,5,6
7,8,9
I want to open and read the file columns into lists, with the 1st entry of that list omitted, e.g.
dataA = [1,4,7]
dataB = [2,5,8]
dataC = [3,6,9]
In reality there are more than 3 columns and the lists are very long; this is just an example of the format. I've tried:
csv_file = open('data.csv', 'rb')
csv_array = []
for row in csv.reader(csv_file, delimiter=','):
    csv_array.append(row)
Where I would then allocate each index of csv_array to a list, e.g.
dataA = [int(i) for i in csv_array[0]]
But I'm getting errors:
_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
Also, it feels like a very long-winded way of just saving data to a few lists...
Thanks!
edit:
Here is how I solved it:
import pandas as pd

df = pd.read_csv('data.csv', names=['ColA', 'ColB', 'ColC'])
dataA = map(int, (df.ColA.tolist())[1:3])
and repeat for the rest of the columns.
Just to spell this out for people trying to solve a similar problem, perhaps without Pandas, here's a simple refactoring with comments.
import csv

# Open the file in 'r' mode, not 'rb'
csv_file = open('data.csv', 'r')

dataA = []
dataB = []
dataC = []

# Read off and discard the first line, to skip headers
csv_file.readline()

# Split columns while reading
for a, b, c in csv.reader(csv_file, delimiter=','):
    # Append each variable to a separate list
    dataA.append(a)
    dataB.append(b)
    dataC.append(c)
This does nothing to convert the individual fields to numbers (use append(int(a)) etc if you want that) but should hopefully be explicit and flexible enough to show you how to adapt this to new requirements.
Use Pandas:
import pandas as pd
df = pd.DataFrame.from_csv(path)
rows = df.apply(lambda x: x.tolist(), axis=1)
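Note that DataFrame.from_csv has been deprecated and removed in newer pandas releases; a rough equivalent with read_csv (assuming the first column is the index, as from_csv did by default) would be:

import pandas as pd

df = pd.read_csv(path, index_col=0)
rows = df.apply(lambda x: x.tolist(), axis=1)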
To skip the header, create your reader on a separate line. Then to convert from a list of rows to a list of columns, use zip():
import csv

with open('data.csv', 'rb') as f_input:
    csv_input = csv.reader(f_input)
    header = next(csv_input)
    data = zip(*[map(int, row) for row in csv_input])

print data
Giving you:
[(1, 4, 7), (2, 5, 8), (3, 6, 9)]
So if needed:
dataA = data[0]
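(Note that this is Python 2 code: in Python 3 you would open the file in text mode with newline='', wrap the zip(...) call in list() before indexing, and use print(data).)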
Seems like you have OSX line endings in your csv file. Try saving the csv file as "Windows Comma Separated (.csv)" format.
There are also easier ways to do what you're doing with the csv reader:
csv_array = []
with open('data.csv', 'r') as csv_file:
    reader = csv.reader(csv_file)
    # remove headers
    reader.next()
    # loop over rows in the file and append them to your array; each row is already formatted as a list
    for row in reader:
        csv_array.append(row)
You can then set dataA = csv_array[0]
First, if you read the csv file with csv.reader(csv_file, delimiter=','), you will still read the header.
csv_array[0] will be the header row -> ['ColA', ' ColB', ' ColC']
Also, if you're using a Mac, this issue is already covered here: CSV new-line character seen in unquoted field error
I would also recommend using pandas and numpy instead if you will be doing more analysis with the data. read_csv reads the csv file into a pandas DataFrame.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
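A minimal sketch of that suggestion, using the column names from the question's sample file (skipinitialspace is my addition, to absorb the stray spaces after the commas in the header):

import pandas as pd

df = pd.read_csv('data.csv', skipinitialspace=True)  # header row becomes the column names
dataA = df['ColA'].tolist()
dataB = df['ColB'].tolist()
dataC = df['ColC'].tolist()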
Use csv.DictReader() to select specific columns:
import csv

dataA = []
dataB = []
with open('data.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    for row in csv_reader:
        dataA.append(row['ColA'])
        dataB.append(row['ColB'])
I'm just having a bit of a struggle getting the right formatting for the csv output files.
I have the following list called found in python:
[['dropbearid', 'distance'],
['DB_1487', 17.543651156695343],
['DB_1901', 24.735333924441772],
['DB_2800', 6.607094868078008]]
When I use
import csv
out = csv.writer(open("myfile.csv","w"), delimiter=',',quoting=csv.QUOTE_ALL)
out.writerow(found)
I get a file that, when I open it in Excel, has the values, but 'dropbearid','distance' is in one cell, 'DB_1487','17.54...' in the next cell, and so on, all continued across the first row.
Is there a way to set up the output so 'dropbearid' and 'distance' are placed across two columns and all the lists below them are put in rows below?
Thanks!
Welcome to Stack Overflow. You write the whole list of lists to a single row, which is why you have two values in one cell: every element of found (itself a list of two elements) goes into a cell. You need to iterate over the list of lists and write each inner list to its own row. This should work:
import csv

out = csv.writer(open("myfile.csv", "w", newline=''), delimiter=',', quoting=csv.QUOTE_ALL)
for row in found:
    out.writerow(row)
The function writerow() writes a single row, so you need to use writerows() instead, and set the newline parameter of open() to '' to avoid blank rows in the file.
found = [['dropbearid', 'distance'],
         ['DB_1487', 17.543651156695343],
         ['DB_1901', 24.735333924441772],
         ['DB_2800', 6.607094868078008]]

import csv

with open('myfile.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerows(found)
Hope this helps! Cheers!
import pandas as pd

found = [['dropbearid', 'distance'], ['DB_1487', 17.543651156695343], ['DB_1901', 24.735333924441772], ['DB_2800', 6.607094868078008]]
dffound = pd.DataFrame(found)
header = dffound.iloc[0]
dffound = dffound[1:]
dffound = dffound.rename(columns=header)
dffound.to_csv("enter path here")
Use pandas DataFrames for writing lists to CSVs; it makes the formatting a lot easier.
import pandas as pd
dffound = pd.DataFrame(found,columns=['dropbearid', 'distance'])
dffound.to_csv('Found.csv')
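Note that found already contains the header as its first sub-list, so with this approach you would get a header-like data row as well; you may want to build the frame as pd.DataFrame(found[1:], columns=found[0]) instead.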
You would use the writerows function instead of writerow to write each item of each sub-list as a column.
import csv

with open('myfile.csv', 'w+', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=',')
    writer.writerows(found)
Use the pandas library for this.
import pandas as pd
df = pd.DataFrame(found, columns=['dropbearid', 'distance'])
df = df.drop(0) # drop the header row
df.to_csv('Found.csv', index=False)
I have data in a csv file that is imported like this:
import csv

with open('Half-life.csv', 'r') as f:
    data = list(csv.reader(f))
The data comes out as rows, so it prints like data[0] = ['10', '2', '2'] and so on.
What I want, though, is to retrieve the data as columns instead of rows; in this case there are 3 columns.
You can create three separate lists and then append to each one while reading with csv.reader:
import csv

c1 = []
c2 = []
c3 = []
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        c1.append(row[0])
        c2.append(row[1])
        c3.append(row[2])
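A more compact variant (my addition, not part of the original answer) transposes the rows with zip:

import csv

with open('Half-life.csv', newline='') as f:
    rows = list(csv.reader(f))

c1, c2, c3 = (list(col) for col in zip(*rows))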
A little more automatic and flexible version of Alexander's answer:
import csv
from collections import defaultdict

columns = defaultdict(list)
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        for i in range(len(row)):
            columns[i].append(row[i])

# The following line is only necessary if you want a KeyError for invalid column numbers
columns = dict(columns)
You could also modify this to use column headers instead of column numbers.
import csv
from collections import defaultdict

columns = defaultdict(list)
with open('Half-life.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    headers = next(reader)
    column_nums = range(len(headers))  # Do NOT change to xrange
    for row in reader:
        for i in column_nums:
            columns[headers[i]].append(row[i])

# The following line is only necessary if you want a KeyError for invalid column names
columns = dict(columns)
Another option: if you have numpy installed, you can use loadtxt to read a csv file into a numpy array. You can then transpose the array if you want more columns than rows (I wasn't quite clear on how you wanted the data to look). For example:
import numpy as np
# Load data
data = np.loadtxt('csv_file.csv', delimiter=',')
# Transpose data if needs be
data = np.transpose(data)
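loadtxt can also do the transpose for you via its unpack argument; a short sketch assuming the file has exactly three numeric columns and no header:

import numpy as np

# unpack=True returns the columns instead of the rows
c1, c2, c3 = np.loadtxt('csv_file.csv', delimiter=',', unpack=True)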