I'm trying to read a CSV file of currency pairs using csv.reader. The first column holds dates and the first row holds the currency pairs, e.g. USD/CAD. I can read the file in, but I cannot access the currency-pair data to perform simple calculations.
I've tried using next(x) to skip the header row (the currency pairs). If I do this, I get a TypeError: the csv reader is not subscriptable.
path = x
file = open(path)
dataset = csv.reader(file, delimiter = '\t',)
header = next(dataset)
header
Output shows the header row which is
['Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR']
I expect to be able to access the underlying currency pairs, but I'm getting the TypeError noted above. Is there a simple way to access the currency pairs? For example, I want to use USD.describe() to get simple statistics on the USD currency pair.
How can I move from this stage to accessing the data underlying the header row?
Try this example:
import csv
with open('file.csv') as csv_file:
    # the header shows the file is comma-delimited, so use the default delimiter
    csv_reader = csv.reader(csv_file)
    for row in csv_reader:
        print(f'\t{row[0]} {row[1]} {row[3]}')
It's apparent from the output of your header row that the columns are comma-delimited rather than tab-delimited, so instead of passing delimiter = '\t' to csv.reader, you should let it use the default delimiter ',' instead:
dataset = csv.reader(file)
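To see why the header came back as a single string, here is a minimal sketch that parses the same kind of data with both delimiters. The column names are taken from the question, but the values are made up for illustration, and io.StringIO stands in for the real file.

```python
import csv
import io

# Hypothetical sample: column names from the question, made-up values.
sample = "Date,USD,CNY\n2019-05-01,0.70,4.72\n2019-05-02,0.71,4.75\n"

# With a tab delimiter, each comma-separated line stays one single field.
tab_rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))

# With the default comma delimiter, each line splits into columns.
comma_rows = list(csv.reader(io.StringIO(sample)))

print(tab_rows[0])    # a one-element list holding the whole header line
print(comma_rows[0])  # three separate column names
```

Note that the reader is an iterator, so it cannot be indexed directly; wrapping it in list(...) as above is one way to get something subscriptable.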
If you need to compute some statistics, pandas is your friend. There is no need to use the csv module; use pandas.read_csv instead.
import pandas
filename = 'path/of/file.csv'
dataset = pandas.read_csv(filename, sep = '\t') #or whatever the separator is
pandas.read_csv uses the first line as the header automatically.
To see statistics, simply do:
dataset.describe()
Or for a single column:
dataset['column_name'].describe()
Are you sure that your delimiter is '\t'? In the first row your delimiter is ','. In any case, you can skip the first row by calling f.readline() on the file before passing it to csv.reader:
import csv
example = """Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR
1-2-3\tabc\t1.1\t1.2
4-5-6\txyz\t2.1\t2.2
"""
with open('demo.csv', 'w') as f:
    f.write(example)
with open('demo.csv') as f:
    f.readline()
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)
# ['1-2-3', 'abc', '1.1', '1.2']
# ['4-5-6', 'xyz', '2.1', '2.2']
I think that you need something else... Can you add to your question:
example of first 3 lines in your csv
Example of what you'd like to access:
is using row[0], row[1] enough for you?
or do you want "named" access like row['Date'], row['USD'],
or you want something more complex like data_by_date['2019-05-01']['USD']
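The three access styles listed above could look roughly like this with only the standard library. This is a sketch under assumptions: the sample values are invented, io.StringIO stands in for the real file, and statistics.mean is used here as a stand-in for the kind of summary the question asks about.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real file would be opened with open(path, newline="").
sample = "Date,USD,EUR\n2019-05-01,0.70,0.62\n2019-05-02,0.72,0.63\n"

# Positional access: row[0], row[1]
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Named access: row['Date'], row['USD']
named = list(csv.DictReader(io.StringIO(sample)))

# Nested access: data_by_date['2019-05-01']['USD']
data_by_date = {
    row["Date"]: {k: float(v) for k, v in row.items() if k != "Date"}
    for row in named
}

# csv yields strings, so convert to float before doing arithmetic.
usd = [float(row["USD"]) for row in named]
usd_mean = statistics.mean(usd)

print(data[0][1])
print(named[0]["USD"])
print(data_by_date["2019-05-01"]["USD"])
print(usd_mean)
```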
I have a simple file named saleem.csv which contains the following lines of csv information:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file, and write only column[2] and column[4] to a new CSV file named out.csv. I have written the following script to do the job.
import csv
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print dele
with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)
f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. Sorry for the earlier mistake; I mistakenly wrote row.
Edited to reflect revised question
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the csv file. So it shouldn't be inside a loop where you iterate over all rows and attempt to write them individually.
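A small sketch of the difference, which also explains the character-by-character output in the question. The tuple below is a made-up single row, and io.StringIO stands in for out.csv.

```python
import csv
import io

dele = ("MyNetwork.node[0].nic.phy", "0")  # one row with two fields

# writerows expects an iterable of rows, so each string is treated as a row
# and each of its characters becomes a separate field.
wrong = io.StringIO()
csv.writer(wrong, lineterminator="\n").writerows(dele)

# writerow writes the tuple as a single row with two fields.
right = io.StringIO()
csv.writer(right, lineterminator="\n").writerow(dele)

print(wrong.getvalue())
print(right.getvalue())
```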
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't going to be keeping track of row[2] and row[4] from every row.
What you could do instead:
import csv
dele = []
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    next(readcsv, None)  # skip the header line
    for row in readcsv:
        # collect columns 2 and 4 from every row
        dele.append([row[2], row[4]])
        print([row[2], row[4]])
with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)
This produced output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The reason the with open(...): syntax is so widely used is that it closes the file for you, even if an error occurs inside the block. You don't need to close it yourself. As soon as the with block ends, the file is closed.
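A quick way to convince yourself of this is to check the file object's closed attribute. The sketch below writes to a scratch file in a temporary directory; the path and file name are hypothetical and exist only for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("hello\n")
    inside = f.closed  # still open inside the block

after = f.closed       # closed once the block has ended

print(inside, after)
```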
I would suggest using the pandas library.
It makes working with csv files very easy.
import pandas as pd #standard convention for importing pandas
# reads the csv file into a pandas dataframe
dataframe = pd.read_csv('saleem.csv')
# make a new dataframe with just columns 2 and 4
print_dataframe = dataframe.iloc[:,[2,4]]
# output the csv file, but don't include the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type
dataframe.head()
to see the first few values of the dataframe. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process csv data.
I'm trying to write lists like this to a CSV file:
['ABC','One,Two','12']
['DSE','Five,Two','52']
To a file like this:
ABC One,Two 12
DSE Five,Two 52
Basically, write anything inside '' to a cell.
However, it is splitting One and Two into different cells and merging ABC with One in the first cell.
Part of my script:
out_file_handle = open(output_path, "ab")
writer = csv.writer(out_file_handle, delimiter = "\t", dialect='excel', lineterminator='\n', quoting=csv.QUOTE_NONE)
output_final = (tsv_name_list.split(".")[0]+"\t"+key + "\t" + str(listOfThings))
output_final = str([output_final]).replace("[","").replace("]","").replace('"',"").replace("'","")
output_final = output_final.split("\\t")
print output_final #gives the first lists of strings I mentioned above.
writer.writerow(output_final)
First output_final line gives
ABC One,Two 12
DSE Five,Two 52
Using the csv module simply works, so you're going to need to be more specific about what's convincing you that the elements are bleeding across cells. For example, using the (now quite outdated) Python 2.7:
import csv
data_lists = [['ABC','One,Two','12'],
['DSE','Five,Two','52']]
with open("out.tsv", "wb") as fp:
    writer = csv.writer(fp, delimiter="\t", dialect="excel", lineterminator="\n")
    writer.writerows(data_lists)
I get an out.tsv file of:
dsm#winter:~/coding$ more out.tsv
ABC One,Two 12
DSE Five,Two 52
or
>>> out = open("out.tsv").readlines()
>>> for row in out: print repr(row)
...
'ABC\tOne,Two\t12\n'
'DSE\tFive,Two\t52\n'
which is exactly as it should be. Now if you take these rows, which are tab-delimited, and for some reason split them using commas as the delimiter, sure, you'll think that there are two columns, one with ABC\tOne and one with Two\t12. But that would be silly.
You've set up the CSV writer, but then for some reason you completely ignore it and try to output the lines manually. That's pointless. Use the functionality available.
writer = csv.writer(...)
for row in tsv_list:
    writer.writerow(row)
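For reference, a Python 3 sketch of the same idea, using the lists from the question. io.StringIO stands in for the output file here; with a real file in Python 3 you would open it in text mode with newline="" rather than in binary mode.

```python
import csv
import io

data_lists = [["ABC", "One,Two", "12"],
              ["DSE", "Five,Two", "52"]]

# io.StringIO stands in for a file opened with open(path, "w", newline="").
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(data_lists)
text = buf.getvalue()

# Reading back with the same delimiter recovers the original cells,
# including the commas inside the second column.
round_trip = list(csv.reader(io.StringIO(text), delimiter="\t"))

print(repr(text))
print(round_trip)
```

Whatever opens the result afterwards also needs to be told that the delimiter is a tab, otherwise it may split on the commas instead.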
I am developing a simple application that reads a CSV file sent to it and produces some results based on the data points in the columns. Data.csv:
Something, everything, 6, xy
Something1, everything1, 7, ab
Something2, everything2, 9, pq
I open the file as follows:
FileOpen = open('../sources/data.csv', 'rU')
FileRead = csv.reader(FileOpen, delimiter = ',')
FileRead.next()
for row in FileRead:
    #This does not work
    if row[0] == 'something' and row[1] == 'something1':
        print row[2]
    #This works
    if row[0] == 'something' and row[3] == 'xy':
        print row[2]
The above code does not print anything. But if I use row[0] and row[3] in the if condition, it works well. So the problem seems to be with columns 1 and 2, while columns 0 and 3 work fine. Is the CSV file format wrong? I followed Microsoft's procedure to create the CSV from an Excel file.
The use and naming of row is completely correct. The main problem is the white space in your file. If I print row, I get
['Something', ' everything', ' 6']
               ^              ^
The solution will most likely deal with
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
from here https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters
You pass this option in the constructor like this:
FileRead = csv.reader(FileOpen, delimiter = ',', skipinitialspace=True)
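A minimal Python 3 sketch of the effect, using two lines from the question's Data.csv with io.StringIO standing in for the file:

```python
import csv
import io

sample = "Something, everything, 6, xy\nSomething1, everything1, 7, ab\n"

# Default behaviour: the space after each comma is kept as part of the field.
default_rows = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace that immediately follows a delimiter.
stripped_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows[0])
print(stripped_rows[0])
```

With the default settings, a comparison such as row[3] == 'xy' fails because the field is actually ' xy'. Note that the option only removes leading whitespace after a delimiter, not trailing whitespace.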
Yes, it was the spaces after all. To remove spaces in Excel, insert a new column next to the column with the spaces and use =TRIM(C1). Then you can copy and paste the data into a new file and create a CSV from that.