Printing selected columns from a csv file in Python


I have some code here:
with open("dsasa.csv", 'rb') as csvfile:
    content = csv.reader(csvfile, delimiter='|')
    for row in content:
        print row
I would like to print columns 2, 3, 4 from the csv file in the following format:
4556 | 432432898904 | Joseph Henry
4544 | 54522238904 | Mark Mulligan
I have two issues. First, the pipe delimiter (|) is not appearing between the columns. Second, I cannot print the specific columns I want the manual way, i.e. print row[2], row[3], row[4].
I looked online and tried a few different solutions, but I can't find a way to get this to work.
Any help would be greatly appreciated.
Thanks!

Try this:
import csv
with open("dsasa.csv", 'rb') as csvfile:
    content = csv.reader(csvfile)
    for row in content:
        print " | ".join([row[2], row[3], row[4]])
The delimiter argument of csv.reader describes the input file, not the output.
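A minimal Python 3 sketch of the same idea, assuming dsasa.csv is comma-delimited with at least five columns. The helper name format_columns and the sample rows are hypothetical; the point is that the input delimiter and the output separator are two separate settings.

```python
import csv
import io

def format_columns(csvfile, columns=(2, 3, 4), in_delimiter=",", out_sep=" | "):
    """Return one formatted string per input row, keeping only the chosen columns."""
    # in_delimiter controls how the *input* is split; out_sep only affects the output text.
    reader = csv.reader(csvfile, delimiter=in_delimiter)
    return [out_sep.join(row[i] for i in columns) for row in reader]

# Hypothetical stand-in for dsasa.csv (assumed comma-delimited, five columns).
SAMPLE = io.StringIO(
    "a,b,4556,432432898904,Joseph Henry\n"
    "c,d,4544,54522238904,Mark Mulligan\n"
)

for line in format_columns(SAMPLE):
    print(line)
```

For the real file, something like `with open("dsasa.csv", newline="") as f:` followed by `format_columns(f)` should work; pass `in_delimiter="|"` only if the file itself is pipe-delimited.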

What actually appears between the columns as the delimiter? Are you sure it is '|' and not a comma? My guess is that because you are not using the correct delimiter, you cannot use print row[2], row[3], row[4]. Can you post a line of the CSV?
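If you are unsure which delimiter the file uses, the standard library's csv.Sniffer can guess it from a sample. This is a heuristic sketch, not a guarantee; the candidate set ",|\t;" and the sample lines are assumptions for illustration, and sniff() raises csv.Error if it cannot decide.

```python
import csv

def detect_delimiter(text, candidates=",|\t;"):
    """Guess the delimiter of a CSV sample, restricted to the given candidate characters."""
    return csv.Sniffer().sniff(text, delimiters=candidates).delimiter

# Hypothetical sample lines in the format the question wants to produce.
sample = "4556|432432898904|Joseph Henry\n4544|54522238904|Mark Mulligan\n"
print(repr(detect_delimiter(sample)))
```

In practice you would pass the first few kilobytes of the file, e.g. `f.read(4096)`, and then rewind with `f.seek(0)` before creating the reader.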

Related

Accessing Data in csv.reader

I'm trying to access a CSV file of currency pairs using csv.reader. The first column holds dates, and the first row holds the currency pairs, e.g. USD/CAD. I can read in the file, but I cannot access the currency-pair data to perform simple calculations.
I've tried using next(x) to skip the header row (the currency pairs). If I do this, I get a TypeError: csv reader is not subscriptable.
path = x
file = open(path)
dataset = csv.reader(file, delimiter = '\t',)
header = next(dataset)
header
The output shows the header row, which is:
['Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR']
I expect to be able to access the underlying currency-pair data, but I'm getting the TypeError noted above. Is there a simple way to access the currency pairs? For example, I want to use USD.describe() to get simple statistics on the USD currency pair.
How can I move from this stage to accessing the data underneath the header row?

Try this example:
import csv
with open('file.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        print(f'\t{row[0]} {row[1]} {row[3]}')

It's apparent from the output of your header row that the columns are comma-delimited rather than tab-delimited, so instead of passing delimiter = '\t' to csv.reader, you should let it use the default delimiter ',' instead:
dataset = csv.reader(file)
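A minimal sketch of reading one column once the delimiter is right, assuming a comma-delimited file. The three-column sample below is hypothetical. Calling next() on the reader to skip the header is fine; the "not subscriptable" TypeError most likely comes from indexing the reader object itself, whereas the rows it yields are ordinary lists that can be indexed.

```python
import csv
import io
import statistics

# Hypothetical sample in the comma-delimited layout of the question's header row.
SAMPLE = io.StringIO(
    "Date,USD,CNY\n"
    "2019-05-01,1.00,6.73\n"
    "2019-05-02,1.00,6.75\n"
)

reader = csv.reader(SAMPLE)   # default delimiter is ','
header = next(reader)         # next() consumes the header row from the iterator
col = header.index("CNY")     # look up the column position by its name

# Index each row (a list of strings), never the reader object itself.
cny = [float(row[col]) for row in reader]
print(header, cny)
print(statistics.mean(cny), min(cny), max(cny))
```

The statistics module only covers basic summaries; for a full describe()-style table, the pandas approach is more convenient.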

If you need to compute statistics, pandas is your friend. There's no need to use the csv module; use pandas.read_csv.
import pandas
filename = 'path/of/file.csv'
dataset = pandas.read_csv(filename, sep = '\t') #or whatever the separator is
pandas.read_csv uses the first line as the header automatically.
To see statistics, simply do:
dataset.describe()
Or for a single column:
dataset['column_name'].describe()

Are you sure your delimiter is '\t'? In the first row, your delimiter is ','. In any case, you can skip the first row by calling file.readline() before passing the file to csv.reader:
import csv
example = """Date,USD,Index,CNY,JPY,EUR,KRW,GBP,SGD,INR,THB,NZD,TWD,MYR,IDR,VND,AED,PGK,HKD,CAD,CHF,SEK,SDR
1-2-3\tabc\t1.1\t1.2
4-5-6\txyz\t2.1\t2.2
"""
with open('demo.csv', 'w') as f:
    f.write(example)
with open('demo.csv') as f:
    f.readline()
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        print(row)
# ['1-2-3', 'abc', '1.1', '1.2']
# ['4-5-6', 'xyz', '2.1', '2.2']
I think you may need something else. Could you add the following to your question?
- an example of the first 3 lines of your CSV
- an example of what you'd like to access:
  - is using row[0], row[1] enough for you?
  - or do you want "named" access, like row['Date'], row['USD']?
  - or do you want something more complex, like data_by_date['2019-05-01']['USD']?
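The "named" and "by date" options can both be sketched with csv.DictReader, which uses the header row as dictionary keys. The helper index_by_date and the sample data are hypothetical, and the file is assumed to be comma-delimited with a Date column.

```python
import csv
import io

def index_by_date(csvfile, key="Date"):
    """Hypothetical helper: map each row's `key` value to a dict of its other columns."""
    table = {}
    for row in csv.DictReader(csvfile):  # the header row supplies the dict keys
        date = row.pop(key)
        table[date] = row
    return table

# Hypothetical sample using the question's comma-delimited header layout.
SAMPLE = io.StringIO(
    "Date,USD,CNY\n"
    "2019-05-01,1.00,6.73\n"
    "2019-05-02,1.00,6.75\n"
)

data_by_date = index_by_date(SAMPLE)
print(data_by_date["2019-05-01"]["CNY"])  # values are still strings; convert with float()
```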

Reading and splitting a .raw file for data processing

I have data from a mechanical test in the .raw output format, and I want to access it in Python.
The file needs to be split on the delimiter ";" so that it contains 13 columns.
The idea is then to index and pull out the desired information, which in my case is the "Extension mm" and "Load N" values (starting at row 41), as arrays in order to create a plot.
I have never worked with .raw files and I don't know what to do.
The file can be downloaded here:
https://drive.google.com/file/d/0B0GJeyFBNd4FNEp0elhIWGpWWWM/view?usp=sharing
Hope somebody can help me out there!

You can convert the .raw file into a CSV file and then use the csv module. Remember to set delimiter=' ', otherwise the reader uses a comma as the delimiter by default.
import csv
with open('TST0002.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:  # reads the file row by row
        print(row[0])  # row[0] is the first element of that row

Your file looks basically like a .tsv with 40 lines to skip. Could you try this?
# export your file.raw to a .tsv, skipping the first 40 lines
with open('TST0002.raw') as infile, open('new.tsv', 'w') as outfile:
    lines = infile.readlines()[40:]
    for line in lines:
        outfile.write(line)
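Without pandas, a similar result can be sketched with the csv module by skipping the preamble with itertools.islice and reading the two named columns with csv.DictReader. This assumes, as the answer does, that the file is tab-separated with a 40-line preamble and headers named exactly "Extension mm" and "Load N"; the helper name and the fake sample are hypothetical, and extra unit or blank rows in the real file would need handling.

```python
import csv
import io
import itertools

def read_two_columns(f, skip=40, names=("Extension mm", "Load N")):
    """Hypothetical helper: skip a fixed preamble, then read named tab-separated columns as floats."""
    body = itertools.islice(f, skip, None)       # drop the first `skip` lines
    reader = csv.DictReader(body, delimiter="\t")  # next line is used as the header
    columns = {name: [] for name in names}
    for row in reader:
        for name in names:
            columns[name].append(float(row[name]))
    return columns

# Hypothetical stand-in for TST0002.raw: 40 preamble lines, a header, and one data row.
fake = io.StringIO("preamble\n" * 40 + "Extension mm\tLoad N\tOther\n-118.284\t0.1365034\tx\n")
cols = read_two_columns(fake)
print(cols["Extension mm"], cols["Load N"])
```

The resulting lists can be passed straight to a plotting call such as matplotlib's `plot(x, y)`.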
Or, if you want to do some data analysis directly on your two columns:
import pandas as pd
df = pd.read_csv("TST0002.raw", sep="\t", skiprows=40, usecols=['Extension mm', 'Load N'])
print(df)
Output:
Extension mm Load N
0 -118.284 0.1365034
1 -117.779 -0.08668576
2 -117.274 -0.1142517
3 -116.773 -0.1092401
4 -116.271 -0.1144083
5 -11.577 -0.1314806
6 -115.269 -0.03609632
7 -114.768 -0.06334914
....

manipulating a csv file and writing its output to a new csv file in python

I have a simple file named saleem.csv which contains the following lines of csv information:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file, and write only column[2] and column[4] to a new CSV file named out.csv. I have written the following script to do the job.
import csv
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print dele
with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)
f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. Sorry for the earlier mistake; I mistakenly wrote row.

Edited to reflect revised question
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the CSV file, so it shouldn't be inside a loop that iterates over the rows and tries to write them individually.
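The character-by-character output in the question can be reproduced with a small sketch. A tuple like dele is one row, so it belongs to writerow; writerows treats each string inside it as a separate row and iterates over its characters. The value of dele here is a hypothetical example taken from the question's data, and io.StringIO stands in for out.csv.

```python
import csv
import io

dele = ("MyNetwork.node[0].batteryStats", "0")  # one row: a tuple of two strings

wrong = io.StringIO()
csv.writer(wrong, lineterminator="\n").writerows(dele)  # each string becomes its own "row"
print(repr(wrong.getvalue()))

right = io.StringIO()
csv.writer(right, lineterminator="\n").writerow(dele)   # one row with two fields
print(repr(right.getvalue()))
```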
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't keeping track of row[2] and row[4] from every row.
What you could do instead:
import csv
dele = []
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    next(readcsv)  # skip the header line
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])
with open('out.csv', 'w', newline='') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)
This produces the following output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The with open(...): syntax is widely used precisely because it closes the file for you, even if an error occurs. You don't need to close it separately yourself; as soon as the with block ends, the file is closed.
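A quick way to see this for yourself is to check the file object's closed attribute inside and after the block. The scratch file in a temporary directory is a hypothetical stand-in for out.csv.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")  # hypothetical scratch file
    with open(path, "w", newline="") as j:
        j.write("a,b\n")
        inside = j.closed  # still open inside the block
    after = j.closed       # closed automatically when the block exits
print(inside, after)  # False True
```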

I would suggest using the pandas library.
It makes working with csv files very easy.
import pandas as pd #standard convention for importing pandas
# reads the csv file into a pandas dataframe
dataframe = pd.read_csv('saleem.csv')
# make a new dataframe with just columns 2 and 4
print_dataframe = dataframe.iloc[:,[2,4]]
# output the csv file, but don't include the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type
dataframe.head()
to see the first few values of the dataframe. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process csv data.

Python: Even after specifying delimiter, csv writer delimits at wrong place

I'm trying to write lists like this to a CSV file:
['ABC','One,Two','12']
['DSE','Five,Two','52']
To a file like this:
ABC One,Two 12
DSE Five,Two 52
Basically, I want each quoted item ('...') written to its own cell.
However, it is splitting One and Two into different cells and merging ABC with One in the first cell.
Part of my script:
out_file_handle = open(output_path, "ab")
writer = csv.writer(out_file_handle, delimiter = "\t", dialect='excel', lineterminator='\n', quoting=csv.QUOTE_NONE)
output_final = (tsv_name_list.split(".")[0]+"\t"+key + "\t" + str(listOfThings))
output_final = str([output_final]).replace("[","").replace("]","").replace('"',"").replace("'","")
output_final = output_final.split("\\t")
print output_final #gives the first lists of strings I mentioned above.
writer.writerow(output_final)
First output_final line gives
ABC One,Two 12
DSE Five,Two 52

Using the csv module simply works, so you're going to need to be more specific about what's convincing you that the elements are bleeding across cells. For example, using the (now quite outdated) Python 2.7:
import csv
data_lists = [['ABC','One,Two','12'],
              ['DSE','Five,Two','52']]
with open("out.tsv", "wb") as fp:
    writer = csv.writer(fp, delimiter="\t", dialect="excel", lineterminator="\n")
    writer.writerows(data_lists)
I get an out.tsv file of:
dsm#winter:~/coding$ more out.tsv
ABC One,Two 12
DSE Five,Two 52
or
>>> out = open("out.tsv").readlines()
>>> for row in out: print repr(row)
...
'ABC\tOne,Two\t12\n'
'DSE\tFive,Two\t52\n'
which is exactly as it should be. Now if you take these rows, which are tab-delimited, and for some reason split them using commas as the delimiter, sure, you'll think that there are two columns, one with ABC\tOne and one with Two\t12. But that would be silly.
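The same check in Python 3 is sketched below, with io.StringIO standing in for a file opened as open("out.tsv", "w", newline=""). With a tab delimiter, the comma in "One,Two" is not special, so it stays inside one field, and reading back with the same delimiter returns the original lists.

```python
import csv
import io

data_lists = [["ABC", "One,Two", "12"], ["DSE", "Five,Two", "52"]]

buf = io.StringIO()  # stands in for open("out.tsv", "w", newline="")
csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(data_lists)
text = buf.getvalue()
print(repr(text))  # 'ABC\tOne,Two\t12\nDSE\tFive,Two\t52\n'

# Reading back with the same delimiter recovers three fields per row.
rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
print(rows == data_lists)  # True
```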

You've set up the CSV writer, but then for some reason you completely ignore it and try to output the lines manually. That's pointless. Use the functionality available.
writer = csv.writer(...)
for row in tsv_list:
    writer.writerow(row)

Two columns from CSV file are not working for if-statement?

I am developing a simple application that reads the CSV file it is sent and produces some results based on the data points in the columns. Data.csv:
Something, everything, 6, xy
Something1, everything1, 7, ab
Something2, everything2, 9, pq
I open the file as follows:
FileOpen = open('../sources/data.csv', 'rU')
FileRead = csv.reader(FileOpen, delimiter = ',')
FileRead.next()
for row in FileRead:
    #This does not work
    if row[0] == 'something' and row[1] == 'something1':
        print row[2]
    #This works
    if row[0] == 'something' and row[3] == 'xy':
        print row[2]
The above code does not print anything. But if I use row[0] and row[3] in the if condition, it works. So the problem seems to be with columns 1 and 2, while columns 0 and 3 work fine. Is the CSV file format wrong? I followed Microsoft's procedure to create a CSV from an Excel file.

The use and naming of row is completely correct. The main problem is the whitespace in your file. If I print row, I get
['Something', ' everything', ' 6', ' xy']
Note the leading space at the start of every field after the first.
The solution will most likely involve Dialect.skipinitialspace, which the documentation (https://docs.python.org/2/library/csv.html#dialects-and-formatting-parameters) describes as:
When True, whitespace immediately following the delimiter is ignored. The default is False.
You pass this option in the constructor like this:
FileRead = csv.reader(FileOpen, delimiter = ',', skipinitialspace=True)
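A small Python 3 sketch of the difference, using the first two lines of the question's Data.csv as an in-memory sample (io.StringIO stands in for the real file). Note that the comparison strings must also match case: "Something", not "something".

```python
import csv
import io

DATA = "Something, everything, 6, xy\nSomething1, everything1, 7, ab\n"

plain = list(csv.reader(io.StringIO(DATA)))
trimmed = list(csv.reader(io.StringIO(DATA), skipinitialspace=True))
print(plain[0])    # ['Something', ' everything', ' 6', ' xy']
print(trimmed[0])  # ['Something', 'everything', '6', 'xy']

# With the leading spaces gone, an equality test on columns 0 and 1 matches.
matches = [row[2] for row in trimmed if row[0] == "Something" and row[1] == "everything"]
print(matches)     # ['6']
```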

Yes, it was the spaces after all. To remove the spaces in Excel, insert a new column next to the column with the spaces and use =TRIM(C1). Then you can copy and paste the data into a new file and create a CSV from that.
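If you would rather clean the file in Python than in Excel, a sketch like the following rewrites each row with surrounding whitespace stripped from every field, which is roughly what TRIM does per cell (TRIM also collapses internal runs of spaces, which this does not). The helper name strip_fields is hypothetical, and io.StringIO stands in for the input and output files.

```python
import csv
import io

def strip_fields(src, dst):
    """Hypothetical helper: copy CSV rows from src to dst, stripping whitespace around each field."""
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        writer.writerow([field.strip() for field in row])

out = io.StringIO()
strip_fields(io.StringIO("Something, everything, 6, xy\n"), out)
print(repr(out.getvalue()))  # 'Something,everything,6,xy\n'
```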
