Adding specific column from csv file to new csv file - python

I am trying to copy a few columns from a CSV file into a new CSV file. I have written the code below to do this, but it is not giving me the expected output. Can someone please help me get the required results?
import csv

f = csv.reader(open("C:/Users/...../file.csv", "rb"))
f2 = csv.writer(open("C:/Users/.../test123.csv", "wb"))
for row in f:
    for column in row:
        f2.writerow((column[1], column[2], column[3], column[7]))
f.close()
f2.close()

The second iteration over each row is not necessary: inside that inner loop, column is a single string, so column[1] is just one character. Just access the columns in the row by index.
Also, csv reader and writer objects have no close method; you would need to close the underlying file objects instead.
import csv

f = csv.reader(open("file.csv", "rb"))
f2 = csv.writer(open("test123.csv", "wb"))
for row in f:
    f2.writerow((row[1], row[2], row[3], row[7]))
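On Python 3 the "rb"/"wb" modes will make the csv module fail; a minimal sketch of the same idea using text mode, newline='' and with blocks (file names taken from the question):

import csv

# Python 3: open CSV files in text mode with newline='' so the csv module
# handles line endings itself; the with statement closes both files for us.
with open("file.csv", newline="") as src, open("test123.csv", "w", newline="") as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst)
    for row in reader:
        writer.writerow((row[1], row[2], row[3], row[7]))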

Related

How to convert .dat to .csv using Python? The data ends up in one column

Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it.
My .dat file has the following column names:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
The delimiter is a tab.
I want to convert the .dat file to a .csv file, but the data keeps coming out in one column.
I tried the following code:
import pandas as pd

with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
import csv

with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column.
(I want 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.)
I want to convert this into a csv file with the original data structure.
Where is my mistake?
Please help me... It's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with another delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv")
See also the documentation for read_csv and to_csv
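If you also want to keep the output free of the extra index column that to_csv writes by default, pass index=False; a small variation of the same approach:

import pandas as pd

# read the tab-delimited .dat file and write it back out as a plain CSV,
# without the automatic row-index column
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)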

Deleting Rows in a .csv File (Python)

Good evening,
I'm having a problem with a script I'm writing, and I would love some advice. I want to do the following:
Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
Write a new .csv
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns; the first 5 columns hold numerical values, and the last one contains text.
Here is my code:
import csv

directory = "/media/gman/Folder1/processed/test_removal1.csv"
with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)
When I run this I get the following error message:
Error: line contains NUL
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
    for i in reader:
Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
#IMPORT LIBRARIES
import pandas as pd

#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'

#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()

#REMOVE BAD ROWS
# drop rows whose altitude value (column 2) is below -100,000 meters;
# this filters out the -3.402823E+38 sentinel values that keep coming up.
data.drop(data[data.iloc[:, 2] < -100000].index, inplace=True)

#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv') #export the good data to this file
I went with pandas to remove rows based on a logical condition on the dataframe, then exported the dataframe to a new csv file.
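Note that pd.read_csv loads the whole 12.2 GB file into memory at once. If that becomes a problem, a minimal sketch of the same filter done in chunks (paths and the -100,000 threshold taken from the code above, the chunk size is an arbitrary assumption):

import pandas as pd

in_path = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
out_path = '/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv'

first = True
# read the CSV a million rows at a time so the whole file never sits in memory
for chunk in pd.read_csv(in_path, chunksize=1_000_000):
    good = chunk[chunk.iloc[:, 2] >= -100000]  # keep only plausible altitudes
    good.to_csv(out_path, mode='w' if first else 'a', header=first, index=False)
    first = False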

Reading CSV file rows in Python

I'm a beginner in Python. I'm trying to read a CSV file and extract some of its contents into another file:
import csv

with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range. It happens when I select an index that doesn't exist. However, my CSV has 5 columns and I can't isolate any of them.
Use the pandas library for reading the file, and make sure of the encoding format of the CSV file.

import pandas as pd

data = pd.read_csv("file_name.csv")
data.head()   # prints the first 5 rows
data.head(1)  # prints only the first row

Check this, and you'll get the answer to your question.
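For what it's worth, this IndexError usually means the file contains blank lines, which csv.reader returns as empty lists. If you want to stay with the csv module, a minimal guard (file name taken from the question) could look like this:

import csv

with open('test.csv', newline='') as csvfile:
    for row in csv.reader(csvfile, delimiter=','):
        if not row:   # skip blank lines, which come back as empty lists
            continue
        print(row[0])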

manipulating a csv file and writing its output to a new csv file in python

I have a simple file named saleem.csv which contains the following lines of csv information:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file and write only column[2] and column[4] to a new csv file named out.csv. I have written the following script to do the job.
import csv

with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print dele

with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)

f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. (Sorry for the mistake in the earlier version of the question, where I mistakenly wrote row.)
Edited to reflect revised question
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the csv file. So it shouldn't be inside a loop where you iterate over all rows and attempt to write them individually.
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't going to be keeping track of row[2] and row[4] from every row.
What you could do instead:
import csv

dele = []
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    next(readcsv)  # skip the header line, as the question asks
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])

with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)
This produced output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The reason the with open(...): syntax is so widely used is that it closes the file for you gracefully. You don't need to close it yourself separately; as soon as the with block ends, the file is closed.
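For reference, a minimal sketch of the pattern being described (the file name is just an example):

# the file is closed automatically when the with block ends,
# even if an exception is raised inside it
with open('out.csv', 'w') as j:
    j.write('some text\n')
# j is already closed here; no explicit j.close() is needed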
I would suggest using the pandas library.
It makes working with csv files very easy.
import pandas as pd #standard convention for importing pandas
# reads the csv file into a pandas dataframe
dataframe = pd.read_csv('saleem.csv')
# make a new dataframe with just columns 2 and 4
print_dataframe = dataframe.iloc[:,[2,4]]
# output the csv file, but don't include the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type
dataframe.head()
to see the first few values of the dataframe. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process csv data.

Python - Printing individual cells from an Excel spreadsheet in CSV format

I have an Excel spreadsheet saved as a CSV file, but I cannot find a way to pull individual cell values into Python using the csv module. Any help would be greatly appreciated.
There is also a Python library capable of reading xls data. Have a look at python-xlrd.
For writing xls data, you can use python-xlwt.
The csv module provides readers that iterate over the rows of a csv file - the rows are lists of strings. One way to get access to individual cells would be to:
Read the entire file in as a list of lists
import csv

with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access the individual cells by indexing into the_whole_file. The first index is the row and the second index is the column - both are zero based. To access the cell at the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
If you have the excel file as a CSV, you can use csv.reader
import csv

myFilePath = "/Path/To/Your/File"
with open(myFilePath, 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # 'row' has all the cells (thanks to wwii for the fix!). Get the first 4 columns
        a, b, c, d = row[:4]
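If the first row of the spreadsheet holds column names, csv.DictReader lets you fetch a cell by header instead of by position; a minimal sketch, using a hypothetical column name "Price":

import csv

# DictReader maps each row onto a dict keyed by the header row,
# so individual cells can be looked up by column name
with open('test.csv', newline='') as f:
    for row in csv.DictReader(f):
        print(row['Price'])  # 'Price' is an assumed column name for illustration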
