Adding specific column from csv file to new csv file - python
I am trying to copy a few columns from one CSV file to a new CSV file. I wrote the code below, but it does not give me the expected output. Can someone please help me get the required result?
import csv
f = csv.reader(open("C:/Users/...../file.csv","rb"))
f2= csv.writer(open("C:/Users/.../test123.csv","wb"))
for row in f:
    for column in row:
        f2.writerow((column[1],column[2],column[3],column[7]))
f.close()
f2.close()
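The second loop over each row is not necessary. Each row is already a list of column values, so iterating over it yields individual cell strings, and column[1], column[2], ... then index characters inside a single cell. Just access the columns in that row by their index.
Also, csv.reader and csv.writer objects have no close() method; close the underlying file objects instead, or let a with block do it for you.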
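import csv
# Python 3: open in text mode with newline="" (Python 2 used "rb"/"wb" here)
with open("file.csv", newline="") as fin, open("test123.csv", "w", newline="") as fout:
    f = csv.reader(fin)
    f2 = csv.writer(fout)
    for row in f:
        # each row is a list of cells, so index it directly
        f2.writerow((row[1], row[2], row[3], row[7]))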
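The same idea can be wrapped in a small reusable function. This is a minimal Python 3 sketch; copy_columns is a hypothetical helper name (not part of the csv module), and the indices are zero-based positions, as in row[1], row[2], ... above. Building a list for every row guarantees that writerow always receives a sequence of fields rather than a single string.

```python
import csv


def copy_columns(src_path, dst_path, indices):
    """Copy only the zero-based columns listed in `indices` from src to dst.

    copy_columns is a hypothetical helper used for illustration.
    """
    with open(src_path, newline="") as fin, open(dst_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            # One list per row, so writerow gets fields, not characters.
            writer.writerow([row[i] for i in indices])
```

For the question's file, copy_columns("file.csv", "test123.csv", [1, 2, 3, 7]) selects the same four columns as the answer's loop.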
Related
How to convert .dat to .csv using python? the data is being expressed in one column
Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it. My .dat file has these column names:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
Its delimiter is a tab. I want to convert the .dat file to a CSV file, but the data keeps coming out in one column. I tried the following code:
import pandas as pd
with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)
with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column. (I want 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.) I want to convert this into a CSV file with the original data structure. Where is my mistake? Please help me; it's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with a tab as the delimiter:
import pandas as pd
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)  # index=False: don't write the row index as an extra column
See also the documentation for read_csv and to_csv.
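Without pandas, the standard csv module can do the same conversion by reading with a tab delimiter and writing with the default comma. This is a minimal sketch; dat_to_csv is a hypothetical name, and it assumes fields are separated by single tab characters. Splitting on tabs rather than calling split() also keeps a value that contains a space (for example, a stop name) in one column.

```python
import csv


def dat_to_csv(src_path, dst_path):
    """Copy a tab-delimited file to a comma-delimited CSV, row by row.

    dat_to_csv is a hypothetical helper used for illustration.
    """
    with open(src_path, newline="") as fin, open(dst_path, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter="\t")  # split each line on tabs only
        writer = csv.writer(fout)                 # default delimiter is a comma
        writer.writerows(reader)                  # one output row per input line
```

Because reader yields one list per line, the output keeps the original row and column structure.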
Deleting Rows in a .csv File (Python)
Good evening. I'm having a problem with some code I'm writing, and I would love to get advice. I want to do the following:
1. Remove rows in a .csv file that contain a specific value (-3.4028*10^38).
2. Write a new .csv file.
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns: the first 5 columns hold numerical values, and the last one contains text. Here is my code:
import csv
directory = "/media/gman/Folder1/processed/test_removal1.csv"
with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)
When I run this, I get the following error message:
Error: line contains NUL
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
    for i in reader:
Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
#IMPORT LIBRARIES
import pandas as pd
#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()
#remove rows whose altitude (third column) is less than -100,000 meters.
# this is to remove the -3.402823E038 meter altitude values that keep coming up.
data.drop(data[data.iloc[:,2] < -100000].index, inplace=True)
#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
#export good data to this file (to_csv returns None when given a path, so there is nothing to assign).
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv')
I went with pandas to remove rows based on a logical condition, which gave me a DataFrame. I then exported the DataFrame to a CSV file.
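Because the file is 12.2 GB, loading it all into a DataFrame can need a lot of memory. A streaming alternative with the csv module handles one row at a time. This is a minimal sketch under several assumptions: the no-data value sits in the third column (index 2, as in the pandas answer), any value at or below -1e38 counts as that marker, and the "line contains NUL" error comes from stray NUL bytes that can safely be dropped. filter_rows, is_sentinel and THRESHOLD are hypothetical names. It also avoids two problems in the question's loop: i[-1] is a string, so comparing it to the float -3.4028E38 is never true, and the loop writes the matching rows instead of skipping them.

```python
import csv

# Assumption: any value at or below this threshold is the -3.4028E38 no-data marker.
THRESHOLD = -1e38


def is_sentinel(value):
    """True if the field parses as a float at or below THRESHOLD."""
    try:
        return float(value) <= THRESHOLD
    except ValueError:
        return False  # text fields and blanks are never the marker


def filter_rows(src_path, dst_path, column=2):
    """Stream src_path to dst_path, dropping rows whose `column` holds the marker."""
    with open(src_path, newline="") as fin, open(dst_path, "w", newline="") as fout:
        # Remove NUL bytes before parsing (assumed to be stray bytes in the file).
        cleaned = (line.replace("\0", "") for line in fin)
        reader = csv.reader(cleaned)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header unchanged
        for row in reader:
            if len(row) > column and is_sentinel(row[column]):
                continue  # skip the no-data row
            writer.writerow(row)
```

Memory use stays roughly constant regardless of file size, since only one row is held at a time.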
CSV file Reading rows in Python
I'm a beginner in Python. I'm trying to read a CSV file and extract some of the results into another file:
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range. It happens when I select a row which doesn't exist. However, my CSV file has 5 columns and I can't isolate any of them.
Use the Python pandas library to read the file, and make sure you know the CSV file's encoding (pass it with encoding=... if it is not the default).
import pandas as pd
data = pd.read_csv("file_name.csv")
data.head()   # returns the first 5 rows
data.head(1)  # returns only the first row
Check this and you'll get the answer to your question.
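With the csv module itself, one common cause of IndexError: list index out of range on row[0] is a blank line: csv.reader returns an empty list for it, and an empty list has no index 0. Whether that is the cause here is an assumption, since the question does not show the file. A minimal sketch that skips empty rows (first_column is a hypothetical name):

```python
import csv


def first_column(path):
    """Return the first field of every non-empty row in a CSV file."""
    values = []
    with open(path, newline="") as csvfile:
        for row in csv.reader(csvfile, delimiter=","):
            if not row:  # blank lines come back as [] and have no row[0]
                continue
            values.append(row[0])
    return values
```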
manipulating a csv file and writing its output to a new csv file in python
I have a simple file named saleem.csv which contains the following lines of CSV data:
File,Run,Module,Name,,,,,
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterference,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferencePartial,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].nic.phy,nbFramesWithoutInterferenceDropped,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,broadcast queued,3,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies sent,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].appl,replies received,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,nominal,1.188e+07,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,total,1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,lifetime,-1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,Mean power consumption,55.7565,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,num devices,1,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,physical layer,0,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,device total (mWs),1232.22,NaN,NaN,NaN,NaN
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,account,0,1,2,3,4
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,energy (mWs),0,207.519,1024.7,0,0
General-0.sca,General-0-20160706-14:58:51-10463,MyNetwork.node[0].batteryStats,time (s),0,3.83442,18.2656,0,
I want to skip the first line, read this file, and write only column[2] and column[4] to a new CSV file named out.csv. I have written the following script to do the job:
import csv
with open('saleem.csv') as f:
    readcsv = csv.reader(f)
    for row in readcsv:
        dele = (row[2], row[4])
        print dele

with open('out.csv', 'w+') as j:
    writecsv = csv.writer(j)
    #for row in dele:
    for row in dele:
        writecsv.writerows(dele)

f.close()
j.close()
This produces the following output:
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
M,y,N,e,t,w,o,r,k,.,n,o,d,e,[,4,],.,b,a,t,t,e,r,y,S,t,a,t,s
0
Please help me. Sorry for the earlier mistake; I mistakenly wrote row.
Edited to reflect the revised question.
Some problems I can see:
P1: writerows(...)
for row in dele:
    writecsv.writerows(dele)
writerows takes a list of rows to write to the CSV file, so it shouldn't be inside a loop where you iterate over the rows and try to write them individually. When you pass it a single row such as dele = (row[2], row[4]), it treats each string in that tuple as a row, and a string is iterated character by character, which is why the output looks like M,y,N,e,t,... . Because the loop runs once per element of dele, that output is also written twice.
P2: overwriting
for row in readcsv:
    dele = (row[2], row[4])
You are continuously overwriting dele, so you aren't keeping track of row[2] and row[4] from every row; only the last row survives the loop.
What you could do instead:
import csv

dele = []
with open('saleem.csv', newline='') as f:
    readcsv = csv.reader(f)
    next(readcsv)  # skip the header line
    for row in readcsv:
        dele.append([row[2], row[4]])
        print([row[2], row[4]])

with open('out.csv', 'w', newline='') as j:
    writecsv = csv.writer(j)
    writecsv.writerows(dele)
This produced the output:
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].nic.phy,0
MyNetwork.node[0].appl,3
MyNetwork.node[0].appl,0
MyNetwork.node[0].appl,0
MyNetwork.node[0].batteryStats,1.188e+07
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,-1
MyNetwork.node[0].batteryStats,55.7565
MyNetwork.node[0].batteryStats,1
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,1232.22
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
MyNetwork.node[0].batteryStats,0
Also, unrelated to your issue at hand, the following code is unnecessary:
f.close()
j.close()
The with open(...): syntax is widely used because it closes the file for you, even if an error occurs inside the block. As soon as the with block ends, the file is closed, so you don't need to close it yourself.
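The P1 behavior is easy to reproduce in memory. This sketch writes to an io.StringIO buffer instead of out.csv and uses one hypothetical pair of values shaped like the question's output; it contrasts writerows, which treats each string as a row of characters, with writerow, which writes the pair as a single row.

```python
import csv
import io

# An in-memory buffer stands in for out.csv.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")

# One (column 2, column 4) pair, like dele in the question.
dele = ("MyNetwork.node[4].batteryStats", "0")

# writerows expects an iterable of rows: each string becomes a "row",
# and iterating over a string yields its characters.
writer.writerows(dele)

# writerow writes the pair as one row with two fields.
writer.writerow(dele)

print(buf.getvalue())
```

The first two printed lines match the question's output; the last one is the intended row.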
I would suggest using the pandas library. It makes working with CSV files very easy.
import pandas as pd  # standard convention for importing pandas
# read the CSV file into a pandas DataFrame; the first line becomes the header
# dtype=str keeps values exactly as written (otherwise 0 would be written back as 0.0)
dataframe = pd.read_csv('saleem.csv', dtype=str)
# make a new DataFrame with just columns 2 and 4 (zero-based positions)
print_dataframe = dataframe.iloc[:, [2, 4]]
# write the CSV file without the index numbers or header, just the data
print_dataframe.to_csv('out.csv', index=False, header=False)
If you use IPython or Jupyter Notebook, you can type dataframe.head() to see the first few rows of the DataFrame. There is a lot more you can do with the library that might be worth learning, but in general it is a great way to read in, filter, and process CSV data.
Python - Printing individual cells from an Excel spreadsheet in CSV format
I have an Excel spreadsheet saved as a CSV file, but I cannot find a way to read the values of individual cells into Python using the csv module. Any help would be greatly appreciated.
There is also a Python library capable of reading xls data. Have a look at python-xlrd. For writing xls data, you can use python-xlwt.
The csv module provides readers that iterate over the rows of a CSV file; each row is a list of strings. One way to access individual cells is to read the entire file in as a list of lists:
import csv
with open('test.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access individual cells by indexing into the_whole_file. The first index is the row and the second is the column, both zero-based. To access the cell in the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
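If the first row of the exported spreadsheet holds column headers, csv.DictReader lets you look up cells by column name instead of by position. A minimal sketch, assuming such a header row exists; read_cells is a hypothetical name, and the column names used below are only examples.

```python
import csv


def read_cells(path):
    """Return the CSV as a list of dicts, one per data row, keyed by header."""
    with open(path, newline="") as f:
        # DictReader uses the first row as the field names by default.
        return list(csv.DictReader(f))
```

For example, with a header row Name,Qty, read_cells('test.csv')[1]['Qty'] would return the Qty cell of the second data row (the header row is not counted).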
If you have the Excel file as a CSV, you can use csv.reader:
import csv
myFilePath = "/Path/To/Your/File"
with open(myFilePath, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # 'row' is a list of all the cells on this line; take the first 4 columns
        a, b, c, d = row[:4]