Good evening,
I'm having a problem with some code I'm writing, and I would love to get advice. I want to do the following:
Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
Write a new .csv
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns: the first 5 contain numerical values and the last contains text.
Here is my code:
import csv
directory = "/media/gman/Folder1/processed/test_removal1.csv"
with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)
When I run this I get the following error message:
Error: line contains NUL
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
    for i in reader:
Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
#IMPORT LIBRARIES
import pandas as pd
#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()
# Drop rows whose altitude value (third column) is below -100,000 meters.
# This removes the -3.402823E038 meter altitude values that keep coming up.
data.drop(data[data.iloc[:, 2] < -100000].index, inplace=True)
#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
# to_csv returns None when given a path, so there is nothing to assign
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv') #export good data to this file.
I went with pandas: I loaded the file into a dataframe, removed the rows that matched a boolean condition, and then exported the dataframe to a new CSV file.
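For a file this size, a streaming approach with the csv module may also be worth a look, since pd.read_csv loads the whole file into memory by default. The following is only a sketch: it assumes the altitude is in the third column, that stray NUL characters can simply be dropped, and that every value in that column parses as a number. The function name drop_bad_rows and its col and cutoff parameters are made up for illustration.

```python
import csv

def drop_bad_rows(src, dst, col=2, cutoff=-100000.0):
    """Copy src to dst, skipping rows whose value in column `col` is below `cutoff`."""
    kept = 0
    with open(src, 'r', newline='') as fin, open(dst, 'w', newline='') as fout:
        # Strip NUL characters line by line; some Python versions raise
        # "line contains NUL" when the csv reader meets one.
        reader = csv.reader(line.replace('\0', '') for line in fin)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header row
        for row in reader:
            if not row:
                continue  # skip blank lines
            if float(row[col]) < cutoff:
                continue  # sentinel altitude such as -3.402823E038
            writer.writerow(row)
            kept += 1
    return kept
```

Only one row is held in memory at a time, and the return value is the number of data rows written.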
Related
Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it.
I have a .dat file whose column names look like this:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
Its delimiter is a tab.
I want to convert the .dat file to a .csv file, but the data keeps coming out in one column.
I tried the following code:
import pandas as pd
with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)
with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column.
(I expect 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.)
I want to convert this into a CSV file with the original data structure.
Where is the mistake?
Please help me. It's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with a different delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv")
See also the documentation for read_csv and to_csv
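The same conversion can be done without pandas. Part of what goes wrong in the question is probably that str.split() with no argument splits on any run of whitespace, not just tabs, and that f.read().split() flattens the whole file into one long list of tokens. A minimal sketch using the csv module with a tab delimiter (the function name tsv_to_csv is made up for illustration):

```python
import csv

def tsv_to_csv(src, dst):
    """Rewrite a tab-delimited file as a comma-delimited one, row by row."""
    with open(src, 'r', newline='') as fin, open(dst, 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter='\t')  # split on tabs only
        writer = csv.writer(fout)                 # the default delimiter is a comma
        for row in reader:
            writer.writerow(row)
```

Called as tsv_to_csv('file.dat', 'file.csv'), this keeps fields that contain spaces intact, because only tabs separate columns.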
I have a folder with hundreds of CSV files, each holding 9 values from a temperature sensor. The columns are sensor_id, lat, lon (the coordinates) and some other things that I don't need. The only columns I need are these three: timestamp, temperature and humidity.
I have already tried using a module to import just the columns I want, and
I have tried deleting the columns I don't want with loops.
I'm slowly getting desperate. Can someone please help me?
If you are open to using pandas, you can do it simply by passing the usecols parameter when reading the CSV file.
import pandas as pd
df = pd.read_csv('your_file/path/file.csv', usecols=['col1', 'col2'])
print(df.shape)
df.head()
Here's some code that should do it. Just fill in your target directory and change the numbers on the last line to the indices of the columns you want (the first column is 0):
import os
import csv
targetdir = "" # fill this in
allrows = []
for filename in os.listdir(targetdir):
    # build the full path so each file is found regardless of the working directory
    with open(os.path.join(targetdir, filename), 'r', newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            # keep only the wanted columns
            allrows.append([row[1], row[3], row[5]])
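If the files have a header row, the columns can also be picked by name instead of by position. A rough sketch, assuming every file has a header that contains timestamp, temperature and humidity (the names from the question); the function name combine_columns and the output path argument are made up for illustration:

```python
import csv
import glob
import os

WANTED = ['timestamp', 'temperature', 'humidity']

def combine_columns(targetdir, outpath, wanted=WANTED):
    """Write the wanted columns from every .csv in targetdir into one file."""
    out_abs = os.path.abspath(outpath)
    with open(outpath, 'w', newline='') as fout:
        # extrasaction='ignore' drops every column that is not in `wanted`
        writer = csv.DictWriter(fout, fieldnames=wanted, extrasaction='ignore')
        writer.writeheader()
        for path in sorted(glob.glob(os.path.join(targetdir, '*.csv'))):
            if os.path.abspath(path) == out_abs:
                continue  # do not read the output file back in
            with open(path, 'r', newline='') as fin:
                for row in csv.DictReader(fin):
                    writer.writerow(row)
```

Selecting by name keeps working even if the column order differs between files, which positional indices would not.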
I'm a beginner in Python. I'm trying to read a CSV file and extract some of the results into another file:
import csv
with open('test.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:
        print(row[0])
I get the error IndexError: list index out of range. It happens when I select a row which doesn't exist. However, my CSV has 5 columns and I can't isolate any of them.
Use the Python pandas library to read the file.
Check the encoding of the CSV file.
import pandas as pd
data = pd.read_csv("file_name.csv")
data.head()  # the first 5 rows (displayed in an interactive session)
# for 1 row
data.head(1)
Check this, and you'll get the answer to your question.
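With the csv module itself, one common cause of this IndexError is a blank line in the file: csv.reader returns an empty list for it, so row[0] fails. A small sketch that skips such rows, assuming that is what is happening here (the function name first_column is made up for illustration):

```python
import csv

def first_column(path):
    """Return the first cell of every non-empty row."""
    values = []
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=','):
            if not row:
                continue  # a blank line is parsed as an empty list
            values.append(row[0])
    return values
```

If the error persists after skipping empty rows, printing len(row) for each row is a quick way to see which line is shorter than expected.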
I am trying to copy a few columns from a CSV file to a new CSV file. I have written the code below, but it is not giving me the expected output. Can someone please help me get the required results?
import csv
f = csv.reader(open("C:/Users/...../file.csv","rb"))
f2= csv.writer(open("C:/Users/.../test123.csv","wb"))
for row in f:
    for column in row:
        f2.writerow((column[1],column[2],column[3],column[7]))
f.close()
f2.close()
The second iteration over each row is not necessary. Just access the columns in that row with the column index.
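Also, csv reader and writer objects have no close method; it is the underlying file objects that need closing, which a with block does for you.
import csv
# newline='' lets the csv module handle line endings itself (Python 3)
with open("file.csv", newline="") as fin, open("test123.csv", "w", newline="") as fout:
    f = csv.reader(fin)
    f2 = csv.writer(fout)
    for row in f:
        # one output row per input row, built from the wanted columns
        f2.writerow((row[1], row[2], row[3], row[7]))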
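A variation on the same idea uses operator.itemgetter to pick the columns and skips rows that are too short to contain them. This is only a sketch; the function name copy_columns is made up, and the default indices are simply the ones from the code above.

```python
import csv
from operator import itemgetter

def copy_columns(src, dst, indices=(1, 2, 3, 7)):
    """Copy only the columns at `indices` from src to dst."""
    # With two or more indices, itemgetter returns a tuple of the chosen cells.
    pick = itemgetter(*indices)
    needed = max(indices) + 1
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            if len(row) < needed:
                continue  # row too short to hold every requested column
            writer.writerow(pick(row))
```

Skipping short rows is a design choice made here for illustration; depending on the data it may be better to raise an error instead.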
I have an Excel spreadsheet saved as a CSV file, but cannot find a way to read individual cell values into Python using the csv module. Any help would be greatly appreciated.
There is also a Python library capable of reading xls data. Have a look at python-xlrd.
For writing xls data, you can use python-xlwt.
The csv module provides readers that iterate over the rows of a CSV file; each row is a list of strings. One way to get access to individual cells is to read the entire file in as a list of lists:
import csv
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    the_whole_file = list(reader)
Then access the individual cells by indexing into the_whole_file. The first index is the row and the second index is the column - both are zero based. To access the cell at the second row, fourth column:
row = 1
column = 3
cell_R1_C3 = the_whole_file[row][column]
print(cell_R1_C3)
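Loading everything with list(reader) is fine for small files. For a large file, a sketch like the following reads only as far as the requested row; get_cell is a hypothetical helper, not part of the csv module, and it returns None when the cell does not exist.

```python
import csv
from itertools import islice

def get_cell(path, row, column):
    """Return the cell at zero-based (row, column), or None if it does not exist."""
    with open(path, 'r', newline='') as f:
        reader = csv.reader(f)
        # islice skips the first `row` rows without storing them
        target = next(islice(reader, row, None), None)
    if target is None or column >= len(target):
        return None
    return target[column]
```

For example, get_cell('test.csv', 1, 3) looks up the same cell as the_whole_file[1][3] above.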
If you have the Excel file as a CSV, you can use csv.reader:
import csv
myFilePath = "/Path/To/Your/File"
# text mode with newline='' is for Python 3; Python 2 would open with 'rb'
with open(myFilePath, newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',')
    for row in reader:
        # 'row' has all the cells (thanks to wwii for the fix!). Get the first 4 columns
        a, b, c, d = row[:4]