I need to do some calculations using a .csv file. The first 4 rows of the file are header information, so the actual data starts at row 5 and runs past row 80,000, and I will be calculating averages for specific columns. How do I only process lines after the header information?
This is part of my code so far:
import linecache

for datafile in datafolder:
    # open file in read mode
    o_csvFile = open(datafile)
    # get the 5th line (the first data line)
    fifthLine = linecache.getline(datafile, 5)
    # read each line, only processing once the first data line is reached
    startReading = False
    for line in o_csvFile:
        if line == fifthLine:
            startReading = True
        if startReading:
            ...  # process the data line here
With pandas you can use the skiprows argument of read_csv() to begin after a set of header rows:
import pandas as pd
pd.read_csv("data.csv", skiprows=4)
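A short follow-up for the averaging part of the question; this assumes the first row after the skipped block holds the column names (pass header=None otherwise), and "temperature" is a placeholder for whichever column you actually need:

import pandas as pd

# skip the 4 header rows; the next row is used for the column names
df = pd.read_csv("data.csv", skiprows=4)
# average of one specific column ("temperature" is a hypothetical name)
print(df["temperature"].mean())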
I am trying to insert data into a specific cell in a CSV file. In the existing file, the data in cell A1 ("Customer") should be replaced with new data ("Name"). My code is as follows.
import pandas as pd

# The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
# Read the existing CSV file
df = pd.read_csv(file_source)
# Insert "Name" into cell A1 to replace "Customer"
df[1][0] = "Name"
# Save the file
df.to_csv(file_source, index=False)
And it doesn't work. Please help me find the bug.
Customer is the column header, so you need to do:
df = df.rename(columns={'Customer': 'Name'})
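A minimal sketch of the full round trip, assuming the same file path as in the question:

import pandas as pd

file_source = r"C:\Users\user\Desktop\Customer.csv"
df = pd.read_csv(file_source)
# rename the column header instead of indexing into the data
df = df.rename(columns={'Customer': 'Name'})
df.to_csv(file_source, index=False)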
I am assuming you are going to want to work with a headerless CSV. If that's the case, you just need to add header=None when reading the CSV and address cell A1 by position:
import pandas as pd

# The existing CSV file
file_source = r"C:\Users\user\Desktop\Customer.csv"
# Read the existing CSV file without treating the first row as a header
df = pd.read_csv(file_source, header=None)  # notice this line is now different
# Insert "Name" into cell A1 (row 0, column 0) to replace "Customer"
df.iloc[0, 0] = "Name"
# Save the file
df.to_csv(file_source, index=False, header=False)  # made this headerless too
Hi, I'm trying to convert a .dat file to a .csv file, but I have a problem with it.
The .dat file's column names look like:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
Its delimiter is a tab, so I want to convert the .dat file to a .csv file, but the data keeps coming out in one column. I tried the following code:
import pandas as pd

with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
import csv

with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)

with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data ends up in one column. (I want 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.) I want to convert this into a CSV file with the original data structure. Where is the mistake? Please help me; it's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with another delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)  # index=False keeps pandas' row index out of the output
See also the documentation for read_csv and to_csv
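If you'd rather stay with the csv module from your second attempt, the fix is to tell the reader to split on tabs while the writer emits commas; a minimal sketch:

import csv

with open('file.dat', 'r', newline='') as input_file, \
        open('file.csv', 'w', newline='') as output_file:
    reader = csv.reader(input_file, delimiter='\t')  # split each line on tabs
    writer = csv.writer(output_file)                 # write comma-separated rows
    writer.writerows(reader)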
My program writes data into a CSV file using pandas' to_csv() function. On the first run the CSV file is empty, and my code writes data into it (as it's supposed to). On the second run (note that I'm still using the same CSV file), my code writes data into it again (which is good). The problem is that there is a large number of empty rows between the data from the first run and the data from the second run.
Below is my code:
# place into a file
csvFile = open(file, 'a', newline='', encoding='utf-8')
if file_empty == True:
    df.to_csv(csvFile, sep=',', columns=COLS, index=False, mode='ab', encoding='utf-8')  # header true
else:
    df.to_csv(csvFile, sep=',', columns=COLS, header=False, index=False, mode='ab', encoding='utf-8')  # header false
I used the variable file_empty so that the program does not write column headers if there is already data present in the CSV file.
Below is the sample output from the CSV file:
The last data from the first run is in line 396 of the CSV file, and the first row of data from the second run is in line 1308 of the same file. So there are empty rows from line 397 up to line 1307. How can I remove them so that when the program is run again, there are no empty rows between the two sets of data?
Here is code to append the data and remove the blank lines; the lines below may help you:
import pandas

conso_frame = pandas.read_csv('consofile1.csv')
df_2 = pandas.read_csv('csvfile2.csv')
# Column names should be the same in both frames
# (DataFrame.append was removed in pandas 2.0, so use concat)
conso_frame = pandas.concat([conso_frame, df_2])
print(conso_frame)
# Drop the blank rows (those with no value in the "Intent" column)
conso_frame.dropna(subset=["Intent"], inplace=True)
print(conso_frame)
conso_frame.to_csv('consofile1.csv', index=False)
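As a side note on the original snippet: to_csv() can open the path itself, so the manual open() isn't needed. A minimal sketch, assuming df, COLS, and file from the question:

import os

# write the header only when the file doesn't exist yet or is still empty
write_header = not os.path.exists(file) or os.path.getsize(file) == 0
# let to_csv open the path in text append mode ('a', not 'ab')
df.to_csv(file, mode='a', columns=COLS, header=write_header, index=False, encoding='utf-8')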
Good evening,
I'm having a problem with some code I'm writing, and I would love some advice. I want to do the following:
Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
Write a new .csv
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns: the first 5 contain numerical values, and the last contains text.
Here is my code:
import csv

directory = "/media/gman/Folder1/processed/test_removal1.csv"

with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # keep only rows that do not contain the sentinel value in the numeric columns
    # (csv.reader yields strings, so the values must be converted before comparing)
    for i in reader:
        if all(float(x) != -3.4028e38 for x in i[:5]):
            writer.writerow(i)
When I run this I get the following error message:
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
    for i in reader:
Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
# IMPORT LIBRARIES
import pandas as pd

# FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'

# CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()

# Remove rows whose altitude (column index 2) is below -100,000 meters;
# this drops the -3.402823E038 altitude values that keep coming up.
data.drop(data[data.iloc[:, 2] < -100000].index, inplace=True)

# CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv')  # export good data to this file
I went with pandas to remove rows based on a logical condition, which produced a cleaned DataFrame, and then exported that DataFrame to a CSV file.
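One caveat: at 12.2 GB the whole file may not fit in memory. A sketch of the same filter done in chunks via read_csv's chunksize parameter (the chunk size is an arbitrary choice):

import pandas as pd

src = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
dst = '/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv'

first = True
# stream the file a million rows at a time instead of loading it all at once
for chunk in pd.read_csv(src, chunksize=1_000_000):
    # keep rows whose altitude (column index 2) is a plausible value
    good = chunk[chunk.iloc[:, 2] >= -100000]
    # write the header only with the first chunk, then append
    good.to_csv(dst, mode='w' if first else 'a', header=first, index=False)
    first = False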
I'm trying to skip the first line in a .csv file of the format:
#utm32Hetrs89_h_dvr90
667924.1719,6161062.7744,-37.15227
667924.9051,6161063.4086,-37.15225
667925.6408,6161064.0452,-37.15223
667926.2119,6161064.6107,-37.15221
667926.4881,6161065.0492,-37.15220
667926.7642,6161065.4876,-37.15220
667927.0403,6161065.9260,-37.15219
667927.3164,6161066.3644,-37.15218
This is my code so far:
import csv

coordsx, coordsy, h_gravs = [], [], []

with open('C:\\Users\\Bruger\\Desktop\\dtu\\S\\data\\WL_geoid_values.txt', newline='') as file:
    readCSV = csv.reader(file, delimiter=',', skipinitialspace=True)
    header = next(readCSV)  # skip the first line
    for row in readCSV:
        coordsx.append(float(row[0]))
        coordsy.append(float(row[1]))
        h_gravs.append(float(row[2]))
I get an error saying it can't convert a string to a float. How do I make sure that it actually skips the first line before I start reading the rows?
I humbly suggest using pandas to read CSV files. You can define the lines to skip and the column names in a few lines:
import pandas as pd

# One single call reads all the data with the right format
df = pd.read_csv('C:\\Users\\Bruger\\Desktop\\dtu\\S\\data\\WL_geoid_values.txt',
                 skiprows=1,                              # skip the first row
                 names=['coordsx', 'coordsy', 'h_gravs']  # name each column
                 )

# Separate each column and turn them into lists
coordsx = df['coordsx'].tolist()
coordsy = df['coordsy'].tolist()
h_gravs = df['h_gravs'].tolist()
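Alternatively, since the line you want to skip starts with #, read_csv's comment parameter can drop it without counting rows:

df = pd.read_csv('C:\\Users\\Bruger\\Desktop\\dtu\\S\\data\\WL_geoid_values.txt',
                 comment='#',  # ignore any line starting with '#'
                 names=['coordsx', 'coordsy', 'h_gravs'])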