Reading and splitting a .raw file for data processing - python

Basically I have data from a mechanical test in the output format .raw and I want to access it in Python.
The file needs to be split using the delimiter ";" so that it has 13 columns.
The idea is to index and pull out the desired information, which in my case is the "Extension mm" and "Load N" values as arrays, starting at row 41, in order to create a plot.
I have never worked with .raw files and I don't know what to do.
The file can be downloaded here:
https://drive.google.com/file/d/0B0GJeyFBNd4FNEp0elhIWGpWWWM/view?usp=sharing
Hope somebody can help me out there!

You can convert the .raw file into a .csv file and then use the csv module. Remember to set delimiter=' ', otherwise it uses a comma as the delimiter by default.
import csv
with open('TST0002.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    for row in reader:  # this reads the file row by row
        print(row[0])  # row[0] is the first element of that row

Your file looks basically like a .tsv file with 40 lines to skip. Could you try this?
# export your file.raw to a .tsv file, dropping the first 40 lines
with open('TST0002.raw') as infile, open('new.tsv', 'w') as outfile:
    lines = infile.readlines()[40:]
    for line in lines:
        outfile.write(line)
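Reading the whole file with readlines() is fine for a small test file. A minimal sketch that streams the lines instead, using itertools.islice to skip the preamble (the 40-line count comes from the answer above; the function name copy_without_preamble is hypothetical):

```python
import itertools


def copy_without_preamble(src_path, dst_path, skip=40):
    """Copy src_path to dst_path, dropping the first `skip` lines.

    Lines are streamed one at a time, so the whole file is never held in memory.
    """
    with open(src_path) as infile, open(dst_path, 'w') as outfile:
        # islice(infile, skip, None) yields every line from index `skip` onward
        outfile.writelines(itertools.islice(infile, skip, None))


# Usage, with the file names from the answer above:
# copy_without_preamble('TST0002.raw', 'new.tsv')
```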
Or, if you want to do some data analysis directly on your two columns:
import pandas as pd
df = pd.read_csv("TST0002.raw", sep="\t", skiprows=40, usecols=['Extension mm', 'Load N'])
print(df)
output:
Extension mm Load N
0 -118.284 0.1365034
1 -117.779 -0.08668576
2 -117.274 -0.1142517
3 -116.773 -0.1092401
4 -116.271 -0.1144083
5 -11.577 -0.1314806
6 -115.269 -0.03609632
7 -114.768 -0.06334914
....
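Since the goal is to plot "Load N" against "Extension mm", here is a minimal stdlib-only sketch that returns those two columns as lists of floats. It assumes, as the answer above does, that 40 preamble lines are followed by the header row and that the file is tab-separated; pass delimiter=';' if your export really uses semicolons. The function name read_columns is hypothetical.

```python
import csv
import itertools


def read_columns(path, names, skip=40, delimiter='\t'):
    """Return {column name: list of floats} for the requested columns.

    Assumes the header row comes right after `skip` preamble lines.
    """
    with open(path, newline='') as f:
        rows = csv.reader(itertools.islice(f, skip, None), delimiter=delimiter)
        header = [h.strip() for h in next(rows)]
        # header.index raises ValueError if a requested column is missing
        idx = {name: header.index(name) for name in names}
        out = {name: [] for name in names}
        for row in rows:
            if not row:  # skip blank lines
                continue
            for name, i in idx.items():
                out[name].append(float(row[i]))
    return out


# Usage (plotting with matplotlib, if installed):
# data = read_columns('TST0002.raw', ['Extension mm', 'Load N'])
# plt.plot(data['Extension mm'], data['Load N'])
```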

Related

How to convert .dat to .csv using python? the data is being expressed in one column

Hi, I'm trying to convert a .dat file to a .csv file,
but I have a problem with it.
My .dat file has these columns:
region GPS name ID stop1 stop2 stopname1 stopname2 time1 time2 stopgps1 stopgps2
Its delimiter is a tab.
I want to convert the .dat file to a .csv file,
but the data keeps coming out in one column.
I tried to do that using the following code:
import pandas as pd
with open('file.dat', 'r') as f:
    df = pd.DataFrame([l.rstrip() for l in f.read().split()])
and
import csv
with open('file.dat', 'r') as input_file:
    lines = input_file.readlines()
    newLines = []
    for line in lines:
        newLine = line.strip('\t').split()
        newLines.append(newLine)
with open('file.csv', 'w') as output_file:
    file_writer = csv.writer(output_file)
    file_writer.writerows(newLines)
But all the data is being expressed in one column.
(I expect 15 columns and 80,000 rows, but I get 1 column and 1,200,000 rows.)
I want to convert this into a csv file with the original data structure.
Where is the mistake?
Please help me... It's my first time dealing with data in Python.
If you're already using pandas, you can just use pd.read_csv() with another delimiter:
df = pd.read_csv("file.dat", sep="\t")
df.to_csv("file.csv", index=False)  # index=False: don't add the row index as an extra column
See also the documentation for read_csv and to_csv.
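Why the first attempt ends up in one column: f.read().split() splits the entire file on any whitespace, so every field of every row becomes a separate item in one flat list (which matches 15 × 80,000 = 1,200,000). If you would rather not depend on pandas, a minimal sketch with the standard csv module keeps each line as one row and only changes the separator from tab to comma (the function name dat_to_csv is hypothetical):

```python
import csv


def dat_to_csv(src_path, dst_path):
    """Convert a tab-separated .dat file to a comma-separated .csv file.

    Each input line stays one row; only the field separator changes.
    """
    with open(src_path, newline='') as fin, open(dst_path, 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter='\t')
        writer = csv.writer(fout)
        # writerows consumes the reader lazily, so large files are streamed
        writer.writerows(reader)


# Usage: dat_to_csv('file.dat', 'file.csv')
```

Fields that contain commas are quoted automatically by csv.writer, so they survive the conversion.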

How to add Header Columns to Nested JSON Values [duplicate]

I am trying to add a header to my CSV file.
I am importing data from a .csv file which has two columns of data, each containing float numbers. Example:
11 22
33 44
55 66
Now I want to add a header for both columns like:
ColA ColB
11 22
33 44
55 66
I have tried this:
import csv
with open('mycsvfile.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerow(('ColA', 'ColB'))
I used 'a' to append the data, but this added the header as the last row of the file instead of the first row. Is there any way I can fix it?
One way is to read all the data in, then overwrite the file with the header and write the data out again. This might not be practical with a large CSV file:
#!python3
import csv
with open('file.csv', newline='') as f:
    r = csv.reader(f)
    data = [line for line in r]
with open('file.csv', 'w', newline='') as f:
    w = csv.writer(f)
    w.writerow(['ColA', 'ColB'])
    w.writerows(data)
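For a file too large to read into memory, a minimal sketch that streams the existing rows into a temporary file after the header and then swaps it into place with os.replace. It assumes the file uses \n line endings; the function name prepend_header is hypothetical.

```python
import csv
import os
import shutil
import tempfile


def prepend_header(path, header):
    """Insert `header` as the first row of the CSV file at `path`.

    Rows are streamed into a temporary file in the same directory, which then
    replaces the original, so the whole file is never held in memory.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', newline='') as out, open(path, newline='') as src:
            # lineterminator='\n' assumes the existing file uses \n line endings
            csv.writer(out, lineterminator='\n').writerow(header)
            shutil.copyfileobj(src, out)  # copy the existing rows unchanged
        shutil.copymode(path, tmp_path)  # keep the original file permissions
        os.replace(tmp_path, path)  # swap the new file into place
    except BaseException:
        os.remove(tmp_path)
        raise


# Usage: prepend_header('mycsvfile.csv', ['ColA', 'ColB'])
```

Writing to a separate file first means the original is left untouched if anything fails midway.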
I think you should use pandas to read the CSV file, set the column headers/labels, and write out a new CSV file. Assuming your CSV file is comma-delimited, something like this should work:
from pandas import read_csv
df = read_csv('test.csv', header=None)  # header=None: treat the first line as data, not as a header
df.columns = ['a', 'b']
df.to_csv('test_2.csv', index=False)  # index=False: don't write the row index as an extra column
I know the question was asked a long time back. But for others stumbling across this question, here's an alternative to Python.
If you have access to sed (you do if you are working on Linux or Mac; you can also download Ubuntu Bash on Windows 10 and sed will come with it), you can use this one-liner:
sed -i 1i"ColA,ColB" mycsvfile.csv
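The -i option makes sed edit in place, which means sed overwrites the file with the header added at the top. This is risky, because the original contents are not kept anywhere else. (This is GNU sed syntax; the BSD sed that ships with macOS handles -i and the i command differently.)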
If you want to create a new file instead, do this:
sed 1i"ColA,ColB" mycsvfile.csv > newcsvfile.csv
In this case, you don't need the csv module. You need the fileinput module, as it allows in-place editing:
import fileinput
for line in fileinput.input(files=['mycsvfile.csv'], inplace=True):
    if fileinput.isfirstline():
        print('ColA,ColB')
    print(line, end='')
In the above code, print() writes to the file, because inplace=True redirects standard output to the file being edited.
For the issue where the first row of the CSV file gets replaced by the header, you need to add the header=None option:
import pandas as pd
df = pd.read_csv('file.csv', header=None)
df.to_csv('file.csv', header=['col1', 'col2'], index=False)
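You can set reader.fieldnames to a list in your code, like this in your case (note that this only names the columns while reading; it does not write a header line into the file):
import csv
with open('mycsvfile.csv', newline='') as fd:
    reader = csv.DictReader(fd)
    reader.fieldnames = ["ColA", "ColB"]
    for row in reader:
        print(row["ColA"], row["ColB"])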
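To actually get a header into a file, a minimal sketch that reads the header-less file with csv.DictReader and writes a copy with csv.DictWriter, whose writeheader() method emits the field names as the first row. The function name add_header_to_copy is hypothetical.

```python
import csv


def add_header_to_copy(src_path, dst_path, fieldnames):
    """Copy a header-less CSV to dst_path with `fieldnames` as its first row."""
    with open(src_path, newline='') as fin, open(dst_path, 'w', newline='') as fout:
        # Passing fieldnames means the first input line is treated as data
        reader = csv.DictReader(fin, fieldnames=fieldnames)
        writer = csv.DictWriter(fout, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(reader)


# Usage: add_header_to_copy('mycsvfile.csv', 'with_header.csv', ['ColA', 'ColB'])
```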

Deleting Rows in a .csv File (Python)

Good evening,
I'm having a problem with some code I'm writing, and I would love some advice. I want to do the following:
Remove rows in a .csv file that contain a specific value (-3.4028*10^38)
Write a new .csv
The file I'm working with is large (12.2 GB, 87 million rows) and has 6 columns: the first 5 contain numerical values and the last contains text.
Here is my code:
import csv
directory = "/media/gman/Folder1/processed/test_removal1.csv"
with open('run1.csv', 'r') as fin, open(directory, 'w', newline='') as fout:
    # define reader and writer objects
    reader = csv.reader(fin, skipinitialspace=False)
    writer = csv.writer(fout, delimiter=',')
    # write headers
    writer.writerow(next(reader))
    # iterate and write rows based on condition
    for i in reader:
        if (i[-1]) == -3.4028E38:
            writer.writerow(i)
When I run this I get the following error message:
Error: line contains NUL
File "/media/gman/Aerospace_Classes/Programs/csv_remove.py", line 19, in <module>
for i in reader: Error: line contains NUL
I'm not sure how to proceed. If anyone has any suggestions, please let me know. Thank you.
I figured it out. Here is what I ended up doing:
#IMPORT LIBRARIES
import pandas as pd
#IMPORT FILE PATH
directory = '/media/gman/Grant/Maps/processed_maps/csv_combined.csv'
#CREATE DATAFRAME FROM IMPORTED CSV
data = pd.read_csv(directory)
data.head()  # preview; only displays output in an interactive session
# remove rows whose altitude (third column) is below -100,000 meters;
# this removes the -3.402823E038 meter altitude values that keep coming up.
data.drop(data[data.iloc[:, 2] < -100000].index, inplace=True)
#CONVERT PROCESSED DATAFRAME INTO NEW CSV FILE
data.to_csv(r'/media/gman/Grant/Maps/processed_maps/corrected_altitude_data.csv', index=False)  # export good data to this file
I used pandas to load the file into a DataFrame and remove rows based on a logical condition, then exported the DataFrame to a new CSV file.
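Loading a 12.2 GB file with read_csv needs a lot of RAM. A minimal stdlib sketch that streams the file row by row instead, and strips NUL characters before the csv module sees them (NUL bytes are what trigger the "line contains NUL" error on the Python version used in the question). Note also that csv.reader yields strings, so the original comparison of a field to the float -3.4028E38 can never be true; fields must go through float() first. It assumes, like the pandas fix, that any value below -100,000 is the fill value, that the first 5 columns are numeric, and that the file has a header row; the function name remove_fill_rows is hypothetical.

```python
import csv

FILL_THRESHOLD = -1e5  # assumption: real values never fall below this (as in the pandas fix)


def remove_fill_rows(src_path, dst_path, numeric_cols=5):
    """Copy src_path to dst_path, skipping rows whose numeric columns hold the fill value.

    Returns the number of rows removed.
    """
    removed = 0
    with open(src_path, newline='') as fin, open(dst_path, 'w', newline='') as fout:
        # Strip NUL characters on the fly so csv.reader never sees them
        clean_lines = (line.replace('\0', '') for line in fin)
        reader = csv.reader(clean_lines)
        writer = csv.writer(fout)
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            # csv yields strings, so convert before comparing
            if any(float(x) < FILL_THRESHOLD for x in row[:numeric_cols]):
                removed += 1
                continue
            writer.writerow(row)
    return removed


# Usage: remove_fill_rows('run1.csv', '/media/gman/Folder1/processed/test_removal1.csv')
```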

CSV row splitting

I am working on an implementation of a data mining algorithm in Python. I have a large CSV file which I am using as the input file to get the itemsets. I want to split the CSV file into rows programmatically. Can someone tell me how to do this?
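import pandas as pd
df = pd.read_csv(file_name, sep=',')  # sep is the field (column) separator; rows are split on newlines, and a different row separator can be given with lineterminator=
See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html for details.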
I assume the rows are delimited by newlines and the columns by commas. In that case, Python already knows how to read the file line by line, which here means row by row. Each row can then be split wherever there is a comma.
item_sets = []  # will put the data in here
with open(filename, "r") as file:  # open the file
    for data_row in file:  # get data one row at a time
        # split up the row into columns, stripping whitespace from each one,
        # and store it in item_sets
        item_sets.append([x.strip() for x in data_row.split(",")])
import csv
with open('eggs.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(row)
This prints every row of the CSV file as a list.
I assume the pandas implementation of read_csv is more efficient, but the csv module is built into Python, so if you don't want any dependencies, you can use it.
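For the itemset-mining use case, a minimal sketch that reads each CSV row as one transaction and stores it as a frozenset of items. It uses the csv module, which, unlike a plain split(","), handles quoted fields that contain commas. It assumes one transaction per row and one item per field; the function name load_transactions is hypothetical.

```python
import csv


def load_transactions(path):
    """Return a list of transactions, one frozenset of item strings per CSV row."""
    transactions = []
    with open(path, newline='') as f:
        for row in csv.reader(f):
            # strip whitespace and ignore empty fields (e.g. from trailing commas)
            items = frozenset(x.strip() for x in row if x.strip())
            if items:  # skip blank lines
                transactions.append(items)
    return transactions


# Usage: count the support of an itemset
# transactions = load_transactions('baskets.csv')
# support = sum(1 for t in transactions if {'bread', 'milk'} <= t)
```

Frozensets make the subset test (<=) used for support counting cheap and ignore duplicate items within a row.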
