I am trying to convert this dataset, COCOMO81, to ARFF.
Before converting to .arff, I am trying to convert it to .csv.
I am following this LINK to do this.
I got that dataset from the PROMISE repository site. I copied the entire page into Notepad as cocomo81.txt, and now I am trying to convert that cocomo81.txt file to .csv using Python.
(I intend to convert the .csv file to .arff later using weka)
However, when I run
import pandas as pd
read_file = pd.read_csv(r"cocomo81.txt")
I get THIS ParserError.
To fix this, I followed this solution and modified my command to
read_file = pd.read_csv(r"cocomo81.txt",on_bad_lines='warn')
I got a bunch of warnings (you can see what they look like here),
and then I ran
read_file.to_csv(r'.\cocomo81csv.csv',index=None)
But it seems that the fix for ParserError didn't work in my case because my cocomo81csv.csv file looks like THIS in Excel.
Can someone please help me understand where I am going wrong, and how I can use datasets from the PROMISE repository that are in .arff format?
It looks like the file is CSV data preceded by a long header. Comment lines start with %, the ARFF header lines (@relation, @attribute, @data) start with @, and the actual CSV data starts at line 230.
Skip the header rows and set the column names manually. Try something like this:
import pandas as pd
# set column names manually
col_names = ["rely", "data", "cplx", "time", "stor", "virt", "turn", "acap", "aexp", "pcap", "vexp", "lexp", "modp", "tool", "sced", "loc", "actual" ]
filename = "cocomo81.arff.txt"
# skip the 229 header lines, then read the comma-separated data rows
df = pd.read_csv(filename, skiprows=229, sep=',', decimal='.', header=None, names=col_names)
print(df)
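Hard-coding skiprows=229 works only as long as the header block keeps exactly that length. A minimal sketch of a more robust variant, assuming the file is plain ARFF text whose data section begins right after a line reading @data: it finds that line first and skips everything up to and including it. The helper name data_start_line and the miniature sample text are hypothetical, for illustration only.

```python
import io

import pandas as pd

# Hypothetical miniature ARFF text standing in for cocomo81.arff.txt
SAMPLE = """% comment line
@relation cocomo81
@attribute rely numeric
@attribute loc numeric
@attribute actual numeric
@data
0.88,113,2040
0.88,293,1600
"""


def data_start_line(lines):
    """Return how many lines to skip: everything up to and including @data."""
    for i, line in enumerate(lines):
        # ARFF keywords are case-insensitive, so compare in lower case
        if line.strip().lower() == "@data":
            return i + 1
    raise ValueError("no @data line found")


skip = data_start_line(SAMPLE.splitlines())

col_names = ["rely", "loc", "actual"]
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=skip, header=None, names=col_names)
print(df)
```

For the real file, something like `with open("cocomo81.arff.txt") as f: skip = data_start_line(f.read().splitlines())` would replace the hard-coded 229, with col_names set to the full 17-name list from the answer above.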
You first need to parse the txt file.
Column names can be taken from the @attribute lines:
@attribute rely numeric
@attribute data numeric
@attribute cplx numeric
@attribute time numeric
...
For the CSV file, keep only the data after the @data line at the end of the file; you can simply copy and paste it.
0.88,1.16,0.7,1,1.06,1.15,1.07,1.19,1.13,1.17,1.1,1,1.24,1.1,1.04,113,2040
0.88,1.16,0.85,1,1.06,1,1.07,1,0.91,1,0.9,0.95,1.1,1,1,293,1600
1,1.16,0.85,1,1,0.87,0.94,0.86,0.82,0.86,0.9,0.95,0.91,0.91,1,132,243
0.75,1.16,0.7,1,1,0.87,1,1.19,0.91,1.42,1,0.95,1.24,1,1.04,60,240
...
Then read the resulting CSV file, passing the column names:
pd.read_csv(file, names=["rely", "data", "cplx", ...])
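The copy/paste steps above can also be done in code. A minimal sketch, assuming a simple ARFF file with numeric attributes only (no quoted names, nominal values, or sparse rows): it collects the names from the @attribute lines, gathers every line after @data, and hands those lines to pandas. The function name read_arff_as_dataframe and the sample text are hypothetical.

```python
import io

import pandas as pd


def read_arff_as_dataframe(text):
    """Minimal ARFF reader: numeric attributes only, no quoting or sparse data."""
    names = []
    data_lines = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines and % comment lines anywhere in the file
        if not line or line.startswith("%"):
            continue
        lowered = line.lower()
        if in_data:
            data_lines.append(line)
        elif lowered.startswith("@attribute"):
            # "@attribute <name> <type>": the name is the second token
            names.append(line.split()[1])
        elif lowered == "@data":
            in_data = True
    return pd.read_csv(io.StringIO("\n".join(data_lines)), header=None, names=names)


# Hypothetical, shortened sample in the same layout as the PROMISE file
SAMPLE = """% header comment
@relation cocomo81

@attribute rely numeric
@attribute data numeric
@attribute loc numeric
@attribute actual numeric

@data
0.88,1.16,113,2040
1,1.16,132,243
"""

df = read_arff_as_dataframe(SAMPLE)
print(df)
```

To get the .csv file that Weka can load, `df.to_csv("cocomo81.csv", index=False)` would write it out with the attribute names as the header row.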
I have a dataset from the State Security Department in my county that has some problems.
I can't read the records from the CSV version of the file at all; I only get empty records. When I convert the file to XLSX, it is read correctly.
I would like to know if there is any solution to this problem.
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row of the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!
If you try utf-16 as the encoding, it seems to work. However, note that the year rows complicate the parsing, so depending on what you want to do with the data, you may need some extra manipulation of the CSV to work around them.
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())
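A UTF-16 file decoded as latin1 turns into text with a NUL byte between most characters, which would plausibly explain the mostly empty records in the question. Rather than guessing the encoding, one option is to look at the byte-order mark (BOM) at the start of the file. A minimal sketch under that assumption: guess_encoding is a hypothetical helper, it ignores UTF-32 (whose little-endian BOM begins with the UTF-16 one), and the sample data is made up.

```python
import codecs
import io

import pandas as pd


def guess_encoding(raw: bytes) -> str:
    """Pick an encoding from a byte-order mark; fall back to latin1 (UTF-32 not handled)."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        # the generic "utf-16" codec reads the BOM to pick the byte order and drops it
        return "utf-16"
    return "latin1"


# Hypothetical semicolon-separated sample, encoded the way the file appears to be
raw = "year;month;total\n2023;1;10\n".encode("utf-16")

enc = guess_encoding(raw[:4])
df = pd.read_csv(io.BytesIO(raw), sep=";", encoding=enc)
print(enc)
print(df)
```

For the real file, reading the first few bytes with `open("mensal_ss.csv", "rb")` and passing the result to guess_encoding gives the value to use for encoding= in pd.read_csv.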
I downloaded a dataset from GitHub as a JSON file and edited it by adding columns of values. How would I export the edited dataset as a CSV file that I could upload back to GitHub?
Currently my code looks like this:
import pandas as pd
url = 'https://raw.githubusercontent.com/xxx.json' #example of raw url taken from github
df = pd.read_json(url) #dataset from json file
df['H_values'] = output #new column added of values
Since I updated the original dataset (df) with a column called "H_values", I would like to export this version of the dataset as a CSV file (the last line of the code adds the new column). Thanks!
Simple:
df.to_csv("output.csv")
Check the pandas documentation for more information on the function and its parameters.
The DataFrame documentation describes an option to convert a DataFrame into a CSV-formatted string (calling to_csv without a path): Link
Now you just need to save that string to a file:
with open("Output.csv", "w") as text_file:
    text_file.write(df.to_csv(index=False))
For more options, see the documentation link above.
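Both answers rely on DataFrame.to_csv: with a path it writes the file itself, and without one it returns the CSV text. A minimal sketch of both uses, with a small made-up DataFrame standing in for the one loaded from the question's JSON URL; index=False keeps the row index from being written as an extra unnamed first column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the DataFrame loaded with pd.read_json(url)
df = pd.DataFrame({"name": ["a", "b", "c"], "value": [1, 2, 3]})
# New column, playing the role of df['H_values'] = output
df["H_values"] = [0.5, 1.5, 2.5]

# Without a path, to_csv returns the CSV text as a string
csv_text = df.to_csv(index=False)
print(csv_text)

# With a path, to_csv writes the file directly; no open()/write() needed:
# df.to_csv("output.csv", index=False)

# Round-trip check: reading the text back gives the same columns and values
back = pd.read_csv(io.StringIO(csv_text))
print(back)
```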
I tried to get some basic statistics for the columns of my CSV file, but apparently I can't even get the contents of the columns in my output.
I tried data['columnname']:
import pandas as p
data = p.read_csv('Amazon.csv',delimiter='~}',na_values='nan')
data.columns
data['Title']
I expect to get the contents of the 'Title' column in my output.
Without knowing the exact format of the .csv file it's hard to say, but hopefully something like this helps:
data = pd.read_csv("Amazon.csv", ... , header=0)
Setting header=0 will read the first line of the file and make it the column names.
If the .csv file doesn't define column names, you can either add them as the first line and use header=0, or pass them with names=<array-like>.
data = pd.read_csv("Amazon.csv", ... , names=['Title',...,'LastCol'])
See: Pandas Docs for read_csv
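The question's code also uses a two-character delimiter, '~}'. pandas treats a separator longer than one character (other than '\s+') as a regular expression, which only the Python parsing engine supports. A minimal sketch, assuming the file really is '~}'-delimited with a header row; the sample text is made up.

```python
import io
import re

import pandas as pd

# Hypothetical sample in the '~}'-delimited layout the question's code assumes
sample = "Title~}Price\nBook A~}10\nBook B~}12\n"

# re.escape makes the separator match the literal characters "~}";
# a multi-character separator requires the Python parsing engine
data = pd.read_csv(io.StringIO(sample), sep=re.escape("~}"), engine="python", header=0)

print(data.columns.tolist())
print(data["Title"])
```

If data.columns shows a single long column name instead, the delimiter in the real file is probably something other than '~}'.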
I am getting the error below, and I have tried the following posts:
Solution 1
Solution 2
But I haven't been able to resolve the error. My Python code is as follows:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print the value in testdata but it doesn't show any output.
Here is my CSV file:
First, pass your filename to read_csv as a string, and make sure the file is either in the working directory or that you are using the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.
First, you probably don't need header=None, since your file seems to have a header row.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
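The advice above (check the path, use the header row) can be combined into one small sketch. It assumes a whitespace-separated file whose first line holds the column names; read_whitespace_table and the temporary demo file are hypothetical. It uses sep=r"\s+", the spelling pandas recommends now that delim_whitespace is deprecated (since pandas 2.2). Because skip_blank_lines=True is the default, a blank line after the header is skipped.

```python
import os
import tempfile

import pandas as pd


def read_whitespace_table(path):
    """Read a whitespace-separated file whose first line holds the column names."""
    if not os.path.isfile(path):
        # read_csv would raise anyway; this message also shows the working
        # directory that a relative path is resolved against
        raise FileNotFoundError(f"{path!r} not found (cwd: {os.getcwd()})")
    # header=0: first non-blank line gives the column names
    return pd.read_csv(path, sep=r"\s+", header=0)


# Self-contained demo: write a small made-up file that includes a blank line
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testdata.txt")
    with open(path, "w") as f:
        f.write("x y z\n\n1 2 3\n4 5 6\n")
    testdata = read_whitespace_table(path)

print(testdata)
```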