[Python]; Parser error: Too many columns specified

I just want to read a simple .csv file with a header specifying the column types.
The following is the code:
import pandas as pd
url="https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=0"
names = [
    'User country', 'Nr. reviews', 'Nr. hotel reviews', 'Helpful votes',
    'Score', 'Period of stay', 'Traveler Type', 'Pool', 'Gym', 'Tennis court',
    'Spa', 'Casino', 'Free internet', 'Hotel name', 'Hotel stars', 'Nr. rooms',
    'User continent', 'Member years', 'Review month', 'Review weekday',
]
data = pd.read_csv(url, names=names, header=0, delimiter=';',
                   error_bad_lines=False)
print(data.shape)
Output:
ParserError: Too many columns specified: expected 20 and found 2
P.S. The URL is public and can be accessed.

The problem is that the URL doesn't lead directly to the .csv file. It leads to an entire HTML page.
You can see that by removing the names argument:
pd.read_csv(url, header=0, delimiter=';', error_bad_lines=False)
This executes successfully, but when you inspect the returned values, you'll see HTML code and JavaScript instead of your data.
What you need to do is make sure you pass actual CSV as input (for example, a URL that serves the raw file, or another source for the .csv file).
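One way to confirm this before parsing is to look at the first characters of the response body. The following is a minimal sketch, assuming the body has already been fetched as text; the helper name looks_like_html and the sample strings are made up for illustration, and the check is only a heuristic:

```python
def looks_like_html(text: str) -> bool:
    """Heuristic: True if the text starts like an HTML page rather than CSV."""
    head = text.lstrip()[:100].lower()
    return head.startswith(("<!doctype html", "<html"))


# Made-up sample bodies, only to exercise the helper.
html_body = "<!DOCTYPE html>\n<html><head><script>...</script></head></html>"
csv_body = "User country;Nr. reviews\nUSA;11\n"

print(looks_like_html(html_body))  # True
print(looks_like_html(csv_body))   # False
```

If the check returns True, the URL is serving a web page and read_csv will not see the columns you expect.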

In the Dropbox URL, just replace dl=0 with dl=1, as below:
https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=1
This makes the file download directly instead of opening the preview page.
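If you would rather not edit the link by hand, the same change can be sketched with the standard library. The helper name to_direct_download is hypothetical, and it only assumes that Dropbox share links carry the dl query parameter as shown above:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_direct_download(url: str) -> str:
    """Return the same URL with the dl query parameter set to 1."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["dl"] = "1"  # dl=1 asks for the file itself, per the answer above
    return urlunsplit(parts._replace(query=urlencode(query)))


url = "https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=0"
direct_url = to_direct_download(url)
print(direct_url)
```

The resulting direct_url can then be passed to pd.read_csv in place of the original url, with the same names, header and delimiter arguments as in the question.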

Related

Getting "ParserError" when I try to read a .txt file using pd.read_csv()

I am trying to convert this dataset: COCOMO81 to arff.
Before converting to .arff, I am trying to convert it to .csv
I am following this LINK to do this.
I got that dataset from promise site. I copied the entire page to notepad as cocomo81.txt and now I am trying to convert that cocomo81.txt file to .csv using python.
(I intend to convert the .csv file to .arff later using weka)
However, when I run
import pandas as pd
read_file = pd.read_csv(r"cocomo81.txt")
I get THIS ParserError.
To fix this, I followed this solution and modified my command to
read_file = pd.read_csv(r"cocomo81.txt",on_bad_lines='warn')
I got a bunch of warnings - you can see what it looks like here
and then I ran
read_file.to_csv(r'.\cocomo81csv.csv',index=None)
But it seems that the fix for ParserError didn't work in my case because my cocomo81csv.csv file looks like THIS in Excel.
Can someone please help me understand where I am going wrong, and how I can use datasets from the promise repository in .arff format?

It looks like a CSV file with comments in the first lines. The comment lines are marked with % characters (and possibly # as well), and the actual CSV data starts at line 230.
You should skip the first rows and set the column names manually. Try something like this:
import pandas as pd
# set column names manually
col_names = ["rely", "data", "cplx", "time", "stor", "virt", "turn", "acap", "aexp", "pcap", "vexp", "lexp", "modp", "tool", "sced", "loc", "actual"]
filename = "cocomo81.txt"  # the file saved in the question
# read csv data, skipping the 229 lines before the data
df = pd.read_csv(filename, skiprows=229, sep=',', decimal='.', header=None, names=col_names)
print(df)

You first need to parse the .txt file.
The column names can be taken from the #attribute lines:
#attribute rely numeric
#attribute data numeric
#attribute cplx numeric
#attribute time numeric
..............................
For the CSV file, keep only the rows that come after #data, near the end of the file. You can just copy and paste them:
0.88,1.16,0.7,1,1.06,1.15,1.07,1.19,1.13,1.17,1.1,1,1.24,1.1,1.04,113,2040
0.88,1.16,0.85,1,1.06,1,1.07,1,0.91,1,0.9,0.95,1.1,1,1,293,1600
1,1.16,0.85,1,1,0.87,0.94,0.86,0.82,0.86,0.9,0.95,0.91,0.91,1,132,243
0.75,1.16,0.7,1,1,0.87,1,1.19,0.91,1.42,1,0.95,1.24,1,1.04,60,240
...................................................................
Then read the resulting CSV file:
pd.read_csv(file, names=["rely", "data", "cplx", ...])
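The copy-and-paste steps above could also be scripted. This is a rough sketch rather than a full ARFF parser: the function name split_arff_like is hypothetical, the sample text is a shortened stand-in for the real file, and it assumes every attribute is numeric. Standard ARFF files mark these lines with @attribute and @data, so the sketch accepts both the @ and the # spelling:

```python
import csv


def split_arff_like(text):
    """Return (column_names, rows) from ARFF-style text with numeric data."""
    names, data_lines, in_data = [], [], False
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("%"):
            continue  # skip blank lines and % comments
        lowered = stripped.lower()
        if in_data:
            data_lines.append(stripped)
        elif lowered.startswith(("@attribute", "#attribute")):
            names.append(stripped.split()[1])  # second token is the name
        elif lowered.startswith(("@data", "#data")):
            in_data = True
    rows = [[float(value) for value in row] for row in csv.reader(data_lines)]
    return names, rows


# Shortened stand-in for the real file: three attributes, two rows.
sample = """% comment line
#attribute rely numeric
#attribute data numeric
#attribute loc numeric
#data
0.88,1.16,113
1,1.16,132
"""
names, rows = split_arff_like(sample)
print(names)
print(rows)
```

From there, pd.DataFrame(rows, columns=names) followed by to_csv with index=False would give a plain CSV file without relying on a hard-coded skiprows value.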

Cannot read content in CSV File in Pandas

I have a dataset from the State Security Department in my county that has some problems.
I can't read any records from the file as provided in CSV format; I only get empty records. When I convert the file to XLSX, it reads correctly.
I would like to know if there is any possible solution to the above problem.
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row in the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!

If you use utf-16 as the encoding, it seems to work. Note, however, that the year rows complicate the parsing, so depending on what you want to do with the data you may need some extra manipulation of the CSV to work around them.
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
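Rather than guessing the encoding, you can look for a byte order mark (BOM) at the start of the file. This is a small sketch using only the standard library; the helper name sniff_bom_encoding, the latin1 fallback and the sample bytes are assumptions for illustration, and files without a BOM will simply get the fallback:

```python
import codecs


def sniff_bom_encoding(raw: bytes, default: str = "latin1") -> str:
    """Guess an encoding name from a leading BOM, else return the default."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick byte order
    return default


# Made-up header line, encoded as little-endian UTF-16 with a BOM.
sample = codecs.BOM_UTF16_LE + "Ano;Total\n".encode("utf-16-le")

print(sniff_bom_encoding(sample))         # utf-16
print(sniff_bom_encoding(b"a;b\n1;2\n"))  # latin1

# 'utf-16' consumes the BOM; 'utf-16-le' leaves it as a \ufeff character.
print(repr(sample.decode("utf-16")))
print(repr(sample.decode("utf-16-le")))
```

In practice you would read the first few bytes of mensal_ss.csv in binary mode, pass them to the helper, and hand the result to the encoding argument of pd.read_csv. The last two lines show why 'utf-16' is usually the safer choice when a BOM is present: with 'utf-16-le' the BOM survives as a stray character at the start of the first column name.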

Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())

Saving my dataset as a csv file in python

I currently have a JSON file of a dataset that I downloaded from GitHub and edited by adding columns of values. How would I export my newly edited dataset as a CSV file that I could upload back to GitHub?
Currently my data is saved as:
import pandas as pd
url = 'https://raw.githubusercontent.com/xxx.json' #example of raw url taken from github
df = pd.read_json(url) #dataset from json file
df['H_values'] = output #new column added of values
Since I updated the original dataset (df) with a column called "H_values" I would like to export this version of the dataset as a csv file (the last line of the code is the updated data). Thanks!

Simple:
df.to_csv("output.csv")
Check the pandas documentation for more information on the function and its parameters.

The pandas DataFrame documentation describes an option to convert a DataFrame into a CSV-formatted string: Link
You then just need to write that string to a file:
with open("Output.csv", "w", newline="") as text_file:
    text_file.write(df.to_csv(index=False))
For more options, refer to the documentation linked above.

How to get the contents of rows from csv file?

I tried to get some basic statistics of my columns in csv file, but apparently, I can't even get the contents of the columns in my output.
I tried data['columnname']
import pandas as p
data = p.read_csv('Amazon.csv',delimiter='~}',na_values='nan')
data.columns
data['Title']
I expect to get the contents of 'Title' in my output

Without knowing the exact format of the .csv it's a bit hard, but hopefully something like this helps:
data = pd.read_csv("Amazon.csv", ... , header=0)
Setting header=0 will read the first line of the file and make it the column names.
If names aren't defined by the .csv you can either add them to the first line and use header=0 or use names=<array-like>.
data = pd.read_csv("Amazon.csv", ... , names=['Title',...,'LastCol'])
See: Pandas Docs for read_csv

EmptyDataError: No columns to parse from file

I am currently getting the error below, and I have tried the solutions in these posts:
Solution 1
Solution 2
But I am not able to get the error resolved. My python code is as below:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print the value in testdata but it doesn't show any output.
The following is my csvfile:

First, pass the filename to read_csv as a string, and make sure the file is either in the current working directory or that you give the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.

First, you probably don't need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
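The file checks suggested above can be done in code before calling read_csv, since pandas raises EmptyDataError when the file it opens has no content. This is a minimal sketch; the helper name describe_csv_path and the temporary files are made up for the demonstration:

```python
import tempfile
from pathlib import Path


def describe_csv_path(path) -> str:
    """Report whether a path is missing, empty, or looks readable."""
    p = Path(path)
    if not p.is_file():
        return f"missing: {p.resolve()}"  # shows where Python actually looked
    if p.stat().st_size == 0:
        return "empty"
    return "ok"


# Demonstration on throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    empty = Path(tmp, "empty.csv")
    empty.touch()
    filled = Path(tmp, "filled.csv")
    filled.write_text("a b\n1 2\n")
    results = [
        describe_csv_path(Path(tmp, "nope.csv")),
        describe_csv_path(empty),
        describe_csv_path(filled),
    ]

print(results)
```

Printing the resolved path is the useful part: it makes it obvious when the script is running from a different directory than the one that holds the file.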
