EmptyDataError: No columns to parse from file - python

I am currently getting the error below, and I have tried out the following posts:
Solution 1
Solution 2
But I have not been able to resolve the error. My Python code is as below:
import pandas as pd
testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
I tried to print testdata, but it doesn't show any output.
The following is my CSV file:

Firstly, pass your filename to read_csv as a string, and make sure the file is either in the local directory or that you have the correct file path.
import pandas as pd
testdata = pd.read_csv("filename.csv", header=None, delim_whitespace=True)
If that does not work, post some information about the environment you are using.

First, you probably don't need header=None as you seem to have headers in the file.
Also try removing the blank line between the headers and the first line of data.
Check and double check your file name.
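In cases like this, EmptyDataError usually means pandas was handed an empty file or the wrong path. A minimal sketch that checks both before parsing; the sample file here is made up so the example is self-contained:

```python
import os
import pandas as pd

# Hypothetical whitespace-delimited sample, written first so the sketch runs as-is.
file_name = "sample.txt"
with open(file_name, "w") as f:
    f.write("1 2 3\n4 5 6\n")

# EmptyDataError usually means the file is empty (or the path points at the
# wrong file), so check both existence and size before parsing.
assert os.path.exists(file_name), f"{file_name} not found"
assert os.path.getsize(file_name) > 0, f"{file_name} is empty"

testdata = pd.read_csv(file_name, header=None, delim_whitespace=True)
print(testdata.shape)  # (2, 3)
```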

Cannot read content in CSV File in Pandas

I have a dataset from the State Security Department in my county that has some problems.
I can't read the records from the CSV file at all; it brings up only empty records. When I convert the file to XLSX, it is read correctly.
I would like to know if there is any possible solution to the above problem.
The dataset is available at: here or here.
I tried the code below, but I only get nulls, except for the first row in the first column:
df = pd.read_csv('mensal_ss.csv', sep=';', names=cols, encoding='latin1')
Thank you!
If you try with utf-16 as the encoding, it seems to work. However, note that the year rows complicate the parsing, so you may need some extra manipulation of the CSV to circumvent that, depending on what you want to do with the data:
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16')
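If the year rows do get in the way, one option is to drop them after reading. A sketch with a made-up sample that mimics the interleaved layout (the real file's columns and separators may differ):

```python
import io
import pandas as pd

# Made-up sample mimicking the layout: bare year rows interleaved with data.
raw = "2019;;\nJan;10;20\nFeb;11;21\n2020;;\nJan;12;22\nFeb;13;23\n"
df = pd.read_csv(io.StringIO(raw), sep=";", header=None,
                 names=["label", "a", "b"])

# The year rows carry no values in the other columns, so drop them.
df = df[df["a"].notna()].reset_index(drop=True)
print(df["label"].tolist())  # ['Jan', 'Feb', 'Jan', 'Feb']
```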
Try using 'utf-16-le':
import pandas as pd
df = pd.read_csv('mensal_ss.csv', sep=';', encoding='utf-16-le')
print(df.head())

What do I have to change so that Jupyter shows columns?

I just want to import this CSV file. It can be read, but somehow it doesn't create columns. Does anyone know why?
This is my code:
import pandas as pd
songs_data = pd.read_csv('../datasets/spotify-top50.csv', encoding='latin-1')
songs_data.head(n=10)
Result that I see in Jupyter:
P.S.: I'm kind of new to Jupyter and programming, but from what I've found, this should work properly. I don't know why it doesn't.
To load a CSV file properly, you should specify some parameters. For example, in your case you need to specify quotechar:
df = pd.read_csv('../datasets/spotify-top50.csv',quotechar='"',sep=',', encoding='latin-1')
df.head(10)
If you still have a problem, you should have another look at your CSV file and also at the pandas documentation, so that you can set the parameters to match your CSV file's structure.
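If you are unsure which parameters are right, the standard library's csv.Sniffer can often guess the dialect for you. A sketch with a made-up sample (the real file's columns will differ):

```python
import csv
import io
import pandas as pd

# Made-up sample with quoted fields that contain the separator itself.
raw = 'Track.Name,Artist.Name\n"Señorita","Shawn Mendes, Camila Cabello"\n'

# csv.Sniffer guesses the delimiter and quote character, which can then be
# passed straight to read_csv.
dialect = csv.Sniffer().sniff(raw)
df = pd.read_csv(io.StringIO(raw), sep=dialect.delimiter,
                 quotechar=dialect.quotechar)
print(df.shape)  # (1, 2)
```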

pandas running extremely slow

I am trying to read in a TSV file (0.5 GB) using pandas; however, I can't seem to get it to work. I have stripped my code down to its simplest form and still have no luck:
import pandas as pd
import os
rawpath = 'my path'
filename = 'my file name'
finalfile = os.path.join(rawpath, filename)
df = pd.read_csv(finalfile, nrows=5000, sep='\t')
print(df.head())
I have tried to chunk the file, with no luck; read_table doesn't work either. I have gone in and freed up as much memory as possible on my machine, but when I finally receive any output from PyCharm, it says:
pandas.errors.ParserError: Error tokenizing data. C error: out of memory
Can anyone assist please?
Try setting dtype=object and na_values = "Your NA format" (optional, if you know it)
Also, make sure you have the right separator.
something like:
df = pd.read_csv(finalfile, nrows=5000, sep='\t', dtype=object, na_values = '-NaN')
Edit:
Also, you mentioned chunking the file. I am not sure what you mean by that, but I will mention that you can chunk directly using pandas, instead of nrows. Final code:
mylist = []
for chunk in pd.read_csv(finalfile, sep='\t', dtype=object, na_values='-NaN', chunksize=100):
    mylist.append(chunk)
big_data = pd.concat(mylist, axis=0)
del mylist
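Note that concatenating all the chunks back together still requires the whole file to fit in memory. If an aggregate is all you need, you can process each chunk and discard it; a sketch with a tiny in-memory sample standing in for the real TSV:

```python
import io
import pandas as pd

# Tiny in-memory sample standing in for the 0.5 GB TSV.
raw = "id\tvalue\n1\t10\n2\t20\n3\t30\n4\t40\n"

# Keep only a running aggregate per chunk, so memory use stays bounded by
# the chunk size rather than by the file size.
total = 0
for chunk in pd.read_csv(io.StringIO(raw), sep="\t", chunksize=2):
    total += int(chunk["value"].sum())
print(total)  # 100
```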

[Python] Parser error: Too many columns specified

I just want to read a simple .csv file with a header specifying the column types.
The following is the code:
import pandas as pd
url="https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=0"
names = ['User country', 'Nr. reviews', 'Nr. hotel reviews', 'Helpful votes',
         'Score', 'Period of stay', 'Traveler Type', 'Pool', 'Gym',
         'Tennis court', 'Spa', 'Casino', 'Free internet', 'Hotel name',
         'Hotel stars', 'Nr. rooms', 'User continent', 'Member years',
         'Review month', 'Review weekday']
data = pd.read_csv(url, names=names, header=0, delimiter=';',
                   error_bad_lines=False)
print(data.shape)
Out:
ParserError: Too many columns specified: expected 20 and found 2
P.S.: The URL is public and can be accessed.
The problem is the URL doesn't directly lead to the .csv file. It leads to the entire html page.
You can see that by removing the names argument
pd.read_csv(url, header=0, delimiter=';', error_bad_lines=False)
This successfully executes, but when inspecting the returned values, you'll see HTML code and JavaScript.
What you need to do is make sure you provide actual csv as input (try another source for the .csv file)
In the Dropbox URL, just replace dl=0 with dl=1, as below:
https://www.dropbox.com/s/n6yt908tgetuq63/LasVegasTripAdvisorReviews-Dataset.csv?dl=1
This makes the file download directly.
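A quick way to catch this class of problem early is to peek at the first bytes a URL returns before handing them to pandas. A minimal sketch (the helper name is made up):

```python
def looks_like_html(head: bytes) -> bool:
    """Cheap check: HTML pages start with markup, CSV data does not."""
    head = head.lstrip().lower()
    return head.startswith(b"<!doctype") or head.startswith(b"<html")

# With ?dl=0 Dropbox serves its preview page; with ?dl=1 it serves the file.
print(looks_like_html(b"<!DOCTYPE html><html>..."))        # True
print(looks_like_html(b"User country;Nr. reviews;Score"))  # False
```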

Pandas.read_csv() with special characters (accents) in column names

I have a csv file that contains some data with columns names:
"PERIODE"
"IAS_brut"
"IAS_lissé"
"Incidence_Sentinelles"
I have a problem with the third one, "IAS_lissé", which is misinterpreted by the pd.read_csv() method and returned as �.
What is that character?
It's causing a bug in my Flask application; is there a way to read that column differently, without modifying the file?
In [1]: import pandas as pd
In [2]: pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";").columns
Out[2]: Index([u'PERIODE', u'IAS_brut', u'IAS_liss�', u'Incidence_Sentinelles'], dtype='object')
I ran into the same problem with Spanish and solved it with the "latin1" encoding:
import pandas as pd
pd.read_csv("Openhealth_S-Grippal.csv",delimiter=";", encoding='latin1')
Hope it helps!
You can change the encoding parameter for read_csv; see the pandas docs here. The Python standard encodings are listed here.
I believe for your example you can use the utf-8 encoding (assuming that your language is French).
df = pd.read_csv("Openhealth_S-Grippal.csv", delimiter=";", encoding='utf-8')
Here's an example showing some sample output. All I did was make a csv file with one column, using the problem characters.
df = pd.read_csv('sample.csv', encoding='utf-8')
Output:
   IAS_lissé
0          1
1          2
2          3
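If you don't know the encoding in advance, a small fallback loop over the likely candidates can help. A sketch using in-memory latin-1 bytes to reproduce the symptom (the real file may use a different encoding):

```python
import io
import pandas as pd

# "IAS_lissé" encoded as latin-1 bytes; decoding them as utf-8 is exactly
# what produces the \ufffd replacement character seen in the question.
raw_bytes = "PERIODE;IAS_lissé\n2017-01;1.5\n".encode("latin1")

# Try a few likely encodings until one decodes without errors.
df = None
for enc in ("utf-8", "latin1", "cp1252"):
    try:
        df = pd.read_csv(io.BytesIO(raw_bytes), sep=";", encoding=enc)
        break
    except UnicodeDecodeError:
        continue
print(df.columns.tolist())  # ['PERIODE', 'IAS_lissé']
```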
Try using:
import pandas as pd
df = pd.read_csv('file_name.csv', encoding='utf-8-sig')
Using utf-8 didn't work for me. E.g. this piece of code:
bla = pd.DataFrame(data = [1, 2])
bla.to_csv('funkyNamé , things.csv')
blabla = pd.read_csv('funkyNamé , things.csv', delimiter=";", encoding='utf-8')
blabla
Ultimately returned: OSError: Initializing from file failed
I know you said you didn't want to modify the file. If you meant the file content vs the filename, I would rename the file to something without an accent, read the csv file under its new name, then reset the filename back to its original name.
import os
import pandas as pd

originalfilepath = r'C:\Users\myself\funkyNamé , things.csv'
originalfolder = r'C:\Users\myself'
os.rename(originalfilepath, originalfolder + '\\tempName.csv')
df = pd.read_csv(originalfolder + '\\tempName.csv', encoding='ISO-8859-1')
os.rename(originalfolder + '\\tempName.csv', originalfilepath)
If you did mean "without modifying the filename", my apologies for not being helpful to you, and I hope this helps someone else.
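Another workaround that avoids renaming entirely: open the file yourself and hand the file object to read_csv, which sidesteps pandas' own handling of the accented path. A sketch that writes a small hypothetical file first so it is self-contained:

```python
import os
import pandas as pd

# Hypothetical file whose name contains an accent.
name = "funkyNamé , things.csv"
pd.DataFrame({"x": [1, 2]}).to_csv(name, index=False)

# Passing an open file object instead of the path lets Python handle the
# filename, so read_csv never sees the accented name.
with open(name, encoding="utf-8") as f:
    df = pd.read_csv(f)
os.remove(name)
print(df["x"].tolist())  # [1, 2]
```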
