Reading the CSV file using Pandas results in an error - python

I have a problem reading the csv file using pandas. Here is the code:
import pandas as pd
import numpy as np
df=pd.read_csv('item.csv')
Here is the error:
Traceback (most recent call last):
File "Script.py", line 3, in <module>
df=pd.read_csv('item.csv')
File "C:\Python38\lib\site-packages\pandas\io\parsers.py", line 605, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Python38\lib\site-packages\pandas\io\parsers.py", line 457, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python38\lib\site-packages\pandas\io\parsers.py", line 814, in __init__
self._engine = self._make_engine(self.engine)
File "C:\Python38\lib\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Python38\lib\site-packages\pandas\io\parsers.py", line 1893, in __init__
self._reader = parsers.TextReader(self.handles.handle, **kwds)
File "pandas\_libs\parsers.pyx", line 518, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 620, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1943, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 11843: invalid start byte

Try:
import pandas as pd
import numpy as np
df=pd.read_csv(r'item.csv')

If you add encoding, it will works. Additionally, you can see a link below for standard encodings in python when you need.
data = pd.read_csv(filename, encoding= 'unicode_escape')
https://docs.python.org/3/library/codecs.html#standard-encodings

Related

Pandas can't read in csv file produced by Numbers?

I have a comma delimited csv file that is exported by Mac Numbers, and I am trying to read it into a dataframe, but get an error message:
df = pd.read_csv('game.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
The error message is:
Traceback (most recent call last):
File "/Users/congminmin/nlp/data_collection/crawler/data/game/test.py", line 5, in <module>
df = pd.read_csv('game_app_apple.missing.url.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 426, in pandas._libs.parsers.TextReader.__cinit__
ValueError: invalid literal for int() with base 10: 'ignore'
Is my csv not valid? But it is produced by Numbers. Even if I removed the dtype parameter, it got the same issue. If i removed the error_bad_lines='ignore', I got the following error:
File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
The csv exported by Numbers is comma delimited, and I want to read into a dataframe and output as tab delimited, but got the problem above.
Add data: The original data is Chinese and the 'rating' in code above is actually '评分' translation in actual data below:
I had to take a screenshot, since it is recognized as spam by stackoverflow:

Manipulate SQL dataframe with python script in Power BI

I'd like to execute a simple python script in Power BI on a SQL dataframe.
But the error seems to indicate like the SQL table has been read as a CSV file and I don't know why the script consider the dataframe as a CSV file instead of an SQL dataframe as it is.
The python script is :
import pandas as pd
dataset['COD-MARQ'] = dataset['COD-MARQ'].str.strip()
Any ideas on how shoud I process ?
thanks
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1194, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1814, in pandas._libs.parsers._try_int64
MemoryError: Unable to allocate 64.0 KiB for an array with shape (8192,) and data type int64
Détails :
DataSourceKind=Python
DataSourcePath=Python
Message=Ρŷтнőŋ şсŗĩрţ εггǿŗ.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.Tex...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException ```
I'm not positive it's the problem, but it looks to me like the dataset is referring to the previous step rather than the original source, which means it's no longer in a SQL dataframe format. You probably want to either import the original source using python or else modify your script to treat the dataset not as a SQL dataframe but in whatever format the Query Editor passes to the python script (which I think is a pandas dataframe).
On a separate note, in this particular case, it seems unnecessary to use python for a simple transformation that can just as easily be done natively in M.

Pandas throws ParserError on one computer but not on another

Here's the code I have, which works perfectly fine on my friend's computer:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv("report.csv")
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
Here's the error I receive on mine:
Traceback (most recent call last):
File "./agent_calls_report.py", line 10, in <module>
df = pd.read_csv("report.csv")
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 34 fields in line 3, saw 35
Any idea why this would work on one computer and not another? Edit: I've confirmed that we are using the same versions of both Python (3.7.1) and Pandas, the only difference is that he has a Mac while I'm on Linux.
I believe this is a problem with encoding
try this :
import pandas as pd
df = pd.read_csv("report.csv",encoding='cp1252')
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
There are other encoding options you can try utf-8 instead of cp1252.
Here is a list of encodings used.

Problems with pd.read_csv

I have Anaconda 3 on Windows 10. I am using pd.read_csv() to load csv files but I get error messages. To begin with I tried df = pd.read_csv('C:\direct_marketing.csv') which worked and the file was imported.
Then I tried df = pd.read_csv('C:\tutorial.csv') and I received the following error message:
Traceback (most recent call last):
File "<ipython-input-3-ce208cc2684f>", line 1, in <module>
df = pd.read_csv('C:\tutorial.csv')
File "C:\Users\Alexandros_7\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Alexandros_7\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\Alexandros_7\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "C:\Users\Alexandros_7\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Users\Alexandros_7\Anaconda3\lib\site-packages\pandas\io\parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas\parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3427)
File "pandas\parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:6861)
OSError: File b'C:\tutorial.csv' does not exist
Then I moved the file to a new folder and renamed it and again used read.csv() to import it:
df = pd.read_csv('C:\Users\test.csv')
This time I received a different error message:
File "<ipython-input-5-03c6d380c174>", line 1
df = pd.read_csv('C:\Users\test.csv')
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
Could you help me understand what is going on and how to handle this situation?
Thanks a lot!
Try escaping the backslashes:
df = pd.read_csv('C:\\Users\\test.csv')
try use two back-slash '\' instead of '\'. It might have take it as a escape sign.. ?
Another option is to add r before the path i.e. df = pd.read_csv(r'C:\Users\test.csv')

Encoding with pandas.read_csv when file name has accents

I'm trying to load a CSV with pandas, but am running into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set encoding for text within the file, I can't figure out how to encode the file name properly.
input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file,sep=',',skipinitialspace=True)
The CSV file is Anzoátegui.csv. When I'm getting errors,
input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv
Error code:
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:
Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []
Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!
Edit: Full error
Traceback (most recent call last):
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
result = eval(compiled, updated_globals, frame.f_locals)
File "<string>", line 1, in <module>
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
self._make_engine(self.engine)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
Ok folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version to get files named with accents to import correctly.
Comments at github.
Thanks for the input!

Categories

Resources