Can't open some .csv file using read_csv()

I am writing a Python function that opens two .csv files and modifies the data inside. I am using pandas and pd.read_csv('text') to open the files. The function works for one .csv file. However, when I try it on a different, smaller .csv file, the file cannot even be opened.
This is part of the error I get when I try to open the .csv file:
Traceback (most recent call last):
File "C:\Users\...\Downloads\test\test.py", line 3, in <module>
df = pd.read_csv('data2.csv')
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 611, in _read
return parser.read(nrows)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 8836, saw 5
This is the code I am using to access the .csv files.
import pandas as pd
df = pd.read_csv('test.csv')
All the files are in the correct folders and the file paths are all correct. Any help is appreciated, thanks
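
The message "Expected 4 fields in line 8836, saw 5" says that one line has more delimiters than the header. A minimal sketch of how such lines might be located and skipped, assuming a comma-separated file; the sample data below is hypothetical and stands in for the real file, and on_bad_lines requires pandas 1.3 or later:

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the failing file: line 3 has one extra field.
raw = "a,b,c,d\n1,2,3,4\n5,6,7,8,9\n10,11,12,13\n"

# Report every line whose field count differs from the header's.
rows = list(csv.reader(io.StringIO(raw)))
expected = len(rows[0])
bad_lines = [(n, len(r)) for n, r in enumerate(rows, start=1) if len(r) != expected]
print(bad_lines)

# Load the file while skipping malformed lines (pandas >= 1.3).
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df.shape)
```

Skipping lines silently drops data, so it is usually worth inspecting the reported lines first; an unquoted comma inside a text field is a common cause.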

Related

Manipulate SQL dataframe with python script in Power BI

I'd like to execute a simple python script in Power BI on a SQL dataframe.
But the error seems to indicate that the SQL table has been read as a CSV file, and I don't know why the script treats the dataframe as a CSV file instead of the SQL dataframe that it is.
The python script is :
import pandas as pd
dataset['COD-MARQ'] = dataset['COD-MARQ'].str.strip()
Any ideas on how I should proceed?
Thanks
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1194, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1814, in pandas._libs.parsers._try_int64
MemoryError: Unable to allocate 64.0 KiB for an array with shape (8192,) and data type int64
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Python script error.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.Tex...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException

I'm not positive it's the problem, but it looks to me like the dataset is referring to the previous step rather than the original source, which means it's no longer in a SQL dataframe format. You probably want to either import the original source using python or else modify your script to treat the dataset not as a SQL dataframe but in whatever format the Query Editor passes to the python script (which I think is a pandas dataframe).
On a separate note, in this particular case, it seems unnecessary to use python for a simple transformation that can just as easily be done natively in M.

problem in reading products CSV file with pandas python

I have a products CSV file and I am trying to read it with pandas, but I get this error.
My code:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 7549, saw 8
This is the link to the file.
Another thing: when I deleted most of the rows and kept just 4, the file was read successfully.

By default, pandas assumes your CSV is separated by commas (','). You should pass the proper separator to the read_csv call.
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv', sep=';')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
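
When the separator is not known in advance, it can be detected rather than hard-coded. A small sketch under assumptions: the sample text is hypothetical, csv.Sniffer is from the standard library, and sep=None makes pandas sniff the delimiter itself (this needs engine="python"):

```python
import csv
import io

import pandas as pd

# Hypothetical semicolon-separated sample; note the comma inside a field,
# which is what confuses the default comma separator.
sample = "name;price;stock\nWidget, large;9.99;4\nGadget;19.50;12\n"

# Option 1: let csv.Sniffer choose among a few candidate delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t|")
df = pd.read_csv(io.StringIO(sample), sep=dialect.delimiter)

# Option 2: let pandas sniff the delimiter; sep=None needs the Python engine.
df_auto = pd.read_csv(io.StringIO(sample), sep=None, engine="python")

print(dialect.delimiter)
print(df.columns.tolist())
```

Sniffing is a heuristic, so passing sep=';' explicitly remains the safer choice once the format is known.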

File appears to be separated by ;. Try:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv',sep=";")

The approach suggested by Ransaka works:
The file uses ; as the delimiter, so you should pass the extra argument sep=';'. Try using df = pd.read_csv('YOUR_CSV_PATH', sep=';')
If the file is in the working directory, you can also use just the file name instead of the full path.

Reading text file using pandas using python

I am very new to Python. I am trying to read my text file using the Python data science library pandas, but I get what looks like a Unicode error, which I don't understand. Any help would be much appreciated. Here is my code:
import pandas as pd
text = pd.read_csv("/home/system/Documents/Heena/NLP/modi.txt", sep = " ", header = None)
Error Code:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 62 fields in line 7, saw 67

Because the text itself contains spaces, read_csv with sep=" " treats every word as a separate column, so lines with different word counts produce different numbers of fields. To fix this, separate the fields with a character that does not occur in the text, then pass that character as sep. Example:
test.csv
data1;data2;data3
My dear countrymen;12;test data1
I convey my best wishes to all of you on this auspicious occasion of Independence Day.;45;test data2
test.py
import pandas as pd
text = pd.read_csv("test.csv", sep = ";")
You can also look at this answer
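
If the file is plain prose with no columns at all, another option is to skip read_csv and load one line per row. This is only a sketch with hypothetical sample text and column names ("line", "n_words"), not the method from the linked answer:

```python
import io

import pandas as pd

# Hypothetical stand-in for the speech text file.
raw = io.StringIO(
    "My dear countrymen\n"
    "\n"
    "I convey my best wishes to all of you on this auspicious occasion.\n"
)

# One row per non-blank line, with surrounding whitespace removed.
lines = [line.strip() for line in raw if line.strip()]
text = pd.DataFrame({"line": lines})

# Word counts differ per line, which is why sep=" " raised a ParserError.
text["n_words"] = text["line"].str.split().str.len()
print(text)
```

For a real file, the io.StringIO object would be replaced by open(path, encoding=...).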

Pandas throws ParserError on one computer but not on another

Here's the code I have, which works perfectly fine on my friend's computer:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv("report.csv")
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
Here's the error I receive on mine:
Traceback (most recent call last):
File "./agent_calls_report.py", line 10, in <module>
df = pd.read_csv("report.csv")
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 34 fields in line 3, saw 35
Any idea why this would work on one computer and not another? Edit: I've confirmed that we are using the same versions of both Python (3.7.1) and Pandas, the only difference is that he has a Mac while I'm on Linux.

I believe this is a problem with encoding.
Try this:
import pandas as pd
df = pd.read_csv("report.csv",encoding='cp1252')
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
There are other encodings you can try as well, for example utf-8 instead of cp1252.
Here is a list of encodings used.
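
Trying encodings one after another can be wrapped in a small helper. The function name read_csv_with_fallback and the sample bytes are hypothetical; the sketch only handles UnicodeDecodeError, so a ParserError like the one in the question may still point to a different cause, such as the delimiter or quoting:

```python
import io

import pandas as pd


def read_csv_with_fallback(source_bytes, encodings=("utf-8", "cp1252")):
    """Try each encoding in order; return the DataFrame and the encoding used."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(source_bytes), encoding=enc), enc
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the data, try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Hypothetical data: "é" encoded as cp1252 is not valid UTF-8.
data = "Agent Name,Calls\nRené,12\n".encode("cp1252")
df, used = read_csv_with_fallback(data)
print(used)
```

cp1252 accepts almost any byte sequence, so it is listed last; a permissive encoding placed first would hide decoding problems rather than reveal them.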

Encoding with pandas.read_csv when file name has accents

I'm trying to load a CSV with pandas, but am running into a problem if the file name has accents. It's clearly an encoding problem, but although read_csv lets you set encoding for text within the file, I can't figure out how to encode the file name properly.
input_file = r'C:\...\Datasets\%s\Provinces\Points\%s.csv' % (country, province)
self.locs = pandas.read_csv(input_file,sep=',',skipinitialspace=True)
The CSV file is Anzoátegui.csv. When I get the error, input_file is:
input_file = 'C:\\...\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv'
Error code:
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist
So maybe it's converting my string to bytes? I tried using io.StringIO(input_file) as well, which puts the correct file name as a column header on an empty DataFrame:
Empty DataFrame
Columns: [C:\PF2\QGIS Valmiera\Datasets\Venezuela\Provinces\Points\Anzoátegui.csv]
Index: []
Any ideas on how to get this file to load? Unfortunately I can't just strip out accents, as I have to interface with software that requires the proper name, and I have a ton of files to format (not just the one). Thanks!
Edit: Full error
Traceback (most recent call last):
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_comm.py", line 891, in doIt
result = pydevd_vars.evaluateExpression(self.thread_id, self.frame_id, self.expression, self.doExec)
File "C:\PF2\eclipse-standard-kepler-SR2-win32-x86_64\eclipse\plugins\org.python.pydev_3.3.3.201401272249\pysrc\pydevd_vars.py", line 486, in evaluateExpression
result = eval(compiled, updated_globals, frame.f_locals)
File "<string>", line 1, in <module>
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 404, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 205, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 486, in __init__
self._make_engine(self.engine)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 594, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Python33\lib\site-packages\pandas\io\parsers.py", line 952, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 330, in pandas.parser.TextReader.__cinit__ (pandas\parser.c:3040)
File "parser.pyx", line 557, in pandas.parser.TextReader._setup_parser_source (pandas\parser.c:5387)
OSError: File b'C:\\PF2\\QGIS Valmiera\\Datasets\\Venezuela\\Provinces\\Points\\Anzo\xc3\xa1tegui.csv' does not exist

Ok folks, I got a little lost in dependency hell, but it turns out that this issue was fixed in pandas 0.14.0. Install the updated version to get files named with accents to import correctly.
Comments at github.
Thanks for the input!
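
Where upgrading pandas is not an option, one possible workaround is to let Python open the file and pass the file object to read_csv, which accepts file-like objects as well as paths. This sketch is an assumption rather than the fix described above; it uses a temporary directory and made-up CSV content:

```python
import os
import tempfile

import pandas as pd

# Hypothetical accented file name, created in a temporary directory.
tmp_dir = tempfile.mkdtemp()
input_file = os.path.join(tmp_dir, "Anzoátegui.csv")
with open(input_file, "w", encoding="utf-8") as f:
    f.write("x, y\n1, 2\n")

# Python resolves the file name; pandas only sees the open file object.
with open(input_file, encoding="utf-8") as f:
    locs = pd.read_csv(f, sep=",", skipinitialspace=True)

print(locs.columns.tolist())
```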
