Reading a text file with pandas in Python - python

I am very new to Python. I am trying to read my text file using the Python data science library pandas, but I get what I took to be a Unicode error, and I don't understand it. Any help would be much appreciated. Here is my code:
import pandas as pd
text = pd.read_csv("/home/system/Documents/Heena/NLP/modi.txt", sep = " ", header = None)
Error Code:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 62 fields in line 7, saw 67

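Because the data itself contains space characters, the CSV parser treats every space as a column boundary, so each word ends up in its own column and lines with different word counts have different numbers of fields. To fix this, separate the fields with a character that does not occur in the data, then set sep to that character. Example: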
test.csv
data1;data2;data3
My dear countrymen;12;test data1
I convey my best wishes to all of you on this auspicious occasion of Independence Day.;45;test data2
test.py
import pandas as pd
text = pd.read_csv("test.csv", sep = ";")
You can also look at this answer
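To see why sep=" " breaks on prose, here is a minimal sketch that counts how many fields each line produces when split on a single space. The two sample sentences are made up for illustration and are not taken from modi.txt.

```python
import io

# Hypothetical stand-in for a prose file such as modi.txt.
sample = io.StringIO(
    "My dear countrymen\n"
    "I convey my best wishes to all of you\n"
)

# Splitting on a single space, as sep=" " does, gives one field per word.
field_counts = [len(line.rstrip("\n").split(" ")) for line in sample]
print(field_counts)  # [3, 9]
```

A later line that has more fields than the parser expects is what produces the "Expected N fields in line X, saw M" message.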

Related

Can't open some .csv file using read_csv()

I am writing a Python function to open two .csv files and make changes to the data inside. I am using pandas and pd.read_csv('text') to open the files. The function works for one .csv file. However, when I try it on a different, smaller .csv file, the file cannot even be opened.
This is part of the error I am getting when I try to open the .csv file.
Traceback (most recent call last):
File "C:\Users\...\Downloads\test\test.py", line 3, in <module>
df = pd.read_csv('data2.csv')
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 611, in _read
return parser.read(nrows)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 8836, saw 5
This is the code I am using to access the .csv files.
import pandas as pd
df = pd.read_csv('test.csv')
All the files are in the correct folders and the file paths are all correct. Any help is appreciated, thanks

Pandas can't read in csv file produced by Numbers?

I have a comma delimited csv file that is exported by Mac Numbers, and I am trying to read it into a dataframe, but get an error message:
df = pd.read_csv('game.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
The error message is:
Traceback (most recent call last):
File "/Users/congminmin/nlp/data_collection/crawler/data/game/test.py", line 5, in <module>
df = pd.read_csv('game_app_apple.missing.url.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 426, in pandas._libs.parsers.TextReader.__cinit__
ValueError: invalid literal for int() with base 10: 'ignore'
Is my csv not valid? It was produced by Numbers, though. Even when I remove the dtype parameter, I get the same error. If I remove error_bad_lines='ignore', I get the following error instead:
File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
The csv exported by Numbers is comma delimited. I want to read it into a dataframe and write it out tab delimited, but I hit the problem above.
Additional detail: the original data is in Chinese, and the 'rating' column in the code above is actually '评分' in the real data, shown below:
I had to take a screenshot, since Stack Overflow flagged the pasted text as spam:

Manipulate SQL dataframe with python script in Power BI

I'd like to execute a simple Python script in Power BI on a SQL dataframe.
But the error seems to indicate that the SQL table has been read as a CSV file, and I don't know why the script treats the dataframe as a CSV file instead of the SQL dataframe it is.
The python script is :
import pandas as pd
dataset['COD-MARQ'] = dataset['COD-MARQ'].str.strip()
Any ideas on how I should proceed?
Thanks
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1194, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1814, in pandas._libs.parsers._try_int64
MemoryError: Unable to allocate 64.0 KiB for an array with shape (8192,) and data type int64
Details:
DataSourceKind=Python
DataSourcePath=Python
Message=Ρŷтнőŋ şсŗĩрţ εггǿŗ.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.Tex...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException
I'm not positive it's the problem, but it looks to me like the dataset is referring to the previous step rather than the original source, which means it's no longer in a SQL dataframe format. You probably want to either import the original source using python or else modify your script to treat the dataset not as a SQL dataframe but in whatever format the Query Editor passes to the python script (which I think is a pandas dataframe).
On a separate note, in this particular case, it seems unnecessary to use python for a simple transformation that can just as easily be done natively in M.

problem in reading products CSV file with pandas python

I have a products CSV file and I am trying to read it with pandas in Python, but I get the error below.
My code:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 7549, saw 8
This is the link to the file.
One more thing: when I deleted most of the rows and kept just 4, the file was read without error.
By default, pandas assumes your csv is separated by commas (','). If the file uses a different separator, pass it to the read_csv call:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv', sep=';')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
File appears to be separated by ;. Try:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv',sep=";")
The approach suggested by Ransaka works. Quoting it:
The file uses ; as its delimiter, so pass the extra argument sep=';'. Try df = pd.read_csv('YOUR_CSV_PATH', sep=';').
If the script runs from the same folder as the file, you can also pass just the file name instead of the full path.
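If it is not obvious which separator a file uses, one option is to let the standard library's csv.Sniffer guess it before calling read_csv. The sketch below uses a made-up three-line sample rather than the real final.csv, and the candidate delimiter list is an assumption. Sniffer is a heuristic and can raise csv.Error when it cannot decide.

```python
import csv

# Hypothetical sample; in practice, read the first few KB of the real file.
sample = "name;price;stock\nWidget;9.99;4\nGadget;19.50;12\n"

# Restrict the guess to a few plausible delimiters (an assumed list).
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
print(repr(dialect.delimiter))
```

The detected character can then be passed on as sep=dialect.delimiter in the pd.read_csv call.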

Pandas throws ParserError on one computer but not on another

Here's the code I have, which works perfectly fine on my friend's computer:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv("report.csv")
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
Here's the error I receive on mine:
Traceback (most recent call last):
File "./agent_calls_report.py", line 10, in <module>
df = pd.read_csv("report.csv")
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 34 fields in line 3, saw 35
Any idea why this would work on one computer and not another? Edit: I've confirmed that we are using the same versions of both Python (3.7.1) and Pandas, the only difference is that he has a Mac while I'm on Linux.
I believe this is a problem with encoding.
Try this:
import pandas as pd
df = pd.read_csv("report.csv",encoding='cp1252')
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
There are other encoding options as well; for example, you can try utf-8 instead of cp1252.
Here is a list of encodings used.
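To try several encodings in turn rather than by hand, a rough sketch follows. The helper name first_working_encoding and the candidate list are hypothetical, and the sample bytes are made up. A successful decode does not prove the encoding is the right one, only that the bytes are valid in it.

```python
def first_working_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate that decodes raw without error, else None."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None

# Hypothetical bytes: "café" saved as cp1252 is not valid UTF-8.
raw = "café".encode("cp1252")
print(first_working_encoding(raw))
```

For a real file, raw could come from open("report.csv", "rb").read(), and the returned name can be passed as the encoding argument of pd.read_csv.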
