Reading text file using pandas using python

Reading text file using pandas using python - python

I am very new to Python. I am trying to read my text file using python Data Science library Pandas. But I get an error of Unicode which I don't understand.If you could help me then it would be very beneficial to me. I am uploading my code here:
import pandas as pd
text = pd.read_csv("/home/system/Documents/Heena/NLP/modi.txt", sep = " ", header = None)
Error Code:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/home/system/anaconda3/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 62 fields in line 7, saw 67

Because the data inside a space character, CVS perceives this as a different column. As a solution to this, separate the data with a different character. Then make the sep value this character. Example;
test.csv
data1;data2;data3
My dear countrymen;12;test data1
I convey my best wishes to all of you on this auspicious occasion of Independence Day.;45;test data2
test.py
import pandas as pd
text = pd.read_csv("test.csv", sep = ";")
You can also look at this answer

Related

Can't open some .csv file using read_csv()

I am writing a Python function to open two .csv files and make changes to the data inside. I am using pandas and pd.read_csv('text') to open the files. Everything works well and the function works for one .csv file. However, when I try it on a different smaller .csv file the file cannot even open.
This is part of the error I am getting when I try to open the .csv file.
Traceback (most recent call last):
File "C:\Users\...\Downloads\test\test.py", line 3, in <module>
df = pd.read_csv('data2.csv')
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\util\_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 611, in _read
return parser.read(nrows)
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\readers.py", line 1778, in read
) = self._engine.read( # type: ignore[attr-defined]
File "C:\Users\...\AppData\Roaming\Python\Python311\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 8836, saw 5
This is the code I am using to access the .csv files.
import pandas as pd
df = pd.read_csv('test.csv')
All the files are in the correct folders and the file paths are all correct. Any help is appreciated, thanks

Pandas can't read in csv file produced by Numbers?

I have a comma delimited csv file that is exported by Mac Numbers, and I am trying to read it into a dataframe, but get an error message:
df = pd.read_csv('game.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
The error message is:
Traceback (most recent call last):
File "/Users/congminmin/nlp/data_collection/crawler/data/game/test.py", line 5, in <module>
df = pd.read_csv('game_app_apple.missing.url.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 426, in pandas._libs.parsers.TextReader.__cinit__
ValueError: invalid literal for int() with base 10: 'ignore'
Is my csv not valid? But it is produced by Numbers. Even if I removed the dtype parameter, it got the same issue. If i removed the error_bad_lines='ignore', I got the following error:
File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
The csv exported by Numbers is comma delimited, and I want to read into a dataframe and output as tab delimited, but got the problem above.
Add data: The original data is Chinese and the 'rating' in code above is actually '评分' translation in actual data below:
I had to take a screenshot, since it is recognized as spam by stackoverflow:

Manipulate SQL dataframe with python script in Power BI

I'd like to execute a simple python script in Power BI on a SQL dataframe.
But the error seems to indicate like the SQL table has been read as a CSV file and I don't know why the script consider the dataframe as a CSV file instead of an SQL dataframe as it is.
The python script is :
import pandas as pd
dataset['COD-MARQ'] = dataset['COD-MARQ'].str.strip()
Any ideas on how shoud I process ?
thanks
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas\_libs\parsers.pyx", line 1194, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas\_libs\parsers.pyx", line 1814, in pandas._libs.parsers._try_int64
MemoryError: Unable to allocate 64.0 KiB for an array with shape (8192,) and data type int64
Détails :
DataSourceKind=Python
DataSourcePath=Python
Message=Ρŷтнőŋ şсŗĩрţ εггǿŗ.
Traceback (most recent call last):
File "PythonScriptWrapper.PY", line 7, in <module>
df1 = pandas.read_csv('input_df_da064532-6620-4e48-a091-ff580b127759.csv')
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 458, in _read
data = parser.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1186, in read
ret = self._engine.read(nrows)
File "C:\Users\afalieres\AppData\Local\Programs\Python\Python38-32\lib\site-packages\pandas\io\parsers.py", line 2145, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.Tex...
ErrorCode=-2147467259
ExceptionType=Microsoft.PowerBI.Scripting.Python.Exceptions.PythonScriptRuntimeException ```

I'm not positive it's the problem, but it looks to me like the dataset is referring to the previous step rather than the original source, which means it's no longer in a SQL dataframe format. You probably want to either import the original source using python or else modify your script to treat the dataset not as a SQL dataframe but in whatever format the Query Editor passes to the python script (which I think is a pandas dataframe).
On a separate note, in this particular case, it seems unnecessary to use python for a simple transformation that can just as easily be done natively in M.

problem in reading products CSV file with pandas python

I have products CSV file and I am trying to read this file with pandas python but i get this error
my code
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 454, in _read
data = parser.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 1133, in read
ret = self._engine.read(nrows)
File "C:\Users\Compu City\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\parsers.py", line 2037, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
File "pandas\_libs\parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas\_libs\parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 7549, saw 8
this is the link of file
another thing when I deleted most of the rows and remain just 4 rows the file read.

By default pandas assumes your csv is separated by commas ',', you should pass the proper separator to the read_csv call.
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv', sep=';')
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

File appears to be separated by ;. Try:
import pandas as pd
df = pd.read_csv('D:\\work\\amazon\\move_in_links\\final.csv',sep=";")

Follow these steps, this approach suggested by Ransaka works
Blockquote
use delimiter as; so you should pass external argument sep=';. Try using df = pd.read_csv('YOUR_CSV_PATH', sep=';')
Blockquote
And you can just use the file name as it is instead of a path.

Pandas throws ParserError on one computer but not on another

Here's the code I have, which works perfectly fine on my friend's computer:
#!/usr/bin/python
import pandas as pd
df = pd.read_csv("report.csv")
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
Here's the error I receive on mine:
Traceback (most recent call last):
File "./agent_calls_report.py", line 10, in <module>
df = pd.read_csv("report.csv")
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/lib/python3.7/site-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 34 fields in line 3, saw 35
Any idea why this would work on one computer and not another? Edit: I've confirmed that we are using the same versions of both Python (3.7.1) and Pandas, the only difference is that he has a Mac while I'm on Linux.

I believe this is a problem with encoding
try this :
import pandas as pd
df = pd.read_csv("report.csv",encoding='cp1252')
df = df.drop("Agent Name", axis=1)
df.to_csv("agent_report_updated.csv")
There are other encoding options you can try utf-8 instead of cp1252.
Here is a list of encodings used.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Reading text file using pandas using python - python

Related

Can't open some .csv file using read_csv()

Pandas can't read in csv file produced by Numbers?

Manipulate SQL dataframe with python script in Power BI

problem in reading products CSV file with pandas python

Pandas throws ParserError on one computer but not on another

Categories

Resources