pandas cannot converge

pandas cannot converge - python

I get an error that a file does not exist while I have the file there in the folder, would you please tell me where I am making a mistake?
pd.DataFrame.from_csv
I am getting an error shown below.
Traceback (most recent call last):
File "main.py", line 194, in <module>
start_path+end_res)
File "/Users/admin/Desktop/script/mergeT.py", line 5, in merge
df_peak = pd.DataFrame.from_csv(peak_score, index_col = False, sep='\t')
File "/Library/Python/2.7/site-packages/pandas/core/frame.py", line 1231, in from_csv
infer_datetime_format=infer_datetime_format)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 388, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 729, in __init__
self._make_engine(self.engine)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 922, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1389, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4175)
File "pandas/parser.pyx", line 667, in pandas.parse**strong text**r.TextReader._setup_parser_source (pandas/parser.c:8440)
IOError: File results\scoring\fed\score_peak.txt does not exist
I have tried to set a path to the exact file
for example

As per documentation of pandas 0.19.1 pandas.DataFrame.from_csv does not support index_col = False. Try to use pandas.read_csv instead (with the same parameters). Also make sure you are using the up to date version of pandas.
See if this works:
import pandas as pd
def merge(peak_score, profile_score, res_file):
df_peak = pd.read_csv(peak_score, index_col = False, sep='\t')
df_profile = pd.read_csv(profile_score, index_col = False, sep='\t')
result = pd.concat([df_peak, df_profile], axis=1)
print result.head()
test = []
for a,b in zip(result['prot_a_p'],result['prot_b_p']):
if a == b:
test.append(1)
else:
test.append(0)
result['test']=test
result = result[result['test']==0]
del result['test']
result = result.fillna(0)
result.to_csv(res_file)
if __name__ == '__main__':
pass
Regarding the path issue when changing from Windows to OS X:
In all flavours of Unix, paths are written with slashes /, while in Windows backslashes \ are used. Since OS X is a descendant of Unix, as other users have correctly pointed out, when you change there from Windows you need to adapt your paths.

Related

pandas read_csv error when loading a plain CSV file

I am having this very weird error with python pandas:
import pandas as pd
df = pd.read_csv('C:\Temp\test.csv', index_col=None, comment='#', sep=',')
The test.csv is a very simple CSV file created in Notepad:
aaa,bbb,date
hhhhh,wws,20220701
Now I get the error:
File "C:\test\untitled0.py", line 10, in <module>
df = pd.read_csv('C:\temp\test.csv', index_col=None, comment='#', sep=',')
File "C:\...\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
File "C:\...\lib\site-packages\pandas\io\parsers\readers.py", line 586, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\...\lib\site-packages\pandas\io\parsers\readers.py", line 482, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\...\lib\site-packages\pandas\io\parsers\readers.py", line 811, in __init__
self._engine = self._make_engine(self.engine)
File "C:\...\lib\site-packages\pandas\io\parsers\readers.py", line 1040, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\...\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 51, in __init__
self._open_handles(src, kwds)
File "C:\...\lib\site-packages\pandas\io\parsers\base_parser.py", line 229, in _open_handles
errors=kwds.get("encoding_errors", "strict"),
File "C:\...\lib\site-packages\pandas\io\common.py", line 707, in get_handle
newline="",
OSError: [Errno 22] Invalid argument: 'C:\temp\test.csv'
I also tried to use Excel to export a CSV file, and get the same error.
Does anyone know what goes wrong?

In a python string, the backslash in '\t' is an escape character which causes those two characters ( \ followed by t) to mean tab. You can get around this using raw strings by prefacing the opening quote with the letter 'r':
r'C:\Temp\test.csv'

pandas read_csv throwing ValueError: Invalid file path or buffer object type: <class 'list'>

I want to read a csv file sent as a command line argument. Thought I could directly use FileType object of argsprase but I'm getting errors.
from argparse import ArgumentParser, FileType
from pandas import read_csv
if __name__ == "__main__":
parser = ArgumentParser()
parser.add_argument("input_file_path", help="Input CSV file", type=FileType('r'), nargs=1)
df = read_csv(parser.parse_args().input_file_path, sep="|")
print(df.to_string())
Pandas read_csv is unable to read FileType object when I execute the program as given below - what is missing?
python csv_splitter.py test.csv
Traceback (most recent call last):
File "csv_splitter.py", line 7, in <module>
df = read_csv(parser.parse_args().input_file_path, sep="|")
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 605, in read_csv
return _read(filepath_or_buffer, kwds)
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 457, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 814, in __init__
self._engine = self._make_engine(self.engine)
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1045, in _make_engine
return mapping[engine](self.f, **self.options) # type: ignore[call-arg]
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1862, in __init__
self._open_handles(src, kwds)
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\parsers.py", line 1357, in _open_handles
self.handles = get_handle(
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 558, in get_handle
ioargs = _get_filepath_or_buffer(
File "C:\Users\kakkrah\AppData\Roaming\Python\Python38\site-packages\pandas\io\common.py", line 371, in _get_filepath_or_buffer
raise ValueError(msg)
ValueError: Invalid file path or buffer object type: <class 'list'>

pd.read_csv cannot read a list of files, only one at a time.
To read multiple files into one dataframe, use pd.concat with a generator:
df = pd.concat(pd.read_csv(p) for p in paths)
Or pd.concat with map:
df = pd.concat(map(pd.read_csv, paths))
In OP's case, even though nargs=1 limits the arg parser to consuming 1 file, it still returns a list of that 1 file object:
print(parser.parse_args().input_file_path)
# [ <_io.TextIOWrapper> ]
So just index the single file:
df = pd.read_csv(parser.parse_args().input_file_path[0])
# ^^^

Declaring path in pycharm not working

I am using Pycharm and I created the project in a folder called collaborative filtering. I have some csv in a folder called ml-latest-small that I also placed in the collaborative filtering folder that has the .py file I am working from.
I am getting the following errors:
Traceback (most recent call last):
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/movies.py", line 32, in <module>
cf = CollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/fastai/column_data.py", line 146, in from_csv
df = pd.read_csv(os.path.join(path,csv))
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 449, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 818, in __init__
self._make_engine(self.engine)
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 1049, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Users/usernamehere/Desktop/Machine Learning/Lesson 5/CollaborativeFiltering/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 1695, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 402, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 718, in pandas._libs.parsers.TextReader._setup_parser_source
FileNotFoundError: File b'/Users/usernamehere/Users/usernamehere/Desktop/Machine Learning/Lesson 5/ratings.csv' does not exist
I am not sure what is wrong with the way I am declaring the path. Here is my code:
import torch
from fastai.learner import *
from fastai.column_data import *
path = '~/Users/usernamehere/Desktop/Machine Learning/Lesson 5'
ratings = pd.read_csv(path+'ratings.csv')
#print(ratings.head())
movies = pd.read_csv(path+'movies.csv')
#print(movies.head())
# Crete a subset for Excel
g = ratings.groupby('userId')['rating'].count()
topUsers = g.sort_values(ascending=False)[:15]
g = ratings.groupby('movieId')['rating'].count()
topMovies = g.sort_values(ascending=False)[:15]
top_r = ratings.join(topUsers, rsuffix='_r', how='inner', on='userId')
top_r = top_r.join(topMovies, rsuffix='_r', how='inner', on='movieId')
# pd.crosstab(top_r.userId, top_r.movieId, top_r.rating, aggfunc=np.sum)
# Collaborative Filtering - High Level
# Get a valisation indexes
val_idxs = get_cv_idxs(len(ratings))
wd = 2e-4
n_factors = 50
cf = CollabFilterDataset.from_csv(path, 'ratings.csv', 'userId', 'movieId', 'rating')
Edit:
Changing the path to path='ml-latest-small/' seemed to work.

Since you are on a *nix-based system, I would recommend you escape your spaces with \. Here's a simple test on Mac to show a scenario with and without escaping:
$ pwd
/tmp
$ mkdir "Machine Learning"
$ cd Machine Learning
-bash: cd: Machine: No such file or directory
$ cd Machine\ Learning
$ pwd
/tmp/Machine Learning

Here, the ~ means $HOME (read here):
which is why you end up with:
/Users//Users/usernamehere/Desktop/Machine Learning/Lesson 5/ratings.csv' which is not a valid path.

Python Decompressing gzip csv in pandas csv reader

The following code works in Python3 but fails in Python2
r = requests.get("http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz", stream=True)
decompressed_file = gzip.GzipFile(fileobj=r.raw)
data = pd.read_csv(decompressed_file, sep=',')
data.columns = ["timestamp", "price" , "volume"] # set df col headers
return data
The error I get in Python2 is the following:
TypeError: 'int' object has no attribute '__getitem__'
The error is on the line where I set data equal to pd.read_csv(...)
Seems to be a pandas error to me
Stacktrace:
Traceback (most recent call last):
File "fetch.py", line 51, in <module>
print(f.get_historical())
File "fetch.py", line 36, in get_historical
data = pd.read_csv(f, sep=',')
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 709, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 449, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 818, in __init__
self._make_engine(self.engine)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1049, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1695, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
File "pandas/_libs/parsers.pyx", line 760, in pandas._libs.parsers.TextReader._get_header
File "pandas/_libs/parsers.pyx", line 965, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2197, in pandas._libs.parsers.raise_parser_error
io.UnsupportedOperation: seek

The issue from the traceback you posted is related to the fact that the Response object's raw attribute is a file-like object that does not support the .seek method that typical file objects support. However, when ingesting the file object with pd.read_csv, pandas (in python2) seems to be making use of the seek method of the provided file object.
You can confirm that the returned response's raw data is not seekable by calling r.raw.seekable(), which should normally return False.
The way to circumvent this issue may be to wrap the returned data into an io.BytesIO object as follows:
import gzip
import io
import pandas as pd
import requests
# file_url = "http://api.bitcoincharts.com/v1/csv/coinbaseUSD.csv.gz"
file_url = "http://api.bitcoincharts.com/v1/csv/aqoinEUR.csv.gz"
r = requests.get(file_url, stream=True)
dfile = gzip.GzipFile(fileobj=io.BytesIO(r.raw.read()))
data = pd.read_csv(dfile, sep=',')
print(data)
0 1 2
0 1314964052 2.60 0.4
1 1316277154 3.75 0.5
2 1316300526 4.00 4.0
3 1316300612 3.80 1.0
4 1316300622 3.75 1.5
As you can see, I used a smaller file from the directory of files available. You can switch this to your desired file.
In any case, io.BytesIO(r.raw.read()) should be seekable, and therefore should help avoid the io.UnsupportedOperation exception you are encountering.
As for the TypeError exception, it is inexistent in this snippet of code.
I hope this helps.

Python Pandas print error in Eclipse's PyDev: unknown encoding: MS874

I am trying to use Pandas library to read csv files, using Eclipse's PyDev.
foo.csv file:
"head1", "head2",
"A", "123"
test.py:
import pandas as pd
data = pd.read_csv('foo.csv');
print data
I ran this and got an error:
Traceback (most recent call last):
File "C:\Users\qqq\studyspace\macd\test3.py", line 4, in <module>
print data
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 666, in __str__
return self.__bytes__()
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 676, in __bytes__
return self.__unicode__().encode(encoding, 'replace')
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 691, in __unicode__
fits_horizontal = self._repr_fits_horizontal_()
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 651, in _repr_fits_horizontal_
d.to_string(buf=buf)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 1488, in to_string
formatter.to_string()
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 314, in to_string
strcols = self._to_str_columns()
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 258, in _to_str_columns
str_index = self._get_formatted_index()
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 472, in _get_formatted_index
fmt_index = [index.format(name=show_index_names, formatter=fmt)]
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 450, in format
return self._format_with_header(header, **kwargs)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 472, in _format_with_header
result = _trim_front(format_array(values, None, justify='left'))
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 1321, in format_array
return fmt_obj.get_result()
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 1448, in get_result
return _make_fixed_width(fmt_values, self.justify)
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 1495, in _make_fixed_width
max_len = np.max([_strlen(x) for x in strings])
File "C:\Python27\lib\site-packages\pandas\core\format.py", line 184, in _strlen
return len(x.decode(encoding))
LookupError: unknown encoding: MS874
I have tried to run this in IPython, and it does not give the error, so I think the problem is with my Eclipse setting. I use Eclipse Juno and I installed Pandas via Python(x,y).
I have tried to solve it blindly like this
import pandas as pd
data = pd.read_csv('foo.csv');
b = True;
while(b):
try:
print data
b = False
except:
print 'foooo'
And it just printed 'foooo' forever.

I have found the solution.
Right click on the project => Properties => Resource => Text file encoding. Choose other => UTF-8.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

pandas cannot converge - python

Related

pandas read_csv error when loading a plain CSV file

pandas read_csv throwing ValueError: Invalid file path or buffer object type: <class 'list'>

Declaring path in pycharm not working

Python Decompressing gzip csv in pandas csv reader

Python Pandas print error in Eclipse's PyDev: unknown encoding: MS874

Categories

Resources