When I try to use the from_csv method in Python 3.7, I receive an attribute error:
import pandas as pd
pd.DataFrame.from_csv(adr)
AttributeError: type object 'DataFrame' has no attribute 'from_csv'
How can I solve this problem?
from_csv has been deprecated since pandas 0.21.0 and was removed entirely in pandas 1.0, which is why the attribute no longer exists.
The suggested replacement is pd.read_csv.
import pandas as pd
df = pd.read_csv("your-file-path-here")
On versions that still have from_csv, the Python warning says the same:
__main__:1: FutureWarning: from_csv is deprecated. Please use read_csv(...) instead. Note that some of the default arguments are different, so please refer to the documentation for from_csv when changing your function calls
import pandas as pd
df = pd.read_csv('<CSV_FILE>')
To read a CSV file into a pandas DataFrame, use the read_csv function. You may try the following code:
import pandas as pd
pd.read_csv('adr.csv')
The following link will give you an idea about how to use pandas to read and write a CSV file.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
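The deprecation warning quoted in this thread notes that the default arguments differ. As a minimal sketch, assuming the old from_csv defaults of index_col=0 and parse_dates=True, the call below asks read_csv for the same behaviour. The sample data and the column names (date, value) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file.
csv_text = "date,value\n2020-01-01,1\n2020-01-02,2\n"

# from_csv used the first column as the index and tried to parse it as dates;
# read_csv does neither unless asked.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(df.index.name)
print(type(df.index).__name__)
```

If the first column is an ordinary data column rather than an index, the plain pd.read_csv(path) call from the answers above is probably all that is needed.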
Related
I have the line below in my data pipeline code, which takes a JSON array and normalizes it using pandas.json_normalize:
df = pd.json_normalize(reviews, sep='_')
Now that reviews is sometimes null or None, this has suddenly started failing. What should be done here?
I tried printing all the data that reviews receives in a for loop, and from that I understood that the failure occurs only when reviews is null.
Which version of pandas are you using?
If you get the error AttributeError: module 'pandas' has no attribute 'json_normalize' after inserting from pandas import json_normalize, it may be due to the version you are using.
json_normalize is only available at the top level of the pandas package from version 1.0.0 onwards, so there is no need to downgrade. On newer versions use from pandas import json_normalize (or pd.json_normalize); on older versions import it from pandas.io.json instead.
On an older version you can try something like:
from pandas.io.json import json_normalize
data = {"xy": ["1", "2", "3"]}
df = json_normalize(data)
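For the original question about reviews being None, one option is to guard the call. This is a minimal sketch, assuming pandas 1.0 or later and assuming that an empty DataFrame is an acceptable result when there is nothing to normalize; the helper name normalize_reviews and the sample record are hypothetical.

```python
import pandas as pd


def normalize_reviews(reviews, sep="_"):
    """Return a flattened DataFrame, or an empty one when there is no data."""
    if not reviews:  # covers None as well as an empty list
        return pd.DataFrame()
    return pd.json_normalize(reviews, sep=sep)


print(normalize_reviews(None).empty)
print(list(normalize_reviews([{"user": {"id": 1}, "stars": 5}]).columns))
```

Whether to return an empty frame, skip the record, or raise a clearer error depends on what the rest of the pipeline expects.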
I use the modin library for multiprocessing.
While the library is great for faster processing, it fails at merge and I would like to revert to default pandas in between the code.
I understand that, per PEP 8 (E402), imports should be declared once at the top of the code; however, my case needs otherwise.
import pandas as pd
import modin.pandas as mpd
import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
df = mpd.read_csv()
do stuff
Then I would like to revert to default pandas within the same code.
But how would I do the following in pandas? There does not seem to be a clear way to switch between pd and mpd in the lines below, and unfortunately modin seems to take precedence over pandas.
df = df.loc[:, df.columns.intersection(['col1', 'col2'])]
df = df.drop_duplicates()
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
Is it possible?
If yes, how?
You can simply do the following:
import modin.pandas as mpd
import pandas as pd
This way you have both modin and the original pandas imported, and you can switch between them as needed.
Many have posted answers; however, in this particular case, as pointed out by @Nin17 and in this comment from the Modin GitHub, you can convert from Modin to pandas for single-core processing of operations such as df.merge as follows:
import pandas as pd
import modin.pandas as mpd
import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
df_modin = mpd.read_csv() #reading dataframe into Modin for parallel processing
df_pandas = df_modin._to_pandas() #converting Modin Dataframe into pandas for single core processing
If you would then like to convert the dataframe back to a Modin dataframe for parallel processing:
df_modin = mpd.DataFrame(df_pandas)
You can try the pandarallel package instead of modin. It is based on a similar concept: https://pypi.org/project/pandarallel/#description
Pandarallel benchmarks: https://libraries.io/pypi/pandarallel
As @Nin17 said in a comment on the question, this comment from the Modin GitHub describes how to convert a Modin dataframe to pandas. Once you have a pandas dataframe, you can call any pandas method on it. This other comment from the same issue describes how to convert the pandas dataframe back to a Modin dataframe.
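Putting the answers together, the three operations from the question can be run on plain pandas after conversion. This is only a sketch: it relies on the _to_pandas() method used in the answer above (a private Modin method, so it may change between versions), and the helper names to_plain_pandas and select_dedupe_sort are hypothetical. It assumes both col1 and col2 are present in the input.

```python
import pandas as pd


def to_plain_pandas(df):
    """Return a pandas DataFrame, converting from Modin when needed."""
    # _to_pandas() is the Modin method used in the answer above; a plain
    # pandas DataFrame does not have it and is returned unchanged.
    convert = getattr(df, "_to_pandas", None)
    return convert() if callable(convert) else df


def select_dedupe_sort(df, cols=("col1", "col2")):
    """Apply the three operations from the question on plain pandas."""
    pdf = to_plain_pandas(df)
    pdf = pdf.loc[:, pdf.columns.intersection(list(cols))]
    pdf = pdf.drop_duplicates()
    return pdf.sort_values(list(cols), ascending=[True] * len(cols))


# A plain pandas frame is used here so the sketch runs without Modin installed.
sample = pd.DataFrame({"col1": [2, 1, 2], "col2": [1, 3, 1], "other": [0, 0, 0]})
result = select_dedupe_sort(sample)
print(result)
```

The result can be passed to mpd.DataFrame(...) afterwards, as shown in the answer above, if the rest of the pipeline should continue in Modin.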
I have noticed that when we set some display options for pandas DataFrames, such as pandas.set_option('max_rows', 10), it works perfectly for DataFrame objects.
However, it has no effect on Styler objects.
Check the following code:
import pandas as pd
import numpy as np
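data = np.zeros((10, 20))
# Use the full option names; the short form 'max_rows' is ambiguous in newer pandas versions
pd.set_option('display.max_rows', 4)
pd.set_option('display.max_columns', 10)
df = pd.DataFrame(data)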
display(df)
display(df.style)
The DataFrame output is truncated, but the Styler output is displayed in full.
I do not know how to set these options for a Styler object.
Thanks.
Styler is developing its own options. The current version of pandas, 1.3.0, does not have many; perhaps only styler.render.max_elements.
Some recent pull requests to the GitHub repo are adding these features, but they will be Styler's own versions.
As @attack69 mentioned, Styler has its own options under development.
However, I could mimic set_option('max_rows') and set_option('max_columns') for Styler objects.
Check the following code:
import pandas as pd
import numpy as np
data = np.zeros((10, 20))
mx_rw = 4
mx_cl = 10
pd.set_option('display.max_rows', mx_rw)
pd.set_option('display.max_columns', mx_cl)
df = pd.DataFrame(data)
display(df)
print(type(df))
# Work on an object-dtype copy so the '...' placeholder can be stored
df = df.astype(object)
half_rw = mx_rw // 2
half_cl = mx_cl // 2
# Overwrite the middle row and column with the '...' placeholder
df.loc[half_rw] = '...'
df.loc[:, half_cl] = '...'
# Relabel the placeholder row and column as '...'
temp = list(range(half_rw))
temp.append('...')
temp.extend(range(half_rw + 1, data.shape[0]))
df.index = temp
temp = list(range(half_cl))
temp.append('...')
temp.extend(range(half_cl + 1, data.shape[1]))
df.columns = temp
# Drop the hidden rows and columns between the placeholder and the tail
df = df.drop(list(range(half_rw + 1, data.shape[0] - half_rw)), axis=0)
df = df.drop(list(range(half_cl + 1, data.shape[1] - half_cl)), axis=1)
df = df.style.format(precision=1)
display(df)
print(type(df))
With this, both the DataFrame and the Styler object display the same thing.
I am trying to manipulate a really big dataset in Python using pandas. The code I am using is the following:
import numpy as np
import pandas as pd
from pandas import DataFrame
from pandas import Series
pd.set_option('display.max_columns', None)
df = pd.read_csv('Medicare.txt', 'r', sep='\t', na_values=['.'])
print (len(df))
df.head(10)
The error I am receiving is the following
TypeError: parser_f() got multiple values for argument 'sep'
Can someone tell me what I am doing wrong?
Thank you
The second positional argument to read_csv is sep. You are passing 'r' there as well as an explicit sep keyword argument, so sep receives two values. Remove the 'r': read_csv is not open() and does not take a file mode.
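A corrected call might look like the sketch below. The tab-separated sample and its column names (provider, payment) are made up so that the snippet runs without the Medicare.txt file from the question.

```python
import io

import pandas as pd

# Hypothetical tab-separated content; '.' marks a missing value.
tsv_text = "provider\tpayment\nA\t10.5\nB\t.\n"

# Pass the separator once, by keyword, and drop the stray 'r' argument.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", na_values=["."])

print(len(df))
print(df["payment"].isna().tolist())
```

With the real file, the same call would be pd.read_csv('Medicare.txt', sep='\t', na_values=['.']).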
The pyLDAvis library's prepare method crashes inside a pandas call.
Here is the code:
import json
import numpy as np
import pyLDAvis

def load_R_model(filename):
    # Load the model exported from R and map its fields to pyLDAvis argument names
    with open(filename, 'r') as j:
        data_input = json.load(j)
    data = {'topic_term_dists': data_input['phi'],
            'doc_topic_dists': data_input['theta'],
            'doc_lengths': data_input['doc.length'],
            'vocab': data_input['vocab'],
            'term_frequency': data_input['term.frequency']}
    return data

movies_model_data = load_R_model('movie_reviews_input.json')
print('Topic-Term shape: %s' % str(np.array(movies_model_data['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(movies_model_data['doc_topic_dists']).shape))
movies_vis_data = pyLDAvis.prepare(np.array(movies_model_data['topic_term_dists']),
                                   np.array(movies_model_data['doc_topic_dists']),
                                   np.array(movies_model_data['doc_lengths']),
                                   np.array(movies_model_data['vocab']),
                                   np.array(movies_model_data['term_frequency']))
Error:
... line 283, in prepare
topic_proportion = (topic_freq / topic_freq.sum()).sort_values(ascending=False)
...
AttributeError: 'Series' object has no attribute 'sort_values'
Why does pandas have no sort_values attribute even though I updated to the most recent version?
As per the documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
'sort_values' is new in version 0.17.0.
So, please update your pandas version.
How to check pandas version:
import pandas as pd
pd.__version__
How to update pandas:
using conda: conda update pandas
using pip: pip install pandas -U
I got the same error recently. It is because pandas.DataFrame.sortlevel() has been deprecated since pandas version 0.20.0. Use DataFrame.sort_index() instead. This solved my problem.
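The latest version of pandas has .sort_values(). It is a method of Series and DataFrame objects, not a function on the pandas module, so it can be used like this:
import pandas as pd
pd.Series([3, 1, 2]).sort_values()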
I had a similar error. 'sortlevel' has been deprecated since version 0.20.0 (sort_values itself is not deprecated). Use DataFrame.sort_index() instead.
The pandas package has removed the old sort method; it is not available in version 0.23.4, for example, although older versions of the Series and DataFrame objects still contain it. Newer versions recommend the sort_index and sort_values functions instead.
use
sort_values()
OR
sort_index()
df.sort_values(by='col1', ascending=False)
or
df.sort_values(by='col1', ascending=True)
col1 is the column with the values you want to sort by.
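To make the difference between the two methods concrete, here is a small sketch on a made-up DataFrame (the data and index labels are for illustration only): sort_values orders rows by a column's values, while sort_index orders them by the index labels.

```python
import pandas as pd

df = pd.DataFrame({"col1": [3, 1, 2]}, index=["c", "a", "b"])

# Order rows by the values in col1, largest first.
by_value_desc = df.sort_values(by="col1", ascending=False)

# Order rows by their index labels instead.
by_index = df.sort_index()

print(by_value_desc["col1"].tolist())
print(by_index.index.tolist())
```

Both methods return a new object by default, so the original df is left unchanged.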