The prepare method of the pyLDAvis library crashes inside its pandas calls.
Here is the code:
import json
import numpy as np
import pyLDAvis

def load_R_model(filename):
    # Load the LDA model that was exported from R as JSON
    with open(filename, 'r') as j:
        data_input = json.load(j)
    data = {'topic_term_dists': data_input['phi'],
            'doc_topic_dists': data_input['theta'],
            'doc_lengths': data_input['doc.length'],
            'vocab': data_input['vocab'],
            'term_frequency': data_input['term.frequency']}
    return data

movies_model_data = load_R_model('movie_reviews_input.json')
print('Topic-Term shape: %s' % str(np.array(movies_model_data['topic_term_dists']).shape))
print('Doc-Topic shape: %s' % str(np.array(movies_model_data['doc_topic_dists']).shape))
movies_vis_data = pyLDAvis.prepare(np.array(movies_model_data['topic_term_dists']),
                                   np.array(movies_model_data['doc_topic_dists']),
                                   np.array(movies_model_data['doc_lengths']),
                                   np.array(movies_model_data['vocab']),
                                   np.array(movies_model_data['term_frequency']))
Error:
... line 283, in prepare
topic_proportion = (topic_freq / topic_freq.sum()).sort_values(ascending=False)
...
AttributeError: 'Series' object has no attribute 'sort_values'
Why does pandas report that a Series has no sort_values attribute, even though I updated to the most recent version?
According to the documentation (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html), sort_values is new in version 0.17.0.
The pandas that your script actually imports is therefore older than that, so please update it.
How to check the pandas version:
import pandas as pd
pd.__version__
How to update pandas:
using conda: conda update pandas
using pip: pip install pandas -U
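To confirm that the interpreter running the script really sees a new enough pandas, you can test for the method directly. This is only a minimal sketch: the topic_freq values are made up for illustration and simply mirror the expression from the traceback.

```python
import pandas as pd

# sort_values exists on Series from pandas 0.17.0 onward
has_sort_values = hasattr(pd.Series, "sort_values")
print(pd.__version__, has_sort_values)

# Made-up topic frequencies, only to exercise the expression from the traceback
topic_freq = pd.Series([3.0, 1.0, 6.0], index=["t1", "t2", "t3"])
topic_proportion = (topic_freq / topic_freq.sum()).sort_values(ascending=False)
print(topic_proportion)
```

If the printed version is older than you expect, the update probably went into a different environment than the one that runs the script.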
I got the same error recently. In my case it was because pandas.DataFrame.sortlevel() has been deprecated since pandas version 0.20.0. Using DataFrame.sort_index() instead solved my problem.
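The latest version of pandas has .sort_values(). It is a method of Series and DataFrame objects, not a function of the pandas module itself:
import pandas as pd
s = pd.Series([3, 1, 2])
s.sort_values()
can be used.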
I had a similar error. sortlevel has been deprecated since version 0.20.0 (sort_values itself is not deprecated). Use DataFrame.sort_index() instead.
pandas has removed the sort method from Series and DataFrame objects; I ran into this on version 0.23.4. Older versions still contain it. Newer versions recommend the sort_index and sort_values methods instead.
Use
sort_values()
or
sort_index()
For example:
df.sort_values(by='col1', ascending=False)
or
df.sort_values(by='col1', ascending=True)
where col1 is the column with the values you want to sort by.
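A small sketch of the difference between the two calls, using a made-up DataFrame (the column names and values are only for illustration):

```python
import pandas as pd

# Made-up data with a deliberately unordered index
df = pd.DataFrame({"col1": [2, 3, 1], "col2": ["b", "c", "a"]}, index=[10, 30, 20])

# Sort rows by the values in col1, largest first
by_value_desc = df.sort_values(by="col1", ascending=False)

# Sort rows by the index labels instead of by a column
by_index = df.sort_index()

print(by_value_desc)
print(by_index)
```

Both calls return a new DataFrame and leave df unchanged unless you pass inplace=True.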
I have the line below in my data pipeline code. It takes a JSON array and normalizes it using pandas.json_normalize:
df = pd.json_normalize(reviews, sep='_')
Now, when reviews is null or None, it has suddenly started failing. What should be done here?
I tried writing out all the data that reviews receives in a for loop, and from that I understood that the failure occurs only when reviews is null.
Which version of pandas are you using?
If you get the error AttributeError: module 'pandas' has no attribute 'json_normalize' after adding from pandas import json_normalize, it may be due to the version you are using.
json_normalize is only exposed at the top level of the pandas package from version 1.0 onward. On those versions, use from pandas import json_normalize. On older versions, import it from pandas.io.json instead; that import path is deprecated in pandas 1.0 and later.
On an older version you can try something like:
from pandas.io.json import json_normalize
data = { "xy": ["1","2","3"] }
json = json_normalize(data)
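For the None case in the question, one possible approach is to substitute an empty list before normalizing. This is a minimal sketch that assumes pandas 1.0 or later (for pd.json_normalize) and that an empty DataFrame is an acceptable result for missing reviews; normalize_reviews is a hypothetical helper name, and the sample record is made up.

```python
import pandas as pd

def normalize_reviews(reviews):
    """Hypothetical helper: treat None (or an empty value) as 'no reviews'."""
    return pd.json_normalize(reviews or [], sep="_")

# None no longer raises; it yields an empty DataFrame
empty = normalize_reviews(None)

# Made-up record to show that normal input still flattens with the '_' separator
filled = normalize_reviews([{"user": {"name": "a"}, "rating": 5}])

print(empty.empty, sorted(filled.columns))
```

If downstream code expects specific columns, you may prefer to skip the step entirely when reviews is None rather than pass an empty frame along.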
I have noticed that when we set display options for pandas DataFrames, such as pandas.set_option('display.max_rows', 10), they work perfectly for DataFrame objects.
However, they have no effect on Styler objects.
Check the following code:
import pandas as pd
import numpy as np
data = np.zeros((10, 20))
# Full option names; the short forms can be ambiguous in newer pandas versions
pd.set_option('display.max_rows', 4)
pd.set_option('display.max_columns', 10)
df = pd.DataFrame(data)
display(df)
display(df.style)
Which will result in:
[Figure: output of display(df) and display(df.style)]
I do not know how to set these options for a Styler object.
Thanks.
Styler is developing its own options. The current version of pandas, 1.3.0, does not have many, perhaps only styler.render.max_elements.
Some recent pull requests to the GitHub repo are adding these features, but they will be Styler's own versions.
As @attack69 mentioned, Styler has its own options under development.
However, I could mimic set_option('max_rows') and set_option('max_columns') for Styler objects.
Check the following code:
import pandas as pd
import numpy as np

data = np.zeros((10, 20))
mx_rw = 4
mx_cl = 10
pd.set_option('display.max_rows', mx_rw)
pd.set_option('display.max_columns', mx_cl)
df = pd.DataFrame(data)
display(df)
print(type(df))

half_rw = mx_rw // 2
half_cl = mx_cl // 2
# Use object dtype so the '...' placeholders can sit next to the numbers
df = df.astype(object)
df.iloc[half_rw, :] = '...'
df.iloc[:, half_cl] = '...'
# Relabel the placeholder row and column as '...'
df.index = list(range(half_rw)) + ['...'] + list(range(half_rw + 1, data.shape[0]))
df.columns = list(range(half_cl)) + ['...'] + list(range(half_cl + 1, data.shape[1]))
# Drop the middle rows and columns, keeping half_rw / half_cl at each end
df = df.drop(index=list(range(half_rw + 1, data.shape[0] - half_rw)))
df = df.drop(columns=list(range(half_cl + 1, data.shape[1] - half_cl)))
styled = df.style.format(precision=1)
display(styled)
print(type(styled))
With this, both the DataFrame and the Styler object display the same thing.
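If the '...' markers are not needed, a simpler variation is to cut the DataFrame down to its first and last rows and columns before styling it. This is only a sketch of that idea; head_tail is a hypothetical helper name and is not part of pandas.

```python
import numpy as np
import pandas as pd

def head_tail(df, max_rows, max_columns):
    """Hypothetical helper: keep only the first and last rows/columns of df."""
    n_rows, n_cols = df.shape
    row_pos = list(range(n_rows))
    col_pos = list(range(n_cols))
    if n_rows > max_rows:
        half = max_rows // 2
        row_pos = row_pos[:half] + row_pos[n_rows - half:]
    if n_cols > max_columns:
        half = max_columns // 2
        col_pos = col_pos[:half] + col_pos[n_cols - half:]
    # Positional selection, so it works for any index or column labels
    return df.iloc[row_pos, col_pos]

df = pd.DataFrame(np.zeros((10, 20)))
small = head_tail(df, max_rows=4, max_columns=10)
print(small.shape)
```

In a notebook, display(small.style) then renders the reduced table as a Styler, at the cost of not showing where rows and columns were left out.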
I'm running Spyder on Python 3.7 and am new to modin. I want to retrieve the first characters of a string and save them to a new column. When I run the usual code with pandas, it works:
import pandas as pd
data = pd.read_csv('Path/data.csv', dtype=str, encoding='utf-8')
data['FL_x']=data['x'].str[0:3]
But when I run the same with modin, I get the error: TypeError: 'StringMethods' object is not subscriptable
import modin.pandas as pd
#etc.
I can solve the problem by using str.get():
data['FL_x']=data['x'].str.get(0) + data['x'].str.get(1) + data['x'].str.get(2)
But this is very time-consuming for large amounts of data and when many leading characters are needed.
Is there a simple way to retrieve the first z characters of a string in one step with modin, as with pandas?
You could try:
data['FL_x']=data['x'].str.slice(stop=3)
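A quick check that str.slice gives the same result as the subscript form. The sketch below uses plain pandas so that it runs without modin installed; the assumption is that modin.pandas accepts the same str.slice call, as the answer suggests. The sample strings are made up.

```python
import pandas as pd  # assumption: swap in modin.pandas to try it with modin

# Made-up data, including a string shorter than the slice length
data = pd.DataFrame({"x": ["abcdef", "de", "fghij"]})
z = 3

# Take the first z characters of every string in one vectorized call
data["FL_x"] = data["x"].str.slice(stop=z)

# In plain pandas this matches the subscript form from the question
same_as_subscript = data["FL_x"].equals(data["x"].str[0:z])
print(data, same_as_subscript)
```

Strings shorter than z are returned whole, so no length check is needed first.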
When I try to use the from_csv method in Python 3.7, I receive an attribute error:
import pandas as pd
pd.DataFrame.from_csv(adr)
AttributeError: type object 'DataFrame' has no attribute 'from_csv'
How can I solve this problem?
from_csv is deprecated and has been removed from recent pandas versions. There is no further development on it.
It's suggested to use pd.read_csv instead.
import pandas as pd
df = pd.read_csv("your-file-path-here")
Older pandas versions that still had from_csv gave the same advice in a warning:
__main__:1: FutureWarning: from_csv is deprecated. Please use read_csv(...) instead. Note that some of the default arguments are different, so please refer to the documentation for from_csv when changing your function calls
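The differing defaults mentioned in the warning mainly concern the index: from_csv used the first column as the index and tried to parse it as dates, while read_csv does neither unless asked. A minimal sketch with made-up in-memory CSV text:

```python
import io
import pandas as pd

# Made-up CSV content, kept in memory so the example is self-contained
csv_text = "date,value\n2020-01-01,1\n2020-01-02,2\n"

# Plain read_csv: default integer index, 'date' stays an ordinary column
plain = pd.read_csv(io.StringIO(csv_text))

# Closer to the old from_csv defaults: first column as a parsed date index
like_from_csv = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

print(plain)
print(like_from_csv)
```

Which form you want depends on whether the old code relied on the first column being the index.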
import pandas as pd
df = pd.read_csv('<CSV_FILE>')
To read a CSV file into a pandas DataFrame, you need to use the read_csv function. You may try the following code:
import pandas as pd
pd.read_csv('adr.csv')
The following link will give you an idea about how to use pandas to read and write a CSV file.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
Could someone please explain what the following error is about? Here is my code:
import pandas as pd
from pandas import DataFrame
data =pd.read_csv('FILENAME')
b=data.info()
print b
Following is the error:
Traceback (most recent call last):
  File "FILENAME", line 5, in <module>
    b=data.info()
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1443, in info
    counts = self.count()
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 3862, in count
    result = notnull(frame).sum(axis=axis)
  File "/usr/lib/python2.7/dist-packages/pandas/core/common.py", line 276, in notnull
    return -res
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 604, in __neg__
    arr = operator.neg(_values_from_object(self))
TypeError: The numpy boolean negative, the `-` operator, is not supported, use the `~` operator or the logical_not function instead.
All I am trying to do is display a summary of my dataset using the DataFrame.info() function, and I am having trouble making sense of the error, although I suspect it has something to do with the numpy package. What needs to be done here?
The problem is an old version of pandas running against a new version of numpy.
You must update pandas to get your code working.
If you are on conda, you can run conda update pandas to update pandas.
If you are using pip, you can run pip install --upgrade pandas.
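The underlying numpy behaviour can be reproduced without pandas. The old pandas code applied unary minus to a boolean array (the return -res line in the traceback), which newer numpy rejects. A minimal sketch with a made-up mask:

```python
import numpy as np

mask = np.array([True, False, True])

# Newer numpy refuses unary minus on boolean arrays
try:
    negated = -mask
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The supported way to invert a boolean array
inverted = ~mask

print(error_message)
print(inverted)
```

Newer pandas versions no longer negate boolean arrays this way, which is why updating pandas resolves it.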
Also, keep in mind that the pandas documentation says the following about the info function:
This method prints information about a DataFrame including the index dtype and column dtypes, non-null values and memory usage.
data.info() prints the info to the console, so there is no need to assign it to a variable and print it later.
import pandas as pd
from pandas import DataFrame
data =pd.read_csv('FILENAME')
data.info()
This code will work fine for you.
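If you do want the summary as a string, for example to log it, info() accepts a buf argument to write into. A minimal sketch with a made-up DataFrame, written for Python 3:

```python
import io
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({"a": [1, 2, None], "b": ["x", "y", "z"]})

# info() prints to the console and returns None
result = df.info()

# Pass a buffer to capture the same text instead of printing it
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

print(result)
print(summary)
```

This also shows why printing the return value of info() only adds a stray None to the output.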