import sklearn
Maybe I'm not understanding something fundamental here, and I just don't know what that may be. How should I go about debugging this?
messages_tfidf = tfidf_transformer.transform(messages_bow)
print messages_tfidf
That part works fine, as intended. But I run into trouble when I test my understanding of .head()
print messages_tfidf.head()
This outputs the error:
AttributeError Traceback (most recent call last)
1 messages_tfidf = tfidf_transformer.transform(messages_bow)
2 print messages_tfidf
----> 3 print messages_tfidf.head()
AttributeError: head not found
Can someone help me understand my logical gap here?
head() is a method of a pandas DataFrame.
You can do something like this:
import pandas as pd
dframe = pd.DataFrame(messages_tfidf)
dframe.head()
scikit-learn works internally with NumPy/SciPy and returns NumPy arrays or SciPy sparse matrices; neither of those has a head method.
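Here, messages_tfidf is most likely a SciPy sparse matrix (that is what transform typically returns), so a hedged variant of the conversion above is to densify it first. Note this can use a lot of memory for a large corpus:
import pandas as pd

# messages_tfidf is assumed to be a scipy.sparse matrix; toarray() makes it dense
dframe = pd.DataFrame(messages_tfidf.toarray())
print(dframe.head())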
I am trying to build a plot using ggplot(aes(x="order_date", y="total_price"), data=data_)
and this is the code:
# Core Python Data Analysis
from numpy.core.defchararray import index
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Plotting
from plotnine import (
ggplot, aes,
geom_col, geom_line, geom_smooth,
facet_wrap,
scale_y_continuous, scale_x_datetime,
labs,
theme, theme_minimal, theme_matplotlib,
expand_limits,
element_text
)
sales_by_months = df[[ 'order_date', 'total_price' ]] \
.set_index('order_date') \
.resample(rule='MS') \
.aggregate(np.sum) \
.reset_index()
data_ = sales_by_months
ggplot(aes(x="order_date", y="total_price"), data=data_)
I installed all the libraries prior to running this code. I get this error:
TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_28504\2472174790.py in <module>
----> 1 ggplot(aes(x="order_date", y="total_price"), data=data_)
TypeError: __init__() got multiple values for argument 'data'
Can someone please help me solve it, or show me why I am getting this bug?
Windows 11, Jupyter Notebook, Python 3
The first argument you're passing to ggplot is bound to data by default (because data comes first in the signature and you didn't provide a keyword), but then you pass another data argument later by keyword.
That is why the error happens.
Specify the keyword for the first argument, which is mapping.
Another tip, not strictly necessary but good for readability: put the data argument first and mapping second.
So you would have: ggplot(data=data_, mapping=aes(x="order_date", y="total_price"))
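A minimal sketch of the corrected call, assuming data_ is the monthly sales frame built above and that a simple line chart is what you're after:
from plotnine import ggplot, aes, geom_line

# Passing both arguments by keyword avoids the positional clash entirely
plot = ggplot(data=data_, mapping=aes(x="order_date", y="total_price")) + geom_line()
plot  # in a Jupyter cell, evaluating the object renders the chart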
import glob
from os.path import join
import yt
from yt.config import ytcfg
path = ytcfg.get("yt", "test_data_dir")
from mpl_toolkits.mplot3d import Axes3D
my_fns = glob.glob(join(path, "Orbit", "puredef_hdf5_chk_000000"))
my_fns.sort()
fields = ["particle_velocity_x", "particle_velocity_y", "particle_velocity_z"]
ds = yt.load(my_fns[:])
dd = ds.all_data()
indices = dd["particle_index"].astype("int")
print (indices)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-27-1bae40a7b7ba> in <module>
1 ds = yt.load(my_fns[:])
----> 2 dd = ds.all_data()
3 indices = dd["particle_index"].astype("int")
4 print (indices)
AttributeError: 'DatasetSeries' object has no attribute 'all_data'
I have looked at other posts on here, but many of them deal with different aspects of this error that involve lens or other statements.
I had exactly the same error recently, with very similar code. First of all, one mistake I made was giving the code symbolic links to the real data files, while it should work directly with the data.
Another issue was a problem with my installation of the yt library, version 3.6.1. I had installed it using pip, but it wasn't working well, so I uninstalled it and used the "all-in-one" script they provide on their homepage.
Fixing these two things together completely solved the problem.
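If you want to check the first point, here is a small sketch (reusing path and the glob pattern from the question) that reports whether each matched file is a symbolic link rather than a real data file:
import os
import glob
from os.path import join

# Flag any matched path that is a symlink instead of the actual data file
for fn in glob.glob(join(path, "Orbit", "puredef_hdf5_chk_000000")):
    if os.path.islink(fn):
        print(fn, "is a symlink ->", os.path.realpath(fn))
    else:
        print(fn, "is a regular file")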
Firstly, apologies if this is a silly question; this is my first time using Plotly. I am trying to make a sunburst diagram from my 'actor' dataframe, but I get an attribute error when I attempt to do so:
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-32-2e5f13ef3c16> in <module>
----> 1 px.data.actor
AttributeError: module 'plotly.express.data' has no attribute 'actor'
I have the following packages imported at the top:
import plotly.graph_objects as go
import plotly.io as pio
import plotly.express as px
Can anyone see where I'm going wrong? Thanks in advance!
It seems you're assuming that px.data.actor would somehow make your dataframe actor available to Plotly. I can understand why, since px.data makes some built-in datasets available to you, like px.data.carshare():
centroid_lat centroid_lon car_hours peak_hour
0 45.471549 -73.588684 1772.750000 2
1 45.543865 -73.562456 986.333333 23
2 45.487640 -73.642767 354.750000 20
3 45.522870 -73.595677 560.166667 23
4 45.453971 -73.738946 2836.666667 19
[...]
244 45.547171 -73.556258 951.416667 3
245 45.546482 -73.574939 795.416667 2
246 45.495523 -73.627725 425.750000 8
247 45.521199 -73.581789 1044.833333 17
248 45.532564 -73.567535 694.916667 5
To inspect all the datasets available to you in the same manner, just run dir(px.data) to get:
['absolute_import',
'carshare',
'election',
'election_geojson',
'gapminder',
'iris',
'tips',
'wind']
But since actor is already available to you (because you've presumably made it yourself), the line px.data.actor is not necessary at all.
P.S.
Running px.data.carshare() returns a pandas dataframe. To keep working with this dataframe, it's best to assign it to a variable, like this: df_cs = px.data.carshare()
The error looks self-explanatory. I'm not really sure why accessing px.data.actor is needed in the code. px.data will load some datasets that are part of the library; some of these include iris, tips, wind, and so on. Since you already have the dataframe, this call is unnecessary.
Here is an exhaustive list from the code.
Simply remove the line and it should work.
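As a sketch of what the sunburst call itself might look like once that line is removed, using your own dataframe directly (the column names below are placeholders, since the real actor columns aren't shown in the question):
import pandas as pd
import plotly.express as px

# Hypothetical stand-in for the real 'actor' dataframe
actor = pd.DataFrame({
    "genre": ["Drama", "Drama", "Comedy"],
    "name": ["Actor A", "Actor B", "Actor C"],
    "films": [12, 7, 20],
})

# Build the sunburst directly from your own dataframe
fig = px.sunburst(actor, path=["genre", "name"], values="films")
fig.show()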
I'm not looking for a solution here, as I found a workaround; mostly I'd just like to understand why my original approach didn't work given that the workaround did.
I have a dataframe of 2803 rows with the default numeric key. I want to replace that with the values in column 0, namely TKR.
So I use f.set_index('TKR') and get
f.set_index('TKR')
Traceback (most recent call last):
File "<ipython-input-4-39232ca70c3d>", line 1, in <module>
f.set_index('TKR')
TypeError: 'str' object is not callable
So I think maybe there's some noise in my TKR column and rather than scrolling through 2803 rows I try f.head().set_index('TKR')
When that works I try f.head(100).set_index('TKR') which also works. I continue with parameters of 500, 1000, and 1500 all of which work. So do 2800, 2801, 2802 and 2803. Finally I settle on
f.head(len(f)).set_index('TKR')
which works and will handle a different size dataframe next month. I would just like to understand why this works and the original, simpler, and (I thought) by the book method doesn't.
I'm using Python 3.6 (64 bit) and Pandas 0.18.1 on a Windows 10 machine
You might have accidentally assigned a value to pd.DataFrame.set_index.
Example of this mistake: f.set_index = 'intended_col_name'
As a result, for the rest of your code .set_index is a str, which is not callable, resulting in this error.
Try restarting your notebook, removing the offending code, and replacing it with f.set_index('TKR').
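A minimal sketch (with toy data) of how that shadowing reproduces the error, and why recreating the dataframe or restarting the kernel fixes it:
import pandas as pd

# Toy dataframe standing in for f
f = pd.DataFrame({"TKR": ["AAPL", "MSFT"], "price": [1.0, 2.0]})

f.set_index = 'TKR'      # accidental assignment: the method is now shadowed by a string
# f.set_index('TKR')     # would raise TypeError: 'str' object is not callable

# Rebuilding the dataframe (or restarting the kernel) restores the real method
f = pd.DataFrame({"TKR": ["AAPL", "MSFT"], "price": [1.0, 2.0]})
f = f.set_index('TKR')   # works as expected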
I know it's been a long while, but I think some people may need the answer in the future.
What you do with f.set_index('TKR') is totally right as long as 'TKR' is a column of DataFrame f.
That is to say, this is a bug you are not supposed to have. It is always because you have redefined some built-in methods or functions of Python in earlier steps (possibly set_index). So the way to fix it is to review your code and find out which part is wrong.
If you are using a Jupyter notebook, restarting it and running only this block can fix the problem.
I believe I have a solution for you.
I ran into the same problem and I was constructing my dataframes from a dictionary, like this:
df_beta = df['Beta']
df_returns = df['Returns']
Then, trying to do df_beta.set_index('Date') would fail. My workaround was:
df_beta = df['Beta'].copy()
df_returns = df['Returns'].copy()
So apparently, if you build your dataframes as a "view" of another existing dataframe, you can't set the index, and it raises the 'Series' object is not callable error. If instead you create an explicit new object by copying the original dataframes, then you can call set_index, which is roughly what you end up doing when you compute the head.
Hope this helps, 2 years later :)
I have the same problem here.
import tushare as ts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
ts.set_token('*************************************')
tspro = ts.pro_api()
gjyx = tspro.daily(ts_code='000516.SZ', start_date='20190101')
# this doesn't work
# out:'str' object is not callable
gjyx = gjyx.set_index('trade_date')
# this works
gjyx = gjyx.head(len(gjyx)).set_index('trade_date')
Jupyter Notebook 6.1.6, Python 3.9.1, Miniconda3, Windows 10
But when I upload this ipynb to Ubuntu on AWS, it works.
I once had this same issue.
This simple line of code kept throwing the TypeError: 'Series' object is not callable error again and again.
df = df.set_index('Date')
I had to shut down my kernel and restart the Jupyter notebook to fix it.
I'm trying to generate random.gauss numbers, but I get an error message. Here is my code:
import sys,os
import numpy as np
from random import gauss
previous_value1=1018.163072765074389
previous_value2=0.004264112033664
alea_var_n=random.gauss(1,2)
alea_var_tau=random.gauss(1,2)
new_var_n= previous_value1*(1.0+alea_var_n)
new_var_tau=previous_value2*(1.0+alea_var_tau)
print 'new_var_n',new_var_n
print 'new_var_tau',new_var_tau
I got this error:
Traceback (most recent call last):
File "lolo.py", line 15, in <module>
alea_var_n=random.gauss(1,2)
AttributeError: 'builtin_function_or_method' object has no attribute 'gauss'
Does someone know what's wrong? I'm a newbie with Python. Or is it a NumPy version problem?
For a faster option, see Benjamin Bannier's solution (which I gave a +1). The code you posted will not work, for the following reason: your import statement
from random import gauss
adds gauss to your namespace but not random. You need to do this instead:
alea_var_n = gauss(1, 2)
The error in your post, however, is not the error you should get when you run the code that you have posted above. Instead, you will get the following error:
NameError: name 'random' is not defined
Are you sure you have posted the code that generated that error? Or have you somehow included the wrong error in your post?
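In other words, either of these two standard-library variants works:
# Either import the module and qualify the call ...
import random
alea_var_n = random.gauss(1, 2)

# ... or import just the function and call it unqualified
from random import gauss
alea_var_n = gauss(1, 2)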
Justin Barber shows you an immediate solution for your problem.
Since you are using NumPy, you could, however, use its generators as well, since they appear to be significantly faster (by a factor of about 5 to 7 on my machine), e.g.
alea_var_n = np.random.normal(1, 2)
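A sketch of the posted snippet rewritten with that NumPy call (and Python 3 print functions):
import numpy as np

previous_value1 = 1018.163072765074389
previous_value2 = 0.004264112033664

# Draws from a normal distribution with mean 1 and standard deviation 2,
# mirroring random.gauss(1, 2)
alea_var_n = np.random.normal(1, 2)
alea_var_tau = np.random.normal(1, 2)

new_var_n = previous_value1 * (1.0 + alea_var_n)
new_var_tau = previous_value2 * (1.0 + alea_var_tau)

print('new_var_n', new_var_n)
print('new_var_tau', new_var_tau)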