I would like to apply a function to each row of a dask dataframe.
Executing the operation with ddf.compute() gives me an error:
AttributeError: 'Series' object has no attribute 'encode'
This is my code:
def polar(data):
    data = scale(sid.polarity_scores(data.tweet)['compound'])
    return data

t_data['sentiment'] = t_data.map_partitions(polar, meta=('sentiment', int))
Using t_data.head() also results in the same error.
I have found the answer. You have to apply the function within each partition:
t_data['sentiment'] = t_data.map_partitions(lambda df: df.apply(polar, axis=1))
You can use the following:
t_data.apply(polar, axis=1)
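In either form, dask may warn that it has to guess the output metadata; passing meta explicitly avoids the guess. A minimal sketch, assuming the value polar returns is a float (adjust the dtype if scale() returns something else):

# Declare the output name and dtype so dask does not have to infer them.
t_data['sentiment'] = t_data.apply(polar, axis=1, meta=('sentiment', 'float64'))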
def split_address(in_address):
    address = in_address
    address_arr = address.split()
    if len(address_arr):
        for i, w in enumerate(address_arr):
            if w.isnumeric():
                break
        locationName = ' '.join(address_arr[:i])
        locationAddress = ' '.join(address_arr[i:-3])
        locationCity = address_arr[-3]
        locationState = address_arr[-2]
        locationZip = address_arr[-1]
    else:
        locationName.append('')
        locationAddress.append('')
        locationCity.append('')
        locationState.append('')
        locationZip.append('')
    return (locationName, locationAddress, locationCity, locationState, locationZip)

new_df = out_df.apply(split_address)
The above code throws an error AttributeError: 'Series' object has no attribute 'split'
I want the function applied to every row of the address column, with the output split into the five columns mentioned above.
It'd be great if you can please help me with this.
Thank you.
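One way to avoid the error is to apply the function to the column that holds the address strings rather than to the whole frame (DataFrame.apply passes each column as a Series, which has no split method), and then expand the returned tuple into five columns. A minimal sketch, assuming the column is named 'address' (a hypothetical name):

import pandas as pd

# Apply the parser to each address string; the result is a Series of 5-tuples.
parsed = out_df['address'].apply(split_address)

# Spread each tuple into its own column, keeping the original index.
out_df[['locationName', 'locationAddress', 'locationCity', 'locationState', 'locationZip']] = pd.DataFrame(parsed.tolist(), index=out_df.index)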
I'm trying to use the function to_csv to export a dataset, but the error "'str' object has no attribute 'columns'" was reported. This is my script:
import pandas as pd
data=pd.read_csv('Documents/Pos/ETLSIM/Dados/ETLSIM.DORES_MG_2019_t.csv', low_memory="false")
data2 = pd.read_csv('Documents/Pos/ETLSIM/ETLSIM.DORES_MG_2018_t.csv', low_memory="false")
df_concat = pd.concat([data,data2], sort = False)
df_concat.to_csv('concatenado.csv')
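One thing worth checking: low_memory expects a boolean, and the string "false" is simply truthy. A minimal sketch of the intended flow with that adjusted (whether it is related to the reported error is not certain; the concat arguments must also be DataFrames, not strings):

import pandas as pd

# Pass low_memory as a boolean rather than the string "false".
data = pd.read_csv('Documents/Pos/ETLSIM/Dados/ETLSIM.DORES_MG_2019_t.csv', low_memory=False)
data2 = pd.read_csv('Documents/Pos/ETLSIM/ETLSIM.DORES_MG_2018_t.csv', low_memory=False)

# Both objects being concatenated are DataFrames here.
df_concat = pd.concat([data, data2], sort=False)
df_concat.to_csv('concatenado.csv')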
Hello everyone
I need to get the two platforms with the most visits for each day, over one full year of data. So:
Group the data by day
Extract the two platforms with most visits for each day
I tried this code:
df.groupby(pd.Grouper(key="Datum", freq="1D")).nlargest(2, 'Visits')
and got that error:
AttributeError: Cannot access callable attribute 'nlargest' of 'DataFrameGroupBy' objects, try using the 'apply' method
Thanks a lot for your help! :)
Why not just use apply, as the error message states:
import pandas as pd
# dataframe example
d = {'Platform': ['location', 'office', 'station'], 'Date': ['01.08.2019', '01.08.2019', '01.08.2019'], 'Visits': [4372, 48176, 2012]}
df = pd.DataFrame(data=d)
df.groupby(pd.Grouper(key="Date")).apply(lambda grp: grp.nlargest(2, 'Visits'))
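Mapped back onto the original frame, the same pattern should work with the daily Grouper, assuming the Datum column has been parsed to datetimes first (pd.Grouper with a freq requires a datetime key):

# Ensure the key is a datetime column, then take the top two platforms per day.
df['Datum'] = pd.to_datetime(df['Datum'])
df.groupby(pd.Grouper(key='Datum', freq='1D')).apply(lambda grp: grp.nlargest(2, 'Visits'))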
I'm trying to add a clickable hyperlink to my dataframe, which I've done successfully, but then when I try to use JoogleChart to create drop downs and make the data more manageable for users, I get this error: AttributeError: 'NoneType' object has no attribute 'columns'
def candidate_url(row):
    return """<a href="hirecentral.corp.indeed.com/candidates?application_id={}&from=candidate_search" target="_blank">{reqtitle}</a>""".format(
        row['application_id'], reqtitle=row.application_id)

final['candidate_link'] = final.apply(candidate_url, axis=1)

h = final.to_html(escape=True)

chart = JoogleChart(h, chart_type='Table', allow_nulls=True)
chart.show()

AttributeError: 'NoneType' object has no attribute 'columns'
I want to convert all the items in the 'Time' column of my pandas dataframe from UTC to Eastern time. However, following the answer in this stackoverflow post, some of the keywords are not known in pandas 0.20.3. Overall, how should I do this task?
tweets_df = pd.read_csv('valid_tweets.csv')
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
error is:
tweets_df['Time'] = tweets_df.to_datetime(tweets_df['Time'])
File "/scratch/sjn/anaconda/lib/python3.6/site-packages/pandas/core/generic.py", line 3081, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'to_datetime'
items from the Time column look like this:
2016-10-20 03:43:11+00:00
Update:
using
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
tweets_df.set_index('Time', drop=False, inplace=True)
tweets_df.index = tweets_df.index.tz_localize('UTC').tz_convert('US/Eastern')
performed no time conversion. Any idea what needs to be fixed?
Update 2:
So the following code does not do an in-place conversion: when I print row['Time'] using iterrows(), it still shows the original values. Do you know how to do the conversion in place?
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
for index, row in tweets_df.iterrows():
    row['Time'].tz_localize('UTC').tz_convert('US/Eastern')
for index, row in tweets_df.iterrows():
    print(row['Time'])
to_datetime is a function defined in pandas, not a method on a DataFrame. Try:
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'])
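Regarding the updates: iterrows() yields copies of each row, so calling tz_localize/tz_convert on row['Time'] inside the loop never writes anything back into the frame. A vectorized sketch that converts the whole column instead (assuming the values carry a +00:00 offset as shown):

# Parse the strings as UTC-aware datetimes, then convert the whole column at once.
tweets_df['Time'] = pd.to_datetime(tweets_df['Time'], utc=True)
tweets_df['Time'] = tweets_df['Time'].dt.tz_convert('US/Eastern')
tweets_df.set_index('Time', drop=False, inplace=True)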