How to use the explode function in Python (pandas)

I am working on a dataset named "data" and want to use the explode function, but the code below raises an AttributeError: 'DataFrame' object has no attribute 'explode'.
artists_exploded = data[['artists_upd','id']].explode('artists_upd')

Not sure whether you are trying to explode data on 2 columns (artists & ID), but this is how you do it for just artists_upd. Perhaps you want to use a different value for str.split(), depending on your data.
artists_exploded = data.assign(artists_upd = data.artists_upd.str.split(",")).explode("artists_upd")
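On the AttributeError itself: DataFrame.explode was added in pandas 0.25.0, so an older installation would produce exactly this message, and upgrading pandas is the likely fix. The sketch below assumes a recent pandas and uses a small made-up frame (the sample rows and the comma separator are assumptions, not taken from the question) to show the split-then-explode pattern.

```python
import pandas as pd

# DataFrame.explode exists only in pandas 0.25.0 and later.
print(pd.__version__)

# Hypothetical sample data; the real "data" frame is not shown in the question.
data = pd.DataFrame({"id": [1, 2], "artists_upd": ["A,B", "C"]})

artists_exploded = (
    data[["artists_upd", "id"]]
    # Turn each comma-separated string into a list so explode has something to expand.
    .assign(artists_upd=lambda d: d["artists_upd"].str.split(","))
    # One output row per list element; "id" is repeated for each element.
    .explode("artists_upd")
)
print(artists_exploded)
```

The index values are repeated for rows that came from the same original row; chain .reset_index(drop=True) if a fresh index is needed.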

Related

Pandas merge not working due to a wrong type

I'm trying to merge two dataframes using
grouped_data = pd.merge(grouped_data, df['Pattern'].str[7:11]
,how='left',left_on='Calc_DRILLING_Holes',
right_on='Calc_DRILLING_Holes')
But I get an error saying can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
What could be the issue here. The original dataframe that I'm trying to merge to was created from a much larger dataset with the following code:
import pandas as pd
raw_data = pd.read_csv(r"C:\Users\cherp2\Desktop\test.csv")
data_drill = raw_data.query('Activity =="DRILL"')
grouped_data = data_drill.groupby([data_drill[
'PeriodStartDate'].str[:10], 'Blast'])[
'Calc_DRILLING_Holes'].sum().reset_index(
).sort_values('PeriodStartDate')
What do I need to change here to make it a regular normal dataframe?
If I try to convert either of them to a dataframe using .to_frame() I get an error saying that 'DataFrame' object has no attribute 'to_frame'
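I'm so confused as to what kind of data type it is.
Both objects in a call to pd.merge need to be DataFrame objects. Here grouped_data is already a DataFrame (reset_index() returns one, which is why .to_frame() fails on it); the Series is the second argument, df['Pattern'].str[7:11]. Try promoting that to a DataFrame, for example with .to_frame() or pd.DataFrame(...), and make sure the result also contains the Calc_DRILLING_Holes column named in right_on.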
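A minimal sketch of that idea, assuming df contains both a Pattern column and the Calc_DRILLING_Holes key (the question does not show df, so the sample frames below are hypothetical):

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
grouped_data = pd.DataFrame({"Blast": ["A", "B"], "Calc_DRILLING_Holes": [10, 20]})
df = pd.DataFrame({
    "Pattern": ["prefix_0001_x", "prefix_0002_y"],
    "Calc_DRILLING_Holes": [10, 20],
})

# Build a DataFrame that holds the merge key plus the sliced Pattern values,
# instead of passing the bare Series to pd.merge.
right = df[["Calc_DRILLING_Holes"]].assign(Pattern=df["Pattern"].str[7:11])

merged = pd.merge(grouped_data, right, how="left", on="Calc_DRILLING_Holes")
print(merged)
```

Because the key has the same name on both sides, on= can replace the left_on/right_on pair.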

getting 'DataFrameGroupBy' object is not callable in jupyter

I have this csv file from https://www.data.gov.au/dataset/airport-traffic-data/resource/f0fbdc3d-1a82-4671-956f-7fee3bf9d7f2
I'm trying to aggregate with
airportdata = Airports.groupby(['Year_Ended_December'])('Dom_Pax_in','Dom_Pax_Out')
airportdata.sum()
However, I keep getting 'DataFrameGroupBy' object is not callable
and it won't print the data I want.
How to fix the issue?
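The error comes from the parentheses after groupby(...): they try to call the DataFrameGroupBy object. Columns are selected with square brackets instead. One way is to execute the sum aggregation and then extract the columns: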
airportdata_agg = Airports.groupby(['Year_Ended_December']).sum()[['Dom_Pax_in','Dom_Pax_Out']]
Alternatively, if you'd like to ensure you're not aggregating columns you are not going to use:
airportdata_agg = Airports[['Dom_Pax_in','Dom_Pax_Out', 'Year_Ended_December']].groupby(['Year_Ended_December']).sum()
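Selecting the columns directly on the grouped object with square brackets also works and avoids aggregating unused columns. A small sketch with made-up numbers (the sample rows and the AIRPORT column are assumptions, not the real data.gov.au file):

```python
import pandas as pd

# Hypothetical sample rows standing in for the airport traffic CSV.
Airports = pd.DataFrame({
    "AIRPORT": ["SYD", "MEL", "SYD"],
    "Year_Ended_December": [2015, 2015, 2016],
    "Dom_Pax_in": [100, 50, 70],
    "Dom_Pax_Out": [90, 40, 60],
})

# Square brackets (not parentheses) pick the columns from the GroupBy object.
airportdata_agg = (
    Airports.groupby("Year_Ended_December")[["Dom_Pax_in", "Dom_Pax_Out"]].sum()
)
print(airportdata_agg)
```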

What's the easiest way to replace categorical columns of data with codes in Pandas?

I have a table of data in .dta format which I have read into python using Pandas. The data is mostly in the categorical data type and I want to replace the columns with numerical data that can be used with machine learning, such as boolean (1/0) or codes. The trouble is that I can't directly replace the data because it won't let me change the categories, unless I add them.
I have tried using pd.get_dummies(), but it keeps returning an error:
TypeError: 'columns' is an invalid keyword argument for this function
print(pd.get_dummies(feature).head(), columns=['smkevr', 'cignow', 'dnnow',
'dnever', 'complst'])
Is there a simple way to replace this data with numerical codes based on the value (for example 'Not applicable' = 0)?
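The TypeError comes from print(), not from pandas: the closing parenthesis is misplaced, so columns= is passed to print(). I would pass it to pd.get_dummies() instead:
df_dumm = pd.get_dummies(feature, columns=['smkevr', 'cignow', 'dnnow',
                                           'dnever', 'complst'])
print(df_dumm.head())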
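For the second part of the question (integer codes rather than 1/0 indicator columns), the .cat.codes accessor on a categorical column is one option. The sketch below uses a single hypothetical column with made-up categories; the real .dta categories and their order are not shown in the question, and the code assigned to each label follows the category order.

```python
import pandas as pd

# Hypothetical categorical column; real category labels and order may differ.
feature = pd.DataFrame({
    "smkevr": pd.Categorical(
        ["Not applicable", "Yes", "No", "Yes"],
        categories=["Not applicable", "No", "Yes"],
    )
})

# Integer codes follow the category order: 'Not applicable' -> 0, 'No' -> 1, 'Yes' -> 2.
codes = pd.DataFrame({col: feature[col].cat.codes for col in feature.columns})
print(codes)

# Indicator columns, one per category, for comparison.
dummies = pd.get_dummies(feature, columns=["smkevr"])
print(dummies.head())
```

Missing values get the code -1, so they may need separate handling before the data goes into a model.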

Get result of value_count() to excel from Pandas

I have a data frame "df" with a column called "column1". By running the below code:
df.column1.value_counts()
I get output containing the values in column1 and their frequencies. I want this result in an Excel file. When I try to do this by running the code below:
df.column1.value_counts().to_excel("result.xlsx",index=None)
I get the below error:
AttributeError: 'Series' object has no attribute 'to_excel'
How can I accomplish the above task?
You are passing index=None, but you need the index: it holds the values being counted. Convert the Series to a DataFrame first:
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
If you go through the documentation, in older pandas versions Series has no to_excel method; it applies only to DataFrame.
So you can either save the counts in another frame and create the Excel file from that:
a = pd.DataFrame(df.column1.value_counts())
a.to_excel("result.xlsx")
Or look at Merlin's comment, which I think is the best way:
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
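If the spreadsheet should have explicit headers for both columns, the counts can be reshaped into a two-column frame before writing. A sketch with made-up values (the sample data and the "frequency" header are assumptions); the final to_excel call is left commented out because it needs an Excel engine such as openpyxl to be installed.

```python
import pandas as pd

# Hypothetical sample data for column1.
df = pd.DataFrame({"column1": ["a", "b", "a"]})

counts = (
    df["column1"]
    .value_counts()
    .rename_axis("column1")          # name the index after the counted column
    .reset_index(name="frequency")   # move values and counts into regular columns
)
print(counts)

# Writing requires an Excel engine such as openpyxl:
# counts.to_excel("result.xlsx", index=False)
```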

Deleting first two rows of a dataframe after doing groupby

I am trying to delete the first two rows from my dataframe df and am using the answer suggested in this post. However, I get the error AttributeError: Cannot access callable attribute 'ix' of 'DataFrameGroupBy' objects, try using the 'apply' method and don't know how to do this with the apply method. I've shown the relevant lines of code below:
df = df.groupby('months_to_maturity')
df = df.ix[2:]
Edit: Sorry, when I said I want to delete the first two rows, I should have said I want to delete the first two rows associated with each months_to_maturity value.
Thank You
That is what tail(-2) will do. However, groupby.tail does not take a negative value, so it needs a tweak:
df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
This gives you the desired dataframe, but its index is now a MultiIndex.
If you just want to drop the rows in df, just use drop like this:
df.drop(df.groupby('months_to_maturity').head(2).index)
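A small runnable sketch of the drop approach on made-up data, next to an alternative that uses groupby(...).cumcount() to build a boolean mask. The sample values are hypothetical, and the drop variant relies on df having a unique index.

```python
import pandas as pd

# Hypothetical sample data: three rows for maturity 1, four rows for maturity 2.
df = pd.DataFrame({
    "months_to_maturity": [1, 1, 1, 2, 2, 2, 2],
    "price": [10, 11, 12, 20, 21, 22, 23],
})

# Drop the first two rows of every group by their index labels.
dropped = df.drop(df.groupby("months_to_maturity").head(2).index)

# Alternative: cumcount() numbers the rows within each group from 0,
# so keeping positions >= 2 removes the first two rows per group.
filtered = df[df.groupby("months_to_maturity").cumcount() >= 2]

print(dropped)
```

Both variants keep the original flat index rather than producing a MultiIndex.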
