Get result of value_counts() to Excel from Pandas - python

I have a data frame "df" with a column called "column1". By running the below code:
df.column1.value_counts()
I get output containing the values in column1 and their frequencies. I want this result in Excel. When I try to do this by running the code below:
df.column1.value_counts().to_excel("result.xlsx",index=None)
I get the below error:
AttributeError: 'Series' object has no attribute 'to_excel'
How can I accomplish the above task?

You are using index=None, but you need the index; it holds the values themselves.
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")

If you go through the documentation, a Series has no to_excel method; it applies only to DataFrame.
So you can either convert it to another frame and create the Excel file as:
a = df.column1.value_counts().to_frame()
a.to_excel("result.xlsx")
Or look at Merlin's comment; I think it is the best way:
pd.DataFrame(df.column1.value_counts()).to_excel("result.xlsx")
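Putting it together, a minimal sketch (the sample data here is invented for illustration): wrap the Series returned by value_counts() in a DataFrame before exporting.

```python
import pandas as pd

# Hypothetical sample data standing in for the original df
df = pd.DataFrame({"column1": ["a", "b", "a", "c", "a", "b"]})

# value_counts() returns a Series; wrap it in a DataFrame first
counts = pd.DataFrame(df.column1.value_counts())

# Keep the index: it holds the values whose frequencies were counted
# counts.to_excel("result.xlsx")  # requires an Excel writer such as openpyxl
print(counts)
```

Note that the index must be kept, since it carries the distinct column1 values.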


How to merge dataset in Python based on column value

I have a dataframe structured as follows:
"Location","filePath","startLine","endLine","startColumn","endColumn","codeElementType","description", "codeElement","repository","sha1","url","type","description.1"
An example of the dataframe was shown in an image (not reproduced here).
I need to merge the entries that have the same sha1.
Example input and expected output were also shown in images; in that example, the first two lines have the same sha1, so they should be merged.
I tried the following snippet:
agg_functions=["Location","filePath","startLine","endLine", "startColumn","endColumn","codeElementType","description",
"codeElement","repository","sha1","url","type","description.1"]
df_new = df.groupby(df['sha1']).aggregate(agg_functions)
print(df_new)
However, the following exception is always thrown:
raise AttributeError(
AttributeError: 'SeriesGroupBy' object has no attribute 'Location'
How can I fix it?
agg_functions should be references to functions, not column names.
For example:
agg_functions = [np.sum, "mean"]
See DataFrameGroupBy.aggregate.
I can't help with an exact fix. I must confess the text in the images you posted is too small for me to understand what your final result needs to be.
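A hedged sketch of what the aggregation might look like: map each column to an aggregation function (here a dict passed to aggregate) rather than listing column names. The column names come from the question, but the sample values and the choice of aggregations are assumptions.

```python
import pandas as pd

# Toy frame standing in for the original data (values are invented)
df = pd.DataFrame({
    "sha1": ["abc", "abc", "def"],
    "filePath": ["a.java", "b.java", "c.java"],
    "startLine": [1, 5, 9],
})

# Map column names to aggregation functions instead of passing the names alone
agg_functions = {"filePath": ", ".join, "startLine": "min"}
df_new = df.groupby("sha1").aggregate(agg_functions)
print(df_new)
```

Rows sharing a sha1 collapse into one row; how each remaining column is combined depends on the function chosen for it.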

Converting a Pandas Series to Dataframe

I tried to do this:
get_sent_score_neut_df = pandas.Series(get_sent_score_neut).to_frame(name='sentiment-neutral').reset_index().apply(lambda x: float(x))
And when I want to merge/join it with another DataFrame I created the same way the error I get is:
AttributeError: 'Series' object has no attribute '_join_compat'
Is there a way to fix that?
That's the line of code I used to merge/join them:
sentMerge = pandas.DataFrame.join(get_sent_score_pos_df, get_sent_score_neut_df)
Besides: I have tried to rename the index with `.reset_index(name='xyz')`
(assigning column names to a pandas series), which causes my IDE to respond with "unexpected argument".
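One possible approach (a sketch; the variable names follow the question, but the data here is a placeholder for the original sentiment scores): give each Series a column name with to_frame, then call join on the resulting DataFrames rather than on the DataFrame class.

```python
import pandas as pd

# Placeholder data standing in for the original sentiment scores
get_sent_score_pos = [0.1, 0.4, 0.2]
get_sent_score_neut = [0.6, 0.3, 0.5]

# to_frame turns each Series into a one-column DataFrame
pos_df = pd.Series(get_sent_score_pos).to_frame(name="sentiment-positive")
neut_df = pd.Series(get_sent_score_neut).to_frame(name="sentiment-neutral")

# join works between two DataFrames aligned on their index
sentMerge = pos_df.join(neut_df)
print(sentMerge)
```

Calling join as an instance method on a DataFrame avoids the unbound-class-method call in the question.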

Pandas merge not working due to a wrong type

I'm trying to merge two dataframes using
grouped_data = pd.merge(grouped_data, df['Pattern'].str[7:11],
                        how='left', left_on='Calc_DRILLING_Holes',
                        right_on='Calc_DRILLING_Holes')
But I get an error saying can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>
What could be the issue here? The original dataframe that I'm trying to merge to was created from a much larger dataset with the following code:
import pandas as pd
raw_data = pd.read_csv(r"C:\Users\cherp2\Desktop\test.csv")
data_drill = raw_data.query('Activity =="DRILL"')
grouped_data = data_drill.groupby(
    [data_drill['PeriodStartDate'].str[:10], 'Blast'])[
    'Calc_DRILLING_Holes'].sum().reset_index().sort_values('PeriodStartDate')
What do I need to change here to make it a regular normal dataframe?
If I try to convert either of them to a dataframe using .to_frame() I get an error saying that 'DataFrame' object has no attribute 'to_frame'
I'm so confused at to what kind of data type it is.
Both objects in a call to pd.merge need to be DataFrame objects. Is grouped_data a Series? If so, try promoting it to a DataFrame by passing pd.DataFrame(grouped_data) instead of just grouped_data.
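As a sketch of that fix (column names follow the question; the data is invented, and the index-based merge is an assumption about how the rows line up): promote the Series slice to a DataFrame before merging.

```python
import pandas as pd

# Invented stand-ins for grouped_data and the Pattern column
grouped_data = pd.DataFrame({"Calc_DRILLING_Holes": [10, 20]})
pattern = pd.Series(["XXXXXXX0010YYY", "XXXXXXX0020YYY"], name="Pattern")

# A Series can't be passed to pd.merge; promote the slice to a DataFrame first
pattern_df = pattern.str[7:11].to_frame()

# Merge on the index here, since these toy frames share row positions
merged = grouped_data.merge(pattern_df, left_index=True, right_index=True, how="left")
print(merged)
```

The same promotion works with pd.DataFrame(...) as suggested above; to_frame() is just the Series-native spelling.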

getting 'DataFrameGroupBy' object is not callable in jupyter

I have this csv file from https://www.data.gov.au/dataset/airport-traffic-data/resource/f0fbdc3d-1a82-4671-956f-7fee3bf9d7f2
I'm trying to aggregate with
airportdata = Airports.groupby(['Year_Ended_December'])('Dom_Pax_in','Dom_Pax_Out')
airportdata.sum()
However, I keep getting 'DataFrameGroupBy' object is not callable
and it won't print the data I want.
How to fix the issue?
You need to execute the sum aggregation before extracting the columns:
airportdata_agg = Airports.groupby(['Year_Ended_December']).sum()[['Dom_Pax_in','Dom_Pax_Out']]
Alternatively, if you'd like to ensure you're not aggregating columns you are not going to use:
airportdata_agg = Airports[['Dom_Pax_in','Dom_Pax_Out', 'Year_Ended_December']].groupby(['Year_Ended_December']).sum()
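A minimal sketch of the difference (column names are from the question; the values are invented): columns are selected from a groupby with square brackets, whereas parentheses try to call the GroupBy object, which raises the error above.

```python
import pandas as pd

# Toy data using the column names from the question (values invented)
Airports = pd.DataFrame({
    "Year_Ended_December": [2016, 2016, 2017],
    "Dom_Pax_in": [100, 200, 150],
    "Dom_Pax_Out": [90, 210, 160],
})

# Select columns with [...], then aggregate; (...) would *call* the groupby object
airportdata = Airports.groupby("Year_Ended_December")[["Dom_Pax_in", "Dom_Pax_Out"]].sum()
print(airportdata)
```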

Deleting first two rows of a dataframe after doing groupby

I am trying to delete the first two rows from my dataframe df and am using the answer suggested in this post. However, I get the error AttributeError: Cannot access callable attribute 'ix' of 'DataFrameGroupBy' objects, try using the 'apply' method and don't know how to do this with the apply method. I've shown the relevant lines of code below:
df = df.groupby('months_to_maturity')
df = df.ix[2:]
Edit: Sorry, when I mean I want to delete the first two rows, I should have said I want to delete the first two rows associated with each months_to_maturity value.
Thank You
That is what tail(-2) will do. However, groupby.tail does not take a negative value, so it needs a tweak:
df.groupby('months_to_maturity').apply(lambda x: x.tail(-2))
This will give you the desired dataframe, but its index is now a MultiIndex.
If you just want to drop the rows in df, just use drop like this:
df.drop(df.groupby('months_to_maturity').head(2).index)
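A small demonstration of the drop approach (the frame below is invented; only the months_to_maturity column name comes from the question): head(2) picks the first two rows of each group, and dropping their index labels removes exactly those rows while keeping the original index.

```python
import pandas as pd

# Toy frame: three rows for each months_to_maturity value (data invented)
df = pd.DataFrame({
    "months_to_maturity": [1, 1, 1, 2, 2, 2],
    "price": [10, 11, 12, 20, 21, 22],
})

# head(2).index gives the labels of the first two rows per group; drop removes them
trimmed = df.drop(df.groupby("months_to_maturity").head(2).index)
print(trimmed)
```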
