Pandas plot sum of occurrence of string - python

Hi, hoping someone can help. I have a data frame where one of the columns contains a list of names. Some of these names are repeated, but not all. I am trying to plot a graph where the x-axis contains the name and the y-axis contains the number of times that name appears in the column.
I have used the following to count the number of times each name appears.
df.groupby('name').name.count()
Then I tried to use the following to plot the graph. However, I get a KeyError message.
df.plot.bar(x='name', y=df.groupby('name').name.count())
Anyone able to tell me what I am doing wrong?
Thanks

I believe you need to plot the Series returned by the count function, using Series.plot.bar:
df.groupby('name').name.count().plot.bar()
Or use value_counts:
df['name'].value_counts().plot.bar()
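A small self-contained sketch of both approaches. The sample names and the `plot_counts` helper are assumptions made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's frame.
df = pd.DataFrame({"name": ["Ann", "Bob", "Ann", "Cid", "Ann", "Bob"]})

# Both expressions give a Series: names in the index, counts as values.
by_group = df.groupby("name")["name"].count()  # ordered by name
by_value = df["name"].value_counts()           # ordered by count, descending


def plot_counts(counts):
    """Draw one bar per name (hypothetical helper; needs matplotlib)."""
    ax = counts.plot.bar()
    ax.set_xlabel("name")
    ax.set_ylabel("occurrences")
    return ax
```

Calling `plot_counts(by_value)` should draw the chart. As far as I can tell, the original call fails because the `y` argument of `df.plot.bar` expects a column label, not a Series of counts.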

Related

Converting Datetimeindex of a dataframe to week numbers

I am very new to Python and cannot seem to solve this problem on my own. I have a dataset that I have already converted to a pandas DataFrame. It has a DatetimeIndex in the form yyyy-mm-dd-HH-MM-SS with minute-level timestamps. The attached figure shows the already interpolated dataframe.
Now I want to convert the DatetimeIndex to week numbers so that I can plot HVAC Actual, Chiller power, etc. against their week number. The index was already set to the time, but I got an error saying that 'Time' was not recognized in the columns. I tried to set the index again as in the code below and then create a new column using dt.week:
building_interpolated = building_interpolated.set_index('Time')
building_interpolated['Week number'] = building_interpolated['Time'].dt.week
If I am correct, this should create a new column called Week number containing the week number. However, I still get an error saying that ['Time'] is not in the columns (see figure below).
Anyone who can help me?
Regards, nooby Boaz ;)
df.index = df.index.to_series().dt.isocalendar().week
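For illustration, a minimal sketch that keeps the timestamps as the index and adds the week as a separate column instead of overwriting the index. The column name `HVAC Actual` comes from the question; the dates and values are invented. It assumes pandas 1.1 or newer, where `isocalendar()` is available. Once `Time` has been set as the index it is no longer a column, which is why `building_interpolated['Time']` fails.

```python
import pandas as pd

# Invented data: one reading per week, indexed by a DatetimeIndex named "Time".
idx = pd.date_range("2021-01-04", periods=4, freq="7D", name="Time")
df = pd.DataFrame({"HVAC Actual": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# "Time" is the index, not a column, so read the week from df.index.
# isocalendar() returns a DataFrame with year, week and day columns.
df["Week number"] = df.index.isocalendar().week

# Example follow-up: average the readings per ISO week.
weekly = df.groupby("Week number")["HVAC Actual"].mean()
```

Keeping the original index has the advantage that the timestamps are still available for other plots.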

Plotting non-numerical data in python

I'm a beginner in coding and I wrote some code in Python pandas that I didn't fully understand, so I need some clarification.
Let's say this is the data; DeathYear, Age, Gender and Country are all columns in an Excel file.
How to plot a table with non-numeric values in python?
I saw this question and I used this command
df.groupby('Gender')['Gender'].count().plot.pie(autopct='%.2f',figsize=(5,5))
It works and gives me a pie chart of the percentage of each gender,
but the normal pie chart command that I know for numerical data looks like this:
df["Gender"].plot.pie(autopct="%.2f",figsize=(5,5))
My question is: why did we add the .count()?
Is it to transform non-numerical data into numerical data?
And why did we use the groupby and type the column twice, ('Gender')['Gender']?
I'll address the second part of your question first, since it makes more sense to explain it that way.
The reason you use ('Gender')['Gender'] is that the two parts do different things. The first, ('Gender'), is the argument to the groupby function: it says that you want the DataFrame grouped by the 'Gender' column. Note that groupby needs a column or level to group by, or it will not work.
The second, ['Gender'], says to look only at the 'Gender' column of the grouped result. The easiest way to see what it does is to compare the output of df.groupby('Gender').count() with that of df.groupby('Gender')['Gender'].count().
One detail that I omitted in the first part for clarity is that the output of df.groupby('Gender') is not a DataFrame but a DataFrameGroupBy object. The details of this object are not important to your question; the key point is that to get a DataFrame or Series back you need a function that says what to put in each row of the result. The .count() function is one such option (along with many others, such as .mean()). In your case, since you want total counts for a pie chart, .count() does exactly that: it counts the number of times 'Female' and 'Male' appear in the 'Gender' column, and those counts become the entries in the corresponding rows. The result (a Series here, because a single column was selected) can then be used to create a pie chart. So you are correct that .count() transforms the non-numeric 'Female' and 'Male' entries into numeric values that correspond to how often those entries appeared in the initial DataFrame.
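To make the comparison concrete, here is a small sketch. The column names are taken from the question, but the rows are invented for illustration.

```python
import pandas as pd

# Invented rows; only the column names come from the question.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Female"],
    "Age": [70, 65, 81, 59],
    "Country": ["NL", "NL", "DE", "FR"],
})

# Without the second ['Gender']: a DataFrame with one count per remaining column.
all_cols = df.groupby("Gender").count()

# With it: a single Series of counts, indexed by gender.
one_col = df.groupby("Gender")["Gender"].count()

# value_counts() gives the same numbers in one step.
same = df["Gender"].value_counts()
```

Here `one_col` holds 3 for 'Female' and 1 for 'Male', and that numeric Series is what `.plot.pie(autopct='%.2f')` turns into slices.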

Pandas Groupby Count Partial Strings

I am trying to get a count of how many rows within a column contain a partial string, based on an imported dataframe. In the sample data below, I want to group by Trans_type and then get a count of how many rows contain a value.
So I would expect to see:
First, is this possible generically, without passing a list to get each type's expected brand? If not, how could I pass, say, Car a list of .str.contains['Audi','BMW']?
Thanks for any help!
Try this one:
df.groupby([df["Trans_type"], df["Brand"].str.extract("([a-zA-Z]+)", expand=False)]).count()
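Since the sample data is not shown, here is a sketch on invented rows. The brand strings, the `brand_key` name and the `wanted` list are all assumptions for illustration. It covers both variants asked about: a generic grouping on the leading letters of Brand, and an explicit list of partial strings.

```python
import pandas as pd

# Invented rows; only the column names come from the question.
df = pd.DataFrame({
    "Trans_type": ["Car", "Car", "Car", "Bike", "Bike"],
    "Brand": ["Audi A4", "BMW 320", "Audi Q5", "Trek 520", "Trek FX"],
})

# Generic variant: use the leading run of letters as the grouping key.
brand_key = df["Brand"].str.extract(r"^([A-Za-z]+)", expand=False).rename("brand_key")
counts = df.groupby(["Trans_type", brand_key]).size()

# Explicit variant: count Car rows whose Brand contains any listed string.
wanted = ["Audi", "BMW"]
mask = df["Brand"].str.contains("|".join(wanted))
car_hits = int((mask & (df["Trans_type"] == "Car")).sum())
```

`size()` is used here rather than `count()` so that the result is one number per group instead of one per column.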

Switching rows and columns in pyplot

I'm new to Python and after a lot of tinkering, have managed to clean up some .csv data.
I now have a bunch of countries as rows and a bunch of dates as columns, and am trying to create a chart showing a line for each country's value over time.
The problem is that when I enter df.plot() it results in a chart with each date as a line.
I have melted the data such that the first column is country, second is date, and third is value, but all I get is a single blue block growing over time (not multiple lines). How can I fix this?
You can use the transpose function in pandas:
Or, instead of df.plot, you can use plot(column, row).
As was mentioned in the comments, it is always better to provide an example (see ImportanceOfBeingErnest's comment).
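A minimal sketch of the transpose idea, on invented numbers (the country names, dates and values are assumptions). `DataFrame.plot()` draws one line per column, so the countries need to be the columns. The second half shows one way to get the same layout back from the melted form.

```python
import pandas as pd

# Invented data: countries as rows, dates as columns, as in the question.
wide = pd.DataFrame(
    {"2020-01": [1, 4], "2020-02": [2, 5], "2020-03": [3, 6]},
    index=["France", "Spain"],
)

# Transpose so that dates become rows and each country becomes a column.
by_date = wide.T  # by_date.plot() would then draw one line per country

# Starting from the melted (country, date, value) form, pivot back instead.
long_df = (
    wide.rename_axis("country")
    .reset_index()
    .melt(id_vars="country", var_name="date", value_name="value")
)
pivoted = long_df.pivot(index="date", columns="country", values="value")
```

Plotting the melted frame directly gives a single series because all values sit in one column; pivoting first restores one column per country.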

How can I concatenate symbol ("%") to integer value in python?

I am facing an issue here. I have a DataFrame column whose values I need to write as value + %, e.g. 10%, 15%, etc.
I can write the values to the Excel sheet as strings, but when I plot the graph the values are treated as strings, so the chart is not generated.
I need the values to appear with the % symbol in that column, and I also need to plot the graph while writing to the Excel sheet.
Any solution for this?
Thanks in advance.
To write the value to Excel, you can use
str(value) + '%'
When plotting the graph, slice off the last character (%) and convert the rest to a number. float() is safer than eval() here, since eval() executes arbitrary code:
float(value[:-1])
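A sketch of the same idea applied to a whole column. The column names `share` and `share_label` and the values are hypothetical. It keeps the numeric column for plotting and adds a separate string column for display.

```python
import pandas as pd

# Hypothetical numeric column.
df = pd.DataFrame({"share": [10, 15, 22.5]})

# Display column: format each number and append the % sign.
df["share_label"] = df["share"].map(lambda v: f"{v:g}%")

# Going the other way: strip the % sign and convert back to numbers.
recovered = df["share_label"].str.rstrip("%").astype(float)
```

With both columns in the frame, the label column can be written to the sheet while the numeric column drives the chart.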
