How to plot in matplotlib with repetitive values in a column - python

I have a CSV file with 8 columns. I want to plot one column against another using matplotlib. One of the columns has repeated values. For each distinct value in that column, I want to take the mean of the corresponding values in the other column.
How can I do it?

This isn't really specific to matplotlib. Pandas has nice support for this kind of data munging. Read your CSV file into a pandas DataFrame:
import pandas as pd
df = pd.read_csv('data.csv')
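Then, assuming the column you want to group by is named 'key' and the column whose values you want to average is named 'value', you can do:
# mean of 'value' for each distinct 'key'; 'key' becomes the index
grouped = df.groupby('key')['value'].mean()
# the index ('key') is used as the x axis
grouped.plot()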
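As a minimal, self-contained sketch of the same idea, the snippet below builds a small DataFrame in memory instead of reading a file. The data and the extra 'other' column are made up for illustration, and plot_means is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Hypothetical data standing in for data.csv; 'key' has repeated values.
df = pd.DataFrame({
    "key": [1, 1, 2, 2, 3],
    "value": [10.0, 20.0, 30.0, 50.0, 5.0],
    "other": ["a", "b", "c", "d", "e"],  # non-numeric column, not aggregated
})

# One mean per distinct key; the keys become the (sorted) index.
means = df.groupby("key")["value"].mean()
print(means)


def plot_means(series):
    """Draw the per-key means as a line with markers and return the Axes."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    ax = series.plot(marker="o")
    ax.set_xlabel(series.index.name)
    ax.set_ylabel(f"mean {series.name}")
    plt.show()
    return ax
```

Calling plot_means(means) draws the averaged values against the distinct keys. Selecting the 'value' column before calling mean() avoids trying to average non-numeric columns.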

Related

Plotting various amount of columns in Excel

I have an Excel file with several columns.
From these columns I want to plot the ones with names like this:
IVOF_1_H, IVOF_1_L, IVOF_2_H, IVOF_2_L, ... Those columns will be on the y axis. The x-axis column will always be the same.
I do not know how many of those columns the file contains; I only know that the number keeps increasing. Is there a way to check how many IVOF columns there are and plot them?
There is an upper limit on the number of IVOF columns, so I don't mind writing my script so that all of them get plotted (if they exist), but then I don't know how to keep the code from crashing if one of those columns is missing.
You can filter your data frame by its column name:
import pandas as pd
df = pd.read_excel('sample.xlsx')
df = df.filter(regex=("IVOF.*"))
#plot the first row
df.iloc[0].plot(kind="bar")
#plot all rows
df.plot(kind="bar")
simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B1','B2'])
df = df.filter(regex=("B.*"))
df.plot(kind="bar")
The result is a grouped bar chart of columns B1 and B2, with one group per row.
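To address the "missing column" part of the question, here is a hedged sketch of two ways to pick the IVOF columns without raising a KeyError. The column name 'x' for the shared x axis, the sample values, and the plot_columns helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame standing in for sample.xlsx; 'x' is an assumed name
# for the shared x-axis column.
df = pd.DataFrame({
    "x": [0, 1, 2],
    "IVOF_1_H": [1.0, 2.0, 3.0],
    "IVOF_1_L": [0.5, 1.5, 2.5],
    "IVOF_2_H": [2.0, 2.5, 3.5],
    "unrelated": [9, 9, 9],
})

# Option 1: discover whichever IVOF columns exist in the file.
ivof_cols = df.filter(regex=r"^IVOF_\d+_[HL]$").columns.tolist()

# Option 2: start from a fixed wish list and keep only the names present,
# so a missing column (here IVOF_2_L) is skipped instead of raising KeyError.
wanted = ["IVOF_1_H", "IVOF_1_L", "IVOF_2_H", "IVOF_2_L"]
present = [col for col in wanted if col in df.columns]


def plot_columns(frame, x, y_cols):
    """Plot each column in y_cols against column x as a line."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    ax = frame.plot(x=x, y=y_cols)
    plt.show()
    return ax
```

Calling plot_columns(df, "x", ivof_cols) plots every IVOF column that was found. Unlike filtering the whole frame, this keeps the x-axis column available for plotting.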

Get Box Plot From Redundant Rows

I have a dataframe like below:
import pandas as pd
import numpy as np
df = pd.DataFrame({'id': [12124,12124,5687,5687,7892],
'A': [np.nan,np.nan,3.05,3.05,np.nan],'B':[1.05,1.05,np.nan,np.nan,np.nan],'C':[np.nan,np.nan,np.nan,np.nan,np.nan],'D':[np.nan,np.nan,np.nan,np.nan,7.09]})
I want to get box plot of columns A, B, C, and D, where the redundant row values in each column needs to be counted once only. How do I accomplish that?
pandas can only work with rectangular data: every column of a DataFrame must have the same length, and so must every row. If null values are to be counted only once, the columns would no longer have equal lengths, which conflicts with how pandas works. My suggestion is to transform the DataFrame into a list.
The detailed code for transforming the DataFrame into a list
Then you could plot the box plot from the list data and the index column.
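One possible way to do the transformation described above is sketched below. It is an assumption about what the answer intended, not the answer's own code: repeated rows are dropped first, then each column is reduced to a plain list of its non-null values. The draw_boxplot helper is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [12124, 12124, 5687, 5687, 7892],
    "A": [np.nan, np.nan, 3.05, 3.05, np.nan],
    "B": [1.05, 1.05, np.nan, np.nan, np.nan],
    "C": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "D": [np.nan, np.nan, np.nan, np.nan, 7.09],
})

# Drop fully repeated rows so each redundant row is counted once.
unique_rows = df.drop_duplicates()

# One list of non-null values per column; the lists may differ in length.
values = {col: unique_rows[col].dropna().tolist() for col in ["A", "B", "C", "D"]}

# Skip columns that have nothing to draw (column C is entirely null).
non_empty = {col: vals for col, vals in values.items() if vals}


def draw_boxplot(data):
    """Draw one box per key of `data` (a dict of name -> list of numbers)."""
    import matplotlib.pyplot as plt  # only needed when actually drawing

    fig, ax = plt.subplots()
    ax.boxplot(list(data.values()))
    ax.set_xticks(range(1, len(data) + 1))
    ax.set_xticklabels(list(data.keys()))
    plt.show()
    return ax
```

Calling draw_boxplot(non_empty) then draws a box for each of A, B, and D. With the sample data each column ends up with a single value, so the boxes collapse to a line.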

Collapsing values of a Pandas column based on Non-NA value of other column

I have data like this in a CSV file, which I am importing into a pandas DataFrame.
I want to collapse the values of the Type column by concatenating its strings into one sentence and keeping it in the first row, next to the date value, while keeping all other rows and values the same.
As shown below.
You can try ffill + transform
df1=df.copy()
df1[['Number', 'Date']]=df1[['Number', 'Date']].ffill()
df1.Type=df1.Type.fillna('')
s=df1.groupby(['Number', 'Date']).Type.transform(' '.join)
df.loc[df.Date.notnull(),'Type']=s
df.loc[df.Date.isnull(),'Type']=''
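Since the original data is not shown, the following sketch runs the same ffill + transform approach on a made-up frame. The values in Number, Date, and Type are hypothetical; the only assumption taken from the question is that Number and Date are filled in on the first row of each block and empty on the rows below it.

```python
import numpy as np
import pandas as pd

# Hypothetical input: Number/Date appear only on the first row of each block.
df = pd.DataFrame({
    "Number": [1, np.nan, np.nan, 2, np.nan],
    "Date": ["2020-01-01", np.nan, np.nan, "2020-01-02", np.nan],
    "Type": ["a", "b", "c", "d", "e"],
})

# Work on a copy so the original blanks in Number/Date are preserved.
filled = df.copy()
filled[["Number", "Date"]] = filled[["Number", "Date"]].ffill()
filled["Type"] = filled["Type"].fillna("")

# Every row receives the space-joined Type strings of its block.
joined = filled.groupby(["Number", "Date"])["Type"].transform(lambda s: " ".join(s))

# Keep the joined sentence on the first row of each block, blank the rest.
first_rows = df["Date"].notnull()
df.loc[first_rows, "Type"] = joined[first_rows]
df.loc[~first_rows, "Type"] = ""
print(df)
```

The forward fill is what gives every row a group key; without it, the rows with missing Number and Date would be excluded from the groupby.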

Python: Create dataframe with 'uneven' column entries

I am trying to create a dataframe where the column lengths are not equal. How can I do this?
I was trying to use groupby. But I think this will not be the right way.
import pandas as pd
data = {'filename':['file1','file1'], 'variables':['a','b']}
df = pd.DataFrame(data)
grouped = df.groupby('filename')
print(grouped.get_group('file1'))
Above is my sample code. The output of which is:
What can I do to just have one entry of 'file1' under 'filename'?
Eventually I need to write this to a csv file.
Thank you
If you only have one entry in a column, the other entries will be NaN. So you could just filter out the NaNs by doing something like df = df[df["filename"].notnull()]
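If the goal is only to show 'file1' once in the CSV output, one alternative is to blank out the repeated entries before writing. This is a sketch of a different technique from the NaN filter above, with a made-up third row added for illustration; it writes to an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Sample data from the question, plus a hypothetical 'file2' row.
df = pd.DataFrame({
    "filename": ["file1", "file1", "file2"],
    "variables": ["a", "b", "c"],
})

# Blank every repeat of a filename, keeping only its first occurrence.
out = df.copy()
out.loc[out["filename"].duplicated(), "filename"] = ""

# Write to an in-memory buffer; pass a path to to_csv to write a real file.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

The columns still have equal lengths; the repeated names are simply empty strings, which appear as empty fields in the CSV.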

Python Pandas Replacing column names

I am using pandas and Python to process multiple files in which columns holding the same data have different names.
dataset = pd.read_csv('Test.csv', index_col=0)
cols= dataset.columns
I have the different possible column titles in a list.
AddressCol=['sAddress','address','Adrs', 'cAddress']
Is there a way to normalize all the possible column names to "Address" in pandas so that I can use the script on different files?
Without pandas I would use something like a double for loop over the actual column names and the possible column names, with an if statement to extract the whole array.
You can use the rename DataFrame method:
dataset.rename(columns={typo: 'Address' for typo in AddressCol}, inplace=True)
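A small sketch of how this behaves across files with different spellings is shown below. The two frames and their contents are hypothetical stand-ins for the CSV files. It relies on the fact that rename ignores mapping keys that are not present in the frame by default.

```python
import pandas as pd

AddressCol = ["sAddress", "address", "Adrs", "cAddress"]

# Map every known spelling to the normalized name.
mapping = {name: "Address" for name in AddressCol}

# Hypothetical frames standing in for two differently labelled files.
file_a = pd.DataFrame({"sAddress": ["1 Main St"], "zip": ["12345"]})
file_b = pd.DataFrame({"Adrs": ["2 Oak Ave"], "zip": ["67890"]})

# Spellings that do not occur in a given frame are simply skipped.
normalized = [frame.rename(columns=mapping) for frame in (file_a, file_b)]

for frame in normalized:
    print(list(frame.columns))
```

Returning a new frame from rename, rather than using inplace=True, leaves the original frames untouched, which can make the script easier to debug.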
