I have this DataFrame.
I want to use "Tiempo" as the x-axis for the columns "Promedio", "1", "2", "3", etc.,
all in the same graph, using seaborn or anything else that can produce a scatterplot with many columns.
I can't figure it out yet, please help.
You have to set the "Tiempo" column as the index with df.set_index("Tiempo"). Note that set_index returns a new DataFrame, so assign the result. If you then use the DataFrame's built-in df.plot method, you will see the time on the x-axis.
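A minimal sketch of that suggestion. The DataFrame in the question is not shown, so the values below are made up; only the column names "Tiempo", "Promedio", "1" and "2" come from the question.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame in the question (the real values are not shown)
df = pd.DataFrame({
    "Tiempo": [0, 1, 2, 3],
    "Promedio": [2.0, 3.0, 4.0, 5.0],
    "1": [1.0, 2.5, 3.5, 5.5],
    "2": [3.0, 3.5, 4.5, 4.5],
})

# set_index returns a new DataFrame, so keep the result
indexed = df.set_index("Tiempo")

# style="o" draws markers only, which gives a scatter-like plot of every column
ax = indexed.plot(style="o")
ax.set_ylabel("value")
```

Every remaining column becomes its own series on the same axes, with "Tiempo" on the x-axis. In a script you would still call plt.show() from matplotlib.pyplot to display the figure.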
I have started using Python for lots of data problems at work, and the datasets are always slightly different. I'm trying to explore more efficient ways of plotting data using the built-in pandas functions, rather than writing out the code for each column individually and editing the formatting to get a nice result.
Background: I'm using Jupyter notebook and looking at histograms where the values are all unique integers.
Problem: I want the xtick labels to align with the centers of the histogram bars when plotting multiple columns of data with the one function e.g. df.hist() to get histograms of all columns at once.
Does anyone know if this is possible?
Or is it recommended to do each graph on its own vs. using the inbuilt function applied to all columns?
I can modify them individually following this post: Matplotlib xticks not lining up with histogram
which gives me what I would like but only for one graph and with some manual processing of the values.
Desired outcome example for one graph:
Basic example of data I have:
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# create list of datapoints
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
# print dataframe.
df
Code that displays the graphs in the problem statement
df.hist(figsize=(5,5))
plt.show()
Code that displays the graph for weight how I would like it to be for all
df.hist(column='weight',bins=[175,185,195,205,215])
plt.xticks([180,190,200,210])
plt.yticks([0,1,2,3,4,5])
plt.xlim([170, 220])
plt.show()
Any tips or help would be much appreciated!
Thanks
I hope this helps. You take the column and count the frequency of each label (value_counts), then call sort_index so that the bars are ordered by label rather than by frequency, and finally draw the bar plot.
import pandas as pd
import matplotlib.pyplot as plt
data = [[170,30,210],
[170,50,200],
[180,50,210],
[165,35,180],
[170,30,190],
[170,70,190],
[170,50,190]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['height', 'width','weight'])
df.weight.value_counts().sort_index().plot(kind = 'bar')
plt.show()
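The same idea can probably be extended to all columns at once, which is what the question asks for. The following is only a sketch: the loop over a row of subplots and the figure size are my own choices, not something from the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [[170, 30, 210],
        [170, 50, 200],
        [180, 50, 210],
        [165, 35, 180],
        [170, 30, 190],
        [170, 70, 190],
        [170, 50, 190]]
df = pd.DataFrame(data, columns=["height", "width", "weight"])

# One subplot per column; each bar is centered on its own tick label
fig, axes = plt.subplots(1, len(df.columns), figsize=(9, 3))
for ax, col in zip(axes, df.columns):
    counts = df[col].value_counts().sort_index()
    counts.plot(kind="bar", ax=ax, title=col)
fig.tight_layout()
```

Keep in mind that a bar plot treats each value as a category, so integers that never occur in the data do not leave a gap on the x-axis the way they would in a true histogram.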
I have a dataframe with a column called "my_row" that has many values. I only want to see the data on a FacetGrid for specific values of "my_row" on the rows. I tried making a subset of my dataframe and visualizing that, but seaborn still somehow "knows" that my original dataframe had more values in the "my_row" column and shows empty plots for the rows that I don't want.
So the following code still gives me a figure with the 2 rows of data that I want, followed by many empty plots.
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
g = sns.FacetGrid(X, row='my_row', col='column')
How can I tell Python to plot just those 2 rows?
I get plots like this with many empty plots:
I cannot reproduce this. The code from the question seems to work fine. Here we have a dataframe with four different values in the my_row column. Then filtering out two of them creates a FacetGrid with only two rows.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
df = pd.DataFrame({"my_row" : np.random.choice(list("1234"), size=40),
"column" : np.random.choice(list("AB"), size=40),
"x" : np.random.rand(40),
"y" : np.random.rand(40)})
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
g = sns.FacetGrid(X, row='my_row', col='column')
g.map(plt.scatter, "x", "y")
plt.show()
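For anyone encountering this problem: the issue is that my_row is a categorical type. A categorical column keeps all of its original categories after filtering, so FacetGrid still creates a row for each of them. To solve this, convert the column to str.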
i.e.
X = df[(df['my_row']=='1') | (df['my_row']=='2')].copy()
X['my_row']=X['my_row'].astype(str)
g = sns.FacetGrid(X, row='my_row', col='column')
This should now work! :)
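If you would rather keep the column categorical, dropping the unused categories should have the same effect. This is an alternative I am suggesting, not part of the answer above, and the small DataFrame is a made-up example.

```python
import pandas as pd

# Made-up data: my_row is categorical with four categories
df = pd.DataFrame({"my_row": pd.Categorical(list("1234") * 2),
                   "x": range(8)})

X = df[(df["my_row"] == "1") | (df["my_row"] == "2")].copy()

# The filtered-out labels are still registered as categories
before = list(X["my_row"].cat.categories)

# Drop the categories that no longer occur in the data
X["my_row"] = X["my_row"].cat.remove_unused_categories()
after = list(X["my_row"].cat.categories)

print(before, after)
```

After this step, X can be passed to sns.FacetGrid(X, row='my_row', col='column') as in the question.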
I got inspired by this link:
Plot lower triangle in a seaborn Pairgrid
and changed my code to this:
g = sns.FacetGrid(df, row='my_row', col='column')
for i in range(2, 48):
    for j in range(0, 12):
        g.axes[i, j].set_visible(False)
So I had to iterate over each plot individually and make it invisible. But I think there should be an easier way to do this. And in the end I still don't understand how FacetGrid knows anything about the size of my original dataframe df when I use X as its input.
This is an answer that works, but I think there must be better solutions. One problem with my answer is that when I save the figure, I get a big white space in the saved plot (corresponding to the axes whose visibility I set to False) that I do not see in Jupyter notebooks when I am running the code. If FacetGrid only plotted the dataframe that I give it as input (in this case X), there would be no problem. There should be a way to do that.
How would I graph this data in seaborn? I would like to have the various categories on the x-axis, and the data on the y-axis as percentages.
I tried to create a barplot with seaborn but I can't get it to look right.
Any help would be appreciated!
Thanks
Edit: code:
sns.barplot(x = new_df.columns,data=new_df)
I suggest you organize your DataFrame more like this; it will make it much easier to plot and organize this type of data.
Instead of keeping your DataFrame as it is, reshape it into two simple columns like so:
name value
debt_consolidation 0.152388
credit_card 0.115689
all_other 0.170111
etc. By doing this you can simply plot your data in Seaborn by doing the below:
sns.barplot(x="name",y="value", data = df)
This produces a bar plot with one bar per category.
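One way to get from the wide layout to the two-column layout is DataFrame.melt. This is a sketch that assumes new_df is a single row with one column per category, which is a guess because the original frame is not shown; the three values are the ones listed above.

```python
import pandas as pd

# Assumed shape of new_df: one row, one column per category
new_df = pd.DataFrame([{"debt_consolidation": 0.152388,
                        "credit_card": 0.115689,
                        "all_other": 0.170111}])

# Turn the column labels into a "name" column and the cells into a "value" column
long_df = new_df.melt(var_name="name", value_name="value")
print(long_df)
```

long_df can then be passed as data to the sns.barplot(x="name", y="value", ...) call.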
I have the following dataframe:
Price,Volume
6550,18
6551,5
6552,2
6553,13
......
......
......
7001,3
7002,21
I want price along one axis and volume along the other. Since this is a pandas dataframe I am under the impression I can just plot it as follows:
df.plot(kind='bar')
plt.show()
However it is plotting both columns along the same axis. I want each column on a separate axis. I have tried the following which does not work:
df.plot(kind='bar', xticks=df['Price'], yticks=df['Volume'])
Any suggestions on what I'm doing wrong would be appreciated.
When you plot using df.plot(kind='bar') it will use the Index for the x-axis and then plot all columns in the DataFrame as y-values.
To get around this, you can choose x and y values to be used, such as:
df.plot(x='Price', y='Volume', kind='bar')
See here for more examples of plotting using pandas.
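As a small runnable sketch, using only the first four rows shown in the question (the elided rows are left out):

```python
import pandas as pd

# First rows of the data from the question
df = pd.DataFrame({"Price": [6550, 6551, 6552, 6553],
                   "Volume": [18, 5, 2, 13]})

# Price supplies the x tick labels, Volume the bar heights
ax = df.plot(x="Price", y="Volume", kind="bar", legend=False)
ax.set_ylabel("Volume")
```

With kind='bar' each price becomes its own categorical tick, so a frame with several hundred prices will produce a very crowded x-axis.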
Specify the x and y you desire by label:
df.plot(x='Price', y='Volume', kind='bar')
Here is the complete documentation:
http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.plot.html
I have a pandas dataframe with dates in column 0 and times in column 1. I wish to plot the data in columns 2, 3, 4, ..., n as a function of the date and time. How do I format the tick labels in the code below so that both the date and the time are displayed in the plot? Thanks in advance. I'm new to Stack Overflow (and Python, for that matter), so sorry, but I don't have enough reputation to attach the image that I get from my code below.
import pandas as pd
import matplotlib.pyplot as plt
df3 = pd.read_table('filename.txt',
                    sep=',',
                    skiprows=4,
                    na_values=r'N\A',
                    index_col=[0, 1])  # date and time are my indices
# .ix was removed in pandas 1.0; .loc does the label-based selection
datedf = df3.loc[['01:07:2013'], ['AOT_1640', 'AOT_870']]
fig, axes = plt.subplots(nrows=2, ncols=1)
# Draw each column on its own subplot
for i, c in enumerate(datedf.columns):
    print(i, c)
    datedf[c].plot(ax=axes[i], figsize=(12, 10), title=c)
plt.savefig('testing123.png', bbox_inches='tight')
You could combine columns 0 and 1 into a single date & time column and set that as your index; the pandas .plot method will then automatically use the index for the x-tick labels. It's hard to say how it will work with your data set as I can't see it, but the main point is that pandas uses the index for the x-tick labels unless you tell it not to. Be warned that this doesn't work well with hierarchical indexing (at least in my very limited experience).
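A sketch of what that could look like. The sample rows are invented, the column names "date" and "time" are placeholders, and the day:month:year reading of '01:07:2013' is an assumption; adjust the format string if your file uses month first.

```python
import pandas as pd

# Invented sample data; only the AOT column names come from the question
raw = pd.DataFrame({
    "date": ["01:07:2013", "01:07:2013", "02:07:2013"],
    "time": ["10:00:00", "10:15:00", "09:30:00"],
    "AOT_1640": [0.10, 0.12, 0.09],
    "AOT_870": [0.20, 0.22, 0.18],
})

# Join date and time into one string and parse it (assumed format: dd:mm:yyyy)
stamp = pd.to_datetime(raw["date"] + " " + raw["time"],
                       format="%d:%m:%Y %H:%M:%S")

# Use the combined timestamp as a single, non-hierarchical index
data = raw.drop(columns=["date", "time"]).set_index(stamp)
data.index.name = "datetime"

# One subplot per column; pandas takes the x tick labels from the index
axes = data.plot(subplots=True, figsize=(12, 10))
```

Because the index is now a single DatetimeIndex rather than a two-level index, pandas can choose date and time tick labels on its own.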