I am currently facing the following issue. I have a couple of Python scripts that plot some useful information using the Python module pandas, which uses Matplotlib.
As far as I understand, matplotlib lets you set its backend, as described in the accepted answer to this question.
I would like to set the matplotlib backend from Pandas:
Is it possible?
How can I do it?
EDIT 1:
By the way my code looks like:
import pandas as pd
from pandas import DataFrame, Series
class MyPlotter():
    def plot_from_file(self, stats_file_name, f_name_out, names,
                       title='TITLE', x_label='x label', y_label='y label'):
        df = pd.read_table(stats_file_name, index_col=0, parse_dates=True,
                           names=names)
        plot = df.plot(lw=2, colormap='jet', marker='.', markersize=10,
                       title=title, figsize=(20, 15))
        plot.set_xlabel(x_label)
        plot.set_ylabel(y_label)
        fig = plot.get_figure()
        fig.savefig(f_name_out)
        plot.cla()
I've just applied the solution posted on this question and it worked.
In other words, my imports originally looked like this:
import pandas as pd
from pandas import DataFrame, Series
After applying the solution, the imports look like this:
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
I know I am answering my own question, but I am doing so in case someone can find it useful.
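For anyone who wants to see the whole flow in one place, here is a minimal sketch of selecting a backend before pyplot is imported and then saving a pandas plot to a file. It is an illustration, not the exact code from the question: the 'Agg' backend is swapped in for 'pdf' so that a PNG is written, and the DataFrame contents and the output path are made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: a non-interactive backend, chosen before pyplot is imported
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for the stats file read with pd.read_table.
df = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 4]})

# pandas draws through matplotlib, so the backend selected above is the one used here.
ax = df.plot(lw=2, marker=".", title="TITLE")
fig = ax.get_figure()

# Hypothetical output location in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "plot.png")
fig.savefig(out_path)
plt.close(fig)
```

Calling matplotlib.use() before importing matplotlib.pyplot is the safe order, which is why the call sits between the two imports.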
Related
I have the following question:
What can you tell about the relationship between time and speed? Is there a best time of day to connect? Has it changed throughout the years?
This is my dataframe: [image: dataframe]
My columns: [image: my columns]
Data: [image: data]
Does anyone have a suggestion on how I should approach this question?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('/Users/dimagoroh/Desktop/data_vis/big_file.csv', low_memory=False)
sns.lmplot(x="hours",y="speed",data=df)
I'm trying to make a plot but I get an error. I think I need to convert the hour column to a different data type; right now it is set as object.
Please post the error you get. From the data I think you need to pass x="hour" and not x="hours". Also try
df.hour = pd.to_datetime(df.hour)
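As a rough sketch of where the conversion could lead, the snippet below parses an object-typed hour column, extracts the hour of day as a number, and averages speed per hour. The column names "hour" and "speed", the "HH:MM" format, and the sample values are all assumptions, since the real file is not shown.

```python
import pandas as pd

# Made-up sample standing in for big_file.csv; column names and format are assumed.
df = pd.DataFrame({
    "hour": ["08:00", "08:30", "20:00", "20:15"],
    "speed": [10.0, 14.0, 30.0, 34.0],
})

# Parse the object (string) column and keep only the hour of day as an integer.
df["hour_of_day"] = pd.to_datetime(df["hour"], format="%H:%M").dt.hour

# Average speed for each hour of day, then the hour with the highest average.
mean_speed = df.groupby("hour_of_day")["speed"].mean()
best_hour = mean_speed.idxmax()
print(mean_speed)
print("Best hour in this sample:", best_hour)
```

With a numeric hour_of_day column, a call such as sns.lmplot(x="hour_of_day", y="speed", data=df) has numbers on both axes to work with.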
I'm trying to plot a figure in Python but I get a KeyError. For some reason I can't read the column "Cost per Genome".
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Sequencing_Cost_Data_Table_Aug2021 - Data Table.csv") #The data can be found here: https://docs.google.com/spreadsheets/d/1auLPEnAp0aI__zIyK9fKBAkLpwQpOFBx9qOWgJoh0xY/edit#gid=729639239
fig = plt.figure()
plt.plot(data["Date"],data["Cost per Genome"])
It looks like either the data was read into the DataFrame incorrectly, or there is an error in the plot call. Read this; it might help you further: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
P.S. I couldn't access your spreadsheet. It was request only.
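One way to narrow down a KeyError like this is to print the column names pandas actually parsed and compare them with the name being requested. The sketch below assumes one possible cause, stray whitespace around the header names; the real spreadsheet was not accessible, so the CSV text and values here are made up.

```python
import io

import pandas as pd

# Made-up CSV text with spaces around a header name (an assumed cause, not a confirmed one).
raw = io.StringIO("Date, Cost per Genome \n2001-09,100\n2002-03,90\n")
data = pd.read_csv(raw)

# Inspect what pandas parsed; here the second name carries extra spaces.
print(list(data.columns))

# Strip surrounding whitespace so data["Cost per Genome"] resolves.
data.columns = data.columns.str.strip()
cost = data["Cost per Genome"]
```

If the printed names look entirely different (for example "Unnamed: 0"), the header is probably on a later row, and the skiprows or header arguments of pd.read_csv are worth a look.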
When I run the following code, everything works, but I have a question that I hope someone can answer:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import pandas as pd
import seaborn as sns
from pydataset import data
sns.set_palette("deep", desat=.6)
sns.set_context(rc={"figure.figsize": (8, 4)})
faithful = data('faithful')
faithful.head(10)
Everything works fine. But regarding the penultimate line above: I have not loaded or copied the 'faithful' dataset, nor have I linked to a URL to access it. Yet the code runs and reads all the data. I assume this dataset is included by default in some library. Which one? Where is it located? How can I verify this? Is there a command for it? Thanks!
You are importing the built-in datasets from the pydataset module when you run your 7th line:
from pydataset import data
If you run the data() command, you will see all the 750+ datasets contained in this module. The 'faithful' dataset is among them.
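To check where an installed module lives on disk, the standard library can report the file a module would be imported from. The helper name module_location below is hypothetical, and the sketch makes no assumption about pydataset being installed: it returns None in that case.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints a path inside site-packages if pydataset is installed, otherwise None.
print(module_location("pydataset"))
```

The bundled data files sit in the package directory that contains the reported file.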
I am new to Python, pandas, etc., and I was asked to import and plot an Excel file. The file contains 180 rows and 15 columns, and I have to plot each column against the first one, which is time, for a total of 14 graphs. I would like some help writing the script. Thanks in advance.
The function you are looking for is pandas.read_excel.
It returns a DataFrame object from which you can access your data in Python. Make sure your Excel file is well formatted.
import pandas as pd
# Load data
df = pd.read_excel('myfile.xlsx')
Check out these packages/ functions, you'll find some code on these websites and you can tailor it to your needs.
Some useful snippets:
Read_excel
import pandas as pd
df = pd.read_excel('your_file.xlsx')
Code above reads an excel file to python and keeps it as a DataFrame, named df.
Matplotlib
import matplotlib.pyplot as plt
plt.plot(df['column - x axis'], df['column - y axis'])
plt.savefig('your_plot_image.png')
plt.show()
This is a basic example of making a plot with matplotlib and saving it as your_plot_image.png. Replace column - x axis and column - y axis with the desired columns from your file.
For cleaning data and some basics regarding DataFrames have a look at this package: Pandas
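Putting the pieces above together, here is a minimal sketch of plotting every column against the first one and saving one image per column. A small made-up DataFrame stands in for the result of pd.read_excel so the snippet runs without a file; the column names, the output directory, and the 'Agg' backend are assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for df = pd.read_excel('your_file.xlsx'); names and values are made up.
df = pd.DataFrame({
    "time": [0, 1, 2, 3],
    "col_a": [1, 2, 4, 8],
    "col_b": [5, 3, 2, 1],
})

out_dir = tempfile.mkdtemp()  # hypothetical output folder
time_col = df.columns[0]      # the first column is treated as the x axis
saved = []

# One figure per remaining column, each plotted against the first column.
for col in df.columns[1:]:
    fig, ax = plt.subplots()
    ax.plot(df[time_col], df[col])
    ax.set_xlabel(time_col)
    ax.set_ylabel(col)
    path = os.path.join(out_dir, f"{col}.png")
    fig.savefig(path)
    plt.close(fig)  # free the figure before creating the next one
    saved.append(path)
```

With a 15-column file, the same loop would produce 14 images, one for each column after the first.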
I have a csv file (Excel spreadsheet) with roughly a million numbers in column A. I want to make a histogram of this data with the frequency on the y-axis and the values on the x-axis. I'm using pandas to do so. My code:
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
Python isn't interpreting 'A' as the column name. I've tried various names to reference the column, but all result in a keyword error. Am I missing a step where I have to assign that column a name in Python? If so, I don't know how to do that.
I need more rep to comment, so I put this as answer.
You need a header row with the names you want to use in pandas. Also, if you want to see the histogram when working from the Python shell or IPython, you need to import pyplot and call plt.show():
import matplotlib.pyplot as plt
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
plt.show()
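If the file has no header row and editing it is not practical, pandas can also be given the column name at read time. This is a minimal sketch under that assumption: the in-memory text stands in for a headerless D1.csv, the values are made up, and the 'Agg' backend is used only so the snippet runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs anywhere
import pandas as pd

# Made-up stand-in for a headerless D1.csv: one number per line, no column name.
raw = io.StringIO("3\n7\n7\n10\n")

# header=None stops pandas from treating the first value as a header;
# names=["A"] assigns the column name explicitly.
data = pd.read_csv(raw, header=None, names=["A"])

# Series.hist returns the matplotlib Axes, which can then be labelled.
ax = data["A"].hist(bins=5)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
```

In an interactive session, plt.show() after these calls displays the figure.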
Okay I finally got something to work with headings, titles, etc.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
My first problem was that matplotlib is necessary to actually show the graph, as stated by @Sauruxum. Also, I needed to assign the result of
pd.read_csv('D1.csv', quoting=2)
to data so I could plot its histogram with
data.hist
Basically, the problem wasn't finding the name of the header row. The call itself needed to be .hist. Thank you all for the help.