Can't select a column of a data frame in Python

I'm trying to plot a figure in Python but I get a KeyError. For some reason I can't read the column "Cost per Genome".
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Sequencing_Cost_Data_Table_Aug2021 - Data Table.csv") #The data can be found here: https://docs.google.com/spreadsheets/d/1auLPEnAp0aI__zIyK9fKBAkLpwQpOFBx9qOWgJoh0xY/edit#gid=729639239
fig = plt.figure()
plt.plot(data["Date"],data["Cost per Genome"])

It looks like either the data was read into the DataFrame incorrectly, or there is an error in the plot call. Reading this might help you further: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
P.S. I couldn't access your spreadsheet. It is request-only.
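One way to narrow down a KeyError like this is to print the column names exactly as pandas parsed them and compare them with the name used in the lookup. The sketch below assumes the cause is stray whitespace in the header, which is only a guess since the real file could not be checked; the inline CSV text and its values are hypothetical stand-ins for the spreadsheet export.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: note the spaces around the second header.
raw = "Date, Cost per Genome \nSep-01,95263072\nMar-02,70175437\n"
data = pd.read_csv(io.StringIO(raw))

# Inspect the exact names pandas stored; whitespace and extra title rows show up here.
print(list(data.columns))

# Strip surrounding whitespace from every column name, then select by the clean name.
data.columns = data.columns.str.strip()
cost = data["Cost per Genome"]
```

If the printed names look nothing like the expected headers, the file may start with extra title rows, in which case the skiprows or header arguments of read_csv are worth a look.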

Related

Visualising data with Python: a time series and a float column

I have the following question:
What can you tell about the relationship between time and speed? Is there a best time of day to connect? Has it changed throughout the years?
This is my dataframe (screenshots of the dataframe, its columns, and the data were attached).
Does anyone have any suggestions on how I should approach this question?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv('/Users/dimagoroh/Desktop/data_vis/big_file.csv', low_memory=False)
sns.lmplot(x="hours",y="speed",data=df)
I'm trying to make a plot but I get an error. I think I need to convert the hour column to a different data type; right now it is stored as object.

Please post the error you get. From the data I think you need to pass x="hour" and not x="hours". Also try
df.hour = pd.to_datetime(df.hour)
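A minimal sketch of that conversion, followed by one possible way to compare speed across the day. The column names hour and speed, the "%H:%M" time format, and the sample values are all assumptions, since the real file was not shown.

```python
import pandas as pd

# Hypothetical sample; in the real data the hour column is stored as object (strings).
df = pd.DataFrame({
    "hour": ["08:00", "08:30", "20:15", "20:45"],
    "speed": [10.0, 14.0, 30.0, 34.0],
})

# Parse the strings into datetimes, then keep only the hour of day as a number.
df["hour"] = pd.to_datetime(df["hour"], format="%H:%M")
df["hour_of_day"] = df["hour"].dt.hour

# Average speed per hour of day gives a first answer to "is there a best time to connect?".
mean_speed = df.groupby("hour_of_day")["speed"].mean()
print(mean_speed)
```

The numeric hour_of_day column can then be passed as x to sns.lmplot, which needs numeric data on both axes.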

How do I assign a column of a CSV file to a list in Python?

I have a CSV that I want to graph.
However, to get this graph, I need to first assign a column to a list (or array) and then go on from there. I need to assign the first column to said list. In the said column, there are many repeats of the numbers 1 through 45 (so in code that would be range(1,46)).
So far I have written this:
# for weekly sales against Date
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
NOTE: walmart (the value associated to the dataset) has already been read here by pd.read_csv. It works and an output has been made.
I do not know what to do from here. I want to graph this as well based on the store.
The data set can be found: https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction

There are many ways to do this, but the easiest that comes to mind is using a pandas dataframe.
First you need to install pandas in your environment. I see you tagged anaconda, so this would be something like:
$ conda install pandas
Then import it in your Python file (presumably a Jupyter notebook):
import pandas as pd
Then you would import the CSV into a dataframe using the built-in read_csv function (you can do many cool things with it, so check out the docs).
In your case, assume you want to import just two columns, say numbers 3 and 5, and then plot them. If the first row of your CSV contains the header (say 'col3' and 'col5'), it is read automatically and stored as the column names. If you want to skip reading the header, add the option skiprows=1; if you want the columns to be named something else, use the option names=['newname3', 'newname5'].
data = pd.read_csv('path/to/my.csv', usecols=[3,5], names=['col1', 'col2'])
Then you can access the columns by name and plot them using data['colname']:
import matplotlib.pyplot as plt
plt.scatter(data['col1'], data['col2'])
plt.show()
Or you can use the built in function of pandas dataframes:
data.plot.scatter(x='col1', y='col2')
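As a small sketch of the column selection described above: usecols takes zero-based positions, so [3, 5] picks the fourth and sixth columns. The inline CSV text is a hypothetical stand-in for a real file, and renaming the columns after reading is just one possible way to get the names col1 and col2.

```python
import io

import pandas as pd

# Hypothetical six-column file with a header row.
raw = "a,b,c,d,e,f\n1,2,3,4,5,6\n7,8,9,10,11,12\n"

# usecols uses zero-based positions: 3 -> column "d", 5 -> column "f".
data = pd.read_csv(io.StringIO(raw), usecols=[3, 5])

# Rename after reading so the header row is consumed as a header, not as data.
data.columns = ["col1", "col2"]
print(data)
```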

I have found out what I need to do to get this to work. The following code describes my situation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
    w = b[b.Store == printval]
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    plt.scatter(tp, ws)
    plt.xlabel('Date')
    plt.ylabel('Weekly Sales')
    plt.title('Store_' + str(printval))
    plt.savefig('Store_'+ str(printval) + '.png') #To save the file if needed
    plt.show()
Again, I have already imported the CSV file, and associated it to walmart. There was no error when doing that.
Again, the dataset can be found in https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction.
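The same per-store plots could also be produced with groupby, which splits the table by store in one pass instead of filtering it 45 times. This is only a sketch of an alternative, not the method used above: the tiny walmart frame, its values, and the day-month-year date format are hypothetical stand-ins for the Kaggle data, and the Agg backend is chosen only so the example runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; leave this out in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical miniature of the Walmart table (the real one has 45 stores).
walmart = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": ["05-02-2010", "12-02-2010", "05-02-2010", "12-02-2010"],
    "Weekly_Sales": [1643690.90, 1641957.44, 2136989.46, 2137809.50],
})
# Assumed date format; parsing to datetimes gives a properly ordered x-axis.
walmart["Date"] = pd.to_datetime(walmart["Date"], format="%d-%m-%Y")

out_dir = tempfile.mkdtemp()
saved = []
# groupby yields one (store number, sub-table) pair per store.
for store, group in walmart.groupby("Store"):
    fig, ax = plt.subplots()
    ax.scatter(group["Date"], group["Weekly_Sales"])
    ax.set_xlabel("Date")
    ax.set_ylabel("Weekly Sales")
    ax.set_title(f"Store_{store}")
    path = os.path.join(out_dir, f"Store_{store}.png")
    fig.savefig(path)
    plt.close(fig)  # free the figure before moving on to the next store
    saved.append(path)
```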

Creating a loop to process all csv files and produce one figure with subplots per file

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import glob
path="/Users/My_Comp/Desktop/python_for_marine_data-master/week_2/USNDBC_62030/*.csv"
df = pd.DataFrame()
for fname in glob.iglob(path):
    if file=='*.csv':
        df = df.append(pd.read_csv(file,sep=';',index_col=[0],skiprows=[1],parse_dates={'DateTime': [0, 1, 2, 3, 4]}))
        df.drop(['WDIR','WSPD','GST','WVHT','DPD','APD','MWD','WTMP','VIS','PTDY','TIDE'],axis=1,inplace=True)
fig1,ax1=plt.subplots(3,1,sharex=True)
df.plot(kind='line',y='PRES',ax=ax1[0])
df.plot(kind='line',y='ATMP',ax=ax1[1])
df.plot(kind='line',y='DEWP',ax=ax1[2])
plt.xticks(rotation=17)
plt.show()
Sorry, I'm new to Python and can't quite figure out how to get this to work. Currently it runs without errors, but it produces only one plot with no data displayed. I'm trying, albeit poorly, to create a loop that reads all .csv files in the directory and then creates a figure with 3 subplots for each file. Any help or information would be greatly appreciated.
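A possible direction, sketched under assumptions: in the code above the loop variable is fname, but the body tests and reads file, and the comparison with '*.csv' can never be true for a real file name, so nothing is ever loaded. Reading and plotting inside the loop gives one figure per file. The two small files written below are hypothetical stand-ins with an already combined DateTime column, so the original skiprows and parse_dates settings are left out and would need to be restored for the real buoy files.

```python
import glob
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in data: two small ';'-separated files.
tmp_dir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv"):
    with open(os.path.join(tmp_dir, name), "w") as fh:
        fh.write("DateTime;PRES;ATMP;DEWP\n"
                 "2020-01-01 00:00;1012.0;8.5;6.1\n"
                 "2020-01-01 01:00;1011.5;8.7;6.0\n")

figures = {}
# glob already returns only paths matching *.csv, so no extra name check is needed.
for fname in sorted(glob.glob(os.path.join(tmp_dir, "*.csv"))):
    df = pd.read_csv(fname, sep=";", index_col="DateTime", parse_dates=True)
    # One figure with three stacked subplots for this file.
    fig, axes = plt.subplots(3, 1, sharex=True)
    for ax, column in zip(axes, ["PRES", "ATMP", "DEWP"]):
        df[column].plot(ax=ax)
        ax.set_ylabel(column)
    fig.suptitle(os.path.basename(fname))
    figures[fname] = fig
```

In an interactive session, calling plt.show() after the loop would display all the figures at once.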

Set matplotlib backend from Pandas

I am currently facing the following issue. I have a couple of Python scripts that plot some useful information using the Python module pandas, which uses Matplotlib.
As far as I understand, Matplotlib lets you set its backend as described in the accepted answer to this question.
I would like to set the Matplotlib backend from pandas:
Is it possible?
How can I do it?
EDIT 1:
By the way my code looks like:
import pandas as pd
from pandas import DataFrame, Series
class MyPlotter():
    def plot_from_file(self, stats_file_name, f_name_out, names,
                       title='TITLE', x_label='x label', y_label='y label'):
        df = pd.read_table(stats_file_name, index_col=0, parse_dates=True,
                           names= names)
        plot = df.plot(lw=2,colormap='jet',marker='.',markersize=10,title=title,figsize=(20, 15))
        plot.set_xlabel(x_label)
        plot.set_ylabel(y_label)
        fig = plot.get_figure()
        fig.savefig(f_name_out)
        plot.cla()

I've just applied the solution posted in this question and it worked.
In other words, my imports looked like this:
import pandas as pd
from pandas import DataFrame, Series
After applying the solution, the imports look like this:
import pandas as pd
from pandas import DataFrame, Series
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
I know I am answering my own question, but I am doing so in case someone can find it useful.
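For anyone who wants to confirm that the setting took effect, here is a small sketch. It assumes the non-interactive "Agg" backend as an example (the answer above uses 'pdf'), and the sample frame is hypothetical. The key point is that matplotlib.use is called before pyplot is imported and before pandas draws anything.

```python
import matplotlib
matplotlib.use("Agg")  # choose the backend before pyplot is imported
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data; pandas draws through whichever backend Matplotlib is using.
df = pd.DataFrame({"x": [1, 2, 3]})
ax = df.plot()

# Report the active backend to check that the choice was applied.
backend = matplotlib.get_backend()
print(backend)
plt.close(ax.get_figure())
```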

How to refer/assign an excel column in python?

I have a CSV file (Excel spreadsheet) with roughly a million numbers in column A. I want to make a histogram of this data with the frequency of the numbers on the y-axis and the values on the x-axis. I'm using pandas to do so. My code:
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
Python isn't interpreting 'A' as the column name. I've tried various names to reference the column, but all result in a KeyError. Am I missing a step where I have to assign that column a name in Python? If so, I don't know how to do that.

I need more rep to comment, so I am posting this as an answer.
You need a header row with the names you want to use in pandas. Also, if you want to see the histogram when working from the Python shell or IPython, you need to import pyplot and call plt.show():
import matplotlib.pyplot as plt
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
plt.show()
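If the file has no header row at all, another option is to supply the name while reading instead of editing the file. This is a sketch under that assumption: the inline text is a hypothetical stand-in for D1.csv, and the Agg backend is used only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; leave this out for interactive use
import pandas as pd

# Hypothetical headerless file: a single column of numbers.
raw = "3\n7\n7\n12\n"

# header=None says the first line is data; names supplies the column label.
data = pd.read_csv(io.StringIO(raw), header=None, names=["A"])

# The column can now be selected by the name given above.
ax = data["A"].hist(bins=3)
```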

Okay I finally got something to work with headings, titles, etc.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
My first problem was that matplotlib is needed to actually show the graph, as stated by @Sauruxum. Also, I needed to assign the result of
pd.read_csv('D1.csv', quoting=2)
to data so I could plot the histogram of that result with
data.hist
Basically, the problem wasn't finding the name of the header row. The call itself needed to be .hist. Thank you all for the help.
