How to refer/assign an excel column in python?

I have a csv file (excel spreadsheet) of a column of roughly a million numbers in column A. I want to make a histogram of this data with the frequency of the numbers on the y-axis and the number quantities on the x-axis. I'm using pandas to do so. My code:
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
Python isn't interpreting 'A' as the column name. I've tried various names to reference the column, but all of them result in a KeyError. Am I missing a step where I have to assign that column a name via Python, which I don't know how to do?

I need more rep to comment, so I put this as answer.
You need a header row containing the names you want to use in pandas. Also, if you want to see the histogram when working from the Python shell or IPython, you need to import pyplot and call plt.show():
import matplotlib.pyplot as plt
import pandas as pd
pd.read_csv('D1.csv', quoting=2)['A'].hist(bins=50)
plt.show()
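If the file has no header row at all, one option is to supply the name yourself while reading it. This is only a minimal sketch: the inline text stands in for D1.csv and is made up for illustration, and it assumes the file holds one number per line.

```python
import io
import pandas as pd

# Made-up stand-in for D1.csv: one number per line, no header row.
csv_text = "10\n20\n20\n35\n"

# header=None tells pandas the first line is data, not column names;
# names=["A"] assigns the label used to select the column afterwards.
data = pd.read_csv(io.StringIO(csv_text), header=None, names=["A"])

print(data["A"].tolist())
```

After this, data['A'].hist(bins=50) followed by plt.show() should work as in the snippet above, because the column really is called 'A'.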

Okay I finally got something to work with headings, titles, etc.
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('D1.csv', quoting=2)
data.hist(bins=50)
plt.xlim([0,115000])
plt.title("Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
My first problem was that matplotlib is necessary to actually show the graph, as stated by @Sauruxum. Also, I needed to assign the result of
pd.read_csv('D1.csv', quoting=2)
to data so I could plot the histogram of that result with
data.hist
Basically, the problem wasn't finding the name of the header row; I needed to call .hist on the loaded data. Thank you all for the help.
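A possible explanation for the original KeyError, sketched with made-up data: when the file has no header row, pandas by default takes the first line as the header, so the only column label is that first number (as a string), not 'A'. Printing the labels or selecting by position avoids guessing the name.

```python
import io
import pandas as pd

# Made-up stand-in for a single-column file without a header row.
csv_text = "10\n20\n20\n35\n"

# Default behaviour: the first line is consumed as the header.
data = pd.read_csv(io.StringIO(csv_text))

print(list(data.columns))    # the label is the string '10', not 'A'

# Selecting by position works regardless of what the label is.
first_col = data.iloc[:, 0]
print(first_col.tolist())
```

Note that in this situation the first value of the file is lost to the header, which is one more reason to pass header=None when there is no header row.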

Related

Can't choose a column of a data frame on Python

I'm trying to plot a figure on Python but I get a KeyError. I can't read the column "Cost per Genome" for some reason.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv("Sequencing_Cost_Data_Table_Aug2021 - Data Table.csv") #The data can be found here: https://docs.google.com/spreadsheets/d/1auLPEnAp0aI__zIyK9fKBAkLpwQpOFBx9qOWgJoh0xY/edit#gid=729639239
fig = plt.figure()
plt.plot(data["Date"],data["Cost per Genome"])
It looks like either the data was read into the DataFrame incorrectly, or there is an error in the plot call. Read this; it might help you further: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html
P.S. I couldn't access your spreadsheet. It is request-only.
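Since the spreadsheet isn't accessible, the actual cause is unknown. One common reason for a KeyError on a name that looks right is stray whitespace in the header. The sketch below uses a hypothetical CSV with made-up values to show how to inspect the exact labels and strip them; whether this applies to the real file is an assumption.

```python
import io
import pandas as pd

# Hypothetical CSV whose header has stray spaces around one name.
# The values are made up for illustration.
csv_text = "Date, Cost per Genome \n2001-09,100\n2002-03,90\n"
data = pd.read_csv(io.StringIO(csv_text))

# Shows the exact labels, including any leading/trailing whitespace.
print(data.columns.tolist())

# Strip surrounding whitespace so data["Cost per Genome"] resolves.
data.columns = data.columns.str.strip()
cost = data["Cost per Genome"]
print(cost.tolist())
```

If the printed labels differ from the expected names in some other way (extra header lines, different capitalisation), the same print call should reveal it.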

How do I assign a column in a csv file by python?

I have a CSV that I want to graph.
However, to get this graph, I need to first assign a column to a list (or array) and then go on from there. I need to assign the first column to said list. In the said column, there are many repeats of the numbers 1 through 45 (so in code that would be range(1,46)).
Currently, I have written this so far:
# for weekly sales against Date
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
NOTE: walmart (the value associated to the dataset) has already been read here by pd.read_csv. It works and an output has been made.
I do not know what to do from here. I want to graph this as well based on the store.
The data set can be found: https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction
There are many ways to do this, but the easiest that comes to mind is using a pandas DataFrame.
First you need to install pandas in your environment. I see you tagged anaconda, so this would be something like:
$ conda install pandas
Then import it in your Python file (presumably a Jupyter notebook):
import pandas as pd
Then you would import the CSV into a DataFrame using the built-in read_csv function (you can do many cool things with it, so check out the docs).
In your case, assume you want to import just two columns, say numbers 3 and 5 (usecols positions are zero-based), and then plot them. If the first row of your CSV contains the header (say 'col3' and 'col5'), it is read automatically and used for the column names. If you want to skip the header row, add the option skiprows=1; if you want the columns to be named something else, use the option names=['newname3', 'newname5'] (when the file has a header row, combine names with header=0 or skiprows=1 so the old header is not read as data):
data = pd.read_csv('path/to/my.csv', usecols=[3,5], names=['col1', 'col2'])
Then you can access the columns by name and plot them using data['colname']:
import matplotlib.pyplot as plt
plt.scatter(data['col1'], data['col2'])
plt.show()
Or you can use the built in function of pandas dataframes:
data.plot.scatter(x='col1', y='col2')
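To make the usecols/names combination concrete, here is a small sketch on made-up in-memory data (the header names c0..c5 and the values are hypothetical). It uses skiprows=1 so the existing header line is not read as a data row.

```python
import io
import pandas as pd

# Made-up CSV with a header row and six columns.
csv_text = "c0,c1,c2,c3,c4,c5\n0,1,2,3,4,5\n10,11,12,13,14,15\n"

data = pd.read_csv(
    io.StringIO(csv_text),
    usecols=[3, 5],          # zero-based positions of the columns to keep
    names=["col1", "col2"],  # replacement names for the kept columns
    skiprows=1,              # skip the original header line
)

print(data["col1"].tolist(), data["col2"].tolist())
```

The resulting frame has only the two renamed columns, so data['col1'] and data['col2'] can be passed straight to the plotting calls above.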
I have found out what I need to do to get this to work. The following code describes my situation.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
a = []
for stn in range(1,46):
    a.append(walmart[walmart.Store == stn])
for printval in range(1,46):
    b = a[printval-1]
    w = b[b.Store == printval]
    ws = w["Weekly_Sales"]
    tp = w["Date"]
    plt.scatter(tp, ws)
    plt.xlabel('Date')
    plt.ylabel('Weekly Sales')
    plt.title('Store_' + str(printval))
    plt.savefig('Store_'+ str(printval) + '.png') #To save the file if needed
    plt.show()
Again, I have already imported the CSV file, and associated it to walmart. There was no error when doing that.
Again, the dataset can be found in https://www.kaggle.com/divyajeetthakur/walmart-sales-prediction.
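For what it's worth, the same per-store split can probably be written more compactly with groupby, which is a swapped-in technique and not what the code above does. The sketch uses a made-up miniature of the table with the same column names (Store, Date, Weekly_Sales); the values are invented.

```python
import pandas as pd

# Made-up miniature of the Walmart table, for illustration only.
walmart = pd.DataFrame({
    "Store": [1, 1, 2, 2],
    "Date": ["05-02-2010", "12-02-2010", "05-02-2010", "12-02-2010"],
    "Weekly_Sales": [100.0, 110.0, 200.0, 190.0],
})

# groupby yields one (store number, sub-frame) pair per distinct store,
# replacing the two range(1, 46) loops and the intermediate list.
per_store = {store: frame for store, frame in walmart.groupby("Store")}

for store, frame in per_store.items():
    # Each sub-frame holds only the rows of one store; its "Date" and
    # "Weekly_Sales" columns could be passed to plt.scatter as above.
    print(f"Store_{store}", frame["Weekly_Sales"].tolist())
```

This also avoids hard-coding the number of stores, since the groups come from the data itself.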

Select columns from a csv file based on user input

I am new to Python and I want to make a small program that takes one or more column names from the user and plots those columns versus time.
Consider the column names: "time", "c2", "c3", "c4", "c5", "c6"
The column names to plot as a time series need to be selected from a CSV file based on user input. However, it did not work for me. Do you have any ideas or similar code to share?
The code I am using to plot the curves is shown below. Note that all the columns in the CSV file are plotted versus the time column, which is stored as epoch time and converted to human-readable time.
import pandas as pd
import pandas
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (10, 5)
plt.style.use('fivethirtyeight')
# import the csv file and epoch time conversion
df = pd.read_csv(.csv',parse_dates=['time'], date_parser=lambda epoch: pandas.to_datetime(epoch, unit='s'))
print(df)
# make sure the time column is actually time format
df['time']=pd.to_datetime(df['time'])
# set time as the index
df.set_index('time',inplace=True)
df.plot(linewidth=2, fontsize=12)
We're probably going to need a bit more information than this to help.
Are you using a web framework such as Flask or Django to draw your plots?
CSV files are pretty easy to read with column headings as field identifiers using the csv module.
https://docs.python.org/3/library/csv.html
Hopefully the answers are there for you.
The easiest way to do this would be to use the CSV module and matplotlib.
Matplotlib has a time series example in its gallery. You can also look at the other kinds of plots the library can do there.
It is hard to recommend a method without knowing what kind of data you are working with and what needs to be done.
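Staying with pandas as in the question, one possible shape for the selection step is sketched below. The helper name select_columns and the sample data are hypothetical; the idea is just to split the user's comma-separated input, check the names against df.columns, and plot only that subset.

```python
import pandas as pd

def select_columns(df, requested):
    """Return the requested columns; raise a clear error for unknown names.

    `requested` is a comma-separated string such as "c2, c3".
    (Hypothetical helper, not part of pandas.)
    """
    wanted = [name.strip() for name in requested.split(",") if name.strip()]
    missing = [name for name in wanted if name not in df.columns]
    if missing:
        raise KeyError(f"Unknown column(s): {missing}; available: {list(df.columns)}")
    return df[wanted]

# Made-up data with epoch seconds in the 'time' column.
df = pd.DataFrame({"time": [0, 60, 120], "c2": [1.0, 2.0, 3.0], "c3": [4.0, 5.0, 6.0]})
df["time"] = pd.to_datetime(df["time"], unit="s")  # epoch seconds -> datetime
df = df.set_index("time")

# In an interactive script this string would come from input("Columns: ").
subset = select_columns(df, "c2, c3")
print(list(subset.columns))
# subset.plot(linewidth=2, fontsize=12) would then draw only the chosen columns.
```

Validating before plotting means a typo in the input produces a message listing the available columns instead of a bare KeyError.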

Plotting from excel to python with pandas

I am new to Python, pandas, etc., and I was asked to import and plot an Excel file. This file contains 180 rows and 15 columns, and I have to plot each column with respect to the first one, which is time: 14 different graphs in total. I would like some help with writing the script. Thanks in advance.
The function you are looking for is pandas.read_excel.
It returns a DataFrame object from which you can access your data in Python. Make sure your Excel file is well formatted.
import pandas as pd
# Load data
df = pd.read_excel('myfile.xlsx')
Check out these packages and functions; you'll find some code on their websites that you can tailor to your needs.
Some useful snippets:
Read_excel
import pandas as pd
df = pd.read_excel('your_file.xlsx')
The code above reads an Excel file into Python and stores it as a DataFrame named df.
Matplotlib
import matplotlib.pyplot as plt
plt.plot(df['column - x axis'], df['column - y axis'])
plt.savefig('your_plot_image.png')
plt.show()
This is a basic example of making a plot using matplotlib and saving it as your_plot_image.png. You have to replace column - x axis and column - y axis with the desired column names from your file.
For cleaning data and some basics regarding DataFrames, have a look at the pandas package.
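To get one graph per column against the first (time) column, a loop over the remaining column names is probably enough. The sketch below uses a small made-up DataFrame in place of the one returned by pd.read_excel, and it selects a non-interactive backend purely so it runs without a display; both are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: draw off-screen; drop this line to get windows
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the DataFrame returned by pd.read_excel.
df = pd.DataFrame({
    "time": [0, 1, 2],
    "temp": [20.0, 21.5, 23.0],
    "pressure": [1.0, 1.1, 1.3],
})

time_col = df.columns[0]          # first column is the x axis
figures = []
for col in df.columns[1:]:        # every other column gets its own figure
    fig, ax = plt.subplots()
    ax.plot(df[time_col], df[col])
    ax.set_xlabel(time_col)
    ax.set_ylabel(col)
    ax.set_title(f"{col} vs {time_col}")
    figures.append(fig)
    # fig.savefig(f"{col}.png") would write one image per column.

print(len(figures))
```

With the real 15-column file, the same loop would produce the 14 graphs asked for.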

How to plot from .dat file with multiple columns and rows separated with tab spaces

I have 12 columns in my .dat file and around 50 rows; each value is separated by a tab. How can I plot the first column against the 12th column? I tried the code below, but I get an error saying there is a wrong number of columns at line 42.
import numpy as np
from matplotlib import pyplot as plt
data=np.loadtxt('filep.dat')
plt.plot(data[:,1],data[:,2],'bo')
X=data[:,1]
Y=data[:,2]
plt.plot(X,Y,':ro')
plt.show()
The code in the question is correct! If it doesn't work, it's because your data is not organized the way you think it is or because you have missing values somewhere in your data.
You may try to use numpy.genfromtxt(...) which has more options for bad data filtering than np.loadtxt.
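As a rough sketch of the genfromtxt route, the example below builds made-up tab-separated text with 12 columns and one empty field, reads it with an explicit tab delimiter, and picks the first and 12th columns. Note that NumPy indices are zero-based, so those are columns 0 and 11. The data and the position of the missing value are invented for illustration.

```python
import io
import numpy as np

# Made-up tab-separated data: 3 rows of 12 columns (values 0..35).
rows = ["\t".join(str(r * 12 + c) for c in range(12)) for r in range(3)]
rows[1] = rows[1].replace("\t17\t", "\t\t")   # simulate one missing value

# With an explicit delimiter, an empty field is read as nan instead of
# shifting the remaining columns, which is what trips up np.loadtxt.
data = np.genfromtxt(io.StringIO("\n".join(rows)), delimiter="\t")

x = data[:, 0]    # first column
y = data[:, 11]   # 12th column
print(data.shape, x.tolist(), y.tolist())
```

x and y can then be passed to plt.plot(x, y, ':ro') as in the question. If some lines really do have the wrong number of fields, genfromtxt also accepts invalid_raise=False to skip them with a warning.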
