I am new to plotting charts in Python. I've been told to use Pandas for that, using the following commands. Right now the code assumes the CSV file has headers (time, speed, etc.). How can I change it to work when the CSV file doesn't have headers (data starts from row 0)?
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.read_csv("P1541350772737.csv")
#df.head(5)
df.plot(figsize=(15,5), kind='line', x='timestamp', y='speed') # line plot
You can specify x and y by the position of the columns; you don't need the column names for that:
df.plot(figsize=(15,5), kind='line', x=0, y=1)
This works if the x column is first and the y column is second, and so on; columns are numbered from 0.
For example:
The same result with the names of the columns instead of positions:
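A minimal sketch of both variants, assuming a hypothetical two-column, header-less file (the inline data and the column names "timestamp" and "speed" are stand-ins, not the asker's real file):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for a header-less CSV with two columns
csv_text = "0,10.0\n1,12.5\n2,11.0\n"

# header=None: every row is data and the columns get the integer labels 0 and 1
df_pos = pd.read_csv(io.StringIO(csv_text), header=None)
ax_pos = df_pos.plot(kind="line", x=0, y=1)

# names=[...]: same data, but the columns carry names of our choosing
df_named = pd.read_csv(io.StringIO(csv_text), names=["timestamp", "speed"])
ax_named = df_named.plot(kind="line", x="timestamp", y="speed")
```

Both calls draw the same line; only the way the columns are addressed differs.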
I may have misinterpreted your question, but I'll do my best.
The problem seems to be that you have to read a csv that has no header, but you want to add one. I would use this code:
cols=['time', 'speed', 'something', 'else']
df = pd.read_csv('useful_data.csv', names=cols, header=None)
For your plot, the code you used should be fine with my correction, as long as the x and y arguments match the names you put in cols. I would also suggest looking at matplotlib directly for drawing your graph.
You can try
df = pd.read_csv("P1541350772737.csv", header=None)
With the names keyword argument you can also set arbitrary column headers; passing names silently implies header=None, i.e. data is read starting from row 0.
You might also want to check the doc https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
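To make the difference concrete, here is a small sketch comparing the three ways of reading the same header-less data. The inline data and the names "time" and "speed" are made up for illustration:

```python
import io

import pandas as pd

csv_text = "1,2\n3,4\n5,6\n"  # hypothetical header-less data

# Default: the first data row is consumed as the header
df_default = pd.read_csv(io.StringIO(csv_text))

# header=None: all rows are data, columns are labelled 0, 1, ...
df_no_header = pd.read_csv(io.StringIO(csv_text), header=None)

# names=[...]: all rows are data, columns get the given labels
df_named = pd.read_csv(io.StringIO(csv_text), names=["time", "speed"])

print(len(df_default), len(df_no_header), len(df_named))  # 2 3 3
```

With the default settings one row of data is lost to the header, which is usually the first symptom of this problem.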
Pandas is focused on data structures and data analysis tools; its plotting support uses Matplotlib as the backend. If you're interested in building other types of plots in Python, you might want to check Matplotlib out.
Back to Pandas: it assumes that the first row of your csv is a header. However, if your file doesn't have a header, you can pass header=None as a parameter, pd.read_csv("P1541350772737.csv", header=None), and then plot it as you are doing right now, using the integer column labels (e.g. x=0, y=1) instead of names.
The full list of parameters that you can pass to Pandas for reading a csv can be found in the Pandas read_csv documentation; you'll find a lot of useful options there (such as skipping rows, defining the index column, etc.)
Happy coding!
For most commands you will find help in the respective documentation. Looking at pandas.read_csv you'll find an argument names
names : array-like, default None
List of column names to use. If file contains no header row, then you should explicitly
pass header=None.
So you will want to give your columns names by which they appear in the dataframe.
As an example: Suppose you have this data file
1, 2
3, 4
5, 6
Then you can do
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("data.txt", names=["A", "B"], header=None)
print(df)
df.plot(x="A", y="B")
plt.show()
which outputs
   A  B
0  1  2
1  3  4
2  5  6
I have an Excel file with several columns.
From these columns I want to plot the ones which have names like this:
IVOF_1_H, IVOF_1_L, IVOF_2_H, IVOF_2_L, ... Those columns will be on the y axis. For the x axis the column will always be the same.
I do not know how many of those columns I have in the file. I only know that the number is increasing. Is there any way to check how many of those IVOF columns I have and plot them?
In general, there is an upper limit on the number of those IVOF columns, and I don't mind setting up my script so that all of them get plotted (if they exist), but then I don't know how to keep the code from crashing if one of those columns is missing.
You can filter your data frame by its column name:
import pandas as pd
df = pd.read_excel('sample.xlsx')
df = df.filter(regex=("IVOF.*"))
#plot the first row
df.iloc[0].plot(kind="bar")
#plot all rows
df.plot(kind="bar")
simple example:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[2,4,4],[4,3,3],[5,9,1]]),columns=['A','B1','B2'])
df = df.filter(regex=("B.*"))
df.plot(kind="bar")
The result is a bar chart that contains only the columns B1 and B2.
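For the case in the question, a sketch along the same lines could count the matching columns and keep the x column next to them. The x column name "time", the sample values, and the exact regex are assumptions, since the real sheet is not shown:

```python
import pandas as pd

# Hypothetical stand-in for the Excel sheet; "time" is an assumed x-axis column name
df = pd.DataFrame({
    "time": [0, 1, 2],
    "IVOF_1_H": [1.0, 1.1, 1.2],
    "IVOF_1_L": [0.5, 0.6, 0.7],
    "IVOF_2_H": [2.0, 2.1, 2.2],
    "other": [9, 9, 9],
})

# Collect whichever IVOF columns are actually present; absent ones are simply not matched
ivof_cols = df.filter(regex=r"^IVOF_\d+_[HL]$").columns.tolist()
print(len(ivof_cols), ivof_cols)

# Keep the x column alongside the matches, ready for df_plot.plot(x="time")
df_plot = df[["time"] + ivof_cols]
```

Because the column list is built from what exists in the file, a missing IVOF_2_L (as here) does not raise an error.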
I am completely new to python.. I would like to ask how can I fix my code?
I can't make it to work because for some reason, it only calculates columns.
import numpy as np
import pandas as pd
rainfall = pd.read_csv('rainfall.csv', low_memory=False, parse_dates=True, header=None)
mean_rainfall = rainfall[0].mean()
print(mean_rainfall)
the picture of my csv
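In the pandas DataFrame mean function you can pass the axis parameter to say whether the mean should be taken over rows or over columns.
Check here: pandas.DataFrame.mean.
The default is axis=0, which computes the mean of each column. In your code, rainfall[0] additionally selects only the column labelled 0, so you get the mean of that single column.
Try this to get one mean per row (use rainfall.iloc[0].mean() if you only want the first row):
mean_rainfall = rainfall.mean(axis=1)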
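A small sketch of how the different selections behave, using a made-up 2x3 frame in place of the rainfall file (which is not shown). Note that DataFrame.mean defaults to axis=0, i.e. one mean per column:

```python
import pandas as pd

# Hypothetical numeric data standing in for rainfall.csv read with header=None
rainfall = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

col_means = rainfall.mean()               # default axis=0: one mean per column
row_means = rainfall.mean(axis=1)         # axis=1: one mean per row
first_col_mean = rainfall[0].mean()       # column labelled 0 only
first_row_mean = rainfall.iloc[0].mean()  # first row only

print(col_means.tolist())   # [2.5, 3.5, 4.5]
print(row_means.tolist())   # [2.0, 5.0]
print(first_col_mean, first_row_mean)  # 2.5 2.0
```

rainfall[0] selects by column label, while rainfall.iloc[0] selects the first row by position, which is the distinction the question runs into.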
I'm trying to read many txt files into my data frame, and the code below works. However, it duplicates some of my columns, not all of them. I couldn't find a solution. What can I do to prevent this?
import functools
import glob
import pandas as pd
dfs = pd.DataFrame(pd.concat(map(functools.partial(pd.read_csv, sep='\t', low_memory=False),
glob.glob(r'/folder/*.txt')), sort=False))
Let's say my data should look like this:
But it looks like this:
I don't want my columns to be duplicated.
Could you give us a bit more information? The output of dfs.columns would be especially useful. I suspect there could be some extra spaces in your column names, which would cause pandas to treat them as different columns.
Also you could try dask for that:
import dask.dataframe as dd
dfs = dd.read_csv(r'/folder/*.txt', sep='\t').compute()
This is a bit simpler and should give the same result.
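If stray whitespace in the headers does turn out to be the cause (that is only a guess until dfs.columns is shown), a sketch like the following would normalise the names before concatenating. The two inline "files" and their column names are invented for illustration:

```python
import io

import pandas as pd

# Two hypothetical tab-separated files; the second has a trailing space in one header
file_a = "id\tvalue\n1\t10\n"
file_b = "id\tvalue \n2\t20\n"

frames = [pd.read_csv(io.StringIO(text), sep="\t") for text in (file_a, file_b)]

# Without cleaning, 'value' and 'value ' end up as two separate columns
raw = pd.concat(frames, sort=False)
print(list(raw.columns))

# Strip whitespace from every header before concatenating
cleaned = pd.concat(
    [frame.rename(columns=str.strip) for frame in frames],
    ignore_index=True,
)
print(list(cleaned.columns))
```

With real files, the list comprehension would iterate over glob.glob(...) and call pd.read_csv on each path instead of using io.StringIO.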
It is important to think about the concat process as having two possible outcomes. By choosing the axis, you can add new columns like the example (I) below or as new rows illustrated in example (II). pd.concat lets you do this by setting the axis to either 0 (rows) or 1 (columns).
Read more in the excellent documentation: concat
Example I:
import pandas as pd
import glob
# sep='\t' because the files in the question are tab-separated
pd.concat([pd.read_csv(f, sep='\t') for f in glob.glob(r'/folder/*.txt')], axis=1)
Example II:
pd.concat([pd.read_csv(f, sep='\t') for f in glob.glob(r'/folder/*.txt')], axis=0)
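To see the two outcomes side by side without any files, here is a minimal sketch with two small made-up frames that share the same column names:

```python
import pandas as pd

# Two hypothetical frames with identical column names
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# axis=0 stacks the rows; the columns are matched by name
stacked = pd.concat([a, b], axis=0, ignore_index=True)

# axis=1 places the frames next to each other, so the column names repeat
side_by_side = pd.concat([a, b], axis=1)

print(stacked.shape)               # (4, 2)
print(side_by_side.shape)          # (2, 4)
print(list(side_by_side.columns))  # ['x', 'y', 'x', 'y']
```

Repeated column names like those in side_by_side are one way duplicated columns can appear, so it is worth checking which axis is actually wanted.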
If I use DataFrame.set_index, I get this result:
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df.set_index('name')
Note the unnecessary row... I know it does this because it reserves the upper left cell for the column title, but I don't care about it, and it makes my table look somewhat unprofessional if I use it in a presentation.
If I don't use DataFrame.set_index, the extra row is gone, but I get numeric row indices, which I don't want:
If I use to_html(index=False) then I solve those problems, but the first column isn't bold:
import pandas as pd
from IPython.display import HTML
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
HTML(df.to_html(index=False))
If I want to control styling to make the names boldface, I guess I could use the new Styler API via HTML(df.style.do_something_here().render()) but I can't figure out how to achieve the index=False functionality.
What's a hacker to do? (besides construct the HTML myself)
I poked around in the source for Styler and figured it out; if you set df.index.names = [None] then this suppresses the "extra" row (along with the column header that I don't really care about):
import pandas as pd
df = pd.DataFrame([['foo',1,3.0],['bar',2,2.9],
['baz',4,2.85],['quux',3,2.82]],
columns=['name','order','gpa'])
df = df.set_index('name')
df.index.names = [None]
df
These days pandas actually has a keyword for this:
df.to_html(index_names=False)
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html
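As a quick check of both approaches, this sketch renders a shortened version of the frame from the question three ways and looks for the index-name cell in the generated HTML:

```python
import pandas as pd

df = pd.DataFrame(
    [['foo', 1, 3.0], ['bar', 2, 2.9]],
    columns=['name', 'order', 'gpa'],
).set_index('name')

# Default rendering: the index name gets its own header row
html_default = df.to_html()

# Keyword approach: keep the index, drop the row holding its name
html_no_names = df.to_html(index_names=False)

# Clearing the name on a copy has the same effect on the output
cleared = df.copy()
cleared.index.name = None
html_cleared = cleared.to_html()

print('<th>name</th>' in html_default)   # True
print('<th>name</th>' in html_no_names)  # False
print('<th>name</th>' in html_cleared)   # False
```

In both cases the index values themselves (foo, bar) are still emitted as th cells, so they keep their bold styling in a notebook.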
I have a csv file with 8 columns in it. I want to plot a graph between 2 columns using matplotlib. One of the columns has repetitive values. I want to take the mean of the values from the other column which has same corresponding value in the first column.
How can I do it?
This isn't really specific to matplotlib. Pandas has nice support for this kind of data mangling. Read your csv file into a Pandas dataframe:
import pandas as pd
df = pd.read_csv('data.csv')
Then, assuming the column you want to group by is named 'key' and the column whose values you want to take means of is named 'value', you can do:
grouped = df.groupby('key')['value'].mean()
grouped.plot()
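A small worked sketch with made-up data, keeping the placeholder column names 'key' and 'value' from above and adding a hypothetical non-numeric 'label' column to stand in for the other columns of the file:

```python
import pandas as pd

df = pd.DataFrame({
    "key": [1, 1, 2, 2, 3],
    "value": [10.0, 20.0, 30.0, 50.0, 5.0],
    "label": ["a", "b", "c", "d", "e"],  # non-numeric column that should not be averaged
})

# Select the value column before aggregating so other columns are left out
means = df.groupby("key")["value"].mean()
print(means.to_dict())  # {1: 15.0, 2: 40.0, 3: 5.0}

# means.plot() would then draw the mean value against each distinct key
```

The result is a Series indexed by the distinct keys, so plotting it puts the keys on the x axis and the means on the y axis.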