I have a data frame like this (it's just the head):
Timestamp Function_code Node_id Delta
0 2000-01-01 10:39:51.790683 Tx_PDO_2 54 551.0
1 2000-01-01 10:39:51.791650 Tx_PDO_2 54 601.0
2 2000-01-01 10:39:51.792564 Tx_PDO_3 54 545.0
3 2000-01-01 10:39:51.793511 Tx_PDO_3 54 564.0
There are only two types of Function_code: Tx_PDO_2 and Tx_PDO_3.
I plot a graph with Timestamp on the x-axis and Delta on the y-axis, in two separate windows: one for Tx_PDO_2 and the other for Tx_PDO_3:
delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta")
Now, I want to know which window corresponds to which Function_code.
I tried to use title=delta_rx_tx_df.groupby("Function_code").groups, but it did not work.
There may be a better way, but for starters, you can assign the titles to the plots after they are created:
# groupby().plot() returns a Series of Axes objects indexed by the group key
plots = delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta")
# Move the group key into a column, then set each Axes title to its key
plots.reset_index() \
    .apply(lambda row: row[0].set_title(row['Function_code']), axis=1)
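Alternatively, a small explicit loop over the groups sets each figure's title directly and makes the mapping obvious (a minimal sketch using the DataFrame from the question):
import matplotlib.pyplot as plt

# One figure per Function_code, titled with the group key
for function_code, group in delta_rx_tx_df.groupby("Function_code"):
    group.plot(x="Timestamp", y="Delta", title=function_code)

plt.show()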
I read a dataset into Pandas and filtered the data using df_new=df.query("parent=='pr1'") to create a new DataFrame which looks like this:
child parent date pres
101 ch05 pr1 2004-06-01 2760.35
102 ch05 pr1 2004-07-08 2758.83
103 ch09 pr1 2004-08-04 2759.13
.. ... ... ...
317 ch12 pr1 2021-03-15 1737.09
318 ch12 pr1 2021-03-17 1730.98
183 ch05 pr1 2021-04-30 1777.09
I am trying to calculate the daily average, so I tried this: pobs = df.groupby('date')['pres'].mean(). This seems to work because print(pobs) gives something like this:
date
2004-06-01 2760.35
2004-07-08 2758.83
2004-08-04 2759.13
However, I want to plot date against pres using matplotlib to make sure, but I have not been able to extract the two arrays separately. I tried tweaking the solution in Plotting pandas groupby but got myself tied up in knots. I suspect the answer is one or two lines of code, but I just can't find them - all suggestions appreciated. Thanks!
You just need to reset the index:
pobs = df.groupby('date')['pres'].mean().reset_index()
output:
date pres
0 2004-06-01 2760.35
1 2004-07-08 2758.83
2 2004-08-04 2759.13
In this way, pobs is now a DataFrame and can be plotted as such, for example:
import matplotlib.pyplot as plt
plt.plot(pobs.date, pobs.pres)
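Alternatively, you can skip reset_index and keep pobs as a Series: the group keys become its index, so you can plot the index against the values directly (a small sketch using the column names from the question):
import matplotlib.pyplot as plt

pobs = df.groupby('date')['pres'].mean()  # 'date' becomes the index

plt.plot(pobs.index, pobs.values)
plt.xlabel('date')
plt.ylabel('pres')
plt.show()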
TL;DR: How to load .txt data without a delimiter into a DataFrame where each value array has a different length and is date dependent.
I've got a fairly big data set saved in a .txt file with no delimiter in the following format:
id DateTime 4 84 464 8 64 874 5 854 652 1854 51 84 521 [. . .] 98 id DateTime 45 5 5 456 46 4 86 45 6 48 6 42 84 5 42 84 32 8 6 486 4 253 8 [. . .]
id and DateTime are numbers as well, but I've written them as strings here for readability.
The length between the first id DateTime combination and the next is variable and not all values start/end on the same date.
Right now I use .read_csv with delimiter=" ", which results in a three-column DataFrame with id, DateTime and Value all stacked upon each other:
id DateTime Value
10 01.01 78
10 02.01 781
10 03.01 45
[:]
220 05.03 47
220 06.03 8
220 07.03 12
[:]
Then I create a dictionary for each id with the respective DateTime and Values using dict[id] = df["Value"][df["id"] == id], resulting in a dictionary keyed by id.
Sadly, using .from_dict() doesn't work here because the value lists have different lengths. To solve this I create an np.zeros() array that is bigger than the largest of the value arrays from the dictionary and save the values for each id inside a new np.array based on their DateTime. Those new arrays are then combined into a new DataFrame, resulting in a lot of rows populated with zeros.
Desired output is:
A DataFrame with each column representing an id and its values.
The first column as the overall timeframe of the data set, basically min(DateTime) to max(DateTime).
Rows in a column where no values exist should be NaN
This seems like a lot of hassle for something so simple in structure (see the original format). Besides that, it's quite slow. There must be a way to load the data into a DataFrame based on the DateTime, leaving unpopulated areas as NaN.
What would be a more optimal solution to my issue, if possible?
From what I understand, this should work:
# For each id, add a column holding its values (NaN where the id does not match)
for id in df.id.unique():
    df[str(id)] = df["Value"].where(df.id == id)
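If the goal is one column per id, indexed by DateTime, with NaN wherever an id has no reading, a pivot gets there in one step. A sketch, assuming the stacked three-column DataFrame from the question and that each (id, DateTime) pair occurs at most once (otherwise pivot_table with an aggregation function would be needed):
# Reshape the stacked frame: rows become DateTime, one column per id,
# and missing combinations are filled with NaN automatically.
wide = df.pivot(index="DateTime", columns="id", values="Value")
wide = wide.sort_index()  # chronological order from min(DateTime) to max(DateTime)
print(wide.head())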
I understand this must be a very basic question, but oddly enough, the resources I've read online don't seem very clear on how to do the following:
How can I index specific columns in pandas?
For example, after importing data from a csv, I have a pandas Series object with individual dates, along with a corresponding dollar amount for each date.
Now, I'd like to group the dates by month (and add their respective dollar amounts for that given month). I plan to create an array where the indexing column is the month, and the next column is the sum of dollar amounts for that month. I would then take this array and create another pandas Series object out of it.
My problem is that I can't seem to call the specific columns from the current pandas series object I have.
Any help?
Edited to add:
from pandas import Series
from matplotlib import pyplot
import numpy as np
series = Series.from_csv('FCdata.csv', header=0, parse_dates=[0], index_col=0)
print(series)
pyplot.plot(series)
pyplot.show() # this successfully plots the x-axis (date) with the y-axis (dollar amount)
dates = series[0] # this is where I try to call the column, but with no luck
This is what my data looks like in a csv:
Dates Amount
1/1/2015 112
1/2/2015 65
1/3/2015 63
1/4/2015 125
1/5/2015 135
1/6/2015 56
1/7/2015 55
1/12/2015 84
1/27/2015 69
1/28/2015 133
1/29/2015 52
1/30/2015 91
2/2/2015 144
2/3/2015 114
2/4/2015 59
2/5/2015 95
2/6/2015 72
2/9/2015 73
2/10/2015 119
2/11/2015 133
2/12/2015 128
2/13/2015 141
2/17/2015 105
2/18/2015 107
2/19/2015 81
2/20/2015 52
2/23/2015 135
2/24/2015 65
2/25/2015 58
2/26/2015 144
2/27/2015 102
3/2/2015 95
3/3/2015 98
You are reading the CSV file into a Series. A Series is a one-dimensional object - there are no columns associated with it. You see the index of that Series (dates) and probably think that's another column but it's not.
You have two alternatives: you can convert it to a DataFrame (either by calling reset_index() or to_frame()), or use it as a Series.
series.resample('M').sum()
Out:
Dates
2015-01-31 1040
2015-02-28 1927
2015-03-31 193
Freq: M, Name: Amount, dtype: int64
Since you already have an index formatted as date, grouping by month with resample is very straightforward so I'd suggest keeping it as a Series.
However, you can always convert it to a DataFrame with:
df = series.to_frame('Value')
Now, you can use df['Value'] to select that single column. resampling can be done both on the DataFrame and the Series:
df.resample('M').sum()
Out:
Value
Dates
2015-01-31 1040
2015-02-28 1927
2015-03-31 193
And you can access the index if you want to use that in plotting:
series.index # df.index would return the same
Out:
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04',
'2015-01-05', '2015-01-06', '2015-01-07', '2015-01-12',
'2015-01-27', '2015-01-28', '2015-01-29', '2015-01-30',
'2015-02-02', '2015-02-03', '2015-02-04', '2015-02-05',
'2015-02-06', '2015-02-09', '2015-02-10', '2015-02-11',
'2015-02-12', '2015-02-13', '2015-02-17', '2015-02-18',
'2015-02-19', '2015-02-20', '2015-02-23', '2015-02-24',
'2015-02-25', '2015-02-26', '2015-02-27', '2015-03-02',
'2015-03-03'],
dtype='datetime64[ns]', name='Dates', freq=None)
Note: For basic time-series charts, you can use pandas' plotting tools.
df.plot() produces a line chart of the daily amounts, and df.resample('M').sum().plot() produces a chart of the monthly totals.
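For completeness, here is a minimal end-to-end sketch based on the CSV layout shown in the question (file and column names taken from there; it uses read_csv instead of the old Series.from_csv, which recent pandas versions no longer provide):
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV into a DataFrame indexed by date, then take the single column
df = pd.read_csv('FCdata.csv', parse_dates=['Dates'], index_col='Dates')
series = df['Amount']

# Sum the daily amounts into monthly totals and plot them
monthly = series.resample('M').sum()
print(monthly)

monthly.plot()
plt.ylabel('Amount')
plt.show()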
I have written a Python program to get data from a CSV using pandas and plot the data using matplotlib. My code is below:
import pandas as pd
import datetime
import csv
import matplotlib.pyplot as plt
headers = ['Sensor Value','Date','Time']
df = pd.read_csv(r'C:/Users\Lala Rushan\Downloads\DataLog.CSV', parse_dates={"Datetime": [1, 2]}, names=headers)
#pd.to_datetime(df['Date'] + ' ' + df['Time'])
#df.apply(lambda r : pd.datetime.combine(r['Date'],r['Time']),)
print (df)
#f = plt.figure(figsize=(10, 10))
df.plot(x='Datetime', y='Sensor Value')
plt.title('Title here!', color='black')
plt.tight_layout()
plt.show()
Now, as you can see, the x-axis looks horrible. How can I format the x-axis so that the date and time labels do not overlap each other? I have stored both date and time as one column in my DataFrame.
My Dataframe looks like this:
Datetime Sensor Value
0 2017/02/17 19:06:17.188 2
1 2017/02/17 19:06:22.360 72
2 2017/02/17 19:06:27.348 72
3 2017/02/17 19:06:32.482 72
4 2017/02/17 19:06:37.515 74
5 2017/02/17 19:06:42.580 70
Hacky way
Try this:
import pylab as pl
pl.xticks(rotation = 90)
It will rotate the labels by 90 degrees, thus eliminating overlap.
Cleaner way
Check out this link which describes how to use fig.autofmt_xdate() and let matplotlib pick the best way to format your dates.
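For example, a small sketch along those lines (it assumes the Datetime column from the question and parses it first):
import pandas as pd
import matplotlib.pyplot as plt

df['Datetime'] = pd.to_datetime(df['Datetime'])

fig, ax = plt.subplots()
ax.plot(df['Datetime'], df['Sensor Value'])
fig.autofmt_xdate()  # rotates and right-aligns the date tick labels
plt.show()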
Pandas way
Use to_datetime() and set_index with DataFrame.plot():
df['Datetime'] = pd.to_datetime(df['Datetime'])
df = df.set_index('Datetime')  # set_index returns a new DataFrame, so assign it back
df['Sensor Value'].plot()
pandas will then take care of formatting the date axis nicely for you.