Time Series Analysis from reading a csv file with pandas - python

I want to write a Python program that plots a graph for every third column, using columns 1 and 2 as the date and time (x axis) and column 3 as the value (y axis). Can someone help with the code?

Related

pandas/matplotlib graph on frequency of appearance

I am a pandas newbie and I want to make a graph from a CSV file. Each row of the file contains a date, and I want to plot how often each date appears.
This is how it looks:
2022-01-12
2022-01-12
2022-01-12
2022-01-13
2022-01-13
2022-01-14
Here, we can see that I have three records on 12 January, two on the 13th and only one on the 14th, so the graph should show a decrease.
So, I tried converting my CSV like this:
date,records
2022-01-12,3
2022-01-13,2
2022-01-14,1
And then I made a graph with the date on the x axis and the number of records on the y axis.
But is there a way pandas (or matplotlib, I never understand which one to use) can make a graph based on the frequency of appearance, so that I don't have to convert the CSV beforehand?
pandas has a function, value_counts(), that counts how many times each value occurs in a column.
First off, you'd need to read your CSV file into a DataFrame. Do this by using:
import pandas as pd
df = pd.read_csv("~csv file name~")
Using the unique() function in the pandas library, you can get all of the unique values in a column. Note that a column is selected with square brackets, not parentheses:
uniqueVals = df["~column name~"].unique()
That returns an array of the unique values. Then use value_counts() on the same column to count the occurrences of every value, and look up the count for each unique date:
counts = df["~column name~"].value_counts()
totalOfVals = []
for date in uniqueVals:
    numDate = counts[date]  # number of rows containing this date
    totalOfVals.append(numDate)
Then you can pass the two sequences, the unique dates and their counts, to matplotlib to create a graph.
You'll want to use the syntax:
import matplotlib.pyplot as mpl
mpl.plot(uniqueVals, totalOfVals, color = "~whatever colour you want the line to be~", marker = "~whatever you want the marker to look like~")
mpl.xlabel('Date')
mpl.ylabel('Number of occurrences')
mpl.title('Number of occurrences of dates')
mpl.grid(True)
mpl.show()
That should display a graph of the dates against the number of occurrences, with a grid behind it. If you don't want the grid, either call mpl.grid(False) or remove that line.
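For reference, the counting step can also be done without the explicit loop. The following is only a minimal sketch under some assumptions: the sample data is inlined instead of read from a real file, the column name "date" is made up because the file in the question has no header row, and plot_counts is a hypothetical helper name.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the CSV file from the question.
csv_text = (
    "2022-01-12\n2022-01-12\n2022-01-12\n"
    "2022-01-13\n2022-01-13\n"
    "2022-01-14\n"
)

# The file has no header row, so the single column is named explicitly
# ("date" is an assumed name) and parsed as datetimes.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["date"], parse_dates=["date"])

# value_counts() tallies each distinct date; sort_index() puts the dates
# in chronological order instead of ordering by frequency.
counts = df["date"].value_counts().sort_index()


def plot_counts(counts):
    """Plot the number of occurrences per date (hypothetical helper)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(counts.index, counts.values, marker="o")
    ax.set_xlabel("Date")
    ax.set_ylabel("Number of occurrences")
    ax.grid(True)
    return fig
```

Calling plot_counts(counts) followed by matplotlib's plt.show() should then display the line graph; counts.plot() is a shorter alternative that goes through pandas' own plotting wrapper.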

How do I convert unusual time string into date time

I measured the seeing index and I need to plot it as a function of time, but the time I received from the measurement is a string in the format 02-09-2022_time_11-53-51,045. How can I convert it into something Python can read and that I can use in my plot?
Using pandas I extracted the time and seeing_index columns from the txt file produced by the measurement. Python correctly plotted the seeing index values on the Y axis, but instead of plotting the time values on the X axis, it just numbered each row and plotted the index against the row number. What can I do so that it plots the index against time?
You may try this:
df.time = pd.to_datetime(df.time, format='%d-%m-%Y_time_%H-%M-%S,%f')
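Here is a small sketch of how that conversion might fit into the whole workflow. The inlined file contents, the ";" separator and the sample values are assumptions made for illustration; only the column names and the time format come from the question.

```python
import io

import pandas as pd

# Hypothetical file contents; a ";" separator is assumed because the
# time strings themselves contain a comma.
raw = io.StringIO(
    "time;seeing_index\n"
    "02-09-2022_time_11-53-51,045;1.23\n"
    "02-09-2022_time_11-53-52,045;1.31\n"
)
df = pd.read_csv(raw, sep=";")

# Parse day-month-year, the literal "_time_", then hours-minutes-seconds
# and the fractional seconds after the comma.
df["time"] = pd.to_datetime(df["time"], format="%d-%m-%Y_time_%H-%M-%S,%f")

# With the timestamps as the index, plotting uses them for the x axis
# instead of the row numbers.
df = df.set_index("time")
```

After this, df["seeing_index"].plot() should draw the index against time rather than against the row number.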

xlwings - Get data range of existing chart

let's say I have an excel file, where there is data from A1 to C5. Meaning it looks like this:
     A     B   C
1    1997  1   2
2    1997  2   4
3    1997  3   5
I now have one graph that plots the first time series B so the range of the graph is "A1:B3". The second graph is plotting time series C so the range in xlwings language is ("A1:A3, C1:C3").
What I want to do is open the graph in python with xlwings and extract the range of the graph. I already tried:
import xlwings as xw
wb = xw.Book("myfile.xlsx")
ws = wb.sheets["mysheet"]
for chart in ws.charts:
    print(chart.parent.used_range)
But this only gives back the range of all data of that sheet. So in this case "A1:C3" and not the range of the data the chart uses.
Is there any way to extract the exact range of data the chart uses?
Best,
Stefan
Even directly in VBA, the overall chart source data range is not available. In many cases, this range is undefined: if series have different X values, for example, or if series have a different number of points, or if series are plotted out of order, etc.
But you can get the range for the individual series in the chart through the series formulas, and along with some validation and adjustment, merge these ranges to get the source data range.
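To illustrate the series-formula idea, here is a rough sketch in plain Python. It assumes you have already obtained the formula text of each series as a string (how to read it through xlwings depends on the platform and is not shown), and that it has the usual =SERIES(name, x values, y values, plot order) shape. The function name parse_series_formula and the example formula are hypothetical.

```python
def parse_series_formula(formula):
    """Split an Excel =SERIES(...) formula into its range references.

    Assumes the common four-argument form:
    =SERIES(name, x_values, y_values, plot_order)
    """
    prefix = "=SERIES("
    if not (formula.startswith(prefix) and formula.endswith(")")):
        raise ValueError("not a SERIES formula: " + formula)
    inner = formula[len(prefix):-1]

    args, current = [], []
    depth, in_quote = 0, False
    for ch in inner:
        if ch == "'":
            # Quoted sheet names may contain commas or parentheses.
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # A top-level comma ends the current argument.
                args.append("".join(current))
                current = []
                continue
        current.append(ch)
    args.append("".join(current))

    # Pad so that formulas with empty or missing arguments still unpack.
    name, x_values, y_values, order = (args + [""] * 4)[:4]
    return {"name": name, "x_values": x_values, "y_values": y_values, "order": order}


# Hypothetical formula for the second chart described in the question.
example = parse_series_formula(
    "=SERIES(mysheet!$C$1,mysheet!$A$2:$A$3,mysheet!$C$2:$C$3,2)"
)
```

Merging the x_values and y_values references of every series, after the validation mentioned above, would then give the overall source data range.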

Unable to Plot using Seaborn

Hi there. My dataset is as follows:
username switch_state time
abcd sw-off 07:53:15 +05:00
abcd sw-on 07:53:15 +05:00
Using this, I need to find how many times on a given day the switch state is changed, i.e. switched on or switched off. My test code is given below:
# keep only the rows where the switch was turned off
switch_off = df.loc[df['switch_state'] == 'sw-off']
# group by time and username, count the rows in each group, and pivot the usernames into columns
groupy_result = switch_off.groupby(['time', 'username']).count()['switch_state'].unstack()
The result of this groupby is:
print(groupy_result)
username abcd
time
05:08:35 3
07:53:15 3
07:58:40 1
As you can see, the counts are attached to the time index. I need to separate them so that I can plot them with a Seaborn scatter plot, with x=time and y=count.
Kindly help me out with how I can plot this.
You can try the following to get the data as a DataFrame:
df = df.loc[df['switch_state']=='sw-off'].copy()  # copy so the new column is not written to a view
df['count'] = df.groupby(['username','time'])['username'].transform('count')
These two lines give you an updated DataFrame df with an added column called count.
df = df.drop_duplicates(subset=['username', 'time'], keep='first')
The above line removes the duplicate rows. Then you can plot df['time'] against df['count'] (with matplotlib.pyplot imported as plt):
plt.scatter(df['time'], df['count'])
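Since the question asks about Seaborn specifically, here is one possible sketch. It swaps the transform and drop_duplicates steps for groupby(...).size(), which produces the same per-group counts in a tidy table. The sample rows are made up (time zone offsets are left out for brevity) and plot_counts is a hypothetical helper.

```python
import io

import pandas as pd

# Hypothetical sample rows in the shape shown in the question.
raw = io.StringIO(
    "username,switch_state,time\n"
    "abcd,sw-off,07:53:15\n"
    "abcd,sw-on,07:53:15\n"
    "abcd,sw-off,07:53:15\n"
    "abcd,sw-off,07:58:40\n"
)
df = pd.read_csv(raw)

# Keep only the "off" events, then count rows per (username, time) pair.
switch_off = df[df["switch_state"] == "sw-off"]
tidy = (
    switch_off.groupby(["username", "time"])
    .size()
    .reset_index(name="count")  # turn the group keys back into ordinary columns
)


def plot_counts(tidy):
    """Scatter plot of count against time (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.scatterplot(data=tidy, x="time", y="count", hue="username")
    plt.show()
    return ax
```

Because reset_index turns time and count into regular columns, they can be passed to Seaborn by name, which is the separation the question asks for.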

How to change axis limits for time in Matplotlib?

I have a data set stored in a Pandas dataframe object, and the first column of the dataframe is a datetime type, which looks like this:
0 2013-09-09 10:35:42.640000
1 2013-09-09 10:35:42.660000
2 2013-09-09 10:35:42.680000
3 2013-09-09 10:35:42.700000
I have another column called eventno, which looks like:
0 0
1 0
2 0
3 0
I am trying to create a scatter plot with Matplotlib, and once I have the scatter plot ready, I would like to change the range in the date axis (x-axis) to focus on certain times in the data. My problem is, I could not find a way to change the range the data will be plotted over in the x axis. I tried this below, but I get a Not implemented for this type error.
plt.figure(figsize=(13,7), dpi=200)
ax.set_xlim(['2013-09-09 10:35:00','2013-09-09 10:36:00'])
scatter(df2['datetime'][df.eventno<11],df2['eventno'][df.eventno<11])
If I comment out the ax.set_xlim line, I get the scatter plot, but with a default x-axis range that does not even match my dates.
Do I have to tell matplotlib that my data is of datetime type, as well? If so, then how can I do it? Assuming this part is somehow accomplished, then how can I change the range of my data to be plotted?
Thanks!
PS: I tried uploading the picture, but I got a "Framing not allowed" error. Oh well... It just plots it from Jan 22 1970 to Jan 27 1970. No clue how it comes up with that :)
Try putting ax.set_xlim after the scatter command.
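A minimal sketch of that ordering, assuming a small made-up DataFrame with the same two columns as in the question. The Agg backend is selected only so the sketch runs without a display; the limits are passed as pd.Timestamp objects rather than raw strings.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data shaped like the columns in the question.
df = pd.DataFrame(
    {
        "datetime": pd.date_range("2013-09-09 10:35:42.640", periods=4, freq="20ms"),
        "eventno": [0, 0, 0, 0],
    }
)

fig, ax = plt.subplots(figsize=(13, 7))

# Plot first, so the x axis already knows it is holding dates...
ax.scatter(df["datetime"], df["eventno"])

# ...then narrow the visible range to the minute of interest.
ax.set_xlim(pd.Timestamp("2013-09-09 10:35:00"), pd.Timestamp("2013-09-09 10:36:00"))
```

Plotting before setting the limits means the axis is already set up for dates when set_xlim is called, so the limits are interpreted as dates.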
