I have an array like this one:
dt64 = array(['1970-01-01', '1970-01-02', '1970-02-03', '1970-02-04',
'1970-03-05', '1970-03-06', '1970-04-07', '1970-04-08',
'1970-05-09', '1970-05-10', '1970-06-11', '1970-06-12',
'1970-07-13', '1970-07-14'], dtype='datetime64[D]')
Now I want to plot some data associated with the individual elements of the array. In the matplotlib figure I need to draw a line that changes color for certain months.
I want to draw the months March to August in orange and the others in blue.
I think I have to make two plt.plot calls, one for the orange line and one for the blue line.
My problem is that I struggle to slice these datetime64 objects in a way that returns the month, so that I can compare it against the required months.
So far I have:
import numpy as np
from matplotlib import pyplot as plt
def md_plot(dt64=np.array, md=np.array):
    """Creates a plot of the Mars distance (y-axis) over time (x-axis)."""
    plt.style.use('seaborn-whitegrid')
    y, m, d = dt64.astype(int) // np.c_[[10000, 100, 1]] % np.c_[[10000, 100, 100]]
    dt64 = y.astype('U4').astype('M8') + (m-1).astype('m8[M]') + (d-1).astype('m8[D]')
    plt.plot(dt64, md, color='orange', label='Half-year of rising temperatures')
    plt.plot(dt64, md, color='blue', label='Half-year of falling temperatures')
    plt.xlabel("Time in years\n")
    plt.xticks(rotation=45)
    plt.ylabel("Mars distance in AU\n(1 AU = 149,597,870.7 km)")

plt.figure('viewed globally...')  # comment out this block if necessary
#plt.style.use('seaborn-whitegrid')
md_plot(master_array[:,0], master_array[:,1])  # graph
plt.show()
plt.close()
This idea seemed to work, but won't work for a whole array:
In [172]: dt64[0].astype(datetime.datetime).month
Out[172]: 1
I'm really trying to avoid Pandas because I don't want to bloat my script when the task can be done with the modules I am already using. I also read that it would decrease the speed here.
If I understand you correctly, this would do it:
[np.datetime64(i,'M') for i in dt64]
Converting to python datetime in an intermediate step:
from datetime import datetime
import numpy as np
datestrings = np.array(["18930201", "19840404"])
months = np.array([datetime.strptime(d, "%Y%m%d").month for d in datestrings])
print(months)
# out: [2 4]
My version of numpy may be dated, but when I ran np.datetime64(dt64[0]) I got numpy.datetime64('1970-01').
To get just the month (if that's what you are looking for), try:
np.datetime_as_string(dt64[0]).split('-')[1]
This solution fits best for me:
dt64[(dt64.astype('M8[M]') - dt64.astype('M8[Y]')).view(int) == 2]
Thanks to Paul Panzer.
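For completeness, here is a minimal, self-contained sketch of how that month trick can be turned into the two colored lines; the md values are placeholders, not the real Mars-distance data. The expression (dt64.astype('M8[M]') - dt64.astype('M8[Y]')) gives the month offset 0-11 within each year, which is why == 2 in the accepted line selects March.
import numpy as np
from matplotlib import pyplot as plt

dt64 = np.array(['1970-01-01', '1970-01-02', '1970-02-03', '1970-02-04',
                 '1970-03-05', '1970-03-06', '1970-04-07', '1970-04-08',
                 '1970-05-09', '1970-05-10', '1970-06-11', '1970-06-12',
                 '1970-07-13', '1970-07-14'], dtype='datetime64[D]')
md = np.linspace(1.4, 2.6, dt64.size)  # placeholder values instead of the real Mars distances

# month number 1..12 for every date, computed purely in numpy
months = (dt64.astype('M8[M]') - dt64.astype('M8[Y]')).astype(int) + 1

warm = (months >= 3) & (months <= 8)   # March through August
plt.plot(dt64[warm], md[warm], color='orange', label='Mar-Aug')
plt.plot(dt64[~warm], md[~warm], color='blue', label='Sep-Feb')
plt.legend()
plt.show()
With a real dataset you may want to split each contiguous run of dates into its own plot call so the lines do not bridge the gaps between seasons, but the mask itself is the whole trick.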
Related
I am a student and I want to plot my measured values in a 3-D graph (I work in a Jupyter notebook). I have recorded different measurement signals over time. My problem is that my instrument measures at different wavelengths, so I would like to plot my measurements over time at the different wavelengths in a 3-D graph.
Unfortunately I have no experience in programming. Online I found how to create a 3-D plot, but the examples there used given matrix values. I don't know how to insert my measured values from an external file; my columns are the results of the individual measured values and my rows are the different times.
Enclosed I send you what I have so far. I have also tried to add a column with the time to my data, I just don't know what to do next.
I hope you can help me. I have already searched a lot, but having no background in using Jupyter notebooks, I do not know what to look for.
Thank you very much!
Ginny
Addition:
I've come to the conclusion that I am trying to create a waterfall plot/graph like the ones Origin offers. I hope someone can help me to execute this. I've been researching a lot and still have no clue how to do this.
P.S. I'm new here, so if I do something wrong, please inform me. I am also not a native speaker, and it could be possible that I used the wrong terminology.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#import glob
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
# from mpl_toolkits.mplot3d import Axes3D: import is required for Matplotlib versions before 3.2.0.
# For versions 3.2.0 and higher, you can plot 3D plots without importing mpl_toolkits.mplot3d.Axes3D.
#_____________________________________
# generate the configured wavelengths so they can serve as column names in the table
# linspace() function
xlambda = np.linspace(start=190, stop=600, num=206)
# round off the result
yLambda = np.round(xlambda)
# print("linspace of Xlambda :\n", yLambda)
#_________________________________________________
# now the file can be read in
Messung1 = pd.read_csv("ViB00009_Messung05_Mess 650mM Ammoniumacetat_19.07.2022 03_47_36_010-3D.asc",
                       skiprows=14, delimiter='\t', names=yLambda)
### only the assignment of the time is still missing
#______
# generate the time axis
# linspace() function
xZeit = np.linspace(start=0.00, stop=347.6, num=11252)
# round off the result
yZeit = np.round(xZeit)
# print("linspace of XZeit :\n", yZeit)
#_____
Messung1['Zeit'] = yZeit.tolist()
#_____
Messung1 = Messung1.set_index('Zeit', drop=False).rename_axis(None)
#_____
# 3D plot
# start out plotting (uses a subplot as that can be 3d)
fig = plt.figure(figsize=(8,8))
ax = fig.add_subplot(111, projection='3d')
# pull out the 3 columns that we want
xs = []
ys = []
zs = []
for index, row in Messung1.iterrows():
    xs.append(row['Zeit'])
    ys.append(row['0.1'])
    #read_table uses first column as index
    zs.append(row['0'])
# based on our data, set the extents of the axes
plt.xlim(min(xs), max(xs))
plt.ylim(min(ys), max(ys))
ax.set_zlim(min(zs), max(zs))
# standard scatter diagram (except it is 3d)
ax.plot(xs, ys, zs)  # ax.scatter is also possible for individual measurement points
ax.set_xlabel('Zeit')
ax.set_ylabel('MPG')
ax.set_zlabel('Signal190')
ax.set_title("3D output")
plt.show()
Output plot: https://i.stack.imgur.com/RvG6A.png
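Since the goal is an Origin-style waterfall plot, one common approach is to draw one 3D line per wavelength, with time on the x-axis, the wavelength as a constant y for each line, and the signal on z. The sketch below is not tied to the .asc file; the arrays are made-up stand-ins for the real measurement, with rows as times and columns as wavelengths:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # only needed for Matplotlib < 3.2

# synthetic stand-ins: rows = times, columns = wavelengths
zeit = np.linspace(0.0, 347.6, 200)           # time axis, coarser than the real 11252 points
wavelengths = np.linspace(190, 600, 10)       # a handful of wavelengths instead of all 206
signal = (np.exp(-((zeit[:, None] - 150.0) / 60.0) ** 2)
          * np.linspace(1.0, 0.3, wavelengths.size))  # shape (len(zeit), len(wavelengths))

fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')

# one line per wavelength, offset along the y-axis -> waterfall look
for j, wl in enumerate(wavelengths):
    ax.plot(zeit, np.full_like(zeit, wl), signal[:, j])

ax.set_xlabel('Time / s')
ax.set_ylabel('Wavelength / nm')
ax.set_zlabel('Signal')
ax.set_title('Waterfall sketch')
plt.show()
With the real DataFrame you would replace zeit by the Zeit column, wavelengths by the numeric column names, and signal[:, j] by the corresponding Messung1 column; plotting all 206 columns works the same way but gets visually dense, so picking every n-th wavelength is often enough.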
My name is Luis Francisco Gomez and I am taking the course Intermediate Python > 1 Matplotlib > Sizes, which belongs to the Data Scientist with Python track on DataCamp. I am reproducing the exercises of the course; in this part you have to make a scatter plot in which the size of the points is proportional to the population of the countries. I tried to reproduce the results of DataCamp with this code:
# load subpackage
import matplotlib.pyplot as plt
## load other libraries
import pandas as pd
import numpy as np
## import data
gapminder = pd.read_csv("https://assets.datacamp.com/production/repositories/287/datasets/5b1e4356f9fa5b5ce32e9bd2b75c777284819cca/gapminder.csv")
gdp_cap = gapminder["gdp_cap"].tolist()
life_exp = gapminder["life_exp"].tolist()
# create an np array that contains the population
pop = gapminder["population"].tolist()
pop_np = np.array(pop)
plt.scatter(gdp_cap, life_exp, s = pop_np*2)
# Previous customizations
plt.xscale('log')
plt.xlabel('GDP per Capita [in USD]')
plt.ylabel('Life Expectancy [in years]')
plt.title('World Development in 2007')
plt.xticks([1000, 10000, 100000],['1k', '10k', '100k'])
# Display the plot
plt.show()
However I get this:
But in theory I should get this:
I don't understand what the problem is with the argument s in plt.scatter.
You need to scale your s:
plt.scatter(gdp_cap, life_exp, s = pop_np*2/1000000)
Per the docs, s is the marker size in points**2.
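If you want the bubble sizes to stay readable regardless of the units of the underlying column, one option (not part of the DataCamp exercise, just a common pattern) is to rescale the values onto a fixed range of marker areas. The scale_sizes helper below is a hypothetical name, and the snippet reuses gdp_cap, life_exp and pop_np from the question's code:
import numpy as np
import matplotlib.pyplot as plt

def scale_sizes(values, smin=10, smax=400):
    """Hypothetical helper: map positive values linearly onto a marker-area range in points**2."""
    v = np.asarray(values, dtype=float)
    return smin + (v - v.min()) / (v.max() - v.min()) * (smax - smin)

plt.scatter(gdp_cap, life_exp, s=scale_sizes(pop_np))
plt.xscale('log')
plt.show()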
This is because your sizes are too large; scale them down. Also, there's no need to create all the intermediate arrays:
plt.scatter(gapminder.gdp_cap,
gapminder.life_exp,
s=gapminder.population/1e6)
Output:
I think you should use
plt.scatter(gdp_cap, life_exp, s = gdp_cap*2)
or maybe reduce or scale pop_np
I am trying to plot a linear line with associated error.
I calculated values for slope (a) and intercepts (b). In addition, I calculated the error associated with these values. So I drew the line given by the typical formula below.
y=ax+b
However, in addition to the line, I also want to draw the associated error. I came up with the idea to draw the lines associated with these formulas and color the space between the lines gray.
y=(a+a_sd)x+(b+b_sd)
y=(a-a_sd)x+(b-b_sd)
Using the following piece of code, I am able to color part of the area between the lines, but not the whole span (see the included output).
I think this may be due to the fact that "distance" is not sorted, and fill_between is using distance[0] and distance[-1] as the beginning and end of the span, respectively.
As always, any help would be highly appreciated!
import matplotlib.pyplot as plt
distance=[0.35645334340084989, 0.55406894241607718, 0.10201413273193734, 0.13401365724625941, 0.71918808865838735, 0.14151335417722818]
time=[2.4004984846346171, 2.4909766335028447, 1.9852064018125195, 1.9083156734132103, 2.6380396934372863, 1.9114505780323543]
time_SD=[0.062393810960652669, 0.056945715242838917, 0.073960838867327183, 0.084111239062664475, 0.026912957190265499, 0.08595664694840538]
distance_SD=[0.035160608598240162, 0.032976715460514235, 0.02782911002465227, 0.035465701695038584, 0.043009444687382707, 0.038387585107200854]
a=1.17887019041
b=1.83339229489
a_sd=0.159771527859
b_sd=0.0762509747218
plt.errorbar(distance,time,yerr=time_SD, xerr=distance_SD, linestyle="None")
abline_values = [(a)*i + (b) for i in distance]
abline_values_plus = [(a+a_sd)*i + (b+b_sd) for i in distance]
abline_values_minus = [(a-a_sd)*i + (b-b_sd) for i in distance]
plt.plot(distance, abline_values,"r")
plt.fill_between(distance,abline_values_minus,abline_values_plus,facecolor='lightgrey', interpolate=True, edgecolors="None")
leg = plt.legend(loc="lower right", frameon=False, handlelength=0, handletextpad=0)
for item in leg.legendHandles:
    item.set_visible(False)
plt.show()
In order to use pyplot.fill_between(), the list holding the horizontal coordinates should be sorted. Using an unsorted list of x values is possible, but can lead to undesired results.
Sorting a list can be done using sorted(list).
import matplotlib.pyplot as plt
distance=[0.35645334340084989, 0.55406894241607718, 0.10201413273193734, 0.13401365724625941, 0.71918808865838735, 0.14151335417722818]
time=[2.4004984846346171, 2.4909766335028447, 1.9852064018125195, 1.9083156734132103, 2.6380396934372863, 1.9114505780323543]
time_SD=[0.062393810960652669, 0.056945715242838917, 0.073960838867327183, 0.084111239062664475, 0.026912957190265499, 0.08595664694840538]
distance_SD=[0.035160608598240162, 0.032976715460514235, 0.02782911002465227, 0.035465701695038584, 0.043009444687382707, 0.038387585107200854]
a=1.17887019041
b=1.83339229489
a_sd=0.159771527859
b_sd=0.0762509747218
distance_sorted = sorted(distance)
plt.errorbar(distance,time,yerr=time_SD, xerr=distance_SD, linestyle="None")
abline_values = [(a)*i + (b) for i in distance_sorted]
abline_values_plus = [(a+a_sd)*i + (b+b_sd) for i in distance_sorted]
abline_values_minus = [(a-a_sd)*i + (b-b_sd) for i in distance_sorted]
plt.plot(distance_sorted, abline_values,"r")
plt.fill_between(distance_sorted,abline_values_minus,abline_values_plus, facecolor='lightgrey', edgecolors="None")
plt.show()
The documentation does not mention the requirement that the x values be sorted. The reason is probably that fill_between actually works even with unsorted lists, just not the way one might expect. Maybe the following animation gives a more intuitive understanding of the issue:
You are right, fill_between seems to expect the values to be sorted. The documentation is not clear about this behaviour though. The following example, however, shows the same effect:
import matplotlib.pyplot as plt
from numpy import random, array
#x = random.randn(20) #does not work
x = array(sorted(random.randn(20))) #works
a = 2
d = .5
y_h = x*(a+d)
y_l = x*(a-d)
plt.fill_between(x,y_h, y_l)
plt.show()
As a workaround, just sort your values with sorted() before calculating your error lines.
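Another pattern worth mentioning (not taken from the answers above, just a common alternative) is to evaluate the fitted line and its error band on a dense, inherently sorted grid built with np.linspace, so the band spans the full x range regardless of how the measurements are ordered. The sketch reuses the distance, time, time_SD, distance_SD lists and the fit parameters a, b, a_sd, b_sd from the question:
import numpy as np
import matplotlib.pyplot as plt

# dense, already-sorted x grid spanning the measured distances
x = np.linspace(min(distance), max(distance), 100)

plt.errorbar(distance, time, yerr=time_SD, xerr=distance_SD, linestyle="None")
plt.plot(x, a * x + b, "r")
plt.fill_between(x, (a - a_sd) * x + (b - b_sd), (a + a_sd) * x + (b + b_sd),
                 facecolor='lightgrey')
plt.show()
If the y values were measured data paired with each x rather than computed from the fit, np.argsort(distance) would give the ordering needed to sort both arrays consistently.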
I need help with reading multiple netCDF files; despite the few examples on here, none of them works properly for me.
I am using Python(x,y) vers 2.7.5 and other packages: netCDF4 1.0.7-4, matplotlib 1.3.1-4, numpy 1.8, pandas 0.12,
basemap 1.0.2...
There are a few things I'm used to doing with GrADS that I need to start doing in Python.
I have several years of 2 meter temperature data (4-hourly data, one file per year, from ECMWF); each file contains 2 meter temp data with Xsize=480, Ysize=241,
Zsize(level)=1, Tsize(time)=1460, or 1464 for leap years.
This is what my file names look like: t2m.1981.nc, t2m.1982.nc, t2m.1983.nc ...etc.
Based on this page:
( Loop through netcdf files and run calculations - Python or R )
Here is where I am now:
from pylab import *
import netCDF4 as nc
from netCDF4 import *
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import numpy as np
f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc') # '????' matches the year in each file name
t2mtr = f.variables['t2m']
ntimes, ny, nx = shape(t2mtr)
temp2m = zeros((ny,nx),dtype=float64)
print ntimes
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:] #I'm not sure how to slice this, just wanted to get the 00Z values.
    # is it possible to assign to a new array,...
    #... (for eg.) the average values of 00z for January only from 1981-2000?
#creating a NetCDF file
nco = nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','w',clobber=True)
nco.createDimension('x',nx)
nco.createDimension('y',ny)
temp2m_v = nco.createVariable('t2m', 'i4', ( 'y', 'x'))
temp2m_v.units='Kelvin'
temp2m_v.long_name='2 meter Temperature'
temp2m_v.grid_mapping = 'Lambert_Conformal' # can it be something else or ..
#... eliminated?).This is straight from the solution on that webpage.
lono = nco.createVariable('longitude','f8')
lato = nco.createVariable('latitude','f8')
xo = nco.createVariable('x','f4',('x')) #not sure if this is important
yo = nco.createVariable('y','f4',('y')) #not sure if this is important
lco = nco.createVariable('Lambert_Conformal','i4') #not sure
#copy all the variable attributes from original file
for var in ['longitude','latitude']:
    for att in f.variables[var].ncattrs():
        setattr(nco.variables[var],att,getattr(f.variables[var],att))
# copy variable data for lon,lat,x and y
lono=f.variables['longitude'][:]
lato=f.variables['latitude'][:]
#xo[:]=f.variables['x']
#yo[:]=f.variables['y']
# write the temp at 2 m data
temp2m_v[:,:]=temp2m
# copy Global attributes from original file
for att in f.ncattrs():
setattr(nco,att,getattr(f,att))
nco.Conventions='CF-1.6' #not sure what is this.
nco.close()
#attempt to plot the 00zJan mean
file=nc.Dataset('d:/data/ecmwf/t2m.00zJan.nc','r')
t2mtr=file.variables['t2m'][:]
lon=file.variables['longitude'][:]
lat=file.variables['latitude'][:]
clevs=np.arange(0,500.,10.)
map = Basemap(projection='cyl',llcrnrlat=0.,urcrnrlat=10.,llcrnrlon=97.,urcrnrlon=110.,resolution='i')
x,y=map(*np.meshgrid(lon,lat))
cs = map.contourf(x,y,t2mtr,clevs,extend='both')
map.drawcoastlines()
map.drawcountries()
plt.plot(cs)
plt.show()
My first question concerns temp2m += t2mtr[i,:,:]. I am not sure how to slice the data to get only the 00Z values (let's say for January only) from all files.
Second, while running the test, an error came up at cs = map.contourf(x,y,t2mtr,clevs,extend='both') saying "shape does not match that of z: found (1,1) instead of (241,480)". I suspect something went wrong when the output values were recorded, but I can't figure out what or where.
Thanks for your time. I hope this is not confusing.
So t2mtr is a 3d array
ntimes, ny, nx = shape(t2mtr)
This sums all values across the 1st axis:
for i in xrange(ntimes):
    temp2m += t2mtr[i,:,:]
A better way to do this is:
temp2m = np.sum(tm2tr, axis=0)
temp2m = tm2tr.sum(axis=0) # alt
If you want the average, use np.mean instead of np.sum.
To average across a subset of the times, jan_times, use an expression like:
jan_avg = np.mean(tm2tr[jan_times,:,:], axis=0)
This is simplest if you want just a simple range, e.g. the first 31 times. For simplicity I'm assuming the data is daily and the years are a constant length. You can adjust things for the 4-hourly frequency and leap years.
tm2tr[0:31,:,:]
A simplistic way of getting Jan data for several years is to construct an index like:
yr_starts = np.arange(0,3)*365 # can adjust for leap years
jan_times = (yr_starts[:,None]+ np.arange(31)).flatten()
# array([ 0, 1, 2, ... 29, 30, 365, ..., 756, 757, 758, 759, 760])
Another option would be to reshape tm2tr (doesn't work well for leap years).
tm2tr.reshape(nyrs, 365, ny, nx)[:,0:31,:,:].mean(axis=1)
You could test the time sampling with something like:
np.arange(5*365).reshape(5,365)[:,0:31].mean(axis=1)
Doesn't the data set have a time variable? You might be able to extract the desired time indices from that. I worked with ECMWF data a number of years ago, but don't remember a lot of the details.
As for your contourf error, I would check the shape of the 3 main arguments: x,y,t2mtr. They should match. I haven't worked with Basemap.
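To pick out the 00Z January records via the time variable, as suggested above, here is a hedged sketch; it assumes the files carry a CF-style time variable named 'time' with units and calendar attributes, which may differ in your files:
import numpy as np
import netCDF4 as nc

f = nc.MFDataset('d:/data/ecmwf/t2m.????.nc')
time_var = f.variables['time']                      # 'time' is an assumed variable name
dates = nc.num2date(time_var[:], time_var.units,
                    getattr(time_var, 'calendar', 'standard'))

# indices of the January records taken at 00Z
idx = np.where([d.month == 1 and d.hour == 0 for d in dates])[0]

# mean over those times; for many years it may be gentler on memory
# to accumulate the sum year by year instead of reading all slices at once
jan00z_mean = f.variables['t2m'][idx, :, :].mean(axis=0)
print(jan00z_mean.shape)   # should be (241, 480)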
Dear fellow coders and science guys :)
I am using Python with numpy and matplotlib to simulate a perceptron, and I'm proud to say it works pretty well.
I used Python even though I'd never seen it before, because I heard matplotlib offers amazing graph visualisation capabilities.
Using the functions below I get a 2D array that looks like this:
[[alpha_1, 900], [alpha_2, 600], ..., [alpha_99, 900]]
So I get this 2D array and would love to write a function that would enable me to analyze the convergence.
I am looking for something that will easily and intuitively (don't have time to study a whole new library for 5 hours now) draw a function like this sketch:
def get_convergence_for_alpha(self, _alpha):
    epochs = []
    for i in range(0, 5):
        epochs.append(self.perceptron_algorithm())
        self.weights = self.generate_weights()
    avg = sum(epochs, 0) / len(epochs)
    res = [_alpha, avg]
    return res
And this is the whole generation function.
def alpha_convergence_function(self):
    res = []
    for i in range(1, 100):
        res.append(self.get_convergence_for_alpha(i / 100))
    return res
Is this easily doable?
You can convert your nested list to a 2d numpy array and then use slicing to get the alphas and epoch counts (just like in matlab).
import numpy as np
import matplotlib.pyplot as plt
# code to simulate the perceptron goes here...
res = your_object.alpha_convergence_function()
res = np.asarray(res)
print('array size:', res.shape)
plt.xkcd() # so you get the sketchy look :)
# first column -> x-axis, second column -> y-axis
plt.plot(res[:,0], res[:,1])
plt.show()
Remove the plt.xkcd() line if you don't actually want the plot to look like a sketch...