I have a two-column data file (called RealData) that I am able to load and plot in Python with matplotlib, using the following code:
x = RealData[:,0]
y = RealData[:,1]
plt.plot(x,y)
The first few lines of the data are:
1431.11555,-0.02399
1430.15118,-0.02387
1429.18682,-0.02294
1428.22245,-0.02167
1427.25809,-0.02066
1426.29373,-0.02020
1425.32936,-0.02022
1424.36500,-0.02041
1423.40064,-0.02047
1422.43627,-0.02029
1421.47191,-0.01993
1420.50755,-0.01950
1419.54318,-0.01913
1418.57882,-0.01888
.........
I would like to plot the magnitude of the data so that the y component becomes positive, something like
|y| = squareRoot((-0.02399)^2 + (-0.02387)^2 + ... )
I think this would involve some sort of for loop or while loop; however, I am not sure how to construct it. Any help?
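No loop is needed if RealData is a NumPy array. The following is a minimal sketch, assuming the goal is the absolute value of each y value so that the curve becomes positive. Note that the formula as written above would instead collapse the whole column into a single number (the Euclidean norm); both are shown. The small array stands in for RealData and reuses the first rows listed above.

```python
import numpy as np

# Stand-in for RealData: the first three rows shown above
RealData = np.array([
    [1431.11555, -0.02399],
    [1430.15118, -0.02387],
    [1429.18682, -0.02294],
])

x = RealData[:, 0]

# Element-wise magnitude: one positive value per row, same length as x
y_abs = np.abs(RealData[:, 1])

# Euclidean norm of the whole column: a single number, sqrt(sum of squares)
y_norm = np.linalg.norm(RealData[:, 1])
```

With matplotlib imported as plt, plt.plot(x, y_abs) would then draw the positive curve.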
I am reading functions from an existing file using h5py library.
readFile = h5py.File('File','r')
using readFile.keys() I obtained the list of the functions stored in 'File'. One of these functions is the function phi. To print the function phi, I did
phi = numpy.array(readFile['phi'])[:,0,:,:]
The index [:,0,:,:] reflects how the data is stored: [blocks, z, y, x]. z = 0 because it is a 2D case. x is divided into 2 blocks, and y is divided into 2 blocks. Each x block is divided into nxb cells (x1, x2, ..., x20), and each y block is divided into nyb cells. (nxb and nyb can also be obtained directly from the file using h5py, as they are stored in the file. The domain of the data is also stored in the file, under ['bounding box'].)
Then the code for the grid is:
nxb = numpy.array(readFile['integer scalars'])[0][1]
nyb = numpy.array(readFile['integer scalars'])[1][1]
X = numpy.zeros([block, nxb, nyb])
Y = numpy.zeros([block, nxb, nyb])
for block in range(block):
    x_min, x_max = numpy.array(readFile['bounding box'])[block,0,:]
    y_min, y_max = numpy.array(readFile['bounding box'])[block,1,:]
    X[block,:,:], Y[block,:,:] = numpy.meshgrid(numpy.linspace(x_min,x_max,nxb),
                                                numpy.linspace(y_min,y_max,nyb))
My question is this: I am trying to restructure the data (see the figure). I want to move the data of block 2 above the data of block 1 rather than next to it. This means that I need to create new coordinates I' and J' related to the old coordinates I and J. I tried this, but it is not working:
for i in range(X):
    for j in range(Y):
        i' = i - len(X[0:1,:,:])
        j' = j + len(Y[0:1,:,:])
        phi(i',j') = phi
When working with HDF5 data, it's important to understand your data schema before you start writing code. Here are my initial observations and suggestions.
Your question is a little hard to follow. (For example, you are using the term "functions" to describe HDF5 datasets.) HDF5 organizes data in datasets and groups. Your data of interest is in 2 datasets: 'phi' and 'integer scalars'.
You can simplify the code by accessing the datasets as NumPy arrays, as follows:
import h5py

with h5py.File('File','r') as readFile:
    # to get the axis dimensions for 'phi':
    print(f"Shape of Dataset phi: {readFile['phi'].shape}")
    phi_ds = readFile['phi']       # to get a dataset object
    phi_arr = readFile['phi'][()]  # to read dataset as a numpy array
    # to get the axis dimensions for 'integer scalars'
    nxb, nyb = readFile['integer scalars'].shape
I don't understand what you mean by "blocks". Are you referring to the axis dimensions? Also, why are you using meshgrid? If you simply want to change dimensions, use NumPy's .reshape() method to change the axis dimensions of the NumPy array.
Here is a simple example that creates a 2x2 dataset, then reads it into a new array and reshapes it to 4x1. I think this is what you want to do. Change the values of a0 and a1 if you want to increase the size. The reshape operation reads the shape from the first array and reshapes the new array to (N,1), where N is your nxb*nyb value.
import h5py
import numpy as np

# create a small 2x2 dataset
with h5py.File('SO_72340647.h5','w') as h5f:
    a0, a1 = 2,2
    arr = np.arange(a0*a1).reshape(a0,a1)
    h5f.create_dataset('ds_2x2',data=arr)

# read it back and reshape to (N,1)
with h5py.File('SO_72340647.h5','r') as h5f:
    print(f"Shape of Dataset ds_2x2: {h5f['ds_2x2'].shape}")
    ds_arr = h5f['ds_2x2'][()]
    print(ds_arr)
    ds0, ds1 = ds_arr.shape
    new_arr = ds_arr.reshape(ds0*ds1,1)
    print(f"Shape of new (reshaped) array: {new_arr.shape}")
    print(new_arr)
Note: h5py dataset objects "behave like" Numpy arrays. So, you frequently don't have to read into an array to use the data.
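If the goal in the question is to stack the blocks into one larger 2D array rather than to flatten them, np.concatenate may be closer to what is wanted than reshape. The sketch below uses a synthetic array in place of phi and assumes a hypothetical layout of 2 blocks, each of shape (nyb, nxb); the real block ordering and sizes would have to be checked against 'bounding box' in the file.

```python
import numpy as np

# Hypothetical sizes; a synthetic array stands in for phi[blocks, y, x]
n_blocks, nyb, nxb = 2, 3, 4
phi = np.arange(n_blocks * nyb * nxb).reshape(n_blocks, nyb, nxb)

# Blocks placed next to each other along x: shape (nyb, 2*nxb)
side_by_side = np.concatenate([phi[0], phi[1]], axis=1)

# Blocks stacked along y instead: shape (2*nyb, nxb)
stacked = np.concatenate([phi[0], phi[1]], axis=0)
```

Whether block 2 appears above or below block 1 in a plot depends on the order of the list passed to np.concatenate and on the plot's axis origin.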
I'm trying to plot a grid of air pollution data from a netCDF file in Python using xarray. However, I'm facing a couple of roadblocks.
To start off, here is the data that can be used to reproduce my code:
Data
When you try to import this data using xarray.open_dataset, you end up with a file that has zero coordinates or variables, and lots of attributes:
FILE_NAME = "test2.nc"  ## I changed the name to make it shorter
xr.open_dataset(FILE_NAME)
So I created variables of the data and tried to import those into xarray:
prd='PRODUCT'
metdata = "METADATA"
lat= ds.groups[prd].variables['latitude']
lon= ds.groups[prd].variables['longitude']
no2 = ds.groups[prd].variables['nitrogendioxide_tropospheric_column']
scanline = ds.groups[prd].variables['scanline']
time = ds.groups[prd].variables['time']
ground_pixel = ds.groups[prd].variables['ground_pixel']
ds = xr.DataArray(no2,
                  dims=["time","x","y"],
                  coords={
                      "lon":(["time","x", "y"], lon)
                  }
                  # coords=[("time", time), ("x", scanline),("y", ground_pixel)]
                  )
As you can see above, I tried multiple ways of creating the coordinates, but I'm still getting an error. The data in this netCDF file is on an irregular grid, and I just want to be able to plot that accurately and quickly using xarray.
Does someone know how I can do this?
So I am working with a shapefile and a csv file of data points. I have used a raster function to convert the shapefile to a grid based on latitudes/longitudes. Now I need to put data points onto the same grid so that I can see where they fall in comparison to the "shape" produced by the other file. I have opened the csv file using the below code, and removed where the latitude/longitudes are null and made two new numpy arrays of lat/lons. But now I am confused about the next step, so if anyone has done anything similar, how should I proceed?
x = list(csv.reader(open(places,'rb'),delimiter=','))
List1 = zip(*x)
dataDict1 = {}
for column in List1:
    # the first entry of each column is its header; the rest are the values
    dataDict1[column[0]] = column[1:]
lats = np.array(dataDict1['Latitude'])
lons = np.array(dataDict1['Longitude'])
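One possible next step, sketched with made-up values: convert each point's longitude and latitude into the column and row index of the raster cell it falls in, then count the points per cell. The grid origin, cell size and dimensions below are hypothetical and would need to match the ones used when the shapefile was rasterized. The coordinates read from the CSV are strings, so they are converted to floats first.

```python
import numpy as np

# Hypothetical raster definition; must match the grid made from the shapefile
lon_min, lat_max = -10.0, 60.0   # upper-left corner of the grid, in degrees
cell = 0.5                       # cell size in degrees
n_rows, n_cols = 40, 60

# Made-up coordinates, stored as strings the way csv.reader returns them
lats = np.array(['59.9', '50.2', '45.0']).astype(float)
lons = np.array(['-9.9', '0.3', '19.9']).astype(float)

# Column index grows eastward, row index grows southward from the top edge
cols = np.floor((lons - lon_min) / cell).astype(int)
rows = np.floor((lat_max - lats) / cell).astype(int)

# Keep only the points that fall inside the grid
inside = (rows >= 0) & (rows < n_rows) & (cols >= 0) & (cols < n_cols)

# Count points per cell; np.add.at handles several points in the same cell
points_grid = np.zeros((n_rows, n_cols), dtype=int)
np.add.at(points_grid, (rows[inside], cols[inside]), 1)
```

points_grid then has the same shape as the rasterized shapefile, so the two arrays can be compared cell by cell or drawn on top of each other.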
I'm retrieving a large amount of data from a database, which I later plot as a scatter plot. However, I run out of memory, and the program aborts when I use my full data set. Just for the record, it takes more than 30 minutes to run this program, and the length of the data list is about 20-30 million.
map = Basemap(projection='merc',
resolution = 'c', area_thresh = 10,
llcrnrlon=-180, llcrnrlat=-75,
urcrnrlon=180, urcrnrlat=82)
map.drawcoastlines(color='black')
# map.fillcontinents(color='#27ae60')
with lite.connect('database.db') as con:
    start = 1406851200
    end = 1409529600
    cur = con.cursor()
    cur.execute('SELECT latitude, longitude FROM plot WHERE unixtime >= {start} AND unixtime < {end}'.format(start = start, end = end))
    data = cur.fetchall()
y,x = zip(*data)
x,y = map(x,y)
plt.scatter(x,y, s=0.05, alpha=0.7, color="#e74c3c", edgecolors='none')
plt.savefig('Plot.pdf')
plt.savefig('Plot.png')
I think my problem may be in the zip(*) call, but I really have no clue. I'm interested both in how I can save memory by rewriting my existing code and in how to split up the plotting process. My idea is to split the time period in half, then do the same thing twice for the two time periods before saving the figure; however, I am unsure whether this will help at all. If the problem is the plotting itself, I have no idea what to do.
If you think the problem lies in the zip function, why not use a numpy array to massage your data into the right format? Something like this:
data = numpy.array(cur.fetchall())
lat = data[:,0]
lon = data[:,1]
x,y = map(lon, lat)
Also, your generated PDF will be very large and slow to render by the various PDF readers because it is a vectorized format by default. All your millions of data points will be stored as floats and rendered when the user opens the document. I recommend that you add the rasterized=True argument to your plt.scatter() call. This will save the result as a bitmap inside your PDF (see the docs here)
If this all doesn't help, I would investigate further by commenting out lines starting at the back. That is, first comment out plt.savefig('Plot.png') and see if the memory use goes down. If not, comment out the line before that, etc.
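Another way to reduce peak memory, sketched under assumptions: cur.fetchall() builds a Python list with one tuple per row, and zip(*data) then builds two more full-length tuples. Reading the cursor in chunks with fetchmany() and converting each chunk to a compact float32 array avoids holding all of those Python objects at once. The load_points helper, the chunk size and the tiny in-memory table below are hypothetical stand-ins for the real database.db.

```python
import sqlite3
import numpy as np

def load_points(con, start, end, chunk_size=100_000):
    """Return (lat, lon) float32 arrays for rows with start <= unixtime < end."""
    cur = con.cursor()
    cur.execute(
        "SELECT latitude, longitude FROM plot "
        "WHERE unixtime >= ? AND unixtime < ?",
        (start, end),
    )
    chunks = []
    while True:
        rows = cur.fetchmany(chunk_size)   # at most chunk_size tuples at a time
        if not rows:
            break
        chunks.append(np.array(rows, dtype=np.float32))
    if not chunks:
        return np.empty(0, np.float32), np.empty(0, np.float32)
    data = np.concatenate(chunks)          # shape (n_rows, 2)
    return data[:, 0], data[:, 1]

# Tiny in-memory table standing in for database.db
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE plot (latitude REAL, longitude REAL, unixtime INTEGER)")
con.executemany(
    "INSERT INTO plot VALUES (?, ?, ?)",
    [(59.3, 18.1, 1406851200), (48.9, 2.4, 1407000000), (40.7, -74.0, 1409529600)],
)
lat, lon = load_points(con, 1406851200, 1409529600, chunk_size=1)
```

The returned arrays can be passed to the Basemap instance in place of the zipped tuples. float32 halves the storage compared with the default float64, which should be precise enough for a plot at this scale.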
Hi I have a 3D list (I realise this may not be the best representation of my data so any advice here is appreciated) as such:
y_data = [
[[a,0],[b,1],[c,None],[d,6],[e,7]],
[[a,5],[b,2],[c,1],[d,None],[e,1]],
[[a,3],[b,None],[c,4],[d,9],[e,None]],
]
The y-axis data is such that each sublist is a list of values for one hour. The hours are the x-axis data. Each sublist of this has the following format:
[label,value]
So essentially:
line a is [0,5,3] on the y-axis
line b is [1,2,None] on the y-axis etc.
My x-data is:
x_data = [0,1,2,3,4]
Now when I plot this list directly i.e.
for i in range(0,5):
ax.plot(x_data, [row[i][1] for row in y_data], label=y_data[0][i][0])
I get a line graph; however, where the value is None, the point is not drawn and the line is not connected.
What I would like is a graph that plots my data in its current format, but ignores missing points and draws a line between the point before the missing data and the point after it (i.e. interpolating the missing point).
I tried doing it like this https://stackoverflow.com/a/14399830/1800665 but I couldn't work out how to do this for a 3D list.
Thanks for any help!
The general approach that you linked to will work fine here; it looks like the question you're asking is how to apply that approach to your data. I'd like to suggest that by factoring out the data you're plotting, you'll see more clearly how to do it.
import numpy as np
y_data = [
    [[a,0],[b,1],[c,None],[d,6],[e,7]],
    [[a,5],[b,2],[c,1],[d,None],[e,1]],
    [[a,3],[b,None],[c,4],[d,9],[e,None]],
]
x_data = [0, 1, 2, 3, 4]
for i in range(5):
    xv = []
    yv = []
    for j, v in enumerate(row[i][1] for row in y_data):
        if v is not None:
            xv.append(j)
            yv.append(v)
    ax.plot(xv, yv, label=y_data[0][i][0])
Here instead of using a mask like in the linked question/answer, I've explicitly built up the lists of valid data points that are to be plotted.
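For comparison, the mask-based variant from the linked answer can be sketched as follows. The string labels and the valid_points helper are illustrative assumptions, not part of the original data; None is converted to NaN so that NumPy can build the mask.

```python
import numpy as np

# Same layout as above, with string labels assumed for illustration
y_data = [
    [['a', 0], ['b', 1], ['c', None], ['d', 6], ['e', 7]],
    [['a', 5], ['b', 2], ['c', 1], ['d', None], ['e', 1]],
    [['a', 3], ['b', None], ['c', 4], ['d', 9], ['e', None]],
]

def valid_points(y_data, i):
    """Return (x, y) arrays for series i with the missing values dropped."""
    # dtype=float turns None into nan
    y = np.array([row[i][1] for row in y_data], dtype=float)
    mask = np.isfinite(y)          # True where a real value exists
    x = np.arange(len(y))          # position of each value within the series
    return x[mask], y[mask]

# Series 'd' has values 6, None, 9: the middle point is dropped
xv, yv = valid_points(y_data, 3)
```

Passing xv and yv to ax.plot draws a straight line across the gap, which is the same result as the explicit lists built above.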