I am trying to create a heatmap movie for the confirmed cases of Covid-19.
My dataset is a pd.DataFrame with columns Date, Latitude, Longitude, Confirmed.
My issue is that I do not know how to pass the Confirmed value as an input to folium.plugins.HeatMapWithTime.
I tried using:
new_map = folium.Map(location=[0, 0], tiles="cartodbpositron", min_zoom=2, zoom_start=2, max_zoom=3)
df['Lat'] = df['Lat'].astype(float)
df['Long'] = df['Long'].astype(float)
Confirmed_df = df[['Lat', 'Long', 'Confirmed']]
hm = plugins.HeatMapWithTime(Confirmed_df, auto_play=True, max_opacity=0.8)
hm.add_to(new_map)
new_map
df looks like:
Date        LAT        LONG      Confirmed
2020/04/26  48.847306  2.433284  6500
2020/04/26  48.861935  2.441292  4800
2020/04/26  48.839644  2.655109  9000
2020/04/25  48.924351  2.386369  12000
2020/04/25  48.829872  2.376677  0
You should pre-process the data before passing it to the HeatMapWithTime() function. The folium documentation and the example here are helpful.
In your case, the input should be, for each time step, a list of [lat, lng, weight] entries, with the Confirmed column used as the weight. First, you need to normalize the 'Confirmed' values to (0, 1]:
df['Confirmed'] = df['Confirmed'] / df['Confirmed'].max()  # scale weights by the maximum value
Then, build one list of points per date (sorted in ascending order):
df = df.sort_values('Date')
data = []
for _, d in df.groupby('Date'):
    data.append([[row['Lat'], row['Long'], row['Confirmed']] for _, row in d.iterrows()])
Finally, pass data to HeatMapWithTime() as you did:
hm = plugins.HeatMapWithTime(data, auto_play=True, max_opacity=0.8)
hm.add_to(new_map)
new_map
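If you also want the time slider to show the dates rather than plain step numbers, HeatMapWithTime accepts an index list with one label per time step. A minimal sketch, assuming the same grouping order used to build data:
# one label per time step, in the same order as the groupby above
time_index = [str(date) for date, _ in df.groupby('Date')]

hm = plugins.HeatMapWithTime(data, index=time_index, auto_play=True, max_opacity=0.8)
hm.add_to(new_map)
new_map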
I am aiming to calculate a daily climatology from a dataset, i.e. obtain the sea surface temperature (SST) for each day of the year by averaging over all the years (for example, for January 1st, the average SST of every January 1st from 1982 to 2018). To do so, I took the following steps:
DATA PREPARATION STEPS
Here is a Drive link to both datasets to make the code reproducible:
link to datasets
First, I load two datasets:
ds1 = xr.open_dataset('./anomaly_dss/archive_to2018.nc') #from 1982 to 2018
ds2 = xr.open_dataset('./anomaly_dss/realtime_from2018.nc') #from 2018 to present
Then I convert them to pandas DataFrames and merge both into one:
ds1 = ds1.where(ds1.time > np.datetime64('1982-01-01'), drop=True) # Grab all data since 1/1/1982
ds2 = ds2.where(ds2.time > ds1.time.max(), drop=True) # Grab all data since the end of the archive
# Convert to Pandas Dataframe
df1 = ds1.to_dataframe().reset_index().set_index('time')
df2 = ds2.to_dataframe().reset_index().set_index('time')
# Merge these datasets
df = df1.combine_first(df2)
So far, this is what my dataframe looks like:
Note that lat goes from 35 to 37.7 and lon from -10 to -5; this must remain like that.
ANOMALY CALCULATION STEPS
# Anomaly calculation
def standardize(x):
    return (x - x.mean()) / x.std()
# Calculate a daily average
df_daily = df.resample('1D').mean()
# Calculate the anomaly for each yearday
df_daily['anomaly'] = df_daily['analysed_sst'].groupby(df_daily.index.dayofyear).transform(standardize)
I obtain the following dataframe:
As you can see, I obtain the mean values of all three variables.
QUESTION
As I want to plot the climatology data on a map, I DO NOT want the lat/lon variables to be averaged down to one point. I need the anomaly at all the lat/lon points, and I don't really know how to achieve that.
Any help would be very appreciated!!
I think you can do all that in a simpler and more straightforward way without converting your dataarray to a dataframe:
import os
import xarray as xr

# Will open and combine the 2 datasets automatically
DS = xr.open_mfdataset(os.path.join('./anomaly_dss', '*.nc'))
da = DS.analysed_sst
# Resampling
da = da.resample(time='1D').mean()

# Anomaly calculation
def standardize(x):
    return (x - x.mean()) / x.std()

da_anomaly = da.groupby(da.time.dt.dayofyear).apply(standardize)
Then you can plot the anomaly for any day with:
da_anomaly[da_anomaly.dayofyear == 1].plot()
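If you also want the plain daily climatology itself (the multi-year mean SST for each calendar day, keeping the lat/lon dimensions) rather than the standardized anomaly, the same groupby gives it directly; a minimal sketch:
# multi-year mean for each day of the year, per grid cell
da_climatology = da.groupby(da.time.dt.dayofyear).mean()

# e.g. the mean SST field for January 1st
da_climatology.sel(dayofyear=1).plot()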
I am making a program for my Final Year Project.
The program takes the longitude and latitude coords from a .csv dataset and plots them on the map.
The issue I am having is that there are multiple IDs, and this totals 445,000+ points.
How would I refine it so the program can differentiate between the IDs?
def create_image(self, color, width=2):
    # Creates an image that contains the Map and the GPS record
    # color = color the GPS line is
    # width = width of the GPS line
    data = pd.read_csv(self.data_path, header=0)
    # sep will separate the latitude from the longitude
    data.info()
    self.result_image = Image.open(self.map_path, 'r')
    img_points = []
    gps_data = tuple(zip(data['latitude'].values, data['longitude'].values))
    for d in gps_data:
        x1, y1 = self.scale_to_img(d, (self.result_image.size[0], self.result_image.size[1]))
        img_points.append((x1, y1))
    draw = ImageDraw.Draw(self.result_image)
    draw.line(img_points, fill=color, width=width)
I have also attached the GitHub project here. The program works, but I am just trying to minimize how many users it plots at once.
Thanks in advance.
To check for a specific ID, you could create a filter. For this dataframe:
   long  lat     ID
0    10    5  test1
1    15   20  test2
you could do the following:
id_filt = df_data['ID'] == 'test1'
This can be used to select every entry from the dataframe that has the ID 'test1':
df_data[id_filt]
   long  lat     ID
0    10    5  test1
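If you want to draw every ID as its own line (rather than one line through all 445,000+ points), you could group the dataframe by ID and plot each group separately. A rough sketch, assuming the column is actually named 'ID' and reusing scale_to_img and result_image from your class:
draw = ImageDraw.Draw(self.result_image)
# one polyline per ID, so tracks from different IDs are not joined together
for track_id, track in data.groupby('ID'):
    img_points = [
        self.scale_to_img((lat, lon), (self.result_image.size[0], self.result_image.size[1]))
        for lat, lon in zip(track['latitude'], track['longitude'])
    ]
    draw.line(img_points, fill=color, width=width)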
I am pretty new to Python and I need some help.
I need to find the grid cells in the precipitation file (.nc) that match the locations of water flow stations (Excel file) and then extract time series for these grid cells.
I have an Excel file with 117 stations in Norway that contains columns with the station name and its area, latitude and longitude.
I also have an .nc file with precipitation series for these stations.
I managed to run a Python script (Jupyter notebook) for one station at a time, but I want to run it for all stations.
How do I do this? I know I need to make a for loop somehow.
This is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import os
import xarray as xr
import cartopy.crs as ccrs
import cartopy as cy
metapath = "Minestasjoner.xlsx"
rrdatapath = "cropped_monsum_rr_ens_mean_0.25deg_reg_v20.0e.nc"
meta = pd.read_excel(metapath)
rrdata = xr.open_dataset(rrdatapath)
i=0
station = meta.iloc[i]["Regime"]*100000 + meta.iloc[i]["Main_nr"]
lon = meta.iloc[i]["Longitude"] #get longitude
lat = meta.iloc[i]["Latitude"] #get latitude
rr_at_obsloc = rrdata["rr"].sel(latitude=lat, longitude=lon, method='nearest')
df = rr_at_obsloc.to_dataframe()
print("Station %s with lon=%.2f and lat=%.2f have closest rr gridcell at lon=%.2f and lat=%.2f"%(station,lon,lat,df.longitude[0],df.latitude[0]))
df
I think the easiest way for you to do this is to make a Python dictionary mapping each station name to the precipitation time series for that station, and then to convert that dictionary to a pandas.DataFrame.
Here's how you do that in a simple loop:
"""
Everything you had previously...
"""
# Initialize empty dictionary to hold station names and time-series
station_name_and_data = {}
# Loop over all stations
for i in range(len(meta)):
    # Get name of station 'i'
    station = meta.iloc[i]["Regime"]*100000 + meta.iloc[i]["Main_nr"]
    # Get lat/lon of station 'i'
    lon = meta.iloc[i]["Longitude"]
    lat = meta.iloc[i]["Latitude"]
    # Extract precip time-series for this lat-lon
    rr_at_obsloc = rrdata["rr"].sel(latitude=lat, longitude=lon, method='nearest')
    # Put this station name and its relevant time-series into the dictionary
    station_name_and_data[station] = rr_at_obsloc
# Finally, convert this dictionary to a pandas dataframe
df = pd.DataFrame(data=station_name_and_data)
print(df)
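If you'd rather avoid the Python loop entirely, xarray can also do the nearest-gridcell lookup for all stations at once by passing the station coordinates as DataArrays that share a common 'station' dimension. A sketch, assuming the same column names as above:
# vectorized nearest-neighbour selection for all stations in one call
names = meta["Regime"].values * 100000 + meta["Main_nr"].values
lons = xr.DataArray(meta["Longitude"].values, dims="station", coords={"station": names})
lats = xr.DataArray(meta["Latitude"].values, dims="station", coords={"station": names})

rr_all = rrdata["rr"].sel(latitude=lats, longitude=lons, method="nearest")
df = rr_all.to_pandas()  # rows are time steps, columns are stations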
I am very new to coding in Python and I am working with a .csv file in which each row contains a timestamp followed by a 32x32 matrix flattened into 1024 columns. I reshaped the data to give me 32x32 arrays and looped through each row, appending the matrices to a numpy array.
i = 0
while i < len(df_array):
    if i == 0:
        spec = np.reshape(df_array[i][np.arange(1, 1025)], (32, 32))
        spectrum_matrix = spec
    else:
        spec = np.reshape(df_array[i][np.arange(1, 1025)], (32, 32))
        spectrum_matrix = np.concatenate((spectrum_matrix, spec), axis=0)
    i = i + 1
print("job done")
What I would like to do is add the time stamp from the original data file to each of the matrices, which would allow me to resample the data over a 5 minute average. I would also like to plot the bins to get a plot similar to this Drop size distribution.
As a reference, I am reading the .csv in with pandas, and here is an example of a portion of the raw data: 01.06.2017;18:22:20;0.122;0.00;51;7.401;10375;18745;57;27;0.00;23.6;0.110;0;
<SPECTRUM>;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
The ;-separated values after <SPECTRUM> are the flattened 32x32 matrix.
Thanks in advance for any help!
Python and its associated packages can do many things without loops.
From my understanding of your data you have a (8640 x 32 x 32) Data Structure (time x size x velocity).
Pandas works very well with 2D data structures, however for higher dimensional data I would recommend you get familiar with xarray. With this package along with pandas you can create and manipulate your data without having to resort to loops.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
import seaborn as sns
%matplotlib inline
#create random data
data = (np.random.binomial(n =5, p =0.2, size =(8640,32,32))*1000).astype(int)
#create labels for data
sizes= np.linspace(1,5,32)
velocities = np.linspace(1,1000, num = 32)
#make time range of 24 hours with 10sec intervals
ind = pd.date_range(start='2014-01-01', periods=8640, freq='10s')
#convert data to xarray 3D data structure
df = xr.DataArray(data, coords = [ind, sizes, velocities],
dims = ['time', 'size', 'speed'])
#make a 5 min average of the data
min_average = df.resample(time='300s').mean()
#plot sample of data and 5 min average
my1d = min_average.isel(size = 5, speed= 10)
my1d.plot(label = '5 min avg')
plt.gca()
df.isel(size = 5, speed =10).plot(alpha = 0.3, c = 'r', label = 'raw_data')
plt.legend()
As for making a distribution plot like the one you linked, things become a bit trickier, but it is possible:
#transform your data to only have the mean over speed for each time and size
#and convert to pandas dataframe
mean_speed = min_average.mean(dim=['speed'])
#for some reason xarray makes you name the new column when you convert
#to a pandas dataframe. I then get rid of the extra empty level with
#a list comprehension
df = mean_speed.to_dataframe('').unstack().T
df.index = np.array([np.array(i)[1].astype(float) for i in df.index])
#make a contourplot of your new data
plt.contourf(df.columns, df.index, df.values, cmap ='PuBu_r')
plt.title('mean speed')
plt.ylabel('size')
plt.xlabel('time')
plt.colorbar()
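With your real file, you could build the time index from the first two ;-separated fields instead of the synthetic date_range above. A sketch, assuming the layout shown in the question (day.month.year;HH:MM:SS;...) and a hypothetical filename:
# read the raw file and combine the date and time columns into timestamps
raw = pd.read_csv('your_file.csv', sep=';', header=None)  # hypothetical filename
ind = pd.to_datetime(raw[0] + ' ' + raw[1], format='%d.%m.%Y %H:%M:%S')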
I have the following dataframe; lat and lon are latitudes and longitudes in the geographic coordinate system. I am trying to convert these coordinates into a native (x, y) projection.
I have tried pyproj for single points, but how do I proceed for the whole dataframe with thousands of rows?
time lat lon
0 2011-01-31 02:41:00 18.504273 -66.009332
1 2011-01-31 02:42:00 18.504673 -66.006225
I am trying to get something like this:
time lat lon x_Projn y_Projn
0 2011-01-31 02:41:00 18.504273 -66.009332 resp_x_val resp_y_val
1 2011-01-31 02:42:00 18.504673 -66.006225 resp_x_val resp_y_val
and so on...
Following is the code I tried for converting lat/lon to the x,y system:
from pyproj import Proj, transform
inProj = Proj(init='epsg:4326')
outProj = Proj(init='epsg:3857')
x1,y1 = -105.150271116, 39.7278572773
x2,y2 = transform(inProj,outProj,x1,y1)
print (x2,y2)
Output:
-11705274.637407782 4826473.692203013
Thanks for any kind of help.
Unfortunately, pyproj only converts point by point. I guess something like this should work:
import pandas as pd
from pyproj import Proj, transform
inProj = Proj(init='epsg:4326')
outProj = Proj(init='epsg:3857')
def to_web_mercator(row):
    # with epsg:4326, Proj expects (longitude, latitude) order
    return pd.Series(transform(inProj, outProj, row["lon"], row["lat"]))

wsg84_df = df.apply(to_web_mercator, axis=1)  # new coord dataframe with two columns
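If you then want the projected values next to the originals (as in the x_Projn / y_Projn layout above), you could copy them back in, e.g.:
df[["x_Projn", "y_Projn"]] = wsg84_df.values  # positional assignment of the two new columns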
You can iterate through the rows in a pandas dataframe, transform the longitude and latitude values for each row, build two lists with the first and second coordinate values, and then turn the lists into new columns in your original dataframe. Maybe not the prettiest, but this got the job done for me.
from pyproj import Proj, transform

inProj = Proj(init='epsg:4326')
outProj = Proj(init='epsg:3857')

M1s = [] #initiate empty list for 1st coordinate value
M2s = [] #initiate empty list for 2nd coordinate value
for index, row in df.iterrows(): #iterate over rows in the dataframe
    long = row["Longitude (decimal degrees)"] #get the longitude for one row
    lat = row["Latitude (decimal degrees)"] #get the latitude for one row
    M1, M2 = transform(inProj, outProj, long, lat) #transform once, unpack both coordinates
    M1s.append(M1) #append 1st coordinate to list
    M2s.append(M2) #append 2nd coordinate to list
df['M1'] = M1s #new dataframe column with 1st coordinate
df['M2'] = M2s #new dataframe column with 2nd coordinate
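If you are on pyproj 2 or newer, the Transformer API also accepts whole arrays, so the row-by-row loop can be skipped entirely. A sketch, assuming the same column names as above:
from pyproj import Transformer

# always_xy=True keeps the (longitude, latitude) argument order
transformer = Transformer.from_crs("epsg:4326", "epsg:3857", always_xy=True)
xs, ys = transformer.transform(
    df["Longitude (decimal degrees)"].values,
    df["Latitude (decimal degrees)"].values,
)
df['M1'] = xs
df['M2'] = ys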