MariaDB BLOB data format in Power BI vs. Python conversion - python

I have a MariaDB table with a MEDIUMBLOB column. Each row in this table holds one photo.
When I query the data in Power BI using the MariaDB connector, the column arrives with the type "Binary".
However, when I query the same data in Python (in an IDE or in Power BI), the format is different.
The bigger picture is that I want to use the following code to split each image into chunks, because PBI limits the number of characters per data element:
let
    Source = MariaDB.Contents("XXX.XXX.XXX.XXX:YYYY", "ZZZZZ"),
    Query1 = Source{[Name="Query1",Kind="Table"]}[Data],
    #"Removed Top Rows" = Table.Skip(Query1,1),
    //Remove unnecessary columns
    RemoveOtherColumns = Table.SelectColumns(#"Removed Top Rows",{"picture", "batchname"}),
    //Creates Splitter function
    SplitTextFunction = Splitter.SplitTextByRepeatedLengths(30000),
    //Converts table of files to list
    ListInput = Table.ToRows(RemoveOtherColumns),
    //Function to convert binary of photo to multiple
    //text values
    ConvertOneFile = (InputRow as list) =>
        let
            BinaryIn = InputRow{0},
            FileName = InputRow{1},
            BinaryText = Binary.ToText(BinaryIn, BinaryEncoding.Base64),
            SplitUpText = SplitTextFunction(BinaryText),
            AddFileName = List.Transform(SplitUpText, each {FileName,_})
        in
            AddFileName,
    //Loops over all photos and calls the above function
    ConvertAllFiles = List.Transform(ListInput, each ConvertOneFile(_)),
    //Combines lists together
    CombineLists = List.Combine(ConvertAllFiles),
    //Converts results to table
    ToTable = #table(type table[Name=text,Pic=text],CombineLists),
    //Adds index column to output table
    AddIndexColumn = Table.AddIndexColumn(ToTable, "Index", 0, 1)
in
    AddIndexColumn
As I am a beginner on this topic, I suspect I am missing a straightforward conversion, but I have not been able to figure it out myself.
I greatly appreciate any help. Thank you!

What are you planning to do with this data? PBI doesn't support binary data although you can see it in Power Query. It must be converted to something else before it can be loaded to the PBI data model.
I suspect the Python version is just the binary already converted to text. If you click the two arrows in the top right of the picture column for the PBI version, do you not get the same output?
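For comparison, the Base64-and-split step from the Power Query code in the question can be sketched in Python. This is only an illustration: it assumes the Python database driver hands back the MEDIUMBLOB as a `bytes` object, and the function name `blob_to_text_chunks` and the fake photo are made up for this example. The 30000-character chunk size is the one passed to `Splitter.SplitTextByRepeatedLengths` above.

```python
import base64


def blob_to_text_chunks(blob: bytes, chunk_size: int = 30000) -> list:
    """Base64-encode a binary blob and cut the text into fixed-size pieces."""
    # Same idea as Binary.ToText(BinaryIn, BinaryEncoding.Base64) in the M code.
    text = base64.b64encode(blob).decode("ascii")
    # Same idea as Splitter.SplitTextByRepeatedLengths(chunk_size):
    # every piece has chunk_size characters except possibly the last one.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# Hypothetical stand-in for one photo read from the table (not real image data).
fake_photo = bytes(range(256)) * 200
chunks = blob_to_text_chunks(fake_photo)

# Joining the chunks in order and decoding them restores the original bytes.
restored = base64.b64decode("".join(chunks))
```

If the value Python shows already looks like text, comparing it against `base64.b64encode(...)` of the raw bytes is one way to check whether it is simply a Base64 rendering of the same binary.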

Related

Query point data on a Google Earth Engine Image by specifying lat/long

I am trying to extract point values from a Google Earth Engine Image Collection by specifying lat/long information.
This seems to work perfectly fine when I am working with multiple images and use ee.Image.cat() to join them before I query the image. In the code example below composite = ee.Image.cat().
However, when I change composite (line 3 from the bottom) to one of the image collections (e.g. chirps), it fails with the error shown below.
Please could someone assist me with this.
def getPropertyValue(settings):
    collection = settings['collection']
    fieldName = settings['fieldName']
    dateRange = settings['dateRange']
    geoLocation = settings['geoLocation']
    scale = settings['scale']
    image = ee.ImageCollection(collection).select(fieldName).filterDate(dateRange[0], dateRange[1]).mean()
    point = ee.Geometry.Point(geoLocation)
    mean = image.reduceRegions(point, 'mean', scale)
    valueRef = mean.select([fieldName], ['precipitation'], retainGeometry=True).getInfo()
    value = valueRef[fieldName][0]['properties'][fieldName]
    return value
fieldName = 'LST_AVE';
chirps = ee.ImageCollection("JAXA/GCOM-C/L3/LAND/LST/V2").select(fieldName).filterDate('2020-01-01', '2020-02-01').mean()
point = ee.Geometry.Point([26.8206, 30.8025])
dist_stats = composite.reduceRegions(point, 'mean', 5000)
dist_stats = dist_stats.select([fieldName], [fieldName], retainGeometry=True).getInfo();
print(dist_stats['features'][0]['properties'][fieldName])
Result when using composite
14248.55
Error when replacing composite with a Google Earth Engine Image
EEException: Error in map(ID=0):
Feature.select: Selected a different number of properties (0) than names (1).
reduceRegions names the output column after the reducer, not the field that is being reduced. (Though it's more complicated when you have multiple bands and reducers).
So this:
dist_stats = dist_stats.select([fieldName], [fieldName], retainGeometry=True).getInfo();
should be changed to this
dist_stats = dist_stats.select(['mean'], [fieldName], retainGeometry=True).getInfo();

Trying to translate my Matlab routine for loading and preparing data to Python. Stuck in pandas

This is my first post, and I have been struggling with this problem for a few days now. The following is the Matlab code that I usually use to load my data (.csv files) and prepare it for further calculations.
% I use this later on to predefine my arrays, because they need to have the same length for calculations
maxind = 400;
% test_numbers is a vector (like test_numbers = [1 2 3]) that I get from the user so I can iterate over the numbers of the test specimens
for i = test_numbers
    % Setup the Import Options and import the data
    opts = delimitedTextImportOptions("NumVariables", 4);
    % Specify range and delimiter
    opts.DataLines = [2, Inf];
    opts.Delimiter = ";";
    % Specify column names and types
    opts.VariableNames = ["time", "force", "displ_1", "displ_2"];
    opts.VariableTypes = ["double", "double", "double", "double"];
    % Specify file level properties
    opts.ExtraColumnsRule = "ignore";
    opts.EmptyLineRule = "read";
    % Import the data
    % Here I build the file name and read the file. The csv files are in the same folder as the main program.
    data_col = readtable(['specimen_name',num2str(i),'.csv'], opts);
    Data.force(:,i)=nan(maxind,1);
    Data.force(1:length(data_col.time),i)=data_col.force;
    Data.displ(:,i)=nan(maxind,1);
    Data.displ(1:length(data_col.time),i)=nanmean([data_col.displ_1,data_col.displ_2]')';
    Data.time(:,i)=nan(maxind,1);
    Data.time(1:length(data_col.time),i)=data_col.time;
    Data.name(i)={['specimen_name',num2str(i)]};
    % Clear temporary variables
    clear opts
end
Now I have to use Python instead of Matlab, and I started with pandas to read my csv files as DataFrames.
My question: is there a way to access my data as in the following part of my Matlab code, or should I not use DataFrames for something like this in the first place? (I know I can access my data by column name, but I got stuck trying to address my data like Data.force(1:length(data_col.time),i) in a new DataFrame.)
Data.force(:,i)=nan(maxind,1);
Data.force(1:length(data_col.time),i)=data_col.force;
Data.displ(:,i)=nan(maxind,1);
Data.displ(1:length(data_col.time),i)=nanmean([data_col.displ_1,data_col.displ_2]')';
Data.time(:,i)=nan(maxind,1);
Data.time(1:length(data_col.time),i)=data_col.time;
Data.name(i)={['specimen_name',num2str(i)]};
Many thanks in advance for your help.
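One possible way to mirror the Matlab routine is sketched below. It is not a tested translation of the original program: it keeps NumPy arrays pre-filled with NaN in a plain dict instead of one big DataFrame, because the Matlab `Data` struct holds 2-D arrays with one column per specimen. The function name `load_specimens` and its `sources` argument are invented for this example. The file layout (semicolon delimiter, one header line, exactly four numeric columns) is taken from the import options above. Columns are filled in iteration order rather than by specimen number.

```python
import numpy as np
import pandas as pd

COLUMNS = ["time", "force", "displ_1", "displ_2"]


def load_specimens(sources, maxind=400):
    """Read one CSV per specimen into NaN-padded (maxind x n_specimens) arrays.

    sources: dict mapping a specimen name to a file path or file-like object.
    Assumes no file has more than maxind data rows.
    """
    n = len(sources)
    data = {
        "force": np.full((maxind, n), np.nan),
        "displ": np.full((maxind, n), np.nan),
        "time": np.full((maxind, n), np.nan),
        "name": [],
    }
    for col, (name, src) in enumerate(sources.items()):
        # Skip the header line and impose our own column names,
        # like opts.DataLines = [2, Inf] and opts.VariableNames in Matlab.
        df = pd.read_csv(src, sep=";", skiprows=1, header=None, names=COLUMNS)
        rows = len(df)
        # Matlab: Data.force(1:length(data_col.time), i) = data_col.force
        data["force"][:rows, col] = df["force"].to_numpy()
        # Row-wise mean of the two displacement columns; pandas skips NaN
        # by default, which plays the role of nanmean here.
        data["displ"][:rows, col] = df[["displ_1", "displ_2"]].mean(axis=1).to_numpy()
        data["time"][:rows, col] = df["time"].to_numpy()
        data["name"].append(name)
    return data
```

A call could look like `load_specimens({f"specimen_name{i}": f"specimen_name{i}.csv" for i in [1, 2, 3]})`, after which `data["force"][:, 0]` corresponds to `Data.force(:,1)` in Matlab (Python indices start at 0).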

Time series on folium map

I have a dataframe of events that occurred at particular locations.
I am aware that folium does not support displaying the events dynamically as they appear, so I was thinking about iterating through the dates and saving a PNG of each folium map created.
Unfortunately I am stuck on a two-part problem:
1) how to loop through a range of dates (for example, one map for each month)
2) an appropriate way to save the generated image in each iteration.
This is a dataframe sample for this example:
since = ['2019-07-05', '2019-07-17', '2014-06-12', '2016-03-11']
lats = [38.72572, 38.71362, 38.79263, 38.71931]
longs = [-9.13412, -9.14407, -9.40824, -9.13143]
since_map = {'since' : pd.Series(since), 'lats' : pd.Series(lats), 'longs' : pd.Series(longs)}
since_df = pd.DataFrame(since_map)
I was able to create the base map:
lat_l = 38.736946
long_l = -9.142685
base_l = folium.Map(location=[lat_l,long_l], zoom_start=12)
neigh = folium.map.FeatureGroup()
And add some of the markers to the folium map:
for lat, lon in zip(since_df.lats, since_df.longs):
    neigh.add_child(folium.CircleMarker([lat, lon], radius = 2, color = 'blue', fill = True))
base_l.add_child(neigh)
I am struggling to visualize how to loop through ranges of the dates and save each file. From what I saw here:
https://github.com/python-visualization/folium/issues/35 I actually have to open the saved html and then save it as png for each image.
If you could point me to an example or documentation that could demonstrate how this can be accomplished I would be very appreciative.
If you think I am overcomplicating this, or you have a better alternative, I am open to suggestions.
Thank you for your help.
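One way to approach part 1 is to let pandas group the rows by calendar month and then build one map per group. The sketch below assumes the `since`, `lats` and `longs` columns from the sample dataframe; the helper names `events_by_month` and `save_monthly_maps` are invented here. It writes one HTML file per month with `Map.save`. Turning each HTML file into a PNG is a separate step (see the linked issue) and is not covered.

```python
from pathlib import Path

import pandas as pd


def events_by_month(df, date_col="since"):
    """Return {'YYYY-MM': rows of that month}, ordered chronologically."""
    dates = pd.to_datetime(df[date_col])
    # to_period("M") maps every timestamp to its calendar month.
    grouped = df.groupby(dates.dt.to_period("M"))
    return {str(month): group for month, group in grouped}


def save_monthly_maps(df, center, out_dir="."):
    """Write one folium map per month as HTML and return the file paths."""
    import folium  # imported here so the grouping helper works without folium

    paths = []
    for month, group in events_by_month(df).items():
        fmap = folium.Map(location=center, zoom_start=12)
        for lat, lon in zip(group["lats"], group["longs"]):
            folium.CircleMarker([lat, lon], radius=2, color="blue", fill=True).add_to(fmap)
        path = Path(out_dir) / f"map_{month}.html"
        fmap.save(str(path))
        paths.append(path)
    return paths


# Sample data from the question.
since_df = pd.DataFrame({
    "since": ["2019-07-05", "2019-07-17", "2014-06-12", "2016-03-11"],
    "lats": [38.72572, 38.71362, 38.79263, 38.71931],
    "longs": [-9.13412, -9.14407, -9.40824, -9.13143],
})
monthly = events_by_month(since_df)
```

With the sample data, `save_monthly_maps(since_df, [38.736946, -9.142685])` would produce three files, one for each month that contains events. If the maps should accumulate events over time instead, each group could be replaced by all rows up to and including that month.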

Get results in an Earth Engine python script

I'm trying to get the mean NDVI in every polygon of a feature collection with the Earth Engine Python API.
I think I succeeded in getting the result (a feature collection inside a feature collection), but I don't know how to get the data out of it.
The data I want is the ID of each feature and the mean NDVI in each feature.
import datetime
import ee
ee.Initialize()
#Feature collection
fc = ee.FeatureCollection("ft:1s57dkY_Sg_E_COTe3sy1tIR_U-5Gw-BQNwHh4Xel");
fc_filtered = fc.filter(ee.Filter.equals('NUM_DECS', 1))
#Image collection
Sentinel_collection1 = (ee.ImageCollection('COPERNICUS/S2')).filterBounds(fc_filtered)
Sentinel_collection2 = Sentinel_collection1.filterDate(datetime.datetime(2017, 1, 1),datetime.datetime(2017, 8, 1))
# NDVI function to use with ee map
def NDVIcalc (image):
    red = image.select('B4')
    nir = image.select('B8')
    ndvi = nir.subtract(red).divide(nir.add(red)).rename('NDVI')
    #NDVI mean calculation with reduceRegions
    MeansFeatures = ndvi.reduceRegions(reducer= ee.Reducer.mean(),collection= fc_filtered,scale= 10)
    return (MeansFeatures)
#Result that I don't know to get the information: Features ID and NDVI mean
result = Sentinel_collection2.map(NDVIcalc)
If the result is small, you can pull it into Python using result.getInfo(). That will give you a Python dictionary containing a list of FeatureCollections (which are themselves dictionaries). However, if the results are large or the polygons cover large regions, you'll have to Export the collection instead.
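As a rough illustration of walking such a nested dictionary in plain Python, the sketch below collects (feature id, mean) pairs. The `sample_info` dictionary is a hand-written stand-in with made-up values, not real output, and its exact shape (an outer `features` list whose entries each carry their own `features` list) is an assumption about what `result.getInfo()` returns here. The helper name `collect_means` is hypothetical. The property name `mean` follows from reduceRegions naming its output after the reducer when a single band is reduced.

```python
def collect_means(result_info, value_key="mean"):
    """Collect (feature id, value) pairs from a nested getInfo()-style dict."""
    rows = []
    # Outer level: one entry per image (each is itself a feature collection).
    for per_image in result_info.get("features", []):
        # Inner level: one feature per polygon.
        for feature in per_image.get("features", []):
            value = feature.get("properties", {}).get(value_key)
            rows.append((feature.get("id"), value))
    return rows


# Hypothetical stand-in for result.getInfo(); ids and values are invented.
sample_info = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "FeatureCollection",
            "features": [
                {"type": "Feature", "id": "2", "properties": {"NUM_DECS": 1, "mean": 0.41}},
                {"type": "Feature", "id": "7", "properties": {"NUM_DECS": 1, "mean": 0.63}},
            ],
        },
        {
            "type": "FeatureCollection",
            "features": [
                {"type": "Feature", "id": "2", "properties": {"NUM_DECS": 1}},
            ],
        },
    ],
}

pairs = collect_means(sample_info)
```

A polygon with no valid pixels in an image may come back without a `mean` property, which is why the lookup uses `.get` and yields `None` in that case.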
That said, there are probably some other things you'll want to do first:
1) You might want to flatten() the collection so that it isn't a collection of nested collections. It'll be easier to handle that way.
2) You might want to add a date to each result so you know which image each result came from. You can do that with a map on the result, inside your NDVIcalc function:
return MeansFeatures.map(lambda f: f.set('date', image.date().format()))
3) If what you really want is a time-series of NDVI over time for each polygon (most common), then restructuring your code to map over polygons first will be easier:
Sentinel_collection = (ee.ImageCollection('COPERNICUS/S2')
    .filterBounds(fc_filtered)
    .filterDate(ee.Date('2017-01-01'),ee.Date('2017-08-01')))
def GetSeries(feature):
    def NDVIcalc(img):
        red = img.select('B4')
        nir = img.select('B8')
        ndvi = nir.subtract(red).divide(nir.add(red)).rename(['NDVI'])
        return (feature
            .set(ndvi.reduceRegion(ee.Reducer.mean(), feature.geometry(), 10))
            .set('date', img.date().format("YYYYMMdd")))
    series = Sentinel_collection.map(NDVIcalc)
    # Get the time-series of values as two lists.
    list = series.reduceColumns(ee.Reducer.toList(2), ['date', 'NDVI']).get('list')
    return feature.set(ee.Dictionary(ee.List(list).flatten()))
result = fc_filtered.map(GetSeries)
print(result.getInfo())
4) And finally, if you're going to try to Export the result, you're likely to run into an issue where the columns of the exported table are taken from whatever columns the first feature has. It's therefore good to provide a "header" feature that has all columns (times), which you can merge() with the result as the first feature:
# Get all possible dates.
dates = ee.List(Sentinel_collection.map(
    lambda img: ee.Feature(None, {'date': img.date().format("YYYYMMdd")})
).aggregate_array('date'))
# Make a default value for every date.
header = ee.Feature(None, ee.Dictionary.fromLists(dates, ee.List.repeat(-1, dates.size())))
output = ee.FeatureCollection([header]).merge(result)
ee.batch.Export.table.toDrive(...)

Trouble with passing bson objectid to numpy recarray in python 3

I am working on machine translating some text that is stored in a MongoDB database. I am trying to pull the data from the database and then store it in a NumPy recarray. However, I keep getting errors when I try to save the ObjectId field to the recarray, despite the different type conversions I have read about and tried. Here is my code. Any suggestions would help.
#Pull the records from the DB into a resultset
db_results_records_to_translate = \
db_connector.db_fetch_untranslated_records_from_db(
article_collection,rec_number)
#Create an empty numpy recarray to store the data
data_table_for_translation=np.zeros([db_results_records_to_translate.count(),6],
                                    dtype=[('_id', np.str),
                                           ('article_raw_text', np.str),
                                           ('article_raw_date', np.str),
                                           ('translated',np.bool),
                                           ('translated_text',np.str),
                                           ('translated_date',np.str)])
#Write record data to the recarray
for index, r in enumerate(db_results_records_to_translate):
    data_table_for_translation[index, 0] = str(r['_id']) # Line with errors!!!
    data_table_for_translation[index,1] = r['article_raw_text']
    data_table_for_translation[index,2] = r['article_raw_date']
    data_table_for_translation[index, 3] = r['translated']
So after running this code, I get an error TypeError: expected an object with a buffer interface.
Now I have tried to convert the objectid from bson to string using the str(ObjectId) function as referenced in the documentation, but no luck.
Any suggestions?
NOTE: I noticed that this error happens even for the non-id columns too, so even straight text has an issue.
There are errors in the definition of the array, including the shape and the dtype, and errors in how the fields are indexed during the iteration.
This clip illustrates the changes I think you need to make to get this assignment to work:
import numpy as np
# fake data - a list of tuples
db_results_records_to_translate = [('12','raw text','raw date')]
#Create an empty numpy structured array to store the data
data_table_for_translation=np.zeros([1,],
                                    dtype=[('_id', 'U10'),
                                           ('article_raw_text', 'U10'),
                                           ('article_raw_date', 'U10')])
# string dtype has to include length
# I'm using unicode here (Python3), 'S10' would do just as well (in py2)
#Write record data to the structured array
for index, r in enumerate(db_results_records_to_translate):
    data_table_for_translation[index]['_id'] = str(r[0])
    data_table_for_translation[index]['article_raw_text'] = r[1]
    data_table_for_translation[index]['article_raw_date'] = r[2]
print(data_table_for_translation)
Note that I index the 'fields' by name, not number. data_table... is a 1d array with n fields, not a 2d array with n columns. I'm indexing r by number because my mock data is a tuple, not the db named fields.
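Extending the same idea to all six fields from the question could look roughly like the sketch below. It uses plain dicts with string ids as stand-ins for the MongoDB documents, so `str(r['_id'])` behaves the same way as it would for an ObjectId, whose string form is 24 hexadecimal characters (hence `'U24'`). The other field widths, the sample record and the function name `build_translation_table` are assumptions made for illustration. Text longer than a field's width is silently truncated.

```python
import numpy as np

# Field widths other than 'U24' are arbitrary choices for this example.
TRANSLATION_DTYPE = [
    ("_id", "U24"),
    ("article_raw_text", "U200"),
    ("article_raw_date", "U10"),
    ("translated", "?"),          # boolean field
    ("translated_text", "U200"),
    ("translated_date", "U10"),
]


def build_translation_table(records):
    """Copy dict-like records into a 1-D structured array, one row per record."""
    records = list(records)
    # 1-D array with six named fields, not a 2-D array with six columns.
    table = np.zeros(len(records), dtype=TRANSLATION_DTYPE)
    for index, r in enumerate(records):
        table[index]["_id"] = str(r["_id"])
        table[index]["article_raw_text"] = r["article_raw_text"]
        table[index]["article_raw_date"] = r["article_raw_date"]
        table[index]["translated"] = r["translated"]
    return table


# Hypothetical record; a real one would come from the database cursor.
fake_records = [
    {
        "_id": "5f1d7f3e2c6d4a0b8c9e1a11",
        "article_raw_text": "raw text",
        "article_raw_date": "2017-01-01",
        "translated": False,
    }
]
table = build_translation_table(fake_records)
```

The untouched fields (`translated_text`, `translated_date`) stay as empty strings, which is what `np.zeros` produces for unicode fields.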
