How to set attributes in pandasdmx - python

The "pandasdmx" documentation has an example of how to set dimensions, but no how to set attributes.
I tried passing them via "key", but an error occurs.
Perhaps setting attributes requires a different syntax?
At the end of my sample code, I printed out a list of dimensions and attributes.
import pandasdmx as sdmx
import pandas as pd

ecb = sdmx.Request('ECB')
key = dict(FREQ=['M'], SEC_ITEM=['F51100'], REF_AREA=['I8'], SEC_ISSUING_SECTOR=['1000'])
params = dict(startPeriod='2010-02', endPeriod='2010-02')
data_msg = ecb.data('SEC', key=key, params=params)
data = data_msg.data[0]
daily = [s for sk, s in data.series.items() if sk.FREQ == 'M']  # keep series whose key has FREQ == 'M'
cur_df = pd.concat(sdmx.to_pandas(daily)).unstack()
print(cur_df.to_string())

# Inspect the dataflow and its data structure definition (DSD)
flow_msg = ecb.dataflow()
dataflows = sdmx.to_pandas(flow_msg.dataflow)
exr_msg = ecb.dataflow('SEC')
exr_flow = exr_msg.dataflow.SEC
dsd = exr_flow.structure

# List the dimensions and attributes defined by the DSD
dimen = dsd.dimensions.components
attrib = dsd.attributes.components
print(dimen)
print(attrib)

To clarify, what you're asking here is not “how to set attributes” but “how to include attributes when converting (or ‘writing’) an SDMX DataSet to a pandas DataFrame”.
Using sdmx1,¹ see the documentation for the attribute parameter to the write_dataset() function. This is the function that is ultimately called by to_pandas() when you pass a DataSet object as the first argument.
In your sample code, the key line is:
sdmx.to_pandas(daily)
You might try instead:
sdmx.to_pandas(data_msg.data[0], attributes="dsgo")
Per the documentation, the string "dsgo" means “include in additional columns all attributes attached at the dataset, series key, group, and observation levels.”
¹ sdmx1 is a fork of pandaSDMX with many enhancements and more regular releases.
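For instance, adapting the sample above (a sketch, not run against the live service; the exact attribute columns, e.g. OBS_STATUS, depend on the DSD):
df = sdmx.to_pandas(data_msg.data[0], attributes="dsgo")
print(df.head())  # the 'value' column plus one extra column per attached attribute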

How to define variables dynamically in python?

I need to create variables dynamically based on the data that is coming from my UI.
Sample JSON (some of the payloads my python code will receive from the UI):
data_json = {'key1':'value1','key2':'value2','key3':['abc','def']}
data_json = {'key1':'value2','key2':'value8','key3':['abc','def','ghi','jklmn']}
data_json = {'key1':'value3','key2':'value9','key3':['abc']}
data_json = {'key1':'value4','key2':'value2','key3':['abc','def','xyz']}
data_json = {'key1':'value6','key2':'value2','key3':['abc','def']}
I have data in JSON format in which the length of the "key3" value will keep changing each time.
I have to capture those values in separate variables and have to use them later in other functions.
If I pass the first data_json, the first block of the if conditions will run and assign the variables; if I pass the second data_json, the second block will define the variables.
Python:
secret = data_json['key1']
if secret in ['value1', 'value6']:
    first_value = data_json['key3'][0]
    second_value = data_json['key3'][1]
if secret in ['value2']:
    first_value = data_json['key3'][0]
    second_value = data_json['key3'][1]
    third_value = data_json['key3'][2]
    fourth_value = data_json['key3'][3]
if secret in ['value3']:
    first_value = data_json['key3'][0]
if secret in ['value4']:
    first_value = data_json['key3'][0]
    second_value = data_json['key3'][1]
    third_value = data_json['key3'][2]
print("Value in first: %s" % first_value)
print("Value in second: %s" % second_value)
print("Value in third: %s" % third_value)
I'm using conditions to capture those variables. The above code is working fine, but I have to avoid using if conditions. Is there any way to define the variables dynamically on the fly so that I can use them later in the same functions?
I don't think you are approaching it the right way. For such cases - where we have an unknown number of variables - we use lists! Lists in python are of dynamic size, so you don't need to know the exact size before creating a list.
Therefore, you can store your numbers in a list and then access them using the indices like this:
all_values = data_json['key3']
print("Value in first: %s" % all_values[0])
print("Value in second: %s" % all_values[1])
print("Value in third: %s" % all_values[2])
Note that here you don't need conditional statements to make sure you are reading the exact number of values (not more or less) from the JSON.
What you are calling dynamic variables are not needed! Wherever you need first_value, you can use all_values[0]; for second_value, you can use all_values[1], and so on.
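Since the length of key3 varies per payload, a sketch that avoids hard-coding how many values exist is to loop with enumerate:
all_values = data_json['key3']
for i, value in enumerate(all_values):
    print("Value %d: %s" % (i + 1, value))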
The best way to solve your problem is to save the values in an array and access them via indices, rather than creating separate variables for each element in the array.
data_json = {'key1':'value2','key2':'value8','key3':['abc','def','ghi','jklmn']}
key3_vars = data_json['key3']
for var in key3_vars:
    print(var)
But if you have to create separate variables, then you can use the built-in function exec.
data_json = {'key1':'value2','key2':'value8','key3':['abc','def','ghi','jklmn']}
key3_vars = data_json['key3']
for i, var in enumerate(key3_vars):
    exec(f"key3_var{i} = '{var}'")
print(key3_var0)
print(key3_var1)
print(key3_var2)
print(key3_var3)
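That said, if you only need named lookups rather than true top-level variables, a plain dict is a safer sketch than exec (no code generation, and missing keys fail loudly):
key3_vars = data_json['key3']
named = {'key3_var%d' % i: v for i, v in enumerate(key3_vars)}
print(named['key3_var0'])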

cannot unpack non-iterable int object when using python dictionary

I have the following code:
import pandas as pd
import numpy as np
from scipy import stats
np.random.seed(12345)
standarderrors1992 = stats.sem(np.random.normal(32000,200000,3650))
standarderrors1993 = stats.sem(np.random.normal(43000,100000,3650))
standarderrors1994 = stats.sem(np.random.normal(43500,140000,3650))
standarderrors1995 = stats.sem(np.random.normal(48000,70000,3650))
mean1992 = np.random.normal(32000,200000,3650).mean()
mean1993 = np.random.normal(43000,100000,3650).mean()
mean1994 = np.random.normal(43500,140000,3650).mean()
mean1995 = np.random.normal(48000,70000,3650).mean()
Here, I have found both the mean and standard error for a set of randomly chosen values.
limit = 3000
dict = {mean1992: standarderrors1992, mean1993: standarderrors1993,
        mean1994: standarderrors1994, mean1995: standarderrors1995}
for key, value in dict:
    if limit > (key + (1.96 * value)):
        colour = 1
    elif limit < (key + (1.96 * value)):
        colour = 0
    elif (limit <= (key + (1.96 * value))) and (limit >= (key - (1.96 * value))):
        colour = ((key + (1.96 * value)) - limit) / ((key + (1.96 * value)) - (key - (1.96 * value)))
Here, I am trying to put the values corresponding to the means and standard errors into a dictionary so that I can loop through both of them.
Ideally, I want to assign a particular value to the variable 'colour' depending on the values for the mean and standard error of a particular year. i.e. mean and SE for 1992
However, I keep getting the error:
TypeError: cannot unpack non-iterable int object
Could anyone let me know where I'm going wrong?
You need to iterate over dict.items() for this to work:
for key, value in dict.items():
    # do stuff here
I would advise against naming your variable dict, though, since it shadows the built-in dict type :)
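Putting it together, a sketch with the dictionary renamed and the third branch reconstructed (assuming the second comparison was meant to test the lower bound, so: above the interval gives 1, below gives 0, inside interpolates):
year_stats = {mean1992: standarderrors1992, mean1993: standarderrors1993,
              mean1994: standarderrors1994, mean1995: standarderrors1995}
for mean, se in year_stats.items():
    upper = mean + 1.96 * se
    lower = mean - 1.96 * se
    if limit > upper:
        colour = 1
    elif limit < lower:
        colour = 0
    else:
        colour = (upper - limit) / (upper - lower)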

Concise way to convert multiple Date Time to seconds?

I am looking for a way to write the code below in a more concise manner. I thought about trying df[timemonths] = pd.to_timedelta(df[timemonths])...
but it did not work (arg must be a string, timedelta, list, tuple, 1-d array, or Series).
Appreciate any help. Thanks
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3',
'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath',
'TimetoRecurrence']
monthsec = 2.628e6 # seconds per month, to convert seconds to months
df.TimetoLocalRecurrence = pd.to_timedelta(df.TimetoLocalRecurrence).dt.total_seconds()/monthsec
df.TimeFromPriorRTtoSRS = pd.to_timedelta(df.TimeFromPriorRTtoSRS).dt.total_seconds()/monthsec
df.TimetoAcuteG3 = pd.to_timedelta(df.TimetoAcuteG3).dt.total_seconds()/monthsec
df.TimetoLateG3 = pd.to_timedelta(df.TimetoLateG3).dt.total_seconds()/monthsec
df.TimeSRStoLastFUDeath = pd.to_timedelta(df.TimeSRStoLastFUDeath).dt.total_seconds()/monthsec
df.TimeDiagnosistoLastFUDeath = pd.to_timedelta(df.TimeDiagnosistoLastFUDeath).dt.total_seconds()/monthsec
df.TimetoRecurrence = pd.to_timedelta(df.TimetoRecurrence).dt.total_seconds()/monthsec
You could write your operation as a lambda function and then apply it to the relevant columns:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3','TimetoLateG3',
'TimeSRStoLastFUDeath','TimeDiagnosistoLastFUDeath',
'TimetoRecurrence']
monthsec = 2.628e6
convert_to_months = lambda x: pd.to_timedelta(x).dt.total_seconds()/monthsec
df[timemonths] = df[timemonths].apply(convert_to_months)
Granted I am kind of guessing here since you haven't provided any example data to work with.
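For example, with two made-up columns of timedelta strings (hypothetical data, just to show the shape of the operation):
import pandas as pd

monthsec = 2.628e6  # seconds per month
df = pd.DataFrame({'TimetoAcuteG3': ['30 days', '61 days'],
                   'TimetoLateG3': ['365 days', '14 days']})
cols = ['TimetoAcuteG3', 'TimetoLateG3']
df[cols] = df[cols].apply(lambda x: pd.to_timedelta(x).dt.total_seconds() / monthsec)
print(df)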
Iterate over vars() of df
Disclaimer: this solution will most likely only work if the df class doesn't have any other variables.
The way this works is by simply moving the repetitive code after the = sign into a function.
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds()/monthsec
        for key, value in times.items()
    }
Now we have to apply this function to each variable.
Applying it to each variable individually would be tedious. We could use your timemonths list and apply the function by key, but that requires maintaining an array of keys manually:
timemonths = ['TimeFromPriorRTtoSRS', 'TimetoAcuteG3', 'TimetoLateG3', 'TimeSRStoLastFUDeath', 'TimeDiagnosistoLastFUDeath', 'TimetoRecurrence']
This can be annoying, especially if you add more or take away some, because you have to keep updating the array.
So instead, let's dynamically iterate over every variable in df:
for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Full Code:
def convert(times):
    monthsec = 2.628e6
    return {
        key: pd.to_timedelta(value).dt.total_seconds()/monthsec
        for key, value in times.items()
    }

for key, value in convert(vars(df)).items():
    setattr(df, key, value)
Sidenote
The reason I am using setattr is that, examining your code, I concluded df is most likely a class instance, and properties of a class instance (by which I mean variables like self.variable = ...) must be modified via setattr rather than df['variable'] = ....

Django: Add a list to a QuerySet

I am new to django so apologies if this is not possible or easy.
I have a view that takes a subset of a model
data = Terms.objects.filter(language = language_id)
The subset is one language. The set has a number of concepts for a language. Some languages might use the same word for multiple concepts, and I want to colour these the same in an SVG image. So I do this next:
for d in data:
    if d.term is None:
        d.colour = "#D3D3D3"
    else:
        d.colour = termColours[d.term]
Where termColours is a dictionary with keys as the unique terms and values as the hexadecimal colour I want.
I thought this would add a new colour attribute to my queryset. However, when I convert the queryset to JSON (in order to pass it to JS), the colour attribute is not there.
terms_json = serializers.serialize('json', data)
How can I add a new colour element to my queryset?
Convert your QuerySet to dicts with .values(); then each row is a plain dict you can add keys to.
Ex:
data = Terms.objects.filter(language=language_id).values()
for d in data:
    if d['term'] is None:
        d['colour'] = "#D3D3D3"
    else:
        d['colour'] = termColours[d['term']]
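To then pass the result to JS, assuming the .values() approach above, the rows are plain dicts, so the standard json module works (DjangoJSONEncoder handles dates and Decimals):
import json
from django.core.serializers.json import DjangoJSONEncoder

terms_json = json.dumps(list(data), cls=DjangoJSONEncoder)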
If I understand correctly, you need a Django ORM annotation, which might look like this:
from django.db.models import Case, When, Value

data = Terms.objects.filter(language=language_id).annotate(
    colour=Case(
        When(term__isnull=True, then=Value("#D3D3D3")),
        When(term__isnull=False, then=termColours[Value(term)]),
    )
)
The only problem: I don't know exactly how to express termColours[Value(term)]; you will need to test different combinations of that expression to look up the colour for the value of the term field.
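One hedged way to finish that idea, assuming termColours is an ordinary Python dict available when the query is built, is to generate one When per term:
from django.db.models import Case, When, Value, CharField

colour_cases = [When(term=t, then=Value(c)) for t, c in termColours.items()]
data = Terms.objects.filter(language=language_id).annotate(
    colour=Case(*colour_cases, default=Value("#D3D3D3"), output_field=CharField())
)
This keeps the colour lookup inside the query, with the default covering the rows where term is None.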

How to vectorize a json dictionary using R wrapped in python?

High-level description of what I want: I want to receive a JSON response detailing certain values of fields/features, say {a: 1, b: 2, c: 3}, as a flask (JSON) request. Then I want to convert the resulting python_dict into an R dataframe with rpy2 (a single row of one), and feed it into a model in R which expects a set of inputs where each column is a factor in R. I usually use python for this sort of thing and serialize a vectorizer object from sklearn, but this particular analysis needs to be done in R.
So here is what I'm doing so far.
import os

import rpy2.robjects as robjects
from rpy2.robjects.packages import STAP

model = os.path.join('model', 'rsource_file.R')
with open(model, 'r') as f:
    string = f.read()
model = STAP(string, "model")
data_r = robjects.DataFrame(data)  # data: the dict parsed from the flask request
data_factored = model.prepdata(data_r)
result = model.predict(data_factored)
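For the dict-to-one-row data.frame step specifically, a minimal rpy2 sketch looks like this (assuming scalar values in the incoming JSON; the payload below is hypothetical, and older rpy2 versions may need rlc.OrdDict to preserve column order):
import rpy2.robjects as robjects

python_dict = {'a': 1, 'b': 2.5, 'c': 'red'}  # hypothetical parsed request payload

def to_r_column(v):
    # Wrap each scalar in a length-1 R vector so the data.frame has exactly one row
    if isinstance(v, bool):
        return robjects.BoolVector([v])
    if isinstance(v, int):
        return robjects.IntVector([v])
    if isinstance(v, float):
        return robjects.FloatVector([v])
    return robjects.StrVector([str(v)])

data_r = robjects.DataFrame({k: to_r_column(v) for k, v in python_dict.items()})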
The relevant R functions from rsource_file.R are:
prepdata = function(row){
    for(v in vars) if(typeof(row[,v])=="character") row[,v] = as.factor(row[,v], levs[0,v])
    modm2 = model.matrix(frm, data=tdz2, contrasts.arg = c1, xlev = levs)
}
where contrasts and levels have been pre-extracted from an existing dataset like so:
# vars = vector of columns of interest
load("data.Rd")
for(v in vars) if(typeof(data[,v])=="character") data[,v] = as.factor(data[,v])
frm = ~ weightedsum_of_things  # function mapped, causes no issue
modm = model.matrix(frm, data=data)
levs = lapply(data, levels)
c1 = attributes(modm)$contrasts
Calling prepdata does not give me what I want, which is for the new dataframe (built from the JSON request as data_r) to be properly turned into a vector of factors with the same encoding by which the elements of the data.Rd dataset were transformed.
Thank you for your assistance, will upvote.
More detail: what my code is attempting to do is map the levels() method over the dataset to extract a list of lists of possible levels for each factor, and then, for matching values in the new input, call factor() with the new data row as well as the corresponding set of levels, levs[0,v].
This throws an error that you can't use factor if there isn't more than one level. I think this might have something to do with the labels/levels difference? I'm calling levs[,v] to get the element of the return value of lapply(data, levels) corresponding to the title v (a string). I extracted the levels from the dataset, but referencing them in the body of prepdata this way doesn't seem to work. Do I need to extract labels instead? If so, how can I do that?
