I'm trying to use the MICE implementation in the statsmodels package to impute values for my columns, but I can't figure out exactly how to use it. Whatever I run, it throws the error: ValueError: variable to be imputed has no observed values
Code:
import pandas as pd
from statsmodels.imputation.mice import MICEData as md
df = pd.read_csv('contacts.csv', engine='c', low_memory=False)
md(df)
What am I doing wrong?
At least one of the columns in the data frame (and hence in the CSV) is completely empty, i.e. it has no observed values.
Check the data frame; you may have to clean it up or normalize it first.
Also, don't be afraid to look into the code base.
What you are looking for is the _split_indices method of MICEData.
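A minimal sketch of that cleanup, assuming the problem is one or more all-NaN columns: it finds every column with no observed values (the condition _split_indices rejects) and drops it. The helper name drop_all_missing_columns and the small frame standing in for contacts.csv are hypothetical.

```python
import numpy as np
import pandas as pd


def drop_all_missing_columns(df):
    """Return a copy of df without columns that have no observed values,
    plus the names of the columns that were dropped."""
    # A column has no observed values when every entry is missing.
    empty = df.columns[df.isna().all()]
    return df.drop(columns=empty), list(empty)


# Hypothetical stand-in for contacts.csv: column "c" is entirely empty.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, 1.5, 2.5, 3.5],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

cleaned, dropped = drop_all_missing_columns(df)
print("dropped:", dropped)
print(cleaned.columns.tolist())
```

Passing cleaned instead of df to md(...) should get past the _split_indices check; other cleanup (for example, dealing with non-numeric columns) may still be needed.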
I am trying to fine-tune Tapas following the instructions here: https://huggingface.co/transformers/v4.3.0/model_doc/tapas.html#usage-fine-tuning (weak supervision for aggregation, WTQ), using the dataset at https://www.microsoft.com/en-us/download/details.aspx?id=54253, which follows the required SQA dataset format: TSV files with most of the named columns. However, there is no float_answer column. As the documentation says:
float_answer: the float answer to the question, if there is one (np.nan if there isn’t). Only required in case of weak supervision for aggregation (such as WTQ and WikiSQL)
Since I am using WTQ, I need the float_answer column. I tried populating float_answer based on answer_text, as suggested here, using the parse_question(table, question, mode) function from https://github.com/google-research/tapas/blob/master/tapas/utils/interaction_utils_parser.py. However, I am getting errors.
I copied everything from here and put these args:
.
But I get this error: TypeError: Parameter to CopyFrom() must be instance of same class: expected language.tapas.Question got str.
1) Can you please help me understand which arguments I should use, or how else I can populate float_answer?
I am using table_csv and a question whose answer is in the given table:
2) We also tried simply adding a float_answer column with all values set to np.nan. That crashed, too.
Is there a tutorial for WTQ fine-tuning? Thanks!
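One possible stand-in for parse_question, sketched under loud assumptions: answer_text holds a list of answer strings (possibly stored as its string representation in the TSV), and float_answer is set only when there is exactly one answer that parses as a number. The function naive_float_answer is hypothetical and is not the TAPAS parser; it does not reproduce that parser's number and date normalization.

```python
import ast
import math


def naive_float_answer(answer_text):
    """Hypothetical heuristic: a float if the answer is exactly one numeric
    string, otherwise NaN. Not the logic of TAPAS's parse_question."""
    answers = answer_text
    if isinstance(answers, str):
        stripped = answers.strip()
        if stripped.startswith("["):
            # TSV files often store the list as its Python string representation
            try:
                answers = ast.literal_eval(stripped)
            except (ValueError, SyntaxError):
                return math.nan
        else:
            answers = [stripped]
    if not isinstance(answers, (list, tuple)) or len(answers) != 1:
        return math.nan
    try:
        # Drop thousands separators such as "1,000" before converting
        return float(str(answers[0]).replace(",", ""))
    except ValueError:
        return math.nan


for text in ["['42']", "['Paris']", "['1', '2']", "['1,000']"]:
    print(text, naive_float_answer(text))
```

With the TSV loaded into a DataFrame, the column could then be filled with df["float_answer"] = df["answer_text"].apply(naive_float_answer).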
I had a big table, which I sliced into many smaller tables based on their dates:
dfs = {}
for fecha in fechas:
    dfs[fecha] = df[df['date'] == fecha].set_index('Hour')
# now I can access the tables like this:
dfs['2019-06-23'].head()
I have made some modifications to the dfs['2019-06-23'] table, and now I would like to save it to my computer. I have tried two ways:
#first try:
dfs['2019-06-23'].to_csv('specific/path/file.csv')
#second try:
test=dfs['2019-06-23']
test.to_csv('test.csv')
Both of them raised this error:
TypeError: get_handle() got an unexpected keyword argument 'errors'
I don't know why I get this error and haven't found any explanation for it. I have saved many files this way and never had this before.
My goal: to save this DataFrame as a CSV after my modifications.
If you are getting this error, there are two things to check:
Whether your DataFrame is actually a Series (see Pandas : to_csv() got an unexpected keyword argument).
Your numpy version. For me, updating to numpy==1.20.1 with pandas==1.2.2 fixed the problem. If you are using Jupyter notebooks, remember to restart the kernel afterwards.
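Both checks can be sketched as follows, assuming the object being saved may be either a Series or a DataFrame. The helper to_csv_checked is hypothetical, and printing the versions is just a quick way to compare them against a known-good pair such as numpy 1.20.1 with pandas 1.2.2.

```python
import io

import numpy as np
import pandas as pd


def to_csv_checked(obj, path_or_buf):
    """Write obj as CSV, converting a Series to a one-column DataFrame first."""
    if isinstance(obj, pd.Series):
        obj = obj.to_frame()
    elif not isinstance(obj, pd.DataFrame):
        raise TypeError(f"expected DataFrame or Series, got {type(obj).__name__}")
    obj.to_csv(path_or_buf)


# Check 2: the installed versions
print("pandas", pd.__version__, "numpy", np.__version__)

# Check 1: a Series goes through the same DataFrame code path
buf = io.StringIO()
to_csv_checked(pd.Series([1, 2], name="x"), buf)
print(buf.getvalue())
```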
In the end, what worked was to wrap the table in pd.DataFrame and then export it as follows:
to_export=pd.DataFrame(dfs['2019-06-23'])
to_export.to_csv('my_table.csv')
That surprised me, because when I checked the type of the table at the time of the error, it was a DataFrame. Nevertheless, this way works.
If I analyze these two datasets individually, I don't get any error, and I also get the visualizations of all the integer columns.
But when I try to compare these DataFrames, I get the error below:
Cannot convert series 'Web Visit' in COMPARED from its TYPE_CATEGORICAL to the desired type TYPE_BOOL.
I also tried using FeatureConfig to skip it, but to no avail.
pid_compare = sweetviz.compare([pdf,"234_7551009"],[pdf_2,"215_220941058"])
Maintainer of the library here; this question was also asked in the project's Git repository, but it will be useful to detail the answer here.
After looking at the data you provided in the link above, it looks like the first DataFrame (pdf) contains only 0 and 1 in that column, so it is classified as boolean, and therefore it cannot be compared against the second one, which is categorical (that one has 0, 1, 2, 3, as you probably know!).
The system will be able to handle it if you use FeatureConfig to force that feature in the first DataFrame to be treated as CATEGORICAL.
I just tried the following and it seems to work, let me know if it helps!
import sweetviz
feature_config = sweetviz.FeatureConfig(force_cat=["Web Desktop Interaction"])
# positional arguments: source, compare, target feature (None), feature config
report = sweetviz.compare(pdf, pdf_2, None, feature_config)
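To find which columns need force_cat before calling compare, a small pandas check can flag columns that hold only 0/1 in one frame but other values in the other, which is the mismatch described above. This is a sketch assuming that 0/1-only columns are the ones classified as boolean; columns_needing_force_cat is a hypothetical helper, not part of sweetviz, and the two frames below are made-up data.

```python
import pandas as pd


def columns_needing_force_cat(df_a, df_b):
    """Hypothetical helper: shared columns that contain only 0/1 values in
    one frame but other values in the other frame."""
    flagged = []
    for col in df_a.columns:
        if col not in df_b.columns:
            continue
        a_binary = set(df_a[col].dropna().unique()) <= {0, 1}
        b_binary = set(df_b[col].dropna().unique()) <= {0, 1}
        if a_binary != b_binary:
            flagged.append(col)
    return flagged


first = pd.DataFrame({"Web Visit": [0, 1, 1, 0], "x": [1, 2, 3, 4]})
second = pd.DataFrame({"Web Visit": [0, 1, 2, 3], "x": [5, 6, 7, 8]})
print(columns_needing_force_cat(first, second))
```

Its result can then be passed as sweetviz.FeatureConfig(force_cat=columns_needing_force_cat(pdf, pdf_2)).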
I'm new to programming. I'm trying to use scipy's minimize; I had several issues and have gotten through most of them.
This is the code right now, but I don't understand why I'm getting this error.
par_opt = so.minimize(fun=fun_obj, x0=par_ini, method='Nelder-Mead', args=[series_pt_cal, dt, series_caudal_cal])
Not enough info is given by the OP, but basically, somewhere in the code an operation is specified along axis=1 (an axis only a data frame has) on an object that is actually a pandas Series. If the code usually works but occasionally gives errors, check for degenerate cases where a data frame may have only one row. Pandas has a nasty habit of guessing what you want: it may decide to reduce a 1-row data frame to a Series (e.g., in the apply() function; older pandas versions let you disable that with reduce=False, an argument deprecated in pandas 0.23 in favor of result_type).
Add a line of code to check that isinstance(df, pd.DataFrame) holds, or else convert the offending pandas Series to a data frame; for the problems I had to deal with, something like s.to_frame().T worked.
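A minimal sketch of that guard, assuming the downstream code expects a DataFrame; as_frame is a hypothetical helper name. Selecting a single row with iloc[0] reproduces the degenerate case, and to_frame().T turns the Series back into a one-row DataFrame with the original column labels.

```python
import pandas as pd


def as_frame(obj):
    """Return obj as a DataFrame; a Series is treated as a single row."""
    if isinstance(obj, pd.DataFrame):
        return obj
    if isinstance(obj, pd.Series):
        # to_frame() gives one column; .T turns it into one row whose
        # columns are the Series' index labels
        return obj.to_frame().T
    raise TypeError(f"expected DataFrame or Series, got {type(obj).__name__}")


df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
row = df.iloc[0]              # a Series, not a 1-row DataFrame
fixed = as_frame(row)
print(fixed.shape)            # (1, 2)
print(fixed.sum(axis=1))      # axis=1 works again
```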
Use pd.DataFrame(df) before your so.minimize call.
Pandas needs a DataFrame for that function.
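Separately, note how args is passed: scipy.optimize.minimize documents args as a tuple of extra positional arguments for the objective, and a non-tuple value is wrapped as a single argument, so a list like [series_pt_cal, dt, series_caudal_cal] reaches the objective as one list rather than three values. A minimal sketch with a tuple, where fun_obj, its linear model, and the data are hypothetical stand-ins for the original ones:

```python
import numpy as np
import pandas as pd
from scipy import optimize as so


def fun_obj(params, x, dt, y_obs):
    """Hypothetical objective: sum of squared errors of y = a * x * dt + b."""
    a, b = params
    y_pred = a * np.asarray(x) * dt + b
    return float(np.sum((np.asarray(y_obs) - y_pred) ** 2))


# Hypothetical stand-ins for series_pt_cal, dt and series_caudal_cal
x_series = pd.Series([1.0, 2.0, 3.0, 4.0])
dt = 0.5
y_series = 2.0 * x_series * dt + 1.0

# args is a tuple, so fun_obj receives x_series, dt and y_series separately
par_opt = so.minimize(fun=fun_obj, x0=[0.5, 0.5], method="Nelder-Mead",
                      args=(x_series, dt, y_series))
print(par_opt.x)  # close to [2.0, 1.0]
```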
I am completely new to programming languages, and I have picked up Python for backtesting a trading strategy (because I heard it is relatively easy). I have made some progress in learning the basics; however, I am currently stuck on performing ADF (augmented Dickey-Fuller) tests on a time-series DataFrame.
This is how my DataFrame looks:
Now I need to run the ADF test on the columns "A-Btd", "A- Ctd", and so on (I have 66 columns like these). I would like to get the test statistic/output for each of them.
I tried using lines such as cadfs = [ts.adfuller(df1)]. Since I lack the expertise, I am not able to adapt the code to my DataFrame.
I apologize in advance if I have left out some important information. Please leave a comment and I will provide it ASAP.
Thanks a lot in advance!
If you have to do it for that many columns, I would try putting the results in a dict, like this:
import statsmodels.tsa.stattools as tsa
df = ... #load your dataframe
adf_results = {}
for col in df.columns.values:  # or edit this for a subset of columns first
    adf_results[col] = tsa.adfuller(df[col])
You can, of course, specify other settings as desired, e.g. tsa.adfuller(df[col], autolag='BIC'). Or, if you don't want all the output and would rather just check whether each column is stationary, note that the test statistic is the first entry in the tuple returned by adfuller(), so you could use tsa.adfuller(df[col])[0], compare it against your threshold to get a boolean result, and make that the value in your dict.