BulkIndexError: ('2 document(s) failed to index.') - Elasticsearch + Python

At first I found some null values in my preprocessed data, so I removed those.
(Here's my data cleaning code, with the respective outputs enclosed in '''comments''')
Cleaning and Preprocessing
import numpy as np   # np.where is used below
import pandas as pd

df_merged[df_merged.abstract_x != df_merged.abstract_y].shape
#this means out of the 25000 samples, abstract is not matching between metadata and pdf data
'''(22728, 22)'''
# check metadata abstract column to see if null values exist
df_merged.abstract_x.isnull().sum()
'''3363'''
# Check pdf_json abstract to see if null values exist
df_merged.abstract_y.isnull().sum()
'''0'''
# Since abstract_x from the metadata is more reliable, we will use it, and only fill from abstract_y when abstract_x is null
# Convert all columns to string and then replace abstract_y values
#df = df.astype(str)
df_merged['abstract_y'] = df_merged['abstract_y'].astype(str)
df_merged['abstract_y'] = np.where(df_merged['abstract_y'].map(len) > 50, df_merged['abstract_y'], 'na')
# we want to overwrite abstract_x where it is null and abstract_y is not 'na'
df_merged.loc[df_merged.abstract_x.isnull() & (df_merged.abstract_y != 'na'), 'abstract_x'] = df_merged[df_merged.abstract_x.isnull() & (df_merged.abstract_y != 'na')].abstract_y
df_merged.abstract_x.isnull().sum()
'''
2745
'''
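# (Hedged aside, not part of the original notebook: an equivalent way to write the
# conditional overwrite above, letting pandas align on the index instead of
# repeating the mask expression twice)
fallback = df_merged['abstract_y'].where(df_merged['abstract_y'] != 'na')
df_merged['abstract_x'] = df_merged['abstract_x'].fillna(fallback)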
df_merged.rename(columns={'abstract_x': 'abstract'}, inplace=True)
df_merged.columns
'''
Index(['cord_uid', 'sha', 'source_x', 'title', 'doi', 'pmcid', 'pubmed_id',
'license', 'abstract', 'publish_time', 'authors', 'journal', 'mag_id',
'who_covidence_id', 'arxiv_id', 'pdf_json_files', 'pmc_json_files',
'url', 's2_id', 'abstract_y', 'body_text_x', 'body_text_y'],
dtype='object')
'''
df_merged = df_merged.drop(['abstract_y'], axis=1)
df_merged.columns
'''
Index(['cord_uid', 'sha', 'source_x', 'title', 'doi', 'pmcid', 'pubmed_id',
'license', 'abstract', 'publish_time', 'authors', 'journal', 'mag_id',
'who_covidence_id', 'arxiv_id', 'pdf_json_files', 'pmc_json_files',
'url', 's2_id', 'body_text_x', 'body_text_y'],
dtype='object')
'''
(df_merged.body_text_x != df_merged.body_text_y).sum()
'''25000'''
df_merged.body_text_x.isnull().sum()
'''1526'''
df_merged.body_text_y.isnull().sum()
'''5238'''
df_merged[df_merged.body_text_x.isnull() & df_merged.body_text_y.notnull()].shape
'''(1447, 21)'''
# when body_text_y is not null, copy body_text_y into body_text_x
df_merged.loc[df_merged.body_text_y.notnull(), 'body_text_x'] = df_merged.loc[df_merged.body_text_y.notnull(), 'body_text_y']
df_merged.body_text_x.isnull().sum()
'''79'''
df_merged.columns
'''
Index(['cord_uid', 'sha', 'source_x', 'title', 'doi', 'pmcid', 'pubmed_id',
'license', 'abstract', 'publish_time', 'authors', 'journal', 'mag_id',
'who_covidence_id', 'arxiv_id', 'pdf_json_files', 'pmc_json_files',
'url', 's2_id', 'body_text_x', 'body_text_y'],
dtype='object')
'''
df_merged.rename(columns={'body_text_x': 'body_text'}, inplace=True)
df_merged = df_merged.drop(['body_text_y'], axis=1)
df_merged.columns
'''
Index(['cord_uid', 'sha', 'source_x', 'title', 'doi', 'pmcid', 'pubmed_id',
'license', 'abstract', 'publish_time', 'authors', 'journal', 'mag_id',
'who_covidence_id', 'arxiv_id', 'pdf_json_files', 'pmc_json_files',
'url', 's2_id', 'body_text'],
dtype='object')
'''
df_final = df_merged[['sha', 'title', 'abstract', 'publish_time', 'authors', 'url', 'body_text']]
df_final.head()
sha title abstract publish_time authors url body_text
0 1cbf95a2c3a39e5cc80a5c4c6dbcec7cc718fd59 Genomic Evolution of Severe Acute Respiratory ... Abstract Recent emergence of severe acute resp... 2020-08-31 Jacob, Jobin John; Vasudevan, Karthick; Veerar... https://api.elsevier.com/content/article/pii/S... The outbreak of severe acute respiratory syndr...
1 7dc6943ca46a1093ece2594002d61efdf9f51f28 Impact of COVID-19 on COPD and Asthma admissio... Asthma and Chronic Obstructive Pulmonary Disea... 2020-12-10 Sykes, Dominic L; Faruqi, Shoaib; Holdsworth, ... https://www.ncbi.nlm.nih.gov/pubmed/33575313/;... The COVID-19 pandemic has led to an overall re...
2 5b127336f68f3dca83981d0142eda472634378f0 Programmable System of Cas13-Mediated RNA Modi... Clustered regularly interspaced short palindro... 2021-07-27 Tang, Tian; Han, Yingli; Wang, Yuran; Huang, H... https://www.ncbi.nlm.nih.gov/pubmed/34386490/;... Prokaryotic clustered regularly interspaced sh...
3 aafbe282248436380dd737bae844725882df2249 Are You Tired of Working amid the Pandemic? Th... With the outbreak of novel coronavirus in 2019... 2020-12-09 Chen, Huaruo; Liu, Fan; Pang, Liman; Liu, Fei;... https://doi.org/10.3390/ijerph17249188; https:... In the outbreak of novel coronavirus pneumonia...
4 4013a7e351c40d2bb7fdfe7f185d2ef9b1a872e6 Viral Sepsis in Children Sepsis in children is typically presumed to be... 2018-09-18 Gupta, Neha; Richter, Robert; Robert, Stephen;... https://www.ncbi.nlm.nih.gov/pubmed/30280095/;... The true incidence of viral sepsis, particular...
df_final = df_final.dropna(axis=0,subset=['abstract', 'body_text'])
df_final.isnull().sum()
'''
sha 0
title 0
abstract 0
publish_time 0
authors 104
url 0
body_text 0
dtype: int64
'''
df_final.shape
'''(22186, 7)'''
df_final.to_csv('FINAL_CORD_DATA.csv', index=False)
Whenever I try to use the sample dataset I created in my es_populate notebook with the sparse retriever, I keep getting:
BulkIndexError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19912/2533749049.py in <module>
----> 1 document_store.write_documents(final_dicts)
~\anaconda3\lib\site-packages\haystack\document_store\elasticsearch.py in write_documents(self, documents, index, batch_size, duplicate_documents)
426 # Pass batch_size number of documents to bulk
427 if len(documents_to_index) % batch_size == 0:
--> 428 bulk(self.client, documents_to_index, request_timeout=300, refresh=self.refresh_type)
429 documents_to_index = []
430
~\anaconda3\lib\site-packages\elasticsearch\helpers\actions.py in bulk(client, actions, stats_only, *args, **kwargs)
388 # make streaming_bulk yield successful results so we can count them
389 kwargs["yield_ok"] = True
--> 390 for ok, item in streaming_bulk(client, actions, *args, **kwargs):
391 # go through request-response pairs and detect failures
392 if not ok:
~\anaconda3\lib\site-packages\elasticsearch\helpers\actions.py in streaming_bulk(client, actions, chunk_size, max_chunk_bytes, raise_on_error, expand_action_callback, raise_on_exception, max_retries, initial_backoff, max_backoff, yield_ok, *args, **kwargs)
309
310 try:
--> 311 for data, (ok, info) in zip(
312 bulk_data,
313 _process_bulk_chunk(
~\anaconda3\lib\site-packages\elasticsearch\helpers\actions.py in _process_bulk_chunk(client, bulk_actions, bulk_data, raise_on_exception, raise_on_error, *args, **kwargs)
245 resp=resp, bulk_data=bulk_data, raise_on_error=raise_on_error
246 )
--> 247 for item in gen:
248 yield item
249
~\anaconda3\lib\site-packages\elasticsearch\helpers\actions.py in _process_bulk_chunk_success(resp, bulk_data, raise_on_error)
186
187 if errors:
--> 188 raise BulkIndexError("%i document(s) failed to index." % len(errors), errors)
189
190
BulkIndexError: ('2 document(s) failed to index.', [{'index': {'_index': 'document', '_type': '_doc', '_id': '9d04e1c37a299818d82416898ffe22d6', 'status': 400, 'error': {'type': 'mapper_parsing_exception', 'reason': 'failed to parse', 'caused_by': {'type': 'json_parse_exception', 'reason': "Non-standard token 'NaN': enable JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS to allow\n at [Source: (ByteArrayInputStream); line: 1, column: 217076]"}}, 'data': {'text': 'Increase
My method of using the document store was:
# Connect to Elasticsearch
from haystack.document_store import ElasticsearchDocumentStore
document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")
C:\Users\manan\anaconda3\lib\site-packages\elasticsearch\connection\base.py:190: ElasticsearchDeprecationWarning: Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
warnings.warn(message, category=ElasticsearchDeprecationWarning)
02/20/2022 00:58:28 - INFO - elasticsearch - HEAD http://localhost:9200/ [status:200 request:0.227s]
02/20/2022 00:58:28 - INFO - elasticsearch - HEAD http://localhost:9200/document [status:200 request:0.015s]
02/20/2022 00:58:28 - INFO - elasticsearch - GET http://localhost:9200/document [status:200 request:0.011s]
02/20/2022 00:58:28 - INFO - elasticsearch - PUT http://localhost:9200/document/_mapping [status:200 request:0.087s]
02/20/2022 00:58:28 - INFO - elasticsearch - HEAD http://localhost:9200/label [status:200 request:0.006s]
document_store.write_documents(final_dicts)
02/20/2022 00:58:34 - INFO - elasticsearch - POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:3.887s]
02/20/2022 00:58:38 - INFO - elasticsearch - POST http://localhost:9200/_bulk?refresh=wait_for [status:200 request:3.464s]
followed by the above error.
I'm very new to this, and would appreciate any help that could come my way.
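Note (a hedged guess, not a confirmed fix): the json_parse_exception above complains about a literal NaN token, which usually means some field in final_dicts still holds a pandas NaN, for example the 104 missing authors values, or NaNs reintroduced when FINAL_CORD_DATA.csv is read back in. A minimal sketch of scrubbing those before indexing:
import pandas as pd

# Assumption: final_dicts is built from the saved CSV; fill every remaining NaN
# with an empty string so the bulk payload contains no bare NaN tokens.
df_final = pd.read_csv('FINAL_CORD_DATA.csv')
df_final = df_final.fillna('')

# rebuild final_dicts from the cleaned frame, then index as before
document_store.write_documents(final_dicts)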

Related

How to pass multiple values in GitLab CI/CD variables

I have multiple projects in my GitLab repository, and I commit to them regularly.
I have written a Python script that produces a CSV report of all the commits I have made across the projects in the repository; the project IDs are hard-coded in the script as a list.
The header of the CSV file is: Date, submitted, gitlab_url, project, username, subject.
Now I want to run the pipeline manually, setting up an environment variable named 'Project_Ids',
and pass some of the project IDs as its value (more than one project ID) so that the CSV report is generated only for the projects passed in that environment variable.
So my question is: how can I pass multiple project IDs as the value of the 'Project_Ids' key when running the pipeline manually?
import gitlab
import os
import datetime
import csv
import re

Project_id_list = ['9427','8401','17937','26813','24899','23729','34779','27638','28600']
headerList = ['Date', 'Submitted', 'Gitlab_url', 'Project', 'Branch', 'Status', 'Username', 'Ticket', 'Subject']
filename = 'mydemo_{}'.format(datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S'))

# private token authentication
gl = gitlab.Gitlab('https://main.gitlab.in.com/', private_token="MLyWwLyEhU2zZjjjhZXog")
gl.auth()

# list all projects
for m in Project_id_list:
    i = 0
    if i < len(Project_id_list):
        i += 1
    print(m)
    projects = gl.projects.get(m)
    commits = projects.commits.list(all=True, query_parameters={'ref_name': 'master'})
    with open(f"{filename}_{m}.csv", 'w', newline="") as file:
        dw = csv.DictWriter(file, delimiter=',', fieldnames=headerList)
        dw.writeheader()
        for commit in commits:
            print(commit)
            msg = commit.message
            if 'master' in msg or 'LCS-' in msg:
                projectName = projects.path_with_namespace
                branch = 'master'
                status = 'merged'
                date = commit.committed_date.split('T')[0]
                submitted1 = commit.created_at.split('T')[1]
                submitted = submitted1.split('.000')[0]
                Gitlab_url = commit.web_url.split('-')[0]
                username = commit.author_name
                subject = commit.title
                subject1 = commit.message.splitlines()
                print(subject1)
                subject2 = subject1[0:3]
                print(subject2)
                subject3 = ' '.join(subject2)
                print(subject3)
                match = re.search(r'S-\d+', subject3)
                if match:
                    ticket = match.group(0)
                    ticket_url = 'https://.in.com/browse/' + str(ticket)
                    ticket1 = ticket_url
                else:
                    ticket1 = 'Not Found'
                dw.writerow({'Date': date, 'Submitted': submitted, 'Gitlab_url': Gitlab_url, 'Project': projectName,
                             'Branch': branch, 'Status': status, 'Username': username, 'Ticket': ticket1,
                             'Subject': subject3})
Just use a space or some other delimiter in the variable value. For example, a string like 123 456 789
Then in Python, simply parse the variable. For example, using the string .split method to split on whitespace.
import os
...
project_ids_variable = os.environ.get('PROJECT_IDS', '') # '123 456 789'
project_ids = project_ids_variable.split() # ['123', '456', '789']
for project_id in project_ids:
    project = gl.projects.get(project_id)
    print(project)
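If a different delimiter is preferred, the same idea applies; for example, a sketch assuming a hypothetical comma-separated value:
import os

# Hypothetical variant: PROJECT_IDS set to "123,456,789" when triggering the pipeline
project_ids = [p.strip() for p in os.environ.get('PROJECT_IDS', '').split(',') if p.strip()]

for project_id in project_ids:
    project = gl.projects.get(project_id)
    print(project)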

Huggingface SciBERT predict masked word not working

I am trying to use the pretrained SciBERT model (https://huggingface.co/allenai/scibert_scivocab_uncased) from Huggingface to predict masked words in scientific/biomedical text. This produces errors, and I'm not sure how to move forward from this point.
Here is the code so far -
!pip install transformers
from transformers import pipeline, AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")
This works with plain BERT, but that is not the specialized pretrained model -
!pip install transformers
from transformers import pipeline
unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")
The errors with SciBERT are -
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/__init__.py in pipeline(task, model, config, tokenizer, feature_extractor, framework, revision, use_fast, use_auth_token, model_kwargs, **kwargs)
494 kwargs["feature_extractor"] = feature_extractor
495
--> 496 return task_class(model=model, framework=framework, task=task, **kwargs)
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/fill_mask.py in __init__(self, model, tokenizer, modelcard, framework, args_parser, device, top_k, task)
73 )
74
---> 75 self.check_model_type(TF_MODEL_WITH_LM_HEAD_MAPPING if self.framework == "tf" else MODEL_FOR_MASKED_LM_MAPPING)
76 self.top_k = top_k
77
/usr/local/lib/python3.7/dist-packages/transformers/pipelines/base.py in check_model_type(self, supported_models)
652 self.task,
653 self.model.base_model_prefix,
--> 654 f"The model '{self.model.__class__.__name__}' is not supported for {self.task}. Supported models are {supported_models}",
655 )
656
PipelineException: The model 'BertModel' is not supported for fill-mask. Supported models are ['BigBirdForMaskedLM', 'Wav2Vec2ForMaskedLM', 'ConvBertForMaskedLM', 'LayoutLMForMaskedLM', 'DistilBertForMaskedLM', 'AlbertForMaskedLM', 'BartForConditionalGeneration', 'MBartForConditionalGeneration', 'CamembertForMaskedLM', 'XLMRobertaForMaskedLM', 'LongformerForMaskedLM', 'RobertaForMaskedLM', 'SqueezeBertForMaskedLM', 'BertForMaskedLM', 'MegatronBertForMaskedLM', 'MobileBertForMaskedLM', 'FlaubertWithLMHeadModel', 'XLMWithLMHeadModel', 'ElectraForMaskedLM', 'ReformerForMaskedLM', 'FunnelForMaskedLM', 'MPNetForMaskedLM', 'TapasForMaskedLM', 'DebertaForMaskedLM', 'DebertaV2ForMaskedLM', 'IBertForMaskedLM']
As the error message tells you, you need to use AutoModelForMaskedLM:
from transformers import pipeline, AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
model = AutoModelForMaskedLM.from_pretrained("allenai/scibert_scivocab_uncased")
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer)
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")
Output:
[{'sequence': 'the patient is a 55 year old woman admitted with pneumonia',
'score': 0.4025486707687378,
'token': 10221,
'token_str': 'woman'},
{'sequence': 'the patient is a 55 year old man admitted with pneumonia',
'score': 0.23970800638198853,
'token': 508,
'token_str': 'man'},
{'sequence': 'the patient is a 55 year old female admitted with pneumonia',
'score': 0.15444642305374146,
'token': 3672,
'token_str': 'female'},
{'sequence': 'the patient is a 55 year old male admitted with pneumonia',
'score': 0.1111455038189888,
'token': 3398,
'token_str': 'male'},
{'sequence': 'the patient is a 55 year old boy admitted with pneumonia',
'score': 0.015877680853009224,
'token': 12481,
'token_str': 'boy'}]
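As a small follow-up (an assumption based on the FillMaskPipeline __init__ shown in the traceback, which stores a top_k attribute), the number of returned predictions can presumably be limited by passing top_k when building the pipeline:
# Assumption: top_k limits how many candidate tokens are returned per mask
unmasker = pipeline('fill-mask', model=model, tokenizer=tokenizer, top_k=3)
unmasker("the patient is a 55 year old [MASK] admitted with pneumonia")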

Odoo - Confirm Sales Order

Hi, I am using Odoo 10 and trying to create a sales order in POS. The code below creates a sales order quotation. I want to confirm the sale and create a sales order, not a quotation.
@api.model
def create_sales_order(self, orderline, customer_id, sign):
    sale_pool = self.env['sale.order']
    prod_pool = self.env['product.product']
    sale_line_pool = self.env['sale.order.line']
    sale_no = ''
    sale = {}
    if customer_id:
        customer_id = int(customer_id)
        sale = {'partner_id': customer_id,
                'partner_invoice_id': customer_id,
                'partner_shipping_id': customer_id,
                'signature': sign}
    sale_id = sale_pool.create(sale)
    if sale_id:
        sale_brw = sale_id
        sale_brw.onchange_partner_id()
        # create sale order lines
        for line in orderline:
            sale_line = {}
            if line.get('product_id'):
                prod_rec = prod_pool.browse(line['product_id'])
                sale_line.update({'name': prod_rec.name or False,
                                  'product_id': prod_rec.id,
                                  'product_uom_qty': line['qty'],
                                  'discount': line.get('discount'),
                                  'order_id': sale_id.id})
                sale_line_id = sale_line_pool.create(sale_line)
                for line in sale_line_id:
                    line.product_id_change()
    return {"name": sale_brw.name, "id": sale_brw.id}
How do I create a sales order, not a quotation?
Short answer: Set state to "sale":
sale = {'partner_id': customer_id,
        'partner_invoice_id': customer_id,
        'partner_shipping_id': customer_id,
        'signature': sign,
        'state': 'sale'}
Sales orders and quotations are saved on the same model (namely, sale.order); you can tell whether a record is a sales order or a quotation by looking at its state:
state | Meaning
-------|--------
draft | Quotation
sent | Quotation Sent
sale | Sales Order
done | Locked
cancel | Cancelled
Also, you can look at the function action_confirm, which is triggered by clicking the Confirm Sale button, in the addons/sale/models/sale.py file:
def action_done(self):
    return self.write({'state': 'done'})

...

@api.multi
def action_confirm(self):
    for order in self:
        order.state = 'sale'
        order.confirmation_date = fields.Datetime.now()
        if self.env.context.get('send_email'):
            self.force_quotation_send()
        order.order_line._action_procurement_create()
    if self.env['ir.values'].get_default('sale.config.settings', 'auto_done_setting'):
        self.action_done()
    return True
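Alternatively, a hedged sketch (not tested against your POS module): instead of writing state='sale' directly, you could call action_confirm() on the created order, so the confirmation date, procurement creation, and the optional auto-lock shown above all run through the standard flow:
sale_id = sale_pool.create(sale)
# ... create the sale.order.line records as in the question ...

# Confirm the quotation through the normal workflow instead of forcing the state
sale_id.action_confirm()   # sets state to 'sale' and runs the side effects shown above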

I use to_gbq in pandas to update Google BigQuery and get GenericGBQException

While trying to use to_gbq for updating Google BigQuery table, I get a response of:
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
My code:
gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
and my dataframe of mini_df looks like:
date request_number name feature_name value_name value
2018-01-10 1 1 "a" "b" 0.309457
2018-01-10 1 1 "c" "d" 0.273748
While running to_gbq with no existing table in BigQuery, I can see that the table is created with the following schema:
date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
What am I doing wrong? How can I solve this?
P.S, rest of the exception:
BadRequest Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
589 destination_table,
--> 590 job_config=job_config).result()
591 except self.http_error as ex:
~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
527 # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528 return super(_AsyncJob, self).result(timeout=timeout)
529
~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
110 # Pylint doesn't recognize that this is valid in this case.
--> 111 raise self._exception
112
BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
During handling of the above exception, another exception occurred:
GenericGBQException Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
106 chunksize=chunksize,
107 verbose=verbose, reauth=reauth,
--> 108 if_exists=if_exists, private_key=private_key)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
987 table.create(table_id, table_schema)
988
--> 989 connector.load_data(dataframe, dataset_id, table_id, chunksize)
990
991
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
590 job_config=job_config).result()
591 except self.http_error as ex:
--> 592 self.process_http_error(ex)
593
594 rows = []
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
454 # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
455
--> 456 raise GenericGBQException("Reason: {0}".format(ex))
457
458 def run_query(self, query, **kwargs):
GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.
I've had the very same problem.
In my case it came down to the object dtype of the DataFrame.
I had three columns: externalId, mappingId, info. I didn't set a data type for any of those fields and let pandas do its magic.
It decided to set all three column data types to object. The problem is that, internally, the to_gbq component uses the to_json component. For some reason or other, this output omits the quotes around a data field if the type of the field is object but it holds only numerical values.
So Google Big Query needed this
{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}
but got this:
{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}
And because the mapping of the field was STRING in Google Big Query, the import process failed.
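A minimal sketch (with a hypothetical one-row frame) illustrating that quoting difference:
import pandas as pd

# object dtype column that happens to hold only numeric values
df = pd.DataFrame({'externalId': [12345], 'mappingId': ['abc123'], 'info': ['blerb']})
df['externalId'] = df['externalId'].astype(object)

print(df.to_json(orient='records'))
# [{"externalId":12345,"mappingId":"abc123","info":"blerb"}]   <- bare number, no quotes

print(df.astype({'externalId': str}).to_json(orient='records'))
# [{"externalId":"12345","mappingId":"abc123","info":"blerb"}] <- quoted string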
Two solutions came up.
Solution 1 - Change the data type of the column
A simple type conversion helped with this issue. I also had to change the data type in Big Query to INTEGER.
df['externalId'] = df['externalId'].astype('int')
If this is the case, Big Query can consume fields without quotes as the JSON standard says.
Solution 2 - Make sure the string field is a string
Again, this is setting the data type. But since we set it explicitly to String, the export with to_json prints out a quoted field and everything worked fine.
df['externalId'] = df['externalId'].astype('str')

Disable default behavior - ModelForm Select widget choices filled with referenced objects

I am using a form in Django that is based on a model. So it looks like this:
240 class EventDetailForm(NgFormValidationMixin, NgModelForm):
241     def __init__(self, *args, **kwargs):
242         super(EventDetailForm, self).__init__(*args, **kwargs)
243         self.fields['gallery'].queryset = Gallery.objects.none()
244
245     class Meta:
246         model = Event
247         fields = ('title', 'description', 'end_date', 'start_date', 'gallery', 'cover_photo')
248         widgets = {
249             'title': forms.TextInput(attrs={
250                 'editable-detail': '',
251             }),
252             'description': forms.TextInput(attrs={
253                 'class': 'panel-body',
254                 'id': 'event-description-editable',
255                 'editable-detail': '',
256             }),
257             'cover_photo': SelectWithDefaultOptions(attrs={
258                 'class': 'chosen-select-no-single',
259                 'id': 'select-cover-photo',
260                 'data-placeholder': 'Select Cover Photo',
261                 'style': 'width: 200px;',
262                 'tabindex': '-1',
263             }),
264             'start_date': DateTimeWidget(attrs = {
265                 'class': 'datetimepicker col-xs-6',
266                 'id': 'event-start-date-editable',
267                 'editable-detail': '',
268             }),
269             'end_date': DateTimeWidget(attrs = {
270                 'class': 'datetimepicker col-xs-6',
271                 'id': 'event-end-date-editable',
272                 'editable-detail': '',
273             }),
274             'gallery': SelectWithDefaultOptions(attrs={
275                 'class': 'chosen-select-no-single',
276                 'id': 'select-galley',
277                 'data-placeholder': 'Select Gallery',
278                 'style': 'width: 200px;',
279                 'gallery-select': '',
280                 'tabindex': '-1',
281                 'organisator-profile-specific': '',
282             })
283         }
So what happens is that my gallery and cover_photo select widgets get filled with all the existing objects of the two types (because they are actually foreign keys to other models).
I want to prevent that, and as you see on line 243 I have tried to clear the current queryset (I tried clearing the choices too, with the same result), which works pretty well. The problem is that, as you can see, I use my custom select widget, in which I set some default options. It looks like this:
class SelectWithDefaultOptions(forms.Select):
    def __init__(self, attrs=None, choices=()):
        super(SelectWithDefaultOptions, self).__init__(attrs, choices)

        choices = ('', 'empty') + choices
        choices = ('None', 'no selection') + choices
The problem is that with the approach I mentioned above I delete those values as well.
So I said to myself, "Well, I will get the needed values, erase everything, and put the preferred ones back." I tried it, but it turned out that the objects Django puts in overwrite the ones that have been set (it adds the default ones after the __init__ method has run).
So I thought, "Well, if I set choices=() in the initialisation of the widget (line 274), Django should not set any other values on top of that, because this would violate my choices." I tried that too, but it turned out that Django does not care about which choices I would like there to be and acts the same.
I also tried setting the field's 'initial' property, still with no results.
So, how do I prevent Django's default behaviour of putting the referenced objects into the choices list of my select?
Thanks.
This is how I fixed it, with a little help from @AamirAdnan:
class SelectWithDefaultOptions(forms.Select):
    def render(self, name, value, attrs=None, choices=()):
        choices = ()
        choices += (('empty', ''),)
        choices += (('no selection', 'None'),)
        self.choices = choices

        if value is None:
            value = ''

        final_attrs = self.build_attrs(attrs, name=name)
        output = [format_html('<select{0}>', flatatt(final_attrs))]
        options = self.render_options((), [value])

        if options:
            output.append(options)
        output.append('</select>')

        return mark_safe('\n'.join(output))
I just improved the widget's render function.
It is not exactly what was asked for, but it works. What happens is:
1. Django sets the choices value to the referenced objects
2. Before the widget is rendered, they are changed to the correct choices
Somehow the widget defined in Meta is causing this issue. If you just move it to __init__, it will work. First, update your widget. The keyword choices in the line def __init__(self, attrs=None, choices=()) means that choices is empty by default, but any instance may override it by passing in some values. So you need to set it to an empty tuple explicitly:
class SelectWithDefaultOptions(forms.Select):
    def __init__(self, attrs=None, choices=()):
        super(SelectWithDefaultOptions, self).__init__(attrs, choices)
        choices = ()  # explicitly setting choices to empty here
        choices += (('', 'empty'),)
        choices += (('None', 'no selection'),)
        self.choices = choices
Now update your form to assign the widget to the gallery field in __init__ rather than in the Meta class:
class EventDetailForm(NgFormValidationMixin, NgModelForm):
    def __init__(self, *args, **kwargs):
        super(EventDetailForm, self).__init__(*args, **kwargs)
        self.fields['gallery'].widget = SelectWithDefaultOptions(attrs={
            'class': 'chosen-select-no-single',
            'id': 'select-galley',
            'data-placeholder': 'Select Gallery',
            'style': 'width: 200px;',
            'gallery-select': '',
            'tabindex': '-1',
            'organisator-profile-specific': '',
        })

    class Meta:
        model = Event
        fields = ('title', 'description', 'end_date', 'start_date', 'gallery', 'cover_photo')
OR you don't need any custom widget at all. Just set the choices in the form's __init__ method and replace the SelectWithDefaultOptions widget name with forms.Select in Meta (this is cleaner and simpler):
class EventDetailForm(NgFormValidationMixin, NgModelForm):
    def __init__(self, *args, **kwargs):
        super(EventDetailForm, self).__init__(*args, **kwargs)
        self.fields['gallery'].widget.choices = (('', 'Empty',),)

    class Meta:
        model = Event
        fields = ('title', 'description', 'end_date', 'start_date', 'gallery', 'cover_photo')
        widgets = {
            'gallery': forms.Select(attrs={
                'class': 'chosen-select-no-single',
                'id': 'select-galley',
                'data-placeholder': 'Select Gallery',
                'style': 'width: 200px;',
                'gallery-select': '',
                'tabindex': '-1',
                'organisator-profile-specific': '',
            }),
        }
