Python: error loading JSON object

I'm trying to load the following JSON string in Python:
{
"Motivo_da_Venda_Perdida":"",
"Data_Visita":"2015-03-17 08:09:55",
"Cliente":{
"Distribuidor1_Modelo":"",
"RG":"",
"Distribuidor1_Marca":"Selecione",
"PlataformaMilho1_Quantidade":"",
"Telefone_Fazenda":"",
"Pulverizador1_Quantidade":"",
"Endereco_Fazenda":"",
"Nome_Fazenda":"",
"Area_Total_Fazenda":"",
"PlataformaMilho1_Marca":"Selecione",
"Trator1_Modelo":"",
"Tipo_Cultura3":"Selecione",
"Tipo_Cultura4":"Selecione",
"Cultura2_Hectares":"",
"Colheitadeira1_Quantidade":"",
"Tipo_Cultura1":"Soja",
"Tipo_Cultura2":"Selecione",
"Plantadeira1_Marca":"Stara",
"Autopropelido1_Modelo":"",
"Email_Fazenda":"",
"Autopropelido1_Marca":"Stara",
"Distribuidor1_Quantidade":"",
"PlataformaMilho1_Modelo":"",
"Trator1_Marca":"Jonh deere",
"Email":"",
"CPF":"46621644000",
"Endereco_Rua":"PAQUINHAS, S/N",
"Caixa_Postal_Fazenda":"",
"Cidade_Fazenda":"",
"Plantadeira1_Quantidade":"",
"Colheitadeira1_Marca":"New holland",
"Data_Nascimento":"2015-02-20",
"Cultura4_Hectares":"",
"Nome_Cliente":"MILTON CASTIONE",
"Cep_Fazenda":"",
"Telefone":"5491290687",
"Cultura3_Hectares":"",
"Trator1_Quantidade":"",
"Cultura1_Hectares":"",
"Autopropelido1_Quantidade":"",
"Pulverizador1_Modelo":"",
"Caixa_Postal":"",
"Estado":"RS",
"Endereco_Numero":"",
"Cidade":"COLORADO",
"Colheitadeira1_Modelo":"",
"Pulverizador1_Marca":"Selecione",
"CEP":"99460000",
"Inscricao_Estadual":"0",
"Plantadeira1_Modelo":"",
"Estado_Fazenda":"RS",
"Bairro":""
},
"Quilometragem":"00",
"Modelo_Pretendido":"Selecione",
"Quantidade_Prevista_Aquisicao":"",
"Id_Revenda":"1",
"Contato":"05491290687",
"Pendencia_Para_Proxima_Visita":"",
"Data_Proxima_Visita":"2015-04-17 08:09:55",
"Valor_de_Venda":"",
"Maquina_Usada":"0",
"Id_Vendedor":"2",
"Propensao_Compra":"Propensao_Compra_Frio",
"Comentarios":"despertar compra",
"Sistema_Compra":"Sistema_Compra_Finame",
"Outro_Produto":"",
"Data_Prevista_Aquisicao":"2015-04-17 08:09:55",
"Objetivo_Visita":"Despertar_Interesse",
"Tipo_Contato":"Telefonico"}
However, I get the following error when I try to load it:
File "python_file.py", line 107, in busca_proxima_mensagem
Visita = json.loads(corpo)
File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 2 - line 6 column 84 (char 1 - 1020)
However, this JSON seems to be valid according to http://jsonformatter.curiousconcept.com/. What am I doing wrong? Why can't I load this string as a JSON object?
I'm trying to load the string from AWS SQS like this:
import json
...
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    Visita = json.loads(corpo)
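For reference, "Extra data" is what the json module raises when it finishes parsing a complete JSON value and finds more content after it. A minimal sketch reproducing the error:

import json

# Two concatenated JSON values: the decoder parses the first object,
# then complains about the trailing content.
json.loads('{"a": 1}{"b": 2}')
# raises ValueError: Extra data (the exact columns depend on the input)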
OK, so I figured out what is causing my problems: there is a slash in the value of one of the keys:
"Endereco_Rua":"PAQUINHAS, S/N",
However, I'm telling Python to filter that out (code below), but it's not working. How can I remove it? I can't do it at the origin that created the data, as I don't have access to the interface the user uses to fill it in.
result = fila.get_messages(1, 30, 'SentTimestamp')
for message in result:
    corpo = message.get_body()
    corpo = corpo.replace("/", "")  # Filtering slashes
    Visita = json.loads(corpo)

Found a solution! Besides the slash character, sometimes this error also happened with no visible cause. I ended up solving it by adding the following lines to my Python code:
1) At the start of my code, along with the other Python imports:
from boto.sqs.message import RawMessage
2) Changing my SQS queue to use/fetch raw data:
fila = sqs_conn.get_queue(constantes.fila_SQS)
fila.set_message_class(RawMessage)
Hope this helps anyone who is having the same issue.
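For anyone curious why this works: my understanding (an assumption, not something verified in boto's source here) is that boto's default Message class applies base64 encoding/decoding to the body, so a producer that posts plain JSON ends up with a body the default class mangles on read, while RawMessage returns it exactly as sent. Putting the pieces together, a minimal sketch of the working consumer:

import json
from boto.sqs.message import RawMessage

# sqs_conn and constantes.fila_SQS as elsewhere in my code
fila = sqs_conn.get_queue(constantes.fila_SQS)
fila.set_message_class(RawMessage)  # return the body exactly as sent, no base64 step

for message in fila.get_messages(1, 30, 'SentTimestamp'):
    corpo = message.get_body()   # raw JSON string, slashes and all
    Visita = json.loads(corpo)   # parses fine; no need to strip '/' characters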

Related

How to fill a column of type 'multi-select' in notion-py (Notion API)?

I am trying to create a Telegram bot that will create notes in Notion. For this I use:
notion-py
pyTelegramBotAPI
Then I connected my Notion account by adding token_v2. After receiving the data about the note that I want to save, I save the note to Notion like this:
def make_notion_row():
    collection_view = client.get_collection_view(list_url[temporary_category])  # take the collection
    print(temporary_category)
    print(temporary_name)
    print(temporary_link)
    print(temporary_subcategory)
    print(temporary_tag)
    row = collection_view.collection.add_row()  # make a row
    row.ssylka = temporary_link  # this is the link
    row.nazvanie_zametki = temporary_name  # this is the name
    if temporary_category == 0:  # this is the category where I want to save the note
        row.stil = temporary_subcategory  # this is the subcategory
        tags = temporary_tag.split(',')  # temporary_tag is text with many tags separated by commas; I want these tags as an array
        for tag_one in tags:
            add_new_multi_select_value("Теги", tag_one)  # "Теги" is the "Tags" column in Russian; here tag_one takes values like 'my_hero_academia', 'midoria'
    else:
        row.kategoria = temporary_subcategory
This script works, but the problem is filling in the Tags column, which is of type multi-select. Since the notion-py readme says nothing about filling in a 'multi-select', I used the function from bkiac: https://github.com/jamalex/notion-py/issues/51
Here is the function, slightly modified by me:
from random import choice
from uuid import uuid1

art_tags = ['ryuko_matoi', 'kill_la_kill']

def add_new_multi_select_value(prop, value, style=None):
    global temporary_prop_schema
    if style is None:
        style = choice(art_tags)
    collection_schema = collection_view.collection.get(["schema"])
    prop_schema = next(
        (v for k, v in collection_schema.items() if v["name"] == prop), None
    )
    if not prop_schema:
        raise ValueError(
            f'"{prop}" property does not exist on the collection!'
        )
    if prop_schema["type"] != "multi_select":
        raise ValueError(f'"{prop}" is not a multi select property!')
    dupe = next(
        (o for o in prop_schema["options"] if o["value"] == value), None
    )
    if dupe:
        raise ValueError(f'"{value}" already exists in the schema!')
    temporary_prop_schema = prop_schema
    prop_schema["options"].append(
        {"id": str(uuid1()), "value": value, "style": style}
    )
    collection.set("schema", collection_schema)
But it turned out that this function does not work, and gives the following error when called:
add_new_multi_select_value("Теги", "my_hero_academia")
Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    add_new_multi_select_value("Теги", "my_hero_academia")
  File "C:\Users\laere\OneDrive\Documents\Programming\Other\notion-bot\program\notionbot\test.py", line 53, in add_new_multi_select_value
    collection.set("schema", collection_schema)
  File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\records.py", line 115, in set
    self._client.submit_transaction(
  File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 290, in submit_transaction
    self.post("submitTransaction", data)
  File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 260, in post
    raise HTTPError(
requests.exceptions.HTTPError: Unsaved transactions: Not allowed to edit column: schema
This is my table image: link
This is my Telegram chat with the bot: link
Honestly, I don't know how to solve this problem. The question is: how do I fill a column of type 'multi-select'?
I solved this problem using this command:
row.set_property("Категория", temporary_subcategory)
Do not be afraid if an "options ..." error appears; it can be solved by adding options in the 'multi-select' field's settings.
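For the tags themselves, a sketch of the same idea (assuming, as in the code above, that "Теги" is a multi-select column and temporary_tag is a comma-separated string; not every notion-py version has been verified here):

# Split the comma-separated string into a list and set it in one call;
# notion-py accepts a list of values for a multi-select property.
tags = [tag.strip() for tag in temporary_tag.split(',')]
row.set_property("Теги", tags)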

json.loads(json_string) results in JSONDecodeError: Extra data

I am new to JSON in general and have no idea why this fails. My thought was that it has something to do with the double quotes.
This is the JSON string I want to load (printed with print(resultJson)):
"{"Paging":{"PageSize":20,"PageIndex":1,"TotalRecords":1},"Result":{"VerifyResult":1,"Data":[{"Id":"a2b6a53eb992b6a9e682f81450b39ce3","RegGov":"龙湖海关","TradeType":"进出口收发货人","RegDate":"2002-08-30"}]},"Status":"200","Message":"查询成功","OrderNumber":"..."}"
This is my code to do it:
# Get the Results:
print(response.status_code)
resultJson = json.dumps(str(response.content, encoding = encode))
# convert unicode to chinese
resultJson = resultJson.encode(encode).decode("unicode-escape")
print(resultJson)
print(json.loads(resultJson)["Result"])
This is the exception:
  File "C:\[...]\lib\json\decoder.py", line 340, in decode
    raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data
What do I need to change? Am I converting/decoding something wrong?
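One observation, offered as a hedged diagnosis: json.dumps applied to a string produces a JSON string literal (note the outer double quotes in the printed output), and the unicode-escape round trip then strips the backslashes that escaped the inner quotes. The parser therefore reads one short string and reports the rest as extra data. A minimal sketch of a simpler path, assuming response and encode are as in the code above:

import json

# Decode the raw bytes once and parse them directly; no json.dumps needed.
resultJson = str(response.content, encoding=encode)
data = json.loads(resultJson)
print(data["Result"])

# Or let requests handle the decoding entirely:
# data = response.json()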

json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) - Error only occurs when code is nested

I am currently working on a project that retrieves data about car auctions. I have it set up to request a custom eBay URL that uses their API; I request the page and convert the response to JSON for handling. The code runs with no errors at all if it stands by itself, but if I put it within a function, a conditional statement, or anything else that nests it, it gives me the JSON error
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
My code is below; however, I don't know if the issue is with my code, as it works fine when it is not within a function.
import requests  # needed for requests.get below
import sqlite3   # needed for the database insert below

ebayurl = "http://svcs.ebay.com/services/search/FindingService/v1?\
SECURITY-APPNAME=KyleOsbo-CarSearc-PRD-adf6708f9-c75353fe\
&OPERATION-NAME=findItemsAdvanced\
&SERVICE-VERSION=1.13.0\
&GLOBAL-ID=EBAY-GB\
&RESPONSE-DATA-FORMAT=JSON\
&REST-PAYLOAD\
&categoryId(0)=9801\
&outputSelector(0)=SellerInfo\
&keywords=" + "honda%20civic"  # The custom URL was created based on my needs: search eBay UK only, and only within the cars category

apiResult = requests.get(ebayurl)  # Request the custom URL
parsedresult = apiResult.json()  # Parse the response as JSON to extract information more easily

# The JSON is a multi-dimensional structure; look inside it to extract values
for item in parsedresult["findItemsAdvancedResponse"][0]["searchResult"][0]["item"]:
    title = item["title"][0]
    price = item["sellingStatus"][0]["convertedCurrentPrice"][0]["__value__"]
    itemURL = item["viewItemURL"][0]
    location = item["location"][0]
    itemid = item["itemId"][0]
    with sqlite3.connect("results.db") as db:  # Connect to the database, ready to insert new records
        cursor = db.cursor()
        values = (itemid, title, price, location, itemURL)  # Values to insert, parameterized to prevent SQL injection; these change on every iteration
        sql = """INSERT INTO ebay_results(item_id, title, price, location, itemURL)
                 VALUES(?,?,?,?,?)"""
        cursor.execute(sql, values)  # Inserts a new record for every item found
        db.commit()
The error occurs at
parsedresult = apiResult.json()
Traceback (most recent call last):
  File "C:\Users\ikoze\Documents\Computer Science\Coursework files\carrySearch.py", line 94, in <module>
    parsedresult = apiResult.json() #Convert url to json format in order to extract information easier
  File "C:\Users\ikoze\AppData\Local\Programs\Python\Python36-32\lib\site-packages\requests\models.py", line 892, in json
    return complexjson.loads(self.text, **kwargs)
  File "C:\Users\ikoze\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "C:\Users\ikoze\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\ikoze\AppData\Local\Programs\Python\Python36-32\lib\json\decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
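"Expecting value: line 1 column 1 (char 0)" means the body handed to the JSON decoder is empty or not JSON at all. One hedged guess specific to this code: the URL is built with backslash line continuations, and once the block is indented inside a function, the leading spaces of each continuation line become part of the URL, corrupting the query string; that would explain why the same code works only at top level. A small diagnostic sketch under that assumption:

import requests

# Build the URL with adjacent string literals so indentation cannot leak into it.
ebayurl = ("http://svcs.ebay.com/services/search/FindingService/v1?"
           "SECURITY-APPNAME=KyleOsbo-CarSearc-PRD-adf6708f9-c75353fe"
           "&OPERATION-NAME=findItemsAdvanced"
           "&SERVICE-VERSION=1.13.0"
           "&GLOBAL-ID=EBAY-GB"
           "&RESPONSE-DATA-FORMAT=JSON"
           "&REST-PAYLOAD"
           "&categoryId(0)=9801"
           "&outputSelector(0)=SellerInfo"
           "&keywords=honda%20civic")

apiResult = requests.get(ebayurl)
print(apiResult.status_code)        # anything other than 200 is suspect
print(repr(apiResult.text[:200]))   # an empty or HTML body explains the decode error
parsedresult = apiResult.json()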

HBase-HappyBase: Socket Timeout Error for Larger Files - Works Fine with Smaller Ones

I use the following piece of Python code, using the happybase module, to update HBase. This works perfectly for files with fewer than 30k records, but it throws a timeout error when they exceed 30k-35k. I tried the options suggested in other Stack Overflow questions, such as editing hbase-site.xml and a few other things, but no help. Did anyone come across the same issue?
import socket
import happybase as hb

def loadIdPHSegmentPyBase():
    s = socket.socket()
    s.settimeout(300)
    connection = hb.Connection('XXXXX', 9090, timeout=None, compat='0.92', transport='buffered')
    table = connection.table('HBASE_D_L')
    ReqFileToLoad = ("%segment.txt" % (dirName))
    b = table.batch()
    with open('%s' % (ReqFileToLoad)) as ffile1:
        for line in ffile1:
            line = line.strip()
            line = line.split('|')
            #print line[7]
            if line[7] == 'PH':
                b.put(line[0], {'ADDR_IDPH:PHMIDDLE_NAME': line[1], 'ADDR_IDPH:PHSUR_NAME': line[2], 'ADDR_IDPH:PHFIRST_NAME': line[3], 'ADDR_IDPH:PHFILLER1': line[4], 'ADDR_IDPH:PHFILLER2': line[5], 'ADDR_IDPH:PHFILLER3': line[6], 'ADDR_IDPH:TELEPHONE_SUBSEGMENT_ID': line[7], 'ADDR_IDPH:TELEPHONE_TYPE_CODE': line[8], 'ADDR_IDPH:PUBLISHED_INDICATOR': line[9], 'ADDR_IDPH:TELEPHONE_NUMBER': line[10]})
            else:
                b.put(line[0], {'ADDR_IDPH:IDMIDDLE_NAME': line[1], 'ADDR_IDPH:IDSUR_NAME': line[2], 'ADDR_IDPH:IDFIRST_NAME': line[3], 'ADDR_IDPH:IDFILLER1': line[4], 'ADDR_IDPH:IDFILLER2': line[5], 'ADDR_IDPH:IDFILLER3': line[6], 'ADDR_IDPH:IDSUBSEGMENT_IDENTIFIER': line[7], 'ADDR_IDPH:ID_TYPE': line[8], 'ADDR_IDPH:ID_VALIDITY_INDICATOR': line[9], 'ADDR_IDPH:ID_VALUE': line[11]})
    b.send()
    s.close()
My error with larger files:
File "thriftpy/protocol/cybin/cybin.pyx", line 429, in cybin.TCyBinaryProtocol.read_message_begin (thriftpy/protocol/cybin/cybin.c:6325)
File "thriftpy/protocol/cybin/cybin.pyx", line 60, in cybin.read_i32 (thriftpy/protocol/cybin/cybin.c:1546)
File "thriftpy/transport/buffered/cybuffered.pyx", line 65, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.c_read (thriftpy/transport/buffered/cybuffered.c:1881)
File "thriftpy/transport/buffered/cybuffered.pyx", line 69, in thriftpy.transport.buffered.cybuffered.TCyBufferedTransport.read_trans (thriftpy/transport/buffered/cybuffered.c:1948)
File "thriftpy/transport/cybase.pyx", line 61, in thriftpy.transport.cybase.TCyBuffer.read_trans (thriftpy/transport/cybase.c:1472)
File "/usr/local/python27/lib/python2.7/site-packages/thriftpy/transport/socket.py", line 108, in read
buff = self.sock.recv(sz)
socket.timeout: timed out
This is how it got resolved:
with open('%s' % (ReqFileToLoad)) as ffile1:
    for line in ffile1:
        line = line.strip()
        line = line.split('|')
        #print line[7]
        if line[7] == 'PH':
            b = table.batch()
            b.put(line[0], {'ADDR_IDPH:PHMIDDLE_NAME': line[1], 'ADDR_IDPH:PHSUR_NAME': line[2], 'ADDR_IDPH:PHFIRST_NAME': line[3], 'ADDR_IDPH:PHFILLER1': line[4], 'ADDR_IDPH:PHFILLER2': line[5], 'ADDR_IDPH:PHFILLER3': line[6], 'ADDR_IDPH:TELEPHONE_SUBSEGMENT_ID': line[7], 'ADDR_IDPH:TELEPHONE_TYPE_CODE': line[8], 'ADDR_IDPH:PUBLISHED_INDICATOR': line[9], 'ADDR_IDPH:TELEPHONE_NUMBER': line[10]})
        else:
            b = table.batch()
            b.put(line[0], {'ADDR_IDPH:IDMIDDLE_NAME': line[1], 'ADDR_IDPH:IDSUR_NAME': line[2], 'ADDR_IDPH:IDFIRST_NAME': line[3], 'ADDR_IDPH:IDFILLER1': line[4], 'ADDR_IDPH:IDFILLER2': line[5], 'ADDR_IDPH:IDFILLER3': line[6], 'ADDR_IDPH:IDSUBSEGMENT_IDENTIFIER': line[7], 'ADDR_IDPH:ID_TYPE': line[8], 'ADDR_IDPH:ID_VALIDITY_INDICATOR': line[9], 'ADDR_IDPH:ID_VALUE': line[11]})
        b.send()
I suggest that you use smaller batch sizes, or that you do not use a batch at all. Batching is a client-side buffer without any limits, so it can cause huge Thrift requests when it is sent. happybase also provides a helper for this: you can specify batch_size and the batch will be flushed periodically.
https://happybase.readthedocs.io/en/latest/api.html#happybase.Table.batch
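A minimal sketch of that suggestion, reusing the connection and file handling from the question (the batch_size value of 1000 is an assumption to tune). Used as a context manager, the batch flushes automatically every batch_size puts and sends any remainder on exit:

import happybase as hb

connection = hb.Connection('XXXXX', 9090, timeout=None, compat='0.92', transport='buffered')
table = connection.table('HBASE_D_L')

# Flush every 1000 puts so each Thrift request stays small; the context
# manager sends whatever is left when the block exits.
with table.batch(batch_size=1000) as b:
    with open(ReqFileToLoad) as ffile1:   # ReqFileToLoad as in the question
        for line in ffile1:
            fields = line.strip().split('|')
            b.put(fields[0], {'ADDR_IDPH:TELEPHONE_NUMBER': fields[10]})  # remaining columns as in the question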

How do I handle an accented character on a batch database import in Python and Postgres

When running a batch import script in Python (openblock), I'm getting the following error for an accented character: invalid byte sequence for encoding "UTF8": 0xca4e.
It shows up as:
GRAND-CH?NE, COUR DU
But is actually "GRAND-CHÊNE, COUR DU"
What is the best way to handle this? Ideally I'd like to keep the accented character. I suspect I need to encode it somehow?
Edit: the ? is actually supposed to be Ê. Also note that the variable is coming from an ESRI Shapefile. When I try davidcrow's solution, I get "Unicode not supported", because presumably the strings that don't have accented characters are already Unicode strings.
Here's the ESRIImporter code I'm using:
from django.contrib.gis.gdal import DataSource

class EsriImporter(object):
    def __init__(self, shapefile, city=None, layer_id=0):
        print >> sys.stderr, 'Opening %s' % shapefile
        ds = DataSource(shapefile)
        self.layer = ds[layer_id]
        self.city = "OTTAWA"  # city and city or Metro.objects.get_current().name
        self.fcc_pat = re.compile('^(' + '|'.join(VALID_FCC_PREFIXES) + ')\d$')

    def save(self, verbose=False):
        alt_names_suff = ('',)
        num_created = 0
        for i, feature in enumerate(self.layer):
            #if not self.fcc_pat.search(feature.get('FCC')):
            #    continue
            parent_id = None
            fields = {}
            for esri_fieldname, block_fieldname in FIELD_MAP.items():
                value = feature.get(esri_fieldname)
                #print >> sys.stderr, 'Looking at %s' % esri_fieldname
                if isinstance(value, basestring):
                    value = value.upper()
                elif isinstance(value, int) and value == 0:
                    value = None
                fields[block_fieldname] = value
            if not ((fields['left_from_num'] and fields['left_to_num']) or
                    (fields['right_from_num'] and fields['right_to_num'])):
                continue
            # Sometimes the "from" number is greater than the "to"
            # number in the source data, so we swap them into proper
            # ordering
            for side in ('left', 'right'):
                from_key, to_key = '%s_from_num' % side, '%s_to_num' % side
                if fields[from_key] > fields[to_key]:
                    fields[from_key], fields[to_key] = fields[to_key], fields[from_key]
            if feature.geom.geom_name != 'LINESTRING':
                continue
            for suffix in alt_names_suff:
                name_fields = {}
                for esri_fieldname, block_fieldname in NAME_FIELD_MAP.items():
                    key = esri_fieldname + suffix
                    name_fields[block_fieldname] = feature.get(key).upper()
                    #if block_fieldname == 'postdir':
                    #    print >> sys.stderr, 'Postdir block %s' % name_fields[block_fieldname]
                if not name_fields['street']:
                    continue
                # Skip blocks with bare number street names and no suffix / type
                if not name_fields['suffix'] and re.search('^\d+$', name_fields['street']):
                    continue
                fields.update(name_fields)
                block = Block(**fields)
                block.geom = feature.geom.geos
                print repr(fields['street'])
                print >> sys.stderr, 'Looking at block %s' % unicode(fields['street'], errors='replace')
                street_name, block_name = make_pretty_name(
                    fields['left_from_num'],
                    fields['left_to_num'],
                    fields['right_from_num'],
                    fields['right_to_num'],
                    '',
                    fields['street'],
                    fields['suffix'],
                    fields['postdir']
                )
                block.pretty_name = unicode(block_name)
                #print >> sys.stderr, 'Looking at block pretty name %s' % fields['street']
                block.street_pretty_name = street_name
                block.street_slug = slugify(' '.join((unicode(fields['street'], errors='replace'), fields['suffix'])))
                block.save()
                if parent_id is None:
                    parent_id = block.id
                else:
                    block.parent_id = parent_id
                    block.save()
                num_created += 1
                if verbose:
                    print >> sys.stderr, 'Created block %s' % block
        return num_created
Output:
'GRAND-CH\xcaNE, COUR DU'
Looking at block GRAND-CH�NE, COUR DU
Traceback (most recent call last):
  File "../blocks_ottawa.py", line 144, in <module>
    sys.exit(main())
  File "../blocks_ottawa.py", line 139, in main
    num_created = esri.save(options.verbose)
  File "../blocks_ottawa.py", line 114, in save
    block.save()
  File "/home/chris/openblock/src/django/django/db/models/base.py", line 434, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)
  File "/home/chris/openblock/src/django/django/db/models/base.py", line 527, in save_base
    result = manager._insert(values, return_id=update_pk, using=using)
  File "/home/chris/openblock/src/django/django/db/models/manager.py", line 195, in _insert
    return insert_query(self.model, values, **kwargs)
  File "/home/chris/openblock/src/django/django/db/models/query.py", line 1479, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/home/chris/openblock/src/django/django/db/models/sql/compiler.py", line 783, in execute_sql
    cursor = super(SQLInsertCompiler, self).execute_sql(None)
  File "/home/chris/openblock/src/django/django/db/models/sql/compiler.py", line 727, in execute_sql
    cursor.execute(sql, params)
  File "/home/chris/openblock/src/django/django/db/backends/util.py", line 15, in execute
    return self.cursor.execute(sql, params)
  File "/home/chris/openblock/src/django/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: invalid byte sequence for encoding "UTF8": 0xca4e
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
More information please. What platform - Windows / Linux / ???
What version of Python?
If you are running Windows, your encoding is much more likely to be cp1252 or similar than ISO-8859-1. It's definitely not UTF-8.
You will need to: (1) find out what your input data is encoded with; try cp1252, it's the usual suspect. (2) Decode your data into Unicode. (3) Encode it as UTF-8. (A short sketch of steps (2) and (3) appears after these comments.)
How are you getting the data out of your ESRI shapefile? Show your code. Show the full traceback and error message. To avoid visual problems (it's E-grave! no, it's E-acute!) print repr(the_suspect_data) and copy/paste the result into an edit of your question. Go easy on the bold type.
Looks like the data isn't being sent as UTF-8... so check the client_encoding parameter in your DB session matches your data, or translate it to UTF-8/Unicode within Python when reading the file.
You can change the DB session's client encoding using "SET client_encoding = 'ISO-8859-1'" or similar. Note that 0xca is E-with-circumflex in Latin-1, which is exactly the character you expect, so Latin-1 may well be what your file is encoded in.
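A minimal sketch of steps (2) and (3) above, in Python 2 to match the question's code (the latin-1 guess is an assumption; verify it against your data):

raw = 'GRAND-CH\xcaNE, COUR DU'   # byte string as shown by repr() in the question's output

# Step 2: decode the bytes to unicode using the source encoding.
# 0xCA is E-with-circumflex in latin-1, which matches the expected letter.
text = raw.decode('latin-1')       # u'GRAND-CH\xcaNE, COUR DU'

# Step 3: re-encode as UTF-8 before the Postgres insert.
utf8_bytes = text.encode('utf-8')  # 'GRAND-CH\xc3\x8aNE, COUR DU'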
You can try something like:
uString = unicode(item.field, "utf-8")
See http://evanjones.ca/python-utf8.html for more details about Unicode and Python.
