I have a data structure like this:
"37_7009": [
{
"viewport_dimensions": {"width": 1583, "height": 798},
"mouse_position": {"y": 1147, "x": 841},
"node_data_attrs": {"groupid": "id_FW13-e052-7009-08", "rankgroupid": "rank_37"}
}
]
which I load with:
with gzip.GzipFile(args.file, 'rb') as gzf:
    all_hovers = json.load(gzf)
How can I read out the node_data_attrs values?
for cords in all_hovers[userID]:
    x = cords["mouse_position"]["x"]
    y = cords["mouse_position"]["y"]
    viewport_x = cords["viewport_dimensions"]["width"]
    viewport_y = cords["viewport_dimensions"]["height"]
    data_attrs = cords["node_data_attrs"]["groupid"]
I get the following traceback:
Traceback (most recent call last):
File "opdracht2-3.py", line 86, in <module>
main()
File "opdracht2-3.py", line 66, in main
print cords["node_data_attrs"]["groupid"]
KeyError: 'groupid'
That doesn't work for reading the data... any suggestions?
Your code works just fine; it just appears that at least some of your data doesn't have a groupid key.
Use .get() to work around this:
for cords in all_hovers[userID]:
    x = cords["mouse_position"]["x"]
    y = cords["mouse_position"]["y"]
    viewport_x = cords["viewport_dimensions"]["width"]
    viewport_y = cords["viewport_dimensions"]["height"]
    data_attrs = cords["node_data_attrs"].get("groupid")
This sets data_attrs to None if the key is missing. You can set it to a different default by passing in a second argument to dict.get():
data_attrs = cords["node_data_attrs"].get("groupid", 'default value')
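Building on that, here is a short sketch that reads out every node_data_attrs value from the sample record while tolerating records where the mapping or the "groupid" key is missing (the fallback strings are just placeholders):
for cords in all_hovers[userID]:
    attrs = cords.get("node_data_attrs", {})  # empty dict if the record has no attrs at all
    groupid = attrs.get("groupid", "no groupid")
    rankgroupid = attrs.get("rankgroupid", "no rankgroupid")
    print("%s %s" % (groupid, rankgroupid))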
Given the following sequence of numbers inserted into an AVL tree, indicate whether each insertion results in no rotation, a right rotation, a left rotation, a double left rotation, or a double right rotation.
9,8,7,6,2,3,4,5,11,1,12,23,24
List the BST tree in level order if no rotations are done:
Root:
L1:
L2:
...
List the AVL tree in level order if rotations are done:
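For reference, here is a minimal sketch (not part of the original question) of AVL insertion that reports which rebalancing case, if any, each inserted key triggers, plus a helper that lists the resulting tree in level order. Mapping the left-right and right-left cases to "double rotate right" and "double rotate left" follows one common naming convention and may differ from the one your course uses.
from collections import deque

class Node:
    def __init__(self, key):
        self.key, self.left, self.right, self.height = key, None, None, 1

def height(n):
    return n.height if n else 0

def update(n):
    n.height = 1 + max(height(n.left), height(n.right))

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y); update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x); update(y)
    return y

def insert(node, key, report):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, report)
    else:
        node.right = insert(node.right, key, report)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1 and key < node.left.key:      # left-left case
        report.append('rotate right')
        return rotate_right(node)
    if balance < -1 and key > node.right.key:    # right-right case
        report.append('rotate left')
        return rotate_left(node)
    if balance > 1 and key > node.left.key:      # left-right case
        report.append('double rotate right')
        node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1 and key < node.right.key:    # right-left case
        report.append('double rotate left')
        node.right = rotate_right(node.right)
        return rotate_left(node)
    return node

def level_order(root):
    out, queue = [], deque([root] if root else [])
    while queue:
        n = queue.popleft()
        out.append(n.key)
        if n.left:
            queue.append(n.left)
        if n.right:
            queue.append(n.right)
    return out

root = None
for k in [9, 8, 7, 6, 2, 3, 4, 5, 11, 1, 12, 23, 24]:
    report = []
    root = insert(root, k, report)
    print(k, report[0] if report else 'no rotation')
print('AVL level order:', level_order(root))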
I am trying to create a Telegram bot that will create notes in Notion. For this I use:
notion-py
pyTelegramBotAPI
Then I connected my Notion by adding token_v2 and received the data about the note that I want to save; at the end I save the note to Notion like this:
def make_notion_row():
    collection_view = client.get_collection_view(list_url[temporary_category])  # take the collection
    print(temporary_category)
    print(temporary_name)
    print(temporary_link)
    print(temporary_subcategory)
    print(temporary_tag)
    row = collection_view.collection.add_row()  # make a row
    row.ssylka = temporary_link  # this is the link
    row.nazvanie_zametki = temporary_name  # this is the name
    if temporary_category == 0:  # this is the category where I want to save the note
        row.stil = temporary_subcategory  # this is the subcategory
        tags = temporary_tag.split(',')  # temporary_tag is text with many tags separated by commas; I want to get these tags as an array
        for tag_one in tags:
            add_new_multi_select_value("Теги", tag_one)  # "Теги" is the "Tags" column in Russian; here tag_one takes values such as 'my_hero_academia' and 'midoria'
    else:
        row.kategoria = temporary_subcategory
This script works, but the problem is filling in the Tags column, which is of type multi-select.
Since the notion-py readme says nothing about filling in a 'multi-select' field, I used the function by bkiac: https://github.com/jamalex/notion-py/issues/51
Here is the function, slightly modified by me:
art_tags = ['ryuko_matoi', 'kill_la_kill']
def add_new_multi_select_value(prop, value, style=None):
    global temporary_prop_schema
    if style is None:
        style = choice(art_tags)
    collection_schema = collection_view.collection.get(["schema"])
    prop_schema = next(
        (v for k, v in collection_schema.items() if v["name"] == prop), None
    )
    if not prop_schema:
        raise ValueError(
            f'"{prop}" property does not exist on the collection!'
        )
    if prop_schema["type"] != "multi_select":
        raise ValueError(f'"{prop}" is not a multi select property!')
    dupe = next(
        (o for o in prop_schema["options"] if o["value"] == value), None
    )
    if dupe:
        raise ValueError(f'"{value}" already exists in the schema!')
    temporary_prop_schema = prop_schema
    prop_schema["options"].append(
        {"id": str(uuid1()), "value": value, "style": style}
    )
    collection.set("schema", collection_schema)
But it turned out that this function does not work and gives the following error:
add_new_multi_select_value("Теги", "my_hero_academia")
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
add_new_multi_select_value("Теги", "my_hero_academia")
File "C:\Users\laere\OneDrive\Documents\Programming\Other\notion-bot\program\notionbot\test.py", line 53, in add_new_multi_select_value
collection.set("schema", collection_schema)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\records.py", line 115, in set
self._client.submit_transaction(
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 290, in submit_transaction
self.post("submitTransaction", data)
File "C:\Users\laere\AppData\Local\Programs\Python\Python39-32\lib\site-packages\notion\client.py", line 260, in post
raise HTTPError(
requests.exceptions.HTTPError: Unsaved transactions: Not allowed to edit column: schema
This is my table image: link
This is my Telegram chat with the bot: link
Honestly, I don't know how to solve this problem. The question is: how do I fill a column of type 'multi-select'?
I solved this problem using this command:
row.set_property("Категория", temporary_subcategory)
Do not be afraid if there is an "options ..." error; this can be solved by adding settings for the 'multi-select' field.
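For the original Tags column, here is a hedged sketch of the same set_property approach (assuming a reasonably recent notion-py; as noted above, the multi-select options may need to exist in the table settings first):
tags = temporary_tag.split(',')  # e.g. ['my_hero_academia', 'midoria']
row.set_property("Теги", tags)   # multi-select properties take a list of values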
I'm trying to build a dictionary of keywords and put it into a scrapy item.
'post_keywords':{1: 'midwest', 2: 'i-70',}
The point is that this will all go inside a json object later on down the road. I've tried initializing a new blank dictionary first, but that doesn't work.
Pipeline code:
tag_count = 0
for word, tag in blob.tags:
    if tag == 'NN':
        tag_count = tag_count + 1
        nouns.append(word.lemmatize())

keyword_dict = dict()
key = 0
for item in random.sample(nouns, tag_count):
    word = Word(item)
    key = key + 1
    keyword_dict[key] = word
item['post_keywords'] = keyword_dict
Item:
post_keywords = scrapy.Field()
Output:
Traceback (most recent call last):
File "B:\Mega Sync\Programming\job_scrape\lib\site-packages\twisted\internet\defer.py", line 588, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "B:\Mega Sync\Programming\job_scrape\cl_tech\cl_tech\pipelines.py", line215, in process_item
item['post_noun_phrases'] = noun_phrase_dict
TypeError: 'unicode' object does not support item assignment
It SEEMS like pipelines behave weirdly, as if they don't want to run all the code in the pipeline UNLESS all the item assignments check out, which makes it so that my initialized dictionaries aren't created or something.
Thanks to MarkTolonen for the help.
My mistake was using the variable name 'item' for two different things.
This works:
for thing in random.sample(nouns, tag_count):
    word = Word(thing)
    key = key + 1
    keyword_dict[key] = word
item['post_keywords'] = keyword_dict
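For context, here is a hypothetical minimal pipeline showing the fix in place: the loop variable is named thing so it no longer shadows the scrapy item argument. TextBlob's blob.tags and Word stand in for the objects used in the question, post_text is an assumed input field, and sampling every noun mirrors the original random.sample(nouns, tag_count).
import random
from textblob import TextBlob, Word

class KeywordPipeline(object):
    def process_item(self, item, spider):
        blob = TextBlob(item.get('post_text', ''))  # hypothetical source field
        nouns = [word.lemmatize() for word, tag in blob.tags if tag == 'NN']

        keyword_dict = {}
        # enumerate from 1 so the keys match the {1: ..., 2: ...} shape shown above
        for key, thing in enumerate(random.sample(nouns, len(nouns)), start=1):
            keyword_dict[key] = Word(thing)

        item['post_keywords'] = keyword_dict
        return item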
This piece of code is from a script that obtains the PDB (Protein Data Bank) codes that contain the UniProt codes stored in a dictionary, dDomainSeq.
This is a sample of the dDomainSeq dict:
{'3.30.67.10': ['G3GAK5', 'I3QCE1', 'G3EN69', 'K4LBV0', 'Q2XWS4', 'D6MQ73', 'F1D844', 'Q8JTJ9', 'H9U1G9', 'B1PNF1', 'B3F7E1', 'Q9J0E6', 'K4LBK6', 'Q2XRW4', 'D0EPQ6', 'D3U0G6', 'Q8QMF3', 'J9PQ44', 'B9W116', 'Q2XRW9', 'I3QCH7', 'K7R4A7', 'I7B1H2', 'B1PNH0', 'I3QCD9', 'Q82861', 'I3QC33', 'Q2XRJ4', 'E3UMQ4', 'B9V561', 'Q8BE43', 'Q80QJ9', 'E0YAP9'], '2.60.98.10': ['C9WJC0', 'B3TN06', 'Q9IZI7', 'Q9WDA0', 'A9LIM6', 'C5MSX3', 'Q6Q6Q1', 'Q3LFV0', 'E5RCU8', 'I6XG39', 'G5EJD7', 'D3X8F0', 'Q2XRV6', 'D0QXC4', 'I7EMG2', 'A4UIW9', 'Q89283', 'H9M657', 'F2YLD8', 'Q2YGV6', 'D6MQ23', 'G9F8Y6', 'G8G189', 'H8Y6K8', 'E3UMP9', 'Q91AG4', 'I3QCA4', 'A4K4T3', 'H6VBW8', 'D8FSI8', 'D0TYZ3', 'I3QCM1', 'H6VBX9', 'C0JZP9', 'C6ZE88', 'A1BY35', 'I7A3V7', 'Q2XRZ1', 'A5YBK7', 'Q66463', 'C3V004', 'Q6YG48', 'Q2ESB0', 'H1ZYK5', 'Q00P61', 'E2IZW1', 'D0VF46', 'K4IYH8', 'Q9IJX6', 'Q87046', 'Q9WB77', 'C7T0M1', 'I3QC70', 'E2IGI0', 'Q32ZL8', 'C8CKT7', 'D6MM36', 'Q3LFN6', 'F5AXV2', 'I6PGU1', 'B9W157', 'K7PP62', 'Q3Y6G7', 'Q6YFX6', 'C9WPK5', 'G9IBD9', 'G9DR11', 'C1KKF7', 'I6WJM3', 'K7PPW7', 'Q3S2G1', 'Q6WP68', 'H2D5H7', 'H2D5I3', 'K7QRY5', 'Q9WLZ8', 'F5AXW1', 'Q8JTJ2', 'E3UMM2', 'B9VHE4', 'B6E979', 'Q2YH31', 'A7TUC9', 'D3X8C3', 'H2D5I2', 'B6EBW6', 'F2WS10', 'Q2YH68', 'C1KKE8', 'B0LCR1''A3GPY8']}
Each of the elements under each key is used to search for a PDB file found in the PDBSum database.
PDBSumWWW = urllib.urlopen("https://www.ebi.ac.uk/thornton-srv/databases/pdbsum/data/seqdata.dat")
PDBSum = PDBSumWWW.read().splitlines()
PDBSumWWW.close()
This is the code I use for this:
for domain in dDomainSeq.keys():
    print domain
    PDB = []
    for uni in dDomainSeq[domain]:
        for i in range(len(PDBSum)):
            if "SWS_ID" in PDBSum[i]:
                str = PDBSum[i]
                splited = str.split()
                if uni in splited[2]:
                    PDB.append(splited[0])
    print PDB
    print len(PDB)
    PDBSum[domain] = PDB
However, after listing all the PDB-UniProt code matches for the first key, "3.30.67.10", it raises the following error:
Traceback (most recent call last):
File "/Users/ThePoet/Dropbox/MBID/MSc Summer Project/Program/SeqPDBSum.py", line 57, in <module>
main()
File "/Users/ThePoet/Dropbox/MBID/MSc Summer Project/Program/SeqPDBSum.py", line 45, in main
PDBSum[domain]=PDB
TypeError: list indices must be integers, not str
To get a dictionary, try:
varofdictionary = {}
PDBSum is a list, so you should create an empty dictionary and add your results to that instead of assigning string keys into the list.
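Here is a minimal sketch of that fix, keeping the matching logic from the question but collecting the results in a separate dictionary (dDomainSeq and PDBSum are assumed to be defined as above):
domain_to_pdb = {}  # separate dict instead of indexing into the PDBSum list
for domain in dDomainSeq:
    matches = []
    for uni in dDomainSeq[domain]:
        for line in PDBSum:
            if "SWS_ID" in line:
                fields = line.split()
                if uni in fields[2]:
                    matches.append(fields[0])
    domain_to_pdb[domain] = matches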
Apologies in advance, but I am unable to post a fully working example (too much overhead in this code to distill to a runnable snippet). I will post as much explanatory detail as I can, and please do let me know if anything critical seems missing.
Running Python 2.7.5 through IDLE
I am writing a program to compare two text files. Since the files can be large (~500MB) and each row comparison is independent, I would like to implement multiprocessing to speed up the comparison. This is working pretty well, but I am getting stuck on a pseudo-random Bad file descriptor error. I am new to multiprocessing, so I guess there is a technical problem with my implementation. Can anyone point me in the right direction?
Here is the code causing the trouble (specifically the pool.map):
# open files
csvReaderTest = csv.reader(open(testpath, 'r'))
csvReaderProd = csv.reader(open(prodpath, 'r'))
compwriter = csv.writer(open(outpath, 'wb'))

pool = Pool()
num_chunks = 3
chunksTest = itertools.groupby(csvReaderTest, keyfunc)
chunksProd = itertools.groupby(csvReaderProd, keyfunc)

while True:
    # make a list of num_chunks chunks
    groupsTest = [list(chunk) for key, chunk in itertools.islice(chunksTest, num_chunks)]
    groupsProd = [list(chunk) for key, chunk in itertools.islice(chunksProd, num_chunks)]
    # merge the two lists (pair off comparison rows)
    groups_combined = zip(groupsTest, groupsProd)
    if groups_combined:
        # http://stackoverflow.com/questions/5442910/python-multiprocessing-pool-map-for-multiple-arguments
        a_args = groups_combined  # a list - set of combinations to be tested
        second_arg = True
        worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg), a_args))
Here is the full error output (the error only occurs sometimes; other times the comparison runs to completion without problems):
Traceback (most recent call last):
File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 407, in <module>
main(fileTest, fileProd, fileout, stringFields, checkFileLengths)
File "H:/<PATH_SNIP>/python_csv_compare_multiprocessing_rev02_test2.py", line 306, in main
worker_result = pool.map(worker_mini_star, itertools.izip(itertools.repeat(second_arg),a_args))
File "C:\Python27\lib\multiprocessing\pool.py", line 250, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Python27\lib\multiprocessing\pool.py", line 554, in get
raise self._value
IOError: [Errno 9] Bad file descriptor
If it helps, here are the functions called by pool.map:
def worker_mini(flag, chunk):
    row_comp = []
    for entry, entry2 in zip(chunk[0][0], chunk[1][0]):
        if entry == entry2:
            temp_comp = entry
        else:
            temp_comp = '%s|%s' % (entry, entry2)
        row_comp.append(temp_comp)
    return True, row_comp

# takes a single tuple argument and unpacks the tuple to multiple arguments
def worker_mini_star(flag_chunk):
    """Convert `f([1,2])` to `f(1,2)` call."""
    return worker_mini(*flag_chunk)
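For reference, here is a self-contained toy version (Python 2.7, matching the question, and not the original script) of the pool.map-with-two-arguments pattern used above: the extra flag is paired with each chunk via izip/repeat and unpacked in the *_star wrapper.
import itertools
from multiprocessing import Pool

def worker_mini(flag, chunk):
    # toy work: tag each doubled chunk with the flag
    return flag, [x * 2 for x in chunk]

def worker_mini_star(flag_chunk):
    # unpack the (flag, chunk) tuple into separate arguments
    return worker_mini(*flag_chunk)

if __name__ == '__main__':
    pool = Pool()
    chunks = [[1, 2], [3, 4], [5, 6]]
    results = pool.map(worker_mini_star,
                       itertools.izip(itertools.repeat(True), chunks))
    pool.close()
    pool.join()
    print results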
I have successfully built a Python search app with pysolr. So far I have used two fields: id and title. Now I want to push two different versions of the title: the original and the title after removing the stopwords. Any ideas? The following code works:
def BuildSolrIndex(solr, trandata):
    tmp = []
    for i, dat in enumerate(trandata):
        if all(d is not None and len(d) > 0 for d in dat):
            d = {}
            d["id"] = dat[0]
            d["title"] = dat[1]
            tmp.append(d)
    solr.add(tmp)
    solr.optimize()
    return solr
but this one does not:
def BuildSolrIndex(solr, trandata):
    tmp = []
    for i, dat in enumerate(trandata):
        if all(d is not None and len(d) > 0 for d in dat):
            d = {}
            d["id"] = dat[0]
            d["title_org"] = dat[1]
            d["title_new"] = CleanUpTitle(dat[1])
            tmp.append(d)
    solr.add(tmp)
    solr.optimize()
    return solr
Any ideas?
EDIT:
Below is the exception:
Traceback (most recent call last):
...
solr = BuildSolrIndex(solr, trandata)
File "...", line 56, in BuildSolrIndex
solr.add(tmp)
File "build/bdist.linux-x86_64/egg/pysolr.py", line 779, in add
File "build/bdist.linux-x86_64/egg/pysolr.py", line 387, in _update
File "build/bdist.linux-x86_64/egg/pysolr.py", line 321, in _send_request
pysolr.SolrError: [Reason: None]
<response><lst name="responseHeader"><int name="status">400</int><int name="QTime">8</int></lst><lst name="error"><str name="msg">ERROR: [doc=...] unknown field 'title_new'</str><int name="code">400</int></lst></response>
This looks like an issue with your Solr schema.xml, as the exception indicates that "title_new" is not recognized as a valid field. This answer may be of assistance to you: https://stackoverflow.com/a/14400137/1675729
Check to make sure your schema.xml contains a "title_new" field, and that you've restarted the Solr services if necessary. If this doesn't solve your problem, come on back!
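If editing schema.xml is not an option, here is a hedged workaround sketch (not the poster's code): many example Solr schemas ship with dynamic field patterns such as "*_txt" or "*_s", so naming the document keys to match one of those patterns avoids the unknown-field error. The function mirrors BuildSolrIndex above; clean_up_title stands in for CleanUpTitle, and the "*_txt" pattern is an assumption about the schema in use.
def build_solr_index(solr, trandata, clean_up_title):
    # index both title variants under dynamic-field names; "*_txt" must
    # actually be defined in the deployed schema for this to work
    docs = []
    for dat in trandata:
        if all(d is not None and len(d) > 0 for d in dat):
            docs.append({
                "id": dat[0],
                "title_org_txt": dat[1],                   # original title
                "title_new_txt": clean_up_title(dat[1]),   # title with stopwords removed
            })
    solr.add(docs)
    solr.optimize()
    return solr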