Extracting table caption/title from a word document

I have to extract the content of specific tables from a huge MS Word document. After some reading, I have been trying to use win32com to extract the table titles. I used the following code:
from win32com import client
word = client.Dispatch("Word.Application")
document = word.Documents.Open("path_to_doc")
titles = [table.Title for table in document.Tables]
f = open('myfile.doc', 'w')
f.write("%s" %titles)
f.close()
However, the resulting file contains only empty strings:
[u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'', u'']
Where am I going wrong?

Related

How to lemmatize Norwegian using spaCy?

I'm doing the following:
from spacy.lang.nb import Norwegian
nlp = Norwegian()
doc = nlp(u'Jeg heter Marianne Borgen og jeg er ordføreren i Oslo.')
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
Lemmatization seems to not work at all, as this is the output:
(u'Jeg', u'Jeg', u'', u'', u'', u'Xxx', True, False)
(u'heter', u'heter', u'', u'', u'', u'xxxx', True, False)
(u'Marianne', u'Marianne', u'', u'', u'', u'Xxxxx', True, False)
(u'Borgen', u'Borgen', u'', u'', u'', u'Xxxxx', True, False)
(u'og', u'og', u'', u'', u'', u'xx', True, True)
(u'jeg', u'jeg', u'', u'', u'', u'xxx', True, True)
(u'er', u'er', u'', u'', u'', u'xx', True, True)
(u'ordf\xf8reren', u'ordf\xf8reren', u'', u'', u'', u'xxxx', True, False)
(u'i', u'i', u'', u'', u'', u'x', True, True)
(u'Oslo', u'Oslo', u'', u'', u'', u'Xxxx', True, False)
(u'.', u'.', u'', u'', u'', u'.', False, False)
However, looking at https://github.com/explosion/spaCy/blob/master/spacy/lang/nb/lemmatizer/_verbs_wordforms.py, the verb heter should at least be transformed into hete.
So it looks like spaCy has support, but it's not working? What could be the problem?

Lemmatization does in fact work for Norwegian as specified in the docs: all forms listed in lookup.py are lemmatized. Try, for instance, doc = nlp(u'ei') and you'll see that the lemma of ei is en.
Now, the file you are referring to, _verbs_wordforms.py, documents exceptions that apply only when the part-of-speech (POS) tag is a verb. However, the blank model Norwegian() has no POS tagger, so that particular exception for heter is never triggered.
So the solution is either to use a model that has a POS tagger, or to add your specific exceptions to lookup.py. For instance, if you add the line 'heter': 'hete' there, your blank model will find hete as the lemma for heter.
Finally, note that there has been a lot of work and discussion about publishing a pre-trained Norwegian model for spaCy, but it looks like that is still a work in progress.
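The difference between the lookup table and the POS-dependent exceptions can be illustrated without spaCy. The following is a minimal sketch in plain Python, not spaCy's actual implementation: LOOKUP, VERB_EXCEPTIONS and lemmatize are hypothetical names, and the entries are just the two examples mentioned in this answer.

```python
# Hypothetical stand-ins for the two tables discussed above; the entries
# come from the examples in this answer, not from spaCy's data files.
LOOKUP = {"ei": "en"}                 # applied to every token, no POS needed
VERB_EXCEPTIONS = {"heter": "hete"}   # only consulted when the POS tag is VERB


def lemmatize(word, pos=""):
    """Return a lemma for `word`; `pos` is empty when no tagger has run."""
    if pos == "VERB" and word in VERB_EXCEPTIONS:
        return VERB_EXCEPTIONS[word]
    # Fall back to the POS-independent lookup table, else the word itself.
    return LOOKUP.get(word, word)


print(lemmatize("ei"))             # en
print(lemmatize("heter"))          # heter (no POS tag, exception skipped)
print(lemmatize("heter", "VERB"))  # hete

# Adding the form to the lookup table makes it work without a tagger.
LOOKUP["heter"] = "hete"
print(lemmatize("heter"))          # hete
```

With an empty POS tag, as in the blank model, only the lookup table is consulted, which is why heter comes back unchanged until it is added there.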

Python - Read string from log file

I would like to extract serial_number value from the following Python log output:
Continue? (y/N): Initializing nimble object...
##### Call time: -0.000003
Initializing nimble object...
##### Call time: 0.000002
##### Volume Records Retrieved From Nimble:
{u'parent_vol_name': u'', u'owned_by_group_id': u'00058a0dd7f9ecafd9000000000000000000000001', u'num_fc_connections': 4, u'dedupe_enabled': True, u'snap_usage_compressed_bytes': 3516972911, u'num_iscsi_connections': 0, u'move_bytes_remaining': 0, u'thinly_provisioned': True, u'cache_needed_for_pin': 107374182400, u'last_replicated_snap': None, u'space_usage_level': u'normal', u'fc_sessions': [{u'initiator_wwpn': u'51:40:2e:c0:01:ca:5c:d6', u'initiator_symbolic_nodename': u'', u'initiator_switch_port': u'20', u'initiator_symbolic_portname': u'', u'target_port_array_name': u'a1epc8snsm4001', u'target_wwnn': u'56:c9:ce:90:d4:51:6d:00', u'target_wwpn': u'56:c9:ce:90:d4:51:6d:06', u'initiator_alias': u'a1epc8lhan402_hba1_p2', u'session_id': u'330000000051402ec001ca5cd656c9ce90d4516d06', u'alua': u'standby', u'pr_key': 0, u'target_port_ctrlr_id': 1, u'initiator_fcid': 136192, u'target_port_interface_name': u'fc2b.1', u'target_fcid': 131328, u'initiator_switch_name': u'a1epc8sfcs4002', u'id': u'330000000051402ec001ca5cd656c9ce90d4516d06'}, {u'initiator_wwpn': u'51:40:2e:c0:01:ca:5c:a4', u'initiator_symbolic_nodename': u'', u'initiator_switch_port': u'20', u'initiator_symbolic_portname': u'', u'target_port_array_name': u'a1epc8snsm4001', u'target_wwnn': u'56:c9:ce:90:d4:51:6d:00', u'target_wwpn': u'56:c9:ce:90:d4:51:6d:05', u'initiator_alias': u'a1epc8lhan402_hba2_p1', u'session_id': u'330000000051402ec001ca5ca456c9ce90d4516d05', u'alua': u'standby', u'pr_key': 0, u'target_port_ctrlr_id': 1, u'initiator_fcid': 70656, u'target_port_interface_name': u'fc2a.1', u'target_fcid': 65792, u'initiator_switch_name': u'a1epc8sfcs4001', u'id': u'330000000051402ec001ca5ca456c9ce90d4516d05'}, {u'initiator_wwpn': u'51:40:2e:c0:01:ca:5c:d6', u'initiator_symbolic_nodename': u'', u'initiator_switch_port': u'20', u'initiator_symbolic_portname': u'', u'target_port_array_name': u'a1epc8snsm4001', u'target_wwnn': u'56:c9:ce:90:d4:51:6d:00', u'target_wwpn': u'56:c9:ce:90:d4:51:6d:02', 
u'initiator_alias': u'a1epc8lhan402_hba1_p2', u'session_id': u'330000000051402ec001ca5cd656c9ce90d4516d02', u'alua': u'active_optimized', u'pr_key': 0, u'target_port_ctrlr_id': 0, u'initiator_fcid': 136192, u'target_port_interface_name': u'fc2b.1', u'target_fcid': 131072, u'initiator_switch_name': u'a1epc8sfcs4002', u'id': u'330000000051402ec001ca5cd656c9ce90d4516d02'}, {u'initiator_wwpn': u'51:40:2e:c0:01:ca:5c:a4', u'initiator_symbolic_nodename': u'', u'initiator_switch_port': u'20', u'initiator_symbolic_portname': u'', u'target_port_array_name': u'a1epc8snsm4001', u'target_wwnn': u'56:c9:ce:90:d4:51:6d:00', u'target_wwpn': u'56:c9:ce:90:d4:51:6d:01', u'initiator_alias': u'a1epc8lhan402_hba2_p1', u'session_id': u'330000000051402ec001ca5ca456c9ce90d4516d01', u'alua': u'active_optimized', u'pr_key': 0, u'target_port_ctrlr_id': 0, u'initiator_fcid': 70656, u'target_port_interface_name': u'fc2a.1', u'target_fcid': 65536, u'initiator_switch_name': u'a1epc8sfcs4001', u'id': u'330000000051402ec001ca5ca456c9ce90d4516d01'}], u'vol_usage_uncompressed_bytes': 4968947712, u'num_snaps': 4, u'base_snap_name': u'', u'cache_pinned': False, u'name': u'a1epc8lhan402-boot-2', u'num_connections': 4, u'last_content_snap_id': 0, u'cksum_last_verified': 0, u'avg_stats_last_5mins': {u'read_latency': 0, u'combined_throughput': 816, u'read_throughput': 0, u'write_latency': 4, u'write_throughput': 816, u'combined_iops': 0, u'read_iops': 0, u'write_iops': 0, u'combined_latency': 4}, u'usage_valid': True, u'creation_time': 1543442584, u'full_name': u'default:/a1epc8lhan402/a1epc8lhan402-boot-2', u'move_bytes_migrated': 0, u'snap_reserve': 0, u'move_est_compl_time': 0, u'volcoll_id': u'07058a0dd7f9ecafd9000000000000000000000004', u'vol_usage_compressed_bytes': 2477152400, u'perfpolicy_name': u'hana4k-data', u'agent_type': u'none', u'base_snap_id': u'', u'metadata': None, u'app_category': u'Other', u'cache_policy': u'normal', u'encryption_cipher': u'none', u'online_snaps': None, 
u'last_modified': 1543442600, u'snap_limit_percent': -1, u'folder_id': u'2f058a0dd7f9ecafd9000000000000000000000002', u'total_usage_bytes': 5994125311, u'iscsi_sessions': None, u'snap_limit': 9223372036854775807, u'pool_id': u'0a058a0dd7f9ecafd9000000000000000000000001', u'snap_usage_populated_bytes': 21854855168, u'needs_content_repl': False, u'move_start_time': 0, u'warn_level': 80, u'offline_reason': None, u'dest_pool_name': u'', u'block_size': 4096, u'size': 102400, u'perfpolicy_id': u'03058a0dd7f9ecafd900000000000000000000001f', u'move_aborting': False, u'pinned_cache_size': 0, u'serial_number': u'7e89fa94a3829e8b6c9ce9000ea266fc', u'limit_iops': -1, u'protection_type': u'local', u'folder_name': u'a1epc8lhan402', u'vpd_t10': u'Nimble 7e89fa94a3829e8b6c9ce9000ea266fc', u'limit': 100, u'app_uuid': u'', u'projected_num_snaps': 0, u'last_snap': {u'snap_id': u'04058a0dd7f9ecafd9000000000000005c00005343', u'snap_creation_time': 1543726800, u'snap_name': u'Boot-Policy-Boot-Daily-2018-12-02::00:00:00.000'}, u'target_name': u'56:c9:ce:90:d4:51:6d:00', u'dest_pool_id': u'', u'id': u'06058a0dd7f9ecafd9000000000000000000000057', u'read_only': False, u'volcoll_name': u'Boot-Policy', u'content_repl_errors_found': False, u'multi_initiator': True, u'last_content_snap_br_cg_uid': 0, u'owned_by_group': u'a1epc8snsm4001-grp', u'snap_usage_uncompressed_bytes': 7053840384, u'online': True, u'access_control_records': [{u'chap_user_name': u'*', u'vol_id': u'06058a0dd7f9ecafd9000000000000000000000057', u'pe_name': u'', u'snapluns': None, u'acl_id': u'0d058a0dd7f9ecafd900000000000000000000005a', u'initiator_group_id': u'02058a0dd7f9ecafd9000000000000000000000008', u'access_protocol': u'fc', u'chap_user_id': u'', u'initiator_group_name': u'a1epc8lhan402', u'vol_name': u'a1epc8lhan402-boot-2', u'apply_to': u'both', u'pe_id': u'', u'pe_lun': None, u'id': u'0d058a0dd7f9ecafd900000000000000000000005a', u'lun': 0}], u'caching_enabled': True, u'pool_name': u'default', u'description': u'', 
u'clone': False, u'search_name': u'a1epc8lhan402-boot-2', u'snap_warn_level': 0, u'last_content_snap_br_gid': 0, u'previously_deduped': True, u'parent_vol_id': u'', u'limit_mbps': -1, u'upstream_cache_pinned': False, u'vpd_ieee0': u'7e89fa94a3829e8b', u'vpd_ieee1': u'6c9ce9000ea266fc', u'vol_state': u'online', u'reserve': 0}
How is it possible to accomplish this?
Thanks in advance

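This can be extracted with a simple regular expression, as long as the text file has the same format as the Python log output you posted. re.findall returns every match:
import re
# Read the whole log; the with-block closes the file afterwards.
with open("yourfileinthesamefolder.txt") as log_file:
    text = log_file.read()
# [^']+ stops at the closing quote; a greedy .+ would run to the last quote on the line.
serials = re.findall(r"u'serial_number': u'([^']+)'", text)
print(serials)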
I suggest reading up on how to use regular expressions in Python:
Regular Expression HOWTO — Python 3.7.2 documentation
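Since the record in the log is printed as a Python dict literal, another option (a different technique from the regular expression above) is to parse it with ast.literal_eval and read the key directly. This is a minimal sketch, assuming the log contains exactly one such record; sample_log is a shortened, made-up excerpt in the same shape as the question's output, and extract_record is a hypothetical helper.

```python
import ast

# Shortened sample in the same shape as the log in the question.
sample_log = """##### Call time: 0.000002
##### Volume Records Retrieved From Nimble:
{u'name': u'a1epc8lhan402-boot-2', u'online': True, u'metadata': None,
u'serial_number': u'7e89fa94a3829e8b6c9ce9000ea266fc', u'limit': 100}
"""


def extract_record(log_text):
    """Parse the dict literal spanning the first '{' to the last '}'."""
    start = log_text.index("{")
    end = log_text.rindex("}") + 1
    # literal_eval only accepts literals, so it does not execute code.
    return ast.literal_eval(log_text[start:end])


record = extract_record(sample_log)
print(record["serial_number"])
```

This gives access to every field of the record, at the cost of failing if the log holds several records or the literal is truncated.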

Python List - splitting on an empty object in the list [closed]

I'm trying to split my Python list on the empty strings ('') it contains.
['', u'WO0000008971346', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971321', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971307', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971247', u'', u'Low', u'Pending', u'Client Action Required',
u'17/04/2018 15:08:49','', u'WO0000008971245',u'', u'Low', u'Pending', u'Client Action Required',
u'17/04/2018 15:07:10','', u'WO0000008971235', u'',
u'Low', u'In Progress', u'', u'17/04/2018 15:03:50']
Is there a conventional way to split this in Python?

You probably mean "create sublists from the list, separated by the empty strings".
In that case, use itertools.groupby, where the grouping condition is "string is empty":
import itertools
s = ['', u'WO0000008971346', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971321', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971307', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971247', u'', u'Low', u'Pending', u'Client Action Required', u'17/04/2018 15:08:49',
'', u'WO0000008971245',u'', u'Low', u'Pending', u'Client Action Required', u'17/04/2018 15:07:10',
'', u'WO0000008971235', u'', u'Low', u'In Progress', u'', u'17/04/2018 15:03:50']
result = [list(x) for k,x in itertools.groupby(s,key=bool) if k]
print(result)
bool is the key function which yields True if the string isn't empty. We then filter on a True condition to keep the non-empty groups.
result:
[['WO0000008971346'], ['Low', 'Assigned'], ['WO0000008971321'], ['Low', 'Assigned'],
['WO0000008971307'], ['Low', 'Assigned'], ['WO0000008971247'],
['Low', 'Pending', 'Client Action Required', '17/04/2018 15:08:49'],
['WO0000008971245'], ['Low', 'Pending', 'Client Action Required', '17/04/2018 15:07:10'],
['WO0000008971235'], ['Low', 'In Progress'], ['17/04/2018 15:03:50']]
If you want to collapse runs of consecutive empty strings instead (keeping a flat list delimited by single empty strings), it's the same idea, but with a flatten and a conditional:
result2 = list(itertools.chain.from_iterable(x if k else [''] for k,x in itertools.groupby(s,key=bool)))
yields:
['', 'WO0000008971346', '', 'Low', 'Assigned', '', 'WO0000008971321', '',
'Low', 'Assigned', '', 'WO0000008971307', '', 'Low', 'Assigned', '',
'WO0000008971247', '', 'Low', 'Pending', 'Client Action Required',
'17/04/2018 15:08:49', '', 'WO0000008971245', '', 'Low', 'Pending', 'Client Action Required',
'17/04/2018 15:07:10', '', 'WO0000008971235', '', 'Low', 'In Progress', '', '17/04/2018 15:03:50']
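To see why filtering on k works, it can help to look at the raw groups before anything is discarded. A small sketch on a made-up list (items is illustrative, not the data from the question):

```python
import itertools

items = ['', 'a', '', 'b', 'c', '', '', 'd']

# Each run of consecutive items with the same truthiness becomes one group.
runs = [(is_non_empty, list(group))
        for is_non_empty, group in itertools.groupby(items, key=bool)]
print(runs)

# Keeping only the truthy runs gives the sublists between empty strings.
sublists = [group for is_non_empty, group in runs if is_non_empty]
print(sublists)
```

Note that groupby only merges adjacent items, so two non-empty runs separated by any number of empty strings always end up in separate sublists.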

If I understood correctly, you need sublists of 3 elements after deleting the empty elements.
Demo
data = ['', u'WO0000008971346', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971321', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971307', u'', u'Low', u'Assigned', u'', u'',
'', u'WO0000008971247', u'', u'Low', u'Pending', u'Client Action Required', u'17/04/2018 15:08:49',
'', u'WO0000008971245',u'', u'Low', u'Pending', u'Client Action Required', u'17/04/2018 15:07:10',
'', u'WO0000008971235', u'', u'Low', u'In Progress', u'', u'17/04/2018 15:03:50']
data = list(filter(None, data))  # list() so slicing also works on Python 3
print([data[x:x+3] for x in range(0, len(data), 3)])
Output:
[[u'WO0000008971346', u'Low', u'Assigned'], [u'WO0000008971321', u'Low', u'Assigned'], [u'WO0000008971307', u'Low', u'Assigned'], [u'WO0000008971247', u'Low', u'Pending'], [u'Client Action Required', u'17/04/2018 15:08:49', u'WO0000008971245'], [u'Low', u'Pending', u'Client Action Required'], [u'17/04/2018 15:07:10', u'WO0000008971235', u'Low'], [u'In Progress', u'17/04/2018 15:03:50']]
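As the output shows, fixed chunks of 3 drift out of alignment once a record has more than three non-empty fields. If the intent is one sublist per work order, one possible approach is to start a new sublist at each ID. This is only a sketch under the assumption that every record begins with a value starting with "WO"; split_records and its prefix parameter are hypothetical, and the data is a shortened excerpt of the list in the question.

```python
data = ['', u'WO0000008971346', u'', u'Low', u'Assigned', u'', u'',
        '', u'WO0000008971247', u'', u'Low', u'Pending',
        u'Client Action Required', u'17/04/2018 15:08:49']


def split_records(items, prefix="WO"):
    """Group non-empty items into records, each starting at an ID."""
    records = []
    for item in items:
        if not item:
            continue  # skip empty strings entirely
        if item.startswith(prefix) or not records:
            records.append([item])    # an ID opens a new record
        else:
            records[-1].append(item)  # everything else joins the current one
    return records


records = split_records(data)
for record in records:
    print(record)
```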

SugarCRM call records REST API

I am trying to get the call records from my SugarCRM account using the REST API and I am using Python.
There I want to obtain all the attendees but all I get is the user to whom the call is assigned.
u'assigned_user_id': u'xxxxxxxx',
The response I've received is,
{u'created_by_link': {u'id': u'1', u'full_name': u'adminx', u'_acl': {u'fields': {u'last_login': {u'write': u'no', u'create': u'no'}, u'pwd_last_changed': {u'write': u'no', u'create': u'no'}}, u'_hash': u'xxxx', u'delete': u'no'}}, u'dri_workflow_task_template_link': {u'_acl': {u'fields': [], u'_hash': u'xxxx'}, u'name': u'', u'id': u''},
u'customer_journey_points': 10,
u'dri_subworkflow_id': u'',
u'recurrence_id': u'',
u'created_by_name': u'adminx',
u'date_end': u'2018-05-02T09:45:00+00:00',
u'dri_subworkflow_template_id': u'',
u'parent_type': u'Accounts',
u'contact_id': u'xxxx',
u'_acl': {u'fields': {}},
u'duration_minutes': 30,
u'tag': [],
u'assigned_user_name': u'xxxx',
u'repeat_ordinal': u'',
u'repeat_count': None,
u'contact_name': u'xxxx',
u'repeat_interval': 1,
u'id': u'xxxx',
u'parent_name': u'ABC',
u'customer_journey_parent_activity_id': u'',
u'date_entered': u'2017-07-17T12:49:23+00:00',
u'outlook_id': u'',
u'team_name': [{u'name_2': u'', u'selected': False, u'primary': True, u'id': u'1', u'name': u'Global'}, {u'name_2': u'', u'selected': False, u'primary': False, u'id': u'West', u'name': u'West'}],
u'contacts': {u'_acl': {u'fields': [], u'_hash': u'xxxx'}, u'name': u'xxx', u'id': u'xxx'},
u'dri_workflow_task_template_id': u'',
u'customer_journey_score': None,
u'date_start': u'2018-05-02T09:15:00+00:00',
u'reminder_checked': u'',
u'dri_workflow_sort_order': u'1',
u'created_by': u'1',
u'parent_id': u'xxxx',
u'dri_subworkflow_template_link': {u'_acl': {u'fields': [], u'_hash': u'xxxx'}, u'name': u'', u'id': u''},
u'dri_subworkflow_name': u'',
u'dri_subworkflow_link': {u'_acl': {u'fields': [], u'_hash': u'xxxx'}, u'name': u'', u'id': u''},
u'modified_by_name': u'adminx',
u'repeat_selector': u'',
u'email_reminder_sent': False,
u'dri_workflow_template_id': u'',
u'status': u'Not Held',
u'direction': u'Outbound',
u'accept_status_users': u'',
u'repeat_dow': u'',
u'description': u'',
u'parent': {u'type': u'Accounts', u'_acl': {u'fields': [], u'_hash': u'xxxx'}, u'name': u'XYZ Funding Inc', u'id': u'xxxx'},
u'repeat_unit': u'',
u'deleted': False,
u'is_customer_journey_parent_activity': False,
u'customer_journey_parent_activity_type': u'',
u'locked_fields': [],
u'email_reminder_time': -1,
u'following': False,
u'assigned_user_link': {u'id': u'xxxx', u'full_name': u'xxxx', u'_acl': {u'fields': [], u'_hash': u'xxxx'}},
u'repeat_type': u'',
u'assigned_user_id': u'seed_sally_id',
u'team_count_link': {u'team_count': u'', u'id': u'1', u'_acl': {u'fields': [], u'_hash': u'xxxx'}},
u'dri_workflow_task_template_name': u'',
u'modified_user_link': {u'id': u'1', u'full_name': u'adminx', u'_acl': {u'fields': {u'last_login': {u'write': u'no', u'create': u'no'}, u'pwd_last_changed': {u'write': u'no', u'create': u'no'}}, u'_hash': u'xxx', u'delete': u'no'}},
u'email_reminder_checked': u'',
u'_module': u'Calls',
u'modified_user_id': u'1',
u'repeat_until': u'',
u'name': u'test',
u'date_modified': u'2017-07-17T12:49:23+00:00',
u'accept_status': u'',
u'reminder_time': -1,
u'customer_journey_progress': 0,
u'dri_workflow_template_name': u'',
u'my_favorite': False,
u'dri_subworkflow_template_name': u'',
u'dri_workflow_template_link': {u'_acl': {u'fields': [], u'_hash': u'xxx'}, u'name': u'', u'id': u''},
u'set_accept_links': u'',
u'repeat_days': u'',
u'is_customer_journey_activity': False,
u'repeat_parent_id': u'',
u'team_count': u'',
u'duration_hours': 0,
u'recurring_source': u''},

Strangely, the object which contains the list of "Guests" is not present in the standard GET request i.e.
https://{INSTANCE}/rest/v10/Calls/{RECORD_ID}
After doing some troubleshooting, and looking at the model in the web application itself, I found that the "Guests" field in the GUI ties back to a model property called "invitees".
Running a web request specifically referencing this field provides you with an array of records linked to the Call ID. So try running a GET request to this endpoint:
https://{INSTANCE}/rest/v10/Calls/{RECORD_ID}?fields=invitees
This should provide you with JSON akin to the below:
{
"id": "ec041f60-72b1-11e7-89f0-00163ef1f82f",
"date_modified": "2017-08-08T12:26:47+00:00",
"invitees": {
"records": [
{
"id": "cf378211-2b38-4fe5-949b-a53040717f04",
"date_modified": "2017-08-01T16:12:48+00:00",
"_acl": {
"fields": {}
},
"_module": "Users",
"_link": "users"
},
{
"id": "fe1740e6-3fa4-11e7-8fef-00163ef1f82f",
"date_modified": "2017-05-23T10:45:52+00:00",
"_acl": {
"fields": {}
},
"_module": "Contacts",
"_link": "contacts"
},
{
"id": "dcc526fc-72b1-11e7-a6dd-00163ef1f82f",
"date_modified": "2017-07-27T09:57:21+00:00",
"_acl": {
"fields": {}
},
"_module": "Leads",
"_link": "leads"
},
{
"id": "89f8a6d1-7df0-0e0b-3568-58a5bb6ecf34",
"date_modified": "2017-04-06T10:36:16+00:00",
"_acl": {
"fields": {}
},
"_module": "Leads",
"_link": "leads"
}
],
"next_offset": {
"contacts": -1,
"leads": -1,
"users": -1
}
},
"_acl": {
"fields": {}
},
"contact_name": "test",
"_module": "Calls"
}
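Once such a response has been parsed into a Python dict, the invitees can be grouped by the module they belong to. This is a minimal sketch, assuming the response has the shape shown above; response is a trimmed copy of that JSON with only the fields used here, and group_invitees is a hypothetical helper, not part of any SugarCRM client.

```python
from collections import defaultdict

# Trimmed copy of the response shown above (only the fields used here).
response = {
    "id": "ec041f60-72b1-11e7-89f0-00163ef1f82f",
    "invitees": {
        "records": [
            {"id": "cf378211-2b38-4fe5-949b-a53040717f04", "_module": "Users"},
            {"id": "fe1740e6-3fa4-11e7-8fef-00163ef1f82f", "_module": "Contacts"},
            {"id": "dcc526fc-72b1-11e7-a6dd-00163ef1f82f", "_module": "Leads"},
            {"id": "89f8a6d1-7df0-0e0b-3568-58a5bb6ecf34", "_module": "Leads"},
        ]
    },
}


def group_invitees(call):
    """Map each module name to the list of invitee record IDs."""
    grouped = defaultdict(list)
    # .get() keeps this safe when the invitees field was not requested.
    for record in call.get("invitees", {}).get("records", []):
        grouped[record["_module"]].append(record["id"])
    return dict(grouped)


invitees_by_module = group_invitees(response)
for module, ids in sorted(invitees_by_module.items()):
    print(module, ids)
```

The records carry only IDs and module names, so fetching names or email addresses would take a further request per record or module.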

How to convert a JSON string into a Python data structure

'[{"append":null,"appendCanExplainable":false,"appendList":[],"auction":{"aucNumId":"35179051643","auctionPic":"http://img.taobaocdn.com/bao/uploaded/i3/TB12WchGXXXXXb5XpXXXXXXXXXX_!!0-item_pic.jpg_40x40.jpg","link":"http://item.taobao.com/item.htm?id=35179051643","sku":"\xd1\xab\xb7\xd6\xc0\xe0:\xc9\xab\xbb\xd2\xcf\xdf\xbd\xf4\xc9\xed\xb3\xa4\xbf\xe3 &nbsp\xb3\xdf\xc2\xeb:M-170M-55-62KG","thumbnail":"","title":"\xcb\xb9\xbd\xf4\xc9\xed\xbf\xe3 \xb5\xaf\xc1\xa6\xd7\xe3\xc7\xf2\xd4\xaf\xbd\xa1\xc9\xed\xbf\xe3 PRO \xc4\xd0 \xb4\xf2\xb5\xd7\xd1\xb5\xc1\xb7\xb3\xa4\xbf\xe3\xcb\xd9\xb8\xc9"},"award":"","bidPriceMoney":{"amount":35,"cent":3500,"centFactor":100,"currency":{"currencyCode":"CNY","defaultFractionDigits":2,"symbol":"\xa3\xa4"},"currencyCode":"CNY","displayUnit":"\xd4\xaa"},"buyAmount":1,"content":"\xba\xc3\xc6\xc0\xa3\xa1","creditFraudRule":0,"date":"2014\xc4\xea12\xd4\xc220\xc8\xd5 15:41","dayAfterConfirm":0,"enableSNS":false,"from":"","lastModifyFrom":0,"payTime":{"date":18,"day":4,"hours":13,"minutes":4,"month":11,"seconds":37,"time":1418879077000,"timezoneOffset":-480,"year":114},"photos":[],"promotionType":"\xbb\xee\xb6\xaf\xb4\xd9\xcf\xfa 
","propertiesAvg":"0.0","rate":"1","rateId":231421178840,"raterType":0,"reply":null,"shareInfo":{"lastReplyTime":"","pic":0,"reply":0,"share":false,"userNumIdBase64":""},"showCuIcon":true,"showDepositIcon":false,"spuRatting":[],"status":0,"tag":"","useful":0,"user":{"anony":true,"avatar":"http://a.tbcdn.cn/app/sns/img/default/avatar-40.png","displayRatePic":"b_red_3.gif","nick":"y***6","nickUrl":"","rank":65,"rankUrl":"","userId":"","vip":"","vipLevel":0},"validscore":1,"vicious":""},{"append":null,"appendCanExplainable":false,"appendList":[],"auction":{"aucNumId":"35179051643","auctionPic":"http://img.taobaocdn.com/bao/uploaded/i3/TB12WchGXXXXXb5XpXXXXXXXXXX_!!0-item_pic.jpg_40x40.jpg","link":"http://item.taobao.com/item.htm?id=35179051643","sku":"\xd1\xd5\xc9\xab\xb7\xd6\xc0\xe0:\xba\xda\xc9\xab\xba\xda\xcf\xdf\xbd\xf4\xc9\xed\xb3\xa4\xbf\xe3 &nbsp\xb3\xdf\xc2\xeb:S-160m-45~55KG","thumbnail":"","title":"\xc7\xf2\xc9\xed\xbf\xe3\xb4\xf2\xb5\xd7\xd1\xb5\xc1\xb7\xb3\xa4\xbf\xe3\xcb\xd9\xb8\xc9"},"award":"","bidPriceMoney":{"amount":35,"cent":3500,"centFactor":100,"currency":{"currencyCode":"CNY","defaultFractionDigits":2,"symbol":"\xa3\xa4"},"currencyCode":"CNY","displayUnit":"\xd4\xaa"},"buyAmount":1,"content":"\xba\xc3\xc6\xc0\xa3\xa1","creditFraudRule":0,"date":"2014\xc4\xea12\xd4\xc220\xc8\xd5 15:37","dayAfterConfirm":0,"enableSNS":false,"from":"","lastModifyFrom":0,"payTime":{"date":17,"day":3,"hours":17,"minutes":43,"month":11,"seconds":47,"time":1418809427000,"timezoneOffset":-480,"year":114},"photos":[],"promotionType":"\xbb\xee\xb6\xaf\xb4\xd9\xcf\xfa 
","propertiesAvg":"0.0","rate":"1","rateId":231441191365,"raterType":0,"reply":null,"shareInfo":{"lastReplyTime":"","pic":0,"reply":0,"share":false,"userNumIdBase64":""},"showCuIcon":true,"showDepositIcon":false,"spuRatting":[],"status":0,"tag":"","useful":0,"user":{"anony":true,"avatar":"http://a.tbcdn.cn/app/sns/img/default/avatar-40.png","displayRatePic":"b_blue_3.gif","nick":"\xc2\xb7***0","nickUrl":"","rank":1235,"rankUrl":"","userId":"","vip":"","vipLevel":0},"validscore":1,"vicious":""}]'
How can I convert this str to a list of dicts? I have tried some methods, but failed. The string represents a list containing two big dicts, and each dict contains nested smaller dicts. The expected result is:
[{dict1},{dict2}]

You have a JSON string, so use the json module to decode this:
import json
decoded = json.loads(encoded)
decoded is then a Python list; you can then address each dictionary in the list, or use unpacking to assign the two dictionaries to two names:
dictionary1, dictionary2 = decoded
If you are using the requests library then you can use the response.json() method to load the content:
decoded = response.json()
In this specific case, however, you appear to have GBK-encoded data (or perhaps GB2312, a predecessor).
This goes well outside the JSON standard (which actually requires one of the UTF codecs to be used), and on Python 2 you'll need to tell json.loads() about the codec used:
decoded = json.loads(encoded, 'gbk')
The requests library will use whatever codec the server sent along with the response, or will otherwise use a character-set detection technique to try to find the right codec to use.
The result, when decoded, then looks like:
>>> decoded = json.loads(encoded, 'gbk')
>>> pprint(decoded)
[{u'append': None,
u'appendCanExplainable': False,
u'appendList': [],
u'auction': {u'aucNumId': u'35179051643',
u'auctionPic': u'http://img.taobaocdn.com/bao/uploaded/i3/TB12WchGXXXXXb5XpXXXXXXXXXX_!!0-item_pic.jpg_40x40.jpg',
u'link': u'http://item.taobao.com/item.htm?id=35179051643',
u'sku': u'\u52cb\u5206\u7c7b:\u8272\u7070\u7ebf\u7d27\u8eab\u957f\u88e4 &nbsp\u5c3a\u7801:M-170M-55-62KG',
u'thumbnail': u'',
u'title': u'\u65af\u7d27\u8eab\u88e4 \u5f39\u529b\u8db3\u7403\u8f95\u5065\u8eab\u88e4 PRO \u7537 \u6253\u5e95\u8bad\u7ec3\u957f\u88e4\u901f\u5e72'},
u'award': u'',
u'bidPriceMoney': {u'amount': 35,
u'cent': 3500,
u'centFactor': 100,
u'currency': {u'currencyCode': u'CNY',
u'defaultFractionDigits': 2,
u'symbol': u'\uffe5'},
u'currencyCode': u'CNY',
u'displayUnit': u'\u5143'},
u'buyAmount': 1,
u'content': u'\u597d\u8bc4\uff01',
u'creditFraudRule': 0,
u'date': u'2014\u5e7412\u670820\u65e5 15:41',
u'dayAfterConfirm': 0,
u'enableSNS': False,
u'from': u'',
u'lastModifyFrom': 0,
u'payTime': {u'date': 18,
u'day': 4,
u'hours': 13,
u'minutes': 4,
u'month': 11,
u'seconds': 37,
u'time': 1418879077000,
u'timezoneOffset': -480,
u'year': 114},
u'photos': [],
u'promotionType': u'\u6d3b\u52a8\u4fc3\u9500 ',
u'propertiesAvg': u'0.0',
u'rate': u'1',
u'rateId': 231421178840,
u'raterType': 0,
u'reply': None,
u'shareInfo': {u'lastReplyTime': u'',
u'pic': 0,
u'reply': 0,
u'share': False,
u'userNumIdBase64': u''},
u'showCuIcon': True,
u'showDepositIcon': False,
u'spuRatting': [],
u'status': 0,
u'tag': u'',
u'useful': 0,
u'user': {u'anony': True,
u'avatar': u'http://a.tbcdn.cn/app/sns/img/default/avatar-40.png',
u'displayRatePic': u'b_red_3.gif',
u'nick': u'y***6',
u'nickUrl': u'',
u'rank': 65,
u'rankUrl': u'',
u'userId': u'',
u'vip': u'',
u'vipLevel': 0},
u'validscore': 1,
u'vicious': u''},
{u'append': None,
u'appendCanExplainable': False,
u'appendList': [],
u'auction': {u'aucNumId': u'35179051643',
u'auctionPic': u'http://img.taobaocdn.com/bao/uploaded/i3/TB12WchGXXXXXb5XpXXXXXXXXXX_!!0-item_pic.jpg_40x40.jpg',
u'link': u'http://item.taobao.com/item.htm?id=35179051643',
u'sku': u'\u989c\u8272\u5206\u7c7b:\u9ed1\u8272\u9ed1\u7ebf\u7d27\u8eab\u957f\u88e4 &nbsp\u5c3a\u7801:S-160m-45~55KG',
u'thumbnail': u'',
u'title': u'\u7403\u8eab\u88e4\u6253\u5e95\u8bad\u7ec3\u957f\u88e4\u901f\u5e72'},
u'award': u'',
u'bidPriceMoney': {u'amount': 35,
u'cent': 3500,
u'centFactor': 100,
u'currency': {u'currencyCode': u'CNY',
u'defaultFractionDigits': 2,
u'symbol': u'\uffe5'},
u'currencyCode': u'CNY',
u'displayUnit': u'\u5143'},
u'buyAmount': 1,
u'content': u'\u597d\u8bc4\uff01',
u'creditFraudRule': 0,
u'date': u'2014\u5e7412\u670820\u65e5 15:37',
u'dayAfterConfirm': 0,
u'enableSNS': False,
u'from': u'',
u'lastModifyFrom': 0,
u'payTime': {u'date': 17,
u'day': 3,
u'hours': 17,
u'minutes': 43,
u'month': 11,
u'seconds': 47,
u'time': 1418809427000,
u'timezoneOffset': -480,
u'year': 114},
u'photos': [],
u'promotionType': u'\u6d3b\u52a8\u4fc3\u9500 ',
u'propertiesAvg': u'0.0',
u'rate': u'1',
u'rateId': 231441191365,
u'raterType': 0,
u'reply': None,
u'shareInfo': {u'lastReplyTime': u'',
u'pic': 0,
u'reply': 0,
u'share': False,
u'userNumIdBase64': u''},
u'showCuIcon': True,
u'showDepositIcon': False,
u'spuRatting': [],
u'status': 0,
u'tag': u'',
u'useful': 0,
u'user': {u'anony': True,
u'avatar': u'http://a.tbcdn.cn/app/sns/img/default/avatar-40.png',
u'displayRatePic': u'b_blue_3.gif',
u'nick': u'\u8def***0',
u'nickUrl': u'',
u'rank': 1235,
u'rankUrl': u'',
u'userId': u'',
u'vip': u'',
u'vipLevel': 0},
u'validscore': 1,
u'vicious': u''}]
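The json.loads(encoded, 'gbk') call above relies on the Python 2 signature. On Python 3, json.loads does not use an encoding argument (it was ignored and later removed), so a common approach is to decode the bytes to text first. This is a minimal sketch, assuming the payload arrives as GBK-encoded bytes; raw is a shortened fragment whose byte sequences are copied from the "content" and "displayUnit" fields in the question.

```python
import json

# Shortened GBK-encoded payload built from bytes that appear in the question.
raw = b'[{"content":"\xba\xc3\xc6\xc0\xa3\xa1","displayUnit":"\xd4\xaa","rate":"1"}]'

# Decode the bytes to text with the right codec, then parse the JSON text.
decoded = json.loads(raw.decode("gbk"))

first = decoded[0]
# ascii() prints the escaped form, so this works on any console encoding.
print(ascii(first["content"]))      # '\u597d\u8bc4\uff01'
print(ascii(first["displayUnit"]))  # '\u5143'
```

The escaped values match the u'\u597d\u8bc4\uff01' and u'\u5143' entries in the pprint output above.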
