Python JSONDecodeError

I am not too sure what I am doing wrong. I am trying to parse specific contents out of the JavaScript.
This is the value of "s" (from the code below):
<script type="text/javascript">window._sharedData = {"activity_counts":{"comment_likes":0,"comments":0,"likes":0,"relationships":0,"usertags":0},"config":{"csrf_token":"OIXAF5a6FwMQJj3vCaUQXCGUGL3sFb0Z","viewer":{"allow_contacts_sync":false,"biography":"Follow for the best social media experience. Est. 2014","external_url":null,"full_name":"Social Media Bliztexnetwork","has_profile_pic":true,"id":"6440587166","profile_pic_url":"https://instagram.fbed1-1.fna.fbcdn.net/vp/dd5d8db8ca1645ac8b69fdaf8886184f/5BB11538/t51.2885-19/s150x150/32947488_229940584435561_2806247690365566976_n.jpg","profile_pic_url_hd":"https://instagram.fbed1-1.fna.fbcdn.net/vp/df4d5098687fe594c5b2d9750804941a/5BEC5FC8/t51.2885-19/s320x320/32947488_229940584435561_2806247690365566976_n.jpg","username":"bliztezxxmedia"}},"supports_es6":false,"country_code":"US","language_code":"en","locale":"en_US","entry_data":{"ProfilePage":[{"logging_page_id":"profilePage_7507466602","show_suggested_profiles":false,"graphql":{"user":{"biography":"What a wonderful day!!!","blocked_by_viewer":false,"country_block":false,"external_url":null,"external_url_linkshimmed":null,"edge_followed_by":{"count":17},"followed_by_viewer":true,"edge_follow":{"count":8},"follows_viewer":false,"full_name":"Verna 
Manning","has_channel":false,"has_blocked_viewer":false,"highlight_reel_count":0,"has_requested_viewer":false,"id":"7507466602","is_private":true,"is_verified":false,"mutual_followers":{"additional_count":-3,"usernames":[]},"profile_pic_url":"https://instagram.fbed1-1.fna.fbcdn.net/vp/96e65311d0a5e79729411bd582592816/5BCC9C5A/t51.2885-19/s150x150/33143922_237271910362316_6290555001760645120_n.jpg","profile_pic_url_hd":"https://instagram.fbed1-1.fna.fbcdn.net/vp/96e65311d0a5e79729411bd582592816/5BCC9C5A/t51.2885-19/s150x150/33143922_237271910362316_6290555001760645120_n.jpg","requested_by_viewer":false,"username":"vernamanning46464","connected_fb_page":null,"edge_felix_combined_post_uploads":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_felix_combined_draft_uploads":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_felix_video_timeline":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_felix_drafts":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_felix_pending_post_uploads":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_felix_pending_draft_uploads":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_owner_to_timeline_media":{"count":2,"page_info":{"has_next_page":false,"end_cursor":"AQAQt_06KHhticevO8Am12l3GJ1CdrZVdUztIDyZN7oXm_IVmr2Clwi844aWh9oe9TU"},"edges":[{"node":{"__typename":"GraphImage","id":"1810494542282448836","edge_media_to_caption":{"edges":[{"node":{"text":"What a sunny 
day!"}}]},"shortcode":"BkgKzGch1_EsxkqWK-4ZjG_XoWfrFxgXIOrZqs0","edge_media_to_comment":{"count":24},"comments_disabled":false,"taken_at_timestamp":1530047789,"dimensions":{"height":1080,"width":1080},"display_url":"https://instagram.fbed1-1.fna.fbcdn.net/vp/d82d797684ce57fef7a9fe87c74d2342/5BCE0CF2/t51.2885-15/s1080x1080/e15/fr/35274418_207295373248007_2552664476088270848_n.jpg","edge_liked_by":{"count":0},"edge_media_preview_like":{"count":0},"gating_info":null,"media_preview":"ACoqnuL0Qj5cMT0Gf1rBdi5LHqasXLK8hZBtB7fz/WoMV1xjZGLZHijFSbaNtXYVwhgaZtq/n2H1q9/Zn/TRfyNVAxUYHGetNyfU1LT6DuXnsnTn09f6VX21tERy/wAZCjt3P41BNaqMGLkHj1pRl0luJrqjM20m2rrwGM7TjPtzTobYynGQB3J7VpdWv0I8ihtpNtdGkdvGnlvtb1PqaZstPQfmf8az9ouzL5X3RmLMGPIxgdulSiYDgZAPNVQOB/nvSjmvP9pLa5fKibzlPTmjzlH0qMKB27H+ZqOn7SXcXKiwZFAzTPNHpVVSc1ISaPaz7i5Uf//Z","owner":{"id":"7507466602"},"thumbnail_src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/7cecb59edaba9f9f7565604eac28d8df/5BC63210/t51.2885-15/s640x640/sh0.08/e35/35274418_207295373248007_2552664476088270848_n.jpg","thumbnail_resources":[{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/b499ce5fafa113fe57f7325d86628900/5BE96296/t51.2885-15/s150x150/e15/35274418_207295373248007_2552664476088270848_n.jpg","config_width":150,"config_height":150},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/f124ca8254e24569515be5f3f99ff911/5BE9A3A9/t51.2885-15/s240x240/e15/35274418_207295373248007_2552664476088270848_n.jpg","config_width":240,"config_height":240},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/5c82e7c2ae3905863fe25150fca1f5e4/5BCB4ED1/t51.2885-15/s320x320/e15/35274418_207295373248007_2552664476088270848_n.jpg","config_width":320,"config_height":320},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/e76685c6614c444d8ed5f04efc01435a/5BB6D257/t51.2885-15/s480x480/e15/35274418_207295373248007_2552664476088270848_n.jpg","config_width":480,"config_height":480},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/7cecb59edaba9f9f7565604eac28d8df/5BC63210/t51.2885-15
/s640x640/sh0.08/e35/35274418_207295373248007_2552664476088270848_n.jpg","config_width":640,"config_height":640}],"is_video":false}},{"node":{"__typename":"GraphImage","id":"1757529388200541080","edge_media_to_caption":{"edges":[{"node":{"text":"What a nice day."}}]},"shortcode":"Bhj_6qyALuYgmy2sPgmUtoBcmcxZWGeyLkM3O00","edge_media_to_comment":{"count":3},"comments_disabled":false,"taken_at_timestamp":1523733851,"dimensions":{"height":1080,"width":1080},"display_url":"https://instagram.fbed1-1.fna.fbcdn.net/vp/16610d58bb6cc90893ffd264f81755c6/5BAD1DBC/t51.2885-15/s1080x1080/e15/fr/30590929_101347367387069_7153309976138612736_n.jpg","edge_liked_by":{"count":1},"edge_media_preview_like":{"count":1},"gating_info":null,"media_preview":"ACoqwgadupijNLn0oAdvNPErDoahopiLkdw6nIPNTfaX9apxDNS7fencLDIAP1qFsZ4qYAx9CDmmMmD1B/GsxkRoqXbnjjP1FN2H2/MVVwHKcVJvqLYf8ml2tSAtbce/vjFIFYeh/Sn0Uhjh7gU8Njjt+dMFLQA4Y9B+HFO+X0/z+VMpaAP/2Q==","owner":{"id":"7507466602"},"thumbnail_src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/a4dfd1c28505301d4c440c95023fbbc7/5BC81D5E/t51.2885-15/s640x640/sh0.08/e35/30590929_101347367387069_7153309976138612736_n.jpg","thumbnail_resources":[{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/d69525a20a2e61b2ee8663daf287a8ee/5BB46AD8/t51.2885-15/s150x150/e15/30590929_101347367387069_7153309976138612736_n.jpg","config_width":150,"config_height":150},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/4034b1752e3a4bace405aadd5a35477c/5BCB79E7/t51.2885-15/s240x240/e15/30590929_101347367387069_7153309976138612736_n.jpg","config_width":240,"config_height":240},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/96b684f38cb826f3efd8a7610ed6e9bb/5BEB899F/t51.2885-15/s320x320/e15/30590929_101347367387069_7153309976138612736_n.jpg","config_width":320,"config_height":320},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/51dac35520bc90d7c7253cc331acf561/5BB44919/t51.2885-15/s480x480/e15/30590929_101347367387069_7153309976138612736_n.jpg","config_width":480,"config_height":
480},{"src":"https://instagram.fbed1-1.fna.fbcdn.net/vp/a4dfd1c28505301d4c440c95023fbbc7/5BC81D5E/t51.2885-15/s640x640/sh0.08/e35/30590929_101347367387069_7153309976138612736_n.jpg","config_width":640,"config_height":640}],"is_video":false}}]},"edge_saved_media":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]},"edge_media_collections":{"count":0,"page_info":{"has_next_page":false,"end_cursor":null},"edges":[]}}},"felix_onboarding_video_resources":{"mp4":"/static/videos/felix-onboarding/onboardingVideo.mp4/9d16838ca7f9.mp4","poster":"/static/images/felix-onboarding/onboardingVideoPoster.png/8fdba7cf2120.png"}}]},"gatekeepers":{"ld":true,"rt":true,"sw":true,"vl":true,"seo":true,"seoht":true,"2fac":true,"sf":true,"saa":true,"ai":true},"knobs":{"acct:ntb":0,"cb":0,"captcha":0},"qe":{"dash_for_vod":{"g":"","p":{}},"aysf":{"g":"","p":{}},"bc3l":{"g":"","p":{}},"comment_reporting":{"g":"","p":{}},"direct_conversation_reporting":{"g":"","p":{}},"direct_reporting":{"g":"","p":{}},"reporting":{"g":"","p":{}},"media_reporting":{"g":"","p":{}},"acc_recovery_link":{"g":"","p":{}},"notif":{"g":"","p":{}},"drct_nav":{"g":"","p":{}},"fb_unlink":{"g":"","p":{}},"mobile_stories_doodling":{"g":"","p":{}},"move_comment_input_to_top":{"g":"","p":{}},"mobile_cancel":{"g":"","p":{}},"mobile_search_redesign":{"g":"","p":{}},"show_copy_link":{"g":"control","p":{"show_copy_link_option":"false"}},"mobile_logout":{"g":"","p":{}},"pl_pivot_li":{"g":"control_0423","p":{"show_pivot":"false"}},"pl_pivot_lo":{"g":"","p":{}},"404_as_react":{"g":"","p":{}},"acc_recovery":{"g":"test_with_prefill","p":{"has_prefill":"true"}},"collections":{"g":"","p":{}},"comment_ta":{"g":"","p":{}},"connections":{"g":"control","p":{"has_suggestion_context_in_feed":"false"}},"disc_ppl":{"g":"control_02_27","p":{"has_follow_all_button":"false","has_pagination":"false"}},"embeds":{"g":"","p":{}},"ebdsim_li":{"g":"control_shadow_0322","p":{"is_shadow_enabled":"false","use_new_ui":"true"}},"ebds
im_lo":{"g":"","p":{}},"empty_feed":{"g":"","p":{}},"bundles":{"g":"","p":{}},"exit_story_creation":{"g":"","p":{}},"gdpr_logged_out":{"g":"","p":{}},"appsell":{"g":"","p":{}},"imgopt":{"g":"control","p":{}},"follow_button":{"g":"test","p":{"is_inline":"true"}},"loggedout":{"g":"","p":{}},"loggedout_upsell":{"g":"test_with_new_loggedout_upsell_content_03_15_18","p":{"has_new_loggedout_upsell_content":"true"}},"us_li":{"g":"Test","p":{"show_related_media":"true"}},"msisdn":{"g":"","p":{}},"bg_sync":{"g":"","p":{}},"onetaplogin":{"g":"default_opt_in","p":{"default_value":"true","during_reg":"true","storage_version":"one_tap_storage_version"}},"onetaplogin_userbased":{"g":"","p":{}},"login_poe":{"g":"","p":{}},"prvcy_tggl":{"g":"","p":{}},"private_lo":{"g":"","p":{}},"profile_photo_nux_fbc_v2":{"g":"launch","p":{"prefill_photo":"true","skip_nux":"false"}},"profile_tabs":{"g":"","p":{}},"push_notifications":{"g":"","p":{}},"reg":{"g":"control_01_10","p":{"has_new_landing_appsells":"false","has_new_landing_page":"false"}},"reg_vp":{"g":"","p":{}},"feed_vp":{"g":"launch","p":{"is_hidden":"true"}},"report_haf":{"g":"","p":{}},"report_media":{"g":"","p":{}},"report_profile":{"g":"test","p":{"is_enabled":"true"}},"save":{"g":"test","p":{"is_enabled":"true"}},"sidecar":{"g":"","p":{}},"sidecar_swipe":{"g":"","p":{}},"su_universe":{"g":"test_login_autocomplete","p":{"use_autocomplete_signup":"true"}},"stale":{"g":"","p":{}},"stories_lo":{"g":"test_03_15","p":{"stories_profile":"true"}},"stories":{"g":"","p":{}},"tp_pblshr":{"g":"","p":{}},"video":{"g":"","p":{}},"gdpr_settings":{"g":"","p":{}},"gdpr_blocking_logout":{"g":"","p":{}},"gdpr_eu_tos":{"g":"","p":{}},"gdpr_row_tos":{"g":"test_05_01","p":{"tos_version":"row"}},"fd_gr":{"g":"control","p":{"show_post_back_button":"false"}},"felix":{"g":"test","p":{"is_enabled":"true"}},"felix_clear_fb_cookie":{"g":"control","p":{"is_enabled":"true","blacklist":"fbsr_124024574287414"}},"felix_creation_duration_limits":{"g":"dogfooding",
"p":{"minimum_length_seconds":"15","maximum_length_seconds":"600"}},"felix_creation_enabled":{"g":"","p":{"is_enabled":"true"}},"felix_creation_fb_crossposting":{"g":"control","p":{"is_enabled":"false"}},"felix_creation_fb_crossposting_v2":{"g":"control","p":{"is_enabled":"true"}},"felix_creation_validation":{"g":"control","p":{"edit_video_controls":"true"}},"felix_creation_video_upload":{"g":"","p":{}},"felix_early_onboarding":{"g":"","p":{}},"pride":{"g":"test","p":{"enabled":"true","hashtag_whitelist":"lgbt,lesbian,gay,bisexual,transgender,trans,queer,lgbtq,girlslikeus,girlswholikegirls,instagay,pride,gaypride,loveislove,pansexual,lovewins,transequalitynow,lesbiansofinstagram,asexual,nonbinary,lgbtpride,lgbta,lgbti,queerfashion,queers,queerpride,queerlife,marriageequality,pride2018,genderqueer,bi,genderfluid,lgbtqqia,comingout,intersex,transman,transwoman,twospirit,transvisibility,queerart,dragqueen,dragking,dragartist,twomoms,twodads,lesbianmoms,gaydads,gendernonconforming"}},"unfollow_confirm":{"g":"","p":{}},"profile_enhance_li":{"g":"control","p":{"has_tagged":"false"}},"profile_enhance_lo":{"g":"control","p":{"has_tagged":"false"}},"create_tag":{"g":"","p":{}}},"hostname":"www.instagram.com","platform":"ios","rhx_gis":"87a25368813608d393baaa28a0d6afb7","nonce":"zsP4NjzdJRIWmer6K5At1A==","zero_data":{},"rollout_hash":"5f72737283f8","bundle_variant":"base","probably_has_app":false,"show_app_install":true};</script>
And this is the code I am trying to execute.
s = str(soup.find_all("script", type="text/javascript")[3])
m = re.search(r"(?<=window._sharedData = )(?P<json>.*)(?=</script>)", s)
if m:
    data = json.loads(m.group('json'))
    print(data)
    for i in data['entry_data']["ProfilePage"]:
        for j in i['graphql']['user']['edge_owner_to_timeline_media']['edges']:
            print(j['node']["id"])
Running this raises the following error:
json.decoder.JSONDecodeError: Extra data: line 1 column 12215 (char 12214)
I am completely lost and have no idea where I am going wrong. Any help is appreciated, and thanks in advance to everyone who contributes!

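I think you could update your regex so that it matches the JSON without the trailing semicolon, by adding the semicolon to the positive lookahead (?=;</script>). That leftover ";" is what json.loads() reports as "Extra data": it parses the complete object and then finds one more character after it.
(?<=window\._sharedData = )(?P<json>.*)(?=;</script>)
For your given example, your code might look like this (without the [3] in the first line):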
s = str(soup.find_all("script", type="text/javascript"))
m = re.search(r"(?<=window\._sharedData = )(?P<json>.*)(?=;</script>)", s)
if m:
    data = json.loads(m.group('json'))
    for i in data['entry_data']["ProfilePage"]:
        for j in i['graphql']['user']['edge_owner_to_timeline_media']['edges']:
            print(j['node']["id"])
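To see the difference the lookahead makes without needing BeautifulSoup or the live page, here is a small self-contained sketch. The `<script>` string is a made-up stand-in with the same shape as the real one, just with a tiny payload:

```python
import json
import re

# Made-up stand-in for the real <script> tag: same shape, tiny payload.
s = '<script type="text/javascript">window._sharedData = {"entry_data": {"ids": [1, 2]}};</script>'

# Original lookahead: the captured text still ends with ";"
bad = re.search(r"(?<=window\._sharedData = )(?P<json>.*)(?=</script>)", s)
try:
    json.loads(bad.group("json"))
except json.JSONDecodeError as err:
    print("Original pattern fails:", err.msg)  # Extra data

# Lookahead that includes the semicolon: the captured text is pure JSON
good = re.search(r"(?<=window\._sharedData = )(?P<json>.*)(?=;</script>)", s)
data = json.loads(good.group("json"))
print(data["entry_data"]["ids"])  # [1, 2]
```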

There's not quite enough here to debug: what you give for s doesn't include the </script>, so the pattern never matches when I run it locally. However, when I append it, it seems to work correctly.
From the error, it is clear that the contents of m.group('json') are not actually a valid JSON string, so I suspect you need to work on your regular expression. Try printing out the value of m.group('json') (before attempting to parse it) and feeding it into a JSON validator such as https://jsonlint.com/, which will point you to where the error lies. Perhaps the string ends with a ; that you need to strip out, or there is some other issue.
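As a local alternative to an online validator, the exception itself records where parsing stopped: `json.JSONDecodeError` has `msg`, `doc`, `pos`, `lineno` and `colno` attributes. The helper name `report_json_error` below is hypothetical; it is just a sketch that prints the text around the failure point:

```python
import json

def report_json_error(text, context=20):
    """Parse text as JSON; on failure, print where the parser gave up and return None."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is a 0-based character offset; err.lineno and err.colno are 1-based
        print(f"{err.msg}: line {err.lineno} column {err.colno} (char {err.pos})")
        print("Text at that point:", repr(err.doc[err.pos:err.pos + context]))
        return None

report_json_error('{"a": 1};')  # prints: Extra data: line 1 column 9 (char 8)
```

Calling it on the real m.group('json') should show the leftover ";" right at the reported position.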

Related

Extract text from a config file [duplicate]

This question already has answers here:
Parse key value pairs in a text file
(7 answers)
Closed 1 year ago.
I'm using a config file to inform my Python script of a few key-values, for use in authenticating the user against a website.
I have three variables: the URL, the user name, and the API token.
I've created a config file with each key on a different line, so:
url:<url string>
auth_user:<user name>
auth_token:<API token>
I want to be able to extract the text after the keywords into variables, also stripping the "\n" at the end of each line. Currently I'm doing this, and it works but seems clumsy:
from re import match
from sys import argv

with open(argv[1], mode='r') as config_file:
    lines = config_file.readlines()
for line in lines:
    url_match = match('jira_url:', line)
    if url_match:
        jira_url = line[9:].split("\n")[0]
    user_match = match('auth_user:', line)
    if user_match:
        auth_user = line[10:].split("\n")[0]
    token_match = match('auth_token', line)
    if token_match:
        auth_token = line[11:].split("\n")[0]
Can anybody suggest a more elegant solution? Specifically it's the ... = line[10:].split("\n")[0] lines that seem clunky to me.
I'm also slightly confused why I can't reuse my match object within the for loop, and have to create new match objects for each config item.
You could use a .yml file and read the values with the yaml.load() function:
import yaml
with open('settings.yml') as file:
    settings = yaml.load(file, Loader=yaml.FullLoader)
Now you can access elements such as settings["url"], and so on.
If the format is always <tag>:<value>, you can easily parse it by splitting each line at the colon and filling a dictionary:
with open(filename, "r") as config_file:
    lines = config_file.readlines()
settings = dict()
for l in lines:
    # rstrip('\n') instead of l[:-1]: the last line may have no trailing newline
    elements = l.rstrip('\n').split(':')
    settings[elements[0]] = ':'.join(elements[1:])
So you get a dictionary that has the tags as keys and the values as values. You can then just refer to these dictionary entries in your program (e.g. if you need the auth_token, just use settings["auth_token"]).
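The same idea can be written a little more compactly with `str.partition`, which splits at the first colon only, so a value such as a URL keeps its own colons without the `':'.join(...)` step. This is a sketch: `parse_config` is a hypothetical name, and `io.StringIO` with made-up values stands in for `open(filename)`:

```python
import io

def parse_config(config_file):
    """Build a dict from lines of the form <tag>:<value>."""
    settings = {}
    for line in config_file:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # partition splits at the FIRST colon only, so "https://..." stays intact
        key, _, value = line.partition(":")
        settings[key] = value
    return settings

# io.StringIO stands in for open(filename); the values are made up
sample = io.StringIO("url:https://example.com/jira\nauth_user:alice\nauth_token:abc123\n")
settings = parse_config(sample)
print(settings["url"])         # https://example.com/jira
print(settings["auth_token"])  # abc123
```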
If you can add one line to the config file, configparser is a good choice:
https://docs.python.org/3/library/configparser.html
[1] config file : 1.cfg
# configparser's config file needs a section name
[DEFAULT]
url:<url string>
auth_user:<user name>
auth_token:<API token>
[2] Python script
import configparser
config = configparser.ConfigParser()
config.read('1.cfg')
print(config.get('DEFAULT','url'))
print(config.get('DEFAULT','auth_user'))
print(config.get('DEFAULT','auth_token'))
[3] output
<url string>
<user name>
<API token>
configparser's methods are also useful when you can't guarantee that the config file is always complete (for example, get() accepts a fallback value).
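A small sketch of handling an incomplete config: `read_string()` parses the text directly (instead of a file), and the `auth_token` key is deliberately left out to show `has_option()` and the keyword-only `fallback` argument of `get()`:

```python
import configparser

config = configparser.ConfigParser()
# Same format as 1.cfg above, but auth_token is deliberately missing
config.read_string("""
[DEFAULT]
url:<url string>
auth_user:<user name>
""")

print(config.get("DEFAULT", "url"))                        # <url string>
print(config.has_option("DEFAULT", "auth_token"))          # False
print(config.get("DEFAULT", "auth_token", fallback=None))  # None
```

Without `fallback`, the last call would raise `configparser.NoOptionError` instead.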
You have a couple of great answers already, but I wanted to step back and provide some guidance on how you might approach these problems in the future. Getting quick answers sometimes prevents you from understanding how those people knew about the answers in the first place.
When you zoom out, the first thing that strikes me is that your task is to provide config, using a file, to your program. Software has the remarkable property of solve-once, use-anywhere. Config files have been a problem worth solving for at least 40 years, so you can bet your bottom dollar you don't need to solve this yourself. And already-solved means someone has already figured out all the little off-by-one and edge-case dramas like stripping line endings and dealing with expected input. The challenge of course, is knowing what solution already exists. If you haven't spent 40 years peeling back the covers of computers to see how they tick, it's difficult to "just know". So you might have a poke around on Google for "config file format" or something.
That would lead you to one of the most prevalent config file systems on the planet - the INI file. Just as useful now as it was 30 years ago, and as a bonus, looks not too dissimilar to your example config file. Then you might search for "read INI file in Python" or something, and come across configparser and you're basically done.
Or you might see that sometime in the last 30 years, YAML became the more trendy option, and wouldn't you know it, PyYAML will do most of the work for you.
But none of this makes you any better at using Python to extract text from files in general. So, zooming in a bit: you want to know how to extract parts of lines in a text file. Again, this is an age-old problem, and if you were to learn about it (rather than just being handed the solution), you would learn that this is called parsing and often involves tokenisation. If you do some research on, say, "parsing a text file in python", you would learn about the general techniques that work regardless of the language, such as looping over lines and splitting each one in turn.
Zooming in one more step closer, you're looking to strip the new line off the end of the string so it doesn't get included in your value. Once again, this ain't a new problem, and with the right keywords you could dig up the well-trodden solutions. This is often called "chomping" or "stripping", and with some careful search terms, you'd find rstrip() and friends, and not have to do awkward things like splitting on the '\n' character.
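A quick illustration of the difference, using a made-up line that has stray spaces before its newline: `rstrip("\n")` removes only newlines, while `rstrip()` with no argument removes all trailing whitespace.

```python
line = "auth_user:alice  \n"  # made-up line with stray trailing spaces

print(repr(line[10:].split("\n")[0]))  # 'alice  '  (spaces kept)
print(repr(line[10:].rstrip("\n")))    # 'alice  '  (only the newline removed)
print(repr(line[10:].rstrip()))        # 'alice'    (all trailing whitespace removed)
```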
Your final question is about re-using the match object. This is much harder to research. But again, the "solution" won't necessarily show you where you went wrong. What you need to keep in mind is that the statements in the for loop are sequential. To think them through, you should literally execute them in your mind, one after another, and imagine what's happening. Each time you call match, it either returns None or a Match object. You never use the object, except to check its truthiness in the if statement. And the next time you call match, you do so with different arguments, so you get a new Match object (or None). Therefore, you don't need to keep the object around at all. You can simply do:
if match('jira_url:', line):
    jira_url = line[9:].split("\n")[0]
if match('auth_user:', line):
    auth_user = line[10:].split("\n")[0]
and so on. Not only that: if the first if triggered, then you don't need to bother calling match again, because it will certainly not trigger any of the other matches for the same line. So you could do:
if match('jira_url:', line):
    jira_url = line[9:].rstrip()
elif match('auth_user:', line):
    auth_user = line[10:].rstrip()
and so on.
But then you can start to think - why bother doing all these matches on the colon, only to then manually split the string at the colon afterwards? You could just do:
tokens = line.rstrip().split(':', 1)  # split at the first colon only, so a URL value stays intact
if tokens[0] == 'jira_url':
    jira_url = tokens[1]
elif tokens[0] == 'auth_user':
    auth_user = tokens[1]
If you keep making these improvements (and there are lots more to make!), eventually you'll end up rewriting configparser, but at least you'll have learned why it's often a good idea to use an existing library where practical!

Is there a way to avoid double quotation in formatting strings which include quotations inside them in Python

It is not supposed to be a hard problem, but I've worked on it for almost a day!
I want to create a query which has to be in this format: 'lat':'###','long':'###' where ###s represent latitude and longitude.
I am using the following code to generate the queries:
coordinateslist=[]
for i in range(len(lat)):
    coordinateslist.append("'lat':'{}','long':'-{}'".format(lat[i],lon[i]))
coordinateslist
However, the result is something like this, with " at the beginning and end: "'lat':'40.66','long':'-73.93'"
Ridiculously enough, it's impossible to remove the " with either .replace or .strip, and wrapping the terms in repr doesn't solve the issue.
Do you know how I can get rid of those double quotation marks?
P.S. I know that when I print it, the " is not shown, but when I use each element of the list in my query, a " appears at the end of the query, which stops it from working.
Writing the line directly like this:
query_params = {'lat':'43.57','long':'-116.56'}
works perfectly fine.
but using any of the lines below leads to an error:
aa=print(coordinateslist[0])
bb=coordinateslist[0]
query_params = {aa}
query_params = {bb}
query_params = aa
query_params = bb
Try using a dictionary instead if you don't want to see the " from the string representation:
coordinateslist.append({
"lat": lat[i],
"long": "-{}".format(lon[i])
})
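To see where the double quotes come from: they are not part of the string. When a list is printed (or a notebook cell shows `coordinateslist`), each element is displayed with `repr()`, and Python picks double quotes for a string that contains single quotes. A sketch with made-up sample coordinates, ending with the dictionary approach built via `zip`:

```python
lat = ["40.66", "41.83"]  # made-up sample coordinates
lon = ["73.93", "87.68"]

s = "'lat':'{}','long':'-{}'".format(lat[0], lon[0])
print(s)        # 'lat':'40.66','long':'-73.93'
print(repr(s))  # "'lat':'40.66','long':'-73.93'"  <- the outer " come from repr()

# What the query actually needs is a dict, not a string that looks like one
coordinateslist = [{"lat": la, "long": "-{}".format(lo)} for la, lo in zip(lat, lon)]
print(coordinateslist[0])  # {'lat': '40.66', 'long': '-73.93'}
```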
It is likely that the error you are getting is something else entirely (i.e. unrelated to the quotes you see in printed output). From the format of the query, I would guess that it expects properly formatted URL parameters, reverse_geocode.json?..., which 'lat':'41.83','long':'-87.68' is not. Did you try calling it manually with a fixed string (e.g. with Postman)?
Assuming you're calling the Twitter geo/ API, you might want to try it out with properly separated URL parameters:
geo/reverse_geocode.json?lat=41.83&long=-87.68
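If you build the URL yourself, the standard library can produce that query string from a dict. A sketch using `urllib.parse.urlencode` with the example values above (HTTP client libraries such as requests can also do this for you through a `params` argument):

```python
from urllib.parse import urlencode

query_params = {"lat": "41.83", "long": "-87.68"}
url = "geo/reverse_geocode.json?" + urlencode(query_params)
print(url)  # geo/reverse_geocode.json?lat=41.83&long=-87.68
```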

Python-How to execute code and store into variable?

So I have been struggling with this issue for what seems like forever now (I'm pretty new to Python). I am using Python 3.7 (need it to be 3.7 due to variations in the versions of packages I am using for the project) to develop an AI chatbot system that can converse with you based on your text input. The program reads the contents of a series of .yml files when it starts. In one of the .yml files I am developing a syntax for when the first 5 characters match a ^###^ pattern, it will instead execute the code and return the result of that execution rather than just output text back to the user. For example:
Normal Conversation:
- - What is AI?
- Artificial Intelligence is the branch of engineering and science devoted to constructing machines that think.
Service/Code-based conversation:
- - Say hello to me
- ^###^print("HELLO")
The idea is that when you ask it to say hello to you, the ^###^print("HELLO") string is retrieved from the .yml file, the first 5 characters of the response are removed, and the response is sent to a separate function in the Python code, which runs the code and stores the result in a variable; that variable is returned from the function and gives the nice, clean result of HELLO to the user. I realize that this may be a bit hard to follow, but I will straighten up my code and condense everything once I have this whole error resolved. As a side note: Oracle is just what I am calling the project. I'm not trying to weave Java into this whole mess.
THE PROBLEM is that it does not store the result of the code being run/executed/evaluated into the variable like it should.
My code:
def executecode(input):
    print("The code to be executed is: ",input)
    #note: the input may occasionally have single quotes and/or double quotes in the input string
    result = eval("{}".format(input))
    print ("The result of the code eval: ", result)
    test = eval("2+2")
    test
    print(test)
    return result
#app.route("/get")
def get_bot_response():
    userText = request.args.get('msg')
    print("Oracle INTERPRETED input: ", userText)
    ChatbotResponse = str(english_bot.get_response(userText))
    print("CHATBOT RESPONSE VARIABLE: ", ChatbotResponse)
    #The interpreted string was a request due to the ^###^ pattern in front of the response in the custom .yml file
    if ChatbotResponse[:5] == '^###^':
        print("---SERVICE REQUEST---")
        print(executecode(ChatbotResponse[5:]))
        interpreter_response = executecode(ChatbotResponse[5:])
        print("Oracle RESPONDED with: ", interpreter_response)
    else:
        print("Oracle RESPONDED with: ", ChatbotResponse)
    return ChatbotResponse
When I run this code, this is the output:
Oracle INTERPRETED input: How much RAM do you have?
CHATBOT RESPONSE VARIABLE: ^###^print("HELLO")
---SERVICE REQUEST---
The code to be executed is: print("HELLO")
HELLO
The result of the code eval: None
4
None
The code to be executed is: print("HELLO")
HELLO
The result of the code eval: None
4
Oracle RESPONDED with: None
Output on the website interface
Essentially, I need it to say HELLO for the "The result of the code eval:" output. That should get the chatbot to respond with HELLO in the web interface, which is the end goal here. It seems as if it IS executing the code, given the HELLOs after the "The code to be executed is:" output text. It's just not storing the result in a variable like I need it to.
I have tried eval, exec, ast.literal_eval(), converting the input to string with str(), changing up the single and double quotes, putting \ before pairs of quotes, and a few other things. Whenever I get it to where the program interprets "print("HELLO")" when it executes the code, it complains about the syntax. Also, from several days of looking online I have figured out that exec and eval aren't generally favored due to a bunch of issues, however I genuinely do not care about that at the moment because I am trying to make something that works before I make something that is good and works. I have a feeling the problem is something small and stupid like it always is, but I have no idea what it could be. :(
I used these 2 resources as the foundation for the whole chatbot project:
Text Guide
Youtube Guide
Also, I am sorry for the rather lengthy and descriptive question. It's rare that I have to ask a question of my own on stackoverflow because if I have a question, it usually already has a good answer. It feels like I've tried everything at this point. If you have a better suggestion of how to do this whole system or you think I should try approaching this another way, I'm open to ideas.
Thank you for any/all help. It is very much appreciated! :)
The issue is that Python's print() doesn't have a useful return value: it always returns None. eval simply evaluates an expression and returns the value of that expression. Since print() returns None, an eval of a print() call also returns None.
>>> from_print = print('Hello')
Hello
>>> from_eval = eval("print('Hello')")
Hello
>>> from_print is from_eval is None
True
What you need is to capture the output streams! Here is a possible solution that captures anything written to stdout or stderr and returns that if the expression evaluates to None.
from contextlib import redirect_stdout, redirect_stderr
from io import StringIO

# NOTE: I use the arg name `code` since `input` is a Python builtin
def executecodehelper(code):
    # Capture all potential output from the code
    stdout_io = StringIO()
    stderr_io = StringIO()
    with redirect_stdout(stdout_io), redirect_stderr(stderr_io):
        # If `code` is already a string, this should work just fine without the need for formatting.
        result = eval(code)
    return result, stdout_io.getvalue(), stderr_io.getvalue()

def executecode(code):
    result, std_out, std_err = executecodehelper(code)
    if result is None:
        # This code didn't return anything. Maybe it printed something?
        if std_out:
            return std_out.rstrip()  # Deal with trailing whitespace
        elif std_err:
            return std_err.rstrip()
        else:
            # Nothing was printed AND the return value is None!
            return None
    else:
        return result
As a final note, this approach is heavily tied to eval, since eval can only evaluate a single expression. If you want to extend your bot to multi-line code or statements (such as assignments), you will need to use exec, which changes the logic. Here's a great resource detailing the differences between eval and exec: What's the difference between eval, exec, and compile?
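As a rough idea of how the exec version could look: exec() always returns None, so only captured output is useful. This is a sketch, and the name `executestatements` is hypothetical. The fresh namespace dict only keeps names defined by the executed code out of this module; it is not a sandbox, and exec still runs arbitrary code.

```python
from contextlib import redirect_stdout
from io import StringIO

def executestatements(code):
    """Run one or more statements with exec() and return what they printed."""
    stdout_io = StringIO()
    namespace = {}  # names the code defines land here, not in this module's globals
    with redirect_stdout(stdout_io):
        exec(code, namespace)
    return stdout_io.getvalue().rstrip()

print(executestatements('x = 2 + 2\nprint("HELLO", x)'))  # HELLO 4
```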
It's easy: create a new list and add the updated values of that variable to it. For example,
suppose you have a variable named myVar that stores the values (or even the questions; it doesn't matter).
1- First, declare a new list in your code:
myList = []
2- Whenever you need to answer with or display the value of myVar, append it to the list:
myList.append(myVar)
That works if something generates the values for you. If you need the opposite, meaning the values are already known, change the second step to assign them by index (this only works if myList already has at least two elements; on the empty list from step 1 it raises an IndexError, so use append there instead):
myList[0]='The first answer of the first question'
myList[1]='The second answer of the second question'
And now all the values are stored in your list. You can also do this in other ways; for example, using a loop is much better if you have many values or answers.

How to do parsing in python?

I'm kinda new to Python, and I'm trying to find out how to do parsing in Python.
I've got a task: parse some data made of symbols I don't recognize and put it into a DB. I guess I can create the DB and tables with the help of SQLAlchemy, but I have no idea how to do the parsing or what all these symbols below mean.
http://joxi.ru/YmEVXg6Iq3Q426
http://joxi.ru/E2pvG3NFxYgKrY
$$HDRPUBID 112701130020011127162536
H11127011300UNIQUEPONUMBER120011127
D11127011300UNIQUEPONUMBER100001112345678900000001
D21127011300UNIQUEPONUMBER1000011123456789AR000000001
D11127011300UNIQUEPONUMBER200002123456987X000000001
D21127011300UNIQUEPONUMBER200002123456987XIR000000000This item is inactive. 9781605600000
$$EOFPUBID 1127011300200111271625360000005
Thanks in advance to anyone who can give me some advice on where to start and how parsing works!
The best approach is to first figure out where each token begins and ends, and write a regular expression to capture these. The site RegexPal might help you design the regex.
As others suggest, take a look at some regex tutorials and at the re module's documentation.
You're probably looking for something like this:
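import re

# Field positions are 1-based and inclusive: (first, last)
headerMapping = {'type': (1,5), 'pubid': (6,11), 'batchID': (12,21),
                 'batchDate': (22,29), 'batchTime': (30,35)}
# '$$HDR' is followed by 30 characters ("PUBID " plus digits), so '.' rather than '\d'
poaBatchHeaders = re.findall(r'\$\$HDR.{30}', text)
parsedBatchHeaders = []
for poaHeader in poaBatchHeaders:
    batchHeaderDict = {}  # a fresh dict per header, otherwise every list entry is the same object
    for key in headerMapping:
        start = headerMapping[key][0]-1
        end = headerMapping[key][1]
        batchHeaderDict.update({key: poaHeader[start:end]})
    parsedBatchHeaders.append(batchHeaderDict)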
Then you have a list of dicts, where each dict contains the data for each attribute. I assume that you have your data file in text, which is a string. Each dict is made for one found structure (the POA Batch Header in this example).
If you want to parse it further, you have to write a function to parse each attribute, for example the date:
def batchDate(batch):
    # Assumes YYYYMMDD, as the sample value 20011127 suggests
    return (batch[0:4]+'-'+batch[4:6]+'-'+batch[6:8])

for header in parsedBatchHeaders:
    header.update({'batchDate': batchDate(header['batchDate'])})
Remember, that's an example, and I don't know the documentation for your data! I guess it works like that, but the rest is up to you.
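A possible first step before parsing individual fields is to sort the lines by record type. The sketch below uses the sample lines from the question; the rule that "$$" lines are typed by their first 5 characters and all other lines by their first 2 ("H1", "D1", "D2") is an assumption read off the sample, not taken from any documentation of the format. The function names are hypothetical.

```python
from collections import defaultdict

SAMPLE = """$$HDRPUBID 112701130020011127162536
H11127011300UNIQUEPONUMBER120011127
D11127011300UNIQUEPONUMBER100001112345678900000001
D21127011300UNIQUEPONUMBER1000011123456789AR000000001
D11127011300UNIQUEPONUMBER200002123456987X000000001
D21127011300UNIQUEPONUMBER200002123456987XIR000000000This item is inactive. 9781605600000
$$EOFPUBID 1127011300200111271625360000005"""

def record_type(line):
    # Assumption: "$$" lines are typed by their first 5 characters ("$$HDR", "$$EOF"),
    # every other record by its first 2 ("H1", "D1", "D2").
    return line[:5] if line.startswith("$$") else line[:2]

def group_records(text):
    """Collect the lines of text into lists keyed by their (assumed) record type."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            groups[record_type(line)].append(line)
    return groups

groups = group_records(SAMPLE)
print({kind: len(lines) for kind, lines in groups.items()})
```

Each group can then get its own field mapping, in the same way headerMapping is used for the $$HDR line.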

Python: Matching & Stripping port number from socket data

I have data coming in to a Python server via a socket. Within this data is the string '<port>80</port>', or whichever port is being used.
I wish to extract the port number into a variable. The data coming in is not XML, I just used the tag approach to identifying data for future XML use if needed. I do not wish to use an XML python library, but simply use something like regexp and strings.
What would you recommend is the best way to match and strip this data?
I am currently using this code with no luck:
p = re.compile('<port>\w</port>')
m = p.search(data)
print m
Thank you :)
Regex can't parse XML and shouldn't be used to parse fake XML. You should do one of the following:
Use a serialization method that is nicer to work with to start with, such as JSON or an ini file with the ConfigParser module.
Really use XML and not something that just sort of looks like XML and really parse it with something like lxml.etree.
Just store the number in a file if this is the entirety of your configuration. This solution isn't really easier than just using JSON or something, but it's better than the current one.
Implementing a bad solution now for future needs that you have no way of defining or accurately predicting is always a bad approach. You will be kept busy enough trying to write and maintain software now that there is no good reason to try to satisfy unknown future needs. I have never seen a case where "I'll put this in for later" has led to less headache later on, especially when I put it in by doing something completely wrong. YAGNI!
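As to what's wrong with your snippet, other than using an entirely wrong approach: the angle brackets are ordinary characters in Python's regex syntax, but \w matches exactly one word character, so it can never match the two-character 80. You need \d+ (or \w+), plus a group to pull the number out of the match.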
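If you do stick with regex, a minimal Python 3 sketch of the corrected search looks like this (the sample data string is made up):

```python
import re

data = "some text <port>80</port> more text"  # made-up sample of the socket data

# \d+ matches one or more digits; the parentheses capture just the number
m = re.search(r"<port>(\d+)</port>", data)
if m:
    port = int(m.group(1))
    print(port)  # 80
```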
Although Mike Graham is correct that using regex for XML is not recommended, the following will work:
(I have defined searchType as 'd' for numerals)
searchStr = 'port'
if searchType == 'd':
    retPattern = r'(<%s>)(\d+)(</%s>)'
else:
    retPattern = r'(<%s>)(.+?)(</%s>)'
searchPattern = re.compile(retPattern % (searchStr, searchStr))
found = searchPattern.search(data)  # search the incoming socket data, not searchStr itself
retVal = found.group(2)
(note the complete lack of error checking, that is left as an exercise for the user)
