SELECT url,
regexp_replace(title, '(http|ftp|file|https)://[-a-z0-9+&##/\%?=~_-|!:,.;/]*|\<.*?\>|(=+)\s*(.*?)\s*(=+)|&\w+;', '') AS text_body
FROM df_table_doc
   url                  text_body
0  https://demo.com     New Arch {Onboarding}..Lets (Onboard) it..
1  https://example.com  New Arch (Onboarding)
Adding the pattern \{.*?\} to also strip anything inside {} fails with:
IndexError: tuple index out of range
IndexError Traceback (most recent call last)
<ipython-input-1-20460659c049> in <module>
----> 1 get_ipython().run_cell_magic('spark_sql', '--limit 200', "select url, regexp_replace(title, '(http|ftp|file|https)://[-a-z0-9+&##/\\%?=~_-|!:,.;/]*|\\<.*?\\>|\\{.*?\\}|(=+)\\s*(.*?)\\s*(=+)|&\\w+;', '') as text_body\n from df_table_doc\n")
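A likely cause (this is an assumption about the %%spark_sql magic, not something stated in the question): the magic appears to push the cell text through Python string formatting before handing it to Spark, so a literal {...} in the regex is parsed as a replacement field, and looking up the (empty) argument tuple raises the IndexError. Doubling the braces usually sidesteps it. A minimal illustration of the failure mode outside the magic:

sql = "select regexp_replace(title, '\\{.*?\\}', '') from df_table_doc"
sql.format()    # raises IndexError (e.g. "tuple index out of range"): {...} is read as a format field
sql.replace('{', '{{').replace('}', '}}').format()    # doubled braces pass through as literal { }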
I analyzed the data from the previous step and tried to use topic modeling. Here is the syntax I am using:
From the error, I gather that join expects strings but found a tuple instead; I don't know how to fix this part.
class FacebookAccessException(Exception): pass

def get_profile(request, token=None):
    ...
    response = json.loads(urllib_response)
    if 'error' in response:
        raise FacebookAccessException(response['error']['message'])
    access_token = response['access_token'][-1]
    return access_token
#Join the review
word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
word_list = word_list.split(",")
This is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Users\Public\Documents\ESTsoft\CreatorTemp\ipykernel_13792\3474859476.py in <module>
1 #Join the review
----> 2 word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
3 word_list = word_list.split(",")
C:\Users\Public\Documents\ESTsoft\CreatorTemp\ipykernel_13792\3474859476.py in <listcomp>(.0)
1 #Join the review
----> 2 word_list = ",".join([",".join(i) for i in sexualhomicide['tokens']])
3 word_list = word_list.split(",")
TypeError: sequence item 0: expected str instance, tuple found
This is the output of printing 'sexualhomicide':
print(sexualhomicide['cleaned_text'])
print("="*30)
print(twitter.pos(sexualhomicide['cleaned_text'][0],Counter('word')))
I can't upload the output of that print statement; it gets classified as spam during the upload process.
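Based on the error, 'tokens' most likely holds lists of (word, POS) tuples rather than plain strings (an assumption inferred from the twitter.pos call above). A minimal sketch of one way to join them, keeping only the word part of each tuple:

word_list = ",".join(
    ",".join(word for word, pos in tokens)   # keep only the word from each (word, POS) tuple
    for tokens in sexualhomicide['tokens']
)
word_list = word_list.split(",")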
import requests # get connection
import pandas as pd
import json

def get_info(data):
    data=[]
    source=[]
    published_date=[]
    adx_keywords=[]
    byline=[]
    title=[]
    abstract=[]
    des_facet=[]
    per_facet=[]
    media=[]
    Api_Key=''
    url='https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key=' # key redacted
    response=requests.get(url).json()
    for i in response['results']:
        source.append(i['source'])
        published_date.append(i['published_date'])
        adx_keywords.append(i['adx_keywords'])
        byline.append(i['byline'])
        title.append(i['title'])
        abstract.append(i['abstract'])
        des_facet.append(i['des_facet'])
        per_facet.append(i['per_facet'])
        media.append(i['media'])
    data=data.append({'source':source,'published_date':published_date,'adx_keywords':adx_keywords,byline':byline, 'title':title,'abstract':abstract,'des_facet':des_facet,
                      'per_facet':per_facet,'media':media})
    df=df.append(d)
    return df
df

NameError                                 Traceback (most recent call last)
<ipython-input-292-00cf07b74dcd> in <module>()
----> 1 df
NameError: name 'df' is not defined
Your quotes are in the wrong place: the byline key is missing its opening quote.
before:
data=data.append({'source':source,'published_date':published_date,'adx_keywords':adx_keywords,byline':byline, 'title':title,'abstract':abstract,'des_facet':des_facet,
'per_facet':per_facet,'media':media})
after:
data=data.append({'source':source,'published_date':published_date,'adx_keywords':adx_keywords,'byline':byline, 'title':title, 'abstract':abstract,'des_facet':des_facet,
'per_facet':per_facet,'media':media})
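The quote only fixes the dict literal, though; the NameError above is a separate problem, since df is referenced without ever being created (and data.append(...) returns None, so data gets clobbered too). A rough sketch of one way the function could build the frame directly (my own variation, not part of the original answer):

def get_info():
    url = 'https://api.nytimes.com/svc/mostpopular/v2/viewed/7.json?api-key='  # key redacted
    response = requests.get(url).json()
    rows = []
    for i in response['results']:
        # keep one dict per article instead of parallel lists
        rows.append({key: i[key] for key in (
            'source', 'published_date', 'adx_keywords', 'byline', 'title',
            'abstract', 'des_facet', 'per_facet', 'media')})
    return pd.DataFrame(rows)

df = get_info()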
I would like to loop a Bloomberg IntraDayBar request over a dynamic list of 22 tickers and then combine the results into one dataframe:
This code generates the following list of tickers:
bquery = blp.BlpQuery().start()
dates = pd.bdate_range(end='today', periods=31)
time = datetime.datetime.now()
bcom_info = bquery.bds("BCOM Index", "INDX_MEMBERS")
bcom_info['ticker'] = bcom_info['Member Ticker and Exchange Code'].astype(str) + ' Comdty'
I would like to create a dataframe that returns the volume for each ticker, contained in the 'TRADE' event_type. Effectively looping the below code over each of the tickers in bcom_info.
bquery.bdib(bcom_info['ticker'], event_type='TRADE', interval=60, start_datetime=dates[0], end_datetime=time)
I tried this but couldn't get it to work:
def bloom_func(x, func):
    bloomberg = bquery
    return bquery.bdib(x, func, event_type='TRADE', interval=60, start_datetime=dates[0], end_datetime=time)

for d in bcom_info['ticker']:
    x[d] = bloom_func(d)
It generates the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-80-fcbf4acd6840> in <module>
2
3 for d in tickers:
----> 4 x[d] = bloom_func(d)
TypeError: bloom_func() missing 1 required positional argument: 'func'
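Since the traceback points at the unused second parameter, one way to get the loop working is to drop func and collect the per-ticker frames before concatenating them. A sketch under the assumption that bquery.bdib returns a DataFrame for a single ticker (same call signature as shown above):

def bloom_func(ticker):
    # 'func' was never used inside the function, so it can be removed
    return bquery.bdib(ticker, event_type='TRADE', interval=60,
                       start_datetime=dates[0], end_datetime=time)

frames = []
for d in bcom_info['ticker']:
    bars = bloom_func(d)
    bars['ticker'] = d            # tag each result with its ticker
    frames.append(bars)

volume_df = pd.concat(frames, ignore_index=True)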
My script stops scraping after the 449th Yelp restaurant.
Entire Code: https://pastebin.com/5U3irKZp
for idx, item in enumerate(yelp_containers, 1):
    print("--- Restaurant number #", idx)
    restaurant_title = item.h3.get_text(strip=True)
    restaurant_title = re.sub(r'^[\d.\s]+', '', restaurant_title)
    restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
The error I am getting is:
Traceback (most recent call last):
File "/Users/kenny/MEGA/Python/yelp scraper.py", line 41, in
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
IndexError: list index out of range
The problem is that some restaurants are missing the address.
What you should do is check whether the split result has enough elements before indexing into it. Change this line of code:
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')[1]
to these:
restaurant_address = item.select_one('[class*="secondaryAttributes"]').get_text(separator='|', strip=True).split('|')
restaurant_address = restaurant_address[1] if len(restaurant_address) > 1 else restaurant_address[0]
I ran your parser for all pages and it worked.
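A further hardening, beyond the fix above (my own addition): if a card has no secondaryAttributes element at all, select_one returns None and the get_text call itself would fail, so that case can be guarded too:

attrs = item.select_one('[class*="secondaryAttributes"]')
if attrs is None:
    restaurant_address = ""                      # no attributes block on this card
else:
    parts = attrs.get_text(separator='|', strip=True).split('|')
    restaurant_address = parts[1] if len(parts) > 1 else parts[0]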
I am assigning the stack trace from traceback.format_exc() to a list as shown below. The strange thing I notice is that after appending, all the single quotes (') appear escaped (\') in the output below.
I searched on Google (https://github.com/behave/behave/issues/336) and tried assigning traceback.format_exc() together with sys.getfilesystemencoding(), which didn't work either. I am very curious why this is happening and how to fix it.
import traceback

clonedRadarsdetailslist = []
clonedRadardetails = {}
try:
    #raise
    (updateproblemoutput,updateproblempassfail) = r.UpdateProblem(problemID=newRadarID, componentName=componentName, componentVersion=componentVersion, assigneeID=assignee, state=state, substate=substate, milestone=milestone, category=category, priority=priority, resolution=re_solution)
except:
    clonedRadardetails['updatedFailedReason'] = traceback.format_exc()

clonedRadarsdetailslist.append(clonedRadardetails)
print clonedRadarsdetailslist
OUTPUT:-
['{\'clonedRadar\': 40171867, \'clonedStatus\': \'PASS\', \'clonedRadarFinalStatus\': \'PASS\', \'updatedFailedReason\': \'Traceback (most recent call last):\\n File "./cloneradar.py", line 174, in clone\\n (updatetitleoutput,updatetitlepassfail) = r.UpdateProble(problemID=newRadarID,title=title )\\nAttributeError: \\\'RadarWS\\\' object has no attribute \\\'UpdateProble\\\'\\n\', \'clonedRadarFinalStatusReason\': \'N/A\', \'updateStatus\': \'FAIL\', \'clonedStatusfailReason\': \'N/A\'}', '{\'clonedRadar\': 40171867, \'clonedStatus\': \'PASS\', \'clonedRadarFinalStatus\': \'PASS\', \'updatedFailedReason\': \'Traceback (most recent call last):\\n File "./cloneradar.py", line 174, in clone\\n (updatetitleoutput,updatetitlepassfail) = r.UpdateProble(problemID=newRadarID,title=title )\\nAttributeError: \\\'RadarWS\\\' object has no attribute \\\'UpdateProble\\\'\\n\', \'clonedRadarFinalStatusReason\': \'N/A\', \'updateStatus\': \'FAIL\', \'clonedStatusfailReason\': \'N/A\'}']
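The backslashes are almost certainly not in the stored strings at all: printing a list displays each element's repr, and a repr escapes any single quotes inside the string. A small illustration (the names here are made up, not from the original script):

details = {'updatedFailedReason': "AttributeError: 'RadarWS' object has no attribute 'UpdateProble'"}
bucket = [str(details)]   # the list holds the dict's string form, as in the output above
print(bucket)             # list display uses repr, so inner quotes show up as \'
print(bucket[0])          # printing the element itself shows plain quotes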