I am trying to fetch deals data from HubSpot. In this example I fetch just the dealId and the deal name, to simplify the question, but later I will add more properties. The following code gives me one array of dealIds and one array of deal names. How could I make it so that, instead of multiple arrays, I get something like this:
{{12345,'deal1'}, {12346,'deal2'}, {12347,'deal3'}}
or something like:
{{'dealId': 12345, 'dealname' : 'deal1'}}
This is my code so far:
import requests

deals = []
names = []

def getdeals():
    apikey = "demo"
    url = 'https://api.hubapi.com/deals/v1/deal/paged?hapikey=' + apikey + '&properties=dealname&limit=250'
    response = requests.get(url)
    jsonDeals = response.json()
    for deal in jsonDeals['deals']:
        properties = deal['properties']
        deals.append(deal['dealId'])
        names.append(properties['dealname']['value'])
You already have the data in JSON; it's just a question of how you want to map and store it.
import requests

output = {}

def getdeals():
    apikey = "demo"
    url = 'https://api.hubapi.com/deals/v1/deal/paged?hapikey=' + apikey + '&properties=dealname&limit=250'
    response = requests.get(url)
    jsonDeals = response.json()
    for deal in jsonDeals['deals']:
        properties = deal['properties']
        output[deal['dealId']] = properties['dealname']['value']  # map dealId -> dealname
This can be solved using a list comprehension:
[{'dealId': deal['dealId'], 'dealname': deal['properties']['dealname']['value']} for deal in jsonDeals['deals']]
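For instance, a minimal sketch of how that comprehension slots into the original function (same endpoint and demo key as above):

import requests

def getdeals():
    apikey = "demo"
    url = 'https://api.hubapi.com/deals/v1/deal/paged?hapikey=' + apikey + '&properties=dealname&limit=250'
    jsonDeals = requests.get(url).json()
    # One dict per deal instead of two parallel lists.
    return [{'dealId': deal['dealId'],
             'dealname': deal['properties']['dealname']['value']}
            for deal in jsonDeals['deals']]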
As E.Serra suggested, deal_obj = {'dealname': properties['dealname']['value'], 'dealid': deal['dealId']} solved the issue.
Here is the updated code:
%%time
import requests

deals = []

def getdeals():
    apikey = "demo"
    url = 'https://api.hubapi.com/deals/v1/deal/paged?hapikey=' + apikey + '&properties=dealname&limit=250'
    response = requests.get(url)
    jsonDeals = response.json()
    for deal in jsonDeals['deals']:
        properties = deal['properties']
        deal_obj = {'dealname': properties['dealname']['value'], 'dealid': deal['dealId']}
        deals.append(deal_obj)
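A quick check of the result (the printed output is illustrative):

getdeals()
print(deals[:2])
# e.g. [{'dealname': 'deal1', 'dealid': 12345}, {'dealname': 'deal2', 'dealid': 12346}]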
I have code that fetches your Pastebin data:
import urllib.request
import urllib.parse

def user_key():
    user_key_data = {'api_dev_key': 'my-dev-key',
                     'api_user_name': 'my-username',
                     'api_user_password': 'my-password'}
    req = urllib.request.urlopen('https://pastebin.com/api/api_login.php',
                                 urllib.parse.urlencode(user_key_data).encode('utf-8'),
                                 timeout=7)
    return req.read().decode()

def user_pastes():
    data = {'api_dev_key': 'my_dev_key',
            'api_user_key': user_key(),
            'api_option': 'list'}
    req = urllib.request.urlopen('https://pastebin.com/api/api_post.php',
                                 urllib.parse.urlencode(data).encode('utf-8'),
                                 timeout=7)
    return req.read().decode()
Every paste in the response has its own tags, e.g. url, title, paste key, etc., and the code above prints these out per paste. I then wrote code that takes only certain tags: the paste url, the paste title and the paste key.
from bs4 import BeautifulSoup

my_pastes = []
src = user_pastes()
soup = BeautifulSoup(src, 'html.parser')
for paste in soup.findAll(['paste_url', 'paste_title', 'paste_key']):
    my_pastes.append(paste.text)
print(my_pastes)
What I want is to join the url, title and key of each paste together into one string. I tried using the .join method, but it only joins the characters (that might not make sense, but you'll see when you try it).
Unrelated to the problem: once they're joined, I'll split them again and put them in a PyQt5 table.
So this is kind of the answer, but I'm still looking for simpler code:
title = []
key = []
url = []

src = user_pastes()
soup = BeautifulSoup(src, 'html.parser')

for paste_title in soup.findAll('paste_title'):
    title.append(paste_title.text)
for paste_key in soup.findAll('paste_key'):
    key.append(paste_key.text)
for paste_url in soup.findAll('paste_url'):
    url.append(paste_url.text)

for i in range(len(title)):
    print(title[i], key[i], url[i])
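A more compact variant of that last loop, assuming each paste contributes exactly one entry to all three lists so they stay aligned, is to zip them together:

# Pair up the aligned lists; each triple belongs to one paste.
for t, k, u in zip(title, key, url):
    print(t, k, u)

# Or build one string per paste, ready to split again for the PyQt5 table:
joined = [' '.join(triple) for triple in zip(title, key, url)]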
Maybe this answer will give you an idea of what I want to achieve, since the original post was confusing; I couldn't quite express what I wanted.
I've got a list of IDs which I want to pass into URLs to collect data on the comments. I'm kind of a newbie, and when I try to iterate over the list I get only one URL, and consequently data for only one comment. Can someone please explain what's wrong with my code, and how to build URLs for all the IDs in the list so I can collect the data for all the comments?
import json
import requests

comments_from_reddit = ['fkkmga7', 'fkkgxtj', 'fkklfx3', ...]

def getPushshiftData():
    for ID in range(len(comments_from_reddit)):
        url = 'https://api.pushshift.io/reddit/comment/search?ids={}'.format(comments_from_reddit[ID])
        print(url)
        req = requests.get(url)
        data = json.loads(req.text)
        return data['data']

data = getPushshiftData()
The output I'm getting: https://api.pushshift.io/reddit/comment/search?ids=fkkmga7
I will really appreciate any help with this. Thanks for your attention.
This should work. The return inside your loop exits the function on the first iteration; collect the results in a list and return it after the loop instead:
import json
import requests

comments_from_reddit = ['fkkmga7', 'fkkgxtj', 'fkklfx3', ...]

def getPushshiftData():
    result = []
    for comment_id in comments_from_reddit:  # iterate over the IDs directly
        url = 'https://api.pushshift.io/reddit/comment/search?ids={}'.format(comment_id)
        print(url)
        req = requests.get(url)
        data = json.loads(req.text)
        result.append(data['data'])
    return result  # return only after all IDs have been fetched

data = getPushshiftData()
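As an aside, if the endpoint accepts a comma-separated list in the ids parameter (the Pushshift API has historically allowed this, but verify against the current docs), the whole batch could be fetched in a single request. A sketch under that assumption:

def getPushshiftDataBatched():
    # Assumes the ids parameter accepts comma-separated values; verify in the docs.
    url = 'https://api.pushshift.io/reddit/comment/search?ids={}'.format(
        ','.join(comments_from_reddit))
    req = requests.get(url)
    return json.loads(req.text)['data']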
Summary: I want to scrape a subreddit and then turn the data into dataframes. I know how to do each step individually, but I am stuck on combining them in a function.
Here is how I do it one by one:
import requests
import pandas as pd

url = 'https://api.pushshift.io/reddit/search/submission'
params3 = {'subreddit': 'Apple', 'size': 500, 'before': 1579411194}
res3 = requests.get(url, params3)
data = res3.json()
post3 = data['data']
apdf3 = pd.DataFrame(post3)
Here is the function I came up with so far (it doesn't run as written):
url = 'https://api.pushshift.io/reddit/search/submission'

def webscrape(subreddit, size):
    for i in range(1, 11):
        params = {"subreddit": subreddit, 'size': size, 'before': f'post{i}'[-1]['created_utc']}
        res = requests.get(url, params)
        f'data{i}' = res.json()
        f'post{i}' = data[f'data{i}']
        f'ap_df{i}' = pd.DataFrame(f'post{i}')
My problem is that the first request doesn't need 'before', but every subsequent request needs 'before' set to the 'created_utc' of the last post from the previous request, so that I get all the posts earlier than that one. How do I reconcile this conflict?
Many thanks!
What you are asking for is doable, but f-strings won't work here: an f-string can't be used as an assignment target to create variable names dynamically. The code below instead attaches each dataframe to a dictionary of dataframes. Try it and see if it works:
import requests
import pandas as pd

d = {}
url = 'https://api.pushshift.io/reddit/search/submission'

def webscraper(subreddit, size):
    bef = 0
    for i in range(1, 11):
        if i == 1:
            params = {"subreddit": subreddit, 'size': size}
        else:
            params = {"subreddit": subreddit, 'size': size, 'before': bef}
        res = requests.get(url, params)
        data = res.json()
        dat = data['data']
        bef = dat[-1]['created_utc']  # timestamp of the oldest post in this batch
        df_name = subreddit + str(i)
        d[df_name] = pd.DataFrame(dat)
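A quick usage sketch (the subreddit and size mirror the one-by-one example above); pd.concat can then merge the per-batch frames into one:

webscraper('Apple', 500)
# d now maps 'Apple1' ... 'Apple10' to the ten batches.
apple_df = pd.concat(d.values(), ignore_index=True)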
I'm trying to parse Oxford Dictionary in order to obtain the etymology of a given word.
import lxml.html
from urllib.request import urlopen

class SkipException(Exception):
    def __init__(self, value):
        self.value = value

try:
    doc = lxml.html.parse(urlopen('https://en.oxforddictionaries.com/definition/%s' % "good"))
except SkipException:
    doc = ''

if doc:
    table = []
    trs = doc.xpath("//div[1]/div[2]/div/div/div/div[1]/section[5]/div/p")
I can't work out how to obtain the string of text I need. I know the snippet above is missing some lines, but I don't fully understand how HTML or lxml work. I would much appreciate it if someone could show me the correct way to solve this.
You don't want to do web scraping here, especially since nearly every dictionary has an API. In the case of Oxford, create an account at https://developer.oxforddictionaries.com/, get the API credentials from your account, and do something like this:
import requests
import json

api_base = 'https://od-api.oxforddictionaries.com:443/api/v1/entries/{}/{}'
language = 'en'
word = 'parachute'
headers = {
    'app_id': '',
    'app_key': ''
}

url = api_base.format(language, word)
reply = requests.get(url, headers=headers)
if reply.ok:
    reply_dict = json.loads(reply.text)
    results = reply_dict.get('results')
    if results:
        headword = results[0]
        entries = headword.get('lexicalEntries')[0].get('entries')
        if entries:
            entry = entries[0]
            senses = entry.get('senses')
            if senses:
                sense = senses[0]
                print(sense.get('short_definitions'))
Here's a sample to get you started scraping Oxford dictionary pages:
import lxml.html as lh
from urllib.request import urlopen

url = 'https://en.oxforddictionaries.com/definition/parachute'
html = urlopen(url)
root = lh.parse(html)
body = root.find("body")
elements = body.xpath("//span[@class='ind']")  # '@' selects the class attribute
for element in elements:
    print(element.text)
To find the correct search string, you need to format the HTML so you can see its structure. I used the HTML formatter at https://www.freeformatter.com/html-formatter.html. Looking at the formatted HTML, I could see that the definitions were in the span elements with the 'ind' class attribute.
I am trying to get a certain value in a string of JSON, but I can't figure out exactly how to do it. I don't want to convert it into a string and strip/replace the unwanted pieces, because then I won't be able to get the other values. My current code is:
import json
import requests

username = "Dextication"
url = f"https://minecraft-statistic.net/api/player/info/{username}/"
response = requests.get(url)
json_data = json.loads(response.text)
print(json_data)
Edit: when I run this, json_data is:
{"status": "ok", "data": {"online": 0, "total_time_play": 46990, "last_play": 1513960562, "license": 1, "name": "Dextication", "uuid": "74d57a754855410c90b3d51bc99b8beb"}}
I would like to print only the value 46990.
Try the code below:
import json, requests

username = "Dextication"
url = f"https://minecraft-statistic.net/api/player/info/{username}/"
response = requests.get(url)
json_data = json.loads(response.text)
result = json_data['data']['total_time_play']
print(result)
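If the response can come back without a 'data' key (for example on an error status), a defensive variant using dict.get avoids a KeyError; a minimal sketch of that idea:

# Returns None instead of raising KeyError if either key is missing.
result = json_data.get('data', {}).get('total_time_play')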