Scraping Google News with pygooglenews - python

I am trying to scrape Google News with pygooglenews.
I want to collect more than 100 articles at a time (Google caps each query at 100 results) by changing the target dates in a for loop. Below is what I have so far, but I keep getting this error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-84-4ada7169ebe7> in <module>
----> 1 df = pd.DataFrame(get_news('Banana'))
2 writer = pd.ExcelWriter('My Result.xlsx', engine='xlsxwriter')
3 df.to_excel(writer, sheet_name='Results', index=False)
4 writer.save()
<ipython-input-79-c5266f97934d> in get_titles(search)
9
10 for date in date_list[:-1]:
---> 11 search = gn.search(search, from_=date, to_=date_list[date_list.index(date)])
12 newsitem = search['entries']
13
~\AppData\Roaming\Python\Python37\site-packages\pygooglenews\__init__.py in search(self, query, helper, when, from_, to_, proxies, scraping_bee)
140 if from_ and not when:
141 from_ = self.__from_to_helper(validate=from_)
--> 142 query += ' after:' + from_
143
144 if to_ and not when:
TypeError: unsupported operand type(s) for +=: 'dict' and 'str'
import pandas as pd
from pygooglenews import GoogleNews
import datetime
gn = GoogleNews()
def get_news(search):
    stories = []
    start_date = datetime.date(2021,3,1)
    end_date = datetime.date(2021,3,5)
    delta = datetime.timedelta(days=1)
    date_list = pd.date_range(start_date, end_date).tolist()
    for date in date_list[:-1]:
        search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
        newsitem = search['entries']
        for item in newsitem:
            story = {
                'title':item.title,
                'link':item.link,
                'published':item.published
            }
            stories.append(story)
    return stories
df = pd.DataFrame(get_news('Banana'))
Thank you in advance.

It looks like you are correctly passing a string to get_news(), which then passes it on as the first argument (search) to gn.search().
However, you're reassigning search to the result of gn.search() in this line:
search = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
# ^^^^^^
# gets overwritten with the result of gn.search()
On the next iteration, the reassigned search (now the result of the previous call, not the query string) is passed into gn.search(), which fails.
Looking at the pygooglenews source, gn.search() appears to return a dict, which would explain the error: the library tries to append ' after:' + from_ to a dict.
To fix this, simply use a different variable, e.g.:
result = gn.search(search, from_=date.strftime('%Y-%m-%d'), to_=(date+delta).strftime('%Y-%m-%d'))
newsitem = result['entries']
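To see the fix in isolation, here is a minimal sketch that runs without network access. The fake_search function is a hypothetical stand-in for gn.search(); it is assumed to return a dict with an 'entries' list, as the traceback suggests, and its entries are plain dicts rather than the objects pygooglenews actually returns. The point is only that the query string is never overwritten inside the loop.

```python
import datetime


def fake_search(query, from_=None, to_=None):
    """Hypothetical stand-in for gn.search(): returns a dict with an 'entries' list."""
    if not isinstance(query, str):
        # Mirrors the failure in the question: the query must stay a string
        raise TypeError("query must be a string")
    return {"entries": [{"title": query + " news", "link": "", "published": from_}]}


def get_news(query, start_date, end_date, search_fn=fake_search):
    """Run one search per day in [start_date, end_date) and collect the entries."""
    stories = []
    delta = datetime.timedelta(days=1)
    day = start_date
    while day < end_date:
        # Store the response under its own name so `query` is not overwritten
        result = search_fn(
            query,
            from_=day.strftime("%Y-%m-%d"),
            to_=(day + delta).strftime("%Y-%m-%d"),
        )
        for item in result["entries"]:
            stories.append(
                {"title": item["title"], "link": item["link"], "published": item["published"]}
            )
        day += delta
    return stories


stories = get_news("Banana", datetime.date(2021, 3, 1), datetime.date(2021, 3, 5))
print(len(stories))
```

With the real library you would pass a wrapper around gn.search as search_fn and read item.title, item.link and item.published as in the question.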

I know that pygooglenews has a limit of 100 articles, so you have to write a loop that scrapes each day separately.

Related

Looping a Bloomberg function over a list of tickers

I would like to loop a Bloomberg IntraDayBar request over a dynamic list of 22 tickers and then combine the results into one dataframe.
This code generates the following list of tickers:
bquery = blp.BlpQuery().start()
dates = pd.bdate_range(end='today', periods=31)
time = datetime.datetime.now()
bcom_info = bquery.bds("BCOM Index", "INDX_MEMBERS")
bcom_info['ticker'] = bcom_info['Member Ticker and Exchange Code'].astype(str) + ' Comdty'
I would like to create a dataframe that returns the volume for each ticker, contained in the 'TRADE' event_type, effectively looping the code below over each of the tickers in bcom_info.
bquery.bdib(bcom_info['ticker'], event_type='TRADE', interval=60, start_datetime=dates[0], end_datetime=time)
I tried this but couldn't get it to work:
def bloom_func(x, func):
    bloomberg = bquery
    return bquery.bdib(x, func, event_type='TRADE', interval=60, start_datetime=dates[0], end_datetime=time)
for d in bcom_info['ticker']:
    x[d] = bloom_func(d)
It generates the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-80-fcbf4acd6840> in <module>
2
3 for d in tickers:
----> 4 x[d] = bloom_func(d)
TypeError: bloom_func() missing 1 required positional argument: 'func'

I am getting a "'NoneType' object is not subscriptable" when trying to bring in data from a URL

Here is my code:
#Import libraries
import os
import pandas as pd
import requests
import matplotlib.pyplot as plt
import numpy as np
from datetime import date
import matplotlib.ticker as ticker
# API Key from EIA
api_key = 'blah blah'
# api_key = os.getenv("EIA_API_KEY")
# PADD Names to Label Columns
# Change to whatever column labels you want to use.
PADD_NAMES = ['PADD 1','PADD 2','PADD 3','PADD 4','PADD 5']
# Enter all your Series IDs here separated by commas
PADD_KEY = ['PET.MCRRIP12.M',
'PET.MCRRIP22.M',
'PET.MCRRIP32.M',
'PET.MCRRIP42.M',
'PET.MCRRIP52.M']
# Initialize list - this is the final list that you will store all the data from the json pull. Then you will use this list to concat into a pandas dataframe.
final_data = []
# Choose start and end dates
startDate = '2009-01-01'
endDate = '2021-01-01'
# Pull in data via EIA API
for i in range(len(PADD_KEY)):
    url = 'http://api.eia.gov/series/?api_key=' + api_key + PADD_KEY[i]
    r = requests.get(url)
    json_data = r.json()
    if r.status_code == 200:
        print('Success!')
    else:
        print('Error')
    df = pd.DataFrame(json_data.get('series')[0].get('data'),
                      columns = ['Date', PADD_NAMES[i]])
    df.set_index('Date', drop=True, inplace=True)
    final_data.append(df)
Here is my error:
TypeError Traceback (most recent call last)
<ipython-input-38-4de082165a0d> in <module>
10 print('Error')
11
---> 12 df = pd.DataFrame(json_data.get('series')[0].get('data'),
13 columns = ['Date', PADD_NAMES[i]])
14 df.set_index('Date', drop=True, inplace=True)
TypeError: 'NoneType' object is not subscriptable
The error "'NoneType' object is not subscriptable" occurs when you try to index into a None object, e.g. df["key"] where df is None.
Do you have PADD_NAMES defined somewhere in your code? To me, this looks like an issue with your JSON data. Have you tried printing it?
The API you are calling requires HTTPS, so try changing "http" to "https":
https://api.eia.gov/series/?api_key=
Consider adding some debug output to check for other errors by changing the if...else block like this:
if r.status_code == 200:
    print('Success!')
else:
    print('Error')
    print(json_data)
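Going one step further than printing, the lookup itself can be guarded so that a response without a 'series' key produces a readable message instead of the TypeError. This is only a sketch: the helper name extract_series_data and both sample payloads are made up for illustration, and the payload shape is inferred from the question's code rather than from the API documentation. When you inspect the printed payload, it is also worth checking how the URL is built, since the question's code concatenates api_key and the series ID with nothing in between.

```python
def extract_series_data(payload):
    """Return the 'data' rows of the first series, or raise a clear error.

    Hypothetical helper: assumes a successful payload looks like
    {'series': [{'data': [...]}]} as the question's code expects.
    """
    series = payload.get("series")
    if not series:
        # .get('series') returned None (or an empty list): show the payload instead
        raise ValueError("No series in response: " + repr(payload))
    return series[0].get("data", [])


# Made-up payloads for illustration only
ok_payload = {"series": [{"data": [["202012", 100.0], ["202011", 98.5]]}]}
error_payload = {"data": {"error": "something went wrong"}}

rows = extract_series_data(ok_payload)
print(rows)

try:
    extract_series_data(error_payload)
except ValueError as exc:
    message = str(exc)
    print("Request failed:", message)
```

In the loop from the question, the returned rows would be passed to pd.DataFrame in place of json_data.get('series')[0].get('data').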

Getting KeyError: 'viewCount' for using Youtube API in Python

I'm trying to get the view count for a list of videos from a channel. I've written a function, and when I run it with just 'video_id', 'title' and 'published date' I get the output. However, when I ask for the view count or anything else from the statistics part of the API, it raises a KeyError.
Here's the code:
def get_video_details(youtube, video_ids):
    all_video_stats = []
    for i in range(0, len(video_ids), 50):
        request = youtube.videos().list(
            part='snippet,statistics',
            id = ','.join(video_ids[i:i+50]))
        response = request.execute()
        for video in response['items']:
            video_stats = dict(
                Video_id = video['id'],
                Title = video['snippet']['title'],
                Published_date = video['snippet']['publishedAt'],
                Views = video['statistics']['viewCount'])
            all_video_stats.append(video_stats)
    return all_video_stats
get_video_details(youtube, video_ids)
And this is the error message:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_18748/3337790216.py in <module>
----> 1 get_video_details(youtube, video_ids)
~\AppData\Local\Temp/ipykernel_18748/1715852978.py in get_video_details(youtube, video_ids)
14 Title = video['snippet']['title'],
15 Published_date = video['snippet']['publishedAt'],
---> 16 Views = video['statistics']['viewCount'])
17
18 all_video_stats.append(video_stats)
KeyError: 'viewCount'
I was referencing this Youtube video to write my code.
Thanks in advance.
I got it.
I had to use .get() to avoid the KeyError. It returns None when the key is missing instead of raising.
I replaced the line with this to get the solution:
Views = video['statistics'].get('viewCount')
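For anyone who wants to try the behaviour without calling the API, here is a small sketch on made-up data. The videos list and the view_count helper are hypothetical, and the counts are written as strings on the assumption that the API returns them that way; the only thing being demonstrated is how dict.get() behaves when a key is absent.

```python
# Made-up items shaped like the response in the question
videos = [
    {"id": "a1", "statistics": {"viewCount": "1500"}},
    {"id": "b2", "statistics": {}},  # this video has no viewCount
    {"id": "c3"},                    # this one has no statistics at all
]


def view_count(video):
    """Return the view count as an int, or None when it is missing."""
    # The {} default keeps the second .get() from being called on None
    raw = video.get("statistics", {}).get("viewCount")
    return int(raw) if raw is not None else None


counts = {video["id"]: view_count(video) for video in videos}
print(counts)
```

Keeping None for missing counts, rather than substituting 0, makes it possible to tell "no data" apart from "zero views" later on.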

IMDbPY handling None object

I'm trying to pull data about cinematographers from IMDbPY and I'm encountering a null object. I'm not sure how to deal with that None object in the code. Could someone help me out, please?
Here's what I have so far:
from imdb import IMDb, IMDbError
ia = IMDb()
itemdop = ''
doplist = []
items = ["0050083", "6273736", "2582802"]
def imdblistdop(myList=[], *args):
    for x in myList:
        movie = ia.get_movie(x)
        cinematographer = movie.get('cinematographers')[0]
        cinematographer2 = movie.get('cinematographers')
        print(cinematographer)
        print(doplist)
        try:
            itemdop = cinematographer['name']
            doplist.append(itemdop)
        except KeyError as ke:
            print('Nope!')
imdblistdop(items)
The code is not working at all, and all I get is this:
Boris Kaufman
[]
TypeError Traceback (most recent call last)
in ()
21
22
---> 23 imdblistdop(items)
24
25
in imdblistdop(myList, *args)
10 for x in myList:
11 movie = ia.get_movie(x)
---> 12 cinematographer = movie.get('cinematographers')[0]
13 cinematographer2 = movie.get('cinematographers')
14 print(cinematographer)
TypeError: 'NoneType' object is not subscriptable
cinematographer is a list. It means that you can point to an entry in the list using its index. Example: cinematographer[2]. You cannot use a string to point to an entry in the list.
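Note that the traceback points at the line that indexes movie.get('cinematographers') with [0], which suggests the lookup returned None for the second movie. A minimal sketch of guarding against that is below. It uses plain dicts as stand-ins for the IMDbPY movie and person objects, so the data, the titles and the helper name first_cinematographer_name are all hypothetical.

```python
def first_cinematographer_name(movie):
    """Return the first cinematographer's name, or None if the movie lists none."""
    people = movie.get("cinematographers")
    if not people:
        # Covers both a missing key (None) and an empty list
        return None
    return people[0].get("name")


# Plain dicts standing in for the objects returned by ia.get_movie()
movies = [
    {"title": "Film A", "cinematographers": [{"name": "Boris Kaufman"}]},
    {"title": "Film B"},  # no 'cinematographers' key
]

doplist = []
for movie in movies:
    name = first_cinematographer_name(movie)
    if name is not None:
        doplist.append(name)
    else:
        print("No cinematographer listed for", movie["title"])

print(doplist)
```

The same check, testing the result of movie.get('cinematographers') before indexing it, should carry over to the real objects as long as they behave like the dicts used here.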

python MySQL connection with fetchone gives error

I've got a weird problem. I wrote VBA code in Excel that calls a Python script, which reads information from the Excel sheet and puts it into a database. Yesterday there was no problem. Today I started my computer, ran the VBA code, and it fails in the Python file.
The error:
testchipnr = TC104
Traceback (most recent call last):
testchipID = 108
File "S:/3 - Technical/13 - Reports & Templates/13 - Description/DescriptionToDatabase/DescriptionToDatabase.py", line 40, in <module>
TestchipID = cursorOpenShark.fetchone()[0] # Fetch a single row using fetchone() method and store the result in a variable., the [0] fetches only 1 value
TypeError: 'NoneType' object has no attribute '__getitem__'
The weird thing is that there is a value in the database -> testchipID ...
My code:
#get the testchipID and the testchipname
testchipNr = sheet.cell(7, 0).value # Get the testchipnr
print "testchipnr = ", testchipNr
queryTestchipID = """SELECT testchipid FROM testchip WHERE nr = '%s'""" %(testchipNr)
cursorOpenShark.execute(queryTestchipID)
print "testchipID = ", cursorOpenShark.fetchone()[0]
TestchipID = cursorOpenShark.fetchone()[0] # Fetch a single row using fetchone() method and store the result in a variable., the [0] fetches only 1 value
Since you say it was working fine and now it is not, the only reason I can think of is that your query is returning zero results.
Looking at your code, if the query returns only one result, the print statement consumes it, and the next call to cursorOpenShark.fetchone(), the one whose result is stored in TestchipID, returns None.
Instead, can you try the following code:
#get the testchipID and the testchipname
testchipNr = sheet.cell(7, 0).value # Get the testchipnr
print "testchipnr = ", testchipNr
queryTestchipID = """SELECT testchipid FROM testchip WHERE nr = '%s'""" %(testchipNr)
cursorOpenShark.execute(queryTestchipID)
TestchipID = cursorOpenShark.fetchone() # Fetch a single row once; fetchone() returns None when no row is left
if TestchipID:
    print "testchipID = ", TestchipID[0] # the [0] fetches only 1 value
else:
    print "Returned nothing"
Let me know if this works.
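The "second fetchone() returns None" behaviour is easy to reproduce. The sketch below uses Python 3 and the standard-library sqlite3 module with an in-memory database as a stand-in for the real server, so the table contents are made up and the ? placeholder style is specific to sqlite3 (other drivers may use a different placeholder). It also passes the value as a query parameter instead of formatting it into the SQL string.

```python
import sqlite3

# In-memory database standing in for the real one
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE testchip (testchipid INTEGER, nr TEXT)")
cur.execute("INSERT INTO testchip VALUES (?, ?)", (108, "TC104"))

# Parameterized query: the driver handles quoting of the value
cur.execute("SELECT testchipid FROM testchip WHERE nr = ?", ("TC104",))

first = cur.fetchone()   # consumes the only matching row
second = cur.fetchone()  # nothing left, so this is None
print(first, second)

# Fetch once, keep the row, and check it before indexing
testchip_id = first[0] if first is not None else None
print(testchip_id)

conn.close()
```

Fetching the row once and reusing it for both the print and the assignment avoids the problem entirely.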
