Specify the number of Tweets Tweepy returns - Python

I am building a word cloud from public Tweets. I have connected to the API via Tweepy and have successfully gotten it to return Tweets related to my search term, but for some reason can only get it to return 15 Tweets.
import pandas as pd
# subject of word cloud
search_term = 'ENTER SEARCH TERM HERE'
# creating dataframe containing the username and corresponding tweet content relating to our search term
df = pd.DataFrame(
    [tweet.user.id, tweet.user.name, tweet.text]
    for tweet in api.search(q=search_term, lang="en")
)
# renaming columns of data frame
df.rename(columns={0: 'user id', 1: 'screen name', 2: 'text'}, inplace=True)
df

By default, the standard search API that API.search uses returns up to 15 Tweets per page.
You need to specify the count parameter, up to a maximum of 100, if you want to retrieve more per request.
If you want more than 100 or a guaranteed amount, you'll need to look into paginating using tweepy.Cursor.
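For example (a minimal sketch, assuming Tweepy 3.x where API.search is available, and an already-authenticated api object):
import tweepy

# up to 100 Tweets in a single request (the standard search API's cap)
results = api.search(q=search_term, lang="en", count=100)

# for more than 100, let tweepy.Cursor paginate; items(300) stops after
# 300 Tweets or whenever the search results run out
results = [
    tweet for tweet in tweepy.Cursor(
        api.search, q=search_term, lang="en", count=100
    ).items(300)
]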

How do I implement two iterable columns in a pandas DataFrame?

I'm brand new to programming, but got sucked into a school project that feels pretty far over my head. I'm scraping Twitter data related to BTC and trying to find the average sentiment score per month. I got access to the Twitter API and all that, but I'm having trouble creating my dataframe. I can create one column that houses either "tweet.text" or "tweet.created_at", but can't seem to make two columns with both.
Here's what I have so far:
search_words = "bitcoin"
date_since = "2022-2-1"
tweets = tw.Cursor(api.search,
                   q=search_words,
                   since=date_since,
                   lang="en",
                   count=1000).items(1000)
a = {'Tweets':[tweet.text for tweet in tweets], 'Date':[tweet.created_at for tweet in tweets]}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
df
The result:
How can I generate the desired output?
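One likely cause of the missing second column is that Cursor(...).items(1000) yields a generator, which the first list comprehension exhausts, leaving nothing for the second. A minimal sketch that collects both fields in a single pass, reusing the tweets cursor from above:
import pandas as pd

# iterate the cursor once, keeping both fields from each tweet
rows = [(tweet.text, tweet.created_at) for tweet in tweets]
df = pd.DataFrame(rows, columns=['Tweets', 'Date'])
df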

Pytrends - Interest over time - return column with None when there is no data

Pytrends for Google Trends data does not return a column if there is no data for a search parameter on a specific region.
The code below is from pytrends.request
def interest_over_time(self):
    """Request data from Google's Interest Over Time section and return a dataframe"""
    over_time_payload = {
        # convert to string as requests will mangle
        'req': json.dumps(self.interest_over_time_widget['request']),
        'token': self.interest_over_time_widget['token'],
        'tz': self.tz
    }
    # make the request and parse the returned json
    req_json = self._get_data(
        url=TrendReq.INTEREST_OVER_TIME_URL,
        method=TrendReq.GET_METHOD,
        trim_chars=5,
        params=over_time_payload,
    )
    df = pd.DataFrame(req_json['default']['timelineData'])
    if (df.empty):
        return df
    df['date'] = pd.to_datetime(df['time'].astype(dtype='float64'), unit='s')
    df = df.set_index(['date']).sort_index()
From the code above, if there is no data, it just returns df, which will be empty.
My question is, how can I make it return a column with "No data" on every line and the search term as header, so that I can clearly see for which search terms there is no data?
Thank you.
I hit this problem, then I hit this web page. My solution was to ask Google Trends for data on a search term it would definitely have data for, then rename the column and zero out the data.
I used the .drop method to get rid of the "isPartial" column and the .rename method to change the column name. To zero the data in the column, I created a function:
# Make value zero
def MakeZero(x):
    return x * 0
Then I used the .apply method on the dataframe to zero the column:
ThisYrRslt = BlankResult.apply(MakeZero)
:) But the question is, which search term do you ask Google Trends about that will always return a value? I chose "Google". :)
I'm sure you can think of better ones, but it's hard to leave those words in commercial code.
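Alternatively, you can skip the dummy query entirely and build the placeholder yourself: check whether interest_over_time() came back empty and fill in a "No data" column per search term. A minimal sketch, assuming the payload was built with build_payload(kw_list=keywords, ...):
import pandas as pd

def interest_or_placeholder(pytrend, keywords):
    """Return interest_over_time(), substituting 'No data' columns for
    search terms that Google Trends returned nothing for."""
    df = pytrend.interest_over_time()
    if df.empty:
        # nothing at all came back: one row of 'No data' per term
        return pd.DataFrame({kw: ['No data'] for kw in keywords})
    for kw in keywords:
        if kw not in df.columns:
            # this term is missing from the result: mark every row
            df[kw] = 'No data'
    return df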

Google Trends Category Search

I'm trying to extract/download Google Trends Series Data by category and/or subcategory with Python based on this list in the following link: https://github.com/pat310/google-trends-api/wiki/Google-Trends-Categories
This list of categories contains codes that are used in the (unofficial) API of Google Trends, named pytrends.
However, I'm not able to search only by category because it is required to give a keyword/search term. In the case below, we have category 47 (Autos & Vehicles) and keywords ['BMW', 'Peugeot'].
import pytrends
from pytrends.request import TrendReq
pytrend = TrendReq()
pytrend = TrendReq(hl='en-US', tz=360)
keywords = ['BMW', 'Peugeot']
pytrend.build_payload(
    kw_list=keywords,
    cat=47,
    timeframe='today 3-m',
    geo='FR',
    gprop='')
data = pytrend.interest_over_time()
data = data.drop(labels=['isPartial'], axis='columns')
image = data.plot(title='BMW V.S. Peugeot in last 3 months on Google Trends')
fig = image.get_figure()
I found this as a possible solution, but I haven't tried it because it's in R:
https://github.com/PMassicotte/gtrendsR/issues/89
I don't know if there is an API that would give this possibility to extract series by category and ignoring keyword/search term. Let me know if it exists. I believe an option would be to download directly from Google Trends website and filling up just the category field, like this example where we can see the series for category "Autos & Vehicles":
https://trends.google.com/trends/explore?cat=47&date=all&geo=SG
You can search by category alone by passing an empty string in the kw_list array:
keywords = ['']
pytrend.build_payload(kw_list=[''], cat=47,
                      timeframe='today 3-m', geo='FR', gprop='')
data = pytrend.interest_over_time()
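If this behaves like the keyword queries above, the category-only series should come back under an empty-string column header, so it may help to rename it (the new column name below is just an illustration):
data = pytrend.interest_over_time()
# the empty keyword shows up as a column literally named '', so relabel it
data = data.rename(columns={'': 'Autos & Vehicles (cat 47)'})
data = data.drop(labels=['isPartial'], axis='columns')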

import all rows from dataset using SODA API Python

I'm trying to import the following dataset and store it in a pandas dataframe: https://data.nasa.gov/Space-Science/Meteorite-Landings/gh4g-9sfh/data
I use the following code:
r = requests.get('https://data.nasa.gov/resource/gh4g-9sfh.json')
meteor_data = r.json()
df = pd.DataFrame(meteor_data)
print(df.shape)
The resulting dataframe only has 1000 rows. I need it to have all 45,716 rows. How do I do this?
Check out the docs on the $limit parameter
The $limit parameter controls the total number of rows returned, and
it defaults to 1,000 records per request.
Note: The maximum value for $limit is 50,000 records, and if you
exceed that limit you'll get a 400 Bad Request response.
So you're just getting the default number of records back.
You will not be able to get more than 50,000 records in a single API call - this will take multiple calls using $limit together with $offset
Try:
https://data.nasa.gov/resource/gh4g-9sfh.json?$limit=50000
See Why am I limited to 1,000 rows on SODA API when I have an App Key
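If you'd rather stay with plain requests, here is a sketch of paging through the dataset with $limit and $offset ($order gives a stable sort while paging):
import pandas as pd
import requests

url = 'https://data.nasa.gov/resource/gh4g-9sfh.json'
page_size = 50000          # documented maximum for $limit
rows, offset = [], 0

while True:
    batch = requests.get(url, params={
        '$limit': page_size,
        '$offset': offset,
        '$order': ':id',   # stable ordering so pages don't overlap
    }).json()
    rows.extend(batch)
    if len(batch) < page_size:   # a short page means we've reached the end
        break
    offset += page_size

df = pd.DataFrame(rows)
print(df.shape)   # expect 45,716 rows for this dataset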
You can also use sodapy and set the limit, like this:
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.nasa.gov", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.nasa.gov",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")
# All 45,716 rows fit under the 50,000-record cap, returned as JSON from
# the API / converted to a Python list of dictionaries by sodapy.
results = client.get("gh4g-9sfh", limit=50000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)

Filter Multi-Channel Funnels (MCF) interactionType on campaignPath through Google Analytics API

I'm doing an MCF query through the Google Analytics Python API, like:
query = service.data().mcf().get(
    ids="ga:79961457",
    start_date=day,
    end_date=day,
    metrics=metric,
    dimensions="mcf:conversionDate,mcf:campaignPath",
    sort=None,
    filters="mcf:campaignPath=#RTG,mcf:conversionType==Transaction",
    max_results=1000000000
).execute()
This yields a list with rows similar to this:
[{u'primitiveValue': u'20160701'},
 {u'conversionPathValue':
     [{u'nodeValue': u'{{soporte}}', u'interactionType': u'CLICK'},
      {u'nodeValue': u'RTG', u'interactionType': u'IMPRESSION'},
      {u'nodeValue': u'CAMP', u'interactionType': u'CLICK'}]},
 {u'primitiveValue': u'1'}]
Now, the filter I applied is
mcf:campaignPath=#RTG
It filters through all the "nodeValue" entries.
The thing is that I want to get only the rows that contain, for example,
{u'nodeValue': u'RTG',
u'interactionType': u'CLICK'},
So, the problem is: how can I apply a filter to the interactionType, not only the campaignPath?
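If the MCF filter syntax can't target interactionType directly, one workaround is to filter the returned rows client-side. A minimal sketch, assuming the row structure shown above (each row is a list of cells, and path cells carry a conversionPathValue list):
def path_has_click_on(row, needle='RTG'):
    """True if any node in the row's campaign path is a CLICK whose
    nodeValue contains the needle."""
    for cell in row:
        for node in cell.get('conversionPathValue', []):
            if needle in node.get('nodeValue', '') and node.get('interactionType') == 'CLICK':
                return True
    return False

# keep only the rows whose path contains a CLICK on an 'RTG' node
filtered_rows = [row for row in query.get('rows', []) if path_has_click_on(row)]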
