Google Trends Category Search - python

I'm trying to extract Google Trends series data by category and/or subcategory with Python, based on the list at the following link: https://github.com/pat310/google-trends-api/wiki/Google-Trends-Categories
This list contains the category codes used by the (unofficial) Google Trends API, pytrends.
However, I'm not able to search by category alone, because a keyword/search term is required. In the example below, I use category 47 (Autos & Vehicles) and the keywords ['BMW', 'Peugeot'].
from pytrends.request import TrendReq

pytrend = TrendReq(hl='en-US', tz=360)
keywords = ['BMW', 'Peugeot']
pytrend.build_payload(
    kw_list=keywords,
    cat=47,
    timeframe='today 3-m',
    geo='FR',
    gprop='')
data = pytrend.interest_over_time()
data = data.drop(labels=['isPartial'], axis='columns')
image = data.plot(title='BMW vs. Peugeot in the last 3 months on Google Trends')
fig = image.get_figure()
I found this as a possible solution, but I haven't tried it because it's in R:
https://github.com/PMassicotte/gtrendsR/issues/89
I don't know if there is an API that makes it possible to extract a series by category while ignoring the keyword/search term; let me know if one exists. An alternative would be to download the data directly from the Google Trends website, filling in just the category field, as in this example showing the series for the category "Autos & Vehicles":
https://trends.google.com/trends/explore?cat=47&date=all&geo=SG

You can search by category alone by passing an empty string in the kw_list array:
keywords = ['']
pytrend.build_payload(kw_list=keywords, cat=47,
                      timeframe='today 3-m', geo='FR', gprop='')
data = pytrend.interest_over_time()
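With an empty keyword, the request returns the aggregate interest series for the category itself. A minimal end-to-end sketch; note that the column rename at the end is an assumption, since pytrends typically returns the series under the empty string as its column name:

from pytrends.request import TrendReq

pytrend = TrendReq(hl='en-US', tz=360)
# An empty keyword asks for the category's aggregate interest
pytrend.build_payload(kw_list=[''], cat=47, timeframe='today 3-m', geo='FR', gprop='')
data = pytrend.interest_over_time()
data = data.drop(labels=['isPartial'], axis='columns')
data.columns = ['Autos & Vehicles']  # assumed: the series comes back keyed by ''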

Related

How do I implement two iterable columns in a pandas DataFrame?

I'm brand new to programming, but got pulled into a school project that feels well over my head. I'm scraping Twitter data related to BTC and trying to find the average sentiment score per month. I have access to the Twitter API, but I'm having trouble creating my DataFrame. I can create one column that holds either tweet.text or tweet.created_at, but I can't seem to make two columns with both.
Here's what I have so far:
search_words = "bitcoin"
date_since = "2022-2-1"
tweets = tw.Cursor(api.search,
                   q=search_words,
                   since=date_since,
                   lang="en",
                   count=1000).items(1000)
a = {'Tweets': [tweet.text for tweet in tweets],
     'Date': [tweet.created_at for tweet in tweets]}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
df
The result is a DataFrame in which only one of the two columns is populated.
How can I generate the desired output?
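A likely cause: tw.Cursor(...).items(...) returns a one-shot generator, so the first list comprehension consumes all the tweets and the second one sees nothing. A minimal sketch, reusing tw, api, search_words, and date_since from the question, that materializes the cursor once and builds both columns:

import pandas as pd

# Materialize the generator once so it can be read for both columns
tweets = list(tw.Cursor(api.search,
                        q=search_words,
                        since=date_since,
                        lang="en",
                        count=1000).items(1000))

df = pd.DataFrame({'Tweets': [t.text for t in tweets],
                   'Date': [t.created_at for t in tweets]})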

How to do a conditional assignment to a column in Python?

I have a regex function in a Google Data Studio dashboard that creates a "Channel" column:
CASE
WHEN REGEXP_MATCH(business_partner, ".*Accounting.*|.*Ecosystem.*|.*Platform.*|.*Agency.*") THEN "Partner"
WHEN REGEXP_MATCH(utm_source, '.*Facebook.*') THEN "Facebook"
WHEN REGEXP_MATCH(utm_source, '.*Google*') AND NOT REGEXP_MATCH(utm_campaign,".*branding.*") THEN "Google"
WHEN REGEXP_MATCH(utm_campaign,".*branding.*") THEN "Branding"
ELSE "Others"
END
How can I replicate this logic in Python, with something like df['channel'] = ...? The desired channel column looks like:
channel
Facebook
Google
Partner
I did a lot of research online but didn't find anything conclusive.
Here is a sample of the data:
utm_source   utm_campaign   business_partner
facebook     conversion
Google       Search
Google       Branding
Direct                      Agency
facebook     traffic
Google       Display
Here is a straightforward solution using np.select (documentation):
import pandas as pd
import numpy as np
import io

# Tab-separated sample so the empty business_partner cells parse correctly
data_string = io.StringIO(
    "utm_source\tutm_campaign\tbusiness_partner\n"
    "facebook\tconversion\t\n"
    "Google\tSearch\t\n"
    "Google\tBranding\t\n"
    "Direct\t\tAgency\n"
    "facebook\ttraffic\t\n"
    "Google\tDisplay\t\n")
df = pd.read_table(data_string, sep='\t')

# Conditions are checked in order, mirroring the CASE/WHEN priority
conditions = [
    df['business_partner'].str.contains('Accounting|Ecosystem|Platform|Agency',
                                        case=False, na=False),
    df['utm_source'].str.contains('Facebook', case=False, na=False),
    df['utm_source'].str.contains('Google', case=False, na=False)
        & ~df['utm_campaign'].str.contains('branding', case=False, na=False),
    df['utm_campaign'].str.contains('branding', case=False, na=False),
]
channels = ['Partner', 'Facebook', 'Google', 'Branding']
df['channel'] = np.select(conditions, channels, default='Others')
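np.select evaluates the conditions top to bottom and takes the first match, which mirrors the CASE/WHEN priority, so no separate np.where pass is needed for the Branding override.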

How can I load and feed data from multiple Excel/CSV sheets into the multiselect function? - Streamlit

Context
In my web app, I want users to select different data categories; on selecting a category, the app should load the data from its respective CSV into the multiselect function. The below demonstrates this:
Example
So upon selecting Social and Economy, the app should load the respective data/columns from their separate Excel sheets, though I am lost on how to do this exactly.
I initially tried a for loop, but it ended up creating many separate multiselect widgets, which is not what I wanted. I just want the 'economy' data to load if I select Economy, and both datasets to load if I select both Social and Economy. They are stored in different Excel sheets.
My code as of now:
if choose_theme_category == 'Country specific':
    st.write("Choose data unique to a country for this theme.")
    Country_choice = st.beta_columns(3)
    countries = ['', 'United Kingdom', 'Ireland', 'Germany']
    Country = Country_choice[0].selectbox("Country", countries)
    if Country == 'United Kingdom':
        # Category of data
        Category = st.beta_columns(4)
        Data_category = Category[0].checkbox("Social")
        Data_category1 = Category[1].checkbox("Economy")
        Data_category2 = Category[2].checkbox("Environment")
        Data_category3 = Category[3].checkbox("Health")
        Data_category4 = Category[0].checkbox("Policy")
        # categories = [Data_category, Data_category1, Data_category2, Data_category3, Data_category4]
        data_mix_buttons = st.beta_columns([3, 1, 1])
        # confirmation_data = data_mix_buttons[0].button("Show Data")
        Hide_data = data_mix_buttons[2].checkbox("Hide Data")
        # if hide data is not pressed, then show data
        if not Hide_data:
            # for every selection chosen, add data from their respective excel sheets
            if Data_category:
                Select_data = st.multiselect("Choose columns", options=COVID_19_cols, key=1)
                Social_data = COVID_19_data[Select_data]
                if not Select_data:
                    st.error("Please select at least one column.")
                else:
                    st.write(Social_data)
Is this possible? Any suggestions?
The solution is detailed in this thread on the Streamlit discussion forum.
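For the category-to-sheet mapping itself, here is a minimal sketch, assuming a hypothetical workbook data.xlsx with one sheet per category ("Social", "Economy", ...):

import pandas as pd
import streamlit as st

EXCEL_FILE = "data.xlsx"  # hypothetical: one sheet per category

category_columns = st.beta_columns(2)
selected = [name for name, box in
            [("Social", category_columns[0].checkbox("Social")),
             ("Economy", category_columns[1].checkbox("Economy"))]
            if box]

# Read only the sheets that were ticked and combine their columns
frames = [pd.read_excel(EXCEL_FILE, sheet_name=name) for name in selected]
if frames:
    combined = pd.concat(frames, axis=1)
    columns = st.multiselect("Choose columns", options=list(combined.columns))
    if columns:
        st.write(combined[columns])
    else:
        st.error("Please select at least one column.")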

How to filter out column data from multiple rows of data?

Good evening, everyone. I have the following JSON file from Walmart with their product items and prices.
I loaded it into a Jupyter notebook, imported pandas, and read it into a DataFrame with custom columns, as shown in the screenshots below.
Now this is what I want to do: create new columns named minPrice and maxPrice and load the data into them.
How can I do that?
Here is the code in the Jupyter notebook for reference.
I also want the offer price, since some items don't have minPrice and maxPrice. :)
EDIT: here is the Python code:
import json
import pandas as pd

with open("walmart.json") as f:
    data = json.load(f)
walmart = data["items"]
wdf = pd.DataFrame(walmart, columns=["productId", "primaryOffer"])
print(wdf.loc[0, "primaryOffer"])
pd.set_option('display.max_colwidth', None)
print(wdf)
Here is the JSON File:
https://pastebin.com/sLGCFCDC
The following snippet, added on top of your code, achieves the required task:
min_prices = []
max_prices = []
offer_prices = []
for i, row in wdf.iterrows():
    if 'showMinMaxPrice' in row['primaryOffer']:
        min_prices.append(row['primaryOffer']['minPrice'])
        max_prices.append(row['primaryOffer']['maxPrice'])
        offer_prices.append('N/A')
    else:
        min_prices.append('N/A')
        max_prices.append('N/A')
        offer_prices.append(row['primaryOffer']['offerPrice'])
wdf['minPrice'] = min_prices
wdf['maxPrice'] = max_prices
wdf['offerPrice'] = offer_prices
Here we check for the 'showMinMaxPrice' key in the dicts held in the 'primaryOffer' column. Where minPrice and maxPrice are available, offerPrice is set to 'N/A', and vice versa. The values are first collected in lists and then added to the DataFrame as columns.
The output of wdf.head() then shows the new minPrice, maxPrice, and offerPrice columns.
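A more idiomatic alternative, assuming the same walmart.json layout, is pd.json_normalize, which flattens the nested primaryOffer dict and leaves NaN where a key is missing:

import json
import pandas as pd

with open("walmart.json") as f:
    items = json.load(f)["items"]

# Nested keys become dotted column names; absent keys become NaN
wdf = pd.json_normalize(items)
wdf = wdf[["productId", "primaryOffer.offerPrice",
           "primaryOffer.minPrice", "primaryOffer.maxPrice"]]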

Get closing stock prices for yesterday for multiple stocks

I'm trying to get yesterday's closing prices for all stocks listed on the BSE, using https://www.quandl.com/data/BSE-Bombay-Stock-Exchange as the data source. I have a list of company codes that I can use to pull the data, but I need to figure out how to iterate over it correctly.
According to the Quandl documentation, I can use quandl.get('BSE/BOM500002', column_index='4', start_date='2019-03-19', end_date='2019-03-20') to get yesterday's closing price for a stock, where BOM500002 is the company code. If my company codes are listed in companyCodes['code'], could you help me figure out how to generate the code dynamically and get yesterday's closing prices for all stocks listed on this exchange?
Bonus Question: How would I list the name of the stock next to the closing price?
Here is a way to get the stock name together with the results:
import quandl
import pandas as pd

df = pd.DataFrame([("BOM500002", "ABB India Limited"), ("BOM500003", "AEGIS")],
                  columns=["Code", "Name"])
results = []
for i, r in df.iterrows():
    result = quandl.get('BSE/' + r["Code"], column_index='4',
                        start_date='2019-03-19', end_date='2019-03-20')
    result["Name"] = r["Name"]
    results.append(result)
final = pd.concat(results)
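final then contains one row per trading day per stock, with the Name column identifying which company each closing price belongs to.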
Give this a try.
import quandl

quandl.ApiConfig.api_key = 'your quandl api key'
stocks = [
    'BSE/BOM533171',
    'BSE/BOM500002'
]
mydata = quandl.get(stocks, start_date='2019-03-19', end_date='2019-03-21')
mydata.loc[:, mydata.columns.str.contains('Close')].T
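Since the question asks for "yesterday" rather than fixed dates, the date arguments can be computed at run time; a small sketch:

from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()
today = date.today().isoformat()
# e.g. quandl.get(stocks, start_date=yesterday, end_date=today)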
