Create dynamic lists based on IDs and JSON responses - python

I am a beginner and I am trying to teach myself Python by using topics that are interesting to me and where I can at the same time challenge myself. I am currently struggling with a generic logical problem.
I would like to consume the CoinGecko API by using the following endpoint:
https://api.coingecko.com/api/v3/coins/bitcoin/market_chart?vs_currency=usd&days=max&interval=daily
I would like to replace "bitcoin" with a dynamic variable that refers to a list that I already have. (bitcoin, ethereum, fantom, avalanche-2)
To do that, I use the following code, a for loop over the list combined with a counter:
counter = 0
CoinDatabase = []
for i in CoinIDList:
    if counter > 2:
        break
    else:
        r = requests.get(f"https://api.coingecko.com/api/v3/coins/{i}/market_chart?vs_currency=usd&days=max&interval=daily")
        data = r.json()
        prices = data["prices"]
        market_caps = data["market_caps"]
        total_volumes = data["total_volumes"]
        for i in range(len(CoinDatabase)):
            prices = data["prices"]
        CoinDatabase.append(prices)
        counter = counter + 1
What is the smartest way to match the ID that is used in the "i" with the JSON response that I receive in every loop? Otherwise I get tons of JSONs without having the reference to what coin it belongs to.
My long-term objective is to set up a small database with different values and maximum historical data for prices, market_caps, total_volumes, timestamp - just as a Python learning exercise.
Thanks in advance!
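One way to keep each response tied to its coin, sketched here under a few assumptions, is to store the results in a dictionary keyed by the coin ID instead of appending them to a list. The names `fetch_market_chart` and `build_coin_database` are hypothetical helpers, not part of any library, and the `limit` parameter stands in for the counter used in the question.

```python
URL_TEMPLATE = (
    "https://api.coingecko.com/api/v3/coins/{coin_id}/market_chart"
    "?vs_currency=usd&days=max&interval=daily"
)


def fetch_market_chart(coin_id):
    """Download the market chart JSON for one coin (hypothetical helper)."""
    import requests  # imported lazily so the rest of the sketch works offline

    response = requests.get(URL_TEMPLATE.format(coin_id=coin_id))
    response.raise_for_status()
    return response.json()


def build_coin_database(coin_ids, fetch=fetch_market_chart, limit=3):
    """Return a dict that maps each coin ID to the series from its response."""
    database = {}
    # Only the first `limit` IDs are requested, like the counter in the question.
    for coin_id in list(coin_ids)[:limit]:
        data = fetch(coin_id)
        database[coin_id] = {
            "prices": data["prices"],
            "market_caps": data["market_caps"],
            "total_volumes": data["total_volumes"],
        }
    return database
```

With this, `database = build_coin_database(CoinIDList)` gives a structure where `database["bitcoin"]["prices"]` is the price series for bitcoin, so the coin a response belongs to is never lost. If the API returns each series as [timestamp, value] pairs, the timestamps are already part of every entry.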

Related

Python, Tweepy --- struggling with getting tweets filtered on certain criteria

I am struggling with the following, any help would be highly appreciated. The path I chose to solve the problem might be clunky, even outdated, but it is the best I could do. So, I am trying to get recent tweets BASED on a query and ONLY from the people I follow on Twitter. So I ran two different queries:
1)
followers = client.get_users_following(id = '', max_results = 100)
and 2)
tweets = client.search_recent_tweets(query=query, tweet_fields=['author_id', 'created_at'], max_results=100)
I managed to get the responses into json objects, then normalise and at the end I get two dataframes:
A) a dataframe df['id'], where the 'id' is the unique username of the Twitter user, the result of the first query ("get_users_following"); here I converted the 'id' type from "object" to "int"
B) a dataframe with the following columns ['author_id'], ['text'], ['created_at'], ['id'] ---where 'author_id' is the unique username of the Twitter user, the same as the 'id' from the previous dataframe
All good until the point where I am trying to iterate through the 'author_id' to see if it matches my list of the 'id's of the people I follow and whenever it does, I would like to add the text of that particular tweet to a list and start analysing the data.
The code I am struggling with is below. It does not raise an error, but the result is an empty list.
all = []
x = len(df['id'])
for number in twe['author_id']:
    for j in range(x):
        if number == df['id'][j]:
            all.append(twe['text'][number])
        else:
            j += 1
print(all)
print(len(all))
I checked and there were people that I follow that were tweeting on a particular topic or another.
Any thoughts would be highly appreciated.
LATER EDIT:
In the meantime I worked a bit more on the for loop, but I still get the same empty list as a result.
al = []
print(f1.shape)
x = len(f1['id'])
print(x)
y = len(twe['text'])
print(y)
i = 0
j = 0
for (i, j) in [(i, j) for i in range(x) for j in range(y)]:
    if f1['id'][i] == twe['author_id'][j]:
        al.append(['text'][j])
    else:
        if j < y:
            j += 1
        else:
            i += 1
print(len(al))
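A possible simplification, offered as a sketch rather than a diagnosis: collect the followed IDs into a set and keep every tweet whose author is in it. One thing worth checking is the types, since the question converts 'id' to int while 'author_id' may still hold strings, in which case `==` never matches. The sketch compares both sides as strings. `texts_from_followed` is a hypothetical helper name.

```python
def texts_from_followed(followed_ids, author_ids, texts):
    """Return the texts whose author is in followed_ids.

    Works with plain lists or pandas Series. IDs are compared as strings
    so that 123 and "123" count as the same user.
    """
    followed = {str(user_id) for user_id in followed_ids}
    return [
        text
        for author_id, text in zip(author_ids, texts)
        if str(author_id) in followed
    ]
```

With the dataframes from the question this would be called as `al = texts_from_followed(f1['id'], twe['author_id'], twe['text'])`. Pairing each author with its text through `zip` also avoids indexing `twe['text']` by an author ID, which is what the first loop does.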

For loop for reduce duplication?

I've created the following code, that pulls Cryptocurrency prices from the CoinGecko api and parses the bits I need in JSON
btc = requests.get("https://api.coingecko.com/api/v3/coins/bitcoin")
btc.raise_for_status()
jsonResponse = btc.json() # print(response.json()) for debug
btc_marketcap=(jsonResponse["market_data"]["market_cap"]["usd"])
This works fine, except I then need to duplicate the above 4 lines for every currency which is getting long/messy & repetitive.
After researching I felt an approach was to store the coins in an array, and loop through the array replacing bitcoin in the above example with each item from the array.
symbols = ["bitcoin", "ethereum", "sushi", "uniswap"]
for x in symbols:
    print(x)
This works as expected, but I'm having issues substituting bitcoin/btc for x successfully.
Any pointers appreciated, and whether this is the best approach for what I am trying to achieve
Something like this could work. Basically, just put the repeated part inside a function and call it with the changing arguments (currency). The substitution of the currency can be done for example with f-strings:
import requests

def get_data(currency):
    # Fetch the data for one coin and return its market cap in USD
    response = requests.get(f"https://api.coingecko.com/api/v3/coins/{currency}")
    response.raise_for_status()
    return response.json()["market_data"]["market_cap"]["usd"]

for currency in ["bitcoin", "ethereum", "sushi", "uniswap"]:
    print(get_data(currency))

How to increment a variable within a for loop in python

I'm building a Google Sheet to keep track of stock prices for the stocks I own. I have an API running that's connected to Google Sheets and my own Python application.
My Google Sheet looks like this:
Stock | Previous close
AAPL | 316.73
NVDA | 348.71
SPOT | 191.00
I currently have the code running as follows.
import requests
import gspread
from oauth2client.service_account import ServiceAccountCredentials
sheet = client.open("Stock").sheet1
AAPL = sheet.cell(2,1).value
url = ('https://ca.finance.yahoo.com/quote/'+AAPL+'?p='+AAPL+'&.tsrc=fin-srch')
response = requests.get(url)
htmltext = response.text
splitlist = htmltext.split("Previous Close")
afterfirstsplit =splitlist[1].split("\">")[2]
aftersecondsplit = afterfirstsplit.split("</span>")
datavalue = aftersecondsplit[0]
sheet.update_cell(2,2,datavalue)
# this would update the value within my google sheet to the previous close price
For each individual stock, I would copy and paste this code and change the stock symbol to find the value of the next quote.
I know there's a way to use for statements to automate this process. I tried that with the following, but it wouldn't update as needed. I've hit a wall at this point and would appreciate any help or insight on how I could automate this.
tickers = {sheet.cell(2,1).value: [],
           sheet.cell(3,1).value: [],
           sheet.cell(4,1).value: [],
           sheet.cell(5,1).value: []}
for symbols in tickers:
    url = ('https://ca.finance.yahoo.com/quote/'+symbols+'?p='+symbols+'&.tsrc=fin-srch')
    response = requests.get(url)
    htmltext = response.text
    splitlist = htmltext.split("Previous Close")
    afterfirstsplit = splitlist[1].split("\">")[2]
    aftersecondsplit = afterfirstsplit.split("</span>")
    datavalue = aftersecondsplit[0]
    sheet.update_cell(2,1,datavalue)
    print(datavalue)
Doing this gathers the current prices for all the stocks and does write them into the sheet, but only to one coordinate. I don't know how to increase the '1' within sheet.update_cell(2,1,datavalue) each time within the for statement. I believe that is the way to solve this, but if anyone has any other suggestions, I'm all ears.
To answer this part of your question:
"I don't know how to increase the '1' within sheet.update_cell(2,1,datavalue) each time within the for statement."
This is how you would typically increment a counter inside a for loop:
counter = 1
for symbol in tickers:
    # Your code
    sheet.update_cell(2, counter, datavalue)
    counter = counter + 1
While counter variables are a very common pattern in most programming languages (see Akib Rhast's answer), the more Pythonic way to do it is with the built-in enumerate function:
for column, symbol in enumerate(tickers, start=1):
    # do stuff
    sheet.update_cell(2, column, datavalue)
What is enumerate?
As the documentation states, enumerate takes something that you can iterate over (like a list) and returns an iterator that yields tuples, with the counter as the first element and the item from the iterable as the second:
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
list(enumerate(seasons, start=1))
# outputs [(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]
It also has the advantage of doing so in a memory-efficient manner and is directly tied to your loop.
Why is there a comma in my for loop?
This is just syntactic sugar in Python that allows you to unpack a tuple or list:
alist = [1, 2, 3]
first, second, third = alist
print(third) # outputs 3
print(second) # outputs 2
print(first) # outputs 1
As enumerate yields tuples, you are basically assigning each element of that tuple to a different variable at the same time.
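Applied to the sheet in the question, and assuming the tickers sit in column 1 from row 2 downward with the previous close belonging in column 2 of the same row, the value that has to change on each pass is the row. The sketch below uses enumerate with `start=2` for that. `fetch_previous_close` is a hypothetical stand-in for the scraping code in the question, and `sheet` is assumed to be a gspread worksheet, whose `update_cell(row, col, value)` method is the one the question's first snippet already calls.

```python
def update_previous_closes(sheet, tickers, fetch_previous_close, first_row=2):
    """Write each ticker's previous close into column 2 of its own row."""
    for row, symbol in enumerate(tickers, start=first_row):
        value = fetch_previous_close(symbol)  # hypothetical lookup function
        sheet.update_cell(row, 2, value)
```

Passing the lookup in as a function keeps the loop independent of how the price is obtained, so the scraping part can be swapped out without touching the row bookkeeping.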

How to iterate through Indeed reviews and find the corresponding job offer, printing the employee review?

I have already built a dynamic search that, for each company, generates a link used to look up the job reviews written by its previous employees. Having stored the job offers, the reviewed jobs, and the review descriptions in lists, I now need to code the part that iterates through them and prints the review that corresponds to each offer.
It all seems easy until you notice that the job offers list has a different size than the job reviews list, so I'm at a standstill.
I'm trying the following code, which obviously gives me an error since cargo_revisto_list is longer than nome_emprego_list. This tends to happen whenever there are more reviews than job offers, and the opposite can happen as well.
The lists would be, for example, the following:
cargo_revisto_list = ["Business Leader","Sales Manager"]
nome_emprego_list = ["Business Leader","Sales Manager","Front-end Developer"]
opiniao_list = ["Excellent Job","Wonderful managing"]
It would be a matter of luck for them to be exactly the same size.
url = "https://www.indeed.pt/cmp/Novabase/reviews?fcountry=PT&floc=Lisboa"
comprimento_cargo_revisto = len(cargo_revisto_list) #19
comprimento_nome_emprego = len(nome_emprego_list) #10
descricoes_para_cargos_existentes = []
if comprimento_cargo_revisto > comprimento_nome_emprego:
    for i in range(len(cargo_revisto_list)):
        s = cargo_revisto_list[i]
        for z in range(len(nome_emprego_list)):
            a = nome_emprego_list[z]
            if(s == a): #A Stopping here needs new way of comparing strings
                c=opiniao_list[i]
                descricoes_para_cargos_existentes.append(c)
elif comprimento_nome_emprego > comprimento_cargo_revisto:
    for i in range(len(comprimento_nome_emprego)):
        s = nome_emprego_list[i]
        for z in range(len(cargo_revisto_list)):
            a = cargo_revisto_list[z]
            if(s == a) and a!=None:
                c = opiniao_list[z]
                descricoes_para_cargos_existentes.append(c)
else:
    for i in range(len(cargo_revisto_list)):
        s = cargo_revisto_list[i]
        for z in range(len(nome_emprego_list)):
            a = nome_emprego_list[z]
            if(s == a):
                c = (opiniao_list[i])
                descricoes_para_cargos_existentes.append(c)
After solving this issue, I would need to get the exact review description for the reviewed job that corresponds to the job offer. To do this, I would take the index in cargo_revisto_list and use it to print the matching entry of opiniao_list (job description), since both lists were filled at the same time and in the same order by Beautiful Soup while scraping.
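The length comparison may not be needed at all. A minimal sketch, assuming cargo_revisto_list and opiniao_list are aligned (entry i of one belongs to entry i of the other): pair them with `zip` and keep the reviews whose job title also appears among the offers. `reviews_for_offers` is a hypothetical name. Separately, `len(comprimento_nome_emprego)` in the second branch of the code above is called on an integer, which raises a `TypeError` on its own.

```python
def reviews_for_offers(offer_titles, reviewed_titles, opinions):
    """Map each offered job title to the opinions written about that title."""
    offered = set(offer_titles)
    matches = {}
    # zip pairs each reviewed title with its opinion, whatever the list sizes.
    for title, opinion in zip(reviewed_titles, opinions):
        if title in offered:
            matches.setdefault(title, []).append(opinion)
    return matches


cargo_revisto_list = ["Business Leader", "Sales Manager"]
nome_emprego_list = ["Business Leader", "Sales Manager", "Front-end Developer"]
opiniao_list = ["Excellent Job", "Wonderful managing"]

descricoes = reviews_for_offers(nome_emprego_list, cargo_revisto_list, opiniao_list)
print(descricoes)
# {'Business Leader': ['Excellent Job'], 'Sales Manager': ['Wonderful managing']}
```

Because the offers are only used for membership checks, it no longer matters which list is longer. Offers without any review, such as "Front-end Developer" here, simply do not appear in the result.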

How to choose a random but non-recent asset from a list?

I have the issue of trying to pull a "random" item out of a database in my Flask app. This function only needs to return a video that wasn't recently watched by the user. I am not worried about multiple users right now. My current way of doing this does not work. This is what I am using:
@app.route('/_new_video')
def new_video():
Here's the important part I'm asking about:
    current_id = request.args.get('current_id')
    video_id = random.choice(models.Video.query.all()) # return list of all video ids
    while True:
        if video_id != current_id:
            new_video = models.Video.query.get(video_id)
and then I return it:
    webm = new_video.get_webm() #returns filepath in a string
    mp4 = new_video.get_mp4() #returns filepath in a string
    return jsonify(webm=webm,mp4=mp4,video_id=video_id)
The random range starts at 1 because the first asset was deleted from the database, so the number 0 isn't associated with an id. Ideally, the user would not get a video they had recently watched.
I recommend using a collections.deque to store a recently watched list. It holds a list-like collection of items and, once it reaches its max length, automatically drops the oldest items as you add new ones, on a first-in, first-out basis.
import collections
import random
And here's a generator that you can use to get random vids, that haven't been recently watched. The denom argument will allow you to change the length of the recently watched list because it's used to determine the max length of your recently_watched as a fraction of your list of vids.
def gen_random_vid(vids, denom=2):
    '''return a random vid id that hasn't been recently watched'''
    recently_watched = collections.deque(maxlen=len(vids)//denom)
    while True:
        selection = random.choice(vids)
        if selection not in recently_watched:
            yield selection
            recently_watched.append(selection)
I'll create a quick list to demo it:
vids = ['vid_' + c for c in 'abcdefghijkl']
And here's the usage:
>>> recently_watched_generator = gen_random_vid(vids)
>>> next(recently_watched_generator)
'vid_e'
>>> next(recently_watched_generator)
'vid_f'
>>> for _ in range(10):
...     print(next(recently_watched_generator))
...
vid_g
vid_d
vid_c
vid_f
vid_e
vid_g
vid_a
vid_f
vid_e
vid_c
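An alternative sketch, not the method above: filter out the recently watched IDs first and then choose, which avoids retrying when the random pick happens to be a recent one. It assumes the caller keeps one deque of recent IDs per user, and `pick_unwatched` is a hypothetical name.

```python
import random
from collections import deque


def pick_unwatched(video_ids, recently_watched):
    """Pick a random ID that is not in recently_watched, then record it."""
    candidates = [vid for vid in video_ids if vid not in recently_watched]
    if not candidates:
        # Everything was watched recently, so fall back to the full list.
        candidates = list(video_ids)
    choice = random.choice(candidates)
    recently_watched.append(choice)  # a deque with maxlen drops the oldest entry
    return choice


recent = deque(maxlen=2)
video_ids = [1, 2, 3, 4, 5]
picks = [pick_unwatched(video_ids, recent) for _ in range(10)]
```

Each pick differs from the two before it here, because the deque holds the last two IDs and those are removed from the candidates before choosing.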
