I'm working on a site and I'm trying to find an API that returns the total value in $ of every skin in CS:GO.
What I want to achieve is something like this: https://pbs.twimg.com/media/E2-bYmJXEAQmO5u.jpg
How can I do that?
Thank you @NewbieCody for linking me to the answer.
Example:
import json
import requests

data = requests.get("https://steamcommunity.com/market/search/render/?search_descriptions=0&sort_column=name&sort_dir=desc&appid=730&norender=1&count=100&start=0")
json_data = json.loads(data.text)
print(json_data)
Every page returns 100 items, so I iterated (total results)/100 times, adding 100 to the start parameter each time, and extracted the prices to make the graph.
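For reference, here is a sketch of that loop. The total_count and results field names (and sell_price being in cents) are what the norender=1 endpoint returned for me, so treat them as assumptions, and note the endpoint is heavily rate-limited:

import time
import requests

url = "https://steamcommunity.com/market/search/render/"
params = {"search_descriptions": 0, "sort_column": "name", "sort_dir": "desc",
          "appid": 730, "norender": 1, "count": 100, "start": 0}
prices = []
total = None
while total is None or params["start"] < total:
    page = requests.get(url, params=params).json()
    total = page["total_count"]
    # sell_price appears to be in cents, hence the division
    prices.extend(item["sell_price"] / 100 for item in page["results"])
    params["start"] += 100
    time.sleep(1)  # be gentle; Steam throttles this endpoint aggressively
print("Total market value: $%.2f" % sum(prices))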
I am new to Python and trying to scrape IMDB. I am scraping the list of the top 250 IMDB movies and want to get information from each movie's own page, for example the length of each movie.
I already have a list of the unique URLs. So I want to loop over this list and, for every URL in it, retrieve the 'length' of that movie. Is it possible to do this in one piece of code?
for URL in urlofmovie:
    htmlsource = requests.get(URL)
    tree_url = html.fromstring(htmlsource)
    lengthofmovie = tree_url.xpath('//*[#class="subtext"]')
I expect that lengthofmovie will become a list of all the lengths of the movies. However, it already goes wrong at line 2: the htmlsource.
To make it a list you should first create a list and then append each length to that list.
import requests
from lxml import html

length_list = []
for URL in urlofmovie:
    htmlsource = requests.get(URL)
    # html.fromstring expects markup, not a requests.Response object
    tree_url = html.fromstring(htmlsource.content)
    # XPath attribute tests use @class, not #class
    length_list.append(tree_url.xpath('//*[@class="subtext"]'))
Small tip: since you are new to Python, I suggest going over the PEP 8 conventions. Good variable naming can make your (and other developers') lives easier, e.g. urlofmovie -> urls_of_movies.
However, it already goes wrong at line 2: the htmlsource.
Please provide the exception you are receiving.
My task is to get the number of open issues using the GitHub API. Unfortunately, whichever repository I parse, I get the same number: 30.
import requests

r = requests.get('https://api.github.com/repos/grpc/grpc/issues')
count = 0
for item in r.json():
    if item['state'] == 'open':
        count += 1
print(count)
Is there any way to get the real number of issues?
See the documentation about the Link response header; you can also pass the state parameter or other filters.
https://developer.github.com/v3/guides/traversing-with-pagination/
https://developer.github.com/v3/issues/
You'll have to page through.
http://.../issues?page=1&state=open
http://.../issues?page=2&state=open
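For example, a rough version of that loop (note that unauthenticated requests are rate-limited, and GitHub's issues listing also includes pull requests):

import requests

count = 0
page = 1
while True:
    r = requests.get('https://api.github.com/repos/grpc/grpc/issues',
                     params={'state': 'open', 'per_page': 100, 'page': page})
    items = r.json()
    if not items:  # an empty page means we have gone past the last one
        break
    count += len(items)
    page += 1
print(count)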
The /issues/ endpoint is paginated, which means you have to iterate through several pages to get all the issues:
https://api.github.com/repos/grpc/grpc/issues?page=1
https://api.github.com/repos/grpc/grpc/issues?page=2
...
But there is a better way to get what you want: the GET /repos/:owner/:repo endpoint directly gives the number of open issues on a repository.
For instance, on https://api.github.com/repos/grpc/grpc, you can see:
"open_issues_count": 1052,
Have a look at the documentation for this endpoint.
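A minimal sketch of that approach:

import requests

# The repository endpoint reports the count directly; no pagination needed
r = requests.get('https://api.github.com/repos/grpc/grpc')
print(r.json()['open_issues_count'])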
Like many others, I have been looking for an alternative source of stock prices now that the Yahoo and Google APIs are defunct. I decided to try web scraping the Yahoo site, from which historical prices are still available. I managed to put together the following code, which almost does what I need:
import urllib.request as web
import bs4 as bs
import pandas as pd

def yahooPrice(tkr):
    tkr = tkr.upper()
    url = 'https://finance.yahoo.com/quote/' + tkr + '/history?p=' + tkr
    sauce = web.urlopen(url)
    soup = bs.BeautifulSoup(sauce, 'lxml')
    table = soup.find('table')
    table_rows = table.find_all('tr')
    allrows = []
    for tr in table_rows:
        td = tr.find_all('td')
        row = [i.text for i in td]
        if len(row) == 7:
            allrows.append(row)
    vixdf = pd.DataFrame(allrows).iloc[0:-1]
    vixdf.columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Aclose', 'Volume']
    vixdf.set_index('Date', inplace=True)
    return vixdf
which produces a dataframe with the information I want. Unfortunately, even though the actual web page shows a full year's worth of prices, my routine only returns 100 records (including dividend records). Any idea how I can get more?
The Yahoo Finance API was deprecated in May '17, I believe. There aren't too many options for downloading time series data for free now, at least that I know of. Nevertheless, there is always some kind of alternative. Check out the URL below to find a tool for downloading historical prices.
http://investexcel.net/multiple-stock-quote-downloader-for-excel/
See this too.
https://blog.quandl.com/api-for-stock-data
I don't have the exact solution to your question, but I have a workaround (I had the same problem and hence used this approach). Basically, you can use the BDay() offset from pandas.tseries.offsets and ask for x business days of data at a time. In my case, I ran the loop three times to get 300 business days of data, knowing that 100 was the maximum I was getting by default.
Basically, you run the loop three times and set the BDay() offset so that the first iteration grabs the 100 days up to now, the next iteration the 100 days before that (200 days from now), and finally the last 100 days (300 days from now). The whole point of doing this is that at any given point you can only scrape 100 days of data, so even if you loop through 300 days in one go, you may not get 300 days of data - your original problem (possibly Yahoo limits the amount of data extracted in one go). I have my code here: https://github.com/ee07kkr/stock_forex_analysis/tree/dataGathering
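To illustrate the windowing idea, here is a minimal sketch (the scraping itself is elided; it just shows how BDay() can mark out three consecutive 100-business-day windows):

from datetime import datetime
from pandas.tseries.offsets import BDay

end = datetime.now()
for chunk in range(3):
    start = end - BDay(100)
    # each (start, end) pair would bound one scraping request
    print('window %d: %s to %s' % (chunk + 1, start.date(), end.date()))
    end = start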
Note, the csv files for some reason are not working with the \t delimiter in my case... but basically you can use the data frame. One more issue I currently have is that 'Volume' is a string instead of a float... the way around it is:
import pandas as pd

# DataFrame.from_csv is deprecated; read_csv is the modern equivalent
apple = pd.read_csv('AAPL.csv', sep='\t', index_col=0)
apple['Volume'] = apple['Volume'].str.replace(',', '').astype(float)
First - run the code below to get your 100 days.
Then - use SQL to insert the data into a small database (sqlite3 is pretty easy to use with Python; see the sketch after the code).
Finally - amend the code below to fetch daily prices, which you can add to grow your database.
from pandas import DataFrame
import bs4
import requests

def function():
    url = 'https://uk.finance.yahoo.com/quote/VOD.L/history?p=VOD.L'
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    headers = soup.find_all('th')
    rows = soup.find_all('tr')
    ts = [[td.getText() for td in rows[i].find_all('td')] for i in range(len(rows))]
    # Keep only the date cell of each 7-column data row
    data = [row[:-6] for row in ts if len(row) == 7]
    # Walk backwards through the most recent 100 rows, printing each date
    num = min(100, len(data)) - 1
    while num >= 0:
        now = DataFrame(data[num])
        now = now[0]
        now = str(now[0])
        print(now)
        num = num - 1

function()
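And a minimal sketch of the SQLite step mentioned above (the table name, columns, and the two placeholder rows are made up for illustration):

import sqlite3

conn = sqlite3.connect('prices.db')
conn.execute('CREATE TABLE IF NOT EXISTS prices (date TEXT PRIMARY KEY, close REAL)')
# INSERT OR IGNORE lets the daily job re-run without duplicating rows
conn.executemany('INSERT OR IGNORE INTO prices VALUES (?, ?)',
                 [('2017-12-01', 226.4), ('2017-12-04', 229.75)])
conn.commit()
conn.close()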
I am trying to write my first API query with Python. I am calling an extremely simple dataset (http://api.open-notify.org/astros.json), which displays information about the number of people in space.
I can return the number, but I want to try and display the names. So far I have:
import requests
response = requests.get("http://api.open-notify.org/astros.json")
data = response.json()
print(data["number"])
Any help would be greatly appreciated.
If you want to get names or crafts just do this:
print("Fist name: ",data["people"][0]["name"])
print("Fist craft: ",data["people"][0]["craft"])
You can also put it in a for loop like this:
for i in range(len(data["people"])):
    print(data["people"][i]["name"])
You should iterate over data['people'] and then get each person's name:
for people in data['people']:
    print(people['name'])
Recently I have been reading some stock price data from the Quandl database, using API calls to extract the data. But I am really confused by the example I have:
import requests
api_url = 'https://www.quandl.com/api/v1/datasets/WIKI/%s.json' % stock
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=3))
raw_data = session.get(api_url)
Can anyone explain that to me?
1) For api_url, if I copy that web page address into a browser, it says 404 not found. So if I want to use another database, how do I prepare this api_url? What does '% stock' mean?
2) Here requests seems to be used to extract the data. What is the format of raw_data? How do I know the column names? How do I extract the columns?
To expand on my comment above:
% stock is a string formatting operation, replacing the %s in the preceding string with the value referenced by stock. Further details can be found here.
raw_data actually references a Response object (part of the requests module - details found here).
To expand on your code:
import requests
#Set the stock we are interested in, AAPL is Apple stock code
stock = 'AAPL'
#Your code
api_url = 'https://www.quandl.com/api/v1/datasets/WIKI/%s.json' % stock
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=3))
raw_data = session.get(api_url)
# Probably want to check that requests.Response is 200 - OK here
# to make sure we got the content successfully.
# requests.Response has a function to return json file as python dict
aapl_stock = raw_data.json()
# We can then look at the keys to see what we have access to
aapl_stock.keys()
# column_names seems to describe the individual data points
aapl_stock['column_names']
# A big list of data; let's just look at the first ten points...
aapl_stock['data'][0:10]
Edit to answer the question in the comments:
So aapl_stock['column_names'] shows Date and Open as the first and second values respectively. This means they correspond to positions 0 and 1 in each element of the data.
Therefore, to get the date for each of the first ten items, use [row[0] for row in aapl_stock['data'][:10]], and to get the open value for the first 78 items, use [row[1] for row in aapl_stock['data'][:78]].
To get a list covering every item in the dataset, where each element is a list with the values for Date and Open, you could use something like aapl_date_open = [row[0:2] for row in aapl_stock['data']].
If you are new to Python, I seriously recommend looking at list slice notation; a quick intro can be found here.
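For instance, a few slices to get you started:

data = list(range(10))
print(data[2:5])    # [2, 3, 4]
print(data[:3])     # [0, 1, 2]
print(data[::2])    # [0, 2, 4, 6, 8]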