Looping through values in an API and saving to a txt file - python

I am using the Pokemon API: https://pokeapi.co/api/v2/pokemon/
I am trying to make a list that stores 6 Pokemon IDs, then use a for loop to call the API and retrieve the data for each Pokemon. Finally, I want to save this info in a txt file. This is what I have so far:
import random
import requests
from pprint import pprint
pokemon_number = []
for i in range (0,6):
    pokemon_number.append(random.randint(1,10))
url = 'https://pokeapi.co/api/v2/pokemon/{}/'.format(pokemon_number)
response = requests.get(url)
pokemon = response.json()
pprint(pokemon)
with open('pokemon.txt', 'w') as pok:
    pok.write(pokemon_number)
I don't understand how to get the API to read the IDs from the list.
I hope this is clear; I am in a right pickle.
Thanks

You are passing pokemon_number, which is a list, into the URL, so the whole list gets formatted into the string (e.g. .../pokemon/[3, 7, 1, 9, 2, 5]/). You need to iterate over the list instead and make one request per ID.
Also, to actually save each Pokemon, you can use either its name or its ID as the filename. The json module makes it easy to save objects to JSON files.
import random
import requests
import json
# renamed this one to indicate it's not a single number
pokemon_numbers = []
for i in range(0, 6):
    pokemon_numbers.append(random.randint(1, 10))
# looping over the generated IDs; pokemon_id avoids shadowing the built-in id()
for pokemon_id in pokemon_numbers:
    url = f"https://pokeapi.co/api/v2/pokemon/{pokemon_id}/"
    resp = requests.get(url)
    # stop with a clear error if the API did not answer with a success status
    resp.raise_for_status()
    pokemon = resp.json()
    print(pokemon['name'])
    # one JSON file per Pokemon, named after it
    with open(f"{pokemon['name']}.json", "w") as outfile:
        json.dump(pokemon, outfile, indent=4)
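The question asked for a txt file rather than one JSON file per Pokemon. Below is a minimal sketch of that variant. It is not the answerer's code. It uses only the standard library: urllib.request stands in for requests.get(url).json(). It picks distinct IDs with random.sample and assumes each API response carries name, height, weight, and an abilities list whose items hold a nested ability name. The helper names pick_ids, fetch_pokemon, summarize, and save_summaries are hypothetical.

```python
import json
import random
import urllib.request

API_URL = "https://pokeapi.co/api/v2/pokemon/{}/"


def pick_ids(count=6, low=1, high=50):
    """Return `count` distinct random IDs in [low, high]."""
    # random.sample never repeats a value, unlike repeated randint() calls
    return random.sample(range(low, high + 1), count)


def fetch_pokemon(pokemon_id):
    """Download one Pokemon's data and decode the JSON body into a dict."""
    with urllib.request.urlopen(API_URL.format(pokemon_id)) as resp:
        return json.load(resp)


def summarize(pokemon):
    """Build one line of text from the fields of interest."""
    # assumption: 'abilities' is a list of objects with a nested 'ability' name
    abilities = ", ".join(a["ability"]["name"] for a in pokemon["abilities"])
    return "{}: height={}, weight={}, abilities={}".format(
        pokemon["name"], pokemon["height"], pokemon["weight"], abilities)


def save_summaries(pokemons, path="pokemon.txt"):
    """Open the file once and write one line per Pokemon."""
    with open(path, "w", encoding="utf-8") as out:
        for pokemon in pokemons:
            out.write(summarize(pokemon) + "\n")
```

Usage: save_summaries(fetch_pokemon(i) for i in pick_ids()) writes six lines to pokemon.txt. Because the file is opened once, outside the loop, no Pokemon overwrites the previous one, which is what happens when 'w' mode is used inside a loop.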

I now have this:
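import random
import requests
pokemon_number = []
for i in range(0, 6):
    pokemon_number.append(random.randint(1, 50))
x = 0
while x < len(pokemon_number):
    print(pokemon_number[x])
    # format the current ID, not the whole list
    url = 'https://pokeapi.co/api/v2/pokemon/{}/'.format(pokemon_number[x])
    response = requests.get(url)
    pokemon = response.json()
    print(pokemon['name'])
    print(pokemon['height'])
    print(pokemon['weight'])
    # 'a' (append) keeps earlier Pokemon; 'w' would overwrite the file each time
    with open('pokemon.txt', 'a') as p:
        p.write(pokemon['name'] + '\n')
        # the API returns an 'abilities' list, not an 'ability' string
        p.write(', '.join(a['ability']['name'] for a in pokemon['abilities']) + '\n')
    x = x + 1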

Related

Storing API data in JSON format

I have an API that returns all location IDs along with their basic info (address, latitude, longitude). To fetch extra attributes such as location name, location area, and location access, I have to pass the location IDs to the API one at a time as a parameter.
I have written the code below, but the problem is that the data only shows up in the console, and I don't know how to capture it as JSON and save it to a text file.
import sys
import urllib.request
from urllib.error import HTTPError
ids = location_id_df["id"]  # stored location id in dataframe
authorization = "#######################"
print("started")
def test_api(url, authorization, rawfile, ids):
    for i in range(0, 1000, 50):
        for j in ids:
            #print(j)
            try:
                request = urllib.request.Request('https:….. /locations/{}'.format(j) + "?offset=" + str(i),
                                                 headers={'authorization': authorization})
                response = urllib.request.urlopen(request).read()
                print(response)
            except HTTPError as e:
                print(e)
                sys.exit(0)
        with open(rawfile + "_offset_" + str(i) + ".json", "wb") as json_download:
            json_download.write(response)
test_api(url, authorization, rawfile, ids)
I need to fetch the responses into JSON files like:
5182021_offset_0.json #contains some location IDs with extra attribute data
5182021_offset_50.json #contains some location IDs with extra attribute data
5182021_offset_100.json #contains some location IDs with extra attribute data
...
Here is a simplified version of your example. It queries an API that returns JSON, collects the responses for each batch in a list, and saves each batch to its own file.
import urllib.request
import json
for i in range(2):
    responses = []
    for j in range(3):
        request = urllib.request.Request("https://www.boredapi.com/api/activity/")
        response = urllib.request.urlopen(request)
        if response.status == 200:
            try:
                response_bytes = response.read()
            finally:
                response.close()
            response_string = response_bytes.decode("utf8")
            response_data = json.loads(response_string)
            responses.append(response_data)
    file_name = "data-{}.json".format(i)
    with open(file_name, "w") as f:
        json.dump(responses, f)
I would suggest using the Requests library, as it tends to have a simpler API than urllib and is widely used by the Python community. Here is the same example with Requests:
import requests
import json
for i in range(2):
    responses = []
    for j in range(3):
        response = requests.get("https://www.boredapi.com/api/activity/")
        if response.status_code == 200:
            response_data = response.json()
            responses.append(response_data)
    file_name = "data-{}.json".format(i)
    with open(file_name, "w") as f:
        json.dump(responses, f)
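The question asks for one <rawfile>_offset_<n>.json file per offset, and the loop can be arranged to produce exactly that. This is a sketch with assumptions. BASE_URL is a placeholder because the real endpoint is elided in the question. The authorization header and the offset query parameter are carried over from the question's code. fetch is passed in as a parameter so the file-writing logic can be tried without network access. fetch_location and save_batches are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: the real base URL is elided in the question.
BASE_URL = "https://api.example.com/locations/{}"


def fetch_location(location_id, offset, authorization):
    """Request one location's extra attributes and decode the JSON reply."""
    query = urllib.parse.urlencode({"offset": offset})
    request = urllib.request.Request(
        BASE_URL.format(location_id) + "?" + query,
        headers={"authorization": authorization},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def save_batches(ids, offsets, fetch, rawfile):
    """Write one JSON file per offset holding the responses for every ID.

    `fetch` is any callable taking (location_id, offset) and returning
    decoded JSON, which keeps this function testable offline.
    """
    written = []
    for offset in offsets:
        # a fresh list per offset, so batches never leak into each other
        batch = [fetch(location_id, offset) for location_id in ids]
        file_name = "{}_offset_{}.json".format(rawfile, offset)
        with open(file_name, "w", encoding="utf-8") as f:
            json.dump(batch, f)
        written.append(file_name)
    return written
```

For the question's data, the call would look roughly like save_batches(location_id_df["id"], range(0, 1000, 50), lambda j, i: fetch_location(j, i, authorization), "5182021").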

Saving a "for loop" iteration

When I run the code below, the first iteration of the for loop saves the first text correctly to its own file, but the second iteration saves the first AND the second texts to another file, the third saves the first, second, and third, and so on. I'd like each iteration saved to a separate file without the text from previous iterations. I don't know what I'm missing here. Can anyone help, please?
import requests
from bs4 import BeautifulSoup
import pandas as pd
base_url = 'http://www.chakoteya.net/StarTrek/'
end_url = ['1.htm', '6.htm', '8.htm', '2.htm', '7.htm',
           '5.htm', '4.htm', '10.htm', '12.htm', '11.htm', '3.htm', '16.htm']
episodes = []
count = 0
for end_url in end_url:
    url = base_url + end_url
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    episodes.append(soup.text)
    file_text = open(f"./{count}.txt", "w")
    file_text.writelines()
    file_text.close()
    count = count + 1
    print(f"saved file for url:{url}")
Please consider the following points:
- There's no need for bs4 if you just want to save the page, since r.text already holds its full HTML.
- Reuse the same requests.Session for all requests, as explained in my previous answer.
- You can build the URLs by iterating over page numbers with an f-string or str.format, which makes your code cleaner and easier to read.
- A with context manager saves headaches, since you don't need to remember to close the file afterwards.
import requests
block = [9, 13, 14, 15]
def main(url):
    with requests.Session() as req:
        for page in range(1, 17):
            if page not in block:
                print(f'Extracting Page# {page}')
                r = req.get(url.format(page))
                with open(f'{page}.htm', 'w', encoding='utf-8') as f:
                    f.write(r.text)
main('http://www.chakoteya.net/StarTrek/{}.htm')
You need to reset episodes to an empty list at the start of each iteration. Try the following:
import requests
from bs4 import BeautifulSoup
import pandas as pd
base_url = 'http://www.chakoteya.net/StarTrek/'
end_url = ['1.htm', '6.htm', '8.htm', '2.htm', '7.htm',
           '5.htm', '4.htm', '10.htm', '12.htm', '11.htm', '3.htm', '16.htm']
count = 0
for end_url in end_url:
    episodes = []
    url = base_url + end_url
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    episodes.append(soup.text)
    file_text = open(f"./{count}.txt", "w")
    file_text.writelines(episodes)
    file_text.close()
    count = count + 1
    print(f"saved file for url:{url}")
As posted, your code wouldn't save anything to the files at all: calling writelines() with no arguments raises a TypeError, because it expects an iterable of strings.
if __name__ == '__main__':
    import requests
    from bs4 import BeautifulSoup
    base_url = 'http://www.chakoteya.net/StarTrek/'
    paths = ['1.htm', '6.htm', '8.htm', '2.htm', '7.htm',
             '5.htm', '4.htm', '10.htm', '12.htm', '11.htm', '3.htm', '16.htm']
    for path in paths:
        url = f'{base_url}{path}'
        filename = path.split('.')[0]
        r = requests.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        with open(f"./{filename}.txt", "w") as f:
            f.write(soup.text)
        print(f"saved file for url:{url}")
I've reworked this a little. It wasn't clear why the data was being appended to episodes, so that was left out.
Perhaps you were writing the episodes list to the file, which would account for the duplicates: you were appending each page's content to a list and writing that ever-growing list on every iteration.
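The core of the fix, separated from the scraping, is to write only the current page's text on each iteration. The sketch below is illustrative and uses a hypothetical helper, save_each. It takes (name, text) pairs and writes each one to its own file, so no file can pick up text from earlier pages.

```python
from pathlib import Path


def save_each(pages, out_dir="."):
    """Write each (name, text) pair to its own <name>.txt file.

    Only the current pair's text is written, so no file ever contains
    text carried over from earlier iterations.
    """
    out = Path(out_dir)
    saved = []
    for name, text in pages:
        path = out / "{}.txt".format(name)
        path.write_text(text, encoding="utf-8")
        saved.append(path)
    return saved
```

In the scraping loop, the pairs would be something like (path.split('.')[0], soup.text) for each path.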

How to read a particular value ("text") from the following JSON file using Python 2.7?

I am trying to read the text values from the following JSON file.
https://www.ted.com//talks/marina_abramovic_an_art_made_of_trust_vulnerability_and_connection/transcript.json?language=en
I want to print whatever is in the "text" key.
I am trying this code but not getting the results:
import json
import urllib
url = "https://www.ted.com//talks/marina_abramovic_an_art_made_of_trust_vulnerability_and_connection/transcript.json?language=en"
response = urllib.urlopen(url)
data = json.loads(response.read())
def iterate(data):
    for key, value in data.items():
        if isinstance(value, dict):
            print(value)
            iterate(value)
            continue
iterate(data)
Can you try this, please?
import urllib
import json
url = "https://www.ted.com//talks/marina_abramovic_an_art_made_of_trust_vulnerability_and_connection/transcript.json?language=en"
response = urllib.urlopen(url)
data = json.loads(response.read())
def iterate(data):
    if "paragraphs" not in data:  # check if paragraphs node exists
        return
    for cues in data['paragraphs']:  # iterate through paragraphs node
        for d in cues['cues']:  # iterate through cues
            print d['text']
iterate(data)
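The answer above is Python 2 code (urllib.urlopen and the print statement). A Python 3 sketch of the same traversal follows. It assumes the transcript JSON still has the paragraphs, then cues, then text layout that the answer relies on. iter_cue_texts and load_transcript are hypothetical names.

```python
import json
import urllib.request


def iter_cue_texts(data):
    """Yield the "text" of every cue, walking paragraphs -> cues -> text."""
    # .get() with a default skips a missing node instead of raising KeyError
    for paragraph in data.get("paragraphs", []):
        for cue in paragraph.get("cues", []):
            yield cue["text"]


def load_transcript(url):
    """Fetch the transcript URL and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Usage: loop over iter_cue_texts(load_transcript(url)) and print each item.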

Loop in Python script with XPath. Why do I only get results from the last URL?

Why do I only get the results from the last URL?
The idea is to get a list of results from both URLs.
Also, when writing to the CSV I get an empty row every time. How do I remove this row?
import csv
import requests
from lxml import html
import urllib
TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'
for item in TV_category:
    url = url_pattern.format(item)
    page = requests.get(url)
    tree = html.fromstring(page.content)
    outfile = open("./tv_test1.csv", "wb")
    writer = csv.writer(outfile)
    rows = tree.xpath('//*[@id="category"]/ul[2]/li')
for row in rows:
    price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
    product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
    writer.writerow([product_ref,price])
As I explained in the comments on the question, you need to move the second for loop inside (at the end of) the first one. Otherwise, only the rows from the last URL will be written to the CSV file.
You also don't need to open the file on each iteration. Open it once, before the loop (a with statement will close it automatically). Note too that opening a file in write mode truncates it, so if the open call is inside a loop, the file is overwritten every time it's opened.
I'd refactor your code as follows:
import csv
import requests
from lxml import html
import urllib
TV_category = ["_108-tot-127-cm-43-tot-50-,98952,501090","_128-tot-150-cm-51-tot-59-,98952,501091"]
url_pattern = 'http://www.mediamarkt.be/mcs/productlist/{}.html?langId=-17'
with open("./tv_test1.csv", "wb") as outfile:
    writer = csv.writer(outfile)
    for item in TV_category:
        url = url_pattern.format(item)
        page = requests.get(url)
        tree = html.fromstring(page.content)
        rows = tree.xpath('//*[@id="category"]/ul[2]/li')
        for row in rows:
            price = row.xpath('normalize-space(div/aside[2]/div[1]/div[1]/div/text())')
            product_ref = row.xpath('normalize-space(div/div/h2/a/text())')
            writer.writerow([product_ref,price])
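The refactored code above is Python 2 code, since it opens the CSV in "wb" mode. In Python 3, the csv module documentation says to open the file in text mode with newline=''. Omitting it makes platforms with \r\n line endings (such as Windows) write an extra \r per row, and that commonly shows up as an empty row after every record, which may be the blank row the question mentions. Below is a minimal sketch with the scraping left out. write_products is a hypothetical helper that receives one list of [product_ref, price] rows per URL.

```python
import csv


def write_products(path, pages):
    """Write the rows from every page into one CSV file.

    `pages` is an iterable of row lists, e.g. one list of
    [product_ref, price] pairs per scraped URL.
    """
    # Open once, outside the loop, so earlier pages are not overwritten.
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as outfile:
        writer = csv.writer(outfile)
        for rows in pages:
            for row in rows:
                writer.writerow(row)
```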

Parse HTML table data to JSON and save to text file in Python 2.7

I'm trying to extract the crime-rate data across states from this web page:
http://www.disastercenter.com/crime/uscrime.htm
I am able to get it into a text file, but I would like the output in JSON format. How can I do this in Python?
Here is my code:
import urllib
import re
from bs4 import BeautifulSoup
link = "http://www.disastercenter.com/crime/uscrime.htm"
f = urllib.urlopen(link)
myfile = f.read()
soup = BeautifulSoup(myfile)
soup1=soup.find('table', width="100%")
soup3=str(soup1)
result = re.sub("<.*?>", "", soup3)
print(result)
output=open("output.txt","w")
output.write(result)
output.close()
The following code gets the data from the two tables and outputs all of it as a JSON-formatted string.
Working Example (Python 2.7.9):
from lxml import html
import requests
import re as regular_expression
import json
page = requests.get("http://www.disastercenter.com/crime/uscrime.htm")
tree = html.fromstring(page.text)
tables = [tree.xpath('//table/tbody/tr[2]/td/center/center/font/table/tbody'),
          tree.xpath('//table/tbody/tr[5]/td/center/center/font/table/tbody')]
tabs = []
for table in tables:
    tab = []
    for row in table:
        for col in row:
            var = col.text_content()
            var = var.strip().replace(" ", "")
            var = var.split('\n')
            # raw string so \d reaches the regex engine unchanged
            if regular_expression.match(r'^\d{4}$', var[0].strip()):
                tab_row = {}
                tab_row["Year"] = var[0].strip()
                tab_row["Population"] = var[1].strip()
                tab_row["Total"] = var[2].strip()
                tab_row["Violent"] = var[3].strip()
                tab_row["Property"] = var[4].strip()
                tab_row["Murder"] = var[5].strip()
                tab_row["Forcible_Rape"] = var[6].strip()
                tab_row["Robbery"] = var[7].strip()
                tab_row["Aggravated_Assault"] = var[8].strip()
                tab_row["Burglary"] = var[9].strip()
                tab_row["Larceny_Theft"] = var[10].strip()
                tab_row["Vehicle_Theft"] = var[11].strip()
                tab.append(tab_row)
    tabs.append(tab)
json_data = json.dumps(tabs)
output = open("output.txt", "w")
output.write(json_data)
output.close()
This might be what you want, if you can use the requests and lxml modules. The data structure presented here is very simple; adjust it to your needs.
First, get a response from your requested URL and parse the result into an HTML tree:
import requests
from lxml import etree
import json
response = requests.get("http://www.disastercenter.com/crime/uscrime.htm")
tree = etree.HTML(response.text)
Assuming you want to extract both tables, create this XPath and unpack the results. totals is "Number of Crimes" and rates is "Rate of Crime per 100,000 People":
xpath = './/table[@width="100%"][@style="background-color: rgb(255, 255, 255);"]//tbody'
totals, rates = tree.findall(xpath)
Extract the raw data (td.find('./') means the first child element, whatever its tag) and clean the strings. A plain strip() with no argument removes the surrounding whitespace, including non-breaking spaces (\xa0) and newlines:
raw_data = []
for tbody in totals, rates:
    rows = []
    for tr in tbody:
        row = []
        for td in tr:
            child = td.find('./')
            if child is not None and child.tag != 'br':
                # strip() with no argument also removes '\xa0' and '\n'
                row.append((child.text or '').strip())
            else:
                row.append('')
        rows.append(row)
    raw_data.append(rows)
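One thing to watch when cleaning strings is that str.strip(chars) treats its argument as a set of characters to remove from both ends, not as a substring. r'\xa0' is the four characters backslash, x, a, and 0, so stripping it from a number such as "1,000" also eats the trailing zeros. Here is a small sketch of a safer cleaner. clean_cell is a hypothetical helper. It relies on the fact that a bare strip() removes every character for which str.isspace() is true, which includes '\xa0' and '\n'.

```python
def clean_cell(text):
    """Trim whitespace, including non-breaking spaces, from a table cell."""
    # lxml gives None for an element without text, so fall back to ""
    return (text or "").strip()


# Pitfall: strip(chars) removes any of the given characters from both ends.
print("1,000".strip(r"\xa0"))     # prints 1,
print(clean_cell("\xa01,000\n"))  # prints 1,000
```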
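Zip together the table headers in the first two rows, then delete the redundant header rows. The slice steps imply that the two-row header repeats every 12 rows. So del raw[::12] removes the first header row of each block, after which the second header row recurs every 11 rows and del raw[::11] removes it: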
data = {}
data['tags'] = [tag0 + tag1 for tag0, tag1 in zip(raw_data[0][0], raw_data[0][1])]
for raw in raw_data:
    del raw[::12]
    del raw[::11]
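To see why the two del statements remove exactly the header rows, here is a sketch on made-up rows. It assumes, as the slice steps imply, that a two-row header opens every block of 12 rows. drop_repeated_headers is a hypothetical helper, not part of the answer.

```python
def drop_repeated_headers(rows, period=12):
    """Remove the two header rows that open every block of `period` rows.

    The first del removes each block's first header row, which shortens
    every block by one, so the second header row now recurs every
    period - 1 rows and the second del removes it.
    """
    rows = list(rows)  # work on a copy, leave the caller's list intact
    del rows[::period]
    del rows[::period - 1]
    return rows


# Made-up data: two header rows followed by ten data rows, repeated twice.
block_1 = ["H1", "H2"] + ["a{}".format(n) for n in range(10)]
block_2 = ["H1", "H2"] + ["b{}".format(n) for n in range(10)]
print(drop_repeated_headers(block_1 + block_2))  # 20 data rows, no "H1"/"H2"
```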
Store the rest of the raw data and create a JSON file (optional: eliminate whitespace with separators=(',', ':')):
data['totals'], data['rates'] = raw_data[0], raw_data[1]
with open('data.json', 'w') as f:
    json.dump(data, f, separators=(',', ':'))
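If keyed records are easier to consume than parallel lists, each data row can be zipped with the header list. Here is a minimal sketch. It assumes tags is a list of column names like data['tags'] above and that each row has one value per column, because zip silently stops at the shorter of the two. rows_to_records and save_records are hypothetical names.

```python
import json


def rows_to_records(tags, rows):
    """Turn parallel lists into one dict per row, keyed by column name."""
    return [dict(zip(tags, row)) for row in rows]


def save_records(tags, rows, path):
    """Write the records to `path` as a JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(rows_to_records(tags, rows), f, indent=2)
```

With the structure built above, the call would be save_records(data['tags'], data['totals'], 'totals.json').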
