Switching a specific part of a URL and saving the result to CSV - Python

I have an Excel file that has an 'ID' column and an API call URL.
The Excel file looks like this:
And the URL is this:
http://api.*******.com/2.0/location/{property_id}?key=123456ASDFG
The result of the API call is in JSON format.
I would like to iterate over the Excel file's property_id values, substitute each into the URL, and save the result for each row to a CSV.
What I have done so far is:
import requests
import json

url = "http://api.*******.com/2.0/location/{property_id}?key=123456ASDFG"
response = requests.get(url)
data = response.text  # .text holds the response body as a string
print(data)
The result is basically the same as what I get when I put the URL into the Chrome browser.
I somehow have to read each row of the Excel file's column A, substitute its value for {property_id} in the URL,
and then append the result to a CSV as the row number increases.
I'm very new to APIs and I have no idea where to start.
I tried to find similar questions on Stack Overflow but could not find any (maybe wrong keywords?).
Any help is very much appreciated.
Thanks
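A minimal sketch of one way to wire this together, assuming the IDs sit in a column named property_id in input.xlsx and that each API response is a flat JSON object (the file names and the flat structure are assumptions; nested JSON would need something like pandas.json_normalize):

import pandas as pd
import requests

ids = pd.read_excel("input.xlsx")["property_id"]  # assumed file and column names

rows = []
for property_id in ids:
    url = f"http://api.*******.com/2.0/location/{property_id}?key=123456ASDFG"
    record = requests.get(url).json()     # assumes a flat JSON object per ID
    record["property_id"] = property_id   # keep the ID alongside its result
    rows.append(record)

# One CSV row per Excel row; the columns come from the JSON keys.
pd.DataFrame(rows).to_csv("output.csv", index=False)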

Related

Python API json file to Pandas dataframe in Jupyter

I'm new to Python and Jupyter. I have an API that I get my data from. I have located the child node with the list of data I want using a loop, and now I want to put that data into a pandas DataFrame. Could someone please help me with this? You can see my code below:
import requests
import json

resp = requests.get('http://***',
                    auth=('***', '***'),
                    headers={'Accept': 'application/json'})
data = json.loads(resp.text)
for Observasjoner in data['Holdings']:
    display(Observasjoner)
Just extract the data from the JSON and append it to lists, then create a DataFrame from those lists.
import requests

data = requests.get("form_link")
print(data.text)  # prints the raw body; or use print(data.json()) for parsed JSON
Now search for the data you need, or use Beautiful Soup if it is an HTML website.
If it is JSON, the parsed result behaves like a dictionary, so use the same concepts here. Say data now holds the parsed dictionary:
print(data["key"])  # prints that key's value; iterate the full dictionary (JSON file) the same way
Now use dictionary operations to append all the values of the keys into lists.
The keys become the columns and the values become the rows; create a DataFrame from them.
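A small sketch of that idea applied to the code above, assuming data['Holdings'] is a list of flat dicts (an assumption about the API's shape):

import requests
import pandas as pd

resp = requests.get('http://***',
                    auth=('***', '***'),
                    headers={'Accept': 'application/json'})
holdings = resp.json()['Holdings']  # assumed: a list of flat dicts

# Each dict becomes one row; its keys become the columns.
df = pd.DataFrame(holdings)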
Thanks

Export HTML table to excel without page refresh using python

I have a web page on which the user can generate a table from number-of-rows and number-of-columns inputs.
Now I want to export this HTML table to an Excel file using Python. After some googling, I came across the to_excel snippet shown below.
import pandas as pd
# The webpage URL whose table we want to extract
url = "https://www.geeksforgeeks.org/extended-operators-in-relational-algebra/"
# Assign the table data to a Pandas dataframe
table = pd.read_html(url)[0]
# Store the dataframe in Excel file
table.to_excel("data.xlsx")
As you can observe from the above code, the program navigates to the specified URL. But on my web page, if the URL is hit (i.e. after a page refresh), all the data is gone, because I am generating the rows and columns on the fly without a page refresh.
Can someone suggest an alternate approach for the Excel export of an HTML table using Python?
Don't pass the URL; pass the raw string containing the HTML:
Parameters:
io: (str, path object or file-like object)
A URL, a file-like object, or a raw string containing HTML. Note that
lxml only accepts the http, ftp and file url protocols. If you have a
URL that starts with 'https' you might try removing the 's'.
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html
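A minimal sketch of that approach, assuming the generated table's markup can be captured as a string on the client and sent to the Python side (the html_string here is a made-up stand-in):

import io
import pandas as pd

# Hypothetical raw HTML captured from the page without a refresh.
html_string = """
<table>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
"""

# read_html accepts raw HTML via a file-like object, not just a URL.
table = pd.read_html(io.StringIO(html_string))[0]
table.to_excel("data.xlsx")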

Passing values in a data frame as a function argument in python

I am trying to write Python code that reads a set of URLs from a CSV file and downloads the content at each URL. To read data from the CSV file I am using pandas, and the data is stored in DataFrames. Now I want to pass the values in the DataFrame (the URLs), one by one, as an argument to a function that uses the GET method to go to that particular URL and download the file. I am stuck on how to pass the values stored in a DataFrame to a function in a loop. Any help or alternate methods are appreciated. Thanks in advance.
Note: The DataFrame holds around 500 URLs.
Edit: I am using url = pd.read_csv(file_name, usecols=[26]) to read the data.
My question is how to pass the values in url to a function in a loop.
Not sure I understand your question, but maybe this is an answer to it:
import pandas as pd

d = {'URL': ['URL1', 'URL2', 'URL3', 'URL4', 'URL5']}
df = pd.DataFrame(data=d)

for k in range(len(df)):
    url = df.at[k, 'URL']
    out = do_something_with_url(url)
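For the download step itself, a sketch of what do_something_with_url could look like, assuming requests is acceptable and that naming each file after the last URL path segment is good enough (both are assumptions):

import os
import requests

def do_something_with_url(url):
    # Fetch the URL and save the response body to a local file.
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx
    filename = os.path.basename(url.rstrip('/')) or 'download.bin'
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename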

Python: Saving AJAX response data to .json and save this to pandas DataFrame

Hello, and thank you for taking the time to read this.
I am looking to extract company information from a particular stock exchange and then save this information to a pandas DataFrame.
Each firm has its own webpage, all determined by the "KodeEmiten" ending. These codes are saved in a column of the first DataFrame:
df = pd.DataFrame.from_dict(data['data'])
Now my goal is to use these codes to call each company's page individually and create a JSON result for each:
for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
While this works, I can't save the results to a new DataFrame, due to "list index out of range" and incorrect-keyword errors. There is significantly more information in the XHR response than I actually need, and I believe the differing structures are what cause the errors when saving them to a new DataFrame. I'm really just interested in the data under these XHR keys:
AnakPerusahaan, Direktur, Komisaris, PemegangSaham
So my question is kind of a two-in-one:
a) How can I extract just the information under those specific XHR keys (all of them are tables)?
b) How can I save those to a new DataFrame (or even a list, I don't really mind)?
import requests
import pandas as pd
import json
import time

# gets broad data from the main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')
data = sxow.json()  # parse the response body as JSON
df = pd.DataFrame.from_dict(data['data'])  # create a DataFrame from the parsed JSON
# TODO: compare file contents and overwrite the original if they are the same
cdate = time.strftime("%Y%m%d")  # current date as a year|month|day string
df.to_excel(f"{cdate}StockExchange_Overview.xlsx")  # write the DataFrame to Excel; can't overwrite an existing file
for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
    # This is where I'm completely stuck
You don't need to convert the result to a DataFrame. You can just loop through the JSON object and concatenate the URL to get each company's details.
Follow the code below:
import requests
import pandas as pd
import json
import time

# gets broad data from the main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')
data = sxow.json()

list_of_json = []
for nested_json in data['data']:
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten=' + nested_json['KodeEmiten']).json())
    time.sleep(1)
The list_of_json will contain all the JSON results you requested.
Here nested_json is the loop variable used to iterate over the array of JSON objects for the different KodeEmiten values.
This is a slight improvement on @bigbounty's approach:
Since the aim is to save the information to a list and then use that list further in the script, a list comprehension is actually a tad faster, i.e.:
detail_url = 'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten='
list_of_json = [requests.get(detail_url + nested_json["KodeEmiten"]).json() for nested_json in data["data"]]
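For question a), a sketch of one way to pull out just those four keys, assuming each detail response is a dict in which each key maps to a list of records (the response shape is inferred from the question, not confirmed):

import pandas as pd

keys = ['AnakPerusahaan', 'Direktur', 'Komisaris', 'PemegangSaham']
tables = {k: [] for k in keys}

for detail in list_of_json:                  # list_of_json from the loop above
    for k in keys:
        tables[k].extend(detail.get(k, []))  # tolerate a missing key

# One DataFrame per key; each record's fields become the columns.
frames = {k: pd.DataFrame(rows) for k, rows in tables.items()}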

How to read API JSON data and store as Python dictionary

I am pulling in info from an API. The returned data is in JSON format. I have to iterate through it and get the same data for multiple inputs. I want to save the JSON data for each input in a Python dictionary for easy access. This is what I have so far:
import pandas
import requests

ddict = {}
read_input = pandas.read_csv('input.csv')
for d in read_input.values:
    print(d)
    url = "https://api.xyz.com/v11/api.json?KEY=123&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    ddict[d] = data

df = pandas.DataFrame.from_dict(ddict, orient='index')
with pandas.ExcelWriter('output.xlsx') as w:
    df.to_excel(w, 'output')
With the above code, I get the following output:
a.com
I also get an Excel output with the data only from this first line. My input CSV file has close to 400 rows, so I should be seeing more than one line in the output and in the output Excel file.
If you have a better way of doing this, that would be appreciated. In addition, the Excel output I get is very hard to understand. I want to read the JSON data using dictionaries and subdictionaries, but I don't completely understand the format of the underlying data; I think it looks closest to a JSON array.
I have looked at numerous other posts including Parsing values from a JSON file using Python? and How do I write JSON data to a file in Python? and Converting JSON String to Dictionary Not List and How do I save results of a "for" loop into a single variable? but none of the techniques have worked so far. I would prefer not to pickle, if possible.
I'm new to Python so any help is appreciated!
I'm not going to address your challenges with JSON here as I'll need more information on the issues you're facing. However, with respect to reading from CSV using Pandas, here's a great resource: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html.
Now, your output is being read the way it is because a.com is being considered the header (undesirable). Your read statement should be:
read_input = pandas.read_csv('input.csv', header=None)
Now, read_input is a DataFrame (documentation). So what you're really looking for is the values in the first column. You can easily get an array of values via read_input.values; this gives you a separate array for each row. So your for loop would be:
for d in read_input.values:
    print(d[0])
    get_info(d[0])
For JSON, I'd need to see a sample structure and your desired way of storing it.
I think there is an awkwardness in your program.
Try with this:
ddict = {}
read_input = pandas.read_csv('input.csv')
for d in read_input.values:
    url = "https://api.xyz.com/v11/api.json?KEY=123&LOOKUP={}".format(d)
    response = requests.get(url)
    data = response.json()
    ddict[d] = data
Edit: iterate over read_input.values.
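Putting the two answers together, a sketch of the corrected loop, assuming a single-column CSV with no header row (so each row comes back as a one-element array and d[0] is the lookup value):

import pandas
import requests

ddict = {}
read_input = pandas.read_csv('input.csv', header=None)  # first row is data, not a header
for d in read_input.values:
    key = d[0]  # each row is a one-element array
    url = "https://api.xyz.com/v11/api.json?KEY=123&LOOKUP={}".format(key)
    response = requests.get(url)
    ddict[key] = response.json()  # one entry per input row

df = pandas.DataFrame.from_dict(ddict, orient='index')
df.to_excel('output.xlsx')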
