I can access public API data via the link below:
import csv
import json
import urllib.request

# urllib.urlopen is Python 2 only; in Python 3 use urllib.request.urlopen
data = urllib.request.urlopen("https://earthquake.usgs.gov/fdsnws/event/1/query?format=geojson&starttime=2016-10-01&endtime=2016-10-02").read()
output = json.loads(data)
print(output)
I need help putting the obtained data into a CSV file. The following attributes should be the columns:
• Latitude (hint: treat the first entry in the coordinates attribute as latitude)
• Longitude (hint: treat the second entry in the coordinates attribute as longitude)
• Title: the earthquake description
• Place: the location of the earthquake
• Mag: the magnitude of the earthquake
And then convert it into a Pandas dataframe.
You can do this directly with pd.read_csv() by requesting CSV data in the HTTP request:
import pandas as pd
url_csv = 'https://earthquake.usgs.gov/fdsnws/event/1/query?format=csv&starttime=2016-10-01&endtime=2016-10-02'
df = pd.read_csv(url_csv, usecols=['latitude', 'longitude', 'place', 'mag'])
Notice that I have changed the URL to request the data in CSV format by setting format=csv, and that pd.read_csv() accepts a URL for the data. usecols selects those columns to retain.
The CSV file does not contain the title column; however, that column appears to be composed of the magnitude and place columns, so although you might want to avoid duplicating data, it can be constructed and appended to the dataframe like this:
df['title'] = 'M ' + df['mag'].map(str) + ' - ' + df['place']
There is also a pd.read_json() Pandas function, but I wasn't able to easily get it to work. If you can figure it out then you should be able to extract the required data without manually composing the title column.
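If you would rather keep the original GeoJSON request, the features can be flattened by hand into exactly the columns the question asks for. This is a sketch; `features_to_df` is a hypothetical helper, and the sample dict below is made up to mirror the USGS GeoJSON layout. Note that USGS GeoJSON coordinates are actually ordered [longitude, latitude, depth]; the column names below follow the question's hint instead.

```python
import pandas as pd

def features_to_df(geojson):
    """Flatten USGS GeoJSON features into the requested columns."""
    rows = []
    for feature in geojson['features']:
        props = feature['properties']
        coords = feature['geometry']['coordinates']
        rows.append({
            'Latitude': coords[0],   # first entry, per the question's hint
            'Longitude': coords[1],  # second entry, per the question's hint
            'Title': props['title'],
            'Place': props['place'],
            'Mag': props['mag'],
        })
    return pd.DataFrame(rows)

# Made-up single-feature sample mimicking the API's layout:
sample = {'features': [{
    'properties': {'title': 'M 4.2 - near Somewhere',
                   'place': 'near Somewhere', 'mag': 4.2},
    'geometry': {'coordinates': [-120.5, 36.1, 10.0]},
}]}
df = features_to_df(sample)
```

With the real data you would pass the `output` dict from the question instead of `sample`; `df.to_csv(...)` then writes the requested CSV.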
I am trying to process an Excel file with Pandas. The filter to be applied is on the values of the "Test Code" column, which have the format "XX10.X/XX12.X" (e.g. EF10.1). The problem is that the dataframe neglects everything after the dot when reading the column, leaving just "XX10". The information after the dot is the most important part.
The original document classifies those cells as a date, which probably is altering the normal processing of the values.
The code I am using is:
import os
import pandas as pd
file = "H2020_TRI-HP_T6.2_PropaneIceFaultTests_v1"
folder = r"J:\Downloads"  # raw string so the backslash is not treated as an escape
file_path = os.path.join(folder,file+".xlsx")
df = pd.read_excel(file_path,sheet_name="NF10")
df["Test Code"]
The output is shown in the attached screenshot.
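One hedged workaround: if the truncation happens when pandas parses the date-formatted cells, a `converters` entry can force the column to plain text before any date handling. `keep_code_as_text` is a hypothetical helper, untested against the actual workbook; if Excel itself already coerced "EF10.1" to a date value, the original text is unrecoverable and the cell format must be fixed in Excel.

```python
import pandas as pd

def keep_code_as_text(cell):
    """Converter for the "Test Code" column: coerce whatever the cell
    holds to a plain string so pandas never date-parses the value."""
    return str(cell)

# Hedged usage (file path as in the question):
# df = pd.read_excel(file_path, sheet_name="NF10",
#                    converters={"Test Code": keep_code_as_text})
```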
From NOAA API, I can get Boston hourly weather forecast information via JSON file. Here is the link: https://api.weather.gov/gridpoints/BOX/70,76
(This JSON file is too long to present comprehensively here, please kindly click the link to see it)
I want to convert some of the weather variables into data frame to proceed further calculation.
The expected format is as below for temperature. I will use the same format to get precipitation, snowfall, humidity, etc.
(expected dataframe format shown in the attached image)
Now I cannot figure out how to convert it to the dataframe I want. Please kindly help.
For now, here is the best I can do, but I still cannot extract validTime and values from temperature:
import requests
import pandas as pd
response = requests.get("https://api.weather.gov/gridpoints/BOX/70,76")
# parse the JSON response into the variable forecast
forecast = response.json()
df1 = pd.DataFrame.from_records(forecast['properties']).reset_index()
df2 = df1.loc[:1, ['temperature', 'quantitativePrecipitation', 'snowfallAmount', 'relativeHumidity', 'windGust', 'windSpeed', 'visibility']]
df2
(current output shown in the attached screenshot)
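Each variable under `forecast['properties']` holds a nested list of `{validTime, value}` records, which is what `from_records` flattens poorly. A sketch of pulling one variable out cleanly; `variable_to_df` is a hypothetical helper, and the sample dict below is made up to mimic the API layout (real values come from `forecast['properties']`):

```python
import pandas as pd

def variable_to_df(properties, name):
    """Flatten one gridpoint variable (e.g. 'temperature') into a dataframe
    with a validTime column and a value column named after the variable."""
    df = pd.DataFrame(properties[name]['values'])  # columns: validTime, value
    return df.rename(columns={'value': name})

# Made-up sample shaped like the NOAA response:
sample_props = {'temperature': {'uom': 'wmoUnit:degC', 'values': [
    {'validTime': '2021-01-01T00:00:00+00:00/PT1H', 'value': 3.9},
    {'validTime': '2021-01-01T01:00:00+00:00/PT2H', 'value': 3.3},
]}}
temp_df = variable_to_df(sample_props, 'temperature')
```

The same call with `'quantitativePrecipitation'`, `'snowfallAmount'`, etc. should yield the other variables; the resulting frames can then be merged on validTime if a single table is wanted.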
(screenshots attached: original file, data source, output)
My code is as follows.
import pandas as pd
file_dest = r"C:\Users\user\Desktop\Book1.csv"
# read csv data
book=pd.read_csv(file_dest)
file_source = r"C:\Users\user\Desktop\Book2.csv"
materials=pd.read_csv(file_source)
Right_join = pd.merge(book,
                      materials,
                      on='Name',
                      how='left')
Right_join.to_csv(file_dest, index=False)
However, the output is as follows, which looks like it just copied the contents without using a VLOOKUP-style merge to insert the data. I have tried it with different kinds of data; the results are all the same (it looks like it just copied the contents). Please help me find the bug.
Since the column names differ between the two data sources, you have to specify the columns to join on in the left and right dataframes. Try this:
# assuming materials is your data source with a Price column
joined = book.merge(materials,
                    left_on="Custmor",
                    right_on="Name",
                    how="left")
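A minimal self-contained illustration of that merge, with made-up data (column names assumed from the question, including the "Custmor" spelling used there):

```python
import pandas as pd

# toy versions of the two CSVs
book = pd.DataFrame({'Custmor': ['A', 'B'], 'Qty': [1, 2]})        # destination
materials = pd.DataFrame({'Name': ['A', 'B'], 'Price': [10, 20]})  # source

# left join: every row of book kept, Price looked up from materials
joined = book.merge(materials, left_on='Custmor', right_on='Name', how='left')
```

Rows of `book` with no match in `materials` get NaN in Price, which mirrors a failed VLOOKUP; the redundant `Name` key column can be dropped afterwards with `joined.drop(columns='Name')`.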
Please see the attached photo.
I only need to import a specific column with conditions (such as specific data found in that column), and I also need to remove unnecessary columns; dropping them takes too much code. What specific code or syntax is applicable?
How to get a column from a pandas dataframe is answered in Read specific columns from a csv file with csv module?
To quote:
Pandas is spectacular for dealing with csv files, and the following
code would be all you need to read a csv and save an entire column
into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name  # you can also use df['column_name']
So in your case, you just save the filtered data frame in a new variable.
This means you do newdf = data.loc[...] and then use the code snippet above to extract the column you desire, for example newdf.continent
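Putting both pieces together in a runnable sketch (the CSV content, column names, and the `pop > 100` condition are made up for illustration): `usecols` loads only the columns you need, so no `drop()` calls are required, and `.loc` applies the row condition.

```python
import io
import pandas as pd

# stand-in for your CSV file
csv_file = io.StringIO("continent,country,pop\nAsia,India,1380\nEurope,Monaco,0.04\n")

# load only the two columns of interest
data = pd.read_csv(csv_file, usecols=['continent', 'pop'])

newdf = data.loc[data['pop'] > 100]   # keep rows matching a condition
saved_column = newdf['continent']     # extract the single column
```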
Hello, and thank you for taking the time to read this.
I am looking to extract company information from a particular stock exchange and then save this information to a pandas DataFrame.
Each firm has its own webpage, determined by the "KodeEmiten" ending. These codes are saved in a column of the first DataFrame:
df = pd.DataFrame.from_dict(data['data'])
Now my goal is to use these codes to call each company's website individually and create a JSON file for each:
for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
While this works, I can't save the results to a new DataFrame due to "list index out of range" and incorrect-keyword errors. There is significantly more information in the XHR responses than I actually need, and the differing structures are, I believe, what causes the errors when trying to save them to a new DataFrame. I'm really just interested in the data under these XHR headers:
AnakPerusahaan:, Direktur:, Komisaris, PemegangSaham:
So my question is kind of two-in-one:
a) How can I extract the information from those specific XHR headers (all of them are tables)?
b) How can I save those to a new dataframe (or even a list, I don't really mind)?
import requests
import pandas as pd
import json
import time
# gets broad data of main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')
data = sxow.json() # parse the response body as JSON
df = pd.DataFrame.from_dict(data['data']) #creates DataFrame based on the data (.json) file
# add: compare file contents and overwrite original if same
cdate = time.strftime ("%Y%m%d") # creating string-variable w/ current date year|month|day
df.to_excel(f"{cdate}StockExchange_Overview.xlsx") # converts DataFrame to Excel file, can't overwrite existing file
for i in range(len(df)):
    requests.get(f'https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten={df.loc[i, "KodeEmiten"]}').json()
    # This is where I'm completely stuck
You don't need to convert the result to a dataframe. You can just loop through the JSON object and concatenate the URL to fetch each company's details.
Follow the code below:
import requests
import pandas as pd
import json
import time
# gets broad data of main page of the stock exchange
sxow = requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfiles?draw=1&columns%5B0%5D%5Bdata%5D=KodeEmiten&columns%5B0%5D%5Bname%5D&columns%5B0%5D%5Bsearchable%5D=true&columns%5B0%5D%5Borderable%5D=false&columns%5B0%5D%5Bsearch%5D%5Bvalue%5D&columns%5B0%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B1%5D%5Bdata%5D=KodeEmiten&columns%5B1%5D%5Bname%5D&columns%5B1%5D%5Bsearchable%5D=true&columns%5B1%5D%5Borderable%5D=false&columns%5B1%5D%5Bsearch%5D%5Bvalue%5D&columns%5B1%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B2%5D%5Bdata%5D=NamaEmiten&columns%5B2%5D%5Bname%5D&columns%5B2%5D%5Bsearchable%5D=true&columns%5B2%5D%5Borderable%5D=false&columns%5B2%5D%5Bsearch%5D%5Bvalue%5D&columns%5B2%5D%5Bsearch%5D%5Bregex%5D=false&columns%5B3%5D%5Bdata%5D=TanggalPencatatan&columns%5B3%5D%5Bname%5D&columns%5B3%5D%5Bsearchable%5D=true&columns%5B3%5D%5Borderable%5D=false&columns%5B3%5D%5Bsearch%5D%5Bvalue%5D&columns%5B3%5D%5Bsearch%5D%5Bregex%5D=false&start=0&length=700&search%5Bvalue%5D&search%5Bregex%5D=false&_=155082600847')
data = sxow.json() # parse the response body as JSON
list_of_json = []
for nested_json in data['data']:
    list_of_json.append(requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten=' + nested_json['KodeEmiten']).json())
    time.sleep(1)
The list_of_json will contain all the JSON results you requested.
Here nested_json is the loop variable used to iterate through the array of JSON objects for the different KodeEmiten values.
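To address part (a) of the question, the table-like keys can be pulled out of each profile response afterwards. This is a sketch: `tables_from_profile` is a hypothetical helper, the exact response structure is unknown, and the sample below assumes each key holds a list of records (the made-up field names are illustrative only):

```python
import pandas as pd

# the XHR keys the question is interested in
wanted = ['AnakPerusahaan', 'Direktur', 'Komisaris', 'PemegangSaham']

def tables_from_profile(profile):
    """Return {key: DataFrame} for whichever keys of interest are present."""
    return {k: pd.DataFrame(profile[k]) for k in wanted if k in profile}

# Made-up response shaped as list-of-records per key:
sample = {'Direktur': [{'Nama': 'X', 'Jabatan': 'Direktur Utama'}],
          'Komisaris': []}
tables = tables_from_profile(sample)
```

Applied to each element of list_of_json, this yields one small dataframe per table, which covers part (b) as well.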
This is a slight improvement on @bigbounty's approach:
Since the aim is to save the information to a list and then use said list further in the script, a list comprehension is actually a tad faster.
i.e.
list_of_json = [requests.get('https://www.idx.co.id/umbraco/Surface/ListedCompany/GetCompanyProfilesDetail?emitenType=&kodeEmiten=' + nested_json['KodeEmiten']).json() for nested_json in data['data']]