Exporting data from Yelp API to csv file - python

I'm a beginner programmer trying to work with the Yelp API and I've been able to pull the information I need but I can't figure out how to only export a single part of the address into my csv file. This is the code I am working with:
# Convert the JSON string to a dictionary
business_data = response.json()
c = csv.writer(open('testing.csv', 'a'), lineterminator ='\n')
for biz in business_data['businesses']:
    c.writerow([biz['name'], biz['location'], biz['phone'], biz['url']])
In the last line of code, in the for loop, I want to be able to target a specific element of the 'location' like this:
#c.writerow([biz['name'], biz['address1'], biz['city'], biz['state'], biz['zip_code'], biz['phone'], biz['url']])
On the Yelp website it shows that I can target these specific fields, but I just can't figure out how to do it with c.writerow().
Yelp's documentation shows that I can target them like this:
businesses[x].location.address1
businesses[x].location.address2
businesses[x].location.city

From the response, biz['location'] is a Python dictionary, meaning it consists of key-value pairs.
You can verify this by printing type(biz['location']). To answer your question, all you need is to index into the nested dictionary and write each value into the file.
c.writerow([biz['name'], biz['location']['address1'], biz['location']['city'], biz['location']['state'], biz['location']['zip_code'], biz['phone'], biz['url']])
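Putting it together, a minimal sketch of the full loop (assuming response is the requests response from the Yelp Fusion /businesses/search call, as in the question):
import csv

business_data = response.json()

with open('testing.csv', 'a', newline='') as f:
    c = csv.writer(f)
    for biz in business_data['businesses']:
        loc = biz['location']  # nested dictionary holding the address fields
        c.writerow([biz['name'], loc['address1'], loc['city'], loc['state'],
                    loc['zip_code'], biz['phone'], biz['url']])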

Related

python parsing SOAP protocol response headers and values

I'm rewriting the question because the problem wasn't clear the first time.
I'm trying to extract output from a SOAP protocol using Python.
The output is structured with a lot of subcategories, and this is where I can't extract the information properly.
The output I have is very long, so here is a short example from soapenv:Body.
The original response (SOAP body):
<mc:ocb-rule id="boic"><mc:id>boic</mc:id><mc:ocb-conditions><mc:rule-deactivated>true</mc:rule-deactivated><mc:international>true</mc:international></mc:ocb-conditions><mc:cb-actions><mc:allow>false</mc:allow></mc:cb-actions></mc:ocb-rule>
As you can see, I also used
xmltodict.parse(soap_response)
to turn the output into a dictionary:
OrderedDict([('#id', 'boic'),
             ('mc:id', 'boic'),
             ('mc:ocb-conditions',
              OrderedDict([('mc:rule-deactivated', 'true'),
                           ('mc:international', 'true')])),
             ('mc:cb-actions',
              OrderedDict([('mc:allow', 'false')]))])
If there is an easier way, I would appreciate guidance.
As I mentioned, my goal is ultimately to give each category its own value; if there is a subcategory, it should become a separate column.
In the end I have to put all the values into a DataFrame and display all the categories and their values,
for example:
table example
I really hope that this time I was able to explain myself properly.
Thanks in advance.
I am trying to parse the SOAP response and insert all the values and headers into a data frame.
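For what it's worth, a minimal sketch of one way to flatten a nested dict like the xmltodict output above into a single-row DataFrame (the parsed variable and the sep choice are assumptions, not part of the original post):
import pandas as pd

# parsed stands in for the dictionary produced by xmltodict.parse() above
parsed = {
    '#id': 'boic',
    'mc:id': 'boic',
    'mc:ocb-conditions': {'mc:rule-deactivated': 'true', 'mc:international': 'true'},
    'mc:cb-actions': {'mc:allow': 'false'},
}

# json_normalize flattens nested dicts into dotted column names,
# so every subcategory becomes its own column
df = pd.json_normalize(parsed, sep='.')
print(df.columns.tolist())
# ['#id', 'mc:id', 'mc:ocb-conditions.mc:rule-deactivated',
#  'mc:ocb-conditions.mc:international', 'mc:cb-actions.mc:allow']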

how to get nested data with pandas and request

I'm going crazy trying to get data through an API call using requests and pandas. It looks like nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use DataFrame to get it, but am just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you provided it looks like you are already able to get the data, and I'm assuming you have it as a dictionary type already.
From what I can tell, I don't think you need pandas, unless it's some downstream requirement in the task you are doing. But to get the ItemNumber and QtyRemainingToShip you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
# data_list is only one element long, so grab the first (and only) element, which is a dictionary
data = data_list[0]
# that dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# the value under 'SoEstimateItemLineArr' is also a one-element list, so grab its first (and only) element
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of a popular Python library for parsing JSON; it will confuse future readers and will clash with the name if you end up having to import the json library.
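If a DataFrame really is needed downstream, here is a minimal sketch (the api_response name is hypothetical, standing in for the parsed response instead of json):
import pandas as pd

# api_response stands in for the parsed API response, e.g. api_response = response.json()
line_items = [
    item
    for order in api_response['Data']           # each sales order in the response
    for item in order['SoEstimateItemLineArr']  # each line item on that order
]
df = pd.DataFrame(line_items)
print(df[['ItemNumber', 'QtyRemainingToShip']])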

How can I extract the data from these strings?

I am making a program that scrapes data from a job page, and I get this data:
{"job":{"ciphertext":"~01142b81f148312a7c","rid":225177647,"uid":"1416152499115024384","type":2,"access":4,"title":"Need app developers to handle our app upgrades","status":1,"category":{"name":"Mobile Development","urlSlug":"mobile-development"
,"contractorTier":2,"description":"We have an app currently built, we are looking for someone to \n\n1) Manage the app for bugs etc \n2) Provide feature upgrades \n3) Overall Management and optimization \n\nPlease get in touch and i will share more details. ","questions":null,"qualifications":{"type":0,"location":null,"minOdeskHours":0,"groupRecno":0,"shouldHavePortfolio":false,"tests":null,"minHoursWeek":40,"group":null,"prefEnglishSkill":0,"minJobSuccessScore":0,"risingTalent":true,"locationCheckRequired":false,"countries":null,"regions":null,"states":null,"timezones":null,"localMarket":false,"onSiteType":null,"locations":null,"localDescription":null,"localFlexibilityDescription":null,"earnings":null,"languages":null
],"clientActivity":{"lastBuyerActivity":null,"totalApplicants":0,"totalHired":0,"totalInvitedToInterview":0,"unansweredInvites":0,"invitationsSent":0
,"buyer":{"isPaymentMethodVerified":false,"location":{"offsetFromUtcMillis":14400000,"countryTimezone":"United Arab Emirates (UTC+04:00)","city":"Dubai","country":"United Arab Emirates"
,"stats":{"totalAssignments":31,"activeAssignmentsCount":3,"feedbackCount":27,"score":4.9258937139,"totalJobsWithHires":30,"hoursCount":7.16666667,"totalCharges":{"currencyCode":"USD","amount":19695.83
,"jobs":{"postedCount":59,"openCount":2
,"avgHourlyJobsRate":{"amount":19.999534874418824
But the problem is that the only data I need is:
-Title
-Description
-Customer activity (lastBuyerActivity, totalApplicants, totalHired, totalInvitedToInterview, unansweredInvites, invitationsSent)
-Buyer (isPaymentMethodVerified, location (Country))
-stats (All items)
-jobs (all items)
-avgHourlyJobsRate
This sort of data is JSON data; Python represents it with the dictionary data type.
Suppose you have your data stored in a string. You can use di = eval(myData) to convert the string to a dictionary (exec() would not work here, since it does not return a value). Then you can access the structured data like di["job"], which returns the job section of the data.
di = eval(myData)
print(di["job"])
However, this is just a hack and is not recommended: it's messy, unpythonic, and it will fail on JSON literals such as true, false and null.
The appropriate way is to use the json library to convert the data to a dictionary. Take a look at the code snippet below to get an idea of the appropriate way:
import json
myData = "Put your data Here"
res = json.loads(myData)
print(res["job"])
Convert the data to a dictionary using json.loads;
then you can easily use the dictionary keys you want to look up or filter the data.
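As a rough sketch of pulling out just the fields listed in the question (assuming the scraped string in myData is complete, valid JSON; the nesting of stats, jobs and avgHourlyJobsRate under buyer is inferred from the truncated sample and should be checked against the full response):
import json

data = json.loads(myData)  # myData is the scraped string from the job page
job = data["job"]

extracted = {
    "title": job["title"],
    "description": job["description"],
    "clientActivity": job["clientActivity"],
    "buyerPaymentVerified": job["buyer"]["isPaymentMethodVerified"],
    "buyerCountry": job["buyer"]["location"]["country"],
    # these three appear to sit under "buyer" in the truncated sample above
    "stats": job["buyer"].get("stats"),
    "jobs": job["buyer"].get("jobs"),
    "avgHourlyJobsRate": job["buyer"].get("avgHourlyJobsRate"),
}
print(extracted)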
This seems to be a dictionary, so you can extract something from it by doing, for example, dictionary["job"]["uid"]. If it is a JSON file, convert the data to a Python dictionary first.

Beautifulsoup returns empty for all table tags

I'm trying to access the table details to ultimately put into a dataframe and save as a CSV with a limited number of rows (the dataset is massive) from the following site: https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
I'm just starting out with web scraping and was practicing on this dataset. I can pull tags like div without trouble, but when I try soup.findAll('tr') or td, it returns an empty set.
The table appears to be rendered by embedded code (see link above), so that may be my issue, but I'm still unsure how to access the detail rows, headers, etc. Selenium maybe?
Thanks in advance!
By the looks of it, the website already allows you to export the data:
As it would seem, the original link is:
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
The .csv download link is:
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
The .json link is:
https://data.cityofchicago.org/resource/ijzp-q8t2.json
Therefore you could simply extract the ID of the dataset, in this case ijzp-q8t2, and substitute it into the download links above. Here is the official documentation of their API.
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata(data.cityofchicago.org,
# MyAppToken,
# username="user@example.com",
# password="AFakePassword")
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("ijzp-q8t2", limit=2000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
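Since the goal was a CSV with a limited number of rows, you could then write the DataFrame straight out (the filename is just an example, and the 2000-row limit above can be adjusted to taste):
# Save the limited result set to a CSV file
results_df.to_csv('chicago_crimes_sample.csv', index=False)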

python dictionary from url

I am trying to gather weather data from the national weather service and read it into a python script. They offer a JSON return, but they also offer another return which isn't formatted JSON but has more variables (which I want). This set of data looks like it is formatted as a python dictionary. It looks like this:
stations={
KAPC:
{
'id':'KAPC',
'stnid':'92',
'name':'Napa, Napa County Airport',
'elev':'33',
'latitude':'38.20750',
'longitude':'-122.27944',
'distance':'',
'provider':'NWS/FAA',
'link':'http://www.wrh.noaa.gov/mesowest/getobext.php?sid=KAPC',
'Date':'24 Feb 8:54 am',
'Temp':'39',
'TempC':'4',
'Dewp':'29',
'Relh':'67',
'Wind':'NE#6',
'Direction':'50&#176',
'Winds':'6',
'WindChill':'35',
'Windd':'50',
'SLP':'1027.1',
'Altimeter':'30.36',
'Weather':'',
'Visibility':'10.00',
'Wx':'',
'Clouds':'CLR',
[...]
So, to me, it looks like it's got a defined variable stations equal to a dictionary of dictionaries containing the stations and their variables. My question is how do I access this data. Right now I am trying:
import urllib
response = urllib.urlopen(url)
r = response.read()
If I try to use the JSON module, it clearly fails because this isn't JSON. And if I just try to read the file, it comes back as one long string of characters. Any suggestions on how to extract this data? If possible, I would just like to get the dictionary as it exists in the URL return, i.e. stations={...}. Thanks!
As far as I can infer from the question, you have data in the form of text which is not valid JSON. So, given text like line = "stations={'KAPC':{'id':'KAPC', 'stnid':'92', 'name':'Napa, Napa County Airport'}}" (say), we can extract the dictionary by splitting the string at the = symbol and then using eval(), which builds the dictionary variable from the remaining text.
dictionary_text = line.split("=")[1]
python_dictionary = eval(dictionary_text)
print(python_dictionary)
>>> {'KAPC': {'id': 'KAPC', 'name': 'Napa, Napa County Airport', 'stnid': '92'}}
python_dictionary now behaves like a regular Python dictionary with key-value pairs, and you can access any attribute using python_dictionary["KAPC"]["id"].
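A slightly safer variant of the same idea, since eval() will happily execute arbitrary code, is ast.literal_eval(), which only accepts Python literals (this is a sketch, not part of the original answer):
import ast

line = "stations={'KAPC':{'id':'KAPC', 'stnid':'92', 'name':'Napa, Napa County Airport'}}"
dictionary_text = line.split("=", 1)[1]  # keep only the part after the first '='
python_dictionary = ast.literal_eval(dictionary_text)
print(python_dictionary["KAPC"]["id"])   # -> KAPC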
