Pulling Zillow Rent Data from Zillow API - python

I am playing around with the Zillow API, but I am having trouble retrieving the rent data. Currently I am using a Python Zillow wrapper, but I am not sure whether it supports pulling rent data at all.
This is the help page I am using for the Zillow API:
https://www.zillow.com/howto/api/GetSearchResults.htm
import pyzillow
from pyzillow.pyzillow import ZillowWrapper, GetDeepSearchResults
import pandas as pd
house = pd.read_excel('Housing_Output.xlsx')
### Login to Zillow API
address = ['123 Test Street City, State Abbreviation'] # Fill this in with an address
zip_code = ['zip code'] # fill this in with a zip code
zillow_data = ZillowWrapper('API_KEY')  # your Zillow API key (ZWSID) as a string
deep_search_response = zillow_data.get_deep_search_results(address, zip_code)
result = GetDeepSearchResults(deep_search_response)
# These API calls work, but I am not sure how to retrieve the rent data
print(result.zestimate_amount)
print(result.tax_value)
ADDING ADDITIONAL INFO:
Chapter 2 talks about how to pull rent data by creating an XML function called zillowProperty. My XML skills aren't great, but I think I need to either:
a) import some xml package to help read it
b) save the code as an XML file and use the open function to read the file
https://www.amherst.edu/system/files/media/Comprehensive_Evaluation_-_Ningyue_Christina_Wang.pdf
I am trying to provide the code in here, but it won't let me break to the next line for some reason.
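For option (a), here is a rough, untested sketch of what I'm picturing: calling the GetSearchResults endpoint from the help page directly and reading the response with the standard library's xml.etree.ElementTree (the rentzestimate flag and the rentzestimate/amount path are guesses on my part):

import urllib.request
import xml.etree.ElementTree as ET

# Untested: direct call to the GetSearchResults endpoint from the help page above
url = ('https://www.zillow.com/webservice/GetSearchResults.htm'
       '?zws-id=YOUR_API_KEY'
       '&address=123+Test+Street'
       '&citystatezip=City%2C+ST'
       '&rentzestimate=true')
tree = ET.parse(urllib.request.urlopen(url))
amount = tree.find('.//rentzestimate/amount')  # path is a guess on my part
print(amount.text if amount is not None else 'no rent zestimate returned')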

We can see that rent is not a field one can get using the pyzillow package, by looking at the attributes of your result (run dir(result)) and at the pyzillow source code.
However, thanks to the beauty of open source, you can edit the source code of this package and get the functionality you are looking for. Here is how:
First, locate where the code sits in your hard drive. Import pyzillow, and run:
pyzillow?
The File field shows this for me:
c:\programdata\anaconda3\lib\site-packages\pyzillow\__init__.py
Hence go to c:\programdata\anaconda3\lib\site-packages\pyzillow (or whatever it shows for you) and open the pyzillow.py file with a text editor.
Now we need to make two changes.
One: Inside the get_deep_search_results function you'll see params. We need to edit that to turn the rentzestimate feature on, so change the function to:
def get_deep_search_results(self, address, zipcode):
    """
    GetDeepSearchResults API
    """
    url = 'http://www.zillow.com/webservice/GetDeepSearchResults.htm'
    params = {
        'address': address,
        'citystatezip': zipcode,
        'zws-id': self.api_key,
        'rentzestimate': True  # This is the only line we add
    }
    return self.get_data(url, params)
Two: Go to class GetDeepSearchResults(ZillowResults), and add the following into the attribute_mapping dictionary:
'rentzestimate_amount': 'result/rentzestimate/amount'
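For orientation, the edited class might then look roughly like this (the pre-existing entry shown is illustrative; keep whatever your copy of pyzillow.py already contains and only add the last line):

class GetDeepSearchResults(ZillowResults):
    attribute_mapping = {
        # ...existing entries from your copy of pyzillow.py stay as they are...
        'zestimate_amount': 'result/zestimate/amount',         # illustrative existing entry
        'rentzestimate_amount': 'result/rentzestimate/amount'  # the new entry
    }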
Voila! The customized and updated Python package now returns the Rent Zestimate! Let's try:
from pyzillow.pyzillow import ZillowWrapper, GetDeepSearchResults
address = ['11 Avenue B, Johnson City, NY']
zip_code = ['13790']
zillow_data = ZillowWrapper('X1-ZWz1835knufc3v_38l6u')
deep_search_response = zillow_data.get_deep_search_results(address, zip_code)
result = GetDeepSearchResults(deep_search_response)
print(result.rentzestimate_amount)
Which correctly returns the Rent Zestimate of $1200; this can be validated on the Zillow page for that address.

Related

Stix, Taxii, Python3, Cabby API - getting data into a format i can use

I'm using the Cabby API: https://github.com/EclecticIQ/cabby
in hopes of pulling STIX info through the TAXII client.
I've got my Python code pulling the data from www.hailataxii.com.
The data is in a container and I can flip through it. It looks like XML, but no XML parser will read or manipulate the data. I'd love to put each record into a dictionary, then put the data into some kind of database, but until I find a way to access the data from the download, I'm at a loss. There are very few docs or examples.
Any suggestions would be greatly appreciated.
Here is my basic code for testing:
import pprint
from cabby import create_client

HailATaxiiFeedList = [
    'guest.Abuse_ch',
    'guest.CyberCrime_Tracker',
    'guest.EmergingThreats_rules',
    'guest.Lehigh_edu',
    'guest.MalwareDomainList_Hostlist',
    'guest.blutmagie_de_torExits',
    'guest.dataForLast_7daysOnly',
    'guest.dshield_BlockList',
    'guest.phishtank_com'
]

client = create_client(
    'hailataxii.com',
    use_https=False,
    discovery_path='/taxii-discovery-service')

print(": Discover_Collections:")
services = client.discover_services()
for service in services:
    print('Service type= {s.type} , address= {s.address}'.format(s=service))

print(": Get_Collections:")
collections = client.get_collections(uri='http://hailataxii.com/taxii-data')

for collection_name in HailATaxiiFeedList:
    print("Polling :", collection_name, ".. could take a while, please be patient..")
    file = open("./iocs/" + collection_name + ".xml", "w")
    content_blocks = client.poll(collection_name=collection_name)
    count = 1
    for block in content_blocks:
        taxii_message = block.content.decode('utf-8')
        file.write(taxii_message)
        count += 1
        if count > 20:  # just getting the top 20 objects because the lists are huge
            break
    file.close()
The output looks like XML, but no XML parser will touch it:
<stix:STIX_Package xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:simpleMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Simple-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:TOUMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Terms_Of_Use-1" xmlns:opensource="http://hailataxii.com" xmlns:edge="http://soltra.com/" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:ttp="http://stix.mitre.org/TTP-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="edge:Package-a2c8f8f2-5a4d-4f0e-92be-d3fa482247d0" version="1.1.1" timestamp="2017-10-09T20:39:36.179672+00:00">
  <stix:STIX_Header>
    <stix:Handling>
      <marking:Marking>
        <marking:Controlled_Structure>../../../../descendant-or-self::node()</marking:Controlled_Structure>
        <marking:Marking_Structure xsi:type="tlpMarking:TLPMarkingStructureType" color="WHITE"/>
        <marking:Marking_Structure xsi:type="TOUMarking:TermsOfUseMarkingStructureType">
          <TOUMarking:Terms_Of_Use>zeustracker.abuse.ch | Abuse source[https://sslbl.abuse.ch/blacklist/] - As for all abuse.ch projects, the use of the SSL Blacklist is free for both commercial and non-commercial usage without any limitation. However, if you are a commercial vendor of security software/services and you want to integrate data from the SSL Blacklist into your products / services, you will have to ask for permission first by contacting me using the contact form [http://www.abuse.ch/?page_id=4727].'
          </TOUMarking:Terms_Of_Use>
        </marking:Marking_Structure>
        <marking:Marking_Structure xsi:type="simpleMarking:SimpleMarkingStructureType">
          <simpleMarking:Statement>Unclassified (Public)</simpleMarking:Statement>
        </marking:Marking_Structure>
      </marking:Marking>
    </stix:Handling>
  </stix:STIX_Header>
  <stix:Indicators>
    <stix:Indicator id="opensource:indicator-00398022-0d9c-474b-b543-31b85a4f22ab" timestamp="2014-10-31T16:44:24.766014+00:00" xsi:type="indicator:IndicatorType" version="2.1.1">
      <indicator:Title>ZeuS Tracker (offline)| s-k.kiev.ua/html/30/config.bin (2014-10-13) | This domain has been identified as malicious by zeustracker.abuse.ch</indicator:Title>
      <indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">Domain Watchlist</indicator:Type>
      <indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">URL Watchlist</indicator:Type>
      <indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">File Hash Watchlist</indicator:Type>
      <indicator:Description>This domain s-k.kiev.ua has been identified as malicious by zeustracker.abuse.ch. For more detailed infomation about this indicator go to [CAUTION!!Read-URL-Before-Click] [https://zeustracker.abuse.ch/monitor.php?host=s-k.kiev.ua].</indicator:Description>
      <indicator:Observable idref="opensource:Observable-94ead651-1df5-4cfe-b4bb-e34ce5e60224">
      </indicator:Observable>
      <indicator:Indicated_TTP>
        <stixCommon:TTP idref="opensource:ttp-6055672f-ecfd-40ae-aa84-0b336a5accb6" xsi:type="ttp:TTPType"/>
      </indicator:Indicated_TTP>
      <indicator:Producer>
        <stixCommon:Identity id="opensource:Identity-3066ae12-3db6-44dd-9636-6b083b6479dc">
          <stixCommon:Name>zeustracker.abuse.ch</stixCommon:Name>
        </stixCommon:Identity>
        <stixCommon:Time>
          <cyboxCommon:Produced_Time>2014-10-13T00:00:00+00:00</cyboxCommon:Produced_Time>
          <cyboxCommon:Received_Time>2014-10-20T19:29:30+00:00</cyboxCommon:Received_Time>
        </stixCommon:Time>
      </indicator:Producer>
    </stix:Indicator>
  </stix:Indicators>
The XML you're looking at is STIX. Check out https://www.eclecticiq.com/stix-taxii, then follow the link to the STIX website and find the "Tooling" section (bottom right). You should find libraries and parsing tools to make it useful.
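If you want to stay in Python, here is a minimal sketch using lxml, with the namespace URIs copied from the STIX_Package sample above. One caution, stated as an assumption: the loop in the question writes many content blocks into a single .xml file, so that file holds several concatenated XML documents and will not parse as one; parsing each content block separately avoids this.

from cabby import create_client
from lxml import etree

# Namespace URIs copied from the STIX_Package sample in the question
NSMAP = {
    'stix': 'http://stix.mitre.org/stix-1',
    'indicator': 'http://stix.mitre.org/Indicator-2',
}

client = create_client('hailataxii.com', use_https=False,
                       discovery_path='/taxii-discovery-service')

# Each content block is its own XML document, so parse blocks one at a time
for block in client.poll(collection_name='guest.Abuse_ch'):
    root = etree.fromstring(block.content)
    for ind in root.findall('.//stix:Indicator', namespaces=NSMAP):
        record = {
            'id': ind.get('id'),
            'title': ind.findtext('indicator:Title', namespaces=NSMAP),
            'description': ind.findtext('indicator:Description', namespaces=NSMAP),
        }
        print(record)  # each record is now a plain dict, ready for a database
    break  # first block only, for demonstration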
Alternatively, there are commercial platforms available to do the data-processing magic. Google "Threat Intelligence Platform".
Cheers,
Joep
Founder
EclecticIQ

KeyError with Youtube API using python

This is my first time to ask something here. I've been trying to access the Youtube API to get something for an experiment I'm doing. Everything's working so far. I just wanted to ask about this very inconsistent error that I'm getting.
-----------
1
Title: All Movie Trailers of New York Comic-Con (2016) Power Rangers, John Wick 2...
Uploaded by: KinoCheck International
Uploaded on: 2016-10-12T14:43:42.000Z
Video ID: pWOH-OZQUj0
2
Title: Movieclips Trailers
Uploaded by: Movieclips Trailers
Uploaded on: 2011-04-01T18:43:14.000Z
Video ID: Traceback (most recent call last):
File "scrapeyoutube.py", line 24, in <module>
print "Video ID:\t", search_result['id']['videoId']
KeyError: 'videoId'
I tried getting the video ID ('videoId' as per the documentation). But for some reason, the code works for the first result and then totally flops for the second one. It's weird because it's only happening for this particular element. Everything else ('description', 'publishedAt', etc.) is working. Here's my code:
from apiclient.discovery import build
import json
import pprint
import sys

APINAME = 'youtube'
APIVERSION = 'v3'
APIKEY = 'secret teehee'
service = build(APINAME, APIVERSION, developerKey=APIKEY)

# volumes source ('public'), search query ('androide')
searchrequest = service.search().list(q='movie trailers', part='id, snippet', maxResults=25).execute()

searchcount = 0
print "-----------"
for search_result in searchrequest.get("items", []):
    searchcount += 1
    print searchcount
    print "Title:\t", search_result['snippet']['title']
    # print "Description:\t", search_result['snippet']['description']
    print "Uploaded by:\t", search_result['snippet']['channelTitle']
    print "Uploaded on:\t", search_result['snippet']['publishedAt']
    print "Video ID:\t", search_result['id']['videoId']
Hope you guys can help me. Thanks!
Use the .get() method on the result:
result['id'].get('videoId')
Some elements do not have this key. If you use square brackets, Python throws a KeyError exception, but if you use the .get() method, Python returns None for elements which do not have the 'videoId' key.
The search() method returns channels and playlists together with videos, which is likely the cause of your problem.
I use their interactive playgrounds to learn the structure of the returned JSON, functions, etc. For your question, I suggest visiting https://developers.google.com/youtube/v3/docs/search/list .
Make sure an item's kind is "youtube#video" before accessing that item's videoId.
Sample of code:
...
for index in response["items"]:  # response is the JSON I got from the API
    tmp = {}  # temporary dict to insert into my custom JSON
    if index["id"]["kind"] == "youtube#video":
        tmp["videoID"] = index["id"]["videoId"]
...
This is a part of code from my personal project I am currently working on.
This is because for the key "id", some results return:
{u'kind': u'youtube#playlist', u'playlistId': u'PLd0_QArxznVHnlvJp0ki5bpmBj4f64J7P'}
As you can see, there is no "videoId" key there.
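To avoid the KeyError at the source, search.list also accepts a type parameter that restricts results to videos; combined with .get() this gives a safe pattern. A sketch based on the question's code:

# Restrict the search to videos so every item's id should contain a videoId
searchrequest = service.search().list(q='movie trailers', part='id,snippet',
                                      maxResults=25, type='video').execute()

for search_result in searchrequest.get('items', []):
    video_id = search_result['id'].get('videoId')  # defensive lookup anyway
    if video_id is not None:
        print "Video ID:\t", video_id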

How to create random entries in database with python

I need to create random entries in a database for a given SQL schema, using Python.
Is there a simple way to do that, or do I have to write my own generators?
You could use Wikipedia as a data source. Select categories that are relevant to your schema and pick random articles from those categories.
This code accesses CatScan using requests for convenience. Maybe there is a library that can do the same (return pages in a Wikipedia category), but writing this short piece of code was easier than finding one.
choice selects a random element from a list.
from random import choice
from requests import post

def title(page):
    return page['a']['title'].split('(')[0].replace('_', ' ').strip()

def category(name, depth=0):
    url = 'https://tools.wmflabs.org/catscan2/catscan2.php'
    payload = {
        'categories': name,
        'depth': depth,
        'format': 'json',
        'doit': 'Do it!',
    }
    category = post(url, data=payload).json()['*'][0]['a']['*']
    return [title(page) for page in category]

first = category('Italian masculine given names')
last = category('Surnames of Italian origin')
work = category('Organized crime members by role')

for i in range(10):
    print(*map(choice, (first, last, work)), sep=',')
The result:
$ python random_data.py | column -t -s,
Santino Comolli Boss
Constantino Furlan Made man
Ernesto Forlán Consigliere
Silvestro Gherardi Informant
Adelmo Mancuso Bagman
Giuliano Paganelli Made man
Renato Barberis Capobastone
Roberto Comollo Consigliere
Dario Speroni Consigliere
Gastone Pestalozzi Underboss
You could try a recently launched Python package called pydbgen. Here is an article about it:
Introducing pydbgen: A random dataframe/database table generator
For example, the following generates a .DB file which can be used with MySQL or SQLite. The resulting database table can be opened and inspected in DB Browser for SQLite:
myDB.gen_table(db_file='Testdb.DB', table_name='People',
               fields=['name', 'city', 'street_address', 'email'])
As you can see, the db file name, the table name, and the fields can be specified.
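For completeness, gen_table is called on a pydb object. Based on the linked article, the setup might look like this (an assumption; the exact API may differ between pydbgen versions):

from pydbgen import pydbgen

# Assumed setup per the pydbgen article; verify against your installed version
myDB = pydbgen.pydb()
myDB.gen_table(db_file='Testdb.DB', table_name='People',
               fields=['name', 'city', 'street_address', 'email'])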
You can also use Faker:
pip install faker
Then go through the documentation and check it out.
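As an illustration, a minimal sketch that feeds Faker output into a SQLite table (the table name and columns here are made up for the example):

import sqlite3
from faker import Faker

fake = Faker()
conn = sqlite3.connect('test.db')
conn.execute('CREATE TABLE IF NOT EXISTS people (name TEXT, address TEXT, email TEXT)')

# Insert ten rows of randomly generated people
for _ in range(10):
    conn.execute('INSERT INTO people VALUES (?, ?, ?)',
                 (fake.name(), fake.address(), fake.email()))

conn.commit()
conn.close()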

NetFlix API : "expand" not working with pyflix2

The Netflix API says that this URL:
http://api-public.netflix.com/catalog/titles/movies/60036637?expand=synopsis%20cast%20directors%20awards%20similars
should work when getting title details, and should expand all the links sent in the parameter "expand", i.e. synopsis, cast, directors, awards, and similars. However, in pyflix2 (a Python implementation for accessing the Netflix API), when I do the same it doesn't work.
The code I am using is like below:
import pyflix2
netflix = pyflix2.NetflixAPIV2( 'App_ID', 'APP_KEY', 'SHARED_SECRET')
movieDetails = netflix.get_title(title["id"], "synopsis cast directors awards similars")
The get_title() function is written as follows in the pyflix2 lib (I have tweaked it to get the required functionality):
def get_title(self, id, category=None):
    if id.startswith('http'):
        url = id
        url = "%s?expand=%s" % (url, urllib.quote(category))
        return self._request('get', url).json()
    else:
        raise NetflixError("The id should be like: http://api.netflix.com/catalog/movies/60000870")
This still returns collapsed links, which I would have gotten with this query:
http://api-public.netflix.com/catalog/titles/movies/60036637
I feel the parameter I am attaching is getting malformed somewhere and is being omitted. I have tried the same URL on this link:
http://kentbrewster.com/netflix-api-explorer/
and it gives me the correct response. Maybe pyflix2 is going wrong somewhere. Has anyone worked on the same and faced this issue? Please respond ASAP.
You are providing a space-separated list of expand topics; it should be a comma-separated list. Also, for the V2 API the expands look like #synopsis, #cast, etc.
The URI in the case of V2 should look like: http://api.netflix.com/catalog/titles/movies/493387?expand=#cast,#synopsis&v=2.0
def get_title(self, id, category=None):
    if id.startswith('http'):
        url = id
        data = {"expand": ",".join(category)}  # join the topics with commas
        return self._request('get', url, data).json()
    else:
        raise NetflixError("The id should be like: http://api.netflix.com/catalog/movies/60000870")
(I have removed all error checks in the above code, for clarity)
Call this as:
movieDetails = netflix.get_title(title["id"], ["#synopsis", "#cast", "#directors", "#awards", "#similars"]);
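For reference, joining that list produces the comma-separated value the V2 API expects; a quick check:

category = ["#synopsis", "#cast", "#directors", "#awards", "#similars"]
print(",".join(category))
# prints: #synopsis,#cast,#directors,#awards,#similars
# so the request resembles:
# http://api.netflix.com/catalog/titles/movies/493387?expand=#synopsis,#cast&v=2.0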

Google Analytics and Python

I'm brand new at Python and I'm trying to write an extension to an app that imports GA information and parses it into MySQL. There is a shamefully sparse amount of information on the topic. The Google docs only seem to have examples in JS and Java.
I have gotten to the point where my user can authenticate into GA using AuthSub. That code is here:
import gdata.service
import gdata.analytics
from django import http
from django import shortcuts
from django.shortcuts import render_to_response

def authorize(request):
    next = 'http://localhost:8000/authconfirm'
    scope = 'https://www.google.com/analytics/feeds'
    secure = False  # set secure=True to request secure AuthSub tokens
    session = False
    auth_sub_url = gdata.service.GenerateAuthSubRequestUrl(next, scope, secure=secure, session=session)
    return http.HttpResponseRedirect(auth_sub_url)
So, the next step is getting at the data. I have found this library (beware, the UI is offensive): http://gdata-python-client.googlecode.com/svn/trunk/pydocs/gdata.analytics.html
However, I have found it difficult to navigate. It seems like I should be using gdata.analytics.AnalyticsDataEntry.getDataEntry(), but I'm not sure what it is asking me to pass it.
I would love a push in the right direction. I feel I've exhausted Google looking for a working example.
Thank you!!
EDIT: I have gotten farther, but my problem still isn't solved. The method below returns data (I believe)... the error I get is: "'str' object has no attribute '_BecomeChildElement'". I believe I am returning a feed? However, I don't know how to drill into it. Is there a way for me to inspect this object?
def auth_confirm(request):
    gdata_service = gdata.service.GDataService('iSample_acctSample_v1.0')
    feedUri = 'https://www.google.com/analytics/feeds/accounts/default?max-results=50'
    # request feed
    feed = gdata.analytics.AnalyticsDataFeed(feedUri)
    print str(feed)
Maybe this post can help out. It seems there are no Analytics-specific bindings yet, so you are working with the generic gdata.
I've been using GA for a little over a year now, and since about April 2009 I have used the Python bindings supplied in a package called python-googleanalytics by Clint Ecker et al. So far, it works quite well.
Here's where to get it: http://github.com/clintecker/python-googleanalytics.
Install it the usual way.
To use it: First, so that you don't have to manually pass in your login credentials each time you access the API, put them in a config file like so:
[Credentials]
google_account_email = youraccount@gmail.com
google_account_password = yourpassword
Name this file '.pythongoogleanalytics' and put it in your home directory.
And from an interactive prompt type:
from googleanalytics import Connection
import datetime

connection = Connection()  # pass in id & pw as strings **if** not in config file
account = connection.get_account(<*your GA profile ID goes here*>)
start_date = datetime.date(2009, 12, 1)
end_date = datetime.date(2009, 12, 13)
# account object does the work, specify what data you want w/
# 'metrics' & 'dimensions'; see 'USAGE.md' file for examples
account.get_data(start_date=start_date, end_date=end_date, metrics=['visits'])
The get_data method returns a Python list which contains your data (the get_account call above returns the account object, bound here to the variable 'account').
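For instance, a follow-up query with a dimension might look like this (a sketch; it assumes get_data accepts a dimensions list as USAGE.md suggests):

# Hypothetical query: visits broken down by browser for the same date range
data = account.get_data(start_date=start_date, end_date=end_date,
                        metrics=['visits'], dimensions=['browser'])
print data  # inspect the returned object; USAGE.md shows the access patterns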
You need 3 files within the app: client_secrets.json, analytics.dat, and google_auth.py.
Create a module Query.py within the app:
class Query(object):
    def __init__(self, startdate, enddate, filter, metrics):
        self.startdate = startdate.strftime('%Y-%m-%d')
        self.enddate = enddate.strftime('%Y-%m-%d')
        self.filter = "ga:medium=" + filter
        self.metrics = metrics
Example models.py, which has the following function:
import google_auth
service = google_auth.initialize_service()

def total_visit(self):
    object = AnalyticsData.objects.get(utm_source=self.utm_source)
    trial = Query(object.date.startdate, object.date.enddate, object.utm_source, "ga:sessions")
    result = service.data().ga().get(ids='ga:<your-profile-id>', start_date=trial.startdate,
                                     end_date=trial.enddate, filters=trial.filter,
                                     metrics=trial.metrics).execute()
    total_visit = result.get('rows')
    # your save command, e.g. ColumnName.objects.create(data=total_visit), goes here
