I need to create random entries in a SQL database that match a given SQL schema, using Python.
Is there a simple way to do that, or do I have to write my own generators?
You could use Wikipedia as a data source. Select categories that are relevant to your schema and pick random articles from those categories.
This code accesses CatScan using requests for convenience. Maybe there is a library that can do the same (return pages in a Wikipedia category), but writing this short piece of code was easier than finding one.
choice selects a random element from a list.
from random import choice
from requests import post

def title(page):
    return page['a']['title'].split('(')[0].replace('_', ' ').strip()

def category(name, depth=0):
    url = 'https://tools.wmflabs.org/catscan2/catscan2.php'
    payload = {
        'categories': name,
        'depth': depth,
        'format': 'json',
        'doit': 'Do it!',
    }
    category = post(url, data=payload).json()['*'][0]['a']['*']
    return [title(page) for page in category]

first = category('Italian masculine given names')
last = category('Surnames of Italian origin')
work = category('Organized crime members by role')

for i in range(10):
    print(*map(choice, (first, last, work)), sep=',')
The result:
$ python random_data.py | column -t -s,
Santino Comolli Boss
Constantino Furlan Made man
Ernesto Forlán Consigliere
Silvestro Gherardi Informant
Adelmo Mancuso Bagman
Giuliano Paganelli Made man
Renato Barberis Capobastone
Roberto Comollo Consigliere
Dario Speroni Consigliere
Gastone Pestalozzi Underboss
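Since the goal is random SQL entries, here is a minimal sketch of turning those lists into INSERT statements; the mobsters(first, last, role) table and its columns are hypothetical placeholders:
# Emit INSERT statements from the category lists built above.
# The table and column names are made up for illustration.
for _ in range(10):
    print("INSERT INTO mobsters (first, last, role) VALUES "
          "('{}', '{}', '{}');".format(choice(first), choice(last), choice(work)))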
You could try a recently launched Python package called pydbgen.
Here is an article about it.
Introducing pydbgen: A random dataframe/database table generator
For example, this generates a .DB file which can be used with MySQL or SQLite. The resulting database table, opened in DB Browser for SQLite, looks like the following:
from pydbgen import pydbgen

myDB = pydbgen.pydb()  # instantiate the generator object first (per the pydbgen docs)
myDB.gen_table(db_file='Testdb.DB', table_name='People',
               fields=['name', 'city', 'street_address', 'email'])
As you can see, the db file name, the table name, and the fields can be specified.
You can also use faker.
Just pip install faker, then go through the documentation and check it out.
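For instance, a minimal sketch (the fields shown are arbitrary examples):
from faker import Faker

fake = Faker()
for _ in range(5):
    # Each call returns a fresh randomly generated value.
    print(fake.name(), '|', fake.street_address(), '|', fake.email())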
I am playing around with the Zillow API, but I am having trouble retrieving the rent data. Currently I am using a Python Zillow wrapper, but I am not sure if it works for pulling the rent data.
This is the help page I am using for the Zillow API:
https://www.zillow.com/howto/api/GetSearchResults.htm
import pyzillow
from pyzillow.pyzillow import ZillowWrapper, GetDeepSearchResults
import pandas as pd
house = pd.read_excel('Housing_Output.xlsx')
### Login to Zillow API
address = ['123 Test Street City, State Abbreviation'] # Fill this in with an address
zip_code = ['zip code'] # fill this in with a zip code
zillow_data = ZillowWrapper('API_KEY')  # fill this in with your Zillow API key
deep_search_response = zillow_data.get_deep_search_results(address, zip_code)
result = GetDeepSearchResults(deep_search_response)
# These API calls work, but I am not sure how to retrieve the rent data
print(result.zestimate_amount)
print(result.tax_value)
ADDING ADDITIONAL INFO:
Chapter 2 discusses how to pull rent data by creating an XML function called zillowProperty. My XML skills aren't great, but I think I need to either:
a) import an XML package to help read it, or
b) save the code as an XML file and use the open function to read the file.
https://www.amherst.edu/system/files/media/Comprehensive_Evaluation_-_Ningyue_Christina_Wang.pdf
I am trying to provide the code in here, but it won't let me break to the next line for some reason.
We can see that rent is not a field one can get using the pyzillow package, by looking at the attributes of your result via dir(result), as well as at the code here: Pyzillow source code.
However, thanks to the beauty of open source, you can edit the source code of this package and get the functionality you are looking for. Here is how:
First, locate where the code sits on your hard drive. Import pyzillow, and run:
pyzillow?
The File field shows this for me:
c:\programdata\anaconda3\lib\site-packages\pyzillow\__init__.py
Hence go to c:\programdata\anaconda3\lib\site-packages\pyzillow (or whatever it shows for you) and open the pyzillow.py file with a text editor.
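(Outside IPython, print(pyzillow.__file__) reveals the same location.)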
Now we need to make two changes.
One: Inside the get_deep_search_results function, you'll see params. We need to edit that to turn the rentzestimate feature on. So change that function to:
def get_deep_search_results(self, address, zipcode):
    """
    GetDeepSearchResults API
    """
    url = 'http://www.zillow.com/webservice/GetDeepSearchResults.htm'
    params = {
        'address': address,
        'citystatezip': zipcode,
        'zws-id': self.api_key,
        'rentzestimate': True  # This is the only line we add
    }
    return self.get_data(url, params)
Two: Go to class GetDeepSearchResults(ZillowResults), and add the following into the attribute_mapping dictionary:
'rentzestimate_amount': 'result/rentzestimate/amount'
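After the edit, the dictionary should look roughly like this (the existing entries are elided here; only the last line is new):
attribute_mapping = {
    # ... the existing mappings stay exactly as they are ...
    'rentzestimate_amount': 'result/rentzestimate/amount',  # new entry
}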
Voilà! The customized and updated Python package now returns the Rent Zestimate. Let's try:
from pyzillow.pyzillow import ZillowWrapper, GetDeepSearchResults
address = ['11 Avenue B, Johnson City, NY']
zip_code = ['13790']
zillow_data = ZillowWrapper('X1-ZWz1835knufc3v_38l6u')
deep_search_response = zillow_data.get_deep_search_results(address, zip_code)
result = GetDeepSearchResults(deep_search_response)
print(result.rentzestimate_amount)
Which correctly returns the Rent Zestimate of $1200, which can be validated at the Zillow page of that address.
I'm using the cabby API: https://github.com/EclecticIQ/cabby
in hopes of pulling STIX info through the TAXII client.
I've got my Python code pulling the data from www.hailataxii.com.
The data is in a container and I can flip through it. It looks like XML, but no XML parser will read or manipulate the data. I'd love to put each record into a dictionary, then put the data into some kind of database, but until I find a way to access the data from the download, I'm at a loss. There is very little documentation and there are very few examples.
Any suggestions would be greatly appreciated.
Here is my basic code for testing:
import pprint
from cabby import create_client

HailATaxiiFeedList = [
    'guest.Abuse_ch',
    'guest.CyberCrime_Tracker',
    'guest.EmergingThreats_rules',
    'guest.Lehigh_edu',
    'guest.MalwareDomainList_Hostlist',
    'guest.blutmagie_de_torExits',
    'guest.dataForLast_7daysOnly',
    'guest.dshield_BlockList',
    'guest.phishtank_com'
]

client = create_client(
    'hailataxii.com',
    use_https=False,
    discovery_path='/taxii-discovery-service')

print(": Discover_Collections:")
services = client.discover_services()
for service in services:
    print('Service type= {s.type} , address= {s.address}'.format(s=service))

print(": Get_Collections:")
collections = client.get_collections(
    uri='http://hailataxii.com/taxii-data')

for collection_name in HailATaxiiFeedList:
    print("Polling :", collection_name, ".. could take a while, please be patient..")
    file = open(("./iocs/" + collection_name + ".xml"), "w")
    content_blocks = client.poll(collection_name=collection_name)
    count = 1
    for block in content_blocks:
        taxii_message = block.content.decode('utf-8')
        file.write(taxii_message)
        count += 1
        if count > 20:  # just getting the top 20 objects because the lists are huge
            break
    file.close()
The output looks like XML, but no XML parser will touch it.
<stix:STIX_Package xmlns:cyboxCommon="http://cybox.mitre.org/common-2" xmlns:cybox="http://cybox.mitre.org/cybox-2" xmlns:cyboxVocabs="http://cybox.mitre.org/default_vocabularies-2" xmlns:marking="http://data-marking.mitre.org/Marking-1" xmlns:simpleMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Simple-1" xmlns:tlpMarking="http://data-marking.mitre.org/extensions/MarkingStructure#TLP-1" xmlns:TOUMarking="http://data-marking.mitre.org/extensions/MarkingStructure#Terms_Of_Use-1" xmlns:opensource="http://hailataxii.com" xmlns:edge="http://soltra.com/" xmlns:indicator="http://stix.mitre.org/Indicator-2" xmlns:ttp="http://stix.mitre.org/TTP-1" xmlns:stixCommon="http://stix.mitre.org/common-1" xmlns:stixVocabs="http://stix.mitre.org/default_vocabularies-1" xmlns:stix="http://stix.mitre.org/stix-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:taxii="http://taxii.mitre.org/messages/taxii_xml_binding-1" xmlns:taxii_11="http://taxii.mitre.org/messages/taxii_xml_binding-1.1" xmlns:tdq="http://taxii.mitre.org/query/taxii_default_query-1" id="edge:Package-a2c8f8f2-5a4d-4f0e-92be-d3fa482247d0" version="1.1.1" timestamp="2017-10-09T20:39:36.179672+00:00">
<stix:STIX_Header>
<stix:Handling>
<marking:Marking>
<marking:Controlled_Structure>../../../../descendant-or-self::node()</marking:Controlled_Structure>
<marking:Marking_Structure xsi:type="tlpMarking:TLPMarkingStructureType" color="WHITE"/>
<marking:Marking_Structure xsi:type="TOUMarking:TermsOfUseMarkingStructureType">
<TOUMarking:Terms_Of_Use>zeustracker.abuse.ch | Abuse source[https://sslbl.abuse.ch/blacklist/] - As for all abuse.ch projects, the use of the SSL Blacklist is free for both commercial and non-commercial usage without any limitation. However, if you are a commercial vendor of security software/services and you want to integrate data from the SSL Blacklist into your products / services, you will have to ask for permission first by contacting me using the contact form [http://www.abuse.ch/?page_id=4727].'
</TOUMarking:Terms_Of_Use>
</marking:Marking_Structure>
<marking:Marking_Structure xsi:type="simpleMarking:SimpleMarkingStructureType">
<simpleMarking:Statement>Unclassified (Public)</simpleMarking:Statement>
</marking:Marking_Structure>
</marking:Marking>
</stix:Handling>
</stix:STIX_Header>
<stix:Indicators>
<stix:Indicator id="opensource:indicator-00398022-0d9c-474b-b543-31b85a4f22ab" timestamp="2014-10-31T16:44:24.766014+00:00" xsi:type="indicator:IndicatorType" version="2.1.1">
<indicator:Title>ZeuS Tracker (offline)| s-k.kiev.ua/html/30/config.bin (2014-10-13) | This domain has been identified as malicious by zeustracker.abuse.ch</indicator:Title>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">Domain Watchlist</indicator:Type>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">URL Watchlist</indicator:Type>
<indicator:Type xsi:type="stixVocabs:IndicatorTypeVocab-1.1">File Hash Watchlist</indicator:Type>
<indicator:Description>This domain s-k.kiev.ua has been identified as malicious by zeustracker.abuse.ch. For more detailed infomation about this indicator go to [CAUTION!!Read-URL-Before-Click] [https://zeustracker.abuse.ch/monitor.php?host=s-k.kiev.ua].</indicator:Description>
<indicator:Observable idref="opensource:Observable-94ead651-1df5-4cfe-b4bb-e34ce5e60224">
</indicator:Observable>
<indicator:Indicated_TTP>
<stixCommon:TTP idref="opensource:ttp-6055672f-ecfd-40ae-aa84-0b336a5accb6" xsi:type="ttp:TTPType"/>
</indicator:Indicated_TTP>
<indicator:Producer>
<stixCommon:Identity id="opensource:Identity-3066ae12-3db6-44dd-9636-6b083b6479dc">
<stixCommon:Name>zeustracker.abuse.ch</stixCommon:Name>
</stixCommon:Identity>
<stixCommon:Time>
<cyboxCommon:Produced_Time>2014-10-13T00:00:00+00:00</cyboxCommon:Produced_Time>
<cyboxCommon:Received_Time>2014-10-20T19:29:30+00:00</cyboxCommon:Received_Time>
</stixCommon:Time>
</indicator:Producer>
</stix:Indicator>
</stix:Indicators>
The XML you're looking at is STIX. Check out https://www.eclecticiq.com/stix-taxii, then follow the link to the STIX website and find the "Tooling" section (bottom right). You should find libraries and parsing tools there to make the data useful.
Alternatively, there are commercial platforms available to do data processing magic. Google "Threat Intelligence Platform".
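One thing worth checking first: the polling loop above writes up to 20 content blocks into a single file, so that file has multiple root elements and is not well-formed XML, which would explain parsers refusing it. A single STIX_Package block parses fine once you register the namespaces explicitly; here is a minimal sketch (the file name and the fields extracted are illustrative):
from lxml import etree

NS = {
    'stix': 'http://stix.mitre.org/stix-1',
    'indicator': 'http://stix.mitre.org/Indicator-2',
}

# Parse one STIX_Package document and pull each indicator into a dict.
tree = etree.parse('./iocs/one_package.xml')  # a file holding a single package
records = []
for ind in tree.findall('.//stix:Indicators/stix:Indicator', namespaces=NS):
    records.append({
        'id': ind.get('id'),
        'title': ind.findtext('indicator:Title', namespaces=NS),
        'description': ind.findtext('indicator:Description', namespaces=NS),
    })
print(records[:3])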
Cheers,
Joep
Founder
EclecticIQ
I have a JSON document in my database that I want to modify frequently from my Python program, once every 25 seconds. I know how to upload a document to the database and read a document from it, but I do not know how to modify/replace a document.
This link shows the functions offered in the Python module. I see the ReplaceDocument function, but it takes a document link. How can I get the document link? Where am I supposed to look for this information?
Thanks.
It sounds like you have resolved it. Just as a summary, the code is below.
# Query a document
query = { 'query': 'SELECT * FROM <collection name> ....'}
docs = client.QueryDocuments(coll_link, query)
doc = list(docs)[0]
# Get the document link from attribute `_self`
doc_link = doc['_self']
# Modify the document
.....
# Replace the document via document link
client.ReplaceDocument(doc_link, doc)
April 2020
If you are reading MS Azure's Quickstart guide and following a supporting git repo, note that there might be some differences.
For example,
from azure.cosmos import exceptions, CosmosClient, PartitionKey
endpoint = 'endpoint'
key = 'key'
db_name = 'cosmos-db-name'
container_name = 'container-name'
client = CosmosClient(endpoint, key)
db = client.create_database_if_not_exists(id=db_name)
container = db.create_container_if_not_exists(id=container_name, partition_key=PartitionKey(path="/.."), offer_throughput=456)
...
# Replace item
container.replace_item(doc_link, doc)
When it comes to doc_link and doc in the above case, I encountered an error when I used doc['_self'] as the link. By passing the primary key (the item's id) instead, the doc is updated.
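A minimal sketch of that read-modify-replace cycle with the 4.x azure-cosmos SDK (the item id, partition key value, and counter field are placeholders):
# Read the item by id, change a field, and write it back by its id.
doc = container.read_item(item='item-id', partition_key='partition-value')
doc['counter'] = doc.get('counter', 0) + 1
container.replace_item(item=doc['id'], body=doc)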
I’m implementing a service in Python that interacts with Magento through SOAP v2. So far, I’m able to get the product list doing something like this:
import suds
from suds.client import Client
wsdl_file = 'http://server/api/v2_soap?wsdl=1'
user = 'user'
password = 'password'
client = Client(wsdl_file) # load the wsdl file
session = client.service.login(user, password) # login and create a session
client.service.catalogProductList(session)
However, I’m not able to create a product, as I don’t really know what data I should send and how to send it. I know the method I have to use is catalogProductCreate, but the PHP examples shown here don’t really help me.
Any input would be appreciated.
Thank you in advance.
You are almost there. I think your issue is translating the PHP array of arguments into Python key-value pairs. If that's the case, this will help you a bit. You also need to fetch the attribute set using catalogProductAttributeSetList before you create the product:
attributeSets = client.service.catalogProductAttributeSetList(session)
# print attributeSets
# Not entirely sure how to pick the correct element from the attributeSets
# array; hope the code below works.
attributeSet = attributeSets[0]

product_details = [{'name': 'Your Product Name'}, {'description': 'Product description'}, {'short_description': 'Product short description'}, {'weight': '10'}, {'status': '1'}, {'url_key': 'product-url-key'}, {'url_path': 'product-url-path'}, {'visibility': 4}, {'price': 100}, {'tax_class_id': 1}, {'categories': [2]}, {'websites': [1]}]
client.service.catalogProductCreate(session, 'simple', attributeSet.set_id, 'your_product_sku', product_details)
suds.TypeNotFound: Type not found: 'productData'
This is because the productData format is not right. You may want to change from
product_details = [{'name': 'Your Product Name'}, {'description': 'Product description'}, {'short_description': 'Product short description'}, {'weight': '10'}, {'status': '1'}, {'url_key': 'product-url-key'}, {'url_path': 'product-url-path'}, {'visibility': 4}, {'price': 100}, {'tax_class_id': 1}, {'categories': [2]}, {'websites': [1]}]
to
product_details = {'name': 'Your Product Name', 'description': 'Product description', 'short_description': 'Product short description', 'weight': '10', 'status': '1', 'url_key': 'product-url-key', 'url_path': 'product-url-path', 'visibility': 4, 'price': 100, 'tax_class_id': 1, 'categories': [2], 'websites': [1]}
It works for me.
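Putting it together, a minimal end-to-end sketch (the SKU and attribute values are placeholders; client, session, and attributeSet come from the code above):
product_details = {
    'name': 'Your Product Name',
    'description': 'Product description',
    'short_description': 'Product short description',
    'weight': '10',
    'status': '1',
    'visibility': 4,
    'price': 100,
    'tax_class_id': 1,
    'websites': [1],
}
# catalogProductCreate returns the new product's id on success.
new_id = client.service.catalogProductCreate(
    session, 'simple', attributeSet.set_id, 'your_product_sku', product_details)
print('Created product id:', new_id)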
I'm brand new at Python and I'm trying to write an extension to an app that imports GA information and parses it into MySQL. There is a shamefully sparse amount of information on the topic. The Google docs only seem to have examples in JS and Java...
...I have gotten to the point where my user can authenticate into GA using AuthSub. That code is here:
import gdata.service
import gdata.analytics
from django import http
from django import shortcuts
from django.shortcuts import render_to_response
def authorize(request):
    next = 'http://localhost:8000/authconfirm'
    scope = 'https://www.google.com/analytics/feeds'
    secure = False  # set secure=True to request secure AuthSub tokens
    session = False
    auth_sub_url = gdata.service.GenerateAuthSubRequestUrl(next, scope, secure=secure, session=session)
    return http.HttpResponseRedirect(auth_sub_url)
So, the next step is getting at the data. I have found this library: (beware, the UI is offensive) http://gdata-python-client.googlecode.com/svn/trunk/pydocs/gdata.analytics.html
However, I have found it difficult to navigate. It seems like I should be using gdata.analytics.AnalyticsDataEntry.getDataEntry(), but I'm not sure what it is asking me to pass it.
I would love a push in the right direction. I feel I've exhausted google looking for a working example.
Thank you!!
EDIT: I have gotten farther, but my problem still isn't solved. The method below returns data (I believe). The error I get is: "'str' object has no attribute '_BecomeChildElement'". I believe I am returning a feed? However, I don't know how to drill into it. Is there a way for me to inspect this object?
def auth_confirm(request):
    gdata_service = gdata.service.GDataService('iSample_acctSample_v1.0')
    feedUri = 'https://www.google.com/analytics/feeds/accounts/default?max-results=50'
    # request feed
    feed = gdata.analytics.AnalyticsDataFeed(feedUri)
    print str(feed)
Maybe this post can help out. It seems like there are no Analytics-specific bindings yet, so you are working with the generic gdata.
I've been using GA for a little over a year now, and since about April 2009 I have used the Python bindings supplied in a package called python-googleanalytics by Clint Ecker et al. So far, it works quite well.
Here's where to get it: http://github.com/clintecker/python-googleanalytics.
Install it the usual way.
To use it: First, so that you don't have to manually pass in your login credentials each time you access the API, put them in a config file like so:
[Credentials]
google_account_email = youraccount@gmail.com
google_account_password = yourpassword
Name this file '.pythongoogleanalytics' and put it in your home directory.
And from an interactive prompt type:
from googleanalytics import Connection
import datetime
connection = Connection()  # pass in id & pw as strings if not in the config file
account = connection.get_account(<your GA profile ID goes here>)
start_date = datetime.date(2009, 12, 1)
end_date = datetime.date(2009, 12, 13)
# The account object does the work; specify what data you want with
# 'metrics' & 'dimensions'; see the 'USAGE.md' file for examples
account.get_data(start_date=start_date, end_date=end_date, metrics=['visits'])
The get_data call returns a Python list which contains your data.
You need 3 files within the app: client_secrets.json, analytics.dat and google_auth.py.
Create a module Query.py within the app:
class Query(object):
    def __init__(self, startdate, enddate, filter, metrics):
        self.startdate = startdate.strftime('%Y-%m-%d')
        self.enddate = enddate.strftime('%Y-%m-%d')
        self.filter = "ga:medium=" + filter
        self.metrics = metrics
Example models.py, which has the following function:
import google_auth

service = google_auth.initialize_service()

def total_visit(self):
    object = AnalyticsData.objects.get(utm_source=self.utm_source)
    trial = Query(object.date.startdate, object.date.enddate, object.utm_source, "ga:sessions")
    result = service.data().ga().get(ids='ga:<your-profile-id>', start_date=trial.startdate, end_date=trial.enddate, filters=trial.filter, metrics=trial.metrics).execute()
    total_visit = result.get('rows')
    # <your save command, ColumnName.objects.create(data=total_visit), goes here>