I am using the following packages:
import pandas as pd
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.exceptions import KustoServiceError
from azure.kusto.data.helpers import dataframe_from_result_table
I would like to see the detailed output of the results. I followed the official tutorial, but I am not sure whether I did it correctly:
x = dataframe_from_result_table(response.primary_results[0])
The result looks like this:
Empty DataFrame
Columns: [Resource]
Index: []
Is this result wrong or normal?
If it is normal, how do I read the values? What would the output look like if the query had returned data?
I want to see the contents of the Resource column listed in Columns: [Resource], because that is the output I need. (I am using translation software, so please bear with me.)
The official documentation says I can manipulate the data with pandas, but I cannot get the data out.
When I use other query statements, the result shows:
Empty DataFrame
Columns: [Tag,Level,Sequence,Message,Metrics]
Index: []
How do I retrieve the values of Tag, Level, Sequence, Message, Metrics from the results?
The result class looks like this
You can play with the publicly available help cluster.
Please note that the connection requires an interactive login.
A login window will pop up when you execute the code.
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.helpers import dataframe_from_result_table
cluster = "https://help.kusto.windows.net"
db = "Samples"
query = """
StormEvents
| summarize count() by EventType
| top 5 by count_
"""
kcsb = KustoConnectionStringBuilder.with_interactive_login(cluster)
client = KustoClient(kcsb)
response = client.execute(db, query)
df = dataframe_from_result_table(response.primary_results[0])
print(df)
           EventType  count_
0  Thunderstorm Wind   13015
1               Hail   12711
2        Flash Flood    3688
3            Drought    3616
4     Winter Weather    3349
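For the original question about reading values out of the result: once the DataFrame has rows, ordinary pandas indexing applies. The sketch below rebuilds the DataFrame printed above by hand, so it runs without a cluster login. An "Empty DataFrame" with only a column list most likely means the query matched no rows, which `df.empty` can confirm; that reading is an assumption about the asker's data, not something verified here.

```python
import pandas as pd

# Rebuild the DataFrame printed above by hand so the sketch runs offline.
df = pd.DataFrame({
    "EventType": ["Thunderstorm Wind", "Hail", "Flash Flood", "Drought", "Winter Weather"],
    "count_": [13015, 12711, 3688, 3616, 3349],
})

# A frame with column names but no rows, like the one in the question.
empty = pd.DataFrame(columns=["Resource"])

if empty.empty:
    print("No rows came back; only the column names are present.")

if not df.empty:
    event_types = df["EventType"].tolist()   # one column as a Python list
    first_row = df.iloc[0].to_dict()         # one row as a dict
    for row in df.itertuples(index=False):   # iterate row by row
        print(row.EventType, row.count_)
```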
P.S.
Another option is to leverage the KWE (Kusto Web Explorer) experience.
Get your own free cluster and easily ingest data using OneClick.
I have a data frame:
I want to automatically insert the contents of the data frame into Quip. I have searched online but couldn't find a satisfactory answer. Please help.
Here is my answer, based on a similar problem and this article: https://towardsdatascience.com/updating-a-quip-spreadsheet-with-python-api-1b4bb24d4aac
First, follow the steps to get a personal access token from quip.com/dev/token; you will use it for authentication.
Then, you can get an updated client version from Lynn Zheng's Medium post (linked above), https://github.com/RuolinZheng08/quip-api, for local import.
My imports look like this:
import quip_update as quip #from Zheng's repo
from login_token import login_token #this is a variable that holds the value of the token I got from their auth website
Then you set up and authorize the client with the following:
quip_client = quip.QuipClient(access_token=login_token, base_url='https://platform.quip.com')
user = quip_client.get_authenticated_user()
If your company has a contract with Quip, it might look like base_url='https://platform.quip-amazon.com'
I like to print(user) to see the basic info and confirm that the client connected.
Then, again mostly narrating Zheng's article, you can use one of the client functions to insert a spreadsheet:
def add_to_spreadsheet(self, thread_id, *rows, **kwargs):
    '''Adds the given rows to the named (or first) spreadsheet in the
    given document.

    client = quip.QuipClient(...)
    client.add_to_spreadsheet(thread_id, ["5/1/2014", 2.24])
    '''
(from quip.py)
So you can create a spreadsheet manually and then refer to it with name="name of spreadsheet" to add rows from pandas.
So, for example:
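A minimal sketch of that idea, assuming the client's `add_to_spreadsheet(thread_id, *rows, **kwargs)` signature quoted above and that it accepts a `name` keyword. The helper names `dataframe_to_rows` and `push_dataframe`, the thread ID, and the spreadsheet name are all hypothetical. A small stand-in client is used so the example runs without a token; in real use you would pass the `quip_client` created earlier.

```python
import pandas as pd


def dataframe_to_rows(df):
    """Turn a DataFrame into a list of rows, each a list of plain strings."""
    return df.astype(str).values.tolist()


def push_dataframe(client, thread_id, df, name=None):
    """Append every row of df to a spreadsheet in the given thread."""
    rows = dataframe_to_rows(df)
    kwargs = {"name": name} if name else {}
    client.add_to_spreadsheet(thread_id, *rows, **kwargs)
    return len(rows)


class FakeClient:
    """Stand-in that records calls, so the sketch runs without a Quip token."""

    def __init__(self):
        self.calls = []

    def add_to_spreadsheet(self, thread_id, *rows, **kwargs):
        self.calls.append((thread_id, rows, kwargs))


fake = FakeClient()
df = pd.DataFrame({"date": ["5/1/2014", "5/2/2014"], "price": [2.24, 3.5]})
sent = push_dataframe(fake, "hypothetical_thread_id", df, name="Prices")
print(sent, "rows sent")
```

Converting every cell to a string first keeps NumPy types out of the request payload; whether the real client needs that is untested here.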
I used pd.read_html to try to import a table, but I get one long string when I run it. Is there a simple way to change the format of the result so that I get one word per row rather than a long string, or should I be using a function other than pd.read_html? Thank you!
Here is my code:
import requests
import pandas as pd
url ='http://www.linfo.org/acronym_list.html'
dfs = pd.read_html(url, header =0)
df = pd.concat(dfs)
df
I also tried this and got the same result:
import pandas as pd
url ='http://www.linfo.org/acronym_list.html'
data = pd.read_html(url, header=0)
data[0]
Out[1]:
ABCDEFGHIJKLMNOPQRSTUVWXYZ A AMD Advanced Micro Devices API application programming interface ARP address resolution protocol ARPANET Advanced Research Projects Agency Network AS autonomous system ASCII American Standard Code for Information Interchange AT&T American Telephone and Telegraph Company ATA advanced technology attachment ATM asynchronous transfer mode B B byte BELUG Bellevue Linux Users Group BGP border gateway protocol...
The problem is how the table was created on this site.
According to https://www.w3schools.com/html/html_tables.asp, an HTML table is defined with the <table> tag. Each table row is defined with the <tr> tag. A table header is defined with the <th> tag. By default, table headings are bold and centered. A table data cell is defined with the <td> tag.
If you press CTRL+SHIFT+I, you can inspect the html elements of your site and you will see that this site does not follow this standard. That is why you are not getting the correct dataframe using pandas.read_html.
I'm trying to access the table on the following site so that I can ultimately load it into a dataframe and save a limited number of rows as a CSV (the dataset is massive): https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
I'm just starting out with web scraping and was practicing on this dataset. I can pull tags like div, but when I try soup.findAll('tr') or the same for td, it returns an empty set.
The table appears to be embedded in different code (see the link above), so maybe that's my issue, but I'm still unsure how to access the detail rows, headers, and so on. Selenium, maybe?
Thanks in advance!
By the looks of it, the website already allows you to export the data:
The original link is:
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
The .csv download link is:
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
The .json link is:
https://data.cityofchicago.org/resource/ijzp-q8t2.json
Therefore you could simply extract the ID of the dataset, in this case ijzp-q8t2, and substitute it into the download links above. Here is the official documentation of their API.
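The ID substitution can be sketched in a few lines. The helper names `dataset_id_from_url` and `socrata_urls` are made up for illustration, and the "four characters, dash, four characters" ID pattern is an assumption based on the one ID shown here. The URL templates are copied from the links above.

```python
import re


def dataset_id_from_url(page_url):
    """Pull an ID shaped like 'ijzp-q8t2' out of a dataset page URL."""
    match = re.search(r"/([a-z0-9]{4}-[a-z0-9]{4})(?:/|$)", page_url)
    return match.group(1) if match else None


def socrata_urls(domain, dataset_id):
    """Build the CSV and JSON links following the patterns shown above."""
    return {
        "csv": f"https://{domain}/api/views/{dataset_id}/rows.csv?accessType=DOWNLOAD",
        "json": f"https://{domain}/resource/{dataset_id}.json",
    }


page = "https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data"
dataset_id = dataset_id_from_url(page)
urls = socrata_urls("data.cityofchicago.org", dataset_id)
print(urls["csv"])

# To read only the first rows of the large CSV (needs network access):
#   import pandas as pd
#   df = pd.read_csv(urls["csv"], nrows=1000)
```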
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("ijzp-q8t2", limit=2000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
I am trying to connect the Knack online database to my Python data-handling scripts so that I can update objects/tables directly in my Knack app builder. I discovered that pyknackhq, a Python API for KnackHQ, can fetch objects and return JSON for an object's records. So far so good.
However, following the documentation (http://www.wbh-doc.com.s3.amazonaws.com/pyknackhq/quick%20start.html), I tried to fetch all rows (records in Knack) for my object/table (344 records in total).
My code was:
i = 0
for rec in undec_obj.find():
    print(rec)
    i = i + 1
print(i)
>> 25
The first 25 records were indeed returned, but the remaining records up to the 344th never were. The documentation of the pyknackhq library is fairly brief, so I couldn't find a way around my problem there. Is there a way to get all my records/rows? (I have also changed the settings in Knack so that all my records appear on the same page, page 1.)
The ultimate goal is to take all records and turn them into a pandas dataframe.
Thank you!
I haven't worked with that library, but I've written another python Knack API wrapper that should help:
https://github.com/cityofaustin/knackpy
The docs should get you where you want to go. Here's an example:
>>> from knackpy import Knack
# download data from knack object
# will fetch records in chunks of 1000 until all records have been downloaded
# optionally pass a rows_per_page and/or page_limit parameter to limit record count
>>> kn = Knack(
...     obj='object_3',
...     app_id='someappid',
...     api_key='topsecretapikey',
...     page_limit=10,  # not needed; this is the default
...     rows_per_page=1000  # not needed; this is the default
... )
>>> for row in kn.data:
...     print(row)
{'store_id': 30424, 'inspection_date': 1479448800000, 'id': '58598262bcb3437b51194040'},...
Hope that helps. Open a GitHub issue if you have any questions using the package.
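Since the stated goal was a pandas dataframe, a list of record dicts like the one printed above can be handed to pandas directly. This is only a sketch: the `records` list below copies the single sample row from the output and stands in for `kn.data`, and treating `inspection_date` as a Unix timestamp in milliseconds is an assumption based on its magnitude.

```python
import pandas as pd

# Stand-in for kn.data: the one sample row shown in the output above.
records = [
    {"store_id": 30424, "inspection_date": 1479448800000, "id": "58598262bcb3437b51194040"},
]

df = pd.DataFrame.from_records(records)

# Assumption: the value is milliseconds since the Unix epoch.
df["inspection_date"] = pd.to_datetime(df["inspection_date"], unit="ms")
print(df)
```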
I've crawled a tracklist of 36,000 songs that have been played on the Danish national radio station P3. I want to do some statistics on how frequently each genre was played within this period, so I figured the Discogs API might help with labeling each track with a genre. However, the documentation for the API doesn't seem to include an example of querying the genre of a particular song.
I have a CSV file with 3 columns: Artist, Title and Test (Test is where I want the API to label each song with the genre).
Here's a sample of the script I've built so far:
import json
import pandas as pd
import requests
import discogs_client
d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')
input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8',error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title','Test']
for i in range(0, len(list(df.Artist))):
    x = df.Artist[i]
    g = d.artist(x)
    df.Test[i] = str(g)
df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
So far this script has worked with a dummy file containing 3 records, mapping the artist for a given ID#. But as soon as the file gets larger (e.g. 2000 records), it returns an HTTPError when it cannot find the artist.
I have some questions regarding this approach:
1) Would you recommend using the search function in the API to retrieve a variable such as 'Genre'? Or do you think it is possible to retrieve the genre with a 'd.' function from the API?
2) Will I need to acquire an API key? I have successfully mapped the 3 records without an API key so far. It looks like the key is free, though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples. I am not an expert myself, just a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be an integer, not a string.
So you must first do a search, then get the artist ID, then call d.artist(artist_id).
Sorry for not providing an example; I am a Python newbie right now ;)
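For readers who want one, the search-then-lookup steps could be sketched roughly as follows. It assumes the client exposes `search(query, type="artist")` returning an iterable of hits with an `id` attribute, plus `artist(artist_id)`, as the discogs_client README describes. `find_artist` is a hypothetical helper, and a fake client stands in for the real one so the example runs without credentials.

```python
from types import SimpleNamespace


def find_artist(client, name):
    """Search by name, then fetch the artist by its numeric ID."""
    results = client.search(name, type="artist")
    for hit in results:
        # Take the first hit only; a real script may want to compare names.
        return client.artist(hit.id)
    return None  # nothing matched, so there is no ID to look up


class FakeClient:
    """Stand-in for discogs_client.Client; returns canned data."""

    def search(self, query, **params):
        if query == "Known Artist":
            return [SimpleNamespace(id=1)]
        return []

    def artist(self, artist_id):
        return SimpleNamespace(id=artist_id, name="Known Artist")


client = FakeClient()
found = find_artist(client, "Known Artist")
missing = find_artist(client, "No Such Artist")
print(found.name, missing)
```

Returning `None` for a miss lets the calling loop skip the row instead of failing with an HTTP error on an unknown artist.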
Also have you checked acoustid for
It's probably a rate limit.
Read the status code of your response; you should find a 429 Too Many Requests.
Unfortunately, if that's the case, the only solution is to add a sleep in your code to make one request per second.
Check out the API docs:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
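A sketch of the sleep-and-retry idea, kept independent of any particular client. `TooManyRequests` is a hypothetical stand-in for whatever error your client raises on HTTP 429; with discogs_client you would probably catch its own HTTP error and check the status code instead, which is not verified here. `call_with_backoff` is likewise a made-up helper name.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for the error raised on HTTP 429."""


def call_with_backoff(func, *args, retries=3, delay=1.0, sleep=time.sleep):
    """Call func; on a rate-limit error, wait `delay` seconds and try again.

    Makes up to `retries` guarded attempts, then one final attempt whose
    error, if any, is allowed to propagate.
    """
    for _ in range(retries):
        try:
            return func(*args)
        except TooManyRequests:
            sleep(delay)
    return func(*args)


# Demo with a function that is rate limited on its first two calls.
state = {"calls": 0}


def flaky(x):
    state["calls"] += 1
    if state["calls"] < 3:
        raise TooManyRequests()
    return x * 2


naps = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, 21, delay=1.0, sleep=naps.append)
print(result, naps)
```

Passing `sleep` as a parameter is only there so the demo finishes instantly; in a real script the default `time.sleep` does the waiting.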
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the API with your key and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you have found a better solution, please share.
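To label tracks when only the artist and title are known, the `genres` attribute shown above could be combined with a search. This is a sketch under assumptions: that `search` accepts a `type="release"` filter and an `artist` keyword that is passed through as a search filter, and that each hit exposes `genres`. `genres_for_track` is a hypothetical helper, and the fake client below returns canned data so the example runs offline.

```python
from types import SimpleNamespace


def genres_for_track(client, artist, title):
    """Return the genres of the first release matching artist and title."""
    results = client.search(title, artist=artist, type="release")
    for release in results:
        return list(release.genres or [])
    return []  # no match: leave the genre empty rather than fail


class FakeClient:
    """Stand-in for discogs_client.Client with one canned release."""

    def search(self, query, **params):
        if query == "Some Title" and params.get("artist") == "Some Artist":
            return [SimpleNamespace(genres=["Electronic"])]
        return []


client = FakeClient()
hit = genres_for_track(client, "Some Artist", "Some Title")
miss = genres_for_track(client, "Nobody", "Nothing")
print(hit, miss)
```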