I have a pandas DataFrame like this:
listings_df = pd.DataFrame({'prices': prices,
'listing_links': listing_links,
'photo_links': photo_links,
'listing_names': listing_names})
The photo_links list contains URLs to photos. Say I want to get a link straight from the DataFrame and open it in a web browser like this:
link_to_open = listings_df.loc[1:1,'photo_links']
webbrowser.open(link_to_open)
However, the link does not open and I get a 404 error, because the link is stored in the DataFrame (or at least printed) in a shortened version:
https://a0.muscache.com/im/pictures/70976075/b
versus the original link as it is stored in the photo_links list:
https://a0.muscache.com/im/pictures/70976075/b20d9efc_original.jpg?aki_policy=large
The question is, how do I access the full link from within the DataFrame?
link_to_open = listings_df.loc[1:1,'photo_links'] returns a Series object, not a string.
Select a single label instead of a slice to get the value itself:
link_to_open = listings_df.loc[1,'photo_links']
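The difference between the two selections can be sketched as follows. The two-row frame and its example.com URLs are hypothetical stand-ins for listings_df. The stored string is not truncated; the shortened form is only how pandas prints wide values.

```python
import pandas as pd

# Hypothetical two-row frame standing in for listings_df
listings_df = pd.DataFrame({
    "photo_links": [
        "https://example.com/photos/first_original.jpg?aki_policy=large",
        "https://example.com/photos/second_original.jpg?aki_policy=large",
    ]
})

# A label slice (1:1) keeps the result as a Series, even with one row
as_series = listings_df.loc[1:1, "photo_links"]

# A single label returns the stored value itself, here a plain str
as_string = listings_df.loc[1, "photo_links"]

print(type(as_series).__name__)  # Series
print(type(as_string).__name__)  # str
print(as_string)                 # the full, untruncated URL
```

The as_string value is what a function such as webbrowser.open expects.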
I'm successfully getting a list from shareplum, and using the "UpdateListItems" method to update a value.
sharepoint_site = authenticate(MY_SHAREPOINT_URL, MY_SHAREPOINT_SITE, MY_USERNAME, MY_PASSWORD)
sharepoint_list = get_sp_list(sharepoint_site , MY_SHAREPOINT_LIST)
# "Name" is the column that I am trying to update
data = [{'ID': "13", 'Name': 'Teste Python'}]
sharepoint_list.UpdateListItems(data=data, kind='Update')
time.sleep(3)
# at this point, my SharePoint Online list is exactly the same as it was
I'm using ID to select the row that I want to update (I'm trying to update the 13th element). But after I run the update, nothing changes in the SharePoint Online list. My plan is to integrate with other local databases and use that data to update a list (a CSV/Excel file) that is shared on SharePoint. Currently my users update that SharePoint Online list manually.
Is there another command I should use to actually push my changes to SharePoint Online?
The 13th element doesn't necessarily have ID=13 in SharePoint. If you remove items from a SharePoint list, that ID is gone forever. Check that you have the right ID.
Also check that the internal name of the column (field) really is "Name". Go to the list settings, click on the column, and look at the URL. It will look something like this towards the end:
/_layouts/15/FldEdit.aspx?List=%7B681071B5-D6BB-4345-894A-69C2D9E27A3C%7D&Field=Prio
The internal name is what comes after Field=, and it can differ from the display name you see when you view the list.
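As a small sketch, the internal name can be pulled out of such a settings URL with the standard library. The helper name internal_field_name is hypothetical; the URL is the one shown above.

```python
from urllib.parse import urlsplit, parse_qs


def internal_field_name(fld_edit_url):
    """Return the value of the Field= query parameter, or None if absent."""
    query = urlsplit(fld_edit_url).query
    values = parse_qs(query).get("Field")
    return values[0] if values else None


url = ("/_layouts/15/FldEdit.aspx"
       "?List=%7B681071B5-D6BB-4345-894A-69C2D9E27A3C%7D&Field=Prio")
print(internal_field_name(url))  # Prio
```

Presumably the name found this way is the key to use in the data dictionary passed to UpdateListItems, in place of the display name.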
I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use a DataFrame to get it, but I'm just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you provided, it looks like you are already able to get the data, and I'm assuming that you already have it as a dictionary.
From what I can tell, I don't think you need pandas here, unless it's a downstream requirement of the task you are doing. To get the ItemNumber and QtyRemainingToShip, you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
# the data_list is only one element long, so grab the first element, which is of type dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# similar to the data list the value associated with "SoEstimateItemLineArr" is of type list and has 1 element in it, so we grab the first & only element.
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of a popular Python library for parsing JSON. It will confuse future readers and will clash with the module name if you end up having to import the json library.
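If the real response holds more than one order or more than one item line, the same lookups can be wrapped in a loop. This is only a sketch: the payload below is a hypothetical stand-in that reuses the key names from the code above, and every value other than the first item line is invented for illustration.

```python
# Hypothetical payload using the same key names as the sample response
response_json = {
    "Data": [
        {
            "SoEstimateHeader": {},
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
            ],
        },
        {
            "SoEstimateHeader": {},
            "SoEstimateItemLineArr": [
                {"ItemNumber": "XY", "QtyRemainingToShip": 4},  # invented
                {"ItemNumber": "ZZ", "QtyRemainingToShip": 0},  # invented
            ],
        },
    ]
}

# One flat row per item line, across every order in "Data"
rows = [
    {
        "ItemNumber": line["ItemNumber"],
        "QtyRemainingToShip": line["QtyRemainingToShip"],
    }
    for order in response_json["Data"]
    for line in order["SoEstimateItemLineArr"]
]

for row in rows:
    print(row["ItemNumber"], row["QtyRemainingToShip"])
```

If a DataFrame is needed downstream, a flat list of dictionaries like rows can be passed straight to pd.DataFrame(rows).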
I'm still a newbie in Python and having a hard time trying to code something.
I have a list with more than 80k URLs, and it is the only thing I have in my .xls file. The URLs look like this:
https://domainexample.com/user-query/credit-card-debit-balance/
https://domainexample.com/user-query/second-invoice-current-debt/
https://domainexample.com/user-query/query-balances/
https://domainexample.com/user-query/where-is-client-portal/
https://domainexample.com/user-query/i-want-to-change-my-password/
https://domainexample.com/user-query/second-invoice-internet/
https://domainexample.com/user-query/print-payment-invoice/
I want to write code that will read this Excel file and, based on certain categories I have already written, put the URLs in other columns.
So, whenever the code finds "password" it will put that URL in the column "password", and when it finds "user" it will put the URL in the column "user".
It would look like this:
debt
https://domainexample.com/user-query/second-invoice-current-debt/
password
https://domainexample.com/user-query/i-want-to-change-my-password/
payment
https://domainexample.com/user-query/print-payment-invoice/
The code doesn't necessarily need to move the URLs to another column. If it can create a second column and write which categories each URL belongs to, that would also be great.
There is no need for the code to open the URLs, just to read the Excel file and treat the URLs as simple text.
If anyone can help me, thanks a lot!
Try this, where df is your DataFrame and 'url_column' is the column with all your URLs:
df.loc[df['url_column'] =='url.com/what-is-a-car', 'car'] = 'url.com/'+'car'
df.loc[df['url_column'] =='url.com/what-is-a-bike', 'bike'] = 'url.com/'+'bike'
df.loc[df['url_column'] =='url.com/what-is-a-van', 'van'] = 'url.com/'+'van'
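The lines above compare each URL for exact equality. For keyword matching as described in the question, a substring check is one alternative. The following is a sketch of that different technique, not the method above; the keyword list, the sample rows, and the column name categories are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample; in practice the column would come from the Excel file
df = pd.DataFrame({"url_column": [
    "https://domainexample.com/user-query/second-invoice-current-debt/",
    "https://domainexample.com/user-query/i-want-to-change-my-password/",
    "https://domainexample.com/user-query/print-payment-invoice/",
]})

# Assumed category keywords; extend as needed
keywords = ["debt", "password", "payment", "invoice"]


def find_categories(url):
    """Return every keyword found in the URL text, comma-separated."""
    return ", ".join(k for k in keywords if k in url)


# Second column listing the categories each URL belongs to
df["categories"] = df["url_column"].apply(find_categories)
print(df["categories"].tolist())
```

Note that a keyword such as "user" would match every sample URL here, because "user-query" appears in each path.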
I'm trying to access the table details on the following site, to ultimately put them into a DataFrame and save a limited number of rows as a CSV (the dataset is massive): https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
I'm just starting out with web scraping and was practicing on this dataset. I can pull tags like div, but when I try soup.findAll('tr') or td, it returns an empty set.
The table appears to be embedded in different code (see link above), so maybe that's my issue, but I'm still unsure how to access the detail rows, headers, etc. Selenium, maybe?
Thanks in advance!
By the looks of it, the website already allows you to export the data.
As it would seem, the original link is:
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2/data
The .csv download link is:
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
The .json link is:
https://data.cityofchicago.org/resource/ijzp-q8t2.json
Therefore, you could simply extract the ID of the dataset, in this case ijzp-q8t2, and substitute it into the download links above. Here is the official documentation of their API.
import pandas as pd
from sodapy import Socrata
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)
# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
# MyAppToken,
# username="user@example.com",
# password="AFakePassword")
# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("ijzp-q8t2", limit=2000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
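The ID substitution described above can be sketched with the standard library. The helper names dataset_id and download_links are hypothetical, and the pattern of four characters, a hyphen, and four characters is an assumption based on the one ID shown here.

```python
import re


def dataset_id(page_url):
    """Return the first path segment shaped like 'ijzp-q8t2', or None."""
    for segment in page_url.split("/"):
        if re.fullmatch(r"[a-z0-9]{4}-[a-z0-9]{4}", segment):
            return segment
    return None


def download_links(domain, ds_id):
    """Fill the dataset ID into the two link patterns listed above."""
    return {
        "csv": f"https://{domain}/api/views/{ds_id}/rows.csv?accessType=DOWNLOAD",
        "json": f"https://{domain}/resource/{ds_id}.json",
    }


page = ("https://data.cityofchicago.org/Public-Safety/"
        "Crimes-2001-to-present/ijzp-q8t2/data")
ds_id = dataset_id(page)
links = download_links("data.cityofchicago.org", ds_id)
print(ds_id)
print(links["csv"])
print(links["json"])
```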
Recently I have been reading a stock price database on Quandl, using API calls to extract the data. But I am really confused by the example I have.
import requests
api_url = 'https://www.quandl.com/api/v1/datasets/WIKI/%s.json' % stock
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=3))
raw_data = session.get(api_url)
Can anyone explain that to me?
1) For api_url, if I copy that URL into a browser, it says 404 not found. So if I want to use another database, how do I prepare this api_url? What does '% stock' mean?
2) Here requests appears to be used to extract the data. What is the format of raw_data? How do I know the column names? How do I extract the columns?
To expand on my comment above:
% stock is a string formatting operation: it replaces %s in the preceding string with the value referenced by stock. Further details can be found here
raw_data actually references a Response object (part of the requests module; details can be found here)
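As a quick illustration of the formatting step on its own, using the URL template from the question and the AAPL ticker as an example value:

```python
stock = "AAPL"

# %s in the template is replaced by the value of stock
api_url = "https://www.quandl.com/api/v1/datasets/WIKI/%s.json" % stock
print(api_url)

# An f-string builds the same text
same_url = f"https://www.quandl.com/api/v1/datasets/WIKI/{stock}.json"
print(api_url == same_url)  # True
```

This is also why pasting the template into a browser fails: the literal %s has not been replaced by a stock code yet.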
To expand on your code.
import requests
#Set the stock we are interested in, AAPL is Apple stock code
stock = 'AAPL'
#Your code
api_url = 'https://www.quandl.com/api/v1/datasets/WIKI/%s.json' % stock
session = requests.Session()
session.mount('http://', requests.adapters.HTTPAdapter(max_retries=3))
raw_data = session.get(api_url)
# Probably want to check that requests.Response is 200 - OK here
# to make sure we got the content successfully.
# requests.Response has a function to return json file as python dict
aapl_stock = raw_data.json()
# We can then look at the keys to see what we have access to
aapl_stock.keys()
# column_names Seems to be describing the individual data points
aapl_stock['column_names']
# A big list of data, lets just look at the first ten points...
aapl_stock['data'][0:10]
Edit to answer question in comment
So aapl_stock['column_names'] shows Date and Open as the first and second values respectively. This means they correspond to positions 0 and 1 in each element of the data.
Therefore, to get the dates of the first ten items use [row[0] for row in aapl_stock['data'][0:10]], and to get the open values of the first 78 items use [row[1] for row in aapl_stock['data'][0:78]]. Note that aapl_stock['data'][0:10][0] would return only the first row, because slicing a list of rows gives another list of rows.
To get a list covering every row in the dataset, where each element is a list with the values for Date and Open, you could add something like aapl_date_open = [row[0:2] for row in aapl_stock['data']].
If you are new to Python, I seriously recommend looking at list slice notation. A quick intro can be found here
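A minimal sketch of picking columns out of a list of rows, assuming the response has the column_names and data keys described above. The three rows and their numbers are invented sample values, not real prices.

```python
# Hypothetical sample shaped like the parsed response; values are invented
aapl_stock = {
    "column_names": ["Date", "Open", "High"],
    "data": [
        ["2016-01-04", 102.61, 105.37],
        ["2016-01-05", 105.75, 105.85],
        ["2016-01-06", 100.56, 102.37],
    ],
}

# Look up positions by name instead of hard-coding 0 and 1
date_idx = aapl_stock["column_names"].index("Date")
open_idx = aapl_stock["column_names"].index("Open")

# Slicing selects rows; a comprehension then picks one value per row
first_two_dates = [row[date_idx] for row in aapl_stock["data"][0:2]]

# Date and Open for every row
date_open = [[row[date_idx], row[open_idx]] for row in aapl_stock["data"]]

# Indexing after a slice returns a whole row, not a column
first_row = aapl_stock["data"][0:2][0]

print(first_two_dates)
print(date_open)
print(first_row)
```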