Why can't I download and use OGB (Open Graph Benchmark)? - python

Update: even this single import on its own hangs forever, with no other code added:
from ogb.nodeproppred import PygNodePropPredDataset
Here is my code; I want to download the OGB dataset.
import torch_geometric.transforms as T
from ogb.nodeproppred import PygNodePropPredDataset
dataset_name = 'ogbn-arxiv'
dataset = PygNodePropPredDataset(name=dataset_name,
                                 transform=T.ToSparseTensor())
print('The {} dataset has {} graph'.format(dataset_name, len(dataset)))
# Extract the graph
data = dataset[0]
print(data)
But when I run this code, it just keeps running and outputs nothing.
I think I already meet the requirements listed on the OGB website.
I use Windows 11 and PyCharm.

If you want to download the OGB dataset, you should uninstall the "outdated" package, as it seems to conflict with OGB's other dependencies and can make the import hang. For more details, please read the OGB GitHub issues.
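A minimal sketch of that fix from the command line, assuming pip manages the environment you run the script in (the verification one-liner is just illustrative):
pip uninstall outdated
python -c "from ogb.nodeproppred import PygNodePropPredDataset"   # should now return promptly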

Related

Read OpenAir File using Python GDAL

I need to read OpenAir files in Python.
According to the following vector driver description, GDAL has built-in OpenAir functionality:
https://gdal.org/drivers/vector/openair.html
However, there is no example code for reading such OpenAir files.
So far I have tried to read a sample file using the following lines:
from osgeo import gdal
airspace = gdal.Open('export.txt')
However it returns me the following error:
ERROR 4: `export.txt' not recognized as a supported file format.
I already looked at vectorio; however, no OpenAir functionality has been implemented there.
Why do I get the error above?
In case anyone wants to reproduce the problem: sample OpenAir files can easily be generated using XContest:
https://airspace.xcontest.org/
Since you're dealing with vector data, you need to use ogr instead of gdal (it's normally packaged along with gdal).
So you can do:
from osgeo import ogr
ds = ogr.Open('export.txt')             # open the OpenAir file as a vector data source
layer = ds.GetLayer(0)                  # grab the first layer
featureCount = layer.GetFeatureCount()  # number of airspace features
print(featureCount)
There's plenty of info out there on using ogr, but this cookbook might be helpful.
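If you then want to look at the individual airspace records, a short follow-up sketch (the exact attribute fields depend on the OpenAir driver, so treat them as illustrative) could be:
for feature in layer:
    print(feature.items())              # attribute fields as a dict
    geom = feature.GetGeometryRef()
    if geom is not None:
        print(geom.ExportToWkt()[:80])  # start of the airspace geometry as WKT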

Does a technical solution exist to open a .mpr file in python?

I have to read information from a .mpr file (in order to complete a dataset). Does anyone know how this works?
I tried pandas and open(), but I couldn't find anything useful online.
Thanks a lot!
There's a package on GitHub called galvani that you can use. Install it from source (it seems their pip install galvani release is not up to date).
Then simply do:
from galvani import BioLogic as BL
import pandas as pd
mpr = BL.MPRfile('path_to_your.mpr')  # parse the BioLogic .mpr file
df = pd.DataFrame(mpr.data)           # turn the parsed records into a DataFrame
df.head()
You will see your data.
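Since the goal was to fold this into an existing dataset, you could then export the frame, for example to CSV (the file name is just illustrative):
df.to_csv('mpr_data.csv', index=False)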

ERROR:zygote_host_impl_linux.cc(89) - Chartify

I'm trying the new library (Chartify) provided by the Spotify team. On running the code below, I'm receiving the following error:
import chartify
import pandas as pd
file = "./data/Social_Network_Ads.csv"
data = pd.read_csv(file, sep = ',')
chart = chartify.Chart(blank_labels=True, y_axis_type='categorical', x_axis_type='linear')
chart.plot.scatter(
    data_frame=data,
    categorical_columns='Gender',
    numeric_column='EstimatedSalary',
    color_column='EstimatedSalary')
chart.style.color_palette.reset_palette_order()
chart.set_title("Scatter Plot w.r.t. Salaries of different Gender")
chart.set_subtitle("Labels for specific observations.")
chart.show()
[9643:9643:1127/175201.738360:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported. See https://crbug.com/638180.
The HTML file is being created, though, but on opening it I get a blank page.
An old question, but if you are facing similar issues: that message comes from Chromium's sandbox refusing to run as root, not from Chartify itself.
As a workaround, this helped me solve the issue on CentOS 7.7 while trying to execute different binaries:
export QTWEBENGINE_DISABLE_SANDBOX=1
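If you would rather keep the workaround inside the script, a minimal sketch is to set the same variable before Chartify starts its rendering backend (whether this particular variable applies depends on how the backend launches the browser; the name is taken from the shell workaround above):
import os
os.environ['QTWEBENGINE_DISABLE_SANDBOX'] = '1'  # must be set before the renderer process starts
import chartify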

Google Drive Python API: export never completes

Summary:
I have an issue where sometimes the google-drive-sdk for Python does not detect the end of the document being exported. It seems to think that the Google document is of infinite size.
Background, source code and tutorials I followed:
I am working on my own Python-based Google Drive backup script (one with a nice CLI interface for browsing around). git link for source code
It's still in the making and currently only finds new files and downloads them (with the 'pull' command).
To implement the most important Google Drive commands, I followed the official Google Drive API tutorial for downloading media. here
What works:
When a document or file is a non-Google-Docs document, the file is downloaded properly. However, when I try to "export" a file, I see that I need to use a different mimeType; I have a dictionary for this.
For example: I map application/vnd.google-apps.document to application/vnd.openxmlformats-officedocument.wordprocessingml.document when exporting a document.
When downloading Google documents from Google Drive, this seems to work fine. By this I mean: my while loop with the code status, done = downloader.next_chunk() will eventually set done to True and the download completes.
What does not work:
However, on some files the done flag never becomes True and the script downloads forever, eventually amounting to several GB. Perhaps I am looking at the wrong flag that says the file is complete when doing an export. I am surprised that Google Drive never throws an error. Does anybody know what could cause this?
Current status
For now I have exporting of google documents disabled in my code.
Scripts like "drive" by rakyll (at least the version I have) just put a link to the online copy instead. I would really like to do a proper export so that my offline system can maintain a complete backup of everything on Drive.
P.s. It's fine to put "you should use this service instead of the api" for the sake of others finding this page. I know that there are other services out there for this, but I'm really looking to explore the drive-api functions for integration with my own other systems.
OK, I found a pseudo-solution here.
The problem is that the Google API never returns the Content-Length header and the response is delivered in chunks. However, either the chunk information returned is wrong, or the Python client is not able to process it correctly.
What I did was grab the code for MediaIoBaseDownload from here.
I left everything the same, but changed this part:
if 'content-range' in resp:
    content_range = resp['content-range']
    length = content_range.rsplit('/', 1)[1]
    self._total_size = int(length)
elif 'content-length' in resp:
    self._total_size = int(resp['content-length'])
else:
    # PSEUDO BUG FIX: No content-length, no chunk info, cut the response here.
    self._total_size = self._progress
The else branch at the end is what I've added. I've also changed the default chunk size by setting DEFAULT_CHUNK_SIZE = 2*1024*1024. You will also have to copy a few imports from that file, including this one: from googleapiclient.http import _retry_request, _should_retry_response.
Of course this is not a solution, it just says "if I don't understand the response, just stop it here". This will probably make some exports not work, but at least it doesn't kill the server. This is only until we can find a good solution.
UPDATE:
Bug is already reported here: https://github.com/google/google-api-python-client/issues/15
and as of January 2017, the only workaround is to not use MediaIoBaseDownload and do this instead (not suitable for large files):
req = service.files().export(fileId=file_id, mimeType=mimeType)
resp = req.execute(http=http)
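In that case resp should hold the whole exported document in memory as bytes, so a minimal follow-up sketch (the output file name is just illustrative) is to write them straight to disk:
# resp is the raw exported content returned by execute()
with open('exported_document.docx', 'wb') as out_file:
    out_file.write(resp)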
I'm using this and it works with the following libraries:
google-auth-oauthlib==0.4.1
google-api-python-client
google-auth-httplib2
This is the snippet I'm using:
import io

from apiclient import errors
from googleapiclient.http import MediaIoBaseDownload
from googleapiclient.discovery import build

def download_google_document_from_drive(self, file_id):
    try:
        request = self.service.files().get_media(fileId=file_id)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print('Download %d%%.' % int(status.progress() * 100))
        return fh
    except Exception as e:
        print('Error downloading file from Google Drive: %s' % e)
You can also feed the stream straight into another library, e.g. xlrd if the export is a spreadsheet:
import xlrd
workbook = xlrd.open_workbook(file_contents=fh.getvalue())
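Or, to actually write the stream out to a file on disk (the file name here is just an example):
with open('exported_file.xlsx', 'wb') as f:
    f.write(fh.getvalue())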

Tensorflow examples all fail due to AttributeError: 'module' object has no attribute 'datasets'

I have installed v0.8.0 of tensorflow using pip, but when I try any of the skflow examples, they all fail due to
AttributeError: 'module' object has no attribute 'datasets'
which is a result of this code:
from tensorflow.contrib import learn
### Training data
# Downloads, unpacks and reads DBpedia dataset.
dbpedia = learn.datasets.load_dataset('dbpedia')
Several people have encountered this. Please install the latest version, e.g. one of the recent nightly builds.
Run this from the command line:
pip3 install --upgrade http://ci.tensorflow.org/view/Nightly/job/nightly-matrix-cpu/TF_BUILD_CONTAINER_TYPE=CPU,TF_BUILD_IS_OPT=OPT,TF_BUILD_IS_PIP=PIP,TF_BUILD_PYTHON_VERSION=PYTHON3,label=cpu-slave/lastSuccessfulBuild/artifact/pip_test/whl/tensorflow-0.8.0-cp34-cp34m-linux_x86_64.whl
I've found a less annoying way around this problem: just download and load the data manually. It's quite easy; here is how I did it.
import pandas
from tensorflow.contrib import learn
# Downloads, unpacks and reads DBpedia dataset.
## dbpedia = learn.datasets.load_dataset('dbpedia')
## BUT THAT ABOVE FUNCTION DOESN'T WORK SO....
## MANUALLY DOWNLOAD THE DATA FROM THIS LINK:
## https://googledrive.com/host/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M/dbpedia_csv.tar.gz
## MANUALLY UNPACK THE DATA BY DOUBLE CLICKING IT
## make sure the paths are correct
## LOAD IT LIKE YOU WOULD A REGULAR CSV FILE.
train = pandas.read_csv('dbpedia_csv/train.csv', header=None)
X_train, y_train = train[2], train[0]
test = pandas.read_csv('dbpedia_csv/test.csv', header=None)
X_test, y_test = test[2], test[0]
Hi, I seem to have the same issue. I traced it to the fact that ~/skflow/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/__init__.py does not have dbpedia as a dataset, yet the GitHub version of it does. I am using version 0.8.0 of TensorFlow.
