Create a file through code in Kubernetes - Python

I need (or at least I think I need) to create a file (it could be a temp file, but that did not work while I was testing) into which I can copy a file stored in Google Cloud Storage.
The file is a GeoJSON file, and after downloading it I will read it using geopandas.
The code will run inside a Kubernetes pod in Google Cloud.
The code:
def geoalarm(self, input):
    from shapely.geometry import Point
    import uuid
    from google.cloud import storage
    import geopandas as gpd

    fp = open("XXX.geojson", "w+")
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('YYY')
    blob = bucket.get_blob('ZZZ.geojson')
    blob.download_to_file(fp)
    fp.seek(0)
    PAIS = gpd.read_file(fp.name)
    (dictionaryframe, _) = input
    try:
        place = Point((float(dictionaryframe["lon"]) / 100000), (float(dictionaryframe["lat"]) / 100000))
        <...>
The questions are:
How could I create the file in Kubernetes?
Or, how could I use the content of the file as a string (if I use download_as_string) in geopandas, to do the equivalent of gpd.read_file(name)?
Extra
I tried using:
PAIS = gpd.read_file("gs://bucket/xxx.geojson")
But I get the following error:
DriverError: '/vsigs/bucket/xxx.geojson' does not exist in the file system, and is not recognized as a supported dataset name.

A VERY general overview of the pattern:
You can start by putting the code in a git repository. On Kubernetes, create a deployment/pod with the ubuntu image, and in an initialization script install Python and your Python dependencies, pull your code, and make the final line invoke Python to run it. In the "command" attribute of your pod template, use /bin/bash to run that script. Assuming you have the correct credentials, you will be able to grab the file from Google Storage and process it. To debug, you can attach to the running container using "kubectl exec".
Hope this helps!
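As a hedged sketch of the temp-file approach (the bucket and blob names below are the placeholders from the question, and the helper name is hypothetical), the download can go into a tempfile created inside the container, which avoids depending on the working directory being writable:

import tempfile

import geopandas as gpd
from google.cloud import storage

def read_geojson_from_gcs(bucket_name, blob_name):
    """Download a GeoJSON blob to a temp file and load it with geopandas."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.get_blob(blob_name)

    # NamedTemporaryFile gives a writable binary file inside the container.
    with tempfile.NamedTemporaryFile(suffix=".geojson") as fp:
        blob.download_to_file(fp)  # download_to_file expects a binary file object
        fp.flush()
        return gpd.read_file(fp.name)

# Hypothetical usage with the placeholder names from the question:
# PAIS = read_geojson_from_gcs("YYY", "ZZZ.geojson")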

A solution that avoids creating a file in Kubernetes:
import json
import geopandas as gpd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket')
blob = bucket.get_blob('file.geojson')
string = blob.download_as_string()
PAIS = gpd.GeoDataFrame.from_features(json.loads(string))
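Another option, assuming a geopandas version whose read_file accepts file-like objects, is to wrap the downloaded bytes in a BytesIO and pass it straight to read_file, which also picks up the CRS stored in the GeoJSON:

import io

import geopandas as gpd
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.get_bucket('bucket')   # placeholder bucket name
blob = bucket.get_blob('file.geojson')         # placeholder blob name
data = blob.download_as_bytes()                # newer name for download_as_string
PAIS = gpd.read_file(io.BytesIO(data))         # read_file accepts a file-like object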

How to deal with GSutil URI not working all the time

I am facing a little issue here that I can't explain.
On some occasions, I am able to open files from my cloud storage buckets using a GSutil URI. For instance, this one works fine:
df = pd.read_csv('gs://poker030120203/ouptut_test.csv')
But on some other occasions, this method does not work and returns the error FileNotFoundError: [Errno 2] No such file or directory.
This happens, for instance, with the following code:
rank_table_filename = 'gs://poker030120203/rank_table.bin'
rank_table_file = open(rank_table_filename, "r")
preflop_table_filename = 'gs://poker030120203/preflop_table.npy'
self.preflop_table = np.load(preflop_table_filename)
I am not sure if this is related to the "open" or "load" method, or maybe the file type, but I can't figure out why this returns an error. I do not know if it has an impact on the matter, but I'm running everything from Vertex (i.e. the AI module that automatically sets up a storage bucket / a VM and a Jupyter notebook).
Thanks a lot for the help
To read and write files in Google Cloud Storage, you can use the Google-recommended methods. It's easiest to use the Google client libraries to read / write anything from / in Cloud Storage.
Example from the docs:
from google.cloud import storage

def write_read(bucket_name, blob_name):
    """Write and read a blob from GCS using file-like IO"""
    # The ID of your GCS bucket
    # bucket_name = "your-bucket-name"
    # The ID of your new GCS object
    # blob_name = "storage-object-name"

    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.blob(blob_name)

    # Mode can be specified as wb/rb for bytes mode.
    # See: https://docs.python.org/3/library/io.html
    with blob.open("w") as f:
        f.write("Hello world")

    with blob.open("r") as f:
        print(f.read())
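The reason pd.read_csv('gs://...') works is that pandas hands gs:// URLs to gcsfs/fsspec, whereas the built-in open() and numpy.load() only understand local paths. A hedged sketch of how the two failing snippets from the question could be adapted with file-like blob IO (assuming a google-cloud-storage version that supports blob.open):

import numpy as np
from google.cloud import storage

storage_client = storage.Client()
bucket = storage_client.bucket("poker030120203")

# Read the binary rank table through a file-like blob handle.
with bucket.blob("rank_table.bin").open("rb") as rank_table_file:
    rank_table_bytes = rank_table_file.read()

# np.load accepts file-like objects, so the .npy blob loads the same way.
with bucket.blob("preflop_table.npy").open("rb") as preflop_table_file:
    preflop_table = np.load(preflop_table_file)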

How to interact (read/write/delete) with files in an S3 bucket directly through a Streamlit web app?

I am trying to upload a file to an S3 bucket through the Streamlit web UI. The next step is to read or delete files present in the S3 bucket through the Streamlit app in an interactive way. I have written code to upload a file in the Streamlit app, but I am failing to load it into the S3 bucket.
We have a pretty detailed doc on this. You can use s3fs to make the connection, e.g.:
# streamlit_app.py
import streamlit as st
import s3fs
import os

# Create connection object.
# `anon=False` means not anonymous, i.e. it uses access keys to pull data.
fs = s3fs.S3FileSystem(anon=False)

# Retrieve file contents.
# Uses st.experimental_memo to only rerun when the query changes or after 10 min.
@st.experimental_memo(ttl=600)
def read_file(filename):
    with fs.open(filename) as f:
        return f.read().decode("utf-8")

content = read_file("testbucket-jrieke/myfile.csv")

# Print results.
for line in content.strip().split("\n"):
    name, pet = line.split(",")
    st.write(f"{name} has a :{pet}:")

Azure Grab Data from Blob Storage w. Python (No downloading)

I'm trying to open a series of different cracked documents / texts that we've stored in Azure Blob storage, ideally pushing them all into a pandas DataFrame. I do not want to download them (I'm going to be opening them from a Docker container), I just want to hold the information in memory.
The file structure looks like: Azure Blob Storage -> MyContainer -> UUIDFolderNames (many) -> 1 "knowledge.json" file in each Folder.
What I've got working:
container = ContainerClient.from_connection_string( <my connection str>, <MyContainer> )
blob_list = container.list_blobs()
for blob in blob_list:
    blobClient = container.get_blob_client( blob )  # Not sure this is needed
Ideally, for each item in my for loop, I'd do something like opening the .json file, then adding its text to a row in my dataframe. However, I can't actually manage to open any of the JSON files.
What I've tried:
#1
name = blob.name
json.loads( name )

#2
with open(name, 'r') as f:
    data = json.load( f )
Errors:
#1: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
#2: No such file or directory
I've tried other sillier things like json.loads( blob ) or json.loads('knowledge.json') (no folder name in path), but those are kind of nonsensical things that I was just trying to see if they worked; they're not exactly reasonable.
Most methods (including on Azure's documentation) download the file first, but again, I don't want to download the file.
*Edit: I realized that it's somewhat obvious why the files cannot be found: json.load etc. will look in my local directory / where I'm running the Python file from, rather than the blob location. Still, I'm not sure how to load a file without downloading it.
With the help of the below block you will be able to view the JSON blob:
for blobs in container_client.list_blobs():
    blob_client = service_client.get_blob_client(container=Container_name, blob=blobs)
    content = blob_client.download_blob()
    contentastext = content.readall()
    print(contentastext)
Below is the full code to read JSON files from blobs; later you can add this data to your DataFrames:
from azure.storage.blob import BlobServiceClient

def read_json_blobs():
    CONNECTION_STRING = "ENTER_CONNECTION_STR"
    Container_name = "gatherblobs"

    service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    container_client = service_client.get_container_client(Container_name)

    for blobs in container_client.list_blobs():
        blob_client = service_client.get_blob_client(container=Container_name, blob=blobs)
        content = blob_client.download_blob()
        contentastext = content.readall()
        print(contentastext)

if __name__ == '__main__':
    read_json_blobs()
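To finish the goal of landing everything in a pandas DataFrame, here is a hedged sketch (connection string and container name are placeholders) that parses each downloaded blob with json.loads and normalizes the records into rows:

import json

import pandas as pd
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<connection-string>"   # placeholder
CONTAINER_NAME = "MyContainer"              # placeholder

service_client = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container_client = service_client.get_container_client(CONTAINER_NAME)

records = []
for blob in container_client.list_blobs():
    # Stream each knowledge.json blob into memory; no local file involved.
    text = container_client.download_blob(blob).readall()
    records.append(json.loads(text))

# One row per JSON document; nested keys become dotted column names.
df = pd.json_normalize(records)
print(df.head())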

How to process a file located in Azure blob Storage using python with pandas read_fwf function

I need to open and work on data coming in a text file with python.
The file will be stored in the Azure Blob storage or Azure file share.
However, my question is: can I use the same modules and functions, like os.chdir() and read_fwf(), that I was using on Windows? The code I wanted to run:
import pandas as pd
import os

os.chdir(file_path)
df = pd.read_fwf(filename)
I want to be able to run this code and file_path would be a directory in Azure blob.
Please let me know if it's possible. If you have a better idea of where the file can be stored, please share.
Thanks,
As far as I know, os.chdir(path) can only operate on the local file system. If you want to copy files from Blob Storage to local storage, you can refer to the following code:
from azure.storage.blob import BlobServiceClient

connect_str = "<your-connection-string>"
blob_service_client = BlobServiceClient.from_connection_string(connect_str)

container_name = "<container-name>"
file_name = "<blob-name>"

container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(file_name)

download_file_path = "<local-path>"
with open(download_file_path, "wb") as download_file:
    download_file.write(blob_client.download_blob().readall())
pandas.read_fwf can also read a blob directly from storage using its URL (with a SAS token):
For example:
url = "https://<your-account>.blob.core.windows.net/test/test.txt?<sas-token>"
df = pd.read_fwf(url)
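If writing a local copy is undesirable, a hedged in-memory variant (container and blob names are the same placeholders as above) wraps the downloaded bytes in BytesIO, which pd.read_fwf accepts as a file-like object:

from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobServiceClient

connect_str = "<your-connection-string>"   # placeholder
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
blob_client = blob_service_client.get_blob_client(container="<container-name>", blob="<blob-name>")

# Download straight into memory and hand the bytes to read_fwf.
data = blob_client.download_blob().readall()
df = pd.read_fwf(BytesIO(data))
print(df.head())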

Extract particular file from zip blob stored in azure container with python using Jupyter notebook

I have uploaded a zip file to my Azure account as a blob in an Azure container.
The zip file contains .csv, .ascii and many other file formats.
I need to read a specific file, let's say the ASCII file data contained in the zip file. I am using Python for this case.
How can I read a particular file's data from this zip file without downloading it locally? I would like to handle this process in memory only.
I am also trying this with the Jupyter notebook provided by Azure for its ML functionality.
I am using the ZipFile Python package for this case.
Please assist me with reading the file.
Please find the following code snippet:
blob_service = BlockBlobService(account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)
blob_list = blob_service.list_blobs(CONTAINER_NAME)

allBlobs = []
for blob in blob_list:
    allBlobs.append(blob.name)

sampleZipFile = allBlobs[0]
print(sampleZipFile)
The below code should work. This example accesses an Azure Container using an Account URL and Key combination.
from azure.storage.blob import BlobServiceClient
from io import BytesIO
from zipfile import ZipFile

key = r'my_key'
service = BlobServiceClient(account_url="my_account_url", credential=key)
container_client = service.get_container_client('container_name')

zipfilename = 'myzipfile.zip'
blob_data = container_client.download_blob(zipfilename)
blob_bytes = blob_data.content_as_bytes()
inmem = BytesIO(blob_bytes)

myzip = ZipFile(inmem)
otherfilename = 'mycontainedfile.csv'
filetoread = BytesIO(myzip.read(otherfilename))
Now all you have to do is pass filetoread into whatever method you would normally use to read a local file (e.g. pandas.read_csv()).
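For instance, a hedged usage sketch continuing the snippet above (the contained file name is the placeholder from the answer):

import pandas as pd

# filetoread is a BytesIO holding the bytes of the file extracted from the zip.
df = pd.read_csv(filetoread)
print(df.head())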
You could use the code below to read a file inside a .zip file without extracting it in Python:
import zipfile
archive = zipfile.ZipFile('images.zip', 'r')
imgdata = archive.read('img_01.png')
For details, you can refer to the ZipFile docs.
Alternatively, you can do something like this:
# -*- coding: utf-8 -*-
"""
Created on Mon Apr 1 11:14:56 2019

@author: moverm
"""
import zipfile

zfile = zipfile.ZipFile('C:\\LAB\\Pyt\\sample.zip')

for finfo in zfile.infolist():
    ifile = zfile.open(finfo)
    line_list = ifile.readlines()
    print(line_list)
Running it prints the lines of each file in the archive.
Hope it helps.
