import pandas as pd
from ast import literal_eval
from cdqa.utils.filters import filter_paragraphs
from cdqa.utils.download import download_model, download_bnpp_data
from cdqa.pipeline.cdqa_sklearn import QAPipeline
# Download data and models
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')
# Loading data and filtering / preprocessing the documents
df = pd.read_csv('data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})
df = filter_paragraphs(df)
# Loading QAPipeline with CPU version of BERT Reader pretrained on SQuAD 1.1
cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib')
# Fitting the retriever to the list of documents in the dataframe
cdqa_pipeline.fit_retriever(X=df)
I was trying to load model but the cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib') line throws an error saying AttributeError: module 'transformers.modeling_bert' has no attribute 'gelu'
I was using transformers version 3.5
from transformers.activations import gelu
I ended up using a different version. Issue was I had a requirement for this version so I changed other parts of my program too.
Related
I am trying to learn sentiment analysis using vaderSentiment. For some reason, the when I create a query, I am getting the above error. I've checked the documentation, and there is no mention of a Utc attribute being required. Here is the basic code I am using:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import twint
analyzer = SentimentIntensityAnalyzer()
c = twint.Config()
c.Search = "Tesla"
c.Store_object = True
c.Since = "2023-01-20"
twint.run.Search(c)
for tweet in twint.output.tweets_list:
print(tweet.tweet)
scores = analyzer.polarity_scores(tweet.tweet)
print(scores)
I've had a great deal of issues trying to get twint to work with my system, so it's highly probable I've not installed it correctly.
I am trying to deploy an image classification model on server using FastAPI.
As such, I have two issues related to my code.
The first issue is that in the original code (without using FastAPI), I would read an image using OpenCV and then convert it from BGR to RGB. Not doing this conversion would give me inaccurate results at test time.
Using FastAPI, the image is being read as follows:
def read_image(payload):
stream=BytesIO(payload)
image=np.asarray(bytearray(stream.read()),dtype="uint8")
image=cv2.imdecode(image,cv2.IMREAD_COLOR)
if isinstance(image,np.ndarray):
img=Image.fromarray(image)
return img
The second issue I am facing is with the POST method when I run the server, and accessing the URL
http:127.0.0.1:9999/, the GET method is running, which prints the following message:
Welcome to classification server
However, when I execute the post method shown below:
#app.post("/classify/")
async def classify_image(file:UploadFile=File(...)):
#return "File Uploaded."
image_byte=await file.read()
return classify(image_byte)
When I go to the link http:127.0.0.1:9999/classify/ I end up recieving the error:
method not allowed
Any reasons on why this is happening and what can be done to fix the error?
The full code is listed below. If there are any errors that I am missing in this, please let me know. I am new to FastAPI and as such, I am really confused about this.
from fastapi import FastAPI, UploadFile, File
import uvicorn
import torch
import torchvision
from torchvision import transforms as T
from PIL import Image
from build_effnet import build_model
import torch.nn.functional as F
import io
from io import BytesIO
import numpy as np
import cv2
app = FastAPI()
class_name = ['F_AF', 'F_AS', 'F_CA', 'F_LA', 'M_AF', 'M_AS', 'M_CA', 'M_LA']
idx_to_class = {i: j for i, j in enumerate(class_name)}
class_to_idx = {value: key for key, value in idx_to_class.items()}
test_transform = T.Compose([
#T.Resize(size=(224,224)), # Resizing the image to be 224 by 224
#T.RandomRotation(degrees=(-20,+20)), #NO need for validation
T.ToTensor(), #converting the dimension from (height,weight,channel) to (channel,height,weight) convention of PyTorch
T.Normalize([0.485,0.456,0.406],[0.229,0.224,0.225]) # Normalize by 3 means 3 StD's of the image net, 3 channels
])
#Load model
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = build_model()
model.load_state_dict(torch.load(
'CharacterClass_effnet_SGD.pt', map_location='cpu'))
model.eval()
model.to(device)
def read_image(payload):
stream=BytesIO(payload)
image=np.asarray(bytearray(stream.read()),dtype="uint8")
image=cv2.imdecode(image,cv2.IMREAD_COLOR)
if isinstance(image,np.ndarray):
img=Image.fromarray(image)
return img
def classify(payload):
img=read_image(payload)
img=test_transform(img)
with torch.no_grad():
ps=model(img.unsqueeze(0))
ps=F.softmax(ps,dim=1)
topk,topclass=ps.topk(1,dim=1)
x=topclass.view(-1).cpu().view()
return idx_to_class[x[0]]
#app.get("/")
def get():
return "Welcome to classification server."
#app.post("/classify/")
async def classify_image(file:UploadFile=File(...)):
#return "File Uploaded."
image_byte=await file.read()
return classify(image_byte)
Your code has defined the classify route on POST requests. Your browser will only perform GET requests from the url
If you expect the browser to work, use #app.get("/classify/"). However, you'll need a different way to provide the file argument, which as a query parameter and a file path (not recommended for security reasons).
If you want POST requests to work, test your code with curl or Postman
I had this error while using AraBERT,
from arabert.preprocess import ArabertPreprocessor
model_name = "bert-base-arabertv2"
arabert_prep = ArabertPreprocessor(model_name=model_name, keep_emojis=False)
text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
arabert_prep.preprocess(text)
It might be that farasapy is required as per the docs, so try to install first.
It is recommended to apply our preprocessing function before training/testing on any dataset. Install farasapy to segment text for AraBERT v1 & v2 pip install farasapy
I am developing an object detection model to detect ships using YOLO. I want to use the COCO dataset. Is there a way to download only the images that have ships with the annotations?
To download images from a specific category, you can use the COCO API. Here's a demo notebook going through this and other usages. The overall process is as follows:
Install pycocotools
Download one of the annotations jsons from the COCO dataset
Now here's an example on how we could download a subset of the images containing a person and saving it in a local file:
from pycocotools.coco import COCO
import requests
# instantiate COCO specifying the annotations json path
coco = COCO('...path_to_annotations/instances_train2014.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
Which returns a list of dictionaries with basic information on the images and its url. We can now use requests to GET the images and write them into a local folder:
# Save the images into a local folder
for im in images:
img_data = requests.get(im['coco_url']).content
with open('...path_saved_ims/coco_person/' + im['file_name'], 'wb') as handler:
handler.write(img_data)
Note that this will save all images from the specified category. So you might want to slice the images list to the first n.
From what I personally know, if you're talking about the COCO dataset only, I don't think they have a category for "ships". The closest category they have is "boat". Here's the link to check the available categories: http://cocodataset.org/#overview
BTW, there are ships inside the boat category too.
If you want to just select images of a specific COCO category, you might want to do something like this (taken and edited from COCO's official demos):
# display COCO categories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))
# get all images containing given categories (I'm selecting the "bird")
catIds = coco.getCatIds(catNms=['bird']);
imgIds = coco.getImgIds(catIds=catIds);
Nowadays there is a package called fiftyone with which you could download the MS COCO dataset and get the annotations for specific classes only. More information about installation can be found at https://github.com/voxel51/fiftyone#installation.
Once you have the package installed, simply run the following to get say the "person" and "car" classes:
import fiftyone.zoo as foz
# To download the COCO dataset for only the "person" and "car" classes
dataset = foz.load_zoo_dataset(
"coco-2017",
split="train",
label_types=["detections", "segmentations"],
classes=["person", "car"],
# max_samples=50,
)
If desired, you can comment out the last option to set a maximum samples size. Moreover, you can change the "train" split to "validation" in order to obtain the validation split instead.
To visualize the dataset downloaded, simply run the following:
# Visualize the dataset in the FiftyOne App
import fiftyone as fo
session = fo.launch_app(dataset)
If you would like to download the splits "train", "validation", and "test" in the same function call of the data to be loaded, you could do the following:
dataset = foz.load_zoo_dataset(
"coco-2017",
splits=["train", "validation", "test"],
label_types=["detections", "segmentations"],
classes=["person"],
# max_samples=50,
)
I tried the code that #yatu and #Tim had shared here, but I got lots of requests.exceptions.ConnectionError: HTTPSConnectionPool.
So after carefully reading this answer to Max retries exceeded with URL in requests, I rewrote the code like this one and now it runs smoothly:
from pycocotools.coco import COCO
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
import requests
from tqdm.notebook import tqdm
# instantiate COCO specifying the annotations json path
coco = COCO('annotations/instances_train2017.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
# handle annotations
ANNOTATIONS = {"info": {
"description": "my-project-name"
}
}
def cocoJson(images: list) -> dict:
arrayIds = np.array([k["id"] for k in images])
annIds = coco.getAnnIds(imgIds=arrayIds, catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
for k in anns:
k["category_id"] = catIds.index(k["category_id"])+1
catS = [{'id': int(value), 'name': key}
for key, value in categories.items()]
ANNOTATIONS["images"] = images
ANNOTATIONS["annotations"] = anns
ANNOTATIONS["categories"] = catS
return ANNOTATIONS
def createJson(JsonFile: json, label='train') -> None:
name = label
Path("data/labels").mkdir(parents=True, exist_ok=True)
with open(f"data/labels/{name}.json", "w") as outfile:
json.dump(JsonFile, outfile)
def downloadImages(images: list) -> None:
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
for im in tqdm(images):
if not isfile(f"data/images/{im['file_name']}"):
img_data = session.get(im['coco_url']).content
with open('data/images/' + im['file_name'], 'wb') as handler:
handler.write(img_data)
trainSet = cocoJson(images)
createJson(trainSet)
downloadImages(images)
On my side I had recent difficulties installing fiftyone with Apple Silicon Mac (M1), so I created a script based on pycocotools that allows me to quickly download a subset of the coco 2017 dataset (images and annotations).
It is very simple to use, details are available here: https://github.com/tikitong/minicoco , hope this helps.
Im having troubles running the following code:
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json(BQJSONKEY,project = BQPROJECT)
dataset = client.dataset(BQDATASET)
assert not dataset.exists()
The following error pop up:
'DatasetReference' object has no attribute 'exists'
Similarly when i do:
table = dataset.table(BQTABLE)
i get: 'TableReference' object has no attribute 'exists'
However, according to the docs it should work:
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#datasets
here is my pip freeze (the part with google-cloud):
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gevent==1.2.2
glob2==0.5
gmpy2==2.0.8
google-api-core==0.1.1
google-auth==1.2.1
google-cloud==0.30.0
google-cloud-bigquery==0.28.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.1
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
I wonder how can i fix it and make it work?
Not sure how you got to this docs but you should be using these as reference:
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
Code for 0.28 would be something like:
dataset_refence = client.dataset(BQDATASET)
dataset = client.get_dataset(dataset_reference)
assert dataset.created is not None
I think you forgot to create the dataset before calling exists()
dataset = client.dataset(BQDATASET)
dataset.create()
assert not dataset.exists()