AttributeError: 'ArabertPreprocessor' object has no attribute 'farasa_segmenter'

I hit this error while running the following code with AraBERT:
from arabert.preprocess import ArabertPreprocessor
model_name = "bert-base-arabertv2"
arabert_prep = ArabertPreprocessor(model_name=model_name, keep_emojis=False)
text = "ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"
arabert_prep.preprocess(text)

It might be that farasapy is required, as per the docs, so try installing it first. The docs say:
It is recommended to apply our preprocessing function before training/testing on any dataset. Install farasapy to segment text for AraBERT v1 & v2: pip install farasapy
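A minimal sketch of the fix, assuming the missing farasapy dependency is indeed the cause: install it with pip install farasapy, then re-create the preprocessor so it can build its Farasa segmenter.

from arabert.preprocess import ArabertPreprocessor

# re-create the preprocessor after farasapy is installed; the v2 models
# rely on Farasa for morphological segmentation during preprocessing
arabert_prep = ArabertPreprocessor(model_name="bert-base-arabertv2", keep_emojis=False)
print(arabert_prep.preprocess("ولن نبالغ إذا قلنا إن هاتف أو كمبيوتر المكتب في زمننا هذا ضروري"))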

AttributeError: 'Config' object has no attribute 'Utc'

I am trying to learn sentiment analysis using vaderSentiment. For some reason, when I create a query, I get the above error. I've checked the documentation, and there is no mention of a Utc attribute being required. Here is the basic code I am using:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import twint
analyzer = SentimentIntensityAnalyzer()
c = twint.Config()
c.Search = "Tesla"
c.Store_object = True
c.Since = "2023-01-20"
twint.run.Search(c)
for tweet in twint.output.tweets_list:
    print(tweet.tweet)
    scores = analyzer.polarity_scores(tweet.tweet)
    print(scores)
I've had a lot of issues trying to get twint working on my system, so it's quite possible I haven't installed it correctly.

Add custom properties to Excel workbook with openpyxl

With VBA, I can edit arbitrary workbook metadata like so, and it will be reflected on SharePoint:
With ThisWorkbook
    .ContentTypeProperties("Property A") = 1
    .ContentTypeProperties("Prop B") = "Something"
End With
Now, I am hoping to do the same with openpyxl.
I can do this for properties without spaces:
wb.properties.title = 'test'
but properties with spaces won't work--I try this and the script runs, but nothing shows on SharePoint:
setattr(wb.properties, 'Project Title', 'hello')
wb.properties.__dict__['Project Number'] = '12'
This can be done with xlsxwriter
import xlsxwriter
workbook = xlsxwriter.Workbook(wb_path)
workbook.set_custom_property('Project Title', 'hello')
workbook.close()
but it will create a new workbook...
According to this merge request https://foss.heptapod.net/openpyxl/openpyxl/-/merge_requests/384/diffs?commit_id=e00ce36aa92ae4fffa7014e460a8999681d73b8b I should simply be able to do wb.custom_doc_props.add(k, v), but I'm getting no attribute 'custom_doc_props' with what I believe is the latest version, 3.0.10.
I installed the 3.2.0b1 version with pip, and now I get AttributeError: 'CustomDocumentPropertyList' object has no attribute 'add'. I guess the method isn't fully implemented yet.
This works with version 3.1.0 of openpyxl. You can install it via pip with
python -m pip install https://foss.heptapod.net/openpyxl/openpyxl/-/archive/branch/3.1/openpyxl-branch-3.1.zip
and assign properties like so:
from openpyxl.packaging.custom import (
    BoolProperty,
    DateTimeProperty,
    FloatProperty,
    IntProperty,
    LinkProperty,
    StringProperty,
    CustomPropertyList,
)

props = CustomPropertyList()
props.append(StringProperty(name='hello world', value='foo bar'))
wb.custom_doc_props = props
wb.save(...)
Data is preserved on SharePoint. More info here: https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.1/doc/workbook_custom_doc_props.rst
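For reference, a small sketch along the same pattern showing a few of the other property types the module exposes (the property names here are just illustrative, echoing the ones from the question):

import datetime
from openpyxl import Workbook
from openpyxl.packaging.custom import (
    CustomPropertyList,
    DateTimeProperty,
    IntProperty,
    StringProperty,
)

wb = Workbook()
props = CustomPropertyList()
props.append(StringProperty(name='Project Title', value='hello'))    # text value
props.append(IntProperty(name='Project Number', value=12))           # integer value
props.append(DateTimeProperty(name='Reviewed On', value=datetime.datetime(2023, 1, 20)))  # date value
wb.custom_doc_props = props
wb.save('custom_props.xlsx')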

AttributeError: module 'transformers.modeling_bert' has no attribute 'gelu'

import pandas as pd
from ast import literal_eval
from cdqa.utils.filters import filter_paragraphs
from cdqa.utils.download import download_model, download_bnpp_data
from cdqa.pipeline.cdqa_sklearn import QAPipeline
# Download data and models
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')
# Loading data and filtering / preprocessing the documents
df = pd.read_csv('data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})
df = filter_paragraphs(df)
# Loading QAPipeline with CPU version of BERT Reader pretrained on SQuAD 1.1
cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib')
# Fitting the retriever to the list of documents in the dataframe
cdqa_pipeline.fit_retriever(X=df)
I was trying to load the model, but the line cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib') throws AttributeError: module 'transformers.modeling_bert' has no attribute 'gelu'. I was using transformers version 3.5.
In newer transformers releases, gelu is imported from transformers.activations (from transformers.activations import gelu) rather than living in transformers.modeling_bert. I ended up using a different transformers version; since another part of my program required 3.5, I had to change those parts too.
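Two possible workarounds, sketched under the assumption that the error comes from cdqa (or its pickled model) looking up gelu on the old transformers.modeling_bert module: pin an older transformers release, or patch the attribute back in before loading the model. The version pin below is an assumption; check cdqa's own requirements for the exact version it supports.

# Option 1 (downgrade): pip install "transformers<3"   # assumed pin; verify against cdqa's requirements

# Option 2 (patch): re-expose gelu where the old code path expects it,
# before the joblib model is loaded
import transformers.modeling_bert
from transformers.activations import gelu

transformers.modeling_bert.gelu = gelu

from cdqa.pipeline.cdqa_sklearn import QAPipeline
cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib')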

How can I download a specific part of Coco Dataset?

I am developing an object detection model to detect ships using YOLO. I want to use the COCO dataset. Is there a way to download only the images that have ships with the annotations?
To download images from a specific category, you can use the COCO API. Here's a demo notebook going through this and other usages. The overall process is as follows:
Install pycocotools
Download one of the annotations jsons from the COCO dataset
Now here's an example of how we could download a subset of the images containing a person and save them in a local folder:
from pycocotools.coco import COCO
import requests
# instantiate COCO specifying the annotations json path
coco = COCO('...path_to_annotations/instances_train2014.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
This returns a list of dictionaries with basic information about each image and its URL. We can now use requests to GET the images and write them into a local folder:
# Save the images into a local folder
for im in images:
    img_data = requests.get(im['coco_url']).content
    with open('...path_saved_ims/coco_person/' + im['file_name'], 'wb') as handler:
        handler.write(img_data)
Note that this will save all images from the specified category. So you might want to slice the images list to the first n.
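For example, to download only the first n matching images (n = 500 here is just an illustrative value):

# keep only the first 500 images of the category before downloading
images = images[:500]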
From what I personally know, if you're talking about the COCO dataset only, I don't think they have a category for "ships". The closest category they have is "boat". Here's the link to check the available categories: http://cocodataset.org/#overview
BTW, there are ships inside the boat category too.
If you want to just select images of a specific COCO category, you might want to do something like this (taken and edited from COCO's official demos):
# display COCO categories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))
# get all images containing given categories (I'm selecting the "bird")
catIds = coco.getCatIds(catNms=['bird']);
imgIds = coco.getImgIds(catIds=catIds);
Nowadays there is a package called fiftyone with which you could download the MS COCO dataset and get the annotations for specific classes only. More information about installation can be found at https://github.com/voxel51/fiftyone#installation.
Once you have the package installed, simply run the following to get, say, the "person" and "car" classes:
import fiftyone.zoo as foz
# To download the COCO dataset for only the "person" and "car" classes
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="train",
    label_types=["detections", "segmentations"],
    classes=["person", "car"],
    # max_samples=50,
)
If desired, you can uncomment the last option (max_samples) to cap the number of samples downloaded. Moreover, you can change the "train" split to "validation" in order to obtain the validation split instead.
To visualize the dataset downloaded, simply run the following:
# Visualize the dataset in the FiftyOne App
import fiftyone as fo
session = fo.launch_app(dataset)
If you would like to download the "train", "validation", and "test" splits in the same call, you could do the following:
dataset = foz.load_zoo_dataset(
    "coco-2017",
    splits=["train", "validation", "test"],
    label_types=["detections", "segmentations"],
    classes=["person"],
    # max_samples=50,
)
I tried the code that @yatu and @Tim had shared here, but I got lots of requests.exceptions.ConnectionError: HTTPSConnectionPool errors.
So, after carefully reading this answer to "Max retries exceeded with URL in requests", I rewrote the code as follows, and now it runs smoothly:
import json
from pathlib import Path
from os.path import isfile

import numpy as np
import requests
from pycocotools.coco import COCO
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from tqdm.notebook import tqdm

# instantiate COCO specifying the annotations json path
coco = COCO('annotations/instances_train2017.json')

# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])

# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)

# map the selected category names to the re-indexed (1-based) ids used below
categories = {cat['name']: i + 1 for i, cat in enumerate(coco.loadCats(catIds))}

# handle annotations
ANNOTATIONS = {"info": {"description": "my-project-name"}}

def cocoJson(images: list) -> dict:
    arrayIds = np.array([k["id"] for k in images])
    annIds = coco.getAnnIds(imgIds=arrayIds, catIds=catIds, iscrowd=None)
    anns = coco.loadAnns(annIds)
    # re-index the category ids so they start at 1 for the selected categories
    for k in anns:
        k["category_id"] = catIds.index(k["category_id"]) + 1
    catS = [{'id': int(value), 'name': key} for key, value in categories.items()]
    ANNOTATIONS["images"] = images
    ANNOTATIONS["annotations"] = anns
    ANNOTATIONS["categories"] = catS
    return ANNOTATIONS

def createJson(JsonFile: dict, label='train') -> None:
    name = label
    Path("data/labels").mkdir(parents=True, exist_ok=True)
    with open(f"data/labels/{name}.json", "w") as outfile:
        json.dump(JsonFile, outfile)

def downloadImages(images: list) -> None:
    # use a session with retries so transient connection errors don't abort the download
    session = requests.Session()
    retry = Retry(connect=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    Path("data/images").mkdir(parents=True, exist_ok=True)
    for im in tqdm(images):
        if not isfile(f"data/images/{im['file_name']}"):
            img_data = session.get(im['coco_url']).content
            with open('data/images/' + im['file_name'], 'wb') as handler:
                handler.write(img_data)

trainSet = cocoJson(images)
createJson(trainSet)
downloadImages(images)
On my side, I recently had difficulties installing fiftyone on an Apple Silicon Mac (M1), so I created a script based on pycocotools that quickly downloads a subset of the COCO 2017 dataset (images and annotations).
It is very simple to use; details are available here: https://github.com/tikitong/minicoco , hope this helps.

References from Python Bigquery Client don't work

I'm having trouble running the following code:
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json(BQJSONKEY,project = BQPROJECT)
dataset = client.dataset(BQDATASET)
assert not dataset.exists()
The following error pops up:
'DatasetReference' object has no attribute 'exists'
Similarly, when I do:
table = dataset.table(BQTABLE)
I get: 'TableReference' object has no attribute 'exists'
However, according to the docs it should work:
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#datasets
Here is my pip freeze (the part with google-cloud):
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gevent==1.2.2
glob2==0.5
gmpy2==2.0.8
google-api-core==0.1.1
google-auth==1.2.1
google-cloud==0.30.0
google-cloud-bigquery==0.28.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.1
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
How can I fix this and make it work?
Not sure how you got to those docs, but you should be using these as a reference:
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
Code for 0.28 would be something like:
dataset_reference = client.dataset(BQDATASET)
dataset = client.get_dataset(dataset_reference)
assert dataset.created is not None
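If you specifically want an existence check with the 0.28-style API, a minimal sketch is to catch NotFound from get_dataset (the exception import path shown is the one from google-api-core; treat it as an assumption for your exact versions):

from google.cloud import bigquery
from google.api_core.exceptions import NotFound

client = bigquery.Client.from_service_account_json(BQJSONKEY, project=BQPROJECT)

def dataset_exists(client, dataset_id):
    # get_dataset raises NotFound when the dataset is missing
    try:
        client.get_dataset(client.dataset(dataset_id))
        return True
    except NotFound:
        return False

print(dataset_exists(client, BQDATASET))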
I think you forgot to create the dataset before calling exists()
dataset = client.dataset(BQDATASET)
dataset.create()
assert not dataset.exists()
