I am trying to learn sentiment analysis using vaderSentiment. For some reason, when I create a query, I am getting the above error. I've checked the documentation, and there is no mention of a Utc attribute being required. Here is the basic code I am using:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import twint
analyzer = SentimentIntensityAnalyzer()
c = twint.Config()
c.Search = "Tesla"
c.Store_object = True
c.Since = "2023-01-20"
twint.run.Search(c)
for tweet in twint.output.tweets_list:
    print(tweet.tweet)
    scores = analyzer.polarity_scores(tweet.tweet)
    print(scores)
I've had a great deal of trouble getting twint to work on my system, so it's highly probable I haven't installed it correctly.
With VBA, I can edit arbitrary workbook metadata like so, and it will be reflected on SharePoint:
With ThisWorkbook
    .ContentTypeProperties("Property A") = 1
    .ContentTypeProperties("Prop B") = "Something"
End With
Now, I am hoping to do the same with openpyxl.
I can do this for properties without spaces:
wb.properties.title = 'test'
but properties with spaces won't work. I try this and the script runs, but nothing shows up on SharePoint:
setattr(wb.properties, 'Project Title', 'hello')
wb.properties.__dict__['Project Number'] = '12'
This can be done with xlsxwriter:
import xlsxwriter
workbook = xlsxwriter.Workbook(wb_path)
workbook.set_custom_property('Project Title', 'hello')
workbook.close()
but it will create a new workbook...
According to this merge request https://foss.heptapod.net/openpyxl/openpyxl/-/merge_requests/384/diffs?commit_id=e00ce36aa92ae4fffa7014e460a8999681d73b8b
I could simply do
wb.custom_doc_props.add(k, v), but I'm getting AttributeError: no attribute 'custom_doc_props' with what I believe is the latest version, 3.0.10.
I installed the 3.2.0b1 version with pip, and now I get AttributeError: 'CustomDocumentPropertyList' object has no attribute 'add'. I guess the method isn't fully implemented yet.
This works with version 3.1.0 of openpyxl. You can install it via pip with
python -m pip install https://foss.heptapod.net/openpyxl/openpyxl/-/archive/branch/3.1/openpyxl-branch-3.1.zip
and assign properties like so
from openpyxl.packaging.custom import (
    BoolProperty,
    DateTimeProperty,
    FloatProperty,
    IntProperty,
    LinkProperty,
    StringProperty,
    CustomPropertyList,
)
props = CustomPropertyList()
props.append(StringProperty(name='hello world', value='foo bar'))
wb.custom_doc_props = props
wb.save(...)
Data is preserved on SharePoint. More info here: https://foss.heptapod.net/openpyxl/openpyxl/-/blob/branch/3.1/doc/workbook_custom_doc_props.rst
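To mirror the original VBA example (an integer property and a string property whose names contain spaces), the other property types from the import above can be appended in the same way. This is just a sketch: the property names are the ones from the question, and I'm assuming IntProperty takes the same name/value arguments as StringProperty.
from openpyxl import Workbook
from openpyxl.packaging.custom import (
    IntProperty,
    StringProperty,
    CustomPropertyList,
)
wb = Workbook()
props = CustomPropertyList()
# The names below contain spaces, which is fine because they are plain strings
props.append(IntProperty(name='Property A', value=1))
props.append(StringProperty(name='Prop B', value='Something'))
wb.custom_doc_props = props
wb.save('custom_props_demo.xlsx')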
import pandas as pd
from ast import literal_eval
from cdqa.utils.filters import filter_paragraphs
from cdqa.utils.download import download_model, download_bnpp_data
from cdqa.pipeline.cdqa_sklearn import QAPipeline
# Download data and models
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')
# Loading data and filtering / preprocessing the documents
df = pd.read_csv('data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})
df = filter_paragraphs(df)
# Loading QAPipeline with CPU version of BERT Reader pretrained on SQuAD 1.1
cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib')
# Fitting the retriever to the list of documents in the dataframe
cdqa_pipeline.fit_retriever(X=df)
I was trying to load the model, but the cdqa_pipeline = QAPipeline(reader='C:/models/bert_qa.joblib') line throws an error saying AttributeError: module 'transformers.modeling_bert' has no attribute 'gelu'.
I was using transformers version 3.5, where gelu is exposed from transformers.activations instead:
from transformers.activations import gelu
I ended up using a different version. The issue was that I had a requirement pinning this version, so I had to change other parts of my program too.
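If you have to stay on a transformers 3.x release, another option might be to patch the attribute back onto the module before importing cdQA. This is only a sketch, under the assumption that gelu is the only symbol cdQA looks up on transformers.modeling_bert:
# Assumption: cdQA only needs transformers.modeling_bert.gelu, which newer
# transformers releases moved to transformers.activations.
import transformers.modeling_bert as modeling_bert
from transformers.activations import gelu

if not hasattr(modeling_bert, "gelu"):
    modeling_bert.gelu = gelu  # re-expose the attribute cdQA expects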
I am developing an object detection model to detect ships using YOLO. I want to use the COCO dataset. Is there a way to download only the images that have ships with the annotations?
To download images from a specific category, you can use the COCO API. Here's a demo notebook going through this and other usages. The overall process is as follows:
Install pycocotools
Download one of the annotations jsons from the COCO dataset
Now here's an example of how we could download a subset of the images containing a person and save them to a local folder:
from pycocotools.coco import COCO
import requests
# instantiate COCO specifying the annotations json path
coco = COCO('...path_to_annotations/instances_train2014.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
This returns a list of dictionaries with basic information on the images and their URLs. We can now use requests to GET the images and write them into a local folder:
# Save the images into a local folder
for im in images:
    img_data = requests.get(im['coco_url']).content
    with open('...path_saved_ims/coco_person/' + im['file_name'], 'wb') as handler:
        handler.write(img_data)
Note that this will save all images from the specified category. So you might want to slice the images list to the first n.
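For instance, to cap the download at an arbitrary number of images (500 here is just an example), slice the list before running the loop above:
images = images[:500]  # keep only the first 500 images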
From what I personally know, if you're talking about the COCO dataset only, I don't think they have a category for "ships". The closest category they have is "boat". Here's the link to check the available categories: http://cocodataset.org/#overview
BTW, there are ships inside the boat category too.
If you want to just select images of a specific COCO category, you might want to do something like this (taken and edited from COCO's official demos):
# display COCO categories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))
# get all images containing given categories (I'm selecting "bird" here)
catIds = coco.getCatIds(catNms=['bird'])
imgIds = coco.getImgIds(catIds=catIds)
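For the ship-detection use case from the question, the same calls work with the 'boat' category, which as noted above is the closest COCO class:
# COCO has no "ship" class, so use the closest available one, "boat"
catIds = coco.getCatIds(catNms=['boat'])
imgIds = coco.getImgIds(catIds=catIds)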
Nowadays there is a package called fiftyone with which you could download the MS COCO dataset and get the annotations for specific classes only. More information about installation can be found at https://github.com/voxel51/fiftyone#installation.
Once you have the package installed, simply run the following to get, say, the "person" and "car" classes:
import fiftyone.zoo as foz
# To download the COCO dataset for only the "person" and "car" classes
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="train",
    label_types=["detections", "segmentations"],
    classes=["person", "car"],
    # max_samples=50,
)
If desired, you can uncomment the last option to set a maximum sample size. Moreover, you can change the "train" split to "validation" in order to obtain the validation split instead.
To visualize the dataset downloaded, simply run the following:
# Visualize the dataset in the FiftyOne App
import fiftyone as fo
session = fo.launch_app(dataset)
If you would like to download the "train", "validation", and "test" splits in a single call, you could do the following:
dataset = foz.load_zoo_dataset(
    "coco-2017",
    splits=["train", "validation", "test"],
    label_types=["detections", "segmentations"],
    classes=["person"],
    # max_samples=50,
)
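For the original question, the same call works with the "boat" class, since COCO has no dedicated "ship" class:
import fiftyone.zoo as foz
# Download only training images that contain boats, the closest COCO class to "ship"
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="train",
    label_types=["detections"],
    classes=["boat"],
)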
I tried the code that @yatu and @Tim shared here, but I got lots of requests.exceptions.ConnectionError: HTTPSConnectionPool errors.
So after carefully reading this answer to Max retries exceeded with URL in requests, I rewrote the code like this, and now it runs smoothly:
from pycocotools.coco import COCO
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
from pathlib import Path
from os.path import isfile
import requests
import numpy as np
import json
from tqdm.notebook import tqdm
# instantiate COCO specifying the annotations json path
coco = COCO('annotations/instances_train2017.json')
# Specify a list of category names of interest
catIds = coco.getCatIds(catNms=['person'])
# Get the corresponding image ids and images using loadImgs
imgIds = coco.getImgIds(catIds=catIds)
images = coco.loadImgs(imgIds)
# Map each selected category name to its new, 1-based id
categories = {cat['name']: catIds.index(cat['id']) + 1
              for cat in coco.loadCats(catIds)}
# handle annotations
ANNOTATIONS = {"info": {
    "description": "my-project-name"
    }
}
def cocoJson(images: list) -> dict:
    arrayIds = np.array([k["id"] for k in images])
    annIds = coco.getAnnIds(imgIds=arrayIds, catIds=catIds, iscrowd=None)
    anns = coco.loadAnns(annIds)
    for k in anns:
        k["category_id"] = catIds.index(k["category_id"]) + 1
    catS = [{'id': int(value), 'name': key}
            for key, value in categories.items()]
    ANNOTATIONS["images"] = images
    ANNOTATIONS["annotations"] = anns
    ANNOTATIONS["categories"] = catS
    return ANNOTATIONS
def createJson(JsonFile: dict, label='train') -> None:
    name = label
    Path("data/labels").mkdir(parents=True, exist_ok=True)
    with open(f"data/labels/{name}.json", "w") as outfile:
        json.dump(JsonFile, outfile)
def downloadImages(images: list) -> None:
    # Make sure the target folder exists before writing the image files
    Path("data/images").mkdir(parents=True, exist_ok=True)
    session = requests.Session()
    retry = Retry(connect=3, backoff_factor=0.5)
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    for im in tqdm(images):
        if not isfile(f"data/images/{im['file_name']}"):
            img_data = session.get(im['coco_url']).content
            with open('data/images/' + im['file_name'], 'wb') as handler:
                handler.write(img_data)
trainSet = cocoJson(images)
createJson(trainSet)
downloadImages(images)
On my side, I recently had difficulties installing fiftyone on an Apple Silicon Mac (M1), so I created a script based on pycocotools that lets me quickly download a subset of the COCO 2017 dataset (images and annotations).
It is very simple to use; details are available here: https://github.com/tikitong/minicoco . Hope this helps.
I'm having trouble running the following code:
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json(BQJSONKEY,project = BQPROJECT)
dataset = client.dataset(BQDATASET)
assert not dataset.exists()
The following error pops up:
'DatasetReference' object has no attribute 'exists'
Similarly, when I do:
table = dataset.table(BQTABLE)
I get: 'TableReference' object has no attribute 'exists'
However, according to the docs it should work:
https://googlecloudplatform.github.io/google-cloud-python/stable/bigquery/usage.html#datasets
Here is my pip freeze (the part with google-cloud):
gapic-google-cloud-datastore-v1==0.15.3
gapic-google-cloud-error-reporting-v1beta1==0.15.3
gapic-google-cloud-logging-v2==0.91.3
gevent==1.2.2
glob2==0.5
gmpy2==2.0.8
google-api-core==0.1.1
google-auth==1.2.1
google-cloud==0.30.0
google-cloud-bigquery==0.28.0
google-cloud-bigtable==0.28.1
google-cloud-core==0.28.0
google-cloud-datastore==1.4.0
google-cloud-dns==0.28.0
google-cloud-error-reporting==0.28.0
google-cloud-firestore==0.28.0
google-cloud-language==1.0.0
google-cloud-logging==1.4.0
google-cloud-monitoring==0.28.0
google-cloud-pubsub==0.29.1
google-cloud-resource-manager==0.28.0
google-cloud-runtimeconfig==0.28.0
google-cloud-spanner==0.29.0
google-cloud-speech==0.30.0
google-cloud-storage==1.6.0
google-cloud-trace==0.16.0
google-cloud-translate==1.3.0
google-cloud-videointelligence==0.28.0
google-cloud-vision==0.28.0
google-gax==0.15.16
google-resumable-media==0.3.1
googleapis-common-protos==1.5.3
I wonder how I can fix it and make it work.
Not sure how you got to those docs, but you should be using these as a reference:
https://googlecloudplatform.github.io/google-cloud-python/latest/bigquery/usage.html#datasets
Code for 0.28 would be something like:
dataset_reference = client.dataset(BQDATASET)
dataset = client.get_dataset(dataset_reference)
assert dataset.created is not None
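If you specifically need an existence check with the 0.28 client, here is a small sketch, assuming get_dataset raises google.cloud.exceptions.NotFound for a missing dataset:
from google.cloud.exceptions import NotFound
def dataset_exists(client, dataset_reference):
    # Return True if the dataset exists, False otherwise
    try:
        client.get_dataset(dataset_reference)
        return True
    except NotFound:
        return False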
I think you forgot to create the dataset before calling exists():
dataset = client.dataset(BQDATASET)
dataset.create()
assert dataset.exists()