Splitting annotated images with their JSON files - Python

To make it short, I'm trying to split annotated images into train, val, and test sets. I used the splitfolders library, but it only split the images and left the annotation files (.json) behind.
So I tried the annotated_images library, since I've seen it can do that, but I got an error:
TypeError Traceback (most recent call last)
<ipython-input-19-2345d5e2fe5c> in <module>()
6 RANDOM_SEED=1337
7 IMG_PATH= '/content/Untitled Folder'
----> 8 annotated_images.split(IMG_PATH,'/content/', seed=1337, ratio=(0.8,0.15,0.05))
9
10 import splitfolders
3 frames
/usr/local/lib/python3.7/dist-packages/annotated_images/split.py in list_files(directory)
26 if len(files) == 0:
27 files = glob.glob
---> 28 if len(files) == 0:
29 files = glob.glob(directory + "*.*")
30 return files
TypeError: object of type 'function' has no len()
I'm using Google Colab.
Can anyone help, please? Or is there any other way to split images together with their respective annotations?
Thank you.
PS: I'm fairly new to Python and not very familiar with it.
PS2: The code:
import annotated_images
RANDOM_SEED=1337
IMG_PATH= '/content/Untitled Folder'
annotated_images.split(IMG_PATH,'/content/', seed=1337, ratio=(0.8,0.15,0.05))

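A simple oversight. Judging by line 29 in the traceback, list_files builds its glob pattern as directory + "*.*", so without a trailing slash the pattern never looks inside the folder. Add an extra slash to the end of your path, like this: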
IMG_PATH = '/content/Untitled Folder/'
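If the library keeps failing, the split can also be done by hand. Below is a minimal sketch using only the standard library. It assumes each annotation is a .json file with the same base name as its image and sits in the same folder. The function name, the extension list, and the train/val/test folder names are illustrative choices, not part of annotated_images or splitfolders.

```python
import random
import shutil
from pathlib import Path

# Assumption: these are the image types in the folder; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def split_annotated_images(src_dir, dst_dir, ratio=(0.8, 0.15, 0.05), seed=1337):
    """Copy each image and its same-named .json file into train/val/test."""
    src, dst = Path(src_dir), Path(dst_dir)
    # Keep only images that have an annotation file with the same base name.
    images = sorted(
        p for p in src.iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS and p.with_suffix(".json").exists()
    )
    random.Random(seed).shuffle(images)  # reproducible shuffle

    n_train = int(len(images) * ratio[0])
    n_val = int(len(images) * ratio[1])
    subsets = {
        "train": images[:n_train],
        "val": images[n_train:n_train + n_val],
        "test": images[n_train + n_val:],  # whatever is left goes to test
    }
    for name, files in subsets.items():
        out = dst / name
        out.mkdir(parents=True, exist_ok=True)
        for image in files:
            # Copy the image and its annotation side by side.
            shutil.copy2(image, out / image.name)
            shutil.copy2(image.with_suffix(".json"), out / (image.stem + ".json"))
    return {name: len(files) for name, files in subsets.items()}
```

It could be called as split_annotated_images('/content/Untitled Folder', '/content/output'), where the output folder name is only an example. Images without a matching .json file are skipped rather than copied.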

Related

How to merge selected text files saved in Google Drive into a single CSV file in Python?

My data are stored in different directories on Google Drive. I want to extract one particular text file from each directory and combine them into a single CSV file. A CSV file (read into model below) lists the model names I need, and the model name is the only part of the path that changes from file to file.
To be more specific, the model CSV file contains the following: ['ENS','ENS_hr','ENS_lr','MM5','MM5G','MPAS25','NMM3GFS','NMM3NAM','WRF2GFS','WRF2GFS81','WRF2NAM','WRF2NAM81','WRF3ARPEGE','WRF3GEM','WRF3GFS','WRF3GFSgc01','WRF3NAM','WRF3NAVGEM','WRF4ICON','WRF4NAM','WRF4RDPS']
Here is my code:
md = []
model = pd.read_csv(verification_path + 'model_name.csv')
#find file for the correct model
for m in model.iterrows():
    model_file = verification_path + m +'/MAE_monthly_APCP6_60hr_small.txt'
    new = pd.read_csv(model_file)
    md.append(new)
But I got this error:
TypeError Traceback (most recent call last)
<ipython-input-5-981115533dbc> in <module>
6 #find file for the correct model
7 for m in model.iterrows():
----> 8 model_file = verification_path + m +'/MAE_monthly_APCP6_60hr_small.txt'
9 new = pd.read_csv(model_file)
10 md.append(new)
TypeError: can only concatenate str (not "tuple") to str
Does anyone have any idea how to solve it? Is there a better way?
Thanks!
I tried to convert the tuple with the following code and got a new error.
The code for converting the tuple:
import functools
import operator
def convertTuple(tup):
    str = functools.reduce(operator.add, (tup))
    return str
The updated code:
for m in model.iterrows():
    model_file = verification_path + convertTuple(m) +'/MAE_monthly_APCP6_60hr_small.txt'
    new = pd.read_csv(model_file)
    md.append(new)
The error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/ops/array_ops.py in _na_arithmetic_op(left, right, op, is_cmp)
165 try:
--> 166 result = func(left, right)
167 except TypeError:
12 frames
TypeError: unsupported operand type(s) for +: 'int' and 'str'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/roperator.py in radd(left, right)
7
8 def radd(left, right):
----> 9 return right + left
10
11
TypeError: unsupported operand type(s) for +: 'int' and 'str'
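Both tracebacks seem to point at the same cause: DataFrame.iterrows() yields (index, row) tuples rather than strings, so neither the tuple nor its integer index can be concatenated into a path. A minimal sketch of one way around it is to build the paths from the plain model names instead. The helper name model_file_paths and the base folder "verification" are made up for illustration, and the sketch assumes the names sit in the first column of model_name.csv. If the file is really a single row of names, pandas would read them as column headers, and model.columns would be the place to look.

```python
import os

FILENAME = "MAE_monthly_APCP6_60hr_small.txt"


def model_file_paths(base_dir, model_names, filename=FILENAME):
    """Return one path per model name: <base_dir>/<model>/<filename>."""
    # str() guards against names that were parsed as something other than text.
    return [os.path.join(base_dir, str(name), filename) for name in model_names]


# With pandas, the names would come from a column instead of iterrows(), e.g.
#   names = model.iloc[:, 0]   # assumption: names are in the first column
#   frames = [pd.read_csv(p) for p in model_file_paths(verification_path, names)]
#   combined = pd.concat(frames, ignore_index=True)
#   combined.to_csv("combined.csv", index=False)   # hypothetical output name
paths = model_file_paths("verification", ["ENS", "MM5", "WRF3GFS"])
```

Using os.path.join also avoids having to remember the slashes between the folder, the model name, and the file name.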

Saving files in specific folders using a for loop

I am trying to save the Excel files produced by the code below to a specific folder, but I keep getting "SyntaxError: invalid syntax".
#df = some random data
dfinvest = df[df['INVEST'] >= 0.1]
groupedinvest = dfinvest.groupby("SEDOL")
keysI = groupedinvest.groups.keys()
save_path = "C:/Users/Documents/Python Scripts/"
for key in keysI:
    splitdf = groupedinvest.get_group(key)
    splitdf.to_excel(os.path.join(save_path((str(key)+ str(datetime.datetime.now().strftime(" %d-%m-%Y - Invest") )+ ".xlsx"), engine='xlsxwriter')))
Any help or pointing in the right direction would be much appreciated.
Error:
TypeError Traceback (most recent call last)
<ipython-input-48-5616ac4f50b7> in <module>
1 for key in keysI: #looping through each key
2 splitdf = groupedinvest.get_group(key) # creating a temporary dataframe with only the values of the current key.
----> 3 splitdf.to_excel(os.path.join(save_path((str(key)+ str(datetime.datetime.now().strftime(" %d-%m-%Y - Invest") )+ ".xlsx"), engine='xlsxwriter')))
TypeError: 'str' object is not callable
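The TypeError appears to come from save_path(...): parentheses written straight after a string try to call it like a function. os.path.join takes the folder and the file name as separate arguments, and engine is an argument of to_excel, not of os.path.join. Below is a minimal sketch of just the path-building step. The helper name invest_path and the key "ABC123" are made up for illustration.

```python
import datetime
import os

save_path = "C:/Users/Documents/Python Scripts/"


def invest_path(folder, key, when):
    """Build '<folder><key> dd-mm-YYYY - Invest.xlsx' for one group key."""
    filename = f"{key}{when.strftime(' %d-%m-%Y - Invest')}.xlsx"
    return os.path.join(folder, filename)  # folder and file name as two arguments


# Inside the loop from the question this would be used roughly as:
#   splitdf.to_excel(invest_path(save_path, key, datetime.datetime.now()),
#                    engine="xlsxwriter")
example = invest_path(save_path, "ABC123", datetime.datetime(2021, 3, 5))
```

Keeping the file name in its own variable makes the bracket nesting much easier to check than a single long line.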

HuggingFace BPE Trainer Error - Training Tokenizer

I am trying to train a ByteLevelBPETokenizer using an iterable instead of from files. There must be something I am doing wrong when I instantiate the trainer, but I can't tell what it is. When I try to train the tokenizer with my dataset (clothing data from Kaggle) + the BpeTrainer, I get an error.
**TypeError**: 'tokenizers.trainers.BpeTrainer' object cannot be interpreted as an integer
I am using Colab
Step 1: Install tokenizers & download the Kaggle data
!pip install tokenizers
# Download clothing data from Kaggle
# https://www.kaggle.com/nicapotato/womens-ecommerce-clothing-reviews/version/1?select=Womens+Clothing+E-Commerce+Reviews.csv
Step 2: Upload the file
# use colab file upload
from google.colab import files
uploaded = files.upload()
Step 3: Clean the data (remove floats) & run trainer
import io
import pandas as pd
# convert the csv to a dataframe so it can be parsed
data = io.BytesIO(uploaded['clothing_dataset.csv'])
df = pd.read_csv(data)
# convert the review text to a list so it can be passed as iterable to tokenizer
clothing_data = df['Review Text'].to_list()
# Remove float values from the data
clean_data = []
for item in clothing_data:
    if type(item) != float:
        clean_data.append(item)
from tokenizers import ByteLevelBPETokenizer
from tokenizers.processors import BertProcessing
from tokenizers import trainers, pre_tokenizers
from tokenizers.trainers import BpeTrainer
from pathlib import Path
# Initialize a tokenizer
tokenizer = ByteLevelBPETokenizer(lowercase=True)
# Instantiate BpeTrainer
trainer = BpeTrainer(
    vocab_size=20000,
    min_frequence = 2,
    show_progress=True,
    special_tokens=["<s>","<pad>","</s>","<unk>","<mask>"],)
# Train the tokenizer
tokenizer.train_from_iterator(clean_data, trainer)
Error - I can see that the trainer is a BpeTrainer Type.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-103-7738a7becb0e> in <module>()
34
35 # Train the tokenizer
---> 36 tokenizer.train_from_iterator(clean_data, trainer)
/usr/local/lib/python3.7/dist-packages/tokenizers/implementations/byte_level_bpe.py in train_from_iterator(self, iterator, vocab_size, min_frequency, show_progress, special_tokens)
119 show_progress=show_progress,
120 special_tokens=special_tokens,
--> 121 initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
122 )
123 self._tokenizer.train_from_iterator(iterator, trainer=trainer)
TypeError: 'tokenizers.trainers.BpeTrainer' object cannot be interpreted as an integer
Interesting Note: If I set the input trainer=trainer I get this
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-104-64737f948e6d> in <module>()
34
35 # Train the tokenizer
---> 36 tokenizer.train_from_iterator(clean_data, trainer=trainer)
TypeError: train_from_iterator() got an unexpected keyword argument 'trainer'
I haven't used train_from_iterator before but looking at the HF docs it seems you should use a generator function. So something like:
def clothing_generator():
    for item in clothing_data:
        if type(item) != float:
            yield item
Followed by:
tokenizer.train_from_iterator(clothing_generator(), trainer)
Might help?
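One more thing worth checking, going only by the traceback in the question: the signature shown for ByteLevelBPETokenizer.train_from_iterator is (self, iterator, vocab_size, min_frequency, show_progress, special_tokens). A trainer passed as the second positional argument would therefore land in vocab_size, which would explain the "cannot be interpreted as an integer" message. A minimal sketch, assuming the installed tokenizers version matches that signature, passes the settings as keyword arguments instead of a BpeTrainer. The sample texts and the small vocab_size are made up for illustration.

```python
from tokenizers import ByteLevelBPETokenizer

# Made-up stand-in for the cleaned review texts from the question.
clean_data = [
    "Love this dress, it fits perfectly.",
    "The fabric is soft but the sizing runs small.",
    "Great colour, would buy again.",
]

tokenizer = ByteLevelBPETokenizer(lowercase=True)

# The wrapper builds its own trainer internally, so the settings go in
# as keyword arguments rather than as a BpeTrainer object.
tokenizer.train_from_iterator(
    clean_data,
    vocab_size=300,       # small value for the toy corpus; the question used 20000
    min_frequency=2,      # note the spelling used in the signature
    show_progress=False,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

encoded = tokenizer.encode("love this dress")
```

After training, encoded.tokens holds the byte-level pieces for the sample sentence, which is a quick way to confirm the tokenizer was actually trained.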

When I use TF-IDF in natural language processing, it says list is not callable. Can you help me with it?

I got an error like this:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-38-b9ac626e6121> in <module>
5
6 # Fitting TF-IDF to both training and test sets (semi-supervised learning)
----> 7 tfv.fit(list(xtrain) + list(xvalid))
8 xtrain_tfv = tfv.transform(xtrain)
9 xvalid_tfv = tfv.transform(xvalid)
TypeError: 'list' object is not callable
when I run this code in Python:
tfv = TfidfVectorizer(min_df=3, max_features=None,
                      strip_accents='unicode', analyzer='word',token_pattern=r'\w{1,}',
                      ngram_range=(1, 3), use_idf=1,smooth_idf=1,sublinear_tf=1,
                      stop_words = 'english')
# Fitting TF-IDF to both training and test sets (semi-supervised learning)
tfv.fit(list(xtrain) + list(xvalid))
xtrain_tfv = tfv.transform(xtrain)
xvalid_tfv = tfv.transform(xvalid)
P.S. I also tried converting xtrain to a list with xtrain.tolist(), but that doesn't work for me either.
From the code you provided, nothing seems wrong. However, I suspect that somewhere before that block you assigned an object to the name list (most likely something along the lines of list = [...]), which is the usual cause of this error.
Try to find that line, if it exists, and rename the variable. In general it is not a good idea to reuse the names of built-in types for your own variables, for exactly this reason.
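The effect is easy to reproduce in isolation. The sketch below uses two made-up functions: one reuses the name list for a variable and then tries to call it, and the other gives the variable a different name.

```python
def call_with_shadowed_list(values):
    list = ["some", "earlier", "data"]  # shadows the built-in inside this scope
    try:
        return list(values)             # calls the list object, not the type
    except TypeError as exc:
        return str(exc)


def call_with_renamed_variable(values):
    earlier_data = ["some", "earlier", "data"]  # renamed, built-in untouched
    return list(values)


message = call_with_shadowed_list("ab")    # "'list' object is not callable"
fixed = call_with_renamed_variable("ab")   # ['a', 'b']
```

In a notebook, where the shadowing assignment was made at the top level of an earlier cell, running del list (or restarting the runtime) brings the built-in back.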

Type error when trying to list all the sub-directories of a directory

I want to list all the sub-directories of a directory, but it throws a TypeError:
TRAIN_PATH_ARRAY=['New folder/train/']
TEST_PATH_ARRAY=['New folder/test/']
train_ids = next(os.walk(TRAIN_PATH_ARRAY))[1]
test_ids = next(os.walk(TEST_PATH_ARRAY))[1]
np.random.seed(10)
Output:
TypeError Traceback (most recent call last)
<ipython-input-11-a1a31c46fb70> in <module>
----> 1 train_ids = next(os.walk(TRAIN_PATH_ARRAY))[1]
2 test_ids = next(os.walk(TEST_PATH_ARRAY))[1]
3 np.random.seed(10)
~\Anaconda3\lib\os.py in walk(top, topdown, onerror, followlinks)
334
335 """
--> 336 top = fspath(top)
337 dirs = []
338 nondirs = []
TypeError: expected str, bytes or os.PathLike object, not list
As the error message rather plainly says, the argument to os.walk() should be a str (or a pathlib path), not a list.
It's not really clear what you hope the code will accomplish. Taking just the second element of the first result from os.walk() is probably not what you want, because it contains only the names of the immediate subdirectories, relative to the starting directory, not full paths. But if that's what you are after, maybe try
TRAIN_PATH='New folder/train/'
TEST_PATH='New folder/test/'
train_ids = [os.path.join(TRAIN_PATH, x) for x in next(os.walk(TRAIN_PATH))[1]]
test_ids = [os.path.join(TEST_PATH, x) for x in next(os.walk(TEST_PATH))[1]]
If indeed you want to traverse an array, I'm afraid you will need to explain the intention of your code in more detail.
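In case the goal really is to keep several starting folders in a list, a minimal sketch would loop over the list and call os.walk() once per entry. The helper name list_subdirectories and the folder names id_001 and id_002 are made up for illustration, and a temporary directory stands in for 'New folder/train/'.

```python
import os
import tempfile


def list_subdirectories(roots):
    """Return the full paths of the immediate subdirectories of each root."""
    found = []
    for root in roots:  # os.walk() takes one path at a time
        # The first item yielded is (dirpath, dirnames, filenames) for the root.
        _, dirnames, _ = next(os.walk(root))
        found.extend(os.path.join(root, name) for name in sorted(dirnames))
    return found


# Throwaway directory tree standing in for 'New folder/train/'.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("id_001", "id_002"):
        os.makedirs(os.path.join(tmp, name))
    train_ids = list_subdirectories([tmp])
    names = [os.path.basename(path) for path in train_ids]
```

Note that next(os.walk(root)) raises StopIteration when root does not exist, so a mistyped folder name fails loudly rather than returning an empty list.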
