Is there another way to get the selected feature names without using `.get_feature_names_out()`? - python

Suppose there is a pipeline as follows:
pipe = make_pipeline(OneHotEncoder(), SelectKBest(), ......)
To see only the selected columns, you can do this:
selected_names = pipe.named_steps['selectkbest'].get_feature_names_out()
I tried other approaches but could not get them to work. Here is what I tried:
all_names = pipe.named_steps['onehotencoder'].get_feature_names()
selected_mask = pipe.named_steps['selectkbest'].get_support()
selected_names = all_names[selected_mask]
The last line raises the following error:
TypeError: only integer scalar arrays can be converted to a scalar index
How can I solve it?
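This TypeError is what you get when a plain Python list is indexed with a NumPy array, so one likely cause (an assumption, since the question does not show the type of `all_names`) is that `all_names` is a list rather than an array. A minimal sketch of one way around it, using only the standard library; the names and mask values below are hypothetical stand-ins for what the pipeline would return:

```python
from itertools import compress

# Hypothetical stand-ins for the values taken from the pipeline:
# all_names would come from the encoder, selected_mask from get_support().
all_names = ["color_red", "color_blue", "size_S", "size_L"]
selected_mask = [True, False, False, True]

# Keep each name whose mask entry is truthy.
# compress() accepts any iterables, so a list of names is fine here.
selected_names = list(compress(all_names, selected_mask))
print(selected_names)
```

If you prefer to stay in NumPy, converting the names first with `np.asarray(all_names)[selected_mask]` should work as well, because boolean-mask indexing is supported on arrays but not on lists.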

Related

I keep getting this error: TypeError: tuple indices must be integers or slices, not tuple

I am trying to follow this tutorial: https://colab.research.google.com/drive/1d8PEeSdVlP0JogKwkytvFeyXXPu_qfXg?usp=sharing#scrollTo=sDixMreeUS_9
The code is in this GitHub repository: https://github.com/mjpramirez/Volvo-DataX
When I run the model I keep getting this error. I have found the file that raises it, and this is where the problem occurs:
unmatched_trackers = []
for t, trk in enumerate(trackers):
    if t not in matched_indices[:, 1]:
        unmatched_trackers.append(t)
I tried replacing 1 with 0, but it still does not work.
You need to replace the sklearn.utils.linear_assignment_.linear_assignment function with scipy.optimize.linear_sum_assignment, for example by importing it as from scipy.optimize import linear_sum_assignment as linear_assignment.
The difference is in the return format: linear_assignment() returns a single numpy array, while linear_sum_assignment() returns a tuple of numpy arrays (row indices, column indices). Indexing that tuple with [:, 1] is what raises the error.
You get the same output as before by converting the result of linear_sum_assignment() to an array and transposing it.
Example:
matched_indices = linear_assignment(-iou_matrix)
matched_indices = np.asarray(matched_indices)
matched_indices = np.transpose(matched_indices)
OR, in place of the last two lines:
matched_indices = np.array(list(zip(*matched_indices)))

Extract value in specific range

I have a dataset with several columns:
data-pioggia-name.....
I would like to get the values in the pioggia column that are between 0 and 400.
I tried:
start='0'
end='400'
data = (data['pioggia']>start)&(data['pioggia']<=end)
but I get the error: ">" not supported between instances of 'str' and 'int'
I also tried:
data = data['pioggia'].between(0,400, inclusive=True)
but I get the same error.
Is there a solution? For example, using replace?
Try converting the column to integers first:
data['pioggia'] = data['pioggia'].astype(int)
Also, make your start and end variables ints (e.g. 0) instead of strings (e.g. '0').
Like this:
start = 0  # Notice this and `end` are ints, not strings
end = 400
data['pioggia'] = data['pioggia'].astype(int)
mask = (data['pioggia'] > start) & (data['pioggia'] <= end)  # boolean mask, one entry per row
data = data[mask]  # keep only the rows whose pioggia value is in range
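If some entries in pioggia are not clean integers (an assumption, since the question does not show the data), astype(int) will itself raise an error. A minimal sketch of a more tolerant variant using pd.to_numeric with errors="coerce"; the sample values below are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; the real column may contain other values.
data = pd.DataFrame({"pioggia": ["0", "12", "400", "n/a", "731"]})

# Turn strings into numbers; anything unparseable becomes NaN instead of raising.
pioggia = pd.to_numeric(data["pioggia"], errors="coerce")

start, end = 0, 400
# Keep rows with start < pioggia <= end.
# Comparisons against NaN are False, so unparseable rows drop out.
in_range = data[(pioggia > start) & (pioggia <= end)]
print(in_range)
```

Here the original column is left untouched and only the numeric copy is used for the comparison, which makes it easy to inspect the rows that failed to parse afterwards.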

How do you make a tfrecord such that you can access features using a string key

I am trying to use someone else's code where it appears they are able to access a feature in a tfrecord example by simply subscripting with a string. Here is a brief version of their code
def foo(example):
    text = example["text"]
    subtokens = some_other_function(text)
    features = {"my_subtokens": subtokens}
    return features
input_files = ['test.tfrecord']
d = tf.data.Dataset.from_tensor_slices(tf.constant(input_files))
d = d.map(foo)
The key line in there is text = example["text"]. How were they able to get it so they could access a feature simply by subscripting the example with a string? Every time I try to write a tf record and then use a string as a key, I get the error TypeError: Only integers, slices (':'), ellipsis ('...'), tf.newaxis ('None') and scalar tf.int32/tf.int64 tensors are valid indices
To make my tfrecord, I just copied the code exactly from this website https://www.tensorflow.org/tutorials/load_data/tfrecord

TypeError: Expected unicode, got pandas._libs.properties.CachedProperty

I'm trying to add empty columns to my dataset on Colab, but it gives me this error. Does anybody know a possible solution?
My code:
dataframe["Comp"] = ''
dataframe["Negative"] = ''
dataframe["Neutral"] = ''
dataframe["Positive"] = ''
dataframe
Error message
TypeError: Expected unicode, got pandas._libs.properties.CachedProperty
I found an answer on Quabr, and it works:
Reason: the 'freq' of the DatetimeIndex was set to "<pandas._libs.properties.CachedProperty object at 0x7fb22dd1e2c8>" instead of a real frequency.
Solution: use df = df.asfreq('H') to set it correctly.

Update values in new column

I want to run a package (RAKE) to extract keyphrases from comments (df['CUSTOMER_RECOMMENDATIONS_TRANS']) and create a new column (df['keyphrase_RAKE']) that stores the keyphrases for each comment. I'm getting an error saying "ValueError: Length of values does not match the length of index".
I know the reason behind the error but don't know how to fix it. What can be done?
keywords is a list of keyphrases.
This is the code:
import RAKE
import operator
import pandas as pd
# RAKE setup with stopword directory
stop_dir = "SmartStoplist.txt"
rake_object = RAKE.Rake(stop_dir)
# Sample text to test RAKE
df = pd.read_excel('my.xlsx')
for i in df['CUSTOMER_RECOMMENDATIONS_TRANS']:
    keywords = rake_object.run(i)
    df['keyphrase_RAKE'] = keywords
You can use pandas.Series.apply and avoid the for loop:
df['keyphrase_RAKE'] = df['CUSTOMER_RECOMMENDATIONS_TRANS'].apply(rake_object.run)
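The loop in the question fails because it assigns a single list of keyphrases to the whole column, and that list's length generally differs from the number of rows. With apply, each row gets its own list. A small self-contained sketch of the same pattern; `extract_keyphrases` is a hypothetical stand-in for `rake_object.run`, and the comments are made up:

```python
import pandas as pd

def extract_keyphrases(text):
    # Hypothetical stand-in for rake_object.run: returns a list per comment.
    # Here it simply keeps the lowercased words longer than four characters.
    return [word for word in text.lower().split() if len(word) > 4]

df = pd.DataFrame(
    {"CUSTOMER_RECOMMENDATIONS_TRANS": ["Faster delivery please", "Lower the price"]}
)

# apply() calls the function once per row and stores each returned list
# in the matching row of the new column.
df["keyphrase_RAKE"] = df["CUSTOMER_RECOMMENDATIONS_TRANS"].apply(extract_keyphrases)
print(df)
```

The new column has exactly one entry per comment, so the length mismatch cannot occur, even when the lists inside it have different lengths.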
