I'm trying to split product names into categories. For example, if the product is "Demi Baguette", the category should be "Baguette" and the subcategory "Demi". I have looked at NLP articles, but none of them seem to fit what I need, as they all focus on sentences and running text.
I've seen other questions answered with the suggestion to use a dict; however, there are over 15 thousand rows in the Excel file, so that's not really feasible.
Any ideas as to how I can tackle this or where I can look?
Here is an example of my data.
So I would want the category to be "Soup", then subcategories based on flavour, e.g. "Chicken", and miscellaneous labels such as "Cream".
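One cheap heuristic to try before anything heavier is a minimal sketch like the one below. It assumes the category is the last word of the product name and that every other word is a subcategory or label. The stop-word list, the function name `split_product_name` and the sample name "Cream of Chicken Soup" are hypothetical, not taken from the data in the question, so names that do not follow this pattern would need extra rules.

```python
# Assumption: the category is the last word of the name, e.g. "Demi Baguette" -> "Baguette".
# STOP_WORDS is a hypothetical list of filler words to ignore.
STOP_WORDS = {"of", "and", "with"}


def split_product_name(name):
    """Return (category, labels) for a product name, or (None, []) if it is empty."""
    words = [w for w in name.split() if w.lower() not in STOP_WORDS]
    if not words:
        return None, []
    # Last remaining word is treated as the category, the rest as labels.
    return words[-1], words[:-1]


if __name__ == "__main__":
    for product in ["Demi Baguette", "Cream of Chicken Soup"]:
        print(product, "->", split_product_name(product))
```

Because this is a rule rather than a lookup table, it applies to all 15 thousand rows without building a dict by hand. Counting how often each resulting category occurs is a quick way to spot names where the rule guessed wrong.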
Related
I have this Dataset
In this dataset, I want to add a filter. As we can see, the same product appears under identical or slightly different names, so the filter should show every value in Product relating to the chosen name: choosing LOSARTAN would show all the values relating to LOSARTAN, and likewise for the other products. Basically, I want a filter that groups products with similar names, so that choosing one name shows all the different names used for that specific product.
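As a rough illustration of the filter described here, the sketch below does a case-insensitive substring match. The sample product names are invented for the example (the real dataset is not shown), and `similar_names` is a hypothetical helper name.

```python
def similar_names(products, chosen):
    """Return the distinct product names that contain the chosen name, ignoring case."""
    key = chosen.strip().lower()
    return sorted({p for p in products if key in p.lower()})


if __name__ == "__main__":
    # Invented sample values, only to show the behaviour.
    products = [
        "LOSARTAN",
        "Losartan 50mg",
        "LOSARTAN POTASSIUM",
        "Amlodipine 5mg",
        "losartan tab",
    ]
    print(similar_names(products, "LOSARTAN"))
```

A substring match only catches variants that still contain the chosen name. Misspelled variants would need fuzzy matching instead, for which the standard library's `difflib.get_close_matches` is one option.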
Thank you!
I'm pretty stuck with something, so let me explain. I made some groupings to know which category each customer belongs to. However, I arranged them vertically, so the lists were read in as columns, and I have no idea how to associate them.
import pandas as pd

customers = pd.read_excel(r'/xxxxx/customers.XLSX')
restaurants = customers['Restaurants'].tolist()
# Keep only the non-empty (non-NaN) entries of the column
restaurantsNoNaN = [item for item in restaurants if not pd.isnull(item)]
print(restaurantsNoNaN)
At this point I have the group of restaurants, i.e. all the customers that are restaurants. The problem is that, to know both the group and the specific customer that buys each product, I need to link the two datasets, and they are laid out differently. I made these examples to simulate the real files I'm working with. Assume that I have grouped both products and customers as in the customers Excel file.
Excel of Customers
Excel of Sales
In this case, what step is needed to align the two, so that I can plot the data with matplotlib?
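One possible way to align the two files is sketched below. It assumes the customers sheet has one column per group (such as 'Restaurants') with customer names listed underneath, and that the sales sheet has one row per sale with a customer name. All column names other than 'Restaurants', and all the sample values, are made up for the illustration.

```python
import pandas as pd

# Invented stand-in for the customers file: one column per group, names listed vertically.
customers = pd.DataFrame({
    "Restaurants": ["Cust A", "Cust B", None],
    "Hotels": ["Cust C", None, None],
})

# Invented stand-in for the sales file: one row per sale.
sales = pd.DataFrame({
    "Customer": ["Cust A", "Cust C", "Cust A", "Cust B"],
    "Product": ["Bread", "Soup", "Soup", "Bread"],
    "Amount": [10, 20, 5, 7],
})

# Reshape the wide customers table into long (Group, Customer) pairs and drop empty cells.
lookup = customers.melt(var_name="Group", value_name="Customer").dropna(subset=["Customer"])

# Attach the group to every sale by matching on the customer name.
merged = sales.merge(lookup, on="Customer", how="left")

# Example aggregate that could then be plotted.
totals = merged.groupby("Group")["Amount"].sum()
print(totals)
```

Once every sale carries its group, `totals.plot(kind="bar")` would hand the result to matplotlib. The same melt-then-merge step should work for the grouped products, assuming they are laid out the same way.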
I have a healthcare dataset that includes several columns of free text (such as medical history, doctor notes, etc.). I want to use these notes to help build a 'criteria list' for the patients who stayed at the hospital for less than 2 days (I have that flagged in the dataset).
I'm new to NLP and have only done coursework projects where a single column of text is used, but this dataset has multiple text columns, so how do I go about it? Do I combine all the columns into one big string and then do all the text cleaning and processing? Or is there another option?
Here's a screenshot of the dataset; I couldn't find any other way to display it:
I’m very new to web scraping, so I'm looking for some help.
Basically, these classes contain a bunch of product lists that I want to extract. But in spite of having the same initial class name A, some have an additional name such as “animate” or “noOffer”. This pattern occurs more than 200-300 times and is random for each product, so I would really like to automate the extraction.
The problem is that when I try to extract the data for the first 100 products, there’s a mixture of these class names.
I have been able to write a for loop with findAll to extract data from any one of those classes. I need a way to append the data from a mixture of all 3 classes to a single list.
Ultimately I want to create a CSV from a pandas DataFrame.
I have two different product datasets of 5.4 million and 4.5 million products, which were scraped from competitor websites. Most are non-branded products that don't have any unique standard SKU. I want to compare 300K of our products with the similar products our competitors are selling and find the price difference.
I have tried comparing the datasets using two different Sphinx indexes with similar words, but I could not get good results, because the titles of non-branded products do not match and there is no standard brand name, title or SKU.
Is there any way to get the result using ML or some big data algorithm?
If you use Sphinx/Manticore you can:
take each of your products from dataset 1
convert it into a query using a quorum operator with percentile and a ranking formula of your choice
run the query against dataset 2
find results
take top K
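The steps above could be sketched roughly as follows. This only builds the query strings and does not talk to a server. The index name `products2`, the 0.6 quorum fraction, the sample title and the ranking expression are placeholder assumptions to tune, not values from the answer.

```python
import re


def quorum_query(title, fraction=0.6):
    """Turn a product title into a quorum expression such as "w1 w2 w3"/0.6.

    Keeping only letters and digits avoids clashing with query-syntax characters.
    """
    words = re.findall(r"[A-Za-z0-9]+", title.lower())
    return '"{}"/{}'.format(" ".join(words), fraction)


def sphinxql(title, index="products2", k=5, fraction=0.6):
    """Build a SphinxQL statement returning the top-k matches for one title.

    `index` is a hypothetical name for the index built over dataset 2;
    the ranker expression is only a placeholder formula.
    """
    match = quorum_query(title, fraction)
    return (
        f"SELECT id, WEIGHT() AS w FROM {index} "
        f"WHERE MATCH('{match}') "
        f"ORDER BY w DESC LIMIT {k} "
        "OPTION ranker=expr('sum(lcs*user_weight)*1000+bm25')"
    )


if __name__ == "__main__":
    print(sphinxql("Stainless Steel Water-Bottle, 500ml"))
```

With a fraction of 0.6, a document matches when roughly 60% of the title's words are present, which tolerates the wording differences typical of non-branded titles. Each generated statement would then be run against dataset 2 through whatever MySQL-protocol client is already in use.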
There are some additional tricks that can help, such as:
IDF boosting
skipping stop-words
use of atc-based ranking
These tricks, and the concept of finding similar content in general, are described in this interactive course: https://play.manticoresearch.com/mlt/