Searching sub string value in Numpy Array - python

First of all, I am using Python and the latest PyCharm Community edition.
I am currently working on a user interface with tkinter that requests several values from the user: two strings and one integer. The program should then search an Excel or CSV file for those values. Unfortunately, I am stuck on the first entry. I created a NumPy array from the DataFrame, since I've read that arrays are much faster when working with large data. The final Excel/CSV file will contain several thousand rows and up to 60 columns. In addition, entry_name could be a substring of a longer string, and the search should find either the fragment or the full name (example: entry "BMW" in array([["BMW Werk", "BMW-Automobile", "BMW_Client"], ["BMW Part1", "BMW Part2", "XS-12354"]])). Afterwards I would like to proceed with other calculations based on the values in the array.
Example: entry: "BMW", in array([["BMW Werk", "Car1", "XD-12345"], ["BMW Part1", "exauster", "XS-12354"]])
Program found "BMW Werk" and "BMW Part1" in array, returns ["BMW Werk", "Car1", "XD-12345"] and ["BMW Part1", "exauster", "XS-12354"]
entry_name = "BMW"
path_for_excel = "D:\Python PyCharm\Tool\Clientlist.xslx"
client_list_df= pd.read_excel(path_for_excel , engine="openpyxl")
client_list_array= client_list_df.to_numpy()
#first check if entry_name is populated ( entry field in ui )
if entry_name == True:
    #search for sub string in string
    part_string_value = np.char.startswith(client_list_array, entry_name)
    if part_string_value in client_list_array:
        index = np.where(client_list_array == part_string_value)
        #print found value, including the other values in the list
        print(client_list_array[])
I am able to retrieve the requested values if the user types the correct, full name like "BMW Werk", but any typo breaks the search, and typing the full name is exhausting for some entries. For example, one name looks like: "BMW Werk Bloemfontein, 123-45, Willows".
Hopefully, somebody finds time to help with my issue.
Thank you !
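One way to approach the substring search described above is to build a boolean mask with np.char.find and use it to select whole rows. This is only a minimal sketch: the helper name rows_containing, the case-insensitive matching, the choice of searching a single column, and the extra "Audi" row are assumptions for illustration, not part of the original program.

```python
import numpy as np

# Sample data taken from the example above, plus one hypothetical non-matching row.
client_list_array = np.array(
    [
        ["BMW Werk", "Car1", "XD-12345"],
        ["BMW Part1", "exauster", "XS-12354"],
        ["Audi Werk", "Car2", "XA-00001"],  # hypothetical row
    ],
    dtype=object,  # DataFrame.to_numpy() often yields an object array
)


def rows_containing(array, entry_name, column=0):
    """Return the rows whose `column` contains entry_name (case-insensitive)."""
    if not entry_name:  # empty entry field: nothing to search for
        return array[:0]
    # np.char functions need a string dtype, so convert the column first.
    names = np.char.lower(array[:, column].astype(str))
    # np.char.find returns -1 where the substring is absent.
    mask = np.char.find(names, entry_name.lower()) >= 0
    return array[mask]


matches = rows_containing(client_list_array, "BMW")
print(matches)
```

The check `if not entry_name` replaces the comparison `entry_name == True`, which is never true for a string. If the name should only match at the beginning, np.char.startswith could be used to build the mask instead.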

Related

how to get nested data with pandas and request

I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use a DataFrame to get it, but am just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you posted it looks like you are already able to get the data, and I'm assuming that you already have it as a dictionary.
From what I can tell, I don't think you should be using pandas, unless it's a downstream requirement of the task you are doing. To get the ItemNumber and QtyRemainingToShip you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
#the data_list is only one element long, so grab the first element which is of type dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# similar to the data list the value associated with "SoEstimateItemLineArr" is of type list and has 1 element in it, so we grab the first & only element.
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of Python's standard library module for parsing JSON. It will confuse future readers and will shadow the module if you ever need to import it.
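If a response holds more than one order or more than one item line, the same access pattern can be wrapped in two loops. The sketch below assumes the structure described in this answer (a 'Data' list of dictionaries, each with a 'SoEstimateItemLineArr' list); the sample payload, the OrderNumber field, and the helper name remaining_by_line are made up for illustration.

```python
# Hypothetical response, trimmed to the fields discussed above.
response = {
    "Data": [
        {
            "SoEstimateHeader": {"OrderNumber": "SO-1"},  # hypothetical field
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
                {"ItemNumber": "XY", "QtyRemainingToShip": 4},  # hypothetical line
            ],
        },
    ]
}


def remaining_by_line(payload):
    """Collect ItemNumber and QtyRemainingToShip for every item line."""
    rows = []
    for order in payload.get("Data", []):
        for line in order.get("SoEstimateItemLineArr", []):
            rows.append(
                {
                    "ItemNumber": line["ItemNumber"],
                    "QtyRemainingToShip": line["QtyRemainingToShip"],
                }
            )
    return rows


rows = remaining_by_line(response)
print(rows)
```

If a DataFrame is needed later, a flat list of dictionaries like rows can be passed directly to pd.DataFrame(rows).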

how to remove elements from array in firestore with a where clause in python

I've a sample collection in my firestore
I've got two arrays in a document. I want to delete the whole array dictionary from two arrays if monthName is January.
Desired output
I've tried
doc_ref = db.collection('calender').where('monthName', 'array_contains', 'January').arrayremove()
But I'm getting an error
AttributeError: 'Query' object has no attribute 'arrayRemove'
I referred to the documentation, but I couldn't work out how to apply it to this problem, so I'm looking for help here.
Firestore doesn't have the concept of an update query, where you send a condition and an operation to the database. You'll instead have to:
Execute the query to get all documents matching your condition.
Loop over the documents in your Python code.
Then remove the array from each document with an update call.
In addition, arrayRemove can only be used if you know the exact, complete item to remove from the array. If you don't, you will have to:
Load the document (same as above).
Get the array, and remove the item.
Write the entire remaining array back to the database.
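The second set of steps (load, filter, write back) could look roughly like the sketch below. It assumes db is a google.cloud.firestore.Client, that the collection is called calender, and that the two array fields are named month1 and month2; the helper names are made up for illustration, and the code reads every document in the collection, which may be too slow or costly for large collections.

```python
def without_month(entries, month_name):
    """Return the array items whose monthName differs from month_name."""
    return [entry for entry in entries if entry.get("monthName") != month_name]


def remove_month_from_all(db, month_name, array_fields=("month1", "month2")):
    """Rewrite the array fields of every document, dropping matching items.

    `db` is assumed to be a google.cloud.firestore.Client; the field names
    in `array_fields` are assumptions about the document layout.
    """
    for snapshot in db.collection("calender").stream():
        data = snapshot.to_dict() or {}
        updates = {}
        for field in array_fields:
            original = data.get(field, [])
            kept = without_month(original, month_name)
            if len(kept) != len(original):  # only write fields that changed
                updates[field] = kept
        if updates:
            # Write the entire remaining array back to the document.
            snapshot.reference.update(updates)


sample = [{"id": 1, "monthName": "January"}, {"id": 2, "monthName": "March"}]
print(without_month(sample, "January"))
```

Calling remove_month_from_all(db, "January") would then apply the filter to each document. The filtering helper is kept separate so it can be checked without a database connection.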
You can create a Firestore index to query for documents without knowing their IDs. If you create an index on calender for the fields month1 and month2, a query along these lines should run successfully (sketched here in Python, assuming the google-cloud-firestore client):
from google.cloud import firestore

db = firestore.Client()
# ArrayRemove needs the exact, complete item
object_to_delete = {"id": 1, "monthName": "January"}
for month in ["month1", "month2"]:
    # array_contains matches documents whose array holds this exact item
    matching_docs = db.collection("calender").where(month, "array_contains", object_to_delete).stream()
    for doc in matching_docs:
        doc.reference.update({month: firestore.ArrayRemove([object_to_delete])})
The same logic was tested and working in JavaScript.
REFERENCES:
Firestore queries on array membership (click "Python" tab)
Removing elements from an array (click "Python" tab)
Python Document object documentation
P.S., it's spelled "calendar", not "calender"

Scraping data from a http & javaScript site

I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, let's take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see sort of a dictionary containing all the sizes and colors and, below that, in 'asinToDimentionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimentionIndexMap we can see
"B01KWIUH5M":[0,0]
Which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in variationValues size_name section) and the color 'Teal' (same idea as before)
I want to scrape both the variationValues and the asinToDimentionIndexMap, so I can associate the index map numbers with the variationValues entries.
Another person in the site (thanks for the help btw) suggested doing it this way.
script = response.xpath('//script/text()').extract_first()
import re
# capture everything between {}
data = re.findall(r'(\{.+?\})', script)
import json
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is there a way to get, from that data, two dictionaries or lists with the variationValues and asinToDimentionIndexMap (maybe using some regular expressions in the middle to pull data out of a big string)? Or could someone explain a little about what happens in the json part?
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a text format for representing structured data. json.loads parses a JSON string into ordinary Python objects (dictionaries, lists, strings, numbers), regardless of the platform that produced the string.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming the challenge you are running into is errors when accessing a particular "box" inside your JSON object.
Your code format looks correct, but your access within "each box" may look different.
E.g., if your 'asinToDimentionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimentionIndexMap']
I've hacked and slashed a little bit so you can better understand the structure of your particular JSON file. Take a look at the link below. On the right-hand side, you will see which boxes are within one another, which is precisely what you need to know to access what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in and look at the structure this way to find what you're looking for.
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily parse them with json.loads and combine them as you wish.
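As a rough illustration of that last step, the sketch below parses two captured strings and maps each product code to its size and color. The two raw strings are shortened, hypothetical stand-ins for what the regexes would capture; the color_name key, the second product code, and the assumption that the index positions follow the order size_name, color_name are not confirmed by the page source.

```python
import json

# Hypothetical, shortened stand-ins for the strings captured by the regexes.
variation_values_raw = '{"size_name": ["8M US", "9M US"], "color_name": ["Teal", "Black"]}'
index_map_raw = '{"B01KWIUH5M": [0, 0], "B000000001": [1, 1]}'

variation_values = json.loads(variation_values_raw)
index_map = json.loads(index_map_raw)


def describe_variants(variation_values, index_map, dimensions):
    """Map each product code to the variation values its indexes point at."""
    variants = {}
    for asin, indexes in index_map.items():
        variants[asin] = {
            dimension: variation_values[dimension][position]
            for dimension, position in zip(dimensions, indexes)
        }
    return variants


# Assumed order of the index positions: size first, then color.
variants = describe_variants(variation_values, index_map, ["size_name", "color_name"])
print(variants["B01KWIUH5M"])  # {'size_name': '8M US', 'color_name': 'Teal'}
```

If json.loads raises an error on a captured string, the regex has probably cut the text in the middle of a nested object, so the pattern would need adjusting for that key.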

Arcpy, select features based on part of a string

So for my example, I have a large shapefile of state parks where some of them are actual parks and others are just trails. However there is no column defining which are trails vs actual parks, and I would like to select those that are trails and remove them. I DO have a column for the name of each feature, that usually contains the word "trail" somewhere in the string. It's not always at the beginning or end however.
I'm only familiar with Python at a basic level and while I could go through manually selecting the ones I want, I was curious to see if it could be automated. I've been using arcpy.Select_analysis and tried using "LIKE" in my where_clause and have seen examples using slicing, but have not been able to get a working solution. I've also tried using the 'is in' function but I'm not sure I'm using it right with the where_clause. I might just not have a good enough grasp of the proper terms to use when asking and searching. Any help is appreciated. I've been using the Python Window in ArcMap 10.3.
Currently I'm at:
arcpy.Select_analysis ("stateparks", "notrails", ''trail' is in \"SITE_NAME\"')
Although using the Select tool is a good choice, the syntax for the SQL expression can be a challenge. Consider using an Update Cursor to tackle this problem.
import arcpy
stateparks = r"C:\path\to\your\shapefile.shp"
notrails = r"C:\path\to\your\shapefile_without_trails.shp"
# Make a copy of your shapefile
arcpy.CopyFeatures_management(stateparks, notrails)
# Check if "trail" exists in the string (ignoring case)--delete row if so
with arcpy.da.UpdateCursor(notrails, "SITE_NAME") as cursor:
    for row in cursor:
        # row[0] is the SITE_NAME value of the current row
        if row[0] and "trail" in row[0].lower():
            cursor.deleteRow()  # Delete the row if the condition is true
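For comparison, the Select tool route from the question could be sketched as follows. This assumes a shapefile data source, where field names are delimited with double quotes and LIKE is case-sensitive, so the field is wrapped in UPPER; the helper names are made up, and the exact SQL dialect depends on the data source, so treat the clause as a starting point rather than a tested expression.

```python
def build_exclude_clause(field_name, word):
    """Build a where clause keeping rows whose field does NOT contain word."""
    return "UPPER(\"{}\") NOT LIKE '%{}%'".format(field_name, word.upper())


def select_without_word(in_features, out_features, field_name, word):
    """Run the Select tool with the clause above (requires ArcGIS)."""
    import arcpy  # imported here so the clause builder works without ArcGIS

    arcpy.Select_analysis(
        in_features, out_features, build_exclude_clause(field_name, word)
    )


where_clause = build_exclude_clause("SITE_NAME", "trail")
print(where_clause)  # UPPER("SITE_NAME") NOT LIKE '%TRAIL%'
```

In the Python Window this would be called as select_without_word("stateparks", "notrails", "SITE_NAME", "trail"). Rows with an empty SITE_NAME may be dropped by NOT LIKE, so it is worth checking the feature count afterwards.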

Python search for value in string using wildcards

I'm trying to make an address book for a school project, but I can't get my head around searching for values when part of the value is queried.
Here is the block of code where I am stuck:
self.ui.tableWidget.setRowCount(0)
with open("data.csv") as file:
    for rowdata in csv.reader(file):
        row = self.ui.tableWidget.rowCount()
        if query in rowdata:
            self.ui.tableWidget.insertRow(row)
            for column, data in enumerate(rowdata):
                item = QtGui.QTableWidgetItem(data)
                self.ui.tableWidget.setItem(row, column, item)
As you can see, I'm using a PyQt TableWidget to display search results from a csv file. The code above does work, but it only displays the result when a full query is given. The line of code that checks for a match is:
if query in rowdata:
So for example, if I wanted to find someone called John, I would have to search for exactly "John". It wouldn't appear if I searched "Joh" or "john" or "hn" and so on...
If you require more information, just ask :P
What I think you want to do is search for query within the fields of your rows, not the whole row at once.
The problem you're having is that string in obj does different things depending on what type obj has. If it is a string, it does a substring search (like you want). If it is a non-string container however, it does a regular membership check (which is why it only finds exact matches now).
So to fix it, change your test from if query in rowdata to if any(query in field for field in rowdata). If you only want to search for matches within a specific field (like the contact's name), it could be even simpler: if query in rowdata[name_column] (where name_column is the index of the column that holds the name in your CSV file).
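Here is a small self-contained sketch of that per-field test, without the PyQt parts. The sample CSV text and the function name matching_rows are made up for illustration, and lowercasing both sides is an extra assumption so that "john" also finds "John", as asked in the question.

```python
import csv
import io


def matching_rows(csv_file, query):
    """Return the rows in which any field contains query (case-insensitive)."""
    needle = query.lower()
    return [
        row
        for row in csv.reader(csv_file)
        if any(needle in field.lower() for field in row)
    ]


# Hypothetical address book contents standing in for data.csv.
sample = io.StringIO("John,Smith,555-0100\nJane,Doe,555-0101\n")
print(matching_rows(sample, "joh"))  # [['John', 'Smith', '555-0100']]
```

In the original loop, the same condition would replace if query in rowdata, and the table rows would be inserted exactly as before.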
