Query PyTables Nested Columns

I have a table with a nested table column, route. Beneath that are two other nested datatypes, master and slave, each with an integer id field and a string type field.
I would like to run something like table.readWhere('route/master/id==0') but I get "variable route refers to a nested column, not allowed in conditions"
Is there a way to query a nested datatype in PyTables?

You have to create variables to be used inside the condition string. One option is to define a variable dictionary:
table.readWhere('rId==0', condvars={'rId': table.cols.route.master.id})
Another option is to define local variables for the columns to be used in the condition.
rId = table.cols.route.master.id
table.readWhere('rId==0')
As this pollutes the namespace, I recommend wrapping the code in a function. I tried to reference the column itself, but it seems the interpreter fetches the whole dataset before throwing a NameError.
table.readWhere('table.cols.route.master.id==0') # DOES NOT WORK
More info on the where() method in the library reference.
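For example, a minimal sketch of such a wrapper (the function name is hypothetical; read_where is the modern spelling of readWhere):
def read_nested(table, column, condition):
    # Bind the nested column to the short name 'c', which is legal
    # inside a condition string, and evaluate the query against it.
    return table.read_where(condition, condvars={'c': column})

rows = read_nested(table, table.cols.route.master.id, 'c == 0')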

Building on the answer by streeto, here is a quick way to access all the nested columns in a table when constructing a query:
condvars = {k.replace('/', '__'): v for k, v in table.colinstances.items()}
result = table.read_where('route__master__id == 0', condvars=condvars)
table.colinstances returns a flat dict whose keys are slash-separated paths to all columns in the table (including nested ones), and whose values are the Column instances located at those paths. You can't use a slash-separated path in the query, but if you replace the slashes with some other separator that is allowed within Python identifiers (in this case I chose double underscores), then everything works fine. You could choose some other separator if you like.
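If you do this often, the translation is easy to wrap up; a minimal sketch (the function name is hypothetical):
def read_where_nested(table, condition):
    # Expose every column, nested or not, under a double-underscore
    # alias, then run the query with those aliases bound via condvars.
    condvars = {k.replace('/', '__'): v for k, v in table.colinstances.items()}
    return table.read_where(condition, condvars=condvars)

result = read_where_nested(table, 'route__master__id == 0')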

Related

Splitting a DataFrame to filtered "sub-datasets"

So I have a DataFrame with several columns, some contain objects (string) and some are numerical.
I'd like to create new dataframes, each "filtered" down to one combination of the object values available.
To be clear, those are my object type columns:
Index(['OS', 'Device', 'Design', 'Language'], dtype='object')
["Design"] and ["Language"] have 3 options each.
I filtered ["OS"] and ["Device"] manually as I needed to match them.
However, now I want to create multiple variables each contains a "filtered" dataframe.
For example:
I have
"android_fltr1_d1" to represent the next filter:
["OS"]=android, ["Device"]=1,["Design"]=1
and "android_fltr3_d2" to represent:
["OS"]=android, ["Device"]=3,["Design"]=2
I tried the following code (which works perfectly fine):
android_fltr1_d1 = android_fltr1[android_fltr1["Design"]==1].drop(["Design"],axis=1)
android_fltr1_d2 = android_fltr1[android_fltr1["Design"]==2].drop(["Design"],axis=1)
android_fltr1_d3 = android_fltr1[android_fltr1["Design"]==3].drop(["Design"],axis=1)
android_fltr3_d1 = android_fltr3[android_fltr3["Design"]==1].drop(["Design"],axis=1)
android_fltr3_d2 = android_fltr3[android_fltr3["Design"]==2].drop(["Design"],axis=1)
android_fltr3_d3 = android_fltr3[android_fltr3["Design"]==3].drop(["Design"],axis=1)
android_fltr5_d1 = android_fltr5[android_fltr5["Design"]==1].drop(["Design"],axis=1)
android_fltr5_d2 = android_fltr5[android_fltr5["Design"]==2].drop(["Design"],axis=1)
android_fltr5_d3 = android_fltr5[android_fltr5["Design"]==3].drop(["Design"],axis=1)
As you can guess, I don't find this efficient and would like to use a for loop to generate those variables (I'd need to match each ["Language"] option to each filter I created, for a total of ~60 variables).
I thought about using something like .format() in the loop as a kind of placeholder for the variable names, but couldn't find a way to do it.
It would probably be best to use a nested loop to create all the variables, though I'd be content even with a single loop per column.
I find it difficult to build such a for loop and would be grateful for any help or directions.
Thanks!
As suggested, I tried to find my answer in: How do I create variable variables?
Yet I failed to understand how to use the globals() function in my case. I also found that '%' string formatting no longer works for this.
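The usual alternative to variable variables is a dictionary keyed by the filter combination; a minimal sketch, assuming a DataFrame df with the columns named above (the key values shown are hypothetical):
# Build one filtered sub-dataset per (OS, Device, Design, Language)
# combination, keyed by that combination, instead of ~60 named variables.
subsets = {
    key: group.drop(["Design"], axis=1)
    for key, group in df.groupby(["OS", "Device", "Design", "Language"])
}

# Look up the android / device 1 / design 1 / English subset:
android_fltr1_d1_en = subsets[("android", 1, 1, "english")]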

How can I specify a CellRange of variable length in DataNitro?

I am creating a script, part of which requires the names from a cell range to be stored as a list. The list needs to hold however many names are added to the cell range, but it must not store the values of empty cells.
If I simply use a longer range than is necessary like so:
names = CellRange("C10:C99999").value
then my final script will iterate through all the empty values, which is extremely inefficient.
After quite some searching through the DataNitro documentation, I found the .vertical property, which "returns the values of the cells starting with the cell it’s called from, and ending in the last non-empty cell in the same column."
So in my example this would mean:
names = Cell("C10").vertical
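Since .vertical returns an ordinary Python list that stops at the last non-empty cell, the result can be iterated directly; a minimal sketch (process is a hypothetical per-name handler):
names = Cell("C10").vertical  # values from C10 down to the last non-empty cell
for name in names:
    process(name)  # hypothetical: do something with each name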

ldap3 library: modify attribute with multiple values

I'm trying to modify an LDAP attribute that has multiple values, but can't seem to figure out the syntax.
I'm using the ldap3 library with python3.
The documentation gives an example which modifies two attributes of an entry - but each attribute only has one value.
The dictionary from that example is the bit I'm having trouble with:
c.modify('cn=user1,ou=users,o=company',
         {'givenName': [(MODIFY_REPLACE, [<what do I put here>])]})
Instead of 'givenName', which has a single value, I want to modify the memberuid attribute, which obviously can have many names as entries.
So I split all my memberuids into a list, make the modification, and then try to feed my new usernames/memberuid list to the modify command.
Like so:
oldval = 'super.man'
newval = 'clark.kent'
existingmembers = ['super.man', 'the.hulk', 'bat.man']
newmemberlist = [newval if x==oldval else x for x in existingmembers]
# newmemberlist = ", ".join(str(x) for x in newmemberlist)
I've tried passing in newmemberlist as a list
'memberuid': [(MODIFY_REPLACE, ['clark.kent', 'the.hulk','bat.man'])]
which gives me TypeError: 'str' object cannot be interpreted as an integer
or various combinations (the commented line) as one long string, separated with spaces, commas, semicolons, and anything else I can think of:
'memberuid': [(MODIFY_REPLACE, 'clark.kent, the.hulk, bat.man')]
which does the replace, but then I get a single memberuid that looks like this:
'clark.kent, the.hulk, bat.man'
You need to ensure you are passing in the DN of the LDAP object you wish to modify.
c.modify(FULL_DN_OF_OBJECT, {'memberuid': [(MODIFY_REPLACE, ['clark.kent', 'the.hulk','bat.man'])]})
Then you should be able to simply pass in newmemberlist instead of ['clark.kent', 'the.hulk', 'bat.man']:
c.modify(FULL_DN_OF_OBJECT, {'memberuid': [(MODIFY_REPLACE, newmemberlist )]})
I believe the MODIFY_REPLACE command does not accept multiple values, as it would not know which values to replace with which new ones. Instead, you should try a MODIFY_DELETE of the old values first, followed by a MODIFY_ADD of the new values.
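A minimal sketch of that delete-then-add approach with ldap3 (the group DN is hypothetical, and the connection c is assumed to be already bound; ldap3 accepts several (operation, values) tuples per attribute in a single modify call):
from ldap3 import MODIFY_DELETE, MODIFY_ADD

# Remove the old value and add the new one in a single modify call.
c.modify('cn=group1,ou=groups,o=company',
         {'memberuid': [(MODIFY_DELETE, ['super.man']),
                        (MODIFY_ADD, ['clark.kent'])]})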

I have single-element arrays. How do I change them into the elements themselves?

While importing a JSON document into a pandas dataframe using records = pandas.read_json(path), where path was a pre-defined path to the JSON document, I discovered that the contents of certain columns of the resulting dataframe "records" are not simply strings as expected. Instead, each "cell" in such a column is an array containing one single element: the string of interest. This makes selecting rows using boolean indexing difficult. For example, records[records['category']=='Python Books'] in IPython outputs an empty dataframe; had the "cells" contained strings instead of arrays of strings, the output would have been non-empty, containing the rows that correspond to Python books.
I could modify the JSON document, so that "records" reads the strings in properly. But is there a way to modify "records" directly, to somehow strip the single-element arrays into the elements themselves?
Update: After clarification, I believe this might accomplish what you want while limiting it to a single iteration over the data:
nested_column_1 = records["column_name_1"]
nested_column_2 = records["column_name_2"]
clean_column_1 = []
clean_column_2 = []
for i in range(len(records.index)):
    clean_column_1.append(nested_column_1[i][0])
    clean_column_2.append(nested_column_2[i][0])
Then you convert the clean_column lists to Series like you mentioned in your comment. Obviously, you make as many nested_column and clean_column lists as you need, and update them all in the loop.
You could generalize this pretty easily by keeping a record of "problem" columns and using that to create a data structure to manage the nested/clean lists, rather than declaring them explicitly as I did in my example. But I thought this might illustrate the approach more clearly.
Obviously, this assumes that all columns have the same number of elements, which may not be a valid assumption in your case.
Original Answer:
Sorry if I'm oversimplifying or misunderstanding the problem, but could you just do something like this?
simplified_list = [element[0] for element in my_array_of_arrays]
Or, if you don't need the whole thing at once, use a generator instead:
simplifying_generator = (element[0] for element in my_array_of_arrays)
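Since the data already lives in a DataFrame, the same unwrapping can also be applied column by column; a minimal sketch (the column names are hypothetical):
import pandas as pd

records = pd.read_json(path)

# Replace each single-element list with its only element.
for col in ['category', 'title']:  # hypothetical "problem" columns
    records[col] = records[col].map(lambda cell: cell[0])

# Boolean indexing now behaves as expected:
python_books = records[records['category'] == 'Python Books']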

How to rewrite this Dictionary For Loop in Python?

I have a Dictionary of Classes where the classes hold attributes that are lists of strings.
I made this function to find the maximum number of items in any one of those lists for a particular person.
def find_max_var_amt(some_person):  # pass in a patient id number, get back their max number of variables for a type of variable
    max_vars = 0
    for key, value in patients[some_person].__dict__.items():
        challenger = len(value)
        if max_vars < challenger:
            max_vars = challenger
    return max_vars
What I want to do is rewrite it so that I do not have to use the .items() method. This find_max_var_amt function works fine as is, but I am converting my code from using a dictionary to a database using the dbm module, so typical dictionary methods will no longer work for me even though the syntax for assigning and accessing the key:value pairs will be the same. Thanks for your help!
Since dbm doesn't let you iterate over the values directly, you can iterate over the keys. To do so, you could modify your for loop to look like
for key in patients[some_person].__dict__:
    value = patients[some_person].__dict__[key]
    # then continue as before
I think a bigger issue, though, will be the fact that dbm only stores strings. So you won't be able to store the list directly in the database; you'll have to store a string representation of it. And that means that when you try to compute the length of the list, it won't be as simple as len(value); you'll have to develop some code to figure out the length of the list based on whatever string representation you use. It could just be as simple as len(the_string.split(',')), just be aware that you have to do it.
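One way to avoid inventing a delimiter is to serialize each list with json; a minimal sketch (Python 3 spelling; the file name and values are hypothetical):
import dbm
import json

db = dbm.open('patients.db', 'c')

# Store: dbm only holds strings/bytes, so flatten the list to JSON.
db['meds'] = json.dumps(['aspirin', 'ibuprofen'])

# Retrieve: decode the stored bytes and rebuild the list.
meds = json.loads(db['meds'].decode('utf-8'))
print(len(meds))  # 2

db.close()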
By the way, your existing function could be rewritten using a generator, like so:
def find_max_var_amt(some_person):
    return max(len(value) for value in patients[some_person].__dict__.itervalues())
and if you did it that way, the change to iterating over keys would look like
def find_max_var_amt(some_person):
    dct = patients[some_person].__dict__
    return max(len(dct[key]) for key in dct)
