How can i get the just the column headers of a table in KDB? Is there a special query for this? I am asking because when I pull the data from the table into python the column headers are lost. Thanks!
If you're using one of the two regular python APIs for getting data from KDB, I think both implement a Flip class from which you can get the column names (usually there is an x which is an array of strings).
Otherwise cols tableName gives you list of symbols (which would be deserialized as an array of strings in python)
Related
I have a Pandas DataFrame that looks like this.
DataFrame picture
I thought of save a tuple of two values under a column and then retrieve whichever value is needed. But now, for example, if I want the first value in the tuple located at the first row of the 'Ref' column, I get "(" instead of "c0_4"
df = pd.read_csv(df_path)
print(df['Ref'][0][0])
The output for this is "(" and not "c0_4".
I don't want to use split() because I want the values to be searchable in the dataframe. For example, I would want to search for "c0_8" under the "Ref" column and get the row.
What other alternatives do I have to save two values in a row under the same column?
The immediate problem is that you're simply accessing character 0 of a string.
file is character-oriented storage; there is no "data frame" abstraction. Hence, we use CSV to hold the columnar data as text, a format that allows easy output and input recovery.
A CSV file consists only of text, with the separator character and newline having special meanings. There is no "tuple" form. Your data frame is stored as string data. If you want to recover your original tuple form, you will need to write parsing code to convert the strings back to tuples. Alternately, you can switch to a "pickle" (PCL) format for storing your data.
That should be enough leads to allow you to research whatever alternatives you want.
Your data is stored as a string
To format it into a tuple, split every string in your DataFrame and save it back as a tuple, with something like:
for n...
for m...
df[n][m] = (df[n][m].split(",")[0][1:], df[n][m].split(",")[1][:-1])
I have a dataset with a lot of fields, so I don't want to load all of it into a pd.DataFrame, but just the basic ones.
Sometimes, I would like to do some filtering upon loading and I would like to apply the filter via the query or eval methods, which means that I need a query string in the form of, i.e. "PROBABILITY > 10 and DISTANCE <= 50", but these columns need to be loaded in the dataframe.
Is is possible to extract the column names from the query string in order to load them from the dataset?
I know some magic using regex is possible, but I'm sure that it would break sooner or later, as the conditions get complicated.
So, I'm asking if there is a native pandas way to extract the column names from the query string.
I think you can use when you load your dataframe the term use cols I use it when I load a csv I dont know that is possible when you use a SQL or other format.
Columns_to use=['Column1','Column3']
pd.read_csv(use_cols=Columns_to_use,...)
Thank you
I have a list/array of strings:
l = ['jack','jill','bob']
Now I need to create a table in slite3 for python using which I can insert this array into a column called "Names". I do not want multiple rows with each name in each row. I want a single row which contains the array exactly as shown above and I want to be able to retrieve it in exactly the same format. How can I insert an array as an element in a db? What am I supposed to declare as the data type of the array while creating the db itself? Like:
c.execute("CREATE TABLE names(id text, names ??)")
How do I insert values too? Like:
c.execute("INSERT INTO names VALUES(?,?)",(id,l))
EDIT: I am being so foolish. I just realized that I can have multiple entries for the id and use a query to extract all relevant names. Thanks anyway!
You can store an array in a single string field, if you somehow genereate a string representation of it, e.g. sing the pickle module. Then, when you read the line, you can unpickle it. Pickle converts many different complex objects (but not all) into a string, that the object can be restored of. But: that is most likely not what you want to do (you wont be able to do anything with the data in the tabel, except selecting the lines and then unpickle the array. You wont be able to search.
If you want to have anything of varying length (or fixed length, but many instances of similiar things), you would not want to put that in a column or multiple columns. Thing vertically, not horizontally there, meaning: don't thing about columns, think about rows. For storing a vector with any amount of components, a table is a good tool.
It is a little difficult to explain from the little detail you give, but you should think about creating a second table and putting all the names there for every row of your first table. You'd need some key in your first table, that you can use for your second table, too:
c.execute("CREATE TABLE first_table(int id, varchar(255) text, additional fields)")
c.execute("CREATE TABLE names_table(int id, int num, varchar(255) name)")
With this you can still store whatever information you have except the names in first_table and store the array of names in names_table, just use the same id as in first_table and num to store the index positions inside the array. You can then later get back the array by doing someting like
SELECT name FROM names_table
WHERE id=?
ORDER BY num
to read the array of names for any of your rows in first_table.
That's a pretty normal way to store arrays in a DB.
This is not the way to go. You should consider creating another table for names with foreign key to names.
You could pickle/marshal/json your array and store it as binary/varchar/jsonfield in your database.
Something like:
import json
names = ['jack','jill','bill']
snames = json.dumps(names)
c.execute("INSERT INTO nametable " + snames + ";")
I am using client.insert_rows_from_dataframe method to insert data into my table.
obj = client.insert_rows_from_dataframe(table=TableRef, dataframe=df)
If there is no errors, obj will be an empty list of lists like
> print(obj)
[[] [] []]
But I want to know how to get the error messages out, if there are some errors while inserting?
I tried
obj[["errors"]] ?
but that is not correct. Please help.
To achieve the results that you want, you must set to your DataFrame a header identical to the one in your schema. For example, if you schema in BigQuery has the fields index and name, your DataFrame should have these two columns.
Lets take a look in the example below:
I created an table in BigQuery named insert_from_dataframe which contains the fields index, name and number, respectively INTEGER, STRING and INTEGER, all of them REQUIRED.
In the image below you can see that the insertion cause no errors.In the second image, we can see that the data was inserted.
No erros raised
Data inserted successfully
After that, I removed the column number for the last row of the same data. As you can see below, when I tried to push it to BigQuery, I got an error.
Given that, I would like to reinforce two points:
The error structured that is returned is a list of lists ( [],[],[],...]). The reason for that is because your data is supposed to be pushed in chunks (subsets of your data). In the function used you can specify how many rows each chunk will have using the parameter chunk_size=<number_of_rows>. Lets suppose that your data has 1600 rows and your chunk size is 500. You data will be divided into 4 chunks. The object returned after the insert request, hence, will consist of 4 lists inside a list, where each one of the four lists is related to one chunk. Its also important to say that if a row fails the process, all the rows inside the same chunk will not be inserted in the table.
If you are using string fields you should pay attention in the data inserted. Sometimes Pandas read null values as empty strings and it leads to a misinterpretation of the data by the insertion mechanism. In other words, its possible that you have empty strings inserted in your table while the expected result would be an error saying that the field can not be null.
Finally, I would like to post here some useful links for this problem:
BigQuery client documentation
Working with missing values in Pandas
I hope it helps.
Newbie working with db2. Developing a python script using ibm_db package. I have a select query where i am binding params using ibm_db.bind_param(stmt, 1,param1). and then doing a result = ibm_db.execute(stmt). How can I get the results from the query. the documentation is scarce on this topic. Would appreciate any example code.
After ibm_db.execute(stmt) you need to fetch data from a result
try this:
data = ibm_db.fetch_assoc(stmt)
Fetch data from a result set by calling one of the fetch functions.
ibm_db.fetch_tuple: Returns a tuple, which is indexed by column position, representing a row in a result set. The columns are 0-indexed.
ibm_db.fetch_assoc: Returns a dictionary, which is indexed by column name, representing a row in a result set.
ibm_db.fetch_both: Returns a dictionary, which is indexed by both column name and position, representing a row in a result set.
ibm_db.fetch_row: Sets the result set pointer to the next row or requested row. Use this function to iterate through a result set.
Study the examples for fetching result-sets in Python with ibm_db, that are in the Db2 knowledge center online at this link