I am trying to construct a hypergraph via Graphistry from this dataframe:
data = [
["Jack", "Lauren", "Brian"],
["Lauren", "Brian", "Jaden"],
["Brian", "Jaden", "Tessa"],
]
names_df = pd.DataFrame(data, columns=["Previous", "Current", "Next"])
hg1 = graphistry.hypergraph(names_df, entity_types=["Previous", "Current", "Next"])
hg1_g = hg1["graph"]
hg1_g.plot()
The problem is that the nodes are treated as different each time they appear in the various columns.
I would like to get 5 different nodes and 3 edges, one for each row in the dataframe.
If you want to merge nodes that appear in different columns, you can specify categories with the opts kwarg. Moreover, you can draw your edges directly between nodes by specifying direct=True:
data = [
["Jack", "Lauren", "Brian"],
["Lauren", "Brian", "Jaden"],
["Brian", "Jaden", "Tessa"],
]
names_df = pd.DataFrame(data, columns=["Previous", "Current", "Next"])
hg1 = graphistry.hypergraph(
    names_df,
    entity_types=["Previous", "Current", "Next"],
    direct=True,
    opts={
        'CATEGORIES': {
            'person': ["Previous", "Current", "Next"]
        },
        'EDGES': {
            "Previous": ["Current"],
            "Current": ["Next"]
        }
    }
)
hg1_g = hg1["graph"]
hg1_g.plot()
Edit: you can also restrict which edges are drawn with 'EDGES' inside opts, as shown above. See the hypergraph docstring.
Note that you don't need to specify entity_types in this case, since you're using all columns.
I couldn't find any resources on how to format a Google spreadsheet in gspread (Python) using row and column values instead of A1 notation.
I have a spreadsheet with 50 columns and I don't want to work out the A1 notation of the 50th column. I'd rather use row and column coordinates like (1, 50) -> first row, 50 columns - to make them bold and adjust the width of all the columns.
Please suggest; thanks in advance.
I'm able to format the cells using row and column coordinates with the function below, assembled from different answers to related questions on Stack Overflow. Thank you.
def formatHeaderRow(gs, ws):
    # change the column width for a range of columns using column numbers
    sheetId = ws._properties['sheetId']
    numOfColumns = 26  # keep your column count here
    body = {
        "requests": [
            {
                "updateDimensionProperties": {
                    "range": {
                        "sheetId": sheetId,
                        "dimension": "COLUMNS",
                        "startIndex": 0,           # from column A (0-based)
                        "endIndex": numOfColumns   # exclusive: 26 -> columns A..Z
                    },
                    "properties": {
                        "pixelSize": 150           # width in pixels (an integer, not a string)
                    },
                    "fields": "pixelSize"
                }
            }
        ]
    }
    res = gs.batch_update(body)
    # bold the first row using the row number
    ws.format("1:1", {
        "textFormat": {
            "bold": True
        }
    })
I need to 'cross join' (for want of a better term !) 2 lists.
Between them they represent a tabular dataset, but ..
One holds the column header names, the other a nested array with the row values.
I've managed the easy bit:
col_names = [i['name'] for i in c]
which strips the column names out into a list, dropping 'typeName'.
But working out how to extract the row field values and map them to the column names is giving me a headache!
Any pointers appreciated ;)
Thanks
Columns (as provided):
[
{
"name": "col1",
"typeName": "varchar"
},
{
"name": "col2",
"typeName": "int4"
}
]
Records (as provided):
[
[
{
"stringValue": "apples"
},
{
"longValue": 1
}
],
[
{
"stringValue": "bananas"
},
{
"longValue": 2
}
]
]
Required Result:
[
{
'col1':'apples',
'col2':1
},
{
'col1':'bananas',
'col2':2
}
]
You have to be able to assume there is a 1-to-1 correspondence between the names in the schema and the dicts in the records. Once you assume that, it's pretty easy:
names = [i['name'] for i in schema]
data = []
for row in records:
    d = {}
    for a, b in zip(names, row):
        # each cell dict holds a single value, e.g. {"stringValue": "apples"}
        d[a] = list(b.values())[0]
    data.append(d)
print(data)
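The same mapping can be written more compactly as a comprehension; this self-contained sketch reuses the sample schema and records from the question:

```python
schema = [
    {"name": "col1", "typeName": "varchar"},
    {"name": "col2", "typeName": "int4"},
]
records = [
    [{"stringValue": "apples"}, {"longValue": 1}],
    [{"stringValue": "bananas"}, {"longValue": 2}],
]

names = [c["name"] for c in schema]
# pair each column name with the single value held in each cell dict
data = [
    {name: next(iter(cell.values())) for name, cell in zip(names, row)}
    for row in records
]
print(data)  # [{'col1': 'apples', 'col2': 1}, {'col1': 'bananas', 'col2': 2}]
```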
I'm parsing some XML data, doing some logic on it, and trying to display the results in an HTML table. The dictionary, after filling, looks like this:
{
"general_info": {
"name": "xxx",
"description": "xxx",
"language": "xxx",
"prefix": "xxx",
"version": "xxx"
},
"element_count": {
"folders": 23,
"conditions": 72,
"listeners": 1,
"outputs": 47
},
"external_resource_count": {
"total": 9,
"extensions": {
"jar": 8,
"json": 1
},
"paths": {
"/lib": 9
}
},
"complexity": {
"over_1_transition": {
"number": 4,
"percentage": 30.769
},
"over_1_trigger": {
"number": 2,
"percentage": 15.385
},
"over_1_output": {
"number": 4,
"percentage": 30.769
}
}
}
Then I'm using pandas to convert the dictionary into a table, like so:
data_frame = pandas.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame()
The result is a table that is mostly correct:
While the first and second levels seem to render correctly, those categories with a sub-sub category get written as a string in the cell, rather than as a further column. I've also tried using stack(level=1) but it raises an error "IndexError: Too many levels: Index has only 1 level, not 2". I've also tried making it into a series with no luck. It seems like it only renders "complete" columns. Is there a way of filling up the empty spaces in the dictionary before processing?
How can I get, for example, external_resource_count -> extensions to have two daughter rows jar and json, with an additional column for the values, so that the final table looks like this:
Extra credit if anyone can tell me how to get rid of the first row with the index numbers. Thanks!
The way you load the dataframe is correct, but you should rename the 0 column to some proper column name.
# this function extracts the given keys from the nested dicts
def explode_and_filter(df, filterdict):
    return [df[col].apply(lambda x: x.get(k) if type(x) == dict else x).rename(f'{k}')
            for col, nested in filterdict.items()
            for k in nested]

data_frame = pd.DataFrame.from_dict(data=extracted_metrics, orient='index').stack().to_frame(name='somecol')

# let's separate the rows where a dict is present & explode only those rows
mask = data_frame.somecol.apply(lambda x: type(x) == dict)
expp = explode_and_filter(data_frame[mask],
                          {'somecol': ['jar', 'json', '/lib', 'number', 'percentage']})

# here we concat the exploded series into a frame
exploded_df = pd.concat(expp, axis=1).stack().to_frame(name='somecol2') \
    .reset_index(level=2).rename(columns={'level_2': 'somecol'})

# and now we concat the rows with dict elements with the rows with non-dict elements
out = pd.concat([data_frame[~mask], exploded_df])
The output dataframe looks like this
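An alternative sketch (not the method above): recursively flatten the nested dict into tuple key paths, pad the shorter paths so every tuple has the same length, and build a MultiIndex frame. The sample dict here is a trimmed-down subset of the one in the question:

```python
import pandas as pd

# subset of the question's dict, for illustration
extracted_metrics = {
    "general_info": {"name": "xxx", "version": "xxx"},
    "external_resource_count": {
        "total": 9,
        "extensions": {"jar": 8, "json": 1},
    },
}

def flatten(d, prefix=()):
    # walk the nested dict, yielding (key-path tuple, leaf value)
    for k, v in d.items():
        if isinstance(v, dict):
            yield from flatten(v, prefix + (k,))
        else:
            yield prefix + (k,), v

pairs = list(flatten(extracted_metrics))
depth = max(len(k) for k, _ in pairs)
# pad shorter key paths with "" so every tuple has the same length,
# which lets pandas build a proper MultiIndex
padded = {k + ("",) * (depth - len(k)): v for k, v in pairs}
df = pd.Series(padded).to_frame(name="value")
print(df)
```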
I am trying to retrieve the values of specific columns from the Python list object. This is the response format from the Log Analytics API, documented here - https://dev.loganalytics.io/documentation/Using-the-API/ResponseFormat
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "Category",
"type": "string"
},
{
"name": "count_",
"type": "long"
}
],
"rows": [
[
"Administrative",
20839
],
[
"Recommendation",
122
],
[
"Alert",
64
],
[
"ServiceHealth",
11
]
]
}
]
}
There are hundreds of columns and I want to read specific column and row values. To do that, I initially tried to find the index of a column, e.g. "Category", and then retrieve all the values from the rows. Here is what I have done so far.
result=requests.get(url, params=params, headers=headers, verify=False)
index_category = (result.json()['tables'][0]['columns']).index('Category')
result contains data in the format posted above. I get the error below. What am I missing?
ValueError: 'Category' is not in list
I want to be able to retrieve the Category values from the rows array in a loop. I have also written the loop below and I am able to get what I want, but want to confirm whether there is a better way to do this. I retrieve the column index first before reading the row values because blindly reading them with hard-coded index values is error-prone, particularly when the order of the columns changes.
for column in range(0, columns):
    if result.json()['tables'][0]['columns'][column]['name'] == 'Category':
        index_category = column
for row in range(0, rows):
    print(result.json()['tables'][0]['rows'][row][index_category])
json_data = result.json()
for index, column in enumerate(json_data['tables'][0]['columns']):
    if column['name'] == 'Category':
        category_index = index
        break

category_list = []
for row in json_data['tables'][0]['rows']:
    category_list.append(row[category_index])
Haven't tested it btw.
You could also refactor the first loop where we find the index for the category with the filter function.
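For example, one way to sketch that refactor with next() and a generator expression (equivalent in spirit to filter), using the columns list from the response above:

```python
# columns list as returned in the API response above
columns = [
    {"name": "Category", "type": "string"},
    {"name": "count_", "type": "long"},
]

# index of the first column whose name matches, or None if no column matches
category_index = next(
    (i for i, col in enumerate(columns) if col["name"] == "Category"),
    None,
)
print(category_index)  # 0
```

The None default avoids a StopIteration if the column is missing, mirroring the "not found" case you'd otherwise have to handle after the loop.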
I have 3 dictionaries (2 of them are dicts built with setdefault, holding multiple values):
Score_dict-
{'Id_1': [('100001124156327', 0.0),
          ('100003643614411', 0.0)],
 'Id_2': [('100000435456546', 5.7),
          ('100000234354556', 3.5)]}
post_dict-
{'Id_1': [('+', 100004536)],
 'Id_2': [('-', 100035430)]}
comment_dict-
{'Id_1': [('+', 1023434234)],
 'Id_2': [('-', 10343534534),
          ('*', 1097963644)]}
My current approach is to write them into 3 different csv files and then merge them; I want to merge them on a common first column (the ID column).
But I can't figure out how to merge 3 csv files into a single csv file. Also, is there any way to write all 3 dictionaries into a single csv without writing them individually?
Output required-
Ids     Score_Ids             Post_Ids       Comment_Ids
Id_1    100001124156327,0.0   +,100004536    +,1023434234
        100003643614411,0.0
Id_2    100000435456546,5.7   -,100035430    -,10343534534
        100000234354556,3.5                  *,1097963644
How to do this in a correct way with the best approach?
You can merge them all first, then write them to a csv file:
import pprint
scores = {
'Id_1': [
('100001124156327', 0.0),
('100003643614411',0.0)],
'Id_2': [
('100000435456546',5.7),
('100000234354556',3.5)
]
}
post_dict = {
'Id_1':[
('+',100004536)
],
'Id_2' :[
('-',100035430)
]
}
comment_dict = {
'Id_1':[
('+',1023434234)
],
'Id_2':[
('-',10343534534),
('*',1097963644)
]
}
merged = {
key: {
"Score_Ids": value,
"Post_Ids": post_dict[key],
"Comment_Ids": comment_dict[key]
}
for key, value
    in scores.items()  # iteritems() is Python 2 only
}
pp = pprint.PrettyPrinter(depth=6)
pp.pprint(merged)
For reference: https://repl.it/repls/SqueakySlateblueDictionaries
I suggest you transform your three dicts into one list of dicts before writing it to a csv file.
Example
rows = [
{"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
{"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
{"Score_Id": "...", "Post_Id": "...", "Comment_Id": "..."},
...
]
And then use the csv.DictWriter class to write all the rows.
Since you have commas in your values (are you sure that's a good idea? Maybe splitting them into two different columns would be a better approach), be careful to use tabs or something else as the separator.
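A minimal sketch of the csv.DictWriter approach with a tab separator; the row values here are made up to match the required output shape, and io.StringIO stands in for an open file:

```python
import csv
import io

fieldnames = ["Ids", "Score_Ids", "Post_Ids", "Comment_Ids"]
rows = [
    {"Ids": "Id_1", "Score_Ids": "100001124156327,0.0",
     "Post_Ids": "+,100004536", "Comment_Ids": "+,1023434234"},
    {"Ids": "Id_2", "Score_Ids": "100000435456546,5.7",
     "Post_Ids": "-,100035430", "Comment_Ids": "-,10343534534"},
]

buf = io.StringIO()  # replace with open("out.csv", "w", newline="")
writer = csv.DictWriter(buf, fieldnames=fieldnames, delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

With a tab delimiter the commas inside the values pass through unquoted, which keeps the file readable.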
I suggest writing all three to the same file
You could get common keys by doing something like:
# in Python 3, dict.keys() views can't be concatenated with +; use set union
common_keys = set(score_dict) | set(post_dict) | set(comment_dict)
for key_ in common_keys:
    val_score = score_dict.get(key_, some_default_value)
    post_score = post_dict.get(key_, some_default_value)
    comment_score = comment_dict.get(key_, some_default_value)
    # print key and vals to csv as before
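Completing that sketch with the csv module, using an empty string as the default value; the dict literals are trimmed-down versions of the question's data and io.StringIO stands in for an open file:

```python
import csv
import io

score_dict = {'Id_1': [('100001124156327', 0.0)],
              'Id_2': [('100000435456546', 5.7)]}
post_dict = {'Id_1': [('+', 100004536)]}
comment_dict = {'Id_2': [('-', 10343534534)]}

# set union of the three key sets (dict.keys() can't be added with + in Python 3)
common_keys = set(score_dict) | set(post_dict) | set(comment_dict)

buf = io.StringIO()  # replace with open("out.csv", "w", newline="")
writer = csv.writer(buf, delimiter="\t")
writer.writerow(["Ids", "Score_Ids", "Post_Ids", "Comment_Ids"])
for key_ in sorted(common_keys):
    writer.writerow([
        key_,
        score_dict.get(key_, ""),
        post_dict.get(key_, ""),
        comment_dict.get(key_, ""),
    ])
print(buf.getvalue())
```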