result = {'data1': [1,2], 'data2': [4,5]}
I know how to write this when each key has a single value, but in this case each key holds a list of values. How can I iterate over them to create a table in BigQuery like the following?
| data1 | data2 |
| -------- | -------------- |
| 1 | 4 |
| 2 | 5 |
I had the same problem and couldn't find a solution, so this is what I did (all our DBs are managed as a service in our code):
I fetch the table ID and then call get_table(table_id).
I then pass table._properties['schema']['fields'], together with the list data, to a function that converts it into JSON rows:
def __Convert_Data_To_Json(self, data, table_fields):
    # Each row must supply exactly one value per table field.
    if len(table_fields) != len(data[0]):
        raise CannotCreateConnectionError(CANNOT_CREATE_CONNECTION_MESSAGE % "Data length doesn't match table field list")
    for row in data:
        dataset = {}
        for i, col in enumerate(row):
            # The first value of the i-th schema field dict is used as the key.
            name = list(table_fields[i].values())[0]
            dataset[name] = None if col is None else str(col)
        self.__rows_to_insert.append(dataset)
and then I call insert_rows_json with the accumulated rows.
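For the dictionary-of-lists shape in the question, the reshaping step could be sketched as below. The helper name columns_to_rows is hypothetical, and the sketch assumes every list has the same length. The result is a list of one dict per row, which is the shape insert_rows_json takes for its rows argument; the BigQuery call itself is left out because it needs a client and credentials.

```python
def columns_to_rows(columns):
    """Turn {'col': [v1, v2], ...} into [{'col': v1, ...}, {'col': v2, ...}].

    Hypothetical helper; assumes all value lists have equal length.
    """
    names = list(columns)
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of values")
    # zip(*...) walks the lists in parallel, yielding one tuple per table row.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


result = {'data1': [1, 2], 'data2': [4, 5]}
rows = columns_to_rows(result)
print(rows)  # [{'data1': 1, 'data2': 4}, {'data1': 2, 'data2': 5}]
```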
I am using PySpark and I want to convert a Spark DataFrame into a specific JSON file.
The DataFrame looks like this:
| Key | desc | value |
|:---- |:----:| -----:|
| 12345| type | AA |
| 12345| id | q1w2e3|
| 98765| type | BB |
| 98765| id | z1x2c3|
I need to convert it into a JSON like this:
{
    "12345": {
        "type": "AA",
        "id": "q1w2e3"
    },
    "98765": {
        "type": "BB",
        "id": "z1x2c3"
    }
}
First, collect the DataFrame:
output = df.collect()
If you print output, you will get a list of Row objects, something like this:
[Row(Key=12345, desc='type', value='AA'), …]
Now iterate over this list with a for loop and build a nested dictionary keyed by Key. The fields of each Row can be accessed directly by name:
result = {}
for row in output:
    result.setdefault(row['Key'], {})[row['desc']] = row['value']
Once the dictionary is created, you can serialize it with json.dumps(result) (after import json).
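As a self-contained sketch of that loop: the namedtuple below is only a stand-in for pyspark.sql.Row so the example runs without a Spark session, and rows_to_nested is a hypothetical helper name. With a real DataFrame, the list would come from df.collect().

```python
import json
from collections import namedtuple

# Stand-in for pyspark.sql.Row (assumption: same field names as the DataFrame).
Row = namedtuple("Row", ["Key", "desc", "value"])
collected = [
    Row("12345", "type", "AA"),
    Row("12345", "id", "q1w2e3"),
    Row("98765", "type", "BB"),
    Row("98765", "id", "z1x2c3"),
]


def rows_to_nested(rows):
    """Group rows by Key and map each desc to its value."""
    nested = {}
    for row in rows:
        # setdefault creates the inner dict the first time a Key is seen.
        nested.setdefault(str(row.Key), {})[row.desc] = row.value
    return nested


nested = rows_to_nested(collected)
print(json.dumps(nested, indent=2))
```

Note that collect() pulls every row onto the driver, so this approach only suits DataFrames small enough to fit in memory.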
For each distinct email address, I want to take the rows with that address, create a new DataFrame from them, and then send an email with the row information to that address (column 1).
| email              | Acct # | Acct Status |
| ------------------ | ------ | ----------- |
| janedoe#gmail.com  | 1230   | Closed      |
| janedoe#gmail.com  | 2546   | Closed      |
| janedoe#gmail.com  | 2468   | Closed      |
| janedoe#gmail.com  | 7896   | Closed      |
| michaeldoe#aol.com | 4565   | Closed      |
| michaeldoe#aol.com | 9686   | Closed      |
| jackdoe#aol.com    | 4656   | Closed      |
I tried converting the DataFrame into a list of per-email DataFrames using groupby, but I am stuck:
df_list = [x for _, x in df.groupby(['email'])]
I am not sure how you want to store your data or what you want to do with it. I've chosen to store the output in a Python dictionary with the email contact as the key and all of their accounts and statuses as the value. You can use a combination of groupby and drop_duplicates to extract and form the information you want.
df_grouped = df.groupby('email').groups  # maps each email to the index labels of its rows
df_contacts = df.drop_duplicates(subset=['email'])
result = {}  # dictionary for results
for item in df_contacts['email']:
    rows = df_grouped[item].tolist()
    my_data = []
    for x in rows:
        # .loc because groups holds index labels; column names match the table above
        info = df[['Acct #', 'Acct Status']].loc[x].values
        my_data.append(info.tolist())
    result[item] = my_data
Then you can use the data as required. For example:
for i, j in result.items():
    print('Send email to ', i, ' with their account info as follows')
    for z in j:
        print('Account : ', z[0], ' Status :', z[1])
If for some reason you really want the resulting data to go in separate DataFrames then this could be in a Dictionary of DataFrames as follows:
dx = {}
for i, j in result.items():
    # j is the list of [account, status] pairs collected for this email
    dfx = pd.DataFrame(j, columns=['Acct #', 'Acct Status'])
    dx[i] = dfx
print(dx['janedoe#gmail.com'])  # as an example of accessing the data
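If a dictionary of DataFrames is the goal, a shorter route is to iterate over the groupby object directly, since it yields (key, sub-DataFrame) pairs. This is only a sketch that rebuilds the sample table from the question; the print call stands in for whatever email-sending code you use.

```python
import pandas as pd

df = pd.DataFrame({
    'email': ['janedoe#gmail.com'] * 4 + ['michaeldoe#aol.com'] * 2 + ['jackdoe#aol.com'],
    'Acct #': [1230, 2546, 2468, 7896, 4565, 9686, 4656],
    'Acct Status': ['Closed'] * 7,
})

# Grouping by a single column name yields (email, sub-DataFrame) pairs.
frames = {
    email: group.drop(columns='email').reset_index(drop=True)
    for email, group in df.groupby('email')
}

for email, accounts in frames.items():
    # Placeholder for the real send step: render the rows as plain text.
    body = accounts.to_string(index=False)
    print(f"To: {email}\n{body}\n")
```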
My data is organized in a data frame with the following structure
| ID | Post | Platform |
| -------- | ------------------- | ----------- |
| 1 | Something #hashtag1 | Twitter |
| 2 | Something #hashtag2 | Insta |
| 3 | Something #hashtag1 | Twitter |
I have been able to extract and count the hashtag using the following (using this post):
df.Post.str.extractall(r'(\#\w+)')[0].value_counts().rename_axis('hashtags').reset_index(name='count')
I am now trying to count how often each hashtag occurs on each platform. I am trying the following:
df.groupby(['Post', 'Platform'])['Post'].str.extractall(r'(\#\w+)')[0].value_counts().rename_axis('hashtags').reset_index(name='count')
But, I am getting the following error:
AttributeError: 'SeriesGroupBy' object has no attribute 'str'
We can solve this easily in two steps, assuming each post has just a single hashtag.
Step 1: Create a new column with Hashtag
df['hashtag']= df.Post.str.extractall(r'(\#\w+)')[0].reset_index()[0]
Step 2: Group by and get the counts
df.groupby([ 'Platform']).hashtag.count()
Generic solution (works for any number of hashtags per post):
# Extract all hashtags
df1 = df.Post.str.extractall(r'(\#\w+)')[0].reset_index()
# Set the index to the index of the original table row each hashtag came from
df1.set_index('level_0', inplace=True)
df1.rename(columns={0: 'hashtag'}, inplace=True)
df2 = pd.merge(df, df1, right_index=True, left_index=True)
df2.groupby(['Platform']).hashtag.count()
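Grouping by Platform alone gives the total number of hashtags per platform. If the goal is a count for each hashtag on each platform, one possible variation is to group by both columns. The sketch below rebuilds the sample data from the question and is not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3],
    'Post': ['Something #hashtag1', 'Something #hashtag2', 'Something #hashtag1'],
    'Platform': ['Twitter', 'Insta', 'Twitter'],
})

# One row per hashtag found; the index is (original row label, match number).
tags = df['Post'].str.extractall(r'(#\w+)')[0].rename('hashtag')
# Drop the match level so the index lines up with df's rows again.
tags = tags.reset_index(level='match', drop=True)

counts = (
    df[['Platform']]
    .join(tags, how='inner')           # attach the platform to every hashtag
    .groupby(['Platform', 'hashtag'])
    .size()
    .reset_index(name='count')
)
print(counts)
```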
Basically, I have a dictionary and I'd like to construct a table from it.
The dictionary is of the form:
dict={
    '1':{'fruit':'apple',
         'price':0.60,
         'unit':'pieces',
         'stock':60
        },
    '2':{'fruit':'cherries',
         'price':15.49,
         'unit':'kg',
         'stock':5.6
        },
    # and so on.
}
I want the table to look like this, with the numbers correctly aligned:
no  |item      | price | stock
----+----------+-------+----------
1   |apple     |  0.60 | 60 pieces
----+----------+-------+----------
2   |cherries  | 15.49 | 5.6 kg
and so on...
I do NOT want to print this table out, I'm trying to write a function that takes the dict as input and RETURNS this table as a string.
Here's my attempt:
def items(dct):
    table="{0:<2} | {1:<33} | {2:^8} | {3:^11}".format("no", "item", "price","stock")
    ...
    return table
I'm having trouble with formatting strings, I've tried to add line breaks and play around with different things but I always get various errors and things just aren't working out :(
I'm new to Python, could someone educate me pls.
Thanks!
def table_create(dct):
    dashes = "{0:<2} + {1:<33} + {2:^8} + {3:^11} \n".format("-"*2, "-"*33, "-"*8, "-"*11)
    table = "{0:<2} | {1:<33} | {2:^8} | {3:^11} \n".format("no", "item", "price", "stock")
    table += dashes
    for key, value in dct.items():
        # Stock and unit are joined into a single cell, e.g. "60 pieces".
        table += "{0:<2} | {1:<33} | {2:^8} | {3:^11} \n".format(key, value["fruit"], value["price"], str(value["stock"])+" "+value["unit"])
        table += dashes
    return table
print(table_create(dict))  # dict is the dictionary from the question
# output
no | item                              |  price   |    stock
-- + --------------------------------- + -------- + -----------
1  | apple                             |   0.6    |  60 pieces
-- + --------------------------------- + -------- + -----------
2  | cherries                          |  15.49   |   5.6 kg
-- + --------------------------------- + -------- + -----------
In the same way you stored the header of the table, you can store its entries and print them or do whatever you want with them.
dict={
    '1':{'fruit':'apple','price':0.60,'unit':'pieces','stock':60},
    '2':{'fruit':'cherries','price':15.49,'unit':'kg','stock':5.6}
}
def items(dct):
    table="{0:<2} | {1:<33} | {2:^8} | {3:^11}".format("no", "item", "price","stock")
    print(table)
    # Use the dct parameter rather than the global dictionary.
    for i in dct:
        print("{0:<2} | {1:<33} | {2:^8} | {3:^11}".format(i,dct[i]['fruit'] ,dct[i]['price'],str(dct[i]['stock'])+' '+dct[i]['unit']))
items(dict)
You can check these questions:
Python - Printing a dictionary as a horizontal table with headers
Printing Lists as Tabular Data
Instead of printing the data, just concatenate it in a string.
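As a rough sketch of that idea, the function below collects the lines in a list and joins them at the end, so it returns the table as a string rather than printing it. The name build_table and the column widths (taken from the layout requested in the question) are assumptions for illustration. The price is formatted with two decimals and right-aligned so that 0.60 and 15.49 line up.

```python
def build_table(items):
    """Return the items as an aligned text table instead of printing it."""
    header = "{:<4}|{:<10}| {:>5} | {}".format("no", "item", "price", "stock")
    rule = "----+----------+-------+----------"
    lines = [header]
    for number, item in items.items():
        lines.append(rule)
        # Stock and unit share one cell, e.g. "60 pieces".
        stock = "{} {}".format(item["stock"], item["unit"])
        lines.append("{:<4}|{:<10}| {:>5.2f} | {}".format(
            number, item["fruit"], item["price"], stock))
    return "\n".join(lines)


fruit = {
    '1': {'fruit': 'apple', 'price': 0.60, 'unit': 'pieces', 'stock': 60},
    '2': {'fruit': 'cherries', 'price': 15.49, 'unit': 'kg', 'stock': 5.6},
}
table = build_table(fruit)
print(table)
```

Item names longer than ten characters would push the later columns to the right; widening the `{:<10}` field (and the rule line) would handle that.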
I want to make a table like this in Python:
+----------------------------------+--------------------------+
| name | rank |
+----------------------------------+--------------------------+
| {} | [] |
+----------------------------------+--------------------------+
| {} | [] |
+----------------------------------+--------------------------+
The problem is that I first want to load a text file containing domain names, then make a GET request to each domain one by one, and print each website name and status code in a table that is perfectly aligned. I have written some code, but I failed to display the output as a perfectly aligned table like the one above.
Here is my code
import requests

f = open('sub.txt', 'r')
for i in f:
    try:
        # strip() removes the trailing newline read from the file
        x = requests.get('http://'+i.strip())
        code = str(x.status_code)
        #Now here I want to display `code` and `i` variables in table format
    except:
        pass
In the code above, I want to display the code and i variables in a table like the one I showed above.
Thank you
You can achieve this using the center() method of str. It creates and returns a new string of the given width, padded on both sides with the specified fill character (a space by default).
For example:
f = ['AAA','BBBBB','CCCCCC']
codes = [401,402,105]
col_width = 40
print("+"+"-"*col_width+"+"+"-"*col_width+"+")
print("|"+"Name".center(col_width)+"|"+"Rank".center(col_width)+"|")
print("+"+"-"*col_width+"+"+"-"*col_width+"+")
for i in range(len(f)):
    _f = f[i]
    code = str(codes[i])
    # Name goes in the first column and the code in the second, matching the header.
    print("|"+_f.center(col_width)+"|"+code.center(col_width)+"|")
    print("+"+"-"*col_width+"+"+"-"*col_width+"+")
Output
+----------------------------------------+----------------------------------------+
|                  Name                  |                  Rank                  |
+----------------------------------------+----------------------------------------+
|                  AAA                   |                  401                   |
+----------------------------------------+----------------------------------------+
|                 BBBBB                  |                  402                   |
+----------------------------------------+----------------------------------------+
|                 CCCCCC                 |                  105                   |
+----------------------------------------+----------------------------------------+
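A fixed col_width works when the names are short. If the domain names vary a lot in length, one option is to size each column from its longest cell. The sketch below assumes the (name, status code) pairs have already been collected, for example inside the requests loop from the question; format_status_table is a hypothetical helper name and the sample rows are made up.

```python
def format_status_table(rows, headers=("name", "rank")):
    """Return a bordered table whose columns fit their longest cell."""
    cells = [tuple(str(value) for value in row) for row in rows]
    # Column width = longest header or cell in that column, plus padding.
    widths = [
        max([len(header)] + [len(row[i]) for row in cells]) + 2
        for i, header in enumerate(headers)
    ]
    rule = "+" + "+".join("-" * width for width in widths) + "+"

    def line(values):
        return "|" + "|".join(v.center(w) for v, w in zip(values, widths)) + "|"

    out = [rule, line(headers), rule]
    for row in cells:
        out.append(line(row))
        out.append(rule)
    return "\n".join(out)


# Made-up sample data standing in for (domain, status_code) results.
results = [("example.com", 200), ("example.org", 404)]
status_table = format_status_table(results)
print(status_table)
```

Because the function returns a string, the same result can be printed, logged, or written to a file.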