Creating table from dictionary & string formatting - Python [duplicate] - python

This question already has answers here:
Python - Printing a dictionary as a horizontal table with headers
(7 answers)
Closed 2 years ago.
Basically, I have a dictionary and I'd like to construct a table from it.
The dictionary is of the form:
dict = {
    '1': {'fruit': 'apple',
          'price': 0.60,
          'unit': 'pieces',
          'stock': 60
          },
    '2': {'fruit': 'cherries',
          'price': 15.49,
          'unit': 'kg',
          'stock': 5.6
          },
    # and so on.
}
I want the table to look like this, with the numbers correctly aligned:
no  |item      | price | stock
----+----------+-------+----------
1   |apple     |  0.60 | 60 pieces
----+----------+-------+----------
2   |cherries  | 15.49 | 5.6 kg
and so on...
I do NOT want to print this table out, I'm trying to write a function that takes the dict as input and RETURNS this table as a string.
Here's my attempt:
def items(dct):
    table="{0:<2} | {1:<33} | {2:^8} | {3:^11}".format("no", "item", "price","stock")
    ...
    return table
I'm having trouble formatting strings. I've tried adding line breaks and playing around with different things, but I always get various errors and things just aren't working out :(
I'm new to Python; could someone help me out, please?
Thanks!

def table_create(dct):
    dashes = "{0:<2} + {1:<33} + {2:^8} + {3:^11} \n".format("-"*2, "-"*33, "-"*8, "-"*11)
    table="{0:<2} | {1:<33} | {2:^8} | {3:^11} \n".format("no", "item", "price", "stock")
    table+=dashes
    for key, value in dct.items():
        table+="{0:<2} | {1:<33} | {2:^8} | {3:^11} \n".format(key, value["fruit"], value["price"],str(value["stock"])+" "+value["unit"])
        table+=dashes
    return table
print(table_create(dct))
# output
no | item                              |  price   |    stock
-- + --------------------------------- + -------- + -----------
1  | apple                             |   0.6    |  60 pieces
-- + --------------------------------- + -------- + -----------
2  | cherries                          |  15.49   |   5.6 kg
-- + --------------------------------- + -------- + -----------

In the same way you stored the header of the table, you can store its entries and then print them or do whatever else you want with them.
dict = {
    '1': {'fruit': 'apple', 'price': 0.60, 'unit': 'pieces', 'stock': 60},
    '2': {'fruit': 'cherries', 'price': 15.49, 'unit': 'kg', 'stock': 5.6}
}
def items(dct):
    table = "{0:<2} | {1:<33} | {2:^8} | {3:^11}".format("no", "item", "price", "stock")
    print(table)
    # Iterate over the argument dct, not the global dict
    for i in dct:
        print("{0:<2} | {1:<33} | {2:^8} | {3:^11}".format(i, dct[i]['fruit'], dct[i]['price'], str(dct[i]['stock'])+' '+dct[i]['unit']))
items(dict)

You can check these questions:
Python - Printing a dictionary as a horizontal table with headers
Printing Lists as Tabular Data
Instead of printing the data, just concatenate it in a string.
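As a minimal sketch of that idea, the function below collects the lines in a list and joins them at the end, so the table is returned as a string rather than printed. The column widths are hard-coded to match the layout asked for in the question, and the names table_from_dict and inventory are made up for this illustration. The price uses a fixed two-decimal format so that 0.60 keeps its trailing zero and the numbers line up on the right.

```python
def table_from_dict(dct):
    """Return the table as a single string; nothing is printed here."""
    # Widths (4, 10, 7) are assumptions chosen to match the question's layout.
    header = "{:<4}|{:<10}|{:^7}| {}".format("no", "item", "price", "stock")
    rule = "----+----------+-------+----------"
    lines = [header]
    for key, item in dct.items():
        lines.append(rule)
        stock = "{} {}".format(item["stock"], item["unit"])
        # >6.2f right-aligns the price and always shows two decimals.
        lines.append("{:<4}|{:<10}|{:>6.2f} | {}".format(
            key, item["fruit"], item["price"], stock))
    return "\n".join(lines)


inventory = {
    "1": {"fruit": "apple", "price": 0.60, "unit": "pieces", "stock": 60},
    "2": {"fruit": "cherries", "price": 15.49, "unit": "kg", "stock": 5.6},
}
table = table_from_dict(inventory)
```

Calling print(table) afterwards is optional; the caller decides what to do with the string.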

Related

convert list to BigQuery table python

result = {'data1': [1,2], 'data2': [4,5]}
I know how to write the data when each key has a single value, but in this case each key has a list of values. How can I iterate over them? I want to create a table in BigQuery as follows:
| data1 | data2 |
| -------- | -------------- |
| 1 | 4 |
| 2 | 5 |
I had the same problem and couldn't find a solution, so here is what I did (all our DBs are managed as a service in our code):
I fetch the table ID, then call get_table(table_id).
Then I pass table._properties['schema']['fields'], together with the list data, to a function that converts it into JSON:
def __Convert_Data_To_Json(self, data, table_fields):
    # Every row must have one value per table field
    if not (len(table_fields) == len(data[0])):
        raise CannotCreateConnectionError(CANNOT_CREATE_CONNECTION_MESSAGE % str("Data length doesn't match table field list"))
    for row in data:
        dataset = {}
        i = 0
        for col in row:
            # The first value of each schema field entry is used as the column key
            if col is None:
                dataset[list(table_fields[i].values())[0]] = None
            else:
                dataset[list(table_fields[i].values())[0]] = str(col)
            i += 1
        self.__rows_to_insert.append(dataset)
and then I insert the rows using insert_rows_json.
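For the dictionary-of-lists shape in the question, the iteration itself can be sketched in plain Python: zip the value lists together so that each tuple is one table row, then pair it with the column names. The helper name columns_to_rows is hypothetical, and the sketch assumes all lists have the same length. The BigQuery call is left as a comment because it needs a real client and table.

```python
result = {'data1': [1, 2], 'data2': [4, 5]}


def columns_to_rows(columns):
    """Turn {'col': [v1, v2, ...]} into [{'col': v1}, {'col': v2}, ...]."""
    names = list(columns)
    # zip(*values) walks the lists in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = columns_to_rows(result)

# With google-cloud-bigquery, a list of dicts like this is the shape that
# Client.insert_rows_json expects, for example:
#   errors = client.insert_rows_json(table_id, rows)
```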

Python count hashtag per platform

My data is organized in a data frame with the following structure
| ID | Post | Platform |
| -------- | ------------------- | ----------- |
| 1 | Something #hashtag1 | Twitter |
| 2 | Something #hashtag2 | Insta |
| 3 | Something #hashtag1 | Twitter |
I have been able to extract and count the hashtag using the following (using this post):
df.Post.str.extractall(r'(\#\w+)')[0].value_counts().rename_axis('hashtags').reset_index(name='count')
I am now trying to count hashtag occurrences for each platform. I am trying the following:
df.groupby(['Post', 'Platform'])['Post'].str.extractall(r'(\#\w+)')[0].value_counts().rename_axis('hashtags').reset_index(name='count')
But, I am getting the following error:
AttributeError: 'SeriesGroupBy' object has no attribute 'str'
We can solve this easily in two steps, assuming each post has just a single hashtag.
Step 1: Create a new column with Hashtag
df['hashtag']= df.Post.str.extractall(r'(\#\w+)')[0].reset_index()[0]
Step 2: Group by and get the counts
df.groupby([ 'Platform']).hashtag.count()
Generic solution (works for any number of hashtags per post):
import pandas as pd
# Extract all hashtags
df1 = df.Post.str.extractall(r'(\#\w+)')[0].reset_index()
# Set the index to the index of the original table row each hashtag came from
df1.set_index('level_0', inplace=True)
df1.rename(columns={0: 'hashtag'}, inplace=True)
df2 = pd.merge(df, df1, right_index=True, left_index=True)
df2.groupby(['Platform']).hashtag.count()
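If the goal is a count per hashtag within each platform, rather than one total per platform, grouping on both columns is one way to get it. The sketch below rebuilds the sample frame from the question; the variable names tags and per_platform are made up for this illustration, and it assumes pandas is installed.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Post": ["Something #hashtag1", "Something #hashtag2", "Something #hashtag1"],
    "Platform": ["Twitter", "Insta", "Twitter"],
})

# One row per hashtag found; the index is (original row, match number).
tags = df["Post"].str.extractall(r"(#\w+)")[0].rename("hashtag")
# Drop the match level so the index lines up with df again.
tags = tags.reset_index(level="match", drop=True)

per_platform = (
    df[["Platform"]]
    .join(tags, how="inner")           # attach each hashtag to its platform
    .groupby(["Platform", "hashtag"])
    .size()                            # occurrences per (platform, hashtag)
    .reset_index(name="count")
)
```

Because extractall yields one row per match, posts with several hashtags contribute one count for each of them.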

I want to display variables in table format that should be perfectly align in python [duplicate]

This question already has answers here:
Printing Lists as Tabular Data
(20 answers)
Closed 3 years ago.
I want to make a table in python
+----------------------------------+--------------------------+
|               name               |           rank           |
+----------------------------------+--------------------------+
|                {}                |            []            |
+----------------------------------+--------------------------+
|                {}                |            []            |
+----------------------------------+--------------------------+
The problem is that I first want to load a text file containing domain names, then make a GET request to each domain one by one, and then print the website name and status code in a table that is perfectly aligned. I have written some code, but I failed to display the output in an aligned table like the one above.
Here is my code:
import requests

f = open('sub.txt', 'r')
for i in f:
    try:
        x = requests.get('http://'+i)
        code = str(x.status_code)
        # Now here I want to display `code` and `i` variables in table format
    except:
        pass
In the above code I want to display the code and i variables in a table like the one shown above.
Thank you
You can achieve this using the string method center(). It returns a new string of the requested width, with the original text centered and padded with the specified fill character (a space by default).
Example:
f = ['AAA','BBBBB','CCCCCC']
codes = [401,402,105]
col_width = 40
print("+"+"-"*col_width+"+"+"-"*col_width+"+")
print("|"+"Name".center(col_width)+"|"+"Rank".center(col_width)+"|")
print("+"+"-"*col_width+"+"+"-"*col_width+"+")
for i in range(len(f)):
    _f = f[i]
    code = str(codes[i])
    # Name goes in the first column and the code in the second, matching the header
    print("|"+_f.center(col_width)+"|"+code.center(col_width)+"|")
    print("+"+"-"*col_width+"+"+"-"*col_width+"+")
Output
+----------------------------------------+----------------------------------------+
|                  Name                  |                  Rank                  |
+----------------------------------------+----------------------------------------+
|                  AAA                   |                  401                   |
+----------------------------------------+----------------------------------------+
|                 BBBBB                  |                  402                   |
+----------------------------------------+----------------------------------------+
|                 CCCCCC                 |                  105                   |
+----------------------------------------+----------------------------------------+
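A variation on the same approach, sketched under a few assumptions: the column widths are computed from the data instead of being fixed at 40, and each cell is stripped first, because lines read from a file still carry their trailing newline. The function name render_table and the sample rows are made up for this illustration; in the question's loop the rows would be collected from requests.get instead.

```python
def render_table(rows, headers=("name", "rank")):
    """Build a bordered, centered table from (name, status) pairs."""
    # str() handles integer status codes; strip() removes trailing newlines.
    cells = [[str(value).strip() for value in row] for row in rows]
    widths = [
        max([len(header)] + [len(row[i]) for row in cells])
        for i, header in enumerate(headers)
    ]
    rule = "+" + "+".join("-" * (width + 2) for width in widths) + "+"

    def fmt(values):
        padded = (" " + value.center(width) + " "
                  for value, width in zip(values, widths))
        return "|" + "|".join(padded) + "|"

    lines = [rule, fmt(headers), rule]
    for row in cells:
        lines.append(fmt(row))
        lines.append(rule)
    return "\n".join(lines)


table = render_table([("example.com\n", 200), ("example.org\n", 404)])
```

Collecting the rows first and rendering once at the end is what makes the computed widths possible; printing inside the request loop would require guessing the widths up front.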

string manipulation, data wrangling, regex

I have a .txt file of 3 million rows. The file contains data that looks like this:
# RSYNC: 0 1 1 0 512 0
#$SOA 5m localhost. hostmaster.localhost. 1906022338 1h 10m 5d 1s
# random_number_ofspaces_before_this text $TTL 60s
#more random information
:127.0.1.2:https://www.spamhaus.org/query/domain/$
test
:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-0m5tk.com
.0-1-hub.com
.zzzy1129.cn
:127.0.1.4:https://www.spamhaus.org/query/domain/$
.0-il.ml
.005verf-desj.com
.01accesfunds.com
In the above data, there is a code associated with all domains listed beneath it.
I want to turn the above data into a format that can be loaded into a HiveQL/SQL. The HiveQL table should look like:
+--------------------+--------------+-------------+-----------------------------------------------------+
| domain_name        | period_count | parsed_code | raw_code                                            |
+--------------------+--------------+-------------+-----------------------------------------------------+
| test               | 0            | 127.0.1.2   | :127.0.1.2:https://www.spamhaus.org/query/domain/$  |
| .0-0m5tk.com       | 2            | 127.0.1.2   | :127.0.1.2:https://www.spamhaus.org/query/domain/$  |
| .0-1-hub.com       | 2            | 127.0.1.2   | :127.0.1.2:https://www.spamhaus.org/query/domain/$  |
| .zzzy1129.cn       | 2            | 127.0.1.2   | :127.0.1.2:https://www.spamhaus.org/query/domain/$  |
| .0-il.ml           | 2            | 127.0.1.4   | :127.0.1.4:https://www.spamhaus.org/query/domain/$  |
| .005verf-desj.com  | 2            | 127.0.1.4   | :127.0.1.4:https://www.spamhaus.org/query/domain/$  |
| .01accesfunds.com  | 2            | 127.0.1.4   | :127.0.1.4:https://www.spamhaus.org/query/domain/$  |
+--------------------+--------------+-------------+-----------------------------------------------------+
Please note that I do not want the vertical bars in any output. They are just to make the above look like a table
I'm guessing that creating a HiveQL table like the above will involve converting the .txt into a .csv or a Pandas data frame. If creating a .csv, then the .csv would probably look like:
domain_name,period_count,parsed_code,raw_code
test,0,127.0.1.2,:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-0m5tk.com,2,127.0.1.2,:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-1-hub.com,2,127.0.1.2,:127.0.1.2:https://www.spamhaus.org/query/domain/$
.zzzy1129.cn,2,127.0.1.2,:127.0.1.2:https://www.spamhaus.org/query/domain/$
.0-il.ml,2,127.0.1.4,:127.0.1.4:https://www.spamhaus.org/query/domain/$
.005verf-desj.com,2,127.0.1.4,:127.0.1.4:https://www.spamhaus.org/query/domain/$
.01accesfunds.com,2,127.0.1.4,:127.0.1.4:https://www.spamhaus.org/query/domain/$
I'd be interested in a Python solution, but lack familiarity with the packages and functions necessary to complete the above data wrangling steps. I'm looking for a complete solution, or code tidbits to construct my own solution. I'm guessing regular expressions will be needed to identify the "category" or "code" line in the raw data. They always start with ":127.0.1." I'd also like to parse the code out to create a parsed_code column, and a period_count column that counts the number of periods in the domain_name string. For testing purposes, please create a .txt of the sample data I have provided at the beginning of this post.
Regardless of how you want to format in the end, I suppose the first step is to separate the domain_name and code. That part is pure python
rows = []
code = None
parsed_code = None
with open('input.txt', 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        if line.startswith(':127'):
            code = line
            parsed_code = line.split(':')[1]
            continue
        if line.startswith('#'):
            continue
        period_count = line.count('.')
        rows.append((line,period_count,parsed_code, code))
Just for illustration, you can use pandas to format the data nicely as a table, which might help if you want to pipe this to SQL, but it's not absolutely necessary. Post-processing of strings is also quite straightforward in pandas.
import pandas as pd
df = pd.DataFrame(rows, columns=['domain_name', 'period_count', 'parsed_code', 'raw_code'])
print (df)
prints this:
domain_name period_count parsed_code raw_code
0 test 0 127.0.1.2 :127.0.1.2:https://www.spamhaus.org/query/doma...
1 .0-0m5tk.com 2 127.0.1.2 :127.0.1.2:https://www.spamhaus.org/query/doma...
2 .0-1-hub.com 2 127.0.1.2 :127.0.1.2:https://www.spamhaus.org/query/doma...
3 .zzzy1129.cn 2 127.0.1.2 :127.0.1.2:https://www.spamhaus.org/query/doma...
4 .0-il.ml 2 127.0.1.4 :127.0.1.4:https://www.spamhaus.org/query/doma...
5 .005verf-desj.com 2 127.0.1.4 :127.0.1.4:https://www.spamhaus.org/query/doma...
6 .01accesfunds.com 2 127.0.1.4 :127.0.1.4:https://www.spamhaus.org/query/doma...
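Since the question mentions a .csv as a possible intermediate format, here is a minimal sketch of the same parsing written as a generator and fed into the standard csv module. The names parse_rows, rows_to_csv and SAMPLE are made up for this illustration, the sample is a shortened copy of the data in the question, and the output goes to an in-memory buffer so the example runs without any files. Skipping blank lines is an added assumption.

```python
import csv
import io

SAMPLE = """\
# RSYNC: 0 1 1 0 512 0
:127.0.1.2:https://www.spamhaus.org/query/domain/$
test
.0-0m5tk.com
:127.0.1.4:https://www.spamhaus.org/query/domain/$
.0-il.ml
"""


def parse_rows(lines):
    """Yield (domain_name, period_count, parsed_code, raw_code) tuples."""
    raw_code = parsed_code = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(":127.0.1."):
            raw_code = line
            parsed_code = line.split(":")[1]
            continue
        yield (line, line.count("."), parsed_code, raw_code)


def rows_to_csv(rows):
    """Serialize the rows, with a header line, to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain_name", "period_count", "parsed_code", "raw_code"])
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(parse_rows(SAMPLE.splitlines()))
```

For the real 3-million-row file, passing the open file object to parse_rows and a file opened with newline="" to csv.writer would avoid holding everything in memory.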
You can do all of this with the Python standard library.
HEADER = "domain_name | code"
# Open files
with open("input.txt") as f_in, open("output.txt", "w") as f_out:
    # Write header
    print(HEADER, file=f_out)
    print("-" * len(HEADER), file=f_out)
    # Parse file and output in correct format
    code = None
    for line in f_in:
        # Drop the trailing newline so that endswith("$") can match
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Ignore comments
            continue
        if line.endswith("$"):
            # Store line as the current "code"
            code = line
        else:
            # Write these domain_name entries into the
            # output file separated by ' | '
            print(line, code, sep=" | ", file=f_out)

Print from PrettyTable with Python2 vs Python3

I am playing around with PrettyTable in Python and noticed completely different behavior in Python 2 and Python 3. Can somebody explain the difference in output? Nothing in the docs gave me a satisfying answer. Let's start with a little code that creates my_table:
from prettytable import PrettyTable
my_table = PrettyTable()
my_table.field_name = ['A','B']
This creates a two-column table with columns A and B. Let's add one row to it, assuming that a cell value can span multiple lines separated by the Python newline '\n', for example some properties of the parameter in column A.
row = ['parameter1', 'component: my_component\nname:somename\nmode: magic\ndate: None']
my_table.add_row(row)
Generally the information in the row can be anything; it's just a string retrieved from another function. As you can see, it has '\n' inside. The thing that I don't completely understand is the output of the print function.
In Python 2 I have
print(my_table.get_string().encode('utf-8'))
which gives me output like this:
+------------+-------------------------+
| Field 1 | Field 2 |
+------------+-------------------------+
| parameter1 | component: my_component |
| | name:somename |
| | mode: magic |
| | date: None |
+------------+-------------------------+
But in Python3 I have:
+------------+-------------------------+
| Field 1 | Field 2 |
+------------+-------------------------+
| parameter1 | component: my_component |
| | name:somename |
| | mode: magic |
| | date: None |
+------------+-------------------------+
If I completely remove the encode part, the output looks OK on both versions of Python.
So when I have
print(my_table.get_string())
it works on Python 3 and Python 2. Should I remove the encode part from the code? Is it safe to assume it is not necessary? Where exactly is the problem?
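One likely factor, assuming standard Python 3 semantics: str.encode() returns a bytes object, and print() shows a bytes object as its repr (a b'...' literal with each newline escaped as \n) instead of writing the text itself. In Python 2, encoding yields a byte string that print writes out as-is. The sketch below uses a plain multi-line string in place of a PrettyTable so that it runs without the library; the variable names are made up for this illustration.

```python
text = "component: my_component\nname:somename"

# In Python 3, encode() turns the text into a bytes object.
encoded = text.encode("utf-8")

# print(encoded) would display this repr, with the newline shown as \n
# on a single line rather than as an actual line break.
shown_for_bytes = repr(encoded)

# Decoding gives back the original text, which prints on two lines.
round_trip = encoded.decode("utf-8")
```

Under that assumption, printing the string returned by get_string() directly, without the encode step, is the simpler choice when the code only needs to run on Python 3.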
