How do I satisfy these conditions in my loop in Python?

I need to find the engagement rate for each row of data and then find the title of the video with the highest engagement rate.
Each 'row' in youtube_usa_videos consists of many elements, and each element is accessed by its index. The problem arises when I set the condition shown below: Python raises a 'division by zero' error because some videos in the data set have zero views.
Engagement rate = number of comments / number of views
# code below
highest_EGR = 0
for row in youtube_usa_videos:
    comments = row[8]
    views = row[5]
    title = row[1]
    if views != 0:
        EGR = comments / views
    else:
        EGR = 0
    If EGR > highest_EGR:
        highest_EGR = EGR
        top_vid = title
Can you help me clean my code such that the conditions will be met and the top video title and its engagement rate will be printed?

The code below works.
Note that If must be written in lowercase as if, because Python keywords are case-sensitive.
highest_EGR = 0
youtube_usa_videos = [['t1',12,45],['t2',34,12],['t3',2,123]]
TITLE_OFFSET = 0
VIEWS_OFFSET = 1
COMMENTS_OFFSET = 2
for row in youtube_usa_videos:
    comments = row[COMMENTS_OFFSET]
    views = row[VIEWS_OFFSET]
    title = row[TITLE_OFFSET]
    # Guard against division by zero for videos without views
    if views != 0:
        EGR = comments / views
    else:
        EGR = 0
    if EGR > highest_EGR:
        highest_EGR = EGR
        top_vid = title
print('highest_EGR: {}'.format(highest_EGR))
# or using max and lambda
max_egr = max(youtube_usa_videos, key=lambda vid: vid[COMMENTS_OFFSET] / vid[VIEWS_OFFSET] if vid[VIEWS_OFFSET] else 0)
print('video with max EGR: {}'.format(max_egr))
Output:
highest_EGR: 61.5
video with max EGR: ['t3', 2, 123]

highest_EGR = 0
top_vid = None
for row in youtube_usa_videos:
    comments = row[8]
    views = row[5]
    title = row[1]
    # Rows with zero views are skipped, so no division by zero can occur
    if int(views) > 0:
        EGR = int(comments) / int(views)
        if EGR > highest_EGR:
            highest_EGR = EGR
            top_vid = title
print(top_vid, highest_EGR)
Okay, I simply changed 'if views != 0:' to 'if int(views) > 0:' and removed 'else:', so rows with zero views are skipped.
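If the rows come from a CSV file, the view and comment counts may arrive as strings, and it can help to wrap the calculation in small functions. The following is only a minimal sketch, not the asker's actual code: the column positions 1, 5 and 8 are taken from the question, while the names engagement_rate and top_video and the sample rows are made up for illustration.

```python
def engagement_rate(comments, views):
    """Return comments / views, or 0.0 when there are no views."""
    comments, views = int(comments), int(views)
    return comments / views if views > 0 else 0.0


def top_video(rows, title_idx=1, views_idx=5, comments_idx=8):
    """Return (title, rate) for the row with the highest engagement rate."""
    best_title, best_rate = None, 0.0
    for row in rows:
        rate = engagement_rate(row[comments_idx], row[views_idx])
        if rate > best_rate:
            best_title, best_rate = row[title_idx], rate
    return best_title, best_rate


# Hypothetical sample rows: only indexes 1, 5 and 8 matter here.
sample = [
    ["id1", "first", "", "", "", "100", "", "", "5"],
    ["id2", "second", "", "", "", "0", "", "", "7"],   # zero views
    ["id3", "third", "", "", "", "40", "", "", "10"],
]
title, rate = top_video(sample)
print(title, rate)
```

Keeping the zero-views check in one function means the loop itself never has to think about the division.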

Related

Python Docx Minimum Table Height

I'm trying to fit 10 rows (and three columns) of a table on one page, however I'm running into a limitation where I can't get more than 8 rows to fit. I've tried the following code:
table = document.add_table(rows=0, cols=3)
for row in table.rows:
    row.height = Cm(1)
However, below a certain row height, reducing it further makes no difference to the output. Is it possible to fit 10 rows on one page?
Below is an adapted version of my code, which iterates through a dataframe and writes its columns to the cells of a table.
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Cm, Inches, Pt
# line_items (the dataframe) and x (a lookup keyed by strings) are defined elsewhere
document = Document()
sections = document.sections
for section in sections:
    section.top_margin = Inches(0.00)
    section.bottom_margin = Inches(0.00)
    section.left_margin = Inches(0.00)
    section.right_margin = Inches(0.00)
style = document.styles['Normal']
font = style.font
font.size = Pt(8)
table = document.add_table(rows=0, cols=3)
index = 0
full_count = 1
for item_one, item_two, description, max_portion, quantity_adjusted, mods in zip(
        line_items['title'].tolist(), line_items['quantity'],
        line_items['description'], line_items['max_portion'],
        line_items['quantity_adjusted'], line_items['modifications']):
    count = 0
    # Start a new table row after every third item
    if index % 3 == 0:
        cell_row = table.add_row()
        cell_row.height = Cm(0.1)
        row_cells = cell_row.cells
    part_one_cell = row_cells[index % 3]
    part_one_cell.height = Cm(0.1)
    #para = doc.add_paragraph().add_run('GeeksforGeeks is a Computer Science portal for geeks.')
    #para.font.size = Pt(12)
    p = part_one_cell.add_paragraph()
    p.alignment = WD_ALIGN_PARAGRAPH.CENTER
    #p1 = part_one_cell.paragraphs[0].add_run(item_one.upper()+ ' ' + description.upper())
    #p1.alignment = WD_ALIGN_PARAGRAPH.CENTER
    # Shrink the font as the title gets longer
    if len(item_one + description) < 40:
        p.add_run(item_one.upper() + ' ' + description.upper()).font.size = Pt(12)
    elif len(item_one + description) < 60:
        p.add_run(item_one.upper() + ' ' + description.upper()).font.size = Pt(10)
    else:
        p.add_run(item_one.upper() + ' ' + description.upper()).font.size = Pt(8)
    row1 = row_cells[index % 3]
    row2 = row1.add_paragraph(mods)
    row2.alignment = WD_ALIGN_PARAGRAPH.CENTER
    row = row_cells[index % 3]
    p1 = row.add_paragraph(f'{x[str(quantity_adjusted)]}')
    p1.alignment = WD_ALIGN_PARAGRAPH.RIGHT
    #part_one_cell.paragraphs[0].add_run(f'{x[str(item_two)]}')
    #part_one_cell.paragraphs[0].add_run(f' {str(x)}').bold= True
    index = index + 1
    full_count = full_count + 1
    # Start a new page and a new table after every 30 items
    if full_count % 30 == 0:
        document.add_page_break()
        table = document.add_table(rows=0, cols=3)
I have no problem getting 10 1cm rows in a single page. I declare the number of rows when adding the table:
from docx import Document
from docx.shared import Cm
document = Document()
table = document.add_table(rows=10, cols=3)
table.style = 'Table Grid'
for row in table.rows:
    row.height = Cm(1)
document.save('demo.docx')
To add rows in a for loop:
table = document.add_table(rows=0, cols=3)
table.style = 'Table Grid'
for i in range(10):
    row = table.add_row()
    row.height = Cm(1)
document.save('demo.docx')

While and Append in pandas python

I am trying to call an API in a while loop and append each result to a dataframe, but nothing is appended.
import pandas as pd
import requests
# Max timestamp
MaxTs = 1635876000
api_key = "api_key"
cnt = 0
while cnt < 4:
    url = f"https://min-api.cryptocompare.com/data/v2/histohour?fsym=BTC&tsym=USD&limit=2000&toTs={MaxTs}&api_key={api_key}"
    r = requests.get(url)
    data = r.json()
    price_df = pd.DataFrame(data['Data']['Data'])
    i = 0
    reccnt = 2000
    while i < reccnt:
        currTs = price_df.iloc[i]['time']
        if currTs < MaxTs:
            MaxTs = currTs
        i = i + 1
    if cnt == 0:
        # Copy the original df to the new df.
        newdf = price_df.copy()
    else:
        # When the counter increases, append the df.
        newdf.append(price_df)
    print(MaxTs)
cnt = cnt + 1
You should increment cnt inside the while loop, not outside it; otherwise the loop never ends.
But once you make that correction, you will get several copies of the same price_df. Is that what you are trying to get?
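One more thing worth checking: DataFrame.append returns a new DataFrame instead of modifying newdf in place, so its result has to be assigned, and the method was removed in pandas 2.0 in favour of pd.concat. A common pattern is to collect every page in a list and combine them once at the end. The sketch below shows only that loop shape in plain Python. fetch_page is a hypothetical stand-in for the requests.get call and returns fake hourly records, and stepping back one extra hour between pages is an assumption made here to avoid duplicate rows, not documented API behaviour.

```python
def fetch_page(to_ts, limit=3):
    """Hypothetical stand-in for the API call: hourly records ending at to_ts."""
    return [{"time": to_ts - 3600 * k, "close": 100 + k} for k in range(limit)]


def fetch_history(max_ts, pages=4):
    """Walk backwards in time, collecting every page before combining them."""
    collected = []
    for _ in range(pages):               # the counter advances on every pass
        page = fetch_page(max_ts)
        collected.extend(page)           # the result is kept, not discarded
        # Assumption: the next request should end one hour before the
        # oldest record seen so far, so that pages do not overlap.
        max_ts = min(rec["time"] for rec in page) - 3600
    return collected


rows = fetch_history(1635876000)
print(len(rows), rows[0]["time"], rows[-1]["time"])
```

With pandas, the combined list could then be passed to pd.DataFrame(rows), or a list of per-page dataframes could be passed to pd.concat.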

Can I scrape table from html file in Python?

I want to scrape a table from this text file (the URL in the code below); the table I want is SUMMARY CONSOLIDATED FINANCIAL AND OTHER DATA. My code is attached. Can someone tell me where it went wrong?
import re
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = r'https://www.sec.gov/Archives/edgar/data/1181232/000104746903038553/a2123752z424b4.htm'
filing_url = requests.get(url)
content = filing_url.text
soup = BeautifulSoup(content, 'lxml')
tables = soup.find_all(text=re.compile('SUMMARY CONSOLIDATED FINANCIAL AND OTHER DATA'))
n_columns = 0
n_rows = 0
column_names = []
for table in tables:
    for row in table.find_next('table').find_all('tr'):
        # Determine the number of rows in the table
        td_tags = row.find_all('td')
        if len(td_tags) > 0:
            n_rows += 1
            if n_columns == 0:
                # Set the number of columns for the table
                n_columns = len(td_tags)
        # Handle column names if we find them
        th_tags = row.find_all('th')
        if len(th_tags) > 0 and len(column_names) == 0:
            for th in th_tags:
                column_names.append(th.get_text())
    # Safeguard on column titles
    if len(column_names) > 0 and len(column_names) != n_columns:
        raise Exception("Column titles do not match the number of columns")
    columns = column_names if len(column_names) > 0 else range(0, n_columns)
    df = pd.DataFrame(columns=columns,
                      index=range(0, n_rows))
    row_marker = 0
    for row in table.find_all('tr'):
        column_marker = 0
        columns = row.find_all('td')
        for column in columns:
            df.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        if len(columns) > 0:
            row_marker += 1
    print(df)
In this particular case, you could simplify this significantly, using pandas:
import pandas as pd
url = 'https://www.sec.gov/Archives/edgar/data/1181232/000104746903038553/a2123752z424b4.htm'
tables = pd.read_html(url)
# There are more than 100 tables on that page, so you have to narrow it down
targets = []
for t in tables:
    if 'Unaudited' in str(t.columns):
        targets.append(t)
targets[0]  # only two meet that requirement, and the first is your target
Output is your target table.
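The same "collect every table, then filter by a keyword" idea can also be sketched without pandas. The example below uses the standard library's html.parser instead of pandas.read_html, so it is a different technique from the answer above. TableCollector and the tiny HTML sample are made up for illustration, and the sketch assumes well-formed markup with no nested tables, which a real filing may not satisfy.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []      # finished tables
        self._rows = None     # rows of the table being read, if any
        self._row = None      # cells of the row being read, if any
        self._cell = None     # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


# Made-up sample markup with two tables; only the second mentions "Unaudited".
html = """
<table><tr><th>Name</th></tr><tr><td>Alpha</td></tr></table>
<table><tr><th>(Unaudited)</th><th>2003</th></tr>
<tr><td>Revenue</td><td>10</td></tr></table>
"""
parser = TableCollector()
parser.feed(html)
# Keep the tables whose first row mentions the keyword.
targets = [t for t in parser.tables if t and any("Unaudited" in c for c in t[0])]
print(targets[0])
```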

Python index error: list index out of range

I am programming an experiment in oTree in which players type in their names and compete over who donates the most trees. The trees are entered via a form field. The round number is a predefined variable, and every player gets an id by default on entering the experiment (starting from 1).
I already programmed the whole thing and it works with up to 5 participants, but now I need it to work for up to 8 participants. When I enter the number of participants in oTree, I receive the error message "index error: list index out of range".
The error occurs in the lines
matrix[p.id_in_group - 1][0] = p.name
matrix[p.id_in_group - 1][self.round_number] = p.cumulative_donated_trees
so I guess the initialization of the matrix is wrong?
I just want a matrix with 2 columns, names and trees (plus an index column, of course), and n rows (n = number of players).
pages.py:
class Donation(Page):
    form_model = 'player'
    form_fields = ['donation']
    def vars_for_template(self):
        names = []
        trees = []
        sortedtrees = []
        anzahlspalten = 0
        # Store each player's name (first round only) and cumulative trees
        for p in self.subsession.get_players():
            if self.round_number == 1:
                matrix[p.id_in_group - 1][0] = p.name
            matrix[p.id_in_group - 1][self.round_number] = p.cumulative_donated_trees
        for i in range(0, Constants.number_of_players):
            names.append(matrix[i][0])
            trees.append(matrix[i][self.round_number])
        # Rank the current player by cumulative donated trees
        sortedtrees = sorted(trees, reverse=True)
        anzahlspalten = len(sortedtrees)
        for l in range(0, anzahlspalten):
            if self.player.cumulative_donated_trees == sortedtrees[l]:
                self.player.current_position = str(l + 1)
models.py:
name = models.StringField(label="Your first name:")
transcribed_text = models.LongStringField()
levenshtein_distance = models.IntegerField()
guthaben = models.CurrencyField(initial=c(0))
cumulative_guthaben = models.CurrencyField()
cumulative_donation = models.FloatField(null=True)
right_answer = models.BooleanField()
right_answer_text = models.StringField()
treatmentgroup = models.StringField()
donation = models.FloatField(min=c(0))
no_trees = models.FloatField(initial=0.0)
cumulative_donated_trees = models.FloatField()
current_position = models.StringField()
spielstand = models.StringField()
Traceback:
File "c:\users\wiwi-admin\appdata\local\programs\python\python37\lib\site-packages\otree\views\abstract.py" in get_context_data
338. user_vars = self.vars_for_template()
File "C:\Users\Wiwi-Admin\Desktop\Backup Version 2 - Competition WITHIN Treatmentgroups\Master thesis code\__pycache__\oTree\my_environmental_surveyTG4\pages.py" in vars_for_template
252. matrix[p.id_in_group - 1][0] = p.name
Exception Type: IndexError at /p/tgv0j88z/my_environmental_surveyTG4/Donation/7/
Exception Value: list index out of range
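The question does not show where matrix is created, so the following is only a guess at the cause: if it was built with a fixed number of rows (for example five), then p.id_in_group - 1 goes out of range as soon as a sixth player appears. A minimal sketch of sizing it from the player and round counts instead; make_matrix, n_players and n_rounds are hypothetical names, not part of oTree.

```python
def make_matrix(n_players, n_rounds):
    """One row per player: column 0 holds the name, columns 1..n_rounds the trees."""
    # A comprehension creates a separate list per row; [[None] * k] * n would
    # repeat the same row object n times.
    return [[None] + [0.0] * n_rounds for _ in range(n_players)]


matrix = make_matrix(n_players=8, n_rounds=3)

# Same indexing as in the question: player ids start at 1, rounds at 1.
id_in_group, round_number = 8, 3
matrix[id_in_group - 1][0] = "Ada"
matrix[id_in_group - 1][round_number] = 12.5
print(len(matrix), len(matrix[0]), matrix[7])
```

In the oTree code, n_players would presumably come from Constants.number_of_players, which the question already uses, and n_rounds from the app's round count.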

Python Pandas How to save output to csv

Hello, I'm working on my project. I want to get the candidate text blocks using the algorithm below.
My input is a CSV document which contains:
HTML column: the HTML code in a line
TAG column: the tag of the HTML code in a line
Words: the text inside the tag in a line
TC: the number of words in a line
LTC: the number of anchor words in a line
TG: the number of tags in a line
P: the number of p and br tags in a line
CTTD: TC + (0.2*LTC) + TG - P
CTTDs: the smoothed CTTD
This is my algorithm for finding candidate text blocks. I load the CSV file into a dataframe using pandas, and I use the CTTDs, TC and TG columns to find the candidates.
from ListSmoothing import get_filepaths_smoothing
import pandas as pd
import numpy as np
import csv
def create_candidates(df, thetaCTTD, thetaGAP):
    k = 0
    TB = {}
    TC = 0
    for index in range(0, len(df) - 1):
        start = index
        if df.ix[index]['CTTDs'] > thetaCTTD:
            start = index
            gap = 0
            TC = df.ix[index]['TC']
            # Extend the block until too many low-density lines follow
            for index in range(index + 1, len(df) - 1):
                if df.ix[index]['TG'] == 0:
                    continue
                elif df.ix[index]['CTTDs'] <= thetaCTTD and gap >= thetaGAP:
                    break
                elif df.ix[index]['CTTDs'] <= thetaCTTD:
                    gap += 1
                TC += df.ix[index]['TC']
        if (TC < 1) or (start == index):
            continue
        TB.update({
            k: {
                'start': start,
                'end': index - 1
            }
        })
        k += 1
    return TB
def get_unique_candidate(tb):
    # Work on a copy so that tb can be iterated while TB is modified
    TB = tb.copy()
    for key, value in tb.iteritems():
        if key == len(tb) - 1:
            break
        if value['end'] == tb[key+1]['end']:
            del TB[key+1]
        elif value['start'] < tb[key+1]['start'] < value['end']:
            TB[key]['end'] = tb[key+1]['start'] - 1
        else:
            continue
    return TB
filenames = get_filepaths_smoothing(r"C:\Users\kimhyesung\PycharmProjects\newsextraction\smoothing")
index = 0
for f in filenames:
    file_html = open(str(f), "r")
    df = pd.read_csv(file_html)
    #df = pd.read_csv('smoothing/Smoothing001.csv')
    news = np.array(df['CTTDs'])
    new = np.array(df['TG'])
    minval = np.min(news[np.nonzero(news)])
    maxval = np.max(news[np.nonzero(news)])
    j = 0.2
    thetaCTTD = minval + j * (maxval-minval)
    #maxGap = np.max(new[np.nonzero(new)])
    #minGap = np.min(new[np.nonzero(new)])
    thetaGap = np.min(new[np.nonzero(new)])
    #print thetaCTTD
    #print maxval
    #print minval
    #print thetaGap
    index += 1
    stored_file = "textcandidate/textcandidate" + '{0:03}'.format(index) + ".csv"
    tb = create_candidates(df, thetaCTTD, thetaGap)
    TB = get_unique_candidate(tb)
    filewrite = open(stored_file, "wb")
    df_list = []
    for (k, d) in TB.iteritems():
        candidate_df = df.loc[d['start']:d['end']]
        candidate_df['candidate'] = k
        df_list.append(candidate_df)
    output_df = pd.concat(df_list)
    output_df.to_csv(stored_file)
    writer = csv.writer(filewrite, lineterminator='\n')
    filewrite.close()
thetaCTTD is 10.36 and thetaGap is 1.
The output shows 2 candidate text blocks. The first candidate starts at line number 215 and ends at line number 225. The other candidate starts at line number 500 and ends at line number 501.
My question is: how can I save the output to a CSV file so that it contains not only the line numbers but the whole range of lines in each text block, together with the other columns?
My expected output is the rows of each candidate text block.
Assuming your output TB is a dictionary of dictionaries:
pd.concat([df.loc[d['start']:d['end']] for (k, d) in TB.iteritems()])
Note that we slice by label, so d['end'] will be included.
Edit: add the candidate number in a new column.
It's cleaner to write a loop than to do two concat operations:
df_list = []
for (k, d) in TB.iteritems():
    # Copy the slice so that adding a column does not touch df
    candidate_df = df.loc[d['start']:d['end']].copy()
    candidate_df['candidate'] = k
    df_list.append(candidate_df)
output_df = pd.concat(df_list)
It's also faster to concatenate all dataframes at once at the end.
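For comparison, the same selection can be sketched without pandas using the standard csv module. This is an illustration under assumptions: rows is a made-up list of dictionaries standing in for the dataframe, TB has the start/end shape shown in the question, and write_candidates is a hypothetical helper name. As with label-based slicing, the end line is included.

```python
import csv
import io


def write_candidates(rows, blocks, out):
    """Write every row inside a candidate block, tagged with its block number."""
    fieldnames = ["line", "candidate"] + list(rows[0].keys())
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for k, span in sorted(blocks.items()):
        # start and end are both inclusive, hence the + 1
        for line in range(span["start"], span["end"] + 1):
            writer.writerow({"line": line, "candidate": k, **rows[line]})


# Made-up data: six lines with two of the columns from the question.
rows = [{"TC": n, "CTTDs": n * 1.5} for n in range(6)]
TB = {0: {"start": 1, "end": 2}, 1: {"start": 4, "end": 4}}

buffer = io.StringIO()   # a real file opened with newline="" works the same way
write_candidates(rows, TB, buffer)
print(buffer.getvalue())
```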
