Google chart input data - python

I have a python script to build inputs for a Google chart. It correctly creates column headers and the correct number of rows, but repeats the data for the last row in every row. I tried explicitly setting the row indices rather than using a loop (which wouldn't work in practice, but should have worked in testing). It still gives me the same values for each entry. I also had it working when I had this code on the same page as the HTML user form.
end1 = number of rows in the data table
end2 = number of columns in the data table represented by a list of column headers
viewData = data stored in database
c = connections['default'].cursor()
c.execute("SELECT * FROM {0}.\"{1}\"".format(analysis_schema, viewName))
viewData=c.fetchall()
curDesc = c.description
end1 = len(viewData)
end2 = len(curDesc)
Creates column headers:
colOrder = [curDesc[2][0]]
if activityOrCommodity == "activity":
    tableDescription = {curDesc[2][0]: ("string", "Activity")}
elif (activityOrCommodity == "commodity") or (activityOrCommodity == "aa_commodity"):
    tableDescription = {curDesc[2][0]: ("string", "Commodity")}
for i in range(3, end2):
    attValue = curDesc[i][0]
    tableDescription[curDesc[i][0]] = ("number", attValue)
    colOrder.append(curDesc[i][0])
Creates row data:
data = []
values = {}
for i in range(0, end1):
    for j in range(2, end2):
        if j == 2:
            values[curDesc[j][0]] = viewData[i][j].encode("utf-8")
        else:
            values[curDesc[j][0]] = viewData[i][j]
    data.append(values)
dataTable = gviz_api.DataTable(tableDescription)
dataTable.LoadData(data)
return dataTable.ToJSon(columns_order=colOrder)
An example JavaScript output:
var dt = new google.visualization.DataTable({cols:[{id:'activity',label:'Activity',type:'string'},{id:'size',label:'size',type:'number'},{id:'compositeutility',label:'compositeutility',type:'number'}],rows:[{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]},{c:[{v:'AA26FedGovAccounts'},{v:49118957568.0},{v:1.94956132673}]}]}, 0.6);

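It seems you're appending values to data, but values is never reset between iterations. Every entry in data therefore refers to the same dictionary, and each one ends up showing whatever the last row wrote into it.
I assume this is not intended? If so, just move values = {} inside the first for loop in your row-building code, so that a fresh dictionary is created for each row.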
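The effect can be reproduced without a database. The sketch below uses made-up column names and rows (column_names and view_data are stand-ins for curDesc and viewData, not the real data) and compares a shared dictionary with a fresh one per row:

```python
# Hypothetical stand-ins for the curDesc column names and the viewData rows.
column_names = ["activity", "size", "compositeutility"]
view_data = [
    ("AA01", 1.0, 0.5),
    ("AA02", 2.0, 0.7),
    ("AA03", 3.0, 0.9),
]


def build_rows_shared(names, rows):
    """Buggy version: one dict is reused, so every entry aliases it."""
    data = []
    values = {}
    for row in rows:
        for name, cell in zip(names, row):
            values[name] = cell
        data.append(values)
    return data


def build_rows_fresh(names, rows):
    """Fixed version: a new dict is created for every row."""
    data = []
    for row in rows:
        values = {}
        for name, cell in zip(names, row):
            values[name] = cell
        data.append(values)
    return data


shared = build_rows_shared(column_names, view_data)
fresh = build_rows_fresh(column_names, view_data)
print([r["activity"] for r in shared])  # ['AA03', 'AA03', 'AA03']
print([r["activity"] for r in fresh])   # ['AA01', 'AA02', 'AA03']
```

In the shared version all three list entries are the same object, which is why the chart repeats the last row.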

Related

How do I loop column names in a pandas dataframe?

I am new to Python and have never really used Pandas, so forgive me if this doesn't make sense. I am trying to create a df based on frontend data I am sending to a flask route. The data is looped through and appended for each row. My only problem is that I don't know how to get the df columns to reflect that. Here is my code to build the rows and the current output:
claims = csv_data["claims"]
setups = csv_data["setups"]
for setup in setups:
    setup = setups[0]
    offerings = setup["currentOfferings"]
    considered = setup["considerationSet"]
    reach_dict = setup["reach"]
    favorite_dict = setup["favorite"]
    summary_dict = setup["summaryMetrics"]
rows = []
for i, claim in enumerate(claims):
    row = []
    row.append(i + 1)
    row.append(claim)
    for setup in setups:
        setup = setups[0]
        row.append("X") if claim in setup["currentOfferings"] else row.append(float('nan'))
        row.append("X") if claim in setup["considerationSet"] else row.append(float('nan'))
        if claim in setup["currentOfferings"]:
            reach_score = reach_dict[claim]
            reach_percentage = "{:.0%}".format(reach_score)
            row.append(reach_percentage)
        else:
            row.append(float('nan'))
        if claim in setup["currentOfferings"]:
            favorite_score = favorite_dict[claim]
            fav_percentage = "{:.0%}".format(favorite_score)
            row.append(fav_percentage)
        else:
            row.append(float('nan'))
    rows.append(row)
I know that I can put columns = ["#", "Claims", "Setups", etc...] in the df, but that doesn't work because the rows loop through multiple setups, and the number of setups can change. If I don't specify the column names (as in the image), then I just get numbers as column names. Ideally it should loop through the data it receives in the route, starting with "#" and "Claims" as columns, and then for each setup add "Setup 1", "Consideration Set 1", "Reach", "Favorite", "Setup 2", "Consideration Set 2", and so on.
I tried to create a similar type of loop for the columns:
my_columns = []
for i, row in enumerate(rows):
    col = []
    if row[0] != None:
        col.append("#")
    else:
        pass
    if row[1] != None:
        col.append("Claims")
    else:
        pass
    if row[2] != None:
        col.append("Setup")
    else:
        pass
    if row[3] != None:
        col.append("Consideration Set")
    else:
        pass
    if row[4] != None:
        col.append("Reach")
    else:
        pass
    if row[5] != None:
        col.append("Favorite")
    else:
        pass
    my_columns.append(col)
df = pd.DataFrame(
    rows,
    columns = my_columns
)
But this didn't work because the column names still aren't repeated for each setup: I pass 6 column names but the data has 10 columns. I'm not sure if I am just not doing the loop of the columns properly, or if I am making everything more complicated than it needs to be.
This is what I am trying to accomplish without having to explicitly name the columns, because this is just sample data. There could end up being 3, 4, or however many setups in the actual app.
[Image: what I would like the output to look like]
I don't know if this is the most efficient way of doing something like this but I think this is what you want to achieve.
def create_columns(df):
    new_cols = []
    for i in range(len(df.columns)):
        repeated_cols = 6  # here is the number of columns you need to repeat for every setup
        idx = 1 + i // repeated_cols
        basic = ['#', 'Claims', f'Setup_{idx}', f'Consideration_Set_{idx}', 'Reach', 'Favorite']
        new_cols.append(basic[i % len(basic)])
    return new_cols
df.columns = create_columns(df)
If your data comes as csv then try pd.read_csv() to create dataframe.
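Note that create_columns cycles through all six names, so "#" and "Claims" come round again after the first setup, whereas the question repeats only the four per-setup columns. A possible alternative, sketched here under the assumption that every setup contributes exactly four columns, is to build the names from the number of setups. The function name build_column_names is hypothetical, and numbering "Reach" and "Favorite" as well is my own choice to keep the names unique:

```python
def build_column_names(n_setups):
    """Return '#', 'Claims', then four labelled columns per setup."""
    columns = ["#", "Claims"]
    for idx in range(1, n_setups + 1):
        columns.extend([
            f"Setup {idx}",
            f"Consideration Set {idx}",
            f"Reach {idx}",      # numbered so names stay unique (assumption)
            f"Favorite {idx}",
        ])
    return columns


names = build_column_names(2)
print(len(names))  # 10
print(names[:6])
```

With the rows from the question, this could then be passed as pd.DataFrame(rows, columns=build_column_names(len(setups))).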

populate column in dataframe with a list using for loop

I would like to populate a dataframe using a for loop.
One of the columns is a list.
This list is empty at the beginning; at each iteration an element is added to or removed from it.
When I print my list at each iteration I get the right results, but when I print my dataframe, I get the same list on every row.
If you look at my code, the list I am updating is list_employe. The magic should happen in the last 3 lines, but it does not.
Does anyone have an idea why the list is updated correctly while the dataframe records only the last update on all rows?
list_employe = []
total_employe = 0
rows = []
shiftday = example['SHIFT_DATE'].dt.strftime('%Y-%m-%d').unique().tolist()
for i in shiftday:
    shift_day = example[example['SHIFT_DATE'] == i]
    list_employe_shift = example[example['SHIFT_DATE']==i]['EMPLOYEE_CODE_POS_UPPER'].unique().tolist()
    new_employe = 0
    end_employe = 0
    for k in list_employe_shift:
        shift_days_emp = shift_day[shift_day['EMPLOYEE_CODE_POS_UPPER'] == k]
        days = shift_days_emp.iloc[0]['last_day']
        #print(days)
        if k in list_employe:
            if days>1:
                end_employe = end_employe+1
                total_employe = total_employe-1
                list_employe.remove(k)
        else:
            new_employe = new_employe+1
            total_employe = total_employe + 1
            list_employe.extend([k])
    day = i
    total_emp = total_employe
    new_emp = new_employe
    end_emp = end_employe
    rows.append([day, total_emp, new_emp, end_emp, list_employe])
    print(list_employe)
df = pd.DataFrame(rows, columns=["day", "total_employe", "new_employe", "end_employe", "list_employe"])
The list list_employe is the same object every time you append it to rows, so every row ends up pointing at the final state of that list. To solve the problem, change the 3rd line from the bottom to rows.append([day, total_emp, new_emp, end_emp, list(list_employe)]), which creates a new list at each iteration.
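A small standalone illustration of the difference, using made-up events rather than the shift data from the question (events and current are hypothetical names):

```python
# Hypothetical (day, employee code) events; a code toggles in and out.
events = [("2021-01-01", "A"), ("2021-01-02", "B"), ("2021-01-03", "A")]

current = []        # plays the role of list_employe
rows_aliased = []   # stores the same list object every time
rows_copied = []    # stores a snapshot made with list(...)

for day, code in events:
    if code in current:
        current.remove(code)
    else:
        current.append(code)
    rows_aliased.append([day, current])
    rows_copied.append([day, list(current)])

print([r[1] for r in rows_aliased])  # [['B'], ['B'], ['B']]
print([r[1] for r in rows_copied])   # [['A'], ['A', 'B'], ['B']]
```

list(current) makes a shallow copy, which is enough here because the elements are plain strings.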

Is it possible to update a row of data using position of column (e.g. like a list index) in Python / SQLAlchemy?

I am trying to compare two rows of data to one another which I have stored in a list.
for x in range(0, len_data_row):
    if company_data[0][0][x] == company_data[1][0][x]:
        print ('MATCH 1: {} - {}'.format(x, company_data[0][0][x]))
        # do nothing
    if company_data[0][0][x] == None and company_data[1][0][x] != None:
        print ('MATCH 2: {} - {}'.format(x, company_data[1][0][x]))
        # update first company_id with data from 2nd
    if company_data[0][0][x] != None and company_data[1][0][x] == None:
        print ('MATCH 3: {} - {}'.format(x, company_data[0][0][x]))
        # update second company_id with data from 1st
Pseudocode of what I want to do:
If the data at index [x] of the list is not None for row 2, but is blank for row 1, then write the value of row 2 at index [x] to row 1 in my database.
The part I can't figure out is whether, in SQLAlchemy, you can specify which column is being updated by an "index" (I think in db-land index means something different than what I mean. What I mean is like a list index, e.g., list[1]). And also whether you can dynamically specify which column is being updated by passing a variable to the update code. Here's what I'm looking to do (it doesn't work, of course):
def some_name(column_by_index, column_value):
    u = table_name.update().where(table_name.c.id==row_id).values(column_by_index=column_value)
    db.execute(u)
Thank you!
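One way to think about the comparison step is to translate positions into column names first, and only then build the update. The sketch below is plain Python with made-up column names and rows (plan_updates, column_names, row_1 and row_2 are all hypothetical). It does not touch the database; it only collects, per row, a dictionary of column name to new value. As far as I know, SQLAlchemy's values() also accepts such a dictionary, and the ordered column names can be read from table_name.c.keys(), but treat both as assumptions to check against the SQLAlchemy documentation.

```python
def plan_updates(column_names, row_a, row_b):
    """Return (updates_for_a, updates_for_b), each keyed by column name."""
    updates_a, updates_b = {}, {}
    for name, a, b in zip(column_names, row_a, row_b):
        if a is None and b is not None:
            updates_a[name] = b   # fill row A from row B
        elif a is not None and b is None:
            updates_b[name] = a   # fill row B from row A
    return updates_a, updates_b


# Hypothetical column order and two rows for the same company.
column_names = ["id", "name", "phone", "city"]
row_1 = (1, "Acme", None, "Delhi")
row_2 = (2, "Acme", "555-0100", None)

updates_1, updates_2 = plan_updates(column_names, row_1, row_2)
print(updates_1)  # {'phone': '555-0100'}
print(updates_2)  # {'city': 'Delhi'}
```

Because the keys are strings computed at run time, the column to update no longer has to be written as a literal keyword argument.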

Mapping values into two additional DataFrame columns by an existing one in Python

I am making a generic tool that can take any CSV file. The file contains a city column which needs to be geocoded to latitudes and longitudes. I have a CSV file which looks something like this. The first row holds the column names and the second row holds the type of each variable.
Time,M1,M2,M3,CityName
temp,num,num,num,loc
20-May-13,19,20,0,delhi
20-May-13,25,42,7,agra
20-May-13,23,35,4,mumbai
20-May-13,21,32,3,delhi
20-May-13,17,27,1,mumbai
20-May-13,16,40,5,delhi
First of all, I find the unique values in the City column and form a list of it.
filename = 'data_file.csv'
data_date = pd.read_csv(filename)
column_name = data_date.ix[:, data_date.loc[0] == "city"]
column_work = column_name.iloc[1:]
column_unique = column_work.iloc[:,3].unique().tolist()
Secondly, I have written code for geocoding my cities.
def geocode(address):
    i = 0
    try:
        while i < len(geocoders):
            # try to geocode using a service
            location = geocoders[i].geocode(address)
            # if it returns a location
            if location != None:
                # return those values
                return [location.latitude, location.longitude]
            else:
                # otherwise try the next one
                i += 1
    except:
        print (sys.exc_info()[0])
        return ['null','null']
    # if all services have failed to geocode, return null values
    return ['null','null']
list = ['delhi', 'agra', 'mumbai']
j = 0
lat = []
for row in list:
    print ('processing #',j)
    j+=1
    try:
        state = row
        address = state
        result = geocode(address)
        # add the lat/lon values to the row
        lat.extend(result)
    except:
        # print 'Unsuccessful'
        to_print = 'Unsuccessful'
        # row.extend(to_print)
    dout.append(row)
print(lat)
This gives me a list of latitudes and longitudes [28.7040592, 77.10249019999999, 27.1766701, 78.00807449999999, 19.0759837, 72.8776559]. I want to write this onto my CSV file as
Time,M1,M2,M3,CityName,Latitude,Longitude
temp,num,num,num,loc,lat,lng
20-May-13,19,20,0,delhi,28.7040592,77.10249019999999
20-May-13,25,42,7,agra,27.1766701,78.00807449999999
20-May-13,23,35,4,mumbai,19.0759837,72.8776559
20-May-13,21,32,3,delhi,28.7040592,77.10249019999999
20-May-13,17,27,1,mumbai,19.0759837,72.8776559
20-May-13,16,40,5,delhi,28.7040592,77.10249019999999
I tried making separate lists of latitudes and longitudes (latitude = lat[0::2], longitude = lat[1::2]) or converting the result into a dictionary {'delhi': [28.7040592, 77.10249019999999], 'agra': [27.1766701, 78.00807449999999], 'mumbai': [19.0759837, 72.8776559]}, but could not figure out how to write it to a CSV file.
I think converting them into a dictionary is a good approach.
dic = {'delhi': [28.7040592, 77.10249019999999],
       'agra': [27.1766701, 78.00807449999999],
       'mumbai': [19.0759837, 72.8776559]}
# Create new columns; the [None, None] default covers values that are not
# in the dictionary (such as the "loc" entry in the type row)
data_date["Latitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[0], axis = 1)
data_date["Longitude"] = data_date.apply(lambda row: dic.get(row["CityName"], [None, None])[1], axis = 1)
# Write the data back to csv file
data_date.to_csv(filename, index = False)
This looks up the values for each city name in the dictionary and writes them into the specified columns. Finally it overwrites the old csv file with the new data frame.
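The question's file also has a second row of type labels, which should receive lat and lng rather than coordinates. A minimal sketch of handling that row explicitly is shown below. It deliberately swaps pandas for the standard-library csv module and works on an in-memory copy of the sample data (source and coords are hypothetical names), so it illustrates the idea rather than replacing the code above.

```python
import csv
import io

# Hypothetical in-memory copy of the sample file from the question.
source = io.StringIO(
    "Time,M1,M2,M3,CityName\n"
    "temp,num,num,num,loc\n"
    "20-May-13,19,20,0,delhi\n"
    "20-May-13,25,42,7,agra\n"
    "20-May-13,23,35,4,mumbai\n"
)
coords = {
    "delhi": [28.7040592, 77.10249019999999],
    "agra": [27.1766701, 78.00807449999999],
    "mumbai": [19.0759837, 72.8776559],
}

reader = csv.reader(source)
header = next(reader) + ["Latitude", "Longitude"]  # column names row
types = next(reader) + ["lat", "lng"]              # variable types row
city_idx = header.index("CityName")

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
writer.writerow(types)
for row in reader:
    # Unknown cities get empty cells instead of raising an error.
    lat, lng = coords.get(row[city_idx], ["", ""])
    writer.writerow(row + [lat, lng])

print(out.getvalue())
```

For a real file, the two io.StringIO objects would be replaced by files opened with open(..., newline="").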

Find an efficient way of searching in nested python lists

I am very new to this forum and am basically a network engineer learning Python to automate some tasks and make my work more efficient. Straight to the point: I have a big Excel workbook of 4 sheets with around 50K rows in each sheet. After learning for a couple of weeks and extensive searching, I was able to load all the Excel cell values into a nested list, e.g.
list [sheet_index][row_index][column_index].
Now that the input is loaded, the next part is manipulating the data. My task is to take a specific column value from each row and search for it in the entire workbook; if it is found, the corresponding data from a different column should be written in line with the original searched object.
My method is as follows:
1. Get the cell values into a big list (as mentioned earlier).
2. Flatten that list into a different variable as a one-dimensional list.
3. In a loop, get the specific value from a row (fixed column) and search for it in the entire one-dimensional list; if found, write the corresponding value to a different Excel file.
So far, this method works, but with an extra-long delay, and speed was the motivation for moving from an Excel VBA program to Python. So I am here to ask the experts if there's something very basic I am missing. Here is the code:
import xlrd
import xlwt
from xlutils.copy import copy  # copy() was used below but never imported
from compiler.ast import flatten  # Python 2 only
datafile = 'Peering_DB.xls'
# Data Read Function Definition
def main(datafile):
    wb = xlrd.open_workbook(datafile)
    wwb = copy(wb)
    data = [[[wb.sheet_by_index(i).cell_value(r, col)
              for col in range(wb.sheet_by_index(i).ncols)]
             for r in range(wb.sheet_by_index(i).nrows)]
            for i in range(0,4)]
    data1 = flatten(data)
    k = 2
    x = 0
    while x < 4:
        r = wb.sheet_by_index(x).nrows
        A = data[x][k][1]
        B = data[x][k][2]
        counter = 4
        loc = [loc for (loc , e ) in enumerate(data1) if e == A]
        if len(loc) != 1:
            for n in range(len(loc)):
                if data1[loc[n] + 1] != B:
                    wwb.get_sheet(x).write(k,counter,data1[loc[n] + 1])
                    counter = counter + 1
        else:
            wwb.get_sheet(x).write(k,counter,"No Backup")
        k = k + 1
        if k == r - 1 and x < 3:
            print 'Page number ', x , 'Completed'
            x = x + 1
            k = 2
        elif k == r and x == 3:
            print "Operation Completed Successfully"
            break
    wwb.save('Peering_output.xls')
main(datafile)
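The likely slow part of the code above is the list comprehension that scans the whole flattened list once per row. A common alternative is to build a dictionary index once and then look each value up directly. The sketch below is a simplified illustration in Python 3 with a tiny made-up workbook. It assumes the searched value always sits in column 1 and the value to report in column 2, which is narrower than the original search across every cell, and the names build_index and backups_for are hypothetical.

```python
from collections import defaultdict

# Hypothetical workbook: data[sheet][row][column], as in the question.
data = [
    [["r1", "peerA", "siteX"], ["r2", "peerB", "siteY"]],
    [["r3", "peerA", "siteZ"], ["r4", "peerC", "siteW"]],
]


def build_index(data, key_col=1, value_col=2):
    """Map each key-column value to every value-column entry seen with it."""
    index = defaultdict(list)
    for sheet in data:
        for row in sheet:
            index[row[key_col]].append(row[value_col])
    return index


def backups_for(index, key, own_value):
    """Return the other values recorded for key, or ['No Backup'] if none."""
    others = [v for v in index.get(key, []) if v != own_value]
    return others or ["No Backup"]


index = build_index(data)
print(backups_for(index, "peerA", "siteX"))  # ['siteZ']
print(backups_for(index, "peerB", "siteY"))  # ['No Backup']
```

Building the index visits every row once, and each later lookup is a dictionary access instead of a scan of the whole flattened list.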
