This is a simple program but I am finding difficulty how it is actually working.
I have database with 3 tuples.
import matplotlib.pyplot as plt
queries = {}
rewrites = {}
urls = {}
for line in open("data.tsv"):
q, r, u = line.strip().split("\t")
queries.setdefault(q,0)
queries[q] += 1
rewrites.setdefault(r,0)
rewrites[r] += 1
urls.setdefault(u,0)
urls[u] += 1
sQueries = []
sQueries = [x for x in rewrites.values()]
sQueries.sort()
x = range(len(sQueries))
line, = plt.plot(x, sQueries, '-' ,linewidth=2)
plt.show()
This is whole program,
Now
queries.setdefault(q,0)
This command will set the values as 0 , if key i,e and q is not found.
queries[q] += 1
This command will increment the value of each key by 1 if key is there.
Same we continue with all tuples.
Then,
sQueries = [x for x in rewrites.values()]
Then we store the values from Dictionary rewrites , to List Squeries
x = range(len(sQueries))
This command I am not getting what is happening. Can anyone please explain.
len(sQueries)
gives number of elements in your list sQueries
x = range(len(sQueries))
will create a list x containing elements from 0,1,... to (but not including) length of your sQueries array
This:
sQueries = []
sQueries = [x for x in rewrites.values()]
sQueries.sort()
is an obtuse way of writing
sQueries = rewrites.values()
sQueries = sorted(sQueries)
in other words, sort the values of the rewrites dictionary. If, for the sake of argument, sQueries == [2, 3, 7, 9], then len(sQueries) == 4 and range(4) == [0, 1, 2, 3].
So, now you're plotting (0,2), (1,3), (2,7), (3,9), which doesn't seem very useful to me. It seems more likely that you would want the keys of rewrites on the x-axis, which would be the distinct values of r that you read from the TSV file.
length = len(sQueries) # this is length of sQueries
r = range(length) # this one means from 0 to length-1
so
x = range(len(sQueries)) # means x is from 0 to sQueries length - 1
Related
cust_id = semi_final_df['0_x'].tolist()
date = semi_final_df[1].tolist()
total_amount = semi_final_df[0].tolist()
prod_num = semi_final_df['0_y'].tolist()
prod_deduped = []
quant_cleaned = []
product_net_amount = []
cust_id_final = []
date_final = []
for row in total_amount:
quant_cleaned.append(float(row))
for unique_prodz in prod_num:
if unique_prodz not in prod_deduped:
prod_deduped.append(unique_prodz)
for unique_product in prod_deduped:
indices = [i for i, x in enumerate(prod_num) if x == unique_product]
product_total = 0
for index in indices:
product_total += quant_cleaned[index]
product_net_amount.append(product_total)
first_index = prod_num.index(unique_product)
cust_id_final.append(cust_id[first_index])
date_final.append(date[first_index])
Above code calculates sum amount by one condition in order to sum the total on an invoice.
The data had multiple lines but shared the same invoice/product number.
Problem:
I need to modify the below code so that I can sum by unique product and unique date.
I have given it a go but I am getting a value error -
saying x, y is not in a list
As per my understanding the issue lies in the fact that I am zipping two de-duped lists together of different lengths and then I am attempting to loop through the result inline.
This line causes the error
for i,[x, y] in enumerate(zipped_list):
Any help would be sincerely appreciated. Here is the second batch of code with comments.
from itertools import zip_longest
#I have not included the code for the three lists below but you can assume they are populated as these are the lists that I will be #working off of. They are of the same length.
prod_numbers = []
datesz = []
converted_quant = []
#Code to dedupe date and product which will end up being different lengths. These two lists are populated by the two for loops below
prod_deduped = []
dates_deduped = []
for unique_prodz in prod_numbers:
if unique_prodz not in prod_deduped:
prod_deduped.append(unique_prodz)
for unique_date in datesz:
if unique_date not in dates_deduped:
dates_deduped.append(unique_date)
#Now for the fun part. Time to sum by date and product. The three lists below are empty until we run the code
converted_net_amount = []
prod_id_final = []
date_final = []
#I zipped the list together using itertools which I imported at the top
for unique_product, unique_date in zip_longest(prod_deduped, dates_deduped, fillvalue = ''):
indices = []
zipped_object = zip(prod_numbers, datesz)
zipped_list = list(zipped_object)
for i,[x, y] in enumerate(zipped_list):
if x == unique_product and y == unique_date:
indices.append(i)
converted_total = 0
for index in indices:
converted_total += converted_quant[index]
converted_net_amount.append[converted_total]
first_index = zipped_list.index([unique_product, unique_date])
prod_id_final.append(prod_numbers[first_index])
date_final.append(datesz[first_index])
from collections import defaultdict
summed_dictionary = defaultdict(int)
for x, y, z in list:
summed_dictionary[(x,y)] += z
Using defaultdict should solve your problem and is a lot easier on the eyes than all your code above. I saw this on reddit this morning and figured you crossposted. Credit to the guy from reddit on /r/learnpython
I apologise for the terrible description and if this is a duplicated, i have no idea how to phrase this question. Let me explain what i am trying to do. I have a list consisting of 0s and 1s that is 3600 elements long (1 hour time series data). i used itertools.groupby() to get a list of consecutive keys. I need (0,1) to be counted as (1,1), and be summed with the flanking tuples.
so
[(1,8),(0,9),(1,5),(0,1),(1,3),(0,3)]
becomes
[(1,8),(0,9),(1,5),(1,1),(1,3),(0,3)]
which should become
[(1,8),(0,9),(1,9),(0,3)]
right now, what i have is
def counter(file):
list1 = list(dict[file]) #make a list of the data currently working on
graph = dict.fromkeys(list(range(0,3601))) #make a graphing dict, x = key, y = value
for i in list(range(0,3601)):
graph[i] = 0 # set all the values/ y from none to 0
for i in list1:
graph[i] +=1 #populate the values in graphing dict
x,y = zip(*graph.items()) # unpack graphing dict into list, x = 0 to 3600 and y = time where it bite
z = [(x[0], len(list(x[1]))) for x in itertools.groupby(y)] #make a new list z where consecutive y is in format (value, count)
z[:] = [list(i) for i in z]
for i in z[:]:
if i == [0,1]:
i[0]=1
return(z)
dict is a dictionary where the keys are filenames and the values are a list of numbers to be used in the function counter(). and this gives me something like this but much longer
[[1,8],[0,9],[1,5], [1,1], [1,3],[0,3]]
edits:
solved it with the help of a friend,
while (0,1) in z:
idx=z.index((0,1))
if idx == len(z)-1:
break
z[idx] = (1,1+z[idx-1][1] + z[idx+1][1])
del z[idx+1]
del z[idx-1]
Not sure what exactly is that you need. But this is my best attempt of understanding it.
def do_stuff(original_input):
new_original = []
new_original.append(original_input[0])
for el in original_input[1:]:
if el == (0, 1):
el = (1, 1)
if el[0] != new_original[-1][0]:
new_original.append(el)
else:
(a, b) = new_original[-1]
new_original[-1] = (a, b + el[1])
return new_original
# check
print (do_stuff([(1,8),(0,9),(1,5),(0,1),(1,3),(0,3)]))
I need to get a high correlation group from the correlation coefficient matrix, keep one of them and exclude the other。But I don't know how to do it gracefully and efficiently.
Here's a similar answer, but hopefully it will be done using a vector-like matrix.:
Merge arrays if they contain one or more of the same value
For example:
a = np.array([[1,0,0,0,0,1],
[0,1,0,1,0,0],
[0,0,1,0,1,1],
[0,1,0,1,0,0],
[0,0,1,0,1,0],
[1,0,1,0,0,1]])
Diagonal:
(0,0),(1,1),(2,2)...(5,5)
Other:
(0,5),(1,3),(2,4),(2,5)
These three pairs because each other contains merged into a group of:
(0,2,4,5) = (0,5),(2,4),(2,5)
So ultimately I need the output:
(I will use the results to index other data and therefore decide to keep the largest index in each group)
out = [(0,2,4,5),(1,3)]
I think the simplest approach is to take a nested loop and iterate through all the elements multiple times. I would like to have a more concise and efficient way to achieve, thank you
This is a loop implementation, I'm sorry to write it hard to see:
a = np.array([[1,0,0,0,0,1],
[0,1,0,1,0,0],
[0,0,1,0,1,1],
[0,1,0,1,0,0],
[0,0,1,0,1,0],
[1,0,1,0,0,1]])
a[np.tril_indices(6, -1)]= 0
a[np.diag_indices(6)] = 0
g = list(np.c_[np.where(a)])
p = {}; index = 1
while len(g)>0:
x = g.pop(0)
if not p:
p[index] = list(x)
for i,l in enumerate(g):
if np.in1d(l,x[0]).any()|np.in1d(l,x[1]).any():
n = list(g.pop(i))
p[index].extend(n)
else:
T = False
for key,v in p.items():
if np.in1d(v,x[0]).any()|np.in1d(v,x[1]).any():
v.extend(list(x))
T = True
if T==False:
index += 1; p[index] = list(x)
for i,l in enumerate(g):
if np.in1d(l,x[0]).any()|np.in1d(l,x[1]).any():
n = list(g.pop(i))
p[index].extend(n)
for key,v in p.items():
print key,np.unique(v)
out:
1 [0 2 4 5]
2 [1 3]
The central problem of merging / consolidating the pairs with common extrema can be solved using this answer.
Hence, the above code may be rewritten like:
a = np.array([[1,0,0,0,0,1],
[0,1,0,1,0,0],
[0,0,1,0,1,1],
[0,1,0,1,0,0],
[0,0,1,0,1,0],
[1,0,1,0,0,1]])
a[np.tril_indices(6, -1)]= 0
a[np.diag_indices(6)] = 0
g = np.c_[np.where(a)].tolist()
def consolidate(items):
items = [set(item.copy()) for item in items]
for i, x in enumerate(items):
for j, y in enumerate(items[i + 1:]):
if x & y:
items[i + j + 1] = x | y
items[i] = None
return [sorted(x) for x in items if x]
p = {i + 1: x for i, x in enumerate(sorted(consolidate(g)))}
thats what I get:
TypeError: 'float' object is unsubscriptable
Thats what I did:
import numpy as N
import itertools
#I created two lists, containing large amounts of numbers, i.e. 3.465
lx = [3.625, 4.625, ...]
ly = [41.435, 42.435, ...] #The lists are not the same size!
xy = list(itertools.product(lx,ly)) #create a nice "table" of my lists
#that iterttools gives me something like
print xy
[(3.625, 41.435), (3.625, 42.435), (... , ..), ... ]
print xy[0][0]
print xy[0][1] #that works just fine, I can access the varios values of the tuple in the list
#down here is where the error occurs
#I basically try to access certain points in "lon"/"lat" with values from xy through `b` and `v`with that iteration. lon/lat are read earlier in the script
b = -1
v = 1
for l in xy:
b += 1
idx = N.where(lon==l[b][b])[0][0]
idy = N.where(lat==l[b][v])[0][0]
lan/lot are read earlier in the script. I am working with a netCDF file and this is the latitude/longitude,read into lan/lot.
Its an array, build with numpy.
Where is the mistake?
I tried to convert b and v with int() to integers, but that did not help.
The N.where is accessing through the value from xy a certain value on a grid with which I want to proceed. If you need more code or some plots, let me know please.
Your problem is that when you loop over xy, each value of l is a single element of your xy list, one of the tuples. The value of l in the first iteration of the loop is (3.625, 41.435), the second is (3.625, 42.435), and so on.
When you do l[b], you get 3.625. When you do l[b][b], you try to get the first element of 3.625, but that is a float, so it has no indexes. That gives you an error.
To put it another way, in the first iteration of the loop, l is the same as xy[0], so l[0] is the same as xy[0][0]. In the second iteration, l is the same as xy[1], so l[0] is the same as xy[1][0]. In the third iteration, l is equivalent to xy[2], and so on. So in the first iteration, l[0][0] is the same as xy[0][0][0], but there is no such thing so you get an error.
To get the first and second values of the tuple, using the indexing approach you could just do:
x = l[0]
y = l[1]
Or, in your case:
for l in xy:
idx = N.where(lon==l[0])[0][0]
idy = N.where(lat==l[1])[0][0]
However, the simplest solution would be to use what is called "tuple unpacking":
for x, y in xy:
idx = N.where(lon==x)[0][0]
idy = N.where(lat==y)[0][0]
This is equivalent to:
for l in xy:
x, y = l
idx = N.where(lon==x)[0][0]
idy = N.where(lat==y)[0][0]
which in turn is equivalent to:
for l in xy:
x = l[0]
y = l[1]
idx = N.where(lon==x)[0][0]
idy = N.where(lat==y)[0][0]
I have a list that contains sublists with 3 values and I need to print out a list that looks like:
I also need to compare the third column values with eachother to tell if they are increasing or decreasing as you go down.
bb = 3.9
lowest = 0.4
#appending all the information to a list
allinfo= []
while bb>=lowest:
everything = angleWithPost(bb,cc,dd,ee)
allinfo.append(everything)
bb-=0.1
I think the general idea for finding out whether or not the third column values are increasing or decreasing is:
#Checking whether or not Fnet are increasing or decreasing
ii=0
while ii<=(10*(bb-lowest)):
if allinfo[ii][2]>allinfo[ii+1][2]:
abc = "decreasing"
elif allinfo[ii][2]<allinfo[ii+1][2]:
abc = "increasing"
ii+=1
Then when i want to print out my table similar to the one above.
jj=0
while jj<=(10*(bb-lowest))
print "%8.2f %12.2f %12.2f %s" %(allinfo[jj][0], allinfo[jj][1], allinfo[jj][2], abc)
jj+=1
here is the angle with part
def chainPoints(aa,DIS,SEG,H):
#xtuple x chain points
n=0
xterms = []
xterm = -DIS
while n<=SEG:
xterms.append(xterm)
n+=1
xterm = -DIS + n*2*DIS/(SEG)
#
#ytuple y chain points
k=0
yterms = []
while k<=SEG:
yterm = H + aa*m.cosh(xterms[k]/aa) - aa*m.cosh(DIS/aa)
yterms.append(yterm)
k+=1
return(xterms,yterms)
#
#
def chainLength(aa,DIS,SEG,H):
xterms, yterms = chainPoints(aa,DIS,SEG,H)# using x points and y points from the chainpoints function
#length of chain
ff=1
Lterm=0.
totallength=0.
while ff<=SEG:
Lterm = m.sqrt((xterms[ff]-xterms[ff-1])**2 + (yterms[ff]-yterms[ff-1])**2)
totallength += Lterm
ff+=1
return(totallength)
#
def angleWithPost(aa,DIS,SEG,H):
xterms, yterms = chainPoints(aa,DIS,SEG,H)
totallength = chainLength(aa,DIS,SEG,H)
#Find the angle
thetaradians = (m.pi)/2. + m.atan(((yterms[1]-yterms[0])/(xterms[1]-xterms[0])))
#Need to print out the degrees
thetadegrees = (180/m.pi)*thetaradians
#finding the net force
Fnet = abs((rho*grav*totallength))/(2.*m.cos(thetaradians))
return(totallength, thetadegrees, Fnet)
Review this Python2 implementation which uses map and an iterator trick.
from itertools import izip_longest, islice
from pprint import pprint
data = [
[1, 2, 3],
[1, 2, 4],
[1, 2, 3],
[1, 2, 5],
]
class AddDirection(object):
def __init__(self):
# This default is used if the series begins with equal values or has a
# single element.
self.increasing = True
def __call__(self, pair):
crow, nrow = pair
if nrow is None or crow[-1] == nrow[-1]:
# This is the last row or the direction didn't change. Just return
# the direction we previouly had.
inc = self.increasing
elif crow[-1] > nrow[-1]:
inc = False
else:
# Here crow[-1] < nrow[-1].
inc = True
self.increasing = inc
return crow + ["Increasing" if inc else "Decreasing"]
result = map(AddDirection(), izip_longest(data, islice(data, 1, None)))
pprint(result)
The output:
pts/1$ python2 a.py
[[1, 2, 3, 'Increasing'],
[1, 2, 4, 'Decreasing'],
[1, 2, 3, 'Increasing'],
[1, 2, 5, 'Increasing']]
Whenever you want to transform the contents of a list (in this case the list of rows), map is a good place where to begin thinking.
When the algorithm requires data from several places of a list, offsetting the list and zipping the needed values is also a powerful technique. Using generators so that the list doesn't have to be copied, makes this viable in real code.
Finally, when you need to keep state between calls (in this case the direction), using an object is the best choice.
Sorry if the code is too terse!
Basically you want to add a 4th column to the inner list and print the results?
#print headers of table here, use .format for consistent padding
previous = 0
for l in outer_list:
if l[2] > previous:
l.append('increasing')
elif l[2] < previous:
l.append('decreasing')
previous = l[2]
#print row here use .format for consistent padding
Update for list of tuples, add value to tuple:
import random
outer_list = [ (i, i, random.randint(0,10),)for i in range(0,10)]
previous = 0
allinfo = []
for l in outer_list:
if l[2] > previous:
allinfo.append(l +('increasing',))
elif l[2] < previous:
allinfo.append(l +('decreasing',))
previous = l[2]
#print row here use .format for consistent padding
print(allinfo)
This most definitely can be optimized and you could reduce the number of times you are iterating over the data.