I have a collection of 101 documents. I need to iterate over them, taking 10 documents at a time, and store the value of a particular field (for each batch of 10 documents) in a list.
I tried this:
values = db.find({}, {"field": 1})
urls = []
count = 0
for value in values:
    if count < 10:
        urls.append(value["field"])
        count = count + 1
        print count
    else:
        print urls
        urls = []
        urls.append(value["field"])
        count = 1
It doesn't print the last batch of values because the loop ends before the else branch is reached again. Is there an elegant way to do this and rectify this situation?
You reset count to 0 every time the loop restarts. Move the initialization outside the loop:

count = 0
for value in values:
    ...

If urls is already being filled, this will be your only problem.
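Even with that fixed, the final partial batch is never printed, because the loop ends before the else branch can fire again. A minimal fix (a sketch, assuming you want to keep printing batches as you go) is to flush whatever is left once the loop is done:

if urls:
    print urls  # emit the last, partial batch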
As far as I can tell, you have some data that you want to organize into batches of size 10. If so, perhaps this will help:
N = 10
values = list(db.find({}, {"field": 1}))
url_batches = [
    [v['field'] for v in values[i:i+N]]
    for i in xrange(0, len(values), N)
]
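With 101 documents and N = 10, this yields eleven batches: ten of length 10 plus a final batch holding the single leftover document.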
I have a Pandas dataframe with ~100,000,000 rows and 3 columns (Names str, Time int, and Values float), which I compiled from ~500 CSV files using glob.glob(path + '/*.csv').
Two different names alternate in the data. The job is to go through it and count the number of times a value associated with a specific name, ABC, deviates from its preceding value by ±100, given that the previous 50 values for that name did not deviate by more than ±10.
I initially solved it with a for loop function that iterates through each row, as shown below. It checks for the correct name, then checks the stability of the previous values of that name, and finally adds one to the count if there is a large enough deviation.
import numpy as np

# names and values are the Names and Values columns of the dataframe
count = 0
stabilityTime = 0
i = 0
if names[0] == "ABC":
    j = values[0]                       # last value seen for ABC
    stability = np.full(50, values[0])  # window of the last 50 ABC values
else:
    j = values[1]
    stability = np.full(50, values[1])
for name in names:
    value = values[i]
    if name == "ABC":
        if j - 10 < value < j + 10:
            stabilityTime += 1
        if stabilityTime >= 50 and np.std(stability) < 10:
            if value > j + 100 or value < j - 100:
                stabilityTime = 0
                count += 1
        stability = np.roll(stability, -1)  # drop the oldest value
        stability[-1] = value               # append the newest value
        j = value
    i += 1
Naturally, this process takes a very long time to run. I have looked at NumPy vectorization, but do not see how I can apply it in this case. Is there some way I can optimize this?
Thank you in advance for any advice!
Bonus points if you can give me a way to concatenate all the data from every CSV file in the directory that is faster than my current glob.glob(path + '/*.csv') loop.
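A rough sketch of what both parts might look like with pandas (assuming the frame has the Names and Values columns described above, and treating a rolling standard deviation of the previous 50 values as a stand-in for the loop's stability bookkeeping):

import glob
import pandas as pd

# Concatenate all CSVs in one pass instead of appending frame by frame.
df = pd.concat((pd.read_csv(f) for f in glob.glob(path + '/*.csv')),
               ignore_index=True)

# Vectorized version of the counting loop for one name.
sub = df.loc[df['Names'] == 'ABC', 'Values']
jumps = sub.diff().abs() > 100                # deviates by more than +/-100
stable = sub.rolling(50).std().shift(1) < 10  # previous 50 values were stable
count = int((jumps & stable).sum())

This is not a drop-in replacement: the original loop also resets stabilityTime after each counted jump, which the rolling window above does not reproduce.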
Is it possible to make a randomizer that randomizes entire rows using the csvwriter? The code I have is similar to this:
for i in range(45):
    count = count + 1
    writer.writerow((count, pattern))
Where pattern is a number which corresponds to count. For example: when count=1 pattern=1; count=2 pattern=9; count=3 pattern=17, and so on... I want a way to randomize the rows so that the correct count corresponds to the correct pattern still. Any help is greatly appreciated!
Load it into a two-dimensional list, storing the count in a[i][0] and the pattern in a[i][1], then shuffle, then write the rows to the CSV file.
import random

count = 0
a = []
for i in range(45):
    count = count + 1
    a.append([count, pattern])  # pattern computed however your code computes it
random.shuffle(a)
for row in a:
    writer.writerow(row)  # row[0] = count, row[1] = pattern
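Because random.shuffle reorders the list in place, each [count, pattern] pair stays intact: only the order of the rows changes, so every count still lines up with its pattern.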
This is not really a csv-writer-specific question, but the way I understand it is that you want to write random counts to your CSV file, each paired with its corresponding number. I'm not sure if your pattern in this case is n + 8, but that's what it would look like. One option would be to create a dictionary holding the pattern, pick a key from the dictionary, look up its value, and write both out. Like so:
import random

patterns = {}  # avoid shadowing the built-in name dict
n = 1
rows = 45      # how many rows to write
for i in range(rows):
    patterns[i + 1] = n  # count 1 -> 1, count 2 -> 9, count 3 -> 17, ...
    n += 8
for i in range(rows):
    count = random.randint(1, rows)  # pick a random count
    row = (count, patterns[count])
    writer.writerow(row)
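Note that random.randint draws with replacement, so some counts may appear more than once and others not at all; if every row should be written exactly once, the shuffle approach above is the safer choice.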
I'm trying to improve the efficiency of a script that takes a nested list representing a data table, with a column of IDs (each of which might have many entries). The script counts the number of IDs that have more than 100 entries, and more than 200 entries.
Is there a way to avoid cycling through the whole list once for every ID, perhaps by replacing the list comprehension?
list_of_IDs = [row[4] for row in massive_nested_list]  ### get list of ID numbers
list_of_IDs = set(list_of_IDs)  ### remove duplicates
list_of_IDs = list(list_of_IDs)

counter200 = 0
counter100 = 0

for my_ID in list_of_IDs:
    temp = [row for row in massive_nested_list if row[4] == my_ID]
    if len(temp) > 200:
        counter200 += 1
    if len(temp) > 100:
        counter100 += 1
Use a collections.Counter() instance to count your ids. There is no need to collect all possible ids first. You can then collate counts from there:
from collections import Counter

counts = Counter(row[4] for row in massive_nested_list)

counter100 = counter200 = 0
for id, count in counts.most_common():
    if count >= 200:
        counter200 += 1  # ids with 200 or more entries
    elif count >= 100:
        counter100 += 1  # ids with 100-199 entries
    else:
        break  # most_common() yields counts in descending order
Given K unique IDs in N nested lists, your code takes O(KN) steps to count everything; in the worst case (K == N) that means your solution takes quadratic time (each additional row adds another full pass over the data). The above code reduces this to one loop over the N rows, then another loop over the K unique ids, making it an O(N) (linear) algorithm.
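For example, a toy sketch with made-up rows where the id sits in column 4, as in the question:

from collections import Counter

rows = ([['', '', '', '', 'x']] * 250
        + [['', '', '', '', 'y']] * 150
        + [['', '', '', '', 'z']] * 50)
print Counter(row[4] for row in rows)  # Counter({'x': 250, 'y': 150, 'z': 50})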
The simplest method would be to count each ID's rows directly and keep the IDs that land in each band:

temp100 = [my_ID for my_ID in list_of_IDs
           if 100 < len([row for row in massive_nested_list if row[4] == my_ID]) <= 200]
temp200 = [my_ID for my_ID in list_of_IDs
           if len([row for row in massive_nested_list if row[4] == my_ID]) > 200]
then you could go:
len(temp200)
OR
counter200 = len(temp200)
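Note that this still walks the whole list once per ID, so it has the same O(KN) cost as the code in the question; the Counter approach above does all the counting in a single pass.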
I need to break up a range of numbers into chunks of 100 plus whatever is left over, and then add them to a final dictionary at the end.
I am able to do it with loops, but I feel I might be missing something that would make this a much cleaner and more efficient operation.
l = 238           # length of list to process
i = 0             # setting up count for while loop
screenNames = {}  # output dictionary
count = 0         # count of total numbers processed

while i < l:
    toGet = {}
    if l - count > 100:  # blocks off in chunks of 100
        for m in range(0, 100):
            toGet[count] = m
            count = count + 1
    else:
        k = count
        for k in range(0, (l - count)):  # takes the remainder of the numbers
            toGet[count] = k
            count = count + 1
        i = l  # kills loop
    screenNames.update(toGet)

# This logic structure breaks up the list of numbers into chunks of 100 or
# their remainder and adds them into a dictionary with their count number
# as the index value
print 'returning:'
print screenNames
The above code works, but it feels clunky. Does anyone have any better ways of handling this?
As far as I can see, you map a key n to the value n % 100, so this might as well be written as:
screenNames = dict((i, i%100) for i in range(238))
print screenNames
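In Python 2.7 and later, the same thing can also be spelled as a dict comprehension:

screenNames = {i: i % 100 for i in range(238)}
print screenNames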
Running your code, it looks like you're just doing modular arithmetic:
l = 238
sn = {}
for i in xrange(l):
    sn[i] = i % 100
print sn
Or more succinctly:
l = 238
print dict((i, i % 100) for i in xrange(l))
That works by constructing a dictionary from (key, value) tuples.
So I need to save the results of a loop and I'm having some difficulty. I want to record my results to a new list, but I get "string index out of range" and other errors. The end goal is to record the products of digits 1-5, 2-6, 3-7 etc, eventually keeping the highest product.
def product_of_digits(number):
    d = str(number)
    for integer in d:
        s = 0
        k = []
        while s < (len(d)):
            j = (int(d[s]) * int(d[s+1]) * int(d[s+2]) * int(d[s+3]) * int(d[s+4]))
            s += 1
            k.append(j)
        print(k)

product_of_digits(n)
A similar question came up some time ago. Hi Chauxvive, this happens because s runs all the way up to the last index of d while the loop body indexes d[s+4], which reaches past the end of the string. Instead, you should change your while loop to:
while s < (len(d)-4):
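Putting it together, a corrected version might look like this (a sketch; it drops the redundant outer for loop, assumes number has at least five digits, and keeps the highest product, which is the stated end goal):

def product_of_digits(number):
    d = str(number)
    k = []
    for s in range(len(d) - 4):
        # product of the 5-digit window starting at position s
        k.append(int(d[s]) * int(d[s+1]) * int(d[s+2]) * int(d[s+3]) * int(d[s+4]))
    print(max(k))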