I am using tflearn, and I load my CSV file with the following code...
data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
                        target_column=0, categorical_labels=False)
Here is a snippet of my csv file (there are a lot more columns)...
I want to remove a specific column. For example, let's say I remove column 1 and then print out the data for columns 1 to 5...
def preprocess(cols_del):
    data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
                            target_column=0, categorical_labels=False)
    for col_del in sorted(cols_del):
        [data.pop(col_del) for position in data]
    for i in range(20):
        print(data[i][0:5])

def main(_):
    delete = [0]
    preprocess(delete)
This is the result...
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['9', '1', '18', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
['10', '1', '20', '2', '11']
The data is clearly different. What is going on? Are rows being deleted instead of columns? How can I delete an entire column without altering any of the other columns?
Also, I know it is kind of a separate question, but if I were to use n_classes in my load_csv call, how would I do that? Is that the number of columns in my CSV?
What's happening is that the line [data.pop(col_del) for position in data] is deleting half your rows, and then you're displaying the first 20 rows of what's left. (It would delete all the rows, but the call to pop is advancing the loop iterator.)
If you don't want certain columns you should pass your delete list to the columns_to_ignore parameter when you call load_csv. See the function description at load_csv. If you need to remove columns from a dataset in memory I think it would be worth your time to learn the basics of the Pandas library; it will make your life much simpler.
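For example, something along these lines (a sketch; the index in columns_to_ignore is illustrative, not taken from your file):

from tflearn.data_utils import load_csv

# Drop unwanted columns at load time instead of mutating the rows afterwards.
# The index in columns_to_ignore is an example; substitute your own.
data, labels = load_csv('/home/eric/Documents/Speed Dating Data.csv',
                        target_column=0,
                        columns_to_ignore=[1],
                        categorical_labels=False)

And if you go the Pandas route, dropping a column is a one-liner (again a sketch; 'column_name' is a placeholder for a real header in your file):

import pandas as pd

df = pd.read_csv('/home/eric/Documents/Speed Dating Data.csv')
df = df.drop(columns=['column_name'])  # 'column_name' is hypothetical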
You would need n_classes if your target labels were categorical, in order to tell load_csv how many categories there are. Since you have categorical_labels=False, you shouldn't need it.
mat = [['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '10'],
       ['11', '12', '13', '14', '15']]
Suppose, I have this vector of vectors.
Say, I need to extract 2nd column of each row, convert them into binary, and then create a vector of them.
Is it possible to do it without using NumPy?
Use zip(*mat) to transpose the list, loop over it with enumerate to pick out the column by index, and convert each value with bin().
mat = [['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '10'],
       ['11', '12', '13', '14', '15']]
vec = [[bin(int(r)) for r in row] for idx, row in enumerate(zip(*mat)) if idx == 1][0]
print(vec) # ['0b10', '0b111', '0b1100']
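Since zip(*mat) is the transpose, you can also index the transposed row directly, which reads a little more simply:

# zip(*mat) transposes mat; index 1 is the second column.
vec = [bin(int(x)) for x in list(zip(*mat))[1]]
print(vec)  # ['0b10', '0b111', '0b1100']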
Yes. This is achievable with the following code:
mat = [['1', '2', '3', '4', '5'],
       ['6', '7', '8', '9', '10'],
       ['11', '12', '13', '14', '15']]
def decimalToBinary(n):
    return bin(n).replace("0b", "")

new_vect = []
for m in mat:
    m = int(m[1])
    new_vect.append(decimalToBinary(m))
print(new_vect)
Hope this is what you expected:
['10', '111', '1100']
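As a side note, format(n, 'b') produces the same unprefixed binary string without the replace call:

# format with the 'b' spec skips the '0b' prefix entirely.
new_vect = [format(int(m[1]), 'b') for m in mat]
print(new_vect)  # ['10', '111', '1100']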
For those who are interested: lucky numbers are generated by eliminating numbers based on their position in the set, i.e.:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
First, eliminate every second number, because the second value is 2:
1,3,5,7,9,11,13,15,17,19,21
The first remaining number in the list after 1 is 3, so eliminate every third number:
1,3,7,9,13,15,17,19,21
Next number is 7; eliminate every 7th number in the list:
1,3,7,9,13,15,21
The next surviving number after 7 is 9, but obviously there are not enough numbers left to eliminate.
For further information you can check Lucky Number
So, my list doesn't contain any negative numbers and begins with 1, i.e.:
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22']
My code is supposed to delete every number unless it is a lucky number, so I tried:
def lucky_numbers(numbers):
    del numbers[::-2]  # Delete even numbers
    while int(numbers[1]) < len(numbers):
        x = int(numbers[1])
        del numbers[-1::x]
        print(numbers)
        return
    return

lucky_numbers(numbers)
But it returns:
['1', '3', '5', '7', '9', '11', '13', '15', '17', '19']
Where am I wrong? Or is there a more efficient way to write it? Thank you.
The negative index is a bit confusing (to me at least): a slice like numbers[-1::x] starts at the last element and steps forward, so it only ever contains that single last element, and the return inside your while loop exits the function after the first pass. See if this code is easier to interpret:
numbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22']
def lucky_numbers(numbers):
    index = 1
    next_freq = int(numbers[index])
    while int(next_freq) < len(numbers):
        del numbers[next_freq-1::next_freq]
        print(numbers)
        if str(next_freq) in numbers:
            index += 1
            next_freq = int(numbers[index])
        else:
            next_freq = int(numbers[index])
    return

lucky_numbers(numbers)
['1', '3', '5', '7', '9', '11', '13', '15', '17', '19', '21']
['1', '3', '7', '9', '13', '15', '19', '21']
['1', '3', '7', '9', '13', '15', '21']
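A quick demonstration of why the original del numbers[-1::x] removes only one element per pass:

numbers = ['1', '2', '3', '4', '5']
# A slice starting at index -1 with a positive step contains only the
# last element, so deleting it removes a single item each time.
print(numbers[-1::3])  # ['5']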
After reading from a file I have a list of lists containing not only digits but also other characters, which I would like to get rid of.
I've tried using the re.sub function, but it doesn't seem to work:
import re
Poly_id = [['0', '[4', '8', '18', '20', '5', '0', '4]'],
           ['1', '[13', '16', '6', '11', '13]'],
           ['2', '[3', '1', '10', '9', '2', '15', '3]'],
           ['3', '[13', '12', '16', '13]'],
           ['4', '[13', '11', '17', '14', '7', '13]']]
for x in Poly_id:
    [re.sub(r'\W', '', ch) for ch in x]
This doesn't seem to change anything in the list.
I would like to end up with a list that has only numbers as elements, so that I can convert them to integers.
I guess technically '[4' is non-numeric, so you can do something like this:
Poly_id = [[char for char in _list if str.isnumeric(char)] for _list in Poly_id]
Output:
['0', '8', '18', '20', '5', '0']
['1', '16', '6', '11']
['2', '1', '10', '9', '2', '15']
['3', '12', '16']
['4', '11', '17', '14', '7']
If you just want to remove the non-numeric characters, rather than dropping the whole entry, you can do this:
Poly_id = [[''.join(char for char in substring if str.isnumeric(char)) for substring in _list] for _list in Poly_id]
Output:
['0', '4', '8', '18', '20', '5', '0', '4']
['1', '13', '16', '6', '11', '13']
['2', '3', '1', '10', '9', '2', '15', '3']
['3', '13', '12', '16', '13']
['4', '13', '11', '17', '14', '7', '13']
Here is a solution if you want to get rid of the '[' in '[4' but keep the '4'. Note that re.sub returns a new string (it cannot modify the original in place), which is why your loop seemed to change nothing; you have to keep what it returns:
res = [[re.sub(r'\W', '', st) for st in inlist] for inlist in Poly_id]
res is:
[
['0', '4', '8', '18', '20', '5', '0', '4'],
['1', '13', '16', '6', '11', '13'],
['2', '3', '1', '10', '9', '2', '15', '3'],
['3', '13', '12', '16', '13'],
['4', '13', '11', '17', '14', '7', '13']
]
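Since the stated goal is integers, the cleaned result converts directly (assuming every cleaned string is a pure digit sequence, as in the sample data):

# Convert each cleaned string to an int.
as_ints = [[int(s) for s in row] for row in res]
# [[0, 4, 8, 18, 20, 5, 0, 4], [1, 13, 16, 6, 11, 13], ...]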
You can also use the itertools module, for example to flatten a list of lists into a single list:
import itertools
list_of_lists = [[1, 2], [3, 4]]
print(list(itertools.chain(*list_of_lists)))
>>>[1, 2, 3, 4]
Basically the issue is as follows: I have a bunch of workers, each with a prescribed function (the function is worker(alist)), and I am trying to run 35 workers at the same time. Each worker reads its own line from the file (the modulo part) and should process that line using the worker function. I've tested by hand and found that the manipulation of raw and the deletion of the useless indices work 100% as intended.
The args part of pool.apply_async isn't passing the list raw into it and starting the process. raw is completely correct and behaves normally, and worker by itself works normally; pool.apply_async is the only place where there seems to be an issue, and I have no idea how to fix it. Any help please?
The relevant code is here:
NUM_WORKERS=35
f=open("test.csv")
pool=multiprocessing.Pool()
open("final.csv",'w')
for workernumber in range(1, NUM_WORKERS):
    for i,line in enumerate(f):
        if i==0:
            print "Skipping first line"  # don't do anything
        elif i%workernumber==0:
            raw = line.split(',')[0][1:-1].split()
            uselessindices=[-2,-3,-4,-5,-6]
            counter=0
            for ui in uselessindices:
                del raw[ui+counter]
                counter+=1
            print raw
            pool.apply_async(worker, args=(raw,))
pool.close()
pool.join()
I suggest you put the calculation of raw into a generator function, and then use Pool.imap_unordered() or Pool.map() to run worker() over all of the items in the generator.
Something like this untested code:
def get_raw():
    with open("test.csv", 'rU') as f:
        for i, line in enumerate(f):
            if i == 0:
                # skip header
                continue
            raw = line.split(',')[0][1:-1].split()
            uselessindices = [-2, -3, -4, -5, -6]
            counter = 0
            for ui in uselessindices:
                del raw[ui+counter]
                counter += 1
            yield raw

pool=multiprocessing.Pool(processes=NUM_WORKERS)
pool.map(worker, get_raw())
pool.close()
pool.join()
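If you don't care about the order results come back in, pool.imap_unordered(worker, get_raw()) is a drop-in alternative to the pool.map call above; just remember it returns a lazy iterator, so consume it to be sure every result (and any worker exception) is collected:

for _ in pool.imap_unordered(worker, get_raw()):
    pass  # worker is called for its side effects; results arrive in completion order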
import multiprocessing

def worker(arg):
    print 'doing work "%s"' % arg
    return

NUM_WORKERS=35

with open('test.csv', 'w') as test:
    for i in xrange(100):
        if i % 10 == 0:
            test.write('\n')
        test.write('"%s 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23",' % i)

f=open("test.csv")
pool=multiprocessing.Pool(processes=NUM_WORKERS)
open("final.csv",'w')
for i, line in enumerate(f):
    if i == 0:
        continue
    raw = line.split(',')[0][1:-1].split()
    uselessindices=[-2,-3,-4,-5,-6]
    counter=0
    for ui in uselessindices:
        del raw[ui+counter]
        counter+=1
    pool.apply_async(worker, args=(raw,))
pool.close()
pool.join()
print 'last raw len: %s' % len(raw)
print 'last raw value: %s' % raw
Output:
doing work "['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['10', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['20', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['30', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['40', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['50', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['60', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['70', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['80', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
doing work "['90', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']"
last raw len: 19
last raw value: ['90', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '23']
So I found out that the error was occurring inside the worker, as a result of a mismatched number of arguments to a child function (worker was calling another function dosomething(a1, a2, ..., a20) and was only giving it 19 inputs). It seems apply_async won't surface errors that happen inside the worker, which is quite annoying, but I now understand. Thanks for all the help!
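For anyone who hits the same silence: apply_async returns an AsyncResult, and calling .get() on it re-raises any exception the worker threw. A minimal sketch (the failing worker here is a stand-in, not the original code):

import multiprocessing

def worker(alist):
    # Stand-in for the real worker: simulate the hidden failure.
    raise ValueError("mismatched number of arguments")

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    result = pool.apply_async(worker, args=([1, 2, 3],))
    pool.close()
    pool.join()
    result.get()  # the worker's ValueError is re-raised here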
I need to read in a CSV file, from Excel, whose rows may be of arbitrary length.
The problem is that Python retains these blank entries, but I need to delete them for a future algorithm. Below is the output; I don't want the blank entries.
['5', '1', '5', '10', '4', '']
['3', '1', '5', '10', '2', '']
['6', '1', '5', '10', '5', '2']
['9', '10', '5', '10', '7', '']
['8', '5', '5', '10', '7', '']
['1', '1', '5', '10', '', '']
['2', '1', '5', '10', '1', '']
['7', '1', '5', '10', '6', '4']
['4', '1', '5', '10', '3', '1']
Here's a list comprehension integrated with the csv library:
import csv

with open('input.csv') as in_file:
    reader = csv.reader(in_file)
    result = [[item for item in row if item != ''] for row in reader]
print result
This is about as verbose a function as I could write to do what you want. There are certainly slicker ways.
def remove_blanks(a_list):
    new_list = []
    for item in a_list:
        if item != "":
            new_list.append(item)
    return new_list
List comprehension version:
a = ['5', '1', '5', '10', '4', '']
[x for x in a if x != '']
Out[19]: ['5', '1', '5', '10', '4']
You may be better served by filtering at the csv read step instead.
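For example, filter(None, row) drops the empty strings as each row is read (note it also drops anything else falsy, which is fine for rows of strings):

import csv

with open('input.csv') as in_file:
    # filter(None, row) keeps only truthy items, i.e. non-empty strings.
    cleaned = [list(filter(None, row)) for row in csv.reader(in_file)]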