Python and CSV: find a row and give column value

The task is to write a function (taking the CSV file and the kid's name as arguments) that searches the CSV file for the kid's name and returns his age.
The CSV file is structured like this:
Nicholas,12
Matthew,6
Lorna,12
Michael,8
Sebastian,8
Joseph,10
Ahmed,15
while the code that I tried is this:
def fetchcolvalue(kids_agefile, kidname):
    import csv
    file = open(kids_agefile, 'r')
    ct = 0
    for row in csv.reader(file):
        while True:
            print(row[0])
            if row[ct] == kidname:
                break
The frustrating thing is that it doesn't raise any error; it just loops forever, so I assume that is where my mistake is.
So far, all I have learnt from the book is loops (while and for) and if-elif-else statements, plus basic CSV and file operations, so I can't figure out how to solve the problem with only those tools.
Please note that the function has to work with any two-column CSV file, not only the kids' one.

The while True in your loop makes you loop forever, because nothing it tests changes inside the loop: row stays the same, so unless the very first comparison matches, the break is never reached. Just remove it:
for row in csv.reader(file):
    if row[ct] == kidname:
        break
else:
    print("{} not found".format(kidname))
The CSV file is iterated over row by row, and the loop breaks as soon as row[ct] equals kidname; at that point row[1] holds the age.
I would add an else clause so you know when the file has been scanned completely without finding the kid's name. This is a little-known use of else after a for loop: the else branch runs only if the loop ends without hitting break.
EDIT: you could do it in one line using any and a generator expression:
any(kidname == row[ct] for row in csv.reader(file))
This returns True if the first cell of any row matches. It stops at the first match and is probably faster too.
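Note that any only reports whether a match exists. To get the age itself, next over a generator expression can return the second column of the first matching row. This is a minimal sketch on in-memory sample rows; the name lookup and its default parameter are illustrative and not part of the original answer.

```python
import csv
import io


def lookup(rows, key, default=None):
    """Return the second column of the first row whose first column equals key."""
    return next((row[1] for row in rows if row and row[0] == key), default)


# Sample data in the same two-column layout as the question.
sample = io.StringIO("Nicholas,12\nMatthew,6\nLorna,12\n")
rows = list(csv.reader(sample))

print(lookup(rows, "Matthew"))  # prints 6 (csv values are strings)
print(lookup(rows, "Zoe"))      # prints None, the default
```

The default argument of next avoids a StopIteration when the name is missing.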

This should work. In your example, the for loop sets row to the first row of the file and then starts the while loop. The while loop never updates row, so it runs forever. Just remove the while loop:
def fetchcolvalue(kids_agefile, kidname):
    import csv
    file = open(kids_agefile, 'r')
    ct = 0
    for row in csv.reader(file):
        if row[ct] == kidname:
            print(row[1])
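If the value should be handed back to the caller rather than printed, a variant along these lines might be closer to what the question asks for. It is only a sketch: the name fetch_col_value and the key_col/value_col parameters are assumptions added for illustration, and it still uses nothing beyond a for loop, an if, and the csv module.

```python
import csv


def fetch_col_value(csv_path, key, key_col=0, value_col=1):
    """Return the value_col cell of the first row whose key_col cell equals key.

    Returns None when no row matches.
    """
    # newline='' is what the csv module documentation recommends for file objects.
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            # Skip blank or short rows instead of raising IndexError.
            if len(row) > max(key_col, value_col) and row[key_col] == key:
                return row[value_col]
    return None
```

The with block closes the file even when the function returns early, and because the columns are parameters the function works for any two-column CSV file.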

Related

I can not add a loop to another Python loop, please help me

I want to add another loop to a Python loop I already have, but I cannot come up with the solution. I am new to Python, so I need help making it work.
Here is what the code does:
1- It starts with the first CSV file and reads from the first row (there is a given range of rows to use).
2- It uses the data to run the function. After a few seconds of delay, it goes to the next row and does the same, until all rows in the given range are used.
3- After a few seconds of delay, it goes to the next CSV file and repeats the process above (with a delay between rows), until all CSV files are used.
The problem is that when I run the script a second time or more, it uses the same rows again.
I want to add another loop so that when I run it again, it continues with the next rows that have not been used yet.
This is how it should work, assuming the given range is 2 rows:
1- The first run should read rows 1 and 2 of all CSV files and use the data to run the function.
2- The second run should read rows 3 and 4 of all CSV files and use the data to run the function, and so on for each later run.
I would appreciate any help making this work.
I posted this problem some time ago but did not get a working solution, so I am trying again.
Here is code that works:
import time
from abc.zzz import xyz
path_id_map = [
    {'path':'file1.csv', 'id': '12345678'},
    {'path':'file2.csv', 'id': '44556677'},
    {'path':'file3.csv', 'id': '33377799'},
    {'path':'file4.csv', 'id': '66221144'}]
s_id = None
for pair in path_id_map:
    with open(pair['path'], 'r') as f:
        next(f) # skip first header line
        for _ in range(1, 3):
            line = next(f)
            img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
            zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                      link_1=link_1, B_id=pair['id'], s_id=s_id)
            time.sleep(25)
CSV file content looks like this:
img_url,desc_1 title_1,link_1
site.com/image22.jpg;someTitle;description1;site1.com
site.com/image32.jpg;someTitle;description2;site2.com
site.com/image44.jpg;someTitle;description3;site3.com
thanks.
Edited:
OK, I got some help with a posted solution, but unfortunately, it did not work, and the person did not want to continue to work on the provided code to solve the problem.
I am back to asking for help again.

Can you put your processing logic in a separate function and call that?
All files should have the same number of rows; otherwise you need a safety check, because next(f) raises StopIteration once a file is exhausted (which is a good thing to have anyway).
def process_n_lines_from_offset(file_set, processed, offset):
    # processed: data rows to skip in each file; offset: rows to process per file
    for pair in file_set:
        with open(pair['path'], 'r') as f:
            next(f) # skip first header line
            # skip processed lines
            x=0
            while (x<processed):
                next(f)
                x += 1
            num_processed = 0
            for _ in range(offset):
                line = next(f)
                img_url, title_1, desc_1, link_1 = map(str.strip, line.split(';'))
                zzz.func1(img_url=img_url, title_1=title_1, desc_1=desc_1,
                          link_1=link_1, B_id=pair['id'], s_id=s_id)
                time.sleep(25)
                num_processed += 1
    return num_processed
if __name__ == '__main__':
    start = 0
    default_offset = 2
    num_processed = process_n_lines_from_offset(path_id_map, start, default_offset)
    num_processed = process_n_lines_from_offset(path_id_map, num_processed, default_offset)
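The function above only remembers its position within a single run. For the script to continue where it left off the next time it is started, the count of processed rows has to be stored somewhere between runs. The following is a rough sketch of one way to do that with a small JSON file. The file name progress.json and the helpers load_offset, save_offset, read_batch and run_once are all hypothetical names, and handle stands in for the call to zzz.func1.

```python
import itertools
import json
import os

STATE_FILE = "progress.json"  # hypothetical name for the progress file


def load_offset(state_file=STATE_FILE):
    """Return how many data rows earlier runs already processed (0 on the first run)."""
    if not os.path.exists(state_file):
        return 0
    with open(state_file) as f:
        return json.load(f).get("rows_done", 0)


def save_offset(rows_done, state_file=STATE_FILE):
    """Remember the number of processed data rows for the next run."""
    with open(state_file, "w") as f:
        json.dump({"rows_done": rows_done}, f)


def read_batch(path, rows_done, batch_size):
    """Return up to batch_size data lines, skipping the header and rows_done lines."""
    with open(path) as f:
        next(f, None)  # skip the header line; no error on an empty file
        batch = itertools.islice(f, rows_done, rows_done + batch_size)
        return [line.rstrip("\n") for line in batch]


def run_once(paths, handle, batch_size=2, state_file=STATE_FILE):
    """Process the next batch_size rows of every file, then record the progress."""
    rows_done = load_offset(state_file)
    for path in paths:
        for line in read_batch(path, rows_done, batch_size):
            handle(path, line)  # placeholder for the real per-row work
    save_offset(rows_done + batch_size, state_file)
```

itertools.islice simply yields fewer lines when a file runs out, so a short file does not raise StopIteration here.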

Create Excel file from Python

My project is to process several Excel files. To do this, I would like to create a single file that gathers some of the data from those files and serves as my database. The goal is to produce graphs from this data, all automatically.
I wrote this program in Python. However, it takes 20 minutes to run. How can I optimize it?
In addition, some files contain identical variables. I would like these identical variables not to be repeated in the final file. How can I do that?
Here is my program:
import os
import xlrd
import xlsxwriter
from xlrd import open_workbook
wc = xlrd.open_workbook("U:\\INSEE\\table-appartenance-geo-communes-16.xls")
sheet0=wc.sheet_by_index(0)
# creation
with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd:
    dept61 = bdd.add_worksheet('deprt61')
    folder_path = "U:\\INSEE\\2013_telechargement2016"
    col=8
    constante3=0
    lastCol=0
    listeV = list()
    for path, dirs, files in os.walk(folder_path):
        for filename in files:
            filename = os.path.join(path, filename)
            wb = xlrd.open_workbook(filename, '.xls')
            sheet1 = wb.sheet_by_index(0)
            lastRow=sheet1.nrows
            lastCol=sheet1.ncols
            colDep=None
            firstRow=None
            for ligne in range(0,lastRow):
                for col2 in range(0,lastCol):
                    if sheet1.cell_value(ligne, col2) == 'DEP':
                        colDep=col2
                        firstRow=ligne
                        break
                if colDep is not None:
                    break
            col=col-colDep-2-constante3
            constante3=0
            for nCol in range(colDep+2,lastCol):
                constante=1
                for ligne in range(firstRow,lastRow):
                    if sheet1.cell(ligne, colDep).value=='61':
                        Q=(sheet1.cell(firstRow, nCol).value in listeV)
                        if Q==False:
                            V=sheet1.cell(firstRow, nCol).value
                            listeV.append(V)
                            dept61.write(0,col+nCol,sheet1.cell(firstRow, nCol).value)
                            for ligne in range(ligne,lastRow):
                                if sheet1.cell(ligne, colDep).value=='61':
                                    dept61.write(constante,col+nCol,sheet1.cell(ligne, nCol).value)
                                    constante=constante+1
                        elif Q==True:
                            constante3=constante3+1 # I have a problem here. I would like to count the number of variables that already exists but I find huge numbers.
                        break
            col=col+lastCol
bdd.close()
Thank you in advance for your help. :)

This one may be too broad for SO, so here are some pointers on where you can optimise. Maybe add a sample screenshot of what the sheets look like.
Regarding if sheet1.cell_value(ligne, col2) == 'DEP': can 'DEP' occur multiple times in a sheet? If it definitely occurs only once, and that is where you get your values for both colDep and firstRow, then break out of both loops: add a break to end the inner loop, then check a flag value and break out of the outer loop before it moves on to the next row. Like so:
colDep = None # initialise to None
firstRow = None # initialise to None
for ligne in range(0,lastRow):
    for col2 in range(0,lastCol):
        if sheet1.cell_value(ligne, col2) == 'DEP':
            colDep=col2
            firstRow=ligne
            break # break out of the `col2 in range(0,lastCol)` loop
    if colDep is not None: # or just `if colDep:` if colDep will never be 0.
        break # break out of the `ligne in range(0,lastRow)` loop
I think the range in for ligne in range(0,lastRow): in your write-to-bdd blocks should start at firstRow, since you know that rows 0 to firstRow-1 of sheet1, which you have just read while looking for the header, will be empty.
for ligne in range(firstRow, lastRow):
That will avoid wasting time reading empty header rows.
Other considerations for cleaner code:
Use the with xlsxwriter.Workbook('U:\\INSEE\\Department61.xlsx') as bdd: syntax for clarity.
Always use double backslashes \\ inside strings, even when the backslash does not precede a control character: 'U:\\INSEE\\Department61.xlsx'
You've used sheet1.cell_value() as well as sheet1.cell().value for your read operations. Pick one, unless you needed extended Cell info in the value=='61' case.
Read PEP-8 for how to write more readable code.
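As an alternative to the flag-and-double-break pattern for locating the 'DEP' header, the search can be moved into a small function that returns as soon as the cell is found. The sketch below works on a plain list of lists instead of an xlrd sheet so that it runs on its own. The helper name find_cell, the sample grid and the column names in it are made up for illustration. It also shows a set being used to skip variable names that were already written, which is one possible way to avoid the repeated columns mentioned in the question.

```python
def find_cell(grid, target):
    """Return (row, col) of the first cell equal to target, or None if absent."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return r, c  # returning leaves both loops at once
    return None


# Made-up sheet contents: one empty row, a header row, one data row.
grid = [
    ["", "", ""],
    ["CODGEO", "DEP", "P13_POP"],
    ["61001", "61", 12345],
]
print(find_cell(grid, "DEP"))  # prints (1, 1)

# Membership tests on a set stay fast as the number of names grows.
seen = set()
unique = []
for name in ["P13_POP", "P13_LOG", "P13_POP"]:
    if name in seen:
        continue  # this variable was already written once
    seen.add(name)
    unique.append(name)
print(unique)  # prints ['P13_POP', 'P13_LOG']
```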

Removing rows dependent on one column Python

I'm working on taking apart a python script piece by piece to get a better understanding of 1.) how python works & 2.) what this particular script does & if I can make it better (i.e. usable by slightly more varied inputs).
Okay so, there is a line in my code that looks like this:
thisChrominfo = chrominfo[thisChrom]
Where chrominfo calls a dictionary set that looks like this:
{'chrY': ['chrY', '59373566', '3036303846'], 'chrX': ['chrX', '155270560', '2881033286'], 'chr13': ['chr13', '115169878', '2084766301'], 'chr12': ['chr12', '133851895', '1950914406'], 'chr11': ['chr11', '135006516', '1815907890'], 'chr10': ['chr10', '135534747', '1680373143'], 'chr17': ['chr17', '81195210', '2500171864'], 'chr16': ['chr16', '90354753', '2409817111'], 'chr15': ['chr15', '102531392', '2307285719'], 'chr14': ['chr14', '107349540', '2199936179'], 'chr19': ['chr19', '59128983', '2659444322'], 'chr18': ['chr18', '78077248', '2581367074'], 'chrM': ['chrM', '16571', '3095677412'], 'chr22': ['chr22', '51304566', '2829728720'], 'chr20': ['chr20', '63025520', '2718573305'], 'chr21': ['chr21', '48129895', '2781598825'], 'chr7': ['chr7', '159138663', '1233657027'], 'chr6': ['chr6', '171115067', '1062541960'], 'chr5': ['chr5', '180915260', '881626700'], 'chr4': ['chr4', '191154276', '690472424'], 'chr3': ['chr3', '198022430', '492449994'], 'chr2': ['chr2', '243199373', '249250621'], 'chr1': ['chr1', '249250621', '0'], 'chr9': ['chr9', '141213431', '1539159712'], 'chr8': ['chr8', '146364022', '1392795690']}
and thisChrom calls a single column (non-integer) that includes things like this:
'*' to `chr4`to `chrY` etc.
thisChrom only returns one value at a time, because it relies on a piece higher up in the file that specifies only a single row:
for x in INFILE:
    arow = x.rstrip().split("\t")
    thisChrom = arow[2]
    thisChrompos = arow[3]
So it's pulling one column from one row.
The whole thing falls apart when values like '*' are present in arow, because that's not in the chrominfo dictionary. At first I thought I should just go ahead and add it to the dictionary, but now I'm thinking it would be easier and better to instead add a line at the top that says something like, if arow[2] == '*' then delete it, else continue.
I know it should look something like this:
for x in INFILE:
    arow = x.rstrip().split("\t")
    thisChrom = arow[2]
    thisChrompos = arow[3]
    if arow == '*': arow.remove(*)
    else:
        continue
but I haven't been able to get the syntax quite right. All of my Python is self & stackoverflow taught, so I appreciate your suggestions and guidance. Sorry if that was an over-explanation of something that is very simple for most experts (I am not an expert).

The continue keyword is somewhat non-obvious. What it does is skip the remaining contents of the loop body and start the next iteration. So what you wrote will skip the rest of the loop body whenever arow is not equal to '*'. Since arow is a list, it never equals the string '*'; the value to test is arow[2] (thisChrom).
Instead of
if arow == '*': arow.remove(*)
else:
    continue
# process the row
you might simply want to either use a simple if condition:
if arow[2] != '*':
    # process the row
or use continue in the way you probably intended:
if arow[2] == '*': continue
# process the row
See how it works in the opposite way of what you thought? Also, you don't need an else in this case because of how the continue skips the rest of the loop.
If you're familiar with the break keyword, it may make more sense as a comparison. The break keyword stops the loop entirely - it "breaks" out of it and moves on. The continue keyword is simply a "weaker" version of that - it "breaks" out of the current iteration but not all the way out of the loop.
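Put together with the loop from the question, the continue version might look like the sketch below. The input lines are invented sample rows, chrominfo is cut down to two entries from the question's dictionary, and the check uses not in chrominfo rather than == '*', which is an assumption on my part: it skips any chromosome name missing from the dictionary, not just '*'.

```python
# Two entries taken from the chrominfo dictionary in the question.
chrominfo = {
    'chr1': ['chr1', '249250621', '0'],
    'chr4': ['chr4', '191154276', '690472424'],
}

# Invented tab-separated rows; column 2 is the chromosome, column 3 the position.
lines = [
    "read1\t0\tchr1\t100\n",
    "read2\t4\t*\t0\n",
    "read3\t0\tchr4\t200\n",
]

kept = []
for x in lines:
    arow = x.rstrip().split("\t")
    thisChrom = arow[2]
    thisChrompos = arow[3]
    if thisChrom not in chrominfo:
        continue  # skip this row; nothing below runs for it
    thisChrominfo = chrominfo[thisChrom]
    kept.append((thisChrom, thisChrompos, thisChrominfo[2]))

print(kept)
```

Nothing is removed from arow; the row is simply never processed.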

appending array breaks program

I am writing a program to analyze some of our invoice data. Basically, I need to take an array containing each individual invoice we sent out over the past year and break it down into twelve arrays, each containing the invoices for one month, using the dateSeperate() function, so that monthly_transactions[0] returns January's transactions, monthly_transactions[1] returns February's, and so forth.
I've managed to get it working so that dateSeperate returns monthly_transactions[0] as the January transactions. However, once all of the January data is entered, I attempt to append to the monthly_transactions array using line 44. This just causes the program to break and become unresponsive. The code still executes and doesn't return an error, but Python becomes unresponsive and I have to force quit it.
I've been writing to the global array monthly_transactions. dateSeperate runs fine as long as I don't include the last else statement; in that case, monthly_transactions[0] returns an array containing all of the January invoices. The issue arises in my last else statement, which, when added, causes Python to freeze.
Can anyone help shed any light on this?
I have written a program that defines all of the arrays I'm going to be using (yes, I know global arrays aren't good; I'm a marketer trying to learn programming, so any input on how to improve this would be much appreciated):
import csv
line_items = []
monthly_transactions = []
accounts_seperated = []
Then I import all of my data and place it into the line_items array
def csv_dict_reader(file_obj):
    global board_info
    reader = csv.DictReader(file_obj, delimiter=',')
    for line in reader:
        item = []
        item.append(line["company id"])
        item.append(line["user id"])
        item.append(line["Amount"])
        item.append(line["Transaction Date"])
        item.append(line["FIrst Transaction"])
        line_items.append(item)
if __name__ == "__main__":
    with open("ChurnTest.csv") as f_obj:
        csv_dict_reader(f_obj)
#formats the transaction date data to make it more readable
def dateFormat():
    for i in range(len(line_items)):
        ddmmyyyy =(line_items[i][3])
        yyyymmdd = ddmmyyyy[6:] + "-"+ ddmmyyyy[:2] + "-" + ddmmyyyy[3:5]
        line_items[i][3] = yyyymmdd
#Takes the line_items array and splits it into new array monthly_transactions, where each value holds one month of data
def dateSeperate():
    for i in range(len(line_items)):
        #if there are no values in the monthly transactions, add the first line item
        if len(monthly_transactions) == 0:
            test = []
            test.append(line_items[i])
            monthly_transactions.append(test)
        # check to see if the line items year & month match a value already in the monthly_transaction array.
        else:
            for j in range(len(monthly_transactions)):
                line_year = line_items[i][3][:2]
                line_month = line_items[i][3][3:5]
                array_year = monthly_transactions[j][0][3][:2]
                array_month = monthly_transactions[j][0][3][3:5]
                #print(line_year, array_year, line_month, array_month)
                #If it does, add that line item to that month
                if line_year == array_year and line_month == array_month:
                    monthly_transactions[j].append(line_items[i])
                #Otherwise, create a new sub array for that month
                else:
                    monthly_transactions.append(line_items[i])
dateFormat()
dateSeperate()
print(monthly_transactions)
I would really, really appreciate any thoughts or feedback you guys could give me on this code.

Based on the comments on the OP, your csv_dict_reader function seems to do exactly what you want it to do, at least inasmuch as it appends data from its argument csv file to the top-level variable line_items. You said yourself that if you print out line_items, it shows the data that you want.
"But appending doesn't work." I take it you mean that appending the line_items to monthly_transactions isn't being done. The reason for that is that you didn't tell the program to do it! The appending that you're talking about is done as part of your dateSeperate function, but you still need to call the function.
I'm not sure exactly how you want to use your dateFormat and dateSeperate functions, but in order to use them, you need to include them in the main function somehow as calls, i.e. dateFormat() and dateSeperate().
EDIT: The last else: section is the real problem. It appends to monthly_transactions whenever the line/array year/month aren't equal, and it does so inside the loop for j in range(len(monthly_transactions)):, so it runs once for every existing entry that doesn't match. range(len(...)) is evaluated only once, so each inner loop does finish, but a single line item can add almost as many entries as the list already holds. The list therefore roughly doubles with each invoice, which is why Python appears to freeze. The new month should be appended once, after the loop, and only if no existing month matched.
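One way to sidestep the nested search entirely is to group the rows in a dictionary keyed by year and month, so each invoice is appended exactly once. This is only a sketch under assumptions: the sample rows are invented, and the dates are assumed to already be strings of the form YYYY-MM-DD, so the first seven characters identify the month.

```python
from collections import defaultdict

# Invented rows in the question's layout:
# [company id, user id, amount, transaction date, first transaction]
line_items = [
    ["c1", "u1", "10", "2017-01-15", "yes"],
    ["c2", "u2", "20", "2017-01-20", "no"],
    ["c1", "u1", "10", "2017-02-15", "no"],
]

by_month = defaultdict(list)
for item in line_items:
    year_month = item[3][:7]           # e.g. "2017-01"
    by_month[year_month].append(item)  # one append per invoice, no inner loop

# A list of per-month lists in calendar order, like monthly_transactions.
monthly_transactions = [by_month[key] for key in sorted(by_month)]
print(len(monthly_transactions))     # prints 2
print(len(monthly_transactions[0]))  # prints 2
```

Here monthly_transactions is a local name in the sketch, so no global list is needed.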

Why re is not compiling 'if' when there is 'else'?

Hello, I'm facing a problem and I don't know how to fix it. All I know is that when I add an else statement to my if statement, Python always executes the else branch, even when the if condition is true for some line and the if branch could be entered.
Here is the script without the else statement:
import re
f = open(r'C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open(r'C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open(r'C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()
And here it is with the else statement; I want the else to belong to if ((Mspl[0]).lower()==(Wspl[0])):
import re
f = open(r'C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open(r'C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open(r'C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
        else:
            b="\t".join(Wspl)+"\n"
            if b not in filtred:
                filtred.append(b)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()

First of all, you're not using "re" at all in your code besides importing it (maybe in some later part?), so the title is a bit misleading.
Secondly, you are doing a lot of work for what is basically a filtering operation on two files. Remember, simple is better than complex, so for starters you want to clean up your code a bit:
You should use more descriptive names than 'd' or 'w'. This goes for 'Wspl', 's' and 'av' as well. Those names don't mean anything and are hard to understand (why is d.readlines() named Wlines when there's another file named 'w'? It's really confusing).
If you choose to use single letters, they should still make sense (if you iterate over a list named 'results', it makes sense to use 'r'; 'line' and 'line2', however, are not recommended for anything).
You don't need parentheses around conditions.
You want to use as few variables as you can so as not to get confused. There are too many different variables in your code and it's easy to get lost. You don't even use some of them.
You want to use strip rather than replace, and you want the whole 'cleaning' process to come first, followed by code that deals only with the filtering logic on the two lists. If you split each line according to some logic and don't use the original line anywhere in the iteration, you can do the whole thing at the beginning.
Now, I'm really confused about what you're trying to achieve here, and while I don't understand why you're doing it that way, I can say that, looking at your logic, you are repeating yourself a lot. The action of checking against the filtered list should only happen once, and since it happens regardless of whether the 'if' checks out or not, I see absolutely no reason to use an 'else' clause at all.
Cleaning up like I mentioned, and rebuilding the logic, the script looks something like this:
# PART I - read and analyze the lines
Wappresults = open(r'C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
Mikrofull = open(r'C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
# lists rather than one-shot map iterators, because Mikro is scanned once per Wapp entry
Wapp = [x.strip().split() for x in Wappresults.readlines()]
Mikro = [x.strip().split('\t') for x in Mikrofull.readlines()]
Wappresults.close()
Mikrofull.close()
# PART II - filter using some logic
filtred = []
for w in Wapp:
    res = w[:] # So as to copy the list instead of point to it
    for m in Mikro:
        if m[0].lower() == w[0]:
            res.append(m[1])
            if len(m) >= 3 :
                res.append(m[2])
    string = '\t'.join(res)+'\n' # this happens regardless of whether the 'if' statement changed 'res' or not
    if string not in filtred:
        filtred.append(string)
# PART III - write the filtered results into a file
combination = open(r'C:\Users\Ziad\Desktop\Combination\combination.txt','w')
for comb in filtred:
    combination.write(comb)
combination.close()
I can't promise it will work (because again, like I said, I don't know what you're trying to achieve), but this should be a lot easier to work with.
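As for why the else branch always seemed to run in the question's script, a likely explanation (offered as a reading of the code, not something the asker confirmed) is that the else is paired with the if inside the inner loop. It therefore runs, and breaks out, on the first line that does not match, before any later line is compared. If the intent was "do this only when no line matched", Python's for/else expresses that: the else belongs to the for loop and runs only when the loop finished without break. The lists below are invented sample data, and unmatched is a name added for illustration.

```python
# Invented sample data standing in for the two parsed files.
wapp = [["alice", "111"], ["bob", "222"], ["carol", "333"]]
mikro = [["Bob", "x1", "x2"], ["Alice", "y1"]]

filtered = []
unmatched = []
for w in wapp:
    res = w[:]  # copy, so the original entry is left untouched
    for m in mikro:
        if m[0].lower() == w[0]:
            res.append(m[1])
            if len(m) >= 3:
                res.append(m[2])
            break  # match found: the loop's else below is skipped
    else:
        # Runs only when the inner loop ended without break, i.e. no match.
        unmatched.append(w[0])
    line = "\t".join(res) + "\n"
    if line not in filtered:
        filtered.append(line)

print(unmatched)  # prints ['carol']
```

Every entry is written exactly once, with extra columns only where a match was found.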
