Alternative to "for" LINQ equivalent of Where?


I have the contents of a CSV file in a variable raw_data. I only need certain rows, depending on whether the first element of the row (row_split[0]) matches a price_number. The code below works fine; however, coming from a C# background, I know there is a LINQ equivalent. Is there anything in Python like Where or Any that can also include the row.split(',')?
for row in raw_data:
    row_split = row.split(',')
    if str(row_split[0]) == price_number:
        filtered_data.append(row)

I think you can do it using a list comprehension:
filtered_data = [row for row in raw_data if row.split(',')[0] == price_number]
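For readers coming from LINQ, here is a minimal sketch of the closest built-in counterparts to Where and Any. The sample rows and the price_number value are made up for illustration; only the names raw_data, price_number and filtered_data come from the question.

```python
# Sample data standing in for raw_data; the values are made up for illustration.
raw_data = ["100,apple,1.20", "200,pear,0.80", "100,plum,2.10"]
price_number = "100"

# Where: a generator expression is lazy, much like LINQ's deferred execution.
matching = (row for row in raw_data if row.split(",")[0] == price_number)
filtered_data = list(matching)

# Where via the built-in filter(), closest in shape to rows.Where(r => ...).
filtered_with_filter = list(
    filter(lambda row: row.split(",")[0] == price_number, raw_data)
)

# Any: the built-in any() stops at the first matching row.
has_match = any(row.split(",")[0] == price_number for row in raw_data)

print(filtered_data)
print(has_match)
```

The list comprehension is usually the most readable of the three; filter() and any() are shown mainly because they map one-to-one onto the LINQ names.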

Related

Need to create multiple variables based on number of columns in csv file python

This is an example of what my csv file looks like with 6 columns:
0.0028,0.008,0.0014,0.008,0.0014,0.008,
I want to create 6 variables to use later in my program using these numbers as the values; however, the number of columns WILL vary depending on exactly which csv file I open.
If I were to do this manually and the number of columns was always 6, I would just create the variables like this:
thickness_0 = (row[0])
thickness_1 = (row[1])
thickness_2 = (row[2])
thickness_3 = (row[3])
thickness_4 = (row[4])
thickness_5 = (row[5])
Is there a way to create these variables with a for loop so that it is not necessary to know the number of columns? Meaning it will create the same number of variables as there are columns?

There are ways to do what you want, but it is considered very bad practice; you should never mix your source code with data.
If your code depends on dynamic data from the outside world, use a dictionary (or, in your case, a list) to access your data programmatically.
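A minimal sketch of the list approach, assuming the file holds a single row like the sample in the question. The io.StringIO object is only a stand-in for the opened file, and the name thickness is chosen here for illustration.

```python
import csv
import io

# Stand-in for the opened CSV file; the trailing comma mirrors the sample row.
infile = io.StringIO("0.0028,0.008,0.0014,0.008,0.0014,0.008,\n")

row = next(csv.reader(infile))
# Skip the empty field that the trailing comma produces.
thickness = [float(value) for value in row if value]

print(len(thickness))   # number of columns found
print(thickness[3])     # plays the role of thickness_3
```

The list grows or shrinks with the file, so the rest of the program can loop over thickness instead of naming each column.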

You can use a dictionary:
import csv

mydict = {}
with open('StackupThick.csv', 'r', newline='') as infile:
    reader = csv.reader(infile, delimiter=',')
    row = next(reader)  # the values are all on the first line
    for idx, value in enumerate(row):
        key = "thickness_" + str(idx)
        mydict[key] = value
Call your values like this:
print(mydict['thickness_3'])

From your question, I understand that your csv files have only one line of comma-separated values. If so, and if you are not happy with dictionaries (as in @Mike C.'s answer), you can use globals() to add variables to the global namespace, which is a dict.
import csv
with open("yourfile.csv", "r", newline='') as yourfile:
    rd = csv.reader(yourfile, delimiter=',')
    row = next(rd)
    for i, j in enumerate(row):
        if j:  # skip the empty field a trailing comma leaves behind
            globals()['thickness_' + str(i)] = float(j)
Now you have whatever number of new variables called thickness_i where i is a number starting from 0.
Please be sure that all the values are in the first line of the csv file, as this code will ignore any lines beyond the first.

2 questions on Python : created table & fail to find duplicates in rows

I have a data set in a csv file in this format:
1st question: I am trying to find duplicate rows in the table I created in Python below. I tried using the set function on the rows, and the output I got is "no duplicates" even though there is a duplicate row in the data set.
2nd question: Is it possible to reference this table? I realized that it only becomes a table when I print it. I want to use it in the next step for calculations.
COL_1_WIDTH = 10
COL_2_WIDTH = 35
for row in data:
    IC1 = len(str(row[0]))
    IC2 = len(str(row[1]))
    print( str(row[0])+ str( (COL_1_WIDTH-IC1) *' ') +\
           str(row[1]) + str( (COL_2_WIDTH-IC2) *' ') +\
           str(row[2]))
for row in data:
    if len(set(row)) !=len(row):
        print ('duplicates: ', row)
    else:
        print ('no duplicates:', row)
P.S. I am only permitted to use built-in functions and numpy.
Grateful for any ideas. Thank you!

You don't really explain what kind of object 'data' is, so I assumed it was a list of strings.
Here's how I created mine from a csv file:
with open('/home/sebastien/Documents/answerSO.csv') as file:
    data=file.read() #a string
data=data.split('\n') #a list of strings
data.pop() #to delete the last element, an empty string
(Note that using the csv module may be a better idea.)
Now, to look for duplicates, I used the method explained here:
How do I find the duplicates in a list and create another list with them?
seen = set()
uniq = []
for row in data:
    if row not in seen:
        uniq.append(row)
        seen.add(row)
    else:
        print("found a duplicate:",row)
And about referencing it, well, it's in 'data'.
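The set(row) test in the question looks for repeated values inside a single row, which may be why it never reports a duplicate row. Here is a minimal sketch using only built-ins, assuming data is a list of rows where each row is itself a list (the question does not say). The function name find_duplicate_rows and the sample table are made up for illustration.

```python
def find_duplicate_rows(rows):
    """Return the rows that repeat an earlier row, in order of appearance."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


# Made-up sample table with one repeated row.
data = [
    [1, "widget", 9.5],
    [2, "gadget", 3.0],
    [1, "widget", 9.5],
]
print(find_duplicate_rows(data))
```

Because the function returns a list instead of printing, the result can be reused in a later calculation step.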

Python - Get item from a list under a list

I have a list like below.
list = [[Name,ID,Age,mark,subject],[karan,2344,23,87,Bio],[karan,2344,23,87,Mat],[karan,2344,23,87,Eng]]
I need to get only the name 'Karan' as output.
How can I get that?

This is a 2D list;
list[i][j]
will give you the 'i'th list within your list and the 'j'th item within that list.
So to get karan you want list[1][0].

I upvoted Lio Elbammalf, but decided to provide an answer that makes a couple of assumptions that should have been clarified in the question:
1. The first item of the list is the headers; they are actually in the list (and not just there as part of the question), and they are provided as part of the list because there is no guarantee that the headers will always be in the same order.
2. This is probably a CSV file.
Ignoring 2 for the moment, what you would want to do is remove the "headers" from the list (so that the rest of the list is uniform), and then find the index of "Name" (your desired output).
myinput = [["Name","ID","Age","mark","subject"],
           ["karan",2344,23,87,"Bio"],
           ["karan",2344,23,87,"Mat"],
           ["karan",2344,23,87,"Eng"]]
## Remove the headers from the list to simplify everything
headers = myinput.pop(0)
## Figure out where to find the person's Name
nameindex = headers.index("Name")
## Build a list of the Name in each row
names = [stats[nameindex] for stats in myinput]
If the name is guaranteed to be the same in each row, then you can just use myinput[0][nameindex], as suggested in the other answer.
Now, if 2 is true, I'm assuming you're using the csv module, in which case load the file using the DictReader class and then just access each row using the 'Name' key:
import csv

def loadfile(myfile):
    with open(myfile, newline='') as f:
        reader = csv.DictReader(f)
        return list(reader)

def getname(rows):
    ## This is the same return as above, and again you can just
    ## return rows[0]['Name'] if you know you only need the first one
    return [row['Name'] for row in rows]
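To try the DictReader approach without a file on disk, here is a small sketch that feeds it an in-memory string. The CSV text is an assumption that mirrors the list in the question; io.StringIO only stands in for the opened file.

```python
import csv
import io

# Stand-in for the CSV file; the contents mirror the list in the question.
csv_text = """Name,ID,Age,mark,subject
karan,2344,23,87,Bio
karan,2344,23,87,Mat
karan,2344,23,87,Eng
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))
names = [row["Name"] for row in rows]

print(names)                # every Name value, one per data row
print(rows[0]["subject"])   # any other column is reachable by its header
```

Note that DictReader returns every value as a string, so numeric columns such as ID need an explicit int() if they are used in arithmetic.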

In Python 3 you can do this (where ls is the list from the question):
_, [x, _, _, _, _], *_ = ls
Now x will be karan.

Best way to parse a file with columns that randomly change order before importing it into SQL Server 2008?

I have a file that has columns that look like this:
Column1,Column2,Column3,Column4,Column5,Column6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
1,2,3,4,5,6
Column1,Column3,Column2,Column6,Column5,Column4
1,3,2,6,5,4
1,3,2,6,5,4
1,3,2,6,5,4
Column2,Column3,Column4,Column5,Column6,Column1
2,3,4,5,6,1
2,3,4,5,6,1
2,3,4,5,6,1
The columns randomly re-order in the middle of the file, and the only way to know the order is to look at the last set of headers right before the data (Column1,Column2, etc.) (I've also simplified the data so that it's easier to picture. In real life, there is no way to tell data apart as they are all large integer values that could really go into any column)
Obviously this isn't very SQL Server friendly when it comes to using BULK INSERT, so I need to find a way to arrange all of the columns in a consistent order that matches my table's column order in my SQL database. What's the best way to do this? I've heard Python is the language to use, but I have never worked with it. Any suggestions/sample scripts in any language are appreciated.

A solution in python:
I would read line-by-line and look for headers. When I find a header, I use it to figure out the order (somehow). Then I pass that order to itemgetter which will do the magic of reordering elements:
from operator import itemgetter

def header_parse(line,order_dict):
    header_info = line.split(',')
    indices = [None] * len(header_info)
    for i,col_name in enumerate(header_info):
        indices[order_dict[col_name]] = i
    return indices

def fix(fname,foutname):
    with open(fname) as f,open(foutname,'w') as fout:
        #Assume first line is a "header" and gives the order to use for the
        #rest of the file
        line = f.readline()
        order_dict = dict((name,i) for i,name in enumerate(line.strip().split(',')))
        reorder_magic = itemgetter(*header_parse(line.strip(),order_dict))
        for line in f:
            if line.startswith('Column'): #somehow determine if this is a "header"
                reorder_magic = itemgetter(*header_parse(line.strip(),order_dict))
            else:
                fout.write(','.join(reorder_magic(line.strip().split(','))) + '\n')

if __name__ == '__main__':
    import sys
    fix(sys.argv[1],sys.argv[2])
Now you can call it as:
python fixscript.py badfile goodfile
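To see what itemgetter does in isolation, here is a small sketch. The three-column headers and the sample row are made up, and the index list is built with list.index here rather than with header_parse, purely to keep the example short.

```python
from operator import itemgetter

# Target order, taken from the first header in the file.
target = ["Column1", "Column2", "Column3"]
# A later header that lists the same columns in a different order.
current = ["Column3", "Column1", "Column2"]

# For each target column, find where it sits in the current layout.
indices = [current.index(name) for name in target]
reorder = itemgetter(*indices)

row = "3,1,2".split(",")
print(",".join(reorder(row)))
```

One caveat: itemgetter returns a tuple only when it is given two or more indices; with a single index it returns the bare item, so a one-column file would need special handling.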

Since you didn't mention a specific problem, I'm going to assume you're having problems coming up with an algorithm.
For each row,
    Parse the row into fields.
    If it's the first header line,
        Output the header.
        Create a map of field names to position.
            %map = map { $fields[$_] => $_ } 0..$#fields;
        Create a map of original positions to new positions.
            @map = @map{ @fields };
    If it's a header line other than the first,
        Update map of original positions to new positions.
            @map = @map{ @fields };
    If it's not a header line,
        Reorder fields.
            @fields[ @map ] = @fields;
        Output the row.
(Snippets are in Perl.)
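The outline above can be sketched in Python as follows. This is one possible reading of the algorithm, not a translation of a complete script; the function name reorder_rows and the default header test (first field starts with "Column") are assumptions for illustration.

```python
def reorder_rows(lines, is_header=lambda fields: fields[0].startswith("Column")):
    """Yield each row as a list of fields in the order of the first header."""
    target = None      # field name -> position in the first header
    positions = None   # current column index -> output column index
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.strip().split(",")
        if is_header(fields):
            if target is None:
                # First header: it defines the output order and is emitted once.
                target = {name: i for i, name in enumerate(fields)}
                yield fields
            # Every header (first or later) updates the position map.
            positions = [target[name] for name in fields]
        else:
            out = [None] * len(fields)
            for src, dst in enumerate(positions):
                out[dst] = fields[src]
            yield out


sample = [
    "Column1,Column2,Column3",
    "1,2,3",
    "Column3,Column1,Column2",
    "3,1,2",
]
for row in reorder_rows(sample):
    print(",".join(row))
```

Passing a different is_header callable is the place to plug in whatever rule identifies header lines in the real file.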

This can be fixed easily in two steps:
1. Split the file into multiple files whenever a new header starts.
2. Read each file using the csv dict reader, sort the keys, and re-output the rows in the correct order.
Here is an example of how you can go about it:
def is_header(line):
    return line.find('Column') >= 0

def process(lines):
    headers = None
    for line in lines:
        line = line.strip()
        if is_header(line):
            headers = list(enumerate(line.split(",")))
            headers_map = dict(headers)
            # sort the (index, name) pairs by column name
            headers.sort(key=lambda item: headers_map[item[0]])
            print(",".join([h for i,h in headers]))
            continue
        values = list(enumerate(line.split(",")))
        # sort the (index, value) pairs by the name of their column
        values.sort(key=lambda item: headers_map[item[0]])
        print(",".join([v for i,v in values]))

if __name__ == "__main__":
    import sys
    process(open(sys.argv[1]))
You can also change the is_header function so that it correctly identifies headers in your real data.

Help with Excel, Python and XLRD

I'm relatively new to programming, which is why I've chosen Python to learn.
At the moment I'm attempting to read a list of usernames and passwords from an Excel spreadsheet with XLRD and use them to log in to something, then back out, go to the next line, log in again, and keep going.
Here is a snippet of the code:
import xlrd
wb = xlrd.open_workbook('test_spreadsheet.xls')
# Load XLRD Excel Reader
sheetname = wb.sheet_names() #Read for XCL Sheet names
sh1 = wb.sheet_by_index(0) #Login
def readRows():
    for rownum in range(sh1.nrows):
        rows = sh1.row_values(rownum)
        userNm = rows[4]
        Password = rows[5]
        supID = rows[6]
        print(userNm, Password, supID)
print(readRows())
I've gotten the variables out and it reads all of them in one shot; here is where my lack of programming skills comes into play. I know I need to iterate through these and do something with them, but I'm kind of lost on what the best practice is. Any insight would be great.
Thank you again

A couple of pointers:
I'd suggest you not print a function that has no return value; instead, just call it, or return something to print.
def readRows():
    for rownum in range(sh1.nrows):
        rows = sh1.row_values(rownum)
        userNm = rows[4]
        Password = rows[5]
        supID = rows[6]
        print(userNm, Password, supID)
readRows()
Or, looking at the docs, you can take a slice from row_values:
row_values(rowx, start_colx=0, end_colx=None)
Returns a slice of the values of the cells in the given row.
Because you just want the columns with index 4 to 6 (end_colx is exclusive, so pass 7):
def readRows():
    # using list comprehension
    return [ sh1.row_values(idx, 4, 7) for idx in range(sh1.nrows) ]
print(readRows())
Using the second method you get a list as the return value of your function, so you can use it to set a variable with all of the data you read from the Excel file. The list is actually a list of lists containing your row values.
L1 = readRows()
for row in L1:
    print(row[0], row[1], row[2])
After you have your data, you can manipulate it by iterating through the list, much like the print example above.
def login(name, password, id):
    # do stuff with name password and id passed into method
    ...
for row in L1:
    login(*row)  # unpack the three values in the row into the three parameters
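A runnable sketch of that loop, without xlrd, in case the unpacking step is unclear. The rows and the body of login are hypothetical placeholders; a real login would contact whatever system you are logging in to.

```python
# Hypothetical rows, shaped like the [userNm, Password, supID] slices above.
rows = [
    ["Bob", "hunter2", 23],
    ["Cat", "swordfish", 24],
]

attempts = []


def login(name, password, sup_id):
    # Placeholder: record the attempt instead of contacting a real system.
    attempts.append((name, sup_id))


for row in rows:
    login(*row)  # same as login(row[0], row[1], row[2])

print(attempts)
```

The star in login(*row) only works when each row has exactly as many values as login has parameters; a row of a different length raises a TypeError.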
You may also want to look into different data structures for storing your data. If you need to find a user by name, a dictionary is probably your best bet:
def readRows():
    rows = [ sh1.row_values(idx, 4, 7) for idx in range(sh1.nrows) ]
    # map each user name to its (password, id) pair
    return dict([ [row[0], (row[1], row[2])] for row in rows ])
D1 = readRows()
print(D1['Bob'])
('sdafdfadf',23)
import pprint
pprint.pprint(D1)
{'Bob': ('sdafdfadf',23),
'Cat': ('asdfa',24),
'Dog': ('fadfasdf',24)}
One thing to note is that before Python 3.7, dictionaries do not preserve insertion order, so the entries may come back in an arbitrary order.

I'm not sure if you are intent on using xlrd, but you may want to check out PyWorkbooks (note: I am the writer of PyWorkbooks :D)
from PyWorkbooks.ExWorkbook import ExWorkbook
B = ExWorkbook()
B.change_sheet(0)
# Note: it might be B[:1000, 3:6]. I can't remember if xlrd uses pythonic addressing (0 is first row)
data = B[:1000,4:7] # gets a generator, the '1000' is arbitrarily large.
def readRows():
    while True:
        try:
            userNm, Password, supID = next(data) # you could also do data[0]
            print(userNm, Password, supID)
            if userNm is None: break # when there is no more data it stops
        except IndexError:
            print('list too long')
readRows()
You will find that this is significantly faster (and easier I hope) than anything you would have done. Your method will get an entire row, which could be a thousand elements long. I have written this to retrieve data as fast as possible (and included support for such things as numpy).
In your case, speed probably isn't as important. But in the future, it might be :D
Check it out. Documentation is available with the program for newbie users.
http://sourceforge.net/projects/pyworkbooks/

Seems to be good. With one remark: you should replace "rows" by "cells" because you actually read values from cells in every single row
