I have a large PC inventory in CSV file format. I would like to write some code to help me find the information I need. Specifically, I would like to type in the name, or part of the name, of a user (user names are in the 5th column of the file) and have the code give me the name of that user's computer (computer names are in the 2nd column of the file). My code doesn't work and I don't know what the problem is. Thank you for your help, I appreciate it!
import csv  # import csv library

# open PC Inventory file
info = csv.reader(open('Creedmoor PC Inventory.csv', 'rb'), delimiter=',')
key_index = 4  # names are in column 5 (list index is 4)
user = raw_input("Please enter employee's name: ")
rows = enumerate(info)
for row in rows:
    if row == user:  # name is in the PC Inventory
        print row  # show the computer name
You've got three problems here.
First, since rows = enumerate(info), each row in rows is going to be a tuple of the row number and the actual row.
Second, the actual row itself is a sequence of columns.
So, if you want to compare user to the fifth column of an (index, row) tuple, you need to do this:
if row[1][key_index] == user:
Or, more clearly:
for index, row in rows:
    if row[key_index] == user:
        print row[1]
Or, if you don't actually have any need for the row number, just don't use enumerate:
for row in info:
    if row[key_index] == user:
        print row[1]
But that just gets you to your third problem: You want to be able to search for the name or a part of the name. So, you need the in operator:
for row in info:
    if user in row[key_index]:
        print row[1]
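Putting it together, here is a minimal runnable sketch of the substring search in Python 3 (using an in-memory CSV with made-up inventory rows standing in for the real file):

```python
import csv
import io

# Made-up stand-in for 'Creedmoor PC Inventory.csv'; as in the question,
# column 2 (index 1) holds the computer name and column 5 (index 4) the user.
data = io.StringIO(
    "Asset,Computer,Location,Dept,User\n"
    "1,PC-0012,Creedmoor,IT,John Smith\n"
    "2,PC-0047,Creedmoor,HR,Jane Doe\n"
)

key_index = 4   # user name column
name_index = 1  # computer name column

user = "Smith"  # in the real script this comes from input()
matches = [row[name_index]
           for row in csv.reader(data)
           if user in row[key_index]]
print(matches)  # ['PC-0012']
```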
It would be clearer to read the whole thing into a searchable data structure:
inventory = { row[key_index]: row for row in info }
Then you don't need a for loop to search for the user; you can just do this:
print inventory[user][1]
Unfortunately, however, that won't work for doing substring searches. You need a more complex data structure. A trie, or any sorted/bisectable structure, would work if you only need prefix searches; if you need arbitrary substring searches, you need something fancier, and that's probably not worth doing.
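For the prefix-search case, a sorted list plus the stdlib bisect module is enough; a small sketch with made-up names:

```python
import bisect

# Hypothetical sorted list of user names (e.g. the keys of the inventory dict).
names = sorted(['Alice Baker', 'Bob Jones', 'Bobby Smith', 'Carol White'])

def prefix_matches(prefix):
    # All entries starting with `prefix` form a contiguous run in sorted
    # order; bisect finds its endpoints without scanning the whole list.
    lo = bisect.bisect_left(names, prefix)
    hi = bisect.bisect_right(names, prefix + '\uffff')
    return names[lo:hi]

print(prefix_matches('Bob'))  # ['Bob Jones', 'Bobby Smith']
```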
You could consider using a database for that. For example, with a SQL database (like sqlite3), you can do this:
cur = db.execute('SELECT Computer FROM Inventory WHERE Name LIKE ?', ('%' + name + '%',))
Importing a CSV file and writing a database isn't too hard, and if you're going to be running a whole lot of searches against a single CSV file it might be worth it. (Also, if you're currently editing the file by opening the CSV in Excel or LibreOffice, modifying it, and re-exporting it, you can instead just attach an Excel/LO spreadsheet to the database for editing.) Otherwise, it will just make things more complicated for no reason.
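A sketch of that import-then-query flow, using an in-memory sqlite3 database and made-up rows (note that sqlite3 uses ? placeholders, and LIKE needs % wildcards for substring matches):

```python
import sqlite3

db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE Inventory (Computer TEXT, Name TEXT)')

# In the real script these rows would come from csv.reader over the CSV file.
db.executemany('INSERT INTO Inventory VALUES (?, ?)',
               [('PC-0012', 'John Smith'), ('PC-0047', 'Jane Doe')])

name = 'Smith'
cur = db.execute('SELECT Computer FROM Inventory WHERE Name LIKE ?',
                 ('%' + name + '%',))
found = cur.fetchall()
print(found)  # [('PC-0012',)]
```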
enumerate returns an iterator of index, element pairs. You don't really need it. Also, you forgot to use key_index:
for row in info:
    if row[key_index] == user:
        print row
It's hard to tell what's wrong without knowing what your file looks like, but I'm pretty sure the error is:
for row in info:
    if row[key_index] == user:  # name is in the PC Inventory
        print row  # show the computer name
where you did define the column index, but forgot to use it to pick that column out of each line you're comparing to the user, so in the end you're comparing a string with a list.
And you don't need the enumerate; by default you already iterate over the rows.
I am a noob at this and appreciate all the help I can get.
Here goes:
I have a PostgreSQL database that I would like to pull information out of and display the output.
I am using python 3.7 to do this.
I have connected to the database and can pull all the records and dump them on the screen.
When I try to do some logic checking, I run into problems.
Here is what I am attempting to do:
The database has two columns:
First Name and Last Name
I wanted to do a logic check, If your first name is John, print out the First and Last names.
For everyone else that is NOT named John, just print out the last name.
for row in test_database_1:
    if row[0] == 'John':
        print('Type:', row[0], ':', row[1])
    else:
        print('Type2:', row[0])
In the above statement, it completely skips the first print statement and just goes into the second one.
Let me know if you require additional clarification.
Thank you.
I have done some debugging and saw something odd.
When the string in row[0] is returned, it is returned as 'John |with 16 spaces|'.
Because the field in the database is a fixed width of 20 characters, row[0] returns the name plus 16 trailing blank spaces.
This was quite an odd find.
Regardless, once I figured out what was happening with the string being returned from the database, I was able to resolve the issue in the following way:
for row in test_database_1:
    if row[0].strip() == 'John':
        print('Type:', row[0], ':', row[1])
    else:
        print('Type2:', row[0])
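That fixed-width padding is standard CHAR-column behavior; a minimal sketch of why the comparison fails and how strip() fixes it (using a hand-built padded value in place of a real database row):

```python
# Fixed-width CHAR columns are space-padded to their declared width,
# so equality checks fail unless the padding is removed first.
raw_value = 'John' + ' ' * 16  # what a 20-wide CHAR column returns for 'John'

print(raw_value == 'John')          # False: the padded string never matches
print(raw_value.strip() == 'John')  # True: stripping the spaces fixes it
```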
I basically have this test2 file that has domain information that I want to use, so I strip the additional stuff and get just the domain names as new_list.
What I want to then do is query a database with these domain names, pull the name and severity score and then (the part I'm really having a hard time with) getting a stored list (or tuple) that I can use that consists of the pulled domains and severity score.
It is a psql database for reference but my problem lies more on the managing after the query.
I'm still really new to Python and mainly did a bit of Java so my code probably looks terrible, but I've tried converting to strings and tried appending to a list at the end but I am quite unsuccessful with most of it.
def get_new():
    data = []
    with open('test2.txt', 'r') as file:
        data = [line.rstrip('\n') for line in open('test2.txt')]
    return data

new_list = get_new()

def db_query():
    cur = connect.cursor()
    query = "SELECT name, de.severity_score FROM domains d JOIN ips i ON i.domain_id = d.id JOIN domains_extended de ON de.domain_id = d.id WHERE name = '"
    for x in new_list:
        var = query + x + "'"
        cur.execute(var)
        get = cur.fetchall()
        # STORE THE LOOPED QUERIES INTO A VARIABLE OF SOME KIND (problem area)
    print(results)
    cur.close()
    connect.close()

db_query()
Happy place: Takes domain names from file, uses those domain names a part of the query parameters to get severity score associated, then stores it into a variable of some sort so that I can use those values later (in a loop or some logic).
I've tried everything I could think of and ran into errors with it being a query that I'm trying to store, lists won't combine, etc.
I would make sure that your get_new() function is returning what you expect from that file. Just iterate over your new_list and check each entry.
There is no variable named results in your db_query() function (perhaps it is a global, like new_list), but try printing the result of each query, that is, print(get) inside your for loop, and see what comes out. If that works, then you can create a list and append to it.
Well, first off, in your code you are overwriting your get variable on every pass through the loop. Fix that by initializing get = [] above your loop and then using get.extend(cur.fetchall()) inside the loop instead of the current statement. You could then do something like domainNames = [row[0] for row in get]. If get is loading properly, getting the values out of it should be no problem.
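Putting those pieces together, here is a minimal sketch of the accumulate-and-reuse pattern, using an in-memory sqlite3 database and a made-up schema as a stand-in for the PostgreSQL connection (with psycopg2 the placeholder is %s rather than ?, but the pattern is the same; parameterized queries also avoid the quoting bugs and SQL injection risk of string concatenation):

```python
import sqlite3

# Stand-in database; the real code would use its psycopg2 connection instead.
connect = sqlite3.connect(':memory:')
connect.execute('CREATE TABLE domains (name TEXT, severity_score INTEGER)')
connect.executemany('INSERT INTO domains VALUES (?, ?)',
                    [('example.com', 7), ('test.org', 3)])

new_list = ['example.com', 'test.org', 'missing.net']

def db_query(domains):
    cur = connect.cursor()
    results = []  # accumulates (name, severity_score) tuples across all queries
    for domain in domains:
        # Parameterized query instead of building the SQL string by hand.
        cur.execute('SELECT name, severity_score FROM domains WHERE name = ?',
                    (domain,))
        results.extend(cur.fetchall())
    cur.close()
    return results

scores = db_query(new_list)
print(scores)  # [('example.com', 7), ('test.org', 3)]
```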
Still new to Python and I ran into an issue earlier this month where String '0' was being passed into my Integer Column (using a SQLite db). More information from my original thread:
SQL: Can WHERE Statement filter out specific groups for GROUP BY Statement
This caused my SQL Query statements to return invalid data.
I'm having this same problem pop up in other columns in my database when the CSV file does not contain any value for the specific cell.
The source of my data is an external CSV file that I download (Unicode format). I use the following code to insert the data into the DB:
with sqlite3.connect(db_filename) as conn:
    dbcursor = conn.cursor()
    with codecs.open(csv_filename, "r", "utf-8-sig") as f:
        csv_reader = csv.DictReader(f, delimiter=',')
        # This is a much smaller column example as the actual data has many columns.
        csv_dict = [(i['col1'], i['col2']) for i in csv_reader]
        dbcursor.executemany(sql_str, csv_dict)
From what I researched, by design, SQLite does not enforce column type when inserting values. My solution to my original problem was to do a manual check to see if it was an empty value and then make it an int 0 using this code:
def Check_Session_ID(sessionID):
    if sessionID == '':
        sessionID = int(0)
    return sessionID
Each integer / float column will need to be checked when I insert the values into the Database. Since there will be many rows on each import (100K +) x (50+ columns) I would imagine the imports to take quite a bit of time.
What are better ways to handle this problem instead of checking each value for each Int / Float column per row?
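One way to keep that per-value check cheap is to register a converter per numeric column and apply it once while building the rows, so the empty-string handling lives in one place; a small sketch with hypothetical column names (col1 text, col2 integer):

```python
import csv
import io

# Hypothetical layout: col1 is text, col2 is an integer that may be empty.
converters = {'col2': lambda v: int(v) if v != '' else 0}

def convert(row):
    # Apply the column's converter if one is registered, else pass through.
    return tuple(converters.get(k, lambda v: v)(row[k]) for k in ('col1', 'col2'))

sample = io.StringIO('col1,col2\nabc,5\ndef,\n')
rows = [convert(r) for r in csv.DictReader(sample)]
print(rows)  # [('abc', 5), ('def', 0)]
```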
Thank you so much for the advice and guidance.
I know that similar questions have been asked before, but as I am really new to Python I wasn't able to apply the discussions and suggestions made there to my case!
I am trying to write a Python script to extract certain values from a table: the table I am referring to is a big collection of nitrate values for different water depths, which are stored in the columns of the table. As I only need the value at the surface and at the deepest point, I want to search through the rows and extract the last value that is not 0. I have started writing a script using the SearchCursor tool but get stuck at the point where I want it to search for the first 0 value and then go back and print the value from the column before. Does anyone have an idea how to solve this problem?
import arcpy

# Set the Workspace
arcpy.env.workspace = "D:\Teresa\Kerstin\SouthernOcean\03_workspace\Teresa"
# Make table
table = "C:/Users/theidema/Desktop/OxzUti_GridP_Annual.csv"
# Create the search cursor
cursor = arcpy.SearchCursor(Table)
# Iterate through the rows
row = cursor.next()
while row:
    print(row.getValue(field))
    row = cursor.next()
At a glance, I see a few issues.
First, SearchCursor() parameter "Table" is capitalized and your variable "table" isn't. It is case-sensitive, so they'll need to match.
Secondly, you used "field" as a parameter in your print statement, but it isn't defined. Above the While loop, define the variable -- something like: field = "NameOfFieldYouWantToSearch"
Finally, in your while loop, the row = cursor.next() at the bottom is actually what keeps things moving, so keep the "row = " reassignment. cursor.next() returns the next row, or None after the last one; reassigning row means the while row: test sees that None and the loop terminates. If you call cursor.next() without reassigning row, the loop never ends.
Oh yeah! I noticed your Workspace isn't a File Geodatabase. I'd work out of a GDB, if I were you. Especially since Feature Classes can't exist outside GDBs. If you work with Shapefiles, it might work though.
import arcpy

# Set the Workspace
arcpy.env.workspace = "D:\Teresa\Kerstin\SouthernOcean\03_workspace\Teresa"
# If "Teresa" is a GDB, append ".gdb"

# Define variables
table = "C:/Users/theidema/Desktop/OxzUti_GridP_Annual.csv"
field = ""  # add the field name that you want to search

# Call your CSV in SearchCursor
cursor = arcpy.SearchCursor(table)

# Iterate through the rows
row = cursor.next()
while row:
    print(row.getValue(field))
    row = cursor.next()
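The advance-and-test pattern above can be sketched with a plain Python class standing in for the arcpy cursor (a hypothetical FakeCursor whose next() returns None when exhausted, like the old-style SearchCursor):

```python
# Stand-in cursor demonstrating the advance-and-test loop used with
# arcpy's old-style SearchCursor: next() returns None when exhausted.
class FakeCursor:
    def __init__(self, rows):
        self._rows = iter(rows)

    def next(self):
        return next(self._rows, None)  # None is what ends the while loop

cursor = FakeCursor([1, 2, 3])
collected = []
row = cursor.next()
while row:
    collected.append(row)
    row = cursor.next()  # reassign, or the loop never terminates

print(collected)  # [1, 2, 3]
```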
I need to get all the strings from the Target column of an MSI's Shortcut table. I get the first value of the column, but I cannot get the rest. I used Orca to make sure there were other values, and the MSI files each have two.
Here is my code to get it:
def verify(self):
    self.db = msilib.OpenDatabase(str(self.msi_file), msilib.MSIDBOPEN_TRANSACT)
    self.getColumnNames()

def getColumnNames(self):
    view = self.db.OpenView("SELECT Target FROM Shortcut")
    view.Execute(None)
    print view.GetColumnInfo(msilib.MSICOLINFO_NAMES)
    record = view.Fetch()
    print record.GetFieldCount()
    self.value = record.GetString(1)
    print record.GetString(1)
What do I have wrong with my code?
You call Fetch() only once, so you only ever get the first record. You need to keep fetching in a loop (a "while record is not null" loop, in native-API terms) to process all your rows. See the following help topic for more information:
View.Fetch method