I have a .csv file with about 1000 rows which looks like:
id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0#google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1#msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2#dot.gov,Female,52.165.62.174,24/04/1990
4,Gawen,Gillfillan,ggillfillan3#hp.com,Male,83.249.190.232,31/10/1984
5,Syd,Gilfether,sgilfether4#china.com.cn,Male,180.153.199.106,11/07/1995
What I have for code so far will ask for input, then go over each row and print the row if it contains the input. Looks like so:
import csv
# Asks for search criteria from user
search = input("Enter search criteria:\n")
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if search in row:
        print(row)
What I want as the end result, and what I'm stuck on, is to be able to enter more than one search criterion separated by a "," and have it search for and print the matching rows. Kind of like a way to filter the list.
For example, if there were multiple "David"s that are "Male" in the file, I could enter: David, Male
It would then print all the rows that match but ignore those with a "David" that is "Female".
You can split the input on the comma then check to make sure each field from the input is present on a given line using all() and list comprehensions.
This example uses a simplistic splitting of the input, and doesn't care which field each input matches. If you want to only match to specific columns, look into using csv.DictReader instead of csv.reader.
import csv
# Asks for search criteria from user
# Strip whitespace so that input such as "David, Male" still matches
search_parts = [part.strip() for part in input("Enter search criteria:\n").split(",")]
# Opens csv data file
file = csv.reader(open("MOCK_DATA.csv"))
# Go over each row and print it if it contains user input.
for row in file:
    if all([x in row for x in search_parts]):
        print(row)
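To tie each search term to a specific column, as the csv.DictReader suggestion above hints, here is a minimal sketch. The helper name filter_rows and the criteria dictionary are assumptions made for illustration, and the sample rows are made up and inlined so the snippet runs on its own.

```python
import csv
import io

# Made-up sample data for this sketch; for the real file use
# open("MOCK_DATA.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """id,first_name,last_name,email,gender
1,David,Begwell,cbegwell0#google.ca,Male
2,David,Cheatle,ncheatle1#msn.com,Female
3,Laverna,Hamlen,lhamlen2#dot.gov,Female
"""

def filter_rows(csv_file, criteria):
    """Return the rows whose named columns equal every value in criteria."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader
            if all(row.get(column) == value for column, value in criteria.items())]

# Only rows where first_name is David AND gender is Male are kept.
matches = filter_rows(io.StringIO(SAMPLE), {"first_name": "David", "gender": "Male"})
for row in matches:
    print(row["id"], row["first_name"], row["gender"])
```

Because each value is compared against a named column, a "Male" that happened to appear in another field would not produce a false match.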
If you are happy to use a 3rd party library, this is possible with pandas.
I have modified your data slightly to demonstrate a simple query.
import pandas as pd
from io import StringIO
mystr = StringIO("""id,first_name,last_name,email,gender,ip_address,birthday
1,Ced,Begwell,cbegwell0#google.ca,Male,134.107.135.233,17/10/1978
2,Nataline,Cheatle,ncheatle1#msn.com,Female,189.106.181.194,26/06/1989
3,Laverna,Hamlen,lhamlen2#dot.gov,Female,52.165.62.174,24/04/1990
4,David,Gillfillan,ggillfillan3#hp.com,Male,83.249.190.232,31/10/1984
5,David,Gilfether,sgilfether4#china.com.cn,Male,180.153.199.106,11/07/1995""")
# replace mystr with 'file.csv'
df = pd.read_csv(mystr)
# retrieve user inputs
first_name = input('Input a first name\n:')
gender = input('Input a gender, Male or Female\n:')
# calculate Boolean mask
mask = (df['first_name'] == first_name) & (df['gender'] == gender)
# apply mask to result
res = df[mask]
print(res)
# id first_name last_name email gender \
# 3 4 David Gillfillan ggillfillan3#hp.com Male
# 4 5 David Gilfether sgilfether4#china.com.cn Male
# ip_address birthday
# 3 83.249.190.232 31/10/1984
# 4 180.153.199.106 11/07/1995
While you could just check if the strings "David" and "Male" exist in a row, it would not be very precise should you need to check column values. Instead, read in the data via csv and create a list of namedtuple objects that store the search value and header name:
from collections import namedtuple
import csv
data = list(csv.reader(open('filename.csv')))
search = namedtuple('search', 'value,header')
searches = [search(i, data[0].index(b)) for i, b in zip(input().split(', '), ['first_name', 'gender'])]
final_results = [i for i in data if all(c.value == i[c.header] for c in searches)]
I have a python code where I am trying to read a csv file using pandas.
Once I have read the csv, the user is asked for input to enter the first name and I filter the records and show only the matching records.
Code :
# importing the module
import pandas as pd
# read specific columns of csv file using Pandas
df = pd.read_csv(r'D:\Extension\firefox_extension\claimant_record.csv', usecols = ['ClaimantFirstName','ClaimantLastName','Occupation'])
print(df)
required_df_firstName = input("Enter First Name of Claimant: ")
select_df = df[df['ClaimantFirstName'] == required_df_firstName]
print(select_df)
Issue :
There are multiple records matching with first name so I want the user to be able to narrow down the result using first and last name and I am not able to properly write the AND condition.
What I have tried so far:
select_df = df[df['ClaimantFirstName'] == required_df_firstName]and df[df['ClaimantLastName'] == required_df_lastName]
You can use boolean indexing to check more than one condition at once.
Here's how it would work:
import pandas as pd
# add sample data
data = {'ClaimantFirstName': ["Jane", "John"], "ClaimantLastName": ["Doe", "Doherty"]}
df = pd.DataFrame(data)
# filter by name
required_df_firstName, required_df_lastName = "John", "Doherty"
select_df = df.loc[(df['ClaimantFirstName'] == required_df_firstName) & (df['ClaimantLastName'] == required_df_lastName), ['ClaimantFirstName', 'ClaimantLastName']]
Check isin
#required_df_firstName = input("Enter First Name of Claimant: ")
#notice the input would be list type , like ['a','b']
select_df = df[df['ClaimantFirstName'].isin(required_df_firstName)]
This involves a previous question I posted, which got a lot of good answers. But for this scenario, I want to enter more than one input at the same time at the prompt, and search through a list of CSV files.
For example:
Enter your strings:
11234567, 1234568. 1234569 etc.
(I want to set the parameter to be from 1 to 20)
And as for the input files, is there a way to search an entire folder for files with the .csv extension, instead of hardcoding the names of the CSV files in my code? That way I don't have to keep adding file names to my code if, say, I want to search through 50 files. Is there a script-like feature in Python to do this?
FYI, each input value I enter is distinct, so it cannot exist in 2 or more csv files at the same time.
Code:
import csv
files_to_open = ["C:/Users/CSV/salesreport1.csv","C:/Users/CSV//salesreport2.csv"]
data=[]
##iterate through list of files and add body to list
for file in files_to_open:
    csvfile = open(file,"r")
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
keys_dict = {}
column = int(input("Enter the column you want to search: "))
val = input("Enter the value you want to search for in this column: ")
for row in data:
    v = row[column]
    ##adds the row to the list with the right key
    keys_dict[v] = row
if val in keys_dict:
    print(keys_dict[val])
else:
    print("Nothing was found at this column and key!")
Also one last thing, how do I show the name of the csv file as a result too?
Enter the column to search: 0
Enter values to search (separated by comma): 107561898, 107607997
['107561898', ......]
Process finished with exit code 0
107561898 is from column 0 of file 1, and 107607997 is the second value, which is stored in file 2 (column 0).
As you can see, the result only returns the record from the file that contains the first input, whereas I want a record returned for both inputs, so two records.
Column 0 is where all the input values (card numbers) are.
Seeing as you want to check a large number of files, here's an example of a very simple approach that checks all the CSVs in the same folder as this script.
This script allows you to specify the column and multiple values to search for.
Sample input:
Enter the column to search: 2
Enter values to search (separated by comma): leet,557,hello
This will search in the 3rd column for the words "leet" and "hello" and the number 557.
Note that columns start counting at 0, and there should be no extra spaces unless the keyword itself has a space char.
import csv
import os
# this gets all the filenames of files that end with .csv in the specified directory
# this is where the file finding happens, to answer your comment #1
path_to_dir = 'C:\\Users\\Public\\Desktop\\test\\CSV'
csv_files = [x for x in os.listdir(path_to_dir) if x.endswith('.csv')]
# get column number and cast it to an integer type
# will throw error if you don't type a number
column_number = int(input("Enter the column to search: "))
# get values list
values = input("Enter values to search (separated by comma): ").split(',')
# loops over every filename
for filename in csv_files:
    # opens the file
    with open(path_to_dir + '\\' + filename) as file:
        reader = csv.reader(file)
        # loops for every row in the file
        for row in reader:
            # checks if the entry at this row, in the specified column is one of the values we want
            if row[column_number] in values:
                # prints the row
                print(f'{filename}: {row}')
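As a variation on the script above, here is a sketch that tolerates spaces after the commas (as in "107561898, 107607997") and skips rows that are too short to contain the requested column. The function name search_csv_folder is hypothetical, and the demo writes two throwaway files with made-up contents into a temporary folder so that it runs anywhere.

```python
import csv
import tempfile
from pathlib import Path

def search_csv_folder(folder, column, values):
    """Yield (file name, row) for rows whose given column is one of values."""
    wanted = {value.strip() for value in values}   # tolerate "a, b" style input
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            for row in csv.reader(handle):
                # Skip rows that are too short to have the requested column.
                if len(row) > column and row[column].strip() in wanted:
                    yield path.name, row

# Demo on two throwaway files (made-up data, hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "salesreport1.csv").write_text("107561898,alpha\n111,beta\n")
    Path(tmp, "salesreport2.csv").write_text("107607997,gamma\n")
    results = list(search_csv_folder(tmp, 0, "107561898, 107607997".split(",")))

for name, row in results:
    print(f"{name}: {row}")
```

For real use, the temporary folder would be replaced by the path to the CSV directory, and the column and values would come from input() as in the script above.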
I'm having difficulty completing a coding challenge and would like some help if possible.
I have a CSV file with 1000 forenames, surnames, emails, genders and DOBs (dates of birth) and am trying to separate each column.
My first challenge was to print the first 10 rows of the file, I did so using;
counter = 0
print("FORENAME SURNAME EMAIL GENDER DATEOFBIRTH","\n")
csvfile = open ("1000people.csv").read().splitlines()
for line in csvfile:
    print(line+"\n")
    counter = counter + 1
    if counter >= 10:
        break
This works and prints the 10 first rows as intended. The second challenge is to do the same, but in alphabetical order and only the names. I can only manage to print the first 10 rows alphabetically using:
counter = 0
print("FORENAME SURNAME EMAIL GENDER DATEOFBIRTH","\n")
csvfile = open ("1000people.csv").read().splitlines()
for line in sorted(csvfile):
    print(line+"\n")
    counter = counter + 1
    if counter >= 10:
        break
Outcome of above code:
>>>
FORENAME SURNAME EMAIL GENDER DATEOFBIRTH
Abba,Ebbings,aebbings7z#diigo.com,Male,23/06/1993
Abby,Powney,apowneynj#walmart.com,Female,01/03/1998
Abbye,Cardus,acardusji#ft.com,Female,30/10/1960
Abeu,Chaize,achaizehi#apple.com,Male,25/06/1994
Abrahan,Shuard,ashuardb5#zdnet.com,Male,09/12/1995
Adah,Lambkin,alambkinga#skyrock.com,Female,21/08/1992
Addison,Shiers,ashiersmg#freewebs.com,Male,13/07/1981
Adelaida,Sheed,asheedqh#elpais.com,Female,06/08/1976
Adelbert,Jurkowski,ajurkowski66#amazonaws.com,Male,27/10/1957
Adelice,Van Arsdall,avanarsdallqp#pagesperso-orange.fr,Female,30/06/1979
So would there be a way to separate the columns so I can print just one specific column when chosen?
Thank you for reading and replying if you do.
The csv module helps to split the columns. A pythonic way to achieve what you want would be:
import csv
with open("1000people.csv") as f:
    cr = csv.reader(f) # default sep is comma, no need to change it
    first_10_rows = [next(cr,[]) for _ in range(10)]
The next(cr,[]) instruction consumes one row in a list comprehension, and if the file is smaller than 10 rows, you'll get an empty row instead of an exception (that's the purpose of the second argument).
Now first_10_rows is a list of lists containing your values. Quotes & commas are properly stripped off thanks to the csv module.
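To print just one column in alphabetical order, which is what the second challenge asks for, the split rows can be reduced to a single index and then sorted. This is only a sketch: the helper name first_sorted_values is an assumption, it assumes the file has no header row (as the output in the question suggests), and it uses three rows copied from that output instead of the real file.

```python
import csv
import io

# Three rows taken from the question's output; for the real file use
# open("1000people.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """Adah,Lambkin,alambkinga#skyrock.com,Female,21/08/1992
Abba,Ebbings,aebbings7z#diigo.com,Male,23/06/1993
Abby,Powney,apowneynj#walmart.com,Female,01/03/1998
"""

def first_sorted_values(csv_file, column, limit=10):
    """Return up to `limit` values of one column, sorted alphabetically."""
    values = [row[column] for row in csv.reader(csv_file) if row]  # skip blank lines
    return sorted(values)[:limit]

# Column 0 holds the forenames in this layout.
forenames = first_sorted_values(io.StringIO(SAMPLE), column=0)
print("\n".join(forenames))
```

Changing column to 1 would give the surnames instead, and limit controls how many values are returned.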
I'm new to SO, new to programming, and even newer to Python, haha.
I'm trying to read CSV files (which will contain different data types) and store specific values ("coordinates") as variables.
CSV file example (sorry for using code format, text didn't want to stay quiet):
$id,name,last_name,age,phone_number,addrstr,addrnum
1,Constance,Harm,37,555-1234,Ocean_view,1
2,Homer,Simpson,40,555-1235,Evergreen_Terrace,742
3,John,Doe,35,555-1236,Fake_Street,123
4,Moe,Tavern,20,7648-4377,Walnut_Street,126
I want to know if there is some easy way to store a specific value using the rows as index, for example: "take row 2 and store 2nd value in variable Name, 3rd value in variable Lastname" and the "row" for each storage will vary.
Not sure if this will help because my coding level is very crappy:
row = #this value will be taken from ANOTHER csv file
people = open('people.csv', 'r')
linepeople = csv.reader(people)
data = list(linepeople)
name = int(data[row][1])
lastname = int(data[row][2])
age = int(data[row][3])
phone = int(data[row][4])
addrstr = int(data[row][5])
addrnum = int(data[row][6])
I haven't found anything similar enough to guide me to a solution. (I have been reading about dictionaries, maybe that will help me?)
EDIT (please let me know if its not allowed to edit questions): Thanks for the solutions, I'm starting to understand the possibilities but let me give more info about my expected output:
I'm trying to create an "universal" function to get only one value at given row/col and to store that single value into a variable, not the whole row nor the whole column.
Example: Need to store the phone number of John Doe (column 5, row 4) into a variable so that when printing that variable the output will be: 555-1236
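You can iterate line by line. Watch out for your example code: you are trying to cast people's names into integers. Note that accessing columns by name, as below, requires csv.DictReader rather than csv.reader:
import csv
with open('people.csv', newline='') as people:
    linepeople = csv.DictReader(people)
    for row in linepeople:
        name = row['name']
        age = int(row['age'])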
If you are going to do more complicated stuff, I recommend pandas. For starters, it will try to convert numerical columns to numeric types, and you can access columns with attribute notation.
import pandas as pd
import numpy as np
people = pd.read_table('people.csv', sep=',')
people.name # all the names
people.loc[0:2] # first three rows (.loc slicing includes both endpoints)
You can use the CSV DictReader which will automatically assign dictionary names based on your CSV column names on a per row basis as follows:
import csv
with open("input.csv", "r") as f_input:
    csv_input = csv.DictReader(f_input)
    for row in csv_input:
        id = row['$id']
        name = row['name']
        last_name = row['last_name']
        age = row['age']
        phone_number = row['phone_number']
        addrstr = row['addrstr']
        addrnum = row['addrnum']
        print(id, name, last_name, age, phone_number, addrstr, addrnum)
This would print out your CSV entries as follows:
1 Constance Harm 37 555-1234 Ocean_view 1
2 Homer Simpson 40 555-1235 Evergreen_Terrace 742
3 John Doe 35 555-1236 Fake_Street 123
4 Moe Tavern 20 7648-4377 Walnut_Street 126
If you wanted a list of just the names, you could build them as follows:
with open("input.csv", "r") as f_input:
    csv_input = csv.DictReader(f_input)
    names = []
    for row in csv_input:
        names.append(row['name'])
print(names)
Giving:
['Constance', 'Homer', 'John', 'Moe']
As the question has changed, a rather different approach would be needed. A simple get row/col type function would work but would be very inefficient. The file would need to be read in each time. A better approach would be to use a class. This would load the file in once and then you could get as many entries as you need. This can be done as follows:
import csv
class ContactDetails():
    def __init__(self, filename):
        # Read the whole file once and keep every row (header included) in memory
        with open(filename, "r") as f_input:
            csv_input = csv.reader(f_input)
            self.details = list(csv_input)
    def get_col_row(self, col, row):
        # col and row are 1-based, so convert them to Python's 0-based indexes
        return self.details[row-1][col-1]
data = ContactDetails("input.csv")
phone_number = data.get_col_row(5, 4)
name = data.get_col_row(2,4)
last_name = data.get_col_row(3,4)
print("%s %s: %s" % (name, last_name, phone_number))
By using the class, the file is only read in once. This would print the following:
John Doe: 555-1236
Note, Python numbers indexes from 0, so your 5,4 has to be converted to 4,3 for Python.
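If the row number is not known in advance, one alternative is to look a record up by its values rather than by its position. The following is only a sketch: find_value and its parameters are hypothetical names, and the sample data from the question is inlined so that the snippet runs on its own.

```python
import csv
import io

# Sample data from the question; for the real file use
# open("input.csv", newline="") instead of io.StringIO(SAMPLE).
SAMPLE = """$id,name,last_name,age,phone_number,addrstr,addrnum
1,Constance,Harm,37,555-1234,Ocean_view,1
2,Homer,Simpson,40,555-1235,Evergreen_Terrace,742
3,John,Doe,35,555-1236,Fake_Street,123
4,Moe,Tavern,20,7648-4377,Walnut_Street,126
"""

def find_value(rows, wanted_column, **match):
    """Return wanted_column from the first row matching all match pairs, else None."""
    for row in rows:
        if all(row.get(key) == value for key, value in match.items()):
            return row[wanted_column]
    return None

rows = list(csv.DictReader(io.StringIO(SAMPLE)))   # read once, query many times
phone_number = find_value(rows, "phone_number", name="John", last_name="Doe")
print(phone_number)
```

As with the class above, the file is read only once; the difference is that the lookup does not depend on where the record sits in the file.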
My test1111.csv looks similar to this:
Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
I have the following code:
import re
import csv
from collections import defaultdict
# Below code places csv entries into dictionary so that they can be parsed
# by column. Then print statement prints Sales # column.
columns = defaultdict(list)
with open("c:\\test1111.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns[k].append(v)
# To print all columns, use: print(columns)
# To print a specific column, use: print(columns['ST'])
# Below line takes list output and separates into new lines
sales1 = "\n".join(columns['Sales #'])
print(sales1)
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
        open("c:\\results1111.csv", 'w') as new:
    for line in old:
        #Regex to match exactly 10 digits
        match = re.search(r'(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print(match1)
            new.writelines((match1) + '\n')
        else:
            nomatch = "No match"
            print(nomatch)
            new.writelines((nomatch) + '\n')
The first section of the code opens the original csv and prints all entries from the Sales # column to stdout with each entry in its own row.
The second section of the code opens the original csv and searches every row for a 10 digit number. When it finds one it writes each one (or no match) to each row of a new csv.
What I would like to now do is to also write the sales column data to the new csv. So ultimately, the sales column data would appear as rows in the first column and the regex data would appear as rows in the second column in the new csv. I have been having trouble getting that to work as the new.writelines won't take two arguments. Can someone please help me with how to accomplish this?
I would like the results1111.csv to look like this:
393ED3, 5555551212
585E54, 5555551213
585868, 5555551214
Starting with the second part of your code, all you need to do is concatenate the sales data within your writelines:
sales_list = sales1.split('\n')
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
        open("c:\\results1111.csv", 'w') as new:
    next(old)  # skip the header row, which has no entry in sales_list
    i = 0 # counter to add the proper sales figure
    for line in old:
        #Regex to match exactly 10 digits
        match = re.search(r'(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print(match1)
            new.writelines(str(sales_list[i])+ ',' + (match1) + '\n')
        else:
            nomatch = "No match"
            print(nomatch)
            new.writelines(str(sales_list[i])+ ',' + (nomatch) + '\n')
        i += 1
Using the counter i, you can keep track of what row you're on and use that to add the corresponding sales column figure.
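The same pairing can also be done without a counter by reading each row once and writing both values together. This is a sketch rather than a drop-in replacement: the function name sales_and_numbers is hypothetical, it assumes the sales number is always the first column, and it uses in-memory text in place of the real files. Note that csv.writer emits "393ED3,5555551212" without a space after the comma.

```python
import csv
import io
import re

# Exactly 10 digits, not preceded or followed by another digit.
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")

def sales_and_numbers(source, target):
    """Write the sales number and the 10-digit number (or 'No match') per data row."""
    reader = csv.reader(source, skipinitialspace=True)  # drop spaces after commas
    writer = csv.writer(target)
    next(reader, None)                                  # skip the header row
    for row in reader:
        if not row:                                     # ignore blank lines
            continue
        match = TEN_DIGITS.search(",".join(row))        # search every column
        writer.writerow([row[0], match.group() if match else "No match"])

# Sample data from the question, held in memory for the demo.
SAMPLE = """Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
"""
output = io.StringIO()
sales_and_numbers(io.StringIO(SAMPLE), output)
print(output.getvalue())
```

With real files, source and target would be opened with open(..., newline="") as the csv module documentation recommends.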
Just to point out that in a CSV, unless the spaces are really needed, they shouldn't be there. Your data should look like this:
Sales #,Date,Tel Number,Comment
393ED3,5/12/2010,5555551212,left message
585E54,6/15/2014,5555551213,voice mail
585868,8/16/2010,,number is 5555551214
And, to add another way of getting the same answer, you can use the pandas data analysis library for tasks involving data tables. It takes only 2 lines to achieve what you want:
>>> import pandas as pd
# Read data
>>> data = pd.read_csv('/tmp/in.cvs', index_col=0)  # DataFrame.from_csv was removed in pandas 1.0
>>> data
Date Tel Number Comment
Sales#
393ED3 5/12/2010 5555551212 left message
585E54 6/15/2014 5555551213 voice mail
585868 8/16/2010 NaN number is 5555551214
# Write data
>>> data.to_csv('/tmp/out.cvs', columns=['Tel Number'], na_rep='No match')
That last line will write to out.cvs the column Tel Number inserting No match when no telephone number is found, exactly what you want. Output file:
Sales#,Tel Number
393ED3,5555551212.0
585E54,5555551213.0
585868,No match