My test1111.csv looks similar to this:
Sales #, Date, Tel Number, Comment
393ED3, 5/12/2010, 5555551212, left message
585E54, 6/15/2014, 5555551213, voice mail
585868, 8/16/2010, , number is 5555551214
I have the following code:
import re
import csv
from collections import defaultdict
# Below code places csv entries into dictionary so that they can be parsed
# by column. Then print statement prints Sales # column.
columns = defaultdict(list)
with open("c:\\test1111.csv", "r") as f:
    reader = csv.DictReader(f)
    for row in reader:
        for (k,v) in row.items():
            columns[k].append(v)
# To print all columns, use: print columns
# To print a specific column, use: print(columns['ST'])
# Below line takes list output and separates into new lines
sales1 = "\n".join(columns['Sales #'])
print sales1
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
        open("c:\\results1111.csv", 'wb') as new:
    for line in old:
        #Regex to match exactly 10 digits
        match = re.search('(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines((match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines((nomatch) + '\n')
The first section of the code opens the original csv and prints all entries from the Sales # column to stdout with each entry in its own row.
The second section of the code opens the original csv and searches every row for a 10 digit number. When it finds one it writes each one (or no match) to each row of a new csv.
What I would like to now do is to also write the sales column data to the new csv. So ultimately, the sales column data would appear as rows in the first column and the regex data would appear as rows in the second column in the new csv. I have been having trouble getting that to work as the new.writelines won't take two arguments. Can someone please help me with how to accomplish this?
I would like the results1111.csv to look like this:
393ED3, 5555551212
585E54, 5555551213
585868, 5555551214
Starting with the second part of your code, all you need to do is concatenate the sales data within your writelines:
sales_list = sales1.split('\n')
# Below code searches all columns for a 10 digit number and outputs the
# results to a new csv file.
with open("c:\\test1111.csv", "r") as old, \
        open("c:\\results1111.csv", 'wb') as new:
    next(old) # skip the header row, which has no entry in sales_list
    i = 0 # counter to add the proper sales figure
    for line in old:
        #Regex to match exactly 10 digits
        match = re.search('(?<!\d)\d{10}(?!\d)', line)
        if match:
            match1 = match.group()
            print match1
            new.writelines(str(sales_list[i])+ ',' + (match1) + '\n')
        else:
            nomatch = "No match"
            print nomatch
            new.writelines(str(sales_list[i])+ ',' + (nomatch) + '\n')
        i += 1
Using the counter i, you can keep track of what row you're on and use that to add the corresponding sales column figure. The header line is skipped first because csv.DictReader does not put it in sales_list; without that, every sales figure would be paired with the wrong line.
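A counter-free alternative is to let the csv module parse each row once and write both values together. The sketch below is written for Python 3 (the code above is Python 2). The function name extract_sales_and_phone and its parameters are illustrative, not part of the original code, and it assumes the header looks like the one in the question, with a space after each comma.

```python
import csv
import re

# Exactly 10 consecutive digits, not part of a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


def extract_sales_and_phone(in_path, out_path, sales_column="Sales #"):
    """Write one 'sales id, phone' row per data row (hypothetical helper)."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        # skipinitialspace drops the blank that follows each comma in the sample
        reader = csv.DictReader(src, skipinitialspace=True)
        writer = csv.writer(dst)
        for row in reader:
            # Search every field, so a number inside a comment is found too
            text = ",".join(str(value) for value in row.values())
            match = PHONE_RE.search(text)
            phone = match.group() if match else "No match"
            writer.writerow([row[sales_column], phone])
```

Because the sales value and the match come from the same parsed row, there is no index to keep in step with the file.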
Just to point out that in a CSV, unless the spaces are really needed, they shouldn't be there. Your data should look like this:
Sales #,Date,Tel Number,Comment
393ED3,5/12/2010,5555551212,left message
585E54,6/15/2014,5555551213,voice mail
585868,8/16/2010,,number is 5555551214
As another way of getting the same answer, you can use the pandas data analysis library for tasks involving data tables. It takes only two lines for what you want to achieve:
>>> import pandas as pd
# Read data (DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it)
>>> data = pd.read_csv('/tmp/in.cvs', index_col=0)
>>> data
Date Tel Number Comment
Sales#
393ED3 5/12/2010 5555551212 left message
585E54 6/15/2014 5555551213 voice mail
585868 8/16/2010 NaN number is 5555551214
# Write data
>>> data.to_csv('/tmp/out.cvs', columns=['Tel Number'], na_rep='No match')
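That last line writes the Tel Number column to out.cvs, inserting No match where no telephone number is found. Because the column contains a missing value, pandas reads it as floating point, so the numbers are written with a trailing .0. Output file: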
Sales#,Tel Number
393ED3,5555551212.0
585E54,5555551213.0
585868,No match
This follows up on a previous question I posted, which got a lot of good answers. For this scenario, I want to enter more than one input at the same time at the prompt and search through a list of CSV files.
For example:
Enter your strings:
11234567, 1234568. 1234569 etc.
(I want to set the parameter to be from 1 to 20)
As for the input files, is there a way to search an entire folder for files with the .csv extension instead of hardcoding the file names in my code? That way I don't have to keep adding names to my code if I want to search through, say, 50 files. Is there a feature in Python to do this?
FYI, each input value I enter is distinct, so it cannot exist in 2 or more csv files at the same time.
Code:
import csv
files_to_open = ["C:/Users/CSV/salesreport1.csv","C:/Users/CSV//salesreport2.csv"]
data=[]
##iterate through list of files and add body to list
for file in files_to_open:
    csvfile = open(file,"r")
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)
keys_dict = {}
column = int(input("Enter the column you want to search: "))
val = input("Enter the value you want to search for in this column: ")
for row in data:
    v = row[column]
    ##adds the row to the list with the right key
    keys_dict[v] = row
if val in keys_dict:
    print(keys_dict[val])
else:
    print("Nothing was found at this column and key!")
Also one last thing, how do I show the name of the csv file as a result too?
Enter the column to search: 0
Enter values to search (separated by comma): 107561898, 107607997
['107561898', ......]
Process finished with exit code 0
107561898 is in column 0 of file 1, and 107607997 is the second value, stored in column 0 of file 2.
As you can see, the result only contains the row for the first input, whereas I want both inputs to be returned, i.e. two records.
Column 0 is where all the input values (card numbers) are.
Seeing as you want to check a large number of files, here's an example of a very simple approach that checks all the CSVs in the same folder as this script.
This script allows you to specify the column and multiple values to search for.
Sample input:
Enter the column to search: 2
Enter values to search (separated by comma): leet,557,hello
This will search the 3rd column for the words "leet" and "hello" and the number 557.
Note that columns start counting at 0, and there should be no extra spaces unless the keyword itself has a space char.
import csv
import os
# this gets all the filenames of files that end with .csv in the specified directory
# this is where the file finding happens, to answer your comment #1
path_to_dir = 'C:\\Users\\Public\\Desktop\\test\\CSV'
csv_files = [x for x in os.listdir(path_to_dir) if x.endswith('.csv')]
# get column number and cast it to an integer type
# will throw error if you don't type a number
column_number = int(input("Enter the column to search: "))
# get values list
values = input("Enter values to search (separated by comma): ").split(',')
# loops over every filename
for filename in csv_files:
    # opens the file
    with open(path_to_dir + '\\' + filename) as file:
        reader = csv.reader(file)
        # loops for every row in the file
        for row in reader:
            # checks if the entry at this row, in the specified column is one of the values we want
            if row[column_number] in values:
                # prints the row
                print(f'{filename}: {row}')
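The same search can be sketched as a reusable function that tolerates spaces after the commas in the typed values and skips rows too short to contain the requested column. The name search_csv_folder and its parameters are illustrative assumptions, not part of the script above.

```python
import csv
import glob
import os


def search_csv_folder(folder, column, values):
    """Return (filename, row) pairs whose entry in `column` is one of `values`."""
    wanted = {v.strip() for v in values}  # accept "a, b" as well as "a,b"
    hits = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                # Skip rows that are too short to have the requested column
                if len(row) > column and row[column].strip() in wanted:
                    hits.append((os.path.basename(path), row))
    return hits
```

Calling search_csv_folder(path_to_dir, column_number, values) returns every matching row together with the name of the file it came from, so one result per entered value is reported even when the values live in different files.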
I have a text file with information about 1000 students,
and I need to save each student's details in an Excel sheet.
Here's a sample of the data:
0000:
name=Jack
Age=16
Grade=90
0001:
name=Max
Age=18
Grade=85
0002:
name=Kayle
Age=17
Grade=92
I want to have a result like this:
It's quite easy using pandas and a dict:
with open('file.txt', 'r') as f:
    lines = f.readlines()
students = []
student = {}
for line in lines:
    if ':' in line:
        student['id'] = line.split(':')[0]
    elif 'name' in line:
        student['Name'] = line.split('=')[1].replace('\n','')
    elif 'Age' in line:
        student['Age'] = line.split('=')[1].replace('\n','')
    elif 'Grade' in line:
        student['Grade'] = line.split('=')[1].replace('\n','')
        students.append(student)
        print(student)
        student = {}
import pandas as pd
df = pd.DataFrame(students)
df.to_excel("output.xlsx")
print(df)
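If pandas is not available, the same parsing can be sketched with the standard library alone, writing a CSV file that Excel opens directly. The names parse_students and write_students_csv are illustrative, and the sketch assumes every record starts with a line such as 0000: followed by key=value lines, as in the sample.

```python
import csv


def parse_students(lines):
    """Turn 'id:' and 'key=value' lines into a list of dicts, one per student."""
    students = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A line such as "0000:" starts a new record
            students.append({"id": line[:-1]})
        elif "=" in line and students:
            key, value = line.split("=", 1)
            # "name" becomes "Name"; "Age" and "Grade" stay as they are
            students[-1][key.strip().capitalize()] = value.strip()
    return students


def write_students_csv(students, out_path, fields=("id", "Name", "Age", "Grade")):
    """Write the parsed records as CSV with one column per field."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fields), extrasaction="ignore")
        writer.writeheader()
        writer.writerows(students)
```

Starting a new dict on each id line means a record is kept even if one of its fields is missing; the missing cell is simply left empty in the output.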
I always use Word for such a job. With Replace, search for Paragraph Marks and replace them with a Tab-character.
E.g. replace :[paragraph mark][space][space][space][space]name= with a [tab character]. With that you get rid of all the rubbish and you end up with 0000[tab character]Jack.
When you're done with all lines of tab separated data, select all the lines with data (make sure not to select empty lines without the three tab-characters, otherwise it won't work) and click on Insert -> Table -> Insert Table... Now the data is converted into a Word table. Just copy the table to Excel and you're done.
I'm learning python and have a data set (csv file) I've been able to split the lines by comma but now I need to find the max and min value in the third column and output the corresponding value in the first column in the same row.
This is the .csv file: https://www.dropbox.com/s/fj8tanwy1lr24yk/loan.csv?dl=0
I also can't use Pandas or any external libraries; I think it would have been easier if I used them
I have written this code so far:
f = open("loanData.csv", "r")
mylist = []
for line in f:
    mylist.append(line)
newdata = []
for row in mylist:
    data = row.split(",")
    newdata.append(data)
I'd use the built-in csv library for parsing your CSV file, and then just generate a list with the 3rd column values in it:
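import csv
with open("loanData.csv", "r") as loanCsv:
    loanCsvReader = csv.reader(loanCsv)
    # Comment out if no headers
    next(loanCsvReader, None)
    loan_data = [ row[2] for row in loanCsv_reader] if False else [ row[2] for row in loanCsvReader]
# Note: the values are strings, so max/min compare them as text;
# use float(row[2]) above if the column is numeric
max_val = max(loan_data)
min_val = min(loan_data)
print("Max: {}".format(max_val))
print("Min: {}".format(min_val))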
I don't know whether your file has a header row. If it doesn't, comment out this line:
next(loanCsvReader, None)
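The question also asks for the first-column value that belongs to the minimum and maximum. A minimal sketch of that step follows, assuming the third column is numeric (a non-numeric entry would raise ValueError). The name min_max_rows and its parameters are illustrative.

```python
import csv


def min_max_rows(path, value_column=2, has_header=True):
    """Return the rows holding the smallest and largest value in value_column."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        rows = [row for row in reader if len(row) > value_column]

    def numeric_value(row):
        # Compare as numbers: as strings, "9" would sort above "10"
        return float(row[value_column])

    return min(rows, key=numeric_value), max(rows, key=numeric_value)
```

For example, lowest, highest = min_max_rows("loanData.csv") returns the two full rows, so lowest[0] and highest[0] are the corresponding first-column values.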
Something like this might work. The index would start at zero, so the third column should be 2.
min_val = min([row.split(',')[2] for row in mylist])
max_val = max([row.split(',')[2] for row in mylist])
Separately, you could probably read and reformat your data to a list with the following:
with open('loanData.csv', 'r') as f:
    data = f.read()
mylist = list(data.split('\n'))
This assumes that each row of data ends with a newline (\n). In text mode Python translates platform-specific line endings (such as \r\n on Windows) to \n when reading, so the split works regardless of the OS you're using.
I have a CSV file and am trying to go through all rows and pull out the string between two markers using Python. I am new to Python. I would then like to return the string found in a new column along with all other columns and rows. Here is a sample of how the CSV looks. I am trying to get everything between /**ZTAG and ZTAG**\
Number Assignment_Group Notes
123456 Team One "2019-06-10 16:24:36 - (Work notes)
05924267 /**ZTAG-DEVICE-HW-APPLICATION-WONT-LAUNCH-STUFF-
SENT-REPAIR-FORCE-REPROCESSED-APPLICATION-ZTAG**\
2019-05-24 16:44:48 - (Work notes)
Attachment:snippet.PNG sys_attachment
sys_id:b08bf432db69ff083bfe3a10ad961961
I have been reading about this for two days. I know I am missing something easy. I have looked at splitting the file in multiple ways.
import csv
import pandas
import re
f = open('test.csv')
csv_f = csv.reader(f)
#match = re.search("/**\ZTAG (.+?) ZTAG**\\", csv_f,flags=re.IGNORECASE)
for row in csv_f:
    #print(re.split('/**ZTAG| ',csv_f))
    #x = csv_f.split('/**ZTAG')
    match = re.search("/**\ZTAG (.+?) ZTAG**\\", csv_f,flags=re.IGNORECASE)
    print (row[0])
f.close()
I need all columns and rows returned, with a new column containing the string. Example below:
Number, Assignment_group, Notes, TAG
123456, Team One, All stuff, ZTAG-DEVICE-HW-APPLICATION-WONT-
LAUNCH-STUFF-SENT-REPAIR-FORCE-REPROCESSED-APPLICATION-
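I believe this regular expression should work. The re.DOTALL flag lets . match the line break inside the tag, which the sample data contains:
result = re.search(r"\/\**ZTAG(.*)ZTAG\**", text, re.DOTALL)
extracted_text = result.group(1)
This will give you the string below (with the original line break where the space is shown):
-DEVICE-HW-APPLICATION-WONT-LAUNCH-STUFF- SENT-REPAIR-FORCE-REPROCESSED-APPLICATION-
You can do this for each row in your for loop if necessary.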
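A minimal sketch of the per-row extraction, assuming the tag always sits between a literal /**ZTAG and ZTAG**\ as in the sample. The names TAG_PATTERN and extract_tag are illustrative, and removing the whitespace inside the tag is a choice made here so that a tag wrapped over two lines comes back as one string.

```python
import re

# Literal "/**ZTAG", the shortest possible body, then literal "ZTAG**\".
# re.DOTALL lets "." match the line break inside a wrapped tag.
TAG_PATTERN = re.compile(r"/\*\*ZTAG(.*?)ZTAG\*\*\\", re.DOTALL)


def extract_tag(notes):
    """Return the text between the ZTAG markers, or "" when there is none."""
    match = TAG_PATTERN.search(notes)
    if match is None:
        return ""
    # Drop spaces and line breaks that the wrapping introduced
    return "".join(match.group(1).split())
```

Inside the loop over csv.reader rows, something like row.append(extract_tag(row[2])) would add the tag as a new last column, assuming Notes is the third column.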
I'm having difficulty completing a coding challenge and would like some help if possible.
I have a CSV file with 1000 forenames, surnames, emails, genders and dobs (date of births) and am trying to separate each column.
My first challenge was to print the first 10 rows of the file, I did so using;
counter = 0
print("FORENAME SURNAME EMAIL GENDER DATEOFBIRTH","\n")
csvfile = open ("1000people.csv").read().splitlines()
for line in csvfile:
    print(line+"\n")
    counter = counter + 1
    if counter >= 10:
        break
This works and prints the 10 first rows as intended. The second challenge is to do the same, but in alphabetical order and only the names. I can only manage to print the first 10 rows alphabetically using:
counter = 0
print("FORENAME SURNAME EMAIL GENDER DATEOFBIRTH","\n")
csvfile = open ("1000people.csv").read().splitlines()
for line in sorted(csvfile):
    print(line+"\n")
    counter = counter + 1
    if counter >= 10:
        break
Outcome of above code:
>>>
FORENAME SURNAME EMAIL GENDER DATEOFBIRTH
Abba,Ebbings,aebbings7z#diigo.com,Male,23/06/1993
Abby,Powney,apowneynj#walmart.com,Female,01/03/1998
Abbye,Cardus,acardusji#ft.com,Female,30/10/1960
Abeu,Chaize,achaizehi#apple.com,Male,25/06/1994
Abrahan,Shuard,ashuardb5#zdnet.com,Male,09/12/1995
Adah,Lambkin,alambkinga#skyrock.com,Female,21/08/1992
Addison,Shiers,ashiersmg#freewebs.com,Male,13/07/1981
Adelaida,Sheed,asheedqh#elpais.com,Female,06/08/1976
Adelbert,Jurkowski,ajurkowski66#amazonaws.com,Male,27/10/1957
Adelice,Van Arsdall,avanarsdallqp#pagesperso-orange.fr,Female,30/06/1979
So would there be a way to separate the columns so I can print just one specific column when chosen?
Thank you for reading and replying if you do.
The csv module helps to split the columns. A pythonic way to achieve what you want would be:
import csv
with open("1000people.csv") as f:
    cr = csv.reader(f) # default sep is comma, no need to change it
    first_10_rows = [next(cr,[]) for _ in range(10)]
The next(cr,[]) instruction consumes one row in a list comprehension, and if the file has fewer than 10 rows, you'll get an empty row instead of an exception (that's the purpose of the second argument).
Now first_10_rows is a list of lists containing your values. Quotes and commas are properly stripped off thanks to the csv module.
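To print just one chosen column in alphabetical order, the parsed rows can be reduced to that column before sorting. This is a small sketch, assuming the file has no header row (the question prints its own heading); the name sorted_column and its parameters are illustrative.

```python
import csv


def sorted_column(path, column=0, limit=10):
    """Return the first `limit` values of one column, sorted alphabetically."""
    with open(path, newline="") as handle:
        # Keep only the requested field of each row; skip rows that lack it
        values = [row[column] for row in csv.reader(handle) if len(row) > column]
    return sorted(values)[:limit]
```

With the column order shown in the question, sorted_column("1000people.csv", 0) would give the first ten forenames and sorted_column("1000people.csv", 1) the first ten surnames.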