Python row.get(regex)

Python row.get(regex) - python

I have been searching for this for a few hours now and I can't seem to find anything about it.
I am parsing a .csv file and I need to pull email addresses out of it. The headers indicate whether it is an email or not, but it is possible for the files to have different formats and I want to handle this with a regex, but I don't know how to do this.
A part of my code is below:
input_file = csv.DictReader(open("contacts.csv"))
for row in input_file:
if row.get('E-mail 1 - Value'):
print row.get('E-mail 1 - Value')
elif row.get('E-mail 2 - Value'):
print row.get('E-mail 2 - Value')
I would like to be able to do something like this:
input_file = csv.DictReader(open("contacts.csv"))
for row in input_file:
if row.get('E-mail*'):
print row.get('E-mail*')
Where it will grab anything that has a header starting with email, but I can't seem to figure out how. I tried to use re.search, but it wasn't what I needed since I don't know how to give it the input string. Thanks in advance for any help!
William

Here is an example that prints all the contents of the columns that start with "E-mail" for every row:
input_file = csv.DictReader(open("contacts.csv"))
for row in input_file:
for column in row.keys():
if column.startswith("E-mail"):
print row[column]

Related

Extract specific portion of a text file in python 3.x

How do I make the if statement to read from a specific location of a text file and stop at a specific point and then print it out. for example, printing out one patient's data, not all the list. beginner programmer here. thank you
ID = input("please enter a refernce id to search for the patient : ")
info = open("data.txt", 'r')
if ID in info:
# This should return only one patient's information not all the text file
else:
print("not in file")
info.close()

We would need to know the specific details of how the file is formatted to give an exact answer, but here is one way that may be helpful.
Firstly, your 'info' is right now just a TextIOWrapper object. You can tell by running print(type(info)). You need to make it info = open('data.txt', 'r').read() to give you a string of the text, or info = open('data.txt', 'r').readlines() to give you a list of the text by line, if the format is just plain text.
Assuming the data looks something like this:
Patient: Charlie
Age = 99
Description: blah blah blah
Patient: Judith
Age: 100
Description: blah blah blahs
You can do the following:
First, find and store the index of the ID you are looking for. Secondly, find and store the index of some string that denotes a new ID. In this case, that's the word 'Patient'. Lastly, return the string between those two indices.
Example:
ID = input("please enter a reference id to search for the patient: ")
info = open("data.txt", 'r').read()
if ID in info:
#find() returns the beginning index of a string
f = info.find(ID)
goods = info[f:]
l = goods.find('Patient')
goods = goods[:l]
print(goods)
else:
print("not in file")
Something along those lines should do the trick. There are probably better ways depending on the structure of the file. Things can go wrong if the user input is not specific enough, or the word patient is scattered in the descriptions, but the idea remains the same. You should do some error handling for the input, as well. I hope that helps! Good luck with your project.

Python: Writing peoples scores to individual lines

I have a task where I need to record peoples scores in a text file. My Idea was to set it out like this:
Jon: 4, 1, 3
Simon: 1, 3, 6
This has the name they inputted along with their 3 last scores (Only 3 should be recorded).
Now for my question; Can anyone point me in the right direction to do this? Im not asking for you to write my code for me, Im simply asking for some tips.
Thanks.
Edit: Im guessing it would look something like this: I dont know how I'd add scores after their first though like above.
def File():
score = str(Name) + ": " + str(correct)
File = open('Test.txt', 'w+')
File.write(score)
File.close()
Name = input("Name: ")
correct = input("Number: ")
File()

You could use pandas to_csv() function and store your data in a dictionary. It will be much easier than creating your own format.
from pandas import DataFrame, read_csv
import pandas as pd
def tfile(names):
df = DataFrame(data = names, columns = names.keys())
with open('directory','w') as f:
f.write(df.to_string(index=False, header=True))
names = {}
for i in xrange(num_people):
name = input('Name: ')
if name not in names:
names[name] = []
for j in xrange(3):
score = input('Score: ')
names[name].append(score)
tfile(names)
Simon Jon
1 4
3 1
6 3
This should meet your text requirement now. It converts it to a string and then writes the string to the .txt file. If you need to read it back in you can use pandas read_table(). Here's a link if you want to read about it.

Since you are not asking for the exact code, here is an idea and some pointers
Collect the last three scores per person in a list variable called last_three
do something like:
",".join(last_three) #this gives you the format 4,1,3 etc
write to file an entry such as
name + ":" + ",".join(last_three)
You'll need to do this for each "line" you process
I'd recommend using with clause to open the file in write mode and process your data (as opposed to just an "open" clause) since with handles try/except/finally problems of opening/closing file handles...So...
with open(my_file_path, "w") as f:
for x in my_formatted_data:
#assuming x is a list of two elements name and last_three elems (example: [Harry, [1,4,5]])
name, last_three = x
f.write(name + ":" + ",".join(last_three))
f.write("\n")# a new line
In this way you don't really need to open/close file as with clause takes care of it for you

How do I handle closing double quotes in CSV column with python?

This is the python script:
f = open('csvdata.csv','rb')
fo = open('out6.csv','wb')
for line in f:
bits = line.split(',')
bits[1] = '"input"'
fo.write( ','.join(bits) )
f.close()
fo.close()
I have a CSV file and I'm replacing the content of the 2nd column with the string "input". However, I need to grab some information from that column content first.
The content might look like this:
failurelog_wl","inputfile/source/XXXXXXXX"; "**X_CORD2**"; "Invoice_2M";
"**Y_CORD42**"; "SIZE_ID37""
It has weird type of data as you can see, especially that it has 2 double quotes at the end of the line instead of just one that you would expect.
I need to extract the XCORD and YCORD information, like XCORD = 2 and YCORD = 42, before replacing the column value. I then want to insert an extra column, named X_Y, which represents (2_42).
How can I modify my script to do that?

If I understand your question correctly, you can use a simple regular expression to pull out the numbers you want:
import re
f = open('csvdata.csv','rb')
fo = open('out6.csv','wb')
for line in f:
bits = line.split(',')
x_y_matches = re.match('.*X_CORD(\d+).*Y_CORD(\d+).*', bits[1])
assert x_y_matches is not None, 'Line had unexpected format: {0}'.format(bits[1])
x_y = '({0}_{1})'.format(x_y_matches.group(1), x_y_matches.group(2))
bits[1] = '"input"'
bits.append(x_y)
fo.write( ','.join(bits) )
f.close()
fo.close()
Note that this will only work if column 2 always says 'X_CORD' and 'Y_CORD' immediately before the numbers. If it is sometimes a slightly different format, you'll need to adjust the regular expression to allow for that. I added the assert to give a more useful error message if that happens.
You mentioned wanting the column to be named X_Y. Your script appears to assume that there is no header, and my modified version definitely makes this assumption. Again, you'd need to adjust for that if there is a header line.
And, yes, I agree with the other commenters that using the csv module would be cleaner, in general, for reading and writing csv files.

How to Store Rows from CSV File into Python and Print Data with HTML

Basically my problem is this: I have a CSV excel file with info on Southpark characters and I and I have an HTML template and what I have to do is take the data by rows (stored in lists) for each character and using the HTML template given implement that data to create 5 seperate HTML pages with the characters last names.
Here is an image of the CSV file: i.imgur.com/rcIPW.png
This is what I have so far:
askfile = raw_input("What is the filename?")
southpark = []
filename = open(askfile, 'rU')
for row in filename:
print row[0:105]
filename.close()
The above prints out all the info on the IDLE shell in five rows but I have to find a way to separate each row AND column and store it into a list (which I don't know how to do). It's pretty rudimentary code I know I'm trying to figure out a way to store the rows and columns first, then I will have to use a function (def) to first assign the data to the HTML template and then create an HTML file from that data/template..and I'm so far a noob I tried searching through the net but I just don't understand the stuff.
I am not allowed to use any downloadable modules but I can use things built in Python like import csv or whatnot, but really its supposed to be written with a couple functions, list, strings, and loops..
Once I figure out how to separate the rows and columns and store them then I can work on implementing into HTML template and creating the file.
I'm not trying to have my HW done for me it's just that I pretty much suck at programming so any help is appreciated!
BTW I am using Python 2.7.2 and if you want to DL the CSV file click here.
UPDATE:
Okay, thanks a lot! That helped me understand what each row was printing and what info is being read by the program. Now since I have to use functions in this program somehow this is what I was thinking.
Each row (0-6) prints out separate values, but just the print row function prints out one character and all his corresponding values which is what I need. What I want is to print out data like "print row" would but I have to store each of those 5 characters in a separate list.
Basically "print row" prints out all 5 characters with each of their corresponding attributes, how can I split each of them into 5 variables and store them as a list?
When I do print row[0] it only prints out the names, or print row1 only prints the DOB. I was thinking of creating a def function that takes only print "row" and splits into 5 variables in a loop and then another def function takes those variables/lists of data and combines them with the HTML template, and at the end I have to figure out how to create HTML files in Python..
Sorry if I sound confusing just trying to make sense of it all. This is my code right now it gives an error that there are too many values to unpack but I am just trying to fiddle around and try different things and see if they work. Based on what I wanted to do above I will probably have to delete all most of this code and find a way to rewrite it with list type functions like .append or .strip, etc which I am not very familiar with..
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
# List of Data
name, dob, descript, phrase, personality, character, apparel = []
count = 0
def southparkinfo():
for row in reader:
count += 1
if count == 0:
row[0] = name
print row[0] # Name (ex. Stan Marsh)
print "----------------"
elif count == 1:
row[1] = dob
print row[1] # DOB
print "----------------"
elif count == 2:
row[2] = descript
print row[2] # Descriptive saying (ex. Respect My Authoritah!)
print "----------------"
elif count == 3:
row[3] = phrase
print row[3] # Catch Phrase (ex. Mooom!)
print "----------------"
elif count == 4:
row[4] = personality
print row[4] # Personality (ex. Jewish)
print "----------------"
elif count == 5:
row[5] = character
print row[5] # Characteristic (ex. Politically incorrect)
print "----------------"
elif count == 6:
row[6] = apparel
print row[6] # Apparel (ex. red gloves)
return
reader.close()

First and foremost, have a look at the CSV docs.
Once you understand the basics take a look at this code. This should get you started on the right path:
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
for row in reader:
#will print each row by itself (all columns from names up to what they wear)
print row
print "-----------------"
#will print first column (character names only)
print row[0]
You want to import csv module so you can work with the CSV filetype. Open the file in universal newline mode and read it with csv.reader. Then you can use a for loop to begin iterating through the rows depending on what you want. The first print row will print a single line of all a single character's data (ie: everything from their name up to their clothing type) like so:
['Stan Marsh', 'DOB: October 19th', 'Dude!', 'Aww #$%^!', 'Star Quarterback', 'Wendy', 'red gloves']
-----------------
['Kyle Broflovski', 'DOB: May 26th', 'Kick the baby!', 'You ***!', 'Jewish', 'Canadian', 'Ushanka']
-----------------
['Eric Theodore Cartman', 'DOB: July 1', 'Respect My Authroitah!', 'Mooom!', 'Big-boned', 'Political
ly incorrect', 'Knit-cap!']
-----------------
['Kenny McCormick', 'DOB: March 22', 'DOD: Every other week', 'Mmff Mmff', 'MMMFFF!!!', 'Mysterion!'
, 'Orange Parka']
-----------------
['Leopold Butters Stotch', 'DOB:Younger than the others!', 'The 4th friend', 'Professor chaos', 'stu
tter', 'innocent', 'nerdy']
-----------------
Finally, the second statement print row[0] will provide you with the character names only. You can change the number and you'll be able to grab the other data as necessary. Remember, in a CSV file everything starts at 0, so in your case you can only go up to 6 because A=0, B=1, C=2, etc... To see these outputs more clearly, it's probably best if you comment out one of the print statements so you get a clearer picture of what you are grabbing.
-----------------
Stan Marsh
-----------------
Kyle Broflovski
-----------------
Eric Theodore Cartman
-----------------
Kenny McCormick
-----------------
Leopold Butters Stotch
Note I threw in that print "-----------------" so you would be able to see the different outputs.
Hope this helps you get you off to a start.
Edit To answer your second question: The easiest way (although probably not the best way) to grab all of a single character's info would be to do something like this:
import csv
original = file('southpark.csv', 'rU')
reader = csv.reader(original)
stan = reader.next()
kyle = reader.next()
eric = reader.next()
kenny = reader.next()
butters = reader.next()
print eric
which outputs:
['Eric Theodore Cartman', 'DOB: July 1', 'Respect My Authroitah!', 'Mooom!', 'Big-boned', 'Politically incorrect', 'Knit-cap!']
Take note that if your CSV is modified such that the order of the characters are moved (ex: butters is moved to top) you will output the info of another character.

A solution to remove the duplicates?

My code is below. Basically, I've got a CSV file and a text file "input.txt". I'm trying to create a Python application which will take the input from "input.txt" and search through the CSV file for a match and if a match is found, then it should return the first column of the CSV file.
import csv
csv_file = csv.reader(open('some_csv_file.csv', 'r'), delimiter = ",")
header = csv_file.next()
data = list(csv_file)
input_file = open("input.txt", "r")
lines = input_file.readlines()
for row in lines:
inputs = row.strip().split(" ")
for input in inputs:
input = input.lower()
for row in data:
if any(input in terms.lower() for terms in row):
print row[0]
Say my CSV file looks like this:
book title, author
The Rock, Herry Putter
Business Economics, Herry Putter
Yogurt, Daniel Putter
Short Story, Rick Pan
And say my input.txt looks like this:
Herry
Putter
Therefore when I run my program, it prints:
The Rock
Business Economics
The Rock
Business Economics
Yogurt
This is because it searches for all titles with "Herry" first, and then searches all over again for "Putter". So in the end, I have duplicates of the book titles. I'm trying to figure out a way to remove them...so if anyone can help, that would be greatly appreciated.

If original order does not matter, then stick the results into a set first, and then print them out at the end. But, your example is small enough where speed does not matter that much.

Stick the results in a set (which is like a list but only contains unique elements), and print at the end.
Something like;
if any(input in terms.lower() for terms in row):
if not row[0] in my_set:
my_set.add(row[0])

During the search stick results into a list, and only add new results to the list after first searching the list to see if the result is already there. Then after the search is done print the list.

First, get the set of search terms you want to look for in a single list. We use set(...) here to eliminate duplicate search terms:
search_terms = set(open("input.txt", "r").read().lower().split())
Next, iterate over the rows in the data table, selecting each one that matches the search terms. Here, I'm preserving the behavior of the original code, in that we search for the case-normalized search term in any column for each row. If you just wanted to search e.g. the author column, then this would need to be tweaked:
results = [row for row in data
if any(search_term in item.lower()
for item in row
for search_term in search_terms)]
Finally, print the results.
for row in results:
print row[0]
If you wanted, you could also list the authors or any other info in the table. E.g.:
for row in results:
print '%30s (by %s)' % (row[0], row[1])

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python row.get(regex) - python

Here is an example that prints all the contents of the columns that start with "E-mail" for every row: input_file = csv.DictReader(open("contacts.csv")) for row in input_file: for column in row.keys(): if column.startswith("E-mail"): print row[column]

Related

Extract specific portion of a text file in python 3.x

Python: Writing peoples scores to individual lines

How do I handle closing double quotes in CSV column with python?

How to Store Rows from CSV File into Python and Print Data with HTML

A solution to remove the duplicates?

Categories

Resources