Read specific columns using python

Read specific columns using python - python

I have a file like this
SEQ_NUM|ICS_ORIG_STRT_DT|EDW_FIRST_OUT_IFP_DT|CURR_DT|DEV_GE_NUM_DAYS|DEV_LE_NUM_DAYS|FILENAME|CAMPAIGN_NAME_DESC| CAMPAIGN_WAVE|MARKET_SEGMENT|CAMPAIGN_NAME|CAMPAIGN_WAVE_RUN|EFFORT_TYPE|EFFORT_NUM|UU_ID|PRINT_ACCT_NUM|PRINT_PUB_CD|PREFIX|SUFFIX|FIRST_NAME|LAST_NAME|EMAIL|PHONE_NUM|BUS_PHONE|CO_NAME|STREET_NUM|ADDR|ADDR2|CITY|STATE_PROVINCE|ZIP_POSTAL|ZIP4|TRACK_CD|VANITY_URL|BILL_FORM|LETTER_TEXT|OUTER
130|20140401|00010101|20140728|85||Apr14WSJ_CNYR_NOEMAIL_CAP_TM_20140728.txt|Apr14WSJ_CNYR_NOEMAIL_CAP_TM|WSJ_CNYR_NOEMAIL_CAP_TM|CNYR|WSJ_CNYR_NOEMAIL_CAP|Apr14|TM|||032714296269|J|||ARTHUR|MURPHY||9784255147|||46|LANTERN###WAY||SHIRLEY|MA|01464|2136|aaqecw0c||||
I am trying to get PRINT_PUB_CODE =130,PRINT_ACCT_NUM= 20140401 CO_NAME = 00010101 and prefix = 20140728 and so on.
I am new to python and tried this code along with other codes but the the results coming like
130
20140401
None
aaqecw0c
Please let me know where I am doing wrong.What I can do to fix this.
The code is:
#!/usr/bin/env python
import csv
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
with open('temp1.txt','rb') as csvfile:
for row in csv.DictReader(csvfile, dialect='piper'):
print row['PRINT_PUB_CODE']
print row['PRINT_ACCT_NUM']
print row['CO_NAME']
print row['PREFIX']
original file is at http://pastebin.com/QFvLwcHu

print is writing out exactly what you tell it to, namely the values themselves. If you want name/value pairs, you'll need to arrange to format it yourself.
Try this in place of the existing print statements:
names = ('PRINT_PUB_CODE', 'PRINT_ACCT_NUM', 'CO_NAME', 'PREFIX')
print(", ".join("%s=%r" % (n, row[n]) for n in names))
Here's how the above code works:
", ".join(sequence) takes a sequence of strings and concatenates them together with ", " between each value.
% is the string formatting operator, which takes the string on the left and replaces patterns like %s and %r with formatted values from the tuple on the right
%s is the format code for a string, and it's used for the name in front of the = sign.
%r is the format code for "string representation of," or repr, and it's used to convert the value in the row to a string.
for n in names is part of a list comprehension, making Python evaluate the % operator once for each element in the names tuple.

Related

eliminate text after certain character in python pipeline- with slice?

This is a short script I've written to refine and validate a large dataset that I have.
# The purpose of this script is the refinement of the job data attained from the
# JSI as it is rendered by the `csv generator` contributed by Luis for purposes
# of presentation on the dashboard map.
import csv
# The number of columns
num_headers = 9
# Remove invalid characters from records
def url_escaper(data):
for line in data:
yield line.replace('&','&')
# Be sure to configure input & output files
with open("adzuna_input_THRESHOLD.csv", 'r') as file_in, open("adzuna_output_GO.csv", 'w') as file_out:
csv_in = csv.reader( url_escaper( file_in ) )
csv_out = csv.writer(file_out)
# Get rid of rows that have the wrong number of columns
# and rows that have only whitespace for a columnar value
for i, row in enumerate(csv_in, start=1):
if not [e for e in row if not e.strip()]:
if len(row) == num_headers:
csv_out.writerow(row)
else:
print "line %d is malformed" % i
I have one field that is structured like so:
finance|statistics|lisp
I've seen ways to do this using other utilities like R, but I want to ideally achieve the same effect within the scope of this python code.
Maybe I can iterate over all the characters of all the columnar values, perhaps as a list, and if I see a | I can dispose of the | and all the text that follows it within the scope of the column value.
I think surely it can be achieved with slices as they do here but I don't quite understand how the indices with slices work- and I can't see how I could include this process harmoniously within the cascade of the current script pipeline.
With regex I guess it's something like this
(?:|)(.*)

Why not use string's split method?
In[4]: 'finance|statistics|lisp'.split('|')[0]
Out[4]: 'finance'
It does not fail with exception when you do not have separator character in the string too:
In[5]: 'finance/statistics/lisp'.split('|')[0]
Out[5]: 'finance/statistics/lisp'

Looping a write command to output many different indices from a list separately in Python

Im trying to get an output like:
KPLR003222854-2009131105131
in a text file. The way I am attempting to derive that output is as such:
with open('Processed_Data.txt', 'r') as file_P, open('KIC_list.txt', 'w') as namelist:
nameData = []
for line in file_P:
splt_file_P = line.split()
nameData.append(splt_file_P[0])
for key in nameData:
namelist.write('\n' 'KPLR00' + "".join(str(w) for w in nameData) + '-2009131105131')
However I am having an issue in that the numbers in the nameData array are all appearing at once in the specified output, instead of using on ID cleanly as shown above the output is something like this:
KPLR00322285472138721382172198371823798123781923781237819237894676472634973256279234987-2009131105131
So my question is how do I loop the write command in a way that will allow me to get each separate ID (each has a specific index value, but there are over 150) to be properly outputted.
EDIT:
Also, some of the ID's in the list are not the same length, so I wanted to add 0's to the front of the 'key' to make them all equal 9 digits. I cheated this by adding the 0's into the KPLR in quotes but not all of the ID's need just two 0's. The question is, could I add 0's between KPLR and the key in any way to match the 9-digit format?

Your code looks like it's working as one would expect: "".join(str(w) for w in nameData) makes a string composed of the concatenation of every item in nameData.
Chances are you want;
for key in nameData:
namelist.write('\n' 'KPLR00' + key + '-2009131105131')
Or even better:
for key in nameData:
namelist.write('\nKPLR%09i-2009131105131'%int(key)) #no string concatenation
String concatenation tends to be slower, and if you're not only operating on strings, will involve explicit calls to str. Here's a pair of ideone snippets showing the difference: http://ideone.com/RR5RnL and http://ideone.com/VH2gzx
Also, the above form with the format string '%09i' will pad with 0s to make the number up to 9 digits. Because the format is '%i', I've added an explicit conversion to int. See here for full details: http://docs.python.org/2/library/stdtypes.html#string-formatting-operations
Finally, here's a single line version (excepting the with statement, which you should of course keep):
namelist.write("\n".join("KPLR%09i-2009131105131"%int(line.split()[0]) for line in file_P))

You can change this:
"".join(str(w) for w in nameData)
to this:
",".join(str(w) for w in nameData)
Basically, the "," will comma delimit the elements in your nameData list. If you use "", then there will be nothing to separate the elements, so they appear all at once. You can change the delimiter to suit your needs.

Just for kicks:
with open('Processed_Data.txt', 'r') as file_P, open('KIC_list.txt', 'w') as namelist:
nameData = [line.split()[0] for line in file_P]
namelist.write("\n".join("KPLR00" + str(key) + '-2009131105131' for key in nameData))
I think that will work, but I haven't tested it. You can make it even smaller/uglier by not using nameData at all, and just use that list comprehension right in its place.

How do i format the ouput of a list of list into a textfile properly?

I am really new to python and now I am struggeling with some problems while working on a student project. Basically I try to read data from a text file which is formatted in columns. I store the data in a list of list and sort and manipulate the data and write them into a file again. My problem is to align the written data in proper columns. I found some approaches like
"%i, %f, %e" % (1000, 1000, 1000)
but I don't know how many columns there will be. So I wonder if there is a way to set all columns to a fixed width.
This is how the input data looks like:
2 232.248E-09 74.6825 2.5 5.00008 499.482
5 10. 74.6825 2.5 -16.4304 -12.3
This is how I store the data in a list of list:
filename = getInput('MyPath', workdir)
lines = []
f = open(filename, 'r')
while 1:
line = f.readline()
if line == '':
break
splitted = line.split()
lines.append(splitted)
f.close()
To write the data I first put all the row elements of the list of list into one string with a free fixed space between the elements. But instead i need a fixed total space including the element. But also I don't know the number of columns in the file.
for k in xrange(len(lines)):
stringlist=""
for i in lines[k]:
stringlist = stringlist+str(i)+' '
lines[k] = stringlist+'\n'
f = open(workdir2, 'w')
for i in range(len(lines)):
f.write(lines[i])
f.close()
This code works basically, but sadly the output isn't formatted properly.
Thank you very much in advance for any help on this issue!

You are absolutely right about begin able to format widths as you have above using string formatting. But as you correctly point out, the tricky bit is doing this for a variable sized output list. Instead, you could use the join() function:
output = ['a', 'b', 'c', 'd', 'e',]
# format each column (len(a)) with a width of 10 spaces
width = [10]*len(a)
# write it out, using the join() function
with open('output_example', 'w') as f:
f.write(''.join('%*s' % i for i in zip(width, output)))
will write out:
' a b c d e'
As you can see, the length of the format array width is determined by the length of the output, len(a). This is flexible enough that you can generate it on the fly.
Hope this helps!

String formatting might be the way to go:
>>> print("%10s%9s" % ("test1", "test2"))
test1 test2
Though you might want to first create strings from those numbers and then format them as I showed above.
I cannot fully comprehend your writing code, but try working on it somehow like that:
from itertools import enumerate
with open(workdir2, 'w') as datei:
for key, item in enumerate(zeilen):
line = "%4i %6.6" % key, item
datei.write(item)

Get a value from a string in python

Program Details:
I am writing a program for python that will need to look through a text file for the line:
Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.
Problem:
Then after the program has found that line, it will then store the line into an array and get the value 19.612545, from f = 19.612545.
Question:
I so far have been able to store the line into an array after I have found it. However I am having trouble as to what to use after I have stored the string to search through the string, and then extract the information from variable f. Does anyone have any suggestions or tips on how to possibly accomplish this?

Depending upon how you want to go at it, CosmicComputer is right to refer you to Regular Expressions. If your syntax is this simple, you could always do something like:
line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
splitByComma=line.split(',')
fValue = splitByComma[1].replace('f= ', '').strip()
print(fValue)
Results in 19.612545 being printed (still a string though).
Split your line by commas, grab the 2nd chunk, and break out the f value. Error checking and conversions left up to you!

Using regular expressions here is maddness. Just use string.find as follows: (where string is the name of the variable the holds your string)
index = string.find('f=')
index = index + 2 //skip over = and space
string = string[index:] //cuts things that you don't need
string = string.split(',') //splits the remaining string delimited by comma
your_value = string[0] //extracts the first field
I know its ugly, but its nothing compared with RE.

How do I alphabetize a file in Python?

I am trying to get a list of presidents alphabetized by last name, even though the file that it is being drawn is currently listed first name, last name, date in office, and date out of office.
Here is what I have, any help on what I need to do with this. I have searched around for some answers, and most of them are beyond my level of understanding. I feel like I am missing something small. I tried to break them all out into a list, and then sort them, but I could not get it to work, so this is where I started from.
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE = 'president_NEW.txt'
OUTPUT_FILE2 = 'president_NEW2.txt'
def main():
infile = open(INPUT_FILE)
outfile = open(OUTPUT_FILE, 'w')
outfile2 = open(OUTPUT_FILE2,'w')
stuff = infile.readline()
while stuff:
stuff = stuff.rstrip()
data = stuff.split('\t')
president_First = data[1]
president_Last = data[0]
start_date = data[2]
end_date = data[3]
sentence = '%s %s was president from %s to %s' % \
(president_First,president_Last,start_date,end_date)
sentence2 = '%s %s was president from %s to %s' % \
(president_Last,president_First,start_date, end_date)
outfile2.write(sentence2+ '\n')
outfile.write(sentence + '\n')
stuff = infile.readline()
infile.close()
outfile.close()
main()

What you should do is put the presidents in a list, sort that list, and then print out the resulting list.
Before your for loop add:
presidents = []
Have this code inside the for loop after you pull out the names/dates
president = (last_name, first_name, start_date, end_date)
presidents.append(president)
After the for loop
presidents.sort() # because we put last_name first above
# it will sort by last_name
Then print it out:
for president in presidents
last_name, first_name, start_date, end_date = president
string1 = "..."
It sounds like you tried to break them out into a list. If you had trouble with that, show us the code that resulting from that attempt. It was right way to approach the problem.
Other comments:
Just a couple of points where you code could be simpler. Feel free to ignore or use this as you want:
president_First=data[1]
president_Last= data[0]
start_date=data[2]
end_date=data[3]
can be written as:
president_Last, president_First, start_date, end_date = data
stuff=infile.readline()
And
while stuff:
stuff=stuff.rstrip()
data=stuff.split('\t')
...
stuff = infile.readline()
can be written as:
for stuff in infile:
...

#!/usr/bin/env python
# this sounds like a homework problem, but ...
from __future__ import with_statement # not necessary on newer versions
def main():
# input
with open('presidents.txt', 'r') as fi:
# read and parse
presidents = [[x.strip() for x in line.split(',')] for line in fi]
# sort
presidents = sorted(presidents, cmp=lambda x, y: cmp(x[1], y[1]))
# output
with open('presidents_out.txt', 'w') as fo:
for pres in presidents:
print >> fo, "president %s %s was president %s %s" % tuple(pres)
if __name__ == '__main__':
main()

I tried to break them all out into a list, and then sort them
What do you mean by "them"?
Breaking up the line into a list of items is a good start: that means you treat the data as a set of values (one of which is the last name) rather than just a string. However, just sorting that list is no use; Python will take the 4 strings from the line (the first name, last name etc.) and put them in order.
What you want to do is have a list of those lists, and sort it by last name.
Python's lists provide a sort method that sorts them. When you apply it to the list of president-info-lists, it will sort those. But the default sorting for lists will compare them item-wise (first item first, then second item if the first items were equal, etc.). You want to compare by last name, which is the second element in your sublists. (That is, element 1; remember, we start counting list elements from 0.)
Fortunately, it is easy to give Python more specific instructions for sorting. We can pass the sort function a key argument, which is a function that "translates" the items into the value we want to sort them by. Yes, in Python everything is an object - including functions - so there is no problem passing a function as a parameter. So, we want to sort "by last name", so we would pass a function that accepts a president-info-list and returns the last name (i.e., element [1]).
Fortunately, this is Python, and "batteries are included"; we don't even have to write that function ourself. We are given a magical tool that creates functions that return the nth element of a sequence (which is what we want here). It's called itemgetter (because it makes a function that gets the nth item of a sequence - "item" is more usual Python terminology; "element" is a more general CS term), and it lives in the operator module.
By the way, there are also much neater ways to handle the file opening/closing, and we don't need to write an explicit loop to handle reading the file - we can iterate directly over the file (for line in file: gives us the lines of the file in turn, one each time through the loop), and that means we can just use a list comprehension (look them up).
import operator
def main():
# We'll set up 'infile' to refer to the opened input file, making sure it is automatically
# closed once we're done with it. We do that with a 'with' block; we're "done with the file"
# at the end of the block.
with open(INPUT_FILE) as infile:
# We want the splitted, rstripped line for each line in the infile, which is spelled:
data = [line.rstrip().split('\t') for line in infile]
# Now we re-arrange that data. We want to sort the data, using an item-getter for
# item 1 (the last name) as the sort-key. That is spelled:
data.sort(key=operator.itemgetter(1))
with open(OUTPUT_FILE) as outfile:
# Let's say we want to write the formatted string for each line in the data.
# Now we're taking action instead of calculating a result, so we don't want
# a list comprehension any more - so we iterate over the items of the sorted data:
for item in data:
# The item already contains all the values we want to interpolate into the string,
# in the right order; so we can pass it directly as our set of values to interpolate:
outfile.write('%s %s was president from %s to %s' % item)

I did get this working with Karls help above, although I did have to edit the code to get it to work for me, due to some errors I was getting. I eliminated those and ended up with this.
import operator
INPUT_FILE = 'presidents.txt'
OUTPUT_FILE2= 'president_NEW2.txt'
def main():
with open(INPUT_FILE) as infile:
data = [line.rstrip().split('\t') for line in infile]
data.sort(key=operator.itemgetter(0))
outfile=open(OUTPUT_FILE2,'w')
for item in data:
last=item[0]
first=item[1]
start=item[2]
end=item[3]
outfile.write('%s %s was president from %s to %s\n' % (last,first,start,end))
main()

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Read specific columns using python - python

Related

eliminate text after certain character in python pipeline- with slice?

Looping a write command to output many different indices from a list separately in Python

How do i format the ouput of a list of list into a textfile properly?

Get a value from a string in python

How do I alphabetize a file in Python?

Categories

Resources