Storing last n values for x in txt file - python

If I have a program that asks for a user's name and score, opens a .txt file, searches the file for their name like:
for x in f.readlines():
    if name in x.strip():
        # etc
If the name is found it has to edit that line and add the new score, but it must only store x number of scores, say the last 4 scores. So if 4 scores are already stored, it must delete the oldest one so as to only keep the latest 4 scores.
If the name isn't found then it's just a simple append to end of file.
How would I accomplish this?

I would read the file and parse it into some kind of data structure (e.g. a dictionary), then update the values in the dict and afterwards write it back. It may not be the most efficient way, but it's the way I've used most.

First, you need to figure out the format of the line. How will you separate names and scores? I'm using a comma.
for line in f:  # readlines() is not necessary here
    if name in line:
        name, s = line.split(',', 1)
        scores = s.rstrip('\n').split(',')
        if len(scores) >= 4:  # already at the limit, so
            del scores[0]     # remove the oldest one
        scores.append(the_new_score)
        new_line = "%s,%s" % (name, ','.join(scores))
And do something with new_line.
(note: this is just a very quick idea for how to accomplish this. This does not update the file in-place nor write out the results anywhere. That is left as an exercise to the reader, unless you actually need help with that part too)
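For what it's worth, a minimal sketch of that write-back step, assuming the name,score1,score2,... format used above (the function name, path handling and newline handling are just for illustration):

def update_scores(path, name, new_score, keep=4):
    # Read the whole file up front so it can be rewritten in one pass.
    with open(path) as f:
        lines = [line.rstrip('\n') for line in f]
    found = False
    for i, line in enumerate(lines):
        if line.split(',', 1)[0] == name:
            scores = line.split(',')[1:]
            scores.append(str(new_score))
            # keep only the latest `keep` scores
            lines[i] = ','.join([name] + scores[-keep:])
            found = True
            break
    if not found:
        # name not found: simple append to the end of the file
        lines.append('%s,%s' % (name, new_score))
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')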

Related

Replacing text in a file with csv file

I want to replace text in a text file with names that are in a csv file. For example, I have a text file that says:
Dear [name here],
Hello . . .. etc etc
And a csv file that has 2 columns, with the first name in the first and the last name in the second:
Joe Smith
Rachel Cool
How would I be able to read in the CSV file and replace the text [name here] with each name? I have this so far, after opening the text and csv files and putting the first and last names in the variable names:
for row in f.readlines():
    names = row.split(",")[0], row.split(",")[1]
And after that I tried doing something like this, but it isn't working:
for row in textfile.readlines():
    print row.replace("[name here]", names)
After running it I get a TypeError: expected a string or other character buffer. I'm assuming that's because the variable names isn't defined in the second for loop.
But how would I be able to read both files and replace just the [name here] in the text file?
Answering your question
From this line,
names = row.split(",")[0], row.split(",")[1]
names is a tuple of two strings. But then, in this line
row.replace("[name here]", names)
you're trying to use it as a single string.
If you want to write both first and family names, you can append them as suggested in another answer, but why split them in the first place? You may just read them in the first loop without splitting (which also handles cases where a person has three names, like Herbert George Wells):
names = row
in the first loop, then
row.replace("[name here]", names)
in the second loop.
If you split the names because you want to use them separately, then it is unclear to me what you want to achieve.
Please include tracebacks in questions
The hint here is in the traceback:
TypeError: expected a string or other character buffer error
replace expects a string or something that acts like a string. You're feeding it a tuple. Generally, when asking a question here, it is good practice to copy the full traceback so that we know where the error occurs exactly.
Next issue
Now, you're going to face another issue: in the loop, you erase names at each iteration.
You should use a list to store all names:
names = []
for row in f.readlines():
    names.append(row)  # build the full list of names first

i = 0
for row in textfile.readlines():
    print(row.replace("[name here]", names[i]))
    i += 1
(Or maybe you did this correctly in your code but you stripped it away when generating a simplified example for your question.)
The replace() method is expecting a string for the second parameter and you've passed it a tuple, hence the TypeError.
You can take the two names in the names tuple and join them together with a space in the middle using the join method. Try this:
print row.replace("[name here]", ' '.join(names))
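Putting the pieces together, a minimal end-to-end sketch; the filenames are placeholders, and it assumes the CSV really is comma-separated and that you want one filled-in copy of the letter printed per name:

# Read every name up front; joining the split parts with a space
# handles both first/last columns and extra middle names.
with open('names.csv') as f:
    names = [' '.join(part.strip() for part in line.split(','))
             for line in f if line.strip()]

with open('letter.txt') as f:
    template = f.read()

# Print one filled-in copy of the letter per name.
for name in names:
    print(template.replace('[name here]', name))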

How to 'flatten' lines from text file if they meet certain criteria using Python?

To start, I am a complete newcomer to Python and to programming in anything other than web languages.
So, I have developed a script using Python as an interface between a piece of Software called Spendmap and an online app called Freeagent. This script works perfectly. It imports and parses the text file and pushes it through the API to the web app.
What I am struggling with is that Spendmap exports multiple lines per order, whereas Freeagent wants one line per order. So I need to add up the cost values from any orders spread across multiple lines and then 'flatten' those lines into one so the order can be sent through the API. The 'key' field is the 'PO' field: if the script sees any matching PO numbers, I want it to flatten them as described above.
This is a 'dummy' example of the text file produced by Spendmap:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
The above has been formatted for easier reading; normally it is just one line after the next with no text formatting.
The 'key' or PO field is the fourth comma-separated field (P000001, P000002, ...) and the sixth field is the cost to be totalled. So if this example were passed through the script, I'd expect the first row to be left alone, the second and third rows' costs to be added (as they're both from the same PO number), and the fourth line to be left alone.
Expected result:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,401.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
Any help with this would be greatly appreciated and if you need any further details just say.
Thanks in advance for looking!
I won't give you the solution. But you should:
Write and test a regular expression that breaks the line down into its parts, or use the CSV library.
Parse the numbers out so they're decimal numbers rather than strings
Collect the lines up by ID. Perhaps you could use a dict that maps IDs to lists of orders?
When all the input is finished, iterate over that dict and add up all orders stored in that list.
Make a string format function that outputs the line in the expected format.
Maybe feed the output back into the input to test that you get the same result. Second time round there should be no changes, if I understood the problem.
Good luck!
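To make those steps concrete, here is a compressed sketch using the csv module; it assumes the file strictly alternates data lines and COMMENT lines, as in the dummy example above, and that float precision is acceptable for the cost column:

import csv
from collections import OrderedDict

def flatten(in_path, out_path):
    orders = OrderedDict()  # PO number -> [data_fields, comment_fields, total]
    with open(in_path) as f:
        rows = list(csv.reader(f))
    # rows alternate: a data line, then its COMMENT line
    for data, comment in zip(rows[0::2], rows[1::2]):
        po = data[3]
        if po in orders:
            orders[po][2] += float(data[5])  # same PO: accumulate the cost
        else:
            orders[po] = [data, comment, float(data[5])]
    with open(out_path, 'w') as f:
        w = csv.writer(f, lineterminator='\n')
        for data, comment, total in orders.values():
            data[5] = '%.3f' % total  # write the summed cost back
            w.writerow(data)
            w.writerow(comment)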
I would use a dictionary to compile the lines, using get(key,0.0) to sum values if they exist already, or start with zero if not:
InputData = """5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP COMMENT,002143"""
OutD = {}
ValueD = {}
for Line in InputData.split('\n'):
    # commas in comments won't matter because we are joining after anyway
    Fields = Line.split(',')
    PO = Fields[3]
    Value = float(Fields[5])
    # set up the output string with a placeholder for .format()
    OutD[PO] = ",".join(Fields[:5] + ["{0:.3f}"] + Fields[6:])
    # add the value to the old value or to zero if it is not found
    ValueD[PO] = ValueD.get(PO, 0.0) + Value

# the output is unsorted by default, but you could sort or preserve original order
for POKey in ValueD:
    print(OutD[POKey].format(ValueD[POKey]))
P.S. Yes, I know Capitals are for Classes, but this makes it easier to tell what variables I have defined...

How to pick out exactly the right string in my list of utterances?

So I am working on a python script using NumPy and Pandas and NLTK to take utterances from the CHILDES Database's Providence Corpus.
For reference, the idea of my script is to populate a dataframe for each child in the corpus with their name, utterance containing a linguistic feature I'm looking for (negation types), their age when they said it, and their MLU when they said it.
Great.
Now the user will be able to go in after the dataframes have been filled and tag each utterance as being of a particular category, and the console will print out the utterance to be tagged with a line of context on either side (if they just see that the child said 'no', it's hard to tell what was meant without seeing what Mom said right before or what someone said afterward).
So my trick is getting the lines of context. I have other methods in the program set up to make this all happen, but I'd like you to look at one segment of one of the methods for populating the dataframes initially, as the line that says "if line == line_context:" is producing about 91 false positives for me!
I know why: I'm making a temporary line-by-line copy of each file so that, for each utterance that ends up containing a negation, that utterance's index in the child's dataframe can be used as the key in a dict mapping to a list of three items (lists of strings, since that's how the CHILDESCorpusReader gives me the sentences): the utterance, the utterance before it, and the utterance after it.
That buggy line "if line == line_context" checks, while iterating through the list of lists of strings (the copy of the file's utterances by line, 'line_context'), that it lines up with 'line', the child's utterance currently being iterated through, so that later I can get the indexes to match up.
The problem is that many of these 'sentences' are the same sequence of characters (['no'] by itself shows up a lot!), so my program sees a match, sees it has a negation, and saves it to the dataframe; and it does so for every instance of ['no'] in my copy of the file's utterances that equals one of the lines of the child's speech in that file, so I'm getting about 91 extra instances of the same thing!
Phew! Anyway, is there any way I can get something like "if line == line_context" to pick out a single instance of ['no'] in the file, so that I know I'm at the same point in the file on both sides? I'm using the NLTK CHILDESCorpusReader, which doesn't seem to have resources for this kind of thing (otherwise I wouldn't have to use this ridiculously roundabout way to get the lines of context!).
Maybe, as I iterate through the utterance_list I'm making for each file, once an utterance has been matched up with one of the child's utterances, I could change or delete that item in the utterance_list to prevent it from giving me a false positive about 91 more times?
Thanks.
Here is the code (I added some extra comments to hopefully help you understand exactly what each line is supposed to do):
for file in value_corpus.fileids():  # iterates through the .xml files in the corpus_map
    for line_total in value_corpus.sents(fileids=file, speaker='ALL'):  # copy of the utterances by all speakers
        utterance_list.append(line_total)  # adds each line from the file to the list
    for line_context in utterance_list:  # iterates through the newly created list
        for line in value_corpus.sents(fileids=file, speaker='CHI'):  # the file's child utterances
            # tries to make sure the utterance in utterance_list and the child's
            # utterance are the same exact sentence
            # BUGGY: many lines are identical --> false positives
            if line == line_context:
                for type in syntax_types:  # iterates through the negation syntactic types
                    if type in line:  # if the line contains a negation
                        value_df.iat[i, 5] = type  # populates the "Syntactic Type" column
                        value_df.iat[i, 3] = line  # populates the "Utterance" column
                        MLU = str(value_corpus.MLU(fileids=file, speaker='CHI'))
                        MLU = "".join(MLU)
                        value_df.iat[i, 2] = MLU  # populates the "MLU" column
                        value_df.iat[i, 1] = value_corpus.age(fileids=file, speaker='CHI', month=True)  # populates the "Ages" column
                        utterance_index = utterance_list.index(line_context)
                        try:
                            before_line = utterance_list[utterance_index - 1]
                        except IndexError:  # if no line before, don't look for context
                            before_line = utterance_list[utterance_index]
                        try:
                            after_line = utterance_list[utterance_index + 1]
                        except IndexError:  # if no line after, don't look for context
                            after_line = utterance_list[utterance_index]
                        value_dict[i] = [before_line, line, after_line]
                        i = i + 1  # moves to the next row in the "Utterance" column of the df
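Regarding the idea at the end of the question (consuming each matched item so a repeated ['no'] can't match twice): one minimal sketch, under the assumption that the child's utterances appear in the full utterance list in order, is to keep an explicit cursor rather than re-searching the whole list. The function and variable names here are illustrative, not part of CHILDESCorpusReader:

def match_child_lines(all_sents, child_sents):
    # Pair each child utterance with its first unconsumed index in the
    # full utterance list, so identical sentences like ['no'] can only
    # be matched once each.
    matches = []  # (index_in_all_sents, utterance) pairs
    cursor = 0    # everything before this index is already consumed
    for child_line in child_sents:
        for j in range(cursor, len(all_sents)):
            if all_sents[j] == child_line:
                matches.append((j, child_line))
                cursor = j + 1  # consume this match
                break
    return matches

Each returned index j can then be used directly for context (all_sents[j - 1] and all_sents[j + 1]) instead of calling utterance_list.index(), which always returns the first occurrence.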

python: merge two csv files

I'm having a problem with my assignment in Python.
I'm new to Python, so I am a complete beginner.
Question: how can I merge the two files below?
s555555,7
s333333,10
s666666,9
s111111,10
s999999,9
and
s111111,,,,,
s222222,,,,,
s333333,,,,,
s444444,,,,,
s555555,,,,,
s666666,,,,,
s777777,,,,,
After merging, it should look something like:
s111111,10,,,,
s222222,,,,,
s333333,10,,,,
s444444,,,,,
s555555,7,,,,
s666666,9,,,,
s777777,,,,,
s999999,9,,,,
Thanks for reading, and any help would be appreciated!!!
Here are the steps you can follow for one approach to the problem. In this I'll be using FileA, FileB and Result as the various filenames.
One way to approach the problem is to give each position in the file (each ,) a number to reference it by. Then, as you read the lines from FileA, you know that after the first , you need to put the first line from FileB to build the result that you will write out to Result.
Open FileA. Ideally you should use the with statement, because it will automatically close the file when it's done. Or you can use a normal open() call, but make sure you close the file after you are done.
Loop through each line of FileA and add it to a list. (Hint: you should use split()). Why a list? It makes it easier to refer to items by index as that's our plan.
Repeat steps 1 and 2 for FileB, but store it in a different list variable.
Now the next part is to loop through the list of lines from FileA, match them with the list from FileB, to create a new line that you will write to the Result file. You can do this many ways, but a simple way is:
First create an empty list that will store your results (final_lines = [])
Loop through the list that has the lines for FileA in a for loop.
You should also keep in mind that not every line from FileA will have a corresponding line in FileB. For every first "bit" in FileA's list, find the corresponding line in FileB's list, and then get the next item by using index(). If you are keen, you will have realized that the first item is always 0 and the next one is always 1, so why not simply hard-code the values? Because, if you look at the assignment, there are multiple ,s, so at some point there could be a fourth or fifth "column" that needs to be added. Teachers love to check for this stuff.
Use append() to add the items in the right order to final_lines.
Now that you have the list of lines ready, the last part is simple:
Open a new file (use with or open)
Loop through final_lines
Write each line out to the file (make sure you don't forget the end of line character).
Close the file.
If you have any specific questions - please ask.
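For reference, one way those steps might come together; this is a rough sketch only, with FileA.csv holding the id,score pairs, FileB.csv the longer file of empty columns, and a dict standing in for the index() lookup described above:

# Read both files into lists of split rows.
with open('FileA.csv') as f:
    file_a = [line.rstrip('\n').split(',') for line in f if line.strip()]
with open('FileB.csv') as f:
    file_b = [line.rstrip('\n').split(',') for line in f if line.strip()]

scores = dict((row[0], row[1]) for row in file_a)  # id -> score

final_lines = []
for row in file_b:
    # fill in the score column when we have a score for this id
    row[1] = scores.pop(row[0], row[1])
    final_lines.append(row)
# ids that only appear in FileA still need a line of their own
for sid, score in scores.items():
    final_lines.append([sid, score, '', '', '', ''])

with open('Result.csv', 'w') as f:
    for row in sorted(final_lines):
        f.write(','.join(row) + '\n')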
Not relating to Python, but on Linux:
sort -k1 c1.csv > sorted1
sort -k1 c2.csv > sorted2
join -t , -11 -21 -a 1 -a 2 sorted1 sorted2
Result:
s111111,10,,,,,
s222222,,,,,
s333333,10,,,,,
s444444,,,,,
s555555,7,,,,,
s666666,9,,,,,
s777777,,,,,
s999999,9
Make a dict using the first element as a primary key, and then merge the rows?
Something like this:
import csv

f1 = csv.reader(open('file1.csv', 'rb'))
f2 = csv.reader(open('file2.csv', 'rb'))
mydict = {}
for row in f1:
    mydict[row[0]] = row[1:]
for row in f2:
    # extend() modifies the list in place and returns None, so don't
    # assign its result back; setdefault covers ids only present in f2
    mydict.setdefault(row[0], []).extend(row[1:])
fout = csv.writer(open('out.txt', 'w'))
for k, v in mydict.items():
    fout.writerow([k] + v)

Searching for area code in txt-file in Python

I have a text file that looks like this:
Thomas Edgarson, Berliner Str 4, 13359 Berlin
Madeleine Jones, Müller Str 5, 15992 Karlsruhe
etc...
It's always two words, followed by a comma, then two words and a number, a comma, and the area code and city. There are no exceptions.
I used
f=open("C:\\Users\\xxxxxx\\Desktop\\useradresses.txt", "r")
text=f.readlines()
f.close()
So now I have a list of all the lines. How can I now search for the area codes in these strings? I need to create a dictionary that looks like this:
{'13359':[('Neuss','Wolfgang'),('Juhnke','Harald')]}
Believe me, I've searched, but couldn't find useful information. To me, the whole idea of searching for something like an arbitrary area code in a string is new, and I haven't come across it so far.
I would be happy if you could give me some pointers as to where I should look for tutorials or give me an idea where to start.
dic = {}
with open('filename') as file:
    for name, addr, zcode in (i.split(',') for i in file if i.rstrip()):
        dic.setdefault(zcode.split()[0], []).append(name.split())
Further explanation, as Sjoerd asked:
I use a generator expression to break each line into three variables: name, addr and zcode. Then I split zcode to get the desired number and use it as a dictionary key.
As the dict may not have that key yet, I use the setdefault method, which sets the key to an empty list before appending the split name.
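If setdefault is unfamiliar, here is a tiny illustration of what that one line does, with made-up data:

dic = {}
# The first time a zip code appears, setdefault creates the empty list;
# later appearances reuse the existing list, so append just adds to it.
dic.setdefault('13359', []).append(['Thomas', 'Edgarson'])
dic.setdefault('13359', []).append(['Wolfgang', 'Neuss'])
print(dic)  # {'13359': [['Thomas', 'Edgarson'], ['Wolfgang', 'Neuss']]}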
Loop through the file, reading lines, and split by comma. Then, process each part by splitting by space. Then, add the values to a dictionary.
d = {}
for line in open('useradresses.txt', 'r'):
    if line.strip() == '':
        continue
    (name, strasse, plzort) = line.split(',')
    nachname, vorname = name.split()
    plz, ort = plzort.split()
    if plz in d:
        d[plz].append((nachname, vorname))
    else:
        d[plz] = [(nachname, vorname)]
print(d)
Python has a lot of libraries dealing with string manipulation, which is what this is. You'll be wanting the re library and the shlex library. I'd suggest the following code:
with open("C:\\Users\\xxxxxx\\Desktop\\useradresses.txt", "r") as f:
for line in f.readlines():
split = shlex.split(line)
mydict[split[6]] = [(split[0], split[1])]
This won't be perfect: it will overwrite identical zip codes and drop some values. It should point you in the right direction, though.
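If you do want to keep duplicate zip codes rather than overwrite them, the setdefault pattern from the earlier answer applies here too (an assumed variation, not part of the original suggestion):

mydict.setdefault(split[5], []).append((split[0], split[1]))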
