I have a list of sample codes which I input into a website to get information about each of them (they are codes for stars, but it doesn't matter what the codes are; they are just long strings of digits). All these numbers are in one column, one number per row. The website I need to input this file into expects the numbers to still be in a column, but with a comma after each number. This is an example:
Instead of:
164891738509173
184818483848283
18483943491u385
It's supposed to look like this:
164891738509173,
184818483848283,
18483943491u385,
I wanted to write a quick Python script to do that automatically for each number in the entire column. How do I do that? I could theoretically manage to do it manually if the number of stars I'm dealing with were small, but unfortunately I need to input something like 60000 stars (so 60000 of these numbers) into the website, so doing it manually is out of the question.
Very simple:
open('output.txt', 'w').writelines(   # open 'output.txt' for writing and write multiple lines
    line.rstrip('\n') + ',\n'         # append a comma to each line
    for line in open('input.txt')     # read lines with numbers from 'input.txt'
)
You could do it more idiomatically and use a with block, but that's probably overkill for such a small task:
with open('input.txt') as In, open('output.txt', 'w') as Out:
    for line in In:
        Out.write(line.rstrip('\n') + ',\n')
Is this what you want?
If you want to add comma at end the every entry during printing, you can do this:
>>> codes = ['164891738509173', '184818483848283', '18483943491u385']
>>> for code in codes:
...     print(code, end=',\n')
...
164891738509173,
184818483848283,
18483943491u385,
To add a comma to every item within the list,
>>> end_comma = [f"{code}," for code in codes]
>>> end_comma
['164891738509173,', '184818483848283,', '18483943491u385,']
houses = []
#print(houses)
# Creates an empty list for the houses

f = open("houses.txt", "r+")
# Opens the houses.txt file; "r+" means you can both read and write to the file

lines = f.readlines()
# The readlines() method returns a list containing each line in the file as a list item

# The for loop grabs each line of text from the file and splits the house name
# and score apart; it runs once for every line in the text file
for line in lines:
    # .strip() removes the trailing newline, .split() separates the house and score
    # into separate strings, and append() adds the resulting list to houses
    houses.append(line.strip().split())

for i in range(len(houses)):
    # Loops for however many houses are in the list
    houses[i][1] = int(houses[i][1])
    # Turns the second part of each entry into an integer

print(houses)
This part of the code imports houses (teams) from the text file which is laid out like this:
StTeresa 0
StKolbe 0
StMary 0
StAnn 0
I created a function to save the points. I would basically like it to take the houses and scores from the list in the program. To do so, it will delete all the contents of the text file, and then I would like it to rewrite them in the same format as the original text file to keep the updated house scores.
def save():
    f.truncate(0)
    f.write(str(houses))
I tried this, but the output is the whole list written out as a single line rather than one house per line.
Can anyone help me rewrite the text file to include the updated scores and keep the same format the text file originally had?
This should accomplish what you want:
def save():
    with open("houses.txt", "w") as file:
        for index, line_content in enumerate(houses):
            houses[index][1] = str(houses[index][1])
        file.write("\n".join(map(lambda sub_list: " ".join(sub_list), houses)))
However, as you can see, this is quite hacky.
So I would have two main recommendations for your code:
Use open() within a with statement. Currently your code does not reliably close the file it opens, which can cause unexpected behavior (for example, when an exception is thrown in the middle of your processing). It is therefore recommended practice to use with (further information on this can be found here).
Instead of using lists within a list, you could use tuples:
for line in lines:
    # .strip() and .split() separate the house name and score into separate strings
    house_name, house_score = tuple(line.strip().split())
    houses.append((house_name, int(house_score)))
This gives you something like this: [("StTeresa", 0), ("StKolbe", 0), ("StMary", 0), ("StAnn", 0)]. Tuples in lists have the advantage that you can unpack them in loops, which makes it easier to handle them compared to a list within a list:
for index, (house_name, house_score) in enumerate(houses):
    # do something with the score
    houses[index] = (house_name, updated_score)
Another option is to follow what Barmar suggested and use the csv module, which is part of the standard library.
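For instance, here is a hedged sketch (not the asker's original code) of a save() and a matching loader built on the csv module with a space delimiter. It assumes houses is a list of (name, score) tuples as above; the load() helper and the houses parameter are additions for illustration:

```python
import csv

def save(houses):
    # Write each (name, score) pair as one space-delimited row
    with open("houses.txt", "w", newline="") as f:
        csv.writer(f, delimiter=" ").writerows(houses)

def load():
    # Read the rows back and convert the score column to int
    with open("houses.txt", newline="") as f:
        return [(name, int(score)) for name, score in csv.reader(f, delimiter=" ")]
```

The csv module handles the quoting edge cases for you, which matters if a house name could ever contain the delimiter character.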
As a bonus: enumerate() provides you the indices of an iterable, so you can easily loop over them. So instead of
for i in range(len(houses)):
    # Loops for however many houses are in the list
you can write
for i, house in enumerate(houses):
    # Loop over all the houses
which makes the code a little easier to read and write.
So with all my suggestions combined you could have a save() function like this:
def save():
    with open("houses.txt", "w") as file:
        for house_name, house_score in houses:
            file.write(f"{house_name} {house_score}\n")
Hope this helps :)
Imagine you have a text file input.txt containing text and floats, but without a regular structure (such as a header, .csv, etc.), for instance:
Banana 1.4030391
(4.245, -345.2456)
4.245 -345.2456
Hello how are you?
Based on this file, you want to generate output.txt where each float has been rounded to 1 decimal, the remaining content left untouched. This would give
Banana 1.4
(4.2, -345.2)
4.2 -345.2
Hello how are you?
To achieve this in Python, you need the following steps.
Open the input file and read each line
f = open('input.txt')
lines = f.readlines()
Extract the floats
How to proceed? The difficulty lies in the fact that there is no regular structure in the file.
Round the floats
round(myfloat, 1)
Write the line to the output file
...
Check this out. Use a regular expression to match the floating-point numbers, then replace them.
import re

with open('input.txt') as f:
    string = "".join(f.readlines())

def check_string(string):
    # Find every float, then replace it with its value rounded to 1 decimal
    temp = re.findall(r"\d+\.\d+", string)
    for i in temp:
        string = string.replace(i, str(round(float(i), 1)))
    return string

output = check_string(string)
with open("output.txt", "w") as file2:
    file2.write(output)
Since it seems like you need ideas for how to extract floats from the text file, I can only contribute an idea.
I think it is simpler to create an empty list and add each word and number to it.
You can split each line in the text file on spaces and newlines. Then you can check with a for loop whether those items are floats.
Functions you can use are .append(), .rstrip(), and isinstance().
The code below DOESN'T extract the float numbers yet, but you can build on it to split out each item in the text file.
mylines = []                                 # Declare an empty list.
with open('text.txt', 'rt') as myfile:       # Open the txt file for reading text.
    for myline in myfile:                    # For each line in the file,
        mylines.append(myline.rstrip('\n ')) # strip the newline and trailing spaces, add to list.

for element in mylines:
    print(element)
    for item in element.split():             # split the line into individual items
        print(item)
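As a hedged aside on the float check: isinstance() only tells you whether an object already is a float, so it will not identify float-looking strings read from a file. A common alternative is to try the conversion; the helper name is my own:

```python
def is_float(token):
    # A string counts as a float if float() can parse it
    try:
        float(token)
        return True
    except ValueError:
        return False
```

With the sample data, is_float("1.4030391") and is_float("-345.2456") are True, while is_float("Banana") is False.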
I have a .csv file with comma-separated fields. I am receiving this file from a 3rd party and the content cannot change. I need to import the file to a database, but there are commas in some of the "comma" separated fields. The comma-separated fields are also fixed length: when I straight up print the fields as per the lines below in the function insert_line_csv, they are spaced at a fixed length.
I essentially need an efficient method of collecting fields that could have commas included in the field. I was hoping to combine the two methods. Not sure if that would be efficient.
I am using python 3 - willing to use any libraries to make the job efficient and easy.
Currently I am have the following:
with open(FileName, 'r') as f:
    for count, line in enumerate(f):
        insert_line_csv(count, line)
with the insert_line_csv function looking like:
def insert_line_csv(line_no, line):
    line = line.split(",")
    field0 = line[0]
    field1 = line[1]
    ......
I am importing the line_no, as that is also being entered into the db.
Any insight would be appreciated.
A sample dataset:
text ,2000.00 ,2018-07-07,textwithoutcomma ,text ,1
text ,3000.00 ,2018-07-08,textwith,comma ,text ,7
text ,1000.00 ,2018-07-07,textwithoutcomma ,text ,4
If the comma-separated fields are all fixed length, you should be able to just slice them off by character count instead of splitting on commas; see Split string by count of characters
as mockup code you have:
toParse = line
while toParse != "":
    chunk = first X chars of toParse
    restOfLine = toParse without the chars just cut off
    write chunk to db
    toParse = restOfLine
That should work imho
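The mockup above could look roughly like this in Python; the field widths here are hypothetical and would need to match the actual file layout:

```python
def split_fixed(line, widths):
    # Cut the line into chunks of the given character widths, left to right
    fields = []
    pos = 0
    for w in widths:
        fields.append(line[pos:pos + w])
        pos += w
    return fields
```

For example, split_fixed("abcdefgh", [3, 5]) gives ["abc", "defgh"]; each chunk could then be written to the database instead of collected.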
Edit:
Upon seeing your sample dataset: can there only be one field with commas inside of it? If so, you could split on commas, read out the first 3 fields, then the last two. Whatever is left, you concatenate again, because it is the value of the 4th field. (If it had commas, you'll need to actually concatenate there; if not, it's already the value.)
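A minimal sketch of that split-and-rejoin idea, assuming the sample layout above (6 fields, where only the 4th can contain commas); the function name is my own:

```python
def parse_line(line):
    # Split on every comma, keep the first 3 and last 2 fields as-is,
    # and rejoin whatever remains in the middle as the 4th field
    parts = line.rstrip("\n").split(",")
    return parts[:3] + [",".join(parts[3:-2])] + parts[-2:]
```

On the line with an embedded comma this yields "textwith,comma " as the 4th field; on the other lines the middle slice is a single element and the join is a no-op.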
My input will be several separate lines in the same form. I don't know how to merge these inputs into one object.
For example:
Robin 590.00 343.05 3333.00
Max 45.00 234.44 3443.55
and I would like to have these data in one expression
(data = '''Robin 590.00 343.05 3333.00 Max 45.00 234.44 3443.55 ''')
because I want to execute this code (I have to sum up the last value of every input line):
result = sum(float(x.split()[-1]) for x in data.splitlines())
Most of the things that you need to read a file are in the Python manual. More specifically, you can use the with open statement to get a list as shown below. The list comprehension removes any trailing newline characters (strip) and keeps your data in a variable, since f will be closed as soon as you exit the with clause.
with open('test.txt') as f:
    fout = [fone.strip('\n') for fone in f]
print(fout)
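In fact, you can skip building an intermediate string or list entirely and sum the last value of every line while reading the file. A sketch using the sample data from the question ('test.txt' is the hypothetical file name from the answer above):

```python
# Create a sample file matching the question's input (for demonstration only)
with open('test.txt', 'w') as f:
    f.write("Robin 590.00 343.05 3333.00\n")
    f.write("Max 45.00 234.44 3443.55\n")

# Sum the last whitespace-separated value on every non-blank line
with open('test.txt') as f:
    result = sum(float(line.split()[-1]) for line in f if line.strip())
print(result)
```

This streams the file line by line, so it works the same whether there are two lines or two million.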
I am trying to read a very simple but somewhat large (800 MB) csv file using the csv library in Python. The delimiter is a single tab and each line consists of some numbers.
Each line is a record, and I have 20681 rows in my file. I had some problems during my calculations using this file; it always stops at a certain row. I got suspicious about the number of rows in the file, so I used the code below to count the rows in this file:
tfdf_Reader = csv.reader(open('v2-host_tfdf_en.txt'), delimiter=' ')
c = 0
for row in tfdf_Reader:
    c = c + 1
print c
To my surprise c is printed with the value of 61722!!! Why is this happening? What am I doing wrong?
800 million bytes in the file and 20681 rows means that the average row size is over 38 THOUSAND bytes. Are you sure? How many numbers do you expect in each line? How do you know that you have 20681 rows? That the file is 800 Mb?
61722 rows is almost exactly 3 times 20681 -- is the number 3 of any significance e.g. 3 logical sub-sections of each record?
To find out what you really have in your file, don't rely on what it looks like. Python's repr() function is your friend.
Are you on Windows? Even if not, always open(filename, 'rb').
If the fields are tab-separated, then don't put delimiter=" " (whatever is between the quotes appears not to be a tab). Put delimiter="\t".
Try putting some debug statements in your code, like this:
DEBUG = True
f = open('v2-host_tfdf_en.txt', 'rb')
if DEBUG:
    rawdata = f.read(200)
    f.seek(0)
    print 'rawdata', repr(rawdata)
    # what is the delimiter between fields? between rows?
tfdf_Reader = csv.reader(f, delimiter=' ')
c = 0
for row in tfdf_Reader:
    c = c + 1
    if DEBUG and c <= 10:
        print "row", c, repr(row)
        # Are you getting rows like you expect?
print "rowcount", c
Note: if you are getting Error: field larger than field limit (131072), that means your file has 128Kb of data with no delimiters.
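If you do hit that error and the oversized field is legitimate, the limit can be raised with csv.field_size_limit(), which returns the previous limit. A short sketch (the 2**27 value is an arbitrary large choice; sys.maxsize can overflow on some platforms):

```python
import csv

# Raise the per-field size limit from the default (131072 bytes)
old_limit = csv.field_size_limit(2**27)
```

That said, a huge field usually means the delimiter is wrong rather than the limit being too small, so check the raw data first.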
I'd suspect that:
(a) your file has random junk or a big chunk of binary zeroes appended to it -- this should be obvious in a hex editor; it also should be obvious in a TEXT editor. Print all the rows that you do get to help identify where the trouble starts.
or (b) the delimiter is a string of one or more whitespace characters (space, tab), the first few rows have tabs, and the remaining rows have spaces. If so, this should be obvious in a hex editor (or in Notepad++, especially if you do View/Show Symbol/Show all characters). If this is the case, you can't use csv, you'd need something simple like:
f = open('v2-host_tfdf_en.txt', 'r')  # NOT 'rb'
rows = [line.split() for line in f]
My first guess would be the delimiter. How are you ensuring the delimiter is a tab?
What is actually the value you are passing? (The code you pasted lists a space, but I'm sure you intended to pass something else.)
If your file is tab-separated, then look specifically for '\t' as your delimiter. Looking for a space would mess up situations where there is a space in your data that is not a column separator.
Also, if your file is an excel-tab, then there is a special "dialect" for that.