Take this invoice.txt, for example:
Invoice Number
INV-3337
Order Number
12345
Invoice Date
January 25, 2016
Due Date
January 31, 2016
And this is what dict.txt looks like:
Invoice Date
Invoice Number
Due Date
Order Number
I am trying to find the keywords from 'dict.txt' in 'invoice.txt' and then add each keyword, together with the text that follows it (up to the next keyword), to a two-column data table.
So it would look like:
col1 ----- col2
Invoice Number ----- INV-3337
Order Number ----- 12345
Here is what I have done so far:
with open('C:\invoice.txt') as f:
    invoices = list(f)
with open('C:\dict.txt') as f:
    for line in f:
        dict = line.strip()
        for invoice in invoices:
            if dict in invoice:
                print invoice
This works, but the ordering is wrong: it follows dict.txt rather than invoice.txt.
The output is
Invoice Date
Invoice Number
Due Date
Order Number
instead of the order in invoice.txt, which is
invoice number
order number
invoice date
due date
Can you help me with how I should proceed?
Thank You.
This should work. You can load your invoice data into a list, and your dict data into a set for easy lookup.
# Raw strings keep the backslashes in the Windows paths literal.
with open(r'C:\invoice.txt') as f:
    invoice_data = [line.strip() for line in f if line.strip()]
with open(r'C:\dict.txt') as f:
    dict_data = set([line.strip() for line in f if line.strip()])
Now iterate over the invoice lines two at a time and print each pair whose first line is a keyword.
for i in range(0, len(invoice_data), 2):
    if invoice_data[i] in dict_data:
        print(invoice_data[i: i + 2])
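If a value can span more than one line, stepping two lines at a time no longer lines up. The sketch below is one possible variation, not the answer's own code: it walks the invoice in order and collects everything between one keyword and the next. The function name pair_keywords and the case-insensitive matching are assumptions made for illustration; the sample lists are copied from the question.

```python
def pair_keywords(invoice_lines, keywords):
    """Return (keyword, value) pairs in the order they appear in the invoice."""
    # Assumption: keywords are matched case-insensitively on the whole line.
    wanted = {k.strip().lower() for k in keywords if k.strip()}
    rows = []
    current = None   # keyword whose value is currently being collected
    collected = []   # non-keyword lines seen since that keyword
    for raw in invoice_lines:
        line = raw.strip()
        if not line:
            continue
        if line.lower() in wanted:
            # A new keyword closes the previous (keyword, value) pair.
            if current is not None:
                rows.append((current, " ".join(collected)))
            current, collected = line, []
        elif current is not None:
            collected.append(line)
    if current is not None:
        rows.append((current, " ".join(collected)))
    return rows


invoice = ["Invoice Number", "INV-3337", "Order Number", "12345",
           "Invoice Date", "January 25, 2016", "Due Date", "January 31, 2016"]
keywords = ["Invoice Date", "Invoice Number", "Due Date", "Order Number"]

rows = pair_keywords(invoice, keywords)
for key, value in rows:
    print(f"{key:<16}{value}")
```

Because the loop follows the invoice rather than the keyword file, the rows come out in invoice order, which is what the question asked for.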
I have a text file containing 22 lines and three headers, which are:
economy name
unique economy code given the World Bank standard (3 uppercase letters)
Trade-to-GDP from year 1990 to year 2019 (30 years, 30 data points); 0.3216 means that the trade-to-GDP ratio for Australia in 1990 is 32.16%
The code I have used to import this file and open/read it is:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip() for l in lines]
    f.close()
    return lines
However, once I have done that, I have to write code with for-loops that creates a list variable named result. It should contain 22 tuples, and each tuple contains four elements:
economy name,
World Bank economy code,
average trade-to-gdp ratio for this economy from 1990 to 2004,
average trade-to-gdp ratio for this economy from 2005 to 2019.
Coming out like
('Australia', 'AUS', '0.378', '0.423')
So far the code I have written looks like this:
def result:
    name, age, height, weight = zip(*[l.split() for l in text_file.readlines()])
I am having trouble getting started, and I don't know how to handle the multiple years and output every country with its corresponding ratios. Here is the table of all the data I have in the text file.
I would suggest using pandas for this.
You can simply do:
import pandas as pd
df = pd.read_csv('filename.csv')
for index, row in df.iterrows():
    pass  # do something with row here
Inside the for loop you can read the data with row['columnName'], for example row['code'] or row['1999'].
This approach makes it a lot easier to process the data and run operations on it.
To answer using your own approach:
You can iterate over the lines and extract the data by index.
Try the code below:
def Input(filename):
    f = open(filename, 'r')
    lines = f.readlines()
    lines = [l.strip().split() for l in lines]
    f.close()
    return lines

lines = Input('filename.txt')  # placeholder file name
for line in lines[1:]:  # skip the first line, assumed to be a header
    total = sum([float(x) for x in line[2:17]])  # sum of the values from 1990 to 2004
    total2 = sum([float(x) for x in line[17:]])  # sum of the values from 2005 to 2019
    val = (line[0], line[1], total, total2)  # tuple for this economy
You can continue with this approach: divide each sum by 15 to get the average for that period, and append the tuple to the result list on each pass through the loop.
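A minimal sketch of the full result list, under assumptions the question does not confirm: fields are separated by whitespace, the last 30 fields of each line are the yearly ratios, the field before them is the code, and anything earlier is the (possibly multi-word) economy name. The function name build_result and the sample values are made up for illustration; header lines are assumed to have been removed already.

```python
def build_result(lines, first_period=15):
    """Build (name, code, avg_1990_2004, avg_2005_2019) tuples, one per line."""
    result = []
    for line in lines:
        fields = line.split()
        # Assumption: last 30 fields are ratios, then the code, then the name.
        values = [float(x) for x in fields[-30:]]
        code = fields[-31]
        name = " ".join(fields[:-31])
        early = values[:first_period]   # 1990 to 2004 (15 values)
        late = values[first_period:]    # 2005 to 2019 (15 values)
        avg_early = sum(early) / len(early)
        avg_late = sum(late) / len(late)
        # Format as strings with three decimals, like the example in the question.
        result.append((name, code, f"{avg_early:.3f}", f"{avg_late:.3f}"))
    return result


# Made-up sample data, not the real World Bank figures.
sample_lines = [
    "Australia AUS " + " ".join(["0.3"] * 15 + ["0.4"] * 15),
    "New Zealand NZL " + " ".join(["0.5"] * 15 + ["0.6"] * 15),
]
result = build_result(sample_lines)
for row in result:
    print(row)
```

Counting fields from the end of the line keeps names such as "New Zealand" intact, which a plain line[0] / line[1] lookup would split.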
I have been working on a Python project analyzing a CSV file and cannot get the output to show me sums with my strings, just lists of the numbers that should be summed.
Code I'm working with:
import pandas as pd
data = pd.read_csv('XML_projectB.csv')
#inserted column headers since the raw data doesn't have any
data.columns = ['name','email','category','amount','date']
data['date'] = pd.to_datetime(data['date'])
#Calculate the total budget by category
category_wise = data.groupby('category').agg({'amount':['sum']})
category_wise.reset_index(inplace=True)
category_wise.columns = ['category','total_budget']
#Determine which budget category people spent the most money in
max_budget = category_wise[category_wise['total_budget']==max(category_wise['total_budget'])]['category'].to_list()
#Tally the total amounts for each year-month (e.g., 2017-05)
months_wise = data.groupby([data.date.dt.year, data.date.dt.month])['amount'].sum()
months_wise = pd.DataFrame(months_wise)
months_wise.index.names = ['year','month']
months_wise.reset_index(inplace=True)
#Determine which person(s) spent the most money on a single item.
person = data[data['amount'] == max(data['amount'])]['name'].to_list()
#Tells user in Shell that text file is ready
print("Check your folder!")
#Get all this info into a text file
tfile = open('output.txt','a')
tfile.write(category_wise.to_string())
tfile.write("\n\n")
tfile.write("The type with most budget is " + str(max_budget) + " and the value for the same is " + str(max(category_wise['total_budget'])))
tfile.write("\n\n")
tfile.write(months_wise.to_string())
tfile.write("\n\n")
tfile.write("The person who spent most on a single item is " + str(person) + " and he/she spent " + str(max(data['amount'])))
tfile.close()
The CSV raw data looks like this (there are almost 1000 lines of it):
Walker Gore,wgore8i@irs.gov,Music,$77.98,2017-08-25
Catriona Driussi,cdriussi8j@github.com,Garden,$50.35,2016-12-23
Barbara-anne Cawsey,bcawsey8k@tripod.com,Health,$75.38,2016-10-16
Henryetta Hillett,hhillett8l@pagesperso-orange.fr,Electronics,$59.52,2017-03-20
Boyce Andreou,bandreou8m@walmart.com,Jewelery,$60.77,2016-10-19
My output in the txt file looks like this:
category total_budget
0 Automotive $53.04$91.99$42.66$1.32$35.07$97.91$92.40$21.28$36.41
1 Baby $93.14$46.59$31.50$34.86$30.99$70.55$86.74$56.63$84.65
2 Beauty $28.67$97.95$4.64$5.25$96.53$50.25$85.42$24.77$64.74
3 Books $4.03$17.68$14.21$43.43$98.17$23.96$6.81$58.33$30.80
4 Clothing $64.07$19.29$27.23$19.78$70.50$8.81$39.36$52.80$80.90
year month amount
0 2016 9 $97.95$67.81$80.64
1 2016 10 $93.14$6.08$77.51$58.15$28.31$2.24$12.83$52.22$48.72
2 2016 11 $55.22$95.00$34.86$40.14$70.13$24.82$63.81$56.83
3 2016 12 $13.32$10.93$5.95$12.41$45.65$86.69$31.26$81.53
I want the total_budget column to be the sum of the list for each category, not the individual values you see here. It's the same problem for months_wise, it gives me the individual values, not the sums.
I tried the {} .format in the write lines, .apply(str), .format on its own, and just about every other Python permutation of the conversion to string from a list I could think of, but I'm stumped.
What am I missing here?
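As @Barmar said, the amounts in the source are written as $XX, so pandas reads the column as strings rather than numbers, and summing strings concatenates them. Strip the $ and convert the column to float before grouping, so the values are parsed as numbers instead of strings with $ in them.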
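One way to do that conversion, as a minimal sketch: it assumes every amount is a plain "$" followed by a number with no thousands separators, and it uses a few made-up rows in place of XML_projectB.csv. Passing header=None also stops read_csv from consuming the first data row as a header, which matters because the raw file has no header line.

```python
import io

import pandas as pd

# Made-up rows standing in for the real CSV file.
raw = io.StringIO(
    "Person A,a@example.com,Music,$10.50,2017-08-25\n"
    "Person B,b@example.com,Music,$20.25,2017-08-26\n"
    "Person C,c@example.com,Garden,$5.00,2016-12-23\n"
)

data = pd.read_csv(
    raw,
    header=None,  # the file has no header row
    names=["name", "email", "category", "amount", "date"],
)

# Remove the literal "$" and convert to float so that sum() adds numbers
# instead of concatenating strings.
data["amount"] = data["amount"].str.replace("$", "", regex=False).astype(float)
data["date"] = pd.to_datetime(data["date"])

category_wise = data.groupby("category")["amount"].sum().reset_index()
category_wise.columns = ["category", "total_budget"]
print(category_wise.to_string())
```

The monthly totals sum the same column, so they should come out as numbers once the conversion is in place.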
I would like to subtract two numbers that are written in my text file. I have to calculate the profit for each line by computing sales - cost price.
The text file contains:
200 123
300 189
111 77
I would like to subtract these values to get the output. Each line gives a different sales profit.
Here's a way to do it:
sales_profits = []
with open('input.txt', 'r') as input_file:  # the file is closed automatically
    for line in input_file:
        if not line.strip():
            continue  # skip blank lines
        sales, cost = map(int, line.split())
        sales_profits.append(sales - cost)
print(sales_profits)  # your result is here; do what you want with it :)
I have data in my .txt file:
productname1
7,64
productname2
6,56
4.73
productname3
productname4
12.58
10.33
The data is laid out as follows. The product name is on the first line and the price is on the second line. For the 2nd product name, however, we have both the original price and the discounted price. Also, the prices sometimes use '.' and sometimes ',' as the decimal separator for cents. I want to format the data in the following way:
Product o_price d_price
productname1 7.64 -
productname2 6.56 4.73
productname3 - -
productname4 12.58 10.33
My current approach is a bit naive but it works for 98% of the cases
import pandas as pd
data = {}
tempKey = []
with open("myfile.txt", encoding="utf-8") as file:
    arr_content = file.readlines()
    for val in arr_content:
        if not val[0].isdigit():  # check whether the first character is a digit or text
            val = ' '.join(val.split())  # Remove extra spaces
            data.update({val: []})  # Adding key to the dict and initializing it with a list in which I'll populate values
            tempKey.append(val)  # keeping track of the last key added because dicts are not sequential
        else:
            data[str(tempKey[-1])].append(val)  # Using last added key and updating it with prices
df = pd.DataFrame(list(data.items()), columns = ['Product', 'Pricelist'])
df[['o_price', 'd_price']] = pd.DataFrame([x for x in df.Pricelist])
df = df.drop('Pricelist', axis=1)  # drop the helper column created above
This technique does not work when a product name starts with a digit. Any suggestions for a better approach?
Use a regular expression to check whether the line contains only digits, periods, and commas (your prices use both '.' and ',').
import re
if re.match(r"^[0-9.,]+$", val.strip()):
    pass  # This is a product price
else:
    pass  # This is a product name
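Putting that check into a complete parser might look like the sketch below. It rests on assumptions not stated in the question: a price is digits, then '.' or ',', then one or two digits, and no product name has exactly that shape. The names PRICE_RE and parse_products, and the "3M tape" sample entry, are made up for illustration.

```python
import re

# Assumption: a price looks like 7,64 or 12.58 (one or two decimal digits).
PRICE_RE = re.compile(r"^\d+[.,]\d{1,2}$")


def parse_products(lines):
    """Return (product, o_price, d_price) tuples; missing prices are None."""
    rows = []  # each entry is (name, [prices...])
    for raw in lines:
        line = " ".join(raw.split())  # trim and collapse extra spaces
        if not line:
            continue
        if PRICE_RE.match(line) and rows:
            # Normalise the decimal comma before converting to float.
            rows[-1][1].append(float(line.replace(",", ".")))
        else:
            rows.append((line, []))
    table = []
    for name, prices in rows:
        o_price = prices[0] if len(prices) > 0 else None
        d_price = prices[1] if len(prices) > 1 else None
        table.append((name, o_price, d_price))
    return table


sample = ["productname1", "7,64", "productname2", "6,56", "4.73",
          "productname3", "productname4", "12.58", "10.33",
          "3M tape", "2,50"]
table = parse_products(sample)
for name, o_price, d_price in table:
    print(name, o_price if o_price is not None else "-",
          d_price if d_price is not None else "-")
```

Because the whole line has to match the price pattern, a name such as "3M tape" is kept as a product even though it starts with a digit.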
I am trying to save my contents in the textfile in this format
|Name|Description|
|QY|Hello|
But for now my code produces this output:
|2014 1Q 2014 2Q 2014 3Q 2014 4Q|
|96,368.3 96,808.3 97,382 99,530.5|
Code:
def write_to_txt(header, data):
    with open('test.txt', 'a+') as f:
        f.write("|{}|\n".format('\t'.join(str(field) for field in header)))  # write header
        f.write("|{}|\n".format('\t'.join(str(field) for field in data)))  # write items
Any idea how to change my code so that I would have the ideal output?
This is happening because you are joining the fields of each iterable with \t. Use | in the join as well to get the format you want. Example:
def write_to_txt(header, data):
    with open('test.txt', 'a+') as f:
        f.write("|{}|\n".format('|'.join(str(field) for field in header)))  # write header
        f.write("|{}|\n".format('|'.join(str(field) for field in data)))  # write items