Ok don't yell at me. I'm still learning.
Text file (file.txt, these are the entire contents):
pn = FZP16WHTEPE444
cp = custpart
pd = partdesc
bq = 11,000
co = color 02
ma = material 01
mo = 1234
ln = 2227011234
mb = 38
Python code:
print(str(pn))
print(str(cp))
print(str(pd))
print(str(bq))
print(str(co))
print(str(ma))
print(str(mo))
print(str(ln))
print(str(mb))
What do I do to make the Python script read the values from the text file so it displays the following?
FZP16WHTEPE444
custpart
partdesc
11,000
color 02
material 01
1234
2227011234
38
You can read the file's contents, split each line on =, store the pairs in a dict, and use the dict to print the values.
dct = {}
with open('file.txt', 'r') as file:
    for line in file:
        k, v = line.split('=')
        dct[k.strip()] = v.strip()

keys = ['pn', 'cp', 'pd', 'bq', 'co', 'ma', 'mo', 'ln', 'mb']
for k in keys:
    print(dct[k])
FZP16WHTEPE444
custpart
partdesc
11,000
color 02
material 01
1234
2227011234
38
You can read the text file then create a dictionary with the values:
with open("file.txt", "r") as f:
    f = f.read()

# drop any blank trailing line so every entry splits cleanly on " = "
f = [line for line in f.split("\n") if line]
vals = dict(map(lambda val: val.split(" = "), f))

print(vals["pn"])
print(vals["cp"])
Here is one way:
test.txt
pn = FZP16WHTEPE444
cp = custpart
pd = partdesc
bq = 11,000
co = color 02
ma = material 01
mo = 1234
ln = 2227011234
mb = 38
test.py
with open("test.txt") as file:
    dictionary = {}
    for line in file:
        entry = [x.strip() for x in line.split("=")]
        dictionary[entry[0]] = entry[1]

print("{}".format(dictionary), flush=True)
$ python test.py
{'pn': 'FZP16WHTEPE444', 'cp': 'custpart', 'pd': 'partdesc', 'bq': '11,000', 'co': 'color 02', 'ma': 'material 01', 'mo': '1234', 'ln': '2227011234', 'mb': '38'}
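As a side note (not from the answers above): since the file is plain key = value pairs, the standard-library configparser can also read it once you prepend a dummy section header. A minimal sketch:

import configparser

# sketch: configparser expects a [section] header, so prepend a dummy one
cp = configparser.ConfigParser()
with open('file.txt') as f:
    cp.read_string('[data]\n' + f.read())

print(cp['data']['pn'])  # FZP16WHTEPE444
print(cp['data']['bq'])  # 11,000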
Related
I have this text:
/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/
I would like to have Alex, Dog, House and Red in one list and Maria, Cat, Office, Green in another list.
I have this code:
with open(filename) as f:
    for i in f:
        if i.startswith("/** Goodmorning"):
            #add files to list
        elif i.startswith("/** Goodnight"):
            #add files to other list
So, is there any way to write the script so it understands that Alex belongs to the part of the text that has Goodmorning?
I'd recommend using a dict, where the "section name" is the key:
with open(filename) as f:
    result = {}
    current_list = None
    for line in f:
        if line.startswith("/**"):
            current_list = []
            result[line[3:].strip()] = current_list
        elif line.strip() != "*/":
            current_list.append(line.strip())
Result:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight': ['Maria', 'Cat', 'Office', 'Green']}
To find which key a value belongs to, you can use the following code:
search_value = "Alex"
for key, values in result.items():
    if search_value in values:
        print(search_value, "belongs to", key)
        break
I would recommend using regular expressions. In Python there is a module for this called re.
import re
s = """/** Goodmorning
Alex
Dog
House
Red
*/
/** Goodnight
Maria
Cat
Office
Green
*/"""
pattern = r'/\*\*([\w \n]+)\*/'
word_groups = re.findall(pattern, s, re.MULTILINE)

d = {}
for word_group in word_groups:
    words = word_group.strip().split('\n')
    d[words[0]] = words[1:]

print(d)
Output:
{'Goodmorning': ['Alex', 'Dog', 'House', 'Red'], 'Goodnight': ['Maria', 'Cat', 'Office', 'Green']}
Expanding on Olvin Roght's answer (sorry, can't comment - not enough reputation), I would keep a second dictionary for the reverse lookup:
with open(filename) as f:
    key_to_list = {}
    name_to_key = {}
    current_list = None
    current_key = None
    for line in f:
        if line.startswith("/**"):
            current_list = []
            current_key = line[3:].strip()
            key_to_list[current_key] = current_list
        elif line.strip() != "*/":
            current_name = line.strip()
            name_to_key[current_name] = current_key
            current_list.append(current_name)

print(key_to_list)
print(name_to_key['Alex'])
An alternative is to convert the dictionary afterwards:
name_to_key = {n : k for k in key_to_list for n in key_to_list[k]}
(e.g. if you want to go with the regex version from ashwani)
The limitation is that this only permits one membership per name; a small sketch that lifts this restriction follows.
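If a name can appear under more than one header, a variation of the same idea (a sketch, assuming the key_to_list dict built above) maps each name to a list of keys instead:

from collections import defaultdict

# sketch: collect every key a name belongs to, not just the last one seen
name_to_keys = defaultdict(list)
for key, names in key_to_list.items():
    for name in names:
        name_to_keys[name].append(key)

print(name_to_keys['Alex'])  # e.g. ['Goodmorning']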
What is the best way to parse the file below? The blocks repeat multiple times.
The expected result is output to CSV file as:
{Place: REGION-1, Host: ABCD, Area: 44...}
I tried the code below, but it only iterates over the first block and then finishes.
with open('/tmp/t2.txt', 'r') as input_data:
    for line in input_data:
        if re.findall('(.*_RV)\n', line):
            myDict = {}
            myDict['HOST'] = line[6:]
            continue
        elif re.findall('Interface(.*)\n', line):
            myDict['INTF'] = line[6:]
        elif len(line.strip()) == 0:
            print(myDict)
Text file is below.
Instance REGION-1:
ABCD_RV
Interface: fastethernet01/01
Last state change: 0h54m44s ago
Sysid: 01441
Speaks: IPv4
Topologies:
ipv4-unicast
SAPA: point-to-point
Area Address(es):
441
IPv4 Address(es):
1.1.1.1
EFGH_RV
Interface: fastethernet01/01
Last state change: 0h54m44s ago
Sysid: 01442
Speaks: IPv4
Topologies:
ipv4-unicast
SAPA: point-to-point
Area Address(es):
442
IPv4 Address(es):
1.1.1.2
Instance REGION-2:
IJKL_RV
Interface: fastethernet01/01
Last state change: 0h54m44s ago
Sysid: 01443
Speaks: IPv4
Topologies:
ipv4-unicast
SAPA: point-to-point
Area Address(es):
443
IPv4 Address(es):
1.1.1.3
Or if you prefer an ugly regex route:
import re

region_re = re.compile(r"^Instance\s+([^:]+):.*")
host_re = re.compile(r"^\s+(.*?)_RV.*")
interface_re = re.compile(r"^\s+Interface:\s+(.*?)\s+")
other_re = re.compile(r"^\s+([^\s]+).*?:\s+([^\s]*){0,1}")

myDict = {}
extra = None
with open('/tmp/t2.txt', 'r') as input_data:
    for line in input_data:
        if extra:  # value on next line from key
            myDict[extra] = line.strip()
            extra = None
            continue
        region = region_re.match(line)
        if region:
            if len(myDict) > 1:
                print(myDict)
            myDict = {'Place': region.group(1)}
            continue
        host = host_re.match(line)
        if host:
            if len(myDict) > 1:
                print(myDict)
            myDict = {'Place': myDict['Place'], 'Host': host.group(1)}
            continue
        interface = interface_re.match(line)
        if interface:
            myDict['INTF'] = interface.group(1)
            continue
        other = other_re.match(line)
        if other:
            groups = other.groups()
            if groups[1]:
                myDict[groups[0]] = groups[1]
            else:
                extra = groups[0]

# dump out final one
if len(myDict) > 1:
    print(myDict)
Output:
{'Place': 'REGION-1', 'Host': 'ABCD', 'INTF': 'fastethernet01/01', 'Last': '0h54m44s', 'Sysid': '01441', 'Speaks': 'IPv4', 'Topologies': 'ipv4-unicast', 'SAPA': 'point-to-point', 'Area': '441', 'IPv4': '1.1.1.1'}
{'Place': 'REGION-1', 'Host': 'EFGH', 'INTF': 'fastethernet01/01', 'Last': '0h54m44s', 'Sysid': '01442', 'Speaks': 'IPv4', 'Topologies': 'ipv4-unicast', 'SAPA': 'point-to-point', 'Area': '442', 'IPv4': '1.1.1.2'}
{'Place': 'REGION-2', 'Host': 'IJKL', 'INTF': 'fastethernet01/01', 'Last': '0h54m44s', 'Sysid': '01443', 'Speaks': 'IPv4', 'Topologies': 'ipv4-unicast', 'SAPA': 'point-to-point', 'Area': '443', 'IPv4': '1.1.1.3'}
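Since the question asks for CSV output rather than printed dicts, one possible follow-up (a sketch, not part of the answer above; it assumes each myDict is appended to a records list instead of being printed) is to dump the collected dicts with csv.DictWriter:

import csv

# assumption: `records` collected each myDict instead of print(myDict)
records = [
    {'Place': 'REGION-1', 'Host': 'ABCD', 'INTF': 'fastethernet01/01', 'Area': '441'},
    {'Place': 'REGION-1', 'Host': 'EFGH', 'INTF': 'fastethernet01/01', 'Area': '442'},
]

fieldnames = sorted({key for record in records for key in record})
with open('/tmp/t2.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(records)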
This doesn't use much regex and could be optimized further. Hope it helps!
import re
import pandas as pd
from collections import defaultdict

_level_1 = re.compile(r'instance region.*', re.IGNORECASE)

with open('stack_formatting.txt') as f:
    data = f.readlines()

"""
Format data so that it could be split easily
"""
data_blocks = defaultdict(lambda: defaultdict(str))
header = None
instance = None
for line in data:
    line = line.strip()
    if _level_1.match(line):
        header = line
    else:
        if "_RV" in line:
            instance = line
        elif not line.endswith(":"):
            data_blocks[header][instance] += line + ";"
        else:
            data_blocks[header][instance] += line


def parse_text(data_blocks):
    """
    Generate a dict which could be converted easily to a pandas dataframe
    :param data_blocks: splittable data
    :return: dict with row values for every column
    """
    final_data = defaultdict(list)
    for key1 in data_blocks.keys():
        for key2 in data_blocks.get(key1):
            final_data['instance'].append(key1)
            final_data['sub_instance'].append(key2)
            for items in data_blocks[key1][key2].split(";"):
                if items.isspace() or len(items) == 0:
                    continue
                a, b = re.split(r':\s*', items)
                final_data[a].append(b)
    return final_data


print(pd.DataFrame(parse_text(data_blocks)))
This worked for me but it's not pretty:
import pandas as pd

# read the whole file into a single string
with open('/tmp/t2.txt') as input_data:
    text = input_data.read()
text = text.rstrip(' ').rstrip('\n').strip('\n')

# first I get ready to create a csv by replacing the headers for the data
text = text.replace('Instance REGION-1:', ',')
text = text.replace('Instance REGION-2:', ',')
text = text.replace('Interface:', ',')
text = text.replace('Last state change:', ',')
text = text.replace('Sysid:', ',')
text = text.replace('Speaks:', ',')
text = text.replace('Topologies:', ',')
text = text.replace('SAPA:', ',')
text = text.replace('Area Address(es):', ',')
text = text.replace('IPv4 Address(es):', ',')

# now I strip out the leading whitespace, because it messes up the split on '\n\n'
lines = [x.lstrip(' ') for x in text.split('\n')]
clean_text = ''
# now that the leading whitespace is gone I recreate the text file
for line in lines:
    clean_text += line + '\n'

# now split the data into groups based on single entries
entries = clean_text.split('\n\n')
# create one-liners out of the entries so they can be split like csv
entry_lines = [x.replace('\n', ' ') for x in entries]

# create a dataframe to hold the data for each line
df = pd.DataFrame(columns=['Instance REGION', 'Interface',
                           'Last state change', 'Sysid', 'Speaks',
                           'Topologies', 'SAPA', 'Area Address(es)',
                           'IPv4 Address(es)']).T

# now the meat and potatoes
count = 0
for line in entry_lines:
    data = line[1:].split(',')  # split like a csv on commas
    data = [x.lstrip(' ').rstrip(' ') for x in data]  # get rid of extra leading/trailing whitespace
    df[count] = data  # create an entry for each split
    count += 1  # increment the count

df = df.T  # transpose back to normal so it doesn't look weird
Output looks like this for me
Edit: Also, since you have various answers here, I tested the performance of mine. It grows mildly exponentially, roughly as described by the equation y = 100.97e^(0.0003x).
Here are my timeit results.
Entries Milliseconds
18 49
270 106
1620 394
178420 28400
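For reference, a rough way to reproduce such a measurement (a sketch; parse here is only a stand-in for whichever parsing approach is actually being timed) is:

import timeit

def parse(text):
    # stand-in: replace with the real parsing routine being measured
    return [block for block in text.split('\n\n') if block]

sample = open('/tmp/t2.txt').read() * 100  # repeat the sample file to grow the entry count

runs = 10
seconds = timeit.timeit(lambda: parse(sample), number=runs) / runs
print('{:.0f} ms per run'.format(seconds * 1000))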
I have to write a script in Python that will do the following actions:
I have an xlsx/csv file in which 300 cities are listed in one column.
I have to make all pairs between them, and with the help of the Google API I have to add their distance and travel time in additional columns.
My CSV file looks like this:
=======
SOURCE
=======
Agra
Delhi
Jaipur
and the expected output in the csv/xlsx file would be like this:
=============================================
SOURCE | DESTINATION | DISTANCE | TIME_TRAVEL
=============================================
Agra | Delhi | 247 | 4
Agra | Jaipur | 238 | 4
Delhi | Agra | 247 | 4
Delhi | jaipur | 281 | 5
Jaipur | Agra | 238 | 4
Jaipur | Delhi | 281 | 5
and so on. How do I do this?
NOTE: Distance and travel time are from Google.
To make the pairs you can use itertools.permutations to get all possible pairs.
Code for the same would be:
import csv  # imports the csv module
import sys  # imports the sys module
import itertools

source_list = []
destination_list = []

f = open(sys.argv[1], 'rb')
g = open(sys.argv[2], 'wb')
# opens the csv file
try:
    reader = csv.reader(f)
    my_list = list(reader)  # creates the reader object
    for i in my_list:
        source_list.append(i[0])
    a = list(itertools.permutations(source_list, 2))
    # reset the lists so each output row lines up with one (source, destination) pair
    source_list = []
    destination_list = []
    for i in a:
        source_list.append(i[0])
        destination_list.append(i[1])
    mywriter = csv.writer(g)
    rows = zip(source_list, destination_list)
    mywriter.writerows(rows)
    g.close()
finally:
    f.close()
Apart from that, to get the distance and time from Google, this sample code may serve as a starting point (it needs full debugging):
import csv  # imports the csv module
import sys  # imports the sys module
import urllib2, json
import ast

api_google_key = ''
api_google_url = 'https://maps.googleapis.com/maps/api/distancematrix/json?origins='

source_list = []
destination_list = []
distance_list = []
duration_list = []

f = open(sys.argv[1], 'rb')
g = open(sys.argv[2], 'wb')
# opens the csv file
try:
    reader = csv.reader(f)
    my_list = list(reader)  # creates the reader object
    for i in my_list:
        if i:
            s = (i[0])
            src = s.replace(" ", "")
            d = (i[1])
            dest = d.replace(" ", "")
            source = ''.join(e for e in src if e.isalnum())
            destination = ''.join(e for e in dest if e.isalnum())
            print 'source status = ' + str(source.isalnum())
            print 'dest status = ' + str(destination.isalnum())
            source_list.append(source)
            destination_list.append(destination)
            request = api_google_url + source + '&destinations=' + destination + '&key=' + api_google_key
            print request
            dist = json.load(urllib2.urlopen(request))
            if dist['rows']:
                if 'duration' in dist['rows'][0]['elements'][0].keys():
                    duration_dict = dist['rows'][0]['elements'][0]['duration']['text']
                    distance_dict = dist['rows'][0]['elements'][0]['distance']['text']
                else:
                    duration_dict = 0
                    distance_dict = 0
            else:
                duration_dict = 0
                distance_dict = 0
            distance_list.append(distance_dict)
            duration_list.append(duration_dict)
    mywriter = csv.writer(g)
    rows = zip(source_list, destination_list, distance_list, duration_list)
    mywriter.writerows(rows)
    g.close()
finally:
    f.close()
You can do this by using itertools.product, but that means you'll also get repetitions like (Agra, Agra), the distance for which will be 0.
import itertools
cities = ["Agra","Delhi","Jaipur"]
cities2 = cities
p = itertools.product(cities, cities2)
print(list(p))
In this case you'd get
[('Agra', 'Agra'), ('Agra', 'Delhi'), ('Agra', 'Jaipur'), ('Delhi', 'Agra'), ('Delhi', 'Delhi'), ('Delhi', 'Jaipur'), ('Jaipur', 'Agra'), ('Jaipur', 'Delhi'), ('Jaipur', 'Jaipur')]
You can loop over this list and make a request to Google to get the travel time and distance for each pair.
>>> for pair in list(p):
... print (pair)
...
('Agra', 'Agra')
('Agra', 'Delhi')
('Agra', 'Jaipur')
('Delhi', 'Agra')
('Delhi', 'Delhi')
('Delhi', 'Jaipur')
('Jaipur', 'Agra')
('Jaipur', 'Delhi')
('Jaipur', 'Jaipur')
You can get all the ordered pairs with itertools.permutations() like so:
from itertools import permutations

with open(cities_file, 'r') as f, open(newfile, 'w') as f2:
    for pair in (permutations([a.strip() for a in f.read().splitlines()], 2)):
        print pair
        response = googleapi.get(pair)
        f2.write(response + '\n')
Output of print pair
('Agra', 'Delhi')
('Agra', 'Jaipur')
('Delhi', 'Agra')
('Delhi', 'Jaipur')
('Jaipur', 'Agra')
('Jaipur', 'Delhi')
You can then hit the API with the list elements one by one and keep storing the results in the file.
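For illustration, here is a minimal Python 3 sketch of that loop. It assumes the same Distance Matrix endpoint used in the earlier answer, hypothetical cities.csv / pairs_out.csv filenames, and that you supply your own API key:

import csv
import json
import urllib.parse
import urllib.request
from itertools import permutations

API_KEY = 'YOUR_KEY_HERE'  # assumption: your own Google API key
BASE_URL = 'https://maps.googleapis.com/maps/api/distancematrix/json'

with open('cities.csv') as f, open('pairs_out.csv', 'w', newline='') as out:
    # skip blank rows and the SOURCE header line
    cities = [row[0].strip() for row in csv.reader(f)
              if row and row[0].strip() and row[0].strip() != 'SOURCE']
    writer = csv.writer(out)
    writer.writerow(['SOURCE', 'DESTINATION', 'DISTANCE', 'TIME_TRAVEL'])
    for src, dst in permutations(cities, 2):
        url = BASE_URL + '?' + urllib.parse.urlencode(
            {'origins': src, 'destinations': dst, 'key': API_KEY})
        reply = json.load(urllib.request.urlopen(url))
        element = reply['rows'][0]['elements'][0]
        if element.get('status') == 'OK':
            writer.writerow([src, dst, element['distance']['text'], element['duration']['text']])
        else:
            writer.writerow([src, dst, 0, 0])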
I have 2 CSV files: one with city name, population and humidity; in the second, cities are mapped to states. I want to get the state-wise total population and average humidity. Can someone help? Here is the example:
CSV 1:
CityName,population,humidity
Austin,1000,20
Sanjose,2200,10
Sacramento,500,5
CSV 2:
State,city name
Ca,Sanjose
Ca,Sacramento
Texas,Austin
I would like to get this output (sum of population and average humidity per state):
Ca,2700,7.5
Texas,1000,20
The above solution doesn't work because the dictionary will contain only one value per key. I gave up and finally used a loop. The code below works; the input is included too.
csv1
state_name,city_name
CA,sacramento
utah,saltlake
CA,san jose
Utah,provo
CA,sanfrancisco
TX,austin
TX,dallas
OR,portland
CSV2
city_name population humidity
sacramento 1000 1
saltlake 300 5
san jose 500 2
provo 100 7
sanfrancisco 700 3
austin 2000 4
dallas 2500 5
portland 300 6
def mapping_within_dataframe(self, file1, file2, file3):
    self.csv1 = file1
    self.csv2 = file2
    self.outcsv = file3
    one_state_data = 0
    outfile = csv.writer(open(self.outcsv, 'w'), delimiter=',')
    state_city = read_csv(self.csv1)  # pandas read_csv
    city_data = read_csv(self.csv2)
    all_state = list(set(state_city.state_name))
    for one_state in all_state:
        one_state_cities = list(state_city.loc[state_city.state_name == one_state, "city_name"])
        one_state_data = 0
        for one_city in one_state_cities:
            one_city_data = city_data.loc[city_data.city_name == one_city, "population"].sum()
            one_state_data = one_state_data + one_city_data
        print one_state, one_state_data
        outfile.writerows(whatever)
def output(file1, file2):
    f = lambda x: x.strip()  # strips newline and white space characters
    with open(file1) as cities:
        with open(file2) as states:
            states_dict = {}
            cities_dict = {}
            for line in states:
                line = line.split(',')
                states_dict[f(line[0])] = f(line[1])
            for line in cities:
                line = line.split(',')
                cities_dict[f(line[0])] = (int(f(line[1])), int(f(line[2])))
            for state, city in states_dict.iteritems():
                try:
                    print state, cities_dict[city]
                except KeyError:
                    pass

output(CSV1, CSV2)  # these are the names of the files
This gives the output you wanted. Just make sure the names of cities in both files are the same in terms of capitalization.
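If what you actually need is the state-wise sum and average asked for in the question, a minimal sketch (separate from the answer above; it assumes the original CSV 1 / CSV 2 layout with their headers, saved as the hypothetical csv1.csv and csv2.csv) could look like this:

import csv
from collections import defaultdict

# city -> (population, humidity) from CSV 1 (header: CityName,population,humidity)
with open('csv1.csv') as f:
    city_stats = {row['CityName']: (int(row['population']), float(row['humidity']))
                  for row in csv.DictReader(f)}

# accumulate per state from CSV 2 (header: State,city name)
totals = defaultdict(lambda: [0, 0.0, 0])  # population sum, humidity sum, city count
with open('csv2.csv') as f:
    for row in csv.DictReader(f):
        population, humidity = city_stats[row['city name']]
        totals[row['State']][0] += population
        totals[row['State']][1] += humidity
        totals[row['State']][2] += 1

for state, (population, humidity_sum, count) in totals.items():
    print('{},{},{}'.format(state, population, humidity_sum / count))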
I am new to Python and need help. I am trying to make a list of comma-separated values.
I have this data.
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Now how do I get something like this:
"EasternMountain": 84844,
"EasternHill":346373,
and so on??
So far I have been able to do this:
fileHandle = open("testData", "r")
data = fileHandle.readlines()
fileHandle.close()
dataDict = {}
for i in data:
    temp = i.split(" ")
    dataDict[temp[0]] = temp[1]
    with_comma = '"' + temp[0] + '"' + ':' + temp[1] + ','
    print with_comma
Use the csv module
import csv

with open('k.csv', 'r') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ')
    my_dict = {}
    for row in reader:
        my_dict[row[0]] = [''.join(e.split(',')) for e in row[1:]]
    print my_dict
k.csv is a text file containing:
EasternMountain 84,844 39,754 24,509 286 16,571 3,409 315
EasternHill 346,373 166,917 86,493 1,573 66,123 23,924 1,343
EasternTerai 799,526 576,181 206,807 2,715 6,636 1,973 5,214
CentralMountain 122,034 103,137 13,047 8 2,819 2,462 561
Output:
{'EasternHill': ['346373', '166917', '86493', '1573', '66123', '23924', '1343', ''], 'EasternTerai': ['799526', '576181', '206807', '2715', '6636', '1973', '5214', ''], 'CentralMountain': ['122034', '103137', '13047', '8', '2819', '2462', '561', ''], 'EasternMountain': ['84844', '39754', '24509', '286', '16571', '3409', '315', '']}
Try this:
def parser(file_path):
    d = {}
    with open(file_path) as f:
        for line in f:
            if not line:
                continue
            parts = line.split()
            d[parts[0]] = [part.replace(',', '') for part in parts[1:]]
    return d
Running it:
result = parser("testData")
for key, value in result.items():
    print key, ':', value
Result:
EasternHill : ['346373', '166917', '86493', '1573', '66123', '23924', '1343']
EasternTerai : ['799526', '576181', '206807', '2715', '6636', '1973', '5214']
CentralMountain : ['122034', '103137', '13047', '8', '2819', '2462', '561']
EasternMountain : ['84844', '39754', '24509', '286', '16571', '3409', '315']
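If the goal is the exact "EasternMountain": 84844, formatting from the question (only the first number per row, commas removed), a small variation (a sketch, separate from the answers above) is:

import json

with open('testData') as f:
    data = {}
    for line in f:
        if not line.strip():
            continue
        parts = line.split()
        data[parts[0]] = int(parts[1].replace(',', ''))

# json.dumps produces the  "key": value,  style shown in the question
print(json.dumps(data, indent=4))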