Write within-patient data from single observation to multiple text files - python

I am trying to write out individual column values from a dataframe in which each row represents one patient's data. I have a loop that takes each patient's 'id' and generates 25 'id'.txt files - one per patient. I now want to loop through the df, pick up individual data points (e.g. the 'fio2' value for the patient with id=6) and append each to that patient's .txt file.
Here is the problem I need some guidance with: when I run the for loops (I've tried multiple variations), ALL 25 values for all patients get appended to every individual patient's text file.
The df/data look like this:
My basic code that creates/writes to the text files is:
for i in data['id']:
    filename = str(i) + '.txt'
    f = open(filename, 'a+')
    f.write('{}\n'.format('-----------------------------------------------'))
    f.write(datetime.datetime.now().strftime("%d.%m.%y"))
    f.write('{}\n'.format(''))
    f.write('{}\n'.format('Updated summary of patient data'))
    f.close()
I believe (probably incorrectly) that I need a nested loop. How would I modify this code to do what I need done?

You could try something like this:
import pandas as pd

d = {
    'id': range(10),
    'name': list('abcdefghij')
}
df = pd.DataFrame(d)
print(df.head(2))

def search_id_and_return_field(id, return_field_name):
    return df.loc[df.id == id][return_field_name].values[0]

required_ids = [1, 5]
for id in required_ids:
    print(search_id_and_return_field(id=id, return_field_name='name'))
    break
In your code, it would fit in somewhere like so:
for i in required_ids:
    filename = str(i) + '.txt'
    f = open(filename, 'a+')
    f.write('{}\n'.format('-----------------------------------------------'))
    f.write(datetime.datetime.now().strftime("%d.%m.%y"))
    f.write('{}\n'.format(search_id_and_return_field(id=i, return_field_name="fio2")))  # change the field name to be returned here
    f.write('{}\n'.format('Updated summary of patient data'))
    f.close()
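Alternatively, since each row already holds one patient's data, you could skip the per-id lookup and iterate over the rows directly, so no nested loop is needed. A minimal sketch, assuming your dataframe is named data and using the 'fio2' column from your question:
import datetime

for _, row in data.iterrows():
    filename = str(row['id']) + '.txt'
    with open(filename, 'a+') as f:
        f.write('{}\n'.format('-----------------------------------------------'))
        f.write(datetime.datetime.now().strftime("%d.%m.%y"))
        f.write('{}\n'.format(row['fio2']))  # this row's value only
        f.write('{}\n'.format('Updated summary of patient data'))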

Related

Python Export Dictionary to CSV

I'm new to Python and I've been trying to create a csv file and save each result in a new row. The results consist of several rows and each line should be captured in the csv. However, my csv file separates each letter into a new row. I also need to add new key values for the filename, but I don't know how to get the image filename (the input is images). I used the search bar looking for similar cases/recommended solutions but am still stumped. Thanks in advance.
with open('glaresss_result.csv','wt') as f:
    f.write(",".join(res1.keys()) + "\n")
    for imgpath in glob.glob(os.path.join(TARGET_DIR, "*.png")):
        res1,res = send_request_qcglare(imgpath)
        for row in zip(*res1.values()):
            f.write(",".join(str(n) for n in row) + "\n")
    f.close()
The dictionary res1, printed during each iteration, returns:
{'glare': 'Passed', 'brightness': 'Passed'}
The results should look like this (3 rows):
glare brightness
Passed Passed
Passed Passed
Passed Passed
But the current output separates each letter into its own row.
A few things I changed:
'w' is enough, since 't' for text mode is the default.
There is no need to close the csv when using a context manager.
There is no need for zip and str(n) for n in row - just join the values of the dictionary.
UPDATED
with open('glaresss_result.csv', 'w') as f:
    f.write(",".join([*res1] + ['filename']) + "\n")  # replace 'filename' with whatever column name you want; assumes res1 already exists from a previous call
    for imgpath in glob.glob(os.path.join(TARGET_DIR, "*.png")):
        res1, res = send_request_qcglare(imgpath)
        f.write(",".join([*res1.values()] + [imgpath]) + "\n")  # imgpath (which needs to be a string) will be the value of each row; replace with whatever suits you
If you plan to do more with the data, you might want to check out the pandas library, e.g. DataFrame.from_records for your use case:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.from_records.html
It provides a lot of out-of-the-box functionality to read, transform and write data.
import glob
import os

import pandas as pd

results = []
for imgpath in glob.glob(os.path.join(TARGET_DIR, "*.png")):
    res1, res = send_request_qcglare(imgpath)
    results.append(res1)

df = pd.DataFrame.from_records(results)
df.to_csv("glaresss_result.csv", index=False)

Python entire XML file to list and then into dataframe, missing most of the file

My final goal is to take each xml file and load the raw XML into Snowflake, and this is the result I have so far. For some reason, though, when I convert the list to a DataFrame, the dataframe only takes a couple of items from the list for each file - not the entire 5000 rows in the xml.
My list data is grabbing all contents from multiple files; in the list you can see the following: each list item is generating a numpy array, and it's splitting up the elements from the looks of it.
from datetime import datetime
import glob

import numpy as np
import pandas as pd

dated = datetime.today().strftime('%Y-%m-%d')
source_dir = r'C:\Users\jSmith\.spyder-py3\SampleXML'
table_name = 'LV_XML'
file_list = glob.glob(source_dir + '/*.XML')

data = []
for file_path in file_list:
    data.append(
        np.genfromtxt(file_path, dtype='str', delimiter='|', encoding='utf-8'))  # delimiter used to make sure it is not splitting based on spaces, might be the issue?

df = pd.DataFrame(list(zip(data)),
                  columns=['SRC_XML'])
df['SRC_XML'] = df['SRC_XML'].astype(str)
df = df.replace(',', '', regex=True)
df["TPR_AS_OF_DT"] = dated
The data frame ends up with only a couple of items from each file in each column.
Solution via Dave, with a small tweak:
for file_path in file_list:
    with open(file_path, 'r') as afile:
        content = ''
        for aline in afile:
            content += aline.replace('\n', ' ')  # changed to replace for my needs
        data.append(content)
This puts each file's data into a single string, ready to be inserted into the Snowflake table as one string for future queries.
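For the follow-on insert, a minimal sketch using the snowflake-connector-python package; the connection parameters are placeholders, and it assumes the LV_XML table from the question has SRC_XML and TPR_AS_OF_DT columns:
import snowflake.connector

# Placeholder credentials - substitute your own account details.
conn = snowflake.connector.connect(
    user='YOUR_USER',
    password='YOUR_PASSWORD',
    account='YOUR_ACCOUNT',
    database='YOUR_DATABASE',
    schema='YOUR_SCHEMA',
)
try:
    cur = conn.cursor()
    for content in data:  # one raw XML string per file, built above
        cur.execute(
            "INSERT INTO LV_XML (SRC_XML, TPR_AS_OF_DT) VALUES (%s, %s)",
            (content, dated),
        )
finally:
    conn.close()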
Perhaps replace the file reading with this:
for file_path in file_list:
    with open(file_path, 'r') as afile:
        content = ''
        for aline in afile:
            content += aline.strip('\n')
        data.append(content)
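As a side note, pathlib can collapse the read-and-join into a line per file; a small sketch assuming the files are plain UTF-8 text:
from pathlib import Path

data = [Path(file_path).read_text(encoding='utf-8').replace('\n', ' ')
        for file_path in file_list]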

How can I print a specific set of data in Python 3.9, from a column of a CSV file based on two sets of descriptors from two other columns?

I have an assignment to print specific data within a CSV file. The data to be printed are the registration numbers of vehicles caught by the camera at a location whose descriptor is stored in the variable search_Descriptor, during an hour specified in the variable search_HH.
The CSV file is called: Carscaught.csv
All the registration numbers of the vehicles are under the column titled: Plates.
The descriptors are locations where the vehicles were caught, under the column titled: Descriptor.
And the hours when each vehicle was caught are under the column titled: HH.
This is the file, it's quite big so I have shared it from google drive:
https://drive.google.com/file/d/1zhIxg5s_nVGzk_5JUXkujbetSIuUcNRU/view?usp=sharing
This is an image of a few lines of the CSV file from the top; the actual data fills 3170 rows, with the "HH" column covering hours 0 to 23:
Carscaught.csv
In my code I have defined two variables, as I want to print only the registration plates of vehicles that were caught at the location "Kilburn Bldg", specifically at "17" hours:
search_Descriptor = "Kilburn Bldg"
search_HH = "17"
This is the code I have used, but I have no clue how to go further using the defined variables to print the specific data I need. And I HAVE to use those specific variables as they are shown, by the way.
search_Descriptor = "Kilburn Bldg"
search_HH = "17"
fo = open ('Carscaught.csv', 'r')
counter = 0;
line = fo.readline()
while line:
print(line, end = "")
line = fo.readline();
counter = counter + 1;
fo.close()
All that code does is read the entire file and close it. I have no idea how to get the desired output, which should be these three specific registration numbers:
JOHNZS
KEENAS
KR8IVE
Hopefully you can help me with this. Thank you.
You possibly want to look at csv.DictReader:
import csv

with open('Carscaught.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        plate = row['Plates']
        hour = row['HH']
        minute = row['MM']
        descriptor = row['Descriptor']
        if descriptor == search_Descriptor:
            print("Found a row, now to check time")
Then you can use simple logic to search for the data you need.
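Building on that, a short sketch of the remaining logic, using the search variables from the question (DictReader yields every field as a string, so comparing against "17" works):
import csv

search_Descriptor = "Kilburn Bldg"
search_HH = "17"

with open('Carscaught.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        # Print the plate only when both the location and the hour match.
        if row['Descriptor'] == search_Descriptor and row['HH'] == search_HH:
            print(row['Plates'])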
Try:
import pandas as pd

search_Descriptor = "Kilburn Bldg"
search_HH = 17  # based on the csv posted above, HH is int and not str, so the quotation marks are removed

df = pd.read_csv("Carscaught.csv")
df2 = df[df["Descriptor"].eq(search_Descriptor) & df["HH"].eq(search_HH)]
df3 = df2["Plates"]
print(df3)
Output (the numbers 1636, 1648, and 1660 are their row numbers):
1636 JOHNZS
1648 KEENAS
1660 KR8IVE
If you don't have pandas yet, there are different tutorials on how to download/use it depending on where you are writing your code.

Read data from excel after a string matches

I want to read an entire row of data and store it in variables, then later use them in selenium to write to webelements. The programming language is Python.
Example: I have an excel sheet of Incidents and their details regarding priority, date, assignee, etc.
If I give the string INC00000, it should match the excel data, fetch all the above details and store them in separate variables like:
INC # = INC0000, Priority = Moderate, Date = 11/2/2020
Is this feasible? I tried and failed to write the code. Please suggest possible ways to do this.
I would:
load the sheet into a pandas DataFrame
filter the corresponding column in the DataFrame by the INC # of interest
convert the row to a dictionary (assuming the INC filter produces only 1 row)
get the corresponding value in the dictionary to assign to the corresponding webelement
Example:
import pandas as pd

df = pd.read_excel("full_file_path", sheet_name="name_of_sheet")
dict_data = df[df['INC #'] == 'INC00000'].to_dict("records")[0]  # assuming the INC numbers are in a column named "INC #" and the filter matches exactly one row
webelement1.send_keys(dict_data[columnname1])
webelement2.send_keys(dict_data[columnname2])
webelement3.send_keys(dict_data[columnname3])
.
.
.
Please find the below code (based on the dummy data image) and change it as per your variables after saving your excel file as csv:
import csv

# Set up input and output variables for the script
gTrack = open("file1.csv", "r")

# Set up CSV reader and process the header
csvReader = csv.reader(gTrack)
header = next(csvReader)
print(header)

id_index = header.index("id")
date_index = header.index("date ")
var1_index = header.index("var1")
var2_index = header.index("var2")

# Make an empty list
cList = []

# Loop through the lines in the file and get the required id
for row in csvReader:
    id = row[id_index]
    if id == 'INC001':
        date = row[date_index]
        var1 = row[var1_index]
        var2 = row[var2_index]
        cList.append([id, date, var1, var2])

# Print the collected list
print(cList)

How to name dataframes with a for loop?

I want to read several json files and write each to a dataframe with a for-loop.
review_categories = ["beauty", "pet"]

for i in review_categories:
    filename = "D:\\Library\\reviews_{}.json".format(i)
    output = pd.read_json(path_or_buf=filename, lines=True)
    return output
The problem is I want each review category to have its own variable, like a dataframe called "beauty_reviews", and another called "pet_reviews", containing the data read from reviews_beauty.json and reviews_pet.json respectively.
I think it is easiest to handle the dataframes in a dictionary. Try the code below:
review_categories = ["beauty", "pet"]

reviews = {}
for review in review_categories:
    df_name = review + '_reviews'  # the name for the dataframe
    filename = "D:\\Library\\reviews_{}.json".format(review)
    reviews[df_name] = pd.read_json(path_or_buf=filename, lines=True)
In reviews, each key maps to the respective dataframe storing the data. If you want to retrieve the data, just call:
reviews["beauty_reviews"]
Hope it helps.
You can first pack the files into a list
reviews = []
review_categories = ["beauty", "pet"]
for i in review_categories:
    filename = "D:\\Library\\reviews_{}.json".format(i)
    reviews.append(pd.read_json(path_or_buf=filename, lines=True))
and then unpack your results into the variable names you wanted:
beauty_reviews, pet_reviews = reviews
