I have been trying to build a VLOOKUP-style function in Python. I have two files: data, created in Python, which has more than 2000 rows, and comp_data, a CSV file loaded into the system, which has 35 rows. I have to match the dates in the data file against comp_data and fill in the corresponding Exp_date. My current code fails with an error involving 35, and I am not able to understand the problem.
Here is the code:
data['Exp_date'] = datetime.date(2020,3,30)
z = 0
for i in range(len(data)):
    if data['Date'][i] == comp_data['Date'][z]:
        data['Exp_date'][i] = comp_data['Exp_date'][z]
    else:
        z = z + 1
One option would be to put your comp_data in a dictionary with your date/exp_date as key/value pairs and let Python do the lookup for you.
data = {"date": ["a","b","c","d","e","f"], "exp_date": [0,0,0,0,0,0]}
comp = {"a": 10, "d": 13}

for i in range(len(data['date'])):
    if data['date'][i] in comp:
        data['exp_date'][i] = comp[data['date'][i]]

print(data)
There's probably a one-liner way of doing this with iterators!
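For instance, using the same toy data and comp dictionaries as above, a list comprehension with dict.get can do the whole lookup in one line (this is just a sketch of the idea, not tested against the asker's real files):

```python
data = {"date": ["a", "b", "c", "d", "e", "f"], "exp_date": [0, 0, 0, 0, 0, 0]}
comp = {"a": 10, "d": 13}

# dict.get(key, default) keeps the existing exp_date when the date has no match
data["exp_date"] = [comp.get(d, e) for d, e in zip(data["date"], data["exp_date"])]
print(data)  # exp_date becomes [10, 0, 0, 13, 0, 0]
```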
I have an assignment to print specific data within a CSV file. The data to be printed are the registration numbers of vehicles caught by the camera at a location whose descriptor is stored in the variable search_Descriptor, during an hour specified in the variable search_HH.
The CSV file is called: Carscaught.csv
All the registration numbers of the vehicles are under the column titled: Plates.
The descriptors are locations where the vehicles were caught, under the column titled: Descriptor.
And the hours when each vehicle was caught are under the column titled: HH.
This is the file, it's quite big so I have shared it from google drive:
https://drive.google.com/file/d/1zhIxg5s_nVGzk_5JUXkujbetSIuUcNRU/view?usp=sharing
This is an image of a few lines of the CSV file from the top; the actual data in the file fills 3170 rows and goes all the way from 0 to 23 hours in the "HH" column.
In my code I have defined two variables as I want to print only the registration plates of vehicles that were caught at the Location of "Kilburn Bldg" specifically at "17" hours:
search_Descriptor = "Kilburn Bldg"
search_HH = "17"
This is the code I have used, but I have no clue how to go further and use the defined variables to print the specific data I need. And I HAVE to use those specific variables as they are shown, by the way.
search_Descriptor = "Kilburn Bldg"
search_HH = "17"

fo = open('Carscaught.csv', 'r')
counter = 0
line = fo.readline()
while line:
    print(line, end="")
    line = fo.readline()
    counter = counter + 1
fo.close()
All that code does is read the entire file and close it. I have no idea how to get the desired output, which should be these three specific registration numbers:
JOHNZS
KEENAS
KR8IVE
Hopefully you can help me with this. Thank you.
You possibly want to look at csv.DictReader:
import csv

with open('Carscaught.csv') as f:
    reader = csv.DictReader(f)
    for row in reader:
        plate = row['Plates']
        hour = row['HH']
        minute = row['MM']
        descriptor = row['Descriptor']
        if descriptor == search_Descriptor:
            print("Found a row, now to check time")
Then you can use simple logic to search for the data you need.
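Putting that logic together, the full filter might look like this (shown here against a few made-up rows in a StringIO so it runs standalone; swap the StringIO for open('Carscaught.csv') to use the real file):

```python
import csv
import io

search_Descriptor = "Kilburn Bldg"
search_HH = "17"

# Made-up sample rows standing in for Carscaught.csv
sample = io.StringIO(
    "Plates,Descriptor,HH,MM\n"
    "JOHNZS,Kilburn Bldg,17,05\n"
    "AB12CDE,Oxford Rd,17,10\n"
    "KEENAS,Kilburn Bldg,17,22\n"
    "XY99ZZZ,Kilburn Bldg,09,30\n"
)

matches = []
for row in csv.DictReader(sample):
    # Both the location and the hour must match
    if row["Descriptor"] == search_Descriptor and row["HH"] == search_HH:
        matches.append(row["Plates"])

print(matches)  # -> ['JOHNZS', 'KEENAS']
```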
Try:
import pandas as pd
search_Descriptor = "Kilburn Bldg"
search_HH = 17 # based on the csv that you have posted above, HH is int and not str so I removed the quotation marks
df = pd.read_csv("Carscaught.csv")
df2 = df[df["Descriptor"].eq(search_Descriptor) & df["HH"].eq(search_HH)]
df3 = df2["Plates"]
print(df3)
Output (the numbers 1636, 1648, and 1660 are their row numbers):
1636 JOHNZS
1648 KEENAS
1660 KR8IVE
If you don't have pandas yet, there are different tutorials on how to download/use it depending on where you are writing your code.
I have a complex flat file with a huge amount of mixed-type data. I am trying to parse it using Python (the language I know best), and I have succeeded in segregating the data categorically using manual parsing.
Now I am stuck at the point where I have extracted the data and need to make it tabular so that I can write it into xls, using pandas or any other library.
I have pasted data at pastebin , url is https://pastebin.com/qn9J5nUL
The data comes in non-tabular and tabular formats; I need to discard the non-tabular data and write only the tabular data into xls.
To be precise, I want to delete the data below -
ABC Command-----UIP BLOCK:;
SE : ABC_UIOP_89TP
Report : +ve ABC_UIOP_89TP 2016-09-23 15:16:14
O&M #998459350
%%/*Web=1571835373:;%%
ID = 0 Result Ok.
and only utilize data in the below format (example, not exact; please refer to the pastebin URL to see the complete data format) -
Local Info ID ID Name ID Frequency ID Data My ID
0 XXX_1 0 12 13
Since your data file has a certain pattern, I think you can do it this way.
import pandas as pd

s = []
e = []
with open('data_to_be_parsed.txt') as f:
    datafile = f.readlines()

# Record the start ("Local ...") and end ("(Number of results ...") line of each table
for idx, line in enumerate(datafile):
    if 'Local' in line:
        s.append(idx)
    if '(Number of results' in line:
        e.append(idx)

maindf = pd.DataFrame()
for i in range(len(s)):
    # Header row: split on spaces and drop empty tokens
    head = [x for x in datafile[s[i]].split(" ") if x.strip()]
    rows = []
    for l_ in range(s[i] + 1, e[i]):
        da = datafile[l_]
        if len(da) > 1:
            data = [x for x in da.split(" ") if x.strip()]
            rows.append(dict(zip(head, data)))
    tmpdf = pd.DataFrame(rows, columns=head)
    maindf = pd.concat([maindf, tmpdf])

maindf.to_excel("output.xlsx")
I want to read the entire row data and store it in variables, later use them in selenium to write it to webelements. Programming language is Python.
Example: I have an excel sheet of Incidents and their details regarding priority, date, assignee etc
If I give the string as INC00000 it should match the excel data, fetch all the above details and store it in separate variables like
INC # = INC00000, Priority = Moderate, Date = 11/2/2020
Is this feasible? I tried writing code and failed. Please suggest possible ways to do this.
I would,
load the sheet into a pandas DataFrame
filter the corresponding column in the DataFrame by the INC # of interest
convert the row to dictionary (assuming the INC filter produces only 1 row)
get the corresponding value in the dictionary to assign to the corresponding webelement
Example:
import pandas as pd

df = pd.read_excel("full_file_path", sheet_name="name_of_sheet")

# Filter by the INC # of interest (assuming the column is named "INC #" in the
# spreadsheet); "records" gives a list of row dicts, and the filter is assumed
# to match exactly one row
dict_data = df[df['INC #'] == 'INC00000'].to_dict("records")[0]

webelement1.send_keys(dict_data[columnname1])
webelement2.send_keys(dict_data[columnname2])
webelement3.send_keys(dict_data[columnname3])
.
.
.
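The filter-then-to_dict step can be seen in isolation with a made-up frame standing in for the Excel sheet (the column names here are illustrative, not from the asker's file):

```python
import pandas as pd

# Made-up incident data standing in for the spreadsheet
df = pd.DataFrame({
    "INC #": ["INC00000", "INC00001"],
    "Priority": ["Moderate", "High"],
    "Date": ["11/2/2020", "12/2/2020"],
})

# Filter to the incident of interest, then take the single matching row as a dict
row = df[df["INC #"] == "INC00000"].to_dict("records")[0]
print(row["Priority"], row["Date"])  # -> Moderate 11/2/2020
```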
Please find the code below and adjust the variables as needed after saving your Excel file as CSV:
import csv

# Set up the input for the script
gTrack = open("file1.csv", "r")

# Set up the CSV reader and process the header
csvReader = csv.reader(gTrack)
header = next(csvReader)
print(header)

id_index = header.index("id")
date_index = header.index("date")
var1_index = header.index("var1")
var2_index = header.index("var2")

# Make an empty list
cList = []

# Loop through the lines in the file and get the required id
for row in csvReader:
    id = row[id_index]
    if id == 'INC001':
        date = row[date_index]
        var1 = row[var1_index]
        var2 = row[var2_index]
        cList.append([id, date, var1, var2])

# Print the collected list
print(cList)
I'm a Python newbie. I have a lot of data (25,000 CSV files) which I need to work with. I am importing the files and doing some statistical work (means, standard deviations, plots). Each file consists of 480x640 (temperature) values. I don't need the values in any particular order, so I don't care about the column or row order. Some files have a 9-row header and some 17. I run the following code:
all_files = glob.glob('*.csv')
variance_data = []
for one_file in all_files:
    try:
        with open(one_file) as data:
            all_data = np.genfromtxt((line.replace(',', '.') for line in data), skip_header=9, delimiter=";")
    except ValueError:
        with open(one_file) as data:
            all_data = np.genfromtxt((line.replace(',', '.') for line in data), skip_header=17, delimiter=";")
    variance_data.append(np.nanvar(all_data))
This reads most of the data well. But I need to exclude all values from all files which are under -10 (so I only need temperatures above -10 degree Celsius). How can I deal with this? I guess there's an easy way which I still don't know. Thanks for any help!
Are you looking for something like this? It converts any entry in your array which is at or below -10 to NaN.
import numpy as np

# Create some sample data
all_data = 20 * np.random.randn(480, 640)

# Mask out everything at or below -10, then take the NaN-aware variance
all_data[all_data <= -10] = np.nan
variance_data.append(np.nanvar(all_data))
Note: Your sample code below the for loop all needs indenting.
I have a really large Excel file and I need to delete about 20,000 rows, contingent on meeting a simple condition, and Excel won't let me delete such a complex range when using a filter. The condition is:
If the first column contains the value X, then I need to be able to delete the entire row.
I'm trying to automate this using Python and xlwt, but am not quite sure where to start. Seeking some code snippets to get me started...
Grateful for any help that's out there!
Don't delete. Just copy what you need.
read the original file
open a new file
iterate over rows of the original file (if the first column of the row does not contain the value X, add this row to the new file)
close both files
rename the new file into the original file
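The steps above might look something like this with the csv module, if the sheet is first saved as CSV (the file names and sample rows here are made up for illustration):

```python
import csv
import os

# Made-up input file standing in for the exported sheet
with open("data.csv", "w", newline="") as f:
    f.write("X,discard me\nkeep,1\nX,discard too\nkeep,2\n")

# Copy every row whose first column is not "X" into a new file
with open("data.csv", newline="") as src, open("data_filtered.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if row and row[0] != "X":
            writer.writerow(row)

# Replace the original with the filtered copy
os.replace("data_filtered.csv", "data.csv")

with open("data.csv", newline="") as f:
    kept = list(csv.reader(f))
print(kept)  # -> [['keep', '1'], ['keep', '2']]
```

Copying rather than deleting in place also scales well: each row is visited exactly once, instead of repeatedly shifting 20,000 rows.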
I like using COM objects for this kind of fun:
import win32com.client
from win32com.client import constants

f = r"h:\Python\Examples\test.xls"
DELETE_THIS = "X"

exc = win32com.client.gencache.EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(Filename=f)

row = 1
while True:
    exc.Range("B%d" % row).Select()
    data = exc.ActiveCell.FormulaR1C1
    exc.Range("A%d" % row).Select()
    condition = exc.ActiveCell.FormulaR1C1
    if data == '':
        break
    elif condition == DELETE_THIS:
        exc.Rows("%d:%d" % (row, row)).Select()
        exc.Selection.Delete(Shift=constants.xlUp)
    else:
        row += 1
# Before
#
# a
# b
# X c
# d
# e
# X d
# g
#
# After
#
# a
# b
# d
# e
# g
I usually record snippets of Excel macros and glue them together with Python as I dislike Visual Basic :-D.
You can try using the csv reader:
http://docs.python.org/library/csv.html
You can use

sh.Range(sh.Cells(1, 1), sh.Cells(20000, 1)).EntireRow.Delete()

which will delete rows 1 to 20,000 in an open Excel spreadsheet, so:

if sh.Cells(1, 1).Value == 'X':
    sh.Cells(1, 1).EntireRow.Delete()
If you just need to delete the data (rather than getting rid of the row entirely, which shifts the rows below up), you can try using my module, PyWorkbooks. You can get the most recent version here:
https://sourceforge.net/projects/pyworkbooks/
There is a pdf tutorial to guide you through how to use it. Happy coding!
I have achieved this using the pandas package.
import pandas as pd

# Read from Excel
xl = pd.ExcelFile("test.xls")

# Parse the first Excel sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])

# Update the DataFrame as required
# (here, removing rows having a blank value in the "Name" column;
# note that blank Excel cells are read in as NaN, not '', so dropna is used)
dfs = dfs.dropna(subset=['Name'])

# Write the updated DataFrame back to the Excel sheet
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)