I am creating a script, and part of it requires the names from a cell range to be stored as a list. The list needs to hold however many names are added to the cell range, but it must not store the values of empty cells.
If I simply use a longer range than is necessary like so:
names = CellRange("C10:C99999").value
then my final script will iterate through all the empty values, which is extremely inefficient.
After quite some searching through the DataNitro documentation, I found the .vertical property, which "returns the values of the cells starting with the cell it’s called from, and ending in the last non-empty cell in the same column."
So in my example this would mean:
names = Cell("C10").vertical
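An alternative sketch, had .vertical not existed, would be to read the over-long range once and filter it up front; this assumes DataNitro represents empty cells as None, which is worth verifying:
# Hedged alternative: read the long range once, then drop empty cells.
# Assumes DataNitro returns None for empty cells.
names = [v for v in CellRange("C10:C99999").value if v is not None]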
I'm having trouble naming the subsets I create inside a loop. I want to name each one with the first five letters of the condition (or even just the iteration number), but I haven't figured out how to.
Here's my code:
list_mun = list(ensud21.NOM_MUN.unique())
for mun in list_mun:
    name = ensud21[ensud21['NOM_MUN'] == mun]
list_mun is a list of the unique values that a column of my dataframe can take. Inside the for loop, name is where I want to put what I explained above, but I am unable to give each dataframe a different name. Thank you!
You shouldn't try to set variable names dynamically. Use a container instead; a dictionary is perfect here:
list_mun = list(ensud21.NOM_MUN.unique())
out_dict = {}
for mun in list_mun:
    # here we use "mun" as the key
    out_dict[mun] = ensud21[ensud21['NOM_MUN'] == mun]
Then access each subset with:
out_dict[the_mun_you_want]
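Incidentally, pandas can build the same dictionary directly with groupby; this is an equivalent sketch, not something the answer above relies on:
# Equivalent sketch: one sub-DataFrame per unique NOM_MUN value.
# Iterating a GroupBy yields (group name, sub-DataFrame) pairs.
out_dict = {mun: sub for mun, sub in ensud21.groupby('NOM_MUN')}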
I am looking to assess if there is a better method to append to a list within a list within a dictionary.
I have many different packets and associated strings to search for in a huge text file. Associated to each string is a value I want to store in a list so that I can perform calculations like average/max/min.
Due to the packet variations and the strings associated with each packet, I was looking to keep each dictionary entry to a single line. So I would have the packet ID as the key and a list of elements as the value, see below:
mycompactdict = {
    "packetID_001": [12, 15, 'ID_MATCH', [['search_string1', []], ['search_string2', []]]],
    "packetID_002": [...etc]
}
The 12, 15 ints are references I use later in Excel plotting. The 'ID_MATCH' entry is my first check to see if the packet ID matches the file object. The 'search_string' entries are the strings I am looking for, and the blank list next to each one is where I hope to drop the values associated with each search string after splitting the line in the text file.
Now I may be biting off more than Python can chew... I realize there is a list within a list within a list within a list within a dict!
Here's a start of my code...
def process_data(file_object):
    split_data = file_object.split('\n')
    for key in mycompactdict:
        if mycompactdict[key][2] in file_object:
            for line in split_data:
                for item in mycompactdict[key][3]:
                    if item[0] in line:
                        value = line.split('=', 1)[1].strip()
                        print value
and then I want to append the stripped value to the matching item's list (item[1] inside mycompactdict[key][3]).
Am I taking the wrong approach, one that will cause performance problems later on, and is there a cleaner alternative?
Below is an example of the file_object in the form of a Unicode block of text; there are both matching and differing packet IDs I need to account for.
14:27:42.0 ID_21 <<(ID_MATCH)
Type = 4
Fr = 2
search_string1 = -12
search_string2 = 1242
I would not try to re-invent the wheel were I in your position. Thus, I would use Pandas. It has something called DataFrames that would be a good fit for what you are trying to do. In addition, you can export those into Excel spreadsheets. Have a look at the 10min introduction.
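To sketch that idea concretely (the "key = value" parsing and the output file name below are my assumptions based on the sample block in the question, not a tested parser):
import pandas as pd

# Minimal sketch: collect the "key = value" lines of a packet block into
# rows, then build a DataFrame for average/max/min work and Excel export.
rows = []
for line in file_object.split('\n'):
    if '=' in line:
        field, value = line.split('=', 1)
        rows.append({'field': field.strip(), 'value': value.strip()})
df = pd.DataFrame(rows)
df.to_excel('packets.xlsx')  # hypothetical output file name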
I need to create a BASH script, ideally using SED, to find and replace lists of values in href URL link constructs within HTML site files, looking each value up in a map of old to new values. There are around 25K site files to look through, and the map has around 6,000 entries that I have to search through.
All old and new values have 6 digits.
The URL construct is:
One value:
HREF=".*jsp\?.*N=[0-9]{1,}.*"
List of values:
HREF=".*\.jsp\?.*N=[0-9]{1,}+N=[0-9]{1,}+N=[0-9]{1,}...*"
The values in the list are delimited by the + (plus) symbol, and the list can be 1 to n values in length.
I want to ignore a construct such as this:
HREF=".*\.jsp\?.*N=0.*"
i.e. where the list is only N=0
Effectively I'm only interested in URLs that include one or more values that are in the map file and that have not been tagged with CHANGED - i.e. the list requires updating.
PLEASE NOTE: in the above construct examples, .* means any run of characters that aren't digits; I'm just interested in any 6-digit values in the list of values after N=. I'm trying to isolate the N= list from the rest of the URL construct, and it should be noted that this N= list can appear anywhere within the URL construct.
Initially, I want to create a script that will create a report of all links that fulfil the above criteria and have a 6-digit OLD value that's in the map file, with each file path, to get an understanding of the links impacted. E.g.:
Filename link
filea.jsp /jsp/search/results.jsp?N=204200+731&Ntx=mode+matchallpartial&Ntk=gensearch&Ntt=
filea.jsp /jsp/search/BROWSE.jsp?Ntx=mode+matchallpartial&N=213890+217867+731&
fileb.jsp /jsp/search/results.jsp?N=0+450+207827+213767&Ntx=mode+matchallpartial&Ntk=gensearch&Ntt=
Lastly, I'd like to find and replace all 6-digit numbers within the URL construct lists, as outlined above, as efficiently as possible (I'd like it to be reasonably fast, as there could be around 25K files, with 6K values to look up, and potentially multiple values in each list).
PLEASE NOTE: There is an additional issue when finding and replacing: an old value could have been assigned a new value that is itself already in use as an old value, which may also have to be replaced.
E.G. If the map file is as below:
MAP-FILE.txt
OLD NEW
214865 218494
214866 217854
214867 214868
214868 218633
... ...
and there is a HREF link such as:
/jsp/search/results.jsp?Ntx=mode+matchallpartial&Ntk=gensearch&N=0+450+214867+214868
214867 changes to 214868 - this replacement would need to be flagged so that the value is not replaced again; otherwise what was 214867 would become 218633, as every 214868 would be changed to 218633. Hope this makes sense. With the flags appended, the link would become:
/jsp/search/results.jsp?Ntx=mode+matchallpartial&Ntk=gensearch&N=0+450+214868CHANGED+218633CHANGED
I would then need to run through each file again and strip the CHANGED flag from all marked 6-digit numbers. Unless there's a better way to manage these in-file changes.
Could someone please help me with this? I'm not an expert with these kinds of changes, so help would be massively appreciated.
Many thanks in advance,
Alex
I will write the outline for the code in some kind of pseudocode, as I don't remember Python well enough to quickly write this in Python.
First find what type it is (if it contains N=0 then it's type 3, if it contains "+" then type 2, else type 1) and get a list of strings containing "N=..." by exploding (name of the PHP function) on the "+" sign.
The first loop is over the links. The second loop is over each N= number. The third loop looks in the map file and finds the replacement value. Load the data of the map file into a variable before all the loops - file reading is the slowest operation you have in programming.
You replace the value in the third loop, then implode (PHP function) the list of new strings into a new link when returning to the first loop.
You probably have several files with the links, so you'll need another loop over the files.
When dealing with repeated numbers you need a while loop until a spare number is found, and you need to save the numbers that are already used in a list.
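Since the outline above is language-agnostic, here is a rough Python sketch of the replacement step (a sketch under assumptions: the map file has two whitespace-separated columns under an OLD NEW header, the CHANGED flag is appended exactly as in the question, and isolating the N=... list from the rest of the URL is left out for brevity):
import re

# Load the map once, before any loops - file reading is slow.
mapping = {}
with open('MAP-FILE.txt') as f:
    next(f)  # skip the "OLD NEW" header line
    for line in f:
        if line.strip():
            old, new = line.split()
            mapping[old] = new

def replace_values(link):
    # Replace each mapped 6-digit value, appending CHANGED so that a later
    # mapping entry cannot rewrite it again; a final pass would strip the
    # flag with link.replace('CHANGED', '').
    def repl(match):
        old = match.group(0)
        return mapping[old] + 'CHANGED' if old in mapping else old
    # Simplification: this touches any bare 6-digit run in the link; real
    # code would first isolate the N=... list as the question describes.
    return re.sub(r'\b\d{6}\b', repl, link)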
Importing a JSON document into a pandas dataframe using records = pandas.read_json(path), where path is a pre-defined path to the JSON document, I discovered that the contents of certain columns of the resulting dataframe "records" are not simply strings as expected. Instead, each "cell" in such a column is an array containing one single element - the string of interest. This makes selecting rows using boolean indexing difficult. For example, records[records['category']=='Python Books'] in IPython outputs an empty dataframe; had the "cells" contained strings instead of arrays of strings, the output would have been nonempty, containing the rows that correspond to Python books.
I could modify the JSON document, so that "records" reads the strings in properly. But is there a way to modify "records" directly, to somehow strip the single-element arrays into the elements themselves?
Update: After clarification, I believe this might accomplish what you want while limiting it to a single iteration over the data:
nested_column_1 = records["column_name_1"]
nested_column_2 = records["column_name_2"]
clean_column_1 = []
clean_column_2 = []
for i in range(len(records.index)):
    clean_column_1.append(nested_column_1[i][0])
    clean_column_2.append(nested_column_2[i][0])
Then you convert the clean_column lists to Series like you mentioned in your comment. Obviously, you make as many nested_column and clean_column lists as you need, and update them all in the loop.
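For concreteness, that conversion might look like this (a sketch, reusing the names above and the pandas import from the question):
# Sketch: replace the nested columns with the cleaned ones as Series,
# aligned to the original index.
records["column_name_1"] = pandas.Series(clean_column_1, index=records.index)
records["column_name_2"] = pandas.Series(clean_column_2, index=records.index)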
You could generalize this pretty easily by keeping a record of "problem" columns and using that to create a data structure to manage the nested/clean lists, rather than declaring them explicitly as I did in my example. But I thought this might illustrate the approach more clearly.
Obviously, this assumes that all columns have the same number of elements, which maybe isn't a valid assumption in your case.
Original Answer:
Sorry if I'm oversimplifying or misunderstanding the problem, but could you just do something like this?
simplified_list = [element[0] for element in my_array_of_arrays]
Or, if you don't need the whole thing at once, just use a generator instead:
simplifying_generator = (element[0] for element in my_array_of_arrays)
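As a final aside, pandas can also unwrap one-element lists without an explicit Python loop; a hedged alternative sketch (not part of the answer above), using the 'category' column from the question:
# Sketch: .str[0] indexes into each cell's list, taking its first element.
records['category'] = records['category'].str[0]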
I am using xlrd to attempt to read from and manipulate the string text within the cells of my Excel document. I am posting my code, as well as the output that is returned when I choose to print a certain column.
import xlrd
data = xlrd.open_workbook('data.xls')
sheetname = data.sheet_names()
employees = data.sheet_by_index(0)
print employees.col(2)
>>>[text:u'employee_first', text:u'\u201cRichard\u201d', text:u'\u201cCatesby\u201d', text:u'\u201cBrian\u201d']
My intention is to create a dict, or else to reference the Excel document's contents as strings in Python. I would like a number of the functions in my program to manipulate the data locally and then output it at a later point (not within the scope of this question) to a second Excel file.
How do I get rid of this extra information?
If you are only interested in the values of the cells, then you should do:
values = sheet.col_values(colx=2)
instead of:
cells = sheet.col(colx=2)
values = [c.value for c in cells]
because it's more concise and more efficient (Cell objects are constructed on the fly as/when requested).
employees.col(2) is a list of xlrd.sheet.Cell instances. To get all the values from the column (instead of the Cell objects), you can use the col_values method:
values = employees.col_values(2)
You could also do this (my original suggestion):
values = [c.value for c in employees.col(2)]
but that is much less efficient than using col_values.
\u201c and \u201d are the Unicode left and right double quotation marks, respectively. If you want to get rid of those, you can use, say, the lstrip and rstrip string methods, e.g. something like this:
values = [c.value.lstrip(u'\u201c').rstrip(u'\u201d') for c in employees.col(2)]
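Building on that, a minimal sketch of the dict the question mentions (assuming row 0 holds a header like 'employee_first'; the dict shape is a guess at what's wanted):
# Hypothetical: map the column header to the cleaned values beneath it.
header = employees.cell_value(0, 2)
names = [v.strip(u'\u201c\u201d') for v in employees.col_values(2, start_rowx=1)]
employee_dict = {header: names}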