Make the data from the second column stay at the second column - python

I'm making a form using reportlab and it's in two columns. The second column is just a copy of the first column.
I used the Frame() function to create two columns, and I used a Spacer() to push the copied form away from the original form and into the second column.
My expected result is for the data in the second column to stay in place. But what I'm getting is that when the data in the first column gets shorter, the second column starts shifting up and moves into the first column.

If I understand your question correctly, the problem is that you use a spacer to control where the content lands in the two columns/frames. That treats the layout as a single long column split in two, whereas you need to treat it as two separate columns (two separate frames).
You will therefore get much better control if you end the first frame with FrameBreak() before you start filling the other, and use the spacer only for visual spacing within a single frame.
Tools you need to be aware of are:
FrameBreak(): if you search for it you will find many code examples.
E.g. you fill frame 1 with 10 lines of text, then you insert a FrameBreak() so that the content that follows starts filling the second column.
Another tool you should be aware of is the settings used e.g. for BaseDocTemplate:
allowSplitting: if set to 1, flowables (e.g. paragraphs) may be split across frames or pages. If set to 0, each flowable is kept whole in a single frame. (Default: 1; disable with 0.)

Related

Looping by certain columns and classification by keyword?

So I've been working on data classification as part of a research project but since there are thousands of different values, I thought it best to use python to simplify the process rather than going through each record and classifying it manually.
So basically, I have a dataframe wherein one column is entitled "description" and another is entitled "codes". Each row in the "description" column contains a survey response about activities. The descriptions are all different but might contain some keywords. I have a list of some 40 codes to classify each row based on the text. I was thinking of manually creating some columns in the csv file and, in each column, typing a keyword corresponding to each of the codes. Then a loop (or a function with a loop) would go through each row of the dataframe and, if a substring matching any of the keywords is found, update the "codes" column with the code corresponding to that keyword.
My Dilemma
For example:
Suppose the list of codes is "Dance", "Nap", "Run", and "Fight" that are in a separate dataframe column. This dataframe also with the manually entered keyword columns is shown below (can be more than two but I just used two for illustration purposes).
This dataframe is named "classes".
category    Keyword1    Keyword2
Dance       dance       danc
Nap         sleep       slept
Run         run         quick
Fight       kick        unch
The other dataframe is as follows with the "codes" column initially blank.
This dataframe is named "data".
description        codes
Iwasdancingthen
She Slept
He was stealing
The function or loop will search through the "description" column above and check whether any of the keywords occur in a given row. If they do, the corresponding code is applied (see the "codes" column of the resulting dataframe below). If not, the "codes" cell for that row is left blank. The loop should run as many times as there are keyword columns; in this case it will run twice, since there are two keyword columns.
description         codes
Iwasdancingthen     Dance
She Slept           Nap
He landed a kick    Fight
We are family
FYI: The keywords don't actually have to be complete words. I'd like to use partial words too as you see above.
Also, it should be noted that the loop or function I want to make should account for case sensitivity and strings that are combined.
I hope you understand what I'm trying to do.
What I tried:
At first, I tried using a dictionary and manipulating it somehow. I used the advice here:
search keywords in dataframe cell
However, this didn't work too well as I had many "Nan" values pop up and it became too complicated, so I tried a different route using lists. The code I used was based off another user's advice:
How to conditionally update DataFrame column in Pandas
Here's what I did:
# Create lists from the classes dataframe
Keyword1list = classes["Keyword1"].values.tolist()
Category = classes["category"].values.tolist()
I then used the following loop for classification
for i in range(len(Keyword1list)):
    data.loc[data["description"] == Keyword1list[i] , "codes"] = Category[i]
However, the resulting output still gives me "Nan" for all columns. Also, I don't know how to loop over every single keyword column (in this case, loop over the two columns "Keyword1" and "Keyword2").
I'd really appreciate it if anyone could help me with a function or loop that works. Thanks in advance!
Edit: It was pointed out to me that some descriptions might contain multiple keywords. I forgot to mention that the codes in the "classes" dataframe are ordered by rank so that the ones that appear first on the dataframe should take priority; for example, if both "dance" and "nap" are in a description, the code listed higher in the "classes" dataframe (i.e. dance) should be selected and inputted into the "codes" column. I hope there's a way to do that.
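One possible way to do this, sketched under a few assumptions: the keyword columns are all those whose name starts with "Keyword", matching is case-insensitive substring matching, and a row keeps the first (highest-ranked) category that matches. The attempt above finds nothing because `==` only matches descriptions that are exactly equal to a keyword; `str.contains` tests for substrings instead. The sample frames are rebuilt from the tables in the question.

```python
import re

import pandas as pd

classes = pd.DataFrame({
    "category": ["Dance", "Nap", "Run", "Fight"],
    "Keyword1": ["dance", "sleep", "run", "kick"],
    "Keyword2": ["danc", "slept", "quick", "unch"],
})
data = pd.DataFrame({
    "description": ["Iwasdancingthen", "She Slept", "He landed a kick", "We are family"],
})

# Assumption: every keyword column is named "Keyword<something>".
keyword_cols = [c for c in classes.columns if c.lower().startswith("keyword")]

data["codes"] = ""
# Walk the categories top to bottom so that higher-ranked codes win.
for _, row in classes.iterrows():
    keywords = [str(k) for k in row[keyword_cols].dropna()]
    if not keywords:
        continue
    # One alternation pattern per category; re.escape keeps keywords literal.
    pattern = "|".join(re.escape(k) for k in keywords)
    hit = data["description"].str.contains(pattern, case=False, regex=True, na=False)
    # Only fill rows that have not been coded by a higher-ranked category.
    data.loc[hit & (data["codes"] == ""), "codes"] = row["category"]

print(data)
```

Because all keyword columns are folded into one pattern per category, there is no need for a separate pass per keyword column, and adding a "Keyword3" column requires no code change.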

Find a way to use Python iloc method to see the bigger values in each row

I'm using the iloc method to look at each row of a dataframe, but it's becoming exhausting.
The problem is that I can't keep creating a separate variable to collect the largest value for each row, as I did before (the other DF I'm working on now has more than 20 rows, and I'd like a way to find the largest value in each row without using so many variables):
alex=df2.iloc[0,5:16]
mv=df2.iloc[1,5:16]
mv2=df2.iloc[2,5:16]
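A sketch of one way to avoid the per-row variables: slice all rows at once with iloc and reduce along each row with `max(axis=1)`. The dataframe below is a made-up stand-in for df2, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Stand-in for df2: 3 rows, 16 columns with hypothetical names c0..c15.
df2 = pd.DataFrame(
    np.arange(48).reshape(3, 16),
    columns=[f"c{i}" for i in range(16)],
)

block = df2.iloc[:, 5:16]           # every row, columns 5..15
row_max = block.max(axis=1)         # largest value in each row
row_argmax = block.idxmax(axis=1)   # column label where that value sits

print(pd.DataFrame({"max": row_max, "column": row_argmax}))
```

The result is indexed like df2, so `row_max.iloc[0]` corresponds to what `alex.max()` would give.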

Cannot sort dataframe – the column label is not unique

I merge three dataframes with the first line and try to sort them with the second. This used to work fine, but now I get this error (our company may have updated the python version during this time):
ValueError: The column label 'Areanr' is not unique.
For a multi-index, the label must be a tuple with elements corresponding to each level.
The code looks like this
pref_info4 = pref_info1.append(pref_info2).append(pref_info3)
pref_info4 = pref_info4.sort_values(['Areanr','nr'])
The second line gives the error. When inspecting 'pref_info4' after the first line is done, I see only one column with the label 'Areanr'. Are there some hidden labels that I need to remove? Otherwise it should be unique, right? Each of the original dataframes has columns Areanr and nr, but this worked fine before (and I cannot see any bad merging issue when inspecting pref_info4...)
You can try assigning a running id to each column so that the labels are unique, and then sort using those unique labels.
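Before renaming anything, it may help to check whether a duplicate label really is present. This is a sketch on a made-up frame that deliberately contains 'Areanr' twice; whether that is the actual cause in the asker's data is an assumption.

```python
import pandas as pd

# Hypothetical frame with a repeated column label.
df = pd.DataFrame(
    [[2, 1, 2], [1, 5, 1], [1, 3, 1]],
    columns=["Areanr", "nr", "Areanr"],
)

# Labels that occur more than once.
dupes = df.columns[df.columns.duplicated()].tolist()
print(dupes)

# Keep only the first occurrence of each label, then sort.
deduped = df.loc[:, ~df.columns.duplicated()]
sorted_df = deduped.sort_values(["Areanr", "nr"]).reset_index(drop=True)
print(sorted_df)
```

If `dupes` comes back empty for the real data, it is also worth checking `df.index.names`, since a label that is both a column and an index level can produce a similar ambiguity.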

Swapping dataframe column data without changing the index for the table

While compiling a pandas table to plot certain activity on a tool, I have encountered a rare error in the data that creates an extra 2 columns for certain entries. This means that one of my computed column values lands 2 cells further along in the table than the others, which kills the plot.
I was hoping to find a way to pull the contents of a single cell in a row and swap it into the other cell beside it, which contains irrelevant information in the error case, but which is used for the plot of all the other pd data.
I've tried a couple of different ways to swap the data around but keep hitting errors.
My attempts to fix it include:
for rows in df['server']:
    if '%USERID' in line:
        df['server'] = df[7] # both versions of this and below
        df['server'].replace(df['server'],df[7])
    else:
        pass

if '%USERID' in df['server']: # Attempt to fix missing server name
    df['server'] = df[7];
else:
    pass

if '%USERID' in df['server']:
    return row['7'], row['server']
else:
    pass
I'd like the data from column '7' to be replicated in 'server', only in the case of the error - where the data in the cell contains a string starting with '%USERID'
Turns out I was over-thinking this one. I took a step back, reworked the code a bit and solved it.
Rather than trying to force one-size-fits-all code onto all the data, I wrote a nested loop that builds separate lists for the general data and for the 2 exceptions I found, and created 3 data frames from them. These were easy enough to manipulate individually and finally concatenate together. All working fine now.
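For comparison, the original goal (copy column 7 into 'server' only where 'server' starts with '%USERID') can also be sketched with a boolean mask. The sample values are made up, and the column label is assumed to be the integer 7; if it is the string '7' in the real data, use that instead.

```python
import pandas as pd

# Hypothetical data: the second row shows the error case.
df = pd.DataFrame({
    "server": ["srv-a", "%USERID%-x", "srv-b"],
    7: ["junk", "srv-c", "junk"],
})

# True only for rows whose 'server' cell starts with '%USERID'.
mask = df["server"].astype(str).str.startswith("%USERID")

# Overwrite 'server' with column 7 on those rows only; the index is untouched.
df.loc[mask, "server"] = df.loc[mask, 7]

print(df)
```

The attempts above fail because `'%USERID' in df['server']` tests the index labels of the Series, not its values, and because assigning `df['server'] = df[7]` overwrites every row.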

Python - read excel data when format changes every time

I get an Excel file from someone and I need to read the data every month. The format is not stable from one month to the next, and by "not stable" I mean:
Where the data starts changes: e.g. Section A may start on row 4, column D this time, but next time it may start at row 2, column E.
Under each section there are tags. The number of tags may change as well. But every time I only need the data in tag_2 and tag_3 (these two will always show up)
The only data that I need is from tag_2 and tag_3, for each month (month1 - month8). I want to find a way, using Python, to first locate the section name, then find tag_2 and tag_3 under that section, then get the data for month1 to month8 (the number of months may change as well).
Please note that I do NOT want to locate the data that I need by specifying locations in Excel, since the locations change every time. How do I do this?
The end product should be a pandas dataframe that has monthly data for tag_2, tag_3, with a column that says which section the data come from.
Thanks.
I think that, if you export the sheet to CSV, you can read it as a comma-separated text file. Based on what you need, you can look for tag_2 and tag_3 on each line.
with open(filename, "r") as fs:
    for line in fs:
        cell_list = line.split(",")
        # At this point you have all elements of the line as a list;
        # you can check its size and implement your logic
Assuming that the (presumably manually pasted) block of information is unlikely to end up in the very bottom-right corner of the Excel sheet, you could simply iterate over rows and columns (set maximum values for each to prevent long searches) until you find a familiar value (such as "Section A") and go from there.
Unless I misunderstood you, the rest of the format should be consistent between the months, so you can simply assume that "month_1" is always one cell up and two to the right of that initial spot.
I have not personally worked with Excel sheets in Python, so I cannot say whether the following is possible in Python, but it definitely works in Excel VBA:
You could just as well use the Range.find() method to find the value "Section A" and continue with the same process as above, perhaps writing any results to a txt file and calling your Python script from there if necessary.
I hope this helps a little.
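The "search for a familiar value" idea can be sketched in pandas as follows. The layout is an assumption based on the description above: a header row whose cells start with "month", section names in cells starting with "Section", and tag names in the same row as their values. The grid is built in memory here; in practice it would come from something like `pd.read_excel(path, header=None)`.

```python
import pandas as pd

# Hypothetical sheet contents; None marks empty cells.
grid = pd.DataFrame([
    [None, None,        None,    "month1", "month2"],
    [None, "Section A", "tag_1", 1,        2],
    [None, None,        "tag_2", 3,        4],
    [None, None,        "tag_3", 5,        6],
    [None, "Section B", "tag_2", 7,        8],
    [None, None,        "tag_3", 9,        10],
])


def extract_tags(grid, wanted=("tag_2", "tag_3")):
    cells = grid.values.tolist()
    # Find the header row and month columns by content, not by position.
    header_row, month_cols = None, []
    for r, row in enumerate(cells):
        month_cols = [c for c, v in enumerate(row)
                      if isinstance(v, str) and v.startswith("month")]
        if month_cols:
            header_row = r
            break
    if header_row is None:
        raise ValueError("no month header found")
    months = [cells[header_row][c] for c in month_cols]

    records, section = [], None
    for row in cells[header_row + 1:]:
        # Remember the most recent section name seen while scanning down.
        for v in row:
            if isinstance(v, str) and v.startswith("Section"):
                section = v
        tag = next((v for v in row if v in wanted), None)
        if tag is not None and section is not None:
            values = {m: row[c] for m, c in zip(months, month_cols)}
            records.append({"section": section, "tag": tag, **values})
    return pd.DataFrame(records)


result = extract_tags(grid)
print(result)
```

Because everything is located by cell content, shifting the block to a different starting row or column, or adding more month columns, does not require changing the function.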
