In XlsxWriter, once a format is defined, how can you apply it to a range and not to the whole column or the whole row?
For example:
perc_fmt = workbook.add_format({'num_format': '0.00%','align': 'center'})
worksheet.set_column('B:B', 10.00, perc_fmt)
This applies the format to the whole "B" column, but how can "perc_fmt" be applied to a range? For example, if I do:
range2 = "B2:C15"
worksheet2.write(range2, perc_fmt)
I get:
TypeError: Unsupported type <class 'xlsxwriter.format.Format'> in write()
Actually, I found a workaround that avoids the loop.
Use conditional formatting (which takes a range as input) with rules that together cover all cases. For example:
worksheet2.conditional_format(range2, {'type': 'cell',
                                       'criteria': '>=',
                                       'value': 0,
                                       'format': perc_fmt})
worksheet2.conditional_format(range2, {'type': 'cell',
                                       'criteria': '<',
                                       'value': 0,
                                       'format': perc_fmt})
There isn't a helper function to do this. You will need to loop over the range and apply the data and formatting to each cell.
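A minimal sketch of that loop, assuming your data is a list of rows with exactly the same shape as the range. The helpers a1_to_rowcol and write_range are hypothetical names written for this example. XlsxWriter itself ships xlsxwriter.utility.xl_cell_to_rowcol(), which does the same conversion as a1_to_rowcol, but the version below sticks to the standard library. Each cell is written with the usual worksheet.write(row, col, data, cell_format) call, so worksheet is meant to be an XlsxWriter worksheet.

```python
import re


def a1_to_rowcol(cell):
    """Convert an A1-style reference such as 'C15' or '$C$15' to zero-indexed (row, col)."""
    match = re.fullmatch(r"\$?([A-Za-z]+)\$?(\d+)", cell)
    if match is None:
        raise ValueError(f"not an A1 cell reference: {cell!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)  # base 26 with A=1 ... Z=26
    return int(digits) - 1, col - 1


def write_range(worksheet, cell_range, data, cell_format):
    """Write data (a list of rows) into cell_range, applying cell_format to every cell.

    Hypothetical helper: data must have as many rows and columns as the range.
    """
    first, last = cell_range.split(":")
    first_row, first_col = a1_to_rowcol(first)
    last_row, last_col = a1_to_rowcol(last)
    for row in range(first_row, last_row + 1):
        for col in range(first_col, last_col + 1):
            value = data[row - first_row][col - first_col]
            worksheet.write(row, col, value, cell_format)
```

With the question's variables this would be write_range(worksheet2, "B2:C15", values, perc_fmt), where values is a hypothetical 14 x 2 list of numbers.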
It took me quite a while to find this answer, so thank you. I would offer an additional solution that needs just one block and works when your range includes blanks and text fields. This applies the peach-colored format I defined earlier in my code to the range A2:N5. Of course, you need to make the number more negative if your dataset actually contains large negative numbers.
worksheet.conditional_format('A2:N5', {'type': 'cell',
'criteria' : '>',
'value' : -99999999999,
'format' : peach_format})
I originally tried 'criteria': '<>', 'value': 0, but that did not catch the blank cells. If you use '<' 99999999999 instead, text cells evaluate as False and are not formatted.
You can use a single 'between' rule instead of the two rules:
{
'type': 'cell',
'criteria': 'between',
'minimum': -10000,
'maximum': 10000,
'format': perc_fmt
}
This saves you one conditional_format() call. Note that values outside the minimum/maximum bounds are not formatted.
I'd like to set values on a slice of a DataFrame with .loc, using the pandas string method .str.extract(). However, it's not working due to indexing errors. The same code works perfectly if I swap extract for contains.
Here is a sample frame:
import pandas as pd
df = pd.DataFrame(
{
'name': [
'JUNK-0003426', 'TEST-0003435', 'JUNK-0003432', 'TEST-0003433', 'TEST-0003436',
],
'value': [
'Junk', 'None', 'Junk', 'None', 'None',
]
}
)
Here is my code:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)")
How can I set the None values to the extracted regex string?
Hmm, the problem seems to be that .str.extract returns a pd.DataFrame. You can .squeeze() it to turn it into a Series, and then it works fine:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").squeeze()
Index alignment takes care of the rest.
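Another option, sketched on the sample frame from the question: with a single capture group, passing expand=False makes .str.extract() return a Series directly, so no .squeeze() is needed. The assignment again relies on index alignment, so rows outside the mask keep their original value.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["JUNK-0003426", "TEST-0003435", "JUNK-0003432", "TEST-0003433", "TEST-0003436"],
    "value": ["Junk", "None", "Junk", "None", "None"],
})

# One capture group + expand=False -> a Series (NaN where the pattern doesn't match).
extracted = df["name"].str.extract(r"TEST-\d{3}(\d+)", expand=False)

# Only the masked rows are assigned; .loc aligns extracted to them by index.
mask = df["name"].str.startswith("TEST")
df.loc[mask, "value"] = extracted
print(df)
```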
Instead of trying to get the group, you can replace the rest with the empty string:
df.loc[df['value']=='None', 'value'] = df.loc[df['value']=='None', 'name'].str.replace(r'TEST-\d{3}', '', regex=True)
Here is a way to do it:
df.loc[df["name"].str.startswith("TEST"), "value"] = df["name"].str.extract(r"TEST-\d{3}(\d+)").loc[:,0]
Output:
           name value
0  JUNK-0003426  Junk
1  TEST-0003435  3435
2  JUNK-0003432  Junk
3  TEST-0003433  3433
4  TEST-0003436  3436
Background: Apparently Google doesn't have a straight answer to a very basic question, so here goes...
I have a pandas df with an Open Date column [Dtype = object] which (when previewing df) is formatted yyyy-mm-dd, which is the format I want. Great! Not so great, however, is that when I write df to a .csv, the formatting defaults to m/dd/yyyy.
Issue: I have tried just about everything to get the .csv to output yyyy-dd-mm, to no avail.
What I've tried:
I have tried specifying a date format when writing the .csv
df.to_csv(filename, date_format="%Y%d%d")
I have tried changing the format of the column in question, prior to writing to a .csv
df['Open Date'] = pd.to_datetime(df['Open Date'])
I have also tried converting the column to a string, to try and force the correct output
df['Open Date'] = df['timestamp'].apply(lambda v: str(v))
Despite these attempts, I still get a m/dd/yyyy output.
Help: where am I embarrassingly going wrong here?
Your question contains various breaking typos, which may hint at what is causing the problem in general.
There are a few issues with what you are saying. Consider:
from pandas import DataFrame
from datetime import datetime
# just some example data, including some datetime and string data
data = [
{'Open date': datetime(2022, 3, 22, 0, 0), 'value': '1'},
{'Open date': datetime(2022, 3, 22, 0, 1), 'value': '2'},
{'Open date': datetime(2022, 3, 22, 0, 2), 'value': '3'}
]
df = DataFrame(data)
# note how the 'Open date' column is actually a `datetime64[ns]`
# the 'value' column, however, is `object`, which is what you say you're getting
print(df['Open date'].dtype, df['value'].dtype)
# saving with a silly format, to show it works:
df.to_csv('test.csv', date_format='%Y.%m.%d')
The resulting file:
,Open date,value
0,2022.03.22,1
1,2022.03.22,2
2,2022.03.22,3
I picked a silly format because the default format for me is actually %Y-%m-%d.
The most likely issue is that your 'date' column is actually a string column, but the tools you are using to 'preview' your data are interpreting these strings as dates and actually showing them in some other format.
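One way to test that theory is to look at the raw text pandas writes instead of opening the file in a spreadsheet program. The sketch below uses made-up sample data; calling to_csv() without a path returns the CSV as a string. It shows that date_format is ignored for string (object) columns, which are written verbatim, and only takes effect once the column has been converted with pd.to_datetime().

```python
import pandas as pd

# Made-up sample: the dates are stored as plain strings (dtype object).
df = pd.DataFrame({"Open Date": ["2022-03-22", "2022-03-23"], "value": [1, 2]})

# Strings are written exactly as stored; date_format has no effect on them.
as_strings = df.to_csv(index=False, date_format="%Y.%m.%d")
print(as_strings)

# After converting to datetime64, date_format controls the written text.
df["Open Date"] = pd.to_datetime(df["Open Date"], format="%Y-%m-%d")
as_dates = df.to_csv(index=False, date_format="%Y.%m.%d")
print(as_dates)
```

If the raw text already reads yyyy-mm-dd but the file shows m/dd/yyyy when opened, the reformatting is happening in the viewer, not in pandas.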
However, with the limited information you provided, it's guesswork. If you provide some example data that demonstrates the issue, it would be easier to say for sure.
I have a dictionary like this:
film = {
'ID': [],
'Name': [],
'Run Time': [],
'Genre': [],
'link': [],
'name 2': []
}
Then I populate it in a for loop, like this:
film['ID'].append(film_id)
film['Name'].append(film_name)
film['Run Time'].append(film_runtime)
film['Genre'].append(film_genre)
film['link'].append(film_link)
film['name 2'].append(film_name2)
Then I convert the dictionary to a pandas DataFrame so that I can write it to an .xlsx file. Before actually writing it, I print it to check the values of the Run Time column, and everything is OK:
output_df = pd.DataFrame(film).set_index('ID')
print(output_df['Run Time'])
Output:
ID
102 131
103 60
104
105
Name: Run Time, dtype: object
But then, when I write it, like this:
with pd.ExcelWriter('output.xlsx') as writer:
    output_df.to_excel(writer, sheet_name='فیلم')
The file looks like this:
As you can see, there's an extra ' (single quote) character in the file. This character is not visible, but I can highlight it:
And if I remove it, the number goes RTL:
So I thought the invisible character was LTR MARK (\u200E). I removed it like this:
film['Run Time'].append(film_runtime.replace('\u200E', ''))
But nothing happened, and the character is still there.
How can I fix this?
You need to make sure that cells that need to be numbers are converted to numbers (typically ints) before converting to an .xlsx file.
In your case:
film['Run Time'].append(int(film_runtime))
A ' before a value in Excel forces the value to be treated as text. It looks like the Excel writer is treating such a list as an array of strings.
Changing the type in the DataFrame should solve it.
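Note that int(film_runtime) raises a ValueError for empty strings, and the printout in the question shows empty run times for IDs 104 and 105. A sketch, using made-up sample values, that converts the whole column after building the DataFrame instead: pd.to_numeric() with errors="coerce" turns empty or otherwise non-numeric strings into NaN (written as empty cells by to_excel()) rather than raising.

```python
import pandas as pd

# Made-up stand-in for the scraped data: run times arrive as strings, some empty.
film = {"ID": [102, 103, 104, 105], "Run Time": ["131", "60", "", ""]}
output_df = pd.DataFrame(film).set_index("ID")

# Convert to numbers; anything that cannot be parsed becomes NaN.
# The column ends up float64 because NaN is a float.
output_df["Run Time"] = pd.to_numeric(output_df["Run Time"], errors="coerce")
print(output_df["Run Time"])
```

Checking output_df["Run Time"].isna().sum() before writing tells you whether any values failed to convert, for example because they still contain invisible characters.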
I'm trying to create charts with xlsxwriter python module.
It works fine, but I would like to avoid hard-coding the number of rows.
This example charts rows 2 to 30.
chart.add_series({
'name': 'SNR of old AP',
'values': '=Depart!$D$2:$D$30',
'marker': {'type': 'circle'},
'data_labels': {'value': True,'num_format':'#,##0'},
})
For 'values', I would like the row count to be dynamic. How do I do this?
Thanks.
XlsxWriter supports a list syntax in add_series() for this exact case. So your example could be written as:
chart.add_series({
'name': 'SNR of old AP',
'values': ['Depart', 1, 3, 29, 3],
'marker': {'type': 'circle'},
'data_labels': {'value': True, 'num_format':'#,##0'},
})
And then you can set any of the first_row, first_col, last_row, last_col parameters programmatically.
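For example, a small hypothetical helper can build the list from the number of data rows you wrote. Rows and columns in the list form are zero-indexed, so the original range $D$2:$D$30 (one header row, then 29 data rows in column D) becomes ['Depart', 1, 3, 29, 3]:

```python
def column_series_ref(sheet_name, col, num_data_rows, header_rows=1):
    """Return [sheet, first_row, first_col, last_row, last_col] for one column.

    Hypothetical helper; all indexes are zero-based, as the list syntax expects.
    """
    first_row = header_rows  # data starts right below the header row(s)
    last_row = header_rows + num_data_rows - 1
    return [sheet_name, first_row, col, last_row, col]


print(column_series_ref("Depart", 3, 29))
```

You would then pass it as 'values': column_series_ref('Depart', 3, n) in add_series(), where n is the number of data rows actually written.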
See the docs for add_series().
Hi, I have a working script that generates a line chart using XlsxWriter. However, I am looking for a way to concatenate an earlier hit count with the cell range of the generated chart: the script iterates over several similar files in the directory, so the overall 'hit count' varies for each file.
The script first searches a text file for a string, collects some stats using line splitting, writes the collected figures into Excel, and generates a hit count (total) of how many times the particular string was found.
Then charts are generated from the collected stats.
Here's my chart generating section...
chart1 = workbook.add_chart({'type': 'line'})
chart1.add_series({
'name': 'My Chart',
'categories': '=Sheet1!$A$2:$A$2200',
'values': '=Sheet1!$B$2:$B$2200',
'line': {'color': 'purple'},
})
I am hoping to generate the chart by referencing the 'total' count in the row count. So I am looking for something along the lines of
'categories': '=Sheet1!$A$2:$A$' + str(total),
'values': '=Sheet1!$B$2:$B$' + str(total),
I hope this makes sense? Basically I am looking to have a varying cell row range dependent on the count of hits, is this possible? Or alternatively is there a 'last row' reference in xlsxwriter for this type of circumstance?
Thanks,
MikG
The chart.add_series() method also accepts a list of the form [sheetname, first_row, first_col, last_row, last_col] (zero-indexed), so you can do something like this:
chart1.add_series({
'name': 'My Chart',
'categories': ['Sheet1', 1, 0, total -1, 0],
'values': ['Sheet1', 1, 1, total -1, 1],
'line': {'color': 'purple'},
})
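If you would rather keep the string form from the question, the same range can be built by converting the zero-indexed values back to an A1 reference. This is a sketch with hypothetical helper names, assuming a sheet name without spaces or special characters (those would need single quotes around the sheet name); XlsxWriter's xlsxwriter.utility module also provides xl_range_abs() for the cell part of such a reference.

```python
def col_letter(col):
    """Convert a zero-indexed column number to Excel letters (0 -> 'A', 26 -> 'AA')."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def column_range_a1(sheet_name, col, first_row, last_row):
    """Build an absolute reference such as '=Sheet1!$A$2:$A$2200' from zero-indexed rows."""
    letter = col_letter(col)
    return f"={sheet_name}!${letter}${first_row + 1}:${letter}${last_row + 1}"


total = 2200  # hypothetical hit count, matching the hard-coded example in the question
print(column_range_a1("Sheet1", 0, 1, total - 1))  # categories
print(column_range_a1("Sheet1", 1, 1, total - 1))  # values
```

Both forms describe the same cells: the list ['Sheet1', 1, 0, total - 1, 0] and the string '=Sheet1!$A$2:$A$' + str(total).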