Update: openpyxl 1.0 is now out !
At a customer, we read a lot of Excel files. We’ve tried the conventional approaches, that are xlrd and xlwt, pyinex, and COM automation.
That’s COM that we mainly use, because it’s able to deal with every Excel file format, from the ancient Excel 5 to most recent Excel 2007 Office Open XML format.
However, we experience from time to time stability issues (Excel is a complex beast, sometimes you don’t fully understand why it is angry).
We then looked for a native reader for .xlsx format, to get rid of the Excel part of the equation, but unfortunately, there are only two small read-only libraries for now:
- Python-xlsx: http://github.com/staale/python-xlsx
- PyXLSX: http://github.com/leegao/pyXLSX
Finally, I thought that I was the only guy who needed a native .xslx writer, and decided to stick with COM for now.
I wouldn’t be doing this project now without a tweet from Tarek Ziadé, who was also looking for such a library. That meant that we were at least two in need for the same thing, so I simply decided to write it.
Trust me, the Office Open XML format is open, but it’s also a bit twisted, so I spent a few days gathering documentation, and I finally landed on the PHPExcel library, that was already doing what I needed, but in PHP.
So now, I’m busy porting the PHPExcel library under Python, which is really easy, because of the similarities between both languages, but I can also benefit from all the nice things that come with Python, so the code is much simpler.
You can follow my progress on bitbucket: http://bitbucket.org/ericgazoni/openpyxl/
14 responses to “openpyxl: my python xlsx library”
I hope that you continue to work on this project–you’re definitely not the only one who needs to read and write xslx files. Myself, I’m doing ad-hoc parsing of the XML, which is not a solution that generalizes well. Good luck with openpyxl.
Thanks a lot for your support ! Actually I am still working on the project, but I had two really busy weeks at work and have not been able to progress much … I hope I will have a bit more time in the upcoming weeks so I can at least finish the reader module. I’ll post updates on the project when there will be visible improvements. If you have some spare time, I’m accepting any kind of help on the project too 😉
thanks so much for your work!
I’ll certainly be interested in seeing how this progresses. I’ll warn you in advance about the biggest issues with PHPExcel – memory and speed – so you can perhaps make allowance for this when building from the ground up, because it’s a lot harder to resolve when you’re as established as PHPExcel without breaking backward compatibility.
Drop me an e-mail if you’re interested in some cross-pollenation of ideas (you can contact me via the PHPExcel web site).
I could sure use this functionality. I only need to read a spreadsheet, not to write one, so the reader would do me just fine.
Just to make things fun though, I definitely need to be able to read an encrypted xlsx file.
Is the library at this point yet?
@Mark: thanks for your support 🙂 For now speed is not the biggest issue, but memory is, although I’ve tweaked it. I’ll definitely re-contact you soon 🙂
@Ken: No it’s not yet able to read encrypted XLSX files, though password support primitives are already there. I think it could be possible to implement in no time if needed.
I have a problem with a simple xls file with an image in it.
I’ve made it with oocalc but saved like a 2007Excel file
this is my error output
Traceback (most recent call last):
File “”, line 1, in
File “openpyxl/reader/excel.py”, line 51, in load_workbook
archive = ZipFile(filename, ‘r’, ZIP_DEFLATED)
File “/usr/lib/python2.6/zipfile.py”, line 696, in __init__
File “/usr/lib/python2.6/zipfile.py”, line 716, in _GetContents
File “/usr/lib/python2.6/zipfile.py”, line 728, in _RealGetContents
raise BadZipfile, “File is not a zip file”
zipfile.BadZipfile: File is not a zip file
thanks for any help!
I juste find your library and i just want to thank you for your job !
I would have find it before to save me a lot of time working with the win32com library…
thanks again !
[…] journal information, abstracts etc), exports it into an excel spreadsheet (using the awesome library openpyxl) that is emailed to my desired email address. A cron job on one of my servers ensures the report […]
Currently, there is an issue where this great package cannot open a slightly damaged excel file, although excel itself can. Is it possible to implement the repair mechanism?
Actually, Excel’s “repair” method can be summarized as “let’s see what happens if I forget about this sheet ?”. It does not really repair anything, but blindly ignores stuff and then pretends “tadaaa, here I fixed it”. There are two big parts where a workbook can be damaged: meta-data and worksheets data. It’s difficult to implement a repair system for the data, because the behavior could make a buggy workbook be read inconsistently, without the user noticing.
That’s more harm than good, the real option is just to fix the workbook, because if the library allows inconsistent readings, then people will stop trusting it. And for the meta-data, it’s almost impossible to repair it as it could make the library to skip sheets completely for example.
However, I’m aware that some implementations are not really robust, those were design decisions I made at the very beginning of the project, and I’m in the process of fixing them all as often as I can. I hope in a few weeks openpyxl can read at least the standard OOXML format and the MS Excel dialect, but I don’t think it will allow any day malformed workbooks to be read.
How does this project relate to xlrd/xlwt? I assume its completely unrelated since you have worked off the PHPExcel library. Do you think that these two projects (openpyxl and xlrd) might merge at some point to give a single, comprehensive Python solution?
well, as you may already have read it, xlrd/xlwt are intended to work with Excel BIFF format, used until Excel 2007. Openpyxl on its side, works on OOXML format.
I don’t think that merging would be a good solution, because though there are many common concepts, the implementation details would lead to a huge spaghetti style mess 🙂
As well, xlrd/xlwt are fully working, proven libraries, while openpyxl has still a few features missing, so people might get upset not to be able to use some features of the same library, based on the format they want to use.
If you’re interested in having a single python library, I can think of one which bundles all of them, Kenneth Reitz’s tablib. Might be of some help 😉
My input xlsx file contains excel formulas. Openpyxl fails to save them in the output file xlsx file. Instead it saves the calculated output value. How could I fix this problem?