Eric Gazoni's Blog

Daily thoughts for computer scientists

Category: Python

openpyxl: my python xlsx library

Update: openpyxl 1.0 is now out!

At one of our customers, we read a lot of Excel files. We’ve tried the conventional approaches: xlrd and xlwt, pyinex, and COM automation.

COM is what we mainly use, because it can deal with every Excel file format, from the ancient Excel 5 to the most recent Excel 2007 Office Open XML format.
However, we occasionally experience stability issues (Excel is a complex beast, and sometimes you don’t fully understand why it is angry).

We then looked for a native .xlsx reader, to take Excel out of the equation, but unfortunately there are only two small read-only libraries for now:

In the end, I thought I was the only guy who needed a native .xlsx writer, and decided to stick with COM for now.
I wouldn’t be doing this project now without a tweet from Tarek Ziadé, who was also looking for such a library. That meant there were at least two of us in need of the same thing, so I simply decided to write it.

Trust me, the Office Open XML format is open, but it’s also a bit twisted. I spent a few days gathering documentation, and I finally landed on the PHPExcel library, which was already doing what I needed, but in PHP.

So now, I’m busy porting the PHPExcel library to Python. This is really easy because of the similarities between the two languages, and I can also benefit from all the nice things that come with Python, so the code is much simpler.

You can follow my progress on Bitbucket:

My Python environment

The early days

When I first heard about Python, it was just after the 2.5 release. I heard that one of my customers was using it, but I had not yet seen a single line of Python. At some point in a project, I needed the equivalent of a Bash script on Windows, and decided to give Python a try instead of using Windows BAT files.

I installed it on my workstation and started reading the (excellent) documentation on the tasks I needed to perform.

I used IDLE at first because, well, it ships with the Windows Python distribution. I must say it was a very unpleasant experience (although I have learned to appreciate some features IDLE has that other editors lack). Tkinter is a rather basic toolkit, and its look and feel makes it seem like it was written in the 80’s (in fact, it probably was). The editor/runner mix also felt a bit weird at first. I finally returned to my all-time favorite editor, Notepad++, and ran my scripts from the command line.

Eclipse and Pydev

Later, I (luckily) landed on a new project that included a lot of Python. The team in place was using an editor they were not yet familiar with, but which, fortunately, I already knew quite well: Eclipse. I must say the Pydev extension for Eclipse is one of the greatest blessings you can get when working with Python. It offers a lot of interesting features over IDLE:

  • syntax highlighting
  • code completion
  • real-time code inspection (very useful when using a dynamic language)
  • PyLint integration (once you’ve tasted it, you can never work without it again)
  • smart indentation (you almost don’t have to worry about your indents)
  • unittest integration (although I’ve stopped using it)

Eclipse is an excellent product on its own. I had already used it for PHP development for several years, and I was happy it was also my customer’s choice for Python.

However, there were still some occasions where Eclipse was not the right tool to use:

  • the integrated console only implements the basic shell
  • executing scripts outside the project path is hard, and so is changing the current directory
  • for 10-line scripts, creating a project is a bit overkill

The revelation: IPython

After a couple of months juggling Eclipse and Notepad++ with python.exe, I discovered IPython. At last I had a way to work on small scripts without the overhead of Eclipse, but with all its interesting features:

  • serves both as a shell replacement and as a Python interpreter
  • excellent autocompletion (both for paths and Python code)
  • “magic” functions such as bookmarks and a list of currently defined variables (the “whos” command)
  • PDB (Python Debugger) integration with IPDB, providing code completion and history to PDB
  • post-mortem debugger (“debug” command after a traceback)
  • quick access to docstrings and source code of almost every library
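In IPython, that quick access is the `obj?` suffix (docstring and object info) and `obj??` (source code). Outside IPython, the standard library's `inspect` module gives a rough stand-in. The sketch below is a minimal approximation. `describe` and `greet` are hypothetical names for illustration and are not part of IPython.

```python
import inspect


def describe(obj, with_source=False):
    """Return an object's docstring and, optionally, its source code.

    Hypothetical helper loosely mimicking IPython's `obj?` / `obj??`.
    """
    doc = inspect.getdoc(obj) or "<no docstring>"
    if not with_source:
        return doc
    try:
        source = inspect.getsource(obj)
    except (OSError, TypeError):
        # Built-ins (TypeError) and code not loaded from a file (OSError)
        # have no retrievable Python source
        source = "<source not available>"
    return doc + "\n\n" + source


def greet(name):
    """Say hello to someone."""
    return "Hello, " + name
```

In a script saved to disk, `print(describe(greet, with_source=True))` prints the docstring followed by the function's source. For a built-in such as `len`, only the docstring is available. IPython adds completion and paging on top of this, which is a large part of its appeal.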

IPython is packaged in the Python(x,y) distribution together with the Console application, a kind of command-line emulator for Windows. Once configured with a readable setup, it’s probably the best development environment you can find on Windows.
