
Category: Open source

openpyxl: my python xlsx library

Update: openpyxl 1.0 is now out!

At one customer’s site, we read a lot of Excel files. We have tried the conventional approaches: xlrd and xlwt, pyinex, and COM automation.

We mainly use COM, because it can deal with every Excel file format, from the ancient Excel 5 to the most recent Excel 2007 Office Open XML format.
However, we run into stability issues from time to time (Excel is a complex beast, and sometimes you don’t fully understand why it is angry).

We then looked for a native reader for .xlsx format, to get rid of the Excel part of the equation, but unfortunately, there are only two small read-only libraries for now:

In the end, I thought I was the only guy who needed a native .xlsx writer, and decided to stick with COM for now.
I wouldn’t be doing this project now without a tweet from Tarek Ziadé, who was also looking for such a library. That meant at least two of us needed the same thing, so I simply decided to write it.

Trust me, the Office Open XML format is open, but it’s also a bit twisted. I spent a few days gathering documentation, and I finally landed on the PHPExcel library, which already did what I needed, but in PHP.

So now I’m busy porting the PHPExcel library to Python, which is really easy because of the similarities between the two languages. I also benefit from all the nice things that come with Python, so the code is much simpler.
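To give an idea of what “a bit twisted” means in practice, here is a minimal sketch using only the Python standard library. It is not openpyxl’s actual code. It assumes the conventional part names xl/sharedStrings.xml and xl/worksheets/sheet1.xml, and the helpers build_toy_package and read_cells are names made up for this illustration. The point it shows: an .xlsx file is a zip archive of XML parts, and a text cell does not hold its text but an index into a separate shared strings table.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx package.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Toy parts, written by hand for this sketch.
SHARED_STRINGS = (
    '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<si><t>hello</t></si><si><t>world</t></si></sst>"
)
SHEET = (
    '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    "<sheetData>"
    '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>'
    '<row r="2"><c r="A2" t="s"><v>1</v></c></row>'
    "</sheetData></worksheet>"
)


def build_toy_package():
    """Return the bytes of a zip holding only the two parts read below.

    This is NOT a complete workbook: a real .xlsx also needs
    [Content_Types].xml, relationship parts and xl/workbook.xml.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", SHARED_STRINGS)
        zf.writestr("xl/worksheets/sheet1.xml", SHEET)
    return buf.getvalue()


def read_cells(package_bytes):
    """Map cell references such as 'A1' to their decoded values."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))

    # The shared strings table: one <si> entry per distinct string.
    strings = [
        si.findtext("m:t", default="", namespaces=NS)
        for si in sst.findall("m:si", NS)
    ]

    cells = {}
    for c in sheet.iterfind(".//m:c", NS):
        raw = c.findtext("m:v", default="", namespaces=NS)
        if c.get("t") == "s":
            # t="s" means <v> is an index into the shared strings table.
            cells[c.get("r")] = strings[int(raw)]
        else:
            # This sketch treats every other cell as a plain number.
            cells[c.get("r")] = float(raw)
    return cells


if __name__ == "__main__":
    print(read_cells(build_toy_package()))
```

A real reader has to follow the relationship parts to find the sheets, and handle styles, dates, formulas and inline strings on top of this, which is where most of the porting effort goes.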

You can follow my progress on bitbucket:



Note: this post is part of a planned series on what values I wish to promote through my daily work

What do I mean by transparency

Based on my rather short but broad experience in the IT business, I must say that sometimes (not to say often) people do business the way they play poker. The purpose of poker, like most games, is to win against your opponents. The provider tries to win against the customer, the customer tries to win against his own employees, etc…

To do business successfully, we should open our eyes and realize that we are not playing against each other: we are playing on the same team.

Obviously, it only works when everyone keeps playing fair with the others and follows the rules of the game. Let me state some of them:

  • accept that bad work shouldn’t be rewarded
  • accept that getting more implies giving more
  • accept that others can also be right
  • accept that money cannot buy everything, and that you are not allowed to sell just anything for money

That’s mainly humanism applied to business: remain fair with the others and no one will try to fool you. If you send the signal that you don’t respect people, don’t be shocked when people lack respect for you.

Some solutions for a better IT world

Here are a few ideas I try to spread around me on what we could do at any level to improve the current IT ecosystem:

Open formats

Most of my customers don’t see the issue when they build their whole business around proprietary formats.

Actually, they don’t see the issue while they do it, but after a few years, when the format has become extremely deeply rooted in the company’s flows, then they start to see that they have shot themselves in the foot.

Proprietary formats don’t play well with others, so it’s sometimes difficult, not to say mind-boggling, to read them with an application that was not licensed to do so. Business processes then cannot be automated (or only partly), which leads to more manual operations, more points of failure, more sources of errors, insufficient testing, and finally chaos (to stay polite).

And this is not yet another open-format geek’s rant. You don’t have to blindly believe what I say: just ask around how many times a closed format was a source of major development delay, or prevented or hindered automation. You might find the results interesting.

No vendor lock

It’s tempting to secure your customer base by preventing customers (sometimes contractually) from evaluating other offers made by potential competitors. But we’ve all seen the great “benefits” that come with monopolistic situations:

  • low customer support: you don’t need to sweat for your customers, it’s not like they could fly away
  • loss of competitive advantage: we don’t need new features, the old ones are still good enough for them
  • overpriced updates: “We know we have a bug on version 1. It won’t be fixed in this version, but we have version 2 that doesn’t have the bug. Of course it will cost you . Shall I send you an upgrade form now ?”

I know I am doing my job really well, my customers know it too, so they are willing to pay for my services.

If one day they find a guy who is better than me, then I think it’s fair he gets the opportunity to show his skills, but I also have the opportunity to improve myself to remain competitive.

The choice is on the customer’s side, not on mine.

Open source

This one should be obvious nowadays. We are all using open source code at some point in every project. Some people admit it, others are afraid to.

Come on, it’s not something a smart developer should be ashamed of. Not working extra hard to reinvent the wheel will not get you fired. You don’t expect your surgeon or physician to reinvent medicine for each patient. You just expect that he understands enough of the key principles, has enough experience, and knows how to apply what he learned to your specific case.

The same goes for us: we are not the smartest guys on Earth, and we cannot invent a new way to build an e-commerce website for each customer who asks for one. So many other people already wrote one, failed, learned from that, failed again, etc. I prefer relying on those guys who devoted substantial amounts of time to building the most secure e-commerce website known to man, and giving them proper credit, while earning my money on what I am best at: advising the customer, writing the tiny part of the application that is completely customer-specific, helping him link his website with his existing applications, etc.

And if I can help those guys a little bit by releasing, for example, a bug fix for their application, I see no good reason not to do so: they deserve it. Everyone enjoys it because no one gets screwed.

“Open schedule”

This is an especially sensitive topic. It’s not specific to IT, though, but common to any subcontracting activity. I often wondered why some customers required me to work at their premises while the job could have been done elsewhere, like at my office. That’s because some customers fear that you won’t play fair and will bill them more than necessary. By keeping you under their constant physical and visual control, they have the illusion that you are not stealing from them.

I think this idea that you could be overbilling comes from some bad players in the field, either disguised amateurs with no ethics or crooks with no other intent than making money on the customer’s back. Either way, they left a bitter taste with the customer, who is now punishing all new contractors, and indirectly himself, for what those guys did. The same goes for plumbers, locksmiths, painters, … because of a few bad guys, one can completely lose confidence in the profession, while 90% of them are honest and hard workers.

Here comes the RERO principle: “release early, release often”, which is fundamental in the Scrum method. If you are able to deliver working software on a regular basis, then you are not stealing the customer’s money. The converse is not true, however: keeping the contractors “in-house” does not ensure that the product will be faultless, only that you will be able to watch the developer’s back for the whole duration of the project (you had better hope he has a sexy back, then).


Perhaps the most overlooked idea, yet the easiest to practice. There are situations where you can choose to play it open, or prefer to hide the issue and cross your fingers for the best outcome. This includes:

  • as a customer, not having enough cash to pay for all the requirements that were made
  • as a developer, not being comfortable with a new technology, or not having heard of it at all
  • as an employee, finding that working six days per week, ten hours per day is not a sustainable pace
  • as a sales rep, knowing your coders won’t deliver the product in time

That’s why people talk.

If you cannot pay for the full product, maybe we can remove some features and get back within your budget.

If you don’t know a technology, maybe you can get some training from someone who does, and either share the training fees with the customer or pay for it yourself and accept making less profit this month.

If you are working at a death-march pace, at some point something will fail, either in your body or in your life. Reducing your workload will allow you to be more focused and less tired, thus more productive.

If you know you won’t deliver on time, again maybe we can remove some features, but let the customer select which ones are important to him.

Failing to communicate is digging the project’s grave.

Don’t let your pride, ego, fear or whatever talk for you. Just take one step in the other’s direction and you might be surprised by the outcome. Even if it does not turn out the way you expected, at least you remain professional, because ignoring possible risks is not something you should allow yourself to do.

Thank you for reading this far. I hope you found some points of interest, or already shared my views on the subject. As with everything, I don’t claim to hold the truth: these are just my own observations, mixed with many things I read during the past months.

To get a bit further, I recommend:

Again, if you have comments about this post, ideas you would like to defend, or opposite experiences, feel free to express yourself in the comments below.


Open source packaging goofs

I am a true Open Source enthusiast, and I always advise my customers to use open software whenever an alternative to proprietary software exists. It’s linked to the way many software companies (Microsoft for instance) try to lock customers in by using closed formats, “forgetting” to put in an “export all data” option, and providing unstable (thus unusable) interoperability.

But I must admit it’s not possible to advise people to use software that I (computer geek type) can’t even install or try. If I’m not able to get the software working, my customer (usually not computer savvy) probably won’t be either. Recently, I gave some Python packages a try: virtualenv, pylons, and trac.

Test bench

I installed virtualenv on a FreeBSD 8.0 box. It installed cleanly and I created my environment, but I was unable to use it because the command in the manual (“$ source”) failed without a clean error message, so I could not debug the issue by myself. I assume the package is tried and tested on Linux boxes, where the source command works differently than on mine.

I also installed Pylons three times on my Windows dev box, because it looks like an excellent package. Out of three attempts, I managed to make the tutorial work only once. And as soon as I modified the tutorial “Hello, World” into something else, I got strange and unrelated errors every time I reloaded the page. Too bad, I will use Bottle for now.

Finally, yesterday I tried to install the dev branch of the Trac bug tracker. The instructions say you have to check out and install the latest version of Genshi before checking out the latest version of Trac itself from their SVN server. That’s what I did, and after ten minutes, I got an error message saying my version of Genshi was not recent enough to use Trac. Probably someone set the version requirements on his computer, tested that it could deploy, but forgot to push the modification to the Genshi tree, and I was … stuck. In the end I was able to download and install the latest stable version, but not to test the new 0.12 features.


Of course, I know those goofs are just exceptions in the long life of those three excellent packages. I probably came at the wrong time, and those bugs have probably been solved by now. Still, it shows the real need for automated integration testing in every software project. That could be a strategic advantage for Open Source projects over closed source ones, as their responsiveness lets them adopt those tools faster than a big corporation with its procedures, forms to fill in and so on. Developers should pay extra attention to making sure their software installs smoothly on every supported platform (virtualization is widespread nowadays), so users are not scared away before they ever see the shiny features of the package.
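As a rough illustration of the kind of automated check I have in mind, here is a minimal sketch of a post-install smoke test. It only verifies that a list of modules can be imported in a fresh interpreter, and reports a readable one-line error when they cannot. The function names imports_cleanly and smoke_test are made up for this example, and a real project would test far more than imports.

```python
import subprocess
import sys


def imports_cleanly(module_name, timeout=60):
    """Try to import module_name in a fresh interpreter process.

    Returns (ok, message). A separate process is used so that a broken
    package cannot crash or pollute the process running the checks.
    """
    result = subprocess.run(
        [sys.executable, "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode == 0:
        return True, "ok"
    # The last line of the traceback is usually the actual error.
    lines = result.stderr.strip().splitlines()
    return False, (lines[-1] if lines else "unknown error")


def smoke_test(module_names):
    """Return a dict mapping each module that failed to its error line."""
    failures = {}
    for name in module_names:
        ok, message = imports_cleanly(name)
        if not ok:
            failures[name] = message
    return failures


if __name__ == "__main__":
    # Hypothetical usage: list the modules your package is supposed to install.
    problems = smoke_test(["json", "sqlite3"])
    for name, message in problems.items():
        print(f"{name}: {message}")
    sys.exit(1 if problems else 0)
```

Run right after installation on each supported platform, a script like this catches the “it deploys on my computer” class of goofs before a user does.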

Remember that 10 years ago, Linux was labeled as “computer guru stuff”: it welcomed you by asking for the size of your hard drive and the number of cylinders. Now look at what Canonical did with Ubuntu: a clean, mouse-driven, colorful installer. And then people were suddenly able to install Linux, and realized that “damn, it’s even better than my old Windows box, and I don’t have to pay for it anymore!”. So please, Open Source developers, fix your installers 😀

Update: I agree that instead of complaining, I could have filed a bug report in each project to report the issue. Although the statement is valid, it misses the point of “ready to use” software.
