
Posts Tagged ‘OpenOffice.org’

An Experiment in Transitioning to Open Document Formats

June 15th, 2013

Recently I read an interesting article about Vint Cerf, best known as co-designer of the TCP/IP protocols that underpin modern Internet communication, in which he raised a very scary problem with everything going digital. I'll quote from the article (Cerf sees a problem: Today's digital data could be gone tomorrow, posted June 4, 2013) to explain:

One of the computer scientists who turned on the Internet in 1983, Vinton Cerf, is concerned that much of the data created since then, and for years still to come, will be lost to time.

Cerf warned that digital things created today — spreadsheets, documents, presentations as well as mountains of scientific data — won’t be readable in the years and centuries ahead.

Cerf illustrated the problem in a simple way. He runs Microsoft Office 2011 on Macintosh, but it cannot read a 1997 PowerPoint file. “It doesn’t know what it is,” he said.

“I’m not blaming Microsoft,” said Cerf, who is Google’s vice president and chief Internet evangelist. “What I’m saying is that backward compatibility is very hard to preserve over very long periods of time.”

The data objects are only meaningful if the application software is available to interpret them, Cerf said. “We won’t lose the disk, but we may lose the ability to understand the disk.”

This is a well-known problem for anyone who has used computers for some time. Occasionally you'll be sent a file that you simply can't open because the modern application you now run has 'lost' the ability to read the format created by the (now) 'ancient' application. Beyond this minor inconvenience, though, it also raises the question of how future generations, historians in particular, will be able to look back on our time and make any sense of it. In the past we've benefited greatly from media that are more or less easy to interpret: newspaper clippings, personal diaries, heck, even cave drawings are all relatively easy to translate and interpret compared to unknown, seemingly random digital content. That isn't to say decoding such content is impossible; however, it is a task with (perceivably) little market value, relatively speaking at least, and so would likely be de-emphasized or underfunded.

A Solution?

So what can we do to avoid these long-term problems? Realistically, probably nothing. I hate to sound so down about it, but at some point technology will make its next leap forward and likely render our current formats completely obsolete (again) in the process. The only thing we can do today that is likely to have a meaningful impact that far into the future is to use well-documented, open standards. That means transitioning away from so-called binary formats, like .doc and .xls, and embracing the newer open standards meant to replace them. By doing so we can ensure large-scale compliance today and work toward a sort of saturation effect, in which the likelihood of completely 'losing' the ability to interpret our current formats decreases. Nor is this solution just a pie-in-the-sky pipe dream for hippies. Many large multinational organizations, governments, scientific and statistical groups, and individuals are beginning to recognize the same issue, and many have already taken action to counteract it.

Enter OpenDocument/Office Open XML

Back in 2002 the Organization for the Advancement of Structured Information Standards (OASIS) created a technical committee to develop a completely transparent, open and standardized document format, the end result of which was the OpenDocument standard, approved by OASIS in 2005. This standard has gone on to become the default file format in most open source applications (such as LibreOffice, OpenOffice.org and the Calligra Suite) and has seen widespread adoption by many other groups and applications (like Microsoft Office). According to Wikipedia, OpenDocument is supported and promoted by over 600 companies and organizations (including Apple, Adobe, Google, IBM, Intel, Microsoft, Novell, Red Hat, Oracle and the Wikimedia Foundation) and is currently the mandatory standard for all NATO members. It has also been adopted as the default format (or at least a supported format) by more than 25 countries and many more regions and cities.

Not to be outdone, and not wanting to lose its position as the dominant creator of office document formats, Microsoft introduced a somewhat competing format called Office Open XML in 2006. The two formats have much in common: both are based on XML and structured as a collection of files within a ZIP container. However, they differ enough that 1) they are not interoperable and 2) software written to import/export one format cannot easily be made to support the other. While OOXML is also an open standard, there have been some concerns about just how open it actually is; see, for instance, these (completely biased) comparisons by the OpenDocument Fellowship: Part I / Part II. Wikipedia (Office Open XML, as of June 9, 2013) elaborates:

Starting with Microsoft Office 2007, the Office Open XML file formats have become the default file format of Microsoft Office. However, due to the changes introduced in the Office Open XML standard, Office 2007 is not entirely in compliance with ISO/IEC 29500:2008. Microsoft Office 2010 includes support for the ISO/IEC 29500:2008 compliant version of Office Open XML, but it can only save documents conforming to the transitional schemas of the specification, not the strict schemas.

It is important to note that OpenDocument has its own set of issues; however, its (continuing) standardization process is far more transparent. In practice, at least as of this writing, only Microsoft Office 2007 and 2010 can consistently edit and display OOXML documents without issue, whereas most other applications (like LibreOffice and OpenOffice.org) handle OpenDocument much better. On the flip side, while Microsoft Office can open and save OpenDocument files, it constantly lags behind the official standard in feature compliance. Without wanting to sound too conspiratorial, I suspect this is because Microsoft wishes to show how much 'better' its own standard is in comparison. That said, with its 2013 version Microsoft aims to substantially improve its OpenDocument compatibility, so the overall situation should get better with time.

Today, however, I think both standards are on more or less equal footing technologically. Both initially had issues and lacked some features, but both have since evolved to cover the vast majority of what's needed in a document format.

What to do?

As discussed above, there are two different (some would argue competing) open standards to replace the old closed formats. Ten years ago I would have said that the choice between the two was simple: Office Open XML all the way. However, the computing landscape has changed drastically in the last decade and will likely continue to diversify in the coming one. Cell phone sales have surpassed computer sales, and while Microsoft Windows is still the market leader on PCs, alternative operating systems like Apple's Mac OS X and Linux have been gaining ground. Then there are new cloud computing contenders like Google Docs, which let you view and edit documents right in a web browser, making the operating system irrelevant. All of this heterogeneity has thrown a curve ball into how standards are established, and complete interoperability is now key: you can't just be the market leader on PCs and expect everyone else to follow your lead anymore. I don't want to be limited in where I can use my documents; I want them to work on my PC (running Windows 7), my laptop (running Ubuntu 12.04), my cellphone (running iOS 5) and my tablet (running Android 4.2). For these reasons, my conclusion, in an ideal world, is OpenDocument. For others the choice may very well be Office Open XML, and that's fine too: both attempt to solve the same problem, and a little market competition may end up being beneficial in the short term.

Is it possible to transition to OpenDocument?

This is the tricky part of the conversation. Let's say you want to jump 100% over to OpenDocument… how do you do so? Converting from other formats, like the old .doc or even the newer Office Open XML .docx, to OpenDocument's .odt is far from problem-free. For most things the conversion process should be as simple as opening the document in its current format and re-saving it as OpenDocument; there are even wizards that will automate this process for a large number of documents. In my experience, however, things are almost never quite that simple. From what I've seen, any document that contains a bulleted list is converted with far from perfect accuracy. I've come close to re-creating the original formatting manually, making heavy use of custom styles in the process, but it's still not a fun or straightforward task. Perhaps in these situations continuing to use Microsoft's formatting, via Office Open XML, is the best solution.

If, however, you are starting fresh or just converting simple documents with little formatting, there is no reason why you couldn't make the jump to OpenDocument. Personally, I'm going to attempt to convert my existing .doc documents to OpenDocument where possible, or to Office Open XML where there are formatting issues. By the end I should be using open formats exclusively, which is a good thing.
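Re-saving a whole folder of .doc files can also be scripted instead of done through a wizard. This is a minimal Python sketch, assuming LibreOffice is installed and its `soffice` (or `libreoffice`) binary is on the PATH; it relies on LibreOffice's `--headless`, `--convert-to` and `--outdir` command-line options, while the helper names `build_convert_command` and `convert_folder` and the `converted` output folder are hypothetical. Passing `target="docx"` covers the documents whose formatting doesn't survive the trip to OpenDocument.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(sources, target="odt", outdir="converted", soffice="soffice"):
    """Build a LibreOffice headless conversion command (does not run it)."""
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(outdir), *(str(p) for p in sources)]


def convert_folder(folder, target="odt", outdir="converted"):
    """Convert every .doc file in `folder` to `target` ("odt" or "docx")."""
    soffice = shutil.which("soffice") or shutil.which("libreoffice")
    if soffice is None:
        raise RuntimeError("LibreOffice was not found on PATH")
    sources = sorted(Path(folder).glob("*.doc"))  # matches .doc only, not .docx
    if sources:
        # check=True raises CalledProcessError if LibreOffice reports a failure.
        subprocess.run(build_convert_command(sources, target, outdir, soffice), check=True)
    # Each output keeps its source file's name with the new extension.
    return [Path(outdir) / f"{p.stem}.{target}" for p in sources]


if __name__ == "__main__":
    # Preview the command only; nothing is converted here.
    print(" ".join(build_convert_command(["letter.doc", "resume.doc"])))
    # soffice --headless --convert-to odt --outdir converted letter.doc resume.doc
```

A batch run won't fix the bulleted-list problems described above, so each converted file still deserves a quick look.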

I'll write a follow-up post on my successes, or on any issues encountered, if I think it warrants one. In the meantime I'm curious about the success others have had with a process like this. If you have any comments or insight into how to make a transition like this go more smoothly, I'd love to hear them. Leave a comment below.

This post originally appeared on my personal website here.




I am currently running a variety of distributions, primarily Linux Mint Debian Edition.
Previously I was running KDE 4.3.3 on top of Fedora 11 (for the first experiment) and KDE 4.6.5 on top of Gentoo (for the second experiment).

Impending Upgrades

October 7th, 2009

Here's another fun little tidbit: today I tried to use OpenOffice.org Writer seriously for the first time, and realized rather quickly that I was running version 2.1. For those who don't already know, OpenOffice.org was close to unusable prior to version 3.x. While it has since matured into a very capable suite of programs, the first few versions were just awful. In particular, I couldn't get the formatting right on a numbered list with bulleted sub-points.

A quick apt-get -t lenny-backports install openoffice.org did the trick, and removed my system-wide dictionary as a bonus. Now both Icedove (Thunderbird) and Pidgin claim that everything I type is misspelled. A quick check with Synaptic confirmed that the aspell package had mysteriously disappeared from my system; when I tried to mark it for re-installation, Synaptic refused, claiming that aspell depended on a package called dictionaries-common, which wasn't going to be installed for some unspecified reason. Christ.

Figuring that it was a version issue (since the only thing that had changed on my system was my version of OpenOffice.org), I tried apt-get -t lenny-backports install aspell. It worked, and also warned me that my OpenOffice.org upgrade had left about 25 packages lying around that ought to be removed:

bluez-gnome, libmtp7, python-notify, obex-data-server, libgda3-common, python-gnome2-extras, evolution-exchange, rhythmbox, system-config-printer,
libgpod3, gnome-themes-extras, bluez-utils, python-eggtrayicon, openoffice.org-style-andromeda, libxalan2-java, python-4suite-xml, libgda3-3,
transmission-common, libgdl-1-0, libxalan2-java-gcj, serpentine, transmission-gtk, libgdl-1-common, gnome-vfs-obexftp

The strange thing is that some of those packages look like they might be required by software other than OpenOffice.org. You know, like Evolution, or maybe Transmission? What the hell is going on here? I’m upgrading to the Testing repositories as soon as I get the chance. Hopefully that will solve some of my old-ass-software issues.

Day 12, my current software setup

September 12th, 2009

It has been almost half a month since the experiment began, and I think everyone is just getting to the point where they can be truly productive on their systems. As such, I wanted to share my current software setup and the replacements I am using for the proprietary software packages that I would otherwise have used in a Windows environment.

Operating System

As you may already know, I have chosen Fedora 11 as my distribution for this experiment. While it got off to quite a rocky start, Fedora is proving to be a competent operating system and should fit my needs for the duration of the experiment.

Office & Word Processing

Fedora ships with OpenOffice.org 3.1.1 as its office suite. I have used OpenOffice.org in the past and found it to be an adequate alternative to Microsoft's Office suite, if not without its own faults. Perhaps it is just my familiarity with Microsoft's Office suite, but I find OpenOffice.org has many odd quirks. For example, its ability to open but not save Office Open XML files (*.docx, *.pptx, *.xlsx, etc.) is rather frustrating. For the most part I am going to use OpenOffice.org's preferred format, the OpenDocument Format, but I have read about numerous issues with this format as well. I guess time will tell whether this is a good choice.

Moving forward I think I am going to be looking at alternatives to OpenOffice.org, such as AbiWord or KOffice, just to see if those work better for me.

E-mail Client

As on Windows, I am using Thunderbird to manage my e-mail. What's kind of weird is that I can only seem to install the Thunderbird 3 beta from my repositories. Again, you can find my contact information on my page here.

Browser

This one was a really easy choice for me. I have been using Firefox on Windows for a long time, and Fedora lets me run the most recent version, which is 3.5.3 as of this writing. My browsing experience has not changed at all from how it was on Windows.

Instant Messaging

On Windows I had been mostly using Windows Live Messenger. Now that I am on Linux I have tried various IM clients including aMSN, Kopete and Pidgin. Of the bunch I think Kopete has a lot of potential but I am sticking with Pidgin. It just seems to do everything and do it mostly right.

Music/Media Management

As an alternative to iTunes I gave Rhythmbox a go and was very impressed. Next I tried Songbird, and while there isn't much difference between the two players, I like the feel of Songbird better. For videos I am still trying to decide whether I prefer VLC or MPlayer; as with Rhythmbox and Songbird, there really isn't much difference between them.

Image Manipulation

I have never been a big Photoshop person, so my needs in this category were pretty easy to meet. I have settled on using both the GIMP and KolourPaint to cover them.

Development

In the past I have primarily been a Windows developer, using tools such as Visual Studio to get my work done. I would be very interested in seeing how Mono development works on Linux, but in the meantime I will be using Eclipse's Java and C/C++ tools as my primary Linux development platform.

Torrents

Because µTorrent doesn't run on Linux except under Wine, I have decided to use the native client KTorrent for all of my torrenting needs! I find it very similar to what I'm used to on Windows, so again this is an easy solution for me.

That’s It For Now

I’ll let you know if I find any better alternatives moving forward.