Database Conversion Tools
From time to time I need to convert tables from one database management system (DBMS) to another. Mainly I convert to a format usable with PostgreSQL, but these tools should be adaptable to other DBMS, too.
Ideally, the source DBMS has a "dump" or "export" feature that can write the contents of each table as a text file, one line per row, with a special character (typically a TAB) separating the columns. If not, it's an easy thing to generate such a file using SQL. Like most DBMS, PostgreSQL can readily import a delimited file.
Unfortunately, some DBMS can't dump correctly. In other cases, the personnel controlling the data are unable, or unwilling, to deliver a clean, delimited file. I've had to develop some Bourne shell scripts to handle situations I encounter.
Popular DBMS like Oracle are certainly capable of generating
delimited text file dumps, but many legacy systems have been set up
to deliver records with column-aligned fields; perhaps this is a
holdover from punch-card days. If you're already getting the data you
need, there's no need to pester the DBA to deliver the same data in a
new format, just use the script
In the past, database management was expensive to acquire and maintain. In-house "bread-and-butter" applications and services that keep a typical business going but don't directly generate revenue typically couldn't justify the expense of a product like Oracle and a server to run it. Instead, they did without, filling file cabinets with paper and annoying employees with piles of redundant paper forms. Easy-to-use "personal database" tools like Access offered an irresistable temptation: why not implement office-management applications in Access?
A profusion of applications arose, first to assist individual accountants or administrators, then shared among many users, each of whom ran a complete copy of the entire database on his/her own workstation. A new generation of in-house experts came into being, finding ways to bend this tool to meet real needs, despite the inescapable fact that Access, like all "office" applications, was designed for a single user, physically sitting at the workstation running the tool. Multi-user support, concurrent access, remote access and, eventually, web-based access, all had to be retrofitted. This helped create a legacy of applications and services that eventually outgrew Access, which required expert attention to work under the strain.
Today, production-quality DBMS, and the platforms to run them, are available free on any typical GNU/Linux CD. More and more in-house "bread-and-butter" applications and services are now web-based, using the Apache webserver, PHP for the code and user interface, and PostgreSQL or MySQL as the DBMS. But somebody has to migrate the data from the legacy Access apps; typically, this is up to whoever develops the new LAPP or LAMP services, like you and me.
Astonishingly, the Access table export feature is broken; it seems that the highly-capable developers at Microsoft inexplicably overlooked the need to "escape" end-of-lines and other special characters inside text fields so the importing DBMS (even Access itself!) doesn't confuse them with field or record delimiters. Even worse, the export file often simply omits large sections, beginning in the middle of one record and ending in the middle of another, many records later. There is no way even a painstakingly careful import tool can recover data which was never exported in the first place.
After much experimentation, with wailing and gnashing of teeth, I've found a solution. It appears that Access can successfully export tables to the HTML format without losing or corrupting data, in a format that is readily parseable. The Microsoft developers apparently realized that the average user could readily tell if the HTML was broken, and took care to escape all special characters.
To convert, you must open the database "backend" in Access (currently F11, or shift when opening the file), right-click on each desired table, select Export, and then select HTML under "Save as type". Naturally, you'll need to move these HTML files to a platform capable of running shell scripts and tools like sed and grep; presumably, that's the machine that will run your new system.
Once you have your exported HTML files, use the Bourne shell
For large tables, the conversion can take some time, so you probably won't want to pipe it directly into the destination DBMS; instead, save the output for each table as a file and then import the file in a separate step.
Better yet, if you're using PostgreSQL, use another shell script,