Yesterday was the first time in about a year and a half of using MODx CMS/CMF that I needed to use the Import HTML function (in the Manager, via Tools -> Import HTML). I was not only impressed by how fast it performed with a large batch of HTML files, but ecstatic with the end results.
The task at hand: port an older, hand-coded HTML site over to a dynamic content management system, ideally without a ton of data entry (by an error-prone human, anyway), since the project’s scope didn’t account for that. The scope did, however, require that all the original primary body and page title content (with its original markup) remain intact, for the sake of the natural SEO positioning the site’s content had accumulated over the years.
220 Pages of HTML Content.
Um, no. I was pretty much gonna have to enter the data myself or find some MySQL database import method, and I preferred not to have to Google around for another “solution” (hehe).
Once I’d given the Import HTML feature a test run on a fresh dev install and gotten good results, I was 75% finished with the task in under an hour.
All that was left was to prep the HTML pages by batch removing unwanted code (like navigation and header/footer/callout/etc).
The prep went like this (on a Mac; Windows users could easily follow suit with some Windows-ey yucky stuff):
- Grab all the HTML files from the original webserver (since I didn’t have FTP access) using CocoaWget (a Mac/Linux GUI for wget).
- Create an OS X Automator workflow that opens all the HTML files in Coda, runs Edit -> Find with the opening and closing tags of the selection to remove (pausing to insert a Wildcard symbol between them), and replaces each match with an HTML comment that just says <!-- Welcome to your new home, MODx. -->. Bonus: NO UGLY REGEX FUNK REQUIRED!
- Voila, all 220 HTML files successfully imported in 0.21 seconds as MODx documents, with the page title as the document title and everything else inside the HTML body tag placed into the MODx CONTENT system TV.
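
For anyone without Coda and Automator handy, the batch cleanup step could be roughly approximated with a few lines of Python. This is only a sketch of the same find-and-replace idea, not what I actually ran: the function names (`strip_between`, `clean_directory`), the UTF-8 default encoding, and the example markers are all made up for illustration. It uses plain string searching, so still no regex funk.

```python
from pathlib import Path

# The comment that replaces each removed chunk of markup.
PLACEHOLDER = "<!-- Welcome to your new home, MODx. -->"


def strip_between(html, start, end, replacement=PLACEHOLDER):
    """Replace every span from `start` through `end` (inclusive) with `replacement`.

    Plain substring search: the first `end` found after a `start` closes the
    span, so nested tags of the same kind are NOT handled.
    """
    pieces = []
    pos = 0
    while True:
        begin = html.find(start, pos)
        if begin == -1:
            break
        finish = html.find(end, begin + len(start))
        if finish == -1:
            break  # unmatched opening marker: leave the rest untouched
        pieces.append(html[pos:begin])
        pieces.append(replacement)
        pos = finish + len(end)
    pieces.append(html[pos:])
    return "".join(pieces)


def clean_directory(folder, start, end, encoding="utf-8"):
    """Rewrite every .html file under `folder` in place; return how many changed.

    The encoding is an assumption; an old hand-coded site may well use
    something else (e.g. latin-1), so check before running.
    """
    changed = 0
    for path in Path(folder).rglob("*.html"):
        original = path.read_text(encoding=encoding)
        cleaned = strip_between(original, start, end)
        if cleaned != original:
            path.write_text(cleaned, encoding=encoding)
            changed += 1
    return changed
```

Calling something like `clean_directory("site_copy", "<!-- nav start -->", "<!-- nav end -->")` on a copy of the downloaded files (both markers here are hypothetical) would swap each navigation block for the placeholder comment. Run it on a backup first, since it overwrites files in place.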










