Is it possible to crawl through a database?

Help moving broken Movable Type database to WordPress?

  • I have a personal blog running under Movable Type. I want to move to WordPress. I've exported the existing entries to a text file, so that's good. However, a couple of years ago my previous MT install tanked (http://ask.metafilter.com/41314/I-cant-access-my-blogs-MT-control-panel). I cannot access it. I have several years' worth of posts that are stuck in a database that I can't get to. I've been gradually moving these by hand to my existing MT install, but that's a pain in the ass. Is there some automated way to crawl/scrape the existing HTML files in order to retrieve these "lost" entries? Some way to break back into a corrupt MT database? Some way to repair it? (The install is BerkeleyDB-based.) Important note: I have FTP access to all the "lost" entries, too. If I were ambitious, I'd figure out how to write a script that would parse all this information for me. I'm not ambitious, and am hoping that somebody has already done such a thing... Next on the agenda: converting an existing custom MT template to WP! For the record: http://www.foldedspace.org/weblog/2007/06/back_to_the_future_1.html and http://www.foldedspace.org/archives/004980.html

  • Answer:

    If it's really locked down, you might want to give http://www.dapper.net/index.php a try.

jdroth at Ask.Metafilter.Com

Other answers

Assuming that you published static pages of your old blog with the corrupt database, the information is all "out there" in a digestible format. One way of digesting it would be this:

1. Use something like wget or a site-ripper app to crawl your old blog and fetch all the entries to your desktop. (oh wait, you've got FTP access, just DL the whole thing).
2. Figure out the GREP patterns necessary to convert your old blog entries into MT's export format.
3. Run those patterns over your crawled files and cat them into a single file.

Make no mistake: this is a tedious process, but it can be done. I've done it on an old Blogger blog when the Blogger backend was unavailable.
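As a rough illustration of steps 2 and 3, here is a minimal Python sketch that pulls a title, date, and body out of each downloaded page and writes them out as Movable Type export records. The three regular expressions, the date layout, and the function names are all assumptions made for the example; the real patterns depend entirely on the templates the old blog was published with. The record layout (metadata lines, a `-----` line between sections, a `--------` line between entries) follows the MT import/export format as I understand it, so check it against a file exported from a working install before relying on it.

```python
import re
from datetime import datetime
from pathlib import Path

# Hypothetical patterns: adjust these to match the old blog's actual templates.
TITLE_RE = re.compile(r'<h3 class="title">(.*?)</h3>', re.S)
DATE_RE = re.compile(r'<span class="date">(.*?)</span>', re.S)
BODY_RE = re.compile(r'<div class="entry-body">(.*?)</div>', re.S)

# Assumed date layout on the archived pages, e.g. "June 05, 2007 09:15 AM".
PAGE_DATE_FORMAT = "%B %d, %Y %I:%M %p"


def html_to_mt_entry(html, author="unknown"):
    """Turn one archived entry page into an MT export record, or None."""
    title = TITLE_RE.search(html)
    date = DATE_RE.search(html)
    body = BODY_RE.search(html)
    if not (title and date and body):
        return None  # not an entry page, or the patterns need tuning
    when = datetime.strptime(date.group(1).strip(), PAGE_DATE_FORMAT)
    lines = [
        f"AUTHOR: {author}",
        f"TITLE: {title.group(1).strip()}",
        "STATUS: Publish",
        f"DATE: {when.strftime('%m/%d/%Y %I:%M:%S %p')}",
        "-----",
        "BODY:",
        body.group(1).strip(),
        "-----",
        "--------",
    ]
    return "\n".join(lines) + "\n"


def convert_all(pages, author="unknown"):
    """Concatenate the records for every page that parsed successfully."""
    records = (html_to_mt_entry(page, author) for page in pages)
    return "".join(r for r in records if r)


def convert_directory(src_dir, out_path, author="unknown"):
    """Read every .html file under src_dir and write one import file."""
    pages = (
        p.read_text(encoding="utf-8", errors="replace")
        for p in sorted(Path(src_dir).rglob("*.html"))
    )
    Path(out_path).write_text(convert_all(pages, author), encoding="utf-8")
```

Calling something like `convert_directory("old_blog", "import.txt", author="jdroth")` on the downloaded archive would produce a single file to feed to the importer. The non-greedy body pattern stops at the first `</div>`, so entries containing nested divs would be truncated; an HTML parser would be the sturdier choice if the markup is messy.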

adamrice

For minimum effort you might try the WordPress RSS importer (http://wiki.wordpress.org/?pagename=RssImport), since your RSS links seem fine.

brool

Can you provide more details about the old DB that you "can't get to"? What kind of DB was/is it -- a MySQL DB, a Berkeley DB file, ...? What's preventing you from getting access to the data in it -- is it corrupt? You lost the username/password? More data is key.

delfuego

As I mentioned in the question, the install that's giving me trouble is BerkeleyDB-based. I don't know precisely what's preventing me from accessing the file. It's not a lost username/password. In my previous question on this problem nearly two years ago (http://ask.metafilter.com/25356/Why-has-my-MT-weblog-died-Can-it-be-fixed), I noted that "MT-Medic does show existing weblogs and authors, though no weblogs are associated with any authors." HA! In a delicious piece of irony, I seem to have deleted my comment script for that particular MT install last week when I removed a subdomain. "What is this here for?" I wondered. Now I know. It doesn't matter, though, because comments were broken too. I was going to suggest that people try to leave a comment to look at the error that resulted, but now I've created a new error in its place!

jdroth

Here's some more info. I've been playing with MT-Medic, trying to get things to work. As I reported a couple of years ago, it shows all the proper weblogs and authors, but it doesn't show any connection between them. Also, when I attempt to log in to the MT install, it gives me a "no such author" error for any author. When I attempt to re-create connections between blogs and users in MT-Medic, they do not "take". Very frustrating.

jdroth

Hey JD, I work with the MT team, and I'm sorry you're having such a bad experience. I'd love to have us work with you to fix whatever's wrong with your install. If you're game, drop me a line (or IM me at anildash) and I'll get you set up. It sounds like all the issues you're having are definitely fixable.

anildash

Anil, I tried to e-mail you a couple years ago about this problem but got no reply! I'll IM you, okay?

jdroth

Try to avoid anything that uses the Movable Type export format, which is just broken and was the source of problems (http://lemonodor.com/archives/000730.html) every time I tried to use it (you may have no problem as long as your posts and comments don't contain troublesome pieces of text like "-----").
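One way to find out in advance whether a given set of posts would hit this is to scan each body for lines that look like the format's own separators. The sketch below is only that: it assumes the importer treats a line made up solely of five or more hyphens as a delimiter, and the function name is made up for the example.

```python
import re

# Assumption: a line consisting only of five or more hyphens is read as a
# section or entry delimiter by the importer.
DELIMITER_RE = re.compile(r"^-{5,}\s*$")


def find_delimiter_lines(body):
    """Return the 1-based line numbers in body that look like delimiters."""
    return [
        number
        for number, line in enumerate(body.splitlines(), start=1)
        if DELIMITER_RE.match(line)
    ]
```

An empty list means the body has no line that would be mistaken for a separator under that assumption; hyphens in the middle of a sentence are not flagged.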

jjwiseman

Assuming that the HTML pages exist, you should be able to do this. Another person and I did this for Wil Wheaton when his db got borked. We went around the db issue and directly to the published archive pages themselves. Basically, I used Teleport Pro (http://www.tenmax.com/teleport/pro/home.htm) to rip the HTML to local files. Once we had the local files, we ran a script based on this forum thread (http://forums.sixapart.com//index.php?showtopic=32035&hl=htmls&st=0) to convert it to the MT export format. Yoshi, my coworker, did rewrite the code a bit to make it more to his liking, but we were able to create a functional file that was re-imported back into a clean MT and/or Typepad install. I asked Yoshi if he still had the code, but it was back in December 05 or some time long ago. If he finds it, I will post linkage here. That's the concept at least, it worked once, and hopefully is useful to you.

Argyle
