Nature Protocols | Stepwise

From Word document to clean html

Recently I tweeted that http://word2cleanhtml.com/ was the thing that had been missing from my life. I am not going to go into the reasons why it would be just that important to Me Personally, but I am excited enough to write a blogpost about it.

▲ Aside: Don’t forget to follow @NatureProtocols on http://www.twitter.com!!

♫ Here we go! Are you paying attention!

This Blogpost has been written in Word. The formatting has not been added using any of the weird stars and underscores that you normally find yourself using, and was achieved in six easy steps:

  1. Write the post with the desired formatting.
  2. Go to http://word2cleanhtml.com/index.html and paste in the text (Check the boxes “Remove empty paragraphs”, “Replace smart quotes with ascii equivalents”, “Indent with tabs, not spaces” and “Replace non-breaking spaces with ordinary spaces”).
  3. Copy the resulting clean html into a new Word document.
  4. Find and replace all paragraph marks with “nothing”.
  5. Find and replace all tabs with “nothing”.
  6. Paste the resulting text into the text field in movable type.
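
For anyone who would rather script steps 4 and 5 than do the find-and-replace in Word, here is a minimal Python sketch. It assumes the clean html has been saved as plain text, so that Word's paragraph marks show up as ordinary line breaks. The function name `flatten_html` is made up for this example and is not part of word2cleanhtml.

```python
def flatten_html(html: str) -> str:
    """Remove line breaks and tabs, mirroring steps 4 and 5.

    Assumption: paragraph marks appear as \r\n, \r or \n in the text.
    """
    for mark in ("\r\n", "\r", "\n", "\t"):
        # Replace with "nothing", exactly as in the Word find-and-replace.
        html = html.replace(mark, "")
    return html


if __name__ == "__main__":
    sample = "<p>\n\tHello\n</p>\r\n"
    print(flatten_html(sample))
```

As with the Word version, the marks are replaced with nothing rather than with a space, so two words separated only by a line break would end up joined together.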

Can we even do a table??

Step Number | Problem | Solution
4 | You can’t quite work out what to put in the “Find what” field. | Go to “More”, “Special” – the Paragraph Mark should be the top entry, and should be ^p.
6 | When you preview the document, the page layout is completely screwed up. | Every <p> needs to have a corresponding </p>. Similarly with <div> and </div>. Check that this is indeed the case. A common problem is that you will not have copied and pasted the first “<” or the final “>”. If it just looks too complicated, it is worth either repeating the copy-and-paste steps OR just adding another </div> at the end of the text and hoping for the best.
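
If checking the <p> and <div> pairs by eye gets tedious, something like the following Python sketch could do the counting. It uses the standard library's `html.parser` and only looks at `p` and `div` tags. The names `TagBalanceChecker` and `find_unbalanced` are invented for this example, and the check is deliberately simple: it expects tags to be closed in the reverse order they were opened.

```python
from html.parser import HTMLParser


class TagBalanceChecker(HTMLParser):
    """Track opening and closing <p> and <div> tags (hypothetical helper)."""

    CHECKED = {"p", "div"}

    def __init__(self):
        super().__init__()
        self.stack = []      # tags opened but not yet closed
        self.problems = []   # closing tags that did not match

    def handle_starttag(self, tag, attrs):
        if tag in self.CHECKED:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.CHECKED:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def find_unbalanced(html: str) -> list:
    """Return a list of human-readable problems; empty means balanced."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]


if __name__ == "__main__":
    print(find_unbalanced("<div><p>Hello</p>"))
```

An empty list means every <p> and <div> found its partner. Anything else points at the tag to fix, which is a little more targeted than adding another </div> and hoping for the best.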

And lastly: You could use this tool when uploading your protocols on the Protocol Exchange and (fingers crossed) your beautiful formatting will be preserved without anyone having to break a sweat.

…………………

Word2Cleanhtml was written by Olly Cope.

…………………

This post has been edited since it was originally published; the whole thing works even better if you tick the “Indent with tabs, not spaces” box in http://word2cleanhtml.com/index.html

Comments

  1. Maxine Clarke said:

    Thanks for the tips. Just to note, in your Tweet about this, the link to this blogpost (the second link) goes to an error page. (I sent you a DM but it does not seem to have gone through so probably you aren't following me).

    Glad to see that this works. For non-technical content you can paste your Word doc into Notepad or Wordpad to clear out all the spurious code, but this seems a good way, too.

  2. Bronwen Dekker said:

    Thank you for the comment, Maxine. I have sent a new tweet out, and will sort out the "following situation".

    I love notepad (I am not sure that I could work without notepad and paint!), and will experiment with using it at various stages of this process to see if we can streamline it more. 

    As an aside, I saw a cool work-around for versions of Word that do not contain a button to remove the reviewer’s identity on Comments and Track Changes. It involves converting to "Word html", opening in notepad, and then find-and-replacing the reviewer’s names.