Thursday, October 30, 2008

Yikes backslash zero on wordpress.com

I started a new blog at wordpress.com, One Liner Code. I was reviewing my posts when I realized that my sed example didn't make sense to me. I didn't understand how I could have messed up such a simple example in the language that most easily lends itself to one-liners. I thought I had double-checked all of my examples and made sure they ran correctly. It turns out my HTML was fine: a preprocessor was stepping in between me and my sed Hello World post, turning my backslash zero into a null.

I whipped out my handy dandy PHP evaluator, but the htmlentities() string function failed me: PHP won't give you an HTML entity code for characters that don't need one. I decided to switch it up and get the character code using a Ruby one-liner:

ruby -e '"0".each_byte {|i| p i}'

I could also have used a PHP one-liner like this:

php -r 'echo ord("0")."\n";'

The quick fix was to write the zero as its numeric HTML entity, \&#48; (48 is the character code both one-liners above print). That let the post preprocessor accept the backslash zero and render it as the correct characters.
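If you have more than one post to doctor, the same substitution can be scripted. Here is a minimal Python sketch of the idea, not anything wordpress.com provides: the function name protect_backslash_zero is made up for illustration, and it only handles the one sequence discussed here, a backslash followed by a zero.

```python
def protect_backslash_zero(text: str) -> str:
    """Rewrite the zero in every backslash-zero pair as a numeric HTML entity.

    Hypothetical helper for illustration only; it leaves all other text alone.
    """
    # ord("0") is 48, so the entity is "&#48;", the same code the
    # Ruby and PHP one-liners report.
    entity = "&#{};".format(ord("0"))
    # Keep the backslash and swap only the zero that follows it.
    return text.replace("\\0", "\\" + entity)


if __name__ == "__main__":
    # The sed expression from my Hello World post, in a raw string so
    # Python leaves the backslash untouched.
    print(protect_backslash_zero(r"sed -e '/World/s/.*/\0!/'"))
```

Paste the printed line into the post editor in place of the original. This only covers the first save; I can't promise the editor will leave the entity alone if the post is edited again later.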

4 comments:

Filippo Erik Negroni said...

I too am struggling with back slash followed by a zero.
Unfortunately, encoding the backslash or the zero using the HTML ampersand expressions does not work: the expression gets translated by the post editor every time.
I haven't yet found a way to retain this.

freegnu said...

Here is the raw text of the post I am talking about at:

http://onelinercode.wordpress.com/2008/10/21/sed-one-liners-begin-with-e

<code>sed -e '/World/s/.*/\&#48;!/' &lt;&lt;&lt;"Hello World"</code>

I had to doctor it up for blogger. Hopefully it displays right. You may be missing the semicolon after the character code.

Filippo Erik Negroni said...

That encoding works only the first time it is written in the post editor.
Once the post is saved as a draft or even published, any subsequent editing will translate the &#48; encoding into \0.
This is unacceptable.
I use AsciiDoc to render my code in HTML, and AsciiDoc knows very well that '\0' is a perfectly acceptable XML combination of characters and does not translate it into a numerical encoding.
I would need to filter the asciidoc output or worse hack asciidoc to do that for me.