By Peter A. Bromberg, Ph.D.
It turns out that the only tool I had found that could do this (turn a block of HTML into script document.write statements) was one written in VB 4.0 called "HTML2VB", and it had some clunky limitations, so I decided to write myself a "Converter" HTML page that uses script to do the job. Here is the simple technique I used:
First, we are going to need two "windows" (textareas) on the converter page: one to paste in the HTML to be converted, and a second to display the result so that we can copy it to the clipboard and do what we want with it.
Paste HTML Below and press CONVERT button. Result appears in lower window.
Provide script tags?
In VB we have a neat function, Split, that takes a string and splits it into an array, with the delimiter of our choice passed as a parameter of the function call. Well, this is perfect! Since there is a carriage return / linefeed (vbCrLf) at the end of each line of the HTML that we paste into the top window, we can use this as our array delimiter and we will get exactly the contents of each line in each array element. Then we can do whatever processing we need by iterating through the elements of the array, concatenating the result for each line to a new string variable that will hold our "result" document.
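The split-and-iterate idea can be sketched outside the browser as well. The snippet below is a JavaScript illustration rather than the article's VBScript, and the function names splitLines and numberLines and the sample text are hypothetical, made up only for this example.

```javascript
// Split pasted text on the CR/LF pair so each array element holds one line.
// (splitLines is a hypothetical helper name, not part of the article's code.)
function splitLines(text) {
  return text.split("\r\n");
}

// Walk the elements and concatenate a processed copy of each line onto a
// result string. Here the "processing" is just prefixing the element index.
function numberLines(text) {
  const lines = splitLines(text);
  let result = "";
  for (let i = 0; i < lines.length; i++) {
    result += i + ": " + lines[i] + "\r\n";
  }
  return result;
}

// Hypothetical sample standing in for the text pasted into the top window.
const pasted = "<html>\r\n<body>\r\n</body>";
console.log(numberLines(pasted));
```

Swapping the body of the loop for a different per-line transformation is all it takes to turn this into a converter, which is what the article's code does next.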
The code is really simple. I'm going to let my inline comments tell the story, so look at the code:
' (C)2000 Peter A. Bromberg all rights reserved
' first dimension our text variables
Dim stext, entext
' set begin text variable to value of first textarea window's text:
stext = divbegin.innerText
' dim a variable for our array (and one for the loop counter)
Dim arStuff, i
' create the array using the Split function, using the carriage return /
' linefeed at the end of each line as the array element delimiter
arStuff = Split(stext, vbCrLf)
' if they chose script tags in the select control, write beginning script tag
If tags.value = "yes" Then
    entext = "<SCRIPT>" & vbCrLf
End If
' now iterate through the array
For i = 0 To UBound(arStuff)
    ' here we are using the Replace function to escape double quotes. You can
    ' also put other custom replacements here, with a new line for each set
    ' of replacements on this element. The trailing """);" closes the
    ' document.write call.
    arStuff(i) = "document.write(""" & Replace(arStuff(i), Chr(34), "\" & Chr(34)) & """);"
    ' here we are escaping any <SCRIPT> tags that are actually in the HTML by
    ' separating them into two elements so the script parser won't choke.
    ' vbTextCompare makes the match case-insensitive without lowercasing
    ' the rest of the line
    arStuff(i) = Replace(arStuff(i), "script>", "scr" & Chr(34) & "+" & Chr(34) & "ipt>", 1, -1, vbTextCompare)
    ' note below that we also add an extra vbCrLf at the end of each converted
    ' line. This helps avoid page errors parsing the script, and also makes it
    ' easier to read when we "view source"
    entext = entext & arStuff(i) & vbCrLf
Next
' again, if they wanted script tags, write closing tag
If tags.value = "yes" Then
    entext = entext & "</SCR" & "IPT>" & vbCrLf
End If
' populate the lower window with our result
divend.innerText = entext
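To see what the converter is meant to produce, here is a rough JavaScript sketch of the same per-line processing as a plain function. It is an illustration under assumptions, not the code from the download: the name htmlToDocumentWrite, its parameters, and the sample input are hypothetical, and the backslash-escaping step is an addition that the article's code does not perform.

```javascript
// Sketch of the conversion as a plain function. The name and parameters are
// hypothetical; they are not part of the article's download.
function htmlToDocumentWrite(html, withScriptTags) {
  const CRLF = "\r\n";
  // Optional opening tag, mirroring the "Provide script tags?" choice.
  let out = withScriptTags ? "<SCRIPT>" + CRLF : "";

  for (const line of html.split(CRLF)) {
    const escaped = line
      .replace(/\\/g, "\\\\")              // addition: escape backslashes first
      .replace(/"/g, '\\"')                // escape double quotes
      .replace(/script>/gi, 'scr"+"ipt>'); // split any script tag in two
    out += 'document.write("' + escaped + '");' + CRLF;
  }

  if (withScriptTags) {
    // Kept in two pieces, as in the article, so the sketch could also live
    // inline in a page without ending the enclosing script block early.
    out += "</SCR" + "IPT>" + CRLF;
  }
  return out;
}

// Hypothetical input: a quoted attribute plus an embedded script tag.
const sample = '<a href="x.htm">go</a>\r\n<script>var a=1;</script>';
console.log(htmlToDocumentWrite(sample, true));
```

Each input line becomes one document.write call, so the generated script stays readable under "view source" and a problem can be traced back to the HTML line that caused it.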
Download the code that accompanies this article
Peter Bromberg is an independent consultant specializing in distributed .NET solutions
Inc. in Orlando and a co-developer of the