C# .NET - how to search for a word in a txt file with Windows application code

Asked By Reena Jain on 05-Feb-10 02:16 PM
Hello,

Firstly, I want to convert an HTML file to a text file, i.e. save the HTML as text. How can I convert an HTML file to a text file through code?

Then I want to search for a specific word in the text file and replace it with another word. How is this possible in a Windows application? In short:

convert or save the HTML file as a text file (I need both, meaning the HTML file is copied to a text file)
find the word in the text file by code
replace the word with another word in the same file by code

thanks in advance

Rolf Jaeger replied to Reena Jain on 05-Feb-10 06:09 PM

Hi Reena:

Substituting text in a string is straightforward:

string t = "{your text}";
t = t.Replace("old string", "new string");
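Applied to a file, the same call can be wrapped in a read-modify-write step. The following is only a minimal sketch: the names TextFileReplacer and ReplaceInFile are made up for illustration, and it assumes the file is small enough to be loaded into memory in one go.

```csharp
using System;
using System.IO;

public static class TextFileReplacer
{
    // Hypothetical helper: replaces every occurrence of oldWord with newWord
    // in the given file and returns true if the file content changed.
    public static bool ReplaceInFile(string path, string oldWord, string newWord)
    {
        // Read the whole file into memory (assumes a reasonably small file).
        string original = File.ReadAllText(path);

        // String.Replace is case-sensitive and also matches inside longer words.
        string updated = original.Replace(oldWord, newWord);

        if (updated == original)
            return false; // nothing to change, leave the file untouched

        // Overwrite the same file with the modified text.
        File.WriteAllText(path, updated);
        return true;
    }
}
```

From a button handler in a Windows Forms application this could be called as TextFileReplacer.ReplaceInFile(@"C:\temp\page.txt", "old", "new"); where the path is just a placeholder.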

Converting text to HTML depends very much on what you want the HTML document to look like. Converting HTML to text simply means that you need to ignore all HTML tags.

Below I am listing some very simple code that I would suggest for the text-to-HTML conversion. However, I won't touch the HTML-to-text conversion. At first glance it only requires you to lose all HTML tags, but then there are also all the & character entities to deal with, e.g. &nbsp; (a non-breaking space, which the code below uses to stand in for a tab).

Hope this helped,
Rolf

// Requires: using System; using System.IO; using System.Net; using System.Windows.Forms;
private void ConvertTextFileToHTMLFile(string textFileName)
{
    // Same folder and name as the text file, but with an .html extension
    string htmlFileName = Path.ChangeExtension(textFileName, ".html");

    try
    {
        // The using blocks close both files even if an exception is thrown
        using (StreamReader reader = new StreamReader(textFileName))
        using (StreamWriter sWriter = new StreamWriter(htmlFileName))
        {
            //Write HTML header
            sWriter.WriteLine("<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional//EN\">");
            sWriter.WriteLine("<html>");
            sWriter.WriteLine("<head>");
            sWriter.WriteLine("<title>Untitled</title>");
            sWriter.WriteLine("</head>");
            sWriter.WriteLine("<body>");

            string text;
            while ((text = reader.ReadLine()) != null)
            {
                // Escape characters such as < and & so they show up as text
                // (WebUtility needs .NET 4.0 or later)
                text = WebUtility.HtmlEncode(text);
                //Include tabs
                text = text.Replace("\t", "&nbsp;&nbsp;&nbsp;");
                sWriter.WriteLine(text);
                sWriter.WriteLine("<br>");
            }
            sWriter.WriteLine("</body>");
            sWriter.WriteLine("</html>");
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }
}

mv ark replied to Reena Jain on 05-Feb-10 09:30 PM
This one line removes the tags from HTML content (the variable text) and gives you just the plain text (stripped), using a regular expression:
string stripped = System.Text.RegularExpressions.Regex.Replace(text,@"<(.|\n)*?>",string.Empty);
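
To get from there to the text file asked for in the question, the stripped string still needs its entities decoded and has to be written out. A rough sketch under some assumptions: HtmlToTextSketch and its two methods are illustrative names, WebUtility.HtmlDecode needs .NET 4.0 or later, and a tag-stripping regex like this one will not cope with script or style blocks or other unusual markup.

```csharp
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToTextSketch
{
    // Hypothetical helper: strips tags with the regular expression above,
    // then turns entities such as &nbsp; or &amp; back into characters.
    public static string StripHtml(string html)
    {
        string stripped = Regex.Replace(html, @"<(.|\n)*?>", string.Empty);
        // WebUtility.HtmlDecode is available from .NET Framework 4.0 on.
        return WebUtility.HtmlDecode(stripped);
    }

    // Hypothetical helper: writes the plain text next to the HTML file,
    // using the same name with a .txt extension, and returns that path.
    public static string ConvertHtmlFileToTextFile(string htmlFileName)
    {
        string textFileName = Path.ChangeExtension(htmlFileName, ".txt");
        File.WriteAllText(textFileName, StripHtml(File.ReadAllText(htmlFileName)));
        return textFileName;
    }
}
```

The original HTML file is left as it is, so you end up with both the HTML file and its text copy.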

You can then scan the text to locate the word and use the Replace method: http://dotnetperls.com/replace
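
If only whole words should be found (so that searching for "cat" does not hit "category"), a regular expression with word boundaries is one option. This is a sketch, not the only way to do it: WordSearchSketch and its methods are hypothetical names, the match is case-insensitive by choice, and \b only behaves as expected when the word starts and ends with a letter or digit.

```csharp
using System.Text.RegularExpressions;

public static class WordSearchSketch
{
    // Builds a pattern that matches 'word' only as a whole word.
    // Regex.Escape keeps characters like '.' or '+' in the word literal.
    private static Regex WholeWord(string word)
    {
        return new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
    }

    // Returns how many times 'word' occurs in 'text' as a whole word.
    public static int CountWord(string text, string word)
    {
        return WholeWord(word).Matches(text).Count;
    }

    // Replaces whole-word occurrences only, so "cat" does not touch "category".
    public static string ReplaceWord(string text, string word, string replacement)
    {
        // A MatchEvaluator inserts the replacement literally; passing it as a
        // plain string would make '$' sequences act as group references.
        return WholeWord(word).Replace(text, m => replacement);
    }
}
```

CountWord tells you whether the word is present at all before you decide to rewrite the file, and ReplaceWord can take the place of String.Replace when partial matches are not wanted.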