C# .NET - Reading HTML data and writing to a text file?

Asked By svt gdwl on 03-Apr-09 03:51 AM
Hi,
I want to read the table data in an HTML file and write it to a text file exactly as it appears in the rendered HTML view. I don't want to retrieve the source code; I want the text, along with the empty space between the table cells, in the text file.
How can I do this in C#?

Re - Kalit Sikka replied to svt gdwl on 03-Apr-09 04:05 AM


Put the temp.html file in the same directory as the readHtml.aspx file.
In the code-behind file, import the System.IO namespace:

using System.IO;

Then, inside an event handler such as Page_Load, try this:

// Build the path to temp.html in the directory of the current page.
string fileName = Path.Combine(Server.MapPath(""), "temp.html");
// The using block closes the reader even if an exception is thrown.
using (StreamReader reader = new StreamReader(fileName))
{
    string line;
    while ((line = reader.ReadLine()) != null) Response.Write(line);
}


Vasanthakumar D replied to svt gdwl on 03-Apr-09 04:12 AM

Hi,

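Here is the code for this (it needs using System; and using System.IO;):


string html;
using (StreamReader reader = new StreamReader("E:\\test.html"))
{
    html = reader.ReadToEnd();
}
// Search for "<table" so that tags with attributes are matched too.
int startIndex = html.IndexOf("<table", StringComparison.OrdinalIgnoreCase);
int endIndex = startIndex < 0 ? -1 : html.IndexOf("</table>", startIndex, StringComparison.OrdinalIgnoreCase);
if (endIndex >= 0)
{
    // Copy everything from the opening tag through the closing tag.
    string table = html.Substring(startIndex, endIndex + "</table>".Length - startIndex);
    // The second argument (true) appends to test2.txt instead of overwriting it.
    using (StreamWriter writer = new StreamWriter("E:\\test2.txt", true))
    {
        writer.Write(table);
    }
}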

extract text contents of a HTML table - mv ark replied to svt gdwl on 03-Apr-09 04:16 AM

To rephrase your question as I understand it, you need to extract the text contents of an HTML table. One way is to use regular expressions to match and extract just the text.

You can adapt the code sample from this link: http://social.msdn.microsoft.com/Forums/en-US/regexp/thread/389b5bb0-b68f-4e4e-ba9f-cbecf7a86b67
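
As a rough illustration of the regular-expression approach (not the code from the linked thread), here is a minimal sketch. TableTextExtractor and ExtractRows are hypothetical names, and the sketch assumes simple, well-formed markup with no nested tables; a real HTML parser would be more robust.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class TableTextExtractor
{
    // Hypothetical helper: returns one string[] per <tr>, with one entry per <td> or <th>.
    // Assumes the markup has no nested tables.
    public static List<string[]> ExtractRows(string html)
    {
        var rows = new List<string[]>();
        const RegexOptions Options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        foreach (Match row in Regex.Matches(html, @"<tr\b[^>]*>(.*?)</tr>", Options))
        {
            var cells = new List<string>();
            foreach (Match cell in Regex.Matches(row.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Options))
            {
                // Drop any tags nested inside the cell, decode entities such as &amp;, trim whitespace.
                string text = Regex.Replace(cell.Groups[1].Value, @"<[^>]+>", "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            rows.Add(cells.ToArray());
        }
        return rows;
    }
}
```

Calling TableTextExtractor.ExtractRows on the string read from the HTML file gives the cell text row by row, with empty cells kept as empty strings, ready to be written out however you like.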


Brother, are you from Salem TPT college? - Stella Pandian replied to Vasanthakumar D on 03-Apr-09 04:24 AM
end of post
sample code - Sathish S replied to svt gdwl on 03-Apr-09 04:27 AM
http://www.developer.com/net/csharp/article.php/10918_2230091_2
re - Web Star replied to svt gdwl on 03-Apr-09 04:52 AM

First, read the HTML page that you want to convert into text:

string fileName = Server.MapPath("") + "/temp.html";
string html;
using (StreamReader reader = new StreamReader(fileName))
{
    html = reader.ReadToEnd();
}

After that, you can use regular expressions to deal with the HTML tags related to the table, such as <table>, <tr>, <td>, and so on. For example, the pattern below matches a <tr>...</tr> span that contains "title="; adapt it to the tags you need.

// Regex is in the System.Text.RegularExpressions namespace.
Regex regex = new Regex(@"<tr>([^<]|(<[^t])|(<t[^r])|(<tr[^>]))*" + @"title=.*?</tr>", RegexOptions.Singleline | RegexOptions.Multiline);

Console.WriteLine(regex.Match(html).Value);

Hope this helps.
no, why? - Vasanthakumar D replied to Stella Pandian on 03-Apr-09 05:11 AM
end of post
This is not a private message board. Don't ask anything like this... :)
Vasanthakumar D replied to Stella Pandian on 03-Apr-09 05:12 AM
end of post
No, my junior studied at that college... I wondered whether you were him... Where is the private message board?
Stella Pandian replied to Vasanthakumar D on 03-Apr-09 05:17 AM
end of post
there is no PM here.... - Vasanthakumar D replied to Stella Pandian on 03-Apr-09 05:24 AM
end of post
in the text file also, table data should be rendered in same manner - svt gdwl replied to Web Star on 03-Apr-09 08:20 AM
Suppose I read the table data that is present in the HTML page, which is rendered in the following way:
The following figure shows how the table defined in this example renders. [Figure: rendered table]

After reading this table from the HTML page, how do I write it to a text file so that the table's content is laid out in the same manner in the text file?
re - Web Star replied to svt gdwl on 03-Apr-09 08:43 AM

For what you need, read the HTML and then remove everything from that string except the actual data values, placing blank spaces between them.
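
One possible way to place the blank spaces, once the cell values have been pulled out of the HTML, is to pad each cell to the width of the widest cell in its column. The sketch below is only an illustration: TextTableWriter, FormatRows, and WriteTable are hypothetical names, and it assumes each row is already available as a string[] of cell text and that the text file is viewed in a fixed-width font.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class TextTableWriter
{
    // Hypothetical helper: pads every cell to the width of the widest cell in its column.
    public static List<string> FormatRows(List<string[]> rows)
    {
        // First pass: find the widest cell in each column.
        var widths = new List<int>();
        foreach (string[] row in rows)
        {
            for (int i = 0; i < row.Length; i++)
            {
                if (i == widths.Count) widths.Add(0);
                widths[i] = Math.Max(widths[i], row[i].Length);
            }
        }

        // Second pass: build one padded line of text per row.
        var lines = new List<string>();
        foreach (string[] row in rows)
        {
            var line = new StringBuilder();
            for (int i = 0; i < row.Length; i++)
            {
                // Two extra spaces separate neighbouring columns.
                line.Append(row[i].PadRight(widths[i] + 2));
            }
            lines.Add(line.ToString().TrimEnd());
        }
        return lines;
    }

    // Writes the formatted lines to the given path, overwriting any existing file.
    public static void WriteTable(string path, List<string[]> rows)
    {
        File.WriteAllLines(path, FormatRows(rows));
    }
}
```

For example, the rows { "Name", "Qty" } and { "Apple", "3" } come out as "Name   Qty" and "Apple  3", so the second column starts at the same position on every line. Empty cells are padded with spaces, which keeps later columns aligned.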

yes, how can I do that? - svt gdwl replied to Web Star on 03-Apr-09 10:57 AM
end of post