How To Parse HTML in Java - jsoup Examples

Posted By: Matpal - March 17, 2011

Parsing HTML in Java is simple with the jsoup library. It parses an HTML document into a tree of elements that you can query with CSS-style selectors.
Here are the steps for parsing HTML with jsoup.

Step 1: Download the jsoup jar from http://jsoup.org/download and add it to your classpath.

Step 2: The following example shows how to use jsoup.


import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TestClass
{
    public static void main(String[] args)
    {
        // The HTML fragment to parse; jsoup wraps it in a full document.
        String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
        Document doc = Jsoup.parse(html);

        // Select the first <a> element with a CSS-style selector.
        Element link = doc.select("a").first();

        String text = doc.body().text();      // "An example link."
        String linkHref = link.attr("href");  // "http://example.com/"
        String linkText = link.text();        // "example"
        String linkOuterH = link.outerHtml();
                // "<a href="http://example.com/"><b>example</b></a>"
        String linkInnerH = link.html();      // "<b>example</b>"

        System.out.println(text);
        System.out.println(linkHref);
        System.out.println(linkText);
        System.out.println(linkOuterH);
        System.out.println(linkInnerH);
    }
}
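
The example above reads a single link. As a minimal sketch of the same idea applied to every link in a document, the class below collects all href values as absolute URLs. The class name LinkLister, the helper extractLinks, and the sample HTML are hypothetical names chosen for illustration; the base URI http://example.com/ is an assumption needed to resolve relative links.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class LinkLister {

    // Hypothetical helper: returns the absolute URL of every <a> that has an href.
    static List<String> extractLinks(String html, String baseUri) {
        // The base URI lets jsoup resolve relative links such as "/docs".
        Document doc = Jsoup.parse(html, baseUri);

        List<String> urls = new ArrayList<>();
        // "a[href]" matches only anchors that carry an href attribute.
        for (Element link : doc.select("a[href]")) {
            // The "abs:" prefix asks jsoup for the resolved, absolute URL.
            urls.add(link.attr("abs:href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Sample markup, made up for this sketch.
        String html = "<ul>"
                + "<li><a href='/docs'>Docs</a></li>"
                + "<li><a href='http://example.com/about'>About</a></li>"
                + "<li><a name='anchor'>No href</a></li>"
                + "</ul>";

        for (String url : extractLinks(html, "http://example.com/")) {
            System.out.println(url);
        }
    }
}
```

The selector a[href] skips the third anchor because it has no href attribute, so only two URLs are printed.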
