Thursday, March 17, 2011

How To Parse HTML in Java - jsoup Examples

Parsing HTML in Java is simple with the jsoup library. It parses an HTML document into a tree of elements that you can query with CSS selectors.
Here are the steps for parsing HTML with jsoup.

Step 1: Download the jsoup jar and add it to your project's classpath.

Step 2: The following example shows how to use jsoup.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TestClass {
    public static void main(String[] args) {
        String html = "<p>An <a href=''><b>example</b></a> link.</p>";
        // Parse the HTML string into a Document
        Document doc = Jsoup.parse(html);
        // Select the first <a> element with a CSS selector
        Element link = doc.select("a").first();

        String text = doc.body().text(); // "An example link."
        String linkHref = link.attr("href"); // ""
        String linkText = link.text(); // "example"
        String linkOuterH = link.outerHtml(); // "<a href=""><b>example</b></a>"
        String linkInnerH = link.html(); // "<b>example</b>"
    }
}
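
The example above reads a single link. As a rough sketch of how the same calls extend to several links, the class below collects the text and href of every anchor in a document. The class name LinkLister, the helper listLinks, and the sample HTML are made up for illustration; only Jsoup.parse, select, text and attr come from jsoup, and the jsoup jar is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LinkLister {
    // Hypothetical helper (not part of jsoup): returns "text -> href"
    // for every <a> element that carries an href attribute.
    static List<String> listLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "a[href]" is a CSS selector matching anchors that have an href attribute
        for (Element link : doc.select("a[href]")) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample input invented for this sketch
        String html = "<p><a href='/one'><b>First</b></a> and <a href='/two'>second</a> link.</p>";
        for (String line : listLinks(html)) {
            System.out.println(line);
        }
    }
}
```

Using the selector "a[href]" rather than plain "a" skips anchors that have no href, so attr("href") is never called on a link without a target.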


