Parsing HTML in Java is straightforward with the jsoup library, which parses HTML documents into a tree you can query with CSS-style selectors.
Here are the steps for parsing HTML with jsoup.
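Step 1: Download the jsoup JAR from http://jsoup.org/download and add it to your project's classpath.
Step 2: The following example parses an HTML string, selects the first link in it, and reads that link's attribute, text, and markup.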
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TestClass {
    public static void main(String[] args) {
        String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";

        // Parse the string into a document and select the first <a> element
        Document doc = Jsoup.parse(html);
        Element link = doc.select("a").first();

        String text = doc.body().text();      // "An example link."
        String linkHref = link.attr("href");  // "http://example.com/"
        String linkText = link.text();        // "example"
        String linkOuterH = link.outerHtml(); // "<a href="http://example.com/"><b>example</b></a>"
        String linkInnerH = link.html();      // "<b>example</b>"

        System.out.println(text);
        System.out.println(linkHref);
        System.out.println(linkText);
        System.out.println(linkOuterH);
        System.out.println(linkInnerH);
    }
}
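The example above reads only the first link. When a page has several links, the same select call can be looped over. Here is a minimal sketch, assuming the jsoup JAR from Step 1 is on the classpath; the class name LinkLister, the method listLinks, and the sample markup are made up for illustration and are not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class LinkLister {

    // Hypothetical helper: returns "text -> href" for every <a> element
    // that has an href attribute.
    static List<String> listLinks(String html) {
        Document doc = Jsoup.parse(html);
        List<String> result = new ArrayList<>();
        // "a[href]" matches only anchors that carry an href attribute
        for (Element link : doc.select("a[href]")) {
            result.add(link.text() + " -> " + link.attr("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch
        String html = "<p><a href='http://example.com/one'>One</a> and "
                + "<a href='http://example.com/two'><b>Two</b></a></p>";
        for (String line : listLinks(html)) {
            System.out.println(line);
        }
    }
}
```

Selecting with "a[href]" instead of "a" skips anchors that have no href, so attr("href") does not come back empty for them.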