Utility methods for removing irrelevant HTML tags and the content between them.
Constructor Summary
Html()
Method Summary
static java.lang.String getTagAttrib(java.lang.String id, java.lang.String source)
static java.lang.String highlightTerms(java.lang.String terms, java.lang.String data)
static void main(java.lang.String[] args)
static java.lang.String modifyLinks(java.lang.String html, java.net.URL base)
static java.lang.String relativeLink(java.lang.String base, java.lang.String link)
static java.lang.String stripHtmlTags(java.lang.String data)
static java.lang.String stripTags(java.lang.String data)
    Removes all HTML tags from the supplied string.
static void testHtmlClass()
static java.lang.String verify(java.lang.String html)
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Html
public Html()
stripHtmlTags
public static java.lang.String stripHtmlTags(java.lang.String data)
stripTags
public static java.lang.String stripTags(java.lang.String data)
Removes all HTML tags from the supplied string.
Example:
Before stripping: Amazon sells other items too such as Movies and more.
After stripping: Amazon sells other items too such as movies and more.
The method should also handle text that has no HTML markup embedded in it.
FIXME: Does this method provably handle unbalanced tags correctly?
Parameters:
data - String containing HTML and text.
Returns:
String with just the text, all HTML tags stripped out.
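The behaviour described above can be illustrated with a minimal sketch. It assumes a simple regular-expression approach and is not taken from the Html class itself, whose implementation is not shown here. The class name StripTagsSketch and the constant TAG are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not the actual Html.stripTags implementation.
final class StripTagsSketch {

    // Matches '<', then any run of characters other than '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns the input with every substring matching TAG removed.
    // Text without markup is returned unchanged.
    static String stripTags(String data) {
        if (data == null || data.isEmpty()) {
            return data;
        }
        return TAG.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        String before = "such as <a href=\"#\">Movies</a> and more.";
        System.out.println(stripTags(before));
    }
}
```

A pattern like this leaves a lone '<' with no closing '>' untouched, but it also deletes any text that happens to sit between a literal '<' and a later '>' (for example in "1 < 2 and 3 > 2"). That is one reason the FIXME about unbalanced tags is worth answering before relying on this kind of approach.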
modifyLinks
public static java.lang.String modifyLinks(java.lang.String html,
java.net.URL base)
relativeLink
public static java.lang.String relativeLink(java.lang.String base,
java.lang.String link)
highlightTerms
public static java.lang.String highlightTerms(java.lang.String terms,
java.lang.String data)
getTagAttrib
public static java.lang.String getTagAttrib(java.lang.String id,
java.lang.String source)
verify
public static java.lang.String verify(java.lang.String html)
testHtmlClass
public static void testHtmlClass()
main
public static void main(java.lang.String[] args)
throws java.io.IOException