Class HTMLParser
- All Implemented Interfaces:
- LinkExtractorParser
- Direct Known Subclasses:
- JsoupBasedHtmlParser, LagartoBasedHtmlParser
HTMLParser subclasses can parse HTML content to obtain URLs.
Field Summary
Fields (Modifier and Type, Field):
- protected static final String ATT_ARCHIVE
- protected static final String ATT_BACKGROUND
- protected static final String ATT_CODE
- protected static final String ATT_CODEBASE
- protected static final String ATT_DATA
- protected static final String ATT_HREF
- protected static final String ATT_IS_IMAGE
- protected static final String ATT_REL
- protected static final String ATT_SRC
- protected static final String ATT_STYLE
- protected static final String ATT_TYPE
- static final String DEFAULT_PARSER
- protected static final String ICON
- protected static final String IE_UA
- protected static final Pattern IE_UA_PATTERN
- static final String PARSER_CLASSNAME
- protected static final String PRELOAD
- protected static final String SHORTCUT_ICON
- protected static final String STYLESHEET
- protected static final String TAG_APPLET
- protected static final String TAG_BASE
- protected static final String TAG_BGSOUND
- protected static final String TAG_BODY
- protected static final String TAG_EMBED
- protected static final String TAG_FRAME
- protected static final String TAG_IFRAME
- protected static final String TAG_IMAGE
- protected static final String TAG_INPUT
- protected static final String TAG_LINK
- protected static final String TAG_OBJECT
- protected static final String TAG_SCRIPT
- 
Constructor Summary
Constructors (Modifier, Constructor, Description):
- protected HTMLParser(): Protected constructor to prevent instantiation except from within subclasses.
- 
Method Summary
Methods (Modifier and Type, Method, Description):
- protected Float extractIEVersion(String userAgent)
- Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding): Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
- Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding): Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
- abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding): Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
- protected static boolean isEnableConditionalComments(Float ieVersion)
- protected static String normalizeUrlValue(CharSequence url): Normalizes URL as browsers do
Methods inherited from class org.apache.jmeter.protocol.http.parser.BaseParser: getParser, isReusable
- 
Field Details
- ATT_ARCHIVE
- ATT_BACKGROUND
- ATT_CODE
- ATT_CODEBASE
- ATT_DATA
- ATT_HREF
- ATT_REL
- ATT_SRC
- ATT_STYLE
- ATT_TYPE
- ATT_IS_IMAGE
- TAG_APPLET
- TAG_BASE
- TAG_BGSOUND
- TAG_BODY
- TAG_EMBED
- TAG_FRAME
- TAG_IFRAME
- TAG_IMAGE
- TAG_INPUT
- TAG_LINK
- TAG_OBJECT
- TAG_SCRIPT
- STYLESHEET
- SHORTCUT_ICON
- ICON
- PRELOAD
- IE_UA
- IE_UA_PATTERN
- PARSER_CLASSNAME
- DEFAULT_PARSER
Constructor Details
HTMLParser
protected HTMLParser()
Protected constructor to prevent instantiation except from within subclasses.
Method Details
getEmbeddedResourceURLs
public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
URLs should not appear twice in the returned iterator. Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
- Parameters:
- userAgent - User Agent
- html - HTML code
- baseUrl - Base URL from which the HTML code was obtained
- encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
- HTMLParseException - when parsing the html fails
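The contract above (resolve against the base URL, return each URL once, report malformed references as strings) can be illustrated with a minimal standalone sketch. It does not use the JMeter classes: `ResourceUrlSketch`, `Result`, and `resolveAll` are hypothetical names, and the list of references stands in for whatever a real parser would extract from the HTML.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not part of JMeter.
final class ResourceUrlSketch {

    /** Resolved URLs (each at most once) plus the raw references that could not be parsed. */
    static final class Result {
        final List<URL> urls = new ArrayList<>();
        final List<String> malformed = new ArrayList<>();
    }

    static Result resolveAll(URI base, List<String> refs) {
        Result result = new Result();
        // De-duplicate on the textual form; URL.equals/hashCode may trigger host lookups.
        Set<String> seen = new HashSet<>();
        for (String ref : refs) {
            try {
                URL url = base.resolve(ref).toURL();
                if (seen.add(url.toExternalForm())) {
                    result.urls.add(url);
                }
            } catch (IllegalArgumentException | MalformedURLException e) {
                // Report the offending reference as a plain string instead of failing the whole parse.
                result.malformed.add(ref);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/dir/page.html");
        Result r = resolveAll(base, Arrays.asList("img/logo.png", "/style.css", "img/logo.png", "bad url.png"));
        r.urls.forEach(u -> System.out.println("resource: " + u.toExternalForm()));
        r.malformed.forEach(s -> System.out.println("malformed: " + s));
    }
}
```

De-duplicating on the external form rather than on `URL` objects is a deliberate choice in this sketch, since comparing `URL` instances can involve name resolution.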
 
- 
getEmbeddedResourceURLs
public abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
All URLs should be added to the Collection. Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
- userAgent - User Agent
- html - HTML code
- baseUrl - Base URL from which the HTML code was obtained
- coll - URLCollection
- encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
- HTMLParseException - when parsing the html fails
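The N.B. above describes a collection that stores string wrappers while the iterator hands out URLs. A rough sketch of that arrangement follows. `UrlText` is a hypothetical stand-in for URLString (whose actual implementation is not shown in this document), and `urlIterator` is an assumed helper name.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Objects;

// Hypothetical sketch; not part of JMeter.
final class UrlTextSketch {

    /** Stand-in for URLString: equality and hashing use the text only. */
    static final class UrlText {
        final String text;

        UrlText(String text) {
            this.text = Objects.requireNonNull(text);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof UrlText && ((UrlText) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }
    }

    /** Views a collection of wrappers as an iterator of URLs, converting each element on demand. */
    static Iterator<URL> urlIterator(Collection<UrlText> coll) {
        Iterator<UrlText> it = coll.iterator();
        return new Iterator<URL>() {
            @Override
            public boolean hasNext() {
                return it.hasNext();
            }

            @Override
            public URL next() {
                String text = it.next().text;
                try {
                    return URI.create(text).toURL();
                } catch (MalformedURLException | IllegalArgumentException e) {
                    throw new IllegalStateException("Not a valid URL: " + text, e);
                }
            }
        };
    }

    public static void main(String[] args) {
        Collection<UrlText> coll = new LinkedHashSet<>();
        coll.add(new UrlText("http://example.com/a.png"));
        coll.add(new UrlText("http://example.com/a.png")); // same text, so the set keeps one entry
        coll.add(new UrlText("http://example.com/b.css"));
        urlIterator(coll).forEachRemaining(u -> System.out.println(u.toExternalForm()));
    }
}
```

Keeping text wrappers in the collection means set membership is decided by string comparison alone, which is one plausible reason for the split between stored type and returned type.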
 
- 
getEmbeddedResourceURLs
public Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding) throws HTMLParseException
Get the URLs for all the resources that a browser would automatically download following the download of the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
- userAgent - User Agent
- html - HTML code
- baseUrl - Base URL from which the HTML code was obtained
- coll - Collection; will contain URLString objects, not URLs
- encoding - Charset
- Returns:
- an Iterator for the resource URLs
- Throws:
- HTMLParseException - when parsing the html fails
 
- 
isEnableConditionalComments
protected static boolean isEnableConditionalComments(Float ieVersion)
- Parameters:
- ieVersion - IE version as a Float
- Returns:
- true if the IE version is lower than 10
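The documented rule (conditional comments apply only below IE 10) amounts to a single comparison. The sketch below is an illustration, not the JMeter source: the class and method names are hypothetical, and treating a null version as "not IE, so disabled" is an assumption.

```java
// Hypothetical sketch; not part of JMeter.
final class ConditionalCommentsSketch {

    /** Returns true only for a known IE version below 10; null is assumed to mean "not IE". */
    static boolean conditionalCommentsEnabled(Float ieVersion) {
        return ieVersion != null && ieVersion < 10.0f;
    }

    public static void main(String[] args) {
        System.out.println(conditionalCommentsEnabled(9.0f));
        System.out.println(conditionalCommentsEnabled(10.0f));
        System.out.println(conditionalCommentsEnabled(null));
    }
}
```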
 
- 
extractIEVersion
protected Float extractIEVersion(String userAgent)
- Parameters:
- userAgent - User Agent
- Returns:
- the version that follows MSIE in the user agent, or null if the user agent is not IE
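One way to picture the extraction is a regular expression that captures the number after the `MSIE` token. The pattern below is an assumption made for illustration; the actual value of IE_UA_PATTERN is not shown in this document, and `IeVersionSketch` and `ieVersion` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not part of JMeter.
final class IeVersionSketch {

    // Assumed pattern: "MSIE", a space, then a version such as 9.0 or 10.
    private static final Pattern MSIE = Pattern.compile("MSIE (\\d+(?:\\.\\d+)?)");

    /** Returns the version after "MSIE", or null when the user agent does not contain it. */
    static Float ieVersion(String userAgent) {
        if (userAgent == null) {
            return null;
        }
        Matcher m = MSIE.matcher(userAgent);
        return m.find() ? Float.valueOf(m.group(1)) : null;
    }

    public static void main(String[] args) {
        System.out.println(ieVersion("Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"));
        System.out.println(ieVersion("Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"));
    }
}
```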
 
- 
normalizeUrlValue
protected static String normalizeUrlValue(CharSequence url)
Normalizes URL as browsers do
- Parameters:
- url - CharSequence
- Returns:
- normalized url
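The exact rules applied by normalizeUrlValue are not spelled out here. As an illustration of what "as browsers do" can mean, the sketch below follows two steps from the WHATWG URL Standard: strip leading and trailing C0 control characters and spaces, then remove ASCII tabs and newlines anywhere in the value. Whether JMeter applies exactly these steps is an assumption, and `UrlNormalizeSketch` and `normalize` are hypothetical names.

```java
// Hypothetical sketch; not part of JMeter.
final class UrlNormalizeSketch {

    /** Trims C0 controls and spaces at both ends, then drops tab, LF and CR inside the value. */
    static String normalize(CharSequence url) {
        if (url == null) {
            return null;
        }
        String s = url.toString();
        int start = 0;
        int end = s.length();
        // Characters up to and including U+0020 are C0 controls or space.
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (end > start && s.charAt(end - 1) <= ' ') {
            end--;
        }
        StringBuilder out = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            char c = s.charAt(i);
            if (c != '\t' && c != '\n' && c != '\r') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("  http://example.com/a\n/b.png \t"));
    }
}
```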