Extracting HTML from a WebView
Here’s another Android WebView tutorial for those of you who are looking for a way to get the source code of a page loaded in a WebView instance.
This example is a bit more complicated than previous ones, so let me explain it step by step:
- First, a class called MyJavaScriptInterface is defined. It has a single public method, showHTML(), which displays a dialog containing the HTML it receives as a parameter.
- Then, an instance of this class is registered as a JavaScript interface called HTMLOUT. The showHTML() method can now be accessed from JavaScript like this: window.HTMLOUT.showHTML('…')
- In order to call showHTML() when the page finishes loading, a WebViewClient instance which overrides onPageFinished() is added to the WebView. When the page finishes loading, this method injects a piece of JavaScript code into the page by passing a javascript: URL to loadUrl(), using the method I described in an earlier post.
- Finally, a web page is loaded.
    final Context myApp = this;

    /* An instance of this class will be registered as a JavaScript interface */
    class MyJavaScriptInterface
    {
        /* Apps targeting API level 17 or higher must annotate every method exposed to JavaScript */
        @JavascriptInterface
        @SuppressWarnings("unused")
        public void showHTML(String html)
        {
            new AlertDialog.Builder(myApp)
                .setTitle("HTML")
                .setMessage(html)
                .setPositiveButton(android.R.string.ok, null)
                .setCancelable(false)
                .create()
                .show();
        }
    }

    final WebView browser = (WebView)findViewById(R.id.browser);

    /* JavaScript must be enabled if you want it to work, obviously */
    browser.getSettings().setJavaScriptEnabled(true);

    /* Register a new JavaScript interface called HTMLOUT */
    browser.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");

    /* WebViewClient must be set BEFORE calling loadUrl! */
    browser.setWebViewClient(new WebViewClient() {
        @Override
        public void onPageFinished(WebView view, String url)
        {
            /* This call injects JavaScript into the page which just finished loading.
               innerHTML of the html element holds head and body, so it is wrapped in html tags. */
            browser.loadUrl("javascript:window.HTMLOUT.showHTML('<html>'+document.getElementsByTagName('html')[0].innerHTML+'</html>');");
        }
    });

    /* load a web page */
    browser.loadUrl("https://lexandera.com/files/jsexamples/gethtml.html");
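The javascript: URL passed to loadUrl() in onPageFinished() is just a string, so it can be assembled by a small helper. What follows is a minimal sketch in plain Java, not part of the original example: the class InjectionUrl and the method buildExtractionUrl() are hypothetical names, and the sketch assumes the standard DOM property document.documentElement.outerHTML in place of the innerHTML expression used in the listing.

```java
import java.util.regex.Pattern;

class InjectionUrl {
    // Conservative pattern for a plain ASCII JavaScript identifier.
    private static final Pattern IDENTIFIER =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*");

    /**
     * Builds a javascript: URL that hands the page's markup to a method on a
     * registered JavaScript interface, e.g. window.HTMLOUT.showHTML(...).
     * Hypothetical helper, shown for illustration only.
     */
    static String buildExtractionUrl(String interfaceName, String methodName) {
        requireIdentifier(interfaceName);
        requireIdentifier(methodName);
        // outerHTML serializes the <html> element including its own tags.
        return "javascript:window." + interfaceName + "." + methodName
                + "(document.documentElement.outerHTML);";
    }

    // Rejects names that could smuggle extra JavaScript into the URL.
    private static void requireIdentifier(String name) {
        if (name == null || !IDENTIFIER.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain identifier: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildExtractionUrl("HTMLOUT", "showHTML"));
    }
}
```

With the names from this post, the helper produces javascript:window.HTMLOUT.showHTML(document.documentElement.outerHTML); which could be passed to browser.loadUrl() in place of the hand-written string.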
WARNING
Unfortunately, this approach suffers from a major security hole: the HTMLOUT interface is exposed to every page loaded into the WebView, so if your JavaScript can call showHTML(), then so can JavaScript from any other page that gets loaded. Only use it in a WebView that loads pages you trust.
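One way to narrow this exposure is to keep the WebView on pages you control. The sketch below is plain Java and makes several assumptions: the class TrustedPages, the method isTrusted() and the allowlist contents are hypothetical, and the only trusted host is the one from this post's example URL. It reduces the risk rather than removing it, since it says nothing about frames or scripts embedded in an otherwise trusted page.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class TrustedPages {
    // Hypothetical allowlist: only the host used in this post's example.
    private static final Set<String> TRUSTED_HOSTS =
            new HashSet<>(Arrays.asList("lexandera.com"));

    /** Returns true only for https URLs whose host is on the allowlist. */
    static boolean isTrusted(String url) {
        if (url == null) {
            return false;
        }
        try {
            URI uri = new URI(url);
            String host = uri.getHost();
            // Exact host match, so look-alike hosts with extra labels fail.
            return "https".equalsIgnoreCase(uri.getScheme())
                    && host != null
                    && TRUSTED_HOSTS.contains(host.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Anything that does not parse as a URI is treated as untrusted.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isTrusted("https://lexandera.com/files/jsexamples/gethtml.html"));
        System.out.println(isTrusted("https://lexandera.com.evil.example/"));
    }
}
```

In the setup from this post, a check like this could be called at the top of onPageFinished() to skip the injection for untrusted URLs, and from the WebViewClient's shouldOverrideUrlLoading() to refuse navigation away from trusted hosts.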