javascript - Removing HTML tags and content where tag content matches an array of values using Xml.parse()
I've extracted HTML from GmailApp using .getBody() and would like to return HTML that filters out a specific tag and its contents where the contents match any value in an array (specifically, links with certain text). Looking at this solution, I figure the easiest way would be to use Xml.parse() and filter the resulting object, but I can't get beyond creating the XmlDocument.
For example, if:
var html = '<div>some text <div><a href="http://example1.com">foo</a></div> , <span>some <a href="http://example2.com">baa</a>,and <a href="http://example3.com">close</a></span></div>';
and
var linkstoremove = ['baa','foo'];
how can I return
var newhtml = '<div>some text <div></div> , <span>some ,and <a href="http://example3.com">close</a></span></div>';
using
var obj = Xml.parse(html, true);
I can get the object, but the process falls apart there. (I did consider using .replace(), but given the issues with matching HTML using regex I thought it best to avoid it.)
Following a suggestion, I opted to try using regex:
var html = '<div>some text <div><a href="http://example1.com">foo</a></div> , <span>some <a href="http://example2.com">baa</a>,and <a href="http://example3.com">close</a></span></div>';
var linkstoremove = ['baa', 'foo'];
var newhtml = cleanbody(html, linkstoremove);

/**
 * Removes links from HTML text.
 * @param {string} html The HTML to be cleaned.
 * @param {Array} exclude The array of link text to remove.
 * @returns {string} The cleaned HTML.
 */
function cleanbody(html, exclude) {
  // Remove line breaks and tabs so each link sits on a single line.
  html = html.replace(/\r?\n|\r|\t/g, '');
  // Match any <a ...>text</a> whose text is one of the excluded values.
  var re = '<a\\b[^>]*>(' + exclude.join('|') + ')<\\/a>';
  return html.replace(new RegExp(re, 'ig'), "");
}
Test at http://jsfiddle.net/hdspu/
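One caveat with building the pattern from exclude.join('|') is that link text containing regex metacharacters (such as "+", "(" or ".") would be interpreted as part of the pattern. A minimal sketch of a more defensive variant follows. The names escapeRegExp and cleanBodyEscaped are hypothetical helpers, not part of the original code, and the sketch assumes the link text contains no nested tags:

```javascript
// Hypothetical helper: escapes characters that have special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical variant of cleanbody that escapes each link text before
// building the pattern. Assumes link text is plain (no nested tags).
function cleanBodyEscaped(html, exclude) {
  if (exclude.length === 0) {
    return html; // nothing to remove; avoids an empty alternation
  }
  var alternatives = exclude.map(escapeRegExp).join('|');
  // Allow optional whitespace around the link text inside the <a> element.
  var re = new RegExp('<a\\b[^>]*>\\s*(?:' + alternatives + ')\\s*<\\/a>', 'gi');
  return html.replace(re, '');
}

var sample = '<p><a href="http://example1.com">C++ (intro)</a> and ' +
             '<a href="http://example2.com">other</a></p>';
var cleaned = cleanBodyEscaped(sample, ['C++ (intro)']);
```

Without the escaping step, the text "C++ (intro)" would be read as a quantified pattern with a capture group and would not match the literal link text.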