Rendering JavaScript to obtain static HTML in Python
I have a large number of HTML files that I want to process with BeautifulSoup to generate some statistics. However, I came across a problem: some of the HTML files contain scripts that may generate more HTML, and that generated HTML is not being processed. Therefore, I need to render the JavaScript into static HTML before proceeding.
I have seen options such as Selenium, but it doesn't seem to fit, since I don't want to launch a browser (it should be done in the background).
Can you please suggest an appropriate approach for this?
Thanks in advance!
Since you need a JavaScript engine, using a headless browser is the way to go. Using Selenium WebDriver with the PhantomJS headless browser is the best option:
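    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.PhantomJS()  # headless, so no browser window is opened
    driver.get("...")
    bs = BeautifulSoup(driver.page_source, "html.parser")  # parse the rendered HTML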
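For the batch of local files described in the question, a minimal sketch might look like the following. It is only an illustration under some assumptions: it uses Selenium 4 with headless Chrome instead of PhantomJS (newer Selenium releases no longer ship the PhantomJS driver), and `make_headless_driver` and `render_files` are hypothetical helper names, not part of any library. It also reads `page_source` right after loading, so scripts that finish their work later may need an explicit wait.

```python
from pathlib import Path


def make_headless_driver():
    """Start headless Chrome (assumes Selenium 4 and Chrome are installed)."""
    # Imported lazily so render_files can be used without Selenium present.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    return webdriver.Chrome(options=options)


def render_files(driver, paths):
    """Return {path: rendered HTML} for local files, reusing one driver."""
    rendered = {}
    for path in paths:
        # A browser needs a file:// URL, which requires an absolute path.
        driver.get(Path(path).resolve().as_uri())
        rendered[str(path)] = driver.page_source
    return rendered
```

Reusing a single driver for all files avoids paying the browser start-up cost once per file. Each returned string can then be passed to BeautifulSoup for the statistics, and `driver.quit()` should be called once the batch is done.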