xml - Reading text in Python gives a weird error [Solved]
I'm opening a file that looks like this: http://pastebin.com/uch5ayha
and trying to read it using simple Python:
    f1 = open("goldstandard-answer-utf-8.txt", "r")
    print(f1.readline())   # print only the first line
    for line in f1:        # then print the remaining lines one by one
        print(line)
    f1.close()

Neither print call prints the entire document. The readline() call and the for loop, each run on its own, print only:

    </file>

This is weird. There are tags in the document, and my attempts at parsing it with either lxml's etree or Beautiful Soup gave similar results. Is there a way to force Python to print the lines and disregard the tags, if that makes sense?
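For the "print the text, ignore the tags" part of the question, one tolerant option in the standard library is html.parser.HTMLParser, which does not require well-formed XML. This is a swapped-in technique, not what the thread used (lxml and Beautiful Soup were mentioned), and it is only a sketch assuming the tags merely need to be dropped, not interpreted. TextOnly and strip_tags are hypothetical names, and the sample markup is invented because the real file is only on pastebin.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Hypothetical helper: keep the character data, drop every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # also turns &amp; etc. into text
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags; the tags themselves are ignored
        self.chunks.append(data)


def strip_tags(markup):
    """Return markup with all tags removed (hypothetical helper)."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end
    return "".join(parser.chunks)


if __name__ == "__main__":
    # Invented sample; the real file layout may differ.
    sample = "<file>\n2028.htm.txt\n<name>mäkitalo, Östen</name>\n</file>\n"
    print(strip_tags(sample))
```

Note that this only helps once the text has been decoded and read correctly; it does not change what readline() returns.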
Edit: as suggested in the comments, here is the expected output, which is the same as the pastebin entry: 2028.htm.txt mäkitalo, Östen mäkitalo, Östen mäkitalo, jessica lindbäck, Östen mäkitalo, Östen mäkitalo, robert brännström etc.
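When readline() returns something unexpected, it can help to look at the raw bytes before decoding or parsing anything. The thread does not say what the bytes looked like, so this is only a diagnostic sketch: describe_raw_file is a hypothetical helper that counts CRLF, bare CR and bare LF line terminators and checks for a UTF-8 byte-order mark. Unusual line endings or a BOM are possibilities to rule in or out, not a confirmed diagnosis.

```python
import os
import tempfile


def describe_raw_file(path):
    """Summarise the raw bytes of a text file (hypothetical helper)."""
    with open(path, "rb") as fh:  # binary mode: no decoding, no newline translation
        data = fh.read()
    crlf = data.count(b"\r\n")
    return {
        "size": len(data),
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),
        "crlf": crlf,
        "bare_cr": data.count(b"\r") - crlf,  # CR not followed by LF
        "bare_lf": data.count(b"\n") - crlf,  # LF not preceded by CR
        "head": data[:60],  # its repr shows hidden characters as escapes
    }


if __name__ == "__main__":
    # Demo on a throwaway file that uses CR-only line endings.
    sample = "<file>\rmäkitalo, Östen\r</file>\r".encode("utf-8")
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(sample)
    try:
        print(describe_raw_file(tmp.name))
    finally:
        os.remove(tmp.name)
```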
If the file is encoded in UTF-8, as its name suggests, try opening it like this:
    import codecs
    # Decode the file's bytes as UTF-8 while reading
    f = codecs.open('goldstandard-answer-utf-8.txt', 'r', encoding='utf-8')
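On Python 3 the built-in open() (also reachable as io.open, which exists on Python 2.6+ too) accepts the encoding argument directly, so the same idea can be written without codecs. A minimal sketch, assuming the file really is UTF-8; read_lines_utf8 is a hypothetical helper name. In text mode with the default newline=None, Python treats LF, CRLF and a bare CR all as line endings and returns each line ending in a single "\n".

```python
import io
import os
import tempfile


def read_lines_utf8(path):
    """Return the lines of a UTF-8 text file without their line endings.

    Hypothetical helper. io.open is the built-in open() on Python 3; in text
    mode with newline=None (the default), LF, CRLF and bare CR all end a line.
    """
    with io.open(path, "r", encoding="utf-8", newline=None) as fh:
        return [line.rstrip("\n") for line in fh]


if __name__ == "__main__":
    # Throwaway sample mixing all three line-ending styles.
    raw = "2028.htm.txt\rmäkitalo, Östen\r\njessica lindbäck\n".encode("utf-8")
    with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
        tmp.write(raw)
    try:
        for line in read_lines_utf8(tmp.name):
            print(line)
    finally:
        os.remove(tmp.name)
```

If the file starts with a byte-order mark, passing encoding="utf-8-sig" instead drops it from the first line.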