xml - Reading text in Python gives a weird error [Solved]


I'm opening a file that looks like this: http://pastebin.com/uch5ayha

and trying to read it using simple Python:

f1 = open("goldstandard-answer-utf-8.txt", "r")
print f1.readline()
for line in f1:
    print line
f1.close()

Neither print statement prints the entire document. Both the readline call and the loop, run separately, print only:

</file> 

This is weird. The document has tags in it, but both attempts at parsing it, with either lxml etree or Beautiful Soup, gave similar results. Is there a way to force Python to print the lines, disregarding the tags, if that makes sense?

Edit (included as suggested in the comments): the expected output is the same as the pastebin entry: 2028.htm.txt mäkitalo, Östen mäkitalo, Östen mäkitalo, jessica lindbäck, Östen mäkitalo, Östen mäkitalo, robert brännström etc...

If the file is encoded in UTF-8, as its name suggests, try opening it as such:

import codecs
f = codecs.open('goldstandard-answer-utf-8.txt', 'r', encoding='utf-8')
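Building on that suggestion, here is a minimal sketch of reading every line with explicit UTF-8 decoding. It assumes the file really is UTF-8, and it uses io.open rather than codecs.open (both accept an encoding argument; io.open is simply the one that behaves the same on Python 2.6+ and Python 3). The names read_text_lines, strip_tags and TAG_RE are made up for illustration, and the tag pattern is a naive regular expression, not a real XML parser. Whether this cures the symptom in the question depends on what the file actually contains.

```python
import io
import re

# Hypothetical helper; only the file name in the usage note comes from the question.
# Naive pattern for anything that looks like a tag (an assumption, not an XML parser).
TAG_RE = re.compile(r"<[^>]+>")


def read_text_lines(path, encoding="utf-8", strip_tags=False):
    """Return the decoded lines of `path` without their trailing newline."""
    # In text mode io.open decodes the bytes and normalises line endings to "\n".
    with io.open(path, "r", encoding=encoding) as f:
        lines = [line.rstrip("\n") for line in f]
    if strip_tags:
        # Remove tag-like substrings but keep the text around them.
        lines = [TAG_RE.sub("", line) for line in lines]
    return lines
```

To print the whole document, loop over read_text_lines("goldstandard-answer-utf-8.txt") and print each item; pass strip_tags=True to drop the tag-like parts of each line.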
