How to delete a particular sentence from a file using line number in Python
I want to delete particular lines that contain stop words or a matching string:
    import nltk
    from nltk import *
    from nltk.tokenize import word_tokenize
    import time

    mywords = 'hello', 'there', 'been'  # stop words to match in sentences
    f = open('hello.txt', 'r')
    raw = f.read()
    sent = word_tokenize(raw)  # tokenize words
    from nltk.tokenize import wordpunct_tokenize
    punct = wordpunct_tokenize(raw)
    sent = sent_tokenize(raw)  # split the text into sentences
    length = len(sent)
    print(length)
    i = 0
    while(i<length):
        i = i + 1
        time.sleep(2)
        #print(sent[i])
        if i <length:
            #print(sent[i])
            thisword = (word_tokenize(sent[i]))
            for word in thisword:
                if word in mywords:
                    #print(thisword, word)
                    print("yes: ", sent[i])
                else:
                    print("no:", sent[i])
        else:
            print("end of line")
You cannot delete lines from a file directly, but you can write the lines that do not contain any of the stop words to a new file.
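For the line-number case in the title, the same "rewrite without the unwanted lines" idea can be sketched as follows. This is a minimal sketch using only the standard library: `delete_lines` is a hypothetical helper name, line numbers are assumed to be 1-based, and the file is assumed to be small enough to read into memory.

```python
import os
import tempfile


def delete_lines(path, line_numbers):
    """Rewrite `path` without the given 1-based line numbers (hypothetical helper)."""
    unwanted = set(line_numbers)
    with open(path, "r") as f_input:
        lines = f_input.readlines()
    # Keep every line whose number was not requested for deletion.
    kept = [line for number, line in enumerate(lines, start=1)
            if number not in unwanted]
    with open(path, "w") as f_output:
        f_output.writelines(kept)
    return len(lines) - len(kept)


# Demo on a throwaway file so nothing real is overwritten.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "hello.txt")
    with open(demo_path, "w") as f:
        f.write("first\nsecond\nthird\nfourth\n")
    removed = delete_lines(demo_path, [2, 4])
    with open(demo_path) as f:
        remaining = f.read()

print(removed, repr(remaining))
```

Writing to a separate output file, as in the rest of this answer, is the safer choice if you want to keep the original untouched.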
The following script first takes a list of stop words and converts them to a set(). It then reads the input file a line at a time. For each line it uses nltk.word_tokenize() to create a list of words. It converts that list of words into a set and gets the intersection with the stop words. If the result is not empty, there must have been stop words present, and it shows which stop words were found.
If none are found, it writes the line to an output.txt file:
    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords

    # Full list of stop words from nltk
    #stop_words = set(stopwords.words('english'))
    stop_words = set(['hello', 'there', 'been'])

    with open('input.txt', 'r') as f_input, open('output.txt', 'w') as f_output:
        for line in f_input:
            line_words = set(word_tokenize(line))
            stop_words_present = line_words & stop_words

            if stop_words_present:
                # Contains at least 1 stop word, so the line is not written out
                print("yes: '{}' contains {}".format(line.strip(), stop_words_present))
            else:
                # Contains no stop words, so keep the line
                print("no:", line.strip())
                f_output.write(line)

Note that nltk has a complete list of English stop words which you can make use of; just switch to the commented-out stop_words line above. You may need to install the list first if nltk cannot find it. To do so, run the following mini-script:
    import nltk
    nltk.download()

This will display a download utility that lets you fetch the stopwords corpus as follows:
Select Corpora, scroll down to stopwords, and click the Download button.
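One thing to watch with the set intersection used in this answer: it is case-sensitive, so a line starting with "Hello" would not match the stop word 'hello'. Below is a minimal sketch of a case-insensitive variant that also records the line numbers of the matching lines. It assumes a simple regular-expression tokenizer as a rough stand-in for nltk.word_tokenize() so that it runs without NLTK installed; `tokenize` and `lines_with_stop_words` are hypothetical names.

```python
import re

stop_words = {"hello", "there", "been"}


def tokenize(line):
    """Rough stand-in for word_tokenize: lowercased runs of word characters."""
    return set(re.findall(r"\w+", line.lower()))


def lines_with_stop_words(lines, stop_words):
    """Return {1-based line number: stop words found} for matching lines."""
    found = {}
    for number, line in enumerate(lines, start=1):
        present = tokenize(line) & stop_words
        if present:
            found[number] = present
    return found


sample = [
    "Hello world.\n",
    "Nothing to see.\n",
    "I have been there before.\n",
]
matches = lines_with_stop_words(sample, stop_words)
# Keep only the lines whose number was not flagged.
kept = [line for number, line in enumerate(sample, start=1)
        if number not in matches]
print(sorted(matches))
print(kept)
```

With real text you would probably keep word_tokenize() and only add the lowercasing step, since the regular expression here handles punctuation and contractions far more crudely.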
