How to delete a particular sentence from a file using line number in Python


I want to delete particular lines that contain stop words or a matching string:

    import nltk
    from nltk import *
    from nltk.tokenize import word_tokenize
    import time

    mywords = 'hello', 'there', 'been'  # stop words to match in sentences
    f = open('hello.txt', 'r')
    raw = f.read()
    sent = word_tokenize(raw)  # tokenize into words
    from nltk.tokenize import wordpunct_tokenize
    punct = wordpunct_tokenize(raw)
    sent = sent_tokenize(raw)  # tokenize into sentences
    length = len(sent)

    print(length)
    i = 0
    while i < length:
        i = i + 1
        time.sleep(2)
        #print(sent[i])
        if i < length:
            #print(sent[i])
            thisword = word_tokenize(sent[i])
            for word in thisword:
                if word in mywords:
                    #print(thisword, word)
                    print("yes: ", sent[i])
                else:
                    print("no:", sent[i])
        else:
            print("end of line")

You cannot delete lines from a file in place, but you can write the lines that do not contain any of the stop words to a new file.
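For the "line number" case in the title, the same idea can be sketched with the standard library alone: copy every line except the unwanted one to a temporary file, then swap it in for the original. The name `delete_line` and the 1-based numbering are assumptions made for illustration, not part of the original question.

```python
import os
import tempfile


def delete_line(path, line_number):
    """Rewrite `path` without line `line_number` (1-based).

    Returns True if a line was dropped, False if the file is shorter than that.
    """
    removed = False
    directory = os.path.dirname(os.path.abspath(path))
    # Write the kept lines to a temporary file in the same directory,
    # so that os.replace() stays on one filesystem.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as dst:
        for number, line in enumerate(src, start=1):
            if number == line_number:
                removed = True
                continue  # skip the line to be deleted
            dst.write(line)
    # Both files are closed here; replace the original with the filtered copy.
    os.replace(dst.name, path)
    return removed
```

Calling `delete_line('hello.txt', 3)` would rewrite the file without its third line. Nothing is edited in place: the file is still rewritten in full.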

The following script first takes a list of stop words and converts them to a set(). It then reads the input file a line at a time. For each line, it uses nltk's word_tokenize() to create a list of words. It converts that list of words to a set and gets its intersection with the stop words. If the intersection is not empty, there must have been stop words present, and the script shows which stop words were found.

If none are found, it writes the line to the output.txt file, so that file ends up holding only the remaining lines:

    from nltk.tokenize import word_tokenize
    from nltk.corpus import stopwords

    # For the full list of stop words from nltk
    #stop_words = set(stopwords.words('english'))

    stop_words = set(['hello', 'there', 'been'])

    with open('input.txt', 'r') as f_input, open('output.txt', 'w') as f_output:
        for line in f_input:
            line_words = set(word_tokenize(line))
            stop_words_present = line_words & stop_words

            if stop_words_present:
                # Contains at least one stop word, so the line is skipped
                print("Yes: '{}' contains {}".format(line.strip(), stop_words_present))
            else:
                # Contains no stop words, so the line is kept
                print("No:", line.strip())
                f_output.write(line)
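Note that the set intersection above is case-sensitive, so a line beginning with "Hello" would be kept. A minimal sketch of a case-insensitive variant follows. It assumes a simple regular expression is an acceptable stand-in for word_tokenize() (so it runs without NLTK), and the helper names `stop_words_in` and `filter_lines` are hypothetical.

```python
import re

STOP_WORDS = {"hello", "there", "been"}


def stop_words_in(line, stop_words=STOP_WORDS):
    """Return the stop words found in `line`, ignoring case."""
    # \w+ is a rough stand-in for word_tokenize(): it pulls out runs of
    # letters, digits and underscores and drops punctuation.
    words = {word.lower() for word in re.findall(r"\w+", line)}
    return words & stop_words


def filter_lines(lines, stop_words=STOP_WORDS):
    """Yield only the lines that contain none of the stop words."""
    for line in lines:
        if not stop_words_in(line, stop_words):
            yield line
```

Passing the open input file to `filter_lines()` and handing the result to `f_output.writelines()` should give the same filtering as the script above, minus the progress messages.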

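Note that nltk has a complete list of English stop words that you can make use of. To use it, switch to the commented-out stop_words line above. You may need to install the stopwords corpus first if nltk cannot find it. To do so, run the following mini-script: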

    import nltk

    nltk.download()

This will display a download utility that lets you fetch the stopwords corpus, as follows:

(Screenshot: nltk download helper)

Select Corpora, scroll down to stopwords, and click the Download button.
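Where a graphical window is not convenient (on a server, for example), the corpus can also be fetched by name with nltk.download('stopwords'). A minimal sketch, assuming nltk is installed and network access is available on first use; `ensure_stopwords` is a hypothetical helper name.

```python
def ensure_stopwords(language="english"):
    """Return NLTK's stop word set, downloading the corpus if it is missing."""
    # Imported inside the function so that merely defining it does not require nltk.
    import nltk
    from nltk.corpus import stopwords

    try:
        return set(stopwords.words(language))
    except LookupError:
        # The corpus is not installed yet: fetch it by name, without the GUI.
        nltk.download("stopwords")
        return set(stopwords.words(language))
```

The returned set could then be used in place of the hand-written stop_words set in the script above.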

