How to Build an Email Sentiment Analysis Bot: An NLP Tutorial

Published: 2022-03-11

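Natural language processing technologies have become quite sophisticated over the past few years. From tech giants to hobbyists, many are rushing to build rich interfaces that can analyze, understand, and respond to natural language. Amazon's Alexa, Microsoft's Cortana, Google's Google Home, and Apple's Siri all aim to change the way we interact with computers.

Sentiment analysis, a subfield of natural language processing, consists of techniques that determine the tone of a text or speech. Today, with machine learning and large amounts of data harvested from social media and review sites, we can train models to identify the sentiment of a natural language passage with fair accuracy.

In this tutorial, you will learn how to build a bot that can analyze the sentiment of the emails it receives and notify you about emails that may require your immediate attention.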

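Sentiment Analysis of Emails

The bot will be built using a mix of Java and Python. The two processes will communicate with each other using Thrift. If you are not familiar with one or both of these languages, you can still read on, as the fundamental concepts in this article apply to other languages as well.

To determine whether an email needs your attention, the bot will parse it and check whether it has a strong negative tone. It will then send out a text alert if needed.

We will use Sendgrid to connect to our mailbox and Twilio to send out text alerts.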

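Sentiment Analysis: A Deceptively Simple Problem

There are words that we associate with positive emotions, such as love, joy, and pleasure. And there are words that we associate with negative emotions, such as hate, sadness, and pain. Why not train a model to recognize these words and count the relative frequency and strength of each positive and negative word?

Well, there are a couple of problems with that.

First, there is the problem of negation. For example, a sentence like "The peach is not bad" implies a positive emotion using a word that we most often associate with being negative. A simple bag-of-words model will not be able to recognize the negation in this sentence.

Furthermore, mixed sentiments prove to be another problem for naive sentiment analysis. For example, a sentence like "The peach is not bad, but the apple is truly terrible" contains mixed sentiments of mixed intensities that interact with each other. A simple approach cannot resolve the combined sentiments, the different intensities, or the interactions between them.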
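To make both failure modes concrete, here is a minimal sketch of the naive word-counting approach. The lexicon and its weights are made up for illustration and are not part of any library.

```python
# Toy sentiment lexicon; the words and weights are illustrative assumptions.
LEXICON = {
    "love": 2, "joy": 2, "pleasure": 1, "good": 1,
    "hate": -2, "sadness": -1, "pain": -1, "bad": -1, "terrible": -2,
}

def bag_of_words_score(sentence):
    """Sum the lexicon weights of the words, ignoring order and context."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(LEXICON.get(word, 0) for word in words)

# Negation: "not bad" is mildly positive, but the model only sees "bad".
negated = bag_of_words_score("The peach is not bad")

# Mixed sentiment: two opinions about two subjects collapse into one number.
mixed = bag_of_words_score("The peach is not bad, but the apple is truly terrible")

print(negated, mixed)  # prints: -1 -3
```

Both sentences come out negative: the first because "not" is invisible to the model, the second because the score cannot say that the peach and the apple are judged differently.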

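Sentiment Analysis Using a Recursive Neural Tensor Network

The Stanford Natural Language Processing library for sentiment analysis resolves these problems using a Recursive Neural Tensor Network (RNTN).

The RNTN algorithm first splits a sentence into individual words. It then constructs a neural network whose nodes are the individual words. Finally, a tensor layer is added so that the model can properly adjust for interactions between words and phrases.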
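As a rough illustration of what the tensor layer contributes, the sketch below combines two child vectors into a parent vector using a bilinear (tensor) term plus an ordinary linear term, which is how the RNTN composition step is usually described. The dimensions, the random untrained weights, and all names here are assumptions for illustration; this is not the Stanford implementation, which also learns the weights and classifies the sentiment at every node of the tree.

```python
import math
import random

def compose(left, right, V, W):
    """Merge two child vectors (each of length d) into one parent vector."""
    x = left + right  # list concatenation: the stacked vector [left; right]
    n = len(x)
    parent = []
    for V_k, W_k in zip(V, W):
        # Tensor term x^T V_k x: lets every pair of inputs interact directly.
        bilinear = sum(x[i] * V_k[i][j] * x[j] for i in range(n) for j in range(n))
        # Plain recursive-network term W_k . x
        linear = sum(w * x_i for w, x_i in zip(W_k, x))
        parent.append(math.tanh(bilinear + linear))
    return parent

d = 3                   # toy embedding size (assumption)
rng = random.Random(0)  # fixed seed so the run is repeatable

def random_vector(n):
    return [rng.uniform(-0.1, 0.1) for _ in range(n)]

V = [[random_vector(2 * d) for _ in range(2 * d)] for _ in range(d)]  # d slices, each 2d x 2d
W = [random_vector(2 * d) for _ in range(d)]                          # d x 2d matrix

not_vec, bad_vec = random_vector(d), random_vector(d)
not_bad = compose(not_vec, bad_vec, V, W)  # vector standing for the phrase "not bad"
print(len(not_bad))
```

Because the parent has the same length as each child, the same function can be applied again further up the parse tree, so a phrase like "not bad" can itself be combined with the rest of the sentence.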

You can find a visual demonstration of the algorithm on the official website.

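The Stanford NLP group trained the Recursive Neural Tensor Network on manually tagged movie review sentences (the Stanford Sentiment Treebank, which is drawn from Rotten Tomatoes reviews) and found that their model predicts sentiment with very good accuracy.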

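A Bot That Receives Emails

The first thing you want to do is set up the email integration so that data can be piped to your bot.

There are many ways to accomplish this, but for the sake of simplicity, let's set up a simple web server and use Sendgrid's inbound parse hook to pipe emails to the server. We can forward emails to Sendgrid's inbound parse address. Sendgrid will then send a POST request to our web server, and we can process the data on our server.

To build the server, we will use Flask, a simple web framework for Python.

In addition to building the web server, we will want to connect the web service to a domain. For brevity, we will skip that part in this article. However, you can read more about it here.

Building a web server in Flask is remarkably simple.

Simply create an app.py and add this to the file: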

    from flask import Flask, request
    import datetime

    app = Flask(__name__)

    @app.route('/analyze', methods=['POST'])
    def analyze():
        # Log every hit so we can confirm that requests reach the server.
        with open('logfile.txt', 'a') as fp_log:
            fp_log.write('endpoint hit %s \n' % datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'))
        return "Got it"

    # Listen on all interfaces so the server is reachable from outside.
    app.run(host='0.0.0.0')

If we deploy this app behind a domain name and hit the "/analyze" endpoint, we should see something like this:

    >>> requests.post('http://sentiments.shanglunwang.com:5000/analyze').text
    'Got it'

Next, we want to send emails to this endpoint.

You can find more documentation here, but essentially you want to set up Sendgrid as your email handler and have Sendgrid forward emails to your web server.

Here is my setup on Sendgrid. It forwards emails sent to @sentibot.shanglunwang.com as POST requests to "http://sentiments.shanglunwang.com/analyze".

You can use any other service that supports sending inbound emails over webhooks.

After setting everything up, try sending an email to your Sendgrid address. You should see something like this in the logs:

    endpoint hit 2017-05-25 14:35:46

Great! We now have a bot that is able to receive emails. That is half of what we are trying to do.

Now, we want to give this bot the ability to analyze the sentiment of emails.

Email Sentiment Analysis with Stanford NLP

Since the Stanford NLP library is written in Java, we will want to build the analysis engine in Java.

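Let's start by downloading the Stanford NLP library and models from Maven. Create a new Java project, add the following to your Maven dependencies, and import. Note that the pretrained models are published as a separate artifact with the same name and version and the models classifier, so add that dependency as well if your build does not already pull it in.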

    <dependency>
        <groupId>edu.stanford.nlp</groupId>
        <artifactId>stanford-corenlp</artifactId>
        <version>3.6.0</version>
    </dependency>

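Stanford NLP's sentiment analysis engine can be accessed by specifying the sentiment annotator in the pipeline initialization code. The annotation can then be retrieved as a tree structure.

For the purposes of this tutorial, we just want to know the general sentiment of a sentence, so we won't need to parse through the tree. We only need to look at the base node.

This makes the main code relatively simple: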

    package seanwang;

    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.CoreMap;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;

    import java.util.*;

    public class App {
        public static void main(String[] args) {
            Properties pipelineProps = new Properties();
            Properties tokenizerProps = new Properties();
            pipelineProps.setProperty("annotators", "parse, sentiment");
            pipelineProps.setProperty("parse.binaryTrees", "true");
            pipelineProps.setProperty("enforceRequirements", "false");
            tokenizerProps.setProperty("annotators", "tokenize ssplit");
            StanfordCoreNLP tokenizer = new StanfordCoreNLP(tokenizerProps);
            StanfordCoreNLP pipeline = new StanfordCoreNLP(pipelineProps);

            String line = "Amazingly grateful beautiful friends are fulfilling an incredibly joyful accomplishment. What an truly terrible idea.";

            Annotation annotation = tokenizer.process(line);
            pipeline.annotate(annotation);

            // normal output
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                String output = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                System.out.println(output);
            }
        }
    }

Try some sentences and you should see the appropriate annotations. Running the example code outputs:

    Very Positive
    Negative

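Integrating the Bot with the Analysis Engine

So we have a sentiment analyzer program written in Java and an email bot written in Python. How do we get them to talk to each other?

There are many possible solutions to this problem, but here we will use Thrift. We will spin up the sentiment analyzer as a Thrift server and the email bot as a Thrift client.

Thrift is a code generator and a protocol that lets two applications, often written in different languages, communicate with one another using a defined interface. Polyglot teams use Thrift to build networks of microservices that leverage the best of each language they use.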

To use Thrift, we need two things: a .thrift file that defines the service endpoints, and generated code that makes use of the protocol defined in the .thrift file. For the analyzer service, sentiment.thrift looks like this:

    namespace java sentiment
    namespace py sentiment

    service SentimentAnalysisService {
        string sentimentAnalyze(1:string sentence),
    }

We can generate client and server code using this .thrift file. Run:

    thrift-0.10.0.exe --gen py sentiment.thrift
    thrift-0.10.0.exe --gen java sentiment.thrift

Note: I generated the code on a Windows machine. You will want to use the appropriate path to the Thrift executable in your environment.
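By default the Thrift compiler writes the generated Python package to a gen-py directory (and the Java sources to gen-java) next to where it was run. One simple way to make the generated sentiment package importable is to add that directory to the module search path, as in the sketch below. It assumes the compiler was run from the current working directory; adjust the path if your layout differs.

```python
import os
import sys

# Assumption: the Thrift compiler was run in the current working directory,
# so the generated Python package lives in ./gen-py/sentiment.
GEN_PY_DIR = os.path.join(os.getcwd(), "gen-py")

if GEN_PY_DIR not in sys.path:
    sys.path.append(GEN_PY_DIR)

# After this, `from sentiment import SentimentAnalysisService` can resolve,
# provided the thrift runtime package for Python is installed.
```

Copying the generated sentiment directory next to your Python files works just as well; the point is only that the import must be able to find it.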

Now, let's make the appropriate changes to the analysis engine to create a server. Your Java program should look like this:

SentimentHandler.java

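    package seanwang;

    // The generated service lands in the "sentiment" package
    // because of the `namespace java sentiment` line in sentiment.thrift.
    import sentiment.SentimentAnalysisService;

    public class SentimentHandler implements SentimentAnalysisService.Iface {
        SentimentAnalyzer analyzer;

        SentimentHandler() {
            analyzer = new SentimentAnalyzer();
        }

        public String sentimentAnalyze(String sentence) {
            System.out.println("got: " + sentence);
            return analyzer.analyze(sentence);
        }
    }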

This handler is where we receive the analysis request over the Thrift protocol.

SentimentAnalyzer.java

    package seanwang;

    // ...

    public class SentimentAnalyzer {
        StanfordCoreNLP tokenizer;
        StanfordCoreNLP pipeline;

        public SentimentAnalyzer() {
            Properties pipelineProps = new Properties();
            Properties tokenizerProps = new Properties();
            pipelineProps.setProperty("annotators", "parse, sentiment");
            pipelineProps.setProperty("parse.binaryTrees", "true");
            pipelineProps.setProperty("enforceRequirements", "false");
            tokenizerProps.setProperty("annotators", "tokenize ssplit");
            tokenizer = new StanfordCoreNLP(tokenizerProps);
            pipeline = new StanfordCoreNLP(pipelineProps);
        }

        public String analyze(String line) {
            Annotation annotation = tokenizer.process(line);
            pipeline.annotate(annotation);

            String output = "";
            for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
                output += sentence.get(SentimentCoreAnnotations.SentimentClass.class);
                output += "\n";
            }
            return output;
        }
    }

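The analyzer uses the Stanford NLP library to determine the sentiment of the text and produces a string containing one sentiment annotation per sentence, each followed by a newline.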
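Since the analyzer returns its labels as one newline-separated string, the Python side has to split that string before it can reason about individual sentences. A minimal sketch of such a helper follows; the function name summarize is hypothetical, and the sample string simply mimics the output format shown earlier.

```python
from collections import Counter

def summarize(analyzer_output):
    """Turn the analyzer's newline-separated labels into a list and a tally."""
    labels = [line.strip() for line in analyzer_output.splitlines() if line.strip()]
    return labels, Counter(labels)

# Sample string in the format produced by SentimentAnalyzer.analyze
sample = "Very Positive\nNegative\n"
labels, counts = summarize(sample)

print(labels)              # prints: ['Very Positive', 'Negative']
print(counts["Negative"])  # prints: 1
```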

SentimentServer.java

    package seanwang;

    // ...

    public class SentimentServer {
        public static SentimentHandler handler;
        public static SentimentAnalysisService.Processor processor;

        public static void main(String[] args) {
            try {
                handler = new SentimentHandler();
                processor = new SentimentAnalysisService.Processor(handler);

                Runnable simple = new Runnable() {
                    public void run() {
                        simple(processor);
                    }
                };

                new Thread(simple).start();
            } catch (Exception x) {
                x.printStackTrace();
            }
        }

        public static void simple(SentimentAnalysisService.Processor processor) {
            try {
                TServerTransport serverTransport = new TServerSocket(9090);
                TServer server = new TSimpleServer(new Args(serverTransport).processor(processor));

                System.out.println("Starting the simple server...");
                server.serve();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

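Note that I have not included the SentimentAnalysisService.java file here since it is a generated file. You will want to put the generated code in a location where the rest of your code can access it.

Now that we have the server up, let's write a Python client to use the server.

client.py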

    from sentiment import SentimentAnalysisService
    from thrift.transport import TSocket
    from thrift.transport import TTransport
    from thrift.protocol import TBinaryProtocol

    class SentimentClient:
        def __init__(self, server='localhost', socket=9090):
            # `socket` is the TCP port the Java Thrift server listens on.
            transport = TSocket.TSocket(server, socket)
            transport = TTransport.TBufferedTransport(transport)
            protocol = TBinaryProtocol.TBinaryProtocol(transport)
            self.transport = transport
            self.client = SentimentAnalysisService.Client(protocol)
            self.transport.open()

        def __del__(self):
            self.transport.close()

        def analyze(self, sentence):
            return self.client.sentimentAnalyze(sentence)

    if __name__ == '__main__':
        client = SentimentClient()
        print(client.analyze('An amazingly wonderful sentence'))

If you run this, you should see:

    Very Positive

Awesome! Now that we have the server running and talking to the client, let's integrate it with the email bot by instantiating a client and piping the email into it.

    import client

    # ...

    @app.route('/analyze', methods=['POST'])
    def analyze():
        sentiment_client = client.SentimentClient()
        text = str(request.form.get('text'))
        with open('logfile.txt', 'a') as fp_log:
            # Log the email body once, followed by the per-sentence sentiment labels.
            fp_log.write(text)
            fp_log.write(sentiment_client.analyze(text))
        return "Got it"

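Now deploy your Java service to the same machine where you are running the web server, start the service, and restart the app. Send an email to the bot with a test sentence and you should see something like this in the log file:

    Amazingly wonderfully positive and beautiful sentence.
    Very Positive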

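Analyzing the Emails

All right! Now we have an email bot that is able to perform sentiment analysis. We can send it an email and receive a sentiment tag for each sentence we sent. Now, let's explore how to make this intelligence actionable.

To keep things simple, let's focus on emails with a high concentration of negative and very negative sentences. Let's use a simple scoring system and say that if more than 75% of the sentences in an email are negative, we flag it as a potential alarm email that may require an immediate response. Let's implement the scoring logic in the analyze route: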

    @app.route('/analyze', methods=['POST'])
    def analyze():
        text = str(request.form.get('text'))
        sentiment_client = client.SentimentClient()
        text = text.replace('\n', '')  # remove all new lines (replace returns a new string)
        sentences = text.rstrip('.').split('.')  # remove the last period before splitting
        negative_sentences = [
            sentence for sentence in sentences
            if sentiment_client.analyze(sentence).rstrip() in ['Negative', 'Very negative']  # remove newline char
        ]
        urgent = len(negative_sentences) / len(sentences) > 0.75
        with open('logfile.txt', 'a') as fp_log:
            fp_log.write("Received: %s" % (request.form.get('text')))
            fp_log.write("urgent = %s" % (str(urgent)))
        return "Got it"

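The code above makes a few assumptions (for example, that every sentence ends with a period and that / performs true division, as it does in Python 3), but it will work for demonstration purposes. Send a couple of emails to your bot and you should see the email analysis in the logs: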

    Received: Here is a test for the system. This is supposed to be a non-urgent request. It's very good! For the most part this is positive or neutral. Great things are happening!
    urgent = False

    Received: This is an urgent request. Everything is truly awful. This is a disaster. People hate this tasteless mail.
    urgent = True
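The scoring rule is easier to test and to harden once it is pulled out of the route into a plain function. The sketch below is one possible way to do that, not the code the bot above uses: it splits on ".", "!" and "?" instead of periods only, guards against empty input, and takes the analyzer as a parameter. The fake_analyze function is a hypothetical keyword-based stand-in so the example runs without the Java server; in the bot you would pass SentimentClient.analyze instead.

```python
import re

NEGATIVE_LABELS = {"Negative", "Very negative"}  # same labels the route checks for

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

def is_urgent(text, analyze, threshold=0.75):
    """Return True if more than `threshold` of the sentences are negative."""
    sentences = split_sentences(text)
    if not sentences:
        return False  # nothing to score, and avoids dividing by zero
    negative = sum(1 for s in sentences if analyze(s).strip() in NEGATIVE_LABELS)
    return negative / len(sentences) > threshold

def fake_analyze(sentence):
    """Hypothetical stand-in for SentimentClient.analyze (keyword lookup only)."""
    bad_words = ("awful", "disaster", "hate")
    is_negative = any(word in sentence.lower() for word in bad_words)
    return "Negative\n" if is_negative else "Neutral\n"

angry = "Everything is truly awful. This is a disaster! People hate this mail. We hate it."
calm = "It's very good! Great things are happening!"

print(is_urgent(angry, fake_analyze))  # prints: True
print(is_urgent(calm, fake_analyze))   # prints: False
```

Note that the threshold is strict: an email in which exactly three of four sentences are negative sits at 75% and is not flagged.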

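Sending Out an Alert

We are almost done!

We have built an email bot that is able to receive emails, perform sentiment analysis, and determine whether an email requires immediate attention. Now, we just have to send out a text alert when an email is particularly negative.

We will use Twilio to send out the text alert. Their Python API, which is documented here, is pretty straightforward. Let's modify the analyze route to send a text message when it receives an urgent email.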

    import os

    # twilio_client (a Twilio REST client created from your account credentials)
    # and on_call (the phone number to alert) are assumed to be defined elsewhere.

    def send_message(body):
        twilio_client.messages.create(
            to=on_call,
            from_=os.getenv('TWILIO_PHONE_NUMBER'),
            body=body
        )

    app = Flask(__name__)

    @app.route('/analyze', methods=['POST'])
    def analyze():
        text = str(request.form.get('text'))
        sentiment_client = client.SentimentClient()
        text = text.replace('\n', '')  # remove all new lines (replace returns a new string)
        sentences = text.rstrip('.').split('.')  # remove the last period before splitting
        negative_sentences = [
            sentence for sentence in sentences
            if sentiment_client.analyze(sentence).rstrip() in ['Negative', 'Very negative']  # remove newline char
        ]
        urgent = len(negative_sentences) / len(sentences) > 0.75
        if urgent:
            send_message('Highly negative email received. Please take action')
        with open('logfile.txt', 'a') as fp_log:
            fp_log.write("Received: %s" % request.form.get('text'))  # %s was missing in the format string
            fp_log.write("urgent = %s" % (str(urgent)))
            fp_log.write("\n")
        return "Got it"

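You will need to set your environment variables to your Twilio account credentials and set the on-call number to a phone that you can check. Once you have done that, send an email to the analysis endpoint and you should see a text being sent to that phone number.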
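A missing environment variable otherwise only shows up when the first urgent email arrives, so it can help to check the configuration once at startup. The sketch below shows one way to do that. Only TWILIO_PHONE_NUMBER appears in the code above; the other variable names are assumptions chosen for illustration, so substitute whatever names your deployment actually uses.

```python
import os

# Only TWILIO_PHONE_NUMBER comes from the route code; the other names are
# hypothetical placeholders for the account credentials and on-call number.
REQUIRED_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "ON_CALL_NUMBER",
)

def missing_settings(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Demonstration with an explicit mapping instead of the real environment.
missing = missing_settings({"TWILIO_PHONE_NUMBER": "+15550100"})
print(missing)  # prints: ['TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN', 'ON_CALL_NUMBER']
```

Calling missing_settings() with no arguments at the top of app.py and refusing to start when the list is non-empty turns a silent failure into an immediate, readable error.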

And we're done!

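Natural Language Processing Made Easy with Stanford NLP

In this article, you learned how to build an email sentiment analysis bot using the Stanford NLP library. The library abstracts away the nitty-gritty details of natural language processing and lets you use it as a building block for your own NLP applications.

I hope this post has demonstrated one of the many amazing potential applications of sentiment analysis, and that it inspires you to build an NLP application of your own.

You can find the code for the email sentiment analysis bot from this NLP tutorial on GitHub.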
