Elasticsearch for Ruby on Rails: A Tutorial for the Chewy Gem

Published: 2022-03-11

Elasticsearch provides a RESTful HTTP interface for indexing and querying data, built on top of the Apache Lucene library. Out of the box, it offers scalable, efficient, and robust search with UTF-8 support. It is a powerful tool for indexing and querying massive amounts of structured data and, here at Toptal, it powers our platform search and will soon be used for autocompletion as well. We're huge fans.

Since our platform is built using Ruby on Rails, our integration of Elasticsearch takes advantage of the elasticsearch-ruby project (a Ruby integration framework for Elasticsearch that provides a client for connecting to an Elasticsearch cluster, a Ruby API for the Elasticsearch REST API, and various extensions and utilities). Building on this foundation, we've developed and released our own improvement (and simplification) of the Elasticsearch application search architecture, packaged as a Ruby gem that we've named Chewy (a demo app is available as well).

Chewy extends the Elasticsearch-Ruby client, making it more powerful and providing tighter integration with Rails. In this Elasticsearch guide, I discuss (through usage examples) how we accomplished this, including the technical obstacles that emerged during implementation.

Figure: The relationship between Elasticsearch and Ruby on Rails.

A couple of quick notes before proceeding to the guide:

Both Chewy and a Chewy demo application are available on GitHub.
For those interested in more "under the hood" information about Elasticsearch, I've included a brief write-up as an appendix to this post.

Why Chewy?

Despite Elasticsearch's scalability and efficiency, integrating it with Rails didn't turn out to be quite as simple as anticipated. At Toptal, we found ourselves needing to significantly augment the basic Elasticsearch-Ruby client to make it more performant and to support additional operations.

And thus, the Chewy gem was born.

A few particularly noteworthy features of Chewy include:

Every index is observable by all the related models.
Most indexed models are related to each other, and sometimes it's necessary to denormalize this related data and bind it to the same object (e.g., if you want to index an array of tags together with their associated article). Chewy allows you to specify an updatable index for every model, so the corresponding articles are reindexed whenever a relevant tag is updated.
Index classes are independent from ORM/ODM models.
With this enhancement, implementing cross-model autocompletion, for example, is much easier. You can define an index and work with it in an object-oriented fashion. Unlike other clients, the Chewy gem removes the need to manually implement index classes, data import callbacks, and other components.
Bulk import is everywhere.
Chewy uses the Elasticsearch bulk API for full reindexing and index updates. It also uses the concept of atomic updates, collecting changed objects within an atomic block and updating them all at once.
Chewy provides an AR-style query DSL.
By being chainable, mergeable, and lazy, this enhancement allows queries to be built in a more efficient manner.
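
To make "chainable, mergeable, and lazy" a little more concrete, here is a toy sketch in plain Ruby. TinyScope is a hypothetical class invented for this illustration; it is not part of Chewy and does not talk to Elasticsearch. It only shows the general pattern: every call returns a new scope, two scopes can be combined, and nothing is built until the request body is requested.

```ruby
# Toy illustration only: TinyScope is hypothetical, not Chewy's implementation.
class TinyScope
  attr_reader :criteria

  def initialize(criteria = { queries: [], filters: [], limit: nil })
    @criteria = criteria
  end

  # Each builder returns a NEW scope, so calls can be chained safely.
  def query(clause)
    with(queries: criteria[:queries] + [clause])
  end

  def filter(clause)
    with(filters: criteria[:filters] + [clause])
  end

  def limit(count)
    with(limit: count)
  end

  # Combine two independently built scopes into one.
  def merge(other)
    TinyScope.new(
      queries: criteria[:queries] + other.criteria[:queries],
      filters: criteria[:filters] + other.criteria[:filters],
      limit:   other.criteria[:limit] || criteria[:limit]
    )
  end

  # Lazy: the request body is only assembled when it is asked for.
  def to_request
    body = { query: criteria[:queries], filter: criteria[:filters] }
    body[:size] = criteria[:limit] if criteria[:limit]
    body
  end

  private

  def with(changes)
    TinyScope.new(criteria.merge(changes))
  end
end

by_author = TinyScope.new.query(match: { author: 'Tarantino' })
recent    = TinyScope.new.filter(range: { year: { gt: 1990 } }).limit(10)
combined  = by_author.merge(recent)
combined.to_request # one body holding the query, the filter, and size 10
```

Because the builders never mutate the receiver, a partially built scope can be reused as the starting point for several different searches.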

OK, let's see how all of this plays out in the gem…

A Basic Guide to Elasticsearch

Elasticsearch has several document-related concepts. The first is the index (the analogue of a database in an RDBMS), which consists of a set of documents, which can be of several types (where a type is roughly an RDBMS table).

Every document has a set of fields. Each field is analyzed independently, and its analysis options are stored in the mapping for its type. Chewy uses this structure "as is" in its object model:

 class EntertainmentIndex < Chewy::Index
   settings analysis: {
     analyzer: {
       title: {
         tokenizer: 'standard',
         filter: ['lowercase', 'asciifolding']
       }
     }
   }

   define_type Book.includes(:author, :tags) do
     field :title, analyzer: 'title'
     field :year, type: 'integer'
     field :author, value: ->{ author.name }
     field :author_id, type: 'integer'
     field :description
     field :tags, index: 'not_analyzed', value: ->{ tags.map(&:name) }
   end

   {movie: Video.movies, cartoon: Video.cartoons}.each do |type_name, scope|
     define_type scope.includes(:director, :tags), name: type_name do
       field :title, analyzer: 'title'
       field :year, type: 'integer'
       field :author, value: ->{ director.name }
       field :author_id, type: 'integer', value: ->{ director_id }
       field :description
       field :tags, index: 'not_analyzed', value: ->{ tags.map(&:name) }
     end
   end
 end

Above, we defined an Elasticsearch index called entertainment with three types: book, movie, and cartoon. For each type, we defined some field mappings, along with a hash of settings for the whole index.

So, we've defined the EntertainmentIndex and we want to execute some queries. As a first step, we need to create the index and import our data:

 EntertainmentIndex.create!
 EntertainmentIndex.import
 # EntertainmentIndex.reset! (which includes deletion,
 # creation, and import) could be used instead

The .import method knows which data to import because we passed in scopes when we defined our types; thus, it will import all the books, movies, and cartoons stored in the persistent storage.

With that done, we can perform some queries:

 EntertainmentIndex.query(match: {author: 'Tarantino'}).filter{ year > 1990 }
 EntertainmentIndex.query(match: {title: 'Shawshank'}).types(:movie)
 EntertainmentIndex.query(match: {author: 'Tarantino'}).only(:id).limit(10).load
 # the last one loads ActiveRecord objects for documents found
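
For orientation, the first chained query above corresponds roughly to a request body of the following shape. This is a sketch under an assumption: it uses the "filtered" query of the Elasticsearch 1.x era, and the exact body Chewy builds depends on the gem and Elasticsearch versions in use.

```ruby
require 'json'

# Assumption: Elasticsearch 1.x-style "filtered" query. The body that Chewy
# actually sends may differ between gem and server versions.
request_body = {
  query: {
    filtered: {
      query:  { match: { author: 'Tarantino' } },
      filter: { range: { year: { gt: 1990 } } }
    }
  }
}

puts JSON.pretty_generate(request_body)
```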

Now our index is almost ready to be used in our search implementation.

Rails Integration

For integration with Rails, the first thing we need is to be able to react to changes in RDBMS objects. Chewy supports this behavior via callbacks defined with the update_index class method. update_index takes two arguments:

A type identifier supplied in the "index_name#type_name" format
A method name or block to execute, which represents a back-reference to the updated object or object collection
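
The idea of a back-reference can be sketched in plain Ruby. The helper below is hypothetical and is not how Chewy is implemented internally; it only illustrates how a method name or a block can be turned into the list of records that need reindexing, with nil meaning "nothing to update". Dude and Book here are bare Structs standing in for the real models.

```ruby
# Hypothetical helper for illustration; not Chewy's real code.
def resolve_backreference(record, method_name = nil, &block)
  result =
    if block
      record.instance_exec(&block)      # the block runs in the record's context
    elsif method_name == :self
      record                            # the changed record itself
    else
      record.public_send(method_name)   # an association or any other method
    end
  [result].flatten.compact              # nil means "nothing to reindex"
end

# Stand-ins for the ActiveRecord models used in this article.
Dude = Struct.new(:name, :books)
Book = Struct.new(:title, :kind) do
  def movie?
    kind == :movie
  end
end

book = Book.new('Jackie Brown', :book)
dude = Dude.new('Quentin', [book])

resolve_backreference(book, :self)             # the book itself, in an array
resolve_backreference(dude, :books)            # every book of the author
resolve_backreference(book) { self if movie? } # empty array: not a movie
```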

We need to define these callbacks for each dependent model:

 class Book < ActiveRecord::Base
   acts_as_taggable

   belongs_to :author, class_name: 'Dude'

   # We update the book itself on-change
   update_index 'entertainment#book', :self
 end

 class Video < ActiveRecord::Base
   acts_as_taggable

   belongs_to :director, class_name: 'Dude'

   # Update video types when changed, depending on the category
   update_index('entertainment#movie') { self if movie? }
   update_index('entertainment#cartoon') { self if cartoon? }
 end

 class Dude < ActiveRecord::Base
   acts_as_taggable

   # The foreign keys match the `belongs_to` declarations above
   has_many :books, foreign_key: :author_id
   has_many :videos, foreign_key: :director_id

   # If author or director was changed, all the corresponding
   # books, movies and cartoons are updated
   update_index 'entertainment#book', :books
   update_index('entertainment#movie') { videos.movies }
   update_index('entertainment#cartoon') { videos.cartoons }
 end

Since tags are also indexed, we next need to monkey-patch some external models so that they react to changes:

 ActsAsTaggableOn::Tag.class_eval do
   has_many :books, through: :taggings, source: :taggable, source_type: 'Book'
   has_many :videos, through: :taggings, source: :taggable, source_type: 'Video'

   # Updating all tag-related objects
   update_index 'entertainment#book', :books
   update_index('entertainment#movie') { videos.movies }
   update_index('entertainment#cartoon') { videos.cartoons }
 end

 ActsAsTaggableOn::Tagging.class_eval do
   # Same goes for the intermediate model
   update_index('entertainment#book') { taggable if taggable_type == 'Book' }
   update_index('entertainment#movie') { taggable if taggable_type == 'Video' &&
                                         taggable.movie? }
   update_index('entertainment#cartoon') { taggable if taggable_type == 'Video' &&
                                           taggable.cartoon? }
 end

At this point, every object save or destroy will update the corresponding Elasticsearch index type.

Atomicity

We still have one lingering problem. If we do something like books.map(&:save) to save multiple books, we'll request an update of the entertainment index every time an individual book is saved. Thus, if we save five books, we'll update the Chewy index five times. This behavior is acceptable in a REPL, but certainly not in controller actions, where performance is critical.

We address this issue with the Chewy.atomic block:

 class ApplicationController < ActionController::Base
   # Rails passes the controller and the action (as a proc) to the block
   around_action { |_controller, action| Chewy.atomic(&action) }
 end

In short, Chewy.atomic batches these updates as follows:

Disables the after_save callback.
Collects the IDs of the saved books.
On completion of the Chewy.atomic block, uses the collected IDs to make a single Elasticsearch index update request.
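
The batching steps above can be sketched with a small stand-alone module. TinyAtomic is hypothetical and is not how Chewy is implemented; an in-memory array stands in for the requests that would be sent to Elasticsearch, and nested atomic blocks are deliberately not handled.

```ruby
# Toy model of the batching idea; TinyAtomic is hypothetical.
module TinyAtomic
  @buffer = nil     # nil means "no atomic block is active"
  @requests = []    # stands in for requests sent to Elasticsearch

  class << self
    attr_reader :requests

    # Would be called from a model's save callback.
    def update(type, id)
      if @buffer
        @buffer[type] << id                      # inside a block: remember the ID
      else
        @requests << { type: type, ids: [id] }   # outside: one request per save
      end
    end

    def atomic
      @buffer = Hash.new { |hash, key| hash[key] = [] }
      yield
      # One request per type, however many records were saved.
      @buffer.each { |type, ids| @requests << { type: type, ids: ids.uniq } }
    ensure
      @buffer = nil
    end
  end
end

TinyAtomic.atomic do
  [1, 2, 3, 2].each { |id| TinyAtomic.update('entertainment#book', id) }
end
TinyAtomic.requests # a single request carrying the IDs 1, 2, and 3
```

Collecting IDs rather than whole objects keeps the buffer small and lets duplicate saves of the same record collapse into one update.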

Searching

Now we're ready to implement a search interface. Since our user interface is a form, the best way to build it is with FormBuilder and ActiveModel. (At Toptal, we use ActiveData to implement ActiveModel interfaces, but feel free to use your favorite gem.)

 class EntertainmentSearch
   include ActiveData::Model

   attribute :query, type: String
   attribute :author_id, type: Integer
   attribute :min_year, type: Integer
   attribute :max_year, type: Integer
   attribute :tags, mode: :arrayed, type: String,
                    normalize: ->(value) { value.reject(&:blank?) }

   # This accessor is for the form. It will have a single text field
   # for comma-separated tag inputs.
   def tag_list=(value)
     self.tags = value.split(',').map(&:strip)
   end

   def tag_list
     tags.join(', ')
   end
 end
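
To see what the tag_list accessors do with real input, here is a plain-Ruby stand-in. TagForm is hypothetical and skips ActiveData entirely; its tags= writer imitates the normalize lambda by dropping blank entries.

```ruby
# Plain-Ruby stand-in for the form accessors; TagForm is hypothetical.
class TagForm
  attr_reader :tags

  def initialize
    @tags = []
  end

  # Imitates the `normalize` lambda: blank entries are dropped.
  def tags=(value)
    @tags = Array(value).reject { |tag| tag.to_s.strip.empty? }
  end

  def tag_list=(value)
    self.tags = value.split(',').map(&:strip)
  end

  def tag_list
    tags.join(', ')
  end
end

form = TagForm.new
form.tag_list = 'crime,  drama, ,noir'
form.tags     # ["crime", "drama", "noir"]
form.tag_list # "crime, drama, noir"
```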

Query and Filters Tutorial

Now that we have an ActiveModel-like object that can accept and typecast attributes, let's implement search:

 class EntertainmentSearch
   ...

   def index
     EntertainmentIndex
   end

   def search
     # We can merge multiple scopes. Note that this returns nil
     # when none of the attributes below are set.
     [query_string, author_id_filter,
      year_filter, tags_filter].compact.reduce(:merge)
   end

   # Using query_string advanced query for the main query input
   def query_string
     index.query(query_string: {fields: [:title, :author, :description],
                                query: query, default_operator: 'and'}) if query?
   end

   # Simple term filter for author id. `:author_id` is already
   # typecasted to integer and ignored if empty.
   def author_id_filter
     index.filter(term: {author_id: author_id}) if author_id?
   end

   # For filtering on years, we will use range filter.
   # Returns nil if both min_year and max_year are not passed to the model.
   def year_filter
     body = {}.tap do |range|
       range.merge!(gte: min_year) if min_year?
       range.merge!(lte: max_year) if max_year?
     end
     index.filter(range: {year: body}) if body.present?
   end

   # Same as `author_id_filter`, but the `terms` filter is used.
   # Returns nil if no tags passed in.
   def tags_filter
     index.filter(terms: {tags: tags}) if tags?
   end
 end
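
The pattern used in search (build each optional part independently, return nil when it does not apply, then drop the nils) can be tried without Elasticsearch. The helpers below are hypothetical and work on plain hashes rather than Chewy scopes; the filter bodies mirror the ones used above.

```ruby
# Hypothetical helper mirroring `year_filter`: nil when no bound is given.
def year_range(min_year: nil, max_year: nil)
  body = {}
  body[:gte] = min_year if min_year
  body[:lte] = max_year if max_year
  { range: { year: body } } unless body.empty?
end

# Hypothetical helper mirroring `author_id_filter`.
def author_term(author_id)
  { term: { author_id: author_id } } if author_id
end

# Optional parts are built independently and nils are dropped.
def build_filters(author_id: nil, min_year: nil, max_year: nil)
  [
    author_term(author_id),
    year_range(min_year: min_year, max_year: max_year)
  ].compact
end

build_filters(min_year: 1990) # [{ range: { year: { gte: 1990 } } }]
build_filters                 # []
```

An empty list is the plain-hash counterpart of the nil that search returns when no attribute is set, so callers need to be ready for "no conditions at all".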

Controllers and Views

At this point, our model can perform search requests with the passed attributes. Usage will look something like this:

 EntertainmentSearch.new(query: 'Tarantino', min_year: 1990).search

Note that in the controller, we want to load the actual ActiveRecord objects instead of Chewy document wrappers:

 class EntertainmentController < ApplicationController
   def index
     @search = EntertainmentSearch.new(params[:search])
     # In case we want to load real objects, we don't need any other
     # fields except for `:id` retrieved from Elasticsearch index.
     # Chewy query DSL supports Kaminari gem and corresponding API.
     # Also, we pass scopes for every requested type to the `load` method.
     @entertainments = @search.search.only(:id).page(params[:page]).load(
       book: {scope: Book.includes(:author)},
       movie: {scope: Video.includes(:director)},
       cartoon: {scope: Video.includes(:director)}
     )
   end
 end

Now, it's time to write some HAML in entertainment/index.html.haml:

 = form_for @search, as: :search, url: entertainment_index_path, method: :get do |f|
   = f.text_field :query
   = f.select :author_id, Dude.all.map { |d| [d.name, d.id] }, include_blank: true
   = f.text_field :min_year
   = f.text_field :max_year
   = f.text_field :tag_list
   = f.submit

 - if @entertainments.any?
   %dl
     - @entertainments.each do |entertainment|
       %dt
         %h1= entertainment.title
         %strong= entertainment.class
       %dd
         %p= entertainment.year
         %p= entertainment.description
         %p= entertainment.tag_list
   = paginate @entertainments
 - else
   Nothing to see here

Sorting

As a bonus, we'll also add sorting to our search functionality.

Assume that we need to sort on the title and year fields, as well as by relevance. Unfortunately, the title One Flew Over the Cuckoo's Nest will be split into individual terms, so sorting by these disparate terms would be too random; instead, we'd like to sort by the entire title.

The solution is to use a special title field and apply its own analyzer:

 class EntertainmentIndex < Chewy::Index
   settings analysis: {
     analyzer: {
       ...
       sorted: {
         # `keyword` tokenizer will not split our titles and
         # will produce the whole phrase as the term, which
         # can be sorted easily
         tokenizer: 'keyword',
         filter: ['lowercase', 'asciifolding']
       }
     }
   }

   define_type Book.includes(:author, :tags) do
     # We use the `multi_field` type to add `title.sorted` field
     # to the type mapping. Also, will still use just the `title`
     # field for search.
     field :title, type: 'multi_field' do
       field :title, index: 'analyzed', analyzer: 'title'
       field :sorted, index: 'analyzed', analyzer: 'sorted'
     end
     ...
   end

   {movie: Video.movies, cartoon: Video.cartoons}.each do |type_name, scope|
     define_type scope.includes(:director, :tags), name: type_name do
       # For videos as well
       field :title, type: 'multi_field' do
         field :title, index: 'analyzed', analyzer: 'title'
         field :sorted, index: 'analyzed', analyzer: 'sorted'
       end
       ...
     end
   end
 end
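
A rough plain-Ruby comparison shows why the sort field needs its own analyzer. The two helpers below are simplified stand-ins, not real Elasticsearch analyzers (no Unicode handling, no asciifolding), and picking the alphabetically smallest term is just one assumed way a sort over a multi-term field could behave.

```ruby
# Simplified stand-ins for the `title` and `sorted` analyzers.
def standard_like_terms(text)
  text.downcase.scan(/[a-z0-9']+/)   # one term per word
end

def keyword_like_terms(text)
  [text.downcase]                    # the whole title as a single term
end

titles = ['The Big Lebowski', 'Casablanca', "One Flew Over the Cuckoo's Nest"]

standard_like_terms(titles.last)
# ["one", "flew", "over", "the", "cuckoo's", "nest"]

# Assumed behavior: sort by the alphabetically smallest term of each title.
by_term = titles.sort_by { |title| standard_like_terms(title).min }
# ["The Big Lebowski", "Casablanca", "One Flew Over the Cuckoo's Nest"]

# Sorting by the single keyword-style term gives the expected title order.
by_title = titles.sort_by { |title| keyword_like_terms(title).first }
# ["Casablanca", "One Flew Over the Cuckoo's Nest", "The Big Lebowski"]
```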

In addition, we're going to add both the new attribute and the sort processing step to our search model:

 class EntertainmentSearch
   # we are going to use `title.sorted` field for sort
   SORT = {title: {'title.sorted' => :asc}, year: {year: :desc}, relevance: :_score}
   ...

   attribute :sort, type: String, enum: %w(title year relevance),
                    default_blank: 'relevance'
   ...

   def search
     # we have added `sorting` scope to merge list
     [query_string, author_id_filter, year_filter,
      tags_filter, sorting].compact.reduce(:merge)
   end

   def sorting
     # We have one of the 3 possible values in `sort` attribute
     # and `SORT` mapping returns actual sorting expression
     index.order(SORT[sort.to_sym])
   end
 end

Finally, we'll modify our form by adding a select box for the sort options:

 = form_for @search, as: :search, url: entertainment_index_path, method: :get do |f|
   ...
   / `EntertainmentSearch.sort_values` will just return
   / enum option content from the sort attribute definition.
   = f.select :sort, EntertainmentSearch.sort_values
   ...

Error Handling

If your users enter malformed queries such as ( or AND, the Elasticsearch client will raise an error. To handle that, let's make some changes to our controller:

 class EntertainmentController < ApplicationController
   def index
     @search = EntertainmentSearch.new(params[:search])
     @entertainments = @search.search.only(:id).page(params[:page]).load(
       book: {scope: Book.includes(:author)},
       movie: {scope: Video.includes(:director)},
       cartoon: {scope: Video.includes(:director)}
     )
   rescue Elasticsearch::Transport::Transport::Errors::BadRequest => e
     @entertainments = []
     @error = e.message.match(/QueryParsingException\[([^;]+)\]/).try(:[], 1)
   end
 end
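
The regular expression in the rescue clause can be tried on its own. The sample message below is made up for this illustration in the general shape of an Elasticsearch 1.x error; real messages vary by version, so the pattern should be treated as a best-effort extraction. The helper name is hypothetical.

```ruby
# Made-up sample message; real Elasticsearch errors vary by version.
sample = "[400] SearchPhaseExecutionException[all shards failed]; " \
         "nested: QueryParsingException[[entertainment] Failed to parse query [(]]; " \
         "nested: ParseException[Cannot parse '(']; "

# Hypothetical helper: returns the parser's explanation, or nil when the
# message does not contain a QueryParsingException section.
def extract_query_error(message)
  message[/QueryParsingException\[([^;]+)\]/, 1]
end

extract_query_error(sample)      # "[entertainment] Failed to parse query [(]"
extract_query_error('timed out') # nil
```

When the pattern does not match, the result is nil, so the view still needs a fallback message for that case.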

Further, we need to render the error in the view:

 ...
 - if @entertainments.any?
   ...
 - else
   - if @error
     = @error
   - else
     Nothing to see here

Testing Elasticsearch Queries

The basic testing setup is as follows:

1. Start the Elasticsearch server.
2. Clean up and create our indices.
3. Import our data.
4. Perform our query.
5. Cross-reference the result with our expectations.

For step 1, it's convenient to use the test cluster defined in the elasticsearch-extensions gem. Just add the following line to your project's Rakefile after installing the gem:

 require 'elasticsearch/extensions/test/cluster/tasks'

Then, you'll get the following Rake tasks:

 $ rake -T elasticsearch
 rake elasticsearch:start  # Start Elasticsearch cluster for tests
 rake elasticsearch:stop   # Stop Elasticsearch cluster for tests

Elasticsearch and RSpec

First, we need to make sure that our index is updated to reflect our data changes. Luckily, the Chewy gem comes with the helpful update_index RSpec matcher:

 describe EntertainmentIndex do
   # No need to cleanup Elasticsearch as requests are
   # stubbed in case of `update_index` matcher usage.
   describe 'Tag' do
     # We create several books with the same tag
     let(:books) { create_list :book, 2, tag_list: 'tag1' }

     specify do
       # We expect that after modifying the tag name...
       expect do
         # `find_by!` returns a single record; `update_attributes` is not
         # available on the relation that `where` returns
         ActsAsTaggableOn::Tag.find_by!(name: 'tag1').update_attributes(name: 'tag2')
         # ... the corresponding type will be updated with previously-created books.
       end.to update_index('entertainment#book').and_reindex(books, with: {tags: ['tag2']})
     end
   end
 end

Next, we need to test that the actual search queries are performed properly and that they return the expected results:

 describe EntertainmentSearch do
   # Just defining helpers for simplifying testing
   def search(attributes = {})
     EntertainmentSearch.new(attributes).search
   end

   # Import helper as well
   def import(*args)
     # We are using `import!` here to be sure all the objects are imported
     # correctly before examples run.
     EntertainmentIndex.import!(*args)
   end

   # Deletes and recreates index before every example
   before { EntertainmentIndex.purge! }

   describe '#min_year, #max_year' do
     let(:book) { create(:book, year: 1925) }
     let(:movie) { create(:movie, year: 1970) }
     let(:cartoon) { create(:cartoon, year: 1995) }
     before { import book: book, movie: movie, cartoon: cartoon }

     # NOTE: The sample code below provides a clear usage example but is not
     # optimized code. Something along the following lines would perform better:
     # `specify { search(min_year: 1970).map(&:id).map(&:to_i)
     #   .should =~ [movie, cartoon].map(&:id) }`
     specify { search(min_year: 1970).load.should =~ [movie, cartoon] }
     specify { search(max_year: 1980).load.should =~ [book, movie] }
     specify { search(min_year: 1970, max_year: 1980).load.should == [movie] }
     specify { search(min_year: 1980, max_year: 1970).should == [] }
   end
 end

Test Cluster Troubleshooting

Finally, here is a guide for troubleshooting your test cluster:

To start, use an in-memory, one-node cluster. It will be much faster for specs. In our case: TEST_CLUSTER_NODES=1 rake elasticsearch:start
There are some existing issues with the elasticsearch-extensions test cluster implementation related to the one-node cluster status check (the status is yellow in some cases and will never be green, so the green-status cluster start check will fail every time). The issue has been fixed in a fork, and will hopefully be fixed in the main repo soon.
For each dataset, group your requests in specs (i.e., import your data once and then perform several requests). Elasticsearch takes a long time to warm up and uses a lot of heap memory while importing data, so don't overdo it, especially if you've got a lot of specs.
Make sure your machine has sufficient memory, or Elasticsearch will freeze (we required around 5GB for each testing virtual machine and around 1GB for Elasticsearch itself).

Wrap-up

Elasticsearch describes itself as a "flexible and powerful open source, distributed, real-time search and analytics engine." It's the gold standard in search technologies.

With Chewy, our Rails developers have packaged these benefits as a simple, easy-to-use, production-quality, open source Ruby gem that provides tight integration with Rails. Elasticsearch and Rails: what an awesome combination!

Appendix: Elasticsearch Internals

Here's a very brief introduction to Elasticsearch "under the hood"...

Elasticsearch is built on Lucene, which itself uses inverted indices as its primary data structure. For example, if we have the strings "the dogs jump high", "jump over the fence", and "the fence was too high", we get the following structure:

 "the"    [0, 0], [1, 2], [2, 0]
 "dogs"   [0, 1]
 "jump"   [0, 2], [1, 0]
 "high"   [0, 3], [2, 4]
 "over"   [1, 1]
 "fence"  [1, 3], [2, 1]
 "was"    [2, 2]
 "too"    [2, 3]

Thus, every term contains both references to, and positions in, the text. Furthermore, we choose to modify our terms (e.g., by removing stop words like "the") and apply phonetic hashing to every term (can you guess the algorithm?):

 "DAG"   [0, 1]
 "JANP"  [0, 2], [1, 0]
 "HAG"   [0, 3], [2, 4]
 "OVAR"  [1, 1]
 "FANC"  [1, 3], [2, 1]
 "W"     [2, 2]
 "T"     [2, 3]

If we then query for "the dog jumps", it's analyzed in the same way as the source text, becoming "DAG JANP" after hashing ("dog" has the same hash as "dogs", as is true of "jumps" and "jump").

We also add some logic between the individual words in the string (based on configuration settings), choosing between ("DAG" AND "JANP") or ("DAG" OR "JANP"). The former returns the intersection of [0] & [0, 1] (i.e., document 0) and the latter the union, [0] | [0, 1] (i.e., documents 0 and 1). The in-text positions can be used for scoring and for position-dependent queries.
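
The whole idea fits in a few lines of plain Ruby. The sketch below is deliberately naive: it tokenizes on whitespace, skips stop-word removal and phonetic hashing (so queries must use the exact indexed terms), and stores postings as simple arrays, whereas Lucene's real structures are far more compact. All method names are made up for this illustration.

```ruby
# Builds term => [[document_id, position], ...] from raw strings.
def build_inverted_index(documents)
  index = Hash.new { |hash, term| hash[term] = [] }
  documents.each_with_index do |text, doc_id|
    text.downcase.split.each_with_index do |term, position|
      index[term] << [doc_id, position]   # [document, position in text]
    end
  end
  index
end

# Document IDs that contain the given term.
def documents_for(index, term)
  index.fetch(term, []).map(&:first).uniq
end

def search_all(index, terms)   # AND: intersection of the document lists
  terms.map { |term| documents_for(index, term) }.reduce(:&) || []
end

def search_any(index, terms)   # OR: union of the document lists
  terms.map { |term| documents_for(index, term) }.reduce([], :|)
end

documents = ['the dogs jump high', 'jump over the fence', 'the fence was too high']
index = build_inverted_index(documents)

index['the']                       # [[0, 0], [1, 2], [2, 0]]
search_all(index, %w[dogs jump])   # [0]
search_any(index, %w[dogs jump])   # [0, 1]
```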