Elasticsearch for Ruby on Rails: A Tutorial to the Chewy Gem

Published: 2022-03-11

Elasticsearch provides a powerful, efficient HTTP interface for indexing and querying data, built on top of the Apache Lucene library. Right out of the box, it provides scalable, efficient, and robust search, with UTF-8 support. It's a powerful tool for indexing and querying massive amounts of structured data, and here at Toptal it powers our platform search and will soon be used for autocompletion as well. We're huge fans.


Since our platform is built using Ruby on Rails, our integration of Elasticsearch takes advantage of the elasticsearch-ruby project (a Ruby integration framework for Elasticsearch that provides a client for connecting to an Elasticsearch cluster, a Ruby API for Elasticsearch's REST API, and various extensions and utilities). Building on this foundation, we developed and released our own improvement (and simplification) of the Elasticsearch application search architecture, packaged as a Ruby gem that we've named Chewy (an example application is also available).

Chewy extends the Elasticsearch-Ruby client, making it more powerful and providing tighter integration with Rails. In this Elasticsearch guide, I discuss (through usage examples) how we accomplished this, including the technical obstacles that emerged during implementation.

[Figure: The relationship between Elasticsearch and Ruby on Rails]

Just a couple of quick notes before proceeding to the guide:

Both Chewy and a Chewy demo application are available on GitHub.
For those interested in more "under the hood" information about Elasticsearch, I've included a brief write-up as an appendix to this post.

Why Chewy?

Despite Elasticsearch's scalability and efficiency, integrating it with Rails didn't turn out to be quite as simple as anticipated. At Toptal, we found ourselves needing to significantly augment the basic Elasticsearch-Ruby client to make it more performant and to support additional operations.


And thus, the Chewy gem was born.

A few particularly noteworthy features of Chewy include:

Every index is observable by all the related models.
Most indexed models are related to each other. And sometimes, it's necessary to denormalize this related data and bind it to the same object (e.g., if you want to index an array of tags together with their associated article). Chewy allows you to specify an updatable index for every model, so corresponding articles will be reindexed whenever a relevant tag is updated.
Index classes are independent from ORM/ODM models.
With this enhancement, implementing cross-model autocompletion, for example, is much easier. You can just define an index and work with it in an object-oriented fashion. Unlike other clients, the Chewy gem removes the need to manually implement index classes, data import callbacks, and other components.
Bulk import is everywhere.
Chewy uses the Elasticsearch bulk API for full reindexing and index updates. It also uses the concept of atomic updates, collecting changed objects within an atomic block and updating them all at once.
Chewy provides an AR-style query DSL.
By being chainable, mergeable, and lazy, this enhancement allows queries to be produced in a more efficient manner.
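To make "chainable, mergeable, and lazy" concrete, here is a small, self-contained sketch of that style of query builder. It is not Chewy's implementation: the `LazyQuery` class and the `bool`-based body it assembles are assumptions chosen for illustration. Each call returns a new immutable object, so partial queries can be reused and merged, and the request body is only assembled at the very end.

```ruby
# Hypothetical, minimal lazy query builder in the spirit of an AR-style DSL.
# Each call returns a new frozen object; nothing runs until #to_body.
class LazyQuery
  attr_reader :queries, :filters

  def initialize(queries = [], filters = [])
    @queries = queries.freeze
    @filters = filters.freeze
    freeze
  end

  # Chainable: returns a new query with one more full-text clause.
  def query(clause)
    LazyQuery.new(queries + [clause], filters)
  end

  # Chainable: returns a new query with one more filter clause.
  def filter(clause)
    LazyQuery.new(queries, filters + [clause])
  end

  # Mergeable: combines clauses of two independently built queries.
  def merge(other)
    LazyQuery.new(queries + other.queries, filters + other.filters)
  end

  # Lazy: the request body is only built here (a real client would send it).
  def to_body
    { query: { bool: { must: queries, filter: filters } } }
  end
end

by_author = LazyQuery.new.query(match: { author: 'Tarantino' })
recent    = LazyQuery.new.filter(range: { year: { gt: 1990 } })
body = by_author.merge(recent).to_body
```

Because `merge` never mutates its operands, `by_author` and `recent` stay reusable building blocks, which is the property the search model later in this guide relies on when it merges several optional scopes.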

OK, so let's see how this all plays out in the gem...

A basic guide to Elasticsearch

Elasticsearch has several document-related concepts. The first is the index (the analogue of a database in an RDBMS), which consists of a set of documents, which can be of several types (where a type is something like an RDBMS table).

Every document has a set of fields. Each field is analyzed independently, and its analysis options are stored in the mapping. Chewy uses this structure "as is" in its object model:

    class EntertainmentIndex < Chewy::Index
      settings analysis: {
        analyzer: {
          title: {
            tokenizer: 'standard',
            filter: ['lowercase', 'asciifolding']
          }
        }
      }

      define_type Book.includes(:author, :tags) do
        field :title, analyzer: 'title'
        field :year, type: 'integer'
        field :author, value: ->{ author.name }
        field :author_id, type: 'integer'
        field :description
        field :tags, index: 'not_analyzed', value: ->{ tags.map(&:name) }
      end

      {movie: Video.movies, cartoon: Video.cartoons}.each do |type_name, scope|
        define_type scope.includes(:director, :tags), name: type_name do
          field :title, analyzer: 'title'
          field :year, type: 'integer'
          field :author, value: ->{ director.name }
          field :author_id, type: 'integer', value: ->{ director_id }
          field :description
          field :tags, index: 'not_analyzed', value: ->{ tags.map(&:name) }
        end
      end
    end

Above, we defined an Elasticsearch index called entertainment with three types: book, movie, and cartoon. For each type, we defined some field mappings, plus a hash of settings for the whole index.
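The `value:` lambdas in the type definition are written as if they ran inside the model object (`author.name`, `director_id`). The sketch below shows one plausible way such a DSL can turn a record into a document hash, by evaluating each lambda with `instance_exec` on the record. `MiniType`, `AuthorRecord`, and `BookRecord` are hypothetical stand-ins rather than Chewy classes, and mapping options such as `analyzer:` are simply ignored here.

```ruby
# Hypothetical, minimal stand-in for a Chewy-style type definition.
# Mapping options such as `analyzer:` or `type:` are accepted but ignored.
class MiniType
  def initialize(&definition)
    @fields = {}
    instance_eval(&definition) # lets the block call `field` directly
  end

  # Register a field; `value:` is an optional zero-argument lambda.
  def field(name, value: nil, **_mapping_options)
    @fields[name] = value
  end

  # Build the document hash for one record. Lambdas run with the record
  # as `self`, so `author.name` means record.author.name.
  def compose(record)
    @fields.each_with_object({}) do |(name, value), doc|
      doc[name] = value ? record.instance_exec(&value) : record.public_send(name)
    end
  end
end

AuthorRecord = Struct.new(:name)
BookRecord = Struct.new(:title, :year, :author, :author_id)

book_type = MiniType.new do
  field :title, analyzer: 'title'
  field :year, type: 'integer'
  field :author, value: ->{ author.name }
  field :author_id, type: 'integer'
end

doc = book_type.compose(BookRecord.new('Pulp Fiction', 1994, AuthorRecord.new('Tarantino'), 7))
# doc == { title: 'Pulp Fiction', year: 1994, author: 'Tarantino', author_id: 7 }
```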

So, we've defined the EntertainmentIndex and we want to execute some queries. As a first step, we need to create the index and import our data:

    EntertainmentIndex.create!
    EntertainmentIndex.import
    # EntertainmentIndex.reset! (which includes deletion,
    # creation, and import) could be used instead

The .import method knows what data to import because we passed in scopes when we defined our types; thus, it will import all the books, movies, and cartoons stored in our persistent storage.
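As noted earlier, Chewy imports through the Elasticsearch bulk API. A minimal sketch of what a bulk request body looks like follows; it only builds the newline-delimited JSON payload and sends nothing. The `bulk_index_body` function and the sample records are hypothetical, and the `_type` metadata reflects the multi-type Elasticsearch versions this article was written against (later releases removed mapping types).

```ruby
require 'json'

# Build an Elasticsearch bulk request body (newline-delimited JSON).
# Each document contributes an action line followed by a source line,
# and the body must end with a newline.
def bulk_index_body(index, type, documents)
  documents.map do |doc|
    action = { index: { _index: index, _type: type, _id: doc.fetch(:id) } }
    source = doc.reject { |key, _| key == :id }
    "#{JSON.generate(action)}\n#{JSON.generate(source)}\n"
  end.join
end

books = [
  { id: 1, title: 'The Great Gatsby', year: 1925 },
  { id: 2, title: 'Dune', year: 1965 }
]

body = bulk_index_body('entertainment', 'book', books)
puts body
```

One bulk request like this replaces one HTTP round trip per document, which is why importing a whole scope stays cheap.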

With that done, we can perform some queries:

    EntertainmentIndex.query(match: {author: 'Tarantino'}).filter{ year > 1990 }
    EntertainmentIndex.query(match: {title: 'Shawshank'}).types(:movie)
    EntertainmentIndex.query(match: {author: 'Tarantino'}).only(:id).limit(10).load
    # the last one loads ActiveRecord objects for documents found

Now our index is almost ready to be used in our search implementation.

Rails integration

For integration with Rails, the first thing we need is to be able to react to changes in RDBMS objects. Chewy supports this behavior via callbacks defined with the update_index class method. update_index takes two arguments:

A type identifier supplied in the "index_name#type_name" format
A method name or block to execute, which represents a back-reference to the updated object or collection of objects
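Conceptually, the second argument is just a way of getting from the changed record to the objects whose documents must be rebuilt. A minimal sketch of that resolution step, assuming plain Ruby structs instead of ActiveRecord models; `objects_to_reindex`, `VideoRecord`, and `DudeRecord` are hypothetical and not part of Chewy's API.

```ruby
# Hypothetical sketch: resolve an update_index-style back-reference into
# the list of objects whose documents need to be rebuilt.
def objects_to_reindex(record, reference)
  result =
    case reference
    when :self then record                          # the record itself
    when Symbol then record.public_send(reference)  # e.g. :books
    when Proc then record.instance_exec(&reference) # block run on the record
    else raise ArgumentError, "unsupported reference: #{reference.inspect}"
    end

  # A nil result means "nothing to update"; a single object becomes a list.
  return [] if result.nil?
  result.is_a?(Array) ? result.compact : [result]
end

VideoRecord = Struct.new(:id, :category) do
  def movie?
    category == :movie
  end
end
DudeRecord = Struct.new(:id, :books, :videos)

movie   = VideoRecord.new(1, :movie)
cartoon = VideoRecord.new(2, :cartoon)
dude    = DudeRecord.new(10, %w[book_a book_b], [movie, cartoon])

only_movies = proc { self if movie? }
objects_to_reindex(movie, only_movies)   # => [movie]
objects_to_reindex(cartoon, only_movies) # => []
objects_to_reindex(dude, :books)         # => ["book_a", "book_b"]
```

Returning `nil` from a block (as `{ self if movie? }` does for a cartoon) is therefore a natural way to say "this change does not affect that type".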

We need to define these callbacks for each dependent model:

    class Book < ActiveRecord::Base
      acts_as_taggable

      belongs_to :author, class_name: 'Dude'
      # We update the book itself on-change
      update_index 'entertainment#book', :self
    end

    class Video < ActiveRecord::Base
      acts_as_taggable

      belongs_to :director, class_name: 'Dude'
      # Update video types when changed, depending on the category
      update_index('entertainment#movie') { self if movie? }
      update_index('entertainment#cartoon') { self if cartoon? }
    end

    class Dude < ActiveRecord::Base
      acts_as_taggable

      # Book and Video point at Dude through author_id and director_id,
      # so the foreign keys must be given explicitly
      has_many :books, foreign_key: :author_id
      has_many :videos, foreign_key: :director_id

      # If author or director was changed, all the corresponding
      # books, movies and cartoons are updated
      update_index 'entertainment#book', :books
      update_index('entertainment#movie') { videos.movies }
      update_index('entertainment#cartoon') { videos.cartoons }
    end

Since tags are also indexed, we next need to monkey-patch some external models so that they react to changes as well:

    ActsAsTaggableOn::Tag.class_eval do
      has_many :books, through: :taggings, source: :taggable, source_type: 'Book'
      has_many :videos, through: :taggings, source: :taggable, source_type: 'Video'

      # Updating all tag-related objects
      update_index 'entertainment#book', :books
      update_index('entertainment#movie') { videos.movies }
      update_index('entertainment#cartoon') { videos.cartoons }
    end

    ActsAsTaggableOn::Tagging.class_eval do
      # Same goes for the intermediate model
      update_index('entertainment#book') { taggable if taggable_type == 'Book' }
      update_index('entertainment#movie') { taggable if taggable_type == 'Video' && taggable.movie? }
      update_index('entertainment#cartoon') { taggable if taggable_type == 'Video' && taggable.cartoon? }
    end

At this point, every object save or destroy will update the corresponding Elasticsearch index type.

Atomicity

We still have one lingering problem. If we do something like books.map(&:save) to save multiple books, we'll request an update of the entertainment index every time an individual book is saved. Thus, if we save five books, we'll update the Chewy index five times. This behavior is acceptable in a REPL, but certainly not in controller actions where performance is critical.

We address this issue with the Chewy.atomic block:

    class ApplicationController < ActionController::Base
      around_action { |&block| Chewy.atomic(&block) }
    end

In short, Chewy.atomic batches these updates as follows:

It disables the after_save callback.
It collects the IDs of the saved books.
On completion of the Chewy.atomic block, it uses the collected IDs to make a single Elasticsearch index update request.
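The same idea can be shown without Rails or Elasticsearch. This is a minimal sketch of atomic batching, assuming a hypothetical `AtomicIndexer` whose `record_saved` method stands in for the `after_save` hook; it records requests in an array instead of sending them, and it is not Chewy's implementation.

```ruby
# Hypothetical sketch of atomic batching (not Chewy's implementation).
# Outside an atomic block, each save produces its own index request;
# inside one, IDs are collected and sent as a single request at the end.
class AtomicIndexer
  attr_reader :requests

  def initialize
    @requests = [] # stands in for requests sent to Elasticsearch
    @pending = nil # nil means "not inside an atomic block"
  end

  # Would be called from a model's after_save hook.
  def record_saved(type, id)
    if @pending
      (@pending[type] ||= []) << id
    else
      @requests << { type => [id] }
    end
  end

  def atomic
    return yield if @pending # nested block: the outermost one flushes
    @pending = {}
    begin
      yield
    ensure
      # Flush even if the block raised: records saved before the error
      # still need their documents updated.
      batch, @pending = @pending, nil
      @requests << batch.transform_values(&:uniq) unless batch.empty?
    end
  end
end

naive = AtomicIndexer.new
(1..5).each { |id| naive.record_saved('entertainment#book', id) }

batched = AtomicIndexer.new
batched.atomic do
  (1..5).each { |id| batched.record_saved('entertainment#book', id) }
  batched.record_saved('entertainment#book', 3) # saved twice, sent once
end
# naive.requests.size == 5
# batched.requests    == [{ 'entertainment#book' => [1, 2, 3, 4, 5] }]
```

Wrapping every controller action in such a block, as the `around_action` above does, means a request that saves many records costs one index update per type instead of one per save.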

Searching

Now we're ready to implement a search interface. Since our user interface is a form, the best way to build it is, of course, with FormBuilder and ActiveModel. (At Toptal, we use ActiveData to implement ActiveModel interfaces, but feel free to use your favorite gem.)

    class EntertainmentSearch
      include ActiveData::Model

      attribute :query, type: String
      attribute :author_id, type: Integer
      attribute :min_year, type: Integer
      attribute :max_year, type: Integer
      attribute :tags, mode: :arrayed, type: String,
        normalize: ->(value) { value.reject(&:blank?) }

      # This accessor is for the form. It will have a single text field
      # for comma-separated tag inputs.
      def tag_list= value
        self.tags = value.split(',').map(&:strip)
      end

      def tag_list
        self.tags.join(', ')
      end
    end
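The accessor pair converts between the single comma-separated text field and the `tags` array. Here is the same round trip in plain Ruby, so it can be tried without ActiveData; `TagListForm` is a hypothetical class, and rejecting blank entries in `tags=` imitates the `normalize:` lambda above.

```ruby
# Plain-Ruby stand-in for the tag accessors above (no ActiveData involved).
class TagListForm
  attr_reader :tags

  def initialize
    @tags = []
  end

  # Mirrors the `normalize:` option: blank entries are dropped.
  def tags=(values)
    @tags = values.reject { |v| v.nil? || v.strip.empty? }
  end

  # Parses the comma-separated form input.
  def tag_list=(value)
    self.tags = value.split(',').map(&:strip)
  end

  # Renders the array back into the form field.
  def tag_list
    tags.join(', ')
  end
end

form = TagListForm.new
form.tag_list = 'drama, ,  crime ,'
form.tags     # => ["drama", "crime"]
form.tag_list # => "drama, crime"
```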

Query and filter tutorial

Now that we have an ActiveModel-like object that can accept and typecast attributes, let's implement the search:

    class EntertainmentSearch
      ...

      def index
        EntertainmentIndex
      end

      def search
        # We can merge multiple scopes
        [query_string, author_id_filter, year_filter, tags_filter].compact.reduce(:merge)
      end

      # Using query_string advanced query for the main query input
      def query_string
        index.query(query_string: {fields: [:title, :author, :description],
                                   query: query, default_operator: 'and'}) if query?
      end

      # Simple term filter for author id. `:author_id` is already
      # typecasted to integer and ignored if empty.
      def author_id_filter
        index.filter(term: {author_id: author_id}) if author_id?
      end

      # For filtering on years, we will use range filter.
      # Returns nil if neither min_year nor max_year is passed to the model.
      def year_filter
        body = {}.tap do |range|
          range.merge!(gte: min_year) if min_year?
          range.merge!(lte: max_year) if max_year?
        end
        index.filter(range: {year: body}) if body.present?
      end

      # Same as `author_id_filter`, but using the `terms` filter.
      # Returns nil if no tags passed in.
      def tags_filter
        index.filter(terms: {tags: tags}) if tags?
      end
    end
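Each helper above returns either a scope or `nil`, and `compact` drops the `nil`s before merging, so empty form fields simply add no clauses. A plain-Ruby sketch of that pattern, producing raw filter clause hashes instead of Chewy scopes; the method names `year_range` and `search_filters` are hypothetical.

```ruby
# Hypothetical plain-Ruby mirror of the filter helpers above: each piece
# returns a clause hash or nil, and nils are dropped before combining.
def year_range(min_year, max_year)
  bounds = {}
  bounds[:gte] = min_year if min_year
  bounds[:lte] = max_year if max_year
  { range: { year: bounds } } unless bounds.empty?
end

def search_filters(author_id: nil, min_year: nil, max_year: nil, tags: [])
  [
    ({ term: { author_id: author_id } } if author_id),
    year_range(min_year, max_year),
    ({ terms: { tags: tags } } unless tags.empty?)
  ].compact
end

search_filters(min_year: 1990)
# => [{ range: { year: { gte: 1990 } } }]
search_filters(author_id: 7, tags: ['crime'])
# => [{ term: { author_id: 7 } }, { terms: { tags: ['crime'] } }]
search_filters
# => []
```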

Controllers and views

At this point, our model can perform search requests with the attributes passed to it. Usage will look something like this:

 EntertainmentSearch.new(query: 'Tarantino', min_year: 1990).search

Note that in the controller, we want to load the actual ActiveRecord objects instead of Chewy document wrappers:

    class EntertainmentController < ApplicationController
      def index
        @search = EntertainmentSearch.new(params[:search])
        # In case we want to load real objects, we don't need any other
        # fields except for `:id` retrieved from Elasticsearch index.
        # Chewy query DSL supports Kaminari gem and corresponding API.
        # Also, we pass scopes for every requested type to the `load` method.
        @entertainments = @search.search.only(:id).page(params[:page]).load(
          book: {scope: Book.includes(:author)},
          movie: {scope: Video.includes(:director)},
          cartoon: {scope: Video.includes(:director)}
        )
      end
    end
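Passing a scope per type means the loader can fetch each type's records in one database query and then restore the order Elasticsearch returned. A rough plain-Ruby sketch of that idea, assuming hits are `{ type:, id: }` hashes and each "scope" is a hypothetical lambda that fetches records by ID; this illustrates the approach and is not how Chewy's `load` is actually written.

```ruby
# Hypothetical sketch of per-type loading. Hits arrive in relevance order;
# each loader stands in for a per-type scope that fetches records by ID.
def load_hits(hits, loaders)
  # One fetch per type, keyed by [type, id] for lookup.
  loaded = hits.group_by { |hit| hit[:type] }.flat_map { |type, group|
    ids = group.map { |hit| hit[:id] }
    loaders.fetch(type).call(ids).map { |record| [[type, record[:id]], record] }
  }.to_h

  # Restore the original hit order; records missing from storage are skipped.
  hits.map { |hit| loaded[[hit[:type], hit[:id]]] }.compact
end

books  = { 1 => { id: 1, title: 'Rum Punch' } }
videos = { 4 => { id: 4, title: 'Pulp Fiction' }, 5 => { id: 5, title: 'Jackie Brown' } }

loaders = {
  book:  ->(ids) { books.values_at(*ids).compact },
  movie: ->(ids) { videos.values_at(*ids).compact }
}

hits = [{ type: :movie, id: 5 }, { type: :book, id: 1 }, { type: :movie, id: 4 }]
titles = load_hits(hits, loaders).map { |record| record[:title] }
# titles == ["Jackie Brown", "Rum Punch", "Pulp Fiction"]
```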

Now, it's time to write some HAML in entertainment/index.html.haml:

    = form_for @search, as: :search, url: entertainment_index_path, method: :get do |f|
      = f.text_field :query
      = f.select :author_id, Dude.all.map { |d| [d.name, d.id] }, include_blank: true
      = f.text_field :min_year
      = f.text_field :max_year
      = f.text_field :tag_list
      = f.submit

    - if @entertainments.any?
      %dl
        - @entertainments.each do |entertainment|
          %dt
            %h1= entertainment.title
            %strong= entertainment.class
          %dd
            %p= entertainment.year
            %p= entertainment.description
            %p= entertainment.tag_list
      = paginate @entertainments
    - else
      Nothing to see here

Sorting

As a bonus, we'll also add sorting to our search functionality.

Assume that we need to sort on the title and year fields, as well as by relevance. Unfortunately, the title One Flew Over the Cuckoo's Nest will be split into individual terms, so sorting by these disparate terms would be too random; instead, we'd like to sort by the entire title.

The solution is to use a special title field with its own analyzer:

    class EntertainmentIndex < Chewy::Index
      settings analysis: {
        analyzer: {
          ...
          sorted: {
            # `keyword` tokenizer will not split our titles and
            # will produce the whole phrase as the term, which
            # can be sorted easily
            tokenizer: 'keyword',
            filter: ['lowercase', 'asciifolding']
          }
        }
      }

      define_type Book.includes(:author, :tags) do
        # We use the `multi_field` type to add a `title.sorted` field
        # to the type mapping. We will still use just the `title`
        # field for search.
        field :title, type: 'multi_field' do
          field :title, index: 'analyzed', analyzer: 'title'
          field :sorted, index: 'analyzed', analyzer: 'sorted'
        end
        ...
      end

      {movie: Video.movies, cartoon: Video.cartoons}.each do |type_name, scope|
        define_type scope.includes(:director, :tags), name: type_name do
          # For videos as well
          field :title, type: 'multi_field' do
            field :title, index: 'analyzed', analyzer: 'title'
            field :sorted, index: 'analyzed', analyzer: 'sorted'
          end
          ...
        end
      end
    end
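The effect of the `sorted` analyzer can be approximated in plain Ruby. In this sketch the two helpers are hypothetical stand-ins: `title_terms` roughly imitates the `title` analyzer (word split plus lowercasing, ignoring ASCII folding), and `sorted_term` imitates the keyword tokenizer by keeping the whole lowercased title as one term. For the token-based sort, it assumes each document is keyed by its smallest term, which is how Elasticsearch picks a value from a multi-valued field in an ascending sort by default.

```ruby
# Rough approximations of the two analyzers (not Elasticsearch's real code).
def title_terms(title)
  title.downcase.scan(/[[:alnum:]']+/) # split into individual words
end

def sorted_term(title)
  title.downcase # keyword tokenizer: the whole title is a single term
end

titles = ["One Flew Over the Cuckoo's Nest", 'Jaws', 'The Godfather']

title_terms(titles.first)
# => ["one", "flew", "over", "the", "cuckoo's", "nest"]

# Sorting on split terms: each title is keyed by its smallest word.
by_tokens = titles.sort_by { |t| title_terms(t).min }
# => ["One Flew Over the Cuckoo's Nest", "The Godfather", "Jaws"]

# Sorting on the keyword-analyzed field: alphabetical by full title.
by_title = titles.sort_by { |t| sorted_term(t) }
# => ["Jaws", "One Flew Over the Cuckoo's Nest", "The Godfather"]
```

The token-based order depends on whichever word happens to be smallest ("cuckoo's", "godfather"), which is the "too random" behavior described above.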

In addition, we're going to add both the new sort attribute and the sort processing step to our search model:

    class EntertainmentSearch
      # we are going to use `title.sorted` field for sort
      SORT = {title: {'title.sorted' => :asc}, year: {year: :desc}, relevance: :_score}
      ...
      attribute :sort, type: String, enum: %w(title year relevance),
        default_blank: 'relevance'
      ...

      def search
        # we have added `sorting` scope to merge list
        [query_string, author_id_filter, year_filter,
         tags_filter, sorting].compact.reduce(:merge)
      end

      def sorting
        # We have one of the 3 possible values in `sort` attribute
        # and `SORT` mapping returns actual sorting expression
        index.order(SORT[sort.to_sym])
      end
    end

Finally, we'll modify our form by adding a select box for the sort options:

    = form_for @search, as: :search, url: entertainment_index_path, method: :get do |f|
      ...
      / `EntertainmentSearch.sort_values` will just return
      / enum option content from the sort attribute definition.
      = f.select :sort, EntertainmentSearch.sort_values
      ...

Error handling

If users perform incorrect queries such as "(" or "AND", the Elasticsearch client will raise an error. To handle that, let's make some changes to our controller:

    class EntertainmentController < ApplicationController
      def index
        @search = EntertainmentSearch.new(params[:search])
        @entertainments = @search.search.only(:id).page(params[:page]).load(
          book: {scope: Book.includes(:author)},
          movie: {scope: Video.includes(:director)},
          cartoon: {scope: Video.includes(:director)}
        )
      rescue Elasticsearch::Transport::Transport::Errors::BadRequest => e
        @entertainments = []
        @error = e.message.match(/QueryParsingException\[([^;]+)\]/).try(:[], 1)
      end
    end
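The regular expression pulls the human-readable part out of the exception text, and `try(:[], 1)` returns `nil` when nothing matches. A dependency-free version of that extraction follows; `query_error` is a hypothetical helper, and the sample string is invented for illustration, since the exact message format depends on the Elasticsearch version.

```ruby
# Extract the QueryParsingException detail from an error message, or nil.
# `[^;]+` stops at the next ';', and backtracking leaves the final ']'
# to close the match.
def query_error(message)
  match = message.match(/QueryParsingException\[([^;]+)\]/)
  match && match[1]
end

# Invented sample in the style of older Elasticsearch error messages.
sample = 'nested: QueryParsingException[[entertainment] Failed to parse query [(]]; nested: ...'

query_error(sample)               # => "[entertainment] Failed to parse query [(]"
query_error('connection refused') # => nil
```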

Furthermore, we need to render the error in the view:

    ...
    - if @entertainments.any?
      ...
    - else
      - if @error
        = @error
      - else
        Nothing to see here

Testing Elasticsearch queries

The basic testing setup is as follows:

1. Spin up the Elasticsearch server.
2. Clean up and create our indices.
3. Import our data.
4. Perform our query.
5. Cross-reference the result with our expectations.

For step 1, it's convenient to use the test cluster defined in the elasticsearch-extensions gem. Just add the following line to your project's Rakefile once the gem is installed:

 require 'elasticsearch/extensions/test/cluster/tasks'

Then, you'll get the following Rake tasks:

    $ rake -T elasticsearch
    rake elasticsearch:start  # Start Elasticsearch cluster for tests
    rake elasticsearch:stop   # Stop Elasticsearch cluster for tests

Elasticsearch and RSpec

First, we need to make sure that our index is updated to stay in sync with changes to our data. Luckily, the Chewy gem comes with the handy update_index RSpec matcher:

    describe EntertainmentIndex do
      # No need to cleanup Elasticsearch as requests are
      # stubbed in case of `update_index` matcher usage.
      describe 'Tag' do
        # We create several books with the same tag
        let(:books) { create_list :book, 2, tag_list: 'tag1' }

        specify do
          # We expect that after modifying the tag name...
          expect do
            # `where` returns a relation, so fetch the tag record itself
            # and update it (this also runs its update_index callbacks)
            ActsAsTaggableOn::Tag.find_by(name: 'tag1').update(name: 'tag2')
          # ... the corresponding type will be updated with previously-created books.
          end.to update_index('entertainment#book').and_reindex(books, with: {tags: ['tag2']})
        end
      end
    end

Next, we need to test that the actual search queries are performed properly and that they return the expected results:

    describe EntertainmentSearch do
      # Just defining helpers for simplifying testing
      def search attributes = {}
        EntertainmentSearch.new(attributes).search
      end

      # Import helper as well
      def import *args
        # We are using `import!` here to be sure all the objects are imported
        # correctly before examples run.
        EntertainmentIndex.import! *args
      end

      # Deletes and recreates index before every example
      before { EntertainmentIndex.purge! }

      describe '#min_year, #max_year' do
        let(:book) { create(:book, year: 1925) }
        let(:movie) { create(:movie, year: 1970) }
        let(:cartoon) { create(:cartoon, year: 1995) }
        before { import book: book, movie: movie, cartoon: cartoon }

        # NOTE: The sample code below provides a clear usage example but is not
        # optimized code. Something along the following lines would perform better:
        # `specify { search(min_year: 1970).map(&:id).map(&:to_i)
        #   .should =~ [movie, cartoon].map(&:id) }`
        specify { search(min_year: 1970).load.should =~ [movie, cartoon] }
        specify { search(max_year: 1980).load.should =~ [book, movie] }
        specify { search(min_year: 1970, max_year: 1980).load.should == [movie] }
        specify { search(min_year: 1980, max_year: 1970).should == [] }
      end
    end

Troubleshooting the test cluster

Finally, here is a guide to troubleshooting your test cluster:

To start, use an in-memory, one-node cluster. It will be much faster for specs. In our case: TEST_CLUSTER_NODES=1 rake elasticsearch:start
There are some existing issues with the elasticsearch-extensions test cluster implementation itself, related to the one-node cluster status check (the status is yellow in some cases and will never turn green, so the green-status cluster start check will fail every time). The issue has been fixed in a fork, but hopefully it will be fixed in the main repo soon.
For each dataset, group your requests in specs (i.e., import your data once and then perform several requests). Elasticsearch takes a long time to warm up and uses a lot of heap memory while importing data, so don't overdo it, especially if you've got a bunch of specs.
Make sure your machine has sufficient memory, or Elasticsearch will freeze (we required around 5GB for each testing virtual machine and around 1GB for Elasticsearch itself).

Wrapping up

Elasticsearch describes itself as "a flexible and powerful open source, distributed, real-time search and analytics engine." It's the gold standard in search technologies.

With Chewy, our Rails developers have packaged these benefits as a simple, easy-to-use, production-quality, open source Ruby gem that provides tight integration with Rails. Elasticsearch and Rails: what an awesome combination!


Appendix: Elasticsearch internals

Here's a very brief introduction to Elasticsearch "under the hood"...

Elasticsearch is built on top of Lucene, which itself uses inverted indices as its primary data structure. For example, if we have the strings "the dogs jump high", "jump over the fence", and "the fence was too high", we get the following structure (each entry is a [document, position] pair):

    "the"    [0, 0], [1, 2], [2, 0]
    "dogs"   [0, 1]
    "jump"   [0, 2], [1, 0]
    "high"   [0, 3], [2, 4]
    "over"   [1, 1]
    "fence"  [1, 3], [2, 1]
    "was"    [2, 2]
    "too"    [2, 3]

Thus, every term contains both references to, and positions in, the text. Furthermore, we choose to modify our terms (e.g., by removing stop words like "the") and apply phonetic hashing to every term (can you guess the algorithm?):

    "DAG"   [0, 1]
    "JANP"  [0, 2], [1, 0]
    "HAG"   [0, 3], [2, 4]
    "OVAR"  [1, 1]
    "FANC"  [1, 3], [2, 1]
    "W"     [2, 2]
    "T"     [2, 3]

If we then query for "the dog jumps", it's analyzed the same way as the source text, becoming "DAG JANP" after hashing ("dog" has the same hash as "dogs", as is true of "jumps" and "jump").
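The positional index at the start of this appendix takes only a few lines of Ruby to reproduce. This sketch assumes a deliberately naive analyzer (lowercase, then split on whitespace) and does not attempt the phonetic hashing; `analyze` and `inverted_index` are hypothetical names, and Lucene's real analysis chain and storage format are far more involved. Dropping stop words keeps the original positions, as in the hashed table, and the last lines show why hashing matters for the query.

```ruby
# Minimal positional inverted index: term => [[document, position], ...].
# The "analyzer" is deliberately naive: lowercase, then split on spaces.
def analyze(text)
  text.downcase.split
end

def inverted_index(docs, stop_words: [])
  index = {}
  docs.each_with_index do |text, doc|
    analyze(text).each_with_index do |term, position|
      next if stop_words.include?(term) # dropped, but positions are kept
      (index[term] ||= []) << [doc, position]
    end
  end
  index
end

docs = ['The dogs jump high', 'Jump over the fence', 'The fence was too high']

full = inverted_index(docs)
# full['the']   == [[0, 0], [1, 2], [2, 0]]
# full['fence'] == [[1, 3], [2, 1]]

no_stop_words = inverted_index(docs, stop_words: ['the'])
# no_stop_words['dogs'] == [[0, 1]]  (original position preserved)

# The same analysis is applied to the query before lookup.
query_terms = analyze('the dog jumps') - ['the']
# query_terms == ["dog", "jumps"]; without phonetic hashing, "dog" and
# "dogs" are different terms, so no_stop_words has no entry for "dog".
```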

We also add some logic between the individual words in the string (based on configuration settings), choosing between ("DAG" AND "JANP") or ("DAG" OR "JANP"). The former returns the intersection [0] & [0, 1] (i.e., document 0) and the latter the union [0] | [0, 1] (i.e., documents 0 and 1). The in-text positions can be used for scoring results and for position-dependent queries.
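The boolean combinations reduce to set operations over each term's document list, and the stored positions make a phrase check possible. A small sketch, with the postings for "DAG" and "JANP" copied from the hashed table above; the method names are hypothetical, and this is not how Lucene evaluates queries internally.

```ruby
# Postings after stop-word removal and hashing: term => [[doc, position], ...]
POSTINGS = {
  'DAG'  => [[0, 1]],
  'JANP' => [[0, 2], [1, 0]]
}.freeze

def postings(term)
  POSTINGS.fetch(term, [])
end

def docs_for(term)
  postings(term).map(&:first).uniq
end

# ("DAG" AND "JANP"): documents containing every term.
def match_all(terms)
  terms.map { |term| docs_for(term) }.reduce(:&) || []
end

# ("DAG" OR "JANP"): documents containing at least one term.
def match_any(terms)
  terms.map { |term| docs_for(term) }.reduce(:|) || []
end

# Position-aware variant: the terms must appear consecutively, in order.
def match_phrase(terms)
  first, *rest = terms
  postings(first).select { |doc, pos|
    rest.each_with_index.all? { |term, i| postings(term).include?([doc, pos + i + 1]) }
  }.map(&:first).uniq
end

query = %w[DAG JANP]
match_all(query)    # => [0]
match_any(query)    # => [0, 1]
match_phrase(query) # => [0]  ("dogs jump" appears in document 0)
```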