Continuous Crawl SharePoint Doesn't Remove Deleted Site Content

SharePoint 2013: Continuous Crawl and the Difference Between Incremental and Continuous Crawl

SharePoint 2013 introduced a new type of crawl named "Continuous Crawl".

For old-school admins like me: in SharePoint 2010 we had two types of crawl available, both configurable on the Search Service Application.

  • Full: crawls all content,
  • Incremental: as the name says, crawls only content that has been modified since the last crawl.

The disadvantage of these crawls is that, once one is launched, you cannot launch a second crawl in parallel on the same content source. Content changed in the meantime has to wait until the current crawl finishes and another one runs before it is integrated into the index and can be found via search.

An example:

  • An incremental crawl named ALFA is started and will take 50 minutes,
  • After 10 minutes of crawling a new document is added, so we need a second incremental crawl named BETA to get the document into the index,
  • BETA cannot start until ALFA finishes, so this item will have to wait at least 40 minutes to be integrated into the index.

So we can't keep the index up to date with the latest changes, because every crawl introduces latency.

In most cases this behavior may be perfectly acceptable for your clients, but for those who want to search their content immediately, or shortly after it is added to SharePoint, there is now a new solution in SharePoint 2013: "Continuous Crawl".

The Continuous Crawl

To summarize: the "Continuous Crawl" is a type of crawl that aims to keep the index as current as possible.

Its operation is simple: once activated, it launches a crawl at regular intervals. The major difference from an incremental crawl is that crawls can run in parallel: a new crawl does not wait for the previous one to complete before it starts.

Important Points:

  • "Continuous Crawl" is only available for content sources of type "SharePoint Sites",
  • By default, a new crawl is started every 15 minutes, but the SharePoint administrator can change this interval using the PowerShell cmdlet Set-SPEnterpriseSearchCrawlContentSource,
  • Once started, a "Continuous Crawl" can't be paused or stopped; you can only disable it.
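
A minimal sketch of changing the interval, run from the SharePoint 2013 Management Shell. It assumes a farm with a single Search service application and uses the ContinuousCrawlInterval property described in Microsoft's guidance on managing continuous crawls; the value 5 is only an example.

```powershell
# Assumption: exactly one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Set the continuous crawl interval, in minutes (the default is 15).
# 5 is an illustrative value, not a recommendation.
$ssa.SetProperty("ContinuousCrawlInterval", 5)
```

A shorter interval keeps the index fresher but sends requests to the crawled servers more often, so it is worth testing before changing it in production.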

If we take our example above with "Continuous Crawl":

  •  Our ALFA crawl starts and will take at least 50 minutes,
  •  After 10 minutes of crawling, an item that has already been crawled is modified and requires a new crawl,
  •  Crawl "BETA" is scheduled for the next interval,
  •  Crawl "BETA" starts in (15 - 10) = 5 minutes, without waiting for ALFA to finish,
  •  Therefore this item only needs to wait about 5 minutes (instead of at least 40 minutes) before BETA picks it up for the index.
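
The arithmetic behind the two scenarios can be sketched in a few lines of PowerShell. The variable names are illustrative, the numbers come from the ALFA/BETA example, and the sketch assumes that continuous crawls start on fixed 15-minute boundaries counted from the start of ALFA.

```powershell
# Timeline from the example, in minutes after crawl ALFA starts.
$alfaDuration  = 50   # ALFA runs for 50 minutes
$changeAt      = 10   # the item is added or modified 10 minutes in
$crawlInterval = 15   # default continuous crawl interval

# Incremental: BETA cannot start until ALFA has finished.
$incrementalWait = $alfaDuration - $changeAt

# Continuous: BETA starts at the next interval boundary after the change,
# even though ALFA is still running.
$nextStart      = [math]::Ceiling($changeAt / $crawlInterval) * $crawlInterval
$continuousWait = $nextStart - $changeAt

"Incremental: BETA starts after $incrementalWait minutes"
"Continuous:  BETA starts after $continuousWait minutes"
```

With these values the incremental wait is 40 minutes and the continuous wait is 5 minutes. In both cases the time BETA itself needs to process the item comes on top.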

1 - How to enable it?

In Central Administration, click "Search Service Application", then click "Content Sources" in the menu.

Click "New Content Source" in the menu.

Choose "SharePoint Sites".

Select "Enable Continuous Crawls".

  • Once the content source has been created, its status is shown as "Crawling Continuous".
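
The same setting can also be switched on from the SharePoint 2013 Management Shell. This is a sketch under assumptions: the farm has a single Search service application, and the content source is the default one named "Local SharePoint sites" (replace the name with your own).

```powershell
# Assumption: exactly one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Assumption: the content source name; adjust it to match your farm.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

# Continuous crawls only apply to content sources of type "SharePoint Sites".
if ($cs.Type -eq "SharePoint") {
    $cs.EnableContinuousCrawls = $true
    $cs.Update()   # persist the change
}
```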

2 - How to disable it?

  • From the content source page, choose the "Enable Incremental Crawls" option. This disables the continuous crawl.
  • Save changes.
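
A PowerShell equivalent, sketched under the same assumptions as any script run from the SharePoint 2013 Management Shell: a single Search service application and a content source named "Local SharePoint sites" (an example name).

```powershell
# Assumption: exactly one Search service application exists in the farm.
$ssa = Get-SPEnterpriseSearchServiceApplication

# Assumption: the content source name; adjust it to match your farm.
$cs = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa -Identity "Local SharePoint sites"

# Turn continuous crawls off for this content source and save the change.
$cs.EnableContinuousCrawls = $false
$cs.Update()
```

After disabling it, check the content source page to confirm that the crawl schedules are set the way you expect.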

3 - How to see if it works?

  • Click on your Search service application, then "Crawl Log" in the "Diagnostics" section.
  • Select your Content Source and click on "View crawl history".
  • Or, via PowerShell, run the following cmdlets from the SharePoint 2013 Management Shell (replace "Search Service" with the name of your Search service application):
    • $SearchSA = "Search Service"
    • Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $SearchSA | select *

Impact on our Servers

The impact of a "Continuous Crawl" is the same as an "Incremental Crawl".

Even when crawls run in parallel, the "Continuous Crawl" stays within the limits defined in the "Crawler Impact Rule", which controls the maximum number of simultaneous requests (default 8).

4 - SharePoint Online

This feature is available in SharePoint Online 2013 (Office 365). You can read about it here: http://technet.microsoft.com/en-us/library/jj819291.aspx

Comments

  • Nice article, I think it answered an important question which I was looking for

  • In reading this article, which is 100% correct, for an incremental to regularly take that long it seems you have a bottleneck? We have over 20 million items indexed, but we split up everything with multiple content sources (30 of them to be more exact). The continuous crawl for us takes about 15 minutes to complete for the largest content source we have. Might that be a better way to split out the crawler resources?

Source: https://social.technet.microsoft.com/wiki/contents/articles/15571.sharepoint-2013-continuous-crawl-and-the-difference-between-incremental-and-continuous-crawl.aspx
