Search index

 
itzmie
User
Gender: not specified
Age: 65
Posts: 40
Member since: 05 / 2011
Subject: Re: Search index
Posted: 23.12.2016 - 15:17  ·  #9
Hi Chris,

It's just finished. Took a long time but the search function is working again. Thanks.

Have a nice Christmas too!
cback
Admin
Origin: Saarland
Age: 38
Homepage: cback.net
Posts: 17613
Member since: 12 / 2003
Subject: Re: Search index
Posted: 23.12.2016 - 15:20  ·  #10
itzmie
Posted: 23.12.2016 - 16:02  ·  #11
Hi Chris,

I've got one problem now. I'm hearing a lot of complaints that after clicking Submit, it takes more than 10 seconds to post.
cback
Posted: 23.12.2016 - 16:39  ·  #12
Hi itzmie,

Yeah, that's what I meant when I said you are at the limit of what's possible with a MySQL/PHP-based search index: for every new post, the system has to check which words have to be added to the index, etc. So depending on server load, it can take some processing time to go through an index that big.

But you can't get around that without using a specialized server-side search system as an alternative (or, if the posts are readable by guests, Google Custom Search or something like that). PHP/MySQL is just not made to do that at such a scale.

Sincerely,
Chris
itzmie
Posted: 23.12.2016 - 16:49  ·  #13
cback
Posted: 23.12.2016 - 16:57  ·  #14
Hi itzmie,

No, it was absolutely clear what you meant. ;)

When you submit a new post, the system has to check which words from that specific post are new and have to be added to the index, and which words are already indexed. Based on that, the post is then added to the index as well.

If your search database is too big, the system needs more and more time to query all the words from the index to check which ones to add and which ones are already there.

That's why it takes longer to post something new when your search index is that full. It's a lot of data that has to be moved here, and it's a general problem with PHP- or MySQL-based systems of that size. Even MySQL Fulltext Search is not able to handle that (the CF3 indexing is faster than that, by the way: we first tried to just use MySQL Fulltext but then added our own indexing system because it can handle more, though not endlessly more).
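
To illustrate the per-post check described above, here is a rough sketch. It is not the CF3 indexer: the function names are made up for illustration, and a plain in-memory array stands in for the word table that a MySQL-based index would have to query on every new post.

```php
<?php
// Hypothetical sketch, NOT the CF3 implementation. $wordIds plays the role
// of the word table, $matches the role of the word-to-post link table.

/** Split text into unique lowercase words (ASCII lowercasing only). */
function tokenize(string $text): array
{
    $words = preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

/**
 * Index one post: look up which words are already known, add the new ones,
 * then link every word of the post to the post id.
 * Returns the list of words that were newly added to the index.
 */
function indexPost(array &$wordIds, array &$matches, int $postId, string $text): array
{
    $added = [];
    foreach (tokenize($text) as $word) {
        if (!isset($wordIds[$word])) {
            // Word not indexed yet: assign it the next free id.
            $wordIds[$word] = count($wordIds) + 1;
            $added[] = $word;
        }
        // Every word, old or new, gets linked to this post.
        $matches[] = [$wordIds[$word], $postId];
    }
    return $added;
}

$wordIds = [];
$matches = [];
$first  = indexPost($wordIds, $matches, 1, 'The search index is big');
$second = indexPost($wordIds, $matches, 2, 'The index is slow');
```

After the first post all five words are new; for the second post only "slow" is new, but every word still has to be looked up. In a database-backed index that lookup is a query against the whole word table, which is where the time goes once the table gets very large.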


So it's what I said: you have to set up your own search system on your server to handle the forum search, and once you have that alternative, drop CF3's own search index and just use your external search system.


Unfortunately I haven't worked with these mostly Java-based solutions for your own server, so we don't have any plugins or a handbook for that either. Basically you have to build your own search system and then use a plugin in the CF3 to send search queries to it instead of the built-in search function.

Also, I see that you may not have a stopword list for your language, so there are many more "common" words in the index than there should be. But changing that and rebuilding the index would not solve your problem for long in a forum of that size. Sooner or later you would have to use an external search system anyway.

Sincerely,
Chris
itzmie
Posted: 23.12.2016 - 17:08  ·  #15
Hi Chris,
Aww, okay, that's clear.

We've got a recycle bin for old topics. It's just a hidden forum that is only accessible to moderators. Is it possible to clear out all of the topics within that recycle bin forum?

Kind Regards
cback
Posted: 23.12.2016 - 17:20  ·  #16
Hi itzmie,


I just had another idea in addition to the stopwords:

In ACP General Configuration you can set the "Minimum Word Length for Search Index", which is 2 by default in the CF3. Raising that to 4 or 5 letters might also optimize the index, because fewer words get added, which makes it smaller (and therefore faster).

Also go to your language folder (lang/<your basic forum lang>/search/stopwords.php) and add all the words that are just common in the language used in your forum. The search index can also cause problems if overly common words like "and", "or", "is", "this", "that", etc. are added to it. Nobody searches for words like that, but they bloat your index a lot. If there are posts in different languages in your forum, I would also recommend merging the stopword file with the English one, so all common words from ALL languages used in your forum will be ignored. (For example, here in the CBACK Community it would be the stopwords from German and English combined.) Please also copy that stopword file into ALL languages used in your forum, so it applies to all users no matter which language they individually use. (< sorry, see my next correction post, the CF3 already does this automatically :D)
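
As a rough illustration of how the two settings above shrink the index, here is a small sketch. The function name and the word lists are made up for the example, and the actual format of CF3's stopwords.php is not shown in this thread, so treat this as an assumption about the principle only.

```php
<?php
// Hypothetical sketch: filter candidate words by minimum length and by a
// merged stopword list before they would be added to a search index.

function filterIndexWords(array $words, array $stopwords, int $minLength): array
{
    // Flip the list so each stopword check is a cheap key lookup.
    $stop = array_flip($stopwords);
    $kept = [];
    foreach ($words as $word) {
        // strlen() counts bytes; for words with umlauts mb_strlen() would
        // be the more accurate choice.
        if (strlen($word) < $minLength) {
            continue; // too short for the index
        }
        if (isset($stop[$word])) {
            continue; // common word nobody searches for
        }
        $kept[] = $word;
    }
    return $kept;
}

// Illustrative stopwords for a forum with English and German posts.
$english   = ['and', 'or', 'is', 'this', 'that'];
$german    = ['und', 'oder', 'ist', 'dies', 'das'];
$stopwords = array_values(array_unique(array_merge($english, $german)));

$words = ['this', 'is', 'the', 'search', 'index', 'und', 'that', 'forum'];
$kept  = filterIndexWords($words, $stopwords, 4);
```

With a minimum length of 4 and the merged list, only "search", "index" and "forum" out of the eight words would reach the index.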


But now to the recycle bin forum:
You could exclude that forum too when you rebuild the search index:

Open the file acp/classes/class_control.php

FIND
this code (in the public function indexing() section of that file):
Code

    $DB->set_sql('SELECT COUNT(`post_id`) AS `count` FROM ' . POSTS);
    $DB->execute();
    $temp = $DB->fetch_assoc();
    $posts = $temp['count'];
    $DB->free();
    
    require_once(PATH . 'classes/class_cback_search_indexer.' . EXT);
    $SearchIndexer = new SearchIndexer();
    
    if ( $start == 0 )
    {
      $Core->set_config('board_on', 0, true, false);
      $SearchIndexer->drop_search_index();
    }
    
    $DB->set_sql('SELECT `post_id`, `post_subject`, `post_text` FROM ' . POSTS . ' ORDER BY `post_id` ASC LIMIT :1,:2');
    $DB->execute((int)$start, (int)$step_count);
    while ( $temp = $DB->fetch_assoc() )
    {
      $SearchIndexer->text_indexing($temp['post_text'], $temp['post_subject'], $temp['post_id']);
    }
    $DB->free();



REPLACE the code with the following:
Code

    $DB->set_sql('SELECT COUNT(`post_id`) AS `count` FROM ' . POSTS . ' WHERE `post_depend_forum` NOT IN (1,2,3)');
    $DB->execute();
    $temp = $DB->fetch_assoc();
    $posts = $temp['count'];
    $DB->free();
    
    require_once(PATH . 'classes/class_cback_search_indexer.' . EXT);
    $SearchIndexer = new SearchIndexer();
    
    if ( $start == 0 )
    {
      $Core->set_config('board_on', 0, true, false);
      $SearchIndexer->drop_search_index();
    }
    
    $DB->set_sql('SELECT `post_id`, `post_subject`, `post_text` FROM ' . POSTS . ' WHERE `post_depend_forum` NOT IN (1,2,3) ORDER BY `post_id` ASC LIMIT :1,:2');
    $DB->execute((int)$start, (int)$step_count);
    while ( $temp = $DB->fetch_assoc() )
    {
      $SearchIndexer->text_indexing($temp['post_text'], $temp['post_subject'], $temp['post_id']);
    }
    $DB->free();




As you can see, I added this to both(!!) queries:
WHERE `post_depend_forum` NOT IN (1,2,3)


The 1, 2, 3 stand for all the forum_ids you want to ignore.

If your recycle forum has the ID 20, it would look like this in both cases:

WHERE `post_depend_forum` NOT IN (20)

If you have two forums to ignore, let's say 20 and 73, it looks like this:

WHERE `post_depend_forum` NOT IN (20,73)
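
If you would rather keep the ignored forum_ids in one place instead of typing them into both queries, a small helper along these lines could build the clause. This is only a sketch: buildForumExclusion() is a made-up name and not part of the CF3, and only the column name post_depend_forum is taken from the code above.

```php
<?php
// Hypothetical helper: builds the WHERE fragment used in both indexing
// queries from a single list of forum ids.

function buildForumExclusion(array $forumIds): string
{
    // Cast to int so nothing but plain numbers can end up in the SQL string.
    $ids = array_values(array_unique(array_map('intval', $forumIds)));
    if ($ids === []) {
        return ''; // nothing to ignore: leave the query unfiltered
    }
    return ' WHERE `post_depend_forum` NOT IN (' . implode(',', $ids) . ')';
}

$oneForum  = buildForumExclusion([20]);
$twoForums = buildForumExclusion([20, 73]);
$noForums  = buildForumExclusion([]);
```

The returned string would then be appended after POSTS in the COUNT query and before ORDER BY in the SELECT query, so both always use the same list.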


I hope this helps you a bit with reducing the index size! :)
Unfortunately you have to rebuild the index again after that code change. The index itself doesn't contain the forum_ids.


Sincerely,
Chris