Fastest strpos() check?..

Comments in 'Plugin Development' started by dxm_hippie, Jul 24, 2016.

  1. dxm_hippie
    Offline

    dxm_hippie Active Member Plugin Developer

    Joined:
    Feb 1, 2015
    Posts:
    413
    Plugins:
    1
    Minecraft User:
    XxDXM_hippiexX
    I am looking for a fast way to check a message for phrases. The problem is that if I were to use a
    PHP:
    foreach($bad_words_and_phrases as $key){
      
    }
    I feel that would cause a lot of lag. I currently store single words in an object so I can use

    PHP:
    public function isWordBlocked($word): bool
    {
        return isset($this->blocked_words[$word]);
    }
    I have almost 600 bad/inappropriate words blocked. Running a loop so I can also block phrases, with

    PHP:
    foreach($bad_words_and_phrases as $inappropriate){
        if(strpos($msg, $inappropriate) === true)
        {
            return true;
        }
    }
    return false;
    doesn't feel like a smart idea. Is there a faster way to do this? Maybe store all bad words and phrases as a single string so I don't have to run the loop?
    SOFe likes this.
  2. Kvetinac97
    Offline

    Kvetinac97 Active Member Plugin Developer

    Joined:
    Nov 17, 2014
    Posts:
    276
    Plugins:
    1
    Minecraft User:
    Kvetinac97
    I think that this is currently the fastest variant.
    Or, if you want, you can use something like:

    PHP:
      $msg = str_replace("rudeword", "*****", str_replace....
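    Nesting the calls is not strictly needed, since str_replace() and str_ireplace() accept an array of search strings. A minimal sketch, assuming a hypothetical $badWords list and a hypothetical censorMessage() helper (neither appears in this thread):

```php
<?php
// Hypothetical word list, for illustration only.
$badWords = ["rudeword", "another rude phrase"];

/**
 * Replace every listed word or phrase with asterisks, case-insensitively.
 * censorMessage() is an illustrative helper, not part of any plugin API.
 */
function censorMessage(string $msg, array $badWords): string
{
    // str_ireplace() takes an array of needles and a single replacement string.
    return str_ireplace($badWords, "*****", $msg);
}

echo censorMessage("You RudeWord!", $badWords), PHP_EOL; // prints "You *****!"
```

    This still scans the message once per needle internally, so it tidies the code rather than changing the cost.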
  3. SOFe
    Offline

    SOFe Banned

    Joined:
    May 28, 2016
    Posts:
    386
    Minecraft User:
    Herobrine
    As a side note, strpos() only returns an integer (0 <= strpos($string, $needle) < strlen($string)) or boolean false. It will never return true, so you shouldn't use === true to check it. !== false doesn't necessarily mean === true.
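    A minimal sketch of the check written with !== false, assuming a hypothetical containsBlocked() helper (the name is illustrative, not from the original post):

```php
<?php
/**
 * Return true if $msg contains any of the given words or phrases.
 * containsBlocked() is a hypothetical name used for illustration.
 */
function containsBlocked(string $msg, array $needles): bool
{
    foreach ($needles as $needle) {
        // strpos() returns an int offset (possibly 0) or false, never true.
        if (strpos($msg, $needle) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(strpos("badword here", "badword"));            // int(0): a match at offset 0
var_dump(strpos("badword here", "badword") === true);   // bool(false)
var_dump(containsBlocked("badword here", ["badword"])); // bool(true)
```

    Note that a match at offset 0 is also falsy under a loose comparison, which is why the strict !== false form is the usual idiom.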
    I expect it to be even slower. If it is faster, there must be something terribly wrong with PHP.

    Though, I agree that it is the fastest variant already. Let's break it down to the lowest layer of logic, setting aside that it is PHP and that it all runs on the same thread.
    What you are doing:
    1. Iterating over an array of needles.
    2. Calling strpos() once for each needle.
    And what happens in step 2:
    1. Iterating over the haystack, from position 0 to position strlen($haystack) - strlen($needle).
    2. Retrieving the substring of the haystack that starts at the current position and has the length of the needle.
    3. Comparing the substring against the needle.
    So, assuming that haystack is much longer than the needle (such that needle length is negligible in this calculation), you basically need to retrieve substrings of needle length from the haystack strlen($haystack) * count($needles) times. This is probably avoidable.
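    The steps above can be sketched as a naive search that counts its substring comparisons. This is only an illustration: naiveCountComparisons() is a hypothetical function, and PHP's real strpos() is implemented in C and may be optimised differently.

```php
<?php
/**
 * Naive search over all needles, counting substring comparisons.
 * Illustrative only; the nested loops mirror the steps listed above.
 */
function naiveCountComparisons(string $haystack, array $needles): int
{
    $comparisons = 0;
    foreach ($needles as $needle) {                      // once per needle
        $needleLen = strlen($needle);
        $last = strlen($haystack) - $needleLen;          // last valid start position
        for ($i = 0; $i <= $last; $i++) {                // once per haystack position
            $comparisons++;
            if (substr($haystack, $i, $needleLen) === $needle) {
                break; // this needle matched; move on to the next one
            }
        }
    }
    return $comparisons;
}

// Two 3-byte needles that never match an 11-byte haystack:
// 9 start positions each, so 18 comparisons in total.
echo naiveCountComparisons("hello world", ["xyz", "abc"]), PHP_EOL; // prints 18
```

    With almost 600 needles and no early match, the count grows roughly as message length times 600, which is the product described above.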

    Therefore, if I write code in Java (I will explain why later, and it is not simply that I love Java), I would optimize it like this:
    View on GitHub Gist


    Of course, to make it even more efficient, simply do it on a separate thread (AsyncTask). However, if you are doing it on PlayerChatEvent, you probably need to delay each chat by half a tick up to a few ticks. You will probably also break event handling of other plugins.

    Why did I use Java:
    • I am more familiar with Java than other memory-strict languages.
    • I cannot use PHP, because it is not a memory-strict language. I need to make sure that the machine only does what I ask it to do, unlike PHP, where simply getting the code point of a certain character in a string may do much more than a String.codePointAt(int) call in Java.
    • I love the useful Collections and String util methods in Java.
    This is probably more efficient than your method, but it does seem a bit complicated. Also, this will only be efficient if calling matches() on a haystack is much more frequent than modifying the needles (which I assume only happens after a server restart).

    The efficiency mainly comes from the tree structure, which quickly skips the needles that don't start with the same letters.
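    The Gist itself is not reproduced here. As a rough illustration of the tree idea only, here is a minimal PHP sketch of a prefix tree (trie) of needles. The NeedleTree class and its layout are assumptions made for this example; this is not the Java code from the Gist, and it ignores the memory-strictness concerns raised above.

```php
<?php
/**
 * Minimal prefix tree (trie) of needles. Hypothetical class for illustration.
 * Matching is byte-wise and case-sensitive.
 */
final class NeedleTree
{
    // Multi-character key marking "a needle ends here"; it cannot clash
    // with the single-byte keys used for child nodes.
    private const END = 'end';

    /** @var array nested arrays keyed by single bytes */
    private $root = [];

    public function __construct(array $needles)
    {
        foreach ($needles as $needle) {
            if ($needle === '') {
                continue; // an empty needle would match everything
            }
            $node = &$this->root;
            $len = strlen($needle);
            for ($i = 0; $i < $len; $i++) {
                $ch = $needle[$i];
                if (!isset($node[$ch])) {
                    $node[$ch] = [];
                }
                $node = &$node[$ch]; // descend, creating the path as needed
            }
            $node[self::END] = true;
            unset($node); // drop the reference before the next needle
        }
    }

    /** Return true if any needle occurs anywhere in $haystack. */
    public function matches(string $haystack): bool
    {
        $len = strlen($haystack);
        for ($start = 0; $start < $len; $start++) {
            $node = $this->root;
            for ($i = $start; $i < $len; $i++) {
                $ch = $haystack[$i];
                if (!isset($node[$ch])) {
                    break; // no needle continues with this byte
                }
                $node = $node[$ch];
                if (isset($node[self::END])) {
                    return true;
                }
            }
        }
        return false;
    }
}

$tree = new NeedleTree(["bad word", "rude"]);
var_dump($tree->matches("that was rude!")); // bool(true)
var_dump($tree->matches("bad wor"));        // bool(false)
```

    Each start position walks the tree once instead of being compared against every needle, so the tree is built once (for example at startup) and reused for every message. Lowercasing both the needles and the message first would make it case-insensitive.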

    Updated some typos in my code.

    And proof that it works:
    http://ideone.com/e9xEiB
    Last edited: Jul 25, 2016
