[nycphp-talk] iterating through a multibyte string

John Campbell jcampbell1 at gmail.com
Wed Jan 13 10:28:18 EST 2010


mb_substr is always going to be slow because it has to iterate from
the beginning of the string to count characters on every call, so the
loop will run in O(N^2).

In theory, it should be much faster if you just pull the first character.
e.g.:

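while (strlen($rest) > 0) {  // byte length; a bare while ($rest) stops early on "0"
  $char = mb_substr($rest, 0, 1);  // first character
  $rest = mb_substr($rest, 1);     // everything after it
}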

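This avoids rescanning from the start of the string on every
iteration, though each mb_substr($rest, 1) call still copies the
remainder, so it is not strictly O(N) on the length of the string.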

I also like Dan's idea of using preg_split.
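
A minimal sketch of the preg_split approach, assuming the input is
valid UTF-8 (utf8_chars is just an illustrative name, not an existing
function). The empty pattern with the /u modifier matches between code
points, so the string is split in a single pass:

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
// PREG_SPLIT_NO_EMPTY drops the empty strings at the start and end.
// preg_split returns false if $str is not valid UTF-8.
function utf8_chars($str)
{
    return preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
}

$chars = utf8_chars("string with utf-8 chars åèö");
if ($chars !== false) {
    foreach ($chars as $char) {
        echo $char, "\n";  // one character per line
    }
}
```

Note that this splits on code points, so a base letter followed by a
combining accent comes back as two separate array elements.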

Regards,
John Campbell

On Wed, Jan 13, 2010 at 10:02 AM, Rob Marscher
<rmarscher at beaffinitive.com> wrote:
> Hi all,
>
> I need to iterate through a multibyte string and process it character by character.  Hopefully in php6 this will work without any special handling, but as we know, we need to use the special multibyte string functions in php5 to work with utf-8 characters.  Here's an example that illustrates my dilemma:
>
> <?php
> mb_internal_encoding("UTF-8");
>
> $str = "string with utf-8 chars åèö";
> $length = mb_strlen($str);
> $brokenStr = "";
> $preservedStr = "";
>
> for ($i = 0; $i < $length; $i++) {
>  $brokenStr .= $str[$i];
>  $preservedStr .= mb_substr($str, $i, 1);
> }
> echo "brokenStr = " . $brokenStr . "\n";
> echo "preservedStr = " . $preservedStr . "\n";
> ?>
>
> The array notation, $str[$i], is the normal way to do this with regular strings.  I assume this will work for multibyte strings in php6.
>
> -- Is using mb_substr($str, $i, 1) the only way to get this to work in php5?  That's my question.
>
> It seems like it's going to be many times slower according to some of the comments I've seen on the multibyte functions in the php manual.
>
> Thanks!!
> -Rob