the complete guide to mozilla/string

by Scott Collins

last modified 8 April 2001

Abstract

This document provides an introduction to the design and use of the string classes in mozilla, detailed information on their implementation and how one may extend them, and answers to frequently asked questions about strings.

contents

A note to potential editors: don't even consider modifying this document with an HTML editor. That would destroy the internal formatting and make patches unmanageable.


user's guide

Strings in mozilla are a world apart from char*s. If you don't know why they are different, this section is the place for you to start. If you're already familiar with the hierarchy of string classes in mozilla, then you might want to skip ahead to the implementor's guide or the FAQ.

introduction

what is and what isn't a string?

A string is an opaque container holding a (possibly zero-length) linear sequence of characters. Understanding the implications of this statement is the foundation for understanding all of mozilla's string classes.
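To make `opaque container' concrete, here is a minimal sketch in standard C++. The classes ReadableString, FlatBuffer and Concatenation and the function CountChar are hypothetical illustrations, not mozilla's string classes. Callers see only a length and individual characters, so the same code works whether the characters sit in one buffer or are spread across two objects, and whether or not the string is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical read-only string interface (not mozilla's nsAString): callers
// see only a length and indexed characters, never the storage itself.
class ReadableString {
public:
    virtual ~ReadableString() = default;
    virtual std::size_t Length() const = 0;
    virtual char CharAt(std::size_t aIndex) const = 0;
};

// One implementation keeps its characters in a single contiguous buffer.
class FlatBuffer : public ReadableString {
public:
    explicit FlatBuffer(std::string aData) : mData(std::move(aData)) {}
    std::size_t Length() const override { return mData.size(); }
    char CharAt(std::size_t aIndex) const override {
        assert(aIndex < mData.size());
        return mData[aIndex];
    }
private:
    std::string mData;
};

// Another represents a concatenation without copying: its characters live in
// two separate objects, so there is no single pointer to all of them.
class Concatenation : public ReadableString {
public:
    Concatenation(const ReadableString& aLeft, const ReadableString& aRight)
        : mLeft(aLeft), mRight(aRight) {}
    std::size_t Length() const override { return mLeft.Length() + mRight.Length(); }
    char CharAt(std::size_t aIndex) const override {
        assert(aIndex < Length());
        return aIndex < mLeft.Length() ? mLeft.CharAt(aIndex)
                                       : mRight.CharAt(aIndex - mLeft.Length());
    }
private:
    const ReadableString& mLeft;   // must outlive this object
    const ReadableString& mRight;  // must outlive this object
};

// Code written against the interface works with either representation,
// including the zero-length case.
std::size_t CountChar(const ReadableString& aString, char aChar) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < aString.Length(); ++i)
        if (aString.CharAt(i) == aChar)
            ++count;
    return count;
}
```

CountChar never learns how its argument stores characters, which is exactly the freedom a single `const char*' parameter would take away.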

readable and writable

promises

flat strings

encoding

sharing

using the string classes correctly; using the correct string class

basic string operations

comparison

concatenation

substrings

find and replace

conversions

calling a function that expects a different kind of string

converting between string classes

converting between encodings

selecting the right string class

user string classes

selecting the right string class for a parameter

selecting the right string class for a local variable

selecting the right string class for a member variable

selecting the right string class for a return value

selecting the right string class in IDL

don'ts

using string iterators

what is an iterator?

reading iterators and writing iterators

`chunky' iterating for efficiency

copy_string, character sources and sinks

encoding conversion iterators

summary


implementor's guide


frequently asked questions

you have some chars
you want 'x' char c "foo" char* cp nsACString& cs
char . [] [] extract a character
PRUnichar PRUnichar('x') PRUnichar(c) convert encoding, extract a character
char* & & & . get a pointer
PRUnichar* convert encoding, get a pointer
nsACString NS_LITERAL_CSTRING("x") make a string NS_LITERAL_CSTRING("foo") make a string .
nsAString NS_LITERAL_STRING("x") convert encoding NS_LITERAL_STRING("foo") convert encoding
to call printf . call printf
you have some PRUnichars
you want PRUnichar w PRUnichar* wp nsAString& s
char
PRUnichar [] extract a character
char*
PRUnichar* & get a pointer
nsACString
nsAString
to call printf call printf
is there any string doc?
Yes, you're soaking in it!
I have a string, how do I get a pointer to the characters?
You want to avoid this situation. In your own interfaces, prefer string types over raw pointers. Any interface that processes a string through a single pointer makes two expensive assumptions: first, that the string is stored in one contiguous hunk; and second, that it is zero-terminated. If either assumption doesn't hold, getting a pointer means allocating storage, copying the entire string into it, and zero-terminating the copy. When interacting with system calls, though, you may not be able to avoid needing a pointer.
Some string classes guarantee that they are `flat'; that is, their data is stored in one contiguous, zero-terminated hunk. This does not imply that there are no embedded nulls. Caveat emptor. All strings that explicitly promise flatness inherit from nsAFlatString or nsAFlatCString and can produce a constant pointer to their data with the get() member function. Even strings that don't explicitly promise to be flat may happen to be flat. The helper function PromiseFlatString will produce a const dependent string that is guaranteed to be flat (PromiseFlatCString does the same for narrow strings). If you use it on a string that already happens to be flat, the result is simply a reference through to that string. Otherwise, PromiseFlatString does the work to allocate, copy, terminate, and manage a temporary flat string. Since the result of PromiseFlatString is a temporary, you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
  /* I have a string, how do I get a pointer to the characters? */

extern void EvilNarrowOSFunction( const char* );    // evil OS routines that want pointers
extern void EvilWideOSFunction( const PRUnichar* );
extern void SomeOtherFunction( const PRUnichar* );  // hypothetical second consumer of the pointer

void func( const nsAString& aString, const nsACString& aCString )
  {
    EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").get() );
      // literal strings are flat already (as are |nsString|s, et al), just use |.get()|

    EvilWideOSFunction( PromiseFlatString(aString).get() );
      // for strings that don't explicitly guarantee flatness, use |PromiseFlatString|


      // beware holding the pointer for longer than the life of the promise
    const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
    EvilWideOSFunction(wp);

      // if you really need to use the pointer from |PromiseFlatString| in more than one expression...
    const nsAFlatString& flat = PromiseFlatString(aString);
    EvilWideOSFunction(flat.get());
    SomeOtherFunction(flat.get());

      // similarly for |char| strings
    EvilNarrowOSFunction( PromiseFlatCString(aCString).get() );
  }
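The behavior of PromiseFlatString described above (refer through to a string that is already flat, otherwise allocate, copy, and terminate a temporary) can be sketched in standard C++ roughly as follows. SegmentedString and FlatPromise are hypothetical names invented for this illustration, and std::string stands in for mozilla's storage; it is not mozilla's implementation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical string whose characters may be split across several segments.
struct SegmentedString {
    std::vector<std::string> segments;
    // "Happens to be flat": all characters already sit in one zero-terminated buffer.
    bool IsFlat() const { return segments.size() == 1; }
};

// Sketch of the promise-flat idea: hand out a pointer to contiguous,
// zero-terminated characters, copying only when the source isn't flat.
class FlatPromise {
public:
    explicit FlatPromise(const SegmentedString& aSource) {
        if (aSource.IsFlat()) {
            mData = aSource.segments.front().c_str();  // refer through, no copy
        } else {
            for (const std::string& segment : aSource.segments)
                mCopy += segment;                      // allocate and copy
            mData = mCopy.c_str();                     // std::string zero-terminates
            mMadeCopy = true;
        }
    }
    // Copying is disabled because mData may point into this object's mCopy.
    FlatPromise(const FlatPromise&) = delete;
    FlatPromise& operator=(const FlatPromise&) = delete;

    const char* get() const { return mData; }
    bool madeCopy() const { return mMadeCopy; }

private:
    std::string mCopy;          // owns the temporary flat copy, if one was needed
    const char* mData = nullptr;
    bool mMadeCopy = false;
};
```

The pointer from get() is valid only while both the FlatPromise and the string it was built from are alive, which is the same lifetime rule the example above warns about.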
How do I get a particular character out of a string?
Flat strings provide operator[] and CharAt(). All strings provide First(), Last(), and access through iterators. Don't promise a string flat just to do character indexing; instead, get an iterator and advance it to the position you care about.
  /* How do I get a particular character out of a string? */

PRUnichar Get5thCharacterOf( const nsAString& aString )
  {
    if ( aString.Length() >= 5 )
      {
        nsAString::const_iterator iter;
        aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
        iter.advance(4);            // skip the first four characters, so |iter| points at the fifth
        return *iter;
      }

    return PRUnichar(0);
  }
Using iterators isn't as bad as the example above makes it feel. The typical use is for advancing through a string, examining many characters.
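As an illustration of that typical pattern, advancing through a whole string and examining each character, here is a sketch that uses std::u16string from the C++ standard library as a stand-in for a wide mozilla string (an assumption made so the example compiles on its own). With mozilla's classes the iterator would be obtained through BeginReading(), as in the example above; CountChar is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Walks the string once with a pair of read-only iterators, examining every
// character, instead of indexing position by position.
std::size_t CountChar(const std::u16string& aString, char16_t aChar) {
    std::size_t count = 0;
    std::u16string::const_iterator iter = aString.begin();
    const std::u16string::const_iterator end = aString.end();
    for (; iter != end; ++iter) {
        if (*iter == aChar)
            ++count;
    }
    return count;
}
```

Because the loop only ever moves forward by one character, it needs no promise of flatness from the string it walks.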
How do I convert from one encoding to another?
How do I create a string?
What is the best way to return a string?

There are several reasonable ways to produce a string result from a function. If you are already holding the answer as a sharable string, you can simply return that string (pass-by-value). Otherwise, the most efficient and flexible way to return a string is to assign your result into a non-const reference parameter. Don't bother to create a sharable string from scratch with your generated result.

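Why? The two things you want to minimize in string manipulation are, in order of importance, heap allocation and moving characters around. Returning a string that is already sharable avoids copying its characters, and assigning into the caller's string puts the result straight into storage the caller controls instead of into a freshly allocated string.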

  /* What is the best way to return a string? */

class foo
  {
    public:
      // ...
      void GetShortName( nsAString& aResult ) const;
      nsCommonString GetFullName() const;
      
    private:
      nsCommonString    mFullName;

      const PRUnichar*  mShortName;
      PRUint32          mShortNameLength;
      
  };

nsCommonString
foo::GetFullName() const
  {
    return mFullName;
  }

void
foo::GetShortName( nsAString& aResult ) const
  {
    aResult = DependentString(mShortName, mShortNameLength);
  }
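The out-parameter style of GetShortName pays off most when a caller asks for many results in a row. The sketch below uses std::string in place of nsAString, and GetItemName and TotalNameLength are hypothetical. Because the caller owns one string for the whole loop, each assignment can usually reuse the buffer it already has instead of allocating a fresh string per call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical producer that, like foo::GetShortName, writes its result into a
// caller-supplied string instead of constructing and returning a new one.
void GetItemName(std::size_t aIndex, std::string& aResult) {
    static const char* const kNames[] = {"alpha", "beta", "gamma"};
    aResult.assign(kNames[aIndex % 3]);  // overwrite in place; existing capacity can be reused
}

// A caller that needs many results keeps one string for the whole loop.
std::size_t TotalNameLength(std::size_t aCount) {
    std::string name;  // reused on every iteration
    std::size_t total = 0;
    for (std::size_t i = 0; i < aCount; ++i) {
        GetItemName(i, name);
        total += name.size();
    }
    return total;
}
```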
How do I printf a string, e.g., for debugging?
If your string is already narrow, you only need to make it flat and then get a pointer.
If your string is wide, you'll need to convert it before you can printf something reasonable. If it's just for debugging, you probably won't care that something odd is printed for a UCS-2 character with no ASCII equivalent. The simplest thing in this case is to make a temporary conversion using NS_ConvertUCS2toUTF8. The result is conveniently flat already, so getting the pointer is simple. Remember not to hold onto the pointer you get out of it beyond the lifetime of the temporary.
  /* How do I |printf| a string? */


void PrintSomeStrings( const nsAString& aString, const PRUnichar* aKey, const nsACString& aCString )
  {
      // |printf|ing a narrow string is easy
    printf("%s\n", PromiseFlatCString(aCString).get());     // GOOD

      // the simplest way to get a |printf|-able |const char*| out of a string
    printf("%s\n", NS_ConvertUCS2toUTF8(aKey).get());       // GOOD

      // works just as well with a formal wide string type...
    printf("%s\n", NS_ConvertUCS2toUTF8(aString).get());


      // But don't hold onto the pointer longer than the lifetime of the temporary!
    const char* cstring = NS_ConvertUCS2toUTF8(aKey).get(); // BAD! |cstring| is dangling
    printf("%s\n", cstring);
  }
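For readers curious what a UCS-2 to UTF-8 conversion such as NS_ConvertUCS2toUTF8 has to do, here is a minimal sketch in standard C++. Ucs2ToUtf8 is a hypothetical function, not mozilla's implementation. It assumes every 16-bit unit is a complete character (true for UCS-2) and therefore emits one, two, or three UTF-8 bytes per unit. Surrogate pairs are not combined, so text outside the Basic Multilingual Plane would not be converted correctly.

```cpp
#include <cassert>
#include <string>

// Encodes each 16-bit unit as a code point in the range U+0000..U+FFFF.
std::string Ucs2ToUtf8(const std::u16string& aWide) {
    std::string out;
    out.reserve(aWide.size());  // at least one byte per unit
    for (char16_t unit : aWide) {
        unsigned int c = unit;
        if (c < 0x80) {               // 0xxxxxxx
            out += static_cast<char>(c);
        } else if (c < 0x800) {       // 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {                      // 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Printing follows the same rule as above: printf("%s\n", Ucs2ToUtf8(u"Hello").c_str()); is fine, but keeping the c_str() pointer after the temporary std::string is destroyed would leave it dangling.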