Java/Data Type/String split

Contents

1 Break a string into tokens
2 Control the maximum number of substrings generated by splitting a string.
3 Escape special character with a \
4 Keep the empty strings
5 Parse a line whose separator is a comma followed by a space
6 Parse a line with and's and or's
7 Parsing Character-Separated Data with a Regular Expression
8 Pattern Splitting for space splitter
9 Special character needs to be escaped with a \
10 Special characters need to be escaped when provided as delimiters, like "." and "|".
11 Specify a regular expression to match one or more spaces
12 Split a String
13 Split a string using String.split()
14 Split by dot
15 " ".split(" ") generates a NullPointerException
16 Split on various punctuation and zero or more trailing spaces.
17 Split on word boundaries.
18 Split on word boundaries, but allow embedded periods and @.
19 Split same string on commas and zero or more spaces.
20 Splits a string around matches of the given delimiter character.
21 Splits a String by Character type as returned by java.lang.Character.getType(char)
22 Splits a String by char: Groups of contiguous characters of the same type are returned as complete tokens.
23 Splits the provided text into an array, separator specified.
24 Splits the provided text into an array, separator specified, preserving all tokens, including empty tokens created by adjacent separators.
25 Splits the provided text into an array, separators specified, preserving all tokens, including empty tokens created by adjacent separators.
26 Splits the provided text into an array, separators specified. This is an alternative to using StringTokenizer.
27 Splits the provided text into an array, separator string specified. Returns a maximum of max substrings.
28 Splits the provided text into an array, using whitespace as the separator, preserving all tokens, including empty tokens created by adjacent separators.
29 Splits the provided text into an array with a maximum length, separators specified.
30 Splits the provided text into an array with a maximum length, separators specified, preserving all tokens, including empty tokens created by adjacent separators.
31 Split Strings with Patterns: split("[-/%]")
32 Split the source into two strings at the first occurrence of the splitter. Subsequent occurrences are not treated specially, and may be part of the second string.
33 Split up a string into multiple strings based on a delimiter
34 Split with regular expression
35 String.split() is based on regular expression
36 String split on multicharacter delimiter
37 String.split(): " ".split(" ") -> {} (Empty array)
38 String.split(): " ".split(" ") ->(Empty array too)
39 String.split(): "".split("") (one empty string value array)
40 String.split(): " s".split(" ") -> {"","","s"}
41 String.split(): " s ".split(" ") -> {"","","s"} (!) (space before and after)
42 The string passed to the split method is a regular expression
43 Use split() to extract substrings from a string.
44 Using second argument in the String.split() method to control the maximum number of substrings generated by splitting a string.
45 Using split() with a space can be a problem

Break a string into tokens
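
Before the full listing, here is a minimal sketch of the core idea it implements: some delimiters only separate tokens (non-token delimiters), while others are also returned as tokens themselves (token delimiters). The sketch does not use the class below. It approximates the behavior with the JDK's own java.util.StringTokenizer and its returnDelims flag, and the class name TokenizerSketch and the method tokenize are hypothetical names chosen for illustration. It assumes both delimiter strings are non-null and does not attempt the empty-token handling of the full class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the listing below.
class TokenizerSketch {
    /**
     * Splits text on every character in nontokenDelims or tokenDelims.
     * Characters from tokenDelims are kept as one-character tokens;
     * characters from nontokenDelims are dropped.
     */
    static List<String> tokenize(String text, String nontokenDelims, String tokenDelims) {
        List<String> tokens = new ArrayList<>();
        // returnDelims = true makes the JDK tokenizer hand back each delimiter
        // as its own one-character token, so we can decide which ones to keep.
        StringTokenizer st = new StringTokenizer(text, nontokenDelims + tokenDelims, true);
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            boolean isDroppedDelimiter =
                token.length() == 1 && nontokenDelims.indexOf(token.charAt(0)) != -1;
            if (!isDroppedDelimiter) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same shape of input as the example in the class comment below.
        String s = "  (   aaa  \t  * (b+c1 ))";
        for (String token : tokenize(s, " \t\n\r\f", "()+*")) {
            System.out.println(token);
        }
    }
}
```

With whitespace as non-token delimiters and "()+*" as token delimiters, the parentheses and operators come back as tokens of their own while the whitespace disappears, which is the distinction the class comment in the listing describes.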

   
/*
 * A replacement for java.util.StringTokenizer
 * Copyright (C) 2001 Stephen Ostermiller
 * http://ostermiller.org/contact.pl?regarding=Java+Utilities
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * See COPYING.TXT for details.
 */
 import java.util.NoSuchElementException;
/**
 * The string tokenizer class allows an application to break a string into
 * tokens.
 * More information about this class is available from .
 * <p>
 * The tokenization method is much simpler than the one used by the
 * <code>StreamTokenizer</code> class. The <code>StringTokenizer</code> methods
 * do not distinguish among identifiers, numbers, and quoted strings, nor do
 * they recognize and skip comments.
 * <p>
 * The set of delimiters (the characters that separate tokens) may be specified
 * either at creation time or on a per-token basis.
 * <p>
 * There are two kinds of delimiters: token delimiters and non-token delimiters.
 * A token is either one token delimiter character, or a maximal sequence of
 * consecutive characters that are not delimiters.
 * <p>
 * A <code>StringTokenizer</code> object internally maintains a current
 * position within the string to be tokenized. Some operations advance this
 * current position past the characters processed.
 * <p>
 * The implementation is not thread safe; if a <code>StringTokenizer</code>
 * object is intended to be used in multiple threads, an appropriate wrapper
 * must be provided.
 * <p>
 * The following is one example of the use of the tokenizer. It also
 * demonstrates the usefulness of having both token and non-token delimiters in
 * one <code>StringTokenizer</code>.
 * <p>
 * The code:
 * <blockquote><code>
 * String s = " &nbsp;( &nbsp; aaa  \t &nbsp;* (b+c1 ))";<br>
 * StringTokenizer tokenizer = new StringTokenizer(s, " \t\n\r\f", "()+*");<br>
 * while (tokenizer.hasMoreTokens()) {<br>
 * &nbsp;&nbsp;&nbsp;&nbsp;System.out.println(tokenizer.nextToken());<br>
 * };
 * </code></blockquote>
 * <p>
 * prints the following output:
 * <blockquote>
 * (<br>
 * aaa<br>
 * *<br>
 * (<br>
 * b<br>
 * +<br>
 * c1<br>
 * )<br>
 * )
 * </blockquote>
 * <p>
 * <b>Compatibility with <code>java.util.StringTokenizer</code></b>
 * <p>
 * In the original version of <code>java.util.StringTokenizer</code>, the method
 * <code>nextToken()</code> left the current position after the returned token,
 * and the method <code>hasMoreTokens()</code> moved (as a side effect) the
 * current position before the beginning of the next token. Thus, the code:
 * <blockquote><code>
 * String s = "x=a,b,c";<br>
 * java.util.StringTokenizer tokenizer = new java.util.StringTokenizer(s,"=");<br>
 * System.out.println(tokenizer.nextToken());<br>
 * while (tokenizer.hasMoreTokens()) {<br>
 * &nbsp;&nbsp;&nbsp;&nbsp;System.out.println(tokenizer.nextToken(","));<br>
 * };
 * </code></blockquote>
 * <p>
 * prints the following output:
 * <blockquote>
 * x<br>
 * a<br>
 * b<br>
 * c
 * </blockquote>
 * <p>
 * The Java SDK 1.3 implementation removed the undesired side effect of
 * <code>hasMoreTokens</code> method: now, it does not advance current position.
 * However, after these changes the output of the above code was:
 * <blockquote>
 * x<br>
 * =a<br>
 * b<br>
 * c
 * </blockquote>
 * <p>
 * and there was no good way to produce a second token without "=".
 * <p>
 * To solve the problem, this implementation introduces a new method
 * <code>skipDelimiters()</code>. To produce the original output, the above code
 * should be modified as:
 * <blockquote><code>
 * String s = "x=a,b,c";<br>
 * StringTokenizer tokenizer = new StringTokenizer(s,"=");<br>
 * System.out.println(tokenizer.nextToken());<br>
 * tokenizer.skipDelimiters();<br>
 * while (tokenizer.hasMoreTokens()) {<br>
 * &nbsp;&nbsp;&nbsp;&nbsp;System.out.println(tokenizer.nextToken(","));<br>
 * };
 * </code></blockquote>
 *
 * @author Stephen Ostermiller http://ostermiller.org/contact.pl?regarding=Java+Utilities
 * @since ostermillerutils 1.00.00
 */
public class StringTokenizer implements java.util.Enumeration<String>, java.util.Iterator<String> {
  /**
   * The string to be tokenized.
   * The code relies on this to never be null.
   *
   * @since ostermillerutils 1.00.00
   */
  protected String text;
  /**
   * The length of the text.
   * Cached for performance.  This should be set whenever the
   * string we are working with is changed.
   *
   * @since ostermillerutils 1.00.00
   */
  protected int strLength;
  /**
   * The set of non-token delimiters.
   *
   * @since ostermillerutils 1.00.00
   */
  protected String nontokenDelims;
  /**
   * The set of token delimiters.
   *
   * @since ostermillerutils 1.00.00
   */
  protected String tokenDelims;
  /**
   * One of two variables used to maintain state through
   * the tokenizing process.
   * <P>
   * Represents the position at which we should start looking for
   * the next token (the position of the character immediately
   * following the end of the last token, or 0 to start), or
   * -1 if the entire string has been examined.
   *
   * @since ostermillerutils 1.00.00
   */
  protected int position;
  /**
   * One of two variables used to maintain state through
   * the tokenizing process.
   * <p>
   * True if and only if it is found that an empty token should
   * be returned, or if an empty token was the last thing returned.
   * <p>
   * If returnEmptyTokens is false, then this variable will
   * always be false.
   *
   * @since ostermillerutils 1.00.00
   */
  protected boolean emptyReturned;
  /**
   * Stores the value of the delimiter character with the
   * highest value. It is used to optimize the detection of delimiter
   * characters.  The common case will be that the int values of delimiters
   * will be less than that of most characters in the string (a comma or space
   * is less than any letter, for example).  Given this, we can easily check
   * whether a character is not a delimiter by comparing it to the max
   * delimiter.  If it is greater than the max delimiter, then it is not
   * a delimiter; otherwise we have to do some more in-depth analysis (for example,
   * search the delimiter string).  In the common case this keeps the running
   * time of the algorithm from depending on the length of the delimiter string.
   *
   * @since ostermillerutils 1.00.00
   */
  protected char maxDelimChar;
  /**
   * Whether empty tokens should be returned.
   * for example, if "" should be returned when text starts with
   * a delimiter, has two delimiters next to each other, or
   * ends with a delimiter.
   *
   * @since ostermillerutils 1.00.00
   */
  protected boolean returnEmptyTokens;
  /**
   * Indicates at which position the delimiters last changed.  This
   * will affect how empty tokens are returned.  Any
   * time that delimiters are changed, the string will be treated as if
   * it is being parsed from position zero; for example, empty tokens are possible
   * at the very beginning.
   *
   * @since ostermillerutils 1.00.00
   */
  protected int delimsChangedPosition;
  /**
   * A cache of the token count.  This variable should be -1 if the tokens
   * have not yet been counted. It should be greater than or equal to zero
   * if the tokens have been counted.
   *
   * @since ostermillerutils 1.00.00
   */
  protected int tokenCount;
  /**
   * Constructs a string tokenizer for the specified string. Both token and
   * non-token delimiters are specified.
   * <p>
   * The current position is set at the beginning of the string.
   *
   * @param text a string to be parsed.
   * @param nontokenDelims the non-token delimiters, i.e. the delimiters that only separate
   *     tokens and are not returned as separate tokens.
   * @param tokenDelims the token delimiters, i.e. delimiters that both separate tokens,
   *     and are themselves returned as tokens.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public StringTokenizer(String text, String nontokenDelims, String tokenDelims){
    this(text, nontokenDelims, tokenDelims, false);
  }
  /**
   * Constructs a string tokenizer for the specified string. Both token and
   * non-token delimiters are specified and whether or not empty tokens are returned
   * is specified.
   * <p>
   * Empty tokens are tokens that are between consecutive delimiters.
   * <p>
   * It is a primary constructor (i.e. all other constructors are defined in terms
   * of it.)
   * <p>
   * The current position is set at the beginning of the string.
   *
   * @param text a string to be parsed.
   * @param nontokenDelims the non-token delimiters, i.e. the delimiters that only separate
   *     tokens and are not returned as separate tokens.
   * @param tokenDelims the token delimiters, i.e. delimiters that both separate tokens,
   *     and are themselves returned as tokens.
   * @param returnEmptyTokens true if empty tokens may be returned; false otherwise.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public StringTokenizer(String text, String nontokenDelims, String tokenDelims, boolean returnEmptyTokens){
    setDelims(nontokenDelims, tokenDelims);
    setText(text);
    setReturnEmptyTokens(returnEmptyTokens);
  }
  /**
   * Constructs a string tokenizer for the specified string. Either token or
   * non-token delimiters are specified.
   * <p>
   * Is equivalent to:
   * <ul>
   * <li> If the third parameter is <code>false</code> --
   *      <code>StringTokenizer(text, delimiters, null)</code>
   * <li> If the third parameter is <code>true</code> --
   *      <code>StringTokenizer(text, null, delimiters)</code>
   * </ul>
   *
   * @param text a string to be parsed.
   * @param delims the delimiters.
   * @param delimsAreTokens
   *     flag indicating whether the second parameter specifies token or
   *     non-token delimiters: <code>false</code> -- the second parameter
   *     specifies non-token delimiters, the set of token delimiters is
   *     empty; <code>true</code> -- the second parameter specifies token
   *     delimiters, the set of non-token delimiters is empty.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public StringTokenizer(String text, String delims, boolean delimsAreTokens){
    this(text, (delimsAreTokens ? null : delims), (delimsAreTokens ? delims : null));
  }
  /**
   * Constructs a string tokenizer for the specified string. The characters in the
   * <code>nontokenDelims</code> argument are the delimiters for separating
   * tokens. Delimiter characters themselves will not be treated as tokens.
   * <p>
   * Is equivalent to <code>StringTokenizer(text,nontokenDelims, null)</code>.
   *
   * @param text a string to be parsed.
   * @param nontokenDelims the non-token delimiters.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public StringTokenizer(String text, String nontokenDelims){
    this(text, nontokenDelims, null);
  }
  /**
   * Constructs a string tokenizer for the specified string. The tokenizer uses
   * " \t\n\r\f" as a delimiter set of non-token delimiters, and an empty token
   * delimiter set.
   * <p>
   * Is equivalent to <code>StringTokenizer(text, " \t\n\r\f", null)</code>.
   *
   * @param text a string to be parsed.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public StringTokenizer(String text){
    this(text, " \t\n\r\f", null);
  }
  /**
   * Set the text to be tokenized in this StringTokenizer.
   * <p>
   * This is useful for StringTokenizer re-use, so that a new string tokenizer does not
   * have to be created for each string you want to tokenize.
   * <p>
   * The string will be tokenized from the beginning of the string.
   *
   * @param text a string to be parsed.
   * @throws NullPointerException if text is null.
   *
   * @since ostermillerutils 1.00.00
   */
  public void setText(String text){
    if (text == null){
      throw new NullPointerException();
    }
    this.text = text;
    strLength = text.length();
    emptyReturned = false;
    // set the position to start evaluation to zero
    // unless the string has no length, in which case
    // the entire string has already been examined.
    position = (strLength > 0 ? 0: -1);
    // because the text was changed since the last time the delimiters
    // were changed we need to set the delimiter changed position
    delimsChangedPosition = 0;
    // The token count changes when the text changes
    tokenCount = -1;
  }
  /**
   * Set the delimiters for this StringTokenizer.
   * The position must be initialized before this method is used.
   * (setText does this and it is called from the constructor)
   *
   * @param nontokenDelims delimiters that should not be returned as tokens.
   * @param tokenDelims delimiters that should be returned as tokens.
   *
   * @since ostermillerutils 1.00.00
   */
  private void setDelims(String nontokenDelims, String tokenDelims){
    this.nontokenDelims = nontokenDelims;
    this.tokenDelims = tokenDelims;
    // If we change delimiters, we do not want to start fresh,
    // without returning empty tokens.
    // the delimiter changed position can never be less than
    // zero, unlike position.
    delimsChangedPosition = (position != -1 ? position : strLength);
    // set the max delimiter
    maxDelimChar = 0;
    for (int i=0; nontokenDelims != null && i < nontokenDelims.length(); i++){
      if (maxDelimChar < nontokenDelims.charAt(i)){
        maxDelimChar = nontokenDelims.charAt(i);
      }
    }
    for (int i=0; tokenDelims != null && i < tokenDelims.length(); i++){
      if (maxDelimChar < tokenDelims.charAt(i)){
        maxDelimChar = tokenDelims.charAt(i);
      }
    }
    // Changing the delimiters may change the number of tokens
    tokenCount = -1;
  }

  /**
   * Tests if there are more tokens available from this tokenizer's string.
   * If this method returns <tt>true</tt>, then a subsequent call to
   * <tt>nextToken</tt> with no argument will successfully return a token.
   * <p>
   * The current position is not changed.
   *
   * @return <code>true</code> if and only if there is at least one token in the
   *          string after the current position; <code>false</code> otherwise.
   *
   * @since ostermillerutils 1.00.00
   */
  public boolean hasMoreTokens(){
    // handle the easy case in which the number
    // of tokens has been counted.
    if (tokenCount == 0){
      return false;
    } else if (tokenCount > 0){
      return true;
    }
    // copy over state variables from the class to local
    // variables so that the state of this object can be
    // restored to the state that it was in before this
    // method was called.
    int savedPosition = position;
    boolean savedEmptyReturned = emptyReturned;
    int workingPosition = position;
    boolean workingEmptyReturned = emptyReturned;
    boolean onToken = advancePosition();
    while(position != workingPosition ||
      emptyReturned != workingEmptyReturned){
      if (onToken){
        // restore object state
        position = savedPosition;
        emptyReturned = savedEmptyReturned;
        return true;
      }
      workingPosition = position;
      workingEmptyReturned = emptyReturned;
      onToken = advancePosition();
    }
    // restore object state
    position = savedPosition;
    emptyReturned = savedEmptyReturned;
    return false;
  }
  /**
   * Returns the next token from this string tokenizer.
   * <p>
   * The current position is set after the token returned.
   *
   * @return the next token from this string tokenizer.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   *
   * @since ostermillerutils 1.00.00
   */
  public String nextToken(){
    int workingPosition = position;
    boolean workingEmptyReturned = emptyReturned;
    boolean onToken = advancePosition();
    while(position != workingPosition ||
      emptyReturned != workingEmptyReturned){
      if (onToken){
        // returning a token decreases the token count
        tokenCount--;
        return (emptyReturned ? "" : text.substring(workingPosition, (position != -1) ? position : strLength));
      }
      workingPosition = position;
      workingEmptyReturned = emptyReturned;
      onToken = advancePosition();
    }
    throw new NoSuchElementException();
  }
  /**
   * Advances the current position so it is before the next token.
   * <p>
   * This method skips non-token delimiters but does not skip
   * token delimiters.
   * <p>
   * This method is useful when switching to the new delimiter sets (see the
   * second example in the class comment.)
   *
   * @return <code>true</code> if there are more tokens, <code>false</code> otherwise.
   *
   * @since ostermillerutils 1.00.00
   */
  public boolean skipDelimiters(){
    int workingPosition = position;
    boolean workingEmptyReturned = emptyReturned;
    boolean onToken = advancePosition();
    // skipping delimiters may cause the number of tokens to change
    tokenCount = -1;
    while(position != workingPosition ||
      emptyReturned != workingEmptyReturned){
      if (onToken){
        // restore the state to just as it was before we found
        // this token and return
        position = workingPosition;
        emptyReturned = workingEmptyReturned;
        return true;
      }
      workingPosition = position;
      workingEmptyReturned = emptyReturned;
      onToken = advancePosition();
    }
    // the end of the string was reached
    // without finding any tokens
    return false;
  }
  /**
   * Calculates the number of times that this tokenizer's <code>nextToken</code>
   * method can be called before it generates an exception. The current position
   * is not advanced.
   *
   * @return the number of tokens remaining in the string using the current
   *    delimiter set.
   *
   * @see #nextToken()
   * @since ostermillerutils 1.00.00
   */
  public int countTokens(){
    // return the cached token count if a cache
    // is available.
    if (this.tokenCount >=0){
      return this.tokenCount;
    }
    int tokenCount = 0;
    // copy over state variables from the class to local
    // variables so that the state of this object can be
    // restored to the state that it was in before this
    // method was called.
    int savedPosition = position;
    boolean savedEmptyReturned = emptyReturned;
    int workingPosition = position;
    boolean workingEmptyReturned = emptyReturned;
    boolean onToken = advancePosition();
    while(position != workingPosition ||
      emptyReturned != workingEmptyReturned){
      if (onToken){
        tokenCount++;
      }
      workingPosition = position;
      workingEmptyReturned = emptyReturned;
      onToken = advancePosition();
    }
    // restore object state
    position = savedPosition;
    emptyReturned = savedEmptyReturned;
    // Save the token count in case this is called again
    // so we wouldn't have to do so much work.
    this.tokenCount = tokenCount;
    return tokenCount;
  }
  /**
   * Set the delimiters used to this set of (non-token) delimiters.
   *
   * @param delims the new set of non-token delimiters (the set of token delimiters will be empty).
   *
   * @since ostermillerutils 1.00.00
   */
  public void setDelimiters(String delims){
    setDelims(delims, null);
  }
  /**
   * Set the delimiters used to this set of delimiters.
   *
   * @param delims the new set of delimiters.
   * @param delimsAreTokens flag indicating whether the first parameter specifies
   *    token or non-token delimiters: false -- the first parameter specifies non-token
   *    delimiters, the set of token delimiters is empty; true -- the first parameter
   *    specifies token delimiters, the set of non-token delimiters is empty.
   *
   * @since ostermillerutils 1.00.00
   */
  public void setDelimiters(String delims, boolean delimsAreTokens){
    setDelims((delimsAreTokens ? null : delims), (delimsAreTokens ? delims : null));
  }
  /**
   * Set the delimiters used to this set of delimiters.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   *
   * @since ostermillerutils 1.00.00
   */
  public void setDelimiters(String nontokenDelims, String tokenDelims){
    setDelims(nontokenDelims, tokenDelims);
  }
  /**
   * Set the delimiters used to this set of delimiters.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   * @param returnEmptyTokens true if empty tokens may be returned; false otherwise.
   *
   * @since ostermillerutils 1.00.00
   */
  public void setDelimiters(String nontokenDelims, String tokenDelims, boolean returnEmptyTokens){
    setDelims(nontokenDelims, tokenDelims);
    setReturnEmptyTokens(returnEmptyTokens);
  }
  /**
   * Calculates the number of times that this tokenizer's <code>nextToken</code>
   * method can be called before it generates an exception using the given set of
   * (non-token) delimiters.  The delimiters given will be used for future calls to
   * nextToken() unless new delimiters are given. The current position
   * is not advanced.
   *
   * @param delims the new set of non-token delimiters (the set of token delimiters will be empty).
   * @return the number of tokens remaining in the string using the new
   *    delimiter set.
   *
   * @see #countTokens()
   * @since ostermillerutils 1.00.00
   */
  public int countTokens(String delims){
    setDelims(delims, null);
    return countTokens();
  }
  /**
   * Calculates the number of times that this tokenizer's <code>nextToken</code>
   * method can be called before it generates an exception using the given set of
   * delimiters.  The delimiters given will be used for future calls to
   * nextToken() unless new delimiters are given. The current position
   * is not advanced.
   *
   * @param delims the new set of delimiters.
   * @param delimsAreTokens flag indicating whether the first parameter specifies
   *    token or non-token delimiters: false -- the first parameter specifies non-token
   *    delimiters, the set of token delimiters is empty; true -- the first parameter
   *    specifies token delimiters, the set of non-token delimiters is empty.
   * @return the number of tokens remaining in the string using the new
   *    delimiter set.
   *
   * @see #countTokens()
   * @since ostermillerutils 1.00.00
   */
  public int countTokens(String delims, boolean delimsAreTokens){
    setDelims((delimsAreTokens ? null : delims), (delimsAreTokens ? delims : null));
    return countTokens();
  }
  /**
   * Calculates the number of times that this tokenizer's <code>nextToken</code>
   * method can be called before it generates an exception using the given set of
   * delimiters.  The delimiters given will be used for future calls to
   * nextToken() unless new delimiters are given. The current position
   * is not advanced.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   * @return the number of tokens remaining in the string using the new
   *    delimiter set.
   *
   * @see #countTokens()
   * @since ostermillerutils 1.00.00
   */
  public int countTokens(String nontokenDelims, String tokenDelims){
    setDelims(nontokenDelims, tokenDelims);
    return countTokens();
  }
  /**
   * Calculates the number of times that this tokenizer's <code>nextToken</code>
   * method can be called before it generates an exception using the given set of
   * delimiters.  The delimiters given will be used for future calls to
   * nextToken() unless new delimiters are given. The current position
   * is not advanced.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   * @param returnEmptyTokens true if empty tokens may be returned; false otherwise.
   * @return the number of tokens remaining in the string using the new
   *    delimiter set.
   *
   * @see #countTokens()
   * @since ostermillerutils 1.00.00
   */
  public int countTokens(String nontokenDelims, String tokenDelims, boolean returnEmptyTokens){
    setDelims(nontokenDelims, tokenDelims);
    setReturnEmptyTokens(returnEmptyTokens);
    return countTokens();
  }
  /**
   * Advances the state of the tokenizer to the next token or delimiter.  This method only
   * modifies the class variables position, and emptyReturned.  The type of token that
   * should be emitted can be deduced by examining the changes to these two variables.
   * If there are no more tokens, the state of these variables does not change at all.
   *
   * @return true if we are at a juncture at which a token may be emitted, false otherwise.
   *
   * @since ostermillerutils 1.00.00
   */
  private boolean advancePosition(){
    // if we are returning empty tokens, we are just starting to tokenize
    // and there is a delimiter at the beginning of the string or the string
    // is empty we need to indicate that there is an empty token at the beginning.
    // The beginning is defined as where the delimiters were last changed.
    if (returnEmptyTokens && !emptyReturned &&
      (delimsChangedPosition == position ||
      (position == -1 && strLength == delimsChangedPosition))){
      if (strLength == delimsChangedPosition){
        // Case in which the string (since delimiter change)
        // is empty, but because we are returning empty
        // tokens, a single empty token should be returned.
        emptyReturned = true;
        return true;
      }
      char c = text.charAt(position);
      if (c <= maxDelimChar &&
        ((nontokenDelims != null && nontokenDelims.indexOf(c) != -1) ||
        (tokenDelims != null && tokenDelims.indexOf(c) != -1))){
        // There is delimiter at the very start of the string
        // so we must return an empty token at the beginning.
        emptyReturned = true;
        return true;
      }
    }
    // The main loop
    // Do this as long as parts of the string have yet to be examined
    while (position != -1){
      char c = text.charAt(position);
      if (returnEmptyTokens && !emptyReturned && position > delimsChangedPosition){
        char c1 = text.charAt(position - 1);
        // Examine the current character and the one before it.
        // If both of them are delimiters, then we need to return
        // an empty token.  Note that characters that were examined
        // before the delimiters changed should not be reexamined.
        if (c <= maxDelimChar && c1 <= maxDelimChar &&
          ((nontokenDelims != null && nontokenDelims.indexOf(c) != -1) ||
          (tokenDelims != null && tokenDelims.indexOf(c) != -1)) &&
          ((nontokenDelims != null && nontokenDelims.indexOf(c1) != -1) ||
          (tokenDelims != null && tokenDelims.indexOf(c1) != -1))){
          emptyReturned = true;
          /*System.out.println("Empty token.");*/
          return true;
        }
      }
      int nextDelimiter = (position < strLength - 1 ? indexOfNextDelimiter(position + 1) : -1);
      if (c > maxDelimChar ||
        ((nontokenDelims == null || nontokenDelims.indexOf(c) == -1) &&
        (tokenDelims == null || tokenDelims.indexOf(c) == -1))){
        // token found
        /*System.out.println("Token: '" +
          text.substring(position, (nextDelimiter == -1 ? strLength : nextDelimiter)) +
          "' at " + position + ".");*/
        position = nextDelimiter;
        emptyReturned = false;
        return true;
      } else if (tokenDelims != null && tokenDelims.indexOf(c) != -1) {
        // delimiter that can be returned as a token found
        emptyReturned = false;
        /*System.out.println("Delimiter: '" + c + "' at " + position + ".");*/
        position = (position < strLength -1 ? position +1 : -1);
        return true;
      } else {
        // delimiter that is not a token found.
        emptyReturned = false;
        position = (position < strLength -1 ? position +1 : -1);
        return false;
      }
    }
    // handle the case that a delimiter is at the end of the string and we should
    // return empty tokens.
    if (returnEmptyTokens && !emptyReturned && strLength > 0){
      char c = text.charAt(strLength - 1);
      if (c <= maxDelimChar &&
        ((nontokenDelims != null && nontokenDelims.indexOf(c) != -1) ||
        (tokenDelims != null && tokenDelims.indexOf(c) != -1))){
        // empty token at the end of the string found.
        emptyReturned = true;
        /*System.out.println("Empty token at end.");*/
        return true;
      }
    }
    return false;
  }
  /**
   * Returns the next token in this string tokenizer's string.
   * <p>
   * First, the sets of token and non-token delimiters are changed to be the
   * <code>tokenDelims</code> and <code>nontokenDelims</code>, respectively.
   * Then the next token (with respect to new delimiters) in the string after the
   * current position is returned.
   * <p>
   * The current position is set after the token returned.
   * <p>
   * The new delimiter sets remain in use after this call.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   * @return the next token, after switching to the new delimiter set.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   * @see #nextToken()
   *
   * @since ostermillerutils 1.00.00
   */
  public String nextToken(String nontokenDelims, String tokenDelims){
    setDelims(nontokenDelims, tokenDelims);
    return nextToken();
  }
  /**
   * Returns the next token in this string tokenizer's string.
   * <p>
   * First, the sets of token and non-token delimiters are changed to be the
   * <code>tokenDelims</code> and <code>nontokenDelims</code>, respectively;
   * and whether or not to return empty tokens is set.
   * Then the next token (with respect to new delimiters) in the string after the
   * current position is returned.
   * <p>
   * The current position is set after the token returned.
   * <p>
   * The new delimiter sets and the empty-token setting remain in effect for
   * subsequent calls.
   *
   * @param nontokenDelims the new set of non-token delimiters.
   * @param tokenDelims the new set of token delimiters.
   * @param returnEmptyTokens true if empty tokens may be returned; false otherwise.
   * @return the next token, after switching to the new delimiter set.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   * @see #nextToken()
   *
   * @since ostermillerutils 1.00.00
   */
  public String nextToken(String nontokenDelims, String tokenDelims, boolean returnEmptyTokens){
    setDelims(nontokenDelims, tokenDelims);
    setReturnEmptyTokens(returnEmptyTokens);
    return nextToken();
  }
  /**
   * Returns the next token in this string tokenizer's string.
   * <p>
   * Is equivalent to:
   * <ul>
   * <li> If the second parameter is <code>false</code> --
   *      <code>nextToken(delims, null)</code>
   * <li> If the second parameter is <code>true</code> --
   *      <code>nextToken(null, delims)</code>
   * </ul>
   *
   * @param delims the new set of token or non-token delimiters.
   * @param delimsAreTokens
   *     flag indicating whether the first parameter specifies token or
   *     non-token delimiters: <code>false</code> -- the first parameter
   *     specifies non-token delimiters, the set of token delimiters is
   *     empty; <code>true</code> -- the first parameter specifies token
   *     delimiters, the set of non-token delimiters is empty.
   * @return the next token, after switching to the new delimiter set.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   *
   * @see #nextToken(String,String)
   * @since ostermillerutils 1.00.00
   */
  public String nextToken(String delims, boolean delimsAreTokens){
    return (delimsAreTokens ? nextToken(null, delims) : nextToken(delims, null));
  }
  /**
   * Returns the next token in this string tokenizer's string.
   * <p>
   * Is equivalent to <code>nextToken(nontokenDelims, null)</code>.
   *
   * @param nontokenDelims the new set of non-token delimiters (the set of
   *     token delimiters will be empty).
   * @return the next token, after switching to the new delimiter set.
   * @throws NoSuchElementException if there are no more tokens in this
   *     tokenizer's string.
   *
   * @see #nextToken(String,String)
   * @since ostermillerutils 1.00.00
   */
  public String nextToken(String nontokenDelims){
    return nextToken(nontokenDelims, null);
  }
  /**
   * Similar to String.indexOf(String, int) but will look for
   * any delimiter character rather than an entire string.
   *
   * @param start index in text at which to begin the search
   * @return index of the first delimiter from the start index (inclusive), or -1
   *     if there are no more delimiters in the string
   *
   * @since ostermillerutils 1.00.00
   */
  private int indexOfNextDelimiter(int start){
    char c;
    int next;
    for (next = start; (c = text.charAt(next)) > maxDelimChar ||
      ((nontokenDelims == null || nontokenDelims.indexOf(c) == -1) &&
      (tokenDelims == null || tokenDelims.indexOf(c) == -1)); next++){
      if (next == strLength - 1){
        // we have reached the end of the string without
        // finding a delimiter
        return (-1);
      }
    }
    return next;
  }
  /**
   * Returns the same value as the <code>hasMoreTokens()</code> method. It exists
   * so that this class can implement the <code>Enumeration</code> interface.
   *
   * @return <code>true</code> if there are more tokens;
   *    <code>false</code> otherwise.
   *
   * @see java.util.Enumeration
   * @see #hasMoreTokens()
   * @since ostermillerutils 1.00.00
   */
  public boolean hasMoreElements(){
    return hasMoreTokens();
  }
  /**
   * Returns the same value as the <code>nextToken()</code> method. It exists
   * so that this class can implement the <code>Enumeration</code> interface.
   *
   * @return the next token in the string.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   *
   * @see java.util.Enumeration
   * @see #nextToken()
   * @since ostermillerutils 1.00.00
   */
  public String nextElement(){
    return nextToken();
  }
  /**
   * Returns the same value as the <code>hasMoreTokens()</code> method. It exists
   * so that this class can implement the <code>Iterator</code> interface.
   *
   * @return <code>true</code> if there are more tokens;
   *     <code>false</code> otherwise.
   *
   * @see java.util.Iterator
   * @see #hasMoreTokens()
   * @since ostermillerutils 1.00.00
   */
  public boolean hasNext(){
    return hasMoreTokens();
  }
  /**
   * Returns the same value as the <code>nextToken()</code> method. It exists
   * so that this class can implement the <code>Iterator</code> interface.
   *
   * @return the next token in the string.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   *
   * @see java.util.Iterator
   * @see #nextToken()
   * @since ostermillerutils 1.00.00
   */
  public String next(){
    return nextToken();
  }
  /**
   * This implementation always throws <code>UnsupportedOperationException</code>.
   * It exists so that this class can implement the <code>Iterator</code> interface.
   *
   * @throws UnsupportedOperationException always thrown.
   *
   * @see java.util.Iterator
   * @since ostermillerutils 1.00.00
   */
  public void remove(){
    throw new UnsupportedOperationException();
  }
  /**
   * Set whether empty tokens should be returned from this point
   * in the tokenizing process onward.
   * <P>
   * Empty tokens occur when two delimiters are next to each other
   * or a delimiter occurs at the beginning or end of a string. If
   * empty tokens are set to be returned, and a comma is the non token
   * delimiter, the following table shows how many tokens are in each
   * string.<br>
   * <table><tr><th>String<th><th>Number of tokens<th></tr>
   * <tr><td align=right>"one,two"<td><td>2 - normal case with no empty tokens.<td></tr>
   * <tr><td align=right>"one,,three"<td><td>3 including the empty token in the middle.<td></tr>
   * <tr><td align=right>"one,"<td><td>2 including the empty token at the end.<td></tr>
   * <tr><td align=right>",two"<td><td>2 including the empty token at the beginning.<td></tr>
   * <tr><td align=right>","<td><td>2 including the empty tokens at the beginning and the end.<td></tr>
   * <tr><td align=right>""<td><td>1 - all strings will have at least one token if empty tokens are returned.<td></tr></table>
   *
   * @param returnEmptyTokens true iff empty tokens should be returned.
   *
   * @since ostermillerutils 1.00.00
   */
  public void setReturnEmptyTokens(boolean returnEmptyTokens){
    // this could affect the number of tokens
    tokenCount = -1;
    this.returnEmptyTokens = returnEmptyTokens;
  }
  /**
   * Get the index of the character immediately
   * following the end of the last token.  This is the position at which this tokenizer will begin looking
   * for the next token when a <code>nextToken()</code> method is invoked.
   *
   * @return the current position or -1 if the entire string has been tokenized.
   *
   * @since ostermillerutils 1.00.00
   */
  public int getCurrentPosition(){
    return this.position;
  }
  /**
   * Retrieve all of the remaining tokens in a String array.
   * This method uses the options that are currently set for
   * the tokenizer and will advance the state of the tokenizer
   * such that <code>hasMoreTokens()</code> will return false.
   *
   * @return an array of tokens from this tokenizer.
   *
   * @since ostermillerutils 1.00.00
   */
  public String[] toArray(){
    String[] tokenArray = new String[countTokens()];
    for(int i=0; hasMoreTokens(); i++) {
      tokenArray[i] = nextToken();
    }
    return tokenArray;
  }
  /**
   * Retrieves the rest of the text as a single token.
   * After calling this method hasMoreTokens() will always return false.
   *
   * @return any part of the text that has not yet been tokenized.
   *
   * @since ostermillerutils 1.00.00
   */
  public String restOfText(){
    return nextToken(null, null);
  }
  /**
   * Returns the same value as nextToken() but does not alter
   * the internal state of the Tokenizer.  Subsequent calls
   * to peek() or a call to nextToken() will return the same
   * token again.
   *
   * @return the next token from this string tokenizer.
   * @throws NoSuchElementException if there are no more tokens in this tokenizer's string.
   *
   * @since ostermillerutils 1.00.00
   */
  public String peek(){
    // copy over state variables from the class to local
    // variables so that the state of this object can be
    // restored to the state that it was in before this
    // method was called.
    int savedPosition = position;
    boolean savedEmptyReturned = emptyReturned;
    int savedtokenCount = tokenCount;
    // get the next token
    String retval = nextToken();
    // restore the state
    position = savedPosition;
    emptyReturned = savedEmptyReturned;
    tokenCount = savedtokenCount;
    // return the nextToken;
    return(retval);
  }
}
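
The empty-token counts listed in the setReturnEmptyTokens documentation above can be cross-checked against the standard library. What follows is only a sketch, not the tokenizer class itself: it assumes that String.split with a negative limit is a fair stand-in for "return empty tokens" when a comma is the only non-token delimiter. The class name EmptyTokenCounts and the helper countTokens are invented for this illustration.

```java
class EmptyTokenCounts {
  // Hypothetical helper: counts comma-separated tokens, keeping empty ones.
  // A negative limit tells String.split not to discard trailing empty strings.
  static int countTokens(String text) {
    return text.split(",", -1).length;
  }

  public static void main(String[] args) {
    // The same sample strings as in the Javadoc table.
    String[] samples = {"one,two", "one,,three", "one,", ",two", ",", ""};
    for (String sample : samples) {
      System.out.println("\"" + sample + "\" -> " + countTokens(sample));
    }
  }
}
```

Under that assumption the counts come out as 2, 3, 2, 2, 2 and 1, matching the table row by row.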

Control the maximum number of substrings generated by splitting a string.

     
public class Main {
  public static void main(String args[]) {
    String str = "one.two.three";
    String delimiter = "\\.";
    // A limit of 2 yields at most two substrings: "one" and "two.three".
    String[] temp = str.split(delimiter, 2);
    for (int i = 0; i < temp.length; i++) {
      System.out.println(temp[i]);
    }
  }
}
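
The limit argument also accepts zero and negative values, which differ only in how trailing empty strings are treated. The sketch below compares the three cases on one input; the class name SplitLimitDemo and the helper describe are invented for this illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {
  // Hypothetical helper: splits and renders the result so the pieces are visible.
  static String describe(String text, String regex, int limit) {
    return Arrays.toString(text.split(regex, limit));
  }

  public static void main(String[] args) {
    String text = "one.two.three..";
    // Positive limit: at most that many pieces; the last piece keeps the rest.
    System.out.println(describe(text, "\\.", 2));   // [one, two.three..]
    // Zero (the default): unlimited pieces, trailing empty strings removed.
    System.out.println(describe(text, "\\.", 0));   // [one, two, three]
    // Negative: unlimited pieces, trailing empty strings kept.
    System.out.println(describe(text, "\\.", -1));  // [one, two, three, , ]
  }
}
```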

Escape special character with a \

     
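| is a regular-expression metacharacter, so it must be escaped as \|. Inside a Java string literal the backslash itself must be escaped with another \, which gives "\\|".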

public class Main {
  public static void main(String args[]) throws Exception {
    String s = "|A|BB||CCC|||";
    String[] words = s.split("\\|");
    for (String string : words) {
      System.out.println(string);
    }
  }
}
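/*
Output (the leading empty string and the one between "||" print as blank lines;
trailing empty strings are dropped):

A
BB

CCC
*/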

Keep the empty strings

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "|A|BB|CCC||||";
    String[] words = s.split("\\|", -1);
    for (String string : words) {
      System.out.println(">"+string+"<");
    }
  }
}
/*
><
>A<
>BB<
>CCC<
><
><
><
><
*/

Parse a line whose separator is a comma followed by a space

    
public class Main {
  public static void main(String[] argv) throws Exception {
    String inputStr = "a, b, c,d";
    String patternStr = ", ";
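    // "c,d" stays together because its comma is not followed by a space.
    String[] fields = inputStr.split(patternStr, -1);
    // fields is ["a", "b", "c,d"]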
  }
}

Parse a line with and's and or's

    
public class Main {
  public static void main(String[] argv) throws Exception {
    String inputStr = "a, b, and c";
    String patternStr = "[, ]+(and|or)*[, ]*";
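    // The separator is a run of commas/spaces, optionally followed by "and" or "or".
    String[] fields = inputStr.split(patternStr, -1);
    // fields is ["a", "b", "c"]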
  }
}
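
The example above computes the fields without printing them. The sketch below reuses the same pattern and prints the result for two inputs; the class name ConjunctionSplit and the helper fields are invented for this illustration.

```java
import java.util.Arrays;

class ConjunctionSplit {
  // Same separator pattern as in the example above.
  static final String SEPARATOR = "[, ]+(and|or)*[, ]*";

  // Hypothetical helper: splits a line into fields, keeping empty ones.
  static String[] fields(String line) {
    return line.split(SEPARATOR, -1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(fields("a, b, and c")));  // [a, b, c]
    System.out.println(Arrays.toString(fields("x or y, z")));    // [x, y, z]
  }
}
```

One caveat of this pattern: it does not require a word boundary after "and" or "or", so a field that merely starts with those letters is truncated. For instance, "a, orange" splits into "a" and "ange".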

Parsing Character-Separated Data with a Regular Expression

    
public class Main {
  public static void main(String[] argv) throws Exception {
    String inputStr = "a,,b";
    String patternStr = ",";
    String[] fields = inputStr.split(patternStr);
    // fields is ["a", "", "b"]: the empty field between the two commas is kept.
  }
}

Pattern splitting with a space as the separator

    

import java.util.regex.Pattern;
public class Main {
  public static void main(String args[]) {
    Pattern p = Pattern.compile(" ");
    String tmp = "this is a test";
    String[] tokens = p.split(tmp);
    for (int i = 0; i < tokens.length; i++) {
      System.out.println(tokens[i]);
    }
  }
}
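
A compiled Pattern is mainly useful when the same separator is applied to many strings, because the expression is compiled only once. The sketch below shows that usage together with the two-argument Pattern.split overload; the class name ReusablePattern and its helpers are invented for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class ReusablePattern {
  // Compiled once and shared by every call below.
  private static final Pattern SPACE = Pattern.compile(" ");

  // Hypothetical helper: splits a line into all of its words.
  static String[] words(String line) {
    return SPACE.split(line);
  }

  // Hypothetical helper: returns at most 'limit' pieces; the last keeps the rest.
  static String[] firstWords(String line, int limit) {
    return SPACE.split(line, limit);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(words("this is a test")));          // [this, is, a, test]
    System.out.println(Arrays.toString(firstWords("this is a test", 2)));  // [this, is a test]
  }
}
```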

Special characters need to be escaped with a \

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "A|BB|CCC";
    String[] words = s.split("\\|");
    for (String str : words) {
      System.out.println(str);
    }
  }
}

Special characters such as "." and "|" need to be escaped when they are used as delimiters.

     
public class Main {
  public static void main(String args[]) {
    String str = "one.two.three";
    String[] temp = str.split("\\.");
    for (String s: temp){
      System.out.println(s);
    }      
  }
}
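
As an alternative to escaping by hand, java.util.regex.Pattern.quote turns any string into a pattern that matches it literally. The sketch below assumes the delimiter should always be treated as plain text; the class name QuotedDelimiter and the helper splitLiteral are invented for this illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuotedDelimiter {
  // Hypothetical helper: splits on the delimiter taken literally, not as a regex.
  static String[] splitLiteral(String text, String delimiter) {
    return text.split(Pattern.quote(delimiter));
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(splitLiteral("one.two.three", ".")));  // [one, two, three]
    System.out.println(Arrays.toString(splitLiteral("A|BB|CCC", "|")));       // [A, BB, CCC]
    // Without quoting, "." matches every character, so nothing is left over.
    System.out.println("one.two.three".split(".").length);                    // 0
  }
}
```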

Specify a regular expression to match one or more spaces

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s3 = "A  BB CCC";
    String[] words = s3.split("\\s+");
    for (String str : words) {
      System.out.println(str);
    }
  }
}
/*
A
BB
CCC
*/
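
When the input starts with whitespace, split("\\s+") returns an empty string as its first element, and an empty input yields a single empty string. The sketch below trims first and handles the empty case explicitly; the class name TrimBeforeSplit and the helper words are invented for this illustration.

```java
import java.util.Arrays;

class TrimBeforeSplit {
  // Hypothetical helper: returns the whitespace-separated words of the text,
  // without a leading empty element and with an empty array for blank input.
  static String[] words(String text) {
    String trimmed = text.trim();
    if (trimmed.isEmpty()) {
      return new String[0];
    }
    return trimmed.split("\\s+");
  }

  public static void main(String[] args) {
    String text = "  A  BB CCC ";
    System.out.println(text.split("\\s+").length);     // 4: the first element is ""
    System.out.println(Arrays.toString(words(text)));  // [A, BB, CCC]
    System.out.println(words("   ").length);           // 0
  }
}
```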

Split a String

     

public class Main {
  public static void main(String[] args) {
    String str = "one,two,three,four,five";
    String[] elements = str.split(",");
    for (int i = 0; i < elements.length; i++)
      System.out.println(elements[i]);
  }
}
/*
one
two
three
four
five
*/

Split a string using String.split()

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s3 = "A-BB-CCC";
    String[] words = s3.split("-");
    for (String str : words) {
      System.out.println(str);
    }
  }
}
/*
A
BB
CCC
*/

Split by dot

     

public class Main {
  public static void main(String args[]) throws Exception {
    String s = "A.BB.CCC";
    String[] words = s.split("\\.");
    for (String str : words) {
      System.out.println(str);
    }
  }
}
/*
A
BB
CCC
*/

" ".split(" ") returns an empty array, so words[0] throws an ArrayIndexOutOfBoundsException

     
public class Main {
  public static void main(String args[]) throws Exception {
    String[] words = " ".split(" ");
    String firstWord = words[0];
    System.out.println(firstWord);
  }
}
/*
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
  at Main.main(Main.java:5)
*/
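
The exception occurs because split removes trailing empty strings, which here leaves an empty array. A minimal way to guard against it is to check the array length before indexing, as sketched below; the class name SafeFirstWord and the helper firstWord are invented for this illustration.

```java
class SafeFirstWord {
  // Hypothetical helper: returns the first space-separated word,
  // or an empty string when the split produced no elements.
  static String firstWord(String text) {
    String[] words = text.split(" ");
    return words.length > 0 ? words[0] : "";
  }

  public static void main(String[] args) {
    System.out.println(" ".split(" ").length);      // 0: both empty strings were dropped
    System.out.println(" ".split(" ", -1).length);  // 2: a negative limit keeps them
    System.out.println(">" + firstWord(" ") + "<");            // ><
    System.out.println(">" + firstWord("hello world") + "<");  // >hello<
  }
}
```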

Split on various punctuation and zero or more trailing spaces.

    
import java.util.Arrays;
public class Main {
  public static void main(String[] argv) throws Exception {
    String testStr = "This;:, is:;:;:;. a!:; test?";
    System.out.println("Original string: " + testStr);
    String[] result = testStr.split("[.,!?:;]+\\s*");
    System.out.print("Split on various punctuation: ");
    System.out.println(Arrays.toString(result));
  }
}
/*
Original string: This;:, is:;:;:;. a!:; test?
Split on various punctuation: [This, is, a, test]
*/

Split on word boundaries.

    
import java.util.Arrays;
public class Main {
  public static void main(String[] argv) throws Exception {
    String testStr = "One, Two, and Three.";
    System.out.println("Original string: " + testStr);
    String[] result = testStr.split("\\W+");
    System.out.print("Split at word boundaries: ");
    System.out.println(Arrays.toString(result));
  }
}
/*
Original string: One, Two, and Three.
Split at word boundaries: [One, Two, and, Three]
*/

Split on word boundaries, but allow embedded periods and @.

    
import java.util.Arrays;
public class Main {
  public static void main(String[] argv) throws Exception {
    String testStr = "J J@H.ru";
    System.out.println("Original string: " + testStr);
    String[] result = testStr.split("[\\W&&[^.@]]+");
    System.out.println(Arrays.toString(result));
  }
}
/*Original string: J J@H.ru
[J, J@H.ru]
*/

Split same string on commas and zero or more spaces.

    
import java.util.Arrays;
public class Main {
  public static void main(String[] argv) throws Exception {
    String testStr = "one,   two, three";
    System.out.println("Original string: " + testStr);
    String[] result = testStr.split(",\\s*");
    System.out.print("Split at commas: ");
    System.out.println(Arrays.toString(result));
  }
}
/*
Original string: one,   two, three
Split at commas: [one, two, three]
*/

Splits a string around matches of the given delimiter character.

    
import java.util.StringTokenizer;
/*
 Derby - Class org.apache.derby.iapi.util.PropertyUtil
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements.  See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to you under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License.  You may obtain a copy of the License at
 http://www.apache.org/licenses/LICENSE-2.0
 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 */
public class Main {
  /**
   * Splits a string around matches of the given delimiter character.
   *
   * Where applicable, this method can be used as a substitute for
   * <code>String.split(String regex)</code>, which is not available
   * on a JSR169/Java ME platform.
   *
   * @param str the string to be split
   * @param delim the delimiter
   * @throws NullPointerException if str is null
   */
  static public String[] split(String str, char delim)
  {
      if (str == null) {
          throw new NullPointerException("str can't be null");
      }
      // Note the javadoc on StringTokenizer:
      //     StringTokenizer is a legacy class that is retained for
      //     compatibility reasons although its use is discouraged in
      //     new code.
      // In other words, if StringTokenizer is ever removed from the JDK,
      // we need to have a look at String.split() (or java.util.regex)
      // if it is supported on a JSR169/Java ME platform by then.
      StringTokenizer st = new StringTokenizer(str, String.valueOf(delim));
      int n = st.countTokens();
      String[] s = new String[n];
      for (int i = 0; i < n; i++) {
          s[i] = st.nextToken();
      }
      return s;
  }

}
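
Because the method above relies on StringTokenizer, it never returns empty tokens, so it is not an exact substitute for String.split when delimiters are adjacent. The sketch below reproduces the same tokenizer loop under an invented class name (TokenizerVersusSplit) and compares the two approaches on one input.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class TokenizerVersusSplit {
  // Same approach as the method above: StringTokenizer with one delimiter char.
  static String[] withTokenizer(String str, char delim) {
    StringTokenizer st = new StringTokenizer(str, String.valueOf(delim));
    String[] tokens = new String[st.countTokens()];
    for (int i = 0; i < tokens.length; i++) {
      tokens[i] = st.nextToken();
    }
    return tokens;
  }

  public static void main(String[] args) {
    String input = "a,,b,";
    // StringTokenizer skips the empty token between the two commas.
    System.out.println(Arrays.toString(withTokenizer(input, ',')));  // [a, b]
    // String.split keeps it and drops only the trailing empty string.
    System.out.println(Arrays.toString(input.split(",")));           // [a, , b]
  }
}
```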

Splits a String by Character type as returned by java.lang.Character.getType(char)

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits a String by Character type as returned by
   * <code>java.lang.Character.getType(char)</code>. Groups of contiguous
   * characters of the same type are returned as complete tokens, with the
   * following exception: the character of type
   * <code>Character.UPPERCASE_LETTER</code>, if any, immediately
   * preceding a token of type <code>Character.LOWERCASE_LETTER</code>
   * will belong to the following token rather than to the preceding, if any,
   * <code>Character.UPPERCASE_LETTER</code> token. 
   * <pre>
   * StringUtils.splitByCharacterTypeCamelCase(null)         = null
   * StringUtils.splitByCharacterTypeCamelCase("")           = []
   * StringUtils.splitByCharacterTypeCamelCase("ab de fg")   = ["ab", " ", "de", " ", "fg"]
   * StringUtils.splitByCharacterTypeCamelCase("ab   de fg") = ["ab", "   ", "de", " ", "fg"]
   * StringUtils.splitByCharacterTypeCamelCase("ab:cd:ef")   = ["ab", ":", "cd", ":", "ef"]
   * StringUtils.splitByCharacterTypeCamelCase("number5")    = ["number", "5"]
   * StringUtils.splitByCharacterTypeCamelCase("fooBar")     = ["foo", "Bar"]
   * StringUtils.splitByCharacterTypeCamelCase("foo200Bar")  = ["foo", "200", "Bar"]
   * StringUtils.splitByCharacterTypeCamelCase("ASFRules")   = ["ASF", "Rules"]
   * </pre>
   * @param str the String to split, may be <code>null</code>
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.4
   */
  public static String[] splitByCharacterTypeCamelCase(String str) {
      return splitByCharacterType(str, true);
  }
  /**
   * <p>
   * Splits a String by Character type as returned by
   * <code>java.lang.Character.getType(char)</code>. Groups of contiguous
   * characters of the same type are returned as complete tokens, with the
   * following exception: if <code>camelCase</code> is <code>true</code>,
   * the character of type <code>Character.UPPERCASE_LETTER</code>, if any,
   * immediately preceding a token of type
   * <code>Character.LOWERCASE_LETTER</code> will belong to the following
   * token rather than to the preceding, if any,
   * <code>Character.UPPERCASE_LETTER</code> token.
   * 
   * @param str
   *          the String to split, may be <code>null</code>
   * @param camelCase
   *          whether to use so-called "camel-case" for letter types
   * @return an array of parsed Strings, <code>null</code> if null String
   *         input
   * @since 2.4
   */
  private static String[] splitByCharacterType(String str, boolean camelCase) {
    if (str == null) {
      return null;
    }
    if (str.length() == 0) {
      return new String[0];
    }
    char[] c = str.toCharArray();
    List list = new ArrayList();
    int tokenStart = 0;
    int currentType = Character.getType(c[tokenStart]);
    for (int pos = tokenStart + 1; pos < c.length; pos++) {
      int type = Character.getType(c[pos]);
      if (type == currentType) {
        continue;
      }
      if (camelCase && type == Character.LOWERCASE_LETTER
          && currentType == Character.UPPERCASE_LETTER) {
        int newTokenStart = pos - 1;
        if (newTokenStart != tokenStart) {
          list.add(new String(c, tokenStart, newTokenStart - tokenStart));
          tokenStart = newTokenStart;
        }
      } else {
        list.add(new String(c, tokenStart, pos - tokenStart));
        tokenStart = pos;
      }
      currentType = type;
    }
    list.add(new String(c, tokenStart, c.length - tokenStart));
    return (String[]) list.toArray(new String[list.size()]);
  }
}
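
For simple ASCII identifiers, a camel-case split can also be approximated with zero-width lookarounds in String.split. This is a different technique from the Character.getType scan above, not a replacement for it: the sketch assumes only the letters a-z and A-Z matter and does not separate digits or punctuation. The class name CamelCaseRegexSplit and the BOUNDARY pattern are invented for this illustration.

```java
import java.util.Arrays;

class CamelCaseRegexSplit {
  // Splits at an empty position that is either
  //  - between a lowercase and an uppercase letter ("foo|Bar"), or
  //  - between two uppercase letters when a lowercase letter follows ("ASF|Rules").
  private static final String BOUNDARY =
      "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

  static String[] split(String identifier) {
    return identifier.split(BOUNDARY);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(split("fooBar")));    // [foo, Bar]
    System.out.println(Arrays.toString(split("ASFRules")));  // [ASF, Rules]
  }
}
```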

Splits a String by char: Groups of contiguous characters of the same type are returned as complete tokens.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits a String by Character type as returned by
   * <code>java.lang.Character.getType(char)</code>. Groups of contiguous
   * characters of the same type are returned as complete tokens. 
   * <pre>
   * StringUtils.splitByCharacterType(null)         = null
   * StringUtils.splitByCharacterType("")           = []
   * StringUtils.splitByCharacterType("ab de fg")   = ["ab", " ", "de", " ", "fg"]
   * StringUtils.splitByCharacterType("ab   de fg") = ["ab", "   ", "de", " ", "fg"]
   * StringUtils.splitByCharacterType("ab:cd:ef")   = ["ab", ":", "cd", ":", "ef"]
   * StringUtils.splitByCharacterType("number5")    = ["number", "5"]
   * StringUtils.splitByCharacterType("fooBar")     = ["foo", "B", "ar"]
   * StringUtils.splitByCharacterType("foo200Bar")  = ["foo", "200", "B", "ar"]
   * StringUtils.splitByCharacterType("ASFRules")   = ["ASFR", "ules"]
   * </pre>
   * @param str the String to split, may be <code>null</code>
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.4
   */
  public static String[] splitByCharacterType(String str) {
      return splitByCharacterType(str, false);
  }
  /**
   * <p>
   * Splits a String by Character type as returned by
   * <code>java.lang.Character.getType(char)</code>. Groups of contiguous
   * characters of the same type are returned as complete tokens, with the
   * following exception: if <code>camelCase</code> is <code>true</code>,
   * the character of type <code>Character.UPPERCASE_LETTER</code>, if any,
   * immediately preceding a token of type
   * <code>Character.LOWERCASE_LETTER</code> will belong to the following
   * token rather than to the preceding, if any,
   * <code>Character.UPPERCASE_LETTER</code> token.
   * 
   * @param str
   *          the String to split, may be <code>null</code>
   * @param camelCase
   *          whether to use so-called "camel-case" for letter types
   * @return an array of parsed Strings, <code>null</code> if null String
   *         input
   * @since 2.4
   */
  private static String[] splitByCharacterType(String str, boolean camelCase) {
    if (str == null) {
      return null;
    }
    if (str.length() == 0) {
      return new String[0];
    }
    char[] c = str.toCharArray();
    List list = new ArrayList();
    int tokenStart = 0;
    int currentType = Character.getType(c[tokenStart]);
    for (int pos = tokenStart + 1; pos < c.length; pos++) {
      int type = Character.getType(c[pos]);
      if (type == currentType) {
        continue;
      }
      if (camelCase && type == Character.LOWERCASE_LETTER
          && currentType == Character.UPPERCASE_LETTER) {
        int newTokenStart = pos - 1;
        if (newTokenStart != tokenStart) {
          list.add(new String(c, tokenStart, newTokenStart - tokenStart));
          tokenStart = newTokenStart;
        }
      } else {
        list.add(new String(c, tokenStart, pos - tokenStart));
        tokenStart = pos;
      }
      currentType = type;
    }
    list.add(new String(c, tokenStart, c.length - tokenStart));
    return (String[]) list.toArray(new String[list.size()]);
  }
}

Splits the provided text into an array, separator specified.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits the provided text into an array, separator specified.
   * This is an alternative to using StringTokenizer.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as one separator.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.</p>
   *
   * <pre>
   * StringUtils.split(null, *)         = null
   * StringUtils.split("", *)           = []
   * StringUtils.split("a.b.c", ".")    = ["a", "b", "c"]
   * StringUtils.split("a..b.c", ".")   = ["a", "b", "c"]
   * StringUtils.split("a:b:c", ".")    = ["a:b:c"]
   * StringUtils.split("a b c", " ")    = ["a", "b", "c"]
   * </pre>
   *
   * @param str  the String to parse, may be null
   * @param separatorChar  the character used as the delimiter
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.0
   */
  public static String[] split(String str, char separatorChar) {
      return splitWorker(str, separatorChar, false);
  }
  /**
   * Performs the logic for the <code>split</code> and
   * <code>splitPreserveAllTokens</code> methods that do not return a
   * maximum array length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChar the separator character
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      while (i < len) {
          if (str.charAt(i) == separatorChar) {
              if (match || preserveAllTokens) {
                  list.add(str.substring(start, i));
                  match = false;
                  lastMatch = true;
              }
              start = ++i;
              continue;
          }
          lastMatch = false;
          match = true;
          i++;
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}

Splits the provided text into an array, separator specified, preserving all tokens, including empty tokens created by adjacent separators.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits the provided text into an array, separator specified,
   * preserving all tokens, including empty tokens created by adjacent
   * separators. This is an alternative to using StringTokenizer.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as separators for empty tokens.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.</p>
   *
   * <pre>
   * StringUtils.splitPreserveAllTokens(null, *)         = null
   * StringUtils.splitPreserveAllTokens("", *)           = []
   * StringUtils.splitPreserveAllTokens("a.b.c", ".")    = ["a", "b", "c"]
   * StringUtils.splitPreserveAllTokens("a..b.c", ".")   = ["a", "", "b", "c"]
   * StringUtils.splitPreserveAllTokens("a:b:c", ".")    = ["a:b:c"]
   * StringUtils.splitPreserveAllTokens("a\tb\nc", null) = ["a", "b", "c"]
   * StringUtils.splitPreserveAllTokens("a b c", " ")    = ["a", "b", "c"]
   * StringUtils.splitPreserveAllTokens("a b c ", " ")   = ["a", "b", "c", ""]
   * StringUtils.splitPreserveAllTokens("a b c  ", " ")  = ["a", "b", "c", "", ""]
   * StringUtils.splitPreserveAllTokens(" a b c", " ")   = ["", "a", "b", "c"]
   * StringUtils.splitPreserveAllTokens("  a b c", " ")  = ["", "", "a", "b", "c"]
   * StringUtils.splitPreserveAllTokens(" a b c ", " ")  = ["", "a", "b", "c", ""]
   * </pre>
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChar  the character used as the delimiter
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.1
   */
  public static String[] splitPreserveAllTokens(String str, char separatorChar) {
      return splitWorker(str, separatorChar, true);
  }
/**
 * Performs the logic for the <code>split</code> and 
 * <code>splitPreserveAllTokens</code> methods that do not return a
 * maximum array length.
 *
 * @param str  the String to parse, may be <code>null</code>
 * @param separatorChar the separator character
 * @param preserveAllTokens if <code>true</code>, adjacent separators are
 * treated as empty token separators; if <code>false</code>, adjacent
 * separators are treated as one separator.
 * @return an array of parsed Strings, <code>null</code> if null String input
 */
private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
    // Performance tuned for 2.0 (JDK1.4)
    if (str == null) {
        return null;
    }
    int len = str.length();
    if (len == 0) {
        return new String[0];
    }
    List list = new ArrayList();
    int i = 0, start = 0;
    boolean match = false;
    boolean lastMatch = false;
    while (i < len) {
        if (str.charAt(i) == separatorChar) {
            if (match || preserveAllTokens) {
                list.add(str.substring(start, i));
                match = false;
                lastMatch = true;
            }
            start = ++i;
            continue;
        }
        lastMatch = false;
        match = true;
        i++;
    }
    if (match || (preserveAllTokens && lastMatch)) {
        list.add(str.substring(start, i));
    }
    return (String[]) list.toArray(new String[list.size()]);
}
}
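
For comparison, the JDK's own String.split keeps empty tokens when it is given a negative limit. The sketch below approximates splitPreserveAllTokens(String, char) that way. The class and method names are hypothetical, and the equivalence is only claimed for the inputs shown, not for every edge case.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Commons Lang.
class PreserveTokensWithJdkDemo {

    // Approximates splitPreserveAllTokens(str, sep) with String.split.
    // A negative limit tells String.split to keep trailing empty strings.
    static String[] splitKeepingEmpty(String str, char sep) {
        if (str == null) {
            return null;
        }
        if (str.isEmpty()) {
            // String.split would return [""] here, so handle it explicitly.
            return new String[0];
        }
        // Pattern.quote makes regex metacharacters such as '.' literal.
        return str.split(Pattern.quote(String.valueOf(sep)), -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepingEmpty("a..b.c", '.')));
        System.out.println(Arrays.toString(splitKeepingEmpty(" a b c ", ' ')));
        System.out.println(Arrays.toString(splitKeepingEmpty("a b c  ", ' ')));
    }
}
```

The explicit empty-string check matters because "".split(...) yields a one-element array containing the empty string, whereas the method above documents an empty array for that input.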

Splits the provided text into an array, separators specified, preserving all tokens, including empty tokens created by adjacent separators.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits the provided text into an array, separators specified, 
   * preserving all tokens, including empty tokens created by adjacent
   * separators. This is an alternative to using StringTokenizer.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as separators for empty tokens.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.
   * A <code>null</code> separatorChars splits on whitespace.</p>
   *
   * <pre>
   * StringUtils.splitPreserveAllTokens(null, *)           = null
   * StringUtils.splitPreserveAllTokens("", *)             = []
   * StringUtils.splitPreserveAllTokens("abc def", null)   = ["abc", "def"]
   * StringUtils.splitPreserveAllTokens("abc def", " ")    = ["abc", "def"]
   * StringUtils.splitPreserveAllTokens("abc  def", " ")   = ["abc", "", "def"]
   * StringUtils.splitPreserveAllTokens("ab:cd:ef", ":")   = ["ab", "cd", "ef"]
   * StringUtils.splitPreserveAllTokens("ab:cd:ef:", ":")  = ["ab", "cd", "ef", ""]
   * StringUtils.splitPreserveAllTokens("ab:cd:ef::", ":") = ["ab", "cd", "ef", "", ""]
   * StringUtils.splitPreserveAllTokens("ab::cd:ef", ":")  = ["ab", "", "cd", "ef"]
   * StringUtils.splitPreserveAllTokens(":cd:ef", ":")     = ["", "cd", "ef"]
   * StringUtils.splitPreserveAllTokens("::cd:ef", ":")    = ["", "", "cd", "ef"]
   * StringUtils.splitPreserveAllTokens(":cd:ef:", ":")    = ["", "cd", "ef", ""]
   * </pre>
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars  the characters used as the delimiters,
   *  <code>null</code> splits on whitespace
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.1
   */
  public static String[] splitPreserveAllTokens(String str, String separatorChars) {
      return splitWorker(str, separatorChars, -1, true);
  }
  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the separator characters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}
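
When no maximum is in play, the "standard case" branch above reduces to a short loop: every character found in the separator set ends a token, even an empty one. The following is a simplified sketch under that assumption (non-null separator set, no max handling); SeparatorSetDemo and splitKeepEmpty are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Commons Lang.
class SeparatorSetDemo {

    // Each character of seps is a separator in its own right.
    // Empty tokens between adjacent separators are kept.
    static String[] splitKeepEmpty(String str, String seps) {
        if (str == null) {
            return null;
        }
        if (str.isEmpty()) {
            return new String[0];
        }
        List<String> tokens = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (seps.indexOf(str.charAt(i)) >= 0) {
                tokens.add(str.substring(start, i)); // may be ""
                start = i + 1;
            }
        }
        tokens.add(str.substring(start)); // the final token, possibly ""
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitKeepEmpty("ab:cd:ef::", ":")));
        System.out.println(Arrays.toString(splitKeepEmpty(":cd:ef:", ":")));
        System.out.println(Arrays.toString(splitKeepEmpty("ab:,cd", ":,")));
    }
}
```

Note that ":," is a set of two separator characters, not one two-character delimiter, so "ab:,cd" produces an empty token between the colon and the comma.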

Splits the provided text into an array, separators specified. This is an alternative to using StringTokenizer.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits the provided text into an array, separators specified.
   * This is an alternative to using StringTokenizer.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as one separator.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.
   * A <code>null</code> separatorChars splits on whitespace.</p>
   *
   * <pre>
   * StringUtils.split(null, *)         = null
   * StringUtils.split("", *)           = []
   * StringUtils.split("abc def", null) = ["abc", "def"]
   * StringUtils.split("abc def", " ")  = ["abc", "def"]
   * StringUtils.split("abc  def", " ") = ["abc", "def"]
   * StringUtils.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
   * </pre>
   *
   * @param str  the String to parse, may be null
   * @param separatorChars  the characters used as the delimiters,
   *  <code>null</code> splits on whitespace
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  public static String[] split(String str, String separatorChars) {
      return splitWorker(str, separatorChars, -1, false);
  }
  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the separator characters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}
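
Since the method above is described as an alternative to StringTokenizer, it may help to see the StringTokenizer form side by side. The sketch below uses only java.util.StringTokenizer; the class TokenizerComparisonDemo and the helper tokenize are hypothetical names. Unlike split above, this helper does not accept a null input string or null delimiters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; not part of Commons Lang.
class TokenizerComparisonDemo {

    // Collects the tokens produced by StringTokenizer into an array.
    // Every character in delims is a delimiter; adjacent delimiters
    // never produce empty tokens.
    static String[] tokenize(String str, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(str, delims);
        List<String> tokens = new ArrayList<String>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("abc  def", " "))); // [abc, def]
        System.out.println(Arrays.toString(tokenize("ab:cd:ef", ":"))); // [ab, cd, ef]
        System.out.println(Arrays.toString(tokenize("ab:cd,ef", ":,"))); // [ab, cd, ef]
    }
}
```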

Splits the provided text into an array, separator string specified. Returns a maximum of max substrings.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {

  /**
   * <p>Splits the provided text into an array, separator string specified.
   * Returns a maximum of <code>max</code> substrings.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as separators for empty tokens.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.
   * A <code>null</code> separator splits on whitespace.</p>
   *
   * <pre>
   * StringUtils.splitByWholeSeparatorPreserveAllTokens(null, *, *)               = null
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("", *, *)                 = []
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("ab de fg", null, 0)      = ["ab", "de", "fg"]
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("ab   de fg", null, 0)    = ["ab", "", "", "de", "fg"]
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("ab:cd:ef", ":", 2)       = ["ab", "cd:ef"]
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("ab-!-cd-!-ef", "-!-", 5) = ["ab", "cd", "ef"]
   * StringUtils.splitByWholeSeparatorPreserveAllTokens("ab-!-cd-!-ef", "-!-", 2) = ["ab", "cd-!-ef"]
   * </pre>
   *
   * @param str  the String to parse, may be null
   * @param separator  String containing the String to be used as a delimiter,
   *  <code>null</code> splits on whitespace
   * @param max  the maximum number of elements to include in the returned
   *  array. A zero or negative value implies no limit.
   * @return an array of parsed Strings, <code>null</code> if null String was input
   * @since 2.4
   */
  public static String[] splitByWholeSeparatorPreserveAllTokens(String str, String separator, int max) {
      return splitByWholeSeparatorWorker(str, separator, max, true);
  }
  /**
   * Performs the logic for the <code>splitByWholeSeparatorPreserveAllTokens</code> methods.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separator  String containing the String to be used as a delimiter,
   *  <code>null</code> splits on whitespace
   * @param max  the maximum number of elements to include in the returned
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.4
   */
  private static String[] splitByWholeSeparatorWorker(String str, String separator, int max, 
                                                      boolean preserveAllTokens) 
  {
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      if ((separator == null) || ("".equals(separator))) {
          // Split on whitespace.
          return splitWorker(str, null, max, preserveAllTokens);
      }
      int separatorLength = separator.length();
      ArrayList substrings = new ArrayList();
      int numberOfSubstrings = 0;
      int beg = 0;
      int end = 0;
      while (end < len) {
          end = str.indexOf(separator, beg);
          if (end > -1) {
              if (end > beg) {
                  numberOfSubstrings += 1;
                  if (numberOfSubstrings == max) {
                      end = len;
                      substrings.add(str.substring(beg));
                  } else {
                      // The following is OK, because String.substring( beg, end ) excludes
                      // the character at the position "end".
                      substrings.add(str.substring(beg, end));
                      // Set the starting point for the next search.
                      // The following is equivalent to beg = end + (separatorLength - 1) + 1,
                      // which is the right calculation:
                      beg = end + separatorLength;
                  }
              } else {
                  // We found a consecutive occurrence of the separator: record an
                  // empty token when preserving all tokens, otherwise skip it.
                  if (preserveAllTokens) {
                      numberOfSubstrings += 1;
                      if (numberOfSubstrings == max) {
                          end = len;
                          substrings.add(str.substring(beg));
                      } else {
                          substrings.add("");
                      }
                  }
                  beg = end + separatorLength;
              }
          } else {
              // String.substring( beg ) goes from "beg" to the end of the String.
              substrings.add(str.substring(beg));
              end = len;
          }
      }
      return (String[]) substrings.toArray(new String[substrings.size()]);
  }

  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the separator characters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}
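
The effect of the max argument with a whole-string separator can also be reproduced with the JDK's String.split(regex, limit). The sketch below is an approximation for the documented examples only, assuming a non-null, non-empty separator. WholeSeparatorLimitDemo and splitWhole are hypothetical names, and edge cases of the library method may differ.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Commons Lang.
class WholeSeparatorLimitDemo {

    // Splits on the whole separator string and returns at most max tokens.
    // A zero or negative max means "no limit", as in the method above.
    static String[] splitWhole(String str, String separator, int max) {
        if (str == null) {
            return null;
        }
        if (str.isEmpty()) {
            return new String[0];
        }
        // String.split treats limit 0 as "drop trailing empty strings",
        // so map "no limit" to a negative limit to keep every token.
        int limit = (max <= 0) ? -1 : max;
        return str.split(Pattern.quote(separator), limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWhole("ab-!-cd-!-ef", "-!-", 5)));
        System.out.println(Arrays.toString(splitWhole("ab-!-cd-!-ef", "-!-", 2)));
        System.out.println(Arrays.toString(splitWhole("ab:cd:ef", ":", 2)));
    }
}
```

Once the limit is reached, the last token carries the rest of the input, separators included, which is why the second call yields "cd-!-ef" as a single element.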

Splits the provided text into an array, using whitespace as the separator, preserving all tokens, including empty tokens created by adjacent separators.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  // -----------------------------------------------------------------------
  /**
   * <p>Splits the provided text into an array, using whitespace as the
   * separator, preserving all tokens, including empty tokens created by 
   * adjacent separators. This is an alternative to using StringTokenizer.
   * Whitespace is defined by {@link Character#isWhitespace(char)}.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as separators for empty tokens.
   * For more control over the split use the StrTokenizer class.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.</p>
   *
   * <pre>
   * StringUtils.splitPreserveAllTokens(null)       = null
   * StringUtils.splitPreserveAllTokens("")         = []
   * StringUtils.splitPreserveAllTokens("abc def")  = ["abc", "def"]
   * StringUtils.splitPreserveAllTokens("abc  def") = ["abc", "", "def"]
   * StringUtils.splitPreserveAllTokens(" abc ")    = ["", "abc", ""]
   * </pre>
   *
   * @param str  the String to parse, may be <code>null</code>
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.1
   */
  public static String[] splitPreserveAllTokens(String str) {
      return splitWorker(str, null, -1, true);
  }
  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the separator characters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}
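
Because whitespace here is defined by Character.isWhitespace(char), which characters split the input is not always obvious. The sketch below isolates just that rule in a small loop. WhitespaceDefinitionDemo and splitOnWhitespace are hypothetical names, and the max handling of the worker method is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Commons Lang.
class WhitespaceDefinitionDemo {

    // Splits on every character for which Character.isWhitespace is true,
    // keeping empty tokens between adjacent whitespace characters.
    static String[] splitOnWhitespace(String str) {
        if (str == null) {
            return null;
        }
        if (str.isEmpty()) {
            return new String[0];
        }
        List<String> tokens = new ArrayList<String>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (Character.isWhitespace(str.charAt(i))) {
                tokens.add(str.substring(start, i));
                start = i + 1;
            }
        }
        tokens.add(str.substring(start));
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Tab and EM SPACE (U+2003) count as whitespace.
        System.out.println(splitOnWhitespace("a\tb").length);      // 2
        System.out.println(splitOnWhitespace("a\u2003b").length);  // 2
        // NO-BREAK SPACE (U+00A0) does not, so the string stays whole.
        System.out.println(splitOnWhitespace("a\u00A0b").length);  // 1
        System.out.println(Arrays.toString(splitOnWhitespace(" abc "))); // [, abc, ]
    }
}
```

Character.isWhitespace deliberately excludes the non-breaking spaces, so text copied from web pages may not split where a reader expects.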

Splits the provided text into an array with a maximum length, separators specified.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {
  /**
   * <p>Splits the provided text into an array with a maximum length,
   * separators specified.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as one separator.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.
   * A <code>null</code> separatorChars splits on whitespace.</p>
   *
   * <p>If more than <code>max</code> delimited substrings are found, the last
   * returned string includes all characters after the first <code>max - 1</code>
   * returned strings (including separator characters).</p>
   *
   * <pre>
   * StringUtils.split(null, *, *)            = null
   * StringUtils.split("", *, *)              = []
   * StringUtils.split("ab cd ef", null, 0)   = ["ab", "cd", "ef"]
   * StringUtils.split("ab   cd ef", null, 0) = ["ab", "cd", "ef"]
   * StringUtils.split("ab:cd:ef", ":", 0)    = ["ab", "cd", "ef"]
   * StringUtils.split("ab:cd:ef", ":", 2)    = ["ab", "cd:ef"]
   * </pre>
   *
   * @param str  the String to parse, may be null
   * @param separatorChars  the characters used as the delimiters,
   *  <code>null</code> splits on whitespace
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  public static String[] split(String str, String separatorChars, int max) {
      return splitWorker(str, separatorChars, max, false);
  }
  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the characters used as delimiters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}

Splits the provided text into an array with a maximum length, separators specified, preserving all tokens, including empty tokens created by adjacent separators.

    
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 * 
 *      http://www.apache.org/licenses/LICENSE-2.0
 * 
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.util.ArrayList;
import java.util.List;
public class Main {

  /**
   * <p>Splits the provided text into an array with a maximum length,
   * separators specified, preserving all tokens, including empty tokens 
   * created by adjacent separators.</p>
   *
   * <p>The separator is not included in the returned String array.
   * Adjacent separators are treated as separators for empty tokens.</p>
   *
   * <p>A <code>null</code> input String returns <code>null</code>.
   * A <code>null</code> separatorChars splits on whitespace.</p>
   *
   * <p>If more than <code>max</code> delimited substrings are found, the last
   * returned string includes all characters after the first <code>max - 1</code>
   * returned strings (including separator characters).</p>
   *
   * <pre>
   * StringUtils.splitPreserveAllTokens(null, *, *)            = null
   * StringUtils.splitPreserveAllTokens("", *, *)              = []
   * StringUtils.splitPreserveAllTokens("ab de fg", null, 0)   = ["ab", "de", "fg"]
   * StringUtils.splitPreserveAllTokens("ab   de fg", null, 0) = ["ab", "", "", "de", "fg"]
   * StringUtils.splitPreserveAllTokens("ab:cd:ef", ":", 0)    = ["ab", "cd", "ef"]
   * StringUtils.splitPreserveAllTokens("ab:cd:ef", ":", 2)    = ["ab", "cd:ef"]
   * StringUtils.splitPreserveAllTokens("ab   de fg", null, 2) = ["ab", "  de fg"]
   * StringUtils.splitPreserveAllTokens("ab   de fg", null, 3) = ["ab", "", " de fg"]
   * StringUtils.splitPreserveAllTokens("ab   de fg", null, 4) = ["ab", "", "", "de fg"]
   * </pre>
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars  the characters used as the delimiters,
   *  <code>null</code> splits on whitespace
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit
   * @return an array of parsed Strings, <code>null</code> if null String input
   * @since 2.1
   */
  public static String[] splitPreserveAllTokens(String str, String separatorChars, int max) {
      return splitWorker(str, separatorChars, max, true);
  }
  /**
   * Performs the logic for the <code>split</code> and 
   * <code>splitPreserveAllTokens</code> methods that return a maximum array 
   * length.
   *
   * @param str  the String to parse, may be <code>null</code>
   * @param separatorChars the characters used as delimiters
   * @param max  the maximum number of elements to include in the
   *  array. A zero or negative value implies no limit.
   * @param preserveAllTokens if <code>true</code>, adjacent separators are
   * treated as empty token separators; if <code>false</code>, adjacent
   * separators are treated as one separator.
   * @return an array of parsed Strings, <code>null</code> if null String input
   */
  private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
      // Performance tuned for 2.0 (JDK1.4)
      // Direct code is quicker than StringTokenizer.
      // Also, StringTokenizer uses isSpace() not isWhitespace()
      if (str == null) {
          return null;
      }
      int len = str.length();
      if (len == 0) {
          return new String[0];
      }
      List list = new ArrayList();
      int sizePlus1 = 1;
      int i = 0, start = 0;
      boolean match = false;
      boolean lastMatch = false;
      if (separatorChars == null) {
          // Null separator means use whitespace
          while (i < len) {
              if (Character.isWhitespace(str.charAt(i))) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else if (separatorChars.length() == 1) {
          // Optimise 1 character case
          char sep = separatorChars.charAt(0);
          while (i < len) {
              if (str.charAt(i) == sep) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      } else {
          // standard case
          while (i < len) {
              if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                  if (match || preserveAllTokens) {
                      lastMatch = true;
                      if (sizePlus1++ == max) {
                          i = len;
                          lastMatch = false;
                      }
                      list.add(str.substring(start, i));
                      match = false;
                  }
                  start = ++i;
                  continue;
              }
              lastMatch = false;
              match = true;
              i++;
          }
      }
      if (match || (preserveAllTokens && lastMatch)) {
          list.add(str.substring(start, i));
      }
      return (String[]) list.toArray(new String[list.size()]);
  }
}
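
The difference between split and splitPreserveAllTokens comes down to whether the empty tokens between adjacent separators are kept. The following is a simplified sketch of that idea only, not the Commons Lang implementation: the class name TokenModes and the method splitChars are hypothetical, and the max limit and null handling are left out.

```java
import java.util.ArrayList;
import java.util.List;

class TokenModes {
  // Hypothetical helper: splits str on any character found in separatorChars.
  // When preserveAllTokens is false, empty tokens are skipped.
  static List<String> splitChars(String str, String separatorChars, boolean preserveAllTokens) {
    List<String> tokens = new ArrayList<>();
    if (str.isEmpty()) {
      return tokens; // mirrors the "" -> [] case in the Javadoc above
    }
    int start = 0;
    for (int i = 0; i <= str.length(); i++) {
      boolean atEnd = (i == str.length());
      if (atEnd || separatorChars.indexOf(str.charAt(i)) >= 0) {
        String token = str.substring(start, i);
        if (preserveAllTokens || !token.isEmpty()) {
          tokens.add(token);
        }
        start = i + 1;
      }
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(splitChars("ab::cd:ef", ":", false)); // [ab, cd, ef]
    System.out.println(splitChars("ab::cd:ef", ":", true));  // [ab, , cd, ef]
  }
}
```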

Split Strings with Patterns: split("[-/%]")

    
public class Main {
  public static void main(String[] arguments) {
    String input = "12%12%%12";
    String[] piece = input.split("[-/%]");
    for (int j = 0; j < piece.length; j++)
      System.out.println(piece[j] + "\t");
  }
}
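
The character class above matches one delimiter at a time, so the doubled "%%" in the input produces an empty token. A small sketch comparing that pattern with a variant that treats a run of delimiters as one; the class and method names here are made up for illustration.

```java
import java.util.Arrays;

class CharClassSplit {
  // Each single '-', '/' or '%' is a delimiter.
  static String[] splitEach(String input) {
    return input.split("[-/%]");
  }

  // The '+' quantifier makes a run of delimiters count as one.
  static String[] splitRuns(String input) {
    return input.split("[-/%]+");
  }

  public static void main(String[] args) {
    String input = "12%12%%12";
    System.out.println(Arrays.toString(splitEach(input))); // [12, 12, , 12]
    System.out.println(Arrays.toString(splitRuns(input))); // [12, 12, 12]
  }
}
```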

Split the source into two strings at the first occurrence of the splitter. Subsequent occurrences are not treated specially, and may be part of the second string.

   
import java.util.Collection;
import java.util.Iterator;
import java.util.Vector;
/**********************************************************************************
 * $URL: https://source.sakaiproject.org/svn/util/branches/sakai_2-5-4/util-util/util/src/java/org/sakaiproject/util/StringUtil.java $
 * $Id: StringUtil.java 34934 2007-09-10 22:52:23Z lance@indiana.edu $
 ***********************************************************************************
 *
 * Copyright (c) 2003, 2004, 2005, 2006 The Sakai Foundation.
 * 
 * Licensed under the Educational Community License, Version 1.0 (the "License"); 
 * you may not use this file except in compliance with the License. 
 * You may obtain a copy of the License at
 * 
 *      http://www.opensource.org/licenses/ecl1.php
 * 
 * Unless required by applicable law or agreed to in writing, software 
 * distributed under the License is distributed on an "AS IS" BASIS, 
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
 * See the License for the specific language governing permissions and 
 * limitations under the License.
 *
 **********************************************************************************/

/**
 * <p>
 * StringUtil collects together some string utility classes.
 * </p>
 */
public class StringUtil
{
  /**
   * Split the source into two strings at the first occurrence of the splitter. Subsequent occurrences are not treated specially, and may be part of the second string.
   * 
   * @param source
   *        The string to split
   * @param splitter
   *        The string that forms the boundary between the two strings returned.
   * @return An array of two strings split from source by splitter.
   */
  public static String[] splitFirst(String source, String splitter)
  {
    // hold the results as we find them
    Vector rv = new Vector();
    int last = 0;
    int next = 0;
    // find first splitter in source
    next = source.indexOf(splitter, last);
    if (next != -1)
    {
      // isolate from last thru before next
      rv.add(source.substring(last, next));
      last = next + splitter.length();
    }
    if (last < source.length())
    {
      rv.add(source.substring(last, source.length()));
    }
    // convert to array
    return (String[]) rv.toArray(new String[rv.size()]);
  }

}
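
A similar result can be had from the JDK alone by combining String.split with a limit of 2 and Pattern.quote, which makes the splitter literal. This is a sketch of an alternative, not the Sakai code; the class name SplitFirstDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitFirstDemo {
  // Hypothetical JDK-only alternative to splitFirst.
  // Pattern.quote escapes regex metacharacters; limit 2 stops after the first match.
  static String[] splitFirst(String source, String splitter) {
    return source.split(Pattern.quote(splitter), 2);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(splitFirst("key=a=b", "=")));     // [key, a=b]
    System.out.println(Arrays.toString(splitFirst("no-splitter", "="))); // [no-splitter]
  }
}
```

One behavioral difference to keep in mind: when the splitter is the last thing in the source (for example "key="), this version returns a trailing empty string as the second element, while the Vector-based splitFirst above returns a one-element array.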

Split up a string into multiple strings based on a delimiter

    
/*
  * JBoss, Home of Professional Open Source
  * Copyright 2005, JBoss Inc., and individual contributors as indicated
  * by the @authors tag. See the copyright.txt in the distribution for a
  * full listing of individual contributors.
  *
  * This is free software; you can redistribute it and/or modify it
  * under the terms of the GNU Lesser General Public License as
  * published by the Free Software Foundation; either version 2.1 of
  * the License, or (at your option) any later version.
  *
  * This software is distributed in the hope that it will be useful,
  * but WITHOUT ANY WARRANTY; without even the implied warranty of
  * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
  * Lesser General Public License for more details.
  *
  * You should have received a copy of the GNU Lesser General Public
  * License along with this software; if not, write to the Free
  * Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA
  * 02110-1301 USA, or see the FSF site: http://www.fsf.org.
  */

public class Main{
  /** An empty string constant */
  public static final String EMPTY = "";

  /////////////////////////////////////////////////////////////////////////
  //                          Splitting Methods                          //
  /////////////////////////////////////////////////////////////////////////
  /**
   * Split up a string into multiple strings based on a delimiter.
   *
   * @param string  String to split up.
   * @param delim   Delimiter.
   * @param limit   Limit the number of strings to split into
   *                (-1 for no limit).
   * @return        Array of strings.
   */
  public static String[] split(final String string, final String delim,
     final int limit)
  {
     // get the count of delim in string, if count is > limit 
     // then use limit for count.  The number of delimiters is less by one
     // than the number of elements, so add one to count.
     int count = count(string, delim) + 1;
     if (limit > 0 && count > limit)
     {
        count = limit;
     }
     String strings[] = new String[count];
     int begin = 0;
     for (int i = 0; i < count; i++)
     {
        // get the next index of delim
        int end = string.indexOf(delim, begin);
        
        // if the end index is -1 or if this is the last element
        // then use the string's length for the end index
        if (end == -1 || i + 1 == count)
           end = string.length();
        // if end is 0, then the first element is empty
        if (end == 0)
           strings[i] = EMPTY;
        else
           strings[i] = string.substring(begin, end);
        // update the beginning index, skipping the whole delimiter
        // (end + 1 would only be correct for one-character delimiters)
        begin = end + delim.length();
     }
     return strings;
  }
  /**
   * Split up a string into multiple strings based on a delimiter.
   *
   * @param string  String to split up.
   * @param delim   Delimiter.
   * @return        Array of strings.
   */
  public static String[] split(final String string, final String delim)
  {
     return split(string, delim, -1);
  }

  /////////////////////////////////////////////////////////////////////////
  //                          Counting Methods                           //
  /////////////////////////////////////////////////////////////////////////
  /**
   * Count the number of instances of substring within a string.
   *
   * @param string     String to look for substring in.
   * @param substring  Sub-string to look for.
   * @return           Count of substrings in string.
   */
  public static int count(final String string, final String substring)
  {
     int count = 0;
     int idx = 0;
     while ((idx = string.indexOf(substring, idx)) != -1)
     {
        idx++;
        count++;
     }
     return count;
  }
  /**
   * Count the number of instances of character within a string.
   *
   * @param string     String to look for substring in.
   * @param c          Character to look for.
   * @return           Count of substrings in string.
   */
  public static int count(final String string, final char c)
  {
     return count(string, String.valueOf(c));
  }

}

Split with regular expression

    
public class Main {
  public static void main(String args[]) {
    String statement = " a b c abc bca cba";
    String tokens[] = null;
    String splitPattern = "a|abc|bac|" + "b|(c)|(cba)";
    tokens = statement.split(splitPattern);
    for (int i = 0; i < tokens.length; i++) {
      System.out.println(tokens[i]);
    }
  }
}

String.split() is based on regular expression

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s3 = "{A}{this is a test}{1234}";
    String[] words = s3.split("[{}]");
    for (String str : words) {
      System.out.println(str);
    }
  }
}
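/*
Output (the blank lines are empty strings: one before the leading "{" and one between each "}{" pair):

A

this is a test

1234
*/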

String split on multicharacter delimiter

    
/**************************************************************************************
 * Copyright (c) Jonas Bonér, Alexandre Vasseur. All rights reserved.                 *
 * http://aspectwerkz.codehaus.org                                                    *
 * ---------------------------------------------------------------------------------- *
 * The software in this package is published under the terms of the LGPL license      *
 * a copy of which has been included with this distribution in the license.txt file.  *
 **************************************************************************************/

import java.util.List;
import java.util.ArrayList;
/**
 * Utility methods for strings.
 *
 * @author 
 */
public class Strings {

  /**
   * String split on multicharacter delimiter. <p/>Written by Tim Quinn (tim.quinn@honeywell.ru)
   *
   * @param stringToSplit
   * @param delimiter
   * @return
   */
  public static final String[] splitString(String stringToSplit, String delimiter) {
      String[] aRet;
      int iLast;
      int iFrom;
      int iFound;
      int iRecords;
      // return an empty array if stringToSplit is ""
      if (stringToSplit.equals("")) {
          return new String[0];
      }
      // count Field Entries
      iFrom = 0;
      iRecords = 0;
      while (true) {
          iFound = stringToSplit.indexOf(delimiter, iFrom);
          if (iFound == -1) {
              break;
          }
          iRecords++;
          iFrom = iFound + delimiter.length();
      }
      iRecords = iRecords + 1;
      // populate aRet[]
      aRet = new String[iRecords];
      if (iRecords == 1) {
          aRet[0] = stringToSplit;
      } else {
          iLast = 0;
          iFrom = 0;
          iFound = 0;
          for (int i = 0; i < iRecords; i++) {
              iFound = stringToSplit.indexOf(delimiter, iFrom);
              if (iFound == -1) { // at End
                  aRet[i] = stringToSplit.substring(iLast + delimiter.length(), stringToSplit.length());
              } else if (iFound == 0) { // at Beginning
                  aRet[i] = "";
              } else { // somewhere in middle
                  aRet[i] = stringToSplit.substring(iFrom, iFound);
              }
              iLast = iFound;
              iFrom = iFound + delimiter.length();
          }
      }
      return aRet;
  }
   
}
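
For comparison, String.split can also split on a multicharacter delimiter, provided the delimiter is quoted so that it is not interpreted as a regular expression. A minimal sketch, assuming a hypothetical class LiteralDelimiterSplit; the -1 limit keeps trailing empty strings, as splitString above does.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralDelimiterSplit {
  // Pattern.quote turns the delimiter into a literal pattern;
  // a negative limit keeps trailing empty strings.
  static String[] splitLiteral(String text, String delimiter) {
    return text.split(Pattern.quote(delimiter), -1);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(splitLiteral("a<>b<><>c", "<>"))); // [a, b, , c]
    // Without quoting, "." matches every character and nothing is left over.
    System.out.println(Arrays.toString("a.b".split(".")));               // []
    System.out.println(Arrays.toString(splitLiteral("a.b", ".")));       // [a, b]
  }
}
```

Unlike splitString, this version returns a one-element array containing "" for an empty input rather than an empty array.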

String.split(): " ".split(" ") -> {} (Empty array)

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = " ";
    String[] words = s.split(" ");
    for (String string : words) {
      System.out.println(">" + string + "<");
    }
  }
}

String.split(): "     ".split(" ") -> {} (Empty array too)

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "     ";
    String[] words = s.split(" ");
    
    for (String string : words) {
      System.out.println(">" + string + "<");
    }
  }
}

String.split(): "".split("") (one empty string value array)

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "";
    String[] words = s.split("");
    for (String string : words) {
      System.out.println(">" + string + "<");
    }
  }
}
// ><

String.split(): "  s".split(" ") -> {"","","s"}

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "  s";
    String[] words = s.split(" ");
    for (String string : words) {
      System.out.println(">" + string + "<");
    }
  }
}
/*
><
><
>s<
*/

String.split(): "   a  ".split(" ") -> {"","","","a"} (!) (spaces before and after)

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s = "   a  ";
    String[] words = s.split(" ");
    for (String string : words) {
      System.out.println(">" + string + "<");
    }
  }
}
/*
><
><
><
>a<
*/
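
The edge cases above all follow from one rule: with the default limit of 0, String.split keeps leading empty strings but drops trailing ones. A short sketch that makes the rule visible by comparing limit 0 with limit -1; the class name SplitEdgeCases and the helper show are hypothetical.

```java
import java.util.Arrays;

class SplitEdgeCases {
  // Formats the result of split so that empty strings are visible.
  static String show(String s, String regex, int limit) {
    return Arrays.toString(s.split(regex, limit));
  }

  public static void main(String[] args) {
    System.out.println(show("   a  ", " ", 0));  // [, , , a]      trailing empties dropped
    System.out.println(show("   a  ", " ", -1)); // [, , , a, , ]  trailing empties kept
    System.out.println(show(" ", " ", 0));       // []
    System.out.println(show(" ", " ", -1));      // [, ]
    // Trimming first and splitting on runs of spaces avoids the empty strings.
    System.out.println(show("   a  ".trim(), " +", 0)); // [a]
  }
}
```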

The string passed to the split method is a regular expression

      
Regular Expression   Explanation
\\t                  A tab character
\\n                  A newline character
\\|                  A vertical bar
\\s                  Any white space character
\\s+                 One or more occurrences of any white space character

import java.util.Scanner;
public class CountWords
{
    static Scanner sc = new Scanner(System.in);
    public static void main(String[] args)
    {
        System.out.print("Enter a string: ");
        String s = sc.nextLine();
        String[] word = s.split("\\s+");
        for (String w: word)
            System.out.println(w);
    }
}
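
The patterns from the table can be tried without console input. A small sketch with fixed sample strings; the class name DelimiterPatterns and the helper cut are made up for illustration.

```java
import java.util.Arrays;

class DelimiterPatterns {
  // Thin wrapper so each pattern from the table can be tried on sample text.
  static String[] cut(String text, String regex) {
    return text.split(regex);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString(cut("a\tb\tc", "\\t")));      // [a, b, c]
    System.out.println(Arrays.toString(cut("line1\nline2", "\\n"))); // [line1, line2]
    // The vertical bar is a regex metacharacter, so it has to be escaped.
    System.out.println(Arrays.toString(cut("a|b|c", "\\|")));        // [a, b, c]
    System.out.println(Arrays.toString(cut("one  two\tthree\nfour", "\\s+"))); // [one, two, three, four]
  }
}
```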

Use split() to extract substrings from a string.

    
import java.util.Arrays;
public class Main {
  public static void main(String args[]) {
    String testStr = "This is  a test.";
    System.out.println("Original string: " + testStr);
    String result[] = testStr.split("\\s+");
    System.out.print("Split at spaces: ");
    System.out.println(Arrays.toString(result));
  }
}
/*
Original string: This is  a test.
Split at spaces: [This, is, a, test.]
*/

Using the second argument of String.split() to control the maximum number of substrings generated by splitting a string.

     

public class Main {
  public static void main(String args[]) {
    String[] temp = "A.B.BB".split("\\.", 2);
    for (String s: temp){
      System.out.println(s);
    }
  }
}
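
The limit argument has three modes: a positive value caps the number of substrings, zero means no cap with trailing empty strings dropped, and a negative value means no cap with trailing empty strings kept. A sketch comparing them; the class name LimitDemo and the helper show are hypothetical.

```java
import java.util.Arrays;

class LimitDemo {
  // Splits on a literal dot with the given limit and formats the result.
  static String show(String s, int limit) {
    return Arrays.toString(s.split("\\.", limit));
  }

  public static void main(String[] args) {
    System.out.println(show("A.B.BB", 2));  // [A, B.BB]   at most two substrings
    System.out.println(show("A.B.BB", 0));  // [A, B, BB]  no cap
    System.out.println(show("A.B..", 0));   // [A, B]      trailing empties dropped
    System.out.println(show("A.B..", -1));  // [A, B, , ]  trailing empties kept
  }
}
```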

Using split() with a space can be a problem

     
public class Main {
  public static void main(String args[]) throws Exception {
    String s3 = "A  B C";
    String[] words = s3.split(" ");
    for (String s : words) {
      System.out.println(s);
    }
  }
}
/*
Output (the blank line is the empty string between the two adjacent spaces):
A

B
C
*/
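
One way around the problem is to split on runs of whitespace and to trim the input first, so that leading whitespace does not produce an empty first element. A minimal sketch; the class name SpaceSplitFix and the method splitOnRuns are hypothetical.

```java
import java.util.Arrays;

class SpaceSplitFix {
  // Trims the input, then treats any run of whitespace as one separator.
  static String[] splitOnRuns(String s) {
    return s.trim().split("\\s+");
  }

  public static void main(String[] args) {
    System.out.println(Arrays.toString("A  B C".split(" ")));       // [A, , B, C]
    System.out.println(Arrays.toString(splitOnRuns("A  B C")));     // [A, B, C]
    System.out.println(Arrays.toString(splitOnRuns("  A  B C ")));  // [A, B, C]
  }
}
```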
