- exceptions.Exception(exceptions.BaseException)
  - util.tok.TokenizerError
- util.tok.Token
- util.tok.TokenInfo
- util.tok.TokenStream
- util.tok.Tokenizer
  - util.tok.CTokenizer

class CTokenizer(Tokenizer)

Tokenizer for C-like languages.

Methods inherited from Tokenizer:
- __init__(self, src)
- Constructor. Creates a new Tokenizer given a source file and
(optionally) a set of flags.
- breakOff(self, toksrc)
- "breaks off" the regular expression match specified by toksrc from the
buffer, and returns it.
- fillBuffer(self)
- Make sure that the buffer has data in it.
- nextPreparsed(self)
- Returns the next token in the preparsed list (the list of tokens that
have already been parsed and were put back).
- nextToken(self)
- Returns the next token, either from the preparsed cache, or from the
stream.
- parseNextToken(self)
- Parses the next token directly off of the stream. Clients should
generally avoid using this. Use nextToken() instead, since that will
use the preparsed queue if tokens have been put back.
- putBack(self, tok)
- Puts the given token back on the list. It is placed first on the list,
so immediately after the call, /tok/ will be the next token
returned from @nextToken().
Data and other attributes inherited from Tokenizer:
- character = <util.tok.TokenInfo instance at 0x831ba4c>
- chr = 4
- cmt = 5
- comment = <util.tok.TokenInfo instance at 0x831b96c>
- id = 1
- identifier = <util.tok.TokenInfo instance at 0x831becc>
- int = 2
- integer = <util.tok.TokenInfo instance at 0x831bfec>
- longComment = <util.tok.TokenInfo instance at 0x831b94c>
- str = 3
- string = <util.tok.TokenInfo instance at 0x831bf2c>
- sym = 7
- symbol = <util.tok.TokenInfo instance at 0x831ba0c>
- tokTypes = [<util.tok.TokenInfo instance at 0x831becc>, <util.tok.TokenInfo instance at 0x831bfec>, <util.tok.TokenInfo instance at 0x831bf2c>, <util.tok.TokenInfo instance at 0x831b96c>, <util.tok.TokenInfo instance at 0x831b94c>, <util.tok.TokenInfo instance at 0x831beac>, <util.tok.TokenInfo instance at 0x831ba0c>]
- whitespace = <util.tok.TokenInfo instance at 0x831beac>
- ws = 6

class Token

Tokens represent pieces of text. Each token has:
/val/::
the source text of the token (its "value").
/type/::
a numeric value indicating the token's type.
/srcName/::
the name of the source stream that it came from.
/lineNum/::
the line number from which it came.
#Token.end# is a class variable set to zero. It is used to indicate
that the end of the stream has been read. *Do not use 0 as a token
id.*

Methods defined here:
- __init__(self, type, val, srcInfo)
- Constructor for Token. /type/ is one of the token types, /val/ is
the value of the token (its text), and /srcInfo/ is a tuple giving the
name of the source file that the token was tokenized from (/srcName/)
and the line number in that file (/lineNum/).
- equals(self, type, val)
- Returns true if the token is of the indicated type and has the
indicated value.
- isType(self, type)
- Returns true if the token is of the indicated type.
Data and other attributes defined here:
- end = 0
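
The Token description above can be illustrated with a minimal, self-contained sketch. This is not the util.tok source: the class name SketchToken and its internals are assumptions made for illustration, and only the names end, isType, equals, val, type, srcName and lineNum are taken from the documentation.

```python
class SketchToken:
    """Hypothetical stand-in for util.tok.Token, written from the docs above."""

    # Reserved id meaning "end of stream"; never use 0 for a real token type.
    end = 0

    def __init__(self, type, val, srcInfo):
        self.type = type
        self.val = val
        # Assumption: srcInfo is a (source name, line number) tuple.
        self.srcName, self.lineNum = srcInfo

    def isType(self, type):
        # True if the token is of the indicated type.
        return self.type == type

    def equals(self, type, val):
        # True only if both the type and the text match.
        return self.type == type and self.val == val


ID = 1  # mirrors the documented Tokenizer.id value
tok = SketchToken(ID, "count", ("example.c", 12))
eof = SketchToken(SketchToken.end, "", ("example.c", 13))
```

In this sketch a caller would detect the end of input with eof.isType(SketchToken.end), which is why 0 must stay reserved.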

class TokenInfo

TokenInfo holds information about token types. Each has a
/name/, a /regex/ (regular expression describing how the token is
represented) and an /id/.

Methods defined here:
- __init__(self, name, regex, create, continued)
- Creates a TokenInfo. Public variables:
/name/::
the name of the token type
/regex/::
the regular expression that describes the token's source form
/create/::
a function that should expect a token source string
/continued/::
an optional regular expression. If it is present, it
indicates that the token may be continued over multiple lines
(if /regex/ matches to the end of the current line) and it
represents the kind of expression which will terminate the
multi-line token beginning with /regex/. All lines between
the line that begins the token and the portion of a line
which ends the token are considered to be part of the token.
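
The interplay of /regex/ and /continued/ described above can be sketched as follows. This is only an illustration of the documented rule, not the util.tok implementation: the helper match_token, the LONG_COMMENT pattern, and the assumption that every line keeps its trailing newline are all hypothetical.

```python
import re

# Assumed pattern for a C long comment: either closed on the same line,
# or running to the end of the line (including the newline).
LONG_COMMENT = r"/\*(?:.*?\*/|.*\n?)"


def match_token(lines, start, regex, continued=None):
    """Match a token at the start of lines[start].

    Returns (token_text, index_of_line_where_token_ended), or None if
    regex does not match. Hypothetical helper, not part of util.tok.
    """
    line = lines[start]
    m = re.match(regex, line)
    if m is None:
        return None
    # Single-line case: no continuation pattern, or the match stopped
    # before the end of the current line.
    if continued is None or m.end() < len(line):
        return m.group(0), start
    # Multi-line case: collect whole lines until `continued` matches,
    # then keep only the part of that line up to the end of the match.
    text = m.group(0)
    for i in range(start + 1, len(lines)):
        end = re.search(continued, lines[i])
        if end is not None:
            return text + lines[i][:end.end()], i
        text += lines[i]
    return text, len(lines) - 1  # unterminated: token runs to end of input


lines = ["/* first\n", "   second\n", "   third */ int x;\n"]
text, last = match_token(lines, 0, LONG_COMMENT, r"\*/")
```

Here text holds the whole comment across three lines and last is the index of the line on which it closed, so tokenizing could resume after the closing `*/` on that line.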

class TokenStream

This class can be used as a wrapper for any object that provides
the file readline() method. It delegates that call and also provides
a "name" variable, required by the tokenizer.

Methods defined here:
- __init__(self, src, name)
- readline(self)
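
A wrapper of the kind described for TokenStream might look like the sketch below. The class SketchTokenStream is a hypothetical stand-in written from the description, not the util.tok code. io.StringIO is used here only as a convenient object that provides readline().

```python
import io


class SketchTokenStream:
    """Hypothetical stand-in for util.tok.TokenStream."""

    def __init__(self, src, name):
        self.src = src    # any object with a readline() method
        self.name = name  # the "name" variable the tokenizer requires

    def readline(self):
        # Delegate straight to the wrapped object.
        return self.src.readline()


stream = SketchTokenStream(io.StringIO("int x;\nx = 1;\n"), "<string>")
first = stream.readline()
```

This lets in-memory text be used where the tokenizer expects a named source.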

class Tokenizer

A Tokenizer is used to extract tokens from a source stream. Tokens
are of the form normally accepted by C-like languages.
XXX This class really needs to become generic, with its C personality
moved to CTokenizer.

Methods defined here:
- __init__(self, src)
- Constructor. Creates a new Tokenizer given a source file and
(optionally) a set of flags.
- breakOff(self, toksrc)
- "breaks off" the regular expression match specified by toksrc from the
buffer, and returns it.
- fillBuffer(self)
- Make sure that the buffer has data in it.
- nextPreparsed(self)
- Returns the next token in the preparsed list (the list of tokens that
have already been parsed and were put back).
- nextToken(self)
- Returns the next token, either from the preparsed cache, or from the
stream.
- parseNextToken(self)
- Parses the next token directly off of the stream. Clients should
generally avoid using this. Use nextToken() instead, since that will
use the preparsed queue if tokens have been put back.
- putBack(self, tok)
- Puts the given token back on the list. It is placed first on the list,
so immediately after the call, /tok/ will be the next token
returned from @nextToken().
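
The relationship between nextToken(), parseNextToken() and putBack() can be modelled with a small sketch. It is a hypothetical model of the documented behaviour, not the util.tok code: SketchTokenizer takes a ready-made list of strings in place of a source stream, and plain strings stand in for Token objects.

```python
class SketchTokenizer:
    """Hypothetical model of the put-back behaviour described above."""

    def __init__(self, words):
        self._stream = iter(words)  # stands in for the source stream
        self._preparsed = []        # tokens that were put back

    def parseNextToken(self):
        # Reads directly from the stream, ignoring anything put back.
        return next(self._stream, None)

    def nextPreparsed(self):
        # Take the first token from the put-back list.
        return self._preparsed.pop(0)

    def putBack(self, tok):
        # Insert at the front so tok is the next token returned.
        self._preparsed.insert(0, tok)

    def nextToken(self):
        # Prefer put-back tokens; fall back to the stream.
        if self._preparsed:
            return self.nextPreparsed()
        return self.parseNextToken()


t = SketchTokenizer(["int", "x", ";"])
a = t.nextToken()
b = t.nextToken()
t.putBack(b)
t.putBack(a)  # put back in reverse order to restore the original order
order = [t.nextToken() for _ in range(3)]

# Calling parseNextToken() directly skips a token that was put back.
t2 = SketchTokenizer(["a", "b"])
t2.putBack(t2.nextToken())
skipped_ahead = t2.parseNextToken()
then = t2.nextToken()
```

In this model parseNextToken() returns "b" while "a" is still waiting in the put-back list, which is the out-of-order result that using nextToken() avoids.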
Data and other attributes defined here:
- character = <util.tok.TokenInfo instance at 0x831ba4c>
- chr = 4
- cmt = 5
- comment = <util.tok.TokenInfo instance at 0x831b96c>
- id = 1
- identifier = <util.tok.TokenInfo instance at 0x831becc>
- int = 2
- integer = <util.tok.TokenInfo instance at 0x831bfec>
- longComment = <util.tok.TokenInfo instance at 0x831b94c>
- str = 3
- string = <util.tok.TokenInfo instance at 0x831bf2c>
- sym = 7
- symbol = <util.tok.TokenInfo instance at 0x831ba0c>
- tokTypes = [<util.tok.TokenInfo instance at 0x831becc>, <util.tok.TokenInfo instance at 0x831bfec>, <util.tok.TokenInfo instance at 0x831bf2c>, <util.tok.TokenInfo instance at 0x831b96c>, <util.tok.TokenInfo instance at 0x831b94c>, <util.tok.TokenInfo instance at 0x831beac>, <util.tok.TokenInfo instance at 0x831ba0c>]
- whitespace = <util.tok.TokenInfo instance at 0x831beac>
- ws = 6