Takes text and splits it up into plToken objects. The result can be used for easier parsing.

#include <Tokenizer.h>

Public Member Functions
	plTokenizer (plAllocator *pAllocator=nullptr)
	Constructor.

void	Tokenize (plArrayPtr< const plUInt8 > data, plLogInterface *pLog, bool bCopyData=true)
	Clears any previous result and creates a new token stream for the given array.

const plDeque< plToken > &	GetTokens () const
	Gives read access to the token stream.

plDeque< plToken > &	GetTokens ()
	Gives read and write access to the token stream.

void	GetAllTokens (plDynamicArray< const plToken * > &ref_tokens) const
	Returns an array with a copy of all tokens. Use this when using plTokenParseUtils.

void	GetAllLines (plDynamicArray< const plToken * > &ref_tokens) const
	Returns an array of all tokens. New line tokens are ignored.

plResult	GetNextLine (plUInt32 &ref_uiFirstToken, plHybridArray< const plToken *, 32 > &ref_tokens) const
	Returns an array of tokens that represent the next line in the file.

plResult	GetNextLine (plUInt32 &ref_uiFirstToken, plHybridArray< plToken *, 32 > &ref_tokens)

const plArrayPtr< const plUInt8 >	GetTokenizedData () const
	Returns the internal copy of the tokenized data. Will be empty if Tokenize was called with 'bCopyData' set to 'false'.

void	SetTreatHashSignAsLineComment (bool bHashSignIsLineComment)
	Enables treating lines that start with a # character as line comments.

Detailed Description

Takes text and splits it up into plToken objects. The result can be used for easier parsing.

The tokenizer is built to work on code that is similar to C. That means it will tokenize comments and strings as they are defined in the C language. Also, a line break that is directly preceded by a backslash is not treated as a line break.
White space is defined as spaces and tabs.
Identifiers are names that consist of alphanumerics and underscores.
Non-identifiers are everything else. However, they currently never consist of more than a single character, i.e. '++' is tokenized as two consecutive non-identifiers.
Parentheses etc. are not tokenized in any special way; they are all considered non-identifiers.

The token stream will always end with an end-of-file token.
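
The classification rules above can be illustrated with a small standalone sketch. It is not the plTokenizer implementation: the names SketchTokenType, SketchToken and SketchTokenize are hypothetical, the code uses only the C++ standard library, and it leaves out comments, strings and backslash handling. It follows the description literally, so a run of digits is classified as an identifier here, which may differ from what plTokenizer actually produces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical token categories for this sketch; the real plToken types may differ.
enum class SketchTokenType { Whitespace, Identifier, NonIdentifier, Newline, EndOfFile };

struct SketchToken
{
  SketchTokenType type;
  std::string text;
};

// Identifiers consist of alphanumerics and underscores.
static bool IsIdentifierChar(char c)
{
  return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// White space is limited to spaces and tabs.
static bool IsWhitespaceChar(char c)
{
  return c == ' ' || c == '\t';
}

// Splits 'input' into tokens according to the rules described above.
std::vector<SketchToken> SketchTokenize(std::string_view input)
{
  std::vector<SketchToken> tokens;
  std::size_t i = 0;

  while (i < input.size())
  {
    const std::size_t start = i;
    const char c = input[i];
    SketchTokenType type;

    if (IsWhitespaceChar(c))
    {
      while (i < input.size() && IsWhitespaceChar(input[i]))
        ++i;
      type = SketchTokenType::Whitespace;
    }
    else if (IsIdentifierChar(c))
    {
      while (i < input.size() && IsIdentifierChar(input[i]))
        ++i;
      type = SketchTokenType::Identifier;
    }
    else if (c == '\n')
    {
      ++i;
      type = SketchTokenType::Newline;
    }
    else
    {
      // Everything else is a non-identifier of exactly one character,
      // so "++" yields two separate tokens.
      ++i;
      type = SketchTokenType::NonIdentifier;
    }

    assert(i > start); // every iteration consumes at least one character
    tokens.push_back({type, std::string(input.substr(start, i - start))});
  }

  // The stream always ends with an end-of-file token.
  tokens.push_back({SketchTokenType::EndOfFile, std::string()});
  return tokens;
}
```

Calling SketchTokenize("a ++b") in this sketch yields an identifier, a white space token, two single-character non-identifiers, another identifier, and the closing end-of-file token.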

Constructor & Destructor Documentation

◆ plTokenizer()

plTokenizer::plTokenizer ( plAllocator * pAllocator = nullptr )

Constructor.

Takes an additional optional allocator. If no allocator is given the default allocator will be used.

Member Function Documentation

◆ GetNextLine()

plResult plTokenizer::GetNextLine	(	plUInt32 &	ref_uiFirstToken,
		plHybridArray< const plToken *, 32 > &	ref_tokens ) const

Returns an array of tokens that represent the next line in the file.

Returns PL_SUCCESS when there was more data to return, PL_FAILURE if the end of the file was already reached. ref_uiFirstToken is the index of the token from which to start; it is updated automatically. Consecutive calls to GetNextLine() with the same ref_uiFirstToken variable return one line after the other.

Note: This function takes care of handling the 'backslash/newline' combination, as defined in the C language. That means all such sequences are ignored. Therefore the tokens that are returned as one line might not contain all tokens that are actually in the stream. The returned tokens might also have different line numbers when two or more lines from the file are merged into one logical line.
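
The line-by-line iteration and the backslash/newline merging described above can be sketched in standalone C++. This is an illustration under assumptions, not the engine code: LineTokenKind, LineToken and SketchGetNextLine are hypothetical names, a bool replaces plResult, and the sketch assumes that the newline token itself is not part of the returned line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified token model for this sketch.
enum class LineTokenKind { Word, Other, Newline, EndOfFile };

struct LineToken
{
  LineTokenKind kind;
  std::string text;
  int line; // line number in the source file
};

// Collects the tokens of the next logical line, starting at 'firstToken'.
// Returns false once the end of the stream was reached already.
// 'firstToken' is advanced so that repeated calls walk through the stream.
bool SketchGetNextLine(const std::vector<LineToken>& stream, std::size_t& firstToken,
                       std::vector<const LineToken*>& outLine)
{
  assert(firstToken <= stream.size());
  outLine.clear();

  if (firstToken >= stream.size() || stream[firstToken].kind == LineTokenKind::EndOfFile)
    return false;

  while (firstToken < stream.size())
  {
    const LineToken& tok = stream[firstToken];

    // Stay on the end-of-file token so that the next call reports failure.
    if (tok.kind == LineTokenKind::EndOfFile)
      break;

    ++firstToken;

    // A plain newline ends the logical line (assumption: it is not returned).
    if (tok.kind == LineTokenKind::Newline)
      break;

    // A backslash directly followed by a newline is skipped entirely,
    // which merges two physical lines into one logical line.
    if (tok.kind == LineTokenKind::Other && tok.text == "\\" &&
        firstToken < stream.size() && stream[firstToken].kind == LineTokenKind::Newline)
    {
      ++firstToken;
      continue;
    }

    outLine.push_back(&tok);
  }

  return true;
}
```

A caller would typically start with an index of 0 and loop while the function reports success, processing one logical line per iteration. Tokens within one returned line can carry different line numbers whenever a backslash/newline pair was skipped.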

Todo: Theoretically, if the line ends with an identifier, and the next directly starts with one again,

◆ SetTreatHashSignAsLineComment()

void plTokenizer::SetTreatHashSignAsLineComment ( bool bHashSignIsLineComment )

inline

Enables treating lines that start with a # character as line comments.

Needs to be set before tokenization to take effect.

◆ Tokenize()

void plTokenizer::Tokenize	(	plArrayPtr< const plUInt8 >	data,
		plLogInterface *	pLog,
		bool	bCopyData = true )

Clears any previous result and creates a new token stream for the given array.

Parameters

data	The string data to be tokenized.
pLog	A log interface that will receive any tokenization errors.
bCopyData	If set, 'data' will be copied into a member variable and tokenization is run on the copy, allowing for the original data storage to be deallocated after this call. If false, tokenization will reference 'data' directly and thus, 'data' must outlive this instance.
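
The lifetime consequence of 'bCopyData' can be shown with a standalone analogy. The class below is not part of the engine: SketchSource, SetData, View and OwnedCopy are hypothetical names, and std::string_view stands in for plArrayPtr. It only models the ownership rule described for the parameter.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical stand-in that models only the ownership behavior of 'bCopyData'.
class SketchSource
{
public:
  SketchSource() = default;

  // The view may point into m_owned, so copying the object is disabled.
  SketchSource(const SketchSource&) = delete;
  SketchSource& operator=(const SketchSource&) = delete;

  void SetData(std::string_view data, bool copyData = true)
  {
    if (copyData)
    {
      // Keep a private copy; the caller may free its buffer afterwards.
      m_owned = std::string(data);
      m_view = m_owned;
    }
    else
    {
      // Reference the caller's buffer directly; it must outlive this object.
      m_owned.clear();
      m_view = data;
    }
    assert(m_view.size() == data.size());
  }

  // The data that later processing would read from.
  std::string_view View() const { return m_view; }

  // The internal copy; empty when SetData was called with copyData == false.
  std::string_view OwnedCopy() const { return m_owned; }

private:
  std::string m_owned;
  std::string_view m_view;
};
```

With copying enabled, the source buffer can go out of scope right after the call. With copying disabled, the object only stores a reference, so the internal copy stays empty and the caller has to keep the buffer alive.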

The documentation for this class was generated from the following files:

Code/Engine/Foundation/CodeUtils/Tokenizer.h
Code/Engine/Foundation/CodeUtils/Implementation/Tokenizer.cpp