How to Add a Document.

This post describes how a document is added to an existing container in the XML database.

#0. Prerequisite

In this document, we explain how the query adds an XML document to an existing container. All explanations are based on the statement: “addDoc test.xdb test.xml -f path;”
When you type a command into the console, the server receives the command line, parses it, and turns it into a specific request. Take addDoc test.xdb test.xml -f d:\books.xml as an example: the server receives the typed command and works out what the user intends. Once it knows the request type, the server does the actual work.
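
The parsing step described above can be sketched roughly as follows. This is only an illustration, not the server's real parser: the struct AddDocRequest, the function parseAddDoc, and the whitespace-only tokenization are assumptions made for this example. Only the command shape (addDoc, container, document name, -f, path) comes from the text.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical request type: what a server might build from an addDoc command.
struct AddDocRequest {
    std::string container;  // e.g. "test.xdb"
    std::string docName;    // e.g. "test.xml"
    std::string filePath;   // the value that follows -f
};

// Parse "addDoc <container> <document> -f <path>[;]".
// Returns false if the line is not a well-formed addDoc command.
bool parseAddDoc(const std::string& line, AddDocRequest& out) {
    std::istringstream in(line);
    std::vector<std::string> tok;
    for (std::string t; in >> t;) tok.push_back(t);  // split on whitespace

    // Drop a trailing ';' statement terminator if present.
    if (!tok.empty() && tok.back().back() == ';') {
        tok.back().pop_back();
        if (tok.back().empty()) tok.pop_back();  // the token was just ";"
    }

    if (tok.size() != 5 || tok[0] != "addDoc" || tok[3] != "-f") return false;
    out.container = tok[1];
    out.docName = tok[2];
    out.filePath = tok[4];
    return true;
}
```

Because the sketch splits on whitespace, a file path that contains spaces would be rejected; a real parser would need quoting rules.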

#1. In detail

The BDBExecutor invokes its member function execute() to add the document.

[Figure: adddoc_step_1]

What the executor actually runs is an operation object (an xxxOperation class). In this case, BDBAddDocByLocalFileOperation does the real work. After the privileges on the container have been checked, the container's putDocument function is called to add the document.
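
The executor/operation split described above might look roughly like this. It is a sketch under assumptions, not the BDB code: Container, Operation, AddDocOperation, Executor, and canWrite are simplified, hypothetical stand-ins, and the privilege check is reduced to a single flag.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a container: stores documents by name.
class Container {
public:
    explicit Container(bool writable) : writable_(writable) {}
    bool canWrite() const { return writable_; }
    void putDocument(const std::string& name, const std::string& content) {
        docs_[name] = content;
    }
    bool hasDocument(const std::string& name) const { return docs_.count(name) != 0; }
private:
    bool writable_;
    std::map<std::string, std::string> docs_;
};

// Base class for everything the executor can run.
class Operation {
public:
    virtual ~Operation() {}
    virtual void run() = 0;
};

// Illustrative add-document operation: check the privilege, then put the document.
class AddDocOperation : public Operation {
public:
    AddDocOperation(Container& c, std::string name, std::string content)
        : container_(c), name_(std::move(name)), content_(std::move(content)) {}
    void run() override {
        if (!container_.canWrite())
            throw std::runtime_error("no write privilege on container");
        container_.putDocument(name_, content_);
    }
private:
    Container& container_;
    std::string name_;
    std::string content_;
};

// The executor only knows the Operation interface, not the concrete type.
class Executor {
public:
    void execute(Operation& op) { op.run(); }
};
```

The point of the split is that the executor stays generic: supporting a new command means adding a new operation class, not changing execute().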

[Figure: adddoc_step_2]

It then invokes Container::addDocumentInternal to put the document into the container, as the figure below shows.

[Figure: adddoc_step_3]

#2. How to do XML processing with SAX?

SAX (Simple API for XML) is an event-based sequential access parser API developed by the XML-DEV mailing list for XML documents.[1] SAX provides a mechanism for reading data from an XML document that is an alternative to that provided by the Document Object Model (DOM). Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially.

A parser that implements SAX (i.e., a SAX Parser) functions as a stream parser, with an event-driven API.[1] The user defines a number of callback methods that will be called when events occur during parsing. The SAX events include (among others):

XML Element Starts and Ends
XML Processing Instructions
XML Comments
XML Text nodes
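
To make the callback model concrete, here is a toy sketch in C++. It is not a real SAX parser and not the API of any library: Handler, Recorder, and scan are hypothetical names, and the scanner only understands plain start tags, end tags, and text (no attributes, comments, processing instructions, or error reporting).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Callback interface the user implements; the names are illustrative.
struct Handler {
    virtual ~Handler() {}
    virtual void startElement(const std::string& name) = 0;
    virtual void endElement(const std::string& name) = 0;
    virtual void characters(const std::string& text) = 0;
};

// Toy scanner: walks the input once, left to right, and fires callbacks.
void scan(const std::string& xml, Handler& h) {
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '<') {
            std::size_t close = xml.find('>', i);
            if (close == std::string::npos) return;  // malformed input: stop
            bool isEnd = (i + 1 < xml.size() && xml[i + 1] == '/');
            std::size_t nameStart = i + (isEnd ? 2 : 1);
            std::string name = xml.substr(nameStart, close - nameStart);
            if (isEnd) h.endElement(name); else h.startElement(name);
            i = close + 1;
        } else {
            std::size_t next = xml.find('<', i);
            if (next == std::string::npos) next = xml.size();
            h.characters(xml.substr(i, next - i));
            i = next;
        }
    }
}

// Example handler that just records the events it receives, in order.
struct Recorder : Handler {
    std::vector<std::string> events;
    void startElement(const std::string& n) override { events.push_back("start " + n); }
    void endElement(const std::string& n) override { events.push_back("end " + n); }
    void characters(const std::string& t) override { events.push_back("text " + t); }
};
```

Calling scan("<a><b>hi</b></a>", recorder) delivers five events in document order: start a, start b, text hi, end b, end a. The scanner never keeps a tree in memory; whatever state is needed has to live in the handler.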

Some events correspond to XML objects that are easily returned all at once, such as comments. However, XML elements can contain many other XML objects, and so SAX represents them as does XML itself: by one event at the beginning, and another at the end. Properly speaking, the SAX interface does not deal in elements, but in events that largely correspond to tags. SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.
There are many SAX-like implementations in existence. In practice, details vary, but the overall model is the same. For example, XML attributes are typically provided as name and value arguments passed to element events, but can also be provided as separate events, or via a hash or similar collection of all the attributes. For another, some implementations provide “Init” and “Fin” callbacks for the very start and end of parsing; others don’t. The exact names for given event types also vary slightly between implementations. Given the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<DocumentElement param="value">
    <FirstElement>
        &#xb6; Some Text
    </FirstElement>
    <?some_pi some_attr="some_value"?>
    <SecondElement param2="something">
        Pre-Text <Inline>Inlined text</Inline> Post-text.
    </SecondElement>
</DocumentElement>

This XML document, when passed through a SAX parser, will generate a sequence of events like the following:

XML Element start, named DocumentElement, with an attribute param equal to “value”
XML Element start, named FirstElement
XML Text node, with data equal to “Some Text”
XML Element end, named FirstElement

Processing Instruction event, with the target some_pi and data some_attr="some_value" (the content after the target is just text; however, it is very common to imitate the syntax of XML attributes, as in this example)

XML Element start, named SecondElement, with an attribute param2 equal to “something”
XML Text node, with data equal to “Pre-Text”
XML Element start, named Inline
XML Text node, with data equal to “Inlined text”
XML Element end, named Inline
XML Text node, with data equal to “Post-text.”
XML Element end, named SecondElement
XML Element end, named DocumentElement

Note that the first line of the sample above is the XML Declaration and not a processing instruction; as such it will not be reported as a processing instruction event (although some SAX implementations provide a separate event just for the XML declaration).
The result above may vary: the SAX specification deliberately states that a given section of text may be reported as multiple sequential text events. Many parsers, for example, return separate text events for numeric character references. Thus in the example above, a SAX parser may generate a different series of events, part of which might include:

XML Element start, named FirstElement
XML Text node, with data equal to “¶” (the Unicode character U+00b6)
XML Text node, with data equal to “ Some Text”
XML Element end, named FirstElement
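
Since one run of text may arrive as several text events, a handler that needs the whole text usually buffers the chunks and only uses them at the next element boundary. A minimal sketch of that idea, assuming a hypothetical CoalescingHandler class that is not part of any SAX library:

```cpp
#include <string>
#include <vector>

// Buffers character data and emits it as one text event at an element boundary.
class CoalescingHandler {
public:
    void startElement(const std::string& name) {
        flush();
        events_.push_back("start " + name);
    }
    void endElement(const std::string& name) {
        flush();
        events_.push_back("end " + name);
    }
    // May be called several times in a row for one logical text node.
    void characters(const std::string& chunk) { pending_ += chunk; }

    const std::vector<std::string>& events() const { return events_; }

private:
    // Emit the buffered text, if any, as a single event.
    void flush() {
        if (!pending_.empty()) {
            events_.push_back("text " + pending_);
            pending_.clear();
        }
    }
    std::string pending_;
    std::vector<std::string> events_;
};
```

With this handler, the two text events in the sequence above (the U+00B6 character, then “ Some Text”) are recorded as a single text entry between the start and end of FirstElement.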

The add operation then invokes the function prepareAddDocument(). This function first makes sure the document is valid, and then control passes to the Document class to perform the put operation. Document::getContentAsEventSource plays this role.

The system reads the file as a file stream, and all subsequent operations are based on this stream. This is handled by the member function Document::stream2events. After the parser and translator are set up, the system proceeds to indexAddDocument, which performs the actual parsing steps. As discussed above, everything is treated as an event, such as startAElement, startAText, etc. Parsing starts in the function Container::indexAddDocument().
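
The idea of working on a stream rather than on a fully loaded file can be sketched as below. This is not how Document::stream2events is implemented; feedStream and its callback signature are hypothetical, chosen only to show chunk-wise reading from a std::istream.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read the stream in fixed-size chunks and pass each chunk to the consumer.
// Returns the total number of bytes delivered.
std::size_t feedStream(std::istream& in, std::size_t chunkSize,
                       const std::function<void(const char*, std::size_t)>& consume) {
    std::vector<char> buf(chunkSize);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;       // nothing more to read
        consume(buf.data(), got);  // the last chunk may be shorter than chunkSize
        total += got;
    }
    return total;
}
```

Any std::istream works as the source, for example a std::ifstream opened on the XML file or a std::istringstream in a test. The consumer would typically be the parser's input side, so the document never has to sit in memory as a whole.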

Once started, the parser does the parsing. Internally this is done by the NsSAX2Reader class, as the following code illustrates.

void NsSAX2Reader::parse(XmlInputStream **is)
{
    // Wrap the input stream, then delegate to the parse() overload that takes the wrapper.
    XmlInputStreamWrapper isw(is);
    parse(isw);
}

In the parsing phase, the scanner first scans the document; WFXMLScanner::scanDocument does this. It invokes NsXercesTranscoder::startDocument, as the call stack shows.

In the function WFXMLScanner::scanContent(), the parser parses the XML content. Depending on the type of the current token, it takes a different branch, such as scanCDSection(), scanComment(), scanEndTag(), etc. scanContent() also calls WFXMLScanner::scanStartTagNS to process the XML string. When it meets a start tag, the system goes into the function NsSAX2Reader::startElement(), as the call stack above illustrates.
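
The branching on token type can be pictured with a small classifier. This is a simplified sketch and not the Xerces code: TokenKind and classify are hypothetical names, and the decision is made purely from the leading characters at the current position.

```cpp
#include <cstddef>
#include <string>

enum class TokenKind {
    Comment, CData, EndTag, ProcessingInstruction, StartTag, CharData, EndOfInput
};

// Classify the construct that begins at position pos by its leading characters.
// Longer prefixes are tested first so "<!--" is not mistaken for a start tag.
TokenKind classify(const std::string& xml, std::size_t pos) {
    if (pos >= xml.size()) return TokenKind::EndOfInput;
    if (xml.compare(pos, 4, "<!--") == 0) return TokenKind::Comment;
    if (xml.compare(pos, 9, "<![CDATA[") == 0) return TokenKind::CData;
    if (xml.compare(pos, 2, "</") == 0) return TokenKind::EndTag;
    if (xml.compare(pos, 2, "<?") == 0) return TokenKind::ProcessingInstruction;
    if (xml[pos] == '<') return TokenKind::StartTag;
    return TokenKind::CharData;
}
```

In terms of the functions named above, a Comment result would correspond to the scanComment() branch, CData to scanCDSection(), EndTag to scanEndTag(), and StartTag to scanStartTagNS().
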
The code below, taken from the function NsHandlerBase::startElem(), shows how and when a node is written down.

_doc->completeNode(tprev, nodesz, needUpdatePrevious_);
If the previous node is not null, it must be written down before a new node is created. The function completeNode stores that node and then sets the current node info.
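
The "write the previous node before starting a new one" pattern can be sketched like this. NodeBuilder and its members are hypothetical and much simpler than NsHandlerBase; nodes are reduced to their names, and an endDocument() step is assumed so that the last node is also stored.

```cpp
#include <string>
#include <vector>

// Hypothetical builder: a node is stored when the next one starts,
// or at the end of the document for the last node.
class NodeBuilder {
public:
    void startElem(const std::string& name) {
        if (hasPrevious_) completeNode();  // write the previous node down first
        current_ = name;                   // then record the new current node
        hasPrevious_ = true;
    }
    void endDocument() {
        if (hasPrevious_) completeNode();  // flush the last pending node
        hasPrevious_ = false;
    }
    const std::vector<std::string>& stored() const { return stored_; }

private:
    void completeNode() { stored_.push_back(current_); }

    bool hasPrevious_ = false;
    std::string current_;
    std::vector<std::string> stored_;
};
```

After startElem("a") nothing has been stored yet; the node a is only written when startElem("b") arrives, and b is written by endDocument().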

In NsHandlerBase::startElem(), the parser generates nodes according to their type.

LeeHao

Leehao's Homepage about Database Internals.
