Document Object Model

PHP's DOM (Document Object Model) parser is very good at dealing with XML and HTML. It parses the markup into a tree structure and loads the data into a DOM object. The first thing you need to do is construct a DOMDocument object and then load the HTML content into it.


// create a new DOM object
$dom = new DOMDocument;
// discard white space (set this before loading the document)
$dom->preserveWhiteSpace = false;
// load the HTML into the object
$dom->loadHTML($html);
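HTML fetched from the web is rarely perfectly formed, and loadHTML() reports parse problems as PHP warnings. The following is a minimal sketch, assuming a small hypothetical $html string, that buffers those warnings with libxml_use_internal_errors() so they can be inspected instead of printed:

```php
<?php
// Hypothetical sample document, used only for illustration.
$html = '<html><head><title>The Title</title></head>'
      . '<body><p>Hello, DOM</p></body></html>';

// Record libxml parse warnings instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Report, then clear, anything the parser complained about.
foreach (libxml_get_errors() as $error) {
    echo "libxml: ", trim($error->message), "\n";
}
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the earlier setting

echo $dom->getElementsByTagName('p')->item(0)->textContent, "\n";
```

Restoring the previous setting keeps the error-handling change local to this parse.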

Concept of DOM

Everything in a DOM document is a node. A DOMDocument is a hierarchical tree of nodes that starts with a single root node. The root node can have child nodes, and those child nodes can have child nodes of their own. For example, an HTML document has a root element (html) with two children (head and body).
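The tree shape can be made visible by walking it recursively. A minimal sketch; the helper elementTree() and the sample markup are hypothetical, not part of PHP's DOM API:

```php
<?php
/**
 * Collect element names in document order, indented two spaces per level.
 * Hypothetical helper for illustration only.
 */
function elementTree(DOMNode $node, int $depth = 0): array
{
    $lines = [];
    foreach ($node->childNodes as $child) {
        // Keep element nodes only; text, comment and doctype nodes are skipped.
        if ($child instanceof DOMElement) {
            $lines[] = str_repeat('  ', $depth) . $child->nodeName;
            $lines = array_merge($lines, elementTree($child, $depth + 1));
        }
    }
    return $lines;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><head><title>The Title</title></head><body><p>Hi</p></body></html>');

echo implode("\n", elementTree($dom)), "\n";
```

loadHTML() also adds a default doctype node at the top level; the instanceof DOMElement check leaves it out of the listing.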

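For example, the markup <title>The Title</title> has two nodes: a DOMElement with a DOMText child.

An element with one attribute and no content has three nodes: the DOMElement, a DOMAttr, and the DOMText holding the attribute's value.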
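One way to confirm these node types is to inspect the classes PHP reports for each node. This sketch assumes hypothetical markup containing a <title> element and a <div id="main"> element:

```php
<?php
// Hypothetical markup; id="main" is just an illustrative attribute.
$dom = new DOMDocument();
$dom->loadHTML('<html><head><title>The Title</title></head>'
    . '<body><div id="main"></div></body></html>');

// <title> is a DOMElement whose only child is a DOMText.
$title = $dom->getElementsByTagName('title')->item(0);
echo get_class($title), ' > ', get_class($title->firstChild), "\n";

// The id attribute is a DOMAttr, and its value lives in a DOMText child.
$div = $dom->getElementsByTagName('div')->item(0);
$attr = $div->getAttributeNode('id');
echo get_class($div), ' > ', get_class($attr), ' > ', get_class($attr->firstChild), "\n";
```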

There are two important methods that can be used to extract content from the HTML structure:



1. Get Elements by Tag Name

The getElementsByTagName method returns a DOMNodeList (an iterable list of nodes, not a PHP array) containing all the elements with the given tag name. It is useful when you want to read the content or attributes of multiple HTML elements that share the same tag.
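Because the result is a DOMNodeList, it supports foreach, the length property and item(), but not PHP array functions directly. A sketch with hypothetical markup; iterator_to_array() converts the list when an array is needed:

```php
<?php
// Hypothetical document with two tables.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>A</td></tr></table>'
    . '<table><tr><td>B</td></tr></table></body></html>');

$tables = $dom->getElementsByTagName('table');

echo $tables->length, "\n";                // number of matches
echo $tables->item(0)->textContent, "\n";  // access by zero-based index
var_dump($tables->item(5));                // an out-of-range index gives NULL

// Convert to a real array before using array functions such as array_map().
$texts = array_map(fn (DOMElement $t) => $t->textContent, iterator_to_array($tables));
print_r($texts);
```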

Example: For Getting Tables

$tables = $dom->getElementsByTagName("table");
foreach ($tables as $table) {
    echo $dom->saveHTML($table);
}

When passed a node, the saveHTML method returns the HTML markup of that particular node, including the node's own tag. To get the total number of matched elements, use the length property.

echo "Found: " . $tables->length . " items";

Example: For Getting Links

$dom = new DOMDocument;
$dom->loadHTML($html);
$links = $dom->getElementsByTagName("a");
foreach ($links as $node) {
    echo $dom->saveHTML($node);
}
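A common next step is to collect each link's target and text rather than echoing raw markup. A sketch; extractLinks() is a hypothetical helper name, not part of PHP's DOM API:

```php
<?php
/**
 * Return an array of ['href' => ..., 'text' => ...] for every <a> element.
 * Hypothetical helper for illustration only.
 */
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Silence libxml warnings for imperfect markup, then restore the setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        $links[] = [
            'href' => $node->getAttribute('href'), // "" when href is absent
            'text' => trim($node->textContent),
        ];
    }
    return $links;
}

print_r(extractLinks('<p><a href="/docs">Docs</a> and <a name="top">Top</a></p>'));
```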


A link element has four parts: the tag name, the attribute name, the attribute value, and the enclosed tag content.
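For a single anchor element, the four parts map onto DOMElement and DOMAttr properties roughly as in this sketch (the anchor markup is a hypothetical example):

```php
<?php
// Hypothetical anchor used only to illustrate the four parts.
$dom = new DOMDocument();
$dom->loadHTML('<a href="https://example.com/">Example</a>');
$node = $dom->getElementsByTagName('a')->item(0);

$attr = $node->attributes->item(0); // first (and only) attribute, a DOMAttr

$parts = [
    'tag name'        => $node->tagName,
    'attribute name'  => $attr->name,
    'attribute value' => $attr->value,
    'content'         => $node->nodeValue,
];
print_r($parts);
```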


1. To get the text value of the node (the enclosed tag content):

echo $node->nodeValue;

2. To check whether the href attribute exists:

var_dump($node->hasAttribute("href"));

3. To get the href attribute value:

echo $node->getAttribute("href");

4. To change the href attribute value:

$node->setAttribute("href", "something else");

5. To remove the href attribute and its value:

$node->removeAttribute("href");
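These attribute operations can be combined and their effect checked with saveHTML(). A sketch, assuming hypothetical /old and /new URLs:

```php
<?php
// Hypothetical link used to demonstrate change and removal.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><a href="/old">Link</a></body></html>');
$node = $dom->getElementsByTagName('a')->item(0);

$steps = [];
$steps[] = $dom->saveHTML($node);     // original markup
$node->setAttribute('href', '/new');  // change the value
$steps[] = $dom->saveHTML($node);
$node->removeAttribute('href');       // drop the attribute and its value
$steps[] = $dom->saveHTML($node);
$steps[] = var_export($node->hasAttribute('href'), true);

echo implode("\n", $steps), "\n";
```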


2. Get Element by Id

The getElementById method returns the element with the given id, or NULL if no such element is found. It is useful when you want to read the content or an attribute value of an HTML element with a specific id.

$element = $dom->getElementById("myid");
if ($element !== null) {
    echo $element->nodeValue;
}
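Because getElementById() returns null for a missing id, it is safer to check the result before reading from it. A sketch; textById() and the ids are hypothetical:

```php
<?php
// Hypothetical document with a single element carrying id="myid".
$dom = new DOMDocument();
$dom->loadHTML('<html><body><div id="myid">Hello</div></body></html>');

/** Return the text of the element with $id, or null when it does not exist. */
function textById(DOMDocument $dom, string $id): ?string
{
    $element = $dom->getElementById($id);
    return $element === null ? null : $element->nodeValue;
}

var_dump(textById($dom, 'myid'));     // string(5) "Hello"
var_dump(textById($dom, 'missing'));  // NULL
```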

3. DOMXPath in PHP

The DOMXPath class is part of PHP's DOM extension. XPath uses path expressions to select nodes.
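Besides query(), which returns a DOMNodeList, DOMXPath::evaluate() can also return typed results such as numbers and strings for expressions like count() or string(). A sketch with hypothetical list markup:

```php
<?php
// Hypothetical document with a two-item list.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><ul><li>One</li><li>Two</li></ul></body></html>');
$xpath = new DOMXPath($doc);

$items = $xpath->query('//li');               // DOMNodeList of <li> elements
$count = $xpath->evaluate('count(//li)');     // a float
$first = $xpath->evaluate('string(//li[1])'); // text of the first <li>

echo $items->length, ' ', $count, ' ', $first, "\n";
```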

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

Syntax for XPath Query

Expression   Description
/            Selects from the root node
//           Selects nodes in the document from the current node that match the selection, no matter where they are
.            Selects the current node
..           Selects the parent of the current node
@            Selects attributes
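Each expression in the syntax table can be tried against a small document by passing a context node as the second argument of query(). A sketch with hypothetical markup:

```php
<?php
// Hypothetical document: a <p class="note"> inside a <div>.
$doc = new DOMDocument();
$doc->loadHTML('<html><body><div id="box"><p class="note">Text</p></div></body></html>');
$xpath = new DOMXPath($doc);

// "//": find a <p> anywhere in the document; it becomes the context node below.
$p = $xpath->query('//p')->item(0);

$results = [
    '/html/body' => $xpath->query('/html/body')->item(0)->nodeName, // "/": path from the root
    '.'          => $xpath->query('.', $p)->item(0)->nodeName,      // ".": the context node itself
    '..'         => $xpath->query('..', $p)->item(0)->nodeName,     // "..": its parent
    '@class'     => $xpath->query('@class', $p)->item(0)->value,    // "@": one of its attributes
];
print_r($results);
```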

Parse h1 tag text

$heading1 = "";
$contents = $xpath->query("//h1");
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading1 .= " " . $node->nodeValue;
    }
}
echo "h1: $heading1";

Parse h3 and h4 tag text

$heading3and4 = "";
$contents = $xpath->query("//h3 | //h4");
if ($contents !== false) {
    foreach ($contents as $node) {
        $heading3and4 .= " " . $node->nodeValue;
    }
}
echo "h3 and h4s: $heading3and4";

Parse meta description

$metaDescription = "";
$contents = $xpath->query('/html/head/meta[@name="description"]/@content');
if ($contents !== false && $contents->length != 0) {
    foreach ($contents as $content) {
        $metaDescription .= $content->value;
    }
}
echo "Meta Description: $metaDescription";

Parse meta keywords

$metaKeywords = "";
$contents = $xpath->query('/html/head/meta[@name="keywords"]/@content');
if ($contents !== false && $contents->length != 0) {
    foreach ($contents as $content) {
        $metaKeywords .= " " . $content->value;
    }
}
echo "Meta Keywords: $metaKeywords";

Parse Elements with class Name

$nodeList = $xpath->query('//div[@class="class_name"]');
$node = $nodeList->item(0);
// To check the result:
if ($node !== null) {
    echo $node->nodeValue;
}
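The test @class="class_name" matches only when the class attribute is exactly that string, so an element carrying several classes is missed. A widely used XPath 1.0 idiom matches a single class token instead; this sketch, with hypothetical markup, compares the two:

```php
<?php
// Hypothetical markup: exact class, class among others, and a look-alike name.
$doc = new DOMDocument();
$doc->loadHTML('<html><body>'
    . '<div class="class_name">A</div>'
    . '<div class="box class_name wide">B</div>'
    . '<div class="class_name_other">C</div>'
    . '</body></html>');
$xpath = new DOMXPath($doc);

// Exact string comparison on the whole attribute value.
$exact = $xpath->query('//div[@class="class_name"]');

// Token match: pad the normalized class list with spaces, then search for " class_name ".
$token = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " class_name ")]'
);

echo $exact->length, ' ', $token->length, "\n";
```

The padding spaces are what keep class_name_other from matching.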
