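// Tag name: a letter or underscore, then letters, digits, "-", "." or "_".
const ncname = `[a-zA-Z_][\\-\\.0-9_a-zA-Z]*`;
// Optionally namespaced tag name, e.g. "svg:path"; captured as group 1.
const qnameCapture = `((?:${ncname}\\:)?${ncname})`;
// Start of an opening tag, e.g. "<div".
const startTagOpen = new RegExp(`^<${qnameCapture}`);
// End of an opening tag: ">" or "/>", with optional leading whitespace.
const startTagClose = /^\s*(\/?)>/;
// A closing tag, e.g. "</div>".
const endTag = new RegExp(`^<\\/${qnameCapture}[^>]*>`);
// An attribute: name in group 1; value in group 3 (double-quoted),
// group 4 (single-quoted), or group 5 (unquoted).
const attribute = /^\s*([^\s"'<>\/=]+)(?:\s*(=)\s*(?:"([^"]*)"+|'([^']*)'+|([^\s"'=<>`]+)))?/;
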
let root, currentParent;
let stack = [];

// Node types, matching the DOM nodeType values for elements and text.
const ELEMENT_TYPE = 1;
const TEXT_TYPE = 3;

function createASTElement(tagName, attrs) {
  return {
    tag: tagName,
    type: ELEMENT_TYPE,
    children: [],
    attrs,
    parent: null,
  };
}

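// Opening tag: create the element, make it the root if there is none yet,
// and make it the current parent for whatever follows.
function handleStartTag({ tagName, attrs }) {
  const element = createASTElement(tagName, attrs);
  if (!root) {
    root = element;
  }
  currentParent = element;
  stack.push(element);
}
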
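// Closing tag: pop the innermost open element and attach it to its parent.
// tagName is not checked against the popped element, so mismatched tags
// are not detected.
function handleEndTag(tagName) {
  const element = stack.pop();
  currentParent = stack[stack.length - 1];
  if (currentParent) {
    element.parent = currentParent;
    currentParent.children.push(element);
  }
}
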
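// Text: drop surrounding whitespace and skip whitespace-only runs.
// trim() keeps the spaces between words, which replace(/\s/g, "") removed.
function handleChars(text) {
  text = text.trim();
  // Text outside any open element has no parent to attach to.
  if (text && currentParent) {
    currentParent.children.push({
      type: TEXT_TYPE,
      text,
    });
  }
}
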
export function parse(html) {
  // Reset module-level state so repeated calls do not share a tree.
  root = undefined;
  currentParent = undefined;
  stack = [];

  while (html) {
    const textEnd = html.indexOf("<");
    if (textEnd === 0) {
      const startTagMatch = parseStartTag();
      if (startTagMatch) {
        handleStartTag(startTagMatch);
        continue;
      }

      const endTagMatch = html.match(endTag);
      if (endTagMatch) {
        advance(endTagMatch[0].length);
        handleEndTag(endTagMatch[1]);
        continue;
      }
    }

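    // Reaching here means the input starts with text, or with a "<" that
    // opened neither a start tag nor an end tag.
    let text;
    if (textEnd > 0) {
      text = html.substring(0, textEnd);
    } else if (textEnd < 0) {
      // No "<" left: the rest of the input is text.
      text = html;
    } else {
      // Stray "<": consume it as text so the loop always makes progress.
      text = html.charAt(0);
    }
    advance(text.length);
    handleChars(text);
  }
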
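  // Try to consume an opening tag at the start of html. Returns
  // { tagName, attrs } on success and undefined otherwise.
  function parseStartTag() {
    const start = html.match(startTagOpen);
    if (start) {
      const match = {
        tagName: start[1],
        attrs: [],
      };
      advance(start[0].length);

      let end, attr;
      while (
        !(end = html.match(startTagClose)) &&
        (attr = html.match(attribute))
      ) {
        advance(attr[0].length);
        attr = {
          name: attr[1],
          value: attr[3] || attr[4] || attr[5],
        };
        match.attrs.push(attr);
      }
      if (end) {
        // Consume the whole close match (leading whitespace and an
        // optional "/"), not just the final ">".
        advance(end[0].length);
        return match;
      }
    }
  }

  // Drop the first n characters of the remaining input.
  function advance(n) {
    html = html.substring(n);
  }

  return root;
}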
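
To make the shape of the tree returned by `parse` concrete, here is a minimal sketch that serializes such a tree back into an HTML string. The `render` helper is hypothetical and not part of the parser above. The sample tree is written by hand in the shape `createASTElement` and `handleChars` produce, so the snippet runs on its own. It assumes attributes are stored as `{ name, value }` pairs, and it performs no escaping.

```javascript
// Node type tags, mirroring ELEMENT_TYPE and TEXT_TYPE in the parser.
const ELEMENT_TYPE = 1;
const TEXT_TYPE = 3;

// Hypothetical helper: serialize an AST node back to an HTML string.
function render(node) {
  if (node.type === TEXT_TYPE) {
    return node.text;
  }
  // Attributes without a value are emitted as bare names.
  const attrs = node.attrs
    .map(({ name, value }) =>
      value === undefined ? ` ${name}` : ` ${name}="${value}"`
    )
    .join("");
  const children = node.children.map(render).join("");
  return `<${node.tag}${attrs}>${children}</${node.tag}>`;
}

// Hand-built tree in the shape parse() is expected to return for
// `<div id="app"><p>hello</p></div>` (parent links left as null for brevity).
const ast = {
  tag: "div",
  type: ELEMENT_TYPE,
  attrs: [{ name: "id", value: "app" }],
  parent: null,
  children: [
    {
      tag: "p",
      type: ELEMENT_TYPE,
      attrs: [],
      parent: null,
      children: [{ type: TEXT_TYPE, text: "hello" }],
    },
  ],
};

console.log(render(ast)); // <div id="app"><p>hello</p></div>
```

Walking the tree recursively like this is also a reasonable starting point for a code generator, since every element carries its tag, attributes, and ordered children.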