All OOXML files are ZIP containers: *.docx, *.xlsx, *.pptx.
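Because each file is an ordinary ZIP archive, its parts can be listed without Office installed. The following is a minimal sketch, assuming PowerShell 5 or later; `Get-OoxmlPartName` is a hypothetical helper name, and the in-memory archive is only a stand-in for a real package, not a valid document.

```powershell
# Needed on Windows PowerShell 5.1; harmless on PowerShell 7+.
Add-Type -AssemblyName System.IO.Compression

# Hypothetical helper: returns the part names stored in an OOXML package stream.
function Get-OoxmlPartName {
    param([Parameter(Mandatory)][System.IO.Stream]$Stream)
    $mode = [System.IO.Compression.ZipArchiveMode]::Read
    # Third argument $true leaves the caller's stream open after Dispose().
    $zip = [System.IO.Compression.ZipArchive]::new($Stream, $mode, $true)
    try { $zip.Entries | ForEach-Object { $_.FullName } }
    finally { $zip.Dispose() }
}

# Demo: build a tiny stand-in package in memory with three placeholder parts.
$ms = [System.IO.MemoryStream]::new()
$create = [System.IO.Compression.ZipArchiveMode]::Create
$writer = [System.IO.Compression.ZipArchive]::new($ms, $create, $true)
foreach ($name in '[Content_Types].xml', '_rels/.rels', 'word/document.xml') {
    $entryStream = $writer.CreateEntry($name).Open()
    $bytes = [System.Text.Encoding]::UTF8.GetBytes('<x/>')
    $entryStream.Write($bytes, 0, $bytes.Length)
    $entryStream.Dispose()
}
$writer.Dispose()      # finishes the archive (writes the central directory)
$ms.Position = 0

$parts = @(Get-OoxmlPartName -Stream $ms)
$parts
```

For a file on disk, the same helper could be fed a stream from `[System.IO.File]::OpenRead(...)` with the path to a .docx, .xlsx, or .pptx.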
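Package-level parts common to all three formats:
- [Content_Types].xml: content type declarations
- _rels/.rels: package root relationships
- docProps/: document properties/metadata

Markup vocabularies that carry text:
- Word: WordprocessingML (w:)
- PowerPoint: DrawingML (a:)
- Excel: SpreadsheetML (s:) + sharedStrings

General rules:
- Extract text from w:t, a:t, and SpreadsheetML string nodes; avoid numeric <v> unless typed as string.
- Preserve whitespace flagged with xml:space="preserve" or Word's w:space="preserve".

Word (.docx)

Parts that contain text:
- word/document.xml
- word/header*.xml, word/footer*.xml
- word/footnotes.xml, word/endnotes.xml
- word/comments.xml, word/commentsExtended.xml
- Text boxes and shapes: w:drawing → DrawingML (a:) → a:t
- Fields (w:fldSimple, w:instrText)

Text nodes:
- w:t
- w:instrText
- a:t

Namespaces:
w = http://schemas.openxmlformats.org/wordprocessingml/2006/main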
a = http://schemas.openxmlformats.org/drawingml/2006/main
XPath:
- //w:t
- //w:drawing//a:t
- //w:instrText

Tips:
- Nesting is w:p → w:r → w:t. Join all w:t under each w:p.
- Convert w:br/w:tab to newline/tab while merging.
- Hyperlink text: w:hyperlink → w:r → w:t; URL via relationship (r:id).

PowerPoint (.pptx)

Parts that contain text:
- ppt/slides/slide*.xml
- ppt/notesSlides/notesSlide*.xml
- ppt/slideMasters/slideMaster*.xml, ppt/slideLayouts/slideLayout*.xml
- ppt/charts/chart*.xml (rich text within)

Text nodes:
- Shapes: p:spTree → p:sp → p:txBody → a:p → a:r → a:t
- The text itself is always in a:t
- Tables: a:tbl → a:tr → a:tc → a:txBody → a:p/a:r/a:t
- Charts: c:tx/c:rich//a:t, c:dLbls//a:t

Namespaces:
p = http://schemas.openxmlformats.org/presentationml/2006/main
a = http://schemas.openxmlformats.org/drawingml/2006/main
c = http://schemas.openxmlformats.org/drawingml/2006/chart
XPath:
- //a:t
- //a:tbl//a:tc//a:txBody//a:t
- //c:chart//a:t (or //c:tx//a:t)

Tips:
- Merge at the a:p level: join all a:r/a:t in a paragraph.
- Check speaker notes (notesSlide*.xml) for sensitive info.

Excel (.xlsx)

Parts that contain text:
- xl/worksheets/sheet*.xml
- xl/sharedStrings.xml
- xl/comments*.xml
- xl/threadedComments/threadedComment*.xml
- Headers/footers: <headerFooter> of each sheet
- xl/drawings/drawing*.xml, xl/charts/chart*.xml

Where cell text lives:
- Shared strings: <c t="s"><v> holds index to sharedStrings.xml.
- Entries in sharedStrings.xml:
  - Plain: <si><t>
  - Rich: <si><r><t> (join all r/t)
- Inline strings: <c t="inlineStr"><is><t> or <is><r><t>
- <v> isn't text unless typed as string; formatting changes display only.

XPath (shorthand, prefixes omitted):
- //si/t and //si/r/t
- //is/t and //is/r/t
- //headerFooter/*[text()]
- //comment//t
- //a:t

Namespaces:
s = http://schemas.openxmlformats.org/spreadsheetml/2006/main
a = http://schemas.openxmlformats.org/drawingml/2006/main
c = http://schemas.openxmlformats.org/drawingml/2006/chart
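With the `s` prefix bound to the SpreadsheetML namespace, the shared-string and inline-string rules can be sketched as follows. This is an illustrative sketch, not a complete reader: the two XML fragments are hand-written stand-ins for xl/sharedStrings.xml and a worksheet part, and `New-XmlDoc` and `New-SheetNs` are hypothetical helper names.

```powershell
$ssUri = 'http://schemas.openxmlformats.org/spreadsheetml/2006/main'

# Stand-ins for xl/sharedStrings.xml and xl/worksheets/sheet1.xml (assumed content).
$sstRaw = @"
<sst xmlns="$ssUri">
  <si><t>Name</t></si>
  <si><r><t>Rich </t></r><r><t>text</t></r></si>
</sst>
"@
$sheetRaw = @"
<worksheet xmlns="$ssUri"><sheetData><row r="1">
  <c r="A1" t="s"><v>0</v></c>
  <c r="B1" t="s"><v>1</v></c>
  <c r="C1"><v>42</v></c>
  <c r="D1" t="inlineStr"><is><t>Inline</t></is></c>
</row></sheetData></worksheet>
"@

function New-XmlDoc([string]$Text) {
    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    $doc.LoadXml($Text)
    $doc
}
function New-SheetNs([System.Xml.XmlDocument]$Doc) {
    $mgr = New-Object System.Xml.XmlNamespaceManager($Doc.NameTable)
    $mgr.AddNamespace('s', 'http://schemas.openxmlformats.org/spreadsheetml/2006/main')
    # Unary comma keeps the manager from being enumerated on output.
    , $mgr
}

$sst = New-XmlDoc $sstRaw
$sheet = New-XmlDoc $sheetRaw
$sstNs = New-SheetNs $sst
$sheetNs = New-SheetNs $sheet

# Step 1: build the shared-string table, joining plain <t> or all rich runs <r><t>.
$shared = @(foreach ($si in $sst.SelectNodes('//s:si', $sstNs)) {
    @(foreach ($t in $si.SelectNodes('s:t | s:r/s:t', $sstNs)) { $t.InnerText }) -join ''
})

# Step 2: resolve cells. t="s" is an index into the table; numeric <v> is skipped.
$cells = @(foreach ($cell in $sheet.SelectNodes('//s:c', $sheetNs)) {
    $text = switch ($cell.GetAttribute('t')) {
        's'         { $shared[[int]$cell.SelectSingleNode('s:v', $sheetNs).InnerText] }
        'inlineStr' { @(foreach ($t in $cell.SelectNodes('s:is/s:t | s:is/s:r/s:t', $sheetNs)) { $t.InnerText }) -join '' }
        default     { $null }
    }
    if ($null -ne $text) { [pscustomobject]@{ Ref = $cell.GetAttribute('r'); Text = $text } }
})
$cells
```

The numeric cell C1 is deliberately left out of the result, which matches the rule that `<v>` is not text unless the cell is typed as a string.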
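XPath (namespace-qualified):
- //s:si/s:t | //s:si/s:r/s:t
- //s:is/s:t | //s:is/s:r/s:t
- //s:headerFooter/*[text()]
- //s:comments//s:comment//s:t
- //a:t

Tips:
- Read sharedStrings.xml first; then inline strings, comments, headers, drawings/charts.

XPath from PowerShell

# Namespace manager approach; assumes $xml is an already-loaded System.Xml.XmlDocument
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)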
$ns.AddNamespace('w','http://schemas.openxmlformats.org/wordprocessingml/2006/main')
$ns.AddNamespace('a','http://schemas.openxmlformats.org/drawingml/2006/main')
$ns.AddNamespace('s','http://schemas.openxmlformats.org/spreadsheetml/2006/main')
# Typical text nodes across parts
$nodes = $xml.SelectNodes('//w:t | //a:t | //s:si/s:t | //s:si/s:r/s:t', $ns)
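The snippet above assumes `$xml` already exists. As a self-contained sketch, the same pattern can be run against an inline fragment; the XML here is a hand-written stand-in for word/document.xml, and a real run would load the part from the package instead.

```powershell
# Stand-in for the contents of word/document.xml (assumed sample, not from a real file).
$raw = @'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r><w:r><w:t xml:space="preserve"> world</w:t></w:r></w:p>
  </w:body>
</w:document>
'@

$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true   # keep whitespace exactly as stored
$xml.LoadXml($raw)

$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')

# Collect the text of every w:t in document order and concatenate it.
$texts = @(foreach ($node in $xml.SelectNodes('//w:t', $ns)) { $node.InnerText })
$joined = $texts -join ''
$joined
```

Binding only the prefixes a given part actually uses keeps the manager small; prefixes that never appear in the XPath expression do not need to be registered.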
# Alternative without a namespace manager: local-name() + namespace-uri() (quick & robust)
# Word plain text
$xml.SelectNodes('//*[local-name()="t" and namespace-uri()="http://schemas.openxmlformats.org/wordprocessingml/2006/main"]')
# DrawingML text (Word/PPTX/Excel drawings)
$xml.SelectNodes('//*[local-name()="t" and namespace-uri()="http://schemas.openxmlformats.org/drawingml/2006/main"]')
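The two selectors above differ only in the namespace URI, so they can be folded into one small function. This is a sketch under assumptions: `Select-ByLocalName` is a hypothetical helper name, and the sample XML simplifies the real nesting inside w:drawing to keep the example short.

```powershell
# Hypothetical helper: select elements by local name and namespace URI,
# so no XmlNamespaceManager is required.
function Select-ByLocalName {
    param(
        [Parameter(Mandatory)][System.Xml.XmlNode]$Node,
        [Parameter(Mandatory)][string]$LocalName,
        [Parameter(Mandatory)][string]$NamespaceUri
    )
    # The leading // searches the whole document that $Node belongs to.
    $xpath = "//*[local-name()='$LocalName' and namespace-uri()='$NamespaceUri']"
    $Node.SelectNodes($xpath)
}

$wUri = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
$aUri = 'http://schemas.openxmlformats.org/drawingml/2006/main'

# Simplified stand-in markup: one body run plus one DrawingML run.
$raw = @"
<w:document xmlns:w="$wUri" xmlns:a="$aUri">
  <w:body><w:p><w:r>
    <w:t>Body</w:t>
    <w:drawing><a:p><a:r><a:t>Shape text</a:t></a:r></a:p></w:drawing>
  </w:r></w:p></w:body>
</w:document>
"@
$xml = New-Object System.Xml.XmlDocument
$xml.LoadXml($raw)

$wText = @(Select-ByLocalName -Node $xml -LocalName 't' -NamespaceUri $wUri | ForEach-Object { $_.InnerText })
$aText = @(Select-ByLocalName -Node $xml -LocalName 't' -NamespaceUri $aUri | ForEach-Object { $_.InnerText })
```

Checking `namespace-uri()` as well as `local-name()` is what keeps w:t and a:t apart; matching on the local name alone would return both.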
Normalization:
- Word: join w:t under each w:p.
- DrawingML: join a:r/a:t under each a:p.
- Excel: within each <si>, join <t> and <r><t>.
- w:br → \n
- a:br → \n
- Respect xml:space="preserve" and w:space="preserve".

Quick reference:
- word/document.xml → //w:t, //w:instrText, //w:drawing//a:t
- word/header*.xml, word/footer*.xml → same XPaths as word/document.xml
- word/footnotes.xml, word/endnotes.xml, word/comments*.xml → same XPaths as word/document.xml
- ppt/slides/slide*.xml → //a:t
- ppt/notesSlides/notesSlide*.xml → //a:t
- ppt/slideMasters/*, ppt/slideLayouts/*, ppt/charts/chart*.xml → //a:t
- xl/sharedStrings.xml → //s:si/s:t | //s:si/s:r/s:t
- xl/worksheets/sheet*.xml → inline strings //s:is//s:t, headers/footers //s:headerFooter/*[text()]
- xl/comments*.xml, xl/threadedComments/* → //t
- xl/drawings/*.xml, xl/charts/*.xml → //a:t
- Rich text runs: a:r/a:t → merge at a:p.
- Field instructions: w:instrText.
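The Word normalization rules (join w:t per w:p, turn w:br into a newline and w:tab into a tab) could be sketched like this. The XML is a hand-written sample, and the sketch assumes paragraphs are not nested; a text box inside a paragraph would have its text counted under both the outer and the inner w:p.

```powershell
$wUri = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'

# Hand-written sample with a line break, a preserved trailing space, and a tab.
$raw = @"
<w:document xmlns:w="$wUri"><w:body>
  <w:p><w:r><w:t>Line one</w:t><w:br/><w:t xml:space="preserve">line </w:t></w:r><w:r><w:t>two</w:t></w:r></w:p>
  <w:p><w:r><w:t>A</w:t><w:tab/><w:t>B</w:t></w:r></w:p>
</w:body></w:document>
"@

$xml = New-Object System.Xml.XmlDocument
$xml.PreserveWhitespace = $true
$xml.LoadXml($raw)
$ns = New-Object System.Xml.XmlNamespaceManager($xml.NameTable)
$ns.AddNamespace('w', $wUri)

# One output string per w:p; the union XPath yields nodes in document order.
$paragraphs = @(foreach ($p in $xml.SelectNodes('//w:p', $ns)) {
    $pieces = foreach ($n in $p.SelectNodes('.//w:t | .//w:br | .//w:tab', $ns)) {
        switch ($n.LocalName) {
            't'   { $n.InnerText }   # literal run text
            'br'  { "`n" }           # line break becomes a newline
            'tab' { "`t" }           # tab element becomes a tab character
        }
    }
    $pieces -join ''
})
$paragraphs
```

The same shape should carry over to DrawingML by swapping in a:p, a:t, and a:br with the `a` namespace bound instead of `w`.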