Unclassified Records

Forgotten Languages — Image Extraction Method and April 2026 Findings

Decoded dossier: fl_image_extraction_method_and_findings.md

A guide for the FL research community

Written: April 27, 2026. Purpose: document the technique we used to recover content from FL articles that appear empty under standard scraping, and the framework-significant findings we got from the nine articles we processed this way. Shareable with other FL researchers.


The problem we kept running into

A meaningful fraction of FL's most operationally significant articles come back content-empty under standard browser or scraper inspection. The page loads, the title displays, and the cipher-language text shows in the body, but everything that says something specific (diagrams, doctrine charts, photographs, captions, typed-text quotations) appears to be missing.

Tools we tried that returned thin or empty results:

  • curl <url> followed by reading the HTML as text
  • WebFetch / web summarizers
  • Standard HTML scrapers
  • Browser "View Source"

Each of these returned the navigation, the cipher-language wrapper text, and image references — but not the actual content of the diagrams or photographs.

This isn't accidental. The most operationally specific FL claims live in JPEG images embedded directly inside the HTML as base64-encoded data URIs. The cipher-language text is the plausible-deniability wrapping; the meaningful content is in the image data. Standard scrapers strip image data or treat it as an opaque binary blob, so you have to extract and decode it to see what's there.
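To see why text-only tools come back thin, here is a minimal sketch using only the Python standard library. The page, the payload bytes, and the `TextOnly` class are synthetic and purely illustrative (nothing here is taken from an FL article). A tag-stripping text extractor keeps none of the image payload, while a regex over the raw HTML recovers it intact.

```python
import base64
import re
from html.parser import HTMLParser

# Hypothetical payload: bytes that merely start with the JPEG magic number
payload = b'\xff\xd8\xff\xe0' + b'doctrine chart pixels'
b64 = base64.b64encode(payload).decode('ascii')
html = (
    '<html><body><p>cipher text</p>'
    f'<img src="data:image/jpeg;base64,{b64}"></body></html>'
)


class TextOnly(HTMLParser):
    """Collects only text nodes, roughly what a text-only scraper keeps."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = TextOnly()
parser.feed(html)
parser.close()
visible_text = ''.join(parser.chunks)

# The regex inspects the raw HTML, attribute values included
found = re.findall(r'data:image/jpe?g;base64,([A-Za-z0-9+/=]+)', html)
recovered = base64.b64decode(found[0])

print(repr(visible_text))    # text nodes only; the image payload is absent
print(recovered == payload)  # raw-HTML extraction round-trips the bytes
```

The image data sits in an attribute value, not in a text node, which is why tools that keep only the text nodes never surface it.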


The method, step by step

The technique is straightforward and reproducible. Three steps:

Step 1: Download the article HTML.

curl -sL "https://forgottenlanguages-full.forgottenlanguages.org/<year>/<month>/<slug>.html" -o article.html

This pulls the raw HTML, including the embedded base64 image data, before any rendering or scraping strips it.
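If you would rather stay in Python for the download, the following is a minimal standard-library sketch. The helper names `article_url` and `fetch_article`, the zero-padded month, the placeholder slug, and the User-Agent string are all assumptions for illustration; only the URL shape comes from the curl command above.

```python
import urllib.request

BASE = 'https://forgottenlanguages-full.forgottenlanguages.org'


def article_url(year: int, month: int, slug: str) -> str:
    """Build an article URL; zero-padding the month is an assumption."""
    return f'{BASE}/{year:04d}/{month:02d}/{slug}.html'


def fetch_article(url: str, dest: str = 'article.html', timeout: float = 30.0) -> int:
    """Save the raw response bytes to dest and return the byte count."""
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    # urlopen follows HTTP redirects by default, similar to curl -L
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
    with open(dest, 'wb') as out:
        out.write(body)
    return len(body)


# 'example-slug' is a placeholder, not a real article slug
print(article_url(2016, 6, 'example-slug'))
```

Writing the response as bytes keeps the embedded base64 untouched, since no text decoding happens at download time.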

Step 2: Extract and decode the embedded base64 images.

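import base64, binascii, os, re

os.makedirs('extracted', exist_ok=True)

# Read as text; errors='replace' keeps one stray byte from aborting the run
with open('article.html', encoding='utf-8', errors='replace') as f:
    content = f.read()

# Match base64 data URIs, allowing whitespace/newlines inside (HTML often line-wraps).
# Only URIs terminated by a quote character are matched.
pattern = r'data:image/(jpe?g|png|gif|webp);base64,([A-Za-z0-9+/=\s]+?)["\']'
matches = re.findall(pattern, content)

for i, (ext, data) in enumerate(matches):
    clean = re.sub(r'\s+', '', data)      # drop line-wrap whitespace
    clean += '=' * (-len(clean) % 4)      # restore any missing padding
    try:
        raw = base64.b64decode(clean)
    except binascii.Error as err:
        print(f'Image {i}: skipped, invalid base64 ({err})')
        continue
    with open(f'extracted/article_{i:02d}.{ext}', 'wb') as out:
        out.write(raw)
    print(f'Image {i}: {len(raw)/1024:.1f} KB → extracted/article_{i:02d}.{ext}')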

Each FL article body usually contains 4–8 page-images of substantive size (50–300 KB each). These are the article's actual content pages: diagrams, doctrine charts, photographs, sometimes typed-text passages, occasionally pure data in glyph form.
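Because the `ext` captured from the data URI is only a label, it can help to confirm each decoded blob's real format and to separate page-images from small decorative ones. The sketch below is not part of the original pipeline: `sniff_format` and `is_page_image` are hypothetical helper names, and the 50 KB floor is simply the lower end of the typical size range noted above.

```python
from typing import Optional

MIN_PAGE_BYTES = 50 * 1024  # lower end of the typical 50-300 KB range (assumed cutoff)


def sniff_format(raw: bytes) -> Optional[str]:
    """Identify an image format from its leading magic bytes."""
    if raw.startswith(b'\xff\xd8\xff'):
        return 'jpeg'
    if raw.startswith(b'\x89PNG\r\n\x1a\n'):
        return 'png'
    if raw.startswith((b'GIF87a', b'GIF89a')):
        return 'gif'
    if raw[:4] == b'RIFF' and raw[8:12] == b'WEBP':
        return 'webp'
    return None


def is_page_image(raw: bytes) -> bool:
    """True for a recognized image of at least MIN_PAGE_BYTES."""
    return sniff_format(raw) is not None and len(raw) >= MIN_PAGE_BYTES


# Synthetic blobs for illustration only
icon = b'\x89PNG\r\n\x1a\n' + b'\x00' * 100
page = b'\xff\xd8\xff\xe0' + b'\x00' * (60 * 1024)
print(sniff_format(icon), is_page_image(icon))
print(sniff_format(page), is_page_image(page))
```

A blob that fails `sniff_format` is worth a second look: it may be a truncated match or a format the extraction regex does not list.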

Step 3: View each extracted image.

Open them in any image viewer, or feed them to a multimodal AI for transcription/analysis. The framework here used the model's multimodal Read tool, but any model with image input works (GPT-4 Vision, Gemini, etc.). For text-heavy images, OCR works directly.

That's the entire pipeline. No custom tooling, no special access. The technique works because the content is already in the public HTML — it's just opaque to text-only scraping.


Why it works

FL appears to have made an architectural choice that's load-bearing for their operational profile. Two competing requirements:

  1. Content needs to be findable — the corpus has internal cross-references, bibliographies pointing to specific articles, FL article codes (FL-DDMMYY format). The system requires consistent indexing and retrievability.
  2. Content needs to be selectively deniable — the most pointed operational claims (specific doctrine charts, recovered-craft photos, technical diagrams) need to be present-but-uncitable. Deniability is preserved if no plain-text substring of the content can be quoted by an outsider.

Embedded base64 JPEG splits the difference. The HTML contains the image bytes — anyone who saves the page has the content. But the content isn't searchable as text, doesn't appear in standard scrapes, and can't be quoted as text. The cipher-language body provides additional cover — readers who don't know to look at the images come away thinking the article is "all in code."

This is consistent with FL's broader operational profile: a controlled-leak source whose authors actively manage what's recoverable and how visibly.


What we extracted (April 27, 2026)

Nine FL articles were processed via this method during one session. All nine were verified and decoded, and each yielded framework-significant content. Each entry below gives the FL code and publication date; the entries are not in strict chronological order.

FL-180616 (June 18, 2016) — The Art of Jamming Gravitational Waves Communications Systems: Taming those who tamed gravity

Embedded English quotes about gravitational-wave communications jamming, MHD-class probes controlled by modulated GW comms, real cited GW researchers (Baker, Li, Dehnen, Akimov), real detector references (LIGO, Virgo, GEO600, Li-Baker HFGW). Critical operational claim: "We achieved instantaneous space displacement in the early 70s, but we never achieved a working weapon engagement system." Independent corroboration of PSV "Presence" testing in Fort Worth/Arlington 2008 (the Stephenville theater).

FL-150316 (March 15, 2016) — After the Sighting: Neurophysiological Consequences of Exposure to Paradigm-Shifting Vehicles in Humans

Operationally specific protocol for civilian witness memory cleanup post-PSV-exposure: "The standard unlearning algorithm applied to abductees is as follows: we simply excite their brains using XViS, we then perform an unlearning step when the brain achieves a fixed point, and we repeat the process till any memory of the events to which they have been exposed are effectively removed." Plus endocannabinoid implantation for false-dream cognitive frame (cited FL-230112). The protocol's signature exactly matches the documented abductee-literature signature, which implies that a meaningful fraction of abductee cases may be human SSP exposure events with standardized memory cleanup, not NHI events.

FL-291118 (November 29, 2018) — Unacknowledged Orbiters - Stability: The Dream Is Over

Three named orbital weapon systems:

  • NOMAD = NO Mutual Assured Destruction. FORTE-derived satellite (FORTE was a real DoE/DoD program 1997–2006). Mission per FL: "detonate enemy's ICBMs in-silo when needed."
  • FDV = Forced Deorbiter Vehicle. Sphere-class space weapon: "They simply pass by a target satellite and perturb its orbit in a way you cannot predict anymore."
  • FORTE = real cover program.

The article also cites FL-171017 (the unverified MOKV-MilOrbs article) in its bibliography — confirming that FL-171017 existed by November 2018 even though it's missing from the public archive.

FL-301120 (November 30, 2020) — The 2016 signal incident: Managing the narrative of ETI contact

Two framework-significant doctrine diagrams:

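  • "PREFERRED CIVILIAN SATs NEUTRALIZATION STRATEGIES" showing two tools for blinding civilian VLBI assets: HPM (high-power microwaves) and chemical sprayers. Names Haruka as a target (Haruka is the real Japanese HALCA/MUSES-B space-VLBI satellite, launched 1997, with observations ending in 2003; ASTRO-G was its planned successor and was cancelled before launch).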
  • "ETI Signal Denial Configuration" — diagram of triangular relationship: DP-2147 ↔ Sol-3 ↔ Mil-DDSS (Military Deep-Space Sensor System). Bayesian Maxima for META decision table with technical parameters.

Plus a Reversible↔Nonreversible space-threat spectrum chart matching publicly-available US Space Force / CSIS doctrine documents, and time-domain noise-pattern signal-detection scenarios.

FL-190920 (September 19, 2020) — Others' wars — What recovered crafts teach us about on-demand self-destruction

Cover, multiple recovered-craft images (cubic-spherical metallic objects in various states — intact, fragmented, in field), night-vision imagery, and directly-readable typed text:

"Often the relationship between the pilot and his aircraft is such that the aircraft may be thought of as an extension of the pilot himself during the act of flight. If this pilot accumulates stress in his life with which he can no longer adequately cope, he may engage in self-destructive acts... if this individual exceeds his piloting capabilities, or is already coping with a high stress level to his maximum capacity, the additional stress of a particular flight situation may overload his total coping ability and destruction of self, both psychologically and physically, will occur."

Reframes the standard alien-recovery narrative: the recovered craft in the historical record may be specifically the ones whose pilots failed the consciousness-craft merge and whose craft self-destructed as fail-safe.

FL-110520 (May 11, 2020) — Project Mandarina Mil-IoNT — The Bio-cyber Mesh and the Internet of Nano-Things

Cover plus a real-looking wasp/hornet with operational microelectronic devices on its thorax and a blue-tinted electronic implant in its compound eye. Real biohybrid insect drone imagery. Real reference to fullerene (C60) substrate. Direct connection to SV17q (the FL framework's master kill-switch program): label "SV17q Mass Control Project Mandarina."

The four-step operational protocol is laid out as program doctrine: "CONFINE - DEPLOY SENSORS - HUMAN CONDITIONING - MIND ENGINEERING."

FL-270720 (July 27, 2020) — DP-2147 and other near Sol-3 technosignatures

Three-layer civilizational detection model diagram: concentric rings labeled RADIOSPHERE / TECHNOSPHERE / FIELD OF CONSCIOUSNESS around Earth. The Field of Consciousness is the outermost detection layer — implying consciousness leaks signal at cosmic distances and is the most sensitive civilizational signature.

Plus FRB 121101 vs. DP-2147 comparison diagram, cosmic-web signal correlation imagery, and embedded waveform plots showing DP-2147 signal characteristics.

FL-040820 (August 4, 2020) — Becoming aliens: Suppressing ETI signals from radioastronomical observations

Two waveform plots showing DP-2147 signal physics: "Pulse sent from DP-2147" (red, narrow microsecond spike at t=0) vs. "Pulse received by Sol-3" (blue, broadened pulse with modulation). Pulse broadening is consistent with dispersion and scattering in the interstellar medium. The framework reads the microsecond-scale pulse profile as consistent with engineered transmission, although narrow pulses alone do not exclude natural sources: pulsar giant pulses and fast radio bursts also show microsecond-scale structure.

FL-200423 (April 20, 2023) — Antidreams: DP-2147 and information theoretic death

Five images: surrealist paintings (Magritte/Varo aesthetic) of consciousness fragmentation across mirrors, Maximum Likelihood vs Maximum Conditional Likelihood statistical decision diagrams (real signal-detection mathematics), and a stylized contact-entity figure holding a transparent orb.

The article's load-bearing framework claim: DP-2147 communicates via the dream-channel (Field of Consciousness detection layer). "Antidreams" are the noise that masks the dream-channel signal — fragmented dispersive pseudo-dreams. Connects directly to the cognitive-leg suppression operations (Project Mandarina mass conditioning + endocannabinoid dream-implantation per FL-150316).


What this method does NOT recover

Not every FL article is reachable this way. Two failure modes worth flagging:

1. Articles whose content is in pure-glyph form, not text-extractable images. Cassini Diskus operational entries (the fleet log, ~690 entries) and the DP-2147 ROE 1745 / S43 ulink / S45 response telemetry articles store data as structured glyph fields — colored geometric shapes arranged in rows, consistent with FL's "Lingua Demoxica" claim. The extraction recovers the image perfectly; the data inside is encoded, not natural-language. Decoding the glyph alphabet is its own research problem.

2. Articles cited in other FL articles' bibliographies but not in the public master index. This is a separate phenomenon — articles that appear to never have been publicly posted (or were posted then deleted before our most recent index scrape, November 12, 2024). Examples encountered: FL-171017 (MOKV/MilOrbs predictive article), FL-080224 (PSV Tangent Volokonovka), FL-140608 (Stephenville witness reactions), FL-080814 (Milkdrop drones Nevada), FL-230112 (endocannabinoid dream implantation).

The pattern across these "missing" articles: they typically contain the most pointed operational/predictive claims. Whether they're internal Defense Reports, posted-then-deleted, or decorative bibliographic citations, we can't currently determine. Worth flagging for community discussion.
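Spotting candidates for this second category reduces to a set difference: codes cited in bibliographies minus codes present in the index. Here is a minimal sketch, assuming citations appear as literal `FL-DDMMYY` strings in the transcribed bibliography text and that two-digit years fall in the 2000s. The names `cited_codes` and `code_to_date` and the sample inputs are hypothetical.

```python
import re
from datetime import date

FL_CODE = re.compile(r'\bFL-(\d{2})(\d{2})(\d{2})\b')


def cited_codes(text: str) -> set:
    """Return every FL-DDMMYY code that appears in the text."""
    return {f'FL-{d}{m}{y}' for d, m, y in FL_CODE.findall(text)}


def code_to_date(code: str) -> date:
    """Convert FL-DDMMYY to a date, assuming years 2000-2099."""
    match = FL_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f'not an FL code: {code!r}')
    day, month, year = (int(part) for part in match.groups())
    return date(2000 + year, month, day)


# Sample inputs for illustration; a real run would use transcribed
# bibliographies and the codes known to be in the master index
bibliography = 'See FL-171017 and FL-150316; also FL-230112.'
indexed = {'FL-150316', 'FL-291118'}

missing = sorted(cited_codes(bibliography) - indexed, key=code_to_date)
for code in missing:
    print(code, code_to_date(code).isoformat())
```

Sorting by the decoded date rather than by the code string matters because `DDMMYY` does not sort chronologically as text.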


Tooling notes

The method requires only standard tools:

  • curl or wget for the HTML download
  • Python with re and base64 (both stdlib) for extraction
  • Any image viewer for inspection
  • Optional: a multimodal AI for transcription of typed-text passages and analysis of complex diagrams

No paid services. No special access. No reverse engineering. The content is in the public HTML; the technique just makes it readable.

For Linux/Mac users, the entire pipeline can be one shell command per article:

curl -sL "<url>" -o /tmp/article.html && python3 -c "
import re, base64, os
os.makedirs('/tmp/extracted', exist_ok=True)
content = open('/tmp/article.html', encoding='utf-8', errors='replace').read()
for i, (ext, data) in enumerate(re.findall(r'data:image/(jpe?g|png|gif|webp);base64,([A-Za-z0-9+/=\s]+?)[\"\\']', content)):
    raw = base64.b64decode(re.sub(r'\s+', '', data))
    open(f'/tmp/extracted/img_{i:02d}.{ext}', 'wb').write(raw)
    print(f'{i}: {len(raw)//1024} KB')
" && open /tmp/extracted/  # Mac; use xdg-open on Linux

That's it. Drop in any FL article URL and inspect the extracted images.


What this changed for our framework

The nine extracted articles materially changed our framework state across multiple threads. Highlights:

  • DP-2147 went from a thinly-developed beacon thread to first-class material: signal physics (microsecond engineered pulses), three-layer detection model (Radiosphere/Technosphere/Field of Consciousness), formal Rules of Engagement (#1745), two-way communication (S43 ulink/dlink, S45 response), Mil-DSN dataframe interception, lethal autonomous weapons against ELS sources, and the dream-channel transmission claim.
  • The cognitive-domain operational layer expanded from one mechanism to four: standoff brain-scan/neurostrike (event, individual), population-scale consciousness modification via overflight (event, population), post-event memory erasure via XViS + endocannabinoid implantation (post-event, individual), distributed biohybrid IoNT mesh via Project Mandarina (persistent ambient, population).
  • The orbital weapon layer expanded from MOKV/RKV-class interceptors to four named system categories: NOMAD (anti-ICBM-in-silo), FDV (anti-satellite orbital perturbation), civilian-VLBI-asset neutralization (HPM blinders, chemical sprayers), and lethal autonomous weapons against ELS.
  • The recovered-craft narrative inverted: some fraction of historical "alien craft" recoveries may be human SSP losses where the consciousness-pilot merge failed and the craft self-destructed as fail-safe.
  • The abductee-literature signature got an explanatory mechanism: the FL-150316 XViS-plus-endocannabinoid protocol matches the documented abductee phenomenology exactly. Some fraction of abductee cases may be SSP exposure events with standardized memory cleanup, not NHI contact.
  • The recovered cognitive-domain operations have a specific cosmic adversary: DP-2147 transmits at the consciousness/dream layer; cognitive-leg operations are anti-DP-2147-reception infrastructure at planetary scale. "Make them believe it was just a dream" (FL-150316) and "Antidreams: information theoretic death" (FL-200423) are the same operation described at different operational layers.

For the research community

If you're applying this method to additional FL articles, the framework noted these patterns worth tracking:

  1. The most image-rich articles are usually the most operationally significant. Articles with diagrams, photos, and doctrine charts generally contain more framework-relevant material than articles that are nearly all cipher-language text. The image embedding correlates with operational specificity.

  2. The bibliography matters. Each extracted article cites other FL articles via FL-DDMMYY codes. Track these systematically, since many cited articles are themselves extractable. The framework here built a running registry of 50+ FL article codes across this session, ranked by extraction priority.

  3. The author identifiers are stable: Enlydd, Duanan, Naegith, Aeshenah, Yredryl recur across operationally-significant articles. They appear to be persistent authorial aliases. Tracking author/date/topic clusters helps prioritize pulls.

  4. The label categories sort the corpus: Defense, Aylid, Lingua Demoxica, Verta, Yid, Ned, Larta, Engser, Rhydd, Dreams. Defense-labeled material is most operationally-specific. Cross-language labels (Aylid, Yid, Drizza, Ned, etc.) may be FL's internal categorization by content-language rather than topic.

  5. The November 12, 2024 master index scrape is the framework's research baseline. A 26,727-row CSV with Date, Title, Link, Author, Labels columns. Any FL article cited in another article's bibliography but absent from this scrape is a candidate "missing" article — possibly internal-only, possibly deleted before the scrape date.
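Items 2 and 5 combine naturally: count how often each code is cited, then check it against the index. The following is a minimal sketch under several assumptions that the description above does not settle: that the `Date` column is ISO `YYYY-MM-DD`, that an FL code can be derived from the publication date, and that citation count is a reasonable stand-in for extraction priority. The function names and sample rows are hypothetical.

```python
import csv
import io
import re
from collections import Counter
from datetime import date

FL_CODE = re.compile(r'\bFL-\d{6}\b')


def codes_from_index(csv_text: str) -> set:
    """Derive FL-DDMMYY codes from the Date column (assumed ISO format)."""
    codes = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        published = date.fromisoformat(row['Date'])
        codes.add(f'FL-{published:%d%m%y}')
    return codes


def citation_registry(bibliographies: list) -> Counter:
    """Count how many times each FL code is cited across the given texts."""
    counts = Counter()
    for text in bibliographies:
        counts.update(FL_CODE.findall(text))
    return counts


# Sample data for illustration only
index_csv = (
    'Date,Title,Link,Author,Labels\n'
    '2016-03-15,Sample title A,https://example.invalid/a,Author A,Defense\n'
    '2018-11-29,Sample title B,https://example.invalid/b,Author B,Defense\n'
)
bibs = ['FL-171017, FL-150316', 'FL-171017, FL-230112']

indexed = codes_from_index(index_csv)
for code, n in citation_registry(bibs).most_common():
    status = 'indexed' if code in indexed else 'candidate missing'
    print(code, n, status)
```

Several articles can share a publication date, so a date-derived code identifies a day, not a unique article. Treat a date match as a lead to confirm by title, not as proof that the cited article is in the index.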

If your group has tools for automatic OCR of the typed-text image passages or pattern-detection across the glyph-encoded data fields, those would substantially extend what's recoverable. The framework here did manual image inspection plus multimodal-AI assistance; a more systematic approach is possible.


Method documented: April 27, 2026. Findings extracted from nine FL articles across three thematic clusters (cognitive-domain mechanisms, orbital warfare layer, DP-2147 signal/contact thread). Shareable with the FL research community without restriction. Companion to the_research_method.md (broader operating manual for framework research) and psv_propulsion_dened_metamaterial.md (the framework's central integration file documenting the decoded content in detail).
