Unclassified Records

UAP / Entity Encounter Datasets: Access & Methods



Last updated: 2026-03-19


1. NUFORC (National UFO Reporting Center) Database

1A. Kaggle — NUFORC UFO Sightings (Original)

  • URL: https://www.kaggle.com/datasets/NUFORC/ufo-sightings
  • Records: ~80,000+ sightings (scrubbed version removes incomplete entries)
  • Format: CSV in ZIP archive (~10.7 MB)
  • Fields: city, state, date/time, shape, duration, description, latitude, longitude, country
  • Two versions available:
    • Scrubbed — cleaned, missing locations/times removed
    • Complete — includes entries with missing/blank locations (0.8%) and erroneous/blank times (8%)
  • How to download: Free Kaggle account required. Visit URL and click "Download"
  • Limitations: Last updated November 2019. Does not include full narrative text in all records. No entity-specific tagging.
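The Complete file keeps the rows with blank or erroneous times. A first pass therefore usually separates rows whose timestamp parses from those that do not. Below is a minimal standard-library sketch. It assumes the time column is named `datetime` and uses a month/day/year hour:minute layout; check both assumptions against the header of the file you download.

```python
import csv
import io
from datetime import datetime


def split_by_parseable_time(csv_file, time_col="datetime", fmt="%m/%d/%Y %H:%M"):
    """Split CSV rows into (parseable, unparseable) by their timestamp.

    time_col and fmt are assumptions about the file layout; adjust both
    to match the real header and timestamp format.
    """
    good, bad = [], []
    for row in csv.DictReader(csv_file):
        raw = (row.get(time_col) or "").strip()
        try:
            row["parsed_time"] = datetime.strptime(raw, fmt)
            good.append(row)
        except ValueError:
            # Blank values and out-of-range times such as 24:00 end up here
            bad.append(row)
    return good, bad


if __name__ == "__main__":
    # Illustrative rows, not real records
    sample = io.StringIO(
        "datetime,city,state,shape\n"
        "10/10/1999 20:30,springfield,il,disk\n"
        ",,,\n"
        "10/10/1999 24:00,springfield,il,light\n"
    )
    good, bad = split_by_parseable_time(sample)
    print(len(good), len(bad))  # 1 2
```

For a downloaded file, open it with `newline=""` (as the csv module recommends) and pass the file object in place of `sample`.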

1B. Hugging Face — kcimc/NUFORC

  • URL: https://huggingface.co/datasets/kcimc/NUFORC
  • Records: 147,890 sightings (scraped January 16, 2024 — more current than Kaggle)
  • Format: Multiple formats available:
    • nuforc.json — primary JSON
    • nuforc_str.csv — characteristics as comma-separated strings
    • nuforc_list.csv — characteristics as Python lists
    • nuforc_bool.csv — one boolean column per characteristic
  • Fields (26 columns):
    • Sighting ID, Occurred, Location, Shape, Duration, No. of observers
    • Reported, Posted, Summary, Full Text
    • Boolean flags: Lights on object, Aura/haze, Aircraft nearby, Animals reacted, Left a trail, Emitted other objects, Changed color, Emitted beams, Electrical/magnetic effects, Possible abduction, Missing time, Marks found on body, Landed
    • Location details, Explanation
  • How to download (Python):
    from datasets import load_dataset
    dataset = load_dataset('kcimc/NUFORC')

    Or clone the repository (install Git LFS first so the data files download in full):
    git clone https://huggingface.co/datasets/kcimc/NUFORC
  • Limitations: The repository also contains nuforc_flat.csv, which has schema issues; use one of the files listed above instead.
  • BEST FOR ENTITY FILTERING: Use the boolean columns Possible abduction, Missing time, Marks found on body, and Landed as initial filters, then text-search the narratives.
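nuforc_str.csv and nuforc_bool.csv hold the same characteristics in different shapes. If you start from the string form, the sketch below rebuilds the flags and applies the entity filter. It assumes the names are separated by commas and uses the flag names exactly as written in the field list above. The real headers may differ in letter case, which the comparison ignores, or in wording, which it does not.

```python
# Flag names as written in the field list above; real headers may differ
ENTITY_FLAGS = ("Possible abduction", "Missing time", "Marks found on body", "Landed")


def parse_characteristics(value):
    """Split a comma-separated characteristics string into a set of names."""
    if not isinstance(value, str):
        return set()  # covers None and pandas NaN
    return {part.strip() for part in value.split(",") if part.strip()}


def is_entity_candidate(value, flags=ENTITY_FLAGS):
    """Return True if any entity-relevant flag appears (case-insensitive)."""
    wanted = {flag.lower() for flag in flags}
    return any(name.lower() in wanted for name in parse_characteristics(value))


print(is_entity_candidate("Lights on object, Missing Time"))  # True
print(is_entity_candidate("Lights on object, Changed color"))  # False
```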

1C. GitHub — timothyrenner/nuforc_sightings_data

  • URL: https://github.com/timothyrenner/nuforc_sightings_data
  • Records: ~100,000+ raw reports; ~90,000 geocoded in processed version
  • Format:
    • Raw: line-delimited JSON (data/raw/nuforc_reports.json)
    • Processed: CSV (data/processed/nuforc_reports.csv)
  • Fields: Summary, city, state, date/time (ISO 8601), shape, duration, stats, report link, full text, posted date, city lat/long
  • How to download:
    git clone https://github.com/timothyrenner/nuforc_sightings_data.git
    conda env create -f environment.yaml
    conda activate nuforc
    pip install -r requirements.txt
    dvc repro  # Takes 3-4 hours, scrapes fresh data
    
  • Limitations: NUFORC's TOS forbids scraping and redistribution. DVC pipeline re-scrapes data (slow). Email NUFORC CTO directly to request data legitimately.
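The raw file is line-delimited JSON (one report per line), so it has to be read line by line rather than with a single `json.load`. The minimal sketch below also skips and counts malformed lines instead of stopping at the first one. The keys in the inline demo are illustrative only.

```python
import io
import json


def read_jsonl(lines):
    """Parse line-delimited JSON; return (records, number_of_bad_lines)."""
    records, bad = [], 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            bad += 1
    return records, bad


# Inline demo; for the real file pass an open file object instead, e.g.
# open("data/raw/nuforc_reports.json", encoding="utf-8")
demo = io.StringIO('{"summary": "light"}\n\nnot json\n{"summary": "disk"}\n')
records, bad = read_jsonl(demo)
print(len(records), bad)  # 2 1
```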

1D. Hugging Face — latterworks/nuforc-summar-narrative-highlighted.json

  • URL: https://huggingface.co/datasets/latterworks/nuforc-summar-narrative-highlighted.json
  • Records: 161,685 rows
  • Format: JSON (auto-converted to Parquet, 94.2 MB)
  • Fields: Sighting_ID, text (narrative), label (binary 0/1), split
  • How to download:
    from datasets import load_dataset
    dataset = load_dataset('latterworks/nuforc-summar-narrative-highlighted.json')
    
  • Limitations: Only 4 fields. Binary label meaning unclear (likely classification task). Useful for NLP/text analysis of narratives.
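The meaning of `label` is undocumented. A reasonable first step is to see how it is distributed across the `split` field and to read a few narratives from each class. The small sketch below works on plain dicts with the four fields listed above. With the Hugging Face dataset you would pass in a split such as `dataset["train"]`, which yields rows as dicts; that split name is an assumption.

```python
from collections import Counter
from itertools import islice


def label_counts(rows):
    """Count how often each (split, label) pair occurs."""
    return Counter((row["split"], row["label"]) for row in rows)


def sample_texts(rows, label, n=3):
    """Return up to n narrative texts that carry the given label."""
    return list(islice((row["text"] for row in rows if row["label"] == label), n))


# Stand-in rows for illustration only
rows = [
    {"Sighting_ID": 1, "text": "narrative A", "label": 0, "split": "train"},
    {"Sighting_ID": 2, "text": "narrative B", "label": 1, "split": "train"},
    {"Sighting_ID": 3, "text": "narrative C", "label": 1, "split": "test"},
]
print(label_counts(rows))
print(sample_texts(rows, label=1))  # ['narrative B', 'narrative C']
```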

1E. CORGIS Dataset Project

  • URL: https://corgis-edu.github.io/corgis/csv/ufo_sightings/
  • Records: 80,000+
  • Format: CSV (direct download)
  • Fields: 15 fields — city, state, country, lat/long, shape, duration, descriptions, date/time breakdowns
  • How to download: Direct CSV download from the page
  • Limitations: Educational dataset, may be simplified

1F. Maven Analytics

  • URL: https://mavenanalytics.io/data-playground/ufo-sightings
  • Records: 80,000+ (1949-2014)
  • Format: CSV, 25 fields
  • Fields: city, state, country, lat/long, shape, duration, date/time, comments
  • How to download: Free download from Maven Analytics data playground
  • Limitations: Same base data as Kaggle version
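Kaggle, CORGIS, and Maven all repackage the same base NUFORC records, so stacking them mostly adds duplicates. The sketch below keeps the first record seen for each normalized key. The default key fields (`datetime`, `city`, `state`) are hypothetical; map them from each file's own headers before merging.

```python
def record_key(record, fields=("datetime", "city", "state")):
    """Build a case- and whitespace-insensitive key from the given fields."""
    return tuple(str(record.get(f, "")).strip().lower() for f in fields)


def merge_unique(*sources, fields=("datetime", "city", "state")):
    """Concatenate record lists, dropping records whose key was already seen."""
    seen, merged = set(), []
    for source in sources:
        for record in source:
            key = record_key(record, fields)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Illustrative records, not real data
kaggle = [{"datetime": "10/10/1999 20:30", "city": "Springfield", "state": "IL"}]
maven = [
    {"datetime": "10/10/1999 20:30", "city": "springfield ", "state": "il"},
    {"datetime": "11/11/2001 21:00", "city": "Salem", "state": "OR"},
]
print(len(merge_unique(kaggle, maven)))  # 2
```

The trade-off is that two genuinely separate reports filed for the same time and city collapse into one. Adding the description or duration to the key reduces that risk, but it will miss near-duplicates whose text was edited between versions.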

1G. data.world — timothyrenner/ufo-sightings

1H. NUFORC Direct (Official)

  • URL: https://nuforc.org/databank/
  • Records: 150,000+ and growing
  • Format: HTML pages organized by date, state, and shape
  • How to download: Browse only. No bulk download. Must request data directly from NUFORC.
  • Limitations: Not structured for download. TOS prohibits scraping.

2. Enigmatic Ideas — AI-Extracted Feature Analysis of NUFORC

2A. 35-Feature AI Extraction

Methodology

  • AI model used: an Anthropic model read each of the 152,000 narratives
  • 35 features extracted across 8 categories:
    1. Object characteristics (shape, size, color, etc.)
    2. Behavior (movement patterns)
    3. Lights (light characteristics)
    4. Sensory effects (sounds, etc.)
    5. Physical effects (on environment/witnesses)
    6. Encounter details (entity type, interaction)
    7. Witness information (reaction, number)
    8. Subsequent activity (follow-up events)

Entity Types Classified (from 2,569 entity encounters)

| Entity Type | Count |
|---|---|
| Humanoid | 946 |
| Grey | 577 |
| Other | 504 |
| Small | 200 |
| Tall | 135 |
| Shadow | 53 |
| Light being | 43 |
| Robotic | 15 |
| Amorphous | 12 |
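The listed counts sum to 2,485, which is 84 fewer than the 2,569 encounters in the heading. The source does not say how the remainder was classified. The quick check below confirms the gap and also expresses each type as a share of the listed total.

```python
entity_counts = {
    "Humanoid": 946, "Grey": 577, "Other": 504, "Small": 200, "Tall": 135,
    "Shadow": 53, "Light being": 43, "Robotic": 15, "Amorphous": 12,
}
stated_total = 2569  # from the table heading
listed_total = sum(entity_counts.values())
print(listed_total, stated_total - listed_total)  # 2485 84

# Share of each type, largest first
for name, count in sorted(entity_counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{count:>5}{count / listed_total:>8.1%}")
```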

Data Availability

  • Free members: High-level extracted features + index mapping sighting IDs to patterns
  • Insider members (paid): Complete methodology, feature taxonomy specification, extraction prompts, processing code, Jupyter notebooks
  • Raw extracted feature data: NOT publicly available (NUFORC proprietary restrictions)
  • Format: Not specified (likely CSV/JSON behind paywall)
  • No GitHub repo available

How to Access

  1. Visit https://enigmaticideas.com and create a free account for basic feature data
  2. Upgrade to "Insider" membership for full methodology and code
  3. Individual NUFORC sighting IDs are referenced — can be looked up at nuforc.org

Limitations

The processed dataset with all 35 features per sighting is gated behind membership. The author cites NUFORC's proprietary data restrictions as the reason for not releasing the full extracted dataset publicly.


3. Albert Rosales — Humanoid Encounters Database

3A. IRAAP Website

  • URL: http://www.iraap.org/rosales (SSL certificate expired — use HTTP)
  • Records: 18,000+ humanoid sighting cases (growing)
  • Format: HTML pages organized by year
  • How to access: Browse individual year pages (e.g., /rosales/1997.htm, /rosales/2001.htm)
  • Limitations: Not structured data. HTML only. No bulk download. SSL cert expired.
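If you work from saved copies of the year pages, the standard library's `html.parser` is enough to strip markup before any case splitting. In the minimal sketch below, the URL pattern generalizes the example pages above; not every year is guaranteed to have exactly one page. The parser keeps visible text and skips `script` and `style` contents.

```python
from html.parser import HTMLParser

# Pattern generalized from the example year pages (HTTP, per the SSL note)
BASE_URL = "http://www.iraap.org/rosales/{year}.htm"


def year_page_urls(start, end):
    """Build candidate year-page URLs for an inclusive range of years."""
    return [BASE_URL.format(year=y) for y in range(start, end + 1)]


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one text run per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


print(html_to_text("<p>Location. X</p><script>var a = 1;</script><p>Date: 1997</p>"))
```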

3B. Internet Archive — Book Series PDFs

Multiple volumes of Rosales' 17-volume "Humanoid Encounters: The Others Among Us" series are available:

3C. Other Sources for Rosales Data

3D. 6 Degrees of John Keel — Humanoid Encounters Project

  • URL: https://6degreesofjohnkeel.com/blog/announcing-the-humanoid-encounters-project
  • Status: In development. Rosales gave permission to build a structured database.
  • Records: 18,000+ from Rosales' complete dataset
  • Phase 1: Completed — preliminary descriptive statistics
  • Phase 2: Underway
  • Phase 3: Planning — public database for researchers
  • Format: Will be a structured database (format TBD)
  • Limitations: NOT YET PUBLICLY AVAILABLE. No download link exists yet. Monitor the blog for updates.

4. FREE Foundation — Experiencer Research Study

4A. Survey Results (Published Charts)

Study Details

  • Total respondents: 4,200+ from 100+ countries
  • Survey phases:
    • Phase 1: ~2,900 English responses + 650 Spanish
    • Phase 2: ~1,700 English responses + 200 Spanish (257 questions)
    • Phase 3: ~1,070 responses
  • Total questions: 550+ across all phases
  • Topics covered: Contact experiences with non-human intelligence, consciousness, paranormal events, demographics, family history, physical/psychological effects

Data Availability

  • Published format: PDF charts and summary statistics on experiencer.org and Issuu
  • Raw dataset: The foundation stated that all data would be placed in the public domain on experiencer.org after publication. However, as of this review, raw tabular data (CSV/spreadsheet) does NOT appear to be publicly downloadable.
  • Book: "Beyond UFOs: The Science of Consciousness and Contact with Non-Human Intelligence" contains analyzed results
  • Medium articles: Analysis published at https://medium.com/the-foundation-for-research-into-extraterrestrial

How to Access What's Available

  1. Visit the foundation's Issuu stack for the PDF chart documents
  2. Download the Phase 2 results PDF from SlideShare
  3. Contact the FREE Foundation directly through experiencer.org for raw data requests
  4. Purchase "Beyond UFOs" book for comprehensive analysis

Limitations

Raw survey data in structured format (CSV, SPSS, etc.) does NOT appear to be publicly available despite promises to release it. Only summary charts and statistics in PDF format are accessible. The foundation appears to have become less active in recent years.


5. National Archives — UAP Records (US Government)

Available Collections

| Collection | Size | Type |
|---|---|---|
| Project Blue Book Photos (1954-1966) | 36.48 GB | Still images |
| Project Blue Book Textual Records | 379.90 GB total | Microfilm/text |
| USAF B&W Photos (1930-1975) | 648.47 MB | Photos |
| USAF Color Photos (1954-1980) | 8.75 GB | Photos |
| Nuclear Regulatory Commission | 11.74 MB | Electronic records |
| FAA Records | 63.10 MB | Electronic records |
| ODNI Records | 1.29 MB | Electronic records |
| Office of SecDef Records | 103.37 MB | Electronic records |
| Presidential Library Materials | Varies | Mixed |
| Moving Images/Sound (18 items) | Up to 16.25 GB each | Video/audio |

  • How to download: Direct download from archives.gov. No account required.
  • Limitations: Massive file sizes. Primarily historical (pre-1980s). Not structured data — scanned documents, photos, film. Entity encounters would need to be extracted from textual records manually. Blue Book textual records alone are 380 GB.
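To gauge storage before downloading, the table's sizes can be totaled. The sketch below assumes decimal units (1 GB = 1,000 MB). It leaves out Presidential Library materials because their size varies. The moving-image figure is an upper bound, since the table gives only a per-item maximum.

```python
# Sizes from the collections table, converted to GB (decimal units assumed)
sizes_gb = {
    "Project Blue Book Photos": 36.48,
    "Project Blue Book Textual Records": 379.90,
    "USAF B&W Photos": 648.47 / 1000,
    "USAF Color Photos": 8.75,
    "Nuclear Regulatory Commission": 11.74 / 1000,
    "FAA Records": 63.10 / 1000,
    "ODNI Records": 1.29 / 1000,
    "Office of SecDef Records": 103.37 / 1000,
}
fixed_total = sum(sizes_gb.values())

# 18 moving-image/sound items of up to 16.25 GB each
moving_images_max = 18 * 16.25

print(f"Fixed-size collections: {fixed_total:.2f} GB")  # -> 425.96 GB
print(f"Upper bound incl. moving images: {fixed_total + moving_images_max:.2f} GB")  # -> 718.46 GB
```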

6. Additional Datasets

6A. richgel999/ufo_data — "Dataset of the Damned"

  • URL: https://github.com/richgel999/ufo_data
  • Records: ~18,000 (primarily from Larry Hatch's *U Database)
  • Format: JSON arrays with GitHub Flavored Markdown
  • Key file: bin/hatch_udb.json
  • Live search: https://ufo-search.com
  • How to download: Clone the GitHub repo
  • Limitations: General UFO chronology, not entity-encounter specific. Draws from Vallee, Dolan, Eberhart, and other historical researchers.
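The JSON can be searched without knowing its field names by scanning every string in each record. The sketch below assumes `bin/hatch_udb.json` holds a top-level array of record objects; the source describes the format as JSON arrays. Patterns are anchored at the start of a word. A prefix such as `abduct` therefore still matches "abducted", while `alien` no longer matches inside "salient".

```python
import re


def iter_strings(value):
    """Yield every string nested anywhere inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def search_records(records, keywords):
    """Return records in which any string field contains a keyword prefix."""
    pattern = re.compile(
        r"\b(?:" + "|".join(map(re.escape, keywords)) + ")", re.IGNORECASE
    )
    return [r for r in records if any(pattern.search(s) for s in iter_strings(r))]


# Illustrative records with made-up field names
demo = [
    {"desc": "Salient lights over the ridge"},
    {"desc": "Humanoid seen near landed object"},
    {"refs": ["Witness abducted", "Vallee"]},
]
print(len(search_records(demo, ["alien", "humanoid", "abduct"])))  # 2
```

For the real file, load it with `json.load` from an open handle on `ufo_data/bin/hatch_udb.json` and pass the resulting list to `search_records`.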

6B. MUFON Case Management System

  • URL: https://mufon.com/
  • Records: 100,000+ cases
  • Format: SQL database (MySQL backend, Perl logic)
  • How to access: Browse recent reports free at mufon.com. Full CMS access requires paid MUFON membership.
  • Limitations: NO bulk download. NO public API. Redistribution explicitly forbidden without written consent. Legal action threatened for violations. Data locked behind membership paywall.

6C. UAPedia

  • URL: https://www.uapedia.ai/
  • Format: Wiki-style knowledge base
  • Limitations: Reference site, not a downloadable dataset

6D. SkyWatch (AI-Enhanced UAP Database)

  • URL: None given; the project is referenced on Devpost
  • Records: 500,000+ reports claimed
  • Format: Knowledge graphs, GPU-powered analytics
  • Limitations: Appears to be a project/prototype, not a publicly downloadable dataset

7. Summary by Accessibility

Immediate Access (can download today)

  1. Hugging Face kcimc/NUFORC — Best starting point. 147,890 records with boolean flags for abduction, missing time, body marks, landing. Filter these + text search narratives for entity keywords.
  2. Kaggle NUFORC — 80,000+ records, easy CSV download, good for initial exploration.
  3. GitHub ufo_data — Larry Hatch's 18,000 records in JSON, historical breadth.
  4. National Archives — Government records, but massive and unstructured.

Requires Processing/Extraction

  1. Rosales books on Archive.org — Download PDFs, OCR text available. Would need NLP pipeline to extract structured cases from book text. ~18,000 cases across 17 volumes.
  2. IRAAP website — HTML scraping of year-by-year humanoid encounter reports.
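Both routes end with running text that still has to be split into individual cases. The sketch below splits on a line-start marker: a line beginning with "Location." or "Location:". That marker is an assumption about the case layout, so check it against a few pages of the OCR text or HTML before use. Anything before the first marker (front matter, page headers) is dropped.

```python
import re

# Hypothetical case marker: a line that begins with "Location." or "Location:"
CASE_START = re.compile(r"^Location[.:]", re.MULTILINE)


def split_cases(text, marker=CASE_START):
    """Split running text into chunks that each start at a marker match."""
    starts = [m.start() for m in marker.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e].strip() for s, e in zip(starts, ends)]


sample = (
    "Introduction text\n"
    "Location. Town A\nDate: 1975\nTime: night\n"
    "Location: Town B\nDate: 1976\n"
)
for case in split_cases(sample):
    print(case.splitlines()[0])
```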

Requires Membership or Contact

  1. Enigmatic Ideas — Free membership gets basic features; paid "Insider" gets full 35-feature extraction code and methodology.
  2. MUFON — Paid membership for CMS access; no bulk export.
  3. FREE Foundation — Contact directly for raw survey data; only PDF charts publicly available.

Future/In Development

  1. 6 Degrees of John Keel Humanoid DB — Structured Rosales database in development. Monitor for public release.

8. Quick-Start Commands

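Load and Filter kcimc/NUFORC

from datasets import load_dataset

# Load full dataset. If loading the whole repo fails (nuforc_flat.csv has
# schema issues), load a single file instead, e.g.
# load_dataset('kcimc/NUFORC', data_files='nuforc_bool.csv')
ds = load_dataset('kcimc/NUFORC')
df = ds['train'].to_pandas()  # requires pandas

# Column names differ slightly between files (e.g. 'Missing time' vs
# 'Missing Time'); check them and adjust the names below to match
print(df.columns.tolist())

# Filter for entity-relevant encounters. Each comparison needs its own
# parentheses: | binds more tightly than ==, so the unparenthesized
# version raises an error instead of combining the conditions
entity_mask = (
    (df['Possible abduction'] == True) |
    (df['Missing Time'] == True) |
    (df['Marks found on body afterwards'] == True) |
    (df['Landed'] == True)
)
entity_encounters = df[entity_mask]
print(f"Entity-relevant encounters: {len(entity_encounters)}")

# Text search for entity keywords in narratives. Broad terms such as
# 'being', 'person', 'figure' and 'grey' also match ordinary usage, so
# treat this count as an upper bound and review matches by hand
keywords = ['being', 'entity', 'creature', 'humanoid', 'grey', 'gray',
            'alien', 'occupant', 'figure', 'person', 'abduct']
text_col = 'Text' if 'Text' in df.columns else 'Full Text'
text_mask = df[text_col].str.contains('|'.join(keywords), case=False, na=False)
text_entity = df[text_mask]
print(f"Narrative mentions of entities: {len(text_entity)}")

# Save filtered results
entity_encounters.to_csv('nuforc_entity_encounters.csv', index=False)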

Download Larry Hatch Database

git clone https://github.com/richgel999/ufo_data.git
# Data is in ufo_data/bin/hatch_udb.json

Download Rosales PDF from Archive.org

curl -L -o rosales_1975_1979.pdf "https://archive.org/download/humanoid-encounters-1975-1979-albert-s-rosales/Humanoid%20Encounters%201975-1979%20Albert%20S%20Rosales.pdf"
