Last updated: 2026-03-19
1. NUFORC (National UFO Reporting Center) Database
1A. Kaggle — NUFORC UFO Sightings (Original)
- URL: https://www.kaggle.com/datasets/NUFORC/ufo-sightings
- Records: 80,000+ sightings (the scrubbed version removes incomplete entries)
- Format: CSV in ZIP archive (~10.7 MB)
- Fields: city, state, date/time, shape, duration, description, latitude, longitude, country
- Two versions available:
- Scrubbed — cleaned, missing locations/times removed
- Complete — includes entries with missing/blank locations (0.8%) and erroneous/blank times (8%)
- How to download: Free Kaggle account required. Visit URL and click "Download"
- Limitations: Last updated November 2019. Does not include full narrative text in all records. No entity-specific tagging.
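The gap between the Scrubbed and Complete versions can be approximated locally. The sketch below is only an illustration: the column names `datetime`, `latitude`, and `longitude` and the timestamp format `%m/%d/%Y %H:%M` are assumptions to check against the downloaded file, and `scrub` is a hypothetical helper rather than the method used to produce the Kaggle files.

```python
import io
import pandas as pd

def scrub(df, time_col="datetime", lat_col="latitude", lon_col="longitude"):
    """Drop rows whose timestamp or coordinates are blank or unparseable."""
    out = df.copy()
    # Assumed timestamp format; values that do not parse (e.g. hour 24) become NaT.
    out[time_col] = pd.to_datetime(out[time_col], format="%m/%d/%Y %H:%M", errors="coerce")
    # Non-numeric or blank coordinates become NaN.
    out[lat_col] = pd.to_numeric(out[lat_col], errors="coerce")
    out[lon_col] = pd.to_numeric(out[lon_col], errors="coerce")
    return out.dropna(subset=[time_col, lat_col, lon_col]).reset_index(drop=True)

# Tiny made-up sample standing in for the Complete CSV.
sample_csv = (
    "datetime,city,latitude,longitude\n"
    "10/10/1949 20:30,san marcos,29.8830556,-97.9411111\n"
    "10/11/2006 24:00,somewhere,40.0,-75.0\n"
    "10/12/2001 19:00,nowhere,,\n"
)
complete = pd.read_csv(io.StringIO(sample_csv), dtype=str)
scrubbed = scrub(complete)
print(f"kept {len(scrubbed)} of {len(complete)} rows")
```

Reading every column as a string first keeps malformed coordinates from changing the column type before they are coerced.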
1B. Hugging Face — kcimc/NUFORC
- URL: https://huggingface.co/datasets/kcimc/NUFORC
- Records: 147,890 sightings (scraped January 16, 2024 — more current than Kaggle)
- Format: Multiple formats available:
- nuforc.json: primary JSON
- nuforc_str.csv: characteristics as comma-separated strings
- nuforc_list.csv: characteristics as Python lists
- nuforc_bool.csv: one boolean column per characteristic
- Fields (26 columns):
- Sighting ID, Occurred, Location, Shape, Duration, No. of observers
- Reported, Posted, Summary, Full Text
- Boolean flags: Lights on object, Aura/haze, Aircraft nearby, Animals reacted, Left a trail, Emitted other objects, Changed color, Emitted beams, Electrical/magnetic effects, Possible abduction, Missing time, Marks found on body, Landed
- Location details, Explanation
- How to download:
from datasets import load_dataset
dataset = load_dataset('kcimc/NUFORC')
Or:
git clone https://huggingface.co/datasets/kcimc/NUFORC
- Limitations: nuforc_flat.csv has schema issues. Note the boolean flags for abduction, missing time, marks on body, and landing are highly relevant for entity encounter filtering.
- BEST FOR ENTITY FILTERING: Use the boolean columns Possible abduction, Missing time, Marks found on body as initial filters, then text-search the narratives.
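If you work from the file that stores characteristics as comma-separated strings rather than from the boolean file, the strings can be expanded into one boolean column per characteristic. This is a minimal sketch: `characteristics_to_flags` is a hypothetical helper, and the sample values are made up, so check the real column name and spellings in the file you load.

```python
import pandas as pd

def characteristics_to_flags(strings: pd.Series) -> pd.DataFrame:
    """Expand comma-separated characteristic strings into one boolean column each."""
    # Split each cell into a clean list of names; missing cells become empty lists.
    lists = strings.fillna("").map(
        lambda cell: [part.strip() for part in cell.split(",") if part.strip()]
    )
    # Every distinct characteristic seen anywhere becomes a column.
    names = sorted({name for items in lists for name in items})
    return pd.DataFrame(
        {name: lists.map(lambda items, name=name: name in items) for name in names},
        index=strings.index,
    )

# Hypothetical sample rows standing in for a characteristics column.
sample = pd.Series(["Landed, Missing Time", None, "Landed"])
flags = characteristics_to_flags(sample)
print(flags)
```

The resulting frame can be joined back to the sightings by index and filtered the same way as the boolean columns.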
1C. GitHub — timothyrenner/nuforc_sightings_data
- URL: https://github.com/timothyrenner/nuforc_sightings_data
- Records: 100,000+ raw reports; ~90,000 geocoded in the processed version
- Format:
- Raw: line-delimited JSON (data/raw/nuforc_reports.json)
- Processed: CSV (data/processed/nuforc_reports.csv)
- Fields: Summary, city, state, date/time (ISO 8601), shape, duration, stats, report link, full text, posted date, city lat/long
- How to download:
git clone https://github.com/timothyrenner/nuforc_sightings_data.git
conda env create -f environment.yaml
conda activate nuforc
pip install -r requirements.txt
dvc repro  # Takes 3-4 hours, scrapes fresh data
- Limitations: NUFORC's TOS forbids scraping and redistribution. DVC pipeline re-scrapes data (slow). Email NUFORC CTO directly to request data legitimately.
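Line-delimited JSON stores one report per line, so it can be read without loading a single large array. The sketch below uses only the standard library; `parse_json_lines` is a hypothetical helper and the field names in the sample are illustrative, not taken from the repository.

```python
import json

def parse_json_lines(text: str) -> list[dict]:
    """Parse line-delimited JSON, skipping blank or malformed lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a damaged line instead of aborting the whole load
        if isinstance(record, dict):
            records.append(record)
    return records

# Made-up sample; real rows would come from the raw reports file.
sample = (
    '{"city": "Austin", "shape": "disk"}\n'
    "\n"
    '{"city": "Reno", "shape": "light"}\n'
    "not json\n"
)
reports = parse_json_lines(sample)
print(len(reports), reports[0]["city"])
```

For the real file, pass the contents of data/raw/nuforc_reports.json (opened with `encoding="utf-8"`) to the same function.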
1D. Hugging Face — latterworks/nuforc-summar-narrative-highlighted.json
- URL: https://huggingface.co/datasets/latterworks/nuforc-summar-narrative-highlighted.json
- Records: 161,685 rows
- Format: JSON (auto-converted to Parquet, 94.2 MB)
- Fields: Sighting_ID, text (narrative), label (binary 0/1), split
- How to download:
from datasets import load_dataset
dataset = load_dataset('latterworks/nuforc-summar-narrative-highlighted.json')
- Limitations: Only 4 fields. Binary label meaning unclear (likely classification task). Useful for NLP/text analysis of narratives.
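Since the meaning of the binary label is undocumented, one way to form a hypothesis is to compare how many rows carry each value and read a few narratives from each group. This is a rough sketch assuming the `text` and `label` columns listed above; `label_overview` is a hypothetical helper and the sample rows are invented.

```python
import pandas as pd

def label_overview(df, text_col="text", label_col="label", n_examples=2):
    """Return per-label row counts and a few example narratives per label."""
    counts = df[label_col].value_counts().sort_index()
    examples = {
        label: group[text_col].head(n_examples).tolist()
        for label, group in df.groupby(label_col)
    }
    return counts, examples

# Invented rows standing in for the real dataset.
sample = pd.DataFrame(
    {
        "text": ["bright light over lake", "orange orb", "tall figure near craft", "beings seen"],
        "label": [0, 0, 1, 1],
    }
)
counts, examples = label_overview(sample)
print(counts.to_dict())
print(examples[1])
```

Reading the examples side by side is usually enough to guess what the label separates before using it as a filter.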
1E. CORGIS Dataset Project
- URL: https://corgis-edu.github.io/corgis/csv/ufo_sightings/
- Records: 80,000+
- Format: CSV (direct download)
- Fields: 15 fields — city, state, country, lat/long, shape, duration, descriptions, date/time breakdowns
- How to download: Direct CSV download from the page
- Limitations: Educational dataset, may be simplified
1F. Maven Analytics
- URL: https://mavenanalytics.io/data-playground/ufo-sightings
- Records: 80,000+ (1949-2014)
- Format: CSV, 25 fields
- Fields: city, state, country, lat/long, shape, duration, date/time, comments
- How to download: Free download from Maven Analytics data playground
- Limitations: Same base data as Kaggle version
1G. data.world — timothyrenner/ufo-sightings
- URL: https://data.world/timothyrenner/ufo-sightings
- Records: Varies (same source as GitHub repo)
- Format: CSV
- How to download: Free data.world account required
- Limitations: May require account authentication
1H. NUFORC Direct (Official)
- URL: https://nuforc.org/databank/
- Records: 150,000+ and growing
- Format: HTML pages organized by date, state, and shape
- How to download: Browse only. No bulk download. Must request data directly from NUFORC.
- Limitations: Not structured for download. TOS prohibits scraping.
2. Enigmatic Ideas — AI-Extracted Feature Analysis of NUFORC
2A. 35-Feature AI Extraction
- URL (methodology): https://enigmaticideas.com/finding-patterns-in-152-000-ufo-uap-sightings/
- URL (entity encounters): https://enigmaticideas.com/what-152-000-ufo-reports-reveal-about-entity-encounters/
- URL (taxonomy): https://enigmaticideas.com/classifying-the-unknown-a-taxonomy-of-ufo-sightings/
- Records analyzed: 152,691 NUFORC sightings
- Entity encounters found: 2,569 (1.68% of total)
Methodology
- AI model used: an Anthropic model read each of the ~152,000 narratives
- 35 features extracted across 8 categories:
- Object characteristics (shape, size, color, etc.)
- Behavior (movement patterns)
- Lights (light characteristics)
- Sensory effects (sounds, etc.)
- Physical effects (on environment/witnesses)
- Encounter details (entity type, interaction)
- Witness information (reaction, number)
- Subsequent activity (follow-up events)
Entity Types Classified (from 2,569 entity encounters)
| Entity Type | Count |
|---|---|
| Humanoid | 946 |
| Grey | 577 |
| Other | 504 |
| Small | 200 |
| Tall | 135 |
| Shadow | 53 |
| Light being | 43 |
| Robotic | 15 |
| Amorphous | 12 |
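A quick arithmetic check on the table: the nine listed counts sum to 2,485, which is 84 fewer than the 2,569 entity encounters reported above. The source does not explain the gap, so the sketch below only computes the shares and the difference; it does not assume where the remaining cases belong.

```python
# Counts copied from the table above.
entity_counts = {
    "Humanoid": 946,
    "Grey": 577,
    "Other": 504,
    "Small": 200,
    "Tall": 135,
    "Shadow": 53,
    "Light being": 43,
    "Robotic": 15,
    "Amorphous": 12,
}
reported_total = 2569      # entity encounters reported by the analysis
total_sightings = 152691   # NUFORC sightings analyzed

listed_total = sum(entity_counts.values())
unlisted = reported_total - listed_total

# Share of each type relative to the reported total, in percent.
shares = {name: round(100 * n / reported_total, 1) for name, n in entity_counts.items()}
encounter_rate = round(100 * reported_total / total_sightings, 2)

print(f"listed: {listed_total}, not in table: {unlisted}")
print(f"entity encounters as share of all sightings: {encounter_rate}%")
for name, pct in shares.items():
    print(f"{name:12s} {pct:5.1f}%")
```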
Data Availability
- Free members: High-level extracted features + index mapping sighting IDs to patterns
- Insider members (paid): Complete methodology, feature taxonomy specification, extraction prompts, processing code, Jupyter notebooks
- Raw extracted feature data: NOT publicly available (NUFORC proprietary restrictions)
- Format: Not specified (likely CSV/JSON behind paywall)
- No GitHub repo available
How to Access
- Visit https://enigmaticideas.com and create a free account for basic feature data
- Upgrade to "Insider" membership for full methodology and code
- Individual NUFORC sighting IDs are referenced — can be looked up at nuforc.org
Limitations
The processed dataset with all 35 features per sighting is gated behind membership. The author cites NUFORC's proprietary data restrictions as the reason for not releasing the full extracted dataset publicly.
3. Albert Rosales — Humanoid Encounters Database
3A. IRAAP Website
- URL: http://www.iraap.org/rosales (SSL certificate expired — use HTTP)
- Records: 18,000+ humanoid sighting cases (growing)
- Format: HTML pages organized by year
- How to access: Browse individual year pages (e.g., /rosales/1997.htm, /rosales/2001.htm)
- Limitations: Not structured data. HTML only. No bulk download. SSL cert expired.
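Because the site is HTML only, turning a year page into plain text is the first step toward structured records. The sketch below builds a year URL from the pattern in the two examples above and strips markup with the standard library. That every year follows the same pattern is an assumption, and `year_page_url`, `TextExtractor`, and `html_to_text` are hypothetical names; fetching the page is left out.

```python
from html.parser import HTMLParser

BASE_URL = "http://www.iraap.org/rosales"

def year_page_url(year: int) -> str:
    """Build a year-page URL following the /rosales/<year>.htm examples."""
    return f"{BASE_URL}/{year}.htm"

class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

# Made-up page content for illustration.
sample_html = "<html><style>p{}</style><body><p>Case 1</p><p>Case 2</p></body></html>"
print(year_page_url(1997))
print(html_to_text(sample_html))
```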
3B. Internet Archive — Book Series PDFs
Multiple volumes of Rosales' 17-volume "Humanoid Encounters: The Others Among Us" series are available:
- 1975-1979: https://archive.org/details/humanoid-encounters-1975-1979-albert-s-rosales
- Formats: PDF (4.0 MB), EPUB (872.9K), Full Text (858.7K)
- 1950-1954: https://archive.org/details/humanoid-encounters-1950-1954-albert-s-rosales
- Formats: PDF, EPUB, Full Text
- 1984 Reports: https://archive.org/details/82210915-1984-humanoid-reports
- Collection: https://archive.org/details/HUMANOIDSENCOUNTERSCOLLECTIONFROMINTERNATIONALUFOLOGYBYALBERTROSALES
- How to download: Click download links on each Archive.org page. Free, no account required for most.
- Limitations: Scanned books with OCR (Tesseract 5.3.0). Text quality varies. NOT structured data — would require parsing/extraction to create a database.
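Turning the OCR text into a database means splitting it into cases and pulling out header fields. The sketch below assumes each case begins with a line starting with "Location" followed by "Date" and "Time" lines. That layout is an assumption about the books, not something confirmed here, so inspect the Full Text download and adjust the patterns. `split_cases` is a hypothetical helper and the sample text is invented.

```python
import re

# Assumed layout: a case starts at a line beginning with "Location." or "Location:".
CASE_START = re.compile(r"^[ \t]*(?:\d+\.[ \t]*)?Location[.:]", re.MULTILINE)
FIELD = re.compile(r"^[ \t]*(?:\d+\.[ \t]*)?(Location|Date|Time)[.:][ \t]*(.*)$", re.MULTILINE)

def split_cases(text: str) -> list[dict]:
    """Split OCR text into case blocks and extract the first header fields of each."""
    starts = [m.start() for m in CASE_START.finditer(text)]
    cases = []
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        block = text[begin:end].strip()
        fields = {}
        for m in FIELD.finditer(block):
            # Keep the first occurrence so narrative text cannot overwrite a header.
            fields.setdefault(m.group(1).lower(), m.group(2).strip())
        cases.append({**fields, "raw": block})
    return cases

# Invented sample in the assumed layout.
sample_text = (
    "Preface text\n"
    "Location. Near Lima, Peru\nDate: 1975\nTime: night\n"
    "A witness saw a short figure.\n\n"
    "Location. Ohio, USA\nDate: June 1976\nTime: 2300\n"
    "Two beings were reported.\n"
)
cases = split_cases(sample_text)
print(len(cases), cases[0]["location"], cases[1]["date"])
```

Text before the first case header (front matter, prefaces) is dropped, and OCR errors in the header words will cause missed splits, so spot-check the case count against the book.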
3C. Other Sources for Rosales Data
- StudyLib: https://studylib.net/doc/8042639/5046-albert-rosales-humanoid-sighting (1000 BC - 2007, viewable online)
- PDFCoffee: https://pdfcoffee.com/humanoid-sighting-reportsalbert-rosalespdf-pdf-free.html
- Amazon Kindle: 17-volume series available for purchase ($3-10/volume)
3D. 6 Degrees of John Keel — Humanoid Encounters Project
- URL: https://6degreesofjohnkeel.com/blog/announcing-the-humanoid-encounters-project
- Status: In development. Rosales gave permission to build a structured database.
- Records: 18,000+ from Rosales' complete dataset
- Phase 1: Completed — preliminary descriptive statistics
- Phase 2: Underway
- Phase 3: Planning — public database for researchers
- Format: Will be a structured database (format TBD)
- Limitations: NOT YET PUBLICLY AVAILABLE. No download link exists yet. Monitor the blog for updates.
4. FREE Foundation — Experiencer Research Study
4A. Survey Results (Published Charts)
- Primary URL: https://www.experiencer.org/
- Issuu Documents: https://issuu.com/experiencer/stacks/45148db799eb47438a2bb153b7c85104
- SlideShare (Phase 2): https://www.slideshare.net/Experiencer/complete-phase-2-questions-1257-anonymous-free-survey-results
Study Details
- Total respondents: 4,200+ from 100+ countries
- Survey phases:
- Phase 1: ~2,900 English responses + 650 Spanish
- Phase 2: ~1,700 English responses + 200 Spanish (257 questions)
- Phase 3: ~1,070 responses
- Total questions: 550+ across all phases
- Topics covered: Contact experiences with non-human intelligence, consciousness, paranormal events, demographics, family history, physical/psychological effects
Data Availability
- Published format: PDF charts and summary statistics on experiencer.org and Issuu
- Raw dataset: The foundation stated all data would be placed in the public domain on experiencer.org after publication. However, at the time of this research, raw tabular data (CSV/spreadsheet) does NOT appear to be publicly downloadable.
- Book: "Beyond UFOs: The Science of Consciousness and Contact with Non-Human Intelligence" contains analyzed results
- Medium articles: Analysis published at https://medium.com/the-foundation-for-research-into-extraterrestrial
How to Access What's Available
- Visit Issuu stack for PDF chart documents
- Download Phase 2 results PDF from SlideShare
- Contact the FREE Foundation directly through experiencer.org for raw data requests
- Purchase "Beyond UFOs" book for comprehensive analysis
Limitations
Raw survey data in structured format (CSV, SPSS, etc.) does NOT appear to be publicly available despite promises to release it. Only summary charts and statistics in PDF format are accessible. The foundation appears to have become less active in recent years.
5. National Archives — UAP Records (US Government)
- URL: https://www.archives.gov/research/catalog/catalog-bulk-downloads/uap-bulk-download
- Last updated: April 24, 2025
- Format: ZIP files containing images, video, PDFs + JSON metadata
Available Collections
| Collection | Size | Type |
|---|---|---|
| Project Blue Book Photos (1954-1966) | 36.48 GB | Still images |
| Project Blue Book Textual Records | 379.90 GB total | Microfilm/text |
| USAF B&W Photos (1930-1975) | 648.47 MB | Photos |
| USAF Color Photos (1954-1980) | 8.75 GB | Photos |
| Nuclear Regulatory Commission | 11.74 MB | Electronic records |
| FAA Records | 63.10 MB | Electronic records |
| ODNI Records | 1.29 MB | Electronic records |
| Office of SecDef Records | 103.37 MB | Electronic records |
| Presidential Library Materials | Varies | Mixed |
| Moving Images/Sound (18 items) | Up to 16.25 GB each | Video/audio |
- How to download: Direct download from archives.gov. No account required.
- Limitations: Massive file sizes. Primarily historical (pre-1980s). Not structured data — scanned documents, photos, film. Entity encounters would need to be extracted from textual records manually. Blue Book textual records alone are 380 GB.
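Before downloading, it helps to total the listed sizes and compare them with free disk space. The collections with a fixed size in the table add up to roughly 426 GB, not counting the Presidential Library materials ("Varies") or the moving image items. The sketch below treats 1 GB as 1,000 MB, which is an assumption about how the sizes are reported.

```python
import shutil

# Sizes copied from the table above; entries without a fixed size are left out.
listed_sizes = {
    "Project Blue Book Photos": "36.48 GB",
    "Project Blue Book Textual Records": "379.90 GB",
    "USAF B&W Photos": "648.47 MB",
    "USAF Color Photos": "8.75 GB",
    "Nuclear Regulatory Commission": "11.74 MB",
    "FAA Records": "63.10 MB",
    "ODNI Records": "1.29 MB",
    "Office of SecDef Records": "103.37 MB",
}
UNIT_TO_GB = {"MB": 1 / 1000, "GB": 1.0}  # decimal units assumed

def to_gb(size: str) -> float:
    """Convert a string like '648.47 MB' to gigabytes."""
    value, unit = size.split()
    return float(value) * UNIT_TO_GB[unit]

total_gb = sum(to_gb(size) for size in listed_sizes.values())
free_gb = shutil.disk_usage(".").free / 1e9  # free space on the current drive

print(f"listed collections: {total_gb:.2f} GB")
print(f"free space here:    {free_gb:.2f} GB")
print("enough room" if free_gb > total_gb else "not enough room for everything")
```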
6. Additional Datasets
6A. richgel999/ufo_data — "Dataset of the Damned"
- URL: https://github.com/richgel999/ufo_data
- Records: ~18,000 (primarily from Larry Hatch's *U* Database)
- Format: JSON arrays with GitHub Flavored Markdown
- Key file: bin/hatch_udb.json
- Live search: https://ufo-search.com
- How to download: Clone the GitHub repo
- Limitations: General UFO chronology, not entity-encounter specific. Draws from Vallee, Dolan, Eberhart, and other historical researchers.
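The internal layout of the JSON files is not described here, so a reasonable first step after cloning is to count which keys appear across records. This sketch assumes a file holds a list of objects; `summarize_records` is a hypothetical helper and the sample keys are invented.

```python
import json
from collections import Counter

def summarize_records(records) -> Counter:
    """Count how often each key appears across a list of JSON objects."""
    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return key_counts

# Invented sample; for the real file, load it with json.load(open(path, encoding="utf-8")).
sample = json.loads('[{"date": "1947", "desc": "disc"}, {"date": "1952"}]')
key_counts = summarize_records(sample)
for key, n in key_counts.most_common():
    print(f"{key}: present in {n} of {len(sample)} records")
```

Keys that appear in only a fraction of records point to optional fields that need default handling before analysis.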
6B. MUFON Case Management System
- URL: https://mufon.com/
- Records: 100,000+ cases
- Format: SQL database (MySQL backend, Perl logic)
- How to access: Browse recent reports free at mufon.com. Full CMS access requires paid MUFON membership.
- Limitations: NO bulk download. NO public API. Redistribution explicitly forbidden without written consent. Legal action threatened for violations. Data locked behind membership paywall.
6C. UAPedia
- URL: https://www.uapedia.ai/
- Format: Wiki-style knowledge base
- Limitations: Reference site, not a downloadable dataset
6D. SkyWatch (AI-Enhanced UAP Database)
- URL: No direct URL found; the project is referenced on Devpost
- Records: 500,000+ reports claimed
- Format: Knowledge graphs, GPU-powered analytics
- Limitations: Appears to be a project/prototype, not a publicly downloadable dataset
7. Recommended Strategy for Entity Encounter Research
Immediate Access (can download today)
- Hugging Face kcimc/NUFORC — Best starting point. 147,890 records with boolean flags for abduction, missing time, body marks, landing. Filter these + text search narratives for entity keywords.
- Kaggle NUFORC — 80,000+ records, easy CSV download, good for initial exploration.
- GitHub ufo_data — Larry Hatch's 18,000 records in JSON, historical breadth.
- National Archives — Government records, but massive and unstructured.
Requires Processing/Extraction
- Rosales books on Archive.org — Download PDFs, OCR text available. Would need NLP pipeline to extract structured cases from book text. ~18,000 cases across 17 volumes.
- IRAAP website — HTML scraping of year-by-year humanoid encounter reports.
Requires Membership or Contact
- Enigmatic Ideas — Free membership gets basic features; paid "Insider" gets full 35-feature extraction code and methodology.
- MUFON — Paid membership for CMS access; no bulk export.
- FREE Foundation — Contact directly for raw survey data; only PDF charts publicly available.
Future/In Development
- 6 Degrees of John Keel Humanoid DB — Structured Rosales database in development. Monitor for public release.
8. Quick-Start Commands
Download NUFORC from Hugging Face (recommended first step)
from datasets import load_dataset
import pandas as pd
# Load the full dataset and convert the default split to a DataFrame
ds = load_dataset('kcimc/NUFORC')
df = pd.DataFrame(ds['train'])
# Flag columns used as entity-relevant filters.
# Names must match the loaded data exactly; check df.columns before running.
flag_cols = ['Possible abduction', 'Missing Time',
             'Marks found on body afterwards', 'Landed']
# A row qualifies if any of the flags is True
entity_mask = df[flag_cols].eq(True).any(axis=1)
entity_encounters = df[entity_mask]
print(f"Entity-relevant encounters: {len(entity_encounters)}")
# Text search for entity keywords in narratives (substring match, so expect false positives)
keywords = ['being', 'entity', 'creature', 'humanoid', 'grey', 'gray',
            'alien', 'occupant', 'figure', 'person', 'abduct']
text_mask = df['Text'].str.lower().str.contains('|'.join(keywords), na=False)
text_entity = df[text_mask]
print(f"Narrative mentions of entities: {len(text_entity)}")
# Save filtered results
entity_encounters.to_csv('nuforc_entity_encounters.csv', index=False)
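Plain substring matching catches unrelated words: "entity" also occurs inside "identity", and "figure" inside "configure". One possible refinement, sketched here with invented narratives, anchors each keyword at a word start and lets it match as a prefix. `keyword_pattern` is a hypothetical helper, and the shorter keyword list is only an example.

```python
import re
import pandas as pd

def keyword_pattern(prefixes):
    """Build a regex matching any word that starts with one of the given prefixes."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return rf"\b(?:{alternatives})\w*"

pattern = keyword_pattern(["entity", "entities", "humanoid", "alien", "occupant", "abduct"])

# Invented narratives for illustration.
narratives = pd.Series(
    [
        "A tall entity stood by the craft.",
        "I could not confirm the identity of the aircraft.",
        None,
        "She believes she was ABDUCTED.",
    ]
)
matches = narratives.str.contains(pattern, case=False, na=False, regex=True)
print(matches.tolist())
```

Ambiguous everyday words such as "being", "person", "grey", and "figure" still produce many false positives even with word boundaries, so they are better handled by reviewing matches manually or by combining them with the boolean flags.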
Download Larry Hatch Database
git clone https://github.com/richgel999/ufo_data.git
# Data is in ufo_data/bin/hatch_udb.json
Download Rosales PDF from Archive.org
curl -L -o rosales_1975_1979.pdf "https://archive.org/download/humanoid-encounters-1975-1979-albert-s-rosales/Humanoid%20Encounters%201975-1979%20Albert%20S%20Rosales.pdf"