How many public datasets are listed in this archive?

This Data & Research Archive lists 16 structured public datasets from Samuel & Audrey Media Network, including travel article corpora, YouTube transcript datasets, photography metadata, Project 23, citation datasets, partnership records, and historical travel blogging archives.

Where are the datasets primarily hosted?

Hugging Face is the primary public dataset hub for Samuel & Audrey Media Network. Selected datasets are also mirrored or archived on GitHub, Zenodo, Kaggle, DagsHub, Figshare, Harvard Dataverse, and Mendeley Data.

Who created these datasets?

The datasets are published by Samuel & Audrey Media Network and are primarily created by Samuel Jeffery and Audrey Bergner. Samuel Jeffery holds ORCID 0009-0006-3748-9630 and Audrey Bergner holds ORCID 0009-0007-2249-0441.
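Both identifiers can be sanity-checked offline. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped digit in a citation is usually caught. The Python sketch below is a minimal illustration, not an official ORCID tool, and the helper names are hypothetical.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 ORCID base digits.

    Hypothetical helper name; the algorithm is the one ORCID uses.
    """
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters (hyphens ignored) and a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


# The two ORCID iDs quoted on this page.
for orcid in ("0009-0006-3748-9630", "0009-0007-2249-0441"):
    print(orcid, is_valid_orcid(orcid))
```

Both iDs listed above pass the check; changing any single digit makes it fail.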

What license do the datasets use?

The datasets are published under the Creative Commons Attribution-NonCommercial 4.0 International license, or CC BY-NC 4.0. They are free for non-commercial research, development, citation, indexing, and educational use with attribution.

What is the Dataset Directory DOI?

The Samuel & Audrey Media Network Dataset Directory has Hugging Face DOI 10.57967/hf/8915. It functions as the meta-index for the wider dataset collection.

Which datasets are historical archives?

The historical archive datasets include Top 100 Travel Blogs 2010s Historical Archive, Early Travel Blogging Directory Archive, and the Samuel & Audrey Media Network Dataset Directory.

Which dataset focuses on Argentina?

The Project 23 Argentina Travel Archive is the central Argentina dataset. It is supported by the Che Argentina Travel Article Corpus and includes Argentina travel records, bilingual materials, photography metadata, video transcripts, and public source records.

Are there bilingual datasets?

Yes. The Samuel y Audrey Bilingual YouTube Transcript Corpus ES/EN preserves Spanish-English travel video transcript records, and Project 23 includes bilingual Argentina-related content.
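For NLP work, a common first step with a bilingual transcript corpus is pairing each Spanish transcript with its English counterpart. The record layout here is an assumption: this Python sketch supposes each row carries hypothetical video_id, language and text fields, and the sample rows are made up for illustration. Check the dataset card for the real schema before adapting it.

```python
from collections import defaultdict


def pair_bilingual(records):
    """Pair Spanish and English transcripts that share a video ID.

    Assumes hypothetical keys 'video_id', 'language' ('es' or 'en') and
    'text'. Videos that lack either language are skipped.
    """
    by_video = defaultdict(dict)
    for record in records:
        by_video[record["video_id"]][record["language"]] = record["text"]
    pairs = []
    for video_id in sorted(by_video):
        texts = by_video[video_id]
        if "es" in texts and "en" in texts:
            pairs.append((video_id, texts["es"], texts["en"]))
    return pairs


# Made-up sample rows for illustration only.
sample = [
    {"video_id": "v1", "language": "es", "text": "Hola desde Salta"},
    {"video_id": "v1", "language": "en", "text": "Hello from Salta"},
    {"video_id": "v2", "language": "es", "text": "Solo en español"},
]
print(pair_bilingual(sample))
```

Skipping unpaired videos keeps the output usable as parallel text; a monolingual analysis would keep them instead.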

Why are the datasets mirrored across multiple platforms?

The datasets are mirrored across multiple platforms to support preservation, citation, discoverability, machine-learning workflows, data-science exploration, long-term redundancy, and academic repository access.

Is Mendeley Data included?

Yes. Mendeley Data (Elsevier) is included as a selective dataset-record layer in the Samuel & Audrey Media Network dataset infrastructure. Three datasets are currently represented: Project 23 Argentina Travel Archive (DOI: 10.17632/f3ygxw39tk.2), Top 100 Travel Blogs 2010s Historical Archive (DOI: 10.17632/s364962bfv.1), and Early Travel Blogging Directory Archive (DOI: 10.17632/rgxgbywdf2.1). Because Mendeley Data is organized around individual dataset records rather than creator profiles, the archive links each DOI-backed record directly instead of a profile page.
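Every DOI quoted in this archive resolves through doi.org, so a citation or link-checking script only needs the DOI string. This Python sketch applies a deliberately simplified syntax check (modeled on the pattern Crossref suggests for modern DOIs, not the full DOI specification) and builds resolver URLs. The helper name doi_to_url is hypothetical.

```python
import re

# Simplified DOI syntax check modeled on Crossref's suggested pattern for
# modern DOIs; an assumption, not a full implementation of the DOI spec.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a syntactically plausible DOI."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {doi!r}")
    return "https://doi.org/" + doi


# DOIs quoted on this page.
dois = {
    "Dataset Directory (Hugging Face)": "10.57967/hf/8915",
    "Project 23 (Mendeley Data)": "10.17632/f3ygxw39tk.2",
    "Top 100 Travel Blogs 2010s (Mendeley Data)": "10.17632/s364962bfv.1",
    "Early Travel Blogging Directory (Mendeley Data)": "10.17632/rgxgbywdf2.1",
}
for name, doi in dois.items():
    print(f"{name}: {doi_to_url(doi)}")
```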

Public Datasets · Samuel & Audrey Media Network

Built by a Data Nerd.
For Data Nerds.

I collected sports cards as a kid. Organized them by team, year, rookie status. I played fantasy sports before most people knew what it was — built spreadsheets, tracked stats, obsessed over matchups. In the 2010s I built the Top 100 Travel Blogs index — a ranked, structured catalog of the entire travel blogging industry, updated annually. The thread through all of it is the same thing: collect, organize, make it useful.

Now I have 15 years of first-hand travel across 75+ countries, a bilingual YouTube archive of 220+ videos, 12,858 photographs organized by destination, and hundreds of articles across three websites. The same instinct that made me organize sticker books is what made me turn all of that into structured, machine-readable datasets.

This page isn’t for casual readers. It’s for developers building travel tools, researchers studying creator economies, NLP engineers who need real bilingual corpora, and algorithms crawling for structured ground-truth data. It’s also for future me — because this is a long-term project. Patagonia is next. Then more Argentine provinces. Then wherever we go.

Everything here is free for non-commercial use under CC BY-NC 4.0. If you build something with it, I’d genuinely love to know.

16 public datasets · 105+ archive records · 8 platforms · 15-year archive span · EN + ES languages · CC BY-NC license

Provenance: Samuel (ORCID 0009-0006-3748-9630) · Audrey (ORCID 0009-0007-2249-0441) · Google Scholar · DOI: 10.57967/hf/8915

Hosted On

🤗 Hugging Face

Hugging Face is the primary dataset hub for the Samuel & Audrey Media Network. Datasets are also mirrored or archived across GitHub, Zenodo, Kaggle, DagsHub, Figshare, Harvard Dataverse, and Mendeley Data for preservation, citation, discoverability, and research access.

16 datasets

Platform Mirrors & Academic Repositories

8 platforms

Primary Hub PL-001

Hugging Face

Primary public dataset hub for the Samuel & Audrey Media Network, including travel corpora, video transcripts, photography metadata, citation records, historical archives, and finance research datasets.

huggingface.co/samuelandaudreymedianetwork


Code Mirror PL-002

GitHub

Repository mirror for dataset packages, documentation, citation files, checksums, manifests, and technical context behind the public archive.

github.com/samuelandaudreymedianetwork


DOI Archive PL-003

Zenodo

Academic archive and DOI-backed release layer for selected dataset packages, long-term preservation, and cite-all-versions records.

zenodo.org/communities/samuelandaudreymedianetwork


Data Science PL-004

Kaggle

Data-science discovery mirror for researchers, analysts, and builders who prefer Kaggle-hosted public datasets and notebooks.

kaggle.com/samuelandaudreymedia


Open Data PL-005

DagsHub

Open data and machine-learning repository mirror for dataset discoverability, data workflows, and AI-oriented archive access.

dagshub.com/samuelandaudreymedianetwork


Research Sharing PL-006

Figshare

Research-sharing profile for public dataset deposits, supplemental archive records, and citable media-data outputs.

figshare.com/authors/Samuel_Jeffery/23238921


Academic Repository PL-007

Harvard Dataverse

Academic repository profile for selected long-term dataset deposits, including Project 23, Top 100 Travel Blogs, and early travel blogging archive records.

dataverse.harvard.edu/dataverse/samuelandaudreymedianetwork


Research Data Repository PL-008

Mendeley Data

Elsevier-hosted research data repository for selected dataset deposits with DOI-backed archive records. Currently hosts Project 23, Top 100 Travel Blogs, and Early Travel Blogging Directory.

data.mendeley.com — 3 dataset records

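The GitHub mirror is described as carrying checksums and manifests. If those manifests follow the common sha256sum layout (one "HEXDIGEST  filename" line per file), which is an assumption since the exact format is not documented here, a downloaded package can be verified with a short Python script like this sketch. The manifest name SHA256SUMS and the file articles.jsonl are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dataset files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Check each 'HEXDIGEST  filename' line against files beside the manifest.

    Assumes sha256sum-style lines; missing or altered files map to False.
    """
    results: dict[str, bool] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode entries with '*'
        target = manifest.parent / name
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results


def check_example_manifest() -> dict[str, bool]:
    """Write a throwaway file plus manifest to a temp dir and verify it."""
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "articles.jsonl"  # hypothetical file name
        data.write_text('{"title": "example"}\n', encoding="utf-8")
        manifest = Path(tmp) / "SHA256SUMS"  # hypothetical manifest name
        manifest.write_text(f"{sha256_of(data)}  articles.jsonl\n", encoding="utf-8")
        return verify_manifest(manifest)


print(check_example_manifest())
```

Reporting a per-file result, rather than stopping at the first mismatch, makes it easier to see which part of a package needs to be re-downloaded from another mirror.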

Written Content

4 datasets

Articles · EN DS-001

Nomadic Samuel Article Corpus

The full archive of long-form travel articles from NomadicSamuel.com — destination guides, overland logistics, gear write-ups, and narrative essays. Useful for travel NLP, text classification, and RAG pipelines.

Language: EN · Format: JSONL · Access: Free

nomadic-samuel-article-corpus · DOI: 10.57967/hf/8890
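The corpus ships as JSONL, meaning one JSON object per line. A minimal Python reader is sketched below; it yields records one at a time, so a file handle can be streamed without loading the whole corpus into memory. The field names (title, text), the sample rows and the file name articles.jsonl are assumptions for illustration, so check the dataset card for the actual schema. The Hugging Face datasets library can also read JSON Lines files directly with load_dataset("json", data_files=...).

```python
import io
import json
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed record per non-blank line of a JSON Lines stream."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {lineno}: {exc}") from exc


# Made-up sample rows; field names are assumptions, not the published schema.
sample = io.StringIO(
    '{"title": "Overland to Salta", "text": "Bus notes and border tips."}\n'
    "\n"
    '{"title": "Packing for Patagonia", "text": "Layers, always layers."}\n'
)
titles = [record["title"] for record in iter_jsonl(sample)]
print(titles)

# With a local copy of the corpus (hypothetical file name):
# with open("articles.jsonl", encoding="utf-8") as fh:
#     for record in iter_jsonl(fh):
#         ...
```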



Nomadic Samuel Article Corpus

That Backpacker Article Corpus

Che Argentina Travel Article Corpus

Picture Perfect Portfolios Article Corpus

YouTube Travel Videos Metadata Index

Samuel & Audrey YouTube Transcripts (EN)

Samuel y Audrey Bilingual Transcripts (ES+EN)

Nomadic Samuel YouTube Transcripts Corpus

Samuel & Audrey Photography Metadata Archive

Project 23: Argentina Travel Archive

Academic Citations & Media References

Media & Academic Citations and Third-Party References

Partnerships & Media References

Top 100 Travel Blogs 2010s Historical Archive

Early Travel Blogging Directory Archive

Samuel & Audrey Media Network Dataset Directory

What’s Coming Next
