Build an AI Agent to Auto-Create and Publish YouTube Videos

YouTube is the second-largest search engine in the world. But creating videos takes time: scripting, recording, editing, thumbnail design, and publishing. What if an AI agent could handle most of this for you?

In this guide, I’ll show you exactly how to build an AI agent that generates video scripts, creates AI visuals, produces voiceovers, and publishes directly to your YouTube channel.

What You’ll Build

An AI agent that:

Accepts a topic as input
Generates a complete video script with scene descriptions
Creates AI voiceover narration
Generates video clips for each scene
Assembles everything into a final video
Creates a YouTube thumbnail
Uploads directly to YouTube

How It Works: The Video Pipeline

Topic → Script → Voice → Video → YouTube

You can build this using Make (formerly Integromat) for no-code, or Python for more control. I’ll cover both.

Step 1: Set Up YouTube API Access

Step 1: Create Google Cloud Project

Go to console.cloud.google.com and create a new project.

Step 2: Enable YouTube Data API

In the sidebar, go to “APIs & Services” → “Library”. Search for “YouTube Data API v3” and enable it.

Step 3: Create Credentials

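Go to “APIs & Services” → “Credentials”. Click “Create Credentials” → “OAuth client ID”. Choose “Desktop app” or “Web application”. If you plan to use the Python script in Method B, choose “Desktop app”, because that script uses the installed-app OAuth flow.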

Step 4: Download JSON

Download your OAuth JSON file and save it securely. You’ll need client_id and client_secret.
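To show where those two values live, here is a minimal sketch that reads them from the downloaded file. It assumes the file is saved as client_secrets.json (the name the Python script in Method B expects) and that the credentials sit under a top-level "installed" key (Desktop app) or "web" key (Web application). The function name load_client_config is hypothetical.

```python
import json


def load_client_config(path="client_secrets.json"):
    """Return (client_id, client_secret) from a Google OAuth client file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Desktop-app credentials are nested under "installed",
    # web-application credentials under "web".
    for key in ("installed", "web"):
        if key in data:
            section = data[key]
            return section["client_id"], section["client_secret"]
    raise ValueError("Expected an 'installed' or 'web' section in the OAuth file")
```

Keep this file out of version control; anyone holding it can start an OAuth flow in your project's name.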

Channel Requirements

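Your YouTube channel must be verified; phone verification is what unlocks custom thumbnails and uploads longer than 15 minutes. For high-volume uploads, the limit you are most likely to hit is your API project's daily quota, which you can ask Google to raise from the Google Cloud console.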

Step 2: Get Other API Keys

OpenAI (Script + Images)

Get API key – Used for script generation and thumbnail creation

ElevenLabs (Voiceover)

Get API key – Natural-sounding AI voices

Runway or Kling (Video)

Runway or Kling AI – Generate video clips from descriptions

Method A: Build with Make (No-Code)

Make.com

Step 1: Create New Scenario

Open Make

Click “Create a new scenario”.

Add Trigger

Search for “Schedule” trigger. Set to run daily at your preferred time (e.g., 9 AM).

Step 2: Generate Video Script

Add OpenAI Module

Click + → Search “OpenAI” → Choose “Create a Completion”.

Configure API

Enter your OpenAI API key.

Add Script Prompt

Use the prompt template below for YouTube-ready scripts.

YouTube Script Prompt

Copy and paste this into your OpenAI module:

Write a YouTube video script about {{topic}}.

Requirements:
- Duration: 8-10 minutes
- Include hook (first 30 seconds to grab attention)
- 4-5 main sections, each 1-2 minutes
- End with call-to-action (subscribe, like, comment)
- Include timestamps for each section
- Add scene descriptions in brackets like [B-roll: city timelapse]
- End with 2-3 suggested video tags

Format:
Title: [Video title]
Hook: [Opening 30 seconds]
Section 1 [0:30-2:00]: [Content and scene]
Section 2 [2:00-4:00]: [Content and scene]
...
Tags: tag1, tag2, tag3

Step 3: Generate Voiceover

Add ElevenLabs Module

Click + → Search “ElevenLabs” → Choose “Convert Text to Speech”.

Connect API

Enter your ElevenLabs API key.

Configure Voice

Choose a voice ID. Popular options: “Rachel” (friendly female), “Josh” (professional male). Select MP3 output.

Map Script Text

Map the script content from Step 2 to the text field.

Step 4: Generate Video Clips

Extract Scene Descriptions

Use Make’s “Text Parser” to extract scene descriptions from your script.
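For readers who want to see what that extraction amounts to, here is a small Python sketch of the same idea. It assumes scenes are marked exactly as in the prompt above, for example [B-roll: city timelapse]; the pattern and the function names are illustrative and not part of Make.

```python
import re

# Matches markers such as "[B-roll: city timelapse]". Timestamp brackets
# like "[0:30-2:00]" are ignored because they lack the "B-roll:" prefix.
SCENE_PATTERN = re.compile(r"\[B-roll:\s*([^\]]+)\]", re.IGNORECASE)


def extract_scenes(script_text):
    """Return the scene descriptions found in the script, in order."""
    return [match.strip() for match in SCENE_PATTERN.findall(script_text)]


def strip_scenes(script_text):
    """Remove scene markers so only the narration goes to text-to-speech."""
    without_markers = SCENE_PATTERN.sub("", script_text)
    return re.sub(r"[ \t]{2,}", " ", without_markers).strip()


script = "Welcome back. [B-roll: city timelapse] Today we look at AI video."
print(extract_scenes(script))
print(strip_scenes(script))
```

The same regular expression can be pasted into a pattern-matching module in Make. Stripping the markers before Step 3 keeps the voiceover from reading the scene descriptions aloud.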

Add HTTP Module

For each scene, call Runway or Kling API to generate a video clip.

Wait for Generation

AI video generation typically takes 2-5 minutes per clip. Set up polling or a wait step so the scenario does not continue before the clip is ready.
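The polling idea can be sketched in a few lines of Python. This is a generic helper, not any provider's API: check_status is a hypothetical callable you supply, and the status strings "succeeded" and "failed" are assumptions that should be mapped to whatever your video service actually returns.

```python
import time


def poll_until_done(check_status, interval=10.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() until it reports success, failure, or timeout.

    check_status must return a (status, result) tuple, where status is
    "succeeded", "failed", or anything else while the job is still running.
    """
    deadline = clock() + timeout
    while True:
        status, result = check_status()
        if status == "succeeded":
            return result
        if status == "failed":
            raise RuntimeError("Video generation failed")
        if clock() >= deadline:
            raise TimeoutError(f"No result after {timeout} seconds")
        sleep(interval)
```

Polling with a timeout is usually safer than one fixed wait: short jobs finish sooner, and a stuck job raises an error instead of passing a missing clip to the next step.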

Alternative: Use Stock Footage

If AI video generation is too slow or expensive, use Pexels API to pull relevant stock footage based on scene keywords.
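A rough sketch of the stock-footage route follows. It assumes the Pexels video search endpoint at https://api.pexels.com/videos/search with the API key sent in the Authorization header, and a response containing a "videos" list whose items carry "video_files" entries with "width" and "link" fields. Check those details against the Pexels documentation; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

PEXELS_SEARCH_URL = "https://api.pexels.com/videos/search"


def search_pexels_videos(query, api_key, per_page=5):
    """Query Pexels for landscape stock videos matching the scene keywords."""
    params = urllib.parse.urlencode(
        {"query": query, "per_page": per_page, "orientation": "landscape"}
    )
    request = urllib.request.Request(
        f"{PEXELS_SEARCH_URL}?{params}",
        headers={"Authorization": api_key, "User-Agent": "ai-youtube-agent"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def pick_video_link(search_result, target_width=1920):
    """Pick a download link from the first video that has usable files.

    Prefers the smallest file that is at least target_width wide and
    falls back to the widest file available.
    """
    for video in search_result.get("videos", []):
        files = [f for f in video.get("video_files", []) if f.get("width")]
        if not files:
            continue
        wide_enough = [f for f in files if f["width"] >= target_width]
        if wide_enough:
            chosen = min(wide_enough, key=lambda f: f["width"])
        else:
            chosen = max(files, key=lambda f: f["width"])
        return chosen["link"]
    return None
```

Calling pick_video_link(search_pexels_videos("city timelapse", api_key)) would return one URL per scene, or None when the search finds nothing.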

Step 5: Assemble Video

Use InVideo or Pictory

These tools have APIs or can be integrated with Make.

Upload Clips + Audio

Send your video clips and voiceover to the video editor.

Add Subtitles + Music

Auto-generate captions. Add royalty-free background music.

Export Final Video

Download as MP4, 1080p or 4K.

Step 6: Create Thumbnail

Add OpenAI Image Module

Click + → Search “OpenAI” → Choose the image generation module (“Generate an Image”) and select DALL-E 3 as the model.

Thumbnail Prompt

“YouTube thumbnail for: [topic]. High contrast, bold text space on left, professional, attention-grabbing. 16:9 aspect ratio.”

Add Text Overlay

Use Canva or Figma to add your video title text to the thumbnail.

Step 7: Upload to YouTube

Add YouTube Module

Search for “YouTube” → “Upload a Video”.

Connect Account

Authenticate with your Google account that has YouTube access.

Map Video Data

Title: [From script]
Description: [Script summary + links]
Tags: [From script]
Privacy: Public or Schedule
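Before mapping these fields, it helps to check that they fit YouTube's metadata limits. The sketch below assumes the commonly documented limits of 100 characters for the title, 5,000 bytes for the description, and 500 characters for all tags combined; treat those numbers as assumptions to verify against the YouTube Data API reference. The function names are hypothetical.

```python
MAX_TITLE_CHARS = 100          # assumed title limit
MAX_DESCRIPTION_BYTES = 5000   # assumed description limit (UTF-8 bytes)
MAX_TAGS_CHARS = 500           # assumed combined limit for all tags


def fit_tags(tags, budget=MAX_TAGS_CHARS):
    """Keep tags in order until the combined character budget is used up."""
    kept, used = [], 0
    for tag in tags:
        tag = tag.strip()
        if not tag:
            continue
        # Tags containing spaces count as if wrapped in quotes (+2);
        # every tag after the first also costs one separator character.
        cost = len(tag) + (2 if " " in tag else 0) + (1 if kept else 0)
        if used + cost > budget:
            break
        kept.append(tag)
        used += cost
    return kept


def build_video_metadata(title, description, tags, privacy="private"):
    """Build an upload body in the snippet/status shape YouTube expects."""
    clipped = description.encode("utf-8")[:MAX_DESCRIPTION_BYTES]
    return {
        "snippet": {
            "title": title.strip()[:MAX_TITLE_CHARS],
            "description": clipped.decode("utf-8", errors="ignore"),
            "tags": fit_tags(tags),
        },
        "status": {"privacyStatus": privacy},
    }
```

The default privacy is "private" so that a test run never publishes by accident.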

Upload Thumbnail

Attach your generated thumbnail image.

Start with Private/Draft

Set privacy to “Private” or “Unlisted” for test videos. Review quality before going public.

Method B: Build with Python

Prerequisites

Python 3.8+
pip install requests openai elevenlabs google-auth google-auth-oauthlib google-api-python-client
All API keys ready

Complete Python Script

Copy this entire script into a file called ai_youtube_agent.py

#!/usr/bin/env python3
"""
AI Agent: Auto-Create and Publish YouTube Videos
"""
import os
import time

import requests
from openai import OpenAI
from elevenlabs.client import ElevenLabs
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

# ============== CONFIGURATION ==============
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "your-key")
ELEVENLABS_API_KEY = os.environ.get("ELEVENLABS_API_KEY", "your-key")
RUNWAY_API_KEY = os.environ.get("RUNWAY_API_KEY", "your-key")
# Voice ID copied from your ElevenLabs voice library (for example, the ID of "Rachel")
ELEVENLABS_VOICE_ID = os.environ.get("ELEVENLABS_VOICE_ID", "your-voice-id")

client = OpenAI(api_key=OPENAI_API_KEY)
elevenlabs = ElevenLabs(api_key=ELEVENLABS_API_KEY)

# YouTube API scopes
SCOPES = ['https://www.googleapis.com/auth/youtube.upload']

# ============== FUNCTIONS ==============
def generate_script(topic):
    """Generate YouTube video script"""
    prompt = f"""Write a YouTube video script about {topic}.

Requirements:
- 8-10 minute video
- Hook (first 30 seconds)
- 4-5 main sections with timestamps
- Call-to-action at end
- Scene descriptions in brackets [B-roll: description]
- 3-5 video tags

Format your response exactly like this:
TITLE: [Video Title]
TAGS: tag1, tag2, tag3
---
SCRIPT: [Full script text for voiceover]
SCENES: [Scene1]|[Scene2]|[Scene3]
"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def generate_voiceover(script_text, output_file="voiceover.mp3"):
    """Generate AI voiceover"""
    # convert() returns the audio as an iterator of byte chunks
    audio = elevenlabs.text_to_speech.convert(
        voice_id=ELEVENLABS_VOICE_ID,
        text=script_text,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128",
    )
    with open(output_file, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return output_file


def generate_video_clip(description, duration=5, timeout=600):
    """Generate video clip using Runway and return its download URL"""
    headers = {"Authorization": f"Bearer {RUNWAY_API_KEY}"}
    # NOTE: this endpoint path and payload are illustrative; confirm them
    # against Runway's current API reference before relying on them.
    response = requests.post(
        "https://api.dev.runwayml.com/v1/gen3_turbo/text_to_video",
        headers=headers,
        json={
            "prompt": description,
            "duration": duration,
            "aspect_ratio": "16:9",
        },
        timeout=60,
    )
    response.raise_for_status()
    task_id = response.json()["id"]

    # Poll for completion instead of sleeping for a fixed time.
    # ASSUMPTION: the task-status URL and the "status"/"output" fields are
    # not confirmed here; adjust them to the provider's documented schema.
    deadline = time.time() + timeout
    while time.time() < deadline:
        task = requests.get(
            f"https://api.dev.runwayml.com/v1/tasks/{task_id}",
            headers=headers,
            timeout=60,
        ).json()
        if task.get("status") == "SUCCEEDED":
            return task["output"][0]
        if task.get("status") == "FAILED":
            raise RuntimeError(f"Video generation failed for task {task_id}")
        time.sleep(10)
    raise TimeoutError(f"Video generation timed out for task {task_id}")


def generate_thumbnail(topic):
    """Generate YouTube thumbnail"""
    response = client.images.generate(
        model="dall-e-3",
        prompt=(
            f"YouTube thumbnail for: {topic}. "
            "High contrast, bold text space on left, professional design."
        ),
        size="1792x1024",
    )
    return response.data[0].url


def download_file(url, filename):
    """Download file from URL"""
    response = requests.get(url, timeout=120)
    response.raise_for_status()
    with open(filename, 'wb') as f:
        f.write(response.content)
    return filename


def get_youtube_credentials():
    """Run the OAuth flow in a browser and return YouTube API credentials"""
    flow = InstalledAppFlow.from_client_secrets_file(
        'client_secrets.json', SCOPES)
    return flow.run_local_server(port=8080)


def upload_to_youtube(video_path, title, description, tags, thumbnail_path):
    """Upload video to YouTube and return the video ID (None if skipped)"""
    if not os.path.exists(video_path):
        # Clip and audio assembly is not part of this script, so the final
        # video must be produced separately before it can be uploaded.
        print(f"Would upload {video_path} to YouTube (file not found, skipping)")
        return None

    youtube = build("youtube", "v3", credentials=get_youtube_credentials())
    request_body = {
        "snippet": {
            "title": title,
            "description": description,
            "tags": tags,
            "categoryId": "28",  # Science & Technology
        },
        "status": {
            "privacyStatus": "private",
            "selfDeclaredMadeForKids": False,
        },
    }
    response = youtube.videos().insert(
        part="snippet,status",
        body=request_body,
        media_body=MediaFileUpload(video_path, chunksize=-1, resumable=True),
    ).execute()
    video_id = response["id"]

    # YouTube limits thumbnails to 2 MB; compress the image if it is larger.
    youtube.thumbnails().set(
        videoId=video_id,
        media_body=MediaFileUpload(thumbnail_path),
    ).execute()
    return video_id


# ============== MAIN FUNCTION ==============
def main(topic):
    print(f"Creating video about: {topic}")

    # 1. Generate script and split it into its labeled parts
    raw = generate_script(topic)
    title = raw.split("TITLE:")[1].split("TAGS:")[0].strip()
    tags_text = raw.split("TAGS:")[1].split("---")[0]
    tags = [t.strip() for t in tags_text.split(",") if t.strip()]
    script = raw.split("SCRIPT:")[1].split("SCENES:")[0].strip()
    scenes = [s.strip() for s in raw.split("SCENES:")[1].split("|") if s.strip()]

    # 2. Generate voiceover
    print("Generating voiceover...")
    voiceover_path = generate_voiceover(script)

    # 3. Generate video clips
    print("Generating video clips...")
    selected_scenes = scenes[:5]  # Limit to 5 clips
    video_clips = []
    for i, scene in enumerate(selected_scenes):
        print(f"Generating clip {i+1}/{len(selected_scenes)}...")
        clip_url = generate_video_clip(scene)
        clip_path = f"clip_{i}.mp4"
        download_file(clip_url, clip_path)
        video_clips.append(clip_path)

    # 4. Generate thumbnail
    print("Generating thumbnail...")
    thumb_url = generate_thumbnail(topic)
    thumb_path = "thumbnail.png"
    download_file(thumb_url, thumb_path)

    # 5. Upload to YouTube
    print("Uploading to YouTube...")
    video_id = upload_to_youtube(
        video_path="final_video.mp4",  # Assembled from video_clips + voiceover_path
        title=title,
        description=f"Today we explore {topic}.",
        tags=tags,
        thumbnail_path=thumb_path,
    )
    if video_id:
        print(f"Video uploaded! ID: {video_id}")
    else:
        print("Upload skipped.")


if __name__ == "__main__":
    main("how AI is changing video creation")
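The script above uploads final_video.mp4 but does not show how that file is produced from the clips and the voiceover. One possible way, sketched here under assumptions, is to call ffmpeg's concat demuxer from Python. This assumes ffmpeg is installed and on your PATH, that all clips share the same resolution and codec, and that file paths contain no single quotes. The helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def write_concat_list(clip_paths, list_path="clips.txt"):
    """Write the text file that ffmpeg's concat demuxer reads."""
    lines = [f"file '{Path(p).resolve().as_posix()}'" for p in clip_paths]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return list_path


def build_ffmpeg_command(list_path, voiceover_path, output_path="final_video.mp4"):
    """Build the ffmpeg call that joins the clips and lays the voiceover on top."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_path),  # joined clips
        "-i", str(voiceover_path),                           # narration audio
        "-map", "0:v:0", "-map", "1:a:0",   # video from clips, audio from voiceover
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        "-shortest",                        # stop when the shorter input ends
        str(output_path),
    ]


def assemble_video(clip_paths, voiceover_path, output_path="final_video.mp4"):
    """Run ffmpeg and return the path of the assembled video."""
    list_path = write_concat_list(clip_paths)
    subprocess.run(build_ffmpeg_command(list_path, voiceover_path, output_path), check=True)
    return output_path
```

Because of -shortest, five 5-second clips would cut an 8-10 minute narration off after about 25 seconds. In practice you need enough footage to cover the voiceover, for example more clips, longer stock footage, or repeated B-roll.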

Cost Breakdown

Script generation: $0.05 per video (GPT-4o)
Voiceover: $0.50-1.00 per 10-minute video (ElevenLabs)
Video clips: $0.50-1.00 (5 clips at $0.10-0.20 each)
Thumbnail: $0.12 (DALL-E 3)
Total: $1.20-2.20 per video, vs $200-500+ for traditional production
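The arithmetic behind the total can be reproduced with a short sketch. The per-item figures are the ones listed above. The exact sum comes to $1.17-2.17, which the article rounds to $1.20-2.20. The monthly projection and the function names are illustrative additions.

```python
# (low, high) cost per video in US dollars, taken from the breakdown above
COSTS = {
    "script": (0.05, 0.05),
    "voiceover": (0.50, 1.00),
    "video_clips": (5 * 0.10, 5 * 0.20),  # 5 clips at $0.10-0.20 each
    "thumbnail": (0.12, 0.12),
}


def total_cost(costs):
    """Sum the low and high ends of every cost item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return round(low, 2), round(high, 2)


def monthly_cost(costs, videos_per_month):
    """Project the per-video range onto a monthly upload schedule."""
    low, high = total_cost(costs)
    return round(low * videos_per_month, 2), round(high * videos_per_month, 2)


print(total_cost(COSTS))
print(monthly_cost(COSTS, 30))
```

Swapping in your own prices, or a different number of clips per video, shows quickly which step dominates the budget.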

Content Types That Work Best

Educational tutorials
Explainer videos
Listicles (Top 10…)
News summaries
Fact/Trivia content
Product comparisons

Content to Avoid

AI videos work best for informational content. Avoid: interviews, live events, highly personal content, or anything requiring authentic human presence.

Common Problems and Fixes

YouTube upload fails with auth error

Refresh your OAuth token. Make sure your YouTube channel is verified and you’re using the correct Google account.

Video clips take too long

Generate fewer, longer clips (5-10 seconds each instead of many 3-second clips) so there are fewer generation jobs to wait for. Or switch to stock footage for transitions and B-roll.

Voiceover sounds robotic

Choose a higher-quality voice model in ElevenLabs. Write pauses into the script with an ellipsis (…) and mark the words that need emphasis so the delivery sounds less flat.

Script is too long or short

Specify exact duration in your prompt: “Exactly 8 minutes when read at normal pace.” Calculate word count (150 words = 1 minute).
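The word-count rule is easy to apply in code. This sketch uses the 150 words per minute figure from above; the function names are hypothetical.

```python
WORDS_PER_MINUTE = 150  # speaking pace assumed in this guide


def target_word_count(minutes, wpm=WORDS_PER_MINUTE):
    """Number of words a script needs to fill the given duration."""
    return int(minutes * wpm)


def estimated_minutes(script_text, wpm=WORDS_PER_MINUTE):
    """Estimated narration length of a script, in minutes."""
    return len(script_text.split()) / wpm


print(target_word_count(8))   # words needed for an 8-minute video
print(target_word_count(10))  # words needed for a 10-minute video
```

Putting the word target in the prompt (for an 8-minute video, about 1,200 words) tends to be more reliable than asking for a duration, and estimated_minutes lets you reject a script that comes back far too short before paying for the voiceover.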

Thumbnail text is unreadable

Generate thumbnail without text, then add text overlay using Canva or Figma. Keep text to 3-5 words max.

Your Action Checklist

Today

Set up Google Cloud project and enable YouTube API.
Get API keys: OpenAI, ElevenLabs, Runway/Kling.
Create Make account OR set up Python environment.

This Week

Build and test with 1-2 videos manually.
Refine prompts, adjust video quality.

Next Week

Set up automated daily/weekly uploads.

Frequently Asked Questions

Do AI videos get monetized on YouTube?

Yes, if the channel meets YouTube Partner Program requirements (1,000 subscribers and 4,000 valid public watch hours in the past 12 months). AI-generated content that provides value can rank and earn ad revenue.

How long does one video take?

Fully automated: 2-4 hours from trigger to upload. Most time is waiting for AI video generation. With human oversight: 4-6 hours.

Do I need a real voice?

No. Faceless channels work well with AI voiceovers. However, channels with human presenters tend to build stronger audiences. Consider starting with AI voice, then adding your face later.

What’s the best quality for YouTube?

Upload in 1080p minimum (4K if budget allows). Use H.264 codec. Bitrate: 8-12 Mbps for 1080p, 35-45 Mbps for 4K.

Can I automate comment responses?

Yes. Using the YouTube API, you can fetch comments and generate AI responses. However, use caution: AI responses to negative comments can backfire. Automate positive replies only.

You’re Ready!

Follow the steps above and you’ll have a working YouTube AI agent within a few hours. Start with simple scripts and stock footage, then add AI video generation as you refine your workflow.
