YouTube is the second-largest search engine in the world. But creating videos takes time: scripting, recording, editing, thumbnail design, and publishing. What if an AI agent could handle most of this for you?
In this guide, I’ll show you exactly how to build an AI agent that generates video scripts, creates AI visuals, produces voiceovers, and publishes directly to your YouTube channel. The finished agent:
- Accepts a topic as input
- Generates a complete video script with scene descriptions
- Creates AI voiceover narration
- Generates video clips for each scene
- Assembles everything into a final video
- Creates a YouTube thumbnail
- Uploads directly to YouTube
How It Works: The Video Pipeline
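The pipeline runs these stages in order: topic → script → voiceover → video clips → assembled video → thumbnail → upload. You can build it with Make (formerly Integromat) for a no-code setup, or with Python for more control. I’ll cover both.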
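One way to think about the pipeline is as a chain of stages. Each stage adds one artifact to a shared context. The sketch below assumes that design. `run_pipeline` and the `*_stage` functions are hypothetical names, and each stub body stands in for a real API call.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], None]


def run_pipeline(topic: str, stages: List[Stage]) -> Context:
    """Run each stage in order; every stage reads from and adds to one shared context."""
    log: List[str] = []
    context: Context = {"topic": topic, "log": log}
    for stage in stages:
        stage(context)
        log.append(stage.__name__)  # record which stages completed
    return context


# Hypothetical stub stages: each body stands in for a real API call.
def script_stage(ctx: Context) -> None:
    ctx["script"] = f"Script about {ctx['topic']}"


def voiceover_stage(ctx: Context) -> None:
    ctx["voiceover"] = "voiceover.mp3"


def clips_stage(ctx: Context) -> None:
    ctx["clips"] = ["clip_0.mp4", "clip_1.mp4"]


def assemble_stage(ctx: Context) -> None:
    ctx["video"] = "final_video.mp4"


def thumbnail_stage(ctx: Context) -> None:
    ctx["thumbnail"] = "thumbnail.png"


def upload_stage(ctx: Context) -> None:
    ctx["video_id"] = "placeholder-id"


STAGES: List[Stage] = [script_stage, voiceover_stage, clips_stage,
                       assemble_stage, thumbnail_stage, upload_stage]

if __name__ == "__main__":
    result = run_pipeline("how AI is changing video creation", STAGES)
    print(result["log"])
```

Keeping the stages independent makes it easier to rerun one failed step, such as clip generation, without paying for the script and voiceover again.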
Step 1: Set Up YouTube API Access
Create Google Cloud Project
Go to console.cloud.google.com and create a new project.
Enable YouTube Data API
In the sidebar, go to “APIs & Services” → “Library”. Search for “YouTube Data API v3” and enable it.
Create Credentials
Go to “APIs & Services” → “Credentials”. If prompted, configure the OAuth consent screen first and add your Google account as a test user. Then click “Create Credentials” → “OAuth client ID” and choose “Desktop app” (recommended for the Python script’s local login flow) or “Web application”.
Download JSON
Download your OAuth JSON file and store it securely. It contains your client_id and client_secret. For the Python method, save it as client_secrets.json in the script’s folder.
Step 2: Get Other API Keys
OpenAI (Script + Images)
Get API key – Used for script generation and thumbnail creation
ElevenLabs (Voiceover)
Get API key – Natural-sounding AI voices
Runway or Kling (Video Clips)
Get API key – Generates a short video clip for each scene
Method A: Build with Make (No-Code)
Make.com
Sign up for free at make.com. The free plan includes 1,000 operations per month.
Step 1: Create New Scenario
Open Make
Click “Create a new scenario”.
Add Trigger
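Search for the “Schedule” trigger and set it to run daily at your preferred time (e.g., 9 AM). Then add a module that supplies the day’s topic, for example a Google Sheets module that reads the next row of a topic list, so that {{topic}} in the prompt has a value.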
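If you take the Python route instead, you need your own scheduler. A cron entry works. Alternatively, a long-running loop can sleep until the next run. `next_run` below is a hypothetical helper that computes that time for a daily schedule (9 AM by default).

```python
from datetime import datetime, timedelta


def next_run(now: datetime, hour: int = 9, minute: int = 0) -> datetime:
    """Next daily run time: today if that time is still ahead, otherwise tomorrow."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

In a loop, `time.sleep((next_run(datetime.now()) - datetime.now()).total_seconds())` waits until the next run before calling the pipeline.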
Step 2: Generate Video Script
Add OpenAI Module
Click + → Search “OpenAI” → Choose “Create a Completion”.
Configure API
Enter your OpenAI API key.
Add Script Prompt
Copy the prompt template below into your OpenAI module, mapping {{topic}} to your topic field:
Write a YouTube video script about {{topic}}.
Requirements:
- Duration: 8-10 minutes
- Include hook (first 30 seconds to grab attention)
- 4-5 main sections, each 1-2 minutes
- End with call-to-action (subscribe, like, comment)
- Include timestamps for each section
- Add scene descriptions in brackets like [B-roll: city timelapse]
- End with 2-3 suggested video tags
Format:
Title: [Video title]
Hook: [Opening 30 seconds]
Section 1 [0:30-2:00]: [Content and scene]
Section 2 [2:00-4:00]: [Content and scene]
...
Tags: tag1, tag2, tag3
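Because the template asks for labeled lines, the response can be parsed mechanically. This minimal Python sketch assumes the model follows the format: one `Section N [m:ss-m:ss]:` line per section, with plain continuation lines allowed after it. `parse_script` and `to_seconds` are hypothetical helpers. Timestamps are converted to seconds so you can check the 8-10 minute target.

```python
import re
from typing import Dict, List, Optional

# "Section 1 [0:30-2:00]: text" -> number, start, end, text (hyphen or en dash accepted)
SECTION_RE = re.compile(r"^Section (\d+) \[(\d+:\d{2})\s*[-\u2013]\s*(\d+:\d{2})\]:\s*(.*)$")


def to_seconds(stamp: str) -> int:
    """Convert an m:ss timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def parse_script(text: str) -> Dict[str, object]:
    """Parse Title/Hook/Section/Tags lines; unlabeled lines continue the previous field."""
    title, hook = "", ""
    tags: List[str] = []
    sections: List[Dict[str, object]] = []
    current: Optional[str] = None  # field that unlabeled lines belong to
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_RE.match(line)
        if line.startswith("Title:"):
            title, current = line[len("Title:"):].strip(), None
        elif line.startswith("Hook:"):
            hook, current = line[len("Hook:"):].strip(), "hook"
        elif line.startswith("Tags:"):
            tags = [t.strip() for t in line[len("Tags:"):].split(",") if t.strip()]
            current = None
        elif match:
            number, start, end, body = match.groups()
            sections.append({"number": int(number), "start": to_seconds(start),
                             "end": to_seconds(end), "text": body})
            current = "section"
        elif current == "hook":
            hook += " " + line
        elif current == "section":
            sections[-1]["text"] += " " + line
    return {"title": title, "hook": hook, "sections": sections, "tags": tags}
```

The end time of the last section gives you a rough total length to compare with the requested duration.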
Step 3: Generate Voiceover
Add ElevenLabs Module
Click + → Search “ElevenLabs” → Choose “Convert Text to Speech”.
Connect API
Enter your ElevenLabs API key.
Configure Voice
Choose a voice (each voice has a name and an ID). Popular options: “Rachel” (friendly female), “Josh” (professional male). Select MP3 output.
Map Script Text
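Map the script content from Step 2 to the text field. Strip out the bracketed scene descriptions and section labels first, or the voice will read them aloud.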
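The script from Step 2 contains bracketed scene directions and timestamps. A 10-minute script may also exceed the per-request character limit of your ElevenLabs model and plan. Check the current limits; `max_chars=2500` below is only an assumed default. This minimal Python sketch strips bracketed text and splits the rest at sentence boundaries. `narration_chunks` is a hypothetical helper. Send each chunk as its own text-to-speech request and join the audio afterwards.

```python
import re
from typing import List

BRACKETS_RE = re.compile(r"\[[^\]]*\]")  # [B-roll: ...], [0:30-2:00], etc.


def narration_chunks(script: str, max_chars: int = 2500) -> List[str]:
    """Drop [bracketed] directions, then pack whole sentences into chunks of <= max_chars."""
    text = BRACKETS_RE.sub("", script)
    text = re.sub(r"\s+", " ", text).strip()
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = sentence  # a single over-long sentence becomes its own chunk
    if current:
        chunks.append(current)
    return chunks
```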
Step 4: Generate Video Clips
Extract Scene Descriptions
Use Make’s Text Parser “Match pattern” module with a regular expression to extract the bracketed [B-roll: description] scene descriptions from your script.
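The scene descriptions follow the `[B-roll: description]` convention from the prompt, so a regular expression can pull them out. You could paste the pattern below into the Text Parser's “Match pattern” module with global matching enabled. The Python wrapper is just a way to test the pattern, and `extract_scenes` is a hypothetical name.

```python
import re
from typing import List

# Captures the text after "B-roll:" up to the closing bracket, case-insensitively
SCENE_RE = re.compile(r"\[B-roll:\s*([^\]]+)\]", re.IGNORECASE)


def extract_scenes(script: str) -> List[str]:
    """Return every B-roll scene description in the order it appears."""
    return [match.strip() for match in SCENE_RE.findall(script)]
```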
Add HTTP Module
Add an Iterator so the next modules run once per scene. Then use the HTTP module to call the Runway or Kling API and generate a video clip for each scene.
Wait for Generation
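AI video generation typically takes 2-5 minutes per clip. Set up a polling loop (for example, a Repeater with a Sleep module) or a fixed wait step before fetching the finished clip.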
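Polling beats a fixed wait because it returns as soon as the clip is ready and fails loudly if it never is. A minimal Python sketch, assuming you supply a `check()` function. For Runway or Kling, it would fetch the task status, return the clip URL when finished, return `None` while running, and raise on failure. `poll_until_done` is a hypothetical helper. The default timeout of 10 minutes leaves headroom over the 2-5 minute typical generation time.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    check: Callable[[], Optional[str]],
    interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call check() every `interval` seconds until it returns a result or the timeout passes.

    check() returns None while the job is still running and should raise if it failed.
    Only time spent sleeping counts toward the timeout.
    """
    waited = 0.0
    while waited <= timeout:
        result = check()
        if result is not None:
            return result
        sleep(interval)
        waited += interval
    raise TimeoutError(f"generation not finished after {timeout:.0f} s")
```

The `sleep` parameter is injectable so the loop can be tested without actually waiting.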
If AI video generation is too slow or expensive, use the Pexels API instead to pull relevant stock footage based on scene keywords.
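A sketch of the Pexels fallback. It assumes the Pexels video search endpoint (`https://api.pexels.com/videos/search`), with the API key sent as-is in the `Authorization` header. It also assumes a response containing `videos[].video_files[]` entries with `file_type`, `width`, and `link`; confirm these against the Pexels API docs. `build_search_url`, `pick_video_file`, and `search_pexels` are hypothetical helpers. The picker chooses the widest MP4 that is no wider than 1920 px.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

PEXELS_SEARCH_URL = "https://api.pexels.com/videos/search"


def build_search_url(keywords: str, per_page: int = 5) -> str:
    """Search URL for landscape stock videos matching the scene keywords."""
    query = urllib.parse.urlencode(
        {"query": keywords, "orientation": "landscape", "per_page": per_page})
    return f"{PEXELS_SEARCH_URL}?{query}"


def pick_video_file(search_response: dict, max_width: int = 1920) -> Optional[str]:
    """Link of the widest MP4 file no wider than max_width, or None if nothing fits."""
    best = None  # (width, link)
    for video in search_response.get("videos", []):
        for file in video.get("video_files", []):
            width = file.get("width") or 0  # some entries may have no width
            if file.get("file_type") == "video/mp4" and width <= max_width:
                if best is None or width > best[0]:
                    best = (width, file["link"])
    return best[1] if best else None


def search_pexels(keywords: str, api_key: str) -> Optional[str]:
    """Query Pexels and return one clip link for the keywords (makes a network call)."""
    request = urllib.request.Request(build_search_url(keywords),
                                     headers={"Authorization": api_key})
    with urllib.request.urlopen(request, timeout=30) as response:
        return pick_video_file(json.load(response))
```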
Step 5: Assemble Video
Use InVideo or Pictory
These tools have APIs or can be integrated with Make.
Upload Clips + Audio
Send your video clips and voiceover to the video editor.
Add Subtitles + Music
Auto-generate captions. Add royalty-free background music.
Export Final Video
Download as MP4, 1080p or 4K.
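If you'd rather not depend on InVideo or Pictory, ffmpeg can handle a basic version of this step. This sketch assumes ffmpeg is installed, all clips share the same resolution and codec (clips from one generator usually do), and you already know the voiceover length (for example, from `ffprobe`). Five 5-second clips are far shorter than an 8-10 minute voiceover. So the concat list repeats the clips until they cover the narration, and `-shortest` trims the output to the audio. Subtitles and background music are left out. `write_concat_list` and `assemble_command` are hypothetical helpers.

```python
import math
from pathlib import Path
from typing import List


def write_concat_list(clips: List[str], audio_seconds: float, clip_seconds: float,
                      list_path: str = "clips.txt") -> str:
    """Write an ffmpeg concat-demuxer list that loops the clips to cover the voiceover."""
    loops = max(1, math.ceil(audio_seconds / (len(clips) * clip_seconds)))
    entries = [f"file '{Path(clip).resolve()}'" for clip in clips] * loops
    Path(list_path).write_text("\n".join(entries) + "\n")
    return list_path


def assemble_command(list_path: str, voiceover: str,
                     output: str = "final_video.mp4") -> List[str]:
    """ffmpeg arguments: clips as video, voiceover as audio, cut at the shorter stream."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", list_path,  # input 0: the clip list
        "-i", voiceover,                                 # input 1: the narration
        "-map", "0:v", "-map", "1:a",
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-c:a", "aac",
        "-shortest", output,
    ]
```

To run it: `subprocess.run(assemble_command(write_concat_list(clips, 540, 5), "voiceover.mp3"), check=True)`, where 540 is the voiceover length in seconds.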
Step 6: Create Thumbnail
Add OpenAI Image Module
Choose OpenAI’s image-generation module and select the DALL-E 3 model. Use the 1792x1024 size, the closest DALL-E 3 option to 16:9.
Thumbnail Prompt
“YouTube thumbnail for: [topic]. High contrast, bold text space on left, professional, attention-grabbing. 16:9 aspect ratio.”
Add Text Overlay
Use Canva or Figma to add your video title text to the thumbnail.
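DALL-E 3 only produces 1024x1024, 1792x1024, or 1024x1792 images. The 1792x1024 size (7:4) is slightly taller than 16:9. YouTube recommends 1280x720 thumbnails under 2 MB, so it helps to center-crop before adding text. The sketch below shows the crop arithmetic; `crop_box_16x9` is a hypothetical helper.

```python
from typing import Tuple


def crop_box_16x9(width: int, height: int) -> Tuple[int, int, int, int]:
    """Largest centered 16:9 box (left, top, right, bottom) inside a width x height image."""
    if width * 9 > height * 16:  # wider than 16:9: trim the sides
        new_w = height * 16 // 9
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    new_h = width * 9 // 16  # taller than (or exactly) 16:9: trim top and bottom
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)
```

For a 1792x1024 image this keeps rows 8 to 1016. With Pillow, `Image.open("thumbnail.png").crop(crop_box_16x9(1792, 1024)).resize((1280, 720))` gives a 1280x720 image. Saving it as JPEG keeps the file well under the size limit.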
Step 7: Upload to YouTube
Add YouTube Module
Search for “YouTube” → “Upload a Video”.
Connect Account
Authenticate with your Google account that has YouTube access.
Map Video Data
Title: [From script]
Description: [Script summary + links]
Tags: [From script]
Privacy: Public or Schedule
Upload Thumbnail
Attach your generated thumbnail image. Custom thumbnails require a verified YouTube account.
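Whether you map these fields in Make or call the API from Python, YouTube enforces a few limits. Titles can be up to 100 characters, descriptions up to 5,000 bytes, and tags about 500 characters in total. To schedule a video, upload it as private and set a publish time. This Python sketch builds a request body in the `videos.insert` shape that the Python script below uses. `build_video_body` is a hypothetical helper. The tag budget is approximate because YouTube also counts the quotation marks around tags that contain spaces.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional


def build_video_body(title: str, description: str, tags: List[str],
                     publish_at: Optional[datetime] = None,
                     privacy: str = "public", category_id: str = "28") -> Dict[str, dict]:
    """Build a videos.insert body; passing publish_at turns it into a scheduled upload."""
    status: Dict[str, object] = {"privacyStatus": privacy, "selfDeclaredMadeForKids": False}
    if publish_at is not None:
        # Scheduled videos start private and go live at publishAt (UTC, ISO 8601)
        status["privacyStatus"] = "private"
        status["publishAt"] = publish_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    # Approximate the 500-character tag budget: tag lengths plus one separator each
    kept: List[str] = []
    used = 0
    for tag in tags:
        cost = len(tag) + 1
        if used + cost > 500:
            break
        kept.append(tag)
        used += cost

    return {
        "snippet": {
            "title": title[:100],  # 100-character title limit
            # 5,000-byte description limit; drop any multi-byte character cut in half
            "description": description.encode("utf-8")[:5000].decode("utf-8", "ignore"),
            "tags": kept,
            "categoryId": category_id,  # 28 = Science & Technology
        },
        "status": status,
    }
```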
Method B: Build with Python
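Prerequisites:
- Python 3.8+
- pip install requests openai elevenlabs google-auth google-auth-oauthlib google-api-python-client
- All API keys ready, exported as environment variables (OPENAI_API_KEY, ELEVENLABS_API_KEY, RUNWAY_API_KEY)
- Your OAuth JSON file from Step 1 saved as client_secrets.json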
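```python
#!/usr/bin/env python3
"""
AI Agent: Auto-Create and Publish YouTube Videos
"""
import os
import time

import requests
from elevenlabs.client import ElevenLabs
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
from openai import OpenAI

# ============== CONFIGURATION ==============
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "your-key")
ELEVENLABS_API_KEY = os.environ.get("ELEVENLABS_API_KEY", "your-key")
RUNWAY_API_KEY = os.environ.get("RUNWAY_API_KEY", "your-key")

client = OpenAI(api_key=OPENAI_API_KEY)
elevenlabs = ElevenLabs(api_key=ELEVENLABS_API_KEY)

# Runway API base URL; verify endpoint paths and headers against Runway's current docs
RUNWAY_API_BASE = "https://api.dev.runwayml.com/v1"

# YouTube API scopes (youtube.upload covers both video and thumbnail uploads)
SCOPES = ["https://www.googleapis.com/auth/youtube.upload"]


# ============== FUNCTIONS ==============
def clean(text):
    """Trim whitespace and any [placeholder] brackets the model copied from the format."""
    return text.strip().strip("[]").strip()


def generate_script(topic):
    """Generate a YouTube video script in a fixed, parseable format."""
    prompt = f"""Write a YouTube video script about {topic}.
Requirements:
- 8-10 minute video
- Hook (first 30 seconds)
- 4-5 main sections with timestamps
- Call-to-action at end
- Scene descriptions in brackets [B-roll: description]
- 3-5 video tags
Format your response exactly like this:
TITLE: [Video Title]
TAGS: tag1, tag2, tag3
---
SCRIPT: [Full script text for voiceover]
SCENES: [Scene1]|[Scene2]|[Scene3]"""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


def generate_voiceover(script_text, output_file="voiceover.mp3"):
    """Generate an AI voiceover with ElevenLabs and save it as MP3."""
    # The API takes a voice ID, not a display name. 21m00Tcm4TlvDq8ikWAM has been the
    # ID of the premade "Rachel" voice; copy your chosen voice's ID from ElevenLabs.
    audio = elevenlabs.text_to_speech.convert(
        voice_id="21m00Tcm4TlvDq8ikWAM",
        text=script_text,
        model_id="eleven_multilingual_v2",
        output_format="mp3_44100_128",
    )
    # convert() streams the audio back as chunks of bytes
    with open(output_file, "wb") as f:
        for chunk in audio:
            f.write(chunk)
    return output_file


def generate_video_clip(description, duration=5, timeout=600):
    """Start a Runway generation task and poll it until the clip URL is ready."""
    headers = {
        "Authorization": f"Bearer {RUNWAY_API_KEY}",
        "X-Runway-Version": "2024-11-06",  # API version header; check the current value
    }
    # Endpoint and payload kept from the original example; confirm them in Runway's docs
    response = requests.post(
        f"{RUNWAY_API_BASE}/gen3_turbo/text_to_video",
        headers=headers,
        json={"prompt": description, "duration": duration, "aspect_ratio": "16:9"},
        timeout=30,
    )
    response.raise_for_status()
    task_id = response.json()["id"]

    # Poll the task instead of sleeping a fixed time and guessing the output URL
    deadline = time.time() + timeout
    while time.time() < deadline:
        task = requests.get(f"{RUNWAY_API_BASE}/tasks/{task_id}", headers=headers, timeout=30)
        task.raise_for_status()
        data = task.json()
        if data.get("status") == "SUCCEEDED":
            return data["output"][0]
        if data.get("status") == "FAILED":
            raise RuntimeError(f"Runway task {task_id} failed: {data}")
        time.sleep(10)
    raise TimeoutError(f"Runway task {task_id} not finished after {timeout} s")


def generate_thumbnail(topic):
    """Generate a YouTube thumbnail with DALL-E 3 and return its (temporary) URL."""
    response = client.images.generate(
        model="dall-e-3",
        prompt=(
            f"YouTube thumbnail for: {topic}. "
            "High contrast, bold text space on left, professional design."
        ),
        size="1792x1024",
    )
    return response.data[0].url


def download_file(url, filename):
    """Download a file from a URL to disk."""
    response = requests.get(url, timeout=120)
    response.raise_for_status()
    with open(filename, "wb") as f:
        f.write(response.content)
    return filename


def get_youtube_client():
    """Run the OAuth flow and return an authorized YouTube Data API client."""
    flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", SCOPES)
    credentials = flow.run_local_server(port=8080)
    return build("youtube", "v3", credentials=credentials)


def upload_to_youtube(video_path, title, description, tags, thumbnail_path):
    """Upload a video and its custom thumbnail to YouTube; returns the video ID."""
    youtube = get_youtube_client()
    request_body = {
        "snippet": {
            "title": title[:100],  # YouTube titles are limited to 100 characters
            "description": description,
            "tags": tags,
            "categoryId": "28",  # Science & Technology
        },
        "status": {
            # Uploads from unaudited API projects are restricted to private anyway
            "privacyStatus": "private",
            "selfDeclaredMadeForKids": False,
        },
    }
    # Resumable upload; chunksize=-1 sends the whole file in one request
    media = MediaFileUpload(video_path, chunksize=-1, resumable=True)
    request = youtube.videos().insert(part="snippet,status", body=request_body, media_body=media)
    response = None
    while response is None:
        _, response = request.next_chunk()
    video_id = response["id"]

    # Custom thumbnails need a verified account and a file under 2 MB
    youtube.thumbnails().set(videoId=video_id, media_body=MediaFileUpload(thumbnail_path)).execute()
    return video_id


# ============== MAIN FUNCTION ==============
def main(topic):
    print(f"Creating video about: {topic}")

    # 1. Generate script and split the labeled parts of the response
    raw = generate_script(topic)
    title = clean(raw.split("TITLE:")[1].split("TAGS:")[0])
    tags = [clean(t) for t in raw.split("TAGS:")[1].split("---")[0].split(",") if clean(t)]
    script = clean(raw.split("SCRIPT:")[1].split("SCENES:")[0])
    scenes = [clean(s) for s in raw.split("SCENES:")[1].split("|") if clean(s)]

    # 2. Generate voiceover
    print("Generating voiceover...")
    voiceover_path = generate_voiceover(script)

    # 3. Generate video clips (limited to 5 to control cost)
    print("Generating video clips...")
    scenes = scenes[:5]
    video_clips = []
    for i, scene in enumerate(scenes):
        print(f"Generating clip {i + 1}/{len(scenes)}...")
        clip_url = generate_video_clip(scene)
        video_clips.append(download_file(clip_url, f"clip_{i}.mp4"))

    # 4. Generate thumbnail (YouTube rejects thumbnails over 2 MB; compress if needed)
    print("Generating thumbnail...")
    thumb_path = download_file(generate_thumbnail(topic), "thumbnail.png")

    # 5. Upload to YouTube. final_video.mp4 must first be assembled from
    #    video_clips and voiceover_path (see Step 5), e.g. with ffmpeg.
    final_video = "final_video.mp4"
    if not os.path.exists(final_video):
        print(f"{final_video} not found: assemble {len(video_clips)} clips + {voiceover_path} first.")
        return
    print("Uploading to YouTube...")
    video_id = upload_to_youtube(
        video_path=final_video,
        title=title,
        description=f"Today we explore {topic}.",
        tags=tags,
        thumbnail_path=thumb_path,
    )
    print(f"Video uploaded! ID: {video_id}")


if __name__ == "__main__":
    main("how AI is changing video creation")
```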
Cost Breakdown
- Script generation: per video (GPT-4o)
- Voiceover: per 10-min video (ElevenLabs)
- Video clips: 5 clips at $0.10-0.20 each ($0.50-1.00 total)
- Thumbnail: DALL-E 3
- Total: per video, vs $200-500+ for traditional production
Content Types That Work Best
Common Problems and Fixes
YouTube upload fails with auth error
Video clips take too long
Voiceover sounds robotic
Script is too long or short
Thumbnail text is unreadable
Your Action Checklist
- Today: Set up a Google Cloud project and enable the YouTube Data API.
- Today: Get API keys for OpenAI, ElevenLabs, and Runway or Kling.
- Today: Create a Make account OR set up your Python environment.
- This Week: Build and test the pipeline manually with 1-2 videos.
- This Week: Refine prompts and adjust video quality.
- Next Week: Set up automated daily or weekly uploads.

