Category: Black-Tech / AI-Agents
Tags: Python, Browser-Use, AI Agent, Automation, DeepSeek
Slug: browser-use-ai-agent-guide
—
The Pain: We Waste Our Lives Filling Forms
Picture this scenario:
You need to log into 5 competitor websites every morning to copy prices into Excel. Or maybe you need to grab a train ticket and have to stare at the screen refreshing constantly. Or perhaps you want to submit 100 job applications, but every site demands you re-type your “Education History” manually.
This isn’t work. This is “Digital Imprisonment”.
In the past, we tried using Selenium or Puppeteer scripts. But the moment a website changed a single `div` ID, the script would break. You ended up spending more time fixing bugs than actually doing the work.
Times have changed. Today I’m introducing a tool called Browser-Use. It’s not a rigid script that clicks coordinates; it’s an AI Agent with “Vision” and a “Brain”.
The Logic: Seeing the Web Like a Human
Browser-Use connects LangChain and Playwright. When you give it a command like “Go to Amazon and send me the price of iPhone 16”:
- It launches a real Chrome browser.
- It “sees” the page via screenshots, identifying where the search box is and where the “Price” label is (even without specific IDs).
- It automatically plans actions: Click -> Type -> Scroll -> Extract.
It’s like an intern sitting at your computer. You give the order; it runs the errands.
🚀 Quick Start (Installation)
We stick to the Hise principle: No fluff. Just run it.
You need: Python 3.11+ and an LLM API Key (DeepSeek or OpenAI recommended).
1. Install Dependencies
# Install core library
pip install browser-use langchain-openai
# Install browser drivers (Mandatory)
playwright install
2. Code Your Clone (agent.py)
Here are the 50 lines of code worth a million bucks:
import os
import asyncio
from langchain_openai import ChatOpenAI
from browser_use import Agent
# Set your API Key (Example using DeepSeek, compatible with OpenAI format)
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"
os.environ["OPENAI_API_BASE"] = "https://api.deepseek.com" # If using DeepSeek
async def main():
# 1. Define the Brain
llm = ChatOpenAI(model="deepseek-chat")
# 2. Define the Task
agent = Agent(
task="Go to Google, search for 'Hise.lol', find the title of the first article and print it.",
llm=llm,
)
# 3. Execute
result = await agent.run()
print(result)
if __name__ == "__main__":
asyncio.run(main())
3. Witness the Magic
python agent.py
You will see a browser pop up automatically, typing, clicking, and closing like a ghost. Finally, your terminal will receive the report.
Advanced: What Can It Do?
Beyond simple searches, it can handle complex workflows:
- E-commerce Monitoring: Log into backends daily, screenshot sales reports, and send them to your Telegram.
- Social Media Ops: Log into X (Twitter), search for specific keywords, like posts, and draft replies.
- Data Cleaning: Scrape data from a messy government public notice site and organize it into clean JSON.
How to Keep It Running 24/7?
Running this on your laptop is cool, but it stops when you close the lid. If you want a true “Passive Income” employee, you need to deploy it to the cloud.
But there’s a catch: Cloud servers (VPS) usually don’t have screens (Headless), which triggers many anti-bot mechanisms.
The Solution: Combined with our previous black-tech guide “ClawSimple: Solving the Python Persistence Problem”, you can easily deploy this Agent on a High-Performance Vultr VPS ($100 Free Credit) using Xvfb virtual screen technology for 24/7 unattended operation.
Conclusion
AI Agents are reshaping how we interact with the internet. In the future, you won’t “visit” the internet; you will “command” it.
Project Source: GitHub – browser-use