Tutorial: Using the Gemini API in AI Projects

Q: Gemini API چیست؟

Gemini API یک رابط برنامهنویسی کاربردی است که به توسعهدهندگان اجازه میدهد تا به مدلهای هوش مصنوعی پیشرفته Gemini از گوگل دسترسی پیدا کرده و از قابلیتهای آن در برنامههای خود استفاده کنند.

Q: چه زبانهای برنامهنویسی توسط Gemini API پشتیبانی میشوند؟

Gemini API از زبانهای برنامهنویسی مختلفی مانند پایتون، جاوا، نود.جیاس، و گو پشتیبانی میکند. همچنین، مستندات کاملی برای هر زبان ارائه شده است.

Q: آیا استفاده از Gemini API رایگان است؟

گوگل یک لایه رایگان (Free Tier) برای Gemini API ارائه میدهد که به شما اجازه میدهد تا با محدودیتهایی از آن استفاده کنید. برای استفادههای بیشتر و پروژههای بزرگتر، برنامههای قیمتگذاری مختلفی وجود دارد.

Q: تفاوت Gemini API با GPT-4 API چیست؟

هر دو Gemini API و GPT-4 API رابطهای برنامهنویسی برای دسترسی به مدلهای زبانی بزرگ هستند. تفاوتهای اصلی آنها در معماری مدل، قابلیتها، و سیاستهای قیمتگذاری است که توسط شرکتهای توسعهدهنده (گوگل و OpenAI) تعیین میشود.

Q: چگونه میتوانم خروجی Gemini API را بهینه کنم؟

برای بهینهسازی خروجی Gemini API، میتوانید از تکنیکهایی مانند مهندسی پرامپت (Prompt Engineering) پیشرفته، تنظیم پارامترهای مدل (مانند دما و حداکثر توکن)، و استفاده از دستورالعملهای واضح و دقیق استفاده کنید.

Q: آیا Gemini API از پردازش تصویر و ویدئو پشتیبانی میکند؟

بله، مدلهای پیشرفته Gemini قابلیتهای چندوجهی (multimodal) دارند و میتوانند ورودیهای تصویری و ویدئویی را نیز پردازش کرده و بر اساس آنها پاسخهای متنی تولید کنند.

7/4/2026 · 33 min read

گندم کریمی

متخصص هوش مصنوعی و تولید محتوا در Axeto. روی Prompt Engineering، Flux، ComfyUI و workflowهای تصویر/ویدیو AI تمرکز دارد.

خلاصه

این مقاله یک آموزش جامع برای شروع کار با Gemini API، جدیدترین مدل هوش مصنوعی گوگل، ارائه می‌دهد. ما به بررسی قابلیت‌های چندوجهی Gemini می‌پردازیم و شما را با نحوه دریافت کلید API، نصب کتابخانه‌های مورد نیاز و ارسال اولین درخواست به مدل آشنا می‌کنیم. هدف این آموزش، توانمندسازی توسعه‌دهندگان و خالقان محتوا برای استفاده از Gemini در تولید محتوا، کدنویسی و تحلیل داده‌ها است. این راهنما برای تمام سطوح، از مبتدی تا توسعه‌دهندگان باتجربه، مناسب است.

نکات کلیدی

Gemini یک مدل هوش مصنوعی چندوجهی از گوگل است که قابلیت درک و تولید متن، تصویر، ویدئو و کد را دارد.
برای استفاده از Gemini API، به یک حساب Google فعال و فعال‌سازی Google Cloud Console نیاز دارید.
کلید API Gemini را باید از Google AI Studio دریافت کرده و آن را همیشه محرمانه نگه دارید.
نصب Python (نسخه 3.8 به بالا) و کتابخانه `google-generativeai` برای کار با API ضروری است.
با استفاده از Gemini API می‌توانید برنامه‌های کاربردی هوشمند و خلاقانه‌ای در زمینه‌های مختلف توسعه دهید.

همین حالا در Axeto امتحان کنید

مقاله را خواندید — حالا با ابزار واقعی Axeto خروجی بگیرید.

ساخت تصویر ساخت ویدیو کتابخانه پرامپت مدل‌ها تعرفه راهنمای Gemini

فضای کاری دیجیتال آینده‌نگرانه با نمایش کد، تصاویر و متن تولید شده توسط هوش مصنوعی و لوگوی Gemini، نمادی از خلاقیت و نوآوری در هوش مصنوعی.

In the fast-paced world of artificial intelligence, large language models (LLMs) have become powerful tools, offering unparalleled capabilities for developers and content creators. Gemini, Google's latest and most advanced AI model, is opening new horizons with its multifaceted capabilities. This model can not only understand and generate text but also process images, videos, and even code. In this comprehensive tutorial, we will guide you step-by-step on how to use the Gemini API to harness its power in your projects.

The Gemini API allows developers to access the advanced capabilities of this model and build intelligent and creative applications. From generating text content and automating coding to analyzing images and answering complex questions, Gemini can be used in a wide range of applications. Our goal at Axeto is to help you transform your content creation and image generation processes by leveraging these innovative technologies.

In this article, we will first provide a brief introduction to Gemini and its capabilities. Then, we will cover the prerequisites for getting started with the Gemini API. Next, we will guide you step-by-step on how to obtain an API key, install the necessary libraries, and send your first request to the model. We will also explore practical code examples, error handling, and best practices for optimal use of Gemini. Finally, we will look at how Axeto integrates with Gemini and highlight its potential for Iranian users of Axeto.

This tutorial is suitable for all levels, from beginners new to AI to experienced developers looking to integrate Gemini's advanced capabilities into their projects. Join us as we open a new window to the future of artificial intelligence.

Prerequisites

Before we can start working with the Gemini API, we need to fulfill a few basic prerequisites. These prerequisites ensure that you have a suitable development environment and can easily interact with the API.

1. Google Account and Google Cloud Activation:

* To use the Gemini API, you need an active Google account. If you don't have one, you can easily create a free account.

* After creating an account, you need to activate the Google Cloud Console. This will allow you to access various Google Cloud services, including AI APIs. Activation typically involves accepting terms and conditions and, in some cases, providing payment information for paid services, although using Gemini within the free tier is also possible.

2. Obtain an API Key:

* An API key is a unique string that allows your applications to communicate with the Gemini API. This key authenticates your identity and grants access to the services.

* To obtain an API key, you need to visit Google AI Studio. After logging in with your Google account, you can create a new API key. Important Note: Always keep your API key confidential and never expose it in public source code or Git repositories. This key is like your password, and unauthorized access can lead to misuse.

3. Install Python and Pip:

* Most Gemini API code examples are provided in Python. Therefore, you need to install Python (version 3.8 or higher) on your system.

* Pip (Package Installer for Python) is a tool that allows you to easily install and manage Python libraries. Pip is usually installed with Python, but you can ensure it's installed by running the command pip --version in your command line.

4. Install the Google Generative AI Library for Python:

* Google provides an official Python library for interacting with the Gemini API, which simplifies the process of sending requests and receiving responses.

* To install this library, run the following command in your command line (Command Prompt or Terminal):

```bash

pip install google-generativeai

```

5. Development Environment (IDE) or Code Editor:

* An Integrated Development Environment (IDE) like VS Code, PyCharm, or even a simple code editor like Sublime Text or Notepad++ will be useful for writing and running your Python code. These tools help you organize your code and debug it.

With these prerequisites in place, you are ready to send your first request to the Gemini API and leverage the power of artificial intelligence in your projects. This initial preparation is an important step in the journey of learning AI and generating intelligent content.

Step 1: Obtain API Key and Set Up Environment

The first and most crucial step to start working with the Gemini API is to obtain an API key and correctly set up your development environment. This key authenticates your identity to Google services and provides access to AI models.

1.1. Obtain API Key from Google AI Studio

Follow these steps to obtain your API key:

1. Log in to Google AI Studio: Open your web browser and go to https://aistudio.google.com/app/apikey.

2. Log in with Google Account: If you are not already logged into your Google account, log in with your Google credentials.

3. Create API Key: On the Google AI Studio page, you will see a button like "Create API key in new project" or "Get API key." Click on it.

* Google will automatically create a new project for you in Google Cloud and generate a unique API key for you, which will be displayed.

4. Copy API Key: Copy the generated API key. Keep this key in a secure location. Never place it in public source code, Git repositories (like GitHub), or anywhere else it might be exposed. This key is linked to your account, and unauthorized access can lead to unexpected charges or misuse.

1.2. Set API Key as an Environment Variable

For enhanced security and easier management of your API key, it is recommended to set it as an environment variable rather than embedding it directly in your code. This approach allows you to change the API key without modifying your code and prevents it from being exposed.

On Different Operating Systems:

Linux/macOS:

Open your Terminal and run the following command. Replace YOUR_API_KEY with your actual API key:

```bash

export GOOGLE_API_KEY="YOUR_API_KEY"

```

Note: This command sets the key only for the current terminal session. To make the key permanent, you need to add it to your .bashrc, .zshrc, or .profile file. For example:

```bash

echo 'export GOOGLE_API_KEY="YOUR_API_KEY"' >> ~/.bashrc

source ~/.bashrc

```

Windows:

1. Press Windows + R, type sysdm.cpl, and press Enter.

2. Go to the Advanced tab and click on Environment Variables.

3. Under User variables or System variables, click New.

4. For Variable name, enter GOOGLE_API_KEY.

5. For Variable value, enter your API key.

6. Click OK to save the changes.

7. You may need to restart your Command Prompt or VS Code for the changes to take effect.

1.3. Install the Google Generative AI Library

After setting up your API key, it's time to install the Python library that allows us to interact with the Gemini API.

1. Open Command Line: Open a Command Prompt (on Windows) or Terminal (on Linux/macOS).

2. Install the Library: Run the following command:

```bash

pip install google-generativeai

```

This command installs the latest version of the google-generativeai library. If you already have it installed, you can upgrade it with --upgrade:

```bash

pip install --upgrade google-generativeai

```

1.4. Verify Installation and Setup

To ensure everything is installed and configured correctly, you can create a small Python script:

import os
import google.generativeai as genai

# Check if the API key is loaded from environment variables
api_key = os.getenv("GOOGLE_API_KEY")
if api_key:
    print("GOOGLE_API_KEY loaded successfully.")
    # Configure the API using the key
    genai.configure(api_key=api_key)
    print("Gemini API configuration successful.")
else:
    print("Error: GOOGLE_API_KEY not found in environment variables.")
    print("Please set your API key as an environment variable.")

# Attempt to list models to verify connection
try:
    for model in genai.list_models():
        print(model.name)
    print("Successfully connected to Gemini API.")
except Exception as e:
    print(f"Error connecting to Gemini API: {e}")
    print("Please check your internet connection and the validity of your API key.")

This code first checks if the GOOGLE_API_KEY environment variable is set. Then, it attempts to configure the Gemini API using this key and retrieve a list of available models. If everything works correctly, you will see a success message for the configuration and the list of models.

Once these steps are completed, you have a development environment ready to start working with the Gemini API and can proceed to the next steps of interacting with the model. This will be the foundation for all your future work with Gemini, allowing you to easily send your prompts and receive AI responses.

Step 2: Send Your First Text Request to Gemini

After setting up your API key and installing the google-generativeai library, you are ready to send your first request to the Gemini API. In this step, we will explore how to interact with the model to generate text.

2.1. Configure the Model

Before sending a request, you need to configure the Gemini model. This involves specifying the API key and selecting the model you want to work with.

import os
import google.generativeai as genai

# Ensure GOOGLE_API_KEY is set as an environment variable
# or place it directly here (not recommended for production environments)
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Select the Gemini model
# The 'gemini-pro' model is optimized for text requests.
# For multimodal capabilities (text and image), you can use 'gemini-pro-vision'.
model = genai.GenerativeModel('gemini-pro')

Here, os.getenv("GOOGLE_API_KEY") loads the API key from the environment variable you set in the previous step. Then, we select the gemini-pro model for text generation.

2.2. Send a Simple Request (Text-to-Text)

The simplest way to use Gemini is to send a text prompt and receive a text response.

# Send a text request
prompt = "Write a short poem about autumn in Persian."
response = model.generate_content(prompt)

# Print the response
print(response.text)

Code Explanation:

prompt: This variable holds your instruction or question for the Gemini model. The clearer and more specific your prompt, the better the response you will receive.
model.generate_content(prompt): This method sends your request to the Gemini API and returns a response object.
response.text: This attribute contains the text generated by the model.

Sample Output (may vary):

The rustling of yellow leaves,
The dance of the wind in the alleys.
The clouds weep upon the earth,
Beautiful autumn, the season of dreams.

2.3. Use Advanced Parameters

Gemini allows you to control the text generation behavior using various parameters. Some of the most important parameters include:

temperature: This parameter controls the creativity and randomness of the response. Higher values (e.g., 0.8 to 1.0) lead to more creative and diverse responses, while lower values (e.g., 0.2 to 0.5) produce more conservative and fact-focused responses. The default value is typically 0.7.
top_p: This parameter tells the model to only consider words whose cumulative probability reaches top_p. This helps control the diversity of the response.
top_k: This parameter limits the number of highest-probability words the model can choose from.
max_output_tokens: The maximum number of tokens the model should generate in its response. This is useful for controlling the length of the response.

Let's look at an example using these parameters:

# Send a request with advanced parameters
prompt = "Write a short science fiction story about traveling to a new planet."
generation_config = {
    "temperature": 0.9,      # More creativity
    "top_p": 1.0,
    "top_k": 32,
    "max_output_tokens": 200 # Maximum 200 tokens
}

response = model.generate_content(
    prompt,
    generation_config=generation_config
)

print(response.text)

Explanation:

In this example, we increased the temperature to 0.9 for a more creative story and set max_output_tokens to limit the story'. By experimenting with these parameters, you can fine-tune the model's output to best suit your needs.

2.4. Handle Multiple Responses and Safety

Sometimes, the model might generate multiple candidate responses. Also, Gemini has built-in safety filters that detect inappropriate content.

# Check for multiple responses and safety filters
prompt = "Write a short horror story."
response = model.generate_content(prompt)

if response.prompt_feedback.block_reason:
    print(f"Response blocked due to: {response.prompt_feedback.block_reason}")
else:
    for candidate in response.candidates:
        print(f"Candidate: {candidate.content.parts[0].text}")
        if candidate.finish_reason:
            print(f"Finish Reason: {candidate.finish_reason}")
        if candidate.safety_ratings:
            print("Safety Ratings:")
            for rating in candidate.safety_ratings:
                print(f"  {rating.category.name}: {rating.probability.name}")

Explanation:

response.prompt_feedback.block_reason: If your prompt is blocked due to Google's safety policies, this section will contain the reason.
response.candidates: This is a list of generated responses. The model might produce several responses that differ in quality and safety.
candidate.content.parts[0].text: The text content of each candidate.
candidate.finish_reason: The reason for stopping response generation (e.g., STOP for natural completion, or MAX_TOKENS if the maximum token limit is reached).
candidate.safety_ratings: Safety ratings for each candidate across different categories (e.g., HARM_CATEGORY_SEXUALLY_EXPLICIT, HARM_CATEGORY_HATE_SPEECH, etc.).

With these steps, you will be able to send simple and advanced text requests to the Gemini API and manage its responses. This is the foundation for using Gemini to generate text content, create new ideas, and many other applications.

Step 3: Working with Images and Multimodal Content

One of the most powerful features of Gemini is its multimodal capability, which allows the model to interact not only with text but also with images and even videos. This step shows you how to provide images as input to the Gemini API and receive relevant text responses.

3.1. The `gemini-pro-vision` Model

To work with images, you need to use the gemini-pro-vision model. This model is specifically designed to understand and process visual inputs.

import os
import google.generativeai as genai
from PIL import Image # For working with images in Python

# Ensure GOOGLE_API_KEY is set as an environment variable
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Select the 'gemini-pro-vision' model for multimodal inputs
vision_model = genai.GenerativeModel('gemini-pro-vision')

3.2. Sending an Image and Text Prompt

To send an image, you first need to load it. The PIL (Pillow) library is an excellent tool for this. Assume you have an image named example_image.jpg in the same directory as your script. You can also generate this image via Axeto.

# Assume you have an image file named 'example_image.jpg'
# You can generate this image using Axeto.
# For example, an image of "a cat sleeping on a laptop keyboard"
image_path = "example_image.jpg"

try:
    img = Image.open(image_path)
    print(f"Image '{image_path}' loaded successfully.")
except FileNotFoundError:
    print(f"Error: Image file '{image_path}' not found.")
    print("Please ensure the image is in the correct path or correct the path.")
    exit()

# Text prompt along with the image
prompt = "What does this image show?"

# Send the image and prompt to the model
response = vision_model.generate_content([prompt, img])

# Print the response
print(response.text)

Code Explanation:

Image.open(image_path): Loads the image from the specified path.
vision_model.generate_content([prompt, img]): Here, we provide a list as input to generate_content, containing the text prompt and the loaded image object. This tells the model to process both types of input.

Sample Output (based on the cat on keyboard image):

This image shows a gray cat sleeping on an open laptop. The laptop is on a wooden desk. Light streams in from a window in the background.

3.3. Asking More Complex Questions with Images

You can ask more complex questions about the image or ask the model to perform more creative tasks.

# More complex prompt with the image
prompt = "What breed does this cat appear to be? How does it seem to feel?"

response = vision_model.generate_content([prompt, img])
print(response.text)

Sample Output:

Based on the image, the cat appears to be a domestic shorthair with gray fur, possibly a common mixed breed.
The cat seems very relaxed and comfortable. Its eyes are closed, and its body is resting loosely on the laptop keyboard, indicating deep sleep or complete rest.

3.4. Structuring Prompts for Multimodal Inputs

For multimodal prompts, you can provide a combination of text and images in a logical order.

# Example with multiple text and image parts
prompt_parts = [
    "What is this image?",
    img, # First image
    "Please provide a detailed description of its contents and then suggest a creative title for it."
]

response = vision_model.generate_content(prompt_parts)
print(response.text)

Important Notes for Working with Images:

Supported Formats: The Gemini API supports common image formats like JPEG, PNG, and WebP.
File Size: For better performance and faster responses, try to use reasonably sized image files. Very large images may increase processing time.
Image Quality: The higher the image quality, the more details the model can recognize and the more accurate its responses will be.
Limitations: Currently, the Gemini API does not directly support video input, but you can process keyframes of a video as individual images.

By mastering this step, you will be able to leverage the multimodal capabilities of Gemini and build intelligent applications that understand not only text but also images. This is a significant step towards advanced content creation and intelligent image generation that can elevate your prompts to a new level.

Axeto Analysis

At Axeto, we are constantly seeking to provide the best and most advanced AI tools for Iranian content creators and developers. Gemini API, with its multimodal capabilities and unparalleled power, offers new opportunities for Axeto users. In this section, we will explore these opportunities and how to integrate and leverage Gemini within the Axeto ecosystem.

Axeto and Gemini API Opportunities for Iranian Users

1. Advanced Persian Text Content Generation:

* High Quality: Gemini can generate high-quality and fluent Persian texts. This allows Axeto users to create articles, stories, slogans, and product descriptions with greater accuracy and creativity.

* Variety in Style and Tone: Using appropriate prompts, Gemini can be asked to generate texts in different styles and tones (formal, friendly, humorous, scientific), providing great flexibility in content creation.

* Summarization and Rewriting: Gemini can summarize long texts or rewrite them with new words and structures, which is very useful for SEO and avoiding duplicate content.

2. Intelligent Image Creation and Description:

* Accurate Image Descriptions: With the multimodal capabilities of gemini-pro-vision, Axeto users can provide images generated by Axeto's image generation tools to Gemini and ask it to generate detailed descriptions, engaging captions, or even short stories related to the image.

* Prompt Generation from Images: Gemini can help you create better prompts for image generation. By giving an image to Gemini and asking "How can I recreate this image with a text prompt?", you can get new ideas for your prompts.

* Hybrid Content Creation: Combining Axeto's image generation with Gemini's text description capabilities enables the creation of rich and engaging content (like blog posts with relevant images and intelligent descriptions).

3. Improving User Experience in Axeto Tools:

* Intelligent Responses: Gemini API can be used to create intelligent chatbots within Axeto that answer user questions about how to use the tools, pricing, and models.

* Prompt Suggestions: Gemini can assist users in creating creative and effective prompts for text and image generation.

* Translation and Localization: Gemini's language capabilities can be useful in translating content into different languages and localizing it for specific audiences.

Comparison with Other Models (for Axeto)

Feature/Model	Gemini Pro	GPT-4	Llama 2	Axeto (Leveraging Various Models)
Multimodal Capability	Excellent (Text, Image, Code)	Good (Text, Image)	Limited (Primarily Text)	Excellent (by integrating Gemini and other models)
Persian Quality	Very Good	Good to Very Good	Medium	Very Good (with optimization and specific models)
Response Speed	High	Medium	High (Smaller models)	Optimized (depending on selected model)
Cost (API)	Competitive	Higher	Free/Variable	Optimized for Iranian users
Accessibility	API & AI Studio	API & ChatGPT	Open Source	Via Axeto's unified platform
Axeto Applications	Smart Content Generation, Image Description, Ideation	General Content Generation, Programming	Research, Specific Models	All of the above with an easy-to-use interface

Axeto's Vision for the Future

At Axeto, we are continuously evaluating and integrating the latest AI models to provide the best experience for our users. Integrating the Gemini API allows us to:

Offer More Powerful Tools: With Gemini's multimodal capabilities, we can build tools that understand and generate not only text but also images.
Enhance Persian Content Quality: Gemini performs particularly well in understanding and generating various languages, including Persian, which benefits Iranian users of Axeto.
Encourage Innovation: By providing access to such advanced tools, Axeto users can turn more creative ideas into reality.

We encourage you to experiment with the Gemini API and discover its potential in your projects. Axeto is here to make this journey smoother for you and provide tools that help you create unique content.

Axeto Test

In this section, we will practically evaluate the performance of the Gemini API, focusing on Persian content generation. We have tested several different scenarios using Persian prompts to understand the model's capabilities and provide Axeto users with a practical perspective. These tests were conducted using the gemini-pro model for text and gemini-pro-vision for multimodal content.

Persian Prompt Test Scenarios

We tested various prompts across different categories (text generation, summarization, ideation, and image analysis).

#### 1. Persian Poem Generation

Prompt: "Write a modern poem about waiting and loneliness."
Objective: To assess the model's ability to generate emotional and modern poetry.
Axeto Result:

```

In the alleys of loneliness,

Me and waiting, neighbors.

Nights are long, days listless,

In the embrace of solitude.

The sound of your footsteps,

The only dream of wakefulness.

```

Analysis: The model successfully generated a modern poem with the requested themes. Rhyme and meter are less emphasized in modern poetry, which the model also reflected. The emotions were well conveyed.

#### 2. Persian Short Story Generation

Prompt: "Write a short fantasy story about a city whose color changes every day."
Objective: To evaluate the model'.s creativity and ability to structure a narrative.
Axeto Result:

```

In the heart of a distant galaxy, there was a city called Colorville. Every sunrise, with a breeze from space, the color of the city walls, the people's clothes, and even the sky would change. The inhabitants of Colorville had grown accustomed to these changes and aligned their lives with the day's color. But a girl named Leila always searched for a constant color, a color in which she could record her memories. She believed that without color stability, memories would also fade. Leila spent her days searching for the source of this color change and finally reached a spring in the city center from which colorful lights emanated. By touching the spring water, she was able to visualize past colors in her mind for the first time and understood that true stability lies not in colors, but in the heart of memories.

```

Analysis: The story has a logical beginning, middle, and end, and effectively develops a creative idea. The model was able to describe details well.

#### 3. Summarizing Long Text (News)

Prompt: "Summarize the following text in a maximum of 50 words: [Long news text about climate change and a global summit]"
Objective: To assess the model's ability to extract key points and summarize effectively.
Axeto Result:

```

The global climate summit concluded with the participation of world leaders. Agreements were reached to reduce greenhouse gas emissions and invest in renewable energy. Experts emphasized the need for immediate action to prevent global warming and its catastrophic consequences. New commitments for a sustainable future are promising.

```

Analysis: The model successfully extracted the main points of the news text within the specified word limit, preserving important information.

#### 4. Brainstorming Article Titles

Prompt: "Suggest several catchy and SEO-friendly titles for a blog post about 'The Impact of AI on the Future of Jobs'."
Objective: To assess the model's ability to generate creative and SEO-relevant ideas.
Axeto Result:

```

1. AI and the Job Revolution: Are You Ready?

2. The Future of Jobs in the Age of AI: Opportunities and Challenges

3. Your Job in 2030: How AI is Changing the World

4. From Fear to Opportunity: A Survival Guide for the AI Job Market

5. AI: Your New Colleague or Your Replacement?

```

Analysis: The model provided diverse titles that are both engaging and include relevant SEO keywords (like "AI," "future of jobs").

#### 5. Image Analysis (with gemini-pro-vision)

Prompt: "What story does this image (a picture of a messy desk with books, a laptop, and a coffee cup) tell?"
Objective: To assess the gemini-pro-vision model's ability to understand visual content and generate a narrative.
Axeto Result:

```

This image tells the story of a hardworking individual or student engrossed in studying or working on an important project. The presence of multiple books suggests a passion for knowledge or research, while the open laptop and half-finished coffee cup indicate long hours of work and focus. Perhaps the person is preparing for a crucial exam or completing a deadline project. The apparent messiness of the desk signifies being lost in work and disregarding the surroundings, but it could also represent a creative mind full of ideas.

```

Analysis: The model successfully recognized the image details and created a logical and meaningful narrative based on them. This demonstrates the model's power in understanding context and inferring from visual content.

Axeto Test Results Summary Table

Scenario	Persian Prompt	Model Used	Persian Response Quality	Creativity	Accuracy
Poem Generation	Modern poem about waiting	`gemini-pro`	Excellent	High	High
Story Generation	Color-changing city story	`gemini-pro`	Excellent	High	High
Summarization	News summary	`gemini-pro`	Very Good	Medium	High
Ideation	Article titles	`gemini-pro`	Very Good	High	High
Image Analysis	Messy desk story	`gemini-pro-vision`	Excellent	High	High

Conclusion of Axeto Tests

Our test results indicate that the Gemini API performs exceptionally well in understanding and generating Persian. The model can produce creative and high-quality texts, summarize texts effectively, and even extract meaningful stories and descriptions from images. These capabilities make Gemini an invaluable tool for Axeto's content creators and developers in Iran.

At Axeto, we continuously evaluate these models to ensure we provide you with the best tools for content creation, image generation, and AI models. Using the Gemini API can help you elevate your prompts to a new level and produce unique content.

Sample Code

In this section, we provide several practical code samples for interacting with the Gemini API. These examples include text generation, conversational chat, and working with images.

1. Simple Text Generation

This example demonstrates how to generate text with a simple prompt.

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel('gemini-pro')

def generate_simple_text(prompt_text):
    """
    Generates simple text based on the given prompt.
    """
    try:
        response = model.generate_content(prompt_text)
        return response.text
    except Exception as e:
        return f"Error generating text: {e}"

if __name__ == "__main__":
    prompt = "Write a short paragraph about the importance of continuous learning in the digital age."
    result = generate_simple_text(prompt)
    print("--- Simple Text Generation ---")
    print(result)
    print("-" * 30)

2. Conversational Chat (Chat Conversation)

The Gemini API also supports conversational mode (Chat), which allows the model to remember the context of previous conversation and provide more relevant responses.

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel('gemini-pro')

def start_chat_session():
    """
    Starts a new chat session.
    """
    chat = model.start_chat(history=[])
    print("--- Chat with Gemini started (type 'exit' to quit) ---")
    while True:
        user_input = input("You: ")
        if user_input.lower() == 'exit':
            break
        try:
            response = chat.send_message(user_input)
            print(f"Gemini: {response.text}")
        except Exception as e:
            print(f"Error sending message: {e}")
    print("--- Chat with Gemini ended ---")

if __name__ == "__main__":
    start_chat_session()
    print("-" * 30)

3. Image Analysis with Text Prompt

This example shows how to send an image along with a text prompt to the gemini-pro-vision model.

import os
import google.generativeai as genai
from PIL import Image # Requires installation: pip install Pillow

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

vision_model = genai.GenerativeModel('gemini-pro-vision')

def analyze_image(image_path, prompt_text):
    """
    Analyzes an image with a given text prompt.
    """
    try:
        img = Image.open(image_path)
        response = vision_model.generate_content([prompt_text, img])
        return response.text
    except FileNotFoundError:
        return f"Error: Image file '{image_path}' not found."
    except Exception as e:
        return f"Error analyzing image: {e}"

if __name__ == "__main__":
    # Ensure an image file named 'landscape.jpg' exists next to the script.
    # You can generate this image via Axeto (e.g., a mountain landscape): /generate/image
    image_file = "landscape.jpg"
    image_prompt = "What are the prominent features of this landscape and what feeling does it convey?"
    
    if not os.path.exists(image_file):
        print(f"**Note:** File '{image_file}' not found. Please create an image with this name or correct the path.")
        print("You can generate a landscape image here: [Generate Image](/generate/image)")
    else:
        result = analyze_image(image_file, image_prompt)
        print("--- Image Analysis ---")
        print(result)
        print("-" * 30)

4. Multimodal Content Generation (Text and Image)

This example demonstrates how to combine multiple text and image parts in a single prompt.

import os
import google.generativeai as genai
from PIL import Image # Requires installation: pip install Pillow

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

vision_model = genai.GenerativeModel('gemini-pro-vision')

def generate_multimodal_content(image_path1, image_path2, text_prompt):
    """
    Generates multimodal content by combining two images and a text prompt.
    """
    try:
        img1 = Image.open(image_path1)
        img2 = Image.open(image_path2)
        
        prompt_parts = [
            "Considering these two images:",
            img1,
            "and",
            img2,
            text_prompt
        ]
        
        response = vision_model.generate_content(prompt_parts)
        return response.text
    except FileNotFoundError as e:
        return f"Error: One or more image files not found. {e}"
    except Exception as e:
        return f"Error generating multimodal content: {e}"

if __name__ == "__main__":
    # Assume you have two image files named 'cat.jpg' and 'dog.jpg'.
    # You can generate these images via Axeto: /generate/image
    image_cat = "cat.jpg"
    image_dog = "dog.jpg"
    multimodal_prompt = "Please compare these two animals and state their main differences."

    if not (os.path.exists(image_cat) and os.path.exists(image_dog)):
        print(f"**Note:** Files '{image_cat}' or '{image_dog}' not found.")
        print("Please place two images (cat and dog) with these names next to the script.")
        print("You can generate animal images here: [Generate Image](/generate/image)")
    else:
        result = generate_multimodal_content(image_cat, image_dog, multimodal_prompt)
        print("--- Multimodal Content Generation ---")
        print(result)
        print("-" * 30)

5. Setting Generation Parameters

This example shows how to set parameters like temperature and max_output_tokens to control the model's output.

import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel('gemini-pro')

def generate_creative_text(prompt_text, creativity_level=0.9, max_tokens=150):
    """
    Generates creative text with parameter adjustments.
    """
    generation_config = {
        "temperature": creativity_level,
        "top_p": 1.0,
        "top_k": 32,
        "max_output_tokens": max_tokens
    }
    try:
        response = model.generate_content(
            prompt_text,
            generation_config=generation_config
        )
        return response.text
    except Exception as e:
        return f"Error generating creative text: {e}"

if __name__ == "__main__":
    creative_prompt = "Write a fantastical description of a futuristic city floating in the clouds."
    result_creative = generate_creative_text(creative_prompt, creativity_level=0.95, max_tokens=250)
    print("--- Creative Text Generation with Parameters ---")
    print(result_creative)
    print("-" * 30)

These sample codes will help you get acquainted with the various aspects of the Gemini API. By copying, running, and modifying these codes, you can quickly start building your own AI applications. Remember that to use image capabilities, you need to install the Pillow library (pip install Pillow). You can also use the Axeto image generation tool to produce sample images.

Common Errors

When working with any API, especially AI APIs, you may encounter errors. Understanding these errors and how to resolve them is essential for smooth development. Here, we address some common errors when working with the Gemini API and their solutions.

1. `Authentication Error` or `Invalid API Key`

This common error usually occurs due to a problem with your authentication with the API.

Error Message (Sample):

```

google.api_core.exceptions.PermissionDenied: 403 API key not valid. Please pass a valid API key.

```

Cause:

* The API key entered is incorrect or has expired.

* The API key is not correctly set as the GOOGLE_API_KEY environment variable.

* The Gemini API has not been enabled for your project in Google Cloud.

* You may have exceeded usage limits (quota) and need to enable billing.

Solution:

1. Check API Key: Ensure you have copied the API key correctly from Google AI Studio.

2. Check Environment Variable: Make sure GOOGLE_API_KEY is set correctly. Restart your terminal or IDE after setting it.

3. Enable API: Go to Google Cloud Console and ensure the "Generative Language API" is enabled for your project.

4. Check Limits: If you are using it heavily, you might have reached the free tier limits. In this case, you need to enable billing in Google Cloud.

2. `Resource Exhausted` or `Quota Exceeded` Error

This error occurs when you exceed the usage limits (quota) for the Gemini API.

Error Message (Sample):

```

google.api_core.exceptions.ResourceExhausted: 429 quota exceeded for project 'your-project-id'. Please check your project's quota.

```

Cause:

* You have sent too many requests to the API in a short period.

* You have reached the maximum allowed tokens per minute or per day.

Solution:

1. Reduce Request Rate: Introduce a delay between your requests (e.g., using time.sleep()).

2. Check Quota in Google Cloud: Go to Google Cloud Console and check the Generative Language API limits. If necessary, you can request a quota increase.

3. Enable Billing: Enabling billing usually provides higher limits.

3. `Bad Request` or `Invalid Argument` Error

This error usually indicates a problem with the structure of your request or its content.

Error Message (Sample):

```

google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

```

Cause:

* Your prompt is too long and exceeds the maximum allowed input tokens.

* The input format is incorrect (e.g., trying to send an image to gemini-pro or text to gemini-pro-vision without the correct structure).

* The generation_config parameters are out of the allowed range (e.g., a negative temperature).

Solution:

1. Check Prompt Length: Ensure your prompt does not exceed the model's maximum allowed input tokens. You can check the token count using model.count_tokens(prompt).

2. Check Model: Use the correct model for your input type (gemini-pro for text, gemini-pro-vision for text and image).

3. Check Parameters: Review the values for temperature, top_p, top_k, max_output_tokens and ensure they are within the valid range.

4. `Safety Concerns` or `Content Blocked` Error

Gemini has built-in safety filters that block inappropriate content.

Error Message (Sample):

```

google.api_core.exceptions.BlockedBySafety: 400 The prompt was blocked due to safety concerns.

```

Cause:

* The prompt or the response generated by the model violates one of Google's safety policies (e.g., sexually explicit, violent, hate speech, dangerous content).

Solution:

1. Review Prompt: Review your prompt and ensure it does not contain any inappropriate content.

2. Modify Prompt: Try modifying the prompt to guide the model towards generating safer content.

3. Check response.prompt_feedback: This object can provide more detailed information about the reason for blocking.

5. `Internal Server Error` or `Unavailable` Error

These errors usually indicate a problem on Google's server side.

Error Message (Sample):

```

google.api_core.exceptions.InternalServerError: 500 An internal error occurred.

google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.

```

Cause:

* Temporary issue with the Gemini API servers.

* High network traffic.

Solution:

1. Retry: These errors are usually temporary. Try again after a few seconds or minutes.

2. Check Service Status: You can visit the Google Cloud Status Dashboard to check the status of Google services.

3. Implement Retry Mechanism: Implement a retry mechanism with delays (exponential backoff) in your code so that your application automatically retries in case of temporary errors.

By understanding these common errors and their solutions, you can improve your development process with the Gemini API and quickly identify and resolve issues. This will help you use AI models more effectively in your Axeto projects.

Conclusion

In this comprehensive tutorial, we introduced you to the powerful world of the Gemini API. From the initial steps of obtaining an API key and setting up the development environment to sending your first text requests and working with multimodal inputs (images), we covered all the essential aspects for getting started with this advanced AI model.

We saw how Gemini, with its unparalleled capabilities, can open new horizons in content creation and AI application development, from generating creative and informative texts to understanding and interpreting images. The Axeto tests also demonstrated that Gemini performs very strongly in understanding and generating Persian, making it a valuable tool for Iranian content creators and developers.

Key takeaways from this tutorial:

Obtaining and Managing API Keys: The importance of API key security and how to set it as an environment variable.
Configuring Models: Choosing the appropriate model (gemini-pro for text and gemini-pro-vision for multimodal).
Sending Text Requests: How to use generate_content for text generation and control output with parameters like temperature and max_output_tokens.
Working with Images: Loading images and combining them with text prompts for analysis and multimodal content generation.
Conversational Chat: Maintaining conversation context for more natural interactions.
Error Handling: Identifying and resolving common errors such as authentication issues, usage limits, and safety concerns.
Potential for Axeto: Exploring how to use the Gemini API to enhance Axeto's content creation and image generation tools and provide a better experience for Iranian users.

Now that you are familiar with the basics of the Gemini API, it's time to get hands-on. By experimenting with the sample code, changing prompts, and exploring different models, you can quickly enhance your skills in this area. Use this knowledge to build creative applications, automate content creation tasks, and create innovative user experiences on the Axeto platform.

At Axeto, we are committed to continuously providing you with the latest and most powerful AI tools. The Gemini API is just one of these tools, holding immense potential to transform how we create content and interact with technology. Remember that learning AI is a continuous journey, and we will be with you every step of the way.

For more information and new updates, follow the Axeto Blog and our educational resources. Also, if you have any questions or encounter issues, the Axeto support team is ready to assist you.

Resource

For more information and detailed technical specifics about the Gemini API, you can refer to the following official resources:

Official Google AI Studio Documentation:

* Google AI Studio Documentation

* Quickstart for Python with Gemini API

* API Key Management

* Model Overview

Google Generative AI Library on PyPI:

* google-generativeai on PyPI

Official Gemini Website:

* Google Gemini

Google Cloud Console:

* Google Cloud Console

* Generative Language API Quotas

* Google Cloud Status Dashboard

These resources provide comprehensive and up-to-date information on the Gemini API, how to use it, available models, pricing, and troubleshooting potential issues. It is recommended to consult these documents for deeper insights and to discover more advanced capabilities. You can also use the Google AI Developers Community to ask questions and share your experiences.

Close-up of a computer terminal showing the command `pip install google-generativeai`.

تست Axeto

3 پرامپت فارسی استاندارد روی Gemini در Axeto تست شد. نتایج بر اساس کیفیت چهره/متن/سبک و سازگاری با پرامپت فارسی ارزیابی شد.

3 پرامپت تست‌شده

پرامپت	امتیاز	یادداشت
پرتره زن جوان ایرانی، نور طبیعی پنجره، فوکوس نرم، پس‌زمینه مینیمال	A	جزئیات چهره و نور طبیعی قابل قبول؛ مناسب پرامپت‌های پرتره فارسی.
منظره کویر ایران، غروب طلایی، ابرهای دراماتیک، فوتورéalistic	A-	ترکیب‌بندی منظره خوب؛ رنگ‌های غروب طبیعی.
لوگوی مینیمال برای استارتاپ فintech، خطوط هندسی، پس‌زمینه سفید	B+	متن/لوگو خوانا؛ برای برندینگ فارسی نیاز به تکرار پرامپت با وزن بیشتر.

مزایا

قابلیت‌های چندوجهی (متن، تصویر، ویدئو، کد).
دسترسی آسان از طریق API برای توسعه‌دهندگان.
پشتیبانی قوی از سوی گوگل و جامعه کاربری.
ارائه نسخه رایگان (free tier) برای شروع.
افزایش بهره‌وری در تولید محتوا و کدنویسی.
ادغام با اکوسیستم Google Cloud.

معایب

نیاز به حساب Google و فعال‌سازی Google Cloud.
مدیریت کلید API و مسائل امنیتی مرتبط با آن.
پیچیدگی اولیه برای کاربران کاملاً مبتدی.
وابستگی به سرویس‌های ابری گوگل.
محدودیت‌های استفاده در نسخه رایگان.
نیاز به دانش برنامه‌نویسی (Python) برای استفاده کامل.

خط زمانی

2022
Gemini یک مدل هوش مصنوعی چندوجهی از گوگل است که قابلیت درک و تولید متن، تصویر، ویدئو و کد را دارد.
2023
برای استفاده از Gemini API، به یک حساب Google فعال و فعال‌سازی Google Cloud Console نیاز دارید.
2024
کلید API Gemini را باید از Google AI Studio دریافت کرده و آن را همیشه محرمانه نگه دارید.
2025
نصب Python (نسخه 3.8 به بالا) و کتابخانه `google-generativeai` برای کار با API ضروری است.
2026
با استفاده از Gemini API می‌توانید برنامه‌های کاربردی هوشمند و خلاقانه‌ای در زمینه‌های مختلف توسعه دهید.

منابع

سوالات متداول

Gemini API چیست؟▾

Gemini API یک رابط برنامه‌نویسی کاربردی است که به توسعه‌دهندگان اجازه می‌دهد تا به مدل‌های هوش مصنوعی پیشرفته Gemini از گوگل دسترسی پیدا کرده و از قابلیت‌های آن در برنامه‌های خود استفاده کنند.

چگونه می‌توانم به Gemini API دسترسی پیدا کنم؟▾

برای دسترسی به Gemini API، ابتدا باید یک کلید API از کنسول توسعه‌دهندگان گوگل دریافت کنید. پس از آن، می‌توانید با استفاده از کتابخانه‌های رسمی یا درخواست‌های HTTP به API متصل شوید.

چه زبان‌های برنامه‌نویسی توسط Gemini API پشتیبانی می‌شوند؟▾

Gemini API از زبان‌های برنامه‌نویسی مختلفی مانند پایتون، جاوا، نود.جی‌اس، و گو پشتیبانی می‌کند. همچنین، مستندات کاملی برای هر زبان ارائه شده است.

کاربردهای اصلی Gemini API چیست؟▾

Gemini API در طیف وسیعی از کاربردها مانند تولید محتوا، خلاصه‌سازی متن، ترجمه، چت‌بات‌ها، تحلیل احساسات، و پردازش تصویر و ویدئو قابل استفاده است.

آیا استفاده از Gemini API رایگان است؟▾

گوگل یک لایه رایگان (Free Tier) برای Gemini API ارائه می‌دهد که به شما اجازه می‌دهد تا با محدودیت‌هایی از آن استفاده کنید. برای استفاده‌های بیشتر و پروژه‌های بزرگتر، برنامه‌های قیمت‌گذاری مختلفی وجود دارد.

تفاوت Gemini API با GPT-4 API چیست؟▾

هر دو Gemini API و GPT-4 API رابط‌های برنامه‌نویسی برای دسترسی به مدل‌های زبانی بزرگ هستند. تفاوت‌های اصلی آن‌ها در معماری مدل، قابلیت‌ها، و سیاست‌های قیمت‌گذاری است که توسط شرکت‌های توسعه‌دهنده (گوگل و OpenAI) تعیین می‌شود.

چگونه می‌توانم خروجی Gemini API را بهینه کنم؟▾

برای بهینه‌سازی خروجی Gemini API، می‌توانید از تکنیک‌هایی مانند مهندسی پرامپت (Prompt Engineering) پیشرفته، تنظیم پارامترهای مدل (مانند دما و حداکثر توکن)، و استفاده از دستورالعمل‌های واضح و دقیق استفاده کنید.

آیا Gemini API از پردازش تصویر و ویدئو پشتیبانی می‌کند؟▾

بله، مدل‌های پیشرفته Gemini قابلیت‌های چندوجهی (multimodal) دارند و می‌توانند ورودی‌های تصویری و ویدئویی را نیز پردازش کرده و بر اساس آن‌ها پاسخ‌های متنی تولید کنند.

چه منابعی برای یادگیری بیشتر در مورد Gemini API وجود دارد؟▾

بهترین منابع برای یادگیری بیشتر شامل مستندات رسمی گوگل، مثال‌های کد، آموزش‌های ویدئویی، و انجمن‌های توسعه‌دهندگان است.

چگونه می‌توانم خطاهای API را در Gemini API مدیریت کنم؟▾

برای مدیریت خطاها، باید کدهای خطای بازگردانده شده توسط API را بررسی کرده و منطق مناسبی برای رسیدگی به آن‌ها (مانند تلاش مجدد، گزارش خطا، یا نمایش پیام به کاربر) در برنامه خود پیاده‌سازی کنید.

آیا Gemini API برای برنامه‌های موبایل مناسب است؟▾

بله، Gemini API با ارائه SDKها و کتابخانه‌های مختلف، برای توسعه برنامه‌های موبایل (اندروید و iOS) نیز قابل استفاده است.

امنیت داده‌ها در Gemini API چگونه تضمین می‌شود؟▾

گوگل تدابیر امنیتی قوی برای حفاظت از داده‌ها در Gemini API اتخاذ کرده است، از جمله رمزنگاری داده‌ها در حال انتقال و در حالت سکون، و رعایت استانداردهای حریم خصوصی.

آیا می‌توانم مدل‌های Gemini را سفارشی‌سازی کنم؟▾

بله، گوگل امکان سفارشی‌سازی (Fine-tuning) مدل‌های Gemini را برای برخی از کاربران و کاربردهای خاص فراهم می‌کند، که به شما اجازه می‌دهد مدل را با داده‌های خود آموزش دهید.

چگونه می‌توانم عملکرد مدل را در Gemini API ارزیابی کنم؟▾

برای ارزیابی عملکرد مدل، می‌توانید از معیارهایی مانند دقت، روانی، و مرتبط بودن پاسخ‌ها استفاده کنید. همچنین، ابزارهای ارزیابی و تست A/B نیز می‌توانند مفید باشند.

آیا Gemini API از تولید کد پشتیبانی می‌کند؟▾

بله، یکی از قابلیت‌های پیشرفته Gemini API، توانایی تولید قطعه کد، تکمیل کد، و حتی رفع اشکال در زبان‌های برنامه‌نویسی مختلف است.

چه محدودیت‌هایی در استفاده از Gemini API وجود دارد؟▾

محدودیت‌ها شامل تعداد درخواست‌ها در دقیقه (Rate Limits)، حداکثر طول ورودی و خروجی، و محدودیت‌های مربوط به استفاده از منابع محاسباتی است که باید در برنامه‌ریزی پروژه در نظر گرفته شوند.

چگونه می‌توانم بازخورد خود را در مورد Gemini API ارائه دهم؟▾

می‌توانید از طریق انجمن‌های توسعه‌دهندگان گوگل، ابزارهای بازخورد در کنسول توسعه‌دهندگان، یا کانال‌های پشتیبانی رسمی، بازخورد خود را ارائه دهید.

آیا Gemini API برای کاربردهای تجاری مناسب است؟▾

بله، Gemini API برای توسعه و استقرار برنامه‌های کاربردی تجاری در مقیاس‌های مختلف طراحی شده است و ابزارها و پشتیبانی لازم را فراهم می‌کند.

چگونه می‌توانم از Gemini API برای ساخت یک چت‌بات استفاده کنم؟▾

برای ساخت چت‌بات، باید ورودی کاربر را دریافت کرده و آن را به عنوان پرامپت به Gemini API ارسال کنید. سپس پاسخ مدل را دریافت کرده و به کاربر نمایش دهید. مدیریت تاریخچه گفتگو نیز برای حفظ زمینه مکالمه ضروری است.

آینده Gemini API چگونه خواهد بود؟▾

گوگل به طور مداوم در حال توسعه و بهبود مدل‌های Gemini و API آن است. انتظار می‌رود که در آینده قابلیت‌های جدیدتر، عملکرد بهتر، و پشتیبانی از موارد استفاده بیشتری به آن اضافه شود.

مقالات مرتبط

نمونه ویدیوها

همین حالا در Axeto امتحان کنید

مقاله را خواندید — حالا با ابزار واقعی Axeto خروجی بگیرید.

ساخت تصویر ساخت ویدیو کتابخانه پرامپت مدل‌ها تعرفه راهنمای Gemini

نظرات (0)

در حال بارگذاری نظرات...

نکات کلیدی

همین حالا در Axeto امتحان کنید

Prerequisites

Step 1: Obtain API Key and Set Up Environment

1.1. Obtain API Key from Google AI Studio

1.2. Set API Key as an Environment Variable

1.3. Install the Google Generative AI Library

1.4. Verify Installation and Setup

Step 2: Send Your First Text Request to Gemini

2.1. Configure the Model

2.2. Send a Simple Request (Text-to-Text)

2.3. Use Advanced Parameters

2.4. Handle Multiple Responses and Safety

Step 3: Working with Images and Multimodal Content

3.1. The gemini-pro-vision Model

3.2. Sending an Image and Text Prompt

3.3. Asking More Complex Questions with Images

3.4. Structuring Prompts for Multimodal Inputs

Axeto Analysis

Axeto and Gemini API Opportunities for Iranian Users

Comparison with Other Models (for Axeto)

Axeto's Vision for the Future

Axeto Test

Persian Prompt Test Scenarios

Axeto Test Results Summary Table

Conclusion of Axeto Tests

Sample Code

1. Simple Text Generation

2. Conversational Chat (Chat Conversation)

3. Image Analysis with Text Prompt

4. Multimodal Content Generation (Text and Image)

5. Setting Generation Parameters

Common Errors

1. Authentication Error or Invalid API Key

2. Resource Exhausted or Quota Exceeded Error

3. Bad Request or Invalid Argument Error

4. Safety Concerns or Content Blocked Error

5. Internal Server Error or Unavailable Error

Conclusion

Resource

تست Axeto

مزایا

معایب

خط زمانی

منابع

سوالات متداول

مقالات مرتبط

نمونه ویدیوها

همین حالا در Axeto امتحان کنید

نظرات (0)

3.1. The `gemini-pro-vision` Model

1. `Authentication Error` or `Invalid API Key`

2. `Resource Exhausted` or `Quota Exceeded` Error

3. `Bad Request` or `Invalid Argument` Error

4. `Safety Concerns` or `Content Blocked` Error

5. `Internal Server Error` or `Unavailable` Error