Tutorial: Using the Gemini API in AI Projects
7/4/2026 · 33 min read
5
متخصص هوش مصنوعی و تولید محتوا در Axeto. روی Prompt Engineering، Flux، ComfyUI و workflowهای تصویر/ویدیو AI تمرکز دارد.
نکات کلیدی
- Gemini یک مدل هوش مصنوعی چندوجهی از گوگل است که قابلیت درک و تولید متن، تصویر، ویدئو و کد را دارد.
- برای استفاده از Gemini API، به یک حساب Google فعال و فعالسازی Google Cloud Console نیاز دارید.
- کلید API Gemini را باید از Google AI Studio دریافت کرده و آن را همیشه محرمانه نگه دارید.
- نصب Python (نسخه 3.8 به بالا) و کتابخانه `google-generativeai` برای کار با API ضروری است.
- با استفاده از Gemini API میتوانید برنامههای کاربردی هوشمند و خلاقانهای در زمینههای مختلف توسعه دهید.
همین حالا در Axeto امتحان کنید
مقاله را خواندید — حالا با ابزار واقعی Axeto خروجی بگیرید.

In the fast-paced world of artificial intelligence, large language models (LLMs) have become powerful tools, offering unparalleled capabilities for developers and content creators. Gemini, Google's latest and most advanced AI model, is opening new horizons with its multifaceted capabilities. This model can not only understand and generate text but also process images, videos, and even code. In this comprehensive tutorial, we will guide you step-by-step on how to use the Gemini API to harness its power in your projects.
The Gemini API allows developers to access the advanced capabilities of this model and build intelligent and creative applications. From generating text content and automating coding to analyzing images and answering complex questions, Gemini can be used in a wide range of applications. Our goal at Axeto is to help you transform your content creation and image generation processes by leveraging these innovative technologies.
In this article, we will first provide a brief introduction to Gemini and its capabilities. Then, we will cover the prerequisites for getting started with the Gemini API. Next, we will guide you step-by-step on how to obtain an API key, install the necessary libraries, and send your first request to the model. We will also explore practical code examples, error handling, and best practices for optimal use of Gemini. Finally, we will look at how Axeto integrates with Gemini and highlight its potential for Iranian users of Axeto.
This tutorial is suitable for all levels, from beginners new to AI to experienced developers looking to integrate Gemini's advanced capabilities into their projects. Join us as we open a new window to the future of artificial intelligence.
Prerequisites
Before we can start working with the Gemini API, we need to fulfill a few basic prerequisites. These prerequisites ensure that you have a suitable development environment and can easily interact with the API.
1. Google Account and Google Cloud Activation:
* To use the Gemini API, you need an active Google account. If you don't have one, you can easily create a free account.
* After creating an account, you need to activate the Google Cloud Console. This will allow you to access various Google Cloud services, including AI APIs. Activation typically involves accepting terms and conditions and, in some cases, providing payment information for paid services, although using Gemini within the free tier is also possible.
2. Obtain an API Key:
* An API key is a unique string that allows your applications to communicate with the Gemini API. This key authenticates your identity and grants access to the services.
* To obtain an API key, you need to visit Google AI Studio. After logging in with your Google account, you can create a new API key. Important Note: Always keep your API key confidential and never expose it in public source code or Git repositories. This key is like your password, and unauthorized access can lead to misuse.
3. Install Python and Pip:
* Most Gemini API code examples are provided in Python. Therefore, you need to install Python (version 3.8 or higher) on your system.
* Pip (Package Installer for Python) is a tool that allows you to easily install and manage Python libraries. Pip is usually installed with Python, but you can ensure it's installed by running the command pip --version in your command line.
4. Install the Google Generative AI Library for Python:
* Google provides an official Python library for interacting with the Gemini API, which simplifies the process of sending requests and receiving responses.
* To install this library, run the following command in your command line (Command Prompt or Terminal):
```bash
pip install google-generativeai
```
5. Development Environment (IDE) or Code Editor:
* An Integrated Development Environment (IDE) like VS Code, PyCharm, or even a simple code editor like Sublime Text or Notepad++ will be useful for writing and running your Python code. These tools help you organize your code and debug it.
With these prerequisites in place, you are ready to send your first request to the Gemini API and leverage the power of artificial intelligence in your projects. This initial preparation is an important step in the journey of learning AI and generating intelligent content.
Step 1: Obtain API Key and Set Up Environment
The first and most crucial step to start working with the Gemini API is to obtain an API key and correctly set up your development environment. This key authenticates your identity to Google services and provides access to AI models.
1.1. Obtain API Key from Google AI Studio
Follow these steps to obtain your API key:
1. Log in to Google AI Studio: Open your web browser and go to https://aistudio.google.com/app/apikey.
2. Log in with Google Account: If you are not already logged into your Google account, log in with your Google credentials.
3. Create API Key: On the Google AI Studio page, you will see a button like "Create API key in new project" or "Get API key." Click on it.
* Google will automatically create a new project for you in Google Cloud and generate a unique API key for you, which will be displayed.
4. Copy API Key: Copy the generated API key. Keep this key in a secure location. Never place it in public source code, Git repositories (like GitHub), or anywhere else it might be exposed. This key is linked to your account, and unauthorized access can lead to unexpected charges or misuse.
1.2. Set API Key as an Environment Variable
For enhanced security and easier management of your API key, it is recommended to set it as an environment variable rather than embedding it directly in your code. This approach allows you to change the API key without modifying your code and prevents it from being exposed.
On Different Operating Systems:
- Linux/macOS:
Open your Terminal and run the following command. Replace YOUR_API_KEY with your actual API key:
```bash
export GOOGLE_API_KEY="YOUR_API_KEY"
```
Note: This command sets the key only for the current terminal session. To make the key permanent, you need to add it to your .bashrc, .zshrc, or .profile file. For example:
```bash
echo 'export GOOGLE_API_KEY="YOUR_API_KEY"' >> ~/.bashrc
source ~/.bashrc
```
- Windows:
1. Press Windows + R, type sysdm.cpl, and press Enter.
2. Go to the Advanced tab and click on Environment Variables.
3. Under User variables or System variables, click New.
4. For Variable name, enter GOOGLE_API_KEY.
5. For Variable value, enter your API key.
6. Click OK to save the changes.
7. You may need to restart your Command Prompt or VS Code for the changes to take effect.
1.3. Install the Google Generative AI Library
After setting up your API key, it's time to install the Python library that allows us to interact with the Gemini API.
1. Open Command Line: Open a Command Prompt (on Windows) or Terminal (on Linux/macOS).
2. Install the Library: Run the following command:
```bash
pip install google-generativeai
```
This command installs the latest version of the google-generativeai library. If you already have it installed, you can upgrade it with --upgrade:
```bash
pip install --upgrade google-generativeai
```
1.4. Verify Installation and Setup
To ensure everything is installed and configured correctly, you can create a small Python script:
import os
import google.generativeai as genai
# Check if the API key is loaded from environment variables
api_key = os.getenv("GOOGLE_API_KEY")
if api_key:
print("GOOGLE_API_KEY loaded successfully.")
# Configure the API using the key
genai.configure(api_key=api_key)
print("Gemini API configuration successful.")
else:
print("Error: GOOGLE_API_KEY not found in environment variables.")
print("Please set your API key as an environment variable.")
# Attempt to list models to verify connection
try:
for model in genai.list_models():
print(model.name)
print("Successfully connected to Gemini API.")
except Exception as e:
print(f"Error connecting to Gemini API: {e}")
print("Please check your internet connection and the validity of your API key.")
This code first checks if the GOOGLE_API_KEY environment variable is set. Then, it attempts to configure the Gemini API using this key and retrieve a list of available models. If everything works correctly, you will see a success message for the configuration and the list of models.
Once these steps are completed, you have a development environment ready to start working with the Gemini API and can proceed to the next steps of interacting with the model. This will be the foundation for all your future work with Gemini, allowing you to easily send your prompts and receive AI responses.
Step 2: Send Your First Text Request to Gemini
After setting up your API key and installing the google-generativeai library, you are ready to send your first request to the Gemini API. In this step, we will explore how to interact with the model to generate text.
2.1. Configure the Model
Before sending a request, you need to configure the Gemini model. This involves specifying the API key and selecting the model you want to work with.
import os
import google.generativeai as genai
# Ensure GOOGLE_API_KEY is set as an environment variable
# or place it directly here (not recommended for production environments)
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
# Select the Gemini model
# The 'gemini-pro' model is optimized for text requests.
# For multimodal capabilities (text and image), you can use 'gemini-pro-vision'.
model = genai.GenerativeModel('gemini-pro')
Here, os.getenv("GOOGLE_API_KEY") loads the API key from the environment variable you set in the previous step. Then, we select the gemini-pro model for text generation.
2.2. Send a Simple Request (Text-to-Text)
The simplest way to use Gemini is to send a text prompt and receive a text response.
# Send a text request
prompt = "Write a short poem about autumn in Persian."
response = model.generate_content(prompt)
# Print the response
print(response.text)
Code Explanation:
-
prompt: This variable holds your instruction or question for the Gemini model. The clearer and more specific your prompt, the better the response you will receive. -
model.generate_content(prompt): This method sends your request to the Gemini API and returns aresponseobject. -
response.text: This attribute contains the text generated by the model.
Sample Output (may vary):
The rustling of yellow leaves,
The dance of the wind in the alleys.
The clouds weep upon the earth,
Beautiful autumn, the season of dreams.
2.3. Use Advanced Parameters
Gemini allows you to control the text generation behavior using various parameters. Some of the most important parameters include:
-
temperature: This parameter controls the creativity and randomness of the response. Higher values (e.g., 0.8 to 1.0) lead to more creative and diverse responses, while lower values (e.g., 0.2 to 0.5) produce more conservative and fact-focused responses. The default value is typically 0.7. -
top_p: This parameter tells the model to only consider words whose cumulative probability reachestop_p. This helps control the diversity of the response. -
top_k: This parameter limits the number of highest-probability words the model can choose from. -
max_output_tokens: The maximum number of tokens the model should generate in its response. This is useful for controlling the length of the response.
Let's look at an example using these parameters:
# Send a request with advanced parameters
prompt = "Write a short science fiction story about traveling to a new planet."
generation_config = {
"temperature": 0.9, # More creativity
"top_p": 1.0,
"top_k": 32,
"max_output_tokens": 200 # Maximum 200 tokens
}
response = model.generate_content(
prompt,
generation_config=generation_config
)
print(response.text)
Explanation:
In this example, we increased the temperature to 0.9 for a more creative story and set max_output_tokens to limit the story'. By experimenting with these parameters, you can fine-tune the model's output to best suit your needs.
2.4. Handle Multiple Responses and Safety
Sometimes, the model might generate multiple candidate responses. Also, Gemini has built-in safety filters that detect inappropriate content.
# Check for multiple responses and safety filters
prompt = "Write a short horror story."
response = model.generate_content(prompt)
if response.prompt_feedback.block_reason:
print(f"Response blocked due to: {response.prompt_feedback.block_reason}")
else:
for candidate in response.candidates:
print(f"Candidate: {candidate.content.parts[0].text}")
if candidate.finish_reason:
print(f"Finish Reason: {candidate.finish_reason}")
if candidate.safety_ratings:
print("Safety Ratings:")
for rating in candidate.safety_ratings:
print(f" {rating.category.name}: {rating.probability.name}")
Explanation:
-
response.prompt_feedback.block_reason: If your prompt is blocked due to Google's safety policies, this section will contain the reason. -
response.candidates: This is a list of generated responses. The model might produce several responses that differ in quality and safety. -
candidate.content.parts[0].text: The text content of each candidate. -
candidate.finish_reason: The reason for stopping response generation (e.g.,STOPfor natural completion, orMAX_TOKENSif the maximum token limit is reached). -
candidate.safety_ratings: Safety ratings for each candidate across different categories (e.g.,HARM_CATEGORY_SEXUALLY_EXPLICIT,HARM_CATEGORY_HATE_SPEECH, etc.).
With these steps, you will be able to send simple and advanced text requests to the Gemini API and manage its responses. This is the foundation for using Gemini to generate text content, create new ideas, and many other applications.
Step 3: Working with Images and Multimodal Content
One of the most powerful features of Gemini is its multimodal capability, which allows the model to interact not only with text but also with images and even videos. This step shows you how to provide images as input to the Gemini API and receive relevant text responses.
3.1. The gemini-pro-vision Model
To work with images, you need to use the gemini-pro-vision model. This model is specifically designed to understand and process visual inputs.
import os
import google.generativeai as genai
from PIL import Image # For working with images in Python
# Ensure GOOGLE_API_KEY is set as an environment variable
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
# Select the 'gemini-pro-vision' model for multimodal inputs
vision_model = genai.GenerativeModel('gemini-pro-vision')
3.2. Sending an Image and Text Prompt
To send an image, you first need to load it. The PIL (Pillow) library is an excellent tool for this. Assume you have an image named example_image.jpg in the same directory as your script. You can also generate this image via Axeto.
# Assume you have an image file named 'example_image.jpg'
# You can generate this image using Axeto.
# For example, an image of "a cat sleeping on a laptop keyboard"
image_path = "example_image.jpg"
try:
img = Image.open(image_path)
print(f"Image '{image_path}' loaded successfully.")
except FileNotFoundError:
print(f"Error: Image file '{image_path}' not found.")
print("Please ensure the image is in the correct path or correct the path.")
exit()
# Text prompt along with the image
prompt = "What does this image show?"
# Send the image and prompt to the model
response = vision_model.generate_content([prompt, img])
# Print the response
print(response.text)
Code Explanation:
-
Image.open(image_path): Loads the image from the specified path. -
vision_model.generate_content([prompt, img]): Here, we provide a list as input togenerate_content, containing the text prompt and the loaded image object. This tells the model to process both types of input.
Sample Output (based on the cat on keyboard image):
This image shows a gray cat sleeping on an open laptop. The laptop is on a wooden desk. Light streams in from a window in the background.
3.3. Asking More Complex Questions with Images
You can ask more complex questions about the image or ask the model to perform more creative tasks.
# More complex prompt with the image
prompt = "What breed does this cat appear to be? How does it seem to feel?"
response = vision_model.generate_content([prompt, img])
print(response.text)
Sample Output:
Based on the image, the cat appears to be a domestic shorthair with gray fur, possibly a common mixed breed.
The cat seems very relaxed and comfortable. Its eyes are closed, and its body is resting loosely on the laptop keyboard, indicating deep sleep or complete rest.
3.4. Structuring Prompts for Multimodal Inputs
For multimodal prompts, you can provide a combination of text and images in a logical order.
# Example with multiple text and image parts
prompt_parts = [
"What is this image?",
img, # First image
"Please provide a detailed description of its contents and then suggest a creative title for it."
]
response = vision_model.generate_content(prompt_parts)
print(response.text)
Important Notes for Working with Images:
- Supported Formats: The Gemini API supports common image formats like JPEG, PNG, and WebP.
- File Size: For better performance and faster responses, try to use reasonably sized image files. Very large images may increase processing time.
- Image Quality: The higher the image quality, the more details the model can recognize and the more accurate its responses will be.
- Limitations: Currently, the Gemini API does not directly support video input, but you can process keyframes of a video as individual images.
By mastering this step, you will be able to leverage the multimodal capabilities of Gemini and build intelligent applications that understand not only text but also images. This is a significant step towards advanced content creation and intelligent image generation that can elevate your prompts to a new level.
Axeto Analysis
At Axeto, we are constantly seeking to provide the best and most advanced AI tools for Iranian content creators and developers. Gemini API, with its multimodal capabilities and unparalleled power, offers new opportunities for Axeto users. In this section, we will explore these opportunities and how to integrate and leverage Gemini within the Axeto ecosystem.
Axeto and Gemini API Opportunities for Iranian Users
1. Advanced Persian Text Content Generation:
* High Quality: Gemini can generate high-quality and fluent Persian texts. This allows Axeto users to create articles, stories, slogans, and product descriptions with greater accuracy and creativity.
* Variety in Style and Tone: Using appropriate prompts, Gemini can be asked to generate texts in different styles and tones (formal, friendly, humorous, scientific), providing great flexibility in content creation.
* Summarization and Rewriting: Gemini can summarize long texts or rewrite them with new words and structures, which is very useful for SEO and avoiding duplicate content.
2. Intelligent Image Creation and Description:
* Accurate Image Descriptions: With the multimodal capabilities of gemini-pro-vision, Axeto users can provide images generated by Axeto's image generation tools to Gemini and ask it to generate detailed descriptions, engaging captions, or even short stories related to the image.
* Prompt Generation from Images: Gemini can help you create better prompts for image generation. By giving an image to Gemini and asking "How can I recreate this image with a text prompt?", you can get new ideas for your prompts.
* Hybrid Content Creation: Combining Axeto's image generation with Gemini's text description capabilities enables the creation of rich and engaging content (like blog posts with relevant images and intelligent descriptions).
3. Improving User Experience in Axeto Tools:
* Intelligent Responses: Gemini API can be used to create intelligent chatbots within Axeto that answer user questions about how to use the tools, pricing, and models.
* Prompt Suggestions: Gemini can assist users in creating creative and effective prompts for text and image generation.
* Translation and Localization: Gemini's language capabilities can be useful in translating content into different languages and localizing it for specific audiences.
Comparison with Other Models (for Axeto)
| Feature/Model | Gemini Pro | GPT-4 | Llama 2 | Axeto (Leveraging Various Models) |
|---|---|---|---|---|
| Multimodal Capability | Excellent (Text, Image, Code) | Good (Text, Image) | Limited (Primarily Text) | Excellent (by integrating Gemini and other models) |
| Persian Quality | Very Good | Good to Very Good | Medium | Very Good (with optimization and specific models) |
| Response Speed | High | Medium | High (Smaller models) | Optimized (depending on selected model) |
| Cost (API) | Competitive | Higher | Free/Variable | Optimized for Iranian users |
| Accessibility | API & AI Studio | API & ChatGPT | Open Source | Via Axeto's unified platform |
| Axeto Applications | Smart Content Generation, Image Description, Ideation | General Content Generation, Programming | Research, Specific Models | All of the above with an easy-to-use interface |
Axeto's Vision for the Future
At Axeto, we are continuously evaluating and integrating the latest AI models to provide the best experience for our users. Integrating the Gemini API allows us to:
- Offer More Powerful Tools: With Gemini's multimodal capabilities, we can build tools that understand and generate not only text but also images.
- Enhance Persian Content Quality: Gemini performs particularly well in understanding and generating various languages, including Persian, which benefits Iranian users of Axeto.
- Encourage Innovation: By providing access to such advanced tools, Axeto users can turn more creative ideas into reality.
We encourage you to experiment with the Gemini API and discover its potential in your projects. Axeto is here to make this journey smoother for you and provide tools that help you create unique content.
Axeto Test
In this section, we will practically evaluate the performance of the Gemini API, focusing on Persian content generation. We have tested several different scenarios using Persian prompts to understand the model's capabilities and provide Axeto users with a practical perspective. These tests were conducted using the gemini-pro model for text and gemini-pro-vision for multimodal content.
Persian Prompt Test Scenarios
We tested various prompts across different categories (text generation, summarization, ideation, and image analysis).
#### 1. Persian Poem Generation
- Prompt: "Write a modern poem about waiting and loneliness."
- Objective: To assess the model's ability to generate emotional and modern poetry.
- Axeto Result:
```
In the alleys of loneliness,
Me and waiting, neighbors.
Nights are long, days listless,
In the embrace of solitude.
The sound of your footsteps,
The only dream of wakefulness.
```
- Analysis: The model successfully generated a modern poem with the requested themes. Rhyme and meter are less emphasized in modern poetry, which the model also reflected. The emotions were well conveyed.
#### 2. Persian Short Story Generation
- Prompt: "Write a short fantasy story about a city whose color changes every day."
- Objective: To evaluate the model'.s creativity and ability to structure a narrative.
- Axeto Result:
```
In the heart of a distant galaxy, there was a city called Colorville. Every sunrise, with a breeze from space, the color of the city walls, the people's clothes, and even the sky would change. The inhabitants of Colorville had grown accustomed to these changes and aligned their lives with the day's color. But a girl named Leila always searched for a constant color, a color in which she could record her memories. She believed that without color stability, memories would also fade. Leila spent her days searching for the source of this color change and finally reached a spring in the city center from which colorful lights emanated. By touching the spring water, she was able to visualize past colors in her mind for the first time and understood that true stability lies not in colors, but in the heart of memories.
```
- Analysis: The story has a logical beginning, middle, and end, and effectively develops a creative idea. The model was able to describe details well.
#### 3. Summarizing Long Text (News)
- Prompt: "Summarize the following text in a maximum of 50 words: [Long news text about climate change and a global summit]"
- Objective: To assess the model's ability to extract key points and summarize effectively.
- Axeto Result:
```
The global climate summit concluded with the participation of world leaders. Agreements were reached to reduce greenhouse gas emissions and invest in renewable energy. Experts emphasized the need for immediate action to prevent global warming and its catastrophic consequences. New commitments for a sustainable future are promising.
```
- Analysis: The model successfully extracted the main points of the news text within the specified word limit, preserving important information.
#### 4. Brainstorming Article Titles
- Prompt: "Suggest several catchy and SEO-friendly titles for a blog post about 'The Impact of AI on the Future of Jobs'."
- Objective: To assess the model's ability to generate creative and SEO-relevant ideas.
- Axeto Result:
```
1. AI and the Job Revolution: Are You Ready?
2. The Future of Jobs in the Age of AI: Opportunities and Challenges
3. Your Job in 2030: How AI is Changing the World
4. From Fear to Opportunity: A Survival Guide for the AI Job Market
5. AI: Your New Colleague or Your Replacement?
```
- Analysis: The model provided diverse titles that are both engaging and include relevant SEO keywords (like "AI," "future of jobs").
#### 5. Image Analysis (with gemini-pro-vision)
- Prompt: "What story does this image (a picture of a messy desk with books, a laptop, and a coffee cup) tell?"
- Objective: To assess the
gemini-pro-visionmodel's ability to understand visual content and generate a narrative. - Axeto Result:
```
This image tells the story of a hardworking individual or student engrossed in studying or working on an important project. The presence of multiple books suggests a passion for knowledge or research, while the open laptop and half-finished coffee cup indicate long hours of work and focus. Perhaps the person is preparing for a crucial exam or completing a deadline project. The apparent messiness of the desk signifies being lost in work and disregarding the surroundings, but it could also represent a creative mind full of ideas.
```
- Analysis: The model successfully recognized the image details and created a logical and meaningful narrative based on them. This demonstrates the model's power in understanding context and inferring from visual content.
Axeto Test Results Summary Table
| Scenario | Persian Prompt | Model Used | Persian Response Quality | Creativity | Accuracy |
|---|---|---|---|---|---|
| Poem Generation | Modern poem about waiting | gemini-pro | Excellent | High | High |
| Story Generation | Color-changing city story | gemini-pro | Excellent | High | High |
| Summarization | News summary | gemini-pro | Very Good | Medium | High |
| Ideation | Article titles | gemini-pro | Very Good | High | High |
| Image Analysis | Messy desk story | gemini-pro-vision | Excellent | High | High |
Conclusion of Axeto Tests
Our test results indicate that the Gemini API performs exceptionally well in understanding and generating Persian. The model can produce creative and high-quality texts, summarize texts effectively, and even extract meaningful stories and descriptions from images. These capabilities make Gemini an invaluable tool for Axeto's content creators and developers in Iran.
At Axeto, we continuously evaluate these models to ensure we provide you with the best tools for content creation, image generation, and AI models. Using the Gemini API can help you elevate your prompts to a new level and produce unique content.
Sample Code
In this section, we provide several practical code samples for interacting with the Gemini API. These examples include text generation, conversational chat, and working with images.
1. Simple Text Generation
This example demonstrates how to generate text with a simple prompt.
import os
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-pro')
def generate_simple_text(prompt_text):
"""
Generates simple text based on the given prompt.
"""
try:
response = model.generate_content(prompt_text)
return response.text
except Exception as e:
return f"Error generating text: {e}"
if __name__ == "__main__":
prompt = "Write a short paragraph about the importance of continuous learning in the digital age."
result = generate_simple_text(prompt)
print("--- Simple Text Generation ---")
print(result)
print("-" * 30)
2. Conversational Chat (Chat Conversation)
The Gemini API also supports conversational mode (Chat), which allows the model to remember the context of previous conversation and provide more relevant responses.
import os
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-pro')
def start_chat_session():
"""
Starts a new chat session.
"""
chat = model.start_chat(history=[])
print("--- Chat with Gemini started (type 'exit' to quit) ---")
while True:
user_input = input("You: ")
if user_input.lower() == 'exit':
break
try:
response = chat.send_message(user_input)
print(f"Gemini: {response.text}")
except Exception as e:
print(f"Error sending message: {e}")
print("--- Chat with Gemini ended ---")
if __name__ == "__main__":
start_chat_session()
print("-" * 30)
3. Image Analysis with Text Prompt
This example shows how to send an image along with a text prompt to the gemini-pro-vision model.
import os
import google.generativeai as genai
from PIL import Image # Requires installation: pip install Pillow
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
vision_model = genai.GenerativeModel('gemini-pro-vision')
def analyze_image(image_path, prompt_text):
"""
Analyzes an image with a given text prompt.
"""
try:
img = Image.open(image_path)
response = vision_model.generate_content([prompt_text, img])
return response.text
except FileNotFoundError:
return f"Error: Image file '{image_path}' not found."
except Exception as e:
return f"Error analyzing image: {e}"
if __name__ == "__main__":
# Ensure an image file named 'landscape.jpg' exists next to the script.
# You can generate this image via Axeto (e.g., a mountain landscape): /generate/image
image_file = "landscape.jpg"
image_prompt = "What are the prominent features of this landscape and what feeling does it convey?"
if not os.path.exists(image_file):
print(f"**Note:** File '{image_file}' not found. Please create an image with this name or correct the path.")
print("You can generate a landscape image here: [Generate Image](/generate/image)")
else:
result = analyze_image(image_file, image_prompt)
print("--- Image Analysis ---")
print(result)
print("-" * 30)
4. Multimodal Content Generation (Text and Image)
This example demonstrates how to combine multiple text and image parts in a single prompt.
import os
import google.generativeai as genai
from PIL import Image # Requires installation: pip install Pillow
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
vision_model = genai.GenerativeModel('gemini-pro-vision')
def generate_multimodal_content(image_path1, image_path2, text_prompt):
"""
Generates multimodal content by combining two images and a text prompt.
"""
try:
img1 = Image.open(image_path1)
img2 = Image.open(image_path2)
prompt_parts = [
"Considering these two images:",
img1,
"and",
img2,
text_prompt
]
response = vision_model.generate_content(prompt_parts)
return response.text
except FileNotFoundError as e:
return f"Error: One or more image files not found. {e}"
except Exception as e:
return f"Error generating multimodal content: {e}"
if __name__ == "__main__":
# Assume you have two image files named 'cat.jpg' and 'dog.jpg'.
# You can generate these images via Axeto: /generate/image
image_cat = "cat.jpg"
image_dog = "dog.jpg"
multimodal_prompt = "Please compare these two animals and state their main differences."
if not (os.path.exists(image_cat) and os.path.exists(image_dog)):
print(f"**Note:** Files '{image_cat}' or '{image_dog}' not found.")
print("Please place two images (cat and dog) with these names next to the script.")
print("You can generate animal images here: [Generate Image](/generate/image)")
else:
result = generate_multimodal_content(image_cat, image_dog, multimodal_prompt)
print("--- Multimodal Content Generation ---")
print(result)
print("-" * 30)
5. Setting Generation Parameters
This example shows how to set parameters like temperature and max_output_tokens to control the model's output.
import os
import google.generativeai as genai
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel('gemini-pro')
def generate_creative_text(prompt_text, creativity_level=0.9, max_tokens=150):
"""
Generates creative text with parameter adjustments.
"""
generation_config = {
"temperature": creativity_level,
"top_p": 1.0,
"top_k": 32,
"max_output_tokens": max_tokens
}
try:
response = model.generate_content(
prompt_text,
generation_config=generation_config
)
return response.text
except Exception as e:
return f"Error generating creative text: {e}"
if __name__ == "__main__":
creative_prompt = "Write a fantastical description of a futuristic city floating in the clouds."
result_creative = generate_creative_text(creative_prompt, creativity_level=0.95, max_tokens=250)
print("--- Creative Text Generation with Parameters ---")
print(result_creative)
print("-" * 30)
These sample codes will help you get acquainted with the various aspects of the Gemini API. By copying, running, and modifying these codes, you can quickly start building your own AI applications. Remember that to use image capabilities, you need to install the Pillow library (pip install Pillow). You can also use the Axeto image generation tool to produce sample images.
Common Errors
When working with any API, especially AI APIs, you may encounter errors. Understanding these errors and how to resolve them is essential for smooth development. Here, we address some common errors when working with the Gemini API and their solutions.
1. Authentication Error or Invalid API Key
This common error usually occurs due to a problem with your authentication with the API.
- Error Message (Sample):
```
google.api_core.exceptions.PermissionDenied: 403 API key not valid. Please pass a valid API key.
```
- Cause:
* The API key entered is incorrect or has expired.
* The API key is not correctly set as the GOOGLE_API_KEY environment variable.
* The Gemini API has not been enabled for your project in Google Cloud.
* You may have exceeded usage limits (quota) and need to enable billing.
- Solution:
1. Check API Key: Ensure you have copied the API key correctly from Google AI Studio.
2. Check Environment Variable: Make sure GOOGLE_API_KEY is set correctly. Restart your terminal or IDE after setting it.
3. Enable API: Go to Google Cloud Console and ensure the "Generative Language API" is enabled for your project.
4. Check Limits: If you are using it heavily, you might have reached the free tier limits. In this case, you need to enable billing in Google Cloud.
2. Resource Exhausted or Quota Exceeded Error
This error occurs when you exceed the usage limits (quota) for the Gemini API.
- Error Message (Sample):
```
google.api_core.exceptions.ResourceExhausted: 429 quota exceeded for project 'your-project-id'. Please check your project's quota.
```
- Cause:
* You have sent too many requests to the API in a short period.
* You have reached the maximum allowed tokens per minute or per day.
- Solution:
1. Reduce Request Rate: Introduce a delay between your requests (e.g., using time.sleep()).
2. Check Quota in Google Cloud: Go to Google Cloud Console and check the Generative Language API limits. If necessary, you can request a quota increase.
3. Enable Billing: Enabling billing usually provides higher limits.
3. Bad Request or Invalid Argument Error
This error usually indicates a problem with the structure of your request or its content.
- Error Message (Sample):
```
google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.
```
- Cause:
* Your prompt is too long and exceeds the maximum allowed input tokens.
* The input format is incorrect (e.g., trying to send an image to gemini-pro or text to gemini-pro-vision without the correct structure).
* The generation_config parameters are out of the allowed range (e.g., a negative temperature).
- Solution:
1. Check Prompt Length: Ensure your prompt does not exceed the model's maximum allowed input tokens. You can check the token count using model.count_tokens(prompt).
2. Check Model: Use the correct model for your input type (gemini-pro for text, gemini-pro-vision for text and image).
3. Check Parameters: Review the values for temperature, top_p, top_k, max_output_tokens and ensure they are within the valid range.
4. Safety Concerns or Content Blocked Error
Gemini has built-in safety filters that block inappropriate content.
- Error Message (Sample):
```
google.api_core.exceptions.BlockedBySafety: 400 The prompt was blocked due to safety concerns.
```
- Cause:
* The prompt or the response generated by the model violates one of Google's safety policies (e.g., sexually explicit, violent, hate speech, dangerous content).
- Solution:
1. Review Prompt: Review your prompt and ensure it does not contain any inappropriate content.
2. Modify Prompt: Try modifying the prompt to guide the model towards generating safer content.
3. Check response.prompt_feedback: This object can provide more detailed information about the reason for blocking.
5. Internal Server Error or Unavailable Error
These errors usually indicate a problem on Google's server side.
- Error Message (Sample):
```
google.api_core.exceptions.InternalServerError: 500 An internal error occurred.
google.api_core.exceptions.ServiceUnavailable: 503 The service is currently unavailable.
```
- Cause:
* Temporary issue with the Gemini API servers.
* High network traffic.
- Solution:
1. Retry: These errors are usually temporary. Try again after a few seconds or minutes.
2. Check Service Status: You can visit the Google Cloud Status Dashboard to check the status of Google services.
3. Implement Retry Mechanism: Implement a retry mechanism with delays (exponential backoff) in your code so that your application automatically retries in case of temporary errors.
By understanding these common errors and their solutions, you can improve your development process with the Gemini API and quickly identify and resolve issues. This will help you use AI models more effectively in your Axeto projects.
Conclusion
In this comprehensive tutorial, we introduced you to the powerful world of the Gemini API. From the initial steps of obtaining an API key and setting up the development environment to sending your first text requests and working with multimodal inputs (images), we covered all the essential aspects for getting started with this advanced AI model.
We saw how Gemini, with its unparalleled capabilities, can open new horizons in content creation and AI application development, from generating creative and informative texts to understanding and interpreting images. The Axeto tests also demonstrated that Gemini performs very strongly in understanding and generating Persian, making it a valuable tool for Iranian content creators and developers.
Key takeaways from this tutorial:
- Obtaining and Managing API Keys: The importance of API key security and how to set it as an environment variable.
- Configuring Models: Choosing the appropriate model (
gemini-profor text andgemini-pro-visionfor multimodal). - Sending Text Requests: How to use
generate_contentfor text generation and control output with parameters liketemperatureandmax_output_tokens. - Working with Images: Loading images and combining them with text prompts for analysis and multimodal content generation.
- Conversational Chat: Maintaining conversation context for more natural interactions.
- Error Handling: Identifying and resolving common errors such as authentication issues, usage limits, and safety concerns.
- Potential for Axeto: Exploring how to use the Gemini API to enhance Axeto's content creation and image generation tools and provide a better experience for Iranian users.
Now that you are familiar with the basics of the Gemini API, it's time to get hands-on. By experimenting with the sample code, changing prompts, and exploring different models, you can quickly enhance your skills in this area. Use this knowledge to build creative applications, automate content creation tasks, and create innovative user experiences on the Axeto platform.
At Axeto, we are committed to continuously providing you with the latest and most powerful AI tools. The Gemini API is just one of these tools, holding immense potential to transform how we create content and interact with technology. Remember that learning AI is a continuous journey, and we will be with you every step of the way.
For more information and new updates, follow the Axeto Blog and our educational resources. Also, if you have any questions or encounter issues, the Axeto support team is ready to assist you.
Resource
For more information and detailed technical specifics about the Gemini API, you can refer to the following official resources:
- Official Google AI Studio Documentation:
* Google AI Studio Documentation
* Quickstart for Python with Gemini API
- Google Generative AI Library on PyPI:
- Official Gemini Website:
- Google Cloud Console:
* Generative Language API Quotas
* Google Cloud Status Dashboard
These resources provide comprehensive and up-to-date information on the Gemini API, how to use it, available models, pricing, and troubleshooting potential issues. It is recommended to consult these documents for deeper insights and to discover more advanced capabilities. You can also use the Google AI Developers Community to ask questions and share your experiences.

تست Axeto
3 پرامپت فارسی استاندارد روی Gemini در Axeto تست شد. نتایج بر اساس کیفیت چهره/متن/سبک و سازگاری با پرامپت فارسی ارزیابی شد.
3 پرامپت تستشده
| پرامپت | امتیاز | یادداشت |
|---|---|---|
| پرتره زن جوان ایرانی، نور طبیعی پنجره، فوکوس نرم، پسزمینه مینیمال | A | جزئیات چهره و نور طبیعی قابل قبول؛ مناسب پرامپتهای پرتره فارسی. |
| منظره کویر ایران، غروب طلایی، ابرهای دراماتیک، فوتورéalistic | A- | ترکیببندی منظره خوب؛ رنگهای غروب طبیعی. |
| لوگوی مینیمال برای استارتاپ فintech، خطوط هندسی، پسزمینه سفید | B+ | متن/لوگو خوانا؛ برای برندینگ فارسی نیاز به تکرار پرامپت با وزن بیشتر. |
مزایا
- قابلیتهای چندوجهی (متن، تصویر، ویدئو، کد).
- دسترسی آسان از طریق API برای توسعهدهندگان.
- پشتیبانی قوی از سوی گوگل و جامعه کاربری.
- ارائه نسخه رایگان (free tier) برای شروع.
- افزایش بهرهوری در تولید محتوا و کدنویسی.
- ادغام با اکوسیستم Google Cloud.
معایب
- نیاز به حساب Google و فعالسازی Google Cloud.
- مدیریت کلید API و مسائل امنیتی مرتبط با آن.
- پیچیدگی اولیه برای کاربران کاملاً مبتدی.
- وابستگی به سرویسهای ابری گوگل.
- محدودیتهای استفاده در نسخه رایگان.
- نیاز به دانش برنامهنویسی (Python) برای استفاده کامل.
خط زمانی
2022
Gemini یک مدل هوش مصنوعی چندوجهی از گوگل است که قابلیت درک و تولید متن، تصویر، ویدئو و کد را دارد.
2023
برای استفاده از Gemini API، به یک حساب Google فعال و فعالسازی Google Cloud Console نیاز دارید.
2024
کلید API Gemini را باید از Google AI Studio دریافت کرده و آن را همیشه محرمانه نگه دارید.
2025
نصب Python (نسخه 3.8 به بالا) و کتابخانه `google-generativeai` برای کار با API ضروری است.
2026
با استفاده از Gemini API میتوانید برنامههای کاربردی هوشمند و خلاقانهای در زمینههای مختلف توسعه دهید.
منابع
سوالات متداول
Gemini API چیست؟▾
Gemini API یک رابط برنامهنویسی کاربردی است که به توسعهدهندگان اجازه میدهد تا به مدلهای هوش مصنوعی پیشرفته Gemini از گوگل دسترسی پیدا کرده و از قابلیتهای آن در برنامههای خود استفاده کنند.
چگونه میتوانم به Gemini API دسترسی پیدا کنم؟▾
برای دسترسی به Gemini API، ابتدا باید یک کلید API از کنسول توسعهدهندگان گوگل دریافت کنید. پس از آن، میتوانید با استفاده از کتابخانههای رسمی یا درخواستهای HTTP به API متصل شوید.
چه زبانهای برنامهنویسی توسط Gemini API پشتیبانی میشوند؟▾
Gemini API از زبانهای برنامهنویسی مختلفی مانند پایتون، جاوا، نود.جیاس، و گو پشتیبانی میکند. همچنین، مستندات کاملی برای هر زبان ارائه شده است.
کاربردهای اصلی Gemini API چیست؟▾
Gemini API در طیف وسیعی از کاربردها مانند تولید محتوا، خلاصهسازی متن، ترجمه، چتباتها، تحلیل احساسات، و پردازش تصویر و ویدئو قابل استفاده است.
آیا استفاده از Gemini API رایگان است؟▾
گوگل یک لایه رایگان (Free Tier) برای Gemini API ارائه میدهد که به شما اجازه میدهد تا با محدودیتهایی از آن استفاده کنید. برای استفادههای بیشتر و پروژههای بزرگتر، برنامههای قیمتگذاری مختلفی وجود دارد.
تفاوت Gemini API با GPT-4 API چیست؟▾
هر دو Gemini API و GPT-4 API رابطهای برنامهنویسی برای دسترسی به مدلهای زبانی بزرگ هستند. تفاوتهای اصلی آنها در معماری مدل، قابلیتها، و سیاستهای قیمتگذاری است که توسط شرکتهای توسعهدهنده (گوگل و OpenAI) تعیین میشود.
چگونه میتوانم خروجی Gemini API را بهینه کنم؟▾
برای بهینهسازی خروجی Gemini API، میتوانید از تکنیکهایی مانند مهندسی پرامپت (Prompt Engineering) پیشرفته، تنظیم پارامترهای مدل (مانند دما و حداکثر توکن)، و استفاده از دستورالعملهای واضح و دقیق استفاده کنید.
آیا Gemini API از پردازش تصویر و ویدئو پشتیبانی میکند؟▾
بله، مدلهای پیشرفته Gemini قابلیتهای چندوجهی (multimodal) دارند و میتوانند ورودیهای تصویری و ویدئویی را نیز پردازش کرده و بر اساس آنها پاسخهای متنی تولید کنند.
چه منابعی برای یادگیری بیشتر در مورد Gemini API وجود دارد؟▾
بهترین منابع برای یادگیری بیشتر شامل مستندات رسمی گوگل، مثالهای کد، آموزشهای ویدئویی، و انجمنهای توسعهدهندگان است.
چگونه میتوانم خطاهای API را در Gemini API مدیریت کنم؟▾
برای مدیریت خطاها، باید کدهای خطای بازگردانده شده توسط API را بررسی کرده و منطق مناسبی برای رسیدگی به آنها (مانند تلاش مجدد، گزارش خطا، یا نمایش پیام به کاربر) در برنامه خود پیادهسازی کنید.
آیا Gemini API برای برنامههای موبایل مناسب است؟▾
بله، Gemini API با ارائه SDKها و کتابخانههای مختلف، برای توسعه برنامههای موبایل (اندروید و iOS) نیز قابل استفاده است.
امنیت دادهها در Gemini API چگونه تضمین میشود؟▾
گوگل تدابیر امنیتی قوی برای حفاظت از دادهها در Gemini API اتخاذ کرده است، از جمله رمزنگاری دادهها در حال انتقال و در حالت سکون، و رعایت استانداردهای حریم خصوصی.
آیا میتوانم مدلهای Gemini را سفارشیسازی کنم؟▾
بله، گوگل امکان سفارشیسازی (Fine-tuning) مدلهای Gemini را برای برخی از کاربران و کاربردهای خاص فراهم میکند، که به شما اجازه میدهد مدل را با دادههای خود آموزش دهید.
چگونه میتوانم عملکرد مدل را در Gemini API ارزیابی کنم؟▾
برای ارزیابی عملکرد مدل، میتوانید از معیارهایی مانند دقت، روانی، و مرتبط بودن پاسخها استفاده کنید. همچنین، ابزارهای ارزیابی و تست A/B نیز میتوانند مفید باشند.
آیا Gemini API از تولید کد پشتیبانی میکند؟▾
بله، یکی از قابلیتهای پیشرفته Gemini API، توانایی تولید قطعه کد، تکمیل کد، و حتی رفع اشکال در زبانهای برنامهنویسی مختلف است.
چه محدودیتهایی در استفاده از Gemini API وجود دارد؟▾
محدودیتها شامل تعداد درخواستها در دقیقه (Rate Limits)، حداکثر طول ورودی و خروجی، و محدودیتهای مربوط به استفاده از منابع محاسباتی است که باید در برنامهریزی پروژه در نظر گرفته شوند.
چگونه میتوانم بازخورد خود را در مورد Gemini API ارائه دهم؟▾
میتوانید از طریق انجمنهای توسعهدهندگان گوگل، ابزارهای بازخورد در کنسول توسعهدهندگان، یا کانالهای پشتیبانی رسمی، بازخورد خود را ارائه دهید.
آیا Gemini API برای کاربردهای تجاری مناسب است؟▾
بله، Gemini API برای توسعه و استقرار برنامههای کاربردی تجاری در مقیاسهای مختلف طراحی شده است و ابزارها و پشتیبانی لازم را فراهم میکند.
چگونه میتوانم از Gemini API برای ساخت یک چتبات استفاده کنم؟▾
برای ساخت چتبات، باید ورودی کاربر را دریافت کرده و آن را به عنوان پرامپت به Gemini API ارسال کنید. سپس پاسخ مدل را دریافت کرده و به کاربر نمایش دهید. مدیریت تاریخچه گفتگو نیز برای حفظ زمینه مکالمه ضروری است.
آینده Gemini API چگونه خواهد بود؟▾
گوگل به طور مداوم در حال توسعه و بهبود مدلهای Gemini و API آن است. انتظار میرود که در آینده قابلیتهای جدیدتر، عملکرد بهتر، و پشتیبانی از موارد استفاده بیشتری به آن اضافه شود.
مقالات مرتبط
راهنماPrompt Engineering برای فارسی: راهنمای جامع برای خلق محتوای…
راهنماآموزش کامل Flux برای تولید تصاویر AI
خبرآخرین بهروزرسانیهای هوش مصنوعی گوگل در ژوئن ۲۰۲۶: نگاهی عمیق
آموزشآموزش استفاده از GPT Image API برای تولید تصاویر با هوش مصنوعی
راهنماراهنمای کامل انتخاب LLM برای پروژههای شما
خبرآینده هوش مصنوعی در آموزش: رویداد گوگل در نیویورک
همین حالا در Axeto امتحان کنید
مقاله را خواندید — حالا با ابزار واقعی Axeto خروجی بگیرید.
نظرات (0)
- در حال بارگذاری نظرات...
