Introduction
The rapid growth of generative AI model capabilities enables us to build many businesses around GenAI. Today's models not only generate text but, with powerful multimodal models such as GPT-4 and Gemini, can also use image data to generate information. This capability has huge potential in the business world: for example, you can use any image to get information about it directly from the AI without any overhead. In this article, we will go through the process of using the Gemini Pro Vision multimodal model to get product information from an image and then creating a FastAPI-based REST API to consume the extracted information. So, let's start learning by building a product discovery API.
Learning Objectives
- What is REST architecture?
- Using REST APIs to access web data
- How to use FastAPI and Pydantic for creating REST APIs
- What steps to take to build APIs using Google Gemini Pro Vision
- How to use the LlamaIndex library to access Google Gemini models
This article was published as a part of the Data Science Blogathon.
What’s a REST API?
A REST API or RESTful API is an application programming interface (API) that uses the design principles of the Representational State Transfer architecture. It helps developers integrate application components in a microservices architecture.
An API is a way to enable an application or service to access resources within another service or application.
Let's take a restaurant analogy to understand the concepts.
You are a restaurant owner, so you have two services running when the restaurant is open.
- One is the kitchen, where the delicious food is made.
- Two, the seating or table area where people sit and eat the food.
Here, the kitchen is the SERVER where delicious food, or data, is produced for the people, or clients. People (clients) check the menu (API) and place an order (request) with the kitchen (server) using specific codes (HTTP methods) such as "GET", "POST", "PUT", or "DELETE".
Understand the HTTP methods using the restaurant analogy
- GET: The client browses the menu before ordering food.
- POST: The client places an order, which means the kitchen starts making the food, or creating data on the server, for the client.
- PUT: To understand the "PUT" method, imagine that after placing your order, you forgot to add ice cream, so you update the existing order, which means updating the data.
- DELETE: If you want to cancel the order, you delete the data using the "DELETE" method.
These are the most used methods for building APIs using the REST framework.
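To make the analogy concrete, here is a minimal sketch that maps the four methods onto create, read, update, and delete operations on an in-memory dictionary. It uses only the standard library; the `handle` function, the `MENU` list, and the `orders` store are hypothetical names invented for this illustration and are not part of FastAPI or any real server.

```python
import itertools
from typing import List, Optional

# Hypothetical "kitchen" state: the menu is read-only, orders are mutable data.
MENU = ["pizza", "pasta", "ice cream"]
orders = {}
_next_id = itertools.count(1)  # hands out a fresh order id on every POST


def handle(method: str, order_id: Optional[int] = None,
           dishes: Optional[List[str]] = None):
    """Dispatch a request by HTTP method, mirroring the restaurant analogy."""
    if method == "GET":
        # Browse the menu: read-only, nothing changes on the server.
        return list(MENU)
    if method == "POST":
        # Place a new order: creates data on the server.
        new_id = next(_next_id)
        orders[new_id] = list(dishes or [])
        return new_id
    if method == "PUT":
        # Update an existing order by replacing its contents.
        if order_id not in orders:
            raise KeyError(order_id)
        orders[order_id] = list(dishes or [])
        return orders[order_id]
    if method == "DELETE":
        # Cancel the order: removes the data and returns what was removed.
        return orders.pop(order_id)
    raise ValueError(f"Unsupported method: {method}")


order_id = handle("POST", dishes=["pizza"])
handle("PUT", order_id, ["pizza", "ice cream"])  # the forgotten ice cream
print(orders)
handle("DELETE", order_id)
print(orders)
```

A real REST API would expose these operations over HTTP at URLs such as `/orders/{id}`; the dispatch on the method name is the part the analogy describes.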
What is the FastAPI framework?
FastAPI is a high-performance modern web framework for API development. It is built on top of Starlette for the web parts and Pydantic for data validation and serialization. The most noticeable features are below:
- High Performance: It is based on ASGI (Asynchronous Server Gateway Interface), which means FastAPI leverages asynchronous programming to handle high-concurrency scenarios without much overhead.
- Data Validation: FastAPI uses Pydantic, a widely used data validation library. We will learn about it later in the article.
- Automatic API documentation using Swagger UI, with a full OpenAPI-standard JSON schema.
- Easy Extensibility: FastAPI integrates easily with other Python libraries and frameworks.
What is LlamaIndex?
LlamaIndex is a tool that acts as a bridge between your data and LLMs. The LLMs can be local, using Ollama (which runs LLMs on a local machine), or an API service such as OpenAI, Gemini, etc. LlamaIndex can be used to build Q&A systems, chat processes, intelligent agents, and other LLM applications. It lays the foundation of Retrieval Augmented Generation (see the image below) with ease in three well-defined steps:
- Step One: Knowledge Base (Input)
- Step Two: Trigger/Query (Input)
- Step Three: Task/Action (Output)
In the context of this article, we will build Step Two and Step Three. We will give an image as input and retrieve the product information from the product in the image.
Setup project environment
Here is the not-so-good flowchart of the application:
I will use conda to set up the project environment. Follow the steps below.
Schematic project structure
Step 1: Create a conda environment
conda create --name api-dev python=3.11
conda activate api-dev
Step 2: Install the required libraries
# llamaindex libraries
pip install llama-index llama-index-llms-gemini llama-index-multi-modal-llms-gemini
# google generative ai
pip install google-generativeai google-ai-generativelanguage
# for API development (uvicorn serves the app, python-dotenv loads .env, jinja2 renders templates)
pip install fastapi uvicorn python-dotenv jinja2
Step 3: Getting the Gemini API key
Go to Google AI and click on Get an API Key. It will take you to Google AI Studio, where you can create an API key.
Keep it safe and save it; we will need it later.
Implementing REST API
Create a separate folder for the project; let's name it gemini_productAPI
# create empty project directory
mkdir gemini_productAPI
# activate the conda environment
conda activate api-dev
To ensure FastAPI is installed correctly, create a Python file named main.py and copy the code below into it.
from fastapi import FastAPI, Request

app = FastAPI()

# Landing page for the application
@app.get("/")
def index(request: Request):
    return "Hello World"
Because FastAPI is an ASGI framework, we will use an asynchronous web server to run the FastAPI application. There are two kinds of server gateway interfaces: WSGI and ASGI. They both sit between a web server and a Python web application or framework and handle incoming client requests, but they do it differently.
- WSGI or Web Server Gateway Interface: It is synchronous, which means each worker handles one request at a time and blocks the others until the previous request is completed. The popular Python web framework Flask is a WSGI framework.
- ASGI or Asynchronous Server Gateway Interface: It is asynchronous, which means it can handle multiple requests concurrently without blocking others. It is more modern and robust for multiple clients, long-lived connections, and bidirectional communication such as real-time messaging, video calls, etc.
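The difference between the two models can be illustrated with a toy sketch built on `asyncio` from the standard library. This is not how a WSGI or ASGI server is implemented; `handle_request` is a hypothetical stand-in for an endpoint that waits on slow I/O, and the event log only shows the order in which requests start and finish.

```python
import asyncio

events = []  # records the order in which requests start and finish


async def handle_request(name: str) -> str:
    """Simulate a request that waits on slow I/O (e.g. a call to a model API)."""
    events.append(f"start {name}")
    await asyncio.sleep(0.05)  # while this request waits, others may run
    events.append(f"end {name}")
    return f"response for {name}"


async def serve_sequentially(names):
    # One request at a time: the next starts only after the previous ends.
    return [await handle_request(name) for name in names]


async def serve_concurrently(names):
    # All requests are in flight together on a single thread.
    return await asyncio.gather(*(handle_request(name) for name in names))


asyncio.run(serve_sequentially(["a", "b"]))
sequential_events = list(events)

events.clear()
asyncio.run(serve_concurrently(["a", "b"]))
concurrent_events = list(events)

print(sequential_events)
print(concurrent_events)
```

In the sequential run, request b cannot start until request a has ended; in the concurrent run, both start before either finishes, which is the behaviour an ASGI server relies on for I/O-bound endpoints.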
Uvicorn is an ASGI web server implementation for Python. ASGI provides a standard interface between async-capable Python web servers, frameworks, and applications. FastAPI is an ASGI framework, and Uvicorn is the server commonly used to run it.
Now start the Uvicorn server and go to http://127.0.0.1:8000 in your browser. You will see Hello World written on it.
# open your VS Code terminal and type
uvicorn main:app --reload
Now, we are set to start coding the main project.
Importing Libraries
import os
from typing import List
# loads variables from the .env file into the environment
from dotenv import load_dotenv
# fastapi libs
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import HTMLResponse
from fastapi.staticfiles import StaticFiles
from fastapi.templating import Jinja2Templates
# pydantic libs
from pydantic import BaseModel, ConfigDict, Field, HttpUrl
# llamaindex libs
from llama_index.multi_modal_llms.gemini import GeminiMultiModal
from llama_index.core.program import MultiModalLLMCompletionProgram
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.multi_modal_llms.generic_utils import load_image_urls
After importing the libraries, create a .env file and put the Google Gemini API key you got earlier in it.
# put it in the .env file
GOOGLE_API_KEY="AB2CD4EF6GHIJKLM-NO6P3QR6ST"
Now instantiate the FastAPI class and load the Google API key from the environment.
app = FastAPI()
load_dotenv()
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY")
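Note that `os.getenv` returns `None` when the variable is missing, so a forgotten .env file only surfaces later as an authentication error. A small guard can fail earlier with a clearer message. The sketch below uses only the standard library; `require_env` is a hypothetical helper, not part of FastAPI or python-dotenv, and the placeholder key is set only so the example runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell")
    return value


# Demo only: a placeholder value so this sketch runs without a real key.
os.environ.setdefault("GOOGLE_API_KEY", "placeholder-key")

api_key = require_env("GOOGLE_API_KEY")
print("API key loaded:", bool(api_key))
```

In the application, this call would come after `load_dotenv()`, so that values from the .env file are already in the environment.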
Create a simple landing page
Create a GET method for the project's simple landing page.
# Landing page for the application
@app.get("/", response_class=HTMLResponse)
def landingpage(request: Request):
    return templates.TemplateResponse(
        "landingpage.html",
        {"request": request}
    )
To render HTML in FastAPI, we use Jinja templates. To do this, create a templates directory at the root of your project, and for static files such as CSS and JavaScript files, create a directory named static. Copy the code below into your main.py after the app instantiation.
# Linking the templates directory using Jinja templates
templates = Jinja2Templates(directory="templates")
# Mounting static files from the static directory
app.mount("/static", StaticFiles(directory="static"), name="static")
The code above links your templates and static directories to the FastAPI application.
Now, create a file called landingpage.html in the templates directory. Go to GithubLink and copy the code from /template/landingpage.html to your project.
Truncated code snippet
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Product Discovery API</title>
<link rel="stylesheet" href="{{ url_for('static', path='/landingpage.css') }}">
</head>
<body>
<header class="header">
<div class="container header-container">
<h1 class="content-head is-center">
Discover Products with Google Gemini Pro Vision
....
</header>
<main>
....
</main>
<footer class="footer">
<div class="container footer-container">
<p class="footer-text">© 2024 Product Discovery API.
All rights reserved.</p>
</div>
</footer>
</body>
</html>
After that, create a directory named static and two files, landingpage.css and landingpage.js, in your static directory. Now, go to GithubLink and copy the code from landingpage.js to your project.
Truncated code snippet
document.getElementById('upload-btn').addEventListener('click', function() {
    const fileInput = document.getElementById('upload');
    const file = fileInput.files[0];
    ......
document.getElementById('contact-form').addEventListener(
    'submit', function(event) {
        event.preventDefault();
        alert('Message sent!');
        this.reset();
    }
);
For the CSS, go to the GitHub link and copy the code from landingpage.css to your project.
Truncated code snippet
body {
    font-family: 'Arial', sans-serif;
    margin: 5px;
    padding: 5px;
    box-sizing: border-box;
    background-color: #f4f4f9;
}
.container {
    max-width: 1200px;
    margin: 0 auto;
    padding: 0 20px;
}
The final page will look like the one below.
This is a very basic landing page created for the article. You can make it more attractive using CSS styling or a React UI.
We will use the Google Gemini Pro Vision model to extract product information from an image.
def gemini_extractor(model_name, output_class, image_documents, prompt_template_str):
    gemini_llm = GeminiMultiModal(api_key=GOOGLE_API_KEY, model_name=model_name)
    llm_program = MultiModalLLMCompletionProgram.from_defaults(
        output_parser=PydanticOutputParser(output_class),
        image_documents=image_documents,
        prompt_template_str=prompt_template_str,
        multi_modal_llm=gemini_llm,
        verbose=True,
    )
    response = llm_program()
    return response
In this function, we use LlamaIndex's GeminiMultiModal API to work with the Google Gemini API. The LlamaIndex MultiModalLLMCompletionProgram API takes the output parser, image data, prompt, and GenAI model to get our desired response from the Gemini Pro Vision model.
To extract information from the image, we have to engineer a prompt for this purpose.
prompt_template_str = """
    You are an expert system designed to extract products from images for
    an e-commerce application. Please provide the product name, product color,
    product category and a descriptive query to search for the product.
    Accurately identify every product in an image and provide a descriptive
    query to search for the product. You just return a correctly formatted
    JSON object with the product name, product color, product category and
    description for each product in the image
"""
With this prompt, we instruct the model that it is an expert system that can extract information from an image. It will extract the information below from the given image input.
- Name of product
- Color
- Category
- Description
This prompt will be passed as an argument to the gemini_extractor function above later.
We all know that a generative AI model can often produce undesired responses. This is a problem when working with generative AI because the model will not always follow the prompt. To mitigate this kind of issue, Pydantic comes into the picture. FastAPI is built on Pydantic to validate its API models.
Creating a Product model using Pydantic
class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str

class ExtractedProductsResponse(BaseModel):
    products: List[Product]

class ImageRequest(BaseModel):
    url: str
    # Example payload shown in the OpenAPI docs
    model_config = ConfigDict(
        json_schema_extra={
            "examples": [
                {
                    "url": "https://images.pexels.com/photos/356056/pexels-photo-356056.jpeg?auto=compress&cs=tinysrgb&w=1260&h=750&dpr=1"
                }
            ]
        }
    )
The Product class above defines a data model for a product, the ExtractedProductsResponse class represents a response structure that contains a list of these products, and the ImageRequest class describes the input image that clients provide. We use Pydantic to ensure the structural integrity of the data through validation and serialization.
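To see what this validation buys us, here is a minimal sketch that checks two hand-written JSON strings against the response model. It assumes Pydantic v2 (the `model_validate_json` method); the models are redefined so the sketch runs on its own, and both sample strings and the `parse_model_output` helper are hypothetical, not real Gemini output or part of any library.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class Product(BaseModel):
    id: int
    product_name: str
    color: str
    category: str
    description: str


class ExtractedProductsResponse(BaseModel):
    products: List[Product]


# Hypothetical model outputs, written by hand for illustration.
good_output = (
    '{"products": [{"id": 1, "product_name": "Desk lamp", "color": "black", '
    '"category": "Lighting", "description": "black adjustable desk lamp"}]}'
)
bad_output = '{"products": [{"id": "one", "product_name": "Desk lamp"}]}'


def parse_model_output(raw_json: str) -> Optional[ExtractedProductsResponse]:
    """Return a validated response, or None when the JSON does not fit the schema."""
    try:
        return ExtractedProductsResponse.model_validate_json(raw_json)
    except ValidationError as err:
        print(f"Rejected model output: {err.error_count()} validation error(s)")
        return None


parsed = parse_model_output(good_output)
print(parsed.products[0].product_name)
print(parse_model_output(bad_output))
```

The second string is rejected because the id is not an integer and several required fields are missing, so malformed output never reaches the API's clients.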
all_products = []

@app.post("/extracted_products")
def extracted_products(image_request: ImageRequest):
    responses = gemini_extractor(
        model_name="models/gemini-pro-vision",
        output_class=ExtractedProductsResponse,
        image_documents=load_image_urls([image_request.url]),
        prompt_template_str=prompt_template_str,
    )
    all_products.append(responses)
    return responses
In the above code snippet, we create an endpoint in the FastAPI application using the POST method with the decorator @app.post("/extracted_products"), which processes the requested image to extract product information. The extracted_products function handles requests to this endpoint. It takes the image_request parameter of type ImageRequest.
We call the gemini_extractor function we created previously for information extraction, and the response is stored in the all_products list. We use a built-in Python list to store the responses for simplicity. You can add database logic to store the response in a database. MongoDB would be a good choice for storing this kind of JSON data.
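As a rough idea of what such database logic could look like, the sketch below stores each response as a JSON string. It deliberately swaps in `sqlite3` from the standard library rather than MongoDB so that it runs without extra services; the table name, the helper functions, and the sample response are all assumptions made for this illustration.

```python
import json
import sqlite3

# In-memory database for the demo; pass a file path to persist between restarts.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS extractions ("
    "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
)


def save_extraction(response: dict) -> int:
    """Store one extraction response as a JSON string and return its row id."""
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            "INSERT INTO extractions (payload) VALUES (?)",
            (json.dumps(response),),
        )
    return cursor.lastrowid


def load_extractions() -> list:
    """Return every stored response, oldest first."""
    rows = conn.execute("SELECT payload FROM extractions ORDER BY id").fetchall()
    return [json.loads(payload) for (payload,) in rows]


# Hypothetical response shaped like ExtractedProductsResponse.
sample = {
    "products": [
        {"id": 1, "product_name": "Desk lamp", "color": "black",
         "category": "Lighting", "description": "black adjustable desk lamp"}
    ]
}
save_extraction(sample)
print(load_extractions())
```

Inside the endpoint, a Pydantic response would first be converted with `model_dump()` before saving. Because FastAPI runs non-async endpoints in a thread pool, a real application would open a connection per request instead of sharing one module-level connection.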
Requesting an image from the OpenAPI docs
Go to http://127.0.0.1:8000/docs in your browser; you will see the OpenAPI docs.
Expand /extracted_products and click Try It Out on the right.
Then click Execute, and it will extract the product information from the image using the Gemini Pro Vision model.
The response from the POST method will look like this.
In the above image, you can see the requested URL and the response body, which is the response generated by the Gemini model.
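The same request can also be made from Python instead of the docs page. The sketch below only builds the request with the standard library, so it runs without the server; the image URL is a placeholder, `build_extraction_request` is a hypothetical helper, and the address assumes the local Uvicorn server from earlier.

```python
import json
import urllib.request

# Assumes the API is served locally by Uvicorn on the default port.
API_URL = "http://127.0.0.1:8000/extracted_products"


def build_extraction_request(image_url: str) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the ImageRequest model."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder image URL, for illustration only.
request = build_extraction_request("https://example.com/shoe.jpg")
print(request.get_method(), request.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The JSON body carries a single `url` field because that is the only field the ImageRequest model defines.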
Creating a product endpoint with a GET method for fetching the data
@app.get("/api/products/", response_model=list[ExtractedProductsResponse])
async def get_all_products_api():
    return all_products
Go to http://127.0.0.1:8000/api/products to see all the products.
In the above code, we created an endpoint to fetch the extracted data stored in the all_products list. Others can use this JSON data for their own products, such as e-commerce sites.
All the code used in this article is in the GitHub repository Gemini-Product-Software
Conclusion
This is a simple yet systematic way to access and use the Gemini multimodal model to make a minimum viable product discovery API. You can use this technique to build a more robust product discovery system directly from an image. This type of application has very useful business potential, e.g., an Android application that uses the camera API to take photos and the Gemini API to extract product information from those photos, which can then be used to buy products directly from Amazon, Flipkart, etc.
Key Takeaways
- The Representational State Transfer architecture for building high-performance APIs for business.
- LlamaIndex has an API to connect different GenAI models. Today, we learned how to use LlamaIndex with the Gemini API to extract image data using the Gemini Pro Vision model.
- Many Python frameworks, such as Flask, Django, and FastAPI, are used to build REST APIs. We learned how to use FastAPI to build a robust REST API.
- Prompt engineering to get the expected response from the Gemini model.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
Frequently Asked Questions
A: LlamaIndex uses OpenAI by default, but if you want to use other models such as Cohere, Gemini, Llama2, Ollama, or MistralAI, you can install a model-specific library using pip. See use cases here.
A: You can use any UI framework you want with FastAPI. Create a frontend directory and a backend directory for the API application in the FastAPI project root and link your frontend UI with the backend. See the Full Stack application here.
A: The responses from the model are JSON, so any document database such as MongoDB would be good for storing the responses and retrieving the data for the application. See MongoDB with FastAPI here.