Introduction
In a world where presentations rely more on appealing graphics than on extensive text, a multimodal approach makes summarizing slides or preparing presentation notes easy. Imagine a tool that not only understands the complexities of images, charts, and visual elements but can also provide a succinct and instructive summary for your audience. Enter Google's Gemini Pro, a multimodal model that comes to your rescue. With Gemini Pro, you no longer need to struggle to distill complicated graphics into meaningful statements. It uses AI to reveal the story behind each image and chart, making your presentations not only visually stunning but also easy to understand. This article focuses on building a simple PPT summarizer using the Gemini Pro multimodal model and the Streamlit framework.
Learning Objectives
- Learn to build a PPT slide note generator and summarizer using the Streamlit framework.
- Know how to use the Gemini API to build a PPT summarizer.
- Learn how to build the Streamlit app in Colab itself and run it with the help of supporting libraries.
- Understand the fundamentals of the Gemini model series.
This article was published as a part of the Data Science Blogathon.
Gemini Model Series
Gemini, a multimodal model series built by Google, has a prominent place in AI progress. It has made advances in natural language understanding, code interpretation, image analysis, audio processing, and video analysis. Designed to push the boundaries of AI capabilities, Gemini aims for state-of-the-art performance across various benchmarks.
Gemini Models
Gemini is available in 3 distinct model sizes. They are:
1. Gemini Nano: It is a compact version of the model that can run on an edge device and is competent in various tasks, including natural language understanding, code interpretation, and image and audio analysis. Currently, Google uses this model on its Pixel phones. You can read more about it here.
2. Gemini Pro: This is the model version that Google has made available to the public. It is a medium-scale model similar to the text-based PaLM model but with several enhanced capabilities. The Gemini Pro model currently comes in two variants: one for text input (models/gemini-pro) and the other for image-based input along with text (models/gemini-pro-vision).
3. Gemini Ultra: It is the largest model in the Gemini series, with a large-scale architecture. It can handle complex video and audio processing tasks and rates highly against human expert performance.
Building the PowerPoint Summarizer
We will now go into detail about how we can create a simple PPT summarizer. Our app will have the following features:
1. Allow users to upload the PowerPoint file they want to summarize
2. Convert the slides of the PPT to images
3. Generate a summary for each slide
4. Display the complete summary
Step 1: Install Required Libraries
We will install the required libraries, which are google-generativeai, streamlit, localtunnel, and Spire.Presentation.
We need localtunnel to host the Streamlit app directly from the Colab notebook.
Localtunnel – This assigns you a unique, publicly accessible URL that proxies all requests to your locally running web server. Basically, it allows us to access the Streamlit app running in our Colab local environment.
Spire.Presentation – This is used to load the PPT and to convert its slides into images.
!pip install -q -U google-generativeai
!pip install -q streamlit
!npm install localtunnel # for hosting the streamlit app from colab
!pip install Spire.Presentation
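Before moving on, it can help to confirm that the Python packages are actually importable in the current runtime. The following is a minimal sketch using only the standard library; `missing_modules` is a hypothetical helper name, and the import names in `required` are assumed to correspond to the packages installed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    missing = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # A missing parent package (e.g. "google") raises instead of returning None.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the packages installed in this step.
    required = ["streamlit", "google.generativeai", "spire.presentation"]
    print("Missing modules:", missing_modules(required) or "none")
```

If the list is not empty, re-running the install cell (or restarting the Colab runtime) is usually the first thing to try.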
Step 2: Create Utility Python File
We will now create a utility Python file. If you are using Colab, you can create a new file and name it "utility.py"; the .py extension makes it a Python file.
The utility file will contain all the supporting utility functions for the app, so that the main app, which contains the UI elements, can be defined separately. We will now define the functions in the utility file one by one.
Step 3: Import Libraries & Define Initialize Function
We will import all the required libraries and define the initialize function, which does the following:
First, it configures our Google API key, which we can obtain by logging into our Google account and then navigating to this website.
Then it creates an instance of the Gemini vision model using genai.GenerativeModel('gemini-pro-vision').
Finally, it returns the model instance that was created.
import streamlit as st
import os
import shutil
import time
from pathlib import Path

import google.generativeai as genai
from PIL import Image
from spire.presentation.common import *
from spire.presentation import *


def initialize():
    # Configure the API key and initialise the model
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = 'YOUR API KEY'
    genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
    # Create the vision model (use 'gemini-pro' instead for text-only input)
    vision_model = genai.GenerativeModel('gemini-pro-vision')
    return vision_model
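Hardcoding a key in a source file makes it easy to leak by accident, so one option is to read it only from the environment and fail early when it is missing. The sketch below is one way to do that, not part of the original utility file: `get_api_key` and `PLACEHOLDER` are hypothetical names, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os

# Placeholder string used in this article; treated the same as "no key set".
PLACEHOLDER = "YOUR API KEY"


def get_api_key(env=None, var_name="GOOGLE_API_KEY"):
    """Return the API key stored under var_name, or raise if it is unset."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"Set the {var_name} environment variable to a real API key "
            "before starting the app."
        )
    return key
```

With such a helper, the configure call in initialize could become genai.configure(api_key=get_api_key()), so a missing key surfaces as a clear error rather than a failed request later.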
Step 4: Define Save File Function
This function saves the file uploaded through Streamlit into our local folder structure. Whenever we upload a file in Streamlit it is held in memory, so in order to obtain and use a file path we need to save it locally.
# Save the uploaded file to local storage
def save_file(uploaded_file):
    # Save uploaded file to folder.
    save_folder = "/content"
    save_path = Path(save_folder, uploaded_file.name)
    with open(save_path, mode="wb") as w:
        w.write(uploaded_file.getvalue())
    if save_path.exists():
        st.sidebar.success(f'File {uploaded_file.name} is successfully saved!')
    return str(save_path)
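The core of this step, writing in-memory bytes to disk and returning a path, can be tried without Streamlit. The sketch below is a simplified stand-in under that assumption: `save_bytes` is a hypothetical helper that takes the file name and raw bytes directly (the values the function above reads from the uploaded file object), and it additionally keeps only the final path component of the name so the file cannot land outside the target folder.

```python
from pathlib import Path


def save_bytes(file_name, data, save_folder):
    """Write data into save_folder under the base name of file_name; return the path."""
    folder = Path(save_folder)
    folder.mkdir(parents=True, exist_ok=True)
    # Keep only the last path component so "../deck.pptx" stays inside the folder.
    safe_name = Path(file_name).name
    if safe_name in ("", ".", ".."):
        raise ValueError("file_name must contain a file name")
    target = folder / safe_name
    target.write_bytes(data)
    return str(target)
```

For example, save_bytes("deck.pptx", raw_bytes, "/content") would mirror what save_file does, minus the sidebar message.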
Step 5: Convert The PPT Slides To Images
Now we have to convert our PPT slides to images so that we can send each slide to the vision model as an image. For this, we use the Spire.Presentation library we installed earlier. This function takes the file path (the PPT file location) as input and returns the number of slides in the PPT as output. We will use this count to generate the summaries sequentially from the first slide to the last rather than in a haphazard order, as the images are not stored sequentially in the folder.
def ppt_to_img(filepath):
    # Create a Presentation object
    presentation = Presentation()
    # Load a PPT or PPTX file
    presentation.LoadFromFile(filepath)
    save_folder = "/content/output"
    save_path = Path(save_folder)
    # If the folder already exists, remove it so that we can overwrite
    if save_path.exists():
        shutil.rmtree(save_path, ignore_errors=True)
    save_path.mkdir()  # make directory
    # Loop through the slides in the presentation
    for i, slide in enumerate(presentation.Slides):
        # Specify the output file name
        fileName = save_folder + "/ToImage_" + str(i) + ".png"
        # Save each slide as a PNG image
        image = slide.SaveAsImage()
        image.Save(fileName)
        image.Dispose()
    ppt_len = presentation.Slides.Length
    presentation.Dispose()
    st.success('PPT converted to images and saved successfully!')
    return ppt_len
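Returning the slide count and rebuilding each file name from its index is one way to keep the slides in order. An alternative sketch, shown here only for illustration, is to list the output folder and sort by the number embedded in each name; a plain alphabetical sort would place ToImage_10.png before ToImage_2.png. `ordered_slide_images` is a hypothetical helper, and it assumes the ToImage_<index>.png naming used above.

```python
import re
from pathlib import Path

# Matches the file names written by the conversion step, e.g. "ToImage_12.png".
_SLIDE_PATTERN = re.compile(r"ToImage_(\d+)\.png$")


def ordered_slide_images(folder):
    """Return slide image paths in folder, sorted by numeric slide index."""
    indexed = []
    for path in Path(folder).iterdir():
        match = _SLIDE_PATTERN.search(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    # Sorting the (index, path) pairs orders them by the integer index.
    return [path for _, path in sorted(indexed)]
```

Files that do not match the pattern are simply ignored, so stray files in the folder do not affect the order.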
Step 6: Generate Summary Function
We will now define the generate summary function, which passes each image (one per slide) to the model and generates the summary. The important thing here is to give an appropriate prompt, which is where we need a little bit of prompt engineering. After trying various prompts, the one below gave the best results. As input, the function takes the model, the length of the PPT, and the path name of the saved images. Also note that the Gemini API is rate limited, so we add a delay of 10 seconds after each request, which keeps the app to at most 6 requests per minute.
Prompt Engineering
"You are a power point ppt assistant . You should generate a cohesive summary of maximum 5 lines for the input slide image with appropriate title. Information should be related to slide image content. Also if there are any charts graph include relevant numbers explaining the charts!"
Here we use the persona pattern of prompt engineering, where we ask the model to act as a PPT assistant, as well as a specific-information pattern, where we specify that charts and graphs should be explained in terms of numbers.
def generate_summary(model, ppt_len, path_name):
    # Local import kept from the original, in case the wildcard imports rebind Image
    from PIL import Image
    for i in range(0, ppt_len):
        image = Image.open(path_name + str(i) + ".png")
        response = model.generate_content(["You are a power point ppt assistant . You should generate a cohesive summary of maximum 5 lines for the input slide image with appropriate title. Information should be related to slide image content. Also if there are any charts graph include relevant numbers explaining the charts!", image])
        # Image files are numbered from 0; show 1-based slide numbers to the user
        st.write(f"-----------------Slide {i + 1} ------------------")
        st.markdown(response.text)
        time.sleep(10)  # stay under the API rate limit
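A fixed sleep after every request also waits after the last slide and ignores the time the request itself took. As a rough sketch of an alternative, the helper below waits only for whatever remains of a minimum interval between the start of consecutive calls. `call_with_min_interval` is a hypothetical name, and the `sleep` and `clock` parameters are there only so the timing logic can be checked without real waiting.

```python
import time


def call_with_min_interval(func, items, min_interval,
                           sleep=time.sleep, clock=time.monotonic):
    """Call func on each item, keeping at least min_interval seconds between call starts."""
    results = []
    last_start = None
    for item in items:
        if last_start is not None:
            # Only wait for the part of the interval the previous call did not use up.
            remaining = min_interval - (clock() - last_start)
            if remaining > 0:
                sleep(remaining)
        last_start = clock()
        results.append(func(item))
    return results
```

In the summarizer, func would wrap the model request for one slide and min_interval would be 10, matching the delay used above.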
Step 7: The Complete Utility File
Below is how the complete utility.py file will look.
import streamlit as st
import os
import shutil
import time
from pathlib import Path

import google.generativeai as genai
from PIL import Image
from spire.presentation.common import *
from spire.presentation import *


def initialize():
    # Configure the API key and initialise the model
    if "GOOGLE_API_KEY" not in os.environ:
        os.environ["GOOGLE_API_KEY"] = 'YOUR API KEY'
    genai.configure(api_key=os.environ['GOOGLE_API_KEY'])
    # Create the vision model (use 'gemini-pro' instead for text-only input)
    vision_model = genai.GenerativeModel('gemini-pro-vision')
    return vision_model


# Save the uploaded file to local storage
def save_file(uploaded_file):
    # Save uploaded file to folder.
    save_folder = "/content"
    save_path = Path(save_folder, uploaded_file.name)
    with open(save_path, mode="wb") as w:
        w.write(uploaded_file.getvalue())
    if save_path.exists():
        st.sidebar.success(f'File {uploaded_file.name} is successfully saved!')
    return str(save_path)


def ppt_to_img(filepath):
    # Create a Presentation object
    presentation = Presentation()
    # Load a PPT or PPTX file
    presentation.LoadFromFile(filepath)
    save_folder = "/content/output"
    save_path = Path(save_folder)
    # If the folder already exists, remove it so that we can overwrite
    if save_path.exists():
        shutil.rmtree(save_path, ignore_errors=True)
    save_path.mkdir()  # make directory
    # Loop through the slides in the presentation
    for i, slide in enumerate(presentation.Slides):
        # Specify the output file name
        fileName = save_folder + "/ToImage_" + str(i) + ".png"
        # Save each slide as a PNG image
        image = slide.SaveAsImage()
        image.Save(fileName)
        image.Dispose()
    ppt_len = presentation.Slides.Length
    presentation.Dispose()
    st.success('PPT converted to images and saved successfully!')
    return ppt_len


def generate_summary(model, ppt_len, path_name):
    # Local import kept from the original, in case the wildcard imports rebind Image
    from PIL import Image
    for i in range(0, ppt_len):
        image = Image.open(path_name + str(i) + ".png")
        response = model.generate_content(["You are a power point ppt assistant . You should generate a cohesive summary of maximum 5 lines for the input slide image with appropriate title. Information should be related to slide image content. Also if there are any charts graph include relevant numbers explaining the charts!", image])
        # Image files are numbered from 0; show 1-based slide numbers to the user
        st.write(f"-----------------Slide {i + 1} ------------------")
        st.markdown(response.text)
        time.sleep(10)  # stay under the API rate limit
Step 8: Defining The Main App File
We will now define the main app, which contains the UI elements. We use the following Streamlit functions to define it:
st.set_page_config – This is used to define the webpage tab name and icon.
st.header – We use this function to define the web page header that will be displayed.
st.write – We use this function to give our header a subtitle describing the app.
We call the initialize function from our utility module and instantiate the model. Inside the main function we create a sidebar section that allows the user to upload the PPT file. The "Generate PPT Summary!" button is enabled only once a file has been uploaded. When we click it, the PPT is converted to images and then the generate summary function is called, which prints the summary of each slide one by one on the screen.
%%writefile app.py
import streamlit as st
import utility

st.set_page_config(page_title="PPT", page_icon=":sunglasses:")
st.header('PPT SUMMARIZER')
st.write('Summarize Your PPT')
model = utility.initialize()


def main():
    uploaded_img = None
    save_path = ""
    ppt_len = 0
    # Sidebar components
    with st.sidebar:
        uploaded_img = st.file_uploader("Upload PPT to Summarize it!", accept_multiple_files=False, type=['ppt', 'pptx'])
        if uploaded_img is not None:
            save_path = utility.save_file(uploaded_img)
            st.write("file name", uploaded_img.name)
    # Main page
    butn_summary = st.button("Generate PPT Summary!", disabled=not bool(uploaded_img), type="primary")
    if butn_summary:
        ppt_len = utility.ppt_to_img(save_path)
        if ppt_len > 0:
            utility.generate_summary(model, ppt_len, "/content/output/ToImage_")


if __name__ == "__main__":
    main()
Step 9: How To Define The Complete Code In Colab
In Colab, we need to define all the code inside one code cell, along with the command %%writefile app.py. This command writes the entire contents of the cell to a Python file. We need the .py file to run our Streamlit app.
Step 10: How To Run The App
Once the above app.py file is written, use the commands below to run the app. We use the streamlit command to run the Streamlit app, and its logs are saved in logs.txt.
!streamlit run /content/app.py &>/content/logs.txt &
After this command, we run the localtunnel command to expose our Streamlit app on an external IP address.
!npx localtunnel --port 8501
Localtunnel will generate a URL that we need to click on.
Open the generated logs.txt file and copy the host address from the external URL.
Paste it into the page that is displayed when you open the localtunnel URL.
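Instead of opening logs.txt by hand, the host address can be pulled out with a small regular expression. This is only a sketch: `external_host` is a hypothetical helper, and it assumes the log contains a line of the form "External URL: http://<host>:8501", which may differ between Streamlit versions.

```python
import re

# Assumed log line format: "External URL: http://<host>:<port>".
_EXTERNAL_URL = re.compile(r"External URL:\s*https?://([^:/\s]+)")


def external_host(log_text):
    """Return the host from the first 'External URL' line in log_text, or None."""
    match = _EXTERNAL_URL.search(log_text)
    return match.group(1) if match else None
```

In Colab this could be used as external_host(open("/content/logs.txt").read()) once the app has started and written its startup messages.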
App Demo
For the demo, we use this PPT on Swachh Bharat by the government here. Below is a glimpse of the PPT slides along with their corresponding summaries.
Summary
Title: Swachhata Augmentation through Corporate Helping Hands
The Swachhata Augmentation through Corporate Helping Hands is a program that encourages companies to contribute to the cleanliness of urban areas. The program has been successful in engaging companies in various activities such as waste management, sanitation, and awareness generation. As of March 2023, the program has engaged over 100 companies and resulted in the collection of over 1000 metric tons of waste. The program has also helped in creating awareness about the importance of cleanliness and sanitation among the general public. The program is a good example of how the government and the private sector can work together to achieve common goals.
Summary
Our Urban Sanitation and Waste Management Reality
India loses 54 billion USD per year due to inadequate sanitation. Poor sanitation and hygiene cause 100,000 child deaths per year in India. Over 12% of urban households in India defecate in the open. Only 21.5% of the solid waste generated in India is processed.
Summary
Title: Communities contribute when efforts are most visible.
Total CSR funds spent in 2014–15 were INR 14626 crore. The majority of the funds were spent on poverty and healthcare (INR 14217 crore), followed by skill development and livelihood (INR 1462.6 crore).
The funds were spent on various sectors such as poverty and healthcare, skill development and livelihood, environment, and Swachh Bharat Kosh.
The amount spent on the environment was INR 1188.7 crore. The amount spent on Swachh Bharat Kosh was INR 42.6 crore.
Observation
As we can observe, slide 3 contained only graphs, and the model was still able to extract the numbers and generate coherent information from them.
Conclusion
The Gemini series is a powerful tool for AI applications. Its versatility across natural language understanding, code interpretation, image analysis, audio processing, and video comprehension sets it apart. This article not only introduced Gemini Pro but also walked through the practical application of building a PowerPoint summarizer using the Streamlit framework and the Gemini API. With this combination, the process becomes streamlined, making presentations not just visually appealing but also easy to understand.
Key Takeaways
- Explored the three variants (Nano, Pro, and Ultra) of Google's multimodal models, which cover natural language, code, image, audio, and video processing.
- Demonstrated the practical use of Gemini Pro and the Streamlit framework to create a PowerPoint summarizer, making presentations visually appealing and easy to understand.
- Covered the initialization of Gemini Pro, saving and converting PPT slides to images, and generating coherent summaries using prompt engineering.
Frequently Asked Questions
A. Currently they are free to use. Google released them for developer access on 13th Dec, and usage may be charged in the future. View pricing details here.
A. Yes, Gemini is available in the Vertex AI offering on Google Cloud Platform. Sample tutorials and notebooks are available here.
A. Yes, API requests are currently limited to 60 requests per minute.
A. Currently, LangChain's separate package for Google Gemini integration does not support any LLM chains.
A. No, currently only the Gemini Pro variant of the model is available for public access.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.