    How to Build Your Personal AI Assistant with Huggingface SmolLM



    Introduction

    In the not-so-distant past, the idea of having a personal AI assistant felt like something out of a sci-fi movie. Picture a tech-savvy inventor named Alex, who dreamed of having a smart companion to answer questions and provide insights, without relying on the cloud or third-party servers. With advancements in small language models (SLMs), Alex's dream became a reality. This article will take you on Alex's journey to build an AI chat CLI application using Huggingface's SmolLM model. We'll combine the power of SmolLM with LangChain's flexibility and Typer's user-friendly interface. By the end, you'll have a functional AI assistant, just like Alex, capable of chatting, answering queries, and saving conversations, all from your terminal. Let's dive into this exciting new world of on-device AI and see what you can create.

    Learning Outcomes

    • Understand Huggingface SmolLM models and their applications.
    • Leverage SLM models for on-device AI applications.
    • Explore Grouped-Query Attention and its role in SLM architecture.
    • Build interactive CLI applications using the Typer and Rich libraries.
    • Integrate Huggingface models with LangChain for robust AI applications.

    This article was published as a part of the Data Science Blogathon.

    What is Huggingface SmolLM?

    SmolLM is a series of state-of-the-art small language models available in three sizes: 135M, 360M, and 1.7B parameters. These models are built on a high-quality training corpus named Cosmopedia V2, which is a collection of synthetic textbooks and stories generated by Mixtral (28B tokens), Python-Edu educational Python samples from The Stack (4B tokens), and FineWeb-Edu, educational web samples from FineWeb (220B tokens). According to Huggingface, these models outperform other models in their size categories across a diverse set of benchmarks testing common-sense reasoning and world knowledge.

    [Figure: Performance comparison chart]

    Cosmopedia V2 uses 5,000 topics belonging to 51 categories, generated using Mixtral, to create subtopics, and the final distribution of subtopics is shown below:

    [Figure: Histogram of the final subtopic distribution]

    The architecture of the 135M and 360M parameter models uses a design similar to MobileLLM, incorporating Grouped-Query Attention (GQA) and prioritizing depth over width.

    What is Grouped-Query Attention?

    There are three types of attention architecture:

    [Figure: Multi-Head vs. Multi-Query vs. Grouped-Query Attention]
    • Multi-Head Attention (MHA): Each attention head has its own independent query, key, and value heads. This is computationally expensive, especially for large models.
    • Multi-Query Attention (MQA): Shares key and value heads across all attention heads, but each head has its own query. This is more efficient than MHA but can still be computationally intensive.
    • Grouped-Query Attention (GQA): Imagine you have a team working on a big project. Instead of every team member working independently, you decide to form smaller groups. Each group shares some tools and resources. This is similar to what Grouped-Query Attention (GQA) does inside a generative model.

    Understanding Grouped-Query Attention (GQA)

    GQA is a technique used in models to process information more efficiently. It divides the model's attention heads into groups, and each group shares a single set of key and value heads. This differs from traditional methods, where each attention head has its own key and value heads.


    Key Points:

    • GQA-G: GQA with G groups.
    • GQA-1: A special case with just one group; this is equivalent to Multi-Query Attention (MQA).
    • GQA-H: The number of groups equals the number of attention heads; this is equivalent to Multi-Head Attention (MHA).

    Why Use GQA?

    • Speed: GQA can process information faster than traditional methods in large models.
    • Efficiency: It reduces the amount of data the model needs to handle, saving memory and processing power.
    • Balance: GQA finds a sweet spot between speed and accuracy.

    By grouping attention heads, GQA helps large models work better without sacrificing much speed or accuracy.
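
    To make the grouping concrete, here is a minimal PyTorch sketch of the key/value sharing idea described above. It is an illustration under assumed toy shapes (8 query heads sharing 2 key/value groups), not SmolLM's actual implementation:

    import torch
    import torch.nn.functional as F

    # Assumed toy shapes: 8 query heads share 2 key/value groups (GQA-2).
    batch, seq_len, head_dim = 1, 10, 64
    n_q_heads, n_kv_groups = 8, 2

    q = torch.randn(batch, n_q_heads, seq_len, head_dim)    # one query per head
    k = torch.randn(batch, n_kv_groups, seq_len, head_dim)  # keys shared per group
    v = torch.randn(batch, n_kv_groups, seq_len, head_dim)  # values shared per group

    # Each group of 8 // 2 = 4 query heads reuses the same K/V tensors.
    k = k.repeat_interleave(n_q_heads // n_kv_groups, dim=1)  # -> (1, 8, 10, 64)
    v = v.repeat_interleave(n_q_heads // n_kv_groups, dim=1)

    out = F.scaled_dot_product_attention(q, k, v)  # standard attention from here on
    print(out.shape)  # torch.Size([1, 8, 10, 64])

    Setting n_kv_groups = 1 gives MQA (GQA-1), and n_kv_groups = n_q_heads gives MHA (GQA-H), matching the key points above.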

    How to use SmolLM?

    Install the necessary libraries, PyTorch and Transformers, using pip, and then put the code below into a main.py file.

    Here, I used the SmolLM-360M-Instruct model; you can use larger models such as SmolLM-1.7B.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    
    checkpoint = "HuggingFaceTB/SmolLM-360M-Instruct"
    
    device = "cpu"  # set to "cuda" if a GPU is available
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    
    model = AutoModelForCausalLM.from_pretrained(checkpoint).to(device)
    
    messages = [
        {"role": "user", "content": "List the steps to bake a chocolate cake from scratch."}
    ]
    
    # Render the chat messages into the model's prompt format
    input_text = tokenizer.apply_chat_template(messages, tokenize=False)
    
    print(input_text)
    
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs, max_new_tokens=100, temperature=0.6, top_p=0.92, do_sample=True
    )
    print(tokenizer.decode(outputs[0]))

    Output:

    [Image: the chat-formatted prompt and the model's generated answer]

    What is Typer?

    Typer is a library for building command-line interface (CLI) applications. It was built by Tiangolo, who developed the highly performant Python web framework FastAPI. Typer is to the CLI what FastAPI is to the web.

    What are the benefits of using it?

    • User-Friendly and Intuitive:
      • Easy to Write: Thanks to excellent editor support and code completion everywhere, you'll spend less time debugging and reading documentation.
      • Simple for Users: Automatic help and shell completion make it easy for end users.
    • Efficient:
      • Concise Code: Minimize code duplication, with multiple features from each parameter declaration, leading to fewer bugs.
      • Start Small: You can get started with just two lines of code: one import and one function call.
    • Scalable:
      • Grow as Needed: Increase complexity as much as you want, creating complex command trees and subcommands with options and arguments.
    • Versatile:
      • Run Scripts: Typer includes a command/program to run scripts, automatically converting them to CLIs, even if they don't use Typer internally.

    How to use Typer?

    Let's write a simple Hello CLI using Typer. First, install Typer using pip.

    $ pip install typer

    Now create a main.py file and type in the code below:

    import typer
    
    app = typer.Typer()
    
    @app.command()
    def main(name: str):
        print(f"Hello {name}")
    
    if __name__ == "__main__":
        app()

    In the above code, we first import Typer and then create an app using the "typer.Typer()" method.

    The @app.command() is a decorator; a decorator in Python does something (user-defined) with the function it is placed on. Here, in Typer, it turns main() into a command.

    Output:

    First with the --help flag and then with a name argument.

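    For reference (assuming the file is named main.py), a session looks roughly like this; the exact help text depends on your Typer version:

    $ python main.py --help
    Usage: main.py [OPTIONS] NAME

    $ python main.py Alex
    Hello Alex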

    Setting Up the Project

    To get started with our personal AI chat CLI application, we need to set up our development environment and install the necessary dependencies. Here's how to do it.

    Create a Conda Environment

    # Create a conda env
    $ conda create --name slmdev python=3.11
    
    # Activate your env
    $ conda activate slmdev

    Create a new directory for the project:

    $ mkdir personal-ai-chat
    $ cd personal-ai-chat

    Install the required packages:

    pip install langchain langchain-huggingface huggingface_hub transformers torch rich typer

    Implementing the Chat Application

    First, create a main.py file in your project directory.

    Let's import the necessary modules and initialize our application.

    import typer
    from langchain_huggingface.llms import HuggingFacePipeline
    from langchain.prompts import PromptTemplate
    from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
    import torch
    from rich.console import Console
    from rich.panel import Panel
    from rich.markdown import Markdown
    import json
    from typing import List, Dict
    
    app = typer.Typer()
    console = Console()

    Now, we will set up our SmolLM model and a text-generation pipeline:

    # Initialize the SmolLM model
    model_name = "HuggingFaceTB/SmolLM-360M-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
    
    # Create a text-generation pipeline
    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=256,
        truncation=True,
        temperature=0.7,
        do_sample=True,
        repetition_penalty=1.2,
    )
    
    # Create a LangChain LLM
    llm = HuggingFacePipeline(pipeline=pipe)
    

    In the above code, we set our model name to SmolLM-360M-Instruct and use AutoTokenizer for tokenization. After that, we initialize the model using Huggingface's AutoModelForCausalLM.

    Then we set up a Huggingface pipeline for running the LLM.

    Crafting the Prompt Template and LangChain Chain

    Now we have to create a prompt template for our assistant. In this application, we will devise a prompt for concise and informative answers.

    # Create a prompt template
    template = """
    You are a helpful assistant. Provide a concise and informative answer to the following query:
    
    Query: {query}
    
    Answer:
    """
    
    prompt = PromptTemplate(template=template, input_variables=["query"])
    
    # Create a LangChain chain
    chain = prompt | llm
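
    As a quick sanity check (assuming the model, pipeline, and chain above are initialized), you can invoke the chain directly. Note that the raw output may echo the prompt, and the exact text varies with sampling:

    # Illustrative smoke test; the output text will vary between runs.
    print(chain.invoke({"query": "What is grouped-query attention?"}))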

    If you have followed along until now, congratulations!

    Now we will implement the core functionality of our application.

    Create a function called generate_response:

    def generate_response(query: str) -> str:
        try:
            with console.status("Thinking...", spinner="dots"):
                response = chain.invoke(query)
            return response
        except Exception as e:
            print(f"An error occurred: {e}")
            return "Sorry, I encountered a problem. Please try rephrasing your query."

    In this function, we create a console status that displays a loading message "Thinking..." and a spinner animation while the response is being generated. This provides visual feedback to the user.

    Then we call LangChain's "chain.invoke" method to pass the user's query as input. This queries SmolLM and produces a response.

    The exception block handles any exception that might arise during the response-generation process.

    Generating Responses and Handling Conversations

    Next, create a function for saving conversations.

    def save_conversation(conversation: List[Dict[str, str]]):
        """Save the conversation history to a JSON file."""
        filename = typer.prompt(
            "Enter a filename to save the conversation (without extension)"
        )
        try:
            with open(f"{filename}.json", "w") as f:
                json.dump(conversation, f, indent=2)
            console.print(f"Conversation saved to {filename}.json", style="green")
        except Exception as e:
            print(f"An error occurred while saving: {e}")

    In the above code snippet, we create a conversation-saving function. The user enters a filename, and the function saves the whole conversation to a JSON file.

    Implementing the CLI Application Command

    ## Code Block 1
    
    @app.command()
    def start():
        console.print(Panel.fit("Hi, I am your Personal AI!", style="bold magenta"))
        conversation = []
    
        ## Code Block 2
        while True:
            console.print("How may I help you?", style="cyan")
            query = typer.prompt("You")
    
            if query.lower() in ["exit", "quit", "bye"]:
                break
    
            response = generate_response(query)
            conversation.append({"role": "user", "content": query})
            conversation.append({"role": "assistant", "content": response})
    
            console.print(Panel(Markdown(response), title="Assistant", expand=False))
    
            ## Code Block 3
            while True:
                console.print(
                    "\nChoose an action:",
                    style="bold yellow",
                )
                console.print(
                    "1. follow-up\n2. new-query\n3. end-chat\n4. save-and-exit",
                    style="yellow",
                )
                action = typer.prompt("Enter the number corresponding to your choice.")
    
                if action == "1":
                    follow_up = typer.prompt("Follow-up question")
                    query = follow_up.lower()
                    response = generate_response(query)
    
                    conversation.append({"role": "user", "content": query})
                    conversation.append({"role": "assistant", "content": response})
    
                    console.print(
                        Panel(Markdown(response), title="Assistant", expand=False)
                    )
                elif action == "2":
                    new_query = typer.prompt("New query")
                    query = new_query.lower()
                    response = generate_response(query)
    
                    conversation.append({"role": "user", "content": query})
                    conversation.append({"role": "assistant", "content": response})
    
                    console.print(
                        Panel(Markdown(response), title="Assistant", expand=False)
                    )
                elif action == "3":
                    return
                elif action == "4":
                    save_conversation(conversation)
                    return
                else:
                    console.print(
                        "Invalid choice. Please choose a valid option.", style="red"
                    )
    
        ## Code Block 4
        if typer.confirm("Would you like to save this conversation?"):
            save_conversation(conversation)
    
        console.print("Good Bye!! Happy Hacking", style="bold green")
    
    # Entry point for the CLI app
    if __name__ == "__main__":
        app()

    Code Block 1

    Introduction and welcome message:

    • The code begins with a "start" function triggered when you run the application; the decorator "@app.command()" turns this start function into a command in our CLI application.
    • It displays a colorful welcome message using a library called Rich.

    Code Block 2

    The main conversation loop:

    • The code enters a loop that continues until you exit the conversation.
    • Inside that loop:
      • It asks you "How may I help you?" using a colored prompt.
      • It captures your query using "typer.prompt" and lowercases it when checking for exit commands.
      • It checks whether your query is an exit command like "exit", "quit", or "bye". If so, it exits the loop.
      • Otherwise, it calls the "generate_response" function to process your query and get a response.
      • It stores your query and the response in the conversation history.
      • It displays the assistant's response in a formatted box using Rich's Console and Markdown.
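
    To try it yourself, run the app from your terminal (assuming the if __name__ == "__main__": app() guard shown at the end of the code; since the app has a single command, Typer runs it directly):

    $ python main.py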

    Output:

    [Image: the assistant answering a query in the terminal]

    Code Block 3

    Handling User Choice

    • In this inner while loop, most things are the same as before; the only difference is that you can choose an option for further conversation, such as a follow-up question, a new query, ending the chat, or saving the conversation and exiting.

    Code Block 4

    Saving the conversation and farewell message

    Here, the assistant asks whether you want to save your chat history as a JSON file for later analysis. It prompts you for a filename without the ".json" extension and saves the history in your project directory.

    And then a farewell message for you.

    Output:

    [Image: the save confirmation and farewell message]

    Output of Saved File:

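    Given the message format used in the code, a saved file (for example, chat.json, with the name chosen at the prompt) looks something like this; the contents shown are illustrative:

    [
      {
        "role": "user",
        "content": "what is smollm?"
      },
      {
        "role": "assistant",
        "content": "SmolLM is a series of small language models..."
      }
    ]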

    Conclusion

    Building your own personal AI CLI application using Huggingface SmolLM is more than just a fun project. It's a gateway to understanding and applying cutting-edge AI technologies in a practical, accessible way. Throughout this article, we explored how to use the power of a compact SLM to create an engaging user interface right in your terminal.

    All of the code used in this Personal AI CLI

    Key Takeaways

    • The article demonstrates that building a personal AI assistant is within reach for developers of various skill levels and hardware budgets.
    • By utilizing SmolLM, a compact yet capable language model, the project shows how to create an AI chat application that doesn't require heavy computational resources, making it suitable for small, low-power hardware.
    • The project showcases the power of integrating different technologies to create a functional, feature-rich application.
    • Through the use of the Typer and Rich libraries, the article emphasizes the importance of creating an intuitive and visually appealing command-line interface, enhancing the user experience even in a text-based environment.

    Frequently Asked Questions

    Q1. Can I customize the AI's responses or train it on my own data?

    A. Yes. You can tweak the prompt template for your domain-specific assistant, and experiment with the prompt and with model parameters like temperature and max_length to adjust the style of the responses. To train with your own data, you can try PEFT-style training such as LoRA, or you can use a RAG-style application to use your data directly without altering the model weights.

    Q2. Is this personal AI chat secure for handling sensitive information?

    A. This personal AI chat is designed for local use, so your data stays private as long as you don't update the model weights by fine-tuning them on your data. While fine-tuning, if your training data contains any personal information, it will leave an imprint on the model weights. So be careful and sanitize your dataset before fine-tuning.

    Q3. How does the SmolLM model compare to a larger language model like GPT-3?

    A. SLMs are built using high-quality training data for small devices and have only roughly 100M to 3B parameters, while LLMs are trained on and for large, computationally heavy hardware and consist of 100B to trillions of parameters, trained on data available on the internet. So an SLM won't compete with an LLM in breadth and depth, but SLM performance holds up well within its size category.

    The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.


