    How to Implement Tool Calling with Gemma 4 and Python


    In this article, you will learn how to build a local, privacy-first tool-calling agent using the Gemma 4 model family and Ollama.

    Topics we will cover include:

    • An overview of the Gemma 4 model family and its capabilities.
    • How tool calling enables language models to interact with external functions.
    • How to implement a local tool calling system using Python and Ollama.


    Introducing the Gemma 4 Family

    The open-weights model ecosystem shifted recently with the release of the Gemma 4 model family. Built by Google, the Gemma 4 variants aim to deliver frontier-level capabilities under a permissive Apache 2.0 license, giving machine learning practitioners complete control over their infrastructure and data privacy.

    The Gemma 4 release features models ranging from the parameter-dense 31B and structurally complex 26B Mixture of Experts (MoE) to lightweight, edge-focused variants. More importantly for AI engineers, the model family features native support for agentic workflows. They have been fine-tuned to reliably generate structured JSON outputs and natively invoke function calls based on system instructions. This transforms them from “fingers crossed” reasoning engines into practical systems capable of executing workflows and conversing with external APIs locally.

    Tool Calling in Language Models

    Language models began life as closed-loop conversationalists. If you asked a language model for a real-world sensor reading or live market rates, it could at best apologize and at worst hallucinate an answer. Tool calling, also known as function calling, is the architectural shift required to close this gap.

    Tool calling serves as the bridge that can help transform static models into dynamic autonomous agents. When tool calling is enabled, the model evaluates a user prompt against a provided registry of available programmatic tools (supplied via JSON schema). Rather than attempting to guess the answer using only internal weights, the model pauses inference, formats a structured request specifically designed to trigger an external function, and awaits the result. Once the result is processed by the host application and handed back to the model, the model synthesizes the injected live context to formulate a grounded final response.
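    Concretely, when the model decides to use a tool, the assistant message it returns carries a structured request instead of prose. The shape below mirrors the "message" object in an Ollama /api/chat response; the values are illustrative, and the exact fields your runtime returns may vary:

    ```python
    # Illustrative assistant message requesting a tool call, shaped like
    # the "message" object in an Ollama /api/chat response.
    message = {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_current_weather",
                    "arguments": {"city": "Ottawa", "unit": "celsius"},
                }
            }
        ],
    }

    # The host application reads the request and dispatches the local function:
    call = message["tool_calls"][0]["function"]
    print(call["name"])       # which tool to run
    print(call["arguments"])  # the kwargs to run it with
    ```

    The model never executes anything itself; it only emits this request and waits for the host to run the function and hand back the result.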

    The Setup: Ollama and Gemma 4:E2B

    To build a genuinely local, privacy-first tool calling system, we will use Ollama as our local inference runner, paired with the gemma4:e2b (Edge 2 billion parameter) model.

    The gemma4:e2b model is built specifically for mobile devices and IoT applications. Although the full model is larger, only an effective 2 billion parameter footprint is activated during inference, which preserves system memory while keeping latency low on consumer hardware. By executing entirely offline, it removes rate limits and API costs while preserving strict data privacy.

    Despite its small size, Google has engineered gemma4:e2b to inherit the multimodal properties and native function-calling capabilities of the larger 31B model, making it an ideal foundation for a fast, responsive desktop agent. It also lets us test the capabilities of the new model family without requiring a GPU.

    The Code: Setting Up the Agent

    To orchestrate the language model and the tool interfaces, our implementation follows a zero-dependency philosophy, leveraging only standard Python libraries such as urllib and json. This keeps the code maximally portable and transparent while avoiding bloat.

    The complete code for this tutorial can be found at this GitHub repository.

    The architectural flow of our application operates in the following way:

    1. Define local Python functions that act as our tools
    2. Define a strict JSON schema that explains to the language model exactly what these tools do and what parameters they expect
    3. Pass the user’s query and the tool registry to the local Ollama API
    4. Catch the model’s response, identify if it requested a tool call, execute the corresponding local code, and feed the answer back
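    The snippets below lean on a small call_ollama helper that is easy to sketch with urllib alone. This sketch assumes Ollama is running with its default chat endpoint at http://localhost:11434/api/chat:

    ```python
    import json
    import urllib.request

    OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default chat endpoint

    def call_ollama(payload: dict) -> dict:
        """POST a chat payload to the local Ollama server and return the parsed JSON reply."""
        data = json.dumps(payload).encode("utf-8")
        req = urllib.request.Request(
            OLLAMA_URL,
            data=data,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as response:
            return json.loads(response.read().decode("utf-8"))
    ```

    Because we set "stream": False in the payload later on, the server answers with a single JSON object rather than a stream of chunks, which keeps this helper trivial.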

    Building the Tools: get_current_weather

    Let’s dive into the code, keeping in mind that our agent’s capability rests on the quality of its underlying functions. Our first function is get_current_weather, which reaches out to the open-source Open-Meteo API to resolve real-time weather data for a specific location.

    import json
    import urllib.parse
    import urllib.request

    def get_current_weather(city: str, unit: str = "celsius") -> str:
        """Gets the current temperature for a given city using the open-meteo API."""
        try:
            # Geocode the city to get latitude and longitude
            geo_url = f"https://geocoding-api.open-meteo.com/v1/search?name={urllib.parse.quote(city)}&count=1"
            geo_req = urllib.request.Request(geo_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
            with urllib.request.urlopen(geo_req) as response:
                geo_data = json.loads(response.read().decode('utf-8'))

            if "results" not in geo_data or not geo_data["results"]:
                return f"Could not find coordinates for city: {city}."

            location = geo_data["results"][0]
            lat = location["latitude"]
            lon = location["longitude"]
            country = location.get("country", "")

            # Fetch the weather
            temp_unit = "fahrenheit" if unit.lower() == "fahrenheit" else "celsius"
            weather_url = f"https://api.open-meteo.com/v1/forecast?latitude={lat}&longitude={lon}&current=temperature_2m,wind_speed_10m&temperature_unit={temp_unit}"
            weather_req = urllib.request.Request(weather_url, headers={'User-Agent': 'Gemma4ToolCalling/1.0'})
            with urllib.request.urlopen(weather_req) as response:
                weather_data = json.loads(response.read().decode('utf-8'))

            if "current" in weather_data:
                current = weather_data["current"]
                temp = current["temperature_2m"]
                wind = current["wind_speed_10m"]
                temp_unit_str = weather_data["current_units"]["temperature_2m"]
                wind_unit_str = weather_data["current_units"]["wind_speed_10m"]

                return f"The current weather in {city.title()} ({country}) is {temp}{temp_unit_str} with wind speeds of {wind}{wind_unit_str}."
            else:
                return f"Weather data for {city} is unavailable from the API."

        except Exception as e:
            return f"Error fetching weather for {city}: {e}"

    This Python function implements a two-stage API resolution pattern. Because standard weather APIs typically require geographical coordinates, the function transparently takes the city string provided by the model and geocodes it into latitude and longitude. With the coordinates in hand, it invokes the weather forecast endpoint and constructs a concise natural-language summary of the current conditions.

    However, writing the function in Python is only half the job. The model also needs to be told that this tool exists and how to call it. We do this by mapping the Python function into an Ollama-compliant JSON schema dictionary:

    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Gets the current temperature for a given city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. Tokyo"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"]
                    }
                },
                "required": ["city"]
            }
        }
    }

    This rigid structural blueprint is critical, as it explicitly details parameter types, strict string enums, and required fields, all of which guide gemma4:e2b toward reliably generating syntactically correct calls.
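    The dispatch loop further down looks functions up in a TOOL_FUNCTIONS registry, which is simply a dictionary mapping the schema's function names to local Python callables. A self-contained sketch with a stub tool (the real agent registers the get_current_weather implementation above instead of the stub):

    ```python
    # Stub standing in for the real get_current_weather defined earlier.
    def get_current_weather(city: str, unit: str = "celsius") -> str:
        return f"(stub) weather for {city} in {unit}"

    # Registry mapping the JSON schema's function names to local callables.
    TOOL_FUNCTIONS = {
        "get_current_weather": get_current_weather,
    }

    # Dispatch exactly the way the main loop does: look up by name, splat kwargs.
    name, arguments = "get_current_weather", {"city": "Ottawa"}
    if name in TOOL_FUNCTIONS:
        result = TOOL_FUNCTIONS[name](**arguments)
        print(result)
    ```

    Keeping the registry keys identical to the "name" fields in the schemas is what ties the model's structured request back to runnable code.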

    Tool Calling Under the Hood

    The core of the autonomous workflow happens inside the main loop orchestrator. Once a user issues a prompt, we build the initial JSON payload for the Ollama API, explicitly naming gemma4:e2b and attaching the array of tool schemas we defined above.

    # Initial payload to the model
    messages = [{"role": "user", "content": user_query}]
    payload = {
        "model": "gemma4:e2b",
        "messages": messages,
        "tools": available_tools,
        "stream": False
    }

    try:
        response_data = call_ollama(payload)
    except Exception as e:
        print(f"Error calling Ollama API: {e}")
        return

    message = response_data.get("message", {})

    Once the initial web request resolves, we need to inspect the structure of the returned message rather than blindly assume it contains text. If the model, aware of the active tools, has decided to use one, it will signal this by attaching a tool_calls list to the message.

    If tool_calls exist, we pause the standard synthesis workflow, parse the requested function name out of the dictionary block, execute the Python tool with the parsed kwargs dynamically, and inject the returned live data back into the conversational array.

    # Check if the model decided to call tools
    if "tool_calls" in message and message["tool_calls"]:

        # Add the model's tool calls to the chat history
        messages.append(message)

        # Execute each tool call
        num_tools = len(message["tool_calls"])
        for i, tool_call in enumerate(message["tool_calls"]):
            function_name = tool_call["function"]["name"]
            arguments = tool_call["function"]["arguments"]

            if function_name in TOOL_FUNCTIONS:
                func = TOOL_FUNCTIONS[function_name]
                try:
                    # Execute the underlying Python function
                    result = func(**arguments)

                    # Add the tool response to messages history
                    messages.append({
                        "role": "tool",
                        "content": str(result),
                        "name": function_name
                    })
                except TypeError as e:
                    print(f"Error calling function: {e}")
            else:
                print(f"Unknown function: {function_name}")

        # Send the tool results back to the model to get the final answer
        payload["messages"] = messages

        try:
            final_response_data = call_ollama(payload)
            print("[RESPONSE]")
            print(final_response_data.get("message", {}).get("content", "") + "\n")
        except Exception as e:
            print(f"Error calling Ollama API for final response: {e}")

    Notice the important secondary interaction: once each tool result has been appended under the "tool" role, we bundle up the messages history a second time and call the API again. This second pass is what lets gemma4:e2b read the live data it requested and synthesize it into a grounded, human-readable final answer.
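    For completeness, when the model answers without requesting any tool, there is nothing to execute and we simply print its content. A sketch of that fall-through branch, shown here with a sample direct answer:

    ```python
    # Sample direct answer from the model (no "tool_calls" key present).
    message = {"role": "assistant", "content": "Hello! How can I help?"}

    # Fall-through branch: no tools requested, so just print the model's text.
    if not message.get("tool_calls"):
        print("[RESPONSE]")
        print(message.get("content", "") + "\n")
    ```

    Using message.get("tool_calls") handles both a missing key and an empty list, so casual greetings never trigger the dispatch machinery.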

    More Tools: Expanding the Tool Calling Capabilities

    With the architectural foundation complete, enriching our capabilities requires nothing more than adding modular Python functions. Using the identical methodology described above, we incorporate three additional live tools:

    1. get_current_news: Using NewsAPI endpoints, this function fetches global headlines for keyword topics the model identifies as contextually relevant
    2. get_current_time: By referencing TimeAPI.io, this deterministic function resolves real-world timezone logic and offsets into readable datetime strings
    3. convert_currency: Relying on the live ExchangeRate-API, this function performs live conversions between fiat currencies

    Each capability is processed through the JSON schema registry, expanding the baseline model’s utility without requiring external orchestration or heavy dependencies.
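    As an example of how small each addition is, here is one possible convert_currency in the same urllib-only style. The endpoint URL and JSON response shape below are assumptions about ExchangeRate-API's open access tier; verify them against your provider's documentation before relying on this. The formatting helper is split out so the arithmetic can be checked without a network call:

    ```python
    import json
    import urllib.request

    def format_conversion(amount: float, rate: float, from_cur: str, to_cur: str) -> str:
        """Pure helper: apply an exchange rate and render a readable sentence."""
        return f"{amount:.2f} {from_cur} is {amount * rate:.2f} {to_cur}."

    def convert_currency(amount: float, from_currency: str, to_currency: str) -> str:
        """Convert between fiat currencies via a live exchange-rate API.

        The URL and JSON shape here are assumptions about ExchangeRate-API's
        open tier; adapt them to whichever provider you actually use.
        """
        url = f"https://open.er-api.com/v6/latest/{from_currency.upper()}"
        req = urllib.request.Request(url, headers={"User-Agent": "Gemma4ToolCalling/1.0"})
        try:
            with urllib.request.urlopen(req) as response:
                data = json.loads(response.read().decode("utf-8"))
            rate = data["rates"][to_currency.upper()]
            return format_conversion(amount, rate, from_currency.upper(), to_currency.upper())
        except Exception as e:
            return f"Error converting {from_currency} to {to_currency}: {e}"
    ```

    Like get_current_weather, it returns a plain sentence on success and an error string on failure, so the model always receives usable text in the "tool" message.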

    Testing the Tools

    And now we test our tool calling.

    Let’s start with the first function we created, get_current_weather, with the following query:

    What is the weather in Ottawa?

    You can see our CLI UI provides us with:

    • confirmation of the available tools
    • the user prompt
    • details on tool execution, including the function used, the arguments sent, and the response
    • the language model’s response

    It appears as though we have a successful first run.

    Next, let’s try out another of our tools independently, namely convert_currency:

    Given the current currency exchange rate, how much is 1200 Canadian dollars in euros?

    More winning.

    Now, let’s stack tool calling requests. Let’s also keep in mind that we are using a 4 billion parameter model that has half of its parameters active at any one time during inference:

    I am going to France next week. What is the current time in Paris? How many euros would 1500 Canadian dollars be? what is the current weather there? what is the latest news about Paris?

    Would you look at that. All four questions answered by four different functions from four separate tool calls. All on a local, private, incredibly small language model served by Ollama.

    I ran queries on this setup over the course of the weekend, and never once did the model’s reasoning fail. Never once. Hundreds of prompts. Admittedly, they were all against the same four tools, but regardless of how vague my otherwise reasonable wording became, I couldn’t stump it.

    Gemma 4 certainly appears to be a powerhouse of a small language model reasoning engine with tool calling capabilities. I’ll be turning my attention to building out a fully agentic system next, so stay tuned.

    Conclusion

    The advent of tool calling behavior inside open-weight models is one of the more useful and practical developments in local AI of late. With the release of Gemma 4, we can operate securely offline, building complex systems unfettered by cloud and API restrictions. By architecturally integrating direct access to the web, local file systems, raw data processing logic, and localized APIs, even low-powered consumer devices can operate autonomously in ways that were previously restricted exclusively to cloud-tier hardware.
