AI News Today


    This startup’s new mechanistic interpretability tool lets you debug LLMs


    Mapping models

    Silico lets you zoom in on specific parts of a trained model, such as individual neurons or groups of neurons, and run experiments to see what those neurons do. This assumes you have access to the model’s inner workings: most people won’t be able to use Silico to poke around inside ChatGPT or Gemini, but you can use it to look at the parameters inside many open-source models. You can then check which inputs make different neurons fire, and trace pathways upstream and downstream of a neuron to see how other neurons affect it and how it affects them in turn.
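To make "checking which inputs make a neuron fire" concrete, here is a minimal sketch in plain Python. It is not Silico's interface, which the article does not describe. The tiny one-layer network, its weights, and the names `hidden_activations`, `top_inputs`, and `DATASET` are all made up for illustration.

```python
# Hypothetical toy layer: 3 inputs feeding 2 hidden neurons.
# The weights and biases are invented for illustration, not taken from any real model.
WEIGHTS = [
    [1.0, -1.0, 0.0],  # neuron 0
    [0.0, 0.5, 2.0],   # neuron 1
]
BIASES = [0.0, -1.0]


def relu(x):
    """A neuron 'fires' only when its weighted input is positive."""
    return max(0.0, x)


def hidden_activations(inputs):
    """Return the activation of every hidden neuron for one input vector."""
    return [
        relu(sum(w * x for w, x in zip(row, inputs)) + bias)
        for row, bias in zip(WEIGHTS, BIASES)
    ]


def top_inputs(neuron, dataset, k=2):
    """Rank named inputs by how strongly they activate the chosen neuron."""
    scored = [(hidden_activations(x)[neuron], name) for name, x in dataset.items()]
    scored.sort(reverse=True)
    return scored[:k]


# Hypothetical probe inputs, each labeled with a short name.
DATASET = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [0.0, 0.0, 1.0],
    "d": [1.0, 1.0, 1.0],
}

print(top_inputs(1, DATASET))  # [(1.5, 'd'), (1.0, 'c')]
```

In a real model the same idea is usually applied to recorded activations over a large set of text prompts rather than to four hand-written vectors.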

    For example, Goodfire found one neuron inside the open-source model Qwen 3 that was associated with the so-called trolley problem. Activating this neuron changed the model’s responses, making it frame its outputs as explicit moral dilemmas. “When this neuron’s active, all sorts of weird things happen,” says Ho.

    Pinpointing the source of odd behavior like this is now fairly standard practice. But Goodfire wants to make it easier to change that behavior. Using Silico, developers can adjust the parameters connected to individual neurons to boost or suppress certain behaviors.

    In another example, Goodfire researchers asked a model whether a company should disclose that its AI behaves deceptively in 0.3% of cases, affecting 200 million users. The model said no, citing the negative business impact of such a disclosure.

    Looking inside the model, the researchers found that boosting neurons associated with transparency and disclosure flipped the answer from no to yes nine times out of 10. “The model already had the ethical reasoning circuitry, but it was being outweighed by the commercial risk assessment,” says Ho.
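The idea of one signal being outweighed by another, and of a boost tipping the balance, can be sketched with a toy example. This is an illustration under stated assumptions, not Goodfire's method: the two named "neurons", their activation values, the `READOUT` weights, and the `decide` function are all hypothetical.

```python
# Hypothetical readout: how much each named neuron pushes the answer toward "yes".
READOUT = {"transparency": 1.0, "commercial_risk": -1.0}

# Hypothetical activations for the disclosure question.
ACTIVATIONS = {"transparency": 0.4, "commercial_risk": 0.9}


def decide(activations, boost=None):
    """Answer 'yes' if the weighted sum of activations is positive.

    `boost` maps a neuron name to a multiplier applied to its activation,
    which is one simple way to model boosting or suppressing a neuron.
    """
    adjusted = dict(activations)
    for name, factor in (boost or {}).items():
        adjusted[name] *= factor
    score = sum(READOUT[name] * value for name, value in adjusted.items())
    return "yes" if score > 0 else "no"


print(decide(ACTIVATIONS))                                 # no
print(decide(ACTIVATIONS, boost={"transparency": 3.0}))    # yes
```

Without the boost, the commercial-risk signal dominates and the answer is no. Tripling the transparency activation is enough to flip it, which mirrors the "outweighed" description in the quote.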

    Tweaking a model’s parameter values in this way is just one approach. Silico can also help steer the training process by filtering out certain training data, so that unwanted values are not set for those parameters in the first place.

    For example, many models will tell you that 9.11 is greater than 9.9. Looking inside a model to see what’s going on might reveal that it is being influenced by neurons associated with the Bible, in which verse 9.9 comes before 9.11, or by code repositories where consecutive updates are numbered 9.9, 9.10, 9.11 and so on. Using this information, the model can be retrained to make it avoid its “Bible” neurons when doing math.
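The two readings of "9.11" and "9.9" are easy to show side by side. In this short sketch, `as_number` and `as_version` are hypothetical helper names. The first treats the text as a decimal number, and the second treats it the way verse or version numbering does, as a sequence of whole numbers.

```python
def as_number(text):
    """Read the text as a decimal number: 9.11 means nine and eleven hundredths."""
    return float(text)


def as_version(text):
    """Read the text as dot-separated whole numbers, as in verses or software releases."""
    return tuple(int(part) for part in text.split("."))


print(as_number("9.11") > as_number("9.9"))    # False: 9.11 is less than 9.90
print(as_version("9.11") > as_version("9.9"))  # True: (9, 11) comes after (9, 9)
```

A model that answers "9.11 is greater" is behaving as if it were applying the second ordering to a question that calls for the first.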

    By releasing Silico, Goodfire wants to put techniques previously available to a few top labs into the hands of smaller firms and research teams that want to build their own model or adapt an open-source one. The tool will be available for a fee determined on a case-by-case basis according to customers’ requirements (Goodfire declined to give specific pricing details).

    © 2026 ainewstoday.co. All rights reserved. Designed by DD.
