Magika 1.0 Goes Stable As Google Rebuilds Its File Detection Tool In Rust (googleblog.com)

BrianFagioli writes: Google has released Magika 1.0, a stable version of its AI-based file type detection tool, and rebuilt the entire engine in Rust for speed and memory safety. The system now recognizes more than 200 file types, up from about 100, and is better at distinguishing look-alike formats such as JSON vs JSONL, TSV vs CSV, C vs C++, and JavaScript vs TypeScript. The team used a 3TB training dataset and even relied on Gemini to generate synthetic samples for rare file types, allowing Magika to handle formats that don't have large, publicly available corpora. The tool supports Python and TypeScript integrations and offers a native Rust command-line client.

Under the hood, Magika uses ONNX Runtime for inference and Tokio for parallel processing, allowing it to scan around 1,000 files per second on a modern laptop core and scale further with more CPU cores. Google says this makes Magika suitable for security workflows, automated analysis pipelines, and general developer tooling. Installation is a single curl or PowerShell command, and the project remains fully open source.
The project and its documentation are available on GitHub.
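The summary describes a parallel pipeline: many worker threads classifying files by content at once. As a rough stdlib sketch of that pattern (not Magika's actual implementation, which runs an ONNX model in Rust), here is a hypothetical scanner where a small magic-byte table stands in for the real classifier:

```python
# Illustrative sketch only: a stdlib approximation of the parallel-scan
# pattern the summary describes. The SIGNATURES table is a hypothetical
# stand-in for Magika's actual model-based classifier.
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"%PDF-": "application/pdf",
    b"PK\x03\x04": "application/zip",
}

def classify(path: Path) -> tuple[str, str]:
    # Read a short prefix and match it against known signatures.
    head = path.read_bytes()[:16]
    for magic, mime in SIGNATURES.items():
        if head.startswith(magic):
            return (path.name, mime)
    return (path.name, "application/octet-stream")

def scan(paths: list[Path], workers: int = 8) -> dict[str, str]:
    # Fan the classification out over a thread pool, as a scanner
    # processing thousands of files would.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(classify, paths))

if __name__ == "__main__":
    d = Path(tempfile.mkdtemp())
    (d / "a.pdf").write_bytes(b"%PDF-1.7 ...")
    (d / "b.bin").write_bytes(b"\x00\x01")
    print(scan(sorted(d.iterdir())))
```

Swapping the signature lookup for a model call is the step Magika takes; the surrounding fan-out structure stays the same.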
  • Using AI to train AI. What could go wrong?

  • What is that for? (Score:5, Interesting)

    by test321 ( 8891681 ) on Thursday November 06, 2025 @08:26PM (#65778940)

I'm really curious about the use scenario for an AI reimplementation of the Unix "file" command that can "scan around 1,000 files per second on a modern laptop core". Who needs to speed up their existing bash scripts to analyse the file type of over 1,000 files per second on a laptop?

    • by flux ( 5274 )

An implementation of updatedb could make use of this, allowing users to search files by their identified type.

Security audits? Structured/unstructured data auditing can be pretty slow and resource-intensive. And it needs to happen on more than just Linux systems.

      This could make a very tedious task a lot easier. If you've ever had to do PII/PCI/ISO auditing, you probably know what I mean. And since it can probably find malicious executable code masquerading as some benign file, all the better.

  • by Gravis Zero ( 934156 ) on Thursday November 06, 2025 @08:35PM (#65778964)

I understand this is slightly more granular, but damn, it comes at a high cost in resources. While people may be able to look the other way at 100MB to replace a 10KB program, the computational cost is sooo MUCH higher.

    • Have you ever had to scan hundreds of endpoints and file servers, running a mix of operating systems, for unstructured confidential (PII, PCI, etc) data? Not really a lightweight job for an OS specific tool.

      That's where I see this being very useful.

      • Have you ever had to scan hundreds of endpoints and file servers, running a mix of operating systems, for unstructured confidential (PII, PCI, etc) data?

        No, literally never. I doubt many people have.

        • You're probably right that relatively few people have to deal with that sort of compliance crap. I have been that guy though. Be thankful you were not?
What's the point? File types are well defined. Even if you need to distinguish between "look-alikes", why wouldn't you use something that understands the basic common types first, and then do your "AI" nonsense to detect dialects only where it's relevant?

Because sometimes you can't trust that things are what they appear to be. Sure, maybe it claims to be a PDF, but that doesn't mean it isn't actually a disguised executable. Maybe that JPEG is actually a text file filled with financial data someone is trying to exfiltrate, and they had the sense to set the first four characters to match a JPEG signature along with a matching extension. It won't open, but it'll look like an image to anything that just checks extensions or FourCCs.

      If that isn't a concern, then yeah, filter out t

I think you misunderstood my point. It was not to rely on a file extension, but on the same definitions used by file(1).

No, I got it. The first four characters (4CC) of an image file are basically the AV version of the "magic number" file looks for in its second stage. It will recognize a PNG file by seeing the ASCII characters "PNG" at the beginning of the file. File will see that "magic number" and stop, assuming it has correctly identified a PNG file. But what if after that header there's executable code?
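The limitation being described can be shown in a few lines: a file(1)-style signature check looks only at the leading magic bytes, so anything appended after a valid header sails through.

```python
# Sketch of the two-stage check under discussion: a file(1)-style magic
# number test recognizes a PNG by its 8-byte signature and stops there,
# so any payload after the header goes unexamined.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    # Only the signature is checked; the remaining bytes are ignored.
    return data.startswith(PNG_MAGIC)

real_header = PNG_MAGIC + b"\x00\x00\x00\rIHDR"
disguised = PNG_MAGIC + b"MZ\x90\x00 not-actually-image-data"

print(looks_like_png(real_header))  # True
print(looks_like_png(disguised))    # True -- the signature check is fooled
```

A content-based classifier that samples bytes beyond the header is one way to catch the second case.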

          I'm with you on speeding the process up for low priority scans by filtering out the obvious, but that's also what

  • o rly? (Score:4, Informative)

    by Grady Martin ( 4197307 ) on Thursday November 06, 2025 @10:27PM (#65779142)

    $ find kernel.org/linux/include/linux/ -maxdepth 1 -type f | wc -l
    1235
    $ time file --mime-type -b -- kernel.org/linux/include/linux/* > /dev/null

    real 0m0.690s
    user 0m0.581s
    sys 0m0.033s

Two asks:

    - Need to know the electricity cost per file identified, comparing the old system versus the new system.
    - Need to know the relative frequency of each of the file formats, an important benchmark for others building systems that need to process lots of file types.

    For example: the percentage of files that can be 100% identified by a binary compare of the first 10 bytes (the magic number), versus the cost in CPU instructions and electricity of an AI classification of those same files.

  • by dskoll ( 99328 ) on Thursday November 06, 2025 @10:46PM (#65779174) Homepage

    This is a fine example of a completely pointless, stupid, BRAIN-FUCKING-DEAD application of AI. The UNIX file command has existed for decades. It works well and it's fast.

But for "security" purposes, we need to make a program 1000x as big, 100x as slow, and that uses orders of magnitude more electricity, so our email security software can distinguish text/json from text/jsonl.

    We have truly jumped the AI shark.

    • Sometimes, when a new tool looks useless, it's because you don't have a particular use for it. Someone else might see it and be appreciative about how much time and effort it will save. I saw this and started thinking about how long it took StealthAudit to scan all my endpoints.

      Just the other week, my brother-in-law got my wife to buy a little tool for spreading open a spring on my washing machine. $11, over my objection that it was silly to spend any money on something that would probably only be used

    • But for "security" purposes, we need to make a program 1000x as big, 100x as slow,

      Good sire, you offend me! It's at least 10000x as big and 100000x as slow!

Magika's deep learning system processes three sequences of 512 bytes, from the start, middle, and end of each file, allowing it to pick up unique structural, semantic, and content cues associated with different formats.
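The sampling scheme this comment describes can be sketched in a few lines. This is an approximation for illustration; the exact window-selection and padding rules in Magika may differ:

```python
# Rough sketch of the described sampling scheme: three 512-byte windows
# (start, middle, end) taken from a file's contents as model input.
# Window-edge and padding behavior here is an assumption, not Magika's code.
CHUNK = 512

def sample_windows(data: bytes, chunk: int = CHUNK) -> tuple[bytes, bytes, bytes]:
    if len(data) <= chunk:
        # Small files: every window sees the whole content.
        return (data, data, data)
    mid = (len(data) - chunk) // 2
    return (data[:chunk], data[mid:mid + chunk], data[-chunk:])

blob = bytes(range(256)) * 10  # 2560 bytes of sample data
start, middle, end = sample_windows(blob)
print(len(start), len(middle), len(end))  # 512 512 512
```

Fixed-size windows like these let inference cost stay constant regardless of file size, which is consistent with the throughput figures in the summary.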
  • The best way to detect file types is to make them unambiguously what they are by means of content type, file extension and structure. And fuck any format that decides to be something vague or ambiguous - they brought that problem on themselves.
Doesn't that put text/* on the "to be f-ked" list? Wouldn't life be a little harder without text/csv (the ambiguity between CSV and TSV having been mentioned in the article)? Think how much more complicated it would be if you had to pipe your data out to .xlsx instead.
