Excel_PRIME 🌟

Excel_Performant Reader via Interfaces for Memory Efficiency.
Without using any external libraries.
Optimised for Range extraction.

What does that mean?

Yet another Excel reader?
- Starting with .NET 8 as the performant runtime (see Benchmarks)
- .NET 9 gives an extra 5% boost
- .NET 10 gives another 5% over .NET 9 ;-)

Let's take each of the above elements and explain:

Excel 📈

Opens large XLSX files (Excel 2007 onwards), plus XLSB (BIFF12) files from V3.##
Zip Deflate format only

Performant 🚀

Try to be as fast as possible, i.e.
- Forward-only lazy loading
- Only a "quick" decipher / convert of the cell type(s), to ease GC pressure
- No attempt at creating / using DataTables with headers etc.
- Use IEnumerables with initial offset starts (Row / Column)
- Accept CancellationTokens, so that a read can be cancelled on a page transition (more on this later)
Now the fastest in real-world usage, 2025-11-19 onwards
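The forward-only, lazy, offset-based reading described above can be sketched with nothing more than `XmlReader` and an iterator. This is a minimal illustration, not Excel_PRIME's actual code: the `LazyRows` class, its `Read` method and the tiny XML sample are all hypothetical, and real sheets carry shared-string indexes, types and styles that are ignored here.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical sample data: a tiny worksheet fragment, not a real workbook.
const string sheetXml =
    "<worksheet><sheetData>" +
    "<row r=\"1\"><c r=\"A1\"><v>10</v></c><c r=\"B1\"><v>11</v></c></row>" +
    "<row r=\"2\"><c r=\"A2\"><v>20</v></c><c r=\"B2\"><v>21</v></c></row>" +
    "<row r=\"3\"><c r=\"A3\"><v>30</v></c><c r=\"B3\"><v>31</v></c></row>" +
    "</sheetData></worksheet>";

using var stream = new MemoryStream(Encoding.UTF8.GetBytes(sheetXml));
foreach (var row in LazyRows.Read(stream, startRow: 2))
{
    Console.WriteLine(string.Join(",", row)); // prints "20,21" then "30,31"
}

// Hypothetical helper: yields one row at a time and never looks backwards.
static class LazyRows
{
    public static IEnumerable<string[]> Read(Stream sheet, int startRow)
    {
        using var reader = XmlReader.Create(sheet);
        while (reader.ReadToFollowing("row"))
        {
            int rowNumber = int.Parse(reader.GetAttribute("r")!);
            if (rowNumber < startRow) continue; // skipped rows never allocate cell data

            var cells = new List<string>();
            using var rowReader = reader.ReadSubtree();
            while (rowReader.ReadToFollowing("v"))
            {
                cells.Add(rowReader.ReadElementContentAsString());
            }
            yield return cells.ToArray();
        }
    }
}
```

Because the method is an iterator, nothing is parsed until the caller starts enumerating, and enumeration can stop early without reading the rest of the sheet.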

Q & A's

Q: There are others that are faster
A: Agreed, but they do not:
- Have range extraction
- Optionally allow the use of the OS's temp file system to store massive sheets
- Re-use already extracted (massive) sheets
- Allow multiple sheets to be read at the same time
  - because they use global memory to represent the current row
  - or have a single access into the zipped Excel file

Reader 📋

Read only
- Therefore no calculations or updates to formula calls

Interfaces 🏗️

Will use the .NET Core functionality by default.
But if your target deployment allows the use of native performant binaries, these will be pluggable via interfaces
- e.g. using Zlib.Net to get the data streams out of the compressed Excel file faster (or SharpZipLib / PowerPlayZipper)
- A faster / slimmer implementation for XML stream reading (e.g. TurboXml)
The interfaces also allow the implementation of different source file formats (e.g. XLSB)
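As a rough picture of what "pluggable via interfaces" can look like, the sketch below hides the zip access behind a small interface and supplies a default implementation built on the in-box `ZipArchive`. The names `IPartSource` and `ZipArchivePartSource` are invented for this example and are not the library's real interfaces; a third-party implementation would simply provide another class behind the same interface.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Build a tiny in-memory zip so the sketch runs without a real workbook.
var zipBytes = new MemoryStream();
using (var zip = new ZipArchive(zipBytes, ZipArchiveMode.Create, leaveOpen: true))
{
    using var writer = new StreamWriter(zip.CreateEntry("xl/workbook.xml").Open());
    writer.Write("<workbook/>");
}
zipBytes.Position = 0;

// Callers depend only on the interface, so the implementation can be swapped.
using IPartSource source = new ZipArchivePartSource(zipBytes);
using var part = new StreamReader(source.OpenPart("xl/workbook.xml"));
Console.WriteLine(part.ReadToEnd()); // prints "<workbook/>"

// Hypothetical abstraction; the name and shape are illustrative only.
interface IPartSource : IDisposable
{
    Stream OpenPart(string partName);
}

// Default implementation backed by System.IO.Compression.ZipArchive.
sealed class ZipArchivePartSource : IPartSource
{
    private readonly ZipArchive _archive;

    public ZipArchivePartSource(Stream workbook) =>
        _archive = new ZipArchive(workbook, ZipArchiveMode.Read, leaveOpen: false);

    public Stream OpenPart(string partName)
    {
        var entry = _archive.GetEntry(partName)
            ?? throw new FileNotFoundException($"Part not found: {partName}");
        return entry.Open();
    }

    public void Dispose() => _archive.Dispose();
}
```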

Q & A's

Q: Why?
A: As mentioned above, this allows a developer to swap in external NuGet packages that might perform better (XML parsing speed etc.)

Memory 🌐

The reason for this project is to handle very large XLSX files (i.e. > 500K rows with > 180 columns per sheet, with multiple sheets of this size)
For ETL validation scenarios, i.e. making sure that transferred, user-modified data has had its interaction rules applied before moving on to the T and L stages
Try not to allocate on the LOH (Large Object Heap)
No internal .NET memory is held for previously loaded sheets / rows.

Q & A's

Q: It appears that this uses more memory than other implementations
A: Currently yes, but it is being optimised for Range Extraction,
- AND for allowing multiple rows (with cell data) to be stored in memory at the same time (i.e. via a ToList() call)
- AND for allowing multiple sheets to be read at the same time (unlike some of the others, which use a single global memory to represent a row)
- AND it appears that the current benchmarks do not actually extract unless a ToString and a check on the result is used (otherwise the JIT removes the unassigned dead code)
- AND the memory used will actually be used in the ETL pipeline anyway, so it's just being truthful

Efficiency 📦

As hinted by the above statements, this is targeted at memory-restricted environments (i.e. ASP.NET VMs)
Use the OS's "Temp File" caching, so if memory is tight the owner app will not have to worry about OOM exceptions, or fall back to swap disk speeds.
Only unzip the sheet(s) when they are asked for
Only load the shared strings up to the current request number
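Loading shared strings only up to the requested index could look something like the sketch below. It is an assumption-laden illustration rather than the library's implementation: `LazySharedStrings` is a hypothetical class, and it simply concatenates every `<t>` inside an `<si>`, which glosses over phonetic runs and other details of real shared-string tables.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

const string xml = "<sst><si><t>alpha</t></si><si><t>beta</t></si><si><t>gamma</t></si></sst>";
using var strings = new LazySharedStrings(new MemoryStream(Encoding.UTF8.GetBytes(xml)));

Console.WriteLine(strings.Get(1));       // "beta"; only entries 0 and 1 have been parsed
Console.WriteLine(strings.LoadedCount);  // 2
Console.WriteLine(strings.Get(0));       // "alpha"; served from the already-loaded list

// Hypothetical helper: parses the table forward-only, and only as far as needed.
sealed class LazySharedStrings : IDisposable
{
    private readonly XmlReader _reader;
    private readonly List<string> _loaded = new();

    public LazySharedStrings(Stream sharedStringsXml) =>
        _reader = XmlReader.Create(sharedStringsXml);

    public int LoadedCount => _loaded.Count;

    public string Get(int index)
    {
        if (index < 0) throw new ArgumentOutOfRangeException(nameof(index));

        // Read further <si> entries only while the requested index is not loaded yet.
        while (_loaded.Count <= index && _reader.ReadToFollowing("si"))
        {
            var text = new StringBuilder();
            using var item = _reader.ReadSubtree();
            while (item.ReadToFollowing("t"))
            {
                text.Append(item.ReadElementContentAsString());
            }
            _loaded.Add(text.ToString());
        }

        return index < _loaded.Count
            ? _loaded[index]
            : throw new ArgumentOutOfRangeException(nameof(index));
    }

    public void Dispose() => _reader.Dispose();
}
```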

Q & A's

Q: Sometimes the async / await calls add too much overhead
A: True, that is why there are also equivalent base interfaces that perform the same functionality without the async / await overhead.

Etc. 🔧

`CancellationToken`s

This is to allow the reading of large files to be aborted
Make "most" of the .NET Core APIs asynchronous Tasks
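To make the abort behaviour concrete, here is a small sketch of a cancellable asynchronous row stream using the standard `IAsyncEnumerable<T>` and `[EnumeratorCancellation]` pattern. `RowSource.ReadRowsAsync` is a hypothetical stand-in that yields row numbers rather than parsed cells; it is not an Excel_PRIME API.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();
int seen = 0;
try
{
    await foreach (var row in RowSource.ReadRowsAsync(1_000_000, cts.Token))
    {
        if (++seen == 3) cts.Cancel(); // e.g. the user navigated away from the page
    }
}
catch (OperationCanceledException)
{
    Console.WriteLine($"Cancelled after {seen} rows"); // prints "Cancelled after 3 rows"
}

static class RowSource
{
    // Stand-in for a real sheet reader: yields row numbers instead of cell data.
    public static async IAsyncEnumerable<int> ReadRowsAsync(
        int rowCount,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int row = 1; row <= rowCount; row++)
        {
            cancellationToken.ThrowIfCancellationRequested();
            await Task.Yield(); // placeholder for an awaited stream read
            yield return row;
        }
    }
}
```

Checking the token once per row keeps the cost low while still letting a very large read stop promptly.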

IDisposable

Got to tidy up those temp files, and release the FileStreams
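One way to guarantee that a temp file disappears with its owner is to open it with `FileOptions.DeleteOnClose` and dispose the stream from `Dispose()`. The `TempSheetFile` class below is a hypothetical sketch of that idea, and is only assumed to resemble what the library does internally.

```csharp
using System;
using System.IO;

string path;
using (var sheet = new TempSheetFile())
{
    path = sheet.FilePath;
    sheet.Stream.Write(new byte[] { 1, 2, 3 }, 0, 3); // pretend this is unzipped sheet data
    Console.WriteLine(File.Exists(path));             // True while the stream is open
}
Console.WriteLine(File.Exists(path));                 // False once disposed

// Hypothetical wrapper: owns a temp file and removes it when disposed.
sealed class TempSheetFile : IDisposable
{
    public string FilePath { get; } = Path.GetTempFileName();
    public FileStream Stream { get; }

    public TempSheetFile()
    {
        // DeleteOnClose asks the runtime to delete the file when the handle is closed.
        Stream = new FileStream(
            FilePath, FileMode.Open, FileAccess.ReadWrite, FileShare.Read,
            bufferSize: 4096, FileOptions.DeleteOnClose);
    }

    public void Dispose() => Stream.Dispose();
}
```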

Challenges:

CellValue instances are returned to users
They must be thread-safe (multiple readers possible)
Each "Cell Type" / "Cell Instance" / "Row Instance" (string, numeric, boolean, datetime, error) has different lifecycle requirements

Caveats ⛔:

It will not be: Same sheet thread instance safe 📊

A single sheet instance will not be thread safe, because the XML reader is locked (forward only) to the sheet in use.
- But you can open the same sheet more than once, and have different threads running over it
- And you can have parallel threads access the Excel file
- Just remember to set `Options { AccessExcelFileInForwardOnlyMode = false }`
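The underlying idea, that each thread gets its own access into the zip instead of sharing one forward-only reader, can be shown with plain .NET types. The sketch below does not use Excel_PRIME or its `Options`; the file name and the two "sheet" parts are made up so that the example is runnable on its own.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Threading.Tasks;

// Create a small zip with two hypothetical sheet parts.
string path = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
using (var zip = ZipFile.Open(path, ZipArchiveMode.Create))
{
    foreach (var name in new[] { "sheet1.xml", "sheet2.xml" })
    {
        using var writer = new StreamWriter(zip.CreateEntry(name).Open());
        writer.Write($"<worksheet name=\"{name}\"/>");
    }
}

var contents = new string[2];
Parallel.For(0, 2, i =>
{
    // Each worker opens its own FileStream and ZipArchive; nothing is shared between threads.
    using var file = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
    using var archive = new ZipArchive(file, ZipArchiveMode.Read);
    using var reader = new StreamReader(archive.GetEntry($"sheet{i + 1}.xml")!.Open());
    contents[i] = reader.ReadToEnd();
});

Console.WriteLine(string.Join(Environment.NewLine, contents));
File.Delete(path);
```

Opening with `FileShare.Read` is what lets several read-only handles coexist on the same file.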

It will not do: Dynamic Ranges ⚠️

i.e. ones that contain formulas:
- `<definedName name="Prices">OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)</definedName>`

It will not do: Poco 🤖

A POCO / Type populator (Extensions can be written for that later)

It will not be: Writer / Modifier 📚

Totally beyond the remit of this project


Targets 🎯

Phase 0

✅ Setup this github
✅ Create the main project
✅ Add Unit Test project
✅ Add simple Test Data

Phase Alpha

✅ Use Net Core Interface(s)
- ✅ Use ZipArchive
- ✅ Use XDocument
✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
✅ Implement Sheet loading (unzip and be ready for use)
- ✅ Use XDocument as POC only
✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells (Utilise sparse array?)
- ✅ Keep last used offset (i.e. no need to reload sheet if the next range API startRow call is later)

Phase Beta - Benchmarks ⏱️

✅ Benchmarks
- ✅ Add Other "Excel readers" to the Benchmark project(s)
- 🎉 Now With Sylvan.Data.Excel
- 🎉 Now With XlsxHelper
✅ More UnitTests

Phase 1 - MVP 🔍

✅ Add IEnumerables and benchmark
✅ Implement XmlReader.Create for
✅ More Benchmarks
- Now With FastExcel
- ✅ Some Profiling Enhancements
✅ Better Storage of the SharedStrings
✅ Cell object type 📅
✅ Use internal ZipEntry rented buffer
✅ Investigation into the smallest function 💪
✅ Optimise for CellConversion.None 💪
✅ Parallel Sheet threads Access
✅ Nuget
- ✅ Beta etc.
- 🎊 Released as Nuget V1.yyMM.dd -> 1.2511.14

Phase 2 - RC

✅ Add IEnumerables All the way down ⤵️
✅ Nuget
- ✅ Manual workflow deploy Release
- ✅ Manual workflow deploy Beta
✅ Read "definedName"s (Ranges / Cell / Value / Dynamic) 📇
✅ Deal with blank rows in a sheet 🗋
✅ Deal with Empty cells in a row 🗅
✅ Implement Sheet scoping of "definedName"s
✅ Implement Row extraction 📟
✅ Implement RangeExtraction 📲
✅ Add Benchmarks for "Excel readers" That perform Range Extraction
- ✅ ClosedXML Version="0.105.0"
- ✅ EPPlus_LPGL Version="4.5.3.13"
- ⚠️ FastExcel Version="3.0.13" -> Fails on Range Extraction
- ✅ FreeSpire.XLS Version="14.2.0"
- ✅ Aspose.Cells Version="25.11.0"
- ⚠️ Extend benchmarks to cover the other large file types
  - It appears that most of the others do not like the pivot-tables file!! 🤯
  - Performance on 2025-11-28
✅ Investigate memory usage(s) 🧑‍💻
- ✅ Sacrificed a little speed ➡️ Performance on 2025-12-07
✅ Release as Nuget V2.2512-10 💨

V2 Changes ➡️ 2025-12-14

Implement GetUserRange(...)
- Range Performance on 2025-12-14

Phase V3 - XLSB 💾 (BIFF12)

⛓️‍💥 Breaking Change(s)
- FileType has been removed; open via the public class type instead
- IXmlReaderHelpers has become IOpenXmlReaderHelpers, with slightly different methods
- IXmlWorkBookReader has become IOpenXmlWorkBookReader
- IXmlSheetReader has become IOpenXmlSheetReader
- Removal of the Conversion options Number###
- Changed GetAllCells to return IReadOnlyList<ICell?>?
  - Watch out for those null rows !
✅ Branch and beta yml
- ✅ Convert test data in xlsb format
✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
✅ Implement Sheet loading
✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells
✅ Cell object type 📅
✅ Benchmarks 🖲️
- ✅ Add "Excel readers" That support XLSB Extraction
- ✅ 🚶‍➡️ 1st Pass Performance on 2025-12-20
- ✅ 👟 2nd Pass Performance on 2025-12-21
✅ Read "definedName"s (Ranges / Cell / Value / Dynamic) 📇
- ✅ Read from global
✅ Strongly-typed accessors (AsInt32, AsDateTime, etc)
- Slightly slower, but less memory pressure for XLSB
- ✅ 2026-01-02
✅ Parallel Sheet threads Access
- ✅ Multiple times (with locking)
✅ Release as Nuget V3.yyMM.dd
- 🎊 Released RC1 as Nuget V3.2601.04-RC1
✅ Investigate Performance and edge cases, then Release as Stable
- 🚀 Big Performance improvements 2026-01-11
🎊 Released V3 as Nuget V3.2601.11

V3 Changes ➡️ 2026-01-16

Remove some AggressiveOptimization and allow i-cache to do its job
Implement "Hot-Paths" for cell type access
Reduce some memory allocations for ReadOnly CellCollections
- 2026-01-16

Phase V4 - Specific Cell value type(s) #️⃣

⛓️‍💥 Breaking Change(s)
- Removal of GetSheetFileName(int offsetSheetId);
- Removal of GetDefinedRange via int sheetId
- Removal of Index property from ISheet
- Internal Creation of WorkBooks
- Internal implementation of IOpenXmlWorkBookReader::GetSheetNames now returns the relative path to the "Sheet Name"
- CellValue is now a class, therefore no need to use .Value
- ICell.CellValue is now nullable
✅ Cell object type 📅
- ✅ "Best Effort" Operator based conversion
- ✅ TryGetType will return out type, if stored as that type.
- ✅ Unit Tests
✅ Performance
- ✅ Use ValueTask and reduce memory allocations in some hot paths
- 🚀 Fix fallout from making CellValue a class
- ✅ ArrayPool support has been added to ThreadStringBuilderPool
- ✅ Release-specific optimizations added
  - ✅ EnableTrimAnalyzer: true
  - ✅ TieredCompilation: true
  - ✅ TieredCompilationQuickJit: true
  - ✅ TieredCompilationQuickJitForLoops: true
✅ Implement System.DBNull return option, for empty cells
- ✅ Implement INullRow return option, for empty rows
- ✅ Update tests to use INullRow detection
✅ Implement GetCell###(string columnLetters, ...) #8
🚀 2026-05-05

Phase 5 - User Cell Value type formatting 💽 & Performance Optimizations 🏃‍➡️

Phase 6 - Third Party Nugets 📦

⛓️‍💥 Breaking Change(s)
- None yet.
Exercise the implementation of interfaces for other libs (XML / Zip)
- Separate Nuget(s) ?
Benchmarks
- e.g. search usages of class PoolingArrayBufferWriter<T>
- [ ]
