- Excel_Performant Reader via Interfaces for Memory Efficiency.
- Without using any external libraries.
- Optimised for Range extraction.

- Yet another Excel reader ?,
- Starting with .Net 8 as the performant Runtime (See Benchmarks)
- .Net9 gives an extra 5% boost,
- .Net10 Another 5% over .Net9 ;-)
Lets take each of the above elements and explain:
- Open Large 2007 (Onwards) XLSX file formats and XLSB (BIFF12) in V3.##
- Zip Deflate format Only
- Try to be as fast as possible, i.e.
- Forward only Lazy loading
- Only "Quick" decipher / convert of the cell(s) types to ease GC pressure
- No attempt at "creating / using" datatables with headers etc.
- Use
IEnumerables with initial offset starts (Row / Column) - Allow
CancellationTokens to be used to allow page transitioning cancellation (More on this later)
- Now the fastest in Real world usage 2025-11-19 onwards
- Q: There are others that are faster
- A: Agreed, but then
- They do not have range extraction.
- Or optionally allow the use of the OS's TempFile System to store massive sheets
- Or re-use of already extracted (massive) sheets
- Or allow multiple sheets to be read at the same time
- because others use global memory to represent the current row
- Or have a single access into the Zip Excel file
- Read only
- Therefore no calculations or updates to formula calls
- Will use the DotNet core functionality by default
- But, if your target deployment allows for the use of native performant binaries, then via the use of interfaces these will be pluggable
- i.e. Using
Zlib.Netfor getting the data streams out of the compressed Excel file faster. (OrSharpZipLib/PowerPlayZipper) - A faster / slimmer implementation for xml stream reading (i.e. TurboXml)
- i.e. Using
- Allow the implementation of different source files (i.e. XLSB)
- Q: Why?
- A: As mentioned above, this is to allow a developer to replace with external nugets that might perform better XML speed etc.
- The reason for this project, is to handle very large XSLX files (i.e. > 500K rows with > 180 columns per sheet, with multiple sheets of this size)
- For
ETLvalidation scenarios, i.e. make sure that the user modified data that has been transferred has interaction rules applied, before moving onto theTandLstages - Try not to hit / store in the LOH
- No internal .Net memory of previously loaded sheets / rows.
- Q: It appears that this uses more memory than other implementations
- A: Currently yes, but it is being optimised for Range Extraction,
- AND for allowing multiple rows (With cell data) to be stored in memory at the same time, (i.e. via
ToList()call); - AND to allow multiple sheets to be read at the same time (Unlike some to of the others that use a single global memory to represent a row)
- And it appears that the current benchmarks do not extract unless a
ToStringand a check on the result is used (Otherwise the Jit removes the unassigned dead code) - And, the memory used will actually be used in the ETL pipeline anyway, so it's just being truthful
- AND for allowing multiple rows (With cell data) to be stored in memory at the same time, (i.e. via
- As hinted by the above statements, this is to be targetted at memory restricted environments (i.e. ASP Net VM's)
- Use the OS's "Temp File" caching, so if the memory is tight then the Owner app will not have to worry about OOM exceptions, or having to use Swap Disk speeds.
- Only unzip the sheet(s) when they are asked for
- Only load the shared strings upto the current request number
- Q: Sometimes the
Asyncawait s add too much overhead - A: true, that is why there are also the equivalent base interfaces that perform the same functionality without the need for the
async awaitoverheads.
- This is to allow the Large files to be Aborted
- Make "Most" of the "Net Cores'" API's Asynchronous
Tasks
- Got to tidy up those
Temp Files, and release theFileStream's
- CellValue instances are returned to users
- They must be thread-safe (multiple readers possible)
- Each "Cell Type" / "Cell Instance" / " Row Instance"(string, numeric, boolean, datetime, error) have different lifecycle requirements
- It will Not be same sheet Instance thread safe, because the xml reader will be locked (Forward only) to the sheet in use.
- but you can Open the sheet more than once, and have different threads running over it,
- And you can have Parallel threads access the Excel file
- Just remember to set
Options{ AccessExcelFileInForwardOnlyMode = false}
- i.e. Ones that contain formulas:
<definedName name="Prices">OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),1)</definedName>
- A POCO / Type populator (Extensions can be written for that later)
- Totally beyond the scope of this project remit
| Badge 🔄 | Area |
|---|---|
| Release build and tests |
- Targets 🎯
- ✅ Setup this github
- ✅ Create the main project
- ✅ Add Unit Test project
- ✅ Add simple Test Data
- ✅ Use Net Core Interface(s)
- ✅ Use
ZipArchive - ✅ Use
XDocument
- ✅ Use
- ✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
- ✅ Implement Sheet loading (unzip and be ready for use)
- ✅ Use
XDocumentas POC only
- ✅ Use
- ✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells (Utilise sparse array?)
- ✅ Keep last used offset (i.e. no need to reload sheet if the next range API
startRowcall is later)
- ✅ Benchmarks
- ✅ Add Other "Excel readers" to the Benchmark project(s)
- 🎉 Now With
Sylvan.Data.Excel - 🎉 Now With
XlsxHelper
- ✅ More UnitTests
- ✅ Add
IEnumerables and benchmark - ✅ Implement
XmlReader.Createfor - ✅ More Benchmarks
- Now With
FastExcel - ✅ Some Profiling Enahancements
- Now With
- ✅ Better Storage of the SharedStrings
- ✅ Cell object type 📅
- ✅ Use internal
ZipEntryrented buffer - ✅ Investigation into the smallest function 💪
- ✅ Optimise for
CellConversion.None💪 - ✅ Parallel Sheet threads Access
- ✅ Nuget
- ✅ Beta etc.
- 🎊 Released as Nuget V1.yyMM.dd ->
1.2511.14
- ✅ Add
IEnumerables All the way down⤵️ - ✅ Nuget
- ✅ Manual workflow deploy Release
- ✅ Manual workflow deploy Beta
- ✅ Read "definedName"s (Ranges / Cell / Value / Dynamic) 📇
- ✅ Deal with blank rows in a sheet 🗋
- ✅ Deal with Empty cells in a row 🗅
- ✅ Implement Sheet scoping of "definedName"s
- ✅ Implement Row extraction 📟
- ✅ Implement RangeExtraction 📲
- ✅ Add Benchmarks for "Excel readers" That perform Range Extraction
- ✅
ClosedXMLVersion="0.105.0" - ✅
EPPlus_LPGLVersion="4.5.3.13" ⚠️ FastExcelVersion="3.0.13" -> Fails on Range Extraction- ✅
FreeSpire.XLSVersion="14.2.0" - ✅
Aspose.CellsVersion="25.11.0" ⚠️ Extend benchmarks to cover the other large file types- It appears that most of the others do not like the
pivot-tablesfile.!! 🤯 - Performance on 2025-11-28
- It appears that most of the others do not like the
- ✅
- ✅ Investigate memory usage(s) 🧑💻
- ✅ Sacrificed a little speed ➡️ Performance on 2025-12-07
- ✅ Release as Nuget V2.2512-10 💨
- ⛓️💥 Breaking Change(s)
FileTypehas been removed, and Open via the Public class typeIXmlReaderHelpershas becomeIOpenXmlReaderHelpers, with slightly different methodsIXmlWorkBookReaderhas becomeIOpenXmlWorkBookReaderIXmlSheetReaderhas becomeIOpenXmlSheetReader- Removal of the Conversion options
Number### - Changed
GetAllCellsto returnIReadOnlyList<ICell?>?- Watch out for those null rows !
- ✅ Branch and beta yml
- ✅ Convert test data in xlsb format
- ✅ Implement Open / Dispose (Async)
- ✅ Sheet Names
- ✅ Shared Strings
- ✅ Implement Sheet loading
- ✅ Implement Row extraction
- ✅ Skip
- ✅ Delayed read - until a cell is actually needed
- ✅ Deal with Null / Empty cells
- ✅ Cell object type 📅
- ✅ Benchmarks 🖲️
- ✅ Add "Excel readers" That support XLSB Extraction
- ✅ 🚶➡️ 1st Pass Performance on 2025-12-20
- ✅ 👟 2nd Pass Performance on 2025-12-21
- ✅ Read "definedName"s (Ranges / Cell / Value / Dynamic) 📇
- ✅ Read from global
- ✅ Strongly-typed accessors (
AsInt32,AsDateTime, etc)- Slightly slower, but less memory pressure for
xslb - ✅ 2026-01-02
- Slightly slower, but less memory pressure for
- ✅ Parallel Sheet threads Access
- ✅ Multiple times (with locking)
- ✅ Release as Nuget V3.yyMM.dd
- 🎊 Released RC1 as Nuget
V3.2601.04-RC1
- 🎊 Released RC1 as Nuget
- ✅ Investigate Performance and edge cases, then Release as Stable
- 🚀 Big Performance improvements 2026-01-11
- 🎊 Released V3 as Nuget
V3.2601.11
- Remove some
AggressiveOptimizationand allowi-cacheto do its job - Implement "Hot-Paths" for cell type access
- Reduce some memory allocations for ReadOnly CellCollections
- ⛓️💥 Breaking Change(s)
- Removal of
GetSheetFileName(int offsetSheetId); - Removal of
GetDefinedRangeviaint sheetId - Removal of
Indexproperty fromISheet - Internal Creation of WorkBooks
- Internal implementation of
IOpenXmlWorkBookReader::GetSheetNamesnow returns the relative path to the "Sheet Name" CellValueis now aclass, therefore no need to use.ValueICell.CellValueis now nullable
- Removal of
- ✅ Cell object type 📅
- ✅ "Best Effort"
Operatorbased conversion - ✅ TryGet
Typewill returnout type, if stored as that type. - ✅ Unit Tests
- ✅ "Best Effort"
- ✅ Performance
- ✅ Use
ValueTaskand reduce memory allocations in some hot paths - 🚀 Fix fallout from making
CellValueis now aclass - ✅
ArrayPoolsupport has been added to ThreadStringBuilderPool using ArrayPool. - ✅ Release-specific optimizations added
- ✅ EnableTrimAnalyzer: true
- ✅ TieredCompilation: true
- ✅ TieredCompilationQuickJit: true
- ✅ TieredCompilationQuickJitForLoops: true
- ✅ Use
- ✅ Implement
System.DBNullreturn option, for empty cells- ✅ Implement
INullRowreturn option, for empty rows - ✅ Update tests to use
INullRowdetection
- ✅ Implement
- ✅ Implement
GetCell###(string columnLetters, ...)#8 - 🚀 2026-05-05
- ⛓️💥 Breaking Change(s)
- None yet.
- Cell object type 📅
- Store cell style type (see Options enum)
- Use of user defined column schema
- Formatter applied ->
CellConversion.ForceStyles - Unit Tests
- Deal with
DateOnly/TimeOnlyfields ->CellConversion.NumberAndDates💹 Code-Level Optimizations - Implement
ISpanFormattablein CellValue - Use
CollectionsMarshalfor zero-copy operations - Add
System.Runtime.Intrinsicsfor SIMD Advanced Scenarios - Enable
PublishTrimmed=truewith trim warnings resolved - Native AOT compilation testing
- IAsyncEnumerable stream processing Monitoring
- Add performance regression tests
- Implement ETW profiling in CI/CD
- Track JIT compilation metrics
- ⛓️💥 Breaking Change(s)
- None yet.
- Excercise the Implementation of Interfaces for other Libs (Xml / Zip)
- Separate Nuget(s) ?
- Benchmarks
- e.g. search isages of
Class PoolingArrayBufferWriter<T> - [ ]
- e.g. search isages of
-
Investigate a different way of storing the Shared strings to the Filesystem, when they are in the MB's
- e.g. Search for
Class FileBufferingWriter
- e.g. Search for
-
Investigate possibility of using "Pipelining" to get data for Next row / cell population after yield?
- Locking
- How to deal with rows that are completely blank
-
fibres?
-
Indicate that things may be
Hidden🖺- Sheet
- Row
- Column
- Cell ?
-
Indicate that things may be
Readonly- Sheet
- Row
- Column
- Cell ?
-
More ideas to be added later, Please suggest... ;-)