diff --git a/PersistentMap/Readme.md b/PersistentMap/Readme.md
new file mode 100644
index 0000000..81517f1
--- /dev/null
+++ b/PersistentMap/Readme.md
@@ -0,0 +1,89 @@
+# PersistentMap
+
+A high-performance, persistent (immutable) B-Tree map implementation for .NET, designed for scenarios requiring efficient snapshots and transactional updates.
+
+## Features
+
+* **Persistent (Immutable) by Default:** Operations on `PersistentMap` return a new instance, sharing structure with the previous version. This makes it trivial to keep historical snapshots or implement undo/redo.
+* **Transient (Mutable) Phase:** Supports a `TransientMap` for high-performance batch updates. This allows you to perform multiple mutations (Set/Remove) without the overhead of allocating new path nodes for every single operation, similar to Clojure's transients or Scala's builders.
+* **Optimized B-Tree:** Uses a B-Tree structure optimized for modern CPU caches and SIMD instructions (AVX2/AVX512) for key prefix scanning.
+* **Custom Key Strategies:** Flexible `IKeyStrategy` interface allows defining custom comparison and prefix generation logic (e.g., for strings, integers, or custom types).
+
+## Usage
+
+### When should I use this?
+
+Never, probably. This was just a fun little project. If you want a really fast immutable sorted map, you should consider it. Although this map is faster than LanguageExt.HashMap for some key types, you should definitely use LanguageExt.HashMap if you don't need a sorted collection. It is well tested and has no problems with key collisions, which slow this map down a lot.
+
+It is also faster for just about every key type except strings longer than 30 characters with few common prefixes.
+
+
+### Basic Persistent Operations
+
+```csharp
+using PersistentMap;
+
+// 1. Create an empty map with a strategy (e.g., for strings)
+var map0 = PersistentMap.Empty(new UnicodeStrategy());
+
+// 2. Add items (returns a new map)
+var map1 = map0.Set("key1", "value1");
+var map2 = map1.Set("key2", "value2");
+
+// map0 is still empty
+// map1 has "key1"
+// map2 has "key1" and "key2"
+
+// 3. Remove items
+var map3 = map2.Remove("key1");
+// map3 has only "key2"
+```
+
+### Efficient Batch Updates (Transients)
+
+When you need to perform many updates at once (e.g., initial load, bulk import), use `ToTransient()` to switch to a mutable mode, and `ToPersistent()` to seal it back.
+
+```csharp
+// 1. Start with a persistent map
+var initialMap = PersistentMap.Empty(new IntStrategy());
+
+// 2. Convert to transient (mutable)
+var transientMap = initialMap.ToTransient();
+
+// 3. Perform batch mutations (in-place, fast)
+for (int i = 0; i < 10000; i++)
+{
+    transientMap.Set(i, $"Value {i}");
+}
+
+// 4. Convert back to persistent (immutable)
+// This "seals" the current state. The transient map rolls its transaction ID,
+// so subsequent writes to 'transientMap' won't affect 'finalMap'.
+var finalMap = transientMap.ToPersistent();
+```
+
+## Key Strategies
+
+The library uses `IKeyStrategy` to handle key comparisons and optimization.
+
+* **`UnicodeStrategy`**: Optimized for `string` keys. Uses SIMD to pack the first 8 bytes of the string into a `long` prefix for fast scanning.
+* **`IntStrategy`**: Optimized for `int` keys.
+
+You can implement `IKeyStrategy` for your own types.
+
+## Performance Notes
+
+* **Structure Sharing:** `PersistentMap` shares unchanged nodes between versions, minimizing memory overhead.
+* **Transients:** `TransientMap` uses an internal `OwnerId` (transaction ID) to track ownership. Nodes created within the same transaction are mutated in-place. `ToPersistent()` ensures that any future writes to the transient map will copy nodes instead of mutating the shared ones. This makes building a map much faster than applying persistent updates one at a time.
+* **SIMD:** The `PrefixScanner` uses AVX2/AVX512 (if available) to scan node keys efficiently.
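+
+The ownership rule behind transients can be sketched in a few dozen lines. The example below is an illustration only, not the library's code: it uses a plain unbalanced binary tree instead of a B-Tree, and `Node`, `TransientTree`, `Own`, and `Demo` are hypothetical names. It assumes the scheme described in the Transients note, where each node records the ID of the transaction that created it and `ToPersistent()` rolls that ID.
+
+```csharp
+#nullable disable
+using System;
+using System.Threading;
+
+// Hypothetical node type: remembers which transaction created it.
+sealed class Node
+{
+    public long OwnerId;
+    public int Key;
+    public string Value;
+    public Node Left, Right;
+}
+
+// Hypothetical mutable builder over a tree that may be shared with snapshots.
+sealed class TransientTree
+{
+    private static long _lastId;
+    private long _ownerId = Interlocked.Increment(ref _lastId);
+    public Node Root;
+
+    // Returns a node this transaction may mutate: the node itself if this
+    // transaction created it, otherwise a fresh copy stamped with our ID.
+    private Node Own(Node n) =>
+        n.OwnerId == _ownerId
+            ? n
+            : new Node { OwnerId = _ownerId, Key = n.Key, Value = n.Value, Left = n.Left, Right = n.Right };
+
+    public void Set(int key, string value) => Root = Set(Root, key, value);
+
+    private Node Set(Node n, int key, string value)
+    {
+        if (n == null) return new Node { OwnerId = _ownerId, Key = key, Value = value };
+        n = Own(n);
+        if (key < n.Key) n.Left = Set(n.Left, key, value);
+        else if (key > n.Key) n.Right = Set(n.Right, key, value);
+        else n.Value = value;
+        return n;
+    }
+
+    // Seals the current state. Rolling the ID makes every existing node
+    // "foreign" to later writes, so the returned root is never mutated again.
+    public Node ToPersistent()
+    {
+        Node snapshot = Root;
+        _ownerId = Interlocked.Increment(ref _lastId);
+        return snapshot;
+    }
+}
+
+static class Demo
+{
+    public static string Get(Node n, int key)
+    {
+        while (n != null && n.Key != key) n = key < n.Key ? n.Left : n.Right;
+        return n?.Value;
+    }
+
+    static void Main()
+    {
+        var transient = new TransientTree();
+        for (int i = 0; i < 5; i++) transient.Set(i, $"Value {i}");
+
+        Node snapshot = transient.ToPersistent();
+        transient.Set(2, "changed");
+
+        Console.WriteLine(Get(snapshot, 2));        // Value 2
+        Console.WriteLine(Get(transient.Root, 2));  // changed
+    }
+}
+```
+
+Writes made before sealing mutate nodes in place because their `OwnerId` matches. The write after `ToPersistent()` copies only the nodes on the path to key 2 and leaves the snapshot untouched.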
+
+### Key strategies
+
+For string keys, the prefix optimization makes lookups very fast. For mostly-ASCII string keys, we are faster than most persistent hash maps once keys or collections pass a certain size, which depends on the implementation strategy. The B-Tree is shallow and incurs fewer cache misses, so it can be faster than either deep trees or hash maps despite doing linear searches.
+
+## Project Structure
+
+* `PersistentMap.cs`: The main immutable map implementation.
+* `TransientMap.cs`: The mutable builder for batch operations.
+* `Nodes.cs`: Internal B-Tree node definitions.
+* `KeyStrategies.cs`: Implementations of key comparison and prefixing.
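+
+As a rough illustration of the kind of prefixing `KeyStrategies.cs` is described as handling, here is a scalar sketch of packing the first 8 bytes of a string key into an integer. It is not the library's implementation: `PrefixSketch` and `Prefix` are hypothetical names, and the choices of UTF-8 bytes, big-endian order, and an unsigned `ulong` (rather than the `long` mentioned for `UnicodeStrategy`) are assumptions made so that comparing two prefixes agrees with comparing the underlying bytes.
+
+```csharp
+using System;
+using System.Buffers.Binary;
+using System.Text;
+
+static class PrefixSketch
+{
+    // Packs up to the first 8 UTF-8 bytes of `key` into a big-endian ulong,
+    // zero-padding short keys. Big-endian puts the first byte in the most
+    // significant position, so unsigned comparison of two prefixes agrees
+    // with byte-wise comparison of the first 8 bytes.
+    public static ulong Prefix(string key)
+    {
+        byte[] utf8 = Encoding.UTF8.GetBytes(key);
+        Span<byte> buffer = stackalloc byte[8];
+        buffer.Clear();
+        utf8.AsSpan(0, Math.Min(8, utf8.Length)).CopyTo(buffer);
+        return BinaryPrimitives.ReadUInt64BigEndian(buffer);
+    }
+
+    static void Main()
+    {
+        // Different prefixes decide the order without touching the strings.
+        Console.WriteLine(Prefix("apple") < Prefix("apricot"));                   // True
+        // Keys sharing their first 8 bytes collide and need a full comparison.
+        Console.WriteLine(Prefix("persistent-map") == Prefix("persistent-set"));  // True
+    }
+}
+```
+
+The second case shows why long keys with a shared prefix are the slow path: equal prefixes carry no ordering information, so the full keys have to be compared.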