Robert Baindourov – webdesigner.la

Posted on March 18, 2025 by Robert Baindourov

Where to Find GPT-2 Source Code

OpenAI’s Official GitHub Repository:

OpenAI released parts of GPT-2’s code in their GitHub repository: github.com/openai/gpt-2.
This includes a TensorFlow implementation for the 124M parameter model (smallest version), along with utilities for downloading pre-trained weights and generating text.
Note: The full 1.5B parameter model’s training code and weights weren’t fully released initially, but smaller models and sample code are available.
Usage example:
bash git clone https://github.com/openai/gpt-2.git cd gpt-2 pip install -r requirements.txt python download_model.py 124M python src/interactive_conditional_samples.py --model_name=124M

Hugging Face Transformers Library:

Hugging Face provides a PyTorch and TensorFlow implementation of GPT-2 that’s widely used and well-documented: huggingface.co/models?filter=gpt2.
You can load pre-trained GPT-2 models easily:
python from transformers import GPT2Tokenizer, GPT2LMHeadModel tokenizer = GPT2Tokenizer.from_pretrained("gpt2") model = GPT2LMHeadModel.from_pretrained("gpt2") input_ids = tokenizer.encode("Hello, world!", return_tensors="pt") output = model.generate(input_ids, max_length=50) print(tokenizer.decode(output[0], skip_special_tokens=True))
Source code is in their GitHub: github.com/huggingface/transformers, under src/transformers/models/gpt2.

Open-Source Replications:

OpenGPT-2: A community replication released in August 2019, approximating the original GPT-2. Check repositories like github.com/openai-community/gpt2 or related forks on GitHub.
NanoGPT by Andrej Karpathy: A minimal, educational implementation in PyTorch (~300 lines), not the full GPT-2 but a simplified version: github.com/karpathy/nanoGPT. It’s great for understanding the architecture.

Minimal Implementations:

Projects like github.com/iVishalr/GPT offer clean, minimal PyTorch implementations of GPT-like models, including GPT-2’s architecture.

GPT-2 Architecture Overview

GPT-2 is a decoder-only transformer model with 1.5 billion parameters (in its largest version), trained on WebText (8 million web pages) to predict the next word in a sequence. Key components:

Embedding Layer: Converts input tokens to dense vectors.
Positional Encoding: Adds position information since transformers lack inherent order.
Transformer Blocks: Stacked layers (e.g., 12 for 124M, 48 for 1.5B) with:
Masked Multi-Head Self-Attention: Attends only to previous tokens.
Feed-Forward Neural Network: Processes each token independently.
Layer Normalization: Stabilizes training.
Output Layer: Maps hidden states to vocabulary probabilities.

Simplified Conceptual C++ Implementation

Below is a basic outline of a GPT-2-like model in C++. This isn’t the full 1.5B-parameter model (which would require massive memory and optimized libraries like PyTorch/TensorFlow) but illustrates the core structure. You’d need a linear algebra library (e.g., Eigen) and significant optimization for a real implementation.

#include <vector>
#include <cmath>
#include <iostream>

// Simplified matrix class (in practice, use Eigen or similar)
class Matrix {
public:
    std::vector<std::vector<float>> data;
    int rows, cols;
    Matrix(int r, int c) : rows(r), cols(c) {
        data.resize(r, std::vector<float>(c, 0.0));
    }
    // Add matrix operations (multiplication, addition, etc.) here
};

// Self-Attention (simplified, no multi-head or scaling)
Matrix selfAttention(const Matrix& query, const Matrix& key, const Matrix& value) {
    // Q * K^T (transpose key)
    Matrix attentionScores(query.rows, key.rows);
    for (int i = 0; i < query.rows; i++) {
        for (int j = 0; j < key.rows; j++) {
            float score = 0;
            for (int k = 0; k < query.cols; k++) {
                score += query.data[i][k] * key.data[j][k];
            }
            attentionScores.data[i][j] = score;
        }
    }
    // Apply mask (causal: no future tokens)
    for (int i = 0; i < attentionScores.rows; i++) {
        for (int j = i + 1; j < attentionScores.cols; j++) {
            attentionScores.data[i][j] = -1e9; // Negative infinity
        }
    }
    // Softmax and multiply with value (simplified)
    Matrix output(query.rows, value.cols);
    // Implement softmax and V multiplication here (omitted for brevity)
    return output;
}

class GPT2Layer {
public:
    Matrix Wq, Wk, Wv; // Weights for Q, K, V
    GPT2Layer(int d_model) : Wq(d_model, d_model), Wk(d_model, d_model), Wv(d_model, d_model) {
        // Initialize weights (random or pre-trained)
    }
    Matrix forward(const Matrix& input) {
        Matrix query = input; // Multiply with Wq
        Matrix key = input;   // Multiply with Wk
        Matrix value = input; // Multiply with Wv
        return selfAttention(query, key, value);
    }
};

class GPT2 {
public:
    std::vector<GPT2Layer> layers;
    int vocab_size, d_model;
    GPT2(int num_layers, int vocab_size, int d_model) 
        : vocab_size(vocab_size), d_model(d_model) {
        for (int i = 0; i < num_layers; i++) {
            layers.emplace_back(d_model);
        }
    }
    Matrix forward(const std::vector<int>& tokens) {
        Matrix input(1, d_model); // Simplified embedding
        Matrix hidden = input;
        for (auto& layer : layers) {
            hidden = layer.forward(hidden);
        }
        return hidden; // Add output layer for vocab probs
    }
};

int main() {
    GPT2 model(12, 50257, 768); // 12 layers, GPT-2 vocab size, embedding dim
    std::vector<int> tokens = {1, 2, 3}; // Dummy input
    Matrix output = model.forward(tokens);
    std::cout << "Model ran successfully!\n";
    return 0;
}

Explanation of Simplified Code

Matrix Class: Placeholder for tensor operations (real implementations use CUDA-optimized libraries).
Self-Attention: Computes attention scores with a causal mask (no future tokens), though softmax and full multi-head logic are omitted for brevity.
GPT2Layer: One transformer block with attention (feed-forward and normalization skipped).
GPT2: Stacks layers, processes token embeddings, and outputs hidden states.
Limitations: This lacks training logic, real embeddings, and optimization—full GPT-2 needs billions of parameters and GPU support.

Getting the Real Deal

For the actual GPT-2 source:

Hugging Face: Clone transformers, explore models/gpt2/modeling_gpt2.py for PyTorch details.
OpenAI Repo: Use the TensorFlow code in gpt-2/src/model.py for the original structure.
Training Data: WebText isn’t public, but OpenWebText (a replication) is available via community efforts.

This gives you access to functional GPT-2 code or a starting point to build your own.

Posted on March 18, 2025March 18, 2025 by Robert Baindourov

Subset Sum Problem from Geeks for Geeks


class Solution {
  public:
    bool isSubsetSum(vector<int>& arr, int sum) {
        
        vector<vector<int>> memo(arr.size()+1, vector<int>(sum+1, -1));
        
        
        function<bool(vector<int>&, int, int)> dp = [&](vector<int>& ar, int index, int sum){
            
            
            if( sum == 0 ) return true;
            if( index == 0 ) return false;
            
            if( memo[index][sum] != -1 )
                return memo[index][sum]==1;
            
            bool result = dp(ar, index-1, sum );
            
            
            if( ar[index-1] <= sum  )
                result = result || dp(ar, index-1, sum - ar[index-1]);
            
            memo[index][sum] = result;
            return result ;
        };
        
        return dp(arr, arr.size(), sum);
        
        
    }
};

Posted on March 17, 2025 by Robert Baindourov

2206. Divide Array Into Equal Pairs


class Solution {
public:
    bool divideArray(vector<int>& nums) {
        int ar[501];
        for( int num:nums){
            ar[num]++;
        }
        for( int element : ar ){
            if( element%2 != 0) return false;
        }

        return true;
        
    }
};

Posted on March 17, 2025March 17, 2025 by Robert Baindourov

Apple Interview Question I Failed

Recently interviewed for a front end developer position where I had no less than 6 rounds through Mphesis.
The job required relocation, and a lower salary then I was used to.
The first was a verbal technical screening on how to optimize ReactJS front ends.
The second round was a test with questions coming from the website BFE.dev.
Then I had to meet with managers at least twice who kept asking me if I was sure I was ready to relocate back to Sunnyvale.
Then I met with an Apple employee and went through my experience and talked about various technical challenges.
Last but not least, was a … LEETCODE ROUND.

This is the question I failed:
https://leetcode.com/problems/search-a-2d-matrix/
And below the solution in C++


class Solution {
public:
    bool searchMatrix(vector<vector<int>>& matrix, int target) {
        int rows = matrix.size()-1;
        int cols = matrix[0].size()-1;
        int startRow = rows;
        int startCol = 0;
        while(true){
            if( startRow < 0 || startCol > cols )
                break;
            int currentVal = matrix[startRow][startCol];
            if( currentVal == target)
                return true;            
            else if( currentVal > target )
                startRow--;
            else if( currentVal < target )
                startCol++;
            
        }
        return false;
    }
    
};

There was one other one which I was able to do and that was:
https://leetcode.com/problems/container-with-most-water/


class Solution {
    public int maxArea(int[] height) {
        int i = 0;
        int j = height.length - 1;
        int container = 0;
        int h = 0;
        while(i<j){
            h = Math.min(height[i],height[j]);
            container = Math.max(container,(j-i)*h);
            while(i<j && height[i]<=h)
                i++;
            while(i<j && height[j]<=h)
                j--;
        }
        return container;
    }
}

Posted on March 16, 2025March 16, 2025 by Robert Baindourov

C++ Solution for LeetCode 2594. Minimum Time to Repair Cars



long long repairCars(vector<int>& ranks, int cars) {
    //long long min_r = *min_element(ranks.begin(), ranks.end());
    long long left = 1;
    long long right = ranks[0] * (long long)cars * cars;
    long long result = right; // Initialize with an upper bound

    auto countRepairCars = [&ranks](long long time) {
        long long repaired = 0;
        for (int r : ranks) {
            repaired += (long long)sqrt((double)time / r);
        }
        return repaired;
    };

    while (left <= right) {
        long long midpoint = left + (right - left) / 2;
        long long repaired = countRepairCars(midpoint);
        if (repaired >= cars) {
            result = midpoint; // Found a valid time, try to minimize it
            right = midpoint - 1;
        } else {
            left = midpoint + 1;
        }
    }

    return result;
}

Posted on March 16, 2025 by Robert Baindourov

How was the Manus AI code was Leaked?

It seems likely that Jian Liao, known online as jlia0, asked Manus for its source code and received it, though in an encrypted form.
Research suggests this was part of a leak incident, with discussions around the code’s usability and security.

Background

Manus is an AI agent, often described as a general-purpose tool capable of tasks like research and content creation, developed by a Chinese startup and currently in closed beta with limited invite codes. Its source code is typically not publicly available, making any access notable.

The Incident

Jian Liao, using the username jlia0 on GitHub, obtained an invite code for Manus and reportedly asked the AI to output its own source code, specifically requesting the “/opt/.manus/” directory as a zip file. He received the code, but it was encrypted, limiting its immediate usability. This event sparked discussions on platforms like GitHub about the encryption and potential for reverse-engineering.

Unexpected Detail

While most expected Manus to be a secure, closed system, the ability to extract even encrypted code highlights vulnerabilities in AI agent security, raising questions about prompt injection and system isolation.

Comprehensive Analysis of the Source Code Request Incident

This report delves into the details surrounding the request and acquisition of Manus AI’s source code, focusing on the individual involved, the context of Manus AI, and the implications of the incident. The analysis is based on recent online discussions, GitHub activity, and media coverage as of March 15, 2025.

Context of Manus AI

Manus AI, launched by a Chinese startup, is a general-purpose AI agent designed to perform autonomous tasks such as information retrieval, data processing, content creation, and web automation. It has garnered significant attention, with its Discord channel boasting over 186,000 members and invite codes being resold for high prices on platforms like Xianyu (Manus AI Invitation Code: Application Guide & Success Tips). The system is currently in closed beta, requiring an invite code for access, and is not open source, distinguishing it from projects like DeepSeek, which is an LLM rather than an agent.

Early reviews, such as those from MIT Technology Review (Everyone in AI is talking about Manus. We put it to the test.), describe Manus as promising but imperfect, with capabilities likened to a highly intelligent intern. However, its closed nature and limited access have fueled interest in its underlying technology, leading to replication efforts and security concerns.

The Individual: Jian Liao (jlia0)

Jian Liao, known by the GitHub handle jlia0, is identified as the CTO at Pointer and has been active in AI-related discussions. His GitHub profile (jlia0 (Jian Liao) · GitHub) shows a history of contributions, including a notable gist titled “Manus tools and prompts” (Manus tools and prompts · GitHub). In this gist, published on March 11, 2025, Liao states, “I got invite code of Manus, and ask Manus to output /opt/.manus as zip.” This action resulted in him obtaining the source code, though it was encrypted, as noted in subsequent comments where users discuss the encryption and its implications.

Media reports, such as an article on AIbase (Manus AI System Prompt Leakage: Official Response), confirm that a user named “jian” (likely Jian Liao) “cracked the Manus system” by requesting the directory contents, retrieving “some sensitive information and operational data.” This incident is described as a prompt leak, highlighting potential security flaws in Manus’s sandbox isolation, with the co-founder Ji Yichao noting that the code is lightly obfuscated for command reception.

Details of the Request and Acquisition

Liao’s method involved leveraging Manus AI’s capabilities to output its own internal directory, a technique that exploited the AI’s ability to execute file system operations. The output was a zip file containing the source code, but it was encrypted, likely using tools like PyArmor, as discussed in the gist comments. One comment notes, “A straight forward memory dump -> strings didn’t reveal any manus or pyarmor internals,” indicating the encryption’s robustness (Manus tools and prompts · GitHub).

The encryption limited the code’s usability, with users like @PeterZhao119 questioning how Liao obtained detailed prompts, suggesting skepticism about the leak’s authenticity. However, Liao’s X post (X post) and subsequent discussions, including on Reddit (r/AI_Agents on Reddit: Created an open-source alternative to Manus AI!), reinforce that he did receive the code, albeit in a form requiring further analysis.

Implications and Community Response

The leak sparked significant interest, with open-source alternatives like OpenManus emerging, developed by contributors from MetaGPT (GitHub – mannaandpoem/OpenManus: No fortress, purely open ground. OpenManus is Coming.). OpenManus, launched within three hours, aims to replicate Manus’s functionality without an invite code, but it’s unclear if it directly used Liao’s leaked code. Discussions on GitHub and Reddit highlight efforts to decrypt or reverse-engineer the code, with projects like whit3rabbit/manus-open (GitHub – whit3rabbit/manus-open: Manus code from container) offering AI-generated guesses, noting the code’s potential research value.

Security concerns arose, with articles like “Manus AI’s Agentic Moment: A Case Study in Prompt Leak and Risk Mitigation” on Medium (Manus AI’s Agentic Moment: A Case Study in Prompt Leak and Risk Mitigation | by Xiwei Zhou | Mar, 2025 | Medium) discussing prompt injections and system prompt leakage as risks in generative AI. Manus’s co-founder acknowledged the sandbox’s isolation but noted the code’s light obfuscation, suggesting ongoing efforts to mitigate such vulnerabilities.

Comparative Analysis with Other Leaks

To contextualize, source code leaks are not unique to Manus. High-profile examples include Microsoft’s 37GB leak in 2022 (r/DataHoarder on Reddit: Hackers leak 37GB of Microsoft’s source code (Bing, Cortana and more)), but Manus’s case is distinct due to the method—asking the AI itself rather than a security breach. This highlights a novel vulnerability in AI agents, where user commands can inadvertently expose internal data.

Table: Key Details of the Incident

Aspect	Details
Individual Involved	Jian Liao (jlia0), CTO at Pointer, GitHub user
Method of Acquisition	Asked Manus AI to output “/opt/.manus/” directory as zip, received encrypted code
Date of Incident	Around March 9-11, 2025, based on gist and media reports
Code Usability	Encrypted, likely using PyArmor, limiting immediate use
Community Response	Discussions on encryption, replication efforts (OpenManus, manus-open)
Security Implications	Highlighted prompt leak risks, sandbox isolation concerns

Conclusion

Jian Liao, known as jlia0, is the individual who asked Manus AI for its source code and received it, though in an encrypted form. This incident, occurring around early March 2025, underscores vulnerabilities in AI agent security and has spurred community efforts to replicate and analyze the technology. The encrypted nature of the code and ongoing discussions suggest a complex landscape of accessibility and security in AI development.

Key Citations

Posted on March 12, 2025March 12, 2025 by Robert Baindourov

2529. Maximum Count of Positive Integer and Negative Integer

So the daily leetcode challenge can be naively solved with a loop.
But the best solution in the python category uses a built in method called bisect_left, so here is a binary search implementation of that method in JavaScript and the solution that followed.


const bisectLeft = (array, x, lo = 0, hi = array.length ) => {
    while( lo < hi ){
        const mid = lo + ((hi-lo)>>1);
        if( array[mid] < x )
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
function maximumCount(nums: number[]): number {
    const firstNonNegative = bisectLeft(nums, 0)
    const firstPositive = bisectLeft(nums, 1)
    const negativeCount = firstNonNegative;
    const positiveCount = nums.length - firstPositive; 
    return Math.max( negativeCount, positiveCount );
    
};

Posted on March 12, 2025 by Robert Baindourov

Design of a Real-Time Collaborative Document Editing System (like Google Docs)

Below is a comprehensive design for a cloud-based real-time collaborative document editing system that meets the specified functional and non-functional requirements. The system allows multiple users to edit documents simultaneously, supports offline editing with synchronization, and scales to millions of users while ensuring security, low latency, and efficient storage.

Solution Overview

The system is designed to provide a seamless collaborative editing experience similar to Google Docs, addressing challenges such as real-time synchronization, conflict resolution, offline support, scalability, and security. Here’s how each component is architected to meet the requirements.

Key Components and Design Decisions

1. Data Model & Storage

Document Storage:
Documents are stored using a Conflict-Free Replicated Data Type (CRDT)-based system (e.g., Yjs or Automerge). CRDTs represent the document as a sequence of operations (e.g., insert, delete, update) rather than plain text, enabling efficient merging of concurrent edits. This approach ensures that multiple users can edit the same document without conflicts.
Versioning Storage:
An event log records every operation applied to the document, allowing reconstruction of any previous version by replaying the log. To optimize performance and avoid replaying the entire log for older versions, periodic snapshots of the document state are saved (e.g., every 100 operations or at fixed time intervals). This balances storage efficiency with fast version retrieval.
Database Choice:
A NoSQL database (e.g., DynamoDB or MongoDB) is used to persist the CRDT data and event logs, as it supports high write throughput and horizontal scaling better than traditional SQL databases.

Why?
CRDTs natively handle concurrent edits, and the event log with snapshots minimizes storage overhead while enabling efficient versioning.

2. Concurrency & Conflict Resolution

Technique:
Use operation-based CRDTs (e.g., Logoot or Treedoc) to manage concurrent edits. Each edit is an operation with a unique identifier and timestamp, ensuring that operations can be applied in any order and still converge to the same document state.
Handling Same-Word Edits:
If two users edit the same word simultaneously (e.g., User A inserts “x” at position 5, and User B deletes character 5), the CRDT assigns unique identifiers to each operation based on user ID and timestamp. The merge function ensures both changes are preserved (e.g., “x” is inserted, and the deletion shifts accordingly), avoiding data loss.

Why?
CRDTs simplify conflict resolution compared to Operational Transformations (OT) and eliminate the need for locking, providing a seamless user experience under high concurrency.

3. Real-Time Synchronization

Communication Protocol:
Use WebSockets for real-time, bidirectional communication between clients and the server. When a user makes an edit, the operation is sent to the server via WebSocket, which broadcasts it to all connected clients editing the same document. Clients apply the operation to their local CRDT state.
Event Propagation:
The server acts as a central coordinator, receiving operations from clients and pushing them to others in near real-time (targeting sub-100ms latency). WebSocket connections are maintained for each active user per document.

Why?
WebSockets offer low-latency, full-duplex communication, making them ideal for real-time updates compared to Server-Sent Events (unidirectional) or gRPC (more complex for this use case).

4. Offline Editing & Sync

Offline Editing:
When a user goes offline, their edits are stored locally in the browser using IndexedDB as a queue of operations. The client continues to apply these operations to its local CRDT state, allowing uninterrupted editing.
Synchronization:
Upon reconnection, the queued operations are sent to the server. The server merges them with the current document state using the CRDT merge function, resolving conflicts automatically. If the merge result is ambiguous (e.g., significant divergence), users can optionally review changes via a manual conflict resolution interface.

Why?
Local storage ensures offline functionality, and CRDTs handle conflict resolution naturally, minimizing data loss during sync.

5. Scalability & Performance

Sharding:
Documents are sharded across multiple servers based on document ID, distributing the load and enabling horizontal scaling. Each shard manages a subset of documents and their associated event logs.
Event-Driven Architecture:
A message broker (e.g., Kafka or RabbitMQ) handles operation propagation. When an edit occurs, the operation is published to the broker, and relevant servers and clients consume it. This decouples the system, improving scalability and fault tolerance.
Caching:
Frequently accessed documents and their current states are cached in memory (e.g., using Redis) to reduce database load and ensure fast reads.
Performance Optimization:
Sub-100ms latency is achieved through WebSockets, in-memory caching, and efficient CRDT operations. Periodic snapshots reduce computation for version reconstruction.

Why?
Sharding and an event-driven approach scale the system to millions of users, while caching and snapshots ensure high performance even with large documents and frequent edits.

6. Security & Access Control

Authentication:
Users are authenticated using OAuth2 or JWT, ensuring only authorized individuals can access the system.
Authorization:
Implement Role-Based Access Control (RBAC) with roles such as:
Owner: Can edit, share, and delete the document.
Editor: Can edit the document.
Viewer: Can only view the document.
Role assignments are stored in a centralized database and checked for every operation (read, write, delete).
Encryption:
In Transit: Use TLS to secure all WebSocket and HTTP communications.
At Rest: Encrypt sensitive document data in the database using AES-256.

Why?
RBAC ensures fine-grained permissions, and encryption protects data confidentiality, meeting security requirements.

7. Versioning & Undo Feature

Document History:
The event log stores all operations (delta changes) applied to the document, enabling reconstruction of any version by replaying operations from the start or the nearest snapshot. Snapshots are taken periodically to optimize retrieval.
Undo Feature:
Clients maintain a local stack of recent operations. An undo reverses the last operation (e.g., deleting an inserted character) and sends the reversal to the server, which propagates it to other clients.

Why?
Delta changes are storage-efficient, and snapshots improve performance. The local undo stack provides a responsive user experience.

Final Architecture

Clients: Web browsers or mobile apps connect via WebSockets for real-time collaboration and use IndexedDB for offline edits.
API Gateway: Authenticates users, enforces RBAC, and routes requests to services.
Document Service: Manages document CRUD operations, coordinates real-time updates, and applies CRDT logic.
CRDT Engine: Merges operations and maintains document consistency.
Event Log Database: Persists operations and snapshots (e.g., DynamoDB).
Message Broker: Distributes operations across servers and clients (e.g., Kafka).
Caching Layer: Stores document states in memory (e.g., Redis).
Storage Layer: Holds encrypted document data and metadata.

Meeting Requirements

Functional Requirements

CRUD Operations: Supported via the document service.
Concurrent Editing: Enabled by CRDTs and WebSockets.
Conflict Resolution: Handled automatically by operation-based CRDTs.
Version History: Provided by the event log and snapshots.
Offline Editing: Supported with local storage and sync via CRDTs.
Speed: Optimized with caching and efficient CRDT operations.

Non-Functional Requirements

Low Latency: Sub-100ms updates via WebSockets and caching.
Scalability: Achieved with sharding and event-driven architecture.
Fault Tolerance: Message broker and geo-redundant storage ensure minimal data loss.
Security: RBAC, TLS, and encryption protect access and data.
High Availability: Sharding and redundancy target 99.99% uptime.
Efficient Storage: Delta changes and periodic snapshots minimize duplication.

Conclusion

This design delivers a robust, scalable, and secure real-time collaborative document editing system. By leveraging CRDTs for conflict-free editing, WebSockets for low-latency synchronization, and an event-driven architecture for scalability, it meets all specified requirements while providing a user experience comparable to Google Docs.

Posted on March 11, 2025March 11, 2025 by Robert Baindourov

Design a Real-Time Collaborative Document Editing System (like Google Docs)

Problem Statement:

You need to design a cloud-based real-time collaborative document editor that allows multiple users to edit the same document simultaneously. The system should support real-time updates, conflict resolution, and offline editing.

Requirements:

Functional Requirements:

Users can create, edit, and delete documents.
Multiple users can edit the same document concurrently and see real-time changes.
The system should handle conflicting edits and merge them efficiently.
Users should be able to view document history and revert to previous versions.
Users should be able to edit documents offline, and changes should sync once they’re back online.
The system should be fast, even for large documents with thousands of edits per second.

Non-Functional Requirements:

Low latency (sub-100ms updates for real-time collaboration).
Scalability: The system should support millions of users simultaneously.
Fault tolerance: Ensure minimal data loss even if servers crash.
Security: Handle role-based access control (RBAC) for documents (read-only, edit, admin).
High availability: 99.99% uptime with geo-redundancy.
Efficient storage: Maintain versions without excessive data duplication.

What I Expect –

A quick architecture diagram on – https://excalidraw.com/ outlining major blocks
Any database design / schema.

Discussion Points:

Data Model & Storage
- How will you store documents? (SQL vs NoSQL, CRDT-based storage, event logs)
- How do you efficiently store document history without excessive duplication?
Concurrency & Conflict Resolution
- Which technique would you use to merge concurrent edits? (CRDTs, Operational Transformations, or custom locking mechanisms)
- How do you handle two users making changes to the same word at the same time?
Real-Time Synchronization
- How would you design the event propagation system for real-time updates?
- Would you use WebSockets, Server-Sent Events (SSE), or gRPC?
Offline Editing & Sync
- How do you handle offline users editing a document and later syncing changes?
- How would you resolve conflicts when an offline user makes a conflicting edit?
Scalability & Performance
- How do you scale the system to handle millions of documents and users?
- Would you use sharding, leader-based consensus, or event-driven architecture?
- How would you ensure that real-time updates are fast and efficient?
Security & Access Control
- How do you prevent unauthorized users from modifying a document?
- How do you implement role-based access control (RBAC)?
Versioning & Undo Feature
- How do you store document history efficiently?
- Would you store delta changes or full snapshots?

Key Points

It seems likely that a real-time collaborative document editing system can be designed using ShareJS for real-time updates and MongoDB for storage, supporting features like concurrent editing and offline syncing.
Research suggests that Operational Transformation (OT) is effective for handling conflicts, while Conflict-Free Replicated Data Types (CRDTs) like Automerge could simplify offline editing but may need additional conflict resolution for text.
The evidence leans toward using sharding and a pub-sub system like Redis for scalability, ensuring low latency and high availability for millions of users.

System Overview

This design aims to create a cloud-based real-time collaborative document editor, similar to Google Docs, that supports multiple users editing simultaneously, handles conflicts, and allows offline editing with sync capabilities. The system will be fast, scalable, and secure, meeting all functional and non-functional requirements.

Architecture

The system architecture includes:

Clients: Web browsers using WebSockets for real-time communication.
Load Balancer: Distributes connections to appropriate shards based on document ID for scalability.
Shards: Each shard handles a subset of documents, with multiple server instances running ShareJS for real-time collaboration.
Database: A MongoDB cluster stores document data, with replication for fault tolerance.
Pub-Sub System: Redis facilitates real-time updates within shards, ensuring low latency.

For offline editing, clients queue local operations and sync them when reconnected, leveraging ShareJS’s OT for conflict resolution.

Database Design

The database schema includes:

documents collection: Stores current document state and version.
permissions collection: Manages role-based access control (RBAC) for users.
operations collection: Logs all operations for versioning and undo functionality.

This design ensures efficient storage and quick access to document history without excessive duplication.

Detailed System Design and Analysis

This section provides a comprehensive analysis of designing a real-time collaborative document editing system, addressing all requirements and discussion points. The design leverages established technologies and methodologies to ensure scalability, performance, and user experience.

System Requirements and Design Goals

The system must support:

Functional Requirements: Creation, editing, and deletion of documents; real-time concurrent editing; conflict resolution; document history and versioning; offline editing with sync; high performance for large documents.
Non-Functional Requirements: Low latency (sub-100ms updates), scalability for millions of users, fault tolerance, security with RBAC, high availability (99.99% uptime), and efficient storage.

The design aims to balance these requirements using a combination of Operational Transformation (OT) for real-time collaboration and considerations for offline editing, ensuring a robust and scalable solution.

Data Model and Storage

Storage Strategy

The system uses ShareJS, which implements OT, for real-time collaboration. Documents are stored in a MongoDB cluster for scalability and fault tolerance. The storage strategy involves:

Document State: Stored as JSON objects in the documents collection, with each document having a type (e.g., “text”) and current data.
Operation History: Maintained in the operations collection for versioning and undo, logging each operation with details like document ID, operation data, timestamp, and user ID.
Snapshots: Considered for efficiency, where periodic snapshots of the document state are stored to reduce the need to replay long operation logs for historical versions.

Database Schema

The database design is as follows:

Collection	Fields	Description
documents	_id (string), type (string), data (object), v (integer)	Stores current document state and version number
permissions	_id (string), users (array of objects with user_id and role)	Manages RBAC for each document
operations	_id (string), document_id (string), operation_data (object), timestamp (date), user_id (string)	Logs all operations for versioning and undo

This schema ensures efficient storage and retrieval, with indexing on document_id for quick access to operations and permissions.

SQL vs. NoSQL

MongoDB was chosen over SQL due to its flexibility with JSON-like documents and scalability for handling large volumes of concurrent writes and reads, essential for real-time collaboration.

CRDT-Based Storage

Initially, CRDTs like Automerge were considered for their offline-first capabilities and conflict-free merging. However, for real-time text editing, OT was preferred due to better handling of concurrent edits without manual conflict resolution, which Automerge might require for text overlaps. Automerge remains a viable option for offline editing, but the design leans toward OT for consistency.

Concurrency and Conflict Resolution

Technique Selection

Operational Transformation (OT): Chosen for real-time collaboration, as implemented by ShareJS. OT transforms concurrent operations to maintain document consistency, ensuring that when two users edit the same word simultaneously, the server adjusts operations to merge changes seamlessly.
Conflict Resolution: OT handles conflicts by transforming operations based on their order and position, preserving all changes without loss. For example, if User A inserts text at position 5 and User B deletes at position 5 concurrently, OT adjusts the operations to apply both changes correctly.

Comparison with CRDTs

CRDTs were evaluated, particularly Automerge, for their decentralized merging capabilities. However, for text editing, CRDTs might preserve conflicting edits (e.g., both insertions at the same position), requiring application-level resolution, which could disrupt real-time flow. OT’s centralized approach ensures a single consistent view, making it more suitable.

Real-Time Synchronization

Event Propagation System

WebSockets: Used for bidirectional communication, enabling clients to send edits and receive updates in real-time. Each document has a unique channel in the pub-sub system (Redis), ensuring updates are broadcast to all connected users.
Pub-Sub Implementation: Redis facilitates efficient message passing within each shard, with server instances subscribing to document channels to propagate changes, achieving sub-100ms latency.

Technology Choice

WebSockets were preferred over Server-Sent Events (SSE) or gRPC due to their bidirectional nature, essential for real-time collaboration. gRPC could be considered for high-performance backend communication, but WebSockets align better with browser-based clients.

Offline Editing and Sync

Handling Offline Users

When offline, clients queue local operations using ShareJS’s client-side capabilities, storing them locally. Upon reconnection, these operations are sent to the server.
The server applies these operations, transforming them based on the current document state to handle any intervening changes, ensuring consistency.

Conflict Resolution for Offline Edits

The server uses OT to merge offline operations with the current state. If conflicts arise (e.g., offline user edited the same part as online users), OT transforms the operations to resolve them, maintaining document integrity.
This approach ensures that offline edits are not lost and are seamlessly integrated, with the server broadcasting the updated state to all connected clients.

Scalability and Performance

Scaling Strategy

Sharding: Documents are distributed across multiple shards based on document ID, with each shard handling a subset. This distributes load and ensures scalability for millions of users and documents.
Leader-Based Consensus: Each shard has a primary server instance for document updates, with secondary instances for failover, ensuring consistency and availability.
Event-Driven Architecture: The pub-sub system (Redis) enables event-driven updates, reducing server load by broadcasting changes efficiently.

Ensuring Fast Updates

Low latency is achieved by routing users to the nearest data center (geo-redundancy) and using WebSockets for real-time communication. Redis’s in-memory data structure ensures quick message passing, meeting the sub-100ms requirement.
For large documents, ShareJS’s OT implementation is optimized for frequent updates, with periodic snapshots reducing the need for full operation replays.

Security and Access Control

Preventing Unauthorized Access

All communications are encrypted using HTTPS for web traffic and secure WebSockets, ensuring data privacy.
Authentication is handled through an identity provider, with user sessions validated before allowing operations.

Implementing RBAC

The permissions collection stores user roles (read-only, edit, admin) for each document. Before applying operations, the server checks the user’s role, denying unauthorized actions. This ensures fine-grained access control, meeting security requirements.

Versioning and Undo Feature

Efficient Storage of History

Document history is maintained in the operations collection, logging each edit with timestamp and user ID. This allows replaying operations to reconstruct any version, supporting undo functionality.
To optimize storage, periodic snapshots are stored in the documents collection, reducing the need to process long operation logs for historical access.

Delta Changes vs. Full Snapshots

The system uses delta changes (operations) for real-time updates, stored in the operation log. Full snapshots are taken at intervals (e.g., every 1000 operations) to balance storage efficiency and quick access, ensuring users can revert to previous versions without excessive computation.

Unexpected Detail: Hybrid Approach Consideration

While OT is central to real-time collaboration, the initial exploration of CRDTs like Automerge highlights a potential hybrid approach for offline editing, where CRDTs could simplify syncing but require additional conflict resolution for text. This dual consideration adds flexibility but increases complexity, which was ultimately resolved by favoring OT for consistency.

Conclusion

This design leverages ShareJS for OT-based real-time collaboration, MongoDB for scalable storage, and Redis for efficient pub-sub, ensuring low latency, high availability, and support for offline editing. The sharding mechanism and RBAC implementation meet scalability and security needs, with operation logs and snapshots providing robust versioning and undo features.

Key Citations

Posted on March 7, 2025March 12, 2025 by Robert Baindourov

Plugins to install into vscode.

Here’s a collection of essential VS Code extensions for a lead front-end developer working with React and TypeScript:

Core TypeScript & React Extensions

ESLint – For code quality and style enforcement
Prettier – For consistent code formatting
TypeScript – Official extension with IntelliSense
ES7+ React/Redux/React-Native snippets – Provides shortcuts for common React patterns

Developer Experience

IntelliCode – AI-assisted development with context-aware completions
GitLens – Supercharged Git capabilities within editor
Import Cost – Shows the size of imported packages
Error Lens – Improves error visibility in the editor
Auto Rename Tag – Automatically renames paired HTML/JSX tags

Debugging & Testing

Jest – For running and debugging tests
Debugger for Chrome/Edge – Debugging React apps in browser
Redux DevTools – If using Redux for state management

Performance & Code Quality

Lighthouse – For performance auditing
SonarLint – Detect and fix quality issues
Better Comments – Improve comments with alerts, TODOs, etc.

Component Development

vscode-styled-components – Syntax highlighting for styled-components
Tailwind CSS IntelliSense – If using Tailwind
Material Icon Theme – Better file/folder icons

Architecture & Documentation

Draw.io Integration – For creating architecture diagrams
Todo Tree – Track TODO comments in your codebase
Path Intellisense – Autocompletes filenames in imports

Team Collaboration

Live Share – Real-time collaborative editing
CodeStream – Code discussions and reviews inside VS Code

These extensions cover the technical aspects of React and TypeScript development while also addressing the leadership aspects of documentation, collaboration, and maintainability that are important for a lead developer role.