3
0 Comments

SPL assists MongoDB: Grouping adjacent documents

The data of a collection (named states) in the MongoDB database is as follows:

[
  {order: 1, state: 'one'},
  {order: 2, state: 'one'},
  {order: 3, state: 'one'},
  {order: 4, state: 'two'},
  {order: 5, state: 'two'},
  {order: 6, state: 'one'},
  {order: 7, state: 'two'},
  {order: 8, state: 'three'},
  {order: 9, state: 'three'}
]

Requirement: Group by adjacent states.
The expected results are as follows:

[
  [
    {order: 1, state: 'one'},
    {order: 2, state: 'one'},
    {order: 3, state: 'one'}
  ],
  [
    {order: 4, state: 'two'},
    {order: 5, state: 'two'}
  ],
  [
    {order: 6, state: 'one'}
  ],
  [
    {order: 7, state: 'two'}
  ],
  [
    {order: 8, state: 'three'},
    {order: 9, state: 'three'}
  ]
]

In MongoDB, it is difficult to directly use query statements to implement the function of grouping adjacent states, because MongoDB’s query language itself does not support this complex grouping logic. This requirement can be indirectly implemented by combining an Aggregation Pipeline with some custom logic.

Here is a way to write it:

db.states.aggregate([
  {
    $sort: { order: 1 }
  },
  {
    $group: {
      _id: null,
      documents: { $push: "$$ROOT" }
    }
  },
  {
    $project: {
      grouped: {
        $reduce: {
          input: "$documents",
          initialValue: {
            previousState: null,
            groups: []
          },
          in: {
            $let: {
              vars: {
                currentState: "$$this.state",
                lastGroup: { $arrayElemAt: ["$$value.groups", -1] }
              },
              in: {
                $cond: [
                  { $eq: ["$$value.previousState", "$$this.state"] },
                  {
                    previousState: "$$this.state",
                    groups: {
                      $concatArrays: [
                        { $slice: ["$$value.groups", { $subtract: [{ $size: "$$value.groups" }, 1] }] },
                        [
                          {
                            $concatArrays: [
                              "$$lastGroup",
                              ["$$this"]
                            ]
                          }
                        ]
                      ]
                    }
                  },
                  {
                    previousState: "$$this.state",
                    groups: {
                      $concatArrays: [
                        "$$value.groups",
                        [["$$this"]]
                      ]
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      grouped: "$grouped.groups"
    }
  },
  {
    $unwind: "$grouped"
  },
  {
    $project: {
      _id: 0,
      documents: "$grouped"
    }
  },
  {
    $group: {
      _id: null,
      result: { $push: "$documents" }
    }
  },
  {
    $project: {
      _id: 0,
      result: 1
    }
  }
])

The running result is:

{
  result: [
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a427'),
        order: 1,
        state: 'one'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a428'),
        order: 2,
        state: 'one'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a429'),
        order: 3,
        state: 'one'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42a'),
        order: 4,
        state: 'two'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a42b'),
        order: 5,
        state: 'two'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42c'),
        order: 6,
        state: 'one'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42d'),
        order: 7,
        state: 'two'
      }
    ],
    [
      {
        _id: ObjectId('67c65e846d497a00cd02a42e'),
        order: 8,
        state: 'three'
      },
      {
        _id: ObjectId('67c65e846d497a00cd02a42f'),
        order: 9,
        state: 'three'
      }
    ]
  ]
}

SPL provides the function of grouping by adjacent records, and the code is concise and clear:
SPL code

The first three lines are readings, and the calculation code has only one line of A4. A5 returns the results to the json format, and the output results are identical to the original requirements.


esProc SPL is FREE to download: esProc SPL FREE Download

on March 27, 2025
Trending on Indie Hackers
This Week in AI: The Gap Is Getting Clearer User Avatar 45 comments 1 small portfolio change got me 10x more impressions User Avatar 28 comments AI Is Destroying the Traditional Music Business and Here’s Why. User Avatar 22 comments Fixing my sleep using public humiliation and giving away a Kindle User Avatar 16 comments A Tiny Side Project That Just Crossed 100 Users — And Somehow Feels Even More Real Now User Avatar 13 comments From 1k to 12k visits: all it took was one move. User Avatar 11 comments