chipiron.scripts.generate_datasets package

Submodules

chipiron.scripts.generate_datasets.generate_boards module

chipiron.scripts.generate_datasets.generate_boards.decompress_zst(zst_path: Path, output_pgn_path: Path) → None

Decompress a .zst archive to a .pgn file using the zstd command-line tool, or a Python fallback.
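Example (sketch). The "command line or Python fallback" strategy can be illustrated as below. This is not the module's actual implementation: the helper names are hypothetical, and the fallback assumes the third-party zstandard package, which the docstring does not name.

```python
import shutil
import subprocess
from pathlib import Path


def build_zstd_command(zst_path: Path, output_pgn_path: Path) -> list[str]:
    """Return a zstd CLI call: decompress (-d), overwrite (-f), write to -o."""
    return ["zstd", "-d", "-f", str(zst_path), "-o", str(output_pgn_path)]


def decompress_zst_sketch(zst_path: Path, output_pgn_path: Path) -> None:
    """Prefer the zstd binary when it is on PATH, else decompress in Python."""
    if shutil.which("zstd") is not None:
        subprocess.run(build_zstd_command(zst_path, output_pgn_path), check=True)
        return
    # Assumed fallback: the `zstandard` package, imported lazily so that the
    # CLI path works even when the package is not installed.
    import zstandard

    with open(zst_path, "rb") as src, open(output_pgn_path, "wb") as dst:
        # copy_stream decompresses chunk by chunk instead of loading the file
        zstandard.ZstdDecompressor().copy_stream(src, dst)
```

Trying the binary first avoids an extra Python dependency on machines where zstd is already installed.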

chipiron.scripts.generate_datasets.generate_boards.download_month_zst(month: str, dest_dir: Path) → Path

Download the compressed monthly PGN (.zst) file for a given month into dest_dir and return its path.
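Example (sketch). A monthly download could look roughly like this. The URL pattern is an assumption based on the public Lichess database layout, and the helper names are hypothetical; the real function may build its URL and file name differently.

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed URL pattern of the public Lichess standard-rated dumps.
LICHESS_URL = (
    "https://database.lichess.org/standard/"
    "lichess_db_standard_rated_{month}.pgn.zst"
)


def month_url(month: str) -> str:
    """Build the download URL for a month given as YYYY-MM."""
    return LICHESS_URL.format(month=month)


def download_month_zst_sketch(month: str, dest_dir: Path) -> Path:
    """Download one monthly archive into dest_dir and return its path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    url = month_url(month)
    zst_path = dest_dir / url.rsplit("/", 1)[-1]  # keep the remote file name
    # Stream to disk so a large archive is never held in memory at once
    with urllib.request.urlopen(url) as response, open(zst_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return zst_path
```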

chipiron.scripts.generate_datasets.generate_boards.ensure_month_pgn(month: str, dest_dir: Path, keep_decompressed: bool = False) → Path

Ensure the decompressed monthly PGN exists, downloading and decompressing it if needed, and return the path to the .pgn file.
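Example (sketch). The "use the cached file, otherwise download and decompress" pattern can be written as follows. The download and decompress steps are passed in as callables so the sketch stays self-contained; the file name is hypothetical, and keep_decompressed is left out because its exact effect is not documented here.

```python
from pathlib import Path
from typing import Callable


def ensure_pgn_sketch(
    month: str,
    dest_dir: Path,
    download: Callable[[str, Path], Path],
    decompress: Callable[[Path, Path], None],
) -> Path:
    """Return the month's .pgn path, creating the file only when missing."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    pgn_path = dest_dir / f"{month}.pgn"  # hypothetical file name
    if pgn_path.exists():
        return pgn_path  # already on disk, skip the network entirely
    zst_path = download(month, dest_dir)
    decompress(zst_path, pgn_path)
    return pgn_path
```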

chipiron.scripts.generate_datasets.generate_boards.generate_board_dataset_multi_months(output_file_path: str, max_boards: int = 10000000, sampling_frequency: int = 50, offset_min: int = 5, seed: int | None = 0, start_month: str = '2015-03', max_months: int | None = None, delete_pgn_after_use: bool = True, intermediate_every_games: int = 10000, dest_dir: Path | None = None) → None

Generate a dataset by streaming through monthly Lichess dumps that are downloaded on the fly.

Stops when max_boards positions have been collected or the month limit (max_months) is reached. Each month's PGN is deleted after use when delete_pgn_after_use is True.
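Example (sketch). The two stopping conditions can be illustrated with a small loop. The function name and the boards_for_month callable are hypothetical stand-ins for the real download and parsing steps, and max_boards is assumed to be positive.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, Optional


def collect_boards_sketch(
    months: Iterable[str],
    boards_for_month: Callable[[str], Iterator[str]],
    max_boards: int,
    max_months: Optional[int] = None,
) -> tuple[list[str], list[str]]:
    """Collect boards month by month until one of the limits is hit."""
    boards: list[str] = []
    months_used: list[str] = []
    # islice with a stop of None imposes no month limit
    for month in islice(months, max_months):
        months_used.append(month)
        for board in boards_for_month(month):
            boards.append(board)
            if len(boards) >= max_boards:
                # Stop immediately so no further month is fetched
                return boards, months_used
    return boards, months_used
```

Checking the board limit inside the inner loop means the next month is never requested once the target has been reached.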

chipiron.scripts.generate_datasets.generate_boards.iterate_months(start_month: str) → Generator[str, None, None]

Yield month strings (YYYY-MM), starting at start_month and incrementing by one month indefinitely.
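Example (sketch). An unbounded month generator of this kind can be written as follows. The function name is hypothetical and the body is an illustration of the documented behaviour, not the module's source.

```python
from collections.abc import Iterator
from itertools import islice


def iterate_months_sketch(start_month: str) -> Iterator[str]:
    """Yield YYYY-MM strings forever, starting at start_month."""
    year, month = (int(part) for part in start_month.split("-"))
    while True:
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:  # roll over from December to January
            year, month = year + 1, 1


# The generator never terminates, so callers must bound it themselves.
print(list(islice(iterate_months_sketch("2015-11"), 4)))
# → ['2015-11', '2015-12', '2016-01', '2016-02']
```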

chipiron.scripts.generate_datasets.generate_boards.process_game(game: GameNode, total_count_move: int, the_dic: list[dict[str, str]], sampling_frequency: int, offset_min: int = 5) → int

Process a single game and extract positions (no eval requirement).
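Example (sketch). The exact sampling rule is not documented here. One plausible reading of sampling_frequency and offset_min, shown purely as an assumption, is to skip the first offset_min plies of each game and then keep a position whenever a running move counter hits a multiple of sampling_frequency. Positions are plain strings and the "fen" key is hypothetical.

```python
def sample_positions_sketch(
    positions: list[str],
    total_count_move: int,
    out: list[dict[str, str]],
    sampling_frequency: int,
    offset_min: int = 5,
) -> int:
    """Append sampled positions to out and return the updated move count."""
    for ply, fen in enumerate(positions):
        total_count_move += 1
        # Assumed rule: ignore opening plies, then keep every
        # sampling_frequency-th move of the global counter.
        if ply >= offset_min and total_count_move % sampling_frequency == 0:
            out.append({"fen": fen})
    return total_count_move
```

Counting moves across games, rather than per game, spreads the samples over different game phases instead of always picking the same ply.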

chipiron.scripts.generate_datasets.generate_boards.save_dataset_progress(the_dic: list[dict[str, str]], output_file_path: str, count_game: int, total_count_move: int, max_boards: int, total_games_in_file: int | None, total_moves_in_file: int | None, input_pgn_file_path: str, sampling_frequency: int, offset_min: int, seed: int | None, is_final: bool = False, months_used: list[str] | None = None) → int

Save dataset progress (intermediate or final) and display statistics.

Parameters:
  • the_dic – Current list of board positions

  • output_file_path – Path where the pickle file is saved

  • count_game – Number of games processed so far

  • total_count_move – Number of moves processed so far

  • max_boards – Maximum target board positions

  • total_games_in_file – Total games in source file (optional)

  • total_moves_in_file – Total moves in source file (optional)

  • input_pgn_file_path – Source PGN file path

  • sampling_frequency – Sampling frequency for moves

  • offset_min – Minimum offset for sampling

  • seed – Random seed used

  • is_final – Whether this is the final save (adds additional metadata)

  • months_used – List of months processed (for dynamic mode)

Returns:

Number of board positions recorded so far
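Example (sketch). Saving intermediate progress to a pickle file might look like the following. The payload layout ("boards" and "metadata" keys) and the write-then-rename step are assumptions made for illustration; the real function takes more parameters and also prints statistics.

```python
import pickle
from pathlib import Path
from typing import Any


def save_progress_sketch(
    boards: list[dict[str, str]],
    output_file_path: str,
    metadata: dict[str, Any],
    is_final: bool = False,
) -> int:
    """Pickle the boards with their metadata and return the board count."""
    payload = {"boards": boards, "metadata": dict(metadata, is_final=is_final)}
    tmp_path = Path(output_file_path + ".tmp")
    with open(tmp_path, "wb") as handle:
        pickle.dump(payload, handle)
    # Rename only after a complete write, so an interrupted run does not
    # leave a truncated pickle at the final path.
    tmp_path.replace(output_file_path)
    return len(boards)
```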

chipiron.scripts.generate_datasets.generate_labelled_boards module

chipiron.scripts.generate_datasets.generate_labelled_boards.empty() → None

chipiron.scripts.generate_datasets.generate_over_boards module

Generate a dataset of game-over chess board positions from Lichess PGN dumps.

chipiron.scripts.generate_datasets.generate_over_boards.generate_over_boards_dataset_legacy(input_pgn_file_path: str, output_file_path: str, max_boards: int = 1000000, total_games_in_file: int | None = None, total_moves_in_file: int | None = None, intermediate_every_games: int = 10000) → None

Generate a dataset of game-ending chess board positions from a single PGN file. Legacy function for backwards compatibility.

chipiron.scripts.generate_datasets.generate_over_boards.generate_over_boards_dataset_multi_months(output_file_path: str, max_boards: int = 1000000, seed: int | None = 0, start_month: str = '2015-03', max_months: int | None = None, delete_pgn_after_use: bool = True, intermediate_every_games: int = 10000, dest_dir: Path | None = None) → None

Generate a dataset of game-over boards by streaming through monthly Lichess dumps that are downloaded on the fly.

Stops when max_boards positions have been collected or the month limit (max_months) is reached. Each month's PGN is deleted after use when delete_pgn_after_use is True.

chipiron.scripts.generate_datasets.generate_over_boards.is_game_over_position(board: Board) → bool

Check if a board position represents a game-ending situation. Only considers checkmate, stalemate, and insufficient material. Since we only check final positions, this will only be called on game endings.

Parameters:

board – The chess board to check

Returns:

True if the position represents a game-ending situation
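Example (sketch). Assuming Board is the python-chess chess.Board class (the signature does not say so explicitly), the three documented conditions map onto existing python-chess predicates. The function name below is hypothetical.

```python
import chess


def is_game_over_sketch(board: chess.Board) -> bool:
    """True for checkmate, stalemate, or insufficient material only."""
    return (
        board.is_checkmate()
        or board.is_stalemate()
        or board.is_insufficient_material()
    )


# Fool's mate: the shortest possible checkmate.
mated = chess.Board()
for san in ["f3", "e5", "g4", "Qh4#"]:
    mated.push_san(san)

# Two bare kings: neither side can ever deliver mate.
bare_kings = chess.Board("8/8/8/4k3/8/8/4K3/8 w - - 0 1")
```

This is narrower than python-chess's own board.is_game_over(), which also reports the seventy-five-move rule and fivefold repetition.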

chipiron.scripts.generate_datasets.generate_over_boards.process_game_for_over_positions(game: Game, total_count_move: int, the_dic: list[dict[str, Any]]) → int

Process a single game and extract only the final game-ending board position. Since game-over situations only happen at the end, we only need to check the final position.

Parameters:
  • game – The chess game to process

  • total_count_move – Current total move count across all games

  • the_dic – List to append board positions to

Returns:

Updated total_count_move after processing this game
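Example (sketch). Assuming Game is chess.pgn.Game from python-chess, the final position of the main line can be reached with game.end(). The function name, the "fen" key, and the choice to advance the counter by the game's ply count are all assumptions for illustration.

```python
import io
from typing import Any

import chess.pgn


def final_position_sketch(
    game: chess.pgn.Game,
    total_count_move: int,
    out: list[dict[str, Any]],
) -> int:
    """Record the last position if it ends the game; return the new count."""
    end_node = game.end()  # last node of the main line
    board = end_node.board()
    if (
        board.is_checkmate()
        or board.is_stalemate()
        or board.is_insufficient_material()
    ):
        out.append({"fen": board.fen()})
    # Assumed bookkeeping: count every half-move of the game
    return total_count_move + end_node.ply()


mate_game = chess.pgn.read_game(io.StringIO("1. f3 e5 2. g4 Qh4# 0-1"))
open_game = chess.pgn.read_game(io.StringIO("1. e4 e5 *"))
```

Games that end by resignation or on time leave an ordinary position on the board, so under this check they contribute nothing to the dataset.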

Module contents