chipiron.scripts.generate_datasets package
Submodules
chipiron.scripts.generate_datasets.generate_boards module
- chipiron.scripts.generate_datasets.generate_boards.decompress_zst(zst_path: Path, output_pgn_path: Path) → None
Decompress a .zst archive to a .pgn file, using the zstd command-line tool when available and a Python fallback otherwise.
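A minimal sketch of the CLI-first strategy, assuming the `zstd` executable may or may not be on PATH and that the Python fallback is the third-party `zstandard` package. Which fallback library chipiron actually uses is not stated here:

```python
import shutil
import subprocess
from pathlib import Path


def decompress_zst(zst_path: Path, output_pgn_path: Path) -> None:
    """Sketch: decompress zst_path into output_pgn_path."""
    zstd_cli = shutil.which("zstd")
    if zstd_cli is not None:
        # -d: decompress, -f: overwrite existing output, -q: quiet, -o: output file
        subprocess.run(
            [zstd_cli, "-d", "-f", "-q", str(zst_path), "-o", str(output_pgn_path)],
            check=True,
        )
        return
    # Fallback (assumption): the third-party `zstandard` package, imported lazily
    # so the CLI path works even when it is not installed.
    import zstandard

    with open(zst_path, "rb") as src, open(output_pgn_path, "wb") as dst:
        # Stream-decompress so multi-GB monthly dumps never sit fully in memory.
        zstandard.ZstdDecompressor().copy_stream(src, dst)
```

Preferring the CLI keeps large monthly dumps fast to decompress, while the lazy import means the Python dependency is only needed on machines without `zstd`.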
- chipiron.scripts.generate_datasets.generate_boards.download_month_zst(month: str, dest_dir: Path) → Path
Download the compressed monthly PGN (.zst) file for a given month into dest_dir and return its path.
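A sketch of the download step, assuming the public Lichess database URL layout `https://database.lichess.org/standard/lichess_db_standard_rated_YYYY-MM.pgn.zst`. The helper `month_zst_url` and the `.part` temporary-file convention are illustrative assumptions, not chipiron's confirmed behavior:

```python
import shutil
import urllib.request
from pathlib import Path

# Assumed URL layout of the public Lichess database (standard rated games).
LICHESS_DB_URL = (
    "https://database.lichess.org/standard/lichess_db_standard_rated_{month}.pgn.zst"
)


def month_zst_url(month: str) -> str:
    """Hypothetical helper: build the download URL for a YYYY-MM month."""
    return LICHESS_DB_URL.format(month=month)


def download_month_zst(month: str, dest_dir: Path) -> Path:
    dest_dir.mkdir(parents=True, exist_ok=True)
    url = month_zst_url(month)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Write to a temporary name first so an interrupted download is never
        # mistaken for a complete archive on the next run.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as fh:
            shutil.copyfileobj(response, fh)
        partial.replace(target)
    return target
```

Because an existing archive is returned as-is, rerunning a partially completed job skips months that were already fetched.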
- chipiron.scripts.generate_datasets.generate_boards.ensure_month_pgn(month: str, dest_dir: Path, keep_decompressed: bool = False) → Path
Ensure the decompressed monthly PGN exists, downloading and decompressing it if needed, and return the path to the .pgn file.
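The check-then-fetch logic can be sketched as follows. `ensure_month_pgn_sketch`, its injected `download`/`decompress` callables, and the `.pgn` file name are hypothetical, and the sketch does not model `keep_decompressed`, whose exact effect is not documented here:

```python
from pathlib import Path
from typing import Callable


def ensure_month_pgn_sketch(
    month: str,
    dest_dir: Path,
    download: Callable[[str, Path], Path],
    decompress: Callable[[Path, Path], None],
) -> Path:
    # Hypothetical naming scheme for the decompressed monthly file.
    pgn_path = dest_dir / f"lichess_db_standard_rated_{month}.pgn"
    if pgn_path.exists():
        return pgn_path  # already decompressed: no network or CPU work needed
    zst_path = download(month, dest_dir)  # fetch the compressed archive
    decompress(zst_path, pgn_path)  # expand it next to the archive
    return pgn_path
```

Injecting the two steps keeps the caching rule testable without network access.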
- chipiron.scripts.generate_datasets.generate_boards.generate_board_dataset_multi_months(output_file_path: str, max_boards: int = 10000000, sampling_frequency: int = 50, offset_min: int = 5, seed: int | None = 0, start_month: str = '2015-03', max_months: int | None = None, delete_pgn_after_use: bool = True, intermediate_every_games: int = 10000, dest_dir: Path | None = None) → None
Generate a board dataset by streaming through monthly Lichess dumps that are downloaded on the fly.
Stops when max_boards positions have been collected or the month limit (max_months) is reached. If delete_pgn_after_use is True, each month's PGN is deleted once it has been processed.
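The stopping rule can be illustrated with a hypothetical helper that consumes months in order and returns as soon as either limit is hit. `collect_boards_across_months` and `boards_for_month` are illustrative names; the latter stands in for the per-month download, parse, and sample step, and PGN deletion would happen after each month's inner loop:

```python
from __future__ import annotations

from itertools import islice
from typing import Callable, Iterable, Iterator


def collect_boards_across_months(
    months: Iterable[str],
    boards_for_month: Callable[[str], Iterator[dict[str, str]]],
    max_boards: int,
    max_months: int | None = None,
) -> tuple[list[dict[str, str]], list[str]]:
    boards: list[dict[str, str]] = []
    months_used: list[str] = []
    # Cap the (possibly infinite) month stream only when a limit is given.
    month_iter = iter(months) if max_months is None else islice(months, max_months)
    for month in month_iter:
        months_used.append(month)
        for board in boards_for_month(month):
            boards.append(board)
            if len(boards) >= max_boards:
                return boards, months_used  # board budget exhausted mid-month
    return boards, months_used  # month limit reached (or months ran out)
```

Checking the board budget inside the inner loop lets the run stop partway through a month instead of downloading the rest of it.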
- chipiron.scripts.generate_datasets.generate_boards.iterate_months(start_month: str) → Generator[str, None, None]
Yield month strings (YYYY-MM) indefinitely, starting at start_month and advancing one month at a time.
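A minimal sketch of such an infinite month counter, assuming start_month is always a well-formed `YYYY-MM` string:

```python
from typing import Generator


def iterate_months(start_month: str) -> Generator[str, None, None]:
    year, month = (int(part) for part in start_month.split("-"))
    while True:
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:  # roll over into January of the next year
            month = 1
            year += 1
```

Since the generator never ends, callers bound it themselves, for example with `itertools.islice` or an explicit month limit.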
- chipiron.scripts.generate_datasets.generate_boards.process_game(game: GameNode, total_count_move: int, the_dic: list[dict[str, str]], sampling_frequency: int, offset_min: int = 5) → int
Process a single game and extract sampled positions; no evaluation is required for a position to be kept.
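How sampling_frequency and offset_min are combined is not specified here. One plausible reading, sketched purely as an assumption, skips the first offset_min plies and then keeps each later position with probability 1 / sampling_frequency. To stay dependency-free, the hypothetical `sample_positions` takes a list of FEN strings instead of a `GameNode`, and the `{"fen": ...}` record shape is also assumed:

```python
from __future__ import annotations

import random


def sample_positions(
    fens: list[str],
    total_count_move: int,
    the_dic: list[dict[str, str]],
    sampling_frequency: int,
    offset_min: int = 5,
    rng: random.Random | None = None,
) -> int:
    if rng is None:
        rng = random.Random()
    for ply, fen in enumerate(fens):
        total_count_move += 1  # every ply counts toward the running total
        if ply < offset_min:
            continue  # skip the opening plies
        # Keep roughly one position in every sampling_frequency.
        if rng.randrange(sampling_frequency) == 0:
            the_dic.append({"fen": fen})
    return total_count_move
```

Passing a seeded `random.Random` makes the sample reproducible, which would match the seed parameter accepted by the dataset generator.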
- chipiron.scripts.generate_datasets.generate_boards.save_dataset_progress(the_dic: list[dict[str, str]], output_file_path: str, count_game: int, total_count_move: int, max_boards: int, total_games_in_file: int | None, total_moves_in_file: int | None, input_pgn_file_path: str, sampling_frequency: int, offset_min: int, seed: int | None, is_final: bool = False, months_used: list[str] | None = None) → int
Save dataset progress (intermediate or final) and display statistics.
- Parameters:
the_dic – Current list of board positions
output_file_path – Path at which to save the pickle file
count_game – Number of games processed so far
total_count_move – Number of moves processed so far
max_boards – Target maximum number of board positions
total_games_in_file – Total games in source file (optional)
total_moves_in_file – Total moves in source file (optional)
input_pgn_file_path – Source PGN file path
sampling_frequency – Sampling frequency for moves
offset_min – Minimum offset for sampling
seed – Random seed used
is_final – Whether this is the final save (adds additional metadata)
months_used – List of months processed (for dynamic mode)
- Returns:
Number of board positions recorded so far
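A sketch of a save step using a write-then-rename pattern, so an interrupted run does not leave a truncated pickle behind. The payload layout (`boards` plus a `metadata` dict) and the helper name `save_progress_sketch` are hypothetical; the real file format is not described here:

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Any


def save_progress_sketch(
    the_dic: list[dict[str, str]],
    output_file_path: str,
    metadata: dict[str, Any],
    is_final: bool = False,
) -> int:
    payload = {"boards": the_dic, "metadata": {**metadata, "is_final": is_final}}
    target = Path(output_file_path)
    tmp = target.with_name(target.name + ".tmp")
    with open(tmp, "wb") as fh:
        pickle.dump(payload, fh)
    # Swap in the new file only after the pickle has been fully written.
    tmp.replace(target)
    kind = "final" if is_final else "intermediate"
    print(f"saved {len(the_dic)} boards ({kind})")
    return len(the_dic)
```

Intermediate saves can then overwrite the same path repeatedly while the previous snapshot stays readable until the new one is complete.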
chipiron.scripts.generate_datasets.generate_labelled_boards module
chipiron.scripts.generate_datasets.generate_over_boards module
Generate chess over boards dataset from Lichess PGN dumps.
- chipiron.scripts.generate_datasets.generate_over_boards.generate_over_boards_dataset_legacy(input_pgn_file_path: str, output_file_path: str, max_boards: int = 1000000, total_games_in_file: int | None = None, total_moves_in_file: int | None = None, intermediate_every_games: int = 10000) → None
Generate a dataset of game-ending chess board positions from a single PGN file. Legacy function for backwards compatibility.
- chipiron.scripts.generate_datasets.generate_over_boards.generate_over_boards_dataset_multi_months(output_file_path: str, max_boards: int = 1000000, seed: int | None = 0, start_month: str = '2015-03', max_months: int | None = None, delete_pgn_after_use: bool = True, intermediate_every_games: int = 10000, dest_dir: Path | None = None) → None
Generate a game-over board dataset by streaming through monthly Lichess dumps that are downloaded on the fly.
Stops when max_boards positions have been collected or the month limit (max_months) is reached. If delete_pgn_after_use is True, each month's PGN is deleted once it has been processed.
- chipiron.scripts.generate_datasets.generate_over_boards.is_game_over_position(board: Board) → bool
Check whether a board position is a game-ending situation. Only checkmate, stalemate, and insufficient material are considered. Since only final positions are checked, this function is called only on game endings.
- Parameters:
board – The chess board to check
- Returns:
True if the position represents a game-ending situation
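Assuming `Board` is python-chess's `chess.Board`, the three documented checks map directly onto built-in predicates. Unlike `board.is_game_over()`, this sketch deliberately ignores the seventy-five-move rule and fivefold repetition:

```python
import chess


def is_game_over_position(board: chess.Board) -> bool:
    # Only the three endings named in the docstring; rule-based draws are ignored.
    return (
        board.is_checkmate()
        or board.is_stalemate()
        or board.is_insufficient_material()
    )
```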
- chipiron.scripts.generate_datasets.generate_over_boards.process_game_for_over_positions(game: Game, total_count_move: int, the_dic: list[dict[str, Any]]) → int
Process a single game and extract only the final game-ending board position. Since game-over situations only happen at the end, we only need to check the final position.
- Parameters:
game – The chess game to process
total_count_move – Current total move count across all games
the_dic – List to append board positions to
- Returns:
Updated total_count_move after processing this game
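A sketch of the final-position extraction using python-chess, assuming `Game` is `chess.pgn.Game`; the `{"fen": ...}` record shape is a hypothetical choice. `game.end()` follows the main line to its last node, so only one board per game is examined:

```python
from __future__ import annotations

from typing import Any

import chess
import chess.pgn


def process_game_for_over_positions(
    game: chess.pgn.Game, total_count_move: int, the_dic: list[dict[str, Any]]
) -> int:
    final_board = game.end().board()  # position after the last main-line move
    total_count_move += sum(1 for _ in game.mainline_moves())
    # Keep the position only if the game ended on the board itself.
    if (
        final_board.is_checkmate()
        or final_board.is_stalemate()
        or final_board.is_insufficient_material()
    ):
        the_dic.append({"fen": final_board.fen()})
    return total_count_move
```

Games decided by resignation or time forfeit still contribute to the move count but add no position.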