@<1523701205467926528:profile|AgitatedDove14> Then it isn't working as intended. To test it, I started the scheduler and set a simple dead man snitch process to run once a day. In the web app (on your site app.cleearml.ml), when looking at the scheduler process in the DevOps section, I could see a configuration file under artifacts, but it was not at all obvious how you'd change it, because it wasn't part of the configuration section; it was just an artifact. I thought that might be because the process was still running, so I aborted and reset the scheduler, as you suggested, but that just cleared the artifact and there was still nothing in the configuration object (see attached).
So I decided to rerun the code that created the scheduler in the first place.
import pyrootutils

root = pyrootutils.setup_root(
    search_from=__file__,
    indicator=[".git", "pyproject.toml"],
    pythonpath=True,
    dotenv=True,
)

from clearml.automation import TaskScheduler
from clearml import Task
from loguru import logger
from src.utils import get_config


@logger.catch
def main():
    logger.add("logs/task_scheduler.log", rotation="1 month", retention="1 year")
    schedule: TaskScheduler = TaskScheduler(sync_frequency_minutes=120)

    # load the project config
    cfg = get_config()
    task_name = cfg["snitch_task_name"]
    queue = cfg["queue"]

    ############################################
    # Connect the snitch to the TaskScheduler. #
    ############################################
    task_list = Task.get_tasks(
        project_name="DevOps",
        task_name=task_name,
        allow_archived=False,
        task_filter={"status": ["completed"], "order_by": ["-last_update"]},
    )

    if len(task_list) > 0:
        task_id = task_list[0].id
        logger.info(f"Found task {task_name} with ID {task_id}")
        schedule.add_task(
            name="Snitch-TaskScheduler",
            schedule_task_id=task_id,
            reuse_task=True,
            queue=queue,
            minute=00,
            hour=7,
            day=1,
        )
    else:
        logger.error(
            f"Task {task_name} not found in ClearML. Make sure to run the snitch.py script first."
        )
        return

    schedule.start_remotely(queue=queue)


if __name__ == "__main__":
    main()
The scheduler process starts, enqueues, and seems to work, but I don't see any config object or artifact this time (see attached). The console shows:
agent.cudnn_version = 0
sdk.storage.cache.default_base_dir = ~/.clearml/cache
sdk.storage.cache.size.min_free_bytes = 10GB
sdk.storage.direct_access.0.url = file://*
sdk.metrics.file_history_size = 100
sdk.metrics.matplotlib_untitled_history_size = 100
sdk.metrics.images.format = JPEG
sdk.metrics.images.quality = 87
sdk.metrics.images.subsampling = 0
sdk.metrics.tensorboard_single_series_per_graph = false
sdk.network.metrics.file_upload_threads = 4
sdk.network.metrics.file_upload_starvation_warning_sec = 120
sdk.network.iteration.max_retries_on_server_error = 5
sdk.network.iteration.retry_backoff_factor_sec = 10
sdk.network.file_upload_retries = 3
sdk.aws.s3.key =
sdk.aws.s3.region = eu-west-1
sdk.aws.boto3.pool_connections = 512
sdk.aws.boto3.max_multipart_concurrency = 16
sdk.log.null_log_propagate = false
sdk.log.task_log_buffer_capacity = 66
sdk.log.disable_urllib3_info = true
sdk.development.task_reuse_time_window_in_hours = 72.0
sdk.development.vcs_repo_detect_async = true
sdk.development.store_uncommitted_code_diff = true
sdk.development.support_stopping = true
sdk.development.default_output_uri =
sdk.development.force_analyze_entire_repo = false
sdk.development.suppress_update_message = false
sdk.development.detect_with_pip_freeze = false
sdk.development.worker.report_period_sec = 2
sdk.development.worker.ping_period_sec = 30
sdk.development.worker.log_stdout = true
sdk.development.worker.report_global_mem_used = false
sdk.development.worker.report_event_flush_threshold = 100
sdk.development.worker.console_cr_flush_period = 10
sdk.apply_environment = false
sdk.apply_files = false
api.version = 1.5
api.verify_certificate = true
api.default_version = 1.5
api.http.max_req_size = 15728640
api.http.retries.total = 240
api.http.retries.connect = 240
api.http.retries.read = 240
api.http.retries.redirect = 240
api.http.retries.status = 240
api.http.retries.backoff_factor = 1.0
api.http.retries.backoff_max = 120.0
api.http.wait_on_maintenance_forever = true
api.http.pool_maxsize = 512
api.http.pool_connections = 512
api.api_server =
api.web_server =
api.files_server =
api.credentials.access_key = S8T2YH1QWZCYNT1KNWP7
api.host =
environment.SOPS_AGE_KEY_FILE = ****
Executing task id [4586de32d5244b76bfd31cc810b2fb48]:
repository = git@github.com:TicketSwap/task-scheduler.git
branch = main
version_num = 7e7f40fe05e453b51207dfe0d735978fa9936634
tag =
docker_cmd =
entry_point = src/main.py
working_dir = .
::: Using Cached environment /home/natephysics/.clearml/venvs-cache/43c3da0e830954e1387980300f6708df.87471c9bd5c92dc7daad6e47efc48aee :::
Using cached repository in "/home/natephysics/.clearml/vcs-cache/task-scheduler.git.90fcaf8f8f73b3fd1d12fd948a2f9c52/task-scheduler.git"
Note: switching to '7e7f40fe05e453b51207dfe0d735978fa9936634'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 7e7f40f feat: :tada: Initial Commit
type: git
url: git@github.com:TicketSwap/task-scheduler.git
branch: HEAD
commit: 7e7f40fe05e453b51207dfe0d735978fa9936634
root: /home/natephysics/.clearml/venvs-builds.8/3.11/task_repository/task-scheduler.git
Applying uncommitted changes
Executing: ('git', 'apply', '--unidiff-zero'): b'<stdin>:7: trailing whitespace.\nclearml==1.14 # MLops platform \nwarning: 1 line adds whitespace errors.\n'
Adding venv into cache: /home/natephysics/.clearml/venvs-builds.8/3.11
2024-01-15 10:51:43
Running task id [4586de32d5244b76bfd31cc810b2fb48]:
[.]$ /home/natephysics/.clearml/venvs-builds.8/3.11/bin/python -u src/main.py
Summary - installed python packages:
pip:
- attrs==23.2.0
- certifi==2023.11.17
- charset-normalizer==3.3.2
- clearml==1.14.0
- Cython==3.0.8
- furl==2.1.3
- idna==3.6
- jsonschema==4.20.0
- jsonschema-specifications==2023.12.1
- loguru==0.7.2
- numpy==1.26.3
- orderedmultidict==1.0.1
- pathlib2==2.3.7.post1
- pillow==10.2.0
- psutil==5.9.7
- PyJWT==2.8.0
- pyparsing==3.1.1
- pyrootutils==1.0.4
- python-dateutil==2.8.2
- python-dotenv==1.0.0
- PyYAML==6.0.1
- referencing==0.32.1
- requests==2.31.0
- rpds-py==0.17.1
- six==1.16.0
- urllib3==2.1.0
Environment setup completed successfully
Starting Task Execution:
ClearML results page:
2024-01-15 09:51:42.520 | INFO | __main__:main:40 - Found task Snitch-TaskScheduler with ID 383d86104e8e44a99bdf9aeabe8296a2
Syncing scheduler
Failed deserializing configuration: the JSON object must be str, bytes or bytearray, not NoneType
Syncing scheduler
2024-01-15 10:51:48
Failed deserializing configuration: the JSON object must be str, bytes or bytearray, not NoneType
Waiting for next run [UTC 2024-01-16 07:00:00], sleeping for 120.00 minutes, until next sync.
I'm not sure why it's even trying to deserialize anything, because I'm just starting a new scheduler. It worked when I ran it locally before, so I assumed it had something to do with start_remotely(), but I also get the error now when I start it locally.
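For reference, the wording of that error is exactly what Python's `json.loads` raises when it is handed `None` instead of a string. My guess, and it is only a guess since I haven't read the ClearML source, is that the scheduler tries to load a stored configuration during sync and gets nothing back on a fresh task. The sketch below only reproduces the message with the standard library; `describe_failure` and `load_state` are hypothetical helpers of mine, not ClearML functions.

```python
import json
from typing import Any, Optional


def describe_failure(raw: Any) -> str:
    """Return the error text json.loads produces for `raw`, or "ok" on success."""
    try:
        json.loads(raw)
    except TypeError as exc:
        return str(exc)
    return "ok"


def load_state(raw: Optional[str]) -> dict:
    """Hypothetical guard: treat a missing (None or empty) blob as empty state."""
    if not raw:
        return {}
    return json.loads(raw)


if __name__ == "__main__":
    # Same text as the "Failed deserializing configuration" line in the console.
    print(describe_failure(None))
    # With a guard, a missing blob becomes an empty dict instead of an error.
    print(load_state(None))
    print(load_state('{"jobs": []}'))
```

If that guess is right, the message would be harmless noise on the first sync of a new scheduler, but I'd still like confirmation that it isn't hiding a real problem.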