Hi TenseOstrich47, the StorageManager does use boto3 for those uploads (so if it's not supported by boto3, it's the same for StorageManager :/ )
Maybe you can use the 'wait_for_upload' and delete the local file after?
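Something along these lines (a minimal sketch; the local path and destination URL are placeholders):
` import os
from clearml import StorageManager  # on older versions: from trains import StorageManager

# block until the upload finishes, then remove the local copy
remote_url = StorageManager.upload_file(
    local_file="/tmp/my_artifact.zip",
    remote_url="s3://my-bucket/artifacts/my_artifact.zip",
    wait_for_upload=True,
)
os.remove("/tmp/my_artifact.zip") `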
The AWS autoscaler isn't related to other tasks; you can think about it as a service running in your Trains system.
and are configured in the auto scale task?
Didn’t get that 😕
So you mean args.lastiter is not the last iteration? Can you try replacing it with task.get_last_iteration()?
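For example (a minimal sketch, assuming the script already called Task.init so we can grab the current task):
` from clearml import Task

# grab the currently running task and ask it for the last reported iteration
task = Task.current_task()
last_iter = task.get_last_iteration()
print(last_iter) `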
This seems to be the same issue as in https://clearml.slack.com/archives/CTK20V944/p1633599511350600
What's the pyjwt version you are using?
Hi IntriguedRat44
If you don't want to send the frameworks' outputs, you can disable those with auto_connect_frameworks=False in your Task.init call.
You can find more options https://github.com/allegroai/trains/blob/master/trains/task.py#L328
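For example (a minimal sketch; project and task names are placeholders, and in recent versions auto_connect_frameworks can also take a dict if you only want to disable specific frameworks):
` from trains import Task  # or: from clearml import Task

# disable all automatic framework output logging for this task
task = Task.init(
    project_name="examples",
    task_name="no framework auto-logging",
    auto_connect_frameworks=False,
) `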
Is this the one from the original (template) task? I can't see the package that raises the error, can you try adding it and re-running? Do you have the imports analysis?
BTW you have both trains and clearml, can you try with clearml only? It should support all the trains imports.
Hi SteepDeer88 ,
You can use https://clear.ml/docs/latest/docs/apps/clearml_task for this, what do you think?
Yep, you're right about the math part. You can add columns from metrics and hyper-params too, but currently we don't have total duration as a column.
Let me check about the duration and what we can do
python invoked oom-killer
Out of memory, CloudySwallow27. In the scaler app task, can you check if you have scalars reporting?
NonchalantDeer14 thanks for the logs, do you maybe have some toy example I can run to reproduce this issue on my side?
So which data is being deleted? Which folder is the “artifact folder”?
Hi HealthyStarfish45
If you are running the task via docker, we don't auto-detect the image and docker command, but you have more than one way to set those:
- You can set the docker manually like you suggested (see the sketch below).
- You can configure the docker image + commands in your ~/trains.conf https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L130 (on the machine running the agent).
- You can start the agent with the image you want to run with.
- You can change the base docker image...
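For the first option, setting the docker manually from code, something like this should do it (a minimal sketch; the image name is just an example):
` from trains import Task

task = Task.init(project_name="examples", task_name="docker example")
# the agent will use this as the base docker image when running the task
task.set_base_docker("nvidia/cuda:10.1-runtime-ubuntu18.04") `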
Can you try with the latest? pip install clearml==1.7.3rc1
Hi ElegantCoyote26 ,
` - cleanup_period_in_days (float): The time period between cleanups. Default: 1.
- run_as_service (bool): The script will be executed remotely (Default queue: "services"). Default: True. `
so run_as_service will not run the script locally on your machine but just enqueue the script to the services queue (you should have a clearml-agent in services mode listening to this queue, and the agent will run this service)
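To have such an agent listening on the services queue, something like this usually does it (a sketch; adjust the flags to your setup):
` clearml-agent daemon --services-mode --queue services --docker --detached `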
I want to verify it doesn’t start more than one instance for the same task
Hi SpotlessFish46
In order to mark a task as completed, it should start first. Can you try:
` task = Task.create(project_name="projectX", task_name='YYY')
task.mark_started()
task.completed() `
?
SteepDeer88 which clearml version are you using?
Hi SubstantialElk6.
ClearML will add your entire script to the uncommitted changes section if it's a standalone script (not part of a git repository).
If you run a script that is part of a git repository, the uncommitted changes section will contain the git diff of your work, along with the git repository address, branch, commit id or tag.
In order to re-run a clone of this task, the agent running it will need to have the credentials to the repository (in order to clone it).
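For the credentials part, the usual place is the agent section of the conf file on the machine running the agent, roughly like this (a sketch, assuming https cloning; the values are placeholders):
` agent {
    # git credentials the agent will use when cloning the repository
    git_user: "your-git-username"
    git_pass: "your-git-password-or-token"
} `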
If run_as_service is False, the script will start running on your machine; once you clone and enqueue it, it will run twice (on your machine and by the clearml-agent).
I'm guessing the cleanup_period_in_days can only actually run every day or whatever if the script is enqueued to services
you can change this value if you like (e.g. 0.5 for every 12 hours)
I think the only way you can get it is from the task attribute:
` from clearml import Dataset

ds = Dataset.get(dataset_id="your dataset id")
ds_uri = ds._task.artifacts.get("data").url `
What's the clearml version you are using?
According to the message above, can you try installing nbconvert and re-running it? You should be able to view the script in the uncommitted changes section.
` pip install nbconvert
OR
conda install nbconvert `
Are there resources like YouTube videos or tutorials about using ClearML? I watched and learned from the ClearML channel on YouTube, but I think I need more, to see whether maybe I have done something wrong?
Yes,
ClearML YouTube (best ever) channel - https://www.youtube.com/c/ClearML/featur...
SquareFish25 Will try to reproduce it
Hi GrievingTurkey78
If you'd like to have the same environment in trains-agent, you can use the detect_with_pip_freeze option on your local machine, in your ~/trains.conf file.
Just change detect_with_pip_freeze: true ( https://github.com/allegroai/trains/blob/master/docs/trains.conf#L168 is an example)
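In the conf file it would look roughly like this (a sketch, assuming the same layout as the linked example):
` sdk {
    development {
        # store the full pip freeze of the local environment instead of only the detected imports
        detect_with_pip_freeze: true
    }
} `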
ImmensePenguin78 I think you can get it with the APIClient, you can add force to the call:
` from clearml import Task
from clearml.backend_api.session.client import APIClient
api_client = APIClient()
t = Task.init(project_name="Your project", task_name="Your task name")
t.close()
api_client.tasks.failed(t.id, force=True, status_reason="Your status reason", status_message="Your status message") `
If you want to clear the parameters, you can try overriding with an empty dict
cloned_task.set_parameters({})
Thanks for the answer. So, for example (to make sure I understand), with the example you gave above, when I print the config I'll see the new edited parameters?
Correct
What about the second part of the question, would it be parsed according to the type hinting?
It should
BattyLion34 only add will also work, because it will be in the diff section
Hi HelpfulHare30, can you try upgrading to the latest ClearML agent?
pip install clearml-agent==1.0.0