JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
Need: in my CI the URL used is HTTPS, but I need the SSH URL to be used. I see that we can pass `repo` to Task.create but not Task.init
Are you cloning an existing Task, or creating a new one ?
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can.
That might be it! let me check the code again...
BTW, this one seems to work ....
` from time import sleep
from clearml import Task

# Run fully offline: nothing is sent to the server, everything is stored locally
Task.set_offline(True)
task = Task.init(project_name="debug", task_name="offline test")

print("starting")
for i in range(300):
    print(f"{i}")
    sleep(1)
print("done") `
Are you referring to `extra_docker_shell_script`?
Correct
The thing is that this runs before the virtual environment is created, so in the new environment those settings are no longer there.
Actually that is better, because this is exactly what we need in order to set up pip before it is used. So instead of passing --trusted-host just do:
` extra_docker_shell_script: ["echo "[global] \n trusted-host = pypi.python.org pypi.org files.pythonhosted.org YOUR_S...
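For reference, a minimal sketch of what the full clearml.conf entry could look like (the extra internal mirror hostname is just a placeholder):
` agent {
    # Runs inside the container before the virtual environment is created,
    # so every later pip install already trusts the listed hosts
    extra_docker_shell_script: [
        "printf '[global]\ntrusted-host = pypi.python.org pypi.org files.pythonhosted.org my.internal.mirror\n' > /etc/pip.conf"
    ]
} `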
CooperativeFox72 I would think the easiest would be to configure it globally in the clearml.conf (rather than add more arguments to the already packed Task.init) 🙂
I'm with you on 60 messages being way too much...
Could you open a GitHub issue on it, so we do not forget?
Based on what I see, when the EC2 instance starts it installs the latest version; could it be this instance is still running?
Hi @<1558986821491232768:profile|FunnyAlligator17>
What do you mean by?
We are able to `set_initial_iteration` to 0 but not `get_last_iteration`.
Are you saying that if your code looks like:
` Task.set_initial_iteration(0)
task = Task.init(...) `
and you abort and re-enqueue, you still have a gap in the scalars?
This seems to be the issue: `PYTHONPATH = '.'` How is that happening?
Can you try to run the agent with: `PYTHONPATH= clearml-agent daemon ...` (notice the prefix `PYTHONPATH=` clears the environment variable that obviously fails the python commands)
How does a task specify which docker image it needs?
Either in the code itself with `task.set_base_docker`, with the CLI, or in the UI when you clone an experiment (everything becomes editable).
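As an illustration, a minimal sketch of setting the image from code (the image name and queue name are just placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="docker image demo")
# Ask the agent to run this task inside a specific docker image
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")
# Stop local execution and enqueue the task for a remote agent
task.execute_remotely(queue_name="default") `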
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task CLI
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following lines to your code, which will cause execution to stop and continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it):
` task = Task.init(...)
task.execute_remotely() `
https://clear.ml/do...
WARNING:root:Could not lock cache folder /home/ronslos/.clearml/venvs-cache: [Errno 11] Resource temporarily unavailable
Hi @<1549927125220331520:profile|ZealousHare78>
Could it be you are also working on the same machine? Are you running the agent in docker mode or venv mode?
This looks exactly like the timeout you are getting.
I'm just not sure what the difference is between the model auto-upload and the manual upload.
Hi WittyOwl57
I think what happens is that it auto-logs the joblib load/save calls; these calls track models used/created by the code, and attach them to the model repository entries representing these models.
I'm assuming there are multiple load/save calls, and there are multiple model instances pointing to the same local file "file:///tmp/..." . The warning basically says it is re-registering existing models.
Make sense ?
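As an illustration, a minimal sketch of the pattern that would trigger this (project, task and file names are placeholders):
` import joblib
from sklearn.linear_model import LogisticRegression
from clearml import Task

task = Task.init(project_name="examples", task_name="joblib autolog demo")

model = LogisticRegression()
# Each joblib.dump / joblib.load is auto-logged, registering an output / input
# model that points at the same local file
joblib.dump(model, "/tmp/model.pkl")
loaded = joblib.load("/tmp/model.pkl") `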
I double-checked the code, it's always being passed 🙂
BroadMole98 as one can expect, a long answer as well 🙂
I have a workflow with 19000 job nodes in it.
Wow, 19k job nodes? As in a single pipeline with 19k steps?
The main idea of the trains-agent is to allow multi-node workloads, to create pipelines on top of a scheduler without worrying about docker packaging (done automatically for you), and to have a proper scheduler with priorities (which is missing from k8s).
If the first step is just "logging" all the steps, you can easily add "Task...
shared "warm" folder without having to download the dataset locally.
This is already supported 🙂
Configure `sdk.storage.cache.default_base_dir` in your clearml.conf to point to a shared (mounted) folder:
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L205
That's it 🙂
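For example, a minimal clearml.conf snippet (the mount path is just a placeholder):
` sdk {
    storage {
        cache {
            # Shared (mounted) folder used as the warm download cache
            default_base_dir: "/mnt/shared/clearml-cache"
        }
    }
} `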
We should probably add `set_task_type` 🙂
Hi ReassuredTiger98
To distinguish between MinIO and S3 we use:
`s3://bucket/file` for the AWS S3 service, and `s3://server:port/bucket/file` for MinIO.
This means that if your S3 links had been `s3://<minio-address>:<port>/bucket/file.bin`, the UI would have popped up the credentials window.
Make sense ?
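For reference, a hedged sketch of the matching MinIO credentials section in clearml.conf (host, key and secret are placeholders):
` sdk {
    aws {
        s3 {
            credentials: [
                {
                    # A host (with port) entry marks this as a MinIO-style endpoint
                    host: "my-minio.example.com:9000"
                    key: "minio_access_key"
                    secret: "minio_secret_key"
                    secure: false
                }
            ]
        }
    }
} `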
Sadly, I think we need to add another option like task_init_kwargs to the component decorator.
What do you think would make sense?
SoreDragonfly16 could you test with Task.init using reuse_last_task_id=False, for example:
` task = Task.init('project', 'experiment', reuse_last_task_id=False) `
The only thing that I can think of is running two experiments with the same project/name on the same machine; this will ensure every time you run the code, you create a new experiment.
MelancholyBeetle72 there is an RC with a fix, check the GitHub issue for details :)
JitteryCoyote63 I think that without specifically adding torch to the requirements, the agent will not be able to automatically resolve the correct cuda/torch version. Basically you should add torch to the requirements.txt file and provide it when creating the Task, or use `Task.force_requirements_env_freeze`.
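As an illustration, a minimal sketch of the freeze approach (project and task names are placeholders):
` from clearml import Task

# Freeze the current working environment so the agent installs exactly the
# same package versions, including the matching torch/cuda build
Task.force_requirements_env_freeze()
task = Task.init(project_name="examples", task_name="torch pinned env") `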
@<1523707653782507520:profile|MelancholyElk85> I just ran a single-step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce ?
Ohh then we can definitely support it, could you maybe post a toy example for testing? Or even better PR it to the examples/tensorboardX folder?
It does not upload; the default behavior is to log the artifact (so you know where you stored it, but without forcing unnecessary uploads).
If you were to change:
` task = Task.init(project_name='examples', task_name='Keras with TensorBoard example') `
to:
` task = Task.init(project_name='examples', task_name='Keras with TensorBoard example', output_uri=" ") `
it would also upload the model.
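For reference, a hedged sketch with an explicit upload destination (the bucket/prefix is just a placeholder; any URI your storage supports should work):
` from clearml import Task

# Models saved by the framework are uploaded to this destination, not only logged
task = Task.init(
    project_name='examples',
    task_name='Keras with TensorBoard example',
    output_uri='s3://my-bucket/models',  # placeholder destination
) `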
UI for some anomalous file,
Notice the metrics are not files/artifacts, just scalars/plots/console