Hi GrievingTurkey78
I'm assuming similar to https://github.com/pallets/click/ ?
Auto connect and store/override all the parameters?
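For comparison, a minimal sketch of what the auto-connect behavior looks like with argparse today (the click question above is asking for the same kind of automatic capture/override):
` import argparse
from clearml import Task

# Task.init hooks argparse, so the parsed arguments are stored on the Task
# and can be overridden from the UI when the Task is cloned/enqueued.
task = Task.init(project_name="debug", task_name="argparse auto-connect")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
parser.add_argument("--epochs", type=int, default=10)
args = parser.parse_args()

print(args.lr, args.epochs) `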
An upload of 11GB took around 20 hours, which cannot be right.
That is very, very slow, roughly 152 KB/s ...
Yes, I do have my files in the git repo, although I have not quite understood which part it takes from the remote git repo and which part from my local system.
It will do "git pull" on the remote machine and then apply any uncommitted changes it has stored in the Task.
It seems that one also needs to explicitly pass the git repo in the pipeline and task definitions via PipelineController,
Correct, unless the pipeline logic and the steps are in the same git repo, you can...
EcstaticGoat95 any chance you have an idea on how to reproduce? (even 1 out of 6 is a good start)
(Not sure it actually has that information)
LazyLeopard18 you can point the artifact directly to your Azure object storage and have StorageManager download and cache it for you:
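Something along these lines (a sketch; the azure:// URL is a placeholder for your own account/container/path):
` from clearml import StorageManager

# Download (and locally cache) the object directly from Azure blob storage.
# Subsequent calls with the same URL reuse the cached copy.
local_path = StorageManager.get_local_copy(
    remote_url="azure://your_account.blob.core.windows.net/your_container/path/to/artifact.zip"
)
print(local_path) `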
MelancholyBeetle72 it will be great if you could also open an issue on Trains and reference the pytorch lightning issue, could you please?
Hi CumbersomeSealion22
As soon as I refactor my project into multiple folders, where I put my pipeline file at the top level and keep my tasks in a subfolder, the clearml agent seems to have problems:
Notice that you need to specify the git repo for each component. If you have a process (step) with more than a single file, you have to have those files inside a git repository, otherwise the agent will not be able to bring them to the remote machine
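Roughly what that looks like (a sketch assuming a recent clearml version where add_function_step accepts the repo arguments; the repo URL and names are placeholders):
` from clearml import PipelineController

def my_step_function(x: int = 1):
    # step logic that lives in the git repo referenced below
    return x * 2

pipe = PipelineController(name="my pipeline", project="examples", version="1.0")

# Each step points at the git repository containing its code,
# so the agent can clone it on the remote machine.
pipe.add_function_step(
    name="step_one",
    function=my_step_function,
    function_return=["result"],
    repo="https://github.com/your-org/your-repo.git",
    repo_branch="main",
    execution_queue="default",
)

pipe.start(queue="services") `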
trains-agent should be deployed to GPU instances, not the trains-server.
The trains-agent's purpose is for you to be able to send jobs to a GPU instance (at least in most cases).
The "trains-server" is the control plane, basically telling the agent what to run (by storing the execution queue and tasks). Make sense?
It is available of course, but I think you have to have clearml-server 1.9+
Which version are you running ?
It's the correct way to do it, right?
Yep 🙂 That said, this is not running as a service, so you will need to spin it up on your machine. You can definitely connect it with the free SaaS server, and spin up the serving on your machine with docker-compose.
JitteryCoyote63 This seems like exactly what you are saying, elastic license issue...
Need - in my CI, the URL used is HTTPS but I need the SSH URL to be used. I see that we can pass repo to Task.create but not Task.init
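For illustration, a minimal sketch of the Task.create route with an ssh url (project, task, script and repo names below are placeholders):
` from clearml import Task

# Task.create lets you point the task at a specific repo URL (ssh here),
# unlike Task.init which picks up the repo of the local working copy.
task = Task.create(
    project_name="examples",
    task_name="ci build",
    repo="git@github.com:your-org/your-repo.git",
    branch="main",
    script="train.py",
) `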
Are you cloning an existing Task, or creating a new one ?
Thank you!
One thing I noticed is that it's not able to find the branch name on >=1.0.6x, while on 1.0.5 it can
That might be it! let me check the code again...
BTW, this one seems to work ....
` from time import sleep
from clearml import Task

Task.set_offline(True)
task = Task.init(project_name="debug", task_name="offline test")
print("starting")
for i in range(300):
    print(f"{i}")
    sleep(1)
print("done") `
are you referring to extra_docker_shell_script ?
Correct
The thing is that this runs before you create the virtual environment, so then in the new environment those settings are no longer there.
Actually that is better, because this is what we need to set up pip before it is used. So instead of passing --trusted-host just do:
` extra_docker_shell_script: ["echo "[global] \n trusted-host = pypi.python.org pypi.org files.pythonhosted.org YOUR_S...
CooperativeFox72 I would think the easiest would be to configure it globally in the clearml.conf (rather than adding more arguments to the already packed Task.init) 🙂
I'm with you on 60 messages being way too much...
Could you open a Github Issue on it, so we do not forget ?
Based on what I see, when the EC2 instance starts it installs the latest; could it be this instance is still running?
Hi FunnyAlligator17
What do you mean by?
We are able to set_initial_iteration to 0 but not get_last_iteration.
Are you saying that if your code looks like:
Task.set_initial_iteration(0)
task = Task.init(...)
and you abort and re-enqueue, you still have a gap in the scalars ?
This seems to be the issue: PYTHONPATH = '.' How is that happening?
Can you try to run the agent with: PYTHONPATH= clearml-agent daemon .... (Notice the prefix PYTHONPATH= clears the environment variable that obviously fails the python commands)
How does a task specify which docker image it needs?
Either in the code itself 'task.set_base_docker' or with the CLI, or set it in the UI when you clone an experiment (everything becomes editable)
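In code that is roughly (a sketch; the image name is a placeholder, and the exact keyword arguments can differ a bit between clearml versions):
` from clearml import Task

task = Task.init(project_name="debug", task_name="docker image example")

# Tell the agent which docker image to use when this task runs in docker mode.
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04") `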
Hi CloudySwallow27
how can I just "define" it on my local PC, but not actually run it.
You can use the clearml-task CLI
https://clear.ml/docs/latest/docs/apps/clearml_task#how-does-clearml-task-work
Or you can add the following line in your code, that will cause the execution to stop, and to continue on a remote machine (basically creating the Task and pushing it into an execution queue, or just aborting it):
task = Task.init(...) task.execute_remotely()
https://clear.ml/do...
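Putting that together, a minimal sketch (the project, task and queue names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="define only")

# Stops local execution here: the Task is registered and pushed into the queue
# for a remote agent (without a queue_name it is just registered and aborted,
# as described above).
task.execute_remotely(queue_name="default")

# Anything below runs only on the remote machine.
print("running remotely") `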
WARNING:root:Could not lock cache folder /home/ronslos/.clearml/venvs-cache: [Errno 11] Resource temporarily unavailable
Hi ZealousHare78
could it be you are also working on the same machine ? are you running the agent in docker mode or venv mode ?
This looks exactly like the timeout you are getting.
I'm just not sure what's the diff between the Model autoupload and the manual upload.
Hi WittyOwl57
I think what happens is it auto-logs the joblib load/save calls; these calls track the models used/created by the code and attach them to the model repository entries representing these models.
I'm assuming there are multiple load/save calls, and there are multiple model instances pointing to the same local file "file:///tmp/..." . The warning basically says it is re-registering existing models.
Make sense?
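To make the mechanism concrete, a minimal sketch of what gets auto-logged (the file path is just an example):
` import joblib
from clearml import Task

task = Task.init(project_name="debug", task_name="joblib auto-logging")

model = {"weights": [0.1, 0.2, 0.3]}  # stand-in for a real model object

# joblib.dump is auto-logged: an output model pointing at file:///tmp/my_model.pkl
joblib.dump(model, "/tmp/my_model.pkl")

# joblib.load is auto-logged as well; repeated dump/load calls against the same
# local file are what triggers the re-registering warnings mentioned above.
loaded = joblib.load("/tmp/my_model.pkl") `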