Thanks SmallDeer34!
This is exactly what I needed
DepressedChimpanzee34 I cannot find cfg.py here
https://github.com/allegroai/clearml/tree/master/examples/frameworks/hydra/config_files
(or anywhere else)
Hi AgitatedTurtle16
My question is how to use it to manage my experiments (docker containers). Simply put, let's say:
So basically once you see an experiment in the UI, it means you can launch it on an agent.
There is no need to containerize your experiment (actually that's kind of the idea, removing the need to always containerize everything).
The agent will clone the code, apply uncommitted changes & install the packages in the base-container-image at runtime.
This allows you to u...
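As a rough sketch of that flow with the SDK (project, task and queue names here are placeholders):
```
from clearml import Task

# clone an experiment that is already visible in the UI and enqueue the
# clone for an agent to pick up (names and queue are placeholders)
template = Task.get_task(project_name='examples', task_name='my experiment')
cloned = Task.clone(source_task=template, name='my experiment (agent run)')
Task.enqueue(cloned, queue_name='default')
```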
AdventurousRabbit79 you are correct, caching was introduced in v1.0. Also notice the default is no caching; you have to specify that you want caching per step.
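For example, a hedged sketch of opting in per step (project/task names are placeholders):
```
from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
# caching is off by default; opt in per step with cache_executed_step
pipe.add_step(
    name='preprocess',
    base_task_project='examples',
    base_task_name='preprocess task',
    cache_executed_step=True,
)
```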
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed)?
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...
Hi SmallDeer34
On the SaaS you can right click on an experiment and publish it 🙂
This will make the link available for everyone, would that help?
BTW: this is probably more efficient than pickling
https://pandas.pydata.org/pandas-docs/version/1.1.5/reference/api/pandas.DataFrame.to_parquet.html
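Something along these lines (project/task names are made up):
```
import pandas as pd
from clearml import Task

task = Task.init(project_name='examples', task_name='parquet artifact')
df = pd.DataFrame({'a': [1, 2, 3], 'b': [0.1, 0.2, 0.3]})
# parquet is columnar and compressed, usually smaller/faster than pickle
df.to_parquet('data.parquet')
task.upload_artifact(name='dataset', artifact_object='data.parquet')
```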
Can you see all the agents in the UI? (That basically means they are configured correctly and can connect to the server.)
Seems that the API has changed quite a bit over the past few versions.
Correct, notice that your old pipeline Tasks use the older package and will still work.
There seems to be no need for controller_task anymore, right?
Correct, you can just call pipeline.start()
🙂
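i.e. a minimal sketch, with placeholder names:
```
from clearml import PipelineController

pipe = PipelineController(name='my pipeline', project='examples', version='1.0')
pipe.add_step(name='step1', base_task_project='examples', base_task_name='step 1 task')
# no separate controller Task needed; by default start() enqueues the
# controller on the services queue and it takes it from there
pipe.start()
```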
The pipeline creates the tasks, but never executes/enqueues them (they are all in Draft mode). No DAG graph appears in the RESULTS/PLOTS tab.
Which vers...
```
from time import sleep
from clearml import Task
import tqdm

task = Task.init(project_name='debug', task_name='test tqdm cr cl')
print('start')
for i in tqdm.tqdm(range(100)):
    sleep(1)
print('done')
```
The above example code will output a line every 10 seconds (with the default console_cr_flush_period=10). Can you verify it works for you?
Hi LovelyHamster1
As you noted, passing overrides in Args/overrides, for example ['training.max_epochs=1000'], should work when running with the agent.
Could you verify with the latest RC? There was a fix to support the latest hydra version:
pip install clearml==0.17.5rc5
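If you want to set the override programmatically, a hedged sketch (the task ID and override value are placeholders):
```
from clearml import Task

# clone an existing hydra-enabled task, edit its overrides, then enqueue it
cloned = Task.clone(source_task=Task.get_task(task_id='<your_task_id>'))
cloned.update_parameters({'Args/overrides': "['training.max_epochs=1000']"})
Task.enqueue(cloned, queue_name='default')
```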
Thanks MagnificentSeaurchin79!
Let me check the status of this one, could it be the same as this issue?
https://github.com/allegroai/clearml/issues/322
Hi RobustGoldfish9 ,
I'd much rather just have trains-agent just automatically build the image defined there than have to build the image separately and make it available for all the agents to pull.
Do you mean there is no docker image in the artifactory built based on your Dockerfile?
I think this all ties into the non-standard git repo definition. I cannot find any other reason for it. Is it actually stuck for 5 min at the end of the process, waiting for the repo detection?
By your description it seems to make no difference whether I added the files via sync or add, since I will have to create a new dataset either way.
Sync is designed to take a local folder (or folders) and add/remove files from a dataset based on the local changes (it does that automatically based on file existence/content).
The changes (i.e. added files) are uploaded as delta changes relative to the parent version, this means we are not always uploading all files.
Add on the other hand means you...
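A rough sketch of the sync flow (project name and parent ID are placeholders):
```
from clearml import Dataset

# create a child version on top of a parent, then sync a local folder into it
ds = Dataset.create(dataset_project='examples', dataset_name='my dataset',
                    parent_datasets=['<parent_dataset_id>'])
ds.sync_folder(local_path='./data')  # add/remove files based on the local state
ds.upload()    # uploads only the delta relative to the parent version
ds.finalize()
```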
Hi TrickySheep9
So basically the idea is that you can quickly code a scheduler with your own logic, then launch it on the "services queue" to run basically forever 🙂
This could be a good example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py
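clearml also ships a TaskScheduler helper in clearml.automation; a minimal hedged sketch, with placeholder IDs and queue names:
```
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# re-launch a template task every day at 06:00 into the 'default' queue
scheduler.add_task(schedule_task_id='<template_task_id>', queue='default',
                   minute=0, hour=6)
# run the scheduler itself "forever" on the services queue
scheduler.start_remotely(queue='services')
```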
parser.add_argument("--dataset_mean", type=float, nargs="+", default=0.5)
I think providing nargs='+' assumes the type is a list. Nonetheless, we should be able to support it. Could you please add a GitHub issue so we do not forget?
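In the meantime, making the default a list keeps the parsed type consistent; a small self-contained sketch:
```
import argparse

parser = argparse.ArgumentParser()
# nargs="+" always produces a list from the command line,
# so a list default keeps the type consistent
parser.add_argument("--dataset_mean", type=float, nargs="+", default=[0.5])

print(parser.parse_args([]).dataset_mean)                                # [0.5]
print(parser.parse_args(["--dataset_mean", "0.4", "0.6"]).dataset_mean)  # [0.4, 0.6]
```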
on the side note, is there any way to automatically give more meaningful names to the running docker containers?
What do you mean by that? Running where? And where will you see them?
TenseOstrich47
I noticed that with one agent, only one task gets executed at one time
Yes you can 🙂
Also, you are correct, a single agent will run a single Task at a time. That said, you can have multiple agents running on the same machine, and when you launch them you specify which GPUs they use (in theory they can share the same GPU, but your code might not like it 🙂).
You can see a few examples here:
https://github.com/allegroai/clearml-agent#running-the-clearml-agent
Need - in my CI, the url used is https but I need the ssh url to be used. I see that we can pass repo to Task.create but not Task.init
Are you cloning an existing Task, or creating a new one?
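For reference, a hedged sketch of passing the ssh url explicitly with Task.create (URL, branch and script are placeholders):
```
from clearml import Task

# unlike Task.init (which detects the repo from the local checkout),
# Task.create accepts an explicit repo url, so the ssh form can be forced
task = Task.create(project_name='examples', task_name='ci task',
                   repo='git@github.com:org/repo.git', branch='main',
                   script='train.py')
```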
Ssh is used to access the actual container; all other communication is tunneled on top of it. What exactly is the reason to bind to 0.0.0.0? Maybe it could be a flag you set, but I'm not sure what the scenario is and what we are solving. Thoughts?
Creating a dataset sounds like a good idea, but that does not seem to be the issue.
Can you verify you can manually clone using the same link? (Notice the log should specify the exact clone command it is using, with the password replaced with *.)
maybe this can cause the issue?
Not likely.
In the original pipeline (the one executed from the Pycharm) do you see the "Pipeline" section under Configuration -> "Config objects" in the UI?
Hi AgitatedTurtle16 could you verify you can access the API server with curl?
OK, so if I've got, like, 2x16GB GPUs ...
You could do:
clearml-agent daemon --queue "2xGPU_32gb" --gpus 0,1
Which will always use the two GPUs for every Task it pulls.
Or you could do:
clearml-agent daemon --queue "1xGPU_16gb" --gpus 0
clearml-agent daemon --queue "1xGPU_16gb" --gpus 1
Which will give you two agents, one per GPU (with 16GB per Task it runs).
Or:
clearml-agent daemon --queue "2xGPU_32gb" "1xGPU_16gb" --gpus 0,1
Which will first pull Tasks from the "2xGPU_32gb" qu...
and this path should follow a linux folder structure, not a single file like the current .zip.
I like where this is going 🙂
So are we thinking of something like a "shared" folder where the data is kept "warm", plus a single source of truth where the packaged zip file is stored (like object storage, e.g. S3)?
I think the clearml-session CLI is missing the ability to add a custom port to the external address, does that make sense?
but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
DeliciousBluewhale87 not on the open source version, for some reason it is not passed 🙂
Could you explain the use case?