so when inside the Docker container, I don’t see the git repo, and that’s why ClearML doesn’t see it
Correct ...
I could map the root folder of the repo into the container, but that would mean everything ends up in there
This is the easiest, you can put it in an ENV variable.
yes, so you can have a few options 🙂
I generate some more graphs with a file called `graphs.py` and want to attach/upload them to this training task
Makes total sense to use `Task.get_task`, I just want to make sure that you are aware of all the options, so you pick the correct one for you :)
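For example, a minimal sketch (assuming the training task is still accepting reports; the task ID is a placeholder):
```python
import matplotlib.pyplot as plt
from clearml import Task

# grab the existing training task ('your_task_id' is a placeholder)
task = Task.get_task(task_id='your_task_id')

# build whatever figure graphs.py produces
fig = plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])

# attach the figure to the training task's logger
task.get_logger().report_matplotlib_figure(
    title='extra graphs', series='graphs.py', figure=fig, iteration=0
)
task.get_logger().flush()
```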
but when I run the same task again it does not map the keys...
SparklingElephant70 what do you mean by "map the keys" ?
so the Docker container didn’t use the DNS of the host?
I'm assuming it is not configured on your DNS, otherwise it would have been resolved...
gm folks, really liking ClearML so far as my top choice (after looking at dvc, mlflow), and thank you for your help here!
Thanks HurtWoodpecker30 !
Is there a recommended workflow to be able to “drop into” the *exact* env (code, venv, data) of a previous experiment (which may have been several commits ago), to reproduce that experiment?
You can use clearml-agent on your local machine to build the env of any Task:
`clearml-agent build --id <task_id>`
so what should the value of `upload_uri` be set to, e.g. the `fileserver_url`?
yes, that would work.
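For instance, a sketch of the SDK counterpart (the project/task names and the localhost URL are placeholders for your setup):
```python
from clearml import Task

# point model/artifact uploads at the ClearML fileserver.
# 'http://localhost:8081' assumes the default fileserver port; adjust to your server.
task = Task.init(
    project_name='examples',
    task_name='upload uri demo',
    output_uri='http://localhost:8081',
)
```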
That sounds like an issue with the working dir; check the "Working Directory" field in the "Execution" section.
'.' means the root of the git repository;
'subfolder' means run the script from that subfolder, etc. Also make sure that the script path is adjusted accordingly.
btw: Trains should have filled in all the correct paths... If you have time, get the latest trains (0.14.3) and run again to see if the problem persists; we should probably fix that bug 🙂
The base task is self-contained, i.e. it downloads training/eval data directly and has direct access to it
I think this is the main issue, how come it does not catch it? Are you using argparse?
Good news a dedicated class for exactly that will be out in a few days 🙂
Basically a task scheduler and a task trigger scheduler, running as a service, cloning/launching tasks either based on time (cron-alike) or based on a trigger.
wdyt?
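As a preview, a sketch of how it is expected to look (class/argument names here are assumptions until the release):
```python
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()

# clone + enqueue the given base task every day at 07:00
scheduler.add_task(
    schedule_task_id='your_base_task_id',  # placeholder ID
    queue='default',
    minute=0, hour=7, day=1,
)

# run the scheduler itself as a service on the 'services' queue
scheduler.start_remotely(queue='services')
```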
What would be the best way to get all the models trained using a certain Task? I know we can use `query_models` to filter models based on Project and Task, but is it the best way?
On the Task object itself you have all the models: `Task.get_task(task_id='aabb').models['output']`
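For example (a sketch with a placeholder task ID):
```python
from clearml import Task

# list every output model registered on the task
task = Task.get_task(task_id='aabb')  # placeholder ID
for model in task.models['output']:
    print(model.name, model.url)
```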
query tasks that are both Running --> You mean `status=["in_progress"]` ?
Yes!
How do I figure out the other possible values I can use with the `status` parameter?
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksget_all
https://clear.ml/docs/latest/docs/references/api/definitions#taskstask
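Putting it together, a sketch (the project name is a placeholder):
```python
from clearml import Task

# all currently running tasks in a project
running = Task.get_tasks(
    project_name='examples',
    task_filter={'status': ['in_progress']},
)
print([t.name for t in running])
```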
Filter only tasks that started, say, 10 min ago. Is there any parameter for that as well?
`last_update` or `created`, then use ...
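Something along these lines (a sketch; the `>=<ISO timestamp>` comparison string is an assumption about the tasks.get_all datetime-filter syntax, double-check against the API reference above):
```python
from datetime import datetime, timedelta
from clearml import Task

# tasks created within the last 10 minutes (assumed filter syntax)
since = (datetime.utcnow() - timedelta(minutes=10)).strftime('%Y-%m-%dT%H:%M:%S')
recent = Task.get_tasks(
    project_name='examples',
    task_filter={'created': ['>={}'.format(since)]},
)
```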
GrievingTurkey78 yes, you are correct on both.
Will the sweep functionality work?
Yes it should; that said, it will not use the trains-agent, so you are limited to the machine running the sweep.
If you want to do HPO on multi-node, check out this example 🙂
https://github.com/allegroai/trains/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
Hmm I see, if this is the case, would it make sense to run the pipeline logic locally? (notice the pipeline compute, i.e. the components, will be running on remote machines with the agents)
CrookedWalrus33 can you post the clearml.conf you have on the agent machine?
We would "donate" back to the community a docker stack template that can be used to set up the community edition.
Perfect, feel free to PR to the clearml-server repository, we can take it from there
🙏 🙏 😍
Any chance you can test with the latest RC ? 1.8.4rc2
is there a way that I can pull all scalars at once?
I guess you mean from multiple Tasks ? (if so then the answer is no, this is on a per Task basis)
Or, can i get experiments list and pull the data?
Yes, you can use Task.get_tasks to get a list of task objects, then iterate over them. Would that work for you?
https://clear.ml/docs/latest/docs/references/sdk/task/#taskget_tasks
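Something like this sketch (the project name is a placeholder):
```python
from clearml import Task

# pull reported scalars task by task
for task in Task.get_tasks(project_name='examples'):
    scalars = task.get_reported_scalars()  # {title: {series: {'x': [...], 'y': [...]}}}
    print(task.name, list(scalars.keys()))
```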
`files_server: ://genuin-ai/`
should be:
`files_server:`
The Cloud Access section is in the Profile page.
Any storage credentials (S3 for example) are only stored on the client side (never on the trains-server); this is the reason we need to configure them in trains.conf. When the browser needs to access those URLs (e.g. downloading an artifact) it also needs the secret/key, so it automatically displays a popup requesting them, and will store them in this section. Notice they are stored in the browser session (as a cookie).
Okay, found the issue. To disable SSL verification globally, add the following env variable: `CLEARML_API_HOST_VERIFY_CERT=0`
(I will make sure we fix the actual issue with the config file)
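If it is easier to set from Python, a sketch (must run before clearml initializes its session):
```python
import os

# disable SSL certificate verification for the ClearML client
os.environ['CLEARML_API_HOST_VERIFY_CERT'] = '0'

from clearml import Task  # import after setting the env var
```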
ResponsiveHedgehong88 so I would suggest using execute_remotely in your code: basically you start locally, make sure everything is passed as intended, then from within the code you call `task.execute_remotely(...)`,
which will stop the current process and enqueue the Task on the selected queue for the agent to execute.
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
This way you can both easily test...
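A sketch of that flow (queue/project/task names are placeholders):
```python
from clearml import Task

task = Task.init(project_name='examples', task_name='remote exec demo')

# ... verify locally that all parameters are passed as intended ...

# stop the local run and enqueue this Task for an agent to execute
task.execute_remotely(queue_name='default', exit_process=True)

# code below this line only runs on the agent
```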
is it possible to change an existing model's URL?
Edit the DBs ... That's basically the only way 😞
I guess we should have obfuscated the name better 😄
That should work 🙂
BTW, you might play around with `clearml-agent execute --id <task_id_here>`
This will basically clone the code, create a venv with the python packages, apply uncommitted changes and will run the actual code. This could be a replacement for your bash script. (notice it means that you need to clone the Task in the UI, then you can change parameters, and then run the agent manually in SLURM and it will take the params from the UI.)
SmarmyDolphin68
BTW: there is no automatic reporting when you have `task = Task.get_task(task_id='your_task_id')`
It's only active when you have one "main" task.
You can also check the `continue_last_task` argument in `Task.init`, it might be a good fit for your scenario:
https://allegro.ai/docs/task.html#trains.task.Task.init
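For reference, a sketch (project/task names are placeholders):
```python
from clearml import Task

# resume reporting into the most recent task with this name,
# instead of creating a new one
task = Task.init(
    project_name='examples',
    task_name='training',
    continue_last_task=True,  # can also be a specific task ID string
)
```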