
I think the real issue is that I am not able to specify a platform for the model,
None
there is no need to specify it, remove it from the config.pbtxt - clearml-serving will automatically add it in the background
That speed depends on model sizes, right?
in general yes
Hope that makes sense. This would not work under heavy loads, but e.g. we have models that are used only once a week. They would just stay unloaded until use - and could be offloaded afterwards.
but then you still might encounter a timeout the first time you access them, no?
Notice you have to configure the shared driver for docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
Hi PungentLouse55 ,
I think I can see how these magic lines solved it, and I think you are onto something.
Any chance what happened is multiple workers were trying to simultaneously save/load the same Model ?
Any chance your code needs more than the main script, but it is Not in a git repo? Because the agent supports either single script file, or a git repo with multiple files
BTW: for future reference, if you set the ulimit in bash, all processes created after that should have the new ulimit
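If it helps, a minimal sketch for checking (and raising) the open-file limit from inside the process itself, using Python's standard resource module - just an illustration, not something the agent requires:
import resource

# Read the open-file limits this process inherited from the shell (ulimit -n)
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit up to the hard limit for this process and its children
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))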
DistressedGoat23 check this example:
https://github.com/allegroai/clearml/blob/master/examples/optimization/hyper-parameter-optimization/hyper_parameter_optimizer.py
aSearchStrategy = RandomSearch
It will collect everything on the main Task
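For context, a rough sketch of how that example is wired up - the task ID, queue name, and parameter names below are placeholders, not from your setup:
from clearml.automation import HyperParameterOptimizer, DiscreteParameterRange, RandomSearch

# The optimizer runs as the "main" Task and aggregates the scalars of all the trials it spawns
optimizer = HyperParameterOptimizer(
    base_task_id="base_task_id_here",  # placeholder: the template experiment to clone
    hyper_parameters=[
        DiscreteParameterRange("General/batch_size", values=[16, 32, 64]),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=RandomSearch,
    execution_queue="default",
)
optimizer.start()
optimizer.wait()
optimizer.stop()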
This is a crucial point for using clearml HPO, since comparing dozens of experiments in the UI and searching for the best is just not manageable.
You can of course do that (notice you can actually order them by scalars they report, and even do ...
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! a pipeline is at the end a Task, you can take the pipeline ID and clone and enqueue it
from clearml import Task

# Clone the pipeline (controller) Task and enqueue the clone for execution
pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task interface.
wdyt?
but this would be still part of the clearml.conf right?
You can pass it per Task, and you can also configure the agent to always pass it by adding this env:
https://github.com/allegroai/clearml-agent/blob/5a080798cb4292e198948fbe16cba70136cb6bdf/docs/clearml.conf#L137
Could it be the credentials are actually incorrect? because it seems like you can access the server? (I assume you were able to browse to it and generate credentials. right?)
Seems like someone sitting in the middle is rerouting the request (maybe both https and port)?!
WhimsicalLion91 I guess import/export is going to be more challenging, doable though. You will need to get all the Tasks, then collect all the artifacts, then collect all the reported logs (console/plots/etc). Then import everything back to your own server...
Exporting a single Task: task.export_task
and Task.import_task
If you need all the scalars: task.get_reported_scalars(...)
And the console logs: Task.get_reported_console_output
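A rough sketch of how that could look (the task ID is a placeholder, and you would run the import part while connected to the destination server):
from clearml import Task

# On the source server: export the Task definition plus its reported data
task = Task.get_task(task_id="source_task_id_here")  # placeholder ID
task_data = task.export_task()
scalars = task.get_reported_scalars()
console_lines = task.get_reported_console_output()

# On the destination server: recreate the Task from the exported definition.
# Scalars and console output would still need to be re-reported onto the new Task.
new_task = Task.import_task(task_data)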
It is http btw, I don't know why it logged https://
This is odd could it be it automatically forwards to https ?
I would try the certificate check thing first
None
Change to:
CLEARML_AGENT_GIT_USER: ${CLEARML_AGENT_GIT_USER:-my_git_user_here}
and the same for the password.
You can also just set the environment variables before launching docker-compose, whatever is more convenient for you
Hi OutrageousGrasshopper93
which framework are you using? trains-agent will pull the correct torch based on the cuda version it detects, but no such thing for TF. In the default venv mode, trains-agent creates a new venv for the experiment (not conda), then everything is installed there. If you need conda you need to change the package_manager to conda: https://github.com/allegroai/trains-agent/blob/de332b9e6b66a2e7c6736d12614de9870eff48bc/docs/trains.conf#L49 The safest way to control CUDA dri...
Hi @<1541954607595393024:profile|BattyCrocodile47>
But the files API is still open to the world, right?
No, of course not 🙂 (i.e. the API is authenticated with a JWT header, this is why you need to generate the secret/key in the UI)
That said, the login process itself is user/pass stored on the server, but other than that the web/api are secured. The file server on the other hand is plain http storage and does not verify the connection like the API does. So if you are going the self-ho...
Hi WickedGoat98
but is there also a way to delete them, or wipe complete projects?
https://github.com/allegroai/trains/issues/16
Auto cleanup service here:
https://github.com/allegroai/trains/blob/master/examples/services/cleanup/cleanup_service.py
EnviousStarfish54 Sure, see scatter2d
https://allegro.ai/docs/examples/reporting/scatter_hist_confusion_mat_reporting/#2d-scatter-plots
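For reference, a minimal sketch of reporting a 2D scatter plot through the Logger - the project/task/series names and the data here are just placeholders:
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="scatter demo")  # placeholder names
logger = task.get_logger()

# scatter is an Nx2 array of (x, y) points
scatter = np.random.rand(100, 2)
logger.report_scatter2d(
    title="example_scatter",
    series="series_a",
    iteration=0,
    scatter=scatter,
    xaxis="x",
    yaxis="y",
    mode="markers",
)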
"erasing" all the packages that had been set in the base task I'm cloning from. I
Set is not add; if you are calling set_packages, you are overwriting all of them with this single call.
You can however do:
# Read the current pip requirements from the exported Task definition
task_data = task.export_task()
requirements = task_data["script"]["requirements"]["pip"]
# Append the extra packages (newline-separated), then set them back on the Task
requirements += "\nsome_new_package"
task.set_packages(requirements)
I guess we should have get_requirements ?!
OHH nice, I thought it was just some kind of job queue for machines that are already up and running
It's much more than that, it's a way of life 🙂
But seriously now, it allows you to use any machine as part of your cluster, and send jobs for execution from the web UI (any machine, even just a standalone GPU machine under your desk, or any cloud GPU instance, or even mixing the two together 🙂 )
Maybe I need to change something here:
apiserver.conf
Not sure, I'm still waiting on an answer... It...
GrievingTurkey78 where do you see this message? Can you send the full server log?
clearml - WARNING - Could not retrieve remote configuration named 'hyperparams'
What's the clearml-server version you are working with ?
In both logs I see (even in the single GPU log, it seems you "see" two GPUs, is that correct?)
GPU 0,1 Tesla V100-SXM2-32GB (arch=7.0)
Last question: this is using a relatively old clearml version (0.17.5), can you test with the latest version (1.1.1)?