For example, for some of our models we create PDF reports that we save in a folder on the NFS disk
Oh, why not as artifacts? At least you will be able to access them from the web UI, and avoid NFS credential hell 🙂
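As a rough sketch (project/task names and the report path are just placeholders), uploading the PDF as an artifact could look like:

from clearml import Task

task = Task.init(project_name="examples", task_name="report example")
# ... generate the pdf report ...
# upload it as an artifact so it is browsable/downloadable from the web UI
task.upload_artifact(name="model_report", artifact_object="reports/model_report.pdf")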
Regarding clearml datasets:
https://www.youtube.com/watch?v=S2pz9jn26uI
Because it lives behind a VPN and GitHub workers don’t have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?
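Roughly, the combination could look like this (a sketch; the session zip path is a placeholder):

from clearml import Task

# On the machine behind the VPN: run in offline mode, nothing is sent to the server
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... execution happens here, everything is stored locally in a session zip ...

# Later, from a machine that can reach the server, import the offline session
# (the zip path below is a placeholder for the file created by the offline run)
Task.import_offline_session("/path/to/offline_session.zip")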
3.a
Regarding the model query, sure, from Python or the REST API you can query based on any metadata
https://clear.ml/docs/latest/docs/references/sdk/model_model/#modelquery_modelsmodels
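For example (a sketch; the project name and tag are placeholders):

from clearml import Model

# query models by project / tags; metadata fields can also be used for filtering
models = Model.query_models(
    project_name="examples",   # placeholder project
    tags=["production"],       # placeholder tag
    only_published=True,
)
for m in models:
    print(m.id, m.name)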
3.b
If you are using clearml-serving then check the docs / readme, but in a nutshell yes you can.
If the inference code is batch processing, which means a Task, then of course you can, and to launch it check the clearml agent f...
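As a sketch of that batch-processing flow (task names and queue are placeholders), cloning and enqueuing the inference Task for an agent to pick up could look like:

from clearml import Task

# clone the existing (template) inference Task and enqueue it;
# a clearml-agent listening on the queue will pull and execute it
template = Task.get_task(project_name="examples", task_name="batch inference")  # placeholders
cloned = Task.clone(source_task=template, name="batch inference run")
Task.enqueue(cloned, queue_name="default")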
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have docker logged in to the ECR registry (i.e. run docker login) before running the agent, then the agent uses docker to run the image from the ECR.
Make sense?
because fastai’s tensorboard doesn’t work with multi-GPU
keep me posted when this is solved, so we can also update the fastai2 interface,
What's the output_uri you are passing?
And the OS / Python version?
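For reference, output_uri is usually passed in Task.init, something like this (a sketch; the destination is a placeholder):

from clearml import Task

# output_uri controls where models/artifacts are uploaded (s3/gs/azure/file paths work)
task = Task.init(
    project_name="examples",
    task_name="training",
    output_uri="s3://my-bucket/clearml-output",  # placeholder destination
)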
I think this is due to the label map including some keys with a "." in them.
Hi TenseOstrich47 what do you mean by "label"?
SmarmyDolphin68 What's the matplotlib version? And the Python version?
But adding a simple force_download flag to the get_local_copy
That sounds like a good idea
Would I be able to add customized columns like I am able to in task.connect? Same question applies for parallel coordinates and all kinds of comparisons
No to both 😞
Depending on your security restrictions, but generally yes.
BattyLion34 I have a theory, I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints ".", sleeps for 5 seconds, and then prints again?
Then, while that Task is running, launch from the UI the Task that passed, on the "default" queue. If my theory holds it should fail, and then we will be getting somewhere 🙂
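Something like this should do as the toy Task (a minimal sketch; project/task names and the duration are placeholders):

from time import sleep
from clearml import Task

task = Task.init(project_name="debug", task_name="toy sleep task")  # placeholder names

# print, sleep 5 seconds, print again - just to keep the Task alive for a while
for _ in range(12):  # ~1 minute total, placeholder duration
    print(".")
    sleep(5)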
Also, how do pipelines compare here?
Pipelines are a type of Task, so like Tasks you can clone and enqueue them, or set them as the target of the trigger.
the most flexible solution would be to have some way of triggering the execution of a script in the parent task environment,
This is the exact idea of the TriggerScheduler
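A rough sketch of the idea (project name, queue, task id and trigger conditions are placeholders; check the TriggerScheduler docs for the exact arguments):

from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# when a model in the watched project is published, clone-and-enqueue the given Task
# (which could also be a pipeline controller Task)
trigger.add_model_trigger(
    name="retrain-on-new-model",            # placeholder trigger name
    schedule_task_id="<template-task-id>",  # placeholder Task / pipeline id to launch
    schedule_queue="default",
    trigger_project="examples",             # placeholder watched project
    trigger_on_publish=True,
)

trigger.start()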
What am I missing here?
how do I make sure it will traverse only the current package?
Just making sure there is no bug in the process: if you call Task.init in your entire repo (serve/train), you end up with an "installed packages" section that contains all the required packages for both use cases?
I have separate packages for serving and training in a single repo. I don’t want serving requirements to be installed.
Hmm, it cannot "know" which is which, because it doesn't really trace all the import logs (this w...
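One possible workaround (a sketch, assuming you keep a separate requirements file per use case; check the SDK docs for the exact behaviour of this call) is to point the Task at the training-only requirements before Task.init:

from clearml import Task

# use only the training requirements file instead of the auto-detected packages
# ("train/requirements.txt" is a placeholder path inside the repo)
Task.force_requirements_env_freeze(force=True, requirements_file="train/requirements.txt")
task = Task.init(project_name="examples", task_name="training")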
Wtf? Can you try with = (notice single, not double)?
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- cudatoolkit=11.1.1
- pytorch=1.8.0
MoodyCentipede68 seems you did not pass any configuration (os env or conf file), so it does not know how to find the server and authenticate. Make sense?
I could merge some steps, but as I may want to cache them in the future, I prefer to keep them separate
Makes total sense, my only question (and sorry if I'm dwelling too much on it) is how would you pass the data between step 2 and step 3, if this is a different process on the same machine?
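For what it's worth, with the decorator-based pipelines the return values of one step are stored as artifacts and handed to the next step, even when the steps run as different processes; a rough sketch (names are placeholders):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"], cache=True)
def step_two():
    # the returned object is stored as an artifact of this step
    return {"values": [1, 2, 3]}

@PipelineDecorator.component(return_values=["result"])
def step_three(data):
    # 'data' is fetched from step_two's artifact before this step starts
    return sum(data["values"])

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def pipeline_logic():
    data = step_two()
    print(step_three(data))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run all steps as local (sub)processes
    pipeline_logic()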
I think you can force it to be started, let me check (I'm pretty sure you can on an aborted Task).
Programmatically, before importing the package, set os.environ['TRAINS_CONFIG_FILE']='~/my_new_trains.conf'
BTW: What's the use case for doing so?
thanks for helping again
My pleasure :)
I was trying to do exactly as you mentioned, setting the environment variable before any trains import, but it didn't work
In your entry point script (even if you do not call trains / Task.init) add:
import os
os.environ['TRAINS_CONFIG_FILE'] = '~/my_new_trains.conf'
import trains
Then when you actually import trains, everything is already set and it will not read the configuration again.
Make sense?
Hi RipeGoose2
when I'm using the set_credentials approach does it mean the trains.conf is redundant? if
Yes, this means there is no need for trains.conf; all the important stuff (i.e. server + credentials) you provide from code.
BTW: When you execute the same code (i.e. code with the set_credentials call) the agent's configuration will override what you have there, so you will be able to run the Task later either on prem/cloud without needing to change the code itself 🙂
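For reference, the set_credentials call looks roughly like this (a sketch; hosts and keys are placeholders):

from trains import Task

# set server + credentials from code, so no trains.conf is needed
Task.set_credentials(
    api_host="https://api.your-server.example",      # placeholder
    web_host="https://app.your-server.example",      # placeholder
    files_host="https://files.your-server.example",  # placeholder
    key="<access_key>",
    secret="<secret_key>",
)
task = Task.init(project_name="examples", task_name="example")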
I came across it before but thought it's only relevant for credentials
We are working on improving the docs, hopefully it will get clearer 😉
with a remote machine where the code actually runs (you know, the PyCharm Pro remote).
Are you using the pycharm plugin ? (to sync the local git changes with clearml)
https://github.com/allegroai/clearml-pycharm-plugin
So the clearml server already contains an authentication layer (JWT token), and you do have full user management on top:
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#web-login-authentication
Basically what I'm saying is: if you add HTTPS on top of the communication, and only open the 3 ports, you should be good to go. Now if you really need SSO (AD included) for user login etc., unfortunately this is not part of the open source, but I know they have it in the scale/ent...