Hi FunnyTurkey96
what's the clearml server you are using ?
Okay. And 110 means 11.1 and not 11.0?
110 means 11.0, the odd thing is, it actually installed 11.1, and from the pytorch website this is exactly how they suggest to install with conda...
Let me know if forcing the CUDA version changes anything
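For reference, a minimal sketch of how forcing it usually looks (assuming the standard agent config keys, double-check against your clearml.conf):
agent.cuda_version = 110  # force CUDA 11.0 when the agent resolves the torch wheel
or set CUDA_VERSION=110 in the agent's environment before starting it.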
2021-07-11 19:17:32,822 - clearml.Task - INFO - Waiting to finish uploads
I'm assuming very large uncommitted changes 🙂
Hi AbruptHedgehog21
can you send the two models info page (i.e. the original and the updated one) ?
do you see the two endpoints ?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
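As a rough sketch (only --version is confirmed above; the other arguments are assumptions, check clearml-serving model add --help):
clearml-serving model add --endpoint my_model --version 2 --model-id <model_id>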
Will be released shortly with the new RC :)
Could you test with the latest "clearml"? pip install git+
Task.add_requirements(".") should be supported now 🙂
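A minimal sketch of what that looks like (project/task names are placeholders):
from clearml import Task
Task.add_requirements(".")  # must be called before Task.init(); "." adds the local package as a requirement
task = Task.init(project_name="examples", task_name="local package test")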
(BTW: any reason not to use the agent?)
Question - why is this the expected behavior?
It is 🙂 I mean the original python version is stored, but pip does not support replacing the python version. It is doable with conda, but then you have to use conda for everything...
Anyway, in the docs, there is a function called task.register_artifact()
Yes, this is rather deprecated... The idea is that it will monitor an object and auto-sync it (i.e. serialize and upload).
That said, it is just so much easier to do task.upload_artifact, and you can always update/overwrite if you pass the same name, so I cannot see the actual use case. Does that make sense? What are you using it for ?
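e.g. a minimal sketch of the overwrite pattern (names/values are placeholders):
from clearml import Task
task = Task.init(project_name="examples", task_name="artifact demo")
task.upload_artifact(name="stats", artifact_object={"acc": 0.90})
task.upload_artifact(name="stats", artifact_object={"acc": 0.93})  # same name: overwrites the previous upload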
My pleasure 🙂
Maybe we should do a webinar... I have a feeling the MLOps aspects are not as straight forward as we would like to think ...
clearml-task seems to not allow me to pass the run argument without a value
EnviousStarfish54 did you try --args run=True
I'm assuming run is a boolean of a sort ?
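e.g. something like (project/script names here are placeholders):
clearml-task --project examples --name remote-run --script train.py --args run=True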
a. The submitted job would automatically download data from an internal data repository, but it will be time consuming if the data is re-downloaded every time. Does ClearML cache the data somewhere?
What do you mean by the agent will download the data ? are you referring to Dataset ?
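If it is a ClearML Dataset, the local copy is cached between runs; a minimal sketch (project/name are placeholders):
from clearml import Dataset
ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
local_path = ds.get_local_copy()  # first call downloads, later calls reuse the local cache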
well that depends on you, what did you write there to know it is the best one ? file name ? added some metric ?
ClumsyElephant70
Could it be virtualenv package is not installed on the host machine ?
(From the log it seems you are running in venv mode, is that correct?)
So just to be clear - the file server has nothing to do with the storage?
Think of it as a quick and dirty "minio", storing files and serving them over http. If you have minio (or any object storage) you can replace it altogether 🙂
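e.g. a sketch of swapping it for object storage (bucket name is a placeholder), in clearml.conf:
api { files_server: "s3://my-bucket/clearml" }
or per task: Task.init(..., output_uri="s3://my-bucket/clearml")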
PYTHONPATH is still not working as expected
inside your code, if you do:
import os
print("PYTHONPATH", os.environ["PYTHONPATH"])
what are you getting?
because fastai's tensorboard doesn't work in multi gpu
keep me posted when this is solved, so we can also update the fastai2 interface,
and those env variables are credentials for ClearML. Since they are taken from k8s secrets, they are the same for every user.
Oh ...
I can create secrets for every new user and set env variables accordingly, but perhaps you see a better way out?
So the thing is, if a user spins up the k8s job, the user needs to pass their credentials (so the system knows who it is)... You could just pass the user's key/secret (not nice, but probably not a big issue, as everyone is an Admin anyhow,...
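For reference, these are the standard client credential variables a per-user secret would populate (values are placeholders):
CLEARML_API_ACCESS_KEY=<user access key>
CLEARML_API_SECRET_KEY=<user secret key>
CLEARML_API_HOST=<api server url>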
Could that be the proper way to install ?
https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md#3-install-wheels-for-linux
trains-agent should be deployed to GPU instances, not the trains-server.
The trains-agent's purpose is for you to be able to send jobs to a GPU instance (at least in most cases).
The "trains-server" is a control plane , basically telling the agent what to run (by storing the execution queue and tasks). Make sense ?
JitteryCoyote63
Yes, this is extremely annoying. I think it was updated on the community server, let me check if we deployed a new docker with a fix ...
Could I just build it and log these parameters using task.set_parameters() so that I can call task.get_parameters() later?
instead of manually calling set/get, you call task.connect(some_dict_or_object), it does both:
When running manually (i.e. without an agent) it logs the keys/values on the Task,
when running with an agent, it takes the values from the backend (Task) and sets them on the dict/object
Make sense ?
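A minimal sketch of that flow (parameter names/values are placeholders):
from clearml import Task
task = Task.init(project_name="examples", task_name="connect demo")
params = {"lr": 0.001, "batch_size": 32}
params = task.connect(params)  # manual run: logs these values; agent run: values come back from the backend
print(params["lr"])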
GreasyLeopard35 I think you are on to something, I think UniformParameterRange just misses a min value:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L168
Should be:
[self.min_value + v*step_size for v in range(0, int(steps))]
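i.e. in context, a sketch of the fixed sampling (based on the snippet above, not the exact upstream code):
step_size = (self.max_value - self.min_value) / float(steps)
values = [self.min_value + v * step_size for v in range(0, int(steps))]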
Wait, why aren't you just calling Popen? (or os.system), I'm not sure how it relates to the torch multiprocess example. What am I missing ?
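e.g. a minimal sketch of shelling out directly (the command is a placeholder):
import subprocess
proc = subprocess.Popen(["python", "train.py", "--epochs", "10"])
proc.wait()  # block until the child process exits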
UpsetSeaturtle37 good progress. Regarding the error, 0.15.0 is supposed to be out tomorrow, it includes a fix to that one.
BTW: can you run with --debug
BoredHedgehog47 were you able to locate the issue ?
Thank you EnviousStarfish54 !
This is very helpful!
I'm looking at Kedro and the project you shared, and a few thoughts came to mind:
I very much like the idea of using functions as "nodes" (and by extension, using notebook cells with tags as nodes). This got me thinking, I'm pretty sure we could have a similar implementation with ClearML. My thinking is using inspect or dill to convert the functions/cells into plain text code, automatically analyze the runtime requirements, and creat...