My model files are also there, just placed in some usual non-shared linux directory.
So this is the issue: how would the container get to these models? You either need to mount the folder into the container,
or push them to the ClearML model repo with the OutputModel class. Does that make sense?
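Something along these lines should do it (a minimal sketch; the project/model names and the weights path are placeholders):
` from clearml import Task, OutputModel
task = Task.init(project_name="examples", task_name="register model")
# register an existing weights file in the ClearML model repository
output_model = OutputModel(task=task, name="my_model")
output_model.update_weights(weights_filename="/path/to/model.pt")  # placeholder path `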
Yes, I mean use the helm chart to deploy the server, but manually deploy the agent glue.
wdyt?
btw: what's the OS and python version?
For the on-prem setup you can check the k8s helm charts; they can spin agents for you (static agents).
For the GKE the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
This one should work:
` path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params) `
AFAIK that's the only way right now (see my comment here - https://clearml.slack.com/archives/CTK20V944/p1657720159903739?thread_ts=1657699287.630779&cid=CTK20V944 )
Or then if you have the ClearML paid service, I believe there is a "vaults" service, right AgitatedDove14 ?
Yep UnevenDolphin73 :)
currently I'm doing it by fetching the latest dataset, incrementing the version and creating a new dataset version
This seems like a very good approach, how would you improve it?
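Roughly something like this, I assume (a sketch with placeholder project/dataset names):
` from clearml import Dataset
# fetch the latest version of the dataset
parent = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
# create the new version as a child of the latest one
child = Dataset.create(
    dataset_project="my_project", dataset_name="my_dataset",
    parent_datasets=[parent],
)
child.add_files("/path/to/new_files")  # placeholder path
child.upload()
child.finalize() `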
Hmm, you are missing the entry point in the execution (script path).
Also, as I mentioned, you can either have a git repo or a script in the uncommitted changes, but not both (if you have a git repo then the uncommitted changes are the git diff)
Hmm, let me check, the link is definitely there but this is not a valid link
it knows it's a notebook and automatically adds the notebook as an artifact, right?
correct
and the uncommitted changes become the notebook converted to a script?
correct
In one case I am seeing actual git diff coming in instead of the notebook.
It might be that there are both a git repository and a notebook, and the git diff shows before the notebook is detected and shown instead? (There is a watchdog refreshing the notebook every 30 sec or so.)
Then the type hints are not removed from helper and the code immediately crashes when being run
Oh yes I see your point, that does make sense (btw removing the type hints will solve the issue)
regardless let me make sure this is solved
Hi @<1535069219354316800:profile|PerplexedRaccoon19>
On debugging, it looks like indices are corrupt.
ishhhhh, any chance you have a backup?
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
But as you described, it looks like an edge case, so I don't mind 🙂
I guess last followup question, is there a way to cap costs?
Scale tier? (I know it is not per usage, but it is probably more than $15 per user 🙂)
Hi @<1536881167746207744:profile|EnormousGoose35>
Could we just share the entire project instead of the workspace?
You mean allow access to a project between workspaces ?
If the answer is yes, then unfortunately the SaaS version (app.clear.ml) does not really support this level of RBAC; this is part of the enterprise version, which assumes a large organization with the need for that kind of access limit.
What is the use case ? Why not just share the entire workspace ?
Yep... they are pushing "heavy" users away from these instances. Nothing really you can do, maybe switch to Azure/GCP, but it might be the same there
Hi CooperativeFox72 ,
From the backend guys, long story short, upgrade your machine => more CPU cores, more processes, it is that easy 🙂
Hi @<1562973083189383168:profile|GrievingDuck15>
Thanks for noticing, yes the API is always versioned, we should make it clear in the docs. Also if you need the latest one, use version 999; it will default to the latest one it can support.
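For example (illustration only; the exact endpoint path and auth requirements are assumptions based on the usual <api_server>/v<version>/<service>.<action> format):
` import requests
# 'debug.ping' is used here just as a simple endpoint to show the version prefix
resp = requests.get("https://api.clear.ml/v999/debug.ping")
print(resp.json()) `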
If you have a requirements file then you can specify it: `Task.force_requirements_env_freeze(requirements_file='requirements.txt')`
If you just want the pip freeze output to be shown in your "Installed Packages" section then use: `Task.force_requirements_env_freeze()`
Notice that in both cases you should call the function before you call `Task.init()`
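i.e. something like this (a minimal sketch, assuming a requirements.txt sitting next to your script):
` from clearml import Task
# must be called before Task.init()
Task.force_requirements_env_freeze(requirements_file='requirements.txt')
task = Task.init(project_name="examples", task_name="freeze example") `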
btw, what do you mean by "Packages will be installed from projects requirements file" ?
Omg that's a lot of submodules!
It has nothing to do with what the task sees; if you are inside a git repo you will have to clone it on the remote machine. Let me check in the code, maybe you have a workaround
Are there any services OOB like this?
On the open-source version I can't recall any, but it would probably be easy to write. The paid tier might have an offering though, not sure 🙂
Hi ShaggyHare67 ,
Yes the trains.conf created by trains-agent is basically an extension of the trains usage (specifically it adds a section for the agent)
I'm assuming you are running the agent on the same development machine.
I guess the easiest is to rename the trains.conf to trains.conf.old and run trains-agent init
(No need to worry, the trains package supports it, so the new configuration file that will be generated will work just fine)
Hi @<1556812486840160256:profile|SuccessfulRaven86>
I'm assuming this relates to the SaaS service.
API calls are a way to measure usage: basically metric reports are bunched into a single call, agent pings / queries are API calls, and so on and so forth.
How many hours did you have training tasks reporting data? How many agents were running? And so on.
Hi @<1526371965655322624:profile|NuttyCamel41>
How are you creating the model? specifically what do you have in "config.pbtxt"
Specifically, any Python code should be in the pre/post processing code (which actually does not run on the GPU instance)
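For reference, the pre/post processing code is a Python class along these lines (a rough sketch based on the clearml-serving examples; exact signatures may differ between versions):
` from typing import Any

class Preprocess(object):
    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # turn the request body into model input (runs on the serving instance, not the GPU)
        return body["data"]  # placeholder field name

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # turn the raw model output into the response body
        return {"result": data} `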
Verified.
BattyLizard6 can you open a github issue? I want to make sure this issue is addressed 🙂
` logger.report_scalar("loss", "train", iteration=0, value=100)
logger.report_scalar("loss", "test", iteration=0, value=200) `
Hmm, is there a way to do this via code?
Yes, clone the Task with `Task.clone`
Then do `data = task.export_task()` and edit the data object (see the execution section)
Then update back with `task.update_task(data)`
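i.e. roughly (a sketch; the task id is a placeholder, and the exact keys inside the exported dict are an assumption, check the execution section of the exported data):
` from clearml import Task
task = Task.get_task(task_id="<source_task_id>")
cloned = Task.clone(source_task=task, name="edited clone")
data = cloned.export_task()
data["script"]["entry_point"] = "my_script.py"  # hypothetical edit
cloned.update_task(data) `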