MelancholyBeetle72 there is an RC with a fix, check the GitHub issue for details :)
GiddyTurkey39 Hmm I'm assuming that by default it cannot access that IP range.
Are you using VirtualBox for the VM?
EDIT:
Can I assume the machine running the VM (a.k.a. the host) can access the trains-server?
So there is no copying of the data to the pod, it is simply referenced via the EFS
Correct
Hi DrabCockroach54
I think the Kubernetes integration (k8s glue) is not part of the open-source features, and is only available as an enterprise feature 😞
No, I just want to register a new model in the storage.
If the model file is already uploaded, you can register it without a Task: InputModel.import_model(...)
https://github.com/allegroai/clearml/blob/b3a2b3425c5098ebfc0598c9dfb3e670d4a87706/clearml/model.py#L521
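A minimal sketch (the storage URL and model name below are hypothetical placeholders):
```
from clearml import InputModel

# Register a model file that already exists in storage, without creating a Task.
# weights_url is a hypothetical example; point it at your uploaded file.
model = InputModel.import_model(
    weights_url="s3://my-bucket/models/model.pkl",
    name="my-registered-model",
)
print(model.id)  # the newly registered model's ID
```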
I need to create a separate task for this right?
If you want the model to be uploaded, then yes, you have to create a Task.
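For example (a sketch; the project/task/file names are hypothetical):
```
from clearml import Task, OutputModel

# Create a Task so ClearML can upload the weights file for you
task = Task.init(project_name="examples", task_name="register-model", output_uri=True)
output_model = OutputModel(task=task, name="my-model")
# update_weights uploads the local file and registers it as the Task's output model
output_model.update_weights(weights_filename="model.pkl")
```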
Hi @<1523702786867335168:profile|AdventurousButterfly15>
Make sure you pass output_uri=True in Task.init.
It will automatically upload your model to the file server. You can also configure it in clearml.conf; look for default_output_uri (under the sdk.development section).
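Something like this (project/task names are just placeholders):
```
from clearml import Task

# output_uri=True uploads model snapshots to the default file server.
# You can also pass an explicit destination, e.g. output_uri="s3://my-bucket/models"
task = Task.init(
    project_name="examples",
    task_name="train-with-upload",
    output_uri=True,
)
```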
OddAlligator72 okay, that is possible, how would you specify the main Python script entry point? (wouldn't that make more sense rather than a function call?)
How do you determine which packages to require now?
Analysis of the actual repository (i.e. it will actually look for imports 🙂 ). This way you get the exact versions you have, but not the clutter of the entire virtual environment
Just so I understand,
scheduler executes main every 60sec
main spins X sub-processes
Each subprocess needs to report scalars ?
I think it would make sense to have one task per run to make the comparison on hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it, I want to make sure we solve this issue 🙂
Would love to just cap it at a fixed amount for a month for API calls.
Try the timeout configuration, I think this should solve all your issues, and will be fairly easy to set for everyone
I see. If you are creating the task externally (i.e. from the controller), you should probably call task.close(); it will return when everything is in order (including artifacts uploaded, and other async stuff).
Will that work?
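Roughly, the flow would be (a sketch; the project/task names are hypothetical):
```
from clearml import Task

# The controller creates the Task externally ...
task = Task.create(project_name="controller-project", task_name="child-task")
# ... do the work / report / upload artifacts ...
task.close()  # returns only after artifacts and other async uploads are done
```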
I have to commit the YAML with my AWS credentials to git.
CleanPigeon16 please do not 🙂
either put them on the Task itself, or as OS env on the machine/agent running the Task.
Regarding where it is stored (I think the default is the DevOps project, need to look at the code)
when I duplicate the experiment and clone it remote, the call is ignored and the recorded values are used?
Yes ScantChimpanzee51 exactly.
Think of it as the initial value you want to put on the Task when you are running the code on your machine. Later, when you clone the Task, you can edit the base docker image in the UI (or with the API); of course the new value is used when the agent spins this Task, and to avoid the actual docker (the one you changed in the UI) being overwritten by ...
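If you want to set that initial value from code, it could look something like this (a sketch; the project/task names and docker image are just examples):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="docker-default")
# Record the default docker image on the Task; when the Task is cloned,
# this value can be edited in the UI and the agent will use the edited value
task.set_base_docker("nvidia/cuda:11.8.0-runtime-ubuntu22.04")
```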
Hi TightElk12
it would raise an error if the env where execution happens is not configured to track things on our custom server to prevent logging to the public demo server ?
What do you mean by that? Reaching the default server instead of the configured one?
Hi @<1691620877822595072:profile|FlutteringMouse14>
In the latest project I created, Hydra conf is not logged automatically.
Any chance the Task.init call is not in the main script (where Hydra is)?
can I mount the s3 bucket as file system on place where
you need to mount it where the file server is storing its files, correct (notice: not the DBs, just the file server)
I'm hoping I can find an end-to-end solution that also includes experiment management
Well of course biased here, but ClearML with the hyperdatasets is probably the most complete one.
Specifically for model performance analysis I would add the voxel open-source tool to dissect specific results, but the combination of the abstraction and query capabilities of hyperdatasets, orchestration, and experiment management is really unmatched.
(and again of course I'm biased, but really there is n...
That is correct. Unfortunately though, this is not part of the open source, which means that with the open source it might be a bit more hands-on to deploy an LLM model
Create a new version of the dataset by choosing what increment in the SemVer standard I would like to add for this version number (major/minor/patch) and upload
Oh, this is already there:
```
from clearml import Dataset

cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# if version is not given, it will auto-increase based on semantic versioning,
# incrementing the last number: 1.2.3 -> 1.2.4
new_ds = Dataset.create(dataset_project="project", dataset_name="name", parents=[cur_ds.id])
```
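To complete the flow, you would then add the changed files and close the version (continuing new_ds from the snippet above; the local path is hypothetical):
```
# Add the new/changed files, then upload and finalize this version
new_ds.add_files(path="data/")
new_ds.upload()
new_ds.finalize()
```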
Hi WittyOwl57, that is an awesome fix! What does "dynamic_ncols" change?
it seems like a tqdm parameter, not sure what it does ...
And you pass:
scheduler.add_task(..., reuse_task=True)
?
Hi RipeGoose2
Are you continuing the Task, i.e. passing Task.init(..., continue_last_task=True)?
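i.e. something like (project/task names are placeholders):
```
from clearml import Task

# continue_last_task=True resumes reporting into the previous Task instead of
# creating a new one (iteration counters continue from where they left off)
task = Task.init(
    project_name="examples",
    task_name="long-training-run",
    continue_last_task=True,
)
```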
I get what you're saying. The only problem is that in the case of AutoLogging, I don't have the model ID for the model being saved.
Task.models['output'] should return all the model objects the autologging created
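For example (a sketch, assuming it runs inside the auto-logging Task):
```
from clearml import Task

task = Task.current_task()  # the Task doing the auto-logging
# task.models maps 'input'/'output' to lists of Model objects
for model in task.models["output"]:
    print(model.id, model.name, model.url)
```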
I am very confused now. I tried switching to my local machine and changing the clearml.conf.
It only partly worked:
Notice that Dataset.get(...) is downloading an artifact that was uploaded before; basically it gets the full URL and downloads the data. It seems the original dataset was uploaded to "localhost:8081", could that be the case?
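To check, you can print the resolved location (a sketch; the project/dataset names are placeholders):
```
from clearml import Dataset

ds = Dataset.get(dataset_project="project", dataset_name="name")
# get_local_copy resolves the stored URL(s) and downloads into the local cache;
# if the stored URL points at "localhost:8081", that is where it will try to pull from
local_path = ds.get_local_copy()
print(local_path)
```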
TrickyRaccoon92
I guess elegant is the challenge 🙂
What exactly is the use case ?
Ohh SubstantialElk6, please use agent RC3 (the latest RC is somewhat broken, sorry, we will pull it)
VivaciousWalrus99 any chance the original Task was executed with Python 2?
what do you have for: ls -la /cs/usr/gal.hyams/.trains/venvs-builds/3.7/bin/