Is there any way to get just one dataset folder of a Dataset? e.g. only "train" or only "dev"?
They are usually stored in the same "zip" so basically you have to download both folders anyhow, but I guess if this saves space we could add this functionality, wdyt?
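For reference, a minimal sketch of how a full dataset copy is fetched today (project/dataset names are hypothetical):
from clearml import Dataset

# hypothetical project / dataset names
ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
# returns a cached local copy of the entire dataset ("train" and "dev" together)
local_path = ds.get_local_copy()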
LovelyHamster1 verified, this is a UI bug with an old limitation still enforced.
I will make sure they know about it, it should be fixed for the upcoming release 🙂
If I edit the OmegaConf directly in the UI, then the port changes correctly
This will only work if you change Hydra/allow_omegaconf_edit to True in the UI. Did you?
Hi LovelyHamster1
As you noted, passing overrides in Args/overrides, for example ['training.max_epochs=1000'], should work when running with the agent.
Could you verify with the latest RC? There was a fix to support the latest Hydra version:
pip install clearml==0.17.5rc5
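If it still fails, a quick sanity check (task ID and queue name are hypothetical) is to set the override through the SDK and enqueue:
from clearml import Task

# hypothetical source task ID and queue name
base = Task.get_task(task_id="<task-id>")
cloned = Task.clone(source_task=base)
cloned.set_parameter("Args/overrides", "['training.max_epochs=1000']")
Task.enqueue(cloned, queue_name="default")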
Hi @<1547390415320125440:profile|SilkySparrow85>
because it is trying to send a debug-sample to the fileserver!
Yes, you should always configure the "files server" to point to your MinIO S3, basically:
files_server: "s3://<minio-host>:<port>/<bucket>"
But do not forget to also configure the credentials here:
https://github.com/allegroai/clearml/blob/40c6db9d95016382c721546d42...
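Putting the two together, a minimal sketch of the relevant clearml.conf sections (host, bucket and keys are placeholders):
api {
    # point the files server at your MinIO bucket (placeholder URL)
    files_server: "s3://my-minio-host:9000/clearml-bucket"
}
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # placeholder host and keys
                    host: "my-minio-host:9000"
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
}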
Hi SkinnyPanda43
No idea what the ImageId actually is.
That's the AMI image ID string that the new EC2 instance will be started with, make sense?
This workflow however is the only way I have found to easily fix my previous ‘Module not found’ errors
Hmm okay, makes sense.
Did you try to set these?
Or even hack the sys.path with something like:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.dirname(__file__) + "/../"))
PyCharm does get confused sometimes.
Hi @<1545216070686609408:profile|EnthusiasticCow4> let me know if this one solves the issue
pip install clearml==1.14.2rc0
Thanks for the details @<1597762318140182528:profile|EnchantingPenguin77>
clearml.Auto-Scaler - INFO - New instance b97e702d-e2b3-4f28-adab-be59648601ea listening to test-gpu queue
This looks like a new agent was spun up on your EC2 account, can you see it in the "Workers" page?
@<1597762318140182528:profile|EnchantingPenguin77> can you provide the full log?
Hi SparklingElephant70
Anyone know how to solve this?
I tried git push before,
Can you send the entire log? Could it be that the requested commit ID does not exist on the remote git (for example, a force push deleted it)?
Oh you can definitely use the RestAPI, but in this specific case, I'm not sure there is something better.
(BTW: look for APIClient, it's a pythonic interface for the RestAPI)
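A minimal usage sketch (the query filter is just an example):
from clearml.backend_api.session.client import APIClient

client = APIClient()
# e.g. list completed tasks (filter values are just an example)
tasks = client.tasks.get_all(status=["completed"])
for t in tasks:
    print(t.id, t.name)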
Hmm that should have worked ...
I'm assuming the Task itself is running on a remote agent, correct ?
Can you see the changes in the OmegaConf section ?
What happens when you pass --args overrides="['dataset.path=abcd']"?
AFAIK that's the only way right now (see my comment here - https://clearml.slack.com/archives/CTK20V944/p1657720159903739?thread_ts=1657699287.630779&cid=CTK20V944 )
Or, if you have the ClearML paid service, I believe there is a "vaults" service, right AgitatedDove14?
Yep UnevenDolphin73 :)
WittyOwl57 could it be that the EC2 instance is too small (i.e. not enough storage / memory)?
I tried specifying helper functions but it still gives the same error.
What's the error you are getting ?
I would like to force the usage of those requirements when running any script
How would you force it? Will you just ignore the "Installed Packages" section ?
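If you do want to hard-code it, one option (a sketch; the requirements file path is an assumption) is to force the requirements from a file before Task.init:
from clearml import Task

# must be called *before* Task.init(); file path is an assumption
Task.force_requirements_env_freeze(force=True, requirements_file="requirements.txt")
task = Task.init(project_name="examples", task_name="my-task")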
Hi OutrageousSheep60
Is there a way to instantiate a clearml-task while providing it a Dockerfile that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by passing the "dockerfile" script as --docker_bash_setup_script
Notice of course that this is an actual bash script, not a Dockerfile, so no need for the "RUN" prefix.
wdyt?
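A sketch of what that could look like (file names, packages and queue are hypothetical):
# setup.sh - the Dockerfile's RUN lines, without the "RUN" prefix
apt-get update && apt-get install -y libsndfile1
pip install some-extra-package

clearml-task --project examples --name my-task --script train.py \
    --docker python:3.9 --docker_bash_setup_script setup.sh --queue default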
okay this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue ?
PompousParrot44 What is the "working directory" on the experiment itself? and the "script path"?
Based on what you wrote above, in order for it work you should have:
working directory: "."
script path: "-m test.scripts.script"
notice no "--args" and working directory is "." (i.e. the root of the repository)
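In other words, with these values the agent will effectively run something like (a sketch, not the literal agent command):
cd <repo-root>
python -m test.scripts.script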
(just using a local server not connected to the Internet), am I right?
You can if you host your own git server. Or, if your code is a single file / Jupyter notebook, then the entire code is stored on the Task.
btw: what is the exact setup, how come there is no git repo?
I use torch.save to store some very large model, so it hangs forever when it uploads the model. Is there some flag to show a progress bar?
I'm assuming the upload is http upload (e.g. the default files server)?
If this is the case, the main issue is that we do not have callbacks on http uploads to update the progress (which I would love a PR for, but this is actually a "requests" issue)
I think we had a draft somewhere, but I'm not sure ...
So General would have created a General instead of Args?
yes,
This is a must, you have to specify the hyperparameters section you are referencing.
https://github.com/allegroai/clearml/blob/5a9155b2039413280f13dfded1121470c4c4323d/examples/pipeline/step2_data_processing.py#L21
This is actually: task.connect(args, name='General')
Basically there is no "random_state" only "General/random_state"
Make sense ?
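A minimal sketch (project/task names are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="step2")
args = {"random_state": 42}
# connected under "General", so the parameter is addressed as "General/random_state"
task.connect(args, name="General")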
Hi @<1523701304709353472:profile|OddShrimp85>
Is there anywhere I could get a chart that can work with a lower version of k8s? Or any other methods?
I think the solution is to install it manually from the helm chart (basically take it out and build a Job YAML), wdyt?
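Very roughly, something like this (image, credentials and queue are all placeholders, take the real values from the chart):
apiVersion: batch/v1
kind: Job
metadata:
  name: clearml-agent
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: clearml-agent
          image: allegroai/clearml-agent:latest   # placeholder image
          env:
            - name: CLEARML_API_HOST
              value: "http://clearml-apiserver:8008"   # placeholder
            - name: CLEARML_API_ACCESS_KEY
              value: "<access-key>"
            - name: CLEARML_API_SECRET_KEY
              value: "<secret-key>"
          command: ["clearml-agent", "daemon", "--queue", "default"]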
Are you doing from keras import ... or from tensorflow.keras import ...?
This means it will always authenticate with SSH (force_git_ssh_protocol)
...
But it seems you need mixed behavior ?
Are you using GitHub as the git provider?
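For reference, the relevant setting lives in clearml.conf (a sketch of the agent section):
agent {
    # always clone/fetch over SSH instead of HTTPS
    force_git_ssh_protocol: true
}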