Reputation
Badges 1
25 × Eureka!GreasyLeopard35 I think you are on to something, I think UniformParameterRange just misses a min value:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L168
Should be:[self.min_value + v*step_size for v in range(0, int(steps))]
But the missing implementation of LogUniformRange for hpbandster still causes problems.
wdym?
GreasyLeopard35
I can update that the fix to UniformIntegerParameterRange should be pushed with tomorrows release π
(which would fix in turn LogUniformParameterRange)
Hi BurlyRaccoon64
What do you mean by "custom_build_script" ? not sure I found it in "clearml,conf"
https://github.com/allegroai/clearml-agent/blob/master/docs/clearml.conf
WickedGoat98 Forever π
The limitation is on the storage size
SarcasticSquirrel56
if I configure manually the pods for the different nodes, how do I make clearml server aware that those agents exist?
Basically the agent register themselves on your cleaml-server, and they register on which Queue(s) they listen to. In other words the interface to choose the different types of machines/gpus is by enqueue the Task to different queues.
For example: Queue(1): "CUDA11_GPUx1" , Queue(2): "CUDA10_GPUx1"
Make sense ?
EDIT:
I guess to achieve what I w...
If there a way to do this without manually editing installed packages?
Running your code once with Task.init
should automatically detect all the directly imported packages, then when trains-agent
executes the Task, it will install them to a clean venv and put back all the packages inside the venv.
In order for all the used packages (e.g. bigquery) to appear in the "Installed packages" your cide needs to be executed once manually (i.e. not with trains-agent) then the ` tra...
diff line by line is probably not useful for my data config
You could request a better configuration diff feature π Feel free to add to GitHub
But this also mean I have to first load all the configuration to a dictionary first.
Yes π
Let me ping you back when the GitHub repo is synced, so you can test the latest and greatest :)
Thanks BoredHedgehog47 !
And yes if the Task.init() call was only in main.py
then the TB inside the subprocess (train.py) would as you perceived not be captured.
Did you by any chance test calling Task.init in Both main.py
and train.py
?
oh the pipeline logic itself holds one "job" on the worker, and this is why you do not have any other spare workers to run the components of the pipeline.
Run your worker with --services-mode
, it will launch multiple Tasks at the same time, it should solve the issue
Hi JitteryCoyote63
Do you have a specific example in mind ?
HealthyStarfish45 the pycharm plugin is mainly for remote debugging, you can of course use it for local debugging but the value is just to be able to configure your user credentials and trains-server.
In remote debbugging, it will make sure the correct git repo/diff are stored alongside the experiment (this is due to the fact that pycharm will no sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
in Your Additional ClearML Configuration
(which is basically clearml.conf configuration)
Add the following:environment { GOOGLE_APPLICATION_CREDENTIALS="~/gs.cred" } files { gsc { contents: "<this is your GCP storage credentials file>" path: "~/gs.cred" } }
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a...
Hi PanickyMoth78
Hmm yes, I think the StorageManager (i.e. the google storage pythonclinet) also needs a json file with the credentials.
Let me check something
I think it should look something like:files { gsc { contents: """{"type": "service_account", "project_id": "ai-platform", "private_key_id": "9999", "private_key": "-----BEGIN PRIVATE KEY-----==\n-----END PRIVATE KEY-----\n", "client_email": "a@ai.iam.gserviceaccount.com", "client_id": "111", "auth_uri": "
", "token_uri": "
", "auth_provider_x509_cert_url": "
", "client_x509_cert_url": "
"}""" path: "~/gs.cred" } }
Yes, I mean trains-agent. Actually I am using 0.15.2rc0. But, I am using local files, I mean I clone trains and trains-agent repos and install them. Their versions are 0.15.2rc0
I see, that's why we get the git ref, not package version.
Yes, found the issue :) I'll see to it there is a fix in the next RC. ETA early next week
Hi @<1556450111259676672:profile|PlainSeaurchin97>
Is there any simple way to use
argparse
to pass a clearml task name?
need to call
args = task.connect(args)
.
noooo π there is no need to do that, the arguments are automatically detected
see for yourself
args = parse_args()
task = Task.init(task_name=args.task_name)
Why would that require refactoring ? Dataset class should take care if it internally ,no?
The reason my_name is a subproject , is that so every version could be a "Task" inside that project , just easier to manage (or at least that was the idea)
Fix pushed to github πpip install git+
You mean like a name of the artifact ?
Could it be that this is the callback that causes it?
None
Hi AttractiveWoodpecker16
I think is the correct channel for that question.
(any chance you can move your thread there?)
Specifically just email billing@clear.ml they will cancel (no need to worry about the beginning of the month, just explain and they will not charge over Nov)
EDIT: I know they are working on making it a one click in the UI, main limit is what happens with the data that was stored and was above the free tier threshold, anyhow I think next version will sort that as well.
BoredHedgehog47 can you test this one? Is it close to your code ?
because fastaiβs tensorboard doesnβt work in multi gpu
keep me posted when this is solved, so we can also update the fastai2 interface,