This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
So this is Optuna: the idea is it will test which parameters have potential (with early stopping), then launch a subset of the selected parameters
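The idea above (cheaply test many parameter sets, stop the weak ones early, and keep only a promising subset) can be sketched as a successive-halving loop. This is a toy illustration in pure Python, not the Optuna API; the objective function and parameter grid are hypothetical.

```python
def successive_halving(param_sets, evaluate, budget=1, keep=0.5, rounds=3):
    """Toy sketch: score all candidates on a small budget, keep the
    promising fraction, and re-run the survivors with a bigger budget."""
    candidates = list(param_sets)
    for _ in range(rounds):
        if len(candidates) <= 1:
            break
        scored = sorted(candidates, key=lambda p: evaluate(p, budget), reverse=True)
        candidates = scored[:max(1, int(len(scored) * keep))]
        budget *= 2  # survivors get a larger training budget next round
    return candidates

# Hypothetical objective: a learning rate closer to 0.1 scores higher.
def evaluate(params, budget):
    return -abs(params["lr"] - 0.1)

grid = [{"lr": lr} for lr in (0.001, 0.01, 0.05, 0.1, 0.5)]
best = successive_halving(grid, evaluate)
# best -> [{"lr": 0.1}]
```

In a real setup the pruning decision would come from intermediate metrics (e.g. validation loss per epoch) rather than a single cheap score.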
Thank you ElegantCoyote26 for catching that!
Nice!
Is `trainsConfig` a pure text blob?
Could it be pandas was not installed on the local machine ?
Also, there was a trick that worked in a previous bug: could you zoom out in the browser and see if you suddenly get the plot?
E.g., I'm creating a task using `clearml.Task.create`; often it doesn't capture the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically, is there a reason not to use Task.init?
Import Error sounds so out of place it should not be a problem :)
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing: it looks like the API server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose file, with nginx as a load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
```
apiserver:
  command:
  - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-stopped
```
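A minimal sketch of the load-balancing side: duplicate the apiserver service under a second name, then point nginx at both. The service name `apiserver-2` and port 8008 are assumptions; match them to your actual docker-compose file.

```
# nginx.conf fragment (sketch): round-robin across two apiserver containers.
upstream clearml_api {
    server apiserver:8008;
    server apiserver-2:8008;
}
server {
    listen 8008;
    location / {
        proxy_pass http://clearml_api;
    }
}
```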
The easiest is to pass an entire `trains.conf` file
UnevenDolphin73 sounds great, any chance you can open a Git issue on the clearml-agent repo for this feature request?
Can my request be made a new feature, so that we can tag the same type of graphs under one main tag?
Sure, open a Git Issue :)
Sounds great! Let me know what you find out
With pleasure!
:param list(str) xlabels: Labels per entry in each bucket of the histogram (vector), i.e. one label per histogram bar on the x-axis. (Optional)
Hi SmallDeer34
Did you call Task.init ?
Hi ColossalDeer61 ,
the next trains-agent RC (solving the #196 issue) will also solve the double-install issue
Hi TrickyRaccoon92
Yes, please update me once you can. I would love to be able to reproduce the issue so we could fix it for the next RC
RoughTiger69
Move the files locally (i.e., based on the example, move folder `b` into folder `a`). Create a new version with two parents ('a' and 'b'), then sync the local root folder ('a' in your case). Only the metadata should change (because the referenced files are already in one of the datasets). wdyt?
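To illustrate why only metadata changes, here is a toy model of versioned datasets (this is not the ClearML Dataset API): each version stores only file-to-hash references plus parent links, so a child whose files all exist in its parents records no new data at all.

```python
# Toy model: a dataset version holds only {relative_path: content_hash}
# references plus links to parent versions.
class DatasetVersion:
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)
        self.files = {}  # only NEW or CHANGED entries live here

    def resolve(self):
        """Effective file list = parents' files overlaid by own entries."""
        merged = {}
        for parent in self.parents:
            merged.update(parent.resolve())
        merged.update(self.files)
        return merged

a = DatasetVersion("a"); a.files = {"x.csv": "hash1"}
b = DatasetVersion("b"); b.files = {"b/y.csv": "hash2"}

# New version with two parents: every referenced hash already exists in
# a parent, so no data is re-uploaded -- the child stores only metadata.
c = DatasetVersion("c", parents=[a, b])
```

Calling `c.resolve()` yields both parents' files, while `c.files` itself stays empty, mirroring the "only metadata changes" point above.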
What about the epochs though? Is there a recommended number of epochs when you train on that new batch?
I'm assuming you are also using the "old" images ?
The main factor here is the ratio between the previously used data and the newly added data; you might also want to resample (i.e., train more on) the new data vs. the old data. Make sense?
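The oversampling idea could look like the following (a hypothetical helper, stdlib only): draw training samples with a higher per-item weight for the newly added data.

```python
import random

def make_training_stream(old_data, new_data, new_weight=3.0, n=10):
    """Oversample newly added data relative to the old set.
    new_weight=3.0 means each new sample is ~3x as likely per draw."""
    population = old_data + new_data
    weights = [1.0] * len(old_data) + [new_weight] * len(new_data)
    return random.choices(population, weights=weights, k=n)

random.seed(0)
old = [("old", i) for i in range(100)]
new = [("new", i) for i in range(10)]
batch = make_training_stream(old, new, new_weight=3.0, n=1000)
new_fraction = sum(1 for tag, _ in batch if tag == "new") / len(batch)
# with 100 old items at weight 1 and 10 new items at weight 3,
# new_fraction lands around 30 / 130 ~= 0.23
```

The `new_weight` value is the knob from the message above: tune it against how different the new batch is from the old distribution.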
now I can't download either of them
would be nice if the address of the artifacts (state and zips) was assembled on the fly and not hardcoded into the DB.
The idea is this is fully federated, the server is not actually aware of it, so users can manage multiple storage locations in a transparent way.
if you have any tips on how to fix it in MongoDB, that would be great...
Yes, that should be similar, but the links would be in the artifact property on the Task object
not exactly...
To automate the process, we could use a pipeline, but first we need to understand the manual workflow
I've tried setting up a ClearML application on OpenShift
First, my condolences: OpenShift...
Second, what you need to make sure is that each container (i.e., ELK/Mongo etc.) has its own PV for persistent storage; I'm assuming this is the root cause of the error.
Make sense to you ?
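For reference, a per-service claim might be sketched like this. All names, sizes, and the storage class are placeholders; you would repeat a claim like it for each stateful service (Elasticsearch, MongoDB, Redis, fileserver) and mount it in the corresponding pod spec.

```
# Hypothetical PVC sketch -- adjust name, size and storageClassName.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: clearml-mongo-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```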