WickedGoat98 is this related to plotly opening a web page when you call the show() method?
You can do:
if not Task.running_locally():
    fig.show()
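For context, a minimal sketch (the project/task names and the figure itself are illustrative, not from the original article) of how that guard could sit in a reporting script:

```python
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name="examples", task_name="plotly guard")  # illustrative names

fig = go.Figure(data=go.Scatter(y=[1, 3, 2]))

# report the figure to ClearML so it appears in the Web UI in any case
task.get_logger().report_plotly(title="demo", series="demo", iteration=0, figure=fig)

# gate the interactive show() call exactly as suggested above
if not Task.running_locally():
    fig.show()
```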
Okay, I was able to reproduce it (this is odd) let me check ...
WickedGoat98 what's the clearml version you are using?
Hi GrotesqueOctopus42
In theory it can be built, the main hurdle is getting elk/mongo/redis containers for arm64 ...
Hi WickedGoat98
I am trying to write an article on Medium about ClearML and ran into a problem with plotly figures.
This is awesome !
I ran the plotly_reporting.py example locally and the uploaded plot was ok.
So are you saying the same example code from the repository worked okay on your server but showed nothing on the hosted server ?
WickedGoat98 until the next RC release (should not take long) this will solve it:
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
df.index = df.index.astype(str)
setattr(df, 'ticker', args.symbol)
Basically removing the nan and converting the datetime to string representation (so plotly.js likes it)
Hey WickedGoat98
I found the bug, it is due to the fact that the numpy array (passed to plotly) contains both datetime and nan, and plotly.js does not like it. I'll make sure this is fixed; in the meantime you can just remove the first row (it contains the nan):
df = pd.concat([tickerDf.Close, tickerDf_Change.Close_pcent], axis=1)
df = df[1:]
WickedGoat98 Nice!!!
BTW: The fix should solve both (i.e. no need to manually cast), I'll make sure the fix is on GitHub so you'll be able to verify
WickedGoat98 Same for me, let me ask the UI guys, I think this is a UI bug.
Also maybe before you post the article we could release a fix to both, what do you think?
EDIT:
Never mind, I just saw the Medium link, very cool!!!
WickedGoat98 this is awesome! Let me know how I could help
BTW: I checked regarding the plot comparison, this is a BE issue due to the size of the plot, I was told a fix will be deployed in a day or two.
WickedGoat98 give me a minute, I'm not sure it is not ClearML related
One last thing: make sure you spin the pod container in privileged mode, because the trains-agent docker will spin a sibling docker for your actual experiment.
WickedGoat98 Basically you have two options:
1. Build a docker image with wget installed, then in the UI specify this image as the "Base Docker Image".
2. Configure the trains.conf file on the machine running the trains-agent, with the above script. This will cause trains-agent to install wget in any container it runs, so it is available for you to use (saving you the trouble of building your own container).
With either of these two, by the time your code is executed, wget is installed and available.
trains-agent should be deployed to GPU instances, not the trains-server.
The purpose of trains-agent is to let you send jobs to a GPU instance (at least in most cases).
The "trains-server" is a control plane, basically telling the agent what to run (by storing the execution queue and tasks). Make sense ?
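To make the "control plane" idea concrete, here is a hedged sketch (project, task and queue names are made up) of how a task is stored on the server, enqueued, and later picked up by an agent running on a GPU machine:

```python
from clearml import Task

# the server stores tasks and the execution queue; a trains-agent running on a
# GPU instance pulls whatever is enqueued on its queue and executes it.
template = Task.get_task(project_name="examples", task_name="training template")  # illustrative
cloned = Task.clone(source_task=template, name="training run #1")
Task.enqueue(cloned, queue_name="gpu_queue")  # queue name is illustrative
```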
WickedGoat98 sorry, I missed the thread...
that the trains.conf has to be located on the node running the trains-agent.
Correct
The easiest way to check is to see if you can curl to the ip:port from the docker.
If it fails, it is probably the wrong IP.
The IP you need to use is the IP of the machine running the docker-compose (not the IP of the docker inside that machine).
Make sense ?
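If curl is not available inside the container, a quick equivalent check can be done from Python (the address and the debug.ping endpoint below are assumptions; substitute the IP of the machine running docker-compose and the API port you configured):

```python
import requests

API_SERVER = "http://192.168.1.10:8008"  # illustrative IP:port of the api server

try:
    # assumed lightweight health endpoint of the api server
    r = requests.get(f"{API_SERVER}/debug.ping", timeout=5)
    print("api server reachable, status:", r.status_code)
except requests.exceptions.ConnectionError as err:
    print("cannot reach the api server, probably the wrong IP:", err)
```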
Hi WickedGoat98
A few background notions:
Docker containers do not store their state, so if you install something inside a container, the moment you leave it is gone, and the next time you start the same container you start from the same initial setup (this is a great feature of Docker). It seems the docker image you are using is missing wget. You could build a new image (see the Docker website for more details on how to use a Dockerfile). The way trains-agent works in dockers is it installs everything you ne...
Okay, so basically set a template for the pod, specifying the docker image. Make sure you pass the correct trains-server configuration (i.e. api/web/file server addresses and credentials), and select the queue name the agent will listen to.
container image / details
https://hub.docker.com/r/allegroai/trains-agent
https://github.com/allegroai/trains-agent/tree/master/docker/agent
Full environment variable list to pass can be found here:
https://github.com/allegroai/trains-server/blob/...
WickedGoat98
Put the agent.docker_preprocess_bash_script
in the root of the file (i.e. you can just add the entire thing at the top of the trains.conf)
Might it be possible that I can place a trains.conf in the mapped local folder containing the filesystem and mongodb data etc e.g.
I'm assuming you are referring to the trains-agent services; if this is the case, sure you can:
Edit your docker-compose.yml, under line https://github.com/allegroai/trains-server/blob/b93591ec3226...
For example:
agent.docker_preprocess_bash_script = [
    "echo 'Binary::apt::APT::Keep-Downloaded-Packages \"true\";' > /etc/apt/apt.conf.d/docker-clean",
    "apt-get update",
    "apt-get install -y wget",
    "echo \"we have wget\"",
]
Hi WickedGoat98
This sounds like a great design (obviously you have scale in mind). Feel free to ask "stupid" questions; based on what you already wrote, I doubt they will be.
A few questions that come to mind (probably a few others after):
You mentioned FS synchronization, from where? i.e. what is the single source of truth?
K8s (Rancher 2.0 is basically a k8s manager) can take care of mounting volumes, so there is no need to sync; is this a valid solution?
BTW: (you can drag and drop an i...
ColossalAnt7 I would do the following:
1. Configure trains-server user/pass, mounting the API server configuration file as described in the trains-server documentation (intermediate temporary step).
2. Start by providing the ML guys with VPN access that allows them to reach the trains-server api/web/file ports directly (caveat: the IP/sub-domain needs to be solved).
3. Configure a ConfigMap to do the routing/ingest (this solves the IP/Sub-Domain issue) and allow the VPN to access the single entrypoint...
Hi UpsetBlackbird87
This is an Optuna decision on how many trials to run concurrently.
You limited it to 100, but remember Optuna runs a Bayesian optimization process, where it decides on the next set of arguments based on the performance of the previous ones; this means it will first try X trials, then decide on the next batch.
That said, you can pass a pruner to Optuna specifying how it should start.
https://optuna.readthedocs.io/en/v1.4.0/reference/pruners.html#optuna.pruners.Median...
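As an illustration only (not the original experiment), a hedged Optuna sketch showing how a pruner such as the MedianPruner the link appears to reference controls startup behaviour:

```python
import optuna

def objective(trial):
    # toy objective with intermediate reports so the pruner has something to act on
    x = trial.suggest_float("x", -10, 10)
    value = (x - 2) ** 2
    for step in range(20):  # pretend training loop
        trial.report(value / (step + 1), step)
        if trial.should_prune():  # the pruner can stop hopeless trials early
            raise optuna.exceptions.TrialPruned()
    return value

study = optuna.create_study(
    # n_startup_trials: how many trials run before pruning decisions kick in
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=2),
)
study.optimize(objective, n_trials=100, n_jobs=4)  # the numbers are illustrative
```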
Hi @<1523715429694967808:profile|ThickCrow29>
clearml.automation.auto_scaler.AutoScaler which runs smoothly (kudos!!).
NICE!
The only thing I am missing is the in the clearml dashboard/orchestration --> Is there a way to make it
hmm kind of needs backend support for that
For now, I can just see the log of the clearML task to monitor what's happening
Or is this restricted to pro users ?
Yeah the GCP and AWS autoscalers dashboards are paid tier feature. But...
Thank you @<1689446563463565312:profile|SmallTurkey79> !!!
Hi BeefyHippopotamus73
I checked the template task and the list of "Installed Packages" indeed does not have one of my required packages in the list.
Basically the "installed packages" list is auto-populated based on the directly imported packages in your code base.
Could it be you do not have an import of snowflake-connector-python anywhere in your code,
and this is a derivative package (i.e. required by a different package)?
BTW: when you clone your Task in the UI you can edit and add the missing packages,...
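One hedged way to handle a package that is only an indirect dependency is to declare it explicitly before Task.init using the SDK's Task.add_requirements (the project/task names below are made up):

```python
from clearml import Task

# explicitly add the pip package to the Task's "Installed Packages",
# useful when it is pulled in indirectly and never imported directly in the code
Task.add_requirements("snowflake-connector-python")

# must be called before Task.init for the requirement to be recorded
task = Task.init(project_name="examples", task_name="snowflake job")  # illustrative names
```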
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility
Please do
I look forward to your response on Github.
Great, I would like to make this discussion a bit more open and accessible so GitHub is probably better
I'd like to start contributing to the project...
That will be awesome!