
Just one more question: do you have any idea how I could change the x-axis label from "Iterations" to "Epochs"?
You mean in the UI (i.e. just the title) ? or are you actually reporting iterations instead of epochs? and if so is this auto connected to tensorboard or is it reported manually ?
However, when 'extra' is a positional argument then it is transformed to 'str'
Hmm... okay let me check something
- At its simplest, this could just mean checking that all of the steps and the pipeline itself have completed successfully (by checking their "Task status"). If a pipeline step ends with a "failed" status, an exception will be raised in the pipeline execution function; if the exception is not caught, the pipeline itself will also fail.
run pipeline_script.py which contains the pipeline code as decorators.
So in theory the following should actually work.
Let's assume you ...
GiganticTurtle0 so this was already supposed to be out (v1.1), but a minor py2 backwards-compatibility issue delayed it. Anyhow, you can now just call pipeline.start(..)
https://github.com/allegroai/clearml/blob/889d2373988a0d6630703cc1c865e09e58f8f981/examples/pipeline/pipeline_from_tasks.py#L47
(to run it locally call start_locally(...) )
pip install git+
(the new version will be out in a few days, meanwhile you can test the new pipeline interface directly from git)
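In case it helps, here is a rough sketch of the new interface (the project / task names are just placeholders, and I'm assuming the PipelineController class from the linked example):
from clearml import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.add_step(
    name="stage_one",
    base_task_project="examples",      # placeholder: project holding the template Task
    base_task_name="step 1 template",  # placeholder: Task to clone as this step
)
pipe.start()  # enqueue the pipeline controller and run it remotely
# pipe.start_locally(run_pipeline_steps_locally=True)  # or run everything locally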
I am thinking about just installing this manually on the worker ...
If you install them system wide (i.e. with sudo) and add agent.package_manager.system_site_packages
then they will always be available for you 🙂
And then also use
priority_optional_packages: ["carla"]
This actually means that it will always try to install the package carla
first, but if it fails, it will not raise an error.
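For reference, a minimal sketch of how the two settings could sit together in the agent section of clearml.conf (assuming system-wide installed packages, with "carla" just as the example package):
agent {
  package_manager {
    # reuse the system-wide (sudo installed) packages inside the task venv
    system_site_packages: true
    # try to install these first, but do not fail the task if installation fails
    priority_optional_packages: ["carla"]
  }
}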
BTW: this would be a good use case for dockers, just saying :w...
Can you fix locally, just to verify ?
NastySeahorse61 it might be that the frequency it tests the metric storage is only once a day (or maybe half a day), let me see if I can ask around
(just making sure you can still login to the platform?)
In order to facilitate multiple credentials, one must use the ClearML SDK, obviously.
Yes 🙂
Hi UnsightlySeagull42
Basically you can get the agent to always add additional arguments for the docker run, such as -v for mounting:
https://github.com/allegroai/clearml-agent/blob/948fc4c6ce1ecf33a74619ad570d69b8188f6db9/docs/clearml.conf#L133
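For example, a minimal sketch (the host / container paths are placeholders):
agent {
  # always appended to the docker run command, e.g. mounting a host folder
  extra_docker_arguments: ["-v", "/host/data:/mnt/data"]
}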
Hi SmarmySeaurchin8, you can point to any configuration file by setting the environment variable:
TRAINS_CONFIG_FILE=/home/user/my_trains.conf
GreasyPenguin66 Nice !!!
Very cool setup, and kudos on making it work with multiple users!
Quick question, shouldn't the JUPYTERHUB_API_TOKEN env variable be enough to gain access to the server? Why did you need to add it to the 'nbserver-x.json' as well?
Hi RoundMole15
What exactly triggers the "automagic" logging of the model and weights?
framework save call, for example torch.save or joblib.save
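Roughly, a minimal sketch of what triggers it (assuming a PyTorch model; the project / task names are placeholders):
from clearml import Task
import torch

task = Task.init(project_name="examples", task_name="automagic model logging")
model = torch.nn.Linear(4, 2)
# this save call is picked up automatically and registered as an output model
torch.save(model.state_dict(), "model.pt")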
I've pulled my simple test project out of jupyter lab and the same problem still exists,
What is "the same problem" ?
Or is this a feature of hyperdatasets and I just mixed them up?
Ohh yes, this is it. Hyper Datasets are part of the UI (i.e. there is a Tab with the HyperDataset query). Dataset usage is currently listed on the Task. Make sense?
SmugOx94 Yes, we just introduced it 🙂 with 0.16.3
Discussion was here (I'll make sure to update the issue that the version is out)
https://github.com/allegroai/trains/issues/222
In your trains.conf
add the following line:
sdk.development.store_code_diff_from_remote = true
It will store the diff from the remote HEAD instead of the local one.
Hi WorriedParrot51
So I think what you need is to map your external code into the docker, is that correct?
Also you want to always set the PYTHONPATH.
You can achieve both by configuring the trains.conf:
Here you can always add a predefined environment and mount point, regardless of the docker image or other docker arguments:
https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L98
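Something along these lines, as a sketch (assuming the extra_docker_arguments setting; the paths are placeholders, and whether a PYTHONPATH passed this way is preserved may depend on the agent version):
agent {
  # mount the external code into the container and expose it on the PYTHONPATH
  extra_docker_arguments: ["-v", "/host/my_code:/mnt/my_code", "-e", "PYTHONPATH=/mnt/my_code"]
}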
Will this solve the issue?
Hi WorriedParrot51
Take a look at the Experiment execution section:
there is a script path
and a working directory
the working directory is the base of the git repository (which is cloned into the docker container)
So if for some reason trains did not properly detect the current working dir here is what should solve the issue, without changing the PYTHONPATH
script path: ./sub_folder/script.py
working directory: .
What do you think?
Hi ConvolutedSealion94
Just making sure, you spun up the docker-compose of clearml-serving as well?
Are you getting the error from boto failing to launch additional ec2 instances ?
DeliciousBluewhale87
You could also just upload the data (i.e. do not call close()). Then you will be able to change it later; obviously, this will make it intractable.
BTW: the clearml-data stores delta changes, so if you only change a few files it will only store those.
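As a rough sketch of that flow (dataset / project names are placeholders):
from clearml import Dataset

ds = Dataset.create(dataset_name="my dataset", dataset_project="examples")
ds.add_files(path="./data")
ds.upload()  # upload the data, but do not finalize (close) the dataset yet
# you can still add or change files later before finalizing;
# only the delta from the previous version is actually stored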
NastyOtter17 can you provide some more info ?
JitteryCoyote63 of course there is 🙂
Task.debug_simulate_remote_task(task_id="<task_id_here>")
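i.e. something like this, as a sketch (call it before Task.init; the id and names are placeholders):
from clearml import Task

# make this local run behave as if it were the existing remote task
Task.debug_simulate_remote_task(task_id="<task_id_here>")
task = Task.init(project_name="examples", task_name="debug run")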
oh, if this is the case, why not use the "main" server?
Because it lives behind a VPN and github workers don't have access to it
makes sense
If this is the case, I have to admit that combining offline-mode and remote execution makes sense, no?
I thought there would be some hooks for deploying where the integration with k8s was also taken care of automatically.
Hi ObedientToad56
Yes, you are correct. Basically now you have a docker-compose spinning up everything (even though, for example, you can also spin up a standalone container, mostly for debugging).
We are working on a k8s helm chart so that deployment is easier; it will be based on these docker-compose files:
https://github.com/allegroai/clearml-serving/blob/main/docker/docker-comp...
Although it's still really weird how it was failing silently
totally agree, I think the main issue was the agent had the correct configuration, but the container / env the agent was spinning was missing it,
I'll double check how come it did not print anything
So can you verify it can download the model ?
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
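For example, a sketch of packing jobs onto one machine by enqueuing them into a queue that an agent on that machine is serving (the names are placeholders):
from clearml import Task

# clone an existing task and push the copy into the queue served by that machine's agent
template = Task.get_task(project_name="examples", task_name="train")
cloned = Task.clone(source_task=template)
Task.enqueue(cloned, queue_name="single-gpu-machine")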
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
I will take any suggestion 🙂
git remote -v
could be a good start but I'm not familiar with the output structure, is there a template for parsing ?
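Not a full template, but here is the kind of parsing I had in mind as a first pass (assuming the standard "name url (fetch|push)" lines git prints):
import subprocess

out = subprocess.check_output(["git", "remote", "-v"], text=True)
remotes = {}
for line in out.splitlines():
    # each line looks like: "origin  https://github.com/allegroai/clearml.git (fetch)"
    name, url, kind = line.split()
    if kind == "(fetch)":
        remotes[name] = url
print(remotes.get("origin"))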
By default the agent will add the root of the git repository into the PYTHONPATH, so that you can import...
I think CostlyOstrich36 managed to reproduce?!