AstonishingSeaturtle47 I think there's a workaround for the GitHub multiple repo issue. See https://gist.github.com/gubatron/d96594d982c5043be6d4
Seems like something is not working with the server, i.e. it cannot connect with one of the dockers.
May I suggest to carefully go through all the steps here, make sure nothing was missed
https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
Especially number (4)
Try removing this magic environment that tells the sub-process there was already an Initialized Task.
import os
env = dict(**os.environ)
env.pop('TRAINS_PROC_MASTER_ID', None)
🙂
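A minimal sketch of how the cleaned environment could then be handed to the sub-process. The child command here is hypothetical and only reports whether it still sees the variable; in practice you would launch your own script the same way.

```python
import os
import subprocess
import sys

# Copy the current environment and drop the marker variable, if present.
env = dict(**os.environ)
env.pop('TRAINS_PROC_MASTER_ID', None)

# Hypothetical child command: it only reports whether the marker is still visible.
child_code = "import os; print('TRAINS_PROC_MASTER_ID' in os.environ)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,              # the child gets the cleaned copy, not os.environ
    capture_output=True,
    text=True,
    check=True,
)
child_sees_marker = result.stdout.strip()
print(child_sees_marker)
```

Passing `env=` leaves the parent's own `os.environ` untouched, so only the sub-process starts without the marker.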
Are you getting the error from boto failing to launch additional ec2 instances ?
Nooooooooooooooooooooooo
NICE! MoodyCentipede68 this is awesome 🙂
Hi ZanySealion18
I'm using SSH for authentication, however, known_hosts doesn't seem to be passed to the docker so it prompts for authentication/fingerprint. Any ideas?
Hmm it is supposed to automatically mount your ~/.ssh folder into the docker to solve for that.
First try to set force_git_ssh_protocol: true
If that does not he...
Thanks GorgeousMole24
That is a very good point! passing to product guys
Hmm two questions: 1. How come it did not detect the packages when you were running the original task manually? 2. Could it be the poetry manager option is not working correctly?! Can you verify the venv is created with all packages? If so can you post the full log?
Any specific use case for the required "draft" mode?
But essentially Prefect also has agents to run jobs on machines where the processes run (which seems to be exactly the same model as in ClearML),
Yes, it is conceptually very similar
this data is highly regulated data, ...
The main difference is that with ClearML the agents are running on your machines (either local or on your cloud account), so the clearml-server does not actually have access to the data streaming through them.
Does that make sense ?
Hi CleanPigeon16
can I make the steps in the pipeline use the latest commit in the branch?
Yes:
1. manually clone the step's Task (in the UI), and in the UI edit the Execution section and change to "last commit on branch" and specify the branch name
2. programmatically (as the above, clone+edit)
ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found
Seems like the "run_experiment" step is not defined. Could that be ...
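To illustrate what the error is complaining about, here is a rough sketch of checking `${step. ...}` references against the step names that were actually added. This is not ClearML's parser; the helper name `missing_steps` is hypothetical, and it assumes the first dotted component of a reference is the step name, as the error message suggests.

```python
import re
from typing import Iterable, List

# Matches "${...}" and captures everything between the braces.
REFERENCE = re.compile(r"\$\{([^}]+)\}")

def missing_steps(text: str, defined_steps: Iterable[str]) -> List[str]:
    """Return step names referenced in `text` that are not in `defined_steps`."""
    known = set(defined_steps)
    missing: List[str] = []
    for ref in REFERENCE.findall(text):
        step = ref.split(".", 1)[0]  # assumed: first component is the step name
        if step not in known and step not in missing:
            missing.append(step)
    return missing

reference = "${run_experiment.models.output.-1.url}"
print(missing_steps(reference, ["train"]))
print(missing_steps(reference, ["run_experiment"]))
```

If the first call reports `run_experiment`, the step was either never added or was added under a different name than the one used in the reference.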
Yep that will fix it, nice one!!
BTW I think we should add the ability to continue aborted datasets, wdyt?
yes, so it does exit the local process (at least, the command returns),
What do you mean the command returns ? Are you running the script from bash and it returns to bash ?
If you have the checkpoint (see output_uri for automatically uploading it) then you can always load it. Do you mean if you can change the iteration/step counter? Or do you mean with trains-agent?
Actually this is by default for any multi node training framework torch DDP / openmpi etc.
to setup ClearML agent in kubernetes with the SSH keys?
You can add env variable: CLEARML_AGENT__AGENT__FORCE_GIT_SSH_PROTOCOL="true"
https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server_config#dynamic-environment-variables
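A small sketch of how that variable name relates to the `agent.force_git_ssh_protocol` setting. It assumes the naming pattern visible in the variable above (a `CLEARML_AGENT__` prefix, upper-cased keys, `__` between nesting levels); the helper `agent_env_var` is hypothetical and not part of ClearML, so check the linked docs before relying on it for other keys.

```python
import os

def agent_env_var(config_path: str) -> str:
    """Map a dotted config path to a CLEARML_AGENT__ style name (assumed convention)."""
    parts = [part.upper() for part in config_path.split(".") if part]
    return "CLEARML_AGENT__" + "__".join(parts)

name = agent_env_var("agent.force_git_ssh_protocol")

# Build the environment you would pass to the agent process or pod spec.
env = dict(os.environ)
env[name] = "true"
print(name)
```

In a Kubernetes setup the same name/value pair would go into the agent container's environment rather than being set from Python.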
Hi WittyOwl57
I'm guessing clearml is trying to unify the histograms for each iteration, but the result is in this case not useful.
I think you are correct, the TB histograms are actually 3D histograms (i.e. 2D histograms over time, which would be the default for kernel/bias etc.)
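As a toy illustration of "2D histograms over time": one histogram per iteration over shared bin edges, stacked into an iterations × bins grid. The values and bin edges below are made up for the sketch, and this is not how TensorBoard or ClearML compute or store them.

```python
from typing import List, Sequence

def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; bins are [lo, hi), the last bin also includes its upper edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

edges = [-1.0, -0.5, 0.0, 0.5, 1.0]
# Made-up weight samples for three iterations of a hypothetical layer.
weights_per_iteration = [
    [-0.9, -0.1, 0.1, 0.8],
    [-0.4, -0.2, 0.2, 0.4],
    [-0.1, 0.0, 0.05, 0.1],
]

# One row per iteration, one column per bin: the "3D" plot is this grid drawn as a surface.
surface = [histogram(w, edges) for w in weights_per_iteration]
for row in surface:
    print(row)
```

Each row is an ordinary 2D histogram; the third axis is simply the iteration index.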
is there a way to ungroup the result by iteration, and, is it possible to group it by something else (e.g. the tags of the two plots displayed below side by side).
Can you provide a toy example...
Hi JumpyPig73
import data from old experiments into the dashboard.
what do you mean by "old experiments" ?
Hmm EmbarrassedPeacock82
Let's try with
--input-size -1 60 1 --aux-config input.format=FORMAT_NCHW
BTW: this seems like a Triton LSTM configuration issue, we might want to move the discussion to the Triton server issue, wdyt?
Optional[Sequence[Union[str, Dataset]]]: None, a list of strings, or a list of Dataset objects
(each one is a parent, supporting multiple parents)
No, Task.create is for creating an external Task, not for logging your own process.
That said you can probably override the git repo with env vars:
Okay we have located the issue, thanks guys! We will push a patch release hopefully later today
Here you go:
`from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(name='training', project='kgraph', version='1.2')
def pipeline(...):
    return

if __name__ == '__main__':
    # use ./requirements.txt instead of the auto-detected packages
    Task.force_requirements_env_freeze(requirements_file="./requirements.txt")
    pipeline(...)`
If you need anything for the pipeline component you can do:
`@PipelineDecorator.component(packages="./requirements.txt")
def step(data):
    # some stuff
    ...`
I cannot reproduce, tested with the same matplotlib version and python against the community server
Shouldn't this be a real value and not a template
you mean value being pulled to the pod that failed ?
Hi ThoughtfulPeacock83
the configuration vault parameters of a pipeline step with the add_function_step method?
The configuration vault is set per user/project/company at execution.
What would be the value you need to override ? and what is the use case?