I'm not sure I follow the example... Are you sure this experiment continued a previous run?
What was the last iteration on the previous run?
Hmm, can you test with the latest RC? Or even better, from GitHub (that said, GitHub will break the interface, as we upgraded the pipeline 🙂)
Hi SourOx12
How do you set the iteration when you continue the experiment? Is it with Task.init continue_last_task?
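Something like this (untested sketch; project/task names and the iteration offset are placeholders):

```python
from clearml import Task

# Sketch: continue the previous run instead of starting a fresh task.
# continue_last_task can also take a task ID string, if you want to continue a specific run.
task = Task.init(
    project_name="examples",      # placeholder project name
    task_name="my_experiment",    # placeholder task name
    continue_last_task=True,
)

# If the reported iterations restart from 0, you can also offset them explicitly:
task.set_initial_iteration(1000)  # placeholder: last iteration of the previous run
```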
See here:
https://pip.pypa.io/en/stable/user_guide/#environment-variables
Pass these environment variables as part of the YAML template you are using with k8s.
Should work for both 🙂
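For example, something along these lines (illustrative sketch; the exact template structure depends on your own k8s glue/agent template, and PIP_INDEX_URL / PIP_EXTRA_INDEX_URL are just examples of pip's environment variables from the page above):

```yaml
# Illustrative pod-template fragment: any PIP_* environment variable from the pip docs
# can be injected here and will be picked up by pip inside the agent's container.
spec:
  containers:
    - name: clearml-agent
      env:
        - name: PIP_INDEX_URL
          value: "https://my-private-pypi.example.com/simple"   # placeholder URL
        - name: PIP_EXTRA_INDEX_URL
          value: "https://pypi.org/simple"
```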
Hi @<1564785037834981376:profile|FrustratingBee69>
It's the previous container I've used for the task.
Notice that what you are configuring is the Default container, i.e. if the Task does not "request" a specific container, then this is what the agent will use.
On the Task itself (see the Execution tab, under Container Image) you set the specific container for the Task. After you execute the Task on an Agent, the agent will record there the container it ended up using. This means that ...
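For reference, setting it from code would look something like this (sketch; the image name is a placeholder, and older clearml versions used a single-string docker_cmd argument instead of docker_image):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="container_demo")  # placeholder names

# Request a specific container for this Task when it runs on an agent (docker/k8s mode).
# The agent falls back to its default container only if nothing is set here.
task.set_base_docker(docker_image="python:3.9-bullseye")  # placeholder image
```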
Yes, or (because I deployed ClearML using Helm in Kubernetes) from the same machine, but multiple pods (tasks).
Oh, now I see. Long story short, no 🙂 The correct way of doing that is every node/pod creates its own dataset,
then when you are done, you create a new version with the X datasets that you created as parents. The newly created version is just "meta": it basically tells the system how to combine the previously generated datasets (i.e. no data is actually re-uploa...
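Something along these lines (sketch; dataset names, project and paths are placeholders):

```python
from clearml import Dataset

# On every node/pod: create and upload that node's own dataset version
node_ds = Dataset.create(
    dataset_name="my_data_node_0",    # placeholder, e.g. include the node/pod id
    dataset_project="datasets_demo",  # placeholder project
)
node_ds.add_files("/data/shard_0")    # placeholder local path for this node's shard
node_ds.upload()
node_ds.finalize()

# Once all nodes are done: create a "meta" version that only references them as parents
merged = Dataset.create(
    dataset_name="my_data_merged",
    dataset_project="datasets_demo",
    parent_datasets=["<node_0_dataset_id>", "<node_1_dataset_id>"],  # IDs of the per-node datasets
)
merged.upload()    # nothing new to upload, only metadata is stored
merged.finalize()
```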
If I call explicitly
task.get_logger().report_scalar("test", str(parse_args.local_rank), 1., 0)
, this will log as expected one value per process, so reporting works
JitteryCoyote63 and do prints get logged as well (from all processes)?
or even different task types
Yes there are:
https://clear.ml/docs/latest/docs/fundamentals/task#task-types
https://github.com/allegroai/clearml/blob/b3176a223b192fdedb78713dbe34ea60ccbf6dfa/clearml/backend_interface/task/task.py#L81
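For example (sketch; project/task names are placeholders):

```python
from clearml import Task

# Create a Task with an explicit type, e.g. data_processing instead of the default "training"
task = Task.init(
    project_name="examples",        # placeholder
    task_name="prepare_dataset",    # placeholder
    task_type=Task.TaskTypes.data_processing,
)
```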
Right now I don't see differences, is this a deliberate design?
You mean on how to use them? I.e. best practice ?
https://clear.ml/docs/latest/docs/fundamentals/task#task-states
Hmm do you host it somewhere? Is it pre-installed on the container?
Hi HandsomeGiraffe70
First:
# During pipeline initialisation pipeline_params is empty and we need to use default values.
# When the pipeline starts the run, params are launched again, and then pipeline_params can be used.
Hmm, that should probably be fixed, maybe with a function on the pipeline to deal with it?
When I reduce the tune_optime value to just 'recall', pipeline execution fails with:
ValueError: Node 'tune_et_for_Precision', base_task_id is empty
I would...
Hi QuaintPelican38
Can you ssh to {instance_public_ip_address}:10022 (something like ssh -p 10022 user@IP_HERE )?
Basically just getting the password prompt means you are okay.
I suspect that you have some AWS security definition (firewall) that prevents a direct access to the instance, could that be?
Ohh I see, so basically the ASG should check if the agent is idle, rather than whether the Task is running?
Do we have it in the GitHub issue?
The easiest is to pass an entire trains.conf file
Is it possible to make a connection to an S3 bucket via this authentication method with the open source version on EKS?
Hi BoredBluewhale23
In your setup, are we talking about agents running inside the Kubernetes cluster, or clients connecting from their own machine ?
Hover over the border (I would suggest using full screen, i.e. maximize)
It's always the details... Is the new Task running inside a new subprocess ?
basically there is a difference between:
1. remote task spawning new tasks (as subprocesses, or as jobs on a remote machine), remote task still running
2. remote task being replaced by a spawned task (same process?!)
UnevenDolphin73 am I missing a 3rd option? Which of these is your case?
p.s. I have a suspicion that there might be a misuse of "Task" here?! What are you considering a Task? (from the clearml perspective a Task...
Hi @<1523715429694967808:profile|ThickCrow29>
I am using the PipelineController with abort_on_failure set to False.
Is this a pipeline from code or from Tasks?
What is the clearml version?
Lastly, if a component fails and another component depends on its output, how would it run? And if it does not depend on it, why is it a child component?
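Just to make sure we are looking at the same flow, something like this (sketch; step names, functions and the value-passing are placeholders)?

```python
from clearml import PipelineController

def step_one():
    return 42

def step_two(value):
    print(value)

pipe = PipelineController(
    name="demo_pipeline",      # placeholder
    project="examples",        # placeholder
    version="1.0",
    abort_on_failure=False,    # a failing step does not abort the whole pipeline immediately
)
pipe.add_function_step(name="step_one", function=step_one, function_return=["value"])
pipe.add_function_step(
    name="step_two",
    function=step_two,
    function_kwargs=dict(value="${step_one.value}"),  # depends on step_one's output
    parents=["step_one"],
)
pipe.start_locally(run_pipeline_steps_locally=True)
```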
Hi GrotesqueMonkey62 any chance you can be a bit more specific? Maybe a screen grab?
Here is how it works: if you look at an individual experiment, scalars are grouped by title (i.e. multiple series on the same graph if they have the same title)
When comparing experiments, any unique combination of title/series will get its own graph, then the different series on the graph are the experiments themselves.
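For example (sketch; names and values are placeholders), a single experiment reporting like this shows one "loss" graph with two series, and in comparison mode each unique title/series combination becomes its own graph with one series per experiment:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="scalar_grouping_demo")  # placeholder names
logger = task.get_logger()

for iteration in range(10):
    # Same title ("loss"), two different series -> both series appear on the same graph
    logger.report_scalar(title="loss", series="train", value=1.0 / (iteration + 1), iteration=iteration)
    logger.report_scalar(title="loss", series="validation", value=1.5 / (iteration + 1), iteration=iteration)
```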
Where do you think the problem lies?
Hi @<1523709807092043776:profile|GrittyKangaroo27>
some of my completed datasets,
This only has an effect on the dataset while it is being uploaded; once completed, it is there for logging purposes only. What exactly is the use case? (just to verify, once a Task/Dataset is completed you cannot edit it)
As I installed ClearML using pip,
Where does clearml-serving run? Usually your configuration file is in ~/clearml.conf
Notice that if it is not there, it means the defaults are being used, so just create a new one and add that line
Hmm, makes sense. Then I would call export_task once (kind of the easiest way to get the entire Task object description pre-filled for you); with that, you can just create as many as needed by calling import_task.
Would that help?
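Roughly like this (sketch; I'm assuming the export_task / import_task pattern here, and the task id and field tweaks are placeholders):

```python
from clearml import Task

# Export the "template" Task once as a plain dict
template = Task.get_task(task_id="<template_task_id>")   # placeholder task id
task_data = template.export_task()

# ...tweak whatever fields you need, then create as many copies as needed.
# Assuming the exported dict exposes a "name" field; adjust per your clearml version.
for i in range(3):
    task_data["name"] = f"cloned_task_{i}"
    Task.import_task(task_data)
```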
Hi EnchantingWorm39
Great question!
Regarding data management, I know the enterprise edition has full support for unstructured data, and we plan to soon have a solution for structured data as part of the open source (soon = hopefully in a month's time)
Regarding model serving, I know you can integrate with TF Serving or Seldon with very little effort (usually the challenge is creating triggers etc., but in most cases this is custom code anyhow 🙂)
I do not have experience with Cortex/B...
@<1541954607595393024:profile|BattyCrocodile47> not restarting the docker, restarting the Docker service (on Mac it's an app, I think there is an option on the Docker app to do that)
think perhaps it came across as way more passive aggressive than I was intending.
Dude, you are awesome for saying that! No worries 🙂 we try to assume people have the best intentions at heart (the other option is quite depressing 🙂)
I've been working on a Azure load balancer example, ...
This sounds exciting, let me know if we can help in any way
Additionally, what I found is that the clearml==1.0.5 package is able to find these partial changes; newer versions find nothing at all, maybe because it's always comparing against the remote
Hmm it was always from remote...
it is actually doing the following:
git rev-parse --abbrev-ref --symbolic-full-name @{u}
Then with the branch name output:
git diff --submodule=diff <add_branch_name_here>
BTW: Can you also please test with the latest clearml version, 1.7.2?