Hmm I wonder, can you try with this line before?
Task._report_subprocess_enabled = False
frameworks = { 'tensorboard': True, 'pytorch': False }
Task.init(...)
task.update({'script': {'version_num': 'my_new_commit_id'}})
This will update to a specific commit id, you can pass empty string '' to make the agent pull the latest from the branch
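A minimal sketch of the call above wrapped in a helper, assuming `task` is a ClearML `Task` object you already fetched elsewhere. The helper names `script_update_payload` and `pin_task_commit` are made up for illustration and are not part of the ClearML API.

```python
def script_update_payload(commit_id: str = "") -> dict:
    """Build the dict passed to task.update().

    An empty string means "let the agent pull the latest from the branch".
    """
    return {"script": {"version_num": commit_id}}


def pin_task_commit(task, commit_id: str = "") -> None:
    """Apply the payload to any task-like object exposing update(dict)."""
    task.update(script_update_payload(commit_id))


# Usage (assumes `task` was obtained from ClearML beforehand):
#   pin_task_commit(task, "my_new_commit_id")  # pin to a specific commit
#   pin_task_commit(task)                      # '' -> latest from the branch
```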
How about this one:
None
potential sources of slow down in the training code
Is there one?
GiganticTurtle0 we had this discussion in the wrong thread, I moved it here.
Moved from the wrong thread
Martin.B [1:55 PM]
GiganticTurtle0 the sample mock pipeline seems to be running perfectly on the latest code from GitHub, can you verify?
Martin.B [1:55 PM]
Spoke too soon, sorry, issue is reproducible, give me a minute here
Alejandro C [1:59 PM]
Oh, and which approach do you suggest to achieve the same goal (simultaneously running the same pipeline with differen...
You can always specify different clearml.conf files with --config-file
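A rough sketch of the same idea from Python, in case the pipeline is launched as a plain script rather than through a command that accepts `--config-file`. It swaps the flag for the `CLEARML_CONFIG_FILE` environment variable, which the SDK reads to locate its configuration file. The script name and the config paths in the usage comment are placeholders, and `env_for_config` / `launch` are hypothetical helpers.

```python
import os
import subprocess
import sys


def env_for_config(config_path, base_env=None):
    """Return a copy of the environment that points at one clearml.conf."""
    env = dict(os.environ if base_env is None else base_env)
    env["CLEARML_CONFIG_FILE"] = str(config_path)
    return env


def launch(script, config_path):
    """Start `script` in its own process with its own config file."""
    return subprocess.Popen([sys.executable, script], env=env_for_config(config_path))


# Usage (placeholder file names):
#   run_a = launch("pipeline.py", "clearml_a.conf")
#   run_b = launch("pipeline.py", "clearml_b.conf")
#   run_a.wait(); run_b.wait()
```

Each process gets its own environment copy, so the two runs do not share configuration.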
Hmm SuccessfulKoala55 what do you think?
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I
it will not delete it from grafana, it means it's no longer collected, make sense ?
What's the clearml-server version ?
So inside the pipeline logic you can do Task.current_task().id
Or inside a component Task.current_task().parent
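To make the two cases concrete, here is a small sketch. It takes the current task as an argument so it can be shown without a server; in real code you would pass `Task.current_task()`. The function name `pipeline_task_id` and the stand-in objects are illustrative only.

```python
from types import SimpleNamespace


def pipeline_task_id(current_task, inside_component: bool) -> str:
    """Return the pipeline's Task ID.

    In the pipeline logic the current task *is* the pipeline, so use .id.
    Inside a component the pipeline is the parent, so use .parent.
    """
    return current_task.parent if inside_component else current_task.id


# Stand-ins for what Task.current_task() would return in each context
controller = SimpleNamespace(id="pipeline-123", parent=None)
component = SimpleNamespace(id="step-456", parent="pipeline-123")

same_id = (
    pipeline_task_id(controller, inside_component=False)
    == pipeline_task_id(component, inside_component=True)
)
```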
Hmm so the SaaS service ? and when you delete (not archive) a Task it does not ask for S3 credentials when you select delete artifacts ?
single task in the DAG is an entire ClearML pipeline.
just making sure details are not lost, "entire ClearML pipeline": the pipeline logic is process A running on machine AA.
Every step of that pipeline can be (1) a subprocess, but that means the exact same environment is used for everything, or (2) the DEFAULT behavior, each step B is running on a different machine BB.
The non-ClearML steps would orchestrate putting messages into a queue, doing retry logic, and tr...
Hey SarcasticSparrow10 see here
https://allegro.ai/clearml/docs/docs/deploying_clearml/clearml_server_linux_mac.html#upgrading
Whoa, are you saying there's an autoscaler that doesn't use EC2 instances?...
Just to be clear the ClearML Autoscaler (aws) will spin instances up/down based on jobs in the queue it is listening to (the type of EC2 instances and configuration is fully configurable)
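As a toy illustration of "spin up/down based on jobs in the queue", here is a sketch of the kind of decision involved. This is not the actual ClearML autoscaler logic; the function name, its parameters, and the one-job-per-instance assumption are all made up for the example.

```python
def instances_delta(pending_jobs: int, running: int, idle: int, max_instances: int) -> int:
    """Return how many instances to add (positive) or remove (negative).

    Assumes one job per instance: busy instances stay, every pending job
    wants one more instance, and the total is capped at max_instances.
    """
    busy = running - idle
    desired = min(max_instances, busy + pending_jobs)
    return desired - running


# 3 jobs waiting, 1 busy instance, cap of 2 -> add one instance
scale_up = instances_delta(pending_jobs=3, running=1, idle=0, max_instances=2)
# empty queue, 2 idle instances -> remove both
scale_down = instances_delta(pending_jobs=0, running=2, idle=2, max_instances=5)
```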
Essentially, I think the key thing here is we want to be able to build the entire Pipeline including any updates to existing pipeline steps and the addition of new steps without having to hard-code any Task IDs and to be able to get the pipeline's Task ID back at the end.
Oh if this is the case then basically your CI/CD code will be something like:
@PipelineDecorator.component(return_values=['data_frame'], cache=True, task_type=TaskTypes.data_processing)
def step_one(pickle_data_...
It seems something is wrong with the server itself...
We should probably change it so it is more human readable
where is it running? could you restart all the dockers ? Is it running on your machine?
I wanted to know what the best way to create and register the SSL keys is.
Oh I see, so basically you need to add nginx with SSL certificates on top of the hosted service (or configure the docker-compose nginx container to add that)
Then you need to add the self signed SSL into any host machine (I'm assuming these are not "valid" SSL certificates generated by a reputable SSL provider)
But generally speaking if you are using self hosted clearml-server on a local machine that n...
Hi @<1552101458927685632:profile|FreshGoldfish34>
self-hosted, you mean the open source ? if so, then yes totally free
That said I would recommend to have the server inside your VPN, just in case from a security perspective
How do I best utilize clearml in this scenario such that any coworker of mine is able to reproduce my work with the same pipeline?
Basically this sounds to me like proper software development design (i.e. the class vs stages).
In order to make sure Anyone can reproduce it, you mean anyone can rerun the "pipeline" ? If this is the case just add Task.init (maybe use a specific Task type) and the agents will make sure this is Fully reproducible.
If you mean the data itself is stored, the...
. but when we try to do a "New Run" from UI, it tries to follow the DAG of previous run (the run with all child nodes skipped) and the new run fails too.
This is odd, is this reproducible ? what's the clearml python package version ?
Hi PanickyMoth78
it was uploading fine for most of the day but now it is not uploading metrics and at the end
Where are you uploading metrics to (i.e. where is the clearml-server) ?
Are you seeing any retry logging on your console ?
packages/clearml/backend_interface/metrics/reporter.py", line 124, in wait_for_events
This seems to be consistent with waiting for metrics to be flushed to the backend, but usually you will see retry messages on your console when that happens
Hmm, I still wonder what is the "correct" answer for most people, is empty string in argparse redundant anyhow? will someone ever use it?
DistressedGoat23
you can now access the weights model object
pip install clearml==1.8.1rc0
then:
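` from clearml.binding.frameworks import WeightsFileHandler

def callback(_, model_info):
    model_info.weights_object  # this is your xgboost object
    model_info.name = "my new name"
    return model_info

# register the callback so it runs before the model is stored
WeightsFileHandler.add_pre_callback(callback) `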
Welp, it's been a day with the new settings, and stats went up 140K for API calls
... going to check again tomorrow to see if any of that was spill over from yesterday
140K calls a day, how often are you sending scalars ? how long is it running? how many experiments are running ?
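For a sense of scale, a quick back-of-the-envelope conversion of that daily total into a rate. The experiment count of 10 is a made-up number for illustration, since the real count is exactly what is being asked above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def calls_per_second(calls_per_day: float) -> float:
    """Average API call rate over a full day."""
    return calls_per_day / SECONDS_PER_DAY


def seconds_between_calls(calls_per_day: float, experiments: int) -> float:
    """Average gap between calls for one experiment, if the load is even."""
    return SECONDS_PER_DAY * experiments / calls_per_day


rate = calls_per_second(140_000)                        # about 1.62 calls per second
gap = seconds_between_calls(140_000, experiments=10)    # about 6.17 s per experiment
```

So 140K calls a day is roughly one or two calls per second overall; whether that is a lot depends on how many experiments share it and how long they run.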
I want pipeline / task dispatch to be reported and monitored outside of clearml. For example, I might want to log the dispatch event in some non-clearml system and then monitor the health of the pipeline and alert if it is pending for too long.
Hmm interesting, so like a callback?!
I'm thinking a callback is being executed after the pipeline is sent, but once the callback is done, the pipeline process leaves?
Does that make sense ?
I might want to dispatch other jobs from within the same p...