Hi JitteryCoyote63,
These properties are usually not exposed in the UI and are used internally, hence the lack of documentation. Regarding the parent property, it holds a parent Task.id (str); that said, it has no real effect on the Task itself. You can, however, search for Tasks with a specific parent ID (for example, this is how the hyperparameter optimization class uses this property).
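A minimal sketch of searching by parent ID, assuming a recent clearml version; treating "parent" as a valid task_filter key is my assumption here, based on the behavior described above rather than confirmed documentation:

    from clearml import Task

    # Assumption: 'parent' is accepted as a task_filter field, matching the
    # parent-ID lookup described above
    parent_task = Task.get_task(task_id="<parent_task_id>")
    children = Task.get_tasks(task_filter={"parent": parent_task.id})
    for child in children:
        print(child.id, child.name)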
Thanks CynicalBee90, I appreciate the discussion! Since I'm assuming you will actually amend the misrepresentation in your table, let me follow up here.
1.
SSPL license may be a significant consideration for some, and so we thought it was important to point this out clearly.
SSPL is fully open-source compliant unless you have the intention of selling it as a service. I hardly think this is any user's consideration, just like anyone would be using MongoDB or Elasticsearch without think...
Hi CynicalBee90
Sorry, I missed the reply.
"I think weβll leave the checkmark and the warning and just write SSPL below," Sounds like a good solution π
2. I have to admit, I would just write "language agnostic", but I will not insist further, so if you feel "platform" helps in explaining the reasoning, I'm with you.
3. "... to do smart analysis on my logged data easily, ..."
If this is the criteria, none of the options is Very easy, but they all have an interface.. not sure how to com...
Please feel free to do so (always better to get it from a user, not the team behind the product 🙂)
SubstantialElk6 feel free to tweet at them about their very inaccurate comparison table 🙂
Hi CynicalBee90
Always great to have people joining the conversation, especially if they are the decision makers, a.k.a. the ones who can amend mistakes 🙂
If I can summarize a few points here (feel free to fill in / edit any mistakes or leftovers):
Open-Source license: This is basically the MongoDB license (SSPL), which is as open as possible while still offering some protection against giants like Amazon taking the APIs and reselling the product as a service (like they did with both MongoDB and Elasticsearch).
Platform & language agno...
TrickySheep9
you are absolutely correct 🙂
GiddyPeacock64 Are you sending the jobs from the JupyterLab Kale extension?
EDIT:
Is the pipeline step itself calling Task.init?
Hi GiddyPeacock64
If you already have K8s set up, and are already using ClearML,
in your Kubeflow YAML run:
    trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker. Just make sure that you pass the env variables setting the ClearML server connection, see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
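For illustration, a minimal Python sketch of what effectively runs inside the container; the env variable names follow the linked docker-compose.yml, while the host and key values are placeholders you would replace with your own:

    import os
    import subprocess

    # Server connection settings, passed the same way the linked
    # docker-compose.yml passes them (values are placeholders)
    env = dict(
        os.environ,
        CLEARML_API_HOST="http://<your-api-server>:8008",
        CLEARML_API_ACCESS_KEY="<access_key>",
        CLEARML_API_SECRET_KEY="<secret_key>",
    )

    # The agent pulls the Task, recreates its environment inside the
    # container, and runs it with full resource monitoring
    subprocess.run(
        ["trains-agent", "execute", "--id", "<task_id>", "--full-monitoring"],
        env=env,
        check=True,
    )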
Hi RobustHippopotamus53
The way "latest from branch" works:
On the Task you specify the branch name (e.g. "master", no need to add the origin/ prefix). The agent then pulls the latest commit from that branch and updates the Task back to the current commit ID (the latest on the branch at the time of execution). This process ensures reproducibility and traceability, as we can always be certain of the exact commit that was executed.
Could it be that you force-pushed a commit/squash, hence the "origina...
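As an illustration, a hedged sketch of registering a Task that tracks the latest commit of a branch; this assumes the Task.create interface of a recent clearml version, and the repo/script names are placeholders:

    from clearml import Task

    # Branch name only, no origin/ prefix; leaving commit unset means the
    # agent resolves the latest commit on the branch at execution time
    task = Task.create(
        project_name="examples",
        task_name="train from latest master",
        repo="https://github.com/<org>/<repo>.git",
        branch="master",
        script="train.py",
    )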
Internally we use blob.upload_from_file
It has a default 60 sec timeout on the connection (I'm assuming the upload itself could take longer).
That's the question I want to raise too.
No file size limit
Let me try to run it myself
Maybe that's the issue:
https://github.com/googleapis/python-storage/issues/74#issuecomment-602487082
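If that is indeed the problem, a possible workaround sketch with google-cloud-storage, assuming a library version that accepts the timeout argument (per the linked issue); bucket and file names are placeholders:

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("<bucket-name>")

    # Smaller chunks finish well under 60 sec even on a slow link;
    # chunk_size must be a multiple of 256 KB
    blob = bucket.blob("models/model.pt", chunk_size=1024 * 1024)

    # Raise the per-request timeout above the 60 sec default
    with open("model.pt", "rb") as f:
        blob.upload_from_file(f, timeout=300)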
Hi MortifiedCrow63
I have to admit this is very strange; I think the fact it works for the artifacts and not for the model is kind of a fluke...
If you use the "wait_on_upload" argument in upload_artifact you end up with the same behavior. Even if uploaded in the background, the issue is still there; for me it was revealed the minute I limited the upload bandwidth to under 300 kbps. It seems the internal GS timeout assumes every chunk should be uploaded in under 60 seconds.
The default chunk...
Hi MortifiedCrow63
saw , ...
By default ClearML will only log the exact local path where you stored the file, I assume this is it.
If you pass output_uri=True to Task.init, it will automatically upload the model to the files_server, and the model repository will then point to the files_server (you can also use any object storage as the model storage, e.g. output_uri=s3://bucket).
Notice yo...
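A minimal sketch of the above (project and task names are placeholders):

    from clearml import Task

    # output_uri=True uploads models to the files_server;
    # an object-storage URI such as "s3://bucket" works the same way
    task = Task.init(
        project_name="examples",
        task_name="model upload demo",
        output_uri=True,
    )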
Hi MortifiedCrow63 , thank you for pinging! (seriously greatly appreciated!)
See here:
https://github.com/googleapis/python-storage/releases/tag/v1.36.0
https://github.com/googleapis/python-storage/pull/374
Can you test with the latest release, see if the issue was fixed?
https://github.com/googleapis/python-storage/releases/tag/v1.41.0
Hi MortifiedCrow63
Sorry, getting GS credentials is taking longer than expected.
Nonetheless it should not be an issue (model upload essentially uses the same StorageManager internally).
MortifiedCrow63, hmm, can you test with a manual upload and verify? (see the sketch below)
(also, what's the clearml version you are using?)
Could you test with the same file? Maybe the timeout has something to do with the file size?
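For the manual upload test, a hedged sketch using clearml's StorageManager (the bucket path is a placeholder); this bypasses the model-saving code path, so it helps isolate where the timeout comes from:

    from clearml import StorageManager

    # Upload the exact same file directly to GS and see whether
    # the 60 sec timeout reproduces here as well
    remote_url = StorageManager.upload_file(
        local_file="model.pt",
        remote_url="gs://<bucket>/debug/model.pt",
    )
    print(remote_url)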
I do not think this is the upload timeout; it makes no sense to me for the GCP package to include a 60 sec timeout for upload (we do not pass any timeout, it's their internal default for the argument)...
I'm also not sure where the timeout originates (I'm assuming the initial GCP handshake connection could not actually be the thing that times out, as the response should be relatively quick, so 60 sec is more than enough).
Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can verify the issue: with model upload I get a timeout error, while upload_artifacts just works.
Just updating here that we are looking into it.
Hi SkinnyPanda43
Can you attach the full log?
The clearml agent is installed before your requirements.txt; at least in theory it should not collide.
I can add files to the data set, even after I finish the experiment?
Correct
https://clear.ml/docs/latest/docs/clearml_data#creating-a-dataset
https://clear.ml/docs/latest/docs/guides/data%20management/data_man_cifar_classification
https://github.com/allegroai/clearml/blob/master/docs/datasets.md#create-dataset-from-code
Notice the parents argument when creating a new Dataset, e.g. the sketch below.
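A minimal sketch of extending a finalized dataset with new files; this assumes a recent clearml version, where the argument is named parent_datasets (IDs and paths are placeholders):

    from clearml import Dataset

    # Create a child dataset on top of the existing one, add the new
    # files, then upload and finalize the new version
    child = Dataset.create(
        dataset_name="my_dataset",
        dataset_project="examples",
        parent_datasets=["<existing_dataset_id>"],
    )
    child.add_files(path="new_files/")
    child.upload()
    child.finalize()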
Hi SmallDeer34
Hmm, I'm not sure you can; the code will by default use rglob with the last part of the path as the wildcard selection 🙂
You can of course manually create a zip file yourself (see the sketch below)...
How would you change the interface to support it?
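For the manual route, a hedged sketch: zip the selection yourself with the standard library and attach the archive as an artifact (all names and paths are placeholders):

    import zipfile
    from pathlib import Path

    from clearml import Task

    # Zip only the files we actually want, independent of the
    # rglob wildcard behavior described above
    archive = Path("selection.zip")
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in Path("data").rglob("*.csv"):
            zf.write(f, f.relative_to("data"))

    task = Task.init(project_name="examples", task_name="manual zip upload")
    task.upload_artifact(name="my_selection", artifact_object=archive)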
So the main difference is that kedro pipeline steps are function-based (I might be oversimplifying, so please take it with a grain of salt), while in ClearML a pipeline step is a Job, i.e. it needs its own environment and runs longer than a few seconds (as opposed to a single function).
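To make the Job-based model concrete, a minimal sketch with clearml's PipelineController, assuming a recent clearml version (project/task names are placeholders); every step is a full Task that gets cloned and launched in its own environment:

    from clearml.automation import PipelineController

    # Each step references an existing Task that the controller clones
    # and enqueues, so every step runs in its own environment
    pipe = PipelineController(
        name="example pipeline",
        project="examples",
        version="1.0",
    )
    pipe.add_step(
        name="prepare_data",
        base_task_project="examples",
        base_task_name="data preparation",
    )
    pipe.add_step(
        name="train",
        parents=["prepare_data"],
        base_task_project="examples",
        base_task_name="model training",
    )
    pipe.start(queue="default")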