But only 1 node will copy it.
The others can only copy it after the first one finishes, and they are not aware it is trying to set up the exact same venv, hence the race
I think this is because of the version of xgboost that serving installs. How can I control this?
That might be
I absolutely need to pin the packages (incl main DS packages) I use.
you can basically change CLEARML_EXTRA_PYTHON_PACKAGES
https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L100
for example: export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
Using the dataset.create command and the subsequent add_files and upload commands, I can see the upload action as an experiment, but the data is not seen in the Datasets webpage.
ScantCrab97 it might be that you need the latest clearml package installed on the client end (as well as the new server with the UI). What is your clearml package version?
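For reference, a minimal sketch of the flow being described (project/dataset names are placeholders, assuming a recent clearml version):
```python
from clearml import Dataset

# create a new dataset version, add local files, upload and finalize it
ds = Dataset.create(dataset_name="my_dataset", dataset_project="my_project")
ds.add_files(path="./data")
ds.upload()
ds.finalize()  # until finalized, the version may not show up as completed in the Datasets page
```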
basically @<1554638166823014400:profile|ExuberantBat24> you can think of hyper-datasets as a "feature-store for unstructured data"
Plan is to have it out in the next couple of weeks.
Together with a major update in v0.16
Hmmm that sounds like a good direction to follow, I'll see if I can come up with something as well. Let me know if you have a better handle on the issue...
DilapidatedDucks58 so is this more like a pipeline DAG that is built ?
I'm assuming this is more than just grouping ?
(by that I mean, accessing a Task's artifact does necessarily point to a "connection", no? Is it a single Task everyone is accessing, or a "type" of a Task?)
Is this process fixed, i.e. for a certain project we have a flow: (1) execute a Task of type A, then a Task of type B using the artifacts from Task A. This implies we might have multiple Tasks of types A/B but they are alw...
TypeError: __init__() got an unexpected keyword argument 'base_pod_num'
Could you post the entire log?
So was the issue solved?
shows that the trains-agent is stuck running the first experiment, not
the `trains_agent execute --full-monitoring --id a445e40b53c5417da1a6489aad616fee` is the second trains-agent instance running inside the docker; if the task is aborted, this process should have quit...
Any suggestions on how I can reproduce it?
Hi StraightDog31
I am having trouble using the StorageManager to upload files to GCP bucket
Are you using the StorageManager directly? or are you using task.upload_artifact?
Did you provide the GS credentials in the clearml.conf file, see example here:
https://github.com/allegroai/clearml/blob/c9121debc2998ec6245fe858781eae11c62abd84/docs/clearml.conf#L110
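If you are calling StorageManager directly, a minimal sketch looks roughly like this (bucket/paths are placeholders; it assumes the google.storage credentials are already set in clearml.conf):
```python
from clearml import StorageManager

# upload a local file to a GCS bucket and get back the remote URL
remote_url = StorageManager.upload_file(
    local_file="model.pkl",
    remote_url="gs://my-bucket/models/model.pkl",
)
print(remote_url)
```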
I guess. Or pipelines that you can compose after running experiments, to see that experiments are connected to each other
hmm what do you mean by "compose after running experiments" ? like a way to group them? what is the relation between one "item" to another ?
If this is a sequence of Tasks , are they executed by a controller ?
Okay, so the idea behind the new decorator is not to group all the defined steps under the same script so that they share the same environment, but rather to simplify the process of creating scripts for each step and avoid manually calling Task.init on those scripts.
Correct, and allow users to more easily create Tasks from code.
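Roughly, the decorator approach looks like this (a sketch only; step logic, project and names are placeholders):
```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def step_one():
    # each decorated function becomes its own Task, no manual Task.init needed
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["total"])
def step_two(data):
    return sum(data)

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="0.1")
def run_pipeline():
    data = step_one()
    print(step_two(data))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug the whole DAG on this machine
    run_pipeline()
```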
Regarding virtual environment creation from caching, I will keep running benchmarks (from what you say it might be due to high workload ...
Hi VexedCat68
txt file or pkl file?
If this is a string, it is just stored as-is (not as a file; this is considered a "link")
https://github.com/allegroai/clearml/blob/12fa7c92aaf8770d770c8ed05094e924b9099c16/clearml/binding/artifacts.py#L521
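For context, a quick sketch of the difference (names/paths are placeholders):
```python
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact demo")

# a plain string is stored as-is and treated as a "link" (nothing is uploaded)
task.upload_artifact(name="remote_ref", artifact_object="s3://bucket/file.pkl")

# a Path to a local .txt / .pkl file is uploaded as an actual file artifact
task.upload_artifact(name="training_data", artifact_object=Path("./data.pkl"))
```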
I don't have the compose file, or at least can't seem to find it in /opt
You can manually take down all dockers with: docker ps, then docker stop <container id> for each container id
We should probably change it so it is more human readable 🙂
Yep... they are pushing "heavy" users away from these instances. Nothing really you can do, maybe switch to Azure/GCP, but it might be the same there
Okay, I think I understand, but I'm missing something. It seems you call get_parameters from the old API; is your code actually calling get_parameters? The trains-agent runs the code externally, so whatever happens inside the agent should have no effect on the code. So who exactly is calling task.get_parameters, and well, why? :)
Hi SubstantialElk6
ClearML-Serving is already out with a new version, the ETA for the next ClearML-serving full 1.0 (which is the new redesign version) is the end of May
Hmm is this similar to this one https://allegroai-trains.slack.com/archives/CTK20V944/p1597845996171600?thread_ts=1597845996.171600&cid=CTK20V944
Hi PerfectChicken66
every X iterations and delete the older ones with
I have to ask, why not just overwrite the artifact? it is basically the same, no ?!
older ones with delete_artifacts from Task
I think you are correct, when you delete the entire Task you can specify "delete artifacts", but it does not do that on delete_artifacts 😞
You can manually do that with:
`task._delete_uri(task.artifacts["artifact"].url)`
`task.delete_artifacts(["artifact"])` ...
hmm can you share the log of the Task? (the clearml-session created Task)
I see, so there’s no way to launch a variant of my last run (with say some config/code tweaks) via CLI, and have it re-use the cached venv?
Try: clearml-task ... --requirements requirements.txt
You can also clone / override args with: clearml-task --base-task-id <ID-of-original-task-post-agent> --args ...
See full doc: https://clear.ml/docs/latest/docs/apps/clearml_task/
a bit sad that there is no working integration with one of the leading time series frameworks...
You mean a series that Darts reports? If it does report it, where does it do so? Are you suggesting we have a Darts integration (which sounds like a good idea)?
what does it mean to run the steps locally?
start_locally: means the pipeline code itself (the logic that runs / controls the DAG) runs on the local machine (i.e. no agent), but this control logic creates/clones Tasks and enqueues them; for those Tasks you need an agent to execute them
run_pipeline_steps_locally=True: means that instead of the pipeline's Tasks being enqueued and run by an agent, they will be launched on the same local machine (think debugging, other...
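A rough sketch of the two options, assuming a PipelineController-based pipeline with placeholder step Tasks:
```python
from clearml import PipelineController

pipe = PipelineController(name="example pipeline", project="examples", version="0.1")
pipe.add_step(name="step_one",
              base_task_project="examples",
              base_task_name="step one task")

# control logic runs here, step Tasks are enqueued for an agent to execute
pipe.start_locally()

# or: control logic AND step Tasks all run on this machine (no agent needed)
# pipe.start_locally(run_pipeline_steps_locally=True)
```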
Sure, venv mode
It will automatically switch to docker mode
JitteryCoyote63 What did you have in mind?