It should not be complex to implement; the actual AWS auto-scaler class implements just two functions:
def spin_up_worker(self, resource, worker_id_prefix, queue_name):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L104
def spin_down_worker(self, instance_id):
https://github.com/allegroai/clearml/blob/e9f8fc949db7f82b6a6f1c1ca64f94347196f4c0/clearml/automation/auto_scaler.py#L...
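A custom scaler for another cloud provider can follow the same two-method shape. This is a hedged sketch, not the actual ClearML implementation: the class name is made up and the provider API calls are placeholders, only the bookkeeping is real.

```python
# Hypothetical sketch of a custom auto-scaler with the same two-method
# interface; the cloud-provider calls are placeholders (comments).
class MyCloudAutoScaler:
    def __init__(self):
        self.workers = {}  # instance_id -> metadata

    def spin_up_worker(self, resource, worker_id_prefix, queue_name):
        # here you would call your cloud API to launch an instance that
        # starts `clearml-agent daemon --queue <queue_name>` on boot
        instance_id = "{}-{}".format(worker_id_prefix, len(self.workers))
        self.workers[instance_id] = {"resource": resource, "queue": queue_name}
        return instance_id

    def spin_down_worker(self, instance_id):
        # here you would call your cloud API to terminate the instance
        self.workers.pop(instance_id, None)
```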
I think this is because of the version of xgboost that serving installs. How can I control these?
That might be
I absolutely need to pin the packages (incl main DS packages) I use.
you can basically change CLEARML_EXTRA_PYTHON_PACKAGES
https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L100
for example:
export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
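Put together, a minimal shell sequence might look like this; the compose file path follows the repo linked above, and the restart step (commented out) assumes you are running the docker-compose deployment:

```shell
# Pin the extra packages the serving containers install on startup
export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
echo "$CLEARML_EXTRA_PYTHON_PACKAGES"
# then recreate the serving stack so it picks up the pins, e.g.:
# docker compose -f docker/docker-compose-triton-gpu.yml up -d
```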
So for example: if there was an idle GPU and Q3 took it, and then a task comes into Q2 (for which we specified 3 GPUs) but Q3 has already taken some of those GPUs, what will happen?
This is a standard "race": the first one to come will "grab" the GPU and the other will wait for it.
I'm pretty sure the enterprise edition has preemption support, but this is not currently part of the open source version (BTW: dynamic GPU allocation is also, I think, part of the enterprise tier; in the open source ...
Ad1. Yes, I think this is kind of a bug. Using _task to get pipeline input values is a little bit ugly
Good point, let's fix it 🙂
A new pipeline is built from scratch (all steps etc.), but clicking "NEW RUN" in the GUI just reuses the existing pipeline. Is that correct?
Oh I think I understand what happens: the way the pipeline logic is built, the "DAG" is created the first time the code runs; then when you re-run the pipeline step, it deserializes the DAG from the Task/backend.
Th...
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
PanickyMoth78 I think I understand what you are saying, but it is hard to see if there is a "bug" here or a feature...
Can you post the full code of the pipeline?
'relaunch_on_instance_failure'
This argument is not part of the Pipeline any longer. Are you running the latest clearml python version?
I wonder if this hack would work
Assume you upload an artifact/model to ' s3://storage.yandexcloud.net:443/clearml-models ' notice the port is added. Would that trigger a popup in the UI?
Also, what happens if you add the credentials manually in the profile page?
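If the profile page does not help, credentials for a specific non-AWS S3 endpoint usually go into clearml.conf. A hedged sketch with placeholder values (the host/bucket follow the example URI above; key and secret are yours):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    # endpoint host, with the explicit port if your URIs include it
                    host: "storage.yandexcloud.net:443"
                    bucket: "clearml-models"
                    key: "YOUR_ACCESS_KEY"
                    secret: "YOUR_SECRET_KEY"
                    secure: true
                }
            ]
        }
    }
}
```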
HugeArcticwolf77 I think this issue was resolved with the latest version 1.8.0, can you try to rerun the entire pipeline with the latest version?
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? Isn't it the same code? The script entry point is auto-detected.
... or when I run my_script.py locally (in order to create and enqueue the task)?
The latter, when the script is running locally
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure, this will work 🙂
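Spelled out, a minimal sketch of building that path (note it is the dunder `__file__`, which markdown rendering sometimes strips to "file"):

```python
import os

# Path to a requirements.txt sitting next to the current script
req_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "requirements.txt")
print(req_path)
```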
ReassuredTiger98 yes this is odd:
also:
Warning, could not locate PyTorch torch==1.12 matching CUDA version 115, best candidate 1.12.0.dev20220407
Seems like it found a matching version and did not use it...
Let me check that
UnevenDolphin73 since in the end plotly is doing the presentation, I think you can provide the extra layout here:
https://github.com/allegroai/clearml/blob/226a6826216a9cabaf9c7877dcfe645c6ae801d1/clearml/logger.py#L293
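Since the figure ultimately goes to plotly, the extra layout can ride along inside a plotly-compatible figure dict. A hedged sketch: the layout keys follow the plotly JSON schema, and the commented-out call shows where it would be reported (title/series names are illustrative).

```python
# A plotly-style figure dict with a custom layout section
figure = {
    "data": [{"type": "scatter", "x": [1, 2, 3], "y": [4, 1, 2]}],
    "layout": {"title": "My plot", "xaxis": {"title": "step"}},
}
# task.get_logger().report_plotly(
#     title="custom", series="layout-demo", iteration=0, figure=figure)
```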
But it's running in docker mode and it is trying to ssh into the host machine and failing
It is not sshing to the machine, it is sshing directly into the container.
Notice the port it is sshing to is 10022, which is mapped into the container
MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+
Yep 🙂
Basically:

```python
from time import sleep
from clearml import Task

task = Task.get_task(task_id='aaaa')
while task.status not in ('completed', 'stopped'):
    # do something ?
    sleep(15)
```

(Notice task.status / task.get_status() will refresh the Task status on every call)
It should have been:
output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/artifacts/examples/load_artifacts.f0f4d1cd5eb54795b11508dd1e739145/artifacts/filename.csv.gz/filename.csv.gz"
SmallBluewhale13
And the Task.init registers 0.17.2 , even though it prints (while running the same code from the same venv) 0.17.2 ?
HealthyStarfish45 could you take a look at the code, see if it makes sense to you?
What I'm getting at is: maybe we build a template, then you could fill in the gaps?
SmarmySeaurchin8 check the logs, maybe you can find something there
Notice the error code:
Action failed <400/401: tasks.create/v1.0 (Invalid project id: id=first_attempt)>
If that is the case, the project ID is incorrect (the project id is not the project name)
Hi @<1526371965655322624:profile|NuttyCamel41>
I do that because I do not know how to get the pickle file into the docker container
What would the pickle file do?
and load the MinMaxScaler within the script, as the sklearn dependency is missing
What do you mean by that? Are you getting an error when loading your model?
I understand I can change the docker image for a component in the pipeline, but for the
it isn't possible.
You can always call Task.current_task().connect()
from the pipeline function itself. To connect more configuration arguments, you basically add them via the function itself: all the pipeline logic function arguments become pipeline arguments, it's kind of neat 🙂 Regarding docker, the idea is that you use a very basic python docker (the default for services) queue for all...
Yeah, the docstring is always the most updated 🙂
DeliciousKoala34 any chance you are using PyCharm 2022 ?
FierceHamster54 are you saying that inside the container it took 20 min to run? Or that spinning up the GCP instance until it registered as an agent took 20 min?
Most of the time is taken by building wheels for numpy and pandas ...
BTW: this happens if there is a version mismatch and pip decides it needs to build numpy from source. Can you send the full logs of that? Maybe we can somehow avoid it.
This looks good to me...
I will have to look into it, because it should not download it...
Another option is that the download fails (i.e. missing credentials on the client side, in clearml.conf)
And where is the ClearmlLogger coming from?
hmm that is odd, let me check
LethalCentipede31 I think seaborn is using matplotlib, it should just work:
https://github.com/allegroai/clearml/blob/6a91374c2dd177b7bdf4c43efca8e6fb0d432648/examples/frameworks/matplotlib/matplotlib_example.py#L48
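As a quick sanity check, a hedged sketch of the same idea with plain matplotlib: seaborn draws through these same figure objects, so ClearML's matplotlib binding should pick the plot up once a Task is initialized. The `Agg` backend here is only so the sketch runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [1, 3, 2])
ax.set_title("captured by ClearML when a Task is active")
# plt.show() is the call ClearML's matplotlib binding hooks into
```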