Hi JitteryCoyote63
could you check if the problem exists in the latest RC?
`pip install clearml==1.0.4rc1`
The difference is that I want a single persistent machine, with a single persistent Python script, that can pull, execute, and report multiple tasks
So basically, instead of using the agent, simply spin a sub-process?
Hi DisgustedDove53
When you say "deployment" there are a lot of ways to interpret that 🙂 What exactly are you looking for?
LovelyHamster1 Now I see... Interesting credentials ability. Specifically, all the S3 access in trains is derived from the ~/clearml.conf credentials section:
https://github.com/allegroai/clearml/blob/ebc0733357ac9ead044d0ed32d41447763f5797e/docs/clearml.conf#L73
(or the AWS S3 environment variables)
I'm not sure how this AWS feature works; I suspect it is changing the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY variables on the EC2 instance. If this is the case, it should work out of...
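For reference, a minimal sketch of that credentials section (bucket name and keys below are placeholders):
` sdk {
    aws {
        s3 {
            # default credentials
            key: "AWS_ACCESS_KEY_ID"
            secret: "AWS_SECRET_ACCESS_KEY"
            region: ""
            credentials: [
                {
                    # per-bucket credentials override the defaults
                    bucket: "my-bucket"
                    key: "BUCKET_SPECIFIC_KEY"
                    secret: "BUCKET_SPECIFIC_SECRET"
                }
            ]
        }
    }
} `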
CUDA 10.1, I guess this is because no wheel exists for torch==1.3.1 and CUDA 11.0
Correct
how can I enforce a specific wheel to be installed?
You mean like a specific CUDA wheel?
you can simply put the http link to the wheel in the "installed packages", it should work
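For example, an "installed packages" entry can point straight at a wheel URL; the line below is a hypothetical example for torch 1.3.1 + CUDA 10.1 (based on the PyTorch download index, so treat the exact URL as an assumption):
` # Execution > Installed Packages
https://download.pytorch.org/whl/cu101/torch-1.3.1%2Bcu101-cp37-cp37m-linux_x86_64.whl `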
Hi YummyFish22
Looks like the task does not have a "Task.init" call in the main script (or an import of clearml)? Could that be the case?
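For reference, the minimal call that should sit at the top of the main script (project/task names below are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment") `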
Hi PanickyMoth78
So the current implementation of the pipeline parallelization is exactly like python async function calls:
` for dataset_conf in dataset_configs:
    dataset = make_dataset_component(dataset_conf)
    for training_conf in training_configs:
        model_path = train_image_classifier_component(training_conf)
        eval_result_path = eval_model_component(model_path) `
Specifically here, since you are passing the output of one function to another, what happens is effectively a wait operation, hence it ...
Hi @<1598487094601191424:profile|MysteriousCow84>
only one of them uses an already created venv from cache for this task. And the other node starts to re-create the same virtual environment.
Just to be clear, the second one is running, but it does not use the same venv as the other one (that is running in parallel), is that correct?
I guess "no hurdles" vs. safety is inherently not solvable.
LOL
Point taken, I reserve the option to come back with alternative solutions 😉
so that one app I am using inside the Task can use the python packages installed by the agent and I can control the packages using clearml easily
That's the missing part for me. You have all the requirements on the Task (which you can fully control), and the agent is setting up a brand new venv for each Task inside a container (the venv is cached, and you can also make the agent just use the default python without installing anything). The part where I'm lost is why you would need the path to t...
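For context, the venv cache mentioned above is controlled from the agent section of clearml.conf; a minimal sketch (the values below mirror the documented defaults, treat them as assumptions):
` agent {
    venvs_cache: {
        # maximum number of cached venvs before the oldest is evicted
        max_entries: 10
        # stop caching when free disk space drops below this (GB)
        free_space_threshold_gb: 2.0
        path: ~/.clearml/venvs-cache
    }
} `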
Set force_analyze_entire_repo to True 🙂
(false is the default)
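A sketch of where that flag lives in ~/clearml.conf (assuming the standard sdk.development section):
` sdk {
    development {
        # analyze the whole repository for requirements, not just the entry script
        force_analyze_entire_repo: true
    }
} `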
Let's try:
` echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/docker-clean
chown -R root /root/.cache/pip
export DEBIAN_FRONTEND=noninteractive
export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL libsm6 libxext6 libxrender-dev libglib2.0-0"
[ ! -z $(which git) ] || export CLEARML_APT_INSTALL="$CLEARML_APT_INSTALL git"
declare LOCAL_PYTHON
for i in {10..5}; do which python3.$i && python3.$i -m pip --version && export LOCAL_PYTHON=$(which python3.$i) && b...
I am just about to move house, which is stressful enough without a global pandemic(!), so until that's completed I won't commit to anything.
Sure man 🙂 no rush, I appreciate the gesture regardless of the outcome
Many thanks!
BitterStarfish58 I would suspect the upload was corrupted (I think this is the discrepancy between the file size logged and the actual file size uploaded)
I think what happened was that you were running it on the host machine (not inside the docker)
I probably missed a " somewhere
model_path/run_2022_07_20T22_11_15.209_0.zip, err: [Errno 28] No space left on device
Where was it running?
I take it that these files are also brought onto the pipeline task's local disk?
Unless you changed the object, then no, they should not be downloaded (the "link" is passed)
Hmm can you try:
`--args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']"`
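For context, a sketch of the full command this would be part of (project/script names are placeholders, assuming it is passed via clearml-task):
` clearml-task --project examples --name hydra-run --script train.py \
  --args overrides="['log.clearml=True','train.epochs=200','clearml.save=True']" `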
If you do not have a lot of workers, then I would guess console outputs
Hi @<1523704757024198656:profile|MysteriousWalrus11>
in the pipeline quickly between pipeline.add_step() functions?
You mean you want to get access to the parent Task ids and query them directly?
I think the easiest way is to pass it as one of the parameters
(you can get to the pipeline Task itself from the running component, then get the DAG, but these are internal functions; maybe we should make them external for easier querying?)
` pipe.add_step(
    name="stage_process",
    ... `
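For example, a sketch of passing a parent step's Task id into the step as a parameter (step/parameter names below are made up for illustration):
` pipe.add_step(
    name="stage_process",
    parents=["stage_data"],
    base_task_project="examples",
    base_task_name="process task",
    # "${stage_data.id}" is resolved at runtime to the parent step's Task id
    parameter_override={
        "General/parent_task_id": "${stage_data.id}",
    },
) `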
I don't know whether you have access to the backend,
Creepy, no I do not 🙂
I can't make anything appear in the console part of the ui
clearml_task.logger.report_text("some text")
should work
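A minimal end-to-end sketch (project/task names are placeholders):
` from clearml import Task

task = Task.init(project_name="examples", task_name="console test")
# this text should appear in the task's CONSOLE tab in the UI
task.get_logger().report_text("some text") `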
RoughTiger69 I think this could work, a pseudo example:
` from time import sleep

from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(...)
def the_last_step_before_external_stuff():
    print("doing some stuff")

@PipelineDecorator.pipeline(...)
def logic():
    the_last_step_before_external_stuff()
    # check_if_data_was_ingested_to_the_system is pseudo-code for your own check
    if not check_if_data_was_ingested_to_the_system:
        print("aborting ourselves")
        Task.current_task().abort()
        # we will not get here, the agent will make sure we are stopped
        sleep(60)
        # better safe than sorry
        exit(0) `wdyt? (the...
Out of curiosity, if Task flush worked, when did you get the error, at the end of the process?
MassiveHippopotamus56
the "iteration" entry is actually the "max reported iteration over all graphs" per graph there is different max iteration. Make sense ?
Ohh I see, could you copy/paste what you put there? (instead of the secret and key, *** will do 🙂)
Hi CheekyElephant36
First you need to run it once on your machine; once this is done (only a few steps is enough), you can clone it and enqueue it. Then, to actually connect the AWS autoscaler (the part that spins machines and runs tasks), go to Applications and select the AWS autoscaler.
Btw I think the next video will be about YOLO + autoscaler