GrievingTurkey78
maybe since the package is not directly imported in my code, it is possible to get a different version than what I have locally (?).
If these are derivative packages (i.e. imported by other packages) they are not automatically logged when executing the Task manually (in order to keep the "installed packages" list as lean as possible on the one hand, while still specifying the packages that are important for you on the other)
That said, when the "trains-agent" executes the task it will store back...
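If you do want one of those derivative packages pinned explicitly, a minimal sketch (the package, project and task names below are placeholders) is to call Task.add_requirements before Task.init:
from clearml import Task

# explicitly log a package that is only pulled in indirectly;
# a version can optionally be pinned as a second argument
Task.add_requirements("some_package")
task = Task.init(project_name="examples", task_name="requirements demo")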
Is it not possible to serve a model with a preprocessing pipeline from scikit-learn using clearml-serving?
of course it is, did you first try the example here: None
If you need to run your own LogisticRegression call, you can use this example:
None
Notice this is where the custom endpoint actually calls the prediction: [None](https...
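Roughly, the custom endpoint pattern in those examples looks like the sketch below (based on the clearml-serving examples; the exact method signatures can differ between versions, and the joblib-loaded scikit-learn pipeline and the request field names are assumptions):
# preprocess.py - sketch of a custom clearml-serving endpoint
from typing import Any
import joblib

class Preprocess(object):
    def __init__(self):
        self._model = None

    def load(self, local_file_name: str):
        # called with the locally downloaded model file (assumed to be a joblib dump)
        self._model = joblib.load(local_file_name)

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # turn the request body into the pipeline's expected input
        return [body["features"]]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # this is where the custom endpoint actually calls the prediction
        return self._model.predict(data)

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        return {"prediction": list(data)}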
sudo curl -L "
-s)-$(uname -m)" -o /usr/local/bin/docker-compose
(with older clearml versions though…).
Yes, we added a content-type header for the files when uploading to S3 (so it is easier for users to serve them back). But it seems the Python 3.5 casting from Path to str breaks the mimetype call...
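For illustration only (this is not the actual clearml code, just the kind of call that trips on a Path object with older Pythons):
import mimetypes
from pathlib import Path

file_path = Path("model.json")  # placeholder file name
# older Python versions expect a plain string here, so the Path has to be cast explicitly
content_type, _ = mimetypes.guess_type(str(file_path))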
MotionlessCoral18 so did it solve the issue ?
SmallDeer34
I think this is somehow related to the JIT compiler torch is using.
My suspicion is that JIT cannot be initialized after something happened (like a subprocess, or a thread).
I think we managed to get around it with 1.0.3rc1.
Can you verify ?
Nicely found @MuddyRobin9!
we made two tb versions of / task and wrote in parallel.
And I wanted to know if it is possible here as well.
Basically you will have different series (based on the TB log file) on the same graph so you can compare, all automatically
The problem is that even when I mount the SSH key into the root home directory (e.g.,
/root/.ssh/id_rsa
with the correct permissions set to 400) I still encounter the same error.
The agent automatically mounts the .ssh folder from the host into the container, making sure all the permissions are set.
how can I run
pip install -e .
in general the agent will add the "working" dir into the PYTHONPATH so that you should not have to manually do "-e ."
Tha...
ShallowGoldfish8 this call does that:
https://github.com/allegroai/clearml/blob/0397f2b41e41325db2a191070e01b218251bc8b2/examples/advanced/execute_remotely_example.py#L127
BTW: from the instance name it seems like it is a VM with preinstalled PyTorch. Why don't you add system site packages, so the venv will inherit all the preinstalled packages? It might also save some space
DeterminedToad86 see here:
https://github.com/allegroai/clearml-agent/blob/0462af6a3d3ef6f2bc54fd08f0eb88f53a70724c/docs/clearml.conf#L55
Change it in the agent's conf file to:
system_site_packages: true
HugePelican43 sure you can, usually the limiting factor is memory, as it cannot be shared among processes, so if one process allocates all the memory the second one will crash with an out-of-memory error
No, an old experiment changed, nothing was rerun
ohh, that is odd. I think the max iteration value is stored in the DB, which is odd if it changed after an update.
BTW: just making sure, could it be these Tasks were imported ? (i.e. offline execution + import)
What's the "working directory" ?
What's the trains-agent version?
(yes this should have worked, as long as the package "test" is there)
I want to be able to delete only the logs since they are taking a lot of space in my case.
I see... I do not think this is possible
You can disable the auto logging though... pass auto_connect_streams=False to Task.init
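A minimal sketch (project/task names are placeholders):
from clearml import Task

# disable automatic capture of stdout/stderr/logging (i.e. the console log)
task = Task.init(
    project_name="examples",
    task_name="no console logging",
    auto_connect_streams=False,
)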
How so? They are in one place. The creation of the venv is transparent, and the packages that are there are everything you have in the docker, plus the ability to override them from the UI.
What am I missing here ?
I mean this blob is then saved on the fs
It can if you do:
temp_file = task.connect_configuration('/path/to/config/file', name='configuration object is a config file')
Then temp_file is actually a local copy of the text coming from the Task.
When running in manual mode the content of '/path/to/config/file' is stored on the Task.
When running remotely by the agent, the content from the Task is dumped into a temp file and the path to that file is returned in temp_file.
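A minimal sketch of that flow (project/task names and the config path are placeholders):
from clearml import Task

task = Task.init(project_name="examples", task_name="config file demo")
# manual run: the file's content is stored on the Task and the original path is returned
# agent run: the stored content is dumped to a temp file and that temp path is returned
config_path = task.connect_configuration('/path/to/config/file', name='my config file')
with open(config_path) as f:
    config_text = f.read()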
NICE! MoodyCentipede68 this is awesome!
I see, by default it will look for requirements.txt in the root of the repo (the actual repo).
That said, in code you can specify the requirements.txt:
Task.force_requirements_env_freeze(requirements_file='repo/project-a/requirements.txt')
task = Task.init(...)
Notice, you need to call it prior to the Task.init call
I notice that, in my Serving Service situated in the DevOps project, the "endpoints" section doesn't seem to get updated when I tag a new model with "released".
It takes a few minutes (I think 5 min is the default) to update.
Notice that you need to add the model with
model auto-update --engine triton --endpoint "test_model_pytorch_auto" ...
Not with model add (if for some reason that does not work please let me know)
No need to pass the model version i.e. 1
you can ...
Lol, :)
I think the issue is that you do not need to manually set the initial iteration, it's supposed to pick it up automatically, as it is stored on the Task itself
hmmm, somehow I have a bad feeling about it... Could you check the log? It should say something like "Collecting torch==1.6.0.dev20200421+cu101 from https://"
It should be right at the top of the installation. What do you have there?
ElegantCoyote26 point me to where Keras stores the data
If in the process of integration you had to add a logger/callback to your Keras code, that is the equivalent of using the TB.
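For example (a sketch; the log directory, project/task names and the commented-out model call are placeholders), adding the standard TensorBoard callback is enough for the scalars to show up:
import tensorflow as tf
from clearml import Task

task = Task.init(project_name="examples", task_name="keras tb demo")
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="./tb_logs")
# model.fit(x_train, y_train, callbacks=[tb_callback])
# whatever the TensorBoard callback writes is then logged automatically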
clearml launches a subprocess
correct, this subprocess is used for resource monitoring and sending logs in the background (i.e. metrics, console, etc.)
Where does the "training" part come from? I'm assuming the training is your main code?
Follow up, is this happening when running manually or when executed via the agent ?
WackyRabbit7 basically starting with v1.1, if you are running code without any configuration file you will get an error (in contrast to previous versions, where it defaulted to the demo server)
LOL yes
just make sure it won't be part of the uncommitted changes of the AWS autoscaler
WickedElephant66 this seems like a general network issue, like the docker service is missing your company's firewall certificate.
Can you pull any container from docker hub ?