Hi HungryTurtle13
I'm using Python's joblib library and the Parallel class to run an experiment in multiple parallel threads.
I believe joblib creates subprocesses not threads, but yes you are correct,
Basically once Task.init is called, every forked/spawned process will be automatically logged to the main process Task (you can, and probably should, call either Task.init or Task.current_task() from the forked processes, but this is just a detail)
The mai...
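Something along these lines should work (a minimal sketch, assuming the joblib workers are forked/spawned after Task.init; the function and series names are just placeholders):
```python
from joblib import Parallel, delayed
from clearml import Task

def run_trial(trial_id):
    # inside the forked/spawned worker, re-attach to the main process Task
    task = Task.current_task()
    task.get_logger().report_scalar(
        title="loss", series=f"trial_{trial_id}", value=1.0 / (trial_id + 1), iteration=0
    )
    return trial_id

if __name__ == "__main__":
    task = Task.init(project_name="examples", task_name="joblib parallel experiment")
    results = Parallel(n_jobs=4)(delayed(run_trial)(i) for i in range(8))
```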
JitteryCoyote63 to filter out archived tasks (i.e. exclude them):
Task.get_tasks(project_name="my-project", task_name="my-task", task_filter=dict(system_tags=["-archived"]))
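As a runnable sketch (project/task names are placeholders):
```python
from clearml import Task

# "-archived" in system_tags excludes tasks carrying the "archived" system tag
tasks = Task.get_tasks(
    project_name="my-project",
    task_name="my-task",
    task_filter=dict(system_tags=["-archived"]),
)
for t in tasks:
    print(t.id, t.status)
```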
Also could you explain the difference between trigger.start() and trigger.start_remotely()
Start will start the trigger process (the one "watching the changes") locally (this makes sense for debugging etc.)
start_remotely will launch the trigger process on the "services" queue, where it should live forever 🙂
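Roughly (a minimal sketch, assuming a TriggerScheduler watching dataset tags; the task id, queue names and tag are placeholders):
```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)
trigger.add_dataset_trigger(
    schedule_task_id="<template_task_id>",  # task to clone & enqueue when the trigger fires
    schedule_queue="default",
    trigger_project="datasets",
    trigger_on_tags=["ready"],
)

# run the watcher process locally (useful for debugging)
trigger.start()
# ...or launch it on the "services" queue so it keeps running on its own
# trigger.start_remotely(queue="services")
```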
Okay so when I add trigger_on_tags, the repetition issue is resolved.
Nice!
This problem occurs when I'm scheduling a task. Copies of the task keep being put on the queue ...
Great, you can test directly from the master 🙂
pip3 install -U git+
Building the pipeline in runtime from external configuration is very cool!!
I think nested components are exactly the right solution, and it is a great use case.
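For reference, building the DAG at runtime from an external config could look something like this (a rough sketch, assuming the config lists existing template tasks; all names here are placeholders):
```python
from clearml.automation import PipelineController

# hypothetical external configuration describing the steps
config = [
    {"name": "prepare", "task": "prepare data", "parents": []},
    {"name": "train", "task": "train model", "parents": ["prepare"]},
]

pipe = PipelineController(name="dynamic pipeline", project="examples", version="1.0")
for step in config:
    pipe.add_step(
        name=step["name"],
        base_task_project="examples",
        base_task_name=step["task"],
        parents=step["parents"],
    )
pipe.start(queue="services")
```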
Hi EnviousStarfish54
Color coding on the entire UI is stored per user (I think in your local cookies, but I might be wrong). Anyhow, any title/series combination will have the selected color regardless of the project.
This way you can configure once that loss is red and accuracy is green, etc.
VexedCat68
But what's happening is, that I only publish a dataset once but every time it polls,
this seems wrong (i.e. a bug?!). How do you set up the trigger? Is the Trigger Task constantly running, or are you re-launching it?
To store all the debug samples; it can also store all the models (if you configure output_uri='http://file_server_here:8081').
Yes: instead of the file server, use 's3://<ip_of_minio>:9000/bucket', and make sure you add the credentials for the MinIO in the trains.conf.
Yes, basically once you have the credentials in the trains.conf, you can do StorageManager.get_local_copy('s3://<minio>:9000/bucket/file') (and upload as well, of course 🙂)
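As a quick sketch (the MinIO address and bucket are placeholders; the S3 credentials go under the aws.s3 section of the conf file):
```python
from clearml import Task, StorageManager

# send uploaded models/artifacts to the MinIO bucket instead of the file server
task = Task.init(
    project_name="examples",
    task_name="minio output",
    output_uri="s3://<ip_of_minio>:9000/bucket",
)

# with the credentials configured, files can be fetched (or uploaded) directly
local_copy = StorageManager.get_local_copy("s3://<ip_of_minio>:9000/bucket/file")
```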
we have a separate cache
Why? they can share
Is this per Task or for all the Tasks always ?
Hi AbruptHedgehog21
can you send the two models info page (i.e. the original and the updated one) ?
do you see the two endpoints ?
BTW: --version would add a version to the model (i.e. create a new endpoint with version "endpoint/{version}")
it does appear on the task in the UI, just somehow not repopulated in the remote run if it’s not a part of the default empty dict…
Hmm that is the odd thing... what's the missing field ? Could it be that it is failing to Cast to a specific type because the default value is missing?
(also, is the issue present in the latest clearml RC? It seems like a task.connect issue)
maybe I should use explicit reporting instead of Tensorboard
It will do just the same 😞
there is no method for setting "last iteration", which is used for reporting when continuing the same task. Maybe I could somehow change this value for the task?
Let me double check that...
overwriting this value is not ideal though, because for :monitor:gpu and :monitor:machine ...
That is a very good point
but for the metrics, I explicitly pass th...
FiercePenguin76 the git repo should detect only clearml as a required python package
Basically the steps are:
decide if the initial python entry script is a standalone script (i.e. no local imports) in the git repo (in your example "task_with_deps.py")
If this is a "standalone script", only look for imports inside the calling python script, and list those packages under "installed packages"
If this is NOT a standalone script, go over ALL the python files inside the repository, look for "i...
Hi HollowFish37
I think I have good news for you, the clearml-agent is only communicating with the api endpoint, so as long as this is secure, you should be fine. Do notice that the default files server endpoint should be secure as well, as by default it will allow any upload/download
Try removing this magic environment variable that tells the sub-process there was already an initialized Task.
import os
env = dict(**os.environ)
env.pop('TRAINS_PROC_MASTER_ID', None)
🙂
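A minimal sketch of how that stripped environment might be handed to the child process (child_script.py is just a placeholder):
```python
import os
import subprocess

# drop the marker so the child starts its own Task instead of attaching to the parent's
env = dict(**os.environ)
env.pop('TRAINS_PROC_MASTER_ID', None)

subprocess.Popen(["python", "child_script.py"], env=env)
```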
Hi ShakyJellyfish91
Check mount default here:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L186
Is this what you are after, or do you actually want to change the start up script?
Hi PompousBeetle71 , Trains will log all the torch.save calls; I'm assuming they do not actually use it for the rest of the files in that folder.
If you'd like to share a code snippet, we could see if we could auto-magically log it. You could use artifacts and store the entire folder, it will zip it and upload it. Then you can reuse it from other experiments. https://allegro.ai/docs/task.html?highlight=artifact#trains.task.Task.upload_artifact
Example:
task.upload_artifact('transformer', './my_...
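A fuller sketch along those lines (folder path and names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="store checkpoints folder")

# zips the whole folder and uploads it as a single artifact
task.upload_artifact(name="transformer", artifact_object="./my_checkpoints_folder")

# later, from another experiment, pull the artifact back down
source = Task.get_task(project_name="examples", task_name="store checkpoints folder")
local_folder = source.artifacts["transformer"].get_local_copy()
```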
Could that be the proper way to install ?
https://github.com/facebookresearch/pytorch3d/blob/main/INSTALL.md#3-install-wheels-for-linux
DeliciousBluewhale87 out of curiosity , what do you mean by "deployment functionality" ? is it model serving ?
ClearML automatically gets these reported metrics from TB; since you mentioned seeing the scalars, I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?
Maybe there is a setting in docker to move the space used to a different location?
Not that I know of...
I can simply increase the storage of the first disk, no problem with that
probably the easiest 🙂
But as you described, it looks like an edge case, so I don't mind 🙂
Specifically your error seems to be an issue with nvidia Triton container upgrade
And other question is clearml-serving ready for serious use?
Define serious use? KFserving support is in the pipeline, if that helps.
Notice that clearml-serving is basically a control plane for the serving engine; not to neglect its importance, but the heavy lifting is done by Triton 🙂 (or any other backend we will integrate with, maybe Seldon)
Hi VexedCat68
Could it be the python version is not the same? (this is the only reason not to find a specific python package version)
SmarmyDolphin68 sadly if this was not executed with trains (i.e. the offline option of trains), this is not really doable (I mean it is, if you write some code and parse the TB 😉 but let's assume this is way too much work)
A few options:
On the next run, use the clearml OFFLINE option (i.e. in your code call Task.set_offline(), or set the env variable CLEARML_OFFLINE_MODE=1)
You can compress and upload the checkpoint folder manually, by passing the checkpoint folder, see https://github.com...
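A minimal sketch of the offline option (project/task names are placeholders; the session is imported later from a machine with server access):
```python
from clearml import Task

# same effect as setting CLEARML_OFFLINE_MODE=1 before the run
Task.set_offline(offline_mode=True)

task = Task.init(project_name="examples", task_name="offline run")
# ... training / logging as usual, everything is stored locally ...
task.close()

# later, on a machine that can reach the server:
# Task.import_offline_session("/path/to/offline_session.zip")
```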
Hi HealthyStarfish45
You can disable the entire TB logging:
Task.init('examples', 'train', auto_connect_frameworks={'tensorflow': False})
but I was wondering if there's any limitation in creating an image with a non root user to use as the actual worker?
SarcasticSquirrel56 non-root pods (containers) are fully supported,
I would recommend using the latest agent RC (that simplified a few things):
clearml-agent==1.4.0rc3
I see... because the problem would be with permissions when creating artifacts to store in the "/shared" folder
You mean as output target for artifacts ?
especially for datasets (for th...